COLLECTIONS-803: prevent duplicate call to convertKey on put for CaseInsensitiveMap #276

Simulant87 · 2022-02-03T20:08:38Z

Improves the performance by re-using the once converted key again when creating a new entry.
Explained in more detail in https://issues.apache.org/jira/browse/COLLECTIONS-803

…InsensitiveMap

garydgregory · 2022-02-03T21:00:59Z

Hello @Simulant87
You'll need to add a test case to this PR that shows at least one failing test when run without the main changes.

Simulant87 · 2022-02-14T09:04:19Z

@garydgregory Thank you for the feedback.
I added a test that verifies the convertKey method is only called once, which fails currently on the master branch without the production code changes provided.
I kindly request another review. Please let me know if there is anything else missing.

freya022 · 2022-02-24T18:07:51Z

I may also suggest changing the convertKey method to use key.toString().toLowerCase(Locale.ROOT) to convert the non-null key, instead of the char loop.

Benefits include better performance (measured a 2 times increase on my pc), and this is more fitted for the use case, as per the Character#toLowerCase documentation:

In general, String.toLowerCase() should be used to map characters to lowercase. 
String case mapping methods have several benefits over Character case mapping methods. 
String case mapping methods can perform locale-sensitive mappings, context-sensitive mappings, and 1:M character mappings, whereas the Character case mapping methods cannot

Simulant87 · 2022-02-24T19:51:00Z

@freya022 thank you for your suggestion. This sounds like another high performance improvement.
However it also sounds like the behaviour might slightly change in some edge cases. Something I expect the maintainers want to avoid and could cause a longer discussion. Therefore I would like to keep your suggestion separated from my changes. As my implementation does not cahnge the behaviour of the key conversion, it should be simpler to review and approve. I encourage your to create a ticket in Jira and create a separate pull request for that change.

Simulant87 · 2022-02-24T19:55:05Z

Hello @kinow,
I saw that you are able to merge in this repository in another recent PR. Would you also like to review my suggested changes in this PR, please?

kinow · 2022-02-24T20:52:07Z

Hello @kinow, I saw that you are able to merge in this repository in another recent PR. Would you also like to review my suggested changes in this PR, please?

Sorry, busy with other pull requests & issues 😥 🙇 code looks good from a brief look, but would need more time to really understand the change and confirm it's a good improvement 👍

…sensitivemap-put

codecov-commenter · 2022-05-10T20:39:24Z

Codecov Report

Merging #276 (2193363) into master (1677dac) will decrease coverage by 4.66%.
The diff coverage is 92.96%.

@@             Coverage Diff              @@
##             master     #276      +/-   ##
============================================
- Coverage     85.87%   81.21%   -4.66%     
+ Complexity     4676     4609      -67     
============================================
  Files           292      288       -4     
  Lines         13469    13442      -27     
  Branches       1955     1984      +29     
============================================
- Hits          11566    10917     -649     
- Misses         1326     1932     +606     
- Partials        577      593      +16

Impacted Files	Coverage Δ
.../org/apache/commons/collections4/ClosureUtils.java	`100.00% <ø> (ø)`
...g/apache/commons/collections4/ComparatorUtils.java	`81.48% <0.00%> (+2.91%)`	⬆️
...rg/apache/commons/collections4/FluentIterable.java	`100.00% <ø> (ø)`
...org/apache/commons/collections4/IterableUtils.java	`96.62% <0.00%> (ø)`
...org/apache/commons/collections4/IteratorUtils.java	`92.47% <0.00%> (ø)`
...org/apache/commons/collections4/MultiMapUtils.java	`60.00% <ø> (ø)`
...rg/apache/commons/collections4/PredicateUtils.java	`83.87% <ø> (ø)`
...java/org/apache/commons/collections4/SetUtils.java	`72.97% <ø> (ø)`
...ache/commons/collections4/bag/SynchronizedBag.java	`66.66% <ø> (-25.00%)`	⬇️
...ommons/collections4/bag/SynchronizedSortedBag.java	`25.00% <ø> (ø)`
... and 94 more

... and 22 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Simulant87 · 2022-12-14T07:49:45Z

@garydgregory May I request another review, to get my PR merged? I think the PR is complete with a test covering the new code, no conflicts to the main branch, and the pipeline is green.

…InsensitiveMap

…put' of https://github.com/Simulant87/commons-collections into COLLECTIONS-803-improve-performance-caseinsensitivemap-put

Simulant87 · 2023-03-22T20:48:54Z

@Claudenw I saw you were active on other pull requests. Could you please review my pull request here as well? I just updated it to resolve a conflict.

Claudenw

Other than the one issue, it looks good to me.

src/test/java/org/apache/commons/collections4/map/CaseInsensitiveMapTest.java

aherbert

The code for this is fine but the javadoc has all been copied from the parent class and a lot is not relevant. Can you update it to describe what the methods are actually doing.

src/main/java/org/apache/commons/collections4/map/CaseInsensitiveMap.java

src/test/java/org/apache/commons/collections4/map/CaseInsensitiveMapTest.java

in order to allow overwrites by subclasses. Updated JavaDoc and parameter names to point out required key conversion.

Simulant87 · 2023-03-23T20:32:49Z

Thank you @aherbert and @Claudenw for your valuable comments.
I addressed them all with my recent changes. I decided to make my private methods protected now to grant sub-classes the control to fine tune the implementation with single overwrites.

aherbert · 2023-03-26T22:22:45Z

I have put a comment on the jira ticket about this change. I believe the best approach is not to add new protected (or private methods) but to override the minimum of the current API to achieve the functionality. Note that any version of this improvement would be a behavioural change for any derived classes. The private approach can only be circumvented by reimplementing the public put method that calls new private methods. The protected approach will pollute the public API. Both can be circumvented by downstream derived classes but it does require a code update to workaround the new functionality.

The question remains as to whether breaking functional compatibility in derived classes is an acceptable change for the performance benefit. Downstream users who have extended this class would have to check if the change breaks their code and update it as appropriate. Any users of the CaseInsensitiveMap class would see a performance improvement.

Claudenw · 2023-03-29T16:53:34Z

@aherbert I am torn here. I think that the changes proposed for CaseInsensitiveMap should be applied to AbstractHashedMap as any class that overrides convertKey will benefit. However, this means changes to addMapping and createEntry

The changes proposed are, within themselves, clean. However, changing the original implementations would be a breaking change for anyone that has overridden them. My thought was to modify AbstractHashedMap.put() to work as the proposed CaseInsensitiveMap.put() works. But this introduces 2 new methods to AbstractHashedMap. I don't like the proposed names but I also don't see any standardised names that make sense though perhaps then names _addMapping and _createEntry would work. We could then mark addMapping and createEntry for deprecation.

What is your opinion?

aherbert · 2023-03-29T22:11:08Z

If you modify AbstractHashMap to just use the converted key in put:

    protected void addMapping(final int hashIndex, final int hashCode, final K key, final V value) {
        addConvertedMapping(hashIndex, hashCode, convertKey(key), value);
    }

    protected void addConvertedMapping(final int hashIndex, final int hashCode, final Object convertedKey, final V value) {
        modCount++;
        final HashEntry<K, V> entry = createConvertedEntry(data[hashIndex], hashCode, convertedKey, value);
        addEntry(entry, hashIndex);
        size++;
        checkCapacity();
    }

    protected HashEntry<K, V> createConvertedEntry(final HashEntry<K, V> next, final int hashCode, final Object convertedKey, final V value) {
        return new HashEntry<>(next, hashCode, convertedKey, value);
    }

With this change the existing createEntry method is not called in the codebase. The LRUMap and MultiKeyMap call addMapping.

Any child classes that do override convertKey will inherit the performance change from put. Any classes that use addMapping will not see a performance change if they have converted the key already (or do not have to).

Since this will not enhance the performance of any of the classes in Collections other than CaseInsensitiveMap then I do not see the point of adding more methods to the API and deprecating current ones. I think my suggestion on the Jira ticket to simply cache the converted key privately in CaseInsensitiveMap is the most conservative change. The new field to cache the converted key can be marked as transient to avoid serialisation issues. It just needs to be created before deserialisation by overriding doReadObject.

aherbert · 2023-03-29T23:57:03Z

On further thought, caching the key conversion is not always going to work. The conversion must be done for each new call to the map with a key as it may be mutable.

This test works on the current implementation. It fails when the conversion is cached as the StringBuilder object is the same, even though it now is a different converted key:

@Test
public void testCache() {
    CaseInsensitiveMap<Object, Integer> map = new CaseInsensitiveMap<>();
    Assertions.assertNull(map.get("foo"));
    StringBuilder sb = new StringBuilder();
    map.put(sb, 1);
    sb.append("foo");
    Assertions.assertNull(map.get(sb)); // fails with a cached conversion as the map returns 1
}

Note that adding mutable items, changing them, and adding again is not a typical use case. But for the CaseInsensitiveMap it can be as all keys are stored converted.

So this leads us back to adding more methods to reuse the key converted in put when creating a new entry. Wherever this change is introduced, it will potentially be a functionally breaking change as methods that could have been overridden may no longer be used. So the minimum breaking change would be to make the change in CaseInsensitiveMap. Only an external library that extends this class, and overrides put, addMapping and/or createEntry will be affected. Adding new methods to AbstractHashMap to replace those which duplicate key conversion may reveal more issues when this path is followed. For example note that AbstractLinkedMap also calls convertKey in an override of createEntry, but it doesn't override put. So this would be affected.

Currently I am not convinced that this performance enhancement can be cleanly integrated with the current code. It would be easier for an end user to just extend the map, cache key conversion and not deliberately break it with mutable keys. This should be done if put is the largest part of the performance overhead of using the map.

garydgregory · 2023-03-30T09:54:20Z

If the claim is that this change is for performance, then a JMH test to back up that claim would help.

aherbert · 2023-03-30T12:39:35Z

If the claim is that this change is for performance, then a JMH test to back up that claim would help.

I posted a performance table for a quick JMH benchmark in the Jira ticket. I can add the benchmark to git master if required. It adds JMH to the pom to be run in a profile as per Commons Text. The benchmark must run in the same package for the various implementations to have access to package-private data structures.

Claudenw · 2023-03-31T10:02:43Z

I think that we can make a change work for AbstractHashedMap.

If we simply change line 286 in the AbstractHashedMap.put() and inline the addMapping() call using the code proposed in this pull request replacing the call to createEntry() with an inline creation of the Entry object using the already converted key.

This has the advantages of

Calling the conversion once.
Improving the performance of implementations that do not override the put() method.
Does not impact overriding implementations unless the overriding class implements addMapping() or createEntry() and does not override put().

There is a possible implementation using a ThreadLocal variable to store the key and and override createEntry() to check for it.

aherbert · 2023-03-31T19:36:24Z

I think that we can make a change work for AbstractHashedMap.

This is another variant of having to second guess what a user may have overridden. If we ignore anything outside of Collections (i.e. user code that has extended classes) then we have to support what we have. Some classes extend AbstractHashedMap and override put (e.g. AbstractReferenceMap); some override createEntry (AbstractLinkedMap). AbstractLinkedMap even overrides createEntry where the convertKey method is called the second time. There does not seem to be an easy point at which to insert new code that avoids a duplicate conversion.

Improving the performance of implementations that do not override the put() method.

Performance improvement is negligible for Collections as no map overrides convertKey except CaseInsensitveMap. So changing a lot of the library to support unknown downstream users may be fixing a non-issue.

There is a possible implementation using a ThreadLocal variable

IIUC ThreadLocal is not relevant for this as the map is not thread safe for put anyway. Any cache implementation that does not call convertKey and reuses an existing conversion would have to be robust to clearing its own cache. Otherwise you can get a stale converted key for a mutable key object.

Currently I would favour the documentation of how to implement a class that perform convertKey using a cache. The user can then be made aware of downsides to possible implementations and can choose the best method for their use case.

COLLECTIONS-803: prevent duplicate call to convertKey on put for Case…

3ff965d

…InsensitiveMap

Simulant87 added 2 commits February 3, 2022 23:16

COLLECTIONS-803: add test verify convertKey is called only once

065dd8b

COLLECTIONS-803: add unchecked suppression to new test case

c4cb26e

Simulant87 added 2 commits May 10, 2022 22:18

Merge branch 'master' into COLLECTIONS-803-improve-performance-casein…

d33ddb0

…sensitivemap-put

[COLLECTIONS-803] add JUnit Test annotation to new test case

3ac374a

Simulant87 added 6 commits March 22, 2023 20:50

COLLECTIONS-803: prevent duplicate call to convertKey on put for Case…

70e7b29

…InsensitiveMap

COLLECTIONS-803: add test verify convertKey is called only once

ce8d7ba

COLLECTIONS-803: add unchecked suppression to new test case

ae40690

[COLLECTIONS-803] add JUnit Test annotation to new test case

651d7a3

[COLLECTIONS-803] init data on mock

7cf07ca

Merge branch 'COLLECTIONS-803-improve-performance-caseinsensitivemap-…

2193363

…put' of https://github.com/Simulant87/commons-collections into COLLECTIONS-803-improve-performance-caseinsensitivemap-put

Simulant87 changed the base branch from master to collections March 22, 2023 20:44

Simulant87 changed the base branch from collections to master March 22, 2023 20:45

Claudenw requested changes Mar 23, 2023

View reviewed changes

src/test/java/org/apache/commons/collections4/map/CaseInsensitiveMapTest.java Outdated Show resolved Hide resolved

aherbert reviewed Mar 23, 2023

View reviewed changes

Simulant87 commented Mar 23, 2023

View reviewed changes

src/test/java/org/apache/commons/collections4/map/CaseInsensitiveMapTest.java Outdated Show resolved Hide resolved

Simulant87 and others added 2 commits March 23, 2023 12:35

COLLECTIONS-803: parameterise generics in CaseInsensitiveMapTest

7f7b52f

[COLLECTIONS-803] make private methods protected for CaseInsensitiveMap

1134908

in order to allow overwrites by subclasses. Updated JavaDoc and parameter names to point out required key conversion.

Simulant87 requested review from Claudenw and removed request for Claudenw March 23, 2023 20:32

Simulant87 requested review from aherbert and Claudenw and removed request for aherbert and Claudenw March 23, 2023 20:33

Simulant87 requested review from Claudenw and removed request for aherbert March 28, 2023 19:04

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

COLLECTIONS-803: prevent duplicate call to convertKey on put for CaseInsensitiveMap #276

COLLECTIONS-803: prevent duplicate call to convertKey on put for CaseInsensitiveMap #276

Simulant87 commented Feb 3, 2022 •

edited

Loading

garydgregory commented Feb 3, 2022

Simulant87 commented Feb 14, 2022

freya022 commented Feb 24, 2022

Simulant87 commented Feb 24, 2022

Simulant87 commented Feb 24, 2022

kinow commented Feb 24, 2022

codecov-commenter commented May 10, 2022 •

edited

Loading

Simulant87 commented Dec 14, 2022

Simulant87 commented Mar 22, 2023

Claudenw left a comment

aherbert left a comment •

edited

Loading

Simulant87 commented Mar 23, 2023

aherbert commented Mar 26, 2023

Claudenw commented Mar 29, 2023

aherbert commented Mar 29, 2023

aherbert commented Mar 29, 2023

garydgregory commented Mar 30, 2023

aherbert commented Mar 30, 2023

Claudenw commented Mar 31, 2023

aherbert commented Mar 31, 2023

COLLECTIONS-803: prevent duplicate call to convertKey on put for CaseInsensitiveMap #276

Are you sure you want to change the base?

COLLECTIONS-803: prevent duplicate call to convertKey on put for CaseInsensitiveMap #276

Conversation

Simulant87 commented Feb 3, 2022 • edited Loading

garydgregory commented Feb 3, 2022

Simulant87 commented Feb 14, 2022

freya022 commented Feb 24, 2022

Simulant87 commented Feb 24, 2022

Simulant87 commented Feb 24, 2022

kinow commented Feb 24, 2022

codecov-commenter commented May 10, 2022 • edited Loading

Codecov Report

Simulant87 commented Dec 14, 2022

Simulant87 commented Mar 22, 2023

Claudenw left a comment

Choose a reason for hiding this comment

aherbert left a comment • edited Loading

Choose a reason for hiding this comment

Simulant87 commented Mar 23, 2023

aherbert commented Mar 26, 2023

Claudenw commented Mar 29, 2023

aherbert commented Mar 29, 2023

aherbert commented Mar 29, 2023

garydgregory commented Mar 30, 2023

aherbert commented Mar 30, 2023

Claudenw commented Mar 31, 2023

aherbert commented Mar 31, 2023

Simulant87 commented Feb 3, 2022 •

edited

Loading

codecov-commenter commented May 10, 2022 •

edited

Loading

aherbert left a comment •

edited

Loading