[COLLECTIONS-878] MapUtils.invertMap improve HashMap construction #652
ytatichno wants to merge 1 commit into apache:master from
Conversation
Hello @ytatichno -1: You are confusing initial capacity and load factor. Step through the debugger for
Thank you for your instant feedback! I fully agree that the constructor parameter of HashMap(int initialCapacity) is not the same as the table size, and that the load factor stays at its default (0.75) regardless of the input map's load factor. The optimization I proposed is not about the load factor value itself, but about avoiding the resize that happens when a map is filled exactly up to its declared size. For example, with new HashMap<>(map.size()) the internal threshold becomes (int) (map.size() * 0.75), so inserting map.size() entries exceeds the threshold and triggers a resize. Using Math.ceil(map.size() / 0.75d) as the initial capacity avoids that resize and yields a table sized to hold all entries without rehashing. This is also consistent with the JDK's approach in HashMap.calculateHashMapCapacity (introduced in Java 19). Here it is:

```java
/**
 * Calculate initial capacity for HashMap based classes, from expected size and default load factor (0.75).
 *
 * @param numMappings the expected number of mappings
 * @return initial capacity for HashMap based classes.
 * @since 19
 */
static int calculateHashMapCapacity(int numMappings) {
    return (int) Math.ceil(numMappings / (double) DEFAULT_LOAD_FACTOR);
}

/**
 * Creates a new, empty HashMap suitable for the expected number of mappings.
 * The returned map uses the default load factor of 0.75, and its initial capacity is
 * generally large enough so that the expected number of mappings can be added
 * without resizing the map.
 *
 * @param numMappings the expected number of mappings
 * @param <K> the type of keys maintained by the new map
 * @param <V> the type of mapped values
 * @return the newly created map
 * @throws IllegalArgumentException if numMappings is negative
 * @since 19
 */
public static <K, V> HashMap<K, V> newHashMap(int numMappings) {
    if (numMappings < 0) {
        throw new IllegalArgumentException("Negative number of mappings: " + numMappings);
    }
    return new HashMap<>(calculateHashMapCapacity(numMappings));
}
```
I can add a short test showing the extra resize if you think it would clarify my intentions.
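For context, the proposed sizing can be sketched as a standalone helper. This is a hypothetical illustration of the PR's idea, not the actual MapUtils source; the class and method names here are made up for the example:

```java
import java.util.HashMap;
import java.util.Map;

public class InvertMapSketch {

    /**
     * Inverts a map, sizing the result so that map.size() entries fit
     * without a resize under the default load factor of 0.75.
     * Hypothetical helper mirroring the change proposed in this PR.
     */
    public static <K, V> Map<V, K> invertMap(final Map<K, V> map) {
        // ceil(size / 0.75) guarantees threshold >= size, so no rehash occurs
        final Map<V, K> out = new HashMap<>((int) Math.ceil(map.size() / 0.75d));
        for (final Map.Entry<K, V> entry : map.entrySet()) {
            out.put(entry.getValue(), entry.getKey());
        }
        return out;
    }
}
```

On Java 19 and later the same effect is available directly via HashMap.newHashMap(map.size()), but Commons Collections targets older runtimes, hence the explicit arithmetic.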
According to https://issues.apache.org/jira/projects/COLLECTIONS/issues/COLLECTIONS-878
Total summary:

Why 0.75d: roughly 50% of input sizes cause an extra resize with new HashMap<>(map.size()). As an example, inspect maps with size in [4, 16].

HashMap rounds the requested capacity up to the next power of two and resizes once the entry count exceeds tableSize * 0.75. For each power of two p, the sizes that round up to p form the segment (p/2, p], because anything at or below p/2 rounds to the previous power of two. Sizes up to 0.75 * p (the DEFAULT_LOAD_FACTOR) fit without resizing, but sizes from 0.75 * p to p exceed the threshold and trigger an extra resize. That is half of the segment, so about 50% of sizes suffer the extra resize.
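The 50% figure can be checked with plain arithmetic against HashMap's sizing rules. A small sketch (tableSizeFor mirrors the JDK's private HashMap.tableSizeFor; the class and method names are mine):

```java
public class ResizeCheck {

    /** Smallest power of two >= cap, as HashMap.tableSizeFor computes it. */
    static int tableSizeFor(int cap) {
        int n = -1 >>> Integer.numberOfLeadingZeros(cap - 1);
        return (n < 0) ? 1 : n + 1;
    }

    /**
     * With new HashMap<>(n), the first allocated table has tableSizeFor(n)
     * slots and a threshold of tableSize * 0.75; inserting n entries
     * triggers an extra resize exactly when n exceeds that threshold.
     */
    static boolean extraResize(int n) {
        int tableSize = tableSizeFor(n);
        int threshold = (int) (tableSize * 0.75f);
        return n > threshold;
    }

    public static void main(String[] args) {
        // For sizes 4..16, print which ones would suffer an extra resize.
        for (int n = 4; n <= 16; n++) {
            System.out.println(n + " -> tableSize=" + tableSizeFor(n)
                    + ", extraResize=" + extraResize(n));
        }
    }
}
```

For the segment (8, 16], for example, sizes 13..16 resize while 9..12 do not: half the segment, matching the 50% estimate above.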