[Hacking the JVM] Unraveling the Mystery of Object.hashCode()’s Native Implementation

A few weeks ago, while enjoying my afternoon coffee, I found myself browsing the OpenJDK source code. As I explored, the Object class’s hashCode() method, one I’ve used and overridden countless times, caught my eye. Like many, I’d always assumed its default implementation was directly tied to the object’s memory address.

The public native int hashCode(); declaration piqued my curiosity. Since it was native, I thought a quick glance at its underlying C++ implementation would only take a few minutes. Those few minutes, however, turned into a much longer, fascinating exploration, which I’ve condensed into this short post.

The Curiosity Deepens

My initial searches mostly yielded discussions on how to correctly implement hashCode() and equals() in Java. While valuable, these didn’t answer my specific question about the JVM’s default native behavior.

Undeterred, I dove into the JVM codebase directly. After a few detours, I finally landed on synchronizer.cpp. The JVM team has done a wonderful job leaving insightful comments throughout the code, which proved invaluable in understanding the mechanics.

    static inline intptr_t get_next_hash(Thread* current, oop obj) {
      intptr_t value = 0;
      if (hashCode == 0) {
        // This form uses global Park-Miller RNG.
        // On MP system we'll have lots of RW access to a global, so the
        // mechanism induces lots of coherency traffic.
        value = os::random();
      } else if (hashCode == 1) {
        // This variation has the property of being stable (idempotent)
        // between STW operations. This can be useful in some of the 1-0
        // synchronization schemes.
        intptr_t addr_bits = cast_from_oop<intptr_t>(obj) >> 3;
        value = addr_bits ^ (addr_bits >> 5) ^ GVars.stw_random;
      } else if (hashCode == 2) {
        value = 1;            // for sensitivity testing
      } else if (hashCode == 3) {
        value = ++GVars.hc_sequence;
      } else if (hashCode == 4) {
        value = cast_from_oop<intptr_t>(obj);
      } else {
        // Marsaglia's xor-shift scheme with thread-specific state
        // This is probably the best overall implementation -- we'll
        // likely make this the default in future releases.
        unsigned t = current->_hashStateX;
        t ^= (t << 11);
        current->_hashStateX = current->_hashStateY;
        current->_hashStateY = current->_hashStateZ;
        current->_hashStateZ = current->_hashStateW;
        unsigned v = current->_hashStateW;
        v = (v ^ (v >> 19)) ^ (t ^ (t >> 8));
        current->_hashStateW = v;
        value = v;
      }

      value &= markWord::hash_mask;
      if (value == 0) value = 0xBAD;
      assert(value != markWord::no_hash, "invariant");
      return value;
    }
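
To make the last branch easier to follow, here is a minimal Java sketch of the same xor-shift step. It is an illustration, not HotSpot's code: the class name XorShiftHashSketch is hypothetical, the seed constants are illustrative values, and the 31-bit HASH_MASK is an assumption standing in for markWord::hash_mask. Java has no unsigned int, so the sketch uses the unsigned shift >>> where the C++ shifts an unsigned value right.

```java
final class XorShiftHashSketch {
    // Assumption: a 31-bit mask standing in for markWord::hash_mask.
    static final int HASH_MASK = 0x7FFFFFFF;

    // Per-generator state, standing in for the per-thread
    // _hashStateX .. _hashStateW fields in the listing above.
    private int x, y, z, w;

    XorShiftHashSketch(int seedX) {
        x = seedX;
        // Illustrative seed values; any non-degenerate constants would do here.
        y = 842502087;
        z = 0x8767;
        w = 273326509;
    }

    int nextHash() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;                                   // still the old W at this point
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));      // >>> mimics the unsigned shift
        w = v;
        int value = v & HASH_MASK;
        return value == 0 ? 0xBAD : value;           // 0 is reserved for "no hash yet"
    }

    public static void main(String[] args) {
        XorShiftHashSketch gen = new XorShiftHashSketch(12345);
        for (int i = 0; i < 3; i++) {
            System.out.println(Integer.toHexString(gen.nextHash()));
        }
    }
}
```

Because the state lives in the generator rather than in a shared global, two threads with their own generators never contend on it, which is the property the source comment highlights when it contrasts this scheme with the global RNG of option 0.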

Deciphering the Options

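In essence, the JVM offers six distinct implementations for the default Object.hashCode() method. The implementation is selected with the -XX:hashCode=<id> JVM option, which is the hashCode variable tested in the code above; any id other than 0 through 4 falls through to the final branch. Let’s break down what each ID signifies: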

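| ID | Scheme | Description |
|----|--------|-------------|
| 0 | Random | Random number from the global Park-Miller RNG (os::random()) |
| 1 | Function of object address | Address bits shifted and XORed with a global random value (GVars.stw_random) |
| 2 | Hardcoded value | Always 1, for sensitivity testing |
| 3 | Sequence | Next value of a global counter (GVars.hc_sequence) |
| 4 | Object memory pointer | Raw memory address of the object |
| 5 | Marsaglia’s xor-shift scheme | Xor-shift generator with thread-specific state |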
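
A quick way to see these schemes in action is to print the identity hash of a few fresh objects and rerun the program with different ids. The sketch below is one way to do that; the class name HashCodeSchemeProbe is hypothetical. On recent JDK builds the flag appears to be marked experimental, so you may need something like java -XX:+UnlockExperimentalVMOptions -XX:hashCode=2 HashCodeSchemeProbe.java; treat the exact command line as an assumption to check against your JDK.

```java
final class HashCodeSchemeProbe {
    // Collects the identity hash of `count` freshly allocated objects.
    static int[] sample(int count) {
        int[] hashes = new int[count];
        for (int i = 0; i < count; i++) {
            // identityHashCode ignores any hashCode() override and asks the JVM directly.
            hashes[i] = System.identityHashCode(new Object());
        }
        return hashes;
    }

    public static void main(String[] args) {
        for (int h : sample(5)) {
            System.out.println(h);
        }
    }
}
```

Going by the source above, id 2 should print the same value for every object and id 3 should print increasing values, while the other ids should produce values with no obvious pattern.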

Once calculated, the identity hash code is stored in the object’s header (the mark word, hence the markWord::hash_mask in the code above), so later calls return the same value without recomputing it.
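
The caching is easy to observe from plain Java. The sketch below, with the hypothetical class name IdentityHashStability, checks that an object's identity hash does not change between calls, even after requesting a garbage collection that might move the object. Note that System.gc() is only a hint, so this is a demonstration rather than a proof.

```java
final class IdentityHashStability {
    // Returns true if the identity hash is unchanged after a GC request.
    static boolean isStable(Object o) {
        int first = System.identityHashCode(o);
        System.gc(); // only a hint; the JVM may or may not collect or move objects
        int second = System.identityHashCode(o);
        return first == second;
    }

    public static void main(String[] args) {
        Object plain = new Object();
        System.out.println(isStable(plain));
        // For a class that does not override hashCode(), both calls agree.
        System.out.println(plain.hashCode() == System.identityHashCode(plain));
    }
}
```

This stability is also why most of the schemes in the table cannot simply be recomputed on demand: a random, sequential, or address-based value would differ on the next call unless the first result were remembered.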

Further Exploration

My journey didn’t end with understanding the code. Refining my search to -XX:hashCode opened up a treasure trove of content, including benchmarks and the option’s fascinating interaction with biased locking. Rather than reproducing that excellent content, I’ve compiled some references for your further exploration:

References
