Open Source Support Tools
 
Search Item
 
Summary
  Reported Issue
Title: [COLLECTIONS-266] Issue with MultiKey when serialized/deserialized via RMI
Project: collections
Item Last Modified: Thu, 29 May 2008 23:24:13 -0700 (PDT)
Tags:  
 
 
1.5 Bug add added adding apache apache-directory attach attaching boolean called case change ci class collections committed compatibility contract current deal deserialization deserialized enum enums equal equals final fix fixed fixes hashcode hashmap href imho implement implementation integer iterator java javadoc joerg jvm key keyset keyword makes making map2 method methods multikey natural objects patch public put remain remove return sending serialization serialized serialversionuid sf sf-clirr something src svn test testallpackages tests that this transient transmitting unit
Details
[COLLECTIONS-266] Issue with MultiKey when serialized/deserialized via RMI
Reporter:   Julien Buret
Created:   Tue, 11 Sep 2007 14:02:35 -0700 (PDT)
Updated:   Thu, 29 May 2008 23:24:13 -0700 (PDT)
Key:   COLLECTIONS-266
Versions:   Not provided
Environment:  
Priority:   4
Status:   Closed
Resolution:   Fixed
Original Link:   http://issues.apache.org/jira/browse/COLLECTIONS-266
Summary:   Issue with MultiKey when serialized/deserialized via RMI
Description:
This is because the hash code of MultiKey is calculated only once.

<p>So if the MultiKey is deserialized in an other jvm, and if one at least of the subkeys defines its hash code with System.identityHashCode() (for example all the enums does), then the hash code of the MultiKey is no longer valid, and you can't retreive the key in your Map.</p>

<p>I fixed it by making the cached hash code field transient, and by recalculating the hash code during deserialization. </p>
Comments:
jburet Tue, 11 Sep 2007 14:04:39 -0700 (PDT)
Here is the updated source file.
Hope this help.
bayard Wed, 12 Sep 2007 10:21:09 -0700 (PDT)
Julien's fix as a patch.
bayard Wed, 12 Sep 2007 10:24:02 -0700 (PDT)
The patch seems good to me. Need to make a unit test and then apply.
bayard Wed, 12 Sep 2007 16:44:13 -0700 (PDT)
While creating the test I realized that the use case that found this problem seems quite rare.


  • MultiKey goes into Map.

  • Map gets sent through serialize/deserialize.

  • We have a new map, with a new multikey inside, with a new object inside that, and the multikey has based its hashCode on the old address of the object and not the new one.




If you dig that object out of the map, and then use it to try and get itself back out of the map; then you've got a problem. So definitely a bug of sorts.



But how did you get access to the object in the first place?



In my unit test, I transfer the Map, then have to get the key back out of the map to then use in a get request. ie:



MultiKey mk2 = (MultiKey) map2.keySet().iterator().next();
assertEquals(TWO, map2.get(mk2));



I find that the test passes for both the old code and your new code. Any idea what I'm doing differently?

bayard Wed, 12 Sep 2007 16:44:57 -0700 (PDT)
Attaching the unit test.
jburet Thu, 13 Sep 2007 00:56:21 -0700 (PDT)
I have updated the test and now it fails before the patch and is successful after.

In the test, the hash code of KEY_266 depends of the current (simulated) JVM (like System.identityHashCode() ).



HTH

bayard Thu, 13 Sep 2007 16:15:35 -0700 (PDT)
Thanks Julien.

Digging into it, I was a bit confused by the isJVM1 flag as it makes both the deserialized object and the TEST_266 object have the same hashCode. Then I realized that's probably how enums work, so fits your use case above.



I think this is a special case of a bigger and simpler unit test that uses the natural hashCode of the object (ie: same as System.identityHashCode). The current patch fails for that unit test. I'll attach the test for your thoughts.

joehni Thu, 13 Sep 2007 23:14:09 -0700 (PDT)
IMHO the key simply violates the contract. Extract from Javadoc to Object.hashCode:


  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result




It also states:




  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.




Without testing it, but if you use this key not as part of a MultiKey, but of a HashMap directly, you might get the same results.

jburet Fri, 14 Sep 2007 00:20:45 -0700 (PDT)
For Joerg: Here is the code of equals() and hashCode() methods of class Enum in the sun 1.5 jvm:


public final boolean equals(Object other) {
return this==other;
}



public final int hashCode() {
return System.identityHashCode(this);
}



I think (and I hope ) that the class Enum does not violate the hashCode contract - but you can see that the same enum will not have the same hashCode in two different jvms. The conclusion is : never serialize the hashCode (at least for a modular class like MultiKey).



And the HashMap will work fine in this case, because in its writeObject() and readObject() methods, the hashCode of each key is not serialized/deserialized: only the key, the value and the size of the map are serialized: It works, I have tested it.



Sorry for the multiple edits, but what I would like is to underline this sentence of the hashCode contract : "This integer need not remain consistent from one execution of an application to another execution of the same application."

joehni Fri, 14 Sep 2007 01:51:39 -0700 (PDT)
Well, this problem with Enums has a history:




However, in the end you're right and the hashCode should not have been stored in the MultiKey in this way. We might solve this by adding a readResolve method:



private Object readResolve() {
return new MultiKey(keys, false);
}



that way we create a new MultiKey with the correct hashCode. Your solution with the transient member will break the serialization compatibility, since you changed the binary layout. Therefore the hashCode member must be serialized - otherwise you have to change also the serialVersionUID. But with a private calculateHashCode method and setting the hashCode member not to final, we can implement readResolve different:



private Object readResolve() {
calculateHashCode();
return this;
}



But our clirr report may still choke about the final.




  • Jörg

jburet Fri, 14 Sep 2007 02:49:36 -0700 (PDT)
If the final modifier is a problem, an other solution could be to add a transient field "hashCode2" and no longer use the old field "hashCode " in the class (just keep it for compatibility).

There is no mention of the final keyword in the serialization spec about compatible or incompatible changes:



http://java.sun.com/javase/6/docs/platform/serialization/spec/version.html#5172

joehni Fri, 14 Sep 2007 02:57:16 -0700 (PDT)
Ah, well, the "final" modifier was meant as problem for binary compatibility itself, not for binary serialization compatibility
jburet Fri, 14 Sep 2007 03:12:51 -0700 (PDT)
Ah ok Well then it should be ok:

http://java.sun.com/docs/books/jls/second_edition/html/binaryComp.doc.html#45154

scolebourne Mon, 17 Sep 2007 01:30:42 -0700 (PDT)
Serialization is actually quite clever. You can change a field to transient, and keep the same serialVersionUID without a problem IIRC. And in this case, it doesn't matter if the serialVersionUID is changed, as the current code is broken.
bayard Tue, 18 Sep 2007 11:56:00 -0700 (PDT)
So... summarizing:


  • We want it to remain broken for normal natural hashCodes, as not keeping those in line with the spec is broken.

  • We want to fix it for enums though, as they are special - and Julien's test case is good there because it models the specialness with the isJVM1 flag.

  • Use Julien's fix because the move to transient doesn't break compat.




Stephen/Joerg???

joehni Wed, 19 Sep 2007 01:10:05 -0700 (PDT)
Patch to improve test case by simulating Enum behaviour.
joehni Wed, 19 Sep 2007 01:21:51 -0700 (PDT)
1/ You cannot fix natural hash codes in general. It works for Enums since they use always the same instance in the same VM.
2/ I've added a test case that does something similar ... I missed Julian's TC, but yes, that test would be sufficient also
3/ I had complaints against adding the "transient". I can run the TestAllPackages suite though and I assume (although I did not find where) that it also contains compatibility tests for serialization, since there are such objects in the data/test directory. This would prove Stephen's comment right that Java serialization can deal with the situation - at least in one direction. But I doubt it will work in the other direction i.e. an old version of CC can read such a serialized object. Therefore I'd simply remove the final. And IMHO it matters if the serialVersionUID changes, since the current code is only broken for a special use case
bayard Tue, 13 May 2008 00:22:52 -0700 (PDT)
Joerg - do you remember enough of this issue to put together a patch for your alternative fix?
joehni Tue, 13 May 2008 04:01:54 -0700 (PDT)
I've added a new patch that combines all changes for the main source and the test case.
bayard Thu, 29 May 2008 23:24:13 -0700 (PDT)
svn ci -m "Applying Joerg's final patch from COLLECTIONS-266, including the unit test that shows the problem and fixes the problem by making the hashcode transient, and moving the hashcode implementation such that it can be called from the deserialization as well as the hashcode method" src

Sending src/java/org/apache/commons/collections/keyvalue/MultiKey.java
Sending src/test/org/apache/commons/collections/keyvalue/TestMultiKey.java
Transmitting file data ..
Committed revision 661577.