Summary
  Reported Issue
Title: [COLLECTIONS-302] CollectionUtils.subtract() should not use ArrayList to improve speed
Project: collections
Item Last Modified: Mon, 3 Nov 2008 08:36:11 -0800 (PST)
Details
Reporter:   Joachim Rudolph
Created:   Wed, 13 Aug 2008 04:00:05 -0700 (PDT)
Updated:   Mon, 3 Nov 2008 08:36:11 -0800 (PST)
Key:   COLLECTIONS-302
Versions:   Not provided
Environment:  
Priority:   4
Status:   Closed
Resolution:   Won't Fix
Original Link:   http://issues.apache.org/jira/browse/COLLECTIONS-302
Summary:   CollectionUtils.subtract() should not use ArrayList to improve speed
Description:
The implementation of version 3.2.1 is
public static Collection subtract(final Collection a, final Collection b) {
    ArrayList list = new ArrayList( a );
    for (Iterator it = b.iterator(); it.hasNext();) {
        list.remove(it.next());
    }
    return list;
}
When a and b are large and similar, the subtract implementation calls ArrayList.remove() frequently, and each call copies a potentially large part of the list using System.arraycopy().

Suggestion: use LinkedList (at least for large lists).
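
A minimal sketch of what the suggested variant might look like, assuming the only change is the type of the working list. The class name SubtractSketch and the method name subtractLinked are hypothetical and are not part of Commons Collections; generics are used here only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;

// Hypothetical illustration; not the Commons Collections implementation.
class SubtractSketch {

    // Same algorithm as the 3.2.1 subtract(), but backed by a LinkedList,
    // so removing an element relinks nodes instead of shifting an array.
    static <T> Collection<T> subtractLinked(Collection<? extends T> a, Collection<?> b) {
        List<T> list = new LinkedList<T>(a);
        for (Object o : b) {
            list.remove(o); // removes the first occurrence only, if any
        }
        return list;
    }

    public static void main(String[] args) {
        Collection<String> result = subtractLinked(
                Arrays.asList("a", "b", "b", "c"), Arrays.asList("b", "c"));
        System.out.println(result); // prints [a, b]
    }
}
```

Note that LinkedList.remove(Object) still has to walk the list to find the element, so this variant only avoids the shifting cost, not the linear search per removed element.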
Comments:
bayard Thu, 23 Oct 2008 20:58:48 -0700 (PDT)
Ran a basic test and interestingly ArrayList came out better.


import java.util.*;
import org.apache.commons.collections.*;

public class Bob {

    public static void main(String[] args) throws Exception {
        test(Integer.parseInt(args[0]));
    }

    public static void test(int n) {
        // Build two identical collections of n strings.
        Collection a = new ArrayList();
        for (int i = 0; i < n; i++) {
            a.add("bob" + i);
        }
        Collection b = new ArrayList();
        for (int i = 0; i < n; i++) {
            b.add("bob" + i);
        }
        // Time a single call to subtract().
        long t1 = System.currentTimeMillis();
        CollectionUtils.subtract(a, b);
        long t2 = System.currentTimeMillis();
        System.err.println("T" + n + ": " + (t2 - t1));
    }

}



For an input of 10,000, both were around 550 msec. For 100,000, the ArrayList version took 58,000 msec, while the LinkedList version took 84,000 msec. Hardly scientific: I'm not repeating the test within the same run, so I could be missing out on the JIT improving a second run, I'm not running it multiple times, etc.

My suspicion is that the ArrayList constructor checks whether its argument is an ArrayList and does a quick arraycopy, while the LinkedList constructor just sits and plods along. I retested after changing the inputs from ArrayLists to LinkedLists, and the time roughly doubled, to 102,000 msec. Of course, when I pass LinkedLists in to the LinkedList variant, it goes up to 125,000 msec. Ah well.



Point of all that - apart from implying that more testing is needed - is that the collection type used might want to depend on the type of the 'a' variable.
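
One way such a type-dependent choice could be sketched is below. The class TypeAwareSubtract, its methods, and the selection rule (ArrayList for inputs marked with java.util.RandomAccess, LinkedList otherwise) are all assumptions made for illustration; the timings above do not establish that this rule is the faster one, and nothing like it is claimed to exist in Commons Collections.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

// Hypothetical illustration; not the Commons Collections implementation.
class TypeAwareSubtract {

    // Picks the working list from the type of 'a': array-backed inputs
    // (marked by RandomAccess) get an ArrayList, everything else a LinkedList.
    static <T> List<T> workingCopy(Collection<? extends T> a) {
        if (a instanceof RandomAccess) {
            return new ArrayList<T>(a);
        }
        return new LinkedList<T>(a);
    }

    // Same removal loop as the 3.2.1 subtract(), run on the chosen working list.
    static <T> Collection<T> subtract(Collection<? extends T> a, Collection<?> b) {
        List<T> list = workingCopy(a);
        for (Object o : b) {
            list.remove(o);
        }
        return list;
    }
}
```

Any real change along these lines would need the repeated, warmed-up measurements that the comment above says are still missing.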

joarud Fri, 24 Oct 2008 04:48:59 -0700 (PDT)
I did some tests myself; I had simply underestimated the speed of System.arraycopy().

In my problem the resulting collection would be almost empty.
For this case the following code models the algorithm:



Bar.java

import java.util.ArrayList;
import java.util.Collection;

public class Bar {

    private static final int REPS = 50;

    @SuppressWarnings("unchecked")
    public static void test(int n) throws InterruptedException {
        Collection a = new ArrayList(n);
        for (int i = 0; i < n; i++) {
            a.add("bob" + i);
        }

        for (int r = 0; r < REPS; ++r) {
            ArrayList c = new ArrayList(a);
            long t1 = System.currentTimeMillis();
            // Always remove the head, so every call shifts the rest of the list.
            while (c.size() > 0) {
                c.remove(0);
            }
            long t2 = System.currentTimeMillis();
            System.err.println("T" + n + ": " + (t2 - t1) + " ms");
        }
    }
}



In this test case, for n = 200000, System.arraycopy() is called 200000 times, moving an average of 100000 references per call.
The code runs on my machine in 12.3 seconds/iteration, which is about 6.5 GBytes/sec, much better than I expected.
So I will use ArrayLists for many other problems where I was previously worried about the O(N^2) performance.
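
The throughput figure can be checked with a little arithmetic, sketched below. The class ArraycopyThroughput and its method are hypothetical, and the reference size of 4 bytes is an assumption (the thread does not state it), though it is the value that makes the quoted numbers agree.

```java
// Back-of-the-envelope check of the throughput figure quoted in the thread.
class ArraycopyThroughput {

    // Bytes moved per second, given the number of arraycopy calls, the average
    // number of references moved per call, the reference size, and the elapsed time.
    static double bytesPerSecond(long calls, long avgRefsPerCall, int bytesPerRef, double seconds) {
        return (double) (calls * avgRefsPerCall * bytesPerRef) / seconds;
    }

    public static void main(String[] args) {
        // 4 bytes per reference is an assumption (e.g. a 32-bit JVM).
        double rate = bytesPerSecond(200000L, 100000L, 4, 12.3);
        System.out.printf("%.2f GB/s%n", rate / 1e9);
    }
}
```

With these inputs the total is 200000 x 100000 x 4 = 8 x 10^10 bytes per iteration, which over 12.3 seconds comes to roughly 6.5 x 10^9 bytes per second.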



I should have done my profiling before... sorry.

bayard Mon, 3 Nov 2008 08:36:11 -0800 (PST)
No worries - closing the issue as there's nothing major here to work on.