HashSet vs List

I was writing a simple class today that needed to check to see if a value was in a list.

As the list was about 2,000 strings, I wondered what the most performance efficient approach was. Normally I create a List<String> and call the .Contains() method to find out if the value I’m checking is in there – but this list is going to be hit millions of times. So getting it wrong could have performance implications.

I did a quick Google search and HashSet<T> seems like a good candidate. So I wrote a very simple console app that would create a list of strings, and then store the list in either a List<String> or a HashSet<String>.

Initially I set my list size as 2000 to model my application, and generated 10,000 calls to the list with a random element. I set up the lists so that about 10% of the random numbers should hit the list, and 90% not be in.

The first run was so quick that I was not sure about the results, so I reran with 20,000 numbers in the lists and 100,000 test hits:

Method Time to run: Hits/misses
List<String> 24.68 seconds 9921 / 90079
HashSet<String> 0.0257 seconds 9719 / 90281

 

I hadn’t expected such a clear cut difference: a HashSet<T> will be about a thousand times faster than List<T>