In my last post about the TekPub LINQ Challenge I asked if anyone wanted to post an algorithm which was faster than the solution that Nate had posted up on his blog. There are a few known algorithms for finding positive prime numbers, and one of those is called the Sieve of Eratosthenes. Well, it turns out that Steve Strong had already stepped up to the challenge and implemented this algorithm. Well done Steve!
Explanation
The Sieve of Eratosthenes rules goes as follows (as explained by Wikipedia):
To find all the prime numbers less than or equal to a given integer n by Eratosthenes’ method:
- Create a list of consecutive integers from two to n: (2, 3, 4, …, n).
- Initially, let p equal 2, the first prime number.
- Strike from the list all multiples of p greater than p.
- Find the first number remaining on the list greater than p (this number is the next prime); let p equal this number.
- Repeat steps 3 and 4 until p2 is greater than n.
- All the remaining numbers on the list are prime
Steve’s implementation is as follows:
int max = 100; var primes = Enumerable.Range(1, max) .Where(i => i > 1) .Aggregate(Enumerable.Range(2, max - 1).ToArray(), (sieve, i) => { if ((i > Math.Sqrt(max)) || (sieve[i - 2] == 0)) return sieve; for (int m = 2; m <= max / i; m++) sieve[i * m - 2] = 0; return sieve; }) .Where(n => n != 0);
We start with a range from 1 to n, and then we get all of the numbers greater than 1. Next we create a range from 2 to max – 1. We pass this into the aggregate method as a seed (we will use this array to accumulate our result), and then we run an accumulator method which takes each number in our initial range and runs it through the sieve.
The sieve first checks to see if the current number of greater than the square root of max. Since max doesn’t change we could certainly pull this value out of the LINQ query so that we don’t keep calling Math.Sqrt. Since Steve was following the rules of the challenge, he just left it in like this. We then check to see if the current number in the array, which is at index "i – 2", is equal to zero. If it is zero, this means that it has already been removed from the list and doesn’t need to be checked. Next we loop through and remove all multiples of "i" in the list. And then return the sieve. We just keep doing this over and over until eventually we have removed all non-prime numbers. Then we just filter the list to return only the non-zero numbers, and there we have a list of primes. Pretty awesome!
Initial Results
So, at first glance it is hard to tell if this algorithm is going to be faster. And so we could analyze it and use big O notation to represent the performance of this algorithm, but I think in this case it’ll be easier to just run it. 🙂 Besides, the Wikipedia page lists the complexity of this algorithm as O(n(logn)(loglogn)) which I’m going to go out on a limb here and say that I wouldn’t be able to reach this. So, here are the numbers from Steve’s algorithms run up against the optimized brute force approach from my previous post (I ran the brute force approach numbers again, to make sure that everything was fair, and because we didn’t run a single threaded test on 20 million with the brute force approach last time):
Primes Up To | Brute Force | Sieve Of Eratosthenes |
2000 | .001 seconds | .007 seconds |
200,000 | .193 seconds | .041 seconds |
2,000,000 | 3.3 seconds | .309 seconds |
20,000,000 | 79 seconds | 3.5 seconds |
The numbers look great! It starts off a bit slower at the very beginning, but then quickly pulls ahead at the second test. By the time we reach 20 million, there is really no comparison. In this case, a little bit of research and forethought has improved the algorithm by an absurd amount. But let’s see if we can speed this up a bit more.
Optimizations
The first thing I am going to do is ditch the call to Math.Sqrt and move it outside of the LINQ statement. Steve was following the rules of the contest and so he left this inside of the LINQ expression. I am however not bound by any such rules. Since I created them!
var primes = Enumerable.Range(1, max) .Where(i => i > 1) .Aggregate(Enumerable.Range(2, max - 1).ToArray(), (sieve, i) => { if ((i > maxSquaredRoot || (sieve[i - 2] == 0)) return sieve; for (int m = 2; m <= max / i; m++) sieve[i * m - 2] = 0; return sieve; }) .Where(n => n != 0);
Next I am going to remove that check against the square root of max up into the range creation. This way we aren’t even feeding in values which are too big into the aggregate:
var primes = Enumerable.Range(1, (int)maxSquareRoot + 2) .Where(i => i > 1) .Aggregate(Enumerable.Range(2, max - 1).ToArray(), (sieve, i) => { if (sieve[i - 2] == 0) return sieve; for (int m = 2; m <= max / i; m++) sieve[i * m - 2] = 0; return sieve; }) .Where(n => n != 0);
Then I’ll change my range to start at 2 so that I can remove the "Where" filter completely:
var primes = Enumerable.Range(2, (int)maxSquareRoot + 2) .Aggregate(Enumerable.Range(2, max - 1).ToArray(), (sieve, i) => { if (sieve[i - 2] == 0) return sieve; for (int m = 2; m <= max / i; m++) sieve[i * m - 2] = 0; return sieve; }) .Where(n => n != 0);
Optimized Results
So, we have implemented a few simple optimizations, let’s look at some new numbers:
Primes Up To | Sieve Of Eratosthenes (Original) | Sieve Of Eratosthenes (With Optimizations) |
2000 | .007 seconds | .011 seconds |
200,000 | .041 seconds | .029 seconds |
2,000,000 | .309 seconds | .190 seconds |
20,000,000 | 3.5 seconds | 2.2 seconds |
Great! It shaved even more off the already very quick times. Do you see anymore quick optimizations that we could make to this to speed it up even more?
Summary
I guess this goes to show you that while premature optimization may be the root of all evil, if you need performance in your application, it may behoove you to do a bit of research before you jump right in to implementing a naive solution in your application. The key is to test and benchmark to make sure that you really do need the optimizations before you start spending time on it. I hope you have found this little exercise interesting, I know that I sure have.
Loved the article? Hated it? Didn’t even read it?
We’d love to hear from you.
That’s pretty awesome, never heard of that technique to find Prime numbers before.
Justin, can you post results by using parallel LINQ ?
I’m also curious about the parallel results.
You could cache the
[quote]Enumerable.Range(2, max – 1).ToArray()[/quote]
just as you did with
[quote]Math.Sqrt(max)[/quote]
@alwin It might be deceptive, but the range is being passed in as a seed to the aggregate method, so it is only being executed once.
I tried to parallelize the inner for loop like this:
Parallel.For(2, max / i + 1, m => { sieve[i * m – 2] = 0; });
Before doing that the load was on one CPU only and by using Parallel.For it was almost 100% on both (Core 2 Duo). Sadly the speedup was marginal, from 2.2s to 2.1s. Maybe we’ve hit the memory bandwidth wall?
@Jonas I think that the operation happening in this loop is too fast, and so the overhead of making it parallel pretty much cancels out any performance gains you might get.
I implemented The Sieve of Atkin. Not in LINQ but it scales.
See:
http://alicebobandmallory.com/articles/2010/01/14/prime-factorization-in-parallel
You could change the inner for loop to not use multiplication. The for loop is really just marking ever "i" element 0.
int max = 100;
int maxSquareRoot = (int)Math.Sqrt(max);
var primes = Enumerable.Range(2, maxSquareRoot + 2)
.Aggregate(Enumerable.Range(2, max – 1).ToArray(), (sieve, i) =>
{
int m = i – 2;
if (sieve[m] == 0) return sieve;
while((m += i) <= max – 1)
sieve[m] = 0;
return sieve;
})
.Where(n => n != 0);
this is my solution..i know it includes some hardcoding values but it is only one line
var primes = Enumerable.Range(1, 20150).Where(x => x == 2 || x == 3 || x == 5 || x == 7 || (x> 10 && x%2 != 0 && x%3 != 0 && x%5 !=0) );
what do you think?