For a recent problem (87), I decided to write up a somewhat complicated prime sieve that calls itself recursively to ensure that you only ever "sieve" with prime numbers. That is, to check that a number is prime, instead of checking for divisibility by all numbers less than it, or even all numbers less than the square root, or even all odds less than the square root, you only do divisibility tests with primes. I am pretty happy with my sieve, which I wrote up first in python quite a while ago. However, after spending the time to rewrite it in D, I wondered whether it was really worth it. Does the improved sieve actually save a significant amount of time over a simpler sieve?
I decided to investigate this question thoroughly. I wrote up a series of sieve methods in C (so that I can feel confident about what code is actually getting executed, unlike in a language like python). The methods are as follows:
- isPrimeDumbest: checks primality of a number n by dividing by 2..n-1
- isPrimeSqrt: checks primality of a number n by dividing by 2..sqrt(n)
- isPrimeSqrt2: checks primality of a number n by dividing by 2 and also all odd numbers 3..sqrt(n).
- isPrimeSqrt23: checks primality of a number n by dividing by 2 and 3 and then also all numbers in 5..sqrt(n) not divisible by 2 or 3.
- smartSieve: checks primality of a number n by dividing by all primes in 2..sqrt(n)
Obviously checking only up to the square root will be better than checking all numbers, and obviously checking only odd numbers will give an approximately 2x speed up. The main things I was considering was how much also predividing by 3 helps (it lets you check only 2/7 of numbers, but requires a wee bit extra computation), and whether the overhead of recursion and memory accesses allowed the smart sieve method to be as smart as I wanted it to be.
And in the end, yes, it is as smart as I wanted it to be: to compute all primes below 10^8, in the tests I ran, smartSieve is about 5x faster than isPrimeSqrt, and also more than twice as fast as isPrimeSqrt23. So, in conclusion, smartSieve is worth using, at least if the number of primes concerned is at least around 10^7 (and notably, problem 87, which brought up all of this, required finding all primes less than 5*10^7). Also the tests drove home just how bad it is to not at least shorten things to the square root and truly brute force things...the dumbest method takes about 5 minute at 10^6, while the next slowest method still takes under half a second to compute all primes.
Below are graphs of my data: The horizontal axis is the N, the number below which all primes were computed. It is on a log scale, so I ran tests on all powers of 10 from 10^1 to 10^8 (100 million). The vertical axis is time, measured in seconds. For smaller N, as the code would run too quickly to get any meaningful timing results I ran each sieve 1000000 times for N = 10 to 10^3, then for 10^4 to 10^7, each sieve was run 50 times, except for the dumbest sieve, which I ran only once at 10^5 because its just super slow, it doesn't really matter how much, and for all of these the result I show is the computed average running time of multiple trials. Because 10^8 takes so long and the data is spread out enough, I only ran these tests once instead of computing an average.
Now that I have fully described what the data is, here is the data, which shows exactly what one would expect, except for perhaps the magnitudes. The first graph shows all methods (except no data for the dumbest method above 10^6 because that would take too long to compute), and the second graph shows all but the dumbest method run on small N, so that differences can actually be seen (though the differences are oh-so small on an absolute scale).
In case you want to know the exact actual results: all the data is in a google spreadsheet here: https://docs.google.com/spreadsheets/d/18-avzFTZDDOPMh7p_FRPOVc9SVfEhxPEoVFGQPTyNWk/edit?usp=sharing
Lastly, here is the code I ran to generate the information. In order to get code running as quickly as possible, and because it doesn't hurt, the below was compiled with gcc -O3 (highest amount of optimization). I haven't included the testing code, but its a fairly no-frills python script.
Also, it is notable that I am sure there are other prime sieve methods that may be faster out there in this world, and maybe I should eventually check out others, but this analysis is sufficient for my own purposes for now at least.
And in the end, yes, it is as smart as I wanted it to be: to compute all primes below 10^8, in the tests I ran, smartSieve is about 5x faster than isPrimeSqrt, and also more than twice as fast as isPrimeSqrt23. So, in conclusion, smartSieve is worth using, at least if the number of primes concerned is at least around 10^7 (and notably, problem 87, which brought up all of this, required finding all primes less than 5*10^7). Also the tests drove home just how bad it is to not at least shorten things to the square root and truly brute force things...the dumbest method takes about 5 minute at 10^6, while the next slowest method still takes under half a second to compute all primes.
Below are graphs of my data: The horizontal axis is the N, the number below which all primes were computed. It is on a log scale, so I ran tests on all powers of 10 from 10^1 to 10^8 (100 million). The vertical axis is time, measured in seconds. For smaller N, as the code would run too quickly to get any meaningful timing results I ran each sieve 1000000 times for N = 10 to 10^3, then for 10^4 to 10^7, each sieve was run 50 times, except for the dumbest sieve, which I ran only once at 10^5 because its just super slow, it doesn't really matter how much, and for all of these the result I show is the computed average running time of multiple trials. Because 10^8 takes so long and the data is spread out enough, I only ran these tests once instead of computing an average.
Now that I have fully described what the data is, here is the data, which shows exactly what one would expect, except for perhaps the magnitudes. The first graph shows all methods (except no data for the dumbest method above 10^6 because that would take too long to compute), and the second graph shows all but the dumbest method run on small N, so that differences can actually be seen (though the differences are oh-so small on an absolute scale).
In case you want to know the exact actual results: all the data is in a google spreadsheet here: https://docs.google.com/spreadsheets/d/18-avzFTZDDOPMh7p_FRPOVc9SVfEhxPEoVFGQPTyNWk/edit?usp=sharing
Lastly, here is the code I ran to generate the information. In order to get code running as quickly as possible, and because it doesn't hurt, the below was compiled with gcc -O3 (highest amount of optimization). I haven't included the testing code, but its a fairly no-frills python script.
Also, it is notable that I am sure there are other prime sieve methods that may be faster out there in this world, and maybe I should eventually check out others, but this analysis is sufficient for my own purposes for now at least.
#include<stdio.h> #include<stdlib.h> #include<math.h> /** Very brute force sieve * Always checks 2..n-1 */ int isPrimeDumbest(int n) { int i; for (i = 2; i < n - 1; ++i) { if (n % i == 0) { return 0; } } return 1; } /** Mostly brute force, but checks * 2..n-1 */ int isPrimeSqrt(int n) { int i; for (i = 2; i*i <= n; ++i) { if (n % i == 0) { return 0; } } return 1; } /** Checks for 2 explicitly, then does a += 2 */ int isPrimeSqrt2(int n) { if (n % 2 == 0) { return n == 2; } int i; for (i = 3; i*i <= n; i+=2) { if (n % i == 0) { return 0; } } return 1; } /** Sieves out both 2 and 3 */ int isPrimeSqrt23(int n) { if (n % 2 == 0) { return n == 2; } if (n % 3 == 0) { return n == 3; } int i; for (i = 3; i*i <= n; i += (i % 3 == 1 ? 4 : 2)) { if (n % i == 0) { return 0; } } return 1; } /** Function that takes one of the isPrime Methods above * and sieves: * For consistency, all sieves know 2 is prime * and then check all odds */ void mostSieves(int bound, int* primes, int* primeslength, int (*sieveFunc)(int)) { primes[0] = 2; int c = 1; int i; for (i = 3; i < bound; i+=2) { if (sieveFunc(i)) { primes[c++] = i; } } *primeslength = c; } /** Recursively determines prime lists * So that mods are only taken for * primes <= sqrt */ void smartSieve(int bound, int* primes, int* primeslength) { if (bound < 11) { primes[0] = 2; primes[1] = 3; primes[2] = 5; primes[3] = 7; *primeslength = 4; return; } int rbound = (int)sqrt(bound * 1.0f); smartSieve(rbound, primes, primeslength); int partialLength = *primeslength; int mLength = partialLength; int current = rbound; if (current % 2 == 0) { ++current; } while (current < bound) { int isprime = 1; int i ; for (i = 0; i < partialLength; ++i) { int p = primes[i]; if (p*p > current) { break; } else if (current % p == 0) { isprime = 0; break; } } if (isprime) { primes[mLength++] = current; } current += 2; } *primeslength = mLength; return; } int main(int argc, char** argv) { if (argc < 3) { printf("%s\n", "Not enough arguments"); return 1; } int fchoice = atoi(argv[1]); int bound = atoi(argv[2]); int numtimes = atoi(argv[3]); /** Allocate an over estimate in the beginning * So that resizing does not effect times */ int* primes = (int*)malloc(bound * sizeof(int)); // Why not use this place in memory for this... int primeslength = 0; int i; for (i = 0; i < numtimes; ++i) { switch(fchoice) { case 0: mostSieves(bound, primes, &primeslength, &isPrimeDumbest); break; case 1: mostSieves(bound, primes, &primeslength, &isPrimeSqrt); break; case 2: mostSieves(bound, primes, &primeslength, &isPrimeSqrt2); break; case 3: mostSieves(bound, primes, &primeslength, &isPrimeSqrt23); break; case 4: smartSieve(bound, primes, &primeslength); break; default: printf("Not a valid choice\n"); return 1; } } return 0; }
No comments:
Post a Comment