Work started on the following about May 93, the program in June, and the description in August, for a magazine article that went unpublished. Still convinced of its merit, I dusted it off, added (a little) polish, and put it here on the web. The program should be modified to run in protected mode to take advantage of the large RAM and HDD memory available today. My original intent was to have a small program, and use less than 64K RAM to produce prime fields stored in about 35M free hard disk space on my 286 machine. These days, 2GB drives and 64M ram seem common as nails. Now 17017 byte fields in the old DOS program seem puny! Because I changed descriptions of code labels, the source code is not part of this package yet. I hope to edit the code so it matches the text and include it, then later, add a field reader, and perhaps go to protected mode. On a standard 486-66 computer, Blackkey.exe takes 53 seconds to fill 3,335,332 bytes in the file Blackkey.000, for primes to 100,059,960. My computer is now a Cyrix 6x86 running at 200 MHz, and the times have dropped to less than three seconds to calculate, and less than 5 seconds to calculate and store on disk. The most recent changes to this article include running a spelling checker, times to run, and a little more about the shielding method. */
The Sieve of Eratosthenes has been the best means for finding prime numbers since about 200 B.C. Although very efficient, the "black key sieve" is an improvement. It uses an enriched sieve to cut calculation time and storage requirements. Using a Cyrix 6x86 200mHz, the r30 sieve in C and Assembly (see listings), calculates primes to 100,000,000 in less than three seconds, including the time to count primes. Storing the primes (on a typical hard disk) uses 3.3 Mbytes, and adds about another two or three seconds.
The classic or simple sieve is a field of elements, each element represents a positive integer from two through the last number of interest. Each element starts out as "unmarked". The sieve operation consists of two main operations, finding the next prime number, and canceling multiples of the prime by marking elements. Starting with number 2, and searching higher, if an element is unmarked, it represents a prime number. When a prime is found, every multiple of the prime from the prime squared through the end of the field is cancelled. Primes are found, and the field cancelled, until the prime exceeds the square root of the last sieve element value. The field elements now represent the prime status of all of the numbers in the field.
By analysis, operation of a simple sieve is not quite optimum. The algorithm is like listening to a long piece played on an imaginary piano, while keeping track of the keys struck. Each key is a number, and the keys not struck are primes. Cancellations melodically fall on the white keys, which are the numbers with low value factors, like two, three or five. The black keys are numbers without those factors, and are not struck as often, many not at all. The white keys above the first octave are never prime, while some black keys are. Improving the situation starts with the question "Why spend any time on the white keys, when the results are always on the black ones?" Two variations come to mind: shield the "white keys" from more than one cancellation, or build a "piano" with only black keys!
"Shielding" is a method to prevent redundant cancellations. It's like playing a note once in a performance, and avoiding it if it appears again in the score. Here are two ideas, the first I considered, the second I tried;
1. The simple version reads the field element to be cancelled, then tests for previous cancellation. If not marked, the canceling write is made, otherwise the write is skipped.
2. A better method assumes that the sieve cancellations are in ascending order, and the number to be cancelled is the product of the prime and a factor in the sieve. If the factor is non-prime, so is the product, and the product is not recancelled. For example, if the current prime factor is 5, start by squaring the prime and canceling 25. Next, 5 is added to 25, 6 is examined and found to be non-prime, so 30 is not cancelled. 5 is added to 30, 7 has not been cancelled, so 35 is, etc. Be very careful to cancel powers of the prime factor!
The value of shielding schemes is based on minimizing instruction times of the hardware. Clock cycle savings (or loss!) may be calculated, but don't expect much return on the effort. If Assembly Language is second nature, consider reading 16 or 32 bit sieve field words. Bits can be checked faster in registers than in RAM. Because primes thin out as numbers grow, shielding works best in the larger sieves.
The "black key" sieve holds more promise. Removing the "white keys" enriches the sieve with primes. Fewer elements to work with makes faster operation possible. The mask and offset calculations are more complicated than the simple sieve (to be discussed later.) A base 2 number system for an odds-only sieve is a good first example. Each byte or integer in the field represents an odd number, because all primes above two are odd. This simple odds-only compression technique has a few interesting points worth noting:
a. The field represents twice as many numbers for a given size.
b. The prime number two is not found in the field, because the odds-only sieve is a set of elements, and every element (mod2)=1.
c. The sieve operation is more efficient because every other cancellation (every factor of two) is not required or even possible.
Looking harder, an intrepid programmer sees that using a whole byte to represent one number is an obvious waste of seven bits! Some complications arise in calculating values of each bit, and mask work is required to set and read each bit. One of two methods is usually employed for representing numbers in the sieve field. Method 1 packs consecutive elements in a byte. Method 2 assigns elements in consecutive addresses starting with bit 0, wrapping around to offset_byte 0 bit 1, after overflowing the field, etc.
Method 1: n=(16 x offset_byte)+(2 x bit_num)+1
Method 2: n=2 x (offset_byte+(bit_num x field_size))+1
If the sieve field is to be saved, Method 1 is much preferred, because adjacent fields may worked on separately, then concatenated. Masking off bits and canceling is a little easier in Method 2, but variable field sizes cause grief. The sixteen fold advantage in memory utilization compared with a simple sieve is well worth extra code to store the results. This improvement gains speed and utility by building a sieve without representing factors of the base number two.
A case could be made for a better base number, like six. Using 2 and 3 as factors, the sieve field can be compressed further. The set of numbers representing base 6 is {0, 1, 2, 3, 4, 5}. For any Prime number except two and three, Prime (mod 6) is 1 or 5. The equation for finding a value in the field is
n=(6 x offset_byte) + remainder[bit_num], where
{x, x, x, x, x, x, 1, 0} is the bit_num, and
{x, x, x, x, x, x, 5, 1} is the remainder set
Constructing a sieve field using two bits per address is not very frugal. The sieve word can be packed with more bits by using another factor of four to yield base 24. For primes above 3, the mod 24 remainder set is:
{23, 19, 17, 13, 11, 7, 5, 1}
A value of a bit in the field is
n=(24 x offset_byte)+remainder, where the remainder corresponds to the bit position:
{ 7, 6, 5, 4, 3, 2, 1, 0} bit_num
{23, 19, 17, 13, 11, 7, 5, 1} remainder
Other two factor bases are not as efficient. Using factors two and five, base 10 uses the set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. For numbers above 5, any Prime(mod10)={1, 3, 7, 9}. Packing the byte gives a set of Prime (mod20)={19, 17, 13, 11, 9, 7, 3, 1}. When compared to base two sieves, the base six or base ten systems represent more numbers per storage element. As a consequence the base factors two and three (or two and five) are not reported as prime. The price for efficiency is higher complexity.
Numbers in this article are expressed in base 10, but that doesn't make it the best base for sieves, or for much else. Base 10 was adopted and promoted for general use by the assumption everyone counts on their fingers. Programmers are smarter, and know that left and right feet, knees, elbows, and arms can be added to fingers for a hexadecimal system. Generally speaking, the best base number for a sieve is derived from the set of primes {2, 3, 5, 7, 11, ....}. Use each element once in sequence to derive a product to use as the base number. Determine the remainder set, and repeat until a convenient number of elements is reached. Suggested base numbers are 2, 6, 30, 210, 2310, 30030, ... Large and unfamiliar base numbers are as fun to play with as a giant squid. The specific base number choice is not yet clear, but for efficiency, it has to have more than one or two factors, but not too many for practical reasons.
For this article, the base is 30. For all primes (except 2, 3, 5; the factors of the base), any Prime (mod 30) yields the remainder set: {1, 7, 11, 13, 17, 19, 23, 29}. This set fits nicely into a byte, having eight elements. Using mod 30 remainders yields a compression factor of thirty compared to the simple sieve, or 16 compared to a base 2 sieve.
The value of a bit(base 30) in the sieve field equals (30 x byte_offset)+remainder, where the remainder corresponds to the bit position.
{ 7, 6, 5, 4, 3, 2, 1, 0} bit_num
{ 29, 23, 19, 17, 13, 11, 7, 1} remainder
A sample four byte field looks like this:
{ |
119, |
113, |
109, |
107, |
103, |
101, |
97, |
91 |
} |
byte 3 |
{ |
89, |
83, |
79, |
77, |
73, |
71, |
67, |
61 |
} |
byte 2 |
{ |
59, |
53, |
49, |
47, |
43, |
41, |
37, |
31 |
} |
byte 1 |
{ |
29, |
23, |
19, |
17, |
13, |
11, |
7, |
1 |
} |
byte 0 |
Instead of calling this the Mod 30 prime remainder set sieve representation, we'll call it "r30", and reference other black key sieves r# as well. In the future, if the base number gets too large, it might be easier to refer to the last factor of the base, like rB37. The r30 procedures may be modified to suit other black key sieves. If your processor chokes on byte operations, you can use a 16 bit r60 sieve with a few adjustments.
Now field representation is fixed, the big hurdle is canceling non-prime values in the field. Cancellation of elements in a monotonic field occurs at every multiple of the prime. This is usually a simple running sum starting at the prime squared, plus the prime for each additional cancel. The problem with this multiple factor, remainder based sieve is that the simple running sum system no longer works! Take the first prime found, 7, as an example. The running sum method yields 49, 56, 63, 70, 77, 84, 91, 98, 105, 112, 119, 126, 133, 140 ... The numbers 56, 63, 70, 84, 98, 105, 112, 126, 140 are not in the field!
The valid values for the r30 string are: 49, 77, 91, 119, 133, 161, 203, 217... To be workable, the values to be canceled must be quickly computable! This is both a challenge (nuisance), and opportunity (to save computer time), as the valid set is much smaller than the simple white + black key set. Keep in mind the reward of using eight bits to represent thirty numbers. The other twenty two have been dropped from the field. No work will be wasted on the missing.
The valid "r#" cancellation series is generated by multiplying the prime by every member of the sieve field starting with the prime, ending when the cancel offset address overflows the field boundary. Programmed literally, the cancellation multiplications take too much time. A much better way is to break down the valid series into multiple simplified sets.
Going back to the piano analogy, have a seat on the bench and read the music. Start with the example of #7. The cancels will start on the products of 7*{7,11,13,17,19,23,29,31} using the following offsets and masks:
product |
offset |
remainder |
mask |
49 |
1 |
19 |
0010 0000 |
77 |
2 |
17 |
0001 0000 |
91 |
3 |
1 |
0000 0001 |
119 |
3 |
29 |
1000 0000 |
133 |
4 |
13 |
0000 1000 |
161 |
5 |
11 |
0000 0100 |
203 |
6 |
23 |
0100 0000 |
217 |
7 |
7 |
0000 0010 |
Next, the series continues with the products of 7*{37,41,43,47,49,53,59,61} resulting in:
product |
offset |
remainder |
mask |
259 |
8 |
19 |
0010 0000 |
287 |
9 |
17 |
0001 0000 |
301 |
10 |
1 |
0000 0001 |
329 |
10 |
29 |
1000 0000 |
343 |
11 |
13 |
0000 1000 |
371 |
12 |
11 |
0000 0100 |
413 |
13 |
23 |
0100 0000 |
427 |
14 |
7 |
0000 0010 |
We could continue with the next eight products, do calculations for the offsets, remainders, and convert the remainders to masks, but instead notice a repeating pattern: For a given mask, the delta offset increases by the prime. Instead of thinking of the cancel process as playing one note at a time, think of an eight note chord. Once sounded, this chord is moved up the keyboard a certain interval and struck again, then repeated until over the end of the sieve. Each eight note "chord" has all eight mod 30 remainders represented. The offset byte interval between each chord is the prime.
Converting this analogy to a program requires breaking the chord into eight notes (remainders) and running separate scales for each note. As enjoyable as parallel solutions are, my tin-eared computer runs fastest one note at a time! Here is the breakdown:
Search for a prime bit in the field. When a non-cancelled bit is found, it represents a Prime number:
Prime=30 x pr_off + remainder[bit_num]
where pr_off is the byte offset address of the prime in the field, and the bit_num is the bit position of the prime bit used to index the remainder set.
Series(n)=Prime(30(pr_off+i)+remainder[bit_num+n])
Series_off(n)=Series(n)/30, using integer values
Series_remainder(n)=Series(n)%30
n is the series number 0 to 7, i=0,1,2,... , and % is the mod operator
Each series number has a different, but constant remainder. Convert the remainder to a bit mask and use it to cancel by ORing it into the sieve field. One mask is used per series number. The beauty of the method is that the cancellation offset address is a running sum! The delta offset is the prime.
Selection of Base factors and field size is easy, here's what to expect:
Base Number B = (p0*p1*p2*p3*....)
remainder set size r_set= ([p0-1]*[p1-1]*[p2-1]*...)
If multiple factors of a prime are used for the Base number, then the remainder set size will increase directly by the excess factors. Say p0 is used N times: r_set = ([p0-1]*[(N-1)*p0]*[p1-1]*[p2-1]*...) For efficiency then, use each prime as a base factor in ascending order. Here are a few suggested sizes, and bytes to represent numbers to 9,699,690:
Base Factors |
Base Number |
elements |
bytes in field |
{2} |
2 |
1 |
606,231 |
{2,3} |
6 |
2 |
404,154 |
{2,5} |
10 |
4 |
484,985 |
{2,3,5} |
30 |
8 |
323,323 |
{2,3,5,7} |
210 |
48 |
277,134 |
{2,3,5,7,11} |
2,310 |
480 |
251,940 |
{2,3,5,7,11,13} |
30,030 |
5,760 |
232,560 |
{2,3,5,7,11,13,17} |
510,510 |
92,160 |
218,880 |
{2,3,5,7,11,13,17,19} |
9,699,690 |
1,658,880 |
207,360 |
The "bytes in field" for base 2, 6, and 10 have been adjusted to pack the byte fully, as described earlier, otherwise the "bytes in field" number would be much higher.
Listing One has C routines for the r30 sieve. Listing Two contains assembly replacements for the critical routines. Microsoft C 6.0 and Listing One yields a decent sieve program. If you have a 386, assemble Listing Two with MASM 5.1, remove redundant C routines from Listing One, compile and link. Hint: the C+ASM version runs twice as fast!
Note that using another factor of two to obtain a sixteen element mod 60 word does not speed up computation. Twice as many offsets and masks for the cancellation process have to be generated, increasing calculation overhead. Each cancellation series, the most efficient part of the sieve operation, is then half as long.
The operation of the r30 program needs some explanation. Three sieve fields are declared, the ref_field, xfer_field, and field. Each field has (17017+3) bytes, which equals (7x11x13x17)+3. The appended three bytes are unused, but make it possible to use 32 (or 16) bit transfers instead of byte moves. Any field(n) is a sieve representing numbers from (n x17017x30) through ((n+1)x17017x30).
The ref_field represents numbers to 510,510 and is used as the RAM reference source for prime numbers.
The xfer_field is cancelled from numbers 7 through 17. When transferred (copied) to the field, it both clears and cancels the field through number 17.
The field is the working sieve segment.
The first portion of main() does some housekeeping, sets up look-up tables, and records the start time for an elapsed time measurement. Work is begun on the ref_field. After new_prime=17 has been cancelled, the ref_field is copied to the xfer_field.
The xfer_field now has all multiples of {7,11,13,17} cancelled, except for the first byte, so 0x1E is ORed to the first byte of the xfer_field to cancel them. It's interesting to note that the xfer_field is a palindrome! If swapped end for end, bitwise it would read the same. In theory, only half of it would have to be saved, the other half generated by running the saved half backwards!
Work on the ref_field is continued until the new_prime exceeds the square root of the field. The ref_field is copied to the file blackkey.000, representing field(0), where 0 references the first set of 17017 bytes, or numbers to 510,510. To keep a count of the number of primes found, a look-up table translates a byte at a time of a field to a count of zero bits, which is added to the running p_count.
When the ref and xfer fields are completed, the program is ready to append additional fields to the sieve. Copying the xfer_field to the field prepares it by simultaneously clearing and canceling through new_prime = 17. Starting with 19, find primes in the ref_field. More housekeeping is done so field(1) offset calculations represent 510,511 through 1,021,019. An alignment is made so each series offset calculations matches the field values. When finished with field(1), it is appended to the file, and the process is repeated. When the disk is full, or the requested maximum number of primes has been reached, the finish time is taken, and the elapsed time is reported.
The file produced may be further compressed to free up space or to use the extra space for more primes. As usual, there is a cost in computer time and convenience using the primes found. Suppose that the base number was much larger, like 510,510 or 9,699,690 instead of 30. The calculation time and complexity makes construction of an r510510 sieve less practical than an r30 sieve. It is actually fairly easy to convert completed r30 sieve segments (or even simple sieves) into much larger base black key sieves for storage. When retrieved, the sieves may be reconverted back to a smaller base sieve segments for use.
One way to compress the files after cancellation of the fields is to use the xfer_field as a shift reference. The xfer_field is a 17017 byte r30 field where every multiple of 7, 11, 13, and 17 has been cancelled. It is clear that above 17, any bit cancelled will be non-prime in any multiple of the xfer_field. Shrink and convert the r30 field into r510510 fields by dropping these bits. The savings in storage space is 5497 bytes per 17017 byte field. There are many ways to do this job, but the suggested one is to simulate three shift registers. The first is loaded with the 17017 bytes of the xfer_field. The second is loaded with a data field of 17017 bytes. The last is the output_data, an empty shift register of 11520 bytes, its input connected to the output of the field shift register.
The operation is to shift the xfer_field one bit, and examine the output bit. If the bit is marked as non-prime, the field is shifted out one bit, discarding the bit. If the xfer_field shift register is not marked, the field is shifted one bit, and that bit is shifted into the output_data register. When all 17017 bytes have been shifted out, the output_data register will be holding 11520 bytes. This string represents a "black key" sieve word, where the factors of 2,3,5,7,11,13,17 have been removed. The base number and xfer_field size could be expanded to larger sizes by adding other prime factors. The problem is sheer size of the xfer_field. Say the size goes up by a factor of 19, to 323,323 bytes. If the computer or compiler boggles above 64K, or can only provide you with less than 640K of RAM at a time, the headaches exceed the gains. A mathematician might point out that since the xfer_field is a palindrome, only half of it is required. My opinion is that more ram or hard disk space is cheaper than more headaches.
Improvements are easily made to this program. A programmer will add user-friendly options, especially to control storage. The speed demon will expand the fields to gigantic sizes for efficiency and use Assembly to blast through cancellations. Look into offsetting the xfer_field if using field sizes not on 17017 boundaries. A multiplication table of the (remainder_set x remainder_set) with returns of offsets and mask is useful. A few minor alterations in the program, and it may be used for "bands" of primes of very high values. If you play with the code, watch out for overflowing C data types, or rounding into trouble with coprocessors. The best bet is assembly, using quad words or larger.
Before getting too deep in the code, decide how many primes are needed and when, and how much storage capacity is available. The author has been working on factoring methods for large numbers, and needs lots of primes. The original statement "Typically, about 50 Meg is free for use on the disk, but who wants to leave it filled up with a big old prime file? So grind some beans and load the coffee maker while r30.exe runs. When the coffee is ready, the disk is full, primed for use. When space is needed for other use, just erase the file!" is no longer valid. Today's computers run too fast, even running to a billion won't give you a coffee break. In fact, with the mass storage so cheap, it doesn't pay to even erase a mere 3.3 or even 33.3 Megabyte file.
To sum up, this method runs much faster, perhaps close to a factor of four, and stores up to thirty times more prime numbers than simple sieves. The stored sieve may be expanded at a later date, or additional sieves representing ranges of numbers may be generated. This program is sensitive to tinkering, and with a few improvements it can cover much more ground at a faster rate. Eratosthenes would be proud!