Wikipedia:Reference desk/Archives/Mathematics/2020 July 8

= July 8 =

The Six-Series Prime Number Sieve
This is an idea for a prime number sieve which I came across this week. I'm not sure about its authenticity, but here's how it goes:


 * 1) Write 2, 3 on top.
 * 2) Write 5 ( = 2 + 3) and 7 ( = 2^2 + 3) below 2 and 3 respectively.
 * 3) With 5 as beginning term and common difference 6, write an AP of the form 5 + (k - 1)×6 in a column below 5.
 * 4) Do step 3 below 7 with 7 as the beginning term.
 * 5) Starting with 5 in the first column, remove all numbers of the form 5 + (k - 1)×5×6 [except 5 (k = 1)]
 * 6) Do the same with the other numbers coming under 5 (let the number be 'n'), by removing numbers of the form n + (k - 1)×6×n [except the numbers at k = 1]
 * 7) Execute steps 5 & 6 under the column of 7, starting with 7 as the first number used.

I tried this algorithm and found that only prime numbers appeared to be coming out of the sieve. What I want to know are the following: — Preceding unsigned comment of interest to Sam Ruben Abraham (talk • contribs) 05:31, 8 July 2020 (UTC)
 * Has this algorithm ever been discovered before ?
 * If not, how efficient is this algorithm ?
 * How efficient will this be once translated into a computer program ?
 * The idea of "pre-sieving" multiples of small primes is not new; see for example the section "The Art of Prime Sieving" here. (This kind of thing should be in the article imo but a more reliable source is needed.) In general, prime numbers have been so heavily researched over the centuries that it's very difficult for an amateur to discover anything new about them, not that it isn't fun to try. I'm not entirely clear on the details of your algorithm though. First, I'm assuming 'AP' stands for 'arithmetic progression' here; it's not a good idea to use abbreviations unless you can verify they're in common use. Even then, math has plenty of jargon already and there's no reason to make the situation worse if you can help it. Second, the wording of step 7 is a bit vague. The starting members of the 7 column would be 7, 13, 19, 25, 31, 37, 43, 49, 55, 61, 67, 73, 79, 85, 91, 97, 103, 109, 115, 121, 127, ... . The first number is 7 so on the first pass you'd be eliminating 49, 91, ... . Then the second number is 13 so on the second pass you'd eliminate 91 (again), 169, ... . This leaves 25 and 55, which are composite. For the 7 column you should probably start by eliminating 25, 55, 85, 115, ... as step 1, then 49, 91, ... as step 2, then 55, 121, 187, ... as step 3, and so on. I'm not sure about efficiency in general, but since any composite in the 5 column is divisible by a prime in the 5 column, it seems like you might save some work by not testing them for divisibility by numbers in the 7 column. On the other hand this means you have to test more primes than you'd otherwise have to. For example, in order to eliminate 707 in the 5 column, you need to go all the way to the sequence 707, 1313, 1919, ... . But if you included a step to eliminate 35, 77, 119, ... as step 2 then 707 would be eliminated much more quickly. The first number eliminated on the pass corresponding to n should be about n^2.
— Preceding unsigned comment added by RDBury (talk • contribs) 07:49, 8 July 2020 (UTC)
 * Correction, the idea is covered in the article; see the last paragraph of the Overview section. As the article points out, the idea is a variation on wheel factorization. --RDBury (talk) 18:41, 8 July 2020 (UTC)
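The point about the 7 column can be checked with a short Python sketch. It is only one reading of steps 3 to 7 of the original post; the helper names (`column`, `sieve_within_column`, `sieve_cross_column`) and the limit of 130 are assumptions made for illustration and were not posted in the thread.

```python
def column(start, limit):
    """Arithmetic progression start, start + 6, start + 12, ... below limit."""
    return list(range(start, limit, 6))


def sieve_within_column(col, limit):
    """Steps 5-7 as written: every n in the column strikes out
    n + (k - 1)*6*n for k >= 2, that is 7n, 13n, 19n, ..."""
    struck = set()
    for n in col:
        struck.update(range(7 * n, limit, 6 * n))
    return [m for m in col if m not in struck]


def sieve_cross_column(col7, col5, limit):
    """Extra pass suggested in the reply: the product of two numbers from
    the 5 column (25, 55, 85, ..., 121, ...) lands in the 7 column."""
    struck = set()
    for n in col5:
        struck.update(range(n * n, limit, 6 * n))
    return [m for m in col7 if m not in struck]


LIMIT = 130  # arbitrary small bound for the demonstration
col5, col7 = column(5, LIMIT), column(7, LIMIT)
left5 = sieve_within_column(col5, LIMIT)
left7 = sieve_within_column(col7, LIMIT)
fixed7 = sieve_cross_column(left7, col5, LIMIT)

print(left5)   # 5 column after steps 5-6
print(left7)   # 7 column after step 7 alone: 25, 55, ... are still present
print(fixed7)  # 7 column after the extra cross-column pass
```

Under this reading the 5 column is cleaned by its own members, while the 7 column also needs the pass driven by the 5 column.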

Well then, is there any way I can do away with the non-primes ? Here's a link to my original version of the algorithm posted in another space: Math Stack Exchange - A New Prime Number Sieve. There, as you can see in one of the comments, some person told me that the last number in the preceding row reappears as the first number in the succeeding column, making it an unreliable sieve. Plus, the critic also says that the method given doesn't test for multiples of 2 and 3 in the set of numbers obtained, and that when switching over from the last column to the first while checking for primes row-wise, the algorithm actually subjects the last number checked to the test a second time. I agree with the second claim, but there are more things to verify. To tell you the truth, before posting the current version of the algorithm, I decided to go along with the three-columned one. There, I decided to use the perfect squares that appear in the middle column to eliminate composites, and that seemed good to me, because that seemed to eliminate almost all composites. Plus, according to me, that seemed to provide the advantage of only needing to know the divisibility tests for prime factors rather than for all possible numbers when searching for primes. And again, you'll be shrinking the number of numbers you'd have to remove and saving yourself some time if you know some primes (especially the ones between 1 and 100). Another thing is that in the first reference to my algorithm, I used the word 'remove' - by that word, I only meant to 'mark off' or 'demarcate' the numbers and not to remove them completely from memory. Now, I reviewed the algorithm and found out something - why not find each of the perfect squares in the column of 7, and take its square root as the number on the basis of which we'll be filtering the whole set ? For example, the first composite found in my sequence (25) is a perfect square.
Taking its square root gives you 5, and with that as $$n$$, we can apply the formula $$n(m + (k - 1)6)$$ (which is equal to 5×(5 + (k - 1) × 6) in this scenario), right ? Then, in the next columns (columns of 5 and 11 - I have been referring to the three-columned algorithm in this paragraph), we can find the first multiple of $$n$$ that appears and do the same process (of applying the sequence) the whole way down, right ? Thus, in the column of 5, you begin with 35 and keep on deleting numbers of the form 35 + (k - 1) × 30, in that of 7, you begin with 25 and go on deleting numbers of the form 25 + (k - 1) × 30, and in 11's column, you use 35 + (k - 1) × 30 (where 30 = √(25) × 6) - here, the square root of the perfect square in the column of 7 (=n) times 6 becomes the common difference for an arithmetic progression (AP) which can be used to filter non-primes. I haven't completely tested the credibility of this filter, but I believe it can work. --Sam Ruben Abraham (talk) 03:26, 9 July 2020 (UTC)


 * Please sign and date your posts using 4 tildes (~~~~). -- Jack of Oz [pleasantries] 11:36, 8 July 2020 (UTC)


 * Yes, please sign your posts. Also, it's considered a bit rude to cross-post the same question on more than one site, or on more than one forum within a site. It's equivalent to running up to a bunch of strangers to ask them a question, and then before they have a chance to answer, running up to another bunch of strangers to ask the same question. With regard to your observation about the squares, I believe it's true but I doubt it's actually useful. Let's say you want to find all the primes less than a million. The primary thing you want to rely on to save time is that any composite less than a million will be divisible by a prime less than a thousand. So by the time you reached the pass corresponding to n=997 you should be done. But if you try to use this observation on squares, you wouldn't eliminate the multiples of 101 in the 7 column until you reached n=10201, while the 101 would be waiting in the 5 column the whole time. Anyway, whether or not your idea can be turned into a working program, I don't really see the point except as a programming exercise. The person who wrote the page I linked to above did basically the same thing but pre-sieving multiples of 2, 3, and 5, resulting in 8 columns, and has already implemented the scheme in C. I'd recommend that, as an exercise, you write up both the original sieve and your version in Python, use both to generate the primes less than a million, compare the two outputs to make sure you get the same answer, and compare execution times to see which is more efficient. --RDBury (talk) 14:30, 8 July 2020 (UTC)


 * Okay, so my method is a variant of wheel factorisation, right ? As you said, some multiples may not be eliminated (I'm not sure about that - I am not perfect in programming and hence, I don't think I can make a pluperfect C, C++ or Python program that corresponds to my method). But still, is there a possibility that those numbers (the ones you say won't get marked off) can be eliminated alongside the multiples of other primes ? I am a 10th grader and I am currently having online classes from my school, so I can't go on with my research as fluently (in the sense 'continuous, unrestrained'). That's why I decided to seek your help. --Sam Ruben Abraham (talk) 03:47, 9 July 2020 (UTC)


 * Just for fun I did this myself to see how many lines of Python it would take. The first version:


 * is based on Samuel Horsley's 1772 exposition. (His presentation is surprisingly readable and entertaining once you get past the old-timey 'eſſes'.) The second version:


 * pre-sieves multiples of 3 as well as 2. To test I computed sum(sieve_Eratosthenes_1col(10000000)) and sum(sieve_Eratosthenes_2col(10000000)) both to see if I got the same answer (3203324994356) and to compare performance. The second version was noticeably faster. One could pre-sieve 5 as well to get an 8 column version, and then 7 to get a 48 column version, but I doubt the increased efficiency would offset the extra coding effort required. In any case, I gather that the Sieve of Atkin, based on quadratic forms, is yet faster. --RDBury (talk) 03:11, 10 July 2020 (UTC)
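The two listings referred to in this reply are not reproduced in this archive. As a stand-in, here is a minimal Python sketch of the two ideas described. The function names `sieve_Eratosthenes_1col` and `sieve_Eratosthenes_2col` are taken from the test expression quoted above; the function bodies are an assumption written for illustration and are not the code that was posted.

```python
def sieve_Eratosthenes_1col(limit):
    """Primes below limit; only odd numbers are stored (2 is pre-sieved)."""
    if limit <= 2:
        return []
    size = limit // 2              # index i stands for the odd number 2*i + 1
    is_prime = [True] * size
    is_prime[0] = False            # 1 is not prime
    i = 1
    while (2 * i + 1) ** 2 < limit:
        if is_prime[i]:
            p = 2 * i + 1
            # p*p, p*p + 2p, ... are the odd multiples of p; index step is p
            for j in range(p * p // 2, size, p):
                is_prime[j] = False
        i += 1
    return [2] + [2 * i + 1 for i in range(1, size) if is_prime[i]]


def sieve_Eratosthenes_2col(limit):
    """Primes below limit; only 6j - 1 and 6j + 1 are visited (2, 3 pre-sieved)."""
    primes = [p for p in (2, 3) if p < limit]
    composite = bytearray(max(limit, 0))   # simple flags, not memory-optimal
    n, step = 5, 2                         # visits 5, 7, 11, 13, 17, 19, ...
    while n < limit:
        if not composite[n]:
            primes.append(n)
            if n * n < limit:
                # next multiplier after n that is coprime to 6
                other = n * (n + (2 if n % 6 == 5 else 4))
                for start in (n * n, other):
                    for m in range(start, limit, 6 * n):
                        composite[m] = 1
        n += step
        step = 6 - step
    return primes


print(sum(sieve_Eratosthenes_1col(100)), sum(sieve_Eratosthenes_2col(100)))
```

The same agreement check described above can be run on this sketch by comparing the two sums for a larger bound such as 10000000.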

Well, yesterday I came across a good sieve, I think. The algorithm is as follows:
 * 1) Take two columns, one with 2 as the first entry (let it be $$A = \{2\}$$), the other with 3 (let it be $$B = \{3\}$$).
 * 2) Append 5 to A, 7 to B.
 * 3) In A, construct an arithmetic progression (A.P.) with first term $$T_1 = 5$$ and common difference d = 6.
 * 4) In B, construct an arithmetic progression (A.P.) with first term $$T_1 = 7$$ and common difference d = 6.
 * 5) Let a third set C = A ∪ B.
 * 6) Take a number $$n$$ (the first number in the set when the set is in ordered form) from C. Declare a counter $$k = 1$$.
 * 7) If $$(6k + 1)n$$, $$(4k + 1)n$$ or $$n^2$$ is present in C, remove it from C.
 * 8) Increment $$k$$ by 1.
 * 9) Go to step 7 till no such number remains in the set.
 * 10) Go to step 6, taking the next number in C as $$n$$, till all composites are filtered out.

Is this good enough to be used ? Has this one ever been discovered before ? --Sam Ruben Abraham (talk) 05:10, 10 July 2020 (UTC)
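One way to probe the question is to run the ten steps on a small range. The sketch below is one reading of them, with several assumptions: the names `six_series_candidates`, `sieve_step7` and `is_prime` are invented here, "till no such number remains" is read as "until the product reaches the limit", and numbers already removed are not reused as $$n$$.

```python
import math


def is_prime(m):
    """Plain trial division, used only to check the output."""
    return m > 1 and all(m % d for d in range(2, math.isqrt(m) + 1))


def six_series_candidates(limit):
    """Steps 1-5: the set C = {2, 3} plus 5, 11, 17, ... and 7, 13, 19, ..."""
    c = {2, 3, *range(5, limit, 6), *range(7, limit, 6)}
    return sorted(x for x in c if x < limit)


def sieve_step7(limit):
    """Steps 6-10: for each n still in C, remove n^2, (6k+1)n and (4k+1)n."""
    ordered = six_series_candidates(limit)
    present = set(ordered)
    for n in ordered:
        if n not in present:       # assumption: removed numbers are skipped
            continue
        present.discard(n * n)
        k = 1
        while (4 * k + 1) * n < limit:
            present.discard((4 * k + 1) * n)
            present.discard((6 * k + 1) * n)
            k += 1
    return sorted(present)


survivors = sieve_step7(300)
print([m for m in survivors if not is_prime(m)])
```

With this reading, multipliers of the form 4k - 1 and 6k - 1 are never tried, so a product such as 253 = 11 × 23, where both factors are of that form, is not removed.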

It seems to me that I have reached where I wanted, but I am still not in the clear. Here's the revised version of Step 7 of the previously created algorithm: 7. If any of $$ n^2, n(6k + 1), n(6k - 1), n(4k + 1), n(4k - 1), n^2(6k +1)(4k + 1), n^2(6k - 1)(4k - 1)$$ is present in C, remove it from C.

I tried to implement it in Python and it seemed to work well... I also made it a shell command so I can use it in Command Prompt whenever required. This time, it seemed to work well (still, I'll have to verify it many times so as to ensure that there is no bug - till now I was able to find primes less than 3005 using my algorithm) and I felt good. Here's the code for the desktop tool:

I hope this one has never been found till today.

Cheers

Sam --Sam Ruben Abraham (talk) 07:02, 10 July 2020 (UTC)
 * Caution: Python sets are unordered collections. You said above you need to take C's members in order, but taking the set union of A and B is not guaranteed to preserve their orders. You want to use a list merge routine rather than having to re-sort afterwards. Also, using cosine is purely unnecessary and will just slow down your program a lot. You can use (-1)**(k + 1) to denote 1 if k is odd, -1 if k is even. You should use an anonymous function (a lambda form) or just define a separate function; using eval for this is not Pythonic. Your algorithm overall most strongly resembles the Sieve of Atkin but it's not straightforward to prove its correctness.--Jasper Deng (talk) 07:24, 10 July 2020 (UTC)
 * Quicker than exponentiation: -1+2*(k&1) —Tamfang (talk) 02:37, 12 July 2020 (UTC)
 * Maybe even quicker, the one-liner lambda form lambda k: 1 if k & 1 else -1.--Jasper Deng (talk) 06:00, 12 July 2020 (UTC)
 * I learn something every day! —Tamfang (talk) 19:54, 12 July 2020 (UTC)
 * Or, given that that sign bit simply alternates, put at the top of the k-loop signbit = -signbit. Also, each eval is done two to four times for each k; whether or not you replace them with functions or whatever, evaluate them once before the ifs. —Tamfang (talk) 20:03, 12 July 2020 (UTC)
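For reference, the four ways of getting the alternating sign mentioned in this sub-thread can be put side by side. This is only a sketch; the function names are made up here, and the loop bound of 8 is arbitrary.

```python
def sign_pow(k):
    """(-1)**(k + 1): 1 for odd k, -1 for even k."""
    return (-1) ** (k + 1)


def sign_bit(k):
    """-1 + 2*(k & 1): the lowest bit of k is 1 exactly when k is odd."""
    return -1 + 2 * (k & 1)


# the one-liner lambda form
sign_lambda = lambda k: 1 if k & 1 else -1


def signs_by_flipping(count):
    """Flip a running sign at the top of the k-loop instead of recomputing it."""
    signbit = -1
    out = []
    for k in range(1, count + 1):
        signbit = -signbit
        out.append(signbit)
    return out


ks = range(1, 9)
print([sign_pow(k) for k in ks])
print([sign_bit(k) for k in ks])
print([sign_lambda(k) for k in ks])
print(signs_by_flipping(8))
```

All four produce the same sequence starting with 1 at k = 1; which is fastest in practice would have to be measured.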

Ok, as per your suggestions, I'll try to shrink the code. But you said that it almost resembles the Sieve of Atkin - but how ? In the Sieve of Atkin the primes and non-primes are marked and separated off in the end, right ? Plus, they use human-unfriendly (in the sense 'complex') calculations, while mine is simple and usable for humans, right ? My sieve uses a filtration technique derived from a small observation I had - some numbers, that appear as corners of some sort of parallelogram placed in between the columns, are related in the following fashion : the ones in a column are present in the form $$n*(6*n + 1)$$ and those in a row are present as $$n*(4*n + 1)$$ (can be -1 in the place of + 1 in both polynomials). Also, it seems to me that you haven't noticed that I did convert the resultant set to an ordered list for convenience (see the source code above, in the previous discussion). Plus, what did you mean by using some anonymous function instead of using the eval function with string arguments ? Any redundancies ? I believe that my method saves time (if formatted a bit) because it needn't mark primes and non-primes out separately, and also since it creates a list consisting mostly of primes, so that the non-primes have to be filtered out using some linear polynomials (instead of the quadratic polynomials used in the sieve of Atkin). I believe it is human-friendly and students may find it easy to use.--Sam Ruben Abraham (talk) 08:29, 10 July 2020 (UTC)
 * Well, technically speaking, your polynomials are not linear either (they contain terms of the form xy, and in fact the degree of the polynomial really isn't that important). The Sieve of Atkin is pretty usable as well. And I did notice that you did convert it to an ordered set, but it's quite unnecessary to go from ordered to unordered to ordered. Besides, an algorithm of this sort is only really useful for numbers that are checkable by computers. All your technique does is filter out more composite numbers earlier as any expression that is an integer (greater than one) times another will always be composite. In any case, optimizing this kind of sieve is usually an exercise in computer science, often done with parallel computing, so I don't view your algorithm as necessarily any faster than the sieve of Atkin. It's definitely an interesting one, but I don't view it as a revolutionary discovery either. --Jasper Deng (talk) 09:25, 10 July 2020 (UTC)
 * As for lambda functions, it's more that you could just make a separate function for each expression, or do it as a one-liner.

will output 15, for example. Much cleaner than using eval.--Jasper Deng (talk) 09:40, 10 July 2020 (UTC)
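To make the contrast with eval concrete, here is a small sketch. The expression n(4k + 1) is taken from step 7 above, while the name `row_term` and the argument values are assumptions picked so that the result is 15; this is not the snippet from the original reply.

```python
# Hypothetical example: the formula comes from step 7, the name and the
# arguments are illustrative only.
row_term = lambda n, k: n * (4 * k + 1)
print(row_term(3, 1))   # 3 * (4*1 + 1) = 15

# The eval-based style being discouraged: the same formula kept as a string
# and re-parsed on every call.
n, k = 3, 1
print(eval("n * (4 * k + 1)"))
```

Both lines print the same value; the lambda is parsed once and can be passed around like any other function.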

Sorry if I said that the calculations were linear - I took 'n' as constant in value. If so, it could be linear, right ? For a constant 'n', the polynomials that I used (except one of them) are linear, right ?

I feel that the Python implementation is slower.... what about a C or C++ version of that ? Would you please mind trying it ?

Even though it is slow (as you say), won't it work ? I tried and (till now) found primes less than 3005 (I didn't go for bigger numbers feeling it will take longer for the output to come)- doesn't that mean it will work well and as required ? --Sam Ruben Abraham (talk) 10:45, 10 July 2020 (UTC)

I did the edits as per your suggestion and the program seemed to work faster.--Sam Ruben Abraham (talk) 11:33, 10 July 2020 (UTC)

Today, I decided to make a tool with fewer condition checks and a bit more efficiency, and it seems to work faster than my previous algorithm. It uses the following polynomial to check for composites: $$n^2 + 2xn$$, where $$n$$ is a prime and $$x$$ is a counter.

I feel pretty dubious about this system's credibility, but still it seemed to produce only primes. For test purposes, I am including the source code for the command-line tool:

Highlights of the Above Algorithm
This time, my conjecture arose from the thought that only odd composites occur in the sample I use for sieving in my algorithm. The above quadratic polynomial is helpful in eliminating the composites that occur in the sample and thus makes the algorithm a bit faster. Adding a trigger to terminate the use of one prime by detecting whether the expression exceeds the maximum value in the sample may help in hastening the process. As you say, it may be similar to the Sieve of Atkin (just because a list is initialized for the succeeding calculations), but I believe it has some bit of observation in it (the result of which is seen in the properties I conjectured in this as well as the previous talks), which will be useful for those interested in prime numbers. Purging many composites as early as possible will help in reducing the number of composites to be cut and therefore will help in finding the kind of relations I conjectured in the previous talks, as a result of which, I believe, we'll be able to calculate primes out of the sample faster than how we do it with the sieve of Eratosthenes (just because of the parallelogram relation I conjectured about in a previous talk), when it comes to bigger ranges. From the above arguments, is it possible to say that my method can work just fine for students like me to calculate primes quickly ? This may not be a great or revolutionary theory, but is it worthy of being made a research paper and another contribution to that family of prime sieves ?--Sam Ruben Abraham (talk) 07:14, 11 July 2020 (UTC)
 * For future comments, please don't use the "new section" button. See WP:INDENT for how to format your reply. I encourage everyone to develop their own research, but I believe your algorithm is of no special significance. After all, integer factorization is suspected to be NP-intermediate, and this method would not be useful for factoring large integers or finding new primes. Shor's algorithm is the state of the art here, being a polynomial-time algorithm, though one needs a quantum computer to make it useful; without a quantum computer, the best we have is the general number field sieve which is not a sieve in the same sense as your algorithm. Primality testing is an easier problem, being in P (the fastest algorithms in practice actually do not have polynomial time bounds, but all run considerably faster than a sieve). The main issue with your algorithm is that it does a ton of duplicate work, especially the inner loop over j: you're going to be wasting lots of time trying to find composite numbers in c that were already taken out. Also, it is always dangerous to modify what you are iterating over in a loop and removing an element from a list is not cheap in Python, especially when not near the end.
 * That said, I have myself undertaken an implementation of the sieve of Atkin in C++ before. Indeed, you would want to use a low-level language like C in order to take advantage of systems programming, such as cache locality and parallel computing. Memory efficiency is very important with the size of your lists. With the use of C, you don't have convenient primitives for data structures like the one you used here; parallelization also demands that one devise a way to either ensure the sets of composite numbers removed by each thread are all disjoint, or otherwise split up the list in a way that allows each thread to work independently. It's a complex topic that every systems programmer should be familiar with.--Jasper Deng (talk) 08:06, 11 July 2020 (UTC)
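The two Python-specific problems raised above (searching for values that were already taken out, and removing elements from a list while iterating over it) can be avoided by marking flags instead. The following is a minimal sketch of the $$n^2 + 2xn$$ idea done that way; the function name `sieve_marking` is an assumption, and this is not the poster's command-line tool.

```python
def sieve_marking(limit):
    """Primes below limit from the candidates 2, 3, 6j - 1, 6j + 1,
    striking n*(n + 2x) for x = 0, 1, 2, ... by setting a flag."""
    candidates = [c for c in (2, 3) if c < limit]
    candidates += sorted([*range(5, limit, 6), *range(7, limit, 6)])
    composite = bytearray(max(limit, 0))    # one flag per integer below limit
    for n in candidates:
        if n * n >= limit:
            break                           # no multiple n*(n + 2x) is in range
        if n < 5 or composite[n]:
            continue                        # 2, 3 and known composites add nothing
        x = 0
        while n * (n + 2 * x) < limit:      # n^2 + 2xn written in factored form
            composite[n * (n + 2 * x)] = 1  # marking is O(1); nothing is deleted
            x += 1
    return [c for c in candidates if not composite[c]]


print(sieve_marking(100))
```

The candidate list is never modified during the loop, and each strike is a single array write rather than a search through a list.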


 * On the whole, what you mean to say is that what I'm up to is not really worth my hard work, but is indeed interesting when considered a programming exercise ? What about its usefulness when it comes to students ? Students will have to check primality of the numbers taken into consideration when it comes to Eratosthenes' sieve when larger numbers come (unless they reinforce the primality tests into their memory, they won't be able to do away with the composites that appear when doing the Eratosthenes' sieve method), while mine can be utilized using the parallelogram relation (as I mentioned in one of my previous posts), can't it be ? Doesn't that sound sensible ? Why I'm asking is because my main target is students like me and many others here on Earth. Plus, I am not aiming to do prime factorisation, but to find prime numbers (I believe it is obvious to you). I tried making those computer algorithms just to see if it works (and it works ! ☺ ). I know you are better than me at such useful research, but mine was aimed at students and all those who love simplicity rather than complexity (not all people can understand complex stuff at this age). --Sam Ruben Abraham (talk) 08:41, 11 July 2020 (UTC)
 * I really don't follow your "parallelogram relation" because the way you explained it is rather hand-wavy. This algorithm is not really practical except on a computer, in which case trial division is in fact faster and simpler than your algorithm as a factorization and primality testing method for a given n, requiring (for a given n) only $$O(\sqrt{n})$$ (Big-O notation) steps in comparison to the $$O(n^2)$$ steps that your algorithm uses. The Baillie–PSW primality test is also far more efficient than either method for 64-bit integers, which encompasses every "everyday" number (though the RSA cryptosystem uses primes that are far larger, and the largest known prime number was tested using the Lucas–Lehmer primality test. In neither case would your algorithm or trial division hope to be able to test their primality within the lifetime of our universe).--Jasper Deng (talk) 08:51, 11 July 2020 (UTC)
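For comparison, trial division as mentioned above fits in a few lines of Python. This is a minimal sketch; the name `is_prime_trial` is made up here, and skipping multiples of 2 and 3 is an optional shortcut rather than part of the definition.

```python
import math


def is_prime_trial(n):
    """Test a single n by dividing by candidates up to sqrt(n)."""
    if n < 2:
        return False
    if n < 4:
        return True                 # 2 and 3
    if n % 2 == 0 or n % 3 == 0:
        return False
    # remaining possible divisors have the form 6j - 1 or 6j + 1
    for d in range(5, math.isqrt(n) + 1, 6):
        if n % d == 0 or n % (d + 2) == 0:
            return False
    return True


print([m for m in range(30) if is_prime_trial(m)])
print(is_prime_trial(707), is_prime_trial(997))
```

The loop runs at most about sqrt(n)/6 times, which is where the $$O(\sqrt{n})$$ bound for a single number comes from.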


 * You haven't told me whether it can be used by school students in the place of Eratosthenes' sieve. Plus, what is so hand-wavy about that ? The parallelogram relation I mentioned will help anyone to determine primes from the two-columned form ('cola' and 'colb' as separate columns, with lines connecting numbers that can be considered the corners of an imaginary parallelogram which is placed between the columns). Such geometric forms are sufficient to help anyone get rid of composites in no time.--Sam Ruben Abraham (talk) 09:20, 11 July 2020 (UTC)


 * Plus, if you are in Math Stack Exchange, would you mind upvoting my question ? A guy there downvoted my question (where I began from ) and I have lost some privileges. --Sam Ruben Abraham (talk) 09:24, 11 July 2020 (UTC)
 * If you meant, I can see why you were downvoted. They were not as nice as me, for a good reason: your method is not a particularly efficient one. I don't view it as a significant improvement over the sieve of Eratosthenes (both require about the same asymptotic runtime complexity and yours is far less straightforward to implement). Mathematical rigor takes time to train, but others are left confused about your thoughts when you don't talk with rigorous wording, and I think that was the primary reason you were downvoted. Don't feel discouraged, but also, I recommend you learn more about the "language" of mathematicians and computer scientists before trying to propose a new algorithm. Also, to wit, the algorithm you gave in the Stackexchange post is much worse than anything presented here, for it defeats the purpose of a sieve by doing another primality test on numbers to sieve out.
 * As you can see by the lack of replies by other mathematicians here, mathematicians will tend not to be interested in something like this. Indeed, a far more worthwhile challenge for you would be to write a scalable, parallel computer program implementing it. I say this as a holder of a degree in both applied mathematics and computer science, which this falls squarely under. To that end, you should learn some number theory and (thus) abstract algebra to help yourself here. For example, the AKS primality test, which is asymptotically the fastest primality testing algorithm, requires knowledge of the Euler totient function, which concerns the size of the group of units of modulo-n arithmetic.--Jasper Deng (talk) 09:58, 11 July 2020 (UTC)


 * Another programming tip. You wrote:
 * n**2 + 2*x*n # the quadratic sieve function (my new function !), which is equivalent to n * (n + 2 *x )
 * The latter expression is slightly quicker, having one multiplication fewer; see Horner's method. —Tamfang (talk) 20:49, 12 July 2020 (UTC)

Concluded; But we still have comments to share

 * Thanks for letting me know that it's not the right time for me to create some sort of prime sieve, and also for letting me know what was the reason behind the downvote . Maybe someday I may get you as a mentor and then, I'll be able to achieve even more than what I've got now. You were right - you were really nice to me, treated my curiosity the way I wanted, but still I haven't reached the answer to how effective will it be for students to use my algorithm using paper and pen. But never mind - they'll surely understand it one day (as I have now). It was indeed nice talking to you. By the way, is it required that I keep experimenting at the moment ? You see, I am a 10th grader in India and the 10th grade is one among those deadline years for us students. Plus, I am attending an entrance coaching class as well as a bit of self-study in Python (I am still at the basics; I have reached creation of classes and objects, but not that deep).