Template talk:Number of archives

Recursion

 * This discussion was moved here from the Village pump (technical), since this is the discussion that prompted me to make this template. --David Göthberg (talk) 09:53, 13 April 2009 (UTC)

The dev's seem to have broken recursion so now NextArchive doesn't work (a template I hacked together in my spare time; not widely used). See the template's wikitext for details on how it worked. Why doesn't that work anymore? I was thinking of extending it to FAC, but now I can't. I know it's a horrible hack, but if you don't like how my code looks, don't make me program in the equivalent of a COBOL/Pascal hybrid; give me sanity, or at least turing completeness. This type of check should be trivial, but now I can't see how to do it at all without writing a lot of boring code (like FAC is right now) and tying yourself to an arbitrary upper bound. -- Thin boy  00  @933, i.e. 21:23, 6 April 2009 (UTC)


 * This was deliberate, because these sorts of infinitely recursing templates were taking up ~40% of the CPU time of the entire Wikimedia cluster and also causing fatal OOM errors. If you want to discuss it, contact Domas Mituzas or Tim Starling on IRC.
 * Wikitext is not supposed to be a programming language, it's supposed to be a markup language. &mdash; Werdna  &bull;  talk  04:04, 8 April 2009 (UTC)


 * Too late. I think it's Turing-complete these days. --Carnildo (talk) 08:31, 8 April 2009 (UTC)


 * Thinboy00: I have spent some day thinking about your problem. I think I now have a much more efficient solution for your NextArchive template and some other archive templates. But it is too complex to explain here. I will contact you when I have coded up those solutions. It might take some time, I am pretty busy, although this is the kind of programming I love to do. :))
 * --David Göthberg (talk) 14:41, 11 April 2009 (UTC)


 * Sounds like Fermat's last solution. Gimmetrow 15:50, 12 April 2009 (UTC)


 * Gimmetrow: Okay, you provoked me to disclose what I am going to do. But don't blame me if you get a headache from reading this:
 * As far as I know: The only way to find out how many archive pages a page has is to use the #ifexist parser function, since we must check if the archive pages exist or not. But #ifexist is a rather expensive operation since the server that parses the template has to call one of the database servers to check if the page exist or not.
 * But we don't need to check all the archive pages, we just need to find the last one. So I am going to use a binary search to find the last page. (But with some modifications so it works better in template code, and so we find low values faster.) Thus I can find the last archive page in the range 1-1000 with about 11 #ifexist calls. (WP:ANI already has 529 archive pages...) That is much more efficient than the approaches used now. If we need to check higher numbers I can do that too. I have already coded a binary search for another template that searches the range 0-500, so I know how to do it.
 * --David Göthberg (talk) 18:11, 12 April 2009 (UTC)


 * I thought of that, too. It just didn't seem necessary for my application, where the vast majority of searches involve less than 10 items. I figured, if it became necessary to handle more than 10, an increasing binary search tree could be used so that low numbers were found without too much penalty. Gimmetrow 20:41, 12 April 2009 (UTC)


 * Ah, "increasing binary search tree", good name for it. That is what I use in str len. Check out the documentation in str len/core if you are in the mood.
 * --David Göthberg (talk) 21:07, 12 April 2009 (UTC)


 * Just curious, but did you do any analysis to come up with the division used at str len/core? On finding archive pages, I thought perhaps this approach: test N, then 2N, then 4N, then 8N, then 16N, etc. to find bounds, then binary search within those bounds. N would depend on the expected distributions. If you did that with string lengths, you might use N=32. Gimmetrow 23:28, 12 April 2009 (UTC)

End of sections moved here from the Village pump.


 * Yes, my very rough analysis was the following: str len is mostly used in italictitle, so the primary use so far is to check the length of page titles. And they are fairly short. The second use is the one that got me involved in string length: To check the length of the text input to boxes like ambox so we can automatically do tricks like determine if the box should be small or not, or when to insert line breaks. Such text input is usually 1-2 sentences and some wikimarkup. (But for such checking I strongly recommend using str ≥ len since it is much more efficient and for most cases also easier to use.) So my estimate is that shorter strings are much more common. I can't really come up with any good usage cases where people would need to check the length of longer strings. If we in the future discover that people do that a lot then we can change the optimisation.
 * And yes, I kind of do such bounds checking before I do the detailed search. But I do it a bit differently than you suggest. In template coding it is much easier to use the bounds 10, 100 and 1000 than using 4, 8, 16 and so on. Since we can simply concatenate output from searches that find values X__, _Y_ and __Z, instead of using a lot of #expr to do "binary" maths. Then I optimised the search within each of the bounds 10, 100 and 1000. Since I am not doing a strict binary search within each range, but instead have a bias towards finding lower values then it doesn't matter that the search in each range has 10 values. It think it would be harder to make such a bias if each range searched say 8 or 16 values.
 * But I do the initial checking in the reverse order compared to what you suggest. I first check if X >= 500, if not then I check X >= 100, if not then I check X >= 10, if not then I only do the 0-9 search. (That I do so is probably not obvious since I do it inside the searches of each range.) So it really just takes three checks to find out if X < 10. First checking x < 10 and then x < 100 would cause more complex code, so I didn't bother doing it. After all, the comparison function I use in str len isn't that expensive.
 * When checking number of archives the comparison function is much more expensive, since it is the #ifexist. So I should probably put even more effort into optimising this one. I will see if I can do the stepping from small to big as you suggest, but in steps 10, 100, 1000 of course. I already have a working "number of archives" template in my user space that can search up to 999 archives. It correctly returns that WP:ANI has 529 archives, and it does it with only 13 #ifexist comparisons. :)) But it currently is a bit expensive for low values, since it takes 5 #ifexist comparisons to find the value when there are only 4 archive pages. But it finds 9 archives with only 6 #ifexist comparisons, so it starts paying of pretty quickly. With your stepping I would save one comparison, making it find 4 archives in 4 comparisons, and 9 archives in 5 comparisons. But it will cost one extra comparisons for the range 10-99 and two extra comparisons for the range 100-999. And I'm not sure it is worth the code complexity, but I will study it.
 * --David Göthberg (talk) 09:53, 13 April 2009 (UTC)

David, I should have thought of doing an optimized search. I think 999 archives ought to be enough for anyone. Out of idle curiosity, is this template still Not Ready For Prime Time (the /doc page says it is.)? -- Thin boy  00  @930, i.e. 21:19, 13 April 2009 (UTC)


 * 1: Your message above came exactly in the right timing for me, since I had created str len some week ago. So I had just figured out how to do binary search and loops in template code. So making number of archives was partly just cutting and pasting code from str len.
 * 2: Well, Bill Gates once said "640 kB RAM ought to be enough for anyone"... Actually, 999 archives will perhaps only be enough for about two years. Since as I mention above WP:ANI already has 529 archives, and as you know things here at Wikipedia tend to accelerate. But no worries, I can easily update this template to handle 9999 archives. And that will just cost one extra #ifexist comparison for archives below 1000.
 * 3: Yes, as far as I know this template is now fully functioning. It has passed all my tests. But I know from experience that waiting some days before deploying is a good thing. Since we pretty often find some more bugs, or change our mind how the parameters should work or what the template should be named. And I haven't finished the documentation yet, there are some very important non-obvious usage instructions I must add. So don't go advertise this template yet, since without proper documentation it will cost us way too much support questions.
 * But you are very welcome to start testing and even building other things with this template. But don't deploy those things just yet.
 * 4: I am going to make some other templates to accompany this one: I will upgrade archive number so it can count to 999 and also count other archive page names, just like this template can. And I will upgrade archive-nav etc. to use these templates.
 * 5: And perhaps I will make archive name that can figure out what the archive name is, when on an archive page. That is, if on say "/Archive 1" it will return "Archive " (note the space), and when on "/somearch1" it will return "somearch". Thus with the help of we can automatically find the name to use in  and . But I am hesitating, since it might encourage people to use non-standard archive names. And  will only work when on an archive page, it can't be used from the base page where we need to link to the archive pages. And I checked, the majority of archive pages are already using the standard "/Archive 1" naming. And many of those that use non-standard naming don't use the same naming for all the archives in one series, so we can anyway not link to the previous and next archive. So perhaps we should instead "enforce" the standard naming?
 * --David Göthberg (talk) 23:07, 13 April 2009 (UTC)