Module talk:String2

New function "label" added
I added a new function, "label", which capitalizes only the first letter of fetched Wikidata labels (diff), because Wikidata English labels generally begin with a lowercase letter (d:Help:Label). The new function is almost the same as the "sentence" function, except that "label" doesn't lowercase the rest of the text. If there are any questions or problems, feel free to report them here. Thanks! --Was a bee (talk) 03:03, 17 September 2017 (UTC)
 * Some objection. A "label" has multiple options. Especially: it is "free" (as in: irrelevant, not defining). If this option wants to imply "page title as a wikilink label", then change the option (parameter) name. -DePiep (talk)
 * I've removed the label function as obsolete in favour of ucfirst. --RexxS (talk) 15:16, 13 November 2018 (UTC)

ucfirst bug
There is a script error on articles like 999 (album) and Casual Viewin' USA, with an error on line 35 which originates on line 34. "%a" on line 34 did not match numbers after the pipe in my test, although it does find a letter after the pipe. As both of these articles have a number after the pipe, they both give a script error. The script needs to deal with the possibility of a number after a pipe.--Snaevar (talk) 23:13, 9 October 2019 (UTC)
 * @Snaevar: I noticed many articles with that problem and I believe I have just fixed the module. Johnuniq (talk) 09:11, 10 October 2019 (UTC)

ucfirst bug, part 2
E.g.  returns ĐOrđe Balašević chronology, as if the function does not realize Đ is a letter and is capitalizing O instead. Note  returns Đorđe Balašević chronology, as it should. Oddly,  works correctly (i.e. Đorđe Balašević chronology is returned). Infobox album chronology is affected, although these kinds of errors seem to be very rare. GregorB (talk) 18:24, 24 July 2020 (UTC)
 * Almost all of the time, the standard Lua string library calls manage to cope although they only deal with single-byte character codes. Once in a while an application needs to work with UTF-8, and this is one of those cases. I've updated the ucfirst call to use the mw.ustring library which handles UTF-8 characters properly. Now we should get:
 * Thanks for spotting that, and please let me know if you find any more issues. Cheers --RexxS (talk) 19:24, 24 July 2020 (UTC)
 * That was super quick, thanks! GregorB (talk) 19:50, 24 July 2020 (UTC)
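
For later readers, here is a minimal sketch of the difference described above. It is written in Python rather than the module's Lua, and both function names are hypothetical. The first function scans UTF-8 bytes for an ASCII letter, which is only an approximation of how a single-byte pattern such as "%a" behaves. The second works on whole characters, roughly as a UTF-8-aware library would.

```python
def ucfirst_bytewise(text: str) -> str:
    """Upper-case the first ASCII letter found while scanning UTF-8 bytes.

    Multi-byte characters such as "Đ" contain no ASCII-letter bytes, so they
    are skipped and the next ASCII letter is capitalised instead.
    """
    data = bytearray(text.encode("utf-8"))
    for i, byte in enumerate(data):
        is_upper = 0x41 <= byte <= 0x5A
        is_lower = 0x61 <= byte <= 0x7A
        if is_upper or is_lower:
            if is_lower:
                data[i] = byte - 0x20  # ASCII lower -> upper
            break
    return data.decode("utf-8")


def ucfirst_unicode(text: str) -> str:
    """Upper-case the first alphabetic character, treating text as characters."""
    for i, ch in enumerate(text):
        if ch.isalpha():
            return text[:i] + ch.upper() + text[i + 1:]
    return text


print(ucfirst_bytewise("Đorđe Balašević chronology"))
print(ucfirst_unicode("Đorđe Balašević chronology"))
```

The byte-wise version reproduces the "ĐOrđe" symptom reported in this thread, while the character-aware version leaves the already capitalised "Đ" alone.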

New function "findlast" added
Function findlast finds the last item in a list. The first unnamed parameter is the list. The second, optional unnamed parameter is the list separator (default = comma space). It returns the whole list if the separator is not found.

The list is trimmed of leading and trailing whitespace; the separator is not (so that leading or trailing spaces can be included).

One issue is that using Lua special pattern characters as the separator will probably cause problems.

Examples:
 * Normal usage:  →
 * Separator not found:  →
 * One item list:  →
 * List missing →
 * Space as separator:  →

Any bug reports welcome. --RexxS (talk) 20:33, 19 November 2020 (UTC)
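
The behaviour described above can be sketched as follows. This is an illustration in Python, not the module's Lua code, and the name find_last is hypothetical. It uses a plain (non-pattern) search, so a separator such as "." needs no escaping. The handling of an empty separator is an assumption of this sketch.

```python
def find_last(items: str, sep: str = ", ") -> str:
    """Return the last item of a separated list, or the whole list if sep is absent."""
    items = items.strip()      # the list is trimmed; the separator is not
    if not sep:                # assumption: an empty separator means "no split"
        return items
    index = items.rfind(sep)   # plain-text search, no pattern characters involved
    if index == -1:
        return items           # separator not found: return the whole list
    return items[index + len(sep):]


print(find_last("apple, banana, cherry"))
print(find_last("apple; banana; cherry"))
print(find_last("apple banana cherry", " "))
```

A Lua version that passes the separator to a pattern-matching call would need to escape characters such as ".", "%" and "-" first, which is the issue flagged above.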

Upgrading posnq
Now supports named parameters: source, target, plain, nomatch; and UTF-8 characters: → Any bug reports welcome. --RexxS (talk) 00:08, 8 December 2020 (UTC)
 * Now deleted, per TfD outcome. Plastikspork ―Œ (talk) 16:18, 7 January 2022 (UTC)

Added matchAny function
I have added a matchAny function to the sandbox; it takes any number of patterns and returns the index of the first which matches, if any. Demo usage at Template:Infobox animanga/Header/sandbox. Comments welcome. (I'm a new template editor so I can make this change myself if no objections.) User:GKFXtalk 19:17, 8 April 2021 (UTC)
 * I'm a bit fuzzy at the moment and should not be relied on but the code looks good. I don't understand p._getParameters (please don't explain it) but my guess is that matchAny requires source=input. However, the example usage in the comment does not show that. Johnuniq (talk) 00:59, 9 April 2021 (UTC)
 * Good thing you pointed out p._getParameters, I was using it wrong. Fixed the docs also. User:GKFXtalk 09:56, 9 April 2021 (UTC)
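
A rough sketch of the idea, for anyone reading later. It is in Python rather than Lua, the name match_any is hypothetical, and it uses Python regular expressions, which differ from Lua patterns. Returning a 1-based index and None when nothing matches are assumptions of the sketch, not a statement of what the module does.

```python
import re
from typing import Optional


def match_any(source: str, *patterns: str) -> Optional[int]:
    """Return the 1-based index of the first pattern that matches source."""
    for index, pattern in enumerate(patterns, start=1):
        if re.search(pattern, source):
            return index
    return None  # assumption: no pattern matched


print(match_any("Manga series", r"^Anime", r"^Manga"))
```

A template can then switch on the returned index instead of chaining several separate match calls.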

Remove upper and lower functions?
I recently noticed a rather significant issue with uppercasing and lowercasing strings in Lua: it mangles strip markers. The built-in parser functions do not. See sample (now a mock-up) :

"UPPER.'"`UNIQ--REF-0000001E-QINU`"' invoke:string2"

- lower.'"`uniq--ref-0000001f-qinu`"' USING UC: using lc:

Is there any good reason to make this feature available to wikitext? It would be highly confusing for editors and template authors to see strip markers. Otherwise this module's upper and lower functions should be removed from anything using them and replaced with the uc:/lc: parser functions. User:GKFXtalk 18:47, 15 April 2021 (UTC)
 * ✅ These functions have been removed, and the sentence function has been made strip-marker safe. User:GKFXtalk 19:36, 24 April 2021 (UTC)
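
To illustrate the problem and one way around it, here is a minimal Python sketch (not the module's Lua). The marker pattern is an assumption based on the sample shown above, with optional DEL delimiter characters around it, and upper_safe is a hypothetical name. The sketch changes case only in the text between markers and copies the markers through untouched.

```python
import re

# Assumed strip-marker shape, modelled on the sample quoted in this thread.
MARKER = re.compile(r"""\x7f?'"`UNIQ--.*?-QINU`"'\x7f?""")


def upper_safe(text: str) -> str:
    """Upper-case text while leaving strip markers byte-for-byte intact."""
    parts = []
    position = 0
    for match in MARKER.finditer(text):
        parts.append(text[position:match.start()].upper())  # ordinary text
        parts.append(match.group(0))                         # marker, unchanged
        position = match.end()
    parts.append(text[position:].upper())
    return "".join(parts)


marker = "\x7f'\"`UNIQ--ref-0000001E-QINU`\"'\x7f"
sample = "upper." + marker + " tail"
print(sample.upper() == upper_safe(sample))  # False: naive upper() alters the marker
```

A naive call to upper() turns "ref" inside the marker into "REF", so the parser can no longer substitute the stored content back in.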

implementing Template:trunc in Lua
I implemented the behavior of Trunc in Lua, as function trunc in the sandbox. This code is simpler and faster than the template code. I'd like to promote it to the main module (as opposed to having a small side module just for trunc). Any objections? — hike395 (talk) 09:12, 9 June 2021 (UTC)


 * I’d like to delete Trunc entirely. We already have ample substring functions,  should be adequate (with or without ignore_errors as needed). User:GKFXtalk 09:15, 9 June 2021 (UTC)
 * Trunc is transcluded onto 4,400 pages and is used in a tangle of templates. Removing it would be painful. does not exactly implement the template (because it returns an empty string on error as opposed to the original untruncated string). If you'd like to bring it up at TfD and then be responsible for the cleanup of the mess, please go ahead. If you don't want to go down that path, I'd like to implement it in Lua, either here, or if necessary, in another Module (which I think would be less tidy). I think that would be a lot less work and still make the encyclopedia better. — hike395 (talk) 09:44, 9 June 2021 (UTC)
 * There is Module:Ustring which removes the unhelpful error handling. I don't think it would be that painful to remove. Having multiple substring functions with random names is a product of the pre-Lua era; I don't think it's something that should be perpetuated. User:GKFXtalk 17:14, 9 June 2021 (UTC)
 * I agree that it would be better to just call directly, if possible. If you have the spare time to clean up uses of Trunc, I would support deletion. I just don't have time to clean it up myself. — hike395 (talk) 17:47, 9 June 2021 (UTC)
 * I’ve made some effort to refactor templates using old string functions in recent weeks, but the idea of actual deletion hasn't always gone down well at TfD. I’ll think about another nomination. User:GKFXtalk 17:44, 10 June 2021 (UTC)

findpagetext throws big red Lua error for redlinked page
As I just discovered when importing this module on a sister project, findpagetext will throw a big ugly (and misleading) Lua error when the wikipage in its first argument doesn't exist, because the module doesn't check that it exists before getting its contents and doesn't try to catch this kind of error.

Simply checking that :getContent didn't return nil or empty and returning  before handing it to mw.ustring.find is probably enough. Xover (talk) 17:30, 26 October 2021 (UTC)
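
A minimal sketch of the suggested guard, in Python rather than Lua. All names are hypothetical: get_content stands in for whatever fetches the page text and is assumed to return None for a page that does not exist. Returning the nomatch value in that case, and returning a 1-based position on success, are assumptions of the sketch.

```python
from typing import Callable, Optional, Union


def find_page_text(
    get_content: Callable[[str], Optional[str]],
    title: str,
    text: str,
    nomatch: str = "",
) -> Union[int, str]:
    """Search a page for text, tolerating missing or empty pages."""
    content = get_content(title)
    if not content:            # None (no such page) or empty string
        return nomatch         # bail out before any search call sees None
    position = content.find(text)
    if position == -1:
        return nomatch
    return position + 1        # 1-based, as Lua string positions are


pages = {"Exists": "hello world", "Empty": ""}
print(find_page_text(pages.get, "Exists", "world"))
print(find_page_text(pages.get, "Redlink", "world", nomatch="not found"))
```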
 * I tweaked the sandbox to fix this. I'll leave updating the main module for a day or two in case anyone sees other issues. Here is a tweaked version of your test above.


 * Johnuniq (talk) 06:11, 27 October 2021 (UTC)
 * Thanks! I tried a couple more random edge and gigo cases (empty regular wikipage, Special:BlankPage, Special:Watchlist) and nothing blew up. BTW, only vaguely related and not a bug as such, but I noticed that when you pass in an empty text you get empty output (because the check for this case returns nil) instead of the nomatch string, which was unexpected. My expectation for this would be that it is a nomatch, because an empty search string isn't really "invalid" as such, it just doesn't match anything. I could make the opposite argument too, of course, but it might be worth taking another look at when to return nil and when to use nomatch at some point. --Xover (talk) 08:10, 27 October 2021 (UTC)
 * Yes, I wondered about that but decided to keep the original. I doubt if there is any usage of the "feature" but if a template used find page text, it might easily pass its parameter as . In that case, templates often pass empty text to mean "no parameter" which sort-of makes an empty string result sensible. Johnuniq (talk) 09:01, 27 October 2021 (UTC)
 * I updated the main module so the above checks now work. Johnuniq (talk) 06:35, 28 October 2021 (UTC)

The function String2#posnq has been nominated for deletion
The posnq function from this template has been nominated for deletion. You are invited to comment on the discussion at the entry on the Templates for discussion page. User:GKFXtalk 15:37, 31 December 2021 (UTC)
 * Now deleted. Plastikspork ―Œ (talk) 16:23, 7 January 2022 (UTC)

One2a with fractions
When I try to use the One2a wrapper with convert on a fraction such as it produces "a-half acre (0.20 ha)" with a hyphen, which isn't grammatically correct. Is there a way to change this to be a space? Thanks, -- Voello  talk  13:55, 15 January 2022 (UTC)
 * There is no good support for all variations like that. The simplest would be to give up and use:
 * → a half acre (1/2 acre)
 * Johnuniq (talk) 23:27, 15 January 2022 (UTC)

Fix ucfirst when using &times; or 32nd
Currently ucfirst fails if you try to use it on, for example, 32nd, or on HTML entities such as &times;. Please update the module with the code from the sandbox that fixes this. All ucfirst testcases should be green afterwards. Tholme (talk) 15:31, 12 May 2022 (UTC)
 * ✅ Looks good to me – thanks! User:GKFXtalk 12:04, 14 May 2022 (UTC)

Redirects in findpagetext
If a page gets renamed, findpagetext no longer finds the text, and feels kinda lost. Can it be made to follow redirects (either using module:redirect or by itself)? — Guarapiranga ☎ 03:22, 7 June 2022 (UTC)

Re: this discussion (permalink) at Help desk.

I have tweaked  in Module:String2/sandbox to account for that case. I have tweaked Module:String2/testcases to show that I didn't break anything and to show that the ~/sandbox version correctly renders the string of wikilinks where the piped link is not the first link.

Comments desired. If there are no comments, I shall update the live module from the sandbox.

—Trappist the monk (talk) 00:07, 25 February 2024 (UTC)
 * Good. See my edit at Module:String2/sandbox for a trivial issue. My head is not quite up to parsing the regexes at the moment but surely if first_text exists, that means it is not piped and you don't need to check for that? I think that's what is happening? Several of the mw.ustring could be plain string (faster) but that might be a bit tricky for subsequent editors. I suspect finding the start and end of the lowercase letter and replacing it with sub would be more efficient than gsub but I'm not up to that... Johnuniq (talk) 01:02, 25 February 2024 (UTC)
 * Thanks for the fix. Everything I write from scratch and everything that I maintain has  .  I forget that other modules don't always use that.  I've added it to this sandbox.
 * The code doesn't actually check for unpiped links but does check to see if the link we found is a piped link – we want to upcase the piped link's display text, not the piped link's link text. If not a piped link, we fall into the code that upcases the unpiped link's link text.
 * because unicode characters are allowed as article titles. I can imagine non-English redirect links for example.
 * You may be right with regard to  vs   but I have never really liked   because to me, it is more cryptic than patterns.  I don't suppose it really matters that much because I suspect that   isn't used all that often.
 * —Trappist the monk (talk) 02:00, 25 February 2024 (UTC)
 * Oh. I had a look at what aroused my suspicion and see that I was imagining the "extract" regex at line 26 included a pipe in the "not these characters" part so it only found an unpiped link. A hallucination from non-artificial intelligence. Johnuniq (talk) 04:51, 25 February 2024 (UTC)
 * In the time since my last post, I have hacked some more on ~/sandbox. I did that because it seemed odd to me that   had specific code to handle  but no other tags.  Why just that tag?  So I've hacked ~/sandbox so that   upcases the first letter character that is not in a stripmarker, an html-like tag, an html character or decimal/hexadecimal numeric entity, and strips the various list markup and miscellaneous punctuation.  For example, non-English text wrapped in a  template has multiple leading html tags:
 * → casa → casa
 * So we want the 'c' in casa to be uppercased:
 * The live version of the function can't handle that:
 * Miscellaneous other examples:

←  ←   –   → ' ←   ←   ←   – malformed markup ←
 * Anything I've missed? Is there anything glaringly wrong with the implementation?
 * —Trappist the monk (talk) 17:27, 3 March 2024 (UTC)
 * There having been no comment (and after a bit of a fumble), I have updated the live module from the sandbox.
 * —Trappist the monk (talk) 18:48, 10 March 2024 (UTC)
 * Thanks! Johnuniq (talk) 01:09, 11 March 2024 (UTC)
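
For reference, the approach discussed in this thread can be sketched roughly as follows. This is Python, not the module's Lua, and it is not the sandbox code: the name ucfirst_markup and the skip pattern are hypothetical. The sketch skips html-like tags, character and numeric entities, DEL-delimited strip markers (an assumed shape), and the target half of a piped link, then upcases the first letter it finds. It does not special-case leading digits such as "32nd", and it does not strip list markup.

```python
import re

SKIP = re.compile(
    r"<[^>]*>"                 # html-like tag
    r"|&#?\w+;"                # named, decimal or hexadecimal entity
    r"|\x7f[^\x7f]*\x7f"       # strip marker (assumed DEL-delimited)
    r"|\[\[[^\[\]|]*\|"        # link target of a piped link, up to the pipe
)


def ucfirst_markup(text: str) -> str:
    """Upper-case the first letter that is not inside skipped markup."""
    position = 0
    while position < len(text):
        match = SKIP.match(text, position)
        if match:
            position = match.end()   # jump over the markup in one step
            continue
        char = text[position]
        if char.isalpha():
            return text[:position] + char.upper() + text[position + 1:]
        position += 1                # punctuation, brackets, spaces, digits
    return text


print(ucfirst_markup('<i lang="es">casa</i>'))
print(ucfirst_markup("[[casa (album)|casa]]"))
```

For a piped link the display text is upcased and the link target is left alone; for an unpiped link the link text itself is upcased, as described above.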