Talk:Comparison of programming languages (string functions)

C function toupper in UpperCase
This is misleading in the article. C doesn't have a function to uppercase a whole string. toupper takes an int and returns an int, NOT a string. Its prototype is:

int toupper(int c);

If c is a lowercase letter (a-z), toupper returns the uppercase version (A-Z); otherwise it returns c unchanged. In the default "C" locale, toupper does not convert international characters (those with codes above 0x7F, i.e. outside ASCII), like ă or ç. To uppercase a whole string you need to write a function something like this:
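
A minimal sketch of such a function, assuming a writable, NUL-terminated string in a single-byte encoding. The name str_toupper is hypothetical and is not part of the C standard library:

```c
#include <ctype.h>
#include <stddef.h>

/* Uppercase a NUL-terminated string in place and return it.
   str_toupper is a hypothetical helper name, not a standard function. */
char *str_toupper(char *s)
{
    if (s == NULL)
        return NULL;
    for (char *p = s; *p != '\0'; ++p) {
        /* Cast to unsigned char first: passing a negative char value
           to toupper is undefined behaviour. */
        *p = (char)toupper((unsigned char)*p);
    }
    return s;
}
```

Called on a writable array such as char buf[] = "Hello", this leaves "HELLO" in buf. It must not be called on a string literal, which cannot be modified.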

In C, strings are essentially pointers to a character, and they end at the first NUL ('\0') character. It would be worthwhile to explain what strings are in different languages. Senor Cuete (talk) 03:41, 10 May 2008 (UTC)

The 1. should appear as a pound sign and the box is put there by Wiki's text engine. I didn't type it like that. Senor Cuete (talk) 03:44, 10 May 2008 (UTC)


 * The tag should fix it. Ghettoblaster (talk) 12:43, 10 May 2008 (UTC)

Compare (integer result, fast/non-human ordering)
In the table row for C, why would you go through the hassle of writing your own function when you could call the C function strncmp?

Senor Cuete (talk) 00:52, 16 May 2008 (UTC)
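
For reference, a rough sketch of an integer-result comparison built on the standard library. It uses strcmp; strncmp is the variant that compares at most n characters. The helper name compare_sign is hypothetical, and the ordering is by byte value, not by any human collation:

```c
#include <string.h>

/* Reduce the result of strcmp to -1, 0 or 1.
   compare_sign is a hypothetical helper name. */
int compare_sign(const char *a, const char *b)
{
    int r = strcmp(a, b);      /* negative, zero or positive */
    return (r > 0) - (r < 0);  /* normalise to -1, 0 or 1 */
}
```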

substring
Shouldn't the table row for C just mention the C function strncpy?

Why concatenate when you can copy? Senor Cuete (talk) 00:53, 16 May 2008 (UTC)


 * Because strncpy will not copy a null-terminator if the string is n or more characters long. --Spoon! (talk) 12:13, 16 May 2008 (UTC)
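
The point about strncpy can be sketched as follows. This is only an illustration: the function name substring, its parameters, and the choice of zero-based offsets (the usual C convention) are assumptions, not what the article's table uses:

```c
#include <stddef.h>
#include <string.h>

/* Copy up to len characters of src, starting at zero-based offset start,
   into dst (capacity dst_size). The result is always NUL-terminated when
   dst_size > 0. Returns the number of characters copied.
   The name and signature are hypothetical. */
size_t substring(char *dst, size_t dst_size,
                 const char *src, size_t start, size_t len)
{
    if (dst_size == 0)
        return 0;

    size_t src_len = strlen(src);
    if (start > src_len)
        start = src_len;            /* clamp to the end of src */
    if (len > src_len - start)
        len = src_len - start;      /* do not read past src */
    if (len > dst_size - 1)
        len = dst_size - 1;         /* leave room for the terminator */

    strncpy(dst, src + start, len); /* copies len chars, writes no NUL here */
    dst[len] = '\0';                /* so terminate explicitly */
    return len;
}
```

With a buffer char out[8], substring(out, sizeof out, "Hello, world", 7, 5) leaves "world" in out.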

strings vs lists
"In both Prolog and Erlang, a string is represented as a list (of character codes), therefore all list-manipulation procedures are applicable, though the latter also implements a set of such procedures that are string-specific."

I think the same is true of Haskell; should it also be noted? —Preceding unsigned comment added by 124.171.21.141 (talk) 00:20, 28 June 2008 (UTC)

Additional procedure/operators
Some further string manipulations for consideration:
 * substring append and prepend, e.g. in Python: s += "ABD"
 * replace substring:
 ** by substring text, e.g. AWK: gsub("Earthling","Martian",string)
 ** by slice: s[3:4]="XY"
 * insert substring at offset.

NevilleDNZ (talk) 08:17, 15 May 2009 (UTC)
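
As an illustration of the last item, a minimal sketch of inserting a substring at an offset in C. The name insert_at, the zero-based offset, and the error convention are assumptions made for this example:

```c
#include <stddef.h>
#include <string.h>

/* Insert ins into the NUL-terminated string held in buf (capacity cap)
   at zero-based offset pos. Returns 0 on success, or -1 if pos is past
   the end of the string or the result would not fit; buf is left
   unchanged on failure. insert_at is a hypothetical helper name. */
int insert_at(char *buf, size_t cap, size_t pos, const char *ins)
{
    size_t len = strlen(buf);
    size_t ins_len = strlen(ins);

    if (pos > len || len + ins_len + 1 > cap)
        return -1;

    /* Shift the tail (including its NUL) right to open a gap. */
    memmove(buf + pos + ins_len, buf + pos, len - pos + 1);
    /* Fill the gap with the inserted text. */
    memcpy(buf + pos, ins, ins_len);
    return 0;
}
```

Appending is the special case pos == strlen(buf), and prepending is pos == 0.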

ASC
Came here looking for a Python equivalent to the ASC function, which, in BASIC/VB6, returns the numeric value of the first character of a string.

It has no exact equivalent among the string functions of languages that handle strings differently, but in BASIC it was a string function. —Preceding unsigned comment added by 203.206.162.148 (talk) 05:17, 22 June 2009 (UTC)


 * It's called ORD in many languages (since the character set / language / font may not be ASCII, but the idea is the same). This string function comparison page could use a section on conversions (number to/from string, character to/from string). See http://rosettacode.org/wiki/Character_code#Python --BrianFennell (talk) 22:37, 3 September 2009 (UTC)
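
For comparison with the C rows discussed elsewhere on this page, a rough C counterpart of ASC/ORD. The name asc and the -1 result for an empty string are assumptions for this sketch (BASIC reports an error in that case), and the value returned is that of the first byte, not of a full multi-byte character:

```c
/* Numeric code of the first byte of s, or -1 if s is empty.
   asc is a hypothetical helper name; the -1 convention is an assumption. */
int asc(const char *s)
{
    /* Go through unsigned char so bytes above 0x7F are not negative. */
    return s[0] != '\0' ? (int)(unsigned char)s[0] : -1;
}
```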

substring, startpos, base?
Ark! The substring table does not list the base for startpos and endpos. Is startpos=1 the first character in the parent string, or the second? —Preceding unsigned comment added by 203.206.162.148 (talk) 05:57, 22 June 2009 (UTC)

Square bracket as syntax
There is a problem here: sometimes the square brackets indicate an optional field, as in string(1[,n]), and sometimes they are part of the language, as in string[1,n].

That leaves the problem that we can't always see that part of the command is optional: string[1 /,n/]. —Preceding unsigned comment added by 203.206.162.148 (talk) 06:03, 22 June 2009 (UTC)
 * I see that it's been Fixed now - thank you whoever :~) 203.206.162.148 (talk) 07:22, 14 January 2010 (UTC)

LUA missing as programming language
I noticed that Lua is missing from this page. I'm willing to add Lua examples (which might take some time), but someone should cross-read them. Or are there reasons not to have Lua in the examples?

LUA string.find and string.gsub misplaced?
These functions work with pattern matching, not with plain strings (well, find can be forced to do so with additional options). There should be at least a comment about this. Bassklampfe (talk) 15:12, 30 November 2010 (UTC)

Removal of "Compare (integer result, fast/non-human ordering)"
I am removing the Compare (integer result, fast/non-human ordering) section, for the following reasons:

 * 1) This is not a common or primitive operation.  Observe that of the languages listed, not one provides a built-in operator or standard library function to perform this type of comparison.  Only one of the examples calls a single function, and that is in an uncommon third-party library.  The rest are all implemented in terms of structural comparison of tuples (not a string operation at all) or sequential boolean OR (using the basic string comparison already detailed in the previous section).  The section therefore does not in fact provide any new information about string functions at all. It merely describes an alleged optimisation technique.  But...
 * 2) This is not even an optimisation in most cases.  The complicated "fast" approaches given in the article all involved more operations than the straightforward standard approach, nullifying any speed improvement they might have brought.  The OCaml and Ruby examples were particularly bad, since the "fast" versions actually involved allocating and freeing memory on the heap!

I ran some benchmarks in Perl and OCaml, and I was unable to find any cases where the "fast" version was not actually slower than the standard approach. In one case (OCaml, comparing short strings), the code given in the article was literally 33% slower than a straightforward String.compare! It's possible that things might be different in other languages, and the technique might be generally faster in some very restricted circumstances (maybe when comparing very long strings that are very similar?), but it is clearly not something that anyone should be using without benchmarking it against their own data; and it's unlikely that string comparisons will frequently be enough of a bottleneck to justify this kind of micro-optimisation in the first place.

In short, this is not the kind of useful information that Wikipedia prides itself on spreading, and I don't think it belongs in this article. 87.194.117.80 (talk) 17:04, 26 July 2009 (UTC)

equivalence relation missing
This article deals with three ways to compare strings (equality, compare, and strcmp). This raises some issues:
 * As I understand it, all three cover the same feature.
 * This feature is not defined as long as lexicographical order is not defined.
 * It is not clear whether the comparison is a low-level (code unit by code unit) comparison or one based on equivalence.

For instance, how do you compare Montr& and Montre& (the two canonically equivalent UTF-16 Unicode forms)?
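
A small sketch of why this matters for a low-level comparison. The example word, the use of UTF-8 bytes instead of UTF-16, and the name bytewise_equal are all assumptions chosen for illustration. The two arrays hold canonically equivalent spellings of the same text, one with the precomposed U+00E9 and one with 'e' followed by the combining acute accent U+0301, yet strcmp treats them as different:

```c
#include <string.h>

/* "café" with precomposed U+00E9, encoded in UTF-8 (5 bytes). */
static const char precomposed[] = "caf\xC3\xA9";

/* "café" as 'e' + U+0301 COMBINING ACUTE ACCENT, in UTF-8 (6 bytes). */
static const char decomposed[] = "cafe\xCC\x81";

/* Byte-by-byte equality, which is all that strcmp provides.
   bytewise_equal is a hypothetical helper name. */
int bytewise_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Here bytewise_equal(precomposed, decomposed) returns 0. Treating the two as equal requires normalising both to the same Unicode normalization form first, which the C standard library does not provide.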

"Code" format
The "code" tags on the keywords in the tables (or perhaps other changes) have destroyed the formatting, making the tables almost illegible. If you go back a decade and look at the original tables, you'll see that the keywords are clearly delimited, making the tables clear and easy to read.

The present formatting makes the whole exercise almost worthless: if you can't read it easily, what's the point of having pages of text? — Preceding unsigned comment added by 203.206.162.148 (talk) 09:27, 18 July 2017 (UTC)

How-to guide
It seems to me that this article, as useful as it is, is outside of Wikipedia's scope, in light of the principle that Wikipedia is not a how-to guide, which is exactly what this article is. Largoplazo (talk) 10:00, 18 June 2020 (UTC)