Talk:Null-terminated string/Archive 1

Archive 1

String to long

I am facing a problem in C language. I want to convert String into Long.

char* s = "hello"; long l;

l = atol(s); cout<<l;

it returns me 0.

Can anybody solve this problem? I am waiting for your kind replies... Thank you

The string "hello" isn't numeric so it can't be converted to a number (whether int, long, or what-have-you).
Atlant 12:53, 9 November 2005 (UTC)
As a side note, the language used above cannot be C, since cout is used. cout is used in C++.
81.234.182.189 14:42, 31 January 2007 (UTC)

Here is how to reliably see whether you have a number:

 // requires <stdlib.h> for strtol
 const char* s = sourcestring;
 char* p;
 long l = strtol(s, &p, 10); // this *sets* p to point just after the number
 if (p == s) {
   // no conversion was performed: after any leading whitespace,
   // the string did not start with a number
   // this also catches the empty string
 } else if (*p) {
   // there were trailing characters after the number
 } else {
   // string was only optional whitespace + a number and ok
 }
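A minimal sketch that wraps the check above in a function, with an added test for out-of-range values via errno. The names parse_long and parse_status are hypothetical and chosen only for illustration; they are not part of the standard library.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes, named here only for illustration. */
enum parse_status { PARSE_OK, PARSE_NO_NUMBER, PARSE_TRAILING, PARSE_RANGE };

/* Parse s as a base-10 long.
   *out is written only for PARSE_OK and PARSE_TRAILING. */
enum parse_status parse_long(const char *s, long *out)
{
    char *end;
    long value;

    errno = 0;                      /* strtol reports overflow through errno */
    value = strtol(s, &end, 10);

    if (end == s)
        return PARSE_NO_NUMBER;     /* nothing was converted, e.g. "hello" or "" */
    if (errno == ERANGE)
        return PARSE_RANGE;         /* the value does not fit in a long */

    *out = value;
    return *end ? PARSE_TRAILING : PARSE_OK;
}
```

With this sketch, parse_long("hello", &l) reports PARSE_NO_NUMBER instead of silently yielding 0 the way atol does.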

C String

Time to create a disambig page, and to give the c-string women's bottom thong garment its own page? 85.227.226.168 17:03, 2 June 2007 (UTC)

Yes, please do create a disambig page.

C String is the brand name of a new type of women's lingerie, which is worn instead of a thong. While the manufacturer calls it 'invisible underwear' it is essentially a thong with no waist band. A flexible internal frame holds it to the pelvis and keeps it stable. The manufacturer claims that the advantages are that there are no panty-lines. While many women worry that it will not stay in place, the manufacturer claims that it does. In addition to being used as underwear it can be worn to the beach as swimwear and it is sometimes worn by Brazilian samba dancers who are competing to wear the skimpiest of outfits at Carnival.

UK http://www.lovehoney.co.uk/brands/c-string/

USA http://www.squidoo.com/CString

A UK Video showing a C String over lingerie. http://www.youtube.com/watch?v=2kUgkDH27e8 —Preceding unsigned comment added by 80.218.52.38 (talk) 05:32, 10 January 2008 (UTC)

Redirect ASCIZ and ASCIIZ here

Currently "ASCIZ" redirects to String_(computer_science) and "ASCIIZ" to American_Standard_Code_for_Information_Interchange. Both should redirect to the same article. I think here (C_string) would be best. Other suggestions? /David A 195.84.167.2 08:34, 19 October 2007 (UTC)

Done! Now ASCIZ and ASCIIZ point here. (Have since found out ASCIZ is also something about DNA and proteins. That's someone else's problem.) /David A 195.84.167.2 (talk) —Preceding comment was added at 13:55, 22 April 2008 (UTC)


Was this article written by a crackhead?

It is confusing, doesn't describe anything properly and fails to distinguish between C and C++ headers. The article contains numerous FUD-licious statements about C strings, and reminds me of Java propaganda. It's as though it were a brain-dropping produced by the innards of some college freshman, and dumped into Wikipedia as part of coursework. 194.187.213.95 (talk) 15:03, 6 June 2008 (UTC)

MFC

Probably this article should mention MFC's CString (http://msdn.microsoft.com/en-us/library/ms174288.aspx) Komap (talk) 08:09, 9 September 2008 (UTC)

Does anybody know if MFC strings use a length, or do they rely on the nul terminator. If they use a length then maybe they should be mentioned with C++ strings as a modern alternative. If they use the nul terminator then they could be listed with the (probably about 100) c-string wrapper libraries developed over the years. —Preceding unsigned comment added by Spitzak (talkcontribs) 20:11, 24 May 2009 (UTC)

Programming sample?

There are no examples of proper or buggy C string code; this would be a good addition. The std::string article has sample code. Redhanker (talk) 00:49, 21 April 2010 (UTC)

I put some references to strlen and strcpy which have sample code. Spitzak (talk) 05:54, 21 April 2010 (UTC)
Can you confirm that MFC strings (not .NET strings) use a length and not a nul terminator? It is unfortunate that I cannot find an example of a "c string wrapper" library, I thought that was one. Once upon a time there were literally hundreds of these! Spitzak (talk) 05:56, 21 April 2010 (UTC)

Definition of string in C

According to the C99 standard draft (n1256.pdf, section 7.1.1, point 1): A string is a contiguous sequence of characters terminated by and including the first null character. ... A pointer to a string is a pointer to its initial (lowest addressed) character.

This definition is in contradiction to the following statements, from the article:

In computing, a C string is a character sequence stored as a one-dimensional character array and terminated with a null character ('\0', called NUL in ASCII).

char foo = '\0'; // foo is a string, and &foo is a pointer to string according to n1256 section 7.1.1

It also means a string cannot contain the NUL byte.

As stated by 7.1.1 point 1, a string includes the NUL terminator, so it contains exactly one NUL byte. — Preceding unsigned comment added by Plebbeh (talkcontribs) 11:15, 19 December 2010 (UTC)

Other considerations: The input and result of strncpy, strncat and strncmp aren't required to be strings at all. They're fixed width, null padded fields. Plebbeh (talk) 11:29, 19 December 2010 (UTC)

Those definitions match. "First null character" implies all characters before that are NOT null characters. The variable contents of the string cannot contain a null character. The definition is trying to make it clear that the memory allocated includes space for that null but it is not part of the data, it is part of the structure.
strncpy certainly does use null-terminated strings. If the copied string has a null before n, then the remainder of the destination is filled with null, not with a copy of the data after the null. The same applies to all the others. Your argument makes as much sense as saying that strchr does not require the input to be a string: if the searched-for character is there then there is no need for a null!Spitzak (talk) 21:04, 19 December 2010 (UTC)
Again, from the C standard: A string is a contiguous sequence of characters terminated by and including the first null character. This renders the following statement incorrect: "It also means a string cannot contain the NUL byte". A string is required by definition to contain a null character.
The nul cannot be anywhere except the last byte in the string. No matter what the C standard says, most users consider the invariant part of the structure to not be part of its "contents". It is not possible to put a nul "in" a C string because that would truncate it. All other bytes can appear any number of times. Spitzak (talk) 02:05, 22 December 2010 (UTC)
No matter what most users say, the C standard considers the terminating null byte part of the string. Who would you rather trust: Most users, who have never had any experience teaching C or writing C compilers/implementations, or the C standard which has lots of experience teaching C to people who wish to write C compilers/implementations? Plebbeh (talk) 02:31, 22 December 2010 (UTC)
I would much more trust users of the code (ie people WITH experience teaching or writing C) over any standards organization. NOBODY says the NUL is "inside" the string.Spitzak (talk) 00:46, 30 December 2010 (UTC)
We are talking about standard C here, not about versions of C some people may use. The standard C is defined by ISO/IEC 9899. 1exec1 (talk) 00:15, 31 December 2010 (UTC)
We are talking about those who teach people to write C compilers, and those who write C compilers. I would much rather trust those who write 100% compliant C compilers to know C better than those who learnt C from some unnoteworthy source. Have you done a survey, prior to insisting that nobody says the '\0' is inside the string? If so, then it was extremely inaccurate not just because the people who wrote the C standard say the '\0' is inside the string, but also because there is at least one person right on this talk page telling you that the '\0' is inside the string: me! I asked in ##C on freenode. There are most definitely enough people who believe the '\0' is inside the string to start a debate where there are three sides, consisting of multiple people: Those who believe the '\0' is inside the string, those who believe the '\0' isn't inside the string and those who believe a '\0' isn't required for a string. I believe this is an experiment that can be reproduced at any given time, considering the popularity of that channel. Please show the research leading to your statement 'NOBODY says the NUL is "inside" the string.' Plebbeh (talk) 03:07, 14 January 2011 (UTC)
strn* functions don't require the input to be strings, and the output may not be a string either.
Plebbeh (talk) 04:40, 21 December 2010 (UTC)
strncat() always results in a c string
You appear to be correct about this. It does not operate on a string as the article states, but an array as the C standard states. Plebbeh (talk) 02:31, 22 December 2010 (UTC)
I'm sorry, huh? I'm glad you agree the result is a C string. However since it stops at the first NUL, the input is also a C string.Spitzak (talk) 00:46, 30 December 2010 (UTC)
Strongly opposed. The sequence pointed to by the input (s2 in the reference following) needn't be more than an array; it may or may not be '\0' terminated, and when it isn't '\0' terminated it can't be called a string. "OpenGroup POSIX C standard library documentation". Retrieved 14 January 2011. ... Plebbeh (talk) 03:20, 14 January 2011 (UTC)
strncpy() will treat the nul specially (it does not copy the bytes after the nul in the source) so it is treating the input as a c string.
Unless you reach the upper limit. It does not operate on a string as the article states, but an array as the C standard states. Plebbeh (talk) 02:31, 22 December 2010 (UTC)
By your definition, strchr() does not require the input to be a string. The bytes after the one searched-for are irrelevant. Spitzak (talk) 02:05, 22 December 2010 (UTC)
Nonsense. If the character isn't found between the start of the string and the terminating null byte, the C standard requires that strchr shall return NULL. What makes you think my definition is as you suggest? I suggest not offering input if you don't understand the topic of conversation. Plebbeh (talk) 02:31, 22 December 2010 (UTC)
You are failing to explain the difference. For some reason strncpy, which does something special when encountering a NUL byte, is not working on strings. Yet strchr, which also does something special when encountering a NUL byte, IS working on strings. The fact is that all of these are working on strings. Any code that stops at the first NUL byte is working on strings. You seem to think that because there is another terminating condition in strncpy that somehow it is not working on a string, but my attempt to explain that the same argument applies to strchr seems to have sailed right over your head...Spitzak (talk) 00:46, 30 December 2010 (UTC)
1. strncpy does not require either the input or destination/result to be a string, as has been explained, because the input and output needn't be '\0' terminated. You can't call something that isn't '\0' terminated a string.
2. strncpy writes '\0' to the remaining bytes following the copy.
2a. The end of a string must be the first '\0'
2b. If multiple '\0' bytes are written: The end of destination is not the first '\0'; Destination stores a string but it is not one.
strncpy does not require that the input will contain a '\0', does not guarantee that the destination will contain a '\0' and may store multiple '\0's in the destination "OpenGroup stdlib strncpy docs". Retrieved 14 January 2011.; It is more logical to insist that strncpy operates on "fixed width, '\0' padded fields". strchr requires that the input be a string (terminated with '\0'). "OpenGroup stdlib strchr docs". Retrieved 14 January 2011. Plebbeh (talk) 03:36, 14 January 2011 (UTC)
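The two behaviours of strncpy under dispute here (no terminator when the source fills the field, zero padding when it does not) can be checked directly. A minimal sketch, assuming a 4-byte destination field; has_terminator and copy_into_field4 are hypothetical helper names used only for illustration.

```c
#include <string.h>

/* Returns 1 if the first n bytes of buf contain a '\0', else 0. */
int has_terminator(const char *buf, size_t n)
{
    return memchr(buf, '\0', n) != NULL;
}

/* Copies src into a 4-byte field with strncpy and reports whether the
   field ended up null-terminated. field must have room for 4 bytes. */
int copy_into_field4(char *field, const char *src)
{
    /* Copies at most 4 bytes; stops reading src at its first '\0' and
       fills the rest of the field with '\0' bytes. */
    strncpy(field, src, 4);
    return has_terminator(field, 4);
}
```

Copying "abcdef" leaves the field holding 'a', 'b', 'c', 'd' with no terminator, so the field is not a string. Copying "a" leaves 'a' followed by three '\0' bytes.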
The article's definition of a string is, as stated above, not conformant with the C standard. For starters, saying that "in computing, a C string (...)" is just silly. It should state "A string, according to the C standard (n1256.pdf - TC3 - 2007), is (...)". Such a definition includes the terminating null byte within the string. The literal "" would, then, create the string of size 1 CHAR_BIT, containing only the null-byte. —Preceding unsigned comment added by 187.39.191.205 (talk) 05:03, 21 December 2010 (UTC)
Strongly oppose. Wikipedia is read not only by programmers, the initial 'in computing, a C string' is essential in introduction. If there is need for more elaborate specification about what really is string, don't do it in introduction, create a section 'Definition' and quote the standard. Wikipedia is encyclopedia, not programming manual.1exec1 (talk) 01:10, 22 December 2010 (UTC)
I think the point you're missing is that the definition provided is not the accurate definition given to C programmers, but a definition given to programmers using a slightly different programming language that doesn't use C strings. Plebbeh (talk) 02:31, 22 December 2010 (UTC)
I believe we understood the same sentence differently. I agree that NUL byte is the part of the string. I clarified it a bit. See if that helps.
I think that wording 'usually stored in a one-dimensional array' is vague and not appropriate. Is it possible to store a character string differently than 'as one-dimensional character array'? C standard doesn't leave it as undefined (A string is a contiguous sequence of characters terminated by and including the first null character). 1exec1 (talk) 13:54, 22 December 2010 (UTC)
According to the C99 standard, a string need not be stored in an array at all. Consider char foo = '\0';. Removing the word 'usually' causes the statement on wikipedia to directly contradict the C standard. Please remove the footnote links if you're going to do that, and rename the page to "C strings stored within arrays", or something. — Preceding unsigned comment added by Plebbeh (talkcontribs) 12:17, 7 January 2011 (UTC)
Furthermore, consider char bar[512] = "hello"; The array, bar, is not a string, because the array is not terminated by the first null character. It does, however, store a string. If strings were stored as arrays, then sizeof (bar) should be 6, so that it is terminated by the first null character. This further supports the idea that strings are not arrays. The array, bar, actually contains 6 strings: "hello", "ello", "llo", "lo", "o" and "". Plebbeh (talk) 12:26, 7 January 2011 (UTC)
Actually the array is padded with null characters if it is initialized like shown. And even if not, it will contain a null character at the end. Your claim that it contains six strings would apply to any string constant, ie "hello"+1 is "ello", etc. I think you could make a better claim that the array contains the string "hello" and 506 zero-length strings.Spitzak (talk) 20:55, 7 January 2011 (UTC)
You're right. The array is padded out with zeros, but the string terminates at the first zero. The array bar, as defined above, does store a string, but it is not one because it contains more than one NUL byte. Do you agree?Plebbeh (talk) 23:07, 7 January 2011 (UTC)
See answer at Ambiguity section1exec1 (talk) 23:21, 7 January 2011 (UTC)
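For readers following the char bar[512] = "hello"; example, the sizes involved can be checked with a short sketch. count_nul_bytes is a hypothetical helper name, used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Counts the '\0' bytes in the n-byte object starting at buf. */
size_t count_nul_bytes(const char *buf, size_t n)
{
    size_t count = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (buf[i] == '\0')
            count++;
    }
    return count;
}
```

For that declaration, sizeof bar is 512 while strlen(bar) is 5, bar + 1 points to the string "ello", and count_nul_bytes(bar, sizeof bar) gives 507: the terminator at bar[5] plus the zero padding up to bar[511].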

terminating NUL

I think the following definition is even more vague: "It also means a string cannot contain the NUL byte (the only NUL byte is the one marking the end)." It's unclear whether the string does contain a NUL, since the first part of the sentence contradicts the second.

I suggest sticking to something saying that the NUL byte can appear only at the end of the string. Note that the NUL byte is part of the data. 1exec1 (talk) 11:44, 23 December 2010 (UTC)

All I can do is STRONGLY ASSERT that real programmers using C strings do not consider the NUL to be "contained" or "inside" the string, and will say "no" to the question "can a C string contain a NUL byte". The C standards wording is contrary to standard usage of these words and thus misleading. And I very much more trust actual users of this code to define it correctly than some bureaucracy.Spitzak (talk) 00:41, 30 December 2010 (UTC)
OK, please explain how a string consisting of 10 characters needs at least 11 bytes to be stored. Your trust does not matter here; see WP:OR.1exec1 (talk) 00:05, 31 December 2010 (UTC)
Explain to me why a std::string of 10 bytes needs at least 14 bytes. Note that those 4 bytes used to store the length are not "inside the string". They are part of the string's STRUCTURE. Spitzak (talk) 01:10, 31 December 2010 (UTC)
Bad example. We're talking about C strings. In C++ the layout of std::string is not defined by the standard and hence may vary from one particular implementation to another, e.g. there are short string or copy-on-write optimizations added to some implementations but not to others. Also, it is possible to design std::string as a direct wrapper to a C style string (i.e. without storing the length of the string) and this implementation would still be perfectly conformant. Hence your argument is invalid and my above point still holds. 1exec1 (talk) 02:01, 2 January 2011 (UTC)
I agree with 1exec1. I am a real programmer, and I STRONGLY ASSERT that real programmers who write real C compilers DO consider the null terminator to be part of the string. The C standard agrees with him and me. Without it, you are NOT dealing with a string. I've cited references that suggest that the terminating '\0' byte IS part of the string. Please cite references that support your claim, Spitzak: 'real programmers using C strings do not consider the NUL to be "contained" or "inside" the string'. Plebbeh (talk) 12:00, 7 January 2011 (UTC)
Furthermore, suggesting that the '\0' is NOT part of the string only serves to confuse users when engaging in discussions with compiler developers, who know what they're talking about. Please provide notable references (e.g. the C99 or C89 standard) that support your claim. Until you can cite such a reference, your contribution is merely counterproductive. Plebbeh (talk) 12:03, 7 January 2011 (UTC)
I am absolutely floored by the lack of reading comprehension here. Please read the word "INSIDE". Of course the NUL occupies memory and must be there for the string to work. I am complaining that NOBODY, including YOU, would ever say "the NUL is INSIDE the string" or "part of the string's contents". This is my only complaint. Please reply down below where I try to ask this question more clearly. I would appreciate any quote from any literature that says the NUL is somehow "inside" the string. Text that does not use the word "inside" or "contains" does not count. Spitzak (talk) 20:45, 7 January 2011 (UTC)
Why do you feel the need to SHOUT?Plebbeh (talk) 16:54, 10 January 2011 (UTC)
Furthermore, I see no question from you, and I can't read minds. Yes. The '\0' is part of the string's contents. Without it, it isn't a string. A string is a sequence of characters terminated by, and including, the '\0' (NUL) terminator. This is fairly straightforward, and it is what your implementation (stdlib, and/or compiler) developers use. Plebbeh (talk) 16:58, 10 January 2011 (UTC)
Spitzak, it is because of this "bureaucracy" that we have standard compilers today. It's not there to lock you in like some fascist regime; it's there to set rules so that you know what to expect! Without it, your code would never be portable. There would be dozens of different dialects of C. The document I cite is the document that compiler/stdlib developers follow. If the users don't agree, they should rebel and use a compiler from the 1970s. A definition conveyed by some disreputable user is not important or relevant to this article. Plebbeh (talk) 14:11, 7 January 2011 (UTC)
A Google search for "contains a nul byte" will show a huge number of hits where the writers believe that a C string cannot "contain" a NUL byte. I am reasonably certain the majority of them are aware there must be a NUL terminator on the end of the string, but not once did I see "a string cannot contain more than the one nul byte at the end." C programmers do not consider the NUL to be "contained" or "in" the C string. An accurate description of the knowledge of C programmers is: "The memory occupied by a C string includes the NUL byte that is after the last byte in the string".Spitzak (talk) 21:49, 7 January 2011 (UTC)
This may seem irrelevant, but I believe you'll have the same issue with it. According to highschool textbooks, atoms are obscurely similar to our solar system. This leads to many people believing atoms are tiny solar systems, and/or that our solar system is a giant atom. Is it possible that the planets orbit the sun like this? http://en.wikipedia.org/wiki/Atomic_orbital#Understanding_why_atomic_orbitals_take_these_shapes ... So what bearing does the belief of "most people" have on the scientific observations made? Who do we believe? Most people, or scientists? — Preceding unsigned comment added by Plebbeh (talkcontribs) 05:00, 12 January 2011 (UTC)

NUL "inside the string"

Rather than continuing to argue with this guy, can somebody else speak up and either confirm or deny my belief that 99.9999% of competent C programmers do not consider the terminating NUL to be "contained by the string" or "part of the string's contents" or "inside the string". Also a logical explanation to him as to why a C standard does not necessarily mean a god-level insight into the best way to describe a structure.Spitzak (talk) 01:14, 31 December 2010 (UTC)

The quoted C standard says "a string includes the NUL terminator". My complaint is that the article translated this to "it contains exactly one NUL byte". The problem is that the words "includes" and "contains" do not have the same meaning here. "includes" implies that the memory usage includes the space needed for the NUL terminator. "contains" means that it is part of the "contents" of the string, the same as saying "this string contains an 'a'".Spitzak (talk) 20:52, 7 January 2011 (UTC)

Incorrect. The C standard says, at 7.1.1p1: "A string is a contiguous sequence of characters terminated by and including the first null character." If by contents you mean data, and the length is an important part of that data, then the '\0' character is part of the contents. Plebbeh (talk) 17:48, 10 January 2011 (UTC)

The terminating null byte is considered to be part of the string. -- directly quoted from "OpenGroup strchr docs". Retrieved 14 January 2011. Plebbeh (talk) 03:45, 14 January 2011 (UTC)
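Whichever wording is preferred, the two quantities being argued over are both observable in code: strlen counts the characters before the terminator, while the storage needed is one byte more. A minimal sketch; storage_size is a hypothetical name used only for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Bytes needed to store the string s points to, terminator included. */
size_t storage_size(const char *s)
{
    /* strlen does not count the terminating '\0', so add one for it. */
    return strlen(s) + 1;
}
```

For the literal "hello", strlen gives 5 while both sizeof "hello" and storage_size("hello") give 6. For "a\0b", strlen gives 1 because the string ends at the first '\0', although the array created by the literal has 4 bytes.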

"Ambiguity"?

I debate that the word "usually" was not at all ambiguous in this case. String literals are translated to character arrays containing strings, but a string need not be stored within an array.

Consider the following:

char foo = '\0';
char *bar = &foo;
printf("%s\n", bar);

By the definition within the C standard, the expression bar is a pointer to string and the expression foo is a string. foo is not an array, though.Plebbeh (talk) 12:11, 7 January 2011 (UTC)
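The snippet above can be made into a self-contained check. This is only a sketch of the same idea with strlen in place of printf; empty_string_length is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Treats a lone char object holding '\0' as a string and measures it. */
size_t empty_string_length(void)
{
    char foo = '\0';        /* a single char object, not declared as an array */
    const char *bar = &foo; /* pointer to its first (and only) character */

    /* strlen reads foo, finds the terminator immediately, and stops,
       so it never reads past the object. */
    return strlen(bar);
}
```

The function returns 0, which is the same result strlen gives for the literal "".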

The introduction was rendered ambiguous following the removal of the word "usually". See the new section of the article, entitled "Standard terms and definitions in C" for more information. In particular, note the example that declares a variable named "strange_string".Plebbeh (talk) 14:00, 7 January 2011 (UTC)

Note: The identifier "strange_string" has been renamed to "nul_byte".Plebbeh (talk) 05:07, 12 January 2011 (UTC)

&x is a pointer to an array of length 1, according to the C standard. So it is in fact an array.Spitzak (talk) 20:46, 7 January 2011 (UTC)

No it isn't. You haven't given x a context, so it is a syntax error. You might wish to read the sections of the standard that tell you how to declare a variable that has "pointer to an array" type, so you can write your context to fit and cite the standard. Please provide a context so we may continue.Plebbeh (talk) 23:00, 7 January 2011 (UTC)

Regarding the arrays: int foo and int foo[1] certainly define different types.
Plebbeh, do you know any other ways to store a string? I don't say that any array is (or holds) a string, my statement is that strings are stored as character arrays. If the char foo = '\0'; is the only exceptional case, we can directly include that information into the introduction. The reason that I'm avoiding usually so much is that it doesn't say anything at all and does more harm than good.1exec1 (talk) 23:32, 7 January 2011 (UTC)
Sure. unsigned int foo; memset(&foo, '\0', 1); foo now contains a string. Consider the return value of malloc(). An array? Nope. You're not listening to me. The only way a string can be stored as an array is if the string occupies the entire array, because the string ends at the first '\0' (which would have to be the last character in the array). If there are any embedded '\0' values before the last item within the array, the array is not a string. The usage of the word "as" is ambiguous. — Preceding unsigned comment added by Plebbeh (talkcontribs) 06:47, 8 January 2011 (UTC)
If you go back and read 7.1.1p1, can you find a way of explaining what a string is without using the word "array"? — Preceding unsigned comment added by Plebbeh (talkcontribs) 06:50, 8 January 2011 (UTC)
Oh, OK. As a non-native speaker I actually wanted to express the idea that 'strings are stored in arrays'.
So you want to say that unsigned long foo; memcpy(&foo, "123\0", 4); foo is still a string?1exec1 (talk) 17:07, 8 January 2011 (UTC)
Nope. '123\0' is an implementation defined character literal.Plebbeh (talk) 16:44, 10 January 2011 (UTC)
I'm pretty certain he meant to use double quotes in that example. You do the exact same mistake below where you say memcpy(&foo, '\0', 1);.Spitzak (talk) 20:50, 10 January 2011 (UTC)
Yes, of course, my bad. So foo is still a string or not? 1exec1 (talk) 22:20, 10 January 2011 (UTC)
My opinion is that foo is an unsigned long which now happens to contain the bit pattern of a 3-byte string and its nul terminator. But (char*)(&foo) is a string. Spitzak (talk) 01:00, 11 January 2011 (UTC)
Ahh, yes. I was thinking memset, and memcpy came out. Yes, foo stores a string. (char*)(&foo) is a pointer to string (7.1.1p1). If you use the terms incorrectly, the communication fault is yours because the terms are strictly and legally defined. Plebbeh (talk) 05:13, 12 January 2011 (UTC)
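The unsigned long case (with the double-quoted literal that was intended) can be written out as a short sketch. It assumes sizeof (unsigned long) is at least 4, which holds on common platforms; string_length_inside_ulong is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Stores the bytes '1', '2', '3', '\0' in an unsigned long and then
   reads them back through a character pointer.
   Assumption: sizeof (unsigned long) >= 4. */
size_t string_length_inside_ulong(void)
{
    unsigned long foo = 0;

    memcpy(&foo, "123", 4);             /* 3 characters plus the terminator */
    return strlen((const char *)&foo);  /* any object may be read as chars */
}
```

The function returns 3: the bytes of foo hold a string even though foo itself has type unsigned long.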
I have no problem with 'strings are stored in arrays', though it is still not completely accurate. An array is a sequence of elements, but a sequence of elements is not always an array. An array is implicitly converted to a pointer to the first element unless it is the subject of the address-of or sizeof operators, or it is part of an initialisation. The array[index] operator is actually a pointer[index] operator.Plebbeh (talk) 16:51, 10 January 2011 (UTC)
I think the unusual aspect is that in C, if you take a variable x (for "context" let's say it is a char) and do &x, you will get a value that is indistinguishable at this point from a pointer to an array of length 1 and can be used for all the same purposes (such as a C string). But you are using this (mis)feature of C to convert a singleton to an array as an indication that strings are not arrays. However such an argument could be applied to anything: "an array of integers is not always an array, because you can do "int x; &x" and get something that acts like an array of integers". Therefore I do not see this as a reasonable argument that strings are somehow not arrays sometimes. Spitzak (talk) 04:50, 10 January 2011 (UTC)
You seem to be working on the premise that &x results in an array. That's incorrect. &x results in a pointer to x, which is not an array. int foo; foo is a sequence of characters. An array is implicitly converted to a pointer in many situations; foo is not. memset(&foo, '\0', 1); foo now contains a sequence of characters that is terminated at the first '\0'. &foo is a pointer to the string. The same argument follows for something like char bar[512] = "hello";. bar contains a sequence of characters that is terminated at the first '\0'. If used in an expression that does not use the address-of or sizeof operator on bar, bar will be implicitly converted to a pointer, which is a pointer to the string that it contains. Neither foo nor bar are strings, but they do both contain strings. Plebbeh (talk) 16:44, 10 January 2011 (UTC)
&x turns into a pointer, which as you indicated above, is the same as an array except when sizeof() is used. You seem to be using this to claim that strings are not arrays because you can do "char nul=0; foo(&nul)" when foo expects a C string. However you could use your same argument to say anything that takes an array as an argument does not use arrays. Say foo(int* p) did something where it treated p as an array (i.e. it did p[0] = 1 or something). By your argument the fact that I can do "int x; foo(&x);" means that somehow foo does not take an array of ints as an argument. I don't think that makes any sense because it means you cannot claim ANY C api takes an array as an argument. I believe it is much better to say that &x is a C (mis)feature that treats a singleton as a one-entry array. Since this is a common C feature and not at all specific to strings, I don't think it is relevant to this page. A C string is in fact an array of characters, even if it is only the nul terminator. Spitzak (talk) 20:48, 10 January 2011 (UTC)
Ahh, another broken premise. If your point is to prove that you know a programming language that isn't C, you've proven it. You win. The following three words make your premise incorrect: "the same as". That particular pointer is not an lvalue, but an array is.Plebbeh (talk) 00:23, 12 January 2011 (UTC)
Array turns into a pointer not because this is primary language feature (such as equality of pointer and array types, which is not true), but because a standard conversion exists. These conversions are very similar (but not exact) to those of C++, the difference is that in C++ it's possible to extend the set of available conversions, while in C they are hardcoded into the compiler. 1exec1 (talk) 02:37, 12 January 2011 (UTC)

Ambiguity Again

Okay, let's start this again.

There is a claim that it is incorrect to say a C string "is an array of characters".

If I understand it correctly, the argument why this is incorrect is:

The following code compiles and has a predictable result, which is exactly the same as if foo() was called with a zero-length string:

extern void foo(const char*); // a function that works on a C string
char nul_byte = 0;
foo(&nul_byte);

The argument is that because nul_byte is not an array, you cannot claim a C string is an array all the time.

My counter-argument is that the & operator in C makes nul_byte LOOK like an array. By the above logic the following code proves that integer arrays are not always arrays:

extern void bar(int x[]); // a function that works on a zero-terminated integer array
int zero = 0;
bar(&zero);

This also compiles. If the logic in the first argument is followed it implies that an argument in C declared as int x[] is also not an array, since we managed to pass the address of the non-array zero to it. I believe this argument can prove that nothing is an array in C, because you can always construct some pattern in memory without using arrays, use the & operator, and pass it to any function that treats memory as an array. Therefore I do not think this argument makes any sense, and I would like to restore the language that says a C string is an array.

Spitzak (talk) 23:09, 12 January 2011 (UTC)

Actually, there is more than one reason why your conclusion is incorrect, Spitzak. Are you sure that's how you'd like to convey your argument? Perhaps it'd be most wise searching for support within the C standard.

First, let us address your incorrect premises. These premises are leading you to an invalid conclusion.

The & operator does not make nul_byte look like an array at all, because:

1. &nul_byte is not an lvalue. You can't use the &address-of operator to get the address of &nul_byte. You CAN use the &address-of operator to get the address of array. This leads on to:
2. char array[n]; The type of &array is char (*)[n], where char is the element type and n is the number of elements in the array. The type of &&nul_byte, if it were valid, would be char **.
3. sizeof(array) == sizeof (array[0]) * n. There is no length (n) attribute for &nul_byte, so it is invalid to make such an assertion with relevance to &nul_byte.

A string is the sequence of characters that is pointed to, not the pointer value itself. What you're referring to as a "string" is in fact what the standard, and every piece of valid documentation calls a "pointer to string". Do not confuse type with representation. char nul_byte = '\0'; /* nul_byte is a string; &nul_byte is a pointer to string */

bar is a function that operates on a pointer, not an array. The declaration of x may look like an incomplete array, but x becomes a pointer long before bar is called; that conversion is part of the translation phase (compilation). Don't confuse execution with translation.
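The parameter adjustment described above can be checked with a small sketch. The function name count_until_zero is hypothetical (it stands in for bar); the point is only that a compiler accepts the two declarations as the same function, because a parameter written as int x[] is adjusted to int *x during translation.

```c
#include <assert.h>
#include <stddef.h>

/* These two declarations are compatible: the parameter "const int x[]"
   is adjusted to "const int *x", so both name the same function type. */
int count_until_zero(const int x[]);
int count_until_zero(const int *x);

/* Hypothetical stand-in for bar(): counts elements before the first zero. */
int count_until_zero(const int *x) {
    int n = 0;
    assert(x != NULL);
    while (x[n] != 0) {
        n++;
    }
    return n;
}
```

Calling it with the address of a single int holding zero and with a real zero-terminated array both work, since inside the function there is only a pointer either way.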

Now, let us address the reasons why your conclusion is invalid.

1. In the above example, nul_byte is a string but it is not an array. Another example could be: unsigned int foo; memset(&foo, 0, sizeof (foo)); /* foo now stores a string */
2. In another more common scenario, a string does not need to span the entire length of an array. If the array does not terminate at the first '\0', it is not a string because a string must terminate at the first '\0'. Consider the following code:
char foo[512] = "hello";
1. A string ends at the first '\0', which is foo[5].
2. The array, foo ends at foo[511]. It does not end at the first '\0'.
3. The array, foo is not a string, however foo does store a string that terminates at foo[5].
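A minimal sketch of the foo[512] case, assuming a hypothetical helper named bytes_after_terminator: it measures how much of the array lies beyond the string that the array stores.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: number of bytes in buf that come after the first
   '\0'. The caller must pass a buffer that contains a terminator. */
size_t bytes_after_terminator(const char *buf, size_t buf_size) {
    const char *nul = memchr(buf, '\0', buf_size);
    assert(nul != NULL); /* otherwise buf does not store a string at all */
    return buf_size - (size_t)(nul - buf) - 1;
}
```

For char foo[512] = "hello"; the string occupies foo[0] to foo[5], and the remaining 506 bytes belong to the array but not to the string.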

Plebbeh (talk) 03:04, 15 January 2011 (UTC)

Incorrect assumption regarding string representation efficiency

I find the following statements most questionable: Getting the "tail" of a C string can be done by just making a pointer to the middle of it. This is far faster than any other string representation, most of which require memory to be allocated and the tail copied.

The last statement is vague and could even be considered invalid due to possible confusion regarding implementation and specification. I'm going to consider that the author meant to refer to speed in terms of time complexity.

"Making a pointer" is an O(1) operation in C. However, determining whether the "tail" really is a tail of a string is, in some cases, not O(1). One must ensure such a pointer points to a location before the end of the string. When the string is input at runtime and the length is arbitrary, this involves finding the length of the string (which is linear time, O(n), as stated in the article in a section I may soon be modifying). I don't see how this is, theoretically, any quicker than "copying the tail" (also O(n)).

Consider a representation of "strings" (where the term "string", irrelevant to the C standard, does not require a '\0' terminator) that has an out-of-band representation for the length. One can immediately determine, from the length prefix, whether or not the "tail pointer" points within the string; determining the length becomes constant time, O(1).

Such an implementation may use functions such as scanf to retrieve the string, and the length with the same call. The same could be done with regular "C strings", but the point I'm arguing is that C strings do not provide a mechanism to retrieve the length in O(1), so one can not verify whether or not a pointer points into a string in O(1) using strictly those mechanisms only provided by C strings.

I'd like to propose: Getting the "tail" of a C string can be done by just making a pointer to the middle of it. This is at least as fast as any other string representation, providing the length of the string is available in O(1) constant time. Plebbeh (talk) 11:07, 17 January 2011 (UTC)
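As a rough sketch of the length-prefixed idea, assume a hypothetical layout where byte 0 holds the length (so at most 255 characters) and the characters follow with no terminator. The name lp_tail is made up for this example. The offset check is then a single comparison:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout (not from any standard): p[0] is the length,
   p[1] .. p[p[0]] are the characters, and there is no '\0' terminator.
   Returns a pointer to the tail starting at offset and stores its length,
   or returns NULL if offset lies outside the string. O(1) either way. */
const unsigned char *lp_tail(const unsigned char *p, size_t offset,
                             size_t *tail_length) {
    if (offset > p[0]) {
        return NULL; /* offset is past the end: not a tail */
    }
    *tail_length = p[0] - offset;
    return p + 1 + offset;
}
```

The same check on a bare C string needs a scan for the terminator unless the caller already knows, by some other means, that the offset is in range.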

Holy crap.
It has nothing to do with the search. It is the RETURN VALUE!!!!!
The return value of strchr() can be passed to a function that takes a string (or a "pointer to a string" as you insist). To the function that takes the pointer to a string, this value is indistinguishable from a pointer to a copy of the tail of the string passed to strchr.
Okay, let's see how we will replace that with std::string. The obvious answer is to make the new strchr return an index. Unfortunately an index is not sufficient: you need the original string plus an index. This means the second call must be modified to take two arguments, the code must keep the original string in a variable to pass to the second, and you will need to add a zero index to all other calls to that second function. This is much too hard for any automatic conversion, so all attempts I have seen (things like C interpreters, converters of C to Python/C#/Java, etc.) resort to having the strchr replacement duplicate the tail of the string.
Therefore, after the O(n) search (which both do), the replacement has swapped an O(1) add for at least an O(n) copy. More importantly it has added a memory allocation and free, which for normal strings is hundreds of times more expensive than copying the text.
I do not understand your "length" stuff at all. Being able to call strlen() does not prove something is a C string. Also you can take the tail of a C string without knowing the length (finding out the length of the tail is O(n) but creating the tail itself is O(1)). Spitzak (talk) 15:48, 17 January 2011 (UTC)
It's difficult to understand what your message is conveying. Your language causes me to believe you're upset. I suggest taking a walk through the park, taking medication, drinking a beer, whatever makes you feel less like an angry teenager. Once you've done that, come back and read the rest of my talk, write some code and explain yourself clearly... and remember, I don't make up these rules. I follow the rules set out by n1256.pdf so that my comments, and my code is correct. I do it so that I know my programs run reliably, and so that I know they will continue to run reliably in the real world. Regardless, if you're not going to make your point easy to understand, just skip to the last paragraph (the quote by Steve Summit) and save everyone the brain meltdown ;)
I'd just like to point out that the statement implies that C strings are always far faster than every other implementation at finding tails. Write me some example code and I'll prove to you that C strings are almost never far faster than every other implementation, but rather more likely at least as fast as, or possibly slower than, some implementations at finding the tail.
std::string is just one alternative, and an irrelevant one. Your statement, and the fact that you justified it using a single implementation seems to take the form of the following line of reasoning: "All men are humans. Jenny is a human. Therefore, Jenny is a man." Though the argument is correct under certain circumstances, it is invalid because it is not always correct.
What does strchr() have to do with it? That's just one mechanism for "finding the tail". What you're refusing to realise is that the statement I'm questioning is invalid because there are situations where it is not true. Would you like me to propose a "string representation" that permits you to do character searches in O(1)? Write some code...
Why isn't the search for the tail important? What does the "RETURN VALUE" have to do with efficiency?
As Steve Summit wrote, "I'm not a Standard-thumping fundamentalist who worships at the altar of X3J11 because I'm an anal-retentive dweeb who loves pouncing on people who innocently post code containing void main() to comp.lang.c; I'm a Standard-thumping fundamentalist who worships at the altar of X3J11 because it gives me eminently useful guarantees about the programs I write and helps me ensure that they'll work correctly next week and next month and next year, in environments I haven't heard of or can't imagine or that haven't been invented yet, and without continual hands-on bugfixing and coddling by me."
Plebbeh (talk) 17:12, 17 January 2011 (UTC)
: I agree with Plebbeh. The property we're interested in can be rephrased as: a const substring of a C string, that includes its tail, can be quickly acquired (O(1) complexity). However, other types of character storage can do it equally well or better. For example, a struct holding a dynamic array of chars and its length can certainly give any const substring in O(1) time. Going even further we can see that the O(1) complexity of the C string comes from the fact that during the computation it implicitly knows its length. So it's more a struct I described previously than genuine C string. Following this argumentation we see that a real C string can not know its length or it's not a string. So in reality we can access the tail in only O(n). This certainly contradicts the material currently presented in the article. 1exec1 (talk) 17:28, 17 January 2011 (UTC)
Here is a function that takes O(1) time that returns the tail of a C string consisting of all letters except the first one:
const char* returnTail(const char* string) {
  return string+1;
}
Yes it is possible to make a safe string representation that consists of both a pointer to the storage+length and an offset. However the only purpose of this would be C string compatibility: if it were not for the need for a drop-in replacement, functions like strchr would be replaced with new ones that return the index, and calls that use the result would be altered to take the string and index. The fact is that no popular replacements do this, so converting a program that uses C strings to any other string representation introduces unexpected slowness, which is why the conversion did not happen 10 years ago. Spitzak (talk) 19:37, 17 January 2011 (UTC)
... const char *tail = returnTail(""); How do you ensure tail points to a location within the string? Any other string representation can make the same stupid assumption, and it leads to undefined behaviour. Plebbeh (talk) 00:31, 18 January 2011 (UTC)
Let's assume the documentation for returnTail says "results are undefined if the string has a length less than one". Or you could use this version, which returns string unchanged if it has a zero length or if it is NULL:Spitzak (talk) 02:01, 18 January 2011 (UTC)
const char* returnTail(const char* string) {
  if (string && *string) return string+1;
  else return string;
}
Ahh, my point exactly. The statement I'm questioning does not provide such a specific definition of "tail". The fact that it is unspecific means the statement is invalid, because there are interpretations where it is incorrect. What part of the statement I'm questioning states that 'the documentation for returnTail says "results are undefined if the string has a length less than one"'? If you add that to the statement I'm questioning, what meaning does it add? Feel free to add it for clarity, but I suggest you be aware of what other people may think... I'm clearly not the only person who understands, and it is clear that you understand even with the show of arrogance. If you act too arrogant, people might mistake you as an idiot. Is that what you want? Plebbeh (talk) 13:36, 18 January 2011 (UTC)

More on tail()

Obviously you can return an index or some structure containing the string and an index quickly. The problem is that this return value is not a string. It is a string plus an index!!! I think the thing I am not explaining correctly and is producing the argument is that I am failing to explain why this difference is important.

Direct mechanical replacement of C strings will often produce incredibly slow code. Imagine this is replaced with string objects:

strcat(dest, strchr(source, 'x'));

In order to pass the result of the strchr replacement with the argument to the strcat replacement, it has to produce the tail string, and destroy it after use. This is literally thousands of times slower than strchr or any search! Check how much code malloc and free do if you do not believe me.
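For reference, here is a sketch of the C-string side of that one-liner with the checks a real caller would want. The name append_tail_from and the dest_size parameter are assumptions made for this example; the original line has neither.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: appends to dest the tail of source that starts at
   the first occurrence of c. Returns dest on success, or NULL (leaving
   dest untouched) if c does not occur or the result would not fit. */
char *append_tail_from(char *dest, size_t dest_size,
                       const char *source, int c) {
    const char *tail = strchr(source, c); /* O(n) search */
    if (tail == NULL) {
        return NULL;
    }
    if (strlen(dest) + strlen(tail) + 1 > dest_size) {
        return NULL;
    }
    /* tail is handed on as an ordinary string; no copy of it was made */
    return strcat(dest, tail);
}
```

The pointer returned by strchr goes straight into strcat without any allocation, which is the property under discussion.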

Allocation of objects is irrelevant to string operations. An object can be created and re-used thousands of times before it is destroyed. This is precisely what you're doing with your strcat. Otherwise, wouldn't you expect strchr to return strdup(strchr(source, 'x'))? Plebbeh (talk) 01:28, 20 January 2011 (UTC)

I believe the people arguing with me see an "obvious" replacement that avoids the tail. Indeed, rewriting the above code to something like this will produce equally fast code (or faster, if as some have argued, some later strlen() is sped up by the string replacement):

strcat_tail(dest, source, strindex(source, 'x'));

You could even imagine some clever overloaded way of doing this automatically, such as having strchr return a different string+index object and overloading strcat to take either this or a normal string. But if you have ever looked at some typical C string handling code you will find that this quickly becomes a nightmare, primarily because the same variable is often used to hold both normal strings and index results. All the C interpreters and translators to other languages like C# I have seen just punt and use tail(). I would not be surprised if this is responsible for half of the perception that interpreting code is slow.
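A possible shape for the two functions in the string+index rewrite, sketched on top of plain C strings just to show the calling convention. They are named str_index and str_cat_tail here (hypothetical names) to stay out of the str-prefixed namespace that the C library reserves; neither exists in any standard library.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical: index of the first c in s, or -1 if c does not occur. */
long str_index(const char *s, int c) {
    const char *p = strchr(s, c);
    return p != NULL ? (long)(p - s) : -1;
}

/* Hypothetical: appends source[index..] to dest. The caller guarantees
   that 0 <= index <= strlen(source) and that dest has enough room. */
char *str_cat_tail(char *dest, const char *source, long index) {
    assert(index >= 0);
    return strcat(dest, source + index);
}
```

Note that the caller now has to carry both source and the index to the second call, and has to deal with the -1 case itself, which is the extra bookkeeping described above.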

It is possible to design a string replacement where taking the tail is fast, here is an example:

// Implementation of fast-tail string object:
struct string_storage {
  unsigned reference_count; // since more than one string points here
  unsigned length;
  char data[]; // flexible array member; no NUL terminator is needed
};
struct string {
  struct string_storage* storage;
  unsigned index; // normally zero but tail() will set to non-zero
};

// function to return the n'th character of a string (no error checking):
char string_index(const struct string* self, unsigned i) {
  return self->storage->data[self->index+i];
}

// Function to return the tail (no error checking):
// also please note that this does not look at the length!
struct string string_tail(const struct string* from, unsigned i) {
  struct string result;
  from->storage->reference_count++; // the tail shares the same storage
  result.storage = from->storage;
  result.index = from->index + i;
  return result;
}

However no string replacements I am aware of do this. The index value serves no purpose other than making "tail" faster, which as you pointed out is trivial to replace if you are allowed to change the arguments to further functions to take a string plus an index. (the overhead of reference counting is also questionable, g++ std::string does it while Windows std::string does not). Therefore to the string implementers and users it looks like an unnecessary slowing of all string access and a bad compromise for C string compatibility.
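To make the cost trade-off concrete, here is a self-contained sketch of such a storage+index string including creation and release. All names (rc_storage, rc_string and the rc_string_* functions) are hypothetical, and this is one possible design under those assumptions, not how any existing library does it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct rc_storage {
    unsigned reference_count; /* how many rc_string values share this */
    unsigned length;
    char data[];              /* no NUL terminator is stored */
};

struct rc_string {
    struct rc_storage *storage;
    unsigned index;           /* where this string starts inside data */
};

/* Copies length bytes of text; storage is NULL if allocation fails. */
struct rc_string rc_string_new(const char *text, unsigned length) {
    struct rc_string s = { malloc(sizeof(struct rc_storage) + length), 0 };
    if (s.storage != NULL) {
        s.storage->reference_count = 1;
        s.storage->length = length;
        memcpy(s.storage->data, text, length);
    }
    return s;
}

unsigned rc_string_length(struct rc_string s) {
    return s.storage->length - s.index;
}

/* O(1) tail: no allocation, no copy, only a reference count bump.
   The caller guarantees i <= rc_string_length(from). */
struct rc_string rc_string_tail(struct rc_string from, unsigned i) {
    struct rc_string t = { from.storage, from.index + i };
    assert(i <= rc_string_length(from));
    from.storage->reference_count++;
    return t;
}

/* Drops one reference and frees the storage when the last one goes. */
void rc_string_release(struct rc_string s) {
    if (--s.storage->reference_count == 0) {
        free(s.storage);
    }
}
```

Every string value carries the extra index field and every copy touches the reference count, which is the overhead implementers would be weighing against a fast tail.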

This is the whole point of the paragraph you want to delete, because a lot of programmers cannot figure out why C strings have not been replaced yet. Sometimes they believe almost magical speed capabilities of C strings to try to explain this. I am trying to explain the actual reason. Perhaps somebody can come up with a clearer explanation, I have tried and apparently failed.Spitzak (talk) 16:45, 19 January 2011 (UTC)

The string struct you are talking about is not C string. From the standard's point of view it is just some random struct that happens to use C string as its storage device. Similarly, I can always create a struct for a particular storage type, and be able extract tail efficiently. The storage device can be whatever you want: an array of characters, std::vector, std::string, etc. Thus your point still does not hold. 1exec1 (talk) 18:06, 19 January 2011 (UTC)
I am trying to write a paragraph about "WHY C STRINGS ARE HARD TO REPLACE". Obviously any potential replacement is not a C string! Also the char array is not a C string in the example structure, there is no need for a NUL terminator. Please reread what I wrote above, I find it hard to believe I was that unclear.Spitzak (talk) 18:37, 19 January 2011 (UTC)
The actual reason why bare C strings are still widely used is that there are a lot of C coders out there. Even in C++ it is not uncommon that someone who says he's writing C++ is actually writing C with classes, i.e. without using any benefits of the C++ standard library. Also, there's no reason to convert working legacy code, so usually the programs are left as-is. In conclusion, the reasons for not adopting better replacements for C strings do not lie in the C string itself, thus IMO they do not need a separate paragraph. 1exec1 (talk) 19 January 2011 (UTC)
"In order to pass the result of the strchr replacement with the argument to the strcat replacement, it has to produce the tail string, and destroy it after use. This is literally thousands of times slower than strchr or any search! Check how much code malloc and free do if you do not believe me." Allocation of objects is irrelevant to string operations. An object can be created and re-used thousands of times before it is destroyed. This is precisely what you're doing with your strcat and strchr: reusing the same object.
I think it is pretty obvious from the sample code that the result of strchr is used exactly once! Spitzak (talk) 04:59, 20 January 2011 (UTC)
"You could even imagine some clever overloaded way of doing this automatically, such as having strchr return a different string+index object and overloading strcat to take either this or a normal string." Or perhaps you could use a representation that can do the equivalent of strchr in O(1).
I showed exactly such a representation in the sample code where the "string" has both a pointer to a storage and an index to the actual start of a string. This one allows a tail to be constructed in O(1) time. The problem is that such representations are not being used by potential C string replacements.Spitzak (talk) 04:59, 20 January 2011 (UTC)
You need to form logical reasons as to why C strings are faster (as the language says). I'm attempting to debate "there is no reason any other representation can't be just as fast". You're only making my debate easier by saying, implicitly, "this representation is no faster than a representation that doesn't require O(n) searches, or O(n) concatenation". Store the length, and generate a 'lookup table' that will give the offset of each character in O(1). Both of these come from input. For example, consider the loop in which bytes are read from the keyboard and stored into such a string object. During the read loop, one can store bytes AND generate a lookup table. Prior to the read loop, one has access to the length of the string in O(1), and can store it for future use. One can now look up the index of any character and the length of the string in O(1).
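A minimal sketch of the lookup-table idea, under some assumptions: the table is built from a buffer that has already been read (rather than inside the read loop itself), it records only the first occurrence of each byte value, and all names (indexed_str, NOT_FOUND and the indexed_str_* functions) are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NOT_FOUND ((size_t)-1)

struct indexed_str {
    const unsigned char *data;
    size_t length;                 /* stored, so the length is O(1) */
    size_t first[UCHAR_MAX + 1];   /* first[c]: offset of first c */
};

/* One O(n) pass, done once when the string is created. */
void indexed_str_init(struct indexed_str *s,
                      const unsigned char *data, size_t length) {
    size_t i;
    for (i = 0; i <= UCHAR_MAX; i++) {
        s->first[i] = NOT_FOUND;
    }
    s->data = data;
    s->length = length;
    for (i = 0; i < length; i++) {
        if (s->first[data[i]] == NOT_FOUND) {
            s->first[data[i]] = i;
        }
    }
}

/* O(1) equivalent of a strchr-style search: a single table lookup. */
size_t indexed_str_find(const struct indexed_str *s, unsigned char c) {
    return s->first[c];
}
```

The table costs one size_t per possible byte value for every string and has to be rebuilt if the contents change, so it trades memory and update cost for the O(1) search.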
At no time did I say that strchr() is O(1). The operation that is O(1) is "take the tail of a string starting at n", which is what strchr() returns after it does the O(n) operation of searching for the character. I suppose strchr is a bad example because of this. A better one would be code that does string+n where n is an integer variable, that has the same problem when an attempt is made to replace it with std::string or similar.Spitzak (talk) 04:59, 20 January 2011 (UTC)
... and your code still doesn't ensure that the "tail" is actually a tail, as opposed to undefined behaviour. You're really bad at this. No offense.
Again, holy crap! Do I really have to do something other than put a comment in there that says "no error checking"? These are EXAMPLES, not some kind of perfect code. I put that damn comment in there in an attempt to stop you from coming up with this sort of bogus argument, but that is not going to stop you, apparently. How about a longer comment: "this function will return the tail of the string provided the n argument is between 0 and the string length. The result is undefined if n is outside this range and this is IRRELEVANT because it is assumed in this example that the caller is a function such as strchr and thus n must be inside this range!"Spitzak (talk) 04:59, 20 January 2011 (UTC)
"Sometimes they believe almost magical speed capabilities of C strings to try to explain this." Citation needed. In addition to what 1exec1 added above, C strings have not been replaced because the C standard does not like to concern itself with opaque types and pointer masking. Why do you think they have fopen return FILE * instead of "file"? It's to keep the library simple, so that C code can easily be translated to run on embedded platforms.
C interpreters do in fact replace char* with a more complex opaque type.Spitzak (talk) 04:59, 20 January 2011 (UTC)

Plebbeh (talk) 01:28, 20 January 2011 (UTC)

Ok an obvious confusion appears to be that I mentioned strchr(), which is O(n) in C string.

The code that C strings do in O(1) is "return the tail of the string starting at int n". That is the function I am attempting to replace to show why replacements for C strings are slow. Imagine C strchr was written like this (I have used no dummy function calls and made sure this matches the C standard so you cannot complain about it):

char* strchr(char* s, char c) {
  int n;
  for (n = 0; s[n] != c; n++)
    if (!s[n]) return 0; // not found
  // LOOK AT THIS NEXT LINE!!!! THIS IS THE TAIL OPERATION:
  return s+n;
}

Please look ONLY at the line that says "return s+n". This is O(1) and is the same as "return a string equal to the tail of s starting at n". Please also notice that due to the previous code it is IMPOSSIBLE for n to be outside the length of s and therefore this ALWAYS works and there is no need to call strlen().

All popular C string replacements must replace this line with SLOW code.

Get it? I'm guessing you probably don't.

I have probably been writing C code since well before you were born, too, and I damn well know what I am doing a lot more than you do.Spitzak (talk) 05:19, 20 January 2011 (UTC)

If you know what you're doing, why is your strchr taking "char c" instead of "int c"? My point doesn't seem to be getting through to your head, so I'm going to try a different approach.
int main(void) {
    char *foo = strdup(strchr("hello world", ' ') + 1);
    printf(foo);
    free(foo);
    return 0;
}
Is it reasonable to expect that a compiler that does smart enough optimisations may realise that foo is never modified, and thus eliminate the calls to strdup and free? Is it also reasonable to expect a compiler for another language that implements complex string operations to make similar optimisations? If such optimisations are implemented, how does this affect the statement "This is far faster than any other string representation, most of which require memory to be allocated and the tail copied."? Your experience with C prior to 1988 is likely irrelevant to the current context. Oh, master of the ancient wisdom. Do your age and experience make you incapable of producing errors? Nonsense. To the contrary: It has been observed that people's brains degenerate as they get older, to the point where they're incapable of thinking reasonably. Now that you've been brought back to reality, you've picked a specific scenario for a vague statement. My suggestion was to be more specific, but you want to stand in the way of progress (typical old person)... So here goes:
#include <stdio.h>
#include <assert.h>

void PrintFooString(unsigned char *str, int length);

int main(void) {
    unsigned char Foo[512];
    int Length;
    unsigned int Offset;

    // read a string... irrelevant, but O(n)
    printf("Enter your string:\n");
    // the call is kept outside assert() so that NDEBUG cannot remove it
    if (scanf("%511[^\n]%n", Foo, &Length) != 1) { return 1; }

    // Just to make sure we no longer have a C string...
    Foo[Length] = 1;

    // read an offset... This is where the tail will start, and you're right... there's more than one way to find a tail ;)
    printf("Enter your offset:\n");
    // the call is kept outside assert() so that NDEBUG cannot remove it
    if (scanf("%u", &Offset) != 1) { return 1; }

    // check that the offset is valid. O(1) here. Lets see what you can do? <----- THIS IS IMPORTANT!
    if (Offset < Length) {
        printf("The tail at offset %u:\n", Offset);
        // pointing to the tail string is still O(1)! printing the string is irrelevant, but O(n).
        PrintFooString(Foo + Offset, Length - Offset);
        putchar('\n');
    }
    else {
        puts("Stop asking me to do bad things!");
    }

    do { Length = getchar(); } while (Length >= 0 && Length != '\n');
    getchar();

    return 0;
}

void PrintFooString(unsigned char *str, int length) {
    while (length > 0) {
        putchar(*str++);
        length--;
    }
}
Is there any reason an optimising C# implementation can't operate the same way as I have, above (and only duplicate the string if changes are made)? Write me some code that functions as the code above does, faster. You must make sure the offset is below the length as I have, otherwise you can't guarantee that what you have is a tail string. I was arguing that the wording be changed to at least as fast as (under the guise that the length is known), but you're arguing against that so I guess that means you think C strings can do it faster (or slower). Ohh, and one more thing: Stop confusing implementation with specification. Plebbeh (talk) 12:49, 20 January 2011 (UTC)
... and here's some code more relevant to your problem:
#include <stdio.h>
#include <assert.h>
#include <limits.h>

void GetFooString(size_t n, unsigned char Foo[n]);
void PrintFooString(unsigned char *str, size_t n);

int main(void) {
	unsigned char Foo[512];
	int x;

	GetFooString(sizeof (Foo), Foo);

	do { x = getchar(); } while (x >= 0 && x != '\n');
	getchar();

	return 0;
}

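void GetFooString(size_t n, unsigned char Foo[n]) {
	// These allocations are typically O(1) when a "call-stack" is involved.
	// Bar[c][0] counts the occurrences of c; Bar[c][1..] hold their offsets.
	unsigned int Bar[1 << CHAR_BIT][n + 1];
	unsigned int x, len;
	int y;

	// a VLA cannot have an initialiser, so zero the counters by hand
	for (x = 0; x < (1u << CHAR_BIT); x++) { Bar[x][0] = 0; }

	// read the string... O(n)
	printf("Enter your string:\n");
	for (x = 0; x < n; x++) {
		int c = getchar();
		if (c == '\n' || c < 0) { break; }
		Bar[c][++(Bar[c][0])] = x;
		Foo[x] = c;
	}
	len = x; // number of characters actually stored

	// print out each search result. O(1) to search. O(1) to point to each result. O(n) to print each result.
	puts("Enter your search char:");
	y = getchar(); // kept outside assert() so that NDEBUG cannot remove the call
	assert(y >= 0);
	for (x = 1; x <= Bar[y][0]; x++) {
		PrintFooString(Foo + Bar[y][x], len - Bar[y][x]);
		putchar('\n');
	}
}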

void PrintFooString(unsigned char *str, size_t n) {
	while (n > 0) {
		putchar(*str++);
		n--;
	}
}
I do hope you've come to realise that bringing age into a debate when it's irrelevant can be a very bad idea ;) Plebbeh (talk) 14:40, 20 January 2011 (UTC)

I still cannot believe you are failing to understand this. Let's take your first example, with a single change to add a temporary assignment:
int main(void) {
    char* temporary1 = strchr("hello world", ' '); // O(n)
    char* temporary2 = temporary1+1; // LOOK HERE ONLY! O(1)
    char *foo = strdup(temporary2); // O(n)+malloc
    printf(foo); // O(n)
    free(foo); // O(1)+free
    return 0;
}

Please look at the line that says "LOOK HERE ONLY!" and the fact that the execution of that line is O(1). Do you see that line!

Now please rewrite the above to use std::string, keeping the temporaries. Because you seem to think when I say "replace the C string with something different" means "find the NUL and replace it with non-zero" (I have no idea why you think that) I will do this for you. There are no char* variables in this code because they are REPLACED with something different:

int main(void) {
  std::string temporary1(strchr("hello world", ' '));
  std::string temporary2(temporary1.substr(1, temporary1.size())); // LOOK HERE ONLY!!!!!
  std::string foo(temporary2);
  std::cout << foo;
  return 0;
}

Notice the replacement constructs a new string object! In all versions of string objects I have seen, this involves allocating a new buffer and copying the text. (If you look above for my "string_storage" example you can see that I am well aware that it is possible to write a string replacement where this is not true, however it is not commonly done).

I also suspect you are being confused by the strdup replacement on the next line. It is true that in GNU libstdc++ this does not make a copy, because they use reference counting so both the new and old strings point at the same buffer (this is also true of my string_storage example, but it is NOT true of Microsoft's C++ library, where each std::string owns its own buffer). So the O(n) strdup() call may have been replaced with an O(1) string copy. And yes, this exactly makes up for the difference between "string+1" and "substr()". This is why I have put "LOOK HERE ONLY" everywhere. You seem to think that because the slowness might be made up for somewhere else, the slowness does not exist. Anyway, in real code "string+n" is done many more times than strdup or strcpy, so this does not balance out.

Your other objection that optimizing compilers can get around this I also mention above (I say you could use overloading to make the strchr replacement (or addition) return a different string+offset object). The problem is that without full program optimization this will not work, because actual C string usage tends to look like this:

 extern void foo(const char*);
 void f(const char* string) {
     foo(string);
     foo(string+1);
 }

This breaks down unless the optimizer actually reimplements the external function foo as two versions (one which takes a string object and another that takes a string object plus an offset). I think any realistic optimizer would end up replacing all strings with string+offset pairs (as my string_storage example does). This is certainly a possible solution; all I can say is that I have never seen it done, even in strings designed for use by C interpreters. Spitzak (talk) 16:48, 20 January 2011 (UTC)

You seem obsessed with the fact that my tail() function might crash if the length is less than the offset. Ok here is a function that NEVER fails but returns the tail of a string by discarding the first byte. It always works even if the string is less than one character long. And amazingly enough it does this without using strlen() despite all your claims that this is impossible:

 char* tail(char* string) {
   if (!string) return ""; // just in case you complain about this
   // Now I will use my amazing oldie programming abilities to figure out strlen()<1 in O(1) time!
   // Obviously this talent has been lost on the young un's. Now get off my lawn:
   if (string[0] == 0) return "";
   // And to further amaze you I will now return the tail again without using strlen()! Wow!
   return string+1;
 }

Please write the equivalent of the above using std::string.Spitzak (talk) 17:08, 20 January 2011 (UTC)


Unfortunately it is you who does not understand. Is a statement always true when you can find a single scenario where it is true? Example: If the car moves then it must be moving now. The car is not moving. Therefore, the car does not move.
Your first apparent line of reasoning appears to be: If the string representation is not a C-string representation, a new object must be created. The string representation is not a C-string representation. Therefore, a new object must be created.
This is incorrect because the premise is incorrect, as proven by the representation I chose as an example. In my example, a string is represented within an array without a '\0' terminator. This is not a C-string representation, and I didn't need to create an object to store the tail characters independently of the source string.
Your next apparent line of reasoning appears to be: If a new object is created, the implementation will surely be slower. A new object is created. Therefore, the implementation is slower.
This is also incorrect, but it is entirely irrelevant. A new object needn't be created. Regardless, imagine a C-like implementation that wraps the return value of every string function with strdup(), so rather than obtaining the tail as part of the same object you're obtaining a new object. A new object is not required to store the tail, but there would be a lot of free() calls in your code. What makes you think that the compiler couldn't deduce whether or not the side-effects of the strdup() and free() aren't necessary, and make optimisations? Maybe yours doesn't perform such optimisations, but some might realise that a new object is not required to store the tail and are free to do so by the standard. When you talk about different languages, however, you never know... Some of those you've mentioned might be required to recognise that a new object is not required to store the tail and perform those optimisations. ;) I blame senile degeneracy. You've probably forgotten by now: A new object needn't be created, so none of this paragraph means anything. It might be a good idea to address my response to your first premise.
Plebbeh (talk) 02:12, 21 January 2011 (UTC)

Ohh, let us not forget the third incorrect line of reasoning: C strings are faster than std::string. Therefore, C strings are faster than every other string representation.
Since when did I mention std::string, you idiot?! std::string is not every other string representation, and it has about as much to do with this debate as cheese. I suggest seeing your doctor; I'm starting to worry about a rapid onset of dementia. Plebbeh (talk) 02:23, 21 January 2011 (UTC)

I think I see what you are arguing, though you should have used a structure. The only relevant line in all your code is this one:

  PrintFooString(Foo + Offset, Length - Offset);

This is a pointer+length representation of string that allows O(1) tail, as well as O(1) substring. The fact that you did not put the two fields into a structure made this really unclear. Also there was no need to change the nul in the buffer before hand, as the implementation will never look there!

If you add a reference count it is very close to the string_storage example I gave above (yours has the additional advantage that taking a substring is also O(1)). I would write it as follows:

 struct string_storage {
   int reference_count;
   char data[];
 };
 struct string {
   char* pointer;
   int length;
   struct string_storage* storage;
 };
 // O(1) tail (no error checking):
 struct string tail(struct string s, int n) {
   struct string result;
   s.storage->reference_count++; // the tail shares the same storage
   result.pointer = s.pointer + n;
   result.length = s.length - n;
   result.storage = s.storage;
   return result;
 }

So yes, you seem to be aware that it is possible to write a string representation that does tail() in O(1) though you sure took way too much text to do so. I am also aware of this and wrote the string_storage example WAY WAY above here trying to show you that I am well aware of this as well.

The problem is that REAL proposed replacements for C string do not do this. This is because of the (probably logical) assumption by the writers of the string class that "tail" is actually a very rare operation and it is silly to optimize that while adding overhead to all other string operations. It also requires reference counting which some consider too expensive (although GNU libstdc++ does do this).

This is probably pointless but let me try this version which tries to show exactly how often "tail" is done in real C code and I carefully avoid any calls to O(n) functions that seem to have obscured the point:

Proposed text for "alternatives" section

Replacing C strings in existing code with an alternative representation can often cause unexpected and severe slowness. This is due to the quirk of C strings that a pointer into the middle of the string is equivalent to a new string equal to the "tail" of the original string. In most alternatives to C string "tail" is a very slow operation, often involving allocating a new character buffer and copying the characters to it. For example in this C code:

 void print(const char* s) {
   while (*s) {
     putchar(*s);
     s++;
   }
 }

A literal replacement with some "string" object would produce this code, with many calls to "tail":

 void print(string s) {
   while (length(s)>0) {
     putchar(s[0]);
     s = substring(s, 1, length(s)-1); // "tail" operation
   }
 }

Any programmer writing this function from scratch would do something like this, but it is a non-trivial convolution of the original source code:

 void print(string s) {
   for (int n = 0; n < length(s); n++)
      putchar(s[n]);
 }

Modern optimizers and code analyzers can usually fix simple examples as shown above, but more complex C string handling usually outwits them. It is also possible to make a string replacement designed so that tail() is fast, though this may make other operations slower.

Spitzak (talk) 03:23, 21 January 2011 (UTC)
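The last sentence of the proposed text can be illustrated with a minimal sketch. It assumes a hypothetical non-owning pointer+length type (str_view, view_tail and view_print are names made up for this example, not part of any library). With such a representation the literal translation of the C loop keeps an O(1) "tail" step:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical pointer+length string. It does not own its bytes, so the
   caller must keep the underlying buffer alive. */
struct str_view {
    const char *ptr;
    size_t len;
};

/* O(1) tail: drop the first n bytes, clamped to the length of the view. */
struct str_view view_tail(struct str_view s, size_t n) {
    if (n > s.len)
        n = s.len;
    s.ptr += n;
    s.len -= n;
    return s;
}

/* Literal translation of the C-string print loop: one tail per byte. */
void view_print(struct str_view s) {
    while (s.len > 0) {
        putchar(s.ptr[0]);
        s = view_tail(s, 1);
    }
}
```

The sketch leaves out ownership entirely. Keeping the buffer alive for as long as any view refers to it is the problem that the reference count in the string_storage example above is meant to solve, and that is where the extra cost would come from.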

The reason nothing like this has been implemented is because, as I mentioned earlier, the C standard does not concern itself with complex structures, typedefs and pointer hiding. What purpose does "reference_count" have? Taking a substring is not O(1). Finding a tail by character is O(1). I could have put the two fields into a structure, but that would have meant I'd have to create a new object for the substring. ;) Besides, it would have bloated the code more. I wrote what was necessary to prove my point, and my point is still that C strings are not faster than every other string representation at obtaining a tail. I don't think it's suitable to pit the C string representation against an ambiguous variety of string representations and say "it's far faster". Aren't "can often", "often", "usually", etc. weasel words? In terms of constant time, the representation I defined is no slower at reading a string from input than the C string representation; I could write the other operations to be just as fast, if not faster. In terms of specification, one should not attempt to compare because it is implementation that defines actual speed. The specification does not define any requirements for the actual speed of these functions. "I hereby declare that a call to any standard function that operates on a C string shall take at least one second." If you're comparing implementations, then your argument makes no sense because there are slow C implementations and fast C implementations; a slow C implementation is likely just as slow, or possibly slower at finding the tail of a string as a fast Javascript implementation. However, there is no reason a fast Javascript implementation can't be just as fast as a fast C implementation at finding the tail of a string; it's all about optimisation.Plebbeh (talk) 04:48, 21 January 2011 (UTC)

character vs byte

I think we should name characters as characters since these functions are for character string manipulation. I agree that there's an issue with multi-byte characters, but using bytes doesn't completely remove the source of confusion either, as the reader still must know that there exist non single-byte characters. What if we changed bytes back to characters and added a notice that str* functions operate on single-byte characters?1exec1 (talk) 19:05, 18 October 2011 (UTC)

Saying "it only works on the one-byte characters" is wrong, because the string operations will work on the individual bytes that make up parts of multi-byte characters (for instance you can count the number of characters, assuming no bad encoding, by counting the bytes that don't start with 10 binary, thus there are useful operations you can do working with the bytes). The proper term for the units it operates on is "whatever your C compiler means when you say 'char'" but that is hard to read, looks like the word 'character' misspelled, and 'byte' is probably a much more popular term. The C99 documentation is technically correct because they define the word "character" as being "char", but that is not how the word "character" is defined in any wikipedia article about text.
The main problem is that there are a lot of programmers out there who are just smart enough to do horrible things when they think that strlen() has to return the 'number of characters'. If they were a bit stupider we would be ok because they would not get anything to work. But there seems to be an overlap, perhaps best defined as 'idiot savants' or something, where they will actually write working, but horrific complicated code because they took the word "character" literally. These code writers are probably the biggest impediment to getting Unicode to work. There are active attempts to clear up the documentation, such as the BSD man pages which I was quoting, but there remains a huge amount of legacy documentation, including stuff from standards organizations. Anyway I see no reason not to have Wikipedia use modern notation.Spitzak (talk) 02:05, 19 October 2011 (UTC)
Ok, I agree. C++11 uses byte string to name single byte character strings, so I think it's a good idea to stick with it. 1exec1 (talk) 02:26, 19 October 2011 (UTC)
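The byte-counting trick mentioned in this thread (skip every byte whose top two bits are 10) can be sketched as follows. This is only an illustration: the name utf8_count is made up here, and the function assumes well-formed UTF-8 input and does no validation.

```c
#include <stddef.h>

/* Count UTF-8 code points in a NUL-terminated byte string by counting
   every byte that is not a continuation byte (10xxxxxx).
   Assumes well-formed UTF-8; invalid sequences are not detected. */
size_t utf8_count(const char *s) {
    size_t count = 0;
    for (; *s; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the UTF-8 bytes of "naïve" ("na\xc3\xafve") strlen() reports 6 bytes while this function reports 5 code points, which is the distinction between bytes and characters being discussed.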

Single page for C string functions

Based on Talk:C standard library#Pages for each function and WP:NOTMANUAL

The following pages essentially discuss the same topic of C string functions: string.h, memset, strcpy, strlcpy, strcat, strrchr, strcspn, ctype.h, strcmp, strlen, memccpy, mempcpy. I propose to cleanup these pages by removing the material that fails WP:NOTMANUAL and by merging the remains into C string.1exec1 (talk) 23:29, 8 October 2011 (UTC)

memset, memcpy, and mempcpy don't operate on C strings (that is, NUL-terminated strings). They operate on buffers with a length specified by one of the arguments. However I agree that there should not be one function per page. Jeh (talk) 04:33, 11 October 2011 (UTC)
I am unsure myself what to do with these functions. They do not really belong to C string, that's a fact. On the other hand, IMO they are not significant enough to warrant a separate page, and even if we chose to create one, there's hardly a good descriptive name for it. I myself tried to think of one, but all of them fit better for functions like malloc, e.g. C memory handling, C memory operations, etc. Going further, mem* functions are in the same header as str* functions, and most of the references I could find, e.g. cplusplus.com, cppreference.com, the C standard, preserve such grouping. Thus I think we wouldn't make a big mistake by sticking with established sources. Given these arguments, I think it's a good option to merge mem* to str*. That said, if you've got a sensible alternative, I'd happily reconsider.1exec1 (talk) 09:10, 11 October 2011 (UTC)
If some page is merged into C string, then we are unable to see the content of that page. How can one read the information of that page after merging?
Sagar tikore (talk) 06:58, 11 October 2011 (UTC)
The content wouldn't disappear anywhere, just that it would be placed in this article, not in these separate articles.1exec1 (talk) 09:10, 11 October 2011 (UTC)
Instead of merging the functions in C string we can merge them into string.h
Asmita yendralwar (talk | contribs) 07:58, 11 October 2011 (UTC)
I don't think this would be a good option since it would be inconsistent with other pages discussing C standard library, e.g. C memory operations, C input/output.1exec1 (talk) 09:10, 11 October 2011 (UTC)
I agree with merging all of string.h into this page (there is a table already there) and removing all the trivial pages for the individual functions and changing them to redirects. However some pages with more information, in particular the strlcpy page, need to be preserved (this page contains a bit of history and political intrigue which is interesting but would bloat this page and make the table hard to read). The mem functions should be merged here as well, they are part of string.h and often are used to manipulate c strings.Spitzak (talk) 19:40, 13 October 2011 (UTC)
I also agree to merge all string.h here. Yogesh.rathod07 (talk) 06:40, 17 October 2011 (UTC)
I did a few modifications. In particular I duplicated the old string.h table and text here, it seems to be more accurate and carefully checked. In particular it lists some of the alternative functions which otherwise we lost. I also restored the page for ctype.h as I was under the impression this was to be divided up by header file (and also that file is not really dependent on null-terminated strings). I also restored the gets and strlcpy pages as they had significant text describing historic details and are referred to often from other wikipedia articles. Hope this is all ok.Spitzak (talk) 02:40, 19 October 2011 (UTC)
Oops it looks like I deleted all the external links to C/C++ documentation pages. Probably should be restored.Spitzak (talk) 02:54, 19 October 2011 (UTC)
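The distinction raised earlier in this thread, that the mem* functions work on a buffer of stated length while the str* functions stop at the first NUL, can be shown with a small sketch. The buffer contents and the helper names copy_with_strcpy and copy_with_memcpy are assumptions made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Six bytes with a NUL in the middle: 'a','b',NUL,'c','d',NUL. */
static const char sample[6] = {'a', 'b', '\0', 'c', 'd', '\0'};

/* strcpy treats sample as a C string, so it copies "ab" plus the
   terminator and never looks at 'c' or 'd'. The rest of out keeps the
   '#' filler. Returns the length strlen sees afterwards. */
size_t copy_with_strcpy(char out[6]) {
    memset(out, '#', 6);
    strcpy(out, sample);
    return strlen(out);
}

/* memcpy copies exactly the number of bytes it is told to, embedded
   NUL included. */
void copy_with_memcpy(char out[6]) {
    memcpy(out, sample, sizeof sample);
}
```

After copy_with_strcpy the fourth byte of out is still the filler, while after copy_with_memcpy it is 'c'. This is why the mem* functions are not strictly C string functions, even though they live in the same header.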

strlcpy

I'm going to replace this page with a redirect again. The consensus was to merge all functions that consist of material failing WP:NOTMANUAL, and strlcpy falls into this group of pages. The only section that can be preserved somewhere is Criticism. However, I think we can delete even that material, because it fails WP:NOTABILITY by not having WP:RS to back up the text. The already provided references are not WP:RS because of WP:SPS.
Going further, the page is imported into Wikibooks at b:C Programming/C Reference/nonstandard/strlcpy, so there is absolutely no justification to keep anything failing Wikipedia guidelines here, when the content can be further improved in Wikibooks. 1exec1 (talk) 12:28, 19 October 2011 (UTC)

The criticism is backed by direct quotes from the maintainer of glibc posted on the official glibc mailing list. There was more but it is repeatedly deleted, because of a desire to obfuscate the exact guilty party and to try to claim the argument actually has merit. strlcpy is often mentioned as a indicator of misguided design in Linux and is thus a subject people will look for. I do not think whitewashing this story is good for Linux or for any of the involved parties.Spitzak (talk) 14:38, 19 October 2011 (UTC)
I'm not trying to whitewash or anything. I'm just saying there's not enough notable material to warrant a separate article, especially when strlcpy is not a standard function. That's not to say that we must delete that material - a better idea would be to merge the important bits to C string. In this case, we can create a new section called Extensions, move all the non standard functions there, and to place criticisms and other relevant stuff there. 1exec1 (talk) 16:58, 19 October 2011 (UTC)
While I personally agree with merging the page, you stated that you were going to merge the page but you didn't even do that before blanking it. The best practice according to WP:MERGE is to first obtain consensus, and actually merge the page before redirecting. I'm going to remove the redirect now because clearly the C string page does not cover all of the useful information in the strlcpy page. At the very least, the criticism of strlcpy needs to be addressed before this article can be redirected there. YumOooze (talk) 04:57, 22 October 2011 (UTC)
I think I have addressed your concerns.1exec1 (talk) 12:17, 23 October 2011 (UTC)
There are a lot of books which describe strlcpy, take a look at google books - the article has not been merged - most of the content has been removed with the argument that it violates WP:NOTMANUAL - I think the Section "Usage" could be rewritten by a few edits so it didn't "violate" WP:NOTMANUAL. And I don't understand why you removed this with the statement that it isn't notable that e.g. Linux Kernel has ported the function - the kernel can not use the standard C library as you may know. I am going to remove the redirect again (until the page has been merged into this article) Christian75 (talk) 15:35, 23 October 2011 (UTC)
Do you want to say that all these thousands of functions that are in the Linux kernel deserve a page? Can you find a secondary WP:RS which justifies the inclusion as per wikipedia notability criteria? 1exec1 (talk) 17:53, 23 October 2011 (UTC)
In any case, the current consensus is to merge. See this discussion. As you can see, 5 editors (User:strcat, User:Vadmium, Ruud, Michael, User:1exec1) are for the merge, 2 users against (User:Spitzak, User:Christian75) (please fix if I'm wrong). So I have strong reason to undo your changes. Please establish new consensus before reverting. 1exec1 (talk) 18:14, 23 October 2011 (UTC)
Maybe there was a consensus for the idea of merging many articles but judging by the noise on my watchlist I dunno if there is much consensus for this particular merge from strlcpy into C string. How about something in between like renaming as strlcpy and strlcat? Sources and Wikipedia references seem to group them like that anyway. Vadmium (talk, contribs) 10:19, 24 October 2011 (UTC).

strcpy

I am going to replace this page with a redirect again. Almost all content fails WP:GNG, because the only secondary source I could find, that supports the material, is man pages, which is not WP:RS. The remaining is already at C string. Since there has been no recent attempt to fix these issues, except one editor who reverts page blanking, I assume that there is no genuine interest in the article.

I will not bring the article to WP:AFD because there is no intent to delete the page proper. Undoable action should be discussed in the talk page (see WP:AFD, specifically 'For problems that do not require deletion, including <...>, be bold and fix the problem or tag the article appropriately'). 1exec1 (talk) 13:29, 26 October 2011 (UTC)

strcat

I am going to replace this page with a redirect again. This page is in exactly the same situation as strcpy. Almost all content fails WP:GNG, because the only secondary source I could find, that supports the material, is man pages, which is not WP:RS. The remaining is already at C string. Since there has been no recent attempt to fix these issues, except one editor who reverts page blanking, I assume that there is no genuine interest in the article. 1exec1 (talk) 13:29, 26 October 2011 (UTC)

Have you tried a search with google books? I could find a lot of secondary sources for the article a minute ago - but it takes time to insert the refs. I think it's a shame to delete it (I know you "merged" it, but the content didn't move with it). There exist a lot of sources for strcat at google books: "strcat buffer overflow" at google books Christian75 (talk) 17:50, 26 October 2011 (UTC)
You mean those all programming books/manuals? They do not constitute significant coverage as per WP:N, because if we remove the 'how-to use strcat' portion of that material, only a very limited factual mention is left. As per WP:N, you must find a third-party reliable source in which strcat is one of few major subjects. 1exec1 (talk) 19:55, 26 October 2011 (UTC)
Please, read the WP:N again, especially the section you quote significant coverage which says "Significant coverage is more than a trivial mention but it need not be the main topic of the source material." Christian75 (talk) 18:47, 28 October 2011 (UTC)
Show specific examples of books about strcat and then we can talk what fails WP:N and what doesn't. 1exec1 (talk) 09:36, 29 October 2011 (UTC)

Null-terminated string

Shouldn't this article be located at null-terminated string (or NUL-terminated string)? And primarily focus on null-terminated strings instead of C's string library? —Ruud 00:21, 19 October 2011 (UTC)

There's actually not much to say about null terminated string itself apart from the definition. Everything comes down to the operations that are defined on these strings, and the properties of these operations. C string library is the most widely used interface to these operations, so the attention to it seems reasonable to me. 1exec1 (talk) 01:41, 19 October 2011 (UTC)
You could also say a few other things. I haven’t really read the article :P but perhaps a comparison with other ways of storing strings and its relative strengths and weaknesses; languages and other applications where it is used? Vadmium (talk, contribs) 07:55, 24 October 2011 (UTC).
There is a comparison with a leading length at the start of the article!
Okay, so there is in the history section. And there’s more at String (computer science)#Representations. Vadmium (talk, contribs) 10:55, 24 October 2011 (UTC).
I would have to disagree with that. One can easily discuss the asymptotic complexity for various operations on null-terminated strings in terms of abstract functions. In my opinion this article should either be split into an article on null-terminated strings and an article on "Strings in the C programming language", or the latter should be more clearly made into a sub-section of an article whose primary topic is null-terminated strings. —Ruud 13:43, 24 October 2011 (UTC)
I agree with the suggestion to split the article. 1exec1 (talk) 14:22, 24 October 2011 (UTC)

Requested move

The following discussion is an archived discussion of a requested move. Please do not modify it. Subsequent comments should be made in a new section on the talk page. No further edits should be made to this section.

The result of the move request was: page moved per consensus in the discussion. Vegaswikian (talk) 22:56, 5 November 2011 (UTC)



C string → Null-terminated string. Relisted. Discussion is ongoing and may lead to something other than a rename. Vegaswikian (talk) 05:13, 31 October 2011 (UTC) Common and language-neutral name. —Ruud 13:55, 24 October 2011 (UTC)

how is the term C string not neutral?--199.91.207.3 (talk) 17:35, 24 October 2011 (UTC)
I would conjecture that the terms "C string" and "Pascal string" are mostly used by C programmers interfacing with libraries developed for different ABI's, while computer scientists and programmers from other languages would prefer to use the more descriptive terminology "null-terminated" and "length-prefixed" strings. The former already requires you know that C uses strings which are terminated by a null character and Pascal uses strings which are prefixed by their length, while this is self-evident with the latter. —Ruud 20:59, 24 October 2011 (UTC)
I think the term "Pascal string" means a 1-byte prefix length, not just the fact that a length is stored.Spitzak (talk) 23:50, 24 October 2011 (UTC)
True. So a Pascal string would be a particular kind of length-prefixed string. If we would have an article on that topic (which I don't believe we have at the moment), it would likely discuss all kinds of length-prefixed strings, not just 1-byte-length prefixed ones. —Ruud 00:58, 25 October 2011 (UTC)
I would support either a move to Null-terminated string, and/or integration with String (computer science)#Null-terminated, especially if the C stuff is to be a separate article. Vadmium (talk, contribs) 05:14, 25 October 2011 (UTC).
Maybe we can move the article containing the remaining C stuff to C standard string functions or similar title? 1exec1 (talk) 18:15, 29 October 2011 (UTC)
I'd prefer something like "String handling in the C programming language" or (more ambiguously, but more concise) "String handling in C". —Ruud 09:50, 31 October 2011 (UTC)
I see one problem with a title like this: it is not consistent with other pages about C standard library, like C mathematical functions and so on. In my opinion we should have either all articles in one format or the other. If we change all titles to Mathematical functions in C and similar, they become much more ambiguous, because then they refer to all functions (i.e. not necessarily standard ones) in the particular domain of C. Current solution mostly works, because when saying C mathematical functions, C standard mathematical functions is naturally implied (I must agree that this assertion might be far fetched as I'm not native speaker of English). Alternative solution might be something like Standard mathematical functions in C, but this also doesn't sound well (and might be grammatically incorrect; again, I'm not native speaker). Thus I think that certainly being not ideal, C standard string functions or C string functions might be the best option. However, if we decided to ignore the consistency issue, I would agree that String handling in C is an appropriate title. 1exec1 (talk) 23:29, 31 October 2011 (UTC)
I think I've already indicated that I find titles such as "C dynamic memory management" to be pretty awkward and that titles such as "Dynamic memory management in C" more clearly indicate the article is actually a sub-article of both Dynamic memory management and C (programming language). Perhaps the title and scope should even be Memory management in C and clearly linked with a {{main|Memory management in C}} from C (programming language)#Memory management. —Ruud 11:53, 1 November 2011 (UTC)
Ok, you finally convinced me. My previous argument is incorrect in that the scope of the articles is actually broader than the standard functions, as is evident in, for example, the current C string page. So now I think that the in C titles not only sound well, they represent the current and potential scope of the articles much better. Is a change from C *** to *** in C a non-controversial move? Can I implement it without a discussion? 1exec1 (talk) 17:36, 1 November 2011 (UTC)
I've created a centralized discussion at Talk:C_standard_library#Move_articles_about_C_standard_library_from_C_.2A.2A.2A_to_.2A.2A.2A_in_C. 1exec1 (talk) 12:51, 2 November 2011 (UTC)
The above discussion is preserved as an archive of a requested move. Please do not modify it. Subsequent comments should be made in a new section on this talk page. No further edits should be made to this section.