Talk:Flex (lexical analyser generator)

Name of this topic
Shouldn't flex be described as a "lexical analyzer generator", rather than a "lexical analyzer"? Lex is described as a "program that generates lexical analyzers".
 * Agreed, changing it now. --fvw *  14:46, 2005 Jan 27 (UTC)

Can we please change the title to: Flex (lexical analyzer generator)? Thanks. 122.29.91.176 (talk) 08:53, 21 June 2008 (UTC)


 * That would seem to be more consistent - assuming that one completed the task and moved similar topics which do not require disambiguation to use the same naming convention. The latter comment of course is to avoid unnecessary wordiness. Tedickey (talk) 11:26, 21 June 2008 (UTC)


 * Why not just "flex (software)"? This is similar to lex (software), which had title issues too. Individ (talk) 07:51, 21 July 2009 (UTC)


 * that seems a similar renaming (except that it's less descriptive - on the one hand, like saying that there's only one program named "flex", and on the other saying that its category is unimportant to the title). Should we rename all of the text-editors, for instance (perhaps not). Tedickey (talk) 08:49, 21 July 2009 (UTC)

Examples
The non-flex example should be more concise, but is needed to illustrate the corresponding flex example Tedickey 10:56, 11 August 2007 (UTC)


 * All examples should be thoroughly condensed. Throwing them out completely like some anon just did might not be such a bad idea. --MarSch 11:54, 5 November 2007 (UTC)


 * Other than constructing a tutorial-style shorter example, there's not much to be done. (All of the examples I've written are much longer ;-) Tedickey 12:45, 5 November 2007 (UTC)


 * While reading this page, at first I actually thought the non-Flex example had something to do with Flex. It's not until reading "Now, contrast the above code with the code needed for a flex generated scanner for the same language" that the reader is made aware that what he just read had nothing to do with Flex. I guess having it be such a long example makes this kind of thing worse. —Preceding unsigned comment added by 88.176.77.232 (talk) 12:10, 12 December 2010 (UTC)


 * When  reaches a newline character,   is not set to the length of the previous line. Wouldn't it be better to leave   and   out of the example completely? As far as I can see they aren't used anywhere in the code.  —Preceding unsigned comment added by Thomerow (talk • contribs) 08:00, 29 April 2011 (UTC)


 * cur_line is arguably the "same" as yylineno; cur_col does not have an analog in flex (or lex) TEDickey (talk) 08:42, 29 April 2011 (UTC)

Quex - offtopic?
Quex appears to be a lexical analyzer - there's a topic for that. But it's not an implementation of lex or flex. Unless it's in some manner a superset (compatible of course), then it's off-topic. Tedickey (talk) 20:17, 11 September 2009 (UTC)

Quadratic Time Complexity
While the statement about the complexity may (or may not) be factual, the given reference (last paragraph, etc), does not support the statment Tedickey (talk) 22:40, 21 October 2009 (UTC)

Of course it does. "dynamically resizing yytext to accommodate huge tokens is a slow process because it presently requires that the (huge) token be rescanned from the beginning" -- so each resizing has cost O(m), and you resize it O(m/8,000)=O(m/k)=O(m) times, giving you O(m^2). I can't see how this could possibly be more obvious... AshtonBenson (talk) 17:14, 23 October 2009 (UTC)


 * no - you're making an assumption about how often it's reallocated to do the resizing. Resizing on each increase is something that a novice programmer might do; stating that it's done that way in flex is something that requires a more reliable source than you provided. Tedickey (talk) 20:57, 23 October 2009 (UTC)


 * no - my comment has nothing to do with allocation. The source clearly states that upon resizing, the input is also 'rescanned' from the beginning.  And rescanning is a linear-time process.  Hence O(m/8,000). AshtonBenson (talk) 18:20, 24 October 2009 (UTC)


 * To settle the issue, we need more information than is given in that (actually very old - from 1993 - comment). The "about 8k" is 8192, by the way, which corresponds to the array storage for yytext.  The comment implies that flex switches to pointer storage when it encounters a large token.  Determining the actual behavior will require a look at flex's source code Tedickey (talk) 19:36, 24 October 2009 (UTC)


 * Of course, your edit in Parsing expression grammar, is also disputed. Tedickey (talk) 20:01, 24 October 2009 (UTC)


 * I've spent a little time now reviewing older versions (2.4.3, 2.4.7, 2.5.3, and can explain the cause of your confusion (the documentation is misleading). As I noted, the comment about large tokens dates back to 1993 (flex 2.4.1).  There are a few old versions available, so it's possible to see the evolution of the documentation.  The original comment (originally in "NEWS" for 2.4.1) states:

- A flex scanner's internal buffer now dynamically grows if needed to match large tokens. Note that growing the buffer presently requires rescanning the (large) token, so consuming a lot of         text this way is a slow process. Also note that presently the buffer does *not* grow if you unput more text than can fit into the buffer.

A following comment in version 2.4.7 explains that this is the case that I mentioned (resizing yytext). Quoting from flexdoc.1:

input. As mentioned above, with .B %pointer yytext grows dynamically to accomodate large tokens. While this means your .B %pointer scanner can accomodate very large tokens (such as matching entire blocks         of comments), bear in mind that each time the scanner must resize .B yytext it also must rescan the entire token from the beginning, so matching such tokens can prove slow.

The "info" file which you're reading on the web was written by a different person than the developer (Francois Pinard), and omits the detail that relates it directly to resizing. However, each time yytext is resized, flex attempts to double it. That's rather far away from the behavior which you've claimed (there aren't a lot of doubling's available). Tedickey (talk) 23:57, 27 October 2009 (UTC)

Hey guys, flex developer JM here. Some quick points I hope will clear this up. The scanners are without a doubt O(n). Performance with REJECT is bad only if one concocts a pathological case. The pathological case is similar to the worst-case of an NFA, where every node has transitions to every combination of every other node, PLUS the tokens must be huge, PLUS the input buffer must be small, etc.. It just never happens because although the input domain is technically all bytes from 0-256 in any possible combination and length, in practice, flex is not used in this manner. Flex is typically (always?) invoked to tokenize human-readable text formats, with a smaller alphabet, small tokens, a relatively large buffer, and to process specifications that invariably have only a hundred tokens rules and requiring only a single character of lookahead. Also, the default buffer size is 16K, not 8k (but that doesn't really affect the complexity). When it comes to REJECT and large tokens, I think we did a pretty good job in the Flex manual to frighten the daylights out of developers, warning them loud and clear to avoid REJECT, and advising them how to handle large tokens. (On the topic of documentation, the texinfo manual is the official doc. The man page is generated from it.) I'm not sure if the goal of this "Complexity" section is to rigorously dissect the worst-case running time, or to provide an informative average case time for practical use. If the former, the runtime would be much more complicated than is listed. If the latter, it can be accurately summarized as "O(n) except when using REJECT". In either case, we flex developers are happy to help clarify. One final note to put this all in perspective: A lot of the "fast" and "slow" terms in the flex history were written two decades ago when everything was in bare-metal C, I/O and memory were expensive, and you could actually notice the difference of a few dozen instructions per input symbol. The comparisons were between blazingly fast C implementations. By today's standards, flex scanners would be considered extremely high-performance software. My guess is that a flex scanner could tackle any sizable document 100 times faster than a naive scanner or scanner written in an interpreted language. Also, a flex scanner would likely not even spike the CPU for large input, since it would process tokens faster than any I/O bus could feed it more input. In this light, the performance section in the flex documentation is somewhat outdated. —Preceding unsigned comment added by 76.99.33.52 (talk) 22:59, 14 November 2009 (UTC)

Looks like someone reverted my corrections just minutes after I corrected the article. Such is wikipedia. I'll leave it at this: The entire section as-is is incorrect. Flex does not "rescan" the input when the buffer grows. And even if it did (it doesn't) the amortized time analysis above is incorrect. It'd be $$2m$$ not $$m^2$$ because there would be $$log(m/b)$$ resizes. But again, it does not rescan when the buffer grows. Forget about the blurbs scraped from various revisions of the flex manuals. It's likely I wrote them anyway. Study the code instead. -JM —Preceding unsigned comment added by 76.99.33.52 (talk) 00:49, 15 November 2009 (UTC)

Merger with Flex++
I would support a merger of the Flexx++ article into this one. Jason Quinn (talk) 22:24, 7 January 2010 (UTC)


 * It's likely that the user community for flex++ is so small that there would be no objections (iirc, the code doesn't compile anymore) Tedickey (talk) 23:41, 7 January 2010 (UTC)

Severe cleanup necessary
I see at least 2 areas where IMHO severe cleanup is necessary: Ipsign (talk) 12:10, 30 October 2010 (UTC)
 * example; I feel that example of manually written parser doesn't belong here (and comparison between manually written and flex-generated parsers may even constitute WP:OR)
 * wording in Flex++ section is a mess (hasn't been rewritten after the merge).

Flex dependencies
A recent edit suggests that the output of flex++ has a dependency on the C run-time. I'm not a flex or a flex++ expert but I suspect this is misleading; I expect the output should only depend on the CRTL if the input also depends on the CRTL, e.g, it contains calls to printf, etc. Can someone confirm or deny, please, and (even better) offer a citation? Msnicki (talk) 01:25, 24 November 2010 (UTC)


 * It uses malloc 96.255.37.241 (talk) 01:31, 24 November 2010 (UTC)
 * Okay, you need a memory allocator, but that functionality is often provided (in many, many applications) by a user routine anyway. Take a look, e.g., at section 21.2 on this page, discussing how to override the default.  If that's all we're talking about, I wouldn't call that a dependency on the CRTL.  I'd call that that a dependency on malloc or a user-supplied alternative.  Is there more than that? Msnicki (talk) 01:38, 24 November 2010 (UTC)


 * Based on this discussion, I've revised the passage to reflect that the output only depends on a memory allocator and whatever the input depended on. Thanks!  Msnicki (talk) 00:03, 26 November 2010 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Flex (lexical analyser generator). Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20160303224333/http://flex.sourceforge.net/manual/Is-flex-GNU-or-not_003f.html to http://flex.sourceforge.net/manual/Is-flex-GNU-or-not_003f.html

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

Cheers.— InternetArchiveBot  (Report bug) 07:59, 2 October 2017 (UTC)