Talk:AWK

old discussion
Is "Version 3 UNIX" supposed to be "3rd edition UNIX"? That would be approximately the right time frame, though I would have thought maybe 4th or 5th edition. Adding a date would help too. — Preceding unsigned comment added by Stephen Gilbert (talk • contribs) 00:56, 21 August 2001 (UTC)

As I understand it, versions corresponded to software releases, editions corresponded to documentation (manual releases). There wasn't always a one-to-one correspondence to each other. --drj — Preceding unsigned comment added by Drj (talk • contribs) 15:51, 25 February 2002 (UTC)

The naming scheme for UNIX was relatively linear up to Version 7 Unix, after which major source trees, both inside and outside Bell Labs, began to split off.

This UNIX timeline has some excellent linked sources, and names and dates the various Unix versions. http://www.robotwisdom.com/linux/timeline.html

These sources date AWK to January 1979. http://minnie.tuhs.org/UnixTree/V7/

Lent 18:46, 30 March 2006 (UTC)

This article could really do with some simple examples to show the expressive power of the language. IMHO, it's easier than PERL for many simple data manipulation tasks. Any objections? - Steve Donovan — Preceding unsigned comment added by Sdonovan (talk • contribs) 06:34, 24 January 2003‎ (UTC)

Is awk really feature complete enough to be considered a general purpose programming language? I've always considered it more of a text manipulation language. Suppafly 01:47, 7 Oct 2004 (UTC)
 * That's what K & R thought at first. But when they saw people using it as a general purpose language, they revised awk (calling it nawk) adding more features and functions. It can be used as a general purpose language. --Pelladon 07:21, 4 August 2006 (UTC)

Yes, it is. It particularly excels at text manipulation, but is also a fine tool for other things, and in any case 'text manipulation' covers a lot of territory (the majority of dynamic web content, for example). It's not unusual to find quite large awk programs. - jhd — Preceding unsigned comment added by 132.147.65.102 (talk) 02:01, 7 October 2004‎ (UTC)

If you are in any doubt, you should look for the first editions of the excellent little books, "Programming Pearls" and "More Programming Pearls" by Jon Bentley which really show what AWK can do in the hands of an expert. -- Derek Ross | Talk 07:18, 17 November 2005 (UTC)

A simple example (about 200 lines) of a fully working tool that also demonstrates AWK is used for more than text manipulation can be seen in the TLDP web correlator http://www.nyx.net/~sgjoen/webcorr-css which takes in an Apache HTTP Server log and generates an HTML report. I read somewhere that AWK was initially made for database purposes. — Preceding unsigned comment added by 85.164.112.1 (talk) 17:17, 1 July 2005 (UTC)
 * [13 years later...] . o O (How is "turning a log file into an HTML report" not exactly text manipulation? I have to wonder what this IP user imagined HTML documents are made of?) -- FeRDNYC (talk) 13:07, 29 September 2018 (UTC)

Name of the article
Why is this page called "AWK programming language"? I've never seen awk referred to as "AWK", as if it were an acronym (even though I suppose it is). It's always just "awk". Why not change it to "Awk programming language" and chalk up the capital letter to technical restrictions? Makaristos 05:36, 14 December 2005 (UTC)


 * Both capitalized and lower case forms are commonly used. Brian Kernighan seems to prefer AWK. Amnonc 16:27, 14 December 2005 (UTC)


 * Then still, why is it called "AWK programming language" instead of just "AWK"? Also, in the article, both AWK and Awk are used. IMHO, we should stick to one capitalization and only mention the alternative somewhere. Qwertyus 17:06, 14 December 2005 (UTC)


 * No, you're right, Amnonc. Even awk's own manpage refers to it as the AWK Programming Language. awk, all lowercase letters, is the UNIX program that runs programs in the AWK programming language. I stand humbled and corrected, and shall therefore attempt to make this distinction clear, as well as standardize the differing appellations in the article. Makaristos 02:36, 15 December 2005 (UTC)

Add a quick note to say that awk uses "extended regexps" by default while grep/ed/sed have "basic regexps" by default on most (all?) platforms? 70.82.141.92 13:13, 25 March 2006 (UTC)


 * I added a line. If you see other things to be improved, be bold. Thanks, Tom Harrison Talk 14:52, 25 March 2006 (UTC)
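For illustration, a minimal sketch of the basic-versus-extended difference mentioned above. The function names are hypothetical, and it assumes a POSIX-style grep and awk on the PATH. The same regexp text `a+` is read differently by the two tools:

```shell
# Sample input: one line of repeated "a", one line containing a literal "+".
sample() { printf 'aa\na+\n'; }

# grep defaults to basic regexps, where an unescaped + is an ordinary character.
bre_match() { sample | grep 'a+'; }

# awk uses extended regexps, where + means "one or more of the preceding item".
ere_match() { sample | awk '/a+/'; }

bre_match   # matches only the line "a+"
ere_match   # matches both lines
```

Running grep with -E should make it agree with awk on this input.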

Warts

 * I added this section as well as the shebang section. Is this too much text, and should the article be split? Lent 16:59, 30 March 2006 (UTC)

Criticisms
The whole article seems to be quite a mess, but the "criticisms" section seems to be especially bad. Most items there are either completely POV, factually incorrect, or affect only specific versions of AWK. If no one complains I think I will remove the whole section. I think there are a few bits there that are valid, but they are rather minor, would need sourcing and can be more easily added afterwards. --Lost Goblin 14:15, 8 June 2006 (UTC)


 * Agree, I was going to suggest the same. Qwertyus 10:53, 8 June 2006 (UTC)


 * I wrote most of it and I am also not quite happy with how it fits in. There is one reason why it might still be justified to keep it: Too many misconceptions about what AWK really is and what it isn't are circulating. Look at the list of topics in this "Editing" section. Someone had doubts that AWK was a "real programming" language. Someone else seriously took Kernighan's AWK web page at Bell Labs for a simple "book advert". As long as such nonsense gets written down here, there must be some place where this nonsense is corrected. Jürgen Kahrs 22:00, 17 June 2006 (UTC)


 * I'm not sure how the current criticisms sections helps there, I think it adds more to the confusion. As no one has objected so far, I'm removing it, if someone likes they can try to come up with something more clear and consistent. --Lost Goblin 01:23, 10 July 2006 (UTC)

Logo
The "official" AWK logo can be found on the cover of the book "The AWK Programming Language". You can find the cover on Kernighan's "book advert" page. Is this good enough? Jürgen Kahrs 22:08, 17 June 2006

Requested move
AWK programming language → AWK (programming language) – Conformance with WP naming conventions atanamir

domain specific lang or general purpose?
The domain specific language article lists awk as a dsl. This article is also categorized as a DSL. However, the first line states it's a general purpose language. There should be some clarification on both pages. User:Mahanga 04:32, 9 May 2007 (UTC)

bug in hello world example
The hello world program is incorrect: it is missing an exit command at the end of the BEGIN block, so the program will wait indefinitely for an EOF from standard input.


 * No it doesn't, at least not in the variants I use. In the BEGIN-block AWK doesn't do any reading on the input and in the main-block there is no code so it exits immediately. --Marbl3s (talk) 10:23, 6 June 2008 (UTC)

Note added 7/4/16: Depends on which version. In "old AWK", the comment is correct. Programs consisting of only a "BEGIN" block would still try to read input. This was fixed in "new AWK" and all subsequent versions. — Preceding unsigned comment added by 66.190.12.101 (talk) 20:11, 4 July 2016 (UTC)
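For illustration, a minimal sketch of the two forms discussed in this thread. The function names are hypothetical, and the behaviour described is that of POSIX-style awks such as gawk, mawk and the "new AWK" lineage. The second form adds the explicit exit that the original report asked for, which should also be safe on an awk that would otherwise go on to read input:

```shell
# BEGIN-only program: modern awks run the BEGIN action and exit
# without reading standard input.
hello() { awk 'BEGIN { print "Hello, world!" }'; }

# Defensive variant: the explicit exit ends the program inside BEGIN,
# so no input is read even on an awk that would otherwise try to.
hello_exit() { awk 'BEGIN { print "Hello, world!"; exit }'; }

hello
hello_exit
```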

awka is virtually inexistent
The link to sourceforge is valid, but there is no data/source/etc. to download on the page. —Preceding unsigned comment added by 217.88.202.92 (talk) 13:28, 25 December 2009 (UTC)

who wrote this?
awk programs are NOT pattern-action statements, and that's obvious to anyone that's actually knowledgeable about awk

consider this implementation of uniq(1) in awk:

 $ printf %s\\n a c d c b d | awk '!o[$0]++'
 a
 c
 d
 b

there's no PATTERN because the conditional has to do with the value of an associative array, not the result of a regular expression, which is what the article refers to as "patterns"

there's no ACTION because print is the default action —Preceding unsigned comment added by 190.36.145.91 (talk) 12:06, 26 February 2011 (UTC)
 * '!o[$0]++' is the pattern. As you said, 'print $0' is the (default) action. The pattern need not necessarily be a regular expression: "Patterns are arbitrary Boolean combinations (with ! || &&) of regular expressions and relational expressions." 70.225.163.47 (talk) 05:14, 1 May 2011 (UTC)
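For illustration, a minimal sketch of the point made in the reply above: the terse one-liner and the spelled-out pattern-action form should behave identically. The function names and the array name `seen` are hypothetical:

```shell
# Terse form: an expression pattern with the action omitted,
# so awk applies the default action (print the record).
dedupe_terse() { awk '!seen[$0]++'; }

# Explicit form: the same expression as the pattern, with the
# default action written out.
dedupe_explicit() { awk '!seen[$0]++ { print $0 }'; }

# Each distinct line is printed once, in order of first appearance.
printf '%s\n' a c d c b d | dedupe_terse
printf '%s\n' a c d c b d | dedupe_explicit
```

The pattern is true only the first time a line is seen, because the post-increment returns the old count (zero, or empty, for a new key) before adding one.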

Opening paragraph should clearly state that it is a programming language
The current opening is:
 * "The AWK utility is a data extraction and reporting tool that uses a data-driven scripting language consisting of a set of actions to be taken against textual data (either in files or data streams) for the purpose of producing formatted reports. The language used by awk extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions."

That opening paragraph fails to clearly state that AWK is an interpreted programming language. You can further categorise afterwards but I'd like to differentiate it from a photocopier fairly quickly :-) --Paddy (talk) 21:57, 22 May 2011 (UTC)

grepinawk code examples
I think the code examples should use "$@" instead of $*, at least on my machine it allows whitespace in the input files while the other does not. Also, shouldn't the pattern variable be exported in order for it to work? At least that's the case for me, but maybe it works this way for others? — Preceding unsigned comment added by 92.76.123.142 (talk) 23:40, 30 May 2011 (UTC)
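For illustration, a minimal sketch of a wrapper along the lines discussed above. This is not the article's code: the function body and the awk variable name `pat` are assumptions. It quotes "$@" so file names containing whitespace survive, and it hands the pattern to awk with -v so nothing needs to be exported:

```shell
# Hypothetical wrapper: first argument is the pattern, the rest are files.
grepinawk() {
    pattern=$1
    shift
    # "$@" expands to one word per remaining argument, preserving spaces.
    # -v assigns the shell value to the awk variable pat before the program runs.
    awk -v pat="$pattern" '$0 ~ pat' "$@"
}

# Example call (file names are placeholders):
#   grepinawk '^b' "my file.txt" other.txt
```

One caveat: awk processes backslash escape sequences in -v assignments, so a pattern containing backslashes may need them doubled, or may be better passed through the environment and read from ENVIRON.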

Logos, etc
Some books (O'Reilly for instance) use a drawing of a bird to hint at the contents, the authors of awk and the standards committees do not appear to associate any particular representation, nor is there a suitable trademark to refer to. So AWK's relationship to the bird has the status of a pun. TEDickey (talk) 16:28, 31 August 2012 (UTC)


 * It's more subtle than that. AWK was created in an age when software projects didn't have logos like they do today.  However, the AWK Programming Language book that was published by the authors of the language features the bird on the cover, as does Arnold Robbins' Effective AWK Programming.  These are the two standard references.  The auk bird has a status much like the Perl Camel, and no one will argue that that animal is a pun or "not official", or something. 128.226.130.73 (talk) 21:49, 3 September 2012 (UTC)

Actually, the publisher is Addison Wesley, not "the authors of the language". The images used for those books have restrictions on their reuse because they're used as part of advertising (and this topic wouldn't meet the guidelines for incorporating that material because it is not dealing directly with the book). Introducing yet another image doesn't help the reader, since it doesn't have any relationship to the books. Not all books on awk use that image, e.g., sed & awk, and this copy of Arnold's book GAWK: Effective AWK Programming. TEDickey (talk) 22:17, 3 September 2012 (UTC)


 * Bossypants, have you read anything I said at all? And have you recently added to any kind of article? 128.226.130.73 (talk) 22:22, 5 September 2012 (UTC)

I've read your edits, which are uncivil, and in other cases are unconstructive. TEDickey (talk) 00:07, 6 September 2012 (UTC)

Piping and redirection
These aren't awk functions; they're specific to the operating system. ____ Kernel.package — Preceding unsigned comment added by 71.211.235.93 (talk) 22:00, 28 June 2013 (UTC)

They are awk features, as described in the standard (see http://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html#tag_20_06_13_10). TEDickey (talk) 22:12, 28 June 2013 (UTC)
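For illustration, a minimal sketch of output piping and redirection written inside the awk program itself rather than in the invoking shell. The function names are hypothetical. Awk does start an external command such as sort for the pipe, but the `|` and `>` here are awk syntax:

```shell
# print ... | "cmd" sends output to a command that awk starts itself.
# close() ends the pipe so the command can finish and emit its output.
sorted_via_pipe() {
    awk 'BEGIN {
        print "b" | "sort"
        print "a" | "sort"
        close("sort")
    }'
}

# print ... > file writes to a file named by an awk expression.
redirect_to_file() {
    awk -v out="$1" 'BEGIN { print "hello" > out; close(out) }'
}

sorted_via_pipe   # the two lines come back in sorted order
```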

Damaged sentence in 4th paragraph?
The 4th paragraph contains the following sentence, which does not seem to make sense:

"The power, terseness, and limits of early AWK programs inspired Larry Wall to write Perl just as a new, more powerful POSIX AWK and gawk (GNU AWK) were being defined."

To me it looks as if part of the sentence (after "more powerful": a more powerful what?) got lost by accident. --Rüdiger Kupper (talk) 11:27, 30 July 2013 (UTC)


 * Think it's meant to say that Posix Awk and Gawk are more powerful than regular garden-variety Nawk... AnonMoos (talk) 23:35, 31 July 2013 (UTC)

"replaced by Perl"
Awk has long been superseded for complex programs with many lines, but I think a significant number of people still find it convenient for very small programs (one-liners and such), frequently invoked directly from shell. Awk is also the only general-purpose language in the Posix standard intermediate in capability and power between "sh" and "C"...AnonMoos (talk) 20:41, 18 September 2013 (UTC)


 * I see no WP:RS here or in the topic (Perl isn't going to replace Awk in any of my portable scripts simply because Awk is standard, while Perl is not -- and is unlikely to ever be -- and you're most likely to find support for the statement from people who don't focus on portability) TEDickey (talk) 22:09, 18 September 2013 (UTC)


 * We can recognize that AWK's peak of popularity was probably in the late 1980s or beginning of the 1990s, and that few substantial programming projects are undertaken in AWK, while also recognizing that a significant number of people find AWK convenient for various supplemental purposes, and that it's deeply embedded within widely-adopted standards, and is not going anywhere anytime soon. So I'm not sure that simply saying that it's been "replaced by Perl" is a fair summary... AnonMoos (talk) 17:44, 21 September 2013 (UTC)


 * I generally agree - but finding reliable sources by knowledgeable people is the hard part TEDickey (talk) 18:34, 21 September 2013 (UTC)

Awk was never "superseded" because it was never used for "complex programs". It is used for programs of this scale, which are complex enough, depending on your opinion. Perl competes with Awk for "market share" but so do many others. They all have pros and cons and incredibly some people program in more than one language. Awk could not have peaked in the 80s, prior to the invention of Linux, when it was deployed on millions of installs during the 1990s and 2000s. It's only true for certain people born in certain years (i.e. the "I remember when Awk was the new thing" crowd), and ignores the bulk of users who experienced Awk for the first time in the 90s and 00s (when the O'Reilly books were published). A check of Stack Exchange and Unix.com shows Awk is as alive and well as ever. A check of Awk development (for GNU) shows it has seen more new features added in the past 4 years than in the prior 15. Comparisons of popularity with Perl are individual opinions, like saying Michael Jackson is hot or not. -- Green  C  20:04, 4 October 2014 (UTC)

awk/nawk aliases
It's common for related packages to have aliases for programs which are similar to those on other systems. That doesn't make them "also known as". Otherwise, we would have "bison, also known as yacc". Anywhere except Wikipedia, that sort of thing would be dismissed immediately. Here, we want a reliable source TEDickey (talk) 01:27, 27 February 2015 (UTC)

Comparison of awk implementations
Currently the article claims mawk is a very fast AWK implementation, but I have a counterexample: match execution time grows exponentially on certain regular expressions as input length increases. For some reason mawk is the default awk interpreter in most Ubuntu Linux variants, but the version of mawk that is inherited from Debian Linux is old; it does not contain the fixes made by mawk's new maintainer Thomas E. Dickey since 2009. mawk's WWW-site — Preceding unsigned comment added by Selkänahka (talk • contribs) 21:58, 1 September 2015 (UTC)


 * Maybe/maybe not: the drawback to bug-reports is that they are not a reliable source. A review by a knowledgeable reviewer of several implementations would be a reliable source.  The problem with using bug-reports is two-fold: (a) the source is selective (chosen to illustrate a point), and (b) a large percentage of bug-reports simply are invalid, or consist largely of irrelevant information which the developer must study to get useful information.  Thus it requires the knowledgeable review to make it suitable for use in sources. TEDickey (talk) 01:04, 2 September 2015 (UTC)

Linux distribs do not always include gawk...
The line:

[quote] Linux distributions are mostly GNU software, and so they include gawk. [/quote]

is actually not true, although it certainly should be.

Debian-based distros (which is many/most of them) tend to ship with mawk as "awk" and require an explicit "apt-get install" to get GAWK.

Personally, I think this is a shame - because GAWK is so much better - but that is the way that it is. — Preceding unsigned comment added by 66.190.12.101 (talk) 20:01, 4 July 2016 (UTC)

External links modified
Hello fellow Wikipedians,

I have just modified 4 external links on AWK. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FAQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20110723111734/http://refspecs.freestandards.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic/command.html to http://refspecs.freestandards.org/LSB_4.0.0/LSB-Core-generic/LSB-Core-generic/command.html#AEN32008
 * Added archive https://web.archive.org/web/20080808234125/http://www.computerworld.com.au/index.php/id;1726534212;pp;2 to http://www.computerworld.com.au/index.php/id;1726534212;pp;2
 * Added archive https://web.archive.org/web/20070410003418/http://cm.bell-labs.com:80/cm/cs/who/bwk/awkc++.ps to http://cm.bell-labs.com/cm/cs/who/bwk/awkc++.ps
 * Added archive https://web.archive.org/web/20081031084509/http://www.think-lamp.com:80/2008/10/awk-a-boon-for-cli-enthusiasts/ to http://www.think-lamp.com/2008/10/awk-a-boon-for-cli-enthusiasts/

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know.

Cheers.— InternetArchiveBot  (Report bug) 12:19, 1 October 2016 (UTC)

Curious forward reference to nothing
In the section on books on awk, it says: "Free download of this manual is possible through the following book references."

But no book references follow! GeneCallahan (talk) 06:19, 18 February 2017 (UTC)


 * Yeah, I don't know what that was about. I thought the few sentences there (which were just tacked on after the Cite book call) might be a quote from the linked source, but they don't appear anywhere there. It appears they were just some notes written by whoever added the book reference. Unnecessary non-sequitur notes, so out they came. -- FeRDNYC (talk) 11:55, 29 September 2018 (UTC)

Unicode
I would like to see something written about unicode support or lack of it. It predates unicode of course, but I haven't researched into this to see if anyone has been able to do anything about it. It seems to me that using UTF8 would not be good at all, with very bad results in some cases, yet fine in others. Could anyone help? CecilWard (talk) 11:37, 17 February 2018 (UTC)


 * Keep in mind the usual guidelines on reliable sources and original research. You may not find much that's useful which meets both of those TEDickey (talk) 13:36, 17 February 2018 (UTC)


 * CecilWard -- I think UTF-8 will pretty much work in gawk internal processing, unless you use as an array index a sequence of characters which contains the defined SUBSEP character. AnonMoos (talk) 15:04, 3 October 2018 (UTC)


 * P.S. Gawk apparently can pay attention to "locale" settings, but this seems to mainly affect the meaning of regexp specifications... AnonMoos (talk) 09:28, 4 October 2018 (UTC)
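For illustration, a minimal sketch of why the SUBSEP caveat above matters. The function name is hypothetical. A comma-separated subscript is stored as a single string with the parts joined by SUBSEP, so a key that already contained that character could not be split back apart unambiguously:

```shell
# a["x", "y"] is stored under the single key "x" SUBSEP "y".
# Splitting the key on SUBSEP recovers the two parts.
subsep_demo() {
    awk 'BEGIN {
        a["x", "y"] = 1
        for (k in a) {
            n = split(k, parts, SUBSEP)
            print n, parts[1], parts[2]
        }
    }'
}

subsep_demo   # prints: 2 x y
```

The default SUBSEP is a control character (octal 034) and can be reassigned by the program if that value might appear in the data.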

Match pattern from command line
Do we really need three examples of using Bash + AWK for implementing a single feature in an article devoted to AWK? This is an encyclopedia, not StackOverflow or a Linux programming tutorial. A single example without Bash would be sufficient, I think. --Amakuha (talk) 13:33, 24 February 2018 (UTC)


 * That's a plague that has been raging across Wikipedia's computing articles for some time. This isn't even one of the more egregious examples, really. (For my money, this is, in the sense that it shouldn't be an article unto itself at all, but rather should occupy a tiny section of the XRI article.) My personal take is, this article should be shorter by about 2/3, and the "Commands", "Sample Applications", and "Self-contained AWK scripts" sections would be gone entirely, because none of that has anything to do with an encyclopedia.


 * Even though I say that XRDS is one of the more egregious examples (because it shouldn't exist), it does represent a good model for what the AWK article should be. It contains exactly one "Example XRDS document", which is just dumped right in there, all syntax-highlighted, in its entirety. It then proceeds to discuss exactly nothing of the syntax, structure, or purpose. Because, again, encyclopedia.


 * In a similar vein, I feel like there's a need here for exactly one "A simple AWK program" listing, just so the reader can get an idea of what they look like. Perhaps with example input and output to illustrate the purpose, and maybe a broad discussion of what makes up the program listing (pointing out the pattern-action structure). That would also remove the need for the confusing pseudocode in § Structure of AWK programs:


 * (Is it a pattern or a condition or an expression? Why would the article be directly contradictory on that point?)


 * However, I know that trimming 2/3 of the length out of an article like this will upset far too many people who feel invested in its existing content. Either because they take a blanket "more is better" view of Wikipedia as a whole, or because they feel that an article's length is somehow a reflection of the importance of its topic, and they don't want to see AWK "demoted" by the removal of unnecessary cruft. So, . -- FeRDNYC (talk) 12:46, 29 September 2018 (UTC)


 * FeRDNYC -- XRDS is a "batch" query specification language, and so is a rather different beast from a programming language like AWK.  In the case of an active programming language, there's naturally a tendency to show how it actively does things (thus the traditional "Hello, World!" program).  AWK programs are characteristically often very short ("one-liners"), and AWK has a basic programming model that's rather different from what you see in "C" or Pascal or BASIC, so the use of example programs in the article does not seem excessive to me... AnonMoos (talk) 15:15, 3 October 2018 (UTC)

Trimmed entry for gawk in "Versions and implementations"
The entry for gawk in § Versions and implementations previously contained the following:

It's rare to see such a textbook example of improper WP:SYNTH, but oh man this is some kind of poster child. Let me count the ways:
 * 1) The first sentence: Linux distributions are mostly GNU software, and so they include gawk. It's (a) uncited, (b) not accurate (see Talk:AWK above), and (c) creating causation out of thin air. There's nothing presented to justify either half of the sentence even as simple factual statements, but even if there was it wouldn't support the implied "and so" relationship.
 * 2) The second sentence has two citations.
 * 3) At the second cite, the entirety of anything AWK-related is as follows: The system awk(1) now refers to BWK awk. That's it. Absolutely no reason for the change is given, so it certainly doesn't support the claims made in the article about what those reasons are.
 * 4) The first cite makes no mention of AWK or BWK awk anywhere, because it's a link to the entire document on BSD vs. GPL, from the FreeBSD team no less (an obviously biased source, on that topic).
 * 5) The use of those two citations here is thus an attempt to relitigate the licensing debate on the pages of the AWK article, because nothing presented indicates that licensing had anything to do with the switch from gawk to BWK awk. Again: no reason is ever given!

Is it possible that the reasons given for the FreeBSD switch are accurate? Absolutely. But there's nothing in any of the cited materials that even remotely supports the claims made, so it's pure WP:OR without some relevant citations to back it up.

I've therefore replaced the entire text above with:

because I don't even have a source for that claim handy. -- FeRDNYC (talk) 13:52, 29 September 2018 (UTC)


 * fwiw, the commit-comments in FreeBSD's subversion only hint that there was some problem porting gawk to sparc64. There's no bug-report cited in any of that, however.  There might be some mailing-list archive mentioning the issue.  In any case, license didn't appear to be a factor TEDickey (talk) 14:29, 29 September 2018 (UTC)

website property
A link to source-code (no documentation) for a particular implementation is off-topic, since this topic deals with the programming language. For instance, the POSIX description of Awk goes into far more depth than the sketchy manual page on the Github site. TEDickey (talk) 19:42, 24 August 2022 (UTC)

reference to Paul Rubin
Is the Wikipedia reference to Paul Rubin correct? It is incredible that an economist contributed to AWK. Vveckaln (talk) 12:00, 22 September 2022 (UTC)


 * The Paul Rubin who had a large early role in GAWK (not official AWK) was a person involved in the GNU project in the 1980s. You can read a short bio of him here... AnonMoos (talk) 14:49, 22 September 2022 (UTC)


 * then is the hyperlink correct? Vveckaln (talk) 14:54, 22 September 2022 (UTC)


 * You should have been able to easily figure that out on your own. Since the Paul Rubin economist article makes no reference to studying at Berkeley, and his education was over long before 1987, I would say that it is not valid. AnonMoos (talk) 14:57, 22 September 2022 (UTC)


 * So... I removed the Paul Rubin link. MichielN (talk) 11:45, 2 October 2022 (UTC)

persistent memory gawk
gawk 5.2 (released September 2022) includes a persistent memory feature that I believe is worthy of mention. For the basics see "man gawk" in 5.2 or later, or for more detail the pm-gawk user manual, which is included in TeXinfo form in the gawk distribution and is also available in PDF format here:

http://web.eecs.umich.edu/~tpkelly/pma/pm-gawk_rev1.52_2022.08aug.16.pdf

A brief description of the feature along with the example in the "Quick Start" section of the user manual above, along with a link to the user manual, might be sufficient to enable interested readers to find additional details on their own.

The persistent memory allocator upon which pm-gawk is based is described here:

https://dl.acm.org/doi/pdf/10.1145/3534855

-- Terence Kelly 38.99.114.119 (talk) 23:19, 30 November 2022 (UTC)

I added a one-sentence mention of persistent-memory gawk and included a URL to the User Manual at gnu.org.

-- Terence Kelly — Preceding unsigned comment added by 50.250.213.78 (talk) 06:49, 20 December 2022 (UTC)


 * As presented there, that's abusing WP:EL as well as WP:UNDUE. Toning down the self-promotion would be an improvement TEDickey (talk) 09:00, 20 December 2022 (UTC)