User:Omphaloscope/CSV

The comma-separated values (CSV) file format is a delimited text format commonly used to store tabular data, such as the contents of a spreadsheet. Data in CSV format typically looks like this, although variants exist:

"ID","Last name","First name","Email"
"42","Adams","Douglas","douglas.adams@wikipedia.org"

Each line contains a number of fields separated (delimited) by commas, and records are separated by line breaks. A field that contains a comma, a line break, or a double quotation mark, or that starts or ends with whitespace, must be enclosed in double quotation marks. A line whose only field is the empty string must also be quoted, so that it is not mistaken for a blank line. A double quotation mark inside a field is escaped by doubling it, that is, by placing a second double quotation mark next to it. The CSV format itself does not require a particular character encoding, byte order, or line terminator.
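The quoting rules above can be sketched as a small Python writer. This is an illustrative sketch, not a reference implementation: the names quote_field and format_row are hypothetical, and the whitespace rule follows the convention described on this page rather than RFC 4180, which does not itself require quoting for leading or trailing spaces.

```python
from typing import Sequence

SPECIAL = ',"\r\n'  # characters that force a field to be quoted


def quote_field(value: str) -> str:
    """Quote a field if it needs it, doubling any embedded double quotes."""
    needs_quotes = any(ch in value for ch in SPECIAL) or value != value.strip()
    if needs_quotes:
        return '"' + value.replace('"', '""') + '"'
    return value


def format_row(fields: Sequence[str]) -> str:
    """Join fields into one CSV line (without the line terminator)."""
    # A row whose only field is empty is written as "" so that it is not
    # read back as a blank line.
    if len(fields) == 1 and fields[0] == "":
        return '""'
    return ",".join(quote_field(f) for f in fields)


print(format_row(["42", "Adams", "Douglas", "douglas.adams@wikipedia.org"]))
print(format_row(["1999", "Chevy", 'Venture "Extended Edition"', "", "4900.00"]))
```

The second call produces the Chevy record used in the example section, including the doubled quotes and the empty field.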

Specification
Although CSV has no formal standard, RFC 4180 describes a common format and registers "text/csv" as its MIME type with the IANA. Many informal documents also describe the format; for example, "How To: The Comma Separated Value (CSV) File Format" surveys how CSV is handled by the most widely used applications and explains how best to use and support it.
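As an illustration of the common format, Python's standard csv module can reproduce the fully quoted sample at the top of this page. This is a sketch assuming Python 3: csv.QUOTE_ALL encloses every field in quotes, and the writer's default line terminator is CRLF ("\r\n"), the line break RFC 4180 specifies. The module's default dialect also doubles embedded quotes, as described above.

```python
import csv
import io

rows = [
    ["ID", "Last name", "First name", "Email"],
    ["42", "Adams", "Douglas", "douglas.adams@wikipedia.org"],
]

buf = io.StringIO()
# QUOTE_ALL encloses every field in double quotes, like the sample above.
# The writer's default lineterminator is "\r\n" (CRLF).
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerows(rows)
text = buf.getvalue()
print(text, end="")

# Reading the text back yields the original fields with the quotes removed.
assert list(csv.reader(io.StringIO(text, newline=""))) == rows
```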

Example
A small table of car listings (year, make, model, description, price) may be represented in CSV format as follows:

1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""",,4900.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00

This CSV example illustrates that:


 * fields that contain commas, double quotes, or line breaks must be quoted,
 * a quote within a field must be escaped with an additional quote immediately preceding the literal quote,
 * some applications trim spaces before and after delimiter commas, although RFC 4180 treats such spaces as part of the field, and
 * a line break within an element must be preserved.
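The behaviour listed above can be checked with Python's csv module. This sketch assumes the Jeep description contains a line break after "MUST SELL!"; the reader keeps that line break inside the field and collapses doubled quotes. Trimming is not done by default, and the skipinitialspace option only drops spaces that follow a delimiter, not those before it.

```python
import csv
import io

# The three example records; the Jeep description is assumed to contain a
# line break after "MUST SELL!".
data = (
    '1997,Ford,E350,"ac, abs, moon",3000.00\r\n'
    '1999,Chevy,"Venture ""Extended Edition""",,4900.00\r\n'
    '1996,Jeep,Grand Cherokee,"MUST SELL!\nair, moon roof, loaded",4799.00\r\n'
)

# newline="" is the setting the csv documentation recommends for file objects;
# it leaves line endings untranslated so the parser handles them itself.
rows = list(csv.reader(io.StringIO(data, newline="")))

for row in rows:
    print(row)

# Spaces after a delimiter are kept unless skipinitialspace is set.
untrimmed = list(csv.reader(["a, b"]))
trimmed = list(csv.reader(["a, b"], skipinitialspace=True))
```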

Application support
CSV is a simple format supported by almost all spreadsheet software, such as Excel (some localized versions use semicolons instead of commas, typically in locales where the comma is the decimal separator), Calc, and Gnumeric. Any programming language with input/output and string-processing facilities can read and write CSV files, although handling quoted fields correctly takes more than splitting each line on commas.
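A minimal Python sketch of why quoting matters when a program reads CSV: splitting a line on commas breaks a quoted field apart, while a CSV-aware parser does not. The semicolon-delimited line is a hypothetical example of the localized variant, written with a comma as decimal separator.

```python
import csv

line = '1997,Ford,E350,"ac, abs, moon",3000.00'

# Naive splitting ignores quoting and breaks the quoted field into three pieces.
naive = line.split(",")

# A CSV-aware parser keeps "ac, abs, moon" as one field and strips the quotes.
proper = next(csv.reader([line]))

# Hypothetical semicolon-delimited variant with a comma as decimal separator.
semi = next(csv.reader(['1997;Ford;E350;"ac, abs, moon";3000,00'], delimiter=";"))

print(len(naive), naive)
print(len(proper), proper)
print(semi)
```

Only the delimiter changes between the two variants; the quoting rules stay the same.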

CSV is as widespread for exchanging tabular data as plain ASCII text is for unformatted text.

Utilities
The csvprint utility reformats CSV input according to a format string. This is useful for reordering fields or for generating source code or tables, as in the following example:

$ csvprint data.csv "\t{ %0, %1, %2, \"%3\" },\n"
	{ 0xC0000008, 0x00060001, NT_STATUS_INVALID_HANDLE, "The handle is invalid." },
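The %N substitution in the csvprint example can be approximated in a few lines of Python. This is a hypothetical re-creation for illustration, not csvprint's source: the function names, the assumption that %N is a zero-based field index, and the contents of data.csv are all assumptions, and the format string's \t and \n are written as Python escapes rather than interpreted by the script.

```python
import csv
import re
from typing import Iterable, List


def render(fmt: str, fields: List[str]) -> str:
    """Replace each %N in fmt with field N (zero-based) of the record."""
    # Raises IndexError if the format refers to a field the record lacks.
    return re.sub(r"%(\d+)", lambda m: fields[int(m.group(1))], fmt)


def csvprint_like(lines: Iterable[str], fmt: str) -> str:
    """Format every record of CSV input with the same format string."""
    return "".join(render(fmt, row) for row in csv.reader(lines))


# Hypothetical contents of data.csv (one record).
lines = ["0xC0000008,0x00060001,NT_STATUS_INVALID_HANDLE,The handle is invalid."]

# The example's format string, with \t and \n as Python escapes.
fmt = '\t{ %0, %1, %2, "%3" },\n'

print(csvprint_like(lines, fmt), end="")
```

Because the substitution is done in a single pass, a field that happens to contain text like %1 is copied through unchanged.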

csvdiff is a Perl script that compares two comma-separated (or otherwise delimited) files. Unlike standard diff, it reports the number of the record in which a difference occurs and the field (column) that differs. The separator is configurable and need not be a comma. Optionally, a third file can be supplied that contains the column names on a single line, separated by the same separator; the column names are then shown for each difference found. Example:

$ perl csvdiff.pl -a act.csv -e exp.csv -s ";" -c col_names.csv -k "2" -t -i
Record with key "200100500" is different:
Actual  line 006 > 200100500;200100500;6;;;;;;000;0;2005-12-20;55 <
Expected line 008 > 200100500;200100500;6;;;;;;000;0;2005-12-19;55 <
Difference in field no.: 11 - field name: Dat_Rueckgabe
Actual  > 2005-12-20 <
Expected > 2005-12-19 <
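The kind of report csvdiff produces can be sketched in Python: records are matched by a key column, and each differing field is reported with its one-based field number and, if column names are supplied, its name. This is not csvdiff's implementation; the function field_diff, its parameters, and the placeholder column names are hypothetical, and records present in only one file are not reported.

```python
import csv
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Diff = Tuple[str, int, Optional[str], str, str]


def field_diff(
    actual: Sequence[List[str]],
    expected: Sequence[List[str]],
    key_col: int,
    names: Optional[List[str]] = None,
) -> Iterator[Diff]:
    """Yield (key, field_no, name, actual, expected) for each differing field.

    key_col and field_no are 1-based. Records are matched by key, so they may
    appear on different lines in the two files.
    """
    expected_by_key: Dict[str, List[str]] = {row[key_col - 1]: row for row in expected}
    for row in actual:
        key = row[key_col - 1]
        other = expected_by_key.get(key)
        if other is None:
            continue  # records missing from the expected file are skipped here
        # zip stops at the shorter row, so extra trailing fields are not compared.
        for no, (a, e) in enumerate(zip(row, other), start=1):
            if a != e:
                name = names[no - 1] if names and no <= len(names) else None
                yield key, no, name, a, e


actual = list(csv.reader(["200100500;200100500;6;;;;;;000;0;2005-12-20;55"], delimiter=";"))
expected = list(csv.reader(["200100500;200100500;6;;;;;;000;0;2005-12-19;55"], delimiter=";"))

# Placeholder column names; only the 11th is taken from the example output.
names = [f"col{i}" for i in range(1, 13)]
names[10] = "Dat_Rueckgabe"

for key, no, name, a, e in field_diff(actual, expected, key_col=2, names=names):
    print(f'Record with key "{key}": field {no} ({name}): {a} != {e}')
```

Matching on a key rather than on line position is what lets a record on line 6 of one file be compared with the same record on line 8 of the other.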