Module csv
The csv module handles records formatted according to Comma-Separated-Values (CSV) rules.
The default formatting rules are:
- Lua escape sequences such as \n or \10 are legal within strings but not within files,
- Commas designate end-of-field,
- Line feeds, or line feeds plus carriage returns, designate end-of-record,
- Leading or trailing spaces are ignored,
- Quote marks may enclose fields or parts of fields,
- When enclosed by quote marks, commas and line feeds and spaces are treated as ordinary characters, and a pair of quote marks "" is treated as a single quote mark.
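The default rules above can be tried out with a short script. This is a minimal sketch, assuming it is run with the Tarantool interpreter (which bundles the csv module). The input string is a hypothetical example, not taken from the manual. According to the rules listed above, the first field should come back without its surrounding spaces, and the doubled quote marks inside the second field should collapse to single quote marks.

```lua
-- Sketch: run with the Tarantool interpreter, which bundles the csv module.
local csv = require('csv')

-- Hypothetical one-record input exercising the default rules:
--   field 1 has leading and trailing spaces outside quote marks,
--   field 2 is enclosed in quote marks and holds a comma and doubled quote marks,
--   field 3 is a plain field.
local input = ' a ,"b,""c"" ",d'

local records = csv.load(input)

-- Print each field between brackets so that any surrounding spaces are visible.
for i, record in ipairs(records) do
    for j, field in ipairs(record) do
        print(i, j, '[' .. field .. ']')
    end
end
```

Printing with brackets makes it easy to compare what the loader kept or dropped against each rule.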
The possible options which can be passed to csv functions are:
- delimiter = string (default: comma) – single-byte character to designate end-of-field
- quote_char = string (default: quote mark) – single-byte character to designate encloser of string
- chunk_size = number (default: 4096) – number of characters to read at once (usually for file-IO efficiency)
- skip_head_lines = number (default: 0) – number of lines to skip at the start (usually for a header)
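As an illustration of how these options combine, here is a minimal sketch that loads semicolon-delimited text and skips a header line. The input data and the variable names are hypothetical. The script assumes the Tarantool interpreter.

```lua
-- Sketch: run with the Tarantool interpreter, which bundles the csv module.
local csv = require('csv')

-- Hypothetical input: one header line, then two semicolon-delimited records.
-- "\n" is a Lua escape sequence, so each one becomes a line feed in the string.
local input = 'name;age\nalice;30\nbob;25'

local rows = csv.load(input, {
    delimiter = ';',      -- fields end at semicolons instead of commas
    skip_head_lines = 1,  -- skip the header line
})

-- Every loaded field is a string; convert explicitly if a number is needed.
for _, row in ipairs(rows) do
    print(row[1], tonumber(row[2]))
end
```

Options that are left out keep the defaults listed above, so only the values that differ need to be passed.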
Below is a list of all csv functions.
Name | Use |
---|---|
csv.load() | Load a CSV file |
csv.dump() | Transform input into a CSV-formatted string |
csv.iterate() | Iterate over CSV records |
csv.load(readable[, {options}])
Get CSV-formatted input from readable and return a table as output. Usually readable is either a string or a file opened for reading. Usually options is not specified.
Parameters: readable (a string, or an object opened for reading); options (table, optional; see the options listed above)
Return: loaded_value
Rtype: table
Example:
Readable string has 3 fields, field#2 has comma and space so use quote marks:
    tarantool> csv = require('csv')
    ---
    ...
    tarantool> csv.load('a,"b,c ",d')
    ---
    - - - a
        - 'b,c '
        - d
    ...
Readable string contains 2-byte character = Cyrillic Letter Palochka: (This displays a palochka if and only if character set = UTF-8.)
    tarantool> csv.load('a\\211\\128b')
    ---
    - - - a\211\128b
    ...
Semicolon instead of comma for the delimiter:
    tarantool> csv.load('a,b;c,d', {delimiter = ';'})
    ---
    - - - a,b
        - c,d
    ...
Readable file ./file.csv contains two CSV records. Explanation of fio is in section fio. Source CSV file and example respectively:
    tarantool> -- input in file.csv is:
    tarantool> -- a,"b,c ",d
    tarantool> -- a\\211\\128b
    tarantool> fio = require('fio')
    ---
    ...
    tarantool> f = fio.open('./file.csv', {'O_RDONLY'})
    ---
    ...
    tarantool> csv.load(f, {chunk_size = 4096})
    ---
    - - - a
        - 'b,c '
        - d
      - - a\\211\\128b
    ...
    tarantool> f:close()
    ---
    - true
    ...
csv.dump(csv-table[, options, writable])
Get table input from csv-table and return a CSV-formatted string as output. Or, get table input from csv-table and put the output in writable. Usually options is not specified. Usually writable, if specified, is a file opened for writing. csv.dump() is the reverse of csv.load().
Parameters: csv-table (table); options (table, optional; see the options listed above); writable (optional; usually a file opened for writing)
Return: dumped_value
Rtype: string, which is written to writable if specified
Example:
CSV-table has 3 fields, field#2 has "," so result has quote marks:
    tarantool> csv = require('csv')
    ---
    ...
    tarantool> csv.dump({'a','b,c ','d'})
    ---
    - 'a,"b,c ",d

      '
    ...
Round Trip: from string to table and back to string:
    tarantool> csv_table = csv.load('a,b,c')
    ---
    ...
    tarantool> csv.dump(csv_table)
    ---
    - 'a,b,c

      '
    ...
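The examples above return the CSV text as a string. The following is a minimal sketch of the other form, where the output goes to a writable. It assumes the Tarantool interpreter and uses the fio module to open the file. The path ./out.csv and the sample records are hypothetical.

```lua
-- Sketch: run with the Tarantool interpreter (csv and fio are bundled modules).
local csv = require('csv')
local fio = require('fio')

-- Hypothetical output path. O_CREAT needs a permission mode (octal 644 here).
local path = './out.csv'
local f = fio.open(path, {'O_WRONLY', 'O_CREAT', 'O_TRUNC'}, tonumber('644', 8))
if f == nil then
    error('cannot open ' .. path)
end

-- Two records; the second field of the first record holds a comma,
-- so it is expected to be enclosed in quote marks in the output.
local records = {
    {'a', 'b,c ', 'd'},
    {'e', 'f', 'g'},
}

-- An empty options table keeps the defaults; the third argument is the writable.
csv.dump(records, {}, f)
f:close()
```

Passing an empty table for options is only a placeholder so that the writable can be given as the third argument.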
csv.iterate(input, {options})
Form a Lua iterator function for going through CSV input one record at a time. Use of an iterator is strongly recommended if the amount of data is large (ten or more megabytes).
Parameters: input (a string or an object opened for reading, as for csv.load()); options (table; see the options listed above)
Return: Lua iterator function
Rtype: iterator function
Example:
csv.iterate() is the low level of csv.load() and csv.dump(). To illustrate that, here is a function which is the same as the csv.load() function, as seen in the Tarantool source code:
    tarantool> load = function(readable, opts)
             >   opts = opts or {}
             >   local result = {}
             >   for i, tup in csv.iterate(readable, opts) do
             >     result[i] = tup
             >   end
             >   return result
             > end
    ---
    ...
    tarantool> load('a,b,c')
    ---
    - - - a
        - b
        - c
    ...
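When the data is large, the iterator can be consumed directly instead of collecting every record into one table. The sketch below, which assumes the Tarantool interpreter, counts records and fields as they arrive. The helper name count_csv and the sample input are hypothetical.

```lua
-- Sketch: run with the Tarantool interpreter, which bundles the csv module.
local csv = require('csv')

-- Hypothetical helper: walk the input record by record and keep only counters,
-- so memory use does not grow with the number of records.
local function count_csv(readable, opts)
    local record_count, field_count = 0, 0
    for _, record in csv.iterate(readable, opts or {}) do
        record_count = record_count + 1
        field_count = field_count + #record
    end
    return record_count, field_count
end

-- A short string stands in for a large input here.
local n_records, n_fields = count_csv('a,b,c\nd,e,f')
print(n_records, n_fields)
```

A file opened for reading, such as the handle from fio.open() in the csv.load() example, can be passed as readable in the same way, optionally with a larger chunk_size.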