This module implements a simple, high-performance CSV (comma-separated values) parser.
Basic usage
    import parsecsv
    from os import paramStr
    from streams import newFileStream

    var s = newFileStream(paramStr(1), fmRead)
    if s == nil:
      quit("cannot open the file " & paramStr(1))

    var x: CsvParser
    open(x, s, paramStr(1))
    while readRow(x):
      echo "new row: "
      for val in items(x.row):
        echo "##", val, "##"
    close(x)
For CSV files with a header row, the header can be read and then used as a reference for item access with rowEntry:
    import parsecsv

    # Prepare a file
    let content = """One,Two,Three,Four
    1,2,3,4
    10,20,30,40
    100,200,300,400
    """
    writeFile("temp.csv", content)

    var p: CsvParser
    p.open("temp.csv")
    p.readHeaderRow()
    while p.readRow():
      echo "new row: "
      for col in items(p.headers):
        echo "##", col, ":", p.rowEntry(col), "##"
    p.close()
See also
- streams module for the open proc and other stream handling (such as the close proc)
- parseopt module for a command line parser
- parsecfg module for a configuration file parser
- parsexml module for an XML / HTML parser
- parsesql module for a SQL parser
- other parsers for a list of the remaining parser modules
Types
CsvRow = seq[string]
- A row in a CSV file.
    CsvParser = object of BaseLexer
      row*: CsvRow
      filename: string
      sep, quote, esc: char
      skipWhite: bool
      currRow: int
      headers*: seq[string]
-
The parser object.
It consists of two public fields:
- row is the current row
- headers are the columns defined in the CSV file (read using readHeaderRow); they are used with rowEntry.
CsvError = object of IOError
- An exception that is raised if a parsing error occurs.
Procs
proc open(my: var CsvParser; input: Stream; filename: string; separator = ','; quote = '\"'; escape = '\x00'; skipInitialSpace = false) {.raises: [Defect, IOError, OSError], tags: [ReadIOEffect].}
-
Initializes the parser with an input stream. filename is only used for error messages. The parser's behaviour can be controlled by the following optional parameters:
- separator: the character used to separate fields.
- quote: the character used to quote fields containing special characters such as the separator, the quote character, or new-line characters. '\0' disables the parsing of quotes.
- escape: removes any special meaning from the following character; '\0' disables escaping. If escaping is disabled and quote is not '\0', two consecutive quote characters are parsed as one literal quote character.
- skipInitialSpace: if true, whitespace immediately following the separator is ignored.
See also:
- open proc which creates the file stream for you
Examples:
    import parsecsv, streams

    var strm = newStringStream("One,Two,Three\n1,2,3\n10,20,30")
    var parser: CsvParser
    parser.open(strm, "tmp.csv")
    parser.close()
    strm.close()
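The optional parameters described above can be combined as in the following minimal sketch. The inline data and the name "inline.csv" are made up for illustration. The sketch assumes a semicolon-separated input with a space after each separator, a quoted field containing the separator, and a doubled quote that should come out as one literal quote (escaping is left disabled).

```nim
import parsecsv, streams

# Hypothetical inline data: ';' as separator, a space after each separator,
# a quoted field containing the separator, and a doubled quote character.
let data = "name; note\n\"Doe; Jane\"; \"said \"\"hi\"\"\""
var strm = newStringStream(data)
var parser: CsvParser
parser.open(strm, "inline.csv", separator = ';', skipInitialSpace = true)

# Collect a copy of every row before the parser is closed.
var rows: seq[CsvRow]
while parser.readRow():
  rows.add parser.row
parser.close()

doAssert rows[0] == @["name", "note"]
doAssert rows[1] == @["Doe; Jane", "said \"hi\""]
```

Without skipInitialSpace = true, the space after each separator would become part of the following field, and the second quoted field would no longer be recognised as quoted.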
proc open(my: var CsvParser; filename: string; separator = ','; quote = '\"'; escape = '\x00'; skipInitialSpace = false) {.raises: [CsvError, Defect, IOError, OSError], tags: [ReadIOEffect].}
-
Similar to the other open proc, but creates the file stream for you.
Examples:
    import parsecsv
    from os import removeFile

    writeFile("tmp.csv", "One,Two,Three\n1,2,3\n10,20,300")
    var parser: CsvParser
    parser.open("tmp.csv")
    parser.close()
    removeFile("tmp.csv")
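The raises list of this overload includes CsvError, which the stream-based open does not. The sketch below assumes that this is how a file that cannot be opened is reported. The path is hypothetical and assumed not to exist.

```nim
import parsecsv

var parser: CsvParser
var openFailed = false
try:
  # Hypothetical path, assumed not to exist on this machine.
  parser.open("no_such_file_hypothetical.csv")
except CsvError:
  # Assumption: a file that cannot be opened is reported as CsvError.
  openFailed = true
  echo "could not open: ", getCurrentExceptionMsg()

doAssert openFailed
```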
proc processedRows(my: var CsvParser): int {.raises: [], tags: [].}
-
Returns the number of processed rows.
Note that the counter is incremented on every call to readRow, even when readRow has reached EOF and returns false.
Examples:
    import parsecsv, streams

    var strm = newStringStream("One,Two,Three\n1,2,3")
    var parser: CsvParser
    parser.open(strm, "tmp.csv")
    doAssert parser.readRow()
    doAssert parser.processedRows() == 1
    doAssert parser.readRow()
    doAssert parser.processedRows() == 2
    ## The counter is incremented even when `readRow` has reached EOF.
    doAssert parser.readRow() == false
    doAssert parser.processedRows() == 3
    doAssert parser.readRow() == false
    doAssert parser.processedRows() == 4
    parser.close()
    strm.close()
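Because the counter also advances on the final, unsuccessful readRow call, code that needs the number of rows actually read may prefer its own counter. A minimal sketch, with made-up inline data and a hypothetical variable rowsRead:

```nim
import parsecsv, streams

var strm = newStringStream("One,Two\n1,2\n3,4")
var parser: CsvParser
parser.open(strm, "inline.csv")

# Count only the calls to readRow that actually returned a row.
var rowsRead = 0
while parser.readRow():
  inc rowsRead

# The failed call at EOF is included in processedRows.
let processed = parser.processedRows()
parser.close()

doAssert rowsRead == 3
doAssert processed == 4
```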
proc readRow(my: var CsvParser; columns = 0): bool {.raises: [Defect, IOError, OSError, CsvError], tags: [ReadIOEffect].}
-
Reads the next row. If columns > 0, the row is expected to have exactly that many columns; a row with a different number of columns raises CsvError. Returns false if the end of the file has been reached, true otherwise.
Blank lines are skipped.
Examples:
    import parsecsv, streams

    var strm = newStringStream("One,Two,Three\n1,2,3\n\n10,20,30")
    var parser: CsvParser
    parser.open(strm, "tmp.csv")
    doAssert parser.readRow()
    doAssert parser.row == @["One", "Two", "Three"]
    doAssert parser.readRow()
    doAssert parser.row == @["1", "2", "3"]
    ## Blank lines are skipped.
    doAssert parser.readRow()
    doAssert parser.row == @["10", "20", "30"]

    var emptySeq: seq[string]
    doAssert parser.readRow() == false
    doAssert parser.row == emptySeq
    doAssert parser.readRow() == false
    doAssert parser.row == emptySeq

    parser.close()
    strm.close()
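The columns parameter can serve as a cheap consistency check. The following sketch uses made-up inline data whose second row is one field short, and assumes that the mismatch is reported as CsvError, as the raises list suggests.

```nim
import parsecsv, streams

# Hypothetical data: the second row has only two fields.
var strm = newStringStream("a,b,c\n1,2")
var parser: CsvParser
parser.open(strm, "inline.csv")

# The first row has the expected three columns.
doAssert parser.readRow(columns = 3)

# The second row does not, so readRow is expected to raise.
var mismatch = false
try:
  discard parser.readRow(columns = 3)
except CsvError:
  mismatch = true
  echo "bad row: ", getCurrentExceptionMsg()
parser.close()

doAssert mismatch
```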
proc close(my: var CsvParser) {.inline, raises: [Exception, IOError, OSError], tags: [WriteIOEffect].}
- Closes the parser my and its associated input stream.
proc readHeaderRow(my: var CsvParser) {.raises: [Defect, IOError, OSError, CsvError], tags: [ReadIOEffect].}
-
Reads the first row and creates a look-up table for column numbers.
See also:
- rowEntry proc
Examples:
    import parsecsv, streams

    var strm = newStringStream("One,Two,Three\n1,2,3")
    var parser: CsvParser
    parser.open(strm, "tmp.csv")

    parser.readHeaderRow()
    doAssert parser.headers == @["One", "Two", "Three"]
    doAssert parser.row == @["One", "Two", "Three"]

    doAssert parser.readRow()
    doAssert parser.headers == @["One", "Two", "Three"]
    doAssert parser.row == @["1", "2", "3"]

    parser.close()
    strm.close()
proc rowEntry(my: var CsvParser; entry: string): var string {.raises: [], tags: [].}
-
Accesses a specified entry from the current row.
Assumes that readHeaderRow has already been called.
Examples:
    import parsecsv, streams

    var strm = newStringStream("One,Two,Three\n1,2,3\n\n10,20,30")
    var parser: CsvParser
    parser.open(strm, "tmp.csv")
    ## `readHeaderRow` must be called first.
    parser.readHeaderRow()
    doAssert parser.readRow()
    doAssert parser.rowEntry("One") == "1"
    doAssert parser.rowEntry("Two") == "2"
    doAssert parser.rowEntry("Three") == "3"
    ## `parser.rowEntry("NotExistEntry")` causes a SIGSEGV fault.
    parser.close()
    strm.close()
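Since looking up a header that does not exist is unsafe, one possible guard is to check the public headers field before calling rowEntry. The helper entryOr below is hypothetical and not part of parsecsv. It is a sketch that assumes the current row has at least as many fields as there are headers.

```nim
import parsecsv, streams

# `entryOr` is a hypothetical helper, not part of parsecsv.
proc entryOr(p: var CsvParser; name, default: string): string =
  ## Returns the entry under header `name`, or `default` if no such header exists.
  if name in p.headers:
    result = p.rowEntry(name)
  else:
    result = default

var strm = newStringStream("One,Two\n1,2")
var parser: CsvParser
parser.open(strm, "inline.csv")
parser.readHeaderRow()
doAssert parser.readRow()

let known = parser.entryOr("Two", "n/a")
let missing = parser.entryOr("NotExistEntry", "n/a")
parser.close()

doAssert known == "2"
doAssert missing == "n/a"
```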