Faster Command Line Tools in Nim
25 May 2017 Euan Torano
This is a guest post by Euan Torano cross-posted from Faster Command Line Tools in Nim. If you would like to publish articles as a guest author on nim-lang.org then get in touch with us via Twitter or otherwise.
After having read the blog post titled
Faster Command Line Tools in D by Jon Degenhardt on the D Blog, I was interested to see how Nim would stack up performing the same task.
Rather than explain the task again, I might as well quote the reference post that inspired me:
It’s a common programming task: Take a data file with fields separated by a delimiter (comma, tab, etc), and run a mathematical calculation involving several of the fields. Often these programs are one-time use scripts, other times they have longer shelf life. Speed is of course appreciated when the program is used more than a few times on large files.
The specific exercise we’ll explore starts with files having keys in one field, integer values in another. The task is to sum the values for each key and print the key with the largest sum.
With the first field as key, second field as value, the key with the max sum is B, with a total of 13.
Fields are delimited by a TAB, and there may be any number of fields on a line. The file name and field numbers of the key and value are passed as command line arguments.
Implementing the Nim program, first try
This is quite a simple program. All that we need to do is read a file, parse it as CSV data with a tab delimiter and sum the counts.
Luckily, Nim’s standard library has many useful modules to help with common tasks like these:
- The streams module provides a
FileStreamtype for reading from and writing to files.
- The parsecsv module provides a simple high performance CSV parser.
- The tables module provides a
CountTabletype designed to map a key to its number of occurrences - precisely the task we’re trying to accomplish!
The code is pretty simple, so let’s start by looking at it in its entirety:
import os, strutils, streams, tables, parsecsv const Delim = '\t' proc main() = if paramCount() < 3: quit("synopsis: " & getAppFilename() & " filename keyfield valuefield") let filename = paramStr(1) keyFieldIndex = parseInt(paramStr(2)) valueFieldIndex = parseInt(paramStr(3)) maxFieldIndex = max(keyFieldIndex, valueFieldIndex) var sumByKey = newCountTable[string]() file = newFileStream(filename, fmRead) if file == nil: quit("cannot open the file " & filename) defer: file.close() var csv: CsvParser open(csv, file, filename, separator=Delim) while csv.readRow(): if len(csv.row) > maxFieldIndex: sumByKey.inc(csv.row[keyFieldIndex], parseInt(csv.row[valueFieldIndex])) if sumByKey.len() == 0: echo "No entries" else: let largest = sumByKey.largest() echo "max_key: ", largest, " sum: ", largest main()
Most of this should be pretty self explanatory. I’ve used the same names as those used in the D code samples provided in the original blog post to keep things looking similar.
Obviously the start of the program is all about making sure we have enough command line arguments, and collecting them up. In Nim, we use
paramStr() from the os module to access command line arguments.
From there, we create a new count table, open the CSV file for reading (
fmRead opens the file in read mode) and start the CSV parser.
We then read through all of the rows in the CSV file, and if the row has enough fields to be a valid row, we increment its entry within the count table.
With this initial implementation complete, I built it in release mode (
nim c -d:release test_csv.nim) and benchmarked it against the Python implementation and the final optimised D implementation (
The versions of the tools used in this post are as follows:
DMD64 D Compiler v2.074.0
LDC - the LLVM D compiler (1.2.0): based on DMD v2.072.2 and LLVM 4.0.0
Nim Compiler Version 0.17.0 (2017-05-18) [MacOSX: amd64]
Apple LLVM version 8.1.0 (clang-802.0.42)
Testing was performed against the D version compiled using both the reference
DMD D compiler, and the
LDC LLVM based D compiler. The compilation switches used for the D versions were as follows:
dmd -O -release -inline -boundscheck=off -of=./D/csv_test ./D/csv_test.d
ldc2 -of=./D/csv_test_ldc -O -release -boundscheck=off ./D/csv_test.d
I then ran all of these implementations against the same ngram file from the Google Books project, making sure I got the expected output, which is as follows:
max_key: 2006 sum: 22569013
The benchmark was ran on my mid 2014 MacBook Pro with a 2.8GHz Intel Core i7 running macOS Sierra 10.12.4. The benchmark made use of the
repeat.rb script from Kostya’s benchmarks project to run each version 10 times and report the fastest run time. This script also provides the added bonus of providing the peak amount of memory used by each version. Here are the results:
Python... max_key: 2006 sum: 22569013 21.70s max_key: 2006 sum: 22569013 32.06s max_key: 2006 sum: 22569013 17.83s max_key: 2006 sum: 22569013 18.29s max_key: 2006 sum: 22569013 23.73s max_key: 2006 sum: 22569013 18.09s max_key: 2006 sum: 22569013 17.58s max_key: 2006 sum: 22569013 17.86s max_key: 2006 sum: 22569013 16.92s max_key: 2006 sum: 22569013 15.98s MIN TIME: 15.98s PEAK MEM: 7.9Mb D (DMD)... max_key: 2006 sum: 22569013 2.45s max_key: 2006 sum: 22569013 2.33s max_key: 2006 sum: 22569013 2.24s max_key: 2006 sum: 22569013 2.36s max_key: 2006 sum: 22569013 2.28s max_key: 2006 sum: 22569013 2.21s max_key: 2006 sum: 22569013 2.28s max_key: 2006 sum: 22569013 2.21s max_key: 2006 sum: 22569013 2.28s max_key: 2006 sum: 22569013 2.30s MIN TIME: 2.21s PEAK MEM: 2.3Mb D (LDC)... max_key: 2006 sum: 22569013 1.42s max_key: 2006 sum: 22569013 1.34s max_key: 2006 sum: 22569013 1.34s max_key: 2006 sum: 22569013 1.33s max_key: 2006 sum: 22569013 1.88s max_key: 2006 sum: 22569013 1.31s max_key: 2006 sum: 22569013 1.35s max_key: 2006 sum: 22569013 1.36s max_key: 2006 sum: 22569013 1.29s max_key: 2006 sum: 22569013 1.26s MIN TIME: 1.26s PEAK MEM: 2.4Mb Nim... max_key: 2006 sum: 22569013 1.36s max_key: 2006 sum: 22569013 1.30s max_key: 2006 sum: 22569013 1.31s max_key: 2006 sum: 22569013 1.35s max_key: 2006 sum: 22569013 1.36s max_key: 2006 sum: 22569013 1.45s max_key: 2006 sum: 22569013 1.47s max_key: 2006 sum: 22569013 1.50s max_key: 2006 sum: 22569013 1.53s max_key: 2006 sum: 22569013 1.56s MIN TIME: 1.30s PEAK MEM: 1.9Mb
D (compiled with LDC and using the fifth iteration of the code from the original blog post) comes in fastest at 1.26 seconds. Nim comes in second, at 1.30 seconds - only 0.04 seconds behind.
Nim does use 0.5Mb less memory than the D version built with LDC, and is doing slightly more work - the
parsecsv module will handle escaped values, unlike the naive approach of simply splitting strings on the tab character.
Python is clearly the loser, taking 12 times as long to complete the processing as the Nim version does and using 4 times the memory to do so.
This test also shows the gulf between the DMD and LDC compilers. The same code compiled with the DMD compiler takes almost a whole second longer to execute than it does when compiled with LDC, though it does use 0.1Mb less memory at its peak.
I’m sure that with some tuning, I could get the Nim version to perform even quicker, but for a 0.04 second time difference, I decided not to put that effort in.
Benchmarking compilation times
Having come this far, I though I might as well benchmark how long compilation times are for the two D implementations and the Nim implementation. Obviously, the Python implementation is excluded from this test.
I cleared out all build artefacts (such as the
*.o files for D and the
nimcache folder for Nim) along with the built executables, then simply timed each build. Here are the results:
It turns out that the DMD compiler is much quicker than the LDC compiler. Obviously all the optimisations that LLVM makes take a little while longer to complete.
And that was the end of my little test. It turns out that Nim can certainly hold its own when it comes to parsing CSV files.
I’d be interested to see how things run on other machines and operating systems to see how things are across platforms. I’m also sure there are far better ways I could benchmark these implementations, which is why I’ve posted the project publicly on GitHub - I’d welcome any enhancements to any of the versions or the benchmarking scripts, and would certainly welcome some more results to see how things stack up!