std/unidecode

This module is based on Python's Unidecode module by Tomaz Solc, which in turn is based on the Text::Unidecode Perl module by Sean M. Burke.

It provides a unidecode proc that transliterates Unicode to ASCII: it finds the sequence of ASCII characters that most closely approximates the Unicode string.

For example, the closest ASCII approximation of the string "Äußerst" is "Ausserst". Some information is lost in this transformation, since several Unicode strings can map to the same ASCII representation, so it is strictly one-way. However, a human reader will probably still be able to guess from the context what the original string was.

This module needs the data file unidecode.dat to work. By default, this file is embedded as a resource into your application. Alternatively, you can define the symbol noUnidecodeTable at compile time (--define:noUnidecodeTable) and call the loadUnidecodeTable proc at run time to initialize this module.
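The two initialization modes described above can be sketched as follows. This is a minimal illustration, not a prescribed setup: the data file path passed to loadUnidecodeTable is an assumption (a copy of unidecode.dat in the current working directory), and the when branch is only compiled in if the program is built with --define:noUnidecodeTable.

```nim
import std/unidecode

when defined(noUnidecodeTable):
  # The table was not embedded at compile time, so load it once from the
  # main thread before any call to unidecode.
  # Assumption: unidecode.dat is in the current working directory.
  loadUnidecodeTable("unidecode.dat")

# In the default build the table is embedded and no setup call is needed.
doAssert unidecode("Äußerst") == "Ausserst"
```

Guarding the call with when defined(noUnidecodeTable) lets the same source compile in both configurations without requiring the data file at run time in the default build.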

Procs

proc loadUnidecodeTable(datafile = "unidecode.dat") {.raises: [], tags: [], forbids: [].}
Loads the data file that unidecode needs to work. This is only required if the module was compiled with the --define:noUnidecodeTable switch. It must be called by the main thread before any thread calls unidecode.
proc unidecode(s: string): string {.raises: [], tags: [], forbids: [].}
Finds the sequence of ASCII characters that is the closest approximation to the UTF-8 string s.

Example:

doAssert unidecode("北京") == "Bei Jing "
doAssert unidecode("Äußerst") == "Ausserst"