What is NRE?

A regular expression library for Nim using PCRE to do the hard work.

For details on how to write patterns, see the official PCRE pattern documentation. A wide variety of third-party documentation and tools is also available online.

Note: If you love sequtils.toSeq we have bad news for you. This library doesn't work with it due to documented compiler limitations. As a workaround, use this:

import nre except toSeq

Licensing

PCRE has some additional terms that you must agree to in order to use this module.

Example:

let vowels = re"[aeoui]"

let expectedResults = [
  1 .. 1,
  2 .. 2,
  4 .. 4,
  6 .. 6,
  7 .. 7,
]
var i = 0
for match in "moigagoo".findIter(vowels):
  doAssert match.matchBounds == expectedResults[i]
  inc i

let firstVowel = "foo".find(vowels)
let hasVowel = firstVowel.isSome()
if hasVowel:
  let matchBounds = firstVowel.get().captureBounds[-1]
  doAssert matchBounds.a == 1

## As with module `re`, unless specified otherwise, the `start` parameter in
## each proc indicates where the scan starts, but outputs are relative to the
## start of the input string, not to `start`:
doAssert find("uxabc", re"(?<=x|y)ab", start = 1).get.captures[-1] == "ab"
doAssert find("uxabc", re"ab", start = 3).isNone

Example:

# This MUST be kept in sync with the examples in RegexMatch
doAssert "abc".match(re"(\w)").get.captures[0] == "a"
doAssert "abc".match(re"(?<letter>\w)").get.captures["letter"] == "a"
doAssert "abc".match(re"(\w)\w").get.captures[-1] == "ab"

doAssert "abc".match(re"(\w)").get.captureBounds[0] == 0 .. 0
doAssert 0 in "abc".match(re"(\w)").get.captureBounds == true
doAssert "abc".match(re"").get.captureBounds[-1] == 0 .. -1
doAssert "abc".match(re"abc").get.captureBounds[-1] == 0 .. 2

Imports

pcre, util, tables, strutils, math, options, unicode

Types

Regex = ref object
  pattern*: string  ## not nil
  pcreObj: ptr pcre.Pcre  ## not nil
  pcreExtra: ptr pcre.ExtraData  ## nil
  captureNameToId: Table[string, int]

Represents the pattern that things are matched against, constructed with re(string). Examples: re"foo", re(r"(*ANYCRLF)(?x)foo # comment").

pattern: string: the string that was used to create the pattern. For details on how to write a pattern, please see the official PCRE pattern documentation.
captureCount: int: the number of captures that the pattern has.
captureNameId: Table[string, int]: a table from the capture names to their numeric id.
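
The three accessors above can be exercised as in the following minimal sketch. The pattern and the names datePattern and ids are made up for illustration; the sketch checks only that the capture names are present in the table and makes no claim about their numeric ids.

```nim
import nre, tables

# Hypothetical pattern with two named capture groups.
let datePattern = re"(?<year>\d{4})-(?<month>\d{2})"

# `pattern` returns the string the Regex was built from.
doAssert datePattern.pattern == r"(?<year>\d{4})-(?<month>\d{2})"

# Both groups count as captures.
doAssert datePattern.captureCount == 2

# `captureNameId` maps each capture name to its numeric id.
let ids = datePattern.captureNameId
doAssert ids.len == 2
doAssert "year" in ids and "month" in ids
```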

Options

The following options may appear anywhere in the pattern, and they affect the rest of it.

(?i) - case insensitive
(?m) - multi-line: ^ and $ match the beginning and end of lines, not of the subject string
(?s) - . also matches newline (dotall)
(?U) - expressions are not greedy by default. ? can be added to a quantifier to make it greedy
(?x) - whitespace and comments (#) are ignored (extended)
(?X) - character escapes without special meaning (\w vs. \a) are errors (extra)
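
As a rough illustration of the inline options listed above, the sketch below compares a pattern with and without each option. The subject strings are invented, and the (?m) and (?s) cases assume the linked PCRE library uses \n as its default newline.

```nim
import nre, options

# (?i): case-insensitive matching
doAssert "FOO".match(re"foo").isNone
doAssert "FOO".match(re"(?i)foo").isSome

# (?m): ^ and $ also match at line boundaries inside the subject
doAssert "one\ntwo".find(re"^two$").isNone
doAssert "one\ntwo".find(re"(?m)^two$").isSome

# (?s): . also matches a newline
doAssert "a\nb".find(re"a.b").isNone
doAssert "a\nb".find(re"(?s)a.b").isSome

# (?U): quantifiers are lazy by default; a trailing ? makes them greedy
doAssert "aaa".find(re"(?U)a+").get.match == "a"
doAssert "aaa".find(re"(?U)a+?").get.match == "aaa"

# (?x): whitespace and # comments in the pattern are ignored
doAssert "foo".match(re"(?x) f o o  # comment").isSome
```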

One or a combination of these options may appear only at the beginning of the pattern:

(*UTF8) - treat both the pattern and subject as UTF-8
(*UCP) - Unicode character properties; \w matches я
(*U) - a combination of the two options above
(*FIRSTLINE) - fails if there is not a match on the first line
(*NO_AUTO_CAPTURE) - turn off auto-capture for groups; (?<name>...) can be used to capture
(*CR) - newlines are separated by \r
(*LF) - newlines are separated by \n (UNIX default)
(*CRLF) - newlines are separated by \r\n (Windows default)
(*ANYCRLF) - newlines are separated by any of the above
(*ANY) - newlines are separated by any of the above and Unicode newlines:
single characters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit library, the last two are recognized only in UTF-8 mode. — man pcre
(*JAVASCRIPT_COMPAT) - JavaScript compatibility
(*NO_STUDY) - turn off studying; study is enabled by default

For more details on the leading option groups, see the Option Setting and the Newline Convention sections of the PCRE syntax manual.
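
A small sketch of two leading option groups in use, assuming the linked PCRE library was built with UTF-8 and Unicode property support and that the source file is saved as UTF-8. The subject strings are illustrative only.

```nim
import nre, options

# (*ANYCRLF) combined with (?m): $ matches before the \r\n line ending.
doAssert "one\r\ntwo".find(re"(*ANYCRLF)(?m)^one$").isSome

# (*U) combines (*UTF8) and (*UCP), so \w matches a Cyrillic letter.
doAssert "я".match(re"(*U)\w").get.match == "я"
```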

Some of these options are not part of PCRE and are converted by nre into PCRE flags. These include NEVER_UTF, ANCHORED, DOLLAR_ENDONLY, FIRSTLINE, NO_AUTO_CAPTURE, JAVASCRIPT_COMPAT, U, NO_STUDY. In other PCRE wrappers, you will need to pass these as separate flags to PCRE.
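
To illustrate two of the nre-converted options named above, here is a minimal sketch; the pattern, the subject strings, and the name pat are hypothetical.

```nim
import nre, options

# (*NO_AUTO_CAPTURE): plain parentheses no longer capture,
# so only the named group is counted.
let pat = re"(*NO_AUTO_CAPTURE)(\d+)-(?<suffix>\w+)"
doAssert pat.captureCount == 1
doAssert "42-abc".match(pat).get.captures["suffix"] == "abc"

# (*ANCHORED): the pattern may only match at the position where the scan starts.
doAssert "xab".find(re"ab").isSome
doAssert "xab".find(re"(*ANCHORED)ab").isNone
```

In a wrapper that does not perform this conversion, the same behaviour would require passing the corresponding flags to PCRE when the pattern is compiled.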