Module nre

What is NRE?

A regular expression library for Nim using PCRE to do the hard work.

Note: If you love sequtils.toSeq we have bad news for you. This library doesn't work with it due to documented compiler limitations. As a workaround, use this:

import nre except toSeq

Licensing

PCRE has some additional terms that you must agree to in order to use this module.

Example

import nre

let vowels = re"[aeoui]"

for match in "moigagoo".findIter(vowels):
  echo match.matchBounds
# (a: 1, b: 1)
# (a: 2, b: 2)
# (a: 4, b: 4)
# (a: 6, b: 6)
# (a: 7, b: 7)

let firstVowel = "foo".find(vowels)
let hasVowel = firstVowel.isSome()
if hasVowel:
  let matchBounds = firstVowel.get().captureBounds[-1]
  echo "first vowel @", matchBounds.get().a
  # first vowel @1

Types

Regex = ref object
  pattern*: string             ## not nil
  pcreObj: ptr pcre.Pcre        ## not nil
  pcreExtra: ptr pcre.ExtraData ## nil
  captureNameToId: Table[string, int]
Represents the pattern that things are matched against, constructed with re(string). Examples: re"foo", re(r"(*ANYCRLF)(?x)foo # comment").
pattern: string
the string that was used to create the pattern.
captureCount: int
the number of captures that the pattern has.
captureNameId: Table[string, int]
a table from the capture names to their numeric id.

Options

The following options may appear anywhere in the pattern, and they affect the rest of it.

  • (?i) - case insensitive
  • (?m) - multi-line: ^ and $ match the beginning and end of lines, not of the subject string
  • (?s) - . also matches newline (dotall)
  • (?U) - expressions are not greedy by default. ? can be added to a quantifier to make it greedy
  • (?x) - whitespace and comments (#) are ignored (extended)
  • (?X) - character escapes without special meaning (\w vs. \a) are errors (extra)

One or a combination of these options may appear only at the beginning of the pattern:

  • (*UTF8) - treat both the pattern and subject as UTF-8
  • (*UCP) - Unicode character properties; \w matches я
  • (*U) - a combination of the two options above
  • (*FIRSTLINE) - fails if there is not a match on the first line
  • (*NO_AUTO_CAPTURE) - turn off auto-capture for groups; (?<name>...) can be used to capture
  • (*CR) - newlines are separated by \r
  • (*LF) - newlines are separated by \n (UNIX default)
  • (*CRLF) - newlines are separated by \r\n (Windows default)
  • (*ANYCRLF) - newlines are separated by any of the above
  • (*ANY) - newlines are separated by any of the above and Unicode newlines:

    single characters VT (vertical tab, U+000B), FF (form feed, U+000C), NEL (next line, U+0085), LS (line separator, U+2028), and PS (paragraph separator, U+2029). For the 8-bit library, the last two are recognized only in UTF-8 mode. — man pcre

  • (*JAVASCRIPT_COMPAT) - JavaScript compatibility
  • (*NO_STUDY) - turn off studying; study is enabled by default

For more details on the leading option groups, see the Option Setting and the Newline Convention sections of the PCRE syntax manual.
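As a rough illustration of the inline options listed above, the following sketch compares a few patterns with and without an option group. The subject strings are made up for the example, and it assumes the PCRE shared library is available at run time and was built with LF as its default newline.

```nim
import nre, options

# (?i): case-insensitive matching.
let plainFoo = "FOO".find(re"foo").isSome
let insensitiveFoo = "FOO".find(re"(?i)foo").isSome

# (?s): `.` also matches a newline.
let dotAll = "a\nb".find(re"(?s)a.b").isSome

# (?m): ^ and $ match at line boundaries, not only at the ends of the subject.
let singleLine = "a\nb".find(re"^b$").isSome
let multiLine = "a\nb".find(re"(?m)^b$").isSome

# (?x): whitespace and # comments inside the pattern are ignored.
let extended = "foo".find(re"(?x) f o o # spaces are ignored").isSome

echo plainFoo, " ", insensitiveFoo, " ", dotAll, " ", singleLine, " ", multiLine, " ", extended
```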

RegexMatch = object
  pattern*: Regex              ## The regex doing the matching.
                ## Not nil.
  str*: string                 ## The string that was matched against.
             ## Not nil.
  pcreMatchBounds: seq[HSlice[cint, cint]] ## First item is the bounds of the match
                                        ## Other items are the captures
                                        ## `a` is inclusive start, `b` is exclusive end
  
Usually seen as Option[RegexMatch], it represents the result of an execution. On failure, it is none; on success, it is some.
pattern: Regex
the pattern that is being matched
str: string
the string that was matched against
captures[]: string
the string value of whatever was captured at that id. If the id is invalid, then behavior is undefined. If the id is -1, then the whole match is returned. If the given capture was not matched, nil is returned.
  • "abc".match(re"(\w)").get.captures[0] == "a"
  • "abc".match(re"(?<letter>\w)").get.captures["letter"] == "a"
  • "abc".match(re"(\w)\w").get.captures[-1] == "ab"
captureBounds[]: Option[HSlice[int, int]]
gets the bounds of the given capture according to the same rules as the above. If the capture is not filled, then none is returned. The bounds are both inclusive.
  • "abc".match(re"(\w)").get.captureBounds[0] == some(0 .. 0)
  • "abc".match(re"").get.captureBounds[-1] == some(0 .. -1)
  • "abc".match(re"abc").get.captureBounds[-1] == some(0 .. 2)
match: string
the full text of the match.
matchBounds: HSlice[int, int]
the bounds of the match, as in captureBounds[]
(captureBounds|captures).toTable
returns a table with each named capture as a key.
(captureBounds|captures).toSeq
returns all the captures by their number.
$: string
same as match
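The accessors described above can be sketched on a single match as follows. The input string and the capture names (name, val) are hypothetical, chosen only for illustration. The sketch sticks to accessors whose return types are plain strings or slices, and it imports tables explicitly on the assumption that nre does not re-export it.

```nim
import nre, options, tables

# Hypothetical input and capture names, for illustration only.
let m = "key=value".find(re"(?<name>\w+)=(?<val>\w+)").get()

let whole = m.match               # full text of the match
let first = m.captures[0]         # first capture, by number
let named = m.captures["val"]     # capture by name
let viaMinusOne = m.captures[-1]  # id -1 is the whole match
let bounds = m.matchBounds        # inclusive on both ends
let byName = m.captures.toTable() # named captures as a table

echo whole, " ", first, " ", named, " ", bounds.a, "..", bounds.b
```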
Captures = distinct RegexMatch
CaptureBounds = distinct RegexMatch
RegexError = ref object of Exception
RegexInternalError = ref object of RegexError
  
Internal error in the module; this probably means that there is a bug.
InvalidUnicodeError = ref object of RegexError
  pos*: int                    ## the location of the invalid unicode in bytes
  
Thrown when matching fails due to invalid Unicode in strings.
SyntaxError = ref object of RegexError
  pos*: int                    ## the location of the syntax error in bytes
  pattern*: string             ## the pattern that caused the problem
  
Thrown when there is a syntax error in the regular expression string passed in.
StudyError = ref object of RegexError
  
Thrown when studying the regular expression fails for whatever reason. The message contains the error code.
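One way these error types might be used is to check a pattern before relying on it. In the sketch below, isValidPattern is a hypothetical helper, not part of nre; it only assumes that re raises SyntaxError for a malformed pattern, as described above.

```nim
import nre

# Hypothetical helper (not part of nre): reports whether `pattern` compiles.
proc isValidPattern(pattern: string): bool =
  try:
    discard re(pattern)
    result = true
  except SyntaxError:
    # The pattern could not be compiled.
    result = false

echo isValidPattern("a+b")       # a well-formed pattern
echo isValidPattern("(unclosed") # missing closing parenthesis
```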

Procs

proc captureCount(pattern: Regex): int {...}{.raises: [FieldError, ValueError], tags: [].}
proc captureNameId(pattern: Regex): Table[string, int] {...}{.raises: [], tags: [].}
proc captureBounds(pattern: RegexMatch): CaptureBounds {...}{.raises: [], tags: [].}
proc captures(pattern: RegexMatch): Captures {...}{.raises: [], tags: [].}
proc `[]`(pattern: CaptureBounds; i: int): Option[HSlice[int, int]] {...}{.raises: [],
    tags: [].}
Returns the bounds of capture i, following the rules described for captureBounds[] under RegexMatch.
proc `[]`(pattern: Captures; i: int): string {...}{.raises: [UnpackError], tags: [].}
proc match(pattern: RegexMatch): string {...}{.raises: [UnpackError], tags: [].}
proc matchBounds(pattern: RegexMatch): HSlice[int, int] {...}{.raises: [UnpackError],
    tags: [].}
proc `[]`(pattern: CaptureBounds; name: string): Option[HSlice[int, int]] {...}{.
    raises: [KeyError], tags: [].}
proc `[]`(pattern: Captures; name: string): string {...}{.raises: [UnpackError, KeyError],
    tags: [].}
proc toTable(pattern: Captures; default: string = ""): Table[string, string] {...}{.
    raises: [UnpackError, KeyError], tags: [].}
Returns a table with each named capture as a key.
proc toTable(pattern: CaptureBounds; default = none(HSlice[int, int])): Table[string,
    Option[HSlice[int, int]]] {...}{.raises: [KeyError], tags: [].}
Returns a table with each named capture as a key.
proc toSeq(pattern: CaptureBounds; default = none(HSlice[int, int])): seq[
    Option[HSlice[int, int]]] {...}{.raises: [FieldError, ValueError], tags: [].}
Returns all the captures by their number.
proc toSeq(pattern: Captures; default: string = ""): seq[string] {...}{.
    raises: [FieldError, ValueError, UnpackError], tags: [].}
Returns all the captures by their number.
proc `$`(pattern: RegexMatch): string {...}{.raises: [UnpackError], tags: [].}
proc `==`(a, b: Regex): bool {...}{.raises: [], tags: [].}
proc `==`(a, b: RegexMatch): bool {...}{.raises: [], tags: [].}
proc re(pattern: string): Regex {...}{.raises: [KeyError, SyntaxError, StudyError,
                                      FieldError, ValueError], tags: [].}
proc match(str: string; pattern: Regex; start = 0; endpos = int.high): Option[RegexMatch] {...}{.raises: [
    FieldError, ValueError, AccessViolationError, RegexInternalError,
    InvalidUnicodeError], tags: [].}
Like find(...), but anchored to the start of the string. This means that "foo".match(re"f").isSome is true, but "foo".match(re"o").isSome is false.
proc find(str: string; pattern: Regex; start = 0; endpos = int.high): Option[RegexMatch] {...}{.raises: [
    FieldError, ValueError, AccessViolationError, RegexInternalError,
    InvalidUnicodeError], tags: [].}
Finds the given pattern in the string between the start and end positions.
start
The start point at which to start matching. |abc is 0; a|bc is 1
endpos
The maximum index for a match; int.high means the end of the string, otherwise it’s an inclusive upper bound.
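The difference between match and find, and the effect of start and endpos, can be sketched as follows. The subject strings are illustrative, and the endpos case assumes the inclusive upper bound described above.

```nim
import nre, options

# match is anchored to the start of the string; find is not.
let anchoredHit = "foo".match(re"f").isSome
let anchoredMiss = "foo".match(re"o").isSome
let unanchored = "foo".find(re"o").get.matchBounds

# start = 3 skips the first "c" (index 2) and finds the one at index 5.
let fromThree = "abcabc".find(re"c", start = 3).get.matchBounds

# endpos = 1 limits the search to "ab", so no "c" is found.
let cutOff = "abcabc".find(re"c", endpos = 1).isSome

echo anchoredHit, " ", anchoredMiss, " ", unanchored.a, " ", fromThree.a, " ", cutOff
```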
proc findAll(str: string; pattern: Regex; start = 0; endpos = int.high): seq[string] {...}{.raises: [
    FieldError, ValueError, UnpackError, AccessViolationError, RegexInternalError,
    InvalidUnicodeError], tags: [].}
Returns the full text of every non-overlapping match as a seq[string]; see the findIter iterator.
proc contains(str: string; pattern: Regex; start = 0; endpos = int.high): bool {...}{.raises: [
    FieldError, ValueError, AccessViolationError, RegexInternalError,
    InvalidUnicodeError], tags: [].}
Determines whether the string contains the given pattern between the start and end positions:
  • "abc".contains(re"bc") == true
  • "abc".contains(re"cd") == false
  • "abc".contains(re"a", start = 1) == false

Same as isSome(str.find(pattern, start, endpos)).

proc split(str: string; pattern: Regex; maxSplit = -1; start = 0): seq[string] {...}{.raises: [
    FieldError, ValueError, UnpackError, AccessViolationError, RegexInternalError,
    InvalidUnicodeError], tags: [].}
Splits the string with the given regex. This works according to the rules that Perl and Javascript use:
  • If the match is zero-width, then the string is still split: "123".split(re"") == @["1", "2", "3"].
  • If the pattern has a capture in it, it is added after the string split: "12".split(re"(\d)") == @["", "1", "", "2", ""].
  • If maxsplit != -1, then the string will only be split maxsplit - 1 times. This means that there will be maxsplit strings in the output seq. "1.2.3".split(re"\.", maxsplit = 2) == @["1", "2.3"]

start behaves the same as in find(...).
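The splitting rules above can be tried out with a short sketch. The comma-separated input is a made-up example; the other three calls mirror the cases listed in the rules.

```nim
import nre

# Ordinary split on a separator pattern (illustrative input).
let fields = "a, b ,c".split(re"\s*,\s*")

# Zero-width matches still split the string.
let chars = "123".split(re"")

# Captures in the pattern are added to the output after each piece.
let withCaptures = "12".split(re"(\d)")

# maxSplit = 2 yields at most two strings.
let limited = "1.2.3".split(re"\.", maxSplit = 2)

echo fields
echo chars
echo withCaptures
echo limited
```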

proc replace(str: string; pattern: Regex; subproc: proc (match: RegexMatch): string): string {...}{.raises: [
    FieldError, ValueError, UnpackError, AccessViolationError, RegexInternalError,
    InvalidUnicodeError], tags: [].}

Replaces each match of Regex in the string with subproc, which should never be or return nil.

If subproc is a proc (RegexMatch): string, then it is executed with each match and the return value is the replacement value.

If subproc is a proc (string): string, then it is executed with the full text of the match and the return value is the replacement value.

If subproc is a string, the syntax is as follows:

  • $$ - literal $
  • $123 - capture number 123
  • $foo - named capture foo
  • ${foo} - same as above
  • $1$# - first and second captures
  • $# - first capture
  • $0 - full match

If a given capture is missing, a ValueError exception is thrown.
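A minimal sketch of the replacement forms described above, using made-up inputs and hypothetical capture names (left, right). It covers a plain string, numbered and named capture references, $0, and a proc that receives each RegexMatch.

```nim
import nre

# Plain replacement string, no captures referenced.
let literal = "123".replace(re"\d", "foo")

# $1 refers to the first capture.
let doubled = "123".replace(re"(\d)", "$1$1")

# ${name} refers to a named capture (names here are hypothetical).
let swapped = "1-2".replace(re"(?<left>\d)-(?<right>\d)", "${right}-${left}")

# $0 refers to the full match.
let wrapped = "abc".replace(re"b", "[$0]")

# A proc is called once per match; its result is the replacement.
let viaProc = "a1b2".replace(re"\d",
  proc (m: RegexMatch): string = "<" & m.match & ">")

echo literal, " ", doubled, " ", swapped, " ", wrapped, " ", viaProc
```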

proc replace(str: string; pattern: Regex; subproc: proc (match: string): string): string {...}{.raises: [
    FieldError, ValueError, UnpackError, AccessViolationError, RegexInternalError,
    InvalidUnicodeError], tags: [].}
Variant of replace where subproc receives the full text of each match; see the first replace overload.
proc replace(str: string; pattern: Regex; sub: string): string {...}{.raises: [FieldError,
    ValueError, UnpackError, AccessViolationError, RegexInternalError,
    InvalidUnicodeError, KeyError, Exception], tags: [].}
Variant of replace where sub is a replacement string using the $ syntax described under the first replace overload.
proc escapeRe(str: string): string {...}{.raises: [FieldError, ValueError, UnpackError,
    AccessViolationError, RegexInternalError, InvalidUnicodeError, KeyError,
    Exception], tags: [].}
Escapes the string so it doesn’t match any special characters. Incompatible with the Extra flag (X).
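A typical use of escapeRe is searching for user-provided text literally. The sketch below uses a hypothetical input containing +, which would otherwise act as a quantifier, and compares the escaped and unescaped patterns.

```nim
import nre

let userInput = "1+1"  # hypothetical user-provided search text

# Unescaped, "+" is a quantifier: one or more "1" followed by "1".
let asPattern = re(userInput)

# Escaped, the text is matched literally.
let asLiteral = re(escapeRe(userInput))

echo "11".contains(asPattern), " ", "11".contains(asLiteral)
echo "1+1=2".contains(asPattern), " ", "1+1=2".contains(asLiteral)
```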

Iterators

iterator items(pattern: CaptureBounds; default = none(HSlice[int, int])): Option[
    HSlice[int, int]] {...}{.raises: [FieldError, ValueError], tags: [].}
iterator items(pattern: Captures; default: string = ""): string {...}{.
    raises: [FieldError, ValueError, UnpackError], tags: [].}
iterator findIter(str: string; pattern: Regex; start = 0; endpos = int.high): RegexMatch {...}{.raises: [
    FieldError, ValueError, UnpackError, AccessViolationError, RegexInternalError,
    InvalidUnicodeError], tags: [].}

Works the same as find(...), but finds every non-overlapping match. "2222".findIter(re"22") yields "22", "22", not "22", "22", "22".

Arguments are the same as for find(...).

Variants:

  • proc findAll(...) returns a seq[string]
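The non-overlapping behavior can be checked with a small sketch that collects the bounds from findIter and the texts from findAll for the same subject.

```nim
import nre

# Collect the inclusive bounds of every non-overlapping match.
var bounds: seq[HSlice[int, int]] = @[]
for m in "2222".findIter(re"22"):
  bounds.add(m.matchBounds)

# findAll returns only the matched text.
let texts = "2222".findAll(re"22")

echo bounds.len, " matches: ", texts
```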