User Guide

Basics

The simplest parser matches a String or Char iterator.

julia> parse_a = parser("aa")
re"aa"

julia> parse(parse_a,"aa")
"aa"

julia> parse(parse_a,"ab")
ERROR: ArgumentError: no successfull parsing.

Character Sets

julia> parse(CharIn('a':'z'),"c")
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)

julia> parse(CharIn(isuppercase),"A")
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> parse(CharNotIn('a':'z'),"A")
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> parse(CharNotIn(isuppercase),"c")
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)

Sequence

Several parsers can be combined with the Sequence constructor and the * operator. The result_type of a Sequence is the Tuple of the result_types of its parts.

julia> p = CharIn(isuppercase) * CharIn(islowercase)
🗄 Sequence
├─ [isuppercase(...)] ValueIn
└─ [islowercase(...)] ValueIn
::Tuple{Char, Char}

julia> parse(p,"Ab")
('A', 'b')

Indexing a Sequence with getindex creates a transforming parser that selects the indexed element from the parsing result.

julia> parse(Sequence(CharIn(isuppercase) * CharIn(islowercase))[2],"Ab")
ERROR: BoundsError: attempt to access DataType at index [2]

Constructing a Sequence with keyword arguments transforms the parsing result into a NamedTuple. If some Sequence arguments are <:Pair{Symbol}, only those are retained in the NamedTuple.

julia> p = Sequence(first = CharIn(isuppercase), second = CharIn(islowercase))
🗄 Sequence |> map(ntuple)
├─ [isuppercase(...)] ValueIn |> with_name(:first)
└─ [islowercase(...)] ValueIn |> with_name(:second)
::NamedTuple{(:first, :second), Tuple{Char, Char}}

julia> parse(p,"Ab")
(first = 'A', second = 'b')

julia> parse(Sequence(CharIn(isuppercase), :second => CharIn(islowercase)),"Ab")
(second = 'b',)

Either

The | operator and the Either constructor try matching the provided parsers in order, accept the first match, and fail if all parsers fail.

julia> parse(("a"|"ab"),"ab")
"a"
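The ordered-choice behavior above is the same one found in PCRE alternation. As a minimal sketch, and only as an analogy, the following uses Julia's built-in regexes rather than CombinedParsers, so it runs without the package. The variable names are illustrative.

```julia
# Analogy only: Base Julia regexes (PCRE), not the CombinedParsers API.
# Alternatives are tried left to right; the first one that matches wins.
first_alt = match(r"a|ab", "ab")
println(first_alt.match)   # prints: a

# Putting the longer alternative first lets it match instead.
reordered = match(r"ab|a", "ab")
println(reordered.match)   # prints: ab
```

If the longer alternative should be preferred, list it first.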

Repeat

The Repeat constructor creates a new parser that repeats its argument zero or more times and, by default, transforms the result to a Vector{result_type(p)}. A specified number of repetitions can be requested with Repeat(p,min=1,max=2), or Repeat(1,p) or Repeat(1,2,p).

julia> parse(join(Repeat('a')," "),"a a")
2-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

Optional

Similar to Repeat, Optional creates a parser that repeats its argument 0 or 1 times. The result_type(Optional(p, default=d)) is the promote_type of result_type(p) and the type of d (or a Union type if type promotion results in Any).

julia> option = Optional('a') * join(Repeat('b'),"-")
🗄 Sequence
├─ a? |missing
└─ 🗄 Sequence |> map(#74)
   ├─ b
   └─ \-b* Sequence |> map(#54) |> Repeat
::Tuple{Union{Missing, Char}, Vector{Char}}

Feedback appreciated:

julia> option = ( CharIn('a':'z') | missing ) * join(Repeat('b'),"-")
🗄 Sequence
├─ [a-z]? ValueIn|missing
└─ 🗄 Sequence |> map(#74)
   ├─ b
   └─ \-b* Sequence |> map(#54) |> Repeat
::Tuple{Union{Missing, Char}, Vector{Char}}

Lazy repetitions and optional parsers

Repetition and optional parsers are greedy by default:

julia> parse_all(Repeat(AnyChar()), "abc") |> collect
4-element Vector{Vector{Char}}:
 ['a', 'b', 'c']
 ['a', 'b']
 ['a']
 []

Wrapping in Lazy switches to lazy matching:

julia> parse_all(Lazy(Repeat(AnyChar())), "abc") |> collect
4-element Vector{Vector{Char}}:
 []
 ['a']
 ['a', 'b']
 ['a', 'b', 'c']
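The difference between greedy and lazy matching can also be seen with PCRE quantifiers. The sketch below is an analogy using Julia's built-in regexes, not the CombinedParsers API, and the variable names are illustrative. A greedy quantifier tries the longest repetition first, while a lazy one tries the shortest first and only extends when the rest of the pattern requires it.

```julia
# Analogy only: Base Julia regexes (PCRE), not the CombinedParsers API.
greedy = match(r"^.*", "abc")    # longest repetition is tried first
lazy   = match(r"^.*?", "abc")   # shortest repetition is tried first
println(repr(greedy.match))      # prints: "abc"
println(repr(lazy.match))        # prints: ""

# With a following pattern, both succeed but capture different prefixes.
greedy_prefix = match(r"^(.*)c", "abcc")
lazy_prefix   = match(r"^(.*?)c", "abcc")
println(greedy_prefix.captures[1])   # prints: abc
println(lazy_prefix.captures[1])     # prints: ab
```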

Assertions

Parsers that do not advance the parsing position can be used to assert conditions during parsing.

AtStart() and AtEnd()

AtStart succeeds only at the start of the input, and AtEnd succeeds only at the end of the input. By default, parse does not need to consume the full input and succeeds with the first match. With AtEnd(), the parser can be forced to either consume the full input or fail.

julia> parse(("a"|"ab")*AtEnd(),"ab")
("ab", re"$")

Looking around

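Lookahead and Lookbehind parsers wrap a parser p and assert that p matches after or before the current position, respectively, without advancing the parsing position.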

The @re_str macro demonstrates a parser for lookahead and lookbehind expressions (see pcre.md).
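For readers unfamiliar with lookaround expressions, here is a minimal sketch of what they assert. It uses Julia's built-in PCRE regexes as an analogy, not the CombinedParsers Lookahead and Lookbehind constructors, and the variable names are illustrative.

```julia
# Analogy only: Base Julia regexes (PCRE), not the CombinedParsers API.
# Lookahead: match "a" only if "b" follows; the "b" is not consumed.
ahead = match(r"a(?=b)", "ab")
println(ahead.match)          # prints: a

# Lookbehind: match "b" only if "a" precedes it.
behind = match(r"(?<=a)b", "ab")
println(behind.match)         # prints: b

# The assertion fails when the looked-for text is absent.
missing_c = match(r"a(?=c)", "ab")
println(missing_c === nothing)   # prints: true
```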

Atomic groups

Backtracking of a parser p can be prevented by wrapping it in Atomic, for example Atomic(Repeat(p)). An atomic parser fails if p fails or if the first successful parsing with p leads to a failure later in the parsing process.

julia> parse(Either("a","ab","ac")*AtEnd(),"ab")
("ab", re"$")

julia> parse(Atomic(Either("a","ab","ac"))*AtEnd(),"ab")
ERROR: ArgumentError: no successfull parsing.
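The two results above mirror PCRE atomic groups. As a hedged sketch, the following reproduces the same pair of outcomes with Julia's built-in regexes. It is an analogy rather than the CombinedParsers API: `(?:...)` stands in for Either, `(?>...)` for Atomic, and `$` for AtEnd(). The variable names are illustrative.

```julia
# Analogy only: Base Julia regexes (PCRE), not the CombinedParsers API.
# Without an atomic group, the engine backtracks from "a" to "ab"
# once the end-of-input assertion fails.
backtracking = match(r"^(?:a|ab|ac)$", "ab")
println(backtracking.match)     # prints: ab

# With an atomic group, the first successful alternative "a" is kept,
# the end-of-input assertion fails, and no other alternative is retried.
atomic = match(r"^(?>a|ab|ac)$", "ab")
println(atomic === nothing)     # prints: true
```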