User Guide
Basics
The simplest parser matches a String or Char iterator.
julia> parse_a = parser("aa")
re"aa"

julia> parse(parse_a,"aa")
"aa"

julia> parse(parse_a,"ab")
ERROR: ArgumentError: no successfull parsing.
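Because a failed parse raises an ArgumentError, calling code may prefer to receive a value instead of an exception. The following is a minimal sketch, assuming the CombinedParsers package is installed; `parse_or_nothing` is a hypothetical helper introduced here for illustration and is not part of the library.

```julia
using CombinedParsers

parse_a = parser("aa")

# Hypothetical helper: return the parse result, or `nothing` if parsing fails.
function parse_or_nothing(p, s)
    try
        parse(p, s)
    catch err
        # Only swallow the parsing failure; rethrow anything else.
        err isa ArgumentError || rethrow()
        nothing
    end
end

matched = parse_or_nothing(parse_a, "aa")    # "aa"
unmatched = parse_or_nothing(parse_a, "ab")  # nothing
```

Catching only ArgumentError keeps unrelated errors visible to the caller.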
Character Sets
julia> parse(CharIn('a':'z'),"c")
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)

julia> parse(CharIn(isuppercase),"A")
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> parse(CharNotIn('a':'z'),"A")
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)

julia> parse(CharNotIn(isuppercase),"c")
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
Sequence
Several parsers can be combined with the Sequence constructor and the * operator. The result_type of a Sequence is the Tuple of the result_types of its parts.
julia> p = CharIn(isuppercase) * CharIn(islowercase)
🗄 Sequence
├─ [isuppercase(...)] ValueIn
└─ [islowercase(...)] ValueIn
::Tuple{Char, Char}

julia> parse(p,"Ab")
('A', 'b')
getindex on a sequence creates a transforming parser selecting from the result of the parsing.
julia> parse(Sequence(CharIn(isuppercase) * CharIn(islowercase))[2],"Ab")
ERROR: BoundsError: attempt to access DataType at index [2]
Sequence constructors with keyword arguments transform the parse result into a named tuple. If some Sequence arguments are <:Pair{Symbol}, only those are retained in a NamedTuple.
julia> p = Sequence(first = CharIn(isuppercase), second = CharIn(islowercase))
🗄 Sequence |> map(ntuple)
├─ [isuppercase(...)] ValueIn |> with_name(:first)
└─ [islowercase(...)] ValueIn |> with_name(:second)
::NamedTuple{(:first, :second), Tuple{Char, Char}}

julia> parse(p,"Ab")
(first = 'A', second = 'b')

julia> parse(Sequence(CharIn(isuppercase), :second => CharIn(islowercase)),"Ab")
(second = 'b',)
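Since the result of a keyword-argument Sequence is an ordinary NamedTuple, its parts can be read by name. A short sketch, assuming CombinedParsers is installed; the variable names are illustrative only.

```julia
using CombinedParsers

p = Sequence(first = CharIn(isuppercase), second = CharIn(islowercase))
result = parse(p, "Ab")

# Fields of the named tuple are accessed by the keyword names given above.
upper = result.first    # 'A'
lower = result.second   # 'b'
```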
Either
The | operator and the Either constructor try matching the provided parsers in order, accepting the first match, and fail if all parsers fail.
julia> parse(("a"|"ab"),"ab")
"a"
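Because the alternatives are tried in order, the order in which they are listed determines which one wins when several could match. The sketch below, assuming CombinedParsers is installed, contrasts the two orderings; the variable names are illustrative.

```julia
using CombinedParsers

short_first = Either("a", "ab")
long_first = Either("ab", "a")

# The first alternative that matches is accepted.
r_short = parse(short_first, "ab")   # "a"
r_long = parse(long_first, "ab")     # "ab"
```

Listing longer alternatives first is a common way to prefer the longest match when one option is a prefix of another.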
Repeat
The Repeat constructor creates a new parser repeating its argument zero or more times, and by default transforming to Vector{result_type(p)}. Repeating a specified number of times can be achieved with Repeat(p,min=1,max=2), or Repeat(1,p) or Repeat(1,2,p).
julia> parse(join(Repeat('a')," "),"a a")
2-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
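The bounded forms of Repeat can be sketched as follows, assuming CombinedParsers is installed and that Repeat(1,2,p) behaves as described in the text (at least one, at most two repetitions). The variable names are illustrative.

```julia
using CombinedParsers

one_or_two = Repeat(1, 2, 'a')   # at least one, at most two repetitions

# Greedy matching takes as many repetitions as the upper bound allows.
taken = parse(one_or_two, "aaa")

# Zero repetitions is below the minimum, so parsing fails.
failed = try
    parse(one_or_two, "b")
    false
catch err
    err isa ArgumentError
end
```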
Optional
Similar to Repeat, Optional creates a parser that repeats its argument 0 or 1 times. The result_type(Optional(p, default=d)) is the promote_type of the result type of p and the type of d (or a Union type if type promotion results in Any).
julia> option = Optional('a') * join(Repeat('b'),"-")
🗄 Sequence
├─ a? |missing
└─ 🗄 Sequence |> map(#74)
   ├─ b
   └─ \-b* Sequence |> map(#54) |> Repeat
::Tuple{Union{Missing, Char}, Vector{Char}}
Feedback appreciated:
julia> option = ( CharIn('a':'z') | missing ) * join(Repeat('b'),"-")
🗄 Sequence
├─ [a-z]? ValueIn|missing
└─ 🗄 Sequence |> map(#74)
   ├─ b
   └─ \-b* Sequence |> map(#54) |> Repeat
::Tuple{Union{Missing, Char}, Vector{Char}}
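The effect of the default keyword can be sketched as follows, assuming CombinedParsers is installed and that Optional(p, default=d) returns d when p does not match, as the description of its result type suggests. The variable names are illustrative.

```julia
using CombinedParsers

# Without a default, an absent 'a' yields `missing`.
absent = parse(Optional('a'), "b")

# With a default, the default value is returned instead.
fallback = parse(Optional('a', default = 'x'), "b")

# When the optional part is present, its own result is returned.
present = parse(Optional('a', default = 'x'), "a")
```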
Lazy repetitions and optional parsers
Repetition and optional parsers are greedy by default:
julia> parse_all(Repeat(AnyChar()), "abc") |> collect
4-element Vector{Vector{Char}}:
 ['a', 'b', 'c']
 ['a', 'b']
 ['a']
 []
Wrapping in Lazy switches to lazy matching:
julia> parse_all(Lazy(Repeat(AnyChar())), "abc") |> collect
4-element Vector{Vector{Char}}:
 []
 ['a']
 ['a', 'b']
 ['a', 'b', 'c']
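Greedy and lazy matching enumerate the same set of results, only in opposite order. The sketch below, assuming CombinedParsers is installed, compares the two enumerations; the variable names are illustrative.

```julia
using CombinedParsers

greedy = collect(parse_all(Repeat(AnyChar()), "abc"))
lazy = collect(parse_all(Lazy(Repeat(AnyChar())), "abc"))

# Greedy tries the longest repetition first, lazy the shortest.
longest_first = first(greedy)    # ['a', 'b', 'c']
shortest_first = first(lazy)     # empty vector

# Both enumerate the same matches, in reverse order of each other.
same_matches = lazy == reverse(greedy)
```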
Assertions
Parsers that do not advance the parsing position can be used to assert conditions during parsing.
AtStart() and AtEnd()
AtStart succeeds only at the start of the input, and AtEnd succeeds only at the end of the input. By default, parse does not need to consume the full input but succeeds with the first match. With AtEnd() the parser can be forced to consume the full input or fail otherwise.
julia> parse(("a"|"ab")*AtEnd(),"ab")
("ab", re"$")
Looking around
Lookahead and Lookbehind parsers wrap a parser p and
- succeed if and only if p matches (PositiveLookahead, PositiveLookbehind) or, respectively, if and only if p fails (NegativeLookahead, NegativeLookbehind),
- without advancing the position.
The @re_str macro demonstrates (see pcre.md) a parser for lookahead and lookbehind expressions.
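The behaviour of positive and negative lookahead can be sketched as follows, assuming CombinedParsers is installed and that PositiveLookahead and NegativeLookahead accept a parser as their single argument. `succeeds` is a hypothetical helper introduced only for this illustration.

```julia
using CombinedParsers

# Hypothetical helper: true if `p` parses `s`, false if parsing fails.
function succeeds(p, s)
    try
        parse(p, s)
        true
    catch err
        err isa ArgumentError || rethrow()
        false
    end
end

# "a" only if followed by "b"; the lookahead itself consumes nothing.
a_before_b = parser("a") * PositiveLookahead(parser("b"))
# "a" only if not followed by "b".
a_not_before_b = parser("a") * NegativeLookahead(parser("b"))

checks = (
    succeeds(a_before_b, "ab"),
    succeeds(a_before_b, "ac"),
    succeeds(a_not_before_b, "ab"),
    succeeds(a_not_before_b, "ac"),
)
```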
Atomic groups
Backtracking of a parser p can be prevented by wrapping it in Atomic, for example Atomic(Repeat(p)). An atomic parser fails if p fails or if the first successful parsing with p leads to a failure later in the parsing process.
julia> parse(Either("a","ab","ac")*AtEnd(),"ab")
("ab", re"$")

julia> parse(Atomic(Either("a","ab","ac"))*AtEnd(),"ab")
ERROR: ArgumentError: no successfull parsing.