User Guide
Basics
The simplest parser matches a String or Char iterator.
julia> parse_a = parser("aa")
re"aa"
julia> parse(parse_a,"aa")
"aa"
julia> parse(parse_a,"ab")
ERROR: ArgumentError: no successfull parsing.
Character Sets
julia> parse(CharIn('a':'z'),"c")
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
julia> parse(CharIn(isuppercase),"A")
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)
julia> parse(CharNotIn('a':'z'),"A")
'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase)
julia> parse(CharNotIn(isuppercase),"c")
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
Sequence
Several parsers can be combined with the Sequence constructor and the * operator. The result_type of a Sequence is the Tuple of the result_types of its parts.
julia> p = CharIn(isuppercase) * CharIn(islowercase)
🗄 Sequence
├─ [isuppercase(...)] ValueIn
└─ [islowercase(...)] ValueIn
::Tuple{Char, Char}
julia> parse(p,"Ab")
('A', 'b')
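As a minimal sketch of the statement about result types, the snippet below queries the result type of a sequence and destructures the parsed tuple. It uses only names that appear in this guide; `result_type` is accessed with the module prefix on the assumption that the name is reachable that way, and the variable names are illustrative.

```julia
using CombinedParsers

p = CharIn(isuppercase) * CharIn(islowercase)

# The result type of the sequence is the tuple of its parts' result types.
T = CombinedParsers.result_type(p)

# The parsed tuple can be destructured like any other Julia tuple.
upper, lower = parse(p, "Ab")
```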
getindex on a sequence creates a transforming parser that selects from the parsing result.
julia> parse(Sequence(CharIn(isuppercase) * CharIn(islowercase))[2],"Ab")
ERROR: BoundsError: attempt to access DataType at index [2]
Sequence keyword argument constructors transform the parsing result into a NamedTuple. If some Sequence arguments are <:Pair{Symbol}, only those are retained in the NamedTuple.
julia> p = Sequence(first = CharIn(isuppercase), second = CharIn(islowercase))
🗄 Sequence |> map(ntuple)
├─ [isuppercase(...)] ValueIn |> with_name(:first)
└─ [islowercase(...)] ValueIn |> with_name(:second)
::NamedTuple{(:first, :second), Tuple{Char, Char}}
julia> parse(p,"Ab")
(first = 'A', second = 'b')
julia> parse(Sequence(CharIn(isuppercase), :second => CharIn(islowercase)),"Ab")
(second = 'b',)
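Because the result is an ordinary NamedTuple, its fields can be read by name. The following sketch repeats the keyword constructor from above; the variable names `p` and `r` are only illustrative.

```julia
using CombinedParsers

p = Sequence(first = CharIn(isuppercase), second = CharIn(islowercase))

r = parse(p, "Ab")

# Fields of the NamedTuple result are accessed by the keyword names.
initial = r.first
rest = r.second
```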
Either
The | operator and the Either constructor try matching the provided parsers in order, accept the first match, and fail if all parsers fail.
julia> parse(("a"|"ab"),"ab")
"a"
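Since the first matching alternative wins, the order of the alternatives matters. A small sketch, with illustrative variable names, that puts the longer alternative first so that it is preferred:

```julia
using CombinedParsers

short_first = Either("a", "ab")
long_first = Either("ab", "a")

# Alternatives are tried in order, so the first one that matches is returned.
r_short = parse(short_first, "ab")
r_long = parse(long_first, "ab")
```

Listing longer alternatives before their prefixes is one way to get the longest match without relying on backtracking.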
Repeat
The Repeat constructor creates a new parser that repeats its argument zero or more times and by default transforms the result to Vector{result_type(p)}. Repeating a specified number of times can be achieved with Repeat(p,min=1,max=2), or Repeat(1,p) or Repeat(1,2,p).
julia> parse(join(Repeat('a')," "),"a a")
2-element Vector{Char}:
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
 'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
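The bounded forms can be sketched as follows, assuming the positional and keyword constructors behave as described above. The names `letter` and `one_or_two` are illustrative.

```julia
using CombinedParsers

letter = CharIn('a':'z')

# Between one and two repetitions, written positionally and with keywords.
one_or_two = Repeat(1, 2, letter)
one_or_two_kw = Repeat(letter, min=1, max=2)

# parse does not need to consume the full input, so the third letter is left over.
r = parse(one_or_two, "abc")
r_kw = parse(one_or_two_kw, "abc")
```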
Optional
Similar to Repeat, Optional creates a parser that matches its argument 0 or 1 times. The result_type(Optional(p, default=d)) is the promote_type of the result type of p and the type of d (or their Union type if type promotion results in Any).
julia> option = Optional('a') * join(Repeat('b'),"-")
🗄 Sequence
├─ a? |missing
└─ 🗄 Sequence |> map(#74)
   ├─ b
   └─ \-b* Sequence |> map(#54) |> Repeat
::Tuple{Union{Missing, Char}, Vector{Char}}
Feedback appreciated:
julia> option = ( CharIn('a':'z') | missing ) * join(Repeat('b'),"-")
🗄 Sequence
├─ [a-z]? ValueIn|missing
└─ 🗄 Sequence |> map(#74)
   ├─ b
   └─ \-b* Sequence |> map(#54) |> Repeat
::Tuple{Union{Missing, Char}, Vector{Char}}
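A short sketch of how the optional part shows up in the result, assuming the default value is missing as the printed parser suggests. The variable names are illustrative.

```julia
using CombinedParsers

option = Optional('a') * join(Repeat('b'), "-")

# With the optional 'a' present, the first tuple element is the matched Char.
with_a = parse(option, "ab-b")

# Without it, the default value takes its place.
without_a = parse(option, "b-b")
```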
Lazy repetitions and optional parsers
Repetition and optional parsers are greedy by default, so the longest match is tried first:
julia> parse_all(Repeat(AnyChar()), "abc") |> collect
4-element Vector{Vector{Char}}:
 ['a', 'b', 'c']
 ['a', 'b']
 ['a']
 []
Wrapping in Lazy switches to lazy matching:
julia> parse_all(Lazy(Repeat(AnyChar())), "abc") |> collect
4-element Vector{Vector{Char}}:
 []
 ['a']
 ['a', 'b']
 ['a', 'b', 'c']
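Because parse returns the first match, the greedy and lazy orderings lead to different results for the same input. A minimal sketch with illustrative names:

```julia
using CombinedParsers

greedy = Repeat(AnyChar())
lazy = Lazy(Repeat(AnyChar()))

# parse returns the first match: the longest for greedy, the shortest for lazy.
r_greedy = parse(greedy, "abc")
r_lazy = parse(lazy, "abc")
```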
Assertions
Parsers that do not advance the parsing position can be used to assert conditions during parsing.
AtStart() and AtEnd()
AtStart succeeds only at the start of the input, and AtEnd succeeds only at the end of the input. By default, parse does not need to consume the full input but succeeds with the first match. With AtEnd() the parser can be forced to consume the full input or fail otherwise.
julia> parse(("a"|"ab")*AtEnd(),"ab")
("ab", re"$")
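The effect of AtEnd() can be seen by parsing the same input with and without it. In this sketch the names `prefix` and `full` are illustrative, and `first` is plain Julia tuple access used to drop the AtEnd() match from the result.

```julia
using CombinedParsers

prefix = "a" | "ab"
full = prefix * AtEnd()

# Without AtEnd() the first alternative already succeeds.
r_prefix = parse(prefix, "ab")

# With AtEnd() the parser has to fall back to the alternative that consumes everything.
r_full = first(parse(full, "ab"))
```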
Looking around
Lookahead and Lookbehind parsers wrap a parser p and
- succeed if and only if p matches (PositiveLookahead, PositiveLookbehind), or if and only if p fails (NegativeLookahead, NegativeLookbehind),
- without advancing the position.
The @re_str macro demonstrates a parser for lookahead and lookbehind expressions (see pcre.md).
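The lookahead variants can be sketched as follows. This assumes that PositiveLookahead and NegativeLookahead take the wrapped parser as their only argument; `succeeds` is a hypothetical helper defined here, not part of the library.

```julia
using CombinedParsers

# Hypothetical helper for this sketch: true if parsing succeeds, false if it throws.
function succeeds(p, s)
    try
        parse(p, s)
        return true
    catch
        return false
    end
end

# "a" only if it is followed by "b"; the "b" itself is not consumed.
a_before_b = parser("a") * PositiveLookahead(parser("b"))

# "a" only if it is not followed by "b".
a_not_before_b = parser("a") * NegativeLookahead(parser("b"))

checks = (
    succeeds(a_before_b, "ab"),
    succeeds(a_before_b, "ac"),
    succeeds(a_not_before_b, "ac"),
    succeeds(a_not_before_b, "ab"),
)
```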
Atomic groups
Backtracking of a parser p can be prevented by wrapping it in Atomic, for example Atomic(Repeat(p)). An atomic parser fails if p fails or if the first successful parsing with p leads to a failure later in the parsing process.
julia> parse(Either("a","ab","ac")*AtEnd(),"ab")
("ab", re"$")
julia> parse(Atomic(Either("a","ab","ac"))*AtEnd(),"ab")
ERROR: ArgumentError: no successfull parsing.
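The same effect can be sketched for a repetition. A greedy Repeat(AnyChar()) first consumes the whole input; without Atomic it can give characters back so that a following parser still matches, while the atomic version cannot. `succeeds` is a hypothetical helper defined here, not part of the library.

```julia
using CombinedParsers

# Hypothetical helper for this sketch: true if parsing succeeds, false if it throws.
function succeeds(p, s)
    try
        parse(p, s)
        return true
    catch
        return false
    end
end

# The repetition may backtrack and release the final "c".
backtracking = Repeat(AnyChar()) * parser("c")

# The atomic repetition keeps its first (longest) match, so "c" can no longer match.
atomic = Atomic(Repeat(AnyChar())) * parser("c")

ok_backtracking = succeeds(backtracking, "abc")
ok_atomic = succeeds(atomic, "abc")
```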