Using CombinedParsers
Printing
CombinedParsers uses AbstractTrees.jl for printing. Tree nodes are printed as follows:
- Each node has a colored, regex-like prefix.
- Sub-parsers (🗄) are shown as child branches.
- CombinedParsers.WrappedParser constructors are displayed with pipe |> syntax.
The last line of the printout shows the inferred result type of the CombinedParser.
Printing is useful to understand the structure of regular expressions, while also learning CombinedParser syntax:
julia> p = trim(re"(?:a+c)*b")
🗄 Sequence[2]
├─ (?>[\h]*) CharIn |> Repeat |> Atomic
├─ 🗄 Sequence
│  ├─ 🗄* Sequence |> Repeat
│  │  ├─ a+ |> Repeat
│  │  └─ c
│  └─ b
└─ (?>[\h]*) CharIn |> Repeat |> Atomic
::Tuple{Vector{Tuple{Vector{Char}, Char}}, Char}
Parser templates
Matching
Base.match - Function
Base.match(parser::CombinedParser, sequence::AbstractString[, idx::Integer]; log=nothing)
Search for the first match of parser in sequence and return a ParseMatch object containing the match, or nothing if the match failed.
The optional idx argument specifies an index at which to start the search.
If log !== nothing, parser is transformed with log_names(p, log).
The matching substring can be retrieved by accessing m.match.
If parser isa CombinedParsers.Regexp.ParserWithCaptures, match behaves like a drop-in replacement for the equivalent match(::Regex, sequence):
julia> m = match(re"(?<a>so)+ (or)", "soso or")
ParseMatch("soso or", a="so", 2="or")
julia> m[:a]
"so"
julia> m[2]
"or"
julia> m.match, m.captures
("soso or", SubString{String}["so", "or"])
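Because match returns nothing on failure, callers usually guard the result before reading m.match. The following is a minimal sketch of that pattern; the helper matched_or is hypothetical (not part of CombinedParsers), and it assumes the re"..." string macro is made available by using CombinedParsers.Regexp.

```julia
using CombinedParsers
using CombinedParsers.Regexp  # assumed to provide the re"..." string macro

p = re"(?<a>so)+ (or)"

# Hypothetical helper, not part of CombinedParsers: return the matched
# substring, or `fallback` when `match` returns `nothing`.
function matched_or(parser, sequence, fallback="")
    m = match(parser, sequence)
    m === nothing ? fallback : m.match
end

println(matched_or(p, "soso or"))
println(matched_or(p, "no match here", "<none>"))
```

Checking with m === nothing mirrors how match(::Regex, ...) results are handled in Base, so code written against Regex should carry over with little change.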
Parsing
A CombinedParser comprises a pattern as well as transformation functions that produce a Julia value of type result_type from a match with get.
julia> m = match(trim(re"(?:a+c)*b"), "aacacb")
ParseMatch("aacacb")
julia> get(m)
([(['a', 'a'], 'c'), (['a'], 'c')], 'b')
Defining transformations is detailed in the transformation section.
Base.get - Function
Base.get(parser::Assertion{MatchState, <:Assertion}, sequence, till, after, i, state)
Most assertions return the assertion parser as a result (AtStart, AtEnd, Always, Never, NegativeLookahead, NegativeLookbehind).
Base.get(x::ParseMatch{<:MatchTuple})
Get the result of a match.
julia> m = match(re"(?<a>so)+ (or)", "soso or")
ParseMatch("soso or", a="so", 2="or")
julia> get(m)
([('s', 'o'), ('s', 'o')], ' ', ('o', 'r'))
julia> m[2]
"or"
julia> m.match, m.captures
("soso or", SubString{String}["so", "or"])
Base.get(parser::PositiveLookbehind, sequence, till, after, i, state)
Get the result of a PositiveLookbehind.
The result is currently computed for a reversed sequence, so you might find it difficult to Base.map a lookbehind parser match. If you require this functionality, please open an issue for discussion.
Assertions do not consume input, so these input characters are typically parsed and mapped outside of the assertion.
julia> p = Sequence(!re"a+b", PositiveLookbehind(!re"a+b"))
🗄 Sequence
├─ 🗄 Sequence |> !
│  ├─ a+ |> Repeat
│  └─ b
└─ (?<=🗄) Sequence |> ! |> PositiveLookbehind
   ├─ b
   └─ a+ |> Repeat
::Tuple{SubString{String}, SubString{String}}
julia> p("aaab")
("aaab", "baaa")
Base.get(parser::Bytes{N,T}, sequence::Vector{UInt8})
The byte order can be swapped by mapping bswap over the parser:
julia> map(bswap, Bytes(2,UInt16))([0x16,0x11])
0x1611
julia> Bytes(2,UInt16)([0x16,0x11])
0x1116
Base.get(parser::Transformation{<:Function}, a...)
Base.get(parser::Transformation{<:Type}, a...)
Function call parser.transform(get(parser.parser,a...)).
Base.get(parser::Transformation{<:IndexAt}, a...)
getindex(get(parser.parser,a...).parser.transform)
Base.parse - Function
parse(parser::CombinedParser, sequence[, idx=firstindex(sequence)[, till=lastindex(sequence)]]; log=nothing)
Parse sequence with parser at start and produce an instance of result_type(parser). If log !== nothing, parser is transformed with log_names(p, log) before matching.
tryparse(parser::CombinedParser, sequence[, idx=firstindex(sequence)[, till=lastindex(sequence)]]; log=nothing)
Returns either a result value, or nothing if sequence does not start with a match.
tryparse_pos(parser::CombinedParser, str::AbstractString[, idx=firstindex(str)[, till=lastindex(str)]]; log=nothing)
Returns either a tuple of the result value and the position after the match, or nothing if str does not start with a match.
Example
julia> using TextParse
julia> p = ("Number: "*TextParse.Numeric(Int))[2]
🗄 Sequence[2]
├─ Number\:\ 
└─ <Int64>
::Int64
julia> parse(p,"Number: 42")
42
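The difference between tryparse and tryparse_pos can be sketched as follows. This example assumes that using CombinedParsers.Regexp provides the re"..." macro, and relies on the ! operator shown elsewhere on this page to make the parser return the matched substring. No concrete position value is claimed here, since the exact index convention is not stated above.

```julia
using CombinedParsers
using CombinedParsers.Regexp  # assumed to provide the re"..." string macro

# `!` makes the parser return the matched substring as its result.
p = !re"a+b"

r  = tryparse(p, "aab")   # result value, since the input starts with a match
r2 = tryparse(p, "xyz")   # nothing, since the input does not start with a match

rp = tryparse_pos(p, "aab")
if rp !== nothing
    value, pos = rp       # result value and the position after the match
    println("parsed ", value, ", position after match: ", pos)
end
```

tryparse_pos is the variant to reach for when parsing continues by hand after the match, because the returned position says where to resume.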
Base.tryparse - Function
CombinedParsers.tryparse_pos - Function
Both share the docstring and example shown under Base.parse above.
Iterating matches
CombinedParsers iterates through all matches if parsing is ambiguous. How to write custom parser match iterations is detailed in the internals section.
Base.iterate - Function
Base.iterate(x::ParseMatch[, m::ParseMatch=x])
Returns the next ParseMatch at m.offset after m.state; see _iterate(m).
Base.iterate(x::MatchesIterator[, s::ParseMatch=ParseMatch(x)])
Iterate match s at the current position. While no match is found and s.offset <= x.stop, s.offset is incremented to continue the search.
Returns the next ParseMatch (as both return value and state), or nothing when x.stop is reached.
CombinedParsers.match_all - Function
match_all(parser::CombinedParser, sequence, a...; kw...)
Returns an iterator over all matches of CombinedParsers.wrap(parser; kw...). Constructs a MatchesIterator whose match index range is defined by the arguments a....
CombinedParsers.parse_all - Function
parse_all(parser::CombinedParser, sequence, idx=1)
Returns an iterator over all parsings of the sequence starting at idx.
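To illustrate iteration over ambiguous parsings, here is a minimal sketch using parse_all. The pattern and input are illustrative choices: the alternation a|ab|b can split "abab" in more than one way. The sketch assumes that using CombinedParsers.Regexp provides the re"..." macro and that the iterator can be passed to collect.

```julia
using CombinedParsers
using CombinedParsers.Regexp  # assumed to provide the re"..." string macro

# An ambiguous pattern: "abab" can be split into the alternatives
# a, ab and b in several different ways.
p = re"^(a|ab|b)+$"

# Collect every parsing of the whole input, then print each one.
results = collect(parse_all(p, "abab"))
for r in results
    println(r)
end
println(length(results), " parsing(s) found")
```

Each element is one complete parsing of the input, so the number of elements reflects how ambiguous the grammar is for that input.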