Using CombinedParsers
Printing
Printing CombinedParsers uses AbstractTrees.jl. Each tree node is printed with
- a colored, regular-expression-like prefix (🗄 stands for a Sequence).
Sub-parsers are shown as child branches. CombinedParsers.WrappedParser constructors are displayed with pipe (|>) syntax. The last line of the printout shows the inferred result type of the CombinedParser.
Printing is useful for understanding the structure of regular expressions while also learning CombinedParser syntax:
julia> p = trim(re"(?:a+c)*b")
🗄 Sequence[2]
├─ (?>[\h]*) CharIn |> Repeat |> Atomic
├─ 🗄 Sequence
│  ├─ 🗄* Sequence |> Repeat
│  │  ├─ a+ |> Repeat
│  │  └─ c
│  └─ b
└─ (?>[\h]*) CharIn |> Repeat |> Atomic
::Tuple{Vector{Tuple{Vector{Char}, Char}}, Char}
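AbstractTrees.jl does the actual rendering. The following hypothetical Base-only sketch shows how such a box-drawn tree can be produced. Each child gets "├─ " or "└─ ", and a descendant inherits a "│  " guide for every ancestor that still has siblings below it. The Node type and print_tree function are our own illustrations, not AbstractTrees or CombinedParsers API.

```julia
# Hypothetical stand-in for a parser tree node (not a CombinedParsers type).
struct Node
    label::String
    children::Vector{Node}
end
Node(label::String) = Node(label, Node[])

# Print `n` and its descendants; `prefix` holds the vertical guides of ancestors.
function print_tree(io::IO, n::Node, prefix::String = "")
    println(io, n.label)
    for (k, child) in enumerate(n.children)
        islast = k == length(n.children)
        print(io, prefix, islast ? "└─ " : "├─ ")
        # Below a last child there are no more siblings, so no guide line is needed.
        print_tree(io, child, prefix * (islast ? "   " : "│  "))
    end
end

tree = Node("Sequence", [Node("Sequence |> Repeat", [Node("a+ |> Repeat"), Node("c")]), Node("b")])
print_tree(stdout, tree)
```

The output mirrors the inner part of the printout above. The real printer also adds colors and the result type line.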
Parser templates
Matching
Base.match
Base.match(parser::CombinedParser, sequence::AbstractString[, idx::Integer]; log=nothing)
Search for the first match of parser in sequence and return a ParseMatch object containing the match, or nothing if the match failed. The optional idx argument specifies an index at which to start the search. If log!==nothing, parser is transformed with log_names(parser, log).
The matching substring can be retrieved by accessing m.match.
If parser isa CombinedParsers.Regexp.ParserWithCaptures, match behaves like a plug-in replacement for the equivalent match(::Regex, sequence):
julia> m = match(re"(?<a>so)+ (or)", "soso or")
ParseMatch("soso or", a="so", 2="or")
julia> m[:a]
"so"
julia> m[2]
"or"
julia> m.match, m.captures
("soso or", SubString{String}["so", "or"])
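For comparison, here is the same query with Base's Regex. These calls are plain Julia and use no CombinedParsers API. They show the indexing (m[:a], m[2], m.match, m.captures) that ParseMatch is documented to mirror, and how the optional start index works.

```julia
# Base Regex version of the example above: named group `a` and positional group 2.
m = match(r"(?<a>so)+ (or)", "soso or")

m[:a]        # last capture of the repeated group `a`: "so"
m[2]         # second capture group: "or"
m.match      # whole match: "soso or"
m.captures   # all captures: ["so", "or"]

# The optional third argument starts the search at a given index.
m2 = match(r"o", "soso or", 3)   # first "o" at or after index 3
m2.offset    # 4
```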
Parsing
A CombinedParser comprises a pattern as well as transformation functions that produce a Julia result_type from a match with get.
julia> m = match(trim(re"(?:a+c)*b"), "aacacb")
ParseMatch("aacacb")
julia> get(m)
([(['a', 'a'], 'c'), (['a'], 'c')], 'b')
Defining transformations is detailed in the transformation section.
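To show how the nested result ([(['a', 'a'], 'c'), (['a'], 'c')], 'b') is assembled, here is a hypothetical hand-written parser for the same pattern in plain Julia. Repeat collects a vector, Sequence builds a tuple, and the outer trim keeps only the middle part. It is an illustration under our own assumptions, not how CombinedParsers evaluates the parser. In particular, strip removes all surrounding whitespace, whereas the trim in the printout uses horizontal whitespace ([\h]) only.

```julia
# Hypothetical hand-written equivalent of get(match(trim(re"(?:a+c)*b"), s)).
# Assumption: `strip` stands in for `trim`, although it also removes newlines.
function parse_acb(s::AbstractString)
    cs = collect(strip(s))                 # Vector{Char} without surrounding whitespace
    groups = Tuple{Vector{Char},Char}[]    # results of (?:a+c)*
    i = 1
    while true
        j = i
        run_a = Char[]
        while j <= length(cs) && cs[j] == 'a'   # a+ collects a Vector{Char}
            push!(run_a, cs[j])
            j += 1
        end
        # One repetition of (?:a+c) needs at least one 'a' followed by 'c'.
        if !isempty(run_a) && j <= length(cs) && cs[j] == 'c'
            push!(groups, (run_a, cs[j]))  # Sequence result: a tuple
            i = j + 1
        else
            break
        end
    end
    # The final 'b' must be the last character.
    (i == length(cs) && cs[i] == 'b') || return nothing
    return (groups, cs[i])
end

parse_acb("aacacb")   # ([(['a', 'a'], 'c'), (['a'], 'c')], 'b')
```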
Base.get
Base.get(parser::Assertion{MatchState, <:Assertion}, sequence, till, after, i, state)
Most assertions (AtStart, AtEnd, Always, Never, NegativeLookahead, NegativeLookbehind) return the assertion parser itself as a result.
Base.get(x::ParseMatch{<:MatchTuple})
Get the result of a match.
julia> m = match(re"(?<a>so)+ (or)", "soso or")
ParseMatch("soso or", a="so", 2="or")
julia> get(m)
([('s', 'o'), ('s', 'o')], ' ', ('o', 'r'))
julia> m[2]
"or"
julia> m.match, m.captures
("soso or", SubString{String}["so", "or"])
Base.get(parser::PositiveLookbehind, sequence, till, after, i, state)
Get the result of a PositiveLookbehind.
The result currently refers to the reversed sequence, so you might find it difficult to Base.map a lookbehind parser match. If you require this functionality, please open an issue for discussion.
Assertions do not consume input, so these input characters are typically parsed/mapped outside of the assertion.
julia> p = Sequence(!re"a+b", PositiveLookbehind(!re"a+b"))
🗄 Sequence
├─ 🗄 Sequence |> !
│  ├─ a+ |> Repeat
│  └─ b
└─ (?<=🗄) Sequence |> ! |> PositiveLookbehind
   ├─ b
   └─ a+ |> Repeat
::Tuple{SubString{String}, SubString{String}}
julia> p("aaab")
("aaab", "baaa")
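A hypothetical Base-only sketch of why the lookbehind result comes out reversed. One way to check a lookbehind for a+b that ends at position i is to reverse the text before i and match the reversed pattern (b followed by a+) at its start. The matched text is then naturally in reversed order. This is consistent with the reversed child order in the printout above, but the function lookbehind_ab is our own illustration and an assumption about the approach, not the CombinedParsers implementation.

```julia
# Hypothetical sketch: lookbehind for a+b ending just before index i,
# done by matching the reversed pattern b a+ on the reversed prefix.
function lookbehind_ab(s::AbstractString, i::Integer)
    before = reverse(s[firstindex(s):prevind(s, i)])  # text left of i, reversed
    m = match(r"^ba+", before)
    return m === nothing ? nothing : m.match           # matched text, reversed
end

lookbehind_ab("aaab", 5)   # "baaa", like the second element of p("aaab")
lookbehind_ab("aaab", 4)   # nothing: "aaa" does not end in b
```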
Base.get(parser::Bytes{N,T}, sequence::Vector{UInt8})
Endianness can be handled by mapping bswap over the parser result:
julia> map(bswap, Bytes(2,UInt16))([0x16,0x11])
0x1611
julia> Bytes(2,UInt16)([0x16,0x11])
0x1116
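The same byte-order reasoning works in plain Julia without CombinedParsers. Reading the bytes 0x16, 0x11 as little-endian gives 0x1116, the unmapped result shown above, and reading them as big-endian gives 0x1611. We assume the example was run on a little-endian host (typical for x86 and ARM). ltoh/ntoh make the sketch below independent of the host.

```julia
bytes = UInt8[0x16, 0x11]

# reinterpret reads the bytes in host byte order; ltoh/ntoh then convert from a
# declared byte order, so the results do not depend on the host.
le_value = ltoh(reinterpret(UInt16, bytes)[1])   # bytes read as little-endian: 0x1116
be_value = ntoh(reinterpret(UInt16, bytes)[1])   # bytes read as big-endian:    0x1611

# bswap reverses the byte order of a value.
bswap(le_value) == be_value   # true
```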
Base.get(parser::Transformation{<:Function}, a...)
Base.get(parser::Transformation{<:Type}, a...)
Calls parser.transform(get(parser.parser, a...)).
Base.get(parser::Transformation{<:IndexAt}, a...)
Calls getindex(get(parser.parser, a...), parser.transform).
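The three get methods above dispatch on the type of the transform. Here is a hypothetical minimal model of that design in plain Julia. MiniTransformation, MiniIndexAt, inner_get and mini_get are our own names, not CombinedParsers types. An "inner parser" is reduced to a function from the input to its result.

```julia
# Hypothetical minimal model of the dispatch rules above; not CombinedParsers types.
struct MiniTransformation{T,P}
    transform::T   # a Function, a Type (types are callable too), or a MiniIndexAt
    parser::P      # inner parser, modeled as a function s -> result
end

struct MiniIndexAt
    i::Int
end

inner_get(t::MiniTransformation, s) = t.parser(s)

# Function or Type: call transform on the inner result.
mini_get(t::MiniTransformation, s) = t.transform(inner_get(t, s))

# Index: the more specific method picks one element of the inner result.
mini_get(t::MiniTransformation{<:MiniIndexAt}, s) = getindex(inner_get(t, s), t.transform.i)

label_and_number(s) = ("Number: ", s[9:end])   # hypothetical inner result

mini_get(MiniTransformation(MiniIndexAt(2), label_and_number), "Number: 42")          # "42"
mini_get(MiniTransformation(x -> parse(Int, x[2]), label_and_number), "Number: 42")   # 42
```

Dispatching on the transform's type keeps get a single generic entry point while each kind of transformation gets its own specialized method.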
Base.parse
parse(parser::CombinedParser, sequence[, idx=firstindex(sequence)[, till=lastindex(sequence)]]; log=nothing)
Parse sequence with parser at start and produce an instance of result_type(parser). If log!==nothing, parser is transformed with log_names(parser, log) before matching.
tryparse(parser::CombinedParser, sequence[, idx=firstindex(sequence)[, till=lastindex(sequence)]]; log=nothing)
returns either a result value or nothing if sequence does not start with a match.
tryparse_pos(parser::CombinedParser, str::AbstractString[, idx=firstindex(str)[, till=lastindex(str)]]; log=nothing)
returns either a tuple of the result value and the position after the match, or nothing if str does not start with a match.
Example
julia> using TextParse
julia> p = ("Number: "*TextParse.Numeric(Int))[2]
๐ Sequence[2]
โโ Number\:\
โโ <Int64>
::Int64
julia> parse(p,"Number: 42")
42
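The three entry points differ only in how they report the outcome. The following hypothetical Base-only helpers for the same "Number: <Int>" example use a Regex instead of TextParse. tryparse_pos returns (value, position after the match), tryparse drops the position, and parse fails loudly. The helper names are ours. The section does not say which exception CombinedParsers' parse throws, so ArgumentError here is our own choice.

```julia
# Hypothetical helpers for the "Number: <Int>" example, using Base Regex only.
function number_tryparse_pos(s::AbstractString, idx::Integer = firstindex(s))
    m = match(r"Number: (\d+)", s, idx)
    # Like tryparse_pos, require the match to start exactly at idx.
    (m === nothing || m.offset != idx) && return nothing
    value = parse(Int, m.captures[1])
    return (value, m.offset + ncodeunits(m.match))   # index just after the match
end

function number_tryparse(s::AbstractString, idx::Integer = firstindex(s))
    r = number_tryparse_pos(s, idx)
    return r === nothing ? nothing : first(r)
end

function number_parse(s::AbstractString, idx::Integer = firstindex(s))
    r = number_tryparse(s, idx)
    r === nothing && throw(ArgumentError("no match at index $idx"))
    return r
end

number_parse("Number: 42")               # 42
number_tryparse("Count: 42")             # nothing
number_tryparse_pos("Number: 42; rest")  # (42, 11)
```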
Base.tryparse
CombinedParsers.tryparse_pos
Both functions share the docstring of Base.parse above.
Iterating matches
CombinedParsers iterates through all matches if parsing is ambiguous. How to write custom parser match iterations is detailed in the internals section.
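What "ambiguous" means can be seen in a hypothetical Base-only illustration. The pattern (a+)(a+) matches "aaaa" in three different ways, one per split point, so there are three parses of the same input at the same position. The function all_parses_aa is ours. The longest-first-group order is our choice, since this section does not state the order in which CombinedParsers yields alternative matches.

```julia
# Hypothetical sketch: every way the pattern (a+)(a+) can match all of s.
# Each split point k gives a different (first, second) pair of captures.
function all_parses_aa(s::String)
    all(==('a'), s) || return Tuple{String,String}[]
    n = length(s)
    # Longest first group first (greedy order, our choice).
    return [(s[1:k], s[k+1:n]) for k in n-1:-1:1]
end

all_parses_aa("aaaa")   # [("aaa", "a"), ("aa", "aa"), ("a", "aaa")]
```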
Base.iterate
Base.iterate(x::ParseMatch[, m::ParseMatch=x])
Returns the next ParseMatch at m.offset after m.state; see _iterate(m).
Base.iterate(x::MatchesIterator[, s::ParseMatch=ParseMatch(x)])
Iterate match s at the current position. While no match is found and s.offset<=x.stop, s.offset is incremented to continue the search. Returns the next ParseMatch (as both return value and state), or nothing when x.stop is reached.
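The offset-scanning loop described above can be sketched as a custom Julia iterator. Below is a hypothetical Base-only version: ScanMatches, try_at and a_run are our own names, and matches are reported as index ranges rather than ParseMatch objects. It is not the MatchesIterator implementation. Note that an iterator of unknown length must declare Base.SizeUnknown() for collect to work.

```julia
# Hypothetical sketch of offset scanning; not the MatchesIterator implementation.
# `try_at(s, i)` returns the last index of a match starting at i, or nothing.
struct ScanMatches{F}
    try_at::F
    s::String
end

Base.IteratorSize(::Type{<:ScanMatches}) = Base.SizeUnknown()
Base.eltype(::Type{<:ScanMatches}) = UnitRange{Int}

function Base.iterate(it::ScanMatches, i::Int = firstindex(it.s))
    while i <= ncodeunits(it.s)          # analogous to s.offset <= x.stop
        stop = it.try_at(it.s, i)
        if stop !== nothing
            # Resume after the match; always advance at least one character.
            next = max(nextind(it.s, stop), nextind(it.s, i))
            return (i:stop, next)
        end
        i = nextind(it.s, i)             # no match here: move the offset on
    end
    return nothing
end

# Example matcher: a maximal run of 'a' starting at i.
function a_run(s::String, i::Int)
    s[i] == 'a' || return nothing
    j = i
    while nextind(s, j) <= ncodeunits(s) && s[nextind(s, j)] == 'a'
        j = nextind(s, j)
    end
    return j
end

collect(ScanMatches(a_run, "baab a"))   # [2:3, 6:6]
```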
CombinedParsers.match_all
match_all(parser::CombinedParser, sequence, a...; kw...)
Returns an iterator over all matches of CombinedParsers.wrap(parser; kw...). Constructs a MatchesIterator; the arguments a... define the match index range.
CombinedParsers.parse_all
parse_all(parser::CombinedParser, sequence, idx=1)
Returns an iterator over all parsings of sequence starting at offset idx.