Internal API
Iterating
Iteration is done with states.
CombinedParsers.state_type — Function
CombinedParsers.state_type(x::CombinedParser{S}) where S
Return S, the state type of x.
CombinedParsers.MatchesIterator — Type
MatchesIterator(parser::P, sequence::S[, start=firstindex(sequence)[, stop=lastindex(sequence)[, till=lastindex(sequence)]]])
Iterator type for match_all and parse_all with eltype ParseMatch{P,S,state_type(P)}.
Iteration looks for matches beginning between start and stop and ending at most at till.
CombinedParsers.ParseMatch — Type
ParseMatch(p::MatchesIterator{P,S}, offset::Integer, after::Integer, state::ST) where {P,S,ST}
You can extract the following info from a m::ParseMatch object (like Julia RegexMatch):
- the entire substring matched: m.match
- the offset at which the whole match begins: m.offset
If P<:CombinedParsers.Regexp.ParserWithCaptures and S<:CombinedParsers.Regexp.SequenceWithCaptures:
- the captured substrings as an array of strings: m.captures
- the offsets of the captured substrings as a vector: m.offsets
CombinedParsers.parsematch_tuple — Function
parsematch_tuple(m,offset,state)
ParseMatch iteration has the first match as iterator, the last match as a state. (Turned out to be fastest.)
CombinedParsers._iterate — Function
_iterate(parser, sequence, till::Int, posi::Int[, next_i[, state=nothing]])
Return the position after the next match of parser in sequence at posi. The next match is the one following the current match state (the first match iff state==nothing). If no next match is found, return nothing.
next_i is the index in sequence after the parser match at posi with state.
- leftof(sequence,next_i,parser,state)==posi, the start of the state-matching subsequence.
- rightof(sequence,posi,parser,state)==next_i, the position after the state-matching subsequence.
- sequence[leftof(sequence,next_i,parser,state):_prevind(sequence,next_i)] is the matched subsequence.
Without next_i and state, dispatches to _iterate(parser, sequence,till,posi,posi,nothing), i.e. the search for the first match.
Custom _iterate implementations must return
- nothing if no match is found,
- a Tuple{Int64,state_type(parser)} with next position, match state if a match is found.
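The return contract above can be illustrated with a minimal sketch that does not use CombinedParsers at all. ToyLiteral and toy_iterate are hypothetical names invented for this example, and the Bool state is an assumption chosen for brevity; the sketch only mirrors the documented shape (nothing, or a tuple of next position and state).

```julia
# Hypothetical toy matcher, not part of CombinedParsers.
struct ToyLiteral
    text::String
end

# Mirrors the documented contract: return nothing, or (next position, state).
# A literal can match in exactly one way, so any non-nothing state means
# that there is no further match at this position.
function toy_iterate(p::ToyLiteral, sequence::String, till::Int, posi::Int, state=nothing)
    state === nothing || return nothing
    next_i = posi + ncodeunits(p.text)
    # The match must end at most at till.
    next_i - 1 <= till || return nothing
    startswith(SubString(sequence, posi), p.text) || return nothing
    return (next_i, true)
end
```

With this sketch, toy_iterate(ToyLiteral("ab"), "abc", 3, 1) gives a tuple whose first element is the position after the match and whose second element is the state; passing that state back in asks for a further match and yields nothing.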
_iterate(parser::ValueMatcher, sequence, till, posi, next_i, state::Nothing)
When implementing a Custom<:ValueMatcher it suffices to provide a method CombinedParsers._ismatch(c, parser::Custom).
_iterate(p::AbstractTrie{Char}, str, till, posi, next_i, ::Nothing)
Match char path in p greedily, recording SubTrie in a NCodeunitsState.
_iterate(p::ParserWithCaptures, sequence::SequenceWithCaptures,a...)
Calls Base.empty!(sequence) before iteration. (Why?)
CombinedParsers.tuple_pos — Function
tuple_pos(pos_state::Tuple)
_iterate returns a tuple pos_state or nothing, and pos_state[1] is the position after the match.
CombinedParsers.tuple_state — Function
tuple_state(pos_state::Tuple)
_iterate returns a tuple pos_state or nothing, and pos_state[2] is the state of the match.
CombinedParsers.leftof — Function
CombinedParsers._leftof — Function
_leftof(str,i,parser::WrappedParser,x)
Convenience function for overriding leftof that guarantees that x is not nothing (the x isa Nothing case returns i).
CombinedParsers.rightof — Function
CombinedParsers._rightof — Function
_rightof(str,i,parser::WrappedParser,x)
Convenience function for overriding rightof that guarantees that x is not nothing (the x isa Nothing case returns i).
From the result one can (re-)construct CombinedParsers.leftof.
Internal Types
Abstract Parsers
CombinedParsers.CombinedParser — Type
CombinedParser{S,T} <: AbstractToken{T}
Abstract parser type for parsers returning matches transformed to ::T and state::S.
CombinedParsers.LeafParser — Type
LeafParser{S,T} <: CombinedParser{S,T}
Abstract parser type for parsers that have no sub-parser (e.g. ConstantParser). Used for dispatch in deepmap_parser.
CombinedParsers.Assertion — Type
Parsers that do not consume any input can inherit Assertion{S,T}.
TODO: allow to keep state and return wrapped get
States
CombinedParsers.MatchState — Type
State object for a match that is defined by the triple parser, sequence, position.
!!! note: Performance tip: Atomic is masking the state of its wrapped parser with MatchState. This simplifies the state
CombinedParsers.NoMatch — Type
State type for skipped optional. (Missing was breaking julia).
CombinedParsers.NCodeunitsState — Type
State object representing ncodeunits explicitly, together with the state of the match, so that leftof and rightof are faster. Fields: nc::Int and state::S.
See also MatchState, leftof, rightof.
!!! note: nc as type parameter is faster but compiles slowly.
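Why storing the code unit count helps can be seen in a minimal sketch. ToyNCodeunitsState, toy_leftof and toy_rightof are hypothetical names for this example only, and the arithmetic is an assumption about the idea, not the package's implementation: once nc is known, both boundaries follow from integer arithmetic without inspecting the sequence again.

```julia
# Hypothetical state type for illustration, not the CombinedParsers definition.
struct ToyNCodeunitsState{S}
    nc::Int      # number of code units the match spans
    state::S     # state of the underlying match
end

# With nc stored, the boundaries of the match are plain index arithmetic.
toy_leftof(sequence, i, s::ToyNCodeunitsState) = i - s.nc
toy_rightof(sequence, i, s::ToyNCodeunitsState) = i + s.nc
```

For a match of the two-code-unit character "\u00e9" starting at index 1, toy_rightof gives index 3, and toy_leftof maps 3 back to 1.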
Wrapped Parsers
CombinedParsers.FilterParser — Type
A parser that succeeds only if
- the wrapped parser succeeds
- and a predicate function state_filter(sequence, till, posi, r...) returns true for the after,state = r tuple.
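The two conditions above can be sketched without CombinedParsers. toy_digits, toy_filter and even_length are hypothetical helpers for this example; the sketch checks only the first match of the wrapped matcher and makes no claim about how FilterParser treats further match states.

```julia
# Hypothetical stand-in for a wrapped parser: matches one or more digits
# and returns (position after match, state) or nothing.
function toy_digits(sequence::String, till::Int, posi::Int)
    i = posi
    while i <= till && isdigit(sequence[i])
        i = nextind(sequence, i)
    end
    i == posi ? nothing : (i, nothing)
end

# Succeeds only if the wrapped match succeeds and the predicate accepts it.
function toy_filter(state_filter, inner, sequence, till, posi)
    r = inner(sequence, till, posi)
    r === nothing && return nothing
    # r is the (after, state) tuple, splatted as in state_filter(sequence, till, posi, r...)
    state_filter(sequence, till, posi, r...) ? r : nothing
end

# Example predicate: accept only matches spanning an even number of code units.
even_length(sequence, till, posi, after, state) = iseven(after - posi)
```

Calling toy_filter(even_length, toy_digits, "1234x", 5, 1) passes the wrapped result through, while the three-digit input "123x" is rejected by the predicate.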
CombinedParsers.ConstantParser — Type
Wrapper for stepping with ncodeunit length.
julia> parser("constant") isa CombinedParsers.ConstantParser
true
julia> parser('c') isa CombinedParsers.ConstantParser
true
julia> parser(1) isa CombinedParsers.ConstantParser
true
CombinedParsers.NIndexParser — Type
NIndexParser{N,T} <: LeafParser{MatchState,T}
Abstract type for stepping N indices with _leftof and _rightof, accounting for Base.ncodeunits length of unicode chars.
See Bytes and ValueMatcher.
CombinedParsers.WrappedParser — Type
Abstract type for parser wrappers, providing default methods.
CombinedParsers.WrappedAssertion — Type
An assertion with an inner parser, like the WrappedParser interface.
Printing
CombinedParsers.print_constructor — Function
print_constructor(io::IO,x)
Print constructor pipeline in parser tree node.
CombinedParsers.MemoTreeChildren — Type
decurse recursive patterns
PCRE printing is currently used in the tree view, but it has inconsistencies (the output might not be a PCRE regex equivalent to the parser).
Base.escape_string — Function
Base.escape_string(x::AbstractVector)
for printing a non-string sequence when parsing.
type piracy? module local _escape_string?
CombinedParsers.regex_string — Function
regex_string(x::CombinedParser)
regex_prefix(x)*regex_inner(x)*regex_suffix(x)
CombinedParsers.regex_prefix — Function
regex_prefix(x)
Prefix printed in parser tree node.
CombinedParsers.regex_inner — Function
regex_inner(x::AbstractToken)
Regex representation of x. See regex_string.
CombinedParsers.regex_suffix — Function
regex_suffix(x)
Suffix printed in parser tree node.
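The concatenation that regex_string documents can be sketched with a toy node type. ToyGroup and the toy_ prefixed functions are hypothetical and exist only in this example; in CombinedParsers the corresponding methods are defined on the package's own parser types.

```julia
# Hypothetical node type for illustration, not part of CombinedParsers.
struct ToyGroup
    inner::String
end

# Each piece is its own method, so a node type can override just one of them.
toy_regex_prefix(::ToyGroup) = "("
toy_regex_inner(x::ToyGroup) = x.inner
toy_regex_suffix(::ToyGroup) = ")"

# Same shape as the documented definition: prefix * inner * suffix.
toy_regex_string(x) = toy_regex_prefix(x) * toy_regex_inner(x) * toy_regex_suffix(x)
```

Splitting the string into three functions means a wrapper type only needs to supply the part that differs, for example a different prefix for a non-capturing group.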
Rewriting Parsers
CombinedParsers.deepmap_parser — Function
deepmap_parser(f::Function[, mem::AbstractDict=IdDict()], x::CombinedParser,a...;kw...)
Perform a deep transformation of x.
Default method
- Returns cached result if haskey(mem, x) to avoid infinite recursion.
- construct deep transformation dt = _deepmap_parser(f, mem, x, a...; kw...)
- cache and return f(dt, a...; kw...)
Used for log_names.
For a new CombinedParser, define either deepmap_parser or _deepmap_parser.
For a parser transformation f, define either
- custom deepmap_parser(::typeof(f),...) (see example implementation substitute)
- construction method _deepmap_parser(::typeof(f),...) (see example implementation caseless)
- leaf method f (see example implementation deepmap)
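The three steps of the default method (cache lookup, construction, cache and return) can be sketched on a toy parser tree. All names here (ToyParser, ToyLeaf, ToyWrap, toy_deepmap, _toy_deepmap, upcase_leaf) are hypothetical and not part of CombinedParsers. The sketch stores a result only after the children are transformed, so it reuses results for shared nodes but, unlike what the documented default promises, would not terminate on a cyclic parser.

```julia
# Hypothetical toy parser tree, not the CombinedParsers types.
abstract type ToyParser end
struct ToyLeaf <: ToyParser
    name::String
end
struct ToyWrap <: ToyParser
    parser::ToyParser
end

# Construction step: rebuild a node from its transformed children.
_toy_deepmap(f, mem, x::ToyLeaf) = x
_toy_deepmap(f, mem, x::ToyWrap) = ToyWrap(toy_deepmap(f, mem, x.parser))

function toy_deepmap(f, mem::AbstractDict, x::ToyParser)
    haskey(mem, x) && return mem[x]   # 1. return the cached result
    dt = _toy_deepmap(f, mem, x)      # 2. construct the deep transformation
    mem[x] = f(dt)                    # 3. cache and return f(dt)
end
toy_deepmap(f, x::ToyParser) = toy_deepmap(f, IdDict(), x)

# Example transformation: a leaf method plus an identity fallback.
upcase_leaf(x::ToyLeaf) = ToyLeaf(uppercase(x.name))
upcase_leaf(x::ToyParser) = x
```

Here toy_deepmap(upcase_leaf, ToyWrap(ToyLeaf("a"))) rebuilds the wrapper around an upper-cased leaf; only the leaf method carries logic specific to the transformation.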
_deepmap_parser(::typeof(_indexed_captures),mem::AbstractDict,x::Either,context,reset_index)
Method dispatch, resetting lastindex(context.subroutines) if reset_index===true.
CombinedParsers._deepmap_parser — Function
_deepmap_parser(f,mem::AbstractDict,x,a...;kw...)
Perform a deep transformation of a CombinedParser.
For a custom parser P<:CombinedParser with sub-parsers, provide a method
CombinedParsers._deepmap_parser(f,mem::AbstractDict,x::P,a...;kw...) =
## construct replacement, e.g. if P <: WrappedParser
P(deepmap_parser(f,mem,x.parser,a...;kw...))
_deepmap_parser(f::typeof(_indexed_captures),mem::AbstractDict,x::DupSubpatternNumbers,context,reset_index)
Sets reset_index to true.
_deepmap_parser(f::typeof(_indexed_captures),mem::AbstractDict,x::Capture,context,a...)
Map the capture by setting index to _nextind(context,x).
Registers result in context.subroutines if no previous subroutine with the same index exists (see also DupSubpatternNumbers).
CombinedParsers.reinfer — Function
reinfer(parser)
Run julia type inference again on a parser for optimization. Either{<:Vector} parsers are converted to Either{<:Tuple}.
The implementation is an example of when a custom deepmap_parser method is useful.
CombinedParsers.strip_either1 — Function
strip_either1(x::CombinedParser)
Replace all Either parsers that have only one option with that option.
Used in 2-stage substitute (stage 1: collect for recursion, stage 2: simplify).
CombinedParsers.Regexp.NoDict — Type
For use in ParserWithCaptures to enforce different indices for identical captures.