Internal API
Iterating
Iteration is done with states.
CombinedParsers.state_type — Function
CombinedParsers.state_type(x::CombinedParser{S}) where S
Return S, the state type of x.
CombinedParsers.MatchesIterator — Type
MatchesIterator(parser::P, sequence::S[, start=firstindex(sequence)[, stop=lastindex(sequence), [till=lastindex(sequence)]]])
Iterator type for match_all and parse_all with eltype ParseMatch{P,S,state_type(P)}.
Iteration looks for matches beginning between start and stop and ending at most at till.
CombinedParsers.ParseMatch — Type
ParseMatch(p::MatchesIterator{P,S}, offset::Integer, after::Integer, state::ST) where {P,S,ST}
You can extract the following info from an m::ParseMatch object (like Julia RegexMatch):
- the entire substring matched: m.match
- the offset at which the whole match begins: m.offset
If P<:CombinedParsers.Regexp.ParserWithCaptures and S<:CombinedParsers.Regexp.SequenceWithCaptures:
- the captured substrings as an array of strings: m.captures
- the offsets of the captured substrings as a vector: m.offsets
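For comparison, the four fields named above are the same ones Base Julia exposes on a RegexMatch. The snippet below uses only Base (not CombinedParsers), so it illustrates the field semantics that ParseMatch is documented to follow and does not show ParseMatch itself.

```julia
# Base.RegexMatch has the same four fields as listed for ParseMatch.
m = match(r"(a)(b+)", "xxabbb")

m.match     # the entire matched substring: "abbb"
m.offset    # index at which the whole match begins: 3
m.captures  # captured substrings: ["a", "bbb"]
m.offsets   # start offsets of the captures: [3, 4]
```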
CombinedParsers.parsematch_tuple — Function
parsematch_tuple(m,offset,state)
ParseMatch iteration has the first match as iterator, the last match as a state. (Turned out to be fastest.)
CombinedParsers._iterate — Function
_iterate(parser, sequence, till::Int, posi::Int[, next_i[, state=nothing]])
Return the position after the next match of parser in sequence at posi. The next match is the one following the current match state (the first match iff state==nothing).
If no next match is found, return nothing.
next_i is the index in sequence after the parser match at posi with state.
- leftof(sequence,next_i,parser,state)==posi, the start of the state-matching subsequence.
- rightof(sequence,posi,parser,state)==next_i, the position after the state-matching subsequence.
- sequence[leftof(sequence,next_i,parser,state):_prevind(sequence,next_i)] is the matched subsequence.
Dispatches to _iterate(parser, sequence,till,posi,posi,nothing) to .
Custom _iterate implementations must return
- nothing if no match is found
- Tuple{Int64,state_type(parser)} with next position, match state if a match is found.
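The return contract above can be sketched with a toy matcher in plain Julia. Everything here (ToyRepeat, toy_iterate, toy_matches) is hypothetical and not part of CombinedParsers; it only mimics the documented shape of the protocol: nothing for no match, otherwise a (next position, state) tuple, with the previous state passed back in to ask for the next match. ASCII input is assumed so that one character is one index.

```julia
# Hypothetical toy matcher, NOT a CombinedParsers type: one or more `c`.
struct ToyRepeat
    c::Char
end

# First match (state === nothing): greedy, take as many `c` as possible.
# The state is the number of matched characters.
function toy_iterate(p::ToyRepeat, s::String, till::Int, posi::Int, next_i::Int, state::Nothing)
    n = 0
    i = posi
    while i <= till && s[i] == p.c
        n += 1
        i += 1
    end
    n == 0 ? nothing : (i, n)
end

# Next match after `state`: backtrack by giving up one character.
function toy_iterate(p::ToyRepeat, s::String, till::Int, posi::Int, next_i::Int, state::Int)
    state <= 1 ? nothing : (next_i - 1, state - 1)
end

# Enumerate all matches that start at `posi`.
function toy_matches(p, s, posi)
    out = Tuple{Int,Int}[]
    r = toy_iterate(p, s, lastindex(s), posi, posi, nothing)
    while r !== nothing
        push!(out, r)
        # r[1] is the position after the match, r[2] is the match state
        r = toy_iterate(p, s, lastindex(s), posi, r[1], r[2])
    end
    out
end

toy_matches(ToyRepeat('a'), "aaab", 1)  # [(4, 3), (3, 2), (2, 1)]
```

Passing the last state back in, rather than keeping a separate iterator object, keeps each step a pure function of (posi, next_i, state).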
_iterate(parser::ValueMatcher, sequence, till, posi, next_i, state::Nothing)
When implementing a Custom<:ValueMatcher it suffices to provide a method CombinedParsers._ismatch(c, parser::Custom).
_iterate(p::AbstractTrie{Char}, str, till, posi, next_i, ::Nothing)
Match char path in p greedily, recording SubTrie in a NCodeunitsState.
_iterate(p::ParserWithCaptures, sequence::SequenceWithCaptures,a...)
Base.empty!(sequence) before iteration. (Why?)
CombinedParsers.tuple_pos — Function
tuple_pos(pos_state::Tuple)
_iterate returns a tuple pos_state or nothing, and pos_state[1] is the position after the match.
CombinedParsers.tuple_state — Function
tuple_state(pos_state::Tuple)
_iterate returns a tuple pos_state or nothing, and pos_state[2] is the state of the match.
CombinedParsers.leftof — Function
CombinedParsers._leftof — Function
_leftof(str,i,parser::WrappedParser,x)
Convenience function for overriding leftof that guarantees that x is not nothing (for x::Nothing, i is returned).
CombinedParsers.rightof — Function
CombinedParsers._rightof — Function
_rightof(str,i,parser::WrappedParser,x)
Convenience function for overriding rightof that guarantees that x is not nothing (for x::Nothing, i is returned).
From result can (re-)construct CombinedParsers.leftof.
Internal Types
Abstract Parsers
CombinedParsers.CombinedParser — Type
CombinedParser{S,T} <: AbstractToken{T}
Abstract parser type for parsers returning matches transformed to ::T and state::S.
CombinedParsers.LeafParser — Type
LeafParser{T} <: CombinedParser{T}
Abstract parser type for parsers that have no sub-parser (e.g. ConstantParser). Used for dispatch in deepmap_parser.
CombinedParsers.Assertion — Type
Parsers that do not consume any input can inherit Assertion{S,T}.
TODO: allow to keep state and return wrapped get
States
CombinedParsers.MatchState — Type
State object for a match that is defined by the triple parser, sequence, position.
!!! note: Performance tip: Atomic is masking the state of its wrapped parser with MatchState. This simplifies the state
CombinedParsers.NoMatch — Type
State type for skipped optional. (Missing was breaking julia).
CombinedParsers.NCodeunitsState — Type
State object representing ncodeunits explicitly together with the state of the match, so that leftof and rightof are faster. Fields: nc::Int and state::S.
See also MatchState, leftof, rightof.
!!! note: nc as a type parameter is faster but slow to compile.
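A minimal sketch of why storing the code unit count helps: with nc in the state, stepping left or right is integer arithmetic and needs no re-scan of the sequence. ToyNCodeunitsState, toy_leftof and toy_rightof are hypothetical stand-ins written for this illustration, not the package's definitions.

```julia
# Hypothetical stand-in: `nc` code units plus an inner state.
struct ToyNCodeunitsState{S}
    nc::Int
    state::S
end

# With `nc` stored, both directions are plain integer arithmetic.
toy_leftof(next_i::Int, st::ToyNCodeunitsState) = next_i - st.nc
toy_rightof(posi::Int, st::ToyNCodeunitsState) = posi + st.nc

s = "αβγ!"   # each Greek letter takes 2 code units in UTF-8
st = ToyNCodeunitsState(ncodeunits("αβ"), nothing)

next_i = toy_rightof(1, st)                             # 5, the index of 'γ'
matched = s[toy_leftof(next_i, st):prevind(s, next_i)]  # "αβ"
```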
Wrapped Parsers
CombinedParsers.FilterParser — Type
A parser that succeeds only if
- the wrapped parser succeeds
- and a predicate function state_filter(sequence, till, posi, r...) returns true for the after,state = r tuple.
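The two conditions can be sketched in plain Julia as follows. toy_filter, toy_digits and at_least_3 are hypothetical names for illustration only. The sketch checks just the first match of the inner matcher and assumes ASCII input, which may be simpler than what the package does.

```julia
# Hypothetical sketch: `inner(sequence, till, posi)` returns `nothing`
# or an `(after, state)` tuple, like the `_iterate` contract.
function toy_filter(state_filter, inner, sequence, till, posi)
    r = inner(sequence, till, posi)
    r === nothing && return nothing          # wrapped matcher failed
    # predicate receives the splatted (after, state) tuple
    state_filter(sequence, till, posi, r...) ? r : nothing
end

# Toy inner matcher: consume a run of ASCII digits; the state is unused.
function toy_digits(sequence, till, posi)
    i = posi
    while i <= till && isdigit(sequence[i])
        i += 1
    end
    i == posi ? nothing : (i, nothing)
end

# Predicate: accept only matches of at least three digits.
at_least_3(sequence, till, posi, after, state) = after - posi >= 3

toy_filter(at_least_3, toy_digits, "12345x", 6, 1)  # (6, nothing)
toy_filter(at_least_3, toy_digits, "12x", 3, 1)     # nothing
```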
CombinedParsers.ConstantParser — Type
Wrapper for stepping with ncodeunit length.
julia> parser("constant") isa CombinedParsers.ConstantParser
true
julia> parser('c') isa CombinedParsers.ConstantParser
true
julia> parser(1) isa CombinedParsers.ConstantParser
true
CombinedParsers.NIndexParser — Type
NIndexParser{N,T} <: LeafParser{MatchState,T}
Abstract type for stepping N indices with _leftof and _rightof, accounting for Base.ncodeunits length of unicode chars.
See Bytes and ValueMatcher.
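The reason stepping has to account for ncodeunits is that Julia String indices are byte offsets, so a multi-byte character occupies more than one index. The snippet below uses only Base functions and does not call any NIndexParser method.

```julia
s = "aαb"                # 'α' occupies 2 code units in UTF-8

n = ncodeunits(s)        # 4
i2 = nextind(s, 1)       # 2, the index of 'α'
i3 = nextind(s, 2)       # 4, skips the continuation byte at index 3
i4 = nextind(s, 1, 2)    # 4, step two characters at once
i5 = prevind(s, 4)       # 2, step back over 'α'
```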
CombinedParsers.WrappedParser — Type
Abstract type for parser wrappers, providing default methods.
CombinedParsers.WrappedAssertion — Type
An assertion with an inner parser, like WrappedParser interface.
Printing
CombinedParsers.print_constructor — Function
print_constructor(io::IO,x)
Print constructor pipeline in parser tree node.
CombinedParsers.MemoTreeChildren — Type
decurse recursive patterns
PCRE
Printing is currently done in tree view but has inconsistencies (the output might not be the PCRE regex equivalent to the parser).
Base.escape_string — Function
Base.escape_string(x::AbstractVector)
Used for printing a non-string sequence when parsing.
type piracy? module local _escape_string?
CombinedParsers.regex_string — Function
regex_string(x::CombinedParser)
regex_prefix(x)*regex_inner(x)*regex_suffix(x)
CombinedParsers.regex_prefix — Function
regex_prefix(x)
Prefix printed in parser tree node.
CombinedParsers.regex_inner — Function
regex_inner(x::AbstractToken)
Regex representation of x. See regex_string.
CombinedParsers.regex_suffix — Function
regex_suffix(x)
Suffix printed in parser tree node.
Rewriting Parsers
CombinedParsers.deepmap_parser — Function
deepmap_parser(f::Function[, mem::AbstractDict=IdDict()], x::CombinedParser,a...;kw...)
Perform a deep transformation of x.
Default method
- return the cached result if haskey(mem, x), to avoid infinite recursion.
- construct the deep transformation dt = _deepmap_parser(f, mem, x, a...; kw...)
- cache and return f(dt, a...; kw...)
Used for log_names.
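The three steps of the default method (check cache, transform children, cache and return) can be sketched on a toy tree. Leaf, Node, toy_deepmap and upper are hypothetical and not CombinedParsers types or functions. The example uses a shared subtree, not a truly cyclic one, so it shows only the memoization and not how the package handles recursive parsers.

```julia
# Hypothetical toy tree; mutable so that IdDict keys are object identities.
mutable struct Leaf
    name::String
end
mutable struct Node
    children::Vector{Any}
end

# Leaves: apply `f` once, then serve the cached result.
function toy_deepmap(f, mem::AbstractDict, x::Leaf)
    haskey(mem, x) && return mem[x]
    mem[x] = f(x)
end

function toy_deepmap(f, mem::AbstractDict, x::Node)
    haskey(mem, x) && return mem[x]                             # 1. cached result
    dt = Node(Any[toy_deepmap(f, mem, c) for c in x.children])  # 2. transform children
    mem[x] = f(dt)                                              # 3. cache and return
end

# Entry point with a fresh identity-keyed cache.
toy_deepmap(f, x) = toy_deepmap(f, IdDict(), x)

shared = Leaf("a")
tree = Node(Any[shared, Node(Any[shared])])

calls = Ref(0)                       # counts leaf transformations
upper(x::Leaf) = (calls[] += 1; Leaf(uppercase(x.name)))
upper(x::Node) = x

result = toy_deepmap(upper, tree)
calls[]   # 1: the shared leaf is transformed only once
```

Keying the cache by identity (IdDict) means two structurally equal but distinct nodes are still transformed separately, while a node reachable by several paths maps to one shared result.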
For a new CombinedParser, define either deepmap_parser or _deepmap_parser.
For a parser transformation f, define either
- custom deepmap_parser(::typeof(f),...) (see example implementation substitute)
- construction method _deepmap_parser(::typeof(f),...) (see example implementation caseless)
- leaf method f (see example implementation deepmap)
_deepmap_parser(::typeof(_indexed_captures),mem::AbstractDict,x::Either,context,reset_index)
Method dispatch, resetting lastindex(context.subroutines) if reset_index===true.
CombinedParsers._deepmap_parser — Function
deepmap_parser(f,mem::AbstractDict,x,a...;kw...)
Perform a deep transformation of a CombinedParser.
For a custom parser P<:CombinedParser with sub-parsers, provide a method
CombinedParsers._deepmap_parser(f,mem::AbstractDict,x::P,a...;kw...) =
## construct replacement, e.g. if P <: WrappedParser
P(deepmap_parser(f,mem,x.parser,a...;kw...))
_deepmap_parser(f::typeof(_indexed_captures),mem::AbstractDict,x::DupSubpatternNumbers,context,reset_index)
Set reset_index===true.
_deepmap_parser(f::typeof(_indexed_captures),mem::AbstractDict,x::Capture,context,a...)
Map the capture by setting index to _nextind(context,x).
Registers result in context.subroutines if no previous subroutine with the same index exists (see also DupSubpatternNumbers).
CombinedParsers.reinfer — Function
reinfer(parser)
Run julia type inference again on a parser for optimization. Either{<:Vector} parsers are converted to Either{<:Tuple}.
The implementation is an example of when a custom deepmap_parser method is useful.
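One reason a Vector-to-Tuple conversion can help inference is that a tuple's type records the type of every element, while a heterogeneous vector erases them. The snippet below shows this in plain Julia with hypothetical variable names; it does not call reinfer or construct an Either.

```julia
options_vector = Any["a", 'b', 1]        # element types are erased
options_tuple  = Tuple(options_vector)   # element types become part of the type

tv = typeof(options_vector)  # Vector{Any}
tt = typeof(options_tuple)   # Tuple{String, Char, Int}
```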
CombinedParsers.strip_either1 — Function
strip_either1(x::CombinedParser)
Replace all Either parsers that have a single option with that option.
Used in 2-stage substitute (stage 1: collect for recursion, stage 2: simplify).
CombinedParsers.Regexp.NoDict — Type
For use in ParserWithCaptures to enforce different indices for identical captures.