Internal API

Iterating

Iteration is driven by match states: the next match of a parser is computed from the state of the current match.

CombinedParsers.MatchesIterator - Type
MatchesIterator(parser::P, sequence::S[, start=firstindex(sequence)[, stop=lastindex(sequence)[, till=lastindex(sequence)]]])

Iterator type for match_all and parse_all with eltype ParseMatch{P,S,state_type(P)}.

Iteration looks for matches beginning between start and stop and ending at most at till.
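
The roles of start, stop and till can be illustrated with a minimal sketch in plain Julia. It does not use CombinedParsers: toy_match_starts is a hypothetical helper that searches for a literal string, and it only assumes the semantics stated above (a match must begin between start and stop and end at most at till).

```julia
# Hypothetical helper, not part of CombinedParsers: collect the start indices
# of all occurrences of `needle` that begin in start:stop and end at or before till.
function toy_match_starts(needle::AbstractString, s::AbstractString,
                          start=firstindex(s), stop=lastindex(s), till=lastindex(s))
    starts = Int[]
    i = start
    while i <= stop
        # SubString(s, i, till) hides everything after `till`,
        # so a match reaching past `till` cannot succeed.
        startswith(SubString(s, i, till), needle) && push!(starts, i)
        i = nextind(s, i)
    end
    starts
end

toy_match_starts("ab", "abcabcab")           # [1, 4, 7]
toy_match_starts("ab", "abcabcab", 1, 4)     # [1, 4]: no match may begin after stop
toy_match_starts("ab", "abcabcab", 1, 8, 7)  # [1, 4]: the match at 7 would end after till
```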

CombinedParsers.ParseMatch - Type
ParseMatch(p::MatchesIterator{P,S}, offset::Integer, after::Integer, state::ST) where {P,S,ST}

You can extract the following information from a m::ParseMatch object (as with Julia's RegexMatch):

  • the entire substring matched: m.match
  • the offset at which the whole match begins: m.offset

If P<:CombinedParsers.Regexp.ParserWithCaptures and S<:CombinedParsers.Regexp.SequenceWithCaptures, you can also extract:

  • the captured substrings as an array of strings: m.captures
  • the offsets of the captured substrings as a vector: m.offsets

CombinedParsers.parsematch_tuple - Function
parsematch_tuple(m,offset,state)

ParseMatch iteration uses the first match as the iterator and the last match as the state. (This turned out to be fastest.)

CombinedParsers._iterate - Function
_iterate(parser, sequence, till::Int, posi::Int[, next_i[, state=nothing]])

Return the position after the next match of parser in sequence at posi. The next match is the one following the current match state (the first match iff state==nothing).

If no next match is found, return nothing.

Note

next_i is the index in sequence after parser match at posi with state.

  • leftof(sequence,next_i,parser,state)==posi, the start of the state-matching subsequence.
  • rightof(sequence,posi,parser,state)==next_i, the position after the state-matching subsequence.
  • sequence[leftof(sequence,next_i,parser,state):_prevind(sequence,next_i)] is the matched subsequence.
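
The three relations above can be checked on a toy state that simply stores the match length in code units, similar in spirit to the NCodeunitsState described on this page. This is a sketch under assumptions, not the package implementation: ToyNCodeunitsState, toy_leftof and toy_rightof are hypothetical names, and Base.prevind stands in for _prevind.

```julia
# Hypothetical stand-in for a state that records the match length in code units.
struct ToyNCodeunitsState{S}
    nc::Int      # number of code units consumed by the match
    state::S     # inner match state (unused in this sketch)
end

# Toy analogues of leftof/rightof: pure index arithmetic, no re-matching.
toy_leftof(sequence, next_i, state::ToyNCodeunitsState) = next_i - state.nc
toy_rightof(sequence, posi, state::ToyNCodeunitsState)  = posi + state.nc

sequence = "αβγ δ"
posi     = 1
state    = ToyNCodeunitsState(ncodeunits("αβγ"), nothing)  # 6 code units
next_i   = toy_rightof(sequence, posi, state)              # 7, the index after the match

toy_leftof(sequence, next_i, state) == posi                # true
matched = sequence[toy_leftof(sequence, next_i, state):prevind(sequence, next_i)]  # "αβγ"
```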

When called without next_i and state, dispatches to _iterate(parser, sequence,till,posi,posi,nothing) to find the first match.

Note

Custom _iterate implementations must return:

  • nothing if no match is found
  • a Tuple{Int64,state_type(parser)} holding the next position and the match state if a match is found.
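
The return contract can be sketched with a self-contained toy that follows the same argument order but is independent of CombinedParsers. ToyRun, toy_iterate and all_matches_at are hypothetical names. The toy parser matches a run of one character greedily and then backtracks one character per call, so the state is just the current run length.

```julia
# Hypothetical toy parser: matches one or more repetitions of `c`.
struct ToyRun
    c::Char
end

# First match (state === nothing): take the longest run starting at posi.
function toy_iterate(p::ToyRun, sequence, till::Int, posi::Int, next_i::Int, state::Nothing)
    i, n = posi, 0
    while i <= till && sequence[i] == p.c
        i = nextind(sequence, i)
        n += 1
    end
    n == 0 ? nothing : (i, n)          # (next position, match state) or nothing
end

# Next match: give back one character, or report that no match is left.
function toy_iterate(p::ToyRun, sequence, till::Int, posi::Int, next_i::Int, state::Int)
    state <= 1 && return nothing
    (prevind(sequence, next_i), state - 1)
end

# Short form: start from posi with no previous state.
toy_iterate(p, sequence, till, posi) = toy_iterate(p, sequence, till, posi, posi, nothing)

# Driver: enumerate every match of p that starts at posi.
function all_matches_at(p, sequence, posi, till=lastindex(sequence))
    out = String[]
    r = toy_iterate(p, sequence, till, posi)
    while r !== nothing
        next_i, state = r
        push!(out, sequence[posi:prevind(sequence, next_i)])
        r = toy_iterate(p, sequence, till, posi, next_i, state)
    end
    out
end

all_matches_at(ToyRun('a'), "aaab", 1)   # ["aaa", "aa", "a"]
```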

_iterate(parser::ValueMatcher, sequence, till, posi, next_i, state::Nothing)

When implementing a Custom<:ValueMatcher it suffices to provide a method CombinedParsers._ismatch(c, parser::Custom).
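
Why a single predicate method is enough can be sketched with a toy hierarchy. All names here (ToyValueMatcher, ToyMatchState, toy_ismatch, toy_iterate) are hypothetical and only mimic the pattern; they are not the package's types or functions.

```julia
# Hypothetical abstract type for parsers that match exactly one value.
abstract type ToyValueMatcher end
struct ToyMatchState end               # stand-in for a stateless match marker

# Generic iteration written once, in terms of the predicate toy_ismatch.
function toy_iterate(p::ToyValueMatcher, sequence, till, posi, next_i, state::Nothing)
    posi > till && return nothing
    toy_ismatch(sequence[posi], p) ? (nextind(sequence, posi), ToyMatchState()) : nothing
end
# A single value has no alternative match, so asking for a next match fails.
toy_iterate(p::ToyValueMatcher, sequence, till, posi, next_i, state::ToyMatchState) = nothing

# A custom matcher only needs to supply the predicate.
struct ToyVowel <: ToyValueMatcher end
toy_ismatch(c::Char, ::ToyVowel) = c in ('a', 'e', 'i', 'o', 'u')

toy_iterate(ToyVowel(), "on", 2, 1, 1, nothing)   # (2, ToyMatchState())
toy_iterate(ToyVowel(), "on", 2, 2, 2, nothing)   # nothing, 'n' is not a vowel
```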

_iterate(p::AbstractTrie{Char}, str, till, posi, next_i, ::Nothing)

Match the char path in p greedily, recording the SubTrie in an NCodeunitsState.

_iterate(p::ParserWithCaptures, sequence::SequenceWithCaptures,a...)

Calls Base.empty!(sequence) before iteration. (Why?)

CombinedParsers._leftof - Function
_leftof(str,i,parser::WrappedParser,x)

Convenience function for overriding leftof. It guarantees that x is not nothing (for x isa Nothing, i is returned).

CombinedParsers._rightof - Function
_rightof(str,i,parser::WrappedParser,x)

Convenience function for overriding rightof. It guarantees that x is not nothing (for x isa Nothing, i is returned).

CombinedParsers.leftof can be (re-)constructed from the result.

Internal Types

Abstract Parsers

CombinedParsers.CombinedParser - Type
CombinedParser{S,T} <: AbstractToken{T}

Abstract type for parsers whose matches are transformed to a result of type T and whose match state has type S.

CombinedParsers.Assertion - Type

Parsers that do not consume any input can inherit Assertion{S,T}.

Note

TODO: allow to keep state and return wrapped get

States

CombinedParsers.MatchState - Type

State object for a match that is defined by the triple parser, sequence, position.

Note

Performance tip: Atomic masks the state of its wrapped parser with MatchState. This simplifies the state.

CombinedParsers.NCodeunitsState - Type

State object that stores ncodeunits explicitly together with the match state, to improve the performance of leftof and rightof. Fields: nc::Int and state::S.

See also MatchState, leftof, rightof.

Note

Using nc as a type parameter is faster at run time but slow to compile.

Wrapped Parsers

CombinedParsers.FilterParser - Type

The parser succeeds only if

  1. the wrapped parser succeeds
  2. and a predicate function state_filter(sequence, till, posi, r...) returns true for the after,state = r tuple.

CombinedParsers.ConstantParser - Type

Wrapper for stepping with ncodeunit length.

julia> parser("constant") isa CombinedParsers.ConstantParser
true

julia> parser('c') isa CombinedParsers.ConstantParser
true

julia> parser(1) isa CombinedParsers.ConstantParser
true

Printing

PCRE

Printing is currently done in a tree view, but it has inconsistencies: the output might not be the PCRE regex equivalent to the parser.

Base.escape_string - Function
Base.escape_string(x::AbstractVector)

Used for printing a non-string sequence when parsing.

Note

Is this type piracy? Should a module-local _escape_string be used instead?

Rewriting Parsers

CombinedParsers.deepmap_parser - Function
deepmap_parser(f::Function[, mem::AbstractDict=IdDict()], x::CombinedParser,a...;kw...)

Perform a deep transformation of x.

Default method

  1. Return the cached result if haskey(mem, x), to avoid infinite recursion.
  2. Construct the deep transformation dt = _deepmap_parser(f, mem, x, a...; kw...)
  3. Cache and return f(dt, a...; kw...)

Used for log_names.
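
The three steps of the default method can be sketched on a toy parser tree. This is an illustration under assumptions rather than the package code: ToyParser, ToyLeaf, ToyWrap, ToySeq, toy_deepmap and _toy_deepmap are hypothetical names, and extra arguments and keywords are left out.

```julia
# Hypothetical toy parser types.
abstract type ToyParser end
struct ToyLeaf <: ToyParser
    name::String
end
struct ToyWrap <: ToyParser
    parser::ToyParser
end
struct ToySeq <: ToyParser
    parts::Vector{ToyParser}
end

toy_deepmap(f, x::ToyParser) = toy_deepmap(f, IdDict(), x)
function toy_deepmap(f, mem::AbstractDict, x::ToyParser)
    haskey(mem, x) && return mem[x]   # 1. reuse the cached result
    dt = _toy_deepmap(f, mem, x)      # 2. rebuild x from transformed sub-parsers
    result = f(dt)                    # 3. apply f, then cache and return
    mem[x] = result
    result
end

# Construction methods: one per parser type that has sub-parsers.
_toy_deepmap(f, mem, x::ToyLeaf) = x
_toy_deepmap(f, mem, x::ToyWrap) = ToyWrap(toy_deepmap(f, mem, x.parser))
_toy_deepmap(f, mem, x::ToySeq)  = ToySeq(ToyParser[toy_deepmap(f, mem, p) for p in x.parts])

# Example transformation: upper-case every leaf name, leave other nodes alone.
shout(x) = x isa ToyLeaf ? ToyLeaf(uppercase(x.name)) : x

leaf = ToyLeaf("a")
tree = ToyWrap(ToySeq(ToyParser[leaf, leaf]))
out  = toy_deepmap(shout, tree)
out.parser.parts[1].name              # "A"
```

Because the leaf occurs twice in the tree, the second visit is answered from mem and the transformation function is called only once for it.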

For a new CombinedParser, define either deepmap_parser or _deepmap_parser.

For a parser transformation f, define one of the following custom methods:

  • deepmap_parser(::typeof(f),...) (see example implementation substitute)
  • construction method _deepmap_parser(::typeof(f),...) (see example implementation caseless)
  • leaf method f (see example implementation deepmap)

_deepmap_parser(::typeof(_indexed_captures),mem::AbstractDict,x::Either,context,reset_index)

Method dispatch, resetting lastindex(context.subroutines) if reset_index===true.

CombinedParsers._deepmap_parser - Function
_deepmap_parser(f,mem::AbstractDict,x,a...;kw...)

Perform a deep transformation of a CombinedParser.

Note

For a custom parser P<:CombinedParser with sub-parsers, provide a method

CombinedParsers._deepmap_parser(f,mem::AbstractDict,x::P,a...;kw...) =
     ## construct replacement, e.g. if P <: WrappedParser
     P(deepmap_parser(f,mem,x.parser,a...;kw...))

_deepmap_parser(f::typeof(_indexed_captures),mem::AbstractDict,x::DupSubpatternNumbers,context,reset_index)

Sets reset_index===true.

_deepmap_parser(f::typeof(_indexed_captures),mem::AbstractDict,x::Capture,context,a...)

Map the capture by setting its index to _nextind(context,x).

Registers result in context.subroutines if no previous subroutine with the same index exists (see also DupSubpatternNumbers).

CombinedParsers.reinfer - Function
reinfer(parser)

Run Julia type inference again on a parser for optimization. Either{<:Vector} parsers are converted to Either{<:Tuple}.

The implementation is an example of when a custom deepmap_parser method is useful.

CombinedParsers.strip_either1 - Function
strip_either1(x::CombinedParser)

Replace every Either parser that has only one option with that option.

Used in 2-stage substitute (stage 1: collect for recursion, stage 2: simplify).
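
The simplification itself can be sketched on a toy type. ToyEither and toy_strip_either1 are hypothetical names; the sketch only shows the idea of collapsing single-option alternatives and is not how the package implements it.

```julia
# Hypothetical stand-in for an alternation of options.
struct ToyEither
    options::Vector{Any}
end

# Anything that is not a ToyEither is left untouched.
toy_strip_either1(x) = x

# A ToyEither with a single option collapses to that (simplified) option;
# otherwise each option is simplified recursively.
function toy_strip_either1(x::ToyEither)
    length(x.options) == 1 && return toy_strip_either1(x.options[1])
    ToyEither(map(toy_strip_either1, x.options))
end

toy_strip_either1(ToyEither(Any["x"]))                             # "x"
toy_strip_either1(ToyEither(Any[ToyEither(Any["a"]), "b"])).options  # ["a", "b"]
```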