Constructing Parsers
Character Matchers
CombinedParsers.AnyChar
— Function
AnyChar() = AnyValue(Char)
CombinedParsers.AnyValue
— Type
AnyValue(T=Char)
Parser matching exactly one x::T, returning the value.
julia> AnyChar()
. AnyValue
::Char
CombinedParsers.Bytes
— Type
Bytes{N,T} <: NIndexParser{N,T}
Fast parsing of a fixed number N of indices. The parsed vector is converted to T with reinterpret(T,match)[1] if isbitstype, or with the T(match) constructor otherwise.
Provide Base.get(parser::Bytes{N,T}, sequence, till, after, i, state) where {N,T} for custom conversion.
Endianness can be handled by mapping bswap over the parser:
julia> map(bswap, Bytes(2,UInt16))([0x16,0x11])
0x1611
julia> Bytes(2,UInt16)([0x16,0x11])
0x1116
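The byte order behind the two results above can be reproduced with Base functions alone. The following is a minimal sketch that makes no CombinedParsers calls; the variable names are illustrative, and it only assumes that the unmapped parser reads the bytes in native order, as reinterpret(T,match)[1] does.

```julia
# Base-only sketch; no CombinedParsers calls are made here.
bytes = UInt8[0x16, 0x11]

# Native-order read, as reinterpret(T, match)[1] does for an isbits T.
native = reinterpret(UInt16, bytes)[1]

# The same two bytes read in the opposite byte order.
swapped = bswap(native)

# Host-independent alternatives: ltoh/ntoh convert from a known wire order
# to the host order, so they give the same value on any machine.
from_le = ltoh(native)   # treat the bytes as little-endian
from_be = ntoh(native)   # treat the bytes as big-endian (network order)
```

On a little-endian host, native is the same value as from_le, which is why the unmapped parser above returns 0x1116. If the wire format has a fixed byte order, mapping ltoh or ntoh instead of bswap may be the more portable choice.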
CombinedParsers.ValueMatcher
— Type
ValueMatcher
Matches the value c at the current position if and only if ismatch(c, parser). ValueMatcher{T} is an alias for NIndexParser{1,T} and has state_type MatchState.
See AnyValue, ValueIn, and ValueNotIn.
CombinedParsers.CharIn
— Function
CharIn(a...; kw...) = ValueIn{Char}(a...; kw...)
CombinedParsers.UnicodeClass
— Type
UnicodeClass(unicode_category::Symbol...)
Used in ValueIn and ValueNotIn; succeeds if the character at the cursor is in one of the given Unicode classes.
julia> match(ValueIn(:L), "aB")
ParseMatch("a")
julia> match(ValueIn(:Lu), "aB")
ParseMatch("B")
julia> match(ValueIn(:N), "aA1")
ParseMatch("1")
Supported Unicode classes
julia> for (k,v) in CombinedParsers.unicode_class
           println(":",k, " is a ",v[1],", ", v[2],".")
       end
:L is a Letter, any kind of letter from any language.
:Ll is a Lowercase Letter, a lowercase letter that has an uppercase variant.
:Lu is a Uppercase Letter, an uppercase letter that has a lowercase variant.
:Lt is a Titlecase Letter, a letter that appears at the start of a word when only the first letter of the word is capitalized.
:L& is a Cased Letter, a letter that exists in lowercase and uppercase variants (combination of Ll, Lu and Lt).
:Lm is a Modifier Letter, a special character that is used like a letter.
:Lo is a Other Letter, a letter or ideograph that does not have lowercase and uppercase variants.
:M is a Mark, a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes, etc.).
:Mn is a Non Spacing Mark, a character intended to be combined with another character without taking up extra space (e.g. accents, umlauts, etc.).
:Mc is a Spacing Combining Mark, a character intended to be combined with another character that takes up extra space (vowel signs in many Eastern languages).
:Me is a Enclosing Mark, a character that encloses the character it is combined with (circle, square, keycap, etc.).
:Z is a Separator, any kind of whitespace or invisible separator.
:Zs is a Space Separator, a whitespace character that is invisible, but does take up space.
:Zl is a Line Separator, line separator character U+2028.
:Zp is a Paragraph Separator, paragraph separator character U+2029.
:S is a Symbol, math symbols, currency signs, dingbats, box-drawing characters, etc..
:Sm is a Math Symbol, any mathematical symbol.
:Sc is a Currency Symbol, any currency sign.
:Sk is a Modifier Symbol, a combining character (mark) as a full character on its own.
:So is a Other Symbol, various symbols that are not math symbols, currency signs, or combining characters.
:N is a Number, any kind of numeric character in any script.
:Nd is a Decimal Digit Number, a digit zero through nine in any script except ideographic scripts.
:Nl is a Letter Number, a number that looks like a letter, such as a Roman numeral.
:No is a Other Number, a superscript or subscript digit, or a number that is not a digit 0–9 (excluding numbers from ideographic scripts).
:P is a Punctuation, any kind of punctuation character.
:Pc is a Connector Punctuation, a punctuation character such as an underscore that connects words.
:Pd is a Dash Punctuation, any kind of hyphen or dash.
:Ps is a Open Punctuation, any kind of opening bracket.
:Pe is a Close Punctuation, any kind of closing bracket.
:Pi is a Initial Punctuation, any kind of opening quote.
:Pf is a Final Punctuation, any kind of closing quote.
:Po is a Other Punctuation, any kind of punctuation character that is not a dash, bracket, quote or connector.
:C is a Other, invisible control characters and unused code points.
:Cc is a Control, an ASCII or Latin-1 control character: 0x00–0x1F and 0x7F–0x9F.
:Cf is a Format, invisible formatting indicator.
:Cs is a Surrogate, one half of a surrogate pair in UTF-16 encoding.
:Co is a Private Use, any code point reserved for private use.
:Cn is a Unassigned, any code point to which no character has been assigned.
CombinedParsers.ValueIn
— Type
ValueIn(x)
Parser matching exactly one element c (character) in a sequence, if and only if _ismatch(c,x).
julia> a_z = ValueIn('a':'z')
[a-z] ValueIn
::Char
julia> parse(a_z, "a")
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
julia> ac = CharIn("ac")
[ac] ValueIn
::Char
julia> parse(ac, "c")
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
julia> l = CharIn(islowercase)
[islowercase(...)] ValueIn
::Char
julia> parse(l, "c")
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
CombinedParsers.CharNotIn
— Function
CharNotIn(a...; kw...) = ValueNotIn{Char}(a...; kw...)
CombinedParsers.ValueNotIn
— Type
ValueNotIn{T}(label::AbstractString, x)
Parser matching exactly one element (character) in a sequence, if and only if it is not in x.
ValueNotIn([label::AbstractString="", ]x...)
ValueNotIn{T}([label::AbstractString="", ]x...)
Flattens x with CombinedParsers.flatten_valuepatterns, and tries to infer T if not provided.
julia> a_z = CharNotIn('a':'z')
[^a-z] ValueNotIn
::Char
julia> ac = CharNotIn("ca")
[^ca] ValueNotIn
::Char
Respects boolean logic:
julia> CharNotIn(CharNotIn("ab"))("a")
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
julia> CharIn(CharIn("ab"))("a")
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
julia> CharIn(CharNotIn("bc"))("a")
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
julia> parse(CharNotIn(CharIn("bc")), "a")
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
CombinedParsers.ismatch
— Function
ismatch(c,p)
Returns _ismatch(c, p).
ismatch(c::MatchingNever,p)
Returns false.
CombinedParsers._ismatch
— Function
_ismatch(x::Char, set::Union{Tuple,Vector})::Bool
Returns _ismatch(x,set...).
_ismatch(x, f, r1, r...)
Checks if x matches any of the options f, r1, r...: if ismatch(x,f), returns true; otherwise returns _ismatch(x, r1, r...).
_ismatch(x)
Returns false (out of options).
_ismatch(x, p)
Returns x==p.
_ismatch(c,p::Function)
Returns p(c).
_ismatch(c,p::AnyValue)
Returns true.
_ismatch(c,p::Union{StepRange,Set})
Returns c in p.
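The dispatch rules listed for _ismatch amount to a recursion over the list of options. A minimal sketch in plain Julia follows; matches_one and matches_any are hypothetical names standing in for the package internals, which may differ in detail.

```julia
# Hypothetical helpers that mirror the dispatch rules listed above;
# they are not part of CombinedParsers.
matches_one(x, p) = x == p                                     # plain value: equality
matches_one(x, p::Function) = p(x)                             # predicate: call it
matches_one(x, p::Union{AbstractRange,AbstractSet}) = x in p   # collection: membership

matches_any(x) = false                                         # out of options
matches_any(x, f, rest...) = matches_one(x, f) || matches_any(x, rest...)
matches_any(x, set::Union{Tuple,Vector}) = matches_any(x, set...)

lower_or_upper = matches_any('X', isuppercase, 'a':'z')   # predicate matches first
digit_rejected = matches_any('1', isuppercase, 'a':'z')   # runs out of options
in_tuple       = matches_any('b', ('a', 'b'))             # tuple is splatted
```

Because each option is handled by its own method, the recursion ends as soon as one option matches, and the zero-option method supplies the final false.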
Base.Broadcast.broadcasted
— Function
Base.broadcasted(::typeof((&)), x::ValueNotIn, y::ValueNotIn)
Character matchers m like Union{ValueIn,ValueNotIn,T}, or any type T providing an ismatch(m::T,c::Char)::Bool method, represent a "sparse" bitarray over all characters.
Please consider the broadcast API a draft that you are invited to comment on.
julia> CharNotIn("abc") .& CharNotIn("z")
[^abcz] ValueNotIn
::Char
julia> CharIn("abc") .& CharNotIn("c")
[ab] ValueIn
::Char
CombinedParsers.flatten_valuepatterns
— Function
flatten_valuepatterns(x...)
Used in ValueMatcher constructors.
Heuristic is roughly:
- collect ElementIterators in a Set
- collect everything else in a Tuple (Functions etc.)
- in the process the label is concatenated
- return all that was collected as Tuple{String, <:Set, <:Tuple} or Tuple{String, <:Set} or Tuple{String, <:Tuple}.
Repeating
CombinedParsers.Repeat
— Type
Repeat(minmax::UnitRange, x...)
Repeat(x...; min=0,max=Repeat_max)
Repeat(min::Integer, x...)
Repeat(min::Integer,max::Integer, x...)
Parser repeating pattern x min:max times.
julia> Repeat(2,2,'a')
a{2} |> Repeat
::Vector{Char}
julia> Repeat(3,'a')
a{3,} |> Repeat
::Vector{Char}
Base.:|
— Method
(|)(x::AbstractToken{T}, default::Union{T,Missing})
Operator syntax for Optional(x, default=default).
julia> parser("abc") | "nothing"
|🗄 Either
├─ abc
└─ nothing
::SubString{String}
julia> parser("abc") | missing
abc? |missing
::Union{Missing, SubString{String}}
CombinedParsers.Repeat1
— Function
Repeat1(x)
Parser repeating pattern x one time or more.
Repeat1(f::Function,a...)
Abbreviation for Base.map(f,Repeat1(a...)).
CombinedParsers.Optional
— Type
Optional(parser;default=defaultvalue(result_type(parser)))
Parser that always succeeds. If parser succeeds, return the result of parser with the cursor behind the match. If parser does not succeed, return default with the cursor unchanged.
julia> match(r"a?","b")
RegexMatch("")
julia> parse(Optional("a", default=42),"b")
42
CombinedParsers.defaultvalue
— Function
defaultvalue(T::Type)
Default value if Optional<:CombinedParser is skipped.
- T<:AbstractString: ""
- T<:Vector{E}: E[]
- T<:CombinedParser: Always()
- otherwise missing
get will return a CombinedParsers._copy of defaultvalue.
CombinedParsers._copy
— Function
_copy(x)
Returns copy(x) if and only if ismutable(x); used when the defaultvalue of Optional is returned by get.
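One plausible reason for copying a mutable default is that a caller who mutates a parse result would otherwise change the shared default for every later parse. A minimal sketch of that rule in plain Julia, where copy_if_mutable is a hypothetical stand-in and not the package's code:

```julia
# Hypothetical stand-in for the rule described above; not the package's code.
copy_if_mutable(x) = ismutable(x) ? copy(x) : x

default = Char[]                  # a mutable default, e.g. for a Vector result
result = copy_if_mutable(default)
push!(result, 'x')                # mutating one parse result ...

# ... leaves the shared default untouched.
untouched = isempty(default)

# Immutable defaults are returned as they are.
same = copy_if_mutable(42) === 42
```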
CombinedParsers.Lazy
— Type
Lazy(x::Repeat)
Lazy(x::Optional)
Lazy x repetition matching (instead of default greedy).
julia> german_street_address = !Lazy(Repeat(AnyChar())) * Repeat1(' ') * TextParse.Numeric(Int)
🗄 Sequence
├─ .*? AnyValue |> Repeat |> Lazy |> !
├─ \ + |> Repeat
└─ <Int64>
::Tuple{SubString{String}, Vector{Char}, Int64}
julia> german_street_address("Konrad Adenauer Allee    42")
("Konrad Adenauer Allee", [' ', ' ', ' ', ' '], 42)
PCRE @re_str
julia> re"a+?"
a+? |> Repeat |> Lazy
::Vector{Char}
julia> re"a??"
a?? |missing |> Lazy
::Union{Missing, Char}
CombinedParsers.Repeat_stop
— Function
Repeat_stop(p,stop)
Repeat_stop(p,stop; min=0, max=Repeat_max)
Repeat p until stop (NegativeLookahead), not matching stop. Sets cursor before stop. Tries min:max times. Returns results of p.
julia> p = Repeat_stop(AnyChar(),'b') * AnyChar()
🗄 Sequence
├─ 🗄* Sequence[2] |> Repeat
│ ├─ (?!b) NegativeLookahead
│ └─ . AnyValue
└─ . AnyValue
::Tuple{Vector{Char}, Char}
julia> parse(p,"acbX")
(['a', 'c'], 'b')
See also NegativeLookahead
CombinedParsers.Repeat_until
— Function
Repeat_until(p,until, with_until=false; wrap=identity, min=0, max=Repeat_max)
Repeat p until until matches (with Repeat_stop), and set the cursor after until.
Return a Vector{result_type(p)} if with_until==false, otherwise a Tuple{Vector{result_type(p)},result_type(until)}.
To transform the Repeat_stop(p) parser head, provide a function(::Vector{result_type(p)}) in the wrap keyword argument, e.g.
julia> p = Repeat_until(AnyChar(),'b') * AnyChar()
🗄 Sequence
├─ 🗄 Sequence[1]
│ ├─ (?>🗄*) Sequence[2] |> Repeat |> Atomic
│ │ ├─ (?!b) NegativeLookahead
│ │ └─ . AnyValue
│ └─ b
└─ . AnyValue
::Tuple{Vector{Char}, Char}
julia> parse(p,"acbX")
(['a', 'c'], 'X')
julia> parse(Repeat_until(AnyChar(),'b';wrap=MatchedSubSequence),"acbX")
"ac"
See also NegativeLookahead
Base.join
— Function
Base.join(x::Repeat,delim, infix=:skip)
Parser matching repeated x.parser separated by delim.
julia> parse(join(Repeat(AnyChar()),','),"a,b,c")
3-element Vector{Char}:
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
julia> parse(join(Repeat(AnyChar()),',';infix=:prefix),"a,b,c")
('a', [(',', 'b'), (',', 'c')])
julia> parse(join(Repeat(AnyChar()),',';infix=:suffix),"a,b,c")
([('a', ','), ('b', ',')], 'c')
Base.join(x::CombinedParser,delim; kw...)
Shorthand for join(Repeat(x),delim; kw...).
Base.join(f::Function, x::CombinedParser, delim; kw...)
Shorthand for Base.map(f,join(x,delim; kw...)).
Atomic
CombinedParsers.Atomic
— Type
Atomic(x)
A parser matching x, and failing when required to backtrack (behaving like an atomic group in regular expressions).
Sequences
CombinedParsers.Sequence
— Type
Sequence{P,S,T}
of parts::P, sequence_state_type==S with sequence_result_type==T.
Sequence(parts::CombinedParser...; tuplestate=true)
of parts, sequence_state_type(p; tuplestate=tuplestate) with sequence_result_type.
Sequences can alternatively be created with *
julia> german_street_address = !Repeat(AnyChar()) * ' ' * TextParse.Numeric(Int)
🗄 Sequence
├─ .* AnyValue |> Repeat |> !
├─ \
└─ <Int64>
::Tuple{SubString{String}, Char, Int64}
julia> german_street_address("Some Avenue 42")
("Some Avenue", ' ', 42)
Indexing (transformation) can be defined with
julia> e1 = Sequence(!Repeat(AnyChar()), ' ',TextParse.Numeric(Int))[1]
🗄 Sequence[1]
├─ .* AnyValue |> Repeat |> !
├─ \
└─ <Int64>
::SubString{String}
julia> e1("Some Avenue 42")
"Some Avenue"
State is managed as sequence_state_type(parts; tuplestate). Overwrite it to optimize state types in special cases.
Base.:*
— Method
CombinedParsers.sSequence
— Function
sSequence(x...)
Simplifying Sequence: flattens nested Sequences and removes Always assertions.
julia> Sequence('a',CharIn("AB")*'b')
🗄 Sequence
├─ a
└─ 🗄 Sequence
   ├─ [AB] ValueIn
   └─ b
::Tuple{Char, Tuple{Char, Char}}
julia> sSequence('a',CharIn("AB")*'b')
🗄 Sequence
├─ a
├─ [AB] ValueIn
└─ b
::Tuple{Char, Char, Char}
See also Sequence
This function will be removed and replaced with a keyword argument
CombinedParsers.@seq
— Macro
@seq(x...)
Create a sequence interleaved with whitespace (horizontal or vertical). The result_type omits whitespace.
CombinedParsers.sequence_result_type
— Function
sequence_result_type(::Type{T}) where {T<:Tuple}
Tuple type, internally used for Sequence result_type.
CombinedParsers.sequence_state_type
— Function
sequence_state_type(pts::Type; tuplestate=true)
- MatchState if all fieldtypes are MatchState,
- otherwise if tuplestate, a tuple type with the state_type of parts,
- or Vector{Any} if !tuplestate.
Todo: NCodeunitsState instead of MatchState might increase performance.
Recursive Parsers with Either
CombinedParsers.Delayed
— Function
Delayed(T::Type) = Either{T}().
CombinedParsers.Either
— Type
Either{S,T}(p) where {S,T} = new{typeof(p),S,T}(p)
Parser that tries matching the provided parsers in order, accepting the first match, and fails if all parsers fail.
This parser has no == and hash methods because it can recurse.
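The "in order, first match wins" rule can be illustrated without the package. The following toy sketch in plain Julia uses hypothetical helpers literal and either; it models a parser as a function from a string and an index to a (value, next_index) pair, and it deliberately ignores backtracking into later options, which a real combinator library may support.

```julia
# Hypothetical toy parsers: a parser is a function (s, i) that returns
# (value, next_index) on success and nothing on failure.
literal(str) = (s, i) -> startswith(SubString(s, i), str) ? (str, i + ncodeunits(str)) : nothing

# Ordered choice: try each option in turn and accept the first that matches.
function either(options...)
    function (s, i)
        for p in options
            r = p(s, i)
            r === nothing || return r
        end
        return nothing        # all options failed
    end
end

a_or_bc = either(literal("a"), literal("bc"))
hit  = a_or_bc("bc", 1)       # the second option matches
miss = a_or_bc("x", 1)        # no option matches

# Order matters: "a" is accepted before "ab" is ever tried.
first_wins = either(literal("a"), literal("ab"))("ab", 1)
```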
julia> match(r"a|bc","bc")
RegexMatch("bc")
julia> parse(Either("a","bc"),"bc")
"bc"
julia> parse("a" | "bc","bc")
"bc"
Base.:|
— Method
(|)(x::AbstractToken, y)
(|)(x, y::AbstractToken)
(|)(x::AbstractToken, y::AbstractToken)
Operator syntax for Either(x, y; simplify=true).
julia> 'a' | CharIn("AB") | "bc"
|🗄 Either
├─ a
├─ [AB] ValueIn
└─ bc
::Union{Char, SubString{String}}
CombinedParsers.@syntax
— Macro
@syntax name = expr
Convenience macro defining a CombinedParser name=expr and custom parsing macro @name_str.
julia> @syntax a = AnyChar();
julia> a"char"
'c': ASCII/Unicode U+0063 (category Ll: Letter, lowercase)
@syntax for name in either; expr; end
Parser expr is added to either with pushfirst!. If either is undefined, it will be created. If either == :text || either == Symbol(:) the parser will be added to the CombinedParser_globals variable in your module.
julia> @syntax street_address = Either(Any[]);
julia> @syntax for german_street_address in street_address
           Sequence(!!Repeat(AnyChar()),
                    " ",
                    TextParse.Numeric(Int)) do v
               (street = v[1], no=v[3])
           end
       end
🗄 Sequence |> map(#50) |> with_name(:german_street_address)
├─ .* AnyValue |> Repeat |> ! |> map(intern)
├─ \
└─ <Int64>
::NamedTuple{(:street, :no), Tuple{String, Int64}}
julia> german_street_address"Some Avenue 42"
(street = "Some Avenue", no = 42)
julia> @syntax for us_street_address in street_address
           Sequence(TextParse.Numeric(Int),
                    " ",
                    !!Repeat(AnyChar())) do v
               (street = v[3], no=v[1])
           end
       end
🗄 Sequence |> map(#52) |> with_name(:us_street_address)
├─ <Int64>
├─ \
└─ .* AnyValue |> Repeat |> ! |> map(intern)
::NamedTuple{(:street, :no), Tuple{String, Int64}}
julia> street_address"50 Oakland Ave"
(street = "Oakland Ave", no = 50)
julia> street_address"Oakland Ave 50"
(street = "Oakland Ave", no = 50)
CombinedParsers.substitute
— Function
substitute(name::Symbol)
Define a parser substitution.
substitute(parser::CombinedParser)
Apply parser substitution, respecting scope in the defined tree:
- Parser variables are defined within scope of Eithers, for all its NamedParser options.
- Substitution parsers are replaced with parser variables.
- strip_either1 is used to simplify in a second phase.
Substitution implementation is experimental pending feedback.
todo: scope NamedParser objects in WrappedParser, Sequence, etc.?
julia> Either(:a => !Either(
:b => "X",
:d => substitute(:b),
substitute(:c)),
:b => "b",
:c => substitute(:b)
) |> substitute
|🗄 Either
├─ |🗄 Either |> ! |> with_name(:a)
│ ├─ X |> with_name(:b)
│ ├─ X |> with_name(:b) |> with_name(:d)
│ └─ b |> with_name(:b) |> with_name(:c)
├─ b |> with_name(:b)
└─ b |> with_name(:b) |> with_name(:c)
::SubString{String}
Example
With substitute you can write recursive parsers in a style inspired by (E)BNF. CombinedParsers.BNF.ebnf uses substitute.
julia> def = Either(:integer => !Either("0", Sequence(Optional("-"), substitute(:natural_number))),
:natural_number => !Sequence(substitute(:nonzero_digit), Repeat(substitute(:digit))),
:nonzero_digit => re"[1-9]",
:digit => Either("0", substitute(:nonzero_digit)))
|🗄 Either
├─ |🗄 Either |> ! |> with_name(:integer)
│ ├─ 0
│ └─ 🗄 Sequence
│ ├─ \-? |
│ └─ natural_number call substitute!
├─ 🗄 Sequence |> ! |> with_name(:natural_number)
│ ├─ nonzero_digit call substitute!
│ └─ * digit call substitute! |> Repeat
├─ [1-9] ValueIn |> with_name(:nonzero_digit)
└─ |🗄 Either |> with_name(:digit)
├─ 0
└─ nonzero_digit call substitute!
::Union{Nothing, Char, SubString{String}}
julia> substitute(def)
|🗄 Either
├─ |🗄 Either |> ! |> with_name(:integer)
│ ├─ 0
│ └─ 🗄 Sequence
│ ├─ \-? |
│ └─ 🗄 Sequence |> ! |> with_name(:natural_number) # branches hidden
├─ 🗄 Sequence |> ! |> with_name(:natural_number)
│ ├─ [1-9] ValueIn |> with_name(:nonzero_digit)
│ └─ |🗄* Either |> with_name(:digit) |> Repeat
│ ├─ 0
│ └─ [1-9] ValueIn |> with_name(:nonzero_digit)
├─ [1-9] ValueIn |> with_name(:nonzero_digit)
└─ |🗄 Either |> with_name(:digit)
├─ 0
└─ [1-9] ValueIn |> with_name(:nonzero_digit)
::Union{Char, SubString{String}}
Base.push!
— Function
Base.push!(x::Either, option)
Push option to x.options as the parser tried next if the existing options of x fail.
Recursive parsers can be built with push! to Either.
See also pushfirst! and @syntax.
Base.push!(x::WrappedParser{<:Either}, option)
Push option to x.options of repeated inner parser.
Base.pushfirst!
— Function
Base.pushfirst!(x::WrappedParser{<:Either}, option)
Push option as first x.options of repeated inner parser.
CombinedParsers.either_result_type
— Function
Return tuple(statetype,resulttype).
Parser generating parsers
CombinedParsers.FlatMap
— Type
FlatMap{P,S,Q<:Function,T} <: CombinedParser{S,T}
Like Scala's fastparse FlatMap. See after.
CombinedParsers.after
— Function
after(right::Function,left::AbstractToken)
after(right::Function,left::AbstractToken,T::Type)
Like Scala's fastparse FlatMap.
julia> saying(v) = v == "same" ? v : "different";
julia> p = after(saying, String, "same"|"but")
🗄 FlatMap
├─ |🗄 Either
│ ├─ same
│ └─ but
└─ saying
::String
julia> p("samesame")
"same"
julia> p("butdifferent")
"different"
Assertions
CombinedParsers.AtStart
— Type
AtStart()
Parser succeeding if and only if at index 1, with result_type AtStart.
julia> AtStart()
re"^"
CombinedParsers.AtEnd
— Type
AtEnd()
Parser succeeding if and only if at last index, with result_type AtEnd.
julia> AtEnd()
re"$"
CombinedParsers.Always
— Type
Always()
Assertion parser matching always and not consuming any input. Returns Always().
julia> Always()
re""
CombinedParsers.Never
— Type
Never()
Assertion parser matching never.
julia> Never()
re"(*FAIL)"
Look behind
CombinedParsers.Lookbehind
— Function
Lookbehind(does_match::Bool, p)
PositiveLookbehind if does_match==true, NegativeLookbehind otherwise.
CombinedParsers.PositiveLookbehind
— Type
PositiveLookbehind(parser)
Parser that succeeds if and only if parser succeeds before cursor. Consumes no input. The match is returned. Useful for checks like "must be preceded by parser, don't consume its match".
CombinedParsers.NegativeLookbehind
— Type
NegativeLookbehind(parser)
Parser that succeeds if and only if parser does not succeed before cursor. Consumes no input. nothing is returned as match. Useful for checks like "must not be preceded by parser, don't consume its match".
julia> la=NegativeLookbehind("keep")
re"(?<!keep)"
julia> parse("peek"*la,"peek")
("peek", re"(?<!keep)")
Look ahead
CombinedParsers.Lookahead
— Function
Lookahead(does_match::Bool, p)
PositiveLookahead if does_match==true, NegativeLookahead otherwise.
CombinedParsers.PositiveLookahead
— Type
PositiveLookahead(parser)
Parser that succeeds if and only if parser succeeds, but consumes no input. The match is returned. Useful for checks like "must be followed by parser, but don't consume its match".
julia> la=PositiveLookahead("peek")
re"(?=peek)"
julia> parse(la*AnyChar(),"peek")
("peek", 'p')
CombinedParsers.NegativeLookahead
— Type
NegativeLookahead(parser)
Parser that succeeds if and only if parser does not succeed, but consumes no input. parser is returned as match. Useful for checks like "must not be followed by parser, don't consume its match".
julia> la = NegativeLookahead("peek")
re"(?!peek)"
julia> parse(la*AnyChar(),"seek")
(re"(?!peek)", 's')
Logging and Side-Effects
CombinedParsers.NamedParser
— Type
NamedParser{P,S,T} <: WrappedParser{P,S,T}
Struct with
name::Symbol
parser::P
doc::String
CombinedParsers.with_name
— Function
with_name(name::Symbol,x; doc="")
A parser labelled with name. Labels are useful in printing and logging.
See also: @with_names, with_name, log_names
CombinedParsers.@with_names
— Macro
@with_names
Sets names of parsers within a begin/end block to match the variables they are assigned to.
For example:
julia> @with_names foo = AnyChar()
. AnyValue |> with_name(:foo)
::Char
julia> parse(log_names(foo),"ab")
match foo@1-2: ab
^
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)
CombinedParsers.log_names
— Function
log_names(x,names=true; exclude=nothing)
Rebuild parser replacing NamedParser instances with with_log parsers. Log all NamedParser instances if names==true or name in names and not name in exclude.
See also: with_log, log_parser, deepmap_parser
CombinedParsers.log_parser
— Function
log_parser(message::Type, x::CombinedParser, a...; kw...)
log_parser(message::Function, x::CombinedParser, a...; kw...)
Transform parser including logging statements for sub-parsers of type message or for which calling message does not return nothing.
CombinedParsers.with_log
— Function
with_log(s::AbstractString,p, delta=5;nomatch=false)
Log matching process of parser p, displaying delta characters left of and right of match.
If nomatch==true, also log when parser does not match.
See also: log_names, with_effect
CombinedParsers.with_effect
— Function
with_effect(f::Function,p,a...)
Call f(sequence,before_i,after_i,state,a...) if p matches, f(sequence,before_i,before_i,nothing,a...) otherwise.
Other
CombinedParsers.MappedSequenceParser
— Type
MappedSequenceParser(f::F,parser::P) where {F<:Function,P}
Match parser on CharMappedString(f,sequence), e.g. in a caseless parser.
CombinedParsers.MemoizingParser
— Type
MemoizingParser{P,S,T}
WrappedParser memoizing all match states. For slow parsers with a lot of backtracking this parser can help improve speed.
(Sharing a good example where memoization makes a difference is appreciated.)
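The general idea of memoizing match results can be sketched in plain Julia. This is not the package's implementation: slow_match and memoize are hypothetical, the cache is keyed by start index only, and it therefore assumes one cache per input string.

```julia
# Hypothetical matcher: returns the index after a match of "ab" at i, or nothing.
calls = Ref(0)   # counts how often the underlying matcher really runs
slow_match(s, i) = (calls[] += 1; startswith(SubString(s, i), "ab") ? i + 2 : nothing)

# Wrap a matcher so that each start index is computed at most once.
# Assumption: the wrapper is only ever used with a single input string.
function memoize(match)
    cache = Dict{Int,Union{Nothing,Int}}()
    (s, i) -> get!(() -> match(s, i), cache, i)
end

m = memoize(slow_match)
first_try  = m("abab", 1)   # computed by slow_match
second_try = m("abab", 1)   # served from the cache
failed     = m("abab", 2)   # failures are cached as well
```

The trade-off noted in this section shows up even here: every cached position costs a dictionary entry, so the wrapper only pays off when the same position is revisited often.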
CombinedParsers.WithMemory
— Type
WithMemory(x) <: AbstractString
String wrapper with memoization of next match states for parsers at indices. Memoization is sometimes recommended as a way of improving the performance of parser combinators (like state machine optimization and compilation for regular languages).
A snappy performance gain could not be demonstrated so far, probably because the costs of state memory allocation for caching are often greater than recomputing a match. If you have a case where your performance benefits with this, let me know!