Parser Templates

Composing with TextParse

Parsing numbers or dates is most efficiently done with TextParse.

Dates.tryparsenext — Function
TextParse.tryparsenext(x::CombinedParser,str,i,till,opts=TextParse.default_opts)

TextParse.jl integrates with CombinedParsers.jl both ways.

tryparsenext returns a tuple (result, nextpos), where result is of type Nullable{T}: Nullable{T}() if parsing failed, or a non-null value containing the parsed result if it succeeded. If parsing succeeded, nextpos is the position at which the next token, if any, starts. If parsing failed, nextpos is the position at which the parsing failed.

julia> using TextParse

julia> p = ("Number:" * Repeat(' ') * TextParse.Numeric(Int))[3]
🗄 Sequence[3]
├─ Number\:
├─ \ *  |> Repeat
└─ <Int64>
::Int64

julia> parse(p, "Number:    42")
42

julia> TextParse.tryparsenext(p, "Number:    42")
(Nullable{Int64}(42), 14)
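
The (result, nextpos) contract described above can be sketched without TextParse. The following is a minimal illustration in plain Julia, assuming a hypothetical helper try_parse_int and using nothing in place of an empty Nullable; it is not the TextParse implementation.

```julia
# Hypothetical helper (not part of TextParse or CombinedParsers):
# scan decimal digits starting at index i and return (result, nextpos).
function try_parse_int(str::AbstractString, i::Int = firstindex(str))
    j = i
    # Advance j over consecutive digit characters.
    while j <= lastindex(str) && isdigit(str[j])
        j = nextind(str, j)
    end
    # No digit consumed: report failure at the position where it occurred.
    j == i && return (nothing, i)
    # Success: the parsed value and the position where the next token starts.
    return (parse(Int, SubString(str, i, prevind(str, j))), j)
end

ok = try_parse_int("Number:    42", 12)    # digits start at index 12
failed = try_parse_int("Number:    42", 1) # 'N' is not a digit
```

Returning the failure position rather than throwing lets a caller try an alternative parser from the same index.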
CombinedParsers.DateParser — Function
DateParser(format::DateFormat...)
DateTimeParser(format::DateFormat...)

Create a parser that matches any one of the given formats, using TextParse.DateTimeToken, for Dates.Date and Dates.DateTime respectively.

DateParser(format::AbstractString...; locale="english")
DateTimeParser(format::AbstractString...; locale="english")

Convenience functions for above using Dates.DateFormat.(format, locale).
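
What the string-based convenience form expands to can be tried with the Dates standard library alone. The sketch below builds the DateFormat objects by broadcasting, as in the docstring, and then tries them in order with a hypothetical helper first_matching_date. How DateParser itself selects among formats is not shown here; this only illustrates the idea of accepting any one of several formats.

```julia
using Dates

# Format strings are broadcast against a single locale, as in
# Dates.DateFormat.(format, locale); strings act as scalars in broadcasting.
formats = ("yyyy-mm-dd", "d U yyyy")
dfs = Dates.DateFormat.(formats, "english")

# Hypothetical helper: return the first Date any format yields, else nothing.
function first_matching_date(s::AbstractString, dfs)
    for df in dfs
        d = tryparse(Date, s, df)
        d === nothing || return d
    end
    return nothing
end

iso = first_matching_date("2020-01-31", dfs)
long = first_matching_date("31 January 2020", dfs)
```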

For numbers in a base other than 10, use integer_base.

CombinedParsers.integer_base — Function
integer_base(base,mind=0,maxd=Repeat_max)

Parser matching an integer format in base base.

Note

Uses a second Base.parse call on match.

A custom parser could instead aggregate the result incrementally while matching.
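
The difference between the two strategies in the note can be illustrated with Base functions only. This is a sketch, assuming a hypothetical helper accumulate_digits; it does not reproduce how integer_base is implemented.

```julia
# Two-step approach: first delimit the digits, then convert with Base.parse.
hex_digits = "ff"
two_step = parse(Int, hex_digits; base = 16)

# Incremental approach (hypothetical helper): fold each digit into an
# accumulator while scanning, so no second pass over the match is needed.
function accumulate_digits(s::AbstractString, base::Integer)
    acc = 0
    for c in s
        acc = acc * base + parse(Int, c; base = base)
    end
    return acc
end

incremental = accumulate_digits(hex_digits, 16)
```

Both values agree; the incremental variant reads each character once, at the cost of custom code per number format.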

Constants and Conversion

Base.convert — Function
Base.convert(::Type{CombinedParser},x)

Equivalent to parser(x).

Base.convert(::Type{Char},y::CharWithOptions)

Strips options.

CombinedParsers.wrap — Function
wrap(x::CombinedParser; log = nothing, trace = false)

Transform a parser by wrapping sub-parsers in logging and tracing parser types.

Parser Building Blocks

PCRE regular expressions provide established building blocks as escape sequences. Equivalent CombinedParsers are provided by name.

Note

You can also use PCRE regex syntax with the @re_str macro to build identical CombinedParsers!
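
For orientation, the PCRE escape sequences that the named parsers mirror can be tried with Julia's built-in Regex, which is also PCRE-based. The sketch below uses only Base (r"..." literals, not re"..."), so it shows the character classes themselves rather than the CombinedParsers API.

```julia
# \h matches horizontal space, including tab and non-break space.
horizontal_ok = occursin(r"^\h+\z", " \t\u00a0")

# A linefeed is vertical space, not horizontal space.
newline_is_horizontal = occursin(r"\h", "\n")

# \v matches vertical space such as LF, CR and the line separator U+2028.
vertical_ok = occursin(r"^\v+\z", "\n\r\u2028")

# \R (backslash R) matches a CRLF pair as a single newline.
crlf = match(r"\R", "a\r\nb").match
```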

Predefined Parsers

Horizontal and Vertical Space

Trimming space

CombinedParsers.trim — Function
trim(p...; whitespace=horizontal_space_maybe, 
           left=whitespace, right=whitespace)

Ignore whitespace left and right of sSequence(p...).

CombinedParsers.@trimmed — Macro
@trimmed

Create parsers wrapped in whitespace_maybe, each named after the variable it is assigned to.

See also trim.

For example:

julia> @trimmed foo = AnyChar()
🗄 Sequence[2]
├─ (?>[\h]*) ValueIn |> Repeat |> ! |> Atomic
├─ . AnyValue |> with_name(:foo)
└─ (?>[\h]*) ValueIn |> Repeat |> ! |> Atomic
::Char

julia> parse(log_names(foo),"  ab  ")
   match foo@3-4:   ab
                    ^
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

Matching Space

CombinedParsers.whitespace_char — Constant
whitespace_char  = re"[[:space:]]"
whitespace_maybe = re"(?>[[:space:]]*)"
whitespace       = re"(?>[[:space:]]+)"
julia> CombinedParsers.char_label_table(whitespace_char)
|      Char |                     |
|-----------|---------------------|
|     '\t' | Horizontal tab (HT) |
|     '\v' |   Vertical tab (VT) |
|     '\f' |      Form feed (FF) |
|       ' ' |               Space |
|   '\u85' |     Next line (NEL) |
| '\u200e' |  Left-to-right mark |
| '\u200f' |  Right-to-left mark |
| '\u2028' |      Line separator |
| '\u2029' | Paragraph separator |
CombinedParsers.horizontal_space_char — Constant
horizontal_space_char  = re"[\h]"
horizontal_space_maybe = re"(?>[\h]*)"
horizontal_space       = re"(?>[\h]+)"
julia> CombinedParsers.char_label_table(horizontal_space_char)
|      Char |                           |
|-----------|---------------------------|
|     '\t' |       Horizontal tab (HT) |
|       ' ' |                     Space |
|   '\ua0' |           Non-break space |
| '\u1680' |          Ogham space mark |
| '\u180e' | Mongolian vowel separator |
| '\u2000' |                   En quad |
| '\u2001' |                   Em quad |
| '\u2002' |                  En space |
| '\u2003' |                  Em space |
| '\u2004' |        Three-per-em space |
| '\u2005' |         Four-per-em space |
| '\u2006' |          Six-per-em space |
| '\u2007' |              Figure space |
| '\u2008' |         Punctuation space |
| '\u2009' |                Thin space |
| '\u200a' |                Hair space |
| '\u202f' |     Narrow no-break space |
| '\u205f' | Medium mathematical space |
| '\u3000' |         Ideographic space |
CombinedParsers.vertical_space_char — Constant
vertical_space_char  = re"[\v]"
vertical_space_maybe = re"(?>[\v]*)"
vertical_space       = re"(?>[\v]+)"
julia> CombinedParsers.char_label_table(vertical_space_char)
|      Char |                      |
|-----------|----------------------|
|     '\n' |        Linefeed (LF) |
|     '\v' |    Vertical tab (VT) |
|     '\f' |       Form feed (FF) |
|     '\r' | Carriage return (CR) |
|   '\u85' |      Next line (NEL) |
| '\u2028' |       Line separator |
| '\u2029' |  Paragraph separator |
CombinedParsers.bsr — Constant
CombinedParsers.newline
CombinedParsers.Regexp.bsr

Matches newlines, equivalent to PCRE \R (backslash R, BSR).

julia> CombinedParsers.Regexp.bsr
(?>|🗄) Either |> Atomic |> with_name(:bsr)
├─ \r\n 
└─ [\n\x0b\f\r\x85] ValueIn |> !
::SubString{String}

Words

CombinedParsers.caseless — Function
caseless(x)

Case-insensitive matching of x, implemented as MappedSequenceParser(lowercase, deepmap_parser(lowercase,parser(x))).

julia> p = caseless("AlsO")
🗄  |> MappedSequenceParser
├─ also
└─ lowercase
::SubString{String}

julia> p("also")
"also"

julia> using BenchmarkTools;

julia> @btime match(p,"also");
  51.983 ns (2 allocations: 176 bytes)

julia> p = parser("also")
re"also"

julia> @btime match(p,"also");
  44.759 ns (2 allocations: 176 bytes)

Predefined Assertions

CombinedParsers.at_linestart — Constant
at_linestart
julia> CombinedParsers.Regexp.at_linestart
|🗄 Either |> with_name(:at_linestart)
├─ ^ AtStart
└─ (?<=🗄)) Either |> Atomic |> with_name(:bsr) |> PositiveLookbehind
   ├─ \r\n
   └─ [\n\x0b\f\r\x85] ValueIn |> !
::Union{AtStart, SubString}
Note

Used in re"^" if Base.PCRE.MULTILINE is set.

CombinedParsers.at_lineend — Constant
at_lineend
julia> CombinedParsers.Regexp.at_lineend
|🗄 Either |> with_name(:at_lineend)
├─ $ AtEnd
└─ (?=(?>|🗄)) Either |> Atomic |> with_name(:bsr) |> PositiveLookahead
   ├─ \r\n 
   └─ [\n\x0b\f\r\x85] ValueIn |> !
::Union{AtEnd, SubString{String}}
Note

Used in re"$" if Base.PCRE.MULTILINE is set.
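
The effect of the MULTILINE flag on ^ and $ can be observed with Julia's built-in Regex, where the m suffix sets the same PCRE option. This sketch uses only Base and is meant to show the line-anchor semantics that at_linestart and at_lineend model, not the CombinedParsers internals.

```julia
text = "a\nb"

# Without MULTILINE, ^ only matches at the start of the whole string.
start_plain = match(r"^b", text)

# With MULTILINE, ^ also matches right after a newline.
start_multi = match(r"^b"m, text)

# Without MULTILINE, $ does not match before an inner newline.
end_plain = match(r"a$", text)

# With MULTILINE, $ also matches right before a newline.
end_multi = match(r"a$"m, text)
```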