Overview

ParseMatch

CombinedParsers.jl provides the @re_str macro as a drop-in replacement for Base Julia's @r_str macro. A Base Julia PCRE regular expression:

julia> pattern = r"(?<a>a|B)+c"
r"(?<a>a|B)+c"

julia> mr = match(pattern,"aBc")
RegexMatch("aBc", a="B")

CombinedParsers.Regexp regular expression:

julia> pattern = re"(?<a>a|B)+c"
🗄 Sequence |> regular expression combinator with 1 capturing groups
├─ (?<a>|🗄)+ Either |> Capture 1 |> with_name(:a) |> Repeat
│  ├─ a
│  └─ B
└─ c
::Tuple{Vector{Char}, Char}

julia> mre = match(pattern,"aBc")
ParseMatch("aBc", a="B")

The ParseMatch type has getproperty and getindex methods, so it can be used like a RegexMatch.

julia> mre.match
"aBc"

julia> mre.captures
1-element Vector{SubString{String}}:
 "B"

julia> mre[1]
"B"

julia> mre[:a]
"B"
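
Because both match types expose the same accessors, code written against RegexMatch can often be reused unchanged. The following is a minimal sketch, not part of the package: `summarize` is a hypothetical helper introduced here for illustration, and the two `using` lines are an assumption about how the `re"..."` macro is brought into scope.

```julia
using CombinedParsers
using CombinedParsers.Regexp   # assumption: this module provides the re"..." macro

# Hypothetical helper: relies only on `.match` and indexing by group name,
# which both RegexMatch and ParseMatch support according to the text above.
summarize(m) = (whole = m.match, a = m[:a])

mr  = match(r"(?<a>a|B)+c",  "aBc")   # Base.Regex    -> RegexMatch
mre = match(re"(?<a>a|B)+c", "aBc")   # CombinedParser -> ParseMatch

# Given the outputs shown above, both summaries should compare equal.
println(summarize(mr) == summarize(mre))
```

The helper never checks the concrete type of its argument, which is the point of the shared getproperty/getindex interface.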

Note

CombinedParsers.jl is tested and benchmarked against the PCRE C library test set; see the compliance report.

Parsing

match searches for the first match of the Regex in the String and returns a RegexMatch/ParseMatch object containing the match and its captures, or nothing if the match fails. If a capture group matches repeatedly, only the last repetition is captured.

julia> match(pattern,"aBBac")
ParseMatch("aBBac", a="a")
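
The two behaviours described above (the last repetition wins, and a failed match yields nothing) can be checked with Base Julia alone. This sketch uses only Base.Regex, so it needs no packages; the input string "xyz" is an arbitrary example chosen because it cannot match.

```julia
# Base Julia only: no packages required.
pattern = r"(?<a>a|B)+c"

# The group (?<a>...) matches four times here; only the last repetition is kept.
m = match(pattern, "aBBac")
println(m.match)   # the full matched substring
println(m[:a])     # the last repetition of the named group

# When there is no match, `match` returns `nothing`, so test before indexing.
miss = match(pattern, "xyz")
if miss === nothing
    println("no match")
end
```

Testing with `=== nothing` before indexing avoids a MethodError on the failed case.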

Base.parse methods parse a String into a Julia type. A CombinedParser p parses into an instance of result_type(p). For parsers defined with the @re_str macro, the result types are nested Tuples and Vectors of SubString, Char, and Missing.

julia> parse(pattern,"aBBac")
(['a', 'B', 'B', 'a'], 'c')

Iterating

If a parsing is not uniquely defined, the different parsings can be iterated lazily, conforming to Julia's iterate interface.

for p in parse_all(re"^(a|ab|b)+$","abab")
	println(p)
end
(re"^", Union{Char, Tuple{Char, Char}}['a', 'b', 'a', 'b'], re"$")
(re"^", Union{Char, Tuple{Char, Char}}['a', 'b', ('a', 'b')], re"$")
(re"^", Union{Char, Tuple{Char, Char}}[('a', 'b'), 'a', 'b'], re"$")
(re"^", Union{Char, Tuple{Char, Char}}[('a', 'b'), ('a', 'b')], re"$")

Performance

CombinedParsers is designed for speed, using parametric types and generated functions in the Julia compiler.

For comparison, a benchmark of Base.Regex (the PCRE C implementation):

using BenchmarkTools
pattern = r"[aB]+c";
@benchmark match(pattern,"aBaBc")
BenchmarkTools.Trial: 10000 samples with 690 evaluations.
 Range (min … max):  184.778 ns …  23.575 μs  ┊ GC (min … max):  0.00% … 98.58%
 Time  (median):     196.782 ns               ┊ GC (median):     0.00%
 Time  (mean ± σ):   245.407 ns ± 932.857 ns  ┊ GC (mean ± σ):  15.63% ±  4.07%

 [Histogram: frequency by time, 185 ns to 291 ns]

 Memory estimate: 224 bytes, allocs estimate: 3.

For the same pattern, CombinedParsers is slower than PCRE in the run shown here, with a median of about 429 ns against 197 ns:

pattern = re"[aB]+c";
@benchmark match(pattern,"aBaBc")
BenchmarkTools.Trial: 10000 samples with 200 evaluations.
 Range (min … max):  410.505 ns … 82.686 μs  ┊ GC (min … max):  0.00% … 99.19%
 Time  (median):     428.952 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):   556.538 ns ±  2.745 μs  ┊ GC (mean ± σ):  17.04% ±  3.43%

 [Histogram: log(frequency) by time, 411 ns to 692 ns]

 Memory estimate: 544 bytes, allocs estimate: 5.

Regex-style captures are supported for compatibility. With Base.Regex:

pattern = r"([aB])+c"
@benchmark match(pattern,"aBaBc")
BenchmarkTools.Trial: 10000 samples with 506 evaluations.
 Range (min … max):  222.158 ns … 38.584 μs  ┊ GC (min … max):  0.00% … 99.19%
 Time  (median):     232.386 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):   313.363 ns ±  1.480 μs  ┊ GC (mean ± σ):  18.85% ±  3.96%

 [Histogram: log(frequency) by time, 222 ns to 441 ns]

 Memory estimate: 288 bytes, allocs estimate: 4.

CombinedParsers.Regexp captures are slow compared with PCRE, about three times slower in this run (median about 731 ns against 232 ns):

pattern = re"([aB])+c";
@benchmark match(pattern,"aBaBc")
BenchmarkTools.Trial: 10000 samples with 148 evaluations.
 Range (min … max):  696.054 ns … 96.962 μs  ┊ GC (min … max):  0.00% … 98.97%
 Time  (median):     731.385 ns              ┊ GC (median):     0.00%
 Time  (mean ± σ):   970.875 ns ±  4.370 μs  ┊ GC (mean ± σ):  21.06% ±  4.63%

 [Histogram: log(frequency) by time, 696 ns to 1.15 μs]

 Memory estimate: 1.28 KiB, allocs estimate: 10.

With CombinedParsers, however, transformations give a more flexible way to extract results than captures do.

julia> pattern = re"[aB]+c";

julia> @btime (mre = match(pattern,"aBaBc"))
  415.387 ns (5 allocations: 544 bytes)
ParseMatch("aBaBc")

julia> @btime get(mre)
  66.217 ns (2 allocations: 128 bytes)
(['a', 'B'], 'c')

Transformations

Transform the result of a parsing with map. The result_type is inferred automatically using Julia's type inference.

julia> p = map(length,re"(ab)*")
(ab)* Sequence |> Capture 1 |> Repeat |> map(length) |> regular expression combinator with 1 capturing groups
::Int64

julia> parse(p,"abababab")
4

Conveniently, calling getindex(::CombinedParser,::Integer), or map with IndexAt as in map(IndexAt(2),p), creates a transforming parser that selects an element from the result of the parsing.

julia> parse(map(IndexAt(2),re"abc"),"abc")
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)

julia> parse(re"abc"[2],"abc")
'b': ASCII/Unicode U+0062 (category Ll: Letter, lowercase)

Next: The User guide provides a summary of CombinedParsers types.