CombinedParsers in pure Julia

CombinedParsers is a package for parsing into julia types. Parsimoneously compose parsers with the functional parser combinator paradigm, utilize Julia’s type inferrence for transformations, log conveniently for debugging, and let Julia compile your parser for good performance.

Features:

  • Clear syntax integrates grammar and transformations with Julia type inference.
  • Higher-order parsers depending on the parsing state allow for not context-free parsers.
  • All valid parsings can be iterated lazily.
  • Interoperable with TextParse.jl: existing TextParse.AbstractToken implementations can be used with CombinedParsers. CombinedParsers.AbstractParser provide TextParse.tryparsenext and can be used e.g. in CSV.jl.
  • Parametric parser and state types enable Julia compiler optimizations.
  • Compiled regular expression parsers in pure julia are provided with the re_str macro.
  • AbstractTrees.jl interface provides colored and clearly layed out printing in the REPL.
  • Convenient logging of the parsing process with NamedParsers and SideeffectParsers.
  • CombinedParsers generalize from strings to parsing any type supporting getindex, nextind, prevind methods.
"Julia eval for +-*/ operators."
function eval_ops((l,opr))
    for (op,val) in opr
        l = eval( Expr(:call, Symbol(op), l, val))
    end
    l::Rational{Int}
end

using TextParse
@with_names begin
    number = map(Rational{Int}, TextParse.Numeric(Int))
    factor = Either(number)  # or expression in parens, see push! below
    divMul = map(eval_ops,
                 Sequence( factor, Repeat( CharIn("*/"), factor ) ) )
    addSub = map(eval_ops,
		 divMul * Repeat( CharIn("+-") * divMul ) )
    parens = Sequence(2, "(",addSub,")" )
    push!(factor, parens)
    expr = (addSub * AtEnd())[1]
end;
parse(log_names(expr), "1/((1+2)*4+3*(5*2))")

1//42

Is every rational answer ultimately the inverse of a universal question in life?

Limitations

This is an alpha release. Collaborations are very welcome!

CombinedParsers.jl was tested against the PCRE C library testset. Some tests did not pass: 3130 passed, 123 failed, 0 errored, 0 broken. Failing tests are capture related.

Benchmarks

Performance is acceptable although optimizations are currently disabled/incomplete.

using BenchmarkTools

## PCRE
re = r"a*"
@btime match(re,"a"^3);
#      146.067 ns (5 allocations: 272 bytes)


## CombinedParsers
pc = re"a*"
@btime match(pc,"a"^3);
#      1.049 μs (10 allocations: 480 bytes)
@btime _iterate(pc,"a"^3);
#      283.928 ns (5 allocations: 272 bytes)

Preliminary performance tests were even more encouraging, in trivial example above 10x faster than PCRE.

Next Steps

  • Performance optimizations
    • generated functions
    • parsing memoization
    • backtracking optimization with multiple dispatch on parser and state type.
  • Documentation
  • Publishing packages for parsing wikitext and orgmode markup
  • error backtracking, stepping & debugging
  • test coverage

Check out the github repo for more info and start writing wour own parser with CombinedParsers.jl.