PCRE Regular expressions
You can use the PCRE string macro @re_str in combination with the constructors of CombinedParsers.
Constructing Regular expressions
CombinedParsers.Regexp.@re_str - Macro
@re_str(x,flags)
Construct a ParserWithCaptures from PCRE regex syntax, such as re"^[a-z]*$", without interpolation and unescaping (except for the quotation mark ", which still has to be escaped). Plug-in replacement for the PCRE string macro @r_str.
The regex also accepts one or more flags, listed after the ending quote, to change its behaviour:
- i enables case-insensitive matching.
- m treats the ^ and $ tokens as matching the start and end of individual lines, as opposed to the whole string.
- s allows the . modifier to match newlines.
- x enables "comment mode": whitespace is ignored except when escaped with \, and # is treated as starting a comment.
- a disables UCP mode (enables ASCII mode). By default \B, \b, \D, \d, \S, \s, \W, \w, etc. match based on Unicode character properties. With this option, these sequences only match ASCII characters.
- xx enables "extended comment mode": whitespace in bracket character matchers is ignored.
julia> re"a|c"i
|🗄 Either
├─ [aA] ValueIn
└─ [cC] ValueIn
::Char
julia> re"a+c"
🗄 Sequence
├─ a+ |> Repeat
└─ c
::Tuple{Vector{Char}, Char}
See also Regcomb, parse_options.
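Since @re_str is described as a plug-in replacement for @r_str, the effect of the flags can be previewed with Julia's built-in Regex. The following is a minimal sketch that uses only Base. It assumes, based on the description above, that re"..." with the same flags accepts the same inputs; that equivalence is not checked here.

```julia
# Sketch with Base.Regex only; re"..." is assumed to behave the same way.
caseless  = occursin(r"a|c"i, "C")              # i: case-insensitive matching
multiline = match(r"^b$"m, "a\nb\nc")           # m: ^ and $ match at line boundaries
dotall    = occursin(r"a.c"s, "a\nc")           # s: . also matches a newline
no_dotall = occursin(r"a.c", "a\nc")            # without s, . does not match "\n"
extended  = occursin(r"a b # comment"x, "ab")   # x: whitespace and comments are ignored
```

Replacing the r prefix with re should give the CombinedParsers counterpart of each pattern.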
CombinedParsers.Regexp - Module
A regular expression parser transforming a PCRE string to a CombinedParser equivalent to the regular expression.
CombinedParsers.Regexp.Regcomb - Function
Regcomb(x::AbstractString[, flags=""])
For the syntax of flags, see @re_str.
Base.getindex - Method
Base.getindex(x::ParseMatch{<:Any,<:SequenceWithCaptures,<:Any},i::Union{Integer,Symbol})
Gets capture i as a SubString.
See API of RegexMatch.
Base.getproperty - Method
Base.getproperty(m::ParseMatch{<:Any,<:SequenceWithCaptures,<:Any},key::Symbol)
Enables m.captures and m.match.
See API of RegexMatch.
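The RegexMatch API that these two methods refer to can be sketched with Base alone. The pattern and input below are illustrative assumptions. The same accessors are expected to work on a ParseMatch, which this snippet does not verify.

```julia
# Base.RegexMatch accessors that ParseMatch is documented to follow.
m = match(r"(?<key>\w+)=(\d+)", "x=42")

whole    = m.match      # the matched substring
by_index = m[2]         # capture selected by position
by_name  = m[:key]      # capture selected by name
all_caps = m.captures   # all captures in order
```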
CombinedParsers.regex_escape - Function
regex_escape(s::AbstractString)
Regular expression metacharacters are escaped along with whitespace.
Compatibility & Unit Tests
CombinedParsers.Regexp.character_class - Constant
julia> CombinedParsers.Regexp.character_class
🗄 Sequence |> map(#57)
├─ \[\:
├─ |🗄 Either
│  ├─ alpha => [\p{L}] ValueIn
│  ├─ lower => [\p{Ll}] ValueIn
│  ├─ upper => [\p{Lu}] ValueIn
│  ├─ word => [\p{L}\p{Nl}\p{Nd}\p{Pc}] ValueIn
│  ├─ digit => [\p{Nd}] ValueIn
│  ├─ xdigit => [[:xdigit:]] ValueIn
│  ├─ alnum => [\p{L}\p{N}] ValueIn
│  ├─ blank => [\t\p{Zs}] ValueIn
│  ├─ cntrl => [\p{Cc}] ValueIn
│  ├─ graph => [^\p{Z}\p{C}] ValueNotIn
│  ├─ print => [\p{C}] ValueIn
│  ├─ punct => [\p{P}] ValueIn
│  └─ space => [\r\v\n\f\t\p{Z}] ValueIn
└─ \:\]
::CombinedParsers.ValueMatcher
TODO:
By default, characters with values greater than 128 do not match any of the POSIX character classes. However, if the PCRE_UCP option is passed to pcre_compile(), some of the classes are changed so that Unicode character properties are used. This is achieved by replacing certain POSIX classes by other sequences, as follows:
- [:alnum:] becomes \p{Xan}
- [:alpha:] becomes \p{L}
- [:blank:] becomes \h
- [:digit:] becomes \p{Nd}
- [:lower:] becomes \p{Ll}
- [:space:] becomes \p{Xps}
- [:upper:] becomes \p{Lu}
- [:word:] becomes \p{Xwd}
Base.:== - Method
==(pcre_m::RegexMatch,pc_m::ParseMatch)
Equal if and only if the values of .match, .offset, .ncodeunits and .captures are equal.
CombinedParsers.Regexp.@pcre_tests - Macro
@pcre_testset
Define @syntax pcre_test and @syntax pcre_tests for parsing unit test output of the PCRE library. The parser is used for testing CombinedParser and benchmarking against Regex.
CombinedParsers.Regexp
CombinedParsers._iterate - Method
_iterate(p::ParserWithCaptures, sequence::SequenceWithCaptures,a...)
Calls Base.empty!(sequence) before iteration. (Why?)
Parsing Options
PCRE options are supported.
CombinedParsers.Regexp.with_options - Function
with_options(flags::UInt32,x::AbstractString)
Return x if iszero(flags), otherwise a StringWithOptions with flags.
with_options(flags::UInt32,x::Char)
Return x if iszero(flags), otherwise a CharWithOptions with flags.
with_options(flags::AbstractString,x)
Return with_options(parse_options(flags),x), see parse_options.
with_options(set_flags::UInt32, unset_flags::UInt32,x)
Set options set_flags | ( x.flags & ~unset_flags ) if x isa WithOptions, set options set_flags otherwise.
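The update rule set_flags | ( x.flags & ~unset_flags ) is plain bit arithmetic on UInt32 masks. The sketch below works it through with Base only. The helper combine_flags is hypothetical and not part of CombinedParsers. The mask values are the ones printed for pcre_options_parser in this section.

```julia
# Option masks, with values as printed for pcre_options_parser in this section.
CASELESS  = 0x00000008   # i
MULTILINE = 0x00000400   # m
DOTALL    = 0x00000020   # s

# Hypothetical helper (not a CombinedParsers function): the documented update rule.
combine_flags(current::UInt32, set_flags::UInt32, unset_flags::UInt32) =
    set_flags | (current & ~unset_flags)

current = CASELESS | MULTILINE
updated = combine_flags(current, DOTALL, CASELESS)   # sets s, clears i, keeps m
```

Clearing happens before setting, so a flag that appears in both set_flags and unset_flags ends up set.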
CombinedParsers.Regexp.parse_options - Function
parse_options(options::AbstractString)
Return PCRE option mask parsed from options.
Parser for flags in @re_str.
julia> CombinedParsers.Regexp.pcre_options_parser
🗄 Sequence[2]
├─ ^ AtStart
├─ 🗄* Sequence[1] |> Repeat |> map(splat_or)
│  ├─ |🗄 Either
│  │  ├─ dupnames => 0x00000040 |> with_name(:DUPNAMES)
│  │  ├─ xx => 0x01000000 |> with_name(:EXTENDED_MORE)
│  │  ├─ i => 0x00000008 |> with_name(:CASELESS)
│  │  ├─ m => 0x00000400 |> with_name(:MULTILINE)
│  │  ├─ n => 0x00002000 |> with_name(:NO_AUTO_CAPTURE)
│  │  ├─ U => 0x00040000 |> with_name(:UNGREEDY)
│  │  ├─ J => 0x00000040 |> with_name(:DUPNAMES)
│  │  ├─ s => 0x00000020 |> with_name(:DOTALL)
│  │  ├─ x => 0x00000080 |> with_name(:EXTENDED)
│  │  ├─ B => 0x00000000 |> with_name(:BINCODE)
│  │  └─ I => 0x00000000 |> with_name(:INFO)
│  └─ ,? |missing
└─ $ AtEnd
::UInt32
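The parser output above maps each flag to a bit mask and combines repeated matches with a bitwise OR. A simplified sketch of that idea, using Base only, could look as follows. The names flag_masks and sketch_parse_options are hypothetical and this is not the library's parse_options. It covers only four single-letter flags and ignores xx, dupnames and the optional comma separator.

```julia
# Subset of the single-letter flags and masks listed in the parser output above.
flag_masks = Dict('i' => 0x00000008, 'm' => 0x00000400,
                  's' => 0x00000020, 'x' => 0x00000080)

# Hypothetical helper (not the library's parse_options): OR the mask of each flag.
# Throws a KeyError for any flag outside the subset above.
sketch_parse_options(flags::AbstractString) =
    reduce(|, (flag_masks[c] for c in flags); init = 0x00000000)

mask = sketch_parse_options("im")   # CASELESS | MULTILINE
```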
CombinedParsers.Regexp.StringWithOptions - Type
A lazy element transformation type (e.g. AbstractString), getindex wraps elements in with_options(flags,...).
With parsing options
TODO: make flags a transformation function?
CombinedParsers.Regexp.CharWithOptions - Type
A lazy element transformation type (e.g. AbstractString), getindex wraps elements in with_options(flags,...).
With parsing options
TODO: make flags a transformation function?
CombinedParsers.Regexp.OnOptionsParser - Type
Parser wrapper sequence with if_options.
CombinedParsers.Regexp.on_options - Function
on_options(flags::Integer,parser)
Create a parser that matches if flags are set in the sequence and parser matches.
Used for PCRE parsing, e.g.
Either(
    on_options(Base.PCRE.MULTILINE,
               '^' => at_linestart),
    parser('^' => AtStart())
)
CombinedParsers.Regexp.ParserOptions - Type
A wrapper matching the inner parser on with_options(set_flags, unset_flags, sequence).
CombinedParsers.Regexp.FilterOptions - Type
Lazy wrapper for a sequence, masking elements in getindex with MatchingNever if any of flags are not set.
TODO: make flags a filter function? resolve confound of sequence and value, like StringWithOptions, CharWithOptions
CombinedParsers.Regexp.MatchingNever - Type
Helper struct to mask sequence elements from matchers.
Regular Expression Types
CombinedParsers.Regexp.ParserWithCaptures - Type
Top-level parser supporting the regular expression features captures, backreferences and subroutines. Collects subroutines in field subroutines::Vector and indices of named capture groups in field names::Dict.
Implicitly called in match.
See also Backreference, Capture, Subroutine
CombinedParsers.Regexp.SequenceWithCaptures - Type
SequenceWithCaptures encapsulates a sequence to be parsed, and parsed captures.
This struct allows captures to be kept as sequence-level state. For the next version, a match-level state passed as an _iterate argument is being considered.
See also ParserWithCaptures
CombinedParsers.Regexp.Capture - Type
Capture a parser result, optionally with a name. The index field is initialized when calling ParserWithCaptures on the parser.
CombinedParsers.Regexp.Backreference - Type
Backreference(f::Function,index::Integer)
Backreference(f::Function,name::Union{Nothing,Symbol},index::Integer)
Backreference(f::Function,name::AbstractString)
Parser matching a previously captured sequence, optionally with a name. The index field is recursively set when calling ParserWithCaptures on the parser.
CombinedParsers.Regexp.Subroutine - Type
Parser matching a preceding capture, optionally with a name. The index field is recursively set when calling ParserWithCaptures on the parser.
CombinedParsers.Regexp.subroutine_index_reset - Method
https://www.pcre.org/original/doc/html/pcrepattern.html#SEC16
CombinedParsers.Regexp.index - Method
index(parser::Subroutine,sequence)
Index of a subroutine. "If you make a subroutine call to a non-unique named subpattern, the one that corresponds to the first occurrence of the name is used." (What about "In the absence of duplicate numbers (see the previous section) this is the one with the lowest number."?)
CombinedParsers.Regexp.Conditional - Type
Conditional parser. _iterate cycles through the matches in field yes or in field no, depending on _iterate_condition.
CombinedParsers.Regexp.DupSubpatternNumbers - Type
Parser wrapper for ParserWithCaptures, setting resetindex=true in deepmapparser(::typeof(indexedcaptures),...).
julia> p = re"(?|(a)|(b))\1"
🗄 Sequence |> regular expression combinator with 1 capturing groups
├─ |🗄 Either |> DupSubpatternNumbers
│  ├─ (a) |> Capture 1
│  └─ (b) |> Capture 1
└─ \g{1} Backreference
::Tuple{Char, AbstractString}
julia> match(p, "aa")
ParseMatch("aa", 1="a")
julia> match(p, "bb")
ParseMatch("bb", 1="b")
See also pcre doc
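For comparison, the same duplicate-subpattern-number pattern can be run through Julia's built-in Regex, which is the reference that the == method documented earlier compares a ParseMatch against. This is a sketch using Base only. The "ab" input is an added assumption to illustrate a non-match.

```julia
# Base.Regex counterpart of the re"(?|(a)|(b))\1" example.
r = r"(?|(a)|(b))\1"

m_aa = match(r, "aa")   # first branch captures "a" as group 1
m_bb = match(r, "bb")   # second branch also captures into group 1
m_ab = match(r, "ab")   # group 1 holds "a", so \1 cannot match "b"
```

Both branches of (?|...) share capture number 1, which is why a single backreference \1 works for either alternative.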