RE/flex scanner generator replacement for Flex/Lex.

#include "reflex.h"

Macros
#define	WITH_BOOST_PARTIAL_MATCH_BUG
	Work around the Boost.Regex partial_match bug by forcing the generated scanner to buffer all input.

Functions
int	fopen_s (FILE **file, const char *name, const char *mode)
	Safer fopen_s()

char	char_tolower (char c)
	Convert to lower case.

static std::string	file_ext (std::string &name, const char *ext)
	Add file extension if not present, modifies the string argument and returns a copy.

int	main (int argc, char **argv)
	Main program instantiates Reflex class and runs Reflex::main(argc, argv)

Variables
static const char *	options_table []
	Table with command-line reflex options and lex specification %options.

static const Reflex::Library	library_table []
	Table with regex library properties.

Detailed Description

RE/flex scanner generator replacement for Flex/Lex.

Author: Robert van Engelen - engelen@genivia.com

Copyright: (c) 2016-2023, Robert van Engelen, Genivia Inc. All rights reserved. Distributed under the BSD-3 License, see LICENSE.txt

Macro Definition Documentation

#define WITH_BOOST_PARTIAL_MATCH_BUG

Work around the Boost.Regex partial_match bug by forcing the generated scanner to buffer all input.

Function Documentation

char char_tolower ( char c )

inline

Convert to lower case.

Returns: lower case char
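
A helper of this shape can be sketched as follows. This is a minimal illustration, assuming the function simply wraps the standard std::tolower; the name char_tolower_sketch and the cast through unsigned char are assumptions, not taken from reflex.cpp.

```cpp
#include <cctype>

// Hypothetical stand-in for char_tolower(); not the reflex.cpp source.
inline char char_tolower_sketch(char c)
{
  // std::tolower expects a value representable as unsigned char (or EOF),
  // so convert before the call to stay well-defined for negative chars.
  return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
}
```

Characters that have no lower-case form, such as digits, are returned unchanged.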

static std::string file_ext	(	std::string &	name,
		const char *	ext
	)

static

Add file extension if not present, modifies the string argument and returns a copy.

Returns: copy of file name string with extension ext
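
The described behavior might look like the sketch below. It assumes that "present" means the name already ends in a dot followed by ext; that test, and the name file_ext_sketch, are assumptions for illustration rather than the actual implementation.

```cpp
#include <cstring>
#include <string>

// Hypothetical stand-in for file_ext(); the suffix test is an assumption.
static std::string file_ext_sketch(std::string& name, const char *ext)
{
  const std::size_t n = name.size();
  const std::size_t m = std::strlen(ext);
  // The extension counts as present when the name ends in "." + ext.
  const bool present =
      n > m && name[n - m - 1] == '.' && name.compare(n - m, m, ext) == 0;
  if (!present)
    name.append(".").append(ext); // modifies the caller's string
  return name;                    // returns a copy of the updated name
}
```

Because the argument is taken by non-const reference, the caller's string and the returned copy hold the same value after the call.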

int fopen_s	(	FILE **	file,
		const char *	name,
		const char *	mode
	)

inline

Safer fopen_s()
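
For platforms without a native fopen_s(), a wrapper with the same parameter list could be sketched as below. This is only an assumption about how such a fallback might work; the name portable_fopen_s and the error-code convention (0 on success, nonzero on failure) are hypothetical.

```cpp
#include <cerrno>
#include <cstdio>

// Hypothetical portable wrapper with the same parameters as fopen_s().
inline int portable_fopen_s(std::FILE **file, const char *name, const char *mode)
{
  *file = std::fopen(name, mode);
  if (*file != nullptr)
    return 0; // success
  // ISO C does not require fopen to set errno, so fall back to -1
  // to guarantee a nonzero result on failure.
  return errno != 0 ? errno : -1;
}
```

A caller checks the return value instead of comparing the FILE pointer to a null pointer, which keeps call sites identical on both kinds of platform.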

int main	(	int	argc,
		char **	argv
	)

Main program instantiates Reflex class and runs Reflex::main(argc, argv)

Variable Documentation

const Reflex::Library library_table[]

static

Table with regex library properties.

This table is extensible and new regex libraries may be added. Each regex library is described by:

a unique name that is used for specifying the matcher=NAME option
the header file to be included
the pattern type or class used by the matcher class
the matcher class
the regex library signature
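
The five properties listed above can be pictured as one record per library. The sketch below is illustrative only: the struct name LibraryEntry, its field names, the sentinel-terminated layout, and every value in example_table are assumptions, not the contents of library_table.

```cpp
#include <cstring>

// Hypothetical mirror of one table entry; field names are assumptions.
struct LibraryEntry {
  const char *name;      // unique name used with the matcher=NAME option
  const char *file;      // header file to be included
  const char *pattern;   // pattern type or class used by the matcher class
  const char *matcher;   // matcher class
  const char *signature; // regex library signature
};

// Illustrative values only; a null name marks the end of the table.
static const LibraryEntry example_table[] = {
  { "example", "example/matcher.h", "example::Pattern", "example::Matcher",
    "imsx#=^:abcdefhijklnrstuvwxz?+." },
  { "other", "other/matcher.h", "other::regex", "other::Matcher",
    "imsx:abnrtv" },
  { nullptr, nullptr, nullptr, nullptr, nullptr }
};

// Look up an entry by its matcher=NAME value; returns nullptr if unknown.
inline const LibraryEntry *find_library(const char *name)
{
  for (const LibraryEntry *p = example_table; p->name != nullptr; ++p)
    if (std::strcmp(p->name, name) == 0)
      return p;
  return nullptr;
}
```

With a layout like this, adding a library means adding one row before the sentinel.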

A regex library signature is a string of the form "decls:escapes?+.", see reflex::convert.

The optional "decls:" part specifies which modifiers and other special (?...) constructs are supported:

non-capturing group (?:...) is supported
one or more of "imsx" specify which (?imsx) modifiers are supported:
'i' specifies that (?i...) case-insensitive matching is supported
'm' specifies that (?m...) multiline mode is supported for the ^ and $ anchors
's' specifies that (?s...) dotall mode is supported
'x' specifies that (?x...) freespace mode is supported
# specifies that (?#...) comments are supported
= specifies that (?=...) lookahead is supported
< specifies that (?<...) lookbehind is supported
! specifies that (?!...) negative lookahead and (?<!...) negative lookbehind are supported
^ specifies that (?^...) negative (reflex) patterns are supported

The "escapes" characters specify which standard escapes are supported:

a for \a (BEL U+0007)
b for \b (BS U+0008) in brackets [\b] only AND the \b word boundary
c for \cX control character specified by X modulo 32
d for \d ASCII digit [0-9]
e for \e ESC U+001B
f for \f FF U+000C
h for \h ASCII blank [ \t] (SP U+0020 or TAB U+0009)
i for \i reflex indent anchor
j for \j reflex dedent anchor
k for \k reflex undent anchor
l for \l ASCII lower case letter [a-z]
n for \n LF U+000A
p for \p{C} Unicode character classes, also implies Unicode \x{X}, \l, \u, \d, \s, \w
r for \r CR U+000D
s for \s space (SP, TAB, LF, VT, FF, or CR)
t for \t TAB U+0009
u for \u ASCII upper case letter [A-Z] (when not followed by {XXXX})
v for \v VT U+000B
w for \w ASCII word-like character [0-9A-Z_a-z]
x for \xXX 8-bit character encoding in hexadecimal
y for \y word boundary
z for \z end of input anchor
` for \` begin of input anchor
' for \' end of input anchor
< for \< left word boundary
> for \> right word boundary
A for \A begin of input anchor
B for \B non-word boundary
D for \D ASCII non-digit [^0-9]
H for \H ASCII non-blank [^ \t]
L for \L ASCII non-lower case letter [^a-z]
N for \N not a newline
P for \P{C} Unicode inverse character classes, see 'p'
Q for \Q...\E quotations
R for \R Unicode line break
S for \S ASCII non-space (no SP, TAB, LF, VT, FF, or CR)
U for \U ASCII non-upper case letter [^A-Z]
W for \W ASCII non-word-like character [^0-9A-Z_a-z]
X for \X any Unicode character
Z for \Z end of input anchor, before the final line break
0 for \0nnn 8-bit character encoding in octal requires a leading 0
'1' to '9' for backreferences (not applicable to lexer specifications)

Note that 'p' is a special case to support Unicode-based matchers that natively support UTF-8 patterns and Unicode classes \p{C}, \P{C}, \w, \W, \d, \D, \l, \L, \u, \U, \N, and \x{X}. Basically, 'p' prevents conversion of Unicode patterns to UTF-8. This special case does not support {NAME} expansions in bracket lists such as [a-z||{upper}] and {lower}{+}{upper} used in lexer specifications.

The optional "?+" specify lazy and possessive support:

? lazy quantifiers for repeats are supported
+ possessive quantifiers for repeats are supported

The optional "." (dot) specifies that dot matches any character except newline. A dot is implied by the presence of the 's' modifier, and can be omitted in that case.
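
To make the "decls:escapes?+." layout concrete, the sketch below splits a signature string into its parts. It is not the logic used by reflex::convert; the struct SignatureInfo, the function parse_signature_sketch, and the example strings are hypothetical. It relies on the fact that ':', '?', '+', and '.' do not appear in the escapes list above, so the trailing flags can be peeled off from the end.

```cpp
#include <string>

// Hypothetical decomposition of a "decls:escapes?+." signature.
struct SignatureInfo {
  std::string decls;       // supported modifiers and (?...) constructs
  std::string escapes;     // supported escape letters
  bool lazy = false;       // '?' present: lazy quantifiers
  bool possessive = false; // '+' present: possessive quantifiers
  bool dot = false;        // '.' present, or implied by the 's' modifier
};

inline SignatureInfo parse_signature_sketch(const std::string& signature)
{
  SignatureInfo info;
  std::string rest = signature;
  // The optional "decls:" part ends at the first colon.
  const std::size_t colon = rest.find(':');
  if (colon != std::string::npos) {
    info.decls = rest.substr(0, colon);
    rest.erase(0, colon + 1);
  }
  // Peel the optional trailing flags from the end, in reverse order.
  if (!rest.empty() && rest.back() == '.') { info.dot = true; rest.pop_back(); }
  if (!rest.empty() && rest.back() == '+') { info.possessive = true; rest.pop_back(); }
  if (!rest.empty() && rest.back() == '?') { info.lazy = true; rest.pop_back(); }
  info.escapes = rest;
  // A dot is implied when the 's' modifier is supported.
  if (info.decls.find('s') != std::string::npos)
    info.dot = true;
  return info;
}
```

For a made-up signature such as "imsx#=^:abcdz?+." this yields decls "imsx#=^", escapes "abcdz", and all three flags set.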

const char* options_table[]

static

Table with command-line reflex options and lex specification %options.

The table consists of option names with hyphens replaced by underscores.
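
Since the table stores names with underscores, an option written with hyphens has to be normalized before it can be matched. The sketch below shows one way to do that; the table contents are a small illustrative subset and the helper names normalize_option and is_known_option are hypothetical, not functions in reflex.cpp.

```cpp
#include <algorithm>
#include <string>

// Illustrative subset only; a null pointer marks the end of the table.
static const char *const example_options[] = {
  "case_insensitive",
  "header_file",
  "outfile",
  nullptr
};

// Replace hyphens by underscores so the name matches the table spelling.
inline std::string normalize_option(std::string name)
{
  std::replace(name.begin(), name.end(), '-', '_');
  return name;
}

// Return true if the (normalized) option name occurs in the table.
inline bool is_known_option(const std::string& name)
{
  const std::string key = normalize_option(name);
  for (const char *const *p = example_options; *p != nullptr; ++p)
    if (key == *p)
      return true;
  return false;
}
```

With this normalization, "header-file" and "header_file" refer to the same table entry.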