RE/flex regex tools

The regex-centric, fast lexical analyzer generator for C++

RE/flex is a free, open source, and more powerful alternative to Flex, the fast lexical analyzer generator. Compared to Flex, RE/flex accepts more expressive lexer specifications with Unicode patterns, indent/nodent/dedent anchors, lazy quantifiers, word boundaries, and many other modern features. RE/flex generates lexer classes as clean, thread-safe source code. RE/flex accepts Flex specifications and is compatible with Bison (Yacc). RE/flex also offers an extremely fast regex library for C++.

What people are saying about RE/flex

"First of all thanks for amazing tool!! It's so cool!" -ivan-khudyashev on GitHub

"[...] this project is awesome!" -DaOnlyOwner on GitHub

"[...] I ended up writing a little code around this but nothing as sophisticated as the wonderfulness you have done here." -koothkeeper on CodeProject

"Easy to use, out of the box support for moving from flex. Additional syntactic features for generating regexp. Matcher functionality in rules sections enables additional options for distributing logic between parser and lexer. Compatible with UTF16 input, which was a great deal [to have] for my application." -arietz on SourceForge

"It's a fast and very user friendly flex derivative c++ scanner which provides more functionality over Flex." -imran7 on SourceForge

How does RE/flex work?

The RE/flex lexical analyzer generator takes a Flex lexer specification and generates a faster C++ lexer. The C++ lexer class is saved in clean source code that is easy to use and understand. This class is then compiled and linked with the RE/flex library to produce a scanner:

[Figure: Overview]

The generated scanner may be a stand-alone application or be part of a larger program, such as a compiler that tokenizes the input:

[Figure: Overview]

A smart input class is used by the scanner to process diverse input sources, including UTF-8/16/32 files, streams, strings, and memory. The generated scanner executes actions, typically to produce tokens for a parser. The actions are triggered by matching patterns to the input as specified in the lexer specification.
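To make the pattern/action idea concrete, here is a minimal sketch of what a scanner does at its core: at each input position it finds the rule with the longest match and runs that rule's action. This is not code generated by RE/flex and it does not use the RE/flex library; it uses std::regex from the C++ standard library, and the names Token, Rule, and scan are hypothetical, chosen only for this illustration.

```cpp
#include <cstddef>
#include <functional>
#include <regex>
#include <string>
#include <vector>

// Hypothetical token record produced by the actions below.
struct Token {
  std::string kind;
  std::string text;
};

// One rule: a pattern plus the action to run when the pattern matches.
struct Rule {
  std::regex pattern;
  std::function<void(const std::string&, std::vector<Token>&)> action;
};

// Scan the input left to right. At each position, pick the rule with the
// longest match (the first rule wins ties) and run its action.
std::vector<Token> scan(const std::string& input) {
  static const std::vector<Rule> rules = {
    { std::regex("[0-9]+"),
      [](const std::string& s, std::vector<Token>& out) { out.push_back({"NUMBER", s}); } },
    { std::regex("[A-Za-z_][A-Za-z0-9_]*"),
      [](const std::string& s, std::vector<Token>& out) { out.push_back({"IDENT", s}); } },
    { std::regex("[-+*/=()]"),
      [](const std::string& s, std::vector<Token>& out) { out.push_back({"OP", s}); } },
    { std::regex("[ \\t\\r\\n]+"),
      [](const std::string&, std::vector<Token>&) { /* skip whitespace */ } },
  };

  std::vector<Token> tokens;
  auto pos = input.cbegin();
  while (pos != input.cend()) {
    std::size_t best_len = 0;
    const Rule* best = nullptr;
    for (const Rule& rule : rules) {
      std::smatch m;
      // match_continuous anchors the match at the current position.
      if (std::regex_search(pos, input.cend(), m, rule.pattern,
                            std::regex_constants::match_continuous) &&
          static_cast<std::size_t>(m.length(0)) > best_len) {
        best_len = static_cast<std::size_t>(m.length(0));
        best = &rule;
      }
    }
    if (best == nullptr) {
      ++pos;  // no rule matches: this sketch simply skips the character
      continue;
    }
    best->action(std::string(pos, pos + best_len), tokens);
    pos += best_len;
  }
  return tokens;
}
```

Calling scan("x = 42 + y1") yields five tokens in order: IDENT "x", OP "=", NUMBER "42", OP "+", IDENT "y1". A generated scanner avoids trying each pattern separately by compiling all rules into a single state machine, which is where most of the speed comes from.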

What is new and noteworthy?

RE/flex differs in many respects from other lexical analyzer generators, supporting full Unicode, indent/nodent/dedent anchors, lazy quantifiers, word boundaries, and more. RE/flex also offers performance tuning with the built-in performance analyzer. Perhaps the most striking difference is that the generated scanner can use either of two regex matching engines: the RE/flex matcher or the Boost.Regex library. Depending on the options selected, the RE/flex matcher runs either in direct code as a super fast deterministic finite state machine or as a deterministic finite state machine table. The Boost.Regex library offers a richer regex syntax but uses a slower non-deterministic finite state machine to match input.
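The difference between a table-driven and a direct-coded deterministic finite state machine can be sketched by hand for one small pattern. The example below matches [0-9]+(\.[0-9]+)? at the start of a string and returns the length of the longest match. It is an illustration only, not the code RE/flex emits; the function names match_table and match_direct and the state numbering are assumptions made for this sketch.

```cpp
#include <cstddef>
#include <string>

// Character classes used by both machines.
enum CharClass { DIGIT = 0, DOT = 1, OTHER = 2 };

inline CharClass classify(char c) {
  if (c >= '0' && c <= '9') return DIGIT;
  if (c == '.') return DOT;
  return OTHER;
}

// Table-driven DFA: states are data, and one generic loop interprets them.
// State 0: start, 1: integer digits, 2: after '.', 3: fraction digits.
constexpr int DEAD = -1;
const int transition[4][3] = {
  // DIGIT DOT   OTHER
  {  1,    DEAD, DEAD },  // state 0
  {  1,    2,    DEAD },  // state 1 (accepting)
  {  3,    DEAD, DEAD },  // state 2
  {  3,    DEAD, DEAD },  // state 3 (accepting)
};
const bool accepting[4] = { false, true, false, true };

std::size_t match_table(const std::string& s) {
  int state = 0;
  std::size_t last_accept = 0;  // length of the longest match seen so far
  for (std::size_t i = 0; i < s.size(); ++i) {
    state = transition[state][classify(s[i])];
    if (state == DEAD) break;
    if (accepting[state]) last_accept = i + 1;
  }
  return last_accept;
}

// Direct-coded DFA: each state is a labeled piece of code and each
// transition is a jump, so there are no table lookups at run time.
std::size_t match_direct(const std::string& s) {
  const std::size_t n = s.size();
  std::size_t i = 0;
  std::size_t last_accept = 0;

  // State 0: start
  if (i < n && classify(s[i]) == DIGIT) { ++i; goto S1; }
  return last_accept;
S1:  // integer digits (accepting)
  last_accept = i;
  if (i < n && classify(s[i]) == DIGIT) { ++i; goto S1; }
  if (i < n && s[i] == '.') { ++i; goto S2; }
  return last_accept;
S2:  // after '.', a digit is required
  if (i < n && classify(s[i]) == DIGIT) { ++i; goto S3; }
  return last_accept;
S3:  // fraction digits (accepting)
  last_accept = i;
  if (i < n && classify(s[i]) == DIGIT) { ++i; goto S3; }
  return last_accept;
}
```

Both functions return 4 for "3.14abc", 2 for "42." (the trailing dot is not part of a match), and 0 for "abc". The table version is compact and easy to generate; the direct-coded version trades code size for fewer memory accesses per input character, which is the usual reason direct code tends to run faster.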

See also

  • Constructing Fast Lexical Analyzers with RE/flex - a Modern Alternative to Flex for C++
  • RE/flex manual
  • GitHub repo
