reflex::AbstractMatcher Class Reference

updated Tue Oct 1 2024 by Robert van Engelen
 

The abstract matcher base class template defines an interface for all pattern matcher engines. More...

#include <absmatcher.h>

Classes

struct  Const
 AbstractMatcher::Const common constants. More...
 
struct  Context
 Context returned by before() and after() More...
 
struct  Handler
 Event handler functor base class to invoke when the buffer contents are shifted out, e.g. for logging the data searched. More...
 
class  Iterator
 AbstractMatcher::Iterator class for scanning, searching, and splitting input character sequences. More...
 
class  Operation
 AbstractMatcher::Operation functor to match input to a pattern, also provides a (const) AbstractMatcher::iterator to iterate over matches. More...
 
struct  Option
 AbstractMatcher::Options for matcher engines. More...
 

Public Types

typedef AbstractMatcher::Iterator< AbstractMatcher > iterator
 std::input_iterator for scanning, searching, and splitting input character sequences More...
 
typedef AbstractMatcher::Iterator< const AbstractMatcher > const_iterator
 

Public Member Functions

 AbstractMatcher (const Input &input, const char *opt)
 Construct a base abstract matcher. More...
 
 AbstractMatcher (const Input &input, const Option &opt)
 Construct a base abstract matcher. More...
 
virtual ~AbstractMatcher ()
 Delete abstract matcher, deletes this matcher's internal buffer. More...
 
virtual AbstractMatcher * clone ()=0
 Polymorphic cloning. More...
 
virtual void reset (const char *opt=NULL)
 Reset this matcher's state to the initial state and set options (when provided). More...
 
bool buffer (size_t blk=0)
 Set buffer block size for reading: use 0 (or omit argument) to buffer all input in which case returns true if all the data could be read and false if a read error occurred. More...
 
void set_handler (Handler *handler)
 Set event handler functor to invoke when the buffer contents are shifted out, e.g. for logging the data searched. More...
 
Context before ()
 Get the buffered context before the matching line. More...
 
Context after ()
 Get the buffered context after EOF is reached. More...
 
void interactive ()
 Set interactive input with buffer size of 1 to read data bytewise which is very slow. More...
 
void flush ()
 Flush the buffer's remaining content. More...
 
virtual size_t get (char *s, size_t n)
 Returns more input data directly from the source (method can be overridden, as by reflex::FlexLexer::get(s, n) for example that invokes reflex::FlexLexer::LexerInput(s, n)). More...
 
virtual bool wrap ()
 Returns true if wrapping of input after EOF is supported. More...
 
virtual AbstractMatcher & input (const Input &input)
 Set the input character sequence for this matcher and reset/restart the matcher. More...
 
AbstractMatcher & buffer (char *base, size_t size)
 Set the buffer base containing 0-terminated character data to scan in place (data may be modified), reset/restart the matcher. More...
 
size_t matches ()
 Returns nonzero capture index (i.e. true) if the entire input matches this matcher's pattern (and internally caches the true/false result to permit repeat invocations). More...
 
size_t accept () const
 Returns a positive integer (true) indicating the capture index of the matched text in the pattern or zero (false) for a mismatch. More...
 
const char * begin () const
 Returns pointer to the begin of the matched text (non-0-terminated), a constant-time operation, use with end() or use size() for text end/length. More...
 
const char * end () const
 Returns pointer to the exclusive end of the matched text, a constant-time operation. More...
 
const char * text ()
 Returns 0-terminated pattern match as a char pointer, does not include matched \0s, this is a constant-time operation. More...
 
std::string str () const
 Returns the text matched as a string, a copy of text(), may include pattern-matched \0s. More...
 
std::wstring wstr () const
 Returns the pattern match as a wide string, converted from UTF-8 text(), may include pattern-matched \0s. More...
 
size_t size () const
 Returns the length of the matched text in number of bytes, including pattern-matched \0s, a constant-time operation. More...
 
size_t wsize () const
 Returns the length of the matched text in number of wide characters. More...
 
int chr () const
 Returns the first 8-bit character of the text matched. More...
 
int wchr () const
 Returns the first wide character of the text matched. More...
 
void lineno_skip (bool f=false)
 Set or reset mode to count matching lines only and skip other (e.g. for speed). More...
 
void lineno (size_t n)
 Set or change the starting line number of the last match. More...
 
size_t lineno ()
 Updates and returns the starting line number of the match in the input character sequence. More...
 
size_t lines ()
 Returns the number of lines that the match spans. More...
 
size_t lineno_end ()
 Returns the inclusive ending line number of the match in the input character sequence. More...
 
void columno (size_t n)
 Set or change the starting column number of the last match. More...
 
size_t columno ()
 Updates and returns the starting column number of the matched text, taking tab spacing into account and counting wide characters as one character each. More...
 
size_t columns ()
 Returns the number of columns of the matched text, taking tab spacing into account and counting wide characters as one character each. More...
 
size_t columno_end ()
 Returns the inclusive ending column number of the matched text on the ending matching line, taking tab spacing into account and counting wide characters as one character each. More...
 
std::pair< size_t, std::string > pair () const
 Returns std::pair<size_t,std::string>(accept(), str()), useful for tokenizing input into containers of pairs. More...
 
std::pair< size_t, std::wstring > wpair () const
 Returns std::pair<size_t,std::wstring>(accept(), wstr()), useful for tokenizing input into containers of pairs. More...
 
size_t first () const
 Returns the position of the first character of the match in the input character sequence, a constant-time operation. More...
 
size_t last () const
 Returns the exclusive position of the last character of the match in the input character sequence, a constant-time operation. More...
 
bool at_bob () const
 Returns true if this matcher is at the start of a buffer to read an input character sequence. Use reset() to restart reading new input. More...
 
void set_bob (bool bob)
 Set/reset the begin of a buffer state. More...
 
bool at_end ()
 Returns true if this matcher has no more input to read from the input character sequence. More...
 
bool hit_end () const
 Returns true if this matcher hit the end of the input character sequence. More...
 
void set_end (bool eof)
 Set and force the end of input state. More...
 
bool at_bol () const
 Returns true if this matcher reached the begin of a new line. More...
 
void set_bol (bool bol)
 Set/reset the begin of a new line state. More...
 
bool at_bow ()
 Returns true if this matcher matched text that begins an ASCII word. More...
 
bool at_eow ()
 Returns true if this matcher matched text that ends an ASCII word. More...
 
int input ()
 Returns the next 8-bit character (unsigned char 0..255 or EOF) from the input character sequence, while preserving the current text() match (but pointer returned by text() may change; warning: does not preserve the yytext string pointer when options --flex and --bison are used). More...
 
int winput ()
 Returns the next wide character (unsigned 0..U+10FFFF or EOF) from the input character sequence, while preserving the current text() match (but pointer returned by text() may change; warning: does not preserve the yytext string pointer when options --flex and --bison are used). More...
 
void unput (char c)
 Put back one character (8-bit) on the input character sequence for matching, DANGER: invalidates the previous text() pointer and match info, unput is not honored when matching in-place using buffer(base, size) and nothing has been read yet. More...
 
void wunput (int c)
 Put back one (wide) character on the input character sequence for matching, DANGER: invalidates the previous text() pointer and match info, unput is not honored when matching in-place using buffer(base, size) and nothing has been read yet. More...
 
int peek ()
 Peek at the next character available for reading from the current input source. More...
 
const char * bol ()
 Returns pointer to the begin of the line in the buffer containing the matched text. More...
 
const char * eol (bool inclusive=false)
 Returns pointer to the end of the line (last char + 1) in the buffer containing the matched text, DANGER: invalidates previous bol() and text() pointers, use eol() before bol(), text(), begin(), and end() when those are used. More...
 
size_t fetch (size_t len)
 Return number of bytes available given number of bytes to fetch ahead, limited by input size and buffer size. More...
 
size_t avail ()
 Returns the number of bytes in the buffer available to search from the current begin()/text() position. More...
 
size_t border ()
 Returns the byte offset of the match from the start of the line. More...
 
const char * span ()
 Enlarge the match to span the entire line of input (excluding \n), return text(). More...
 
std::string line ()
 Returns the line of input (excluding \n) as a string containing the matched text as a substring. More...
 
std::wstring wline ()
 Returns the line of input (excluding \n) as a wide string containing the matched text as a substring. More...
 
bool skip (char c)
 Skip input until the specified ASCII character is consumed and return true, or EOF is reached and return false. More...
 
bool skip (wchar_t c)
 Skip input until the specified Unicode character is consumed and return true, or EOF is reached and return false. More...
 
bool skip (const char *s)
 Skip input until the specified literal UTF-8 string is consumed and return true, or EOF is reached and return false. More...
 
const char * rest ()
 Fetch the rest of the input as text, useful for searching/splitting up to n times after which the rest is needed. More...
 
void more ()
 Append the next match to the currently matched text returned by AbstractMatcher::text, when the next match found is adjacent to the current match. More...
 
void less (size_t n)
 Truncate the AbstractMatcher::text length of the match to n characters in length and reposition for next match. More...
 
 operator size_t () const
 Cast this matcher to positive integer indicating the nonzero capture index of the matched text in the pattern, same as AbstractMatcher::accept. More...
 
 operator std::string () const
 Cast this matcher to a std::string of the text matched by this matcher. More...
 
 operator std::wstring () const
 Cast this matcher to a std::wstring of the text matched by this matcher. More...
 
 operator std::pair< size_t, std::string > () const
 Cast the match to std::pair<size_t,std::string>(accept(), str()), useful for tokenization into containers. More...
 
bool operator== (const char *rhs) const
 Returns true if matched text is equal to a string, useful for std::algorithm. More...
 
bool operator== (const std::string &rhs) const
 Returns true if matched text is equal to a string, useful for std::algorithm. More...
 
bool operator== (size_t rhs) const
 Returns true if capture index is equal to a given size_t value, useful for std::algorithm. More...
 
bool operator== (int rhs) const
 Returns true if capture index is equal to a given int value, useful for std::algorithm. More...
 
bool operator!= (const char *rhs) const
 Returns true if matched text is not equal to a string, useful for std::algorithm. More...
 
bool operator!= (const std::string &rhs) const
 Returns true if matched text is not equal to a string, useful for std::algorithm. More...
 
bool operator!= (size_t rhs) const
 Returns true if capture index is not equal to a given size_t value, useful for std::algorithm. More...
 
bool operator!= (int rhs) const
 Returns true if capture index is not equal to a given int value, useful for std::algorithm. More...
 
virtual std::pair< const char *, size_t > operator[] (size_t n) const =0
 Returns captured text as a std::pair<const char*,size_t> with string pointer (non-0-terminated) and length. More...
 
virtual std::pair< size_t, const char * > group_id ()=0
 Returns the group capture identifier containing the group capture index >0 and name (or NULL) of a named group capture, or (1,NULL) by default. More...
 
virtual std::pair< size_t, const char * > group_next_id ()=0
 Returns the next group capture identifier containing the group capture index >0 and name (or NULL) of a named group capture, or (0,NULL) when no more groups matched. More...
 
void tabs (char n)
 Set tab size 1, 2, 4, or 8. More...
 
char tabs ()
 Returns current tab size 1, 2, 4, or 8. More...
 

Public Attributes

Operation scan
 functor to scan input (to tokenize input) More...
 
Operation find
 functor to search input More...
 
Operation split
 functor to split input More...
 
Input in
 input character sequence being matched by this matcher More...
 

Protected Types

typedef int Method
 a method is one of Const::SCAN, Const::FIND, Const::SPLIT, Const::MATCH More...
 

Protected Member Functions

virtual void init (const char *opt=NULL)
 Initialize the base abstract matcher at construction. More...
 
virtual size_t match (Method method)=0
 The abstract match operation implemented by pattern matching engines derived from AbstractMatcher. More...
 
bool grow (size_t need=Const::BLOCK)
 Shift or expand the internal buffer when it is too small to accommodate more input, where the buffer size is doubled when needed, change cur_, pos_, end_, max_, ind_, buf_, bol_, lpb_, and txt_. More...
 
int get ()
 Returns the next character read from the current input source. More...
 
void reset_text ()
 Reset the matched text by removing the terminating \0 when applicable, which is needed to search for a new match. More...
 
void set_current (size_t loc)
 Set the current position in the buffer for the next match. More...
 
void set_current_and_peek_more (size_t loc)
 Set the current match position in the buffer and peek for more text, allows large buffer shifts that aren't pinned to txt_. More...
 
int get_more ()
 Get the next character and grow the buffer to make more room if necessary. More...
 
int peek_more ()
 Peek at the next character and grow the buffer to make more room if necessary. More...
 

Protected Attributes

Option opt_
 options for matcher engines More...
 
char * buf_
 input character sequence buffer More...
 
char * txt_
 points to the matched text in buffer AbstractMatcher::buf_ More...
 
size_t len_
 size of the matched text More...
 
size_t cap_
 nonzero capture index of an accepted match or zero More...
 
size_t cur_
 next position in AbstractMatcher::buf_ to assign to AbstractMatcher::txt_ More...
 
size_t pos_
 position in AbstractMatcher::buf_ after AbstractMatcher::txt_ More...
 
size_t end_
 ending position of the input buffered in AbstractMatcher::buf_ More...
 
size_t max_
 total buffer size and max position + 1 to fill More...
 
size_t ind_
 current indent position More...
 
size_t blk_
 block size for block-based input reading, as set by AbstractMatcher::buffer More...
 
int got_
 last unsigned character we looked at (to determine anchors and boundaries) More...
 
int chr_
 the character located at AbstractMatcher::txt_[AbstractMatcher::len_] More...
 
const char * bol_
 begin of line pointer in buffer More...
 
Handler * evh_
 event handler functor to invoke when buffer contents are shifted out More...
 
const char * lpb_
 line pointer in buffer, updated when counting line numbers with lineno() More...
 
size_t lno_
 line number count (cached) More...
 
const char * cpb_
 column pointer in buffer, updated when counting column numbers with columno() More...
 
size_t cno_
 column number count (cached) More...
 
size_t num_
 character count of the input till bol_ More...
 
bool own_
 true if AbstractMatcher::buf_ was allocated and should be deleted More...
 
bool eof_
 input has reached EOF More...
 
bool mat_
 true if AbstractMatcher::matches() was successful More...
 
bool cml_
 true when counting matching lines instead of line numbers More...
 

Detailed Description

The abstract matcher base class template defines an interface for all pattern matcher engines.

The buffer expands when matches do not fit. The buffer size is initially BUFSZ.

      _________________
     |  |    |    |    |
buf_=|  |text|rest|free|
     |__|____|____|____|
        ^    ^    ^    ^
buf_ // points to buffered input, buffer may grow to fit long matches
cur_ // current position in buf_ while matching text, cur_ = pos_ afterwards, may be changed by peek() and more()
pos_ // position in buf_ to start the next match
end_ // position in buf_ that is free to fill with more input
max_ // allocated size of buf_, must ensure that max_ > end_ for text() to add a final \0
txt_ // points to the match, will be 0-terminated when text() or rest() are called
len_ // length of the match
chr_ // char located at txt_[len_] when txt_[len_] is set to \0 by text(), is \0 otherwise
got_ // buf_[cur_-1] or txt_[-1] character before this match (assigned before each match), initially Const::BOB
eof_ // true if no more data can/should be fetched to fill the buffer
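
The interplay of txt_, len_, and chr_ described above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: ToyBuffer, set_match(), and the extra spare byte are hypothetical names and choices made for illustration, and the code is not the RE/flex implementation. It shows how a match can be 0-terminated in place by saving the overwritten byte, and how that byte is restored before the next match.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of the fields described above; not the RE/flex code.
struct ToyBuffer {
  std::vector<char> buf_;       // buffered input plus one spare byte for a final \0
  char*       txt_ = nullptr;   // points to the match inside buf_
  std::size_t len_ = 0;         // length of the match
  int         chr_ = '\0';      // byte saved from txt_[len_] while the match is 0-terminated

  explicit ToyBuffer(const std::string& input)
    : buf_(input.begin(), input.end())
  {
    buf_.push_back('\0');       // keeps room for a terminator after the last input byte
  }

  // Record a match of n bytes starting at offset first (first + n must fit the input).
  void set_match(std::size_t first, std::size_t n) {
    reset_text();               // restore any byte overwritten for the previous match
    txt_ = buf_.data() + first;
    len_ = n;
  }

  // Return the match 0-terminated by temporarily overwriting txt_[len_].
  const char* text() {
    if (chr_ == '\0') {
      chr_ = static_cast<unsigned char>(txt_[len_]);
      txt_[len_] = '\0';
    }
    return txt_;
  }

  // Remove the terminator so the overwritten byte is visible to the next match.
  void reset_text() {
    if (chr_ != '\0') {
      txt_[len_] = static_cast<char>(chr_);
      chr_ = '\0';
    }
  }
};
```

In this model a pointer returned by text() is only meaningful until the next set_match() or reset_text() call, which mirrors the warnings attached to text(), unput(), and eol() in this reference.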

Member Typedef Documentation

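typedef AbstractMatcher::Iterator<const AbstractMatcher> reflex::AbstractMatcher::const_iterator

std::input_iterator for scanning, searching, and splitting input character sequences

typedef AbstractMatcher::Iterator<AbstractMatcher> reflex::AbstractMatcher::iterator

std::input_iterator for scanning, searching, and splitting input character sequences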

typedef int reflex::AbstractMatcher::Method
protected

Constructor & Destructor Documentation

reflex::AbstractMatcher::AbstractMatcher ( const Input &  input,
const char *  opt 
)
inline

Construct a base abstract matcher.

Parameters
input  input character sequence for this matcher
opt  option string of the form (A|N|T(=[[:digit:]])?|;)*
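
The option string form (A|N|T(=[[:digit:]])?|;)* can be read as a sequence of single-letter flags separated by optional semicolons. The following is a minimal sketch of a parser for that grammar, assuming a hypothetical ToyOption struct and parse_options() helper; the default tab size of 8 and the choice to reject characters outside the grammar are assumptions of this sketch, not documented behavior of the constructor.

```cpp
#include <cctype>
#include <stdexcept>

// Hypothetical stand-in for the matcher options; defaults are assumptions.
struct ToyOption {
  bool A = false;  // set when 'A' appears in the option string
  bool N = false;  // set when 'N' appears in the option string
  char T = 8;      // tab size, replaced by the digit in "T=<digit>"
};

// Parse a string of the form (A|N|T(=[[:digit:]])?|;)* into a ToyOption.
ToyOption parse_options(const char* opt) {
  ToyOption o;
  if (opt == nullptr)
    return o;                            // no options given, keep defaults
  for (const char* s = opt; *s != '\0'; ++s) {
    switch (*s) {
      case 'A': o.A = true; break;
      case 'N': o.N = true; break;
      case 'T':
        // the "=<digit>" suffix is optional
        if (s[1] == '=' && std::isdigit(static_cast<unsigned char>(s[2]))) {
          o.T = static_cast<char>(s[2] - '0');
          s += 2;                        // skip '=' and the digit
        }
        break;
      case ';': break;                   // separator, nothing to set
      default:
        throw std::invalid_argument("unexpected option character");
    }
  }
  return o;
}
```

For example, parse_options("A;T=4") sets A and a tab size of 4 while leaving N unset.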
reflex::AbstractMatcher::AbstractMatcher ( const Input &  input,
const Option &  opt 
)
inline

Construct a base abstract matcher.

Parameters
input  input character sequence for this matcher
opt  options
virtual reflex::AbstractMatcher::~AbstractMatcher ( )
inline virtual

Delete abstract matcher, deletes this matcher's internal buffer.

Member Function Documentation

size_t reflex::AbstractMatcher::accept ( ) const
inline

Returns a positive integer (true) indicating the capture index of the matched text in the pattern or zero (false) for a mismatch.

Returns
nonzero capture index of the match in the pattern, which may be matcher dependent, or zero for a mismatch, or Const::EMPTY for the empty last split
Context reflex::AbstractMatcher::after ( )
inline

Get the buffered context after EOF is reached.

bool reflex::AbstractMatcher::at_bob ( ) const
inline

Returns true if this matcher is at the start of a buffer to read an input character sequence. Use reset() to restart reading new input.

Returns
true if at the begin of an input sequence
bool reflex::AbstractMatcher::at_bol ( ) const
inline

Returns true if this matcher reached the begin of a new line.

Returns
true if at begin of a new line
bool reflex::AbstractMatcher::at_bow ( )
inline

Returns true if this matcher matched text that begins an ASCII word.

Returns
true if this matcher matched text that begins a word
bool reflex::AbstractMatcher::at_end ( )
inline

Returns true if this matcher has no more input to read from the input character sequence.

Returns
true if at end of input and a read attempt will produce EOF
bool reflex::AbstractMatcher::at_eow ( )
inline

Returns true if this matcher matched text that ends an ASCII word.

Returns
true if this matcher matched text that ends a word
size_t reflex::AbstractMatcher::avail ( )
inline

Returns the number of bytes in the buffer available to search from the current begin()/text() position.

Context reflex::AbstractMatcher::before ( )
inline

Get the buffered context before the matching line.

const char* reflex::AbstractMatcher::begin ( ) const
inline

Returns pointer to the begin of the matched text (non-0-terminated), a constant-time operation, use with end() or use size() for text end/length.

Returns
const char* pointer to the matched text in the buffer
const char* reflex::AbstractMatcher::bol ( )
inline

Returns pointer to the begin of the line in the buffer containing the matched text.

Returns
pointer to the begin of line
size_t reflex::AbstractMatcher::border ( )
inline

Returns the byte offset of the match from the start of the line.

Returns
border offset
bool reflex::AbstractMatcher::buffer ( size_t  blk = 0)
inline

Set buffer block size for reading: use 0 (or omit argument) to buffer all input in which case returns true if all the data could be read and false if a read error occurred.

Returns
true when all input was successfully buffered (applies when blk=0)
Parameters
blk  new block size between 1 and Const::BLOCK, or 0 to buffer all input (default)
AbstractMatcher& reflex::AbstractMatcher::buffer ( char *  base,
size_t  size 
)
inline

Set the buffer base containing 0-terminated character data to scan in place (data may be modified), reset/restart the matcher.

Returns
this matcher
Parameters
base  base of the buffer containing 0-terminated character data
size  nonzero size of the buffer
int reflex::AbstractMatcher::chr ( ) const
inline

Returns the first 8-bit character of the text matched.

Returns
8-bit char
virtual AbstractMatcher* reflex::AbstractMatcher::clone ( )
pure virtual
void reflex::AbstractMatcher::columno ( size_t  n)
inline

Set or change the starting column number of the last match.

Parameters
n  new column number
size_t reflex::AbstractMatcher::columno ( )
inline

Updates and returns the starting column number of the matched text, taking tab spacing into account and counting wide characters as one character each.

Returns
column number
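
To make "taking tab spacing into account and counting wide characters as one character each" concrete, here is a stand-alone sketch of such a column count over a single UTF-8 line. The function name column_of() is hypothetical, and the sketch assumes zero-based columns and tab stops at multiples of the tab size; it is an illustration of the counting rule, not the library's code.

```cpp
#include <cstddef>
#include <string>

// Column of byte offset pos on one UTF-8 line, counting from 0 (an assumption).
std::size_t column_of(const std::string& line, std::size_t pos, std::size_t tab) {
  std::size_t col = 0;
  for (std::size_t i = 0; i < pos && i < line.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(line[i]);
    if (c == '\t')
      col += tab - col % tab;      // advance to the next tab stop
    else if ((c & 0xC0) != 0x80)
      ++col;                       // count lead bytes only, skip UTF-8 continuation bytes
  }
  return col;
}
```

With a tab size of 8, the byte after "a\t" is at column 8, and a two-byte UTF-8 character advances the column by one rather than two.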
size_t reflex::AbstractMatcher::columno_end ( )
inline

Returns the inclusive ending column number of the matched text on the ending matching line, taking tab spacing into account and counting wide characters as one character each.

Returns
column number
size_t reflex::AbstractMatcher::columns ( )
inline

Returns the number of columns of the matched text, taking tab spacing into account and counting wide characters as one character each.

Returns
number of columns
const char* reflex::AbstractMatcher::end ( ) const
inline

Returns pointer to the exclusive end of the matched text, a constant-time operation.

Returns
const char* pointer to the exclusive end of the matched text in the buffer
const char* reflex::AbstractMatcher::eol ( bool  inclusive = false)
inline

Returns pointer to the end of the line (last char + 1) in the buffer containing the matched text, DANGER: invalidates previous bol() and text() pointers, use eol() before bol(), text(), begin(), and end() when those are used.

Returns
pointer to the end of line
Parameters
inclusive  true if inclusive, i.e. point after \n instead of at \n
size_t reflex::AbstractMatcher::fetch ( size_t  len)
inline

Return number of bytes available given number of bytes to fetch ahead, limited by input size and buffer size.

Returns
number of bytes available after fetching.
size_t reflex::AbstractMatcher::first ( ) const
inline

Returns the position of the first character of the match in the input character sequence, a constant-time operation.

Returns
position in the input character sequence
void reflex::AbstractMatcher::flush ( )
inline

Flush the buffer's remaining content.

virtual size_t reflex::AbstractMatcher::get ( char *  s,
size_t  n 
)
inline virtual

Returns more input data directly from the source (method can be overridden, as by reflex::FlexLexer::get(s, n) for example that invokes reflex::FlexLexer::LexerInput(s, n)).

Returns
the nonzero number of (less or equal to n) 8-bit characters added to buffer s from the current input, or zero when EOF
Parameters
s  points to the string buffer to fill with input
n  size of buffer pointed to by s
int reflex::AbstractMatcher::get ( )
inline protected

Returns the next character read from the current input source.

Returns
the character read (unsigned char 0..255) or EOF (-1)
int reflex::AbstractMatcher::get_more ( )
inline protected

Get the next character and grow the buffer to make more room if necessary.

Returns
the character read (unsigned char 0..255) or EOF (-1)
virtual std::pair<size_t,const char*> reflex::AbstractMatcher::group_id ( )
pure virtual

Returns the group capture identifier containing the group capture index >0 and name (or NULL) of a named group capture, or (1,NULL) by default.

Returns
a pair of size_t and string

Implemented in reflex::PCRE2Matcher, reflex::Matcher, reflex::BoostMatcher, reflex::StdMatcher, and reflex::LineMatcher.

virtual std::pair<size_t,const char*> reflex::AbstractMatcher::group_next_id ( )
pure virtual

Returns the next group capture identifier containing the group capture index >0 and name (or NULL) of a named group capture, or (0,NULL) when no more groups matched.

Returns
a pair of size_t and string

Implemented in reflex::PCRE2Matcher, reflex::Matcher, reflex::BoostMatcher, reflex::StdMatcher, and reflex::LineMatcher.

bool reflex::AbstractMatcher::grow ( size_t  need = Const::BLOCK)
inline protected

Shift or expand the internal buffer when it is too small to accommodate more input, where the buffer size is doubled when needed, change cur_, pos_, end_, max_, ind_, buf_, bol_, lpb_, and txt_.

Returns
true if buffer was shifted or enlarged
Parameters
need  optional needed space = Const::BLOCK size by default
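
The "shift or expand" strategy can be sketched with a toy buffer that first discards the bytes before the current match and only then doubles its size until the request fits. ToyGrowBuffer and its members are hypothetical and use offsets instead of pointers; the real grow() also adjusts cur_, pos_, ind_, bol_, and lpb_ and may invoke the event handler, none of which is modeled here.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical toy model of shift-then-double growth; not the RE/flex code.
struct ToyGrowBuffer {
  std::vector<char> buf_;    // storage, buf_.size() plays the role of max_
  std::size_t txt_ = 0;      // offset of the current match, bytes before it are disposable
  std::size_t end_ = 0;      // end of the buffered input

  explicit ToyGrowBuffer(std::size_t max) : buf_(max) {}

  // Make room for need more bytes after end_.
  // Returns true if the buffer was shifted or enlarged, false if nothing changed.
  bool grow(std::size_t need) {
    if (buf_.size() - end_ >= need)
      return false;                              // enough free space already
    if (txt_ > 0) {
      // shift: move the match and the rest of the input to the front
      std::copy(buf_.begin() + txt_, buf_.begin() + end_, buf_.begin());
      end_ -= txt_;
      txt_ = 0;
    }
    std::size_t max = buf_.size();
    while (max - end_ < need)
      max = (max == 0 ? need : 2 * max);         // expand: double until the request fits
    buf_.resize(max);
    return true;
  }
};
```

Because shifting moves the buffered data, any pointer or offset into the old buffer contents must be treated as stale after a call that returns true.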
bool reflex::AbstractMatcher::hit_end ( ) const
inline

Returns true if this matcher hit the end of the input character sequence.

Returns
true if EOF was hit (and possibly more input would have changed the result), false otherwise (but next read attempt may return EOF immediately)
virtual void reflex::AbstractMatcher::init ( const char *  opt = NULL)
inline protected virtual

Initialize the base abstract matcher at construction.

Parameters
opt  options
virtual AbstractMatcher& reflex::AbstractMatcher::input ( const Input &  input)
inline virtual

Set the input character sequence for this matcher and reset/restart the matcher.

Returns
this matcher
Parameters
input  input character sequence for this matcher
int reflex::AbstractMatcher::input ( )
inline

Returns the next 8-bit character (unsigned char 0..255 or EOF) from the input character sequence, while preserving the current text() match (but pointer returned by text() may change; warning: does not preserve the yytext string pointer when options --flex and --bison are used).

Returns
the next character (unsigned char 0..255) from input or EOF (-1)
void reflex::AbstractMatcher::interactive ( )
inline

Set interactive input with buffer size of 1 to read data bytewise which is very slow.

Note
Use this method before any matching is done and before any input is read since the last time input was (re)set.
size_t reflex::AbstractMatcher::last ( ) const
inline

Returns the exclusive position of the last character of the match in the input character sequence, a constant-time operation.

Returns
position in the input character sequence
void reflex::AbstractMatcher::less ( size_t  n)
inline

Truncate the AbstractMatcher::text length of the match to n characters in length and reposition for next match.

Parameters
n  truncated string length
std::string reflex::AbstractMatcher::line ( )
inline

Returns the line of input (excluding \n) as a string containing the matched text as a substring.

Returns
matching line as a string
void reflex::AbstractMatcher::lineno ( size_t  n)
inline

Set or change the starting line number of the last match.

Parameters
n  new line number
size_t reflex::AbstractMatcher::lineno ( )
inline

Updates and returns the starting line number of the match in the input character sequence.

Returns
line number
size_t reflex::AbstractMatcher::lineno_end ( )
inline

Returns the inclusive ending line number of the match in the input character sequence.

Returns
line number
void reflex::AbstractMatcher::lineno_skip ( bool  f = false)
inline

Set or reset mode to count matching lines only and skip other (e.g. for speed).

size_t reflex::AbstractMatcher::lines ( )
inline

Returns the number of lines that the match spans.

Returns
number of lines
virtual size_t reflex::AbstractMatcher::match ( Method  method)
protected pure virtual

The abstract match operation implemented by pattern matching engines derived from AbstractMatcher.

Returns
nonzero when input matched the pattern using method Const::SCAN, Const::FIND, Const::SPLIT, or Const::MATCH

Implemented in reflex::Matcher, reflex::FuzzyMatcher, reflex::PCRE2Matcher, reflex::BoostMatcher, reflex::StdMatcher, and reflex::LineMatcher.

size_t reflex::AbstractMatcher::matches ( )
inline

Returns nonzero capture index (i.e. true) if the entire input matches this matcher's pattern (and internally caches the true/false result to permit repeat invocations).

Returns
nonzero capture index if the entire input matched this matcher's pattern, zero (i.e. false) otherwise
void reflex::AbstractMatcher::more ( )
inline

Append the next match to the currently matched text returned by AbstractMatcher::text, when the next match found is adjacent to the current match.

reflex::AbstractMatcher::operator size_t ( ) const
inline

Cast this matcher to positive integer indicating the nonzero capture index of the matched text in the pattern, same as AbstractMatcher::accept.

Returns
nonzero capture index of a match, which may be matcher dependent, or zero for a mismatch
reflex::AbstractMatcher::operator std::pair< size_t, std::string > ( ) const
inline

Cast the match to std::pair<size_t,std::string>(accept(), str()), useful for tokenization into containers.

Returns
std::pair<size_t,std::string>(accept(), str())
reflex::AbstractMatcher::operator std::string ( ) const
inline

Cast this matcher to a std::string of the text matched by this matcher.

Returns
std::string with matched text
reflex::AbstractMatcher::operator std::wstring ( ) const
inline

Cast this matcher to a std::wstring of the text matched by this matcher.

Returns
std::wstring converted to UCS from the 0-terminated matched UTF-8 text
bool reflex::AbstractMatcher::operator!= ( const char *  rhs) const
inline

Returns true if matched text is not equal to a string, useful for std::algorithm.

Returns
true if matched text is not equal to rhs string
Parameters
rhs  rhs string to compare to
bool reflex::AbstractMatcher::operator!= ( const std::string &  rhs) const
inline

Returns true if matched text is not equal to a string, useful for std::algorithm.

Returns
true if matched text is not equal to rhs string
Parameters
rhs  rhs string to compare to
bool reflex::AbstractMatcher::operator!= ( size_t  rhs) const
inline

Returns true if capture index is not equal to a given size_t value, useful for std::algorithm.

Returns
true if capture index is not equal to rhs
Parameters
rhs  capture index to compare accept() to
bool reflex::AbstractMatcher::operator!= ( int  rhs) const
inline

Returns true if capture index is not equal to a given int value, useful for std::algorithm.

Returns
true if capture index is not equal to rhs
Parameters
rhs  capture index to compare accept() to
bool reflex::AbstractMatcher::operator== ( const char *  rhs) const
inline

Returns true if matched text is equal to a string, useful for std::algorithm.

Returns
true if matched text is equal to rhs string
Parameters
rhs  rhs string to compare to
bool reflex::AbstractMatcher::operator== ( const std::string &  rhs) const
inline

Returns true if matched text is equal to a string, useful for std::algorithm.

Returns
true if matched text is equal to rhs string
Parameters
rhs  rhs string to compare to
bool reflex::AbstractMatcher::operator== ( size_t  rhs) const
inline

Returns true if capture index is equal to a given size_t value, useful for std::algorithm.

Returns
true if capture index is equal to rhs
Parameters
rhs  capture index to compare accept() to
bool reflex::AbstractMatcher::operator== ( int  rhs) const
inline

Returns true if capture index is equal to a given int value, useful for std::algorithm.

Returns
true if capture index is equal to rhs
Parameters
rhs  capture index to compare accept() to
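
The comparison operators above let a match be compared directly against text or against a capture index, which is the form of comparison that standard algorithms such as std::count and std::find perform. The sketch below illustrates this with a hypothetical stand-in type, MatchStub, that mirrors only these operators. It is not the RE-flex class, and the helper names count_text and count_capture are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a match (NOT the RE-flex class): it stores a
// capture index and the matched text and mirrors the documented operators.
struct MatchStub {
  std::size_t cap;  // plays the role of accept()
  std::string txt;  // plays the role of text()
  bool operator==(const char *rhs) const { return txt == rhs; }
  bool operator==(const std::string& rhs) const { return txt == rhs; }
  bool operator==(std::size_t rhs) const { return cap == rhs; }
  bool operator==(int rhs) const { return static_cast<int>(cap) == rhs; }
  bool operator!=(const char *rhs) const { return !(*this == rhs); }
  bool operator!=(const std::string& rhs) const { return !(*this == rhs); }
  bool operator!=(std::size_t rhs) const { return !(*this == rhs); }
  bool operator!=(int rhs) const { return !(*this == rhs); }
};

// Count the matches whose text equals the given string.
std::size_t count_text(const std::vector<MatchStub>& matches, const char *text) {
  assert(text != nullptr);
  return static_cast<std::size_t>(std::count(matches.begin(), matches.end(), text));
}

// Count the matches that were accepted with the given capture index.
std::size_t count_capture(const std::vector<MatchStub>& matches, std::size_t cap) {
  return static_cast<std::size_t>(std::count(matches.begin(), matches.end(), cap));
}
```

The int overload exists alongside the size_t overload so that a literal such as 0 selects a capture-index comparison rather than being ambiguous with the const char* overload.
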
virtual std::pair<const char*,size_t> reflex::AbstractMatcher::operator[] ( size_t  n) const
pure virtual

Returns captured text as a std::pair<const char*,size_t> with string pointer (non-0-terminated) and length.

Returns
std::pair of string pointer and length in the captured text, where [0] returns std::pair(begin(), size())

Implemented in reflex::PCRE2Matcher, reflex::Matcher, reflex::StdMatcher, reflex::BoostMatcher, and reflex::LineMatcher.
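
Because the pointer returned by operator[] is not 0-terminated, the length in the pair has to be used when copying the capture. A minimal sketch of that conversion follows; the helper name capture_to_string is hypothetical and not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Copy a (pointer, length) capture into an owning std::string.
// The pointed-to text is not 0-terminated, so the length must be honored.
std::string capture_to_string(const std::pair<const char*, std::size_t>& cap) {
  assert(cap.first != nullptr || cap.second == 0);
  return cap.first != nullptr ? std::string(cap.first, cap.second) : std::string();
}
```

Constructing the string from pointer and length also keeps any \0 bytes that are part of the captured text.
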

std::pair<size_t,std::string> reflex::AbstractMatcher::pair ( ) const
inline

Returns std::pair<size_t,std::string>(accept(), str()), useful for tokenizing input into containers of pairs.

Returns
std::pair<size_t,std::string>(accept(), str())
int reflex::AbstractMatcher::peek ( )
inline

Peek at the next character available for reading from the current input source.

Returns
the character (unsigned char 0..255) or EOF (-1)
int reflex::AbstractMatcher::peek_more ( )
inline protected

Peek at the next character and grow the buffer to make more room if necessary.

Returns
the character (unsigned char 0..255) or EOF (-1)
virtual void reflex::AbstractMatcher::reset ( const char *  opt = NULL)
inline virtual

Reset this matcher's state to the initial state and set options (when provided).

Reimplemented in reflex::Matcher, reflex::PCRE2Matcher, reflex::BoostMatcher, reflex::StdMatcher, and reflex::LineMatcher.

void reflex::AbstractMatcher::reset_text ( )
inline protected

Reset the matched text by removing the terminating \0 when applicable, which is needed to search for a new match.

const char* reflex::AbstractMatcher::rest ( )
inline

Fetch the rest of the input as text, useful for searching/splitting up to n times after which the rest is needed.

Returns
const char* string of the remaining input (wrapped with more input when AbstractMatcher::wrap is defined)
void reflex::AbstractMatcher::set_bob ( bool  bob)
inline

Set or reset the begin-of-buffer state.

Parameters
bob  if true: set the begin-of-buffer state
void reflex::AbstractMatcher::set_bol ( bool  bol)
inline

Set or reset the begin-of-line state.

Parameters
bol  if true: set the begin-of-line state
void reflex::AbstractMatcher::set_current ( size_t  loc)
inline protected

Set the current position in the buffer for the next match.

Parameters
loc  new location in buffer
void reflex::AbstractMatcher::set_current_and_peek_more ( size_t  loc)
inline protected

Set the current match position in the buffer and peek for more text. This allows large buffer shifts that aren't pinned to txt_.

Parameters
loc  text before this location in the buffer does not need to be kept
void reflex::AbstractMatcher::set_end ( bool  eof)
inline

Set and force the end of input state.

void reflex::AbstractMatcher::set_handler ( Handler handler)
inline

Set event handler functor to invoke when the buffer contents are shifted out, e.g. for logging the data searched.

size_t reflex::AbstractMatcher::size ( ) const
inline

Returns the length of the matched text in number of bytes, including pattern-matched \0s, a constant-time operation.

Returns
match size in bytes
bool reflex::AbstractMatcher::skip ( char  c)
inline

Skip input until the specified ASCII character is consumed and return true, or EOF is reached and return false.

Returns
true if skipped to c, false if EOF is reached
Parameters
c  ASCII character to skip to
bool reflex::AbstractMatcher::skip ( wchar_t  c)
inline

Skip input until the specified Unicode character is consumed and return true, or EOF is reached and return false.

Returns
true if skipped to c, false if EOF is reached
Parameters
c  Unicode character to skip to
bool reflex::AbstractMatcher::skip ( const char *  s)
inline

Skip input until the specified literal UTF-8 string is consumed and return true, or EOF is reached and return false.

Returns
true if skipped to s, false if EOF is reached
Parameters
s  literal UTF-8 string to skip to
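
The skip() overloads share one contract: input is consumed up to and including the target, and the result reports whether the target was found before EOF. The sketch below reproduces that contract on a std::istream as a stand-in for the matcher's input. The function names skip_char and skip_string are hypothetical, and the code does not reflect how RE-flex implements skipping internally.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Consume input up to and including the first occurrence of c.
// Returns true if c was consumed, false if EOF was reached first.
bool skip_char(std::istream& in, char c) {
  const int eof = std::istream::traits_type::eof();
  int ch;
  while ((ch = in.get()) != eof)
    if (static_cast<char>(ch) == c)
      return true;
  return false;
}

// Consume input up to and including the first occurrence of the literal s.
// A sliding window of the last s.size() bytes is compared after each read.
bool skip_string(std::istream& in, const std::string& s) {
  assert(!s.empty());
  const int eof = std::istream::traits_type::eof();
  std::string window;
  int ch;
  while ((ch = in.get()) != eof) {
    window.push_back(static_cast<char>(ch));
    if (window.size() > s.size())
      window.erase(0, 1);
    if (window == s)
      return true;
  }
  return false;
}
```

After a successful call the next read starts immediately after the target, for example right after a closing comment delimiter.
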
const char* reflex::AbstractMatcher::span ( )
inline

Enlarge the match to span the entire line of input (excluding \n), return text().

Returns
const char* span of text for the entire line
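
To make "the entire line of input (excluding \n)" concrete, the following sketch computes the line that encloses a match inside an in-memory buffer. The function enclosing_line and its parameters are hypothetical. It assumes the match itself contains no newline and says nothing about how the matcher manages its internal buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the whole line of buf that contains the match at [pos, pos + len),
// excluding the terminating '\n'. Assumes the match contains no newline.
std::string enclosing_line(const std::string& buf, std::size_t pos, std::size_t len) {
  assert(pos + len <= buf.size());
  // The line starts just after the previous newline, or at the buffer start.
  std::size_t bol = (pos == 0) ? std::string::npos : buf.rfind('\n', pos - 1);
  bol = (bol == std::string::npos) ? 0 : bol + 1;
  // The line ends at the next newline after the match, or at the buffer end.
  std::size_t eol = buf.find('\n', pos + len);
  if (eol == std::string::npos)
    eol = buf.size();
  return buf.substr(bol, eol - bol);
}
```
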
std::string reflex::AbstractMatcher::str ( ) const
inline

Returns the text matched as a string, a copy of text(), may include pattern-matched \0s.

Returns
string with text matched
void reflex::AbstractMatcher::tabs ( char  n)
inline

Set tab size 1, 2, 4, or 8.

Parameters
n  tab size 1, 2, 4, or 8
char reflex::AbstractMatcher::tabs ( )
inline

Returns current tab size 1, 2, 4, or 8.

const char* reflex::AbstractMatcher::text ( )
inline

Returns 0-terminated pattern match as a char pointer, does not include matched \0s, this is a constant-time operation.

Returns
0-terminated const char* string with text matched
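
The difference between text() on one hand and str() and size() on the other matters only when the pattern matches \0 bytes: code that reads the match as a C string stops at the first \0, while a copy made with an explicit byte size keeps them. The sketch below shows this with plain C++ strings. The helper names are hypothetical and the example does not use the matcher itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length seen by code that treats the match as a 0-terminated C string.
std::size_t c_string_length(const char *text) {
  assert(text != nullptr);
  return std::strlen(text);
}

// Owning copy built from pointer and byte size, which keeps embedded \0 bytes.
std::string copy_with_size(const char *text, std::size_t size) {
  assert(text != nullptr);
  return std::string(text, size);
}
```

For a five-byte match consisting of "ab", a \0 byte, and "cd", the first helper reports 2 while the second returns a string of size 5.
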
void reflex::AbstractMatcher::unput ( char  c)
inline

Put back one character (8-bit) on the input character sequence for matching. DANGER: this invalidates the previous text() pointer and match info. unput is not honored when matching in-place using buffer(base, size) and nothing has been read yet.

Parameters
c  8-bit character to put back
int reflex::AbstractMatcher::wchr ( ) const
inline

Returns the first wide character of the text matched.

Returns
wide char (UTF-8 converted to Unicode)
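
As an illustration of "UTF-8 converted to Unicode", the sketch below decodes the first character of a UTF-8 byte sequence into its code point. The function first_code_point is hypothetical and assumes well-formed UTF-8. How wchr() treats malformed input is not covered here.

```cpp
#include <cassert>
#include <cstddef>

// Decode the first UTF-8 encoded character of text[0..size).
// Assumes well-formed UTF-8; error handling is omitted in this sketch.
int first_code_point(const char *text, std::size_t size) {
  assert(text != nullptr && size > 0);
  const unsigned char *s = reinterpret_cast<const unsigned char*>(text);
  int c = s[0];
  if (c < 0x80)
    return c;  // single byte: ASCII
  // The lead byte determines the number of continuation bytes that follow.
  std::size_t extra = (c >= 0xF0) ? 3 : (c >= 0xE0) ? 2 : 1;
  assert(extra < size);
  c &= (0x3F >> extra);  // keep only the payload bits of the lead byte
  for (std::size_t i = 1; i <= extra; ++i)
    c = (c << 6) | (s[i] & 0x3F);  // append 6 payload bits per continuation byte
  return c;
}
```
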
int reflex::AbstractMatcher::winput ( )
inline

Returns the next wide character (unsigned 0..U+10FFFF or EOF) from the input character sequence, while preserving the current text() match (but pointer returned by text() may change; warning: does not preserve the yytext string pointer when options --flex and --bison are used).

Returns
the next wide character (unsigned 0..U+10FFFF) or EOF (-1)
std::wstring reflex::AbstractMatcher::wline ( )
inline

Returns the line of input (excluding \n) as a wide string containing the matched text as a substring.

Returns
matching line as a wide string
std::pair<size_t,std::wstring> reflex::AbstractMatcher::wpair ( ) const
inline

Returns std::pair<size_t,std::wstring>(accept(), wstr()), useful for tokenizing input into containers of pairs.

Returns
std::pair<size_t,std::wstring>(accept(), wstr())
virtual bool reflex::AbstractMatcher::wrap ( )
inline virtual

Returns true if wrapping of input after EOF is supported.

Returns
true if input was successfully wrapped
size_t reflex::AbstractMatcher::wsize ( ) const
inline

Returns the length of the matched text in number of wide characters.

Returns
the length of the match in number of wide (multibyte UTF-8) characters
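
size() counts bytes while wsize() counts wide characters, so the two differ as soon as the match contains multibyte UTF-8 sequences. The sketch below counts characters by skipping UTF-8 continuation bytes. The function utf8_char_count is hypothetical, assumes well-formed UTF-8, and is not claimed to be the library's implementation.

```cpp
#include <cassert>
#include <cstddef>

// Count UTF-8 encoded characters in text[0..size) by skipping continuation
// bytes, which have the bit pattern 10xxxxxx. Assumes well-formed UTF-8.
std::size_t utf8_char_count(const char *text, std::size_t size) {
  assert(text != nullptr || size == 0);
  std::size_t n = 0;
  for (std::size_t i = 0; i < size; ++i)
    if ((static_cast<unsigned char>(text[i]) & 0xC0) != 0x80)
      ++n;
  return n;
}
```

A six-byte match in which one character is encoded with two bytes therefore has a byte size of 6 and a character count of 5.
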
std::wstring reflex::AbstractMatcher::wstr ( ) const
inline

Returns the pattern match as a wide string, converted from UTF-8 text(), may include pattern-matched \0s.

Returns
wide string with text matched
void reflex::AbstractMatcher::wunput ( int  c)
inline

Put back one (wide) character on the input character sequence for matching. DANGER: this invalidates the previous text() pointer and match info. unput is not honored when matching in-place using buffer(base, size) and nothing has been read yet.

Parameters
c  character to put back

Member Data Documentation

size_t reflex::AbstractMatcher::blk_
protected

block size for block-based input reading, as set by AbstractMatcher::buffer

const char* reflex::AbstractMatcher::bol_
protected

begin of line pointer in buffer

char* reflex::AbstractMatcher::buf_
protected

input character sequence buffer

size_t reflex::AbstractMatcher::cap_
protected

nonzero capture index of an accepted match or zero

int reflex::AbstractMatcher::chr_
protected

the character located at AbstractMatcher::txt_[AbstractMatcher::len_]

bool reflex::AbstractMatcher::cml_
protected

true when counting matching lines instead of line numbers

size_t reflex::AbstractMatcher::cno_
protected

column number count (cached)

const char* reflex::AbstractMatcher::cpb_
protected

column pointer in buffer, updated when counting column numbers with columno()

size_t reflex::AbstractMatcher::cur_
protected

next position in AbstractMatcher::buf_ to assign to AbstractMatcher::txt_

size_t reflex::AbstractMatcher::end_
protected

ending position of the input buffered in AbstractMatcher::buf_

bool reflex::AbstractMatcher::eof_
protected

input has reached EOF

Handler* reflex::AbstractMatcher::evh_
protected

event handler functor to invoke when buffer contents are shifted out

Operation reflex::AbstractMatcher::find

functor to search input

int reflex::AbstractMatcher::got_
protected

last unsigned character we looked at (to determine anchors and boundaries)

Input reflex::AbstractMatcher::in

input character sequence being matched by this matcher

size_t reflex::AbstractMatcher::ind_
protected

current indent position

size_t reflex::AbstractMatcher::len_
protected

size of the matched text

size_t reflex::AbstractMatcher::lno_
protected

line number count (cached)

const char* reflex::AbstractMatcher::lpb_
protected

line pointer in buffer, updated when counting line numbers with lineno()

bool reflex::AbstractMatcher::mat_
protected

true if AbstractMatcher::matches() was successful

size_t reflex::AbstractMatcher::max_
protected

total buffer size and max position + 1 to fill

size_t reflex::AbstractMatcher::num_
protected

character count of the input till bol_

Option reflex::AbstractMatcher::opt_
protected

options for matcher engines

bool reflex::AbstractMatcher::own_
protected

true if AbstractMatcher::buf_ was allocated and should be deleted

size_t reflex::AbstractMatcher::pos_
protected

Operation reflex::AbstractMatcher::scan

functor to scan input (to tokenize input)

Operation reflex::AbstractMatcher::split

functor to split input

char* reflex::AbstractMatcher::txt_
protected

points to the matched text in buffer AbstractMatcher::buf_


The documentation for this class was generated from the following file: