reflex::Input Class Reference

updated Tue Oct 1 2024 by Robert van Engelen
 

Input character sequence class for unified access to sources of input text.

#include <input.h>


Classes

class  dos_streambuf
 Stream buffer for reflex::Input to read DOS files, replaces CRLF by LF, derived from std::streambuf.
 
struct  file_encoding
 Common file_encoding constants.
 
struct  Handler
 FILE* handler functor base class to handle FILE* errors and non-blocking FILE* reads.
 
class  streambuf
 Stream buffer for reflex::Input, derived from std::streambuf.
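
The CRLF-to-LF replacement that dos_streambuf is described as performing can be illustrated with a small filter derived from std::streambuf. This is only a sketch of the general technique, not the RE/flex implementation; the names CrlfToLfBuf and dos_to_unix are hypothetical.

```cpp
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical filter: reads bytes from another stream buffer and replaces
// each CR LF pair by a single LF. A CR that is not followed by LF is kept.
class CrlfToLfBuf : public std::streambuf {
 public:
  explicit CrlfToLfBuf(std::streambuf *src)
    : src_(src), cur_(traits_type::eof()), full_(false) { }
 protected:
  // Peek: fetch one filtered byte of lookahead if none is held.
  int_type underflow() override
  {
    if (!full_)
    {
      cur_ = src_->sbumpc();
      if (cur_ == traits_type::to_int_type('\r') &&
          src_->sgetc() == traits_type::to_int_type('\n'))
        cur_ = src_->sbumpc();  // drop the CR, deliver the LF
      full_ = true;
    }
    return cur_;
  }
  // Read: return the lookahead byte and release it.
  int_type uflow() override
  {
    int_type c = underflow();
    full_ = false;
    return c;
  }
 private:
  std::streambuf *src_;
  int_type cur_;
  bool full_;
};

// Convenience wrapper for this sketch: run a whole string through the filter.
inline std::string dos_to_unix(const std::string& text)
{
  std::stringbuf src(text);
  CrlfToLfBuf filter(&src);
  std::istream in(&filter);
  std::string out;
  char c;
  while (in.get(c))
    out.push_back(c);
  return out;
}
```

The filter keeps one byte of lookahead so that a CR is only dropped once the following LF has been seen.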
 

Public Types

typedef unsigned short file_encoding_type
 Common file_encoding constants type.
 

Public Member Functions

 Input ()
 Construct empty input character sequence.
 
 Input (const Input &input)
 Copy constructor (with intended "move semantics" as internal state is shared, should not rely on using the rhs after copying).
 
 Input (const char *cstring, size_t size)
 Construct input character sequence from a char* string.
 
 Input (const char *cstring)
 Construct input character sequence from a NUL-terminated string.
 
 Input (const std::string &string)
 Construct input character sequence from a std::string.
 
 Input (const std::string *string)
 Construct input character sequence from a pointer to a std::string.
 
 Input (const wchar_t *wstring)
 Construct input character sequence from a NUL-terminated wide character string.
 
 Input (const std::wstring &wstring)
 Construct input character sequence from a std::wstring (may contain UTF-16 surrogate pairs).
 
 Input (const std::wstring *wstring)
 Construct input character sequence from a pointer to a std::wstring (may contain UTF-16 surrogate pairs).
 
 Input (FILE *file)
 Construct input character sequence from an open FILE* file descriptor, supports UTF-8 conversion from UTF-16 and UTF-32.
 
 Input (FILE *file, file_encoding_type enc, const unsigned short *page=NULL)
 Construct input character sequence from an open FILE* file descriptor, using the specified file encoding.
 
 Input (std::istream &istream)
 Construct input character sequence from a std::istream.
 
 Input (std::istream *istream)
 Construct input character sequence from a pointer to a std::istream.
 
Input & operator= (const Input &input)
 Copy assignment operator.
 
 operator const char * () const
 Cast this Input object to a string, returns NULL when this Input is not a string.
 
 operator const wchar_t * () const
 Cast this Input object to a wide character string, returns NULL when this Input is not a wide string.
 
 operator FILE * () const
 Cast this Input object to a file descriptor FILE*, returns NULL when this Input is not a FILE*.
 
 operator std::istream * () const
 Cast this Input object to a std::istream*, returns NULL when this Input is not a std::istream.
 
 operator bool () const
 
const char * cstring () const
 Get the remaining string of this Input object, returns NULL when this Input is not a string.
 
const wchar_t * wstring () const
 Get the remaining wide character string of this Input object, returns NULL when this Input is not a wide string.
 
FILE * file () const
 Get the FILE* of this Input object, returns NULL when this Input is not a FILE*.
 
std::istream * istream () const
 Get the std::istream of this Input object, returns NULL when this Input is not a std::istream.
 
size_t size ()
 Get the size of the input character sequence in number of ASCII/UTF-8 bytes (zero if size is not determinable from a FILE* or std::istream source).
 
bool assigned () const
 Check if this Input object was assigned a character sequence.
 
void clear ()
 Clear this Input by unassigning it.
 
bool good () const
 Check if input is available.
 
bool eof () const
 Check if input reached EOF.
 
int get ()
 Get a single character (unsigned char 0..255) or EOF (-1) when end-of-input is reached.
 
size_t get (char *s, size_t n)
 Copy character sequence data into buffer.
 
void file_encoding (file_encoding_type enc, const unsigned short *page=NULL)
 Set encoding for FILE* input.
 
file_encoding_type file_encoding () const
 Get encoding of the current FILE* input.
 
void init ()
 Initialize the state after (re)setting the input source, auto-detects UTF BOM in FILE* input if the file size is known.
 
void file_init ()
 Called by init() for a FILE*.
 
void wstring_size ()
 Called by size() for a wstring.
 
void file_size ()
 Called by size() for a FILE*.
 
void istream_size ()
 Called by size() for a std::istream.
 
size_t file_get (char *s, size_t n)
 Implements get() on a FILE*.
 
void set_handler (Handler *handler)
 Set FILE* handler.
 

Protected Attributes

const char * cstring_
 char string input (when non-null) of length reflex::Input::size_
 
const wchar_t * wstring_
 NUL-terminated wide string input (when non-null)
 
FILE * file_
 FILE* input (when non-null)
 
std::istream * istream_
 stream input (when non-null)
 
size_t size_
 size of the remaining input in bytes (size_ == 0 may indicate size is not set)
 
char utf8_ [8]
 UTF-8 normalization buffer, >=8 bytes.
 
unsigned short uidx_
 index in utf8_[]
 
unsigned short ulen_
 length of data (remaining after uidx_) in utf8_[] or 0 if no data
 
file_encoding_type utfx_
 file_encoding
 
const unsigned short * page_
 custom code page
 
Handler * handler_
 to handle FILE* errors and non-blocking FILE* reads
 

Detailed Description

Input character sequence class for unified access to sources of input text.

Description

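The Input class unifies access to a source of input text that constitutes a sequence of characters. As the constructors listed above show, the source may be a char string (NUL-terminated or with an explicit size), a std::string, a wide character string or std::wstring, an open FILE*, or a std::istream.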

Example

The following example shows how to use the Input class to read a character sequence in blocks from a std::ifstream to copy to stdout:

std::ifstream ifs;
ifs.open("input.h", std::ifstream::in);
reflex::Input input(ifs);
char buf[1024];
size_t len;
while ((len = input.get(buf, sizeof(buf))) > 0)
  fwrite(buf, 1, len, stdout);
if (!input.eof())
  std::cerr << "An IO error occurred" << std::endl;
ifs.close();

Example

The following example shows how to use the Input class to store the entire content of a file in a temporary buffer:

reflex::Input input(fopen("input.h", "r"));
if (input.file() == NULL)
  abort();
size_t len = input.size(); // file size (minus any leading UTF BOM)
char *buf = new char[len];
input.get(buf, len);
if (!input.eof())
  std::cerr << "An IO error occurred" << std::endl;
fwrite(buf, 1, len, stdout);
delete[] buf;
fclose(input.file());

In the above, files with UTF-16 and UTF-32 content are converted to UTF-8 by get(buf, len). Also, size() returns the total number of UTF-8 bytes that get(buf, len) copies into the buffer. The size is computed from the UTF-8/16/32 encoding of the file content, as indicated by a leading UTF BOM in the file. This means that UTF-16/32 files are read twice: first internally by size() and then again by get(buf, len).
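
Since the size computation depends on the encoding indicated by a leading UTF BOM, it may help to see how a BOM can be recognized from the first bytes of a file. The following is a minimal sketch independent of RE/flex; the enum Bom and the function detect_bom are hypothetical names and do not correspond to the reflex::Input::file_encoding constants.

```cpp
#include <cstddef>

// Hypothetical encoding tags for this sketch (not the reflex::Input constants).
enum class Bom { none, utf8, utf16be, utf16le, utf32be, utf32le };

// Inspect up to the first four bytes of a buffer and report which UTF BOM,
// if any, it starts with. UTF-32LE is tested before UTF-16LE because its BOM
// (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
inline Bom detect_bom(const unsigned char *p, std::size_t n)
{
  if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF)
    return Bom::utf32be;
  if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00)
    return Bom::utf32le;
  if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF)
    return Bom::utf8;
  if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF)
    return Bom::utf16be;
  if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE)
    return Bom::utf16le;
  return Bom::none;
}
```

Note that a UTF-16LE file whose first character after the BOM is U+0000 is indistinguishable from UTF-32LE with this test, which is an inherent ambiguity of BOM sniffing.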

Example

The following example shows how to use the Input class to read a character sequence in blocks from a file:

reflex::Input input(fopen("input.h", "r"));
char buf[1024];
size_t len;
while ((len = input.get(buf, sizeof(buf))) > 0)
  fwrite(buf, 1, len, stdout);
fclose(input);

Example

The following example shows how to use the Input class to echo characters one by one from stdin, e.g. reading input from a tty:

reflex::Input input(stdin);
char c;
while (input.get(&c, 1))
  fputc(c, stdout);

Or if you prefer to use an int character and check for EOF explicitly:

reflex::Input input(stdin);
int c;
while ((c = input.get()) != EOF)
  fputc(c, stdout);

Example

The following example shows how to use the Input class to read a character sequence in blocks from a wide character string, converting it to UTF-8 to copy to stdout:

reflex::Input input(L"Copyright ©"); // © is unicode U+00A9 and UTF-8 C2 A9
char buf[8];
size_t len;
while ((len = input.get(buf, sizeof(buf))) > 0)
  fwrite(buf, 1, len, stdout);

Example

The following example shows how to use the Input class to convert a wide character string to UTF-8:

reflex::Input input(L"Copyright ©"); // © is unicode U+00A9 and UTF-8 C2 A9
size_t len = input.size(); // size of UTF-8 string
char *buf = new char[len + 1];
input.get(buf, len);
buf[len] = '\0'; // make \0-terminated
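
To make the conversion concrete, here is a sketch of how a wide string, possibly holding UTF-16 surrogate pairs, maps to UTF-8 bytes. It is a standalone illustration of the encoding rules and makes no claim about how RE/flex implements the conversion; append_utf8 and to_utf8 are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of code point cp (assumed <= 0x10FFFF) to out.
inline void append_utf8(std::string& out, unsigned long cp)
{
  if (cp < 0x80)
    out.push_back(static_cast<char>(cp));
  else if (cp < 0x800)
  {
    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
  else if (cp < 0x10000)
  {
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
  else
  {
    out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}

// Convert a wide string to UTF-8, combining a UTF-16 high/low surrogate pair
// into one code point. Unpaired surrogates are passed through unchanged in
// this sketch.
inline std::string to_utf8(const std::wstring& ws)
{
  std::string out;
  for (std::size_t i = 0; i < ws.size(); ++i)
  {
    unsigned long cp = static_cast<unsigned long>(ws[i]);
    if (cp >= 0xD800 && cp < 0xDC00 && i + 1 < ws.size())
    {
      unsigned long lo = static_cast<unsigned long>(ws[i + 1]);
      if (lo >= 0xDC00 && lo < 0xE000)
      {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        ++i;
      }
    }
    append_utf8(out, cp);
  }
  return out;
}
```

With this sketch, the 11 wide characters of L"Copyright ©" become 12 UTF-8 bytes, because U+00A9 encodes as the two bytes C2 A9.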

Example

The following example shows how to switch source inputs while reading input byte by byte (use a buffer as shown in other examples to improve efficiency):

reflex::Input input = "Hello";
std::string message;
char c;
while (input.get(&c, 1))
  message.push_back(c);
input = L" world! To ∞ and beyond."; // switch input to a wide string
while (input.get(&c, 1))
  message.push_back(c);

Example

The following example shows how to use reflex::Input::streambuf to create an unbuffered std::istream:

reflex::Input input(fopen("legacy.txt", "r"), reflex::Input::file_encoding::ebcdic);
if (input.file() == NULL)
  abort();
reflex::Input::streambuf streambuf(input);
std::istream stream(&streambuf);
std::string data;
int c;
while ((c = stream.get()) != EOF)
  data.push_back(static_cast<char>(c));
fclose(input.file());

Use reflex::BufferedInput::streambuf instead to create a buffered std::istream:

reflex::Input input(fopen("legacy.txt", "r"), reflex::Input::file_encoding::ebcdic);
if (input.file() == NULL)
  abort();
reflex::BufferedInput::streambuf streambuf(input);
std::istream stream(&streambuf);
std::string data;
int c;
while ((c = stream.get()) != EOF)
  data.push_back(static_cast<char>(c));
fclose(input.file());
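
For readers unfamiliar with how an unbuffered stream buffer plugs into std::istream, the sketch below shows the two std::streambuf hooks that matter, underflow() for peeking and uflow() for reading. It wraps an arbitrary byte source that returns 0..255 or EOF per call, similar in spirit to get(). It is not the reflex::Input::streambuf implementation; ByteSourceBuf and read_all are hypothetical names.

```cpp
#include <cstdio>      // EOF
#include <functional>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical unbuffered stream buffer over a byte source. It holds at most
// one byte of lookahead, so std::istream::peek() works without a real buffer.
class ByteSourceBuf : public std::streambuf {
 public:
  explicit ByteSourceBuf(std::function<int()> next)
    : next_(std::move(next)), ch_(EOF), full_(false) { }
 protected:
  // Peek: fetch one byte of lookahead if none is held, do not consume it.
  int_type underflow() override
  {
    if (!full_)
    {
      ch_ = next_();
      full_ = true;
    }
    return ch_ == EOF ? traits_type::eof() : ch_;
  }
  // Read: return the lookahead byte and release it.
  int_type uflow() override
  {
    int_type c = underflow();
    full_ = false;
    return c;
  }
 private:
  std::function<int()> next_;  // returns 0..255 or EOF
  int ch_;                     // lookahead byte, valid when full_ is true
  bool full_;
};

// Read everything from a stream buffer through a std::istream.
inline std::string read_all(std::streambuf& sb)
{
  std::istream in(&sb);
  std::string out;
  char c;
  while (in.get(c))
    out.push_back(c);
  return out;
}
```

Because no get area is ever set up, every character read goes through uflow(), which is what makes the stream unbuffered at the cost of one virtual call per byte.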

Member Typedef Documentation

typedef unsigned short reflex::Input::file_encoding_type

Common file_encoding constants type.

Constructor & Destructor Documentation

reflex::Input::Input ( )
inline

Construct empty input character sequence.

reflex::Input::Input ( const Input &  input)
inline

Copy constructor (with intended "move semantics" as internal state is shared, should not rely on using the rhs after copying).

Parameters
input  an Input object to share state with (undefined behavior results from using both objects)
reflex::Input::Input ( const char *  cstring,
size_t  size 
)
inline

Construct input character sequence from a char* string.

Parameters
cstring  char string
size  length of the string
reflex::Input::Input ( const char *  cstring)
inline

Construct input character sequence from a NUL-terminated string.

Parameters
cstring  NUL-terminated char* string
reflex::Input::Input ( const std::string &  string)
inline

Construct input character sequence from a std::string.

Parameters
string  input string
reflex::Input::Input ( const std::string *  string)
inline

Construct input character sequence from a pointer to a std::string.

Parameters
string  input string
reflex::Input::Input ( const wchar_t *  wstring)
inline

Construct input character sequence from a NUL-terminated wide character string.

Parameters
wstring  NUL-terminated wchar_t* input string
reflex::Input::Input ( const std::wstring &  wstring)
inline

Construct input character sequence from a std::wstring (may contain UTF-16 surrogate pairs).

Parameters
wstring  input wide string
reflex::Input::Input ( const std::wstring *  wstring)
inline

Construct input character sequence from a pointer to a std::wstring (may contain UTF-16 surrogate pairs).

Parameters
wstring  input wide string
reflex::Input::Input ( FILE *  file)
inline

Construct input character sequence from an open FILE* file descriptor, supports UTF-8 conversion from UTF-16 and UTF-32.

Parameters
file  input file
reflex::Input::Input ( FILE *  file,
file_encoding_type  enc,
const unsigned short *  page = NULL 
)
inline

Construct input character sequence from an open FILE* file descriptor, using the specified file encoding.

Parameters
file  input file
enc  file_encoding (when UTF BOM is not present)
page  code page for file_encoding::custom
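
As an illustration of what a custom code page can do, the sketch below translates single bytes to UTF-8 through a lookup table. It assumes that the page argument points to a 256-entry table mapping each byte value to a Unicode code point; the source states only that it is a code page for file_encoding::custom, so treat the table layout as an assumption. The function append_translated is a hypothetical name.

```cpp
#include <string>

// Translate one input byte through a 256-entry code page (assumed layout)
// and append the resulting code point as UTF-8. Entries are unsigned short,
// so code points are at most U+FFFF and need at most three UTF-8 bytes.
inline void append_translated(std::string& out, unsigned char byte, const unsigned short *page)
{
  unsigned int cp = page[byte];
  if (cp < 0x80)
    out.push_back(static_cast<char>(cp));
  else if (cp < 0x800)
  {
    out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
  else
  {
    out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
    out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
    out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
  }
}
```

For example, a table that is the identity on ASCII and maps byte 0x80 to U+20AC (as Windows-1252 does) turns that byte into the three UTF-8 bytes E2 82 AC.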
reflex::Input::Input ( std::istream &  istream)
inline

Construct input character sequence from a std::istream.

Parameters
istream  input stream
reflex::Input::Input ( std::istream *  istream)
inline

Construct input character sequence from a pointer to a std::istream.

Parameters
istream  input stream

Member Function Documentation

bool reflex::Input::assigned ( ) const
inline

Check if this Input object was assigned a character sequence.

Returns
true if this Input object was assigned (not default constructed or cleared)
void reflex::Input::clear ( )
inline

Clear this Input by unassigning it.

const char* reflex::Input::cstring ( ) const
inline

Get the remaining string of this Input object, returns NULL when this Input is not a string.

Returns
remaining unbuffered part of the NUL-terminated string or NULL
bool reflex::Input::eof ( ) const
inline

Check if input reached EOF.

Returns
true if input is at EOF and no characters are available
FILE* reflex::Input::file ( ) const
inline

Get the FILE* of this Input object, returns NULL when this Input is not a FILE*.

Returns
pointer to current file descriptor or NULL
void reflex::Input::file_encoding ( file_encoding_type  enc,
const unsigned short *  page = NULL 
)

Set encoding for FILE* input.

Parameters
enc  file_encoding
page  custom code page for file_encoding::custom

file_encoding_type reflex::Input::file_encoding ( ) const

Get encoding of the current FILE* input.

Returns
current file_encoding constant
size_t reflex::Input::file_get ( char *  s,
size_t  n 
)

Implements get() on a FILE*.

Parameters
s  points to the string buffer to fill with input
n  size of buffer pointed to by s
void reflex::Input::file_init ( )

Called by init() for a FILE*.

void reflex::Input::file_size ( )

Called by size() for a FILE*.

int reflex::Input::get ( )
inline

Get a single character (unsigned char 0..255) or EOF (-1) when end-of-input is reached.
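
The reason for returning int rather than char is that all 256 byte values must remain distinguishable from EOF. A minimal in-memory sketch of the same contract follows; ByteReader is a hypothetical type used only for illustration and is not part of RE/flex.

```cpp
#include <cstddef>
#include <cstdio>  // EOF

// Hypothetical in-memory reader mirroring the contract described above:
// each call yields the next byte as a value in 0..255, or EOF at the end.
struct ByteReader {
  const char *data;
  std::size_t size;
  std::size_t pos;

  int get()
  {
    if (pos >= size)
      return EOF;
    // Convert through unsigned char so that byte 0xFF becomes 255, not -1.
    return static_cast<unsigned char>(data[pos++]);
  }
};
```

Storing the result in a char before comparing it with EOF would make byte 0xFF look like end of input on platforms where char is signed, which is why the earlier example declares c as int.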

size_t reflex::Input::get ( char *  s,
size_t  n 
)
inline

Copy character sequence data into buffer.

Returns
the nonzero number of (less or equal to n) 8-bit characters added to buffer s from the current input, or zero when EOF
Parameters
s  points to the string buffer to fill with input
n  size of buffer pointed to by s
bool reflex::Input::good ( ) const
inline

Check if input is available.

Returns
true if a non-empty sequence of characters is available to get
void reflex::Input::init ( )
inline

Initialize the state after (re)setting the input source, auto-detects UTF BOM in FILE* input if the file size is known.

std::istream* reflex::Input::istream ( ) const
inline

Get the std::istream of this Input object, returns NULL when this Input is not a std::istream.

Returns
pointer to current std::istream or NULL
void reflex::Input::istream_size ( )

Called by size() for a std::istream.

reflex::Input::operator bool ( ) const
inline
Returns
true if a non-empty sequence of characters is available to get
reflex::Input::operator const char * ( ) const
inline

Cast this Input object to a string, returns NULL when this Input is not a string.

Returns
remaining unbuffered part of a NUL-terminated string or NULL
reflex::Input::operator const wchar_t * ( ) const
inline

Cast this Input object to a wide character string, returns NULL when this Input is not a wide string.

Returns
remaining unbuffered part of the NUL-terminated wide character string or NULL
reflex::Input::operator FILE * ( ) const
inline

Cast this Input object to a file descriptor FILE*, returns NULL when this Input is not a FILE*.

Returns
pointer to current file descriptor or NULL
reflex::Input::operator std::istream * ( ) const
inline

Cast this Input object to a std::istream*, returns NULL when this Input is not a std::istream.

Returns
pointer to current std::istream or NULL
Input& reflex::Input::operator= ( const Input &  input)
inline

Copy assignment operator.

void reflex::Input::set_handler ( Handler *  handler)
inline

Set FILE* handler.

size_t reflex::Input::size ( )
inline

Get the size of the input character sequence in number of ASCII/UTF-8 bytes (zero if size is not determinable from a FILE* or std::istream source).

Returns
the nonzero number of ASCII/UTF-8 bytes available to read, or zero when source is empty or if size is not determinable e.g. when reading from standard input
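
To illustrate why a size can be unavailable for some sources, the sketch below measures the remaining bytes of a FILE* by seeking, and returns zero when the stream cannot be positioned (for example a pipe or a tty). This is one common technique and not necessarily how file_size() is implemented; remaining_bytes is a hypothetical name, and the byte count is only meaningful for streams opened in binary mode or on platforms without newline translation.

```cpp
#include <cstddef>
#include <cstdio>

// Return the number of bytes from the current position to the end of the
// file, or 0 when the stream is not seekable.
inline std::size_t remaining_bytes(FILE *file)
{
  long here = std::ftell(file);
  if (here < 0 || std::fseek(file, 0, SEEK_END) != 0)
    return 0;
  long end = std::ftell(file);
  std::fseek(file, here, SEEK_SET);  // restore the read position
  return end > here ? static_cast<std::size_t>(end - here) : 0;
}
```

A zero result is therefore ambiguous between an empty source and an unknown size, which matches the caveat in the description above.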
const wchar_t* reflex::Input::wstring ( ) const
inline

Get the remaining wide character string of this Input object, returns NULL when this Input is not a wide string.

Returns
remaining unbuffered part of the NUL-terminated wide character string or NULL
void reflex::Input::wstring_size ( )

Called by size() for a wstring.

Member Data Documentation

const char* reflex::Input::cstring_
protected

char string input (when non-null) of length reflex::Input::size_

FILE* reflex::Input::file_
protected

FILE* input (when non-null)

Handler* reflex::Input::handler_
protected

to handle FILE* errors and non-blocking FILE* reads

std::istream* reflex::Input::istream_
protected

stream input (when non-null)

const unsigned short* reflex::Input::page_
protected

custom code page

size_t reflex::Input::size_
protected

size of the remaining input in bytes (size_ == 0 may indicate size is not set)

unsigned short reflex::Input::uidx_
protected

index in utf8_[]

unsigned short reflex::Input::ulen_
protected

length of data (remaining after uidx_) in utf8_[] or 0 if no data

char reflex::Input::utf8_[8]
protected

UTF-8 normalization buffer, >=8 bytes.
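
The roles of a small UTF-8 buffer together with an index and a remaining length can be sketched as follows: when a converted character does not fit in the caller's buffer, the leftover bytes are kept and handed out on the next call. This is a hedged illustration of the general idea suggested by the member descriptions, not the actual RE/flex code; Utf8Stage and its members are hypothetical names.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical staging buffer: holds one encoded character and hands it out
// in pieces when the caller's buffer is too small to take it in one call.
struct Utf8Stage {
  char buf[8];         // pending UTF-8 bytes
  unsigned short idx;  // index of the next pending byte in buf
  unsigned short len;  // number of pending bytes, 0 if none

  // Stage the n (<= 8) bytes of one encoded character.
  void put(const char *s, unsigned short n)
  {
    std::memcpy(buf, s, n);
    idx = 0;
    len = n;
  }

  // Copy up to n pending bytes into s; returns the number copied.
  std::size_t drain(char *s, std::size_t n)
  {
    std::size_t k = len < n ? len : n;
    std::memcpy(s, buf + idx, k);
    idx = static_cast<unsigned short>(idx + k);
    len = static_cast<unsigned short>(len - k);
    return k;
  }
};
```

Staging a three-byte character and draining it into a two-byte buffer yields two bytes first and the remaining byte on the following call.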

file_encoding_type reflex::Input::utfx_
protected

file_encoding

const wchar_t* reflex::Input::wstring_
protected

NUL-terminated wide string input (when non-null)

