cppassist  1.0.0.f4fab4f962ff
C++ sanctuary for small but powerful and frequently required, stand alone features.
cppassist::Tokenizer Class Reference

Text parser tool that converts a text buffer into a stream of tokens.

#include <cppassist/include/cppassist/tokenizer/Tokenizer.h>

Classes

struct  Lookahead
 Token information from lookahead.
 
struct  Token
 Token.
 

Public Types

enum  Option {
  OptionParseStrings = 1, OptionParseNumber = 2, OptionParseBoolean = 4, OptionParseNull = 8,
  OptionCStyleComments = 16, OptionCppStyleComments = 32, OptionShellStyleComments = 64, OptionIncludeComments = 128
}
 Parser options.
 
enum  TokenType {
  TokenEndOfStream, TokenWhitespace, TokenComment, TokenStandalone,
  TokenString, TokenNumber, TokenBoolean, TokenNull,
  TokenSingleChar, TokenWord
}
 Token types.
 

Public Member Functions

 Tokenizer ()
 Constructor.

 ~Tokenizer ()
 Destructor.

unsigned int options () const
 Get parsing options.

void setOptions (unsigned int options)
 Set parsing options.

bool hasOption (Option option) const
 Check if a specific parsing option is set.

const std::string & whitespace () const
 Get whitespace characters.

void setWhitespace (const std::string &whitespace)
 Set whitespace characters.

const std::string & quotationMarks () const
 Get quotation marks.

void setQuotationMarks (const std::string &quotationMarks)
 Set quotation marks.

const std::string & singleCharacters () const
 Get single characters.

void setSingleCharacters (const std::string &singleCharacters)
 Set single characters.

const std::vector< std::string > & standalones () const
 Get standalone strings.

void setStandalones (const std::vector< std::string > &standalones)
 Set standalone strings.

bool loadDocument (const std::string &filename)
 Load text document to parse.

void setDocument (const std::string &document)
 Set text document to parse.

void setDocument (const char *beginDoc, const char *endDoc)
 Set text document to parse.

Token parseToken ()
 Parse and return the next token.
 

Detailed Description

Text parser tool that converts a text buffer into a stream of tokens.

Remarks
A tokenizer takes a text buffer and identifies individual tokens separated by whitespace. It returns those tokens one by one, omitting the whitespace in between. Higher-level text parsers (e.g., for JSON) can be built on top of this low-level parsing tool.

Member Enumeration Documentation

◆ Option

Parser options.

Enumerator
OptionParseStrings 

Parse strings (use setQuotationMarks() to set the characters that can enclose a string).

OptionParseNumber 

Parse numbers.

OptionParseBoolean 

Parse boolean values.

OptionParseNull 

Parse null value.

OptionCStyleComments 

Enable '/* */' for multi-line comments.

OptionCppStyleComments 

Enable '//' for one-line comments.

OptionShellStyleComments 

Enable '#' for one-line comments.

OptionIncludeComments 

Include comments in the output of the tokenizer.
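The enumerator values are powers of two, so several options can be combined into the single unsigned int that setOptions() accepts. The following is a minimal sketch of that bit-flag pattern. It uses a local copy of the documented values so that it compiles on its own; `isSet`, `jsonLikeOptions`, and `optionsExample` are hypothetical helper names and not part of cppassist.

```cpp
#include <cassert>

// Local mirror of the documented enumerator values (illustration only);
// real code would use cppassist::Tokenizer::Option instead.
enum Option : unsigned int {
    OptionParseStrings       = 1,
    OptionParseNumber        = 2,
    OptionParseBoolean       = 4,
    OptionParseNull          = 8,
    OptionCStyleComments     = 16,
    OptionCppStyleComments   = 32,
    OptionShellStyleComments = 64,
    OptionIncludeComments    = 128
};

// Hypothetical helper: test a single flag in a combined option mask.
bool isSet(unsigned int mask, Option option)
{
    return (mask & option) != 0;
}

// Hypothetical preset: the value types a JSON-like parser might enable.
unsigned int jsonLikeOptions()
{
    return OptionParseStrings | OptionParseNumber
         | OptionParseBoolean | OptionParseNull;
}

// Small self-check showing how the helpers combine.
void optionsExample()
{
    const unsigned int options = jsonLikeOptions();
    assert(isSet(options, OptionParseNull));
    assert(!isSet(options, OptionIncludeComments));
}
```

With the real class, such a mask would presumably be passed to setOptions(), and individual flags queried with hasOption().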

◆ TokenType

Token types.

Enumerator
TokenEndOfStream 

No token read, end of stream reached.

TokenWhitespace 

Token contains only whitespace.

TokenComment 

Token contains a comment.

TokenStandalone 

Token contains a standalone string.

TokenString 

Token contains a string.

TokenNumber 

Token contains a number.

TokenBoolean 

Token contains a boolean value.

TokenNull 

Token contains a null value.

TokenSingleChar 

Token contains a single character.

TokenWord 

Token contains a regular word (anything other than the above).

Constructor & Destructor Documentation

◆ Tokenizer()

cppassist::Tokenizer::Tokenizer ( )

Constructor.

◆ ~Tokenizer()

cppassist::Tokenizer::~Tokenizer ( )

Destructor.

Member Function Documentation

◆ options()

unsigned int cppassist::Tokenizer::options ( ) const

Get parsing options.

Returns
Parsing options

◆ setOptions()

void cppassist::Tokenizer::setOptions ( unsigned int  options)

Set parsing options.

Parameters
[in]  options  Parsing options

◆ hasOption()

bool cppassist::Tokenizer::hasOption ( Option  option) const

Check if a specific parsing option is set.

Returns
true if option is set, else false

◆ whitespace()

const std::string& cppassist::Tokenizer::whitespace ( ) const

Get whitespace characters.

Returns
Characters that are considered whitespace

◆ setWhitespace()

void cppassist::Tokenizer::setWhitespace ( const std::string &  whitespace)

Set whitespace characters.

Parameters
[in]  whitespace  Characters that are considered whitespace

◆ quotationMarks()

const std::string& cppassist::Tokenizer::quotationMarks ( ) const

Get quotation marks.

Returns
Characters that can enclose a string

◆ setQuotationMarks()

void cppassist::Tokenizer::setQuotationMarks ( const std::string &  quotationMarks)

Set quotation marks.

Parameters
[in]  quotationMarks  Characters that can enclose a string

◆ singleCharacters()

const std::string& cppassist::Tokenizer::singleCharacters ( ) const

Get single characters.

Returns
Characters that stand on their own

◆ setSingleCharacters()

void cppassist::Tokenizer::setSingleCharacters ( const std::string &  singleCharacters)

Set single characters.

Parameters
[in]  singleCharacters  Characters that stand on their own
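One plausible reading of "characters that stand on their own" is that such a character ends the current word and is reported as a piece of its own, even without surrounding whitespace. The sketch below illustrates that reading only. It is not cppassist code, and the actual behavior of the library may differ; `splitPieces` and `splitExample` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative splitter (an assumption, not the cppassist implementation):
// characters in `whitespace` only separate pieces, while characters in
// `singleCharacters` end the current word and become one-character pieces.
std::vector<std::string> splitPieces(const std::string & text,
                                     const std::string & whitespace,
                                     const std::string & singleCharacters)
{
    std::vector<std::string> pieces;
    std::string current;

    // Emit the word collected so far, if any.
    auto flush = [&]() {
        if (!current.empty()) {
            pieces.push_back(current);
            current.clear();
        }
    };

    for (char c : text) {
        if (whitespace.find(c) != std::string::npos) {
            flush();
        } else if (singleCharacters.find(c) != std::string::npos) {
            flush();
            pieces.push_back(std::string(1, c));
        } else {
            current += c;
        }
    }
    flush();

    return pieces;
}

// Small self-check: braces and the colon split off without whitespace.
void splitExample()
{
    const std::vector<std::string> pieces = splitPieces("{a: 10}", " \t\n", "{}:");
    assert(pieces.size() == 5);
    assert(pieces[1] == "a");
}
```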

◆ standalones()

const std::vector<std::string>& cppassist::Tokenizer::standalones ( ) const

Get standalone strings.

Returns
Strings that stand on their own

◆ setStandalones()

void cppassist::Tokenizer::setStandalones ( const std::vector< std::string > &  standalones)

Set standalone strings.

Parameters
[in]  standalones  Strings that stand on their own

◆ loadDocument()

bool cppassist::Tokenizer::loadDocument ( const std::string &  filename)

Load text document to parse.

Parameters
[in]  filename  Filename of text document
Returns
true if file could be loaded, else false

◆ setDocument() [1/2]

void cppassist::Tokenizer::setDocument ( const std::string &  document)

Set text document to parse.

Parameters
[in]  document  Text document

◆ setDocument() [2/2]

void cppassist::Tokenizer::setDocument ( const char *  beginDoc, const char *  endDoc )

Set text document to parse.

Parameters
[in]  beginDoc  Pointer to the first character inside the document
[in]  endDoc  Pointer to the first character outside of the document
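The two pointers describe a half-open range: beginDoc is included, endDoc is one past the last character. The sketch below shows how such a pair can be derived from a std::string, for the whole buffer or for a part of it. `CharRange`, `rangeOf`, `subRange`, and `length` are hypothetical names introduced for illustration; they are not part of cppassist.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// A half-open character range [begin, end), mirroring the documented
// meaning of beginDoc and endDoc. Illustration only.
struct CharRange {
    const char * begin; // first character inside the document
    const char * end;   // first character outside of the document
};

// Range covering the whole buffer.
CharRange rangeOf(const std::string & buffer)
{
    return { buffer.data(), buffer.data() + buffer.size() };
}

// Range covering `length` characters starting at `offset`.
CharRange subRange(const std::string & buffer, std::size_t offset, std::size_t length)
{
    assert(offset + length <= buffer.size());
    return { buffer.data() + offset, buffer.data() + offset + length };
}

// Number of characters in the range.
std::size_t length(const CharRange & range)
{
    return static_cast<std::size_t>(range.end - range.begin);
}
```

With the real class, the pair would be passed as `setDocument(range.begin, range.end)`. Because this overload takes raw pointers, it seems safest to assume that the underlying buffer has to stay alive and unmodified while tokens are being parsed; the documentation above does not state this either way.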

◆ parseToken()

Token cppassist::Tokenizer::parseToken ( )

Parse and return the next token.

Returns
The next token in the document
Remarks
If there are no tokens left, Token::type is set to TokenEndOfStream.
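The remark above suggests the usual consumer pattern: call parseToken() repeatedly and stop when the returned type is TokenEndOfStream. The sketch below demonstrates that loop against a small stand-in so that it compiles without the library. `MiniTokenizer` only splits on whitespace and is not the cppassist implementation; the `content` field, `collectWords`, and `parseLoopExample` are hypothetical names as well.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in types for illustration only; NOT the cppassist classes.
enum TokenType { TokenEndOfStream, TokenWord };

struct Token {
    TokenType   type;
    std::string content; // hypothetical field name
};

class MiniTokenizer {
public:
    void setDocument(const std::string & document)
    {
        m_document = document;
        m_pos = 0;
    }

    // Return the next whitespace-separated word, or an end-of-stream token.
    Token parseToken()
    {
        const std::string whitespace = " \t\r\n";

        // Skip leading whitespace.
        m_pos = m_document.find_first_not_of(whitespace, m_pos);
        if (m_pos == std::string::npos) {
            m_pos = m_document.size();
            return { TokenEndOfStream, "" };
        }

        // Find the end of the current word.
        std::size_t end = m_document.find_first_of(whitespace, m_pos);
        if (end == std::string::npos) {
            end = m_document.size();
        }

        Token token { TokenWord, m_document.substr(m_pos, end - m_pos) };
        m_pos = end;
        return token;
    }

private:
    std::string m_document;
    std::size_t m_pos = 0;
};

// The consumer loop: read tokens until the end of the stream is reported.
std::vector<std::string> collectWords(MiniTokenizer & tokenizer)
{
    std::vector<std::string> words;
    for (Token token = tokenizer.parseToken();
         token.type != TokenEndOfStream;
         token = tokenizer.parseToken()) {
        words.push_back(token.content);
    }
    return words;
}

// Small self-check of the loop.
void parseLoopExample()
{
    MiniTokenizer tokenizer;
    tokenizer.setDocument("alpha  beta\ngamma");
    assert(collectWords(tokenizer).size() == 3);
}
```

The same loop shape should carry over to the real Tokenizer, with the token type compared against its TokenEndOfStream enumerator.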
