Character Sets
A character set represents a subset of low-ASCII characters and serves as a building block for constructing rules. The library models character sets as callable predicates invocable with this equivalent signature:
/// Return true if ch is in the set
bool( char ch ) const noexcept;
The CharSet concept describes the requirements on
syntax and semantics for these types. Here we declare
a character set type that includes the horizontal and
vertical whitespace characters:
// code_grammar_2_2
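As a minimal sketch of such a type, assuming only the predicate signature shown earlier: the name `ws_chars_t` and the exact choice of member characters are illustrative assumptions, not taken from the library.

```cpp
// Hypothetical character set type; the name and the member
// characters are assumptions made for illustration.
struct ws_chars_t
{
    // Return true if ch is horizontal or vertical whitespace
    constexpr bool operator()( char ch ) const noexcept
    {
        return ch == ' '  || ch == '\t' ||   // horizontal
               ch == '\r' || ch == '\n';     // vertical
    }
};

// The predicate can be evaluated at compile time
static_assert( ws_chars_t{}( '\t' ), "tab is whitespace" );
static_assert( ! ws_chars_t{}( 'x' ), "letters are not whitespace" );
```

The type holds no state, so constructing and copying an instance costs nothing.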
The type trait is_charset determines if a type meets
the requirements:
// code_grammar_2_3
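To illustrate the kind of check such a trait can perform, here is a sketch of a detection trait written with the standard library only. The name `is_charset_like` is hypothetical and was chosen to avoid implying that this is how the library implements `is_charset`; it tests only that a const instance can be called with a `char` and yields something convertible to `bool`.

```cpp
#include <type_traits>
#include <utility>

// Primary template: types are not character sets by default
template< class T, class = void >
struct is_charset_like : std::false_type {};

// Selected when `t( ch )` is well-formed for a const T and a char,
// and the result converts to bool
template< class T >
struct is_charset_like< T, typename std::enable_if<
    std::is_convertible<
        decltype( std::declval< T const& >()( std::declval< char >() ) ),
        bool >::value >::type >
    : std::true_type {};

// Hypothetical set used to exercise the trait
struct digit_chars_t
{
    constexpr bool operator()( char ch ) const noexcept
    {
        return ch >= '0' && ch <= '9';
    }
};

static_assert( is_charset_like< digit_chars_t >::value, "callable with char" );
static_assert( ! is_charset_like< int >::value, "int is not callable" );
```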
Character sets are always passed as values. As with rules,
we declare an instance of the type for notational convenience.
The constexpr designation is used to make it a zero-cost
abstraction:
// code_grammar_2_4
For best results, ensure that user-defined character set types
are constexpr constructible.
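The pattern of a constexpr instance passed by value can be sketched as follows. All names here (`alpha_chars_t`, `alpha_chars_sketch`, `count_in`) are hypothetical and exist only for this illustration.

```cpp
// Hypothetical character set type
struct alpha_chars_t
{
    constexpr bool operator()( char ch ) const noexcept
    {
        return ( ch >= 'a' && ch <= 'z' ) ||
               ( ch >= 'A' && ch <= 'Z' );
    }
};

// A named constant instance, usable wherever a set is expected
constexpr alpha_chars_t alpha_chars_sketch{};

// Hypothetical algorithm that receives the set by value and counts
// the characters of a null-terminated string that belong to it
template< class CharSet >
constexpr int count_in( char const* s, CharSet cs ) noexcept
{
    int n = 0;
    for( ; *s; ++s )
        if( cs( *s ) )
            ++n;
    return n;
}

// Because the type is constexpr constructible, the whole
// computation can happen at compile time
static_assert( count_in( "a1b2c3", alpha_chars_sketch ) == 3, "three letters" );
```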
The functions find_if and find_if_not are used to
search a string for the first matching or the first non-matching
character from a set. The example below skips any leading
whitespace and then returns everything from the first
non-whitespace character to the last non-whitespace
character:
// code_grammar_2_5
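The search-and-trim logic described above can be sketched without the library. The helpers `find_if_sketch` and `find_if_not_sketch` are hypothetical stand-ins written for this example, and their pointer-pair signatures are an assumption; `blank_chars_t` and `trim_ws` are likewise illustrative names.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical whitespace set, defined here so the sketch stands alone
struct blank_chars_t
{
    constexpr bool operator()( char ch ) const noexcept
    {
        return ch == ' ' || ch == '\t' || ch == '\r' || ch == '\n';
    }
};

// Stand-in: first character in [first, last) that IS in the set
template< class CharSet >
constexpr char const*
find_if_sketch( char const* first, char const* last, CharSet const& cs ) noexcept
{
    while( first != last && ! cs( *first ) )
        ++first;
    return first;
}

// Stand-in: first character in [first, last) that is NOT in the set
template< class CharSet >
constexpr char const*
find_if_not_sketch( char const* first, char const* last, CharSet const& cs ) noexcept
{
    while( first != last && cs( *first ) )
        ++first;
    return first;
}

// Return the span from the first to the last non-whitespace character
constexpr std::string_view
trim_ws( std::string_view s ) noexcept
{
    constexpr blank_chars_t ws{};
    char const* first = s.data();
    char const* last = first + s.size();
    // skip leading whitespace
    first = find_if_not_sketch( first, last, ws );
    // back up over trailing whitespace
    while( last != first && ws( last[-1] ) )
        --last;
    return std::string_view( first, static_cast< std::size_t >( last - first ) );
}
```

An input consisting only of whitespace yields an empty view, since the forward search reaches the end before the backward scan begins.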
The function can now be called as follows:
// code_grammar_2_6
The library provides these often-used character sets:
Some of the character sets in the library have implementations optimized for the particular character set or optimized in general, often in ways that take advantage of opportunities not available to standard library facilities. For example, some implementations use custom code based on Streaming SIMD Extensions 2 (SSE2), which is available on all x64 processors and on most modern 32-bit x86 processors.
The lut_chars Type
The lut_chars type satisfies the CharSet
requirements and offers an optimized constexpr
implementation which provides enhanced performance
and notational convenience for specifying character
sets. Compile-time instances can be constructed
from strings:
// code_grammar_2_7
We can use operator+ and operator- notation to add and
remove elements from the set at compile time. For example,
sometimes the character 'y' sounds like a vowel:
// code_grammar_2_8
The type is named after its implementation, which is a lookup table ("lut") of packed bits. This allows for a variety of construction methods and flexible composition. Here we create the set of visible characters using a lambda:
// code_grammar_2_9
Alternatively:
// code_grammar_2_10
Differences can be calculated with operator-:
// code_grammar_2_11
We can also remove individual characters:
// code_grammar_2_12
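The idea of a lookup table of packed bits, composed with operator+ and operator-, can be illustrated with a small standalone sketch. This is not the library's lut_chars: the class `bit_set_sketch`, its four-word layout, and the sample sets are assumptions made purely for illustration.

```cpp
#include <cstdint>

// Hypothetical miniature of a packed-bit lookup table
class bit_set_sketch
{
    std::uint64_t w_[4] = { 0, 0, 0, 0 };  // 256 bits, one per char value

    constexpr void set( unsigned char c ) noexcept
    {
        w_[ c >> 6 ] |= std::uint64_t( 1 ) << ( c & 63 );
    }

public:
    constexpr bit_set_sketch() = default;

    // Construct from a null-terminated string of member characters
    constexpr bit_set_sketch( char const* s ) noexcept
    {
        for( ; *s; ++s )
            set( static_cast< unsigned char >( *s ) );
    }

    // Membership test: one index, one shift, one mask
    constexpr bool operator()( char ch ) const noexcept
    {
        auto const c = static_cast< unsigned char >( ch );
        return ( ( w_[ c >> 6 ] >> ( c & 63 ) ) & 1 ) != 0;
    }

    // Union of two sets
    friend constexpr bit_set_sketch
    operator+( bit_set_sketch a, bit_set_sketch const& b ) noexcept
    {
        for( int i = 0; i < 4; ++i )
            a.w_[i] |= b.w_[i];
        return a;
    }

    // Difference: members of a that are not in b
    friend constexpr bit_set_sketch
    operator-( bit_set_sketch a, bit_set_sketch const& b ) noexcept
    {
        for( int i = 0; i < 4; ++i )
            a.w_[i] &= ~b.w_[i];
        return a;
    }
};

// Sets built and combined entirely at compile time
constexpr bit_set_sketch vowels_sketch( "AEIOUaeiou" );
constexpr bit_set_sketch vowels_and_y_sketch =
    vowels_sketch + bit_set_sketch( "Yy" );
```

Because membership reduces to a shift and a mask on a fixed-size table, every set costs the same to query regardless of how many characters it contains.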