Tokenization Functions
A set of functions to tokenize a given text into sentences or words, with optional conditions and morphological operations.
Tokenization is a way of separating a piece of text into smaller units called tokens.
tokenize
Tokenizes a given text into words, with optional conditions and morphological operations.
Parameters
text
(String) The input text to be tokenized into words.
conditions
(Array | Function, optional) An array of functions, each representing a condition that a token must meet. If a single function is provided, it will be wrapped in an array.
morphs
(Array | Function, optional) An array of functions, each representing a morphological operation to be applied to the tokens. If a single function is provided, it will be wrapped in an array.
Returns
Array
An array of tokens.
Example
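The snippet below is a minimal, self-contained sketch of how a function with this signature might behave. It is not the library's implementation: the splitting rule (runs of letters, digits, and apostrophes), the order in which conditions and morphs are applied, and the helper names `longerThanTwo` and `toLowerCase` are all assumptions made for illustration.

```javascript
// Hypothetical stand-in for the documented tokenize(text, conditions, morphs).
// Assumption: a word is a run of letters, digits, or apostrophes.
function tokenize(text, conditions = [], morphs = []) {
  // A single function is wrapped in an array, as the parameter docs describe.
  const conds = Array.isArray(conditions) ? conditions : [conditions];
  const ops = Array.isArray(morphs) ? morphs : [morphs];

  const words = text.match(/[\p{L}\p{N}']+/gu) || [];

  return words
    // Keep a token only if it satisfies every condition.
    .filter((token) => conds.every((cond) => cond(token)))
    // Assumption: morphs run after filtering, each one in the order given.
    .map((token) => ops.reduce((current, op) => op(current), token));
}

// Illustrative helpers (not part of the documented API).
const longerThanTwo = (token) => token.length > 2;
const toLowerCase = (token) => token.toLowerCase();

const tokens = tokenize("The quick brown fox is on it.", longerThanTwo, toLowerCase);
console.log(tokens); // [ 'the', 'quick', 'brown', 'fox' ]
```

In this sketch the conditions see the raw token, so a condition that checks capitalization would still work even when a lowercasing morph is supplied. The real library may order these steps differently.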
sentenceTokenize
Tokenizes a given text into sentences.
Parameters
text
(String) The input text to be tokenized into sentences.
Returns
Array
An array of sentences.
Example
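As a rough illustration of sentence splitting, the sketch below treats any run of `.`, `!`, or `?` as a sentence boundary. This boundary rule is an assumption, not the library's documented behavior, and it will split incorrectly on abbreviations such as "Dr." or on decimal numbers.

```javascript
// Hypothetical stand-in for the documented sentenceTokenize(text).
// Assumption: a sentence ends at a run of '.', '!' or '?', or at the end of the text.
function sentenceTokenize(text) {
  const pieces = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  // Trim surrounding whitespace and drop any empty pieces.
  return pieces.map((piece) => piece.trim()).filter((piece) => piece.length > 0);
}

const sentences = sentenceTokenize("Tokenization splits text. Is it hard? Not really!");
console.log(sentences);
// [ 'Tokenization splits text.', 'Is it hard?', 'Not really!' ]
```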
tokenizeWithLocation
Tokenizes a given text into words and provides the start and end indices of each token in the original text.
Parameters
text
(String) The input text to be tokenized into words.
Returns
Array
An array of objects, each containing a token and its start and end indices in the original text.
Example
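The following sketch shows one way the start and end indices could be produced, using `String.prototype.matchAll`. The property names `token`, `start`, and `end`, the exclusive end index, and the word pattern are assumptions for illustration; the library's actual result shape may differ.

```javascript
// Hypothetical stand-in for the documented tokenizeWithLocation(text).
// Assumptions: result objects use the keys token/start/end, and `end` is exclusive.
function tokenizeWithLocation(text) {
  const results = [];
  const wordPattern = /[\p{L}\p{N}']+/gu;
  for (const match of text.matchAll(wordPattern)) {
    results.push({
      token: match[0],
      start: match.index, // offset of the first character (UTF-16 code units)
      end: match.index + match[0].length, // offset just past the last character
    });
  }
  return results;
}

const located = tokenizeWithLocation("Hello, world");
console.log(located);
// [ { token: 'Hello', start: 0, end: 5 }, { token: 'world', start: 7, end: 12 } ]
```

With an exclusive end index, `text.slice(start, end)` returns the token itself, which makes it easy to highlight or replace tokens in the original string.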