⚙️Normalization Functions
A set of functions to normalize Arabic text.
Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it.
normalizeLigature
Syntax
Description
Normalizes Lam-Alef ligatures into two separate letters, LAM and ALEF. The ligatures converted are:
LAM ALEF, LAM ALEF HAMZA ABOVE, LAM ALEF HAMZA BELOW, and LAM ALEF MADDA ABOVE.
Parameters
text
(string) The input text containing ligatures.
Returns
string
The normalized text with ligatures converted.
Example
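A minimal TypeScript sketch of the behavior described above. It assumes the four ligatures are matched in their isolated presentation forms (U+FEFB, U+FEF7, U+FEF9, U+FEF5) and that each is rewritten as plain LAM followed by plain ALEF. The name `normalizeLigatureSketch` is a hypothetical stand-in for illustration, not the library's own implementation.

```typescript
// Hypothetical re-implementation for illustration only.
const LAM = "\u0644";  // ARABIC LETTER LAM
const ALEF = "\u0627"; // ARABIC LETTER ALEF

// Assumption: only the isolated presentation forms are handled.
// U+FEFB LAM ALEF, U+FEF7 LAM ALEF HAMZA ABOVE,
// U+FEF9 LAM ALEF HAMZA BELOW, U+FEF5 LAM ALEF MADDA ABOVE
const LAM_ALEF_LIGATURES = /[\uFEFB\uFEF7\uFEF9\uFEF5]/g;

function normalizeLigatureSketch(text: string): string {
  // Each single-codepoint ligature becomes the two letters LAM + ALEF.
  return text.replace(LAM_ALEF_LIGATURES, LAM + ALEF);
}

// One ligature code point in, two letters out.
const ligatureResult = normalizeLigatureSketch("\uFEFB");
console.log(ligatureResult.length); // 2
```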
normalizeHamza
Syntax
Description
Normalizes the hamza forms in the input word into a single form of hamza: Madda is replaced by hamza followed by alef, and Lam-Alef ligatures are replaced by their simplified letters. The normalization method can be specified as "uniform" or "tasheel".
Parameters
word
(string) The input word containing hamza characters.
method
(string, optional) The method of normalization. Default is "uniform". Can be "uniform", "tasheel", or "تسهيل".
Returns
string
The normalized word with hamza characters converted.
Example
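A rough TypeScript sketch of how the two methods might differ. The exact rules are not spelled out above, so the mapping here is an assumption: "uniform" rewrites ALEF MADDA as HAMZA + ALEF and collapses every hamza carrier to the standalone HAMZA, while "tasheel" (or "تسهيل") drops the hamza and keeps the bare carrier letter. Lam-Alef handling is left out, and `normalizeHamzaSketch` and `HamzaMethod` are hypothetical names, not the library's own.

```typescript
// Hypothetical re-implementation for illustration only.
type HamzaMethod = "uniform" | "tasheel" | "تسهيل";

function normalizeHamzaSketch(word: string, method: HamzaMethod = "uniform"): string {
  if (method === "tasheel" || method === "تسهيل") {
    // Assumption: tasheel keeps the carrier letter and drops the hamza.
    return word
      .replace(/[\u0622\u0623\u0625]/g, "\u0627") // alef variants -> ALEF
      .replace(/\u0624/g, "\u0648")               // WAW WITH HAMZA -> WAW
      .replace(/\u0626/g, "\u064A");              // YEH WITH HAMZA -> YEH
  }
  // Assumption: uniform maps every hamza form to the standalone HAMZA (U+0621).
  return word
    .replace(/\u0622/g, "\u0621\u0627")               // ALEF MADDA -> HAMZA + ALEF
    .replace(/[\u0623\u0624\u0625\u0626]/g, "\u0621"); // carriers -> HAMZA
}

// SEEN + ALEF WITH HAMZA ABOVE + LAM under both methods.
const uniformResult = normalizeHamzaSketch("\u0633\u0623\u0644");
const tasheelResult = normalizeHamzaSketch("\u0633\u0623\u0644", "tasheel");
console.log(uniformResult, tasheelResult);
```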
normalizeTeh
Syntax
Description
Replaces teh marbuta characters in the input text with their corresponding heh characters.
Parameters
text
(string) The input text containing teh marbuta characters.
Returns
string
The normalized text with teh marbuta characters replaced.
Example
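A minimal TypeScript sketch of the replacement described above, assuming it is a direct substitution of TEH MARBUTA (U+0629) with HEH (U+0647) everywhere in the text. `normalizeTehSketch` is a hypothetical name used for illustration.

```typescript
// Hypothetical re-implementation for illustration only.
const TEH_MARBUTA = /\u0629/g; // ARABIC LETTER TEH MARBUTA
const HEH = "\u0647";          // ARABIC LETTER HEH

function normalizeTehSketch(text: string): string {
  // Every teh marbuta is replaced; all other characters pass through.
  return text.replace(TEH_MARBUTA, HEH);
}

// "مدرسة" (ends in teh marbuta) becomes "مدرسه" (ends in heh).
const tehResult = normalizeTehSketch("\u0645\u062F\u0631\u0633\u0629");
console.log(tehResult);
```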
normalizeAlef
Syntax
Description
Replaces all alef variants with ALEF MAMDODA, with the exception of Alef maksura.
Parameters
text
(string) The input text containing alef characters.
Returns
string
The normalized text with alef characters replaced.
Example
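A minimal TypeScript sketch of the rule described above. Two assumptions are made here: "ALEF MAMDODA" is read as the plain ALEF (U+0627), and the variants replaced are ALEF WITH MADDA ABOVE (U+0622), ALEF WITH HAMZA ABOVE (U+0623), and ALEF WITH HAMZA BELOW (U+0625). ALEF MAKSURA (U+0649) is left untouched. `normalizeAlefSketch` is a hypothetical name, and the real function may cover a different set of variants.

```typescript
// Hypothetical re-implementation for illustration only.
// Assumption: these three variants are the ones normalized.
const ALEF_VARIANTS = /[\u0622\u0623\u0625]/g;
const PLAIN_ALEF = "\u0627"; // ARABIC LETTER ALEF

function normalizeAlefSketch(text: string): string {
  // ALEF MAKSURA (U+0649) is not in the pattern, so it is preserved.
  return text.replace(ALEF_VARIANTS, PLAIN_ALEF);
}

// "إلى": the leading alef variant is normalized, the final alef maksura is kept.
const alefResult = normalizeAlefSketch("\u0625\u0644\u0649");
console.log(alefResult);
```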