⚙️ Normalization Functions

A set of functions to normalize Arabic text.

Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it.

- Wikipedia

normalizeLigature

Syntax

normalizeLigature(text)

Description

Normalizes Lam Alef ligatures into two separate letters (LAM and ALEF). The following ligatures are converted to LAM and ALEF:

LAM ALEF, LAM ALEF HAMZA ABOVE, LAM ALEF HAMZA BELOW, LAM ALEF MADDA ABOVE.

Parameters

  • text (string) The input text containing ligatures.

Returns

  • string The normalized text with ligatures converted.

Example
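The following is a minimal sketch of the behavior described above, not the library's actual implementation. The helper name `normalizeLigatureSketch` is hypothetical, and the sketch assumes that only the four isolated presentation forms (U+FEFB, U+FEF7, U+FEF9, U+FEF5) need to be handled.

```typescript
// Hypothetical stand-in for normalizeLigature (assumption: only the isolated
// Lam Alef presentation forms are converted).
const LIG_LAM = "\u0644"; // ل
const LIG_ALEF = "\u0627"; // ا

// LAM ALEF, LAM ALEF HAMZA ABOVE, LAM ALEF HAMZA BELOW, LAM ALEF MADDA ABOVE
const LAM_ALEF_LIGATURES = /[\uFEFB\uFEF7\uFEF9\uFEF5]/g;

function normalizeLigatureSketch(text: string): string {
  // Every matched ligature becomes the two-letter sequence LAM + ALEF.
  return text.replace(LAM_ALEF_LIGATURES, LIG_LAM + LIG_ALEF);
}

// A single ligature code point expands to two letters.
console.log(normalizeLigatureSketch("\uFEFB").length); // 2
```

Text without ligatures passes through unchanged, so the function is safe to apply to any input.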

normalizeHamza

Syntax

normalizeHamza(word, method)

Description

Normalizes the hamza forms in the input word into a single form of hamza: Madda is replaced by hamza and alef, and Lam Alef ligatures are replaced by simplified letters. The normalization method can be specified as "uniform" or "tasheel".

Parameters

  • word (string) The input word containing hamza characters.

  • method (string, optional) The method of normalization. Default is "uniform". Accepted values are "uniform", "tasheel", and "تسهيل".

Returns

  • string The normalized word with hamza characters converted.

Example
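The sketch below illustrates the two methods in a simplified form. It is not the library's implementation: the name `normalizeHamzaSketch` is hypothetical, the "tasheel" branch assumes that the hamza is dropped and its carrier letter kept, and the Lam Alef handling mentioned in the description is omitted. The real function may apply additional rules that are not reproduced here.

```typescript
// Hypothetical, simplified stand-in for normalizeHamza.
const HAMZA = "\u0621"; // ء
const PLAIN_ALEF = "\u0627"; // ا

type HamzaMethod = "uniform" | "tasheel" | "تسهيل";

function normalizeHamzaSketch(word: string, method: HamzaMethod = "uniform"): string {
  if (method === "tasheel" || method === "تسهيل") {
    // Assumed "tasheel" behavior: drop the hamza and keep the carrier letter.
    return word
      .replace(/[\u0622\u0623\u0625]/g, PLAIN_ALEF) // آ أ إ -> ا
      .replace(/\u0624/g, "\u0648") // ؤ -> و
      .replace(/\u0626/g, "\u064A"); // ئ -> ي
  }
  // Assumed "uniform" behavior: madda becomes hamza + alef, and every other
  // hamza form collapses to the bare hamza.
  return word
    .replace(/\u0622/g, HAMZA + PLAIN_ALEF) // آ -> ءا
    .replace(/[\u0623\u0624\u0625\u0626]/g, HAMZA); // أ ؤ إ ئ -> ء
}

console.log(normalizeHamzaSketch("\u0633\u0626\u0644")); // سءل
console.log(normalizeHamzaSketch("\u0633\u0626\u0644", "tasheel")); // سيل
```

Collapsing all hamza forms into one is mainly useful for search and comparison, where spelling variants of the same word should match.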

normalizeTeh

Syntax

normalizeTeh(text)

Description

Replaces teh marbuta characters in the input text with their corresponding heh characters.

Parameters

  • text (string) The input text containing teh marbuta characters.

Returns

  • string The normalized text with teh marbuta characters replaced.

Example
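A minimal sketch of the replacement described above, assuming a one-to-one mapping from TEH MARBUTA (U+0629) to HEH (U+0647). The name `normalizeTehSketch` is hypothetical and does not refer to the library's own code.

```typescript
// Hypothetical stand-in for normalizeTeh.
const TEH_MARBUTA = /\u0629/g; // ة
const HEH = "\u0647"; // ه

function normalizeTehSketch(text: string): string {
  // Replace every teh marbuta, wherever it appears in the text.
  return text.replace(TEH_MARBUTA, HEH);
}

console.log(normalizeTehSketch("\u0645\u062F\u0631\u0633\u0629")); // مدرسه
```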

normalizeAlef

Syntax

normalizeAlef(text)

Description

Replaces all Alef variants with ALEF MAMDODA, with the exception of Alef maksura.

Parameters

  • text (string) The input text containing alef characters.

Returns

  • string The normalized text with alef characters replaced.

Example
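A minimal sketch of this behavior under two assumptions: that "ALEF MAMDODA" refers to the plain ALEF (U+0627), and that the variants to replace are ALEF WITH MADDA ABOVE, ALEF WITH HAMZA ABOVE, ALEF WITH HAMZA BELOW, and ALEF WASLA. The name `normalizeAlefSketch` is hypothetical, and the library's actual character set may differ.

```typescript
// Hypothetical stand-in for normalizeAlef.
const ALEF_VARIANTS = /[\u0622\u0623\u0625\u0671]/g; // آ أ إ ٱ
const BARE_ALEF = "\u0627"; // ا (assumed target form)

function normalizeAlefSketch(text: string): string {
  // Alef maksura (U+0649) is not in the pattern, so it is left untouched.
  return text.replace(ALEF_VARIANTS, BARE_ALEF);
}

console.log(normalizeAlefSketch("\u0623\u062D\u0645\u062F")); // احمد
console.log(normalizeAlefSketch("\u0625\u0644\u0649")); // الى
```

In the second call the final alef maksura is preserved while the leading alef with hamza below is replaced.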