⚙️Normalization Functions

A set of functions to normalize Arabic text.

Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it.

-wikipedia

normalizeLigature

Syntax

normalizeLigature(text)

Description

Normalizes Lam Alef ligatures into two separate letters (LAM and ALEF). The following ligatures are converted into LAM and ALEF:

LAM ALEF, LAM ALEF HAMZA ABOVE, LAM ALEF HAMZA BELOW, LAM ALEF MADDA ABOVE.

Parameters

  • text (string) The input text containing ligatures.

Returns

  • string The normalized text with ligatures converted.

Example

const text = "لانها لالء الاسلام";
const normalizedLigature = normalizeLigature(text);
console.log(normalizedLigature); // "لانها لالئ الاسلام"
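To make the conversion concrete, here is a minimal sketch of how such a replacement could be written. It is not the library's implementation: the name `sketchNormalizeLigature` is hypothetical, and covering both the isolated and final presentation forms (U+FEF5 to U+FEFC) is an assumption.

```javascript
// Hypothetical sketch, not the library's actual code.
// U+FEF5..U+FEFC hold the isolated and final forms of the four Lam Alef
// ligatures: madda above, hamza above, hamza below, and plain.
const LAM_ALEF_LIGATURES = /[\uFEF5-\uFEFC]/g;
const SIMPLE_LAM_ALEF = "\u0644\u0627"; // LAM followed by ALEF

function sketchNormalizeLigature(text) {
  // Every ligature code point becomes the two-letter sequence LAM + ALEF.
  return text.replace(LAM_ALEF_LIGATURES, SIMPLE_LAM_ALEF);
}

console.log(sketchNormalizeLigature("\uFEFB") === "\u0644\u0627"); // true
```

Writing the ligatures as Unicode escapes keeps the input unambiguous, since many editors render a ligature and its two-letter spelling identically.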

normalizeHamza

Syntax

normalizeHamza(word, method)

Description

Normalizes the hamza forms (hamzat) in the input word into a single form of hamza: Madda is replaced by hamza and alef, and Lam Alef ligatures are replaced by simplified letters. The normalization method can be specified as "uniform" or "tasheel".

Parameters

  • word (string) The input word containing hamza characters.

  • method (string, optional) The method of normalization. Default is "uniform". Accepted values are "uniform", "tasheel", or "تسهيل" (the Arabic spelling of tasheel).

Returns

  • string The normalized word with hamza characters converted.

Example

const text = "جاء سؤال الأئمة عن الإسلام آجلا";
const normalizedHamza = normalizeHamza(text);
const normalizedHamzaWithTasheel = normalizeHamza(text, "tasheel");
console.log(normalizedHamza); // "جاء سءال الءءمة عن الءسلام ءءجلا"
console.log(normalizedHamzaWithTasheel); //جاء سوال الايمة عن الاسلام ا
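The two methods can be approximated with a few character substitutions. The sketch below is inferred only from the example outputs above and is not the library's implementation: the name `sketchNormalizeHamza` is hypothetical, and the exact mappings (for instance, leaving a standalone hamza untouched in tasheel mode) are assumptions.

```javascript
// Hypothetical sketch inferred from the documented example outputs.
const HAMZA = "\u0621";
const ALEF = "\u0627";
const WAW = "\u0648";
const YEH = "\u064A";
const TASHEEL_AR = "\u062A\u0633\u0647\u064A\u0644"; // Arabic spelling of "tasheel"

function sketchNormalizeHamza(word, method = "uniform") {
  if (method === "tasheel" || method === TASHEEL_AR) {
    return word
      .replace(/[\u0622\u0623\u0625]/g, ALEF) // ALEF MADDA / HAMZA ABOVE / HAMZA BELOW -> ALEF
      .replace(/\u0624/g, WAW)                // WAW HAMZA -> WAW
      .replace(/\u0626/g, YEH);               // YEH HAMZA -> YEH
  }
  // "uniform" (assumed default): every hamza carrier becomes a bare HAMZA.
  return word
    .replace(/\u0622/g, HAMZA + HAMZA)        // ALEF MADDA -> HAMZA HAMZA
    .replace(/[\u0623-\u0626]/g, HAMZA);      // U+0623..U+0626 -> HAMZA
}
```

In uniform mode every word ends up using the same hamza character, which suits exact matching; tasheel mode instead drops the hamza in favor of its carrier letter, which is closer to how many users type.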

normalizeTeh

Syntax

normalizeTeh(text)

Description

Replaces teh marbuta characters in the input text with their corresponding heh characters.

Parameters

  • text (string) The input text containing teh marbuta characters.

Returns

  • string The normalized text with teh marbuta characters replaced.

Example

const text = "سيارة";
const normalizedTeh = normalizeTeh(text);
console.log(normalizedTeh); // "سياره"
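The replacement described above amounts to a single character substitution. A minimal sketch follows, with the hypothetical name `sketchNormalizeTeh`; it is an illustration, not the library's implementation.

```javascript
// Hypothetical sketch, not the library's actual code.
const TEH_MARBUTA = /\u0629/g; // ARABIC LETTER TEH MARBUTA
const HEH = "\u0647";          // ARABIC LETTER HEH

function sketchNormalizeTeh(text) {
  // Replace every teh marbuta with heh; all other characters are untouched.
  return text.replace(TEH_MARBUTA, HEH);
}
```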

normalizeAlef

Syntax

normalizeAlef(text)

Description

Replaces all alef variants with ALEF MAMDODA (the plain alef, as the example shows), with the exception of alef maksura.

Parameters

  • text (string) The input text containing alef characters.

Returns

  • string The normalized text with alef characters replaced.

Example

const text = "ألأيمان";
const normalizedAlef = normalizeAlef(text);
console.log(normalizedAlef); // "الايمان"
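A rough sketch of this behavior is shown here. The name `sketchNormalizeAlef` is hypothetical, and the set of variants it handles (alef with madda, hamza above, hamza below, and alef wasla) is an assumption rather than the library's documented list.

```javascript
// Hypothetical sketch, not the library's actual code.
// Assumed variants: ALEF MADDA, ALEF HAMZA ABOVE, ALEF HAMZA BELOW, ALEF WASLA.
const ALEF_VARIANTS = /[\u0622\u0623\u0625\u0671]/g;
const PLAIN_ALEF = "\u0627";

function sketchNormalizeAlef(text) {
  // ALEF MAKSURA (U+0649) is not in the character class, so it is preserved.
  return text.replace(ALEF_VARIANTS, PLAIN_ALEF);
}
```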
