
LIGHT-WEIGHT, SEMI CONTEXT-FREE, RULE-BASED ARABIC TEXT CLASSIFIER FOR POS TAGGING


(Received: 16-Jun.-2025, Revised: 12-Jul.-2025 and 12-Aug.-2025, Accepted: 13-Sep.-2025)
In this research, we address the challenges associated with part-of-speech (POS) tagging and morphological classification of Arabic text, in which word structure is the object of study. Our focus is on Classical Arabic (CA) and Modern Standard Arabic (MSA), where the text is typically vocalized and includes diacritics on most letters. The proposed classification method requires no lexicon, stemming process, or artificial-intelligence technique; the goal is to minimize the resources needed to classify Arabic text. The method rests on the principle that each verb in the Arabic language adheres to a specific pattern, which we refer to as wazn (وزن) or tafʿīl (تفعيل), and that this pattern can be used to identify a word. The classification process is governed by a finite state machine, which is translated into regular expressions (REs). Each verb tense is represented by a set of REs, and the order in which these REs are processed is crucial to the accuracy of the results. Whenever a match is found, the word is marked to prevent further matches. The proposed method is lightweight and functions as a best-effort classifier, assigning the closest match as a tag. In terms of performance, the classifier runs in linear time and does not require high processing capabilities.
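The ordered, first-match-wins matching described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag names, the two patterns (a fully vocalized three-consonant past form and a present form prefixed with yāʾ), the `UNKNOWN` fallback, and the helper names `classify` and `tag_text` are all assumptions made for illustration, and the actual rule set is presumably far larger.

```python
import re

# Unicode building blocks. Assumption: input words are fully vocalized and
# each diacritic directly follows the letter it belongs to.
LETTER = "[\u0621-\u064A]"  # any character in the basic Arabic letter range
FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
SHORT_VOWEL = f"[{FATHA}{DAMMA}{KASRA}]"

# Ordered rules: earlier rules take priority because the first match wins.
# Both patterns and both tag names are hypothetical examples.
RULES = [
    # Present-tense template: ya + C + sukun + C + short vowel + C + damma
    ("VERB_PRESENT", re.compile(
        f"\u064A{FATHA}{LETTER}{SUKUN}{LETTER}{SHORT_VOWEL}{LETTER}{DAMMA}")),
    # Past-tense template: C + fatha + C + short vowel + C + fatha
    ("VERB_PAST", re.compile(
        f"{LETTER}{FATHA}{LETTER}{SHORT_VOWEL}{LETTER}{FATHA}")),
]


def classify(word, rules=RULES, default="UNKNOWN"):
    """Return the tag of the first rule whose pattern matches the whole word."""
    for tag, pattern in rules:
        if pattern.fullmatch(word):
            return tag  # the word is now tagged; later rules are never tried
    return default


def tag_text(text):
    """Tag each whitespace-separated word independently of its neighbours."""
    return [(word, classify(word)) for word in text.split()]


if __name__ == "__main__":
    kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"              # "kataba"
    yaktubu = "\u064A\u064E\u0643\u0652\u062A\u064F\u0628\u064F"  # "yaktubu"
    for word, tag in tag_text(f"{kataba} {yaktubu}"):
        print(word, tag)
```

Because every word is tested against a fixed list of patterns and tagging stops at the first hit, the work per word is bounded by the size of the rule set, so the total cost in this sketch grows linearly with the number of words. Returning on the first match plays the role of marking the word to prevent further matches.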
