Text classification is the process of automatically tagging a textual document with the most relevant set of labels.
The aim of this work is to automatically tag an input document based on its vocabulary features. To achieve this
goal, two large datasets have been constructed from various Arabic news portals. The first dataset consists of
90k single-labeled articles from 4 domains (Business, Middle East, Technology and Sports). The second dataset
has over 290k multi-tagged articles. The datasets will be made freely available to the Arabic computational
linguistics research community. To examine the usefulness of both datasets, we implemented an array of ten
shallow learning classifiers. In addition, we implemented an ensemble model that combines the best classifiers
in a majority-voting classifier. The performance of the classifiers on the first dataset ranged between 87.7%
(AdaBoost) and 97.9% (SVM). Analyzing some of the misclassified articles confirmed the need for multi-label,
as opposed to single-label, categorization for better classification results. We used classifiers that were compatible
with multi-labeling tasks, such as Logistic Regression and XGBoost. We tested the multi-label classifiers on the
second, larger dataset. A custom accuracy metric, designed for the multi-labeling task, was developed for
performance evaluation along with the Hamming loss metric. XGBoost proved to be the best multi-labeling
classifier, scoring an accuracy of 91.3%, higher than the Logistic Regression score of 87.6%.
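
The two generic techniques named above, majority voting and Hamming loss, can be illustrated with a minimal sketch. This is not the implementation used in this work: the function names, the tie-breaking rule and the toy labels are assumptions made for illustration, and the custom accuracy metric is not reproduced here because its definition is not given in this section.

```python
from collections import Counter
from typing import Sequence, Set


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label chosen by the most classifiers for one document.

    Assumption: ties go to the tied label that appears first in `predictions`.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in the original order so that tie-breaking is deterministic.
    for label in predictions:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable")


def hamming_loss(
    y_true: Sequence[Set[str]],
    y_pred: Sequence[Set[str]],
    labels: Sequence[str],
) -> float:
    """Fraction of (document, label) pairs whose predicted membership is wrong."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true or not labels:
        raise ValueError("need at least one document and one label")
    label_set = set(labels)
    # The symmetric difference holds labels that are missing or spurious.
    wrong = sum(len((t ^ p) & label_set) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * len(label_set))


# Toy label set, borrowed from the four domains of the first dataset.
LABELS = ["Business", "Middle East", "Technology", "Sports"]

# Three hypothetical classifiers voting on a single article.
voted = majority_vote(["Sports", "Business", "Sports"])

# Two hypothetical multi-tagged articles; the second prediction misses one tag.
loss = hamming_loss(
    [{"Sports"}, {"Business", "Technology"}],
    [{"Sports"}, {"Business"}],
    LABELS,
)
```

In this toy run `voted` is "Sports", and `loss` is 0.125, since one of the eight (document, label) pairs is wrong. A lower Hamming loss is better, which is why it complements an accuracy-style score in a multi-label setting.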
Jordanian Journal of Computers and Information Technology (JJCIT), Vol. 06, No. 03, September 2020.