(Received: 6-Jun.-2022, Revised: 9-Aug.-2022, Accepted: 27-Aug.-2022)
Intelligent systems powered by artificial-intelligence techniques have been widely proposed to help humans perform various tasks. The intelligent personal assistant (IPA) is one of these smart systems. In this paper, we present an attempt to create an IPA that interacts with users in Tunisian Arabic (TA), the colloquial form of Arabic used in Tunisia. We propose and explore a simple-to-implement method for building the principal components of a TA IPA. We apply deep-learning techniques: CNNs [1], RNN encoder-decoders [2] and end-to-end approaches, to create the IPA speech components (speech recognition and speech synthesis). In addition, we explore a freely available dialogue platform for understanding a request and generating a suitable response in TA. To this end, we create TA transcripts and use them to train the corresponding models. The evaluation results are acceptable for a first attempt.
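End-to-end speech recognizers of the Deep Speech family cited above [2] are typically trained with the CTC loss [75] and decoded greedily by collapsing repeated per-frame labels and dropping the blank symbol. As a purely illustrative sketch of that decoding step (the function name, label ids and blank convention here are hypothetical, not taken from the paper):

```python
def ctc_greedy_decode(frame_label_ids, blank_id=0):
    """Greedy CTC decoding: collapse runs of identical labels,
    then drop CTC blank symbols.

    frame_label_ids: per-frame argmax label ids emitted by an
    acoustic model (one id per audio frame)."""
    decoded = []
    prev = None
    for label in frame_label_ids:
        if label != prev:          # collapse repeated frames of the same label
            if label != blank_id:  # keep only non-blank labels
                decoded.append(label)
        prev = label
    return decoded


# With blank_id=0, the frame sequence [1, 1, 0, 1, 2, 2]
# collapses to [1, 1, 2]: the blank separates the two 1s,
# so they survive as distinct output labels.
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2]))
```

In practice, systems such as those in [2] and [74] replace this greedy pass with a beam search scored jointly with an n-gram language model (e.g. KenLM [76]); the collapse-and-drop rule above is the core of both.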

[1] K. O’Shea and R. Nash, "An Introduction to Convolutional Neural Networks," CoRR, vol. abs/1511.0, 2015.

[2] A. Hannun et al., "Deep Speech: Scaling up End-to-end Speech Recognition," arXiv:1412.5567v2 [cs.CL], pp. 1–12, 2014.

[3] I. Lopatovska, "Overview of the Intelligent Personal Assistants," Ukr. J. Libr. Inf. Sci., no. 3, pp. 72–79, DOI: 10.31866/2616-7654.3.2019.169669, 2019.

[4] K. Jokinen and M. McTear, "Spoken Dialogue Systems," Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, DOI: 10.2200/S00204ED1V01Y200910HLT005, 2010.

[5] N. Goksel-Canbek and M. E. Mutlu, "On the Track of Artificial Intelligence: Learning with Intelligent Personal Assistants," Int. J. Hum. Sci., vol. 13, no. 1, pp. 592–601, DOI: 10.14687/ijhs.v13i1.3549, 2016.

[6] A. V. Román, D. P. Martínez, Á. L. Murciego, D. M. Jiménez-Bravo and J. F. de Paz, "Voice Assistant Application for Avoiding Sedentarism in Elderly People Based on IoT Technologies," Electronics, vol. 10, no. 980, 2021.

[7] Y. Matsuyama, A. Bhardwaj, R. Zhao, O. J. Romero, S. A. Akoju and J. Cassell, "Socially-aware Animated Intelligent Personal Assistant Agent," Proc. of the 17th Annual Meeting in Special Interest Group on Discourse and Dialogue (SIGDIAL 2016), pp. 224–227, DOI: 10.18653/v1/w16-3628, 2016.

[8] J. Santos, J. J. P. C. Rodrigues, B. M. C. Silva, J. Casal, K. Saleem and V. Denisov, "An IoT-based Mobile Gateway for Intelligent Personal Assistants on Mobile Health Environments," J. Netw. Comput. Appl., vol. 71, pp. 194–204, DOI: 10.1016/j.jnca.2016.03.014, 2016.

[9] M. T. Talacio, Development of an Intelligent Personal Assistant to Empower Operators in Industry 4.0 Environments, M.Sc. Thesis, School of Technology and Management of Bragança, University of Bragança, 2020.

[10] E. Balcı, "Overview of Intelligent Personal Assistants," Acta INFOLOGICA, vol. 3, no. 1, pp. 22–33, DOI: 10.26650/acin.454522, 2019.

[11] K. Zdanowski, "Language Support in Voice Assistants Compared," Summa Linguae Technologies, Accessed on: Aug. 01, 2022, [Online], Available:, 2021.

[12] I. Zribi, R. Boujelbane, A. Masmoudi, M. Ellouze, L. Belguith and N. Habash, "A Conventional Orthography for Tunisian Arabic," Proc. of the 9th International Conference on Language Resources and Evaluation (LREC'14), pp. 2355–2361, Reykjavik, Iceland, 2014.

[13] A. Bouzemni, "Linguistic Situation in Tunisia: French and Arabic Code Switching," INTERLINGUISTICA, pp. 217–223, 2005.

[14] I. Zribi, M. Ellouze, L. H. Belguith and P. Blache, "Spoken Tunisian Arabic Corpus 'STAC': Transcription and Annotation," Research in Computing Science, vol. 90, pp. 123–135, 2015.

[15] I. Zribi, M. Ellouze, L. H. Belguith and P. Blache, "Morphological Disambiguation of Tunisian Dialect," J. King Saud Univ.-Comput. Inf. Sci., vol. 29, no. 2, pp. 147–155, 2017.

[16] H. Chen, X. Liu, D. Yin and J. Tang, "A Survey on Dialogue Systems: Recent Advances and New Frontiers," arXiv:1711.01731v3, no. 1, 2018.

[17] H. B. Hashemi, A. Asiaee and R. Kraft, "Query Intent Detection Using Convolutional Neural Networks," WSDM QRUMS 2016 Workshop, DOI: 10.1145/1235, 2016.

[18] K. Sreelakshmi, P. C. Rafeeque, S. Sreetha and E. S. Gayathri, "Deep Bi-directional LSTM Network for Query Intent Detection," Procedia Computer Science, vol. 143, pp. 939–946, 2018.

[19] A. Deoras and R. Sarikaya, "Deep Belief Network Based Semantic Taggers for Spoken Language Understanding," Proc. Interspeech 2013, pp. 2713-2717, DOI: 10.21437/Interspeech.2013-623, 2013.

[20] P. S. Huang, X. He, J. Gao, L. Deng, A. Acero and L. Heck, "Learning Deep Structured Semantic Models for Web Search Using Clickthrough Data," Proc. of the 22nd ACM Int. Conf. on Information & Knowledge Management (CIKM '13), pp. 2333–2338, DOI: 10.1145/2505515.2505665, 2013.

[21] W. A. Abro, A. Aicher, N. Rach, S. Ultes, W. Minker and G. Qi, "Natural Language Understanding for Argumentative Dialogue Systems in the Opinion Building Domain," Knowledge-Based Syst., vol. 242, DOI: 10.1016/j.knosys.2022.108318, 2022.

[22] J. D. Williams, "Web-style Ranking and SLU Combination for Dialog State Tracking," Proc. of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL 2014), pp. 282–291, DOI: 10.3115/v1/w14-4339, 2014.

[23] S. Sharma, P. K. Choubey and R. Huang, "Improving Dialogue State Tracking by Discerning the Relevant Context," Proc. of the Conf. of the North American Chapter of the Association for Computational Linguistics: Human Lang. Technolog. (NAACL HLT 2019), vol. 1, DOI: 10.18653/v1/n19-1057, 2019.

[24] Q. Xie, K. Sun, S. Zhu, L. Chen and K. Yu, "Recurrent Polynomial Network for Dialogue State Tracking with Mismatched Semantic Parsers," Proc. of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 295–304, DOI: 10.18653/v1/w15-4641, Prague, Czech Republic, 2015.

[25] Z. Yan, N. Duan, P. Chen, M. Zhou, J. Zhou and Z. Li, "Building Task-oriented Dialogue Systems for Online Shopping," Proc. of the 31st AAAI Conf. on Artificial Intell. (AAAI-17), pp. 4618-4625, 2017.

[26] H. Cuayáhuitl, S. Keizer and O. Lemon, "Strategic Dialogue Management via Deep Reinforcement Learning," arXiv:1511.08099v1, pp. 1–10, 2015.

[27] A. Stent, R. Prasad and M. Walker, "Trainable Sentence Planning for Complex Information Presentation in Spoken Dialog Systems," Proc. of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04), pp. 79-86, DOI: 10.3115/1218955.1218966, Barcelona, Spain, 2004.

[28] T. H. Wen et al., "Stochastic Language Generation in Dialogue Using Recurrent Neural Networks with Convolutional Sentence Reranking," Proc. of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 275-284, DOI: 10.18653/v1/w15-4639, Prague, Czech Republic, 2015.

[29] T. H. Wen, M. Gašić, N. Mrkšić, P. H. Su, D. Vandyke and S. Young, "Semantically Conditioned LSTM-based Natural Language Generation for Spoken Dialogue Systems," Proc. of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 1711–1721, DOI: 10.18653/v1/d15-1199, 2015.

[30] H. Zhou, M. Huang and X. Zhu, "Context-aware Natural Language Generation for Spoken Dialogue Systems," Proc. of the 26th Int. Conf. on Computational Linguistics: Technical Papers, pp. 2032–2041, Osaka, Japan, 2016.

[31] O. Dušek and F. Jurcicek, "Sequence-to-sequence Generation for Spoken Dialogue via Deep Syntax Trees and Strings," Proc. of the 54th Annual Meeting of the Association for Computational Linguistics, vol. 2: Short Papers, pp. 45-51, DOI: 10.18653/v1/p16-2008, Berlin, Germany, 2016.

[32] T. H. Wen and S. Young, "Recurrent Neural Network Language Generation for Spoken Dialogue Systems," Computer Speech & Language, vol. 63, DOI: 10.1016/j.csl.2019.06.008, 2020.

[33] T. H. Wen et al., "A Network-based End-to-end Trainable Task-oriented Dialogue System," Proc. of the 15th Conf. of the European Chapter of the Association for Computational Linguistics, vol. 1: Long Papers, pp. 438–449, Valencia, Spain, April 3-7, 2017.

[34] A. Bordes, Y. Lan Boureau and J. Weston, "Learning End-to-end Goal-oriented Dialog," Proc. of the 5th Int. Conf. Learn. Represent. (ICLR 2017), 2017.

[35] C. Li, L. Li and J. Qi, "A Self-attentive Model with Gate Mechanism for Spoken Language Understanding," Proc. of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 3824–3833, DOI: 10.18653/v1/D18-1417, 2018.

[36] T. Bocklisch, J. Faulkner, N. Pawlowski and A. Nichol, "Rasa: Open Source Language Understanding and Dialogue Management," Proc. of NIPS 2017 Conversational AI Workshop, pp. 1–9, Long Beach, USA, 2017.

[37] B. A. Shawar, "A Chatbot as a Natural Web Interface to Arabic Web QA," Int. J. Emerg. Technol. Learn. (iJET), vol. 6, no. 1, pp. 37-43, DOI: 10.3991/ijet.v6i1.1502, 2011.

[38] S. M. Yassin and M. Z. Khan, "SeerahBot: An Arabic Chatbot about Prophet’s Biography," Int. J. Innov. Res. Comput. Sci. Technol. (IJIRCST), vol. 9, no. 2, DOI: 10.21276/ijircst.2021.9.2.13, 2021.

[39] D. Abu Ali and N. Habash, "Botta: An Arabic Dialect Chatbot," Proc. of the 26th Int. Conf. on Computational Linguistics: System Demonstrations (COLING 2016), pp. 208–212, Osaka, Japan, 2016.

[40] D. Al-Ghadhban and N. Al-Twairesh, "Nabiha: An Arabic Dialect Chatbot," Int. J. of Advanced Computer Science and Applications (IJACSA), vol. 11, no. 3, pp. 452–459, 2020.

[41] A. A. Abdelhamid, H. Alsayadi, I. Hegazy and Z. T. Fayed, "End-to-end Arabic Speech Recognition: A Review," Proc. of the 19th Conf. of Language Engineering (ESOLEC’19), Bibliotheca Alexandrina, 2020.

[42] A. M. Dammak, "Approche Hybride Pour la Reconnaissance Automatique de la Parole Pour la Langue Arabe," Environnements Informatiques pour l'Apprentissage Humain, Université du Maine, Français, ⟨NNT : 2016LEMA1040⟩, 2016.

[43] S. Dua et al., "Developing a Speech Recognition System for Recognizing Tonal Speech Signals Using a Convolutional Neural Network," Appl. Sci., vol. 12, no. 12, p. 6223, DOI: 10.3390/app12126223, 2022.

[44] A. Y. Hannun, D. Jurafsky, A. L. Maas and A. Y. Ng, "First-pass Large Vocabulary Continuous Speech Recognition Using Bi-directional Recurrent DNNs," arXiv:1408.2873v2 [cs.CL], pp. 1–7, 2014.

[45] Y. Peng and K. Kao, "Speech to Text System: Pastor Wang Mandarin Bible Teachings (Speech Recognition)," CS230: Deep Learning, Stanford Univ., CA., 2020.

[46] N. Zeghidour et al., "Fully Convolutional Speech Recognition," arXiv:1812.06864v2, pp. 25–29, 2019.

[47] A. Agarwal and T. Zesch, "German End-to-end Speech Recognition Based on DeepSpeech," Proc. of the 15th Conf. on Natural Language Processing (KONVENS 2019), pp. 111-119, 2019.

[48] V. Pratap et al., "Wav2Letter++: The Fastest Open-source Speech Recognition System," Proc. of the 2019 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pp. 2–6, Brighton, UK, 2019.

[49] S. Qin, L. Wang, S. Li, J. Dang and L. Pan, "Improving Low-resource Tibetan End-to-end ASR by Multilingual and Multilevel Unit Modeling," EURASIP J. Audio, Speech, Music Process., vol. 2022, no. 1, DOI: 10.1186/s13636-021-00233-4, 2022.

[50] L. Lamel and J. Gauvain, "Automatic Speech-to-text Transcription in Arabic," ACM Transactions on Asian Language Information Processing, vol. 8, no. 4, DOI: 10.1145/1644879.1644885, 2009.

[51] M. Elshafei and H. Al-Muhtaseb, "Speaker-independent Natural Arabic Speech Recognition System," Proc. of the Int. Conf. on Intelligent Systems, [Online], Available: ion/303873329_Natural_speaker_independent_arabic_speech_recognition_system_based_on_HMM_using_sphinx_tools, 2010.

[52] A. Ben Ltaief, Y. Estève, M. Graja and L. Hadrich Belguith, "Automatic Speech Recognition for Tunisian Dialect," Language Resources and Evaluation, vol. 52, no. 1, pp. 249–267, DOI: 10.1007/s10579-017-9402-y, hal-01592416, 2018.

[53] A. Masmoudi, M. Ellouze Khmekhem, Y. Esteve, L. Hadrich Belguith and N. Habash, "A Corpus and Phonetic Dictionary for Tunisian Arabic Speech Recognition," Proc. of the 9th Int. Conf. Lang. Resour. Eval., vol. 3, no. 1, pp. 306–310, 2014.

[54] A. Messaoudi, H. Haddad, C. Fourati et al., "Tunisian Dialectal End-to-end Speech Recognition Based on DeepSpeech," Procedia Comput. Sci., vol. 189, pp. 183–190, DOI: 10.1016/j.procs.2021.05.082, 2021.

[55] S. N. Kayte, M. Mundada, S. Gaikwad and B. Gawali, "Performance Evaluation of Speech Synthesis Techniques for English Language," Adv. Intell. Syst. Comput., vol. 439, pp. 253–262, 2016.

[56] C. Quillen, "Autoregressive HMM Speech Synthesis," Proc. of the 2012 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2012.6288800, Kyoto, Japan, 2012.

[57] M. Shannon and W. Byrne, "Autoregressive HMMs for Speech Synthesis," Proc. of the 10th Int. Conf. of the Int. Speech Comm. Associa. (Interspeech 2009), DOI: 10.21437/interspeech.2009-135, 2009.

[58] S. Roekhaut, S. Brognaux, R. Beaufort and T. Dutoit, "eLite-HTS: A NLP Tool for French HMM-based Speech Synthesis," Proc. Annu. Conf. Int. Speech Commun. Assoc. (Interspeech 2014), Singapore, 2014.

[59] S. Le Maguer, N. Barbot and O. Boeffard, "Evaluation of Contextual Descriptors for HMM-based Speech Synthesis in French," Proc. of the 8th ISCA Workshop on Speech Synthesis, HAL Id: hal-00987809, version 1, 2013.

[60] K. M. Khalil and C. Adnan, "Arabic Speech Synthesis Based on HMM," Proc. of the 15th IEEE Int. Multi-Conf. on Systems, Sig. & Devic. (SSD), DOI: 10.1109/SSD.2018.8570388, Hammamet, Tunisia, 2018.

[61] A. Amrouche, A. Abed and L. Falek, "Arabic Speech Synthesis System Based on HMM," Proc. of the 6th IEEE Int. Conf. on Electrical and Electronics Eng. (ICEEE), DOI: 10.1109/ICEEE2019.2019.0022, Istanbul, Turkey, 2019.

[62] H. Al Masri and M. E. Za’ter, "Arabic Text-to-speech (TTS) Data Preparation," arXiv:2204.03255v1, [Online], Available:, 2022.

[63] K. M. Khalil and C. Adnan, "Arabic HMM-based Speech Synthesis," Proc. of the IEEE 2013 Int. Conf. on Electri. Eng. and Soft. Appl., DOI: 10.1109/ICEESA.2013.6578437, Hammamet, Tunisia, 2013.

[64] F. K. Fahmy, M. I. Khalil and H. M. Abbas, "A Transfer Learning End-to-end Arabic Text-to-speech (TTS) Deep Architecture," arXiv:2007.11541v1 [eess.AS], 2020.

[65] A. van den Oord et al., "WaveNet: A Generative Model for Raw Audio," arXiv:1609.03499, 2016.

[66] S. Arik et al., "Deep Voice: Real-time Neural Text-to-speech," Proc. of the 34th Int. Conf. on Machine Learning (ICML 2017), pp. 264–273, 2017.

[67] Y. Wang et al., "Tacotron: Towards End-to-end Speech Synthesis," arXiv:1703.10135v2, pp. 1–10, 2017.

[68] I. Sutskever, O. Vinyals and Q. V. Le, "Sequence to Sequence Learning with Neural Networks," arXiv:1409.3215, 2014.

[69] J. Shen et al., "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions," Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), DOI: 10.1109/ICASSP.2018.8461368, Calgary, AB, Canada, 2018.

[70] J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho and Y. Bengio, "Attention-based Models for Speech Recognition," arXiv:1506.07503, 2015.

[71] I. Hadj Ali, Z. Mnasri and Z. Lachiri, "DNN-based Grapheme-to-phoneme Conversion for Arabic Text-to-speech Synthesis," Int. J. Speech Technol., vol. 23, pp. 569–584, DOI: 10.1007/s10772-020-09750-7, 2020.

[72] A. Abdelali, N. Durrani, C. Demiroglu, F. Dalvi, H. Mubarak and K. Darwish, "NatiQ: An End-to-end Text-to-speech System for Arabic," arXiv:2206.07373v1, 2022.

[73] N. Li, S. Liu, Y. Liu et al., "Neural Speech Synthesis with Transformer Network," Proc. of the 33rd AAAI Conf. on Artificial Intelligence (AAAI-19), pp. 6706–6713, DOI: 10.1609/aaai.v33i01.33016706, 2019.

[74] D. Amodei et al., "Deep Speech 2: End-to-End Speech Recognition in English and Mandarin," arXiv:1512.02595v1 [cs.CL], pp. 1–28, 2015.

[75] A. Graves, S. Fernandez, F. Gomez and J. Schmidhuber, "Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks," Proc. of the 23rd Int. Conf. on Machine Learning (ICML '06), pp. 369-376, DOI: 10.1145/1143844.1143891, 2006.

[76] K. Heafield, "KenLM: Faster and Smaller Language Model Queries," Proc. of the 6th Workshop on Statistical Machine Translation, pp. 187–197, Edinburgh, Scotland, 2011.

[77] A. Mekki, I. Zribi, M. Ellouze and L. H. Belguith, "Sentence Boundary Detection of Various Forms of Tunisian Arabic," Language Resources and Evaluation, vol. 56, pp. 357–385, DOI: 10.1007/s10579-021-09538-4, 2022.

[78] N. T. M. Trang and M. Shcherbakov, "Enhancing Rasa NLU Model for Vietnamese Chatbot," Int. J. of Open Information Technologies (INJOIT), vol. 9, no. 1, pp. 31–36, 2021.

[79] Y. Windiatmoko, A. F. Hidayatullah and R. Rahmadi, "Developing FB Chatbot Based on Deep Learning Using RASA Framework for University Enquiries," CoRR, vol. abs/2009.1, [Online], Available:, 2020.

[80] V. Vlasov, J. E. M. Mosig and A. Nichol, "Rasa Open Source Documentation," RASA DOCS, [Online], Available:, 2022.

[81] T. Bunk et al., "DIET: Lightweight Language Understanding for Dialogue Systems," arXiv:2004.09936v3, [Online], Available:, 2020.

[82] A. Chernyavskiy, D. Ilvovsky and P. Nakov, "Transformers: 'The End of History' for Natural Language Processing?," arXiv:2105.00813, [Online], Available:, 2021.

[83] N. Habash, A. Soudi and T. Buckwalter, "On Arabic Transliteration," Arabic Computational Morphology, Part of the Text, Speech and Language Technology Book Series, vol. 38, pp. 15-22, 2007.

[84] S. Hussain, O. A. Sianaki and N. Ababneh, "A Survey on Conversational Agents," Proc. of the Workshops of the 33rd Int. Conf. on Advanced Information Networking and Applications (WAINA-2019), pp. 946-956, DOI: 10.1007/978-3-030-15035-8_93, Matsue, Japan, 2019.