PREDICTION OF PEOPLE SENTIMENTS ON TWITTER USING MACHINE LEARNING CLASSIFIERS DURING RUSSIAN AGGRESSION IN UKRAINE

(Received: 12-Feb.-2023, Revised: 29-Apr.-2023, Accepted: 18-May-2023)

Authors Mohammed Rashad Baker *, Yalmaz Najmaldeen Taher, Kamal H. Jihad

Keywords #Sentiment analysis #Machine learning #Classification algorithm #Imbalanced data classification #Russian aggression in Ukraine

Abstract Social media has become an excellent way to discover people’s thoughts about various topics and situations. In recent years, many studies have focused on social media during crises, including natural disasters and human-caused wars. This study examines how people expressed their feelings on Twitter during the Russian aggression against Ukraine. It pursued two goals. The first was to identify the hashtags most relevant to the aggression and use them to collect a unique dataset. The second was to apply several well-known Machine Learning (ML) models to classify the tweets according to the feelings they convey. The experimental results show that most of the ML classifiers evaluated achieve higher accuracy on a balanced dataset. However, the experiments with data-balancing strategies do not necessarily indicate that every class is classified better. It is therefore essential to compare and contrast the data-balancing strategies employed in Sentiment Analysis (SA) and ML studies, covering more classifiers and a more comprehensive range of use cases.
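The distinction the abstract draws between overall accuracy and per-class performance can be illustrated with a minimal sketch. It is not the authors' pipeline: the balancing strategy shown (random oversampling), the function names, and the toy labels are all assumptions chosen for illustration, and the abstract does not state which balancing strategies were used.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Hypothetical helper: duplicate minority-class examples at random
    until every class has as many examples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall computed separately for each class in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Toy, made-up labels: a classifier that always predicts the majority class.
y_true = ["neg"] * 8 + ["pos"] * 2
y_pred = ["neg"] * 10

overall = accuracy(y_true, y_pred)          # 0.8
by_class = per_class_recall(y_true, y_pred)  # {'neg': 1.0, 'pos': 0.0}

# Balancing the toy training set: both classes end up with 8 examples.
toy_texts = [f"tweet {i}" for i in range(10)]
bal_texts, bal_labels = random_oversample(toy_texts, y_true)
```

In this toy case the overall accuracy looks acceptable while the minority class is never recognized, which is why a single accuracy figure cannot show whether balancing helped every class.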
