A NOVEL TRUE-REAL-TIME SPATIOTEMPORAL DATA STREAM PROCESSING FRAMEWORK


(Received: 9-Mar.-2022, Revised: 2-May-2022, Accepted: 23-May-2022)
The ability to interpret spatiotemporal data streams in real time is critical for a range of systems. However, processing vast amounts of spatiotemporal data from several sources, such as online traffic, social platforms and sensor networks, is a considerable challenge. The major goal of this study is to create a framework for processing and analyzing spatiotemporal data from multiple sources with irregular shapes, so that researchers can focus on data analysis instead of worrying about the structure of the data sources. With these considerations in mind, we introduced a novel spatiotemporal data paradigm for true-real-time stream processing, which enables high-speed, low-latency real-time data processing. A comparison of two state-of-the-art real-time processing architectures was offered, and a full review of the various open-source technologies for real-time data stream processing and their system topologies was presented. Hence, this study proposed a new framework that integrates Apache Kafka for spatiotemporal data ingestion, Apache Flink for true-real-time processing of spatiotemporal stream data, machine learning for real-time predictions and Apache Cassandra at the storage layer for distributed storage in real time. The proposed framework was compared with others from the literature using the following features: scalability (Sc), prediction tools (PT), data analytics (DA), multiple event types (MET), data storage (DS), real-time (Rt), performance evaluation (PE) and stream processing (SP). The proposed framework was able to handle all of these features.
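
The layered design described above (ingestion, per-event processing, prediction, storage) can be illustrated with a minimal, self-contained sketch. It is not the authors' implementation and it calls no Kafka, Flink or Cassandra API: plain Python objects stand in for each layer. The record field names, the 0.5-degree grid cell, the `StreamPipeline` class and the moving-average "model" are all assumptions made for illustration only.

```python
import math
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Unified schema for records arriving from differently shaped sources."""
    source: str
    lat: float
    lon: float
    timestamp: float
    value: float


def normalize(raw: dict) -> Event:
    """Ingestion stand-in: map heterogeneous field names onto one schema.

    The aliases below (lat/latitude, lon/lng/longitude, ts/timestamp) are
    hypothetical examples of irregular source structures.
    """
    lat = raw.get("lat", raw.get("latitude"))
    lon = raw.get("lon", raw.get("lng", raw.get("longitude")))
    ts = raw.get("ts", raw.get("timestamp"))
    return Event(raw["source"], float(lat), float(lon), float(ts), float(raw["value"]))


def cell_of(event: Event, size: float = 0.5) -> tuple:
    """Spatial key: index of the grid cell (size in degrees) containing the event."""
    return (math.floor(event.lat / size), math.floor(event.lon / size))


class StreamPipeline:
    """Processes one event at a time (no micro-batching), keyed by grid cell."""

    def __init__(self, history: int = 3):
        # Keyed state: the most recent values seen in each cell.
        self.state = defaultdict(lambda: deque(maxlen=history))
        # Storage stand-in: an append-only list of result rows.
        self.store = []

    def process(self, raw: dict) -> dict:
        event = normalize(raw)
        key = cell_of(event)
        recent = self.state[key]
        # Prediction stand-in: mean of recent values in this cell, computed
        # before the new value is added (None when no history exists yet).
        predicted = sum(recent) / len(recent) if recent else None
        recent.append(event.value)
        row = {
            "cell": key,
            "timestamp": event.timestamp,
            "value": event.value,
            "predicted": predicted,
        }
        self.store.append(row)
        return row


if __name__ == "__main__":
    pipeline = StreamPipeline()
    records = [
        {"source": "traffic", "lat": 40.71, "lon": -74.01, "ts": 0, "value": 10},
        {"source": "sensor", "latitude": 40.72, "longitude": -74.02, "timestamp": 1, "value": 20},
        {"source": "traffic", "lat": 40.73, "lng": -74.03, "ts": 2, "value": 30},
    ]
    for record in records:
        print(pipeline.process(record))
```

In a real deployment, each stand-in would presumably be replaced by the corresponding component named in the abstract. The sketch only shows how a common event schema lets downstream stages ignore the structure of the sources.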
