A COMPARATIVE STUDY OF DIFFERENT SEARCH AND INDEXING TOOLS FOR BIG DATA


(Received: 17-Nov.-2021, Revised: 15-Jan.-2022, Accepted: 26-Jan.-2022)
The exponential growth of data generated by the Moroccan courts makes it difficult to search for valuable knowledge within multiple, huge datasets. Traditional search methods are not adapted to the Big Data context. Indeed, searching for specific information in Big Data requires advanced methods and powerful search systems. To contribute to the Court Digital Transformation Strategy, we aim to develop a solution that leverages the technological advances in this field. The project we propose consists of developing new artificial intelligence methods and techniques to automate the processing of the large mass of data produced by the jurisdictions of the Kingdom of Morocco and to design a system capable of analyzing large volumes of complex judicial data. The aim is to discover and explain existing phenomena or to extrapolate new knowledge from the information analyzed, to recognize patterns, to make predictions and to make adjustments where necessary. To that end, this first study investigates and examines the existing search and indexing technologies for Big Data. It compares the leading solutions used for information retrieval in order to choose one that will serve as the base for our jurisprudential search engine.
