BIG DATA IN HEALTHCARE: REVIEW AND OPEN RESEARCH ISSUES

(Received: 2016-10-18, Revised: 2016-12-29, Accepted: 2017-01-21)
The world is generating a high volume of data in all domains, such as social media, industry, stock markets and healthcare systems. Most of this data has been generated in the past two years. This massive amount of data can benefit individuals, governments and industries by yielding knowledge and assisting in decision making. In healthcare, an enormous volume of data is generated by healthcare providers and stored in digital systems; hence, data are more accessible for reference and future use. The ultimate vision for working with big data in health is to help healthcare providers improve the quality of service, reduce medical mistakes and provide consultation and answers when needed. This paper provides a critical review of some applications of big data in healthcare, such as the flu-prediction project by the Institute of Cognitive Sciences, which combines social media data with governmental data. The project aims to provide swift responses to flu-related questions. The project should study human multi-modal representations, such as text, voice and images. Moreover, integrating social media data with governmental health data could create some challenges, because governmental health data are considered more accurate than subjective opinions on social media. Another attempt to utilize big data in healthcare is Google Flu Trends (GFT). GFT collects search queries from users to predict flu activity and outbreaks. GFT performed well for the first two to three years; however, its performance has deteriorated since 2011 because of changes in people's behaviour. GFT did not update its prediction model based on new data released by the US Centers for Disease Control and Prevention (CDC).
On the other hand, ARGO (Auto Regression with Google) performed better than all previously available influenza models, because it adjusts for changes in people's behaviour and relies on current publicly available data from Google search and the CDC. This research also describes, analyzes and reflects on the value of big data in healthcare. Big data is introduced and defined based on the most widely agreed terms. The paper also presents the big data revenue forecast for 2017 and historical revenue in three main domains: services, hardware and software. The big data management cycle is reviewed and the main aspects of big data in healthcare (volume, velocity, variety and veracity) are discussed. Finally, the paper discusses some challenges that individuals and organizations face in utilizing big data in healthcare, such as data ownership, privacy, security, clinical data linkage, storage and processing issues and skills requirements.
  1. R. Jayanthi, "Big Data Applications in Healthcare," in: Impact of Emerging Digital Technologies on Leadership in Global Business, USA: IGI Global, p. 202, 2016.
  2. K. W. Oh, P. Lee and Y. W. Choi, "Enhanced Unlatch Operation of Disk Drive for Low Temperature Environment," Procedia Eng., vol. 131, pp. 906–913, 2015. 48 "Big Data in Healthcare: Review and Open Research Issues", Mohammad Ashraf Ottom.
  3. A. Li, "Presidential Debate Most-Tweeted Event in U.S. Political History," 2012, [Online], Available at: http://mashable.com/2012/10/04/presidential-debate-twitter/#p54DATw_Wuqz.
  4. A. De Mauro, M. Greco and M. Grimaldi, "What is Big Data? A Consensual Definition and a Review of Key Research Topics," AIP Conference Proceedings, vol. 1644, pp. 97–104, 2015.
  5. E. Dumbill, "Making sense of big data," Big Data, vol. 1, no. 1, pp. 1–2, 2013.
  6. D. Fisher, R. De Line, M. Czerwinski and S. Drucker, "Interactions With Big Data Analytics," Interactions, vol. 19, no. 3, pp. 50–59, 2012.
  7. C. Wu, R. Buyya and K. Ramamohanarao, "Big Data Analytics = Machine Learning + Cloud Computing," p. 27, Jan. 2016.
  8. National Research Council, Frontiers in Massive Data Analysis, The National Academies Press, Washington, DC, 2013.
  9. D. Boyd and K. Crawford, "Critical Questions for Big Data: Provocations for a Cultural, Technological and Scholarly Phenomenon," Information, Commun. Soc., vol. 15, no. 5, pp. 662–679, 2012.
  10. L. Gomes, "Machine-learning Maestro Michael Jordan on the Delusions of Big Data and other Huge Engineering Efforts," IEEE Spectrum, vol. 20, Oct. 2014.
  11. J. Kelly, "Big Data Vendor Revenue and Market Forecast," Wikibon Article, February 2014.
  12. A. Nadkarni, I. Feng and L. DuBois, "Worldwide Storage in Big Data Forecast, 2015–2019," IDC, Market Forecast, Doc # 259205, Oct. 2015.
  13. M. A. B. Ahmad, Mining Health Data for Breast Cancer Diagnosis Using Machine Learning, PhD Thesis, University of Canberra, Australia, Dec. 2013.
  14. W. Raghupathi and V. Raghupathi, "Big Data Analytics in Healthcare: Promise and Potential," Heal. Inf. Sci. Syst., vol. 2, no. 1, p. 3, 2014.
  15. Transparency Market Research, "Electronic Health Records Solution Market (Web Based, Client Server Based, Software as Services) for Applications in Hospitals, Physicians Office, Ambulatory Centers - Global Industry Analysis, Size, Share, Growth, Trends, and Forecast 2015 - 2023," MarketersMedia, USA, 2016.
  16. P. Zikopoulos and C. Eaton, Understanding big data: Analytics for Enterprise Class Hadoop and Streaming Data, McGraw-Hill Osborne Media, 2011.
  17. A. Gandomi and M. Haider, "Beyond the Hype: Big Data Concepts, Methods and Analytics," Int. J. Inf. Manage., vol. 35, no. 2, pp. 137–144, 2015.
  18. I. A. T. Hashem, I. Yaqoob, N. B. Anuar, S. Mokhtar, A. Gani and S. U. Khan, "The Rise of ‘Big Data’ on Cloud Computing: Review and Open Research Issues," Inf. Syst., vol. 47, pp. 98–115, 2015.
  19. R. Fang, S. Pouyanfar, Y. Yang, S.-C. Chen and S. Iyengar, "Computational Health Informatics in the Big Data Age: A Survey," ACM Comput. Surv., vol. 49, no. 1, pp. 1–36, 2016.
  20. B. Feldman, E. M. Martin and T. Skotnes, "Big Data in Healthcare - Hype and Hope," Dr. Bonnie 360 Degree (Bus. Dev. Digit. Heal.), vol. 2013, no. 1, pp. 122–125, 2012.
  21. Oracle, "Oracle Big Data Products," [Online], Available at: https://www.oracle.com/big-data/products.html.
  22. Hadoop, "Apache Hadoop," [Online], Available at: https://wiki.apache.org/hadoop.
  23. D. Miner and A. Shook, MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems, O’Reilly Media, Inc., 2012.
  24. N. Yuhanna and T. F. WaveTM, "Market Overview: Big Data Integration," Forrester Research, Inc. Reproduction Prohibited, 2014. 49 Jordanian Journal of Computers and Information Technology (JJCIT), Vol. 3, No. 1, April 2017.
  25. A. Trnka, "Big Data Analysis," Eur. J. Sci. Theol., vol. 10, no. 1, pp. 143–148, 2014.
  26. J. Hurwitz, A. Nugent, F. Halper and M. Kaufman, Big data for dummies, John Wiley & Sons, 2013.
  27. Osnabrück University and IBM WATSON, "Flu-prediction Project," 2016, [Online], Available at: http://www.flu-prediction.com.
  28. Google, "Google Flu Trends," 2016, [Online], Available at: http://www.google.org/flutrends/about/.
  29. D. Lazer and R. Kennedy, "What We Can Learn From The Epic Failure of Google Flu Trends," in: Wired, 2015.
  30. S. Yang, M. Santillana and S. C. Kou, "Accurate Estimation of Influenza Epidemics Using Google Search Data via ARGO," Proc. Natl. Acad. Sci., vol. 112, no. 47, pp. 14473–14478, Nov. 2015.
  31. B. Mole, "New Flu Tracker Uses Google Search Data Better Than Google," Scientific Method, USA, 2015.
  32. D. Lazer, R. Kennedy, G. King and A. Vespignani, "The Parable of Google Flu: Traps in Big Data Analysis," Science, vol. 343, no. 6176, pp. 1203–1205, 14 March 2014.
  33. Builtinla, "Significant Benefits of Big Data Analytics In Healthcare Industry," Builtinla, 2016, [Online], Available at: http://www.builtinla.com/blog/significant-benefits-big-data-analytics-healthcare-industry.
  34. WHO, "Report of the Review Committee on the Functioning of the International Health Regulations (2005) in relation to Pandemic (H1N1) 2009," World Health Organization, 2011.
  35. T. White, Hadoop: The definitive guide, O’Reilly Media, Inc., 2012.
  36. R. C. Taylor, "An Overview of the Hadoop/MapReduce/HBase Framework and Its Current Applications in Bioinformatics," BMC Bioinformatics, vol. 11, no. Suppl. 12, p. S1, 2010.
  37. The Apache Software Foundation, "Apache Pig!," 2016, [Online], Available at: https://pig.apache.org/. [Accessed: 05-Aug-2016].
  38. The Apache Software Foundation, "Apache Hive," 2016, [Online], Available at: https://hive.apache.org/. [Accessed: 05-Aug-2016].
  39. "Apache Sqoop - Overview : Apache Sqoop," [Online], Available at: https://blogs.apache.org/sqoop/entry/apache_sqoop_overview. [Accessed: 24-Dec-2016].
  40. "Introducing Sqoop - Cloudera Engineering Blog," [Online], Available at: http://blog.cloudera.com/blog/2009/06/introducing-sqoop/. [Accessed: 24-Dec-2016].
  41. A. Gupta, Learning Apache Mahout Classification, Packt Publishing Ltd., 2015.
  42. "5 Healthcare applications of Hadoop and Big data," DeZyre, 16 March 2015.
  43. K. J. Cios and G. William Moore, "Uniqueness of Medical Data Mining," Artif. Intell. Med., vol. 26, no. 1, pp. 1–24, 2002.
  44. "Methods for De-identification of PHI | HHS.gov," Health Information Privacy, [Online], Available at: https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html#guidancedetermination. [Accessed: 16-Jan-2017].
  45. S. White, "A Review of Big Data in Health Care: Challenges and Opportunities," Open Access Bioinformatics, vol. 6, pp. 13–18, 2014.
  46. O. Tene and J. Polonetsky, "Privacy in The Age of Big Data: A Time for Big Decisions," Stanford Law Review, 2012, [Online], Available at: https://www.stanfordlawreview.org/online/privacy-paradox-privacy-and-big-data/.
  47. S. Kaisler, F. Armour, J. A. Espinosa and W. Money, "Big Data: Issues and Challenges Moving Forward," The 46th Hawaii International Conference on System Sciences (HICSS), pp. 995–1004, 2013.
  48. S. Kaisler, W. H. Money and S. J. Cohen, "A Decision Framework for Cloud Computing," The 45th Hawaii International Conference on System Sciences, pp. 1553–1562, 2012.
  49. J. Moon et al., "Clinical Data Linkages in Spinal Cord Injuries (SCI) in Australia:," in: Big Data Analytics in Bioinformatics and Healthcare, vol. 43, no. 10, IGI Global, 1AD, pp. 392–405.
  50. A. W. Toga and I. D. Dinov, "Sharing Big Biomedical Data," J. Big Data, vol. 2, no. 1, p. 7, Dec. 2015.
  51. P. Moghe, "6 Hidden Challenges of Using the Cloud for Big Data and How to Overcome Them," Insider, 2016, [Online], Available at: http://thenextweb.com/insider/2016/04/12/6-challenges-cloud-overcome/.
  52. S. Robinson, "The Storage and Transfer Challenges of Big Data," MIT Sloan Management Review, 2012, [Online], Available at: http://sloanreview.mit.edu/article/the-storage-and-transfer-challenges-of-big-data/. [Accessed: 16-Jan-2017].
  53. S. Debortoli, O. Müller and J. vom Brocke, "Comparing Business Intelligence and Big Data Skills," Bus. Inf. Syst. Eng., vol. 6, no. 5, pp. 289–300, Oct. 2014.