Volume 2 (2018): Issue 3 (December 2018)
Journal Details
Format: Journal
First Published: 30 Mar 2017
Publication timeframe: 4 times per year
Languages: English
Access type: Open Access

Big Data in Health Care: Applications and Challenges

Published Online: 31 Dec 2018
Page range: 175 - 197
Received: 08 May 2018
Accepted: 18 Jun 2018

The concept of Big Data is popular in a variety of domains. The purpose of this review was to summarize the features, applications, analysis approaches, and challenges of Big Data in health care. Big Data in health care has its own features, such as heterogeneity, incompleteness, timeliness and longevity, privacy, and ownership. These features bring a series of challenges for data storage, mining, and sharing to promote health-related research. To deal with these challenges, analysis approaches focusing on Big Data in health care need to be developed and laws and regulations for making use of Big Data in health care need to be enacted. From a patient perspective, application of Big Data analysis could bring about improved treatment and lower costs. In addition to patients, government, hospitals, and research institutions could also benefit from the Big Data in health care.

Introduction

Big Data is the generic term for sets of structured and unstructured data that are so large and complex that traditional software, algorithms, and data repositories are inadequate to collect, process, analyze, and store them (Asante-Korang & Jacobs, 2016; Jee & Kim, 2013; Khoury & Ioannidis, 2014; Tan, Gao, & Koch, 2015). It has become an intensively studied area in recent years. With the development of the Internet, the mobile Internet, the Internet of things, social media, biology, finance, and digital medicine, the volume of data has increased dramatically. Big Data not only describes the large size of data, as its name suggests, but also implies the ability to process data rapidly and the novel technologies and approaches needed to handle them (Krumholz, 2014). Since the start of the 21st century, Big Data has gone through a series of evolutionary steps, and software suited to its environments has been developed. With the growth of information exchange, Big Data has expanded not only in size but also in the technology used to handle it. Given its five main characteristics (volume, variety, velocity, variability, and veracity), state-of-the-art techniques, technologies, and equipment are required to deal with Big Data in correlation analysis, clustering analysis, modeling, prediction, and hypothesis verification. Thus, advanced hardware and software are required for data acquisition, extraction, processing, analysis, and storage. Currently, infrastructure for Big Data includes servers, storage systems, cloud services, and networking equipment. Software for Big Data includes parallel and distributed file systems, retrieval software, and data-mining software (Anderson & Chang, 2015).

The advanced analytical technologies developed for Big Data have driven its application in many areas, such as combating crime, business execution, finance, Global Positioning System (GPS), commerce, travel, urban informatics, meteorology, genomics, complex physics simulations, biology, environmental research, and health care (Chen, Mao, & Liu, 2014). Health care data are one of the driving forces of Big Data. With advanced data generation technology, the volume of data is increasing exponentially. For example, as the Human Genome Project (completed in 2003) showed, a single human genome occupies 100–150 gigabytes (Marx, 2013; O’Driscoll, Daugelaite, & Sleator, 2013). In terms of data size, Big Data in health care exceeded 150 exabytes after 2011 (Wang, Kung, Ting, & Byrd, 2015), and one study estimated that the size of health care data would reach about 40 ZB in 2020, about 50 times the 2009 figure of 0.8 ZB (O’Driscoll et al., 2013) (Fig. 1A).

Figure 1A

Data explosion in health care
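The growth figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the two endpoints given in the text (0.8 ZB in 2009 and an estimated 40 ZB in 2020); the assumption of smooth compound growth between them is ours, for illustration only, and is not a claim made by the cited study.

```python
# Figures quoted in the text: 0.8 ZB in 2009 and an estimated 40 ZB in 2020.
size_2009_zb = 0.8
size_2020_zb = 40.0
years = 2020 - 2009

# Overall growth factor between the two endpoints.
growth_factor = size_2020_zb / size_2009_zb

# Implied average annual growth rate, ASSUMING smooth compound growth
# (an illustrative assumption; the source only reports the two endpoints).
annual_rate = growth_factor ** (1 / years) - 1

print(f"growth factor: {growth_factor:.0f}x over {years} years")
print(f"implied average annual growth: {annual_rate:.1%}")
```

Under this assumption the 50-fold increase corresponds to the data volume growing by a little over 40% every year.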

In addition, as researchers continue to make progress in health care, the quantity of research literature has grown explosively (Fig. 1B).

Figure 1B

Literature explosion: results of searching “health care” in PubMed

Major Types and Sources of Big Data in Health Care

Health care has become an important issue in developed and middle-income countries (Jee & Kim, 2013). Big Data in health care can be classified into four main types based on the data sources: Big Data in medicine, also called medical/clinical Big Data; Big Data in public health and behavior; Big Data in medical experiments; and Big Data in medical literature. Table 1 summarizes the major data types.

Big Data in medicine and clinics

Big Data in medicine and clinics includes the various types and large amounts of data generated in hospitals, such as clinical data and medical imaging. It is often closely associated with doctors and patients. In other words, Big Data in medicine is generated from historical clinical activities (Tsumoto, Hirano, & Iwata, 2013) and has significant effects on the medical industry. For instance, it can assist in planning treatment paths for patients, supporting clinical decision support (CDS), and improving health care technology and systems (Jee & Kim, 2013).

In the medical domain, Big Data comes from hospital information resources, surgeons’ work, anesthesia activities, physical examinations, radiography, magnetic resonance imaging (MRI), computed tomography (CT), patient information, pharmacy, treatment, medical imaging, and imaging reports (Tan et al., 2015; Wang & Alexander, 2013). These clinical activities generate a large number of records, including patient identification information, diagnoses, medication schemes, physicians’ notes, and sensor data (Tan et al., 2015; Wang & Alexander, 2013). The major data from clinical activities are the electronic health record (EHR)/electronic medical record (EMR), the personal health record (PHR), and medical images. The EMR comprises structured and unstructured data that contain all the medical activity information of a patient and is often used for treatment and treatment decisions, while the EHR holds health-related information for individuals, such as medical and financial information, that is closely related to their health care (Wang & Alexander, 2013; Wu et al., 2017). The EHR and EMR differ in several ways. The EHR can be shared between different systems in different organizations (Heart, Ben-Assuli, & Shabtai, 2017; Joshi & Yesha, 2012; Wang & Alexander, 2013) and is the whole-life record of a patient from birth to death stored in the medical institution, while the EMR is the complete record of a patient’s disease stored in the hospital. The EHR focuses on the health management of residents, while the EMR focuses on the clinical diagnosis of patients. The EHR also contains demographics, medical history, medication and allergies, immunization status, laboratory test results, radiology images, vital signs, personal statistics, and billing information (M, 2014). Finally, the EMR is the record of the care delivery organization (CDO) and belongs to the CDO, while the EHR is a subset of the CDO’s records and belongs to the patients or stakeholders (Garets & Davis, 2007). EHRs have been adopted by many countries and generated about 500 petabytes of data in 2012, a figure expected to reach 25,000 petabytes by 2020 (Feldman, Martin, & Skotnes, 2012).

Table 1. Summary of Major Data Types of Big Data in Health Care

| Data type | Data name | Data description | Data acquisition | Technology/database/system |
| --- | --- | --- | --- | --- |
| Big Data in medicine and clinics | Electronic health record (EHR)/electronic medical record (EMR) | Standard data collection of medical and health information for patients that can be shared among different organizations (Gunter & Terry, 2005). Often comes from medical activities and public health data | Hospital information resources, surgeons’ work, anesthesia activities, physical examination, radiography, magnetic resonance imaging (MRI), computed tomography (CT), patient information, pharmacy, treatment, medical imaging, imaging reports, patient identification information, clinical diagnosis, medication scheme, physicians’ notes, sensor data (Belle et al., 2015; Wang & Alexander, 2013), patient demographics, clinic or inpatient notes, electronic reports | Medical record data exchange standards: Health Level 7 (HL7), Continuity of Care Record (CCR), Continuity of Care Document (CCD), controlled medical vocabulary (CMV), computerized provider order entry (CPOE) (Valdes, Kibbe, Tolleson, Kunik, & Petersen, 2004; Garets & Davis, 2007), Allscripts, Epic Systems, Practice Fusion, NextGen Healthcare, clinical decision support systems, pharmacy management system, EMR Adoption Model (Wang & Alexander, 2013; Garets & Davis, 2007), NoSQL database, clinical data repository (CDR) (Garets & Davis, 2007) |
| | Personal health record (PHR) | As its name suggests, the health-related data and information of patients (Tang, Ash, Bates, Overhage, & Sands, 2006), covering people’s lifelong health information. It is available for further use (Chen et al., 2012) | Allergies and adverse drug reactions, chronic diseases, family history, illnesses and hospitalizations, imaging reports, laboratory test results, medications and dosing, prescription records, surgeries and other procedures, vaccinations, and observations of daily living, as reported by patients (Rumsfeld, Joynt, & Maddox, 2016) | Cloud computing, Health Insurance Portability and Accountability Act (HIPAA), and HL7 (Chen et al., 2012); stored on paper, such as printed laboratory reports, copies of clinic notes, and health histories created by the individual; on electronic devices, such as personal computer-based software, CD, DVD, and smart card; in web applications, such as HealthVault and PatientsLikeMe; and on cloud servers (Chen et al., 2012) |
| | Medical images | Data that present visual information about the interior of the human body | X-ray, CT, histology, positron emission tomography (PET), radiography, MRI, nuclear medicine, elastography, tactile imaging, photoacoustic imaging, echocardiography (Kovalev & Kalinovsky, 2015), ultrasonography, angiography | Statistical shape models (SSMs), medial models, clustering, active appearance models (AAMs), active shape models (ASMs) (Heimann & Meinzer, 2009), image segmentation algorithms, fuzzy C-means (FCM) algorithm (Zhang & Chen, 2004), image registration, picture archiving and communication systems (PACS), Super PACS, RIS, and Digital Imaging and Communications in Medicine (DICOM) (Luo, Wu, Gopukumar, & Zhao, 2016) |
| | Electrocardiogram | Electrical graph recording the heartbeat activity of a person over a period of time, such as 1 minute | Electrocardiograph (ECG) signal | MIT-BIH Arrhythmia Database, American Heart Association (AHA) database, Common Standards for Electrocardiography database, ST-T database, Physikalisch-Technische Bundesanstalt (PTB) database, and Paroxysmal Atrial Fibrillation (PAF) database |
| Big Data in public health and behavior | Vitals | Mainly refers to four signs (temperature, pulse, respiratory rate, and blood pressure) and other physiological data collected outside the health care setting (Rumsfeld et al., 2016) | Temperature, pulse, respiratory rate, and blood pressure | Mobile technology, portable equipment, wearable systems, and advanced devices such as smartphones with third-party applications (HealthKit from Apple, Google Fit from Google, and S Health from Samsung), Android watches, and Google Glass (Safavi & Shukur, 2014), and medical devices such as implantable cardioverter-defibrillators (Rumsfeld et al., 2016) |
| | -omics data | Biological information data catalogued at the molecular level (Skotnes, 2012). Reflects characteristics of the individual for treatment (Rumsfeld et al., 2016) | Genomics and transcriptomics (whole genome sequencing, RNA-seq); metabolomics (nuclear magnetic resonance (NMR), mass spectrometry); proteomics (mass spectrometry); methylomics (pyrosequencing and ChIP-on-chip) | Data Analysis Tool Extension (DAnTE) and DanteR |
| | Molecular biology experiment | Interaction and regulation of biological activity within cells, such as interactions between DNA, RNA, and proteins, and biosynthesis | Molecular cloning, polymerase chain reaction (PCR), macromolecule blotting and probing, microarrays, and next-generation sequencing | NCBI |
| | Human body samples | Data and samples of cells, tissues, and organs of the human body (Bagayoko, Dufour, Chaacho, Bouhaddou, & Fieschi, 2010) | Cells, tissues, and organs | Mayo Clinic Biobanks (http://specimencentral.com/biobank-directory/) |
| Big Data in medical experiment | Clinical trials | Experiments for evaluating new medical treatments (e.g., drug, device) (Kanagaraj & Sumathi, 2012) | Drug efficacy, toxicity, new treatment devices, and procedures | ClinicalTrials.gov |
| | Journal/conference article | Research articles written by researchers | Pubmed.com, New England Journal of Medicine, Lancet, Nature, Science, and Cell | Websites of journal articles, Google Scholar, and Science Citation Index (SCI) |
| Big Data in medical literature | Structured knowledge | MeSH and International Classification of Diseases, 10th revision (ICD-10) | Database in MeSH | NCBI |

The PHR draws on a variety of patient health and social information; its main role is as a data source for medical analysis and clinical decision support (Poulymenopoulou et al., 2015). It includes data on allergies and adverse drug reactions (ADRs), chronic diseases, family history, illnesses and hospitalizations, imaging reports, laboratory test results, medications and dosing, prescription records, surgeries and other procedures, vaccinations, and observations of daily living (ODLs). Unlike document or text data, medical imaging data mainly come from X-ray, CT, histology, PET, radiography, magnetic resonance imaging (MRI), nuclear medicine, ultrasound, elastography, tactile imaging, photoacoustic imaging, echocardiography, and so on. Because they contain visual elements, these data are usually very large (Kovalev & Kalinovsky, 2015).

Big Data in public health and behavior

Big Data in public health and behavior focuses on the physiological data of users, which are often collected by portable equipment (Yan, Qin, Fan, & Wang, 2014), and covers electrocardiograms, vitals, contagion, wearable devices, daily health records, sports, and diet.

An electrocardiogram is an electrical graph recording the heartbeat activity of a person over a period of time, e.g., 1 minute; the recording process involves placing electrodes on the skin. Vitals, short for vital signs, include temperature, pulse, respiratory rate, and blood pressure, the four most important signs of the body’s function. Wearable devices in public health are equipment that records details about people’s lifestyle and vitals, which can assist physicians in the diagnosis and treatment of patients. Advanced devices such as smartphones with third-party applications (HealthKit from Apple, Google Fit from Google, and S Health from Samsung), Android watches, and Google Glass have been developed with sensors for the health care area (Safavi & Shukur, 2014). Since people have become more concerned with their own health on a day-to-day basis, ODLs have come to play a key role in recording the daily health, behavior, signs, and symptoms of patients (Backonja et al., 2012). Additionally, data on people’s sports and diet also contribute significantly to Big Data in public health and behavior. In the Apple iTunes store alone, there are more than 40,000 health care apps available (Aitken & Gauntlett, 2013). It was predicted that by 2017 more than 1.7 billion people would have downloaded health care apps.

In terms of infectious diseases in public health, there is a well-known case in which Google successfully predicted the time and scale of an influenza outbreak by analyzing search engine data.

Big Data in medical experiment

This part of Big Data mainly covers molecular biology, human body data sets, clinical trials, biological samples, gene sequences, clinical and medical research laboratory tests, and “omics” data (Table 1).

Molecular biology, a vital part of both biological and medical experiments, focuses on the interaction and regulation of biological activities within cells, such as interactions between DNA, RNA, and proteins and their biosynthesis (Fenderson & Bruce, 2008). It is closely related to biochemistry and genetics in the study of proteins and genes (Lodish, 2008). The main techniques of molecular biology include molecular cloning, polymerase chain reaction (PCR), macromolecule blotting and probing, microarrays, and so on. Human body data sets include samples of cells, tissues, and organs of the human body, as well as cross-sectional photographs of the human body from the Visible Human Project, which are used to visualize human anatomy in support of medical activities (Vesna, 2000). Similar to human body data sets, biological laboratory specimens also come from sampling of the human body and are stored in biorepositories. When a new drug, vaccine, or medical device has been created, clinical trials should be conducted before it comes into use. A clinical trial, a kind of experiment or observation in medical or clinical research, is a procedure for evaluating the effectiveness of a new medical treatment through studies on human volunteers (DerSimonian & Laird, 1986). Gene sequencing, mainly referring to DNA sequencing, is a medical research activity that determines the precise order of nucleotides within DNA. This process produces a large amount of data recording DNA sequences. Medical research is often performed by researchers in universities, research institutions, and industry. Their objective is to make breakthroughs in understanding the cellular, molecular, and physiological mechanisms of human health; its fundamental parts also include molecular biology, medical genetics, immunology, neuroscience, and psychology (Obenshain, 2004). Omics data are biological information data catalogued at the molecular level, and include genomics, proteomics, metabolomics, transcriptomics, epigenomics, lipidomics, immunomics, glycomics, and RNomics (Wu et al., 2017).

Big Data in medical literature

As the medical/clinical area has developed, research articles and structured knowledge are now being produced at high speed. Additionally, there are many older materials in the medical/clinical area. This literature makes a significant contribution to Big Data in health care.

Hospital information system (HIS) and its evolution

Technologies for Big Data storage and processing, such as the Cassandra database, have been applied; the main characteristic of this tool is that it can accommodate about two million columns in one row, making it more convenient to deal with large volumes of data (Jee & Kim, 2013). In Big Data, including Big Data in health care, one of the most popular processing tools is Hadoop, an Apache project that uses distributed storage and processing to handle tremendous volumes of data (Asante-Korang & Jacobs, 2016; Jee & Kim, 2013). In terms of data management, data warehouses are used to support decision-making, online transaction processing (OLTP), and online analytical processing (OLAP) (Sheta & Eldeen, 2013). In addition, machine learning seems to be the most popular technological approach to data mining in Big Data analysis, and techniques such as retrieval, web mining, decision trees, support vector machines (SVMs), clustering, neural networks, network analysis, knowledge maps, natural language processing (NLP), and multilayer perceptrons (MLPs) have been used. For instance, named-entity recognition is one of the most important techniques in BioNLP, used in recognizing particular entities for processes such as gene normalization and event extraction (Usami, Cho, Okazaki, & Tsujii, 2011). Various techniques for -omics data analysis have emerged with different uses, such as amplified fragment length polymorphism (AFLP) for DNA fingerprinting and interpretation, validation tools for -omics data (Hassani, 2010), and the statistical tools Data Analysis Tool Extension (DAnTE) and Data Analysis Tool Extension R (DanteR) (Polpitiya et al., 2008; Taverner et al., 2012). In addition to data processing techniques, techniques for health care data have progressed in HISs. For example, a typical system is developed for data collection, data management, and data sharing in a hospital information system (HIS) (Abernethy, Wheeler, & Bull, 2011). Currently, new technologies and new models have been found to be effective for structured and unstructured Big Data in health care. Data mining, as well as NLP, has been incorporated in Big Data platforms to handle complex research-oriented problems.
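The idea of distribution mentioned above for Hadoop is commonly expressed through the MapReduce programming model: each data split is mapped independently, the emitted pairs are grouped by key, and each group is reduced to a result. The following is a minimal single-process sketch of that model in plain Python; it does not use any Hadoop API, the function names are our own, and the diagnosis codes are made-up sample values.

```python
from collections import defaultdict
from itertools import chain

# Toy input: each "split" stands in for a block of records stored on a
# different node. The diagnosis codes are made-up sample values.
splits = [
    ["I10", "E11", "I10"],
    ["E11", "J45"],
    ["I10", "J45", "J45"],
]


def map_phase(records):
    """Emit a (key, 1) pair for every record in one split."""
    return [(code, 1) for code in records]


def shuffle(mapped_pairs):
    """Group all emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the grouped values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


# Each split is mapped independently; on a cluster these calls would run
# in parallel on the nodes that hold the data.
mapped = chain.from_iterable(map_phase(split) for split in splits)
counts = reduce_phase(shuffle(mapped))
print(counts)
```

Because the map step never looks at more than one split, adding nodes lets the same job scale to larger data sets, which is the property the text attributes to distributed processing.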

As a sociotechnical subsystem, the HIS commonly provides a quality community of historical data resources, information, and knowledge in health care for hospital administration and patient care (Bagayoko & Dufour, 2010; Kanagaraj & Sumathi, 2011; Roberts, 1985; Tsumoto et al., 2013) (Table 2). The HIS was developed only for administrative management in the early 1960s and gradually expanded to information management after 1970 (Pai & Huang, 2011). Broadly speaking, there are many types of HIS. For instance, the picture archiving and communication system (PACS) is a common HIS for storing and transferring digital images (Joshi & Yesha, 2012). Laboratory information systems (LIS), radiology information systems (RIS), ultrasound information systems (UIS), and EHR, EMR, and PHR systems are also included (He, Jin, Zhao, & Xiang, 2010; Joshi & Yesha, 2012). In terms of handling HL7 format data, the open archival information system (OAIS) model was applied (Celesti, Fazio, Romano, & Villari, 2016). The HIS is able to capture, store, and process health care data and often requires a large number of techniques to assist it. In other words, one of the major research challenges is how to integrate advanced information processing techniques into the HIS (Roberts, 1985). Cloud computing, a technique for data storage and sharing, is widely used in information systems. Its use in HIS is well known and very common for data processing, data backup, and information sharing between different organizations, for example in cloud-based PACS and cloud-based EHR systems (He et al., 2010; Joshi & Yesha, 2012; Kanagaraj & Sumathi, 2011). Cloud security requires a high-quality security management platform that covers many aspects, including data security, application security, system security, network security, and physical security. Additionally, novel techniques have been proposed to improve the quality of HIS. For example, in order to achieve data-level interoperability, the AdapteR Interoperability ENgine (ARIEN) mediation system was proposed (Khan et al., 2014) for HISs with different health care standards. Open-source software is also available to support HIS development. According to Bagayoko & Dufour (2010), web infrastructure, server operating systems, developer tools, and databases are commonly used in Europe and North America.

Table 2. Systems for Acquiring Medical/Clinical Big Data

| System | Description |
| --- | --- |
| HIS | Hospital information system; provides a quality community of historical data resources, information, and knowledge in health care for hospital administration and patient care (Bagayoko et al., 2010; Kanagaraj & Sumathi, 2011; Sirintrapun & Artz, 2016; Tsumoto, Hirano, & Iwata, 2013) |
| LIS | Laboratory information system; often used to collect, store, archive, process, extract, and analyze data in the laboratory; aims to improve turn-around times (TAT) of records, quality of resource utilization, and public health support (Blaya et al., 2007; Sepulveda & Young, 2013) |
| RIS | Radiology information system; used to capture and store data including images, demographic and clinical information, and so on, and also assists in patient registration, report repository, and physician directory with advanced technology (Nance, Meenan, & Nagy, 2013) |
| PACS (ultrasound PACS, endoscope PACS) | Picture archiving and communication systems; a common HIS for storing and transferring digital images (Joshi & Yesha, 2012) |
| EMR | The EMR system is used to maintain medical records and to store, process, and retrieve information. Its aim is to ensure the accuracy of information in order to provide patient control and transparency, interdepartmental communication, and strong reporting capabilities for treatment (Kumar & Aldrich, 2010) |
| Cost accounting | System for collecting, recording, classifying, analyzing, summarizing, allocating, and evaluating financial costs in the medical area |
| Physical examination system | System for checking the signs of patients |
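To make the HL7-format data mentioned in the discussion of HIS above more concrete, the sketch below parses a tiny message written in the pipe-delimited style of HL7 version 2, where segments are separated by carriage returns, fields by `|`, and components by `^`. The assumption that version 2 is the relevant HL7 flavor is ours, the sample message is entirely fictitious, and the helper names `parse_hl7_v2` and `extract_patient` are hypothetical; a production system would use a dedicated HL7 library and handle escaping, repetitions, and optional fields.

```python
# Fictitious sample message in HL7 v2 style (segments joined by "\r").
SAMPLE_MESSAGE = "\r".join([
    r"MSH|^~\&|LAB|HOSPITAL_A|EHR|HOSPITAL_B|201801011200||ADT^A01|MSG0001|P|2.3",
    "PID|1||12345^^^HOSPITAL_A^MR||Doe^Jane||19800101|F",
])


def parse_hl7_v2(message: str) -> dict:
    """Split a message into {segment_id: [list_of_fields, ...]}."""
    segments = {}
    for line in message.strip().split("\r"):
        if not line:
            continue
        fields = line.split("|")
        segments.setdefault(fields[0], []).append(fields)
    return segments


def extract_patient(segments: dict) -> dict:
    """Map a few PID fields to a flat, system-neutral patient record."""
    pid = segments["PID"][0]
    name_parts = pid[5].split("^")          # PID-5: family^given
    return {
        "patient_id": pid[3].split("^")[0],  # PID-3: first component is the ID
        "family_name": name_parts[0],
        "given_name": name_parts[1] if len(name_parts) > 1 else "",
        "birth_date": pid[7],                # PID-7: date of birth (YYYYMMDD)
        "sex": pid[8],                       # PID-8: administrative sex
    }


patient = extract_patient(parse_hl7_v2(SAMPLE_MESSAGE))
print(patient)
```

Mapping each source format to one flat record like this is the simplest form of the data-level interoperability that mediation systems aim for; it is not a description of how ARIEN or the OAIS-based approach works internally.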

Unique Features of Big Data in Health Care

In addition to the “5V” features of Big Data, Big Data in health care has its own unique features, such as heterogeneity, incompleteness, timeliness and longevity, data privacy, and ownership.

Heterogeneity

Big Data in health care often comes in incompatible formats, which can be classified into structured and unstructured data. For example, some EHRs collect data in structured formats, and International Classification of Diseases, 10th revision (ICD-10) codes are structured (Asante-Korang & Jacobs, 2016). However, the majority of Big Data in health care is unstructured, including data from CT, MRI, X-ray, Holter monitoring, angiography, and laboratories (Swan, 2013).

The sources of Big Data in health care can be classified into four categories (Table 1). There is a shortage of tools to analyze the information from these heterogeneous sources. A German calciphylaxis registry proposed a framework and developed a tool to integrate medical records, imaging data, and signal data for the purpose of improving knowledge of rare diseases (Deserno et al., 2014). Windridge and Bober (2014) proposed a kernel-based framework to analyze heterogeneous data in the medical domain, which addressed the missing data problem presented by patients with sparse or absent data modalities. Using the kernel method, regression and classification of heterogeneous medical information can be achieved. Cismondi et al. (2013) developed a classifier to determine which missing ICU data should be imputed and which should not. In a simulated test bed, this method performed better than previous work.
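The general idea of comparing patients across heterogeneous modalities when some modalities are absent can be sketched as follows. This is not the framework of Windridge and Bober (2014); it is a deliberately simple stand-in in which a radial basis function (RBF) kernel is computed per modality and averaged over the modalities that both patients actually have. The modality names, feature values, and the `gamma` parameter are all illustrative assumptions, and this naive averaging is not guaranteed to produce a mathematically valid (positive semidefinite) kernel.

```python
import math


def rbf(x, y, gamma=0.5):
    """RBF similarity between two equal-length feature vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def combined_kernel(p, q, gamma=0.5):
    """Average per-modality RBF kernels over the modalities both patients have.

    Each patient is a dict mapping a modality name to a feature vector,
    or to None when that modality is missing. Returns 0.0 when the two
    patients share no observed modality.
    """
    shared = [m for m in p if p[m] is not None and q.get(m) is not None]
    if not shared:
        return 0.0
    return sum(rbf(p[m], q[m], gamma) for m in shared) / len(shared)


# Made-up toy patients; patient_b has no imaging data.
patient_a = {"labs": [1.0, 2.0], "imaging": [0.5]}
patient_b = {"labs": [1.0, 2.0], "imaging": None}
patient_c = {"labs": [3.0, 0.0], "imaging": [0.5]}

print(combined_kernel(patient_a, patient_b))
print(combined_kernel(patient_a, patient_c))
```

A similarity function of this kind can then be handed to a kernel-based regressor or classifier, so that a patient with a missing modality is still compared on whatever data are available instead of being discarded.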

Incompleteness

Because the data created by monitoring devices, such as electrocardiograms, consist of continuous data streams, it is difficult to save them consistently in the longitudinal record (Kruse, Goswamy, Raval, & Marawi, 2016). It is too expensive to store all the Big Data in health care, a situation that leads to data incompleteness. Additionally, the EHR requires doctors or nurses to record patients’ disease information, such as medications and allergies, and this manual process may also lead to data incompleteness (Hong, Kaur, Farrokhyar, & Thoma, 2015). In Menelik II Referral Hospital, inpatient medical record completeness was 73%, which is low against the standard. Medical records not only support direct patient care but also support clinical audit, epidemiology, medical research, and resource allocation. Improving the completeness of medical records is therefore important for improving the quality of health care (Tola, Abebe, Gebremariam, & Jikamo, 2017).
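A completeness figure such as the 73% reported above can be understood as the share of required items that are actually filled in. The sketch below shows one simple way to compute such a score; the list of required fields and the sample records are hypothetical, and the cited study may have defined and measured completeness differently.

```python
# Hypothetical list of fields that every record is expected to contain.
REQUIRED_FIELDS = ["patient_id", "diagnosis", "medications", "allergies"]


def record_completeness(record: dict) -> float:
    """Fraction of required fields that are present and non-empty."""
    filled = sum(
        1 for name in REQUIRED_FIELDS
        if record.get(name) not in (None, "", [])
    )
    return filled / len(REQUIRED_FIELDS)


def dataset_completeness(records: list) -> float:
    """Mean completeness over all records (0.0 for an empty data set)."""
    if not records:
        return 0.0
    return sum(record_completeness(r) for r in records) / len(records)


# Made-up sample records for illustration only.
records = [
    {"patient_id": "p1", "diagnosis": "asthma",
     "medications": ["salbutamol"], "allergies": ["penicillin"]},
    {"patient_id": "p2", "diagnosis": "hypertension",
     "medications": [], "allergies": None},
]

print(f"completeness: {dataset_completeness(records):.0%}")
```

Tracking such a score per ward or per field makes it easier to see where manual entry is being skipped.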

Timeliness and longevity

In an HIS, there is a delay between the time EHR information is entered and the time it becomes available for electronic access (Medicare & Medicaid Services, 2010). Medical signals such as electrocardiograms (ECG), single photon emission computed tomography (SPECT) images, MRI, and EEG are functions of time and are thus strongly time-dependent. Keeping medical/health information current is a major challenge for Big Data analytics in health care, and HISs should maximize the timeliness of data. At the same time, the retention period of medical records differs among hospitals. For some familial or genetic diseases, it is useful to know the family history in order to support medical decision-making. To date, however, there is no link between a person’s medical records and those of his/her family members.

Data privacy

Owing to the sensitivity of health care data, there are significant concerns regarding privacy and security (Kruse et al., 2016; Naito, 2014). Extreme care should be taken to protect patient privacy, and privacy concerns limit the linking of external data to individual insured data, a linkage that could improve consumers’ health-related experience and personalize service and care (Yuen-Reed & Mojsilović, 2016). Because much health care information is centralized, the data are highly vulnerable to attacks (Mohr, Burns, Schueller, Clarke, & Klinkman, 2013). Owing to privacy issues, Herland et al. (2014) used synthesized EMR/EHR and PHRs, with help from a medical professional, to conduct their research. Health care applications, such as Google Health, promise consumers “complete control over your data,” meaning that personal information will not be sold or shared without the consumer’s explicit permission (Steinbrook, 2008). Across countries, there are two patterns of policies and regulations to protect health care data. In one pattern, governments build on basic privacy laws by passing additional laws, policies, and regulations to protect personal health care information, such as HIPAA in the US, the Health Records and Information Privacy Act 2002 in Australia, and the Medical Privacy Act and Healthcare Insurance Act in France. In the other pattern, governments treat personal health care information as part of personal or sensitive information and pass laws to protect that broader category, such as the Data Protection Act in England and the Personal Information Protection and Electronic Documents Act (PIPEDA) in Canada.
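One common technical measure for the privacy concerns described above is to replace patient identifiers with keyed pseudonyms before data are shared or linked. The sketch below illustrates the idea with Python’s standard `hmac` and `hashlib` modules; the record layout, the list of direct identifiers, and the function names are assumptions made for illustration. Pseudonymization of this kind is only one ingredient of privacy protection and does not by itself satisfy any of the laws named above.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Derive a stable pseudonym from an identifier using HMAC-SHA256."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, secret_key: bytes,
               direct_identifiers=("name", "address", "phone")) -> dict:
    """Drop direct identifiers and replace the patient ID with a pseudonym."""
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["patient_id"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


# Demo key and made-up record; a real key must be generated and stored securely.
demo_key = b"demo-key-not-for-production"
raw_record = {
    "patient_id": "12345",
    "name": "Jane Doe",
    "phone": "555-0100",
    "diagnosis": "asthma",
}

shared_record = deidentify(raw_record, demo_key)
print(shared_record)
```

Because the same key always yields the same pseudonym, records for one patient can still be linked across data sets by the key holder, while a recipient without the key cannot recompute the mapping from identifier to pseudonym.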

Ownership

Consumers who have medical needs legally own their health data. In practice, however, the data may be stored and controlled by hospitals, physicians, laboratories, clinics, pharmacies, and government agencies in innumerable, incompatible data silos, so consumers may lack access to and control over their own health care data. One effective approach to this problem is the cooperative, an old and successful form of corporation that is entirely owned by citizens. Each consumer has one account that stores and manages all of his or her health care data, and can share subsets or all of the data for research purposes (Pentland, Reid, & Heibeck, 2013).
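The account model described above, in which one consumer-owned account holds all health data and the owner decides which subsets to share, can be sketched as a small data structure. The class name, method names, and data categories below are hypothetical and are meant only to illustrate owner-controlled sharing, not any system described by the cited authors.

```python
from dataclasses import dataclass, field


@dataclass
class HealthAccount:
    """A single consumer-owned account holding all of that person's data."""
    owner: str
    records: dict = field(default_factory=dict)  # category -> list of entries
    grants: dict = field(default_factory=dict)   # recipient -> set of categories

    def add_record(self, category: str, entry: str) -> None:
        self.records.setdefault(category, []).append(entry)

    def grant(self, recipient: str, categories: list) -> None:
        """Let a recipient read the named categories (owner's decision)."""
        unknown = set(categories) - set(self.records)
        if unknown:
            raise KeyError(f"unknown categories: {sorted(unknown)}")
        self.grants.setdefault(recipient, set()).update(categories)

    def revoke(self, recipient: str) -> None:
        self.grants.pop(recipient, None)

    def view_for(self, recipient: str) -> dict:
        """Return only the subset of data the recipient may see."""
        allowed = self.grants.get(recipient, set())
        return {c: list(self.records[c]) for c in sorted(allowed)}


account = HealthAccount(owner="consumer-1")
account.add_record("labs", "HbA1c 5.4%")
account.add_record("imaging", "chest X-ray report")
account.grant("research-study", ["labs"])
print(account.view_for("research-study"))
```

Keeping the grants inside the account, instead of inside each hospital or laboratory system, is what gives the owner a single place to share or withdraw access.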

Importance of Big Data in Health Care

It is important to extract valuable information from Big Data and discard useless fragments. As the main issue for this discussion, Big Data in health care could produce considerable economic benefits through the application of Big Data analytics (BDA). For example, a significant amount of money could be saved in the health care industry (Asante-Korang & Jacobs, 2016). Additionally, BDA could be applied in clinical diagnosis, medical research, hospital management, and fundamental demand in medicine. Through the use of Big Data techniques, patients may receive personalized medicine and patient-centric care. This argument supposes that Big Data would help to provide novel approaches to deal with issues in health care (Kruse, Goswamy, Raval, & Marawi, 2016).

The perspective of the research institution and the hospital

Research institutions could better understand the mechanisms and effects of newly developed drugs through BDA. For example, BDA could be used to reprocess cancer data to hunt for new cancer drugs (Marx, 2013). By using statistical tools and algorithms, researchers could improve clinical trial design and reduce trial failures (Wullianallur Raghupathi & Raghupathi, 2014).

Physicians could use clinical decision support systems (CDSS) with BDA to make more informed decisions, which may improve the quality of patient care (K. Jee & G.-H. Kim, 2013; Kim, Park, Yi, & Kim, 2014). When Big Data is allowed to inform clinical decision-making, new practices and treatment guidelines emerging from clinical research may be integrated into care and lead to better results. BDA and computer-aided diagnostics may be used to save time in cancer detection and to reduce the false-positive rate of cancer diagnosis (Costa, 2014). In cardiology, computing and Big Data technology enable cardiologists to read patients’ medical records via smartphones, which helps in identifying emergency cases in need of immediate treatment (Hsieh, Li, & Yang, 2013).

The perspective of the government or the public

BDA could reduce costs in the medical domain, estimated at approximately 8% of national health care expenditures for the US government (Manyika et al., 2011). In Italy, by exploiting the admissions for “laparoscopic appendectomy” surgery in different sanitary districts, it was possible to categorize districts based on cost efficiency and timeliness by using the number of admissions and the average days of hospitalization. This data analysis provides an automatic and continuous monitoring of the sanitary districts. The results of this data analysis provide useful insights into reducing cost and increasing the effectiveness and efficacy of health care services (Mancini, 2014a).

BDA could help governments prevent the spread of infectious diseases. In Pakistan, BDA with smartphone technology helped in detection and prevention of the early stage of the dengue fever epidemics. The method was also used to detect outbreaks of flu epidemics in the US (Pentland et al., 2013). Governments can thus respond more quickly to epidemics and help people avoid the disease.

BDA has the potential to reveal regional health problems. For example, Duke University led a project that involved building an integrated clinical data warehouse by combining millions of patient records from their EHRs with geographic information system data (Braunstein, 2015). Based on the combined data, this project reveals the social determinants of health.

The perspective of patients and their relatives

Using health care mobile phone applications and other online health-related websites, patients can store, retrieve, manage, and share their health data. Over the long term, this process will improve health care and decrease costs, especially for patients who have complicated chronic conditions (Steinbrook, 2008), such as diabetes. Some diabetes applications offer a variety of functions, including medication or insulin logs, self-monitoring blood glucose recording, and prandial insulin dose calculators (Demidowich, Lu, Tamler, & Bloomgarden, 2012), and others integrate health care providers who can access the patients’ records and formulate personalized feedback. Thus, patients can take the right treatments and live healthier, more comfortable lives (Asri, Mousannif, Al Moatassime, & Noel, 2015).

Through Big Data techniques, patients may have personalized medicine and patient-centric care (Chawla & Davis, 2013; Collins, 2016). Chawla and Davis (2013) constructed a framework called the Collaborative Assessment and Recommendation Engine (CARE) for patient-centered disease prediction and management. It can generate personalized disease predictions and management plans. In addition, through BDA, three drugs have been identified and used in specific groups of cancer patients. Dabrafenib is used to treat melanoma carrying the BRAF mutation V600E; trastuzumab, a targeted therapy, is used to treat breast cancer with amplification or overexpression of the gene encoding Her2/Neu; and imatinib is used to treat different types of tumor that contain the fusion protein BCR-ABL (Costa, 2014).

Through BDA, patients may have their diseases detected earlier, receive treatment earlier, and have better outcomes (K. Jee & G.-H. Kim, 2013; Kim et al., 2014). In daily life, BDA can help patients and their relatives monitor their respective conditions.

Common Approaches for Analyzing Big Data in Health Care

With the growing awareness of data as an asset, more and more data-mining approaches are adopted in order to gain insights from large volumes of data. In medicine and health care, a data-rich environment generates an enormous amount of data every day. Thus, we need to use data-mining approaches such as classification, clustering, regression analysis, and association rules to analyze big health care data.

Classification

Classification is the process of organizing data into categories for its most effective and efficient use. Classification is widely applied in mining health care data. Several representative studies are summarized below.

Primary care influences child health outcomes by managing illness and providing preventive and health promotion services. New Zealand is in a strong position to analyze patterns of childhood morbidity due to universal enrollment with a primary care provider at birth. However, analyzing morbidity patterns within these extracted data is problematic because primary care practices do not consistently or frequently use diagnostic labeling and there is marked variability between clinicians and conditions. A study conducted by MacRae et al. (2015) aimed to extend the use of Pattern Recognition Over Standard Aesculapian Information Collections (PROSAIC) to identify childhood respiratory conditions within primary care consultations by building an algorithm to classify the unstructured clinical narrative written by clinicians. Three independent sets of 1,200 child consultation records were randomly extracted from a data set of all general practitioner consultations in participating practices between January 1, 2008, and December 31, 2013, for children younger than 18 years of age (n=754,242). Each consultation record within these sets was independently classified by two expert clinicians as respiratory or non-respiratory and subclassified according to respiratory diagnostic categories to create three “gold standard” sets of classified records. These three “gold standard” record sets were used to train, test, and validate the algorithm. Then, sensitivity, specificity, positive predictive value, and F-measure were calculated to illustrate the algorithm’s ability to replicate judgments of expert clinicians within the 1,200 record “gold standard” validation set. This algorithm that uses primary care Big Data can accurately classify the content of clinical consultations. It enables accurate estimation of the prevalence of childhood respiratory illness in primary care and the resultant service utilization. 
The algorithm is able to analyze very large data sets, including routinely recorded unstructured clinical narratives. These data sets would be impractical to analyze manually.
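The four evaluation measures named above can be computed directly from the counts of agreements and disagreements between the algorithm and the gold standard. The following is a minimal sketch, not the code used by MacRae et al. (2015); the function name and the toy labels are hypothetical, and it assumes that both classes occur in the gold standard and in the predictions so that no denominator is zero.

```python
def classification_metrics(gold, predicted):
    """Compare binary labels (True = respiratory) against a gold standard."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)          # true positives
    tn = sum(1 for g, p in pairs if not g and not p)  # true negatives
    fp = sum(1 for g, p in pairs if not g and p)      # false positives
    fn = sum(1 for g, p in pairs if g and not p)      # false negatives

    sensitivity = tp / (tp + fn)   # share of true respiratory records found
    specificity = tn / (tn + fp)   # share of non-respiratory records rejected
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    f_measure = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "f_measure": f_measure,
    }


# Hypothetical toy data: 10 consultation records, 4 of them respiratory.
gold = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]
metrics = classification_metrics(gold, predicted)
print(metrics)
```

In this toy example the algorithm misses one respiratory record and mislabels one non-respiratory record, so sensitivity, positive predictive value, and F-measure are all 0.75.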

Frantzidis et al. (2010) applied data classification techniques to emotion recognition for health care applications, taking into account the bidirectional emotion theory model that treats emotions as mixtures of two (orthogonal and independent) dimensions, namely, valence and arousal. Specifically, this paper uses classification rules derived from the C4.5 algorithm and a pattern classifier based on the Mahalanobis distance. It then favors the role of multiphysiological recordings for the enhancement of emotion discrimination and the use of metadata structure designs via the extensible markup language (XML) for linking the various system components.

Fan et al. (2011) developed a hybrid model named case-based reasoning and fuzzy decision tree (CBFDT) for medical data classification in two medical domains: breast cancer diagnosis and liver disorder diagnosis. In this paper, they introduced the method and algorithm of a case-based fuzzy decision tree (FDT) model for medical classification problems. Two medical data sets including liver disorders and Breast Cancer Wisconsin are selected from University of California Irvine (UCI) database. More than 900 data sets are used to conduct this experiment. Decision tree induction is free from parametric assumptions, and it generates a reasonable tree by progressively selecting attributes to branch the tree. By combining all kinds of medical features of liver disorders and Breast Cancer Wisconsin database, this research applies an FDT to develop a forecasting model for generating decision rules in disease classification. This classification model integrates a data clustering technique, an FDT, and genetic algorithms (GAs) to construct a medical classification system based on medical database. It can be divided into four major steps: (1) screening medical database from UCI data set; (2) clustering case library into smaller cases; (3) establishing FDT; and finally (4) outputting the classification results.

Clinical data usually contain numerous features with small sample size, leading to degradation in accuracy and efficiency of the system by curse of dimensionality. This leads to the degradation of classifier system’s performance in high-dimensional data sets because irrelevant features not only lead to insufficient classification accuracy but also add extra difficulties in finding potentially useful knowledge. Azar and Hassanien (2015) presented a linguistic hedges neuro-fuzzy classifier with selected features (LHNFCSF) for dimensionality reduction, feature selection, and classification. The new classifier is compared with the other classifiers for different classification problems. All data sets are in the public domain. The data sets are breast cancer Wisconsin diagnostic, breast cancer Wisconsin prognostic, erythemato-squamous disease, and thyroid disease data set. These data sets are obtained from the well-known UCI machine learning repository. The results indicate that applying LHNFCSF not only reduces the dimensions of the problem but also improves classification performance by discarding redundant, noise-corrupted, or unimportant features. The results strongly suggest that the proposed method not only helps reduce the dimensionality of large data sets but also can speed up the computation time of a learning algorithm and simplify the classification tasks.

Estella et al. (2012) designed an advanced system for autonomously classifying brain MRI images of neurodegenerative diseases, with the main purpose of assisting in decision-making in classification tasks. The method was tested on data from a large database (more than 1,500 patients were analyzed), with a sensitivity of and specificity close to 90%, which are considerably better than those predicted by human experts.

Clustering

Clustering is the task of grouping a set of objects in such a way that objects in the same cluster are more similar to each other than those in other clusters. Clustering techniques are widely used for exploratory data analysis, with applications including patient segmentation, outlier health care data detection, disease prediction, and clustering of patients.

Elbattah and Molloy (2017) employed clustering to segment patients from a data-driven viewpoint. The Irish Hip Fracture Database (IHFD) is the primary source of data used in the study. Its records contain ample information about patients’ journeys from admission to discharge. A set of data pre-processing procedures was conducted for two purposes: (1) dealing with data anomalies and (2) extracting additional features that are considered indicators of care quality. The authors used the k-means algorithm as the partitional clustering approach. The k-means algorithm uses a simple iterative technique to group the points in a data set into clusters that share similar characteristics.
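The iterative technique behind k-means alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points, until the centroids stop changing. The sketch below illustrates this in plain Python. It is not the implementation used in the IHFD study; the function name, the two features (age and length of stay), and the six records are hypothetical.

```python
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on tuples of numbers; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments are stable
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical records: (age in years, length of stay in days).
patients = [(65, 5), (67, 6), (70, 4), (88, 20), (90, 22), (85, 19)]
centroids, clusters = kmeans(patients, k=2)
print(sorted(sorted(c) for c in clusters))
```

On these six records the procedure separates the three younger patients with short stays from the three older patients with long stays. In practice, features measured on different scales are usually standardized before clustering, and the number of clusters has to be chosen by the analyst.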

Christy et al. (2015) proposed two outlier detection algorithms: distance-based outlier detection and cluster-based outlier detection. The main purpose of the algorithms was to remove outliers that are irrelevant or only weakly relevant to the analysis of health care data. Experimental evaluation based on the metrics of F-score and likelihood ratio shows that the cluster-based outlier detection method outperforms the distance-based outlier detection method.

Huang and Yao (2016) proposed a novel clustering approach for multidimensional physical health data based on artificial ant colony optimization. This method is determined through testing to be an effective and efficient approach to clustering health and medical data for further analysis.

Paul and Hoque (2010) proposed to use the background knowledge of medical domain in the clustering process to predict the likelihood of diseases. The developed algorithm can handle both continuous and discrete data and perform clustering based on anticipated likelihood attributes with core attributes of disease in data point. In this paper, its effectiveness has been demonstrated by testing it on a real-world patient data set.

Hastie et al. (2005) conducted a test in which 188 individuals (59.0% female) completed several psychological instruments and underwent ischemic, pressure, and thermal pain assessments. Then, 13 separate pain measures were obtained by using three experimental pain modalities with several parameters tested within each modality. Cluster analyses of PSI scores revealed four distinct clusters, and significant correlations were found between psychological measures and index scores. These findings highlight the need for future investigation to identify patterns of responses across different pain modalities in order to more accurately characterize individual differences in responses to experimental pain.

Regression analysis

Regression analysis is widely used in analyzing health care Big Data for estimating the relationships among variables or properties. The main research issues include trend features of data sequences, prediction of data sequences, and relationships between data.

With the emergence of administrative databases, the ability to access longitudinal patient data to adjust for comorbidity has improved considerably. This raises the issue of the most appropriate lookback period to determine patients’ disease status for risk estimation. Most research has used relatively short lookback durations, but longer lookback periods are likely to capture more conditions per patient, as well as assign comorbidities to a greater proportion of patients. Preen et al. (2006) conducted a study to determine the impact of different comorbidity ascertainment lookback periods on modeling post-hospitalization mortality and readmission. Data were extracted for ~1.1 million patients admitted to hospital in Washington State from July 1990 to December 1996. Hierarchically nested Cox regressions were used to model mortality within one year and readmission within 30 days of index separation. Additionally, deaths within one year and readmissions within 30 days of index hospitalization were analyzed using logistic regression, and the receiver operating characteristic (ROC) area under the curve (AUC) was determined for each hierarchically nested lookback model in order to estimate the predictive power of the different models. Longer lookback periods resulted in more comorbidity being identified. For the entire sample, 46.8% of comorbidity observed across the five-year lookback period was recorded at index hospitalization. For readmission, lookback periods of five years performed better than shorter durations for both patient groups.

Risk adjustment is an important component of outcomes and quality analysis in surgical health care. However, there are some concerns that should be addressed if risk-adjustment models avoid subjective data elements, such as history of comorbidities, and rely on objective data, such as laboratory values or other machine-collected variables that do not require subjective interpretation and input of hospital personnel.

A study was conducted by Anderson and Chang (2015) to determine whether machine-collected data elements could perform as well as a traditional, full risk-adjustment model that includes other physician-assessed and physician-recorded data elements. This research uses all available National Surgical Quality Improvement Program (NSQIP) data from January 1, 2005, to December 31, 2010. This nationally validated program measures more than 135 variables on each patient and follows up each patient for 30 days postoperatively. The primary analysis included all patients in the database who were categorized as having had an operation performed by a general surgeon or surgeons in some surgery subspecialties and having an adverse event. Multivariate logistic regression models were created to predict either mortality or any complication in the inpatient setting or within 30 days of surgery. The researchers then compared the ROC AUC of each regression using objective preoperative risk variables to its corresponding regression with all variables. A total of 745,053 patients were included. The difference in AUC between models with all variables and models with only objective variables ranged from −0.0073 to 0.1944 for mortality and from 0.0198 to 0.0687 for complications. These data suggest that it is possible to create a risk-adjustment system with a high discriminatory value based only on objective variables. By restricting data collection to objective data, we can reduce concerns about reliability and validity as well as threats of gaming the system from attempting to increase the risk score of patients through subjective variables.
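The AUC comparison described above can be illustrated with a small calculation. The AUC equals the probability that a randomly chosen patient who had the outcome receives a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half. The sketch below computes it by comparing all such pairs. The outcomes and predicted risks are hypothetical toy values, not NSQIP data, and the function name is an assumption made for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    labels: 1 if the outcome occurred, 0 otherwise.
    scores: predicted risk from a model; ties count as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical outcomes and predicted risks for seven patients.
outcome = [1, 1, 1, 0, 0, 0, 0]
risk_all_variables = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
risk_objective_only = [0.8, 0.7, 0.3, 0.5, 0.4, 0.2, 0.1]

auc_all = roc_auc(outcome, risk_all_variables)
auc_objective = roc_auc(outcome, risk_objective_only)
print(auc_all, auc_objective, auc_all - auc_objective)
```

Here the full model ranks 11 of the 12 patient pairs correctly and the objective-only model ranks 10 of 12 correctly, so the difference in AUC is 1/12. The pairwise comparison is quadratic in the number of patients; for data sets of the size reported above, a rank-based computation would be used instead.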

Kennedy et al. (2013) conducted a retrospective cohort study. In this paper, they identified all Veterans Health Administration (VHA) patients without recent cerebral and cardiovascular (CCV) events treated at twelve facilities from 2003 to 2007 and predicted risk using the Framingham risk score (FRS), logistic regression, generalized additive modeling, and gradient tree boosting.

Oztekin et al. (2009) used three different variable selection methods on a large and feature-rich data set to generate a consolidated set of factors and use them to develop Cox regression models for heart–lung graft survival. The main objective of this study was to improve the prediction of outcomes following combined heart–lung transplantation by proposing an integrated data-mining methodology. The data files were obtained from United Network for Organ Sharing (UNOS) using a formal data requisition procedure. The complete data set consists of 443 variables and 61,391 records. These variables included the socio-demographic and health-related factors of both the donor and the recipients. There are also procedure-related factors included in the data set. The results indicated that the proposed integrated data-mining methodology using Cox hazard models better predicted graft survival with different variables than the conventional approaches commonly used in the literature.

Association rules

Association rule mining aims to discover associations between items in large databases. The typical association rule mining methods include Apriori (Agrawal, Imieliński, & Swami, 1993) and Frequent Pattern (FP)-tree growth (Han, Pei, & Yin, 2000). Association rule mining is normally a two-step process where in the first step, frequent item-sets are discovered (i.e., item-sets whose support is no less than a minimum support) and in the second step, association rules are derived from the frequent item-sets using some measures of interestingness.
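The two-step process can be sketched on a toy example. The code below enumerates candidate item-sets by brute force rather than using the candidate pruning of Apriori or the FP-tree structure, so it only illustrates the definitions of support and confidence on small data. Confidence is used here as the measure of interestingness, which is an assumption; the function names, thresholds, and diagnosis labels are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: find all item-sets whose support is at least min_support."""
    items = sorted({item for t in transactions for item in t})
    n = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if set(candidate) <= t) / n
            if support >= min_support:
                frequent[frozenset(candidate)] = support
                found = True
        if not found:  # no frequent set of this size, so none of a larger size
            break
    return frequent


def association_rules(frequent, min_confidence):
    """Step 2: derive rules antecedent -> consequent from frequent item-sets."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                # Every subset of a frequent item-set is itself frequent.
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    rules.append(
                        (set(antecedent), set(itemset - antecedent), support, confidence)
                    )
    return rules


# Hypothetical patient records, each a set of recorded diagnoses.
records = [
    {"diabetes", "hypertension"},
    {"diabetes", "hypertension", "obesity"},
    {"hypertension"},
    {"diabetes", "obesity"},
    {"diabetes", "hypertension"},
]
frequent = frequent_itemsets(records, min_support=0.4)
rules = association_rules(frequent, min_confidence=0.7)
for antecedent, consequent, support, confidence in rules:
    print(antecedent, "->", consequent, round(support, 2), round(confidence, 2))
```

With these thresholds, three rules survive: diabetes and hypertension imply each other with confidence 0.75, and obesity implies diabetes with confidence 1.0. The rule from diabetes to obesity is dropped because its confidence is only 0.5.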

Antonie et al. (2001) used the Apriori algorithm to discover association rules among the features extracted from the mammography database and the category to which each mammogram belongs. They constrained the association rules to be discovered such that the antecedent of the rules is composed of a conjunction of features from the mammogram, while the consequent of the rules is always the category to which the mammogram belongs. Once the association rules are found, they are used to construct a classification system that categorizes the mammograms as normal, malignant, or benign.

In a medical database, the most complete and detailed information is anamnesis data, which contain disease name, prescription, patient’s detail information, etc. Through this method, it is possible to find the association rules between diseases. Driven by this, Kuo et al. (2007) proposed a novel framework of data mining that clusters the data first and then follows with association rule mining. The first stage uses the ant system-based clustering algorithm (ASCA) and ant k-means (AK) to cluster the database, while the ant colony system (ACS)-based association rule mining algorithm is applied to mine the association rule for each cluster. Experimentation on the data sets provided by the National Health Insurance Plan of Taiwan demonstrates that the proposed method can find the hidden rules that may occur less often but have robust relationships.

Systems and Applications for Analyzing Big Data in Health Care

Big Data can provide support across many aspects of health care. BDA has made progress to different degrees in CDS, remote medical information services, public health, disease pattern analysis, and personalized medicine. There are some specific applications and potential opportunities in these areas.

CDSS

A CDSS can provide a large amount of medical support for clinicians, helping them to make diagnoses and choose the best treatments. CDSS helps in supplementing the knowledge of clinicians, preventing human negligence, and reducing the costs while improving the quality of medical treatment. Representative data-driven CDSSs include the Health Evaluation Through Logical Processing (HELP) system, Quick Medical Reference (QMR) system, Iliad system, and MYCIN system.

The HELP system

The Health Evaluation Through Logical Processing system (Gardner, Pryor, & Warner, 1999) is the first data-driven clinical decision-making and hospital information system (HIS). The system uses the knowledge base to make decisions from the multi-source clinical data stored in its integrated clinical database. For example, a serum potassium of 6.2 meq/L will trigger an elevated potassium alert to the nurse caring for a patient via a digital pager. Time-driven decision-making capabilities are also available within the HELP system. Using natural language processing, data from transcribed reports such as handwritten medical records have become a major source of data for decision-making.

The HELP system consists of a knowledge base, decision-making processor, data and time driver, data review alerts, accounting system, longitudinal patient data repository, and other components.

The QMR system

QMR is a typical CDSS to help physicians, using the knowledge base of INTERNIST-1/CADUCEUS. This knowledge base is widely used as a medical book, which contains 750 diseases, 5,000 clinical symptoms, and more than 50,000 disease relationships. QMR was one of the earliest CDSSs to use artificial intelligence and probability ranking system.

Because many of the diseases in the system are rare and documented, an ad hoc scoring model is proposed to encode the relationship between specific clinical symptoms and disease. One of the factors limiting the use of QMR is that its knowledge base needs to be constantly updated. The significance of QMR lies in its powerful knowledge base, which is used as the basic model of other knowledge base systems.

The Iliad system

Iliad is a medical expert consulting system developed by the University of Utah School of Medicine. It is used as a consultation tool or a simulation training tool for CDS and teaching (Lincoln, 1998).

The Iliad consultant utilizes a number of inferencing mechanisms to emulate the strategy of a medical expert in working with a patient. The knowledge in Iliad is represented in Bayesian and Boolean frames. These frames permit the use of sensitivities and specificities to describe the relationship of a disease to its manifestations and provide a basis for explaining its conclusions. Iliad has four basic components: the inference engine, the user interface, the data driver, and the best information algorithm.
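How a sensitivity and a specificity link a disease to one of its manifestations can be shown with a single application of Bayes’ rule. The sketch below is only an illustration of that calculation and is not Iliad’s inference engine; the function name and all numbers are hypothetical.

```python
def posterior_probability(prior, sensitivity, specificity, finding_present):
    """Update the probability of a disease after observing one finding.

    sensitivity = P(finding present | disease)
    specificity = P(finding absent | no disease)
    """
    if finding_present:
        p_given_disease = sensitivity
        p_given_no_disease = 1 - specificity
    else:
        p_given_disease = 1 - sensitivity
        p_given_no_disease = specificity
    numerator = p_given_disease * prior
    return numerator / (numerator + p_given_no_disease * (1 - prior))


# Hypothetical values: 10% prior, finding with 90% sensitivity, 80% specificity.
after_positive = posterior_probability(0.10, 0.90, 0.80, finding_present=True)
after_negative = posterior_probability(0.10, 0.90, 0.80, finding_present=False)
print(after_positive, after_negative)
```

With these numbers, observing the finding raises the probability of disease from 10% to about 33%, and its absence lowers it to about 1.4%. The same two quantities also give a basis for explanation: the finding counted for the disease because it is far more common in patients with the disease than in those without it.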

The MYCIN system

MYCIN is an interactive expert system for the diagnosis and treatment of central nervous system infections (Berner, 2003). It is composed of three subsystems: consultation, interpretation, and rules. According to the clinical manifestations and laboratory results of patients, MYCIN imitates the expert reasoning process, assists clinicians in determining bacterial species, and makes clinical recommendations. The system adopts the method of if–then inference rules and contains more than 400 rules that embody expert knowledge and judgment.

Remote medical information systems

The aggregated electrocardiogram (ECG) and images from hospitals worldwide can become Big Data, which could be used to develop an e-consultation program helping on-site practitioners deliver appropriate treatment. Real-time teleconsultation and telediagnosis of ECG and images can be practiced via an e-platform for clinical, research, and educational purposes.

With respect to large-scale data research, Chia and Syed (2011) used Big Data computing to generate a predictor of the mortality risk for patients with acute coronary syndromes in 2011. This predictor was developed through data mining and machine learning, based on 24-hour continuous ECG readings over 4,000 patients’ trials. In each trial, 24-hour ECG readings were collected in a two-year period. This Big Data-based predictor can predict over 50% of deaths with fewer false positives as compared with the traditional ECG analysis, which was conducted based on a smaller segment of ECG signals. This approach can be easily extended to other clinical and non-clinical applications focused on approximate sequential pattern discovery in massive time-series data sets.

To make telemedicine more efficient, medical wearable devices that apply Big Data-mining and analysis techniques are used. For example, patients with dementia (such as Alzheimer type) need to be looked after day and night in order to manage their negative behaviors, which requires a large investment of labor and capital. To resolve this problem, real-time health monitoring devices have been developed to capture a large amount of data. Based on these real-time data, it can be determined whether a patient with dementia is agitated. At the same time, medical Big Data also pose challenges to data cleaning; poor-quality data should be identified and rejected to ensure that the results of data mining are correct. Moreover, data captured from remote monitoring devices can be mined to realize long-term prognoses.

A Context Processing Algorithm (CPA) (Moore, Xhafa, Barolli, & Thomas, 2013) is proposed to address the issues encountered in decision support in medical diagnosis and potential prognoses based on the event–condition–action (ECA) rule concept. CPA regards captured Big Data as a kind of contextual information to carry out data processing in intelligent context-aware systems.
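The ECA rule concept can be made concrete with a small sketch: when an event arrives, each rule registered for that event checks its condition against the current context and, if the condition holds, performs its action. The code below illustrates the rule concept only and is not the CPA of Moore et al. (2013); the class, the event name, the context fields, and the threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EcaRule:
    event: str                           # which event triggers the rule
    condition: Callable[[Dict], bool]    # test applied to the context
    action: Callable[[Dict], str]        # what to do when the test passes


def dispatch(rules: List[EcaRule], event: str, context: Dict) -> List[str]:
    """Run the action of every rule that matches the event and context."""
    return [
        rule.action(context)
        for rule in rules
        if rule.event == event and rule.condition(context)
    ]


# Hypothetical rule: alert when a resting patient has a high heart rate.
rules = [
    EcaRule(
        event="heart_rate_reading",
        condition=lambda ctx: ctx["heart_rate"] > 120 and ctx["activity"] == "resting",
        action=lambda ctx: f"alert: heart rate {ctx['heart_rate']} bpm at rest",
    )
]

resting = {"heart_rate": 135, "activity": "resting"}
exercising = {"heart_rate": 135, "activity": "exercising"}
print(dispatch(rules, "heart_rate_reading", resting))
print(dispatch(rules, "heart_rate_reading", exercising))
```

The same reading produces an alert in one context and none in the other, which is the point of treating the captured data as contextual information rather than as isolated measurements.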

On the basis of Big Data, pervasive remote medical systems are designed for both healthy and ill people. Páez et al. (2015) proposed an architecture including the application of cloud computing, Big Data, and Internet of things approaches to make sure chronic or non-chronic patients as well as healthy people are monitored in different environments. Family members, emergency systems, and hospitals can interact with the patients whenever and wherever possible.

While Big Data promotes the function of medical remote monitoring and diagnosis, the development of telemedicine also enriches the connotation of Big Data. Traditionally, medical Big Data refers to EHR and remote monitoring health data. Now, however, medical Big Data, including users’ behaviors, physical strength, and mental state data, has been rapidly generated (Redmond et al., 2014). Technological advances in the medical field, such as medical video communications, also provide a new type of medical Big Data. For instance, a light-field-based 3D cloud telemedicine system (Wang, Xiang, Pickering, & Zhang, 2016) that combines Big Data analysis with 3D technologies is proposed to mine big video data.

Applications in public health

In the field of public health, BDA represents a new solution that can mine web-based and social media data to predict disease outbreaks based on consumers’ searches, social content, and query activity. Systems in public health also support clinicians and epidemiologists performing analyses across patients and care venues to help identify disease trends and drug safety.

BDA is often used for monitoring of disease networking. An example is Google’s use of BDA to study the timing and location of search engine queries to predict disease outbreaks. Research shows that one-third of consumers currently use social networking for health care purposes (Facebook, YouTube, blogs, Google, Twitter). As demand for access to health information from social networking sites continues to proliferate, BDA can potentially support key prevention programs such as disease surveillance and outbreak management.

The Global Burden of Disease Study (GBD) is a comprehensive regional and global research program of disease burden that assesses mortality and disability from major diseases, injuries, and risk factors. GBD is a collaboration of more than 1,800 researchers using medical Big Data from 127 countries. The 2015 report (Collaborators, 2017) showed that globally, diarrhea was a leading cause of death among all ages, as well as a leading cause of disability-adjusted life years (DALYs) because of its disproportionate impact on young children.

BDA is also widely applied to monitor drug safety, particularly ADRs, and to identify susceptible populations. An ADR is defined as an appreciably harmful or unpleasant reaction, resulting from an intervention related to the use of a medicinal product, which predicts hazard from future administration and warrants prevention, specific treatment, alteration of the dosage regimen, or withdrawal of the product (Edwards & Aronson, 2000).

With the help of Big Data, health departments or medical companies can efficiently take action when they detect potential ADRs among the people who take a medication. In 2004, Wilson et al. proposed that Knowledge Discovery in Databases (KDD) is a more effective way to determine the presence and assess the strength of ADR signals. At this point, numerous data-mining techniques have been used in drug safety, such as cluster analysis, link analysis, deviation detection, and disproportionality assessment.

As Big Data emerges, health social media sites are regarded as a fast and direct data resource for scientists to get first-hand ADR information. Compared with ADRs recorded by health professionals, spontaneous reporting of data on health social media sites is much more abundant, open, and timely. Owing to these advantages, Christopher et al. (2009) used association mining and the proportional reporting ratio to analyze the detected ADRs for different drugs on the basis of social data. Given the prosperity of medical research, especially in the ADR field, and the advantages of Big Data, Shah et al. (2012) believed that Big Data in biomedical informatics will grow considerably. There is no doubt that the age of data-medicine is poised to create a proactive, predictive, preventive, participatory, and patient-centered health system.
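The proportional reporting ratio mentioned above is a disproportionality measure computed from a two-by-two table of reports. It divides the share of reports for the drug of interest that mention a given reaction by the share of reports for all other drugs that mention the same reaction. The sketch below shows the calculation; the function name and the report counts are hypothetical and do not come from any of the cited studies.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of spontaneous reports.

    a: reports of the reaction for the drug of interest
    b: reports of other reactions for the drug of interest
    c: reports of the reaction for all other drugs
    d: reports of other reactions for all other drugs
    """
    rate_drug = a / (a + b)     # share of the drug's reports with the reaction
    rate_others = c / (c + d)   # same share among all other drugs
    return rate_drug / rate_others


# Hypothetical counts: 20 of 100 reports for the drug mention the reaction,
# compared with 50 of 900 reports for all other drugs.
prr = proportional_reporting_ratio(a=20, b=80, c=50, d=850)
print(round(prr, 2))
```

A ratio well above 1, as in this toy example, indicates that the reaction is reported disproportionately often for the drug and may deserve closer review. It is a signal for further investigation, not evidence of causation.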

Apart from its great potential in drug safety, Big Data can also be powerful in identifying susceptible populations. The large collections of EHRs accumulated through medical treatment provide an opportunity to build statistical models that identify high-risk people. Such models aim to reduce the cost of health care and conserve limited health resources. Bates et al. (2014) identified six practical use cases for predictive systems in medicine: high-cost patients, readmissions, triage, decompensation, adverse events, and treatment optimization for diseases affecting multiple organ systems.

Applications in disease pattern analysis and personalized medicine

Hay et al. (2013) combined new sources of data, such as social data, with relevant environmental information to create a dynamic, real-time global infectious disease map. Infectious disease risk maps can deepen our knowledge of infectious diseases and improve the ability to triage spatially and to issue outbreak alerts. However, Lazer et al. (2014) stated that “Big Data hubris” is the often implicit assumption that Big Data is a substitute for, rather than a supplement to, traditional data collection and analysis. Because much Big Data does not meet the standards of scientific statistical analysis, the results can contain large errors. Additionally, medical algorithms are not constant; they are dynamic and undergo a continuous series of adjustments.

Big medical data can be applied not only to mining public medical patterns but also to personalized medical care. At present, health care is moving from a disease-centered model toward a patient-centered model. In a disease-centered model, physicians’ decision making is centered on the clinical expertise and data from medical evidence and various tests. In a patient-centered model, patients actively participate in their own care and receive services focused on individual needs and preferences.

Personalized healthcare is a data-driven approach: a patient-centered medical model that assesses the relationships among patients who are exposed to similar risk, lifestyle, and environmental factors. Building on this idea, Chawla and Davis (2013) developed a system named CARE that uses a collaborative filtering method to capture patient similarities and produce personalized disease profiles for predicting an individual’s disease risk.
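The idea of collaborative filtering over patient similarity can be illustrated with a short sketch. This is not the CARE algorithm itself: it assumes a plain Jaccard similarity over diagnosis sets and a similarity-weighted vote, and the function names, patient identifiers, and diagnosis histories are all hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of diagnoses two patients have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_disease_risks(target: set[str],
                       others: dict[str, set[str]]) -> list[tuple[str, float]]:
    """Score diseases the target lacks by the similarity of patients who have them."""
    scores: dict[str, float] = {}
    for history in others.values():
        similarity = jaccard(target, history)
        if similarity == 0.0:
            continue  # patients with nothing in common contribute no evidence
        for disease in history - target:
            scores[disease] = scores.get(disease, 0.0) + similarity
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical diagnosis histories, for illustration only.
patients = {
    "p1": {"hypertension", "type 2 diabetes", "chronic kidney disease"},
    "p2": {"hypertension", "type 2 diabetes", "retinopathy"},
    "p3": {"asthma", "eczema"},
}
target_history = {"hypertension", "type 2 diabetes"}
ranking = rank_disease_risks(target_history, patients)
for disease, score in ranking:
    print(f"{disease}: {score:.2f}")
```

Diseases seen only in dissimilar patients (here, those of "p3") receive no score, which is the property that makes the resulting profile specific to the individual rather than to the population.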

Panahiazar et al. (2014) presented the main challenges from four standpoints: the variety, quality, volume, and velocity of the data. Alyass et al. (2015) proposed that personalized medicine may widen the growing gap in health systems between rich and poor countries. Moreover, they attributed the slow transition from conventional to personalized medicine to several factors: generation of cost-effective high-throughput data, hybrid education and multidisciplinary teams, data storage and processing, data integration and interpretation, and individual and global economic relevance.

Challenges for Mining Big Data in Health Care
Data mining

Clinical Big Data contains a large amount of unstructured data, such as natural language and handwritten records (Jee & Kim, 2013), which are difficult to integrate, analyze, and store. At the current stage, sharing structured data among agencies is inefficient, and sharing unstructured data, even within the same organization, is more difficult still. Determining how to effectively mine a large amount of unstructured data will continue to be a major challenge (Sejdic, 2014). One of the characteristics of Big Data is variability in data sources (Dieringer & Schlotterer, 2003), and medical data are themselves highly time-sensitive; for example, personalized medical care has high timeliness requirements. The medical industry places extreme demands on data processing speed, especially when a patient’s condition deteriorates rapidly. In addition, when real-time applications such as cloud computing are used to access and analyze data, the privacy and security of patient data are also a challenge (Jee & Kim, 2013). Cloud computing now offers new possibilities for mining and sharing medical Big Data. However, several challenges must be overcome before cloud computing can become more practical. First, although cloud computing offers an easy and flexible way to mine resources, it also increases the risk of privacy disclosure, a fact that is particularly evident in fields such as clinical informatics and public health informatics. Second, in medicine, a large amount of data (at the petabyte level) often needs to be imported to or exported from the cloud. Network bandwidth constraints limit the speed of data transmission and also increase the cost of cloud computing (J. J. Chen, Qian, Yan, & Shen, 2013). At present, attention to Big Data focuses mainly on its accuracy; timely and accurate data mining is another challenge, and work on it is still at an initial stage (Abenstein & Tompkins, 1982; Xu et al).
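A rough calculation shows why bandwidth matters at the petabyte level. The sketch below is back-of-the-envelope arithmetic only: the link speeds are assumed values, not figures from the cited work, a petabyte is taken as 10^15 bytes, and the function name and `efficiency` parameter are hypothetical.

```python
SECONDS_PER_DAY = 86_400
PETABYTE = 10**15  # bytes, decimal definition


def transfer_days(size_bytes: float, link_bits_per_second: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to move size_bytes over a link at the given usable fraction."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = size_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


# Assumed link speeds, for illustration only.
for label, rate in [("1 Gbit/s", 1e9), ("10 Gbit/s", 1e10)]:
    print(f"1 PB over {label}: {transfer_days(PETABYTE, rate):.1f} days")
```

Even on a fully utilized 1 Gbit/s link, a single petabyte takes roughly three months to move, which is why import and export of clinical data sets can dominate both the schedule and the cost of cloud-based analysis.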

Data storage

The current difficulties in data storage are mainly due to high costs, which arise from three sources. The first is the huge volume of medical data. With the development of medical informatics, the medical industry has produced a large amount of data, ranging from medical diagnostic images to pathological analysis maps. For example, regional medical data are usually derived from a region with millions of people and hundreds of medical institutions, and the amount of data continues to grow. Under the relevant provisions of the medical industry, a patient’s data typically need to be retained for more than 50 years. These data include not only a large amount of online or real-time data but also diagnosis and medication recommendations in CDS, various structured data tables, unstructured or semi-structured text documents, medical images, and other information. The massive size of the data inevitably increases the cost and difficulty of storage. The second is the cost of moving the data from one place to another and of analyzing them. The third is the diversity of medical data types, which include numerical data from various disease tests, diagnostic images, records made by doctors and nurses, and even diagnostic speech, video, and other unstructured data. Unstructured data are more difficult to store, analyze, and manipulate than structured data, which further increases the cost of storage. Maintaining safety and privacy while storing, extracting, and downloading patient-related data is also a challenge (Youssef, 2014).

Data sharing
Limited data standardization and interoperability

The current standards and technologies are inadequate to meet the requirements of integrative applications of health care Big Data. The difficulties are twofold. First, the data lack uniform standards, consistent description formats, and presentation methods. Second, integrating structured, semi-structured, and unstructured data at different levels is difficult. At the same time, each database uses different software and data formats; the latter in particular make data comparison, analysis, transfer, and sharing more difficult (Chawla & Davis, 2013; Mohr, Burns, Schueller, Clarke, & Klinkman, 2013; W. Raghupathi & Raghupathi, 2014). Data integration can also reduce costs. Hillestad et al. (2005) compared health care with the use of IT in other industries and estimated that the use of interoperable electronic medical record (EMR) systems could save $142–371 billion.

Information barriers

Users of Big Data in the medical field cover a wide range, including hospital clinics, regional medical centers, medical insurance companies, drug management analysis units, and medical equipment monitoring centers. The corresponding data resources are scattered across different data pools, including hospital medical records, settlement and cost data, medical firms’ records, academic medical research data, residents’ health records collected by regional health information platforms, and population and public health data from government surveys. There is little connection between these data sets. At the same time, data sharing mechanisms are imperfect because of the information barriers among hospitals, scientific research institutions, and other institutions (Kruse, Goswamy, Raval, & Marawi, 2016). For example, in China, medical institutions as a whole have limited communication and sharing with each other (Rui, Y., 2015). With the globalization of data, Big Data in health care will also face varying degrees of language, terminology, and standardization barriers (Kruse et al., 2016).

Volume of data

The massive volume of health care Big Data, at the terabyte (TB) and even petabyte (PB) level, is now beyond the capabilities of personal computers and network file-sharing programs; a new sharing mechanism is therefore urgently needed (Kruse et al., 2016; Service).

Insufficient data integration

More data integration is needed. The data have not yet been fully embedded in business processes and organizational management practices. For example, in many cases, patient monitoring data have not yet been integrated into clinical diagnosis and treatment, and clinical data have not yet been integrated into public health services and infectious disease monitoring (Tao, D. A. I., 2016).

Data privacy

Health care data are more sensitive and centralized than other types of Big Data. There are significant concerns regarding confidentiality (Mancini, 2014b; D. C. Mohr et al., 2013). However, no perfect solution to the problem of protecting patient data privacy has yet emerged. Patient data leakage may have unpredictable consequences (including injury, discrimination, and others), and there are many real cases at home and abroad. Big Data technology exposes personal medical data to greater risk. Some people even believe that in the era of Big Data, protecting personal privacy is impossible (Schadt, 2012). The problem can be alleviated by special processing (such as de-identification and digital identity encryption), but the identification and de-identification of information still require people or applications to process identifiable information, which may allow a patient’s health information to be misappropriated by others without the patient’s knowledge or authorization (Rothstein, 2010). Big Data increases the risks to patient data for two reasons. First is the risk of the data itself. The data can be copied and preserved without space and time constraints, which makes the risk both high and long-lasting under Big Data conditions. Second is the risk of Big Data technology. Even if a Big Data database uses anonymized or encrypted personal data, a residual risk of re-identification remains, because personal identities can be re-determined by data linkage techniques (Ward, 2014). This risk is greater when different data sets are linked. De-anonymization is an attack in which anonymous data are compared with other sources of data in order to re-identify the anonymous data sources (Yom-Tov, E, 2016). For example, comparing voter registration data and hospital discharge data can determine whether a person is sick. Voter registration data contain date of birth, sex, zip code, address, date last voted, name, date registered, and other details. Hospital discharge data contain date of birth, sex, zip code, diagnosis, ethnicity, medication, procedure, visit date, and other information. By comparing the fields shared by the two data sources, such as date of birth, sex, and zip code, an attacker can identify a specific individual and then determine that person’s illness and voting situation. In the example in Table 3, comparing the two data sources makes it easy to determine that the person whose date of birth, sex, and zip code are 06/18/90, female, and 77889, respectively, is Angela and that she is suffering from diabetes.

Table 3. An Example of Data Privacy Breach

Voter registration data (publicly available)
Name | Sex | Zip code | Date of birth
Angela | Female | 77889 | 06/18/90
Harry | Male | 83456 | 02/14/76
Harley | Female | 76231 | 09/15/92

Hospital discharge data
Address | Sex | Zip code | Date of birth | Disease
Arizona | Female | 77889 | 06/18/90 | Diabetes
California | Male | 66723 | 07/19/88 | Anemia
Connecticut | Male | 32412 | 10/01/79 | Malnutrition
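The comparison of voter registration and hospital discharge records described above amounts to a join on the shared quasi-identifiers. The following minimal sketch reproduces it on the rows of the privacy breach example; the variable and function names are hypothetical, and it is an illustration of the linkage idea rather than any particular attack tool.

```python
# Rows taken from the voter registration / hospital discharge example.
voters = [
    {"name": "Angela", "sex": "Female", "zip": "77889", "dob": "06/18/90"},
    {"name": "Harry", "sex": "Male", "zip": "83456", "dob": "02/14/76"},
    {"name": "Harley", "sex": "Female", "zip": "76231", "dob": "09/15/92"},
]
discharges = [
    {"address": "Arizona", "sex": "Female", "zip": "77889",
     "dob": "06/18/90", "disease": "Diabetes"},
    {"address": "California", "sex": "Male", "zip": "66723",
     "dob": "07/19/88", "disease": "Anemia"},
    {"address": "Connecticut", "sex": "Male", "zip": "32412",
     "dob": "10/01/79", "disease": "Malnutrition"},
]

# Fields present in both sources; none is a direct identifier on its own.
QUASI_IDENTIFIERS = ("sex", "zip", "dob")


def link_records(named, anonymous):
    """Return (name, disease) pairs where the quasi-identifiers match one voter."""
    index = {}
    for row in named:
        key = tuple(row[field] for field in QUASI_IDENTIFIERS)
        index.setdefault(key, []).append(row["name"])
    matches = []
    for row in anonymous:
        key = tuple(row[field] for field in QUASI_IDENTIFIERS)
        names = index.get(key, [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches.append((names[0], row["disease"]))
    return matches


reidentified = link_records(voters, discharges)
print(reidentified)
```

The discharge data contain no names, yet one record is re-identified because its combination of sex, zip code, and date of birth is unique among the voters. Generalizing such fields (for example, keeping only the year of birth) is one common way to make the combination less distinctive.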

In the future, individual genomes may also be added to the EHR in order to better achieve individualized treatment. The individual genome is private, and the gene sequence may lead to many privacy-related issues. Lin et al. (2004) found that “Specifying DNA sequence at only 30 to 80 statistically independent SNP positions will uniquely define a single person”. Privacy protection therefore becomes a central concern.
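Why so few SNP positions can single out a person follows from multiplying probabilities across independent positions. The sketch below is an illustration under simplifying assumptions, not the derivation in Lin et al. (2004): every SNP is assumed to be biallelic, in Hardy-Weinberg proportions, and to share one assumed minor allele frequency; the function names are hypothetical.

```python
def genotype_match_probability(minor_allele_freq: float) -> float:
    """Chance that two unrelated people share a genotype at one biallelic SNP."""
    q = minor_allele_freq
    p = 1.0 - q
    # Hardy-Weinberg genotype frequencies: AA, Aa, aa (an assumption).
    genotype_freqs = (p * p, 2 * p * q, q * q)
    return sum(freq * freq for freq in genotype_freqs)


def random_match_probability(n_snps: int, minor_allele_freq: float) -> float:
    """Chance of matching at every one of n statistically independent SNPs."""
    return genotype_match_probability(minor_allele_freq) ** n_snps


# Assumed minor allele frequency of 0.5, for illustration only.
for n in (30, 80):
    print(f"{n} SNPs: {random_match_probability(n, 0.5):.3e}")
```

Because the per-position probabilities multiply, the chance that two unrelated people match at every position falls geometrically with the number of positions, so a genome fragment stored in an EHR can act as an identifier even when the name is removed.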

Data technologies and talent

As described in the main characteristics of Big Data, in terms of data size, Big Data in health care exceeded 150 exabytes after 2011 (Y. C. Wang et al., 2015). A study estimated that data size in health care would be around 40 ZB in 2020 (Fig. 1) (O’Driscoll, Daugelaite, & Sleator, 2013). The complexity of the data is also growing rapidly, with data diversity, fast change, low value density, and other complex features becoming increasingly significant. This complexity poses a serious challenge to traditional computing and information technology (Tony Hey, 2012.06). At present, it is difficult for a distributed system to provide availability, consistency, and partition tolerance all at once. It is also difficult to solve problems such as health care data collection, real-time processing and dynamic indexing, and the lack of prior knowledge (Zhang Zhen, Zhou Yi, Du Shou-hong, Luo Xue-qiong, Mei Tian, 2014). Even widely used Big Data technologies have their challenges. For example, Hadoop helps solve the storage problems of Big Data, reduces the cost of data storage, and improves the speed of operation. However, Hadoop faces the technical problems of low security and data that cannot be interconnected (Augustine, 2014. Mar; K. Jee & G. H. Kim, 2013). In addition, promoting the development of health care Big Data applications requires human experts who have both clinical and analytic knowledge (Mavandadi et al., 2012). According to McKinsey, even in the U.S., the leading information technology power, the related talent gap will reach 140,000–190,000 people in 2018 (James Manyika, 2011). Many of today’s data technologies, including Hadoop and cloud computing, are challenging for many businesses, especially small firms. The skills required are in many cases not simple; they involve data mining, analysis, manipulation, and other techniques that are too difficult and expensive for most small firms to master (K. Jee & G. H. Kim, 2013).

At present, only a small number of companies in the world have mastered the core technology of Big Data analysis. The world needs more data analysts who can use information technology to visualize the data before presenting them to policy makers. Finally, personnel who have mastered professional management, data processing technology, and medical data management are also needed. They can use an appropriate management model to make the information infrastructure a continuous research and application platform, ensure continuity, and achieve cross-cutting cooperation (Sepulveda, 2013; Youssef, 2014).

Conclusions

Medical research that integrates Big Data will contribute more broadly and deeply to improving human health. This paper summarizes recent research on medical data at home and abroad. It introduces the concepts, background, and main applications of medical Big Data, together with several key related technologies. In addition, we summarize and reflect on the opportunities and challenges in the study of big medical data. In general, current research on medical data is not yet mature, and many problems remain to be resolved. To take full advantage of the patterns contained in massive data, Big Data storage, mining, and analysis technologies and related talent are essential. These technologies and talents will support research on health care Big Data and further serve a wide range of medical applications, such as public health, medical care, and medical insurance.

Figure 1A. Data explosion in health care

Figure 1B. Literature explosion searched with “health care” in PubMed

Systems for Acquiring Medical/Clinical Big Data

System | Description
HIS | Hospital information system; the system provides quality community for historical data resource, information, and knowledge in healthcare for hospital administration and patient health care (Bagayoko et al., 2010; Kanagaraj & Sumathi, 2011; Sirintrapun & Artz, 2016; Tsumoto, Hirano, & Iwata, 2013)
LIS | Laboratory information system; often used to collect, restore, archive, process, extract, and analyze data in laboratory; this system aims to improve efficiency of turn-around-times (TAT) of records, quality of resource utilization, and public health supporting (Blaya et al., 2007; Sepulveda & Young, 2013)
RIS | Radiology information system; it is used to capture and store data including images, demographic and clinical information, and so on, also assisting in patient registration, report repository, and physician directory with advanced technology (Nance, Meenan, & Nagy, 2013)
PACS (super sound PACS, endoscope PACS) | Picture archiving and communication systems; it is a common HIS for storage and transferring of digital images (Joshi & Yesha, 2012)
EMR | EMR system is used to maintain medical records and store, process, and retrieve information. It also ensures accuracy of information. Its aim is to ensure accuracy of information in order to provide patient control and transparency, interdepartmental communication, and great reporting capabilities for treatment (Kumar & Aldrich, 2010)
Cost accounting | System for collecting, recording, classifying, analyzing, summarizing, allocating, and evaluating financial cost in the medical area
Physical examination system | System for checking signs of patient

Summary of Major Data Types of Big Data in Health Care

Data type | Data name | Data description | Data acquisition | Technology/database/system
Big Data in medicine and clinics | Electronic health record (EHR)/electronic medical record (EMR) | Standard data collection of medical and health information for patients and can be shared in different organizations (Gunter & Terry, 2005). Often comes from medical activities and public health data | Hospital information resource, surgery’s work, activities of anesthesia, physical examination, radiography, magnetic resonance imaging (MRI), computer tomography (CT), information of patient, pharmacy, treatment, medical imaging, imaging report, identification information of patient, clinical diagnosis, medicine scheme, notes from physician, sensor data (Belle et al., 2015; Wang & Alexander, 2013), patient demographics, clinic or inpatient notes, electronic reports | Medical record data exchange, standards: Health Level 7 (HL7), Continuity of Care Record (CCR), Continuity of Care Document (CCD), controlled medical vocabulary (CMV), computerized provider order entry (CPOE) (Valdes, Kibbe, Tolleson, Kunik, & Petersen, 2004) (Garets & Davis, 2007), Allscripts, Epic Systems, Practice Fusion, NextGen Healthcare, clinical decision support systems, pharmacy management system, EMR Adoption Model (Wang & Alexander, 2013) (Garets & Davis, 2007), NoSQL database, clinical data repository (CDR) (Garets & Davis, 2007)
 | Personal health record (PHR) | As its name suggests, it is the health-related data and information of patients (Tang, Ash, Bates, Overhage, & Sands, 2006) and about people’s lifelong health information. It is available for further use (Chen et al., 2012) | Allergies and adverse drug reactions, chronic diseases, family history, illnesses and hospitalizations, imaging reports, laboratory test results, medications and dosing, prescription record, surgeries and other procedures, vaccinations and observations of daily living, and reported by patients (Rumsfeld, Joynt, & Maddox, 2016) | Cloud computing, Health Insurance Portability and Accountability Act (HIPAA), and HL7 (Chen et al., 2012); stored in paper like printed laboratory reports, copies of clinic notes, and health histories created by the individual; electronic devices such as personal computer-based software, CD, DVD, and smart card; web applications such as HealthVault and PatientsLikeMe; and cloud servers (Chen et al., 2012)
 | Medical images | Data that present visual information of interior human body | X-ray, CT, histology, positron emission tomography (PET), radiography, MRI, nuclear medicine, elastography, tactile imaging, photoacoustic imaging, echocardiography (Kovalev & Kalinovsky, 2015), ultrasonography, angiography | Statistical shape models (SSMs), medial models, clustering, active appearance models (AAMs), active shape models (ASMs) (Heimann & Meinzer, 2009), image segmentation algorithm, fuzzy C-means (FCM) algorithm (Zhang & Chen, 2004), image registration, picture archiving and communication systems, Super PACS (Picture Archiving and Communication Systems), RIS, and digital image communication in medicine (DICOM) (Luo, Wu, Gopukumar, & Zhao, 2016)
 | Electrocardiogram | Electrical graph recording heartbeat activity of a person in a period of time like 1 minute | Electrocardiograph (ECG) signal | MIT-BIH Arrhythmia Database, American Heart Association (AHA) database, Common Standards for Electrocardiography database, ST-T database, Physikalisch-Technische Bundesanstalt (PTB) and Paroxysmal Atrial Fibrillation (PAF)
Big Data in public health and behavior | Vitals | Mainly refer to four signs (temperature, pulse, respiratory rate, and blood pressure) and other physiological data outside the health-care setting (Rumsfeld et al., 2016) | Temperature, pulse, respiratory rate, and blood pressure | Mobile technology, portable equipment, wearable system, and advanced devices like smartphones with third-party applications (HealthKit from Apple, Google Fit from Google, and S Health from Samsung), Android watches and Google glasses (Safavi & Shukur, 2014), and medical devices like implantable cardioverter–defibrillators (Rumsfeld et al., 2016)
 | -omics data | Biology information data in molecular-level catalog (Skotnes, 2012). Reflects characteristics of individual for treatment (Rumsfeld et al., 2016) | Genomics, transcriptomics – whole genome sequencing, RNA seq, metabolomics – Nuclear Magnetic Resonance (NMR), mass spectrometry, proteomics – mass spectrometry, methylomics – pyrosequencing, and ChIP-on-chip | Data End-of-life (EOL) Extension (DAnTE) and DanteR
 | Molecular biology experiment | Interaction and regulation of biological activity within cells, such as interactions between DNA, RNA, proteins, and biosynthesis | Molecular cloning, polymerase chain reaction (PCR), macromolecule blotting and probing, microarrays, and next-generation sequencing | NCBI
 | Human body samples | Data and samples of cells, tissues, and organs in human body (Bagayoko, Dufour, Chaacho, Bouhaddou, & Fieschi, 2010) | Cells, tissues, and organs | Mayo Clinic Biobanks (http://specimencentral.com/biobank-directory/)
Big Data in medical experiment | Clinical trials | Experiments for evaluating new medical treatment (e.g., drug, device) (Kanagaraj & Sumathi, 2012) | Drug efficacy, toxicity, new treatment devices, and procedures | ClinicalTrials.gov
 | Journal/conference article | Research articles written by researchers | Pubmed.com, New England Journal of Medicine, Lancet, Nature, Science, and Cell | Website of journal articles, Google Scholar, and Science Citation Index (SCI)
Big Data in medical literature | Structured knowledge | MeSH and International Classification of Diseases 10th revision (ICD-10) | Database in MeSH | NCBI

Abenstein, J. P., & Tompkins, W. J. (1982). A new data-reduction algorithm for real-time ECG analysis. IEEE Transactions on Biomedical Engineering,29(1), 43–48.AbensteinJ. P.TompkinsW. J.1982A new data-reduction algorithm for real-time ECG analysisIEEE Transactions on Biomedical Engineering2914348Search in Google Scholar

Abernethy, A. P., Wheeler, J. L., & Bull, J. (2011). Development of a health information technology-based data system in community-based hospice and palliative care. American Journal of Preventive Medicine, 40(5, Suppl 2), S217–S224.AbernethyA. P.WheelerJ. L.BullJ.2011Development of a health information technology-based data system in community-based hospice and palliative careAmerican Journal of Preventive Medicine405Suppl 2S217S224Search in Google Scholar

Agrawal, R., Imieliński, T., & Swami, A. (1993, May). Mining association rules between sets of items in large databases. In B. Peter, & J. Sunshil(Eds.), Proceeding of the ACM SIGMOD Conference on Management of Data(pp.207-216). Washington, DC: ACM Press.AgrawalR.ImielińskiT.SwamiA.1993Mining association rules between sets of items in large databasesPeterB.SunshilJ.Proceeding of the ACM SIGMOD Conference on Management of Data207216Washington, DCACM PressSearch in Google Scholar

Aitken, M., & Gauntlett, C. (2013). Patient apps for improved healthcare: from novelty to mainstream. IMS Institute for Healthcare Informatics Retrieved from https://www.mendeley.com/catalogue/patient-apps-improved-healthcare-novelty-mainstream/AitkenM.GauntlettC.2013Patient apps for improved healthcare: from novelty to mainstreamIMS Institute for Healthcare InformaticsRetrieved fromhttps://www.mendeley.com/catalogue/patient-apps-improved-healthcare-novelty-mainstream/Search in Google Scholar

Alyass, A., Turcotte, M., & Meyre, D. (2015). From big data analysis to personalized medicine for all: Challenges and opportunities. BMC Medical Genomics, 8(1), 33.AlyassA.TurcotteM.MeyreD.2015From big data analysis to personalized medicine for all: Challenges and opportunitiesBMC Medical Genomics8133Search in Google Scholar

Anderson, J. E., & Chang, D. C. (2015). Using electronic health records for surgical quality improvement in the era of big data. Jama Surgery, 150(1), 24-29.AndersonJ. E.ChangD. C.2015Using electronic health records for surgical quality improvement in the era of big dataJama Surgery15012429Search in Google Scholar

Antonie, M. L., Zaïane, O. R., & Coman, A. (2001). Application of data mining techniques for medical image classification. Proceedings of the Second International Conference on Multimedia Data Mining 94-101. doi:10.1.1.23.9742AntonieM. L.ZaïaneO. R.ComanA.2001Application of data mining techniques for medical image classificationProceedings of the Second International Conference on Multimedia Data Mining9410110.1.1.23.9742Open DOISearch in Google Scholar

Asante-Korang, A., & Jacobs, J. P. (2016). Big Data and paediatric cardiovascular disease in the era of transparency in healthcare. Cardiology in the Young, 26(8), 1597–1602.Asante-KorangA.JacobsJ. P.2016Big Data and paediatric cardiovascular disease in the era of transparency in healthcareCardiology in the Young26815971602Search in Google Scholar

Asri, H., Mousannif, H., Al Moatassime, H., & Noel, T. (2015, June). Big data in healthcare: challenges and opportunities. Proceedings of 2015 International Conference on Cloud Computing Technologies and ApplicationsMarrakech, Morocco.AsriH.MousannifH.Al MoatassimeH.NoelT.2015Big data in healthcare: challenges and opportunitiesProceedings of 2015 International Conference on Cloud Computing Technologies and ApplicationsMarrakech, MoroccoSearch in Google Scholar

Augustine, D. P. (2014). Leveraging big data analytics and Hadoop in developing India’s healthcare services. International Journal of Computers and Applications, 89(16), 44–50.AugustineD. P.2014Leveraging big data analytics and Hadoop in developing India’s healthcare servicesInternational Journal of Computers and Applications89164450Search in Google Scholar

Azar, A. T., & Hassanien, A. E. (2015). Dimensionality reduction of medical big data using neural-fuzzy classifier. Soft Computing, 19(4), 1115–1127.AzarA. T.HassanienA. E.2015Dimensionality reduction of medical big data using neural-fuzzy classifierSoft Computing19411151127Search in Google Scholar

Backonja, U., Kim, K., Casper, G. R., Patton, T., Ramly, E., & Brennan, P. F. (2012, June). Observations of daily living: putting the “personal” in personal health records. NI 2012: 11th International Congress on Nursing Informatics Montreal, Canada.BackonjaU.KimK.CasperG. R.PattonT.RamlyE.BrennanP. F.2012Observations of daily living: putting the “personal” in personal health recordsNI 2012 11th International Congress on Nursing InformaticsMontrealCanadaSearch in Google Scholar

Bagayoko, C. O., Dufour, J. C., Chaacho, S., Bouhaddou, O., & Fieschi, M. (2010). Open source challenges for hospital information system (HIS) in developing countries: A pilot project in Mali. BMC Medical Informatics and Decision Making, 10(22), 1-13.BagayokoC. O.DufourJ. C.ChaachoS.BouhaddouO.FieschiM.2010Open source challenges for hospital information system (HIS) in developing countries: A pilot project in MaliBMC Medical Informatics and Decision Making1022113Search in Google Scholar

Bamidis, P. D. (2010). On the classification of emotional biosignals evoked while viewing affective pictures: An integrated data-mining-based approach for healthcare applications. IEEE Transactions on Information Technology in Biomedicine, 14(2), 309–318.BamidisP. D.2010On the classification of emotional biosignals evoked while viewing affective pictures: An integrated data-mining-based approach for healthcare applicationsIEEE Transactions on Information Technology in Biomedicine142309318Search in Google Scholar

Bates, D. W., Saria, S., Ohno-Machado, L., Shah, A., & Escobar, G. (2014). Big data in health care: Using analytics to identify and manage high-risk and high-cost patients. Health Affairs, 33(7), 1123–1131BatesD. W.SariaS.Ohno-MachadoL.ShahA.EscobarG.2014Big data in health care: Using analytics to identify and manage high-risk and high-cost patientsHealth Affairs3371123–1131Search in Google Scholar

Belle, A., Thiagarajan, R., Soroushmehr, S. M., Navidi, F., Beard, D. A., & Najarian, K. (2015). Big data analytics in healthcare. Biomed Research Internatioan,2015370194 1-16.BelleA.ThiagarajanR.SoroushmehrS. M.NavidiF.BeardD. A.NajarianK.2015Big data analytics in healthcareBiomed Research Internatioan 2015370194116Search in Google Scholar

Berner, E. S. (2003). Diagnostic decision support systems: How to determine the gold standard? Journal of the American Medical Informatics Association,10(6), 608–610.BernerE. S.2003Diagnostic decision support systems: How to determine the gold standard?Journal of the American Medical Informatics Association106608610Search in Google Scholar

Blaya, J. A., Shin, S. S., Yagui, M. J., Yale, G., Suarez, C. Z., Asencios, L. L., Fraser, H. S. (2007). A web-based laboratory information system to improve quality of care of tuberculosis patients in Peru: Functional requirements, implementation and usage statistics. BMC Medical Informatics and Decision Making, 7(1), 33–43.BlayaJ. A.ShinS. S.YaguiM. J.YaleG.SuarezC. Z.AsenciosL. L.FraserH. S.2007A web-based laboratory information system to improve quality of care of tuberculosis patients in Peru: Functional requirements, implementation and usage statisticsBMC Medical Informatics and Decision Making713343Search in Google Scholar

Braunstein, M. L. (2015). Health big data and analytics. Practitioner’s Guide to Health Informatics (pp. 133–149). Berlin, Germany: Springer International Publishing.

Celesti, A., Fazio, M., Romano, A., & Villari, M. (2016). A hospital cloud-based archival information system for the efficient management of HL7 big data. 2016 39th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia.

Centers for Medicare & Medicaid Services (CMS), HHS. (2010). Medicare and Medicaid programs; electronic health record incentive program. Final rule, Federal Register, 75(144), 44313–44588. PMID:20677415

Chawla, N. V., & Davis, D. A. (2013). Bringing big data to personalized healthcare: A patient-centered framework. Journal of General Internal Medicine, 28(3, Suppl 3), S660–S665.

Chen, J., Qian, F., Yan, W., & Shen, B. (2013). Translational biomedical informatics in the cloud: Present and future. BioMed Research International, 2013, 658925. PMID:23586054

Chen, M., Mao, S., & Liu, Y. (2014). Big data: A survey. Mobile Networks and Applications, 19(2), 171–209.

Chen, T. S., Liu, C. H., Chen, T. L., Chen, C. S., Bau, J. G., & Lin, T. C. (2012). Secure dynamic access control scheme of PHR in cloud computing. Journal of Medical Systems, 36(6), 4005–4020.

Chia, C.-C., & Syed, Z. (2011). Computationally generated cardiac biomarkers: Heart rate patterns to predict death following coronary attacks. Proceedings of the 2011 SIAM International Conference on Data Mining, 735–746.

Christopher C. Yang, H. Y., Jiang, L., & Zhang, M. (2009). Social media mining for drug safety signal detection. Proceedings of the 2012 International Workshop on Smart Health and Wellbeing.

Christy, A., Gandhi, G. M., & Vaithyasubramanian, S. (2015). Cluster based outlier detection algorithm for healthcare data. Procedia Computer Science, 50, 209–215.

Cismondi, F., Fialho, A. S., Vieira, S. M., Reti, S. R., Sousa, J. M., & Finkelstein, S. N. (2013). Missing data in medical databases: Impute, delete or classify? Artificial Intelligence in Medicine, 58(1), 63–72.

Collins, B. (2016). Big data and health economics: Strengths, weaknesses, opportunities and threats. PharmacoEconomics, 34(2), 101–106.

Costa, F. F. (2014). Big data in biomedicine. Drug Discovery Today, 19(4), 433–440.

Dai, T. (2016). Health and medical big data development perspective. Journal of Medical Informatics, 37(2), 2–8.

Demidowich, A. P., Lu, K., Tamler, R., & Bloomgarden, Z. (2012). An evaluation of diabetes self-management applications for Android smartphones. Journal of Telemedicine and Telecare, 18(4), 235–238.

DerSimonian, R., & Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials, 7(3), 177–188.

Deserno, T. M., Haak, D., Brandenburg, V., Deserno, V., Classen, C., & Specht, P. (2014). Integrated image data and medical record management for rare disease registries. A general framework and its instantiation to the German Calciphylaxis Registry. Journal of Digital Imaging, 27(6), 702–713.

Dieringer, D., & Schlotterer, C. (2003). Microsatellite analyser (MSA): A platform independent analysis tool for large microsatellite data sets. Molecular Ecology Notes, 3(1), 167–169.

Docherty, A. (2014). Big Data—Ethical perspectives. Anaesthesia, 69(4), 390–391.

Edwards, I. R., & Aronson, J. K. (2000). Adverse drug reactions: Definitions, diagnosis, and management. Lancet, 356(9237), 1255–1259.

Fan, C.-Y., Chang, P.-C., Lin, J.-J., & Hsieh, J. C. (2011). A hybrid model combining case-based reasoning and fuzzy decision tree for medical data classification. Applied Soft Computing, 11(1), 632–644.

Feldman, B., Martin, E. M., & Skotnes, T. (2012). Big data in healthcare: Hype and hope. Dr. Bonnie, 2012(1), 122–125.

Fenderson, B. A. (2008). Molecular Biology of the Cell, 5th Edition. Medicine & Science in Sports & Exercise, 40(9), 1709.

Frantzidis, C. A., Bratsas, C., Klados, M. A., Konstantinidis, E., Lithari, C. D., Vivas, A. B., Gardner, R. M., Pryor, T. A., & Warner, H. R. (1999). The HELP hospital information system: Update 1998. International Journal of Medical Informatics, 54(3), 169–182.

Garets, D., & Davis, M. (2007). Electronic medical records vs Electronic health records: Yes, there is a difference. Zhongguo Yiyuan, 11(5), 38–39.

Gunter, T. D., & Terry, N. P. (2005). The emergence of national electronic health record architectures in the United States and Australia: Models, costs, and questions. Journal of Medical Internet Research, 7(1), 13–15.

Han, J., Pei, J., & Yin, Y. (2000, May). Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data (pp. 1–12), Texas, USA.

Hassani, S., Martens, H., Qannari, E. M., et al. (2010). Analysis of -omics data: Graphical interpretation- and validation tools in multi-block methods. Chemometrics and Intelligent Laboratory Systems, 104(1), 140–153.

Hastie, B. A., Riley, J. L., Robinson, M. E., Glover, T., Campbell, C. M., Staud, R., & Fillingim, R. B. (2005). Cluster analysis of multiple experimental pain modalities. Pain, 116(3), 227–237.

Hay, S. I., George, D. B., Moyes, C. L., & Brownstein, J. S. (2013). Big data opportunities for global infectious disease surveillance. PLoS Medicine, 10(4), e1001413.

He, C., Jin, X., Zhao, Z., & Xiang, T. (2010, December). A cloud computing solution for hospital information system. Paper presented at the 2010 IEEE International Conference on Intelligent Computing and Intelligent Systems, Xiamen, China.

Heart, T., Ben-Assuli, O., & Shabtai, I. (2017). A review of PHR, EMR and EHR integration: A more personalized healthcare and public health policy. Health Policy and Technology, 6(1), 20–25.

Heimann, T., & Meinzer, H. P. (2009). Statistical shape models for 3D medical image segmentation: A review. Medical Image Analysis, 13(4), 543–563.

Herland, M., Khoshgoftaar, T. M., & Wald, R. (2014). A review of data mining using big data in health informatics. Journal of Big Data, 1(2), 1–35.

Hillestad, R., Bigelow, J., Bower, A., Girosi, F., Meili, R., Scoville, R., & Taylor, R. (2005). Can electronic medical record systems transform health care? Potential health benefits, savings, and costs. Health Affairs, 24(5), 1103–1117.

Hong, C. J., Kaur, M. N., Farrokhyar, F., & Thoma, A. (2015). Accuracy and completeness of electronic medical records obtained from referring physicians in a Hamilton, Ontario, plastic surgery practice: A prospective feasibility study. Plastic Surgery, 23(1), 48.

Hsieh, J. C., Li, A. H., & Yang, C. C. (2013). Mobile, cloud, and big data computing: Contributions, challenges, and new directions in telecardiology. International Journal of Environmental Research and Public Health, 10(11), 6131–6153.

Huang, X. J., & Yao, Y. (2016, August). Multi-dimensions clustering approach for physical health data based on artificial ant colony optimization. Paper presented at the 8th International Conference on Intelligent Human-Machine Systems and Cybernetics (IHMSC), Hangzhou, China.

Jee, K., & Kim, G. H. (2013). Potentiality of big data in the medical sector: Focus on how to reshape the healthcare system. Healthcare Informatics Research, 19(2), 79–85.

Joshi, K., & Yesha, Y. (2012). Workshop on analytics for big data generated by healthcare and personalized medicine domain. Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research, 267–269.

Kanagaraj, G., & Sumathi, A. C. (2011, December). Proposal of an open-source cloud computing system for exchanging medical images of a Hospital Information System. Paper presented at the 3rd International Conference on Trendz in Information Sciences & Computing (TISC2011), Chennai, India.

Kennedy, E. H., Wiitala, W. L., Hayward, R. A., & Sussman, J. B. (2013). Improved cardiovascular risk prediction using nonparametric regression and electronic health record data. Medical Care, 51(3), 251–258.

Khan, W. A., Khattak, A. M., Hussain, M., Amin, M. B., Afzal, M., Nugent, C., & Lee, S. (2014). An adaptive semantic based mediation system for data interoperability among Health Information Systems. Journal of Medical Systems, 38(8), 1–18.

Khoury, M. J., & Ioannidis, J. P. A. (2014). Medicine. Big data meets public health. Science, 346(6213), 1054–1055.

Kim, T.-W., Park, K.-H., Yi, S.-H., & Kim, H.-C. (2014). A big data framework for u-Healthcare systems utilizing vital signs. Paper presented at the 2014 International Symposium on Computer, Consumer and Control, Taichung, Taiwan.

Kovalev, V., & Kalinovsky, A. (2015). Big medical data: Image mining, retrieval and analytics. Paper presented at Big Data and Predictive Analytics, Minsk, Belarus.

Krumholz, H. M. (2014). Big data and new knowledge in medicine: The thinking, training, and tools needed for a learning health system. Health Affairs, 33(7), 1163–1170.

Kruse, C. S., Goswamy, R., Raval, Y., & Marawi, S. (2016). Challenges and opportunities of big data in health care: A systematic review. JMIR Medical Informatics, 4(4), e38.

Kumar, S., & Aldrich, K. (2010). Overcoming barriers to electronic medical record (EMR) implementation in the US healthcare system: A comparative study. Health Informatics Journal, 16(4), 306–318.

Kuo, R., Lin, S., & Shih, C. (2007). Mining association rules through integration of clustering analysis and ant colony system for health insurance database in Taiwan. Expert Systems with Applications, 33(3), 794–808.

Lazer, D., Kennedy, R., King, G., & Vespignani, A. (2014). Big data. The parable of Google Flu: Traps in big data analysis. Science, 343(6176), 1203–1205.

Lin, Z., Owen, A. B., & Altman, R. B. (2004). Genetics: Genomic research and human subject privacy. Science, 305(5681), 183.

Lincoln, M. J. (1998). Applying commonly available expert systems in physician assistant education. Perspective on Physician Assistant Education, 9(3), 144–151.

Lodish, H. (2008). Molecular cell biology. San Francisco, CA: W. H. Freeman and Company.

Luo, J., Wu, M., Gopukumar, D., & Zhao, Y. (2016). Big data application in biomedical research and health care: A literature review. Biomedical Informatics Insights, 8, 1–10.

M, T. T. (2014). Mobile Tech Contributions to Healthcare & Patient Experiences. Retrieved from http://topmobiletrends.com/mobile-technologycontributions-Patient-experience-parmar/

MacRae, J., Darlow, B., McBain, L., Jones, O., Stubbe, M., Turner, N., & Dowell, A. (2015). Accessing primary care Big Data: The development of a software algorithm to explore the rich content of consultation records. BMJ Open, 5(8), e008160.

Mancini, M. (2014). Exploiting big data for improving healthcare services. Journal of e-Learning and Knowledge Society, 10(2), 23–33.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. Retrieved from McKinsey Global Institute website: https://www.mckinsey.com/business-functions/digital-mckinsey/our-insights/big-data-the-next-frontier-for-innovation

Marx, V. (2013). Biology: The big challenges of big data. Nature, 498(7453), 255–260.

Mavandadi, S., Dimitrov, S., Feng, S., Yu, F., Yu, R., Sikora, U., & Ozcan, A. (2012). Crowd-sourced BioGames: Managing the big data problem for next-generation lab-on-a-chip platforms. Lab on a Chip, 12(20), 4102–4106.

Mohr, D. C., Burns, M. N., Schueller, S. M., Clarke, G., & Klinkman, M. (2013). Behavioral intervention technologies: Evidence review and recommendations for future research in mental health. General Hospital Psychiatry, 35(4), 332–338.

Moore, P., Xhafa, F., Barolli, L., & Thomas, A. (2013, October). Monitoring and detection of agitation in dementia: Towards real-time and big-data solutions. Paper presented at the 2013 Eighth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, Compiegne, France.

Naito, M. (2014). Utilization and application of public health data in descriptive epidemiology. Journal of Epidemiology, 24(6), 435–436.

Nance, J. W., Jr., Meenan, C., & Nagy, P. G. (2013). The future of the radiology information system. AJR. American Journal of Roentgenology, 200(5), 1064–1070.

Obenshain, M. K. (2004). Application of data mining techniques to healthcare data. Infection Control and Hospital Epidemiology, 25(8), 690–695.

O’Driscoll, A., Daugelaite, J., & Sleator, R. D. (2013). ‘Big data’, Hadoop and cloud computing in genomics. Journal of Biomedical Informatics, 46(5), 774–781.

Oztekin, A., Delen, D., & Kong, Z. J. (2009). Predicting the graft survival for heart-lung transplantation patients: An integrated data mining methodology. International Journal of Medical Informatics, 78(12), e84–e96.

Páez, D. G., Rodríguez, M. D. B., Sánz, E. P., Villalba, M. T., & Gil, R. M. (2015). Big data processing using wearable devices for wellbeing and healthy activities promotion. In I. Cleland, L. Guerrero, & J. Bravo (Eds.), IWAAL: Ambient assisted living. ICT-based Solutions in Real Life Situations (pp. 196–205). Cham, Switzerland: Springer.

Pai, F. Y., & Huang, K. I. (2011). Applying the technology acceptance model to the introduction of healthcare information systems. Technological Forecasting and Social Change, 78(4), 650–660.

Panahiazar, M., Taslimitehrani, V., Jadhav, A., & Pathak, J. (2014, October). Empowering personalized medicine with big data and semantic web technology: Promises, challenges, and use cases. 2014 IEEE International Conference on Big Data, Washington, DC.

Paul, R., & Hoque, A. S. M. L. (2010). Clustering medical data to predict the likelihood of diseases. 2010 Fifth International Conference on Digital Information Management, 44–49. Thunder Bay, Canada.

Pentland, A., Reid, T., & Heibeck, T. (2013). Big data and health: Revolutionizing medicine and public health. Report of the Big Data and Health Working Group 2013. Retrieved from http://www.wish-qatar.org/summits/wish-2013/forums-research-chairs/big-data-healthcare/

Polpitiya, A. D., Qian, W. J., Jaitly, N., Petyuk, V. A., Adkins, J. N., Camp, D. G., … Smith, R. D. (2008). DAnTE: A statistical tool for quantitative analysis of -omics data. Bioinformatics, 24(13), 1556–1558.

Poulymenopoulou, M., Malamateniou, F., Prentza, A., & Vassilacopoulos, G. (2015). Challenges of evolving PINCLOUD PHR into a PHR-based health analytics system. Paper presented at the Proceedings of the European, Mediterranean & Middle Eastern Conference on Information Systems (EMCIS).

Preen, D. B., Holman, C. D., Spilsbury, K., Semmens, J. B., & Brameld, K. J. (2006). Length of comorbidity lookback period affected regression model performance of administrative health data. Journal of Clinical Epidemiology, 59(9), 940–946.

Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: Promise and potential. Health Information Science and Systems, 2(1), 3.

Redmond, S. J., Lovell, N. H., Yang, G. Z., Horsch, A., Lukowicz, P., Murrugarra, L., & Marschollek, M. (2014). What does big data mean for wearable sensor systems? Yearbook of Medical Informatics, 9(1), 135–142.

Roberts, E. B. (1985). Health information systems. Clinics in Laboratory Medicine, 23(5), 672–676.

Rothstein, M. A. (2010). Is deidentification sufficient to protect health privacy in research? The American Journal of Bioethics, 10(9), 3–11.

Rui, Y. (2015). Medical big data: The next industry windy spot. Business School [Chinese], 4, 100–103.

Rumsfeld, J. S., Joynt, K. E., & Maddox, T. M. (2016). Big data analytics to improve cardiovascular care: Promise and challenges. Nature Reviews. Cardiology, 13(6), 350–359.

Safavi, S., & Shukur, Z. (2014). Conceptual privacy framework for health information on wearable device. PLoS One, 9(12), e114306.

Schadt, E. E. (2012). The changing privacy landscape in the era of big data. Molecular Systems Biology, 8(1), 612.

Sejdić, E. (2014). Medicine: Adapt current tools for handling big data. Nature, 507(7492), 306.

Sepulveda, J. L., & Young, D. S. (2013). The ideal laboratory information system. Archives of Pathology & Laboratory Medicine, 137(8), 1129–1140.

Sepulveda, M. J. (2013). From worker health to citizen health: Moving upstream. Journal of Occupational and Environmental Medicine, 55(12, Suppl), S52–S57.

Service, R. F. (2013). Biology’s dry future. Science, 342(6155), 186–189.

Shah, N. H., & Tenenbaum, J. D. (2012). The coming age of data-driven medicine: Translational bioinformatics’ next frontier. Journal of the American Medical Informatics Association, 19(e1), e2–e4.

Sheta, O. E., & Eldeen, A. N. (2013). The technology of using a data warehouse to support decision-making in health care. International Journal of Database Management Systems, 5(3), 75–86.

Sirintrapun, S. J., & Artz, D. R. (2016). Health information systems. Clinics in Laboratory Medicine, 36(1), 133.

Steinbrook, R. (2008). Personally controlled online health data—The next big thing in medical care? The New England Journal of Medicine, 358(16), 1653–1656.

Swan, M. (2013). The quantified self: Fundamental disruption in big data science and biological discovery. Big Data, 1(2), 85–99.

Tan, S. S., Gao, G., & Koch, S. (2015). Big Data and Analytics in Healthcare. Methods of Information in Medicine, 54(6), 546–547.

Tang, P. C., Ash, J. S., Bates, D. W., Overhage, J. M., & Sands, D. Z. (2006). Personal health records: Definitions, benefits, and strategies for overcoming barriers to adoption. Journal of the American Medical Informatics Association, 13(2), 121–126.

Taverner, T., Karpievitch, Y. V., Polpitiya, A. D., Brown, J. N., Dabney, A. R., Anderson, G. A., & Smith, R. D. (2012). DanteR: An extensible R-based tool for quantitative analysis of -omics data. Bioinformatics (Oxford, England), 28(18), 2404–2406.

Tola, K., Abebe, H., Gebremariam, Y., & Jikamo, B. (2017). Improving Completeness of Inpatient Medical Records in Menelik II Referral Hospital, Addis Ababa, Ethiopia. Advances in Public Health, 2017, 1–5.

Tony, H., Stewart, T., & Kristin, T. (2012). The fourth paradigm: Data-intensive scientific discovery. Berlin, Germany: Springer-Verlag Berlin Heidelberg.

Tsumoto, S., Hirano, S., & Iwata, H. (2013). Mining nursing care plan from data extracted from hospital information system. Paper presented at the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, Niagara Falls, ON, Canada.

Usami, Y., Cho, H. C., Okazaki, N., & Tsujii, J. I. (2011). Automatic acquisition of huge training data for bio-medical named entity recognition. Proceedings of BioNLP 2011 Workshop, 5, 65–73.

Valdes, I., Kibbe, D. C., Tolleson, G., Kunik, M. E., & Petersen, L. A. (2004). Barriers to proliferation of electronic medical records. Journal of Innovation in Health Informatics, 12(1), 3–9.

Vesna, V. (2000). The Visible Human Project: Informatic bodies and posthuman medicine. AI & Society, 14(2), 262–263.

Wang, L., & Alexander, C. A. (2013). Applications of automated identification technology in EHR/EMR. International Journal of Public Health Science, 2(3), 109–122.

Wang, Y., Kung, L., Ting, C., & Byrd, T. A. (2015). Beyond a technical perspective: Understanding big data capabilities in health care. Proceedings of the 48th Annual Hawaii International Conference on System Sciences, 48 (pp. 3044–3053). Hawaii, USA.

Ward, J. C. (2014). Oncology reimbursement in the era of personalized medicine and big data. Journal of Oncology Practice, 10(2), 83–86.

White, S. E. (2013). De-identification and the sharing of big data. Journal of American Health Information Management Association, 84(4), 44–47.

Wilson, A. M., Thabane, L., & Holbrook, A. (2004). Application of data mining techniques in pharmacovigilance. British Journal of Clinical Pharmacology, 57(2), 127–134.

Windridge, D., & Bober, M. (2014). A kernel-based framework for medical big-data analytics. In A. Holzinger & I. Jursica (Eds.), Interactive knowledge discovery and data mining in biomedical informatics (pp. 197–208). Berlin, Germany: Springer-Verlag.

Wu, P. Y., Cheng, C. W., Kaddi, C. D., Venugopalan, J., Hoffman, R., & Wang, M. D. (2017). –Omic and electronic health record big data analytics for precision medicine. IEEE Transactions on Biomedical Engineering, 64(2), 263–273.

Xiang, W., Wang, G., Pickering, M., & Zhang, Y. (2016). Big video data for light-field-based 3D telemedicine. IEEE Network, 30(3), 30–38.

Xu, J., Wise, C., Varma, V., Fang, H., Ning, B., Hong, H., Kaput, J. (2010). Two new Array Track libraries for personalized biomedical research. BMC Bioinformatics, 11(Suppl 6), S6.

Yan, Y., Qin, X., Fan, J., & Wang, L. (2014). A review on healthcare big data research. E-Science Technology & Application [Chinese], 5(6), 3–16.

Yom-Tov, E. (2016). Crowdsourced health: How what you do on the Internet will improve medicine. Cambridge, MA: MIT Press.

Youssef, A. E. (2014). A framework for secure healthcare systems based on big data analytics in mobile cloud computing environments. The International Journal of Ambient Systems and Applications, 2(2), 1–11.

Yuen-Reed, G., & Mojsilović, A. (2016). The role of big data and analytics in health payer transformation to consumer-centricity. In C. Weaver, M. Ball, G. Kim & J. Kiel (Eds.), Healthcare information management systems (pp. 399–420). Switzerland: Springer.

Zhang, D. Q., & Chen, S. C. (2004). A novel kernelized fuzzy c-means algorithm with application in medical image segmentation. Artificial Intelligence in Medicine, 32(1), 37–50.

Zhang, Z., Zhou, Y., Du, S. H., Luo, X. Q., & Mei, T. (2014). Medical big data and the facing opportunities and challenge. Journal of Medical Informatics, 6, 2–8.
