
Big Data Final Report


  • Page 1 of 34

    1. ABSTRACT

    Objectives:

    The purpose of this research is to portray and discuss perspectives on the evolving use of Big Data Analytics to unravel the causes of disease and the prerequisites for preventing it, to examine some of the opportunities and challenges regarding its economic value in Public Health, and to offer recommendations and conclusions.

    Methods:

    A non-systematic review of the literature was conducted to highlight the implications of using Big Data Analytics in healthcare innovation and its applications to public health challenges in India. A thematic review of selected articles was performed; the paper presents an architectural framework and methodology, describes examples reported in the literature, briefly discusses the challenges, and offers conclusions.

    Results:

    The paper provides a broad overview of various applications of Big Data analytics in healthcare for clinicians, public health practitioners, epidemiologists, policy makers and other health experts.

    Conclusions:

    Big Data and its associated analytics should be taken seriously when approaching the use of vast volumes of both structured and unstructured data in science and healthcare. Big Data analytics in Public Health is evolving into a promising field for providing insight from very large data sets and improving outcomes while reducing costs. Future exploration of issues surrounding data privacy, confidentiality, and education is needed.


    2. INTRODUCTION

    Public Health is not a new field; every successful civilization has recognized the health implications of clean water and the efficient disposal of human waste. Today, the Public Health agenda is defined and driven by national and international agencies such as the World Health Organization (WHO), the National Health Service (NHS) and the Centers for Disease Control and Prevention (CDC). Public healthcare in India is government financed and government run, yet for many people in many parts of the country, accessing primary healthcare is still a challenge. For developing countries like India, strengthening the public health system is one of the most important priorities, so as to provide better healthcare access to the country's priceless human resources, which in turn can make India healthier.

    The most effective public health interventions are typically preventive interventions and policies that help stop a crisis before it starts. However, predicting the next public health crisis has historically been difficult, and this has hampered efforts to prevent disease, design better diagnostic tools, increase access to healthcare and reduce its costs. Many experts, including researchers, policy makers and practitioners, have identified a large gap in knowledge about interventions in Public Health delivery systems. The inefficiencies and inequities of Public Health in India have pushed forward the need for creative thinking and innovative solutions to strengthen it. The exponential growth of data over the last decade has opened a new domain that needs validation and analysis, and Big Data Analytics can be applied to it. Big Data technologies provide the computing and analytical capacity needed to process huge volumes of transactional data.

    Big Data in healthcare is overwhelming not only because of its volume but also because of the diversity of data types and the speed at which it must be managed. The totality of data related to patient healthcare and wellbeing makes up Big Data in the healthcare industry. It includes clinical data from computerized physician order entry (CPOE) and clinical decision support systems (physicians' written notes and prescriptions, medical imaging, laboratory, pharmacy, insurance, and other administrative data); patient data in electronic patient records (EPRs); machine-generated/sensor data, such as from monitoring vital signs; social media posts, including Twitter feeds (so-called tweets) [8], blogs [9], status updates on Facebook and other platforms, and web pages; and less patient-specific information, including emergency care data, news feeds, and articles in medical journals.


    The potential applications of Big Data analytics in public health are: 1) analyzing disease patterns and tracking disease outbreaks and transmission to improve public health surveillance and speed response; 2) faster development of more accurately targeted vaccines, e.g., choosing the annual influenza strains; and 3) turning large amounts of data into actionable information that can be used to identify needs, provide services, and predict and prevent crises, especially for the benefit of populations. In addition, [14] suggests that Big Data analytics in healthcare can contribute to evidence-based medicine: combining and analyzing a variety of structured and unstructured data (EMRs, financial and operational data, clinical data, and genomic data) to match treatments with outcomes, predict which patients are at risk of disease or readmission, and provide more efficient care.
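The surveillance application in point 1) can be illustrated with a very simple aberration-detection rule: compare each week's case count with the mean and standard deviation of the preceding weeks and flag large excesses. The sketch below is a minimal Python illustration under stated assumptions, not a method used in this report: the function name `flag_outbreak_weeks`, the 7-week moving baseline, the multiplier k = 3 and the weekly counts are all hypothetical.

```python
from statistics import mean, stdev

def flag_outbreak_weeks(counts, baseline=7, k=3.0):
    """Return indices of weeks whose count exceeds baseline mean + k * SD.

    Hypothetical helper: each week is compared with the `baseline`
    weeks immediately before it (a moving baseline).
    """
    flagged = []
    for i in range(baseline, len(counts)):
        window = counts[i - baseline:i]
        mu = mean(window)
        sd = stdev(window)
        # Floor the SD at 1 so a perfectly flat baseline does not flag tiny changes.
        threshold = mu + k * max(sd, 1.0)
        if counts[i] > threshold:
            flagged.append(i)
    return flagged

# Invented weekly case counts for one district.
weekly_cases = [12, 15, 11, 14, 13, 12, 16, 14, 15, 41, 58, 17]
print(flag_outbreak_weeks(weekly_cases))  # → [9, 10]
```

Operational surveillance systems use more careful baselines (seasonality, reporting delays, population changes); the fixed threshold here only shows the idea.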

    The current research project provides an overview of Big Data analytics in healthcare as it emerges as a discipline. First, we define and discuss the various advantages and characteristics of Big Data analytics in healthcare. Second, we describe the architectural framework of Big Data analytics in healthcare. Third, we describe the methodology for developing Big Data analytics applications. Fourth, we provide examples of Big Data analytics in healthcare reported in the literature. Fifth, we identify the challenges. Lastly, we offer conclusions and future directions.


    3. AIMS & OBJECTIVES:

    Aims: The aim of the current research is to portray and discuss perspectives on the evolving use of Big Data Analytics in healthcare and to examine some of the opportunities and challenges from a Public Health perspective in India.

    Objectives:

    The main objective of this dissertation was to gain new knowledge on how to bridge the data mining and Public Health communities and foster interdisciplinary work between them. The data collected were then used to achieve the following specific objectives:

    1. To identify the benefits, risks and opportunities of Big Data in health and make recommendations for its use in the delivery of healthcare services in India.
    2. To understand the gap between healthcare delivery systems and public health.
    3. To understand the global spatial distribution of epidemiological outbreaks using the Google Trends tool.


    4. REVIEW OF LITERATURE

    4.1 What is Big Data?

    Big Data is a term used by the IT industry to describe the voluminous amount of unstructured data an organization creates. It represents information that has not been normalized or harmonized, comes from many different sources, and in the past has been too expensive or operationally impractical to normalize for typical online transaction processing (OLTP) or data warehouse stores. Big Data is characterized by a vast size that exceeds the capability of traditional data management technologies and requires new capabilities and processes to source, process and manage it.

    In simple terms, Big Data is a collection of large and complex data sets that are difficult to process using common database management tools or traditional data processing applications. The term also refers to the tools, processes and procedures that allow an organization to create, manipulate, and manage very large data sets and storage facilities. Big Data is not just about size: it finds insights in complex, noisy, heterogeneous, longitudinal, and voluminous data, and it aims to answer questions that were previously unanswered.

    Big Data is commonly described by the four Vs, four characteristics whose convergence helps to define it: volume, variety, velocity, and veracity.

    Volume (the amount of data): this refers to the massive quantities of data that organizations are trying to use to improve decision-making processes. Data volumes continue to increase at an unprecedented rate. However, the size of what constitutes Big Data can vary by industry and geography, and is smaller than the petabytes and zettabytes often referenced. Many companies consider datasets between one terabyte and one petabyte to be Big Data. Still, everyone can agree that whatever is considered high volume today will be even higher tomorrow.

    Variety (different types of data and data sources): variety is about managing the complexity of multiple data types, including structured, semi-structured and unstructured data. Organizations need to integrate and analyze data from a complex array of both traditional and nontraditional information sources, from within and outside the enterprise. With the explosion of sensors, smart devices and social media technologies, data is being generated in countless forms, including text, web data, tweets, sensor data, audio, video, click streams, log files and more.

    Velocity (data in motion): the speed at which data is created, processed and analyzed continues to accelerate. Higher velocity is due both to the real-time nature of data creation and to the need to incorporate streaming data into business processes. Today, data is continually being generated at a rate that is impossible for traditional systems to capture, store and analyze. For time-sensitive processes such as multi-channel instant marketing, data must be analyzed in real time to be of value to the business.

    Veracity (data uncertainty): this refers to the level of reliability associated with certain types of data. The quest for high data quality is an important Big Data requirement and challenge, but even the best data cleansing methods cannot remove the inherent unpredictability of some data, such as the weather, the economy, or a customer's buying decisions. The need to acknowledge and plan for uncertainty is a dimension of Big Data that has been introduced as executives try to better understand the uncertain world around them.

    [Figure: The four "V"s of Big Data]
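Velocity implies analyzing data while it streams in rather than after it has been stored. A minimal sketch, assuming a stream of heart-rate readings from a single patient monitor: a fixed-size sliding window keeps a running mean that is updated in constant time per reading, and an alert is raised when the mean leaves a range. The class `SlidingWindowMonitor`, the window size and the 50-120 bpm limits are hypothetical illustrations, not clinical thresholds.

```python
from collections import deque

class SlidingWindowMonitor:
    """Hypothetical streaming monitor: running mean over the last `size` readings."""

    def __init__(self, size=5, low=50.0, high=120.0):
        self.window = deque(maxlen=size)  # oldest reading is evicted automatically
        self.total = 0.0
        self.low, self.high = low, high

    def push(self, value):
        # Subtract the reading about to be evicted so each update costs O(1).
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        avg = self.total / len(self.window)
        alert = not (self.low <= avg <= self.high)
        return avg, alert

monitor = SlidingWindowMonitor(size=3)
for bpm in [80, 82, 85, 130, 140, 150]:
    avg, alert = monitor.push(bpm)
    print(f"{bpm:>3} bpm  window mean={avg:6.1f}  alert={alert}")
```

The window smooths out single spikes (the reading of 130 alone does not trigger an alert); only a sustained rise pushes the mean out of range.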

    The term Analytics refers to the logic and algorithms, both deductive and inferential, applied to Big Data to derive value, insights and knowledge from it. Analytical methods such as data mining, natural language processing, artificial intelligence and predictive analytics are employed to analyze, contextualize and visualize the data. These computerized analytical methods recognize inherent patterns, correlations and anomalies that are discovered as a result of integrating vast amounts of data from different datasets.
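Finding correlations across integrated datasets, as described above, can be as simple as computing Pearson's correlation coefficient between two aligned series. The sketch below assumes two invented weekly series, a search-interest index and reported case counts for the same weeks; the function `pearson` and the data are illustrative only. A high coefficient shows co-movement, not causation.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either series is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Invented weekly values: search-interest index (0-100) and reported cases.
search_index = [20, 25, 30, 45, 60, 80, 100, 70]
cases = [110, 118, 140, 190, 260, 330, 410, 300]
print(round(pearson(search_index, cases), 3))
```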

    Taken together, Big Data Analytics represents, across all industries, new data-driven insights that are being used to gain competitive advantage over peer organizations and to market products and services more effectively to targeted consumers. Examples include real-time purchasing patterns and recommendations back to consumers, and a better understanding of consumer preferences and perspectives through affinity to certain social groups.

    Big Data originated with web-based search engines such as Google and Yahoo, the popularity of social media and social networking services such as Facebook and Twitter, and data-generating sensors, telehealth and mobile devices. All of these have generated new data and opportunities for new insights into customer behaviours and trends. While Big Data frameworks have been in operation since 2005, they have only recently moved into other industries and sectors, including financial services firms and banks, online retailers and healthcare.

    For healthcare, Big Data represents opportunities to exploit personalized care, streamline health operations, support clinical and policy decision making, and improve patient engagement.

    Today, across all industries, the typical sources of Big Data include:

    Internet transactions: By 2015, more than three billion people were expected to be online. Billions of online purchases, stock trades, social networking exchanges, Internet searches and other transactions happen every day, including countless automated transactions. Each creates a number of data points collected by retailers, banks, credit card issuers, credit agencies, social networking and search engine service providers and others.

    Mobile devices: There are more than 5.6 billion mobile phones in use worldwide. Each call, text and instant message generates data. The average teen texts 4,700 times per month. Mobile devices, particularly smartphones and tablets, also make it easier to use social networking and other data-generating applications. Mobile devices also collect and transmit location data.

    Social networking and media: There are currently more than 955 million active Facebook users, 500 million Twitter users and 156 million public blogs. By 2015, more than two billion videos were expected to be watched on YouTube in a single day. Each Facebook update, tweet, blog post and comment creates multiple new data points (structured, semi-structured and unstructured), sometimes referred to as "data exhaust".

    Networked devices and sensors: Electronic devices of all sorts, including servers and other IT hardware, smart energy meters, temperature sensors, and patient monitors and aids, all create semi-structured log data that record every action.

    Genomic data: Significant amounts of new gene sequencing data are being made available through new investments, Big Data capabilities and business models.

    Streamed data: Home monitoring, telehealth, handheld and sensor-based wireless and smart devices are new data sources and types. They represent significant amounts of real-time data available for use by the health system.

    Web and social networking-based data: Web-based data comes from Google and other search engines and from consumer use of the Internet, as well as from social networking sites.

    Health publication and clinical reference data: This includes text-based publications (clinical research and medical reference material), text-based clinical practice guidelines, and health product data (e.g., drug information).

    Clinical data: Eighty per cent of health data is unstructured, in the form of documents, images, and clinical or transcribed notes. These semi-structured to unstructured clinical records and documents represent new data sources.

    Business, organizational and external data: Data that previously has not been linked, such as financial, billing, scheduling, administrative, external and other non-clinical and non-health data.
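Several of the sources above, networked devices in particular, produce semi-structured data: every line shares a loose skeleton, but the content varies. A minimal sketch, assuming a hypothetical patient-monitor log format (an ISO timestamp, a device ID, then `key=value` pairs), shows how such lines might be parsed into structured records; real device log formats differ by vendor, and the regular expressions and field names here are assumptions.

```python
import re
from datetime import datetime

# Hypothetical format: "<ISO timestamp> <device-id> key=value key=value ..."
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<device>\S+)\s+(?P<fields>.*)$")
PAIR_RE = re.compile(r"(\w+)=(\S+)")

def parse_log_line(line):
    """Turn one semi-structured log line into a dict, or None if it is malformed."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    try:
        ts = datetime.fromisoformat(m.group("ts"))
    except ValueError:
        return None  # first token is not a timestamp
    record = {"timestamp": ts, "device": m.group("device")}
    for key, value in PAIR_RE.findall(m.group("fields")):
        # Keep numeric readings as floats; leave anything else as text.
        try:
            record[key] = float(value)
        except ValueError:
            record[key] = value
    return record

lines = [
    "2023-05-01T08:00:00 monitor-07 hr=78 spo2=97 status=ok",
    "2023-05-01T08:01:00 monitor-07 hr=81 spo2=96 status=ok",
    "corrupted line",
]
records = [r for r in map(parse_log_line, lines) if r is not None]
print(len(records), records[0]["hr"])  # → 2 78.0
```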

    It is important to note that while there are many sources of Big Data within the health sector, it is unrealistic to assume that all of it can be put to use for Big Data analytics, owing to a range of governance, privacy, operational and technical considerations.

    Gartner Group's analysis of Big Data shows that vendors are enabling Big Data with a wide variety of new and old technologies, in different ways and at different rates. Overall, Gartner depicts an IT market that is still fairly immature, with larger traditional data warehousing/business intelligence (DW/BI) vendors engaged and investing millions of dollars, and smaller Big Data pure-players ramping up go-to-market strategies focused purely on Big Data. Gartner's research points to a marketplace in the early adopter phase, despite a large valuation of $5 billion (US)⁸.


    5. MATERIALS AND METHODS

    5.1 Methodology

    This section describes the methodological approach, explaining how the research work is carried out in order to answer the proposed research questions. Figure 1 shows the detailed steps involved in executing a Big Data project. While several different methodologies are being developed in this rapidly emerging discipline, here we outline one that is practical and hands-on. Table 01 shows the main stages of the methodology. The cutting-edge computational technologies for Big Data collection, storage and transfer, and the state-of-the-art analytical methods, are introduced, and the future perspectives of health sciences in the era of Big Data are discussed.

    STEP 01: Formulate your question.
    STEP 02: Find the right ways (smart devices, Internet, hospitals) to collect your data.
    STEP 03: Store the data.
    STEP 04: Analyze your data.
    STEP 05: Generate the analysis report with vivid visualization.
    STEP 06: Evaluate the project: problem solved, or start over.

    Table 01: Steps for Big Data Analytics Projects in Healthcare

    Fig 01: Diagrammatic representation of the flow of the Big Data process
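The six steps in Table 01 form a loop, because step 6 can send the project back to step 1. A minimal sketch of that control flow in Python: each step is a caller-supplied function (the parameter names and the `max_rounds` limit are hypothetical, and no particular tool or framework is implied), and the toy usage shows a first, overly broad question being reformulated.

```python
from typing import Any, Callable, Optional, Tuple

def run_project(
    formulate: Callable[[int], str],
    collect: Callable[[str], list],
    store: Callable[[list], Any],
    analyze: Callable[[Any], dict],
    report: Callable[[dict], str],
    evaluate: Callable[[str], bool],
    max_rounds: int = 3,
) -> Tuple[Optional[int], Optional[str]]:
    """Run STEP 01-06 repeatedly until the evaluation succeeds or rounds run out."""
    for round_no in range(1, max_rounds + 1):
        question = formulate(round_no)   # STEP 01: formulate (or reformulate) the question
        raw = collect(question)          # STEP 02: collect data
        stored = store(raw)              # STEP 03: store it
        results = analyze(stored)        # STEP 04: analyze
        summary = report(results)        # STEP 05: report / visualize
        if evaluate(summary):            # STEP 06: problem solved?
            return round_no, summary
    return None, None                    # not solved within max_rounds

# Toy usage: the first, overly broad question yields no data, so the loop starts over.
rounds, summary = run_project(
    formulate=lambda r: "everything" if r == 1 else "dengue cases by week",
    collect=lambda q: [] if q == "everything" else [3, 5, 8],
    store=list,
    analyze=lambda data: {"n": len(data), "total": sum(data)},
    report=lambda res: f"{res['n']} weeks, {res['total']} cases",
    evaluate=lambda s: not s.startswith("0 weeks"),
)
print(rounds, summary)  # → 2 3 weeks, 16 cases
```

Passing the steps in as functions keeps the loop independent of whatever storage or analysis tools a real project would use.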


    To start a Big Data project, several steps are suggested, as shown in Fig. 1. First, the right problem should be chosen. There are three kinds of problems: the first kind has already been solved with traditional methods, so there is no need for Big Data technologies; the second kind is impossible to solve with current technologies; and the third kind is solvable with current Big Data technologies. We should focus on the third kind. Second, after setting a practical goal, we need to generate the data with sensors, monitors or molecular profiling, or extract it from public databases/sources. Third, we need to pre-process the data to obtain clean and meaningful data. Data pre-processing is critical to the success of a Big Data project: a recent publication [5] showed that sample misalignment in eQTL (expression Quantitative Trait Loci) and mQTL (methylation Quantitative Trait Loci) studies reduces the number of discovered associations 27-fold. The quality control of data essentially determines the upper bound of the data product, i.e. garbage in, garbage out. The clean data are then stored in a database for the next step of analysis. Fourth, insight or knowledge is discovered from the processed data through statistical analysis. Finally, the analytic results are presented to the end user as a report, an online recommendation or a decision. Visualization of data, such as networks/graphs and charts, makes the analytic results easy to interpret and understand. If the results do not make sense, we need to reformulate the problem and start the steps over again.
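The pre-processing step above is where "garbage in, garbage out" is decided. A minimal quality-control sketch in plain Python, assuming hypothetical patient records with `patient_id`, `age` and `systolic_bp` fields: it rejects exact duplicates, records with missing required fields, and implausible values. The helper `clean_records`, the field names and the plausibility ranges are illustrative assumptions, not validated clinical rules.

```python
def clean_records(records, required=("patient_id", "age", "systolic_bp")):
    """Hypothetical QC pass: return (clean, rejected) lists of records."""
    # Illustrative plausibility ranges (inclusive); not clinical reference values.
    ranges = {"age": (0, 120), "systolic_bp": (50, 260)}
    seen = set()
    clean, rejected = [], []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key in seen:  # exact duplicate of an earlier record
            rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        if any(rec.get(field) is None for field in required):
            rejected.append((rec, "missing field"))
            continue
        if any(f in rec and rec[f] is not None and not lo <= rec[f] <= hi
               for f, (lo, hi) in ranges.items()):
            rejected.append((rec, "out of range"))
            continue
        clean.append(rec)
    return clean, rejected

raw = [
    {"patient_id": "P1", "age": 34, "systolic_bp": 118},
    {"patient_id": "P1", "age": 34, "systolic_bp": 118},    # duplicate
    {"patient_id": "P2", "age": None, "systolic_bp": 130},  # missing age
    {"patient_id": "P3", "age": 51, "systolic_bp": 1200},   # likely a data-entry error
    {"patient_id": "P4", "age": 67, "systolic_bp": 142},
]
clean, rejected = clean_records(raw)
print([r["patient_id"] for r in clean])    # → ['P1', 'P4']
print([reason for _, reason in rejected])  # → ['duplicate', 'missing field', 'out of range']
```

Keeping the rejected records with a reason, rather than silently dropping them, makes the quality-control decisions auditable.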

    In the health sciences, many problems can be addressed with Big Data technologies, such as recommendation systems in healthcare, Internet-based epidemic surveillance, sensor-based health condition and food safety monitoring, Genome-Wide Association Studies (GWAS) and expression Quantitative Trait Loci (eQTL) studies, inferring air quality using Big Data, and metabolomics and ionomics for nutritionists.
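GWAS, listed above, repeats one simple association test across a very large number of genetic variants, which is what makes it a Big Data problem. A minimal sketch of the per-variant step, assuming the common allelic 2×2 design (allele counts in cases versus controls) and Pearson's chi-square test with one degree of freedom; the function `allelic_chi2` and the counts are hypothetical. Real GWAS pipelines add quality control, covariate adjustment and multiple-testing correction, commonly a genome-wide significance threshold of 5 × 10⁻⁸.

```python
from math import erfc, sqrt

def allelic_chi2(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square statistic (1 df) and p-value for a 2x2 allele-count table.

    Assumes no empty row or column, so every expected count is > 0.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

# Invented counts for one variant: alternative/reference alleles in cases and controls.
chi2, p = allelic_chi2(case_alt=300, case_ref=700, ctrl_alt=200, ctrl_ref=800)
print(f"chi2={chi2:.2f}, p={p:.2e}, genome-wide significant: {p < 5e-8}")
```

In a real study this function would be applied to every variant, which is why GWAS is typically run in parallel over partitions of the genome.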

    To solve these problems, many advanced computational technologies are used. We cover the following technological perspectives: (1) the infrastructure of Big Data; (2) the analysis of Big Data; and (3) the visualization of Big Data results. The future perspectives of health sciences in the era of Big Data are also discussed.


    5.2 Architectural Framework

    The conceptual framework for a Big Data analytics project in Public Health is similar to that of a traditional health informatics or analytics project. The key difference lies in how processing is executed. In a regular health analytics project, the analysis can be performed with a business intelligence tool installed on a stand-alone system, such as a desktop or laptop. Because Big Data is by definition large, processing is broken down and executed across multiple nodes. The concept of distributed processing has existed for decades; what is relatively new is its use in analyzing very large data sets as healthcare providers start to tap into their large data repositories to gain insight for making better-informed health-related decisions. Furthermore, open-source platforms such as Hadoop/MapReduce, available on the cloud, have encouraged the application of Big Data analytics in healthcare.

    While the algorithms and models are similar, the user interfaces of traditional analytics tools and those used for Big Data are entirely different; traditional health analytics tools have become very user-friendly and transparent.

    Big Data analytics tools, on the other hand, are extremely complex, programming intensive, and require a variety of skills. They have emerged in an ad hoc fashion, mostly as open-source development tools and platforms, and therefore they lack the support and user-friendliness of vendor-driven proprietary tools. As Figure 1 indicates, the complexity begins with the data itself.

    Big Data in healthcare can come from internal sources (e.g., electronic health records, clinical decision support systems) and external sources (e.g., government sources, laboratories, pharmacies, insurance companies and HMOs), often in multiple formats (flat files, .csv, relational tables, ASCII/text, PDFs, etc.) and residing at multiple locations (geographically as well as at different healthcare providers' sites) in numerous legacy and other applications (transaction processing applications, databases, etc.). Sources and data types include:

    1. Web and social media data: clickstream and interaction data from Facebook, Twitter, LinkedIn, blogs, and the like. It can also include health plan websites, smartphone apps, etc. [6].


    2. Machine-to-machine data: readings from remote sensors, meters, and other vital sign devices [6].

    3. Big transaction data: healthcare claims and other billing records, increasingly available in semi-structured and unstructured formats [6].

    4. Biometric data: fingerprints, genetics, handwriting, retinal scans, X-ray and other medical images, blood pressure, pulse and pulse-oximetry readings, and other similar types of data [6].

    5. Human-generated data: unstructured and semi-structured data such as EMRs, physicians' notes, email, and paper documents [6].

    For the purpose of Big Data analytics, this data has to be pooled. In the second component of the framework, the data is in a raw state and needs to be processed or transformed, at which point several options are available. One possibility is a service-oriented architectural approach combined with web services (middleware) [27]: the data stays raw, and services are used to call, retrieve and process it. Another approach is data warehousing, wherein data from various sources is aggregated and made ready for processing, although it is not available in real time. Through the steps of extract, transform, and load (ETL), data from diverse sources is cleansed and readied. Depending on whether the data is structured or unstructured, several data formats can be input to the Big Data analytics platform.
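The ETL route can be sketched with Python's standard library alone. The example below assumes two hypothetical sources in different formats (a CSV export from a laboratory and a JSON feed from a pharmacy system), transforms both to one common schema with some basic cleansing, and loads the result into an in-memory SQLite table standing in for a data warehouse. The source contents, field names and the `facts` table are illustrative assumptions.

```python
import csv
import io
import json
import sqlite3

# Extract: two hypothetical sources in different formats.
LAB_CSV = "patient,test,value_mg_dl\nP1,glucose,126\nP2,glucose,98\n"
PHARMACY_JSON = '[{"pid": " p1", "drug": "Metformin ", "qty": "60"}]'

def extract():
    labs = list(csv.DictReader(io.StringIO(LAB_CSV)))
    meds = json.loads(PHARMACY_JSON)
    return labs, meds

def transform(labs, meds):
    """Map both sources onto one (patient_id, source, item, value) schema."""
    rows = []
    for r in labs:
        rows.append((r["patient"].strip().upper(), "lab", r["test"], float(r["value_mg_dl"])))
    for m in meds:
        # Cleansing: normalize IDs and drug names, convert quantities to numbers.
        rows.append((m["pid"].strip().upper(), "pharmacy", m["drug"].strip().lower(), float(m["qty"])))
    return rows

def load(rows, conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS facts (patient_id TEXT, source TEXT, item TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO facts VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)
for row in conn.execute("SELECT * FROM facts ORDER BY patient_id, source"):
    print(row)
```

After loading, records from both systems can be queried together by `patient_id`, which is the point of the warehouse approach; the trade-off, as noted above, is that the data is only as current as the last ETL run.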


    In the next component of the conceptual framework, several decisions are made regarding the data input approach, distributed design, tool selection and analytics models. Finally, on the far right, the four typical applications of Big Data analytics in healthcare are shown.

    These are queries, reports, online analytical processing (OLAP), and data mining. Visualization is an overarching theme across the four applications. Drawing on fields such as statistics, computer science, applied mathematics and economics, a wide variety of techniques and technologies has been developed and adapted to aggregate, manipulate, analyze, and visualize Big Data in healthcare.
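Of the four applications listed above, OLAP is perhaps the least self-explanatory: it summarizes a measure along several dimensions at different levels of detail. A minimal sketch in plain Python, assuming hypothetical case records with `district` and `month` dimensions, computes a roll-up (per district and month, per district, and a grand total), similar in spirit to SQL's `GROUP BY ROLLUP`; the records and the `rollup` helper are invented for illustration.

```python
from collections import defaultdict

def rollup(records, dims=("district", "month"), measure="cases"):
    """Sum `measure` for every prefix of `dims`, like GROUP BY ROLLUP."""
    totals = defaultdict(int)
    for rec in records:
        for level in range(len(dims), -1, -1):
            # Key holds the first `level` dimension values; () is the grand total.
            key = tuple(rec[d] for d in dims[:level])
            totals[key] += rec[measure]
    return dict(totals)

records = [
    {"district": "Pune", "month": "Jan", "cases": 40},
    {"district": "Pune", "month": "Feb", "cases": 55},
    {"district": "Nagpur", "month": "Jan", "cases": 30},
]
cube = rollup(records)
print(cube[("Pune",)])  # → 95
print(cube[()])         # → 125
```

Drilling down is then just a lookup with a longer key, e.g. `cube[("Pune", "Jan")]`.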

    The most significant platform for Big Data analytics is the open-source distributed data processing platform Hadoop (an Apache project), initially developed for routine functions such as aggregating web search indexes. It belongs to the class of NoSQL technologies (others include CouchDB and MongoDB) that evolved to aggregate data in unique ways. Hadoop can process extremely large amounts of data mainly by allocating partitioned data sets to numerous servers (nodes), each of which solves a different part of the larger problem; the partial results are then integrated into the final result [28-31].
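The partition-then-integrate pattern described above is the MapReduce model. The sketch below simulates it in a single Python process to show the data flow only: records are split into partitions standing in for nodes, a map step emits `(key, 1)` pairs, a shuffle groups the pairs by key, and a reduce step sums each group. It does not use Hadoop's APIs; the helper names, the ICD-10-style diagnosis codes and the partition count are illustrative assumptions.

```python
from collections import defaultdict
from itertools import chain

def partition(records, n_nodes):
    """Split records round-robin across n_nodes partitions."""
    return [records[i::n_nodes] for i in range(n_nodes)]

def map_phase(chunk):
    # Emit (diagnosis_code, 1) for every record in this partition.
    return [(code, 1) for code in chunk]

def shuffle(mapped_chunks):
    # Group all emitted values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_chunks):
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Integrate the partial results into the final count per key.
    return {key: sum(values) for key, values in groups.items()}

diagnoses = ["A90", "J10", "A90", "A09", "J10", "A90", "A01"]
chunks = partition(diagnoses, n_nodes=3)
mapped = [map_phase(c) for c in chunks]  # on a cluster, each would run on its own node
print(reduce_phase(shuffle(mapped)))
```

Because each `map_phase` call touches only its own partition, the map work can be spread across machines; only the shuffle needs data to move between them.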

    Hadoop can serve the twin roles of data organizer and analytics tool. It offers a great deal of potential in enabling enterprises to harness data that has, until now, been difficult to manage and analyze. Specifically, Hadoop makes it possible to process extremely large volumes of data with various structures or no structure at all. But Hadoop can be challenging to install, configure and administer, and individuals with Hadoop skills are not easily found. For these reasons, it appears that organizations are not quite ready to embrace Hadoop completely.

    An ecosystem of additional platforms and tools surrounds and supports the Hadoop distributed platform [30,31]. These are summarized in Table 1. Numerous vendors (including AWS, Cloudera, Hortonworks, and MapR Technologies) distribute open-source Hadoop platforms [29]. Many proprietary options are also available, such as IBM's BigInsights. Further, many of these platforms have cloud versions, making them widely available. Cassandra, HBase, and MongoDB, described above, are widely used for the database component.

    While the available frameworks and tools are mostly open source and built around Hadoop and related platforms, there are numerous trade-offs that developers and users of Big Data analytics in healthcare must consider. Although development costs may be lower because these tools are open source and free of charge, the downsides are the lack of technical support and minimal security. In the healthcare industry these are significant drawbacks, and therefore the trade-offs must be addressed.


    Additionally, these platforms/tools require a great deal of programming skill, which the typical end user in healthcare may not possess. Furthermore, given the only recent emergence of Big Data analytics in healthcare, governance issues including ownership, privacy, security, and standards have yet to be addressed. In the next section we offer an applied methodology for developing and implementing a Big Data analytics project for healthcare providers.


    6. RESULTS

    6.1. Review of Big Data applications to Public Health:

    Many countries are applying Big Data analytics to solve problems in healthcare. The benefits of health-related Big Data have so far been demonstrated in three areas, namely to 1) prevent disease, 2) identify modifiable risk factors for disease, and 3) design interventions for health behavior change [9]. Organizations worldwide are recognizing the Big Data movement and introducing new initiatives for knowledge discovery and data-driven decision-making. For example, the National Institutes of Health (NIH) is establishing the Big Data to Knowledge (BD2K) and Infrastructure Plus programs, which provide a shared computational environment (e.g. data standards, ontologies, data catalogues, virtualized cloud computing) to facilitate large-scale biomedical data analysis for the NIH community [10]. Specifically, the NIH's US National Library of Medicine hosts an impressive set of data sharing repositories [11], which primarily accept submissions of biomedical data and other information from NIH-funded investigators. In addition, the United Nations (UN) has launched the Global Pulse project, which advocates for the data philanthropy movement by asking organizations and individuals to contribute data, resources, and skills to help understand the impact of UN development programs and ways to improve their outreach to affected populations and regions [12].

    In the United States, the Pillbox project is reported to reduce healthcare costs by $500 million annually through the application of Big Data analytics [3,4]. The San Francisco Police Department has developed a Big Data system designed for crime prevention [3]. The UK is using Big Data through the establishment and management of the Foresight Horizon Scanning Centre, which serves as a countermeasure to various health and social problems such as obesity, potential risks (coastal erosion, climate change), and epidemics [5]. The EU is dealing with uncertainty through the iKnow (Interconnect Knowledge) project, which provides opportunities for research on earthquakes, tsunamis, terrorism, networking, and global crises [15]. The OECD adopted the evaluation of the economic benefits of Big Data as an agenda item for the 15th Working Party on Indicators for the Information Society (WPIIS), considering Big Data for business efficiency [8].


    Moreover, the Australian Government Information Management Office has saved time and resources by developing an automated tool that can analyze, search, and reuse massive amounts of information through Government 2.0 [7]. In 2004, Singapore established the Risk Assessment and Horizon Scanning (RAHS) program to prepare for future uncertainty regarding terrorism and epidemics [6].

    Big Data streams in health can be broadly grouped into three categories [13]. Traditional medical data originate primarily from the health system (e.g. EMRs, personal and family health history, medication history, lab reports, pathology results); analyses of these data aim to improve understanding of disease outcomes and their risk factors, reduce health system costs, and improve efficiency [13]. Omics data refer to large-scale datasets in the biological and molecular fields (e.g. genomics, microbiomics, proteomics, and metabolomics); analyses of these data aim to understand disease mechanisms and accelerate the individualization of medical treatment (e.g. precision medicine) [3, 6]. As Alice Whittemore pointed out at the Stanford Big Data in Biomedicine Conference (2013), genomic testing and mapping could, for example, identify women at high risk of developing breast cancer, allowing preventive care to be targeted to them and reducing the need for large-scale, potentially hazardous interventions in low-risk women [14]. Finally, data from social media and the quantified-self movement essentially consist of signs and behaviors showing how individuals (or groups of individuals) use the Internet, social media, mobile applications (apps), sensor devices, wearable computing devices, or other technological and non-technological tools to better inform and enhance their health.

    This section presents examples of health-related Big Data projects, with an emphasis on data from social media and the quantified-self movement (Table 1). For Big Data research related to EMRs, digital enterprise, genetic data, and omics sources, readers can refer to recent reviews and perspectives [15, 16, 17, 18, 19].

    Table 1. Examples of health-related Big Data projects related to social media and the quantified-self movement.

    Data type: Quantified-self data (via devices, self-reporting, or sensors)
    How has it been used in health? Engaged in the self-tracking of signs and/or behaviors as n=1 individuals or in groups, where there is often a proactive stance toward acting on the information. Provides richer and more detailed data on potential risk factors (biological, physical, behavioral or environmental) [13]. Allows data collection over potentially longer follow-up periods than is currently possible using standard questionnaires [13].
    Examples: Food consumption [20]; information diet [21]; smile-triggered electromyogram (EMG) muscle to create unexpected moments of joy in human interaction [22]; coffee consumption, social interaction, and mood [23]; idea-tracking process [24]; use of rescue and controller asthma medications with an inhaler sensor (e.g. Asthmapolis) [25]; monitoring blood glucose levels in diabetics (e.g. Glooko) [26].

    Data type: Location-based information
    How has it been used in health? Information derived from Global Positioning Systems (GPS), Geographic Information Systems (GIS), and other open source mapping and visualization projects. Provides information on the environmental and social determinants of health. Monitors for disease outbreaks near your location.
    Examples: Weather patterns, pollution levels, allergens, traffic patterns, water quality, walkability of neighborhood, and access to fresh fruit and vegetables (such as supermarkets) [34, 35, 36]; HealthMap [37].

    Data type: Twitter (Note: a 2011 study has suggested that 8.5% of English-language tweets relate to illness, and 16.6% relate to health [46])
    How has it been used in health? Assesses disease spread in real time. Assesses sentiments and moods. Facilitates emergency services by allowing for the wide-scale broadcast of available resources, enabling people in need of medical assistance to locate help. Facilitates crisis mapping (e.g. eyewitness reports are plotted on interactive maps; these data can help target areas for emergency services and additional resources). Facilitates discourse on non-emergency healthcare (e.g. broadcasts of public health messages, quantifying medical misconceptions).
    Examples: Quantifying medical misconceptions (e.g. concussions) [38]; the spread of poor medical compliance (e.g. antibiotic use) [39]; trends of cardiac arrest and resuscitation communication [40]; cervical and breast cancer screening [41]; postpartum depression [42]; influenza A H1N1 outbreak (disease activity and public concern) [43]; 2010 Haitian cholera outbreak [44]; emergency situations from the Boston Marathon explosion [45].

    Data type: Health-related social networking sites
    How has it been used in health? Facilitates sharing of personal health data and advice amongst patients and consumers. Monitors the spread of infectious diseases via crowd surveillance.
    Examples: PatientsLikeMe [47]; disease surveillance sites that collect participant-reported symptoms and use informal online data sources to analyze, map, and disseminate information about infectious disease outbreaks.

    Data type: Other social networking sites (e.g. online discussion boards, Facebook)
    How has it been used in health? Monitors how patients use social media to discuss their concerns and issues. Provides awareness of what the person in the street is saying [56].
    Examples: Side effects and associated medication adherence behaviors (e.g. drug switching and discontinuation) [51].

    Data type: Search queries and Web logs
    How has it been used in health? Found to be highly predictive for a wide range of population-level health behaviors. Search keyword selection has been found to be critical for arriving at reliable curated health content. Click stream navigational data from web logs are found to be informative of individual characteristics such as mental health and dietary preferences [57].
    Examples: Google and Yahoo search queries have been used to predict epidemics of illnesses such as influenza (Google 2013); dengue fever [52]; seasonality of mental health, depression and suicide [53]; prevalence of Lyme disease [54]; prevalence of smoking and electronic cigarette use [55].

    6.2 Healthcare in Developing Countries: Malaria Control and Prevention

    Malaria kills one million people a year in Sub-Saharan Africa alone, and most of them are children. A group of researchers from the Harvard School of Public Health combined Big Data on cell phone usage with malaria prevalence maps. The team analyzed the movements of nearly 15 million Kenyan cell phone subscribers over the course of a year (June 2008 to June 2009) and compared them with the distribution of malaria in the country, using a map provided by the Kenya Medical Research Institute and the Malaria Atlas Project. The goal was to identify both source and sink points, that is, where the disease originates and where it primarily ends up.

    Not surprisingly, they found that one of the primary sources was the area near Lake Victoria, as lakes are prime breeding grounds for mosquitoes. However, according to the study, a surprisingly large proportion of non-native infections ended up in Nairobi, Kenya's capital.

    The researchers identified Nairobi as a sink by using text and call records to map every journey taken by each of the nearly 15 million cell phone subscribers. These data showed that many people who travel to mosquito hotspots such as Lake Victoria or the coast are from Nairobi and bring the disease back with them.
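    The source/sink idea can be illustrated with a deliberately simplified calculation. The sketch below is not the study's actual method: it assumes hypothetical inputs, namely a list of directed journeys between regions (as could be derived from call records) and a malaria prevalence value per region, and it treats each traveller as carrying infection with the prevalence of the region being left. Regions that export more expected infections than they import score as sources; regions that import more score as sinks.

```python
from collections import defaultdict


def source_sink_scores(journeys, prevalence):
    """Net expected export of infections per region (positive = source, negative = sink).

    journeys: iterable of (origin, destination, n_travellers), e.g. derived from call records.
    prevalence: dict mapping region -> fraction of people there carrying malaria.
    Simplifying assumption: a traveller leaving a region is infected with that region's prevalence.
    """
    exported = defaultdict(float)
    imported = defaultdict(float)
    for origin, destination, n_travellers in journeys:
        if origin == destination:
            continue  # movement within a region does not move infections between regions
        expected_infected = n_travellers * prevalence.get(origin, 0.0)
        exported[origin] += expected_infected
        imported[destination] += expected_infected
    regions = set(prevalence) | set(exported) | set(imported)
    return {region: exported[region] - imported[region] for region in regions}


if __name__ == "__main__":
    # Hypothetical round trips between a low-prevalence city and two high-prevalence areas.
    journeys = [
        ("Nairobi", "LakeVictoria", 1000), ("LakeVictoria", "Nairobi", 1000),
        ("Nairobi", "Coast", 500), ("Coast", "Nairobi", 500),
    ]
    prevalence = {"Nairobi": 0.01, "LakeVictoria": 0.4, "Coast": 0.2}
    for region, score in sorted(source_sink_scores(journeys, prevalence).items()):
        print(region, round(score, 1))
```

    With these made-up numbers, the city scores as a strong sink and the lake region as the main source, mirroring the qualitative pattern described above.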

    6.3 Internet-based epidemic surveillance

    At http://www.google.com/flutrends/, Google provided a tool called Google Flu Trends for real-time surveillance of influenza outbreaks [9]. Its premise is that when more people have influenza symptoms, searches for influenza-related topics increase [10]. Therefore, the number of people with influenza symptoms can be estimated from Internet searches. Google Flu Trends' predictions were available 7 to 10 days earlier than reports from the official CDC networks, and the results were consistent with them [11].
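    The estimation step can be sketched as a simple regression. Google's published Flu Trends methodology (Ginsberg et al., Nature, 2009) related the log-odds of the proportion of influenza-like-illness (ILI) physician visits, P, to the log-odds of the fraction of ILI-related search queries, Q, with a linear model, logit(P) = b0 + b1 * logit(Q). The sketch below fits such a model by ordinary least squares; the function names and data are hypothetical, and it omits the query-selection step that the real system depended on.

```python
import math


def logit(p):
    """Log-odds of a proportion 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: maps log-odds back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_flu_model(query_fractions, ili_proportions):
    """Fit logit(ILI) = b0 + b1 * logit(query fraction) by ordinary least squares."""
    xs = [logit(q) for q in query_fractions]
    ys = [logit(p) for p in ili_proportions]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def estimate_ili(b0, b1, query_fraction):
    """Estimate this week's ILI proportion from this week's query fraction."""
    return inv_logit(b0 + b1 * logit(query_fraction))
```

    In use, the model would be fitted on past weeks for which official surveillance figures exist, and `estimate_ili` would then be applied to the current week's query fraction, which is available before the official figures.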

    For Chinese users, Baidu disease trends (http://trends.baidu.com/disease/) provided province-, city-, and county-level views of the prevalence of several diseases, including hepatitis, tuberculosis, venereal disease, and influenza. In addition, its Big Data trend product is open to ordinary users, who can therefore customize similar trends.

    Twitter is a widely used social networking and news-sharing platform. Tweets reflect people's opinions and judgments about public events, especially epidemic outbreaks [12]. Several Twitter-based methods have been developed to monitor people's reactions to epidemic outbreaks [12] and to detect early disease syndromes [13]. Tweets about H1N1 activity can be collected by searching for keywords such as "flu", "influenza", and "H1N1". Tweets expressing public concern can also be filtered using keywords such as "travel", "flight", and "ship" for disease transmission, and "wash", "hygiene", and "mask" for disease countermeasures. By studying the sequence of tweets on H1N1 activity and public concern over time, the evolution of public countermeasures can be revealed [12]. Similarly, by analyzing keywords related to early disease syndromes, the risks of diseases such as cancer, flu, depression, aches/pains, allergies, obesity, and dental disease can be estimated [13].
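    The keyword-based filtering described above can be sketched as follows. The category names and keyword sets are assumptions for illustration, built only from the example keywords in the text. Matching is on exact lower-cased tokens, so plurals and misspellings (e.g. "masks") are not caught; a real study would use curated and expanded keyword lists.

```python
import re

# Hypothetical keyword sets built from the example keywords in the text.
KEYWORDS = {
    "h1n1_activity": {"flu", "influenza", "h1n1"},
    "transmission_concern": {"travel", "flight", "ship"},
    "countermeasure_concern": {"wash", "hygiene", "mask"},
}

TOKEN_PATTERN = re.compile(r"[a-z0-9]+")


def classify_tweet(text):
    """Return the sorted categories whose keywords appear as whole tokens in the tweet."""
    tokens = set(TOKEN_PATTERN.findall(text.lower()))
    return sorted(category for category, words in KEYWORDS.items() if tokens & words)


def daily_counts(tweets):
    """Count matching tweets per category per day; tweets are (date, text) pairs."""
    counts = {}
    for day, text in tweets:
        per_day = counts.setdefault(day, {category: 0 for category in KEYWORDS})
        for category in classify_tweet(text):
            per_day[category] += 1
    return counts
```

    Plotting the daily counts for each category side by side gives the kind of time series from which the evolution of public concern and countermeasures could be read.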

    Fig 02: Dengue Trends in India using Google Trends Tools


    Fig 03: Flu trends globally

    6.4 Sensor-based health condition and food safety monitoring

    The integration of software and hardware, especially various sensors, has produced many applications that monitor health conditions and food safety. Many high-tech companies have launched such products, for example the Apple Watch from Apple (http://www.apple.com/watch/), which measures heart rate; Latin from Baidu (http://dulife.baidu.com/device/328), which measures body fat; MUMU from Baidu (http://dulife.baidu.com/device/330), which measures blood pressure; and Smart Chopsticks from Baidu, which measure the pH level, temperature, calories, and freshness of cooking oil [14]. Most such applications are based on well-established principles, and the same measurements have already been achieved with better accuracy or performance on larger instruments. The significance of these products is that they are easy to use and their data can be automatically gathered and analyzed in the cloud. The resulting quantified data make powerful Big Data analysis applicable and can reveal hidden patterns.
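    As a minimal illustration of how automatically gathered sensor readings might be condensed before analysis, the sketch below groups hypothetical (user, timestamp, value) heart-rate readings into per-user daily summaries. The record layout is an assumption; it is not the data format of any of the products named above.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_summaries(readings):
    """Group (user_id, ISO-8601 timestamp, value) readings into per-user daily summaries."""
    buckets = defaultdict(list)
    for user_id, timestamp, value in readings:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        buckets[(user_id, day)].append(value)
    return {
        key: {"n": len(values), "min": min(values), "mean": mean(values), "max": max(values)}
        for key, values in buckets.items()
    }
```

    Summaries like these are far smaller than the raw stream, which makes it practical to compare many users over long periods when looking for patterns.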


    7. DISCUSSIONS

    Even though many benefits are expected from implementing Big Data in healthcare, there are certain difficulties; healthcare in particular has unique characteristics that merit special analysis of the challenges facing Big Data applications and of the ways they can be surmounted.

    In this section, the content is organized into six broad categories. For each domain, the difficulties common to all Big Data projects are described, followed by the challenges and the opportunities to overcome them.

    7.1 Data Capture:

    Data sets are becoming larger and more difficult to manage using traditional database tools. As a result, organizations face difficulties in capturing, storing, managing, and analyzing data in a timely manner [15]. This situation creates new infrastructure needs and significant economic costs. Fortunately, storage costs are also decreasing. This allows for the capture of useful data, such as location data, which permit the mapping of real-time events for epidemiological surveillance.

    The growing adoption of mobile phones, 80% of which are located in developing countries [27], offers the possibility of using the data they provide to improve development programs. For example, SMS for Life uses a combination of mobile phones, SMS messages, the Internet, and electronic mapping technology to track weekly stock levels of malaria drugs at public health facilities. This program improved the distribution of malaria drugs in rural Tanzania, reducing the proportion of facilities without stock from 78% to 26% [28]. In 2013, this initiative encompassed several countries in sub-Saharan Africa, from Ghana to Kenya, with plans to reach more countries [29].
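    The stock-level reporting behind a program like SMS for Life can be sketched as a small aggregation. The message format used here ("STOCK <facility_id> <doses>") is hypothetical and is not the actual SMS for Life protocol; the sketch keeps each facility's most recent report and computes the share of facilities reporting zero stock, the kind of indicator behind the stock-out percentages quoted for Tanzania.

```python
def parse_report(sms):
    """Parse a hypothetical weekly stock SMS of the form 'STOCK <facility_id> <doses>'."""
    parts = sms.split()
    if len(parts) != 3 or parts[0].upper() != "STOCK":
        raise ValueError(f"unrecognized message: {sms!r}")
    facility_id, doses = parts[1], int(parts[2])
    return facility_id, doses


def stockout_rate(messages):
    """Share of reporting facilities whose most recent report shows zero doses."""
    latest = {}
    for sms in messages:
        facility_id, doses = parse_report(sms)
        latest[facility_id] = doses  # later messages overwrite earlier ones
    if not latest:
        return 0.0
    stocked_out = sum(1 for doses in latest.values() if doses == 0)
    return stocked_out / len(latest)
```

    For example, four reports in which two facilities have zero doses give a stock-out rate of 0.5. Rejecting malformed messages early, as `parse_report` does, matters when data arrive as free-text SMS.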

    7.2 Infrastructure:

    A robust physical infrastructure is key to the operation and scalability of Big Data projects. It is based on a distributed model, in which data can be physically stored in different places and integrated through networks. The fundamental condition for taking advantage of this capacity is the quality of telecommunications, which offer a gateway to Big Data.

    Large Internet companies such as Google, Microsoft, Yahoo, and Amazon use this architecture, offering their services from data centers distributed throughout the world. All these infrastructure changes involve substantial costs, generating economies of scale that favor large Internet companies [32], which take advantage of these barriers to provide infrastructure as a service (IaaS) to organizations that cannot afford such infrastructure themselves [33].

    Apart from the hardware infrastructure, an additional component is required: the software used to implement Big Data. The production, adoption, and adaptation of this software are key ingredients for Big Data and require a properly trained workforce [30].

    Many developing countries lack the storage and communications infrastructure needed to organize and integrate the amount of information generated in Big Data projects. Not only do these countries lack these resources, but they also lack the computing capacity to analyze the data. The vast majority of the necessary hardware resides in developed countries, and access to information and resources is skewed by a very unequal distribution of the telecommunication capabilities needed to reach them [30].

    Regarding software for organizing, integrating, and analyzing data, production is limited by the lack of a trained workforce, and purchasing or licensing the necessary systems is often not an option for developing countries. However, there are open source options with strong communities that provide the necessary functionality for free. The most outstanding example is Apache Hadoop [42], a platform for processing large amounts of data distributed across computer clusters, used by companies such as Yahoo and Facebook.
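    The map-and-reduce programming model that Hadoop implements can be illustrated in plain Python. This is a single-process sketch of the model only, not Hadoop's API: each chunk stands in for a block of records stored on a different node, the map step emits (district, 1) pairs for hypothetical case records, the shuffle step groups the pairs by key, and the reduce step sums each group. In Hadoop, the map and reduce tasks would run in parallel across the cluster.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (district, 1) pair for every case record in one chunk of data."""
    return [(record["district"], 1) for record in chunk]


def shuffle_phase(mapped):
    """Group the emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key to get case counts per district."""
    return {key: sum(values) for key, values in groups.items()}


def run_job(chunks):
    # Each chunk stands in for a block of data held on a different node.
    return reduce_phase(shuffle_phase(map_phase(chunk) for chunk in chunks))
```

    Because each map call only sees its own chunk and each reduce call only sees one key's values, both steps can be distributed; this is what lets clusters of commodity machines replace a single large server.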

    7.4 Organizational Changes: Workforce

    According to Villars et al., Big Data deployments require new IT administration and application developer skill sets. Additionally, people who possess these skills are scarce given the high market demand. Hal Varian, Google's chief economist, contends that statistician will be the most in-demand job of the next decade.

    To take advantage of the opportunities created by Big Data, trained professionals are needed who can manage and analyze data and who have knowledge of computer science, statistics, and mathematics. Some developing countries are better positioned in this regard, including Brazil, Russia, India, and China (the BRIC countries). In 2008, 40% of such specialists were trained in these countries [30].


    Just as the Internet and technological advances allow infrastructure to be outsourced, they also make it possible to recruit the human resources needed for a Big Data project over the web. For example, the Kaggle platform allows any organization to offer a prize, and specialists from around the world can compete to solve Big Data problems [45]. Ultimately, this approach depends on the economic resources that can be offered. One important example of a nonprofit organization is DataKind, a group of data scientists that works with high-impact social organizations to improve their decision-making processes [46].

    7.5 Integration and Interoperability

    One of the greatest challenges Big Data faces is to integrate data from many different sources.

    The use of standards to achieve interoperability between systems is a core requirement to

    effectively integrate information [47].

    The main difficulty in achieving interoperability among multiple Big Data repositories lies in the differences between the metadata used by each repository. Without metadata standards, integrating the data generated in Big Data projects will be even more challenging [48].
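    One common way to reconcile differing metadata is to map each repository's field names onto a shared schema before integration. The sketch below assumes two hypothetical repositories with hypothetical field names; in practice, the target schema would come from an agreed standard rather than being defined ad hoc as it is here.

```python
# Hypothetical mappings from each repository's local field names to a shared schema.
FIELD_MAPPINGS = {
    "repo_a": {"pt_id": "patient_id", "dob": "birth_date", "dx_code": "diagnosis_code"},
    "repo_b": {"PatientID": "patient_id", "DateOfBirth": "birth_date", "ICD10": "diagnosis_code"},
}


def harmonize(record, source):
    """Rename a record's fields to the shared schema; fail loudly on unmapped fields."""
    mapping = FIELD_MAPPINGS[source]
    unmapped = sorted(set(record) - set(mapping))
    if unmapped:
        raise KeyError(f"{source}: no mapping for fields {unmapped}")
    return {mapping[field]: value for field, value in record.items()}
```

    Raising an error on unmapped fields, rather than silently dropping them, surfaces metadata gaps that would otherwise corrupt the integrated data set.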

    Health information systems are often fragmented and isolated in information silos, hindering analysis and improvements in healthcare delivery [49]. This problem requires a political rather than a technological solution. In most cases, the standards required for systems to interoperate already exist, and they are the same in developing countries as in developed countries [50]. Consensus is needed among government organizations, businesses, and other stakeholders to advance the development of digital agendas.

    Developed countries have made progress in spreading digital agendas over the last decade and are now better positioned than developing countries, although this gap has recently been narrowing. According to the World Health Organization (WHO), since 2008 more than 20 developing countries have been in the process of implementing strategic plans for eHealth [51].

    The WHO and the International Telecommunication Union (ITU) published the National eHealth Strategy Toolkit [52] to help countries develop a national eHealth vision and action plan. These resources are especially useful for governments in developing countries.

    7.6 Privacy and Security

    Some characteristics of Big Data, such as the relative lack of structure and the informal nature of some data, can be a problem when the data are sensitive and raise potential privacy, safety, or legal issues. Traditional database management systems support granular security policies that protect data at various levels, but the software used for Big Data usually lacks these safeguards [15].

    Another important challenge concerns security infrastructure and privacy policies. It is crucial to apply not only legal but also ethical considerations to data security as early as possible. Strategies to report how data are collected, how they are protected, and how they will be used should be developed and recognized as a necessity [53]. Likewise, an action plan should be prepared for possible data losses or security breaches. Sharing information in a clear and careful way will help reduce concerns related to security and privacy [54].

    It is essential to ensure the privacy and confidentiality of personal data, especially with regard to

    the use of Big Data in healthcare. These factors should be considered part of the structure of a

    Big Data project from the beginning.

    Whatever the data, when they relate to humans, security concerns will inevitably arise. If the goal is to share data, those who provide them must be able to trust those who assume responsibility for caring for their information [57, 58]. This trust can only be achieved with an appropriate regulatory framework.
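    One technique often used to protect direct identifiers when health records are shared for analysis is pseudonymization with a keyed hash. The sketch below replaces a patient identifier with an HMAC-SHA256 digest and drops other direct identifiers; the field names and the set of fields to drop are hypothetical. Records from the same patient still link to one another, but without the secret key (which must be stored separately from the data) the original identifiers cannot be recovered by hashing guesses. Pseudonymized records can still be re-identifiable through the remaining fields, so this complements, rather than replaces, the policies and regulatory framework discussed in this section.

```python
import hashlib
import hmac

# Hypothetical field names: which fields are direct identifiers depends on the dataset.
ID_FIELD = "patient_id"
DROP_FIELDS = {"name", "phone", "address"}


def pseudonymize(identifier, secret_key):
    """Keyed hash (HMAC-SHA256) of an identifier; stable for a given key."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record, secret_key):
    """Replace the patient identifier with a pseudonym and drop other direct identifiers."""
    out = {}
    for field, value in record.items():
        if field == ID_FIELD:
            out["pseudo_id"] = pseudonymize(str(value), secret_key)
        elif field not in DROP_FIELDS:
            out[field] = value
    return out
```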

    7.7 Adoption

    Data should be managed as a strategic asset within organizations. Existing barriers to the adoption of Big Data are usually cultural: many organizations do not implement Big Data programs because they do not appreciate how data analysis can enhance their business [15].


    Defining objectives and expected outcomes is critical to establishing governance capable of sustaining projects of this magnitude. A Big Data program should include the necessary people, processes, and policies [59].

    The difficulties reviewed above (economic issues, poor infrastructure, and lack of trained personnel) are common to most developing countries and create a gap in Big Data adoption relative to developed countries that is equivalent to the digital divide [30].

    Some ways to accelerate the adoption of Big Data techniques in developing countries like India are simple, for example sharing experiences and lessons learned [36]. Developing countries now have more access to scientific information thanks to increased Internet penetration; the Open Access movement, which provides free access to articles in prestigious journals; and new tools for searching the scientific literature, such as Google Scholar. A recent paper shows that Google Scholar provides greater access to free full-text articles than PubMed [60].

    1. Trend: Fragmented data
    Description: The separation of data among labs, hospital systems, and even clinical components such as financial IT and electronic health records is a key issue in healthcare.
    Attribute: Variety

    2. Trend: Big Data is all about real or near real-time
    Description: Traditional analytics use ETL processes that upload data nightly or weekly to a data warehouse. The Big Data trend is moving toward real or near real-time decision support at the point of care. In traditional analytics, reporting focuses on the past, but with Big Data, it is more predictive.
    Attribute: Velocity, Value

    3. Trend: Data is driving the processes
    Description: Traditionally, processes pulled and pushed data whenever needed. In Big Data, processes access data to derive meaning from datasets, create clinical hypotheses, prevent fraud, reduce the cost of care, reduce clinical errors, and improve outcomes.
    Attribute: Volume, Variety, Velocity

    4. Trend: Scale-up is shifting to scale-out
    Description: Traditionally, scale-up was the active choice. This led to replacing existing infrastructure with bigger servers, larger memory, and more processing power. In Big Data, multiple nodes are leveraged. Systems need not be replaced; rather, they are modernized and leveraged to exchange and use information.
    Attribute: Value

    5. Trend: Software as a Service (SaaS), Infrastructure as a Service (IaaS)
    Description: The exponential growth of data requires significant supporting infrastructure and complex software for healthcare companies to derive insights. Healthcare organizations can adopt new service delivery models such as SaaS and IaaS to fulfill software and infrastructure needs.
    Attribute: Value

    6. Trend: Data privacy concern
    Description: Privacy of Personal Health Information (PHI) and Individually Identifiable Personal Information (IIPI) is key to healthcare companies. Big Data solutions also need to effectively address data security concerns to ensure data privacy.
    Attribute: Value

    Tabular representation of Challenges with Big Data Analytics in Healthcare
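    The shift from nightly or weekly ETL loads toward near real-time monitoring (trend 2 in the table) can be illustrated with a sliding-window counter that always reports how many events arrived in the most recent time window. The class below is a hypothetical sketch that assumes event timestamps, in seconds, arrive in non-decreasing order; a production system would typically rely on a stream-processing platform instead.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events that arrived within the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.timestamps = deque()

    def add(self, timestamp):
        # Assumes timestamps arrive in non-decreasing order.
        self.timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now):
        self._evict(now)
        return len(self.timestamps)

    def _evict(self, now):
        # Drop events outside the half-open window (now - window, now].
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
```

    For example, with a 60-second window and events at 0, 30, and 59 seconds, the count is 3 at second 59 and falls to 1 by second 90, without waiting for a nightly batch job.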


    8. CONCLUSIONS

    Big Data has the potential to link traditional and non-traditional data together to deliver significant insights that can drive improvements in wide-ranging areas of healthcare, from clinical research to care delivery to health policy and planning. Big Data is proving to be a valuable asset in tackling community healthcare issues, reducing the costs associated with emergency care and making care more prevention-focused. In clinical research and care delivery, Big Data can be leveraged as a powerful tool to find solutions for Alzheimer's disease and certain types of cancer, and it can also provide a low-cost approach to personalized medicine. In health policy, planning, and implementation, initiatives such as using cell phone data to track disease origination and spread can yield key insights into where to spend valuable economic resources to control diseases and epidemics. Healthcare organizations need to evaluate their Big Data needs and potential uses and move toward a data-driven, hypothesis-generating approach to advance the frontiers of clinical research. By leveraging Big Data, healthcare organizations can create value-based, outcome-driven, efficient care delivery that benefits all stakeholders.

    Recommendations:

    Data capture: Take advantage of the high penetration rates of mobile phones to collect usage-associated data and sensor data for innovative Big Data projects.

    Infrastructure: Circumvent infrastructure and economic deficits using IaaS and open source software.

    Organizational changes (workforce): Increase the number of trained data scientists. Partner with nonprofit organizations such as DataKind when trained resources are needed.

    Integration and interoperability: Advance the creation and adoption of digital agendas.

    Privacy and security: Institute policies and regulatory frameworks to ensure the privacy and security of sensitive data.

    Adoption: Establish strategic partnerships with private and public institutions with expertise in Big Data tools and techniques.

    9. REFERENCES

    1. Raghupathi W: Data Mining in Healthcare. In Healthcare Informatics: Improving Efficiency and Productivity. Edited by Kudyba S. Taylor & Francis; 2010:211-223.

    2. Burghard C: Big Data and Analytics Key to Accountable Care Success. IDC Health Insights; 2012.

    3. Dembosky A: Data Prescription for Better Healthcare. Financial Times, December 12, 2012, p. 19; 2012. Available from: http://www.ft.com/intl/cms/s/2/55cbca5a-4333-11e2-aa8f-00144feabdc0.html#axzz2W9cuwajK.

    4. Feldman B, Martin EM, Skotnes T: Big Data in Healthcare: Hype and Hope. October 2012. Dr. Bonnie 360; 2012. http://www.west-info.eu/files/big-data-inhealthcare.pdf.

    5. Fernandes L, O'Connor M, Weaver V: Big Data, bigger outcomes. J AHIMA 2012:38-42.

    6. IHTT: Transforming Healthcare through Big Data: Strategies for leveraging Big Data in the healthcare industry; 2013. http://ihealthtran.com/wordpress/2013/03/iht%C2%B2-releases-big-data-research-reportdownload-today/.

    7. Frost & Sullivan: Drowning in Big Data? Reducing Information Technology Complexities and Costs for Healthcare Organizations. http://www.emc.com/collateral/analyst-reports/frost-sullivan-reducing-information-technologycomplexities-ar.pdf.

    8. Bian J, Topaloglu U, Yu F, Yu F: Towards Large-scale Twitter Mining for Drug-related Adverse Events. Maui, Hawaii: SHB; 2012.

    9. Raghupathi W, Raghupathi V: An Overview of Health Analytics. Working paper; 2013.

    10. Ikanow: Data Analytics for Healthcare: Creating Understanding from Big Data.

    http://info.ikanow.com/Portals/163225/docs/data-analytics-for-healthcare.pdf.

    11. jStart: How Big Data Analytics Reduced Medicaid Re-admissions. A jStart Case Study;

    2012. http://www-01.ibm.com/software/ebusiness/jstart/portfolio/uncMedicaidCaseStudy.pdf.

    12. Knowledgent: Big Data and Healthcare Payers; 2013.

    http://knowledgent.com/mediapage/insights/whitepaper/482.

    13. Explorys: Unlocking the Power of Big Data to Improve Healthcare for Everyone. https://www.explorys.com/docs/data-sheets/explorys-overview.pdf.

    14. IBM: IBM Big Data platform for healthcare. Solutions Brief; 2012.

    http://publicdhe.ibm.com/common/ssi/ecm/en/ims14398usen/IMS14398USEN.PDF.

    15. Intel: Leveraging Big Data and Analytics in Healthcare and Life Sciences: Enabling Personalized Medicine for High-Quality Care, Better Outcomes; 2012. http://www.intel.com/content/dam/www/public/us/en/documents/whitepapers/healthcare-everaging-big-data-paper.pdf.

    16. IBM: Data Driven Healthcare Organizations Use Big Data Analytics for Big Gains; 2013. http://www03.ibm.com/industries/ca/en/healthcare/documents/Data_driven_healthcare_organizations_use_big_data_analytics_for_big_gains.pdf.

    17. Savage N: Digging for drug facts. Commun ACM 2012, 55(10):11-13.

    18. Zenger B: Can Big Data Solve Healthcare's Big Problems? HealthByte, February 2012; 2012. http://www.equityhealthcare.com/docstor/EH%20Blog%20on%20Analytics.pdf.

    19. LaValle S, Lesser E, Shockley R, Hopkins MS, Kruschwitz N: Big Data, analytics and the path from insights to value. MIT Sloan Manag Rev 2011, 52:20-32.

    20. Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA) [Internet]. National Science Foundation; 2012. Available at: http://www.nsf.gov/pubs/2012/nsf12499/nsf12499.pdf

    21. MD Anderson Taps IBM Watson to Power Moon Shots Mission [Internet]. MD Anderson Cancer Center. 2013 [cited 2013 Dec 17]. Available at: http://www.mdanderson.org/newsroom/news-releases/2013/ibm-watson-to-power-moon-shots-.html

    22. Okun S, McGraw D, Stang P, Larson E, Goldmann D, Kupersmith J. Making the Case for Continuous Learning from Routinely Collected Data [Internet]. IOM; 2013. Available at: http://www.iom.edu/~/media/Files/Perspectives-Files/2013/Discussion-Papers/VSRT-MakingtheCase.pdf

    23. Davis DA, Chawla NV, Blumm N, Christakis N, Barabási A-L. Predicting individual disease risk based on medical history. Proceedings of the 17th ACM conference on Information and knowledge management. ACM; 2008. p. 769-78.

    24. Davis DA, Chawla NV, Christakis NA, Barabási A-L. Time to CARE: a collaborative engine for practical disease prediction. Data Min Knowl Discov 2010;20(3):388-415.

  • Page 31 of 34

    25. Asangansi I, Braa K. The emergence of mobile-supported national health information

    systems in developing countries. Stud Health Technol Inf 2010;160(Pt 1):5404. [PubMed]

    26. Lewis T, Synowiec C, Lagomarsino G, Schweitzer J. E-health in low- and middle-income

    countries: Findings from the center for health market innovations. Bull World Health

    Organ 2012;90(5):33240.[PMC free article] [PubMed]

    27. Big Data for Development: Challenges & Opportunities [Internet]. UN Global Pulse; 2012.

    Available at:http://www.unglobalpulse.org/sites/default/files/BigDataforDevelopment-

    UNGlobal-PulseJune2012.pdf

    28. Barrington J, Wereko-Brobby O, Ward P, Mwafongo W, Kungulwe S. SMS for Life: a pilot

    project to improve anti-malarial drug supply management in rural Tanzania using standard

    technology. Malar J 2010. Oct 27;9(1):298. [PMC free article] [PubMed]

    29. Novartis Malaria Initiative: SMS for Life [Internet]. [cited 2014 Mar 27]. Available

    at:http://www.malaria.novartis.com/innovation/sms-for-life/

    30. Hilbert M. Big Data for Development: From Information-to Knowledge Societies. Univ

    South Calif - Annenberg Sch Commun [Internet]. 2013; Available

    at: http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2205145

    31. Barroso LA, Hölzle U. The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Synth Lect Comput Archit 2009 Jan;4(1):1–108.

    32. Shapiro C, Varian HR. Information rules: a strategic guide to the network economy. Boston, Mass: Harvard Business School Press; 1999.

    33. Infrastructure as a Service (IaaS) [Internet]. Gartner IT Glossary. [cited 2013 Dec 10]. Available at: http://www.gartner.com/it-glossary/infrastructure-as-a-service-iaas

    34. Latourette MT, Siebert JE, Barto RJ Jr, Marable KL, Muyepa A, Hammond CA, et al. Magnetic resonance imaging research in sub-Saharan Africa: challenges and satellite-based networking implementation. J Digit Imaging 2011;24(4):729–38. [PMC free article] [PubMed]


    35. Shiferaw F, Zolfo M. The role of information communication technology (ICT) towards universal health coverage: the first steps of a telemedicine project in Ethiopia. Glob Health Action 2012;5(1):15. [PMC free article] [PubMed]

    36. Simba DO. Application of ICT in strengthening health information systems in developing countries in the wake of globalisation. Afr Health Sci 2004 Dec;4(3):194–8. [PMC free article] [PubMed]

    37. Gardiner B. Astrophysicist Replaces Supercomputer with a Cluster of Eight PlayStation 3s [Internet]. WIRED. 2007 [cited 2013 Dec 10]. Available at: http://www.wired.com/techbiz/it/news/2007/10/ps3_supercomputer

    38. Zyga L. US Air Force connects 1,760 PlayStation 3s to build supercomputer [Internet]. PhysOrg. 2010 [cited 2013 Dec 10]. Available at: http://phys.org/news/2010-12-air-playstation-3s-super-computer.html

    39. Amazon Web Services [Internet]. Amazon. [cited 2013 Dec 10]. Available at: http://aws.amazon.com/

    40. Google Compute Engine [Internet]. Google Cloud Platform. [cited 2013 Dec 10]. Available at: https://cloud.google.com/products/compute-engine/

    41. Purkayastha S, Braa J. Big Data Analytics for developing countries – Using the Cloud for Operational BI in Health. Electron J Inf Syst Dev Ctries [Internet]. 2013 [cited 2014 Mar 25];59. Available at: https://ejisdc.org/ojs2/index.php/ejisdc/article/view/1220

    42. Apache Hadoop [Internet]. Hadoop. [cited 2013 Dec 10]. Available at: http://hadoop.apache.org/

    43. Lohr S. For Today's Graduate, Just One Word: Statistics. The New York Times [Internet]. 2009 Aug 6 [cited 2013 Dec 10]. Available at: http://www.nytimes.com/2009/08/06/technology/06stats.html?_r=3&

    44. Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, et al. Big Data: The next frontier for innovation, competition, and productivity [Internet]. McKinsey Global Institute; 2011. Available at: http://www.mckinsey.com/insights/business_technology/big_data_the_next_frontier_for_innovation

    45. Competitions | Kaggle [Internet]. [cited 2014 Mar 27]. Available at: https://www.kaggle.com/solutions/competitions

    46. DataKind | DataKind [Internet]. [cited 2014 Mar 27]. Available at: http://www.datakind.org/

    47. Hammond WE, Bailey C, Boucher P, Spohr M, Whitaker P. Connecting Information To Improve Health. Health Aff (Millwood) 2010 Feb 1;29(2):284–8. [PubMed]

    48. Searching for standards in Big Data [Internet]. FCW; 2012 [cited 2013 Dec 17]. Available at: http://fcw.com/microsites/2012/snapshot-managing-big-data/05-establishing-big-data-standards.aspx

    49. Glaser J. Interoperability: the key to breaking down information silos in health care. Healthc Financ Manage 2011 Nov;65(11):44–6, 48, 50. [PubMed]

    50. Luna D, García M, Nishioka A, Franco M. OPS - Revisión de estándares de interoperabilidad para la e-salud en Latinoamérica y el Caribe [Review of interoperability standards for e-health in Latin America and the Caribbean]. In press; 2013.

    51. Country health information systems: a review of the current situation and trends [Internet]. Geneva: World Health Organization; 2011 [cited 2013 Nov 1]. Available at: http://www.who.int/healthmetrics/news/chis_report.pdf

    52. National eHealth strategy toolkit [Internet]. World Health Organization and International Telecommunication Union; 2012. Available at: http://www.itu.int/pub/D-STR-E_HEALTH.05-2012/

    53. Committee on the Role of Institutional Review Boards in Health Services Research Data Privacy Protection, Institute of Medicine. Protecting data privacy in health services research [Internet]. National Academies Press; 2000. Available at: http://www.nap.edu/openbook.php?isbn=0309071879

    54. Meslin EM. Shifting Paradigms in Health Services Research Ethics. J Gen Intern Med 2006 Mar;21(3):279–80. [PMC free article] [PubMed]


    55. Summary of the HIPAA Security Rule [Internet]. HHS. [cited 2013 Dec 17]. Available at: http://www.hhs.gov/ocr/privacy/hipaa/understanding/srsummary.html

    56. Summary of the HIPAA Privacy Rule [Internet]. HHS. [cited 2013 Dec 17]. Available at: http://www.hhs.gov/ocr/privacy/hipaa/understanding/summary/index.html

    57. Campbell AV. The Ethical Challenges of Genetic Databases: Safeguarding Altruism and Trust. King's Law J 2007 Jan 1;18(2):227–45.

    58. Chalmers D, Nicol D. Commercialisation of biotechnology: public trust and research. Int J Biotechnol 2004 Jan 1;6(2):116–33.

    59. Michele O, Fernandes L, Weaver V. Big Data, Bigger Outcomes. J AHIMA 2012;83(10):38–43. [PubMed]

    60. Shariff SZ, Bejaimal SA, Sontrop JM, Iansavichus AV, Haynes RB, Weir MA, et al. Retrieving clinical evidence: a comparison of PubMed and Google Scholar for quick clinical searches. J Med Internet Res 2013;15(8):e164. [PMC free article] [PubMed]

    61. Big Data for Development: a primer. Harnessing Big Data for Real-Time Awareness [Internet]. UN Global Pulse; 2013. Available at: http://www.unglobalpulse.org/sites/default/files/Primer%202013_FINAL%20FOR%20PRINT.pdf

    62. Vital Wave Consulting. Big Data, Big Impact: New Possibilities for International Development [Internet]. World Economic Forum; 2012. Available at: http://www3.weforum.org/docs/WEF_TC_MFS_BigDataBigImpact_Briefing_2012.pdf

    63. New Data for Understanding the Human Condition: International Perspectives [Internet]. OECD; 2013. Available at: http://www.oecd.org/sti/scitech/new-data-for-understanding-the-human-condition.pdf