Pekka Neittaanmaki
Dean of the Faculty of Information Technology
Professor, Dept. of Mathematical Information Technology.
University of Jyvaskyla.
Anthony Ogbechie
Service Innovation Management
University of Jyvaskyla.
Contents
1. How does precision medicine become a reality? The Semantic Data Lake for Healthcare makes it possible
2. Medical technologist drives semantic data lake development
3. Montefiore Semantic Data Lake Tackles Predictive Analytics
4. Semantic Big Data Lakes Can Support Better Population Health
5. “Data Lake as a Service” Enables Internet of Things, Precision Medicine
6. Semantic Computing, Predictive Analytics Need Reliable Metadata
7. Partners Data Lake Offers Healthcare Analytics as a Service
8. Semantic Data Lake Delivering Tacit Knowledge - Evidence based Clinical Decision Support
9. Hadoop, Triple Stores, and the Semantic Data Lake
10. Medical Insight Set to Flow from Semantic Data Lakes
11. Semantic graph database underpins healthcare data lake
12. Data Lakes Get Smart with Semantic Graph Models
13. Semantic Data Lakes and the Advance of Medicine
14. Semantic Data Lakes Dives In For Healthcare
15. The Data Lake Concept Is Maturing
16. The Potential of Data Lake Technology in the Healthcare Industry
17. 6 WAYS TO PROTECT MEDICAL DATA LAKES AT THE FACILITY LEVEL
18. Dealing with Big Data: The ascendency of data lakes
19. Making Data Lakes Usable: Why we need a semantic layer – and why it should be open source
20. Implementing Personalized, Precision Medicine with Artificial Intelligence and Semantic Graph Technology
21. Big Data and Healthcare Payers
22. The Bright Future of Semantic Graphs and Big Connected Data
22. Empowering personalized medicine with big data and semantic web technology: Promises, challenges, and use cases
23. How the Search for Smart Data Drives Healthcare IT Investment
24. Data Lake Management: Do You Know the Type of “Fish” You Caught?
1. How does precision medicine become a reality? The Semantic Data Lake for Healthcare makes it possible
One of the prominent problems plaguing the current healthcare system is the narrow scope of
patient data used to facilitate most aspects of care, from initial diagnoses to treatment.
According to Dr. Parsa Mirhaji, director of clinical research informatics at Montefiore Health
System and Albert Einstein College of Medicine, the vast majority of research findings are based
on averages of middle-aged white males: “We don’t really know much about women, other
ethnicities, children, you name it—there’s no evidence,” he says.
The White House launched The Precision Medicine Initiative in 2015 as a means of redressing the
situation and expanding the breadth of patient data to personalize treatment for individuals
and historically underrepresented groups. Achieving that objective requires not only amassing
patient-specific data for wider demographics, but also storing, accessing and analyzing them
with a number of avant-garde data management technologies including:
http://www.kmworld.com/Articles/Editorial/Features/How-does-precision-medicine-become-a-reality-The-Semantic-Data-Lake-for-Healthcare-makes-it-possible-114312.aspx
2. Medical technologist drives semantic data lake development
A pivotal magazine article helped point medical doctor Parsa Mirhaji along a path to a semantic
data lake for healthcare analytics applications, using Hadoop, RDF, graph databases and more.
http://searchdatamanagement.techtarget.com/feature/Medical-technologist-drives-semantic-data-lake-development
3. Montefiore Semantic Data Lake Tackles Predictive Analytics
This new approach to analytics eschews the rigid, limited capabilities of the traditional
relational database and instead focuses on creating a fluid pool of standardized data elements
that can be mixed and matched on the fly to answer a large number of unique queries.
Montefiore Medical Center, in partnership with Franz Inc., is among the first healthcare
organizations to invest in a robust semantic data lake as the foundation for advanced clinical
decision support and predictive analytics capabilities.
https://healthitanalytics.com/news/montefiore-semantic-data-lake-tackles-predictive-analytics
4. Semantic Big Data Lakes Can Support Better Population Health
As healthcare providers navigate the treacherous transitional waters of Stage 2 and try to
predict how future regulations will shape their actions, the need to lay the groundwork for
advanced population health management and accountable care is only becoming clearer.
No matter what the outcome of debates about the future course of the EHR Incentive
Programs, one thing remains abundantly clear for organizations of all shapes and sizes:
advancements in healthcare big data analytics will not be driven solely by rules and mandates,
but by the pressing financial need to collect, corral, understand, and leverage information in
order to refine and expand population health management techniques.
https://healthitanalytics.com/news/semantic-big-data-lakes-can-support-better-population-health
5. “Data Lake as a Service” Enables Internet of Things, Precision Medicine
Can data lake technology simplify the development of the Internet of Things, create a
welcoming environment for precision medicine, and change the way providers approach big
data analytics?
https://healthitanalytics.com/news/data-lake-as-a-service-enables-internet-of-things-precision-medicine
6. Semantic Computing, Predictive Analytics Need Reliable Metadata
Reliable metadata is the key to leveraging semantic computing and predictive analytics for
healthcare applications, such as population health management and crisis care.
https://healthitanalytics.com/news/semantic-computing-predictive-analytics-need-reliable-metadata
7. Partners Data Lake Offers Healthcare Analytics as a Service
Big data analytics is a key component of the complex transition, and Partners already has a
history of success with innovative clinical decision support tools like QPID.
In order to add to their analytics toolkit while continuing to attract world-class clinical research
and technical development talent to the Boston area, the health system is pairing its new EHR
with the Integrated Data Environment for Analytics (IDEA) platform.
This tool, developed in conjunction with EMC, is geared towards researchers and
investigators working on everything from precision medicine projects to developing apps for
clinical decision support and patient care.
https://healthitanalytics.com/news/partners-data-lake-offers-healthcare-analytics-as-a-service
8. Semantic Data Lake Delivering Tacit Knowledge - Evidence based Clinical Decision Support
Can the complexity be removed and tacit knowledge delivered from the plethora of
medical information available in the world?
“Let Doctors be Doctors”
The Semantic Data Lake becomes the Book of Knowledge, ascertained by correlation and causation, resulting in Weighted Evidence.
https://www.linkedin.com/pulse/semantic-data-lake-delivering-tacit-knowledge-evidence-boray
9. Hadoop, Triple Stores, and the Semantic Data Lake
Hadoop-based data lakes are springing up all over the place as organizations seek low-cost
repositories for storing huge mounds of semi-structured data. But when it comes to analyzing
that data, some organizations are finding the going tougher than expected. One solution to the
dilemma may be found in Hadoop-resident graph databases and the notion of the semantic
data lake.
Despite their growing popularity, data lakes have taken a bit of heat lately as analyst firms
like Gartner call into question their long-term viability. Without a way to organize the
schemaless data that people are shunting into Hadoop en masse, the data lakes risk becoming
convoluted quagmires, where data goes in and nothing useful comes out.
https://www.datanami.com/2015/05/26/hadoop-triple-stores-and-the-semantic-data-lake/
10. Medical Insight Set to Flow from Semantic Data Lakes
The potential for data analytics to disrupt healthcare delivery is large, and getting larger by the
day. But in many cases, the need to hammer data into a structured format creates a barrier to
productivity. Now a hospital chain in New York City is hoping to change that by adopting a
Hadoop-based semantic data lake.
Located in the Bronx, Montefiore Health System is the first hospital to implement a semantic
data lake as part of the New York City Clinical Data Research Network (NYC-CDRN), an
association of seven hospitals in the NYC area that are sharing data. As the pioneer, Montefiore
is working with several technology providers, including Intel and Franz, to test a big data system
capable of delivering precision medicine.
Most Datanami readers are familiar with the term “data lake,” but a “semantic data lake”
provides an interesting twist on the familiar concept. According to Franz CEO Jans Aasman, a
semantic data lake employs a combination of technologies, including Hadoop, graph analytics, a
semantic “triple store,” the SPARQL query language, and Spark-based machine learning, to
allow doctors to connect the dots between patient conditions and a world of knowledge
contained in structured internal systems, as well as unstructured data sources outside of the
organization.
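The idea of "connecting the dots" across a triple store can be sketched in a few lines of Python. This is an illustration only, not Franz's AllegroGraph API and not SPARQL itself: the `match` and `linked_knowledge` functions, the predicate names, and every identifier in `TRIPLES` are hypothetical, and a plain list stands in for the store.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical facts: the first two stand for an internal record system,
# the last two for knowledge drawn from outside the organization.
TRIPLES: List[Triple] = [
    ("patient:1", "hasCondition", "condition:asthma"),
    ("patient:2", "hasCondition", "condition:diabetes"),
    ("condition:asthma", "aggravatedBy", "exposure:ozone"),
    ("condition:diabetes", "treatedWith", "drug:metformin"),
]


def match(s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield every triple that agrees with the given terms; None is a wildcard."""
    for triple in TRIPLES:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


def linked_knowledge(patient: str) -> List[Triple]:
    """Follow patient -> condition -> any fact stated about that condition."""
    found: List[Triple] = []
    for _, _, condition in match(s=patient, p="hasCondition"):
        found.extend(match(s=condition))
    return found
```

Calling `linked_knowledge("patient:1")` walks from the patient record to the condition and then to whatever else is known about that condition, which is roughly the two-hop join a SPARQL query over a real triple store would express.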
https://www.datanami.com/2015/08/26/medical-insight-set-to-flow-from-semantic-data-lakes/
11. Semantic graph database underpins healthcare data lake
http://searchhealthit.techtarget.com/feature/Semantic-graph-database-underpins-healthcare-data-lake
12. Data Lakes Get Smart with Semantic Graph Models
Been swimming in a data lake recently? Perhaps not, as many companies still are just dipping
their toes into these waters, as they become more familiar with the general idea of a data lake.
As research firm Gartner describes it, a data lake is:
“a collection of storage instances of various data assets additional to the originating data
sources…[and whose purpose is to] present an unrefined view of data to only the most highly
skilled analysts, to help them explore their data refinement and analysis techniques
independent of any of the system-of-record compromises that may exist in a traditional analytic
data store (such as a data mart or data warehouse).”
Even as enterprises consider the returns they may expect from diving deeper into data lakes,
they’re now being exposed to another twist on the concept. Enter the smart data lake, also
known as the semantic data lake. At DATAVERSITY’s® Smart Data Conference in San Jose in August,
the issue came up in sessions including presentations by Cambridge Semantics CTO Sean Martin
and Franz CEO Jans Aasman. Franz, in fact, notes that it has copyrighted the term Semantic Data
Lake, and points out that Gartner also has explained that data lakes need semantics in order to
be usable by a broad set of users.
http://www.dataversity.net/data-lakes-get-smart-with-semantic-graph-models/
13. Semantic Data Lakes and the Advance of Medicine
This article charts the long path of Dr. Parsa Mirhaji and his work to bring Semantic Data Lakes
and healthcare analytics together: “Mirhaji saw something in the story [from 2001] that spoke
to a practical problem he had encountered in searching for and saving information on medical
advances, and it set the tone for research and analytics work he continues to do now.”
“I was being challenged to do analytics on heterogeneous data sets from across the Web,” he
said. “I started to learn about artificial intelligence and software agents and how they could
‘grab concepts’ from the Web or databases.”
It continues with a much deeper discussion of how Dr. Mirhaji began to integrate his ideas into
real-world practice: “Mirhaji said the data lake system is still in training mode, being tested out
on some specific analytics tasks. It takes in all sorts of genetic, population, health and wellness
data; that includes data from the U.S. Census, clinical trials and patient records – for example,
heart rate, temperature and blood pressure measurements collected by patient monitoring
devices.”
http://www.dataversity.net/semantic-data-lakes-and-the-advance-of-medicine/
14. Semantic Data Lakes Dives In For Healthcare
The healthcare industry has had a hard time using data analytics to save time and money while
improving patient outcomes. Many health systems find it difficult to capture and use the data
from their patients to make a real impact on their businesses, especially considering the enormous
amount of redundancies and unanswered questions when it comes to dealing with healthcare.
“Making sense out of big data is a challenge, particularly in the healthcare industry where
information comes from a variety of sources and in different forms including structured,
unstructured, images, temporal, geo-location and signal data,” says Jans Aasman, PhD, CEO of
Franz, Inc., specializing in semantic web technologies.
The solution may have finally appeared in the form of Semantic Data Lakes (SDLs), says
Aasman. Developed in collaboration between Montefiore Medical Center in the Bronx, New York,
Franz, Inc., Intel, Cloudera, and Cisco, the SDL is a system that allows you to get all the data
together from different silos for analytics. It will then be used to transform statistical databases,
such as spreadsheets, into interactive graph databases that can be used to make better informed and
predictive healthcare decisions.
http://bigcommunity.net/big_news/semantic-data-lakes-dives-in-for-healthcare/
15. The Data Lake Concept Is Maturing
https://cacm.acm.org/news/200095-the-data-lake-concept-is-maturing/fulltext
16. The Potential of Data Lake Technology in the Healthcare Industry
Data lake technology has been named the future of big data. Unlike data warehousing
techniques, data lakes offer a different type of data management better suited to handling and
combining various forms of data in a way which doesn’t confuse systems. But how exactly do
data lakes work? What are their benefits in the healthcare system?
Data lakes embrace a semantic approach to data storage, retaining enormous amounts of raw
information (whether structured or unstructured) in its original state within a single centralised
location. This technology, often referred to as graph databases, enables analysts to select which
data to use when necessary, giving them the opportunity to reuse it when required. Analysts
also have the option of combining seemingly incompatible sources of information in ways
previously thought impossible with data warehousing.
http://www.protogen.com.au/blog/the-potential-of-data-lake-technology-in-the-healthcare-industry
17. 6 WAYS TO PROTECT MEDICAL DATA LAKES AT THE FACILITY LEVEL
With the global push towards putting EHRs and the unified medical record in the cloud,
hospitals and clinics are finally getting on board this big data ride. Use of EHRs (Electronic Health
Records) and Patient Portals has been rising steadily over the past three years, with more
and more patients having access to their medical data through Patient Portals and mobile devices.
As providers finally begin to roll out their EHR and Patient Portal programs, both patients and
doctors are enjoying the newfound access to the data they need to make medical decisions,
and it’s literally at their fingertips. This has given the patients a new sense of ownership of their
medical data and perhaps a feeling of control over their destiny – especially in an environment
where the ACA has left them feeling somewhat limited. In today’s connected society, medical data
of all kinds are being gathered and gleaned from a myriad of sources; from EHRs to mobile
device apps to telemetry at the office level and to test results of all kinds. Medical data lakes,
collections of unstructured medical data, are growing and thriving; but what is the risk of this
access? What are the purposes of data lakes and how can we protect them?
http://dermatologyrecruiters.com/6-ways-protect-medical-data-lakes/
18. Dealing with Big Data: The ascendency of data lakes
The data lake concept occupies a central place of prominence in contemporary big data
initiatives. The past two years have unveiled numerous headlines, vendor solutions (including
repackaging of former solutions) and enterprise use cases for the utility of this centralized
approach for accumulating, analyzing and actuating big data.
The fervor for this method of managing big data is based on a simple premise that promises
value for organizations regardless of size or vertical industry. Data lakes provide a singular
repository for storing all data – unstructured, semi-structured and structured – in their native
formats, granting access and insight to all without lengthy IT preparation.
Moreover, the data lake movement is largely spurred by adoption rates for Hadoop. As
Hadoop’s presence increases, its function as an integration hub for all data delivers more
credence and traction to the notion of data lakes. The data lake concept may be relatively new,
but the association of Hadoop and big data is nearly as ubiquitous as big data itself.
The combination of these two factors, Hadoop’s deployment as a data lake and the storage and
access benefits this method produces, is largely responsible for the widespread attention data
lakes have garnered. A recent post from Gartner reveals that data lake interest is “becoming
quite widespread.” Forbes indicates that “one phrase in particular has become popular for the
massing of data into Hadoop, the ‘Data Lake.’”
http://analytics-magazine.org/dealing-with-big-data-the-ascendency-of-data-lakes/
19. Making Data Lakes Usable: Why we need a semantic layer – and why it should be open source
Big Data platforms, like Cloudera’s Data Hub, Hortonworks’ Data
Platform, MapR’s Converged Data Platform, and others from IBM, Oracle, and
Pivotal, promise to easily bring diverse sets of application data together into one data cluster
running on Hadoop. This collection of data sets is called a data lake.
It is a wonderful thing to be able to bring data together so easily with Hadoop’s schema-on-read
ability rather than the schema-on-write required by traditional database systems. But we need
to remember that along with the data comes all the data’s associated data problems. Bringing
data into the data lake does not, unfortunately, wash away all the problems of non-standard,
non-integrated, redundant, and inconsistent data that are buried in application data.
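A minimal Python sketch can make the schema-on-read point concrete. It is an illustration under stated assumptions, not how Hadoop or any named platform works internally: the records in `RAW_LINES`, the field names, and the `read_with_schema` helper are all hypothetical.

```python
import json

# Raw records land in the lake exactly as the source applications wrote them.
# Hypothetical data: two applications name and format the same facts differently,
# and one record has been loaded twice.
RAW_LINES = [
    '{"patient_id": "17", "weight_kg": 70.5}',
    '{"PatientID": 17, "weight_lb": 155.4}',
    '{"patient_id": "17", "weight_kg": 70.5}',
]


def read_with_schema(lines):
    """Apply a schema only at read time: pick out patient_id and weight_kg."""
    rows = []
    for line in lines:
        record = json.loads(line)
        # Fields the reader's schema does not know about are silently missed.
        rows.append((record.get("patient_id"), record.get("weight_kg")))
    return rows


rows = read_with_schema(RAW_LINES)
```

Nothing was rejected on the way in, so loading was easy. At read time, though, the second record comes back as `(None, None)` because its field names are non-standard, and the first and third rows are duplicates. Those are the problems that ingestion did not wash away.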
For data lakes, with great ease of data access comes a great need for data management. But
this is not what I want to talk about today (perhaps I’ll do so in a later blog post). I want to talk
about how we need to make it easy for users to access data in a data lake when there is
non-integrated and redundant data.
http://rcgglobalservices.com/blog/making-data-lakes-usable-why-we-need-a-semantic-layer-and-why-it-should-be-open-source/
20. Implementing Personalized, Precision Medicine with Artificial Intelligence and Semantic Graph Technology
Personalizing healthcare services for individuals creates several demands on data-driven
functions in the medical field. Healthcare organizations are tasked with integrating structured,
unstructured and semi-structured data, storing and cataloging them in relevant ways across use
cases and locations, and leveraging emerging AI techniques for predictive capabilities which
could potentially save lives.
Most of all, this process must occur in time to make a difference for patients.
According to Montefiore Health System Senior Vice President and Chief Medical Officer
Andrew Racine, who spoke at a recent event for the unveiling of Intel’s Xeon Scalable
Processors, all of these measures must be implemented so providers can: “use information in
real time to make clinical decisions that are going to allow us to intervene with patients and
prevent them from having adverse outcomes.”
Montefiore is currently engaged in such an undertaking with a Semantic Data Lake for
Healthcare (SDL). The SDL is powered by Franz’s AllegroGraph, architected by Intel, and fortified
by Cloudera’s Hadoop distribution. By merging a unique set of data management techniques
with some of the most pertinent technologies across the data landscape, Montefiore is seeking
to tailor its medical treatment and diagnoses for individual patients.
https://analyticsweek.com/content/implementing-personalized-precision-medicine-artificial-intelligence-semantic-graph-technology/
21. Big Data and Healthcare Payers
With the implementation of the Affordable Care Act, the advent of Healthcare Information
Exchanges (HIE), the introduction of new provider models, such as Accountable Care
Organizations (ACO), and the transition to a more member-centric relationship model,
Healthcare Payers face seismic changes in their business models. As with many large-scale
business transformations, there are challenges to navigate as well as opportunities to realize
around improving patient outcomes, reducing cost, and increasing revenue. Capitalizing on
these opportunities will depend on an organization’s capability to leverage information. The
ability to capture, integrate, and interrogate large information sets will be foundational in
realizing objectives, such as:
https://knowledgent.com/whitepaper/big-data-and-healthcare-payers/
22. The Bright Future of Semantic Graphs and Big Connected Data
The big data revolution is generating a mess of unruly data that’s difficult to parse and
understand. This is to be expected–explosions don’t generally occur in a nice, orderly fashion,
after all. But if the folks at Cloudera and Franz have their way, the world of connected data will
become more accessible and useful when viewed through the lens of semantic graph
technologies.
Semantic graph technology is shaping up to play a key role in how organizations access the
growing stores of public data. This is particularly true in the healthcare space, where
organizations are beginning to store their data using so-called triple stores, often defined by
the Resource Description Framework (RDF), which is a model for storing metadata created by
the World Wide Web Consortium (W3C).
https://www.datanami.com/2016/02/08/the-bright-future-of-semantic-graphs-and-big-connected-data/
22. Empowering personalized medicine with big data and semantic web technology: Promises, challenges, and use cases
In healthcare, big data tools and technologies have the potential to create significant value by
improving outcomes while lowering costs for each individual patient. Diagnostic images, genetic
test results and biometric information are increasingly generated and stored in electronic
health records presenting us with challenges in data that is by nature high volume, variety and
velocity, thereby necessitating novel ways to store, manage and process big data. This presents
an urgent need to develop new, scalable and expandable big data infrastructure and analytical
methods that can enable healthcare providers to access knowledge for the individual patient,
yielding better decisions and outcomes. In this paper, we briefly discuss the nature of big data
and the role of semantic web and data analysis for generating “smart data” which offer
actionable information that supports better decisions for personalized medicine. In our view, the
biggest challenge is to create a system that makes big data robust and smart for healthcare
providers and patients that can lead to more effective clinical decision-making, improved health
outcomes, and ultimately, managing the healthcare costs. We highlight some of the challenges
in using big data and propose the need for a semantic data-driven environment to address
them. We illustrate our vision with practical use cases, and discuss a path for empowering
personalized medicine using big data and semantic web technology.
http://ieeexplore.ieee.org/abstract/document/7004307/?reload=true
23. How the Search for Smart Data Drives Healthcare IT Investment
There’s no question that the healthcare industry has become extremely “data rich” over the
past few years. Thanks to the rapid pace of electronic health record adoption, the vast majority
of healthcare organizations are now sitting on an enormous nest egg of big data, including
petabytes of clinical, administrative, demographic, and even genomic data on thousands or
millions of their patients.
But having data and knowing how to use it are two very different things. Despite the keen
interest in adopting a growing collection of big data analytics, predictive analytics, and risk
stratification tools, few organizations have really cracked the secret of how to turn a wealth of
fresh, unfiltered data into the spendable coin of actionable information.
Some fall into the trap of buying new products to solve each individual problem as it arises, not
realizing that they are creating a patchwork of competing technologies, or developing ad hoc
workarounds and an endless array of user interfaces that produce more headaches than they
cure.
http://www.distilnfo.com/provider/2016/08/08/search-smart-data-drives-healthcare-investment/
24. Data Lake Management: Do You Know the Type of “Fish” You Caught?
“Different types of fish live in a community and when you understand their
relationship to each other, you have a better chance of catching what you
want.” thompsonadvertisinginc.com
You navigated your way to the lake, read up on the fundamentals of fish
management, and were introduced to the data lake management principles. As you
drove up to the lake, you were thinking about what type of fish you are looking to
catch: are you looking for a trophy fish or a fish to eat? Without knowing the type
of fish you’re going to catch, you don’t know how far off-shore you have to boat for a catch.
The primary charter for any data lake initiative is the ability to catalog all the data,
enterprise-wide regardless of form (variety) or where it’s stored, whether on Hadoop,
NoSQL or an enterprise data warehouse, along with the associated business,
technical, and operational metadata. To carry on with our analogy, cataloging fish
into off-shore, near-shore or bottom fish can determine the type of fish you catch and
how far out you go fishing.
The catalog must enable business analysts, data architects, and data stewards to
easily search and discover data assets, data set patterns, data domains, data lineage
and understand the relationships between data assets – a 360-degree view of the
data. A catalog provides advanced discovery capabilities, smart tagging, data set
recommendations, metadata versioning, a comprehensive business glossary, and drill
down to finer grained metadata.
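As a rough sketch of what such a catalog holds, the Python below models entries with business, technical, and lineage metadata, plus search and upstream lineage lookup. It is not Informatica's product or API: the `CatalogEntry` and `Catalog` classes, their fields, and the three example data sets are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class CatalogEntry:
    name: str
    store: str                                              # e.g. "Hadoop", "NoSQL", "warehouse"
    description: str = ""                                   # business metadata
    columns: List[str] = field(default_factory=list)        # technical metadata
    tags: Set[str] = field(default_factory=set)             # tagging
    derived_from: List[str] = field(default_factory=list)   # lineage (direct parents)


class Catalog:
    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, term: str) -> List[str]:
        """Return names of entries whose name, description, or tags mention the term."""
        term = term.lower()
        return sorted(
            name for name, e in self._entries.items()
            if term in name.lower()
            or term in e.description.lower()
            or any(term in tag.lower() for tag in e.tags)
        )

    def lineage(self, name: str) -> List[str]:
        """Return all upstream data sets of an entry, nearest first."""
        seen: List[str] = []
        queue = list(self._entries[name].derived_from)
        while queue:
            parent = queue.pop(0)
            if parent not in seen:
                seen.append(parent)
                if parent in self._entries:
                    queue.extend(self._entries[parent].derived_from)
        return seen


# Hypothetical data sets spread across different stores.
catalog = Catalog()
catalog.register(CatalogEntry(
    name="raw_admissions", store="Hadoop",
    description="Admission events as exported",
    columns=["patient_id", "admitted_at"], tags={"clinical"}))
catalog.register(CatalogEntry(
    name="clean_admissions", store="warehouse",
    description="Deduplicated admissions",
    columns=["patient_id", "admitted_at"], tags={"clinical", "curated"},
    derived_from=["raw_admissions"]))
catalog.register(CatalogEntry(
    name="readmission_report", store="warehouse",
    description="30-day readmission counts",
    columns=["month", "count"], tags={"reporting"},
    derived_from=["clean_admissions"]))
```

With these entries, `catalog.search("curated")` finds the cleaned data set by its tag, and `catalog.lineage("readmission_report")` traces the report back through `clean_admissions` to `raw_admissions`, regardless of which store each one lives in.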
https://blogs.informatica.com/2016/11/01/data-lake-management-know-type-fish-caught/#fbid=L81Agt5pe2b