Pekka Neittaanmaki Dean of the Faculty of Information Technology Professor, Dept. of Mathematical Information Technology. University of Jyvaskyla. Anthony Ogbechie Service Innovation Management University of Jyvaskyla.


Pekka Neittaanmaki

Dean of the Faculty of Information Technology

Professor, Dept. of Mathematical Information Technology.

University of Jyvaskyla.

Anthony Ogbechie

Service Innovation Management

University of Jyvaskyla.

Contents

1. How does precision medicine become a reality? The Semantic Data Lake for Healthcare makes it possible

2. Medical technologist drives semantic data lake development

3. Montefiore Semantic Data Lake Tackles Predictive Analytics

4. Semantic Big Data Lakes Can Support Better Population Health

5. “Data Lake as a Service” Enables Internet of Things, Precision Medicine

6. Semantic Computing, Predictive Analytics Need Reliable Metadata

7. Partners Data Lake Offers Healthcare Analytics as a Service

8. Semantic Data Lake Delivering Tacit Knowledge - Evidence based Clinical Decision Support

9. Hadoop, Triple Stores, and the Semantic Data Lake

10. Medical Insight Set to Flow from Semantic Data Lakes

11. Semantic graph database underpins healthcare data lake

12. Data Lakes Get Smart with Semantic Graph Models

13. Semantic Data Lakes and the Advance of Medicine

14. Semantic Data Lakes Dives In For Healthcare

15. The Data Lake Concept Is Maturing

16. The Potential of Data Lake Technology in the Healthcare Industry

17. 6 WAYS TO PROTECT MEDICAL DATA LAKES AT THE FACILITY LEVEL

18. Dealing with Big Data: The ascendency of data lakes

19. Making Data Lakes Usable: Why we need a semantic layer – and why it should be open source

20. Implementing Personalized, Precision Medicine with Artificial Intelligence and Semantic Graph Technology

21. Big Data and Healthcare Payers

22. The Bright Future of Semantic Graphs and Big Connected Data

22. Empowering personalized medicine with big data and semantic web technology: Promises, challenges, and use cases

23. How the Search for Smart Data Drives Healthcare IT Investment

24. Data Lake Management: Do You Know the Type of “Fish” You Caught?

1. How does precision medicine become a reality? The Semantic Data Lake for Healthcare makes it possible

One of the prominent problems plaguing the current healthcare system is the narrow scope of

patient data used to facilitate most aspects of care, from initial diagnoses to treatment.

According to Dr. Parsa Mirhaji, director of clinical research informatics at Montefiore Health

System and Albert Einstein College of Medicine, the vast majority of research findings are based

on averages of middle-aged white males: “We don’t really know much about women, other

ethnicities, children, you name it—there’s no evidence,” he says.

The White House launched The Precision Medicine Initiative in 2015 as a means of redressing the

situation and expanding the breadth of patient data to personalize treatment for individuals

and historically underrepresented groups. Achieving that objective requires not only amassing

patient-specific data for wider demographics, but also storing, accessing and analyzing them

with a number of avant-garde data management technologies including:

http://www.kmworld.com/Articles/Editorial/Features/How-does-precision-medicine-become-a-reality-The-Semantic-Data-Lake-for-Healthcare-makes-it-possible-114312.aspx

2. Medical technologist drives semantic data lake development

A pivotal magazine article helped point medical doctor Parsa Mirhaji along a path to a semantic

data lake for healthcare analytics applications, using Hadoop, RDF, graph databases and more.

http://searchdatamanagement.techtarget.com/feature/Medical-technologist-drives-semantic-data-lake-development

3. Montefiore Semantic Data Lake Tackles Predictive Analytics

This new approach to analytics eschews the rigid, limited capabilities of the traditional

relational database and instead focuses on creating a fluid pool of standardized data elements

that can be mixed and matched on the fly to answer a large number of unique queries.

Montefiore Medical Center, in partnership with Franz Inc., is among the first healthcare

organizations to invest in a robust semantic data lake as the foundation for advanced clinical

decision support and predictive analytics capabilities.

https://healthitanalytics.com/news/montefiore-semantic-data-lake-tackles-predictive-analytics

4. Semantic Big Data Lakes Can Support Better Population Health

As healthcare providers navigate the treacherous transitional waters of Stage 2 and try to

predict how future regulations will shape their actions, the need to lay the groundwork for

advanced population health management and accountable care is only becoming clearer.

No matter what the outcome of debates about the future course of the EHR Incentive

Programs, one thing remains abundantly clear for organizations of all shapes and sizes:

advancements in healthcare big data analytics will not be driven solely by rules and mandates,

but by the pressing financial need to collect, corral, understand, and leverage information in

order to refine and expand population health management techniques.

https://healthitanalytics.com/news/semantic-big-data-lakes-can-support-better-population-health

5. “Data Lake as a Service” Enables Internet of Things, Precision Medicine

Can data lake technology simplify the development of the Internet of Things, create a

welcoming environment for precision medicine, and change the way providers approach big

data analytics?

https://healthitanalytics.com/news/data-lake-as-a-service-enables-internet-of-things-precision-medicine

6. Semantic Computing, Predictive Analytics Need Reliable Metadata

Reliable metadata is the key to leveraging semantic computing and predictive analytics for

healthcare applications, such as population health management and crisis care.

https://healthitanalytics.com/news/semantic-computing-predictive-analytics-need-reliable-metadata

7. Partners Data Lake Offers Healthcare Analytics as a Service

Big data analytics is a key component of the complex transition, and Partners already has a

history of success with innovative clinical decision support tools like QPID.

In order to add to their analytics toolkit while continuing to attract world-class clinical research

and technical development talent to the Boston area, the health system is pairing its new EHR

with the Integrated Data Environment for Analytics (IDEA) platform.

This tool, developed in conjunction with EMC, is geared towards researchers and

investigators working on everything from precision medicine projects to developing apps for

clinical decision support and patient care.

https://healthitanalytics.com/news/partners-data-lake-offers-healthcare-analytics-as-a-service

8. Semantic Data Lake Delivering Tacit Knowledge - Evidence based Clinical Decision Support

Can the complexity be removed and tacit knowledge delivered from the plethora of

medical information available in the world?

“Let Doctors be Doctors”

Semantic Data Lake becomes the Book of Knowledge ascertained by correlation and causation resulting in Weighted Evidence

https://www.linkedin.com/pulse/semantic-data-lake-delivering-tacit-knowledge-evidence-boray

9. Hadoop, Triple Stores, and the Semantic Data Lake

Hadoop-based data lakes are springing up all over the place as organizations seek low-cost

repositories for storing huge mounds of semi-structured data. But when it comes to analyzing

that data, some organizations are finding the going tougher than expected. One solution to the

dilemma may be found in Hadoop-resident graph databases and the notion of the semantic

data lake.

Despite their growing popularity, data lakes have taken a bit of heat lately as analyst firms

like Gartner call into question their long-term viability. Without a way to organize the

schemaless data that people are shunting into Hadoop en masse, the data lakes risk becoming

convoluted quagmires, where data goes in and nothing useful comes out.

https://www.datanami.com/2015/05/26/hadoop-triple-stores-and-the-semantic-data-lake/

10. Medical Insight Set to Flow from Semantic Data Lakes

The potential for data analytics to disrupt healthcare delivery is large, and getting larger by the

day. But in many cases, the need to hammer data into a structured format creates a barrier to

productivity. Now a hospital chain in New York City is hoping to change that by adopting a

Hadoop-based semantic data lake.

Located in the Bronx, Montefiore Health System is the first hospital to implement a semantic

data lake as part of the New York City Clinical Data Research Network (NYC-CDRN), an

association of seven hospitals in the NYC area that are sharing data. As the pioneer, Montefiore

is working with several technology providers, including Intel and Franz, to test a big data system

capable of delivering precision medicine.

Most Datanami readers are familiar with the term “data lake,” but a “semantic data lake”

provides an interesting twist on the familiar concept. According to Franz CEO Jans Aasman, a

semantic data lake employs a combination of technologies, including Hadoop, graph analytics, a

semantic “triple store,” the SPARQL query language, and Spark-based machine learning, to

allow doctors to connect the dots between patient conditions and a world of knowledge

contained in structured internal systems, as well as unstructured data sources outside of the

organization.

https://www.datanami.com/2015/08/26/medical-insight-set-to-flow-from-semantic-data-lakes/

11. Semantic graph database underpins healthcare data lake

http://searchhealthit.techtarget.com/feature/Semantic-graph-database-underpins-healthcare-data-lake

12. Data Lakes Get Smart with Semantic Graph Models

Been swimming in a data lake recently? Perhaps not, as many companies still are just dipping

their toes into these waters, as they become more familiar with the general idea of a data lake.

As research firm Gartner describes it, a data lake is:

“a collection of storage instances of various data assets additional to the originating data

sources…[and whose purpose is to] present an unrefined view of data to only the most highly

skilled analysts, to help them explore their data refinement and analysis techniques

independent of any of the system-of-record compromises that may exist in a traditional analytic

data store (such as a data mart or data warehouse).”

Even as enterprises consider the returns they may expect from diving deeper into data lakes,

they’re now being exposed to another twist on the concept. Enter the smart data lake, also

known as the semantic data lake. At DATAVERSITY’s® Smart Data Conference in San Jose in August,

the issue came up in sessions including presentations by Cambridge Semantics CTO Sean Martin

and Franz CEO Jans Aasman. Franz, in fact, notes that it has copyrighted the term Semantic Data

Lake, and points out that Gartner also has explained that data lakes need semantics in order to

be usable by a broad set of users.

http://www.dataversity.net/data-lakes-get-smart-with-semantic-graph-models/

13. Semantic Data Lakes and the Advance of Medicine

This article charts the long path of Dr. Parsa Mirhaji and his work to bring Semantic Data Lakes

and healthcare analytics together: “Mirhaji saw something in the story [from 2001] that spoke

to a practical problem he had encountered in searching for and saving information on medical

advances, and it set the tone for research and analytics work he continues to do now.”

“I was being challenged to do analytics on heterogeneous data sets from across the Web,” he

said. “I started to learn about artificial intelligence and software agents and how they could

‘grab concepts’ from the Web or databases.”

It continues with a much deeper discussion of how Dr. Mirhaji began to integrate his ideas into

real-world practice: “Mirhaji said the data lake system is still in training mode, being tested out

on some specific analytics tasks. It takes in all sorts of genetic, population, health and wellness

data; that includes data from the U.S. Census, clinical trials and patient records – for example,

heart rate, temperature and blood pressure measurements collected by patient monitoring

devices.”

http://www.dataversity.net/semantic-data-lakes-and-the-advance-of-medicine/

14. Semantic Data Lakes Dives In For Healthcare

The healthcare industry has had a hard time using data analytics to save time and money while

improving patient outcomes. Many health systems find it difficult to capture and use their

patients’ data to make a real impact on their businesses, especially considering the enormous

number of redundancies and unanswered questions in healthcare.

“Making sense out of big data is a challenge, particularly in the healthcare industry where

information comes from a variety of sources and in different forms including structured,

unstructured, images, temporal, geo-location and signal data,” says Jans Aasman, PhD, CEO of

Franz, Inc., specializing in semantic web technologies.

The solution may have finally appeared in the form of SDLs (Semantic Data Lakes), says

Aasman. A collaboration between the Montefiore Medical Center in the Bronx, New York, Franz,

Inc., Intel, Cloudera, and Cisco, SDL is a system that allows you to get all the data together from

different silos for analytics. It will then be used to transform statistical databases, such as

spreadsheets, into interactive graph databases that can be used to make better informed and

predictive healthcare decisions.

http://bigcommunity.net/big_news/semantic-data-lakes-dives-in-for-healthcare/

15. The Data Lake Concept Is Maturing

https://cacm.acm.org/news/200095-the-data-lake-concept-is-maturing/fulltext

16. The Potential of Data Lake Technology in the Healthcare Industry

Data lake technology has been named the future of big data. Unlike data warehousing

techniques, data lakes offer a different type of data management better suited to handling and

combining various forms of data in a way that doesn’t confuse systems. But how exactly do

data lakes work? What are their benefits in the healthcare system?

Data lakes embrace a semantic approach to data storage, retaining enormous amounts of raw

information (whether structured or unstructured) in its original state within a single centralised

location. This technology, often referred to as graph databases, enables analysts to select which

data to use when necessary, giving them the opportunity to reuse it when required. Analysts

also have the option of combining seemingly incompatible sources of information in ways

previously thought impossible with data warehousing.

http://www.protogen.com.au/blog/the-potential-of-data-lake-technology-in-the-healthcare-industry

17. 6 WAYS TO PROTECT MEDICAL DATA LAKES AT THE FACILITY LEVEL

With the global push towards putting EHRs and the unified medical record in the cloud,

hospitals and clinics are finally getting on board this big data ride. EHR (Electronic Health

Record) and Patient Portal use has been rising steadily over the past three years with more

and more patients having access to their medical data through Patient Portals and mobile devices.

As providers finally begin to roll out their EHR and Patient Portal programs, both patients and

doctors are enjoying the newfound access to the data they need to make medical decisions,

and it’s literally at their fingertips. This has given the patients a new sense of ownership of their

medical data and perhaps a feeling of control over their destiny – especially in an environment

where ACA has left them feeling somewhat limited. In today’s connected society, medical data

of all kinds are being gathered and gleaned from a myriad of sources; from EHRs to mobile

device apps to telemetry at the office level and to test results of all kinds. Medical data lakes,

collections of unstructured medical data, are growing and thriving; but what is the risk of this

access? What are the purposes of data lakes and how can we protect them?

http://dermatologyrecruiters.com/6-ways-protect-medical-data-lakes/

18. Dealing with Big Data: The ascendency of data lakes

The data lake concept occupies a central place of prominence in contemporary big data

initiatives. The past two years have unveiled numerous headlines, vendor solutions (including

repackaging of former solutions) and enterprise use cases for the utility of this centralized

approach for accumulating, analyzing and actuating big data.

The fervor for this method of managing big data is based on a simple premise that promises

value for organizations regardless of size or vertical industry. Data lakes provide a singular

repository for storing all data – unstructured, semi-structured and structured – in their native

formats, granting access and insight to all without lengthy IT preparation.

Moreover, the data lake movement is largely spurred by adoption rates for Hadoop. As

Hadoop’s presence increases, its function as an integration hub for all data delivers more

credence and traction to the notion of data lakes. The data lake concept may be relatively new,

but the association of Hadoop and big data is nearly as ubiquitous as big data itself.

The combination of these two factors, Hadoop’s deployment as a data lake and the storage and

access benefits this method produces, is largely responsible for the widespread attention data

lakes have garnered. A recent post from Gartner reveals that data lake interest is “becoming

quite widespread.” Forbes indicates that “one phrase in particular has become popular for the

massing of data into Hadoop, the ‘Data Lake.’”

http://analytics-magazine.org/dealing-with-big-data-the-ascendency-of-data-lakes/

19. Making Data Lakes Usable: Why we need a semantic layer – and why it should be open source

Big Data platforms, like Cloudera’s Data Hub, Hortonworks’ Data

Platform, MapR’s Converged Data Platform, and others from IBM, Oracle, and

Pivotal, promise to easily bring diverse sets of application data together into one data cluster

running on Hadoop. This collection of data sets is called a data lake.

It is a wonderful thing to be able to bring data together so easily with Hadoop’s schema-on-read

ability rather than the schema-on-write approach required by traditional database systems. But we need

to remember that along with the data comes all the data’s associated data problems. Bringing

data into the data lake does not, unfortunately, wash away all the problems of non-standard,

non-integrated, redundant, and inconsistent data that are buried in application data.
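
The schema-on-read idea described above can be illustrated with a minimal Python sketch. It is not taken from the article: the record layout, field names (`hr`, `heart_rate`), and values are all hypothetical. Raw records are stored untouched, and a schema is applied only when a question is asked, which is also the point where inconsistent application data becomes visible.

```python
import json

# Raw records land in the lake exactly as the source applications wrote them.
# All field names and values here are hypothetical.
RAW_RECORDS = [
    '{"patient_id": "p1", "hr": "72", "temp_c": "36.8"}',
    '{"patient_id": "p2", "heart_rate": 88}',
    '{"patient_id": "p3", "hr": "n/a"}',
]


def read_heart_rates(raw_lines):
    """Apply a schema at read time: pick the heart-rate field and coerce it to int."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        # Two applications used two different names for the same measurement.
        value = record.get("hr", record.get("heart_rate"))
        try:
            rate = int(value)
        except (TypeError, ValueError):
            rate = None  # the inconsistency survived ingestion and surfaces here
        rows.append((record["patient_id"], rate))
    return rows


HEART_RATES = read_heart_rates(RAW_RECORDS)
print(HEART_RATES)
```

Nothing was rejected when the records were written, so the reader has to cope with the alternative field name and the unparseable value; a schema-on-write system would have had to resolve both before loading.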

For data lakes, with great ease of data access comes a great need for data management. But

this is not what I want to talk about today (perhaps I’ll do so in a later blog post). I want to talk

about how we need to make it easy for users to access data in a data lake when there is

non-integrated and redundant data.

http://rcgglobalservices.com/blog/making-data-lakes-usable-why-we-need-a-semantic-layer-and-why-it-should-be-open-source/

20. Implementing Personalized, Precision Medicine with Artificial Intelligence and Semantic Graph Technology

Personalizing healthcare services for individuals creates several demands on data-driven

functions in the medical field. Healthcare organizations are tasked with integrating structured,

unstructured and semi-structured data, storing and cataloging them in relevant ways across use

cases and locations, and leveraging emerging AI techniques for predictive capabilities which

could potentially save lives.

Most of all, this process must occur in time to make a difference for patients.

According to Montefiore Health System Senior Vice President and Chief Medical Officer

Andrew Racine, who spoke at a recent event for the unveiling of Intel’s Xeon Scalable

Processors, all of these measures must be implemented so providers can: “use information in

real time to make clinical decisions that are going to allow us to intervene with patients and

prevent them from having adverse outcomes.”

Montefiore is currently engaged in such an undertaking with a Semantic Data Lake for

Healthcare (SDL). The SDL is powered by Franz’s AllegroGraph, architected by Intel, and fortified

by Cloudera’s Hadoop distribution. By merging a unique set of data management techniques

with some of the most pertinent technologies across the data landscape, Montefiore is seeking

to tailor its medical treatment and diagnoses for individual patients.

https://analyticsweek.com/content/implementing-personalized-precision-medicine-artificial-intelligence-semantic-graph-technology/

21. Big Data and Healthcare Payers

With the implementation of the Affordable Care Act, the advent of Healthcare Information

Exchanges (HIE), the introduction of new provider models, such as Accountable Care

Organizations (ACO), and the transition to a more member-centric relationship model,

Healthcare Payers face seismic changes in their business models. As with many large-scale,

business transformations, there are challenges to navigate as well as opportunities to realize

around improving patient outcomes, reducing cost, and increasing revenue. Capitalizing on

these opportunities will depend on an organization’s capability to leverage information. The

ability to capture, integrate, and interrogate large information sets will be foundational in

realizing objectives, such as:

https://knowledgent.com/whitepaper/big-data-and-healthcare-payers/

22. The Bright Future of Semantic Graphs and Big Connected Data

The big data revolution is generating a mess of unruly data that’s difficult to parse and

understand. This is to be expected–explosions don’t generally occur in a nice, orderly fashion,

after all. But if the folks at Cloudera and Franz have their way, the world of connected data will

become more accessible and useful when viewed through the lens of semantic graph

technologies.

Semantic graph technology is shaping up to play a key role in how organizations access the

growing stores of public data. This is particularly true in the healthcare space, where

organizations are beginning to store their data using so-called triple stores, often defined by

the Resource Description Framework (RDF), which is a model for storing metadata created by

the World Wide Web Consortium (W3C).
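
As a rough illustration of the triple-store idea mentioned above, the following Python sketch keeps facts as (subject, predicate, object) triples and answers pattern queries over them. It is a toy in-memory model, not RDF tooling or any vendor's product; every identifier in it (`patient:1`, `hasCondition`, and so on) is a hypothetical name chosen for the example.

```python
# Each fact is a (subject, predicate, object) triple. All names are hypothetical.
TRIPLES = {
    ("patient:1", "hasCondition", "condition:diabetes"),
    ("patient:2", "hasCondition", "condition:asthma"),
    ("condition:diabetes", "treatedWith", "drug:metformin"),
    ("patient:1", "bornIn", "1970"),
}


def match(pattern, triples=TRIPLES):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if all(p is None or p == v for p, v in zip(pattern, t))
    )


def drugs_for(patient):
    """Follow two edges of the graph: patient -> condition -> drug."""
    return sorted(
        drug
        for _, _, condition in match((patient, "hasCondition", None))
        for _, _, drug in match((condition, "treatedWith", None))
    )


print(match((None, "hasCondition", None)))
print(drugs_for("patient:1"))
```

Because every fact has the same three-part shape, data from unrelated sources can be added to the same set without a shared table design, and a question such as "which drugs apply to this patient" becomes a walk across connected triples.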

https://www.datanami.com/2016/02/08/the-bright-future-of-semantic-graphs-and-big-connected-data/

22. Empowering personalized medicine with big data and semantic web technology: Promises, challenges, and use cases

In healthcare, big data tools and technologies have the potential to create significant value by

improving outcomes while lowering costs for each individual patient. Diagnostic images, genetic

test results and biometric information are increasingly generated and stored in electronic

health records presenting us with challenges in data that is by nature high volume, variety and

velocity, thereby necessitating novel ways to store, manage and process big data. This presents

an urgent need to develop new, scalable and expandable big data infrastructure and analytical

methods that can enable healthcare providers to access knowledge for the individual patient,

yielding better decisions and outcomes. In this paper, we briefly discuss the nature of big data

and the role of semantic web and data analysis for generating “smart data” which offer

actionable information that supports better decisions for personalized medicine. In our view, the

biggest challenge is to create a system that makes big data robust and smart for healthcare

providers and patients that can lead to more effective clinical decision-making, improved health

outcomes, and ultimately, managing the healthcare costs. We highlight some of the challenges

in using big data and propose the need for a semantic data-driven environment to address

them. We illustrate our vision with practical use cases, and discuss a path for empowering

personalized medicine using big data and semantic web technology.

http://ieeexplore.ieee.org/abstract/document/7004307/?reload=true

23. How the Search for Smart Data Drives Healthcare IT Investment

There’s no question that the healthcare industry has become extremely “data rich” over the

past few years. Thanks to the rapid pace of electronic health record adoption, the vast majority

of healthcare organizations are now sitting on an enormous nest egg of big data, including

petabytes of clinical, administrative, demographic, and even genomic data on thousands or

millions of their patients.

But having data and knowing how to use it are two very different things. Despite the keen

interest in adopting a growing collection of big data analytics, predictive analytics, and risk

stratification tools, few organizations have really cracked the secret of how to turn a wealth of

fresh, unfiltered data into the spendable coin of actionable information.

Some fall into the trap of buying new products to solve each individual problem as it arises, not

realizing that they are creating a patchwork of competing technologies, or developing ad hoc

workarounds and an endless array of user interfaces that produce more headaches than they

cure.

http://www.distilnfo.com/provider/2016/08/08/search-smart-data-drives-healthcare-investment/

24. Data Lake Management: Do You Know the Type of “Fish” You Caught?

“Different types of fish live in a community and when you understand their

relationship to each other, you have a better chance of catching what you

want.” thompsonadvertisinginc.com

You navigated your way to the lake, read up on the fundamentals of fish

management, and were introduced to the data lake management principles. As you drove up to

the lake, you were thinking about what type of fish you are looking to catch: a

trophy fish, or a fish to eat? Without knowing the type of fish you’re

going to catch, you don’t know how far off-shore you have to boat for a catch.

The primary charter for any data lake initiative is the ability to catalog all the data,

enterprise-wide regardless of form (variety) or where it’s stored, whether on Hadoop,

NoSQL or an enterprise data warehouse, along with the associated business,

technical, and operational metadata. To carry on with our analogy, cataloging fish

into off-shore, near-shore or bottom fish can determine the type of fish you catch and

how far out you go fishing.

The catalog must enable business analysts, data architects, and data stewards to

easily search and discover data assets, data set patterns, data domains, data lineage

and understand the relationships between data assets – a 360 degree view of the

data. A catalog provides advanced discovery capabilities, smart tagging, data set

recommendations, metadata versioning, a comprehensive business glossary, and drill

down to finer grained metadata.
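
A catalog of the kind described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CatalogEntry` fields, the data set names, and the tags are hypothetical, and a real catalog product would offer far more (versioning, glossary, recommendations). The sketch covers two of the listed capabilities, tag-based discovery and data lineage.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata for one data set; the fields are hypothetical."""
    name: str
    location: str                                  # e.g. Hadoop, NoSQL, warehouse
    tags: set = field(default_factory=set)
    upstream: list = field(default_factory=list)   # names of source data sets


CATALOG = {
    entry.name: entry
    for entry in [
        CatalogEntry("raw_vitals", "hadoop", {"clinical", "raw"}),
        CatalogEntry("raw_claims", "warehouse", {"administrative", "raw"}),
        CatalogEntry("patient_summary", "nosql", {"clinical"},
                     ["raw_vitals", "raw_claims"]),
        CatalogEntry("risk_scores", "hadoop", {"clinical", "derived"},
                     ["patient_summary"]),
    ]
}


def search(tag):
    """Discover data sets by tag, wherever they are stored."""
    return sorted(e.name for e in CATALOG.values() if tag in e.tags)


def lineage(name):
    """Return every upstream data set, nearest sources first."""
    seen = []
    queue = list(CATALOG[name].upstream)
    while queue:
        current = queue.pop(0)
        if current not in seen:
            seen.append(current)
            queue.extend(CATALOG[current].upstream)
    return seen


print(search("clinical"))
print(lineage("risk_scores"))
```

Keeping the storage location as just another metadata field is what lets one search span Hadoop, NoSQL, and warehouse data sets alike.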

https://blogs.informatica.com/2016/11/01/data-lake-management-know-type-fish-caught/#fbid=L81Agt5pe2b