Great Data By Design


Great data isn’t an accident. It happens by design. Ensuring that you have the clean, safe, connected data you need to power confident decisions and effective business processes isn’t an easy task.

You have to work at it…


The challenge is that the market trends are working against you as data professionals.

Your jobs are getting harder.

More Data. In More Places. Moving Faster Than Ever Before.

Market Trend #1

The volume, velocity, and variety of data are increasing at an unprecedented pace. The amount of data generated in the world today is doubling every two years.

It’s the new Moore’s law.

2009: 0.8 zettabytes

2020: 35.2 zettabytes

And, to top it off, we are attaching RFID devices and sensors to everything.

Technologies like Hadoop allow us to affordably store vast amounts of data.

The power of mainframe computing now fits in the palm of our hands.

Take jet airplanes, for example. A jet aircraft engine has up to 3,000 sensors on it, and they are constantly throwing off data. The amount of data that comes off an engine during a flight ranges from 0.5 TB to 4 TB.

And we are only just beginning.

The volume, variety, and velocity of data will only continue to increase.

Data Is Everywhere and Its Quality Is Questionable

Market Trend #2

It’s in all the old places, and all the new ones.

Both on-premises and in the cloud.

Data is scattered everywhere.

Mobile devices, social media, CRM and ERP applications, message queues, flat files, sensors, obscure legacy systems, databases, unstructured docs, the cloud, Hadoop clusters, and mainframes.

It used to be that data integration projects were limited or put at risk by the cost and performance of CPU, memory, network, or disk. Today, that’s no longer the case.

Now we’re limited by our ability to deal with data that is fragmented and of poor or questionable quality.


To realize the full value of their data, organizations need to be able to integrate it across the entire enterprise.

And data quality needs to be built into the process. Much like manufacturing went through a transition in the 1980s – where the quality steps for building products were baked into the manufacturing process – the same needs to be done with data.
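To make that concrete, here is a minimal sketch, in Python, of what baking quality into the process can look like: a validation step that runs inside the load itself and quarantines bad records instead of passing them through. The field names, reference values, and record shapes are invented for illustration and are not tied to any particular product.

```python
# Minimal sketch: a quality gate that runs inside the pipeline, not after it.
# Field names, reference values, and record shapes are hypothetical.

def validate_record(record):
    """Return a list of quality problems found in one customer record."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    if "@" not in record.get("email", ""):
        problems.append("invalid email")
    if record.get("country") not in {"US", "DE", "JP"}:  # example reference set
        problems.append("unknown country code")
    return problems

def load_with_quality_gate(records, target, quarantine):
    """Load clean records; divert bad ones for review instead of loading them."""
    for record in records:
        problems = validate_record(record)
        if problems:
            quarantine.append({"record": record, "problems": problems})
        else:
            target.append(record)

clean, rejected = [], []
load_with_quality_gate(
    [{"customer_id": "C1", "email": "a@example.com", "country": "US"},
     {"customer_id": "", "email": "broken", "country": "XX"}],
    clean, rejected)
print(len(clean), "loaded;", len(rejected), "quarantined")
```

The point is the placement of the check, not the specific rules: bad data is caught where it enters the process rather than discovered downstream.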

The Business Wants Self-Service

Market Trend #3

Over the last five years, business users have become more technically savvy. Easy-to-use technology now plays a large role in their personal lives, helping them do things faster, easier, and better. It has empowered them. And they expect the same experience at work.

The business doesn’t want to wait for IT to deliver great data. They want to do it on their own.

The Empowered Consumer: search, social networking, apps, mobility.

There are (some pretty cool) self-service tools that allow them to visualize their data.

The trouble is, they only work for a single data set at a time.

When the business needs data that crosses business boundaries, or data set boundaries, they still have to come back to IT.

Or worse yet, they come back to IT because they have done all they can with their self-service tools and then realize that the data they are using is mission-critical and requires mission-critical processes…

…that they can’t run on their laptop.

Self-service can only take the business so far.

A new way of thinking is needed

A lot of companies believe that the way to achieve competitive advantage is to focus on their core business processes.

If we’re the best at what we do, we can beat the competition.

And they aren’t entirely wrong.

They believe that by investing in applications to support those core business processes they can use the new efficiencies – or the improved service that comes from those efficiencies – for competitive advantage.

We need an application that will automate and improve our core processes, so we can beat the competition.

And they aren’t entirely wrong.

The trouble is: people still think about their business application as a single, monolithic thing.


That’s where they’re wrong.

The reality is that these processes and the core applications supporting them aren’t a single monolithic thing. Any business process today is highly distributed across multiple systems, and the number of systems and data points that data must flow into and out of is only increasing.

It is generally true that innovation exists at the edges of boundaries, or the intersection of different disciplines.


As more data gets created across more systems, the ability to integrate and intersect data across those boundaries becomes a critical success factor for the next generation of innovation.

Do we have all the data we need to support our compliance constraints?

Who are our most profitable customers?

How can I improve collaboration between suppliers and contractors?

How do I accelerate my supply chain?

Can we drive efficiencies in our procurement processes?

Can we create new information-based services to offer our customers?


But integrating data is harder than most people think.

Take the jet aircraft, for example. While the engines may be the same from plane to plane, the data coming off of them – via their 3,000 sensors – is not controlled by the engine manufacturer. It’s controlled by the airlines.

And each airline stores those same 3,000 attributes in its own format.

Which means that when the data for the same kind of engine is sent back to the manufacturer for analysis, they first have to normalize it. What would seem like an easy exercise – analyzing data from the same kind of engine – is much harder than it looks.

The additional challenge is that the legacy data never dies and has to be pulled in as well.
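As a rough illustration of that normalization step, the Python sketch below maps two invented airline-specific layouts for the same engine readings onto one common schema, converting units along the way. The field names, units, and airline labels are all hypothetical; a real feed would carry thousands of attributes and far messier variations.

```python
# Sketch: normalizing the "same" engine readings, delivered in two
# airline-specific layouts, into one common analysis schema.
# All field names, units, and airline labels are hypothetical.

def fahrenheit_to_celsius(value):
    return (value - 32) * 5.0 / 9.0

def normalize(record, airline):
    """Map one raw sensor record into the common schema used for analysis."""
    if airline == "A":  # airline A reports temperature in Fahrenheit
        return {
            "exhaust_gas_temp_c": fahrenheit_to_celsius(record["egt_degF"]),
            "fan_speed_pct": record["n1_pct"],
        }
    if airline == "B":  # airline B already reports Celsius, different names
        return {
            "exhaust_gas_temp_c": record["ExhaustTempC"],
            "fan_speed_pct": record["FanSpeed"],
        }
    raise ValueError(f"no mapping defined for airline {airline!r}")

print(normalize({"egt_degF": 1470, "n1_pct": 92.5}, "A"))
print(normalize({"ExhaustTempC": 800, "FanSpeed": 91.0}, "B"))
```

Multiply that by every attribute, every airline, and every legacy archive, and the size of the normalization job becomes clear.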

Every data project is like this. It is always harder than anyone thinks and the number of moving parts is only increasing.

To overcome this challenge, you have to design great data into your business processes.

Just like you invest in people, process, and technology for your core business processes, you have to invest in people, process, and technology to integrate the distributed data that supports those processes.

That is because business agility now depends on data integration agility. And data integration agility depends on getting everyone involved – and ensuring that the business and IT have the right tools to enable collaboration. In fact, we’ve seen that in companies where the business and IT collaborate, data integration projects are executed 5x faster than in companies where they don’t.


5 Considerations for Designing Great Data

#1: Connect to All Your Data
RDBMS, Flat Files, XML, Hadoop, NoSQL, Social Media, Mainframe, Machine Data, and More…

Data integration enables you to combine data from many different and rich sources to produce new business information you couldn’t get from a single source. Make sure your data integration tools are able to connect to any data source (both current and legacy) including RDBMS, NoSQL, mainframe, text, applications, and so on – and not just the data sources you consume today. It’s this universal set of connections that makes it possible to bring all that data together.
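As a sketch of what “connect to any source” can mean in practice, the snippet below pulls records from a relational database and from a delimited flat file through one common record-stream interface. The paths, query, and column names are placeholders rather than references to any specific system or product.

```python
# Sketch: reading from two very different sources (a relational database and a
# flat file) through one common "stream of dict records" interface.
# The paths, query, and column names are hypothetical.
import csv
import sqlite3

def read_from_database(db_path, query):
    """Yield rows from a relational source as plain dicts."""
    conn = sqlite3.connect(db_path)
    conn.row_factory = sqlite3.Row
    try:
        for row in conn.execute(query):
            yield dict(row)
    finally:
        conn.close()

def read_from_flat_file(csv_path):
    """Yield rows from a delimited flat file as plain dicts."""
    with open(csv_path, newline="") as handle:
        yield from csv.DictReader(handle)

def combine(*sources):
    """Merge records from any number of sources into one stream."""
    for source in sources:
        yield from source

# Usage (paths and query are placeholders):
# records = combine(read_from_database("crm.db", "SELECT * FROM customers"),
#                   read_from_flat_file("legacy_export.csv"))
```

The same pattern extends to message queues, APIs, or Hadoop: each source only has to produce records in the shared shape.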

#2: Support the Right Format and Latency
Batch, Real-Time, Near Real-Time. Structured, Semi-Structured, Unstructured.

In the same way data integration draws data from many different sources, it also must be able to consume various and multiple data types, including structured, semi-structured, and unstructured data sources in batch and real-time modes. You need a tool that is flexible enough to work with any type of data you encounter.
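A minimal sketch of handling more than one shape and latency: the same pipeline below consumes a structured batch extract (CSV) and semi-structured events arriving one at a time (JSON lines). The field names are invented for illustration.

```python
# Sketch: one pipeline consuming structured batch data (a CSV extract) and
# semi-structured, near-real-time data (JSON events). Field names are hypothetical.
import csv
import io
import json

def parse_batch_csv(csv_text):
    """Structured, batch: parse a whole CSV extract at once."""
    return [{"order_id": row["order_id"], "amount": float(row["amount"])}
            for row in csv.DictReader(io.StringIO(csv_text))]

def parse_streaming_json(lines):
    """Semi-structured, near real-time: handle events as they arrive."""
    for line in lines:
        event = json.loads(line)
        yield {"order_id": str(event.get("id")),
               "amount": float(event.get("amount", 0.0))}

batch = parse_batch_csv("order_id,amount\n1001,19.99\n1002,5.00\n")
stream = parse_streaming_json(['{"id": 1003, "amount": 42.0}'])
print(batch + list(stream))
```

Whatever tool you choose, the test is the same: downstream logic should not care whether a record started life as a row, a document, or an event.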

#3: Understand Data Structure and Content
Include Data Profiling in Your Methodology

With so many different sources of data involved, you need to have a means to make sure that your data is what you expect. It’s important that your tools allow a level of data profiling so that you can verify the data going into and out of your system, and ensure that you’ll end up with the desired results.
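To show the kind of checks meant here, the toy profiling pass below reports, per column, how many values are missing, how many distinct values appear, and the observed range. The sample records and column names are made up; real profiling tools go much further, but the idea is the same.

```python
# Sketch: a tiny data-profiling pass over a list of records. For each column it
# reports missing values, distinct values, and the observed min/max.
# The sample records and column names are hypothetical.
from collections import defaultdict

def profile(records):
    stats = defaultdict(lambda: {"missing": 0, "values": set()})
    for record in records:
        for column, value in record.items():
            if value in (None, ""):
                stats[column]["missing"] += 1
            else:
                stats[column]["values"].add(value)
    return {
        column: {
            "missing": s["missing"],
            "distinct": len(s["values"]),
            "min": min(s["values"]) if s["values"] else None,
            "max": max(s["values"]) if s["values"] else None,
        }
        for column, s in stats.items()
    }

sample = [{"age": 34, "country": "US"},
          {"age": None, "country": "US"},
          {"age": 51, "country": "DE"}]
print(profile(sample))
```

Profiling like this at the start and end of a job is what turns “we think the data is fine” into something you can verify.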

#4: Enable Effective Business and IT Collaboration
Be Agile and Lean

You can’t afford to create and execute projects using traditional, isolated development methods anymore. Your data integration tools need to support lean and agile integration processes that enable business and IT collaboration so that development happens quickly and interactively.

#5: Support Business Growth and Expansion
Be Able to Scale Up and Scale Down

Companies grow, and so do the sizes of their projects. You don’t want to be locked into tools that are only appropriate for today’s projects. Rather, you want tools that have the ability to scale, grow, and move projects from small departmental innovation exercises to large enterprise mission-critical environments, or vice versa.

Learn how you can build great data, by design.
