Introduction to visualizing Big Data

Academic Skills in English, Oct. 12, 2015
Ins. Colette Gattoni (M.Ed.)

Visualizing Big Data
Dawit Nida

Abstract
Ever since the creation of computers, data have been generated continuously, in several forms and structures. Many organizations use Big Data, volumes of data too large to be handled by traditional data management systems, to make their business decisions. Big Data can refer to unstructured, semi-structured or structured data generated by users. Unstructured data include emails, texts and sensor data; structured data include transactions. Big Data can be characterized by the four Vs: Volume, Velocity, Variety and Veracity. Handling Big Data requires expertise in the field to minimize the risks in security, data management and analytics. Data scientists, who perform different kinds of analytics for organizations, use a set of techniques for processing large amounts of data to support business decisions using business intelligence tools and methods.

Keywords
Big Data, Analytics, Business Intelligence

Contents
Introduction
1 Why Big Data is ‘BIG’?
1.1 The Four Vs of Big Data
1.2 The Risks of Big Data
2 Big Data Analytics
2.1 Types of Analytics
3 BI and Big Data
4 Conclusion
References

Introduction
Big Data is a volume of data too large to be handled by traditional data management systems. To handle these data, analytics is required. Analytics is the process of discovering knowledge and extracting valuable trends from data, which can then be visualized to reveal insights and patterns. It relies on a set of techniques and tools for collecting, storing and visualizing data, turning them into valuable information that supports efficient business decisions.

1. Why Big Data is ‘BIG’?
Big Data is still data, but in huge amounts, coming from several sources in various forms, that traditional data management systems have not been able to handle. Big Data can refer to unstructured, semi-structured or structured data. Unstructured data from social media and sensors include emails, texts, sensor readings, video and audio. Structured data include, for example, customer transaction data.

For instance, according to an infographic posted on iDigitalTimes (July 2013) [1][2], the following happens on the internet every 60 seconds:
• 204 million emails are sent
• 1.8 million Facebook likes are generated by users
• 278,000 tweets are posted
• 200,000 photos are uploaded to Facebook
• Around 100 hours of video are uploaded to YouTube
• 20 million photos are viewed on Flickr
• 120 new users register on LinkedIn
• 88,000 calls are made on Skype
• €73,000 is spent on Amazon
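As a rough worked example of the volume these per-minute figures imply, the counts can be scaled up to a full day. The sketch below assumes the rates stay constant over 24 hours, which the source does not claim; the names PER_MINUTE and per_day are illustrative only.

```python
# Per-minute counts quoted in the list above (iDigitalTimes, July 2013).
PER_MINUTE = {
    "emails sent": 204_000_000,
    "Facebook likes": 1_800_000,
    "tweets": 278_000,
    "photos uploaded to Facebook": 200_000,
    "Skype calls": 88_000,
}

MINUTES_PER_DAY = 60 * 24  # 1,440 minutes in a day


def per_day(per_minute: int) -> int:
    """Scale a per-minute count to a per-day count.

    Assumption: the rate is constant over the whole day.
    """
    return per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    for name, count in PER_MINUTE.items():
        print(f"{name}: {per_day(count):,} per day")
```

Under that constant-rate assumption, 204 million emails per minute corresponds to roughly 294 billion emails per day, which illustrates why such data outgrow traditional data management systems.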

Figure 1. Two selected Big Data articles analysed using the R programming language

1.1 The Four Vs of Big Data
The four Vs described below define the features of Big Data [3].

Volume: refers to the amount of data generated every millisecond. A huge amount of transactional data, machine and sensor data, social media data and enterprise data is collected and stored every second. Although storage cost is no longer a big issue, sorting the data by relevance is today’s problem.

Velocity: refers to the speed of streaming data that must be processed in real time. It concerns not only how fast data are produced or changed, but also how fast they have to be received, understood and processed.

Variety: in addition to size and speed, data also vary in structure and type. For instance, email, photos, video and audio are examples of unstructured data that can be used alongside traditional structured data, which fits into tables and relational databases.

Veracity: refers to the trustworthiness (or uncertainty) of the data: where the data came from and how accurate they are. The number of sources and the amount of data directly affect the value and quality of the data.

1.2 The Risks of Big Data
Because of the nature of Big Data, the pace of technological advancement and the shortage of expertise among the data scientists who handle and visualize data, Big Data brings the following risks for organizations that consider using it [4].

Bad data and bad data analysis: collecting bad data cannot help a company’s business grow or compete with similar businesses. Irrelevant and outdated data can steer business decisions towards undesired outcomes. During analysis, misinterpreting patterns and trends in the data is also a risk.

Security: Big Data can contain sensitive data, and the logistics of data collection and analysis can be insecure and may expose them. Moreover, the bigger the data, the bigger the risk to the company. Data privacy is closely related to the issue of security.

Costs: collecting, aggregating, processing and storing data, and then analysing them to generate reports and visual graphs, requires budget planning and money. Big Data might therefore be costly compared with the benefit later gained from the data.

2. Big Data Analytics
Today, many organizations collect and store petabytes and exabytes of data from customers, sensors, transactions and various other sources. Such volumes call for high-performance analytics and new advances in computing technology to process the data and figure out what is important and what is not. Analytics is simply knowledge discovery: extracting valuable trends from data that can be visualized to reveal insights, unknown correlations and hidden patterns. Organizations use high-performance data and text mining, diagnostic and predictive analytics, and prescriptive analytics for forecasting and optimization to make the best possible business decisions [5].

2.1 Types of Analytics
Data visualization refers to the approaches and tools used to visually understand the insights from data in order to prove or disprove a hypothesis. Organizations use four types of analytics to perform high-performance analytics and increase their competence in their domains.

1. Descriptive analytics is a set of techniques for reviewing and examining a data set in order to understand the data and analyse business performance, providing insight into the past.

2. Diagnostic analytics is a set of techniques for determining what has happened and why, for example why demand for a service is low or high, how customers are segmented, and what trends appear in the data.

3. Predictive analytics is a set of techniques that analyse current and historical data to determine what is most likely (or unlikely) to happen and who needs more; it helps to model and forecast.

4. Prescriptive analytics is a set of techniques for computationally developing and analysing alternatives in a tactical or strategic model and discovering the unexpected. It seeks to determine the best solution or outcome among various choices, given the known parameters.
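To make the difference between two of these types concrete, the following minimal sketch contrasts descriptive analytics (summarising the past) with predictive analytics (extrapolating a trend). It is an illustration only: the sales figures are hypothetical and not taken from this paper, the function names are invented for the example, and a straight-line least-squares fit is just one simple stand-in for a predictive model.

```python
from statistics import mean

# Hypothetical monthly sales (units sold); not data from the paper.
sales = [120, 135, 150, 160, 178, 190]
months = list(range(1, len(sales) + 1))


def describe(values):
    """Descriptive analytics: summarise what has already happened."""
    return {"min": min(values), "max": max(values), "mean": mean(values)}


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def forecast(xs, ys, next_x):
    """Predictive analytics: extend the fitted trend to a future point."""
    slope, intercept = fit_line(xs, ys)
    return slope * next_x + intercept


if __name__ == "__main__":
    print("Summary of past sales:", describe(sales))
    print("Forecast for month 7:", round(forecast(months, sales, 7), 1))
```

Diagnostic and prescriptive analytics are not covered by this sketch; they would need, respectively, explanatory variables and a set of alternatives to choose between.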

3. BI and Big Data
Business Intelligence (BI) is the set of techniques, processes, technologies and tools required to collect, store and analyse data, turn them into valuable information and gain business advantages from the analysis. BI allows users to easily consolidate, search and visually analyse data and to gain unexpected business insights by understanding how data are associated.

Usefulness of BI
• Optimize business processes
• Better fact-based decision making at every level of the organization
• Create better customer experience
• Improve competitiveness
• Increase revenues, etc.

Data discovery and visualization in BI can take the form of
• Reports
• Scorecards
• Dashboards
• Ad hoc analysis
• Visualization

4. Conclusion
To summarize, finding ways to use Big Data and analytics has become a bigger concern for everyone from small and mid-sized businesses to big corporations looking for ways to keep up with larger competitors. Big Data helps organizations to enhance product quality, improve marketing operations, strengthen customer relationships and make better business decisions. BI and data visualization are essential for utilizing Big Data through descriptive, diagnostic, predictive or prescriptive analytics, or a combination of the four. Although Big Data has risks and weaknesses, it has been called a renewable oil that can be applied in many organizations and fields.

References
[1] Big data, what it is and why it matters.
[2] Tomas Eklund. Dw lecture 4: Big data, Aau. 4.5.2015.
[3] Big data analytics, advanced analytics in Oracle database. 2013.
[4] Bernard Marr. The 5 biggest risks of big data.
[5] Marko Grobelnik. Big data tutorial.