Nathan Kohn BU MET enzyme@bu

Nathan Kohn, BU MET

[email protected]

Thinking Big in Small Spaces

One Hadoop Two Hadoop (Big Data & 21st Century Analytics in the Classroom)

Stanislav Seltser, BU MET

[email protected]

Mar 7, 2014 2

Big Data is Everywhere

72 Hours a Minute: YouTube

28 Million Wikipedia Pages

900 Million Facebook Users

6 Billion Flickr Photos

“… data a new class of economic asset, like currency or gold.”

“…growing at 50 percent a year…”

How will we design and implement Big Learning systems?

Big Learning

BU Graduate Students

BU Undergraduates

GPUs, Multicore, Clusters, Clouds, Supercomputers

BU Faculty

Graphs are Everywhere

Collaborative Filtering: Netflix (Users, Movies)

Text Analysis: Wiki (Docs, Words)

Probabilistic Analysis: Social Network
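Collaborative filtering can be read as a walk on a bipartite user-movie graph. The sketch below is a minimal illustration under stated assumptions: the ratings are hypothetical toy data, and the scoring rule (rank each unseen movie by the number of user -> movie -> user -> movie paths that reach it) is a simple neighborhood heuristic chosen for clarity, not the Netflix method or any particular library's algorithm.

```python
from collections import defaultdict

# Hypothetical toy ratings: each (user, movie) pair is an edge in a bipartite graph.
ratings = [
    ("alice", "Alien"), ("alice", "Brazil"),
    ("bob", "Alien"), ("bob", "Casablanca"),
    ("carol", "Brazil"), ("carol", "Casablanca"), ("carol", "Dune"),
]


def build_graph(edges):
    """Build adjacency sets for both sides of the user-movie bipartite graph."""
    user_to_movies = defaultdict(set)
    movie_to_users = defaultdict(set)
    for user, movie in edges:
        user_to_movies[user].add(movie)
        movie_to_users[movie].add(user)
    return user_to_movies, movie_to_users


def recommend(user, user_to_movies, movie_to_users):
    """Score unseen movies by counting paths user -> movie -> other user -> movie."""
    seen = user_to_movies[user]
    scores = defaultdict(int)
    for movie in seen:
        for neighbor in movie_to_users[movie]:
            if neighbor == user:
                continue
            for candidate in user_to_movies[neighbor]:
                if candidate not in seen:
                    scores[candidate] += 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda m: (-scores[m], m))


u2m, m2u = build_graph(ratings)
print(recommend("alice", u2m, m2u))  # ['Casablanca', 'Dune']
```

Casablanca ranks first for alice because two of her neighbors (bob via Alien, carol via Brazil) rated it. At real scale, the same neighborhood traversal is what graph-parallel frameworks distribute across machines.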

Big Data & Linear Regression
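As a minimal sketch of why linear regression scales to big data (assuming a single feature and ordinary least squares; the function names and toy data are hypothetical), note that the normal equations depend only on a handful of sums. Those sums are additive, so each chunk of data can be summarized independently, for example on separate machines, and the summaries merged before solving.

```python
def partial_sums(xs, ys):
    """Summarize one chunk of (x, y) data as the sums the normal equations need."""
    n = len(xs)
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n, sx, sy, sxx, sxy)


def merge(a, b):
    """Add two chunk summaries; addition is order-independent, so chunks combine in any order."""
    return tuple(p + q for p, q in zip(a, b))


def fit(stats):
    """Solve the normal equations for y = w0 + w1 * x from merged sums."""
    n, sx, sy, sxx, sxy = stats
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    w1 = (n * sxy - sx * sy) / denom
    w0 = (sy - w1 * sx) / n
    return w0, w1


# Hypothetical toy data on the line y = 1 + 2x, split into two chunks.
chunk_a = ([0.0, 1.0, 2.0], [1.0, 3.0, 5.0])
chunk_b = ([3.0, 4.0], [7.0, 9.0])
w0, w1 = fit(merge(partial_sums(*chunk_a), partial_sums(*chunk_b)))
print(w0, w1)  # 1.0 2.0
```

The same merge-then-solve pattern extends to many features, where each chunk's summary becomes the matrix X^T X and the vector X^T y.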

Stochastic Gradient Descent
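Stochastic gradient descent avoids a closed-form solve: it nudges the weights after each example, using the gradient of that example's squared error. The following is a minimal serial sketch for a one-feature linear model y = w0 + w1 * x, assuming a fixed learning rate, a fixed epoch count, and per-epoch shuffling (all hypothetical choices, not taken from the slide).

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, epochs=500, seed=0):
    """Fit y = w0 + w1 * x by stochastic gradient descent on squared error."""
    rng = random.Random(seed)  # seeded for reproducible shuffles
    w0, w1 = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # visit examples in a fresh random order each epoch
        for i in indices:
            # Gradient of 0.5 * (prediction - y)^2 with respect to (w0, w1)
            error = (w0 + w1 * xs[i]) - ys[i]
            w0 -= lr * error
            w1 -= lr * error * xs[i]
    return w0, w1


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0 + 2.0 * x for x in xs]  # noise-free toy data on y = 1 + 2x
w0, w1 = sgd_linear_regression(xs, ys)
print(round(w0, 3), round(w1, 3))
```

For this model a single update is stable only when lr * (1 + x^2) < 2, so large feature values call for a smaller learning rate or rescaled features.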

Serial vs Parallel SGD
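Serial SGD updates one shared set of weights one example at a time. One common way to parallelize it is one-shot parameter averaging (the scheme studied by Zinkevich et al., 2010): each worker runs ordinary SGD on its own shard of the data, and the resulting weights are averaged. The sketch below assumes that scheme; the slide does not say which parallel variant it compared, and the shard split, learning rate, and toy data are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def local_sgd(shard, lr=0.05, epochs=1000, seed=0):
    """Plain serial SGD for y = w0 + w1 * x on one shard of (x, y) pairs."""
    rng = random.Random(seed)
    data = list(shard)  # copy so shuffling does not touch the caller's list
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            error = (w0 + w1 * x) - y
            w0 -= lr * error
            w1 -= lr * error * x
    return w0, w1


def parallel_sgd(pairs, n_workers=2):
    """One-shot parameter averaging: train on disjoint shards, then average weights."""
    shards = [pairs[k::n_workers] for k in range(n_workers)]  # round-robin split
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(local_sgd, shards))
    w0 = sum(w[0] for w in results) / len(results)
    w1 = sum(w[1] for w in results) / len(results)
    return w0, w1


pairs = [(float(x), 1.0 + 2.0 * x) for x in range(6)]  # toy data on y = 1 + 2x
w0, w1 = parallel_sgd(pairs)
print(round(w0, 3), round(w1, 3))
```

Averaging is not equivalent to serial SGD in general; it works cleanly here because every shard lies on the same line, so each worker converges to the same weights. Threads only show the structure: under CPython's global interpreter lock they give little CPU speedup, so a real deployment would run each shard in a separate process or on a separate cluster node.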

Big Data Landscape: Apps, Infrastructure, Data Semantics

Landscape

Grad Student Response #1

How Big is Big? How is Big Data measured?
As I understand it, the term big data doesn’t refer directly to the size of the data itself. What the term might mean is that the demand for data (storage/transfer/analysis) has gone beyond what relational databases can control (or handle): it is too big to handle. How it is measured, I really don’t know. Server storage keeps increasing (5 TB, 10 TB, 50 TB, 100 TB…), and RDBMSs like Oracle seem to be keeping up with it, but then again I don’t know exactly what measure is being used.

Is Big Data relevant to you professionally?
Indeed it is, even though I am not using or practicing it daily. I am really interested in learning it.

Is Big Data relevant to you personally?
Very relevant, and it is a topic that drove me to pursue a master’s degree.

Grad Student Response #2

How Big is Big? How is Big Data measured?
Big data is a term for data sets that are too large and complex to process with traditional data management processes and tools. Its data points and data types depend on, and are measured by, the parameters set by each organization.

Where does Big Data come from?
Big data can come from various sources, which can be categorized as internal or external contributors.

What is Big Data good for?
Big Data is good for complex and large data sets that exist within relational databases and may require object-oriented programming.

Would you like to see Big Data incorporated in your courses?
Yes. I think we live in a period in which we are inundated by social media, numbers, photographs, and other forms of data, which require us to be well versed in storage, maintenance, and interface design so that we are better able to parse the Big Data we encounter on a daily basis.

Undergrad Student #1

Is Big Data relevant to you personally?
Yes. Since my current major is Business Application Development, I can see myself having many opportunities to work not only with technologies for building user interfaces but also with technologies for storing user information, and the techniques used to understand that data could be another opportunity for the business.

Would you like to see Big Data incorporated in your courses?
Yes. I would like to see our courses include some of the techniques that corporations use today to understand the relationship between their data and the problems they need to address: for example, how they decide which part of their big data provides the most helpful information for their problem, and how they interpret the results of their data analysis, such as how they decide a result is accurate and meaningful enough to act on.

Do you have any questions about Big Data?
Big data is a pretty interesting and useful topic. It would be nice to have more background information to help our understanding.

Undergrad Student #2

How Big is Big? How is Big Data measured?
The survey is asking rather easy conceptual questions about big data. Big data is easy to understand at that level: we finally have the technology to store, retrieve (cheap memory), and analyze (with proper languages) data at magnitudes that were impossible before. Instead of just phone-book-style data, people can gather every relevant or even possibly relevant piece of information about anything (often, but not limited to, customers of a business). I have read articles about how some companies (mostly credit card companies, if I remember correctly) can tell if a woman is pregnant before she even knows herself, or can predict divorce rates a year in advance quite reliably. All this from their spending habits and deviations from those habits.

While all this is fascinating, I don't have any real interest in learning about big data at a conceptual level like this. If big data is to be relevant in a class, the class needs to show HOW all this is done. Teach the language, teach the search and statistical algorithms, or even the methods people use to collect big data (the petabytes aren't being entered by hand).

Students should come away from classes or lectures on big data with some practical knowledge of the subject; otherwise we're just applying a name to something people generally understand: organizations collect and analyze as much data as they can, and recent technology has made that amount of data staggeringly large. The keywords and buzzwords are nice for sounding like an expert, but the how-to is generally more important.

Student Response #4

How Big is Big? How is Big Data measured?
Big data is a recently coined term that describes the trend of an exponentially increasing amount of data stored by organizations for business use. Very often these data sets are extremely big, such as 16 petabytes. This data is measured by the memory space it occupies. Thus, 16 petabytes of big data occupies approximately 16 × 10^15 bytes of memory.
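The unit arithmetic can be checked directly: in decimal (SI) units one petabyte is 10^15 bytes, so 16 petabytes is 16 × 10^15 = 1.6 × 10^16 bytes. Storage tools sometimes report the binary pebibyte (2^50 bytes) instead, which gives a slightly larger figure. The constant names below are illustrative.

```python
# Decimal (SI) petabyte versus binary (IEC) pebibyte, in bytes.
PETABYTE = 10 ** 15
PEBIBYTE = 2 ** 50

si_bytes = 16 * PETABYTE    # 16 PB
iec_bytes = 16 * PEBIBYTE   # 16 PiB, equal to 2**54 bytes

print(f"16 PB  = {si_bytes:.2e} bytes")   # 16 PB  = 1.60e+16 bytes
print(f"16 PiB = {iec_bytes:.2e} bytes")  # 16 PiB = 1.80e+16 bytes
```

The roughly 12.6% gap between the two units is why capacity figures can disagree depending on which convention a vendor or tool uses.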

Where does Big Data come from?
Big Data could come from different sources, such as emails, social-networking sites, sensors on the web, sensors installed on other tracking devices, or line-of-business applications.

Is Big Data relevant to you professionally?
Yes. In my previous work as a market researcher, we always needed to gather information and analyze it for business decision making. The technologies for gathering big data and the techniques used to analyze and filter data are also considered extremely helpful for a career.

Data Warehouse Course
Student Comments:

Very informative, content-rich course; it covers the latest technologies, trends, and skills in data warehousing, data management, and data analysis. I would recommend including this course among the required courses for the MS in CIS program with a concentration in Database Management and BI.

Relevance to job opportunities and cutting-edge technologies.

This is probably the most useful course I have taken at Boston University. I have used every bit of what this professor taught, every night at work. As a result of this course, I have contributed to my employer, a data mining company, in ways that had never been done before. For the first time in my 8-year career, I have planned, designed, and augmented a Data Warehouse from scratch. I have configured an analysis server and reported using MDX queries.

This professor has been helpful in many ways. He has guided me through some Data Warehouse design projects at work. Moreover, he has been available to work with me and others after class and on weekdays.

Road map

A Archeology

Professor Mark Eramian et al. have been awarded $548,000 through the Digging into Data Challenge (National Endowment for the Humanities) to help archaeologists find answers to questions hidden in thousands of images and text files generated from field sites around the world.

B Biology

Recently, a researcher wanted to ascertain whether a search against GQ-Pat could provide novel insight into his work related to a specific gene, the cAMP Responsive Element Modulator.

Reporting to the VP of R&D:

Apply data mining and machine learning techniques to develop better search and content discovery in the field of patents.

Invent new ways to index tens of millions of documents with semantic information.

QUIZ?

Z Zymurgy

(hint: beer)

Quiz:

Nathan Kohn, BU MET

[email protected]

Stanislav Seltser, BU MET

[email protected]