Previously known as
Think Big. Move Fast.
brought to you by
SolidQ
• Born in 2002 in USA and Spain
• Established in 2007 in Italy
• More than 1000 customers and more than 200 consultants worldwide
• Dedicated to Data Management on the Microsoft Platform
• Book Authors, Conference Speakers, SQL Server MVPs and Regional Directors
• www.solidq.com
Davide Mauri
• 18 Years of experience on the SQL Server Platform
• Specialized in Data Solution Architecture, Database Design, Performance Tuning, Business Intelligence
• Microsoft SQL Server MVP
• President of UGISS (Italian SQL Server UG)
• Mentor @ SolidQ
• Video, Book & Article Author
• Regular Speaker @ SQL Server events
• Projects, Consulting, Mentoring & Training
Data Science
Renaissance 2.0
“Companies are collecting mountains of information about
you, to predict how likely you are to buy a product,
and using that knowledge to craft a marketing message
precisely calibrated to get you to do so”
Business Week Magazine
1994
Data Science
• Extraction of knowledge from data
• So, what’s new?
• Nothing. Except that it’s now economical and fast.
• It’s now applicable to everything. And we have a lot of data produced every day that can be used to extract knowledge
Data Science
Decisions ← Knowledge ← Information ← Data
Data Science
• A Sum Of
  • Statistics
  • Mathematics
  • Machine Learning
  • Data Mining
  • Computer Programming
  • Data Engineering
  • Visualization
  • Data Warehousing
  • High Performance Computing
• To support (Informed) Decision Making
  • Data-Driven Decisions
Data Scientist
• IBM
  • A data scientist represents an evolution from the business or data analyst role.
  • The formal training is similar, with a solid foundation typically in computer science and applications, modeling, statistics, analytics and math.
  • What sets the data scientist apart is strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge.
  • It's almost like a Renaissance individual who really wants to learn and bring change to an organization.
Algorithms
• Algorithms are the new gatekeepers
  • http://www.slideshare.net/socialisten/algorithms-are-the-new-gatekeepers
  • There is simply too much data for a human to analyze!
  • They decide
    • What we find
    • What we see
    • What we buy
• Data is the foundation upon which algorithms work
  • Better Data leads to Better Results
• Data-Driven Decisions will be a MUST in the next years!
  • Data Scientists will help companies to leverage their most valuable asset: Data
Modern Data Environment
[Diagram labels: Master Data; Structured Data; Unstructured Data; EDW; Data Mart; Big Data; BI Environment; Analytics Environment]
Big Data
The 3 V
No, the 4 V!!!
No, no, the 5 V!!!!!
6 V!!!
http://www.ibmbigdatahub.com/infographic/four-vs-big-data
Big Data
• Volume, Velocity, Variety, Veracity… V<your-v-here>
• Data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process the data within a tolerable elapsed time
• Grid Computing, Parallel Computing needed
  • keep processing time reasonable
  • provide scalability
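The split, process, combine idea behind grid and parallel computing can be sketched on a single machine. This is only an illustration under stated assumptions: the function names (`count_words`, `chunked`, `parallel_word_count`) are hypothetical, word counting stands in for the real workload, and a thread pool stands in for a cluster. In CPython, threads show the structure rather than a real speed-up for CPU-bound work.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_word_count(lines, chunk_size=2, workers=4):
    """Process chunks concurrently, then merge the partial counts (reduce step)."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_words, chunked(lines, chunk_size)):
            total.update(partial)
    return total
```

Because each chunk is processed independently, adding workers (or machines, in a real grid) is what keeps processing time reasonable as the data grows.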
Big Data Data
• Paradigm: “Store Now, Figure Out Later”
• Data is the new resource. Never throw it away!
• Unstructured Data
  • Text Files
  • Images
  • Sounds
• Structured/Semi-Structured Data
  • Sensors
  • Transactions
  • Logs
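One way to read “Store Now, Figure Out Later” is schema-on-read: keep every raw record untouched and apply structure only when a question is asked. The sketch below assumes a plain Python list as the store and a hypothetical `temp_c` field in some JSON sensor records; neither comes from the slides.

```python
import json


def store_raw(store, record_text):
    """Store step: keep the raw text as-is, with no schema enforced."""
    store.append(record_text)


def read_temperatures(store):
    """Figure-out-later step: impose a schema only at read time."""
    readings = []
    for raw in store:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not JSON: it stays stored, but is skipped for this question
        if isinstance(record, dict) and "temp_c" in record:
            readings.append(float(record["temp_c"]))
    return readings
```

Nothing is thrown away: records that do not answer today's question remain in the store for a different question tomorrow.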
Data Storage
• RDBMS
  • SQL Server
• Hadoop
  • HDInsight
  • Hortonworks Data Platform
• Distributed File (Eco)System
  • CSV
  • JSON
  • *.*
Data Storage
• Hadoop Ecosystem
http://hortonworks.com/hadoop-modern-data-architecture/
Data Science & Big Data
• Data Science != Big Data
• Data Science is not only for Big Data
• Data Science can be applied to Big Data
• Data Science starts from Small Data
  • 1) find the algorithm that extracts knowledge
  • 2) measure the algorithm's results in terms of probability
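Step 2 above, measuring results in terms of probability, could look like the following on a small labeled sample. This is a minimal sketch, not the method from the talk: the function name `accuracy_with_interval` is hypothetical, and the interval uses the simple normal approximation, which is rough for very small samples.

```python
import math


def accuracy_with_interval(predictions, labels, z=1.96):
    """Return (accuracy, low, high) with a normal-approximation interval.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels)
    correct = sum(p == y for p, y in zip(predictions, labels))
    acc = correct / n
    margin = z * math.sqrt(acc * (1 - acc) / n)
    # Clamp to [0, 1] because a probability cannot leave that range.
    return acc, max(0.0, acc - margin), min(1.0, acc + margin)
```

On small data the interval is wide, which is the point: it shows how much trust the measured accuracy deserves before the algorithm is applied to more data.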
Machine Learning
• Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data. (Wikipedia)
  • For example, a machine learning system could be trained on email messages to learn to distinguish between spam and non-spam messages. After learning, it can then be used to classify new email messages into spam and non-spam folders.
• Flavors
  • Supervised
  • Unsupervised
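The spam example above is a supervised learning task: the system learns from messages that already carry a label. A minimal sketch follows, assuming a tiny naive Bayes classifier with add-one smoothing written with the standard library only. The class name `NaiveBayesSpamFilter` and the labels `"spam"` and `"ham"` are illustrative choices, not something named in the slides.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Tiny multinomial naive Bayes classifier with add-one smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = Counter()

    def train(self, text, label):
        """Learn from one labeled message (the supervised part)."""
        if label not in self.LABELS:
            raise ValueError("label must be 'spam' or 'ham'")
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        """Return the label with the highest log-probability score."""
        if any(self.message_counts[label] == 0 for label in self.LABELS):
            raise ValueError("train on at least one message per label first")
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in self.LABELS:
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior of the label plus the log likelihood of each word.
            score = math.log(self.message_counts[label] / total_messages)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
            scores[label] = score
        return max(scores, key=scores.get)
```

After a few calls to `train`, `classify` sorts new messages into the two folders. An unsupervised system would instead receive the messages without labels and have to find the groups on its own.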
Data Analysis
• Common Data Scientist Tools
  • R
  • Weka
  • Octave
  • Scikit-Learn
• Common Data Scientist Languages
  • Python
  • Scala
  • F#
Resources
• https://www.coursera.org/
  • Data Scientist Specialization
• https://www.khanacademy.org/
  • Math
• http://www.osservatori.net/business_intelligence
  • Italian Big Data Market Analysis Resources
• http://www.solidq.com/consulting/
  • Data Science Services
  • Big Data / Business Intelligence / Data Warehousing