Rapid Advances in Computer Science and Opportunities for Society
European CS Presentation, October 2010
Alfred Spector, VP Research and Special Initiatives
Rapid Advances in Computer Science & Opportunities for Society
Abstract
Information and Communication Technologies have had a rapid impact on society and, amazingly, the pace of innovation continues to accelerate. This innovation is catalyzed by ever-increasing hardware and networking capabilities, the growth in internet usage, as well as important advances in basic and applied computer science. In this talk, I will describe some of the research that Google is undertaking (for example, in machine translation, semantic processing, and information management) and discuss some of the likely beneficial impacts on our society, for example in science, the humanities, education, philanthropic activities, and more. I'll conclude my presentation with some interesting challenges from both a technology and policy point of view.
Outline
- Google
- Prodigiousness
- Advances in the Field (examples)
  - Translation
  - Speech
  - Vision
  - Cloud-based collaboration around structured data
  - Operations Research
  - Semantic Processing
- Beneficial Societal Impacts (examples)
  - Earth Engine
  - Google Health
  - Other Health Efforts
  - Crisis Response
  - Digital Humanities
  - Education
- Technical Themes
- Challenges
Mission
Organizing the world's information and making it universally accessible and useful
Google and Commerce
- Over 1 million AdWords advertisers worldwide
- Over 1 million AdSense publishers worldwide
- Via the Google Ad Network, AdSense publishers reach over 80% of global internet users in 100 countries and 20 languages
- YouTube is monetizing over a billion video views per week globally
- In 2009, Google generated $54 billion of economic activity for American businesses, website publishers, and non-profits
Prodigiousness
Giga 10^9, Tera 10^12, Peta 10^15, Exa 10^18, Zetta 10^21
- Publicized: Bigtable of 70 petabytes, 10M ops/sec
- Warehouse computing possibilities: 100 x 10 x 20 x 20 x 40 = 16,000,000 nodes...
Some representative numbers
- Storage: 10^18 -> 10^20-21
- Users: 10^9 -> 10^10
- Devices: 10 -> 10^12
- Network: 10^20 now -> 10^21/yr (32 KB/sec for 1B people)
- Apps: 10^5 -> 10^6-7 or more
  - E.g., embedded car systems: 30-50 ECUs, 100M lines of code
A variety of science and engineering challenges
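Two of the figures above can be checked with simple arithmetic. The sketch below multiplies out the warehouse-computing factors and estimates a year of traffic at 32 KB/sec for one billion people. It assumes 1 KB = 1000 bytes and a 365-day year; the slide does not say what each factor in the product stands for, so they are kept as plain numbers.

```python
import math

# Warehouse-computing product as written on the slide: 100 x 10 x 20 x 20 x 40.
# The slide does not label the factors, so they stay unnamed here.
factors = [100, 10, 20, 20, 40]
nodes = math.prod(factors)

# Network estimate: 32 KB/sec sustained for 1 billion people over one year.
# Assumptions (not from the slide): 1 KB = 1000 bytes, 365-day year.
bytes_per_sec_per_person = 32 * 1000
people = 10**9
seconds_per_year = 365 * 24 * 60 * 60
bytes_per_year = bytes_per_sec_per_person * people * seconds_per_year

print(f"{nodes:,} nodes")
print(f"{bytes_per_year:.2e} bytes/year")
```

Under these assumptions the yearly total comes out at roughly 10^21 bytes, which is consistent with the 10^21/yr network figure.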
Focus on Innovation that Benefits our Users
Focus on Research and Engineering
- Commitment to advancing technology
- Rich domain of work due to our mission
- Grand challenge problems
- Internal consensus that production issues are often as challenging/fun as pure invention
- Technical leverage
  1. Google Common Distributed System
  2. A Focus on Services
  3. Empiricism and a Holistic Approach to Design
Our Innovation Culture
- Focus on talent
- Distributed across the organization
  - Impacting Google necessitates broad, diverse involvement in science and engineering
  - Research is done both in our research team and in our engineering organization, organized opportunistically
- Teams benefit greatly
  - From mutual talent
  - From Google's comparative advantages due to our scale and broad use
  - From service-based architecture ("ease" of working in vivo)
Ideal Distributed Computing
Devices
Research Challenges in Ideal Distributed Computing
- Alternative designs that would give better energy efficiency at lower utilization
- Server OS design aimed at many highly-connected machines in one building
- Unifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduce
- Latency reduction
- A general model of replication, including consistency choices, explained and codified
- Machine learning techniques applied to monitoring/controlling such systems
- Automatic, dynamic, world-wide placement of data & computation to minimize latency and/or cost, given constraints on
- Building retrieval systems that efficiently and usably deal with ACLs
- Holistic models of privacy
- The user interface to the user's diverse processing and state
Totally Transparent Processing
- D: The set of all end-user access devices
  Personal computers, phones, media players, readers, telematics, set-top boxes, appliances, health devices, ...
- L: The set of all human languages
  Current languages, historical languages, other forms of human notation, possible language specialization, formal languages, ...
- M: The set of all modalities
  Text, image, audio, video, graphics, other sensor-based data, ...
- C: The set of all corpora
  The normal web, the deep web, periodicals, books, catalogs, blogs, geodata, scientific datasets, health data, ...
For all d in D, all l in L, all m in M, and all c in C: totally transparent processing
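One way to read the "for all d, l, m, c" statement is as a requirement over the Cartesian product of the four sets. The sketch below enumerates that product for tiny hand-picked samples; the sample members are illustrative only, and the real sets are open-ended.

```python
from itertools import product

# Tiny illustrative samples of each set; the real sets are open-ended.
D = ["personal computer", "phone", "set-top box"]   # end-user access devices
L = ["English", "Urdu", "Latin"]                    # human languages
M = ["text", "audio", "video"]                      # modalities
C = ["normal web", "books", "geodata"]              # corpora

# Every (device, language, modality, corpus) combination that would have to work.
combinations = list(product(D, L, M, C))
print(len(combinations))
print(combinations[0])
```

Even with three members per set there are 3^4 = 81 combinations, which hints at why the goal is stated as a research direction instead of a feature list.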
"Hybrid" Intelligence
- To extend the capability of people, not in isolation
- Aggregation of empirical signal is exceedingly valuable
- Ex.:
  - Feedback in information retrieval, e.g., in ranking or spelling correction
  - Machine learning, e.g., image content analysis, speech recognition with semi-supervised learning
Research Challenges in Transparent Computing & Hybrid Intelligence
- Endless applications with very new user interface implications
- Addressing limits to data
- Techniques to integrate user feedback in acceptable fashions
- Approaches to new signal
- Explanation, scale, and variance minimization in machine learning
- Information fusion/learning across diverse signals; the Combination Hypothesis, more generally
- Usability: devices and subpopulations
- Privacy
Domains of Application
- Search engines
- Translation
- Speech recognition
- Vision
- Remedial education
- Personal health
- Epidemiology
- Economic prediction
- Societal/environmental optimization
- Social networking in ever more clever/useful ways
- Humanities and social sciences
- Multi-player gaming
Translation
Machine Translation at Google
Statistical Machine Translation
- Model translation process with a statistical model
- Learning from data: monolingual & bilingual
  - More data, better translation quality
- Computationally expensive approach
  - Models have many hundreds of gigabytes of data (Moore's law helps here)
- Applying syntax information as a signal
Results
- Much better translation quality
- Ongoing progress
- More research groups
- 58 languages (so far); recently: Haitian Creole, Urdu, Georgian, Latin
Grand Challenges
- Morphology: translating into morphologically rich languages, e.g., Russian, Hungarian; need morphology-aware translation models
- Reliability: some translation mistakes are more severe than others
  - hotel - Montreal; Heath Ledger - Tom Cruise
  - Research: how to detect crazy translations?
- Long-distance reordering: simple case SVO to SOV; (one) approach: parse source & reorder
  - Issue: parsing accuracy for out-of-domain texts
- Finding all training data
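The "parse source & reorder" idea for the simple SVO to SOV case can be sketched as follows. This is a toy illustration, not the production approach: the `Clause` type and `reorder_svo_to_sov` function are hypothetical names, and the parse is supplied by hand where a real system would rely on a parser (which is exactly where the accuracy issue noted above comes in).

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Clause:
    """A hand-built 'parse' of a simple clause; a real system would get this from a parser."""
    subject: List[str]
    verb: List[str]
    obj: List[str]

def reorder_svo_to_sov(clause: Clause) -> List[str]:
    """Emit the source words in subject-object-verb order before translation."""
    return clause.subject + clause.obj + clause.verb

clause = Clause(subject=["the", "officer"], verb=["arrested"], obj=["three", "people"])
reordered = reorder_svo_to_sov(clause)
print(" ".join(reordered))
```

If the parser misidentifies the object span, the reordering moves the wrong words, so errors on out-of-domain text propagate directly into the translation input.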
How about Poetry?
Paper at EMNLP 2010 conference: "Poetic" Statistical Machine Translation: Rhyme and Meter. D. Genzel, J. Uszkoreit, F. Och. EMNLP 2010.
Approach
- Enforce meter and rhyme as extra constraints (similar to language model)
- E.g., iambic pentameter: stress pattern 0101010101
- Produce most probable translation that obeys constraints
- (Function follows form)
Example output (couplet in amphibrachic tetrameter):
An officer stated that three were arrested
and that the equipment is currently tested
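The "most probable translation that obeys constraints" step can be illustrated with a deliberately simplified sketch. It filters a fixed list of candidates by stress pattern and keeps the best-scoring one; it does not reproduce how the paper handles the constraints. The `Candidate` type, the scores, and the hand-assigned stress strings are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional

class Candidate(NamedTuple):
    text: str
    log_prob: float  # made-up model score; higher is better
    stress: str      # one character per syllable: "1" stressed, "0" unstressed

def best_in_meter(candidates: List[Candidate], pattern: str) -> Optional[Candidate]:
    """Return the highest-scoring candidate whose stress string equals the pattern."""
    matching = [c for c in candidates if c.stress == pattern]
    return max(matching, key=lambda c: c.log_prob) if matching else None

# Stress strings are hand-assigned for this example, not computed.
candidates = [
    Candidate("An officer said that three people were arrested", -4.1, "0100101100010"),
    Candidate("An officer stated that three were arrested", -5.3, "010010010010"),
    Candidate("Three were arrested, an officer stated", -6.0, "10010010010"),
]

amphibrachic_tetrameter = "010" * 4   # four unstressed-stressed-unstressed feet
iambic_pentameter = "0101010101"      # pattern quoted on the slide

choice = best_in_meter(candidates, amphibrachic_tetrameter)
print(choice.text if choice else None)
print(best_in_meter(candidates, iambic_pentameter))
```

Note that the unconstrained best candidate (the first one) loses to a lower-scoring line that fits the meter, which is the trade-off summarized as "function follows form".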
Speech
Goals for Speech Technology at Google
- Much of the world's information is spoken; we need to recognize it before we can organize it
  - YouTube transcription and translation (breaking the language barrier for YouTube access)
  - Voicemail transcription
- Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
  - Spoken input and output is key to usability
- Our goal is completely ubiquitous availability of speech I/O (every application/service, every usage scenario, every language)
How do we get there?
- Delivery from the cloud: support constant iteration and refinement
- Operating at large scale: train huge statistical models on huge amounts of data
- Learning from use, without human transcription
Challenges
- How do we grow the model to take advantage of the data (richer models of accent, speaker, noise, etc.)?
- Huge computational demands
- Infrastructure demands: parallelization; leverage Google software environment
Training Acoustic Models w/ Unsupervised Learning
Supervised vs. unsupervised training: hours of data vs. error rate
Vision
Computer Vision
Advance the state of the art in 3 key areas of image/audio/video analysis and apply results to our multimedia products
- Semantic Interpretation: generate human-understandable description of content (e.g., auto-tagging videos on YouTube, image annotation, porn classification, etc.)
- Matching: find similar entities from a large corpus (e.g., find similar on image search, video fingerprinting for YouTube, etc.)
- Synthesis: generate better images/video by understanding the statistics of a large corpus of images (e.g., better facades in 3D buildings on Google Earth, automatic shadow removal from aerial images, etc.)
Semantic Interpretation sample problem: Video Annotation
- Video metadata has a cognitive cost on the user because they have to type it in, be careful about what keywords they use, and in general try to make their video searchable
- Many uploaders don't have the motivation or energy to provide proper metadata
- Noisy metadata hurts everyone: spam, misspellings, 1337, acronyms, etc.
Cloud-based Computing: Structured Data
Structured Data on the Web
- Discovery and search for structured data
  - The deep Web: significant gap in coverage
  - Structured tables on the Web: not leveraged in search
- Enable easy creation, management, sharing, and publishing of structured data
  - Fusion Tables: www.google.com/fusiontables
Google Fusion Tables: host, manage, collaborate on, visualize, and publish data tables online
What can I do with Fusion Tables?
- Host data online, and stay in control
  - Control can be at the level of columns or rows
- Re-use data without making copies
- Collaborate on the details
  - Merge data from multiple tables
  - Comment on individual rows, columns, or cells
- Make a map (or chart or timeline) in minutes
- Manage data via our site or an API
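To make "merge data from multiple tables" concrete, the sketch below joins two small tables on a shared key column. It is plain Python and does not use or imitate the Fusion Tables API; `merge_tables` and both example tables are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]

def merge_tables(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Join rows that share the same value in `key` (an inner join)."""
    index: Dict[str, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    merged = []
    for row in left:
        for match in index.get(row[key], []):
            merged.append({**row, **match})
    return merged

# Two small example tables owned by different people, sharing a "country" column.
capitals = [{"country": "Peru", "capital": "Lima"},
            {"country": "Brazil", "capital": "Brasilia"}]
continents = [{"country": "Peru", "continent": "South America"},
              {"country": "Brazil", "continent": "South America"}]

merged = merge_tables(capitals, continents, key="country")
print(merged)
```

A hosted service could expose such a merge as a view over the source tables, so each owner keeps control of their own data; the sketch copies rows only for simplicity.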
Fusion Table Example Gallery
Easy Data Upload, Attribution Recorded
Easily Create Informative Maps
Baby steps towards the dream platform
DEMO: Circle of Blue
Cloud-based Computing: Prediction
Prediction API
Machine learning as a web service: smart apps for every developer
1. Upload: upload your training data to Google Storage
2. Train: build a model from your data
3. Predict: make new predictions
- RESTful HTTP service
- Simple integration
Under the API
- Under the hood: many classifiers/regressors
- Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
- Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
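The upload, train, predict workflow can be mimicked locally to show the shape of the interaction. The sketch below is not the Prediction API and makes no HTTP calls: `train` and `predict` are hypothetical local functions, the "uploaded" data is an in-memory list, and the model is a deliberately naive word-count classifier.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def train(rows: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Step 2: build a per-label word-count model from (label, text) rows."""
    model: Dict[str, Counter] = defaultdict(Counter)
    for label, text in rows:
        model[label].update(text.lower().split())
    return dict(model)

def predict(model: Dict[str, Counter], text: str) -> str:
    """Step 3: pick the label whose training words overlap the input the most."""
    words = text.lower().split()
    return max(model, key=lambda label: sum(model[label][w] for w in words))

# Step 1 stand-in: the "uploaded" training data is just an in-memory list here.
training_rows = [
    ("english", "the quick brown fox"),
    ("english", "the lazy dog"),
    ("spanish", "el zorro marron"),
    ("spanish", "el perro perezoso"),
]

model = train(training_rows)
print(predict(model, "the dog"))
print(predict(model, "el zorro"))
```

In a hosted service the same three steps would sit behind HTTP requests, with the model kept server-side so the application only sends inputs and receives labels.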
Operations Research and Optimization
Operations Research Challenges
- Size: optimization is often NP-complete
  - Increasing the size by 1 doubles the search space
  - The tools are barely keeping up with the problems
- Uncertainty: data is often fuzzy
  - How do you route cars when there are roadblocks, new one-ways, traffic jams?
  - Can you use optimization in an on-line algorithm connected to users?
  - How well can you optimize against forecasted data? How do you react if the forecast is bad?
- User expectation and requirements: the definition of problems is also unclear
  - What is the objective? What is a good solution? Can I violate this requirement? By how much?
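The claim that adding one to the problem size doubles the search space holds, for example, when each unit of size is one yes/no decision. The sketch below counts assignments by brute force for small sizes; treating the problem as a set of binary decision variables is an assumption made for illustration.

```python
from itertools import product

def count_assignments(n: int) -> int:
    """Count all 0/1 assignments of n binary decision variables by enumeration."""
    return sum(1 for _ in product((0, 1), repeat=n))

# Search-space sizes for n = 1..10 binary decisions.
sizes = [count_assignments(n) for n in range(1, 11)]
print(sizes)
```

Each entry is twice the previous one, so exhaustive enumeration becomes impractical quickly as the number of decisions grows.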
Operations Research Opportunities
- Machine learning can help us in several ways
  - By providing guidance towards good solutions
  - By qualifying valid solutions
  - By reducing the search space
- Large computing resources mean we can try a bit harder
- Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
- Having all our code open source means we can collaborate on building the best set of tools
  - See http://code.google.com/p/or-tools
Semantic Processing
Web Inference and Learning
- Goal: better understanding of Web content and user intent
- Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
  - How to interpret this term in this context?
  - Does this sentence answer that question?
  - Will this user click on that ad?
- Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
Rondonia, Brazil: 1975, 1989, 2001 (UNEP Atlas of our Changing Environment)
A Sampling of Other Use Cases
- Disease Early Warning: remote surveillance of disease and prediction of epidemics
- Population Census: supplements traditional census and mapping in developing regions
- Humanitarian Crisis Mapping: can detect and monitor a growing range of crisis types
- Water Resources: monitor water quality and availability, and alleviate water shortage problems
- Food Security: famine early warning, rainfall and water requirements estimations, agricultural production estimates, and irrigation and fertilizer supply & demand
- Global Education Programs
Parallel Geo-Processing "in the cloud" (a brief illustration)
1. Original image is divided into 256px sub-units
2. Sub-units are distributed to separate machines, where they can be processed in parallel
3. Thousands can be processed simultaneously
4. Result is reassembled into a finished image
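The divide, distribute, process, reassemble sequence can be sketched on a single machine. In this illustration a thread pool stands in for the separate machines, the image is a plain list of pixel rows, and inverting 8-bit values is a placeholder for the real per-pixel computation; all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Image = List[List[int]]
TILE = 256  # sub-unit edge length in pixels, as on the slide

def split(image: Image, tile: int = TILE) -> List[Tuple[int, int, Image]]:
    """Cut the image into (row, col, sub-image) tiles of at most tile x tile pixels."""
    tiles = []
    for r in range(0, len(image), tile):
        for c in range(0, len(image[0]), tile):
            tiles.append((r, c, [row[c:c + tile] for row in image[r:r + tile]]))
    return tiles

def process(tile_item: Tuple[int, int, Image]) -> Tuple[int, int, Image]:
    """Per-pixel work on one tile; inversion is only a placeholder operation."""
    r, c, pixels = tile_item
    return r, c, [[255 - v for v in row] for row in pixels]

def reassemble(tiles: List[Tuple[int, int, Image]], height: int, width: int) -> Image:
    """Paste processed tiles back at their original offsets."""
    out = [[0] * width for _ in range(height)]
    for r, c, pixels in tiles:
        for i, row in enumerate(pixels):
            out[r + i][c:c + len(row)] = row
    return out

height, width = 512, 768
image = [[(x + y) % 256 for x in range(width)] for y in range(height)]

tiles = split(image)
with ThreadPoolExecutor() as pool:  # threads stand in for separate machines
    processed = list(pool.map(process, tiles))
result = reassemble(processed, height, width)
print(len(tiles), "tiles")
```

Because each tile is processed independently of the others, the work scales out by adding workers, which is what makes thousands of simultaneous sub-units feasible.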
Overview
Global-scale earth observation and informatics platform
- For public benefit and to support emerging green economy
- Help science come out of research lab and into operational use at scale
- Unprecedented catalog of earth observation data for mining and analysis
- Promote transparency, reproducibility, collaboration, "open science"
Very fast computation of scientific map products
- Intrinsically parallel pixel processing system
- Built-in Google algorithms as well as user-supplied
- Earth Engine API for 3rd party algorithm development
- Access control, versioning, provenance
- Online and desktop versions (open source desktop version)
On a lot of useful data
- Every available Landsat and MODIS scene (more satellites coming)
- Commercial datasets (very high resolution satellite imagery)
- Environmental data (atmospheric, ocean, terrestrial)
- User-supplied (ex. in-situ data collected via Android phones)
Scale of Data
US: Satellite imagery dataset: Landsat
- Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
- World: 8000 Landsat scenes (2 TB) per covering
- Complete global coverage every 16 days
- Operating since 1972; historical archive holds ~4 PB
- US NASA EOS: approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
- ESA satellite missions: MERIS, Envisat, others
- SpotImage (France): 20M SPOT images since 1986; 10,000 new images collected daily; 5+ PB archive
- ESA: launching Sentinel-1 in 2011
Representative Examples
- Envisat: Gulf Oil Spill, June 2010 (ESA)
- MERIS: Hurricane Isabel, Sept 2003 (ESA)
- Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health: personal health dashboard
- Launched in May 2008
- Major update Sept 2010
- User controls and owns his/her data
A platform that...
- Provides a dashboard for wellness information & medical records
- Allows user to connect and interact with a broad group of "add on" services
- Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
- How would you (re)define Victorian literature?
- What are the differences between the English and Latin editions of Hobbes' Leviathan?
- How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
- US program, Summer 2010: 12 projects, 23 researchers, 15 universities
- European program, Winter 2010: 10 projects planned; $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
- Exploring computational thinking in K12 (launch late Oct)
- CS4HS: high school computer science (cs4hs.com)
- Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
- Lantern platform: wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code(TM)
- Program genesis
  - "Flip bits, not burgers" during summer holidays
  - Exposure to real-world software development
- Students paired with mentor from OS community
  - Execute to milestones laid out in accepted application
  - Stipend allows students to concentrate on OS development
- 2010: 1,026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
- Visual programming environment for Android mobile devices
- Helping people become creators (rather than consumers) of technology
- Launched in Google Labs July 12, 2010
- http://appinventor.googlelabs.com/about
CS4HS European Workshops
- École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
- ETH Zurich, Switzerland: ABZ (Ausbildungs- und Beratungszentrum für Informatikunterricht)
- Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
- Manchester University, United Kingdom: Animation11
- Oslo University, Norway: TENK
- Queen Mary University, United Kingdom: cs4fn magazine
- RWTH Aachen, Germany: Bright Brains in Computer Science
- Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
- University of Stuttgart, Germany: UniS2010
- Technion, Israel: High School Computer Science Female Students' Visits in Google: Impressions, Conceptions and Influences
- Trinity College Dublin, Ireland: Computer Programming Outreach B2C
- University of Cape Town, South Africa: Project Umonya
- University College Dublin, Ireland: CS Summer School
- University of Warsaw, Poland: Mastering Programming Skills, Workshops for Teachers
Technology Leadership: Google Code University
- Course content on current computing technologies and paradigms
- CS Curriculum Search
- Tech Talks on CS topics
- Tutorials, lecture slides, problem sets for a variety of topic areas
  - AJAX Programming
  - Algorithms
  - Distributed Systems
  - Web Security
  - Languages
  - Practical Skills (MySQL, Linux)
- http://code.google.com/edu
Supporting our Academic Institutions
- Research Awards Programs: 230+ projects funded in the last year
  - Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
  - research-awards@google.com
  - Focused Grant Program
- Visiting Faculty Program: 20 faculty (ongoing)
  - university-relations@google.com
- PhD Fellowship Program
  - 2009: 13 students supported in North America
  - 2010: 15 in North America, 16 outside North America
  - Over 150 other scholarships
- ~1000 interns worldwide
- CS4HS: 1500+ teachers (~100,000 students), US & EMEA
Final Thoughts
- Scale of communication and computing is profound
- Endless opportunity for technical growth
  - Some large themes
  - Major new application domains
- Google aims to innovate rapidly in science/technology and in value to consumers
- We are providing increased support for academic institutions in computer science and related areas
- It's a most exciting area in which to innovate
Rapid Advances in Computer Science amp Opportunities for Society
Information and Communication Technologies have had a rapid impact on society and ndashamazinglymdashthe pace of innovation continues to accelerate This innovation is catalyzed by ever-increasing hardware and networking capabilities the growth in internet usage as well as important advances in basic and applied computer science In this talk I will describe some of the research that Google is undertaking (for example in machine translation semantic processing and information management) and discuss some of the likely beneficial impacts on our society ndash for example in science the humanities education philanthropic activities and more Irsquoll conclude my presentation with some interesting challenges from both a technology and policy point of view
Abstract
OutlineGoogleProdigiousnessAdvances in the Field examples
TranslationSpeechVisionCloud-based collaboration around structured-dataOperations ResearchSemantic Processing
Beneficial Societal Impacts examplesEarth EngineGoogle HealthOther Health EffortsCrisis ResponseDigital HumanitiesEducation
A Technical ThemesChallenges
Mission
Organizing the worldrsquos information andMaking it universally accessible and useful
Google and Commerce
Over 1 million AdWords advertisers worldwideOver 1 million AdSense publishers worldwideVia the Google Ad Network AdSense publishers reach over 80 of global internet users in 100 countries and 20 languagesYouTube is monetizing over a billion video views per week globallyIn 2009 Google generated $54 billion of economic activity for American businesses website publishers and non-profits
Prodigiousness
Giga 109 Tera 1012 Peta 1015 Exa 1018 Zetta1021
Publicized Bigtable of 70 petabytes 10M opssecWarehouse computing possibilities 100 x 10 x 20 x 20 x 40 = 16000000 nodeshellipSome representative numbers
Storage 1018 -gt 1020-21
Users 109 -gt 1010
Devices 10 -gt 1012
Network 1020 now -gt1021yr 32 KBsec for 1B peopleApps 105 -gt 106-7 or more
Eg embedded car systems 30-50 ECUs 100M lines of code
A variety of science engineering challenges
Focus on Innovation that Benefits our UsersFocus on Research and Engineering
Commitment to advancing technologyRich domain of work due to our missionGrand challenge problemsInternal consensus that production issues are often as challengingfun as pure inventionTechnical leverage1 Google Common Distributed System 2 A Focus on Services3 Empiricism and a Holistic Approach to Design
Our Innovation Culture
Focus on talentDistributed across the organization
Impacting Google necessitates broad diverse involvement in science and engineeringResearch is done both in our research team and in our engineering organization organized opportunistically
Teams benefit greatlyFrom mutual talentFrom Googlersquos comparative advantages to our scale and broad useFrom service-based architecture (ldquoeaserdquo of working in vivo)
Ideal Distributed Computing
Devices
Research Challenges in Ideal Distributed Computing
Alternative designs that would give better energy efficiency at lower utilizationServer OS design aimed at many highly-connected machines in one buildingUnifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduceLatency reductionA general model of replication including consistency choices explained and codifiedMachine learning techniques applied to monitoringcontrolling such systemsAutomatic dynamic world-wide placement of data amp computation to minimize latency andor cost given constraints onBuilding retrieval systems that efficiently and usably deal with ACLsHolistic models of privacyThe user interface to the userrsquos diverse processing and state
Totally Transparent Processing
D The set of all end-user access
devices
L The set of all human languages
M The set of all modalities
C The set of all corpora
Personal ComputersPhoneMedia PlayersReadersTelematicsSet-top BoxesAppliancesHealth deviceshellip
Current languagesHistorical languagesOther forms of human notationPossible language specializationFormal languageshellip
TextImageAudioVideoGraphicsOther sensor-based datahellip
The normal webThe deep webPeriodicalsBooksCatalogsBlogsGeodataScientific datasetsHealth datahellip
For all d in D all l in L all m in M and all c in C
Totally Transparent Processing
ldquoHybridrdquo Intelligence
To extend the capability of people not in isolationAggregation of empirical signal is exceedingly valuableEx
Feedback in Information Retrieval eg in ranking or spelling correctionMachine learning eg image content analysis speech recognition with semi-supervised learning
Research Challenges in Transparent Computing amp Hybrid Intelligence
Endless applications with very new user interface implicationsAddressing limits to dataTechniques to integrate user-feedback in acceptable fashionsApproaches to new signalExplanation scale and variance minimization in machine learningInformation fusionlearning across diverse signals ndash The Combination Hypothesis more generallyUsability devices and subpopulationsPrivacy
Domains of Application
Search enginesTranslationSpeech recognitionVision
Remedial EducationPersonal healthEpidemiologyEconomic predictionSocietalenvironmental optimizationSocial Networking in ever more cleveruseful ways Humanities and Social SciencesMulti-player gaming
Translation
Machine Translation Google
Statistical Machine TranslationModel translation process with a statistical modelLearning from data monolingual amp bilingual
More data better translation qualityComputationally expensive approach
Models have many hundreds of Gigabyte of data(Moores law helps here)
Applying syntax information as a signal
ResultsMuch better translation qualityOngoing progress
More research groups 58 languages (so far)
recently Haitian Creole Urdu Georgian Latin
Grand Challenges
Morphologytranslating into morphologically rich languageseg Russian Hungarianneed morphology-aware translation models
Reliabilitysome translation mistakes more severe than others
hotel - MontrealHeath Ledger - Tom Cruise
Research How to detect crazy translations
Long-distance reorderingsimple case SVO SOV(one) approach parse source amp reorder
issue parsing accuracy for out-of-domain texts
Finding all Training Data
How about Poetry
Paper at EMNLP 2010 conferenceldquoPoeticrdquo Statistical Machine Translation Rhyme and Meter DGenzel JUszkoreit FOch EMNLP 2010
ApproachEnforce meter and rhyme as extra constraints(similar to language model)Eg iambic pentameter stress pattern 0101010101Produce most probable translation that obeys constraints(Function follows form)
Example output (couplet in amphibrachic tetrameterAn officer stated that three were arrestedand that the equipment is currently tested
Speech
Goals for Speech Technology at Google
Much of the worldrsquos information is spoken ndash we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Voicemail transcription Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech io (every applicationservice every usage scenario every language)
How do we get thereDelivery from the cloud ndash support constant iteration and refinement
Operating at large scale ndash train huge statistical models on huge amounts of data
Learning from use - without human transcription
ChallengesHow do we grow the model to take advantage of the data (richer models of accent speaker noise etc)Huge computational demandsInfrastructure demands ndash parallelization ndash leverage Google software environment
Training Acoustic Models wUnsupervised Learning
Supervised vs unsupervised training - hours of
data vs error rate
Vision
Computer Vision
Advance state-of-the art in 3 key areas of imageaudiovideo analysis and apply results to our multimedia products
Semantic Interpretation Generate human understandable description of content (eg auto-tagging videos on YouTube Image annotation porn classification etc)Matching Find similar entities from a large corpus (eg find similar on image search video fingerprinting for YouTube etc )Synthesis Generate better imagesvideo by understanding the statistics of a large corpus of images (eg better facades in 3D building on Google Earth automatic shadow removal from areal images etc)
Semantic Interpretation sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in be careful about what keywords they use and in general try to make their video searchableMany uploaders donrsquot have the motivation or energy to provide proper metadataNoisy metadata hurts everyone ndash spam misspellings 1337 acronyms etc
Cloud-based ComputingStructured Data
Structured Data on the Web
Discovery and search for structured dataThe deep Web -- significant gap in coverageStructured tables on the Web -- not leveraged in search
Enable easy creation management sharing and publishing of structured data
Fusion Tables wwwgooglecomfusiontables
Google Fusion Tables host manage collaborate on visualize and publish data tables online
What can I do with Fusion Tables
Host data online - and stay in controlcontrol can be at the level of columns or rows
Re-use data without making copies
Collaborate on the detailsMerge data from multiple tablesComment on individual rows columns or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
Easily Create Informative Maps
baby steps towards the dream platform
DEMOcircle of blue
Cloud-based ComputingPrediction
1 Upload
2 Train
Upload your training data toGoogle Storage
Build a model from your data
Make new predictions3 Predict
Machine learning as a web serviceSmart Apps for every developer
- RESTful HTTP service- Simple integration
Prediction API
Under the hood many classifiersregressorsRecent research efficient and theoretically principled methods for distributed learning (NIPS-09 HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Reseach and Optimization
Operations Research ChallengesSize Optimization is often NP Complete
Increasing the size by 1 doubles the search spaceThe tools are barely keeping up with the problems
Uncertainty Data is often fuzzy How do you route cars when there are roadblocks new one-ways traffic jamsCan you use optimization in on-line algorithm connected to usersHow well can you optimize against forecasted data how do you react if the forecast is bad
User expectation and requirementsThe definition of problems is also unclear What is the objective What is a good solution Can I violate this requirement By how much
Operations Research Opportunities
Machine Learning can help us in two waysBy providing guidance towards good solutionsBy qualifying valid solutionsBy reducing the search space
Large computing resources means we can try a bit harderCrowd-Sourcing means better data better feedback better evaluations of algorithms and solutionsHaving all our code open-source means we can collaborate on building the best set of tools
See httpcodegooglecompor-tools
Semantic Processing
Web Inference and LearningGoal better understanding of Web content and user intentMethod algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context
Does this sentence answer that question
Will this user click on that ad
Learning create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference what are the possible classes for each instance
Combination wins
Combined graph14M nodes75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing "in the cloud"
(a brief illustration)
Original image is divided into 256px sub-units
Sub-units are distributed to separate machines, where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled into a finished image
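The split / distribute / reassemble steps above can be sketched in a few lines. This is a minimal illustration, not Earth Engine code: the image is a plain list of pixel rows, a thread pool stands in for the "separate machines", and `invert` is a hypothetical per-pixel operation chosen only so there is something to compute. All function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as on the slide


def split_into_tiles(image, tile=TILE):
    """Yield (row, col, sub_image) for each tile; edge tiles may be smaller."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, c, [row[c:c + tile] for row in image[r:r + tile]]


def invert(tile_pixels):
    """Hypothetical per-pixel operation: invert 8-bit grey values."""
    return [[255 - p for p in row] for row in tile_pixels]


def process_in_parallel(image, per_tile=invert, tile=TILE, workers=4):
    """Split the image, process tiles concurrently, and reassemble the result."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    tiles = list(split_into_tiles(image, tile))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads stand in for separate machines; map() preserves tile order.
        processed = pool.map(lambda t: per_tile(t[2]), tiles)
        for (r, c, _), out in zip(tiles, processed):
            for dr, row in enumerate(out):
                result[r + dr][c:c + len(row)] = row
    return result
```

Because each tile depends only on its own pixels, the tiles can be processed in any order and on any worker; only the reassembly step needs to know where each tile came from.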
Overview
Global-scale earth observation and informatics platform
- For public benefit and to support emerging green economy
- Help science come out of research lab and into operational use at scale
- Unprecedented catalog of earth observation data for mining and analysis
- Promote transparency, reproducibility, collaboration ("open science")
Very fast computation of scientific map products
- Intrinsically-parallel pixel processing system
- Built-in Google algorithms as well as user-supplied
- Earth Engine API for 3rd party algorithm development
- Access control, versioning, provenance
- Online and desktop versions (open source desktop version)
On a lot of useful data
- Every available Landsat and MODIS scene (more satellites coming)
- Commercial datasets (very high resolution satellite imagery)
- Environmental data (atmospheric, ocean, terrestrial)
- User-supplied (ex. in-situ data collected via Android phones)
Scale of Data
US: satellite imagery dataset Landsat
- Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
- World: 8000 Landsat scenes (2 TB) per covering
- Complete global coverage every 16 days
- Operating since 1972; historical archive holds ~4 PB
US: NASA EOS approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
- ESA satellite missions: MERIS, Envisat, others
- SpotImage (France): 20M SPOT images since 1986; 10,000 new images collected daily; 5+ PB archive
- ESA launching Sentinel-1 in 2011
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns his/her data
A platform that…
- Provides a dashboard for wellness information & medical records
- Allows user to connect and interact with a broad group of "add on" services
- Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
- How would you (re)define Victorian literature?
- What are the differences between the English and Latin editions of Hobbes' Leviathan?
- How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program, Summer 2010
- 12 projects, 23 researchers, 15 universities
European program, Winter 2010
- 10 projects planned; $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
- Exploring computational thinking in K12 (launch late Oct)
- CS4HS: high school computer science (cs4hs.com)
- Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
- Lantern platform: wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program genesis
- "Flip bits, not burgers" during summer holidays
- Exposure to real-world software development
Students paired with mentor from the open source (OS) community
Execute to milestones laid out in accepted application
Stipend allows students to concentrate on OS development
2010: 1026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
Visual programming environment for Android mobile devices
Helping people become creators (rather than consumers) of technology
Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
ETH Zurich, Switzerland: ABZ, Ausbildungs- und Beratungszentrum für Informatikunterricht
Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University, United Kingdom: Animation11
Oslo University, Norway: TENK
Queen Mary University, United Kingdom: cs4fn magazine
RWTH Aachen, Germany: Bright Brains in Computer Science
Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
University of Stuttgart, Germany: UniS2010
Technion, Israel: High School Computer Science Female Students' Visits in Google: Impressions, Conceptions and Influences
Trinity College Dublin, Ireland: Computer Programming Outreach B2C
University of Cape Town, South Africa: Project Umonya
University College Dublin, Ireland: CS Summer School
University of Warsaw, Poland: Mastering Programming Skills, Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
- CS Curriculum Search
- Tech Talks on CS topics
- Tutorials, lecture slides, problem sets for a variety of topic areas
AJAX Programming
Algorithms
Distributed Systems
Web Security
Languages
Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
- Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
- Research-awardsgooglecom
- Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
- University-relationsgooglecom
PhD Fellowship Program
- 2009: 13 students supported in North America
- 2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100,000 students), US & EMEA
Final Thoughts
Scale of communication and computing is profound
Endless opportunity for technical growth
- Some large themes
- Major new application domains
Google aims to innovate rapidly in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate
Some representative numbers
Storage: 10^18 -> 10^20-21
Users: 10^9 -> 10^10
Devices: 10 -> 10^12
Network: 10^20 now -> 10^21/yr (32 KB/sec for 1B people)
Apps: 10^5 -> 10^6-7 or more
- E.g., embedded car systems: 30-50 ECUs, 100M lines of code
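The network figure above can be checked with a line of arithmetic. The sketch below assumes one reading of the slide: that "32 KB/sec for 1B people" is the sustained per-person rate behind the 10^21 per year figure, that the unit is bytes, and that 1 KB = 1000 bytes. None of these is stated on the slide, and the names are illustrative.

```python
# Assumptions (not on the slide): 1 KB = 1000 bytes, rate sustained all year.
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000 seconds
RATE_BYTES_PER_SEC = 32 * 1000       # 32 KB/sec per person
PEOPLE = 10**9                       # 1B people


def yearly_traffic_bytes(rate=RATE_BYTES_PER_SEC, people=PEOPLE,
                         seconds=SECONDS_PER_YEAR):
    """Total bytes per year if every person sustains the given rate."""
    return rate * people * seconds


total = yearly_traffic_bytes()
print(f"{total:.3e} bytes/year")  # prints 1.009e+21 bytes/year
```

Under those assumptions the total comes out at roughly 10^21 bytes per year, which matches the order of magnitude quoted for the network.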
A variety of science and engineering challenges
Focus on Innovation that Benefits our Users
Focus on Research and Engineering
Commitment to advancing technology
Rich domain of work due to our mission
Grand challenge problems
Internal consensus that production issues are often as challenging/fun as pure invention
Technical leverage
1. Google Common Distributed System
2. A Focus on Services
3. Empiricism and a Holistic Approach to Design
Our Innovation Culture
Focus on talent
Distributed across the organization
- Impacting Google necessitates broad, diverse involvement in science and engineering
- Research is done both in our research team and in our engineering organization, organized opportunistically
Teams benefit greatly
- From mutual talent
- From Google's comparative advantages due to our scale and broad use
- From service-based architecture ("ease" of working in vivo)
Ideal Distributed Computing
Devices
Research Challenges in Ideal Distributed Computing
Alternative designs that would give better energy efficiency at lower utilization
Server OS design aimed at many highly-connected machines in one building
Unifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduce
Latency reduction
A general model of replication, including consistency choices, explained and codified
Machine learning techniques applied to monitoring/controlling such systems
Automatic, dynamic, world-wide placement of data & computation to minimize latency and/or cost, given constraints on
Building retrieval systems that efficiently and usably deal with ACLs
Holistic models of privacy
The user interface to the user's diverse processing and state
Totally Transparent Processing
D: the set of all end-user access devices
- Personal computers, phone, media players, readers, telematics, set-top boxes, appliances, health devices, …
L: the set of all human languages
- Current languages, historical languages, other forms of human notation, possible language specialization, formal languages, …
M: the set of all modalities
- Text, image, audio, video, graphics, other sensor-based data, …
C: the set of all corpora
- The normal web, the deep web, periodicals, books, catalogs, blogs, geodata, scientific datasets, health data, …
For all d in D, all l in L, all m in M, and all c in C
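The "for all d, l, m, c" quantifier describes a Cartesian product of the four sets. A small sketch shows how quickly the number of combinations grows. The sample members are picked from the lists on the slide, and the tiny set sizes are purely illustrative.

```python
from itertools import product

# Illustrative samples only; the slide's sets are open-ended.
D = ["personal computer", "phone", "media player"]
L = ["current language", "historical language"]
M = ["text", "image", "audio", "video"]
C = ["normal web", "deep web", "books"]

# Every (device, language, modality, corpus) combination.
combinations = list(product(D, L, M, C))
print(len(combinations))  # 3 * 2 * 4 * 3 = 72
```

Even with three or four members per set there are 72 combinations to support; the count is the product of the set sizes, so each new device, language, modality, or corpus multiplies the total.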
Totally Transparent Processing
"Hybrid" Intelligence
To extend the capability of people, not in isolation
Aggregation of empirical signal is exceedingly valuable
Ex.:
- Feedback in information retrieval, e.g., in ranking or spelling correction
- Machine learning, e.g., image content analysis, speech recognition with semi-supervised learning
Research Challenges in Transparent Computing & Hybrid Intelligence
Endless applications with very new user interface implications
Addressing limits to data
Techniques to integrate user-feedback in acceptable fashions
Approaches to new signal
Explanation, scale, and variance minimization in machine learning
Information fusion/learning across diverse signals - the Combination Hypothesis, more generally
Usability devices and subpopulations
Privacy
Domains of Application
Search engines
Translation
Speech recognition
Vision
Remedial education
Personal health
Epidemiology
Economic prediction
Societal/environmental optimization
Social networking in ever more clever/useful ways
Humanities and social sciences
Multi-player gaming
Translation
Machine Translation at Google
Statistical Machine Translation
- Model translation process with a statistical model
- Learning from data: monolingual & bilingual
More data, better translation quality
Computationally expensive approach
- Models have many hundreds of gigabytes of data (Moore's law helps here)
Applying syntax information as a signal
Results
- Much better translation quality
- Ongoing progress
- More research groups
- 58 languages (so far); recently: Haitian Creole, Urdu, Georgian, Latin
Grand Challenges
Morphology
- Translating into morphologically rich languages, e.g., Russian, Hungarian
- Need morphology-aware translation models
Reliability
- Some translation mistakes are more severe than others
- hotel - Montreal; Heath Ledger - Tom Cruise
- Research: how to detect crazy translations?
Long-distance reordering
- Simple case: SVO -> SOV
- (One) approach: parse source & reorder
- Issue: parsing accuracy for out-of-domain texts
Finding all Training Data
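For the long-distance reordering challenge, the "parse source & reorder" idea can be illustrated with a toy. This sketch assumes a parser has already labelled each word with the constituent it belongs to (S, V, or O). That labelling is the hard part the slide flags, and the tag scheme and function name here are hypothetical.

```python
# Target order for a subject-object-verb language.
SOV_ORDER = {"S": 0, "O": 1, "V": 2}


def reorder_svo_to_sov(tagged_words):
    """Reorder (word, role) pairs so the verb follows the object.

    sorted() is stable, so words keep their order within each constituent.
    """
    ordered = sorted(tagged_words, key=lambda pair: SOV_ORDER[pair[1]])
    return [word for word, _ in ordered]


sentence = [("the", "S"), ("dog", "S"), ("chased", "V"),
            ("the", "O"), ("cat", "O")]
print(reorder_svo_to_sov(sentence))
# ['the', 'dog', 'the', 'cat', 'chased']
```

If the parser mislabels a constituent, as is more likely on out-of-domain text, the reordering moves the wrong words, which is why parsing accuracy is listed as the issue.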
How about Poetry
Paper at EMNLP 2010 conference: "'Poetic' Statistical Machine Translation: Rhyme and Meter", D. Genzel, J. Uszkoreit, F. Och, EMNLP 2010
Approach
- Enforce meter and rhyme as extra constraints (similar to language model)
- E.g., iambic pentameter: stress pattern 0101010101
- Produce most probable translation that obeys constraints (function follows form)
Example output (couplet in amphibrachic tetrameter):
An officer stated that three were arrested
and that the equipment is currently tested
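The meter constraint can be illustrated with stress strings like the 0101010101 pattern above. This sketch is a post-hoc filter over a candidate list, not the decoder-integrated constraint the slide describes. The scores are made up, and the stress strings are hand annotations (one digit per syllable, 1 = stressed), so treat every name and number here as an assumption.

```python
IAMBIC_PENTAMETER = "01" * 5          # 0101010101, the pattern on the slide
AMPHIBRACHIC_TETRAMETER = "010" * 4   # four unstressed-stressed-unstressed feet


def best_metrical(candidates, meter):
    """Return the highest-scoring (score, text, stresses) candidate whose
    stress string equals the meter, or None if no candidate fits."""
    valid = [c for c in candidates if c[2] == meter]
    return max(valid, key=lambda c: c[0], default=None)


# Hypothetical candidates: (made-up score, text, hand-annotated stresses).
candidates = [
    (0.9, "An officer said that three people were arrested", "0100101100010"),
    (0.6, "An officer stated that three were arrested", "010010010010"),
]

best = best_metrical(candidates, AMPHIBRACHIC_TETRAMETER)
print(best[1])  # An officer stated that three were arrested
```

The higher-scoring candidate is rejected because it breaks the meter, which is the "function follows form" trade-off: the output is the most probable translation among those that obey the constraint.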
Speech
Goals for Speech Technology at Google
Much of the world's information is spoken - we need to recognize it before we can organize it
- YouTube transcription and translation (breaking the language barrier for YouTube access)
- Voicemail transcription
Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
- Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech I/O (every application/service, every usage scenario, every language)
How do we get there?
- Delivery from the cloud - support constant iteration and refinement
- Operating at large scale - train huge statistical models on huge amounts of data
- Learning from use - without human transcription
Challenges
- How do we grow the model to take advantage of the data (richer models of accent, speaker, noise, etc.)?
- Huge computational demands
- Infrastructure demands - parallelization - leverage Google software environment
Training Acoustic Models w/ Unsupervised Learning
[Figure: supervised vs. unsupervised training, hours of data vs. error rate]
Vision
Computer Vision
Advance the state of the art in 3 key areas of image/audio/video analysis and apply results to our multimedia products
- Semantic Interpretation: generate human-understandable description of content (e.g., auto-tagging videos on YouTube, image annotation, porn classification, etc.)
- Matching: find similar entities from a large corpus (e.g., find similar on image search, video fingerprinting for YouTube, etc.)
- Synthesis: generate better images/video by understanding the statistics of a large corpus of images (e.g., better facades in 3D buildings on Google Earth, automatic shadow removal from aerial images, etc.)
Semantic Interpretation: sample problem - Video Annotation
- Video metadata has a cognitive cost for users, because they have to type it in, be careful about what keywords they use, and in general try to make their video searchable
- Many uploaders don't have the motivation or energy to provide proper metadata
- Noisy metadata hurts everyone - spam, misspellings, 1337, acronyms, etc.
Cloud-based Computing: Structured Data
Structured Data on the Web
Discovery and search for structured data
- The deep Web -- significant gap in coverage
- Structured tables on the Web -- not leveraged in search
Enable easy creation, management, sharing, and publishing of structured data
Fusion Tables: www.google.com/fusiontables
Google Fusion Tables: host, manage, collaborate on, visualize, and publish data tables online
What can I do with Fusion Tables?
Host data online - and stay in control
- Control can be at the level of columns or rows
Re-use data without making copies
Collaborate on the details
- Merge data from multiple tables
- Comment on individual rows, columns, or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload; Attribution recorded
Easily Create Informative Maps
Baby steps towards the dream platform
DEMO: circle of blue
Cloud-based Computing: Prediction
Prediction API
Machine learning as a web service
Smart apps for every developer
- RESTful HTTP service
- Simple integration
1. Upload: upload your training data to Google Storage
2. Train: build a model from your data
3. Predict: make new predictions
Under the API
Under the hood: many classifiers/regressors
Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Operations Research and Optimization
Operations Research Challenges
Size: optimization is often NP-complete
- Increasing the size by 1 doubles the search space
- The tools are barely keeping up with the problems
Uncertainty: data is often fuzzy
- How do you route cars when there are roadblocks, new one-ways, traffic jams?
- Can you use optimization in an on-line algorithm connected to users?
- How well can you optimize against forecasted data? How do you react if the forecast is bad?
User expectation and requirements: the definition of problems is also unclear
- What is the objective?
- What is a good solution?
- Can I violate this requirement? By how much?
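The "increasing the size by 1 doubles the search space" point can be made concrete with an exhaustive search. The sketch below uses a small 0/1 knapsack as a stand-in problem (an assumption; the slide names no specific problem) and counts how many assignments brute force has to examine.

```python
from itertools import product


def best_subset(values, weights, capacity):
    """Try every pick/skip assignment; return (best value, choice, examined)."""
    best_value, best_choice, examined = 0, (0,) * len(values), 0
    for choice in product((0, 1), repeat=len(values)):
        examined += 1
        weight = sum(w for w, pick in zip(weights, choice) if pick)
        value = sum(v for v, pick in zip(values, choice) if pick)
        if weight <= capacity and value > best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice, examined


print(best_subset([60, 100, 120], [10, 20, 30], 50))
# (220, (0, 1, 1), 8)
```

With n items the loop examines 2^n assignments, so each added item doubles the work. Practical solvers depend on pruning and guidance to avoid enumerating the whole space.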
Mission
Organizing the worldrsquos information andMaking it universally accessible and useful
Google and Commerce
Over 1 million AdWords advertisers worldwideOver 1 million AdSense publishers worldwideVia the Google Ad Network AdSense publishers reach over 80 of global internet users in 100 countries and 20 languagesYouTube is monetizing over a billion video views per week globallyIn 2009 Google generated $54 billion of economic activity for American businesses website publishers and non-profits
Prodigiousness
Giga 109 Tera 1012 Peta 1015 Exa 1018 Zetta1021
Publicized Bigtable of 70 petabytes 10M opssecWarehouse computing possibilities 100 x 10 x 20 x 20 x 40 = 16000000 nodeshellipSome representative numbers
Storage 1018 -gt 1020-21
Users 109 -gt 1010
Devices 10 -gt 1012
Network 1020 now -gt1021yr 32 KBsec for 1B peopleApps 105 -gt 106-7 or more
Eg embedded car systems 30-50 ECUs 100M lines of code
A variety of science engineering challenges
Focus on Innovation that Benefits our UsersFocus on Research and Engineering
Commitment to advancing technologyRich domain of work due to our missionGrand challenge problemsInternal consensus that production issues are often as challengingfun as pure inventionTechnical leverage1 Google Common Distributed System 2 A Focus on Services3 Empiricism and a Holistic Approach to Design
Our Innovation Culture
Focus on talentDistributed across the organization
Impacting Google necessitates broad diverse involvement in science and engineeringResearch is done both in our research team and in our engineering organization organized opportunistically
Teams benefit greatlyFrom mutual talentFrom Googlersquos comparative advantages to our scale and broad useFrom service-based architecture (ldquoeaserdquo of working in vivo)
Ideal Distributed Computing
Devices
Research Challenges in Ideal Distributed Computing
Alternative designs that would give better energy efficiency at lower utilizationServer OS design aimed at many highly-connected machines in one buildingUnifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduceLatency reductionA general model of replication including consistency choices explained and codifiedMachine learning techniques applied to monitoringcontrolling such systemsAutomatic dynamic world-wide placement of data amp computation to minimize latency andor cost given constraints onBuilding retrieval systems that efficiently and usably deal with ACLsHolistic models of privacyThe user interface to the userrsquos diverse processing and state
Totally Transparent Processing
D The set of all end-user access
devices
L The set of all human languages
M The set of all modalities
C The set of all corpora
Personal ComputersPhoneMedia PlayersReadersTelematicsSet-top BoxesAppliancesHealth deviceshellip
Current languagesHistorical languagesOther forms of human notationPossible language specializationFormal languageshellip
TextImageAudioVideoGraphicsOther sensor-based datahellip
The normal webThe deep webPeriodicalsBooksCatalogsBlogsGeodataScientific datasetsHealth datahellip
For all d in D all l in L all m in M and all c in C
Totally Transparent Processing
ldquoHybridrdquo Intelligence
To extend the capability of people not in isolationAggregation of empirical signal is exceedingly valuableEx
Feedback in Information Retrieval eg in ranking or spelling correctionMachine learning eg image content analysis speech recognition with semi-supervised learning
Research Challenges in Transparent Computing amp Hybrid Intelligence
Endless applications with very new user interface implicationsAddressing limits to dataTechniques to integrate user-feedback in acceptable fashionsApproaches to new signalExplanation scale and variance minimization in machine learningInformation fusionlearning across diverse signals ndash The Combination Hypothesis more generallyUsability devices and subpopulationsPrivacy
Domains of Application
Search enginesTranslationSpeech recognitionVision
Remedial EducationPersonal healthEpidemiologyEconomic predictionSocietalenvironmental optimizationSocial Networking in ever more cleveruseful ways Humanities and Social SciencesMulti-player gaming
Translation
Machine Translation Google
Statistical Machine TranslationModel translation process with a statistical modelLearning from data monolingual amp bilingual
More data better translation qualityComputationally expensive approach
Models have many hundreds of Gigabyte of data(Moores law helps here)
Applying syntax information as a signal
ResultsMuch better translation qualityOngoing progress
More research groups 58 languages (so far)
recently Haitian Creole Urdu Georgian Latin
Grand Challenges
Morphologytranslating into morphologically rich languageseg Russian Hungarianneed morphology-aware translation models
Reliabilitysome translation mistakes more severe than others
hotel - MontrealHeath Ledger - Tom Cruise
Research How to detect crazy translations
Long-distance reorderingsimple case SVO SOV(one) approach parse source amp reorder
issue parsing accuracy for out-of-domain texts
Finding all Training Data
How about Poetry
Paper at EMNLP 2010 conferenceldquoPoeticrdquo Statistical Machine Translation Rhyme and Meter DGenzel JUszkoreit FOch EMNLP 2010
ApproachEnforce meter and rhyme as extra constraints(similar to language model)Eg iambic pentameter stress pattern 0101010101Produce most probable translation that obeys constraints(Function follows form)
Example output (couplet in amphibrachic tetrameterAn officer stated that three were arrestedand that the equipment is currently tested
Speech
Goals for Speech Technology at Google
Much of the worldrsquos information is spoken ndash we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Voicemail transcription Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech io (every applicationservice every usage scenario every language)
How do we get thereDelivery from the cloud ndash support constant iteration and refinement
Operating at large scale ndash train huge statistical models on huge amounts of data
Learning from use - without human transcription
ChallengesHow do we grow the model to take advantage of the data (richer models of accent speaker noise etc)Huge computational demandsInfrastructure demands ndash parallelization ndash leverage Google software environment
Training Acoustic Models wUnsupervised Learning
Supervised vs unsupervised training - hours of
data vs error rate
Vision
Computer Vision
Advance state-of-the art in 3 key areas of imageaudiovideo analysis and apply results to our multimedia products
Semantic Interpretation Generate human understandable description of content (eg auto-tagging videos on YouTube Image annotation porn classification etc)Matching Find similar entities from a large corpus (eg find similar on image search video fingerprinting for YouTube etc )Synthesis Generate better imagesvideo by understanding the statistics of a large corpus of images (eg better facades in 3D building on Google Earth automatic shadow removal from areal images etc)
Semantic Interpretation sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in be careful about what keywords they use and in general try to make their video searchableMany uploaders donrsquot have the motivation or energy to provide proper metadataNoisy metadata hurts everyone ndash spam misspellings 1337 acronyms etc
Cloud-based ComputingStructured Data
Structured Data on the Web
Discovery and search for structured data
- The deep Web: significant gap in coverage
- Structured tables on the Web: not leveraged in search
Enable easy creation, management, sharing and publishing of structured data
Fusion Tables: www.google.com/fusiontables
Google Fusion Tables: host, manage, collaborate on, visualize, and publish data tables online
What can I do with Fusion Tables?
- Host data online and stay in control (control can be at the level of columns or rows)
- Re-use data without making copies
- Collaborate on the details
  - Merge data from multiple tables
  - Comment on individual rows, columns or cells
- Make a map (or chart or timeline) in minutes
- Manage data via our site or an API
Fusion Table Example Gallery
Easy data upload, attribution recorded
Easily Create Informative Maps
Baby steps towards the dream platform
DEMO: Circle of Blue
Cloud-based Computing: Prediction
Prediction API
Machine learning as a web service: Smart Apps for every developer
1. Upload: upload your training data to Google Storage
2. Train: build a model from your data
3. Predict: make new predictions
- RESTful HTTP service
- Simple integration
Under the API
- Under the hood: many classifiers/regressors
- Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
- Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Operations Research and Optimization
Operations Research Challenges
Size
- Optimization is often NP-complete
- Increasing the size by 1 doubles the search space
- The tools are barely keeping up with the problems
Uncertainty
- Data is often fuzzy
- How do you route cars when there are roadblocks, new one-ways, traffic jams?
- Can you use optimization in an on-line algorithm connected to users?
- How well can you optimize against forecasted data? How do you react if the forecast is bad?
User expectations and requirements
- The definition of problems is also unclear
- What is the objective? What is a good solution? Can I violate this requirement? By how much?
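The remark that increasing the size by 1 doubles the search space can be checked on a toy problem. The sketch below assumes a problem made of n independent yes/no decisions, which is an illustrative assumption and not a model taken from the talk; the function name `search_space_size` is hypothetical.

```python
from itertools import product

def search_space_size(n_binary_decisions: int) -> int:
    """Count every assignment of n yes/no decisions by brute-force enumeration."""
    return sum(1 for _ in product((0, 1), repeat=n_binary_decisions))

# Sizes for n = 1..10; each entry is twice the previous one.
sizes = [search_space_size(n) for n in range(1, 11)]
for n, size in enumerate(sizes, start=1):
    print(n, size)
```

Exhaustive enumeration like this stops being practical after a few dozen decisions, which is why guidance and pruning matter.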
Operations Research Opportunities
Machine Learning can help us in three ways
- By providing guidance towards good solutions
- By qualifying valid solutions
- By reducing the search space
Large computing resources mean we can try a bit harder
Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
Having all our code open-source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
Semantic Processing
Web Inference and Learning
Goal: better understanding of Web content and user intent
Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
- How to interpret this term in this context?
- Does this sentence answer that question?
- Will this user click on that ad?
Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment: Rondonia, Brazil (1975, 1989, 2001)
A Sampling of Other Use Cases
- Disease Early Warning: remote surveillance of disease and prediction of epidemics
- Population Census: supplements traditional census and mapping in developing regions
- Humanitarian Crisis Mapping: can detect and monitor a growing range of crisis types
- Water Resources: monitor water quality and availability and alleviate water shortage problems
- Food Security: famine early warning, rainfall and water requirements estimations, agricultural production estimates, and irrigation and fertilizer supply & demand
- Global Education Programs
Parallel Geo-Processing “in the cloud” (a brief illustration)
1. Original image
2. Original image is divided into 256px sub-units
3. Sub-units are distributed to separate machines, where they can be processed in parallel
4. Thousands can be processed simultaneously
5. Result is reassembled into a finished image
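The divide, distribute and reassemble steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Earth Engine code: a thread pool stands in for the separate machines, the image is a plain 2D list of 8-bit grayscale values, and `split_into_tiles`, `process_tile` and `process_image` are hypothetical names. Only the 256px tile size comes from the slide.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as on the slide

def split_into_tiles(image, tile=TILE):
    """Yield (row, col, sub_image) for each tile of a 2D list of pixel values."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, c, [row[c:c + tile] for row in image[r:r + tile]]

def process_tile(tile_pixels):
    """Hypothetical per-pixel operation: invert an 8-bit grayscale value."""
    return [[255 - p for p in row] for row in tile_pixels]

def process_image(image, tile=TILE, workers=4):
    """Split the image, process tiles concurrently, and reassemble the result."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    tiles = list(split_into_tiles(image, tile))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so outputs line up with tile positions.
        processed = pool.map(process_tile, (pixels for _, _, pixels in tiles))
        for (r, c, _), out in zip(tiles, processed):
            for dr, row in enumerate(out):
                result[r + dr][c:c + len(row)] = row
    return result

# A 300 x 600 test image splits into 2 x 3 = 6 tiles (edge tiles are smaller).
image = [[(x + y) % 256 for x in range(600)] for y in range(300)]
finished = process_image(image)
```

Because each pixel is processed independently, the tiles never need to communicate, which is what makes this kind of workload easy to spread over many machines.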
Overview
Global-scale earth observation and informatics platform
- For public benefit and to support the emerging green economy
- Help science come out of the research lab and into operational use at scale
- Unprecedented catalog of earth observation data for mining and analysis
- Promote transparency, reproducibility, collaboration, “open science”
Very fast computation of scientific map products
- Intrinsically parallel pixel processing system
- Built-in Google algorithms as well as user-supplied
- Earth Engine API for 3rd-party algorithm development
- Access control, versioning, provenance
- Online and desktop versions (open source desktop version)
On a lot of useful data
- Every available Landsat and MODIS scene (more satellites coming)
- Commercial datasets (very high resolution satellite imagery)
- Environmental data (atmospheric, ocean, terrestrial)
- User-supplied (e.g. in-situ data collected via Android phones)
Scale of Data
US: satellite imagery dataset Landsat
- Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
- World: 8000 Landsat scenes (2 TB) per covering
- Complete global coverage every 16 days
- Operating since 1972; historical archive holds ~4 PB
US: NASA EOS approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
- ESA satellite missions: MERIS, Envisat, others
- SpotImage (France): 20M SPOT images since 1986, 10000 new images collected daily, 5+ PB archive
- ESA: launching Sentinel-1 in 2011
Representative Examples
- Envisat: Gulf Oil Spill, June 2010 (ESA)
- MERIS: Hurricane Isabel, Sept 2003 (ESA)
- Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health: personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns his/her data
A platform that…
- Provides a dashboard for wellness information & medical records
- Allows user to connect and interact with a broad group of “add-on” services
- Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake: Aug 26, 2009
1 Day After Earthquake: Jan 13, 2010
13 Days After Earthquake: Jan 25, 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
- How would you (re)define Victorian literature?
- What are the differences between the English and Latin editions of Hobbes’ Leviathan?
- How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program, Summer 2010
12 projects, 23 researchers, 15 universities
European program, Winter 2010
10 projects planned, $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
- Exploring computational thinking in K12 (launch late Oct)
- CS4HS: high school computer science (cs4hs.com)
- Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
- Lantern platform: wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
- “Flip bits, not burgers” during summer holidays
- Exposure to real-world software development
- Students paired with mentor from OS community
- Execute to milestones laid out in accepted application
- Stipend allows students to concentrate on OS development
2010: 1026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
- Visual programming environment for Android mobile devices
- Helping people become creators (rather than consumers) of technology
- Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
- École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
- ETH Zurich, Switzerland: ABZ Ausbildungs- und Beratungszentrum für Informatikunterricht
- Makerere University, Uganda: Grassroot approach to improve the quality of applicants to computing programs at Makerere University
- Manchester University, United Kingdom: Animation11
- Oslo University, Norway: TENK
- Queen Mary University, United Kingdom: cs4fn magazine
- RWTH Aachen, Germany: Bright Brains in Computer Science
- Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
- University of Stuttgart, Germany: UniS2010
- Technion, Israel: High School Computer Science Female Students Visits in Google: Impressions, Conceptions and Influences
- Trinity College Dublin, Ireland: Computer Programming Outreach B2C
- University of Cape Town, South Africa: Project Umonya
- University College Dublin, Ireland: CS Summer School
- University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
- CS Curriculum Search
- Tech Talks on CS Topics
- Tutorials, lecture slides, problem sets for a variety of topic areas: AJAX Programming, Algorithms, Distributed Systems, Web Security, Languages, Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs: 230+ projects funded in the last year
- Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
- Research-awardsgooglecom
- Focused Grant Program
Visiting Faculty Program: 20 faculty (ongoing)
- University-relationsgooglecom
PhD Fellowship Program
- 2009: 13 students supported in North America
- 2010: 15 in North America, 16 outside North America
- Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google continues to innovate rapidly in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It’s a most exciting area in which to innovate
Prodigiousness
Giga 10^9, Tera 10^12, Peta 10^15, Exa 10^18, Zetta 10^21
Publicized Bigtable of 70 petabytes, 10M ops/sec
Warehouse computing possibilities: 100 x 10 x 20 x 20 x 40 = 16,000,000 nodes…
Some representative numbers
- Storage: 10^18 -> 10^20-21
- Users: 10^9 -> 10^10
- Devices: 10 -> 10^12
- Network: 10^20 now -> 10^21/yr (32 KB/sec for 1B people)
- Apps: 10^5 -> 10^6-7 or more
- E.g. embedded car systems: 30-50 ECUs, 100M lines of code
A variety of science/engineering challenges
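Two of the figures above can be checked with plain arithmetic. The sketch below makes two assumptions that the slide does not state: the network figure is taken to be in bytes, and 1 KB is taken as 1000 bytes. The slide also does not say what each factor of the node product stands for, so it is multiplied out as given.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

# Warehouse computing product, multiplied out exactly as written on the slide.
nodes = 100 * 10 * 20 * 20 * 40

# 32 KB/sec sustained for 1 billion people over one year.
bytes_per_person_per_sec = 32_000  # assumption: 1 KB = 1000 bytes
people = 10**9
network_bytes_per_year = bytes_per_person_per_sec * people * SECONDS_PER_YEAR

print(f"nodes: {nodes:,}")                                   # nodes: 16,000,000
print(f"network: {network_bytes_per_year:.2e} bytes/yr")     # network: 1.01e+21 bytes/yr
```

Under those assumptions the 32 KB/sec figure lands almost exactly on one zettabyte (10^21 bytes) per year.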
Focus on Innovation that Benefits our Users
Focus on Research and Engineering
- Commitment to advancing technology
- Rich domain of work due to our mission
- Grand challenge problems
- Internal consensus that production issues are often as challenging/fun as pure invention
- Technical leverage
  1. Google Common Distributed System
  2. A Focus on Services
  3. Empiricism and a Holistic Approach to Design
Our Innovation Culture
Focus on talent
Distributed across the organization
- Impacting Google necessitates broad, diverse involvement in science and engineering
- Research is done both in our research team and in our engineering organization, organized opportunistically
Teams benefit greatly
- From mutual talent
- From Google’s comparative advantages due to our scale and broad use
- From service-based architecture (“ease” of working in vivo)
Ideal Distributed Computing
Devices
Research Challenges in Ideal Distributed Computing
- Alternative designs that would give better energy efficiency at lower utilization
- Server OS design aimed at many highly-connected machines in one building
- Unifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduce
- Latency reduction
- A general model of replication, including consistency choices, explained and codified
- Machine learning techniques applied to monitoring/controlling such systems
- Automatic, dynamic, world-wide placement of data & computation to minimize latency and/or cost, given constraints on
- Building retrieval systems that efficiently and usably deal with ACLs
- Holistic models of privacy
- The user interface to the user’s diverse processing and state
Totally Transparent Processing
D: the set of all end-user access devices
- Personal computers, phones, media players, readers, telematics, set-top boxes, appliances, health devices…
L: the set of all human languages
- Current languages, historical languages, other forms of human notation, possible language specialization, formal languages…
M: the set of all modalities
- Text, image, audio, video, graphics, other sensor-based data…
C: the set of all corpora
- The normal web, the deep web, periodicals, books, catalogs, blogs, geodata, scientific datasets, health data…
For all d in D, all l in L, all m in M, and all c in C
“Hybrid” Intelligence
- To extend the capability of people, not in isolation
- Aggregation of empirical signal is exceedingly valuable
Ex:
- Feedback in Information Retrieval, e.g. in ranking or spelling correction
- Machine learning, e.g. image content analysis, speech recognition with semi-supervised learning
Research Challenges in Transparent Computing & Hybrid Intelligence
- Endless applications with very new user interface implications
- Addressing limits to data
- Techniques to integrate user feedback in acceptable fashions
- Approaches to new signal
- Explanation, scale and variance minimization in machine learning
- Information fusion/learning across diverse signals: the Combination Hypothesis more generally
- Usability, devices and subpopulations
- Privacy
Domains of Application
- Search engines
- Translation
- Speech recognition
- Vision
- Remedial education
- Personal health
- Epidemiology
- Economic prediction
- Societal/environmental optimization
- Social networking in ever more clever/useful ways
- Humanities and social sciences
- Multi-player gaming
Translation
Machine Translation Google
Statistical Machine Translation
- Model translation process with a statistical model
- Learning from data: monolingual & bilingual
- More data, better translation quality
Computationally expensive approach
- Models have many hundreds of gigabytes of data
- (Moore’s law helps here)
- Applying syntax information as a signal
Results
- Much better translation quality
- Ongoing progress
- More research groups
- 58 languages (so far)
- Recently: Haitian Creole, Urdu, Georgian, Latin
Grand Challenges
Morphology
- translating into morphologically rich languages, e.g. Russian, Hungarian
- need morphology-aware translation models
Reliability
- some translation mistakes are more severe than others
- hotel - Montreal, Heath Ledger - Tom Cruise
- Research: how to detect crazy translations?
Long-distance reordering
- simple case: SVO to SOV
- (one) approach: parse source & reorder
- issue: parsing accuracy for out-of-domain texts
Finding all Training Data
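The parse-and-reorder approach to long-distance reordering can be illustrated on a single clause. This is a toy sketch, not the method used in the production system: it assumes the parser has already labelled each chunk as subject, verb or object, and `reorder_svo_to_sov` is a hypothetical helper.

```python
def reorder_svo_to_sov(chunks):
    """Reorder labelled chunks of one clause from subject-verb-object
    to subject-object-verb.

    `chunks` is a list of (role, text) pairs with role in {"S", "V", "O"};
    a real system would obtain these roles from a syntactic parser.
    """
    target_order = {"S": 0, "O": 1, "V": 2}
    # sorted() is stable, so chunks sharing a role keep their relative order.
    return sorted(chunks, key=lambda chunk: target_order[chunk[0]])

clause = [("S", "the officer"), ("V", "arrested"), ("O", "three people")]
reordered = reorder_svo_to_sov(clause)
print(" ".join(text for _, text in reordered))
```

The sketch also shows where the stated issue bites: if the parser mislabels a chunk, as is more likely on out-of-domain text, the reordering moves the wrong words.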
How about Poetry?
Paper at EMNLP 2010 conference: “Poetic” Statistical Machine Translation: Rhyme and Meter, D. Genzel, J. Uszkoreit, F. Och, EMNLP 2010
Approach
- Enforce meter and rhyme as extra constraints (similar to language model)
- E.g. iambic pentameter: stress pattern 0101010101
- Produce most probable translation that obeys constraints
- (Function follows form)
Example output (couplet in amphibrachic tetrameter):
An officer stated that three were arrested
and that the equipment is currently tested
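A meter constraint of this kind can be expressed as a string of 0s (unstressed) and 1s (stressed) that a candidate line must match. The sketch below checks the first line of the example couplet against amphibrachic tetrameter. The stress lexicon is a small hand-written assumption for illustration, and the paper's integration of the constraint into the translation search is not reproduced here.

```python
IAMBIC_PENTAMETER = "01" * 5          # 0101010101, as on the slide
AMPHIBRACHIC_TETRAMETER = "010" * 4   # four unstressed-stressed-unstressed feet

# Hypothetical hand-written stress lexicon: "1" marks a stressed syllable.
STRESS = {
    "an": "0", "officer": "100", "stated": "10", "that": "0",
    "three": "1", "were": "0", "arrested": "010",
}

def stress_pattern(line):
    """Concatenate the per-word stress marks for a line of text."""
    return "".join(STRESS[word] for word in line.lower().split())

def obeys_meter(line, meter):
    """True when the line's stress pattern matches the meter exactly."""
    return stress_pattern(line) == meter

line = "An officer stated that three were arrested"
print(stress_pattern(line))
print(obeys_meter(line, AMPHIBRACHIC_TETRAMETER))
print(obeys_meter(line, IAMBIC_PENTAMETER))
```

A translation system using such a check would discard or penalize candidate lines that fail it, then pick the most probable candidate among those that pass.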
Speech
Goals for Speech Technology at Google
Much of the world’s information is spoken: we need to recognize it before we can organize it
- YouTube transcription and translation (breaking the language barrier for YouTube access)
- Voicemail transcription
Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
- Spoken input and output is key to usability
Prodigiousness
Giga 109 Tera 1012 Peta 1015 Exa 1018 Zetta1021
Publicized Bigtable of 70 petabytes 10M opssecWarehouse computing possibilities 100 x 10 x 20 x 20 x 40 = 16000000 nodeshellipSome representative numbers
Storage 1018 -gt 1020-21
Users 109 -gt 1010
Devices 10 -gt 1012
Network 1020 now -gt1021yr 32 KBsec for 1B peopleApps 105 -gt 106-7 or more
Eg embedded car systems 30-50 ECUs 100M lines of code
A variety of science engineering challenges
Focus on Innovation that Benefits our UsersFocus on Research and Engineering
Commitment to advancing technologyRich domain of work due to our missionGrand challenge problemsInternal consensus that production issues are often as challengingfun as pure inventionTechnical leverage1 Google Common Distributed System 2 A Focus on Services3 Empiricism and a Holistic Approach to Design
Our Innovation Culture
Focus on talentDistributed across the organization
Impacting Google necessitates broad diverse involvement in science and engineeringResearch is done both in our research team and in our engineering organization organized opportunistically
Teams benefit greatlyFrom mutual talentFrom Googlersquos comparative advantages to our scale and broad useFrom service-based architecture (ldquoeaserdquo of working in vivo)
Ideal Distributed Computing
Devices
Research Challenges in Ideal Distributed Computing
Alternative designs that would give better energy efficiency at lower utilizationServer OS design aimed at many highly-connected machines in one buildingUnifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduceLatency reductionA general model of replication including consistency choices explained and codifiedMachine learning techniques applied to monitoringcontrolling such systemsAutomatic dynamic world-wide placement of data amp computation to minimize latency andor cost given constraints onBuilding retrieval systems that efficiently and usably deal with ACLsHolistic models of privacyThe user interface to the userrsquos diverse processing and state
Totally Transparent Processing
D The set of all end-user access
devices
L The set of all human languages
M The set of all modalities
C The set of all corpora
Personal ComputersPhoneMedia PlayersReadersTelematicsSet-top BoxesAppliancesHealth deviceshellip
Current languagesHistorical languagesOther forms of human notationPossible language specializationFormal languageshellip
TextImageAudioVideoGraphicsOther sensor-based datahellip
The normal webThe deep webPeriodicalsBooksCatalogsBlogsGeodataScientific datasetsHealth datahellip
For all d in D all l in L all m in M and all c in C
Totally Transparent Processing
ldquoHybridrdquo Intelligence
To extend the capability of people not in isolationAggregation of empirical signal is exceedingly valuableEx
Feedback in Information Retrieval eg in ranking or spelling correctionMachine learning eg image content analysis speech recognition with semi-supervised learning
Research Challenges in Transparent Computing amp Hybrid Intelligence
Endless applications with very new user interface implicationsAddressing limits to dataTechniques to integrate user-feedback in acceptable fashionsApproaches to new signalExplanation scale and variance minimization in machine learningInformation fusionlearning across diverse signals ndash The Combination Hypothesis more generallyUsability devices and subpopulationsPrivacy
Domains of Application
Search enginesTranslationSpeech recognitionVision
Remedial EducationPersonal healthEpidemiologyEconomic predictionSocietalenvironmental optimizationSocial Networking in ever more cleveruseful ways Humanities and Social SciencesMulti-player gaming
Translation
Machine Translation Google
Statistical Machine TranslationModel translation process with a statistical modelLearning from data monolingual amp bilingual
More data better translation qualityComputationally expensive approach
Models have many hundreds of Gigabyte of data(Moores law helps here)
Applying syntax information as a signal
ResultsMuch better translation qualityOngoing progress
More research groups 58 languages (so far)
recently Haitian Creole Urdu Georgian Latin
Grand Challenges
Morphologytranslating into morphologically rich languageseg Russian Hungarianneed morphology-aware translation models
Reliabilitysome translation mistakes more severe than others
hotel - MontrealHeath Ledger - Tom Cruise
Research How to detect crazy translations
Long-distance reorderingsimple case SVO SOV(one) approach parse source amp reorder
issue parsing accuracy for out-of-domain texts
Finding all Training Data
How about Poetry?
Paper at EMNLP 2010 conference: “Poetic” Statistical Machine Translation: Rhyme and Meter. D. Genzel, J. Uszkoreit, F. Och. EMNLP 2010.
Approach
- Enforce meter and rhyme as extra constraints (similar to language model)
- E.g., iambic pentameter: stress pattern 0101010101
- Produce most probable translation that obeys constraints (Function follows form)
Example output (couplet in amphibrachic tetrameter):
An officer stated that three were arrested
and that the equipment is currently tested
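The meter constraint can be illustrated with a small sketch. It simplifies the approach on the slide by filtering a finished list of candidates instead of constraining the search itself. The candidate sentences, their stress strings, and their log-probabilities are hand-made assumptions for illustration, and `best_translation` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# (text, per-syllable stress string, log-probability); all values hand-made.
Candidate = Tuple[str, str, float]


def meter_pattern(foot: str, feet: int) -> str:
    """Repeat one metrical foot (0 = unstressed, 1 = stressed)."""
    return foot * feet


IAMBIC_PENTAMETER = meter_pattern("01", 5)         # "0101010101"
AMPHIBRACHIC_TETRAMETER = meter_pattern("010", 4)  # "010010010010"


def best_translation(candidates: List[Candidate], pattern: str) -> Optional[str]:
    """Return the most probable candidate whose stresses match the pattern."""
    obeying = [c for c in candidates if c[1] == pattern]
    if not obeying:
        return None
    return max(obeying, key=lambda c: c[2])[0]


candidates = [
    # Most probable overall, but the stress pattern breaks the meter.
    ("An officer said that three people were arrested", "0100101100010", -4.1),
    ("An officer stated that three were arrested", "010010010010", -5.3),
    ("The officer stated that three were arrested", "010010010010", -6.0),
]
chosen = best_translation(candidates, AMPHIBRACHIC_TETRAMETER)
print(chosen)
```

The unconstrained best candidate is rejected because it does not scan, so the output is the most probable line among those that do.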
Speech
Goals for Speech Technology at Google
- Much of the world’s information is spoken - we need to recognize it before we can organize it
- YouTube transcription and translation (breaking the language barrier for YouTube access)
- Voicemail transcription
- Mobile is the fastest-growing and most widespread platform for communication and services that has ever existed
- Spoken input and output is key to usability
- Our goal is completely ubiquitous availability of speech I/O (every application/service, every usage scenario, every language)
How do we get there?
- Delivery from the cloud - support constant iteration and refinement
- Operating at large scale - train huge statistical models on huge amounts of data
- Learning from use - without human transcription
Challenges
- How do we grow the model to take advantage of the data (richer models of accent, speaker, noise, etc.)?
- Huge computational demands
- Infrastructure demands - parallelization - leverage Google software environment
Training Acoustic Models w/ Unsupervised Learning
[Figure: supervised vs. unsupervised training, hours of data vs. error rate]
Vision
Computer Vision
Advance the state of the art in 3 key areas of image/audio/video analysis and apply results to our multimedia products
- Semantic Interpretation: generate human-understandable description of content (e.g., auto-tagging videos on YouTube, image annotation, porn classification, etc.)
- Matching: find similar entities from a large corpus (e.g., find similar on image search, video fingerprinting for YouTube, etc.)
- Synthesis: generate better images/video by understanding the statistics of a large corpus of images (e.g., better facades in 3D building on Google Earth, automatic shadow removal from aerial images, etc.)
Semantic Interpretation: sample problem - Video Annotation
- Video metadata has a cognitive cost on the user because they have to type it in, be careful about what keywords they use, and in general try to make their video searchable
- Many uploaders don’t have the motivation or energy to provide proper metadata
- Noisy metadata hurts everyone - spam, misspellings, 1337, acronyms, etc.
Cloud-based Computing: Structured Data
Structured Data on the Web
- Discovery and search for structured data
- The deep Web -- significant gap in coverage
- Structured tables on the Web -- not leveraged in search
- Enable easy creation, management, sharing, and publishing of structured data
Fusion Tables: www.google.com/fusiontables
Google Fusion Tables: host, manage, collaborate on, visualize, and publish data tables online
What can I do with Fusion Tables?
- Host data online - and stay in control; control can be at the level of columns or rows
- Re-use data without making copies
- Collaborate on the details
- Merge data from multiple tables
- Comment on individual rows, columns, or cells
- Make a map (or chart or timeline) in minutes
- Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload; Attribution Recorded
Easily Create Informative Maps
baby steps towards the dream platform
DEMO: Circle of Blue
Cloud-based Computing: Prediction
1. Upload: upload your training data to Google Storage
2. Train: build a model from your data
3. Predict: make new predictions
Machine learning as a web service
Smart apps for every developer
- RESTful HTTP service
- Simple integration
Prediction API
- Under the hood: many classifiers/regressors
- Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
- Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Research and Optimization
Operations Research Challenges
Size
- Optimization is often NP-complete
- Increasing the size by 1 doubles the search space
- The tools are barely keeping up with the problems
Uncertainty
- Data is often fuzzy. How do you route cars when there are roadblocks, new one-ways, traffic jams?
- Can you use optimization in an on-line algorithm connected to users?
- How well can you optimize against forecasted data? How do you react if the forecast is bad?
User expectation and requirements
- The definition of problems is also unclear. What is the objective? What is a good solution? Can I violate this requirement? By how much?
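The claim that increasing the size by 1 doubles the search space can be made concrete with a short sketch. The 0/1 knapsack problem used here is a toy chosen for illustration and does not come from the slides. The function names are hypothetical.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def search_space_size(n: int) -> int:
    """Number of yes/no assignments over n binary decisions."""
    return 2 ** n


def enumerate_assignments(n: int) -> Iterator[Tuple[int, ...]]:
    """Yield every 0/1 assignment of length n."""
    return product((0, 1), repeat=n)


def brute_force_knapsack(
    values: Sequence[int], weights: Sequence[int], capacity: int
) -> Tuple[int, Tuple[int, ...]]:
    """Exhaustive 0/1 knapsack: inspects all 2**n assignments."""
    best_value, best_choice = 0, tuple(0 for _ in values)
    for choice in enumerate_assignments(len(values)):
        weight = sum(w for w, take in zip(weights, choice) if take)
        value = sum(v for v, take in zip(values, choice) if take)
        if weight <= capacity and value > best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice


best = brute_force_knapsack([60, 100, 120], [10, 20, 30], capacity=50)
print(best)
print(search_space_size(30), search_space_size(31))
```

Three items mean 8 candidates to check. Thirty items already mean over a billion, and the thirty-first doubles that again, which is why exhaustive search stops being an option so quickly.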
Operations Research Opportunities
Machine learning can help us in several ways
- By providing guidance towards good solutions
- By qualifying valid solutions
- By reducing the search space
Large computing resources mean we can try a bit harder
Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
Having all our code open source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
Semantic Processing
Web Inference and Learning
- Goal: better understanding of Web content and user intent
- Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
- How to interpret this term in this context?
- Does this sentence answer that question?
- Will this user click on that ad?
- Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
[Figure: Rondonia, Brazil, in 1975, 1989, and 2001]
A Sampling of Other Use Cases
- Disease Early Warning: remote surveillance of disease and prediction of epidemics
- Population Census: supplements traditional census and mapping in developing regions
- Humanitarian Crisis Mapping: can detect and monitor a growing range of crisis types
- Water Resources: monitor water quality and availability and alleviate water shortage problems
- Food Security: famine early warning, rainfall and water requirements estimations, agr. production estimates, and irrigation and fertilizer supply & demand
- Global Education Programs
Parallel Geo-Processing “in the cloud”
(a brief illustration)
- Original image is divided into 256px sub-units
- Sub-units are distributed to separate machines where they can be processed in parallel
- Thousands can be processed simultaneously
- Result is reassembled into a finished image
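The divide, distribute, and reassemble steps above can be sketched on a single machine. In this minimal illustration a thread pool stands in for the separate machines, and the per-pixel operation (inverting an 8-bit value) is a placeholder. None of the function names come from Earth Engine.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as on the slide


def split_into_tiles(image, tile=TILE):
    """Yield (row, col, pixels) for each sub-unit; edge tiles may be smaller."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, c, [row[c:c + tile] for row in image[r:r + tile]]


def process_tile(pixels):
    """Placeholder per-pixel operation: invert an 8-bit value."""
    return [[255 - p for p in row] for row in pixels]


def reassemble(tiles, height, width):
    """Paste processed sub-units back at their original offsets."""
    out = [[0] * width for _ in range(height)]
    for r, c, pixels in tiles:
        for dr, row in enumerate(pixels):
            out[r + dr][c:c + len(row)] = row
    return out


def process_image(image, tile=TILE, workers=4):
    """Split, process the sub-units concurrently, then reassemble."""
    height, width = len(image), len(image[0])
    pieces = list(split_into_tiles(image, tile))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_tile, [p for _, _, p in pieces]))
    done = [(r, c, res) for (r, c, _), res in zip(pieces, results)]
    return reassemble(done, height, width)


# A 300 x 520 synthetic image splits into 2 x 3 sub-units.
image = [[(x + y) % 256 for x in range(520)] for y in range(300)]
finished = process_image(image)
```

Because each pixel's result depends only on that pixel, the sub-units never need to communicate, which is what makes this kind of processing easy to spread over thousands of machines.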
Global-scale earth observation and informatics platform
- For public benefit and to support emerging green economy
- Help science come out of research lab and into operational use at scale
- Unprecedented catalog of earth observation data for mining and analysis
- Promote transparency, reproducibility, collaboration, “open science”
Very fast computation of scientific map products
- Intrinsically parallel pixel processing system
- Built-in Google algorithms as well as user-supplied
- Earth Engine API for 3rd-party algorithm development
- Access control, versioning, provenance
- Online and desktop versions (open source desktop version)
On a lot of useful data
- Every available Landsat and MODIS scene (more satellites coming)
- Commercial datasets (very high resolution satellite imagery)
- Environmental data (atmospheric, ocean, terrestrial)
- User-supplied (ex. in-situ data collected via Android phones)
Overview
Scale of Data
US: satellite imagery dataset Landsat
- Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
- World: 8000 Landsat scenes (2 TB) per covering
- Complete global coverage every 16 days
- Operating since 1972; historical archive holds ~4 PB
- US NASA EOS: approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
- ESA satellite missions: MERIS, Envisat, others
- SpotImage (France): 20M SPOT images since 1986, 10000 new images collected daily, 5+ PB archive
- ESA launching Sentinel-1 in 2011
Representative Examples
- Envisat: Gulf Oil Spill, June 2010 (ESA)
- MERIS: Hurricane Isabel, Sept 2003 (ESA)
- Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health: personal health dashboard
- Launched in May 2008
- Major update Sept 2010
- User controls and owns his/her data
A platform that…
- Provides a dashboard for wellness information & medical records
- Allows user to connect and interact with a broad group of “add-on” services
- Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26, 2009
1 Day After Earthquake - Jan 13, 2010
13 Days After Earthquake - Jan 25, 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
- How would you (re)define Victorian literature?
- What are the differences between the English and Latin editions of Hobbes’ Leviathan?
- How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program: Summer 2010
- 12 projects, 23 researchers, 15 universities
European program: Winter 2010
- 10 projects planned, $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
- Exploring computational thinking in K12 (launch late Oct)
- CS4HS: high school computer science (cs4hs.com)
- Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
- Lantern platform: wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
- “Flip bits, not burgers” during summer holidays
- Exposure to real-world software development
- Students paired with mentor from OS community
- Execute to milestones laid out in accepted application
- Stipend allows students to concentrate on OS development
2010: 1026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
- Visual programming environment for Android mobile devices
- Helping people become creators (rather than consumers) of technology
- Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
- École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
- ETH Zurich, Switzerland: ABZ, Ausbildungs- und Beratungszentrum für Informatikunterricht
- Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
- Manchester University, United Kingdom: Animation11
- Oslo University, Norway: TENK
- Queen Mary University, United Kingdom: cs4fn magazine
- RWTH Aachen, Germany: Bright Brains in Computer Science
- Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
- University of Stuttgart, Germany: UniS2010
- Technion, Israel: High School Computer Science Female Students’ Visits in Google: Impressions, Conceptions, and Influences
- Trinity College Dublin, Ireland: Computer Programming Outreach B2C
- University of Cape Town, South Africa: Project Umonya
- University College Dublin, Ireland: CS Summer School
- University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
- CS Curriculum Search
- Tech Talks on CS topics
- Tutorials, lecture slides, problem sets for a variety of topic areas: AJAX programming, algorithms, distributed systems, web security, languages, practical skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
- Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
- Research-awardsgooglecom
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
- University-relationsgooglecom
PhD Fellowship Program
- 2009: 13 students supported in North America
- 2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100000 students), US & EMEA
Final Thoughts
- Scale of communication and computing is profound
- Endless opportunity for technical growth
- Some large themes
- Major new application domains
- Google aims to innovate rapidly in science/technology and value to consumers
- We are providing increased support for academic institutions in computer science and related areas
- It's a most exciting area in which to innovate
A variety of science engineering challenges
Focus on Innovation that Benefits our Users
Focus on Research and Engineering
- Commitment to advancing technology
- Rich domain of work due to our mission
- Grand challenge problems
- Internal consensus that production issues are often as challenging/fun as pure invention
- Technical leverage:
  1. Google Common Distributed System
  2. A Focus on Services
  3. Empiricism and a Holistic Approach to Design
Our Innovation Culture
Focus on talent
- Distributed across the organization
- Impacting Google necessitates broad, diverse involvement in science and engineering
- Research is done both in our research team and in our engineering organization, organized opportunistically
Teams benefit greatly
- From mutual talent
- From Google’s comparative advantages due to our scale and broad use
- From service-based architecture (“ease” of working in vivo)
Ideal Distributed Computing
Devices
Research Challenges in Ideal Distributed Computing
- Alternative designs that would give better energy efficiency at lower utilization
- Server OS design aimed at many highly-connected machines in one building
- Unifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduce
- Latency reduction
- A general model of replication, including consistency choices, explained and codified
- Machine learning techniques applied to monitoring/controlling such systems
- Automatic, dynamic, world-wide placement of data & computation to minimize latency and/or cost, given constraints on
- Building retrieval systems that efficiently and usably deal with ACLs
- Holistic models of privacy
- The user interface to the user’s diverse processing and state
Totally Transparent Processing
D: The set of all end-user access devices
- Personal computers, phone, media players, readers, telematics, set-top boxes, appliances, health devices, …
L: The set of all human languages
- Current languages, historical languages, other forms of human notation, possible language specialization, formal languages, …
M: The set of all modalities
- Text, image, audio, video, graphics, other sensor-based data, …
C: The set of all corpora
- The normal web, the deep web, periodicals, books, catalogs, blogs, geodata, scientific datasets, health data, …
For all d in D, all l in L, all m in M, and all c in C
Totally Transparent Processing
“Hybrid” Intelligence
- To extend the capability of people, not in isolation
- Aggregation of empirical signal is exceedingly valuable
- Ex.: feedback in information retrieval, e.g., in ranking or spelling correction
- Ex.: machine learning, e.g., image content analysis, speech recognition with semi-supervised learning
Research Challenges in Transparent Computing amp Hybrid Intelligence
Endless applications with very new user interface implicationsAddressing limits to dataTechniques to integrate user-feedback in acceptable fashionsApproaches to new signalExplanation scale and variance minimization in machine learningInformation fusionlearning across diverse signals ndash The Combination Hypothesis more generallyUsability devices and subpopulationsPrivacy
Domains of Application
Search enginesTranslationSpeech recognitionVision
Remedial EducationPersonal healthEpidemiologyEconomic predictionSocietalenvironmental optimizationSocial Networking in ever more cleveruseful ways Humanities and Social SciencesMulti-player gaming
Translation
Machine Translation Google
Statistical Machine TranslationModel translation process with a statistical modelLearning from data monolingual amp bilingual
More data better translation qualityComputationally expensive approach
Models have many hundreds of Gigabyte of data(Moores law helps here)
Applying syntax information as a signal
ResultsMuch better translation qualityOngoing progress
More research groups 58 languages (so far)
recently Haitian Creole Urdu Georgian Latin
Grand Challenges
Morphologytranslating into morphologically rich languageseg Russian Hungarianneed morphology-aware translation models
Reliabilitysome translation mistakes more severe than others
hotel - MontrealHeath Ledger - Tom Cruise
Research How to detect crazy translations
Long-distance reorderingsimple case SVO SOV(one) approach parse source amp reorder
issue parsing accuracy for out-of-domain texts
Finding all Training Data
How about Poetry
Paper at EMNLP 2010 conferenceldquoPoeticrdquo Statistical Machine Translation Rhyme and Meter DGenzel JUszkoreit FOch EMNLP 2010
ApproachEnforce meter and rhyme as extra constraints(similar to language model)Eg iambic pentameter stress pattern 0101010101Produce most probable translation that obeys constraints(Function follows form)
Example output (couplet in amphibrachic tetrameterAn officer stated that three were arrestedand that the equipment is currently tested
Speech
Goals for Speech Technology at Google
Much of the worldrsquos information is spoken ndash we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Voicemail transcription Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech io (every applicationservice every usage scenario every language)
How do we get thereDelivery from the cloud ndash support constant iteration and refinement
Operating at large scale ndash train huge statistical models on huge amounts of data
Learning from use - without human transcription
ChallengesHow do we grow the model to take advantage of the data (richer models of accent speaker noise etc)Huge computational demandsInfrastructure demands ndash parallelization ndash leverage Google software environment
Training Acoustic Models wUnsupervised Learning
Supervised vs unsupervised training - hours of
data vs error rate
Vision
Computer Vision
Advance state-of-the art in 3 key areas of imageaudiovideo analysis and apply results to our multimedia products
Semantic Interpretation Generate human understandable description of content (eg auto-tagging videos on YouTube Image annotation porn classification etc)Matching Find similar entities from a large corpus (eg find similar on image search video fingerprinting for YouTube etc )Synthesis Generate better imagesvideo by understanding the statistics of a large corpus of images (eg better facades in 3D building on Google Earth automatic shadow removal from areal images etc)
Semantic Interpretation sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in be careful about what keywords they use and in general try to make their video searchableMany uploaders donrsquot have the motivation or energy to provide proper metadataNoisy metadata hurts everyone ndash spam misspellings 1337 acronyms etc
Cloud-based ComputingStructured Data
Structured Data on the Web
Discovery and search for structured dataThe deep Web -- significant gap in coverageStructured tables on the Web -- not leveraged in search
Enable easy creation management sharing and publishing of structured data
Fusion Tables wwwgooglecomfusiontables
Google Fusion Tables host manage collaborate on visualize and publish data tables online
What can I do with Fusion Tables
Host data online - and stay in controlcontrol can be at the level of columns or rows
Re-use data without making copies
Collaborate on the detailsMerge data from multiple tablesComment on individual rows columns or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
Easily Create Informative Maps
baby steps towards the dream platform
DEMOcircle of blue
Cloud-based ComputingPrediction
1 Upload
2 Train
Upload your training data toGoogle Storage
Build a model from your data
Make new predictions3 Predict
Machine learning as a web serviceSmart Apps for every developer
- RESTful HTTP service- Simple integration
Prediction API
Under the hood many classifiersregressorsRecent research efficient and theoretically principled methods for distributed learning (NIPS-09 HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Reseach and Optimization
Operations Research ChallengesSize Optimization is often NP Complete
Increasing the size by 1 doubles the search spaceThe tools are barely keeping up with the problems
Uncertainty Data is often fuzzy How do you route cars when there are roadblocks new one-ways traffic jamsCan you use optimization in on-line algorithm connected to usersHow well can you optimize against forecasted data how do you react if the forecast is bad
User expectation and requirementsThe definition of problems is also unclear What is the objective What is a good solution Can I violate this requirement By how much
Operations Research Opportunities
Machine Learning can help us in two waysBy providing guidance towards good solutionsBy qualifying valid solutionsBy reducing the search space
Large computing resources means we can try a bit harderCrowd-Sourcing means better data better feedback better evaluations of algorithms and solutionsHaving all our code open-source means we can collaborate on building the best set of tools
See httpcodegooglecompor-tools
Semantic Processing
Web Inference and LearningGoal better understanding of Web content and user intentMethod algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context
Does this sentence answer that question
Will this user click on that ad
Learning create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference what are the possible classes for each instance
Combination wins
Combined graph14M nodes75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing ldquoin the cloudrdquo
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed
Sub-units are distributed to separate machines
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled
Result is reassembled into a finished image
Global-scale earth observation and informatics platformFor public benefit and to support emerging green economyHelp science come out of research lab and into operational use at scaleUnprecedented catalog of earth observation data for mining and analysisPromote transparency reproducibility collaboration ldquoopen sciencerdquo
Very fast computation of scientific map productsIntrinsically-parallel pixel processing systemBuilt-in Google algorithms as well as user-suppliedEarth Engine API for 3rd party algorithm developmentAccess control versioning provenanceOnline and desktop versions (open source desktop version)
On a lot of useful dataEvery available Landsat and MODIS scene (more satellites coming)Commercial datasets (very high resolution satellite imagery)Environmental data (atmospheric ocean terrestrial)User-supplied (ex in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset LandsatPeru 60 Landsat scenes (3Gpix 20GB) per coveringWorld 8000 Landsat scenes (2TB) per coveringComplete global coverage every 16 daysOperating since 1972 historical archive holds ~4PBUS NASA EOS approaching ~10PB of Earth images
Europe European Space Agency (ESA)ESA satellite missions MERIS Envisat othersSpotImage (France) 20M SPOT images since 1986 10000 new images collected daily 5+ PB archiveESA Launching Sentinel-1 in 2011
Representative Examples
Envisat Gulf Oil Spill June 2010 (ESA))MERIS Hurricane Isabel Sept 2003 (ESA) Spot Image Xingu Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns hisher data
A platform thathellipProvides a dashboard for wellness information amp medical recordsAllows user to connect and interact with a broad group of ldquoadd onrdquo servicesIncludes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q What can you do with12 million books inover 400 languagescomprised of 5 billion pages and 2 trillion wordsall digitized
A Look to the humanities for new questionsHow would you (re)define Victorian literature
What are the differences between the English and Latin editions of Hobbesrsquo Leviathan
How have places changed over the course of history
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions US program Summer 2010
12 projects23 researchers15 universities
European program Winter 2010
10 projects planned $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum developmentExploring computational thinking in K12 (launch late Oct)CS4HS High school computer science (cs4hscom)Undergraduate open source CS curriculum Google Code University (codegooglecomedu)Lantern platform Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development Google Summer of CodeTM
Program GenesisrdquoFlip bits not burgersrdquo during summer holidaysExposure to real-world software development
Students paired with mentor from OS communityExecute to milestones laid out in accepted applicationStipend allows students to concentrate on OS development
20101026 students150 organizations 69 countries
Technology Leadership App Inventor for Android
Visual programming environment for Android mobile devicesHelping people become creators (rather than consumers) of technologyLaunched in Google Labs July 12 2010
httpappinventorgooglelabscomabout
CS4HS European WorkshopsEacutecole Polytechnique Feacutedeacuterale de Lausanne (EPFL) Switzerland Building and Programming Robots
ETH Zurich Switzerland ABZ Ausbildung - und Beratungszentrum fuer Informatikunterricht
Makerere University Uganda Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University United Kingdom Animation11
Oslo University Norway TENK
Queen Mary University United Kingdom cs4fn magazine
RWTH Aachen Germany Bright Brains in Computer Science
Sapienza University of Rome Italy Challenge and Fun with the CS Olympiads
University of Stuttgart Germany UniS2010
Technion Israel High School Computer Science Female Students Visits in Google Impressions Conceptions and Influences
Trinity College Dublin Ireland Computer Programming Outreach B2C
University of Cape Town South Africa Project Umonya
University College Dublin Ireland CS Summer School
University of Warsaw Poland Mastering Programing Skills Workshops for Teachers
Technology Leadership Google Code University
Course content on current computing technologies and paradigmsCS Curriculum SearchTech Talks on CS TopicsTutorials lecture slides problem sets for a variety of topic areas
AJAX ProgrammingAlgorithmsDistributed SystemsWeb SecurityLanguagesPractical Skills (MySQL Linux)
httpcodegooglecomedu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next Due dates August 15 (CS Awards) October 15 (Marketing Awards)Research-awardsgooglecomFocused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)University-relationsgooglecom
PhD Fellowship Program2009 13 students supported in North America2010 15 in North America 16 outside North AmericaOver 150 other scholarships
~1000 interns worldwideCS4HS 1500+ teachers (~100000 students) US amp EMEA
Final Thoughts
Scale of Communication and Computing is profoundEndless opportunity for technical growth
Some large themesMajor new application domains
Google rapidly to innovate in sciencetechnology and value to consumersWe are providing increased support for academic institutions in computer science and related areas
Its a most exciting area in which to innovate
Focus on Innovation that Benefits our UsersFocus on Research and Engineering
Commitment to advancing technologyRich domain of work due to our missionGrand challenge problemsInternal consensus that production issues are often as challengingfun as pure inventionTechnical leverage1 Google Common Distributed System 2 A Focus on Services3 Empiricism and a Holistic Approach to Design
Our Innovation Culture
Focus on talentDistributed across the organization
Impacting Google necessitates broad diverse involvement in science and engineeringResearch is done both in our research team and in our engineering organization organized opportunistically
Teams benefit greatlyFrom mutual talentFrom Googlersquos comparative advantages to our scale and broad useFrom service-based architecture (ldquoeaserdquo of working in vivo)
Ideal Distributed Computing
Devices
Research Challenges in Ideal Distributed Computing
Alternative designs that would give better energy efficiency at lower utilizationServer OS design aimed at many highly-connected machines in one buildingUnifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduceLatency reductionA general model of replication including consistency choices explained and codifiedMachine learning techniques applied to monitoringcontrolling such systemsAutomatic dynamic world-wide placement of data amp computation to minimize latency andor cost given constraints onBuilding retrieval systems that efficiently and usably deal with ACLsHolistic models of privacyThe user interface to the userrsquos diverse processing and state
Totally Transparent Processing
D The set of all end-user access
devices
L The set of all human languages
M The set of all modalities
C The set of all corpora
Personal ComputersPhoneMedia PlayersReadersTelematicsSet-top BoxesAppliancesHealth deviceshellip
Current languagesHistorical languagesOther forms of human notationPossible language specializationFormal languageshellip
TextImageAudioVideoGraphicsOther sensor-based datahellip
The normal webThe deep webPeriodicalsBooksCatalogsBlogsGeodataScientific datasetsHealth datahellip
For all d in D all l in L all m in M and all c in C
Totally Transparent Processing
ldquoHybridrdquo Intelligence
To extend the capability of people not in isolationAggregation of empirical signal is exceedingly valuableEx
Feedback in Information Retrieval eg in ranking or spelling correctionMachine learning eg image content analysis speech recognition with semi-supervised learning
Research Challenges in Transparent Computing amp Hybrid Intelligence
Endless applications with very new user interface implicationsAddressing limits to dataTechniques to integrate user-feedback in acceptable fashionsApproaches to new signalExplanation scale and variance minimization in machine learningInformation fusionlearning across diverse signals ndash The Combination Hypothesis more generallyUsability devices and subpopulationsPrivacy
Domains of Application
- Search engines
- Translation
- Speech recognition
- Vision
- Remedial education
- Personal health
- Epidemiology
- Economic prediction
- Societal/environmental optimization
- Social networking in ever more clever/useful ways
- Humanities and social sciences
- Multi-player gaming
Translation
Machine Translation at Google
Statistical Machine Translation
- Model translation process with a statistical model
- Learning from data: monolingual & bilingual
- More data, better translation quality
- Computationally expensive approach
  - Models have many hundreds of gigabytes of data (Moore's law helps here)
- Applying syntax information as a signal
Results
- Much better translation quality
- Ongoing progress
- More research groups
- 58 languages (so far); recently: Haitian Creole, Urdu, Georgian, Latin
Grand Challenges
- Morphology
  - translating into morphologically rich languages, e.g., Russian, Hungarian
  - need morphology-aware translation models
- Reliability
  - some translation mistakes more severe than others
    - hotel - Montreal
    - Heath Ledger - Tom Cruise
  - Research: How to detect crazy translations?
- Long-distance reordering
  - simple case: between SVO and SOV word order
  - (one) approach: parse source & reorder
  - issue: parsing accuracy for out-of-domain texts
- Finding all Training Data
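The "parse source & reorder" idea can be illustrated with a toy sketch. It assumes the parser has already labelled each top-level constituent as subject (S), verb (V), or object (O); the role tags and the function name `reorder_to_sov` are hypothetical, and a real system would depend on a full syntactic parser, which is exactly where the out-of-domain accuracy issue arises.

```python
# Target order for a subject-object-verb language.
SOV_ORDER = {"S": 0, "O": 1, "V": 2}


def reorder_to_sov(constituents):
    """Reorder parsed (role, text) constituents from SVO into SOV order.

    `constituents` is a list of (role, text) pairs, where role is one of
    "S", "V", "O". Python's sort is stable, so constituents sharing a role
    keep their original relative order.
    """
    return sorted(constituents, key=lambda item: SOV_ORDER[item[0]])


# Hand-labelled stand-in for the output of a source-side parser.
parsed = [("S", "the officer"), ("V", "arrested"), ("O", "three people")]
reordered = reorder_to_sov(parsed)
print(" ".join(text for _, text in reordered))
```

Reordering the source before translation lets the later translation step work mostly monotonically, so a labelling error from the parser propagates directly into the word order of the output.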
How about Poetry?
Paper at EMNLP 2010 conference: "Poetic" Statistical Machine Translation: Rhyme and Meter. D. Genzel, J. Uszkoreit, F. Och. EMNLP 2010
Approach
- Enforce meter and rhyme as extra constraints (similar to language model)
- E.g., iambic pentameter: stress pattern 0101010101
- Produce most probable translation that obeys constraints
- (Function follows form)
Example output (couplet in amphibrachic tetrameter):
  An officer stated that three were arrested
  and that the equipment is currently tested
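The meter constraint can be illustrated with a small sketch. This is not the method from the paper, which applies the constraints during decoding; here a few finished candidates are simply filtered by stress pattern and the highest-scoring survivor is kept. The stress lexicon, the candidate scores, and all function names are hypothetical, and giving each monosyllable a fixed stress is a simplification.

```python
# Stress patterns of two feet: 0 = unstressed, 1 = stressed syllable.
FEET = {"iamb": "01", "amphibrach": "010"}
FEET_PER_LINE = {"tetrameter": 4, "pentameter": 5}

# Hypothetical hand-written stress lexicon, for illustration only.
STRESS = {
    "an": "0", "officer": "100", "stated": "10", "said": "1", "that": "0",
    "three": "1", "people": "10", "were": "0", "arrested": "010",
}


def meter_pattern(foot, length):
    """Stress pattern of a whole line, e.g. iambic pentameter -> 0101010101."""
    return FEET[foot] * FEET_PER_LINE[length]


def stress_of(line):
    """Concatenate the lexicon stress patterns of the words in a line."""
    return "".join(STRESS[word] for word in line.lower().split())


def best_metrical(candidates, foot, length):
    """Return the highest-scoring candidate whose stress pattern fits the meter."""
    target = meter_pattern(foot, length)
    allowed = [(score, text) for text, score in candidates
               if stress_of(text) == target]
    return max(allowed)[1] if allowed else None


# Made-up model scores: the more probable candidate breaks the meter.
candidates = [
    ("An officer said that three people were arrested", 0.6),
    ("An officer stated that three were arrested", 0.3),
]
chosen = best_metrical(candidates, "amphibrach", "tetrameter")
print(chosen)
```

The higher-scoring candidate is rejected because its stress pattern does not match, which is the sense in which function follows form.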
Speech
Goals for Speech Technology at Google
- Much of the world's information is spoken - we need to recognize it before we can organize it
  - YouTube transcription and translation (breaking the language barrier for YouTube access)
  - Voicemail transcription
- Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
  - Spoken input and output is key to usability
- Our goal is completely ubiquitous availability of speech I/O (every application/service, every usage scenario, every language)
How do we get there?
- Delivery from the cloud - support constant iteration and refinement
- Operating at large scale - train huge statistical models on huge amounts of data
- Learning from use - without human transcription
Challenges
- How do we grow the model to take advantage of the data? (richer models of accent, speaker, noise, etc.)
- Huge computational demands
- Infrastructure demands - parallelization - leverage Google software environment
Training Acoustic Models w/ Unsupervised Learning
[Figure: Supervised vs. unsupervised training - hours of data vs. error rate]
Vision
Computer Vision
Advance the state of the art in 3 key areas of image/audio/video analysis and apply results to our multimedia products
- Semantic Interpretation: Generate human-understandable description of content (e.g., auto-tagging videos on YouTube, image annotation, porn classification, etc.)
- Matching: Find similar entities from a large corpus (e.g., find similar on image search, video fingerprinting for YouTube, etc.)
- Synthesis: Generate better images/video by understanding the statistics of a large corpus of images (e.g., better facades in 3D buildings on Google Earth, automatic shadow removal from aerial images, etc.)
Semantic Interpretation sample problem - Video Annotation
- Video metadata has a cognitive cost on the user because they have to type it in, be careful about what keywords they use, and in general try to make their video searchable
- Many uploaders don't have the motivation or energy to provide proper metadata
- Noisy metadata hurts everyone - spam, misspellings, 1337, acronyms, etc.
Cloud-based Computing: Structured Data
Structured Data on the Web
- Discovery and search for structured data
  - The deep Web -- significant gap in coverage
  - Structured tables on the Web -- not leveraged in search
- Enable easy creation, management, sharing, and publishing of structured data
  - Fusion Tables: www.google.com/fusiontables
Google Fusion Tables: host, manage, collaborate on, visualize, and publish data tables online
What can I do with Fusion Tables?
- Host data online - and stay in control
  - control can be at the level of columns or rows
- Re-use data without making copies
- Collaborate on the details
  - Merge data from multiple tables
  - Comment on individual rows, columns, or cells
- Make a map (or chart or timeline) in minutes
- Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload; Attribution recorded
Easily Create Informative Maps
baby steps towards the dream platform
DEMO: circle of blue
Cloud-based Computing: Prediction
Prediction API
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
Machine learning as a web service: Smart Apps for every developer
- RESTful HTTP service
- Simple integration
Under the API
- Under the hood: many classifiers/regressors
- Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
  - Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
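The upload / train / predict workflow can be mimicked locally with a deliberately tiny stand-in. This sketch is not the Prediction API and makes no claim about its endpoints or models: the class `TinyTextClassifier`, its method names, and the word-count scoring rule are all hypothetical and chosen only to show the three stages in order.

```python
from collections import Counter, defaultdict


class TinyTextClassifier:
    """Local stand-in for an upload / train / predict service (hypothetical)."""

    def __init__(self):
        self.rows = []          # uploaded (label, text) training examples
        self.word_counts = {}   # label -> Counter of words seen with that label

    def upload(self, rows):
        """Stage 1: store labelled training data."""
        self.rows.extend(rows)

    def train(self):
        """Stage 2: build a model, here just per-label word counts."""
        counts = defaultdict(Counter)
        for label, text in self.rows:
            counts[label].update(text.lower().split())
        self.word_counts = dict(counts)

    def predict(self, text):
        """Stage 3: return the label whose training words overlap the most."""
        if not self.word_counts:
            raise RuntimeError("call train() before predict()")
        words = text.lower().split()
        scores = {label: sum(counter[w] for w in words)
                  for label, counter in self.word_counts.items()}
        return max(scores, key=scores.get)


service = TinyTextClassifier()
service.upload([
    ("english", "the cat sat on the mat"),
    ("french", "le chat est sur le tapis"),
    ("english", "where is the station"),
    ("french", "où est la gare"),
])
service.train()
print(service.predict("the station is near"))
```

A hosted service would replace the in-memory list with stored training data and the word counts with real classifiers or regressors, but the caller still sees the same three steps.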
Operations Research and Optimization
Operations Research Challenges
- Size: Optimization is often NP-complete
  - Increasing the size by 1 doubles the search space
  - The tools are barely keeping up with the problems
- Uncertainty: Data is often fuzzy
  - How do you route cars when there are roadblocks, new one-ways, traffic jams?
  - Can you use optimization in an on-line algorithm connected to users?
  - How well can you optimize against forecasted data? How do you react if the forecast is bad?
- User expectation and requirements: The definition of problems is also unclear
  - What is the objective? What is a good solution? Can I violate this requirement? By how much?
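The remark that adding one element doubles the search space can be made concrete with a brute-force sketch. It assumes a 0/1 knapsack problem as the example, where each of n items is either taken or left, giving 2^n candidate selections; the function name and the item data are hypothetical.

```python
from itertools import product


def brute_force_knapsack(weights, values, capacity):
    """Try every take/leave selection; return (best value, best selection, examined)."""
    best_value = 0
    best_choice = (0,) * len(weights)
    examined = 0
    for choice in product((0, 1), repeat=len(weights)):
        examined += 1
        weight = sum(w for w, taken in zip(weights, choice) if taken)
        value = sum(v for v, taken in zip(values, choice) if taken)
        if weight <= capacity and value > best_value:
            best_value, best_choice = value, choice
    return best_value, best_choice, examined


weights = [2, 3, 4, 5]
values = [3, 4, 5, 6]
value4, choice4, examined4 = brute_force_knapsack(weights, values, capacity=5)

# One more item: the number of selections to examine doubles.
value5, choice5, examined5 = brute_force_knapsack(weights + [1], values + [1], capacity=5)
print(examined4, examined5)
```

Exhaustive search is therefore only workable for very small instances, which is why practical solvers rely on pruning, bounds, and heuristics rather than enumeration.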
Operations Research Opportunities
- Machine Learning can help us in three ways
  - By providing guidance towards good solutions
  - By qualifying valid solutions
  - By reducing the search space
- Large computing resources mean we can try a bit harder
- Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
- Having all our code open-source means we can collaborate on building the best set of tools
  - See http://code.google.com/p/or-tools
Semantic Processing
Web Inference and Learning
- Goal: better understanding of Web content and user intent
- Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
  - How to interpret this term in this context?
  - Does this sentence answer that question?
  - Will this user click on that ad?
- Learning: create concise representations to support good inferences
Meaning from the Web
- Elementary semantic inference: what are the possible classes for each instance?
Combination wins
- Combined graph: 14M nodes, 75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
[Figure: Rondonia, Brazil, in 1975, 1989, and 2001 (UNEP Atlas of our Changing Environment)]
A Sampling of Other Use Cases
- Disease Early Warning: Remote surveillance of disease and prediction of epidemics
- Population Census: Supplements traditional census and mapping in developing regions
- Humanitarian Crisis Mapping: Can detect and monitor a growing range of crisis types
- Water Resources: Monitor water quality and availability, and alleviate water shortage problems
- Food Security: Famine early warning, rainfall and water requirements estimations, agr. production estimates, and irrigation and fertilizer supply & demand
- Global Education Programs
Parallel Geo-Processing "in the cloud" (a brief illustration)
- Original image is divided into 256px sub-units
- Sub-units are distributed to separate machines, where they can be processed in parallel
- Thousands can be processed simultaneously
- Result is reassembled into a finished image
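The split, distribute, process, and reassemble steps can be sketched on a single machine. This is an illustration only, not Earth Engine's implementation: worker threads stand in for separate machines, the image is a plain list of pixel rows, and the per-tile operation (inverting pixel values) and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as on the slide


def split_into_tiles(image, tile=TILE):
    """Cut a 2-D image (list of rows) into (top, left, block) sub-units."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            block = [row[left:left + tile] for row in image[top:top + tile]]
            tiles.append((top, left, block))
    return tiles


def process_tile(tile_info):
    """Per-tile work; here each pixel is inverted as a placeholder operation."""
    top, left, block = tile_info
    return top, left, [[255 - px for px in row] for row in block]


def reassemble(tiles, height, width):
    """Paste processed sub-units back at their original offsets."""
    out = [[0] * width for _ in range(height)]
    for top, left, block in tiles:
        for dy, row in enumerate(block):
            out[top + dy][left:left + len(row)] = row
    return out


def process_image(image, tile=TILE, workers=4):
    """Split, process the sub-units concurrently, and reassemble the result."""
    height, width = len(image), len(image[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = list(pool.map(process_tile, split_into_tiles(image, tile)))
    return reassemble(processed, height, width)


# A 600 x 520 test image whose size is not a multiple of the tile size.
image = [[(x + y) % 256 for x in range(600)] for y in range(520)]
result = process_image(image)
```

Because each sub-unit carries its own offset, the tiles can finish in any order and on any worker, and edge tiles smaller than 256px need no special handling.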
Overview
- Global-scale earth observation and informatics platform
  - For public benefit and to support emerging green economy
  - Help science come out of research lab and into operational use at scale
  - Unprecedented catalog of earth observation data for mining and analysis
  - Promote transparency, reproducibility, collaboration, "open science"
- Very fast computation of scientific map products
  - Intrinsically-parallel pixel processing system
  - Built-in Google algorithms as well as user-supplied
  - Earth Engine API for 3rd party algorithm development
  - Access control, versioning, provenance
  - Online and desktop versions (open source desktop version)
- On a lot of useful data
  - Every available Landsat and MODIS scene (more satellites coming)
  - Commercial datasets (very high resolution satellite imagery)
  - Environmental data (atmospheric, ocean, terrestrial)
  - User-supplied (ex. in-situ data collected via Android phones)
Scale of Data
- US: Satellite imagery dataset Landsat
  - Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
  - World: 8000 Landsat scenes (2 TB) per covering
  - Complete global coverage every 16 days
  - Operating since 1972; historical archive holds ~4PB
- US: NASA EOS approaching ~10PB of Earth images
- Europe: European Space Agency (ESA)
  - ESA satellite missions: MERIS, Envisat, others
  - SpotImage (France): 20M SPOT images since 1986; 10000 new images collected daily; 5+ PB archive
  - ESA: Launching Sentinel-1 in 2011
Representative Examples
- Envisat: Gulf Oil Spill, June 2010 (ESA)
- MERIS: Hurricane Isabel, Sept 2003 (ESA)
- Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health: personal health dashboard
- Launched in May 2008
- Major update Sept 2010
- User controls and owns his/her data
- A platform that…
  - Provides a dashboard for wellness information & medical records
  - Allows user to connect and interact with a broad group of "add on" services
  - Includes a non-tethered PHR
Crisis Response
Person Finder
- Pre-Earthquake - Aug 26, 2009
- 1 Day After Earthquake - Jan 13, 2010
- 13 Days After Earthquake - Jan 25, 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
- How would you (re)define Victorian literature?
- What are the differences between the English and Latin editions of Hobbes' Leviathan?
- How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
- US program, Summer 2010: 12 projects, 23 researchers, 15 universities
- European program, Winter 2010: 10 projects planned
- $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
- Exploring computational thinking in K12 (launch late Oct)
- CS4HS: High school computer science (cs4hs.com)
- Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
- Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code(TM)
- Program genesis: "Flip bits, not burgers" during summer holidays
- Exposure to real-world software development
- Students paired with mentor from OS community
- Execute to milestones laid out in accepted application
- Stipend allows students to concentrate on OS development
- 2010: 1,026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
- Visual programming environment for Android mobile devices
- Helping people become creators (rather than consumers) of technology
- Launched in Google Labs July 12, 2010
- http://appinventor.googlelabs.com/about
CS4HS European Workshops
- École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
- ETH Zurich, Switzerland: ABZ Ausbildungs- und Beratungszentrum fuer Informatikunterricht
- Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
- Manchester University, United Kingdom: Animation11
- Oslo University, Norway: TENK
- Queen Mary University, United Kingdom: cs4fn magazine
- RWTH Aachen, Germany: Bright Brains in Computer Science
- Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
- University of Stuttgart, Germany: UniS2010
- Technion, Israel: High School Computer Science Female Students Visits in Google: Impressions, Conceptions and Influences
- Trinity College Dublin, Ireland: Computer Programming Outreach B2C
- University of Cape Town, South Africa: Project Umonya
- University College Dublin, Ireland: CS Summer School
- University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
- Course content on current computing technologies and paradigms
- CS Curriculum Search
- Tech Talks on CS Topics
- Tutorials, lecture slides, problem sets for a variety of topic areas
  - AJAX Programming
  - Algorithms
  - Distributed Systems
  - Web Security
  - Languages
  - Practical Skills (MySQL, Linux)
- http://code.google.com/edu
Supporting our Academic Institutions
- Research Awards Programs - 230+ projects funded in the last year
  - Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
  - research-awards@google.com
  - Focused Grant Program
- Visiting Faculty Program - 20 faculty (ongoing)
  - university-relations@google.com
- PhD Fellowship Program
  - 2009: 13 students supported in North America
  - 2010: 15 in North America, 16 outside North America
  - Over 150 other scholarships
- ~1000 interns worldwide
- CS4HS: 1500+ teachers (~100,000 students), US & EMEA
Final Thoughts
- Scale of communication and computing is profound
- Endless opportunity for technical growth
  - Some large themes
  - Major new application domains
- Google aims to innovate rapidly in science/technology and in value to consumers
- We are providing increased support for academic institutions in computer science and related areas
- It's a most exciting area in which to innovate
Our Innovation Culture
- Focus on talent
  - Distributed across the organization
- Impacting Google necessitates broad, diverse involvement in science and engineering
- Research is done both in our research team and in our engineering organization, organized opportunistically
- Teams benefit greatly
  - From mutual talent
  - From Google's comparative advantages due to our scale and broad use
  - From service-based architecture ("ease" of working in vivo)
Ideal Distributed Computing
Devices
Research Challenges in Ideal Distributed Computing
Alternative designs that would give better energy efficiency at lower utilizationServer OS design aimed at many highly-connected machines in one buildingUnifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduceLatency reductionA general model of replication including consistency choices explained and codifiedMachine learning techniques applied to monitoringcontrolling such systemsAutomatic dynamic world-wide placement of data amp computation to minimize latency andor cost given constraints onBuilding retrieval systems that efficiently and usably deal with ACLsHolistic models of privacyThe user interface to the userrsquos diverse processing and state
Totally Transparent Processing
D The set of all end-user access
devices
L The set of all human languages
M The set of all modalities
C The set of all corpora
Personal ComputersPhoneMedia PlayersReadersTelematicsSet-top BoxesAppliancesHealth deviceshellip
Current languagesHistorical languagesOther forms of human notationPossible language specializationFormal languageshellip
TextImageAudioVideoGraphicsOther sensor-based datahellip
The normal webThe deep webPeriodicalsBooksCatalogsBlogsGeodataScientific datasetsHealth datahellip
For all d in D all l in L all m in M and all c in C
Totally Transparent Processing
ldquoHybridrdquo Intelligence
To extend the capability of people not in isolationAggregation of empirical signal is exceedingly valuableEx
Feedback in Information Retrieval eg in ranking or spelling correctionMachine learning eg image content analysis speech recognition with semi-supervised learning
Research Challenges in Transparent Computing amp Hybrid Intelligence
Endless applications with very new user interface implicationsAddressing limits to dataTechniques to integrate user-feedback in acceptable fashionsApproaches to new signalExplanation scale and variance minimization in machine learningInformation fusionlearning across diverse signals ndash The Combination Hypothesis more generallyUsability devices and subpopulationsPrivacy
Domains of Application
Search enginesTranslationSpeech recognitionVision
Remedial EducationPersonal healthEpidemiologyEconomic predictionSocietalenvironmental optimizationSocial Networking in ever more cleveruseful ways Humanities and Social SciencesMulti-player gaming
Translation
Machine Translation Google
Statistical Machine TranslationModel translation process with a statistical modelLearning from data monolingual amp bilingual
More data better translation qualityComputationally expensive approach
Models have many hundreds of Gigabyte of data(Moores law helps here)
Applying syntax information as a signal
ResultsMuch better translation qualityOngoing progress
More research groups 58 languages (so far)
recently Haitian Creole Urdu Georgian Latin
Grand Challenges
Morphologytranslating into morphologically rich languageseg Russian Hungarianneed morphology-aware translation models
Reliabilitysome translation mistakes more severe than others
hotel - MontrealHeath Ledger - Tom Cruise
Research How to detect crazy translations
Long-distance reorderingsimple case SVO SOV(one) approach parse source amp reorder
issue parsing accuracy for out-of-domain texts
Finding all Training Data
How about Poetry
Paper at EMNLP 2010 conferenceldquoPoeticrdquo Statistical Machine Translation Rhyme and Meter DGenzel JUszkoreit FOch EMNLP 2010
ApproachEnforce meter and rhyme as extra constraints(similar to language model)Eg iambic pentameter stress pattern 0101010101Produce most probable translation that obeys constraints(Function follows form)
Example output (couplet in amphibrachic tetrameterAn officer stated that three were arrestedand that the equipment is currently tested
Speech
Goals for Speech Technology at Google
Much of the worldrsquos information is spoken ndash we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Voicemail transcription Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech io (every applicationservice every usage scenario every language)
How do we get thereDelivery from the cloud ndash support constant iteration and refinement
Operating at large scale ndash train huge statistical models on huge amounts of data
Learning from use - without human transcription
ChallengesHow do we grow the model to take advantage of the data (richer models of accent speaker noise etc)Huge computational demandsInfrastructure demands ndash parallelization ndash leverage Google software environment
Training Acoustic Models wUnsupervised Learning
Supervised vs unsupervised training - hours of
data vs error rate
Vision
Computer Vision
Advance state-of-the art in 3 key areas of imageaudiovideo analysis and apply results to our multimedia products
Semantic Interpretation Generate human understandable description of content (eg auto-tagging videos on YouTube Image annotation porn classification etc)Matching Find similar entities from a large corpus (eg find similar on image search video fingerprinting for YouTube etc )Synthesis Generate better imagesvideo by understanding the statistics of a large corpus of images (eg better facades in 3D building on Google Earth automatic shadow removal from areal images etc)
Semantic Interpretation sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in be careful about what keywords they use and in general try to make their video searchableMany uploaders donrsquot have the motivation or energy to provide proper metadataNoisy metadata hurts everyone ndash spam misspellings 1337 acronyms etc
Cloud-based ComputingStructured Data
Structured Data on the Web
Discovery and search for structured dataThe deep Web -- significant gap in coverageStructured tables on the Web -- not leveraged in search
Enable easy creation management sharing and publishing of structured data
Fusion Tables wwwgooglecomfusiontables
Google Fusion Tables host manage collaborate on visualize and publish data tables online
What can I do with Fusion Tables
Host data online - and stay in controlcontrol can be at the level of columns or rows
Re-use data without making copies
Collaborate on the detailsMerge data from multiple tablesComment on individual rows columns or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
Easily Create Informative Maps
baby steps towards the dream platform
DEMOcircle of blue
Cloud-based ComputingPrediction
1 Upload
2 Train
Upload your training data toGoogle Storage
Build a model from your data
Make new predictions3 Predict
Machine learning as a web serviceSmart Apps for every developer
- RESTful HTTP service- Simple integration
Prediction API
Under the hood many classifiersregressorsRecent research efficient and theoretically principled methods for distributed learning (NIPS-09 HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Reseach and Optimization
Operations Research ChallengesSize Optimization is often NP Complete
Increasing the size by 1 doubles the search spaceThe tools are barely keeping up with the problems
Uncertainty Data is often fuzzy How do you route cars when there are roadblocks new one-ways traffic jamsCan you use optimization in on-line algorithm connected to usersHow well can you optimize against forecasted data how do you react if the forecast is bad
User expectation and requirementsThe definition of problems is also unclear What is the objective What is a good solution Can I violate this requirement By how much
Operations Research Opportunities
Machine Learning can help us in two waysBy providing guidance towards good solutionsBy qualifying valid solutionsBy reducing the search space
Large computing resources means we can try a bit harderCrowd-Sourcing means better data better feedback better evaluations of algorithms and solutionsHaving all our code open-source means we can collaborate on building the best set of tools
See httpcodegooglecompor-tools
Semantic Processing
Web Inference and LearningGoal better understanding of Web content and user intentMethod algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context
Does this sentence answer that question
Will this user click on that ad
Learning create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference what are the possible classes for each instance
Combination wins
Combined graph14M nodes75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing ldquoin the cloudrdquo
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed
Sub-units are distributed to separate machines
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled
Result is reassembled into a finished image
Global-scale earth observation and informatics platformFor public benefit and to support emerging green economyHelp science come out of research lab and into operational use at scaleUnprecedented catalog of earth observation data for mining and analysisPromote transparency reproducibility collaboration ldquoopen sciencerdquo
Very fast computation of scientific map productsIntrinsically-parallel pixel processing systemBuilt-in Google algorithms as well as user-suppliedEarth Engine API for 3rd party algorithm developmentAccess control versioning provenanceOnline and desktop versions (open source desktop version)
On a lot of useful dataEvery available Landsat and MODIS scene (more satellites coming)Commercial datasets (very high resolution satellite imagery)Environmental data (atmospheric ocean terrestrial)User-supplied (ex in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset LandsatPeru 60 Landsat scenes (3Gpix 20GB) per coveringWorld 8000 Landsat scenes (2TB) per coveringComplete global coverage every 16 daysOperating since 1972 historical archive holds ~4PBUS NASA EOS approaching ~10PB of Earth images
Europe European Space Agency (ESA)ESA satellite missions MERIS Envisat othersSpotImage (France) 20M SPOT images since 1986 10000 new images collected daily 5+ PB archiveESA Launching Sentinel-1 in 2011
Representative Examples
Envisat Gulf Oil Spill June 2010 (ESA))MERIS Hurricane Isabel Sept 2003 (ESA) Spot Image Xingu Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns hisher data
A platform thathellipProvides a dashboard for wellness information amp medical recordsAllows user to connect and interact with a broad group of ldquoadd onrdquo servicesIncludes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q What can you do with12 million books inover 400 languagescomprised of 5 billion pages and 2 trillion wordsall digitized
A Look to the humanities for new questionsHow would you (re)define Victorian literature
What are the differences between the English and Latin editions of Hobbesrsquo Leviathan
How have places changed over the course of history
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions US program Summer 2010
12 projects23 researchers15 universities
European program Winter 2010
10 projects planned $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum developmentExploring computational thinking in K12 (launch late Oct)CS4HS High school computer science (cs4hscom)Undergraduate open source CS curriculum Google Code University (codegooglecomedu)Lantern platform Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development Google Summer of CodeTM
Program GenesisrdquoFlip bits not burgersrdquo during summer holidaysExposure to real-world software development
Students paired with mentor from OS communityExecute to milestones laid out in accepted applicationStipend allows students to concentrate on OS development
20101026 students150 organizations 69 countries
Technology Leadership App Inventor for Android
Visual programming environment for Android mobile devicesHelping people become creators (rather than consumers) of technologyLaunched in Google Labs July 12 2010
httpappinventorgooglelabscomabout
CS4HS European WorkshopsEacutecole Polytechnique Feacutedeacuterale de Lausanne (EPFL) Switzerland Building and Programming Robots
ETH Zurich Switzerland ABZ Ausbildung - und Beratungszentrum fuer Informatikunterricht
Makerere University Uganda Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University United Kingdom Animation11
Oslo University Norway TENK
Queen Mary University United Kingdom cs4fn magazine
RWTH Aachen Germany Bright Brains in Computer Science
Sapienza University of Rome Italy Challenge and Fun with the CS Olympiads
University of Stuttgart Germany UniS2010
Technion Israel High School Computer Science Female Students Visits in Google Impressions Conceptions and Influences
Trinity College Dublin Ireland Computer Programming Outreach B2C
University of Cape Town South Africa Project Umonya
University College Dublin Ireland CS Summer School
University of Warsaw Poland Mastering Programing Skills Workshops for Teachers
Technology Leadership Google Code University
Course content on current computing technologies and paradigmsCS Curriculum SearchTech Talks on CS TopicsTutorials lecture slides problem sets for a variety of topic areas
AJAX ProgrammingAlgorithmsDistributed SystemsWeb SecurityLanguagesPractical Skills (MySQL Linux)
httpcodegooglecomedu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next Due dates August 15 (CS Awards) October 15 (Marketing Awards)Research-awardsgooglecomFocused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)University-relationsgooglecom
PhD Fellowship Program2009 13 students supported in North America2010 15 in North America 16 outside North AmericaOver 150 other scholarships
~1000 interns worldwideCS4HS 1500+ teachers (~100000 students) US amp EMEA
Final Thoughts
Scale of Communication and Computing is profoundEndless opportunity for technical growth
Some large themesMajor new application domains
Google rapidly to innovate in sciencetechnology and value to consumersWe are providing increased support for academic institutions in computer science and related areas
Its a most exciting area in which to innovate
Ideal Distributed Computing
Devices
Research Challenges in Ideal Distributed Computing
Alternative designs that would give better energy efficiency at lower utilizationServer OS design aimed at many highly-connected machines in one buildingUnifying abstractions for exploiting parallelism beyond inter-transaction parallelism and map-reduceLatency reductionA general model of replication including consistency choices explained and codifiedMachine learning techniques applied to monitoringcontrolling such systemsAutomatic dynamic world-wide placement of data amp computation to minimize latency andor cost given constraints onBuilding retrieval systems that efficiently and usably deal with ACLsHolistic models of privacyThe user interface to the userrsquos diverse processing and state
Totally Transparent Processing
D The set of all end-user access
devices
L The set of all human languages
M The set of all modalities
C The set of all corpora
Personal ComputersPhoneMedia PlayersReadersTelematicsSet-top BoxesAppliancesHealth deviceshellip
Current languagesHistorical languagesOther forms of human notationPossible language specializationFormal languageshellip
TextImageAudioVideoGraphicsOther sensor-based datahellip
The normal webThe deep webPeriodicalsBooksCatalogsBlogsGeodataScientific datasetsHealth datahellip
For all d in D all l in L all m in M and all c in C
Totally Transparent Processing
ldquoHybridrdquo Intelligence
To extend the capability of people not in isolationAggregation of empirical signal is exceedingly valuableEx
Feedback in Information Retrieval eg in ranking or spelling correctionMachine learning eg image content analysis speech recognition with semi-supervised learning
Research Challenges in Transparent Computing amp Hybrid Intelligence
Endless applications with very new user interface implicationsAddressing limits to dataTechniques to integrate user-feedback in acceptable fashionsApproaches to new signalExplanation scale and variance minimization in machine learningInformation fusionlearning across diverse signals ndash The Combination Hypothesis more generallyUsability devices and subpopulationsPrivacy
Domains of Application
- Search engines
- Translation
- Speech recognition
- Vision
- Remedial Education
- Personal health
- Epidemiology
- Economic prediction
- Societal/environmental optimization
- Social Networking in ever more clever/useful ways
- Humanities and Social Sciences
- Multi-player gaming
Translation
Machine Translation at Google
Statistical Machine Translation
- Model translation process with a statistical model
- Learning from data: monolingual & bilingual
- More data, better translation quality
- Computationally expensive approach
- Models have many hundreds of gigabytes of data (Moore's law helps here)
- Applying syntax information as a signal
Results
- Much better translation quality
- Ongoing progress
- More research groups
- 58 languages (so far); recently: Haitian Creole, Urdu, Georgian, Latin
Grand Challenges
Morphology
- Translating into morphologically rich languages, e.g., Russian, Hungarian
- Need morphology-aware translation models
Reliability
- Some translation mistakes are more severe than others
- hotel - Montreal, Heath Ledger - Tom Cruise
- Research: how to detect crazy translations
Long-distance reordering
- Simple case: SVO to SOV
- (One) approach: parse source & reorder
- Issue: parsing accuracy for out-of-domain texts
Finding all Training Data
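The simple reordering case named above (subject-verb-object to subject-object-verb, by parsing the source and then reordering) can be sketched as follows. This is a minimal illustration, not the production approach: the dictionary format standing in for parser output and the function name `reorder_svo_to_sov` are hypothetical, and a real system would get its constituents from a syntactic parser, which is where the out-of-domain accuracy issue arises.

```python
def reorder_svo_to_sov(parsed_clause):
    """Reorder a clause already labeled as subject, verb, object into SOV order.

    `parsed_clause` is a hypothetical parser output: a dict with the keys
    "subject", "verb", and "object", each holding a list of source words.
    """
    return (
        parsed_clause["subject"]
        + parsed_clause["object"]
        + parsed_clause["verb"]
    )


# Hand-built parse of an English (SVO) clause, used only for illustration.
clause = {
    "subject": ["the", "officer"],
    "verb": ["arrested"],
    "object": ["three", "people"],
}

reordered = reorder_svo_to_sov(clause)
print(" ".join(reordered))
```

If the parser mislabels a constituent, the reordering moves the wrong words, so the quality of this step is bounded by the quality of the parse.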
How about Poetry?
Paper at EMNLP 2010 conference: "Poetic" Statistical Machine Translation: Rhyme and Meter. D. Genzel, J. Uszkoreit, F. Och. EMNLP 2010
Approach
- Enforce meter and rhyme as extra constraints (similar to language model)
- E.g., iambic pentameter: stress pattern 0101010101
- Produce most probable translation that obeys constraints (Function follows form)
Example output (couplet in amphibrachic tetrameter):
An officer stated that three were arrested
and that the equipment is currently tested
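The meter constraint can be illustrated with the slide's own stress notation (0 for an unstressed syllable, 1 for a stressed one). The sketch below builds the pattern for a meter and picks the highest-scoring candidate that obeys it. It is a simplification under stated assumptions: the paper applies such constraints during translation, whereas this example only filters finished candidates, and the stress annotations, the second candidate, and the scores are hand-made for illustration.

```python
# Stress notation follows the slide: 0 = unstressed, 1 = stressed syllable.
FEET = {
    "iamb": "01",         # unstressed, stressed
    "amphibrach": "010",  # unstressed, stressed, unstressed
}


def meter_pattern(foot, feet_per_line):
    """Return the full-line stress pattern for a meter."""
    return FEET[foot] * feet_per_line


def obeys_meter(stress_pattern, foot, feet_per_line):
    """True if a candidate line's stress pattern matches the meter exactly."""
    return stress_pattern == meter_pattern(foot, feet_per_line)


def best_candidate(candidates, foot, feet_per_line):
    """Among (text, stress_pattern, score) candidates that obey the meter,
    return the highest-scoring one, or None if none qualifies."""
    valid = [c for c in candidates if obeys_meter(c[1], foot, feet_per_line)]
    return max(valid, key=lambda c: c[2]) if valid else None


iambic_pentameter = meter_pattern("iamb", 5)

# Hypothetical candidates: stress strings and scores are made up by hand.
candidates = [
    ("An officer stated that three were arrested", "010010010010", -4.2),
    ("An officer said that three people were arrested", "0100101010010", -3.1),
]
best = best_candidate(candidates, "amphibrach", 4)
print(iambic_pentameter)
print(best[0])
```

The second candidate has the better score but the wrong number of syllables for amphibrachic tetrameter, so the constraint rejects it. That is the sense in which function follows form.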
Speech
Goals for Speech Technology at Google
- Much of the world's information is spoken – we need to recognize it before we can organize it
- YouTube transcription and translation (breaking the language barrier for YouTube access)
- Voicemail transcription
- Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
- Spoken input and output is key to usability
- Our goal is completely ubiquitous availability of speech I/O (every application/service, every usage scenario, every language)
How do we get there?
- Delivery from the cloud – support constant iteration and refinement
- Operating at large scale – train huge statistical models on huge amounts of data
- Learning from use - without human transcription
Challenges
- How do we grow the model to take advantage of the data (richer models of accent, speaker, noise, etc.)?
- Huge computational demands
- Infrastructure demands – parallelization – leverage Google software environment
Training Acoustic Models w/ Unsupervised Learning
[Figure: Supervised vs. unsupervised training - hours of data vs. error rate]
Vision
Computer Vision
Advance the state of the art in 3 key areas of image/audio/video analysis and apply results to our multimedia products
- Semantic Interpretation: Generate human understandable description of content (e.g., auto-tagging videos on YouTube, image annotation, porn classification, etc.)
- Matching: Find similar entities from a large corpus (e.g., find similar on image search, video fingerprinting for YouTube, etc.)
- Synthesis: Generate better images/video by understanding the statistics of a large corpus of images (e.g., better facades in 3D building on Google Earth, automatic shadow removal from aerial images, etc.)
Semantic Interpretation sample problem - Video Annotation
- Video metadata has a cognitive cost on the user because they have to type it in, be careful about what keywords they use, and in general try to make their video searchable
- Many uploaders don't have the motivation or energy to provide proper metadata
- Noisy metadata hurts everyone – spam, misspellings, 1337, acronyms, etc.
Cloud-based Computing: Structured Data
Structured Data on the Web
- Discovery and search for structured data
  - The deep Web -- significant gap in coverage
  - Structured tables on the Web -- not leveraged in search
- Enable easy creation, management, sharing, and publishing of structured data
Fusion Tables: www.google.com/fusiontables
Google Fusion Tables: host, manage, collaborate on, visualize, and publish data tables online
What can I do with Fusion Tables?
- Host data online - and stay in control
  - Control can be at the level of columns or rows
- Re-use data without making copies
- Collaborate on the details
  - Merge data from multiple tables
  - Comment on individual rows, columns, or cells
- Make a map (or chart or timeline) in minutes
- Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
baby steps towards the dream platform
DEMO: Circle of Blue
Cloud-based Computing: Prediction
Prediction API
Machine learning as a web service: Smart Apps for every developer
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
- RESTful HTTP service
- Simple integration
Under the API
- Under the hood: many classifiers/regressors
- Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
- Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Operations Research and Optimization
Operations Research Challenges
Size
- Optimization is often NP-complete
- Increasing the size by 1 doubles the search space
- The tools are barely keeping up with the problems
Uncertainty
- Data is often fuzzy
- How do you route cars when there are roadblocks, new one-ways, traffic jams?
- Can you use optimization in an on-line algorithm connected to users?
- How well can you optimize against forecasted data? How do you react if the forecast is bad?
User expectation and requirements
- The definition of problems is also unclear
- What is the objective? What is a good solution? Can I violate this requirement? By how much?
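The point that increasing the problem size by 1 doubles the search space can be made concrete. The sketch below assumes the simplest case, n independent yes/no decision variables, so that there are 2^n candidate assignments. That modeling choice and the function names are assumptions for illustration, since real optimization problems have richer variable domains and constraints.

```python
from itertools import product


def search_space_size(n):
    """Number of candidate assignments for n yes/no decision variables."""
    return 2 ** n


def enumerate_assignments(n):
    """Brute-force list of every 0/1 assignment; feasible only for tiny n."""
    return list(product([0, 1], repeat=n))


for n in range(1, 6):
    print(n, search_space_size(n))

# Adding one variable doubles the number of candidates to examine.
growth_ratio = search_space_size(11) // search_space_size(10)
```

Brute-force enumeration is fine for a handful of variables and hopeless for a few hundred, which is why pruning and guidance matter so much in practice.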
Operations Research Opportunities
Machine Learning can help us in three ways
- By providing guidance towards good solutions
- By qualifying valid solutions
- By reducing the search space
- Large computing resources means we can try a bit harder
- Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
- Having all our code open-source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
Semantic Processing
Web Inference and Learning
- Goal: better understanding of Web content and user intent
- Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
  - How to interpret this term in this context?
  - Does this sentence answer that question?
  - Will this user click on that ad?
- Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
[Figure: UNEP Atlas of our Changing Environment - Rondonia, Brazil in 1975, 1989, and 2001]
A Sampling of Other Use Cases
- Disease Early Warning: Remote surveillance of disease and prediction of epidemics
- Population Census: Supplements traditional census and mapping in developing regions
- Humanitarian Crisis Mapping: Can detect and monitor a growing range of crisis types
- Water Resources: Monitor water quality and availability and alleviate water shortage problems
- Food Security: Famine early warning, rainfall and water requirements estimations, agricultural production estimates, and irrigation and fertilizer supply & demand
- Global Education Programs
Parallel Geo-Processing "in the cloud"
(a brief illustration)
- Original image is divided into 256px sub-units
- Sub-units are distributed to separate machines where they can be processed in parallel
- Thousands can be processed simultaneously
- Result is reassembled into a finished image
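The divide, distribute, process, and reassemble steps illustrated above can be sketched on a single machine. This is only an analogy under stated assumptions: a thread pool stands in for the separate machines, the image is a plain 2D list of 8-bit values, and the per-tile operation (inverting each pixel) is a placeholder chosen for the example. None of the function names come from Earth Engine.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as on the slide


def split_into_tiles(image, tile=TILE):
    """Yield (row, col, sub_image) for each tile-sized block of a 2D list."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, c, [row[c:c + tile] for row in image[r:r + tile]]


def process_tile(tile_pixels):
    """Placeholder per-pixel operation (inverts 8-bit values)."""
    return [[255 - p for p in row] for row in tile_pixels]


def process_image(image, tile=TILE, workers=4):
    """Split the image, process tiles in parallel, and reassemble the result."""
    height, width = len(image), len(image[0])
    result = [[None] * width for _ in range(height)]
    tiles = list(split_into_tiles(image, tile))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in the same order as the submitted tiles.
        processed = pool.map(process_tile, (pixels for _, _, pixels in tiles))
        for (r, c, _), out in zip(tiles, processed):
            for dr, row in enumerate(out):
                result[r + dr][c:c + len(row)] = row
    return result


# Small synthetic image: 300 rows by 600 columns, so edge tiles are partial.
image = [[(x + y) % 256 for x in range(600)] for y in range(300)]
output = process_image(image)
```

Because each pixel is processed independently of its neighbors, the tiles need no communication with one another, which is what makes this kind of workload easy to spread across many machines.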
Overview
Global-scale earth observation and informatics platform
- For public benefit and to support emerging green economy
- Help science come out of research lab and into operational use at scale
- Unprecedented catalog of earth observation data for mining and analysis
- Promote transparency, reproducibility, collaboration, "open science"
Very fast computation of scientific map products
- Intrinsically-parallel pixel processing system
- Built-in Google algorithms as well as user-supplied
- Earth Engine API for 3rd party algorithm development
- Access control, versioning, provenance
- Online and desktop versions (open source desktop version)
On a lot of useful data
- Every available Landsat and MODIS scene (more satellites coming)
- Commercial datasets (very high resolution satellite imagery)
- Environmental data (atmospheric, ocean, terrestrial)
- User-supplied (ex: in-situ data collected via Android phones)
Scale of Data
US: Satellite imagery dataset Landsat
- Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
- World: 8000 Landsat scenes (2 TB) per covering
- Complete global coverage every 16 days
- Operating since 1972; historical archive holds ~4 PB
US: NASA EOS approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
- ESA satellite missions: MERIS, Envisat, others
- SpotImage (France): 20M SPOT images since 1986, 10000 new images collected daily, 5+ PB archive
- ESA: Launching Sentinel-1 in 2011
Representative Examples
- Envisat: Gulf Oil Spill, June 2010 (ESA)
- MERIS: Hurricane Isabel, Sept 2003 (ESA)
- Spot Image: Xingu, Brazilian Amazon
View on YouTube
Health (US)
Google Health: personal health dashboard
- Launched in May 2008
- Major update Sept 2010
- User controls and owns his/her data
A platform that…
- Provides a dashboard for wellness information & medical records
- Allows user to connect and interact with a broad group of "add on" services
- Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
- How would you (re)define Victorian literature?
- What are the differences between the English and Latin editions of Hobbes' Leviathan?
- How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program: Summer 2010
- 12 projects, 23 researchers, 15 universities
European program: Winter 2010
- 10 projects planned
$1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
- Exploring computational thinking in K-12 (launch late Oct)
- CS4HS: High school computer science (cs4hs.com)
- Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
- Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
- "Flip bits, not burgers" during summer holidays
- Exposure to real-world software development
- Students paired with mentor from OS community
- Execute to milestones laid out in accepted application
- Stipend allows students to concentrate on OS development
2010: 1,026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
- Visual programming environment for Android mobile devices
- Helping people become creators (rather than consumers) of technology
- Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
- École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
- ETH Zurich, Switzerland: ABZ, Ausbildungs- und Beratungszentrum für Informatikunterricht
- Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
- Manchester University, United Kingdom: Animation11
- Oslo University, Norway: TENK
- Queen Mary University, United Kingdom: cs4fn magazine
- RWTH Aachen, Germany: Bright Brains in Computer Science
- Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
- University of Stuttgart, Germany: UniS2010
- Technion, Israel: High School Computer Science Female Students' Visits in Google: Impressions, Conceptions and Influences
- Trinity College Dublin, Ireland: Computer Programming Outreach B2C
- University of Cape Town, South Africa: Project Umonya
- University College Dublin, Ireland: CS Summer School
- University of Warsaw, Poland: Mastering Programming Skills, Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
- CS Curriculum Search
- Tech Talks on CS Topics
- Tutorials, lecture slides, problem sets for a variety of topic areas
  - AJAX Programming
  - Algorithms
  - Distributed Systems
  - Web Security
  - Languages
  - Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
- Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
- Research-awardsgooglecom
- Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
- University-relationsgooglecom
PhD Fellowship Program
- 2009: 13 students supported in North America
- 2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100000 students), US & EMEA
Final Thoughts
- Scale of Communication and Computing is profound
- Endless opportunity for technical growth
  - Some large themes
  - Major new application domains
- Google aims to innovate rapidly in science/technology and in value to consumers
- We are providing increased support for academic institutions in computer science and related areas
- It's a most exciting area in which to innovate
Totally Transparent Processing
D The set of all end-user access
devices
L The set of all human languages
M The set of all modalities
C The set of all corpora
Personal ComputersPhoneMedia PlayersReadersTelematicsSet-top BoxesAppliancesHealth deviceshellip
Current languagesHistorical languagesOther forms of human notationPossible language specializationFormal languageshellip
TextImageAudioVideoGraphicsOther sensor-based datahellip
The normal webThe deep webPeriodicalsBooksCatalogsBlogsGeodataScientific datasetsHealth datahellip
For all d in D all l in L all m in M and all c in C
Totally Transparent Processing
ldquoHybridrdquo Intelligence
To extend the capability of people not in isolationAggregation of empirical signal is exceedingly valuableEx
Feedback in Information Retrieval eg in ranking or spelling correctionMachine learning eg image content analysis speech recognition with semi-supervised learning
Research Challenges in Transparent Computing amp Hybrid Intelligence
Endless applications with very new user interface implicationsAddressing limits to dataTechniques to integrate user-feedback in acceptable fashionsApproaches to new signalExplanation scale and variance minimization in machine learningInformation fusionlearning across diverse signals ndash The Combination Hypothesis more generallyUsability devices and subpopulationsPrivacy
Domains of Application
Search enginesTranslationSpeech recognitionVision
Remedial EducationPersonal healthEpidemiologyEconomic predictionSocietalenvironmental optimizationSocial Networking in ever more cleveruseful ways Humanities and Social SciencesMulti-player gaming
Translation
Machine Translation Google
Statistical Machine TranslationModel translation process with a statistical modelLearning from data monolingual amp bilingual
More data better translation qualityComputationally expensive approach
Models have many hundreds of Gigabyte of data(Moores law helps here)
Applying syntax information as a signal
ResultsMuch better translation qualityOngoing progress
More research groups 58 languages (so far)
recently Haitian Creole Urdu Georgian Latin
Grand Challenges
Morphology
translating into morphologically rich languages, e.g., Russian, Hungarian
need morphology-aware translation models
Reliability
some translation mistakes are more severe than others
hotel - Montreal
Heath Ledger - Tom Cruise
Research: How to detect crazy translations?
Long-distance reordering
simple case: SVO to SOV
(one) approach: parse source & reorder
issue: parsing accuracy for out-of-domain texts
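As a rough illustration of the parse-and-reorder idea, the Python sketch below takes a clause whose subject, verb, and object spans are already labeled and emits the words in SOV order. The `Clause` type and the hand-labeled example are assumptions made for illustration; a real system would obtain the spans from a parser, which is where parsing accuracy becomes the issue.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clause:
    """A toy 'parse': the clause already split into labeled spans (hypothetical type)."""
    subject: List[str]
    verb: List[str]
    obj: List[str]


def reorder_svo_to_sov(clause: Clause) -> List[str]:
    """Move the verb span after the object span, keeping word order inside each span."""
    return clause.subject + clause.obj + clause.verb


# Hand-labeled example; a real system would get these spans from a parser.
clause = Clause(subject=["the", "officer"], verb=["arrested"], obj=["three", "people"])
reordered = reorder_svo_to_sov(clause)
print(" ".join(reordered))
```

If the parser mislabels a span, the reordering moves the wrong words, so errors on out-of-domain text propagate directly into the translation input.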
Finding all Training Data
How about Poetry?
Paper at EMNLP 2010 conference
“Poetic” Statistical Machine Translation: Rhyme and Meter. D. Genzel, J. Uszkoreit, F. Och. EMNLP 2010
Approach
Enforce meter and rhyme as extra constraints (similar to language model)
E.g., iambic pentameter: stress pattern 0101010101
Produce most probable translation that obeys constraints
(Function follows form)
Example output (couplet in amphibrachic tetrameter):
An officer stated that three were arrested
and that the equipment is currently tested
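The meter constraint can be pictured as a check on the sequence of syllable stresses. The Python sketch below tests a finished line against a stress pattern; it is a simplified after-the-fact check, not the decoding-time constraint described in the paper, and the small stress lexicon is hand-assigned for this example only.

```python
from typing import Dict, List

# Hypothetical stress lexicon: '0' = unstressed syllable, '1' = stressed syllable.
# Entries are hand-assigned for this example, not taken from a real dictionary.
STRESS: Dict[str, str] = {
    "an": "0", "officer": "100", "stated": "10", "that": "0",
    "three": "1", "were": "0", "arrested": "010",
}

IAMBIC_PENTAMETER = "01" * 5          # the 0101010101 pattern from the slide
AMPHIBRACHIC_TETRAMETER = "010" * 4   # four feet of unstressed-stressed-unstressed


def stress_pattern(words: List[str]) -> str:
    """Concatenate the per-word stress strings for a line."""
    return "".join(STRESS[w.lower()] for w in words)


def obeys_meter(words: List[str], meter: str) -> bool:
    """True if the line's stress pattern matches the meter exactly."""
    return stress_pattern(words) == meter


line = "An officer stated that three were arrested".split()
print(stress_pattern(line), obeys_meter(line, AMPHIBRACHIC_TETRAMETER))
```

Under these stress assignments the first line of the example couplet matches amphibrachic tetrameter and fails the iambic pentameter check, since it has twelve syllables rather than ten.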
Speech
Goals for Speech Technology at Google
Much of the world’s information is spoken - we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Voicemail transcription
Mobile is the fastest-growing and most widespread platform for communication and services that has ever existed
Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech I/O (every application/service, every usage scenario, every language)
How do we get there?
Delivery from the cloud - support constant iteration and refinement
Operating at large scale - train huge statistical models on huge amounts of data
Learning from use - without human transcription
Challenges
How do we grow the model to take advantage of the data (richer models of accent, speaker, noise, etc.)?
Huge computational demands
Infrastructure demands - parallelization - leverage Google software environment
Training Acoustic Models w/ Unsupervised Learning
Supervised vs. unsupervised training - hours of data vs. error rate
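One common way to learn from use without human transcription is to keep only the automatically recognized utterances that the recognizer itself is confident about and treat them as training data. The slides do not say which selection method is used, so the Python sketch below is an assumption throughout: the `Hypothesis` type, the confidence field, and the 0.9 threshold are all illustrative.

```python
from typing import List, NamedTuple, Tuple


class Hypothesis(NamedTuple):
    """Recognizer output for one utterance (hypothetical structure)."""
    audio_id: str
    transcript: str
    confidence: float  # recognizer's own confidence, assumed to lie in [0, 1]


def select_training_pairs(
    hyps: List[Hypothesis], threshold: float = 0.9
) -> List[Tuple[str, str]]:
    """Keep (audio, transcript) pairs whose confidence meets the threshold."""
    return [(h.audio_id, h.transcript) for h in hyps if h.confidence >= threshold]


hyps = [
    Hypothesis("utt1", "call my voicemail", 0.97),
    Hypothesis("utt2", "wreck a nice beach", 0.41),
    Hypothesis("utt3", "directions to the airport", 0.92),
]
print(select_training_pairs(hyps))
```

The threshold trades quantity for quality: a lower value admits more automatically labeled data but also more recognition errors.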
Vision
Computer Vision
Advance the state of the art in 3 key areas of image/audio/video analysis and apply results to our multimedia products
Semantic Interpretation: Generate human-understandable description of content (e.g., auto-tagging videos on YouTube, image annotation, porn classification, etc.)
Matching: Find similar entities from a large corpus (e.g., find similar on image search, video fingerprinting for YouTube, etc.)
Synthesis: Generate better images/video by understanding the statistics of a large corpus of images (e.g., better facades in 3D buildings on Google Earth, automatic shadow removal from aerial images, etc.)
Semantic Interpretation sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in, be careful about what keywords they use, and in general try to make their video searchable
Many uploaders don’t have the motivation or energy to provide proper metadata
Noisy metadata hurts everyone - spam, misspellings, 1337, acronyms, etc.
Cloud-based Computing: Structured Data
Structured Data on the Web
Discovery and search for structured data
The deep Web -- significant gap in coverage
Structured tables on the Web -- not leveraged in search
Enable easy creation, management, sharing, and publishing of structured data
Fusion Tables: www.google.com/fusiontables
Google Fusion Tables: host, manage, collaborate on, visualize, and publish data tables online
What can I do with Fusion Tables?
Host data online - and stay in control
control can be at the level of columns or rows
Re-use data without making copies
Collaborate on the details
Merge data from multiple tables
Comment on individual rows, columns, or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
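To make "merge data from multiple tables" concrete, the Python sketch below joins two small tables on a shared key column. It is a generic inner join written for illustration, not the Fusion Tables API; the `merge_tables` helper, the column names, and the sample rows are all made up.

```python
from typing import Dict, List

Row = Dict[str, str]


def merge_tables(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Inner join of two tables (lists of row dicts) on a shared key column."""
    index: Dict[str, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    merged: List[Row] = []
    for row in left:
        for match in index.get(row[key], []):
            merged.append({**row, **match})
    return merged


# Illustrative data only.
stations = [
    {"station": "A", "lat": "46.5", "lon": "6.6"},
    {"station": "B", "lat": "47.4", "lon": "8.5"},
]
readings = [
    {"station": "A", "water_quality": "good"},
    {"station": "C", "water_quality": "poor"},
]
print(merge_tables(stations, readings, key="station"))
```

Rows without a partner in the other table are dropped here; a merged table with coordinates and measurements side by side is the kind of input a map view needs.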
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
baby steps towards the dream platform
DEMO: Circle of Blue
Cloud-based Computing: Prediction
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
Machine learning as a web service
Smart Apps for every developer
- RESTful HTTP service
- Simple integration
Prediction API
Under the hood: many classifiers/regressors
Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
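One simple distributed-learning strategy that keeps network traffic low is to train a model on each data shard locally and send only the resulting weight vectors to be averaged. The Python sketch below shows that idea with a perceptron; it is an illustrative assumption, not necessarily the method of the cited NIPS-09 and HLT-10 work, and the two tiny shards are made up.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, label in {-1, +1})


def train_perceptron(shard: List[Example], dim: int, epochs: int = 5) -> List[float]:
    """Train a perceptron on one shard; only the returned weights leave the machine."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in shard:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # misclassified or on the boundary: update
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def average_weights(models: List[List[float]]) -> List[float]:
    """Combine per-shard models by averaging each weight."""
    return [sum(ws) / len(models) for ws in zip(*models)]


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


# Two made-up shards; the label is +1 when the first feature is positive.
shards = [
    [((2.0, 1.0), 1), ((-2.0, 1.0), -1)],
    [((1.0, -1.0), 1), ((-1.0, -1.0), -1)],
]
local_models = [train_perceptron(s, dim=2) for s in shards]
global_model = average_weights(local_models)
print(global_model)
```

Each machine ships one weight vector instead of its training data or per-example updates, which is the kind of saving in network cost the slide refers to; plain averaging is not guaranteed to work on every dataset, which is part of why principled methods are a research topic.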
Under the API
Operations Research and Optimization
Operations Research Challenges
Size: Optimization is often NP-complete
Increasing the size by 1 doubles the search space
The tools are barely keeping up with the problems
Uncertainty: Data is often fuzzy
How do you route cars when there are roadblocks, new one-ways, traffic jams?
Can you use optimization in an online algorithm connected to users?
How well can you optimize against forecasted data? How do you react if the forecast is bad?
User expectation and requirements
The definition of problems is also unclear: What is the objective? What is a good solution? Can I violate this requirement? By how much?
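The remark that increasing the size by 1 doubles the search space can be checked on a small example. The Python sketch below solves a toy 0/1 knapsack problem by brute force and counts the candidate solutions it examines; the problem instance is made up, and brute force is used only to make the growth visible.

```python
from itertools import product
from typing import List, Tuple


def best_subset(values: List[int], weights: List[int], capacity: int) -> Tuple[int, int]:
    """Brute-force 0/1 knapsack: returns (best value, number of candidates examined)."""
    best, examined = 0, 0
    for choice in product((0, 1), repeat=len(values)):
        examined += 1
        weight = sum(w for w, c in zip(weights, choice) if c)
        if weight <= capacity:
            best = max(best, sum(v for v, c in zip(values, choice) if c))
    return best, examined


# Made-up instance: adding a fourth item doubles the number of candidates.
print(best_subset([6, 10, 12], [1, 2, 3], capacity=5))
print(best_subset([6, 10, 12, 7], [1, 2, 3, 4], capacity=5))
```

With n yes/no decisions there are 2^n candidates, so 3 items give 8 and 4 items give 16; at 60 items exhaustive enumeration is already out of reach, which is why guidance and pruning matter.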
Operations Research Opportunities
Machine Learning can help us in three ways
By providing guidance towards good solutions
By qualifying valid solutions
By reducing the search space
Large computing resources mean we can try a bit harder
Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
Having all our code open-source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
Semantic Processing
Web Inference and Learning
Goal: better understanding of Web content and user intent
Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context?
Does this sentence answer that question?
Will this user click on that ad?
Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning: Remote surveillance of disease and prediction of epidemics
Population Census: Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping: Can detect and monitor a growing range of crisis types
Water Resources: Monitor water quality and availability and alleviate water shortage problems
Food Security: Famine early warning, rainfall and water requirements estimations, agricultural production estimates, and irrigation and fertilizer supply & demand
Global Education Programs
Parallel Geo-Processing “in the cloud”
(a brief illustration)
Original image is divided into 256px sub-units
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled into a finished image
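The divide, distribute, and reassemble steps can be sketched in a few lines of Python. This is a minimal single-machine illustration under stated assumptions: a thread pool stands in for separate machines, the image is a plain list of pixel rows, and the per-pixel inversion in `process_tile` is a placeholder for a real analysis step. None of the function names come from Earth Engine.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Image = List[List[int]]          # rows of 8-bit pixel values
Tile = Tuple[int, int, Image]    # (top row, left column, pixel block)


def split_into_tiles(image: Image, tile: int) -> List[Tile]:
    """Cut the image into tile x tile sub-units (edge tiles may be smaller)."""
    tiles = []
    for r in range(0, len(image), tile):
        for c in range(0, len(image[0]), tile):
            block = [row[c:c + tile] for row in image[r:r + tile]]
            tiles.append((r, c, block))
    return tiles


def process_tile(item: Tile) -> Tile:
    """Placeholder per-pixel operation: invert each 8-bit value."""
    r, c, block = item
    return r, c, [[255 - p for p in row] for row in block]


def reassemble(tiles: List[Tile], height: int, width: int) -> Image:
    """Write each processed tile back at its original position."""
    out = [[0] * width for _ in range(height)]
    for r, c, block in tiles:
        for i, row in enumerate(block):
            out[r + i][c:c + len(row)] = row
    return out


def process_image(image: Image, tile: int = 256) -> Image:
    tiles = split_into_tiles(image, tile)
    with ThreadPoolExecutor() as pool:  # threads stand in for separate machines
        processed = list(pool.map(process_tile, tiles))
    return reassemble(processed, len(image), len(image[0]))


image = [[(r * 5 + c) % 256 for c in range(5)] for r in range(4)]
result = process_image(image, tile=2)
print(result[0])
```

Because each tile is processed independently of the others, the work scales out by adding workers, which is what makes a pixel-wise operation like this intrinsically parallel.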
Global-scale earth observation and informatics platform
For public benefit and to support emerging green economy
Help science come out of research lab and into operational use at scale
Unprecedented catalog of earth observation data for mining and analysis
Promote transparency, reproducibility, collaboration, “open science”
Very fast computation of scientific map products
Intrinsically-parallel pixel processing system
Built-in Google algorithms as well as user-supplied
Earth Engine API for 3rd party algorithm development
Access control, versioning, provenance
Online and desktop versions (open source desktop version)
On a lot of useful data
Every available Landsat and MODIS scene (more satellites coming)
Commercial datasets (very high resolution satellite imagery)
Environmental data (atmospheric, ocean, terrestrial)
User-supplied (e.g., in-situ data collected via Android phones)
Overview
Scale of Data
US: Satellite imagery dataset: Landsat
Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
World: 8000 Landsat scenes (2 TB) per covering
Complete global coverage every 16 days
Operating since 1972; historical archive holds ~4 PB
US: NASA EOS approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
ESA satellite missions: MERIS, Envisat, others
SpotImage (France): 20M SPOT images since 1986, 10000 new images collected daily, 5+ PB archive
ESA: Launching Sentinel-1 in 2011
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns his/her data
A platform that…
Provides a dashboard for wellness information & medical records
Allows user to connect and interact with a broad group of “add on” services
Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
How would you (re)define Victorian literature?
What are the differences between the English and Latin editions of Hobbes’ Leviathan?
How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program: Summer 2010
12 projects, 23 researchers, 15 universities
European program: Winter 2010
10 projects planned, $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
Exploring computational thinking in K12 (launch late Oct)
CS4HS: High school computer science (cs4hs.com)
Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
“Flip bits, not burgers” during summer holidays
Exposure to real-world software development
Students paired with mentor from OS community
Execute to milestones laid out in accepted application
Stipend allows students to concentrate on OS development
2010: 1026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
Visual programming environment for Android mobile devices
Helping people become creators (rather than consumers) of technology
Launched in Google Labs, July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
ETH Zurich, Switzerland: ABZ, Ausbildungs- und Beratungszentrum fuer Informatikunterricht
Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University, United Kingdom: Animation11
Oslo University, Norway: TENK
Queen Mary University, United Kingdom: cs4fn magazine
RWTH Aachen, Germany: Bright Brains in Computer Science
Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
University of Stuttgart, Germany: UniS2010
Technion, Israel: High School Computer Science Female Students Visits in Google: Impressions, Conceptions and Influences
Trinity College Dublin, Ireland: Computer Programming Outreach B2C
University of Cape Town, South Africa: Project Umonya
University College Dublin, Ireland: CS Summer School
University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
CS Curriculum Search
Tech Talks on CS Topics
Tutorials, lecture slides, problem sets for a variety of topic areas
AJAX Programming, Algorithms, Distributed Systems, Web Security, Languages, Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
Research-awardsgooglecom
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
University-relationsgooglecom
PhD Fellowship Program
2009: 13 students supported in North America
2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google is working rapidly to innovate in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It’s a most exciting area in which to innovate
ldquoHybridrdquo Intelligence
To extend the capability of people not in isolationAggregation of empirical signal is exceedingly valuableEx
Feedback in Information Retrieval eg in ranking or spelling correctionMachine learning eg image content analysis speech recognition with semi-supervised learning
Research Challenges in Transparent Computing amp Hybrid Intelligence
Endless applications with very new user interface implicationsAddressing limits to dataTechniques to integrate user-feedback in acceptable fashionsApproaches to new signalExplanation scale and variance minimization in machine learningInformation fusionlearning across diverse signals ndash The Combination Hypothesis more generallyUsability devices and subpopulationsPrivacy
Domains of Application
Search enginesTranslationSpeech recognitionVision
Remedial EducationPersonal healthEpidemiologyEconomic predictionSocietalenvironmental optimizationSocial Networking in ever more cleveruseful ways Humanities and Social SciencesMulti-player gaming
Translation
Machine Translation Google
Statistical Machine TranslationModel translation process with a statistical modelLearning from data monolingual amp bilingual
More data better translation qualityComputationally expensive approach
Models have many hundreds of Gigabyte of data(Moores law helps here)
Applying syntax information as a signal
ResultsMuch better translation qualityOngoing progress
More research groups 58 languages (so far)
recently Haitian Creole Urdu Georgian Latin
Grand Challenges
Morphologytranslating into morphologically rich languageseg Russian Hungarianneed morphology-aware translation models
Reliabilitysome translation mistakes more severe than others
hotel - MontrealHeath Ledger - Tom Cruise
Research How to detect crazy translations
Long-distance reorderingsimple case SVO SOV(one) approach parse source amp reorder
issue parsing accuracy for out-of-domain texts
Finding all Training Data
How about Poetry
Paper at EMNLP 2010 conferenceldquoPoeticrdquo Statistical Machine Translation Rhyme and Meter DGenzel JUszkoreit FOch EMNLP 2010
ApproachEnforce meter and rhyme as extra constraints(similar to language model)Eg iambic pentameter stress pattern 0101010101Produce most probable translation that obeys constraints(Function follows form)
Example output (couplet in amphibrachic tetrameterAn officer stated that three were arrestedand that the equipment is currently tested
Speech
Goals for Speech Technology at Google
Much of the worldrsquos information is spoken ndash we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Voicemail transcription Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech io (every applicationservice every usage scenario every language)
How do we get thereDelivery from the cloud ndash support constant iteration and refinement
Operating at large scale ndash train huge statistical models on huge amounts of data
Learning from use - without human transcription
ChallengesHow do we grow the model to take advantage of the data (richer models of accent speaker noise etc)Huge computational demandsInfrastructure demands ndash parallelization ndash leverage Google software environment
Training Acoustic Models wUnsupervised Learning
Supervised vs unsupervised training - hours of
data vs error rate
Vision
Computer Vision
Advance state-of-the art in 3 key areas of imageaudiovideo analysis and apply results to our multimedia products
Semantic Interpretation Generate human understandable description of content (eg auto-tagging videos on YouTube Image annotation porn classification etc)Matching Find similar entities from a large corpus (eg find similar on image search video fingerprinting for YouTube etc )Synthesis Generate better imagesvideo by understanding the statistics of a large corpus of images (eg better facades in 3D building on Google Earth automatic shadow removal from areal images etc)
Semantic Interpretation sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in be careful about what keywords they use and in general try to make their video searchableMany uploaders donrsquot have the motivation or energy to provide proper metadataNoisy metadata hurts everyone ndash spam misspellings 1337 acronyms etc
Cloud-based Computing: Structured Data
Structured Data on the Web
Discovery and search for structured data
- The deep Web -- significant gap in coverage
- Structured tables on the Web -- not leveraged in search
Enable easy creation, management, sharing, and publishing of structured data
Fusion Tables www.google.com/fusiontables
Google Fusion Tables: host, manage, collaborate on, visualize, and publish data tables online
What can I do with Fusion Tables?
Host data online - and stay in control
- control can be at the level of columns or rows
Re-use data without making copies
Collaborate on the details
- Merge data from multiple tables
- Comment on individual rows, columns, or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
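As a rough illustration of "merge data from multiple tables", the sketch below joins two small in-memory tables on a shared key column. It is plain Python, not the Fusion Tables API; the function name, the row-as-dictionary layout, and the inner-join behavior are assumptions made for the example.

```python
def merge_tables(left, right, key):
    """Inner-join two tables (lists of dict rows) on a shared key column.

    Neither input is modified, so each owner's table stays as it was;
    the merged rows are built fresh from both sources.
    """
    # Index the right-hand table by key so each lookup is O(1).
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    merged = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            # On a column-name clash the left table's value wins.
            merged.append({**right_row, **left_row})
    return merged
```

For example, a table of country populations and a table of country water usage could be merged on a "country" column and then mapped as one table.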
Fusion Table Example Gallery
Easy Data Upload, Attribution recorded
Easily Create Informative Maps
baby steps towards the dream platform
DEMO: circle of blue
Cloud-based Computing: Prediction
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
Machine learning as a web service
Smart Apps for every developer
- RESTful HTTP service
- Simple integration
Prediction API
Under the hood: many classifiers/regressors
Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
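The slides cite distributed-learning research without naming a method. One technique in that spirit is parameter mixing: each machine trains on its own shard of the data, and only the final weight vectors cross the network to be averaged. The sketch below is a minimal single-process version using a perceptron; the choice of perceptron, the function names, and the data layout are assumptions, not a description of the Prediction API internals.

```python
def train_perceptron(shard, n_features, epochs=10):
    """Train a binary perceptron on one shard; labels are -1 or +1."""
    w = [0.0] * n_features
    for _ in range(epochs):
        for x, y in shard:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:  # misclassified: move weights toward y * x
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w


def mix_parameters(weight_vectors):
    """Average the per-shard weight vectors into one model.

    This is the only step that needs communication: one vector per shard,
    rather than every training example or every gradient update.
    """
    n = len(weight_vectors)
    return [sum(ws) / n for ws in zip(*weight_vectors)]


def predict(w, x):
    """Classify a feature vector with the mixed model."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
```

In a real deployment each call to train_perceptron would run on a different machine; here they would simply be called one after another.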
Operations Research and Optimization
Operations Research Challenges
Size
- Optimization is often NP-complete
- Increasing the size by 1 doubles the search space
- The tools are barely keeping up with the problems
Uncertainty
- Data is often fuzzy
- How do you route cars when there are roadblocks, new one-ways, traffic jams?
- Can you use optimization in an on-line algorithm connected to users?
- How well can you optimize against forecasted data? How do you react if the forecast is bad?
User expectation and requirements
- The definition of problems is also unclear
- What is the objective? What is a good solution? Can I violate this requirement? By how much?
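The size challenge can be made concrete. Assuming a problem made of n independent yes/no decisions (an assumption chosen for illustration; real models have richer variables), a brute-force search must consider 2^n assignments, so each added decision doubles the work.

```python
from itertools import product


def search_space_size(n_decisions):
    """Number of distinct assignments of n binary decisions."""
    return 2 ** n_decisions


def enumerate_assignments(n_decisions):
    """List every assignment explicitly; only feasible for small n."""
    return list(product((0, 1), repeat=n_decisions))
```

At 40 decisions there are already more than a trillion assignments, which is why solvers rely on pruning and guidance rather than enumeration.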
Operations Research Opportunities
Machine Learning can help us in three ways
- By providing guidance towards good solutions
- By qualifying valid solutions
- By reducing the search space
Large computing resources means we can try a bit harder
Crowd-Sourcing means better data, better feedback, better evaluations of algorithms and solutions
Having all our code open-source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
Semantic Processing
Web Inference and Learning
Goal: better understanding of Web content and user intent
Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
- How to interpret this term in this context?
- Does this sentence answer that question?
- Will this user click on that ad?
Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
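The slides do not name the algorithm run on the combined graph. A standard way to infer classes for instances in such a graph is label propagation: a few seed instances carry known classes, and every other node repeatedly takes the weighted average of its neighbors' class scores. The sketch below assumes that approach; the function name, the edge format, and the toy node names are hypothetical.

```python
def propagate_labels(edges, seeds, iterations=10):
    """Spread class scores from seed nodes across an undirected graph.

    edges: list of (node_a, node_b, weight)
    seeds: {node: {class_name: score}} for nodes whose class is known
    Returns {node: {class_name: score}} for every node in the graph.
    """
    neighbors = {}
    for a, b, w in edges:
        neighbors.setdefault(a, []).append((b, w))
        neighbors.setdefault(b, []).append((a, w))

    scores = {node: dict(seeds.get(node, {})) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                # Seed nodes keep their known labels fixed.
                updated[node] = dict(seeds[node])
                continue
            total = sum(w for _, w in nbrs)
            acc = {}
            for nbr, w in nbrs:
                for label, s in scores[nbr].items():
                    acc[label] = acc.get(label, 0.0) + w * s / total
            updated[node] = acc
        scores = updated
    return scores
```

Edges from different evidence sources (for example, Web tables and text patterns) can be placed in the same edge list, which is one reading of "combination wins".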
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
Rondonia, Brazil: 1975, 1989, 2001
A Sampling of Other Use Cases
- Disease Early Warning: Remote surveillance of disease and prediction of epidemics
- Population Census: Supplements traditional census and mapping in developing regions
- Humanitarian Crisis Mapping: Can detect and monitor a growing range of crisis types
- Water Resources: Monitor water quality and availability and alleviate water shortage problems
- Food Security: Famine early warning, rainfall and water requirements estimations, agr. production estimates, and irrigation and fertilizer supply & demand
- Global Education Programs
Parallel Geo-Processing “in the cloud”
(a brief illustration)
Original image is divided into 256px sub-units
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled into a finished image
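The split, distribute, process, and reassemble steps above can be sketched in a few lines. This is a single-machine stand-in, not Earth Engine code: worker threads play the role of separate machines, the image is a plain list of pixel rows, and the per-pixel operation (inverting an 8-bit value) is a placeholder for a real scientific computation. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as in the slides


def split_into_tiles(image, tile=TILE):
    """Cut a 2D image (list of rows) into (top, left, block) sub-units."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            block = [row[left:left + tile] for row in image[top:top + tile]]
            tiles.append((top, left, block))
    return tiles


def process_tile(job):
    """Placeholder per-pixel computation: invert an 8-bit value."""
    top, left, block = job
    return top, left, [[255 - px for px in row] for row in block]


def reassemble(results, height, width):
    """Paste processed sub-units back at their original offsets."""
    out = [[0] * width for _ in range(height)]
    for top, left, block in results:
        for dy, row in enumerate(block):
            out[top + dy][left:left + len(row)] = row
    return out


def process_image(image, tile=TILE, workers=4):
    """Split, process sub-units in parallel, and rebuild the image."""
    height, width = len(image), len(image[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_tile, split_into_tiles(image, tile)))
    return reassemble(results, height, width)
```

Because each sub-unit carries its own offset, the order in which workers finish does not matter, which is what makes the pixel processing "intrinsically parallel".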
Global-scale earth observation and informatics platform
- For public benefit and to support emerging green economy
- Help science come out of research lab and into operational use at scale
- Unprecedented catalog of earth observation data for mining and analysis
- Promote transparency, reproducibility, collaboration, “open science”
Very fast computation of scientific map products
- Intrinsically-parallel pixel processing system
- Built-in Google algorithms as well as user-supplied
- Earth Engine API for 3rd party algorithm development
- Access control, versioning, provenance
- Online and desktop versions (open source desktop version)
On a lot of useful data
- Every available Landsat and MODIS scene (more satellites coming)
- Commercial datasets (very high resolution satellite imagery)
- Environmental data (atmospheric, ocean, terrestrial)
- User-supplied (ex. in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset: Landsat
- Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
- World: 8000 Landsat scenes (2 TB) per covering
- Complete global coverage every 16 days
- Operating since 1972; historical archive holds ~4 PB
US NASA EOS: approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
- ESA satellite missions: MERIS, Envisat, others
- SpotImage (France): 20M SPOT images since 1986; 10000 new images collected daily; 5+ PB archive
- ESA: Launching Sentinel-1 in 2011
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns his/her data
A platform that…
- Provides a dashboard for wellness information & medical records
- Allows user to connect and interact with a broad group of “add on” services
- Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26, 2009
1 Day After Earthquake - Jan 13, 2010
13 Days After Earthquake - Jan 25, 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
- How would you (re)define Victorian literature?
- What are the differences between the English and Latin editions of Hobbes’ Leviathan?
- How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program: Summer 2010
- 12 projects, 23 researchers, 15 universities
European program: Winter 2010
- 10 projects planned; $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
- Exploring computational thinking in K12 (launch late Oct)
- CS4HS: High school computer science (cs4hs.com)
- Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
- Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
- “Flip bits, not burgers” during summer holidays
- Exposure to real-world software development
Students paired with mentor from OS community
- Execute to milestones laid out in accepted application
- Stipend allows students to concentrate on OS development
2010: 1026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
- Visual programming environment for Android mobile devices
- Helping people become creators (rather than consumers) of technology
- Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
- École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
- ETH Zurich, Switzerland: ABZ Ausbildungs- und Beratungszentrum für Informatikunterricht
- Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
- Manchester University, United Kingdom: Animation11
- Oslo University, Norway: TENK
- Queen Mary University, United Kingdom: cs4fn magazine
- RWTH Aachen, Germany: Bright Brains in Computer Science
- Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
- University of Stuttgart, Germany: UniS2010
- Technion, Israel: High School Computer Science Female Students Visits in Google: Impressions, Conceptions and Influences
- Trinity College Dublin, Ireland: Computer Programming Outreach B2C
- University of Cape Town, South Africa: Project Umonya
- University College Dublin, Ireland: CS Summer School
- University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
- CS Curriculum Search
- Tech Talks on CS Topics
- Tutorials, lecture slides, problem sets for a variety of topic areas
Topic areas
- AJAX Programming
- Algorithms
- Distributed Systems
- Web Security
- Languages
- Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
Research-awards.google.com
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
University-relations.google.com
PhD Fellowship Program
- 2009: 13 students supported in North America
- 2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google continues to innovate rapidly in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It’s a most exciting area in which to innovate
Research Challenges in Transparent Computing & Hybrid Intelligence
- Endless applications with very new user interface implications
- Addressing limits to data
- Techniques to integrate user-feedback in acceptable fashions
- Approaches to new signal
- Explanation, scale, and variance minimization in machine learning
- Information fusion/learning across diverse signals – The Combination Hypothesis more generally
- Usability, devices, and subpopulations
- Privacy
Domains of Application
- Search engines
- Translation
- Speech recognition
- Vision
- Remedial Education
- Personal health
- Epidemiology
- Economic prediction
- Societal/environmental optimization
- Social Networking in ever more clever/useful ways
- Humanities and Social Sciences
- Multi-player gaming
Translation
Machine Translation Google
Statistical Machine Translation
- Model translation process with a statistical model
- Learning from data: monolingual & bilingual
More data, better translation quality
Computationally expensive approach
- Models have many hundreds of gigabytes of data (Moore’s law helps here)
Applying syntax information as a signal
Results
- Much better translation quality
- Ongoing progress
- More research groups
- 58 languages (so far)
- recently: Haitian Creole, Urdu, Georgian, Latin
Grand Challenges
Morphology
- translating into morphologically rich languages, e.g. Russian, Hungarian
- need morphology-aware translation models
Reliability
- some translation mistakes more severe than others
- hotel - Montreal
- Heath Ledger - Tom Cruise
- Research: How to detect crazy translations?
Long-distance reordering
- simple case: SVO to SOV
- (one) approach: parse source & reorder
- issue: parsing accuracy for out-of-domain texts
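The "parse source & reorder" idea can be shown on the simple SVO to SOV case. The sketch below assumes the parser has already grouped the sentence into constituents tagged S, V, or O (the tags, the function name, and the example sentence are hypothetical); a real system works on full parse trees and, as the slide notes, depends on the parser being accurate.

```python
def reorder_svo_to_sov(constituents):
    """Reorder parsed (role, phrase) constituents from SVO into SOV order.

    Each role must be "S", "V", or "O". The sort is stable, so several
    phrases with the same role keep their original relative order.
    """
    target_order = {"S": 0, "O": 1, "V": 2}
    return sorted(constituents, key=lambda c: target_order[c[0]])
```

Reordering the source first lets the translation step proceed mostly left to right, instead of having to move the verb across a long object phrase.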
Finding all Training Data
How about Poetry?
Paper at EMNLP 2010 conference
- “Poetic” Statistical Machine Translation: Rhyme and Meter, D. Genzel, J. Uszkoreit, F. Och, EMNLP 2010
Approach
- Enforce meter and rhyme as extra constraints (similar to language model)
- E.g. iambic pentameter: stress pattern 0101010101
- Produce most probable translation that obeys constraints (Function follows form)
Example output (couplet in amphibrachic tetrameter)
- An officer stated that three were arrested
- and that the equipment is currently tested
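A toy version of the meter constraint follows. The slides describe the constraint as applied during translation, like a language model; for simplicity this sketch instead filters a finished candidate list, which is an assumption. The stress lexicon, the candidate probabilities, and the function names are hypothetical; amphibrachic tetrameter is written as the pattern 010 repeated four times.

```python
# Hypothetical stress lexicon: one digit per syllable, 1 = stressed.
STRESS = {
    "an": "0", "officer": "100", "stated": "10", "said": "1",
    "that": "0", "three": "1", "people": "10", "were": "0",
    "arrested": "010",
}

AMPHIBRACHIC_TETRAMETER = "010" * 4


def stress_pattern(sentence, lexicon=STRESS):
    """Concatenate the per-word stress digits of a sentence."""
    return "".join(lexicon[word] for word in sentence.lower().split())


def best_metrical_translation(candidates, meter, lexicon=STRESS):
    """Return the most probable candidate whose stress pattern equals meter.

    candidates: list of (sentence, probability) pairs.
    Returns None when no candidate obeys the constraint.
    """
    valid = [
        (prob, sentence)
        for sentence, prob in candidates
        if stress_pattern(sentence, lexicon) == meter
    ]
    return max(valid)[1] if valid else None
```

With this filter a less probable translation can win because the more probable one breaks the meter, which is the sense of "function follows form".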
Speech
Goals for Speech Technology at Google
Much of the world’s information is spoken – we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Domains of Application
Search enginesTranslationSpeech recognitionVision
Remedial EducationPersonal healthEpidemiologyEconomic predictionSocietalenvironmental optimizationSocial Networking in ever more cleveruseful ways Humanities and Social SciencesMulti-player gaming
Translation
Machine Translation Google
Statistical Machine TranslationModel translation process with a statistical modelLearning from data monolingual amp bilingual
More data better translation qualityComputationally expensive approach
Models have many hundreds of Gigabyte of data(Moores law helps here)
Applying syntax information as a signal
ResultsMuch better translation qualityOngoing progress
More research groups 58 languages (so far)
recently Haitian Creole Urdu Georgian Latin
Grand Challenges
Morphologytranslating into morphologically rich languageseg Russian Hungarianneed morphology-aware translation models
Reliabilitysome translation mistakes more severe than others
hotel - MontrealHeath Ledger - Tom Cruise
Research How to detect crazy translations
Long-distance reorderingsimple case SVO SOV(one) approach parse source amp reorder
issue parsing accuracy for out-of-domain texts
Finding all Training Data
How about Poetry
Paper at EMNLP 2010 conferenceldquoPoeticrdquo Statistical Machine Translation Rhyme and Meter DGenzel JUszkoreit FOch EMNLP 2010
ApproachEnforce meter and rhyme as extra constraints(similar to language model)Eg iambic pentameter stress pattern 0101010101Produce most probable translation that obeys constraints(Function follows form)
Example output (couplet in amphibrachic tetrameterAn officer stated that three were arrestedand that the equipment is currently tested
Speech
Goals for Speech Technology at Google
Much of the worldrsquos information is spoken ndash we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Voicemail transcription Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech io (every applicationservice every usage scenario every language)
How do we get thereDelivery from the cloud ndash support constant iteration and refinement
Operating at large scale ndash train huge statistical models on huge amounts of data
Learning from use - without human transcription
ChallengesHow do we grow the model to take advantage of the data (richer models of accent speaker noise etc)Huge computational demandsInfrastructure demands ndash parallelization ndash leverage Google software environment
Training Acoustic Models wUnsupervised Learning
Supervised vs unsupervised training - hours of
data vs error rate
Vision
Computer Vision
Advance state-of-the art in 3 key areas of imageaudiovideo analysis and apply results to our multimedia products
Semantic Interpretation Generate human understandable description of content (eg auto-tagging videos on YouTube Image annotation porn classification etc)Matching Find similar entities from a large corpus (eg find similar on image search video fingerprinting for YouTube etc )Synthesis Generate better imagesvideo by understanding the statistics of a large corpus of images (eg better facades in 3D building on Google Earth automatic shadow removal from areal images etc)
Semantic Interpretation sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in be careful about what keywords they use and in general try to make their video searchableMany uploaders donrsquot have the motivation or energy to provide proper metadataNoisy metadata hurts everyone ndash spam misspellings 1337 acronyms etc
Cloud-based ComputingStructured Data
Structured Data on the Web
Discovery and search for structured dataThe deep Web -- significant gap in coverageStructured tables on the Web -- not leveraged in search
Enable easy creation management sharing and publishing of structured data
Fusion Tables wwwgooglecomfusiontables
Google Fusion Tables host manage collaborate on visualize and publish data tables online
What can I do with Fusion Tables
Host data online - and stay in controlcontrol can be at the level of columns or rows
Re-use data without making copies
Collaborate on the detailsMerge data from multiple tablesComment on individual rows columns or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
Easily Create Informative Maps
baby steps towards the dream platform
DEMOcircle of blue
Cloud-based ComputingPrediction
1 Upload
2 Train
Upload your training data toGoogle Storage
Build a model from your data
Make new predictions3 Predict
Machine learning as a web serviceSmart Apps for every developer
- RESTful HTTP service- Simple integration
Prediction API
Under the hood many classifiersregressorsRecent research efficient and theoretically principled methods for distributed learning (NIPS-09 HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Reseach and Optimization
Operations Research ChallengesSize Optimization is often NP Complete
Increasing the size by 1 doubles the search spaceThe tools are barely keeping up with the problems
Uncertainty Data is often fuzzy How do you route cars when there are roadblocks new one-ways traffic jamsCan you use optimization in on-line algorithm connected to usersHow well can you optimize against forecasted data how do you react if the forecast is bad
User expectation and requirementsThe definition of problems is also unclear What is the objective What is a good solution Can I violate this requirement By how much
Operations Research Opportunities
Machine Learning can help us in two waysBy providing guidance towards good solutionsBy qualifying valid solutionsBy reducing the search space
Large computing resources means we can try a bit harderCrowd-Sourcing means better data better feedback better evaluations of algorithms and solutionsHaving all our code open-source means we can collaborate on building the best set of tools
See httpcodegooglecompor-tools
Semantic Processing
Web Inference and LearningGoal better understanding of Web content and user intentMethod algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context
Does this sentence answer that question
Will this user click on that ad
Learning create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference what are the possible classes for each instance
Combination wins
Combined graph14M nodes75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing ldquoin the cloudrdquo
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed
Sub-units are distributed to separate machines
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled
Result is reassembled into a finished image
Global-scale earth observation and informatics platform
For public benefit and to support emerging green economy
Help science come out of research lab and into operational use at scale
Unprecedented catalog of earth observation data for mining and analysis
Promote transparency, reproducibility, collaboration, "open science"
Very fast computation of scientific map products
Intrinsically-parallel pixel processing system
Built-in Google algorithms as well as user-supplied
Earth Engine API for 3rd party algorithm development
Access control, versioning, provenance
Online and desktop versions (open source desktop version)
On a lot of useful data
Every available Landsat and MODIS scene (more satellites coming)
Commercial datasets (very high resolution satellite imagery)
Environmental data (atmospheric, ocean, terrestrial)
User-supplied (e.g. in-situ data collected via Android phones)
Overview
Scale of Data
US satellite imagery dataset: Landsat
Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
World: 8000 Landsat scenes (2 TB) per covering
Complete global coverage every 16 days
Operating since 1972, historical archive holds ~4 PB
US NASA EOS: approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
ESA satellite missions: MERIS, Envisat, others
SpotImage (France): 20M SPOT images since 1986, 10,000 new images collected daily, 5+ PB archive
ESA launching Sentinel-1 in 2011
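As a rough back-of-the-envelope check on the Landsat figures above, the per-scene size and the yearly volume of fresh world coverage can be worked out directly. The sketch assumes decimal units (1 TB = 1000 GB) and that every 16-day covering is complete, neither of which the slides state.

```python
GB_PER_TB = 1000  # decimal units, an assumption

peru_scenes, peru_gb = 60, 20        # one covering of Peru
world_scenes, world_tb = 8000, 2     # one covering of the world
revisit_days = 16                    # complete global coverage interval

gb_per_scene_peru = peru_gb / peru_scenes                   # about 0.33 GB per scene
gb_per_scene_world = world_tb * GB_PER_TB / world_scenes    # 0.25 GB per scene
coverings_per_year = 365 / revisit_days                     # about 22.8 coverings
tb_per_year = coverings_per_year * world_tb                 # about 45.6 TB per year

print(f"{gb_per_scene_peru:.2f} GB/scene (Peru), "
      f"{gb_per_scene_world:.2f} GB/scene (world), "
      f"{tb_per_year:.1f} TB of world coverage per year")
```

The two per-scene estimates differ somewhat, which suggests the slide figures are rounded.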
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health: personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns his/her data
A platform that...
Provides a dashboard for wellness information & medical records
Allows user to connect and interact with a broad group of "add on" services
Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprised of 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
How would you (re)define Victorian literature?
What are the differences between the English and Latin editions of Hobbes' Leviathan?
How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program: Summer 2010
12 projects, 23 researchers, 15 universities
European program: Winter 2010
10 projects planned, $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
Exploring computational thinking in K-12 (launch late Oct)
CS4HS: High school computer science (cs4hs.com)
Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code(TM)
Program Genesis
"Flip bits, not burgers" during summer holidays
Exposure to real-world software development
Students paired with mentor from open source (OS) community
Execute to milestones laid out in accepted application
Stipend allows students to concentrate on OS development
2010: 1,026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
Visual programming environment for Android mobile devices
Helping people become creators (rather than consumers) of technology
Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
ETH Zurich, Switzerland: ABZ Ausbildungs- und Beratungszentrum für Informatikunterricht
Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University, United Kingdom: Animation11
Oslo University, Norway: TENK
Queen Mary University, United Kingdom: cs4fn magazine
RWTH Aachen, Germany: Bright Brains in Computer Science
Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
University of Stuttgart, Germany: UniS2010
Technion, Israel: High School Computer Science Female Students Visits in Google: Impressions, Conceptions and Influences
Trinity College Dublin, Ireland: Computer Programming Outreach B2C
University of Cape Town, South Africa: Project Umonya
University College Dublin, Ireland: CS Summer School
University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
CS Curriculum Search
Tech Talks on CS Topics
Tutorials, lecture slides, problem sets for a variety of topic areas
AJAX Programming, Algorithms, Distributed Systems, Web Security, Languages, Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
Research-awardsgooglecom
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
University-relationsgooglecom
PhD Fellowship Program
2009: 13 students supported in North America
2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1,000 interns worldwide
CS4HS: 1,500+ teachers (~100,000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google continues to innovate rapidly in science/technology and in value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate
Translation
Machine Translation at Google
Statistical Machine Translation
Model translation process with a statistical model
Learning from data: monolingual & bilingual
More data: better translation quality
Computationally expensive approach
Models have many hundreds of gigabytes of data (Moore's law helps here)
Applying syntax information as a signal
Results
Much better translation quality
Ongoing progress
More research groups
58 languages (so far)
Recently: Haitian Creole, Urdu, Georgian, Latin
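To make the role of the two kinds of data concrete, here is a minimal sketch of the textbook noisy-channel formulation of statistical translation, which is not necessarily the model used in the system described here. A translation model (learned from bilingual data) scores how well a candidate explains the source, a language model (learned from monolingual data) scores how fluent the candidate is, and the decoder picks the candidate with the best combined score. All probabilities and phrases below are invented for illustration.

```python
import math

# Hypothetical toy probabilities, invented for illustration only.
# Translation model P(source | candidate), as if learned from bilingual data.
TRANSLATION_MODEL = {
    ("das haus", "the house"): 0.6,
    ("das haus", "house the"): 0.6,
    ("das haus", "the home"): 0.3,
}

# Language model P(candidate), as if learned from monolingual data.
LANGUAGE_MODEL = {
    "the house": 0.05,
    "house the": 0.0001,
    "the home": 0.02,
}


def score(source, candidate):
    """Log of P(source | candidate) * P(candidate)."""
    return (math.log(TRANSLATION_MODEL[(source, candidate)])
            + math.log(LANGUAGE_MODEL[candidate]))


def translate(source, candidates):
    """Return the candidate with the highest combined score."""
    return max(candidates, key=lambda c: score(source, c))


best = translate("das haus", ["the house", "house the", "the home"])
```

In this toy table the translation model cannot tell "the house" from "house the"; the language model breaks the tie, which is why more monolingual data can improve quality even without more bilingual data.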
Grand Challenges
Morphology
translating into morphologically rich languages, e.g. Russian, Hungarian
need morphology-aware translation models
Reliability
some translation mistakes are more severe than others
hotel - Montreal
Heath Ledger - Tom Cruise
Research: How to detect crazy translations?
Long-distance reordering
simple case: SVO to SOV
(one) approach: parse source & reorder
issue: parsing accuracy for out-of-domain texts
Finding all Training Data
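For the long-distance reordering challenge listed above, the "parse source & reorder" idea can be illustrated with a toy sketch. It assumes the parse is already available as a `Clause` with subject, verb and object spans (a hypothetical structure, not a real parser's output) and simply emits the constituents in subject-object-verb order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clause:
    """Hypothetical parse result: token spans for each constituent."""
    subject: List[str]
    verb: List[str]
    obj: List[str]


def reorder_svo_to_sov(clause: Clause) -> List[str]:
    """Move the verb behind the object, however long the object is."""
    return clause.subject + clause.obj + clause.verb


clause = Clause(
    subject=["the", "student"],
    verb=["reads"],
    obj=["a", "very", "long", "book"],
)
reordered = reorder_svo_to_sov(clause)
```

The reordering itself is trivial once the constituents are known; the hard part, as the slide notes, is getting an accurate parse for out-of-domain text, since a wrong span boundary moves the wrong words.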
How about Poetry?
Paper at EMNLP 2010 conference
"Poetic" Statistical Machine Translation: Rhyme and Meter. D. Genzel, J. Uszkoreit, F. Och. EMNLP 2010
Approach
Enforce meter and rhyme as extra constraints (similar to language model)
E.g. iambic pentameter: stress pattern 0101010101
Produce most probable translation that obeys constraints
(Function follows form)
Example output (couplet in amphibrachic tetrameter):
An officer stated that three were arrested
and that the equipment is currently tested
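The meter constraint can be pictured as a simple pattern check on syllable stress. The sketch below is an assumption-laden illustration rather than the paper's method: it uses a tiny hand-written stress lexicon (hypothetical, covering only the first example line) and tests whether a line's stress string equals a repeated foot.

```python
# Stress patterns of the metrical feet used here (0 = unstressed, 1 = stressed).
FEET = {"iamb": "01", "amphibrach": "010"}

# Hypothetical stress lexicon, hand-written for this one example line.
STRESS = {
    "an": "0", "officer": "100", "stated": "10", "that": "0",
    "three": "1", "were": "0", "arrested": "010",
}


def meter_pattern(foot, count):
    """Stress string for `count` repetitions of a foot."""
    return FEET[foot] * count


def stress_of(line):
    """Concatenate per-word stress patterns from the lexicon."""
    return "".join(STRESS[word] for word in line.lower().split())


def obeys_meter(line, foot, count):
    """True if the line's stress string matches the requested meter exactly."""
    return stress_of(line) == meter_pattern(foot, count)


line = "An officer stated that three were arrested"
```

Here `meter_pattern("iamb", 5)` reproduces the 0101010101 pattern from the slide, and the example line matches four amphibrachs. A real system would have to treat monosyllables as flexible and apply the check during decoding rather than afterwards.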
Speech
Goals for Speech Technology at Google
Much of the world's information is spoken - we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Voicemail transcription
Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech I/O (every application/service, every usage scenario, every language)
How do we get there?
Delivery from the cloud - support constant iteration and refinement
Operating at large scale - train huge statistical models on huge amounts of data
Learning from use - without human transcription
Challenges
How do we grow the model to take advantage of the data (richer models of accent, speaker, noise, etc.)?
Huge computational demands
Infrastructure demands - parallelization - leverage Google software environment
Training Acoustic Models w/ Unsupervised Learning
Supervised vs unsupervised training - hours of data vs error rate
Vision
Computer Vision
Advance the state of the art in 3 key areas of image/audio/video analysis and apply results to our multimedia products
Semantic Interpretation: Generate human-understandable description of content (e.g. auto-tagging videos on YouTube, image annotation, porn classification, etc.)
Matching: Find similar entities from a large corpus (e.g. find similar on image search, video fingerprinting for YouTube, etc.)
Synthesis: Generate better images/video by understanding the statistics of a large corpus of images (e.g. better facades in 3D building on Google Earth, automatic shadow removal from aerial images, etc.)
Semantic Interpretation: sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in, be careful about what keywords they use, and in general try to make their video searchable
Many uploaders don't have the motivation or energy to provide proper metadata
Noisy metadata hurts everyone - spam, misspellings, 1337, acronyms, etc.
Cloud-based Computing: Structured Data
Structured Data on the Web
Discovery and search for structured data
The deep Web -- significant gap in coverage
Structured tables on the Web -- not leveraged in search
Enable easy creation, management, sharing and publishing of structured data
Fusion Tables: www.google.com/fusiontables
Google Fusion Tables: host, manage, collaborate on, visualize, and publish data tables online
What can I do with Fusion Tables?
Host data online - and stay in control
Control can be at the level of columns or rows
Re-use data without making copies
Collaborate on the details
Merge data from multiple tables
Comment on individual rows, columns or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload, Attribution recorded
Easily Create Informative Maps
baby steps towards the dream platform
DEMO: Circle of Blue
Cloud-based Computing: Prediction
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
Machine learning as a web service
Smart Apps for every developer
- RESTful HTTP service
- Simple integration
Prediction API
Under the hood: many classifiers/regressors
Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Research and Optimization
Operations Research Challenges
Size: Optimization is often NP-complete
Increasing the size by 1 doubles the search space
The tools are barely keeping up with the problems
Uncertainty: Data is often fuzzy
How do you route cars when there are roadblocks, new one-ways, traffic jams?
Can you use optimization in an on-line algorithm connected to users?
How well can you optimize against forecasted data? How do you react if the forecast is bad?
User expectation and requirements: The definition of problems is also unclear
What is the objective? What is a good solution? Can I violate this requirement? By how much?
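The "size" challenge above can be made concrete with a small sketch. It assumes a problem whose decisions are binary (take an item or not), so that n decisions give 2**n candidate solutions; the brute-force knapsack solver and its sample data are illustrative only and are not taken from or-tools.

```python
from itertools import product


def search_space_size(n):
    """Number of assignments of n binary decisions."""
    return 2 ** n


def best_subset(values, weights, capacity):
    """Brute-force 0/1 knapsack: tries every one of the 2**n selections."""
    best_value, best_choice = 0, ()
    for choice in product((0, 1), repeat=len(values)):
        weight = sum(w for w, c in zip(weights, choice) if c)
        if weight <= capacity:
            value = sum(v for v, c in zip(values, choice) if c)
            if value > best_value:
                best_value, best_choice = value, choice
    return best_value, best_choice


# Illustrative data: three items with a capacity of 50.
result = best_subset(values=[60, 100, 120], weights=[10, 20, 30], capacity=50)
```

Three items mean 8 candidates, but 50 items already mean about 10**15, which is why practical solvers rely on pruning, bounds and heuristics instead of enumeration.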
Operations Research Opportunities
Machine Learning can help us in three ways
By providing guidance towards good solutions
By qualifying valid solutions
By reducing the search space
Large computing resources means we can try a bit harder
Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
Having all our code open-source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
Machine Translation Google
Statistical Machine TranslationModel translation process with a statistical modelLearning from data monolingual amp bilingual
More data better translation qualityComputationally expensive approach
Models have many hundreds of Gigabyte of data(Moores law helps here)
Applying syntax information as a signal
ResultsMuch better translation qualityOngoing progress
More research groups 58 languages (so far)
recently Haitian Creole Urdu Georgian Latin
Grand Challenges
Morphologytranslating into morphologically rich languageseg Russian Hungarianneed morphology-aware translation models
Reliabilitysome translation mistakes more severe than others
hotel - MontrealHeath Ledger - Tom Cruise
Research How to detect crazy translations
Long-distance reorderingsimple case SVO SOV(one) approach parse source amp reorder
issue parsing accuracy for out-of-domain texts
Finding all Training Data
How about Poetry
Paper at EMNLP 2010 conferenceldquoPoeticrdquo Statistical Machine Translation Rhyme and Meter DGenzel JUszkoreit FOch EMNLP 2010
ApproachEnforce meter and rhyme as extra constraints(similar to language model)Eg iambic pentameter stress pattern 0101010101Produce most probable translation that obeys constraints(Function follows form)
Example output (couplet in amphibrachic tetrameterAn officer stated that three were arrestedand that the equipment is currently tested
Speech
Goals for Speech Technology at Google
Much of the worldrsquos information is spoken ndash we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Voicemail transcription Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech io (every applicationservice every usage scenario every language)
How do we get thereDelivery from the cloud ndash support constant iteration and refinement
Operating at large scale ndash train huge statistical models on huge amounts of data
Learning from use - without human transcription
ChallengesHow do we grow the model to take advantage of the data (richer models of accent speaker noise etc)Huge computational demandsInfrastructure demands ndash parallelization ndash leverage Google software environment
Training Acoustic Models wUnsupervised Learning
Supervised vs unsupervised training - hours of
data vs error rate
Vision
Computer Vision
Advance state-of-the art in 3 key areas of imageaudiovideo analysis and apply results to our multimedia products
Semantic Interpretation Generate human understandable description of content (eg auto-tagging videos on YouTube Image annotation porn classification etc)Matching Find similar entities from a large corpus (eg find similar on image search video fingerprinting for YouTube etc )Synthesis Generate better imagesvideo by understanding the statistics of a large corpus of images (eg better facades in 3D building on Google Earth automatic shadow removal from areal images etc)
Semantic Interpretation sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in be careful about what keywords they use and in general try to make their video searchableMany uploaders donrsquot have the motivation or energy to provide proper metadataNoisy metadata hurts everyone ndash spam misspellings 1337 acronyms etc
Cloud-based ComputingStructured Data
Structured Data on the Web
Discovery and search for structured dataThe deep Web -- significant gap in coverageStructured tables on the Web -- not leveraged in search
Enable easy creation management sharing and publishing of structured data
Fusion Tables wwwgooglecomfusiontables
Google Fusion Tables host manage collaborate on visualize and publish data tables online
What can I do with Fusion Tables
Host data online - and stay in controlcontrol can be at the level of columns or rows
Re-use data without making copies
Collaborate on the detailsMerge data from multiple tablesComment on individual rows columns or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
Easily Create Informative Maps
baby steps towards the dream platform
DEMOcircle of blue
Cloud-based ComputingPrediction
1 Upload
2 Train
Upload your training data toGoogle Storage
Build a model from your data
Make new predictions3 Predict
Machine learning as a web serviceSmart Apps for every developer
- RESTful HTTP service- Simple integration
Prediction API
Under the hood many classifiersregressorsRecent research efficient and theoretically principled methods for distributed learning (NIPS-09 HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Reseach and Optimization
Operations Research ChallengesSize Optimization is often NP Complete
Increasing the size by 1 doubles the search spaceThe tools are barely keeping up with the problems
Uncertainty Data is often fuzzy How do you route cars when there are roadblocks new one-ways traffic jamsCan you use optimization in on-line algorithm connected to usersHow well can you optimize against forecasted data how do you react if the forecast is bad
User expectation and requirementsThe definition of problems is also unclear What is the objective What is a good solution Can I violate this requirement By how much
Operations Research Opportunities
Machine Learning can help us in two waysBy providing guidance towards good solutionsBy qualifying valid solutionsBy reducing the search space
Large computing resources means we can try a bit harderCrowd-Sourcing means better data better feedback better evaluations of algorithms and solutionsHaving all our code open-source means we can collaborate on building the best set of tools
See httpcodegooglecompor-tools
Semantic Processing
Web Inference and LearningGoal better understanding of Web content and user intentMethod algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context
Does this sentence answer that question
Will this user click on that ad
Learning create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference what are the possible classes for each instance
Combination wins
Combined graph14M nodes75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing ldquoin the cloudrdquo
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed
Sub-units are distributed to separate machines
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled
Result is reassembled into a finished image
Global-scale earth observation and informatics platformFor public benefit and to support emerging green economyHelp science come out of research lab and into operational use at scaleUnprecedented catalog of earth observation data for mining and analysisPromote transparency reproducibility collaboration ldquoopen sciencerdquo
Very fast computation of scientific map productsIntrinsically-parallel pixel processing systemBuilt-in Google algorithms as well as user-suppliedEarth Engine API for 3rd party algorithm developmentAccess control versioning provenanceOnline and desktop versions (open source desktop version)
On a lot of useful dataEvery available Landsat and MODIS scene (more satellites coming)Commercial datasets (very high resolution satellite imagery)Environmental data (atmospheric ocean terrestrial)User-supplied (ex in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset LandsatPeru 60 Landsat scenes (3Gpix 20GB) per coveringWorld 8000 Landsat scenes (2TB) per coveringComplete global coverage every 16 daysOperating since 1972 historical archive holds ~4PBUS NASA EOS approaching ~10PB of Earth images
Europe European Space Agency (ESA)ESA satellite missions MERIS Envisat othersSpotImage (France) 20M SPOT images since 1986 10000 new images collected daily 5+ PB archiveESA Launching Sentinel-1 in 2011
Representative Examples
Envisat Gulf Oil Spill June 2010 (ESA))MERIS Hurricane Isabel Sept 2003 (ESA) Spot Image Xingu Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns hisher data
A platform thathellipProvides a dashboard for wellness information amp medical recordsAllows user to connect and interact with a broad group of ldquoadd onrdquo servicesIncludes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q What can you do with12 million books inover 400 languagescomprised of 5 billion pages and 2 trillion wordsall digitized
A Look to the humanities for new questionsHow would you (re)define Victorian literature
What are the differences between the English and Latin editions of Hobbesrsquo Leviathan
How have places changed over the course of history
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions US program Summer 2010
12 projects23 researchers15 universities
European program Winter 2010
10 projects planned $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum developmentExploring computational thinking in K12 (launch late Oct)CS4HS High school computer science (cs4hscom)Undergraduate open source CS curriculum Google Code University (codegooglecomedu)Lantern platform Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development Google Summer of CodeTM
Program GenesisrdquoFlip bits not burgersrdquo during summer holidaysExposure to real-world software development
Students paired with mentor from OS communityExecute to milestones laid out in accepted applicationStipend allows students to concentrate on OS development
20101026 students150 organizations 69 countries
Technology Leadership App Inventor for Android
Visual programming environment for Android mobile devicesHelping people become creators (rather than consumers) of technologyLaunched in Google Labs July 12 2010
httpappinventorgooglelabscomabout
CS4HS European WorkshopsEacutecole Polytechnique Feacutedeacuterale de Lausanne (EPFL) Switzerland Building and Programming Robots
ETH Zurich Switzerland ABZ Ausbildung - und Beratungszentrum fuer Informatikunterricht
Makerere University Uganda Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University United Kingdom Animation11
Oslo University Norway TENK
Queen Mary University United Kingdom cs4fn magazine
RWTH Aachen Germany Bright Brains in Computer Science
Sapienza University of Rome Italy Challenge and Fun with the CS Olympiads
University of Stuttgart Germany UniS2010
Technion Israel High School Computer Science Female Students Visits in Google Impressions Conceptions and Influences
Trinity College Dublin Ireland Computer Programming Outreach B2C
University of Cape Town South Africa Project Umonya
University College Dublin Ireland CS Summer School
University of Warsaw Poland Mastering Programing Skills Workshops for Teachers
Technology Leadership Google Code University
Course content on current computing technologies and paradigmsCS Curriculum SearchTech Talks on CS TopicsTutorials lecture slides problem sets for a variety of topic areas
AJAX ProgrammingAlgorithmsDistributed SystemsWeb SecurityLanguagesPractical Skills (MySQL Linux)
httpcodegooglecomedu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
Research-awardsgooglecom
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
University-relationsgooglecom
PhD Fellowship Program
2009: 13 students supported in North America
2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100,000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google continues to innovate rapidly in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate
Grand Challenges
Morphology: translating into morphologically rich languages (e.g. Russian, Hungarian); need morphology-aware translation models
Reliability: some translation mistakes are more severe than others (hotel - Montreal, Heath Ledger - Tom Cruise)
Research: How to detect crazy translations?
Long-distance reordering
simple case: SVO to SOV
(one) approach: parse source & reorder
issue: parsing accuracy for out-of-domain texts
Finding all Training Data
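The "parse source & reorder" idea for the simple SVO to SOV case can be illustrated with a minimal sketch. It assumes a parser has already split the clause into subject, verb, and object; the `Clause` type and `reorder_svo_to_sov` function are hypothetical names for this illustration, not part of any Google system. Real pre-ordering works on full parse trees, and its quality depends on parsing accuracy, which is the issue noted above.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    """A parsed clause, already split into constituents (assumed parser output)."""
    subject: list[str]
    verb: list[str]
    obj: list[str]


def reorder_svo_to_sov(clause: Clause) -> list[str]:
    """Rearrange source tokens from subject-verb-object to subject-object-verb order."""
    return clause.subject + clause.obj + clause.verb


clause = Clause(subject=["the", "student"], verb=["reads"], obj=["the", "book"])
print(" ".join(reorder_svo_to_sov(clause)))
```

After this step the source tokens already follow the target-language word order, so the translation model only has to handle local reordering.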
How about Poetry?
Paper at EMNLP 2010 conference: "Poetic" Statistical Machine Translation: Rhyme and Meter. D. Genzel, J. Uszkoreit, F. Och. EMNLP 2010
Approach
Enforce meter and rhyme as extra constraints (similar to language model)
E.g. iambic pentameter: stress pattern 0101010101
Produce most probable translation that obeys constraints ("Function follows form")
Example output (couplet in amphibrachic tetrameter):
An officer stated that three were arrested
and that the equipment is currently tested
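As a rough illustration of what a meter constraint checks, the sketch below builds a stress pattern from a repeated foot and tests whether a candidate line fits it. The stress lexicon is hand-written for this one line and is an assumption, as is the `?` convention for monosyllables whose stress is flexible. It does not reproduce how the paper integrates the constraint into the translation search.

```python
def meter_pattern(foot: str, feet: int) -> str:
    """Repeat a foot (0 = unstressed, 1 = stressed) to get the full line pattern."""
    return foot * feet


def line_stress(words: list[str], lexicon: dict[str, str]) -> str:
    """Concatenate per-word stress strings; '?' marks a syllable with flexible stress."""
    return "".join(lexicon[word.lower()] for word in words)


def obeys_meter(stress: str, pattern: str) -> bool:
    """A line fits if it has the right syllable count and no fixed stress conflicts."""
    return len(stress) == len(pattern) and all(
        s in ("?", p) for s, p in zip(stress, pattern)
    )


# Hypothetical hand-labelled stress lexicon for the example line only.
LEXICON = {
    "an": "?", "officer": "100", "stated": "10", "that": "?",
    "three": "?", "were": "?", "arrested": "010",
}

line = "An officer stated that three were arrested".split()
amphibrachic_tetrameter = meter_pattern("010", 4)
iambic_pentameter = meter_pattern("01", 5)

print(obeys_meter(line_stress(line, LEXICON), amphibrachic_tetrameter))
print(obeys_meter(line_stress(line, LEXICON), iambic_pentameter))
```

The first check succeeds because the line has twelve syllables with stresses on the second syllable of each three-syllable foot. The second fails on syllable count alone.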
Speech
Goals for Speech Technology at Google
Much of the world's information is spoken - we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Voicemail transcription
Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech I/O (every application/service, every usage scenario, every language)
How do we get there?
Delivery from the cloud - support constant iteration and refinement
Operating at large scale - train huge statistical models on huge amounts of data
Learning from use - without human transcription
Challenges
How do we grow the model to take advantage of the data (richer models of accent, speaker, noise, etc.)?
Huge computational demands
Infrastructure demands - parallelization - leverage Google software environment
Training Acoustic Models w/ Unsupervised Learning
Supervised vs. unsupervised training - hours of data vs. error rate
Vision
Computer Vision
Advance the state of the art in 3 key areas of image/audio/video analysis and apply results to our multimedia products
Semantic Interpretation: Generate human-understandable description of content (e.g. auto-tagging videos on YouTube, image annotation, porn classification, etc.)
Matching: Find similar entities from a large corpus (e.g. find similar on image search, video fingerprinting for YouTube, etc.)
Synthesis: Generate better images/video by understanding the statistics of a large corpus of images (e.g. better facades in 3D building on Google Earth, automatic shadow removal from aerial images, etc.)
Semantic Interpretation: sample problem - Video Annotation
Video metadata has a cognitive cost on the user, because they have to type it in, be careful about what keywords they use, and in general try to make their video searchable
Many uploaders don't have the motivation or energy to provide proper metadata
Noisy metadata hurts everyone - spam, misspellings, 1337, acronyms, etc.
Cloud-based Computing: Structured Data
Structured Data on the Web
Discovery and search for structured data
The deep Web -- significant gap in coverage
Structured tables on the Web -- not leveraged in search
Enable easy creation, management, sharing, and publishing of structured data
Fusion Tables: www.google.com/fusiontables
Google Fusion Tables: host, manage, collaborate on, visualize, and publish data tables online
What can I do with Fusion Tables?
Host data online - and stay in control (control can be at the level of columns or rows)
Re-use data without making copies
Collaborate on the details
Merge data from multiple tables
Comment on individual rows, columns, or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
baby steps towards the dream platform
DEMO: circle of blue
Cloud-based Computing: Prediction
Prediction API
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
Machine learning as a web service
Smart Apps for every developer
- RESTful HTTP service
- Simple integration
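The upload, train, predict workflow can be pictured with a small local stand-in. Everything in this sketch is hypothetical: `ToyPredictionService` and its method names are invented for illustration and are not the Prediction API's actual interface, and the nearest-centroid classifier is only a simple placeholder for whatever models the service uses.

```python
from collections import defaultdict
from math import dist


class ToyPredictionService:
    """In-memory stand-in for an upload / train / predict service (hypothetical)."""

    def __init__(self):
        self._storage = {}  # dataset name -> list of (label, features)
        self._models = {}   # dataset name -> {label: centroid}

    def upload(self, name, rows):
        """Step 1: store labelled training rows under a dataset name."""
        self._storage[name] = list(rows)

    def train(self, name):
        """Step 2: build a model, here one mean feature vector per label."""
        sums = {}
        counts = defaultdict(int)
        for label, features in self._storage[name]:
            previous = sums.get(label, [0.0] * len(features))
            sums[label] = [s + f for s, f in zip(previous, features)]
            counts[label] += 1
        self._models[name] = {
            label: [s / counts[label] for s in total]
            for label, total in sums.items()
        }

    def predict(self, name, features):
        """Step 3: return the label whose centroid is closest to the input."""
        centroids = self._models[name]
        return min(centroids, key=lambda label: dist(centroids[label], features))


service = ToyPredictionService()
service.upload("mail", [("spam", [5, 0]), ("spam", [4, 1]),
                        ("ham", [0, 5]), ("ham", [1, 4])])
service.train("mail")
print(service.predict("mail", [4.5, 0.5]))
```

In a hosted version each of the three methods would correspond to an HTTP request, which is what makes the integration simple for the calling application.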
Under the hood: many classifiers/regressors
Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Research and Optimization
Operations Research Challenges
Size: Optimization is often NP-complete
Increasing the size by 1 doubles the search space
The tools are barely keeping up with the problems
Uncertainty: Data is often fuzzy
How do you route cars when there are roadblocks, new one-ways, traffic jams?
Can you use optimization in an on-line algorithm connected to users?
How well can you optimize against forecasted data? How do you react if the forecast is bad?
User expectation and requirements: The definition of problems is also unclear
What is the objective? What is a good solution? Can I violate this requirement? By how much?
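The remark that increasing the size by 1 doubles the search space can be made concrete with a short calculation. The sketch assumes a problem made of independent yes/no decisions, which is an assumption chosen for illustration; the function names are hypothetical.

```python
from itertools import product


def search_space_size(num_binary_decisions: int) -> int:
    """Number of complete assignments when each decision has two options."""
    return 2 ** num_binary_decisions


def enumerate_assignments(num_binary_decisions: int) -> list[tuple[int, ...]]:
    """Brute-force enumeration, feasible only for very small problems."""
    return list(product((0, 1), repeat=num_binary_decisions))


for n in (10, 11, 50):
    print(n, search_space_size(n))
```

Going from 10 to 11 decisions doubles the count, and at 50 decisions the count exceeds 10^15, which is why exhaustive enumeration stops being an option well before problems reach practical size.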
Operations Research Opportunities
Machine Learning can help us in several ways
By providing guidance towards good solutions
By qualifying valid solutions
By reducing the search space
Large computing resources mean we can try a bit harder
Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
Having all our code open-source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
Semantic Processing
Web Inference and Learning
Goal: better understanding of Web content and user intent
Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context?
Does this sentence answer that question?
Will this user click on that ad?
Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia, Brazil
A Sampling of Other Use Cases
Disease Early Warning: Remote surveillance of disease and prediction of epidemics
Population Census: Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping: Can detect and monitor a growing range of crisis types
Water Resources: Monitor water quality and availability, and alleviate water shortage problems
Food Security: Famine early warning, rainfall and water requirements estimations, agricultural production estimates, and irrigation and fertilizer supply & demand
Global Education Programs
Parallel Geo-Processing "in the cloud"
(a brief illustration)
Original image is divided into 256px sub-units
Sub-units are distributed to separate machines, where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled into a finished image
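The divide, distribute, and reassemble steps above can be sketched on a single machine. This is an illustration under stated assumptions, not Earth Engine code: a thread pool stands in for the separate machines, the image is a plain list of pixel rows, and the per-tile operation (inverting 8-bit pixel values) is an arbitrary placeholder. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as on the slide


def split_into_tiles(image, tile=TILE):
    """Cut the image into tile x tile blocks, remembering each block's position."""
    height, width = len(image), len(image[0])
    tiles = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            block = [row[left:left + tile] for row in image[top:top + tile]]
            tiles.append((top, left, block))
    return tiles


def process_tile(tile):
    """Placeholder per-pixel computation: invert 8-bit values."""
    top, left, block = tile
    return top, left, [[255 - pixel for pixel in row] for row in block]


def reassemble(tiles, height, width):
    """Write every processed block back at its original position."""
    out = [[0] * width for _ in range(height)]
    for top, left, block in tiles:
        for dy, row in enumerate(block):
            out[top + dy][left:left + len(row)] = row
    return out


def process_image(image, tile=TILE, workers=4):
    """Split, process the tiles concurrently, and rebuild the finished image."""
    height, width = len(image), len(image[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        done = list(pool.map(process_tile, split_into_tiles(image, tile)))
    return reassemble(done, height, width)


image = [[(x + y) % 256 for x in range(600)] for y in range(300)]
result = process_image(image)
print(len(split_into_tiles(image)), result[0][:3])
```

Because each tile is processed without reference to its neighbours, the same structure scales from a few threads to thousands of machines. Operations that need neighbouring pixels would require overlapping tiles, which this sketch does not handle.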
Global-scale earth observation and informatics platform
For public benefit and to support emerging green economy
Help science come out of research lab and into operational use at scale
Unprecedented catalog of earth observation data for mining and analysis
Promote transparency, reproducibility, collaboration, "open science"
Very fast computation of scientific map products
Intrinsically-parallel pixel processing system
Built-in Google algorithms as well as user-supplied
Earth Engine API for 3rd party algorithm development
Access control, versioning, provenance
Online and desktop versions (open source desktop version)
On a lot of useful data
Every available Landsat and MODIS scene (more satellites coming)
Commercial datasets (very high resolution satellite imagery)
Environmental data (atmospheric, ocean, terrestrial)
User-supplied (e.g. in-situ data collected via Android phones)
Overview
Scale of Data
US: Satellite imagery dataset Landsat
Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
World: 8000 Landsat scenes (2 TB) per covering
Complete global coverage every 16 days
Operating since 1972, historical archive holds ~4 PB
US: NASA EOS approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
ESA satellite missions: MERIS, Envisat, others
SpotImage (France): 20M SPOT images since 1986, 10,000 new images collected daily, 5+ PB archive
ESA launching Sentinel-1 in 2011
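The Landsat figures quoted above imply a yearly data volume that is easy to work out. The calculation below is a back-of-the-envelope sketch: it assumes every 16-day global covering is fully acquired and stored at the quoted 2 TB and 8000 scenes, which is a simplification made for illustration.

```python
DAYS_PER_YEAR = 365
REVISIT_DAYS = 16                  # complete global coverage every 16 days
TB_PER_GLOBAL_COVERING = 2         # quoted size of one world covering
SCENES_PER_GLOBAL_COVERING = 8000  # quoted scene count of one world covering


def coverings_per_year() -> float:
    """How many complete global coverings fit into one year."""
    return DAYS_PER_YEAR / REVISIT_DAYS


def terabytes_per_year() -> float:
    """Yearly volume if every covering is kept at the quoted size."""
    return coverings_per_year() * TB_PER_GLOBAL_COVERING


def scenes_per_year() -> float:
    """Yearly scene count under the same assumption."""
    return coverings_per_year() * SCENES_PER_GLOBAL_COVERING


print(coverings_per_year(), terabytes_per_year(), scenes_per_year())
```

Under these assumptions a single mission yields roughly 23 coverings and about 46 TB per year.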
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns his/her data
A platform that...
Provides a dashboard for wellness information & medical records
Allows user to connect and interact with a broad group of "add on" services
Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26, 2009
1 Day After Earthquake - Jan 13, 2010
13 Days After Earthquake - Jan 25, 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
How would you (re)define Victorian literature?
What are the differences between the English and Latin editions of Hobbes' Leviathan?
How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program: Summer 2010
12 projects, 23 researchers, 15 universities
European program: Winter 2010
10 projects planned, $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
Exploring computational thinking in K12 (launch late Oct)
CS4HS: High school computer science (cs4hs.com)
Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code(TM)
Program Genesis
"Flip bits, not burgers" during summer holidays
Exposure to real-world software development
Students paired with mentor from OS community
Execute to milestones laid out in accepted application
Stipend allows students to concentrate on OS development
2010: 1026 students, 150 organizations, 69 countries
Speech
Goals for Speech Technology at Google
Much of the worldrsquos information is spoken ndash we need to recognize it before we can organize it
YouTube transcription and translation (breaking the language barrier for YouTube access)
Voicemail transcription Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
Spoken input and output is key to usability
Our goal is completely ubiquitous availability of speech io (every applicationservice every usage scenario every language)
How do we get thereDelivery from the cloud ndash support constant iteration and refinement
Operating at large scale ndash train huge statistical models on huge amounts of data
Learning from use - without human transcription
ChallengesHow do we grow the model to take advantage of the data (richer models of accent speaker noise etc)Huge computational demandsInfrastructure demands ndash parallelization ndash leverage Google software environment
Training Acoustic Models wUnsupervised Learning
Supervised vs unsupervised training - hours of
data vs error rate
Vision
Computer Vision
Advance state-of-the art in 3 key areas of imageaudiovideo analysis and apply results to our multimedia products
Semantic Interpretation Generate human understandable description of content (eg auto-tagging videos on YouTube Image annotation porn classification etc)Matching Find similar entities from a large corpus (eg find similar on image search video fingerprinting for YouTube etc )Synthesis Generate better imagesvideo by understanding the statistics of a large corpus of images (eg better facades in 3D building on Google Earth automatic shadow removal from areal images etc)
Semantic Interpretation sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in be careful about what keywords they use and in general try to make their video searchableMany uploaders donrsquot have the motivation or energy to provide proper metadataNoisy metadata hurts everyone ndash spam misspellings 1337 acronyms etc
Cloud-based ComputingStructured Data
Structured Data on the Web
Discovery and search for structured dataThe deep Web -- significant gap in coverageStructured tables on the Web -- not leveraged in search
Enable easy creation management sharing and publishing of structured data
Fusion Tables wwwgooglecomfusiontables
Google Fusion Tables host manage collaborate on visualize and publish data tables online
What can I do with Fusion Tables
Host data online - and stay in controlcontrol can be at the level of columns or rows
Re-use data without making copies
Collaborate on the detailsMerge data from multiple tablesComment on individual rows columns or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
Easily Create Informative Maps
baby steps towards the dream platform
DEMOcircle of blue
Cloud-based ComputingPrediction
1 Upload
2 Train
Upload your training data toGoogle Storage
Build a model from your data
Make new predictions3 Predict
Machine learning as a web serviceSmart Apps for every developer
- RESTful HTTP service- Simple integration
Prediction API
Under the hood many classifiersregressorsRecent research efficient and theoretically principled methods for distributed learning (NIPS-09 HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Reseach and Optimization
Operations Research ChallengesSize Optimization is often NP Complete
Increasing the size by 1 doubles the search spaceThe tools are barely keeping up with the problems
Uncertainty Data is often fuzzy How do you route cars when there are roadblocks new one-ways traffic jamsCan you use optimization in on-line algorithm connected to usersHow well can you optimize against forecasted data how do you react if the forecast is bad
User expectation and requirementsThe definition of problems is also unclear What is the objective What is a good solution Can I violate this requirement By how much
Operations Research Opportunities
Machine Learning can help us in two waysBy providing guidance towards good solutionsBy qualifying valid solutionsBy reducing the search space
Large computing resources means we can try a bit harderCrowd-Sourcing means better data better feedback better evaluations of algorithms and solutionsHaving all our code open-source means we can collaborate on building the best set of tools
See httpcodegooglecompor-tools
Semantic Processing
Web Inference and LearningGoal better understanding of Web content and user intentMethod algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context
Does this sentence answer that question
Will this user click on that ad
Learning create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference what are the possible classes for each instance
Combination wins
Combined graph14M nodes75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing ldquoin the cloudrdquo
(a brief illustration)
- Original image is divided into 256px sub-units
- Sub-units are distributed to separate machines, where they can be processed in parallel
- Thousands can be processed simultaneously
- Result is reassembled into a finished image
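The tiling steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Earth Engine code: the image is a plain list of pixel rows, `process_tile` is a hypothetical per-pixel operation (inverting 8-bit values), and a local thread pool stands in for the separate machines.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as on the slide


def split_into_tiles(image, tile=TILE):
    """Yield (row, col, block) for each tile-sized block of a 2-D pixel grid."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            block = [line[c:c + tile] for line in image[r:r + tile]]
            yield r, c, block


def process_tile(args):
    """Hypothetical per-pixel operation: invert 8-bit values."""
    r, c, block = args
    return r, c, [[255 - px for px in line] for line in block]


def reassemble(results, height, width):
    """Write processed tiles back into a full-size image."""
    out = [[0] * width for _ in range(height)]
    for r, c, block in results:
        for i, line in enumerate(block):
            out[r + i][c:c + len(line)] = line
    return out


def process_image(image, workers=4):
    """Split, process tiles in parallel (threads stand in for machines), reassemble."""
    height, width = len(image), len(image[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_tile, split_into_tiles(image)))
    return reassemble(results, height, width)


if __name__ == "__main__":
    image = [[(x + y) % 256 for x in range(600)] for y in range(300)]
    result = process_image(image)
    print(len(result), len(result[0]))
```

Because each tile is processed independently of its neighbours, the same structure carries over when the worker pool is replaced by many machines; operations that need neighbouring pixels would require overlapping tiles, which this sketch does not handle.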
Global-scale earth observation and informatics platform
- For public benefit and to support the emerging green economy
- Help science come out of the research lab and into operational use at scale
- Unprecedented catalog of earth observation data for mining and analysis
- Promote transparency, reproducibility, collaboration, "open science"
Very fast computation of scientific map products
- Intrinsically parallel pixel-processing system
- Built-in Google algorithms as well as user-supplied
- Earth Engine API for 3rd-party algorithm development
- Access control, versioning, provenance
- Online and desktop versions (open source desktop version)
On a lot of useful data
- Every available Landsat and MODIS scene (more satellites coming)
- Commercial datasets (very high resolution satellite imagery)
- Environmental data (atmospheric, ocean, terrestrial)
- User-supplied (e.g., in-situ data collected via Android phones)
Overview
Scale of Data
US: Satellite imagery dataset: Landsat
- Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
- World: 8000 Landsat scenes (2 TB) per covering
- Complete global coverage every 16 days
- Operating since 1972; historical archive holds ~4 PB
US: NASA EOS approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
- ESA satellite missions: MERIS, Envisat, others
- SpotImage (France): 20M SPOT images since 1986; 10,000 new images collected daily; 5+ PB archive
- ESA: Launching Sentinel-1 in 2011
Representative Examples
- Envisat: Gulf Oil Spill, June 2010 (ESA)
- MERIS: Hurricane Isabel, Sept 2003 (ESA)
- Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health: personal health dashboard
- Launched in May 2008
- Major update Sept 2010
- User controls and owns his/her data
A platform that...
- Provides a dashboard for wellness information & medical records
- Allows user to connect and interact with a broad group of "add on" services
- Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26, 2009
1 Day After Earthquake - Jan 13, 2010
13 Days After Earthquake - Jan 25, 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
- How would you (re)define Victorian literature?
- What are the differences between the English and Latin editions of Hobbes' Leviathan?
- How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research that takes a computational approach to traditional humanist questions
US program: Summer 2010
- 12 projects, 23 researchers, 15 universities
European program: Winter 2010
- 10 projects planned
$1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
- Exploring computational thinking in K12 (launch late Oct)
- CS4HS: High school computer science (cs4hs.com)
- Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
- Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code(TM)
Program Genesis
- "Flip bits, not burgers" during summer holidays
- Exposure to real-world software development
- Students paired with a mentor from the open source community
- Execute to milestones laid out in accepted application
- Stipend allows students to concentrate on open source development
2010: 1,026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
- Visual programming environment for Android mobile devices
- Helping people become creators (rather than consumers) of technology
- Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
- École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
- ETH Zurich, Switzerland: ABZ, Ausbildungs- und Beratungszentrum fuer Informatikunterricht
- Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
- Manchester University, United Kingdom: Animation11
- Oslo University, Norway: TENK
- Queen Mary University, United Kingdom: cs4fn magazine
- RWTH Aachen, Germany: Bright Brains in Computer Science
- Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
- University of Stuttgart, Germany: UniS2010
- Technion, Israel: High School Computer Science Female Students' Visits in Google: Impressions, Conceptions and Influences
- Trinity College Dublin, Ireland: Computer Programming Outreach B2C
- University of Cape Town, South Africa: Project Umonya
- University College Dublin, Ireland: CS Summer School
- University of Warsaw, Poland: Mastering Programming Skills, Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
- CS Curriculum Search
- Tech Talks on CS Topics
- Tutorials, lecture slides, problem sets for a variety of topic areas: AJAX Programming, Algorithms, Distributed Systems, Web Security, Languages, Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
- Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
- Research-awards.google.com
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
- University-relations.google.com
PhD Fellowship Program
- 2009: 13 students supported in North America
- 2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100,000 students), US & EMEA
Final Thoughts
- Scale of Communication and Computing is profound
- Endless opportunity for technical growth
- Some large themes
- Major new application domains
- Google seeks to innovate rapidly in science/technology and in value to consumers
- We are providing increased support for academic institutions in computer science and related areas
- It's a most exciting area in which to innovate
Goals for Speech Technology at Google
- Much of the world's information is spoken - we need to recognize it before we can organize it
- YouTube transcription and translation (breaking the language barrier for YouTube access)
- Voicemail transcription
- Mobile is the fastest growing and most widespread platform for communication and services that has ever existed
- Spoken input and output is key to usability
- Our goal is completely ubiquitous availability of speech I/O (every application/service, every usage scenario, every language)
How do we get there?
- Delivery from the cloud: support constant iteration and refinement
- Operating at large scale: train huge statistical models on huge amounts of data
- Learning from use, without human transcription
Challenges
- How do we grow the model to take advantage of the data (richer models of accent, speaker, noise, etc.)?
- Huge computational demands
- Infrastructure demands: parallelization, leveraging the Google software environment
Training Acoustic Models w/ Unsupervised Learning
(Figure: supervised vs. unsupervised training, hours of data vs. error rate)
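The slides do not say how machine-transcribed audio is chosen for training. One common approach, assumed here rather than taken from the talk, is to keep only the utterances the recognizer itself scores as confident. The `Hypothesis` type, its field names, and the 0.9 threshold below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    audio_id: str
    text: str          # transcript produced by the recognizer, not by a human
    confidence: float  # recognizer's own score in [0, 1] (assumed to exist)


def select_training_pairs(hypotheses, threshold=0.9):
    """Keep only machine transcripts the recognizer is confident about."""
    return [(h.audio_id, h.text) for h in hypotheses if h.confidence >= threshold]


if __name__ == "__main__":
    batch = [
        Hypothesis("utt-1", "call mom", 0.97),
        Hypothesis("utt-2", "navigate to work", 0.62),
        Hypothesis("utt-3", "weather tomorrow", 0.91),
    ]
    print(select_training_pairs(batch))
```

The threshold trades data volume against label noise: a lower value admits more training audio but also more recognition errors.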
Vision
Computer Vision
Advance the state of the art in 3 key areas of image/audio/video analysis and apply results to our multimedia products
- Semantic Interpretation: Generate human-understandable descriptions of content (e.g., auto-tagging videos on YouTube, image annotation, porn classification, etc.)
- Matching: Find similar entities from a large corpus (e.g., find similar on image search, video fingerprinting for YouTube, etc.)
- Synthesis: Generate better images/video by understanding the statistics of a large corpus of images (e.g., better facades in 3D buildings on Google Earth, automatic shadow removal from aerial images, etc.)
Semantic Interpretation sample problem - Video Annotation
- Video metadata has a cognitive cost for users because they have to type it in, be careful about what keywords they use, and in general try to make their video searchable
- Many uploaders don't have the motivation or energy to provide proper metadata
- Noisy metadata hurts everyone: spam, misspellings, 1337, acronyms, etc.
Cloud-based Computing: Structured Data
Structured Data on the Web
Discovery and search for structured data
- The deep Web: significant gap in coverage
- Structured tables on the Web: not leveraged in search
Enable easy creation, management, sharing and publishing of structured data
Fusion Tables: www.google.com/fusiontables
Google Fusion Tables: host, manage, collaborate on, visualize, and publish data tables online
What can I do with Fusion Tables?
Host data online - and stay in control
- Control can be at the level of columns or rows
Re-use data without making copies
Collaborate on the details
- Merge data from multiple tables
- Comment on individual rows, columns, or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload; Attribution recorded
Easily Create Informative Maps
Baby steps towards the dream platform
DEMO: Circle of Blue
Cloud-based Computing: Prediction
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
Machine learning as a web service
Smart Apps for every developer
- RESTful HTTP service
- Simple integration
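The upload / train / predict sequence can be illustrated with a small local stand-in. Everything below is hypothetical: `ToyPredictionService` is an in-memory class invented for this sketch, its method names are not the Prediction API's, and the word-overlap scoring is far simpler than whatever the real service runs.

```python
from collections import Counter, defaultdict


class ToyPredictionService:
    """Hypothetical in-memory stand-in; NOT the real Prediction API."""

    def __init__(self):
        self._datasets = {}  # dataset name -> list of (label, text) rows
        self._models = {}    # dataset name -> {label: Counter of words}

    def upload(self, name, rows):
        """Step 1: store labeled training rows under a dataset name."""
        self._datasets[name] = list(rows)

    def train(self, name):
        """Step 2: build per-label word counts from the stored rows."""
        model = defaultdict(Counter)
        for label, text in self._datasets[name]:
            model[label].update(text.lower().split())
        self._models[name] = dict(model)

    def predict(self, name, text):
        """Step 3: return the label whose training words overlap most."""
        words = text.lower().split()
        scores = {
            label: sum(counts[w] for w in words)
            for label, counts in self._models[name].items()
        }
        return max(scores, key=scores.get)


if __name__ == "__main__":
    service = ToyPredictionService()
    service.upload("languages", [
        ("english", "the cat sat on the mat"),
        ("french", "le chat est sur le tapis"),
    ])
    service.train("languages")
    print(service.predict("languages", "the dog sat"))
```

The point of the three-call shape is that the caller never touches the learning algorithm; it only supplies labeled rows and asks for predictions.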
Prediction API
Under the hood: many classifiers/regressors
Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
- Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Research and Optimization
Operations Research Challenges
Size: Optimization is often NP-complete
- Increasing the size by 1 doubles the search space
- The tools are barely keeping up with the problems
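The doubling claim is easy to see for problems built from yes/no decisions. The sketch below uses a brute-force 0/1 knapsack as the example; the choice of knapsack and the item data are assumptions made for illustration, not taken from the talk. Each extra item doubles the number of candidate solutions the enumeration has to check.

```python
from itertools import product


def brute_force_knapsack(weights, values, capacity):
    """Try every include/exclude choice; return (best_value, candidates_tried)."""
    best, tried = 0, 0
    for choice in product((0, 1), repeat=len(weights)):
        tried += 1
        weight = sum(w for w, pick in zip(weights, choice) if pick)
        if weight <= capacity:
            best = max(best, sum(v for v, pick in zip(values, choice) if pick))
    return best, tried


if __name__ == "__main__":
    weights = [3, 4, 5, 2, 6, 1]
    values = [4, 5, 6, 3, 7, 1]
    for n in range(1, len(weights) + 1):
        best, tried = brute_force_knapsack(weights[:n], values[:n], capacity=10)
        print(f"items={n} candidates={tried} best_value={best}")
```

With n items the loop visits 2^n candidates, so exhaustive search stops being practical after a few dozen items; real solvers rely on pruning, bounding, and heuristics instead.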
Uncertainty Data is often fuzzy How do you route cars when there are roadblocks new one-ways traffic jamsCan you use optimization in on-line algorithm connected to usersHow well can you optimize against forecasted data how do you react if the forecast is bad
User expectation and requirementsThe definition of problems is also unclear What is the objective What is a good solution Can I violate this requirement By how much
Operations Research Opportunities
Machine Learning can help us in two waysBy providing guidance towards good solutionsBy qualifying valid solutionsBy reducing the search space
Large computing resources means we can try a bit harderCrowd-Sourcing means better data better feedback better evaluations of algorithms and solutionsHaving all our code open-source means we can collaborate on building the best set of tools
See httpcodegooglecompor-tools
Semantic Processing
Web Inference and LearningGoal better understanding of Web content and user intentMethod algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context
Does this sentence answer that question
Will this user click on that ad
Learning create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference what are the possible classes for each instance
Combination wins
Combined graph14M nodes75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing ldquoin the cloudrdquo
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed
Sub-units are distributed to separate machines
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled
Result is reassembled into a finished image
Global-scale earth observation and informatics platformFor public benefit and to support emerging green economyHelp science come out of research lab and into operational use at scaleUnprecedented catalog of earth observation data for mining and analysisPromote transparency reproducibility collaboration ldquoopen sciencerdquo
Very fast computation of scientific map productsIntrinsically-parallel pixel processing systemBuilt-in Google algorithms as well as user-suppliedEarth Engine API for 3rd party algorithm developmentAccess control versioning provenanceOnline and desktop versions (open source desktop version)
On a lot of useful dataEvery available Landsat and MODIS scene (more satellites coming)Commercial datasets (very high resolution satellite imagery)Environmental data (atmospheric ocean terrestrial)User-supplied (ex in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset LandsatPeru 60 Landsat scenes (3Gpix 20GB) per coveringWorld 8000 Landsat scenes (2TB) per coveringComplete global coverage every 16 daysOperating since 1972 historical archive holds ~4PBUS NASA EOS approaching ~10PB of Earth images
Europe European Space Agency (ESA)ESA satellite missions MERIS Envisat othersSpotImage (France) 20M SPOT images since 1986 10000 new images collected daily 5+ PB archiveESA Launching Sentinel-1 in 2011
Representative Examples
Envisat Gulf Oil Spill June 2010 (ESA))MERIS Hurricane Isabel Sept 2003 (ESA) Spot Image Xingu Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns hisher data
A platform thathellipProvides a dashboard for wellness information amp medical recordsAllows user to connect and interact with a broad group of ldquoadd onrdquo servicesIncludes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q What can you do with12 million books inover 400 languagescomprised of 5 billion pages and 2 trillion wordsall digitized
A Look to the humanities for new questionsHow would you (re)define Victorian literature
What are the differences between the English and Latin editions of Hobbesrsquo Leviathan
How have places changed over the course of history
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions US program Summer 2010
12 projects23 researchers15 universities
European program Winter 2010
10 projects planned $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum developmentExploring computational thinking in K12 (launch late Oct)CS4HS High school computer science (cs4hscom)Undergraduate open source CS curriculum Google Code University (codegooglecomedu)Lantern platform Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development Google Summer of CodeTM
Program GenesisrdquoFlip bits not burgersrdquo during summer holidaysExposure to real-world software development
Students paired with mentor from OS communityExecute to milestones laid out in accepted applicationStipend allows students to concentrate on OS development
20101026 students150 organizations 69 countries
Technology Leadership App Inventor for Android
Visual programming environment for Android mobile devicesHelping people become creators (rather than consumers) of technologyLaunched in Google Labs July 12 2010
httpappinventorgooglelabscomabout
CS4HS European WorkshopsEacutecole Polytechnique Feacutedeacuterale de Lausanne (EPFL) Switzerland Building and Programming Robots
ETH Zurich Switzerland ABZ Ausbildung - und Beratungszentrum fuer Informatikunterricht
Makerere University Uganda Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University United Kingdom Animation11
Oslo University Norway TENK
Queen Mary University United Kingdom cs4fn magazine
RWTH Aachen Germany Bright Brains in Computer Science
Sapienza University of Rome Italy Challenge and Fun with the CS Olympiads
University of Stuttgart Germany UniS2010
Technion Israel High School Computer Science Female Students Visits in Google Impressions Conceptions and Influences
Trinity College Dublin Ireland Computer Programming Outreach B2C
University of Cape Town South Africa Project Umonya
University College Dublin Ireland CS Summer School
University of Warsaw Poland Mastering Programing Skills Workshops for Teachers
Technology Leadership Google Code University
Course content on current computing technologies and paradigmsCS Curriculum SearchTech Talks on CS TopicsTutorials lecture slides problem sets for a variety of topic areas
AJAX ProgrammingAlgorithmsDistributed SystemsWeb SecurityLanguagesPractical Skills (MySQL Linux)
httpcodegooglecomedu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next Due dates August 15 (CS Awards) October 15 (Marketing Awards)Research-awardsgooglecomFocused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)University-relationsgooglecom
PhD Fellowship Program2009 13 students supported in North America2010 15 in North America 16 outside North AmericaOver 150 other scholarships
~1000 interns worldwideCS4HS 1500+ teachers (~100000 students) US amp EMEA
Final Thoughts
Scale of Communication and Computing is profoundEndless opportunity for technical growth
Some large themesMajor new application domains
Google rapidly to innovate in sciencetechnology and value to consumersWe are providing increased support for academic institutions in computer science and related areas
Its a most exciting area in which to innovate
Learning from use - without human transcription
ChallengesHow do we grow the model to take advantage of the data (richer models of accent speaker noise etc)Huge computational demandsInfrastructure demands ndash parallelization ndash leverage Google software environment
Training Acoustic Models wUnsupervised Learning
Supervised vs unsupervised training - hours of
data vs error rate
Vision
Computer Vision
Advance state-of-the art in 3 key areas of imageaudiovideo analysis and apply results to our multimedia products
Semantic Interpretation Generate human understandable description of content (eg auto-tagging videos on YouTube Image annotation porn classification etc)Matching Find similar entities from a large corpus (eg find similar on image search video fingerprinting for YouTube etc )Synthesis Generate better imagesvideo by understanding the statistics of a large corpus of images (eg better facades in 3D building on Google Earth automatic shadow removal from areal images etc)
Semantic Interpretation sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in be careful about what keywords they use and in general try to make their video searchableMany uploaders donrsquot have the motivation or energy to provide proper metadataNoisy metadata hurts everyone ndash spam misspellings 1337 acronyms etc
Cloud-based ComputingStructured Data
Structured Data on the Web
Discovery and search for structured dataThe deep Web -- significant gap in coverageStructured tables on the Web -- not leveraged in search
Enable easy creation management sharing and publishing of structured data
Fusion Tables wwwgooglecomfusiontables
Google Fusion Tables host manage collaborate on visualize and publish data tables online
What can I do with Fusion Tables
Host data online - and stay in controlcontrol can be at the level of columns or rows
Re-use data without making copies
Collaborate on the detailsMerge data from multiple tablesComment on individual rows columns or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
Easily Create Informative Maps
baby steps towards the dream platform
DEMOcircle of blue
Cloud-based ComputingPrediction
1 Upload
2 Train
Upload your training data toGoogle Storage
Build a model from your data
Make new predictions3 Predict
Machine learning as a web serviceSmart Apps for every developer
- RESTful HTTP service- Simple integration
Prediction API
Under the hood many classifiersregressorsRecent research efficient and theoretically principled methods for distributed learning (NIPS-09 HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Reseach and Optimization
Operations Research ChallengesSize Optimization is often NP Complete
Increasing the size by 1 doubles the search spaceThe tools are barely keeping up with the problems
Uncertainty Data is often fuzzy How do you route cars when there are roadblocks new one-ways traffic jamsCan you use optimization in on-line algorithm connected to usersHow well can you optimize against forecasted data how do you react if the forecast is bad
User expectation and requirementsThe definition of problems is also unclear What is the objective What is a good solution Can I violate this requirement By how much
Operations Research Opportunities
Machine Learning can help us in two waysBy providing guidance towards good solutionsBy qualifying valid solutionsBy reducing the search space
Large computing resources means we can try a bit harderCrowd-Sourcing means better data better feedback better evaluations of algorithms and solutionsHaving all our code open-source means we can collaborate on building the best set of tools
See httpcodegooglecompor-tools
Semantic Processing
Web Inference and LearningGoal better understanding of Web content and user intentMethod algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context
Does this sentence answer that question
Will this user click on that ad
Learning create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference what are the possible classes for each instance
Combination wins
Combined graph14M nodes75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing ldquoin the cloudrdquo
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed
Sub-units are distributed to separate machines
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled
Result is reassembled into a finished image
Global-scale earth observation and informatics platformFor public benefit and to support emerging green economyHelp science come out of research lab and into operational use at scaleUnprecedented catalog of earth observation data for mining and analysisPromote transparency reproducibility collaboration ldquoopen sciencerdquo
Very fast computation of scientific map productsIntrinsically-parallel pixel processing systemBuilt-in Google algorithms as well as user-suppliedEarth Engine API for 3rd party algorithm developmentAccess control versioning provenanceOnline and desktop versions (open source desktop version)
On a lot of useful dataEvery available Landsat and MODIS scene (more satellites coming)Commercial datasets (very high resolution satellite imagery)Environmental data (atmospheric ocean terrestrial)User-supplied (ex in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset LandsatPeru 60 Landsat scenes (3Gpix 20GB) per coveringWorld 8000 Landsat scenes (2TB) per coveringComplete global coverage every 16 daysOperating since 1972 historical archive holds ~4PBUS NASA EOS approaching ~10PB of Earth images
Europe European Space Agency (ESA)ESA satellite missions MERIS Envisat othersSpotImage (France) 20M SPOT images since 1986 10000 new images collected daily 5+ PB archiveESA Launching Sentinel-1 in 2011
Representative Examples
Envisat Gulf Oil Spill June 2010 (ESA))MERIS Hurricane Isabel Sept 2003 (ESA) Spot Image Xingu Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns hisher data
A platform thathellipProvides a dashboard for wellness information amp medical recordsAllows user to connect and interact with a broad group of ldquoadd onrdquo servicesIncludes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q What can you do with12 million books inover 400 languagescomprised of 5 billion pages and 2 trillion wordsall digitized
A Look to the humanities for new questionsHow would you (re)define Victorian literature
What are the differences between the English and Latin editions of Hobbesrsquo Leviathan
How have places changed over the course of history
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions US program Summer 2010
12 projects23 researchers15 universities
European program Winter 2010
10 projects planned $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum developmentExploring computational thinking in K12 (launch late Oct)CS4HS High school computer science (cs4hscom)Undergraduate open source CS curriculum Google Code University (codegooglecomedu)Lantern platform Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development Google Summer of CodeTM
Program GenesisrdquoFlip bits not burgersrdquo during summer holidaysExposure to real-world software development
Students paired with mentor from OS communityExecute to milestones laid out in accepted applicationStipend allows students to concentrate on OS development
20101026 students150 organizations 69 countries
Technology Leadership App Inventor for Android
Visual programming environment for Android mobile devicesHelping people become creators (rather than consumers) of technologyLaunched in Google Labs July 12 2010
httpappinventorgooglelabscomabout
CS4HS European WorkshopsEacutecole Polytechnique Feacutedeacuterale de Lausanne (EPFL) Switzerland Building and Programming Robots
ETH Zurich Switzerland ABZ Ausbildung - und Beratungszentrum fuer Informatikunterricht
Makerere University Uganda Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University United Kingdom Animation11
Oslo University Norway TENK
Queen Mary University United Kingdom cs4fn magazine
RWTH Aachen Germany Bright Brains in Computer Science
Sapienza University of Rome Italy Challenge and Fun with the CS Olympiads
University of Stuttgart Germany UniS2010
Technion Israel High School Computer Science Female Students Visits in Google Impressions Conceptions and Influences
Trinity College Dublin Ireland Computer Programming Outreach B2C
University of Cape Town South Africa Project Umonya
University College Dublin Ireland CS Summer School
University of Warsaw Poland Mastering Programing Skills Workshops for Teachers
Technology Leadership Google Code University
Course content on current computing technologies and paradigmsCS Curriculum SearchTech Talks on CS TopicsTutorials lecture slides problem sets for a variety of topic areas
AJAX ProgrammingAlgorithmsDistributed SystemsWeb SecurityLanguagesPractical Skills (MySQL Linux)
httpcodegooglecomedu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next Due dates August 15 (CS Awards) October 15 (Marketing Awards)Research-awardsgooglecomFocused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)University-relationsgooglecom
PhD Fellowship Program2009 13 students supported in North America2010 15 in North America 16 outside North AmericaOver 150 other scholarships
~1000 interns worldwideCS4HS 1500+ teachers (~100000 students) US amp EMEA
Final Thoughts
Scale of Communication and Computing is profoundEndless opportunity for technical growth
Some large themesMajor new application domains
Google rapidly to innovate in sciencetechnology and value to consumersWe are providing increased support for academic institutions in computer science and related areas
Its a most exciting area in which to innovate
Vision
Computer Vision
Advance state-of-the art in 3 key areas of imageaudiovideo analysis and apply results to our multimedia products
Semantic Interpretation Generate human understandable description of content (eg auto-tagging videos on YouTube Image annotation porn classification etc)Matching Find similar entities from a large corpus (eg find similar on image search video fingerprinting for YouTube etc )Synthesis Generate better imagesvideo by understanding the statistics of a large corpus of images (eg better facades in 3D building on Google Earth automatic shadow removal from areal images etc)
Semantic Interpretation sample problem - Video Annotation
Video metadata has a cognitive cost on the user because they have to type it in be careful about what keywords they use and in general try to make their video searchableMany uploaders donrsquot have the motivation or energy to provide proper metadataNoisy metadata hurts everyone ndash spam misspellings 1337 acronyms etc
Cloud-based ComputingStructured Data
Structured Data on the Web
Discovery and search for structured dataThe deep Web -- significant gap in coverageStructured tables on the Web -- not leveraged in search
Enable easy creation management sharing and publishing of structured data
Fusion Tables wwwgooglecomfusiontables
Google Fusion Tables host manage collaborate on visualize and publish data tables online
What can I do with Fusion Tables
Host data online - and stay in controlcontrol can be at the level of columns or rows
Re-use data without making copies
Collaborate on the detailsMerge data from multiple tablesComment on individual rows columns or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
Easily Create Informative Maps
baby steps towards the dream platform
DEMOcircle of blue
Cloud-based ComputingPrediction
1 Upload
2 Train
Upload your training data toGoogle Storage
Build a model from your data
Make new predictions3 Predict
Machine learning as a web serviceSmart Apps for every developer
- RESTful HTTP service- Simple integration
Prediction API
Under the hood many classifiersregressorsRecent research efficient and theoretically principled methods for distributed learning (NIPS-09 HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Research and Optimization
Operations Research Challenges
Size: Optimization is often NP-complete
Increasing the size by 1 doubles the search space
The tools are barely keeping up with the problems
Uncertainty: Data is often fuzzy
How do you route cars when there are roadblocks, new one-ways, traffic jams?
Can you use optimization in an on-line algorithm connected to users?
How well can you optimize against forecasted data? How do you react if the forecast is bad?
User expectations and requirements: The definition of problems is also unclear
What is the objective? What is a good solution? Can I violate this requirement? By how much?
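The doubling of the search space mentioned above can be seen with a brute-force 0/1 knapsack, used here only as a stand-in for a generic optimization problem with one yes/no decision per item. The function name and the item weights and values are made up for this sketch.

```python
from itertools import product


def brute_force_knapsack(weights, values, capacity):
    """Try every take/skip assignment; return (best_value, assignments_tried)."""
    best, tried = 0, 0
    for choice in product((0, 1), repeat=len(weights)):
        tried += 1
        weight = sum(w for w, c in zip(weights, choice) if c)
        if weight <= capacity:
            best = max(best, sum(v for v, c in zip(values, choice) if c))
    return best, tried


# Three items, then the same problem with one extra item.
small = brute_force_knapsack([2, 3, 4], [3, 4, 5], capacity=5)
larger = brute_force_knapsack([2, 3, 4, 5], [3, 4, 5, 6], capacity=5)
print(small, larger)
```

With n items the loop visits 2^n assignments, so each extra item doubles the work; practical solvers rely on pruning and propagation rather than enumeration.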
Operations Research Opportunities
Machine learning can help us:
By providing guidance towards good solutions
By qualifying valid solutions
By reducing the search space
Large computing resources mean we can try a bit harder
Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
Having all our code open-source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
Semantic Processing
Web Inference and Learning
Goal: better understanding of Web content and user intent
Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context?
Does this sentence answer that question?
Will this user click on that ad?
Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia, Brazil
A Sampling of Other Use Cases
Disease Early Warning: Remote surveillance of disease and prediction of epidemics
Population Census: Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping: Can detect and monitor a growing range of crisis types
Water Resources: Monitor water quality and availability, and alleviate water shortage problems
Food Security: Famine early warning; rainfall and water requirements estimation; agricultural production estimates; irrigation and fertilizer supply & demand
Global Education Programs
Parallel Geo-Processing "in the cloud"
(a brief illustration)
Original image is divided into 256px sub-units
Sub-units are distributed to separate machines, where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled into a finished image
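The split, distribute, and reassemble steps above can be sketched in a few lines of Python. This is an illustration only, not Earth Engine code: a local thread pool stands in for "separate machines", the image is a plain list of pixel rows, and the per-pixel operation (inverting 8-bit values) is a placeholder chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as on the slide


def split_into_tiles(image, tile=TILE):
    """Cut the image into (row_offset, col_offset, pixels) sub-units."""
    height, width = len(image), len(image[0])
    tiles = []
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            block = [row[c:c + tile] for row in image[r:r + tile]]
            tiles.append((r, c, block))
    return tiles


def process_tile(job):
    """Placeholder per-pixel operation: invert 8-bit values."""
    r, c, block = job
    return r, c, [[255 - p for p in row] for row in block]


def reassemble(results, height, width):
    """Paste processed sub-units back at their original offsets."""
    out = [[0] * width for _ in range(height)]
    for r, c, block in results:
        for dr, row in enumerate(block):
            out[r + dr][c:c + len(row)] = row
    return out


def process_image(image, tile=TILE, workers=4):
    tiles = split_into_tiles(image, tile)
    # Each tile is independent, so the tiles can be processed in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_tile, tiles))
    return reassemble(results, len(image), len(image[0]))


# A synthetic 300 x 600 test image; edge tiles are smaller than 256px.
image = [[(x + y) % 256 for x in range(600)] for y in range(300)]
result = process_image(image)
```

Because each sub-unit carries its own offsets, the results can arrive in any order and still be placed correctly, which is what makes per-pixel processing "intrinsically parallel".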
Global-scale earth observation and informatics platform
For public benefit and to support the emerging green economy
Help science come out of the research lab and into operational use at scale
Unprecedented catalog of earth observation data for mining and analysis
Promote transparency, reproducibility, collaboration: "open science"
Very fast computation of scientific map products
Intrinsically-parallel pixel processing system
Built-in Google algorithms as well as user-supplied
Earth Engine API for 3rd party algorithm development
Access control, versioning, provenance
Online and desktop versions (open source desktop version)
On a lot of useful data
Every available Landsat and MODIS scene (more satellites coming)
Commercial datasets (very high resolution satellite imagery)
Environmental data (atmospheric, ocean, terrestrial)
User-supplied (e.g., in-situ data collected via Android phones)
Overview
Scale of Data
US satellite imagery dataset: Landsat
Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
World: 8000 Landsat scenes (2 TB) per covering
Complete global coverage every 16 days
Operating since 1972; historical archive holds ~4 PB
US NASA EOS: approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
ESA satellite missions: MERIS, Envisat, others
SpotImage (France): 20M SPOT images since 1986; 10,000 new images collected daily; 5+ PB archive
ESA: launching Sentinel-1 in 2011
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
View on YouTube
Health (US)
Google Health: personal health dashboard
Launched in May 2008
Major update: Sept 2010
User controls and owns his/her data
A platform that...
Provides a dashboard for wellness information & medical records
Allows the user to connect and interact with a broad group of "add on" services
Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26, 2009
1 Day After Earthquake - Jan 13, 2010
13 Days After Earthquake - Jan 25, 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
How would you (re)define Victorian literature?
What are the differences between the English and Latin editions of Hobbes' Leviathan?
How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program: Summer 2010
12 projects, 23 researchers, 15 universities
European program: Winter 2010
10 projects planned, $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
Exploring computational thinking in K-12 (launch late Oct)
CS4HS: High school computer science (cs4hs.com)
Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program genesis: "Flip bits, not burgers" during summer holidays
Exposure to real-world software development
Students paired with a mentor from the OS community
Execute to milestones laid out in the accepted application
Stipend allows students to concentrate on OS development
2010: 1,026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
Visual programming environment for Android mobile devices
Helping people become creators (rather than consumers) of technology
Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
ETH Zurich, Switzerland: ABZ Ausbildungs- und Beratungszentrum für Informatikunterricht
Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University, United Kingdom: Animation11
Oslo University, Norway: TENK
Queen Mary University, United Kingdom: cs4fn magazine
RWTH Aachen, Germany: Bright Brains in Computer Science
Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
University of Stuttgart, Germany: UniS2010
Technion, Israel: High School Computer Science Female Students Visits in Google: Impressions, Conceptions and Influences
Trinity College Dublin, Ireland: Computer Programming Outreach B2C
University of Cape Town, South Africa: Project Umonya
University College Dublin, Ireland: CS Summer School
University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
CS Curriculum Search
Tech Talks on CS topics
Tutorials, lecture slides, problem sets for a variety of topic areas
AJAX Programming
Algorithms
Distributed Systems
Web Security
Languages
Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
Research-awardsgooglecom
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
University-relationsgooglecom
PhD Fellowship Program
2009: 13 students supported in North America
2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100,000 students), US & EMEA
Final Thoughts
Scale of communication and computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google continues rapidly to innovate in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate
Cloud-based ComputingStructured Data
Structured Data on the Web
Discovery and search for structured dataThe deep Web -- significant gap in coverageStructured tables on the Web -- not leveraged in search
Enable easy creation management sharing and publishing of structured data
Fusion Tables wwwgooglecomfusiontables
Google Fusion Tables host manage collaborate on visualize and publish data tables online
What can I do with Fusion Tables
Host data online - and stay in controlcontrol can be at the level of columns or rows
Re-use data without making copies
Collaborate on the detailsMerge data from multiple tablesComment on individual rows columns or cells
Make a map (or chart or timeline) in minutes
Manage data via our site or an API
Fusion Table Example Gallery
Easy Data Upload Attribution recorded
Easily Create Informative Maps
Easily Create Informative Maps
baby steps towards the dream platform
DEMOcircle of blue
Cloud-based ComputingPrediction
1 Upload
2 Train
Upload your training data toGoogle Storage
Build a model from your data
Make new predictions3 Predict
Machine learning as a web serviceSmart Apps for every developer
- RESTful HTTP service- Simple integration
Prediction API
Under the hood many classifiersregressorsRecent research efficient and theoretically principled methods for distributed learning (NIPS-09 HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
Under the API
Operations Reseach and Optimization
Operations Research ChallengesSize Optimization is often NP Complete
Increasing the size by 1 doubles the search spaceThe tools are barely keeping up with the problems
Uncertainty Data is often fuzzy How do you route cars when there are roadblocks new one-ways traffic jamsCan you use optimization in on-line algorithm connected to usersHow well can you optimize against forecasted data how do you react if the forecast is bad
User expectation and requirementsThe definition of problems is also unclear What is the objective What is a good solution Can I violate this requirement By how much
Operations Research Opportunities
Machine Learning can help us in three ways
- By providing guidance towards good solutions
- By qualifying valid solutions
- By reducing the search space
Large computing resources mean we can try a bit harder
Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
Having all our code open-source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
Semantic Processing
Web Inference and Learning
Goal: better understanding of Web content and user intent
Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
- How to interpret this term in this context?
- Does this sentence answer that question?
- Will this user click on that ad?
Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
Rondonia, Brazil: 1975, 1989, 2001
A Sampling of Other Use Cases
Disease Early Warning: Remote surveillance of disease and prediction of epidemics
Population Census: Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping: Can detect and monitor a growing range of crisis types
Water Resources: Monitor water quality and availability and alleviate water shortage problems
Food Security: Famine early warning, rainfall and water requirements estimations, agricultural production estimates, and irrigation and fertilizer supply & demand
Global Education Programs
Parallel Geo-Processing "in the cloud"
(a brief illustration)
Original image is divided into 256px sub-units
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled into a finished image
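The divide, distribute, reassemble pattern can be sketched on a single machine. In this minimal illustration, threads stand in for separate machines, the image is a list of pixel rows, and the per-pixel operation (inverting 8-bit values) is a placeholder. All function names are assumptions, not Earth Engine's API.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # tile edge in pixels, as on the slide


def split_into_tiles(image, tile=TILE):
    """Yield (row, col, block) per tile; edge tiles may be smaller."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            block = [row[c:c + tile] for row in image[r:r + tile]]
            yield r, c, block


def process_tile(block):
    """Placeholder per-pixel operation: invert 8-bit values."""
    return [[255 - p for p in row] for row in block]


def process_image(image, tile=TILE, workers=4):
    """Split into tiles, process them concurrently, reassemble."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    tiles = list(split_into_tiles(image, tile))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = pool.map(process_tile, (b for _, _, b in tiles))
        for (r, c, _), block in zip(tiles, processed):
            for i, row in enumerate(block):
                result[r + i][c:c + len(row)] = row  # paste tile back
    return result


small = [[10 * y + x for x in range(6)] for y in range(4)]
print(process_image(small, tile=2))
```

Because each tile is processed independently, the same structure scales out by replacing the thread pool with a cluster scheduler.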
Global-scale earth observation and informatics platform
- For public benefit and to support emerging green economy
- Help science come out of research lab and into operational use at scale
- Unprecedented catalog of earth observation data for mining and analysis
- Promote transparency, reproducibility, collaboration, "open science"
Very fast computation of scientific map products
- Intrinsically-parallel pixel processing system
- Built-in Google algorithms as well as user-supplied
- Earth Engine API for 3rd party algorithm development
- Access control, versioning, provenance
- Online and desktop versions (open source desktop version)
On a lot of useful data
- Every available Landsat and MODIS scene (more satellites coming)
- Commercial datasets (very high resolution satellite imagery)
- Environmental data (atmospheric, ocean, terrestrial)
- User-supplied (e.g. in-situ data collected via Android phones)
Overview
Scale of Data
US: Satellite imagery dataset: Landsat
- Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
- World: 8000 Landsat scenes (2 TB) per covering
- Complete global coverage every 16 days
- Operating since 1972; historical archive holds ~4 PB
US: NASA EOS approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
- ESA satellite missions: MERIS, Envisat, others
- SpotImage (France): 20M SPOT images since 1986, 10000 new images collected daily, 5+ PB archive
- ESA: Launching Sentinel-1 in 2011
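A back-of-envelope calculation shows what the Landsat figures imply. It assumes decimal units (1 TB = 10^12 bytes) and a single satellite that stores every scene of every 16-day cycle. The slide's numbers are rounded, so the results are rough.

```python
def mb_per_scene(total_bytes, scenes):
    """Average scene size in megabytes."""
    return total_bytes / scenes / 1e6


def landsat_yearly_volume_tb(bytes_per_covering=2e12, revisit_days=16):
    """Terabytes per year if every covering is stored in full."""
    coverings_per_year = 365 / revisit_days
    return coverings_per_year * bytes_per_covering / 1e12


print(mb_per_scene(2e12, 8000))    # world covering: MB per scene
print(mb_per_scene(20e9, 60))      # Peru covering: MB per scene
print(landsat_yearly_volume_tb())  # TB per year for one satellite
```

The two per-scene averages differ (about 250 MB versus about 333 MB), which is consistent with the slide figures being rounded.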
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
View on YouTube
Health (US)
Google Health: personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns his/her data
A platform that...
- Provides a dashboard for wellness information & medical records
- Allows user to connect and interact with a broad group of "add on" services
- Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26, 2009
1 Day After Earthquake - Jan 13, 2010
13 Days After Earthquake - Jan 25, 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
- How would you (re)define Victorian literature?
- What are the differences between the English and Latin editions of Hobbes' Leviathan?
- How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program: Summer 2010
- 12 projects, 23 researchers, 15 universities
European program: Winter 2010
- 10 projects planned, $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
- Exploring computational thinking in K12 (launch late Oct)
- CS4HS: High school computer science (cs4hs.com)
- Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
- Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
- "Flip bits, not burgers" during summer holidays
- Exposure to real-world software development
Students paired with mentor from OS community
Execute to milestones laid out in accepted application
Stipend allows students to concentrate on OS development
2010: 1026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
- Visual programming environment for Android mobile devices
- Helping people become creators (rather than consumers) of technology
- Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
ETH Zurich, Switzerland: ABZ, Ausbildungs- und Beratungszentrum für Informatikunterricht
Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University, United Kingdom: Animation11
Oslo University, Norway: TENK
Queen Mary University, United Kingdom: cs4fn magazine
RWTH Aachen, Germany: Bright Brains in Computer Science
Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
University of Stuttgart, Germany: UniS2010
Technion, Israel: High School Computer Science Female Students Visits in Google: Impressions, Conceptions and Influences
Trinity College Dublin, Ireland: Computer Programming Outreach B2C
University of Cape Town, South Africa: Project Umonya
University College Dublin, Ireland: CS Summer School
University of Warsaw, Poland: Mastering Programming Skills, Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
- CS Curriculum Search
- Tech Talks on CS Topics
- Tutorials, lecture slides, problem sets for a variety of topic areas
Topic areas
- AJAX Programming
- Algorithms
- Distributed Systems
- Web Security
- Languages
- Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
- Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
- research-awards.google.com
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
- university-relations.google.com
PhD Fellowship Program
- 2009: 13 students supported in North America
- 2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
- Endless opportunity for technical growth
Some large themes
- Major new application domains
Google aims to innovate rapidly in science, technology, and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate
Students paired with mentor from OS communityExecute to milestones laid out in accepted applicationStipend allows students to concentrate on OS development
20101026 students150 organizations 69 countries
Technology Leadership App Inventor for Android
Visual programming environment for Android mobile devicesHelping people become creators (rather than consumers) of technologyLaunched in Google Labs July 12 2010
httpappinventorgooglelabscomabout
CS4HS European WorkshopsEacutecole Polytechnique Feacutedeacuterale de Lausanne (EPFL) Switzerland Building and Programming Robots
ETH Zurich Switzerland ABZ Ausbildung - und Beratungszentrum fuer Informatikunterricht
Makerere University Uganda Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University United Kingdom Animation11
Oslo University Norway TENK
Queen Mary University United Kingdom cs4fn magazine
RWTH Aachen Germany Bright Brains in Computer Science
Sapienza University of Rome Italy Challenge and Fun with the CS Olympiads
University of Stuttgart Germany UniS2010
Technion Israel High School Computer Science Female Students Visits in Google Impressions Conceptions and Influences
Trinity College Dublin Ireland Computer Programming Outreach B2C
University of Cape Town South Africa Project Umonya
University College Dublin Ireland CS Summer School
University of Warsaw Poland Mastering Programing Skills Workshops for Teachers
Technology Leadership Google Code University
Course content on current computing technologies and paradigmsCS Curriculum SearchTech Talks on CS TopicsTutorials lecture slides problem sets for a variety of topic areas
AJAX ProgrammingAlgorithmsDistributed SystemsWeb SecurityLanguagesPractical Skills (MySQL Linux)
httpcodegooglecomedu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next Due dates August 15 (CS Awards) October 15 (Marketing Awards)Research-awardsgooglecomFocused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)University-relationsgooglecom
PhD Fellowship Program2009 13 students supported in North America2010 15 in North America 16 outside North AmericaOver 150 other scholarships
~1000 interns worldwideCS4HS 1500+ teachers (~100000 students) US amp EMEA
Final Thoughts
Scale of Communication and Computing is profoundEndless opportunity for technical growth
Some large themesMajor new application domains
Google rapidly to innovate in sciencetechnology and value to consumersWe are providing increased support for academic institutions in computer science and related areas
Its a most exciting area in which to innovate
Easily Create Informative Maps
Baby steps towards the dream platform
DEMO: Circle of Blue
Cloud-based Computing: Prediction
Prediction API
Machine learning as a web service
Smart apps for every developer
- RESTful HTTP service
- Simple integration
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
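The upload, train, predict workflow can be illustrated with a toy, in-memory stand-in. This is not the Prediction API itself: the class `ToyPredictionService`, its method names, the sample data and the nearest-centroid model are all hypothetical, chosen only to show the shape of the three steps.

```python
from collections import defaultdict
from math import dist


class ToyPredictionService:
    """Hypothetical in-memory stand-in for an upload/train/predict service."""

    def __init__(self):
        self._rows = []        # uploaded (label, features) rows
        self._centroids = {}   # label -> mean feature vector, filled in by train()

    def upload(self, rows):
        """Step 1: store labelled training rows of the form (label, [x1, x2, ...])."""
        self._rows.extend((label, list(features)) for label, features in rows)

    def train(self):
        """Step 2: build a model; here, one centroid (mean vector) per label."""
        grouped = defaultdict(list)
        for label, features in self._rows:
            grouped[label].append(features)
        self._centroids = {
            label: [sum(column) / len(column) for column in zip(*vectors)]
            for label, vectors in grouped.items()
        }

    def predict(self, features):
        """Step 3: return the label whose centroid is closest to the input."""
        if not self._centroids:
            raise RuntimeError("call train() before predict()")
        return min(
            self._centroids,
            key=lambda label: dist(self._centroids[label], features),
        )


service = ToyPredictionService()
service.upload([
    ("spam", [9.0, 1.0]),
    ("spam", [8.0, 2.0]),
    ("ham", [1.0, 9.0]),
    ("ham", [2.0, 8.0]),
])
service.train()
print(service.predict([7.5, 1.5]))  # closest to the "spam" centroid
```

In a hosted service each method would correspond to an HTTP request, and the model choice would be hidden from the caller.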
Under the API
Under the hood: many classifiers/regressors
Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
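The slide does not say which distributed learning methods are meant. As a minimal sketch, assume something in the spirit of parameter mixing: each shard of the data trains its own model independently, and only the final weight vectors are sent over the network and averaged. The perceptron learner, the toy data and all function names below are illustrative assumptions.

```python
def train_perceptron(examples, epochs=10):
    """Train a binary perceptron on (features, label) pairs with label in {-1, +1}."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        for features, label in examples:
            score = sum(w * x for w, x in zip(weights, features))
            if label * score <= 0:  # misclassified or on the boundary: update
                weights = [w + label * x for w, x in zip(weights, features)]
    return weights


def mix_parameters(shard_weights):
    """Average the per-shard weight vectors (uniform parameter mixing)."""
    n = len(shard_weights)
    return [sum(column) / n for column in zip(*shard_weights)]


def predict(weights, features):
    """Classify by the sign of the linear score."""
    return 1 if sum(w * x for w, x in zip(weights, features)) > 0 else -1


# Hypothetical toy data: features are [x1, x2, bias]; label is +1 when x1 > x2.
shards = [
    [([3.0, 1.0, 1.0], 1), ([1.0, 3.0, 1.0], -1)],
    [([4.0, 2.0, 1.0], 1), ([2.0, 5.0, 1.0], -1)],
    [([5.0, 1.0, 1.0], 1), ([0.0, 2.0, 1.0], -1)],
]

# Each shard trains on its own data; only one weight vector per shard is communicated.
shard_weights = [train_perceptron(shard) for shard in shards]
mixed = mix_parameters(shard_weights)

all_examples = [example for shard in shards for example in shard]
accuracy = sum(predict(mixed, x) == y for x, y in all_examples) / len(all_examples)
print(mixed, accuracy)
```

The communication cost here is one weight vector per shard, independent of how many training examples each shard holds, which is the intuition behind the network savings claimed on the slide.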
Operations Research and Optimization
Operations Research Challenges
Size: Optimization is often NP-complete
- Increasing the size by 1 doubles the search space
- The tools are barely keeping up with the problems
Uncertainty: Data is often fuzzy
- How do you route cars when there are roadblocks, new one-ways, traffic jams?
- Can you use optimization in an on-line algorithm connected to users?
- How well can you optimize against forecasted data? How do you react if the forecast is bad?
User expectation and requirements: The definition of problems is also unclear
- What is the objective? What is a good solution? Can I violate this requirement? By how much?
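The size challenge can be made concrete with a small exhaustive search. The 0/1 knapsack problem and the numbers below are illustrative assumptions, not taken from the talk; the point is only that each extra yes/no decision doubles the number of candidate solutions a brute-force search must examine.

```python
from itertools import product


def best_subset(values, weights, capacity):
    """Exhaustively search all 2**n include/exclude choices for a tiny knapsack."""
    best_value, best_choice, examined = 0, (), 0
    for choice in product((0, 1), repeat=len(values)):
        examined += 1
        total_weight = sum(w for w, c in zip(weights, choice) if c)
        total_value = sum(v for v, c in zip(values, choice) if c)
        if total_weight <= capacity and total_value > best_value:
            best_value, best_choice = total_value, choice
    return best_value, best_choice, examined


values = [60, 100, 120, 30]
weights = [10, 20, 30, 5]
for n in range(1, len(values) + 1):
    value, choice, examined = best_subset(values[:n], weights[:n], capacity=50)
    print(n, examined, value, choice)  # examined doubles with each added item
```

At 4 items the search looks at 16 candidates; at 40 items the same loop would need about a trillion, which is why exhaustive search stops being an option very quickly.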
Operations Research Opportunities
Machine Learning can help us in several ways
- By providing guidance towards good solutions
- By qualifying valid solutions
- By reducing the search space
Large computing resources mean we can try a bit harder
Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
Having all our code open-source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
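What reducing the search space buys can be sketched with a classical technique. The example below uses a simple optimistic bound, not machine learning and not the or-tools API, to skip branches that cannot beat the best solution found so far; the knapsack instance and function name are illustrative assumptions. A learned model could play a similar role by suggesting which branches to try first or which to discard.

```python
def knapsack_branch_and_bound(values, weights, capacity):
    """0/1 knapsack by depth-first search, pruning branches that cannot beat the incumbent."""
    n = len(values)
    # suffix_value[i] is the total value of items i..n-1: an optimistic bound
    # on what could still be gained from position i onwards.
    suffix_value = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_value[i] = suffix_value[i + 1] + values[i]

    best = 0
    visited = 0

    def search(i, value, remaining):
        nonlocal best, visited
        visited += 1
        best = max(best, value)
        if i == n or value + suffix_value[i] <= best:
            return  # leaf, or even taking every remaining item cannot improve on best
        if weights[i] <= remaining:
            search(i + 1, value + values[i], remaining - weights[i])  # include item i
        search(i + 1, value, remaining)  # exclude item i

    search(0, 0, capacity)
    return best, visited


best, visited = knapsack_branch_and_bound([60, 100, 120, 30], [10, 20, 30, 5], 50)
full_tree = 2 ** (4 + 1) - 1  # nodes in the unpruned include/exclude tree for 4 items
print(best, visited, full_tree)
```

The pruned search reaches the same optimum while visiting fewer nodes than the full decision tree.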
Semantic Processing
Web Inference and Learning
Goal: better understanding of Web content and user intent
Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
- How to interpret this term in this context?
- Does this sentence answer that question?
- Will this user click on that ad?
Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
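The slide gives only the headline result. As a rough sketch of what combining evidence into one instance-class graph might look like, assume each source emits (instance, class, confidence) edges and that confidences are merged with a noisy-or rule. The two sources, their scores and the combination rule are all hypothetical.

```python
from collections import defaultdict


def combine_sources(*sources):
    """Merge (instance, class, confidence) edges using a noisy-or of the confidences."""
    miss = defaultdict(lambda: 1.0)  # probability that no source supports the edge
    for edges in sources:
        for instance, cls, confidence in edges:
            miss[(instance, cls)] *= 1.0 - confidence
    graph = defaultdict(dict)
    for (instance, cls), p_miss in miss.items():
        graph[instance][cls] = 1.0 - p_miss
    return graph


def ranked_classes(graph, instance):
    """Return the candidate classes for an instance, best supported first."""
    return sorted(graph[instance], key=graph[instance].get, reverse=True)


# Hypothetical evidence from two extractors
text_patterns = [("jaguar", "animal", 0.6), ("jaguar", "car", 0.5), ("paris", "city", 0.9)]
table_columns = [("jaguar", "car", 0.6), ("paris", "city", 0.5), ("paris", "person", 0.2)]

graph = combine_sources(text_patterns, table_columns)
print(ranked_classes(graph, "jaguar"))
print(ranked_classes(graph, "paris"))
```

With either source alone, "jaguar" is at best a toss-up; once both are merged, the class supported by two independent sources ranks first, which is one reading of "combination wins".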
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of Our Changing Environment
Rondonia, Brazil: 1975, 1989, 2001
A Sampling of Other Use Cases
Disease Early Warning: Remote surveillance of disease and prediction of epidemics
Population Census: Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping: Can detect and monitor a growing range of crisis types
Water Resources: Monitor water quality and availability, and alleviate water shortage problems
Food Security: Famine early warning; rainfall and water requirement estimates; agricultural production estimates; irrigation and fertilizer supply & demand
Global Education Programs
Parallel Geo-Processing "in the cloud"
(a brief illustration)
1. Original image
2. Original image is divided into 256px sub-units
3. Sub-units are distributed to separate machines, where they can be processed in parallel
4. Thousands can be processed simultaneously
5. Result is reassembled into a finished image
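The split, distribute, reassemble steps can be sketched on a single machine. This is a minimal illustration, not Earth Engine code: a thread pool stands in for separate machines, the image is a plain list of pixel rows, and the per-pixel operation (inverting 8-bit values) is a placeholder. All function names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as on the slide


def split_into_tiles(image, tile=TILE):
    """Yield (row, col, sub_image) for each tile; tiles on the edges may be smaller."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, c, [row[c:c + tile] for row in image[r:r + tile]]


def process_tile(job):
    """Placeholder per-pixel operation: invert 8-bit pixel values."""
    r, c, sub_image = job
    return r, c, [[255 - px for px in row] for row in sub_image]


def reassemble(results, height, width):
    """Write the processed tiles back into a full-size image."""
    out = [[0] * width for _ in range(height)]
    for r, c, sub_image in results:
        for dr, row in enumerate(sub_image):
            out[r + dr][c:c + len(row)] = row
    return out


def process_image(image, workers=4):
    """Split, process tiles concurrently, and reassemble."""
    height, width = len(image), len(image[0])
    with ThreadPoolExecutor(max_workers=workers) as pool:  # threads stand in for machines
        results = list(pool.map(process_tile, split_into_tiles(image)))
    return reassemble(results, height, width)


image = [[(x + y) % 256 for x in range(600)] for y in range(300)]
result = process_image(image)
print(len(result), len(result[0]))
```

Because each tile is processed without reference to its neighbours, the work scales out simply by adding workers. Operations that need neighbouring pixels would require overlapping tiles, which this sketch does not handle.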
Overview
Global-scale earth observation and informatics platform
- For public benefit and to support emerging green economy
- Help science come out of research lab and into operational use at scale
- Unprecedented catalog of earth observation data for mining and analysis
- Promote transparency, reproducibility, collaboration, "open science"
Very fast computation of scientific map products
- Intrinsically-parallel pixel processing system
- Built-in Google algorithms as well as user-supplied
- Earth Engine API for 3rd party algorithm development
- Access control, versioning, provenance
- Online and desktop versions (open source desktop version)
On a lot of useful data
- Every available Landsat and MODIS scene (more satellites coming)
- Commercial datasets (very high resolution satellite imagery)
- Environmental data (atmospheric, ocean, terrestrial)
- User-supplied (e.g. in-situ data collected via Android phones)
Scale of Data
US: Satellite imagery dataset: Landsat
- Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
- World: 8000 Landsat scenes (2 TB) per covering
- Complete global coverage every 16 days
- Operating since 1972; historical archive holds ~4 PB
US: NASA EOS approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
- ESA satellite missions: MERIS, Envisat, others
- SpotImage (France): 20M SPOT images since 1986; 10,000 new images collected daily; 5+ PB archive
- ESA: Launching Sentinel-1 in 2011
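A back-of-envelope check of the Landsat figures helps put them in perspective. The calculation below uses only the numbers quoted above and assumes decimal units (1 GB = 1e9 bytes, 1 TB = 1e12 bytes) and a single stream of global coverings; it is a rough sketch, not an official accounting of the archive.

```python
# Figures quoted on the slide (decimal units assumed)
PERU_SCENES, PERU_BYTES, PERU_PIXELS = 60, 20e9, 3e9
WORLD_SCENES, WORLD_BYTES = 8000, 2e12
REVISIT_DAYS = 16

peru_mb_per_scene = PERU_BYTES / PERU_SCENES / 1e6
world_mb_per_scene = WORLD_BYTES / WORLD_SCENES / 1e6
megapixels_per_scene = PERU_PIXELS / PERU_SCENES / 1e6
coverings_per_year = 365 / REVISIT_DAYS
tb_per_year = coverings_per_year * WORLD_BYTES / 1e12

print(f"Peru:  {peru_mb_per_scene:.0f} MB and {megapixels_per_scene:.0f} Mpix per scene")
print(f"World: {world_mb_per_scene:.0f} MB per scene")
print(f"{coverings_per_year:.1f} global coverings per year, roughly {tb_per_year:.0f} TB per year")
```

The two per-scene estimates differ somewhat, which is expected from rounded slide figures. Either way a single scene is a few hundred megabytes and a full global covering is in the terabyte range.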
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health: personal health dashboard
Launched in May 2008
Major update: Sept 2010
User controls and owns his/her data
A platform that...
- Provides a dashboard for wellness information & medical records
- Allows user to connect and interact with a broad group of "add on" services
- Includes a non-tethered PHR (personal health record)
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
- How would you (re)define Victorian literature?
- What are the differences between the English and Latin editions of Hobbes' Leviathan?
- How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program: Summer 2010
- 12 projects, 23 researchers, 15 universities
European program: Winter 2010
- 10 projects planned
$1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
- Exploring computational thinking in K12 (launch late Oct)
- CS4HS: High school computer science (cs4hs.com)
- Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
- Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
- "Flip bits, not burgers" during summer holidays
- Exposure to real-world software development
- Students paired with mentor from open source (OS) community
- Execute to milestones laid out in accepted application
- Stipend allows students to concentrate on OS development
2010: 1,026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
- Visual programming environment for Android mobile devices
- Helping people become creators (rather than consumers) of technology
- Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
- École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
- ETH Zurich, Switzerland: ABZ (Ausbildungs- und Beratungszentrum für Informatikunterricht)
- Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
- Manchester University, United Kingdom: Animation11
- Oslo University, Norway: TENK
- Queen Mary University, United Kingdom: cs4fn magazine
- RWTH Aachen, Germany: Bright Brains in Computer Science
- Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
- University of Stuttgart, Germany: UniS2010
- Technion, Israel: High School Computer Science Female Students' Visits in Google: Impressions, Conceptions and Influences
- Trinity College Dublin, Ireland: Computer Programming Outreach B2C
- University of Cape Town, South Africa: Project Umonya
- University College Dublin, Ireland: CS Summer School
- University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
- CS Curriculum Search
- Tech Talks on CS Topics
- Tutorials, lecture slides, problem sets for a variety of topic areas
Topic areas
- AJAX Programming
- Algorithms
- Distributed Systems
- Web Security
- Languages
- Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs: 230+ projects funded in the last year
- Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
- Research-awardsgooglecom
Focused Grant Program
Visiting Faculty Program: 20 faculty (ongoing)
- University-relationsgooglecom
PhD Fellowship Program
- 2009: 13 students supported in North America
- 2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1,000 interns worldwide
CS4HS: 1,500+ teachers (~100,000 students), US & EMEA
Final Thoughts
Scale of communication and computing is profound
- Endless opportunity for technical growth
Some large themes
- Major new application domains
Google continues to innovate rapidly in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate
Prediction API
Machine learning as a web service
Smart apps for every developer
- RESTful HTTP service
- Simple integration
1. Upload: Upload your training data to Google Storage
2. Train: Build a model from your data
3. Predict: Make new predictions
Under the API
Under the hood: many classifiers/regressors
Recent research: efficient and theoretically principled methods for distributed learning (NIPS-09, HLT-10)
Network costs can be reduced by an order of magnitude with minimal loss in classifier accuracy
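One way to picture why distributed learning can save network traffic is parameter mixing: each worker trains on its own shard of the data and only the weight vectors are exchanged and averaged. The sketch below is illustrative only. It assumes a plain perceptron and toy data, and it is not claimed to be the method of the cited work or of the Prediction API; all function names are hypothetical.

```python
import random

def train_perceptron(shard, weights):
    """One pass of perceptron updates over a shard, starting from the given weights."""
    w = list(weights)
    for features, label in shard:  # label is +1 or -1
        score = sum(wi * xi for wi, xi in zip(w, features))
        if label * score <= 0:  # misclassified: move the weights toward the example
            w = [wi + label * xi for wi, xi in zip(w, features)]
    return w

def parameter_mixing(shards, dim, rounds=20):
    """Each round, every shard trains locally; only weight vectors are averaged."""
    w = [0.0] * dim
    for _ in range(rounds):
        # In a real deployment each call would run on a separate worker (assumption).
        local = [train_perceptron(shard, w) for shard in shards]
        w = [sum(ws[i] for ws in local) / len(local) for i in range(dim)]
    return w

def predict(w, features):
    return 1 if sum(wi * xi for wi, xi in zip(w, features)) > 0 else -1

# Toy, linearly separable data: the label is the sign of x0 - x1.
rng = random.Random(0)
data = []
for _ in range(200):
    x0, x1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    if abs(x0 - x1) < 0.2:
        continue  # keep a margin between the two classes
    data.append(([x0, x1, 1.0], 1 if x0 > x1 else -1))

shards = [data[i::4] for i in range(4)]  # four hypothetical workers
w = parameter_mixing(shards, dim=3)
accuracy = sum(predict(w, x) == y for x, y in data) / len(data)
print(f"training accuracy: {accuracy:.2f}")
```

Per round, each worker sends and receives one weight vector, however many examples its shard holds.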
Operations Research and Optimization
Operations Research Challenges
Size: Optimization is often NP-complete
Increasing the size by 1 doubles the search space
The tools are barely keeping up with the problems
Uncertainty: Data is often fuzzy
How do you route cars when there are roadblocks, new one-ways, traffic jams?
Can you use optimization in an on-line algorithm connected to users?
How well can you optimize against forecasted data? How do you react if the forecast is bad?
User expectations and requirements: The definition of problems is also unclear
What is the objective? What is a good solution? Can I violate this requirement? By how much?
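The "size" point above can be made concrete with a toy knapsack problem solved by exhaustive search. This is a minimal sketch: the item weights, values, and capacity are made up for illustration, and the function name is hypothetical.

```python
from itertools import product

def brute_force_knapsack(weights, values, capacity):
    """Try every take/skip assignment; return (best value, assignments examined)."""
    best, examined = 0, 0
    for choice in product((0, 1), repeat=len(weights)):
        examined += 1
        weight = sum(w for w, c in zip(weights, choice) if c)
        if weight <= capacity:
            best = max(best, sum(v for v, c in zip(values, choice) if c))
    return best, examined

weights = [3, 4, 5, 9, 4, 2, 7, 6]
values = [3, 4, 4, 10, 4, 1, 8, 5]

# Number of assignments examined as the problem grows from 1 to 8 items.
counts = [brute_force_knapsack(weights[:n], values[:n], capacity=15)[1]
          for n in range(1, 9)]
print(counts)
```

Each extra item adds one more binary decision, so the number of candidate assignments doubles with every item.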
Operations Research Opportunities
Machine learning can help us in several ways
By providing guidance towards good solutions
By qualifying valid solutions
By reducing the search space
Large computing resources mean we can try a bit harder
Crowd-sourcing means better data, better feedback, better evaluations of algorithms and solutions
Having all our code open source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
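To illustrate how guidance and search-space reduction help, here is a small branch-and-bound sketch for a toy knapsack problem. It is an assumption-laden illustration: a simple value-per-weight ordering stands in for learned guidance, the data is made up, and none of the names come from or-tools.

```python
from itertools import product

def brute_force_best(weights, values, capacity):
    """Reference answer by exhaustive enumeration of all 2**n assignments."""
    best = 0
    for choice in product((0, 1), repeat=len(weights)):
        if sum(w for w, c in zip(weights, choice) if c) <= capacity:
            best = max(best, sum(v for v, c in zip(values, choice) if c))
    return best

def knapsack_branch_and_bound(weights, values, capacity):
    """Depth-first search that skips subtrees which cannot beat the best solution so far."""
    # Guidance (stand-in heuristic): look at the best value-per-weight items first.
    order = sorted(range(len(weights)),
                   key=lambda i: values[i] / weights[i], reverse=True)
    w = [weights[i] for i in order]
    v = [values[i] for i in order]
    n = len(w)
    best = 0
    visited = 0

    def bound(i, room):
        """Optimistic estimate: fill the remaining room, allowing one fractional item."""
        total = 0.0
        for j in range(i, n):
            if w[j] <= room:
                room -= w[j]
                total += v[j]
            else:
                return total + v[j] * room / w[j]
        return total

    def search(i, room, value):
        nonlocal best, visited
        visited += 1
        best = max(best, value)
        if i == n or value + bound(i, room) <= best:
            return  # nothing below this node can improve on the incumbent
        if w[i] <= room:
            search(i + 1, room - w[i], value + v[i])  # take item i
        search(i + 1, room, value)                    # skip item i

    search(0, capacity, 0)
    return best, visited

weights = [3, 4, 5, 9, 4, 2, 7, 6]
values = [3, 4, 4, 10, 4, 1, 8, 5]
best_value, visited = knapsack_branch_and_bound(weights, values, 15)
reference = brute_force_best(weights, values, 15)
print(best_value, reference, visited, 2 ** len(weights))
```

The bound never underestimates what a subtree can achieve, so pruning does not change the optimum; it only reduces how many nodes are visited.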
Semantic Processing
Web Inference and Learning
Goal: better understanding of Web content and user intent
Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context?
Does this sentence answer that question?
Will this user click on that ad?
Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
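A toy sketch of why combining evidence sources can help with instance-to-class inference: two small graphs are merged, and a seed class label is propagated along the edges. Everything here is an assumption made for illustration (the node names, the two "sources", and the simple averaging rule); it is not the system behind the 14M-node graph.

```python
from collections import defaultdict

def combine(*edge_lists):
    """Merge weighted, undirected edge lists into one adjacency map."""
    graph = defaultdict(dict)
    for edges in edge_lists:
        for a, b, weight in edges:
            graph[a][b] = graph[a].get(b, 0.0) + weight
            graph[b][a] = graph[b].get(a, 0.0) + weight
    return graph

def propagate(graph, seeds, rounds=10):
    """Spread class scores from seed nodes to their neighbours by weighted averaging."""
    scores = {node: dict(seeds.get(node, {})) for node in graph}
    for _ in range(rounds):
        updated = {}
        for node, neighbours in graph.items():
            if node in seeds:
                updated[node] = dict(seeds[node])  # seed labels stay fixed
                continue
            total = sum(neighbours.values())
            mixed = defaultdict(float)
            for other, weight in neighbours.items():
                for label, score in scores[other].items():
                    mixed[label] += weight / total * score
            updated[node] = dict(mixed)
        scores = updated
    return scores

# Hypothetical source A: instances that co-occur with a text pattern.
source_a = [("paris", "pattern:cities such as", 1.0),
            ("london", "pattern:cities such as", 1.0)]
# Hypothetical source B: instances that share a table column.
source_b = [("london", "table:column 1", 1.0),
            ("oslo", "table:column 1", 1.0)]

seeds = {"paris": {"city": 1.0}}
alone = propagate(combine(source_a), seeds)
combined = propagate(combine(source_a, source_b), seeds)
print("oslo" in alone, combined["oslo"])
```

With source A alone, "oslo" is not in the graph at all; in the combined graph it picks up a "city" score through the shared table column.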
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
Rondonia, Brazil: 1975, 1989, 2001
A Sampling of Other Use Cases
Disease Early Warning: Remote surveillance of disease and prediction of epidemics
Population Census: Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping: Can detect and monitor a growing range of crisis types
Water Resources: Monitor water quality and availability and alleviate water shortage problems
Food Security: Famine early warning, rainfall and water requirements estimations, agricultural production estimates, and irrigation and fertilizer supply & demand
Global Education Programs
Parallel Geo-Processing “in the cloud”
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled into a finished image
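The divide, distribute, and reassemble steps above can be sketched in a few lines. This is a minimal single-machine illustration, not Earth Engine code: a thread pool stands in for the separate machines, the image is a plain list of pixel rows, and the per-pixel operation (inverting a grey value) is a hypothetical placeholder.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as in the illustration

def split(image, tile=TILE):
    """Yield (top, left, sub_image) for each sub-unit; edge tiles may be smaller."""
    height, width = len(image), len(image[0])
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield top, left, [row[left:left + tile] for row in image[top:top + tile]]

def process(pixels):
    """Placeholder per-pixel operation: invert an 8-bit grey value."""
    return [[255 - p for p in row] for row in pixels]

def reassemble(results, height, width):
    """Write each processed sub-unit back at its original position."""
    out = [[0] * width for _ in range(height)]
    for top, left, pixels in results:
        for dy, row in enumerate(pixels):
            out[top + dy][left:left + len(row)] = row
    return out

def process_image(image, workers=8):
    height, width = len(image), len(image[0])
    tiles = list(split(image))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker handles one sub-unit independently of the others.
        processed = pool.map(lambda t: (t[0], t[1], process(t[2])), tiles)
        return reassemble(processed, height, width)

# A synthetic 600 x 520 greyscale image.
image = [[(x + y) % 256 for x in range(600)] for y in range(520)]
result = process_image(image)
tile_count = len(list(split(image)))
print(tile_count, result == process(image))
```

Because the operation touches each pixel independently, the tiled result matches what processing the whole image at once would give, which is the property that makes this kind of workload easy to parallelize.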
Overview
Global-scale earth observation and informatics platform
For public benefit and to support emerging green economy
Help science come out of research lab and into operational use at scale
Unprecedented catalog of earth observation data for mining and analysis
Promote transparency, reproducibility, collaboration, “open science”
Very fast computation of scientific map products
Intrinsically parallel pixel processing system
Built-in Google algorithms as well as user-supplied
Earth Engine API for 3rd-party algorithm development
Access control, versioning, provenance
Online and desktop versions (open source desktop version)
On a lot of useful data
Every available Landsat and MODIS scene (more satellites coming)
Commercial datasets (very high resolution satellite imagery)
Environmental data (atmospheric, ocean, terrestrial)
User-supplied (e.g. in-situ data collected via Android phones)
Scale of Data
US: Satellite imagery dataset: Landsat
Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
World: 8000 Landsat scenes (2 TB) per covering
Complete global coverage every 16 days
Operating since 1972; historical archive holds ~4 PB
US: NASA EOS approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
ESA satellite missions: MERIS, Envisat, others
SpotImage (France): 20M SPOT images since 1986; 10,000 new images collected daily; 5+ PB archive
ESA: Launching Sentinel-1 in 2011
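A quick back-of-the-envelope check of what these figures imply, using only the numbers quoted above. Decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and a 365-day year are assumptions; the variable names are illustrative.

```python
GB = 10**9
TB = 10**12

# Average scene size implied by the two "per covering" figures.
peru_scene_bytes = 20 * GB / 60
world_scene_bytes = 2 * TB / 8000

# One full global covering every 16 days.
coverings_per_year = 365 / 16
bytes_per_year = coverings_per_year * 2 * TB

# SPOT acquisition rate quoted as 10,000 new images per day.
spot_images_per_year = 10_000 * 365

print(f"Peru scene:  ~{peru_scene_bytes / 10**6:.0f} MB")
print(f"World scene: ~{world_scene_bytes / 10**6:.0f} MB")
print(f"Global coverings per year: ~{coverings_per_year:.1f}")
print(f"New global coverage per year: ~{bytes_per_year / TB:.1f} TB")
print(f"New SPOT images per year: {spot_images_per_year:,}")
```

The two per-scene estimates differ somewhat, which is expected given that the slide figures are rounded.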
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health: personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns his/her data
A platform that…
Provides a dashboard for wellness information & medical records
Allows user to connect and interact with a broad group of “add-on” services
Includes a non-tethered PHR (personal health record)
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
How would you (re)define Victorian literature?
What are the differences between the English and Latin editions of Hobbes’ Leviathan?
How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program, Summer 2010
12 projects, 23 researchers, 15 universities
European program, Winter 2010
10 projects planned; $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
Exploring computational thinking in K12 (launch late Oct)
CS4HS: High school computer science (cs4hs.com)
Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
“Flip bits, not burgers” during summer holidays
Exposure to real-world software development
Students paired with mentor from OS community
Execute to milestones laid out in accepted application
Stipend allows students to concentrate on OS development
2010: 1026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
Visual programming environment for Android mobile devices
Helping people become creators (rather than consumers) of technology
Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
Next Due dates August 15 (CS Awards) October 15 (Marketing Awards)Research-awardsgooglecomFocused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)University-relationsgooglecom
PhD Fellowship Program2009 13 students supported in North America2010 15 in North America 16 outside North AmericaOver 150 other scholarships
~1000 interns worldwideCS4HS 1500+ teachers (~100000 students) US amp EMEA
Final Thoughts
Scale of Communication and Computing is profoundEndless opportunity for technical growth
Some large themesMajor new application domains
Google rapidly to innovate in sciencetechnology and value to consumersWe are providing increased support for academic institutions in computer science and related areas
Its a most exciting area in which to innovate
Operations Research Opportunities
Machine learning can help us in several ways:
By providing guidance towards good solutions
By qualifying valid solutions
By reducing the search space
Large computing resources mean we can try a bit harder
Crowd-sourcing means better data, better feedback, and better evaluations of algorithms and solutions
Having all our code open source means we can collaborate on building the best set of tools
See http://code.google.com/p/or-tools
Semantic Processing
Web Inference and Learning
Goal: better understanding of Web content and user intent
Method: algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context?
Does this sentence answer that question?
Will this user click on that ad?
Learning: create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference: what are the possible classes for each instance?
Combination wins
Combined graph: 14M nodes, 75M edges
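One way to picture the combination step, as a minimal sketch only: instance-to-class edges from several sources are merged into one weighted graph, and the candidate classes for an instance are ranked by summed weight. The source names, example edges, weights, and function names below are hypothetical, and summing weights is a simple stand-in, not the method behind the slide's 14M-node graph.

```python
from collections import defaultdict


def combine_graphs(*edge_lists):
    """Merge (instance, class, weight) edges from several sources.

    Weights for the same instance/class pair are summed (an assumption
    made for illustration).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for edges in edge_lists:
        for instance, cls, weight in edges:
            graph[instance][cls] += weight
    return graph


def candidate_classes(graph, instance):
    """Return the classes seen for an instance, strongest evidence first."""
    scores = graph.get(instance, {})
    return sorted(scores, key=lambda c: (-scores[c], c))


# Hypothetical evidence from two extraction sources.
source_a = [("Paris", "city", 0.9), ("Leviathan", "book", 0.8)]
source_b = [("Paris", "person", 0.3), ("Paris", "city", 0.5)]

combined = combine_graphs(source_a, source_b)
print(candidate_classes(combined, "Paris"))  # ['city', 'person']
```

Neither source alone lists both classes for "Paris"; the combined graph does, which is the point of merging.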
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning: Remote surveillance of disease and prediction of epidemics
Population Census: Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping: Can detect and monitor a growing range of crisis types
Water Resources: Monitor water quality and availability and alleviate water shortage problems
Food Security: Famine early warning, rainfall and water requirements estimations, agricultural production estimates, and irrigation and fertilizer supply & demand
Global Education Programs
Parallel Geo-Processing "in the cloud"
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled into a finished image
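The split, process, and reassemble steps above can be sketched in a few lines. This is an illustration under stated assumptions, not Earth Engine code: a thread pool stands in for the separate machines, the image is a plain list of pixel rows, and the per-pixel operation (`invert`) and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as in the illustration


def split_into_tiles(image, tile=TILE):
    """Yield (row, col, block) for each tile-sized sub-unit of a 2-D image."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, c, [row[c:c + tile] for row in image[r:r + tile]]


def invert(block):
    """Hypothetical per-pixel operation: invert 8-bit values."""
    return [[255 - px for px in row] for row in block]


def process_image(image, op=invert, tile=TILE, workers=8):
    """Split the image, process the tiles concurrently, and reassemble."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    tiles = list(split_into_tiles(image, tile))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each tile is independent, so the workers need no coordination.
        processed = pool.map(lambda t: op(t[2]), tiles)
        for (r, c, _), block in zip(tiles, processed):
            for i, row in enumerate(block):
                result[r + i][c:c + len(row)] = row
    return result


# A 300 x 520 test image splits into 2 x 3 = 6 sub-units.
image = [[(x + y) % 256 for x in range(520)] for y in range(300)]
finished = process_image(image)
```

Because the operation touches each pixel independently, the tiled result is identical to processing the whole image at once; operations that read neighbouring pixels would need overlapping tiles.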
Global-scale earth observation and informatics platform
For public benefit and to support emerging green economy
Help science come out of research lab and into operational use at scale
Unprecedented catalog of earth observation data for mining and analysis
Promote transparency, reproducibility, collaboration, "open science"
Very fast computation of scientific map products
Intrinsically-parallel pixel processing system
Built-in Google algorithms as well as user-supplied
Earth Engine API for 3rd party algorithm development
Access control, versioning, provenance
Online and desktop versions (open source desktop version)
On a lot of useful data
Every available Landsat and MODIS scene (more satellites coming)
Commercial datasets (very high resolution satellite imagery)
Environmental data (atmospheric, ocean, terrestrial)
User-supplied (e.g. in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset: Landsat
Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
World: 8000 Landsat scenes (2 TB) per covering
Complete global coverage every 16 days
Operating since 1972; historical archive holds ~4 PB
US NASA EOS approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
ESA satellite missions: MERIS, Envisat, others
SpotImage (France): 20M SPOT images since 1986, 10000 new images collected daily, 5+ PB archive
ESA launching Sentinel-1 in 2011
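A back-of-envelope check of the Landsat figures, using only the numbers quoted above. It assumes decimal units (1 TB = 10^12 bytes) and a single satellite at the nominal 16-day revisit; the variable names are illustrative.

```python
SCENES_PER_COVERING = 8000      # Landsat scenes per world covering
BYTES_PER_COVERING = 2e12       # 2 TB per world covering (decimal TB assumed)
REVISIT_DAYS = 16               # complete global coverage every 16 days

# Average size of one scene, in megabytes.
mb_per_scene = BYTES_PER_COVERING / SCENES_PER_COVERING / 1e6

# Number of complete world coverings in a 365-day year.
coverings_per_year = 365 / REVISIT_DAYS

# Data volume of one year of world coverings, in terabytes.
tb_per_year = coverings_per_year * BYTES_PER_COVERING / 1e12

print(f"{mb_per_scene:.0f} MB per scene")
print(f"{coverings_per_year:.1f} coverings per year")
print(f"{tb_per_year:.1f} TB per year")
```

At these rates one satellite adds tens of terabytes per year, which puts the multi-petabyte archive totals in perspective.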
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns his/her data
A platform that...
Provides a dashboard for wellness information & medical records
Allows user to connect and interact with a broad group of "add on" services
Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
How would you (re)define Victorian literature?
What are the differences between the English and Latin editions of Hobbes' Leviathan?
How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program: Summer 2010
12 projects, 23 researchers, 15 universities
European program: Winter 2010
10 projects planned, $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
Exploring computational thinking in K12 (launch late Oct)
CS4HS: High school computer science (cs4hs.com)
Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
"Flip bits, not burgers" during summer holidays
Exposure to real-world software development
Students paired with mentor from OS community
Execute to milestones laid out in accepted application
Stipend allows students to concentrate on OS development
2010: 1026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
Visual programming environment for Android mobile devices
Helping people become creators (rather than consumers) of technology
Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
ETH Zurich, Switzerland: ABZ Ausbildungs- und Beratungszentrum für Informatikunterricht
Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University, United Kingdom: Animation11
Oslo University, Norway: TENK
Queen Mary University, United Kingdom: cs4fn magazine
RWTH Aachen, Germany: Bright Brains in Computer Science
Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
University of Stuttgart, Germany: UniS2010
Technion, Israel: High School Computer Science Female Students Visits in Google: Impressions, Conceptions and Influences
Trinity College Dublin, Ireland: Computer Programming Outreach B2C
University of Cape Town, South Africa: Project Umonya
University College Dublin, Ireland: CS Summer School
University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
CS Curriculum Search
Tech Talks on CS Topics
Tutorials, lecture slides, problem sets for a variety of topic areas
AJAX Programming
Algorithms
Distributed Systems
Web Security
Languages
Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
Research-awardsgooglecom
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
University-relationsgooglecom
PhD Fellowship Program
2009: 13 students supported in North America
2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google continues to innovate rapidly in science, technology, and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate
Semantic Processing
Web Inference and LearningGoal better understanding of Web content and user intentMethod algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context
Does this sentence answer that question
Will this user click on that ad
Learning create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference what are the possible classes for each instance
Combination wins
Combined graph14M nodes75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing ldquoin the cloudrdquo
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed
Sub-units are distributed to separate machines
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled
Result is reassembled into a finished image
Global-scale earth observation and informatics platformFor public benefit and to support emerging green economyHelp science come out of research lab and into operational use at scaleUnprecedented catalog of earth observation data for mining and analysisPromote transparency reproducibility collaboration ldquoopen sciencerdquo
Very fast computation of scientific map productsIntrinsically-parallel pixel processing systemBuilt-in Google algorithms as well as user-suppliedEarth Engine API for 3rd party algorithm developmentAccess control versioning provenanceOnline and desktop versions (open source desktop version)
On a lot of useful dataEvery available Landsat and MODIS scene (more satellites coming)Commercial datasets (very high resolution satellite imagery)Environmental data (atmospheric ocean terrestrial)User-supplied (ex in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset LandsatPeru 60 Landsat scenes (3Gpix 20GB) per coveringWorld 8000 Landsat scenes (2TB) per coveringComplete global coverage every 16 daysOperating since 1972 historical archive holds ~4PBUS NASA EOS approaching ~10PB of Earth images
Europe European Space Agency (ESA)ESA satellite missions MERIS Envisat othersSpotImage (France) 20M SPOT images since 1986 10000 new images collected daily 5+ PB archiveESA Launching Sentinel-1 in 2011
Representative Examples
Envisat Gulf Oil Spill June 2010 (ESA))MERIS Hurricane Isabel Sept 2003 (ESA) Spot Image Xingu Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns hisher data
A platform thathellipProvides a dashboard for wellness information amp medical recordsAllows user to connect and interact with a broad group of ldquoadd onrdquo servicesIncludes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q What can you do with12 million books inover 400 languagescomprised of 5 billion pages and 2 trillion wordsall digitized
A Look to the humanities for new questionsHow would you (re)define Victorian literature
What are the differences between the English and Latin editions of Hobbesrsquo Leviathan
How have places changed over the course of history
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions US program Summer 2010
12 projects23 researchers15 universities
European program Winter 2010
10 projects planned $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum developmentExploring computational thinking in K12 (launch late Oct)CS4HS High school computer science (cs4hscom)Undergraduate open source CS curriculum Google Code University (codegooglecomedu)Lantern platform Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development Google Summer of CodeTM
Program GenesisrdquoFlip bits not burgersrdquo during summer holidaysExposure to real-world software development
Students paired with mentor from OS communityExecute to milestones laid out in accepted applicationStipend allows students to concentrate on OS development
20101026 students150 organizations 69 countries
Technology Leadership App Inventor for Android
Visual programming environment for Android mobile devicesHelping people become creators (rather than consumers) of technologyLaunched in Google Labs July 12 2010
httpappinventorgooglelabscomabout
CS4HS European WorkshopsEacutecole Polytechnique Feacutedeacuterale de Lausanne (EPFL) Switzerland Building and Programming Robots
ETH Zurich Switzerland ABZ Ausbildung - und Beratungszentrum fuer Informatikunterricht
Makerere University Uganda Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University United Kingdom Animation11
Oslo University Norway TENK
Queen Mary University United Kingdom cs4fn magazine
RWTH Aachen Germany Bright Brains in Computer Science
Sapienza University of Rome Italy Challenge and Fun with the CS Olympiads
University of Stuttgart Germany UniS2010
Technion Israel High School Computer Science Female Students Visits in Google Impressions Conceptions and Influences
Trinity College Dublin Ireland Computer Programming Outreach B2C
University of Cape Town South Africa Project Umonya
University College Dublin Ireland CS Summer School
University of Warsaw Poland Mastering Programing Skills Workshops for Teachers
Technology Leadership Google Code University
Course content on current computing technologies and paradigmsCS Curriculum SearchTech Talks on CS TopicsTutorials lecture slides problem sets for a variety of topic areas
AJAX ProgrammingAlgorithmsDistributed SystemsWeb SecurityLanguagesPractical Skills (MySQL Linux)
httpcodegooglecomedu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next Due dates August 15 (CS Awards) October 15 (Marketing Awards)Research-awardsgooglecomFocused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)University-relationsgooglecom
PhD Fellowship Program2009 13 students supported in North America2010 15 in North America 16 outside North AmericaOver 150 other scholarships
~1000 interns worldwideCS4HS 1500+ teachers (~100000 students) US amp EMEA
Final Thoughts
Scale of Communication and Computing is profoundEndless opportunity for technical growth
Some large themesMajor new application domains
Google rapidly to innovate in sciencetechnology and value to consumersWe are providing increased support for academic institutions in computer science and related areas
Its a most exciting area in which to innovate
Web Inference and LearningGoal better understanding of Web content and user intentMethod algorithms that draw reliable semantic inferences from the wealth of evidence implicit in massive Web data
How to interpret this term in this context
Does this sentence answer that question
Will this user click on that ad
Learning create concise representations to support good inferences
Meaning from the Web
Elementary semantic inference what are the possible classes for each instance
Combination wins
Combined graph14M nodes75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing ldquoin the cloudrdquo
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed
Sub-units are distributed to separate machines
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled
Result is reassembled into a finished image
Global-scale earth observation and informatics platformFor public benefit and to support emerging green economyHelp science come out of research lab and into operational use at scaleUnprecedented catalog of earth observation data for mining and analysisPromote transparency reproducibility collaboration ldquoopen sciencerdquo
Very fast computation of scientific map productsIntrinsically-parallel pixel processing systemBuilt-in Google algorithms as well as user-suppliedEarth Engine API for 3rd party algorithm developmentAccess control versioning provenanceOnline and desktop versions (open source desktop version)
On a lot of useful dataEvery available Landsat and MODIS scene (more satellites coming)Commercial datasets (very high resolution satellite imagery)Environmental data (atmospheric ocean terrestrial)User-supplied (ex in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset LandsatPeru 60 Landsat scenes (3Gpix 20GB) per coveringWorld 8000 Landsat scenes (2TB) per coveringComplete global coverage every 16 daysOperating since 1972 historical archive holds ~4PBUS NASA EOS approaching ~10PB of Earth images
Europe European Space Agency (ESA)ESA satellite missions MERIS Envisat othersSpotImage (France) 20M SPOT images since 1986 10000 new images collected daily 5+ PB archiveESA Launching Sentinel-1 in 2011
Representative Examples
Envisat Gulf Oil Spill June 2010 (ESA))MERIS Hurricane Isabel Sept 2003 (ESA) Spot Image Xingu Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns hisher data
A platform thathellipProvides a dashboard for wellness information amp medical recordsAllows user to connect and interact with a broad group of ldquoadd onrdquo servicesIncludes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q What can you do with12 million books inover 400 languagescomprised of 5 billion pages and 2 trillion wordsall digitized
A Look to the humanities for new questionsHow would you (re)define Victorian literature
What are the differences between the English and Latin editions of Hobbesrsquo Leviathan
How have places changed over the course of history
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions US program Summer 2010
12 projects23 researchers15 universities
European program Winter 2010
10 projects planned $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum developmentExploring computational thinking in K12 (launch late Oct)CS4HS High school computer science (cs4hscom)Undergraduate open source CS curriculum Google Code University (codegooglecomedu)Lantern platform Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development Google Summer of CodeTM
Program GenesisrdquoFlip bits not burgersrdquo during summer holidaysExposure to real-world software development
Students paired with mentor from OS communityExecute to milestones laid out in accepted applicationStipend allows students to concentrate on OS development
20101026 students150 organizations 69 countries
Technology Leadership App Inventor for Android
Visual programming environment for Android mobile devicesHelping people become creators (rather than consumers) of technologyLaunched in Google Labs July 12 2010
httpappinventorgooglelabscomabout
CS4HS European WorkshopsEacutecole Polytechnique Feacutedeacuterale de Lausanne (EPFL) Switzerland Building and Programming Robots
ETH Zurich Switzerland ABZ Ausbildung - und Beratungszentrum fuer Informatikunterricht
Makerere University Uganda Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University United Kingdom Animation11
Oslo University Norway TENK
Queen Mary University United Kingdom cs4fn magazine
RWTH Aachen Germany Bright Brains in Computer Science
Sapienza University of Rome Italy Challenge and Fun with the CS Olympiads
University of Stuttgart Germany UniS2010
Technion Israel High School Computer Science Female Students Visits in Google Impressions Conceptions and Influences
Trinity College Dublin Ireland Computer Programming Outreach B2C
University of Cape Town South Africa Project Umonya
University College Dublin Ireland CS Summer School
University of Warsaw Poland Mastering Programing Skills Workshops for Teachers
Technology Leadership Google Code University
Course content on current computing technologies and paradigmsCS Curriculum SearchTech Talks on CS TopicsTutorials lecture slides problem sets for a variety of topic areas
AJAX ProgrammingAlgorithmsDistributed SystemsWeb SecurityLanguagesPractical Skills (MySQL Linux)
httpcodegooglecomedu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next Due dates August 15 (CS Awards) October 15 (Marketing Awards)Research-awardsgooglecomFocused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)University-relationsgooglecom
PhD Fellowship Program2009 13 students supported in North America2010 15 in North America 16 outside North AmericaOver 150 other scholarships
~1000 interns worldwideCS4HS 1500+ teachers (~100000 students) US amp EMEA
Final Thoughts
Scale of Communication and Computing is profoundEndless opportunity for technical growth
Some large themesMajor new application domains
Google rapidly to innovate in sciencetechnology and value to consumersWe are providing increased support for academic institutions in computer science and related areas
Its a most exciting area in which to innovate
Meaning from the Web
Elementary semantic inference what are the possible classes for each instance
Combination wins
Combined graph14M nodes75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing ldquoin the cloudrdquo
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed
Sub-units are distributed to separate machines
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled
Result is reassembled into a finished image
Global-scale earth observation and informatics platformFor public benefit and to support emerging green economyHelp science come out of research lab and into operational use at scaleUnprecedented catalog of earth observation data for mining and analysisPromote transparency reproducibility collaboration ldquoopen sciencerdquo
Very fast computation of scientific map productsIntrinsically-parallel pixel processing systemBuilt-in Google algorithms as well as user-suppliedEarth Engine API for 3rd party algorithm developmentAccess control versioning provenanceOnline and desktop versions (open source desktop version)
On a lot of useful dataEvery available Landsat and MODIS scene (more satellites coming)Commercial datasets (very high resolution satellite imagery)Environmental data (atmospheric ocean terrestrial)User-supplied (ex in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset LandsatPeru 60 Landsat scenes (3Gpix 20GB) per coveringWorld 8000 Landsat scenes (2TB) per coveringComplete global coverage every 16 daysOperating since 1972 historical archive holds ~4PBUS NASA EOS approaching ~10PB of Earth images
Europe European Space Agency (ESA)ESA satellite missions MERIS Envisat othersSpotImage (France) 20M SPOT images since 1986 10000 new images collected daily 5+ PB archiveESA Launching Sentinel-1 in 2011
Representative Examples
Envisat Gulf Oil Spill June 2010 (ESA))MERIS Hurricane Isabel Sept 2003 (ESA) Spot Image Xingu Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns hisher data
A platform thathellipProvides a dashboard for wellness information amp medical recordsAllows user to connect and interact with a broad group of ldquoadd onrdquo servicesIncludes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q What can you do with12 million books inover 400 languagescomprised of 5 billion pages and 2 trillion wordsall digitized
A Look to the humanities for new questionsHow would you (re)define Victorian literature
What are the differences between the English and Latin editions of Hobbesrsquo Leviathan
How have places changed over the course of history
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions US program Summer 2010
12 projects23 researchers15 universities
European program Winter 2010
10 projects planned $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum developmentExploring computational thinking in K12 (launch late Oct)CS4HS High school computer science (cs4hscom)Undergraduate open source CS curriculum Google Code University (codegooglecomedu)Lantern platform Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development Google Summer of CodeTM
Program GenesisrdquoFlip bits not burgersrdquo during summer holidaysExposure to real-world software development
Combination wins
Combined graph: 14M nodes, 75M edges
Applications to Society Follow
Earth Engine
Motivation - Carbon Forest Tracking
UNEP Atlas of our Changing Environment
1975 1989 2001
Rondonia Brazil
A Sampling of Other Use Cases
Disease Early Warning: Remote surveillance of disease and prediction of epidemics
Population Census: Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping: Can detect and monitor a growing range of crisis types
Water Resources: Monitor water quality and availability, and alleviate water shortage problems
Food Security: Famine early warning; rainfall and water requirement estimates; agricultural production estimates; irrigation and fertilizer supply and demand
Global Education Programs
Parallel Geo-Processing "in the cloud"
(a brief illustration)
Original image is divided into 256px sub-units
Sub-units are distributed to separate machines, where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled into a finished image
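The split, distribute and reassemble steps in this illustration can be sketched in a few lines of Python. This is only a minimal sketch, not Earth Engine's implementation: a thread pool stands in for the separate machines, the image is a plain list of rows, and the names `split_into_tiles`, `process_tile` and `process_image` (as well as the pixel-inversion operation) are hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as on the slide


def split_into_tiles(image, tile=TILE):
    """Yield (row, col, sub_image) for each tile; edge tiles may be smaller."""
    height, width = len(image), len(image[0])
    for r in range(0, height, tile):
        for c in range(0, width, tile):
            yield r, c, [row[c:c + tile] for row in image[r:r + tile]]


def process_tile(tile_pixels):
    """Hypothetical per-pixel operation: invert an 8-bit grayscale value."""
    return [[255 - p for p in row] for row in tile_pixels]


def process_image(image, workers=4):
    """Split the image, process the tiles concurrently, reassemble the result."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    tiles = list(split_into_tiles(image))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Worker threads stand in for the "separate machines" of the slide.
        processed = pool.map(process_tile, (t[2] for t in tiles))
        for (r, c, _), out in zip(tiles, processed):
            # Write each processed tile back at its original offset.
            for dy, row in enumerate(out):
                result[r + dy][c:c + len(row)] = row
    return result
```

Because each tile is processed independently of its neighbours, the same structure carries over to any per-pixel operation; operations that need neighbouring pixels would require overlapping tiles, which this sketch does not handle.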
Global-scale earth observation and informatics platform
For public benefit and to support emerging green economy
Help science come out of research lab and into operational use at scale
Unprecedented catalog of earth observation data for mining and analysis
Promote transparency, reproducibility, collaboration, "open science"
Very fast computation of scientific map products
Intrinsically-parallel pixel processing system
Built-in Google algorithms as well as user-supplied
Earth Engine API for 3rd party algorithm development
Access control, versioning, provenance
Online and desktop versions (open source desktop version)
On a lot of useful data
Every available Landsat and MODIS scene (more satellites coming)
Commercial datasets (very high resolution satellite imagery)
Environmental data (atmospheric, ocean, terrestrial)
User-supplied (ex: in-situ data collected via Android phones)
Overview
Scale of Data
US satellite imagery dataset: Landsat
Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
World: 8000 Landsat scenes (2 TB) per covering
Complete global coverage every 16 days
Operating since 1972; historical archive holds ~4 PB
US NASA EOS: approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
ESA satellite missions: MERIS, Envisat, others
SpotImage (France): 20M SPOT images since 1986; 10000 new images collected daily; 5+ PB archive
ESA launching Sentinel-1 in 2011
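As a rough cross-check of the Landsat figures above, the per-scene size and the yearly volume implied by a 16-day revisit can be worked out directly. This is a back-of-the-envelope sketch only: it assumes decimal units (1 TB = 1000 GB), a 365-day year, and that every covering is the full 2 TB; none of these assumptions come from the slide.

```python
GB_PER_TB = 1000  # assumption: decimal units

# Figures quoted on the slide
peru_scenes, peru_gb = 60, 20
world_scenes, world_tb = 8000, 2
revisit_days = 16

peru_gb_per_scene = peru_gb / peru_scenes
world_gb_per_scene = world_tb * GB_PER_TB / world_scenes
coverings_per_year = 365 / revisit_days          # assumption: 365-day year
tb_per_year = coverings_per_year * world_tb      # assumption: full 2 TB each time

print(f"Peru:  {peru_gb_per_scene:.2f} GB per scene")    # 0.33
print(f"World: {world_gb_per_scene:.2f} GB per scene")   # 0.25
print(f"Coverings per year: {coverings_per_year:.1f}")   # 22.8
print(f"Volume per year:    {tb_per_year:.1f} TB")       # 45.6
```

The two per-scene values differ slightly because the slide figures are rounded; both put a scene at a few hundred megabytes.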
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health: personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns his/her data
A platform that...
Provides a dashboard for wellness information & medical records
Allows user to connect and interact with a broad group of "add on" services
Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
How would you (re)define Victorian literature?
What are the differences between the English and Latin editions of Hobbes' Leviathan?
How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program, Summer 2010
12 projects, 23 researchers, 15 universities
European program, Winter 2010
10 projects planned, $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
Exploring computational thinking in K12 (launch late Oct)
CS4HS: High school computer science (cs4hs.com)
Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
"Flip bits not burgers" during summer holidays
Exposure to real-world software development
Students paired with mentor from OS community
Execute to milestones laid out in accepted application
Stipend allows students to concentrate on OS development
2010: 1026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
Visual programming environment for Android mobile devices
Helping people become creators (rather than consumers) of technology
Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
ETH Zurich, Switzerland: ABZ Ausbildung - und Beratungszentrum fuer Informatikunterricht
Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University, United Kingdom: Animation11
Oslo University, Norway: TENK
Queen Mary University, United Kingdom: cs4fn magazine
RWTH Aachen, Germany: Bright Brains in Computer Science
Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
University of Stuttgart, Germany: UniS2010
Technion, Israel: High School Computer Science Female Students Visits in Google: Impressions, Conceptions and Influences
Trinity College Dublin, Ireland: Computer Programming Outreach B2C
University of Cape Town, South Africa: Project Umonya
University College Dublin, Ireland: CS Summer School
University of Warsaw, Poland: Mastering Programming Skills, Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
CS Curriculum Search
Tech Talks on CS Topics
Tutorials, lecture slides, problem sets for a variety of topic areas
AJAX Programming
Algorithms
Distributed Systems
Web Security
Languages
Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
Research-awardsgooglecom
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
University-relationsgooglecom
PhD Fellowship Program
2009: 13 students supported in North America
2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google rapidly to innovate in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate
A Sampling of Other Use Cases
Disease Early Warning Remote surveillance of disease and prediction of epidemics
Population Census Supplements traditional census and mapping in developing regions
Humanitarian Crisis Mapping Can detect and monitor a growing range of crisis typesWater Resources Monitor water quality and availability and alleviate water shortage problemsFood Security Famine early warning rainfall and water requirements estimations agr production estimates and irrigation and fertilizer supply amp demandGlobal Education Programs
Parallel Geo-Processing ldquoin the cloudrdquo
(a brief illustration)
Original image
Original image is divided into 256px sub-units
Sub-units are distributed
Sub-units are distributed to separate machines
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled
Result is reassembled into a finished image
Global-scale earth observation and informatics platformFor public benefit and to support emerging green economyHelp science come out of research lab and into operational use at scaleUnprecedented catalog of earth observation data for mining and analysisPromote transparency reproducibility collaboration ldquoopen sciencerdquo
Very fast computation of scientific map productsIntrinsically-parallel pixel processing systemBuilt-in Google algorithms as well as user-suppliedEarth Engine API for 3rd party algorithm developmentAccess control versioning provenanceOnline and desktop versions (open source desktop version)
On a lot of useful dataEvery available Landsat and MODIS scene (more satellites coming)Commercial datasets (very high resolution satellite imagery)Environmental data (atmospheric ocean terrestrial)User-supplied (ex in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset LandsatPeru 60 Landsat scenes (3Gpix 20GB) per coveringWorld 8000 Landsat scenes (2TB) per coveringComplete global coverage every 16 daysOperating since 1972 historical archive holds ~4PBUS NASA EOS approaching ~10PB of Earth images
Europe European Space Agency (ESA)ESA satellite missions MERIS Envisat othersSpotImage (France) 20M SPOT images since 1986 10000 new images collected daily 5+ PB archiveESA Launching Sentinel-1 in 2011
Representative Examples
Envisat Gulf Oil Spill June 2010 (ESA))MERIS Hurricane Isabel Sept 2003 (ESA) Spot Image Xingu Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns hisher data
A platform thathellipProvides a dashboard for wellness information amp medical recordsAllows user to connect and interact with a broad group of ldquoadd onrdquo servicesIncludes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q What can you do with12 million books inover 400 languagescomprised of 5 billion pages and 2 trillion wordsall digitized
A Look to the humanities for new questionsHow would you (re)define Victorian literature
What are the differences between the English and Latin editions of Hobbesrsquo Leviathan
How have places changed over the course of history
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions US program Summer 2010
12 projects23 researchers15 universities
European program Winter 2010
10 projects planned $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
Exploring computational thinking in K-12 (launch late Oct)
CS4HS: High school computer science (cs4hs.com)
Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
"Flip bits, not burgers" during summer holidays
Exposure to real-world software development
Students paired with mentor from OS community
Execute to milestones laid out in accepted application
Stipend allows students to concentrate on OS development
2010: 1,026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
Visual programming environment for Android mobile devices
Helping people become creators (rather than consumers) of technology
Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
ETH Zurich, Switzerland: ABZ (Ausbildungs- und Beratungszentrum für Informatikunterricht)
Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University, United Kingdom: Animation11
Oslo University, Norway: TENK
Queen Mary University, United Kingdom: cs4fn magazine
RWTH Aachen, Germany: Bright Brains in Computer Science
Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
University of Stuttgart, Germany: UniS2010
Technion, Israel: High School Computer Science Female Students Visits in Google: Impressions, Conceptions and Influences
Trinity College Dublin, Ireland: Computer Programming Outreach B2C
University of Cape Town, South Africa: Project Umonya
University College Dublin, Ireland: CS Summer School
University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
CS Curriculum Search
Tech Talks on CS Topics
Tutorials, lecture slides, problem sets for a variety of topic areas
AJAX Programming
Algorithms
Distributed Systems
Web Security
Languages
Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
research-awards.google.com
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
university-relations.google.com
PhD Fellowship Program
2009: 13 students supported in North America
2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1,000 interns worldwide
CS4HS: 1,500+ teachers (~100,000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google rapidly to innovate in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate
Parallel Geo-Processing "in the cloud" (a brief illustration)
Original image is divided into 256px sub-units
Sub-units are distributed to separate machines, where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled into a finished image
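The split, distribute, process and reassemble steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Earth Engine's implementation: worker threads stand in for separate machines, the image is a plain 2-D list of 8-bit grey values, and `split_into_tiles`, `invert` and `process_in_parallel` are hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 256  # sub-unit edge length in pixels, as stated on the slide


def split_into_tiles(image, tile=TILE):
    """Yield (top, left, block) for each tile-sized block of a 2-D pixel list."""
    height, width = len(image), len(image[0])
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            block = [row[left:left + tile] for row in image[top:top + tile]]
            yield top, left, block


def invert(block):
    """Hypothetical per-pixel operation: invert 8-bit grey values."""
    return [[255 - p for p in row] for row in block]


def process_in_parallel(image, op=invert, tile=TILE, workers=4):
    """Split the image, process tiles concurrently, and reassemble the result."""
    height, width = len(image), len(image[0])
    result = [[0] * width for _ in range(height)]
    tiles = list(split_into_tiles(image, tile))
    # Threads are a stand-in for separate machines (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = pool.map(lambda t: (t[0], t[1], op(t[2])), tiles)
        for top, left, block in processed:
            # Write each processed tile back at its original position.
            for dy, row in enumerate(block):
                result[top + dy][left:left + len(row)] = row
    return result
```

Because each tile is processed independently, the per-pixel operation needs no knowledge of its neighbours. Operations that do look across tile borders (for example, a blur) would need overlapping tiles, which this sketch does not handle.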
Global-scale earth observation and informatics platform
For public benefit and to support emerging green economy
Help science come out of research lab and into operational use at scale
Unprecedented catalog of earth observation data for mining and analysis
Promote transparency, reproducibility, collaboration, "open science"
Very fast computation of scientific map products
Intrinsically-parallel pixel processing system
Built-in Google algorithms as well as user-supplied
Earth Engine API for 3rd party algorithm development
Access control, versioning, provenance
Online and desktop versions (open source desktop version)
On a lot of useful data
Every available Landsat and MODIS scene (more satellites coming)
Commercial datasets (very high resolution satellite imagery)
Environmental data (atmospheric, ocean, terrestrial)
User-supplied (e.g., in-situ data collected via Android phones)
Overview
Scale of Data
Sub-units are distributed to separate machines
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled
Result is reassembled into a finished image
Global-scale earth observation and informatics platformFor public benefit and to support emerging green economyHelp science come out of research lab and into operational use at scaleUnprecedented catalog of earth observation data for mining and analysisPromote transparency reproducibility collaboration ldquoopen sciencerdquo
Very fast computation of scientific map productsIntrinsically-parallel pixel processing systemBuilt-in Google algorithms as well as user-suppliedEarth Engine API for 3rd party algorithm developmentAccess control versioning provenanceOnline and desktop versions (open source desktop version)
On a lot of useful dataEvery available Landsat and MODIS scene (more satellites coming)Commercial datasets (very high resolution satellite imagery)Environmental data (atmospheric ocean terrestrial)User-supplied (ex in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset LandsatPeru 60 Landsat scenes (3Gpix 20GB) per coveringWorld 8000 Landsat scenes (2TB) per coveringComplete global coverage every 16 daysOperating since 1972 historical archive holds ~4PBUS NASA EOS approaching ~10PB of Earth images
Europe European Space Agency (ESA)ESA satellite missions MERIS Envisat othersSpotImage (France) 20M SPOT images since 1986 10000 new images collected daily 5+ PB archiveESA Launching Sentinel-1 in 2011
Representative Examples
Envisat Gulf Oil Spill June 2010 (ESA))MERIS Hurricane Isabel Sept 2003 (ESA) Spot Image Xingu Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns hisher data
A platform thathellipProvides a dashboard for wellness information amp medical recordsAllows user to connect and interact with a broad group of ldquoadd onrdquo servicesIncludes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q What can you do with12 million books inover 400 languagescomprised of 5 billion pages and 2 trillion wordsall digitized
A Look to the humanities for new questionsHow would you (re)define Victorian literature
What are the differences between the English and Latin editions of Hobbesrsquo Leviathan
How have places changed over the course of history
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions US program Summer 2010
12 projects23 researchers15 universities
European program Winter 2010
10 projects planned $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum developmentExploring computational thinking in K12 (launch late Oct)CS4HS High school computer science (cs4hscom)Undergraduate open source CS curriculum Google Code University (codegooglecomedu)Lantern platform Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development Google Summer of CodeTM
Program GenesisrdquoFlip bits not burgersrdquo during summer holidaysExposure to real-world software development
Students paired with mentor from OS communityExecute to milestones laid out in accepted applicationStipend allows students to concentrate on OS development
20101026 students150 organizations 69 countries
Technology Leadership App Inventor for Android
Visual programming environment for Android mobile devicesHelping people become creators (rather than consumers) of technologyLaunched in Google Labs July 12 2010
httpappinventorgooglelabscomabout
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
ETH Zurich, Switzerland: ABZ, Ausbildungs- und Beratungszentrum für Informatikunterricht
Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University, United Kingdom: Animation11
Oslo University, Norway: TENK
Queen Mary University, United Kingdom: cs4fn magazine
RWTH Aachen, Germany: Bright Brains in Computer Science
Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
University of Stuttgart, Germany: UniS2010
Technion, Israel: High School Computer Science Female Students Visits in Google Impressions Conceptions and Influences
Trinity College Dublin, Ireland: Computer Programming Outreach B2C
University of Cape Town, South Africa: Project Umonya
University College Dublin, Ireland: CS Summer School
University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
CS Curriculum Search
Tech Talks on CS Topics
Tutorials, lecture slides, problem sets for a variety of topic areas
AJAX Programming
Algorithms
Distributed Systems
Web Security
Languages
Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs: 230+ projects funded in the last year
Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
Research-awards.google.com
Focused Grant Program
Visiting Faculty Program: 20 faculty (ongoing)
University-relations.google.com
PhD Fellowship Program
2009: 13 students supported in North America
2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google rapidly to innovate in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate
Sub-units are distributed to separate machines where they can be processed in parallel
Thousands can be processed simultaneously
Result is reassembled into a finished image
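The split, process, and reassemble steps above can be illustrated with a small sketch. This is not Earth Engine code: the function names, the tile size, and the per-pixel operation (inverting an 8-bit value) are all hypothetical, and a local thread pool stands in for the separate machines the slide describes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(image, tile_size):
    """Cut a 2-D list of pixels into (row, col, tile) sub-units."""
    tiles = []
    for r in range(0, len(image), tile_size):
        for c in range(0, len(image[0]), tile_size):
            tile = [row[c:c + tile_size] for row in image[r:r + tile_size]]
            tiles.append((r, c, tile))
    return tiles


def process_tile(item):
    """Hypothetical per-pixel operation: invert an 8-bit value."""
    r, c, tile = item
    return r, c, [[255 - p for p in row] for row in tile]


def reassemble(results, height, width):
    """Write each processed tile back at its original offset."""
    out = [[0] * width for _ in range(height)]
    for r, c, tile in results:
        for i, row in enumerate(tile):
            out[r + i][c:c + len(row)] = row
    return out


def process_image(image, tile_size=2, workers=4):
    """Split the image, process tiles concurrently, rebuild the result."""
    tiles = split_into_tiles(image, tile_size)
    # Worker threads stand in for separate machines (an assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(process_tile, tiles))
    return reassemble(results, len(image), len(image[0]))
```

Because each tile carries its own offset, tiles can finish in any order and on any worker; only the final reassembly needs to see all of them.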
Global-scale earth observation and informatics platform
For public benefit and to support emerging green economy
Help science come out of research lab and into operational use at scale
Unprecedented catalog of earth observation data for mining and analysis
Promote transparency, reproducibility, collaboration, "open science"
Very fast computation of scientific map products
Intrinsically-parallel pixel processing system
Built-in Google algorithms as well as user-supplied
Earth Engine API for 3rd party algorithm development
Access control, versioning, provenance
Online and desktop versions (open source desktop version)
On a lot of useful data
Every available Landsat and MODIS scene (more satellites coming)
Commercial datasets (very high resolution satellite imagery)
Environmental data (atmospheric, ocean, terrestrial)
User-supplied (e.g. in-situ data collected via Android phones)
Overview
Scale of Data
US Satellite imagery dataset: Landsat
Peru: 60 Landsat scenes (3 Gpix, 20 GB) per covering
World: 8000 Landsat scenes (2 TB) per covering
Complete global coverage every 16 days
Operating since 1972; historical archive holds ~4 PB
US NASA EOS approaching ~10 PB of Earth images
Europe: European Space Agency (ESA)
ESA satellite missions: MERIS, Envisat, others
SpotImage (France): 20M SPOT images since 1986, 10000 new images collected daily, 5+ PB archive
ESA launching Sentinel-1 in 2011
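As a rough back-of-the-envelope check on the Landsat figures above, the sketch below derives the average scene size and the yearly volume implied by one full covering every 16 days. It assumes decimal units (1 TB = 10^12 bytes) and that every covering is the quoted 2 TB; the function names are illustrative only.

```python
# Figures quoted on the slide (decimal units assumed).
WORLD_SCENES = 8000
WORLD_BYTES = 2e12      # 2 TB per global covering
PERU_SCENES = 60
PERU_BYTES = 20e9       # 20 GB per covering of Peru
REVISIT_DAYS = 16       # complete global coverage every 16 days


def per_scene_mb(total_bytes, scenes):
    """Average scene size in megabytes."""
    return total_bytes / scenes / 1e6


def coverings_per_year(revisit_days=REVISIT_DAYS):
    """Number of complete coverings in a 365-day year."""
    return 365 / revisit_days


def yearly_volume_tb(bytes_per_covering=WORLD_BYTES):
    """Data volume per year, in terabytes, at one covering per revisit."""
    return coverings_per_year() * bytes_per_covering / 1e12


if __name__ == "__main__":
    print(f"World: {per_scene_mb(WORLD_BYTES, WORLD_SCENES):.0f} MB per scene")
    print(f"Peru:  {per_scene_mb(PERU_BYTES, PERU_SCENES):.0f} MB per scene")
    print(f"Coverings per year: {coverings_per_year():.1f}")
    print(f"Volume per year:    {yearly_volume_tb():.1f} TB")
```

The two per-scene averages differ because the slide's totals are rounded; they are best read as order-of-magnitude figures.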
Representative Examples
Envisat: Gulf Oil Spill, June 2010 (ESA)
MERIS: Hurricane Isabel, Sept 2003 (ESA)
Spot Image: Xingu, Brazilian Amazon
Health (US)
Google Health: personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns his/her data
A platform that…
Provides a dashboard for wellness information & medical records
Allows user to connect and interact with a broad group of "add on" services
Includes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprised of 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
How would you (re)define Victorian literature?
What are the differences between the English and Latin editions of Hobbes' Leviathan?
How have places changed over the course of history?
Scale of Data
US Satellite imagery dataset LandsatPeru 60 Landsat scenes (3Gpix 20GB) per coveringWorld 8000 Landsat scenes (2TB) per coveringComplete global coverage every 16 daysOperating since 1972 historical archive holds ~4PBUS NASA EOS approaching ~10PB of Earth images
Europe European Space Agency (ESA)ESA satellite missions MERIS Envisat othersSpotImage (France) 20M SPOT images since 1986 10000 new images collected daily 5+ PB archiveESA Launching Sentinel-1 in 2011
Representative Examples
Envisat Gulf Oil Spill June 2010 (ESA))MERIS Hurricane Isabel Sept 2003 (ESA) Spot Image Xingu Brazilian Amazon
View on YouTube
Health (US)
Google Health personal health dashboard
Launched in May 2008
Major update Sept 2010
User controls and owns hisher data
A platform thathellipProvides a dashboard for wellness information amp medical recordsAllows user to connect and interact with a broad group of ldquoadd onrdquo servicesIncludes a non-tethered PHR
Crisis Response
Person Finder
Pre-Earthquake - Aug 26 2009
1 Day After Earthquake - Jan 13 2010
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
How would you (re)define Victorian literature?
What are the differences between the English and Latin editions of Hobbes' Leviathan?
How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research taking a computational approach to traditional humanist questions
US program: Summer 2010
12 projects, 23 researchers, 15 universities
European program: Winter 2010
10 projects planned; $1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
Exploring computational thinking in K12 (launch late Oct)
CS4HS: High school computer science (cs4hs.com)
Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
"Flip bits, not burgers" during summer holidays
Exposure to real-world software development
Students paired with mentor from OS community
Execute to milestones laid out in accepted application
Stipend allows students to concentrate on OS development
2010: 1026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
Visual programming environment for Android mobile devices
Helping people become creators (rather than consumers) of technology
Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
ETH Zurich, Switzerland: ABZ, Ausbildungs- und Beratungszentrum für Informatikunterricht
Makerere University, Uganda: Grassroot approach to improve the quality of applicants of computing programs at Makerere University
Manchester University, United Kingdom: Animation11
Oslo University, Norway: TENK
Queen Mary University, United Kingdom: cs4fn magazine
RWTH Aachen, Germany: Bright Brains in Computer Science
Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
University of Stuttgart, Germany: UniS2010
Technion, Israel: High School Computer Science Female Students Visits in Google: Impressions, Conceptions and Influences
Trinity College Dublin, Ireland: Computer Programming Outreach B2C
University of Cape Town, South Africa: Project Umonya
University College Dublin, Ireland: CS Summer School
University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
CS Curriculum Search
Tech Talks on CS Topics
Tutorials, lecture slides, problem sets for a variety of topic areas
AJAX Programming, Algorithms, Distributed Systems, Web Security, Languages, Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
Research-awardsgooglecom
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
University-relationsgooglecom
PhD Fellowship Program
2009: 13 students supported in North America
2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100,000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google is working to innovate rapidly in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate
13 Days After Earthquake - Jan 25 2010
Digital Humanities
Illuminating the Humanities
Q: What can you do with 12 million books in over 400 languages, comprising 5 billion pages and 2 trillion words, all digitized?
A: Look to the humanities for new questions
How would you (re)define Victorian literature?
What are the differences between the English and Latin editions of Hobbes's Leviathan?
How have places changed over the course of history?
Digital Humanities Awards
Research program supporting university research that takes a computational approach to traditional humanist questions
US program: Summer 2010
12 projects, 23 researchers, 15 universities
European program: Winter 2010
10 projects planned
$1M total funding
Education
Curriculum Development
Seeding and supporting computing curriculum development
Exploring computational thinking in K-12 (launch late Oct)
CS4HS: High school computer science (cs4hs.com)
Undergraduate open source CS curriculum: Google Code University (code.google.com/edu)
Lantern platform: Wiki for open source curriculum development (in collaboration with Khan Academy)
Talent Development: Google Summer of Code™
Program Genesis
"Flip bits not burgers" during summer holidays
Exposure to real-world software development
Students paired with mentor from OS community
Execute to milestones laid out in accepted application
Stipend allows students to concentrate on OS development
2010: 1,026 students, 150 organizations, 69 countries
Technology Leadership: App Inventor for Android
Visual programming environment for Android mobile devices
Helping people become creators (rather than consumers) of technology
Launched in Google Labs July 12, 2010
http://appinventor.googlelabs.com/about
CS4HS European Workshops
École Polytechnique Fédérale de Lausanne (EPFL), Switzerland: Building and Programming Robots
ETH Zurich, Switzerland: ABZ (Ausbildungs- und Beratungszentrum für Informatikunterricht)
Makerere University, Uganda: Grassroots approach to improve the quality of applicants to computing programs at Makerere University
Manchester University, United Kingdom: Animation11
Oslo University, Norway: TENK
Queen Mary University, United Kingdom: cs4fn magazine
RWTH Aachen, Germany: Bright Brains in Computer Science
Sapienza University of Rome, Italy: Challenge and Fun with the CS Olympiads
University of Stuttgart, Germany: UniS2010
Technion, Israel: High School Computer Science Female Students' Visits in Google: Impressions, Conceptions and Influences
Trinity College Dublin, Ireland: Computer Programming Outreach B2C
University of Cape Town, South Africa: Project Umonya
University College Dublin, Ireland: CS Summer School
University of Warsaw, Poland: Mastering Programming Skills Workshops for Teachers
Technology Leadership: Google Code University
Course content on current computing technologies and paradigms
CS Curriculum Search
Tech Talks on CS Topics
Tutorials, lecture slides, problem sets for a variety of topic areas:
AJAX Programming
Algorithms
Distributed Systems
Web Security
Languages
Practical Skills (MySQL, Linux)
http://code.google.com/edu
Supporting our Academic Institutions
Research Awards Programs - 230+ projects funded in the last year
Next due dates: August 15 (CS Awards), October 15 (Marketing Awards)
research-awards.google.com
Focused Grant Program
Visiting Faculty Program - 20 faculty (ongoing)
university-relations.google.com
PhD Fellowship Program
2009: 13 students supported in North America
2010: 15 in North America, 16 outside North America
Over 150 other scholarships
~1000 interns worldwide
CS4HS: 1500+ teachers (~100000 students), US & EMEA
Final Thoughts
Scale of Communication and Computing is profound
Endless opportunity for technical growth
Some large themes
Major new application domains
Google: innovating rapidly in science/technology and value to consumers
We are providing increased support for academic institutions in computer science and related areas
It's a most exciting area in which to innovate