Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
BuildingDataAcumen
NicoleLazar,UniversityofGeorgiaProfessor,DepartmentofSta=s=cs
MladenVouk,NorthCarolinaStateUniversityDis=nguishedProfessorofComputerScience,
AssociateViceChancellorforResearchDevelopmentandAdministra=on
2
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
BuildingDataAcumen
NicoleLazar,UniversityofGeorgiaProfessor,DepartmentofSta=s=cs
CapstoneCourses
3
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
HistoryofCapstoneattheUniversityofGeorgia
Ø FirstofferedinAY2007–2008,10students
Ø PartofUGA’sWriDngIntensiveProgramsinceAY2008–2009Ø RequiredfromAY2010–2011forallstaDsDcsmajors
Ø PostersessionintroducedinAY2010–2011
Ø Enrollmenthasgrownsteadily,trackinggrowthofmajor;
46studentsincurrentoffering
4
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
GoalsofCapstoneattheUniversityofGeorgia
Ø ExposuretoadvancedstaDsDcaltechniques
Ø PracDcecommunicaDonofstaDsDcalideas,inwriDngandorallyØ Groupwork
Ø VerDcalintegraDonoflearning
(faculty→graduateassistants→students)
Ø ConsulDng/workwithclient
Ø Workwithrealdata
Ø Professionaldevelopment
Withgrowth,challengingtomaintainallofthese.Currentoffering:relaxingclientandgroupwork.
5
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
CapstoneattheUniversityofGeorgia:ProjectFormats
Ø Typically,findprojectsfromresearchersaroundcampus;scopeandscalehaveincreaseddramaDcallyoverDme
Ø Alsoprojectsfromgovernmentoffices(e.g.,USDA,CDC)
Ø Whenclasssizewassmaller,community-basedsurveyprojectsincorporated(e.g.,Nuci’sSpace,CampusTransit)
Ø Currentoffering:
• experimentalindividualandgroup(“datarepository”)opDons
• morepublicoutreachprojects(HumaneSociety,UnitedWay)
6
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
EssenFalComponentsforBuildingDataAcumen:TheData
Ø Movingbeyondt-testsandANOVA(topicscoveredhaveincludedbootstrap,classificaDonandregressiontrees,survivalanalysis,mulDpletesDng,issuesofreproducibilityinscience...)
Ø Hands-onpracDcebeyondtheproject
Ø Realdata(allprojectsinvolvereal,okenlarge,datasets)
Ø Messydata(studentsarenotguaranteedtoreceivecleandatasetsfromclients)
7
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
EssenFalComponentsforBuildingDataAcumen:TheStudent
Ø BuildingawarenessofaprofessionalidenDtyasstaDsDciansand(morerecently)datascienDsts
Ø CommunicaDon!MuchresistanceonthisattheDme;benefitsareseenlater(ingraduateschool,intheworkforce)
Ø Buildingasenseofcommunityamongstudentsandinstructors,TAs(howtoscaleupfrom10to~50students?)
Ø Challengestudentstogoaboveandbeyondtheirintellectualcomfortzones
8
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
LookingtotheFuture
Ø ConflicDngdirecDons
• CanexpectconDnuedgrowthinstaDsDcsanddatascienceprograms
• Hands-onpracDcalexperiencewithrealdataisessenDal
Ø HowtoscaleandsDllprovideausefulexperiencetostudents?
Ø NeedforinnovaDveapproaches(e.g.team“compeDDons”,smallerworkshopgroups...others?)
Ø Anysuchcoursewillbeaconstantworkinprogress:don’tfallintoarouDneof“We’vealwaysdoneitthisway”–thatisnolongersufficient
9
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
HowDoWe(CanWe)KnowIt’sWorking?
Ø Internalassessments:Qualityofprojects;clientsaDsfacDon(repeatparDcipants);postersessionchecklist(completedbyaqendees–staDsDcsfacultyandgraduatestudents,clients)
Ø Feedbackfromgraduates:What’susefulwhentheygettothenextstage?
Ø Whatareemployersandgraduateschoolslookingfor?
10
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
BuildingDataAcumen
NicoleLazar,UniversityofGeorgiaProfessor,DepartmentofSta=s=cs
CapstoneCourses
Q&A
11
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
BuildingDataAcumen
MladenVouk,NorthCarolinaStateUniversityDis=nguishedProfessorofComputerScience,
AssociateViceChancellorforResearchDevelopmentandAdministra=on
NCStateUniversityDataScienceIniFaFve
(dis.ncsu.edu)
12
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
Context
• Understanding,managing,andusingdata—okenlargeamountsofunstructureddata—isbecomingincreasinglyimportantinnearlyeveryindustry,governmentsector,andacademicdomain.
• NothavingtheskillsandinfrastructuretoapplydatascienceandanalyDcsreliablyandcorrectlyhasbecomeamajorriskforallsectors.
13
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
Data(and)ScienceLiteracyAgooddatascienDstdoesnothavetobeacomputerscienDst,amathemaDcian,orastaDsDcian.
But….
14
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
EducaFon?
• Whatshouldbeincludedindatasciencecurriculum,bothnowandinthefuture?
• HowtoprioriDzeorbestconveyfordifferingtypesofdata(scienceprograms)?
• HowcanopportuniDestoenhancedataacumen(i.e.,theabilitytomakegoodjudgmentsanddecisionswithdata)beintegratedintodatascienceeducaDonalprograms?
• Howcandataacumenbemeasuredorevaluated?
• etc.
15
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
DisrupFve?
Largesttaxicompaniesmaynotowntaxis?
(e.g.,Uber)
LargestaccommodaDonprovidersmaynot
ownanyaccomodaDons?(AirBnB)
etc.
16
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
TheFiveV-sof(Big)Data*
Veracity(Uncertainty,Trust,Source,InformaDon
Density,Intent,…)
Velocity(Howfastis
dataarriving?)
Value(Economics,
Health,Security,…) Variety
(RangeofData-text,
visual,numerical,locaDon…)
Volume(Howmuchdataisarriving?)
Howmuchisneeded?Kb/secMb/secGb/secTb/sec
MB,6GB,9TB,12PB,15XB,18ZB,21YB,24
Errors:EpistemicAleatoricWorkingwithanAI
RealLifeDataTypes
(*)www.ibmbigdatahub.com/infographic/four-vs-big-data
17
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
ValuePath
Wisdom&ValueAdded,Impact
Knowledge&
Valaue
(Big)Data
DataScience&AnalyDcs
ApplicaDons
Data
People
Technology
Provenance,Compliance,Security,Usability,Privacy,EthicsReliability,Trustworthiness,SenseofImpact…
e.g.,Extract,Verify,Transform,Load,Organize,…
e.g.curaDon
18
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
DataScienceWorkThatMaYersImaginebeingabletoholdageographicinformaDonsystem(GIS)inyourhands,feeltheshapeoftheearth,sculptitstopography,anddirecttheflowofwater.ResearchersatNCState’sCenterforGeospaDalAnalyDcshavemadethisnovelideaarealitywithTangibleLandscape,anopensourcetangibleinterfacepoweredbyGRASSGISthatphysicallyandinteracDvelymanifestsgeospaDaldatasothatuserscannaturallyfeelit,shapeit,andimmediatelyseeresultsprojectedontothe3-Dmodel.ThismakesGISfarmoreintuiDveandaccessibleforbeginners,empowersgeospaDalexperts,andcreatesexciDngnewopportuniDesfordatascienDstsanddevelopersalike–likegamingwithGIS.TangibleLandscapeisnowbeingappliedtotacklecomplexrealworldproblems,fromcontrollingthespreadofwildfireoremerginginfecDousdiseasestounderstandingtheimpactsofstormhazards.
19
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
CiDzenScienDstsarecomparingnearly300,000satelliteimageswiththeseexamplestohelpimprovetheglobalrecordofhurricanesandtropicalcyclones.Seehowyoucanhelpimproveourunderstandingoftropicalcyclones.Crowd-sourcing.
hqps://ncics.org/events/cics-nc-leads-the-launch-of-cyclonecenter-org/
CycloneCenter.org
20
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
Technology
Data People
Provenance,Compliance,Security,Usability,Privacy,Etc.…
EducaFon
TechnologyDistributedHPC/HPDCapableCloud
DataBus
Tools+ApplicaDons
DataInteracDonModels:• TotalIsolaDon• Compute-to-Data• Data-to-Compute• Openmodel
DataandAnalyDcsLiteracy• ComputaDonalThinking• Math,modeling,sokware,
datamanagement,methods• CommunicaDons• Domainview&relevancy• Ethics,senseofIntegrity&
ConfidenDality(security)
Users/Actors• Naïve2advanceuser• Concierge• Content/Appdeveloper• Datamanagerand
Admin• Toolsandsystemdev.
Framework
21
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
HuntLibrary
CreaDvityLab-Hunt
LASMersiveExperienceEB2VisualizaDonlab-Hunt
VizLab–D.H.Hill
GamesRoom-Hunt
NCStatehasExtensiveVisualizaFonFaciliFes
22
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
NCStateUniversityDataScienceIniFaFveGoals• CoordinatedatascienceacDviDes
• EstablishaninterdisciplinarydatasciencecurriculumtofurtherdatascienceeducaDon
• FosterresearchcollaboraDon,bothinternalandexternal
• IncreaseresearchfundingandcompeDDveness
• Buildindustrypartnerships
• Provideservices&infrastructuretofaculty
• Raisevisibility&increasereputaDon
• …
Ins$tu$onalizeDataScience
23
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
AnalyFcs&DataSciencePrograms
• Undergraduate- GeneralEducaDonthemaDctrack- Co-taughtComputerScience/StaDsDcsundergraduateelecDves- PooleCollegeofManagement(PCOM)Undergraduate15-hourData
AnalyDcsHonorsProgram- SomeoftheExecuDveEducaDonand“DataMaqers”offerings
• Graduate- InsDtuteforAdvancedAnalyDcs(IAA)–ProfessionalMS
(analyDcs.ncsu.edu)- DataScienceGraduateCerDficateandMS(CSC/Stat)- CSCandStatgraduatetracksinDataScience- PooleCollegeofManagement–DigitalAnalyDcsCerDficatewithinMBA
program,CSC/Stat/PCOMexecuDveeducaDonforcompanies- DataMaqerscourses(hqps://research.ncsu.edu/dsi/data-maqers/)
• Other…(e.g.,Libraryofferings)
24
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
ForExample:
25
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
NCSUEnd-to-EndDataScienceCurriculum*
Domain-Focused Data Science (security, supply chain, economics, ...)
Story Telling and Visualization
End-to-end Data Science
Statistics Machine Learning
Applied Machine Learning
Foundation: Algebra, Calculus, Optimization, and Probability
Pro
gra
mm
ing
En
viro
nm
en
ts, P
latf
orm
s,
an
d P
rod
uc
tivi
ty T
oo
ls
Dat
a M
an
ag
em
en
t, W
ran
glin
g
Big
Dat
a
Insight Generation, Decision Science and Leadership
26
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
DataScience(DS)MasteryLevels• Core(C):
o AbletomasterindividualcoreconceptswithintheBloom’staxonomy*:Knowledge,Comprehension,ApplicaDon,Analysis,EvaluaDon,andSynthesis
o AbletoadaptpreviouslyseensoluDonstodatascienceproblemsfortargetdomain-focusedapplicaDonsuDlizingthesecoreconcepts
• IntermediateElecDves(I):o AbletosynthesizemulDpleconceptstosolve,evaluate,andvalidatethe
proposeddatascienceproblemfromtheend-to-endperspecDveo AbletoidenDfyandproperlyapplythetextbook-leveltechniquessuitablefor
solvingeachpartofthecomplexdatascienceproblempipeline• AdvancedElecDves(A):
o Abletoformulatenewdomain-targeteddatascienceproblems,jusDfytheirbusinessvalue,andmakedata-guidedacDonabledecisions
o Abletoresearchthecu|ngedgetechnologies,comparethemandcreatetheopDmalonesforsolvingDStheproblemsathand
o Abletoleadasmallteamworkingontheend-to-endexecuDonoftheDSproject
(*)e.g.,hqps://www.csun.edu/science/ref/reasoning/quesDons_blooms/blooms.html
27
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
CoreCurriculumCoursesFoundaDons:
• MatrixAlgebra• Calculus• OpDmizaDon• Probability
DataScienceMethods:• StaDsDcs• MachineLearningandDataMining• AlgorithmsandDataStructures
DecisionMaking• Data-guidedDecisionMaking• VisualizaDon,VisualDataExploraDon,andStoryTelling
InfrastructureandProgrammingEnvironments• ScripDngLanguagesforDS:e.g.,R,Python• DatabaseManagementandOpDmizaDon
28
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
AdvancedElecFveCurriculumCoursesDataScienceMethods
• DeepLearning• BayesianReasoningandProbabilisDcGraphicalModels• ProcessMining
DataScienceApplicaDons:• NaturalLanguageProcessingandTextAnalyDcs• GraphDataMining• SocialNetworkAnalyDcs• DataStreamAnalyDcs• SenDmentAnalyDcsandRecommendaDonSystems• SupplyChainDataAnalyDcs• MarkeDngandFinanceDataAnalyDcs
DecisionMaking• DiscreteEventSimulaDonandProcessModelingandControlInfrastructureand
ProgrammingEnvironments• BigDataMiddleware:Hadoop,Spark,GraphDB,etc• ParallelProgramming(e.g.,withSpark,MPI,etc.)• IoT
29
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
ClosingThoughtsHow could these components be prioritized or best conveyed for differing types of data science programs? • Conference on Undergraduate Research in Data Science • Senior Design Projects on Data Science • Internship/job Opportunities in partnership with Data Science and other Industries • Annual Data Science Hackathons • Data Science Competitions How can opportunities to enhance data acumen (i.e., the ability to make good judgments & decisions with data) be integrated into DS educational programs? • Industry Surveys • Foundations and Methods: High Priority • Is data science introduced as an academic enrichment to the existing university curriculum?
- e.g., Data Science Concentration within the Computer Science Curriculum How can data acumen be measured or evaluated? • Individual Course Level Capstone Projects • Senior Design Project in collaboration with Industry Partners • Standardized Placement Tests for different Levels of the DS ladder • Data Science Competition and Contests: perhaps developed in collaboration with industry • Data Science and DS-related Job Placement Statistics: collected and monitored
30
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
Acknowledgments• I would like to thank a number of my colleagues at NC
State, UNC-Charlotte, UNC-Chapel Hill and RENCI for discussion and direct or indirect input and contributions as summarized in this presentation. This includes:
Alyson Wilson, Nagiza Samatova, Rada Chirkova, Patrick Dreher, Jamie Roseborough, Michael Rappa, Christopher Healey, Dan McGurrin, Trey Overman, Stan Ahalt, Andrew Wilson, Mirsad Hadjikadic, Ashok Krishnamurthy, Raju Varsavai, Otis Brown, Mike Kowolenko, Andy Rindos.
• Support for some of the described activities comes in part from NC State University, UNC General Administration, State of North Carolina, a number of USA federal agencies and a number of industrial partners.
31
Provideinputandlearnmoreaboutthestudyatwww.nas.edu/EnvisioningDS
BuildingDataAcumen–Q&A
NicoleLazar,UniversityofGeorgiaProfessor,DepartmentofSta=s=cs
MladenVouk,NorthCarolinaStateUniversityDis=nguishedProfessorofComputerScience,
AssociateViceChancellorforResearchDevelopmentandAdministra=on
32
9/12/17–BuildingDataAcumen9/19/17–IncorporaDngReal-WorldApplicaDons
9/26/17–FacultyTrainingandCurriculumDevelopment
10/3/17–CommunicaDonSkillsandTeamwork
10/10/17–Inter-DepartmentalCollaboraDonandInsDtuDonalOrganizaDon
10/17/17–Ethics10/24/17–AssessmentandEvaluaDonforDataSciencePrograms
11/7/17–Diversity,Inclusion,andIncreasingParDcipaDon
11/14/17–Two-YearCollegesandInsDtuDonalPartnerships
Provideinputandlearnmore
aboutthestudyatwww.nas.edu/EnvisioningDS
33