Lecture/Studio: Culturomics
Prof. Alvarado
MDST 3703/7703
1 November 2012
Business
• Everyone’s families and friends OK?
Review
• The New Epistemology
  – Rise of Big Data: massive, available, social
  – Shifts our relationship to primary sources
  – From reading to quantitative methods and visualizations
  – Example of media determinism
• Manovich
  – Consistent with database logic
  – Applies spirit of Big Data methods to art
Review
• Rationalization Effects
  – What are we looking at?
  – What is theory?
  – What are models?
  – What is culture?
  – What are the humanities?
Overview
• Combined Studio and Lecture
• Lecture
  – Google’s NGram Viewer
  – Culturomics
• Studio
  – Collaborative Topic Index
Google Does the Humanities
Google NGrams
• Google Books comprises 11% of the corpus of published books, about 2 trillion words
• NGrams uses 5.2 million books (4% of the corpus)
• 500 billion words
• Published between 1500 and 2008
• In English, French, Spanish, German, Chinese and Russian (Hebrew too)
Erez Lieberman Aiden and Jean-Baptiste Michel
What’s an NGram?
A 1-gram is a space-delimited string
N = number of strings in the sequence
Case sensitive
Purely syntactic
Very hard to index
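A minimal sketch of the definition above, in Python. The function name `ngrams` and the sample sentence are made up for illustration, and splitting on whitespace alone is an assumption; it is not a description of Google's actual tokenizer.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive space-delimited strings in text."""
    # Split on whitespace only: no lowercasing, no stemming, no parsing.
    # This keeps the n-grams case sensitive and purely syntactic.
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sample = "The cat saw the Cat"  # hypothetical example text

unigrams = Counter(ngrams(sample, 1))  # 1-grams with their counts
bigrams = ngrams(sample, 2)            # 2-grams in order of appearance

print(unigrams)
print(bigrams)
```

Because case is preserved, "The" and "the" are counted as different 1-grams, as are "cat" and "Cat". A text of k tokens yields k - n + 1 n-grams for each n, and each n needs its own table of counts, which gives some sense of why indexing a 500-billion-word corpus this way is hard.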
Culturomics
• A method more than a model (as Anderson argues)
• Analogy is to genomics
  – Does this make sense?
  – What is the analog to the gene?
Parallel
Crossing
Convergent/Divergent
American
British
“There’s not even a historian of the book connected to the project,” Mr. Menand noted.
Anthony Grafton, History, Princeton
Studio
• We are now at the point where we have all the pieces in place
  – HTML markup, CSS, JavaScript
  – Structured data (table in Google Docs)
  – Visualization tools
• Create Character Index
  – We will use everything we have done so far: notes, network visualizations, etc.
  – Today we begin to collaboratively create the Character Index (a subset of a full topic index)