Mdst3703 culturomics-2012-11-01


Lecture/Studio: Culturomics

Prof. Alvarado
MDST 3703/7703
1 November 2012

Business

• Everyone’s families and friends OK?

Review

• The New Epistemology
– Rise of Big Data: massive, available, social
– Shifts our relationship to primary sources
– From reading to quantitative methods and visualizations
– Example of media determinism

• Manovich
– Consistent with database logic
– Applies spirit of Big Data methods to art

Review

• Rationalization Effects
– What are we looking at?
– What is theory?
– What are models?
– What is culture?
– What are the humanities?

Overview

• Combined Studio and Lecture

• Lecture
– Google’s NGram Viewer
– Culturomics

• Studio
– Collaborative Topic Index

Google Does the Humanities

Google NGrams

• Google Books comprises 11% of the corpus of published books, about 2 trillion words

• NGrams uses 5.2 million books (4% of the corpus)

• 500 billion words

• Published between 1500 and 2008

• In English, French, Spanish, German, Chinese and Russian (Hebrew too)

Erez Lieberman Aiden and Jean-Baptiste Michel

What’s an NGram?

A space-delimited string

N = number of strings

Case sensitive

Purely syntactic

Very hard to index
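The definition above can be made concrete with a minimal sketch. It assumes the simplest possible tokenizer (splitting on whitespace only), which is not necessarily how Google tokenizes its corpus; the function name `ngrams` and the sample sentence are made up for illustration.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive space-delimited strings in text."""
    # Split on whitespace only: no lowercasing, no stemming, no punctuation
    # handling. This is an assumed simplification for illustration.
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


sample = "The humanities and the sciences"

unigrams = ngrams(sample, 1)   # 1-grams: single strings
bigrams = ngrams(sample, 2)    # 2-grams: pairs of adjacent strings

# Counting is case sensitive, so "The" and "the" are different 1-grams.
counts = Counter(unigrams)
```

Because the matching is purely syntactic, nothing here knows that "The" and "the" are the same word; each distinct string is counted separately.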

Culturomics

• A method more than a model (like Anderson argues)

• Analogy is to genomics
– Does this make sense?
– What is the analog to the gene?

Parallel

Crossing

Convergent/Divergent

American

British

“There’s not even a historian of the book connected to the project,” Mr. Menand noted.

Anthony Grafton, History, Princeton

Studio

• We are now at the point where we have all the pieces in place
– HTML markup, CSS, JavaScript
– Structured data (table in Google Docs)
– Visualization tools

• Create Character Index
– We will use everything we have done so far: notes, network visualizations, etc.
– Today we begin to collaboratively create the Character Index (a subset of a full topic index)
