DMDH Winter 2015 Session #1: Exploring Programming in the Digital Humanities

Programming is complex enough that just figuring out what you want to do, and what sort of language you need, is work.

Thinking that you ought to be able to do everything almost immediately is a recipe for feeling terrible.

Being aware that it is genuine work, and not just work for newbies, matters.

There will always be new programs and platforms that you will want to experiment with.

Working with technology means periodically starting from scratch, a bit like working with a new time period or culture, or figuring out how to teach a new class.

What can programming languages do?

Programming languages can...

They can also do all these things in combination.

Example #1

• find all the statements in quotes ("") from a novel

• count how many words are in each statement

• put the statements in order from fewest words to most

• write all the statements from the novel to a text file
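The four steps of Example #1 could be sketched in Python roughly as follows. This is a minimal illustration, not a finished tool: it assumes the novel is already a plain-text string that marks speech with straight double quotes, and the function names (quoted_statements, word_count, export_statements) and the sample sentence are made up for the example.

```python
import re
from pathlib import Path


def quoted_statements(text: str) -> list[str]:
    """Find every statement between straight double quotes ("...")."""
    # Curly quotes or nested quotations would need a different pattern.
    return re.findall(r'"([^"]+)"', text)


def word_count(statement: str) -> int:
    """Count words, treating any run of whitespace as a separator."""
    return len(statement.split())


def export_statements(text: str, out_path: str) -> list[str]:
    """Sort quoted statements from fewest to most words and write them to a file."""
    # sorted() is stable, so statements with equal counts keep their order in the novel.
    ordered = sorted(quoted_statements(text), key=word_count)
    Path(out_path).write_text("\n".join(ordered) + "\n", encoding="utf-8")
    return ordered


sample = 'She said, "Hello there, my friend." He replied, "Hi." Then: "Good day."'
print(sorted(quoted_statements(sample), key=word_count))
# ['Hi.', 'Good day.', 'Hello there, my friend.']
```

Even this small sketch makes interpretive decisions: what counts as a "statement" and what counts as a "word" are rules someone had to write down.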

Example #2

• allow a user to type in some information, e.g., "Benedict Cumberbatch"

• compare "Benedict Cumberbatch" to a much larger file

• retrieve any data that matches the information

• print the retrieved information on screen
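A minimal Python sketch of Example #2, assuming the "much larger file" has already been read into a list with one record per line. The sample records and the find_matches name are hypothetical.

```python
def find_matches(query: str, records: list[str]) -> list[str]:
    """Return every record that contains the query, ignoring case."""
    needle = query.casefold()
    return [record for record in records if needle in record.casefold()]


# Hypothetical stand-in for "a much larger file": one record per line.
RECORDS = [
    "Benedict Cumberbatch | Sherlock | 2010",
    "Martin Freeman | Sherlock | 2010",
    "Benedict Cumberbatch | The Imitation Game | 2014",
]

for match in find_matches("Benedict Cumberbatch", RECORDS):
    print(match)
```

In an interactive script the query would come from the user, e.g. `find_matches(input("Search for: "), RECORDS)`. Here, how much text gets printed is decided by the choice to store one record per line: each match returns a whole record rather than just the name.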

Example #3

• "read" two texts, say, two plays by Seneca

• search for any words that the two plays have in common

• print the words that they have in common on screen

• calculate what percentage of the words in each play are shared

• print that percentage on screen
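Example #3 could look something like this in Python. The sketch assumes each play is available as a plain-text string, and it compares distinct word forms only, so it would not match different inflected Latin forms of the same word (that would need lemmatization, which is not attempted here). The function names and the placeholder strings are hypothetical.

```python
import re


def vocabulary(text: str) -> set[str]:
    """Lowercase the text and collect its distinct words (runs of letters)."""
    # [^\W\d_] matches any Unicode letter, so accented characters are kept.
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def shared_words(text_a: str, text_b: str) -> tuple[set[str], float, float]:
    """Return the shared words and the percentage of each text's vocabulary they make up."""
    vocab_a, vocab_b = vocabulary(text_a), vocabulary(text_b)
    common = vocab_a & vocab_b
    pct_a = 100 * len(common) / len(vocab_a) if vocab_a else 0.0
    pct_b = 100 * len(common) / len(vocab_b) if vocab_b else 0.0
    return common, pct_a, pct_b


# Placeholder strings stand in for the two plays.
common, pct_a, pct_b = shared_words("the cat sat", "the dog sat down")
print(sorted(common))                    # ['sat', 'the']
print(f"{pct_a:.1f}% and {pct_b:.1f}%")  # 66.7% and 50.0%
```

Whether to count distinct words or every occurrence is itself an interpretive choice; counting every occurrence instead would give different percentages.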

Example #4

• if the user is located in geographic location Z, e.g., 45th and University, go to an online address and retrieve some text

• print that text on the user’s tablet screen

• receive input from the user and respond
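The location check in Example #4 could be sketched as a distance calculation. This assumes the device reports latitude and longitude in degrees and uses the haversine formula for great-circle distance. The coordinates, radius, and text below are hypothetical placeholders (not the real position of 45th and University), and a real app would fetch the text from an online address rather than a constant.

```python
import math
from typing import Optional

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical placeholders for "location Z" and the text stored for it.
LOCATION_Z = (47.0, -122.0)
RADIUS_M = 50
TEXT_FOR_Z = "Placeholder text for location Z."


def text_for_user(lat: float, lon: float) -> Optional[str]:
    """Return the text for location Z if the user is within RADIUS_M of it."""
    if haversine_m(lat, lon, *LOCATION_Z) <= RADIUS_M:
        return TEXT_FOR_Z
    return None


print(text_for_user(47.0, -122.0))  # Placeholder text for location Z.
print(text_for_user(48.0, -122.0))  # None (about 111 km away)
```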

However...

• In Example #1, the computer is focusing on things that characters say. But what if you want to isolate speeches from just one character?

• In Example #2, how does the computer know how much text to print? Will it just print "Benedict Cumberbatch" 379 times, because that's how often it appears in the larger file?
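The first question shows why the computer needs explicit instructions. As a hedged illustration, here is one hypothetical rule in Python: it assumes each speech is followed by a tag of the form `"...," said Name`, and it misses every speech phrased any other way. The pattern, function name, and sample text are invented for the example.

```python
import re

# Hypothetical rule: a quoted speech followed by "said" and a capitalized name.
SPEECH_WITH_TAG = re.compile(r'"([^"]+)"\s+said\s+([A-Z][a-z]+)')


def speeches_by(speaker: str, text: str) -> list[str]:
    """Return the speeches the rule attributes to one character."""
    return [speech.rstrip(",") for speech, name in SPEECH_WITH_TAG.findall(text)
            if name == speaker]


text = '"I am here," said Jane. "So am I," said Tom. Jane smiled. "Good," Jane said.'
print(speeches_by("Jane", text))  # ['I am here']
```

Jane's last line ("Good," Jane said.) is silently skipped because it does not fit the pattern. Deciding which patterns should count is a judgment the programmer has to make.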

These are the areas of programming where critical thinking and humanities skills become vital.

The Difference

• Humans are good at differentiating between material in complex and sophisticated ways.

• Computers are good at not differentiating between material unless they’ve been specifically instructed to do so.

Computers work with data.

You work with data, too, but in most cases you'll have to make your data readable by a computer.

How to make your data machine-readable

• Annotate it with a markup language

• Organize it in patterns that the computer can understand

• Add information that is not explicit in the current format (e.g., hardbound/softbound binding; language: English; date of record creation)
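As a small, hypothetical illustration of the third point, here is a catalogue record in Python in which facts a person would read off the object or know from context (binding, language, date of record creation) are written out as named fields and saved as JSON. The field names and values are invented for the example.

```python
import json

# Hypothetical catalogue record: each implicit fact becomes an explicit, named field.
record = {
    "title": "Songs of Innocence",
    "binding": "hardbound",
    "language": "English",
    "record_created": "2015-01-01",
}

machine_readable = json.dumps(record, indent=2)
print(machine_readable)

# Because every field is named, a program can filter on it directly.
records = [record, {**record, "binding": "softbound"}]
hardbound = [r for r in records if r["binding"] == "hardbound"]
print(len(hardbound))  # 1
```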

Depending on the data you have, and the way you annotate or structure it, different things become possible.

For instance, sometimes it may be enough to know that a tile covers 9 sq. in. But sometimes you need to know that it measures 3” x 3”.
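One way to see the difference is to record dimensions rather than area. In this hypothetical Python sketch, storing width and height lets a program derive the area, but storing only the area would not let it recover the dimensions: a 3” x 3” tile and a 1” x 9” tile both cover 9 sq. in. The class and field names are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tile:
    width_in: float   # inches
    height_in: float  # inches

    @property
    def area_sq_in(self) -> float:
        # Area can always be derived from the dimensions...
        return self.width_in * self.height_in


square = Tile(3, 3)
strip = Tile(1, 9)
# ...but not the other way round: both tiles cover 9 square inches.
print(square.area_sq_in, strip.area_sq_in)  # 9 9
```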

Your goal is to make the data As Simple As Possible, but not so simple that it stops being useful.

Depending on the data you work with, structuring or annotating it can become more challenging, but also more useful.

The work of creating data is social.

In other words, how can others use it?

Many programming and markup languages have governing bodies that establish standards for their use:

• the World Wide Web Consortium (W3C) (http://www.w3.org/standards/)

• the TEI Technical Council

BREAK!

Data Examples

•Annotated (Markup Languages: HTML, TEI)

•Structured (MySQL)

•Combination (Semantic Web)

Markup: HTML

<i>This text is italic.</i>

= This text is italic. (displayed in italics)

Markup: HTML

<a href="http://www.dmdh.org">This text</a> will take you to a webpage.

= This text will take you to a webpage. (with "This text" displayed as a clickable link)

Markup: HTML

Anything can be data, and markup languages provide instructions for how computers should treat that data.

Markup: HTML

HTML is a markup language used to format text for display on webpages.

<p> separates text into paragraphs.

<em> marks text as emphasized (most browsers display it in italics).

These are just a few of the HTML formatting instructions that you can use.

HTML Syntax Rules

• Opening and closing tags: <> and </>

• Attributes (second-level information) defined inside the opening tag using ="", e.g., href="http://www.dmdh.org"

• Comments: <!-- -->
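To see how a program "reads" these syntax rules, here is a short sketch using Python's standard-library html.parser module, which reports opening tags (with their attributes), closing tags, and comments as separate events. The TagLogger class name and the sample markup are illustrative.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record what the parser sees: tags, their attributes, and comments."""

    def __init__(self) -> None:
        super().__init__()
        self.events: list[tuple] = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("open", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_comment(self, data):
        self.events.append(("comment", data.strip()))


parser = TagLogger()
parser.feed('<!-- course link --><p><a href="http://www.dmdh.org">This text</a>'
            ' will take you to a webpage.</p>')
parser.close()
for event in parser.events:
    print(event)
```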

Markup languages are popular in digital humanities because lots of humanists work with texts.

Without markup languages, the things that a computer can search for are limited.

Ctrl + F: any text in iambic pentameter.

With markup, the things you can search for are limited only by your interpretation.

Markup: TEI

TEI (Text Encoding Initiative)

Poetry w/ TEI

<text xmlns="http://www.tei-c.org/ns/1.0" xml:id="d1">
  <body xml:id="d2">
    <div1 type="book" xml:id="d3">
      <head>Songs of Innocence</head>
      <pb n="4"/>
      <div2 type="poem" xml:id="d4">
        <head>Introduction</head>
        <lg type="stanza">
          <l>Piping down the valleys wild, </l>
          <l>Piping songs of pleasant glee, </l>
          <l>On a cloud I saw a child, </l>
          <l>And he laughing said to me: </l>
        </lg>
      </div2>
    </div1>
  </body>
</text>
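Once verse lines are tagged, a program can address them directly. This sketch uses Python's standard-library xml.etree.ElementTree on a trimmed, well-formed copy of the stanza above; because TEI elements live in the TEI namespace, the search has to name that namespace. The SAMPLE variable and the "tei" prefix are choices made for the example.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Trimmed to a single stanza; every opening tag is closed, as XML requires.
SAMPLE = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <lg type="stanza">
      <l>Piping down the valleys wild, </l>
      <l>Piping songs of pleasant glee, </l>
      <l>On a cloud I saw a child, </l>
      <l>And he laughing said to me: </l>
    </lg>
  </body>
</text>"""

root = ET.fromstring(SAMPLE)
# Because each verse line is tagged <l>, "every line of verse" becomes searchable.
lines = [line_el.text.strip() for line_el in root.iterfind(".//tei:l", TEI_NS)]
print(len(lines))  # 4
print([line for line in lines if line.startswith("Piping")])
```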

Grammar w/ TEI

<entry>
  <form>
    <orth>pamplemousse</orth>
  </form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="gen">masculine</gram>
  </gramGrp>
</entry>
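Attributes let a program tell similar elements apart. In this sketch, ElementTree's limited XPath support selects each `<gram>` by its type attribute. It assumes the entry is stored without a namespace, exactly as shown.

```python
import xml.etree.ElementTree as ET

ENTRY = """<entry>
  <form><orth>pamplemousse</orth></form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="gen">masculine</gram>
  </gramGrp>
</entry>"""

entry = ET.fromstring(ENTRY)
word = entry.findtext("form/orth")
# The type attribute tells the program which <gram> holds which kind of information.
pos = entry.findtext("gramGrp/gram[@type='pos']")
gender = entry.findtext("gramGrp/gram[@type='gen']")
print(word, pos, gender)  # pamplemousse noun masculine
```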

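TEI follows the same basic syntax rules as HTML (opening and closing tags, attributes, comments), though because TEI is written in XML it is stricter: every opening tag must have a matching closing tag. Your normal browser also can’t work with TEI the way it works with HTML.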

TEI is meant to be a highly social language: the committee that maintains its standards wants it to be something that anyone can use.

In order for TEI to successfully encode texts, it has to be adaptable to individual projects.

Anything that you can isolate (and put in brackets) can, theoretically, then be manipulated to serve your project.

TEI can be used to encode more than just text:

<div type="shot">
  <view>BBC World symbol</view>
  <sp>
    <speaker>Voice Over</speaker>
    <p>Monty Python's Flying Circus tonight comes to you live from the Grillomat Snack Bar, Paignton.</p>
  </sp>
</div>
<div type="shot">
  <view>Interior of a nasty snack bar. Customers around, preferably real people. Linkman sitting at one of the plastic tables.</view>
  <sp>
    <speaker>Linkman</speaker>
    <p>Hello to you live from the Grillomat Snack Bar.</p>
  </sp>
</div>

Or, you could encode all of Stephenie Meyer’s Twilight according to its emotional register.

Whether you include or exclude some aspect of the text in your markup can be very important from an academic perspective.

The challenge of creating good data is one reason that collaboration is so important to digital scholarship.

Data Collaboration

•Avoid reinventing the wheel (has the markup for this text already been done?)

•Consider the labor involved vs. the outcome (and the future use of the data you create).

Structured Data

Study Scenario #1

•You study urban espresso stands: their hours, brands of coffee, whether or not they sell pastries, and how far the espresso stands are from major roadways.

What Types of Data?

•Binary (pastries: y/n)

•Unordered (hours; coffee brands)

•Derived/subservient (hours+proximity to roadways; take cards? Which cards?)
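A hedged Python sketch of how some of these data types might be modeled for one espresso stand: a yes/no value becomes a boolean, an unordered category (coffee brand) becomes plain text, and a subservient field (which cards?) is only allowed when its parent field (take cards?) is true. The class and field names are hypothetical, and hours are simply kept as recorded text.

```python
from dataclasses import dataclass, field


@dataclass
class EspressoStand:
    name: str
    hours: str                       # kept as recorded text, e.g. "7:00 a.m.-2:00 p.m."
    coffee_brand: str                # unordered category: no brand ranks above another
    sells_pastries: bool             # binary: y/n
    distance_from_street_m: float
    takes_cards: bool = False
    cards_accepted: set[str] = field(default_factory=set)  # subservient to takes_cards

    def __post_init__(self) -> None:
        # "Which cards?" only makes sense if the stand takes cards at all.
        if self.cards_accepted and not self.takes_cards:
            raise ValueError("cards_accepted requires takes_cards=True")


stand = EspressoStand("Java the Hut", "7:00 a.m.-2:00 p.m.", "Square Mile Roasters", False, 25)
print(stand.sells_pastries, stand.cards_accepted)  # False set()
```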

Study Scenario #2

•You study female characters in novels written between 1700 and 1850. Encoding a whole novel just to study female characters isn’t practical for you.

What types of data might you collect in this case?

Both scenarios involve aggregating information, rather than encoding it.

Structured Data: Example #1 (MySQL)

ID | Name | Location | Hours | Coffee Brand | Pastries (Y/N) | Distance from Street
008 | Java the Hut | 56 Farringdon Road, London, UK | 7:00 a.m.-2:00 p.m. | Square Mile Roasters | N | 25 meters
009 | Prufrock Coffee | 18 Shoreditch High Street | 7:00 a.m.-10:00 p.m. | Monmouth | Y | 10 meters
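The espresso-stand table could be created and queried roughly like this. The sketch swaps in SQLite (Python's built-in sqlite3 module) for MySQL so it runs without a database server; the SQL here is simple enough that MySQL should need little or no change, but the table and column names (such as distance_from_street_m) are assumptions adapted from the slide's headers. IDs are stored as text so the leading zeros in "008" survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE espresso_stands (
        id TEXT PRIMARY KEY,
        name TEXT,
        location TEXT,
        hours TEXT,
        coffee_brand TEXT,
        pastries TEXT CHECK (pastries IN ('Y', 'N')),
        distance_from_street_m INTEGER
    )
""")
conn.executemany(
    "INSERT INTO espresso_stands VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("008", "Java the Hut", "56 Farringdon Road, London, UK",
         "7:00 a.m.-2:00 p.m.", "Square Mile Roasters", "N", 25),
        ("009", "Prufrock Coffee", "18 Shoreditch High Street",
         "7:00 a.m.-10:00 p.m.", "Monmouth", "Y", 10),
    ],
)
# Because every row has the same columns, questions become simple queries.
rows = conn.execute(
    "SELECT name FROM espresso_stands WHERE pastries = 'Y' AND distance_from_street_m < 20"
).fetchall()
print(rows)  # [('Prufrock Coffee',)]
```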

Structured Data: Example #2 (RDF)
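RDF describes data as subject-predicate-object statements ("triples") that can link across records. A library such as rdflib would normally handle this; the plain-Python sketch below only illustrates the idea, reusing the espresso-stand data with hypothetical "ex:" identifiers in place of real URIs.

```python
# Each fact is a (subject, predicate, object) triple; "ex:" names are placeholders.
TRIPLES = [
    ("ex:stand008", "ex:name", "Java the Hut"),
    ("ex:stand008", "ex:servesBrand", "ex:SquareMileRoasters"),
    ("ex:stand009", "ex:name", "Prufrock Coffee"),
    ("ex:stand009", "ex:servesBrand", "ex:Monmouth"),
    ("ex:Monmouth", "ex:name", "Monmouth"),
]


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# Follow links across records: which stands serve Monmouth coffee?
stands = [s for s, _, _ in match(p="ex:servesBrand", o="ex:Monmouth")]
print([match(s=stand, p="ex:name")[0][2] for stand in stands])  # ['Prufrock Coffee']
```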

How your data is (or can be) structured will influence the technology that you (can) use to work with it.

Digital humanists see creating machine-readable data as valuable scholarship, and consider it vital to make that labor transparent.

Exercise: You Create the Data!

Your data determines your project.

Every project has data.

Text objects, images, tags, geographical coordinates, categories, records, creator metadata, etc.

Even if you’re not planning to learn any programming skills, you are still working with data.

Next time: Programming on the Whiteboard

January 24, 9:30, CMU 202

• Cleaning data before you work with it!

• Identifying specific programming tasks

• How access affects your project idea

• Flash project development

• Homework: bring some data to work with.