Build the NY Times Subject Headings and Topics in the Cloud. Dr. Brand Niemann, Director and Senior Data Scientist, Semantic Community. July 4, 2011

1

Build the NY Times Subject Headings and Topics in the Cloud

Dr. Brand Niemann, Director and Senior Data Scientist

Semantic Community, July 4, 2011

2

Preface

• For the last 150 years, The New York Times has maintained one of the most authoritative news vocabularies ever developed. In 2009, they began to publish this vocabulary as linked open data. The New York Times also uses approximately 30,000 tags to power their Times Topics Pages. It is their intention to publish all of these tags as linked open data.

• Today AOL Government publishes both vocabularies together as linked open data in Spotfire so that our readers can more readily browse, search, and download these invaluable data sets!

3

data.nytimes.com

http://data.nytimes.com/

See next slide

People is a 14 MB RDF file!

These can be screen scraped into Excel!
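As an alternative to screen scraping, the RDF download could be converted straight into a spreadsheet-ready CSV. The following is a minimal sketch, not the author's method. It assumes the file is RDF/XML and that each record is an rdf:Description carrying a skos:prefLabel; the sample records, the URIs in them, and the function names rdf_to_rows and rows_to_csv are hypothetical. Streaming with iterparse keeps memory use low for a file as large as the 14 MB People file.

```python
import csv
import io
import xml.etree.ElementTree as ET

# RDF and SKOS namespace URIs are standard; their use in the NYT files is an assumption.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
RDF_DESCRIPTION = "{%s}Description" % NS["rdf"]
RDF_ABOUT = "{%s}about" % NS["rdf"]

# Hypothetical sample records standing in for the real download.
SAMPLE_RDF = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://data.nytimes.com/example-1">
    <skos:prefLabel xml:lang="en">Example Person, One</skos:prefLabel>
  </rdf:Description>
  <rdf:Description rdf:about="http://data.nytimes.com/example-2">
    <skos:prefLabel xml:lang="en">Example Person, Two</skos:prefLabel>
  </rdf:Description>
</rdf:RDF>
"""


def rdf_to_rows(source):
    """Yield (uri, label) pairs from a file path or binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == RDF_DESCRIPTION:
            label = elem.find("skos:prefLabel", NS)
            if label is not None and label.text:
                yield elem.get(RDF_ABOUT), label.text.strip()
            elem.clear()  # release the children of the finished record


def rows_to_csv(rows):
    """Return CSV text with a header row; Excel can open the result directly."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["uri", "label"])
    writer.writerows(rows)
    return buf.getvalue()


rows = list(rdf_to_rows(io.BytesIO(SAMPLE_RDF.encode("utf-8"))))
csv_text = rows_to_csv(rows)
print(csv_text)
```

To run this against a real download, pass the local file path to rdf_to_rows instead of the in-memory sample and write csv_text to a .csv file.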

4

Build Your Own NYT Linked Data Application

• March 30, 2010, 1:21 PM: “Build Your Own NYT Linked Data Application” by Evan Sandhaus

– That’s It?: “So there you have it: all it takes to build a simple linked data application with New York Times Linked Open Data. But remember: this post just focuses on the highlights. We encourage you to take a closer look at the code and dig into some of the more advanced features we didn’t discuss. We hope that you share our excitement about the possibilities of linked data, and we look forward to seeing what you create!”

http://open.blogs.nytimes.com/2010/03/30/build-your-own-nyt-linked-data-application/

5

Alumni in the News

http://data.nytimes.com/schools/schools.html

http://topics.nytimes.com/top/reference/timestopics/people/l/frank_lorenzo/index.html

http://select.nytimes.com//2005/10/15/business/15nocera.html

Opens and Closes Snippet

6

“Who Went Where” Code

http://data.nytimes.com/code/schools.html

833 lines of code!

7

Subject Headings

http://data.nytimes.com/home/a.html

See next slide

8

Subject Headings

http://data.nytimes.com/86075200336035840002

See next slide

9

Using Our Linked Data

http://data.nytimes.com/home/about.html

10

Times Topics

http://topics.nytimes.com/topics/reference/timestopics/index.html

The New York Times uses approximately 30,000 tags to power our Times Topics Pages. It is our intention to publish all of these tags as linked open data.

See next page

11

Times Topics

http://topics.nytimes.com/topics/reference/timestopics/all/a/index.html

See next page

12

Times Topics

http://topics.nytimes.com/top/news/business/companies/a-m-castle-and-company/index.html

13

Spotfire

• Describe the chart, how it’s made:

– The Spotfire chart was made by screen scraping the NY Times Subject Headings and Topics into an Excel spreadsheet and importing it into Spotfire. The author placed the two listings side by side, as Tufte suggests, to facilitate comparisons. The author also re-created the summary table of Subject Heading categories to see how much change had occurred between January 13, 2010, and July 4, 2011 (very little).

• How it succeeds or falls short:

– This single Spotfire chart makes the two lists at the NY Times sortable (click on column headers), searchable (use Filters and facets), and downloadable (click on the down arrow in the table header in the Spotfire Web Player).

• Add any tips for improving:

– The NY Times Topics need URLs (25,389), and the author will find a way to automate that task and will soon finish adding the URLs for NY Times reporters by hand.
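One way the URL task might be automated is to read the topic names and their links directly from the Times Topics index pages. The sketch below is illustrative only and is not the author's method. The HTML markup, the topic names in it, the "/top/" path filter, and the class name TopicLinkParser are assumptions; the two topic URLs are the ones shown on the earlier slides. It uses only the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Index page URL from the Times Topics slide; used to resolve relative links.
BASE_URL = "http://topics.nytimes.com/topics/reference/timestopics/all/a/index.html"

# Hypothetical markup; the real index page may be structured differently.
SAMPLE_HTML = """
<ul>
  <li><a href="/top/news/business/companies/a-m-castle-and-company/index.html">A. M. Castle &amp; Company</a></li>
  <li><a href="/top/reference/timestopics/people/l/frank_lorenzo/index.html">Lorenzo, Frank</a></li>
  <li><a href="/about/help.html">Help</a></li>
</ul>
"""


class TopicLinkParser(HTMLParser):
    """Collect (name, absolute URL) pairs for anchors that look like topic pages."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            # Assumption: topic pages live under a "/top/" path segment.
            if href and "/top/" in href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse internal whitespace in the link text.
            name = " ".join("".join(self._text).split())
            self.links.append((name, self._href))
            self._href = None


parser = TopicLinkParser(BASE_URL)
parser.feed(SAMPLE_HTML)
parser.close()
topic_links = parser.links

for name, url in topic_links:
    print(name, url, sep="\t")
```

The tab-separated output can be pasted into the existing Excel sheet and joined to the topic list by name before re-importing into Spotfire.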

14

Spotfire

PC Desktop Spotfire