GOOGLE N-GRAMS ON AMAZON WEB SERVICES PART 3 Thomas Tiahrt, MA, PhD Computer Science 482 – Introduction to Text Analytics

Slide 1
Slide 2
GOOGLE N-GRAMS ON AMAZON WEB SERVICES PART 3 Thomas Tiahrt, MA, PhD Computer Science 482 – Introduction to Text Analytics
Slide 3
Version 1
Data created July 2009
Version 1 file format:
    N-gram \t year \t match_count \t page_count \t volume_count \n
N-gram is the 1-gram, 2-gram, 3-gram, 4-gram, or 5-gram
year is the publication year
match_count is the number of occurrences for that year
page_count is the number of pages on which the n-gram appeared
volume_count is the number of books in which the n-gram occurred
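As an illustration of the Version 1 layout, the following minimal sketch splits one tab-separated line into typed fields. The names NgramRecord and parse_line are hypothetical helpers introduced here, and the sample line is made up rather than taken from the dataset.

```python
from typing import NamedTuple


class NgramRecord(NamedTuple):
    """One row of the Version 1 file format (field names follow the slide)."""
    ngram: str
    year: int
    match_count: int
    page_count: int
    volume_count: int


def parse_line(line: str) -> NgramRecord:
    """Split one tab-separated Version 1 line into typed fields."""
    ngram, year, match_count, page_count, volume_count = line.rstrip("\n").split("\t")
    return NgramRecord(ngram, int(year), int(match_count),
                       int(page_count), int(volume_count))


# Made-up sample line, not real dataset values.
sample = "the end of the day\t1955\t120\t118\t97\n"
record = parse_line(sample)
print(record.ngram, record.year, record.match_count)
```

The n-gram itself may contain spaces, which is why the split is on tabs only.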
Slide 4
AWS Public Dataset
http://aws.amazon.com/datasets/8172056142375670
Stored in AWS Simple Storage Service (S3)
Slide 5
AWS Public Dataset
Stored as compressed data
Luckily, Hadoop supports:
    GZIP
    BZIP2
    LZO (see below)
    DEFLATE (zlib implementation)
But Hadoop does not support WinZip
And Hadoop supports LZO only if you create a version with it yourself
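For testing code locally on small samples, three of the formats listed above can be read with Python's standard library. The sketch below is only a round-trip demonstration on made-up records. It does not touch the S3 dataset, and LZO is left out because the standard library has no LZO module.

```python
import bz2
import gzip
import zlib

# Made-up sample records in the Version 1 tab-separated layout.
raw = b"a b c d e\t1950\t10\t9\t8\nf g h i j\t1951\t7\t7\t6\n"

# Each standard-library module handles one of the formats on the slide.
codecs = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "DEFLATE (zlib)": (zlib.compress, zlib.decompress),
}

for name, (compress, decompress) in codecs.items():
    packed = compress(raw)
    restored = decompress(packed)
    lines = restored.decode("utf-8").splitlines()
    print(name, "->", len(lines), "lines restored")
```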
Slide 6
Hadoop Compression Formats
Compression Format   Tool           Algorithm   Filename Extension   Multiple files?   Able to be Split?
DEFLATE (zlib)       No CLI tools   DEFLATE     .deflate             No                No
gzip                 gzip           DEFLATE+    .gz                  No                No
bzip2                bzip2          bzip2       .bz2                 No                Yes
LZO                  lzop           LZO         .lzo                 No                No
Source: Hadoop: The Definitive Guide
Slide 7
Hadoop Compression Formats
Compression Format   Hadoop CompressionCodec
DEFLATE (zlib)       org.apache.hadoop.io.compress.DefaultCodec
gzip                 org.apache.hadoop.io.compress.GzipCodec
bzip2                org.apache.hadoop.io.compress.BZip2Codec
LZO                  com.hadoop.compression.lzo.LzopCodec
Source: Hadoop: The Definitive Guide
Slide 8
Project Assignment I
Use the nwcdatabucket as the bucket for input
Use the tmp folder in nwcdatabucket
Input is nwcdatabucket/tmp
Write Python code (in more than one .py file)
Find the twenty most frequently occurring 5-grams for a 10-year period
You may hard-code the 10-year period, e.g. 1950 to 1959
You need not worry about error checking the range
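One possible shape for the map step is sketched below, assuming the input lines follow the Version 1 tab-separated format and the job reads lines from standard input in the usual Hadoop streaming fashion. The names map_line and run_mapper are hypothetical, the sample lines are made up, and this is an illustration rather than the expected solution.

```python
import sys

# Hard-coded 10-year period, as the assignment permits.
FIRST_YEAR = 1950
LAST_YEAR = 1959


def map_line(line):
    """Return (ngram, match_count) if the line's year is in range, else None."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 5:
        return None  # skip lines that do not match the Version 1 layout
    ngram, year, match_count = fields[0], int(fields[1]), int(fields[2])
    if FIRST_YEAR <= year <= LAST_YEAR:
        return ngram, match_count
    return None


def run_mapper(lines, out=sys.stdout):
    """Write 'ngram<TAB>match_count' for every in-range input line."""
    for line in lines:
        pair = map_line(line)
        if pair is not None:
            out.write(f"{pair[0]}\t{pair[1]}\n")


# Made-up sample input; a streaming job would call run_mapper(sys.stdin).
sample_lines = [
    "a b c d e\t1949\t5\t5\t4\n",
    "a b c d e\t1950\t10\t9\t8\n",
    "f g h i j\t1959\t7\t7\t6\n",
]
run_mapper(sample_lines)
```

Emitting the n-gram as the key lets the framework group all yearly counts for the same 5-gram before the reduce step.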
Slide 9
Project Assignment II
Setting reducers
    Use the extra arguments at the bottom of the first page
    The following creates 1 reducer:
        -D mapred.reduce.tasks=1
Upload your results as a text file
Upload your Python code modules
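With a single reducer, every mapper output line reaches the same process, so that process can compute one global top twenty. The sketch below assumes the reducer receives 'ngram<TAB>count' lines. The name top_ngrams is hypothetical and the sample lines are made up.

```python
import heapq
from collections import Counter

TOP_N = 20


def top_ngrams(lines, n=TOP_N):
    """Sum counts per n-gram from 'ngram<TAB>count' lines; return the n largest."""
    totals = Counter()
    for line in lines:
        # Split on the last tab so spaces inside the n-gram are preserved.
        ngram, _, count = line.rstrip("\n").rpartition("\t")
        if ngram:
            totals[ngram] += int(count)
    # Largest totals first; ties are broken alphabetically for stable output.
    return heapq.nsmallest(n, totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up sample input; a streaming job would pass sys.stdin instead.
sample = ["a b c d e\t10\n", "f g h i j\t7\n", "a b c d e\t3\n"]
for ngram, total in top_ngrams(sample):
    print(f"{ngram}\t{total}")
```

This version keeps every distinct n-gram in memory. Because streaming input arrives sorted by key, a reducer could instead sum consecutive lines with the same key and keep only a twenty-item heap.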
Slide 10
The end has come.
End of the Part 3 PowerPoint
