@PaulBradshaw Leanpub.com/u/paulbradshaw Birmingham City University, City University London Online Journalism Blog, HelpMeInvestigate Saturday, 10 May 14

Finding stories in spreadsheets


DESCRIPTION

Presentation at Data Harvest 2014


Page 1: Finding stories in spreadsheets

@PaulBradshaw
Leanpub.com/u/paulbradshaw
Birmingham City University, City University London
Online Journalism Blog, HelpMeInvestigate

Page 2: Finding stories in spreadsheets

Show of hands. Who has...
- Calculated a proportion
- Used a function like SUM
- Used pivot tables
- Used a function like VLOOKUP

Page 3: Finding stories in spreadsheets

PART ONE:

BASICS.

Page 4: Finding stories in spreadsheets


Page 6: Finding stories in spreadsheets

Speed: keyboard shortcuts for checking the data

- Make a copy, work on that
- Use CTRL+arrow keys to skip to edges of data
- Clean first few rows to create single heading row
- Remove grand total row
- Remove empty rows (Open Refine)

Page 7: Finding stories in spreadsheets

      | Column A | Column B   | Column C
Row 1 | Numbers  | Strings    | Calculations
Row 2 | 10       | John Smith | =10+20+30
Row 3 | 20       | Kate Brown | =A2+A3+A4
Row 4 | 30       | Mike Moore | =SUM(A2:A4)
Row 5 | N/A      | Kim Smith  | =COUNT(A:A)
Row 6 | 50       |            | =COUNTA(B:B)

Page 8: Finding stories in spreadsheets

Two types of datasets: aggregate and granular

- Granular data has a row for every payment, person, crime etc.
- Aggregate data has rows for total crimes, payments, etc.
- Granular is always better: you can calculate your own aggregates

Page 9: Finding stories in spreadsheets

Granular: pivot tables

Aggregate data:
- put the focus in Rows
- numbers (money, crimes) in Values
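For readers who want to see what the Rows/Values arrangement does to granular data, here is a minimal Python sketch: group by the field placed in Rows and sum the field placed in Values. The payment records, field names and the `pivot_sum` helper are invented for illustration and do not come from the presentation.

```python
from collections import defaultdict

# Hypothetical granular data: one row per payment (illustrative values only).
payments = [
    {"supplier": "Acme Ltd", "amount": 1200.0},
    {"supplier": "Bolt plc", "amount": 300.0},
    {"supplier": "Acme Ltd", "amount": 800.0},
]


def pivot_sum(rows, row_field, value_field):
    """Group rows by row_field and sum value_field, like a basic pivot table."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[row_field]] += row[value_field]
    return dict(totals)


# "supplier" plays the part of Rows, "amount" the part of Values.
totals = pivot_sum(payments, "supplier", "amount")
print(totals)
```

The result is an aggregate table built from the granular rows, which is why granular data is the more flexible starting point.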

Page 10: Finding stories in spreadsheets


Page 11: Finding stories in spreadsheets

Using functions - and arguments

- = indicates this is a formula
- SUM is the function to be applied
- ( contains the ingredients for that formula
- D2:D300 this is a range (array) of cells
- , separates each ingredient
- ) ends the list of ingredients

Page 12: Finding stories in spreadsheets

More speed: use column ranges

=SUM(D:D) ignores any text/empty cells
=MAX(D:D)
=MIN(D:D)
=AVERAGE(D:D)

Page 13: Finding stories in spreadsheets

Sense-checking: misleading averages

=AVERAGE(D:D)
=MEDIAN(D:D)
=MODE(D:D) - for ‘most common’: useful for ordinal ratings which shouldn’t be averaged.
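A small worked example may help show why the three averages need sense-checking against each other. This is only a sketch using Python's standard `statistics` module; the salary figures are invented for illustration.

```python
import statistics

# Hypothetical salaries: one very large value distorts the mean.
salaries = [20000, 22000, 22000, 25000, 250000]

mean_salary = statistics.mean(salaries)      # like =AVERAGE(D:D)
median_salary = statistics.median(salaries)  # like =MEDIAN(D:D)
mode_salary = statistics.mode(salaries)      # like =MODE(D:D)

print(mean_salary, median_salary, mode_salary)
```

Here the mean is roughly three times the median, so reporting the mean as the "typical" salary would mislead.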

Page 14: Finding stories in spreadsheets

Combining functions to quickly make numbers meaningful

=MAX(D:D)/SUM(D:D) - how much of the total is accounted for by the biggest value?
=SUM(D35:D64)/SUM(D:D) - what proportion from one entity?
=SUM(D:D)/365 - how much per day? (for annual data)
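The same three calculations can be checked by hand or in a few lines of code. The sketch below uses invented payment figures and assumes the second and third payments belong to one entity, standing in for the D35:D64 sub-range.

```python
# Hypothetical annual payments (illustrative values only).
payments = [50000.0, 30000.0, 15000.0, 5000.0]

total = sum(payments)

# Like =MAX(D:D)/SUM(D:D): share of the total taken by the biggest value.
largest_share = max(payments) / total

# Like =SUM(D35:D64)/SUM(D:D): share of the total from one entity's rows.
entity_share = sum(payments[1:3]) / total

# Like =SUM(D:D)/365: spending per day, for annual data.
per_day = total / 365

print(f"{largest_share:.0%} {entity_share:.0%} {per_day:.2f}")
```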

Page 15: Finding stories in spreadsheets

Stories you can report quickly

- Org spending £X per day
- Company receives X% of spending
- Org spent £X on Y

Page 16: Finding stories in spreadsheets


Page 17: Finding stories in spreadsheets

Data health warning!

Remember the context: e.g. spending over £500, inflation

Page 18: Finding stories in spreadsheets

PART TWO:

CHECKING

Page 19: Finding stories in spreadsheets


Page 20: Finding stories in spreadsheets

COUNT functions: Checking data coverage

=COUNT(D:D)
=COUNTA(D:D)
=COUNTBLANK(D2:D15000) - have to use specific range or blank cells underneath table are counted
=COUNTIF(D:D, "Other")
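To show how these counts differ on the same column, here is a rough Python sketch. The column values are invented, `None` stands in for an empty cell, and the equality test is case-sensitive, whereas COUNTIF ignores case.

```python
# Hypothetical column D; None stands for an empty cell.
column = [120, "Other", None, 45.5, "n/a", None, 300]


def is_number(value):
    # bool is excluded because Python treats True/False as integers.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


count = sum(1 for v in column if is_number(v))          # like =COUNT: numbers only
counta = sum(1 for v in column if v is not None)        # like =COUNTA: non-empty cells
countblank = sum(1 for v in column if v is None)        # like =COUNTBLANK: empty cells
countif_other = sum(1 for v in column if v == "Other")  # like =COUNTIF(..., "Other")

print(count, counta, countblank, countif_other)
```

The gap between the COUNTA and COUNT results (5 versus 3 here) is the number of cells holding text where a number might be expected, which is a quick coverage check.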

Page 21: Finding stories in spreadsheets

IF functions: Drill down further

=COUNTIF(D:D, "Individual")
=COUNTIFS(D:D, "Individual", B:B, "<10000")
=SUMIF(D:D, "<10000")
=IF(This, then that, otherwise this)
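As a sketch of what these conditional functions compute, the Python below applies one condition, then two conditions together, then a conditional sum. The donation rows are invented for illustration.

```python
# Hypothetical donations: (donor type, amount).
rows = [
    ("Individual", 5000),
    ("Company", 20000),
    ("Individual", 15000),
    ("Individual", 2500),
]

# Like COUNTIF with "Individual": one condition.
individuals = sum(1 for kind, _ in rows if kind == "Individual")

# Like COUNTIFS with "Individual" and "<10000": both conditions must hold.
small_individuals = sum(
    1 for kind, amount in rows if kind == "Individual" and amount < 10000
)

# Like SUMIF with "<10000": add up only the amounts that pass the test.
small_total = sum(amount for _, amount in rows if amount < 10000)

print(individuals, small_individuals, small_total)
```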

Page 22: Finding stories in spreadsheets

COUNTIF: Use wildcards - and spaces

=COUNTIF(D:D, "*hire*")
=COUNTIF(D:D, "Scottish*")
=COUNTIF(D:D, "* hire*")
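The effect of the extra space in the pattern is easiest to see on sample data. This sketch uses Python's `fnmatch` module, whose `*` wildcard behaves similarly; the descriptions and the `countif` helper are invented, and note that `fnmatch` also treats square brackets as special, which spreadsheets do not.

```python
from fnmatch import fnmatchcase

# Hypothetical free-text descriptions (illustrative values only).
descriptions = [
    "Car hire",
    "Vehicle hire costs",
    "Shire horse show",
    "Scottish Water",
    "Hired help",
]


def countif(values, pattern):
    """Count values matching a wildcard pattern, ignoring case like COUNTIF."""
    return sum(1 for v in values if fnmatchcase(v.lower(), pattern.lower()))


any_hire = countif(descriptions, "*hire*")           # "hire" anywhere, including "Shire"
starts_scottish = countif(descriptions, "Scottish*")  # text starting with "Scottish"
word_hire = countif(descriptions, "* hire*")         # "hire" only after a space

print(any_hire, starts_scottish, word_hire)
```

The leading space cuts out false matches such as "Shire", but it also misses text that begins with the word, so it is worth comparing both counts.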

Page 23: Finding stories in spreadsheets


Page 24: Finding stories in spreadsheets

COUNTIF: Test free text data

=COUNTIF(D2, "*adidas*")
=COUNTIF(D3, "*adidas*")
=COUNTIF(D4, "*adidas*")
...
Then sort to bring the 1s to the top

Page 25: Finding stories in spreadsheets

THE BLACK CROSS

DOUBLE CLICK

Page 26: Finding stories in spreadsheets


Page 27: Finding stories in spreadsheets

PART THREE:

CLEANING

Page 28: Finding stories in spreadsheets


Page 29: Finding stories in spreadsheets

Cleaning text: TRIM, SEARCH, SUBSTITUTE

=TRIM(D2)
=SUBSTITUTE(D2," ","") (Target cell, what you want to substitute, what you want to replace it with)
=SEARCH("Wales",A2) Gives the position of the first match
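For comparison, here is a rough Python sketch of what each of the three functions does. The helper names and sample strings are invented; `search` returns `None` when nothing is found, where the spreadsheet function would return an error, and it does not support wildcards.

```python
def trim(text):
    """Like TRIM: drop leading/trailing spaces and collapse repeated spaces."""
    return " ".join(part for part in text.split(" ") if part)


def substitute(text, old, new):
    """Like SUBSTITUTE: replace every occurrence of old with new."""
    return text.replace(old, new)


def search(needle, text):
    """Like SEARCH: 1-based position of the first match, ignoring case."""
    index = text.lower().find(needle.lower())
    return index + 1 if index >= 0 else None


print(trim("  New   South Wales "))
print(substitute("B1 2AB", " ", ""))
print(search("Wales", "New South Wales"))
```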

Page 30: Finding stories in spreadsheets

Cleaning text: UPPER, LOWER, PROPER

mr SMITH
=UPPER(D2) = MR SMITH
=LOWER(D2) = mr smith
=PROPER(D2) = Mr Smith

Page 31: Finding stories in spreadsheets

Cleaning text: LEFT, RIGHT, MID

=LEFT(E2,3) = first 3 characters in E2
=RIGHT(E2,3) = last 3 characters in E2
=MID(E2,10,3) = the 3 characters in E2 starting from position 10

Page 32: Finding stories in spreadsheets

Combine with LEN

=LEN(E2) = how many characters in E2
=LEFT(E2,LEN(E2)-3) = Length of E2 - 3. Grab that many characters. i.e.
- If E2 is 5 characters, it will grab the first 2 (5-3=2)
- If E2 is 7 characters it will grab the first 4 (7-3=4)
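The position arithmetic can be tried out with string slicing. This is a sketch only: the helper names and the sample code `B15 3TN` are invented, positions are 1-based as in the spreadsheet, and a count of zero or less simply returns an empty string here.

```python
def left(text, n):
    """Like LEFT: the first n characters."""
    return text[:n] if n > 0 else ""


def right(text, n):
    """Like RIGHT: the last n characters."""
    return text[-n:] if n > 0 else ""


def mid(text, start, n):
    """Like MID: n characters starting at 1-based position start."""
    return text[start - 1:start - 1 + n]


code = "B15 3TN"  # hypothetical 7-character value in E2

print(left(code, 3))               # first 3 characters
print(right(code, 3))              # last 3 characters
print(mid(code, 5, 3))             # 3 characters from position 5
print(left(code, len(code) - 3))   # everything except the last 3 (7-3=4)
```

Note that the last result keeps the trailing space, which is one reason to wrap such formulas in TRIM.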

Page 33: Finding stories in spreadsheets

Combine with SEARCH

=SEARCH(" ",E2) = which position is the first space
=LEFT(E2,SEARCH(" ",E2)) = Grab all characters up to (and including) that space

Page 34: Finding stories in spreadsheets

Combine with SEARCH

=SEARCH(" ",E2) = which position is the first space
=LEFT(E2,SEARCH(" ",E2)) = Grab all characters up to (and including) that space
=TRIM(LEFT(E2,SEARCH(" ",E2)))
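The three-step build-up (find the space, grab up to it, trim the result) can be followed in a short Python sketch. The `first_word` helper and sample values are invented; it returns `None` when there is no space, where SEARCH would return an error.

```python
def first_word(text):
    """Like TRIM(LEFT(E2, SEARCH(" ", E2))): the text before the first space."""
    position = text.find(" ") + 1  # 1-based like SEARCH; 0 means no space found
    if position == 0:
        return None  # the spreadsheet formula would show an error here
    up_to_space = text[:position]  # includes the space itself, like LEFT
    return up_to_space.strip(" ")  # TRIM removes that trailing space


print(first_word("Birmingham City Council"))
print(first_word("Oxfam"))
```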

Page 35: Finding stories in spreadsheets

Finding errors: ISERROR, ISNA, ISBLANK

=ISERROR(D2) = TRUE or FALSE
See also: ISNUMBER, ISTEXT, ISNONTEXT, ISLOGICAL, ISEVEN, ISODD
ISERR (all errors but #N/A)

Page 36: Finding stories in spreadsheets

PART FOUR:

ADDING

Page 37: Finding stories in spreadsheets


Page 38: Finding stories in spreadsheets

Save time typing search URLs


Page 41: Finding stories in spreadsheets

Merging data: VLOOKUP

=VLOOKUP(What you’re looking for, what range contains a match & what you want back, which column you want back, nearest match?)
=VLOOKUP(D2,Sheet1!D:E,2,FALSE)
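An exact-match VLOOKUP behaves much like a dictionary lookup, which the following sketch illustrates. The supplier codes, names and the `vlookup_exact` helper are invented; `None` stands in for the #N/A a spreadsheet shows when there is no match.

```python
# Hypothetical two-column lookup range standing in for Sheet1!D:E.
sheet1 = [
    ("S001", "Acme Ltd"),
    ("S002", "Bolt plc"),
    ("S001", "Duplicate entry"),
]

# Build the lookup table, keeping the first match for each key as VLOOKUP does.
table = {}
for key, value in sheet1:
    table.setdefault(key, value)


def vlookup_exact(key, lookup):
    """Exact-match lookup; returns None where a spreadsheet would show #N/A."""
    return lookup.get(key)


codes = ["S002", "S001", "S999"]  # hypothetical values in column D
matched = [vlookup_exact(code, table) for code in codes]
print(matched)
```

Unmatched keys are worth listing separately, since they often point to inconsistent spellings or codes between the two datasets.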

Page 42: Finding stories in spreadsheets

Convert dates to years: TEXT functions

=TEXT(D2, "dddd")
=YEAR(D2)
=MONTH(D2) = 1
=TEXT(D2, "mmmm") = ‘January’
=TEXT(D2, "mmm") = ‘Jan’
If not formatted as date, use LEFT
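The same date parts can be pulled out with Python's `datetime` module, as a rough cross-check. The date below is invented, the day and month names assume the default English locale, and the final line assumes a text date in year-first form.

```python
from datetime import date

d = date(2014, 1, 15)  # hypothetical value in D2

weekday_name = d.strftime("%A")  # like =TEXT(D2, "dddd")
year = d.year                    # like =YEAR(D2)
month = d.month                  # like =MONTH(D2)
month_name = d.strftime("%B")    # like =TEXT(D2, "mmmm")
month_short = d.strftime("%b")   # like =TEXT(D2, "mmm")

# If the value is text rather than a real date, slice it instead, like LEFT.
text_date = "2014-01-15"
year_from_text = text_date[:4]

print(weekday_name, year, month, month_name, month_short, year_from_text)
```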

Page 43: Finding stories in spreadsheets

Convert amounts to categories: nested IF functions

=IF(B2>2500,"High","Low")

Page 44: Finding stories in spreadsheets

Convert amounts to categories: nested IF functions

=IF(B2>2500,"High","Low")
=IF(B2>2500,"High",IF(B2<1000,"Low","Mid"))
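Written out as ordinary branching logic, the nested IF reads as follows. This is a sketch with an invented function name; the thresholds are the ones in the formula, and values of exactly 1000 or 2500 fall into "Mid" because the tests are strict.

```python
def categorise(amount):
    """Mirror of IF(B2>2500, "High", IF(B2<1000, "Low", "Mid"))."""
    if amount > 2500:
        return "High"
    if amount < 1000:
        return "Low"
    return "Mid"


for amount in (3000, 500, 1000, 2500):
    print(amount, categorise(amount))
```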

Page 45: Finding stories in spreadsheets

IF can’t use wildcards. Combine with COUNTIF

=IF(COUNTIF(B2, "*dropped*"), "Dropped", "Not dropped")

Page 46: Finding stories in spreadsheets

1. Save time.
2. Check your data.
3. Clean your data.
4. Add to your data.
5. Feel clever. But don’t be too clever.

Page 47: Finding stories in spreadsheets

Thank you
Leanpub.com/u/spreadsheetstories
@paulbradshaw