14 December 2010 Excel and R: data exchange R-meetup of Los Angeles Eric Kostello Monday, December 13, 2010

Los Angeles R users group - Dec 14 2010 - Part 4


Page 1: Los Angeles R users group - Dec 14 2010 - Part 4

14 December 2010

Excel and R: data exchange

R-meetup of Los Angeles

Eric Kostello

Monday, December 13, 2010

Page 2: Los Angeles R users group - Dec 14 2010 - Part 4

On spreadsheets

✤ Power of spreadsheets: you can do “anything”

✤ Problem with spreadsheets: anything can happen

✤ Spreadsheets are ubiquitous

✤ Very handy for certain types of problems

✤ Users like the control they give

✤ This is not a talk about why not to use spreadsheets, but check these out...

✤ http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html

✤ An encyclopedia of the evils of spreadsheets, but it acknowledges their utility when limited in scope

✤ “spreadsheet addiction”: search the web with this phrase to see that problems with spreadsheets are not confined to data analysis


Page 3: Los Angeles R users group - Dec 14 2010 - Part 4

Living with spreadsheets

✤ R users often must exchange data with spreadsheet users

✤ Data is stored in spreadsheets because...

✤ That is the way it was archived/sent/obtained

✤ It is still being created that way and change is difficult/impossible

✤ So, communication is essential

✤ Easier communication may make your day easier and your exchange more reliable


Page 4: Los Angeles R users group - Dec 14 2010 - Part 4

Data exchange between R and Excel

Method/package | RW | Details | Pros | Cons | Cross platform
Avoid | RW | Import/Export CSV | Avoid Excel pitfalls | Manual steps required every time | Yes
RODBC + drivers | R | Adaptation of SQL APIs | Can read rows and columns. Some writing ability on Windows. | Complexity & inconsistencies | With driver purchase
read.xls (gdata) | R | Automates creation of CSV, then imports | - | Data frame to sheet only. Trouble with quotes. | Yes
write.xls (dataframes2xls & Python) | W | Automates creation of CSVs, then converts | Some formatting ability | Data frame to sheet only. (Coerces to dataframe.) | Yes
WriteXLS (WriteXLS & Perl) | W | Automates creation of CSVs, then converts | Some formatting ability | Limited flexibility. Some oddities in function call. | Yes
RDCOMClient | RW | Via Windows APIs | Cell level control | Not fully vectorized? | No
(xlsx, rJava & xlsxJars) | RW | Using Java library from Apache | Data frames and smaller. Fine formatting control. xlsx file format. | xlsx format only. Low level calls not all fully vectorized. | Yes
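The "Avoid" row of the table (exchange through CSV rather than a workbook) can be sketched in base R as below. This is a minimal illustration, not part of the original slides: the temporary file path is an assumption standing in for wherever the CSV would really be exchanged, and the data frame is the one used in the RDCOMClient example.

```r
# Sketch of the "Avoid" approach: exchange data through a CSV file
# instead of reading or writing an Excel workbook directly.
exampleData <- data.frame(X = 10:19, Y = 566:557)

# Hypothetical location; a temporary file keeps the sketch self-contained.
csvPath <- tempfile(fileext = ".csv")

# row.names = FALSE avoids an extra unnamed first column when Excel opens the file.
write.csv(exampleData, file = csvPath, row.names = FALSE)

# Read the file back, as one would after a colleague saves a sheet as CSV.
roundTrip <- read.csv(csvPath, stringsAsFactors = FALSE)

# Compare the round-tripped data with the original.
print(all.equal(roundTrip, exampleData))
```

The manual step listed under Cons is the "Save As CSV" action on the Excel side, which this sketch does not automate.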


Page 5: Los Angeles R users group - Dec 14 2010 - Part 4

RDCOMClient example

library ( "RDCOMClient")

exampleTemplateFilename <- "Example_Template.xls"

newExcelReportInstance <- paste ( "reportsDirectory\\Report_for_", format(Sys.Date(), "%d_%b_%Y"), ".xls", sep = '')

copyCommand <- paste ( "copy", exampleTemplateFilename, newExcelReportInstance )

shell ( copyCommand, shell = 'cmd %WINDIR%')

print ( "Ignore the error message about UNC paths if it occurs; it does not matter.")

exampleData <- data.frame ( X = 10:19, Y = 566:557 )

.COMInit() # Start server

exl <- COMCreate("Excel.Application") # Hook to Excel

books <- exl[["workbooks"]] # Talk to workbooks

exampleBook <- books$open(newExcelReportInstance) # Open the report copied from the template

exampleSheets <- exampleBook[["sheets"]]

exampleSheet <- exampleSheets$Item(as.integer(1))

# But, I cannot figure out how to get the "Range" to be larger than 1x1, so iterate through rows

headerRowPadding <- 1 # Allow for this many header rows

for ( ithRow in 1:nrow ( exampleData ) ) {

cellReferenceA <- exampleSheet$Range( paste ( "A", ithRow + headerRowPadding, sep = '') ) # Create a reference to worksheet Column A, row ithRow + headerRowPadding

cellReferenceA[["Value"]] <- exampleData[ ithRow, "X" ]

cellReferenceB <- exampleSheet$Range( paste ( "B", ithRow + headerRowPadding, sep = '') ) # Same row, Column B

cellReferenceB[["Value"]] <- exampleData[ ithRow, "Y" ]

}

exampleBook$save()

exampleBook$close()
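The steps of this example that do not touch Excel can be sketched in base R alone, which also makes them runnable off Windows. The sketch below is an illustration under stated assumptions: an empty temporary file stands in for Example_Template.xls, tempdir() stands in for reportsDirectory, and cellAddresses is a hypothetical helper that is not part of RDCOMClient. It does not address the 1x1 Range limitation noted in the example.

```r
# Base R only: no Excel or RDCOMClient is needed for these steps.
# Stand-in for Example_Template.xls so the sketch runs anywhere (assumption).
templatePath <- tempfile(fileext = ".xls")
file.create(templatePath)

# Dated report name, as in the example; tempdir() replaces reportsDirectory here.
reportPath <- file.path(
  tempdir(),
  paste("Report_for_", format(Sys.Date(), "%d_%b_%Y"), ".xls", sep = "")
)

# file.copy() is a portable alternative to shelling out to the Windows copy command.
copied <- file.copy(templatePath, reportPath, overwrite = TRUE)

exampleData <- data.frame(X = 10:19, Y = 566:557)
headerRowPadding <- 1  # Allow for this many header rows

# Hypothetical helper: one worksheet address per data row of a column.
cellAddresses <- function(columnLetter, nRows, padding) {
  paste(columnLetter, seq_len(nRows) + padding, sep = "")
}

addressesA <- cellAddresses("A", nrow(exampleData), headerRowPadding)
addressesB <- cellAddresses("B", nrow(exampleData), headerRowPadding)

print(copied)
print(addressesA)
```

Precomputing the addresses keeps the row arithmetic in one place, so the COM loop only has to look up addressesA[ithRow] and addressesB[ithRow].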


Page 6: Los Angeles R users group - Dec 14 2010 - Part 4

xlsx package overview

✤ Philosophy: Use Excel interface capabilities created in a more widely used codebase: Apache POI, the Apache Java API for Microsoft documents.

✤ Many capabilities are obtained “for free.”

✤ Full-featured, cross-platform solution

✤ This is a suitable candidate for one stop shopping in R to Excel communications

✤ but requiring it may be a problem for some installations (rJava dependency)


Page 7: Los Angeles R users group - Dec 14 2010 - Part 4

xlsx package capabilities

✤ Easy data frame import/export: read.xlsx and write.xlsx

✤ write.xlsx ( exampleData, file = "exampleData Workbook.xlsx")

✤ read.xlsx ( file = ..., sheetIndex = ... )

✤ One sheet at a time. Can keep formulas, provide colClasses.

✤ Formatting control (using Excel native capabilities, such as borderColor)

✤ Read/Write comments

✤ Merging regions, freezing panes, set print area, set zoom

✤ Can insert images (dib, emf, jpeg, pict, png, wmf)
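The data frame import/export bullets above can be sketched as a round trip. This is a hedged illustration, not the speaker's code: it assumes the argument names of the current CRAN release of xlsx (row.names, sheetIndex), which may differ from the 2010 version, and it writes to a temporary file. Because the package depends on rJava and a Java runtime, the sketch skips the round trip when the package cannot be loaded.

```r
# Round trip through an .xlsx file with the xlsx package, if it is installed.
exampleData <- data.frame(X = 10:19, Y = 566:557)

# Hypothetical output location; a temporary file keeps the sketch self-contained.
xlsxPath <- tempfile(fileext = ".xlsx")
roundTrip <- NULL

if (requireNamespace("xlsx", quietly = TRUE)) {
  # Write the data frame to the first sheet, without a row-name column.
  xlsx::write.xlsx(exampleData, file = xlsxPath, row.names = FALSE)

  # Read one sheet back; sheets are read one at a time.
  roundTrip <- xlsx::read.xlsx(xlsxPath, sheetIndex = 1)
  print(dim(roundTrip))
} else {
  message("Package 'xlsx' (or the Java runtime rJava needs) is unavailable; skipping.")
}
```

Guarding on requireNamespace reflects the installation caveat from the overview slide: the rJava dependency is the part most likely to be missing.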
