14 December 2010
Excel and R: data exchange
R-meetup of Los Angeles
Eric Kostello
Monday, December 13, 2010
On spreadsheets
✤ Power of spreadsheets: you can do “anything”
✤ Problem with spreadsheets: anything can happen
✤ Spreadsheets are ubiquitous
✤ Very handy for certain types of problems
✤ Users like the control they give
✤ This is not a talk about why not to use spreadsheets, but check these out...
✤ http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html
✤ An encyclopedia of the evils of spreadsheets, though it acknowledges their utility when limited in scope
✤ “spreadsheet addiction”: search the web with this phrase to see that problems with spreadsheets are not confined to data analysis
Living with spreadsheets
✤ R users often must exchange data with spreadsheet users
✤ Data is stored in spreadsheets because...
✤ That is the way it was archived/sent/obtained
✤ It is still being created that way and change is difficult/impossible
✤ So, communication is essential
✤ Easier communication may make your day easier and your exchange more reliable
Data exchange between R and Excel

| Method/package | RW | Details | Pros | Cons | Cross platform |
|---|---|---|---|---|---|
| Avoid | RW | Import/Export CSV | Avoid Excel pitfalls | Manual steps required every time | Yes |
| RODBC + drivers | R | Adaptation of SQL APIs | Can read rows and columns. Some writing ability on Windows. | Complexity & inconsistencies | With driver purchase |
| read.xls (gdata) | R | Automates creation of CSV, then imports | | Data frame to sheet only. Trouble with quotes. | Yes |
| write.xls (dataframe2xls & Python) | W | Automates creation of CSVs, then converts | Some formatting ability | Data frame to sheet only. (Coerces to dataframe.) | Yes |
| WriteXLS (WriteXLS & Perl) | W | Automates creation of CSVs, then converts | Some formatting ability | Limited flexibility. Some oddities in function call. | Yes |
| RDCOMClient | RW | Via Windows APIs | Cell level control | Not fully vectorized? | No |
| (xlsx, rJava & xlsxJars) | RW | Using Java library from Apache | Data frames and smaller. Fine formatting control. xlsx file format. | xlsx format only. Low level calls not all fully vectorized. | Yes |
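The "Avoid" row amounts to exchanging CSV files by hand. A minimal sketch in base R, assuming a hypothetical file name and a temporary directory (neither is from the talk); the data frame mirrors the exampleData used in the RDCOMClient example:

```r
# Data to hand to a spreadsheet user (same shape as exampleData in the talk)
exampleData <- data.frame(X = 10:19, Y = 566:557)

# Hypothetical file location; tempdir() keeps the sketch self-contained
csvFile <- file.path(tempdir(), "exampleData.csv")

# Export: row.names = FALSE avoids an unnamed first column in the spreadsheet
write.csv(exampleData, file = csvFile, row.names = FALSE)

# Import: read the CSV back, as one would after the spreadsheet user
# has saved the sheet as CSV again
roundTrip <- read.csv(csvFile, stringsAsFactors = FALSE)
```

The manual step the table warns about is the "Save as CSV" on the spreadsheet side, which has to be repeated for every exchange.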
RDCOMClient example
library ( "RDCOMClient")
exampleTemplateFilename <- "Example_Template.xls"
newExcelReportInstance <- paste ( "reportsDirectory\\Report_for_", format(Sys.Date(), "%d_%b_%Y"), ".xls", sep = '')
# Copy the template to a new, dated report file using the Windows copy command
copyCommand <- paste ( "copy", exampleTemplateFilename, newExcelReportInstance )
shell ( copyCommand, shell = 'cmd %WINDIR%')
print ( "Ignore the error message about UNC paths if it occurs; it does not matter.")
exampleData <- data.frame ( X = 10:19, Y = 566:557 )
.COMInit() # Start server
exl <- COMCreate("Excel.Application") # Hook to Excel
books <- exl[["workbooks"]] # Talk to workbooks
exampleBook <- books$open(newExcelReportInstance) # Open the copy made above
exampleSheets <- exampleBook[["sheets"]]
exampleSheet <- exampleSheets$Item(as.integer(1))
# But, I cannot figure out how to get the "Range" to be larger than 1x1, so iterate through rows
headerRowPadding <- 1 # Allow for this many header rows
for ( ithRow in seq_len ( nrow ( exampleData ) ) ) {
  # Create a reference to worksheet Column A, row ithRow + headerRowPadding
  cellReferenceA <- exampleSheet$Range( paste ( "A", ithRow + headerRowPadding, sep = '') )
  cellReferenceA[["Value"]] <- exampleData[ ithRow, "X" ]
  # Same for Column B
  cellReferenceB <- exampleSheet$Range( paste ( "B", ithRow + headerRowPadding, sep = '') )
  cellReferenceB[["Value"]] <- exampleData[ ithRow, "Y" ]
}
exampleBook$save()
exampleBook$close()
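The loop above builds one cell address per row with paste. As a sketch in base R (no COM calls, so it runs on any platform), the same addresses can be computed in one vectorized step, along with the address of the whole block the data frame would occupy. The helper name blockAddress is hypothetical, and the sketch assumes at most 26 columns. RDCOMClient also provides asCOMArray() for converting a matrix to a COM array, which may allow assigning to such a range in one call; that is not tested here.

```r
exampleData <- data.frame(X = 10:19, Y = 566:557)
headerRowPadding <- 1  # Allow for this many header rows

# All column A addresses at once, instead of one paste() per loop iteration
cellAddressesA <- paste("A", seq_len(nrow(exampleData)) + headerRowPadding, sep = "")

# Hypothetical helper: A1-style address of the block a data frame would occupy
# (assumes at most 26 columns, so single-letter column names suffice)
blockAddress <- function(df, headerRows = 0) {
  stopifnot(ncol(df) >= 1, ncol(df) <= 26, nrow(df) >= 1)
  firstRow <- headerRows + 1
  lastRow <- headerRows + nrow(df)
  paste("A", firstRow, ":", LETTERS[ncol(df)], lastRow, sep = "")
}

blockAddress(exampleData, headerRowPadding)  # "A2:B11"
```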
xlsx package overview
✤ Philosophy: Use Excel interface capabilities created in a more widely used codebase: Apache POI, the Java API for Microsoft documents.
✤ Many capabilities are obtained “for free.”
✤ Full-featured, cross-platform solution
✤ This is a suitable candidate for one-stop shopping in R-to-Excel communication
✤ but requiring it may be a problem for some installations (rJava dependency)
xlsx package capabilities
✤ Easy data frame import/export: read.xlsx and write.xlsx
✤ write.xlsx ( exampleData, file = "exampleData Workbook.xlsx")
✤ read.xlsx ( file = ..., sheetIndex = ... )
✤ One sheet at a time. Can keep formulas, provide colClasses.
✤ Formatting control (using Excel native capabilities, such as borderColor)
✤ Read/Write comments
✤ Merging regions, freezing panes, set print area, set zoom
✤ Can insert images (dib, emf, jpeg, pict, png, wmf)
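The data frame import/export listed above can be sketched as a round trip. This assumes the xlsx package and a working Java installation are available, so the calls are guarded and skipped otherwise; the file location is hypothetical. Note that read.xlsx identifies the sheet through its sheetIndex (or sheetName) argument.

```r
exampleData <- data.frame(X = 10:19, Y = 566:557)

# Hypothetical file location; tempdir() keeps the sketch self-contained
xlsxFile <- file.path(tempdir(), "exampleData Workbook.xlsx")

roundTrip <- NULL
if (requireNamespace("xlsx", quietly = TRUE)) {
  # Write one data frame to one sheet; row.names = FALSE omits the row-name column
  xlsx::write.xlsx(exampleData, file = xlsxFile, sheetName = "Sheet1", row.names = FALSE)
  # Read the first sheet back as a data frame
  roundTrip <- xlsx::read.xlsx(xlsxFile, sheetIndex = 1)
}
```

If the package cannot be loaded, roundTrip stays NULL, which is one way to see the rJava dependency issue in practice.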