Spreadsheet structure inspection using low level access and visualisation
Daniel Ballinger, Robert Biddle and James Noble
School of Mathematical and Computing Sciences
Email: {db, robert, kjx}@mcs.vuw.ac.nz
http://www.mcs.vuw.ac.nz/~db/honours.html

Motivation

Spreadsheets are a common form of end-user programming.

Unfamiliar spreadsheets can contain daunting amounts of information in the layout and inter-cell dependencies.

Methods for studying these structures are usually limited to what the application provides.

It is difficult to get a global sense of the structure of an individual formula that may have dependencies spread out all over the spreadsheet table. Users have to track down individual cell dependencies one by one, tacking back and forth all over the spreadsheet.

- Bonnie Nardi, A Small Matter of Programming (1993)

Our Proposed Solution

Working outside the spreadsheet application gives greater flexibility in addressing these issues.

This flexibility allows visualisations that aid end-user understanding of spreadsheets beyond what the application itself can offer.

We provide a set of visualisations, as interface enhancements, that let the user progress from an abstract overview towards the actual details.

The greater flexibility comes at the cost of direct interaction with the spreadsheet.

We focused on Microsoft Excel due to its large market share.

Spreadsheet elements requiring extraction

Artefacts of interest are derived from low-level structures.

The basic unit of interest is any occupied cell.

Each occupied cell will have a value, and optionally a formula.

Formulas should be in the same infix format shown to the user, not the Reverse Polish Notation (RPN) in which BIFF stores them.

Building our own BIFF reader would be a sizeable project in itself, hence we use third-party software.
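
BIFF stores each formula as a stream of tokens in Reverse Polish Notation, so a reader has to rebuild the infix text the user sees. A minimal Java sketch of that step, assuming the tokens have already been decoded into strings. The class name and the token spellings (including the FN:NAME:ARGC form for function calls) are hypothetical and do not match real BIFF token IDs; unary operators, sheet references and array constants are ignored.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Simplified RPN-to-infix converter. Tokens are assumed to be pre-decoded strings:
//   operands ("A1", "B2:B9", "42"), binary operators (+ - * / ^ &),
//   "()" for a parenthesis the user typed (BIFF keeps these as display-only tokens),
//   and hypothetical function tokens of the form "FN:NAME:ARGC", e.g. "FN:SUM:2".
class RpnToInfix {
    private static final Set<String> BINARY_OPS = Set.of("+", "-", "*", "/", "^", "&");

    static String toInfix(List<String> rpn) {
        Deque<String> stack = new ArrayDeque<>();
        for (String token : rpn) {
            if (BINARY_OPS.contains(token)) {
                String right = stack.pop();   // second operand is on top of the stack
                String left = stack.pop();
                stack.push(left + token + right);
            } else if (token.equals("()")) {
                stack.push("(" + stack.pop() + ")");
            } else if (token.startsWith("FN:")) {
                String[] parts = token.split(":");
                String name = parts[1];
                int argc = Integer.parseInt(parts[2]);
                List<String> args = new ArrayList<>();
                for (int i = 0; i < argc; i++) {
                    args.add(stack.pop());
                }
                Collections.reverse(args);    // arguments were popped last-first
                stack.push(name + "(" + String.join(",", args) + ")");
            } else {
                stack.push(token);            // plain operand
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Malformed RPN token stream");
        }
        return stack.pop();
    }
}
```

For example, the token stream A1 B1 + () C1 * becomes (A1+B1)*C1. Because BIFF records the user's own parentheses as a separate display token (tParen), the sketch only wraps where that token appears instead of inferring parentheses from operator precedence.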

IBM alphaWorks ExcelAccessor

A Java Bean to access and modify the contents of spreadsheets using a Windows DLL.

Requires Windows native code and an installed copy of Excel, limiting portability.

Excellent at extracting all details correctly.

Prone to irregular crashes on larger corpus tasks.

Andy Khan’s JExcelAPI

Pure Java, so it integrates better with the toolkit.

Open source, allowing easier extension and bug fixes.

Some element types cause extraction problems, e.g. array functions, intersections, absolute references becoming relative, and earlier BIFF formats.

Spreadsheet Application Toolkit

Find and store spreadsheets from the Internet.

Extract low-level structures, e.g. cell values and formulas.

Analyse spreadsheet structures, either individually or across a corpus.

Convey the findings through visualisation.

[Figure: toolkit architecture. Components: Gobbler, Fetcher, Extractor, Analyser and Visualisation Tools. Other labels: Google, Web Servers, BIFF Reader, Toolkit Files, Query, URLs, XLS files, Processed Data, Metrics, Algorithms, Image.]

Aspects of spreadsheet structure and use

The spreadsheet paradigm has two main characteristics:
– The spatial relationships between cells
– The logical relationships created by formulas

These characteristics are not always disjoint.

User problems are mapped onto a 2D table that shields users from the low-level details of programming and allows many problems to be expressed more naturally.
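
The logical relationships can be read back out of the formula text. A hedged Java sketch: it extracts A1-style references with a regular expression (expanding ranges such as A1:A2 into individual cells) and then follows them transitively, giving the whole-sheet view that one-by-one dependency tracing lacks. DependencyTracer and its methods are hypothetical names; formulas are assumed to be keyed by plain addresses such as B1, and sheet qualifiers, named ranges, R1C1 notation and references inside string literals are not handled.

```java
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: recovers the logical (formula) relationships between cells.
class DependencyTracer {
    // An A1-style reference with optional $ markers, optionally followed by ":B3" to form a range.
    // The look-arounds stop matches inside longer names such as the function LOG10(.
    private static final Pattern REF = Pattern.compile(
        "(?<![A-Za-z0-9_])(\\$?[A-Z]{1,3}\\$?[0-9]+)(?::(\\$?[A-Z]{1,3}\\$?[0-9]+))?(?![A-Za-z0-9_(])");

    // Column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27.
    static int colIndex(String letters) {
        int n = 0;
        for (char ch : letters.toCharArray()) n = n * 26 + (ch - 'A' + 1);
        return n;
    }

    // 1-based column index back to letters.
    static String colLetters(int n) {
        StringBuilder sb = new StringBuilder();
        for (; n > 0; n = (n - 1) / 26) sb.insert(0, (char) ('A' + (n - 1) % 26));
        return sb.toString();
    }

    // "$B$7" -> {2, 7}: column index and row number, ignoring $ markers.
    static int[] parse(String ref) {
        String plain = ref.replace("$", "");
        int i = 0;
        while (Character.isLetter(plain.charAt(i))) i++;
        return new int[] { colIndex(plain.substring(0, i)), Integer.parseInt(plain.substring(i)) };
    }

    // Cells a formula refers to directly, with ranges expanded and $ markers dropped.
    static Set<String> directPrecedents(String formula) {
        Set<String> cells = new TreeSet<>();
        Matcher m = REF.matcher(formula);
        while (m.find()) {
            int[] a = parse(m.group(1));
            int[] b = m.group(2) == null ? a : parse(m.group(2));
            for (int c = Math.min(a[0], b[0]); c <= Math.max(a[0], b[0]); c++)
                for (int r = Math.min(a[1], b[1]); r <= Math.max(a[1], b[1]); r++)
                    cells.add(colLetters(c) + r);
        }
        return cells;
    }

    // Every cell that `start` depends on, directly or indirectly (breadth-first search).
    // Cells without an entry in `formulas` are treated as constants or empty cells.
    static Set<String> allPrecedents(String start, Map<String, String> formulas) {
        Set<String> seen = new TreeSet<>();
        Deque<String> queue = new ArrayDeque<>(List.of(start));
        while (!queue.isEmpty()) {
            String formula = formulas.get(queue.poll());
            if (formula == null) continue;
            for (String p : directPrecedents(formula)) {
                if (seen.add(p)) queue.add(p);   // visit each precedent once, even with cycles
            }
        }
        return seen;
    }
}
```

With A3 holding =SUM(A1:A2) and B1 holding =A3*$C$1, tracing B1 returns A1, A2, A3 and C1.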

Spreadsheet layout – Real-estate Utilisation 2D

Understanding layout is an important first step in learning about a new spreadsheet.

Actual values and formulas are not shown; cells are represented only by their occupancy.

The visualisation layout mimics that of Excel, with columns along the x-axis at the top and rows running down the y-axis.

Cells with a higher occupancy level are coloured towards the red end of the colour spectrum.
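
The slide does not define a cell's occupancy level. One plausible reading, assumed here, is the fraction of worksheets in a set (for example a corpus) that have that cell occupied. A minimal Java sketch of computing that level and mapping it onto a blue-to-red hue scale; the class name and the 240° to 0° hue range are illustrative choices, not taken from the toolkit.

```java
import java.awt.Color;
import java.util.List;

// Hypothetical helper for a 2D real-estate utilisation map.
class OccupancyMap {
    // Fraction of sheets in which each cell is occupied.
    // Every grid is assumed to be indexed [row][column] and to share the same dimensions.
    static double[][] occupancy(List<boolean[][]> sheets) {
        int rows = sheets.get(0).length;
        int cols = sheets.get(0)[0].length;
        double[][] level = new double[rows][cols];
        for (boolean[][] sheet : sheets)
            for (int r = 0; r < rows; r++)
                for (int c = 0; c < cols; c++)
                    if (sheet[r][c]) level[r][c] += 1.0;
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                level[r][c] /= sheets.size();
        return level;
    }

    // Maps a level in [0, 1] onto hues from blue (240 degrees) to red (0 degrees)
    // and returns a 0xRRGGBB colour, so higher occupancy is drawn redder.
    static int colour(double level) {
        double clamped = Math.max(0.0, Math.min(1.0, level));
        float hue = (float) ((1.0 - clamped) * 240.0 / 360.0);  // HSBtoRGB expects a fraction of a full turn
        return Color.HSBtoRGB(hue, 1.0f, 1.0f) & 0xFFFFFF;     // drop the alpha byte
    }
}
```

The hue mapping here is linear; a perceptually uniform colour scale would be an equally reasonable design choice.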

Spreadsheet layout – Real-estate Utilisation 3D

Occupancy data is projected into 3D to create a surface map.

A discrete-to-continuous data transformation helps smooth the effect of spikes.

The surface is coloured to give a topographical terrain effect.

The full benefit is seen with user interaction.
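
The slides do not say which discrete-to-continuous transformation is used. As one illustrative option, a 3×3 mean (box) filter spreads each cell's value over its neighbours, so an isolated spike becomes a lower hill before the grid is drawn as a surface. A hypothetical Java sketch:

```java
// Hypothetical smoothing step applied to an occupancy grid before 3D rendering.
class SurfaceSmoother {
    // Replaces each height with the mean of its 3x3 neighbourhood.
    // Cells on the edges and corners average over the neighbours that exist.
    static double[][] smooth(double[][] h) {
        int rows = h.length, cols = h[0].length;
        double[][] out = new double[rows][cols];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                double sum = 0.0;
                int n = 0;
                for (int dr = -1; dr <= 1; dr++) {
                    for (int dc = -1; dc <= 1; dc++) {
                        int rr = r + dr, cc = c + dc;
                        if (rr >= 0 && rr < rows && cc >= 0 && cc < cols) {
                            sum += h[rr][cc];
                            n++;
                        }
                    }
                }
                out[r][c] = sum / n;
            }
        }
        return out;
    }
}
```

Applied to a 3×3 grid that is zero except for a centre value of 9, the centre drops to 1.0, each edge cell rises to 1.5 and each corner to 2.25.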

Formula Inspection – Data Flow

Visualises the extracted formula components.

Worksheets can be fully traced in one view.

Single Cell, Range, Union, and Intersection.

Basic Referencing Components
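
In Excel's formula syntax the colon builds a range, a comma forms a union and a space forms an intersection (an empty intersection evaluates to #NULL!). A small Java sketch of the cell sets these components denote, assuming plain A1 references without $ markers or sheet names; RefAlgebra and its members are hypothetical names.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the basic referencing components.
class RefAlgebra {
    // Rectangular block of cells with 1-based, inclusive bounds; a single cell is a 1x1 block.
    static final class Range {
        final int col1, row1, col2, row2;

        Range(int colA, int rowA, int colB, int rowB) {
            col1 = Math.min(colA, colB);
            col2 = Math.max(colA, colB);
            row1 = Math.min(rowA, rowB);
            row2 = Math.max(rowA, rowB);
        }

        // Parses "B2" or "B2:D4" (plain A1 style, no $ markers or sheet names).
        static Range parse(String text) {
            String[] ends = text.split(":");
            int[] a = cell(ends[0]);
            int[] b = ends.length > 1 ? cell(ends[1]) : a;
            return new Range(a[0], a[1], b[0], b[1]);
        }

        // Excel's space operator: the overlapping block, or null when the ranges
        // do not overlap (Excel reports #NULL! in that case).
        Range intersect(Range o) {
            int c1 = Math.max(col1, o.col1), c2 = Math.min(col2, o.col2);
            int r1 = Math.max(row1, o.row1), r2 = Math.min(row2, o.row2);
            return (c1 <= c2 && r1 <= r2) ? new Range(c1, r1, c2, r2) : null;
        }

        void addCellsTo(Set<String> out) {
            for (int c = col1; c <= col2; c++)
                for (int r = row1; r <= row2; r++)
                    out.add(letters(c) + r);
        }
    }

    // Excel's comma operator: every cell in any of the ranges, each listed once.
    static Set<String> union(Range... ranges) {
        Set<String> cells = new TreeSet<>();
        for (Range r : ranges) r.addCellsTo(cells);
        return cells;
    }

    // "C3" -> {3, 3}: 1-based column index and row number.
    static int[] cell(String ref) {
        int i = 0, col = 0;
        while (Character.isLetter(ref.charAt(i))) {
            col = col * 26 + (ref.charAt(i) - 'A' + 1);
            i++;
        }
        return new int[] { col, Integer.parseInt(ref.substring(i)) };
    }

    // 1-based column index back to letters: 27 -> "AA".
    static String letters(int col) {
        StringBuilder sb = new StringBuilder();
        for (; col > 0; col = (col - 1) / 26) sb.insert(0, (char) ('A' + (col - 1) % 26));
        return sb.toString();
    }
}
```

For instance, B2:D4 C3:E5 covers the block C3:D4, while A1 C3 has no overlap and yields null.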

Visualising formulas in a fairly common summation example

The greater complexity of the bottom-right formula is clear from its larger circle.

Formulas in a summation example

Igarashi’s Static Global View

Formula Inspection - Dependency Types

Excel allows combinations of relative and absolute positioning.

Understanding the referencing type is important when replicating formulas and identifying regular patterns.

Relative (default)

Row Absolute

Fully Absolute

Column Absolute
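
The four types differ only in where the $ markers sit, and that decides what happens when a formula is copied: parts marked with $ stay fixed while unmarked parts shift by the copy offset. A Java sketch of both ideas, following the slide's naming (A$1 is row absolute, $A1 is column absolute). RefKinds and its methods are hypothetical, and copying a reference off the sheet edge, which Excel turns into #REF!, is not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating Excel's four A1 referencing types.
class RefKinds {
    enum Kind { RELATIVE, ROW_ABSOLUTE, COLUMN_ABSOLUTE, FULLY_ABSOLUTE }

    // Group 1: optional $ before the column, 2: column letters, 3: optional $ before the row, 4: row.
    private static final Pattern REF = Pattern.compile("(\\$?)([A-Z]{1,3})(\\$?)([0-9]+)");

    private static Matcher match(String ref) {
        Matcher m = REF.matcher(ref);
        if (!m.matches()) throw new IllegalArgumentException("Not an A1 reference: " + ref);
        return m;
    }

    static Kind classify(String ref) {
        Matcher m = match(ref);
        boolean colFixed = !m.group(1).isEmpty();
        boolean rowFixed = !m.group(3).isEmpty();
        if (colFixed && rowFixed) return Kind.FULLY_ABSOLUTE;
        if (colFixed) return Kind.COLUMN_ABSOLUTE;
        if (rowFixed) return Kind.ROW_ABSOLUTE;
        return Kind.RELATIVE;
    }

    // The reference as it appears after its formula is copied dCols to the right and
    // dRows down: only the parts without a $ marker move.
    static String copy(String ref, int dCols, int dRows) {
        Matcher m = match(ref);
        int col = 0;
        for (char ch : m.group(2).toCharArray()) col = col * 26 + (ch - 'A' + 1);
        int row = Integer.parseInt(m.group(4));
        if (m.group(1).isEmpty()) col += dCols;
        if (m.group(3).isEmpty()) row += dRows;
        StringBuilder letters = new StringBuilder();
        for (int n = col; n > 0; n = (n - 1) / 26) letters.insert(0, (char) ('A' + (n - 1) % 26));
        return m.group(1) + letters + m.group(3) + row;
    }
}
```

Copying =A1*$B$1 one cell down, for example, gives =A2*$B$1, which is why the mix of reference types matters when looking for regular patterns in replicated formulas.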

Related Work

Takeo Igarashi
– Spreadsheets augment “a visible tabular layout with invisible formulas”.
– Created visualisations to help reveal the hidden dataflow graphs and superficial tabular layouts of spreadsheets.

Markus Clermont
– Most end-users are not trained programmers.
– Many spreadsheets are more than simple scratch pads.

Raymond Panko
– Empirical studies of spreadsheet errors.
– Found error rates can be disturbingly high.
– Errors attributed to overconfidence and lack of formal checking.

Margaret Burnett
– The importance of scalability in visualisations.
– Forms/3 and an embedded testing methodology.

Summary and Future Work

We created a Java toolkit to extract artefacts from spreadsheets and then convert the basic information into visualisations.

These visualisations are used to augment the information provided by Excel in helping users understand spreadsheets.

Future work will include detailed user studies and corpus analysis to find larger patterns.

We must also address visualisation scalability for larger, more complex spreadsheets.

http://www.mcs.vuw.ac.nz/~db/honours.html