DATA VISUALISATION IN JAVASCRIPT
S ANAND, DATA SCIENTIST, GRAMENER.COM
Consider the sales report shown alongside.
It shows the average price and sales of 4 branches across 4 cities.
Each branch changes its price every month, with a corresponding change in sales value.
Basic analytics on these numbers reveals consistent performance across the 4 branches.
Further, the sales figures show the same correlation with price, and the same linear regression, in every city.
2010      Bangalore      Delhi          Hyderabad      Mumbai
Month     Price  Sales   Price  Sales   Price  Sales   Price  Sales
Jan        10.0   8.04    10.0   9.14    10.0   7.46     8.0   6.58
Feb         8.0   6.95     8.0   8.14     8.0   6.77     8.0   5.76
Mar        13.0   7.58    13.0   8.74    13.0  12.74     8.0   7.71
Apr         9.0   8.81     9.0   8.77     9.0   7.11     8.0   8.84
May        11.0   8.33    11.0   9.26    11.0   7.81     8.0   8.47
Jun        14.0   9.96    14.0   8.10    14.0   8.84     8.0   7.04
Jul         6.0   7.24     6.0   6.13     6.0   6.08     8.0   5.25
Aug         4.0   4.26     4.0   3.10     4.0   5.39    19.0  12.50
Sep        12.0  10.84    12.0   9.13    12.0   8.15     8.0   5.56
Oct         7.0   4.82     7.0   7.26     7.0   6.42     8.0   7.91
Nov         5.0   5.68     5.0   4.74     5.0   5.73     8.0   6.89
Average     9.0   7.50     9.0   7.50     9.0   7.50     9.0   7.50
Variance   10.0   3.75    10.0   3.75    10.0   3.75    10.0   3.75
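The claim that all four cities share the same summary statistics can be checked with a few lines of JavaScript. The sketch below is one possible way to do it, not the method used to produce the report: the `report` object is transcribed from the table above, and the helper names (`mean`, `covariance`, `summarise`) are illustrative. It assumes population statistics (dividing by n), since that is what reproduces the Variance row.

```javascript
// Price and sales per city, transcribed from the report (Jan to Nov 2010).
const report = {
  Bangalore: {
    price: [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    sales: [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
  },
  Delhi: {
    price: [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    sales: [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
  },
  Hyderabad: {
    price: [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5],
    sales: [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
  },
  Mumbai: {
    price: [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
    sales: [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
  },
};

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Population covariance (divide by n). ASSUMPTION: chosen because it
// reproduces the Variance row of the report; covariance(xs, xs) is the variance.
const covariance = (xs, ys) => {
  const mx = mean(xs);
  const my = mean(ys);
  return mean(xs.map((x, i) => (x - mx) * (ys[i] - my)));
};

// Summary statistics plus a least-squares line: sales = intercept + slope * price.
function summarise({ price, sales }) {
  const varPrice = covariance(price, price);
  const varSales = covariance(sales, sales);
  const cov = covariance(price, sales);
  const slope = cov / varPrice;
  return {
    meanPrice: mean(price),
    meanSales: mean(sales),
    varPrice,
    varSales,
    correlation: cov / Math.sqrt(varPrice * varSales),
    slope,
    intercept: mean(sales) - slope * mean(price),
  };
}

const summary = Object.fromEntries(
  Object.entries(report).map(([city, data]) => [city, summarise(data)])
);

// One line per city: the seven statistics, rounded to two decimals.
for (const [city, s] of Object.entries(summary)) {
  const cells = Object.values(s).map((v) => v.toFixed(2)).join("  ");
  console.log(city.padEnd(10), cells);
}
```

Every city prints practically the same row, which is why the numbers alone cannot tell the cities apart.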
WHY VISUALISE?
The four cities are completely different in behaviour and need different strategies for growth.
Bangalore sales have generally increased with price.
Hyderabad has a nearly perfect increase in sales with price, except for one aberration.
Delhi, however, shows a decline in sales as price is increased beyond a certain point.
Mumbai sales fluctuated despite a constant price, except for 1 month.
DETECTING FRAUD
“We know meter readings are incorrect, for various reasons.
We don’t, however, have the concrete proof we need to start the process of meter reading automation.
Part of our problem is the volume of data that needs to be analysed. The other is our inexperience with the tools and analyses needed to identify such patterns.”
ENERGY UTILITY
This plot shows the frequency of all meter readings from Apr-2010 to Mar-2011. An unusually large number of readings are aligned with the tariff slab boundaries.
This clearly shows collusion of some form with the customers.
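A frequency count of this kind is simple to compute before plotting it. The sketch below is only an illustration of the idea: the slab boundaries (50, 100, 150, 200), the sample `readings`, and the `window` and `factor` parameters are all assumptions, not the utility's actual tariff or data. It counts how often each reading occurs and flags a boundary when its count is well above the average count of nearby readings.

```javascript
// ASSUMPTION: hypothetical tariff slab boundaries, in units.
const slabBoundaries = [50, 100, 150, 200];

// ASSUMPTION: a small made-up sample of readings, not the utility's data.
const readings = [
  97, 98, 100, 100, 100, 100, 100, 101, 103,
  148, 150, 150, 150, 152,
  199, 200, 200, 200, 200, 204,
];

// Count how often each reading value occurs.
function frequency(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  return counts;
}

// Flag a boundary when its count exceeds `factor` times the average count
// of the readings within `window` units either side of it.
function slabSpikes(counts, boundaries, window = 5, factor = 2) {
  return boundaries.filter((b) => {
    let total = 0;
    for (let d = 1; d <= window; d++) {
      total += (counts.get(b - d) || 0) + (counts.get(b + d) || 0);
    }
    const neighbourAvg = total / (2 * window);
    const atBoundary = counts.get(b) || 0;
    return atBoundary > 0 && atBoundary > factor * neighbourAvg;
  });
}

const counts = frequency(readings);
const spikes = slabSpikes(counts, slabBoundaries);
console.log(spikes); // [ 100, 150, 200 ] for the sample above
```

The `counts` map is also the natural input for a bar chart with reading value on the x-axis and frequency on the y-axis.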
This happens with specific customers, not randomly. Here are such customers’ meter readings.
Apr-10  May-10  Jun-10  Jul-10  Aug-10  Sep-10  Oct-10  Nov-10  Dec-10  Jan-11  Feb-11  Mar-11
   217     219     200     200     200     200     200     200     200     350     200     200
   250     200     200     200     201     200     200     200     250     200     200     150
   250     150     150     200     200     200     200     200     200     200     200     150
   150     200     200     200     200     200     200     200     200     200     200      50
   200     200     200     150     180     150      50     100      50      70     100     100
   100     100     100     100     100     100     100     100     100     100     110     100
   100     150     123     123      50     100      50     100     100     100     100     100
     0     111     100     100     100     100     100     100     100     100      50      50
     0     100      27     100      50     100     100     100     100     100      70     100
     1       1       1     100      99      50     100     100     100     100     100     100
Section    Apr-10  May-10  Jun-10  Jul-10  Aug-10  Sep-10  Oct-10  Nov-10  Dec-10  Jan-11  Feb-11  Mar-11
Section 1     70%     97%    136%     65%    110%    116%    121%    107%    114%     88%     74%    109%
Section 2     66%     92%     66%     87%     70%     64%     63%     50%     58%     38%     41%     54%
Section 3     90%     46%     47%     43%     28%     31%     50%     32%     19%     38%      8%     34%
Section 4     44%     24%     36%     39%     21%     18%     24%     49%     56%     44%     31%     14%
Section 5      4%     63%    -27%     20%     41%     82%     26%     34%     43%      2%     37%     15%
Section 6     18%     23%     30%     21%     28%     33%     39%     41%     39%     18%      0%     33%
Section 7     36%     51%     33%     33%     27%     35%     10%     39%     12%      5%     15%     14%
Section 8     22%     21%     28%     12%     24%     27%     10%     31%     13%     11%     22%     17%
Section 9     19%     35%     14%      9%     16%     32%     37%     12%      9%      5%     -3%     11%
If we define the “extent of fraud” as the percentage excess of the 100-unit meter reading, the value varies considerably across sections and over time.
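The slide does not say what the excess is measured against, so the following is only one plausible reading of the definition. It assumes the baseline is the average number of readings at nearby values (within `window` units either side of 100) and reports how far the count at exactly 100 units sits above that baseline. The function name, the `window` parameter, and the sample counts are all hypothetical.

```javascript
// ASSUMPTION: the baseline is the mean count of readings within `window`
// units either side of the target; the slide does not define it.
function percentExcess(counts, target = 100, window = 5) {
  let total = 0;
  for (let d = 1; d <= window; d++) {
    total += (counts[target - d] || 0) + (counts[target + d] || 0);
  }
  const expected = total / (2 * window);
  if (expected === 0) return null; // no baseline to compare against
  const observed = counts[target] || 0;
  return (100 * (observed - expected)) / expected;
}

// ASSUMPTION: made-up counts for one section in one month,
// keyed by reading value (number of meters showing that reading).
const sectionCounts = {
  95: 40, 96: 38, 97: 42, 98: 40, 99: 40,
  100: 68,
  101: 41, 102: 39, 103: 40, 104: 40, 105: 40,
};

const excess = percentExcess(sectionCounts);
console.log(excess.toFixed(0) + "%"); // 70% for the sample above
```

Under this reading a negative value, as in a few cells of the table, simply means fewer readings at 100 units than the neighbouring values would suggest.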
New section manager arrives
… and is transferred out
… with some explainable anomalies.
Why would these happen?
FINDING PATTERNS
Which securities move together?
How should I diversify?
What should I sell to reduce risk?
What’s a reliable predictor of a security?
SECURITIES
68% correlation between AUD & EUR
Plot of 6 month daily AUD - EUR values
Block of correlated currencies
… clustered hierarchically
… that move counter-cyclically to indices
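The questions above reduce to two computations: a correlation matrix of daily returns, and a grouping of securities that are strongly correlated. The sketch below shows one simple way to do both. The price series are made up purely for illustration (they are not market data), and the grouping is a basic single-linkage agglomerative clustering cut at an assumed correlation `threshold` of 0.6, which may differ from whatever produced the charts in this deck.

```javascript
// ASSUMPTION: made-up closing prices for illustration only, not market data.
const prices = {
  AUD: [1.00, 1.02, 1.01, 1.04, 1.03, 1.06, 1.05],
  EUR: [1.30, 1.32, 1.31, 1.35, 1.33, 1.36, 1.36],
  INDEX: [100, 98, 99, 96, 97, 95, 96],
};

// Day-on-day fractional change.
const returns = (series) => series.slice(1).map((p, i) => p / series[i] - 1);

const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Pearson correlation coefficient of two equal-length series.
function pearson(xs, ys) {
  const mx = mean(xs);
  const my = mean(ys);
  let sxy = 0, sxx = 0, syy = 0;
  for (let i = 0; i < xs.length; i++) {
    sxy += (xs[i] - mx) * (ys[i] - my);
    sxx += (xs[i] - mx) ** 2;
    syy += (ys[i] - my) ** 2;
  }
  return sxy / Math.sqrt(sxx * syy);
}

// Correlation matrix of returns, as corr[a][b].
const names = Object.keys(prices);
const corr = {};
for (const a of names) {
  corr[a] = {};
  for (const b of names) {
    corr[a][b] = pearson(returns(prices[a]), returns(prices[b]));
  }
}

// Single-linkage agglomerative clustering: repeatedly merge the two clusters
// whose most correlated pair of members is highest, until no pair of
// clusters reaches the threshold.
function cluster(items, matrix, threshold = 0.6) {
  let clusters = items.map((n) => [n]);
  for (;;) {
    let best = null;
    for (let i = 0; i < clusters.length; i++) {
      for (let j = i + 1; j < clusters.length; j++) {
        let sim = -Infinity;
        for (const a of clusters[i]) {
          for (const b of clusters[j]) sim = Math.max(sim, matrix[a][b]);
        }
        if (sim >= threshold && (!best || sim > best.sim)) best = { i, j, sim };
      }
    }
    if (!best) return clusters;
    const merged = clusters[best.i].concat(clusters[best.j]);
    clusters = clusters
      .filter((_, k) => k !== best.i && k !== best.j)
      .concat([merged]);
  }
}

const groups = cluster(names, corr);
console.log(corr.AUD.EUR.toFixed(2), corr.AUD.INDEX.toFixed(2), groups);
```

With this sample, AUD and EUR end up in one group while INDEX, which moves against both, stays on its own. Recording the order and similarity of each merge, instead of stopping at a threshold, would give the full hierarchy needed for a dendrogram.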
LET’S MAKE A FEW
http://s-anand.net