
High Accuracy Model at what costs - Data Curry


Data Curry

Higher Accuracy at What Cost?

These stories are written by Dr. Dakshinamurthy V Kolluru, Chief Advisor – Data Science, Usha Martin Education; President, International School of Engineering (http://www.insofe.edu.in)

The best place in the world to learn Applied Engineering.

Just how much accuracy do you need in a predictive model? The answer, interestingly, is not as much as you can get!

Essentially, any model is at some level a curve fit. So, if you are willing to add complexity, you can fit the data a bit better. But while a more complex model can often be made more accurate than simpler ones, it is also harder to understand.

Consider a feed-forward neural net built on customer data to predict whether a customer buys a product or not. The same logic can be presented as a set of rules:

If (income is very high) and (family size is either 1 or 2 members) and (education is high school), they will not buy the product.

Now, even if the rules are less accurate, they are very easy for the business user to implement. Get the point?
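To see just how easy, here is a minimal sketch of that single rule in Python. The `Customer` fields and their string encodings are assumptions made for illustration. Because the rule only says when a customer will not buy, the sketch returns `None` whenever the rule does not apply rather than guessing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Customer:
    # Hypothetical encoding of the three attributes named in the rule.
    income_band: str   # e.g. "low", "medium", "high", "very high"
    family_size: int   # number of family members
    education: str     # e.g. "high school", "graduate"


def predict_purchase(customer: Customer) -> Optional[bool]:
    """Apply the single rule from the text.

    Returns False ("will not buy") when the rule fires, and None when
    the rule does not apply, since the text gives no other rules.
    """
    rule_fires = (
        customer.income_band == "very high"
        and customer.family_size in (1, 2)
        and customer.education == "high school"
    )
    return False if rule_fires else None
```

A business user can read the condition line by line, which is exactly what a trained network's weights do not offer.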

Netflix is another excellent example. When people rent a video on its site, Netflix wants to recommend other movies that they might like. Making the best possible recommendations is essential, as every additional rental adds to the top line.

The company announced a competition for data scientists to beat its recommendation engine by 10% in accuracy (measured as a reduction in prediction error, RMSE). The prize money was a million dollars. It was a huge sum and attracted great minds.

After the first phase (approximately 12 months in), the leading algorithm gave around 8% better accuracy and was fairly easy to engineer. However, the contest went on, as the magic 10% mark was still untouched. It ran for about two more years, and finally BellKor's Pragmatic Chaos (a merged team that included researchers from AT&T Labs) won with an algorithm that gave a 10.06% improvement. It combined several hundred algorithms working together to reach the desired improvement, which is indeed an intellectual marvel.
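For readers who want to see what such a percentage means, the contest scored entries by root-mean-square error (RMSE) on predicted ratings, and the improvement is the relative drop in RMSE against the baseline. The sketch below shows the arithmetic only. The ratings are made up for illustration and are not Netflix data, and the function names are my own.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error between actual and predicted ratings."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def relative_improvement(baseline_rmse: float, new_rmse: float) -> float:
    """Percentage by which new_rmse is lower than baseline_rmse."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse


# Made-up ratings, purely for illustration.
actual = [4.0, 3.0, 5.0, 2.0]
baseline_predictions = [3.0, 4.0, 4.0, 3.0]   # each prediction is off by 1.0
new_predictions = [3.5, 3.5, 4.5, 2.5]        # each prediction is off by 0.5

baseline = rmse(actual, baseline_predictions)
candidate = rmse(actual, new_predictions)
improvement = relative_improvement(baseline, candidate)
print(f"baseline={baseline:.3f} candidate={candidate:.3f} improvement={improvement:.1f}%")
```

On this scale, going from 8% to 10% means shaving a further sliver off an error that is already small, which is why the last stretch took so much longer than the first.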

Ironically, however, Netflix never put that algorithm into production. Here is how the company explained it:

“We evaluated some of the new methods offline but the additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring them into a production environment.”

So, as a manager interested in data science, you always have to consider what the additional accuracy costs you in development effort and usability. The trade-offs have to be carefully evaluated.
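One simple way to make that evaluation explicit is to decide up front how much accuracy you are willing to give up, then pick the cheapest model inside that band. The sketch below is only one possible framing, not a standard method. The `Candidate` type, the `simplest_good_enough` helper, and every number in it are hypothetical.

```python
from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    name: str
    accuracy: float       # fraction of correct predictions on held-out data
    effort_weeks: float   # hypothetical estimate of engineering effort


def simplest_good_enough(candidates: Sequence[Candidate], tolerance: float) -> Candidate:
    """Pick the lowest-effort candidate whose accuracy is within
    `tolerance` of the best accuracy among all candidates."""
    if not candidates:
        raise ValueError("need at least one candidate")
    best_accuracy = max(c.accuracy for c in candidates)
    good_enough = [c for c in candidates if best_accuracy - c.accuracy <= tolerance]
    return min(good_enough, key=lambda c: c.effort_weeks)


# Made-up numbers, purely for illustration.
candidates = [
    Candidate("if-then rules", accuracy=0.86, effort_weeks=1),
    Candidate("neural net", accuracy=0.88, effort_weeks=6),
    Candidate("large ensemble", accuracy=0.89, effort_weeks=30),
]

choice = simplest_good_enough(candidates, tolerance=0.05)
print(choice.name)
```

With a tolerance of five accuracy points the rules win; with a tolerance of zero the ensemble does. The useful part is that the tolerance becomes a business decision stated in the open.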

In most cases, your business users are interested in insights. So, an if-then rule or a simple equation gives them a better understanding than a complex black box that somehow spits out the correct answer. In such situations, you are better off compromising a bit on accuracy for simplicity.

www.datacurry.com