tilmannsblog
Saturday Jan 21, 2006
Popular Web Searches
Wanna know what's popular on the web? Check Google Zeitgeist.
Some of the most popular searches are about celebrities, famous people and the things they do, like Mimi, Shakira, new album, Britney, new husband and baby, Pop Princess, Beautiful People, Hollywood, Brad Pitt, Jen Aniston, Angelina Jolie, You’re Fired (and Hired), Martha Stewart, federal prison, Donald Trump, reality show, The Apprentice, Prince Charles, Camilla Parker-Bowles, Princess Diana.
Top gainer web searches, news and Froogle items in 2005 were Myspace, Ares, Baidu, Wikipedia, orkut, iTunes, Sky News, World of Warcraft, Green Day, Leonardo da Vinci, Janet Jackson, Hurricane Katrina, tsunami, xbox 360, Brad Pitt, Michael Jackson, American Idol, Britney Spears, Angelina Jolie, Harry Potter, ipod, digital camera, mp3 player, ipod mini, psp, laptop, xbox, ipod shuffle, computer desk, ipod nano.
Posted at 12:04PM Jan 21, 2006 by tilmannsblog in Popular
Monday Jan 16, 2006
The Accuracy Paradox
Today, let me try and convince you to avoid the accuracy metric in favor of other metrics such as precision and recall.
Accuracy is often the starting point for analyzing the quality of a predictive model. Accuracy is also probably the first term that comes to mind when non-experts think about how to evaluate the quality of a prediction. Accuracy measures the ratio of correct predictions over the total number of cases evaluated: accuracy = (true negatives + true positives) / (total number of cases).
What about the business relevance of accuracy? Surprisingly, this is a difficult question. It seems obvious that the ratio of correct predictions over all cases should be a key metric for determining the business impact of a predictive model. Yet, the value of the accuracy metric is dubious. In fact, it is often trivially easy to create a predictive model with high accuracy, and such trivial models can be useless despite their high accuracy. Similarly, when comparing the business impact of two alternative predictive models, it may well be the less accurate model that is more beneficial to the user organization.
Let's review an example predictive model for an insurance fraud application. To prevent payment on fraudulent claims, all cases that the model predicts as high-risk will be investigated by fraud experts. The insurance company has devised a predictive model that predicts fraud with some degree of accuracy. To evaluate the performance of the model, the insurance company has created a sample data set of 10,000 claims. All 10,000 cases in the validation sample have been carefully checked and it is known which cases are fraudulent. Now, to analyze the quality of the model, the insurance company uses the table of confusion below.
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Negative Cases | 9,700 | 150 |
| Positive Cases | 50 | 100 |
Table 1: Table of Confusion for Fraud Model M1Fraud.
The accuracy for model M1Fraud computes to: (9,700 + 100) / (9,700 + 150 + 50 + 100) = 9,800 / 10,000 = 98.0%.
With an accuracy of 98.0%, model M1Fraud appears to perform fairly well. However, the Accuracy Paradox lies in the fact that accuracy can easily be improved to 98.5% by always predicting "no fraud". The table of confusion and the accuracy for this trivial “always predict negative” model M2Fraud are shown below.
| | Predicted Negative | Predicted Positive |
|---|---|---|
| Negative Cases | 9,850 | 0 |
| Positive Cases | 150 | 0 |
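Table 2: Table of Confusion for Fraud Model M2Fraud.
The accuracy for model M2Fraud computes to: (9,850 + 0) / (9,850 + 0 + 150 + 0) = 9,850 / 10,000 = 98.5%.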
Model M2Fraud reduces the rate of inaccurate predictions from 2% to 1.5%. This is an apparent improvement of 25%. Although the new model M2Fraud shows fewer incorrect predictions and markedly improved accuracy, as compared to the original model M1Fraud, the new model is obviously useless. The alternative model M2Fraud does not offer any value to the insurance company for preventing fraud, and clearly, the less accurate model is more useful than the more accurate model. The inescapable conclusion is that high accuracy is not necessarily an indicator of high model quality, and therein lies the Accuracy Paradox of predictive analytics. High accuracy does not necessarily lead to desirable business outcomes, and model improvements should not be measured in terms of accuracy gains. It may go too far to say that accuracy is irrelevant for assessing business benefits but I advise against using accuracy when evaluating predictive models.
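To make the comparison concrete, here is a minimal sketch that recomputes accuracy alongside precision and recall for both tables of confusion. The `ConfusionTable` class and its field names are my own illustration, not part of any library; the counts are taken directly from the two tables above. Precision is reported as `None` when a model never predicts a positive, since the ratio is undefined in that case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConfusionTable:
    """Hypothetical helper holding the four cells of a table of confusion."""
    tn: int  # negative cases predicted negative
    fp: int  # negative cases predicted positive
    fn: int  # positive cases predicted negative
    tp: int  # positive cases predicted positive

    def total(self) -> int:
        return self.tn + self.fp + self.fn + self.tp

    def accuracy(self) -> float:
        # Correct predictions over all cases evaluated.
        return (self.tn + self.tp) / self.total()

    def precision(self) -> Optional[float]:
        # Share of predicted positives that are truly positive.
        predicted_positive = self.tp + self.fp
        return self.tp / predicted_positive if predicted_positive else None

    def recall(self) -> Optional[float]:
        # Share of truly positive cases that the model catches.
        actual_positive = self.tp + self.fn
        return self.tp / actual_positive if actual_positive else None


# Counts from the tables of confusion for M1Fraud and M2Fraud.
m1_fraud = ConfusionTable(tn=9_700, fp=150, fn=50, tp=100)
m2_fraud = ConfusionTable(tn=9_850, fp=0, fn=150, tp=0)

for name, table in (("M1Fraud", m1_fraud), ("M2Fraud", m2_fraud)):
    print(name, table.accuracy(), table.precision(), table.recall())
```

M1Fraud catches two thirds of the fraudulent claims (recall 100/150) and 40% of its alerts are real fraud (precision 100/250). M2Fraud has a recall of zero and no defined precision, which exposes its uselessness in a way that the 98.5% accuracy figure does not.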
Posted at 03:40PM Jan 16, 2006 by tilmannsblog in The Predictive Business
Saturday Jan 14, 2006
Seminar on Computational Learning and Adaptation
On the 7th of December 2005, I had the opportunity to present at Stanford University about the "Business Impact of Predictive Analytics". I paste the Seminar on Computational Learning and Adaptation announcement below. The seminar is chaired by Professor Pat Langley, and a number of members of his research staff and other guests also participated in the seminar.
The discussion during and after my presentation hit a couple of interesting points. One was that it would be interesting to explore the possibility of going beyond the prediction of a single target variable to predict sets of variables.
For example, a predictive capability might predict the risk of an event happening, the cost of each outcome (the event happening or not happening), the cost of taking action to prevent the event, and the probability that the action will prevent the event. Together, this set of variables would allow for a more comprehensive analysis of business impact.
For instance, in an insurance fraud prediction and prevention application, it is desirable to predict, for each claim, the probability of fraud, the predicted cost of fraud, the cost of taking action to prevent fraud, and the probability that the action will prevent payment on the fraudulent claim.
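As a rough sketch of how these four predictions could be combined for a single claim, the snippet below computes an expected net benefit of investigating. The formula, the `ClaimPrediction` class, and the example numbers are all my own assumptions for illustration; in particular, the sketch assumes that an investigation only saves money when the claim is fraudulent and the action succeeds, and it ignores any cost of investigating honest customers beyond the investigation itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimPrediction:
    """Hypothetical bundle of the four per-claim predictions."""
    p_fraud: float      # predicted probability that the claim is fraudulent
    fraud_cost: float   # predicted cost if a fraudulent claim is paid
    action_cost: float  # cost of taking action (investigating the claim)
    p_prevent: float    # probability that the action prevents payment


def expected_net_benefit(claim: ClaimPrediction) -> float:
    # Assumed model: expected savings from a successful investigation
    # minus the cost of the investigation.
    expected_savings = claim.p_fraud * claim.p_prevent * claim.fraud_cost
    return expected_savings - claim.action_cost


def should_investigate(claim: ClaimPrediction) -> bool:
    # Investigate only when the expected savings exceed the cost of acting.
    return expected_net_benefit(claim) > 0


# Two made-up claims with the same fraud probability but different stakes.
large_claim = ClaimPrediction(p_fraud=0.30, fraud_cost=20_000, action_cost=500, p_prevent=0.8)
small_claim = ClaimPrediction(p_fraud=0.30, fraud_cost=1_000, action_cost=500, p_prevent=0.8)

print(should_investigate(large_claim))  # True
print(should_investigate(small_claim))  # False
```

Both claims carry the same predicted fraud risk, yet only the large one is worth investigating under these assumptions. A model that predicts the risk alone cannot make that distinction.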
Seminar on Computational Learning and Adaptation
Business Impact of Predictive Analytics
Dr. Tilmann Bruckhaus
Chief Architect, Data Mining and Analytics
Sun Microsystems
Tilmann.Bruckhaus@Sun.Com
In commercial applications of predictive modeling, the ultimate objective is typically to maximize Return On Investment (ROI). However, literature, conferences, and training often stop short of providing techniques for ROI maximization. With an apparent lack of know-how for maximizing ROI, analysts often have to rely on technical metrics, such as ROC, accuracy, precision, or similar metrics to optimize predictive models. In this presentation, I will explore the problem of assessing the ROI for predictive analytics applications, break down the drivers of ROI, and show how to compute ROI. I will also present an example ROI analysis to demonstrate that one predictive model can have negative or positive ROI based on the business context in which it is used, even though technical quality metrics of the predictive model do not change.
Date: Wed., Dec 7, 2005
Time: 4:15-5:30PM
Place: Cordura 100