tilmannsblog

Saturday Jan 21, 2006

Popular Web Searches

Wanna know what's popular on the web? Check Google Zeitgeist.

Some of the most popular searches are about celebrities, famous people and the things they do, such as Mimi, Shakira, new album, Britney, new husband and baby, Pop Princess, Beautiful People, Hollywood, Brad Pitt, Jen Aniston, Angelina Jolie, You’re Fired (and Hired), Martha Stewart, federal prison, Donald Trump, reality show, The Apprentice, Prince Charles, Camilla Parker-Bowles, Princess Diana.

The top gainers among web searches, news items and Froogle items in 2005 were Myspace, Ares, Baidu, Wikipedia, orkut, iTunes, Sky News, World of Warcraft, Green Day, Leonardo da Vinci, Janet Jackson, Hurricane Katrina, tsunami, xbox 360, Brad Pitt, Michael Jackson, American Idol, Britney Spears, Angelina Jolie, Harry Potter, ipod, digital camera, mp3 player, ipod mini, psp, laptop, xbox, ipod shuffle, computer desk, ipod nano.

Monday Jan 16, 2006

The Accuracy Paradox




Today, let me try and convince you to avoid the accuracy metric in favor of other metrics such as precision and recall.

Accuracy is often the starting point for analyzing the quality of a predictive model. It is also probably the first term that comes to mind when non-experts think about how to evaluate the quality of a prediction. As shown below, accuracy measures the ratio of correct predictions to the total number of cases evaluated.

What about the business relevance of accuracy? Surprisingly, this is a difficult question. It seems obvious that the ratio of correct predictions to all cases should be a key metric for determining the business impact of a predictive model. Yet the value of the accuracy metric is dubious. In fact, it is often trivially easy to create a predictive model with high accuracy, and such trivial models can be useless despite their high accuracy. Similarly, when comparing the business impact of two alternative predictive models, it may well be the less accurate model that is more beneficial to the user organization.

Let's review an example predictive model for an insurance fraud application. To prevent payment on fraudulent claims, all cases that the model predicts as high-risk will be investigated by fraud experts. The insurer has devised a predictive model that predicts fraud with some degree of accuracy. To evaluate the performance of the model, the insurer has created a sample data set of 10,000 claims. All 10,000 cases in the validation sample have been carefully checked, and it is known which cases are fraudulent. To analyze the quality of the model, the insurer uses the table of confusion below.

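Definition of Accuracy:

$$\text{Accuracy} = \frac{TN + TP}{TN + FP + FN + TP}$$

where TN (true negatives) and TP (true positives) count the correctly predicted negative and positive cases, and FP (false positives) and FN (false negatives) count the incorrectly predicted ones.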


|  | Predicted Negative | Predicted Positive |
|---|---|---|
| Negative Cases | 9,700 | 150 |
| Positive Cases | 50 | 100 |

Table 1: Table of Confusion for Fraud Model $M_1^{\text{Fraud}}$.

The accuracy of model $M_1^{\text{Fraud}}$ computes to:

$$\text{Accuracy}(M_1^{\text{Fraud}}) = \frac{9{,}700 + 100}{9{,}700 + 150 + 50 + 100} = \frac{9{,}800}{10{,}000} = 98.0\%$$

With an accuracy of 98.0%, model $M_1^{\text{Fraud}}$ appears to perform fairly well. However, the Accuracy Paradox lies in the fact that accuracy can easily be improved to 98.5% by always predicting "no fraud". The table of confusion and the accuracy for this trivial “always predict negative” model $M_2^{\text{Fraud}}$ are shown below.


|  | Predicted Negative | Predicted Positive |
|---|---|---|
| Negative Cases | 9,850 | 0 |
| Positive Cases | 150 | 0 |

Table 2: Table of Confusion for Fraud Model $M_2^{\text{Fraud}}$.

$$\text{Accuracy}(M_2^{\text{Fraud}}) = \frac{9{,}850 + 0}{9{,}850 + 0 + 150 + 0} = \frac{9{,}850}{10{,}000} = 98.5\%$$

Model $M_2^{\text{Fraud}}$ reduces the rate of inaccurate predictions from 2% to 1.5%, an apparent improvement of 25%. Yet it never flags a single claim, so it catches none of the 150 fraudulent claims, whereas $M_1^{\text{Fraud}}$ catches 100 of them. Although the new model shows fewer incorrect predictions and markedly improved accuracy compared to the original model $M_1^{\text{Fraud}}$, it is obviously useless: it offers no value to the insurance company for preventing fraud. Clearly, the less accurate model is more useful than the more accurate model. The inescapable conclusion is that high accuracy is not necessarily an indicator of high model quality, and therein lies the Accuracy Paradox of predictive analytics. High accuracy does not necessarily lead to desirable business outcomes, and model improvements should not be measured in terms of accuracy gains. It may go too far to say that accuracy is irrelevant for assessing business benefits, but I advise against using accuracy when evaluating predictive models.
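The paradox is easy to reproduce in a few lines. The Python sketch below (the `Confusion` class and its method names are my own, for illustration only) computes accuracy and recall from the counts in the two tables of confusion above. Recall, the share of fraudulent claims a model actually flags, is one of the alternative metrics recommended at the start of this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from a table of confusion (rows: actual, columns: predicted)."""
    tn: int  # negative cases predicted negative
    fp: int  # negative cases predicted positive
    fn: int  # positive cases predicted negative
    tp: int  # positive cases predicted positive

    @property
    def total(self) -> int:
        return self.tn + self.fp + self.fn + self.tp

    def accuracy(self) -> float:
        # Share of all cases that were predicted correctly.
        return (self.tn + self.tp) / self.total

    def recall(self) -> float:
        # Share of actual positives (fraudulent claims) that were flagged.
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0


# Counts from Table 1 (M1) and Table 2 (the "always predict negative" M2).
m1 = Confusion(tn=9_700, fp=150, fn=50, tp=100)
m2 = Confusion(tn=9_850, fp=0, fn=150, tp=0)

for name, m in (("M1", m1), ("M2", m2)):
    print(f"{name}: accuracy={m.accuracy():.1%}, "
          f"fraud caught={m.tp} of {m.tp + m.fn} (recall={m.recall():.1%})")
```

Running it shows $M_2^{\text{Fraud}}$ ahead on accuracy (98.5% vs. 98.0%) while catching none of the fraudulent claims, compared with 100 of 150 for $M_1^{\text{Fraud}}$.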



Saturday Jan 14, 2006

Seminar on Computational Learning and Adaptation


On the 7th of December 2005, I had the opportunity to present on the "Business Impact of Predictive Analytics" at Stanford University. The announcement for the Seminar on Computational Learning and Adaptation is reproduced below. The seminar is chaired by Professor Pat Langley, and several members of his research staff and other guests also participated.

The discussion during and after my presentation hit a couple of interesting points. One was that it would be interesting to explore the possibility of going beyond the prediction of a single target variable to predict sets of variables.

For example, a predictive capability might predict the risk of an event happening, the cost of each outcome (the event happening or not happening), the cost of taking action to prevent the event, and the probability that the action might prevent the event. Together, this set of variables would allow a more comprehensive analysis of business impact.

For instance, in an insurance fraud prediction and prevention application, it is desirable to predict, for each claim, the probability of fraud, the predicted cost of fraud, the cost of taking action to prevent fraud, and the probability that the action will prevent payment on the fraudulent claim.
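One way such a set of predicted variables could be combined into a single decision is an expected-value calculation per claim. The Python sketch below reflects my own assumptions, not something presented at the seminar: the investigation cost is always incurred, and money is saved only when a claim is fraudulent and the investigation stops payment. The `ClaimPrediction` fields and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClaimPrediction:
    """Hypothetical set of predicted variables for one insurance claim."""
    p_fraud: float      # predicted probability that the claim is fraudulent
    fraud_cost: float   # predicted payout lost if a fraudulent claim is paid
    action_cost: float  # predicted cost of investigating the claim
    p_prevent: float    # predicted probability the investigation stops payment


def expected_net_benefit(c: ClaimPrediction) -> float:
    """Expected savings from investigating, minus the cost of investigating.

    Assumes the investigation cost is always paid and that savings occur only
    when the claim is fraudulent AND the investigation prevents payment.
    """
    return c.p_fraud * c.p_prevent * c.fraud_cost - c.action_cost


def should_investigate(c: ClaimPrediction) -> bool:
    return expected_net_benefit(c) > 0


# Two hypothetical claims with the same fraud risk but different stakes.
small = ClaimPrediction(p_fraud=0.30, fraud_cost=1_000, action_cost=500, p_prevent=0.8)
large = ClaimPrediction(p_fraud=0.30, fraud_cost=20_000, action_cost=500, p_prevent=0.8)
for label, claim in (("small", small), ("large", large)):
    print(label, round(expected_net_benefit(claim), 2), should_investigate(claim))
```

With identical fraud risk, the small claim is not worth investigating while the large one is, a distinction that a fraud-probability score alone cannot express.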



Seminar on Computational Learning and Adaptation



  Business Impact of Predictive Analytics

Dr. Tilmann Bruckhaus


Chief Architect, Data Mining and Analytics
Sun Microsystems

Tilmann.Bruckhaus@Sun.Com
 

In commercial applications of predictive modeling, the ultimate objective is typically to maximize Return On Investment (ROI). However, literature, conferences, and training often stop short of providing techniques for ROI maximization. With an apparent lack of know-how for maximizing ROI, analysts often have to rely on technical metrics, such as ROC, accuracy, precision, or similar metrics to optimize predictive models. In this presentation, I will explore the problem of assessing the ROI for predictive analytics applications, break down the drivers of ROI, and show how to compute ROI. I will also present an example ROI analysis to demonstrate that one predictive model can have negative or positive ROI based on the business context in which it is used, even though technical quality metrics of the predictive model do not change.


Date: Wed., Dec 7, 2005 

Time: 4:15-5:30PM

Place: Cordura 100
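To illustrate the last point of the abstract, here is a minimal Python sketch of an ROI calculation under simplifying assumptions of my own, not taken from the talk: every positive prediction is investigated at a fixed cost, and every true positive saves a fixed amount. The true and false positive counts are borrowed from Table 1 of the Accuracy Paradox post purely for illustration; the dollar figures are hypothetical.

```python
def roi(tp: int, fp: int, savings_per_caught_fraud: float,
        cost_per_investigation: float) -> float:
    """ROI = (benefit - cost) / cost when every positive prediction is acted on.

    Assumes each positive prediction (TP + FP) costs one investigation and each
    true positive saves a fixed amount; false and true negatives add nothing
    relative to not using the model.
    """
    if tp + fp == 0:
        raise ValueError("no positive predictions, so nothing is invested")
    cost = (tp + fp) * cost_per_investigation
    benefit = tp * savings_per_caught_fraud
    return (benefit - cost) / cost


# One model, one set of predictions: 100 true positives and 150 false positives.
TP, FP = 100, 150

# Two hypothetical business contexts that differ only in what a caught fraud is worth.
high_value = roi(TP, FP, savings_per_caught_fraud=5_000, cost_per_investigation=500)
low_value = roi(TP, FP, savings_per_caught_fraud=1_000, cost_per_investigation=500)
print(f"high-value claims: ROI = {high_value:+.0%}")
print(f"low-value claims:  ROI = {low_value:+.0%}")
```

Both contexts use the same predictions, so every technical metric is unchanged, yet the first yields a positive ROI (+300%) and the second a negative one (-20%).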



Friday Jan 13, 2006

AI Application Programming


I recently reviewed an interesting book for Stickyminds.Com:

AI Application Programming, 2nd edition
Author: M. Tim Jones
Pages: 473
Published: 2005
Publisher: Charles River Media
ISBN: 1584504218
M. Tim Jones' book “AI Application Programming” is a practical and inspiring introduction to a variety of artificial intelligence programming techniques. The book is ideal for readers who want to become familiar with a range of artificial intelligence application programming techniques without delving into much detail on any one of them.

M. Tim Jones reviews 15 Artificial Intelligence (AI) programming techniques, from Classifier Systems to Simulated Annealing. Jones begins by reviewing the history of AI from the 1940s to the present, and he concludes the book with a review of the present state of AI. The main part of the book is dedicated to an overview of the programming techniques.

Each technique is described in general terms to convey the purpose and motivation of each technique, and to provide the context of who developed the technique and why. With this background we move on to a brief overview of how the procedure operates, guided by diagrams and small example problems. One of the most useful resources this book provides is source code in the C programming language for a simple implementation of each technique. Jones walks us through the more interesting functions of each implementation to illuminate programming techniques used to transform the technique from idea to executable. The complete source is provided on a CD which is included with the book. There is an even balance between the general description of each technique and the source code review.

The A-Star path-finding algorithm is covered first in Jones' book, followed by the newer Simulated Annealing technique, which aims to find global maxima instead of suboptimal local maxima. Next, we find the fascinating Particle Swarm Optimization method, which can track moving targets, and Adaptive Resonance Theory, which finds application in personalization solutions that can help recommend likely choices in shopping applications and similar systems. The Classifier System serves to link conditions to actions, and the Ant Algorithm explores environments to find hidden targets. The book also covers the equally useful techniques of Neural Networks, Reinforcement Learning, Genetic Algorithms, Artificial Life, Rule-Based Systems, Fuzzy Logic, Natural Language Processing, the Bigram Model, and finally Agent-Based Software.

This book has many advantages not easily found in other texts. The organization of the material is clear and simple: one chapter is dedicated to each technique, with a few well-designed subsections covering various details. Each chapter stands on its own, so the reader can easily focus on a small number of techniques of interest or skip over others.

Working source code in the C programming language is included for each technique, and this code makes the book immediately useful for anyone beginning to develop software with artificial intelligence capabilities. The source code of the key functions of every programming technique is reproduced in the book as part of each chapter along with the author's review of the code at a fairly detailed level.

Every AI programming technique is also illustrated with a variety of diagrams, tables and session transcripts from program runs. Jones' writing style is unassuming and straightforward, which is all the more helpful considering the complexity of the subject matter. Another key feature is the fairly broad selection of over a dozen different techniques, helping readers appreciate the diversity of practical artificial intelligence programming techniques.

Thursday Jan 12, 2006

Too Good To Be True


The greatest danger to success with Predictive Analytics may be over-estimating the predictive power of a predictive model. One problem that can lead to over-estimating predictive power is overfitting. Overfitting occurs when a predictive model is trained to memorize the training data so well that it does not perform well when scoring new data. Machine learning algorithms typically split training data internally to test for and avoid overfitting. This internal splitting is an important safeguard, but it is advisable to take the additional precaution of setting aside a hold-out data set against which the quality of a trained model can be tested.

A hold-out data set is created by splitting your available historical data into two subsets, one for training and one for validation. It is crucial that the validation data set faithfully mimics new data coming in for scoring in the production environment. In particular, exclude any inputs that carry information that is not available when the model is deployed, and exclude any information from the training data set that provides clues about the to-be-predicted outcomes in the validation data. If you make a mistake here, your validation will be meaningless.
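The split itself takes only a few lines. Below is a minimal Python sketch, assuming records are plain dictionaries; the function name `holdout_split`, the field names and the 20% hold-out fraction are illustrative choices, not recommendations from any specific tool. It drops inputs already identified as unavailable at scoring time before splitting, so neither subset can use them.

```python
import random
from typing import Any

Record = dict[str, Any]


def holdout_split(
    records: list[Record],
    holdout_fraction: float,
    excluded_inputs: set[str],
    seed: int = 42,
) -> tuple[list[Record], list[Record]]:
    """Return (training, hold-out) lists after dropping excluded inputs.

    Inputs in `excluded_inputs` (e.g. fields only known after the outcome)
    are removed from every record, so neither subset can use them.
    """
    if not 0.0 < holdout_fraction < 1.0:
        raise ValueError("holdout_fraction must be strictly between 0 and 1")
    cleaned = [{k: v for k, v in r.items() if k not in excluded_inputs} for r in records]
    random.Random(seed).shuffle(cleaned)  # seeded so the split is reproducible
    n_holdout = int(len(cleaned) * holdout_fraction)
    return cleaned[n_holdout:], cleaned[:n_holdout]


# Hypothetical churn records: "account_status" reflects the current status,
# so it would reveal the outcome and is excluded.
records = [
    {"id": i, "tenure_months": i % 36,
     "account_status": "inactive" if i % 5 == 0 else "active",
     "churned": i % 5 == 0}
    for i in range(1_000)
]
train, holdout = holdout_split(records, holdout_fraction=0.2,
                               excluded_inputs={"account_status"})
print(len(train), len(holdout))  # 800 200
```

A random shuffle assumes records are independent. If the same customer appears in several records, or if outcomes drift over time, splitting by customer or by date mimics production scoring more faithfully and keeps clues about hold-out outcomes out of the training set.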

To see how such mistakes happen, consider some anecdotes I heard at KDD04, Directions 2005 and ICDM05. A predictive model was being developed for a churn prediction application, and an account code was used as an input. The model validated with excellent precision, but it was later found that the account code contained information about whether the account was active. This made it too easy for the model to predict churn, because accounts that are canceled due to churn become inactive. The account code is therefore an illegal input, at least if it represents the current status rather than the historical status as of the time before the churn event occurred.

In a military application, the goal was to train a model to identify tanks in imagery. This model, too, performed exceedingly well, but it was later found that all the training images containing tanks were taken at a different time of day than the images without tanks, and the validation data had the same problem. Again, it was easy for the model, in fact unrealistically easy, to identify the presence of tanks simply by assessing the overall brightness of the image.

I have yet to meet an experienced practitioner in Predictive Analytics who does not admit to accidentally using illegal inputs or allowing hints about outcomes in the validation hold-out data set to spill into the training data set.
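One cheap screen for illegal inputs, a heuristic of mine rather than a method from the anecdotes above, is to check whether any single input predicts the outcome almost perfectly on the hold-out data by itself. The Python sketch below fits a one-input lookup rule (the majority training outcome for each value) and flags inputs whose precision and recall both reach a threshold; the function names, data and 0.95 threshold are hypothetical. It would catch an input like the churn example's account code, but not leakage hidden in a combination of inputs or in a derived property such as image brightness.

```python
from collections import Counter, defaultdict
from typing import Any

Record = dict[str, Any]


def single_input_scores(train: list[Record], holdout: list[Record],
                        name: str, target: str) -> tuple[float, float]:
    """Precision and recall on `holdout` of a rule that predicts the majority
    training outcome for each value of input `name` (unseen values -> negative)."""
    votes: defaultdict[Any, Counter] = defaultdict(Counter)
    for row in train:
        votes[row[name]][bool(row[target])] += 1
    tp = fp = fn = 0
    for row in holdout:
        counts = votes.get(row[name])  # .get does not create missing entries
        predicted = counts is not None and counts[True] > counts[False]
        actual = bool(row[target])
        tp += predicted and actual
        fp += predicted and not actual
        fn += actual and not predicted
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def suspicious_inputs(train: list[Record], holdout: list[Record],
                      inputs: list[str], target: str,
                      threshold: float = 0.95) -> list[str]:
    """Inputs that, on their own, predict the target almost perfectly."""
    flagged = []
    for name in inputs:
        precision, recall = single_input_scores(train, holdout, name, target)
        if precision >= threshold and recall >= threshold:
            flagged.append(name)
    return flagged


# Hypothetical data: "account_status" leaks the outcome, "region" does not.
rows = [
    {"region": ("north", "south", "east", "west")[i % 4],
     "account_status": "inactive" if i % 7 == 0 else "active",
     "churned": i % 7 == 0}
    for i in range(1_000)
]
train, holdout = rows[:800], rows[800:]
print(suspicious_inputs(train, holdout, ["region", "account_status"], "churned"))
```

Here only `account_status` is flagged. A flagged input is not proof of leakage, but it is exactly the kind of "too good to be true" result worth investigating before trusting a validation score.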

Wednesday Jan 11, 2006

Predictive Analytics vs. Information Retrieval


Predictive Analytics and Information Retrieval (IR) are two technologies used in data mining. However, they are used for different purposes, as the table below illustrates. One common aspect is that the quality of both types of solutions can be measured with Precision and Recall metrics. See the previous post for more information about precision and recall.


|  | Information Retrieval | Predictive Analytics |
|---|---|---|
| General Purpose | Multi-database, open-ended research | Prediction, classification and scoring |
| Scope of Results | Wide | Narrow |
| Type of Results | Entire documents from various sources, such as internal databases, document collections, and Google results. | A single value, such as a risk percentage (e.g., "42%"), a classification ("red", "blue", "green"), or a predicted value (e.g., "54.2"). |
| Setup | Connect to source databases, then perform any desired query. | Develop a custom model for each task. |
| Typical Uses | Research background info on a given technical issue. | Pinpoint high-risk situations among hundreds or even millions of known cases. |
| Mechanism | Sophisticated indexing of source documents. | Discover complex patterns in high-dimensional spaces. |
| How are results found? | Matching against a user-supplied ad-hoc query. | Matching against mathematical patterns that were learned during a training phase. |
| Strengths | Flexibility: handle any ad-hoc query on the fly. | Automation: provide answers without user intervention. |

Tuesday Jan 10, 2006

How are Precision and Recall Calculated?


Calculating precision and recall is actually quite easy. Imagine there are 100 positive cases among 10,000 cases. You want to predict which ones are positive, and you pick 200 to have a better chance of catching many of the 100 positive cases. You record the IDs of your predictions, and when you get the actual results, you tally up how many times you were right or wrong. There are four ways of being right or wrong:

  • TN / True Negative: case was negative and predicted negative
  • TP / True Positive: case was positive and predicted positive
  • FN / False Negative: case was positive but predicted negative
  • FP / False Positive: case was negative but predicted positive

Makes sense so far? Now you count how many of the 10,000 cases fall in each bucket, say:



|  | Predicted Negative | Predicted Positive |
|---|---|---|
| Negative Cases | TN: 9,760 | FP: 140 |
| Positive Cases | FN: 40 | TP: 60 |



Now, your boss asks you three questions:
  • What percentage of your predictions were correct?
    You answer: the "accuracy" was (9,760+60) out of 10,000 = 98.2%
  • What percentage of the positive cases did you catch?
    You answer: the "recall" was 60 out of 100 = 60%
  • What percentage of your positive predictions were correct?
    You answer: the "precision" was 60 out of 200 = 30%
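The three answers translate directly into code. This Python sketch (the `metrics` function name is illustrative) computes them from the four counts in the table above, returning 0.0 instead of dividing by zero when a denominator is empty:

```python
def metrics(tn: int, fp: int, fn: int, tp: int) -> dict[str, float]:
    """Accuracy, recall and precision from the four confusion-table counts."""
    total = tn + fp + fn + tp
    return {
        # share of all predictions that were correct
        "accuracy": (tn + tp) / total if total else 0.0,
        # share of actual positive cases that were caught
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        # share of positive predictions that were correct
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


# Counts from the table above.
for name, value in metrics(tn=9_760, fp=140, fn=40, tp=60).items():
    print(f"{name}: {value:.1%}")
```

For these counts it prints 98.2%, 60.0% and 30.0%, matching the answers above.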