Friday Sep 30, 2005

Nasty Properties


Data mining can solve problems with "nasty properties" that make them hard to tackle. These nasty properties are chaotic behavior, high complexity, high dimensionality, concept drift, and poor data quality. Ordinary software code and traditional statistical methods do not deal well with these issues. Let's look at them more closely. (See Wikipedia: Chaos Theory.)

Chaotic behavior refers to situations where a small change in the initial configuration of a system can lead to dramatically different behavior over time. Weather simulation, for example, has this property. One often-quoted story tells how Edward Lorenz ran a weather simulation and later re-ran it with input values that differed from the originals by only a tiny amount, because they had been rounded to fewer decimal places. He expected a near copy of his earlier result. Surprisingly, the second run turned out drastically different. Lorenz had discovered the chaotic nature of weather: if a butterfly flaps its wings in the Amazon, this tiny and remote difference in atmospheric conditions can set off a chain reaction of changes that, in the end, may lead to a tornado hitting the USA. This phenomenon is known as the Butterfly Effect. (See Wikipedia: Butterfly Effect.)
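To get a feel for this sensitivity, here is a minimal Python sketch. It does not model weather at all. As a stand-in it uses the logistic map, a textbook chaotic system, and the starting value 0.2 and the perturbation of 1e-10 are arbitrary choices made for illustration. Two runs that start almost identically drift apart until they have nothing in common.

```python
def logistic_trajectory(x0, steps, r=4.0):
    """Iterate the logistic map x -> r * x * (1 - x) and return every visited value."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


# Two runs whose starting points differ by one part in ten billion.
baseline = logistic_trajectory(0.2, 60)
perturbed = logistic_trajectory(0.2 + 1e-10, 60)

# Absolute difference between the two runs at each step.
gaps = [abs(a - b) for a, b in zip(baseline, perturbed)]

for step in (0, 10, 20, 30, 40, 50, 60):
    print(f"step {step:2d}: gap = {gaps[step]:.3e}")
```

The gap starts out invisible and grows roughly exponentially, so after a few dozen steps the two runs are as different as two unrelated trajectories.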

Data mining avoids explicitly modeling the mechanisms that govern the output. Weather simulation involves modeling how atmospheric variables affect each other and lead to the resulting weather. Data mining instead uses standard machine learning algorithms to learn which input patterns lead to which outputs. In other words, a simulation runs a very large number of iterations to arrive at the expected weather three days in the future. Data mining looks at the inputs available today and learns from historical records what the most likely weather will be three days from now, without computing all of the weather patterns in between.
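The idea of learning from historical records can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the records in `HISTORY` are made up, the two input variables (pressure and humidity) are hypothetical choices, and a simple k-nearest-neighbor vote stands in for whatever machine learning algorithm one would really use.

```python
from collections import Counter
import math

# Hypothetical historical records: (pressure in hPa, humidity in %) observed on
# one day, paired with the weather seen three days later. All values are made up.
HISTORY = [
    ((1022.0, 35.0), "sunny"),
    ((1019.0, 40.0), "sunny"),
    ((1016.0, 50.0), "sunny"),
    ((1003.0, 85.0), "rain"),
    ((1000.0, 90.0), "rain"),
    ((1006.0, 80.0), "rain"),
]


def predict(today, history=HISTORY, k=3):
    """Return the most common outcome among the k past days most similar to today."""
    # Rank past days by Euclidean distance between their inputs and today's inputs.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], today))[:k]
    votes = Counter(outcome for _, outcome in nearest)
    return votes.most_common(1)[0][0]


print(predict((1002.0, 88.0)))  # rain
print(predict((1020.0, 38.0)))  # sunny
```

Note that nothing here steps through the intermediate days: the prediction comes straight from today's inputs and the past outcomes. A real application would need far more records, more input variables, and inputs scaled to comparable ranges.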

Over the next few days, I plan to post additional information on the other "nasty" properties.
