Sunday, 6 January 2008

Data Mining

This is the beginning of a series of posts on Data Mining and Knowledge Discovery in Databases (KDD). A large amount of resources exists on these topics. I plan to survey the available open source solutions for Data Mining, and I also want to recall its main applications.

Open Source Systems

The main open source systems for Data Mining are:

For Text Mining, it is certainly important to add the GATE system. I will review each of these systems.

Supervised and unsupervised tasks

One of the main distinctions between algorithms and problems in Data Mining is whether the task to be performed is supervised or unsupervised. A task is supervised if its goal is to build a model that categorises data according to an existing scheme, such as a predefined set of categories. By contrast, an unsupervised task consists in building a model that describes the data when no such categories are given.

For example, suppose we have a list of topics and a set of news items that should be categorised under these topics. This is a typical supervised task. If we had no topics and had to group the news items together according to some similarity criterion, this would be an unsupervised task.

N.B.: This entry does not really please me yet and is subject to change.
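To make the supervised/unsupervised distinction from the news example concrete, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the sample news items and topics are invented, the function names (`train`, `categorise`, `group`) are hypothetical, and the techniques (word-overlap scoring and greedy grouping by Jaccard similarity) are simple stand-ins, not the methods used by any of the systems mentioned in this post.

```python
from collections import Counter


def tokens(text):
    """Lowercase a text and return its set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Similarity of two word sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# --- Supervised: the topics are known in advance (illustrative scoring) ---

def train(labelled_items):
    """Build one word-count profile per topic from (text, topic) pairs."""
    profiles = {}
    for text, topic in labelled_items:
        profiles.setdefault(topic, Counter()).update(tokens(text))
    return profiles


def categorise(profiles, text):
    """Return the topic whose profile has the highest total count
    for the words of the text."""
    words = tokens(text)
    return max(profiles, key=lambda topic: sum(profiles[topic][w] for w in words))


# --- Unsupervised: no topics, only a similarity criterion ---

def group(items, threshold=0.3):
    """Greedy grouping: add each item to the first group whose first
    member is similar enough, otherwise start a new group."""
    groups = []
    for text in items:
        for members in groups:
            if jaccard(tokens(text), tokens(members[0])) >= threshold:
                members.append(text)
                break
        else:
            groups.append([text])
    return groups


# Invented sample data, for illustration only.
labelled = [
    ("the team won the football match", "sport"),
    ("the striker scored two goals", "sport"),
    ("the central bank raised interest rates", "economy"),
    ("stock markets fell after the bank report", "economy"),
]
news = [
    "the team won the football match",
    "the central bank raised interest rates",
    "the football team lost the match",
    "the bank lowered interest rates",
]

profiles = train(labelled)
print(categorise(profiles, "the bank cut interest rates"))  # supervised
print(group(news))                                          # unsupervised
```

In the supervised half, the list of topics exists before any news item is seen and the model only decides which topic fits best. In the unsupervised half, the groups emerge from the similarity criterion alone, and nothing names them.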