You have to pick one of the proposed subjects and write a report at least 7500 words long.

The idea is that you begin by reading the proposed material and then search the internet for other relevant papers on the subject. With this material you have to write a brief introductory paper describing the problem and its motivation and the different approaches that exist to solve it, giving a brief explanation of each and commenting on whether some approaches are better than others or what advantages each one has over the rest. Include in your paper all the relevant bibliography that you collect while researching the subject.

Think of this coursework as if you were to write an entry on Wikipedia about the subject.

The deadline for this report is January 11th. You can either deliver a hardcopy of your report to my mailbox (office S202b, omega-K2M building) or deliver your report in electronic format by e-mail to bejar@lsi.upc.edu.

Subject 1: Cluster ensembles/consensus

The goal of cluster combination is to obtain a more accurate clustering of a dataset by combining the results of a set of clusterings. The different approaches can be embedded in the clustering process or work only with the resulting partitions.
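
As a rough illustration of an approach that works only with the resulting partitions, the sketch below builds a co-association matrix (the fraction of clusterings in which two items share a cluster) and links items that are grouped together in more than half of them. The function names `co_association` and `consensus` and the `threshold` parameter are assumptions made for this example, and this is only one of many possible combination schemes.

```python
def co_association(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


def consensus(partitions, threshold=0.5):
    """Connected components of the graph linking items whose
    co-association is above `threshold` (hypothetical parameter)."""
    co = co_association(partitions)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Three clusterings of five items; label values need not agree across runs.
partitions = [[0, 0, 1, 1, 1], [1, 1, 0, 0, 0], [0, 0, 0, 1, 1]]
print(consensus(partitions))
```

Note that the input clusterings may use different label values for the same group; the co-association matrix only looks at whether two items are together, so no label matching is needed.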

Papers

Subject 2: Graph clustering

Graph clustering is a specific area of clustering that deals with finding groups in data that can be represented as a graph. There are many applications for these algorithms, for example the analysis of sociological data, computer vision, social networks, or web page analysis.

Papers

Subject 3: Unsupervised attribute selection


Attribute selection is a preprocessing step needed in unsupervised knowledge discovery in order to reduce the number of irrelevant attributes that obscure the structure of the data.

Papers

Subject 4: Clustering of datastreams


An important problem in knowledge discovery arises when the data we have is a continuous stream. This means that the whole dataset is not available for processing at the beginning. The goal is to develop algorithms that can incrementally build a model of the data. This model has to adapt to any changes in the concepts described by the datastream.

Papers

Subject 5: Frequent trees/graphs discovery


The next step in knowledge discovery is to use structured datasets in the discovery process. A lot of data can be represented as trees or graphs; the discovery of frequent substructures aims to extend the research on association rules to structured data.

Papers

Subject 6: Clustering in bioinformatics


The particularities of the data in the bioinformatics area call for specific clustering methodologies. The data mining of DNA and proteins has yielded new problems and a new kind of clustering algorithm.

Papers

Subject 7: Clustering of documents


One of the applications of clustering algorithms is the organization of large corpora of documents. This area lies between data mining and document retrieval.

Papers

Subject 8: Parallel/Distributed Clustering


The need to cluster huge amounts of data has led to algorithms that reduce the computational cost by dividing the task. There are two different approaches: on the one hand, algorithms that use parallel processing, with multiple threads that need to communicate to maintain cluster information; on the other hand, algorithms that use the map/reduce paradigm, which merge the results of the same clustering algorithm run on different partitions of the dataset.
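
The map/reduce idea can be sketched with k-means, chosen here only as a familiar example and not as the method of any particular paper. Each partition independently computes per-centroid sums and counts (the map step), and these partial results are merged into new centroids (the reduce step). The names `map_partition`, `reduce_stats`, and `kmeans_mapreduce` are hypothetical, and the partitions are processed sequentially in one process to keep the sketch self-contained.

```python
def map_partition(points, centroids):
    """Map step: per-centroid (sum vector, count) for one data partition."""
    dim = len(centroids[0])
    stats = [([0.0] * dim, 0) for _ in centroids]
    for p in points:
        # Index of the nearest centroid by squared Euclidean distance.
        k = min(range(len(centroids)),
                key=lambda c: sum((a - b) ** 2
                                  for a, b in zip(p, centroids[c])))
        total, count = stats[k]
        stats[k] = ([t + x for t, x in zip(total, p)], count + 1)
    return stats


def reduce_stats(all_stats, centroids):
    """Reduce step: merge the partial sums and recompute each centroid."""
    new_centroids = []
    for k, old in enumerate(centroids):
        total = [0.0] * len(old)
        count = 0
        for stats in all_stats:
            s, c = stats[k]
            total = [t + x for t, x in zip(total, s)]
            count += c
        # Keep the old centroid if no point was assigned to it.
        new_centroids.append(
            tuple(t / count for t in total) if count else tuple(old))
    return new_centroids


def kmeans_mapreduce(partitions, centroids, iterations=10):
    for _ in range(iterations):
        all_stats = [map_partition(part, centroids) for part in partitions]
        centroids = reduce_stats(all_stats, centroids)
    return centroids


# Two partitions of a small 2-D dataset and two initial centroids.
partitions = [[(0.0, 0.0), (10.0, 10.0)], [(0.0, 2.0), (10.0, 12.0)]]
print(kmeans_mapreduce(partitions, [(1.0, 1.0), (9.0, 9.0)]))
```

Only the small per-centroid summaries cross partition boundaries, which is what makes the scheme attractive when the partitions live on different machines.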

Papers

Subject 9: One-class classification


Sometimes we are only interested in a model/representation of a specific class, and we either have no information about examples from other classes or have only a very small subset of them compared with the data from the target class. The goal is to obtain a model that can classify new examples, up to a confidence factor, as members or non-members of that single class.

Papers