Data Mining course page

 

 

Mario Martin

Email: mmartin@cs.upc.edu

https://www.cs.upc.edu/~mmartin/DM.htm

 

Location

Office #202

Omega building, Campus Nord

 

Office hours:

Monday: 12:00-14:00

Friday: 12:00-14:00

For other times, contact me by e-mail

 

 

Material

 

Slides:

DM1 – Supervised Learning: Concepts and Evaluation

DM2a and DM2b – Data preprocessing

DM3 – Naive Bayes and KNN

DM4 – Decision Trees

DM5 – Support Vector Machines

DM6 – Meta-Methods

DM7 – Association Rules

 

 

Laboratory:

 

Project

Guidelines (updated on 28/11/22)

 

Software

            Poll of the most used data mining tools, 2019. [Older polls: 2018, 2017 and 2016]

 

RapidMiner (latest open version) or RapidMiner Studio (latest version)

 

Anaconda Python distribution

scikit-learn

Python graph gallery

 

Scripts

Pre-processing with pandas

Python Notebook for KNN

Python Notebook for Preprocessing in KNN

Python Notebook for Naive Bayes

Python Notebooks for Decision Trees

Meta-methods demonstration in Python

SVMs notebook

Notebook explaining techniques for unbalanced datasets (updated on 18/12/22)

 

RapidMiner workflow for KNN and grid search

 

Toy data for feature selection:

            FSnormal.arff : Normal data in which only the last two features are relevant

            foo.csv : Normal data in which only the last two features are relevant

            FSbool.arff : Boolean data with a nonlinear relation among the first three features
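
As a rough illustration of what this kind of toy data is meant to exercise, the sketch below builds a synthetic stand-in rather than loading FSnormal.arff or foo.csv. The number of samples and features, the rule that generates the label, and the helper name make_toy_data are all assumptions made for the example, and univariate selection with scikit-learn's SelectKBest and f_classif is just one possible technique, not necessarily the one used in the course.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif


def make_toy_data(n_samples=500, n_features=10, seed=0):
    """Hypothetical stand-in for the toy files: normal features,
    with the class label depending only on the last two columns."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    # Assumed labelling rule: the sign of the sum of the last two features.
    y = (X[:, -2] + X[:, -1] > 0).astype(int)
    return X, y


X, y = make_toy_data()

# Score each feature independently with an ANOVA F-test and keep the top two.
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
selected = sorted(np.flatnonzero(selector.get_support()).tolist())

print("Selected feature indices:", selected)
print("F-scores:", np.round(selector.scores_, 2))
```

A univariate test like this scores one feature at a time, so it may miss features that matter only in combination, which is the situation the Boolean toy file describes.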

 

Data

UCI KDD and UCI repository

Kaggle

DrivenData  

Google Dataset search

KDnuggets

CMU StatLib

BigML

MLData

 

Other collections:

https://github.com/awesomedata/awesome-public-datasets#machinelearning

https://habr.com/en/post/452740/