Mario Martin
Email: mmartin@cs.upc.edu
https://www.cs.upc.edu/~mmartin/DM.htm
Office #202
Omega building, Campus Nord
Monday: 12:00-14:00
Friday: 12:00-14:00
For other hours, contact by e-mail
DM1 – Supervised Learning: Concepts and evaluation
DM2a and DM2b – Data preprocessing
DM3 – Naive Bayes and KNN
DM4 – Decision Trees
DM5 – Support Vector Machines
DM6 – Meta-Methods
DM7 – Association Rules
Project
Guidelines (updated on 28/11/22)
Software
Poll of most used data mining tools 2019. [Older polls: 2018, 2017 and 2016]
RapidMiner (latest open version) or RapidMiner Studio (latest version)
Scripts
Python Notebook for Preprocessing in KNN
Python Notebook for Naive Bayes
Python Notebooks for Decision Trees
Meta-methods demonstration in Python
Notebook explaining techniques for unbalanced datasets (updated on 18/12/22)
RapidMiner workflow for KNN and grid search
Toy data for feature selection:
FSnormal.arff : Normal data with only the last two features relevant
foo.csv : Normal data with only the last two features relevant
FSbool.arff : Boolean data with a nonlinear relation among the first three features
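To illustrate what these toy files are meant to exercise, here is a minimal sketch of a univariate filter for feature selection. It does not read the files above: it generates its own stand-in data (Gaussian features, with a label that depends only on the last two), and the function names, the number of features, the labelling rule and the random seed are all assumptions made for the example, not properties of FSnormal.arff or foo.csv.

```python
import random


def make_toy_data(n_rows=500, n_features=6, seed=0):
    """Hypothetical stand-in data: Gaussian features, label from the last two only."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
    # Assumed labelling rule: class 1 when the sum of the last two features is positive.
    y = [1 if row[-1] + row[-2] > 0 else 0 for row in X]
    return X, y


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between one feature column and the labels."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return abs(cov) / (var_x * var_y) ** 0.5


def rank_features(X, y):
    """Return feature indices sorted from most to least correlated with the label."""
    n_features = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


X, y = make_toy_data()
ranking = rank_features(X, y)
print("Two highest-ranked features:", sorted(ranking[:2]))
```

A per-feature filter like this works when each relevant feature is individually informative. It may miss features that matter only in combination, which is the kind of case a dataset with a nonlinear relation among features, such as FSbool.arff, is useful for probing.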
Data
Other collections :
https://github.com/awesomedata/awesome-public-datasets#machinelearning
https://habr.com/en/post/452740/