You have to implement one unsupervised learning method in Python from the list of proposed algorithms, extending the library of methods used in the course (kemlg-learn).
You should fork the library from GitHub so it is easier to integrate your code later once it is working. When implementing the algorithm you must follow the API conventions used by scikit-learn. You should also make heavy use of the numpy/scipy libraries to obtain efficient implementations.
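As an illustration of the scikit-learn API conventions mentioned above, here is a minimal clusterer skeleton in pure numpy. The class name and hyperparameters are purely illustrative (not part of kemlg-learn or scikit-learn); the point is the conventions: hyperparameters are set in `__init__`, `fit(X, y=None)` returns `self`, and learned attributes carry a trailing underscore (`labels_`, `cluster_centers_`).

```python
import numpy as np

class SimpleKMeans:
    """Illustrative sketch of a scikit-learn-style clusterer (toy k-means)."""

    def __init__(self, n_clusters=2, max_iter=100, random_state=0):
        # Convention: __init__ only stores hyperparameters, no computation.
        self.n_clusters = n_clusters
        self.max_iter = max_iter
        self.random_state = random_state

    def fit(self, X, y=None):
        # Convention: fit accepts an unused y for pipeline compatibility
        # and returns self; fitted attributes end with an underscore.
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        # Initialize centroids from randomly chosen samples.
        centers = X[rng.choice(len(X), self.n_clusters, replace=False)]
        for _ in range(self.max_iter):
            # Vectorized point-to-centroid distances via numpy broadcasting.
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            new_centers = np.array(
                [X[labels == k].mean(axis=0) if np.any(labels == k)
                 else centers[k] for k in range(self.n_clusters)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        self.cluster_centers_ = centers
        self.labels_ = labels
        return self

    def fit_predict(self, X, y=None):
        return self.fit(X).labels_
```

Usage follows the familiar pattern: `labels = SimpleKMeans(n_clusters=3).fit_predict(X)`. The vectorized distance computation (broadcasting instead of Python loops over points) is the kind of numpy usage the assignment asks for.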
No more than two students may take each option, so you must send an email to bejar@cs.upc.edu as soon as possible with the option you have picked to implement.
You can propose your own coursework if you are interested in other unsupervised algorithms; send an email or ask your professor during class.
Apart from the implementation itself, you have to write a report comparing your algorithm to other similar algorithms using generated and real datasets (for instance from the UCI repository).
This document, posted in the Racó, explains the assignment and the evaluation criteria.
To access the papers you will need to follow the instructions explained in this link.
The deadline for delivering the code and the report is June 1st (no extensions).
Option 1 (0)
- Yinyang k-means
Ding, Y.; Zhao, Y.; Shen, X.; Musuvathi, M. & Mytkowicz, T. Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup. Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, 579-587.
Option 2 (0)
- One or more of the k-means consensus clustering algorithms proposed in
Wu, J.; Liu, H.; Xiong, H.; Cao, J. & Chen, J. K-Means-Based Consensus Clustering: A Unified View. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(1), 155-169.
Option 3 (2)
- An efficient approximation to k-means for massive data
Capo, M.; Perez, A. & Lozano, J. A. An efficient approximation to the K-means clustering for massive data. Knowledge-Based Systems, 2016.
Option 4 (0)
- Scalable hierarchical clustering
Patra, B. K.; Nandi, S. & Viswanath, P. A distance based clustering method for arbitrary shaped clusters in large datasets. Pattern Recognition, 2011, 44, 2862-2870.
Option 5 (1)
- Rough DBSCAN
Viswanath, P. & Babu, V. S. Rough-DBSCAN: A fast hybrid density based clustering method for large data sets. Pattern Recognition Letters, 2009, 30, 1477-1488.
Option 6 (2)
- MinMax K-means
Tzortzis, G. & Likas, A. The MinMax k-Means clustering algorithm. Pattern Recognition, 2014, 47, 2505-2516.
Option 7 (2)
- Implement some of the methods for ensemble clustering described in
Iam-On, N. & Boongoen, T. Comparative study of matrix refinement approaches for ensemble clustering. Machine Learning, Springer US, 2015, 98, 269-300.
Option 8 (1)
- Harmonic K-means
Hamerly, G. & Elkan, C. Alternatives to the k-means algorithm that find better clusterings. Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM-02), ACM Press, 2002, 600-607.
Option 9 (0)
- Implement this extension of k-means
Cleuziou, G. An extended version of the k-means method for overlapping clustering. 19th International Conference on Pattern Recognition (ICPR 2008), 2008, 1-4.
Option 10 (2)
- Implement the quantization based clustering algorithm
Yu, Z. & Wong, H.-S. Quantization-based clustering algorithm. Pattern Recognition, 2010, 43, 2698-2711.
Option 11 (0)
- Implement the seeded and constrained K-means algorithms described in the paper
Basu, S.; Banerjee, A. & Mooney, R. J. Semi-supervised Clustering by Seeding. Proceedings of the Nineteenth International Conference on Machine Learning, 2002, 27-34.
Option 12 (1)
- Implement the I-k-means−+ clustering algorithm
Ismkhan, H. I-k-means−+: An iterative clustering algorithm based on an enhanced version of the k-means. Pattern Recognition, 2018, 79, 402-413.
Option 13 (1)
- Implement Mini-Batch Spectral Clustering
Han, Y. & Filippone, M. Mini-Batch Spectral Clustering. arXiv:1607.02024v2.
Option 14 (0)
- Implement the GAD fast clustering algorithm with some of its variations
Jin, X.; Kim, S.; Han, J.; Cao, L. & Yin, Z. GAD: General activity detection for fast clustering on large data. Proceedings of the 2009 SIAM International Conference on Data Mining, 2009, 2-13. Society for Industrial and Applied Mathematics.
Jin, X.; Kim, S.; Han, J.; Cao, L. & Yin, Z. A general framework for efficient clustering of large datasets based on activity detection. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2011, 4(1), 11-29.
Option 15 (0)
- Implement this variation of bisecting k-means
Kashef, R. & Kamel, M. S. Enhanced bisecting k-means clustering using intermediate cooperation. Pattern Recognition, 2009, 42(11), 2557-2569.
Option 16 (1)
- Implement the density canopy based K-means
Zhang, G.; Zhang, C. & Zhang, H. Improved K-means algorithm based on density Canopy. Knowledge-Based Systems, 2018, 145, 289-297.
Option 17 (1)
- Implement the Power Iteration Clustering Algorithm
Option 18 (0)
- Implement the GridDBSCAN algorithm
Mahran, S. & Mahar, K. Using grid for accelerating density-based clustering. 2008 8th IEEE International Conference on Computer and Information Technology, Sydney, NSW, 2008, 35-40.
Option 19 (0)
- Implement the FDBSCAN algorithm
Liu, B. A Fast Density-Based Clustering Algorithm for Large Databases. 2006 International Conference on Machine Learning and Cybernetics, Dalian, China, 2006, 996-1000.
Option 20 (0)
- Implement the rough*-DBSCAN or the I-DBSCAN algorithm
Luchi, D.; Rodrigues, A. L. & Varejão, F. M. Sampling approaches for applying DBSCAN to large datasets. Pattern Recognition Letters, 2019, 117, 90-96.
Option 21 (1)
- Implement the SNN clustering algorithm
Ertöz, L.; Steinbach, M. & Kumar, V. Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data. Proceedings of the 2003 SIAM International Conference on Data Mining, 2003, 47-58.
Option 22 (0)
- Implement this clustering algorithm based on density estimation
Azzalini, A. & Torelli, N. Clustering via nonparametric density estimation. Statistics and Computing, 2007, 17(1), 71-80.
Option 23 (1)
- Implement this clustering algorithm based on density peaks finding
Rodriguez, A. & Laio, A. Clustering by fast search and find of density peaks. Science, 2014, 344(6191), 1492-1496.