You have to implement one unsupervised learning method in Python from the list of proposed algorithms, extending the library of methods used in the course (kemlg-learn).

You should fork the library from GitHub so it is easier to integrate your code later once it is working. When implementing the algorithms you must follow the API conventions used by scikit-learn. You should also make heavy use of the numpy/scipy libraries to obtain efficient algorithms.
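As a rough illustration of those conventions (the class name, its parameter and the random label assignment below are placeholders, not part of kemlg-learn), a clustering estimator typically subclasses scikit-learn's BaseEstimator and ClusterMixin, stores hyperparameters in __init__ and exposes fitted attributes ending in an underscore:

    # Minimal sketch of the scikit-learn estimator contract for a clustering method.
    # The class name, its parameter and the random assignment are placeholders only.
    import numpy as np
    from sklearn.base import BaseEstimator, ClusterMixin
    from sklearn.utils import check_array

    class MyClustering(BaseEstimator, ClusterMixin):
        def __init__(self, n_clusters=8):
            # Hyperparameters are stored as-is in __init__ (no computation here).
            self.n_clusters = n_clusters

        def fit(self, X, y=None):
            X = check_array(X)
            # The real algorithm goes here; fitted attributes end with an underscore.
            rng = np.random.default_rng(0)
            self.labels_ = rng.integers(self.n_clusters, size=X.shape[0])
            return self
        # ClusterMixin already provides fit_predict(X), built on top of fit().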

  No more than two students may choose each option, so you must send an email to bejar@cs.upc.edu as soon as possible with the option you have picked to implement.

If you are interested in other unsupervised algorithms, you can propose your own coursework: send an email or ask your professor during class.

Apart from implementing the algorithm, you have to write a report comparing it to other similar algorithms using generated and real datasets (for instance from the UCI repository).
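As a rough idea of the kind of comparison expected (the specific algorithms, dataset generator and validity index below are only illustrative, not a required setup):

    from sklearn.cluster import DBSCAN, KMeans
    from sklearn.datasets import make_blobs
    from sklearn.metrics import adjusted_rand_score

    # Synthetic dataset with known labels, so an external index can be used.
    X, y_true = make_blobs(n_samples=1000, centers=4, random_state=42)

    for name, algo in [("k-means", KMeans(n_clusters=4, n_init=10)),
                       ("DBSCAN", DBSCAN(eps=0.8))]:
        labels = algo.fit_predict(X)
        print(name, adjusted_rand_score(y_true, labels))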

This document, explaining the assignment and the evaluation criteria, is posted on the Racó.

  To access the papers you will need to follow the instructions explained in this link

The deadline for delivering the code and the report is June 1st (no extensions).

Option 1 (0)

  • Yinyang k-means
Ding, Y.; Zhao, Y.; Shen, X.; Musuvathi, M. & Mytkowicz, T. Yinyang k-means: A drop-in replacement of the classic k-means with consistent speedup. Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015, 579-587

Option 2 (0)

  • One or more of the k-means consensus clustering algorithms proposed in
Wu, J.; Liu, H.; Xiong, H.; Cao, J. & Chen, J. K-Means-Based Consensus Clustering: A Unified View. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(1), 155-169

Option 3 (2)

  • An efficient approximation to K-means for massive data
Capo, M.; Perez, A. & Lozano, J. A. An efficient approximation to the K-means clustering for massive data. Knowledge-Based Systems, 2016

Option 4 (0)

  • Scalable hierarchical clustering
Patra, B. K.; Nandi, S. & Viswanath, P. A distance based clustering method for arbitrary shaped clusters in large datasets. Pattern Recognition, 2011, 44, 2862-2870

Option 5 (1)

  • Rough DBSCAN
Viswanath, P. & Babu, V. S. Rough-DBSCAN: A fast hybrid density based clustering method for large data sets. Pattern Recognition Letters, 2009, 30, 1477-1488

Option 6 (2)

  • MinMax k-means
Tzortzis, G. & Likas, A. The MinMax k-Means clustering algorithm. Pattern Recognition, 2014, 47, 2505-2516

Option 7 (2)

  • Implement some of the methods for ensemble clustering described in
Iam-On, N. & Boongoen, T. Comparative study of matrix refinement approaches for ensemble clustering. Machine Learning, 2015, 98, 269-300

Option 8 (1)

  • Harmonic k-means
Hamerly, G. & Elkan, C. Alternatives to the k-means algorithm that find better clusterings. Proceedings of the Eleventh International Conference on Information and Knowledge Management (CIKM-02), ACM Press, 2002, 600-607

Option 9 (0)

  • Implement this extension of k-means
Cleuziou, G. An extended version of the k-means method for overlapping clustering. 19th International Conference on Pattern Recognition (ICPR 2008), 2008, 1-4

Option 10 (2)

  • Implement the quantization based clustering algorithm
Yu, Z. & Wong, H.-S. Quantization-based clustering algorithm. Pattern Recognition, 2010, 43, 2698-2711

Option 11 (0)

  • Implement the seeded and constrained K-means algorithms described in the paper
Basu, S.; Banerjee, A. & Mooney, R. J. Semi-supervised Clustering by Seeding. Proceedings of the Nineteenth International Conference on Machine Learning, 2002, 27-34

Option 12 (1)

  • Implement the I-k-means−+ clustering algorithm
Ismkhan, H. I-k-means−+: An iterative clustering algorithm based on an enhanced version of the k-means. Pattern Recognition, 2018, 79, 402-413

Option 13 (1)

  • Implement Mini-Batch Spectral Clustering
Han, Y. & Filippone, M. Mini-Batch Spectral Clustering. arXiv:1607.02024v2

Option 14 (0)

  • Implement the GAD fast clustering algorithm with some of its variations
Jin, X.; Kim, S.; Han, J.; Cao, L. & Yin, Z. GAD: General activity detection for fast clustering on large data. Proceedings of the 2009 SIAM International Conference on Data Mining, 2009, 2-13

Jin, X.; Kim, S.; Han, J.; Cao, L. & Yin, Z. A general framework for efficient clustering of large datasets based on activity detection. Statistical Analysis and Data Mining: The ASA Data Science Journal, 2011, 4(1), 11-29

Option 15 (0)

  • Implement this variation of bisecting k-means
Kashef, R. & Kamel, M. S. Enhanced bisecting k-means clustering using intermediate cooperation. Pattern Recognition, 2009, 42(11), 2557-2569

Option 16 (1)

  • Implement the density canopy based K-means
Zhang, G.; Zhang, C. & Zhang, H. Improved K-means algorithm based on density Canopy. Knowledge-Based Systems, 2018, 145, 289-297

Option 17 (1)

  • Implement the Power Iteration Clustering Algorithm
Lin, F. & Cohen, W. W. Power Iteration Clustering. Proceedings of the 27th International Conference on Machine Learning (ICML-10), 2010, 655-662

Option 18 (0)

  • Implement the GridDBSCAN algorithm
Mahran, S. & Mahar, K. Using grid for accelerating density-based clustering. 2008 8th IEEE International Conference on Computer and Information Technology, Sydney, NSW, 2008, 35-40

Option 19 (0)

  • Implement the FDBSCAN algorithm
Liu, B. A Fast Density-Based Clustering Algorithm for Large Databases. 2006 International Conference on Machine Learning and Cybernetics, Dalian, China, 2006, 996-1000

Option 20 (0)

  • Implement the rough*-DBSCAN or the I-DBSCAN algorithm
Luchi, D.; Rodrigues, A. L. & Varejao, F. M. Sampling approaches for applying DBSCAN to large datasets. Pattern Recognition Letters, 2019, 117, 90-96

Option 21 (1)

  • Implement the SNN clustering algorithm
Ertöz, L.; Steinbach, M. & Kumar, V. Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data. Proceedings of the 2003 SIAM International Conference on Data Mining, 2003, 47-58

Option 22 (0)

  • Implement this clustering algorithm based on density estimation
Azzalini, A. & Torelli, N. Clustering via nonparametric density estimation. Statistics and Computing, 2007, 17(1), 71-80

Option 23 (1)

  • Implement this clustering algorithm based on density peaks finding
Rodriguez, A. & Laio, A. Clustering by fast search and find of density peaks. Science, 2014, 344(6191), 1492-1496