CLASSIX: Towards fast and scalable clustering

  • Date:
  • Time: 14:00 - 15:30
  • Address:
    Sokolovská 83, Praha
  • Room: K3
  • Speaker: Xinye Chen

Clustering is an important task in data science and machine learning, with numerous applications in domains such as bioinformatics and astronomy. In this talk, we present CLASSIX, a sorting-based density clustering method with practical improvements. The algorithm's design permits early stopping criteria and the use of BLAS routines, which makes clustering faster than with existing methods. The talk will illustrate the algorithm and its software design in Python and MATLAB, and detail the explainability of its clustering results. We demonstrate its capability on tasks from various domains, e.g., image clustering, in terms of adjusted mutual information and runtime, showing that CLASSIX compares favorably with other widely used clustering methods. Last but not least, we present the interaction between Python and MATLAB development for CLASSIX.
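
To give a rough sense of why sorting enables early stopping, the sketch below greedily groups points around starting points after sorting the data. It is a simplified illustration of the general idea, not the CLASSIX implementation: the function name `greedy_aggregate`, the choice of the first coordinate as sorting key, and the `radius` parameter are all assumptions made for this example.

```python
import math


def greedy_aggregate(points, radius):
    """Group points around starting points that lie within `radius` of them.

    Hypothetical helper for illustration only; not the CLASSIX API.
    """
    # Sort indices by the first coordinate (the sorting key is an assumption).
    order = sorted(range(len(points)), key=lambda i: points[i][0])
    labels = [-1] * len(points)
    n_groups = 0
    for pos, i in enumerate(order):
        if labels[i] != -1:
            continue  # already absorbed into an earlier group
        labels[i] = n_groups
        for k in range(pos + 1, len(order)):
            j = order[k]
            # Early stop: a gap larger than `radius` in one coordinate implies
            # a Euclidean distance larger than `radius`, and because the data
            # is sorted, the same holds for every later point.
            if points[j][0] - points[i][0] > radius:
                break
            if labels[j] == -1 and math.dist(points[i], points[j]) <= radius:
                labels[j] = n_groups
        n_groups += 1
    return labels


points = [(0.0, 0.0), (0.1, 0.1), (5.0, 5.0), (5.1, 4.9), (0.05, -0.1)]
labels = greedy_aggregate(points, radius=0.5)
print(labels)
```

In this toy data set the inner loop for the first starting point stops as soon as it reaches the points near (5, 5), so no distances to them are computed.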