PhD in Artificial Intelligence Thesis Defense by Matteo Ruffini
February 14 @ 10:30 am - 11:30 am
Date: February 14, 2019, time: 10:30 h
Room: Sala d’Actes. Building B6. Campus Nord. UPC
Title: Learning Latent Variable Models: Efficient Algorithms and Applications
Advisor: Dr. Ricard Gavaldà
Learning latent variable models is a fundamental machine learning problem, and the models belonging to this class – which include topic models, hidden Markov models, mixture models and many others – have a variety of real-world applications, like text mining, clustering and time series analysis.
For many practitioners, the decades-old Expectation Maximization method (EM) is still the tool of choice, despite its known proneness to local optima and long running times. To overcome these issues, algorithms based on the spectral method of moments have recently been proposed. These techniques recover the parameters of a latent variable model by solving – typically via tensor decomposition – a system of non-linear equations relating the low-order moments of the observable data to the parameters of the model to be learned. Moment-based algorithms are in general faster than EM, as they require a single pass over the data, and have provable guarantees of learning accuracy in polynomial time. Nevertheless, methods of moments have room for improvement: their ability to deal with real-world data is often limited by a lack of robustness to input perturbations. Also, almost no theory studies their behavior when some of the model assumptions are violated by the input data. Extending the theory of methods of moments to learn latent variable models and providing meaningful applications to real-world contexts is the focus of this thesis.
Assuming data to be generated by a certain latent variable model, the standard approach of methods of moments consists of two steps: first, finding the equations that relate the moments of the observable data to the model parameters, and then solving these equations to retrieve estimators of the parameters of the model. In Part I of this thesis we will focus on both steps, providing and analyzing novel and improved model-specific moment estimators and techniques to solve the moment equations. In both cases we will introduce theoretical results, providing guarantees on the behavior of the proposed methods, and we will perform experimental comparisons with existing algorithms. In Part II, we will analyze the behavior of methods of moments when data violates some of the model assumptions made by the user. First, we will observe that in this context most of the theoretical infrastructure underlying methods of moments is no longer valid; consequently, we will develop a theoretical foundation for methods of moments in the misspecified setting, developing efficient methods that are guaranteed to provide meaningful results even when some of the model assumptions are violated.
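As a rough illustration of the two-step recipe described above, the sketch below applies a spectral method of moments to a toy model that is not taken from the thesis: a symmetric two-component Gaussian mixture, where each observation is x = s·mu + noise, with a hidden sign s = ±1 chosen with equal probability and isotropic noise of variance sigma². Under these assumptions the second moment satisfies E[x xᵀ] = mu muᵀ + sigma² I (step one), so mu can be read off the top eigenpair of the empirical second moment, up to sign (step two). The function names `sample_symmetric_mixture` and `fit_symmetric_mixture` are hypothetical and chosen only for this example.

```python
import numpy as np


def sample_symmetric_mixture(mu, sigma, n, rng):
    """Draw n points x = s * mu + sigma * noise, with a hidden sign s = +/-1."""
    signs = rng.choice([-1.0, 1.0], size=n)            # latent variable
    noise = sigma * rng.standard_normal((n, mu.size))  # isotropic Gaussian noise
    return signs[:, None] * mu + noise


def fit_symmetric_mixture(X):
    """Estimate (mu, sigma^2) from the empirical second moment of X.

    Step 1 (moment equation): E[x x^T] = mu mu^T + sigma^2 * I.
    Step 2 (solve it): the top eigenvalue is ||mu||^2 + sigma^2 and the
    remaining d - 1 eigenvalues all equal sigma^2. Requires d >= 2.
    mu is identifiable only up to a global sign.
    """
    n, d = X.shape
    M2 = X.T @ X / n                       # one pass over the data
    evals, evecs = np.linalg.eigh(M2)      # eigenvalues in ascending order
    sigma2_hat = evals[:-1].mean()         # average of the d - 1 smallest
    top_val, top_vec = evals[-1], evecs[:, -1]
    mu_hat = np.sqrt(max(top_val - sigma2_hat, 0.0)) * top_vec
    return mu_hat, sigma2_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_mu = np.array([1.0, -0.5, 0.3, 0.0, 0.8])
    X = sample_symmetric_mixture(true_mu, sigma=1.0, n=200_000, rng=rng)
    mu_hat, sigma2_hat = fit_symmetric_mixture(X)
    print("estimated mu (up to sign):", mu_hat)
    print("estimated sigma^2:", sigma2_hat)
```

Richer models such as the single-topic model or Latent Dirichlet Allocation generally need third-order moments and a tensor decomposition in place of the single eigendecomposition used here. The toy case only shows the shape of the approach: no iterative likelihood optimization, and a single pass over the data to accumulate the moments.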
Throughout the thesis, we will apply the developed theoretical results to challenging real-world applications, focusing on two main domains: topic modeling and healthcare analytics. We will extend the existing theory of methods of moments to learn models that are traditionally used for topic modeling – like the single-topic model and Latent Dirichlet Allocation – providing improved learning techniques and comparing them with existing methods, which they are proven to outperform in terms of speed and learning accuracy. Furthermore, we will propose applications of latent variable models to the analysis of electronic healthcare records, which, similarly to text mining, are very likely to become massive datasets; we will propose a method to discover recurrent phenotypes in populations of patients and to cluster them in groups with similar clinical profiles – a task where the efficiency properties of methods of moments will constitute a competitive advantage over traditional approaches.