The Machine Learning Center at Georgia Tech invites you to a seminar by Matthias Grossglauser, an associate professor in the School of Computer and Communication Sciences at EPFL. The seminar will be at 11 a.m. on Monday, April 22 at the College of Computing (CCB), Room 345.
So Many Choices, So Little Time: Learning from Comparison and Rank Data
The digital world offers infinite possibilities, which forces us to make choices all the time: we click on a search result in a list, choose a song from iTunes, or pick a restaurant on Yelp. Because of the central role of choosing, there is an abundance of data capturing comparisons (or rankings) over alternatives. This has led to a resurgence of interest in discrete-choice models in the machine learning community. In this talk, we discuss several recent results in this context.
First, we discuss large-scale inference of the Plackett-Luce (PL) model, a widely used probabilistic choice model. We formulate a new iterative spectral algorithm that produces a sequence of estimates that provably converges to the maximum-likelihood (ML) estimator. It is considerably faster than competing approaches and enables choice models to be applied to much larger datasets than previously possible.
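To make the model concrete: under Plackett-Luce, each item carries a latent positive strength, and the probability of choosing an item from a set is its strength divided by the total strength of the set. The sketch below (item names and weights are hypothetical, and this illustrates only the generative model, not the spectral inference algorithm from the talk) samples choices accordingly:

```python
import random

def plackett_luce_choice(weights, alternatives):
    """Sample one winner from `alternatives` under the Plackett-Luce model:
    item i is chosen with probability w_i / sum of w_j over the set."""
    total = sum(weights[a] for a in alternatives)
    r = random.uniform(0, total)
    for a in alternatives:
        r -= weights[a]
        if r <= 0:
            return a
    return alternatives[-1]  # guard against floating-point round-off

# Hypothetical item strengths (higher = more often preferred).
weights = {"A": 3.0, "B": 1.0, "C": 1.0}
random.seed(0)
picks = [plackett_luce_choice(weights, ["A", "B", "C"]) for _ in range(10000)]
# "A" should win roughly 3/5 of the time, since 3.0 / (3.0 + 1.0 + 1.0) = 0.6.
```

Inference reverses this process: given many observed choices, recover the strengths that best explain them.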
Second, we consider a situation where choices are constrained: one navigates a network structure, e.g., surfing the web or navigating on a map, and is forced to make a decision at each web page or road intersection about where to go next. Crucially, we assume that only the total marginal counts for each vertex are available, so that the individual transitions (traffic on links) are not observable. This leads to a new node metric we term ChoiceRank, which measures the utility of a node in the network independently of the network structure. We show through extensive experiments with real data that ChoiceRank recovers link traffic more faithfully than alternatives such as PageRank.
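The underlying generative picture can be sketched as follows: each node has a latent strength, and outgoing traffic at a node splits among its successors in proportion to their strengths (a Plackett-Luce choice among neighbors). This is only the forward model, not the ChoiceRank inference algorithm itself, and the graph, strengths, and counts below are made up for illustration:

```python
def split_traffic(graph, strengths, out_counts):
    """Forward model behind ChoiceRank-style inference (a sketch, not the
    paper's algorithm): traffic leaving each node divides among successor
    nodes in proportion to the successors' latent strengths."""
    link_traffic = {}
    for node, succs in graph.items():
        total = sum(strengths[s] for s in succs)
        for s in succs:
            link_traffic[(node, s)] = out_counts[node] * strengths[s] / total
    return link_traffic

# Tiny hypothetical web graph: pages and their outgoing links.
graph = {"home": ["news", "shop"], "news": ["shop"]}
strengths = {"news": 2.0, "shop": 1.0}   # latent node utilities
out_counts = {"home": 300, "news": 200}  # observed per-node marginal counts
traffic = split_traffic(graph, strengths, out_counts)
# 300 departures from "home" split 2:1 between "news" and "shop".
```

ChoiceRank works in the opposite direction: given only the per-node marginal counts, it estimates the strengths, and hence the unobserved link traffic.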
Third, we revisit inference for the PL model in an active learning setting, where we can ask an oracle for comparisons and get noisy answers. Our goal is to recover the ranking accurately by asking as few questions as possible. We develop a method to learn the model parameters whose statistical performance is comparable to prior work, but whose computational efficiency is dramatically better. To show this result, we first prove that the performance of classical sorting algorithms degrades benignly under noise. The strategy is then to simply sort all the items repeatedly until the budget is exhausted, and to combine the results into an overall ranking. This strategy performs as well as state-of-the-art methods (and much better than random sampling) at a minuscule fraction of the computational cost.
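The "sort repeatedly, then combine" strategy can be sketched in a few lines. The talk does not specify which sorting algorithm or aggregation rule is used, so the binary-insertion sort and Borda-style position averaging below are stand-ins chosen for brevity:

```python
import random

def noisy_sort(items, true_scores, p_error, rng):
    """Binary-insertion sort driven by a comparison oracle that returns
    the wrong answer with probability p_error."""
    result = []
    for item in items:
        lo, hi = 0, len(result)
        while lo < hi:
            mid = (lo + hi) // 2
            correct = true_scores[item] < true_scores[result[mid]]
            answer = correct if rng.random() > p_error else not correct
            if answer:
                hi = mid
            else:
                lo = mid + 1
        result.insert(lo, item)
    return result

def aggregate(runs):
    """Combine repeated noisy sorts by average position (Borda-style)."""
    items = runs[0]
    avg = {i: sum(r.index(i) for r in runs) / len(runs) for i in items}
    return sorted(items, key=avg.get)

rng = random.Random(0)
items = list(range(8))
scores = {i: i for i in items}  # hypothetical true utilities
runs = [noisy_sort(items, scores, 0.1, rng) for _ in range(25)]
ranking = aggregate(runs)       # combined estimate of the true order
```

Each pass costs only O(n log n) comparisons, which is why the overall strategy is so much cheaper computationally than methods that deliberate over which pair to query next.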
Matthias Grossglauser is an Associate Professor in the School of Computer and Communication Sciences at EPFL. His current research interests center on machine learning and data analytics for large social systems, including stochastic models and algorithms for graph and mobility processes, and recommender systems. He is also the director of EPFL's Doctoral School in Computer and Communication Sciences.
From 2007 to 2010, he was with the Nokia Research Center (NRC) in Helsinki, Finland, serving as director of the Internet Laboratory and driving a tech-transfer program focused on applied data mining and machine learning. In addition, he served on Nokia's CEO Technology Council, a technology advisory group reporting to the CEO. Prior to this, he was an Assistant Professor at EPFL and a Principal Research Scientist in the Networking and Distributed Systems Laboratory at AT&T Research in New Jersey.
He received the 1998 Cor Baayen Award from the European Research Consortium for Informatics and Mathematics (ERCIM), and the 2006 CoNEXT/SIGCOMM Rising Star Award.