All faculty and students are invited to attend ML@GT Ph.D. student Cyrille Combette's thesis defense.
Title: Frank-Wolfe Methods for Optimization and Machine Learning
Date: Friday, April 16, 2021
Time: 9:30 am ET
Committee:
- Prof. Sebastian Pokutta, Institute of Mathematics, Technische Universität Berlin and Department for AI in Society, Science, and Technology, Zuse Institute Berlin
- Prof. Alexandre d'Aspremont, Département d'Informatique, CNRS and École Normale Supérieure
- Prof. Swati Gupta, School of Industrial and Systems Engineering, Georgia Institute of Technology
- Prof. Guanghui Lan, School of Industrial and Systems Engineering, Georgia Institute of Technology
- Prof. Arkadi S. Nemirovski, School of Industrial and Systems Engineering, Georgia Institute of Technology
Abstract:
In Chapter 2, we present the Frank-Wolfe algorithm (FW) and the necessary background material. We explain the projection-free and sparsity properties of the algorithm, motivate its use in real-world problems, and analyze its convergence rates together with a lower bound.
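As a concrete illustration of the projection-free property described above, here is a minimal sketch of the vanilla Frank-Wolfe algorithm (not code from the thesis): each iteration calls only a linear minimization oracle (LMO) over the feasible set, never a projection, and the iterate remains a convex combination of the vertices returned so far. The toy problem and step size 2/(t+2) are the standard textbook choices.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, T=200):
    """Vanilla Frank-Wolfe: projection-free; after T iterations the
    iterate is a convex combination of at most T + 1 vertices."""
    x = x0.astype(float)
    for t in range(T):
        g = grad(x)
        v = lmo(g)                       # linear minimization oracle
        gamma = 2.0 / (t + 2)            # standard open-loop step size
        x = (1 - gamma) * x + gamma * v  # convex update: stays feasible
    return x

# Toy example: minimize 0.5 * ||x - b||^2 over the probability simplex.
b = np.array([0.2, 0.5, 0.3])
grad = lambda x: x - b
lmo = lambda g: np.eye(len(g))[np.argmin(g)]  # best simplex vertex for <g, v>
x_star = frank_wolfe(grad, lmo, np.array([1.0, 0.0, 0.0]))
```

Note that the update never leaves the simplex, so no projection step is ever needed; this is the property the chapter develops.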
In Chapter 3, we review the complexity bounds of linear minimizations and projections on several sets commonly used in optimization, providing rigorous support for the use of FW. We also propose two methods, one for projecting onto the lp-ball with p in (1, 2) ∪ (2, +∞) and one for projecting onto the Birkhoff polytope, and we analyze their complexity. Computational experiments on the l1-ball and the nuclear norm-ball are presented.
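To make the comparison between the two oracle types concrete, here is a sketch (not the thesis's methods) contrasting them on the l1-ball: linear minimization is a single O(n) pass, while Euclidean projection uses the classical sort-and-threshold scheme in O(n log n).

```python
import numpy as np

def lmo_l1(g, r=1.0):
    """Linear minimization over the l1-ball: argmin_{||v||_1 <= r} <g, v>.
    One pass, O(n): put all the mass on the largest-|g_i| coordinate."""
    i = np.argmax(np.abs(g))
    v = np.zeros_like(g, dtype=float)
    v[i] = -r * np.sign(g[i])
    return v

def project_l1(y, r=1.0):
    """Euclidean projection onto the l1-ball by sorting and soft
    thresholding, O(n log n)."""
    if np.abs(y).sum() <= r:
        return y.astype(float)
    u = np.sort(np.abs(y))[::-1]                 # sorted magnitudes, descending
    css = np.cumsum(u)
    k = np.nonzero(u * np.arange(1, len(y) + 1) > css - r)[0][-1]
    theta = (css[k] - r) / (k + 1)               # soft-threshold level
    return np.sign(y) * np.maximum(np.abs(y) - theta, 0.0)
```

The gap between O(n) and O(n log n) is mild here; on sets such as the nuclear norm-ball, where linear minimization needs only a top singular pair while projection needs a full SVD, the gap is far larger, which is the kind of comparison the chapter quantifies.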
In Chapter 4, we address a well-known drawback of FW: a zig-zagging phenomenon that slows down the algorithm. In response, we propose a boosting procedure that generates a descent direction better aligned with the negative gradient while preserving the projection-free property. Although the method is relatively simple and intuitive, it provides significant computational speedups over the state of the art on a variety of experiments.
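The boosting idea can be sketched as follows; this is a hedged simplification, not the exact procedure or parameters from the thesis. Instead of taking a single FW direction v - x, one greedily accumulates several such directions into d, each time reducing the residual between d and the target -g, so that d aligns more closely with the negative gradient while still only calling the LMO.

```python
import numpy as np

def boosted_direction(g, lmo, x, K=5, delta=1e-3):
    """Sketch of a boosting step (simplified, illustrative): build d as a
    nonnegative combination of FW directions v - x aligned with -g."""
    d = np.zeros_like(x)
    for _ in range(K):
        r = -g - d                       # residual toward the target -g
        v = lmo(-r)                      # vertex most aligned with r
        u = v - x
        lam = (r @ u) / (u @ u)
        if lam <= 0:
            break
        d_new = d + lam * u
        # accept only if the misalignment strictly decreases
        if np.linalg.norm(-g - d_new) < np.linalg.norm(-g - d) - delta:
            d = d_new
        else:
            break
    return d

# Toy example over the probability simplex.
lmo = lambda c: np.eye(len(c))[np.argmin(c)]
x = np.full(3, 1.0 / 3.0)
g = np.array([-0.3667, 0.1333, 0.2333])  # hypothetical gradient at x
d = boosted_direction(g, lmo, x)
```

In the actual algorithm a step is then taken along the resulting direction; the acceptance threshold and the number of rounds K used here are illustrative placeholders.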
In Chapter 5, we address the large-scale finite-sum optimization problem arising in many machine learning tasks. Based on a sliding technique, we propose a generic template for integrating adaptive gradients into stochastic Frank-Wolfe algorithms in a practical way. Computational experiments on standard convex optimization problems and on the nonconvex training of neural networks demonstrate that the blend of the two methods is successful.
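For context, here is a minimal sketch of a plain stochastic Frank-Wolfe baseline for a finite sum f = (1/n) Σ f_i: the full gradient is replaced by a minibatch estimate while the LMO step is kept. This is only the generic starting point; the thesis's contribution, integrating adaptive gradients via a sliding technique, is not reproduced here.

```python
import numpy as np

def stochastic_fw(grads, lmo, x0, T=400, batch=8, rng=None):
    """Sketch of stochastic Frank-Wolfe on a finite sum: a minibatch
    gradient estimate replaces the full gradient; the step is unchanged."""
    rng = rng or np.random.default_rng(0)
    n = len(grads)
    x = x0.astype(float)
    for t in range(T):
        idx = rng.choice(n, size=batch)                   # sample components
        g = np.mean([grads[i](x) for i in idx], axis=0)   # minibatch gradient
        v = lmo(g)
        gamma = 2.0 / (t + 2)
        x = (1 - gamma) * x + gamma * v
    return x

# Toy finite sum: f_i(x) = 0.5 * ||x - b_i||^2 over the probability simplex.
b_list = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3],
                   [0.4, 0.2, 0.4], [0.2, 0.4, 0.4]])
grads = [lambda x, bi=bi: x - bi for bi in b_list]
lmo = lambda g: np.eye(len(g))[np.argmin(g)]
x_sfw = stochastic_fw(grads, lmo, np.array([1.0, 0.0, 0.0]))
```

The noisy gradient makes the LMO occasionally pick the wrong vertex, which is one reason combining such methods with adaptive gradient scalings, as the chapter does, is delicate in practice.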
Both developments in Chapters 4 and 5 are motivated by the projection-free property of FW. In Chapter 6, we turn to the natural sparsity of the iterates generated by FW and study an application to the approximate Carathéodory problem. We show that FW generates a simple solution to the problem and that, with no modification of the algorithm, better cardinality bounds can be established using existing convergence analyses of FW in different scenarios. We also consider a nonsmooth variant of FW.
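The connection to the approximate Carathéodory problem can be sketched concretely (a minimal illustration, not the thesis's analysis): running FW on f(x) = 0.5 ||x - b||^2 over the convex hull of a vertex set yields, after T iterations, a convex combination of at most T + 1 vertices approximating b, and tracking the convex weights along the way makes the sparse representation explicit.

```python
import numpy as np

def approx_caratheodory(b, vertices, T=200):
    """Sketch: FW on f(x) = 0.5 * ||x - b||^2 over conv(vertices).
    Returns the iterate and its convex weights over the vertex set;
    at most T + 1 weights can be nonzero."""
    V = np.asarray(vertices, dtype=float)
    weights = np.zeros(len(V))
    weights[0] = 1.0                    # start at the first vertex
    x = V[0].copy()
    for t in range(T):
        g = x - b                       # gradient of 0.5 * ||x - b||^2
        i = np.argmin(V @ g)            # LMO over the finite vertex set
        gamma = 2.0 / (t + 2)
        weights *= (1 - gamma)          # shrink old weights...
        weights[i] += gamma             # ...and add mass on the new vertex
        x = (1 - gamma) * x + gamma * V[i]
    return x, weights

# Toy example: b lies in the convex hull of the standard basis vectors.
b = np.array([0.4, 0.3, 0.2, 0.1])
verts = np.eye(4)
x_hat, w = approx_caratheodory(b, verts)
```

The cardinality bound here comes for free from the iteration count; the chapter shows that sharper bounds follow from existing convergence analyses of FW in various scenarios, with no change to the algorithm.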
In Chapter 7, we continue with the sparsity property and consider an extension of the Frank-Wolfe algorithm to the unconstrained setting. It addresses smooth convex optimization problems over the linear span of a given set and resembles the matching pursuit algorithm. We propose a blending method that combines fast convergence with high sparsity of the iterates. Computational experiments validate the effectiveness of our method.
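For intuition about the matching-pursuit flavor of this setting, here is a minimal sketch (not the thesis's blending method): to minimize a smooth convex f over the linear span of an atom set, each step selects the atom most correlated with the gradient and moves along it, so the iterate stays sparse in the atoms. The smoothness constant L and the toy quadratic are illustrative assumptions.

```python
import numpy as np

def matching_pursuit(grad, L, atoms, z0, T=100):
    """Sketch of generalized matching pursuit over the linear span of
    `atoms` (rows): pick the atom most correlated with the gradient,
    then take a short step along it (unconstrained update)."""
    z = z0.astype(float)
    for _ in range(T):
        g = grad(z)
        scores = atoms @ g
        i = np.argmax(np.abs(scores))             # most correlated atom
        d = atoms[i]
        z = z - (scores[i] / (L * (d @ d))) * d   # step along that atom
    return z

# Toy example: f(z) = 0.5 * ||z - b||^2 (L = 1) with the standard basis
# as an orthonormal dictionary, so each step fixes one coordinate exactly.
b = np.array([1.0, -2.0, 0.5])
atoms = np.eye(3)
z = matching_pursuit(lambda z: z - b, 1.0, atoms, np.zeros(3), T=10)
```

Unlike constrained FW, the update is a signed, unnormalized step along the atom, since the feasible region is a linear span rather than a convex hull; the chapter's blending method trades off this fast progress against the sparsity of the iterates.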