Is manifold learning for toy data only?

Marina Meila, University of Washington
Fine Hall 214

Manifold learning algorithms aim to uncover low-dimensional parametrizations of data using either local or global features.
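For concreteness, here is a minimal sketch of such a parametrization using scikit-learn's spectral embedding (an illustration only, not the software discussed in the talk): a noisy two-dimensional surface sampled in three dimensions is mapped to two coordinates using only local neighborhood information.

    # Illustration with scikit-learn (an assumption of this sketch, not the talk's software).
    import numpy as np
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import SpectralEmbedding

    # Sample a 2D manifold (the "swiss roll") embedded in 3D.
    X, t = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)

    # A local, spectral method: the embedding is built from a k-nearest-neighbor graph.
    Y = SpectralEmbedding(n_components=2, n_neighbors=12).fit_transform(X)
    # Y now holds a 2-dimensional parametrization of the data.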

One apparent drawback of manifold learning is that the low-dimensional parametrizations it produces typically distort the geometric properties of the original data, such as distances and angles. These unpredictable, algorithm-dependent distortions make it unsafe to pipeline the output of a manifold learning algorithm into other data analysis algorithms, limiting the use of these techniques in engineering and the sciences.

Moreover, accurate manifold learning typically requires very large sample sizes, yet most existing implementations are not scalable.

This talk will show how both limitations can be overcome. I will present a statistically grounded methodology to estimate, and then cancel out, the distortions introduced by a manifold learning algorithm, thus effectively preserving the distances in the original data. The method builds on the relationship between the Laplace-Beltrami operator and the Riemannian metric on a manifold, and it can be taken further to estimate embedding parameters in a data-driven fashion. On the computational side, I will demonstrate that, with careful use of sparse data structures, manifold learning can scale to data sets in the millions.
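The metric-estimation step can be sketched as follows. This is a hedged reconstruction from the standard identity relating the Laplace-Beltrami operator to gradients, h_ij = 1/2 [Delta(f_i f_j) - f_i Delta(f_j) - f_j Delta(f_i)], not the authors' code; the naive random-walk Laplacian below is a stand-in for the consistent, renormalized Laplacian estimator that a careful implementation would use.

    # A sketch under stated assumptions: L approximates the Laplace-Beltrami
    # operator only up to a positive scale (the kernel bandwidth), so the
    # recovered metric is correct up to that scale. Not the authors' code.
    import numpy as np
    from scipy.sparse import identity
    from sklearn.datasets import make_swiss_roll
    from sklearn.manifold import SpectralEmbedding
    from sklearn.neighbors import kneighbors_graph

    def estimate_riemannian_metric(Y, L):
        """Y: (n, s) embedding coordinates; L: (n, n) sparse graph Laplacian.
        Returns G: (n, s, s), the estimated metric at each data point."""
        n, s = Y.shape
        H = np.empty((n, s, s))              # dual metric, one s x s matrix per point
        LY = L @ Y                           # Laplacian applied to each coordinate
        for i in range(s):
            for j in range(i, s):
                # h_ij = 1/2 [ L(y_i y_j) - y_i L(y_j) - y_j L(y_i) ]
                hij = 0.5 * (L @ (Y[:, i] * Y[:, j])
                             - Y[:, i] * LY[:, j]
                             - Y[:, j] * LY[:, i])
                H[:, i, j] = H[:, j, i] = hij
        return np.linalg.pinv(H)             # pointwise pseudo-inverse of the dual metric

    # Usage sketch with a naive random-walk Laplacian.
    X, _ = make_swiss_roll(n_samples=2000, noise=0.05, random_state=0)
    Y = SpectralEmbedding(n_components=2, n_neighbors=12).fit_transform(X)
    W = kneighbors_graph(X, n_neighbors=12, mode='connectivity')
    W = 0.5 * (W + W.T)                      # symmetrize the kNN graph
    d = np.asarray(W.sum(axis=1)).ravel()
    P = W.multiply(1.0 / d[:, None]).tocsr() # random-walk transition matrix
    L = P - identity(W.shape[0])             # sign chosen so the dual metric is PSD
    G = estimate_riemannian_metric(Y, L)     # G[k] corrects distances/angles at point k

Note that every heavy operation above acts on a sparse n x n matrix, which is the kind of sparse data structure that makes scaling to very large sample sizes plausible.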

Joint work with Dominique Perrault-Joncas, James McQueen, Jacob VanderPlas, Zhongyue Zhang, Yu-Chia Chen, Grace Telford