Active and Semi-Supervised Learning Theory

Rob Nowak, University of Wisconsin, Madison
Fine Hall 214

Science is arguably the pinnacle of human intellectual achievement, yet the scientific discovery process itself remains an art. Human intuition and experience are still the driving force of the high-level discovery process: we determine which hypotheses and theories to entertain, which experiments to conduct, how data should be interpreted, when hypotheses should be abandoned, and so on. Meanwhile, machines are limited to low-level tasks such as gathering and processing data. A grand challenge for scientific discovery in the 21st century is to devise machines that directly participate in the high-level discovery process. Toward this grand challenge, we must formally characterize the limits of machine learning. Statistical learning theory is usually based on supervised training, wherein a learning algorithm is presented with a finite set of i.i.d. labeled training examples. However, modern experimental methods often generate very large amounts of unlabeled data at very little expense, while the task of labeling data is often painstaking and costly. Machine learning methods must leverage the abundance of unlabeled data in scientific problem domains. Active learning (AL) and semi-supervised learning (SSL) are two well-known approaches to exploiting unlabeled data. In both paradigms, one has access to a large pool of unlabeled examples, and only a few labeled examples are provided or selected. AL is a sequential feedback process: unlabeled examples that are predicted to have very informative labels, based on previously gathered labeled and unlabeled data, are selected for labeling. In SSL, labeled examples are provided at random, without regard to their potential informativeness. Today, little is known about the theoretical limits of AL and SSL performance.
Sparsity and complexity of the underlying data-generating distributions appear to play a central role in the performance of AL and SSL, and this talk will discuss some of the known theoretical results. This work is joint with Rui Castro, Aarti Singh, and Jerry Zhu.