# Seminars & Events for PACM IDeAS

##### Kernel Fusion for Manifold Learning and Signal Processing

Kernel-based methods are useful for various machine learning tasks. A kernel is a symmetrical positive definite function constructed on the graph of the data. Spectral analysis of the kernel can lead to an efficient representation. Such representation enables to reduce the size (dimension) of objects in complex datasets while preserving the coherence of the original data, such that clustering, classification, manifold learning and many other data analysis tasks can be applied in the reduced space. In this talk, I will describe two frameworks for fusing kernels from multiple views to extract a reliable, consistent representation from high-dimensional data sets. A new method for setting the kernel’s bandwidth will be presented, as well as applications for manifold learning, clustering, classification and detection of seismic events.

##### From Integrability to Medical Imaging and to the Asymtotics of the Riemann Zeta Function

It is often realized that this technique can actually be used for the solution of a plethora of other problems,and thus it becomes a mathematical method.In this lecture, a review will be presented of how a concrete problem in the area of integrability led to the development of a new method in mathematical physics for analyzing boundary value problems for linear and for integrable nonlinear PDEs,called the "Unified Transform". This method has been acclaimed by the late Israel Gelfand as "the most important development on the exact analysis of PDE since the work of the classics in the 18th century." Remarkable connections with the development of several effective algorithms for Medical Imaging,and with the Riemann hypothesis will also be reviewed.

##### State-Of-The Art Machine Learning Algorithms and How They Are Affected By Near-Term Technology Trends

Industry and Wall Street projections indicate that Machine Learning will touch every piece of data in the data center by 2020. This has created a technology arms race and algorithmic competition as IBM, NVIDIA, Intel, and ARM strive to dominate the retooling of the computer industry to support ubiquitous machine learning workloads over the next 3-4 years. Similarly, algorithm designers compete to create faster and more accurate training and inference techniques that can address complex problems spanning speech, image recognition, image tagging, self-driving cars, data analytics and more.

##### Clique-based semidefinite relaxation of the quadratic assignment problem

The matching problem between two adjacency matrices, A and B, can be formulated as the NP-hard quadratic assignment problem (QAP). While the QAP admits a semidefinite (SDP) relaxation that is often tight in practice, this SDP scales badly as it involves a matrix variable of size n^2 by n^2. To achieve a speed up, a further relaxation of the SDP will be described, where the number of variables scale as O(bn^2), where b is the number of non-zero entries in B. The dual problem of this relaxation has a natural three-block structure that can be solved via Alternating Direction Method of Multipliers (ADMM) in a distributed manner.

##### Hermite interpolation and approximation in manifolds

In this talk we study the Hermite interpolation and approximation problem. It aims at producing a function together with its derivatives, which interpolate or approximate given discrete point-vector data. The classical Hermite method interpolates data in linear spaces using polynomial functions. We are interested in interpolating or approximating manifold-valued point-vector data using curves defined solely by the intrinsic geometry of the underlying manifold. For this purpose we use iterative refinement algorithms, called Hermite subdivision schemes. These algorithms successively refine given point-vector data and, via a limit process, produce a function and its derivatives solving the Hermite problem.

##### Nonlinear Fourier series via Blaschke products

**Please note different start time: 1:30. **Classical Fourier series may be interpreted as repeatedly adding and removing roots in the origin of the complex plane. A natural modification, proposed by Coifman in the 1990s, is to remove all roots inside the unit disk - this can be done without having to find the roots and gives rise to a nonlinear form of Fourier series with many curious properties (among them: fast convergence). We explain convergence in $L^2$, convergence control quantitative control in Sobolev spaces and describe some of the nonlinear processes and applications to real-life data. This is joint work with Raphy Coifman (Yale) and Hau-tieng Wu (Toronto).

##### Non-Convex Phase Retrieval from STFT Measurements

The problem of recovering a one-dimensional signal from its Fourier transform magnitude, called phase retrieval, is ill-posed in most cases. We consider the closely-related problem of recovering a signal from its phaseless short-time Fourier transform (STFT) measurements. This problem arises naturally in several applications, such as ultra-short pulse characterization and ptychography. We suggest to recover the signal by a gradient algorithm, minimizing a non-convex loss function. The algorithm is initialized by the leading eigenvector of a designed matrix. We show that under appropriate conditions, this initialization is close to the underlying signal. We analyze the geometry of the loss function and show empirically that the gradient algorithm converges to the underlying signal even with small redundancy in the measurements.

##### Alternating projections for phase retrieval with random sensing vectors

Phase retrieval problems are a subclass of low-rank matrix recovery problems, that have been studied for a long time because of their important applications in physics. They have traditionally been solved with non-convex algorithms, that came with no theoretical convergence guarantees, and were known to sometimes get stuck in local optima. In the past few years, new algorithms have been proposed, that provably reconstruct the true solution with high probability for random instances of phase retrieval problems. We will show that, in this "random" setting, the most well-known traditional algorithm actually enjoys essentially the same convergence guarantees as the newer methods, provided that it is preceded by a suitable initialization procedure. We will discuss the importance of the initialization.

##### Nonconvex Recovery of Low-Complexity Models

Nonconvex optimization plays important role in wide range of areas of science and engineering — from learning feature representations for visual classification, to reconstructing images in biology, medicine and astronomy, to disentangling spikes from multiple neurons. The worst case theory for nonconvex optimization is dismal: in general, even guaranteeing a local minimum is NP hard. However, in these and other applications, very simple iterative methods such as gradient descent often perform surprisingly well. In this talk, I will discuss examples of nonconvex optimization problems that can be solved to global optimality using simple iterative methods, which succeed independent of initialization.

##### Synchronization over Cartan motion groups via contraction

The mathematical problem of group synchronization deals with the question of how to estimate unknown set of group elements from a set of their mutual relations. This problem appears as an important step in solving many real world problem such as Structure from Motion (SfM) in vision or pose graph optimization (estimating positions and orientations of mobile robots) in robotics. In this talk, we present a novel solution for synchronization over the class of the non-compact Cartan motion groups, which includes the special important case of rigid motions. Our method is based upon the idea of group contraction, which is a compactification process origin in relativistic mechanics.

##### Rapid solution of the cryo-EM reconstruction problem by frequency marching

Determining the three-dimensional structure of proteins and protein complexes at atomic resolution is a fundamental task in structural biology. Over the last decade, remarkable progress has been made using ``single particle" cryo-electron microscopy (cryo-EM) for this purpose. The reconstruction of a high-resolution image from cryo-EM data is typically formulated as a nonlinear, non-convex optimization problem for hundreds of thousands of unknown parameters. This leads to a very CPU-intensive task---limiting both the number of particle images which can be processed and the number of independent reconstructions which can be carried out for the purpose of statistical validation.

##### A Taste of Julia for Scientific Computing

Bridging cultures that have often been distant, Julia combines expertise from the diverse fields of computer science and computational science to create a new approach to numerical computing. Julia is designed to be easy and fast and questions notions generally held to be “laws of nature” by practitioners of numerical computing: 1. High-level dynamic programs have to be slow. 2. One must prototype in one language and then rewrite in another language for speed or deployment. 3. There are parts of a system appropriate for the programmer, and other parts that are best left untouched as they have been built by the experts. This talk will be an introduction to the Julia programming language and its design—a dance between specialization and abstraction. Specialization allows for custom treatment.

##### Structure-Aware Dictionary Learning for Graph Signals

**Please note different day (Friday). ** Dictionary Learning techniques aim to find sparse signal representations that capture prominent characteristics in the given data. For signals residing on non-Euclidean topologies, represented by weighted graphs, an additional challenge is incorporating the underlying geometric structure of the data domain into the learning process. We introduce an approach that aims to infer and preserve the local intrinsic geometry in both dimensions of the data. Combining ideas from spectral graph theory, manifold learning and sparse representations, our proposed algorithm simultaneously takes into account the underlying graph topology as well as the data manifold structure.

##### The effectiveness of nonconvex optimization in two problems

Many information and data science applications involve computation of the maximum likelihood estimates (MLE) over a high-dimensional space. Unfortunately, most MLEs are solutions to non-convex problems, which are in general intractable. Fortunately, some of the problems are not as hard as they may seem, and can be solved by optimizing the nonconvex objectives directly. This talk illustrates this observation through two problems: (1) solving a random quadratic system of equations, which spans many applications ranging from the century-old phase retrieval problem to various latent-variable models in machine learning; (2) recovering a collection of discrete variables from noisy pairwise differences, which arises when one wishes to jointly align multiple images or to retrieve the genome phases from paired sequencing reads.

##### Incorporation of Geometry into Learning Algorithms and Medicine

This talk focuses on two instances in which scientific fields outside mathematics benefit from incorporating the geometry of the data. In each instance, the applications area motivates the need for new mathematical approaches and algorithms, and leads to interesting new questions. (1) The current empirical success of deep learning in imaging and medical applications, in which theory and understanding is lagging far behind. By assuming the data lies near low dimensional manifolds and building local wavelet frames, we improve on existing theory that breaks down when the ambient dimension is large (the regime in which deep learning has seen the most success). (2) A method to determine and predict personalized drug treatment effectiveness for patients based off their baseline information.

##### Applications of Noncommutative Harmonic Analysis in Engineering and the Applied Sciences

This talk will focus more on both the mathematics and the structural biology.

##### Matrix Optimal Mass Transport: a Quantum Mechanical Approach

**Please note special day and location (Thursday, McDonnell 102A). **Optimal mass transport (OMT) is a rich area of research with applications to numerous disciplines including econometrics, fluid dynamics, statistical physics, shape optimization, expert systems, and meteorology. The problem was originally formulated on the space of scalar probability densities. In the present talk, we describe a non-commutative generalization of OMT, to the space of Hermitian matrices with trace one, and to the space of matrix-valued probability densities. Our approach follows a fluid dynamics formulation of OMT, and utilizes certain results from the quantum mechanics of open systems, in particular the Lindblad equation. The non-commutative OMT introduces a natural distance metric for matrix-valued distributions.

##### Demixing Sines and Spikes: Spectral Super-resolution in the Presence of Outliers

In this talk we consider the problem of super-resolving the line spectrum of a multisinusoidal signal from a finite number of samples, some of which may be completely corrupted. Measurements of this form can be modeled as an additive mixture of a sinusoidal and a sparse component. We propose to demix the two components and super-resolve the spectrum of the multisinusoidal signal by solving a convex program. Our main theoretical result is that-- up to logarithmic factors-- this approach is guaranteed to be successful with high probability for a number of spectral lines that is linear in the number of measurements, even if a constant fraction of the data are outliers.

##### A random walk on the upper triangular matrices

We study the following random walk on the group of n n upper triangular matrices with coecients in Z=pZ and ones along the diagonal. Starting at the identity, at each step we choose a row at random and either add it to or subtract it from the row immediately above. The mixing time of this walk is conjectured to be asymptotically n2p2. While much has been proven in this direction by a number of authors, the full conjecture remains open. We sharpen the techniques introduced by Arias-Castro, Diaconis, and Stanley to show that the dependence on p of the mixing time is p2. To prove this result, we use super-character theory and comparison theory to bound the eigenvalues of this random walk.

##### Estimating the Covariance Spectrum

**Please note special day (Tuesday).** Suppose one wishes to accurately recover the set of eigenvalues of the covariance matrix of a distribution, given access to samples from the distribution. To what extent can this set be accurately approximated given an amount of data that is sublinear in the dimension of the space? The naive empirical estimate has poor performance when the number of samples is linear or sub-linear in the dimension of the data. In the "large sample, large dimension" setting, we proposed an efficient and information theoretically near-optimal algorithm to learn the moments of the covariance spectral distribution. Further we show that given the first k moments of a distribution, we can pin down the distribution in Earthmover distance up to an error of O(1/k).