Mihai Cucuringu
 

Princeton University

Applied and Computational Mathematics


[Homepage]      [Research]    [CV]      [Personal]      [Links]      


 (1) M. Cucuringu, "Synchronization over Z2 and community detection in bipartite networks", (in progress)

Finding group elements from noisy measurements of their ratios is also known as the synchronization problem. The eigenvector method was first introduced for solving the synchronization problem over the group SO(2) of planar rotations. The usefulness of synchronization over the group Z2 (SYNC(Z2)) has been demonstrated in recent algorithms for localization of sensor networks and three-dimensional structuring of molecules. The contribution of our paper is threefold. First, we give an analysis of the eigenvector method for SYNC(Z2), when the underlying graph of pairwise measurements is the complete graph or the Erdos-Renyi random graph, and use tools from random matrix theory to analyze the robustness to noise. Second, we propose a message passing decoding algorithm that solves SYNC(Z2), that outperforms the existing eigenvector synchronization algorithm for certain classes of graphs and noise models. Third, we introduce and compare several algorithms (based on spectral and semidefinite programming relaxations) for the synchronization problem over Z2 when one has the additional information that subsets of nodes represent the same (unknown) group element. In other words, all nodes within a given subset represent the same group element (i.e. should all be +1 or all -1), and there may exist multiple such subsets. We apply the synchronization methods to the U.S. Congress data set of roll call voting patterns in the U.S. Senate across time, to identify the two existing communities, i.e. the Democratic and Republican parties. We discuss several related open problems and other possible applications.

 (2) M. Cucuringu, A. Singer, D. Cowburn, " Synchronization, Graph Rigidity and the Molecule Problem", (submitted), arXiv:1111.3304 (2011), (arXiv)

The graph realization problem has received a great deal of attention in recent years, due to its importance in applications such as wireless sensor networks and structural biology. In this paper, we extend on previous work and propose the 3D-ASAP algorithm, for the graph realization problem in $\mathbb{R}^3$, given a sparse and noisy set of distance measurements. 3D-ASAP is a divide and conquer, non-incremental and non-iterative algorithm, which integrates local distance information into a global structure determination. Our approach starts with identifying, for every node, a subgraph of its 1-hop neighborhood graph, which can be accurately embedded in its own coordinate system. In the noise-free case, the computed coordinates of the sensors in each patch must agree with their global positioning up to some unknown rigid motion, that is, up to translation, rotation and possibly reflection. In other words, to every patch there corresponds an element of the Euclidean group Euc(3) of rigid transformations in $\mathbb{R}^3$, and the goal is to estimate the group elements that will properly align all the patches in a globally consistent way. Furthermore, 3D-ASAP successfully incorporates information specific to the molecule problem in structural biology, in particular information on known substructures and their orientation. In addition, we also propose 3D-SP-ASAP, a faster version of 3D-ASAP, which uses a spectral partitioning algorithm as a preprocessing step for dividing the initial graph into smaller subgraphs. Our extensive numerical simulations show that 3D-ASAP and 3D-SP-ASAP are very robust to high levels of noise in the measured distances and to sparse connectivity in the measurement graph, and compare favorably to similar state-of-the art localization algorithms.

Keywords: Graph realization, distance geometry, eigenvectors, synchronization, spectral graph theory, rigidity theory, SDP, the molecule problem, divide-and-conquer.


 (3) M. Cucuringu, M. W. Mahoney, " Localization on low-order eigenvectors of data matrices", Technical Report, (submitted), arXiv:1109.1355 (2011) (arXiv)

Eigenvector localization refers to the situation when most of the components of an eigenvector are zero or near-zero. This phenomenon has been observed on eigenvectors associated with extremal eigenvalues, and in many of those cases it can be meaningfully interpreted in terms of "structural heterogeneities" in the data. For example, the largest eigenvectors of adjacency matrices of large complex networks often have most of their mass localized on high-degree nodes; and the smallest eigenvectors of the Laplacians of such networks are often localized on small but meaningful community-like sets of nodes. Here, we describe localization associated with low-order eigenvectors, i.e., eigenvectors corresponding to eigenvalues that are not extremal but that are \buried" further down in the spectrum. Although we have observed it in several unrelated applications, this phenomenon of low-order eigenvector localization defies common intuitions and simple explanations, and it creates serious difficulties for the applicability of popular eigenvector-based machine learning and data analysis tools. After describing two examples where low-order eigenvector localization arises, we present a very simple model that qualitatively reproduces several of the empirically-observed results. This model suggests certain coarse structural similarities among the seemingly-unrelated applications where we have observed low-order eigenvector localization, and it may be used as a diagnostic tool to help extract insight from data graphs when such low-order eigenvector localization is present.


 (4) M. Cucuringu, V. Blondel, P. Van Dooren, " Extracting spatial information from networks with low-order eigenvectors", (submitted), arXiv:1111.0920, (arXiv)

We consider the problem of inferring meaningful spatial information in networks from incomplete information on the connection intensity between the nodes of the network. We consider two spatially distributed networks: a population migration flow network within the US, and a network of mobile phone calls between cities in Belgium. For both networks we use the eigenvectors of the Laplacian matrix constructed from the link intensities to obtain informative visualizations and capture natural geographical subdivisions. We observe that some low order eigenvectors localize very well and seem to reveal small geographically cohesive regions that match remarkably well with political and administrative boundaries. We discuss possible explanations for this observation by describing diffusion maps and localized eigenfunctions. In addition, we discuss a possible connection with the weighted graph cut problem, and provide numerical evidence supporting the idea that lower order eigenvectors point out local cuts in the network. However, we do not provide a formal and rigorous justification for our observations.

Keywords: Eigenvector localization, diffusion maps, visualization of large data sets, social-economic networks, community detection, gravity model.


 (5) M. Cucuringu, J. Puente, and D. Shue, " Model Selection in Undirected Graphical Models with Elastic Net ", Technical Report, 2010, arXiv: 1111.0559 (arXiv)

Structure learning in random fields has attracted considerable attention due to its difficulty and importance in areas such as remote sensing, computational biology, natural language processing, protein networks, and social network analysis. We consider the problem of estimating the probabilistic graph structure associated with a Gaussian Markov Random Field (GMRF), the Ising model and the Potts model, by extending previous work on l1 regularized neighborhood estimation to include the elastic net l1 + l2 penalty. Additionally, we show numerical evidence that the edge density plays a role in the graph recovery process. Finally, we introduce a novel method for augmenting neighborhood estimation by leveraging pair-wise neighborhood union estimates.
 (6) F. Blanchet-Sadri, M. Cucuringu, "Counting primitive partial words", Journal of Automata, Languages and Combinatorics, accepted for publication.
(Chapter 6 in book by F. Blanchet-Sadri, "Algorithmic Combinatorics on Partial Words", Chapman & Hall/CRC Press, Boca Raton, FL, 2008)

A word is primitive if it is not a power of another word. The number of primitive words of a fixed length over an alphabet of a fixed size is well known and relates to the M¨obius function. In this paper, we investigate the number of primitive partial words which are strings that may contain “do not know” symbols.

Keywords: Combinatorics on words; Words; Partial words; Primitive words; Primitive partial words; Mobius function; Periods; Exact periods.



 (7) M. Cucuringu, Y. Lipman , A. Singer, "Sensor network localization by eigenvector synchronization over the Euclidean group", ACM Transactions on Sensor Networks, accepted for publication

We present a new approach to localization of sensors from noisy measurements of a subset of their Euclidean distances. Our algorithm starts by finding, embedding and aligning uniquely realizable subsets of neighboring sensors called patches. In the noise-free case, each patch agrees with its global positioning up to an unknown rigid motion of translation, rotation and possibly reflection. The reflections and rotations are estimated using the recently developed eigenvector synchronization algorithm, while the translations are estimated by solving an overdetermined linear system. The algorithm is scalable as the number of nodes increases, and can be implemented in a distributed fashion. Extensive numerical experiments show that it compares favorably to other existing algorithms in terms of robustness to noise, sparse connectivity and running time. While our approach is applicable to higher dimensions, in the current paper we focus on the two dimensional case.

Keywords: Sensor networks, distance geometry, eigenvectors, synchronization, rigidity theory, spectral graph theory.

 (8) F. Blanchet-Sadri, E. Allen, C. Byrum, M. Cucuringu and R. Mercas, "Counting Bordered Partial Words by Critical Positions" , The Electronic Journal of Combinatorics, Vol. 18, 2011, #P138.

A partial word, sequence over a finite alphabet that may have some undefined positions or holes, is bordered if one of its proper prefixes is compatible with one of its suffixes. The number theoretical problem of enumerating all bordered full words (the ones without holes) of a fixed length n over an alphabet of a fixed size k is well known. In this paper, we investigate the number of bordered partial words having h holes with the parameters k, n. It turns out that all borders of a full word are simple, and so every bordered full word has a unique minimal border no longer than half its length. Counting bordered partial words is made extremely more difficult by the failure of that combinatorial property since there is now the possibility of a minimal border that is nonsimple. Here, we give recursive formulas based on our approach of the so-called simple and nonsimple critical positions.

Keywords: Theory of formal languages; Number theory; Combinatorics on words; Partial words; Bordered partial words; Simple border; Simply bordered partial words; Critical positions


 (9) A. Singer, M. Cucuringu, "Uniqueness of Low-Rank Matrix Completion by Rigidity Theory", SIAM Journal on Matrix Analysis and Applications, 31 (4), pp. 1621-1641 (2010) (arXiv)

The problem of completing a low-rank matrix from a subset of its entries is often encountered in the analysis of incomplete data sets exhibiting an underlying factor model with applications in collaborative filtering, computer vision, and control. Most recent work has been focused on constructing efficient algorithms for exact or approximate recovery of the missing matrix entries and proving lower bounds for the number of known entries that guarantee a successful recovery with high probability. A related problem from both the mathematical and algorithmic points of view is the distance geometry problem of realizing points in a Euclidean space from a given subset of their pairwise distances. Rigidity theory answers basic questions regarding the uniqueness of the realization satisfying a given partial set of distances. We observe that basic ideas and tools of rigidity theory can be adapted to determine uniqueness of low-rank matrix completion, where inner products play the role that distances play in rigidity theory. This observation leads to efficient randomized algorithms for testing necessary and sufficient conditions for local completion and for testing sufficient conditions for global completion. Crucial to our analysis is a new matrix, which we call the completion matrix, that serves as the analogue of the rigidity matrix.

Keywords: low-rank matrices, missing values, rigidity theory, iterative methods, collaborative filtering

 (10) F. Blanchet-Sadri, M. Cordier, M. Cucuringu and R. Kirsch, "Combinatorics on Border Correlations of Partial Words", International Conference on Automata, Languages and Related Topics, Debrecen, Hungary, October 21-24, 2008

In this paper, we consider the border sets of partial words of length n over a finite alphabet, and study the combinatorics of specific representations of them, called border correlations, which are binary vectors of length n indicating the borders. We characterize precisely which of these vectors are valid border correlations. We establish a one-to-one correspondence between the set of valid border correlations and the set of valid ternary correlations of a given length, the latter being ternary vectors representing the strong and strictly weak period sets. This gives an efficient algorithm for generating the valid ternary correlations of a given length. It turns out that the set of border correlations of partial words over an arbitrary alphabet of cardinality at least two is the same as the set of border correlations of partial words over the binary alphabet. We also give a correspondence between the ternary correlation of a partial word and its refined border correlation which specifies the lengths of all the word’s bordered cyclic shifts’ minimal borders. Finally, we investigate the population size, that is, the number of partial words sharing a given border correlation, and obtain formulas to compute it. Our approach is based on the fact that the sets of all border correlations of a given length form distributive lattices under suitably defined partial orderings.

Keywords: Theory of formal languages; Combinatorics on words; Partial words; Border correlations; Refined border correlations; Ternary correlations; Population size.

 (11) M. Cucuringu, R. Strichartz, "Infinitesimal Resistance Metrics on Sierpinski Gasket Type Fractals"- Analysis , Vol. 28, Issue 3 (2008), page 319-331

We prove the existence of an infinitesimal resistance metric on the Sierpinksi gasket (SG) at boundary points, junction points and periodic points. This is a renormalized limit of the effective resistance metric as we zoom in on the point, and satisfies a self-similar identity. We obtain similar results on PCF fractals with three boundary points.
 (12) M. Cucuringu, R. Strichartz, "Self-Similar Energy Forms on the Sierpinski Gasket with Twists" - Potential Analysis, 27 ( Aug 2006), page 45-60

By introducing twists into the iterated function system that defines the Sierpinski gasket, we are able to construct a unique regular energy form that satisfies a self–similar identity with any prescribed projective weights. Our construction is explicit (involving finding a root of a 4th order polynomial), and we are able to find explicitly a polynomial identity for the algebraic variety containing the smooth manifold of admissible weights. Without the twists, there are obstructions to existence, and a complete description due to Sabot is quite complicated.


Undergraduate research

● Summer 2006: Research Experience for Undergraduates (REU link), University of North Carolina, Greensboro, NC
I worked with Professor Blanchet-Sadri on the topic of "Algorithmic Combinatorics on Words". We investigated a series of problems related to coding, primitivity testing and computing periods in partial words which are strings that may contain a number of  "do not know" symbols known as holes.

● Summer 2005: Research Experience for Undergraduates (REU link), Cornell University, Ithaca, NY
I worked with Professor Robert Strichartz on two projects in the area of fractal analysis: "Self-Similar Energy Forms on the Sierpinski Gasket with Twists" and "Infinitesimal Resistance Metrics on Sierpinski Gasket Type Fractals". During the Cornell REU program, I also participated in a project directed by Prof. Martin Kassabov on expander graphs and groups which studied the diameters of Cayley graphs and the Kazhdan constants of finite groups with respect to different generating sets. Part of the research resulted in a report & a presentation with John Hageman at Stanford and Alina Florescu at Mt. Holyoke, entitled "Symmetries of Squares and Higher Dimensional Cubes".

Check this website for more information on this work.
This is the presentation (PDF) I gave at the end of the REU.
This article (pages 4-5) talks about the 2005 REU at Cornell.

Undergraduate projects/papers/presentations


© 2011 Mihai Cucuringu