On the Complexity of Detecting Planted Solutions

-
Santosh Vempala , Georgia Institute of Technology
Fine Hall 214

Many combinatorial problems appear to be hard even when inputs are drawn from simple, natural distributions, e.g., SAT for random formulas, clique in random graphs etc. To understand their complexity, we consider random problems with planted solutions, e.g., planted k-SAT/k-CSP (clauses are drawn at random from those satisfying a fixed assignment), planted clique (a large clique is added to a random graph), planted partitions (edges of a random graph have different probabilities between and within parts of a fixed vertex partition). How many clauses/edges does one need  to find (or detect) planted solutions? At the moment there are large gaps between the point at which the planted solution becomes unique (information-theoretically) and when it can be found efficiently (in time polynomial in the size of the input). For random k-SAT/k-CSP, the current best algorithms need roughly n^{k/2} clauses to find the planted assignment although it becomes unique after roughly n log n clauses; A planted clique of size slightly larger than 2 log n is unique, while known algorithms can only find cliques of size n^{1/2}. Are these gaps inherent? In this talk, we describe a paring transition as the number of clauses (size of clique) grows, and show how it directly affects the complexity of any statistical algorithm, a general framework first studied by Kearns in the context of learning theory. As known algorithmic techniques for these problems all have statistical counterparts, this provides concrete evidence of their computational hardness.  The talk is based on joint work with Vitaly Feldman, Elena Grigorescu, Will Perkins, Lev Reyzin and Ying Xiao.