Long common subsequences between bitstrings

Long common subsequences between bitstrings

Xiaoyu He, Princeton University
Fine Hall 224

In-Person Talk

A binary code of positive rate is a subset of {0,1}^n with size exponentially large in n. In any binary code of positive rate, one can find two codewords sharing the same majority bit and thus a common subsequence of length at least n/2. Surprisingly, whether this trivial constant 1/2 could be improved was a longstanding open problem in coding theory, and we solve it in a strengthened form. That is, we show that there exist A, c > 0 such that among any 2^{(log n)^A} bitstrings of length n, some two have a common subsequence of length at least (1/2 + c)n. In the language of coding theory, this means that the zero-rate threshold for coding against adversarial bit-deletion is strictly less than 1/2.

The main idea of the proof is to develop a classification system for bitstrings according to their "oscillation patterns," very roughly analogous to Fourier phases; one key ingredient of this classification is a new entropy-increment variant of the string regularity lemma of Axenovitch, Person, and Puzynina.

Joint work with Venkatesan Guruswami and Ray Li.