# On the expressiveness of comparison queries

# On the expressiveness of comparison queries

Comparisons are a classical and well studied algorithmic tool that is used in a variety of contexts and applications. I will present three examples that manifest the expressiveness of these queries in information theory, machine learning, and complexity theory. 20 (simple) questions. A basic combinatorial interpretation of Shannon's entropy function is via the "20 questions" game. This cooperative game is played by two players, Alice and Bob: Alice picks a distribution π over the numbers {1,…,n}, and announces it to Bob. She then chooses a number x according to π , and Bob attempts to identify x using as few Yes/No queries as possible, on average. An optimal strategy for the "20 questions" game is given by a Huffman code for π: Bob's questions reveal the codeword for x bit by bit. This strategy finds x using fewer than H(π)+1 questions on average. However, the questions asked by Bob could be arbitrary. We investigate the following question: Are there restricted sets of questions that match the performance of Huffman codes? We show that for every distribution π, Bob has a strategy that uses only questions of the form "x<c?" and "x=c?", and uncovers x using at most H(π)+1 questions on average, matching the performance of Huffman codes in this sense. Active classification with comparison queries. This part concerns an extension of active learning in which the learning algorithm may ask the annotator to compare the distances of two examples from the boundary of their label-class. For example, in a recommendation system application (say for restaurants), the annotator may be asked whether she liked or disliked a specific restaurant (a label query); or which one of two restaurants did she like more (a comparison query). We focus on the class of half-spaces, and show that under natural assumptions, such as large margin or bounded bit-description of the input examples, it is possible to reveal all the labels of a sample of size n using approximately O(logn) queries. Nearly optimal linear decision trees for k-SUM and related problems. We use the above active classification framework to construct near optimal linear decision trees for a variety of decision problems in combinatorics and discrete geometry. For example, for any constant k, we construct linear decision trees that solve the k-SUM problem on n elements using O(n log n) linear queries. Moreover, the queries we use are comparison queries, which compare the sums of two k-subsets; when viewed as linear queries, comparison queries are 2k-sparse and have only {−1,+1} coefficients. We give similar constructions for sorting sumsets A+B and for solving the SUBSET-SUM problem, both with optimal number of queries, up to i poly-logarithmic terms.

Based on joint works with Yuval Dagan, Yuval Filmus, Ariel Gabizon, Daniel Kane, Shachar Lovett, and Jiapeng Zhang.