## Unitary 2-designs, variational quantum eigensolvers, and barren plateaus

Are randomised quantum circuits that induce unitary 2-designs a reasonable thing to hope for in the near future? It seems so, following a paper presented by Saeed Mehraban at QIP this year. With Harrow, Mehraban showed that short-depth random circuits involving nearest-neighbour unitary gates (where proximity is defined by a cubic lattice structure) produce approximate unitary $t$-designs. This is precisely the experimental model of Google’s Quantum AI group, who are working with a 49-qubit two-dimensional lattice of qubits.

The main application of these randomised circuits is to prove ‘quantum supremacy’. But can they do anything useful? In the attached note, I discuss an application to building a variational quantum eigensolver, which runs into the problem of ‘barren plateaus’ recently highlighted by McClean et al. (See within for references.)


## A Quantum Query Complexity Trichotomy for Regular Languages (Aaronson, Grier, and Schaeffer 2018)

There were a few great talks in the ‘query complexity session’ of QIP this year, including Srinivasan Arunachalam’s talk “A converse to the polynomial method” (joint work with Briët and Palazuelos – see João’s post below for a discussion of this) and André Chailloux’s talk entitled “A note on the quantum query complexity of permutation symmetric functions”. In this post I’ll discuss the talk/paper “A Quantum Query Complexity Trichotomy for Regular Languages”, which is the work of Scott Aaronson, Daniel Grier (who gave the talk), and Luke Schaeffer.

Anybody who has taken courses on theoretical computer science will probably, at some point, have come across regular languages and their more powerful friends. They might even remember their organisation into Chomsky’s hierarchy (see Figure 1). Roughly speaking, a (formal) language consists of an alphabet of letters, and a set of rules for generating words from those letters. More complicated rules give more expressive languages, for which the set of words that can be generated from a given alphabet becomes larger. Intuitively, the languages in Chomsky’s hierarchy become more expressive (i.e. larger) as we move upwards and outwards in Figure 1.

One might view this hierarchy as an early attempt at computational complexity theory, where we are trying to answer the question “Given more complex rules (i.e. more powerful computational models), what kinds of languages can we generate?”. Indeed, modern computational complexity theory is still defined in terms of languages: complexity classes are defined as sets of formal languages that can be decided by machines with certain computational powers.

However, there have been surprisingly few attempts to study the relationships between modern computational complexity and the kinds of formal languages discussed above. The recent work by Scott Aaronson, Daniel Grier, and Luke Schaeffer on the quantum query complexity of regular languages takes a (large) step in this direction.

This work connects formal language theory to one of the great pillars of modern computational complexity – query complexity. Also known as the ‘black box model’, in this setting we only count the number of times that we need to query (i.e. access) the input in order to carry out our computation. For many problems, the query complexity just ends up corresponding to the number of bits of the input that need to be looked at in order to compute some function.

Why do we study query complexity? The best answer is probably that it allows us to prove lower bounds – in contrast to the plethora of conjectures and implications that permeate the circuit complexity world, it is possible to actually prove concrete query lower bounds for many problems, which, in turn, imply rigorous separations between various models of computation. For instance, the study of quantum query complexity led to the proof of optimality of Grover’s algorithm, which gives a provable quadratic separation between classical and quantum computation for the unstructured search problem.
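To make the query-counting concrete, here is a minimal statevector simulation of Grover’s algorithm (all names here are mine, purely for illustration): each oracle call counts as one query, and roughly $\frac{\pi}{4}\sqrt{N}$ queries suffice to find one marked item among $N$.

```python
import numpy as np

# Minimal statevector sketch of Grover search for one marked item among N = 2^n,
# illustrating the ~ (pi/4) * sqrt(N) query count.
n = 8
N = 2 ** n
marked = 137  # arbitrary marked index

state = np.full(N, 1 / np.sqrt(N))          # uniform superposition over all N items
iterations = int(np.floor(np.pi / 4 * np.sqrt(N)))
for _ in range(iterations):
    state[marked] *= -1                      # oracle query: phase-flip the marked item
    state = 2 * state.mean() - state         # diffusion: reflect amplitudes about the mean

print(iterations, state[marked] ** 2)        # success probability is close to 1
```

Running this with $N = 256$ uses only 12 oracle queries, against the roughly $N/2$ lookups a classical search would need on average.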

In what follows, we can consider the query complexity of a regular language to correspond to the number of symbols of an input string $x$ that must be looked at in order to decide if $x$ belongs to a given regular language. Here, a language is some (not necessarily finite) set of words that can be built from some alphabet of letters according to a set of rules. By changing the rules, we obtain a different language.

We’ll start with some background on regular languages, and then discuss the results of the paper and the small amount of previous work that has investigated similar connections.

## Background: Regular languages

Formally, a regular language over an alphabet $\Sigma$ is any language that can be constructed from:

• The empty language $\varnothing$.
• The empty string language $\{\epsilon\}$.
• The ‘singleton’ languages $\{a\}$ consisting of a single letter $a \in \Sigma$ from the alphabet.
• Any combination of the operators:
  • Concatenation ($AB$): ‘$A$ followed by $B$’.
  • Union ($A \cup B$): ‘either $A$ or $B$’.
  • The Kleene star operation ($A^*$): ‘zero or more repetitions of $A$’.

Readers with some coding experience will probably have encountered regular expressions. These are essentially explicit expressions describing how to construct a regular language, where we usually write $|$ for union and omit some brackets. For example, to construct a regular language containing all words (over the Roman alphabet) that begin with ‘q’ and end with either ‘tum’ or ‘ta’, we could write the regular expression $q\Sigma^*(tum|ta)$. In this case, the words ‘quantum’ and ‘quanta’ are contained in the regular language, but the words ‘classical’ and ‘quant’ are not. A more computationally motivated example is the regular expression for the $\mathsf{OR}$ function over the binary alphabet $\Sigma = \{0,1\}$, which we can write as $\Sigma^*1\Sigma^*$ (i.e. ‘a 1 surrounded by any number of any other characters’).
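Readers can try this in any regex engine. Here is the example above in Python’s regex syntax, where $\Sigma^*$ becomes `.*` and we anchor both ends of the word:

```python
import re

# The regular expression q Sigma* (tum|ta) from the text, in Python syntax.
pattern = re.compile(r"q.*(tum|ta)")

for word in ["quantum", "quanta", "classical", "quant"]:
    # fullmatch requires the whole word to be generated by the expression
    print(word, bool(pattern.fullmatch(word)))
```

As expected, ‘quantum’ and ‘quanta’ match, while ‘classical’ and ‘quant’ do not.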

There are many other equivalent definitions of the regular languages. A particularly nice one is: “The set of languages accepted by deterministic finite state automata (DFAs)”. Here, we can roughly think of a (deterministic) finite state automaton as a ‘Turing machine without a memory tape’; these are usually drawn diagrammatically, as in Figure 2. Other definitions include those in terms of grammars (e.g. regular expressions, prefix grammars), or algebraic structures (e.g. recognition via monoids, rational series).

The decision problem associated with a regular language is: Given some string $x$ and a language $L$, is $x$ in $L$? Often, we want to make this decision by reading in as few bits of $x$ as possible (hence the motivation to study the query complexity of regular languages).
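As a toy illustration of the automaton view (state names and the helper function are my own invention), here is a DFA deciding the OR language $\Sigma^*1\Sigma^*$ from earlier, reading one input symbol per step:

```python
def dfa_accepts(transitions, start, accepting, word):
    """Run a deterministic finite automaton on `word`, one symbol per step."""
    state = start
    for symbol in word:          # each symbol read is one look at the input
        state = transitions[(state, symbol)]
    return state in accepting

# DFA for Sigma* 1 Sigma* over {0, 1}: jump to the accepting state on the
# first '1' seen, and stay there forever after.
or_transitions = {
    ("no_one_seen", "0"): "no_one_seen",
    ("no_one_seen", "1"): "one_seen",
    ("one_seen", "0"): "one_seen",
    ("one_seen", "1"): "one_seen",
}

print(dfa_accepts(or_transitions, "no_one_seen", {"one_seen"}, "0010"))  # True
print(dfa_accepts(or_transitions, "no_one_seen", {"one_seen"}, "0000"))  # False
```

Note that the DFA always reads the whole input, so on its own it gives the trivial $O(n)$ query bound; the point of the paper is to decide membership with far fewer queries when possible.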

## Main results

The main result of the paper by Aaronson et al. is a trichotomy theorem, stated informally as follows:

Every regular language has quantum query complexity $\Theta(1), \tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$. Furthermore, each query upper bound results from an explicit quantum algorithm.

The authors arrive at this result by showing that all regular languages naturally fall into one of three categories:

1. ‘Trivial’ languages: Intuitively, these are the languages for which membership can be decided by the first and last characters of the input string. For instance, the language describing all binary representations of even numbers is a trivial language.
2. Star-free languages: The variant of regular languages where complement ($\overline{A}$, i.e. everything not in $A$) is allowed, but the Kleene star is not. Note that these languages can still be infinite – indeed, the language $\overline\varnothing$ is equivalent to the language $\Sigma^*$. The quantum query complexity of these languages is $\tilde{\Theta}(\sqrt{n})$.
3. All the rest, which have quantum query complexity $\Theta(n)$.

The paper mostly describes these classes in terms of the algebraic definitions of regular languages (i.e. in terms of monoids), since these form the basis of many of the results, but for the sake of simplicity, we will avoid talking about monoids in this post.

Along the way, the authors prove several more interesting results:

• Algebraic characterisation: They give a characterisation of each class of regular languages in terms of the monoids that recognise them. That is, the monoid is either a rectangular band, aperiodic, or finite. In particular, given a description of the machine, grammar, etc. generating the language, it is possible to decide its membership in one of the three classes mentioned above by explicitly calculating its ‘syntactic monoid’ and checking a small number of conditions. See Section 3 of their paper for details.
• Related complexity measures: Many of the lower bounds are derived from lower bounds on other query measures. They prove query dichotomies for deterministic complexity, randomised query complexity, sensitivity, block sensitivity, and certificate complexity: they are all either $\Theta(1)$ or $\Theta(n)$ for regular languages. See Section 6 of their paper.
• Generalisation of Grover’s algorithm: The quantum algorithm using $\tilde{O}(\sqrt{n})$ queries for star-free regular languages extends to a variety of other settings by virtue of the fact that the star-free languages enjoy a number of equivalent characterisations. In particular, the characterisation of star-free languages as sentences in first-order logic over the natural numbers with the less-than relation shows that the algorithm for star-free languages is a nice generalisation of Grover’s algorithm. See Sections 1.3 and 4 of their paper for extra details and applications.

Finally, the authors show that the trichotomy breaks down for other formal languages. In fact, it breaks down as soon as we move to the ‘next level’ of the hierarchy, namely the context-free languages.

The results in the paper allow the authors to link the query complexities to the more familiar (for some) setting of circuit complexity. By McNaughton’s characterisation of star-free languages in first-order logic (every star-free language can be expressed as a sentence in first-order logic over the natural numbers with the less-than relation and predicates $\pi_a$ for $a \in \Sigma$, where $\pi_a(i)$ is true if input symbol $x_i$ is $a$), it follows that all star-free languages, which have quantum query complexity $\tilde{O}(\sqrt{n})$, are in $\mathsf{AC^0}$. Conversely, they show that regular languages not in $\mathsf{AC^0}$ have quantum query complexity $\Omega(n)$. Thus, another way to state the trichotomy is that (very roughly speaking) regular languages in $\mathsf{NC^0}$ have complexity $O(1)$, regular languages in $\mathsf{AC^0}$ but not $\mathsf{NC^0}$ have complexity $\tilde{\Theta}(\sqrt{n})$, and everything else has complexity $\Omega(n)$.

## Previous work

There have been other attempts to connect the more modern aspects of complexity theory to regular languages, as the authors of this work point out. One example is the work of Tesson and Thérien on the communication complexity of regular languages. They show that the communication complexity of any regular language is $\Theta(1)$, $\Theta(\log \log n)$, $\Theta(\log n)$, or $\Theta(n)$ – a ‘tetrachotomy’ theorem with parallels to the current work. It is interesting that Tesson and Thérien didn’t study the query complexity of regular languages, and instead went straight for the communication complexity setting, since the former is the more obvious first step. Indeed, Aaronson et al. write

“communication complexity is traditionally more difficult than query complexity, yet the authors appear to have skipped over query complexity — we assume because quantum query complexity is necessary to get an interesting result.”

Another example of previous work is the work of Alon, Krivelevich, Newman, and Szegedy, who consider regular languages in the property-testing framework. Here the task is to decide if an $n$-letter input word $x$ is in a given language, or if at least $\epsilon n$ many positions of $x$ must be changed in order to create a word that is in the language. They show that regular languages can be tested, in this sense, using only $\tilde{O}(1/\epsilon)$ queries. They also demonstrate that there exist context-free grammars which do not admit constant query property testers – showing that the results once again break down once we leave the regular languages.

Aaronson et al. also point out some similarities to work of Childs and Kothari on the complexity of deciding minor-closed graph properties (the results are of the same flavour, but not obviously related), and that a combination of two existing results – Chandra, Fortune, and Lipton and Bun, Kothari, and Thaler – allows one to show that the quantum query complexity of star-free languages is $o(n)$.

## Techniques

The lower bounds are mostly derived from a (new) dichotomy theorem for sensitivity – i.e. that the sensitivity of a regular language is either $O(1)$ or $\Omega(n)$. The proof makes a nice use of the pumping lemma for regular languages.

The majority of the work for the rest of the paper is focused on developing the quantum query algorithm for star-free languages. The proof is based on an algebraic characterisation of star-free regular languages as ‘those languages recognised by finite aperiodic monoids’, due to Schützenberger, and the actual algorithm can be seen as a generalisation of Grover’s algorithm. I’ll leave the exact details to the paper.

These two short paragraphs really don’t do justice to the number of different techniques that the paper combines to obtain its results. So I recommend checking out the paper for details!

## Summary and open questions

This work takes a convincing first step towards studying formal languages from the perspective of more modern forms of computational complexity, namely query complexity. It very satisfyingly answers the question “what is the query complexity of the regular languages?”. For classical computers, it’s either $\Theta(1)$ or $\Theta(n)$ – i.e. either trivial or difficult. For quantum computers, it’s either $\Theta(1)$, $\tilde{\Theta}(\sqrt{n})$, or $\Theta(n)$ – i.e. trivial, slightly easier than classical, or difficult.

An obvious next step is to extend this work to other languages in the hierarchy, for example the context-free languages. However, the authors obtain what is essentially a no-go result in this direction – they show that the trichotomy breaks down, and in particular, for every $c \in [1/2, 1]$, there exists some context-free language with quantum query complexity approaching $\Theta(n^c)$. They conjecture that no context-free language has quantum query complexity lying strictly between constant and $\tilde{\Theta}(\sqrt{n})$, but leave this open.

Another direction is to consider promise problems: suppose we are promised that the input strings are taken from some specific set, does this affect the query complexity? It is known that in order to obtain exponential separations between classical and quantum query complexity for (say) Boolean functions, we have to consider partial functions – i.e. functions with a promise on the input. For instance, suppose we are promised that the input is ‘sorted’ (i.e. belongs to the regular language generated by $0^*1^*$). Then our task is to determine whether there is an occurrence of $01$ at an even position (belongs to the language $(\Sigma\Sigma)^*01\Sigma^*$). As the authors point out, we can use binary search to decide membership in only $\Theta(\log n)$ quantum (and classical) queries, so clearly complexities other than the three above are possible. Aaronson et al. conjecture that the trichotomy would remain, though, and that the quantum query complexity of regular languages with promises on their inputs is one of $\Theta(\text{polylog}(n))$, $\Theta(\sqrt{n}\cdot\text{polylog}(n))$, or $\Theta(n\cdot\text{polylog}(n))$.
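The binary-search idea behind this promise problem can be sketched as follows (an illustrative Python snippet with invented helper names, not code from the paper): under the promise that the input is $0^a1^b$, locate the boundary with $O(\log n)$ queries, then check the parity of the position where ‘01’ occurs.

```python
# Promise: the input is sorted, i.e. of the form 0^a 1^b. We must decide whether
# the unique occurrence of "01" starts at an even (0-based) position, matching
# membership in the language (Sigma Sigma)* 01 Sigma*.
def first_one_index(x, counter):
    """Binary search for the index of the first '1' (len(x) if none), counting queries."""
    lo, hi = 0, len(x)               # invariant: x[:lo] is all '0', x[hi:] is all '1'
    while lo < hi:
        mid = (lo + hi) // 2
        counter[0] += 1              # one query to the input string
        if x[mid] == "1":
            hi = mid
        else:
            lo = mid + 1
    return lo

x = "0" * 1000 + "1" * 24
queries = [0]
a = first_one_index(x, queries)
# "01" occupies positions (a - 1, a); it starts at an even index iff (a - 1) is even
in_language = 0 < a < len(x) and (a - 1) % 2 == 0
print(a, in_language, queries[0])    # ~log2(n) queries for n = 1024
```

Here 1024 input symbols are decided with only about 11 queries, far below the three complexities in the trichotomy.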

It’d also be nice to see whether there are any other applications of the Grover-esque algorithm that the authors develop. Given that the algorithm is quite general, and that there are many alternative characterisations of the star-free regular languages, it’d be surprising if there weren’t any immediate applications to other problems. The authors suggest that string matching problems could be an appropriate setting, since linear-time classical algorithms for these problems have been derived from finite automata. Although quadratic quantum speedups are already known here, it could be a good first step to obtain these speedups by just applying the new algorithm as a black box.

## Low-depth gradient measurements can improve convergence in variational hybrid quantum-classical algorithms (Harrow and Napp 2019)

With the growing popularity of hybrid quantum-classical algorithms as a way of potentially achieving a quantum advantage on Noisy Intermediate-Scale Quantum (NISQ) devices, there were a number of talks on this topic at this year’s QIP conference in Boulder, Colorado. One of the talks that I found the most interesting was given by John Napp from MIT on how “Low-depth gradient measurements can improve convergence in variational hybrid quantum-classical algorithms” based on work done with Aram Harrow (https://arxiv.org/abs/1901.05374).

What are variational hybrid quantum-classical algorithms?

They are a class of optimisation algorithms in which the quantum and classical computer work closely together. Most variational algorithms follow a simple structure:

1. Prepare a parameterised quantum state $|\theta\rangle$, which can take the form $|\theta\rangle = |\theta_1,\dots,\theta_p\rangle = e^{-iA_p\theta_p/2} \cdots e^{-iA_1\theta_1/2} |\psi\rangle$, where the $A_j$ are Hermitian operators. The choice of parameters and operators depends on the device being targeted, and $|\psi\rangle$ is an easy-to-prepare initial state.
2. Carry out measurements to determine information about the classical objective function $f(\theta)$ you wish to minimise (or maximise) where $f(\theta) = \langle\theta|H|\theta\rangle$ and $H$ is some Hermitian observable (for example corresponding to a physical Hamiltonian). Due to the randomness of quantum measurements, many preparations and measurements of $|\theta\rangle$ are required to obtain a good estimate of $f(\theta)$.
3. Use a classical optimisation method to determine a new value for $\theta$ that will minimise $f(\theta)$. This is a stochastic optimisation problem since we do not have direct access to $f(\theta)$ – only noisy access through measurements.
4. Repeat steps 1-3 until the optimiser converges.

Examples of this type of algorithm are the variational quantum eigensolver (VQE), used to calculate ground states of Hamiltonians, and the quantum approximate optimisation algorithm (QAOA) for combinatorial optimisation problems.
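The four-step loop above can be sketched for a single qubit (a toy example; every choice here is illustrative: generator $Y$, observable $H = Z$, so that $f(\theta) = \langle\theta|Z|\theta\rangle = \cos\theta$):

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: |theta> = exp(-i Y theta / 2)|0>, giving Z-basis outcome probabilities below.
def prepare_probabilities(theta):
    amp0, amp1 = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([amp0**2, amp1**2])

# Step 2: estimate f(theta) = <theta|Z|theta> from repeated measurements
# (outcome +1 for |0>, -1 for |1>), so the estimate is noisy.
def estimate_f(theta, shots=2000):
    outcomes = rng.choice([1.0, -1.0], size=shots, p=prepare_probabilities(theta))
    return outcomes.mean()

# Steps 3-4: a crude finite-difference gradient step, repeated until convergence.
theta, eta, eps = 0.3, 0.4, 0.2
for _ in range(60):
    grad = (estimate_f(theta + eps) - estimate_f(theta - eps)) / (2 * eps)
    theta -= eta * grad

print(theta, estimate_f(theta))  # theta should approach pi, where f(theta) = -1
```

Even in this toy setting the shot noise is visible: each gradient step uses thousands of measurements, which is exactly the cost the paper’s analysis tries to pin down.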

To obtain information about the objective function $f(\theta)$, it can be expressed in terms of easily measurable operators:

$f(\theta) = \langle\theta|H|\theta\rangle = \sum_{i=1}^m \alpha_i\langle\theta|P_i|\theta\rangle$

where $P_i$ are tensor products of Pauli operators. To carry out the optimisation, derivative-free methods such as Nelder–Mead can be used. However, if one wishes to use derivative-based methods such as BFGS or the conjugate gradient method, we need an estimate of the gradient $\nabla f(\theta)$. A numerical way to obtain this is finite differencing, which only requires measurements of $f(\theta)$: for small $\epsilon$,

$\frac{\partial f}{\partial \theta_i} \approx \frac{f(\theta + \epsilon \hat{e}_i) - f(\theta - \epsilon \hat{e}_i)}{2\epsilon}$

where $\hat{e}_i$ is the unit vector along the $i^{\text{th}}$ component. Each evaluation of $f$ at a different set of parameters can be done in low depth, but many repeated measurements are required each time.

An alternative method is to take measurements that correspond directly to estimating the gradient $\nabla f(\theta)$. This is referred to as an analytic gradient measurement and usually only requires a slightly greater circuit depth than measuring the objective function. Previously it was not clear whether these analytic gradient measurements could offer an improvement over schemes that used finite-differences or derivative-free methods, but as we will see, Harrow and Napp have proved in this paper that in some cases it can substantially improve convergence rates.

For the rest of this post, the term zeroth-order will refer to taking measurements corresponding to the objective function. First-order will refer to algorithms which make an analytic gradient measurement (and this generalises to $k$th-order, where the $k$th derivatives are measured). It is clear how zeroth-order measurements are made – by measuring the Pauli operators $P_i$ with respect to the state $|\theta\rangle$. But how do we make first-order measurements corresponding to the gradient?

The state $|\theta\rangle$ can be rewritten as $|\theta\rangle = U_{1:p}|\psi\rangle$, where $U_i = e^{-iA_i\theta_i/2}$ and, for $i \leq j$, the sequence $U_{i:j}$ is defined as $U_j \cdots U_i$. Therefore, $f(\theta) = \langle\theta|H|\theta\rangle = \langle\psi|U^\dagger_{1:p}HU_{1:p}|\psi\rangle$. Differentiating via the product rule gives:

$\frac{\partial f}{\partial \theta_j} = -\text{Im}\, \langle\psi|U^\dagger_{1:j} A_j U^\dagger_{(j+1):p} HU_{1:p}|\psi\rangle.$

Recalling that $H = \sum_{l=1}^m \alpha_l P_l$ and writing the Pauli decomposition of $A_j$ as $A_j = \sum_{k=1}^{n_j} \beta_k^{(j)} Q_k^{(j)}$ where $Q_k^{(j)}$ are products of Pauli operators, the derivative can be rewritten as

$\frac{\partial f}{\partial \theta_j} = -\sum_{k=1}^{n_j}\sum_{l=1}^m \beta_k^{(j)}\alpha_l\, \text{Im}\, \langle\psi|U^\dagger_{1:j} Q_k^{(j)} U^\dagger_{(j+1):p} P_l U_{1:p}|\psi\rangle.$

This can then be estimated using a generalised Hadamard test, which estimates real (or imaginary) parts of expected values; the circuit yielding an unbiased estimator for $-\text{Im}\, \langle\psi|U^\dagger_{1:j} Q_k^{(j)} U^\dagger_{(j+1):p} P_l U_{1:p}|\psi\rangle$ is given in the paper.

Measuring every term in the expansion is unnecessary for estimating $f(\theta)$ or its derivatives. So, for all orders, a further sampling step is carried out: terms in the expansion are sampled (with terms of smaller norm chosen with smaller probability) and measured to build an unbiased estimator, but I will not go into the details here.
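As a quick numerical sanity check of the derivative formula $\partial f/\partial\theta_j = -\text{Im}\,\langle\psi|U^\dagger_{1:j} A_j U^\dagger_{(j+1):p} H U_{1:p}|\psi\rangle$, the sketch below simulates the expectation values directly (random $2\times 2$ Hermitian generators, exact linear algebra rather than a Hadamard test) and compares against a finite difference:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_hermitian(n=2):
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (m + m.conj().T) / 2

def expm_hermitian(a, t):
    """exp(-i * a * t / 2) for Hermitian a, via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return v @ np.diag(np.exp(-1j * w * t / 2)) @ v.conj().T

p = 3
A = [random_hermitian() for _ in range(p)]     # generators A_1, ..., A_p (0-indexed)
H = random_hermitian()                         # observable
psi = np.array([1.0, 0.0], dtype=complex)
theta = rng.normal(size=p)

def u_range(i, j, th):
    """U_{i:j} = U_j ... U_i with U_k = exp(-i A_k theta_k / 2)."""
    u = np.eye(2, dtype=complex)
    for k in range(i, j + 1):
        u = expm_hermitian(A[k], th[k]) @ u
    return u

def f(th):
    state = u_range(0, p - 1, th) @ psi
    return (state.conj() @ H @ state).real

j = 1  # differentiate w.r.t. theta_j (0-indexed)
u_1j = u_range(0, j, theta)
u_jp = u_range(j + 1, p - 1, theta)
analytic = -np.imag(
    psi.conj() @ u_1j.conj().T @ A[j] @ u_jp.conj().T @ H @ (u_jp @ u_1j) @ psi)

eps = 1e-6
dplus, dminus = theta.copy(), theta.copy()
dplus[j] += eps
dminus[j] -= eps
numeric = (f(dplus) - f(dminus)) / (2 * eps)
print(analytic, numeric)  # the two values agree to high precision
```

The agreement confirms the sign and the $-\text{Im}$ in the formula above.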

Black-box formulation

To quantify how complex an optimisation problem is, the function to be optimised, $f(\theta)$, can be encoded in an oracle. The query complexity of the optimisation algorithm can then be defined as the number of calls made to the oracle. This black-box formalism is typically used in the study of classical convex optimisation; here, the quantum part of the variational algorithm has been placed inside the black box.

In this black-box model, the classical optimisation loop is given an oracle $\mathcal{O}_H$ encoding $H$. The optimiser is not given specifics about the problem, but it could be promised that the objective function has certain properties. The outer loop queries $\mathcal{O}_H$ with a state parameterisation $U_p \cdots U_1 |\psi\rangle$, parameters $\theta_1,\dots,\theta_p$ and a set $S = \{s_1,\dots,s_k\}$ of integers from $\{1,\dots,p\}$. The black box then prepares the state $|\theta\rangle$ and, if $S = \varnothing$, performs a zeroth-order measurement estimating $f(\theta)$. Otherwise, a $k$th-order query is performed to estimate the derivative of $f(\theta)$ with respect to $\theta_{s_1},\dots,\theta_{s_k}$. The query cost of this model is the number of Pauli operators measured.

How many measurements are sufficient to converge to a local minimum?

Imagine now that we restrict to a convex region of the parameter space on which the objective function is also convex. We would like to know the upper bounds for the query complexity when optimising $f(\theta)$ to precision $\epsilon$, or in other words how many measurements are required so that convergence to the minimum is guaranteed.

In the paper, results from classical convex optimisation theory are used to compare a zeroth-order algorithm with stochastic gradient descent (SGD) and stochastic mirror descent (SMD, a generalisation of SGD to non-Euclidean spaces). Upper bounds on the query complexity are derived for both convex and $\lambda$-strongly convex functions (see the paper for the explicit expressions).

In these bounds, $E$ and $\overrightarrow{\Gamma}$ are parameters related to the Pauli expansion of $H$, and $R_1, R_2, r_2$ are balls in the convex region we are optimising over. It is clear that SGD and SMD will typically require fewer measurements to converge to the minimum compared to zeroth-order, but whether SGD outperforms SMD (or vice versa) depends on the problem at hand.

It is important to note that these are the best theoretical bounds: for some derivative-free algorithms (such as those based on trust regions), it can be hard to prove good upper bounds and guarantees of convergence. However, they can perform very well in practice, and so zeroth-order methods could still potentially outperform SGD and SMD.

Can first-order improve over zeroth-order measurements?

To answer this question, a toy problem was studied. Consider a class of 1-local $n$-qubit Hamiltonians defined as the set $\mathcal{H}^\epsilon_n := \{H^{\delta(\epsilon)}_v : \forall v \in \{-1,1\}^n\}$ where $\delta(\epsilon) = \sqrt{\frac{45\epsilon}{n}}$ and

$H^\delta_v = -\sum_{i=1}^n \left[\sin\left(\frac{\pi}{4} + v_i \delta\right)X_i + \cos\left(\frac{\pi}{4} + v_i\delta\right)Z_i \right].$

These $H^\delta_v$ are perturbations about $H^0 = -\frac{1}{\sqrt{2}} \sum_{i=1}^n (X_i + Z_i)$, where $\delta$ is the strength and $v$ the direction of the perturbation. We wish to know how many measurements are needed to reach the ground state of $H^\delta_v$ for every $v$. This problem is trivial (the lowest eigenvalue and its associated eigenvector can be written down directly), which is why the black-box formulation is necessary to hide the problem. Upper and lower query complexity bounds for optimising the family $\mathcal{H}^\epsilon_n$ are derived in the paper.

The proof of the lower bound is too complicated to explain here. Proving the upper bound, in particular for first-order, is simpler and relies on using a good parameterisation

$|\theta\rangle = \left(\otimes_{j=1}^n e^{-i(\theta_j + \pi/4)Y_j/2} \right)|0\rangle^{\otimes n}$

in our optimisation algorithm. $|\theta\rangle$ is a natural choice for the family $\mathcal{H}^\epsilon_n$, as each qubit is polarised in the $\hat{x}-\hat{z}$ plane. The corresponding objective function is then $f(\theta) = -\sum_{i=1}^n \cos(\theta_i - v_i\delta)$, which is strongly convex near the optimum, and so stochastic gradient descent performs well here. Note that making higher-order queries is unnecessary in this case, as the optimal bounds can be achieved with just first-order queries.
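A toy simulation of stochastic gradient descent on this objective illustrates the convergence (Gaussian noise stands in for measurement shot noise, and all constants here are my own illustrative choices): the minimum sits at $\theta_i = v_i\delta$, where $f = -n$.

```python
import numpy as np

rng = np.random.default_rng(2)

# SGD on the toy objective f(theta) = -sum_i cos(theta_i - v_i * delta),
# whose gradient is sin(theta_i - v_i * delta); noise mimics shot noise.
n, epsilon = 8, 0.01
delta = np.sqrt(45 * epsilon / n)
v = rng.choice([-1, 1], size=n)          # the hidden perturbation direction

def noisy_gradient(theta, noise_std=0.05):
    return np.sin(theta - v * delta) + noise_std * rng.normal(size=n)

theta = np.zeros(n)
eta = 0.3
for step in range(400):
    # decaying step size, as is standard for SGD on strongly convex objectives
    theta -= eta / (1 + 0.02 * step) * noisy_gradient(theta)

f_final = -np.sum(np.cos(theta - v * delta))
print(f_final)  # close to the true minimum value -n
```

Despite never seeing $v$ directly, the optimiser recovers the minimum to within the residual noise floor, which is the behaviour the strong-convexity bounds predict.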

Conclusion

Ultimately, Harrow and Napp have shown that there are cases in which taking analytic gradient measurements in variational algorithms, for use in stochastic gradient/mirror descent optimisation routines, can improve convergence compared to derivative-free methods. It would be interesting to see what happens for more complicated problems, and whether more general $k$th-order measurements provide benefits over first-order ones. Another extension mentioned in the paper is to study the impact of noisy qubits and gates on the convergence of the optimisation.

I personally am most eager to see how these results will hold up in practice. For example, it would be interesting to see a simple simulation performed for the toy problem comparing zeroth-order optimisers with those that take advantage of analytic gradient measurements.

## A Converse to the Polynomial Method (Arunachalam et al. 2017)

One of the most famous models for quantum computation is the black-box model, in which one is given a black box, called an oracle, encoding a unitary operation. With this oracle one can probe the bits of an unknown bit string $x \in\{-1,1\}^n$ (the paper uses the set $\{-1,1\}$ instead of $\{0,1\}$, which I will keep here). The main aim of the model is to use the oracle to learn some property, given by a Boolean function $f$, of the bit string $x$ – for example, its parity. One use of the oracle is usually referred to as a query, and the number of queries a certain algorithm performs reflects its efficiency. Quite obviously, we want to minimize the number of queries. In more precise words, the bounded-error quantum query complexity of a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ is denoted by $Q_\epsilon (f)$ and refers to the minimum number of queries a quantum algorithm must make on the worst-case input $x$ in order to compute $f(x)$ with probability $1 - \epsilon$.

Looking at the model and the definition of quantum query complexity, one might ask “but how can we find the query complexity of a certain function? Is there an easy way to do so?”. As usual in this kind of problem, finding the optimal performance of an algorithm, or a useful lower bound for it, is easier said than done. Nonetheless, there are some methods for tackling the problem of determining $Q_\epsilon (f)$. There are two main methods for proving lower bounds, known as the polynomial method and the adversary method. In this post we shall talk about the first, the polynomial method, and how it was improved in the work of Arunachalam et al.

The polynomial method

The polynomial method is a lower bound method based on a connection between quantum query algorithms and, as the name suggests, polynomials. The connection comes from the fact that for every $t$-query quantum algorithm $A$ that returns a random sign $A(x)$ (i.e. an element of $\{-1,1\}$) on input $x$, there exists a degree-$2t$ polynomial $p$ such that $p(x)$ equals the expected value of $A(x)$. From this it follows that if $A$ computes a Boolean function $f : \{-1,1\}^n \to \{-1,1\}$ with probability at least $1 - \epsilon$, then the polynomial $p$ satisfies $|p(x) - f(x)| \leq 2\epsilon$ for every $x$ (the factor of $2$ comes from the image being $\{-1,1\}$ instead of $\{0,1\}$). Therefore the minimum degree of a polynomial $p$ that satisfies $|p(x) - f(x)| \leq 2\epsilon$ for every $x$, called the approximate (polynomial) degree and denoted by $deg_\epsilon(f)$, serves as a lower bound (up to a factor of $2$) for the query complexity $Q_\epsilon (f)$. Hence the problem of finding a lower bound for the query complexity is converted into the problem of lower bounding the degree of such polynomials.
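A closely related, exactly computable object is the unique multilinear (Fourier) expansion of a Boolean function over $\{-1,1\}^n$; its degree plays the same role as the approximate degree above, but for exact computation. The sketch below (function names are my own) computes this expansion by brute force for the 3-bit parity function, whose expansion is the single top-degree monomial $x_1x_2x_3$:

```python
import itertools
import numpy as np

# Every f: {-1,1}^n -> {-1,1} equals a unique multilinear polynomial
# sum_S c_S * prod_{i in S} x_i, with c_S = E_x[f(x) * prod_{i in S} x_i].
def fourier_coefficients(f, n):
    points = list(itertools.product([-1, 1], repeat=n))
    coeffs = {}
    for S in itertools.chain.from_iterable(
            itertools.combinations(range(n), k) for k in range(n + 1)):
        c = np.mean([f(x) * np.prod([x[i] for i in S]) for x in points])
        if abs(c) > 1e-12:          # keep only the non-zero monomials
            coeffs[S] = c
    return coeffs

parity = lambda x: np.prod(x)        # PARITY on n bits is the monomial x_1 * ... * x_n
coeffs = fourier_coefficients(parity, 3)
degree = max(len(S) for S in coeffs)
print(coeffs, degree)  # {(0, 1, 2): 1.0} and degree 3
```

Parity having full degree $n$ is exactly why it needs $n/2$ quantum queries: by the polynomial method, $Q_\epsilon(f)$ is at least half the (approximate) degree.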

Converse to the polynomial method

We now have a method for proving lower bounds for quantum query algorithms by using polynomials. A natural question is whether the polynomial method has a converse, that is, whether a degree-$2t$ polynomial leads to a $t$-query quantum algorithm. This would in turn give a complete characterization of quantum query algorithms. Unfortunately, Ambainis showed in 2006 that this is not the case, by proving that for infinitely many $n$, there is a function $f$ with $deg_\epsilon(f) \leq n^\alpha$ and $Q_\epsilon (f) \geq n^\beta$ for some positive constants $\beta > \alpha$. Hence the approximate degree is not, in general, a precise measure of quantum query complexity.

In view of these negative results, the question that remains is: is there some refinement of the approximate degree that approximates $Q_\epsilon(f)$ up to a constant factor? Aaronson et al. tried to answer this question around 2016 by introducing a refined degree measure, called the block-multilinear approximate degree and denoted by $bm\text{-}deg_\epsilon(f)$, which comes from polynomials with a so-called block-multilinear structure. This refined degree lies between $deg_\epsilon(f)$ and $2Q_\epsilon(f)$, which raises the question of how well it approximates $Q_\epsilon(f)$. Once again, it was later shown that for infinitely many $n$, there is a function $f$ with $bm\text{-}deg_\epsilon(f) = O(\sqrt{n})$ and $Q_\epsilon(f) = \Omega(n)$, ruling out a converse to the polynomial method based on $bm\text{-}deg_\epsilon(f)$ and leaving the question open until now, when it was answered by Arunachalam et al., who gave a new notion of polynomial degree that tightly characterizes quantum query complexity.

### Characterization of quantum algorithms

In a few words, it turns out that $t$-query quantum algorithms can be fully characterized using the polynomial method if we restrict the set of degree-$2t$ polynomials to forms that are completely bounded. A form is a homogeneous polynomial, that is, a polynomial whose non-zero terms all have the same degree, e.g. $x^2 + 3xy + 2y^2$ is a form of degree $2$. The notion of completely bounded involves a very specific norm, the completely bounded norm (denoted by $\|\cdot\|_{cb}$), which was originally introduced in the general context of tensor products of operator spaces. But before we venture into this norm, which involves symmetric tensors and other norms, let us state the main result of the characterization of quantum query algorithms.

Let $\beta: \{-1,1\}^n \to [-1,1]$ be a function and $t$ a positive integer. Then the following are equivalent.

1. There exists a form $p$ of degree $2t$ such that $\|p\|_{cb} \leq 1$ and $p((x,\boldsymbol{1})) = \beta(x)$ for every $x \in \{-1,1\}^n$, where $\boldsymbol{1}$ is the all-ones vector.

2. There exists a $t$-query quantum algorithm that returns a random sign with expected value $\beta(x)$ on input $x \in \{-1,1\}^n$.

In short, if we find a form of degree $2t$ which is completely bounded ($\|p\|_{cb} \leq 1$) and approximates a function $f$ that we are trying to compute, then there is a quantum algorithm which makes $t$ queries and computes $f$. Hence we have a characterization of quantum algorithms in terms of completely bounded forms. But we still haven’t talked about the norm itself, which we should do now. It will involve a lot of definitions, some extra norms and a bit of C*-algebra, but fear not, we will go slowly.

### The completely bounded norm

For $\alpha \in \{0,1,2, \dotsc \}^{2n}$, we write $|\alpha| = \alpha_1 + \dotsc + \alpha_{2n}$. Any form $p$ of degree $t$ can be written as

$p(x) = \sum_{|\alpha| = t} c_{\alpha} x^\alpha$,

where $c_{\alpha}$ are real coefficients. The first step towards the completely bounded norm of a form is to define the completely bounded norm of a tensor, and the tensor we use is the symmetric $t$-tensor $T_p$ defined as

$(T_p)_\alpha = c_\alpha/|\{\alpha\}|!$,

where $|\{\alpha\}|!$ denotes the number of distinct index sequences $(i_1, \dotsc, i_t)$ that realise the monomial $x^\alpha$, i.e. the multinomial coefficient $t!/(\alpha_1! \dotsm \alpha_{2n}!)$. Dividing by this count spreads each coefficient evenly over all orderings of its indices, so that $T_p$ is symmetric and satisfies $T_p(x, \dotsc, x) = p(x)$.
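As a sanity check on this construction, here is a short sketch in our own notation (assuming the denominator is the multinomial coefficient counting the orderings of the indices of each monomial): we build $T_p$ from the coefficients of a form and verify that evaluating the tensor on $(x, \dotsc, x)$ recovers $p(x)$.

```python
# Build the symmetric t-tensor T_p of a degree-t form p and check
# that T_p(x, ..., x) = p(x). Monomials are given as sorted index
# tuples of length t, e.g. x0*x1 in a degree-2 form is (0, 1).
from itertools import product
from math import factorial, prod

def symmetric_tensor(coeffs, n, t):
    """coeffs: {sorted index tuple of length t: coefficient}."""
    T = {}
    for seq in product(range(n), repeat=t):
        mono = tuple(sorted(seq))
        if mono in coeffs:
            # number of distinct orderings of the multiset `mono`
            counts = {i: mono.count(i) for i in set(mono)}
            orderings = factorial(t) // prod(factorial(c) for c in counts.values())
            T[seq] = coeffs[mono] / orderings
    return T

def eval_tensor(T, x):
    return sum(entry * prod(x[i] for i in seq) for seq, entry in T.items())

# p(x) = x0^2 + 3 x0 x1 + 2 x1^2, the degree-2 form from the example above
coeffs = {(0, 0): 1.0, (0, 1): 3.0, (1, 1): 2.0}
T = symmetric_tensor(coeffs, n=2, t=2)

x = (1.5, -2.0)
p_x = x[0]**2 + 3 * x[0] * x[1] + 2 * x[1]**2
assert abs(eval_tensor(T, x) - p_x) < 1e-9
```

Note how the cross term gets split: the coefficient $3$ of $x_0 x_1$ becomes $3/2$ on each of the entries $(0,1)$ and $(1,0)$, which is exactly what makes the tensor symmetric.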

The relevant norm of $T_p$ is given in terms of an infimum over decompositions of the form $T_p = \sum_\sigma T^\sigma \circ\sigma$, where $\sigma$ is a permutation of the set $\{1,\dotsc,t\}$ and $(T^\sigma\circ\sigma)_\alpha = T^\sigma_{\sigma(\alpha)}$ is the permuted element of the multilinear form $T^\sigma$. The completely bounded norm of $p$ is then defined in terms of $T_p$ via

$\|p\|_{cb} = \text{inf }\{\sum_\sigma \|T^\sigma\|_{cb} : T_p = \sum_\sigma T^\sigma\circ\sigma\}$.

Just to recap: from the coefficients of the polynomial $p$ we define the symmetric $t$-tensor $T_p$, which is then decomposed into a sum of permuted tensors (multilinear forms) $T^\sigma$. We then define the completely bounded norm of $p$ as the infimum of the sum of the completely bounded norms of these tensors, now without the permutations. Of course, we haven’t yet defined the completely bounded norm of such a tensor, that is, what is $\|T^\sigma\|_{cb}$? We will explain it now.

The idea is to take collections of $d \times d$ unitary matrices $U_1(i), U_2(i), \dotsc, U_t(i)$ for $i \in \{1, \dotsc ,2n\}$ and consider the quantity

$\|\sum_{i,j,\dotsc,k} T^\sigma_{i,j,\dotsc,k} U_1(i)U_2(j)\dotsc U_t(k)\|$.

We multiply the unitaries from these collections and sum them using the entries of the tensor $T^\sigma$ as weights, and then take the norm of the resulting quantity. But here we are using a different norm, the usual operator norm, defined for a given operator $A$ as $\|A\| = \text{inf }\{c\geq 0 : \|Av\| \leq c\|v\| \text{ for all vectors } v\}$. Finally, with these ingredients in hand, we can define the completely bounded norm of $T^\sigma$, which is just the supremum over the positive integer $d$ and the unitary matrices, that is,

$\|T^\sigma\|_{cb} = \text{sup }\{\|\sum_{i,j,\dotsc,k} T^\sigma_{i,j,\dotsc,k} U_1(i)U_2(j)\dotsc U_t(k)\| : d \in \mathbb{N}, d \times d \text{ unitary matrices } U_i\}$.

Taking the supremum of this norm over the dimension $d$ and over the unitary matrices themselves gives the completely bounded norm of the tensor $T^\sigma$, and from this we get the completely bounded norm of the associated form $p$. Is there a degree-$2t$ form with $\|p\|_{cb} \leq 1$ that approximates the function $f$ we want to compute? If yes, then there is a $t$-query quantum algorithm that computes $f$.
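The supremum over all dimensions $d$ cannot be computed directly, but any particular choice of unitaries gives a lower bound on it. The following numerical sketch (our own code, with our own helper names, not from the paper) samples random unitaries for a small $3$-tensor with a single non-zero entry; since a product of unitaries is itself unitary, every sample here has operator norm exactly $1$, so the completely bounded norm of this toy tensor is $1$.

```python
# Lower-bound the cb norm of a small tensor by sampling random
# d x d unitaries and evaluating || sum_ijk T_ijk U1(i)U2(j)U3(k) ||.
import numpy as np

rng = np.random.default_rng(0)

def random_unitary(d):
    # QR decomposition of a complex Gaussian matrix gives a random
    # unitary; the phase fix makes the distribution uniform (Haar).
    z = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    q, r = np.linalg.qr(z)
    return q * (np.diag(r) / np.abs(np.diag(r)))

def cb_lower_bound(T, d, trials=50):
    n, t = T.shape[0], T.ndim
    best = 0.0
    for _ in range(trials):
        Us = [[random_unitary(d) for _ in range(n)] for _ in range(t)]
        M = np.zeros((d, d), dtype=complex)
        for idx in np.ndindex(*T.shape):
            prod_U = np.eye(d, dtype=complex)
            for k, i in enumerate(idx):
                prod_U = prod_U @ Us[k][i]
            M += T[idx] * prod_U
        best = max(best, np.linalg.norm(M, 2))  # operator norm
    return best

T = np.zeros((2, 2, 2))
T[0, 1, 0] = 1.0  # single non-zero entry: the weighted sum is one
                  # product of unitaries, so the norm is always 1
print(cb_lower_bound(T, d=2))
```

For general tensors the sampled value only ever lower-bounds the supremum, which is part of what makes the completely bounded norm hard to work with directly.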

### The proof

Let us briefly explain the proof of the characterization of quantum algorithms that we stated above. It involves three main ingredients. The first is a theorem by Christensen and Sinclair showing that complete boundedness of a multilinear form is equivalent to a nice decomposition of that form. In other words, for a multilinear form $T$, we have that $\|T\|_{cb} \leq 1$ if and only if we can write $T$ as

$T(x_1,\dotsc,x_t) = V_1\pi_1(x_1)V_2\pi_2(x_2)V_3\dotsm V_t\pi_t(x_t)V_{t+1}$,                                                         (1)

where the $V_i$ are contractions ($\|V_ix\| \leq \|x\|$ for every $x$) and the $\pi_i$ are *-representations (linear maps that preserve multiplication, $\pi_i(xy) = \pi_i(x)\pi_i(y)$).

The second ingredient gives an upper bound on the completely bounded norm of a certain linear map if it has a specific form. More specifically, if $\sigma$ is a linear map such that $\sigma(x) = U\pi(x)V$, where $U$ and $V$ are also linear maps and $\pi$ is a *-representation, then $\|\sigma\|_{cb} \leq \|U\|\|V\|$.

The third ingredient is the famous Fundamental Factorization Theorem, which “factorizes” a linear map in terms of other linear maps whose norms are controlled by its completely bounded norm. In other words, if $\sigma$ is a linear map, then there exist linear maps $U$ and $V$ with $\|U\|\|V\| \leq \|\sigma\|_{cb}$ such that, for every matrix $M$, we have $\sigma(M) = U^\ast(M\otimes I)V$.

With these ingredients, they proved an alternative to the decomposition of equation (1), which was then used to construct a quantum circuit implementing the tensor $T_p$ with $t$ queries; they then showed that this tensor $T_p$ matched the initial form $p$. This alternative decomposition is

$T(x_1,\dotsc,x_t) = u^\ast U_1^\ast (\text{Diag}(x_1)\otimes I)V_1 \dotsc U_t^\ast (\text{Diag}(x_t)\otimes I)V_tv$,                              (2)

where $U_i, V_i$ are contractions, $u,v$ are unit vectors, and $\text{Diag}(x)$ is the diagonal matrix whose diagonal is $x$. The above decomposition is valid if, similarly to (1), $\|T\|_{cb} \leq 1$.

We can see that the decomposition involves the operator $\text{Diag}(x)$ interleaved with unitaries $t$ times. Having $\text{Diag}(x)$ as the query operator, it is then possible to come up with a quantum circuit implementing the decomposition (2). If we look more closely, decomposition (2) has unit vectors on both the left and right sides, which looks like an expectation value. So what is going on is that decomposition (2) can be used to construct a quantum circuit whose expectation value matches $T_p$, and hence the polynomial $p$ used to construct $T_p$. We won’t get into much detail, but we will leave the quantum circuit so that the reader can have a glimpse of what is going on.

In the above figure, representing a quantum circuit that has $T_p$ as its expectation value, the registers C, Q, W denote the control, query and workspace registers. The unitaries $W_i$ are defined by $W_i = V_{i-1}U_i^\ast$, and the unitaries $U, V$ have $W_1V_0 u$ and $W_{2t+1}U_{2t+1}v$ as their first rows, where $V_0$ and $U_{2t+1}$ are isometries ($A$ is an isometry iff $\|Ax\| = \|x\|$ for every $x$).

### Application of the characterization: separation for quartic polynomials

A couple of years ago, in 2016, Aaronson et al. showed that the bounded norm (and hence also the completely bounded norm that we spent some time describing) suffices to characterize quadratic polynomials. More specifically, they showed that for every bounded quadratic polynomial $p$, there exists a one-query quantum algorithm that returns a random sign with expectation $Cp(x)$ on input $x$, where $C$ is an absolute constant. This readily prompted the question of whether the same holds for higher-degree polynomials, that is, whether the bounded norm suffices to characterize quantum query algorithms.

As you might expect by now, the answer is no, since it is the completely bounded norm that suffices, and Arunachalam et al. used their characterization to give a counterexample for bounded quartic polynomials. What they showed is the existence of a bounded (but not completely bounded) quartic polynomial $p$ such that any two-query quantum algorithm whose expectation value is $Cp(x)$ must have $C = O(n^{-1/2})$, showing that $C$ cannot be an absolute constant. They did this by taking a random cubic form that is bounded but whose completely bounded norm is $\mathrm{poly}(n)$, and embedding this form into a quartic form.

### Conclusions

The characterization they found is, in my opinion, quite interesting beyond the obvious point that it answers the open question of the converse to the polynomial method: the polynomials we should look at are exactly those bounded in a particular norm, the completely bounded norm. What makes it interesting is this very connection between quantum query algorithms and the completely bounded norm, a norm first introduced, as mentioned before, in the general context of tensor products of operator spaces. The norm itself looks quite exotic and complicated, which is linked to a question Scott Aaronson asked at the end of Arunachalam’s talk at QIP 2019, along the lines of: “sure, we could define a particular norm as the one that restricts the set of polynomials to those that admit a converse to the polynomial method” (of course, such a norm would not be a very interesting one), “so in what way is this completely bounded norm different from that?”. If I remember correctly, Arunachalam answered along the lines that the completely bounded norm first appeared in a completely different context. But I still find it surprising that you can go through all the exotic definitions we mentioned above for the completely bounded norm and discover that it is exactly what is needed for the converse of the polynomial method.