A duplication-free quantum neural network for universal approximation

Xiaokai Hou Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610051, China    Guanyu Zhou zhoug@uestc.edu.cn Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610051, China    Qingyu Li Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610051, China    Shan Jin Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610051, China    Xiaoting Wang xiaoting@uestc.edu.cn Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610051, China
Abstract

The universality of a quantum neural network refers to its ability to approximate arbitrary functions and is a theoretical guarantee for its effectiveness. A non-universal neural network could fail in completing the machine learning task. One proposal for universality is to encode the quantum data into identical copies of a tensor product, but this will substantially increase the system size and the circuit complexity. To address this problem, we propose a simple design of a duplication-free quantum neural network whose universality can be rigorously proved. Compared with other established proposals, our model requires significantly fewer qubits and a shallower circuit, substantially lowering the resource overhead for implementation. It is also more robust against noise and easier to implement on a near-term device. Simulations show that our model can solve a broad range of classical and quantum learning problems, demonstrating its broad application potential.

I Introduction

Machine learning (ML) is a powerful data-analyzing tool that has generated a series of impactful results in image recognition [1, 2, 3], natural language processing [4, 5, 6], self-driving [7, 8, 9], etc. Meanwhile, quantum machine learning has become an emerging interdisciplinary subject combining machine learning with quantum computing. It studies two fundamental questions [10], one on applications of classical ML to quantum problems [11, 12, 13, 14, 15, 16], and the other on implementations of ML algorithms on a quantum processor. Recent works point out that implementing ML algorithms on a quantum computer to process quantum data requires exponentially fewer experiments than dealing with the same task on a classical computer [17]. In addition, it has been shown that many ML algorithms can be implemented on a quantum machine with certain levels of advantage, ranging from data fitting [18], support vector machine [19], nearest neighbors [20], principle component analysis [21], and linear regression [22], to PageRank algorithm [23], reinforcement learning [24, 25, 26], anomaly detection [27] and Boltzmann machine [28]. For other ML topics, especially the quantum neural networks (QNNs), the existence of quantum advantage is largely unknown [29], although an interesting result on the prediction advantage for QNNs has been recently achieved [30].

A neural network (NN) is a parameterized composite mapping comprised of activation functions and is very powerful in data fitting. A QNN is an NN implemented on a quantum computer [31] and there are various proposals to design it, such as the variational QNN [32, 33]. Besides quantum advantage, designing an effective and efficient QNN is also important. Not every QNN design effectively solves the given ML problem, and the concept of universal approximation is introduced to explain why some neural networks fail to work [34, 35]. Universal approximation, or the universality of an NN, refers to its ability to approximate arbitrary functions. For classical NNs, universality is easy to achieve; for QNNs, a smart design is required to make them universal and practical. In fact, the universality of NNs is closely related to the nonlinear activation functions, and the key point of designing a universal QNN is to generate the desired nonlinearity for function fitting. One method is to construct the quantum neurons, which are the building blocks of some QNNs [36, 37, 38, 39, 40, 41]; the other method is to construct a variational QNN from a parameterized quantum circuit, and the nonlinearity can be achieved by duplicating the quantum data into a tensor product of multiple copies [42, 43]. Examples of the latter method include the quantum circuit learning (QCL) algorithm [32] and the circuit-centric quantum classifiers (CCQ) algorithm [33]. So far, the universality of the QCL algorithm has been proved using the Weierstrass theorem, while the universality of the CCQ algorithm remains open. For both QCL and CCQ, approximating a highly-nonlinear function would require a tensor product of many data-encoding subsystems, resulting in a large overall system size with a considerable circuit complexity, conflicting with the principle of NISQ computing where a relatively small quantum system with a shallow circuit is preferred.

To address this problem, we propose a duplication-free quantum neural network (DQNN) based on a variational quantum circuit and rigorously prove its universality. Our model utilizes the classical sigmoid function to generate nonlinearity without duplicating the quantum data into a tensor product of multiple copies. Compared with the CCQ or QCL algorithms, our DQNN significantly reduces the system size and gate complexity, and hence the overall noise effect. Therefore, it is more likely to be implemented on near-term devices. Numerical simulations show that our DQNN outperforms the other two variational QNN algorithms with better performance on typical regression and classification problems and is more robust against noise. In addition, through solving a broad range of classical and quantum learning problems, our model has well demonstrated its wide application potential.

II Two types of designs for universal approximation

Before we discuss the DQNN design, let’s briefly review two common types of designs for the universal approximation of a classical NN. It turns out that universality is closely related to different kinds of convergence. Type I approximation relies on pointwise convergence; a typical example is the polynomial approximation, using the Weierstrass theorem and polynomials to approximate arbitrary functions; Type II approximation depends on the convergence in Lebesgue integration, e.g., L2-norm convergence. For polynomial approximation, there are several limitations. First, the Weierstrass theorem is only valid on a compact set, while many NN problems are defined on unbounded set, on which polynomials could fail to approximate a Lebesgue integrable function. For example, the function f(x)=ex defined on [0,+) can not be well approximated by polynomials since the latter diverges to ± while ex vanishes as x. Second, polynomial approximation has a numerical problem called Runge’s phenomenon [44] when the polynomial order becomes very large, which is why using high-order polynomials for function fitting is not favored in practical use. Due to these limitations, polynomials are ineffective in designing universal NNs. It has been proved that the necessary and sufficient condition for a universal feedforward NN is that its activation functions are not polynomial [35]. In comparison, Type II approximation utilizes basis functions such as the hat function in finite-element method [45] or the sigmoid/ReLU function composite with linear function in neural networks [46]. Usually, the optimal approximation is the solution to a minimization problem, and the error is evaluated in a suitable function norm. In particular, universal NNs based on L2-norm approximation are widely applied and found to be very effective. Such examples include the ones with activation functions, such as the sigmoid or the ReLU functions. Hence, this work aims to discuss how to construct a universal and effective QNN based on L2 approximation, using the trick analogous to the sigmoid function instead of the polynomials. We expect that such QNN is effective in a broad range of applications compared to the QNN based on polynomial approximation.

III DQNN and its universality

Mathematically, a variational QNN is a mapping q𝜽:𝒙dq𝜽(𝒙) built on a parameterized quantum circuit U(𝜽). It involves a hybrid quantum-classical algorithm that utilizes a quantum processor to generate the output q𝜽(𝒙) for each input 𝒙, and a classical processor to optimize 𝜽 to solve learning problems. Specifically, given an unknown function f:𝒙f(𝒙), and a data set D={(𝒙(i),y(i))}i=1m, where y(i)=f(𝒙(i)) is the label of 𝒙(i), the aim of QNN is to find an optimal 𝜽* such that q𝜽* best approximates the target function f. Our DQNN model consists of three parts, a quantum processor, a classical processor and a classical optimizer, as illustrated in Fig. 1(a). First, each data point 𝒙=[x1,x2,,xd]Td is encoded into a state of an n-qubit register (n=logd):

|𝒙¯=1γ[x1,x2,,xd,x~,0,0]T2n, (1)

using the amplitude encoding method [47], where x~𝒙1+𝒙 is a padding term chosen by the user and γ is a normalization factor (Appendix A). After initial state preparation, we apply to |𝒙¯ a series of variational quantum circuits {U(j)(𝜽(j))}j=1ncir according to |𝒙¯f(j)=U(j)(𝜽(j))|𝒙¯, followed by measurements of a set of observables {Bi}i=1nobs, to obtain the outcomes Bi𝜽(j),¯x=Tr(|𝒙¯f(j)𝒙¯f(j)|Bi). The choices of {Bi} are not unique and they can be chosen randomly from the generalized Pauli basis {Pl}l=14n where Pl=A1(l)A2(l)An(l) with Ak(l){I,X,Y,Z}, and {X,Y,Z} are the Pauli matrices. U(j)(𝜽(j)) comprises L-layers of quantum circuits. Each layer consists of n parameterized R-rotations (with one on each qubit), and n parameterized controlled-R gates, as shown in Fig. 1, with

R=R(θ1,θ2,θ3)=(eiθ2cosθ1eiθ3sinθ1eiθ3sinθ1eiθ2cosθ1).

Notice that the order of the 2n gates in each layer is not unique and can be chosen randomly [33]. Next, based on Bi𝜽(j),¯x and the sigmoid function σ(x)=1/(1+ex), the classical processor computes and obtains the output:

q𝜽,𝒂,𝒄,𝜶(𝒙¯)j=1nciri=1nobsαi(j)σ(ai(j)(Bi𝜽(j),𝒙¯ci(j))) (2)

with ai(j)>0, ci(j)[0,1] where 𝜽 and (𝒂,𝒄,𝜶) are parameters to be trained. The entire process from |𝒙¯ to q𝜽,𝒂,𝒄,𝜶(𝒙¯) is summarized in Fig. 1. Finally, one can find the optimal values of (𝜽,𝒂,𝒄,𝜶) to solve the given learning problem, through gradient-based optimization using algorithms such as SGD [48], ADAM [49] or BFGS [50]. The benefit of such construction is that DQNN can be proved to be universal based on L2 approximation, which is summarized in the following theorem:

Theorem 1.

Let G¯ be a subset of the complex sphere 𝕊 in 2n, and f(𝐱¯):G¯ be an arbitrary square-integrable function on G¯. We define the following parameterized functions:

q𝒛1,,𝒛ns,𝒂,𝒄,𝜶(𝒙¯)j=1nsαjσ(aj(|𝒙¯|𝒛j|2cj)) (3)

and denote Q(G¯){q𝐳1,,𝐳ns,𝐚,𝐜,𝛂} as the set of all such functions, where ns, {𝐳j}j=1ns𝕊, 𝐚+ns, 𝐜[0,1]ns, and 𝛂ns. Then Q(G¯) is dense in L2(G¯) in the following sense: for any ϵ>0,

G¯|q𝒛1,,𝒛ns,𝒂,𝒄,𝜶(𝒙¯)f(𝒙¯)|2𝑑μ<ϵ. (4)

The detailed proof of Theorem 1 can be found in Appendix B. It turns out that any q𝒛1,,𝒛ns,𝒂,𝒄,𝜶Q(G¯) in Eq. (3) can be generated as the output of the DQNN in Eq. (2). Specifically, given the parameters (𝒛1,,𝒛ns,𝒂,𝒄,𝜶), we can design a DQNN with a single observable B|bb|, nobs=1, and a series of quantum circuits {U(j)(𝜽(j))}j=1ns such that U(j)(𝜽(j))|b=|zj. Then the output of the DQNN in Eq. (2) is reduced to:

q𝜽,𝒂,𝒄,𝜶(𝒙¯) =j=1nsαjσ(aj(B𝜽(j),𝒙¯cj))
=j=1nsαjσ(aj(|𝒙¯|𝒛j|2cj))
=q𝒛1,,𝒛ns,𝒂,𝒄,𝜶(𝒙¯). (5)
Refer to caption
(a) large
Refer to caption
Refer to caption
Figure 1: (a) The framework of the DQNN consists of three parts, a quantum processor, a classical processor and a classical optimizer. Measurement results of a quantum processor are the inputs of a classical processor. The parameters in the quantum and classical processors are updated by a classical optimizer. (b) One layer variational quantum circuit of U(j)(𝜽(j)). (c) The transformation from |𝒙¯ to q𝜽,𝒂,𝒄,𝜶(𝒙¯). Yellow balls represent the quantum qubits; green circles represent the classical sigmoid function.

Thus we have proved the following corollary:

Corollary 1.

The DQNN designed in Fig. 1 (c) with the output in Eq. (2) is universal.

Notice that choosing multiple {U(j)(𝜽(j))}j=1ncir and a single observable B is only a sufficient condition to prove the universality in Corollary 1. In practice, we often design a DQNN with a single U(𝜽) (ncir=1) and a set of {Bi} as the first trial to solve the given problem. If it is not good enough, we then construct a new DQNN with a larger ncir. It turns out a DQNN with ncir=1 is often sufficient for most learning problems we have studied. In addition, in the proof of Corollary 1, we have assumed that the variational circuits {U(j)(𝜽(j))} with U(j)(𝜽(j))|b=|zj can be generated by our designed multi-layer variational circuits as shown in Fig. 1 (b). More layers in U(j)(𝜽(j)) will make it more powerful to approximate arbitrary unitary gate, but will also make the optimization process computationally more expensive. Hence, in practice, we often construct a DQNN starting from a single U(𝜽) consisting of one or two layers. In the following, we will study the performance of DQNN with different values of ncir, and for simplicity, we will denote the corresponding network as DQNNncir.

IV Performance comparison among QNN models

In Section II, we have mentioned that universality serves as a necessary condition to build an effective QNN, and among the two types of universal design, an NN with L2 convergence using sigmoid basis functions works more effectively in a broader range of applications. As shown above, the nonlinearity and the universality of DQNN originates essentially from the L2 convergence of sigmoids functions, which is distinctive from other polynomial-approximation-based QNN models, including the CCQ [33] and the QCL [32], whose nonlinearity is mainly generated by duplicating the same data-encoding state into a tensor product of multiple subsystems. In this way, DQNN has significantly reduced the number of qubits required to construct a universal QNN. In the following, we will demonstrate the advantages of DQNN by comparing its performance with those of the two well-established QNN models, the CCQ and the QCL.

For an ntot-qubit QNN model, we define its complexity as Cngatenm, where ngate denotes the number of quantum gates and nm denotes the number of quantum measurements. Notice that given a measurement precision, nm is proportional to the number of measurement observables nobs. Hence, with a further assumption that the QNN variational circuit can be efficiently generated, i.e., ngate=O(poly(ntot), the overall complexity of the QNN then becomes C=O(poly(ntot)nobs). We hope to figure out the numbers of qubits and observables required to approximate an M-order polynomial function f(𝒙)=i=0Mci𝒙i with 𝒙=1 using CCQ, QCL and DQNN respectively. We denote by ndata the number of qubits in the data-encoding register, and by ncopy the number of copies of the data-encoding register used in a QNN model. To approximate such f, CCQ requires ncopy=M2 duplicates of the data-encoding register |𝒙¯=1γi=1d+1xi|i such that the amplitude of the input state, |ψ𝒙=|𝒙¯M2, is an M2-order polynomial of 𝒙. After applying a circuit U(𝜽), we measure the expectation values of a given observable B according to B𝜽,𝒙=ψ𝒙|U(𝜽)BU(𝜽)|ψ𝒙. The output of the CCQ is an M-order polynomial of 𝒙. Then the goal of CCQ is to find the best value 𝜽=𝜽* through optimization such that 𝒙B𝜽*,𝒙 is the optimal function to approximate 𝒙f(𝒙). The number of qubits in the data-encoding register is ndata=logd and the number of observables is nobs=1. Hence the total complexity of CCQ is equal to CCCQ=O(poly(Mlogd)). For QCL, a d-dimensional classical data 𝒙 is encoded into a d-qubit data-encoding register as ρ(𝒙)=12di=1d(I+xiX+1xi2Z]) using several single-qubit rotation gates. In order to generate an M-order polynomial, ρ(𝒙) is duplicated into M identical copies in a tensor product as ρin(𝒙)=j=1Mρ(𝒙). Applying a circuit U(𝜽) to ρin, we measure the expectation value of a given observable B. The outcome B𝜽,𝒙 is an M-order polynomial of 𝒙. Through optimization, we can find the best value 𝜽*=𝜽 such that 𝒙B𝜽*,𝒙 will well approximate 𝒙f(𝒙). The total number of qubits in QCL is ntot=Md and the number of observables is nobs=1. Hence the total complexity of QCL is CQCL=O(poly(Md)).

Different from the above two duplication-based QNN models, DQNN only uses logd qubits to encode 𝒙 as |𝒙¯=1𝒙i=1d+1xi|i. We sequentially apply a set of variational quantum circuits {U(j)(𝜽(j))}j=1ncir to |𝒙¯ and measure the expectation Bi𝜽(j),𝒙¯=𝒙¯|U(j)(𝜽(j))BiU(j)(𝜽(j))|𝒙¯ for a set of chosen observables {Bi}i=1nobs. Next we apply the parameterized classical sigmoid function σ and derive the final output q𝜽,𝒂,𝒄,𝜶. Through the optimization, we can find the optimal values (𝜽*,𝒂*,𝒄*,𝜶*) to approximate the function f. The number of measurement observables nobs is determined by f. Hence the total complexity of DQNN is CDQNN=O(poly(logd)nobs). The comparison is summarized in Table 1. It can be seen that for large d and large M, our DQNN can significantly reduce the total complexity to generate nonlinearity. More details can be found in Appendix C.

Algorithm ncopy ndata Complexity
QCL M d O(poly(Md))
CCQ M2 logd O(poly(Mlogd))
DQNN 1 logd O(poly(logd)nobs)
Table 1: The number of required qubits and the circuit complexity among three proposals to approximate an M-order polynomial function f of 𝒙.
Refer to caption
Refer to caption
Refer to caption
Figure 2: (a) The regression data set generated by a polynomial function. (b) The classification data set with the ring-shaped boundaries. The points in the yellow part are labeled 0; the others are labeled 1. (c) The approximation process of the DQNN1 with one variational quantum circuit and 4 observables in solving the regression problem based on the dataset in (a). In the task, one more sigmoid function (the rightmost node) is added into the DQNN1 to transform the output of DQNN1 into a range (0,1).

To demonstrate that our DQNN has advantage in solving classical ML problems and in reducing circuit complexity, we apply DQNN, QCL, and CCQ to two typical supervised learning problems, a regression problem and a classification problem, and compare their performance. The first problem is learning from a data set {𝒙(i),y(i)} to approximate a highly nonlinear polynomial f(𝒙)=(0.71561.0125x12+x14)(0.71561.0125x22+x24) with x1,x2[0.8,0.8]2 (Fig. 2). For input 𝒙(i), the output of the QNN is denoted as q(i). One can define the relative error ϵ1Ni|q(i)y(i)y(i)| to measure the performance of the QNN in solving regression problems. A tiny relative error implies that the QNN approximates the target function f well. First, we run the hybrid optimization process for each model based on the data set. To show the effectiveness of the three QNN models, we choose different values of the number of layers for each model so that their circuit complexity C is similar in size. We use the same definition of ncopy as in the above section. Then, we compare the optimal relative errors achieved for each model. We choose two types of DQNN to solve the regression problem. One has a single-layer variational quantum circuit and multiple observables. We name it DQNN1. The other has 4 single-layer variational quantum circuits and one observable. We name it DQNN4. We find that, with no duplicate, ncopy=1, the relative error achieved by DQNN is substantially lower than those achieved by QCL and CCQ, as shown in Table 2(a). If we add one more duplicate to both the QCL and the CCQ models, their optimal relative error will decrease, but their complexity will increase and surpass that of the DQNN. In addition, we implement such comparison for a few other examples, and simulation results suggest that DQNN can prominently reduce the circuit complexity to approximate highly nonlinear functions compared with QCL and CCQ. The approximation process of the DQNN1 with 4 observables is shown in Fig. 2, and more details of the comparison can be found in Appendix D.

We apply the three models to a binary classification problem on a ring-shaped data set in the second task. The boundaries of the two sub-datasets are determined by six curves x12+x22=0.16, x12+x22=0.81, and x1=x2=±1 (Fig. 2). One can use classification accuracy to quantify the performance of QNN model in solving classification problems, which is defined as ϵ1Ni1δ(q(i)y(i)) where q(i) is the output of QNN, y(i) represents the label of 𝒙(i) and δ is the Dirac delta function [51]. Analogous to the first task, we choose the DQNN with one variational quantum circuit and multiple observables (DQNN1) and the DQNN with 4 variational quantum circuits and one observable (DQNN4) for the classification task. We choose appropriate values for ncopy and nlay, so that the circuit complexity C of the three models is similar in size. After optimizing the three QNN models, we find that accuracy achieved by the DQNN is substantially higher than those achieved by QCL and CCQ. In addition, increasing the number of duplicates does help QCL and CCQ to improve their optimal accuracy, but even a 5-copy model (with 12 qubits) for QCL or CCQ cannot perform equally well as a DQNN with only two qubits (Table 2(b)). Next, we implement such a comparison for a few other examples. Simulation results suggest that DQNN can prominently reduce the circuit complexity compared to QCL and CCQ, but maintain a good performance in solving classification problems.

Model ndata Copy number (ncopy) # Qubits (ntol) # Layers (nlay) # Observables (nobs) C Relative error
DQNN1 2 1 2 1 4 48 6.79%
DQNN4 2 1 2 1 1 48 5.46%
QCL 2 1 2 4 1 48 8.21%
CCQ 2 1 2 4 1 51 12.92%
QCL 2 2 4 5 1 120 7.35%
CCQ 2 2 4 4 1 99 4.74%
(a) Regression problem to approximate f(𝒙)=(0.71561.0125x12+x14)(0.71561.0125x22+x24)
Model ndata Copy number (ncopy) # Qubits (ntol) # Layers (nlay) # Observables (nobs) C Accuracy
DQNN1 2 1 2 1 20 240 91.40%
DQNN4 2 1 2 1 1 48 97.63%
QCL 2 2 4 5 2 240 74.18%
CCQ 2 2 4 10 1 243 75.20%
QCL 2 6 12 10 2 1440 76.93%
CCQ 2 6 12 10 1 723 80.63%
(b) A classification problem defined on a ring-shaped dataset
Table 2: A comparison of the performance among DQNN, QCL and CCQ in two classical ML problems. ndata indicates the number of qubits to store the classical data. The total number of qubits in each QNN model is equal to ntot=ndatancopy. We use Cngatenobs to represent the circuit complexity where ngate is the number of quantum gates in the variational quantum circuit and nobs is the number of observables. For QCL and CCQ, ncopy, nlay are free to choose; for DQNN, nlay and nobs are free to choose. We choose two types of DQNN in the simulations: DQNN1 contains one circuit and multiple observables, while DQNN4 contains four circuits and one observable. All of the results are obtained by averaging 10 independent simulations.

Since the DQNN can reduce the circuit complexity, we expect that it can achieve a better performance in the presence of noise, and hence is easier to be implemented on NISQ devices. We will consider two types of noise in this work: one is the coherent noise caused by the imprecision of the classical control on the parameter values in the QNN circuits; the other is the decoherence generated by interactions between the quantum register and its environment. We add a coherent noise ϵj to the j-th parameter, θj, in the variational circuit according to θ~j=θj+ϵj when we train the three QNN models, and test their performance under the noisy environment. The coherent noise ϵj is generated by sampling from a Gaussian distribution, N(0,Δ2) where Δ indicates the noise intensity. It can be seen from Fig. 3 and 3 that for the regression task, the DQNN achieves lower relative error than QCL and CCQ when we increase the coherent noise. Meanwhile, for the classification task (as shown in Fig. 3), DQNN is more robust than the other two when the coherent noise increases. In addition to the coherent noise, we further apply the decoherence, ϵ, to the two-qubit control gates, UCR, contained in the quantum circuits according to ϵ(UCRρUCR)=(1p)UCRρUCR+pPiUCRρUCRPi where {Pi} indicates the complete Pauli basis and p represents the probability of decoherence occurrence. From Fig. 3, we can find that, in the regression task, the DQNN has a significantly lower relative error than QCL and CCQ as the probability of decoherence increases. These numerical results demonstrate that the DQNN can decrease the influence of noise accumulation in the training process and is more likely to be implemented on near-term devices.

Refer to caption
Refer to caption
Refer to caption
Refer to caption
Figure 3: Comparison of the robustness for three QNN models in solving the regression and classification problems. In the regression task, we use DQNN1, 2-qubit QCL and 2-qubit CCQ for the comparison. In the classification task, we use DQNN1, 4-qubit QCL and 4-qubit CCQ for the comparison. (a)-(b) Relative errors achieved by QCL, CCQ and DQNN as the coherent noise increases in the regression task. (c) The comparison among three QNNs in the classification task as the coherent noise increases. (d) The comparison among three QNNs in the regression task as the probability of decoherence occurrence increases.

Besides the comparison among three QNN models over two supervised learning problems, we further demonstrate the performance of the different QNNs in solving the typical real-world machine learning problem based on the handwritten digits data set, MNIST [52]. In this task, a QNN is first trained with 60,000 figures, each of which has a number ranging from 0 to 9, and then is tested with 10,000 new figures to generate the classification accuracy. A figure in the MNIST data set is reshaped to be a 256-dimensional vector and encoded into an 8-qubit quantum state by using the amplitude encoding method. Since the QCL needs 256 qubits to encode the classical data, it is intractable to implement the QCL in solving this task and we only compare the performance of the DQNN and the CCQ. For our DQNN, we use 20 2-layer variational circuits and one observable B=Z(1)Z(2)Z(8). The classification accuracy over 10,000 new figures can achieve 94.36%. For the CCQ, we use one variational quantum circuit and a set of POVM operators {Bj}j=110 to generate the final output. We adopt different layers for the variational quantum circuit, however, the performance of the CCQ is still significantly lower than the DQNN. For instance, the CCQ with a 20-layer variational quantum circuit only achieves 83.52% accuracy. The result demonstrates that the QNN without a strict universality guarantee will fail in solving machine learning problems.

V Comparison with classical neural networks

To demonstrate that our DQNN can perform at least as effectively as the classical NN, we apply the DQNN to solve three typical real-world classification problems and further compare its performance with the classical NN in the minimum parameters required the number of iterations to achieve a similar classification accuracy. Since our DQNN is essentially with a single hidden layer, we will choose a single-hidden-layer classical neural network for the comparison. On the handwritten digits data set, MNIST [52], we implement a binary classification (MNIST2) to distinguish the digits 0 and 1, and a 3-target classification (MNIST3) to recognize the digits 0, 1, 2. Each sample is reshaped into a 256-dimensional vector and taken as the input of DQNN and classical NN. Meanwhile, we implement two more classification tasks based on the Wine data set [53] and the Breast Cancer data set [53]. We use a 4-qubit state to store the 13-dimensional classical data for the Wine problem, and a 4-qubit state to store the 10dimensional data for the Breast Cancer problem. As shown in Table 3, the DQNN performs as well as the classic neural networks in discovering complex patterns hidden in real-world data sets. Notice that DQNN uses fewer parameters to achieve similar accuracy than the classical counterpart but is at the extra cost of required iterations in some cases.

Model Task NTrain NTest # Parameters # Iterations AccTrain AccTest
DQNN MNIST2 12665 2115 54 150 98.66% 99.05%
Classical NN MNIST2 12665 2115 259 26 98.54% 99.07%
DQNN MNIST3 18623 3147 495 254 98.61% 99.03%
Classical NN MNIST3 18623 3147 523 367 98.48% 99.02%
DQNN Wine 143 35 27 172 96.29% 100%
Classical NN Wine 143 35 83 230 97.04% 100%
DQNN Breast Cancer 560 39 34 282 94.46% 95.74%
Classical NN Breast Cancer 560 39 55 105 93.57% 95.89%
Table 3: A comparison of the minimum parameters required and the number of iterations to achieve a similar classification accuracy between DQNN and classical neural network on three typical real-world data sets. We use the ADAM algorithm to optimize the parameters. NTrain and NTest respectively denote the number of the training set and the test set. AccTrain and AccTest represent the classification accuracy on each data set.

VI Quantum phase recognition

Besides the classical ML problems, our DQNN model can solve ML problems on quantum data sets. In the following, we concentrate on a class of quantum phase recognition problems, the discrimination of the 2×2 symmetry-protected topological (SPT) phase on a S=1 Haldane chain [54]. The ground state of the Hamiltonian,

H= Ji=1N2Z(i)X(i+1)Z(i+2)h1i=1NX(i)
h2i=1N1X(i)X(i+1),

has three topological phases, the SPT phase, the paramagnetic phase and the antiferromagnetic phase, where h1 and h2 are parameters. Our goal is to identify whether a given state sampled from the phase diagram in Fig. 4 belongs to the SPT phase. To complete such a task, we take 20×20 equally spaced points from h1[0,1.6] and h2[1.6,1.6] as the training set and 64×64 equally spaced points as the test set. If the ground state belongs to the SPT phase, it is labeled [1,0]T; otherwise, it is labeled [0,1]T. We numerically implement the DQNN with a single 15-qubit variational quantum circuit with 420 parameters and 10 observables. After training, the accuracy of the DQNN on the test data achieves 99.10%. It indicates that our DQNN model has solved the quantum phase recognition problem with good performance. However, the training set to train the DQNN is larger than the training set to train the quantum convolutional neural network [55] for phase recognition task. The reason for the difference is that the DQNN used in the comparison is only a prototype, while the quantum convolutional neural network in [55] has been optimized for this task, with a stronger ability to analyze the local information of quantum states. Hence, as a future work, we hope to improve the DQNN design to optimize its performance for the given ML problem.

Refer to caption
Figure 4: The phase diagram of the S=1 Haldane chain. The phase boundary is generated by the 2-degree polynomial regression based on some boundary points.

VII Conclusion

There have been many successes in achieving larger scale and higher fidelity of the NISQ devices. These improvements provide the quantum neural network with the opportunity to deal with larger data sets and even outperform the classical counterpart. This paper presents a duplication-free quantum neural network model, DQNN, to solve machine learning problems. Meanwhile, we provide it with the universal guarantee to approximate any arbitrary continuous function using several variational quantum circuits, multiple measurement observables and the classical parameterized sigmoid function. We can enhance the expressibility of DQNN by increasing the number of quantum circuits and the number of observables without requiring auxiliary qubits. This property for the number of qubits makes DQNN more likely to be implemented on intermediate-scale quantum computers.

Unlike other variational QNN proposals, DQNN utilizes the classical computer in the hybrid system to generate nonlinearity. Without duplicates, DQNN significantly reduces the number of required qubits and decreases circuit complexity compared with two well-known QNN models. Our simulation results demonstrate that, compared with QCL and CCQ, DQNN uses fewer qubits and fewer quantum gates to outperform the other two QNN models and hence weaken the influence of the circuit noise. Furthermore, DQNN requires fewer parameters than the classical counterpart but is at an extra cost of the iterations in some cases. DQNN can also be used to solve quantum problems. These results indicate that the DQNN is an efficient QNN model which can find the patterns hidden in the classical and quantum data sets.

Like the classical naive neural network, DQNN can be considered a subroutine to construct more complex quantum deep learning models. In addition, DQNN can also be improved by optimizing its structure for the given ML problem. For example, DQNN can be combined with the concept of recurrent neural network to efficiently solve the natural language processing tasks, or integrated into a quantum reinforcement learning framework. Moreover, due to its simple design, DQNN is relatively easy to be implemented on NISQ devices for QNN demonstration.

Acknowledgements

The authors gratefully acknowledge the grant from the National Key R&D Program of China, Grant No. 2018YFA0306703. We also thank Chu Guo, Bujiao Wu, Yusen Wu, Shaojun Wu, Yuhan Huang, Donghong Han, Yingli Yang and Yi Tian for helpful discussions.

Refer to caption
Figure 5: Via the amplitude encoding process F:𝒙|𝒙¯, approximating f(𝒙):G is equivalent to the the problem of approximating f(F1|𝒙¯), which can be achieved efficiently by DQNN.

Appendix A Amplitude encoding method

Suppose {𝒙(i)} of the data set D={𝒙(i),y(i)} is contained in a subset G of d. Without loss of generality, we can choose an appropriate coordinate frame through translation such that 𝟎G and every 𝒙G satisfies 0<κ1|𝒙|κ2 with constants κ1 and κ2. In order to apply variational QNN, we encode the classical data into the states of a quantum register. In this work, we define the encoding mapping F:GG¯ such that |𝒙¯=F(𝒙) is an n-qubit state in the Hilbert space 2n with d2n:

F:𝒙=(x1,,xd)T|𝒙¯γ1(x1,,xd,x~,0,,0)T (6)

where x~=𝒙1+𝒙, γ=(𝒙2+x~2)12. It is easy to validate that κ11+κ1x~κ21+κ2 and

(1+(1+κ2)2)12<x¯d+1<(1+(1+κ1)2)12 (7)

with x¯d+1=γ1x~. Such encoding method is called amplitude encoding, since the classical information 𝒙 is encoded into the amplitudes of |𝒙¯, as illustrated in Fig. 5. Therefore, approximating f(𝒙):G is equivalent to the approximation of f(F1|𝒙¯):G¯.

Appendix B Proof of Theorem 1

It suffices to show Q(G¯)={q𝒛1,,𝒛ns,𝒂,𝒄,𝜶} is dense in L2(G¯). We will prove it by contradiction. Assume that the closure Q(G¯)¯L2(G¯). By the Hahn-Banach theorem [56], there exists a bounded linear functional l on L2(G¯) such that l(Q(G¯))=0 and l0. By the Riesz representation theorem [57], there exists a function g(𝒙¯)L2(G¯) such that the linear functional l can be represented as

l(f)=G¯f(𝒙¯)g(𝒙¯)𝑑μfor all fL2(G¯). (8)

Note that l0 implies g0 on G¯. Therefore, the following two sets S1={𝒙¯G¯:g(𝒙¯)>0} and S2={𝒙¯G¯:g(𝒙¯)<0} are measurable and at least one of them has a positive measure. Without loss of generality, we assume that g(𝒙¯)>0 almost everywhere in an open subset EG¯ with μ(E)>0. Substituting f(𝒙)=σ(a(|𝒙¯|𝒛|2c))Q(G¯) into above equation, and in view of l(Q(G¯))=0, we have:

G¯σ(a(|𝒙¯|𝒛|2c))g(𝒙¯)𝑑μ=0. (9)

As the open set E contains a small ball B(𝒛*,δ)={𝒙¯G¯:|𝒙¯𝒛*|<δ}, we will show that, by choosing large enough a>0 and 0<c<1, the function σ(a(|𝒙¯|𝒛|2c)) tends to 1 in a smaller ball B(𝒛*,δ1)B(𝒛*,δ) (δ1<δ) and vanishes quickly outside of B(𝒛*,δ1), which leads to a contradiction to Eq. (9).

Since G¯ is a real subset of the unit sphere,

|𝒙¯𝒛*|2=22𝒙¯|𝒛* for all 𝒙¯G¯, (10)

which implies that |𝒙¯|𝒛*|2>c is equivalent to one of the following two cases:

(i)|𝒙¯𝒛*|2<2(1c)or(ii)|𝒙¯𝒛*|2>2(1+c). (11)

For c1, case (i) means 𝒙¯ is sufficiently close to 𝒛*, and (ii) means 𝒙¯ is sufficiently close to 𝒛*, but the latter case is impossible. Specifically, for c sufficiently close to 1, and a given value κ2>0, we can choose c such that c(12(1+(1+κ2)2)1)2, and together with Eq. (7), we have:

|𝒙¯+𝒛*|2(zd+1*+x¯d+1)24(1+(1+κ2)2)12(1c) (12)

If (ii) were true, then by |𝒙¯𝒛*|2+|𝒙¯+𝒛*|2=4, we would find |𝒙¯+𝒛*|2<2(1c), contradictory to Eq. (12). Hence case (ii) is impossible. Therefore, when c1, |𝒙¯|𝒛*|2>c is only equivalent to case (i). Passing to the limit a+, we get

σ(a(|𝒙¯|𝒛*|2c)){1 for 𝒙¯ with |𝒙¯𝒛*|<δ1,0 for 𝒙¯ with |𝒙¯𝒛*|>δ1, (13)

with δ1=(2(1c))12. Taking c sufficiently close to 1, we have δ1δ, and from Eq. (9) we obtain

0 =G¯σ(a(|𝒙¯|𝒛|2c))g(𝒙¯)𝑑μ (14)
B(𝒛*,δ1)1g(𝒙¯)𝑑μ>0(as a+), (15)

which is a contradiction. Hence, we conclude that Q(G¯) is dense in L2(G¯). Thus for any fL2(G¯) and ϵ>0, there exists a q(𝒙¯)Q(G¯) such that

fqL2(G¯)2=G¯|f(𝒙¯)q(𝒙¯)|2𝑑μϵ, (16)

which proves Theorem 1.

Remark 1.

If σ is a ReLU-type function, then the conclusion still holds. In fact, instead of Eq.(13), we have σ(a(|𝐱¯|𝐳|2c))>0 for all 𝐱¯B(𝐳*,δ1) and σ(a(|𝐱¯|𝐳|2c))=0 otherwise, which leads to the contradiction:

0 =G¯σ(a(|𝒙¯|𝒛|2c))g(𝒙¯)𝑑μ
=B(𝒛*,δ1)σ(a(|𝒙¯|𝒛|2c))g(𝒙¯)𝑑μ>0.

Appendix C Introduction to other variational QNN models

In order to compare the performance of DQNN with other models, in this section, we will present a brief introduction to two established variational QNN models, circuit-centric quantum classifier (CCQ) [33] and quantum circuit learning (QCL) [32], and will apply these models to solve the following quantum learning problem: constructing a QNN to approximate an M-order polynomial f: 𝒙f(𝒙), where M is an even number.

First, we consider the CCQ model, where the classical data 𝒙=[x1,x2,,xd]Td with 𝒙=1 is mapped into a ndata-qubit register with ndatalogd, using the following encoding scheme:

|𝒙¯=1γ[x1,x2,,xd,x~,0,0]T2ndata, (17)

where x~ is a padding term chosen by the user, and γ is the normalization factor. Notice that |𝒙¯ alone is not sufficient as an input to generate the desired nonlinearity, and an input state as a tensor product of copies of |𝒙¯ is required to generate the high-order terms in f(𝒙). Specifically, to approximate the polynomial f, the input state |ψ𝒙 of CCQ is chosen as:

|ψ𝒙 =|𝒙¯ncopy=|𝒙¯M2 (18)
=1γM2[x1M2,x1M21x2,,x~M2,0,,0]T2ntot, (19)

where ntotncopyndata=12Mlogd. Here we have chosen the number of copies of |𝒙¯ in |ψ𝒙 to be ncopy=M2.

Denoting the variational quantum circuit in CCQ by U(𝜽), the output state |ψ𝜽,𝒙 becomes:

|ψ𝜽,𝒙=U(𝜽)|ψ𝒙=1γM2[u1,1(𝜽)u1,2(𝜽)u1,3(𝜽)u1,2ntot(𝜽)u2,1(𝜽)u2,2(𝜽)u2,3(𝜽)u2,2ntot(𝜽)u3,1(𝜽)u3,2(𝜽)u3,3(𝜽)u3,2ntot(𝜽)u2ntot,1(𝜽)u2ntot,2(𝜽)u2ntot,3(𝜽)u2ntot,2ntot(𝜽)][x1M2x1M21x2x1M21x30]=[p1(𝜽,𝒙)p2(𝜽,𝒙)p3(𝜽,𝒙)p2ntot(𝜽,𝒙)] (20)

where each amplitude pi(𝜽,𝒙) of |ψ𝜽,𝒙 is an M2-order polynomial of 𝒙. Hence, for a chosen observable B, the measurement outcome B𝜽,𝒙ψ𝜽,𝒙|B|ψ𝜽,𝒙 is an M-order polynomial of 𝒙. Then the goal of CCQ is to choose an appropriate B and to find the best value 𝜽=𝜽* through quantum-classical hybrid optimization such that 𝒙B𝜽*,𝒙 is the optimal function to approximate 𝒙f(𝒙). The quantum circuit for CCQ is presented in Fig. 6.

Refer to caption
Figure 6: The quantum circuit for CCQ to approximate the M-order polynomial function 𝒙f(𝒙).

Next, we consider the QCL model, where the data point 𝒙=[x1,x2,,xd]T is encoded into a d-qubit data-encoding pure-state density operator:

ρ(𝒙) =|ψ𝒙ψ𝒙|
=i=1d(RiY(arcsinxi)|0i0|iRiY(arcsinxi))
=12di=1d(I+xiX+1xi2Z) (21)

where RiY(𝜽) represents a Pauli-Y rotation on the i-th qubit. Analogous to the CCQ case, ρ(𝒙) along is not sufficient as an input to generate the desired nonlinearity. In order to generate an M-order polynomial, ρ(𝒙) is duplicated into M identical copies in a tensor product. Specifically, the input state of QCL is chosen as:

ρin(𝒙) =j=1Mρ(𝒙)
=12Mdj=1M(i=1d[I+xiX+1xi2Z). (22)

Applying a parameterized quantum circuit U(𝜽) to ρin(𝒙), followed by a measurement of a chosen observable B, we obtain the measurement outcome as the output of QCL:

B𝜽,𝒙Tr(U(𝜽)ρin(𝒙)U(𝜽)B) (23)

Define {Al}{I,X,Z}Md as a set of generalized Pauli operators. Due to Eq. (22), ρin(𝒙) can be expanded in terms of {Al}:

ρin(𝒙)=12Mdl=13Mdpl(x1,,xd,1x12,,1xd2)Al, (24)

where (x1,,xd,x1,x2,,xd)2dpl(x1,x2,,xd,x1,x2,,xd), l=1,2,,3Md, are polynomials with respect to all 2d variables. Assuming B is unitarily diagonalized as

B=W[λ100λ2ntot]W, (25)

we substitute Eq. (24) and Eq. (25) into Eq. (23) and obtain:

B𝜽,𝒙 =12Mdl=13Mdbl(𝜽,λ)pl
bl(𝜽,λ) j=12MdλjTr(Al(WU(𝜽))|ejej|WU(𝜽)),

where {|ej} is the set of computational basis. Thus, the task of QCL is to choose an appropriate observable B and to find the best value 𝜽=𝜽* through quantum-classical hybrid optimization such that 𝒙B𝜽*,𝒙 will well approximate 𝒙f(𝒙). The quantum circuit for QCL is given in Fig. 7.

Refer to caption
Figure 7: The quantum circuit for QCL to approximate the M-order polynomial function 𝒙f(𝒙).

In summary, to prepare the desired initial state to solve the same polynomial approximation problem, it requires 12Mlogd qubits for CCQ, and Md qubits for QCL. In comparison, it only requires logd qubits for DQNN to prepare the input state as |𝒙¯ in Eq. (17). This is a significant reduction for large M and large d. Specifically, for DQNN, after we prepare the input state |𝒙¯, we sequentially apply a series of variational quantum circuits, {U(j)(𝜽(j))}j=1ncir, and quantum measurement to derive the measurement outcome Bi𝜽(j),𝒙=𝒙¯|U(j)(𝜽(j))BiU(j)(𝜽(j))|𝒙¯ for a set of chosen observables {Bi}i=1nobs. Next, we apply a parameterized classical sigmoid function σ to each Bi𝜽(j),𝒙¯ to construct the final output q𝜽,𝜶,𝒂,𝒄(𝒙¯). Finally, we implement the hybrid optimization to find the optimal values of (𝜽,𝜶,𝒂,𝒄) to approximate the target polynomial, where 𝜶, 𝒂, 𝒄 are parameters used to construct the final output q𝜽,𝜶,𝒂,𝒄, with details in the following.

Appendix D Solving regression problems using DQNN, CCQ and QCL

In the following, we aim to solve a regression task of approximating a highly nonlinear function:

f:(x1,x2)[0.8,0.8]2 (0.7161.0125x12+x14)
×(0.7161.0125x22+x24) (26)

using DQNN, QCL and CCQ. The data set D{𝒙(i),y(i)}={(x1(i),x2(i),y(i))}i=1400 is of size 400, with (x1(i),x2(i)) randomly chosen from the square [0.8,0.8]2, and each label y(i) is calculated according to y(i)=f(x1(i),x2(i)). The goal of a QNN is to generate a nonlinear function 𝒙q𝜽(𝒙) to approximate f by optimizing over the parameters 𝜽 of the QNN.

Refer to caption
(a) DQNN1 based on one single-layer quantum circuit and 4 observables.
Refer to caption
(b) DQNN4 based on 4 single-layer quantum circuits and one observable.
Figure 8: The implementation of DQNN in the regression task of Eq. (26).

First, we use the DQNN model to solve this problem. We choose two cases for DQNN in the task. Case 1 (DQNN1) is based on a 2-qubit single-layer variational quantum circuit, U(1)(𝜽(1)), and four measurement observables, {Bi}i=14, randomly chosen the generalized Pauli basis on two qubits, as shown in Fig. 8(a). Case 2 (DQNN4) is based on four 2-qubit single-layer variational quantum circuits, {U(j)(𝜽(j))}j=14, and one measurement observable B1=Z(1)Z(2), as shown in Fig. 8(b). Through state reparation, each data point 𝒙 is encoded into the input state:

|𝒙¯=1γ[x1,x2,x~,0]T (27)

where x~=|𝒙|1+|𝒙| and γ=(|𝒙|2+x~2)12. Then we sequentially apply the variational circuit U(j)(𝜽(j)) and quantum measurement to obtain the measurement outcome Bi𝜽(j),𝒙¯:

Bi𝜽(j),𝒙¯Tr(U(j)(𝜽(j))|𝒙¯𝒙¯|U(j)(𝜽(j))Bi). (28)

Denoting the sigmoid function as σ(x), with

σ(x)11+ex, (29)

for DQNN1, we obtain

u(𝒙¯)i=14αi(1)σ(ai(1)(Bi𝜽(1),𝒙¯ci(1))),i=1,2,3,4. (30)

and finally derive

q𝜽,𝒂,𝒄,𝜶(𝒙¯)σ(a5(u(𝒙¯)c5)) (31)

where the parameter 𝜽 comes from the quantum circuit U(𝜽), while the parameters (𝒂,𝒄,𝜶) come from the classical processing after U(𝜽). For DQNN4, we obtain

u(𝒙¯)j=14α1(j)σ(a1(j)(B1𝜽(j),𝒙¯c1(j))),j=1,2,3,4 (32)

and derive

q𝜽,𝒂,𝒄,𝜶(𝒙¯)σ(a5(u(𝒙¯)c5)). (33)

Finally, through optimization, we can find the optimal values of parameters (𝜽,𝒂,𝒄,𝜶) that correspond to the optimal approximation to the target function f.

Second, we consider the CCQ model. For any data point (𝒙,y)D, 𝒙 can be encoded into a 2-qubit state |𝒙¯=1γ[x1,x2,x~,0]T as in Eq. (17). For the input state |ψ𝒙 in Eq. (19), the number of copies ncopy can be chosen to be ncopy1. Here we consider two cases, ncopy=1 and ncopy=2. For ncopy=1, we design U(𝜽) in CCQ to be a 2-qubit 4-layer variational quantum circuit; for ncopy=2, U(𝜽) in CCQ is chosen to be a 4-qubit 4-layer circuit. The circuit depth of U(𝜽) is 13 in the former case, and 21 in the latter case. Correspondingly, the input state |ψ𝒙 is either a 2-qubit state or a 4-qubits state, as shown in Fig. 9. After input state preparation in each case, we apply U(𝜽) and a measurement to |ψ𝒙, and derive the measurement outcome

B𝜽,𝒙Tr(U(𝜽)|ψ𝒙ψ𝒙|U(𝜽)B) (34)

where the observable is chosen as BZ(1), the local Pauli-Z gate on the first qubit. Then we derive the following output

q𝜽(𝒙)=B𝜽,𝒙+12. (35)

Through optimization, we can find the optimal value 𝜽 that correspond to the best approximated q𝜽 to the target function f.

Refer to caption
Refer to caption
Figure 9: The circuit of CCQ in the regression task of Eq. (26), with U(𝜽) chosen as a 2-qubit 4-layer circuit (case a), and as a 4-qubit 4-layer circuit (case b)
Refer to caption
Refer to caption
Figure 10: The circuit of QCL in the regression task of Eq. (26), with U(𝜽) chosen as a 2-qubit 4-layer circuit (case a), and as a 4-qubit 5-layer circuit (case b).

Finally, we apply QCL to solve the same problem. We also consider two cases where the classical data point 𝒙 is encoded into a 2-qubit state (ncopy=1) and a 4-qubit state (ncopy=2) respectively, as in Eq. (21):

ρ(𝒙)=14j=1ncopyi=12(I+xiX+1xi2Z). (36)

The entire process from the input state to the final output of QCL is illustrated in Fig. 10. Notice that the variational quantum circuit U(𝜽) is slightly different from the original suggestion in [32], as we find that our modified circuit has a better performance. After input state preparation in each case, we apply U(𝜽) and a measurement to ρ(𝒙), and derive the measurement outcome

B𝜽,𝒙Tr(U(𝜽)ρ(𝒙)U(𝜽)B). (37)

where the observable is chosen as BZ(1). Then we obtain the final output of QCL:

q𝜽,a(𝒙)=aB𝜽,𝒙 (38)

where a is a trainable parameter. Then, through optimization, we find the optimal values of the parameters (𝜽,a) that correspond to the best approximated q𝜽,a to the target function f.

References

  • Zheng et al. [2017] H. Zheng, J. Fu, T. Mei, and J. Luo, Learning multi-attention convolutional neural network for fine-grained image recognition, in IEEE International Conference on Computer Vision (ICCV), edited by K. Ikeuchi, G. Medioni, and M. Pelillo (Venice, Italy, 2017) p. 5219.
  • Shi et al. [2017] B. Shi, X. Bai, and C. Yao, An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition, IEEE Trans. Pattern Anal. Mach. Intell. 39, 2298 (2017).
  • Albawi et al. [2017] S. Albawi, T. A. Mohammed, and S. Al-Zawi, Understanding of a convolutional neural network, in 2017 International Conference on Engineering and Technology (ICET), edited by O. Bayat, S. A. Aljawarneh, and H. F. Carlak (Akdeniz University, Antalya, Turkey, 2017) p. 1.
  • Goldberg [2016] Y. Goldberg, A primer on neural network models for natural language processing, J. Artif. Intell. Res. 57, 345 (2016).
  • Goldberg [2017] Y. Goldberg, Neural Network Methods for Natural Language Processing, Vol. 37 (Morgan & Claypool, 2017).
  • Collobert and Weston [2008] R. Collobert and J. Weston, A unified architecture for natural language processing: Deep neural networks with multitask learning, in Proceedings of the 25th International Conference on Machine Learning, edited by W. Cohen, A. McCallum, and S. Roweis (Helsinki, Finland, 2008) p. 160.
  • Nugraha et al. [2017] B. T. Nugraha, S. F. Su, and Fahmizal, Towards self-driving car using convolutional neural network and road lane detector, in International conference on automation, cognitive science, optics, micro electro-mechanical system, and information technology, edited by T. Arjon, H. Taufik, and K. D. Esti (Jakarta, Indonesia, 2017) p. 65.
  • Do et al. [2018] T. D. Do, M. T. Duong, Q. V. Dang, and M. H. Le, Real-time self-driving car navigation using deep neural network, in International Conference on Green Technology and Sustainable Development, edited by D. T. Trung (Ho Chi Minh City, Vietnam, 2018) p. 7.
  • Bojarski et al. [2017] M. Bojarski, P. Yeres, A. Choromanska, K. Choromanski, B. Firner, L. Jackel, and U. Muller, Explaining how a deep neural network trained with end-to-end learning steers a car (2017), arXiv e-prints, ArXiv:1704.07911.
  • Biamonte et al. [2017] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Quantum machine learning, Nature 549, 195 (2017).
  • Carleo and Troyer [2017] G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602 (2017).
  • Deng et al. [2017] D. L. Deng, X. P. Li, and S. Das Sarma, Quantum entanglement in neural network states, Phys. Rev. X 7, 021021 (2017).
  • Garrido Torres et al. [2021] J. A. Garrido Torres, V. Gharakhanyan, N. Artrith, T. H. Eegholm, and A. Urban, Augmenting zero-kelvin quantum mechanics with machine learning for the prediction of chemical reactions at high temperatures, Nat. Commun. 12, 1 (2021).
  • Schütt et al. [2019] K. Schütt, M. Gastegger, A. Tkatchenko, K.-R. Müller, and R. J. Maurer, Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions, Nat. Commun. 10, 1 (2019).
  • Gao and Duan [2017] X. Gao and L. M. Duan, Efficient representation of quantum many-body states with deep neural networks, Nat. Commun. 8, 1 (2017).
  • Paruzzo et al. [2018] F. M. Paruzzo, A. Hofstetter, F. Musil, S. De, M. Ceriotti, and L. Emsley, Chemical shifts in molecular solids by machine learning, Nat. Commun. 9, 1 (2018).
  • Huang et al. [2022] H.-Y. Huang, M. Broughton, J. Cotler, S. Chen, J. Li, M. Mohseni, H. Neven, R. Babbush, R. Kueng, J. Preskill, and J. R. McClean, Quantum advantage in learning from experiments, Science 376, 1182 (2022).
  • Wiebe et al. [2012] N. Wiebe, D. Braun, and S. Lloyd, Quantum algorithm for data fitting, Phys. Rev. Lett. 109, 050505 (2012).
  • Rebentrost et al. [2014] P. Rebentrost, M. Mohseni, and S. Lloyd, Quantum support vector machine for big data classification, Phys. Rev. Lett. 113, 130503 (2014).
  • Wiebe et al. [2015] N. Wiebe, A. Kapoor, and K. M. Svore, Quantum algorithms for nearest-neighbor methods for supervised and unsupervised learning, Quantum Info. Comput. 15, 316 (2015).
  • Lloyd et al. [2014] S. Lloyd, M. Mohseni, and P. Rebentrost, Quantum principal component analysis, Nat. Phys. 10, 631 (2014).
  • Schuld et al. [2016] M. Schuld, I. Sinayskiy, and F. Petruccione, Prediction by linear regression on a quantum computer, Phys. Rev. A 94, 022342 (2016).
  • Paparo and Martin-Delgado [2012] G. D. Paparo and M. Martin-Delgado, Google in a quantum network, Sci. Rep. 2, 1 (2012).
  • Dong et al. [2008] D. Y. Dong, C. L. Chen, H. X. Li, and T.-J. Tarn, Quantum reinforcement learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B 38, 1207 (2008).
  • Saggio et al. [2021] V. Saggio, B. E. Asenbeck, A. Hamann, T. Strömberg, P. Schiansky, V. Dunjko, N. Friis, N. C. Harris, M. Hochberg, D. Englund, S. Wölk, H. Briegel, and P. Walther, Experimental quantum speed-up in reinforcement learning agents, Nature 591, 229 (2021).
  • Paparo et al. [2014] G. D. Paparo, V. Dunjko, A. Makmal, M. A. Martin-Delgado, and H. J. Briegel, Quantum speedup for active learning agents, Phys. Rev. X 4, 031002 (2014).
  • Liu and Rebentrost [2018] N. Liu and P. Rebentrost, Quantum machine learning for quantum anomaly detection, Phys. Rev. A 97, 042315 (2018).
  • Amin et al. [2018] M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, and R. Melko, Quantum boltzmann machine, Phys. Rev. X 8, 021050 (2018).
  • Aaronson [2015] S. Aaronson, Read the fine print, Nat. Phys. 11, 291 (2015).
  • Huang et al. [2021] H. Y. Huang, M. Broughton, M. Mohseni, R. Babbush, S. Boixo, H. Neven, and J. R. McClean, Power of data in quantum machine learning, Nat. Commun. 12 (2021).
  • Schuld et al. [2014] M. Schuld, I. Sinayskiy, and F. Petruccione, The quest for a quantum neural network, Quantum Inf. Process 13, 2567 (2014).
  • Mitarai et al. [2018] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Quantum circuit learning, Phys. Lett. A 98, 032309 (2018).
  • Schuld et al. [2020] M. Schuld, A. Bocharov, K. M. Svore, and N. Wiebe, Circuit-centric quantum classifiers, Phys. Lett. A 101, 032308 (2020).
  • Hornik et al. [1989] K. Hornik, M. Stinchcombe, and H. White, Multilayer feedforward networks are universal approximators, Neural Networks 2, 359 (1989).
  • Leshno et al. [1993] M. Leshno, V. Y. Lin, A. Pinkus, and S. Schocken, Multilayer feedforward networks with a nonpolynomial activation function can approximate any function, Neural Networks 6, 861 (1993).
  • Cao et al. [2017] Y. D. Cao, G. G. Guerreschi, and A. Aspuru-Guzik, Quantum neuron: an elementary building block for machine learning on quantum computers (2017), arXiv e-prints, ArXiv:1711.11240.
  • Yan et al. [2020] S. Yan, H. Qi, and W. Cui, Nonlinear quantum neuron: A fundamental building block for quantum neural networks, Phys. Lett. A 102, 052421 (2020).
  • Tacchino et al. [2019] F. Tacchino, C. Macchiavello, D. Gerace, and D. Bajoni, An artificial neuron implemented on an actual quantum processor, npj Quantum Inf. 5, 1 (2019).
  • Kristensen et al. [2021] L. B. Kristensen, M. Degroote, P. Wittek, A. Aspuru-Guzik, and N. T. Zinner, An artificial spiking quantum neuron, npj Quantum Inf. 7, 1 (2021).
  • Torrontegui and García-Ripoll [2019] E. Torrontegui and J. J. García-Ripoll, Unitary quantum perceptron as efficient universal approximator, Europhys. Lett. 125, 30004 (2019).
  • Beer et al. [2020] K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, and R. Wolf, Training deep quantum neural networks, Nat. Commun. 11, 1 (2020).
  • Schuld et al. [2015] M. Schuld, I. Sinayskiy, and F. Petruccione, Simulating a perceptron on a quantum computer, Phys. Lett. A 379, 660 (2015).
  • Wan et al. [2017] K. H. Wan, O. Dahlsten, H. Kristjánsson, R. Gardner, and M. Kim, Quantum generalisation of feedforward neural networks, npj Quantum Inf. 3, 1 (2017).
  • Runge [1901] C. Runge, über empirische funktionen und die interpolation zwischen äquidistanten ordinaten, Zeitschrift für Mathematik und Physik 46, 224 (1901).
  • Brenner and Scott [2007] S. C. Brenner and L. R. Scott, The Mathematical Theory of Finite Element Methods (Springer, 2007).
  • Goodfellow et al. [2016] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning (MIT Press, 2016).
  • Plesch and Brukner [2011] M. Plesch and Č. Brukner, Quantum-state preparation with universal gate decompositions, Phys. Rev. A 83, 032302 (2011).
  • Bottou [2010] L. Bottou, Large-scale machine learning with stochastic gradient descent, Comput. Stat. , 177 (2010).
  • Kingma and Ba [2015] D. Kingma and J. Ba, Adam: A method for stochastic optimization, in International Conference on Learning Representations, edited by Y. Bengio and Y. Lecun (San Diego, CA, 2015) p. 13.
  • Buckley and LeNir [1985] A. Buckley and A. LeNir, Algorithm 630: Bbvscg–a variable-storage algorithm for function minimization, ACM Trans. Math. Softw. 11, 103 (1985).
  • Bishop [2006] C. M. Bishop, Pattern Recognition and Machine Learning (Springer Press, Berlin, 2006).
  • LeCun et al. [2010] Y. LeCun, C. Cortes, and C. Burges, Mnist handwritten digit database (2010), figshare http://yann.lecun.com/exdb/mnist (2010).
  • Dua and Graff [2017] D. Dua and C. Graff, UCI machine learning repository (2017), http://archive.ics.uci.edu/ml (2017).
  • Haldane [1983] F. D. M. Haldane, Nonlinear field theory of large-spin heisenberg antiferromagnets: semiclassically quantized solitons of the one-dimensional easy-axis néel state, Phys. Rev. Lett. 50, 1153 (1983).
  • Cong et al. [2019] I. Cong, S. Choi, and M. D. Lukin, Quantum convolutional neural networks, Nat. Phys. 15, 1273 (2019).
  • Hahn [1927] H. Hahn, Über lineare gleichungssysteme in linearen räumen., J. für die Reine und Angew. Math. 1927, 214 (1927).
  • Fréchet [1907] M. Fréchet, Sur les ensembles de fonctions et les opérations linéaires, CR Acad. Sci. Paris 144, 1414 (1907).