An introduction to quantum machine learning

Maria Schuldᵃ, Ilya Sinayskiyᵃ,ᵇ and Francesco Petruccioneᵃ,ᵇ

ᵃ Quantum Research Group, School of Chemistry and Physics, University of KwaZulu-Natal, Durban, KwaZulu-Natal, 4001, South Africa

ᵇ National Institute for Theoretical Physics (NITheP), KwaZulu-Natal, 4001, South Africa

September 11, 2014

arXiv:1409.3097v1 [quant-ph] 10 Sep 2014

Abstract

Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers have investigated whether quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

Keywords: Quantum machine learning, quantum computing, artificial intelligence, machine learning

1 Introduction

Machine learning refers to an area of computer science in which patterns are derived (‘learned’) from data with the goal to make sense of previously unknown inputs. As part of both artificial intelligence and statistics, machine learning algorithms process large amounts of information for tasks that come naturally to the human brain, such as image and speech recognition, pattern identification or strategy optimisation. These problems gain significant importance in our digital age, an illustrative example being Google’s PageRank machine learning algorithm for search engines that was patented by Larry Page in 1997¹ and led to the rise of what is today one of the biggest IT companies in the world. Other important applications of machine learning are spam mail filters, iris recognition for security systems, the evaluation of consumer behaviour, assessing risks in the financial sector or developing strategies for computer games. In short, machine learning comes into play wherever we need computers to interpret data based on experience. This usually involves huge amounts of previously collected input-output data pairs, and machine learning algorithms have to be very efficient in order to deal with so-called big data.

¹ See https://www.princeton.edu/~achaney/tmve/wiki100k/docs/PageRank.html [Last accessed 6/24/2014]

Since the volume of globally stored data is growing by around 20% every year (currently ranging in the order of several hundred exabytes [1]), the pressure to find innovative approaches to machine learning is rising. A promising idea that is currently investigated by academia as well as in the research labs of leading IT companies exploits the potential of quantum computing in order to optimise classical machine learning algorithms. In the last decades,



physicists have already demonstrated the impressive power of quantum systems for information processing. In contrast to conventional computers built on the physical implementation of the two states ‘0’ and ‘1’, quantum computers can make use of a qubit’s superposition of two quantum states $|0\rangle$ and $|1\rangle$ (e.g. encoded in two distinct energy levels of an atom) in order to follow many different paths of computation at the same time. But the laws of quantum mechanics also restrict our access to information stored in quantum systems, and coming up with quantum algorithms that outperform their classical counterparts is very difficult. However, the toolbox of quantum algorithms is by now fairly established and contains a number of impressive examples that speed up the best known classical methods [2]. The technological implementation of quantum computing is emerging [3], and many believe that it is only a matter of time until the numerous theoretical proposals can be tested on real machines. Against this background, the new research field of quantum machine learning might offer the potential to revolutionise future ways of intelligent data processing.

A number of recent academic contributions explore the idea of using the advantages of quantum computing in order to improve machine learning algorithms. For example, some effort has been put into the development of quantum versions [4, 5, 6] of artificial neural networks (which are widely used in machine learning), but they are often based on a more biological perspective and a major breakthrough has not been accomplished yet [7]. Some authors try to develop entire quantum algorithms that solve problems of pattern recognition [8, 9, 10]. Other proposals suggest simply running subroutines of classical machine learning algorithms on a quantum computer, hoping to gain a speed-up [11, 12, 13]. An interesting approach is adiabatic quantum machine learning, which seems especially fit for some classes of optimisation problems [14, 15, 16]. Stochastic models such as Bayesian decision theory or hidden Markov models find an elegant translation into the language of open quantum systems [17, 18]. Despite this growing level of interest in the field, a comprehensive theory of quantum learning, or how quantum information can in principle be applied to intelligent forms of computing, is only in the very first stages of development.

Figure 1: Overview of methods in machine learning and approaches from a quantum information perspective as presented in this paper. (Machine learning methods shown: k-nearest neighbour, k-means clustering, support vector machines, neural networks, decision trees, Bayesian theory, hidden Markov models. Quantum approaches shown: efficient calculation of classical distances on a quantum computer; first explorations of quantum models; reformulation in the language of open quantum systems.)

This contribution gives a systematic overview of the emerging field of quantum machine learning, with a focus on methods for pattern classification. After a brief discussion of the concepts of classical and quantum learning in Section 2, the paper is divided into seven sections, each presenting a standard method of machine learning (namely k-nearest neighbour methods, support vector machines, k-means clustering, neural networks, decision trees, Bayesian theory and hidden Markov models) and the various approaches to relate each method to quantum physics. This structure mirrors the still rather fragmented field and allows the reader to select specific areas of interest. As summarised in Figure 1, for k-nearest neighbour methods, support vector machines and k-means clustering, authors are mainly concerned with finding efficient calculations of classical distances on a potential quantum computer, while probabilistic methods such as Bayesian theory and hidden Markov models find an analogy in the formalism of open quantum systems. Neural networks and decision trees are still waiting for a convincing quantum version, although especially the former has been a relatively active field of research in the last decade. Finally, in Section 4 we briefly discuss the need for future works on quantum machine learning that concentrate on how the actual learning part of machine learning methods can be improved using the power of quantum information processing.

2 Classical and quantum learning

2.1 Classical machine learning

The theory of machine learning is an important subdiscipline of both artificial intelligence and statistics, and its roots can be traced back to the beginnings of artificial neural network and artificial intelligence research in the 1950s [19, 20]. In 1959, Arthur Samuel gave his famous definition of machine learning as the ‘field of study that gives computers the ability to learn without being explicitly programmed’². This is in fact misleading, since it is not the algorithm itself that adapts in the learning process, but the function it encodes. In more formal language, this means that the input-output relation of a computer program is derived from a set of training data (which is often very big). Such methods gain importance as computers increasingly interact with humans and have to become more flexible to adapt to our specific needs. A prominent example is a spam mail filter that learns from user behaviour and external databases to classify new spam mails correctly. However, this is only one of many different cases where machine learning intersects with our everyday lives.

² It is interesting to note that although quoted in numerous introductions to machine learning, the original reference to the machine learning pioneer’s most famous statement is very difficult to find. Authors either refer to other secondary publications, or falsely cite Samuel’s seminal paper from 1959 [21].

In the theory of machine learning, the term learning is usually divided into three types (see Figure 2), which help to illustrate the spectrum of the field: supervised, unsupervised and reinforcement learning. In supervised learning, a computer is given examples of correct input-output relations and has to infer a mapping therefrom. Probably the most important task is pattern classification, where vectors of input data have to be assigned to different classes. This might sound like a rather technical problem, but is in fact something humans do continuously - for example when we recognise a face from different angles and light conditions as belonging to one and the same person, or when we classify signals from our sensory organs as dangerous or not. We could even go so far and say that pattern classification is the abstract description of ‘interpreting’ input coming from our senses. It is no surprise that a big share of machine learning research tries to imitate this remarkable ability of human beings with computers, and there is an entire zoo of algorithms that generalise from large training data sets how to classify new input.

Figure 2: The three types of classical learning (panels: unsupervised learning, supervised learning, reinforcement learning). Supervised learning derives patterns from training data and finds application in pattern recognition tasks. Unsupervised learning infers information from the structure of the input and is important for data clustering. Reinforcement learning optimises a strategy due to feedback by a reward function, and usually applies to intelligent agents and games.

The second category, unsupervised learning, was for a long time not considered part of machine learning, as it describes the process of finding patterns in data without prior experience or examples. A prominent task is data clustering, or forming subgroups out of a given dataset, in order to summarise large amounts of information by only a few stereotypes. This is for example an important problem in sociological studies and market research. Note that this task is closely related to classification, since clustering effectively means assigning a class to each vector of a given set, but without the goal of treating new inputs.

Finally, reinforcement learning is the closest to what we might associate with the expression ‘learning’. Given a framework of rules and goals, an agent (usually a computer program that acts as a player in a game) gets rewarded or punished depending on which strategy it uses in order to win. Each reward reinforces the current strategy, while punishment leads to an adaptation of its policy [22, 23]. Reinforcement learning is a central mechanism in the development and study of intelligent agents. However, it will not be the focus of this paper, and it differs in many regards from the other two types of learning. Investigations into quantum games and quantum intelligent agents are diverse and numerous (see for example [24, 25, 26, 27, 28]), and shall be treated elsewhere.

Even within these categories, the expression ‘learning’ can relate to different procedures. For example, it may refer to a training phase in which optimal parameters of an algorithm (e.g. weights, initial states) are obtained. This is done by presenting examples of correct input-output relations to a task, and adapting the parameters to reproduce these examples. The training set is then discarded [29]. An illustrative case close to human learning is the weight adjustment process in artificial neural networks through backpropagation or deep learning [30, 31]. Training phases are often the most costly part of a machine learning algorithm and efficient training methods become especially important when dealing with so-called big data. Besides learning as a parameter optimisation problem, there is a large number of machine learning algorithms that do not have an explicit learning phase. For example, if presented with an unclassified input vector, the k-nearest neighbour algorithm for pattern classification uses the training data to decide upon its classification. In this case, learning is not a parameter optimisation problem, but rather a decision function inferred from examples. In reinforcement learning, this decision function becomes a full strategy, and learning refers to the adaptation of the strategy to increase the chances of future reward.

Whatever type and procedure of learning is chosen, optimal machine learning algorithms run with minimum resources and have a minimum error rate related to the task (as indicated by misclassification of input, poor division into clusters, little reward of a strategy). Challenges lie in finding parameters and initial values that lead to an optimal solution, or in coming up with schemes that reduce the complexity class of the algorithm.³ This is where quantum computing promises to help.

2.2 Quantum machine learning

Quantum computing refers to the manipulation of quantum systems in order to process information. The ability of quantum states to be in a superposition can thereby lead to a substantial speedup of a computation in terms of complexity, since operations can be executed on many states at the same time. The basic unit of quantum computation is the qubit, $|\psi\rangle = \alpha |0\rangle + \beta |1\rangle$ (with $\alpha, \beta \in \mathbb{C}$ and $|0\rangle, |1\rangle$ in the two-dimensional Hilbert space $\mathcal{H}^2$). The absolute squares of the amplitudes are the probabilities to measure the qubit in the 0 or the 1 state, and quantum dynamics always maintain the property of probability conservation given by $|\alpha|^2 + |\beta|^2 = 1$. In mathematical language this means that transformations that map quantum states onto other quantum states (so-called quantum gates) have to be unitary. Through single-qubit quantum gates we are able to manipulate the basis state, amplitude or phase of a qubit (for example through the so-called X-gate, the Z-gate and the Y-gate respectively), or put a qubit with $\beta = 0$ ($\alpha = 0$) into an equal superposition $\alpha = \beta = 1/\sqrt{2}$ ($\alpha = 1/\sqrt{2}$, $\beta = -1/\sqrt{2}$) (the Hadamard or H-gate). Multi-qubit gates are often based on controlled operations that execute a single-qubit operation only if another (ancilla or control) qubit is in a certain state. One of the most important gates is the two-qubit XOR-gate, which flips the basis state of the second qubit in case the first qubit is in state $|1\rangle$. A two-qubit gate that will be mentioned later is the SWAP-gate, which exchanges the states of two qubits with each other.

³ The complexity of a problem tells us by what factor the computational resources needed to solve a problem grow if we increase the input to the problem (e.g. the digits of a number) by one.

Figure 3: Representation of qubit states, unitary gates and measurements in the quantum circuit model and in the matrix formalism. (Panels: qubit states, X-gate, Hadamard gate, XOR-gate, SWAP-gate, measurement.)
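The single-qubit gates named above can be written down and checked numerically. The following is a minimal sketch using the standard matrix forms of the X-, Y-, Z- and Hadamard gates; the helper names `is_unitary` and `measurement_probabilities` are our own and only serve the illustration.

```python
import numpy as np

# Standard single-qubit gates in the computational basis {|0>, |1>}.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard

ket0 = np.array([1, 0], dtype=complex)   # alpha = 1, beta = 0
ket1 = np.array([0, 1], dtype=complex)   # alpha = 0, beta = 1


def is_unitary(gate):
    """Return True if gate^dagger @ gate equals the identity."""
    return np.allclose(gate.conj().T @ gate, np.eye(gate.shape[0]))


def measurement_probabilities(state):
    """Return |alpha|^2 and |beta|^2, the probabilities of measuring 0 or 1."""
    return np.abs(state) ** 2


plus = H @ ket0    # alpha = beta = 1/sqrt(2)
minus = H @ ket1   # alpha = 1/sqrt(2), beta = -1/sqrt(2)
```

Both `plus` and `minus` yield the measurement outcomes 0 and 1 with probability 1/2 each, and they differ only in the relative sign of their amplitudes.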

Quantum gates are usually expressed as unitary matrices (see also Figure 3). The matrices operate on $2^n$-dimensional vectors that contain the amplitudes of the $2^n$ basis states of an $n$-qubit quantum system. For example, the XOR-gate working on the quantum state $|\psi\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ would look like

$$
\begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix} = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 0 \\ 1 \\ 0 \end{pmatrix},
$$

and produce $|\psi'\rangle = \frac{1}{\sqrt{2}}(|00\rangle + |10\rangle)$. The art of developing algorithms for a potential quantum computer is to use such elementary gates in order to create a quantum state that has a relatively high amplitude for states that represent solutions for the given problem. A measurement in the computational basis then produces such a desired result with a relatively high probability. Quantum algorithms are usually repeated a number of times since the result is always probabilistic. For a comprehensive introduction to quantum computing, we refer to the standard textbook by Nielsen and Chuang [2].
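The XOR-gate example can be reproduced in a few lines. This is a minimal sketch in which the basis ordering $|00\rangle, |01\rangle, |10\rangle, |11\rangle$ with the first qubit as control is taken from the matrix in the text; the sampling step with a seeded random generator is our own addition to mimic repeated measurements.

```python
import numpy as np

# XOR gate in the basis |00>, |01>, |10>, |11>; the first qubit is the control.
XOR = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)

psi = np.array([1, 0, 0, 1]) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)
psi_prime = XOR @ psi                        # (|00> + |10>)/sqrt(2)

# Measurement probabilities are the squared amplitudes (real-valued here).
probabilities = psi_prime ** 2

# Repeating the computation and measuring each time gives a sample of outcomes.
rng = np.random.default_rng(seed=0)
samples = rng.choice(["00", "01", "10", "11"], size=1000, p=probabilities)
```

Only the outcomes `00` and `10` appear in `samples`, each in roughly half of the runs, which illustrates why quantum algorithms are repeated to estimate a result.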

In quantum machine learning, quantum algorithms are developed to solve typical problems of machine learning using the efficiency of quantum computing. This is usually done by adapting classical algorithms or their expensive subroutines to run on a potential quantum computer. The expectation is that in the near future, such machines will be commonly available for applications and can help to process the growing amounts of global information. The emerging field also includes approaches in the reverse direction, namely well-established methods of machine learning that can help to extend and improve quantum information theory.

As mentioned before, there is no comprehensive theory of quantum learning yet. Discussions of elements of such a theory can be found in [32, 33, 34]. Following the remarks above, a theory of quantum learning would refer to methods of quantum information processing that learn input-output relations from training input, either for the optimisation of system parameters (for example unitary operators, see [35]) or to find a ‘quantum decision function’ or ‘quantum strategy’. There are many open questions about what an efficient quantum learning procedure could look like. For example, how can we efficiently implement an optimisation problem (that is usually solved by iterative and dissipative methods such as gradient descent) on a coherent and thus reversible quantum computer? How can we translate and process important structural information, such as distance metrics, using quantum states? How do we formulate a decision strategy in terms of quantum physics? And the overall question: is there a general way in which quantum physics can in principle speed up certain problems of machine learning?

An underlying question is also the representation of classical data by quantum systems. The most common approach in quantum computing is to represent classical information as binary strings $(x_1, \dots, x_n)$ with $x_i \in \{0, 1\}$ for $i = 1, \dots, n$, that are directly translated into $n$-qubit quantum states $|x_1 \dots x_n\rangle$ from a $2^n$-dimensional Hilbert space with basis $\{|0 \dots 00\rangle, |0 \dots 01\rangle, \dots, |1 \dots 11\rangle\}$, and to read information out through measurements. However, existing machine learning algorithms are often based on an internal structure of this data, for example the Euclidean distance as a similarity measure between two examples of features. Alternative data representations have been proposed by Seth Lloyd and his co-workers, who encode classical information into the norm of a quantum state, $\langle x | x \rangle = |\vec{x}|^{-1} \vec{x}^2$, leading to the definition [11, 12]

$$
|x\rangle = |\vec{x}|^{-1/2} \vec{x}. \qquad (1)
$$

In order to use the strengths of quantum mechanics without being confined by classical ideas of data encoding, finding ‘genuinely quantum’ ways of representing and extracting information could become vital for the future of quantum machine learning.

3 Quantum versions of machine learning algorithms

Before proceeding to the discussion of classical machine learning algorithms and their quantum counterparts, we have to take a look at the actual problems these methods intend to solve, as well as introduce the formalism used throughout this article. Probably the most important application is the task of pattern classification, and there are many different classical algorithms tackling this problem. Based on a set of training examples consisting of feature vectors⁴ and their respective class attributes, the computer has to correctly classify an unknown feature vector. For example, the feature vector could contain preprocessed information on patients and their correctly diagnosed disease. A machine learning algorithm then has to find the correct disease of a new patient. More precisely, given a training set $T = \{\vec{v}^p, c^p\}_{p=1,\dots,N}$ of $N$ $n$-dimensional feature vectors $\vec{v}^p$ and their respective classes $c^p$, as well as a new $n$-dimensional input vector $\vec{x}$, we have to find the class $c^x$ of vector $\vec{x}$. Closely related to pattern classification are other tasks such as pattern completion (adding missing information to an incomplete input), associative memory (retrieving one of a number of stored memory vectors upon an input) or pattern recognition (including finding and examining the shape of patterns; this term is often used as a synonym for pattern classification).

⁴ A feature vector has entries that refer to information on a specific case, in other words a datapoint.

The central problem of unsupervised learning is clustering data. Given a set of feature vectors $\{\vec{v}^p\}$, the goal is to assign each vector to one out of $k$ different clusters so that similar inputs share the same assignment. Other problems of machine learning concern optimal strategies in terms of an unknown reward function, given a set of consecutive observations of choices and consequences. As stated above we will not concentrate on the learning of strategies here.

3.1 Quantum versions of k-nearest neighbour methods

A very popular and simple standard textbook method for pattern classification is the k-nearest neighbour algorithm. Given a training set $T$ of feature vectors with their respective classification as well as an unclassified input vector $\vec{x}$, the idea is to choose the class $c^x$ for the new input that appears most often amongst its $k$ nearest neighbours (see Figure 4). This is based on the assumption that ‘close’ feature vectors encode similar examples, which is true for many applications. Common distance measures are the inner product, the Euclidean or the Hamming distance⁵. Choosing $k$ is not always easy and can influence the result significantly. If $k$ is chosen too big we lose the locality information and end up in a simple majority vote over the entire training set, while a very small $k$ leads to noise-biased results. A variation of the algorithm suggests not to run it on the training set, but to calculate the mean or centroid $\frac{1}{N_c} \sum_p \vec{v}^p$ of all $N_c$ vectors belonging to one class $c$ beforehand, and to select the class of the nearest centroid (we call this here the nearest-centroid algorithm). Another variation weights the influence of the neighbours by distance, gaining an independence of the parameter $k$ (the weighted nearest neighbours algorithm [37]). Methods such as k-nearest neighbours are obviously based on a distance metric to evaluate the similarity of two feature vectors. Efforts to translate this algorithm into a quantum version therefore focus on the efficient evaluation of a classical distance through a quantum algorithm.

⁵ The Hamming distance between two binary strings is the number of flips needed to turn one into the other [36].

Figure 4: (Colour online) a: Illustration of the kNN method of pattern classification (panel label: k = 5). The new vector (black cross) gets assigned to the class that the majority of its k closest neighbours have (in this case it would be the orange circle shape). b: A variation is the nearest-centroid method in which the closest mean vector of a class of vectors defines the classification of a new input (panel label: ‘k = 1’). This can be understood as a k-nearest neighbour method with preprocessed data and k = 1.
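The classical k-nearest neighbour and nearest-centroid rules described above can be sketched as follows. This is a minimal illustration using the Euclidean distance; the function names and the array-based interface are our own choices, and ties in the majority vote are resolved arbitrarily.

```python
import numpy as np
from collections import Counter


def knn_classify(train_vectors, train_classes, x, k):
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    distances = np.linalg.norm(train_vectors - x, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_classes[i] for i in nearest)
    return votes.most_common(1)[0][0]


def nearest_centroid_classify(train_vectors, train_classes, x):
    """Assign x to the class whose mean vector (centroid) is closest to x."""
    classes = np.array(train_classes)
    labels = sorted(set(train_classes))
    centroids = {c: train_vectors[classes == c].mean(axis=0) for c in labels}
    return min(labels, key=lambda c: np.linalg.norm(x - centroids[c]))
```

Note that `knn_classify` has to evaluate the distance to every training vector, whereas `nearest_centroid_classify` only needs one distance per class once the centroids are known. It is this distance evaluation that the quantum proposals discussed in this section aim to speed up.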

Aïmeur, Brassard and Gambs [38] introduce the idea of using the overlap or fidelity $|\langle a | b \rangle|$ of two quantum states $|a\rangle$ and $|b\rangle$ as a ‘similarity measure’. The fidelity can be obtained through a simple quantum routine sometimes referred to as a swap test [39] (see Figure 5). Given a quantum state $|a, b, 0_{anc}\rangle$ containing the two wavefunctions as well as an ancilla register initially set to 0, a Hadamard transformation sets the ancilla into a superposition $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, followed by a controlled SWAP-gate on $a$ and $b$ which swaps the two states under the condition that the ancilla is in state $|1\rangle$. A second Hadamard gate on the ancilla results in the state $|\psi_{SW}\rangle = \frac{1}{2} |0\rangle (|a, b\rangle + |b, a\rangle) + \frac{1}{2} |1\rangle (|a, b\rangle - |b, a\rangle)$, for which the probability of measuring the ground state is given by

$$
P(|0_{anc}\rangle) = \frac{1}{2} + \frac{1}{2} |\langle a | b \rangle|^2. \qquad (2)
$$

A probability of 1/2 consequently shows that the two quantum states $|a\rangle$ and $|b\rangle$ do not overlap at all (in other words, they are orthogonal), while a probability of 1 indicates that they have maximum overlap.

Figure 5: Quantum circuit representation of a swap test routine. (The circuit acts on the ancilla $|0\rangle$ and the registers $|a\rangle$ and $|b\rangle$, with a Hadamard gate H on the ancilla before and after the controlled SWAP.)
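The swap test can be simulated classically for small states in order to check Eq. (2). The sketch below stores the joint state as two matrices, one for each ancilla value, so that the controlled SWAP becomes a transpose of the ancilla-1 branch; this representation and the function name `swap_test_probability` are our own choices for the illustration.

```python
import numpy as np


def swap_test_probability(a, b):
    """Simulate a swap test and return the probability of ancilla outcome 0."""
    a = np.asarray(a, dtype=complex)
    b = np.asarray(b, dtype=complex)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)

    # s0[i, j] and s1[i, j] hold the amplitudes of |i, j> for ancilla 0 and 1.
    s0 = np.outer(a, b)        # ancilla starts in |0>, registers in |a, b>
    s1 = np.zeros_like(s0)

    # Hadamard on the ancilla.
    s0, s1 = (s0 + s1) / np.sqrt(2), (s0 - s1) / np.sqrt(2)
    # Controlled SWAP: exchange the two registers only in the ancilla-1 branch.
    s1 = s1.T
    # Second Hadamard on the ancilla.
    s0, s1 = (s0 + s1) / np.sqrt(2), (s0 - s1) / np.sqrt(2)

    return float(np.sum(np.abs(s0) ** 2))
```

For example, `swap_test_probability([1, 0], [0, 1])` gives 1/2 for orthogonal states, while two identical states give 1. On a quantum device the probability is not read off directly but has to be estimated from repeated measurements of the ancilla.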

Based on the swap test, Lloyd, Mohseni and Rebentrost [11] recently proposed a way to retrieve the distance between two real-valued $n$-dimensional vectors $\vec{a}$ and $\vec{b}$ through a quantum measurement. More precisely, the authors calculate the inner product of the ancilla of state $|\psi\rangle = \frac{1}{\sqrt{2}}(|0, a\rangle + |1, b\rangle)$ with the state $|\phi\rangle = \frac{1}{\sqrt{Z}}(|\vec{a}|\,|0\rangle - |\vec{b}|\,|1\rangle)$ (with $Z = |\vec{a}|^2 + |\vec{b}|^2$), evaluating $|\langle \phi | \psi \rangle|^2$ as part of a swap test. This looks complicated, but is first of all an inexpensive procedure since the states $|\phi\rangle$ and $|\psi\rangle$ can be efficiently prepared [11]. The trick lies in the clever definition of a quantum state given in Eq. (1), which encodes the classical length of a vector $\vec{x}$ into the scalar product of the quantum state with itself, $\langle x | x \rangle = |\vec{x}|^{-1} |\vec{x}|^2$. With this definition the identity $|\vec{a} - \vec{b}|^2 = Z\,|\langle \phi | \psi \rangle|^2$ holds true. The classical distance between two vectors $\vec{a}$ and $\vec{b}$ can consequently be retrieved through a simple quantum swap test of carefully constructed states. Lloyd, Mohseni and Rebentrost use this procedure for a quantum version of the nearest-centroid algorithm. With $\vec{a} \equiv \vec{x}$ and $\vec{b} \equiv \frac{1}{N_c} \sum_p \vec{v}^p$, they propose to calculate the classical distance from the new input to a given centroid, $|\vec{x} - \frac{1}{N_c} \sum_p \vec{v}^p|$, through the above described procedure. The authors claim that even when considering the operations to construct the quantum states involved, this quantum method is more efficient than the polynomial runtime needed to calculate the same value on a classical computer.
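The relation between the overlap of $|\phi\rangle$ and $|\psi\rangle$ and the classical distance can be checked numerically. The sketch below is a classical simulation under an assumption of our own: $|a\rangle$ and $|b\rangle$ are taken to be unit-norm states $\vec{a}/|\vec{a}|$ and $\vec{b}/|\vec{b}|$, which is not the normalisation of Eq. (1). With this convention the squared norm of the projection of $|\psi\rangle$ onto $|\phi\rangle$ on the ancilla equals $|\vec{a} - \vec{b}|^2 / (2Z)$, so the prefactor comes out as $2Z$; the exact prefactor depends on the normalisation convention used for the states.

```python
import numpy as np


def distance_squared_from_overlap(a, b):
    """Recover |a - b|^2 from the overlap of |phi> with the ancilla of |psi>."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    norm_a, norm_b = np.linalg.norm(a), np.linalg.norm(b)
    Z = norm_a ** 2 + norm_b ** 2

    # ASSUMPTION of this sketch: |a> and |b> are unit-norm states.
    ket_a, ket_b = a / norm_a, b / norm_b

    # |psi> = (|0>|a> + |1>|b>)/sqrt(2); rows are indexed by the ancilla value.
    psi = np.vstack([ket_a, ket_b]) / np.sqrt(2)
    # |phi> = (|a| |0> - |b| |1>)/sqrt(Z) lives on the ancilla only.
    phi = np.array([norm_a, -norm_b]) / np.sqrt(Z)

    projected = phi @ psi                       # (<phi| x 1)|psi>, unnormalised
    overlap_sq = float(projected @ projected)   # probability of finding |phi>
    return 2 * Z * overlap_sq                   # factor 2Z for unit-norm states
```

For the nearest-centroid rule one would call this helper with `a` set to the new input and `b` set to the mean vector of a class, and pick the class with the smallest value.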

Wiebe, Kapoor and Svore [13] also use a swap test in order to calculate the inner product of two vectors, which is another distance measure between feature vectors. However, they use an alternative representation of classical information through quantum states. Given $n$-dimensional classical vectors $\vec{a}, \vec{b}$ with entries $a_j = |a_j| e^{i\alpha_j}$, $b_j = |b_j| e^{i\beta_j}$, $j = 1, \dots, n$, as well as an upper bound $r_{max}$ for the entries of the training vectors in $T$ and an upper bound $d$ for the number of non-zero entries in a vector (the sparsity), the idea is to write the parameters into amplitudes of the quantum states

$$
|A\rangle = \frac{1}{\sqrt{d}} \sum_j |j\rangle \left( \sqrt{1 - \frac{|a_j|^2}{r_{max}^2}}\, e^{-i\alpha_j} |0\rangle + \frac{a_j}{r_{max}} |1\rangle \right) |1\rangle
$$

and

$$
|B\rangle = \frac{1}{\sqrt{d}} \sum_j |j\rangle\, |1\rangle \left( \sqrt{1 - \frac{|b_j|^2}{r_{max}^2}}\, e^{-i\beta_j} |0\rangle + \frac{b_j}{r_{max}} |1\rangle \right)
$$

and perform a swap test on $|A\rangle$ and $|B\rangle$. According to Eq. (2), the probability of measuring the swap-test ancilla in the ground state is then $P(|0\rangle_{anc}) = \frac{1}{2} + \frac{1}{2} \left| \frac{1}{d\, r_{max}^2} \sum_i a_i b_i \right|^2$ and the inner product of $\vec{a}, \vec{b}$ can consequently be evaluated by $\left| \sum_i a_i b_i \right|^2 = d^2 r_{max}^4 \left( 2 P(|0\rangle_{anc}) - 1 \right)$, which is altogether independent of the dimension $n$ of the vector. The authors in fact claim a quadratic speed-up compared to classical algorithms. In the same contribution, Wiebe, Kapoor and Svore also give a scheme for a (weighted) nearest-centroid algorithm based on the Euclidean distance evaluated by well-known algorithms from the toolbox of quantum information, the amplitude estimation algorithm [40] and Dürr and Høyer’s find minimum subroutine [41].
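The encoding into $|A\rangle$ and $|B\rangle$ can be checked with a small classical simulation. The sketch below makes simplifying assumptions of our own: every entry of the vectors is stored, so that $d$ equals the vector length, and the swap-test probability is computed from Eq. (2) instead of being sampled. The function names `encode` and `inner_product_squared` are hypothetical helpers for this illustration.

```python
import numpy as np


def encode(v, r_max, slot):
    """Build |A> (slot=0) or |B> (slot=1) as a flat amplitude vector.

    ASSUMPTION: all d = len(v) entries are stored, and |v_j| <= r_max.
    """
    v = np.asarray(v, dtype=complex)
    d = len(v)
    one = np.array([0, 1], dtype=complex)            # the fixed qubit |1>
    state = np.zeros((d, 2, 2), dtype=complex)       # indices: |j>, qubit 1, qubit 2
    for j, vj in enumerate(v):
        phase = np.exp(-1j * np.angle(vj))
        data = np.array([np.sqrt(1 - abs(vj) ** 2 / r_max ** 2) * phase,
                         vj / r_max])
        # |A> carries the data on qubit 1, |B> carries it on qubit 2.
        pair = np.outer(data, one) if slot == 0 else np.outer(one, data)
        state[j] = pair / np.sqrt(d)
    return state.ravel()


def inner_product_squared(a, b, r_max):
    """Recover |sum_i a_i b_i|^2 from the swap-test probability of Eq. (2)."""
    A, B = encode(a, r_max, 0), encode(b, r_max, 1)
    p0 = 0.5 + 0.5 * abs(np.vdot(A, B)) ** 2         # Eq. (2)
    d = len(a)
    return d ** 2 * r_max ** 4 * (2 * p0 - 1)
```

For real-valued vectors the result agrees with the square of the ordinary dot product; for complex entries the overlap $\langle A | B \rangle$ involves the complex conjugate of the entries of $\vec{a}$.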

A full quantum pattern recognition algorithm forbinary features was presented by Trugenberger [9].He expands his quantum associative memory circuit

[42] for this purpose. At the centre is his subrou-tine to measure the Hamming distance between twobinary quantum states. He constructs a quantumsuperposition containing all states of the quantumtraining set, and writes the Hamming distance to thebinary input vector |x〉 = |x1...xn〉 , xi = {0, 1} intothe amplitude of each training vector state. This isdone by the following useful routine based on elemen-tary quantum operations. Given two binary strings|a1...an〉 and |b1...bn〉 with entries ai, bi ∈ {0, 1}, weconstruct the initial state |ψ〉 = |a1...an, b1...bn〉 ⊗1√2(|0〉 + |1〉), consisting of two registers for the

qubits of a and b respectively, as well as an extra2-dimensional ancilla register in superposition. Theinverse Hamming distance between each qubit of thefirst and second register,

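$$d_k = \begin{cases} 1, & \text{if } |a_k\rangle = |b_k\rangle, \\ 0, & \text{else,} \end{cases}$$

replaces the respective qubit in the second register. This is done by applying an $\mathrm{XOR}_{a,b}$ gate, which overwrites the second entry $b_k$ with 0 if $a_k = b_k$ and else with 1, followed by a NOT gate that inverts this value. The result is the state

$$|\psi'\rangle = |a_1...a_n, d_1...d_n\rangle \otimes \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle).$$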

To write the total Hamming distance $d_H(\vec{a}, \vec{b})$ first into the phase and then into the amplitude, Trugenberger uses the unitary operator $U = \exp(-i \frac{\pi}{2n} H)$ with $H = 1 \otimes \sum_k \left( \frac{1}{2}(\sigma_z + 1) \right)_{d_k} \otimes \sigma_z$ working on the three registers. Note that this adds a negative sign in case the ancilla qubit is in $|1\rangle$. A Hadamard transformation on the ancilla state, $H_{anc} = 1 \otimes 1 \otimes H$, consequently results in

$$|\psi''\rangle = \cos\left[ \frac{\pi}{2n} d_H(\vec{a}, \vec{b}) \right] |a_1...a_n, d_1...d_n, 0\rangle + \sin\left[ \frac{\pi}{2n} d_H(\vec{a}, \vec{b}) \right] |a_1...a_n, d_1...d_n, 1\rangle.$$

Measuring the ancilla in $|0\rangle$ leads to a state in which the amplitude scales with the Hamming distance between $\vec{a}$ and $\vec{b}$. Of course, the power of this routine only becomes visible if it is applied to a large superposition of training states in the first register, $|a_1, ..., a_n\rangle \to \sum_p |v_p\rangle$. A clever measurement then retrieves the states close to the input state with a high probability.
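The ancilla statistics of this routine can be reproduced classically for a single pair of bit strings. The sketch below is an illustration under simplifying assumptions: the function name is hypothetical, the first two registers stay in a computational basis state so that only the ancilla qubit needs to be simulated, and only measurement probabilities are returned.

```python
import numpy as np

def ancilla_probabilities(a, b):
    """Probabilities of measuring the ancilla in |0> and |1> after the routine."""
    n = len(a)
    # XOR followed by NOT: d_k = 1 if a_k == b_k, else 0
    d = [1 - (ak ^ bk) for ak, bk in zip(a, b)]
    # sum_k (sigma_z + 1)/2 counts the zeros in the d register,
    # which equals the Hamming distance between a and b
    hamming = d.count(0)
    theta = np.pi * hamming / (2 * n)
    # Ancilla starts in (|0> + |1>)/sqrt(2); U adds opposite phases to |0> and |1>
    ancilla = np.array([np.exp(-1j * theta), np.exp(1j * theta)]) / np.sqrt(2)
    hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    ancilla = hadamard @ ancilla
    return np.abs(ancilla) ** 2

p0, p1 = ancilla_probabilities([0, 1, 1, 0], [0, 1, 1, 1])
```

Here the two strings differ in one of four positions, so the probability of finding the ancilla in $|0\rangle$ is $\cos^2(\pi/8)$. Strings that are closer to the input are more likely to survive a post-selection on $|0\rangle$.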

3.2 Quantum computing for support vector machines

A support vector machine is used for linear discrimination, which is a subcategory of pattern classification. The task in linear discrimination problems is to find a hyperplane that is the best discrimination between two class regions and serves as a decision boundary for future classification tasks. In a trivial example of one-dimensional data and only two classes, we would ask which point x lies exactly between the members of class 1 and 2, so that all values left of x belong to one class and all values right of x to the other. In higher dimensions, the boundary is given by a hyperplane (see Figure 6 for two dimensions). It seems like a severe restriction that methods of linear discrimination require the problem to be linearly separable, which means that there is a hyperplane that divides the datapoints so that all vectors of either class are on one side of the hyperplane (in other words, the regions of each class have to be disjoint). However, a non-separable problem can be mapped onto a linearly separable problem by increasing the dimensions [22].

A support vector machine tries to find the optimal separating hyperplane. The best discriminating hyperplane has a maximum distance to the closest datapoints, the so-called support vectors. This is a mathematical optimisation problem of finding the maximum margin $|\vec{w}|^{-1} (\vec{v} \cdot \vec{w} + b)$ between the hyperplane and the support vectors [29] (see Figure 6). In the 2-dimensional case, the boundary conditions are

$$\begin{aligned} \vec{w} \cdot \vec{v}_i + b &\geq 1, & \text{when } c_i = 1, \\ \vec{w} \cdot \vec{v}_i + b &\leq -1, & \text{when } c_i = -1, \end{aligned} \qquad (3)$$

for each support vector $\vec{v}_i$ from the training data set and its classification $c_i \in \{-1, 1\}$. This means that while finding a maximum margin, the hyperplane must still separate the training vectors of the two classes correctly. This optimisation problem can be formulated using the Lagrangian method [22] or in dual space [43].

Figure 6: A support vector machine finds a hyperplane (here a line) with maximum margin to the closest vectors. This image illustrates the geometry of the optimisation problem based on [29].
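As a small classical illustration of the conditions in Eq. (3), the sketch below checks whether a given hyperplane separates a toy data set and computes its distance to the closest vector. The function names and the data are hypothetical, and the code only evaluates a candidate hyperplane; it does not solve the optimisation problem.

```python
import numpy as np

def satisfies_constraints(w, b, vectors, labels):
    """Check c_i * (w . v_i + b) >= 1 for all vectors, a compact form of Eq. (3)."""
    scores = vectors @ w + b
    return bool(np.all(labels * scores >= 1))

def margin(w, b, vectors):
    """Distance |w|^{-1} |w . v + b| from the hyperplane to the closest vector."""
    return float(np.min(np.abs(vectors @ w + b)) / np.linalg.norm(w))

vectors = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [-1.0, 0.0]])
labels = np.array([1, 1, -1, -1])
w = np.array([0.5, 0.5])
b = -1.0

separates = satisfies_constraints(w, b, vectors, labels)
closest_distance = margin(w, b, vectors)
```

In this toy example the vectors (2, 2) and (0, 0) meet the conditions with equality and therefore play the role of support vectors.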

Without going into the complex mathematical details of support vector machines, it is important to note that the mathematical formulation of the optimisation problem contains a kernel K, a matrix containing the inner product of the feature vectors $(K)_{pk} = \vec{v}_p \cdot \vec{v}_k$, $p, k = 1, ..., N$ (or the basis vectors they are composed of) as entries. Support vector machines are in fact part of a larger class of so-called kernel methods [29] (for more details see [22]) that suffer from the fact that calculating kernels can get very expensive in terms of computational resources. More precisely, quadratic programming problems of this form have a complexity of $O((Nn)^3)$ [29] where Nn is the number of variables involved, and computational resources therefore grow significantly with the size of the training data. It is thus crucial for support vector machines to find a method of evaluating an inner product efficiently. This is where quantum computing comes into play.
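For reference, the classical kernel matrix of inner products can be written in a few lines. This is a plain NumPy sketch with a hypothetical function name; it only illustrates the object whose evaluation the quantum routines aim to speed up.

```python
import numpy as np

def kernel_matrix(vectors):
    """Matrix of inner products (K)_pk = v_p . v_k for N feature vectors."""
    vectors = np.asarray(vectors, dtype=float)
    return vectors @ vectors.T

K = kernel_matrix([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])
```

The matrix is symmetric and has one row and one column per training vector, so its size grows quadratically with the number N of training vectors.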

Rebentrost, Mohseni and Lloyd [12] claim that in general, the evaluation of an inner product can be done faster on a quantum computer. Consider the quantum state⁶

$$|\chi\rangle = \frac{1}{\sqrt{N_\chi}} \sum_{i=1}^{2^n} |\vec{x}^i| \, |i\rangle |x^i\rangle, \qquad \text{with } N_\chi = \sum_{i=1}^{2^n} |\vec{x}^i|^2.$$

The $|x^i\rangle$ are a $2^n$-dimensional basis of the training vector space T, so that every training vector $|v_p\rangle$ can be represented as a superposition $|v_p\rangle = \sum_i \alpha_i |x^i\rangle$. Similar to the same authors’ distance measurement given in Eq. (1), the quantum evaluation of a classical inner product relies on the fact that the quantum states are normalised as

$$\langle x^i | x^j \rangle = \frac{\vec{x}^i \cdot \vec{x}^j}{|\vec{x}^i| |\vec{x}^j|}.$$

The kernel matrix of the inner products of the basis vectors, K with $(K)_{i,j} = \vec{x}^i \cdot \vec{x}^j$, can then be calculated by taking the partial trace of the corresponding density matrix $|\chi\rangle\langle\chi|$ over the states $|x^i\rangle$,

$$\mathrm{tr}_x[|\chi\rangle\langle\chi|] = \frac{1}{N_\chi} \sum_{i,j=1}^{2^n} \underbrace{\langle x^i | x^j \rangle |\vec{x}^i| |\vec{x}^j|}_{\vec{x}^i \cdot \vec{x}^j} \, |i\rangle\langle j| = \frac{K}{\mathrm{tr}[K]}.$$

Rebentrost, Mohseni and Lloyd propose that the inner product evaluation can not only be used for the kernel matrix but also when a pattern has to be classified, which invokes the evaluation of the inner product between the above parameter vector $\vec{w}$ and the new input (see Eq. 3).⁷

⁶ The initial state can be constructed by using a Quantum Random Access Memory oracle described in [44], accessing a superposition of memory states in O(log(nM)).

⁷ In the same paper, Rebentrost, Mohseni and Lloyd [12] also present another quantum support vector machine that uses the reformulation of the optimisation as a least-squares problem, which takes the form of a system of linear equations. Following [45], this can be solved by a quantum matrix inversion algorithm, which under some conditions (depending on the matrix and the output information required) can be more efficient than classical methods. The classification is then proposed to be done through a swap test.

3.3 Quantum algorithms for clustering

Clustering describes the task of dividing a set of unclassified feature vectors into k subsets or clusters. It is the most prominent problem in unsupervised learning, which does not use training sets or ‘prior examples’ for generalisation, but rather extracts information on structural characteristics of a data set. Clustering is usually based on a distance measure such as the squared Euclidean distance ($(\vec{a} - \vec{b})^2$ with $\vec{a}, \vec{b} \in \mathbb{R}^N$).

The standard textbook example for clustering is the k-means algorithm, in which alternately each feature vector or datapoint is assigned to its closest current centroid vector to form a cluster for each centroid, and the centroid vectors get calculated from the clusters of the previous step (see Figure 7). Of course, the first iteration requires initial choices for the centroid vectors, and a free parameter is the number k of clusters to be formed. The procedure eventually converges to stable centroid positions. However, these may represent local minima, as only the position of the initial centroids defines whether a global minimum can be reached [46]. Other problems of k-means clustering are how to choose the parameter k without prior knowledge of the data, and how to deal with clusters that are visibly not grouped according to distance measures (such as concentric circles). Still, k-means works well for many simple applications of reducing many datapoints into only a few groups, for example in data compression tasks. A variation of the k-means algorithm is k-median clustering, in which the role of the centroid is taken over by the datapoint of a cluster that has the smallest total distance to all other points.
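The two alternating steps of classical k-means can be sketched as follows. This is a minimal NumPy version under stated assumptions: the function name is hypothetical, the initial centroids are passed in by the caller, the squared Euclidean distance is used, and a centroid without members simply keeps its old position.

```python
import numpy as np

def k_means(points, centroids, n_iter=100):
    """Alternate between assigning points and recomputing centroids."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Step 1: assign every point to its closest current centroid
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Step 2: recompute each centroid as the mean of its cluster
        new_centroids = centroids.copy()
        for c in range(len(centroids)):
            members = points[labels == c]
            if len(members) > 0:
                new_centroids[c] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break  # centroid positions are stable
        centroids = new_centroids
    return centroids, labels

points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = k_means(points, centroids=[[0.0, 0.0], [10.0, 10.0]])
```

With other initial centroids the same data can end in a different stable configuration, which is the local-minimum problem mentioned above.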

Besides versions of quantum clustering that are merely inspired by quantum mechanics [47] or use the quantum mechanical fidelity $\mathrm{Fid}(|\psi\rangle, |\phi\rangle) = |\langle\psi|\phi\rangle|^2$ as a distance measure for an otherwise classical algorithm [38], several full quantum routines for clustering have been proposed. For example, Aïmeur, Brassard and Gambs [48] use two subroutines for a quantum k-median algorithm. First, with the help of an oracle that calculates the distance between two quantum states, the total distance of each state to all other states of one cluster is calculated. Based on the find minimum subroutine in [41], the authors then describe a routine to find the smallest value of this distance function and select the corresponding quantum state as the new median for the cluster. Unfortunately, the oracle is not described in detail, and their quantum machine learning proposal largely depends on how and with what resources it can be implemented.

Figure 7: The alternating steps (step 1, step 2) of a k-means algorithm. Step 1: The clusters (different shapes and colours) are defined by attributing each vector to the closest centroid vector (larger and darker shapes). Step 2: The centroids of each cluster defined in the previous cycle are recalculated and define a new clustering.
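The classical counterpart of the two subroutines (total distance of each point to all others in a cluster, followed by a minimum search) fits in a short sketch. The function name is hypothetical and the Euclidean distance stands in for the unspecified distance oracle; nothing here reproduces the quantum speed-up.

```python
import numpy as np

def cluster_median(points):
    """Index of the point with the smallest total distance to all other points."""
    points = np.asarray(points, dtype=float)
    # Subroutine 1: total distance of each point to every other point
    pairwise = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    total = pairwise.sum(axis=1)
    # Subroutine 2: classical minimum search over the totals
    return int(total.argmin())

cluster = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [10.0, 0.0]]
median_index = cluster_median(cluster)
```

Unlike a mean-based centroid, the selected median is always one of the datapoints, so the outlier at (10, 0) shifts it far less.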

In their contribution discussed earlier, Lloyd, Mohseni and Rebentrost [11] present an unsupervised quantum learning algorithm for k-means clustering that is based on adiabatic quantum computing. Adiabatic quantum computing is an alternative to the above introduced method of implementing unitary gates, and tries to continuously adjust the quantum system’s parameters in an adiabatic process in order to transfer a ground state which is easy to prepare into a ground state which encodes the result of the computation. Although not in focus here, quantum adiabatic computing seems to be an interesting candidate for quantum machine learning methods [15]. This is why we want to sketch the idea of how to use adiabatic quantum computing for k-means clustering.

In [11], the goal of each clustering step is to have an output quantum superposition $|\chi\rangle = \frac{1}{\sqrt{N_c}} \sum_{c, p \in c} |c\rangle |\vec{v}_p\rangle$, where as usual $\{|v_p\rangle\}_{p=1,...,N}$ is the set of N feature vectors or datapoints expressed as quantum states, and $|c\rangle$ is the cluster the subset $\{|v_j\rangle\}_{j=1,...,N_c}$ is assigned to after the clustering step. The authors essentially propose to adiabatically transform an initial Hamiltonian $H_0 = 1 - \frac{1}{k} \sum_{c,c'} |c\rangle\langle c'|$ into a Hamiltonian

$$H_1 = \sum_{c',j} |\vec{v}_j - \vec{v}_{c'}|^2 \, |c'\rangle\langle c'| \otimes |j\rangle\langle j|,$$

encoding the squared distance between each vector $\vec{v}_j$ and the cluster centroids $\vec{v}_{c'}$, which is smallest for the closest cluster. They give a more refined version and also mention that the adiabatic method can be applied to solve the optimisation problem of finding good initial or ‘seed’ centroid vectors.
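The role of the two Hamiltonians can be illustrated on a toy data set. The sketch below is a classical check with hypothetical function names, not a simulation of the adiabatic evolution. It builds the initial Hamiltonian on the cluster register and the diagonal of the final Hamiltonian, and reads off for each datapoint the cluster label of lowest energy.

```python
import numpy as np

def initial_hamiltonian(k):
    """H_0 = 1 - (1/k) sum_{c,c'} |c><c'| on the k-dimensional cluster register."""
    return np.eye(k) - np.ones((k, k)) / k

def final_energies(points, centroids):
    """Diagonal of H_1: energies[c, j] = |v_j - v_c|^2 for the basis state |c>|j>."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    return ((centroids[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)

def lowest_energy_assignment(points, centroids):
    """For every point j, the cluster label c that minimises the energy of H_1."""
    return final_energies(points, centroids).argmin(axis=0)

k = 2
uniform = np.ones(k) / np.sqrt(k)
h0 = initial_hamiltonian(k)
assignment = lowest_energy_assignment(
    points=[[0.0, 0.0], [1.0, 0.0], [9.0, 0.0]],
    centroids=[[0.0, 0.0], [10.0, 0.0]],
)
```

The uniform superposition over cluster labels has energy zero under $H_0$ and is therefore an easily prepared starting state, while the low-energy states of $H_1$ pair each vector with its closest centroid.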

3.4 Searching for a quantum neural network model

An artificial neural network is an n-dimensional graph where the nodes $x_m$ are called neurons and their connections are weighted by parameters $w_{ml}$ representing synaptic strengths between neurons ($m, l = 1, ..., n$). An activation function defines the value of a neuron depending on the current value of all other neurons weighted by the parameters $w_{ml}$, and the dynamics of the neural network is given by successively updating the value of neurons through the activation function. An artificial neural network can thus be understood as a computational device, the input being the initial values of the neurons and the output either a stable state of the entire network or the state of a specific subset of neurons. ‘Programming’ a neural network can be done by selecting weight parameters $w_{ml}$ and an activation function encoding a certain input-output relation. The power of artificial neural networks lies in the fact that they can learn their weights from training data, a fact that neuroscientists believe is the basic principle of how our brain processes information [49].

For pattern classification we usually consider so-called feed-forward neural networks in which neurons are arranged in layers, and each layer feeds its values into the next layer. An input is presented to a feed-forward neural network by initialising the input layer, and after each layer successively updates its nodes the output (for example encoding the classification of the input) can be read out in the last layer (see Figure 8).

Figure 8: Illustration of a feed-forward neural network with a sigmoid activation function for each neuron.

Feed-forward neural networks often use sigmoid activation functions

$$x_l = \mathrm{sgm}\left( \sum_{m=1}^{N} w_{ml} x_m ; \kappa \right),$$

defined by $\mathrm{sgm}(a; \kappa) = (1 + e^{-\kappa a})^{-1}$. If an appropriate set of weight parameters is given, feed-forward neural networks are able to classify input patterns extremely well. To evoke the desired generalisation, the network is initialised with training vectors, the output is compared to the correct output, and the weights adjusted through gradient descent in order to minimise the classification error. The procedure is called backpropagation [50]. A challenge for pattern classification with neural networks is the computational cost for the backpropagation algorithm, even when we consider improved training methods such as deep learning [30].
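The layer update and a single weight adjustment can be sketched in NumPy. This is a deliberately reduced illustration with hypothetical function names: it treats one layer only, uses the squared error as classification error, and performs one gradient-descent step, so it is not a full backpropagation through several layers.

```python
import numpy as np

def sgm(a, kappa=1.0):
    """Sigmoid activation sgm(a; kappa) = 1 / (1 + exp(-kappa * a))."""
    return 1.0 / (1.0 + np.exp(-kappa * a))

def layer_update(x, w, kappa=1.0):
    """x_l = sgm(sum_m w[m, l] * x[m]; kappa) for every neuron l of the next layer."""
    return sgm(x @ w, kappa)

def gradient_step(x, w, target, rate=0.5, kappa=1.0):
    """One gradient-descent step on the error 0.5 * |output - target|^2."""
    out = layer_update(x, w, kappa)
    # Derivative of the error with respect to the summed input of each neuron
    delta = (out - target) * kappa * out * (1.0 - out)
    return w - rate * np.outer(x, delta)

x = np.array([1.0, 0.0])
target = np.array([1.0])
w = np.zeros((2, 1))

error_before = 0.5 * np.sum((layer_update(x, w) - target) ** 2)
w = gradient_step(x, w, target)
error_after = 0.5 * np.sum((layer_update(x, w) - target) ** 2)
```

Repeating the step over many training vectors drives the output towards the targets; in a multi-layer network the same derivative is passed backwards layer by layer.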

There are a number of proposals for quantum versions of neural networks. However, most of them consider another class, so-called Hopfield networks, which are powerful for the related task of associative memory that is derived from neuroscience rather than machine learning. A large share of the literature on quantum neural networks tries to find specific quantum circuits that integrate the mechanisms of neural networks in some way [6, 51, 52, 53], trying to use the power of neural computing for quantum computation. A practical implementation is given by Elizabeth Behrman [54, 55, 56] who uses interacting quantum dots to simulate neural networks with quantum systems. An interesting approach is also to use fuzzy feed-forward neural networks inspired by quantum mechanics [57] to allow for multi-state neurons. Also worth mentioning is the pattern recognition scheme implemented through adiabatic computing with liquid-state nuclear magnetic resonance [16]. Despite this rich body of ideas, there is no quantum neural network proposal that delivers a fully functioning efficient quantum pattern classification method that the authors know of. However, it is an interesting open challenge to translate the nonlinear activation function into a meaningful quantum mechanical framework [7], or to find learning schemes based on quantum superposition and parallelism.

3.5 Towards a quantum decision tree

Decision trees are classifiers that are probably the most intuitive for humans. Depending on the answer to a question on the features, one follows a certain branch leading to the next question until the final class is found (see Figure 9). More precisely, a mathematical tree is an undirected graph in which any two nodes are connected by exactly one path. Decision trees in particular have one starting node, the ‘root’ (a node with outgoing but no incoming edges), and several end points or ‘leaves’ (nodes with incoming but no outgoing edges). Each node except for the leaves contains a decision function which decides which branch an input vector follows to the next layer, or in other words, which partition on a set of data it makes. The leaves then represent the final classification. As in the example in Figure 9, this procedure could be used to classify an email as ‘spam’, ‘no spam’ or ‘unsure’.

Decision trees, as all classifiers in machine learning, are constructed using a training data set of feature vectors. The art of decision tree design lies in the selection of the decision function in each node. The most popular method is to find the function that splits the given dataset into the ‘most organised’ sub-datasets, and this can be measured in terms of Shannon’s entropy [22]. Assume the decision function of a node splits a set of N feature vectors $\{\vec{v}_p\}$, $p = 1, ..., N$, into M subsets containing $N_1, ..., N_M$ vectors respectively (and $\sum_{i=1}^{M} N_i = N$). Without further information, we calculate the probability of any vector $\vec{v}_p$ to be attributed to subset i, $i \in \{1, ..., M\}$ (in other words to proceed to the ith node of the next layer) as $\rho_i = \frac{N_i}{N}$, and the entropy caused by the decision function or partition is consequently $S = -\sum_{i=1}^{M} \rho_i \log(\rho_i)$. For example, in a binary tree where all nodes have two outgoing edges, the best partition would split the original set into two subsets of the same size. Obviously, this is only possible if one of the features allows for such a split. Depending on the application, an optimal decision tree would be small in the number of nodes, branches and/or levels.

Figure 9: A simple example of a decision tree for the classification of emails. The geometric shapes symbolise feature vectors from different classes that are divided according to decision functions along the tree structure. [Node labels in the figure: ‘Email sender address book’, ‘Email contains indicated word combinations’, ‘Sender manually marked as spam’; leaves: ‘No spam’, ‘Spam’, ‘Unsure’.]
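The entropy of a partition follows directly from the subset sizes. A minimal sketch, with a hypothetical function name and the natural logarithm as an assumed choice of base:

```python
import math

def partition_entropy(subset_sizes):
    """S = -sum_i rho_i * log(rho_i) with rho_i = N_i / N."""
    total = sum(subset_sizes)
    entropy = 0.0
    for size in subset_sizes:
        if size > 0:  # empty subsets contribute nothing
            rho = size / total
            entropy -= rho * math.log(rho)
    return entropy

even_split = partition_entropy([5, 5])
uneven_split = partition_entropy([9, 1])
no_split = partition_entropy([10, 0])
```

A split into two subsets of equal size gives log 2, while a decision function that sends every vector down the same branch gives zero.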

Lu and Braunstein [58] propose a quantum version of the decision tree. Their classifying process follows the classical algorithm with the only difference that we use quantum feature states $|v\rangle_p = |v^p_1, ..., v^p_n\rangle$ encoding n features into the states of a quantum system. At each node of the tree, the set of training quantum states is divided into subsets by a measurement (or as the authors call it, estimating attribute $v_i$, $i = 1, ..., n$). Lu and Braunstein do not give a clear account of how the division of the set at each node takes place and remain enigmatic in this essential part of the classifying algorithm. They contribute the interesting idea of using the von Neumann entropy to design the graph partition. Although the first step has been made, the potential of a quantum decision tree is still to be established.

3.6 Quantum state classification with Bayesian methods

Stochastic methods such as Bayesian decision theory play an important role in the discipline of machine learning. They can also be used for pattern classification. The idea is to analyse existing information (represented by the above training data set T) in order to calculate the probability that a new input is of a certain class. An illustrative example is the risk class evaluation of a new customer to a bank. This is nothing else than a conditional probability and can be calculated using the famous Bayes formula

$$p(c|\vec{x}) = \frac{p(c)\, p(\vec{x}|c)}{p(\vec{x})}.$$

Here, $p(c)$, $p(\vec{x})$ are the probabilities of data being in class c and of getting input $\vec{x}$ respectively, while $p(c|\vec{x})$ is the conditional probability of assigning c upon getting $\vec{x}$ and $p(\vec{x}|c)$ is the class likelihood of getting $\vec{x}$ if we look in class c. Obviously, we assign the class with the highest conditional probability (or ‘Bayes classifier’) $p(c_l|\vec{x})$ to an input [22]. Values of interest, such as risk functions, can be calculated accordingly. Bayesian theory is an interesting candidate for the translation into quantum physics, since both approaches are probabilistic.
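The Bayes classifier for one fixed input can be written down directly from the formula. In this sketch the function names, the two risk classes and all probabilities are made-up illustration values; in practice the priors and likelihoods would be estimated from the training set.

```python
def bayes_posteriors(priors, likelihoods):
    """p(c|x) = p(c) p(x|c) / p(x) for every class c and one fixed input x."""
    evidence = sum(priors[c] * likelihoods[c] for c in priors)  # p(x)
    return {c: priors[c] * likelihoods[c] / evidence for c in priors}

def bayes_classify(priors, likelihoods):
    """Assign the class with the highest conditional probability."""
    posteriors = bayes_posteriors(priors, likelihoods)
    return max(posteriors, key=posteriors.get)

priors = {"low risk": 0.8, "high risk": 0.2}        # p(c)
likelihoods = {"low risk": 0.1, "high risk": 0.6}   # p(x|c) for the new customer x

posteriors = bayes_posteriors(priors, likelihoods)
assigned = bayes_classify(priors, likelihoods)
```

Although ‘low risk’ is the more common class a priori, the input is far more typical of ‘high risk’ customers, so the posterior favours that class.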

In contrast to the above efforts to improve machine learning algorithms through quantum computing, Bayesian methods can be used for an important task in quantum information called quantum state classification. This problem stems from quantum information theory itself, and the goal is to use machine learning based on Bayesian theory in order to discriminate between two quantum states produced by an unknown or partly unknown source. This is again a classification problem, since we have to learn the discrimination function between two classes $c_1, c_2$ from examples. The two (unknown) quantum states are represented by density matrices $\rho, \sigma$. The basic idea is to use a positive operator-valued measurement (POVM) with binary outcome corresponding to the two classes as a Bayesian classifier, in other words, to learn (or calculate) the measurement on our quantum states that is able to discriminate them [59]. For this process we have a training set consisting of examples of the two states and their respective classification, $T = \{(\rho, c_1), (\sigma, c_2), (\rho, c_1), ...\}$, and the experimenter is allowed to perform any operation on the training set. Guţă and Kotłowski [59] find an optimal qubit classification strategy while Sasaki and Carlini [60] are concerned with the related template matching problem⁸ by solving an optimisation problem for the measurement operator. Sentís et al. [17] give a variation in which the training data can be stored as classical information. The proposals are so far of theoretical nature and await experimental verification of the usefulness of this scheme.

3.7 Hidden quantum Markov models

In the last couple of years, hidden Markov models have been another important method of machine learning investigated from the perspective of quantum information [61, 18]. Hidden Markov models are Markov processes for which the states of the system are only accessible through observations (see Figure 10; for a very readable introduction see [62]). In a (first order discrete and static) Markov model, a system has a countable set of states $S = \{s_m\}_{m=1,...,M}$ and the transitions between these states are governed by a stochastic process in such a way that given a set of transition probabilities $\{a_{ml}\}_{m,l=1,...,M}$, the system’s state at time t + 1 only depends on the previous state at time t. In a hidden model, the state of the system is only accessible through observations $\{o_t\}$ at time t that can take one of a set of symbols, and an observation again has a certain probability to be invoked by a specific state.

⁸ Template matching is the task of assigning the most similar training vector of a training set to an input vector.

Hidden Markov models are thus doubly embedded stochastic processes. To use a common application for pattern recognition as an example [29], consider a recorded speech. The speech is a realisation of a Markov process, a so-called Markov chain of successive words. The recording is the observation, and we shall for now imagine a way to translate the signal into discrete symbols. A Markov model is defined by the transition probabilities between words in a certain language, and the model can be learned from examples of speeches. A hidden Markov model also includes the conditional probabilities that given a certain signal observation, a certain word has been said. Goals of such models are to find the sequence of words that is the most likely for a recording, to predict the next word or, if only given the recording, to infer the optimal hidden Markov model that would encode it. Hidden Markov models play an important role in many other applications such as DNA analysis and online handwriting recognition [29].
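The first of these goals, finding the most likely hidden state sequence for a given observation sequence, is classically solved by the Viterbi algorithm, which the text does not name but which is the standard technique for this task. The sketch below assumes a hypothetical function name, a two-state model with made-up probabilities, and emission probabilities written in the forward direction (the probability that a state produces a symbol).

```python
import numpy as np

def viterbi(initial, transition, emission, observations):
    """Most likely hidden state sequence for a sequence of observed symbols.

    initial[m]       : probability of starting in state m
    transition[m, l] : probability a_ml of moving from state m to state l
    emission[m, o]   : probability that state m produces symbol o
    """
    initial = np.asarray(initial, dtype=float)
    transition = np.asarray(transition, dtype=float)
    emission = np.asarray(emission, dtype=float)
    steps, n_states = len(observations), len(initial)
    prob = np.zeros((steps, n_states))
    back = np.zeros((steps, n_states), dtype=int)
    prob[0] = initial * emission[:, observations[0]]
    for t in range(1, steps):
        # candidates[m, l]: best path ending in m at t-1, followed by the step m -> l
        candidates = prob[t - 1][:, None] * transition
        back[t] = candidates.argmax(axis=0)
        prob[t] = candidates.max(axis=0) * emission[:, observations[t]]
    # Trace the best path backwards from the most likely final state
    path = [int(prob[-1].argmax())]
    for t in range(steps - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

states = viterbi(
    initial=[0.6, 0.4],
    transition=[[0.7, 0.3], [0.4, 0.6]],
    emission=[[0.9, 0.1], [0.2, 0.8]],
    observations=[0, 0, 1, 1],
)
```

For long sequences the products of probabilities become very small, so practical implementations usually work with logarithms instead.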

Monras, Beige and Wiesner [61] first introduced a hidden quantum Markov model in 2010. In contrast to a previous paper [63] in which the observations are represented by quantum basis states and the observation process is given by a von Neumann or projective measurement of an evolving quantum system, the authors consider the much more general formalism of open quantum systems (for an introduction to open quantum systems, see [64]). The state of a system is given by a density matrix $\rho$ and transitions between states are governed by completely positive trace-nonincreasing superoperators $A_i$ acting on these matrices. These operations can always be represented by a set of Kraus operators [64] $\{K^i_1, ..., K^i_q\}$ fulfilling the probability conservation condition $\sum_{k} K^{i\dagger}_k K^i_k \leq 1$,

$$\rho' = A_i \rho = \sum_k K^i_k \rho K^{i\dagger}_k.$$

The probability of obtaining state $\rho_s = P(\rho_s)^{-1} A_s \rho$ is given by $P(\rho_s) = \mathrm{tr}[A_s \rho]$ [61].
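A small numerical example may help to see how such operations act on a density matrix. The sketch uses hypothetical function names and a hand-picked pair of single-qubit operations $A_0$, $A_1$, each with one Kraus operator, chosen so that together they conserve probability; it is not taken from [61].

```python
import numpy as np

def apply_operation(kraus_ops, rho):
    """Unnormalised state A rho = sum_k K_k rho K_k^dagger."""
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

def outcome(kraus_ops, rho):
    """Probability tr[A rho] and the normalised state (requires a non-zero probability)."""
    unnormalised = apply_operation(kraus_ops, rho)
    prob = float(np.real(np.trace(unnormalised)))
    return prob, unnormalised / prob

# Two trace-nonincreasing operations; K0^dagger K0 + K1^dagger K1 is the identity
A0 = [np.array([[1.0, 0.0], [0.0, np.sqrt(0.75)]])]
A1 = [np.array([[0.0, 0.5], [0.0, 0.0]])]

rho = np.full((2, 2), 0.5)  # density matrix of (|0> + |1>)/sqrt(2)
p0, rho0 = outcome(A0, rho)
p1, rho1 = outcome(A1, rho)
```

The two probabilities add up to one, and each resulting state is again a valid density matrix with unit trace.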

The advantage of hidden quantum Markov models is that they contain classical hidden Markov models and are therefore a generalisation offering richer dynamics than the original process [61]. In future there might also be the possibility of ‘calculating’ the outcomes of classical models via quantum simulation. That would be especially interesting if the quantum setting could learn models from given examples, a problem which is nontrivial [62]. Clark et al. [18] add the notion that hidden quantum Markov models can be implemented using open quantum systems with instantaneous feedback, in which information obtained from the environment is used to influence the system. However, a rigorous treatment of this idea is still outstanding, and the power of hidden quantum Markov models to solve the problems for which classical models were developed is yet to be shown.

Figure 10: (Colour online) A hidden Markov model is a stochastic process of state transitions. In this sketch, the three states $s_1, s_2, s_3$ are connected with lines symbolising transition probabilities. A deterministic realisation is a sequence of states, here the transition $s_1 \to s_2 \to s_1$ at times $t_1, t_2, t_3$ that gives rise to observations $o_{12} \to o_4 \to o_8$. A task for hidden Markov models is to guess the most likely state sequence given an observation sequence.

An interesting sibling of hidden quantum Markov models are quantum observable Markov decision processes [65] which use a very similar idea. Classical observable Markov decision processes can be understood as hidden Markov models in which before each step an agent takes a decision for a certain action, leading to the next state of the system. The state of the system is again only accessible through observations that deliver probabilistic information. The goal is to find a strategy (defining what action to take upon what observation) that maximises the rewards given by a reward function. This is a problem of reinforcement learning by intelligent agents which is not the focus of this contribution. However, we also find the striking analogy to Kraus operations on open quantum systems representing the actions that manipulate the density matrix or stochastic description of the system.

4 Conclusion

This introduction to quantum machine learning gave an overview of existing ideas and approaches in the field. Our focus was thereby on supervised and unsupervised methods for pattern classification and clustering tasks, and it is therefore by no means a complete review. In summary, there are two main approaches to quantum machine learning. Many authors try to find quantum algorithms that can take the place of classical machine learning algorithms to solve a problem, and show how an improvement in terms of complexity can be gained. This is predominantly true for nearest neighbour, kernel and clustering methods in which expensive distance calculations are sped up by quantum computation. Another approach is to use the probabilistic description of quantum theory in order to describe stochastic processes. In the case of hidden quantum Markov models, this served to generalise the model, while Bayesian theory was also used for genuinely quantum information tasks like quantum state discrimination. A great many contributions are still in a phase of exploring possibilities to combine formalisms from quantum theory and methods of machine learning, as seen in the area of quantum neural networks and quantum decision trees.

As previously remarked, a quantum theory of learning is still outstanding. Although many contributions work on quantum machine learning algorithms, only very few actually answer the question of how the strength and defining feature of machine learning, the learning process, can be simulated in quantum systems. Especially learning methods of parameter optimisation have not yet been accessed from a quantum perspective. Different approaches to quantum computing can be investigated for this purpose. In quantum computing based on unitary quantum gates, the challenge would be to parameterise and gradually adapt the unitary transformations that define the algorithm. Several ideas in that direction have been investigated already [66, 67, 35], and important tools could be quantum feedback control [68] or quantum Hamiltonian learning [69]. As mentioned before, adiabatic quantum computing might lend itself to learning as an optimisation problem [15]. Other alternatives of quantum computation, such as dissipative [70] and measurement-based quantum computing [71], might also offer an interesting framework for quantum learning. In summary, even though there is still a lot of work to do, quantum machine learning remains a very promising emerging field of research with many potential applications and a great theoretical variety.

Acknowledgements

This work is based upon research supported by the South African Research Chair Initiative of the Department of Science and Technology and National Research Foundation.

References

[1] Martin Hilbert and Priscila López. The world’s technological capacity to store, communicate, and compute information. Science, 332(6025):60–65, 2011.

[2] Michael A Nielsen and Isaac L Chuang. Quantum computation and quantum information. Cambridge University Press, 2010.

[3] I. M. Georgescu, S. Ashhab, and Franco Nori. Quantum simulation. Reviews of Modern Physics, 86:153–185, 2014.

[4] Gerasimos G Rigatos and Spyros G Tzafestas. Neurodynamics and attractors in quantum associative memories. Integrated Computer-Aided Engineering, 14(3):225–242, 2007.

[5] Elizabeth C Behrman and James E Steck. A quantum neural network computes its own relative phase. arXiv preprint arXiv:1301.2808, 2013.

[6] Sanjay Gupta and RKP Zia. Quantum neural networks. Journal of Computer and System Sciences, 63(3):355–383, 2001.

[7] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. The quest for a quantum neural network. Quantum Information Processing, DOI 10.1007/s11128-014-0809-8, 2014.

[8] Dan Ventura and Tony Martinez. Quantum associative memory. Information Sciences, 124(1):273–296, 2000.

[9] Carlo A Trugenberger. Quantum pattern recognition. Quantum Information Processing, 1(6):471–493, 2002.

[10] Ralf Schützhold. Pattern recognition on a quantum computer. Physical Review A, 67:062311, 2003.

[11] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Quantum algorithms for supervised and unsupervised machine learning. arXiv preprint arXiv:1307.0411, 2013.

[12] Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd. Quantum support vector machine for big feature and big data classification. arXiv preprint arXiv:1307.0471, 2013.

[13] Nathan Wiebe, Ashish Kapoor, and Krysta Svore. Quantum nearest-neighbor algorithms for machine learning. arXiv preprint arXiv:1401.2142, 2014.

[14] Hartmut Neven, Vasil S Denchev, Geordie Rose,and William G Macready. Training a large scaleclassifier with the quantum adiabatic algorithm.arXiv preprint arXiv:0912.0779, 2009.

[15] Kristen L Pudenz and Daniel A Lidar. Quantumadiabatic machine learning. Quantum Informa-tion Processing, 12(5):2027–2070, 2013.

[16] Rodion Neigovzen, Jorge L Neves, Rudolf Sol-lacher, and Steffen J Glaser. Quantum patternrecognition with liquid-state nuclear magneticresonance. Physical Review A, 79(4):042321,2009.

[17] G Sentıs, J Calsamiglia, Ramon Munoz-Tapia,and E Bagan. Quantum learning without quan-tum memory. Scientific Reports, 2(708):1–8,2012.

[18] Lewis A Clark, Wei Huang, Thomas M Bar-low, and Almut Beige. Hidden quantummarkov models and open quantum systemswith instantaneous feedback. arXiv preprintarXiv:1406.5847, 2014.

[19] Stuart Jonathan Russell, Peter Norvig, John FCanny, Jitendra M Malik, and Douglas D Ed-wards. Artificial intelligence: A modern ap-proach, volume 3. Prentice Hall EnglewoodCliffs, 2010.

[20] Frank Rosenblatt. The perceptron: a proba-bilistic model for information storage and or-ganization in the brain. Psychological Review,65(6):386, 1958.

[21] Arthur L Samuel. Some studies in machinelearning using the game of checkers. IBM Jour-nal of research and development, 44(1.2):206–226, 2000.

[22] Ethem Alpaydin. Introduction to machine learn-ing. MIT press, 2004.

[23] Richard O Duda, Peter E Hart, and David GStork. Pattern classification. John Wiley &Sons, 2012.

[24] Steven E Landsburg. Quantum game theory.Wiley Encyclopedia of Operations Research andManagement Science, 2011.

[25] Jens Eisert, Martin Wilkens, and Maciej Lewen-stein. Quantum games and quantum strategies.Physical Review Letters, 83(15):3077, 1999.

[26] Hans J Briegel and Gemma De las Cuevas. Pro-jective simulation for artificial intelligence. Sci-entific Reports, 2, 2012.

[27] Jiangfeng Du, Hui Li, Xiaodong Xu, MingjunShi, Jihui Wu, Xianyi Zhou, and Rongdian Han.Experimental realization of quantum games ona quantum computer. Physical Review Letters,88(13):137902, 2002.

[28] Edward W Piotrowski and Jan Sładkowski. An invitation to quantum game theory. International Journal of Theoretical Physics, 42(5):1089–1099, 2003.

[29] Christopher M Bishop et al. Pattern recognition and machine learning, volume 1. Springer New York, 2006.

[30] Geoffrey Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.

[31] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Cognitive Modeling, 1988.

[32] Masahide Sasaki and Alberto Carlini. Quantum learning and universal quantum matching machine. Physical Review A, 66(2):022303, 2002.

[33] Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. Quantum speed-up for unsupervised learning. Machine Learning, 90(2):261–287, 2013.

[34] Markus Hunziker, David A Meyer, Jihun Park, James Pommersheim, and Mitch Rothstein. The geometry of quantum learning. arXiv preprint quant-ph/0309059, 2003.

[35] Alessandro Bisio, Giulio Chiribella, Giacomo Mauro D'Ariano, Stefano Facchini, and Paolo Perinotti. Optimal quantum learning of a unitary transformation. Physical Review A, 81(3):032324, 2010.

[36] Richard W Hamming. Error detecting and error correcting codes. Bell System Technical Journal, 29(2):147–160, 1950.

[37] Klaus Hechenbichler and Klaus Schliep. Weighted k-nearest-neighbor techniques and ordinal classification. 2004.

[38] Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. Machine learning in a quantum world. In Advances in Artificial Intelligence, pages 431–442. Springer, 2006.

[39] Harry Buhrman, Richard Cleve, John Watrous, and Ronald De Wolf. Quantum fingerprinting. Physical Review Letters, 87(16):167902, 2001.

[40] Gilles Brassard, Peter Høyer, Michele Mosca, and Alain Tapp. Quantum amplitude amplification and estimation. arXiv preprint quant-ph/0005055, 2000.

[41] Christoph Dürr and Peter Høyer. A quantum algorithm for finding the minimum. arXiv preprint quant-ph/9607014, 1996.

[42] Carlo A Trugenberger. Probabilistic quantum memories. Physical Review Letters, 87:067901, Jul 2001.

[43] Bernhard E Boser, Isabelle M Guyon, and Vladimir N Vapnik. A training algorithm for optimal margin classifiers. Proceedings of the fifth annual workshop on Computational learning theory, pages 144–152, 1992.

[44] Vittorio Giovannetti, Seth Lloyd, and Lorenzo Maccone. Quantum random access memory. Physical Review Letters, 100(16):160501, 2008.

[45] Aram W Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations. Physical Review Letters, 103(15):150502, 2009.

[46] Simon Rogers and Mark Girolami. A first course in machine learning. CRC Press, 2012.

[47] David Horn and Assaf Gottlieb. Algorithm for data clustering in pattern recognition problems based on quantum mechanics. Physical Review Letters, 88(1):018702, 2002.

[48] Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. Quantum clustering algorithms. Proceedings of the 24th international conference on machine learning, pages 1–8, 2007.

[49] Peter Dayan and Laurence F Abbott. Theoretical neuroscience, volume 31. MIT Press, Cambridge, MA, 2001.

[50] John A Hertz, Anders S Krogh, and Richard G Palmer. Introduction to the theory of neural computation, volume 1. Westview Press, 1991.

[51] W Oliveira, Adenilton J Silva, Teresa B Ludermir, Amanda Leonel, Wilson R Galindo, and Jefferson CC Pereira. Quantum logical neural networks. 10th Brazilian Symposium on Neural Networks, 2008. SBRN’08., pages 147–152, 2008.

[52] Adenilton J da Silva, Wilson R de Oliveira, and Teresa B Ludermir. Classical and superposed learning for quantum weightless neural networks. Neurocomputing, 75(1):52–60, 2012.

[53] Massimo Panella and Giuseppe Martinelli. Neural networks with quantum architecture and quantum learning. International Journal of Circuit Theory and Applications, 39(1):61–77, 2011.

[54] Elizabeth C Behrman, James E Steck, and Steven R Skinner. A spatial quantum neural computer. International Joint Conference on Neural Networks, 1999. IJCNN’99., 2:874–877, 1999.

[55] Geza Toth, Craig S Lent, P Douglas Tougaw, Yuriy Brazhnik, Weiwen Weng, Wolfgang Porod, Ruey-Wen Liu, and Yih-Fang Huang. Quantum cellular neural networks. arXiv preprint cond-mat/0005038, 2000.

[56] Jean Faber and Gilson A Giraldi. Quantum models for artificial neural networks. Electronically available: http://arquivosweb.lncc.br/pdfs/QNN-Review.pdf, 2002.

[57] G Purushothaman and N B Karayiannis. Quantum neural networks (QNNs): inherently fuzzy feedforward neural networks. Neural Networks, IEEE Transactions on, 8(3):679–693, 1997.

[58] Songfeng Lu and Samuel L Braunstein. Quantum decision tree classifier. Quantum Information Processing, 13(3):757–770, 2014.

[59] Madalin Guta and Wojciech Kotłowski. Quantum learning: asymptotically optimal classification of qubit states. New Journal of Physics, 12(12):123032, 2010.

[60] Masahide Sasaki, Alberto Carlini, and Richard Jozsa. Quantum template matching. Physical Review A, 64(2):022317, 2001.

[61] Alex Monras, Almut Beige, and Karoline Wiesner. Hidden quantum Markov models and non-adaptive read-out of many-body states. Applied Mathematical and Computational Sciences, 3:93, 2010.

[62] Lawrence R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

[63] Karoline Wiesner and James P Crutchfield. Computation in finitary stochastic and quantum processes. Physica D: Nonlinear Phenomena, 237(9):1173–1195, 2008.

[64] Heinz Peter Breuer and Francesco Petruccione. The theory of open quantum systems. Oxford University Press, 2002.

[65] Jennifer Barry, Daniel T Barry, and Scott Aaronson. Quantum POMDPs. arXiv preprint arXiv:1406.2858, 2014.

[66] Søren Gammelmark and Klaus Mølmer. Quantum learning by measurement and feedback. New Journal of Physics, 11(3):033017, 2009.

[67] Søren Gammelmark and Klaus Mølmer. Bayesian parameter inference from continuously monitored quantum systems. Physical Review A, 87(3):032115, 2013.

[68] Alexander Hentschel and Barry C Sanders. Machine learning for precise quantum measurement. Physical Review Letters, 104(6):063603, 2010.

[69] Nathan Wiebe, Christopher Granade, Christopher Ferrie, and David Cory. Quantum Hamiltonian learning using imperfect quantum resources. Physical Review A, 89(4):042314, 2014.

[70] Frank Verstraete, Michael M Wolf, and J Ignacio Cirac. Quantum computation and quantum-state engineering driven by dissipation. Nature Physics, 5(9):633–636, 2009.

[71] HJ Briegel, DE Browne, W Dür, R Raussendorf, and M Van den Nest. Measurement-based quantum computation. Nature Physics, 5(1):19–26, 2009.
