
Department of Computer Science

Submitted in part fulfilment for the degree of BSc.

Turing Machine Networks

Jack Romo

30th April 2019

Supervisor: Dr. Detlef Plump


For my beloved cat Pooh, whose unparalleled research skills would surely have benefited this project, had he not been asleep throughout.

Acknowledgements

I now take this time to extend my gratitude to my supervisor, Detlef Plump, for his invaluable experience, insight and knowledge in theoretical computer science; without such guidance, I sincerely doubt this project would have been as satisfying as it has turned out to be. I also thank my friends for their everlasting support, and my parents for both their love and for giving me the opportunities I now have the privilege to take.


Contents

Executive Summary

1 Literature Review
  1.1 Multicore Processor Models
  1.2 Boolean Circuits
  1.3 Parallel Turing Machines
  1.4 Machine Classes
  1.5 Distributed Computing Models

2 Definitions
  2.1 Turing Networks
    2.1.1 An Example Network
  2.2 Measuring Complexity
    2.2.1 Example Time Function

3 Complexity Theory
  3.1 The Parallel Computation Thesis
  3.2 Simulation by Turing Machines
  3.3 Upper Bounds

4 Evaluation
  4.1 Simulating BSP
  4.2 Simulating Boolean Circuits

5 Conclusion

A Appendix
  A.1 Network Topologies
  A.2 Computing Undecidable Problems
  A.3 Turing Machines
  A.4 Space Complexity
  A.5 Upper Bounds on Network Topologies


List of Figures

1.1 The shared memory model of PRAM.
1.2 A Boolean Circuit that decides x1 ≠ x2 for x1x2 ∈ {0, 1}².

2.1 Diagram of the CTM T.

4.1 Diagram of B, with one layer of size n magnified. Note the send and receive trees Si and Ri, connected at their leaves.


Executive Summary

Introduction

The importance of parallelism is difficult to overstate. As sequential computers have failed to meet modern computational challenges, engineers have begun to distribute computations across many processors, making prior algorithms immensely faster. Such a prevalent concept demands modelling so its power can be best understood. Many diverse models have been developed, ranging from pragmatic models like BSP [1] and LogP [2], to theoretically involved and abstract ones like Parallel Turing Machines [3].

It is unfortunate, then, that little effort has been made to connect results for classical Turing Machines directly to parallel complexity theory [4], with the only major model doing this being Parallel Turing Machines [3], proposed by Wiedermann. This model, while well-developed in its own right, only emulates shared-memory parallelism, not accounting for message-passing distributed models like MapReduce [5]. This makes it difficult to bring the hefty theory surrounding Turing Machines to bear on message-passing parallelism, with models like BSP and LogP giving little such theory and the LOCAL, ASYNC and CONGEST [6] models focusing on communication challenges instead. Thus, it is harder to analyze questions such as how much faster a problem can be solved in a parallel message-passing environment than sequentially by a Turing Machine, a question that debatably defines the efficacy of such an approach to parallelism.

This project aims to produce a model that resolves this conundrum, linking Turing Machines and message-passing parallelism more strongly and providing an environment in which substantial complexity results can be produced. It should be noted that this project will restrict itself to classical computation, not straying into the domain of nonstandard models like quantum computing; these are beyond the scope of this project.

Project Outline

In defining our model for parallel complexity theory, we wish for it to fulfill the following four requirements:


1. The model should intuitively emulate a network of communicating processors without shared memory.

2. The model should allow for meaningful complexity analysis.

3. The model should establish a way to compare sequential complexity classes as defined with Turing Machines and parallel ones.

4. The model should be able to simulate other parallel models efficiently.

To achieve this, the project proceeds in three major steps. First, the model for parallelism is defined in full, after reflection upon the design decisions made by other prevalent models of parallelism. A metric for computational time is defined, and an example is investigated to show the model is sound. The model has intuitively been defined as a network of Turing Machines with extra send/receive transitions, clearly satisfying our first requirement.

Next, complexity theoretic results are pursued to verify that this model can provide meaningful connections between sequential and parallel complexity. Lower and upper bounds on complexity are produced, and their intuitive meanings expanded upon; in particular, it is shown that problems beyond EXPTIME can only be made tractable if the network topology we use is itself intractable to compute. This shows our model can derive powerful results for parallel complexity theory, satisfying our second requirement. In the process, these relations of our model's complexity classes to standard Turing Machine ones verify our third requirement; this is aided by the fact that our model is a network of Turing Machines itself.

Finally, other models are simulated within this one to establish the model's relevance to others and to show that its complexity results do indeed translate to other areas of parallelism research, cementing the importance of the model overall. We in particular choose to simulate BSP, a practical model of parallelism, and Boolean Circuits, a model of great theoretical significance. This shows our model can encapsulate both practical and theoretically inclined approaches to parallel modelling, demonstrating its widespread applicability as a model of parallelism.

Overall, the project fulfilled its requirements, proving its versatility as a model for message-passing parallel complexity. Future work remains in investigating dynamic networks and further classifying network topologies.

Ethical Considerations

This project has involved no physical experiments and concerns itself only with abstract reasoning. As such, no confidential data has been collected, no human participants have been involved, and neither its results nor their immediate applications carry ethical consequences.


1 Literature Review

We begin by exploring current research into related lines of work, outlining well-understood models of parallel and distributed computation along with their respective approaches to complexity analysis.

1.1 Multicore Processor Models

One of the most well-known models for parallel computation is PRAM. An elaboration by Fortune and Wyllie [7] upon the earlier RAM (Random Access Machine) model for sequential algorithms, a PRAM, or Parallel Random Access Machine, is defined as an unbounded set of indexed RAM processors P0, P1, P2, ... along with a similarly unbounded set of shared memory cells C0, C1, C2, ... such that each processor holds its own local memory, knows its own index and is able to execute read/write instructions to any shared memory unit [8, p. 22]. This is a natural expansion of RAM, which was merely a single such processor with memory where accessing any address would take constant time. The PRAM model will normally include an instruction set [8, pp. 22-23], each instruction taking one constant unit of time to execute. Measurements of algorithmic complexity are thus done by finding upper bounds on the number of such instructions executed.


Figure 1.1: The shared memory model of PRAM.
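To make the model concrete, the following is a minimal sketch (our own illustration, not drawn from [8]) of synchronous PRAM-style execution: n notional processors perform a tree reduction over shared memory, with an implicit barrier between the read and write phases of each step.

    # Illustrative only: a synchronous, EREW-style tree reduction.
    # Each iteration is one PRAM instruction cycle; processor i is
    # active when i is a multiple of 2*stride.
    def pram_sum(values):
        shared = list(values)          # shared memory cells C0, C1, ...
        n = len(shared)
        stride = 1
        while stride < n:
            # read phase: every active processor reads its partner's cell
            reads = {i: shared[i + stride] for i in range(n)
                     if i % (2 * stride) == 0 and i + stride < n}
            # write phase (after a barrier): all writes happen at once
            for i, v in reads.items():
                shared[i] += v
            stride *= 2
        return shared[0]

    print(pram_sum(range(8)))  # 28, in O(log n) parallel steps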

While this model provides a simple and versatile framework to model parallel computation, it suffers a number of issues. From the perspective of practicality, Eijkhout [9, p. 71] argues that this model is "...unrealistic in practice...", stating that assumptions about multiple accesses to the same location being possible and a lack of memory hierarchy inherently disqualify PRAM from being an honest emulation of modern processors. Another issue to be noted is that of PRAM's network topology; PRAM inherently assumes every memory unit is equally accessible with no overhead by every processor, meaning the network topology is always the same, as seen in [10, p. 4]. This means communication overhead and topology issues cannot be explored, both being pertinent enough to motivate the creation of the LOCAL model for distributed computation [11].

To tackle a few of these issues, further models were proposed, such as LogP [2]. This framework attempts to rectify the overly simplistic assumptions PRAM makes, adding four constant factors to incur extra overhead. These consist of L, an upper bound on the latency in sending data; o, the overhead time taken by a processor to engage with message transmission; g, the minimum time gap between sending messages; and P, the number of processors/memory modules. Further assumptions of asynchronous execution and a finite network bandwidth are made as well. This model notably makes no assumptions about the network topology, allowing for a diverse range of possible implementations. Furthermore, LogP assumes local memory, making communication a far more central issue than in PRAM, where shared memory renders communication complexity largely trivial.
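As a rough illustration of how these parameters combine, the sketch below (our own, with illustrative parameter values rather than measurements of any real machine) estimates the time for one processor to send k single-word messages to distinct receivers under LogP.

    # The sender pays overhead o per message and must space sends by at
    # least g; the final message then needs L to travel and o to be
    # received. Illustrative, assuming no contention.
    def logp_send_time(k, L, o, g):
        if k == 0:
            return 0
        return o + (k - 1) * max(g, o) + L + o

    print(logp_send_time(4, L=6, o=2, g=3))  # 2 + 3*3 + 6 + 2 = 19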

Another such model is BSP, or Bulk Synchronous Parallelism, which extends sequential computation in a similar way to LogP. In a similar manner, it allows the topology to be of any form and places no restrictions on individual processor layouts, citing itself to be "independent of target architecture" [1, p. 2]. However, it differs greatly in its addition of global supersteps of computation [1, p. 3], within each of which a number of local asynchronous computations occur across processors, followed by a final 'barrier synchronization' where the entire network waits to synchronize before a number of communications occur en masse. By doing this, deadlock becomes impossible and programming is made easier [9, p. 116]. This model has seen a great deal of focus, even possessing a full library implementation [12] and an algebraic semantics by He [13].
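The standard per-superstep cost estimate in the BSP literature is the maximum local work, plus g times the largest number of words any processor sends or receives (an h-relation), plus the barrier cost l; a small sketch with illustrative values:

    # Cost of one BSP superstep: w + g*h + l, where w is the maximum
    # local work and h the maximum per-processor communication volume.
    def bsp_superstep_cost(work, sent, received, g, l):
        h = max(max(sent), max(received))
        return max(work) + g * h + l

    print(bsp_superstep_cost(work=[10, 12, 9], sent=[4, 2, 3],
                             received=[3, 4, 2], g=2, l=5))  # 25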

A proposition was made by Valiant in 2010 to extend the BSP model further to Multi-BSP [14]. Valiant defined an instance of Multi-BSP to be "...a tree structure of nested components where the lowest level of leaf components are processors and every other level contains some storage capacity." [14, p. 6] A parent vertex represents a storage level that contains its children; all such vertices at each level of depth in the tree share a predefined degree and communication bandwidth with their children. Computation proceeds similarly to BSP, consisting of a series of supersteps in which communication between parents and children may occur. This model makes a clear attempt to simulate a memory hierarchy in a parallel environment, aimed at "...capturing the most basic resource parameters of multi-core architectures." [14, p. 1] To this end, it makes explicit the substantial issues of network topology, being the first such model we have explored to do this.

While all of the frameworks discussed up to this point are effective practical models of parallel computation, they all diverge substantially from work in sequential complexity on Turing Machines, being more akin to modern processors by design. This makes it nontrivial to translate results in sequential complexity theory to parallel versions when using these models. On top of this, many of these models are heavily parameterized, adding great difficulty in producing comprehensive complexity theoretic results. Multi-BSP makes an effort to overcome this with a parameter-free notion of optimality [14, p. 4] by allowing discrepancy in some parameters up to constant factors; however, Valiant cites an aim in future research of this model to be finding optimal such parameters for a given algorithm [14, p. 4], making clear the importance of the many parameters in this approach to parallelism.

1.2 Boolean Circuits

One model that better bridges the gap between classical sequential and parallel paradigms is that of Boolean circuits [15, ch. 6], which play a vital role in classical complexity theory in their own right. A Boolean Circuit with n inputs and a single output is defined as a directed acyclic graph with n sources (vertices with no incoming edges) and one sink (vertex with no outgoing edges), with all non-input vertices being 'gates' ascribed with some logical operation (OR, AND or NOT) [15, p. 107]. An example circuit is shown below:


Figure 1.2: A Boolean Circuit that decides x1 ≠ x2 for x1x2 ∈ {0, 1}².

A circuit's size is considered to be the number of vertices it contains. An input is a string x ∈ {0, 1}ⁿ; to compute the output, the source vertices are each given a value equal to each of the input string's n characters, and every gate is recursively given a value equal to its logical operation performed upon the values of its incoming edges, until a value is finally attained at the sink. This value is taken to be the output. For instance, the input 01 above would incur an output of 1, by evaluating the circuit in the natural way described.
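This recursive evaluation is mechanical enough to state in a few lines of Python; the dictionary encoding of gates below is our own illustration, with the example being the circuit of Figure 1.2.

    # Evaluate a Boolean circuit given as {gate: (op, [input gates])};
    # sources appear only in the inputs mapping. Encoding is illustrative.
    def eval_circuit(gates, sink, inputs):
        memo = dict(inputs)
        def value(g):
            if g not in memo:
                op, args = gates[g]
                vals = [value(a) for a in args]
                memo[g] = (any(vals) if op == 'OR' else
                           all(vals) if op == 'AND' else not vals[0])
            return memo[g]
        return value(sink)

    # Figure 1.2: (x1 OR x2) AND NOT (x1 AND x2), i.e. x1 != x2.
    gates = {'or': ('OR', ['x1', 'x2']), 'and': ('AND', ['x1', 'x2']),
             'not': ('NOT', ['and']), 'out': ('AND', ['or', 'not'])}
    print(eval_circuit(gates, 'out', {'x1': False, 'x2': True}))  # True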

It is perhaps surprising that Boolean circuits play any role in parallel complexity theory, but the complexity class NC of all circuits of polynomial size and polylogarithmic depth [15, p. 117] is in fact a reasonable complexity class for algorithms that have some fast parallel implementation [15, p. 118]; indeed, one can also define NC as the class of all languages decidable in polylogarithmic time with a polynomial number of processors [8, p. 44]. It is well-known that NC ⊆ P [8, p. 44]; however, it is unknown whether NC = P, a statement that if true would imply all tractable algorithms can be drastically improved by parallelism.

Boolean circuits are an effective candidate for considering theoretical questions about parallel complexity theory, as their connections with classical complexity theory are well-understood to the point of their embedding in Turing Machines being fully explored [8, p. 71]. One must note, however, the lack of immediate intuition as to how one would implement a parallel process in a Boolean circuit; models like PRAM, LogP and BSP as explored before stand nearer by design to how parallel algorithms are written, so analysis of specific algorithms in circuits is difficult at best.

1.3 Parallel Turing Machines

Another important model for parallelism to investigate is that of Parallel Turing Machines [3], an approach to parallelism involving many heads operating on one tape. A Parallel Turing Machine is similar to a standard Turing Machine; however, at any given stage, a multitude of read-write heads may exist rather than just one. Computation starts with only one head as a standard sequential Turing Machine, and proceeds identically; however, a head may choose to multiply into several heads, which then each execute on their own. If two heads try to write different characters to the same position on the tape, the computation is illegal, leaving undefined behavior [3, p. 5]. This means parallelism is achieved by choosing to 'spawn' new processes with the same shared memory.

It should be noted that Parallel Turing Machines can only give polynomial speedup over sequential ones [3, p. 16]. Wiedermann, the creator of the PTM model, believes this to be "in good agreement with basic physical laws" [3, p. 3], as shared memory models of parallelism which can spawn an exponential number of processors in polynomial time fail to take into consideration implicit communication and synchronization costs between processors, making them inaccurate [3, p. 2].


1.4 Machine Classes

An important insight made in the study of parallel complexity theory was that, as with sequential complexity, the many different machines available to analyze (e.g. PRAM, Parallel Turing Machines) produced varying complexity classes, which were not all identical. This is to be expected; a 2-processor parallel machine where one processor is an oracle that solves anything in one step will clearly give different complexities from PRAM!

In sequential complexity theory, discrepancies between machines were resolved with the invariance thesis:

Invariance Thesis. Any two "reasonable" sequential machines can simulate one another with polynomially bounded overhead in time and constant-factor overhead in space. [16, p. 5]

We note that 'reasonable' is an inherently qualitative concept; machines like Turing Machines fit into the complexity theory community's idea of reasonable. An equivalent formulation was later developed for parallel models of complexity, as a connection was observed between the time taken to execute on a parallel machine and the space required on a sequential one:

Parallel Computation Thesis. The time taken to solve a problem on a "reasonable" parallel machine is polynomially related to the space taken on a "reasonable" sequential machine. [16, p. 5]

The concepts of reasonable sequential machines are of course equivalent in both theses. Models for parallelism can now be distinguished by whether they satisfy the Invariance Thesis, the Parallel Computation Thesis or neither. Machines that satisfy the first case are said to be in the first machine class; in the second case they are in the second machine class. Models which fit into the first include standard Turing Machines, while those fitting into the second include a variant of RAM called k-PRAM, developed by Savitch and Stimson [17], where processors can create up to k immediate copies of themselves, which can then do the same recursively.

However, many models of parallelism do not fall into the second machine class; notably, Parallel Turing Machines fall short unless PSPACE = P, as they can only provide polynomial speedup over a sequential Turing Machine. It is thus a topic of debate whether the Parallel Computation Thesis is a reasonable estimate of parallelism, with papers arguing for [18] and against [19]. It is, nevertheless, always of interest to show whether a model is in these classes or not, something we will investigate.


1.5 Distributed Computing Models

We finally touch upon approaches to distributed complexity analysis, a paradigm distinct from parallel complexity theory but still relevant to consider. This distinction lies in what 'success', 'failure' and 'complexity' mean to a distributed system. Three main disciplines exist in distributed computing theory: the analysis of timing, congestion and locality issues [6, p. 27]. These approaches investigate the issues arising from asynchronicity of messages and from limited communication.

With respect to locality, the LOCAL model is standard [6]. It should be noted that the aim of this model is to consider complexities arising from communication rather than from computation itself; to this end, a graph G = (V, E) of distinct processors V and channels of communication E ⊆ V × V is assumed such that processors awaken all at once, messages of arbitrary size and complete correctness are transmitted in synchronized rounds of communication, and arbitrary computations are performed by each processor with its local data [11, p. 6]. These decisions all constitute an effort to isolate communication complexities; indeed, a central class of complexity here is LD(t), the class of decision problems that all participant processors can agree on a solution for in t rounds of communication, LD(O(1)) being of particular interest [20].

It is noteworthy that, while the LOCAL model does not directly aim to produce classical complexity theoretic results, it is possible to bound the complexity of computation performed between each round of communication r for a vertex v ∈ V by a function fA(H(r, v)) [20], where H(r, v) denotes the total amount of data received by v in all rounds before r. This approach makes sense as H(r, v) thus denotes the size of the input for the computation v is to perform this round. Fraigniaud, Korman and Peleg note that investigating this could lead to "...interesting connections between the theory of locality and classical computational complexity theory." [20, p. 6] However, there seems to have been little to no response to this in the literature, as research continues to focus on communication complexity.

We should also give consideration to the CONGEST model [6], which aims to add to the LOCAL model the complexity of message size, and thus of congestion. More specifically, it requires that message size is bounded by O(log n) bits for a graph with n nodes [11, p. 5], as each node has a unique identifier of this length. We may generalize this to the CONGEST(B) model, where we may send messages starting with an identifier of size O(log n), then with B bits of content. This approach more accurately reflects real systems, as message transmission is in reality not instantaneous. This should not discredit the LOCAL model's importance in the reader's mind, however; adding message time to a model can cloud the study of locality on its own.


2 Definitions

Armed with a comprehension of current parallel complexity models, we now begin to consider a discrepancy in the models we have seen. Notably, we find no model for message-passing parallelism with connections to the classical theory of Turing Machines, like Parallel Turing Machines give for shared memory. One could argue the proof of Boolean circuits being a model for parallelism in [15, ch. 6] generalizes to parallelism without shared memory; however, the nature of this connection makes it difficult to analyze specific algorithms or communication networks, as is possible in more practical models like BSP or LogP. Unfortunately, these models in turn lack the theoretical connections to Turing Machines to themselves suffice.

We now attempt to resolve this issue with a new model for message-passing parallelism. We will aim to satisfy the following four requirements:

1. The model should intuitively emulate a network of communicating processors without shared memory.

2. The model should allow for meaningful complexity analysis.

3. The model should establish a way to compare sequential complexity classes as defined with Turing Machines and parallel ones.

4. The model should be able to simulate other parallel models efficiently.

2.1 Turing Networks

To satisfy our first requirement, our model will explicitly consist of a network of processors. To this end we note that, as per the approach of LOCAL and CONGEST [6], networks can be effectively modelled with simple undirected graphs, with vertices representing processors and edges representing communication channels. Undirectedness implies communication is two-way, a reasonable choice as such a system can emulate one-way communication by simply only communicating in one direction.

To satisfy the third requirement, we will assume each processor in the network is a Turing Machine, a model that is known to emulate sequential complexity well and will give us the connection to Turing Machines we seek. The second and fourth are to be verified after our model is defined.

With regards to communication, we will follow the example of CONGEST and demand communication takes time proportional to message size. This forces us to consider communication as a major issue in parallel complexity, a decision that more directly reflects the very real overhead of sending long messages. Communication will be synchronous for each character sent; asynchronicity may be emulated via intermediate buffer processors.

For simplicity, we make a number of assumptions. The first of these is that our network is static and known a priori; mobile and unknown networks present a range of challenges we lack the time to investigate here. Furthermore, we assume input is given to a single processor and output is required from it alone. This is reasonable as we are not emulating distributed computing where consensus is meaningful; instead, we are emulating algorithms that reap parallelism to compute functions efficiently, so one input and one output emulate this. Requiring output at the same processor is not a substantial demand either, as we simply wish for the answer to exist somewhere, and sending it back to the input processor can no more than double the time taken, by retracing the path the computation arrived by. This is a constant factor, so does not concern complexity analysis.

Finally, we assume all processors can be given the same algorithm to execute, knowing only whether they are the input/output processor or not. This is because, even if there exist a finite number of different algorithms used by the different processors, we can identify each processor by the path of communications taken to reach it, and choose from a finite set of algorithms to execute. Note we are implicitly assuming processors do not begin computation before being communicated with; this is justified, as any computations done before this happens must be agnostic to input, giving results that can simply be encoded in the processor beforehand. This will allow us to represent unbounded networks with an infinite graph, ensuring that an infinite number of computations cannot occur in finite time, as only a finite number of processors could then be interacted with.

Armed with this, we now begin to develop definitions. We start by defining a single processor in our model, essentially a Turing Machine augmented to support communication:

Definition 1. A Communicative Turing Machine, or CTM, is a tuple

T = 〈Q, Σ, Γ, qm, qs, ha, hr, δt, δs, δr〉

where Q, Σ are nonempty and finite, Σ ∪ {Λ} ⊂ Γ, qm, qs, ha, hr ∈ Q,

δt : Q × Γ ↦ Q × Γ × {L, R, S}
δs : Q × Γ ↦ N × Γ × Q²
δr : Q ↦ N × Q

are partial functions, and δt(qs, x), δs(qs, x) are undefined for all x ∈ Γ.


This definition expands upon the standard Turing Machine (see Appendix A.3) in a number of ways. There are now two start states, qm and qs; these are respectively the master and slave start states. Henceforth, the processor that is given input and produces output shall be called the master node and all others slave nodes. Naturally, slave nodes will start in the state qs and the master in qm; this allows nodes to identify their status.

The three transition functions are another notable difference. δt is the classical transition function as per standard Turing Machines, albeit now partial. δs is the send function, identifying in what states we wish to send a character, which integer-indexed neighbor we wish to send it to, and the states we will go to if this operation succeeds or if the neighbor doesn't exist. δr is the receive function, identifying in which states we are able to receive a character from a given neighbor, and what state we will transition to if this happens. Intuitively, all processors in our network will be of this form, and any one may transition, send or receive if this is currently possible at a given time. Note the end of the definition forces a CTM in the slave starting state to wait for a message before beginning computation.
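For readers who prefer code to tuples, the sketch below renders Definition 1 as a data structure, with the three partial functions stored as dictionaries (a key being absent means the function is undefined there); the encoding is our own choice, not part of the definition.

    from dataclasses import dataclass
    from typing import Dict, FrozenSet, Tuple

    @dataclass
    class CTM:
        Q: FrozenSet[str]        # states
        Sigma: FrozenSet[str]    # input alphabet
        Gamma: FrozenSet[str]    # tape alphabet, containing Sigma and the blank
        q_m: str                 # master start state
        q_s: str                 # slave start state
        h_a: str                 # accepting halt state
        h_r: str                 # rejecting halt state
        # delta_t: (state, symbol) -> (state, symbol, move in {L, R, S})
        delta_t: Dict[Tuple[str, str], Tuple[str, str, str]]
        # delta_s: (state, symbol) -> (neighbor index, character to send,
        #           state on success, state if the neighbor is missing)
        delta_s: Dict[Tuple[str, str], Tuple[int, str, str, str]]
        # delta_r: state -> (neighbor index, state after receiving)
        delta_r: Dict[str, Tuple[int, str]]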

Definition 2. An oriented graph G = 〈G, φ〉 is a pair consisting of a simple graph G = 〈V, E〉, E ⊆ V × V, and a partial injective function φ : V × N ↦ V such that for any v ∈ V, there exists an n ∈ N such that {φ(v, 1), ..., φ(v, n)} is v's neighborhood in G and φ(v, m) is undefined for all m > n. We call φ the orientation of G.

A network topology G = 〈G = 〈V, E〉, φ〉 is an oriented graph where G is connected, E is a decidable symmetric relation, G has bounded degree, 1 ∈ V ⊆ N, and membership in V is decidable.

Definition 3. A Turing Network is a tuple T = 〈G, T〉 where T is a CTM and G is a network topology.

Intuitively, the orientation gives an indexing of every neighbor for each vertex. Vertex 1 will represent our master node. Note that, as all vertices of our network are identified by natural numbers, the network topology will be countably infinite at its largest; this is acceptable, as only a finite number of vertices will be interacted with in a finitely long computation anyway. Having an infinite graph lets us emulate unbounded numbers of processors, similar to a botnet or other distributed computing system.

We also enforce that every topology has bounded degree, as a CTM can only reference a fixed number of distinct neighbors. We will not consider networks of unbounded degree here, for the sake of simplicity.

To define a computation, we must first define a configuration of our network, similarly to a standard Turing Machine (see Appendix A.3). This begins with CTMs, and is then expanded to general networks:


Definition 4. A CTM configuration of a CTM T is a 4-tuple of the form C ∈ Γ* × Γ × Q × Γ*. We name the set of all CTM configurations for the CTM T C(T).

We say, for CTM configurations Cn = 〈rn, sn, qn, tn〉, n ∈ N, a network topology G = 〈G, φ〉, G = 〈V, E〉 and v1, v2 ∈ V,

C1 ⊢ C2  ⇔  C1 transitions to C2 as configurations of T′ and 〈q1, s1〉 ∈ dom(δt)

〈C1, C2〉 ⊢^{v2}_{v1} 〈C3, C4〉  ⇔  φ(v1, n1) = v2 ∧ φ(v2, n2) = v1 ∧ δs(q1, s1) = 〈n1, s4, q3, q〉 ∧ δr(q2) = 〈n2, q4〉 ∧ r1 = r3, r2 = r4, t1 = t3, t2 = t4, s1 = s3

C1 ↛_{v1} C2  ⇔  δs(q1, s1) = 〈n, s, q, q2〉 ∧ 〈v1, n〉 ∉ dom(φ)

where T′ = 〈Q, Σ, Γ, qm, ha, hr, δ〉 is a Turing Machine such that δ|_{dom(δt)} = δt and δ(q, s) = 〈q, s, S〉 for all 〈q, s〉 ∉ dom(δt).

This definition introduces the three basic transitions CTMs are capable of: standard transitions (⊢), communication transitions (⊢^{v2}_{v1}), and failed-to-send transitions (↛_v). The first represents a standard change of state as though the CTM were a Turing Machine. The second represents a single character being successfully sent from one CTM in configuration C1 to one in configuration C2, each respectively transitioning to configurations C3, C4 in the process. The third represents a CTM in configuration C1 trying to send a message to its nth neighbor, but that neighbor not existing and the CTM entering a chosen recovery state in configuration C2.

From these three forms of transition, we may now define a configuration of a Turing Network and the appropriate types of transition it can make:

Definition 5. A TN configuration of a TN T is a function of the form Ω : V → C(T). We say, for TN configurations Ωn, n ∈ N of a Turing Network T and v1, v2 ∈ V,

Ω1 ⊢_{v1} Ω2  ⇔  Ω1|_{V∖{v1}} = Ω2|_{V∖{v1}} ∧ (Ω1(v1) ⊢ Ω2(v1) ∨ Ω1(v1) ↛_{v1} Ω2(v1))

Ω1 ⊢^{v2}_{v1} Ω2  ⇔  Ω1|_{V∖{v1,v2}} = Ω2|_{V∖{v1,v2}} ∧ 〈Ω1(v1), Ω1(v2)〉 ⊢^{v2}_{v1} 〈Ω2(v1), Ω2(v2)〉

Ω1 ⊢ Ω2  ⇔  (∃ v ∈ V • Ω1 ⊢_v Ω2) ∨ (∃ v1, v2 ∈ V • Ω1 ⊢^{v2}_{v1} Ω2)


This definition is characterized by a map from vertices to configurations, finally providing a notion of the state of the entire network. The first two forms of transition simply capture a transition involving one or two vertices, respectively. The last is simply a generalized transition, where one of the two above happens. Now, we may define a computation of a Turing Network:

Definition 6. An initial state of a TN T is a configuration ΩS for some S ∈ Σ* where

ΩS(1) = 〈λ, Λ, qm, S〉
ΩS(v) = 〈λ, Λ, qs, λ〉 ∀ v ∈ V ∖ {1}

A final state of T is a configuration Ωh where Ωh(1) = 〈A, b, q, C〉 with q ∈ {ha, hr}. The output string is AbC with all characters not in Σ deleted. We say Ωh is accepting if q = ha and rejecting otherwise.

A derivation sequence Ψ = {Ωn}n∈X is a sequence of indexed configurations of T where X ⊆ N and, for any n, m ∈ X, Ωn ⊢ Ωm if m is the least element of X greater than n. We say Ψ is a computation if it starts with an initial state and ends with a final state.

Say that, for two derivation sequences Ψ1, Ψ2 of T, Ψ1 ≤ Ψ2 if the former is a prefix of the latter as a sequence.

We say T accepts a string S ∈ Σ* if every derivation sequence starting with ΩS is less than an accepting computation and all rejecting computations are greater than an accepting computation. We say it rejects if there exists a rejecting computation not greater than some accepting computation.

We say T computes a function f : Σ* ↦ Σ* if, for every input string s ∈ dom(f), T accepts s and every computation of T with input string s has a final state with output string f(s).

This now gives us the terminology to discuss computation as a series of configurations. We now restrict ourselves to considering computations only.

2.1.1 An Example Network

To help illustrate our model, we now produce an example Turing Network. The network topology will simply be an infinite path starting at 1; the network will, given a number, communicate with that many vertices in the network before terminating. This solves no specific problem beyond showcasing what a Turing Network looks like and how to produce a time function. Formally, we have a network T = 〈〈G, φ〉, T〉 where


G = 〈V, E〉, V = N
E = {〈n, n + 1〉 : n ∈ N} ∪ {〈n + 1, n〉 : n ∈ N}
φ(1, 1) = 2
φ(n, 1) = n − 1, n > 1
φ(n, 2) = n + 1, n > 1
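This orientation is simple enough to transcribe directly; a small sketch (returning None where φ is undefined):

    # The path topology's orientation: neighbor 1 of the master is 2;
    # elsewhere neighbor 1 is the predecessor and neighbor 2 the successor.
    def phi(v, i):
        if v == 1:
            return 2 if i == 1 else None
        if i == 1:
            return v - 1
        if i == 2:
            return v + 1
        return None

    assert [phi(1, 1), phi(5, 1), phi(5, 2), phi(5, 3)] == [2, 4, 6, None]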

A diagram of the CTM T is given in Figure 2.1 below. Note we use similar conventions to classical Turing Machine diagrams, albeit including new types of arrows for sending and receiving characters. An arrow splitting in two at a black circle is a send transition, with the one-line arrow identifying the state reached on success and the dashed arrow the state reached on failure. A double-lined arrow identifies the state reached when a character is received. We define the alphabets as Σ = {a}, Γ = Σ ∪ {Λ}.


Figure 2.1: Diagram of the CTM T.

It is evident that our machine will, given input aⁿ, send the message aⁿ⁻¹ from vertex 1 to 2, and continue sending strings decreasing linearly in size until nothing is sent to vertex n+1. At this point, a confirmation message will be sent to vertex n, which will send one to vertex n−1, and so forth until vertex 1 receives a confirmation and knows the message has been received. The output string is identical to the input when this halts.

2.2 Measuring Complexity

We now have the facilities necessary to start defining the cost of a given computation. In doing this, we establish exactly how much can be gained by introducing maximal parallelism:


Definition 7. A parallel derivation sequence of a Turing Network T is a derivation sequence Ψ = {Ωn}n∈X where, for all i, j ∈ X with i ≠ j and vn ∈ V,

Ωi ⊢_{v1} Ωx ∧ Ωj ⊢_{v2} Ωy  ⇒  v1 ≠ v2
Ωi ⊢^{v2}_{v1} Ωx ∧ Ωj ⊢_{v3} Ωy  ⇒  v1, v2 ≠ v3
Ωi ⊢^{v2}_{v1} Ωx ∧ Ωj ⊢^{v4}_{v3} Ωy  ⇒  v1, v2 ≠ v3, v4

where x, y ∈ X are the successors of i, j in X, if they exist.

Definition 8. The parallel time of a derivation sequence is the smallest number of parallel derivation sequences of which it is the concatenation.

The time function τ : Σ* → N0 of a Turing Network T maps each input string to the longest parallel time of any accepting computation of the Turing Network starting with said string as input.

2.2.1 Example Time Function

Armed with this knowledge, we can now begin to classify the time function of our previous example TN T. At any given time, only one vertex may transition or one pair of vertices may communicate, hence any parallel subsequence is trivial and we may simply count the number of transitions. We note that, for a given processor that is neither the vertex n nor the master node, it takes 2n + 2 transitions to receive a message of length n and a further n + 1 to return to the message's start, 1 to determine that a further message of length n − 1 must be sent, and 1 to receive an acknowledgement. The master will make one unaccounted-for transition to receive the final acknowledgement, and the nth node will take 6 unaccounted-for transitions to determine it has nothing to send. Hence,

τ_T(n) = ∑_{r=1}^{n−1} (2r + 2 + r + 1 + 1 + 1) + 6 + 1 = (3/2)n² + (7/2)n + 2

Thus, τ_T(n) = O(n²) and we can see our network takes quadratic time to terminate given an input of length n.
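As a sanity check on the arithmetic, a few lines of Python comparing the transition-count sum against the closed form:

    # tau(n) = sum_{r=1}^{n-1} (2r+2 + r+1 + 1 + 1) + 6 + 1
    #        = (3n^2 + 7n + 4) / 2
    def tau(n):
        return sum((2*r + 2) + (r + 1) + 1 + 1 for r in range(1, n)) + 6 + 1

    assert all(tau(n) == (3*n*n + 7*n + 4) // 2 for n in range(1, 50))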

It should be noted that there are a multitude of other potential ways we could gauge complexity, including communication costs, space complexity and the number of processors needed to complete a computation. However, we do not investigate these for the sake of brevity.


3 Complexity Theory

Armed with a measure for the time taken by a Turing Network to complete a parallel computation, we may begin to build a complexity theoretic framework around our model for parallelism. We begin by introducing our standard complexity class:

Definition 9. For any given function f : N0 → N0 and network topology G,

PARTIME_G(f(n)) = {g | Σ a set, g : Σ* ↦ Σ* computable by some TN T with topology G and τ_T(n) = O(f(n))}

The reader may rightly question this specific choice of complexity class, namely why we are only willing to consider problems solvable given a specific network topology. This is because, given free choice of network topology, we are in fact able to compute literally any decision problem in quadratic time, including undecidable ones; intuitively, the topology can, if left unchecked, encode arbitrary information we may retrieve by traversing it. (See Appendix A.2 for details.) The reader may thus believe this model is far too general, encompassing behaviors too broad to be considered a model of computation. However, this problem plagues Boolean Circuits as well [15, p. 110], since without considering how hard it is to construct a circuit, these too can compute undecidable problems. Later, we will consider the difficulty of constructing a given network topology, the method by which Boolean Circuits resolve this conundrum in classes like NC.

We note a few simple laws pertaining to our complexity class, the first relating to the sequential complexity class DTIME(f(n)) in [15, p. 25]:

Theorem 1. For any given function f : N0 → N0 and network topologies G, H,

DTIME(f(n)) ⊆ PARTIME_G(f(n))  (3.1)

H ≅ G ⇒ PARTIME_H(f(n)) = PARTIME_G(f(n))  (3.2)

(The operator ≅ denotes isomorphism. See Appendix A.1 for details.)


Seeing (3.1) is trivial, as the master processor can compute anything a classical Turing Machine can by acting as a Turing Machine without communicating with any neighbors. (3.2) is true as, since orientation is preserved and 1 remains the same, identical derivations are possible in both network topologies up to renaming of nodes greater than 1. Intuitively, the graphs will look the same to computations, as vertex names are not explicitly used, only indexing by orientation.

3.1 The Parallel Computation Thesis

Ideally, we would like some verification of the well-known Parallel Computation Thesis [21, p. 3], mentioned in Section 1.4. Hence, we shall develop an understanding of a wide class of network topologies that can compute PSPACE-complete problems in polynomial time, in an effort to identify topologies that verify this thesis for our model.

We begin by defining the class of polynomial time problems for a TN:

Definition 10. For a given network topology G,

Ppar〈G〉 = ⋃_{d∈N} PARTIME_G(n^d)

Initially we consider only the binary tree, namely B = 〈G = 〈V, E〉, φ〉 where V = N and

E = {(n, ⌊n/2⌋), (⌊n/2⌋, n) : n ∈ N, n > 1}
φ(n, 1) = 2n, n ∈ N
φ(n, 2) = 2n + 1, n ∈ N
φ(n, 3) = ⌊n/2⌋, n ∈ N ∖ {1}

We note that with this graph, we can reach an exponential number of vertices if we were to broadcast a message across the network starting at the master vertex for linear time. Hence, this network has potential for enormous distribution of computations.

We choose the QBF (Quantified Boolean Formulae) problem [15] as the PSPACE-complete problem we will attempt to solve on this network; if we can compute it in polynomial time, we will have a strong positive result for the Parallel Computation Thesis, as then any problem computable in polynomial space can be computed in parallel in polynomial time.

QBF is stated here as follows:


Definition 11. For the alphabet Σ = {∀, ∃, •, ∧, ∨, ¬, (, ), ⊤, ⊥, a},

QBF = {α ∈ Σ* : α is a valid satisfiable Boolean formula}

Our algorithm is given below, where Σ is as in the definition of QBF:

Data: a string α ∈ Σ*
Result: ⊤ if α is a satisfiable Boolean formula, ⊥ otherwise
if master node then
    if α valid Boolean formula then
        convert to Prenex normal form;
        send to neighbor 1;
        return message from neighbor 1;
    else
        return ⊥;
    end
else
    σ ← message from neighbor 3;
    if σ starts with '∀ x•' for some x ∈ a* then
        delete '∀ x•' from start of σ;
        send σ to neighbor 1 with all occurrences of x set to 'true';
        send σ to neighbor 2 with all occurrences of x set to 'false';
        wait for both neighbors 1 and 2 to respond;
        if both neighbors 1 and 2 respond with ⊤ then
            send ⊤ to neighbor 3;
        else
            send ⊥ to neighbor 3;
        end
    else if σ starts with '∃ x•' for some x ∈ a* then
        same as for ∀, but send ⊤ if either 1 or 2 respond with ⊤;
    else if σ contains a free variable then
        same as ∃, but setting this variable and no deletion;
    else
        evaluate formula;
        send ⊤ to neighbor 3 if 'true', ⊥ otherwise;
    end
end

Algorithm 1: Binary tree algorithm to decide QBF.
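The recursion each subtree implements can be read off the algorithm; the following sequential sketch mirrors it in Python, with a hypothetical (quantifiers, matrix) formula representation standing in for the string alphabet of Definition 11. In the Turing Network, the two recursive calls are what get shipped to the two children.

    # Sequential sketch of Algorithm 1's recursion. 'quants' is a list of
    # ('forall' | 'exists', variable); 'matrix' is a Python boolean
    # expression over those variables. Representation is illustrative.
    def eval_qbf(quants, matrix, env=None):
        env = env or {}
        if not quants:                       # leaf: no variables remain
            return eval(matrix, {}, dict(env))
        (kind, var), rest = quants[0], quants[1:]
        # one instantiation per child in the binary tree network
        t = eval_qbf(rest, matrix, {**env, var: True})
        f = eval_qbf(rest, matrix, {**env, var: False})
        return (t and f) if kind == 'forall' else (t or f)

    # forall x . exists y . (x != y) is true:
    print(eval_qbf([('forall', 'x'), ('exists', 'y')], "x != y"))  # True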

In devising this algorithm, we have assumed a single CTM is capable of computing the following in polynomial time:

• Checking whether a string is a valid quantified Boolean formula.

• Converting a formula to Prenex Normal Form. [22]

• Replacing all occurrences of a variable with the value ’true’ or ’false’.

• Checking whether any variables are left in a formula.

• Evaluating a formula with no variables.

These are all reasonable assumptions, as a CTM can clearly do anything a standard Turing Machine can do in the same time. With this, we may now begin to consider our algorithm's complexity.

For any given node, no iteration occurs and all steps take polynomial time as per our assumptions. Clearly, if conversion to Prenex normal form takes polynomial time, the resulting formula must be polynomial in the size of the original. On top of this, each successive slave node communicated with removes a quantifier or free variable, reducing the formula to a variable-free Boolean expression in a number of such changes linear in the size of the formula's Prenex normal form, hence a polynomial number of such steps. Finally, the last slave nodes communicated with simply evaluate variable-free formulae, and results are propagated back up a polynomial depth of the binary tree to the master node. Overall, this algorithm must therefore take polynomial time, giving us the following theorem:

Theorem 2. PSPACE ⊆ Ppar〈B〉.

We are able to extend this result to topologies other than binary trees if we are willing to consider a factor we define below:

Definition 12. For a given network topology G, the neighborhood function N_G : N → P(V) is defined as

N_G(n) = {v ∈ V : ∃ a path in G from 1 to v of length ≤ n}

and the growth function γ_G : N → N is defined as

γ_G(n) = |N_G(n)|
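For instance, the growth function of the binary tree B defined earlier can be checked by brute force; the sketch below confirms γ_B(n) = 2^(n+1) − 1, the exponential growth the next theorem asks for.

    # Brute-force N_G(n) for the binary tree B: the neighbors of v are
    # 2v, 2v+1 and floor(v/2) (the latter only when v > 1).
    def gamma_B(n):
        frontier, seen = {1}, {1}
        for _ in range(n):
            frontier = {w for v in frontier
                        for w in (2 * v, 2 * v + 1, v // 2) if w >= 1}
            seen |= frontier
        return len(seen)

    assert [gamma_B(n) for n in range(4)] == [1, 3, 7, 15]  # 2^(n+1) - 1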

Theorem 3. For any network topology G where γ_G(n) = Ω(2ⁿ),

PSPACE ⊆ Ppar〈G〉

We omit the full proof of the above theorem here; in order to prove it, we must show that we can find a spanning tree with exponential growth function in our topology, and then prune all dead-end branches. We will assume this is possible, as it is merely an issue of traversing the network and having each node record its assigned parent and children in the subtree.

After this, we have a tree with some vertices having only one child and some having more. Our algorithm can now, if a vertex has only one child, simply pass the formula on to the child and pass the result back up to the parent. We also, instead of sending one formula to each child, send both to each child; the children each first try to evaluate the first formula they receive, then the second. The parent simply waits for answers about both formulae from any source, rather than one answer from each child. This stops problems where one child simply has one further child and so forth, which would give an insufficient rate of growth; the other child can compensate. The overall rate of growth, though, will be sufficient to ensure all exponentially many possible assignments of values to variables are checked when all vertices within a polynomial depth of the tree are used.

This gives a positive result towards the Parallel Computation Thesis for our model, showing that, for a large class of topologies, any problem sequentially solvable with polynomial space is solvable in parallel in polynomial time.

3.2 Simulation by Turing Machines

In order to identify upper bounds on complexity classes for Turing Networks, we will follow in the footsteps of Boolean Circuits [8] and attempt to simulate a TN on a classical Turing Machine. Clearly, if a computation taking time t1 on a Turing Network can be simulated in time t2 on a Turing Machine, and a problem cannot be solved in time t2 sequentially, it cannot be solved in parallel in time t1, as we could then simulate its parallel execution sequentially in time t2. Hence, we are interested in deriving t2 from t1.

To construct a simulation, we will need to establish how to store a given configuration, how to produce an initial configuration given an input string, how to produce an output string given a final configuration, how to choose a possible transition and how to execute it. If we can do these things, simulation is trivial, simply being a matter of establishing an initial configuration and executing possible transitions until a final configuration is reached.

To make our lives easier we will assume a multi-tape Turing Machine, specifically one with 3 tapes. A configuration will be stored as follows:

• The first tape will contain a comma-delimited array, each element being a pair of a vertex name and the corresponding CTM's configuration. The read-write head is represented by a special character, to whose left is the character the head is currently over.


• The second tape contains the neighborhoods of each vertex that has participated in a transition: a comma-delimited array of pairs, where the first element is a vertex name and the second is a list of its neighbors, ordered by their indices as per the orientation.

• The third tape is a scratchpad for other work.

To identify a possible transition, we need to check if any of the three elementary types of transition (standard, communication, failed-to-send) are possible. To check if a standard transition is possible, we may simply iterate through all CTM tapes, checking if any one is currently able to execute a standard transition given the character it is currently reading and its current state; executing the transition takes constant time, and all the information needed can be encoded in the TM simulating our network.

To check if a communication transition is possible, we iterate through all CTM tapes, gathering a list of vertices able to send a character to a vertex. We then iterate through this list, looking at each element's neighborhood in the second tape and seeing if any of its neighbors are willing to receive. If we find one such vertex, we execute the involved steps for a communication transition to the neighbor, and empty the third tape. If the neighbor trying to be communicated with is not in the first tape, it has not been communicated with before; we add its empty tape to our first tape, compute its neighborhood in the second tape and execute the communication if it is able to receive such a message. If it is not able to receive the message, we delete it again. If the sending vertex is trying to send to a neighbor of index larger than its neighborhood's size, the neighbor doesn't exist; we execute a failed-to-send transition.

Producing an initial state given an input string is easy enough, as we merely create a single CTM tape with only the input string inside in the first tape, a list containing only the master node's neighbors in the second tape, and nothing in the third tape. Detecting acceptance or rejection requires us to observe the state of the master CTM; once this is done, we can delete everything but the output tape's contents, delete all characters not in the output alphabet and terminate.
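Putting the pieces together, the top-level simulation loop looks as follows; every helper named here is a hypothetical stand-in for the tape manipulations described above, not a real API.

    # Outline of the simulator's main loop. All helpers are hypothetical
    # placeholders for the tape-scanning procedures described in the text.
    def simulate(tn, input_string):
        config = build_initial_configuration(tn, input_string)  # tapes 1-3
        while not is_final(config):
            fired = (try_standard_transition(tn, config) or
                     try_communication_transition(tn, config))  # incl. failed sends
            if not fired:
                break      # no transition is enabled; computation is stuck
        return read_output(config)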

Given this algorithm outline, we now aim to derive its complexity, beginning with the complexity of a single transition. We first define the following notion, as we will need to factor in the size of vertex names present on our tape:

Definition 13. For any network topology G, the name growth function κ_G : N → N is defined as

κ_G(n) = max(N_G(n))

We also note that the length of the first tape after n transitions with an input string of length x is bounded by O(h(n)), where the function h is defined as h(n) = x + γ_G(n)(κ_G(n) + n). This is because it is bounded by the length of the list multiplied by the size of each element, where each element is a pair of a vertex name and a tape. The vertex name is evidently bounded above by κ_G(n) by definition, and the size of any tape is bounded above by n, the number of steps having occurred so far.

Similarly, the length of the second tape after n transitions is bounded by O(k(n)), where, given our topology's degree is ∆,

k(n) = γ_G(n)(κ_G(n) + ∆κ_G(n + 1))

i.e. the number of vertices seen so far times the maximum size of any given list of neighbors, each of which contains a vertex name and a list of neighbors. This bound is given by the largest possible vertex name plus the largest possible neighbor; any vertex has a constant number of neighbors, so ∆ can be ignored.

If we are simulating the TN T = 〈G, T〉, we note checking for and potentially executing a standard transition after n steps takes time t1(n) = O(h(n)), i.e. the size of the first tape. This bound holds because we merely need to traverse the first tape until we find a vertex able to perform a standard transition and perform it, the latter step taking constant time.

To check for and execute a communication, we must produce the list of all potentially receiving and potentially sending vertices, which takes O(h(n)) time; find any matches between these lists, which takes O((h(n) + k(n))²) time; and update the sender and receiver, which take O(h(n)) time to reach on the first tape and constant time to update. If no match exists, we try communicating with a new vertex or fail a communication. To communicate with a new vertex, for each of the O(γ_G(n)) possibly sending vertices, it takes O(h(n) + k(n)) time to check its neighbors in the second tape, as well as the existence of each such neighbor in the first tape (there are a constant-bounded number of neighbors, so we may pretend there is only one). Then, it will take O(k(n) + h(n)) time to reach the end of both tapes and O(max_{v∈N_G(n+1), m∈N} t_φ(v, m)) to compute this new neighbor's neighbors, where t_φ(v, n) is the time taken to compute φ(v, n), and O(h(n)) time to compute/undo this communication. Finally, O(h(n) + k(n)) time is needed to check for a communication with a nonexistent neighbor, i.e. the time taken to traverse the list of candidate sending vertices and check each one's list of neighbors to see if the candidate neighbor exists.

Adding these all together and removing constant factors, checking for and executing a communication transition takes time

t₂(n) = O( (h(n) + k(n))² + γG(n) max_{v∈NG(n+1), m∈N} tφ(v, m) )

Therefore, the nth transition overall takes time O(t₂(n)), as it evidently dominates O(t₁(n)).

From this, we may now deduce the overall time taken, as constructing the initial state and then the output at the end both take O(x) time for an input string of length x. There will clearly be a maximum of τT(x) transitions, each of which may involve all γG(n) vertices seen so far transitioning. Therefore, the overall time taken to simulate on a 3-tape Turing Machine is

tT(x) = O( x + Σ_{n=1}^{τT(x)} γG(n) t₂(n) )

      = O( x + Σ_{n=1}^{τT(x)} γG(n) ( (h(n) + k(n))² + γG(n) max_{v∈NG(n+1), m∈N} tφ(v, m) ) )

      = O( γG(τT(x)) τT(x) ( (h(τT(x)) + k(τT(x)))² + γG(τT(x)) max_{v∈NG(τT(x)+1), m∈N} tφ(v, m) ) )

The x may be removed at the end as all other terms will dominate it. We may finally expand h(n) + k(n) and notice it reduces to x + γG(n)κG(n + 1). For the sake of simplicity, we also define the function

Φ(n) = max_{v∈NG(n+1), m∈N} tφ(v, m)

which represents the longest time taken to compute a neighbor of any vertex within distance n + 1 of the vertex 1. Finally, we note that a 1-tape Turing Machine can emulate a 3-tape one in quadratic time [23, p. 293], giving us the final expression

tT(x) = O( γG²(τT(x)) τT²(x) ( (x + γG(τT(x)) κG(τT(x) + 1))² + γG(τT(x)) Φ(τT(x)) )² )

(A similar bound for space complexity is given in Appendix A.4, along with some simple upper bounds derived from it.) We may now begin to use this to prove bounds on complexity.

3.3 Upper Bounds

Using the function tT we have just derived for any given Turing Network T, we may begin to observe upper bounds on complexity classes.

Firstly, a straightforward theorem immediately presents itself:


Theorem 4. For any network topology G, with TG(f) denoting the set of all TNs T with this topology such that τT(n) = O(f(n)),

PARTIMEG(f(n)) ⊆ DTIME( max_{T∈TG(f)} tT(n) )

All this theorem says is that if a problem is solvable in a given time on a Turing Network, we can simulate it running in this time on a sequential Turing Machine. This gives us a concrete upper bound on parallel time we can now exploit for more intuitive results.

We now investigate Ppar〈G〉 for the rest of this section, as it constitutes all problems which are tractable given parallelism. Evidently, any TN T taking polynomial time to execute will have a simulation time of

tT(n) = O( γG²(O(n^d)) n^{2d} ( (n + γG(O(n^d)) κG(O(n^d)))² + γG(O(n^d)) Φ(O(n^d)) )² )

In order to simplify our above theorem in this case, we will give a greater deal of scrutiny to the functions κG, γG and tφ, the major parameters we have not made sufficiently concrete. We note that γG(n) = O(2^n) always, as any network topology's degree is constant by definition, greatly simplifying our analysis. Hence,

tT(n) = O( 2^{O(n^d)} n^{2d} ( 2^{O(n^d)} κG(O(n^d))² + 2^{O(n^d)} Φ(O(n^d)) )² )

      = O( 2^{O(n^d)} ( κG(O(n^d))² + Φ(O(n^d)) )² )

Note that the n^{2d} and n terms vanish, as the ignored constant factors in 2^{O(n^d)} allow us to dominate them completely.

If we analyze at this level of granularity, it is evident that we cannot expect a better bound than EXPTIME at this point, owing to the appearance of 2^{O(n^d)} within our expression. To achieve this bound precisely, we now investigate what bounds would be necessary on tφ and κG, the only two remaining properties of G influencing our complexity bound. This will reveal a class of network topologies which are unable to bring anything outside of EXPTIME back to polynomial time through unbounded parallelism.

Evidently, it is a necessary and sufficient condition that κG(αn^d), Φ(βn^d) = O(2^{O(n^d)}) for some α, β ∈ N. The reader should be pleased to know that this requirement is not impossible to satisfy for κ: in fact, every network topology is isomorphic to one with this complexity of κ, as we can simply name the vertices in order of distance from the vertex 1 (see the sketch below). There are only up to O(2^n) vertices of distance n away, so this naming will retain the property κG(n) = O(2^n) for our new topology, and hence κG(αn^d) = O(2^{O(n^d)}).
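A hedged sketch of this renaming on a finite fragment follows: renaming vertices in breadth-first order from vertex 1 is exactly naming by distance. The adjacency dict `adj` and the finiteness of the fragment are assumptions made for demonstration.

```python
# Sketch: rename vertices of a (finite fragment of a) topology in order of
# distance from vertex 1, keeping names within distance n no larger than the
# number of vertices seen, and hence kappa small.
from collections import deque

def rename_by_distance(adj):
    new_name, frontier = {1: 1}, deque([1])
    while frontier:
        v = frontier.popleft()
        for u in adj[v]:
            if u not in new_name:
                new_name[u] = len(new_name) + 1  # next unused name
                frontier.append(u)
    return new_name  # the isomorphism onto the renamed topology

print(rename_by_distance({1: [7, 4], 7: [1, 9], 4: [1], 9: [7]}))
# {1: 1, 7: 2, 4: 3, 9: 4}
```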


The question now remains whether this is possible to satisfy for Φ. Even if every topology is isomorphic to one with a sufficiently small κ, there is no guarantee that the complexity of the orientation will similarly decrease. We note that if max_{m∈N} tφ(v, m) = O(f(v)) for all v ∈ V, then Φ(n) = O(f(κG(n + 1))), i.e. the time taken to compute a neighbor of the vertex with the largest name within distance n + 1 of vertex 1.

In the worst allowable case of κG, where this function is O(2^{n^k}) and thus κG(O(n^d)) = O(2^{O(n^{dk})}), we must then have that f(O(2^{n^d})) = O(2^{n^h}) for an EXPTIME upper bound. Finding the maximal possible function f for this is nontrivial; however, we note that if f(n) = O(2^{(log n)^c}) then

f(2^{n^d}) = O(2^{(log 2^{n^d})^c}) = O(2^{n^{dc}·(log 2)^c}) = O(2^{O(n^k)})

for some constant k. This means that any orientation of quasi-polynomial complexity is sufficiently fast to be acceptable. However, if f(n) = O(2^n) only, then f(2^{n^d}) = O(2^{2^{n^d}}), a bound lying outside of EXPTIME and in 2-EXPTIME instead. Hence, the upper bound on f lies somewhere between quasi-polynomial and exponential complexity.

We now have the following theorem:

Theorem 5. For any network topology G such that κG(n) = O(2^{n^d}) and tφ(v, n) = O(2^{(log v)^c}),

Ppar〈G〉 ⊆ EXPTIME

Fundamentally, this theorem shows that in order to make a problem beyond EXPTIME tractable, a network topology must implicitly contain the results of substantially difficult computations in its structure.

We have now shown a strong upper bound on a large class of network topologies. Examples of network topologies which satisfy this include both the binary tree B and the path in our initial example. This suggests, for instance, that Presburger arithmetic cannot be made tractable by parallelism across a binary tree, as it is not within EXPTIME [24]. We identify some simple network topologies that satisfy this upper bound in Appendix A.5, including grids, all finite topologies and all constant-degree trees.

It is a desirable goal to find a PSPACE upper bound such that, coupled with our PSPACE lower bound in Section 3.1, we could find a class of network topologies in the second machine class. An attempt at this is given in Appendix A.4, which unfortunately fails to maintain our PSPACE lower bound in the process. Resolving this is left as future work.


4 Evaluation

After having considered this model's complexity theory in depth, we now take the time to return to the goals we originally posed for Turing Networks to fulfill in the project outline. Our first requirement is trivially met; our model is literally such a system. Our second requirement has been achieved throughout Chapter 3, showing we can define meaningful complexity classes and deduce substantial complexity bounds for a wide range of network topologies, especially for polynomial-time parallel problems. We claim our third requirement is sated similarly, as the bounds we have found constrain tractable parallel problems between two complexity classes for Turing Machines. We have also provided an embedding of Turing Networks in Turing Machines in the process; the reverse simulation is trivial.

Our fourth requirement, however, has been left unconsidered thus far; we have not yet simulated other well-known models of parallelism with our own, a feat that would cement the relevance of our model by ensuring it can enact the behaviors that make other models so useful. It is now time we produce such simulations, namely of BSP and Boolean circuits; with these in place, we will have a sound case for the utility of Turing Networks.

4.1 Simulating BSP

BSP has found widespread use as a model for practical parallel computation [1], [13], [12], as it presents a useful and adaptable model for synchronization. To show our model for parallelism encapsulates other practical approaches, it is relevant for us to take BSP as a case study for simulation. We will show our model can simulate the two major traits given in [13] that identify BSP:

1. Global Supersteps: "The execution of a program proceeds in supersteps, in which concurrent components (normally called processes) compute asynchronously, with a global synchronisation at the end of each superstep." [13, p. 1]

2. Global Communication: "Processes communicate with one another asynchronously. All communication operations take effect at the synchronisation point." [13, p. 1]

We note that the major hurdle will be to emulate these global operations with local ones, as our model does not support global synchronization as a primitive. We will assume that local communication is sufficient to emulate global communication, as messages can be transmitted across a network. Issues here can be relegated to network topology design.

Hence, we will attempt to emulate a system which performs local computations, awaits a global synchronization, transmits messages and then repeats ad infinitum, which we claim to be a sufficient simulation of the BSP model. In order to perform a global synchronization, we will place the master vertex in charge of coordination; it will signal for a global superstep to end, and it will decide when the next one begins. Broadcasts of these decisions will have to be designed such that a vertex done synchronizing does not interfere with those still doing so. We will assume the existence of special characters that will represent four special messages we will name SYNC, FINISH, ACK and NEXTSTEP. A superstep will proceed as follows (a sketch of the barrier phase in code follows the list):

1. All vertices perform local computations, not sending messages.

2. The master vertex finishes its computations and sends SYNC to its neighbors, along with whatever data it wished to send to them.

3. Every slave, when done with its local computations, enters a waiting state, ready to receive SYNC from any neighbor. Once it receives one, it records the data this neighbor sent (possibly nothing), and enters a second state.

4. In this second state, each vertex now tries to send SYNC to all neighbors, along with whatever data it wishes to send them (possibly nothing). It also waits for such messages from all neighbors it has not received one from yet, recording data sent (possibly nothing). If a neighbor still in the slave starting state receives SYNC, it ignores it if no data is sent and enters the same waiting state otherwise.

5. Once every neighbor has been sent SYNC and has sent one as well, the vertex enters a second waiting state.

6. The master, on reaching this state, sends FINISH to every neighbor.

7. This message is broadcast by all waiting vertices to produce a spanning tree; if a vertex has already received a FINISH message, it will reject all others and try to send FINISH to all neighbors similarly. All vertices who are done doing this wait again. A vertex in the slave starting state rejects such messages.

8. When no neighbor accepts a vertex's FINISH message, it knows itself to be a leaf. It sends ACK back to its parent; after all its children send it ACK, a non-leaf sends ACK to its parent. When the master receives ACK from all neighbors, all vertices are ready for the next superstep.

9. The master sends NEXTSTEP to all children in the spanning tree, which is broadcast down the tree recursively. When a vertex has sent NEXTSTEP to all its children, it returns to step 1.
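The heart of steps 6 to 9 is a broadcast-convergecast barrier. Below is a hedged sketch simulating just that phase on an arbitrary finite graph; the adjacency dict and the centralized queue are simulation conveniences, not part of the TN model, where each vertex acts only on its own messages.

```python
# Sketch of the FINISH/ACK/NEXTSTEP barrier (steps 6-9), simulated centrally.
from collections import deque

def barrier(adj, master=1):
    """Build the FINISH spanning tree from the master, convergecast ACKs,
    then broadcast NEXTSTEP; returns the order vertices resume step 1."""
    parent, children = {master: None}, {v: [] for v in adj}
    queue, bfs_order = deque([master]), []
    while queue:                          # FINISH broadcast: the first FINISH
        v = queue.popleft()               # accepted fixes the parent edge
        bfs_order.append(v)
        for u in adj[v]:
            if u not in parent:
                parent[u] = v
                children[v].append(u)
                queue.append(u)
    acked = set()                         # ACK convergecast: leaves answer
    for v in reversed(bfs_order):         # first, so walk reverse BFS order
        assert all(u in acked for u in children[v])
        acked.add(v)                      # v now holds ACKs from all children
    return bfs_order                      # NEXTSTEP broadcast order

ring = {1: [2, 5], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 1]}
print(barrier(ring))                      # [1, 2, 5, 3, 4]
```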

This algorithm works since the master vertex will not decide the superstep ends until all vertices have told it so in step 8. Hence, all vertices will be in the same state between steps 8 and 9, having sent all communications for the superstep and ready to start the next. Vertices can then freely begin the next superstep, knowing their neighbors are either in this superstep or waiting to begin it by sending all NEXTSTEP messages they need to. Vertices in the latter case importantly won't accept SYNC messages yet, meaning they will definitely execute the next superstep's local computations.

The complexity of one superstep is proportional to the longest path in the currently visible vertices times the length of the longest message sent, which after n supersteps and message length x is O(nx). This is far longer than the constant time taken in BSP, but we argue this to be irrelevant; as shown by the LOCAL model [11], locality is central to distributed systems, which are the most realistic model for a system that can grow to suit larger problems. Hence, constant-time global operations are unrealistic for a variable-sized system and a constant factor to ignore for one of fixed scale. Regardless, our simulation is polynomial-time and constant-space per vertex, showing we can simulate practical models of parallelism with identical restrictions to the Invariance Thesis. We could thus conclude our model to be 'reasonable' with respect to well-known models like BSP.

4.2 Simulating Boolean Circuits

It is known that Boolean circuits are an approximate simulation of parallelism [15, p. 118]; hence classes like NC, the class of all problems solvable by a Boolean circuit of polynomial size and polylogarithmic depth [8, p. 45], are of interest to parallel complexity theory. It is in our best interest to establish this connection to parallelism through our own model, to show we can approximate parallelism in a similar manner.

Unfortunately, we quickly arrive at an impasse: our model requires the input to be present only at the master processor, whereas Boolean circuits assume each bit is available to a different logic gate immediately. We will find the linear-time complexity of transferring each bit of the input to a different processor to be insurmountable, and must therefore find a workaround in our complexity analysis.

A further issue arises since Boolean circuits are a model of nonuniform computation [15, p. 118], meaning different inputs are processed by different kinds of processors. In the context of Boolean circuits, this is because a given circuit can only accept a fixed input size, so an infinite family of circuits is required to deal with arbitrary input lengths. The proof given in [15] that Boolean circuits represent parallel computations assumes a similar kind of nonuniformity for parallel models as well, also assuming input is distributed across the network at the start.

To tackle both of these issues, we may choose some prefix of a derivation sequence to be the setup computation, where we send data to the correct positions to emulate some other model of parallelism that would have these at the required locations immediately. We can then subtract the time taken to do this from our time function, meaning many of our definitions can be reused. Of course, this will require the desired initial state to be reachable from our standard one, which is reasonable as an unreachable initial state would not arise in any constructible system.

Ignoring the setup computation in this way is reasonable, as a Boolean circuit, in order to be used in any realistic setting, would have to be constructed itself and its input appropriately provided before it could compute anything. Our model merely makes this requirement explicit, so we must label these computations appropriately. This reflects the proof provided in [15], which makes clear that these setup operations must be completed beforehand.

We now emulate a family of Boolean circuits {Cn : n ∈ N}, which computes a function f : {0, 1}* → {0, 1}*, with a single TN. We first define our topology, which will be an infinite rooted binary tree with root vertex 1, with extra augmentations. These augmentations will be added to each layer of the binary tree, meaning each set of vertices equidistant from vertex 1.

We wish to, for circuit Cn, reach the first tree layer of size larger than this circuit's size, then have the vertices in this layer each emulate a logic gate in the circuit, transmitting values between one another until the result is computed. This result will then be propagated back up to the master vertex. The augmentations we will add to each layer will comprise the interconnections necessary to transmit messages between all the vertices in a given layer in O(log² n) time. We will assume all gates have a constant-bounded fan-in, meaning no processor will receive more than a constant number of messages overall. This will be essential in our design of the interconnection network, owing to our requirement that all network topologies have constant degree.

We begin by noting we can derive a notion of 'left' and 'right' child of any vertex in our binary tree from our orientation. By doing this, any layer of size n can be implicitly indexed from 1 to n by left-to-right ordering. Henceforth, we assume such an indexing is in place for any layer of any tree.

For any given layer of the binary tree, say of distance log₂(n) to the root and thus of size n, we add to each vertex in the layer two new rooted binary trees with said vertex as their roots, each of whose bottom layers is of size n themselves. For vertex i, one of its trees will be titled the ith sending tree, and the other the ith receiving tree. We now add one more edge to each vertex in the bottom layer of each sending/receiving tree: for any 1 ≤ i, j ≤ n, add an edge between vertex j of the bottom layer of the ith sending tree and vertex i of the bottom layer of the jth receiving tree.

We now give an algorithm to send a character from vertex i to j:

1. Send the character, along with the sender and intended recipient in binary (thus of size at most 2 log₂(n) + 1), down the sending tree, navigating to the jth vertex on its bottom layer.

2. Send the message from this vertex to the receiving tree vertex it is connected to, i.e. vertex i at the bottom layer of the jth receiving tree.

3. Send the message up the receiving tree until it reaches vertex j.

Overall, 2 log₂(n) vertices propagate a message of length 2 log₂(n) + 1, resulting in O(log² n) time to send one message. As no two vertices have the same sending tree nor connect to the same place on a receiving tree, all n vertices can enact steps 1 and 2 in parallel. Step 3 may lead to congestion as several messages propagate down the same vertex's receiving tree, but as we are assuming a constant bound on how many messages are sent to the same vertex, this leads only to a constant amount of extra time. Hence, the whole process takes O(log² n) time; a sketch of the route in code follows.
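The following hedged sketch traces the route of one message through these trees. The (tree, index, bit-path) vertex names are purely illustrative, since the actual topology names vertices with natural numbers, and the power-of-two layer size is an assumption made to keep the example short.

```python
import math

# Sketch: the vertex sequence for a message from layer-vertex i to
# layer-vertex j in a layer of size n (assumed a power of two here).
def route(i, j, n):
    depth = int(math.log2(n))
    bits_j = format(j - 1, f"0{depth}b")       # leaf addresses as bit paths
    bits_i = format(i - 1, f"0{depth}b")
    path = [("layer", i)]
    for k in range(1, depth + 1):              # 1. descend i-th sending tree
        path.append(("send", i, bits_j[:k]))   #    towards its j-th leaf
    path.append(("recv", j, bits_i))           # 2. the single cross edge
    for k in range(depth - 1, 0, -1):          # 3. ascend j-th receiving tree
        path.append(("recv", j, bits_i[:k]))
    path.append(("layer", j))
    return path

# 2*depth + 2 = O(log n) hops carrying O(log n)-length headers: O(log^2 n).
print(len(route(2, 5, 8)))  # 8 hops for a layer of size n = 8
```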

We name this entire network topology B. Clearly it has constant degree and 1 is in it, namely its root; as it is a countably infinite union of finite-sized layers, vertices can be made natural numbers. Thus, it is a valid topology.

Figure 4.1: Diagram of B, with one layer of size n magnified. Note the send and receive trees Sᵢ and Rᵢ, connected at their leaves.

Now we provide a description of our simulation, starting with what we will define to be the setup computation. To begin, we compute the length n of the input provided at the master vertex and then compute the appropriate circuit description Cn, taking polynomial time. We then compute Cn's length N, writing it in base 2, and then write down the length of this expression plus 1; this gives ⌈log₂ N⌉, the layer to which we will send our circuit. We then broadcast our circuit description along with this number down the binary tree; each node sends it further down and subtracts one from the number unless it is 1, in which case it is not propagated further. In these propagated messages, we also include an extra number 0; when the message is sent to the left child, an extra 0 is added, otherwise an extra 1. By this number in the message, each vertex is indexed from left to right in the layer.

We can now send the input down, which we pad with enough 0's so it is of length 2^⌈log₂ N⌉, the size of the chosen tree layer. As we send this message down the tree, each vertex splits it down the middle, the lower half sent to the left child and the upper half sent to the right (a sketch of this scattering follows). These are then received by the source vertices of the Boolean circuit¹; non-source vertices await a 0 and discard it. Meanwhile, each vertex in the layer discards the entire circuit description except for the gate corresponding to its index. Each vertex knows where to send data to and receive data from by these indices as well. After this, each vertex sends a message to the vertex indexed as 1, which will then send acknowledgements to all non-source vertices to declare that computation is allowed to begin.
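A hedged sketch of the halving step: the padded string is split recursively so that the kth leaf of the layer receives exactly the kth bit. Returning a dict keyed by bit-paths is a simulation convenience, standing in for the left/right messages sent down the tree.

```python
# Sketch: recursively split the padded input so leaf k receives bit k.
def scatter(bits, path=()):
    if len(bits) == 1:
        return {path: bits}                        # a leaf holds one bit
    mid = len(bits) // 2
    return {**scatter(bits[:mid], path + (0,)),    # lower half to the left
            **scatter(bits[mid:], path + (1,))}    # upper half to the right

print(scatter("1011"))
# {(0, 0): '1', (0, 1): '0', (1, 0): '1', (1, 1): '1'}
```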

This marks the end of the setup computation. The vertex indexed as 1 now sends an acknowledgement to the first source vertex, which sends two acknowledgements to source vertices 2 and 3, and so forth like a binary tree, meaning all source vertices are activated in polylogarithmic time. Now, each vertex awaits all its inputs, computes the gate operation on them in constant time, and propagates the output to the destination until finally reaching the destination vertices. This whole process takes O(f(N) log² N) time due to the extra time spent sending messages, where the circuit takes O(f(N)) time. The destination vertices then propagate the output back up the tree, the message being pieced back together in the inverse way the source was split up, finally arriving at the master vertex again in O(m + m log(N − m)) time for output of length m. Hence, the time taken by our simulation, minus the setup computation, is

τT(n) = O( f(N) log²(N) + m(1 + log(N − m)) )

We note also that the setup computation time is a linearithmic function of the complexity of computing the circuit description; for NC, this means polynomial. We then gain the following theorem, since in NC we have N = O(n^d), f(n) = O(log^c n) and m = 1:

Theorem 6. Any language in NC is decidable by a TN in polylogarithmic time, with the setup computation taking polynomial time.

As computing a circuit from NC takes polynomial time already, this confirms that Turing Networks are capable of simulating Boolean circuits efficiently, again within identical² bounds to the Invariance Thesis. This verifies our fourth requirement and shows Turing Networks can simulate both powerful theoretical and practical models of parallelism, confirming its own capacity on both fronts and demonstrating its range of applicability.

¹ We assume the source vertices are at the start of the circuit description.
² We claim our simulation is constant-space too; we leave the proof as future work.


5 Conclusion

In conclusion, we have managed to produce a model of message-passing parallelism which connects meaningfully to the sequential theory of Turing Machines, giving strict bounds on what improvements can be gained by moving from sequential algorithms to parallel message-passing ones. The model was also shown to be able to simulate both BSP and Boolean circuits, demonstrating it is able to efficiently encapsulate the behaviors of other well-known models of parallelism and thus transfer meaningful complexity results over to them. Thus, Turing Networks have been shown to be a useful and significant model for parallel complexity theory.

A number of limitations of this model are to be noted, namely that it assumed both a static network and one of constant degree. In modern distributed systems, for instance, dynamic networks are possible due to unreliable connections, as investigated by models like the π-calculus [25]. Hence, a more advanced model should take this into account. This could have been achieved by having the network topology gain or lose edges on each derivation, but this is beyond the scope of this project. Also, the requirement of a constant-degree network could be lifted by a model that writes down an identifier for the neighbor it wishes to communicate with, but this was also considered beyond the scope of this project.

Another issue is that only time complexity was investigated, and nothing with respect to space or network usage. A full investigation of any such metric for complexity, however, could make for a dissertation in its own right, and is left as future work.

More future work is to be found in classifying network topologies by the complexity bounds they introduce; such an explicit capacity to analyze individual topologies is a strength of this model. In particular, finding the classes of topology that fit into the first and second machine classes is a particularly interesting problem to investigate. Identifying these classes exactly is beyond the scope of this project.

Overall, this project has achieved its original goals, and it is hoped that the Turing Network model may find use in the future as a powerful tool for message-passing parallel complexity theory.


A Appendix

A.1 Network Topologies

This section contains miscellaneous information about network topologies.

Theorem 7. The collection of all network topologies, Ξ, is a set. Furthermore, it is uncountable.

To see this, note that every set of vertices is a subset of the natural numbers, and every set of edges is a subset of N². An orientation function is also always a subset of N³. Hence, overall, we can produce an injection from Ξ to P(N⁶), making this collection a set.

To show Ξ is uncountable, we can produce an injection from P(N \ {1}) to Ξ by mapping X ⊆ N \ {1} to a linked list whose vertices are 1 and the elements of X in ascending order.

We present a definition for network topology isomorphism below:

Definition 14. Network topologies G = 〈G, φG〉 and H = 〈H, φH〉 are isomorphic, written G ≅ H, iff there exists a bijection f : VG → VH such that for all v₁, v₂ ∈ VG, m ∈ N,

f(1) = 1   (A.1)
v₁ EG v₂ ⟺ f(v₁) EH f(v₂)   (A.2)
f(φG(v₁, m)) = φH(f(v₁), m)   (A.3)

We require (A.1) so that the master vertex does not change location in the graph, as this affects computations substantially. (A.2) simply requires this to be a standard graph isomorphism; recall that EG and EH, the edge relations of G and H, are indeed relations. (A.3) identifies the orientations with one another so the orientation remains the same structurally. Note that we do not need to ensure both orientations are defined on these inputs, as (A.2) and the definition of an orientation ensure this to be the case.
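For finite topologies, conditions (A.1)-(A.3) can be checked directly. The sketch below assumes edges are given as sets of ordered pairs and orientations as dicts from (vertex, m) to the mth neighbor, an encoding chosen only for this example.

```python
# Sketch: verify that a candidate bijection f (a dict) satisfies (A.1)-(A.3)
# on finite topologies. EG/EH are sets of ordered pairs; phiG/phiH map
# (vertex, m) to the m-th neighbour.
def is_isomorphism(f, EG, EH, phiG, phiH):
    if f.get(1) != 1:                                    # (A.1) master fixed
        return False
    if {(f[u], f[v]) for (u, v) in EG} != set(EH):       # (A.2) graph iso
        return False
    return all(f[w] == phiH[(f[v], m)]                   # (A.3) orientations
               for (v, m), w in phiG.items())            #       agree under f
```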


Theorem 8. Network topology isomorphism is an equivalence relation on Ξ. Furthermore, Ξ/≅ is uncountable.

We do not give a proof of ≅ being an equivalence relation, as it is routine. To see the second point, note that we can construct an injection (0, 1) → Ξ/≅ by, for x ∈ (0, 1), converting x to its binary representation and creating a linked list graph. We encode x in the orientation; namely, if vₙ is the nth vertex to the right, we require φ(vₙ, 1) = vₙ₋₁ if x's nth decimal place in binary is 0, and vₙ₊₁ if it is 1. As no two such graphs corresponding to different numbers will be isomorphic, the set Ξ/≅ is uncountable.

This means that there are an uncountable number of 'unique' topologies since, as we saw from Theorem 1 in Chapter 3, two isomorphic network topologies will yield identical complexity classes. We can expect this for any measure of complexity other than time as well; a TN cannot access the names of its vertices, so there is no way for it to tell which topology in an equivalence class of ≅ it is operating on.

We briefly present the construction of a Cartesian product of two network topologies, an extension of the Cartesian product of two graphs [26, p. 9], to be used in Appendix A.5:

Definition 15. The Cartesian product of two network topologies G, H, written as G □ H, is a network topology 〈G, φ〉 such that G = 〈V, E〉 and

v_ij = ½(i + j)(i + j + 1) + j   (A.4)
V = { v_ij | i ∈ VG, j ∈ VH }   (A.5)
v_ij E v_kl ⟺ (i EG k ∧ j = l) ∨ (j EH l ∧ i = k)   (A.6)
φ(v_ij, 2n) = v_kl ⟺ (j = l ∧ φG(i, n) = k)   (A.7)
φ(v_ij, 2n + 1) = v_kl ⟺ (i = k ∧ φH(j, n) = l)   (A.8)

(A.4) is simply the Cantor pairing function, here used to injectively map each pair of vertices to a unique new vertex, as in (A.5). The edge relation is standard; note that in the orientation in (A.7) and (A.8), even-numbered neighbors are now neighbors as per G, and odd-numbered ones are neighbors as per H. Notably, this choice means G □ H ≇ H □ G in general, a potentially undesirable trait of our encoding. Fixing this is left as future work.
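The sketch below renders (A.4), (A.7) and (A.8) as code; `phiG` and `phiH` are assumed to be callables on (vertex, neighbor index), and the even/odd interleaving follows the definition directly.

```python
# Sketch of Definition 15 in code.
def pair(i, j):
    """(A.4): the Cantor pairing function naming the product vertex v_ij."""
    return (i + j) * (i + j + 1) // 2 + j

def product_phi(phiG, phiH):
    """(A.7)/(A.8): even-numbered neighbours move along G, odd along H."""
    def phi(i, j, m):
        if m % 2 == 0:                        # m = 2n: step in G, keep j
            return pair(phiG(i, m // 2), j)
        return pair(i, phiH(j, m // 2))       # m = 2n + 1: keep i, step in H
    return phi
```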

Our definition does, however, have reasonable complexity in the following sense:


Theorem 9. For any network topologies G, H,

γ_{G□H}(n) = O(γG(n) γH(n))
κ_{G□H}(n) = O((κG(n) + κH(n))²)
t_{φ_{G□H}}(v_ij, n) = O(max(t_{φG}(i, ⌊n/2⌋), t_{φH}(j, ⌊n/2⌋)))

Seeing each of these to be true is a routine application of the definitions. This means we do not suddenly gain, for instance, exponential complexity by taking a Cartesian product of two topologies, which we should expect.

A.2 Computing Undecidable Problems

Below is given a proof that any decision problem, here stated as the characteristic function of a subset of the natural numbers, can be computed by some Turing Network. This shows that, given arbitrary network topologies, even undecidable problems can be solved by a Turing Network, indicating we must restrict our networks to those that are computable or even tractable.

Theorem 10. Any decision problem g : N → {0, 1} can be computed by some TN in O(n²) time.

To prove this, consider some such function g. We use the same graph G as we used in our example in Section 2.1.1, namely a path of successive vertices starting with 1. However, our orientation φ is now different:

φ(1, 1) = 2
φ(n + 1, 1) = n + 2g(n)
φ(n + 1, 2) = n + 2 − 2g(n)

This means neighbor 1 of vertex n will be n − 1 if g(n − 1) = 0, and n + 1 otherwise, with the reverse holding for neighbor 2. To exploit this, we create a TN which broadcasts characters to every neighbor recursively, counting down until it reaches the node n steps away, in a similar fashion to our earlier example. Messages sent back to a predecessor are ignored, which is necessary as we no longer know which neighbor is the successor by the orientation alone. When this node is reached, it sends a special character to neighbor 1; if this neighbor has sent it a message, neighbor 1 is the predecessor, and this fact is broadcast until it reaches the master node. The inverse case is handled dually, and we have computed g(n) in quadratic time by virtue of sending length-n messages n times.
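To see how the orientation smuggles g into the structure, the following hedged sketch builds φ from a given g and recovers one value by a single neighbor query. Direct access to the whole function is a simulation shortcut: the TN of the proof can only probe φ through message passing, which is what costs quadratic time.

```python
# Sketch: the orientation of Theorem 10, built from a decision problem g.
def make_phi(g):
    def phi(v, m):
        if v == 1:
            return 2 if m == 1 else None      # vertex 1 has one neighbour
        n = v - 1                             # write v = n + 1
        return n + 2 * g(n) if m == 1 else n + 2 - 2 * g(n)
    return phi

g = lambda n: int(n % 3 == 0)                 # stand-in decision problem
phi = make_phi(g)
print(int(phi(5, 1) == 6))                    # neighbour 1 of vertex 5 is the
                                              # successor iff g(4) = 1; prints 0
```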


A.3 Turing Machines

We include here the definitions of standard Turing Machines we assume throughout this project.

Definition 16. A Turing Machine is a tuple

T = 〈Q, Σ, Γ, q₀, h_a, h_r, δ〉

where Q, Σ are nonempty finite sets, Σ ∪ {Λ} ⊆ Γ, q₀, h_a, h_r ∈ Q, and δ : Q × Γ → Q × Γ × {L, R, S}.

We say that Q is the set of states, Σ is the input alphabet, Γ is the tape alphabet, q₀ is the initial state, h_a is the accepting state, h_r is the rejecting state and δ is the transition function.

Our definition as above is equivalent to the standard one given in [27, p. 319], albeit differing in that the blank symbol Λ is implicitly in Γ, there is only one accepting state and there is an explicit rejecting state. Transforming between these two definitions is trivial.

We will henceforth abbreviate Turing Machine as TM. We will also assume TMs have an infinite tape in the right direction only, with the read-write head starting at the leftmost position on the tape.

We now define a configuration of a TM, i.e. a 'snapshot' of the tape contents, the read-write head's position on the tape, and the current state of the TM. This is represented as a list of the tape contents to the left of the head, the character just below the head, the current state, and the contents to the right of the head that have been visited by the head before.

Definition 17. A configuration of a TM T is a tuple C ∈ Γ* × Γ × Q × Γ*.

We say T transitions from one configuration C₁ = 〈r₁, s₁, q₁, t₁〉 to C₂ = 〈r₂, s₂, q₂, t₂〉, written C₁ ⊢ C₂, iff δ(q₁, s₁) = 〈q₂, s, X〉 and

X = S ⇒ r₁ = r₂, s = s₂, t₁ = t₂
X = L ⇒ r₁ = r₂s₂, t₂ = st₁
X = R ⇒ r₂ = r₁s, t₁ = s₂t₂

The definition of a transition captures the concept of a read-write head reading a character, transitioning to a new state from its old one, writing a new character and moving left/right one step or staying still.
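Definition 17 translates directly into a one-step evaluator, sketched below. The string encoding of the two tape halves is an assumption made for readability, the blank Λ is produced when the head moves right past the visited region, and we assume a well-formed machine never moves left at the leftmost cell.

```python
# Sketch: one transition of Definition 17. A configuration is
# (left, symbol, state, right), with strings for the two tape halves.
def step(delta, config):
    left, sym, state, right = config
    new_state, s, move = delta[(state, sym)]
    if move == "S":                      # stay: only the symbol changes
        return (left, s, new_state, right)
    if move == "L":                      # r1 = r2 s2, t2 = s t1
        return (left[:-1], left[-1], new_state, s + right)
    head, rest = (right[0], right[1:]) if right else ("Λ", "")
    return (left + s, head, new_state, rest)   # R: r2 = r1 s, t1 = s2 t2
```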


A.4 Space Complexity

We continue the calculations in Section 3.2 to produce the space complexity of our simulation of a TN T. The reader should familiarize themselves with that section if they have not already.

The space required to simulate our 3-tape TM on a one-tape one is bounded by the sum of all tape sizes, ignoring some constant factor. We know the first two tapes are of size O(h(n)) and O(k(n)). At any point, our third tape contains the list of sending/receiving tapes or the intermediate state necessary to compute a vertex's neighborhood. Hence, the maximum space needed for transition n is s(n) = O(h(n) + k(n) + Φs(n)). Here, we have defined a new function Φs as

Φs(n) = max_{v∈NG(n+1), m∈N} sφ(v, m)

where sφ is the space complexity of our orientation function. Hence, Φs is the most space needed to compute any neighbor of any vertex within n + 1 steps from vertex 1. To ascertain the overall space complexity, note that this complexity is simply that of the most space needed by any transition. Thus, the overall space required is

sT(x) = O( h(τT(x)) + k(τT(x)) + Φs(τT(x)) )
      = O( x + γG(τT(x)) κG(τT(x) + 1) + Φs(τT(x)) )

We now gain a simple theorem for space complexity:

Theorem 11. For any network topology G and function f : N₀ → N₀,

PARTIMEG(f(n)) ⊆ DSPACE( n + γG(f(n)) κG(f(n) + 1) + Φs(f(n)) )

In particular, we may test this bound with Ppar〈G〉:

Ppar〈G〉 ⊆ DSPACE( n + γG(O(n^d)) κG(O(n^d)) + Φs(O(n^d)) )

We note that this requires γG, κG and Φs to all be within O(n^d), meaning this must hold for sφ as well since Φs(n) = O(κG(sφ(n))). However, if γG(n) = O(n^d), note that we lack our condition placing PSPACE within Ppar〈G〉 from Section 3.1. We obtain the following theorem:

Theorem 12. For any network topology G such that γG, κG = O(n^d) and φ ∈ PSPACE,

Ppar〈G〉 ⊆ PSPACE


A.5 Upper Bounds on Network Topologies

Continuing from Section 3.3, we present some simple laws that let us derive network topologies that satisfy Theorem 5. We first define this set of topologies:

T_EXP = { G | κG(n) = O(2^{n^d}), tφ(v, n) = O(2^{(log v)^c}), d, c ∈ N }

We now present some basic network topologies and constructions that preserve membership in this set.

Theorem 13. Any finite network topology is in T_EXP.

Clearly κG and thus γG are in O(1) if G is finite. To see why this holds for the orientation, note that the entire orientation function can be represented as a finite set of ordered 3-tuples, each containing two arguments and their result. This can be seen as a finite string, and for any finite string, there exists a Turing Machine that only writes that string out in constant time and does nothing else. Hence, tφ(v, n) = O(1).

Theorem 14. Any rooted tree topology of constant degree is in T_EXP.

Here, we define constant-degree rooted tree topologies similarly to the binary tree topology B, but with n children rather than just 2. This includes the path topology we investigated in Section 2.1.1, a tree where each vertex has one child. It is routine to check that this theorem holds.

Theorem 15. G, H ∈ T_EXP ⇒ G □ H ∈ T_EXP

To see this, simply refer to Theorem 9 in Appendix A.1 and apply the constraints on T_EXP.

This shows, for instance, that the grid topology, simply the Cartesian product of two path topologies, is in T_EXP.


Bibliography

[1] D. Skillicorn, J. M. D. Hill and W. F. McColl, 'Questions and answers about BSP', vol. 6, pp. 249–274, Jan. 1997. DOI: 10.1155/1997/532130.

[2] D. Culler, R. Karp, D. Patterson, A. Sahay, K. E. Schauser, E. Santos, R. Subramonian and T. von Eicken, 'LogP: Towards a realistic model of parallel computation', in Proceedings of the Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, ser. PPOPP '93, San Diego, California, USA: ACM, 1993, pp. 1–12. DOI: 10.1145/155332.155333.

[3] J. Wiedermann, Parallel Turing machines. Department of Computer Science, University of Utrecht, The Netherlands, 1984.

[4] P. Qu, J. Yan, Y.-H. Zhang and G. R. Gao, 'Parallel Turing machine, a proposal', Journal of Computer Science and Technology, vol. 32, no. 2, pp. 269–285, Mar. 2017. DOI: 10.1007/s11390-017-1721-3.

[5] J. Dean and S. Ghemawat, 'MapReduce: Simplified data processing on large clusters', Commun. ACM, vol. 51, no. 1, pp. 107–113. DOI: 10.1145/1327452.1327492.

[6] D. Peleg, Distributed Computing: A Locality-sensitive Approach. Society for Industrial and Applied Mathematics, 2000. DOI: 10.1137/1.9780898719772.

[7] S. Fortune and J. Wyllie, 'Parallelism in random access machines', in Proceedings of the Tenth Annual ACM Symposium on Theory of Computing, ser. STOC '78, ACM, 1978, pp. 114–118. DOI: 10.1145/800133.804339.

[8] R. Greenlaw, H. J. Hoover and W. L. Ruzzo, Limits to Parallel Computation: P-completeness Theory. Oxford University Press, Inc., 1995, ISBN: 0-19-508591-4.

[9] V. Eijkhout, with R. van de Geijn and E. Chow, Introduction to High Performance Scientific Computing. lulu.com, 2011. DOI: 10.5281/zenodo.49897.

[10] G. E. Blelloch and B. M. Maggs, 'Algorithms and theory of computation handbook', in, M. J. Atallah and M. Blanton, Eds., Chapman & Hall/CRC, 2010, ch. Parallel Algorithms, pp. 25–25, ISBN: 978-1-58488-820-8.

[11] P. Fraigniaud, A. Korman and D. Peleg, 'Towards a complexity theory for local distributed computing', J. ACM, vol. 60, no. 5, 35:1–35:26. DOI: 10.1145/2499228.

[12] M. W. Goudreau, J. M. D. Hill, K. Lang, B. McColl, S. B. Rao, D. Stefanescu, T. Suel and T. Tsantilas, 'A proposal for the BSP worldwide standard library (preliminary version)', Feb. 1997. [Online]. Available: http://www.bsp-worldwide.org/standard/stand2.htm.

[13] H. Jifeng, Q. Miller and L. Chen, 'Algebraic laws for BSP programming', in Euro-Par'96 Parallel Processing, L. Bougé, P. Fraigniaud, A. Mignotte and Y. Robert, Eds., vol. 2, Springer Berlin Heidelberg, 1996, pp. 359–368. DOI: 10.1007/BFb0024724.

[14] L. G. Valiant, 'A bridging model for multi-core computing', Journal of Computer and System Sciences, vol. 77, no. 1, pp. 154–166, 2011. DOI: 10.1016/j.jcss.2010.06.012.

[15] S. Arora and B. Barak, Computational Complexity: A Modern Approach, ser. IT Pro. Cambridge University Press, 2009. DOI: 10.1017/CBO9780511804090.

[16] P. van Emde Boas, 'Machine models and simulation', in Handbook of Theoretical Computer Science, J. van Leeuwen, Ed., vol. A, Elsevier, 1990, ch. 1, pp. 1–66.

[17] W. J. Savitch and M. J. Stimson, 'Time bounded random access machines with parallel processing', J. ACM, vol. 26, no. 1, pp. 103–118, Jan. 1979. DOI: 10.1145/322108.322119.

[18] I. Parberry, 'Parallel speedup of sequential machines: A defense of parallel computation thesis', SIGACT News, vol. 18, no. 1, pp. 54–67, Mar. 1986. DOI: 10.1145/8312.8317.

[19] N. Blum, 'A note on the parallel computation thesis', Information Processing Letters, vol. 17, no. 4, pp. 203–205, 1983. DOI: 10.1016/0020-0190(83)90041-8.

[20] P. Fraigniaud, A. Korman and D. Peleg, 'Local distributed decision', in Proceedings of the 2011 IEEE 52nd Annual Symposium on Foundations of Computer Science, ser. FOCS '11, IEEE Computer Society, 2011, pp. 708–717. DOI: 10.1109/FOCS.2011.17.

[21] N. Dershowitz and E. Falkovich-Derzhavetz, 'On the parallel computation thesis', Logic Journal of the IGPL, vol. 24, no. 3, pp. 346–374, 2016. DOI: 10.1093/jigpal/jzw008.

[22] 'A note on the size of prenex normal forms', Inf. Process. Lett., vol. 116, no. 7, pp. 443–446, Jul. 2016. DOI: 10.1016/j.ipl.2016.03.005.

[23] J. Hartmanis and R. E. Stearns, 'On the computational complexity of algorithms', Transactions of the American Mathematical Society, vol. 117, pp. 285–306, 1965. DOI: 10.1090/S0002-9947-1965-0170805-7.

[24] M. J. Fischer and M. O. Rabin, 'Super-exponential complexity of Presburger arithmetic', in Quantifier Elimination and Cylindrical Algebraic Decomposition, B. F. Caviness and J. R. Johnson, Eds., Springer Vienna, 1998, pp. 122–135. DOI: 10.1016/j.jsc.2015.11.008.

[25] R. Milner, Communicating and Mobile Systems: The π-calculus. Cambridge University Press, 1999, ISBN: 0-521-65869-1.

[26] J. Bang-Jensen and G. Z. Gutin, Digraphs: Theory, Algorithms and Applications, 2nd. Springer Publishing Company, Incorporated, 2008. DOI: 10.1007/978-1-84800-998-1.

[27] J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages and Computation, Second Edition. Addison-Wesley, 2000. DOI: 10.1145/568438.568455.