The algorithmic warp: some algorithmic problems that have fostered scientific progress

Alberto Policriti

Dipartimento di Matematica e Informatica, Università di Udine

policriti@dimi.uniud.it

www.dimi.uniud.it/~policrit

What we will talk about

• Classes of problems (specific problems require a technical treatment)

• “Significant” problems (ones that link algorithmics to other disciplines)

• Complexity (because, in the end, it is the real problem of algorithmics)

Which problems

• The decision problem (Entscheidungsproblem)

• Algorithmic problems in computational biology

• A reflection on the notion of complexity

Past Present Future

Main sources

• M. Davis, “The Universal Computer: The Road from Leibniz to Turing”

• S. Feferman, “In the Light of Logic”

• E. Green, “Strategies for the systematic sequencing of complex genomes”

• D. Knuth, “Papers on the foundation of Computer Science”

The decision problem

Find an algorithm that decides whether a formula of first-order logic is satisfiable.

In a sense it [the decision problem] is the most general problem of mathematics.

J. Herbrand

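First-order logic

$\forall x, y, u\,\bigl(P(x) \land P(y) \land R(x,u) \rightarrow \exists v\,(R(y,v) \land S(v,u))\bigr)$

The same formula under two interpretations of the predicate symbols P, R and S:

Example: if x and y are women and x is happy with u, then there exists v such that y is happy with v, and u and v are friends.

Example: if x and y are points and x lies on the line u, then there exists v such that y lies on the line v, and u and v are parallel.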
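Checking a first-order sentence in one given finite interpretation is a purely mechanical task, and the following minimal sketch shows it for the sentence "for all x, y, u: if P(x), P(y) and R(x,u), then some v has R(y,v) and S(v,u)". The function name `holds`, the predicate letters and the tiny "points and lines" model are illustrative assumptions, not part of the original slides. The decision problem is much harder because it asks about all possible interpretations, not just one.

```python
from itertools import product

def holds(domain, P, R, S):
    """Check: forall x,y,u (P(x) and P(y) and R(x,u) -> exists v (R(y,v) and S(v,u)))."""
    for x, y, u in product(domain, repeat=3):
        if x in P and y in P and (x, u) in R:
            # The premise is true, so a witness v must exist in the domain.
            if not any((y, v) in R and (v, u) in S for v in domain):
                return False
    return True

# Hypothetical model: two points p, q and two lines a, b.
domain = {"p", "q", "a", "b"}
points = {"p", "q"}                       # P(x): x is a point
lies_on = {("p", "a"), ("q", "b")}        # R(x, u): x lies on u
parallel = {("a", "a"), ("b", "b"), ("a", "b"), ("b", "a")}  # S(v, u)

print(holds(domain, points, lies_on, parallel))
```

Replacing `parallel` with a relation in which a and b are not parallel makes the same sentence false in the modified model.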

An algorithm solving the decision problem could tell us whether the Riemann hypothesis (Hilbert's eighth problem) is true or false!

David Hilbert

D. Hilbert in 1937

Born: 23 Jan 1862 in Königsberg, Prussia (now Kaliningrad, Russia)

Died: 14 Feb 1943 in Göttingen, Germany

The Entscheidungsproblem is solved when we know a procedure that allows for any given logical expression to decide by finitely many operations its validity or satisfiability. [...] The Entscheidungsproblem must be considered the main problem of mathematical logic.

Principles of Mathematical Logic

D. Hilbert and W. Ackermann, 1928

Hilbert knew how to pose problems!

The mathematicians present at an international conference in Paris in August 1900 inevitably wondered what the new century would bring to their subject. [...] he presented, as a challenge to the mathematicians of the twentieth century, 23 problems that seemed utterly inaccessible by the methods available at the time.

The Universal Computer

M. Davis

In his work, Hilbert demonstrated an unusual combination of direct intuition and concern for absolute rigor. With exceptional technical power at his command, he would tackle outstanding problems, usually with a great originality of approach. The title of Hilbert’s lecture in Paris was simply, “Mathematical problems”.

Deciding the undecidable: Wrestling with Hilbert’s problems

S. Feferman

The great importance of definite problems for the progress of mathematical science in general ... is undeniable. ... [for] as long as a branch of knowledge supplies a surplus of such problems, it maintains its vitality. ... every mathematician certainly shares ..the conviction that every mathematical problem is necessarily capable of strict resolution ... we hear within ourselves the constant cry: There is the problem, seek the solution. You can find it through pure thought...

D. Hilbert

1. The continuum hypothesis

2. The consistency of arithmetic

10. The existence of an algorithm for solving Diophantine equations

The solution of three of Hilbert’s problems were to involve mathematical logic and the foundation of mathematics in an essential way; they are the ones numbered 1, 2, and 10 in his list

We will not discuss problem 1; problem 2 is the link with the decision problem

Deciding the undecidable: Wrestling with Hilbert’s problems

S. Feferman

Example:

It is possible to write a Diophantine equation that has integer solutions if and only if the Riemann hypothesis is false.

Diophantine equations: $P(x_1, \ldots, x_k) = 0$, where $P$ is a polynomial with integer coefficients
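To make the algorithmic side of the definition concrete, here is a minimal sketch of the obvious search procedure for a Diophantine equation: try all integer tuples in boxes of growing size. The function name `search_solution` and the cut-off `max_bound` are assumptions added so that the sketch always terminates. Without the cut-off the search halts exactly when a solution exists, which is why it confirms solvability but can never, on its own, confirm unsolvability.

```python
from itertools import product

def search_solution(poly, k, max_bound):
    """Look for integers x_1..x_k with poly(x_1, ..., x_k) == 0.

    Tuples are tried in order of increasing max |x_i|, up to max_bound.
    Returns the first solution found, or None if the box is exhausted.
    """
    for bound in range(max_bound + 1):
        for xs in product(range(-bound, bound + 1), repeat=k):
            # Only test tuples on the boundary of the current box;
            # the interior was already covered by smaller bounds.
            if max(abs(x) for x in xs) == bound and poly(*xs) == 0:
                return xs
    return None

# x^2 + y^2 - 25 = 0 has integer solutions; x^2 - 2 = 0 has none.
print(search_solution(lambda x, y: x * x + y * y - 25, 2, 10))
print(search_solution(lambda x: x * x - 2, 1, 50))
```

A `None` result only says that no solution lies inside the searched box; it says nothing about larger values.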

Contrary to Hilbert’s expectations, Problem 10 was eventually solved in the negative. This was accomplished in 1970 by a young Russian mathematician, Yuri Matiyasevich, who built on earlier work in the 1950’s and 1960’s by the American logicians Martin Davis, Hilary Putnam, and Julia Robinson. [...]

Hilbert's tenth problem

As early as 1920 it was suspected that problems like the previous one were undecidable. But how can one prove that no algorithm exists?

Deciding the undecidable: Wrestling with Hilbert’s problems

S. Feferman

The solution of the second problem: the Königsberg symposium of 1930

During the days immediately preceding Hilbert’s address, a symposium on the foundations of mathematics took place in Königsberg. [...] At the round table discussion that concluded the event, a shy young man named Kurt Gödel [...] made a quiet announcement that, to those who grasped its import, signalled a new era in foundational studies. Von Neumann got the point at once, and concluded that the jig was up, that Hilbert’s program could not succeed.

The Universal Computer

M. Davis

Hilbert's program

1. The consistency of arithmetic (Hilbert's second problem)

2. The completeness of logic and of arithmetic (Gödel 1928)

3. The decision problem (Entscheidungsproblem)

Kurt Gödel

Born: 28 April 1906 in Brünn, Austria-Hungary (now Brno, Czech Republic)

Died: 14 Jan 1978 in Princeton, New Jersey, USA

The crucial step in Gödel’s proof was his demonstration that the property of a natural number of being the code of a proposition provable in PM is itself expressible in PM. [...]

- U says that some particular proposition is not provable in PM.

- That particular proposition is none other than U itself.

- Therefore, U says: “U is not provable in PM.”

Gödel had written the first compiler and ... decreed the end of Hilbert's program!

The Universal Computer

M. Davis
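The idea of "the code of a proposition" can be illustrated on a toy scale. The sketch below uses the classical prime-power coding: the i-th symbol of a formula becomes the exponent of the i-th prime. The nine-symbol alphabet `SYMBOLS` and the function names are hypothetical choices made for illustration; they are not the symbols of PM nor Gödel's exact assignment.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short formulas)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1

# Hypothetical alphabet: the code of a symbol is its position plus one.
SYMBOLS = "0S+*=()x~"

def encode(formula):
    """Map a string over SYMBOLS to a single natural number."""
    number = 1
    for p, ch in zip(primes(), formula):
        number *= p ** (SYMBOLS.index(ch) + 1)
    return number

def decode(number):
    """Recover the string from its code by reading off prime exponents."""
    out = []
    for p in primes():
        if number == 1:
            break
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        out.append(SYMBOLS[exponent - 1])
    return "".join(out)

code = encode("0=0")
print(code, decode(code))
```

Because factorisation into primes is unique, statements about formulas turn into statements about numbers, which is the step the quotation describes.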

What remains of Hilbert's program?

Hilbert had also sought explicit calculational procedures by means of which it would always be possible to determine, given some premises and a proposed conclusion, written in the notation of what has come to be called “first-order logic”, whether Frege’s rules would enable that conclusion to be derived from those premises. The task of finding such procedures came to be known as Hilbert’s Entscheidungsproblem (literally: decision problem),

The Universal Computer

M. Davis

There were partial results, and the great young mathematicians were all active: F. P. Ramsey, W. Ackermann, P. Bernays, M. Schönfinkel and Gödel himself

Apparently intrigued by these developments, Newman gave a lecture course in the spring term of 1935 on the foundations of mathematics featuring Gödel’s incompleteness theorem as its climax. Attending this course, Turing learned about Hilbert’s Entscheidungsproblem. Quite apart from the incredulity of such as Hardy, after Gödel’s work it was hard to believe that there could be an algorithm such as Hilbert had wanted. Alan Turing began to think about how it could be possible to prove that no such algorithm exists.

The Universal Computer

M. Davis

Now, if someone comes along with a proposed algorithm to settle a given decision problem in a positive way, one can check to see that it does the required work (or at least try to do so), without inquiring into the general nature of what constitutes an algorithm. But if it is to be shown that the problem is undecidable, one has to have a precise explanation of what algorithms can compute in general.

Deciding the undecidable: Wrestling with Hilbert’s problems

S. Feferman

Alan Turing

http://www.turing.org.uk/turing/

His high pitched voice already stood out above the general murmur of well-behaved junior executives grooming themselves for promotion within the Bell corporation. Then he was suddenly heard to say: "No, I'm not interested in developing a powerful brain. All I'm after is just a mediocre brain, something like the President of the American Telephone and Telegraph Company."

Quoted in A Hodges, Alan Turing the Enigma of Intelligence, (London 1983) 251.

[...] on the basis of Turing’s analysis of the notion of computation, it is possible to conclude that anything computable by any algorithmic process can be computed by a Turing machine. So if we can prove that some particular task can not be accomplished by a Turing machine, we can conclude that no algorithmic process can accomplish that task. That is how Turing proved that there is no algorithm for the Entscheidungsproblem. In addition, Turing showed how to produce one individual Turing machine that, all by itself, can do anything that could be done by any Turing machine whatever – a mathematical model of an all-purpose computer.

The Universal Computer

M. Davis

The diagonal method in Turing's work

Now, if we think of the halting set of a Turing machine as constituting a “package” and of the code number of that machine as labeling that package, then we have exactly the typical setup for applying the diagonal method: labeled packages in which the labels are exactly the kind of thing in the packages – in this case, natural numbers.

The Universal Computer

M. Davis
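The "labeled packages" picture in the quotation can be tried out on a finite example. In this minimal sketch the dictionary `packages` and the function name `diagonal_set` are illustrative assumptions: each label n is a natural number, each package is a set of natural numbers, and the diagonal set collects the labels that are missing from their own package.

```python
def diagonal_set(packages):
    """Return {n : n is a label and n is not in the package labeled n}."""
    return {n for n, contents in packages.items() if n not in contents}

# Hypothetical packages, each labeled by a natural number.
packages = {0: {0, 1}, 1: {2}, 2: set(), 3: {3, 4}}
d = diagonal_set(packages)

# The diagonal set disagrees with package n about the element n,
# so it cannot coincide with any of the labeled packages.
for n, contents in packages.items():
    print(n, (n in d) != (n in contents))
```

In Turing's setting the packages are the halting sets of machines and the labels are machine codes, so the diagonal set is not the halting set of any machine; the finite example only shows the mechanism.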

Turing's universal machine

The universal machine also provides a model of a “stored program” computer [...] in which the machine makes no fundamental distinction between “program” and “data.” Finally, the universal machine shows how “hardware” [...] thought of as a description of the functioning of a mechanism, can be replaced by equivalent “software” [...] “stored” on the tape of a universal machine.

“On Computable Numbers, with an Application to the Entscheidungsproblem”, A. Turing, Proc. of the London Mathematical Society, 1937

The Universal Computer

M. Davis
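The remark that a machine's "hardware" can be replaced by "software" handed to a single interpreter can be sketched in a few lines. Below, `run` plays the role of one fixed interpreter, and the transition table `SUCCESSOR` is an ordinary piece of data passed to it. The table format, the state names, the blank symbol `_` and the step limit `max_steps` are all assumptions of this sketch, not Turing's original encoding.

```python
def run(program, tape, state="start", max_steps=10_000):
    """Simulate a one-tape Turing machine described by `program`.

    `program` maps (state, symbol) to (symbol_to_write, move, next_state),
    where move is "L" or "R". Unwritten cells read as the blank "_".
    The tape contents are returned when the state "halt" is reached
    or when the step limit runs out.
    """
    cells = dict(enumerate(tape))
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        symbol = cells.get(head, "_")
        write, move, state = program[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells)).strip("_")

# A machine, given as data, that appends one stroke to a unary number.
SUCCESSOR = {
    ("start", "1"): ("1", "R", "start"),
    ("start", "_"): ("1", "R", "halt"),
}

print(run(SUCCESSOR, "111"))
```

Changing the behaviour means passing a different table, not rewriting `run`, which is the stored-program point made in the quotation.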

Turing’s universal computer was a marvelous conceptual device that all by itself could execute any algorithmic task. But could one actually build such a thing? And aside from what such a machine could accomplish “in principle,” could it be designed and constructed so as to be able to solve real world problems in an acceptable time frame, and using reasonable available resources?

By the end of 1945, Turing had produced his remarkable ACE (Automatic Computing Engine) Report. One detailed comparison of the ACE Report with von Neumann's EDVAC Report, notes that whereas the latter “is a draft and is unfinished … more important … is incomplete …” the ACE Report “is a complete description of a computer, right down to the logical circuit diagrams” and even including “a cost estimate of £11,200.”

The Universal Computer

M. Davis

ACE: Turing's (English) answer to EDVAC

[It] is … very contrary to the line of development here, and much more in the American tradition of solving one's difficulties by means of much equipment rather than by thought.… Furthermore certain operations which we regard as more fundamental than addition and multiplication have been omitted.

Alan Turing

Algorithmic problems in computational biology

Astronomy began when the Babylonians mapped the heavens. Our descendants will certainly not say that biology began with today’s genome projects, but they may well recognize that a great acceleration in the accumulation of biological knowledge began in our era. To make sense of this knowledge is a challenge, and will require increased understanding of the biology of cells and organisms. But part of the challenge is simply to organise, classify and parse the immense richness of sequence data.

Biological sequence analysis

R. Durbin, S. Eddy, A. Krogh and G. Mitchison

A bit of history

• 1953: F. Crick and J. Watson discover the double-helix structure of DNA

• 1970s: techniques for sequencing DNA fragments are developed (F. Sanger)

• 1980s: the genome project is launched and the first pilot experiments begin (together with the first companies for the commercial exploitation of this research)

• 1990s: the first organisms are sequenced (a few million base pairs)

• 1990: BLAST is published

• 1998: C. Venter announces the founding of the private company Celera and challenges the public consortium for the sequencing of the human genome: Celera will obtain the result in 3 years (and $300 M)

http://www.accessexcellence.org/AB/

Human Genome Working Draft Sequence

published February 15 & 16, 2001

Science and Nature

Behind the challenge: two main shotgun-sequencing strategies.

• Clone-by-clone shotgun sequencing

• Whole-genome shotgun sequencing

Programs and algorithms in bioinformatics

[...] Yet other programs provide user-friendly viewers for inspection and editing of the resulting sequence assemblies. A particularly popular suite of programs for these various steps is Phred, Phrap and Consed, which are designed for base calling, sequence assembly and the viewing of sequence assemblies, respectively. [...]

(21 occurrences of the word “programs”, 2 of the word “algorithms”)

Strategies for the systematic sequencing of complex genomes

Eric D. Green

Programs and algorithms in the challenge

Finally, perhaps the most essential element of any whole-genome shotgun-sequencing strategy is the availability of a robust assembly program that can accommodate the inevitably large collection of sequence reads. [...] include algorithms that account for the anticipated spatial relationship of read pairs emanating from individual subclones, which help to avoid misassemblies due to repetitive sequences.

Strategies for the systematic sequencing of complex genomes

Eric D. Green

How did the challenge end?

Among the most useful computer-based tools in modern biology are those that involve sequence alignments of proteins, since these alignments often provide insights into gene and protein function. There are several types of alignments: global alignments of pairs of proteins, multiple alignments of members of protein families, and alignments made during data base searches to detect homologies.

S. Henikoff and J. G. Henikoff, PNAS 1992

Sequence alignment

What is an alignment?

Input:

GTTGATTAGCTTATCCCAAAGCAAGGCACTGAAAATGCTAGAT
GTGATGTAGCTTAACCCAAGCAAGGCACTAAAAATGCCTAGAT

Output:

GTTGAT_TAGCTTATCCCAAAGCAAGGCACTGAAAATG_CTAGAT
GT_GATGTAGCTTAACCCAA_GCAAGGCACTAAAAATGCCTAGAT

Algorithms

• Needleman-Wunsch 1970

• Smith-Waterman 1981

• Landau-Vishkin 1986

• Wu-Manber 1992

• Myers 1994

• Chang-Lawler 1994

• ...

      G  T  T  G  A  T  T  A  G  C  T  T  A
G     0  1  2  3  4  5  6  7  8  9 10 11 12
T     1  0  1  2  3
G     2  1  1  1  2
A     3  2  2  2  1
T     4  3  2
G     5  4  3
T     6  5  4
A     7  6  5

GTTGATTAGCTTATCCCAAAGCAAGGCACTGAAAATGCTAGAT
GTGATGTAGCTTAACCCAAGCAAGGCACTAAAAATGCCTAGAT

GTTGAT_TAGCTTATCCCAAAGCAAGGCACTGAAAATG_CTAGAT
GT_GATGTAGCTTAACCCAA_GCAAGGCACTAAAAATGCCTAGAT
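The partially filled table on this slide is consistent with a unit-cost edit-distance computation between prefixes of the two input sequences. The following minimal sketch fills such a table by dynamic programming; it assumes cost 1 for each insertion, deletion and substitution, which is a simplification of the scoring schemes used by the algorithms listed earlier, and the function name `edit_distance_table` is illustrative.

```python
def edit_distance_table(a, b):
    """Return d where d[i][j] is the edit distance between a[:i] and b[:j]."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i            # delete all i characters of a[:i]
    for j in range(cols):
        d[0][j] = j            # insert all j characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # gap in b
                          d[i][j - 1] + 1,         # gap in a
                          d[i - 1][j - 1] + cost)  # match or mismatch
    return d

SEQ1 = "GTTGATTAGCTTATCCCAAAGCAAGGCACTGAAAATGCTAGAT"
SEQ2 = "GTGATGTAGCTTAACCCAAGCAAGGCACTAAAAATGCCTAGAT"

# Rows follow the prefix GTGATGTA of SEQ2, columns the prefix of SEQ1,
# as in the table above (which omits the row and column for empty prefixes).
table = edit_distance_table(SEQ2[:8], SEQ1[:13])
for row in table[1:]:
    print(row[1:])
```

The alignment shown on the slide uses four gaps and two mismatches, so the distance between the full sequences is at most 6; the table for the full sequences has 44 x 44 entries and is filled in the same way.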

Other related algorithmic problems

• exact matching (an “older” and perhaps less “applied” problem, whose solution algorithms have proved fundamental)

• data structures (it pays to represent sequences in memory not as plain strings but as index systems over all possible suffixes of the sequence)

• protein folding (a nice NP-complete problem that the biologists have given us)

• ...
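The first two items can be combined in a small example: an index over all suffixes of a sequence answers exact-matching queries by binary search. The sketch below builds a suffix array in the most naive way (sorting the suffixes directly), which is only reasonable for short strings; practical tools use far more economical constructions. The function names are assumptions made for illustration.

```python
def suffix_array(text):
    """Return the starting positions of all suffixes, in sorted suffix order."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_all(text, sa, pattern):
    """Return every position where `pattern` occurs in `text`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    # Binary search for the first suffix whose m-character prefix
    # is not smaller than the pattern.
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # All occurrences are adjacent in the suffix array from here on.
    hits = []
    while lo < len(sa) and text.startswith(pattern, sa[lo]):
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)

text = "GTTGATTAGCTTA"
sa = suffix_array(text)
print(find_all(text, sa, "TTA"))
```

Once the index is built, each query costs a number of comparisons logarithmic in the length of the sequence, plus the number of occurrences reported.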

Concluding reflections

• The decision problem may have been hard, but it was stated clearly and precisely. Mathematically “clean”.

• Algorithmic problems in computational biology are not always as “clean” (perhaps the more interesting they are, the “dirtier” they are).

• What does the complexity of an algorithmic problem really consist of?

Complexity: the resources we have are finite

Advances in our ability to compute are bringing us substantially closer to ultimate limitations

D. Knuth

Mathematics and Computer Science: Coping with Finiteness

What (computational) resources do we have?

Universe: 40 billion light years

Proton: $10^{-13}$ cm

$10^{125}$: (greater than or equal to the) number of protons in the universe

If we take as the unit of time the time light needs to travel $10^{-13}$ cm, and we assume that the universe was born 10 billion years ago, then the number of time units elapsed is less than or equal to $10^{42}$
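The two bounds can be rechecked with a few lines of arithmetic. The sketch below treats the universe as a cube whose side is 40 billion light years and counts how many proton-sized cells fit in it, then counts how many "light crosses one proton" time units fit into the age of the universe. The physical constants are rounded values, the age is a parameter set here to $10^{10}$ years, and the function names are illustrative assumptions.

```python
LIGHT_YEAR_CM = 9.461e17           # centimetres in one light year (rounded)
SPEED_OF_LIGHT_CM_PER_S = 2.998e10
SECONDS_PER_YEAR = 3.156e7

def proton_cells(universe_diameter_ly=40e9, proton_diameter_cm=1e-13):
    """Number of proton-sized cells in a cube as wide as the universe."""
    side = universe_diameter_ly * LIGHT_YEAR_CM / proton_diameter_cm
    return side ** 3

def elapsed_ticks(age_years=10e9, proton_diameter_cm=1e-13):
    """Number of time units, each the time light takes to cross one proton."""
    tick_seconds = proton_diameter_cm / SPEED_OF_LIGHT_CM_PER_S
    return age_years * SECONDS_PER_YEAR / tick_seconds

print(f"{proton_cells():.2e}")    # to be compared with 10^125
print(f"{elapsed_ticks():.2e}")   # to be compared with 10^42
```

Under these assumptions both quantities stay below the stated bounds, and the number of cells falls within one order of magnitude of $10^{125}$.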

What “hopes” do we have?

• snail 0.0006 miles/h

• man 4 miles/h

• US auto 55 miles/h

• Jet 600 miles/h

• Supersonic jet 1200 miles/h

• man (pencil) 0.2/sec

• man (abacus) 1/sec

• calculator 4/sec

• computer 200,000/sec

• fast computer 2M/sec

Grid problem: count the number of paths from start to finish

The problem is hard

• there are no known methods for computing the number of paths (in a reasonable amount of time)

• we can nevertheless generate random paths and use a theorem of statistics telling us that the best estimate is given by the mean of the reciprocals of the observed probabilities

• we obtain an enormous estimate: $(1.6 \pm 0.3) \times 10^{24}$
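The sampling idea in the list above can be tried on grids small enough to count exactly. The sketch below makes several assumptions that the slides do not spell out: paths move between adjacent points of an n x n lattice, never visit a point twice, and go from one corner to the opposite one; a random walk that gets stuck before reaching the finish contributes zero to the average. Each walk records the reciprocal of its own probability, and the mean of these values estimates the number of paths.

```python
import random

def neighbors(cell, n):
    """Yield the lattice points adjacent to `cell` on an n x n lattice."""
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        rr, cc = r + dr, c + dc
        if 0 <= rr < n and 0 <= cc < n:
            yield (rr, cc)

def count_paths_exact(n):
    """Exhaustively count self-avoiding paths from (0, 0) to (n-1, n-1)."""
    finish = (n - 1, n - 1)
    visited = {(0, 0)}

    def dfs(cell):
        if cell == finish:
            return 1
        total = 0
        for nxt in neighbors(cell, n):
            if nxt not in visited:
                visited.add(nxt)
                total += dfs(nxt)
                visited.remove(nxt)
        return total

    return dfs((0, 0))

def estimate_paths(n, samples, rng):
    """Average 1/probability over random walks; stuck walks count as 0."""
    finish = (n - 1, n - 1)
    total = 0.0
    for _ in range(samples):
        cell, visited, inverse_probability = (0, 0), {(0, 0)}, 1.0
        while cell != finish:
            options = [x for x in neighbors(cell, n) if x not in visited]
            if not options:
                inverse_probability = 0.0   # dead end: no path generated
                break
            inverse_probability *= len(options)
            cell = rng.choice(options)
            visited.add(cell)
        total += inverse_probability
    return total / samples

print(count_paths_exact(4))
print(estimate_paths(4, 20_000, random.Random(0)))
```

Exhaustive counting already slows down sharply for lattices only slightly larger than these, while the sampling estimate keeps the same cost per walk; its accuracy, however, depends on how widely the reciprocals vary.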

Is the problem of establishing (any) property of the paths on the grid algorithmically tractable?

We cannot even rely on an exhaustive procedure to enumerate the paths!

Perhaps we need a theory of algorithmic complexity that lets us classify this as a hard problem

A problem that is simple (to state) and “clean”, but ...

Conclusions

Algorithmic problems form the backbone of computer science, and solving them requires a genuine and distinctive (mathematical) effort

Algorithmic problems have turned out to be “behind the scenes” at crucial moments of scientific progress

Complexity, together with an adequate theory for studying it, is probably the most interesting of today's algorithmic challenges

My favorite way to describe computer science is to say that it is the study of algorithms.

D. Knuth
