

Number 1 : 2011

The Norwegian metacenter for computational science

A magazine published by the Notur II project

The QUIET experiment – looking for gravitational waves in a sea of noise

Pore scale simulations of CO2 sequestration

Barcoding ancient DNA

Trajectories from ocean currents



EDITORIAL

Cover picture: © Crestock

Number 1 : 2011. A magazine published by the Notur II project – The Norwegian metacenter for computational science

Editorial Director: Jacko Koster
Contributing Editor: Vigdis Guldseth
Subscription: An electronic version is available on www.notur.no. Here you can download a PDF file of the magazine and subscribe or unsubscribe to the magazine.
ISSN: 1890-1956

Reproduction in whole or in part without written permission is strictly prohibited.
E-mail address: [email protected]. Phone: +47 73 55 79 00. Fax: +47 73 55 79 01.
Print: Skipnes


Computational science (or scientific computing) is the field of study concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyze and solve scientific problems. In practice, it is typically the application of computer simulation and other forms of computation to problems in various scientific disciplines. Over the last three decades, computational science has established itself as the “third pillar” of scientific enquiry alongside theory and experiment. The discipline enhances scientific investigation by enabling practitioners to build and test models of complex phenomena, yielding new information, innovation and fresh insight into the research process that is otherwise unobtainable.

Computational science spans several problem domains. For example, numerical simulation uses numerical methods to quantitatively represent the evolution of a physical system, with different objectives depending on the nature of the system being simulated: for example, to understand or reconstruct known phenomena (e.g., natural disasters), or to predict unobserved or future situations (e.g., particle physics, weather). In model fitting and data analysis, models are tuned or equations solved to reflect observations, subject to model constraints (e.g., computational linguistics). A third domain is computational optimization to improve known scenarios (e.g., manufacturing processes, engineering).

This issue of META contains five contributions from a variety of fields that make use of the national infrastructures for high-performance computing (Notur II) and scientific data (NorStore), and illustrate advances in research using computational science, plus associated societal and/or economic impact.

Jacko Koster

Managing Director

UNINETT Sigma AS

CONTENTS

The QUIET experiment p : 4

Barcoding ancient DNA preserved in Arctic permafrost p : 10

Statistics of trajectories from ocean currents p : 14

Pore scale simulations of CO2 sequestration p : 17

Automatic Generation of Optimization Code p : 20

Disclaimer: UNINETT Sigma is not responsible for errors, opinions or views expressed by the authors or the contents of external information referred to in this magazine.

FEATURES

With the Notur and NorStore resources Norwegian scientists can actually play a leading role in one of the biggest games in modern cosmology - the search for primordial gravity waves. (p : 4)

In the particular case of CO2 storage, good computer simulations of pore scale flow can give fast and reliable information about the quality of a potential reservoir rock to trap and store CO2. (p : 17)

As part of the preparations for exploration of off-shore petroleum resources in Norwegian waters, an Environmental Impact Assessment is produced, in which the risk of various types of accidents and the potential impacts are described. (p : 14)

Photo: © Crestock



ALBERT EINSTEIN, GRAVITY WAVES AND THE BIGGEST BANG OF THEM ALL

The background for the QUIET experiment begins almost 100 years ago, on November 25th 1915, to be precise. On this date, Albert Einstein submitted a paper called “The field equations of gravitation” to the Prussian science academy, which provided a fundamentally new understanding of gravity. While Newton believed that gravity was a somewhat mysterious force that pulled bodies together across large distances, Einstein claimed that gravity was nothing but the manifestation of a curved spacetime, and that any body simply attempts to move in the straightest possible line within this curved space. This idea can be illustrated by a pilot flying around the world: locally, the pilot always attempts to fly in the straightest possible line, and yet, after some 36 hours he comes back to the point where he started his flight, because the Earth is spherical.

As with any great scientific theory, this idea could be tested by observations. The first opportunity to do so came in 1919, when the British astronomer Sir Arthur Eddington measured the bending of light around the Sun during a total solar eclipse: according to Einstein’s theory the amount of bending would be twice as large as predicted by Newton’s theory. In a spectacular media display, Eddington confirmed Einstein’s predictions, and Einstein rose to instant world fame.

Einstein’s theory made many other predictions as well. For QUIET, two are of particular interest. First, according to Einstein the gravitational field set up by a massive body should behave in a similar way to the electromagnetic field: disturbances in this field should move at the speed of light, just like photons, forming what are called gravitational waves. Such waves arise whenever strong gravitational sources accelerate, say, when two black holes orbit each other. However, the physical magnitude of such variations is typically very small indeed: the waves produced by the Earth orbiting the Sun correspond to a stretching of space by only one part in 10²⁶! For this reason, no gravitational waves have been directly observed as of 2011.

The second important prediction is that the universe as a whole expands, as was later observed directly by Edwin Hubble in 1929. This effect can be compared to the surface of an inflating balloon; as the balloon inflates its surface area becomes larger and larger, and the distance between any two points on the surface increases. The same happens in the universe; all galaxies move away from each other. And this has a very interesting and simple consequence: if the universe expands today, then it must have been smaller in the past. And that means that the density of the universe must have been higher in the past. And when the density increases, so does the temperature. Taking this idea to its extreme, Einstein’s theory therefore makes the following simple prediction: the universe started as an extremely hot and dense gas, in which only the very simplest elementary particles could exist; anything heavier than photons, protons and electrons would be instantly destroyed by the extreme heat. This idea is today known as “the Big Bang”.

THE COSMIC MICROWAVE BACKGROUND AND TWO NOBEL PRIZES

Until 1965, the Big Bang theory was mostly a theoretical speculation, based on mathematics rather than hard observations. However, this all changed when two scientists at Bell Laboratories, Arno Penzias and Robert Wilson, accidentally made the discovery of a lifetime: while they were

AUTHORS

Hans Kristian Eriksen
Associate Professor, Institute of Theoretical Astrophysics, University of Oslo

Sigurd Kirkevold Næss
Doctoral Research Fellow, Institute of Theoretical Astrophysics, University of Oslo

The QUIET experiment – looking for gravitational waves in a sea of noise

Ingunn Kathrine Wehus
Research Associate, Imperial College London and Department of Physics, University of Oslo

From a mountain top reaching 5080 meters above sea level, situated in the driest desert in the world, some of the world’s most sensitive arrays of “miniature TV antennas” have spent the last 30 months gazing at the sky, looking for tiny wrinkles in the fabric of space itself: wrinkles that would reveal what the universe looked like when it was only 10⁻³⁴ seconds old; wrinkles with a relative amplitude of perhaps no more than a few parts in a billion; and wrinkles that would qualify their discoverer for a Nobel prize. This experiment is called the “Q/U Imaging ExperimenT” (“QUIET” for short), and is one of several experiments competing to become the first to detect the tiny signal. Right in the middle of this race are the University of Oslo and NOTUR, whose computational expertise and resources have enabled Norwegian scientists to take on leading roles in the experiment.

In 1915 Albert Einstein published the theory of General Relativity. Today, this is still our best theory for describing the universe as a whole, and it is the foundation of modern cosmology.

Heavy objects orbiting each other produce gravitational waves. Such waves were also produced shortly after the Big Bang, and CMB experiments, like QUIET, are today trying to detect these waves.

Illustration: © California Institute of Technology; NASA/Tod Strohmayer (GSFC)/Dana Berry (Chandra X-Ray Observatory)



trying to make a picture of the Milky Way using a new radio antenna, they found an unexpected “noise term” in their observations of 3 degrees Kelvin. And this signal had one very peculiar property: it was equally strong in all directions on the sky. No known signal behaved like this. Except the residual radiation from the Big Bang, now cooled from an original temperature of 3000K to 3K. Penzias and Wilson had discovered the echo from the Big Bang by accident, today known as the “cosmic microwave background”, or CMB for short. For this, they received the Nobel Prize in physics in 1978.

Shortly after the first detection of the CMB by Penzias and Wilson, it was realized that the background should in fact not be perfectly uniform, but rather have tiny fluctuations around the mean. The reason is simply that we see structure in the universe around us today; there are galaxies, solar systems, and even us. If the universe had been perfectly smooth early on, it would remain so to the very end. Further, if it was possible to measure these tiny fluctuations, cosmologists (that is, physicists who study the entire universe) found that it would be possible to extract an immense range of cosmological information from them. Everything from the age and contents of the universe to its evolution and destiny would be constrained. Needless to say, an intense flurry of activity started, as physicists began to build more and more sensitive instruments to make detailed measurements.

The breakthrough came with the NASA satellite called “the Cosmic Background Explorer” (COBE), which found the first signs of CMB fluctuations in 1992. These measurements showed that there were indeed small perturbations in the CMB, corresponding to temperature variations on the sky of ~10 µK -- just as the theorists had predicted almost forty years earlier. While the discovery of Penzias and Wilson can be said to have changed cosmology from a field of speculation into a proper branch of physics, the COBE discovery opened a new window on the early universe, which eventually transformed cosmology into a high-precision science. For this, the leaders of the COBE mission, John Mather (NASA) and George Smoot (Berkeley), received the Nobel Prize in physics in 2006.

Since then, many experiments have improved greatly on the observations made by COBE. Two particularly important experiments are WMAP, a second NASA-funded satellite which operated between 2001 and 2010, and Planck, an ESA-funded satellite currently taking data. The University of Oslo is a member of Planck, and scientists at the Institute of Theoretical Astrophysics (ITA) are just these days analyzing the data that are sent down from this satellite. Once Planck is finished with its job, it will provide a map with sufficient resolution and signal-to-noise that it should never be necessary to measure it again.

INFLATION AND THE BIRTH OF ALL GALAXIES

However, just as one chapter of CMB cosmology is about to end, another is just starting. To explain the background for this, it is necessary to go back to 1981, when theorists were asking themselves one important question: why do we observe the same temperature in two opposite directions on the sky? Since the photons we observe at the Earth today have been travelling towards us ever since the beginning of time, for more than 13 billion years, it would take 26 billion years for one of those photons to reach the birth place of the second photon. And since the universe is only 13 billion years old, there has been no time for this to happen. So what was going on?

To explain this paradox, theorists proposed something very strange, today known as cosmic inflation: when the universe was only some 10⁻³⁴ seconds old, it underwent a very short period of extremely rapid expansion, during which it increased in size by a factor of 10²⁶! For comparison, that is about the same as inflating a balloon to the size of the entire observable universe -- in only 10⁻³⁵ seconds!

As crazy as this idea sounds, it did have many attractive features. First, it could explain many of the paradoxes that plagued theoretical cosmology at the time; the smoothness of the universe is one example, the flatness of the universe is another. Second, it also provided a physical mechanism for how the very first structures in the universe were generated: according to the inflationary theory, everything we see around us started simply as quantum fluctuations during that epoch of rapid expansion 10⁻³⁴ seconds after the Big Bang. Third, and most importantly, the theory made concrete predictions that could be tested observationally.

The predictions made by inflation can be divided into two classes. First there are the circumstantial pieces of evidence: if the inflationary scenario is right, the fluctuations we observe in the universe should be what is called isotropic (i.e., look statistically similar in all directions on the sky), scale-invariant (i.e., there should be no preferred length scale in the universe), and Gaussian (i.e., noise-like). The follow-up experiment to COBE, WMAP, has verified all these predictions, and they appear to be OK. Unfortunately, these were all predicted by cosmologists *before* inflation came along in 1981, and they can therefore not be taken as evidence of inflation.

However, the second class of evidence is something that is unique to inflation: if inflation is correct, there should be a background of gravity waves permeating the universe, quantum mechanically created during the violent period of rapid expansion. This signature is almost impossible to generate by any other mechanism. If ever detected, this signature will therefore provide direct evidence for inflation, and thereby for the creation of everything we see around us today. And as if that was not enough, this detection would also be the first direct evidence of gravity waves, predicted by Einstein almost 100 years ago. It should come as no surprise that the discoverer of such fundamental physics will surely qualify for a new Nobel prize. And similarly, it should come as no surprise that scientists all over the world are currently looking very hard for these gravity waves. QUIET is one of the main players in this game.

Matter curves spacetime, and objects move within that spacetime along locally straight lines. However, a locally straight line may look curved from a distance.

Tiny fluctuations in the temperature of the CMB, as measured by the WMAP satellite. The red and blue spots in this map correspond to places in the universe with lower and higher density than the average. The high-density spots later evolved into galaxy structures.

Photo: NASA. Illustration: NASA/Tod Strohmayer (GSFC)/Dana Berry (Chandra X-Ray Observatory)



GRAVITY WAVES, CMB POLARIZATION AND B-MODES

The most promising route toward detecting these primordial gravity waves is once again provided by the CMB fluctuations, but this time in a more subtle way than for the simple density fluctuations. The idea is the following: when a gravity wave moves through space, the space expands in one direction and compresses in the other. And when space itself expands or contracts, so do the photons within it. This, in turn, leads to a *polarized* CMB signal: the observed signal from a single point on the sky is hotter in one polarization component than the other.

Unfortunately, primordial gravity waves are not the only physical mechanism that generates polarization, as the same under- and overdensities we observe in the unpolarized CMB also generate a pattern of polarization fluctuations on their own, even in the absence of gravity waves. However, the gravity wave signal has one particular feature that does distinguish it from other signals: it has a so-called divergence-free part called “B-modes”, behaving mathematically similarly to a magnetic field. This component is the unique feature of early universe gravity waves that everybody is searching for.

Above we said that the magnitude of the temperature fluctuations was small, having an amplitude of only some ~10 µK relative to an absolute background of 3K. However, the B-modes are smaller yet, with an expected amplitude of perhaps 0.1 µK or even less. In other words, CMB polarization experiments are looking for fluctuations that are seven orders of magnitude smaller than the mean on which they sit! For comparison, that is like measuring the height of a person to an accuracy of one thousandth of a millimeter!
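To spell out the arithmetic behind that comparison, using only the numbers quoted above:

\[
\frac{\Delta T_{B}}{T_{\mathrm{CMB}}} \approx \frac{0.1\,\mu\mathrm{K}}{3\,\mathrm{K}} = \frac{10^{-7}\,\mathrm{K}}{3\,\mathrm{K}} \approx 3\times 10^{-8},
\]

i.e. a signal between seven and eight orders of magnitude below the 3K background on which it sits.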

THE QUIET EXPERIMENT

To produce such highly sensitive and clean pictures of the CMB sky, it is necessary to build extremely sensitive detectors -- and many of them. This poses many serious technical challenges, two of which are size and cost. Ten years ago, the typical CMB detector (called a radiometer) was built by hand, and was about 30 cm end-to-end. The cost of a single device was therefore high, running at about $40,000, and it took a long time to calibrate and test it. Further, it was physically impossible to put many of these into a single telescope, since they were so large.

However, in 2003 a major breakthrough was made at the Jet Propulsion Laboratory (JPL/NASA), where two scientists, Drs. Todd Gaier and Michael Seiffert, managed to design a fully functional radiometer on a 3-by-3 cm chip (MMIC), using similar techniques as those used for producing computer processors. This solved both of the problems mentioned above: the size became small enough to allow for a large array of the devices in a single telescope, and the cost of a single detector dropped to about $500.

Quickly realizing the potentially massive scientific importance of this breakthrough for CMB cosmology, scientists from many institutions and countries organized themselves to establish what is today known as the QUIET experiment. The goal of this experiment was to build and field a telescope based on the new detector technology, attempting to measure the much sought-after gravity wave signal from inflation. Today, this has grown into a collaboration of 14 top institutions around the world (Caltech, Chicago, Columbia, Fermilab, JPL, KEK (Japan), Manchester, Max-Planck-Institut (Germany), Miami, Michigan, Oslo, Oxford, Princeton and Stanford), and about 50 scientists are working on it.

While having excellent detectors is crucial for searching for the CMB polarization signal, this is not sufficient by itself. One problem is the Earth’s atmosphere. Water vapour absorbs radio waves, and at some frequencies the atmosphere is completely opaque. It is therefore better to observe through less atmosphere, and in a dry environment. For this reason, QUIET is located at the Chajnantor plateau, 5080 meters above sea level, in the Atacama desert in Chile, the driest desert in the world. The only other site in the world with competitive qualities is the South Pole, which some CMB experiments do use for their observations. However, due to rather obvious infrastructure advantages, the Chile site was preferred for QUIET.

It is equally important to suppress any form of systematic effects. Again, the gravity wave signal one is looking for has an amplitude of about 0.1 µK; for comparison, the surrounding environment is at about room temperature, 300K, or nine orders of magnitude larger! Any small temperature fluctuation, not to mention pointing the telescope into a hot region, can therefore completely swamp any cosmologically relevant signal, unless properly accounted for.

QUIET DATA ANALYSIS IN NORWAY

And it is at this stage that the University of Oslo and NOTUR enter the story: CMB data analysis is a complicated and computationally very demanding business. In order to detect the extremely valuable 0.1 µK signal, buried deep beneath a sea of instrumental noise, not to mention various systematic effects and astrophysically irrelevant features (such as radiation from our own Milky Way), one has to scan through tens of terabytes of data repeatedly, looking for subtle correlations. This requires expertise not only in physics and astronomy, but also in statistics, image analysis, and, most importantly, in high-performance computing.
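As a rough illustration of the kind of operation involved (a minimal sketch of naive noise-weighted map-making in Python, not QUIET's actual pipeline), the time-ordered detector samples can be binned into sky pixels, each sample weighted by the inverse variance of its noise; the pixel indices and per-sample noise levels are assumed to be given:

import numpy as np

def bin_map(tod, pixels, sigma, npix):
    """Naive noise-weighted map-maker: average all time-ordered samples
    (tod) falling in each sky pixel, weighting each sample by 1/sigma^2."""
    weights = 1.0 / sigma**2
    num = np.bincount(pixels, weights=weights * tod, minlength=npix)
    den = np.bincount(pixels, weights=weights, minlength=npix)
    sky = np.full(npix, np.nan)           # pixels never observed stay NaN
    seen = den > 0
    sky[seen] = num[seen] / den[seen]     # noise-weighted average per pixel
    return sky

# Toy usage: one million pure-noise samples scanning 10 000 pixels
rng = np.random.default_rng(0)
pixels = rng.integers(0, 10_000, size=1_000_000)
tod = rng.normal(0.0, 1e-3, size=1_000_000)           # noise in kelvin
noise = np.full(tod.size, 1e-3)
sky_map = bin_map(tod, pixels, noise, 10_000)

The real analysis adds filtering of correlated noise, pointing reconstruction and systematics checks on top of a step like this, which is what drives the data volumes and computing requirements described in the text.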

And this is one of the main strengths of the Institute of Theoretical Astrophysics compared to the international astrophysics community in general: through a dedicated and targeted effort over many years, the institute has built up world-leading expertise in high-performance computing for astrophysical problems, both in terms of manpower and hardware. And this is precisely why the University of Oslo was invited into the QUIET experiment in the first place.

However, it is not sufficient to have the expertise if one does not also have the computer resources to match it. For QUIET, the NOTUR and UiO-based cluster Titan provides the computational power, while NorStore provides the storage space. Without these resources (and the user support provided by the Titan support staff!) it would have been impossible to successfully complete the analysis on Norwegian ground. Conversely, with these resources Norwegian scientists can actually play a leading role in one of the biggest games in modern cosmology - the search for primordial gravity waves.

PRELIMINARY RESULTS AND A LOOK TOWARDS THE FUTURE

On December 23rd 2010, the first phase of the QUIET experiment ended. At that time, about 100 first-generation test detectors had been looking towards the sky for two and a half years, 24 hours per day, 7 days per week, shutting down only during (occasional) bad weather or maintenance. This phase was designed to test the detectors, rather than making the ultimate B-mode measurement, before asking funding agencies for the necessary funds to build the real experiment.

Fortunately, we are very happy to report that the pilot phase was a great success. Using only the data taken during the first nine months of operations, the first QUIET results were submitted to the Astrophysical Journal on December 14th, 2010, already providing the second best measurement in the world of the CMB polarization signal; only one South Pole based experiment claims slightly lower upper bounds on the gravity wave amplitude. Once the results from the full QUIET data set, comprising two and a half years' worth of data, are published later this year, it is clear that QUIET will set a clear new world record in the hunt for primordial gravity waves.

As thrilled as we are with this success, there is of course no time to rest on our laurels. Other experiments are improving their detectors and methods, and so are we. In fact, the QUIET detector development group has already managed to reduce the noise by more than a factor of three relative to the pilot phase detectors, and this improvement alone is equivalent to having ten times as many detectors in the telescope. The plan is to field 500 of these detectors in either 2013 or 2014, and let them scan the sky for up to five years. If so, QUIET has a real fighting chance to see the warping and twisting of space that happened only 10⁻³⁴ seconds after the Big Bang, and which eventually created everything we see around us in the universe today. Not bad for a small telescope sitting on a mountain top in Chile, gazing toward the skies.

References
“First Season QUIET Observations: Measurements of CMB Polarization Power Spectra at 43 GHz in the Multipole Range 25 <= ell <= 475”, QUIET Collaboration, 2010, http://arxiv.org/abs/1012.3191

The QUIET telescope, located 5080m above sea level in the Atacama desert in Chile.

Traditional radiometer (a/left), around 30cm long, compared to a new QUIET radiometer (b/right), 3cm wide.


Maps of the CMB polarization from the QUIET pilot phase. The gravitational wave signal would appear in the B-mode map (right), but this is currently dominated by instrumental noise.

Photos: QUIET



The past is the key to the future

With the effects of the current global climate change becoming more apparent every day, there is an immediate interest in improving our ability to predict how species and ecosystems will respond in the near future. Will species’ niches remain stable, and do species track their niche as it moves geographically? Do species assemblies remain constant when the climate in which they occur changes? And how does this vary among major taxa, such as plants, mammals or invertebrates, or perhaps also within such large taxonomic groups? These are just a few examples of the many questions scientists are dealing with, and they reflect the different scales at which research is needed.

In the not too distant past the Earth experienced a series of very strong climatic fluctuations, referred to as the Quaternary ice ages or glaciations. Major glacial (cold) and interglacial (warm) stages oscillated, first in 41-thousand-year and later in 100-thousand-year cycles. By studying the changes in species’ distributions throughout those periods we can improve our understanding of species’ and ecosystem responses to climatic change by answering questions such as the ones mentioned above, and perhaps increase our ability to predict the responses to the current climate changes.

From fossils to sedimentary ancient DNA (sedaDNA)

Scientists have long relied on the finding of fossils and pollen to reconstruct past species distributions, but this changed in 2003 when Prof. Eske Willerslev from the Centre for GeoGenetics in Copenhagen pioneered the retrieval of ancient DNA preserved in sediment (Willerslev et al. 2003). By extracting DNA from 10-thousand to 400-thousand-year-old permafrost cores from Siberia, Willerslev and colleagues were able to identify animal and plant taxa that had once lived in the area. The DNA of these taxa is preserved without the presence of any macrofossils. Instead, the DNA is thought to be bound to soil particles which (at least partly) protect it from enzymatic and microbial degradation. Due to the stratigraphic deposition of sediments it is furthermore possible to correlate the presence of specific taxa to environmental and climatic variables, allowing for more in-depth analyses of the prevailing ecosystems.

At the Natural History Museum, University of Oslo, we are working on the “BarFrost” project (Barcoding of Permafrost), in which we aim to reconstruct past ecosystems in the Arctic by identifying species from DNA sequences (a technique known as barcoding) retrieved from permafrost cores. Using new bioinformatic and sequencing tools, we design novel barcoding markers to detect and identify a wide range of taxa (bryophytes, mammals, birds, fungi, insects, springtails) in more than 600 permafrost samples from throughout the Arctic, ranging in age from 10 to several hundred thousand years. We are closely linked to the EU funded Ecochange project in which, among other things, large-scale reconstructions of vascular plant diversity throughout the Arctic are made using similar techniques. We collaborate closely with the Laboratoire d’Ecologie Alpine (LECA) in France and the Centre for GeoGenetics in Denmark.

The main technical steps carried out in our project are summarized in Figure 1. First, the permafrost cores are sampled in the field; second, total DNA is extracted from the cores in an ancient DNA laboratory; third, a PCR is performed to amplify the target DNA; fourth, the amplified DNA is sequenced using new sequencing technologies; finally, the sequences are analysed and compared with a reference database for taxonomic identification.

Working with ancient DNA

Retrieving authentic DNA from ancient samples is challenging. The DNA is typically highly degraded and present in only very low quantities. As a result, there is a high risk of contamination with externally derived DNA. Common sources of contaminants include human cells (from people handling the samples), microorganisms, airborne pollen, domesticated animals and/or cultivated plants. To minimize the contamination risk a range of measures is put in place, including performing all laboratory work in a dedicated ancient DNA facility. At the Natural History Museum such a laboratory was opened in November 2010 (http://www.nhm.uio.no/english/about-nhm/infrastructure/adna/).

All air entering the laboratory is HEPA filtered and a positive air pressure is maintained to avoid air flowing in from the corridor. All rooms are exposed to a high dose of UV radiation each night, and extensive cleaning protocols are in place. Researchers working in the laboratory dress in a full body suit and wear a facemask to avoid any of their skin cells or hairs getting into the samples. Finally, and perhaps most importantly, all researchers using the laboratory undergo extensive training in order to become familiar with the procedures required for working with old samples.

Barcoding markers for environmental ancient DNA

DNA barcoding refers to the identification of organisms (ideally to the level of species) using a short DNA sequence from a standardized and agreed-upon position in the genome. A barcoding marker thus consists of a variable, diagnostic DNA sequence, flanked by conserved stretches of DNA on which the primers can bind for DNA amplification (Figure 1). Due to the nature of environmental ancient DNA (highly degraded and containing a mixture of species) the design of suitable barcoding markers is greatly constrained, and so-called “standard barcoding markers” that are used by initiatives such as the Barcode of Life Data Systems (BOLD) are not suitable. Instead, we need markers that 1) amplify a very short variable region in the genome (<200 base pairs), and 2) are extremely robust, to allow reliable amplification of a large taxonomic group while at the same time NOT amplifying any non-target species.

To design suitable barcoding markers we make use of two programs that have been developed by our collaborators at LECA: ecoPrimer and ecoPCR (Ficetola et al. 2010). The first program, ecoPrimer, detects conserved regions within a full set of DNA sequences to design PCR primers. At the same time, the program evaluates the quality of the variable, diagnostic stretch of DNA in between the conserved regions. It is possible to constrain a range of variables such as the minimum and maximum length of the barcode, the minimum proportion of target taxa that the primers should bind to, and the maximum proportion of non-target taxa that the primers may bind to.

AUTHORS

Sanne Boessenkool, Postdoctoral Fellow
National Centre for Biosystematics, Natural History Museum, University of Oslo

Laura Epp, Postdoctoral Fellow
National Centre for Biosystematics, Natural History Museum, University of Oslo

Eva Bellemain, Postdoctoral Fellow
National Centre for Biosystematics, Natural History Museum, University of Oslo

Barcoding ancient DNA preserved in Arctic permafrost

© Crestock

Spinning samples in the centrifuge in the ancient DNA lab.

Photo: Sanne Boessenkool



The first step in our analyses includes filtering of the data: each sequence has to be sorted according to the unique tag that was added in the PCR. With this tag, sequences can be assigned to the original permafrost sample from which they were amplified. Following filtering, artefact sequences are removed and further error checks are performed. The final goal of our analyses is to determine the taxonomic identity of each of our sequences. Several methods have been developed for such a task, using a diversity of algorithms based on BLAST, phylogenetic inference, the length of the longest common subsequence, or yet other measures. Which method performs best is currently unknown, and likely depends on the taxonomic diversity of the group that is being studied.
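As a toy illustration of the first filtering step (a sketch in Python under the assumption that the 10-base tag sits at the start of each read; the project itself uses scripts from the OBITools package), reads can be sorted back to their sample and the tag stripped off:

from collections import defaultdict

TAG_LENGTH = 10  # length of the sample-specific tag added in the PCR

def demultiplex(reads, tag_to_sample):
    """Assign each read to its permafrost sample via the leading tag,
    stripping the tag; reads with unrecognized tags are discarded."""
    by_sample = defaultdict(list)
    for read in reads:
        tag, insert = read[:TAG_LENGTH], read[TAG_LENGTH:]
        sample = tag_to_sample.get(tag)
        if sample is not None:
            by_sample[sample].append(insert)
    return by_sample

# Hypothetical tags and reads, for illustration only
tags = {"ACGTACGTAC": "core_A_2.3m", "TTGGCCAATT": "core_B_5.1m"}
reads = ["ACGTACGTACGGATTAGATACCCTGGTAGT",
         "TTGGCCAATTGGATTAGATACCCTGGTAGT"]
counts = {sample: len(seqs) for sample, seqs in demultiplex(reads, tags).items()}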

Determining the taxonomic identity of the sequences is strongly dependent on the availability of good reference databases. In the Ecochange project such a database was developed for the vascular plants of the Arctic (Sønstebø et al. 2010), and within BarFrost we are developing an additional database for Arctic and boreal bryophytes. We work closely with taxonomists specialised in the respective groups, who provide lists of the abundant and/or ecologically important species that have to be included and also help to obtain specimens from which DNA can be sequenced for the reference database. Nevertheless,

creating such databases is not necessary and not feasible for each taxonomic group that we work with. For relatively small groups such as the mammals there is no immediate need for a completely new database, because most species that we can encounter in the permafrost have already been sequenced reliably and these sequences are available in the public domain. For other, extremely diverse groups, such as the fungi, creating a complete reference database is such a large task that it is beyond the possibilities of our project. Instead, fungi sequences are identified to MOTUs (Molecular Operational Taxonomic Units), which means that in practice permafrost sequences that do not match a single taxon in the public database but actually correspond to several taxa are identified to the level of the last common taxon.
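The "last common taxon" idea can be illustrated with a small sketch (hypothetical lineages, not the actual assignment software): walk down the taxonomic lineages of all database hits for a sequence and keep the deepest rank on which they all agree:

def last_common_taxon(lineages):
    """Each lineage is a list ordered from kingdom down to species.
    Returns the deepest taxon shared by every hit, i.e. the level
    the permafrost sequence is assigned to."""
    common = None
    for ranks in zip(*lineages):              # compare the hits rank by rank
        if all(name == ranks[0] for name in ranks):
            common = ranks[0]
        else:
            break
    return common

hits = [
    ["Fungi", "Basidiomycota", "Agaricomycetes", "Russulales"],
    ["Fungi", "Basidiomycota", "Agaricomycetes", "Boletales"],
]
assignment = last_common_taxon(hits)          # -> "Agaricomycetes"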

Current status

First results from the Ecochange project have shown that genetic barcoding of permafrost samples is a promising new tool for palaeoecological reconstructions of Arctic vascular plant diversity (Sønstebø et al. 2010). The taxonomic diversity that can be detected is high and the overall resolution obtained is considerably improved compared to traditional fossil pollen analyses. We have now explored this method for other taxonomic groups, by designing new barcoding markers and testing whether these taxa can be amplified from ancient permafrost sediments from the Arctic. Results from these experiments show that success rates are highly variable among taxonomic groups. Confirming earlier studies (Willerslev et al. 2003; Lydolph et al. 2005; Haile et al. 2009), DNA from mammalian species such as woolly mammoth, bison and moose can be reliably amplified, and a diversity of fungi sequences, including ectomycorrhizal fungi, wood decayers and soil saprotrophs, can be detected. For other taxa the situation is more challenging, because amplification success is much lower and more variable. It seems that DNA preservation in permafrost sediments is strongly taxon dependent, perhaps relating to factors such as density, ecology and/or physiology of the different species.

Future perspectives

Comparing the data obtained for mammals and fungi to the vascular plant DNA will allow more comprehensive investigations of past ecosystems than has been possible up to now, and provide the data required to tackle more in-depth questions related to changes in species distributions over time. For other taxa we are currently investigating the possibilities as well as the limitations of our methods and evaluating the most promising approaches. This young field of sedimentary ancient DNA still has to deal with many challenges in the laboratory as well as analytically, but the data that we are obtaining are providing exciting new insights into palaeoecology.

References

Ficetola, G.F., Coissac, E., Zundel, S., Riaz, T., Shehzad, W., Bessière, J., Taberlet, P., Pompanon, F. 2010. An In Silico approach for the evaluation of DNA barcodes. BMC Genomics, 11:434.

Haile, J., Froese, D.G., MacPhee, R.D.E., Roberts, R.G., Arnold, L.J., Reyes, A.V., Rasmussen, M., Nielsen, R., Brook, B.W., Robinson, S., Demuro, M., Gilbert, T.P., Munch, K., Austin, J.J., Cooper, A., Barnes, I., Möller, P., Willerslev, E. 2009. Ancient DNA reveals late survival of mammoth and horse in interior Alaska. Proceedings of the National Academy of Sciences, 106 (52), 22363–22368.

Sønstebø, J.H., Gielly, L., Brysting, A.K., Elven, R., Edwards, M., Haile, J., Willerslev, E., Coissac, E., Rioux, D., Sannier, J., Taberlet, P., Brochmann, C. 2010. Using next-generation sequencing for molecular reconstruction of past Arctic vegetation and climate. Molecular Ecology Resources, 10 (6), 1009-1018.

Lydolph, M.C., Jacobsen, J., Arctander, P., Gilbert, T.P., Gilichinsky, D., Hansen, A.J., Willerslev, E., Lange, L. 2005. Beringian Paleoecology Inferred from Permafrost-Preserved Fungal DNA. Applied and Environmental Microbiology, 1012-1017.

Willerslev, E., Hansen, A.J., Binladen, J., Brand, T.B., Gilbert, T.P., Shapiro, B., Bunce, M., Wiuf, C., Gilichinsky, D., Cooper, A. 2003. Diverse Plant and Animal Genetic Records from Holocene and Pleistocene Sediments. Science, 300, 791-795.

The primer pairs that are identified by ecoPrimer are subjected to a more in-depth evaluation, both by visually analyzing a sequence alignment and by running an electronic or in silico PCR in the second program, ecoPCR. This electronic PCR can be performed on very large databases such as the complete GenBank or EMBL, providing a complete list of all the DNA sequences that would, in theory, be amplified by the primer pair.
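The idea behind such an electronic PCR can be sketched in a few lines of Python (a simplified stand-in, not ecoPCR itself, with made-up parameter values): scan a reference sequence for the forward primer and for the reverse complement of the reverse primer, allow a couple of mismatches, and accept a hit only if the predicted amplicon length falls within the allowed range:

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def binds(primer, window, max_mismatch=2):
    """True if the primer anneals to the window with at most max_mismatch mismatches."""
    return len(primer) == len(window) and \
        sum(a != b for a, b in zip(primer, window)) <= max_mismatch

def in_silico_pcr(seq, fwd, rev, min_len=50, max_len=200):
    """Return the first predicted amplicon (start, end) in seq, or None.
    rev is given 5'->3' on the opposite strand, so its reverse complement
    is searched for on this strand, downstream of the forward primer."""
    rev_rc = rev.translate(COMPLEMENT)[::-1]
    for i in range(len(seq) - len(fwd) + 1):
        if not binds(fwd, seq[i:i + len(fwd)]):
            continue
        for j in range(i + len(fwd), len(seq) - len(rev_rc) + 1):
            if binds(rev_rc, seq[j:j + len(rev_rc)]):
                length = j + len(rev_rc) - i
                if min_len <= length <= max_len:
                    return i, j + len(rev_rc)
    return None

Run over every entry of a reference database, a filter like this yields the list of taxa a candidate primer pair would amplify, which is the information used to judge how taxon-specific a marker is.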

Using these bioinformatics tools, we have designed barcoding markers suitable for the amplification of DNA preserved in degraded environmental samples. These markers target several groups of organisms such as birds, fungi, bryophytes and beetles, with more groups being added in the future. All markers are tested and optimized in the laboratory to ensure optimal performance. The next step in the process is to test whether these taxa can also be amplified from the ancient permafrost samples.

Sequencing DNA preserved in permafrost

Total DNA is extracted from the permafrost sediment samples in the dedicated ancient DNA facility. PCR reactions with the different barcoding markers are set up in the same facility, but the actual PCR is then performed in our modern laboratories to avoid any back-contamination from the PCR product to the original DNA extract. We visualise the PCR products on agarose gels to check whether any DNA was successfully amplified. Depending on the barcoding marker that is used, we typically see success rates between 5 and 90%, but due to the low concentration of DNA in our original samples the success can be extremely variable.

Positive PCR products are sequenced on the Roche 454 GS FLX Titanium at the Norwegian High Throughput Sequencing Centre. This “next-generation” sequencing technology is based on a novel fibre-optic slide of individual wells and is capable of massively parallel DNA sequencing, providing an unprecedented rate of output data. To be able to sequence multiple PCR products at the same time we make use of a multiplexing system where each PCR product contains a 10 base-pair unique tag. The tagging system together with the enormous output from the 454 has meant a serious breakthrough for studies using environmental, multi-species samples such as permafrost sediments. Previously, each PCR product had to be cloned and each clone had to be sequenced individually. This time consuming and costly process only allowed the analysis of a limited number of clones (e.g. 20) per sample. Now, we have thousands of sequencing reads per sample, which can be compared to thousands of clones! Thanks to this change in technology we are now able to get a more in-depth and complete view of the diversity in our samples.

Analysing the sequencing output

The enormous output of the new sequencing technology has led to new analytical challenges. The old way of visually inspecting each sequence individually is no longer feasible. Instead, automated bioinformatic approaches are needed that can reliably process large numbers of sequences at a time. Whereas previously the biologist was usually capable of analysing his or her own sequencing data, we now find ourselves increasingly dependent on bioinformaticians who are computationally skilled while, at the same time, able to understand the biological side of the questions we are asking. For many applications there are no ready-made programs available to analyse the data, and new, personalised scripts have to be developed. In the BarFrost project we work closely with bioinformaticians at LECA who have developed a set of scripts, implemented in the OBITools package, that are applicable to the data we generate.

Extracting DNA in the hood in the ancient DNA lab.

Figure 1: From permafrost sampling to identification of past species assemblages.

Acknowledgments
We would like to thank our colleagues Christian Brochmann and Galina Gusarova, and our collaborators Eske Willerslev, Pierre Taberlet, Eric Coissac, Mary Edwards, James Haile, Håvard Kauserud, Hans Stenøien, Vladimir Gusarov and Arild Johnsen. The BarFrost project is supported by the Research Council of Norway under grant no. 191627/V40.

© Crestock. Photo: Sanne Boessenkool



Such statistics are based on computations of the decay of the spilled substance (e.g. due to evaporation), and its advection by ocean surface currents. In this article, the consequences of applying different algorithms for the generation of trajectory statistics are discussed. The implication of the discussion below is that in order to generate ocean circulation results from which reliable trajectory statistics can emerge, the requirement for computational resources can presently only be met by high performance computing facilities like NOTUR.

Traditionally, trajectory statistics have been computed based on an assumption of a deterministic relationship between winds and surface currents, usually in the form of currents being set to a given fraction of the wind speed (e.g. 0.03), directed at a specified angular offset from the wind direction (typically 15º to the right in Norwegian waters). The resulting current may be superimposed on a stationary or seasonally varying large-scale ocean current climatology. A consequence of this algorithm is that the spatial scales of the surface currents in the ocean become slaved to the horizontal scales of the atmospheric circulation.
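In code, that traditional rule reduces to a few lines; the sketch below (in Python, with the parameter values taken from the text and a simple forward-Euler step that is an assumption, not the EIA tools' actual scheme) scales the wind vector, rotates it to the right, adds a climatological background current and advects a particle:

import numpy as np

def wind_driven_current(wind_uv, factor=0.03, deflection_deg=-15.0):
    """Deterministic rule: surface current = 3% of the wind vector,
    rotated 15 degrees to the right of the wind (a negative angle is
    clockwise in the usual x/east, y/north convention)."""
    theta = np.radians(deflection_deg)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])
    return factor * rotation @ np.asarray(wind_uv)

def advect(position, wind_uv, climatology_uv, dt):
    """Advance a drifting particle one time step (metres, m/s, seconds)."""
    current = wind_driven_current(wind_uv) + np.asarray(climatology_uv)
    return np.asarray(position) + current * dt

# One hour of drift in a 10 m/s westerly wind over a weak northward climatological current
new_position = advect([0.0, 0.0], [10.0, 0.0], [0.0, 0.1], dt=3600.0)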

Observations suggesting that variability is present on much finer scales in the ocean date back more than a century, to Nansen and Helland-Hansen (1909), who found “puzzling waves” in their observations of temperature and salinity in the Norwegian Sea. The theoretical foundation for such waves, derived some decades later (e.g. Eady, 1949), reveals that the scales depend on the stratification of the fluid in question. Due to differences in their properties, internal instabilities in the ocean will have much smaller spatial scales than their corresponding features in the atmosphere. Nevertheless, it was not until the era of satellite imagery, starting in the 1980s, that the vitality of the ocean circulation on these scales was confirmed observationally.

To be more specific, these atmospheric features, which include pressure systems (highs and lows), are much larger than their oceanic counterparts, meanders and eddies. The panels in Figure 1 clearly demonstrate the difference in scales, as we note that the region shown in the bottom panel, in which oceanic features are displayed, corresponds to the blue region in the top panel.

Due to the contrasting scales in the ocean and the atmosphere, the traditional approach for producing ocean trajectory statistics is flawed. In any given ocean region, eddy features outnumber atmospheric pressure systems by several orders of magnitude. Thus, the variability of pathways for drifting objects in the ocean is severely underestimated by the traditional approach. Nevertheless, this approach has been applied in most EIAs for Norwegian waters.

One of the reasons for this shortcoming is that until recently, limits in available computer resources have made simulations of the ocean circulation with the required high spatial resolution impossible for anything but short periods or small geographical domains. Also, relevant observations, mainly trajectories of ocean surface drifters, have been much too sparse for the construction of reliable statistics. To the author's knowledge, the first attempt at including eddy scales in trajectory statistics in an examination of oil drift in Norwegian waters was made by Johansen et al. (2003; chapter 8).

Eddies and meanders in the ocean result from stochastic processes (instabilities), so a given external forcing, like momentum from surface winds, can lead to very

As part of the preparations for exploration of off-shore petroleum resources in Norwegian waters, an Environmental Impact Assessment (EIA) is produced, in which the risk of various types of accidents and their potential impacts are described. There is a variety of such impacts, ranging from effects on marine birds and mammals, to consequences for fish eggs and larvae, and beaching of oil. An assessment of all of these effects requires spatial-statistical descriptions of the mass budget of the spilled petroleum.

AUTHOR

Arne Melsom
Senior Scientist, Section Oceanography, Norwegian Meteorological Institute

Statistics of trajectories from ocean currents

different responses in the ocean circulation. This is illustrated in Figure 2, where a set of three streamline plots for the ocean surface currents is displayed for the same day. As far as the ocean circulation model is concerned, each of the three situations is realistic. The only difference between the experiments that led to these results is small perturbations in the atmospheric forcing. As a consequence, in a given region there are distinct ocean trajectory statistics for each passing atmospheric pressure system.

The trajectory statistics from the traditional method, with a deterministic response in the ocean to atmospheric forcing, will converge when the weather statistics become saturated. But from the discussion above, it can be concluded that when stochastic smaller-scale features in the ocean are taken into account, we must expect the statistics to converge more slowly, perhaps much more so.

The assessment of impacts in the Lofoten-Barents Sea region (“Utredning av konsekvenser av helårig petroleumsvirksomhet i området Lofoten – Barentshavet”; ULB) includes trajectory statistics using the traditional approach (Rudberg, 2003) with 40 years of atmospheric forcing. This is supplemented by statistics that are based on simulations which include stochastic features (Johansen et al., 2003). However, the latter investigation is carried out with ‘single realization’ results from two separate one-year periods only, which is obviously much too little for the statistics to converge. A ‘single realization’ result in this context is a single representation of the stochastic ocean circulation statistics that arises from prescribed atmospheric forcing.

The problem with this approach in an examination of effects from stochastic processes is illustrated in Figure 3. This depiction is made on the basis of a 10-member ensemble, and shows results for minimum drift time from a given location (the black dot in the center/left). The simulation area is divided into 4 km x 4 km grid cells. First, we determine the minimum drift time to each of these cells, in each ensemble member. This leaves us with ten values for each cell (one from each member).

© Crestock

Figure 1: Scales in the atmosphere and in the ocean. The red lines in the top panel are isobars at the surface, which is a representation of the streamlines of the atmospheric circulation. Correspondingly, the blue lines in the bottom panel are equi-lines for sea surface height, which can be interpreted as streamlines of the ocean circulation. The area in blue in the top panel corresponds to the region that is displayed in the bottom panel (the full domain of the ocean model is much larger).

Figure 2: Equi-lines for sea surface height in the Lofoten region for the same date, from three different ensemble members. Note particularly the differences in meandering patterns along the Norwegian Atlantic Current (where the streamlines are densest).



Next, we select the lowest value in each cell, giving us a representation of the minimum drift time when the stochastic statistics are represented by our modestly sized ensemble. This field is displayed by the shades of gray in Figure 3. Finally, we form a ‘slowest member’ product by selecting the highest value in each grid cell. The contour line for 5 days’ drift from this product is also shown in Figure 3. We note that generally, the 5 day contour line from the former product based on stochastic statistics is close to the 10 day contour for the ‘slowest member’ product.
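Once the per-member minimum drift times are on a common grid, the two products described above are simple element-wise reductions over the ensemble; a minimal sketch (in Python, assuming the fields are stacked in an array of shape (member, y, x) with unreached cells set to infinity):

import numpy as np

def ensemble_drift_products(drift_time):
    """drift_time: array of shape (n_members, ny, nx) holding the minimum
    drift time in days from the release point to each 4 km x 4 km cell,
    one field per ensemble member; np.inf marks cells never reached."""
    fastest = drift_time.min(axis=0)   # ensemble minimum: fastest drift found in any member
    slowest = drift_time.max(axis=0)   # 'slowest member' product: largest of the per-member minima
    return fastest, slowest

# Example: the area reachable within 5 days when stochastic variability is included
# reachable_5d = ensemble_drift_products(drift_time)[0] <= 5.0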

The implication of this discovery is that ‘single realization’ statistics for minimum drift time by ocean currents in the Lofoten region, as investigated by Johansen et al. (2003), may misrepresent the statistics by up to a factor of 2, since stochastic processes affect the outcome.

A ‘single realization’ approach may be valid if the investigation spans a period where similar weather patterns occur multiple times. Thus, the investigation by Lisæther (2010), which covers a 7 year period with simulation results that include stochastic ocean processes, is a step in the right direction. Nevertheless, 7 years is still not sufficient to describe the atmospheric variability, and much less so the accompanying ocean circulation variability.

An important objective in the on-going projects on which this article is based is to examine how long a time series of results is needed for the drift statistics to converge in the presence of stochastic oceanic features. From a previous project, a 10 year period with 10 ensemble members at 4 km x 4 km resolution is available. Presently, we are in the process of performing a ‘single realization’ experiment which will cover about 50 years of forcing by atmospheric reanalysis fields. None of these ocean circulation experiments can presently be produced without access to a high performance computing facility, in our case NOTUR.

References

Eady, E. (1949) Long waves and cyclone waves. Tellus, 1, 33-52.

Helland-Hansen, B. and Nansen, F. (1909) The Norwegian Sea — its Physical Oceanography. Det Mallingske Bogtrykkeri, Kristiania.

Johansen, Ø., Skognes, K., Aspholm, O.Ø., Østby, C., Moe, K.A., Fossum, P. (2003) Uhellsutslipp av olje — konsekvenser i vannsøylen (ULB 7-c). Foundation for Scientific and Industrial Research at the Norwegian Institute of Technology (SINTEF), Trondheim, 72 pp. (Available online from http://www.regjeringen.no/upload/kilde/oed/rap/2003/0004/ddd/pdfv/181848-7c_miljorisiko_uhellsutslipp_i_vannsoylen-sluttrapport.pdf)

Lisæther, K. (2010) Oppdatering av faglig grunnlag for forvaltningsplanen for Barentshavet og områdene utenfor Lofoten (HFB). StormDrift Report 2010-01, StormGeo (Available online from http://www.regjeringen.no/Upload/MD/vedlegg/hav_vannforvaltning/Forvaltningsplanen_Barentshavet/rapporter/storm_oljedirftsmodellering.pdf)

Melsom, A., and Y. Gusdal (2010) A new method for assessing impacts of potential oil spills. In: Coastal to global operational oceanography: Achievements and Challenges. Eds.: Dahlin, Bell, Flemming, Petersson. EuroGOOS publication no. 28, SMHI, Norrköping, Sweden. pp. 531-535.

Rudberg, A. (2003) Oljedriftsmodellering i Lofoten og Barentshavet; spredning av olje ved akutte utslipp til sjø (ULB 7-a). Report 2003-0385, Det Norske Veritas (Available online from http://www.regjeringen.no/upload/kilde/oed/rap/2003/0004/ddd/pdfv/174412-oljedriftsmodellering.pdf)

Pore scale simulations of CO2 sequestration

Figure 3: The contour line shows an estimated perimeter of the region that may be impacted after 5 days’ drift from the position of the full black circle. Gray shades display the minimum drift time based on the ensemble member with slowest drift. Numbers on the scale are drift time in days, and the examination was made for a 75-day period. The position of the origin has been chosen randomly. (Taken from Melsom and Gusdal, 2010).

Acknowledgments
The projects which support the activities presented here are the NFR projects “Long-term Effects of Oil accidents on the pelagic ecosystem of the Norwegian and Barents Seas” (LEO) and “Spatiotemporal variability in mortality and growth of fish larvae and zooplankton in the Lofoten-Barents Sea ecosystem” (SVIM), and the EU project “Exploring the potential for probabilistic forecasting in MyOcean” (ProbaCast). All of these projects use and depend on NOTUR resources.


There is a broad political and scientific consensus that anthropogenic carbon emissions to the atmosphere should be reduced to limit global warming. One approach to reach this goal is the geological sequestration of CO2. There are already several pilot plants that aim to capture CO2 emitted from the burning of hydrocarbons. This captured CO2 must subsequently be transported and safely stored in a geological formation from where it cannot be released into the atmosphere for a

Geological storage of CO2 is considered to be a crucial factor in reducing the emissions of carbon to the atmosphere. Large reservoirs will have to hold millions of tons of CO2 for thousands of years. However, in these massive scale contexts it is the gas and liquid flow that takes place in tiny pores and ducts within the reservoir rocks that ultimately controls the storage capacity. The complex geometry and fluid physics in the pore space create challenges when it comes to numerical simulations.

predefined period of time. Aquifers and petroleum reservoirs on the Norwegian Continental Shelf are evaluated for such storage.

To further evaluate the geological storage capabilities for CO2 in hydrocarbon reser-voirs and saline aquifers, it is necessary to use large scale reservoir models to inter-pret the multiphase flow in the reservoirs. However, the fluid transport in reservoirs can be considered on many length scales,

Figure 1: Schematic illustration of how constitutive relations of reservoir rocks are included in large scale reservoir modelling.

AUTHORS

Håkon Rueslåtten
Senior Geology Advisor, Numerical Rocks AS

Thomas Ramstad
Research Scientist, Numerical Rocks AS




Hence, these large scale reservoir simulations are totally reliant upon the input of a sufficient amount of good constitutive physical parameters for the various rock types present in the reservoir (e.g. relative permeability and capillary pressure). The inclusion of such data in large scale reservoir simulators is illustrated in Figure 1.
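For readers who want the equations behind this, the constitutive relations typically enter a two-phase extension of Darcy's law; the form below is the generic textbook version, not necessarily the exact formulation used in the authors' reservoir simulator:

```latex
% Two-phase extension of Darcy's law (generic form). k = absolute permeability,
% k_{r,alpha}(S_w) = relative permeability, mu_alpha = viscosity of phase alpha,
% P_c = capillary pressure, S = saturation.
\begin{align*}
  \mathbf{u}_{\alpha} &= -\frac{k\, k_{r\alpha}(S_w)}{\mu_{\alpha}}
      \left( \nabla p_{\alpha} - \rho_{\alpha}\,\mathbf{g} \right),
      \qquad \alpha \in \{\mathrm{CO_2},\ \mathrm{water}\}, \\[2pt]
  p_{\mathrm{CO_2}} - p_{\mathrm{water}} &= P_c(S_w),
      \qquad S_{\mathrm{CO_2}} + S_w = 1 .
\end{align*}
```

The pore scale simulations described below are used to provide curves such as the relative permeabilities and capillary pressure as functions of saturation, which the reservoir-scale model then consumes.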

Traditionally, these constitutive parameters are measured experimentally on core samples taken from the wells, but experiments of this type are generally time-consuming and challenging to perform. As a consequence, experimental data are often sparse.

An alternative approach is therefore to simulate the behaviour of the fluids confined in the pore system by applying advanced computer modelling to digital images of the rocks. In addition to being faster, computer-aided analysis provides much information beyond laboratory measurements and hence enriches already existing experimental data. This has been made possible by improved imaging techniques and a rapid increase in computational power.

In the particular case of CO2 storage, good computer simulations of pore scale flow can give fast and reliable information about how well a potential reservoir rock can trap and store CO2.

Model building
A consequence of the scarcity of experimental data included in the simulation model of a potential reservoir for CO2 storage is that single data points are assumed to represent large volumes of the reservoir. These assumptions lead to an over-simplification of the reservoir model and cause major uncertainty in the simulations of reservoir behaviour.

Pore scale flow relations vary significantly throughout a reservoir depending on local pore structure, interactions between the fluid phases and fluid-rock interactions. These aspects add challenges to numerical simulations of pore scale transport. Rock micro-structures show very complex geometries, and fluid phases give rise to surface effects and local capillary barriers. Predictive pore scale modelling is hence reliant upon good digital models of the rock microstructure.

Digital rock models can be reconstructed from high-resolution electron microscope images of rock thin sections (Figure 2), or alternatively imaged directly by high-resolution X-ray micro-tomography. Once a good and representative model of a porous rock is obtained, the actual flow simulations can start.
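As a rough illustration of the first step only: a grayscale image volume can be segmented into pore and solid voxels and its porosity estimated. The sketch below is a toy example with an invented threshold and synthetic data; real reconstruction workflows (such as those in e-Core) are far more elaborate.

```python
import numpy as np

def segment_pore_space(image, threshold):
    """Binarize a grayscale micro-CT/SEM volume: True = pore, False = solid.

    'threshold' is purely illustrative; in practice it would be chosen per
    sample, e.g. by Otsu's method or calibration against lab porosity.
    """
    return image < threshold  # darker voxels treated as pore space here

def porosity(pore_mask):
    """Fraction of voxels classified as pore space."""
    return pore_mask.mean()

if __name__ == "__main__":
    # Synthetic stand-in for a real image slice.
    rng = np.random.default_rng(0)
    fake_slice = rng.integers(0, 256, size=(256, 256))
    pores = segment_pore_space(fake_slice, threshold=90)
    print(f"Estimated porosity: {porosity(pores):.2f}")
```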

Flow simulation method
Simulations of gases like CO2 and liquids such as oil and water directly on digital models of porous rocks require sophisticated numerical algorithms and large computational resources. However, with this approach there is no need for major simplifications of the pore microstructure, and the CO2 migration can be visualized directly, giving insight into the dynamic displacement processes.

The method used for our simulations is a version of the lattice Boltzmann algorithm. It utilizes a kinetic collision scheme between local fluid phases and belongs to the class of mesoscopic simulation techniques – between microscopic and macroscopic. Because of this, and unlike most traditional techniques for computational fluid dynamics, the lattice Boltzmann method is capable of capturing microscopic effects while at the same time reproducing macroscopic behaviour. All relevant fluid properties, like viscosity, density and surface tension, can also be properly included. Snapshots of flow simulations directly on a sandstone model are shown in Figure 3.
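For illustration, a minimal single-phase D2Q9 lattice Boltzmann update (BGK collision, streaming, and bounce-back at solid voxels) can be sketched as below. This is a didactic stand-in, not the authors' multiphase code; the geometry, forcing and parameter values are arbitrary.

```python
import numpy as np

# D2Q9 lattice: discrete velocities, weights, and opposite directions (for bounce-back).
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)
opp = np.array([0, 3, 4, 1, 2, 7, 8, 5, 6])

def equilibrium(rho, ux, uy):
    """Second-order Maxwell-Boltzmann equilibrium distribution for D2Q9."""
    cu = 3.0 * (c[:, 0, None, None] * ux + c[:, 1, None, None] * uy)
    usq = 1.5 * (ux**2 + uy**2)
    return w[:, None, None] * rho * (1.0 + cu + 0.5 * cu**2 - usq)

def lbm_step(f, solid, tau, force=(1e-5, 0.0)):
    """One collide-and-stream update with full-way bounce-back at solid nodes."""
    rho = f.sum(axis=0)
    ux = (f * c[:, 0, None, None]).sum(axis=0) / rho
    uy = (f * c[:, 1, None, None]).sum(axis=0) / rho
    # BGK collision: relax toward the local equilibrium.
    f += -(f - equilibrium(rho, ux, uy)) / tau
    # Very simple body-force term driving the flow (valid for small forces only).
    f += 3.0 * w[:, None, None] * (c[:, 0, None, None] * force[0] +
                                   c[:, 1, None, None] * force[1])
    # Streaming: shift each population along its lattice velocity (periodic box).
    for i in range(9):
        f[i] = np.roll(np.roll(f[i], c[i, 0], axis=0), c[i, 1], axis=1)
    # Bounce-back: reverse populations that ended up inside solid voxels.
    f[:, solid] = f[opp][:, solid]
    return f

# Toy "porous medium": a periodic box with one solid obstacle.
nx, ny = 64, 64
solid = np.zeros((nx, ny), dtype=bool)
solid[20:30, 20:30] = True
f = equilibrium(np.ones((nx, ny)), np.zeros((nx, ny)), np.zeros((nx, ny)))
for _ in range(500):
    f = lbm_step(f, solid, tau=0.8)
```

Multiphase variants add an interaction or colour-gradient term to the collision step so that surface tension and wetting emerge at the fluid-fluid interfaces, which is what makes the trapping behaviour discussed below possible to simulate.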

Even though the method is well suited for modelling flow in porous media, it is computationally heavy, and parallelization of the programs is necessary in order to obtain satisfactory model resolution. The parallelization has been done using MPI, and the simulations have been run on the NOTUR HPC facilities.
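The article does not describe the decomposition in detail, but a typical MPI pattern for a lattice code is to split the grid into slabs and exchange ghost layers every time step. The mpi4py sketch below (sizes and variable names invented) shows only that communication step, not the authors' actual implementation.

```python
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns a slab of the lattice plus one ghost row on each side.
nx_local, ny = 64, 64
field = np.full((nx_local + 2, ny), float(rank))  # interior rows: 1 .. nx_local

left = (rank - 1) % size    # periodic neighbours in a 1-D decomposition
right = (rank + 1) % size

# Exchange ghost layers: send our edge rows, receive the neighbours' edge rows.
comm.Sendrecv(sendbuf=field[1],  dest=left,  recvbuf=field[-1], source=right)
comm.Sendrecv(sendbuf=field[-2], dest=right, recvbuf=field[0],  source=left)

# field[0] and field[-1] now hold copies of the neighbouring slabs' edges, so a
# local collision-and-streaming step can proceed without further communication.
```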


Figure 2: Scanning Electron Microscopy (SEM) image of a thin section sample from an aquifer of the Utsira formation. Such images represent a starting point for creating a digital model of the rock microstructure.

Figure 4: Two pairs of relative permeability curves (relative permeability vs. effective water saturation, Seff,water) for two different global flow rates, 200 ml/hr and 20 ml/hr. Black curves are for CO2 and red ones are for water. If the flow rate is increased, this has a profound effect on the relative permeability of the CO2.

Figure 3: Simulated injection of CO2 (right) and water (left) in a digital model of a sandstone. The gas is coloured red. In the case of water injection, larger bubbles of CO2 are trapped behind the water front.

This work is partly financed through the CLIMIT program administered by Gassnova. Gassnova is the state enterprise for development linked to Carbon Capture and Storage (CCS) activities.

More info at: www.gassnova.no

Numerical Rocks AS is a company based in Trondheim that has specialized in numerical modelling and simulation of fluid flow in the pore space of reservoir rocks. Through the proprietary software tool e-Core, the entire cycle from creating digital rock models to calculating multiphase flow data is incorporated.

More info at: www.numericalrocks.com

Trapping mechanisms
The most important aspect of geological storage of CO2 is that it must be trapped inside the reservoir for a long period of time, i.e. thousands of years. In that context, a good understanding of the trapping mechanisms for CO2 is of vital importance when it comes to the evaluation of potential reservoirs for sequestration. A key element in pore scale trapping is capillary barriers. These barriers act locally on the interfaces between two immiscible fluids. But even though the effects are local, they can have large effects on the permeability of the entire rock. This effect is quantified through the relative permeability.

If the relative permeability of a fluid phase is low, its ability to flow through the pore space is suppressed by the other fluid, and it eventually becomes trapped. In an oil and gas recovery context, a low relative permeability for these fluids is not optimal. In the case of geological storage of CO2, however, we want the relative permeability of the CO2 to be as low as possible, and hence we need to know the factors that affect it.

It is well established that the relative permeability of a fluid pair depends on the relative strength of capillary forces versus viscous and buoyancy forces, along with the microstructure of the pore space. For low flow rates and strongly immiscible fluids, the capillary forces dominate and smaller ganglia of fluid that occupy pores are trapped. If the flow rate is increased or the surface tension between the fluids decreases, trapped fluid can be mobilized. This again affects the relative permeability of the phases (Figure 4). All in all, these data say a lot about the CO2 storage capacity and how to eventually enhance it.
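The balance between viscous and capillary forces referred to here is commonly summarized by the capillary number; the definition below is the standard one, not a formula quoted from the article:

```latex
% Capillary number: ratio of viscous to capillary (interfacial) forces.
% mu = dynamic viscosity of the invading fluid, v = characteristic flow speed,
% sigma = interfacial tension between the two fluids.
\[
  \mathrm{Ca} = \frac{\mu\, v}{\sigma}
\]
```

A low Ca means capillary-dominated flow and trapped ganglia; increasing the flow rate or lowering the interfacial tension raises Ca and can remobilize trapped CO2, which is the behaviour seen in Figure 4.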

Conclusion and outlook
In this work we have so far been able to run simulations of CO2 and water on various representative models of aquifers that can potentially store CO2. The CPU resources granted by NOTUR have been of great importance for the project, whose ultimate goal is to provide a commercial tool for geological sequestration of CO2.




Automatic Generation of Optimization Code

AUTHORS

Arne Løkketangen
Professor Dr. Scient, Molde University College – Specialized University in Logistics

Roland Olsson
Associate Professor Dr. Scient, Department of Computer Science, Østfold University College

The project is undertaken by Professor Arne Løkketangen at Molde University College and Associate Professor Roland Olsson at Østfold University College. They have known each other for many years, and two years ago they launched the project described here. They both have a computer science background, but their later work has diverged considerably. Olsson has specialized in automatic programming and has developed the ADATE system used in this project, while Løkketangen has specialized in meta-heuristic search methods for combinatorial optimization problems, mostly for problems in the supply chain, and provides the domain area for the project. Their team will also be strengthened by a PhD student, starting this spring (2011).

Most computational problems handled by the Notur clusters seem to be based on vector and matrix calculations. Our project Automatic Generation of Optimization Code is quite different, but still requires a lot of CPU-power.

Our aim in this project is to start with state-of-the-art optimization code, as published in good international journals, and improve the code using the ADATE system. As will be shown later, this is a very CPU-intensive task, but so far very successful. The problems we want to solve are in the class of NP-hard combinatorial optimization problems. The term combinatorial indicates that these problems have components of discrete choice. Well-known examples are the travelling salesman problem (TSP) and the knapsack problem, in many variants. These types of problems are very frequent in a supply chain and involve production and distribution, among other things. Many of these problems are too difficult for exact solution methods to find optimal solutions in reasonable time, so in practice one often has to settle for "good" solutions, and not the provable optimum.

Even small improvements in the quality of solutions can have a large impact on the bottom line in companies. The level of improvement (i.e. cost savings) when using computer-aided optimization methods for e.g. the VRP is typically 15% when compared to manually laid distribution plans. (The VRP – Vehicle Routing Problem – is a generic depot-to-customer distribution problem.)

Local Search based meta-heuristic methods for finding good solutions to such combinatorial optimization problems have attained a lot of success recently. A plethora of methods exist, each with its own successes, and also with its own parameter settings and other method-specific details. At the same time, experience is needed to implement highly competitive code, and some of the experience applied is not easy to quantify. ADATE is a system to automatically generate code based on a set of input-output specifications, and can work in vastly different domains. It generates code in a subset of the programming language ML and works by searching for transformations of purely functional ML programs. We have used computing resources at Notur for the last two years to apply ADATE to improve meta-heuristic search with great success, and are continuing this work.

Code automatically generated by the ADATE system compares with, and surpasses, state-of-the-art handcrafted meta-heuristic optimization code. In our first work along these lines, the programs generated by ADATE targeted the move selection part of a solver for BOOP – Boolean Optimization Problems (see box for article reference). The baseline was a highly successful Tabu Search implementation. The computational results show that the ADATE system is able to generate highly competitive code that produces better solutions to hard BOOP instances within given iteration limits than the previously published Tabu Search implementation. The automatically generated code also gives new insights into the general design of meta-heuristic mechanisms, and contains novel search mechanisms (see box for article reference).

Automatic generation of code is becoming an increasingly bigger field, and much work has been done on this. The focus of our research lies in finding and identifying automatically generated improvements to meta-heuristic search components. In particular, the focus is on the move selection function for a local search meta-heuristic, and on seeing whether automatic programming of this function can lead to improvements in the search meta-heuristics as well as general insights into the design of meta-heuristics. Automatic Design of Algorithms through Evolution (ADATE) (see box for article reference) is a system for general automatic programming in a first order, purely functional subset of Standard ML. ADATE can synthesize recursive programs for standard algorithm design problems such as sorting, searching, string processing and many others. In order to claim that a system performs general automatic programming, it should be able to effectively generate either recursive calls or loops; ADATE naturally uses the former since it is based on functional programming. ADATE can not only generate recursive calls to functions with known signatures, but also invent new recursive help functions as they are needed. It is very interesting to see if and how a system like ADATE can transform state-of-the-art local search based meta-heuristic code mechanisms into something better, and to analyze the resulting code. Also of interest is how the allotted search time (in terms of the number of iterations for the final program) affects the outcome, and how the sequence of program transformations progresses.

There is not much work in the literature on the automatic generation of meta-heuristic optimization code. Fukunaga has made a genetic system for configuring local search heuristics for the Boolean satisfiability testing problem (SAT). The approach used is tailored to the SAT problem, using a rather restricted composition function. Focus is on selecting the right combination of minor heuristic components, mostly concerned with variable selection. This system is able to reproduce most of the popular heuristics for SAT. (It cannot, however, reproduce the heuristic R-Novelty.) Most other approaches, like hyper-heuristics, use high level building blocks such as already functioning meta-heuristics. The ADATE system can also be regarded as a hyper-heuristic. According to Burke, a hyper-heuristic is a search method or learning mechanism for selecting or generating heuristics to solve hard computational search problems. From this definition, there are two distinct sub-categories: heuristic selection and heuristic generation. One might also distinguish between on-line and off-line learning depending on the feedback mechanisms applied. In this framework, the ADATE system can be defined as an off-line, heuristic generating, learning hyper-heuristic. The aim is to present the system with a set of training instances to solve, and hope that the generated program can generalize to solve previously unknown instances with a similar quality. ADATE is thus classified in the same category as genetic programming (GP), even though there are many aspects that set it apart from mainstream GP. Good examples of GP can be found in [Tay and Ho (2008)] and [Geiger et al. (2006)], generating dispatching rules for scheduling, where the dispatching rules are evolved based on measurable scheduling primitives, like due-date or processing time for a job, and simple algebraic functions on these. The solutions (i.e. programs) are represented as syntax trees, which are then combined in a GA fashion using cross-over and mutation. A similar approach is to base a hyper-heuristic solver construction on components of known heuristics, using a grammar to define the relations between these components. The solutions (i.e. programs) are again represented as syntax trees.

Contrasting this, ADATE also has a pool of tentative programs, written in a subset of ML, but they are organized according to syntactic complexity. At each complexity level, the best program is stored, together with a small set of other good programs. New programs are not made by mating with other programs, but by a systematic, complexity ordered search through the space of program transformations. The synthesized programs may contain general recursion, invented auxiliary functions and numerical constants that are optimized by ADATE. At any one time around 1000 different programs are in the program pool, ready to be altered. As is pointed out in [Burke et al. (2007)], which generates heuristics for the bin-packing problem, the generated heuristics can have validity outside the set of problems they are trained on. We observe similarly that even though the programs generated by ADATE were designed to run for only 100 or 1000 iterations, they remain competitive for a much larger range of search efforts.
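To make the role of the move selection function concrete, the sketch below shows a generic local search for a toy 0/1 problem in which the selection policy is simply a swappable function. This is purely illustrative: it is neither the BOOP solver nor the ML code that ADATE actually manipulates, and all names and the toy objective are invented.

```python
import random

def local_search(x, objective, neighbours, select_move, iterations):
    """Generic best-found local search; 'select_move' is the pluggable policy,
    i.e. the kind of component an automatic programming system could try to improve."""
    best_x, best_val = list(x), objective(x)
    for it in range(iterations):
        move = select_move(x, neighbours(x), objective, it)
        if move is None:
            break
        x = move
        val = objective(x)
        if val > best_val:
            best_x, best_val = list(x), val
    return best_x, best_val

def flip_neighbours(x):
    """All solutions reachable by flipping one binary variable."""
    out = []
    for i in range(len(x)):
        y = list(x)
        y[i] = 1 - y[i]
        out.append(y)
    return out

def greedy_select(x, candidates, objective, iteration):
    """A deliberately simple policy: pick the best neighbour, ties broken randomly."""
    best = max(objective(c) for c in candidates)
    return random.choice([c for c in candidates if objective(c) == best])

if __name__ == "__main__":
    # Toy linear objective standing in for a real BOOP instance.
    weights = [3, -1, 4, 1, -5, 9, 2, -6]
    obj = lambda x: sum(w * xi for w, xi in zip(weights, x))
    start = [random.randint(0, 1) for _ in weights]
    print(local_search(start, obj, flip_neighbours, greedy_select, iterations=50))
```

In the real setting the policy also sees search history (e.g. tabu status and tenure), which is exactly the kind of logic ADATE was able to rewrite and improve.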



Time complexity for ADATE runs
The biggest obstacle for large scale practical application of many evolutionary computation methods, including ADATE, is the huge number of CPU hours required to run them. If we ignore the time required for ADATE's syntactic manipulations of programs, the overall run time is proportional to the average execution time of a newly synthesized program multiplied by the number of executed programs. The execution time of a program obviously depends on the number of training inputs and how long it executes for each input. The latter may increase by a factor of 1000 or more when going from a simple example such as permutation generation to a more computationally demanding one such as heuristic search on hard instances. However, the most difficult factor to analyze is the total number of programs that need to be executed during an evolution. Due to ADATE's a-little-bigger-a-little-better ordering of its so-called base individuals, the total kingdom cardinality at the end of an evolution will be proportional to the size of the biggest program in the kingdom. Each of the programs in the kingdom will have had a number of children determined by the iteratively deepened cost limit mentioned above. The cost limit that is needed for a given level of program transformation complexity is proportional to the size of the program being transformed. Thus, both the kingdom cardinality and the cost limit are proportional to the maximum program size. If the kingdom were to be incrementally filled with programs that never are replaced by any others, the total number of executed programs would be proportional to the square of program size for a given maximum transformation complexity at any given position in a program. However, since programs in the kingdom frequently are knocked out by new and better ones, the total number of executed programs is typically between O(smax²) and O(smax³), where smax is the maximum size of any program in the kingdom.
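The scaling argument above can be restated compactly; the notation is introduced here and does not appear in the original article:

```latex
% t_exec = average execution time of a newly synthesized program,
% N = number of programs executed during an evolution,
% s_max = size of the largest program in the kingdom, c_1 and c_2 constants.
\[
  T_{\mathrm{run}} \;\propto\; \bar{t}_{\mathrm{exec}} \cdot N,
  \qquad
  c_1\, s_{\max}^{2} \;\lesssim\; N \;\lesssim\; c_2\, s_{\max}^{3}.
\]
% Kingdom cardinality and cost limit both grow linearly with s_max, giving the
% quadratic lower end; replacement of knocked-out programs pushes N toward the cubic end.
```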

For detailed workings of ADATE, see box for article reference.

Results and findings so far
Many of our findings are already published. Our state-of-the-art meta-heuristic for the Boolean optimization problem was clearly beaten by the improved code generated by ADATE. As ADATE works by program transformations, there is an unbroken chain of ancestors from the final program back to the initial (or start) program. For example, one of our improved programs had 18 ancestors. This means that we can, after development, trace the computational complexity and quality of the generated programs, as well as identify the actual transformation defining each new generation. Figure 1 shows the program quality as it develops over time. The blue diamonds indicate the ancestors. As can be seen, the development is not monotonic. Development is also fast in the beginning, and slower towards the end. Similarly, Figure 2 shows the corresponding semantic complexity of the generated programs. (As is evident, there is no bloating, a very common problem in so-called genetic programming.)

Of particular interest is the observation that ADATE is able to re-engineer existing state-of-the-art code, not only making it better, but also discovering new search paradigms. For the BOOP testing, in addition to producing better, more efficient code, the generated code contained a meta-heuristic component not previously tried or tested in the literature. Later independent testing has shown that this is a significant improvement. (For those specially interested: the tabu tenure is positively correlated with the selected move quality.)

Recent testing on improving heuristic methods for classes of satisfiability problems has similarly given us preliminary results where ADATE discovers the same improvements to the meta-heuristic as the original inventor of a state-of-the-art local search method, and even reinvents Tabu Search. (These findings will be presented at MIC 2011 – Metaheuristics International Conference, Udine, Italy, July 2011.)

As stated above, our focus has recently been on improving state-of-the-art local search heuristics for random 3-SAT, a field that has enjoyed much attention and where very good solvers exist. Even so, we are confident that ADATE will be able to improve on these significantly. Depending on our findings, we might also apply ADATE to other search methods in our domain.

Figure 1. Program Quality vs Time (x-axis: time; y-axis: average value).

Figure 2. Program Complexity (x-axis: time; y-axis: complexity).

Articles
Olsson, R. (1995). Inductive functional programming using incremental program transformation. Artificial Intelligence, 74(1), 55-83.
Hvattum, L.M., Løkketangen, A. and Glover, F. (2004). Adaptive Memory Search for Boolean Optimization Problems. Discrete Applied Mathematics, special issue on Boolean and pseudo-Boolean functions, vol. 142, pp. 99-109.
Løkketangen, A. and Olsson, R. (2010). Generating Metaheuristic Optimization Code using ADATE. Journal of Heuristics, special issue on Hyper-Heuristics, 16(6), 911-930.


More about ADATE – Automatic Design of Algorithms through Evolution
ADATE is a system for automatic programming that generates purely functional programs, which may contain general recursion, invented auxiliary functions and numerical constants that are optimized by ADATE. In practice, ADATE may be used either to generate a program from scratch or to improve one or more parts of an existing program. In this work, we have used the latter alternative.

ADATE is well suited to applications where important parts of the code are experimentally optimized. The design and implementation of heuristics is such an application. Automatic synthesis of heuristics as described here is an especially attractive application of automatic programming, since it is notoriously difficult to design heuristic algorithms that give very high quality solutions for a given class of instances within tight time constraints.

More than a decade of full-time research has been spent on developing the ADATE system, which means that it is difficult to fully review it here. However, we hope that the following overview will provide a basic understanding of how it works.

ADATE maintains a hierarchically structured kingdom of programs, similar to the taxonomy of Linnaeus. Since a so-called population in evolutionary computation rarely has a well-defined hierarchical structure, we have chosen to borrow the term kingdom from Linnean taxonomy instead of using the term population.

The most important taxa in the ADATE taxonomy are families, which are divided into genera, which in turn consist of species. A species consists of programs that typically are quite similar and many of which are on the same plateau in the fitness landscape. The programs in a species are generated from one single founding program, which we could call Adam or Eve, using compound program transformations. ADATE maintains a number of different sets of founding programs for each level of syntactic complexity. Such a set of founding programs is a genus, and all genera for a given level of syntactic complexity are a family.

A basic principle used to organize the kingdom is that a program should be better than all smaller ones found so far. Program transformations in varying combinations are employed to produce new programs that become candidates for insertion into the kingdom. The search for program transformations is mostly systematic and does not rely on randomization for purposes other than introducing new floating point constants.
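The organizing principle just stated, that a program should be better than all smaller ones found so far, can be illustrated with a small, hypothetical bookkeeping function. This is only a toy sketch of the idea, not ADATE's actual kingdom implementation.

```python
def admit(kingdom, candidate, size, fitness):
    """Keep 'candidate' only if it beats every program that is no larger,
    then evict larger programs that it dominates.

    'kingdom' maps syntactic size -> (best fitness, program) at that size.
    Toy illustration of the organizing principle, not ADATE's real bookkeeping.
    """
    cand_size, cand_fit = size(candidate), fitness(candidate)
    # Reject if some program that is no larger is already at least as good.
    for s, (fit, _) in kingdom.items():
        if s <= cand_size and fit >= cand_fit:
            return False
    # Evict programs that are larger yet no better than the newcomer.
    for s in [s for s, (fit, _) in kingdom.items()
              if s >= cand_size and fit <= cand_fit]:
        del kingdom[s]
    kingdom[cand_size] = (cand_fit, candidate)
    return True
```

With this rule, the kept programs form a strictly improving sequence in both size and fitness, which mirrors the "a-little-bigger-a-little-better" ordering mentioned in the time-complexity discussion above.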


Upcoming Events

www.notur.no Return address: UNINETT Sigma AS, 7465 Trondheim, NORWAY

Subscription: If you would like to subscribe to the paper version of the META magazine, please go to www.notur.no/publications/magazine/


The Notur II project provides the national e-infrastructure for computational science in Norway.

The infrastructure serves individuals and groups involved in education and research at Norwegian universities, colleges and research institutes, operational forecasting and research at the Meteorological Institute, and other groups who contribute to the funding of the project. Consortium partners are UNINETT Sigma AS, the Norwegian University of Science and Technology (NTNU), the University of Bergen (UiB), the University of Oslo (UiO) and the University of Tromsø (UiT). The project is funded in part by the Research Council of Norway and in part by the consortium partners.

The Notur project is complemented by two other projects that are financed in part by the Research Council of Norway. NorStore is a national infrastructure for scientific data. The NorGrid project deploys and operates non-trivial services for workload management and aggregation of the resources provided by the Notur and NorStore infrastructures.

NOTUR2011 – Oslo, May 26-27, 2011
NOTUR2011 is the tenth annual meeting on High Performance Computing and infrastructure for computational science in Norway. The meeting is intended for everyone who works with computer- and data-intensive applications. The 2011 conference subject will be "Science in the clouds – doing research using shared and distributed science facilities". A list of (international) speakers will present best practices and state-of-the-art approaches. The meeting is also an opportunity to discuss with colleagues and share experiences and opinions on the conference's topics as well as related ones.

The NOTUR2011 conference will be hosted by the University of Oslo in the main auditorium in the new Ole-Johan Dahls building (IFI-2) at the Department of Informatics on Thursday and Friday May 26-27. Workshops on cloud computing and using e-Infrastructure will take place May 23-25.

For more information: www.notur.no

EGI User Forum
April 11-15, 2011, Vilnius, Lithuania
http://uf2011.egi.eu

DEISA PRACE Symposium 2011
April 13-14, 2011, Helsinki, Finland
http://www.prace-project.eu/events/

The 26th NORDUnet Conference
June 7-9, 2011, Reykjavik, Iceland
http://www.nordu.net/conference/

ISC'11 - International Supercomputing Conference
June 19-23, 2011, Hamburg, Germany
http://www.supercomp.de/isc11/

Euro-Par 2011
August 29 - September 2, 2011, Bordeaux, France
http://europar2011.bordeaux.inria.fr/

PRACE: Preparatory Access to Europe's largest high-performance computing systems
The Partnership for Advanced Computing in Europe (PRACE) allows researchers from across Europe to apply for time on Europe's largest high-performance computers via a central peer-review process. Calls for proposals for computer time on PRACE machines are issued regularly.

Preparatory Access allows researchers to apply for code scalability testing and also support for code development and optimisation from PRACE software experts. Preparatory access allows researchers to optimise their codes before responding to regular project calls. Standard production runs are not allowed as part of preparatory access.

Preparatory access calls are rolling calls; researchers can apply for resources all year. There are no closing dates.

Proposals must be submitted via the PRACE website at http://www.prace-ri.eu/hpc-access
All open calls are listed at http://www.prace-ri.eu/Calls-for-Proposals



Photo: Per Ervland