Source: paoluzzi.dia.uniroma3.it/web/pao/doc/spm2012.pdf

Towards web modeling and simulation with big data

Antonio DiCarlo b, Enrico Marino a, Alberto Paoluzzi a, and Federico Spini a

a Dipartimento di Informatica e Automazione, Università Roma Tre, Via della Vasca Navale 79, I-00146 Rome, Italy
b Dipartimento di Strutture, Università Roma Tre, Via Corrado Segre 6, I-00146 Rome, Italy

Abstract

In this paper we explore the main challenges that solid modeling and computational physics communities are going to face, and discuss developments that seem to us like plausible next steps. Starting from lessons the older of us learned during the past three decades, we consider the size of some typical modeling and simulation problems in the new world of mobile, ubiquitous and embedded cloud computing, and introduce a novel approach towards establishing a general out-of-core framework for computing the sort of problems our current software is unable to deal with. A few preliminary experiments in this direction, based on cutting-edge technologies, are presented. Some conspicuous social implications and the related constraints are briefly discussed.

Key words: Geometric and topological representations (primary keyword), Biomedical applications of shape and solid modeling, Physically-based concepts for shape modeling, Mesh generation, cloud computing, large-scale web applications, solid modeling, computational physics

1. Introduction

In this paper we aim to present our view of a novel comprehensive approach to problems in science and technology that require solid modeling and multiphysics simulation on big data sets, using at its best the computational infrastructure that the emerging web-as-a-platform and platform-as-a-service paradigms are already providing us with. Pursuing such an ambitious objective requires that most of the methodologies underlying solid and physical modeling be rethought from scratch. In fact, up to now, they have been primarily oriented towards the optimization of local resources and the implementation of ad hoc solutions to each different class of physical-mathematical problems and their supporting data structures. Conversely, we claim that, on account of the opportunities offered (and the constraints imposed) by the novel web platform, we should now head in the opposite direction, towards the availability of simple, general-purpose and dimension-independent geometric data structures and computational methods.

In the old days—around 1975—solid modeling had its start with the first raster graphics terminals, vector plotters and 16/32-bit minicomputers (Digital PDPs and VAXes, HP series 1000 and 3000), using Fortran 66/77, structured programming, proprietary operating systems, memory segmentation, and COMMON areas. The support for disk storage was based on the first relational databases for Computer-Aided Design [14]. The earliest modeling languages and systems (PADL-1/2, TIPS, BUILD) were created in that computational era [21]. In the 1980s, the diffusion of the UNIX operating system and the C programming language on one side, and the PC revolution on the other, together with the ever decreasing cost of internal memory and the appearance of 32-bit workstations, produced over fifteen years a shift towards non-manifold representations, supported by middle-level topological libraries (Euler operators and graph operations) with sophisticated low-level implementations based on the use of dynamic RAM. Concurrently, from research developments in numerical analysis, B-splines and NURBS emerged as the ubiquitous and most useful mathematical tools to support boundary representations of solids, and the first geometry kernels were created in academia. Later transformed into commercial software and supported by big investments, these geometric kernels became, with Windows NT competitors confronting high-end Unix workstations, the foundational framework for all commercial solid modelers and for the emerging business of PLM systems for the aerospace, automotive, naval, and manufacturing industries.

Today, the ICT world and its technologies are changing at a furious pace. Conversely, despite the tremendous amount of research done and the continuing technical advances in computer-aided design, geometric computing and scientific visualization, the most widely used software tools in the PLM industry still follow the basic approach established about twenty years ago, centered around non-manifold topology, boundary representation, and NURBS curves and surfaces, and therefore require appropriate pre-processors (usually Delaunay triangulations) towards PDE solvers and post-processors towards graphics renderers and user interfaces. This is not to deny that a lot of work was done and is being done on using computational clusters and on porting pre-existing software tools to novel system architectures, more and more oriented toward vector processing and many-core chips. However, it is our conviction that the time is ripe to radically rethink the fundamentals of solid modeling.

The current transformations occurring in the discipline stem both from its internal evolution and from the demands of the external world. A few years ago, solid modeling—like several other branches of computer science—came to be considered a mature field: its central concepts and algorithms enjoyed a general consensus, its user base was large, and its most used methods had solid commercial implementations. Then, researchers in the field started looking around in search of stimuli and ideas from the scientific communities closer to the boundaries of their own discipline.

The main conference in the field—whose title was changed by adding ‘physical’ to ‘solid’ modeling—was unified with the cousin conference on ‘geometric’ design and obtained common sponsorship from ACM and SIAM, the two reference learned societies. This trend towards the unification of solid, geometric, and physical modeling materializes in works such as some recent contributions on trivariate splines over T-meshes of polycubes [26,17,27]. These impressive results constitute an excellent example of how solid modeling is moving away from boundary representations, which require off-line meshing to support physical simulations, towards more general cellular decompositions.

At the same time, new challenges are posed by old and new application fields, namely material science (think of engineered surfaces, nanomaterials and metamaterials) and biomedicine, where modeling and simulation issues range from the molecular/protein level to the multiscale modeling of subcellular organelles, cellular structures, tissues and organs. These problems set demanding requirements, ranging from cooperative support to multiphysics software, where different field equations imply different geometric structures at the level of basic descriptive data, to robustness toward scale mismatch in coupled problems, the huge complexity of the simulation environment, and terascale numbers of elementary entities or agents.

The rest of the paper is organized as follows. In Section 2 we discuss the technological advances that are moving the ICT field towards new frontiers. Section 3 summarizes some novel approaches towards the integration of geometry and physics into a unified conceptual framework, not yet implemented in computational systems. Our vision of modeling and simulation on the web platform is presented in Section 4, together with an introduction to the library of web-based modeling tools currently under development by our group. In Section 5 we sketch out the basic features that, in our view, should characterize today's innovative approach to solid and physical modeling.

2. Technologies of a new world

In this section we review some patterns of change we are experiencing in our computational infrastructure, and draw the reader's attention to the grand perspectives that open new possibilities for applying modeling and simulation to predictive biomedicine, the design of new drugs, and the personalized simulation of their effects.

2.1. Data on the cloud: the inexhaustible resource

Some have started to call ‘Hypernet’ the sum of the old Internet plus all the new cellular and mobile technologies, such as smartphones and tablets, and the related social grids that share data, news and applications.

The current big trend is towards the delivery of computing as a service, where shared resources, software, and data are provided as utilities over a network, like power from the electric grid. The aim of cloud computing is to concentrate computation and storage in a carefully and expertly managed core, where high-performance machines are linked by high-bandwidth connections [15], while infrastructure providers, like Amazon, Google, IBM, Sun, and others, allow users to access remote infrastructures. In this way, computational power and storage appear to the end users as if they were inexhaustible resources, dynamically available at any time and in any quantity.

This offer may be seen as a shift from a product to a service economy, where the back-end role moves from personal computers, with their client-server model, to HPC infrastructures. Cloud computing also offers end users advantages in terms of mobility and collaboration. For example, the recent announcement of AutoCAD 2013 boasts new tools to connect with cloud-enabled services, allowing the product's users to access their data from anywhere and with any device, and to collaborate with colleagues in remote locations.

New collaboration models are also emerging, like service-on-demand and the mutualization of clusters, since renting cloud infrastructure could be too costly for small businesses. In order to reduce costs, small and medium enterprises might join to purchase and maintain a common infrastructure to be shared among them [11]. This infrastructure may host several software products offered as services on demand, with a quality-of-service level guaranteed to each business according to its investment in the cluster. The sharing of clusters may be handled in the cloud in order to make it transparent to customers.

2.2. Zero installation: the browser is the interface

With the client-server infrastructure that reigned in the industry over the last twenty-five years, one had to install, maintain and update the operating systems and the application software in a myriad of locations. More recently, the deployment of standardized virtual machines within whole departments of a large organization has reduced both the security risks and the cost of maintenance. The emergence of the web as a platform is now moving the core of the computation even outside of the server room, thereby reducing the competitive advantage of larger organizations over smaller companies. For most applications, the entire user interface may reside inside a single window in a web browser [15].

What is more important, today everybody is experiencing ubiquitous access to information and data with the new generation of mobile devices such as tablets and smartphones. It is easy to predict that tomorrow we will interact with images, models and predictive simulations from similar devices, currently being designed with ever-increasing embedded computing power (in particular for high-end graphics rendering) and soon even with 3D display and interaction.

2.3. Integrative biomedicine

Technological advances have made it possible to acquire large sets of biomedical data at a fast rate and at affordable cost. As a consequence, the ease of producing and collecting data in digital form over the Internet is causing a gradual paradigm shift in the approach to science and technology: from physical prototyping and testing to virtual prototyping and mathematical modeling, in order to simulate and predict behavior and performance.

The most advanced biomedical research deals with problems of increasing complexity, for which the traditional approach—based on a predetermined subdivision of biological systems according to anatomical sub-systems, scientific and medical specializations and physical scales—is totally inadequate. As a consequence, it is necessary to provide computational support to an integrative approach to biomedical science, aimed at combining observations, theories and predictions across different temporal and dimensional scales, specialization boundaries and anatomical sub-systems.

The urgency of this need has prompted a number of efforts: integrative biology, systems biology, the Physiome Project, the Virtual Physiological Human (VPH), et cetera. The term physiome, whose root meaning is the description of the functional behavior of the physiological state of an individual or a species, has come to refer to the multiscale modeling of human physiology using mathematical methods and computational techniques, accommodating cross-disciplinary science (chemistry, biology, physics, computer science) and a breadth of dimensional and temporal scales (sub-cellular to organs, sub-microsecond to tens of years). It is widely recognized that evolving physiome activities will increasingly influence medicine and biomedical research, with an ever increasing demand for specific and robust computational platforms.

3. Towards novel solid and physical modeling

3.1. Modeling inside boundaries: cellular complexes

Conventional, widely used approaches to solid modeling are rather limited in scope, being typically restricted to certain classes of triangulations or tensor-product domains in 2D or 3D, and most often confined to boundary representations. Contrariwise, we need out-of-the-box computational representations and methods that support geometrical and physical computations on meshes of any sort.

All meshes—partitioning either the boundary or the interior of the model domain—and all physical quantities associated with them are properly represented by chain/cochain complexes. A chain complex is a sequence of linear spaces of d-chains, 0 ≤ d ≤ n, together with a sequence of linear boundary operators ∂, each mapping the space of d-chains into the space of (d−1)-chains. Cochain complexes are dual to chain complexes; the coboundary operators δ, mapping the spaces of d-cochains into the spaces of (d+1)-cochains, are dual to the boundary operators ∂.

The chain/cochain representation [13] captures formally and unambiguously all the combinatorial relationships of abstract, geometrical, and physical modelling, via the standard topological operators of boundary and coboundary. This representation applies to all cell complexes, with no restriction on type, dimension, codimension, orientability, manifoldness, et cetera. Moreover, this approach unifies geometrical and physical computations in a common formal computational structure. In particular, huge geometric structures may be properly and efficiently represented by (sparse) adjacency matrices, and therefore efficiently manipulated with tools from computational linear algebra, in particular on the highly parallel vector GPUs of the latest generation.
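To make the linear-algebraic reading concrete, the following sketch (plain JavaScript, not the kernel code described in this paper) builds the boundary matrices of a single oriented triangle, checks the defining identity ∂∘∂ = 0, and obtains the coboundary operator as the matrix transpose of the boundary operator:

```javascript
// Vertices v0, v1, v2; edges e0 = [v0,v1], e1 = [v1,v2], e2 = [v0,v2];
// the 2-cell is the 1-chain e0 + e1 - e2. Dense matrices are used for
// clarity; a real kernel would store them in a sparse format.
const d2 = [[1], [1], [-1]];          // ∂2 : 2-chains -> 1-chains
const d1 = [                          // ∂1 : 1-chains -> 0-chains
  [-1,  0, -1],                       // rows: vertices, cols: edges
  [ 1, -1,  0],
  [ 0,  1,  1],
];

const matMul = (A, B) =>
  A.map(row => B[0].map((_, j) => row.reduce((s, a, k) => s + a * B[k][j], 0)));
const transpose = A => A[0].map((_, j) => A.map(row => row[j]));

const dd = matMul(d1, d2);            // ∂1 ∘ ∂2: must be the zero 0-chain
const delta0 = transpose(d1);         // δ0 : 0-cochains -> 1-cochains
```

The duality of δ and ∂ is exactly the matrix transposition in the last line: evaluating δc on a chain σ equals evaluating c on ∂σ.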

3.2. Integrating geometry and physics

In all physical field theories, different physical quantities are associated with geometrical objects of different dimension, i.e., with d-cells, d possibly ranging from zero to the top dimension [22,23]. This does not mean that cells of all dimensions are actually operated on in each specific theory. In principle, however, the whole hierarchy of cells could be visited and should therefore be readily and efficiently accessible. Consequently, in cell-based approximations to field theories, fields are naturally represented by d-cochains (taking values in suitable target spaces): the chain-cochain duality yields the content of the quantity represented by the cochain in the support of the chain. A few basic examples, discussed in Sections 3.2.1 and 3.2.2, may clarify this point.


3.2.1. Electromagnetism

Classical (Maxwellian) electromagnetism lives in a 4-dimensional setting, the product of a 3D space manifold and the 1D time line. Charge is associated with 3-dimensional cells. Two distinguished kinds of 3-cells may be considered: (i) space cubes times an instant, and (ii) space faces times a time interval. When evaluated on 3-chains of the first kind, the 3-cochain representing charge yields the charge contained in a certain region of space at a given time (in the limit of infinite refinement, this defines the notion of instantaneous charge density); when evaluated on 3-chains of the second kind, the same 3-cochain yields the charge flowing through a certain surface in space during the given time lapse (in the limit of infinite refinement, this defines the notion of instantaneous charge flux).

One of the celebrated Maxwell equations states that the charge 3-cochain admits a potential, i.e., it is the coboundary of a 2-cochain: this potential, when evaluated on the boundary of a 3-chain, equals the value of the charge 3-cochain on that 3-chain. The boundary of a 3-chain of the first kind consists of space faces times an instant, while that of a 3-chain of the second kind also comprises space lines times a time interval (this dichotomy brings about the notions of electric flux density and magnetic field intensity). Maxwell's construction is completed by the Faraday 2-cochain (related to the electric field intensity and the magnetic flux density) and its potential 1-cochain (encompassing the scalar and the vector potential) [24,8].

To sum up, Maxwellian electromagnetism operates on space-time cells of dimension ranging from 1 to 3. These cells do not span the full space-time hierarchy; however, to build them, the whole hierarchies of space cells and time cells are needed.

3.2.2. Mechanics

Classical mechanics deals with chunks of body-time, rather than space-time, the dimension n of the body manifold being anything between 0 and 3 (extremes included) [12]. The dichotomy continuum vs. discrete corresponds to whether n is greater than zero or equal to zero: discrete (0D) body manifolds are finite (but typically extremely large) sets of body-points; 1D continua model filaments, strings, rods, jets, streams; 2D continua model membranes, sheets, plates and shells; 3D continua are thought to fill open sets in physical space. Mechanics distinguishes between kinematical and dynamical quantities, the former being associated with body-time cells of small dimension (0 or 1), the latter with body-time cells of small codimension (0 or 1). In this respect, mechanics is simpler (and less dependent on dimension) than electromagnetism. However, mechanics is more complex in another respect: while all electromagnetic quantities are real-valued, mechanical quantities (both kinematical and dynamical) take values in more or less elaborate tensor manifolds.

The basic kinematical information is encoded into the 0-cochain that attaches a position in space (and possibly other pointwise state descriptors, such as order parameters) to each elementary 0-chain (a body-point times an instant). The coboundary of this cochain, when evaluated on a time-wise 1-chain (a body-point times a time interval), yields the difference in placement of the same body-point at two different times; when evaluated on a body-wise 1-chain (a body-line times an instant), the same 1-cochain yields the difference in placement between two body-points at the given time (in the limit of infinite refinement, this brings about the notions of velocity and body gradient of placement). Of course, body-wise 1-chains only exist in continua, and this makes the key difference between discrete and continuum mechanics. This is also why the geometrical apparatus of the mechanics of discrete systems is much simpler than that of continuum mechanics.
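A toy numerical instance of the placement 0-cochain may help fix ideas; the body-point labels and positions below are invented purely for illustration:

```javascript
// 0-cells are (body-point, instant) pairs; the 0-cochain p assigns a
// position in the plane to each of them (all values are made up).
const p = new Map([
  ['b0@t0', [0, 0]],
  ['b0@t1', [3, 4]],
  ['b1@t0', [1, 0]],
]);
const sub = (u, v) => u.map((x, i) => x - v[i]);

// Evaluating the coboundary of p on a time-wise 1-chain (b0 × [t0,t1])
// is, by duality, evaluating p on its boundary (b0,t1) − (b0,t0):
const displacement = sub(p.get('b0@t1'), p.get('b0@t0')); // [3, 4]

// Evaluating it on a body-wise 1-chain (the body-line b0..b1 at t0)
// yields the relative placement of the two body-points at that time:
const relative = sub(p.get('b1@t0'), p.get('b0@t0'));     // [1, 0]
```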

Another essential piece of information, bridging between kinematics and dynamics, is the 0-cochain that delivers the test velocity, i.e., the infinitesimal difference between the actual position of a body-point and the position assigned to it at the same time by a putative juxtaposed placement. Dynamics is encoded into a pair of impulse-valued cochains, the impulse-supply (n+1)-cochain and the impulse-flux n-cochain, and the real-valued duality involving the test velocity 0-cochain and the two impulse-valued cochains that delivers the work (n+1)-cochain. Balance laws stem from the general principle that the work done on any test velocity over any body-time (n+1)-chain should be zero.

3.3. Extra-large data sets: out-of-core spatial indexing

Contrary to what is taken for granted in established manufacturing and PLM procedures, in novel applications of modeling and simulation methods—think of integrative biomedicine (Section 2.3) as a paradigmatic example—the shape of the simulation domain cannot be considered as known a priori. In general, it has to be computed concurrently, taking into account chemical, electrical and mechanical interactions. Even more importantly, these novel applications are characterized by the enormous size of the computer models they work on. The terascale—and even larger—data size is due to several factors. First of all, simulations require cellular decompositions instead of the more compact boundary representations. This fact alone accounts for one order of magnitude of increase in model size. Secondly, the common factoring of repeated substructures, exploited to a great extent by large-scale hierarchical models in CAD and computer graphics, cannot be relied upon when dealing with large deformations, thus producing an increase of several orders of magnitude in model size. Finally, consider the sheer number and complexity of components: several thousands of atoms in a protein, tens of thousands of proteins in a cell, billions or trillions of cells in an organ [5].

When the number of data elements to deal with is extremely high, out-of-core distributed computational issues must be addressed for model scaling and segmentation, multigrid extraction and remeshing, fast progressive visualization, and predictive simulations. Model data must be spread across a distributed system using spatial indexing, in order to accelerate the execution of retrieval operations, normally driven by spatial proximity. For the sake of efficiency, the distribution itself must be driven by data accessibility, so that large sets of queries, normally operating by spatial closeness, can be efficiently—i.e., coherently—cached, and the quantity of data to be transferred across the network or from disk to memory is minimized.
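The compressed spatial index used by our kernel is not specified here, but the intent (a linear key that keeps spatially close cells close in the index) can be illustrated with a classical Morton, or Z-order, code; the following is a generic sketch of that standard technique, not our actual index:

```javascript
// Spread the 10 low bits of n so that two zero bits separate each
// original bit (standard magic-mask bit interleaving).
function part1By2(n) {
  n &= 0x3ff;
  n = (n ^ (n << 16)) & 0xff0000ff;
  n = (n ^ (n << 8))  & 0x0300f00f;
  n = (n ^ (n << 4))  & 0x030c30c3;
  n = (n ^ (n << 2))  & 0x09249249;
  return n;
}

// 30-bit Morton key of a quantized 3D point: sorting by this key groups
// points by octree cell, so proximity queries touch contiguous runs.
function morton3(x, y, z) {
  return (part1By2(z) << 2) | (part1By2(y) << 1) | part1By2(x);
}
```

Sorting cells by such keys before distributing them across storage nodes tends to make proximity-driven queries hit coherent, cacheable runs of the index.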

4. Modeling on the web platform

4.1. JavaScript and friends

The world is moving fast from proprietary platforms to ubiquitous and standardized access to data and computing resources over the web. JavaScript, the language embedded in all browsers, is gaining more and more momentum, resulting in a continuous updating of ever faster translators into native code, progressive client-side and server-side language moves, and increasing support for fast operations over huge distributed databases of both structured and unstructured data. Most modern computer languages today offer compilation to JavaScript, and last-generation browsers support fully transparent access to embedded or parallel computational resources (GPUs, multi- and many-core CPUs, distributed and parallel computing) using WebGL and WebCL, i.e. the web-oriented implementations of OpenGL and OpenCL, respectively. Let us recall that OpenGL is the industry's foundation for high-performance graphics, and OpenCL is the open standard for the parallel programming of heterogeneous systems. JavaScript has always had an awesome object model at its heart. CoffeeScript, the little language that compiles on-the-fly to JavaScript, is a recent compelling attempt to expose the best parts of JavaScript in a simple way, borrowing syntax from Haskell, Python and Ruby, the most successful new languages of the last fifteen years.

4.2. A geometry kernel for the web platform

In the past two decades our group in Rome developed Plasm [19,20,18], a geometric extension of a subset of FL, the functional programming language based on combinatorial logic developed by John Backus' group at IBM [2,4,3]. Over the years, Plasm was reimplemented several times and embedded in Common Lisp, Scheme and Python, using a C++ geometry kernel based on multidimensional representations (hierarchical polyhedral complexes and progressive BSP trees). This geometric language and the underlying kernel proved quite successful for modeling complex buildings and archeological reconstructions. Recently, they were also used for the fast development of virtual environments aimed at empowering distributed systems for the video surveillance and security of small areas. In the past few years, we envisioned the need for parallel extensions to the modeling and simulation of biosystems [5], and started some modeling experiments in the area of biomedicine.

Several months ago we started developing a JavaScript toolbox, inspired by Plasm, to support geometric and physical computations on meshes of any sort. The aim of this library is to provide scientists and engineers with friendly and effective tools for dealing with computational models that require demanding field problems to be formulated and solved on discrete domains of any complexity and dimension. The library allows for hierarchically organized decompositions and supports multiresolution and multigrid techniques, providing both local and global access to cell complexes and their associated (co-)chain complexes for generation, analysis and editing at local, intermediate and global levels. In order to make it available on the emerging web platform, the library is being written in a mixture of JavaScript and CoffeeScript. For the sake of efficiency, critical parts of the code are being implemented in WebCL and WebGL, thereby allowing for its streamlined use on multicore computers, computational clusters and supercomputers. In Sections 4.3 to 4.5 we highlight the design choices and the fundamental architecture of the geometric kernel and its basic representation schemes, which make use of well-blended numerical and symbolic algorithms.

4.3. Cell complexes: polytopal, cuboidal, simplicial

Our novel multidimensional geometry kernel is based on a hierarchy of cellular decompositions. The corresponding classes (in CoffeeScript) or prototype objects (in JavaScript) are all derived from the (Hasse) Graph class, whose nodes represent the various cells of the discrete decomposition of the model. A precise characterization of the set of valid decompositions and of their refinements (local or global) is given below. Each type of cell complex (Polytopal, Cuboidal, and Simplicial) contains the specializations of a Topology class and a PointSet class, with appropriate methods and properties. Decompositions with curved cells are obtained by attaching an additional set of control points to the open interior of cell faces and by using local affine coordinates. A compressed spatial index, which we name zetta-token and introduce in Section 4.4, is used to gain coherent caching and fast access to distributed data sets.
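As a rough illustration of this hierarchy (in plain JavaScript; all class and method names here are our own guesses at the shape of such a kernel, not the actual Plasm.js API):

```javascript
// Base class: a Hasse graph whose nodes are the cells of the complex,
// partitioned by dimension.
class Graph {
  constructor(d) { this.levels = Array.from({ length: d + 1 }, () => []); }
  dim() { return this.levels.length - 1; }
  addCell(k, faces) {               // faces: indices of lower-dim cells
    this.levels[k].push({ faces });
    return this.levels[k].length - 1;
  }
}
class Topology {                    // combinatorial queries over a Graph
  constructor(graph) { this.graph = graph; }
  nCells(k) { return this.graph.levels[k].length; }
}
class PointSet {                    // geometry: positions of the 0-cells
  constructor(points) { this.points = points; }
}
class SimplicialComplex extends Graph {
  constructor(points) {
    super(points[0].length);        // embedding dimension as a default
    this.pointSet = new PointSet(points);
    this.topology = new Topology(this);
  }
}

// A triangle: three 0-cells, three 1-cells, one 2-cell.
const tri = new SimplicialComplex([[0, 0], [1, 0], [0, 1]]);
[0, 1, 2].forEach(() => tri.addCell(0, []));
tri.addCell(1, [0, 1]); tri.addCell(1, [1, 2]); tri.addCell(1, [0, 2]);
tri.addCell(2, [0, 1, 2]);
```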

4.3.1. Steiner and Delaunay refinements of Voronoi complexes

A d-polytope in E^n is a bounded convex set supported by an affine d-subspace (i.e., by a translated instance of a linear subspace of dimension d). A simple d-polytope is a d-polytope whose vertices (0-cells) are incident to exactly d facets (i.e., to d faces of dimension d−1). It is easily checked that d-cubes are simple d-polytopes.

A polytopal complex P is a collection of polytopes in E^n, called cells, such that (i) each face of a cell is also contained in P, and (ii) the intersection of any two cells is again a cell. We call a polytopal complex of simple polytopes a simple polytopal complex, or Voronoi complex. Simplicial complexes are special Voronoi complexes, which in turn are a special case of regular CW-complexes [29].

Fig. 1. The refinement hierarchy: (a) a simple polytopal complex P with T-junctions; (b) its Steiner points (grey dots); (c) the Steiner complex ΣP, comprised of quads; (d) the Delaunay simplicial complex dual to the Voronoi complex P.

Fig. 2. An example of local refinement: (a) the Voronoi cell to refine; (b) Steiner points (blue dots) in the supported Steiner subcomplex; (c) the refined subcomplex; (d) a possible resolution of T-junctions at the boundary of the subcomplex (the next global Steiner refinement would produce only cuboidal cells).

A cuboidal d-complex, or hyper-cuboidal complex (HCC), in E^n is a Voronoi complex where every k-cell is the convex hull of 2^k convexly independent points¹ (0 ≤ k ≤ d ≤ n). In other words, all the k-cells of a cuboidal complex have the topology of the k-hypercube.

With some abuse of language, we call Voronoi points the set V ⊂ E^n of positions of the 0-cells of a Voronoi complex. We also call Steiner points the set S ⊂ E^n of centroids of the k-faces (0 ≤ k ≤ d) of a Voronoi complex.

By adding Steiner points to all cells of a Voronoi complex, it is always possible to generate a cuboidal d-complex, i.e., to refine the complex using only k-cuboids (0 ≤ k ≤ d). The celebrated Catmull-Clark subdivision algorithm [10] is a preeminent example of this kind of refinement. Figures 1 and 2 provide an illustration of such refinements.

Given a Voronoi complex P, we call Steiner refinement Σ(P) the cuboidal complex generated by its Steiner points, and Delaunay refinement ∆(P) the simplicial complex whose vertices are the Steiner points of P and such that each d-cell of Σ(P) is the quasi-disjoint union of d-cells of ∆(P). Denoting by ≺ the refinement relation between cell complexes covering the same space, it holds true that, for every Voronoi complex P, P ≺ ΣP ≺ ∆P.

1 The convex combination of a set of points is a linear combination of their position vectors with nonnegative scalars summing to one. A set of points is said to be convexly independent if none of them can be generated as a convex combination of the others.

Let us denote by P^k (0 ≤ k ≤ d) the maximal subcomplex of P of dimension k, often called the k-skeleton of P. When Steiner points are added to a subset L ⊂ P^k of a Voronoi d-complex and to all the faces of cells in L, a local Steiner refinement Σ_L P is obtained. We denote it by P>, to point out that it is a Voronoi complex with T-junctions, i.e., containing a set of k-faces sharing the same affine k-support and lying on the boundary of the same cell. A 2D example is shown in Figure 2c. Of course, the further global refinements ΣP> and ∆P> are cuboidal and simplicial, respectively.

Simplicial complexes, quad meshes, hexahedral meshes, and in general all well-formed decompositions of finite spaces with convex subsets are Voronoi complexes. Of course, hyper-cuboidal complexes generalize 3D hexahedral meshes to higher (and lower) dimensional spaces. In 2D, by introducing Steiner points in a Voronoi complex (any mesh of convex polygons), we obtain a mesh of quads. In 3D we get a hexahedral mesh from any mesh of 3-polytopes such that all the vertices of every cell are trihedral. The following is a useful property: when a Voronoi d-mesh P is refined into a Steiner mesh with m vertices, m = |P^0| + · · · + |P^d|.
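The counting property is easy to check on the smallest 2D instance, a single square cell refined Catmull-Clark-style (a sketch over explicit vertex lists; the real kernel works on the Hasse graph):

```javascript
// One square 2-cell: |P^0| = 4, |P^1| = 4, |P^2| = 1, so the Steiner
// refinement must have m = 4 + 4 + 1 = 9 vertices (a 3x3 grid of quads).
const verts = [[0, 0], [1, 0], [1, 1], [0, 1]];
const edges = [[0, 1], [1, 2], [2, 3], [3, 0]];
const faces = [[0, 1, 2, 3]];

// Centroid of a list of points (the Steiner point of a face).
const centroid = pts =>
  pts[0].map((_, i) => pts.reduce((s, p) => s + p[i], 0) / pts.length);

const steinerVerts = [
  ...verts,                                            // original 0-cells
  ...edges.map(e => centroid(e.map(i => verts[i]))),   // edge midpoints
  ...faces.map(f => centroid(f.map(i => verts[i]))),   // face centroids
];
const m = steinerVerts.length;                         // 9
```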


4.3.2. Hasse graphs

The Hasse graph H = (N, A) of a d-complex is a (d+1)-partite graph, where the nodes N and arcs A are partitioned into disjoint subsets N_k (0 ≤ k ≤ d) and A_k ⊂ N_k × N_{k+1} (0 ≤ k ≤ d−1), respectively. We use the abstract poset of a cell complex, completely characterized by its Hasse graph, as a representation of the complex. In the present setting, the Hasse graph represents the (direct) containment relationship between the faces of a set of polytopes. The Hasse graph can be shown to be a direct implementation of the chain (cochain) complex defined by a decompositive representation scheme in solid modeling [13]. This representation is very general, in that it applies to all domains that can be characterized as cell complexes, without any restriction on their type, dimension, codimension, orientability, manifoldness, or connectedness. The Hasse graph, as implemented in Plasm.js, is locally a DAG but, for the sake of efficiency, is globally cyclic, since every highest-level node n ∈ N_d (i.e., every d-cell) is connected by direct arcs (n, m) to every lowest-level node m ∈ N_0 (i.e., every 0-face) in its convex hull. In other words, the arc subset A_d ⊂ N_d × N_0 represents the relationship of convex combination of subsets of vertices producing the highest-dimensional cells of the complex.
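To make the structure concrete, here is a toy version of such a Hasse graph for a complex with a single square, including the extra cycle-closing arcs; names and layout are ours for illustration, not the actual Plasm.js class.

```javascript
// A toy Hasse graph for a single square: nodes partitioned by dimension,
// arcs partitioned by level (our sketch, not the Plasm.js implementation).
const H = {
  nodes: [['v0', 'v1', 'v2', 'v3'], ['e0', 'e1', 'e2', 'e3'], ['f0']],
  arcs: [
    // A_0 in N_0 x N_1: which vertices bound which edges
    [['v0','e0'], ['v1','e0'], ['v1','e1'], ['v2','e1'],
     ['v2','e2'], ['v3','e2'], ['v3','e3'], ['v0','e3']],
    // A_1 in N_1 x N_2: which edges bound the face
    [['e0','f0'], ['e1','f0'], ['e2','f0'], ['e3','f0']]
  ],
  // A_d in N_d x N_0: every d-cell points back to the vertices of its
  // convex hull, which is what makes the graph globally cyclic.
  down: [['f0','v0'], ['f0','v1'], ['f0','v2'], ['f0','v3']]
};

// Example query: the coboundary of a vertex, i.e. the edges it bounds.
const coboundary = v => H.arcs[0].filter(([a]) => a === v).map(([, b]) => b);
console.log(coboundary('v1'));   // ['e0', 'e1']
```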

4.3.3. Simplicial decompositions for curved geometries

A simplicial map is a map between simplicial complexes such that the images of the vertices of any simplex span a simplex. Let Σ0 and Σ1 be simplicial complexes, and let f be a vertex map Σ0 → Σ1 such that if (v0, . . . , vd) is a simplex of Σ0, then (f(v0), . . . , f(vd)) is a simplex of Σ1. Simplicial maps induce continuous linear maps between the underlying polyhedral spaces via barycentric coordinates. As is well known, polynomial maps are closed with respect to affine transformations of control vertices, and rational maps are closed under both affine and projective transformations. The moral is: just map the vertices of a Delaunay decomposition of the domain, and compute the image of any other point as the linear combination of the images of the vertices, weighted by its barycentric coordinates.
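The recipe in the last sentence can be sketched in a few lines of JavaScript (our illustration, with hypothetical helper names): compute the barycentric coordinates of a point in a source triangle, then recombine the mapped vertices with the same weights.

```javascript
// Signed area of triangle (u, v, w); used to get barycentric coordinates.
const area = (u, v, w) =>
  ((v[0] - u[0]) * (w[1] - u[1]) - (w[0] - u[0]) * (v[1] - u[1])) / 2;

// Barycentric coordinates of p with respect to triangle (a, b, c).
const bary = (p, [a, b, c]) => {
  const A = area(a, b, c);
  return [area(p, b, c) / A, area(a, p, c) / A, area(a, b, p) / A];
};

// The simplicial map induced by a vertex map: map the vertices of the
// source simplex, then recombine with the same barycentric weights.
const mapPoint = (p, src, dst) => {
  const l = bary(p, src);
  return [0, 1].map(k => l[0]*dst[0][k] + l[1]*dst[1][k] + l[2]*dst[2][k]);
};

const src = [[0, 0], [1, 0], [0, 1]];
const dst = [[0, 0], [2, 0], [0, 2]];            // vertex map: scale by 2
console.log(mapPoint([0.25, 0.25], src, dst));   // [0.5, 0.5]
```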

4.3.4. Polytopal-to-cuboidal refinement algorithm

One component of our novel geometry kernel is the HCC algorithm, which stands for hypercuboidal complex generation. This algorithm decomposes Voronoi complexes into meshes made of quads, hexahedra, and higher-dimensional cells with the hypercube topology. The algorithm rests on two key ideas:

(i) the d-hypercube topology is completely characterized by the Hamming graph H(d, 2);

(ii) the Hamming graph of the d-hypercube is isomorphic to the Hasse graph of the (d−1)-simplex.

The Hamming graph H(d, 2) is a graph (N, A) with |N| = 2^d, such that there is an arc (i, j) ∈ A if and only if the nodes n_i and n_j differ in precisely one coordinate.

The Hamming graph of the 4-dimensional hypercube is shown in Figure 3a. By comparison with Figure 3b, the reader will note that this graph is isomorphic to the Hasse graph of the simplex σ3 (i.e., the tetrahedron), which has one 3-face, four 2-faces, six 1-faces, four 0-faces, and one (−1)-face (the empty set).
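Both facts are easy to check mechanically. The following snippet (ours, not kernel code) builds the arc set of H(d, 2) for d = 4 and the arc set of the Hasse graph of the 3-simplex, encoding each face by its characteristic bit vector, and verifies that the two sets coincide.

```javascript
// Nodes of H(d,2) as integers 0..2^d-1 (their binary expansions are the
// labels in Figure 3a); arcs join nodes at Hamming distance 1.
const d = 4;
const nodes = Array.from({ length: 2 ** d }, (_, i) => i);
const popcount = x => x.toString(2).split('').filter(b => b === '1').length;

const hammingArcs = new Set();
for (const i of nodes)
  for (const j of nodes)
    if (i < j && popcount(i ^ j) === 1) hammingArcs.add(`${i}-${j}`);

// Hasse graph of the (d-1)-simplex: nodes are the vertex subsets (faces,
// empty set included), arcs join each face to the faces with one extra
// vertex. Encoding faces as characteristic bit vectors, adding a vertex
// flips exactly one bit, so the two arc sets coincide.
const hasseArcs = new Set();
for (const s of nodes)
  for (let v = 0; v < d; v++)
    if (!(s & (1 << v))) hasseArcs.add(`${s}-${s | (1 << v)}`);

console.log(hammingArcs.size, hasseArcs.size,
            [...hasseArcs].every(a => hammingArcs.has(a)));  // 32 32 true
```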

Fig. 3. (a) The Hamming graph of the 4-hypercube; (b) the Hasse graph of the 3-simplex.

More abstractly, HCC generation may be seen as a morphism γ : H → H′ between Hasse graphs of d-complexes that takes into account the topology of the (k−1)-simplices (1 ≤ k ≤ d). Let us consider the Hasse graphs H = (N, A) and H′ = (N′, A′) = γ(H) := (γ(N), γ(A)). According to points (i) and (ii) above, the nodes in N′_k are in one-to-one correspondence with the subgraphs of H that are isomorphic to the Hasse graph of the simplex σ_{k−1}. Moreover, the arc (u, w), with u, w ∈ N′, belongs to A′ if and only if γ⁻¹(u) is a subgraph of γ⁻¹(w).

4.3.5. Steiner meshing for dimension-independent multigrid

Multigrid methods are a class of algorithms for solving differential equations using a hierarchy of discretizations. If the basic idea behind multigrid [9], i.e., the accelerated convergence of iterative solvers when used as global correctors of a coarser solution, is generalized to structures other than grids, one obtains multilevel, multiscale or multiresolution methods [25].
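As a concrete, self-contained illustration of this idea (our sketch for the 1D Poisson equation, unrelated to the Plasm.js code base), the following two-grid cycle uses a weighted-Jacobi smoother as global corrector of an exact coarse solution:

```javascript
// Two-grid correction for the 1D Poisson problem -u'' = f on (0, 1) with
// homogeneous Dirichlet conditions. Names and discretization are ours.

// Matrix-free application of A = (1/h^2) tridiag(-1, 2, -1).
const applyA = (u, h) => u.map((ui, i) => {
  const left = i > 0 ? u[i - 1] : 0;
  const right = i < u.length - 1 ? u[i + 1] : 0;
  return (2 * ui - left - right) / (h * h);
});

// Weighted Jacobi smoother: u <- u + omega D^{-1} (f - A u), D = 2/h^2.
const jacobi = (u, f, h, sweeps, omega = 2 / 3) => {
  for (let s = 0; s < sweeps; s++) {
    const Au = applyA(u, h);
    u = u.map((ui, i) => ui + omega * (h * h / 2) * (f[i] - Au[i]));
  }
  return u;
};

// Full-weighting restriction (fine -> coarse) and linear prolongation.
const restrict = r => Array.from({ length: (r.length - 1) / 2 },
  (_, i) => (r[2 * i] + 2 * r[2 * i + 1] + r[2 * i + 2]) / 4);
const prolong = (e, nf) => Array.from({ length: nf }, (_, i) =>
  i % 2 === 1
    ? e[(i - 1) / 2]
    : ((i > 0 ? e[i / 2 - 1] : 0) + (i < nf - 1 ? e[i / 2] : 0)) / 2);

// Exact coarse solve by the Thomas algorithm for (1/h^2) tridiag(-1, 2, -1).
const solveCoarse = (f, h) => {
  const n = f.length, diag = 2 / (h * h), off = -1 / (h * h);
  const c = new Array(n), dd = new Array(n);
  c[0] = off / diag; dd[0] = f[0] / diag;
  for (let i = 1; i < n; i++) {
    const m = diag - off * c[i - 1];
    c[i] = off / m;
    dd[i] = (f[i] - off * dd[i - 1]) / m;
  }
  const x = new Array(n); x[n - 1] = dd[n - 1];
  for (let i = n - 2; i >= 0; i--) x[i] = dd[i] - c[i] * x[i + 1];
  return x;
};

// One two-grid cycle: pre-smooth, coarse-grid correction, post-smooth.
const twoGrid = (u, f, h) => {
  u = jacobi(u, f, h, 3);
  const Au = applyA(u, h);
  const r = f.map((fi, i) => fi - Au[i]);
  const ef = prolong(solveCoarse(restrict(r), 2 * h), u.length);
  return jacobi(u.map((ui, i) => ui + ef[i]), f, h, 3);
};

const nf = 15, h = 1 / (nf + 1);
const f = new Array(nf).fill(1);
const norm = v => Math.hypot(...v);
const r0 = norm(f);                          // residual of the zero guess
const u = twoGrid(new Array(nf).fill(0), f, h);
const Au = applyA(u, h);
const r1 = norm(f.map((fi, i) => fi - Au[i]));
console.log(r1 / r0);                        // far below 1 after one cycle
```

A single cycle already shrinks the residual dramatically, which is precisely the accelerated convergence of iterative solvers that the text refers to.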

The multidimensional HCC decomposition will provide a powerful toolbox supporting the whole family of multigrid methods, in standard 2D and 3D settings as well as in 4D and higher dimensions. Of course, the required storage grows quite rapidly. In particular, the total number m_{h+1} of cells of the Steiner d-complex Q^d_{h+1}, generated by applying a single HCC refinement to the complex Q^d_h, is given by

m_{h+1} = n_{h+1} + 2 |Q^1_h| + · · · + 2^d |Q^d_h| ,

where the number |Q^k_{h+1}| of k-cells of the (h+1)-th refinement is 2^k |Q^k_h| (1 ≤ k ≤ d), and the number of vertices of the refinement is n_{h+1} = Σ_{k=0}^{d} |Q^k_h|.
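These counting rules are easy to apply mechanically. The snippet below (our notation) refines the vector of skeleton sizes of a 2 × 2 quad grid one level, following the rules stated in the text.

```javascript
// Skeleton sizes [|Q^0_{h+1}|, ..., |Q^d_{h+1}|] after one HCC refinement,
// following the counting rules quoted in the text.
const refineCounts = Q => Q.map((q, k) => k === 0
  ? Q.reduce((a, b) => a + b, 0)   // n_{h+1} = sum_k |Q^k_h|
  : 2 ** k * q);                   // |Q^k_{h+1}| = 2^k |Q^k_h|
const total = Q => Q.reduce((a, b) => a + b, 0);

// 2 x 2 quad grid: 9 vertices, 12 edges, 4 faces.
console.log(refineCounts([9, 12, 4]), total(refineCounts([9, 12, 4])));
// [25, 24, 16]  65
```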

The standard implementation of multigrid methods should carry over quite simply and naturally to HCC refinements, because the data structures we use to represent incidence and adjacency relations between cells comfortably accommodate the usual storage pattern, with cells stored in indexed arrays bounded by multiples of 2^k. A more abstract implementation of discrete differential operators, based on their connection with the (co)boundary operators on the (co)chain complex constructed on top of the cell complex, is also readily available.

4.3.6. Basic operations

Plasm.js provides a number of topological and geometric operators out of the box. Other operators may be constructed by suitably combining the predefined ones. User-defined additions to the library may be implemented either in JavaScript or in CoffeeScript, usually in a script file located on the web server or even on the client machine. A choice is given between local and distributed storage of a cell complex. In the first case, standard integer indices are used to describe the topology and to access cells, cell elements and relations; in the second case, the BigPointSet class is used for vertices and control points, and zetta-tokens (described in Section 4.4) are associated with an appropriate partition of the object support space.

Primitive objects. Several geometric primitives are predefined, including d-simplicial and d-cuboidal grids, and their Steiner and Delaunay refinements. Other primitives are d-circles, d-balls, d-tori, d-cylinders and d-cones. Instance generators of polytopal, cuboidal or simplicial complexes require as input a PointSet object defining the vertices, and the specification of the Topology of the complex through the array of cells of highest dimension, each given in turn as an array of vertex indices. Methods on complexes include general affine transformations and dimension-independent translations, dilations, rotations and shears.
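A hypothetical instantiation in this style might look as follows; the literal object shape and the translate helper are ours, for illustration only, not the actual Plasm.js API.

```javascript
// Illustrative sketch: a simplicial 2-complex built from a point set plus
// the array of top-dimensional cells, each an array of vertex indices.
const complex = {
  points: [[0, 0], [1, 0], [0, 1], [1, 1]],   // the PointSet role
  cells:  [[0, 1, 2], [1, 3, 2]]              // the Topology role: 2 triangles
};

// A dimension-independent translation, in the spirit of the methods listed
// above: add the translation vector componentwise to every point.
const translate = (c, t) => ({
  ...c,
  points: c.points.map(p => p.map((x, i) => x + t[i]))
});

console.log(translate(complex, [1, 2]).points[0]);   // [1, 2]
```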

Topology editing, boundary, coboundary. Local editing of complexes is provided by methods of the Hasse graph class, where nodes and edges may be inserted and deleted, and several query operations (nodes by level, ancestors and descendants, node visualization, et cetera) are available. The graph of a complex is usually built starting from a complex instance. The whole hierarchy of boundary and coboundary operators ∂_k (1 ≤ k ≤ d) and δ_h (0 ≤ h ≤ d−1) is readily available for any given d-complex, represented using (sparse) signed incidence matrices. The extraction of the boundary complex of a complex may return data structures indexed either on the input vertices or on the smaller set of output vertices. The extraction of k-skeletons is also provided.
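The signed incidence matrices are easy to exhibit on a small example. Below (our illustration, dense matrices for readability) are ∂1 and ∂2 for a single triangle; multiplying them verifies the characteristic identity "the boundary of a boundary is empty".

```javascript
// Signed incidence matrices for one triangle (v0, v1, v2):
// edges e0=[v0,v1], e1=[v1,v2], e2=[v0,v2], face f=[v0,v1,v2].
const d1 = [[-1,  0, -1],     // rows: vertices, cols: edges
            [ 1, -1,  0],
            [ 0,  1,  1]];
const d2 = [[ 1], [ 1], [-1]];   // rows: edges, col: the face

// Naive dense matrix product (real code would use sparse matrices).
const matmul = (A, B) => A.map(row =>
  B[0].map((_, j) => row.reduce((s, a, k) => s + a * B[k][j], 0)));

console.log(matmul(d1, d2));   // [[0],[0],[0]]: boundary of boundary vanishes
```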

Convex hulls and triangulations. The most basic operation in solid modeling, comparable to sorting in computer science, is the computation of the convex hull of a discrete set of points in E^n. Our kernel introduces a variation, for very large datasets, of the dimension-independent QuickHull algorithm [6]. A forward step is executed first, based on the possible pruning of interior points during the computation of the zetta-tokens of regions, until the leaves of the space partition contain only a single maximal k-simplex (0 ≤ k ≤ d) from the input vertices. A backward step follows, with pairwise merging of polytopes, starting from the set of simplices previously detected, whose pairwise adjacency is encoded within the zetta-tokens. The pruning in the forward step is disabled when a simplicial triangulation of the whole set of input points is required. The polytopal complex of boundary points is usually returned.

Cartesian product, Boolean operations, Minkowski sum. In our view, these are the most important high-level operations in solid modeling. The Cartesian product of polytopal complexes [7] corresponds closely to the Cartesian product of their Hasse graphs. Boolean operations on cell complexes will use a dimension-independent variation of the zigzag algorithm normally used in 2D. The Minkowski sum of a cell complex with a polytope will use [18] a mixture of straight extrusions into higher-dimensional spaces and projections onto lower-dimensional spaces, with filtering of empty and redundant cells.
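At the level of cell counts, the product correspondence is a convolution of f-vectors: an (i+j)-cell of the product pairs an i-cell of one factor with a j-cell of the other. A minimal sketch (ours, not the kernel's product operator):

```javascript
// Cell counts of a Cartesian product of complexes: the f-vectors convolve,
// since each (i+j)-cell of P x Q is a pair of an i-cell and a j-cell.
const productF = (fP, fQ) => {
  const out = new Array(fP.length + fQ.length - 1).fill(0);
  fP.forEach((p, i) => fQ.forEach((q, j) => { out[i + j] += p * q; }));
  return out;
};

// interval x interval = square: 4 vertices, 4 edges, 1 face.
console.log(productF([2, 1], [2, 1]));   // [4, 4, 1]
```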

Hierarchical assemblies. The management of structures, also called structured objects or object assemblies in PLM, or scene graphs in computer graphics and visualization, has always been done in Plasm via the definition of ordered sequences of geometric values (i.e., cell complexes) and affine transformations, with the semantics of PHIGS structures, where each transformation is applied in order to all the geometric values following it in the sequence. It is implemented as a binary graph with transformations on the right leaves and references to cell complexes on the non-leaf nodes.

Mapping and filtering: support of curved geometries. The foundation for curved geometries in Plasm.js is the support of parametric geometry via simplicial maps over cell complexes. A very general map primitive accepts as input an array of coordinate functions R^d → R and a polytopal complex P. The complex is checked for type, and possibly refined into a simplicial complex ∆(P). Then a simplicial map is applied to the vertices using the given coordinate functions, whose number equals the dimension of the target space. A final step checks for the possible filtering out of empty cells, as well as for the possible gluing of mapped faces and/or identification of mapped vertices. This approach is very general and flexible: it allows not only for applying the standard numerical methods of parametric geometry with polynomial or rational coordinate functions, but even for mixing them freely with algebraic or transcendental functions. For example, our implementation of quadratic primitives (d-circles, d-balls and so on) uses sin(·) and cos(·) functions rather than NURBS. When possible, the hardware-supported WebGL implementation of splines and NURBS is used, for efficiency. Last but not least, since JavaScript allows for higher-order functions, the library provides for an easy implementation of transfinite methods, as ever one of Plasm's strong points.

4.4. Zetta-tokens for spatial indexing and caching

In recent years, efficient methods for indexing geometric data have been developed for a wide variety of applications, notably GIS, geographical databases, and the progressive display of large sets of high-resolution 2D or 3D images, such as satellite imagery or body scans. The topic goes under the names of spatial database and spatial index: a spatial database is optimized to store and query data related to objects in space, including points, lines and polygons, and spatial indexes are used by spatial databases to optimize spatial queries. In particular, highly efficient indexing techniques have been devised based on space-filling curves, thanks to the total ordering of their hierarchical fractal structure.

In our prototype geometry kernel we have introduced a novel method to compute a total proximity ordering of multidimensional points, using an indexing strategy based on the computation of a hierarchy of affine parameterizations of convex subsets of points. In particular, it is guaranteed that the distance between any pair of points in an interval of the linearly ordered set is bounded above by a suitable function of the distance between the two interval extremes.

Our solution computes a partition of the data into convex subsets, and maps all data inside a subset to an integer index, so that a simple sort may move all the data within each convex subset close to each other. The index value associated with all data inside the same cell of the leaf refinement of the model is stored using a compact coding as a numeric string over an alphabet of high cardinality. In our prototype implementation we use 36 digits (0–9 and A–Z), i.e., a base-36 coding.

The computation of this spatial index, named zetta-token, starts from a parametrization of E^d with affine coordinates, dependent on the choice of a reference simplex Σ with affinely independent vertices. This choice induces a tiling of the d-dimensional affine space into 2^{d+1} − 1 convex regions. Each tile contains the whole collection of points with h nonnegative and k negative affine coordinates, with h + k = d + 1. Every tile is convex, since the convex hull of every pair of points in it contains only points with the same coordinate signs as the two extremes. Since each coordinate is either negative or nonnegative, there are exactly 2^{d+1} − 1 tiles, the case h = 0 (all coordinates negative) being excluded. Each tile is characterized by the binary number whose bits codify the signs of the corresponding coordinates. This affine partitioning is repeated hierarchically, and the corresponding bits are accumulated in the tokens associated with the leaf regions.
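One level of this sign partition can be sketched in 2D as follows (our illustrative helpers, not the kernel's zetta-token code): compute the affine coordinates of a point with respect to a reference triangle and read off the sign bits.

```javascript
// Affine (barycentric) coordinates of p w.r.t. the reference triangle
// (a, b, c); they sum to one, so they cannot all be negative.
const det2 = (u, v) => u[0] * v[1] - u[1] * v[0];
const sub = (u, v) => [u[0] - v[0], u[1] - v[1]];
const affineCoords = (p, [a, b, c]) => {
  const A = det2(sub(b, a), sub(c, a));
  const s = det2(sub(b, p), sub(c, p)) / A;
  const t = det2(sub(c, p), sub(a, p)) / A;
  return [s, t, 1 - s - t];
};

// The tile of p is the bit string of coordinate signs (1 = nonnegative);
// "000" never occurs, leaving the 2^(d+1) - 1 = 7 tiles of the 2D case.
const tileBits = p =>
  affineCoords(p, [[0, 0], [1, 0], [0, 1]])
    .map(x => (x >= 0 ? '1' : '0')).join('');

console.log(tileBits([0.25, 0.25]), tileBits([2, 2]));   // 111 011
```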

The space tiling defined above may be locally refined or updated at will, without requiring a global restructuring of the index values. The key point is that the data subsets associated with each index interval are guaranteed to be spatially contiguous. As a consequence, one may locate a query point in logarithmic time. Moreover, after the first access, one just has to read the data closest to the seed point on the two sides of this totally ordered index. In other words, when looking for spatially contiguous data, search is maximally efficient and cache coherence is always guaranteed: neither data misses nor data repetitions can happen.

4.5. Data management: ZFS and CouchDB via REST services

Big data are commonly defined as datasets that grow so large that it becomes very hard to work with them using standard database management tools [28,16]. Difficulties include storage, searching, sharing, computation and visualization. In our prototype architecture, we are experimenting with geometric data storage and retrieval using a CouchDB database over the Zettabyte File System (ZFS), with transactions operated via REST web services. ZFS is a 128-bit file system whose storage capacity is, for all practical purposes, unlimited. It enjoys many other features, such as rock-solid data integrity, easy administration, and a simplified model for managing data. CouchDB, from the Apache Software Foundation, is an open-source document-oriented database, written mostly in the Erlang programming language, that can be queried and indexed using JavaScript in a MapReduce fashion [1]. It is designed for local replication and to scale horizontally across a wide range of devices. In CouchDB, a document may be thought of as a collection of key–value pairs expressed in JSON, ordered lists and associative maps included. Each data view is constructed by a (server-side) JavaScript function that acts as the 'map' half of a 'map/reduce' operation over the representation of resources, in our case the subcomplexes originated by multilevel domain partitions. The representation of a cell subcomplex is exactly a JavaScript Object Notation (JSON) document that captures its current state (possibly compressed, to speed up transmission). When compacting a database, CouchDB writes out a completely new database file, which can take advantage of the gzip compression settings on ZFS.
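A sketch of such a transaction follows; the document shape, database name and URL are illustrative assumptions of ours, not the actual schema. A cell subcomplex is serialized as JSON and addressed at a directory-like URI via an HTTP PUT.

```javascript
// Hypothetical CouchDB document for a cell subcomplex (shape is ours).
const subcomplexDoc = {
  _id: 'subcomplex:3A7F',          // e.g. keyed by its zetta-token
  points: [[0, 0], [1, 0], [0, 1]],
  cells: [[0, 1, 2]]
};

// A REST PUT creates/updates the resource at a directory-like URI;
// the body is exactly the internal JSON representation of the object.
const request = {
  method: 'PUT',
  url: `http://localhost:5984/geometry/${encodeURIComponent(subcomplexDoc._id)}`,
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(subcomplexDoc)
};
// e.g. fetch(request.url, { method: request.method,
//                           headers: request.headers, body: request.body })

// Since the wire format is the internal format, it round-trips exactly:
console.log(JSON.parse(request.body).cells[0]);   // [0, 1, 2]
```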

Let us recall that a web service is a software system designed to support interoperable machine-to-machine interaction over a network. The primary purpose of REST-compliant web services is to manipulate representations of web resources using a uniform set of stateless operations, i.e., using the HTTP methods explicitly. Furthermore, REST web services expose directory-structure-like URIs, and may directly transfer JSON, i.e., the exact internal representation of the geometric objects at hand. Other primary characteristics of REST services include scalability of component interactions, generality of interfaces, and independent deployment of components to interacting machines.


5. Closure: be simple, general, robust, distributed

The most important lesson we learned in the recent past, and try to convey here, is that the more complex and massive the object to be modeled and simulated, and the huger and more distributed its data, the simpler and more general the modeling and simulation software should be: built on the solid foundations provided by a few robust computational paradigms, and naturally oriented towards distributed and/or parallel computing. In this respect, we believe that the right software tools are already here: they are the most general and simple instruments available these days, i.e., Internet protocols, languages, and technologies, to be teamed with a careful rethinking of many of the fundamental assumptions about model representation and management, data storage and buffering.

References

[1] Anderson, J. C., Lehnardt, J., and Slater, N. CouchDB: The Definitive Guide, 1st ed. O'Reilly Media, 2010.

[2] Backus, J. Can programming be liberated from the von Neumann style? A functional style and its algebra of programs. Communications of the ACM 21, 8 (1978), 613–641. ACM Turing Award Lecture.

[3] Backus, J., Williams, J., Wimmers, E., Lucas, P., and Aiken, A. FL language manual, parts 1 and 2. Tech. rep., IBM Research Report, 1989.

[4] Backus, J., Williams, J. H., and Wimmers, E. L. An introduction to the programming language FL. In Research Topics in Functional Programming. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1990, pp. 219–247.

[5] Bajaj, C., DiCarlo, A., and Paoluzzi, A. Proto-plasm: a parallel language for scalable modeling of biosystems. Philosophical Transactions of the Royal Society A 366, 1878 (September 2008), 3045–3065. Issue "The virtual physiological human: building a framework for computational biomedicine I", compiled by Marco Viceconti, Gordon Clapworthy, Peter Coveney and Peter Kohl.

[6] Barber, C. B., Dobkin, D. P., and Huhdanpaa, H. The quickhull algorithm for convex hulls. ACM Trans. Math. Softw. 22, 4 (Dec. 1996), 469–483.

[7] Bernardini, F., Ferrucci, V., Paoluzzi, A., and Pascucci, V. Product operator on cell complexes. In Proceedings of the Second ACM Symposium on Solid Modeling and Applications (New York, NY, USA, 1993), SMA '93, ACM, pp. 43–52.

[8] Bossavit, A. Computational Electromagnetism. Academic Press, 1998.

[9] Briggs, W. L., Henson, V. E., and McCormick, S. F. A Multigrid Tutorial, 2nd ed. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2000.

[10] Catmull, E., and Clark, J. Recursively generated B-spline surfaces on arbitrary topological meshes. Computer-Aided Design 10, 6 (1978), 350–355.

[11] Chakode, R., Méhaut, J.-F., and Charlet, F. High performance computing on demand: sharing and mutualization of clusters. In Advanced Information Networking and Applications (AINA), 2010 24th IEEE International Conference on (April 2010), pp. 126–133.

[12] DiCarlo, A. G. Lamé vs. J. C. Maxwell: how to reconcile them. In Scientific Computing in Electrical Engineering, W. Schilders, E. ter Maten, and S. Houben, Eds., vol. 4 of Mathematics in Industry Series. Springer, 2004, pp. 1–13.

[13] DiCarlo, A., Milicchio, F., Paoluzzi, A., and Shapiro, V. Solid and physical modeling with chain complexes. In ACM Solid and Physical Modeling Symposium (Beijing, China, 2007), ACM Press.

[14] Encarnacao, J., and Schlechtendahl, E. G. Computer Aided Design: Fundamentals and System Architectures. Springer-Verlag, 1983.

[15] Hayes, B. Cloud computing. Commun. ACM 51, 7 (July 2008).

[16] Hellerstein, J. Parallel programming in the age of big data. Gigaom Blog, November 2008. http://gigaom.com/2008/11/09/mapreduce-leads-the-way-for-parallel-programming/.

[17] Li, B., Li, X., Wang, K., and Qin, H. Generalized polycube trivariate splines. In Proceedings of the 2010 Shape Modeling International Conference (Washington, DC, USA, 2010), SMI '10, IEEE Computer Society, pp. 261–265.

[18] Paoluzzi, A. Geometric Programming for Computer Aided Design. John Wiley & Sons, Chichester, UK, 2003.

[19] Paoluzzi, A., Bernardini, F., Cattani, C., and Ferrucci, V. Dimension-independent modeling with simplicial complexes. ACM Transactions on Graphics 12, 1 (1993), 56–102.

[20] Paoluzzi, A., Pascucci, V., and Vicentino, M. Geometric programming: a programming approach to geometric design. ACM Transactions on Graphics 14, 3 (1995), 266–306.

[21] Requicha, A. A. G. Representations for rigid solids: theory, methods, and systems. ACM Comput. Surv. 12 (December 1980), 437–464.

[22] Tonti, E. On the formal structure of physical theories. Tech. rep., Istituto di Matematica del Politecnico di Milano, 1975.

[23] Tonti, E. A direct discrete formulation of field laws: the cell method. Computer Modeling in Engineering and Sciences 2, 2 (2001), 237–258.

[24] Tonti, E. Finite formulation of electromagnetic field. IEEE Transactions on Magnetics 38, 2 (2002), 333–336.

[25] Trottenberg, U., Oosterlee, C., and Schüller, A. Multigrid. Academic Press, London, 2001.

[26] Wang, H., He, Y., Li, X., Gu, X., and Qin, H. Polycube splines. In Proceedings of the 2007 ACM Symposium on Solid and Physical Modeling (New York, NY, USA, 2007), SPM '07, ACM, pp. 241–251.

[27] Wang, K., Li, X., Li, B., Xu, H., and Qin, H. Restricted trivariate polycube splines for volumetric data modeling. IEEE Trans. Vis. Comput. Graph. 18, 5 (2012), 703–716.

[28] White, T. Hadoop: The Definitive Guide, 1st ed. O'Reilly Media, 2009.

[29] Ziegler, G. M. Lectures on Polytopes. Springer, 1995.
