9
Scalable Isosurface Visualization of Massive Datasets on COTS Clusters Xiaoyu Zhang Chandrajit Bajaj William Blanke Department of Computer Sciences and Center for Computational Visualization, TICAM Department of Electrical and Computer Engineering University of Texas at Austin http://www.ticam.utexas.edu/CCV Abstract Our scalable isosurface visualization solution on a commodity off- the-shelf cluster is an end-to-end parallel and progressive platform, from the initial data access to the final display. In this paper we focus on the back end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It partitions the volume data according to its workload spectrum for load balancing and cre- ates an I/O-optimal external interval tree to minimize the number of I/O operations of loading large data from disk. It achieves scal- ability by using both parallel processing and parallel disks. Inter- active browsing of extracted isosurfaces is made possible by using parallel isosurface extraction and rendering in conjunction with a new specialized piece of image compositing hardware called the Metabuffer. We also describe an isosurface compression scheme that is efficient for isosurface processing. CR Categories: I.3.1 [Computer Graphics]: Hardware Architecture—Parallel Processing; I.3.8 [Computer Graphics]: Ap- plications Keywords: Parallel Rendering, Metabuffer, Multi-resolution, Pro- gressive mesh, Parallel and Out-of-core Isocontouring 1 Introduction Today tomographic imaging and computer simulations are increas- ingly yielding massive datasets. Interactive and exploratory visual- ization has rapidly become an indispensable tool to determine and browse regions of interest within volumetric imaging data, and ver- ify and validate the results of computer simulations. One paradigm of exploratory visualization is to extract multiple 2-dimensional surfaces satisfying const from a given scalar field , , and render it at interactive frame rate ( ). This inter- active and exploratory visualization technique s popularly known as isocontour visualization. Isocontour visualization for extremely large datasets poses chal- lenging problems for both computation and rendering with guar- anteed frame rates. First, large isosurfaces are to be extracted in Commodity off-the-shelf time-critical manner from those large datasets, whose sizes are from multi-gigabytes to terabytes. As the size of the input data increases, isocontouring algorithms necessarily need to be executed out-of- core and/or on parallel machines for both efficiency and data acces- sibility. Second, the interactive aspect of the isocontour visualiza- tion demands that the scene is rendered quickly in order to provide responsive feedback to the user. In some cases, the detail allowed by a single high performance monitor may not be adequate for the resolution required. An even more common problem is that the dataset itself may be too large to store and render on a single ma- chine. Third, the extracted isosurface may need to be transmitted from the computational servers to the rendering servers via the net- work, if the computational and rendering servers do not coexist on the same machines. It may also need to be saved on disk for future studies. Compact representation of the isosurfaces should be used in order to meet the time limit of data transmission and save disk space. Related Work: Hansen and Hinker describe parallel methods for isosurface extraction on SIMD machines [17]. Ellsiepen de- scribes a parallel isosurfacing method for FEM data by dynami- cally distributing working blocks to a number of connected work- stations [13]. Shen, Hansen, Livnat and Johnson implement a par- allel algorithm by partitioning load in the span space [28]. Parker et al. present a parallel isosurface rendering algorithm using ray tracing [24]. Chiang and Silva give an implementation of out-of- core isocontouring using the I/O optimal external interval tree on a single processor [8, 9]. Bajaj et al. use range partition to reduce the size of data that are loaded for given isocontour queries and balance the load within a range partition [3]. In this paper, we propose and implement a parallel and out-of-core isocontouring algorithm using parallel processors and parallel I/O, which would be fully scalable to arbitrarily large datasets. Many research groups have recently studied the problem of using fast improving PC graphics cards for parallel rendering [12,18,20,26,27]. Schneider analyzes the suitability of PCs for par- allel rendering for four parallel polygon rendering scenarios: ren- dering of single and multiple frames on symmetric multiprocessors and clusters [27]. Samanta et al. discuss various load balancing schemes for a multi-projector rendering system driven by multiple PCs [26]. Heirich and Moll demonstrate how to build a scalable image composition system using off-the-shelf components [18]. In general, most parallel rendering methods can be classified based on where data is sorted from object-space to image-space [22]. In the sort-first approach, the display space is broken into a num- ber of non-overlapping display regions, which can vary in size and shape. Sort-first methods may suffer from load imbalance in both the geometric processing and rasterization if polygons are not evenly distributed across the screen partitions, because polygons are assigned to the rendering process before geometric processing. The Princeton University SHRIMP project [26] uses the sort-first approach to balance the load of multiple PC graphical worksta- tions. The sort-middle approach distributes transformed primitives

Scalab le Isosurface Visualization of Massive Datasets on

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Scalab le Isosurface Visualization of Massive Datasets on

Scalab� le Isosurface Visualization of Massive Datasets on COTS � Clusters

Xiaoyu Zhang� ChandrajitBajaj� William Blanke�� Departmentof ComputerSciencesandCenterfor ComputationalVisualization,TICAM

� Departmentof ElectricalandComputerEngineeringUniversityof Texasat Austin

http://www.ticam.utexas.edu/CCV

Abstract

Our scalableisosurfacevisualizationsolutionon a commodityoff-the-shelfclusteris anend-to-endparallelandprogressive platform,from the initial dataaccessto the final display. In this paperwefocusonthebackendscalabilityby introducinga fully parallelandout-of-coreisosurfaceextractionalgorithm.It partitionsthevolumedataaccordingto its workloadspectrumfor loadbalancingandcre-atesan I/O-optimal external interval treeto minimize the numberof I/O operationsof loadinglargedatafrom disk. It achievesscal-ability by usingboth parallelprocessingandparalleldisks. Inter-active browsingof extractedisosurfacesis madepossibleby usingparallel isosurfaceextractionandrenderingin conjunctionwith anew specializedpieceof imagecompositinghardware called theMetabuffer. We alsodescribean isosurfacecompressionschemethatis efficient for isosurfaceprocessing.

CR Categories: I.3.1 [Computer Graphics]: HardwareArchitecture—ParallelProcessing;I.3.8 [ComputerGraphics]:Ap-plications

Keywords: ParallelRendering,Metabuffer, Multi-resolution,Pro-gressive mesh,ParallelandOut-of-coreIsocontouring

1 Introduction

Todaytomographicimagingandcomputersimulationsareincreas-ingly yieldingmassive datasets.Interactive andexploratoryvisual-izationhasrapidly becomeanindispensabletool to determineandbrowseregionsof interestwithin volumetricimagingdata,andver-ify andvalidatetheresultsof computersimulations.Oneparadigmof exploratory visualizationis to extract multiple 2-dimensionalsurfacessatisfying ������� const from a given scalarfield ����� ,������� , andrenderit at interactive framerate( ������� ). This inter-active andexploratoryvisualizationtechniques popularlyknownasisocontourvisualization.

Isocontourvisualizationfor extremelylargedatasetsposeschal-lenging problemsfor both computationand renderingwith guar-anteedframerates. First, large isosurfacesare to be extractedin

�Commodityoff-the-shelf

time-criticalmannerfrom thoselargedatasets,whosesizesarefrommulti-gigabytesto terabytes.As thesizeof theinputdataincreases,isocontouringalgorithmsnecessarilyneedto be executedout-of-coreand/oronparallelmachinesfor bothefficiency anddataacces-sibility. Second,the interactive aspectof the isocontourvisualiza-tion demandsthatthesceneis renderedquickly in orderto provideresponsive feedbackto theuser. In somecases,thedetailallowedby a singlehigh performancemonitormaynot beadequatefor theresolutionrequired. An even more commonproblemis that thedatasetitself may be too large to storeandrenderon a singlema-chine. Third, the extractedisosurfacemay needto be transmittedfrom thecomputationalserversto therenderingserversvia thenet-work, if thecomputationalandrenderingserversdo not coexist onthesamemachines.It mayalsoneedto besavedon disk for futurestudies.Compactrepresentationof the isosurfacesshouldbeusedin orderto meetthe time limit of datatransmissionandsave diskspace.Related Work: Hansenand Hinker describeparallel methodsfor isosurfaceextraction on SIMD machines[17]. Ellsiepende-scribesa parallel isosurfacing methodfor FEM databy dynami-cally distributing working blocksto a numberof connectedwork-stations[13]. Shen,Hansen,LivnatandJohnsonimplementa par-allel algorithmby partitioningload in thespanspace[28]. Parkeret al. presenta parallel isosurfacerenderingalgorithm using raytracing[24]. ChiangandSilva give an implementationof out-of-coreisocontouringusingtheI/O optimalexternalinterval treeon asingleprocessor[8,9]. Bajajetal. userangepartitionto reducethesizeof datathatareloadedfor givenisocontourqueriesandbalancetheloadwithin a rangepartition[3]. In this paper, we proposeandimplementaparallelandout-of-coreisocontouringalgorithmusingparallelprocessorsandparallelI/O, which would befully scalableto arbitrarily largedatasets.

Many researchgroups have recently studied the problem ofusing fast improving PC graphics cards for parallel rendering[12,18,20,26,27]. Schneideranalyzesthesuitabilityof PCsfor par-allel renderingfor four parallelpolygonrenderingscenarios:ren-deringof singleandmultiple framesonsymmetricmultiprocessorsandclusters[27]. Samantaet al. discussvariousload balancingschemesfor a multi-projectorrenderingsystemdrivenby multiplePCs[26]. Heirich andMoll demonstratehow to build a scalableimagecompositionsystemusingoff-the-shelfcomponents[18]. Ingeneral,mostparallelrenderingmethodscanbeclassifiedbasedonwheredatais sortedfrom object-spaceto image-space[22].

In thesort-firstapproach,thedisplayspaceis brokeninto anum-ber of non-overlapping display regions, which can vary in sizeandshape.Sort-firstmethodsmay suffer from load imbalanceinboththegeometricprocessingandrasterizationif polygonsarenotevenly distributed acrossthe screenpartitions,becausepolygonsareassignedto therenderingprocessbeforegeometricprocessing.The PrincetonUniversity SHRIMP project [26] usesthe sort-firstapproachto balancethe load of multiple PC graphicalworksta-tions. Thesort-middleapproachdistributestransformedprimitives

Page 2: Scalab le Isosurface Visualization of Massive Datasets on

insteadof polygonsto thegraphicspipesresponsiblefor thescreenpartitions.�

The sort-lastapproachis also known as image composition.Eachrenderingprocessperformsboth geometricprocessingandrasterizationindependentof all other renderingprocesses.Localimagesrenderedon the renderingprocessesare compositedto-getherto form the final image. The sort-lastmethodmakes theload balancingproblemeasiersincescreenspaceconstraintsareremoved. However, compositinghardware is neededto combinethe outputof the variousprocessorsinto a single correctpicture.Suchapproacheshave beenusedsincethe 60’s in single-displaysystems[15,23], andmorerecentwork includes[14,18].

Parallel & Progressive

Image �

Composition �

Parallel & Progressive

Surface

Extraction & Rendering

Hierarchical Volume Dataset

Figure1: A schematicdrawing showing thethreestagesof scalableisosurfacevisualizationpipeline.

Our solution to the image compositing problem is theMetabuffer, whosearchitectureis shown in figure3. This is a sort-last multi-display imagecompositingsystemwith several uniquefeaturessuchasmulti-resolutionandantialiasing[6]. A very simi-lar project,thoughcurrentlywithoutstressingmulti-resolutionsup-port, existsat StanfordUniversity andis calledLightening-2[19].The Metabuffer hardwaresupportsa scalablenumberof PCsandan independentlyscalablenumberof displays–thereis no a prioricorrespondencebetweenthe numberof renderersandthe numberof displaysto beused.It alsoallowsany rendererto beresponsiblefor any axis-alignedrectangularviewport within theglobaldisplayspaceat eachframe. Suchviewportscanbemodifiedon a frame-by-framebasis,canoverlaptheboundariesof displaytilesandeachotherarbitrarily, andcanvary in sizeupto thesizeof theglobaldis-playspace.Thuseachmachinein thenetwork is givenequalaccessto all partsof thedisplayspace,andtheoverallscreenis treatedasauniformdisplayspace,thatis, asthoughit weredrivenvia asingle,largeframebuffer, hencethenameMetabuffer.

While compressionis importantfor handlinglargedatasets,onepossibleapproachto theisosurfacecompressionproblemis to firstextract the isosurfaceinto triangularmeshesand then apply to itone of the surfacecompressionalgorithms[4, 10,11,16,29,30].Although it is conceptuallysimple,this methodhasseveral disad-vantages.First therenderingservershave to wait until thecompu-tationalserversfinish both isosurfaceextractionandcompression.Thecompressionof thesurfaceoftentakesa very long time, espe-cially truefor thevery largesurfacesextractedfrom largedatasets,which contradictsthegoalof usingcompressionfor real-timeren-dering.Furthermore,isosurfaceshave thepropertythateachvertexis anintersectionpointwith oneuniqueedgeof the3D volume.Analgorithmdesignedspecificallyfor an isosurfacemay get a bettercompressionratio. In this paperwe describeanindex andfunctionvalueencodingschemethat compressesthe isosurfaceandallowsstreamingthe compresseddatato the renderingserversincremen-tally eitherduringthesurfaceextractionor from cacheondisk. Wealsoshow themethodachievesabettercompressionratiothansomegeneralpurposesurfacecompressionalgorithms.Main Results: With all threeaforementionedtechniquescom-bined,parallelandout-of-corecomputation,parallelrenderingandcompression,it is possibleto obtain a fully scalablesystemforinteractive isosurface visualizationacrossmultiple isovaluesandfrom differentviewpoints. In this paperwe focuson thebackendparallelandout-of-coreisosurfaceextraction,leaving thedetailsofprogressive imagecompositionusing the Metabuffer and its per-formanceresultsto a separatepaper[6]. The restof our paperis

organizedas follows: Section2 briefly describesthe architectureof our framework for scalableisosurfacevisualization. Section3givesthedetailsof ourparallelandout-of-coreisocontouringalgo-rithm. Section3.2providesthedetailof our isosurfacecompressionmethodandcomparesits resultsto thatof anothersurfacecompres-sion algorithm. Section4 gives the performanceof our parallelimplementationon a COTScluster.

2 Framework

Network

Rendering Server

P r o j ! e c t o r

P r o j ! e c t " o r

P r o j ! e c t " o r

P r o j ! e c t o r

Rendering Server

Rendering

Server

Computing

Processor

Computing Processor

Computing Processor

. . .

Cluster #

M e t a B

u f f e r

D i s p l a y

Parallel Front-Side Parallel Back-side

Observer

Figure2: Oursystemarchitecturefor scalableisosurfacevisualiza-tion. The parallelback-sideaccomplishesprogressive surfaceex-tractionandrenderingwhile theparallelfront-sidecompositesanddisplaysprogressively.

2.1 Pipelined Stages

Wecanthinkof theprocessof scalabletime-criticalisosurfacevisu-alizationof massive dataasa parallelandprogressive streamfrombackto front asshown in figure1. Trianglestreamsaregeneratedby theback-endnodesby progressively extractingthe isosurfaces.The trianglestreamfrom the extractionnodewill be renderedbythe middle parallel renderingservers. All renderedimageswillthenbe compositedby the Metabuffer to displayon a multi-tiledscreen.Theprocessof compositingimagesfrom multiplerenderingserversto multiple displaysusingtheMetabuffer is calledparallelimagecomposition.Giventherequiredframerate,therefreshtimebetweentwo framesneedsto be sharedamongthesethreestages:triangleextraction, renderingand imagecomposition. While theimagecompositiontime taken by the Metabuffer is constant,howmuchexternaltime left in theframeinterval determineswhatreso-lution of triangleswill beextractedandrendered.

2.2 Hierarchical Volumetric Data

Due to the large sizeof the massive dataset,it is extremelytime-consumingor evenimpossibleto do isosurfaceextractionon a sin-gle processor. In order to scaleto very large datasets,we useacomputationalbackendconsistingof bothparallelprocessorsandparallel disks. Large datasetsare partitionedamongthe parallelprocessorsin aload-balancedwayandstoredhierarchicallyondiskfor efficient I/O access.Thehierarchicalvolumedatacanbestoredondisk in compressedform [5]. Figure2 illustratestheparallelendto end framework for scalableisosurfacevisualizationon a com-modityoff-the-shelfcluster. Theparallelback-sideprovidesmulti-resolutionisosurfacesextractedfrom thevolumedatasetto satisfythetimelimit. Thisstageisgovernedby theparallelandprogressivetriangleextractionalgorithms. We will describein detail our par-allel triangleextractionalgorithmsin section3.1.Producingmulti-resolutionrepresentationof thedataat theback-sideis essentialforthetime critical renderingof massive data.Whentheuserchanges

Page 3: Scalab le Isosurface Visualization of Massive Datasets on

viewing parametersfrequently, coarserrepresentationsof the dataarerendered$ in order to give the userresponsive feedback.Onlywhenthe userchoosesa certainviewing positionandsomeinter-estingisovalue,arethe detailsof the progressive meshor isocon-tour streamedfor renderingin order to producehigherresolutionimage. To reducethe time of datatransmissionover the network,the extractedmeshmay be communicatedto renderingservers incompressedformat. Although the renderingservers might be onthe samesetof machinesasthe triangleextractionprocesses,theprogressive trianglemeshextractionprocessescanin generalscaleindependentlyof thenumberof parallelrenderingservers.

2.3 The Metabuffer

Onenovel featureof the framework is the parallel renderingandimagecompositionsystemthat is able to renderthe given scenein the least latency. The parallel front-side is built aroundtheMetabuffer [6], which is customhardware built from commodityPCcomponents.TheMetabuffer hardwareprovidesseveraluniqueadvantagesto assistin renderinglarge surfacein parallel,suchasarbitrarily locatedandoverlappedviewportsandmulti-resolution.Eachrenderingserver in figure 2 is mappedto a viewport on thescreenspace.A very importantproblemin the parallel renderingis how to positionthoseviewportsandpartitionthemeshsuchthateachrenderingserver hasapproximatelyanequalamountof work.

TheMetabuffer allows thenumberof renderingserversto scaleindependentlyfrom the the numberof display tiles. Since theMetabuffer allows theviewportsto belocatedanywherewithin thetotal displayspaceandoverlapeachother, it is possibleto achievea muchhigherdegreeof load balancing.Sincethe viewportscanvary in size,thesystemsupportsmulti-resolutionrendering,for in-stanceallowing a single machineto rendera backgroundat lowresolutionwhile othermachinesrenderforegroundobjectsatmuchhigherresolution.

Figure3: Our Metabuffer architecture,where % representsa ren-deringengine, & is an on-boardframebuffer, and ' representsacomposingunit.

Giventheprogressivity from thetriangleextractionstageto finalimagecompositionstagein our framework andthe fact that eachstageis fully parallelizable,we canachieve a truly scalablerender-ing of largeisosurfaces.

3 Parallel Algorithms

In this sectionwe will discussin moredetail theparallelandout-of-coreisocontouringalgorithmandtheisosurfacecompressional-gorithmmentionedin section2 thatenablestheframework for scal-abletime-criticalvisualizationof massivedatasets.Firstwediscussthe scalablealgorithm of extracting progressive isosurfacesfromlargevolumedatasets.

3.1 Scalable Isosurface Extraction

A scalabledataanalysisand visualizationapplicationmust takedataprocessing,I/O, network andrenderingall into consideration.Specificallyfor the isocontourextractionof large volumedata,itshouldhave loadbalancedparallelcomputationfor fastsurfaceex-traction,out-of-corecomputationto scaletodatasetsbiggerthanthesizeof total mainmemory, andparallelI/O to avoid thebottleneckof accessingmassive dataon disk.

We canmodelsucha systemthat combinesparallelprocessingandparalleldisk accesswith a modelcalledthe BSP-Diskmodel[25]. The BSP-Diskmodel consistsof ( interconnectedproces-sors,eachof which mayhave a local memoryanddisk. TheBSP-Disk modelcombinesthefeaturesof theBSPparallelcomputationmodel[31] andthePDM [32] paralleldiskmodel.It is adistributedmemoryparallelcomputationmodel,while eachprocessorcanac-cessits local disk in parallel. The BSP-Diskmodel very closelydescribesthe characteristicsof a PC cluster, whereeachnodehasits own CPU,local memoryandlocal disk. Theparametersof theBSP-Diskmodelareasfollows:

)+* : Totalnumberof atomicunitsof theproblem.

)-, : Total numberof atomicunits thatcanfit in theonepro-cessor’s mainmemory.

) ( : Numberof processors.

)+. : Numberof Disks.

) & : Numberof atomicunitsthatfits in onediskblock.

) % : Time to reador write onedisk blockon thelocaldisk.

)+/ : minimalsynchronizationtime of theBSPmodel.

)10 : gapparameterof theBSPmodelwhich characterizesthecommunicationbandwidth.

In the caseof large datasets,we have the designconditions*32 (54 , and ,76 & . In contrastto thePDM modelwhereasingleprocessorhasequalaccessto all paralleldisks,the disksintheBSP-Diskmodelareassociatedto differentprocessorsastheirlocaldisks.Onevery importantexampleis thateveryprocessorhasoneassociatedlocal disk ( (8� . ), asfor thecaseof a PCcluster.More generally 9: processorscanbeassignedto every disk. Extracommunicationtime is requiredwhenoneprocessorneedsto ac-cessthe dataon a remotedisk. In the BSP-Diskmodelonediskblock canbe loadedfrom every disk into memoryin oneparallelI/O becauseeachdisk canbe accessedindependently. Thusup to. diskblockscanbereadinto mainmemoryin oneparallelI/O. Inotherwords,thedataaccesstime is sharedamongthe . disks.

The algorithmsdesignedfor the BSP-Diskmodel would con-siderall threepartsof the time for a real parallelandout-of-corealgorithm, local computation,local disk I/O andcommunication.Thereforethe time of the isosurfaceextraction can be written as; �=<�>@? 9 �

;BA�CD;BEGFHCI;KJ , where;BA

is thetime for local com-putation,

; ELFis the time for local disk accessand

; Jis the time

for communication.; A

and; J

areto bemeasuredusingtheBSP

Page 4: Scalab le Isosurface Visualization of Massive Datasets on

modeland;BEGF

is measuredas %NM *PO , where *QO is the numberof parallelR I/O operations.Theobjective of our isocontouringalgo-rithm for largedatasetsis to speedupthecomputationby distribut-ing theloadto multipleprocessors,minimizethenumberof parallelI/Os,andminimizeinter-processorcommunication(suchasthere-motedisk accesses).Thesefactorsdo not alwaysplay togetherforeachother. We mustmake tradeoffs accordingto the real systemparameters.To minimize thenumberof parallelI/O’s, we needtoload only thoseportionsof datasetthat contribute to the final re-sult, anddistributedatato thediskssuchthatdatais loadedevenlyfrom the . local disks.Minimizing thelocal computationtime re-quiresgoodloadbalanceamongtheprocessors.Finally minimizingcommunicationmeansminimal dataredistributionandremotedataaccessduringcomputation.We choosetheapproachof staticdatapartitioning that minimizesthe communicationduring isocontourextractionbecausedatacommunicationis currentlythe leastscal-ablefactorwhendatasizegetslargerandnew processorsanddiskscanbeeasilyadded.Themethodsof allowing processorsto dynam-ically stealdataor work from remotedisksarefor futurestudy.

In orderto achieve goodperformancefor a staticworkloadallo-cation methodfor parallel computation,datamust be partitionedcarefully such that each processorhas approximatelythe sameamountof work. Furthermoredatamustbedistributedamongthe. disks suchthat the numberof parallel I/O is minimal. Fortu-natelythework loadof isocontouringcomputationon a processoris proportionalto thesizeof datait loadsfrom its localdisk. Weusea datapartition methodsimilar to that in [3] to distribute the dataaccordingthe contourspectrum[2]. The contourspectrumpro-videsa work loadhistogramfor differentisovaluequeries.We usethenumberof trianglesin the isosurfaceasthe S axisof thecon-tour spectrum.In the idealdatapartition,eachprocessordoesthesameamountof work for everyvalueof theparameterT . While [3]tries to breakdatainto rangepartitionsthat caneachfit into mainmemory, weherepartitionthewholerangeof dataaccordingto theworkloadhistogram.Thepartitioningof thewholerangeavoidstheproblemof dataduplicationamongdifferentrangepartitions.Afterdatais partitioned,anI/O-optimal interval tree[1] is built asindexstructurefor thedataoneachdiskin awaysimilarto [9]. Theexter-nalmemoryinterval treehasoptimal UV��WGX�Y * C�Z I/O operationsfor stabbingqueries,whereK is thenumberof active cells.

Cell is the minimum unit of the volumedataset.Functionval-uesaredefinedon the verticesof the cells andusuallytri-linearlyinterpolatedinsidecell. Ideally we canusethe granularityof cellfor thedatapartitionandbuild theexternalinterval treefor all thecells.Howeverasshown in [9], it is verystorageinefficient to buildanexternalindex datastructureusingtheunit of cell becauseof thehigh overheadof dataduplicationamongcells. Insteadwe usetheatomicunit of block,which is usuallya � . rectangularslabof ad-jacentcells. Although someextra cells maybebe loadedbecauseof the largergranularity, block providesthe possibilityof tradeoffbetweendisk spaceandI/O efficiency. In our implementation,wechoosethesizeof block asthedisk block sizesuchthatoneblockcanbeloadedin oneI/O operation.Sinceweuseablockastheunitof datapartition andaccessing,it is possibleto extract isosurfaceprogressively at differentresolutionto give userthebasicshapeofthesurfacewith minimumdelay. Figure4 showsanisosurface(iso-value1200)of theMaleMRI datais extractedandrenderedatthreedifferentresolutions.

At thefirst stageof our algorithm,we partition thedataamongmultiplecomputationalnodesandbuild theexternalinterval treeasthe index structurefor the blockson eachdisk. Herewe have as-sumedeachnodehasoneprocessorandassociatedlocal disk. Theparallelandout-of-coreisocontouringalgorithmconsistsof follow-ing steps.

1. Break the data into blocks. At thestartof thedatadistribu-tion, thereis aninitial distributionof dataamongthosedisks,

which is not load balancedfor isosurfacequery. For exam-ple,wecanjustbreakthebig volumeinto slabsandeachdiskcontainsoneslabof thedata.First we divide thedatasetintoblocks,eachof which is in the sameorderof the disk blocksize. With eachblock, we storeall the informationfor doingisocontourextractionontheblock, includingthefunctionval-uesof all verticesof theblock andthegeometricinformationsuchasthedimensions,theorigin andthespanof theblock.We alsostorethe rangeinterval of the block explicitly. Weconstructa triangularmatrix in rangespaceto help the datapartition. Oneaxis of the matrix representsthe entirerangeof function valueswhich is divided into a specifiednumberof buckets. The secondaxis is the numberof buckets thatthe rangeof a block spans.The minimum function valueofa block andthe numberof bucketsspannedby its rangede-terminewhich matrix elementsit belongsto. For instance,ifa block hasrange [ � \^]`_ andthebucket interval sizeis a , theblock would belongto the [cb�dGe�fhg

ELikjl m \ b�don�fpe

jlqm _ matrix ele-

ment.Thusblocksfalling into thesamematrix elementhavesimilar spanin therangespace.We categorizetheblocksac-cordingto whichrangespacetriangularmatrixelementit fallsinto.

2. Redistribution of data blocks. The blocks falling into thesamematrix elementare then assignedequally to the pro-cessors.This processis repeatedfor all thematrix elements.At thebeginningof redistribution,theprocessorsbroadcasttoeachotherthenumberof blocksin thematrixelementandthestatisticalinformationaboutthe blocks. Then eachproces-sor run thesamealgorithmto determineto which processorsit needsto sendblocksandfrom which processorsit will re-ceive blocks. Herefirst we try to make eachprocessorhavethe samenumberof blocks of the matrix element. Furtherconsiderationis givento minimizethenumberof blocksto becommunicated.

3. Build External Interval Tree as the search data structure.During the redistribution of blocks,every block assignedtooneprocessoris givena uniqueID andaddedto a singlefilethat containsall the blocksassignedto the processor. Everyblock is storedfrom thedisk block boundaryandits locationis easilydecidedfrom its ID. While theblock is written to theblock file, its rangeinterval associatedwith its ID is storedinaninterval file, whichwill beusedto build theindex structureto the datablocks. Sinceeven the index structuremay notfit into the main memorywhile the datasizegetslarger, wechoosethe external interval tree[1, 8] for the full scalabilityof ouralgorithm.

4. Isocontour Query Processing. Foranisovaluequery, wefirstsearchtheexternalinterval treeto find all blockswhoserangeintersectstheisovalue.Suchstabbingqueryonexternalinter-val treeis simpleandI/O optimal [1]. The isosurfaceis thenextractedfrom thoseintersectedblocks. The surfacecanbeextractedin compressedformat,aswe describelater. To givethe userquick response,the extractedsurfaceis streamedtothe parallelrenderingserverswhich may resideon the samemachine. The renderedimagesfrom the parallel renderingenginesarethencomposedby theMetabuffer. Sincewe usetheblockastheatomicunit of datadistributionandaccessing,thesurfacecanbeextractedatdifferentresolutionto meetthedifferenttime requirement.Theusercangeta quick view ofthe shapeof the isosurfacebeforehe canpick an interestingisovalueandstudyit in moredetail.

This algorithm provides a load balancedand fully out-of-coremethodfor isocontourextraction,which would bescalableto arbi-

Page 5: Scalab le Isosurface Visualization of Massive Datasets on

Figure4: An isosurfacefor theMale MRI data(isovalue= 1200)is progressively extractedandrenderedat differentresolutionsandfromvariousviewpoints. Theleftmostisosurfacehas rts \ ��uts triangles;thecenterisosurfacehas vkwyx \ r��ts triangles,andtherightmostisosurfacehas v \ s�skr \ uzx{� triangles.

trary largedataset.Theextractedsurfacearerenderedby theparal-lel renderingserversandfinally compositedby theMetabuffer. Thesurfaceextractionprocessandrendingprocesscanberun in paral-lel. Figure5 shows oneisosurfaceextractedandrenderedby eightprocessors.Sincewe have usedthe diagramof trianglenumbersto determinethe partition of dataset,eachprocessorwill generateapproximatelyequalnumberof trianglesandbalancethe load ofrendering.

Figure5: Individual portionsof an isosurface(isovalue= 800)ex-tractedfrom thevisible maleMRI datawith eightcomputingpro-cessorsandrenderedby eightrenderingservers.Thefull resolutionisosurfacehas9,128,798triangles.

3.2 Isosurface Compression

The isosurfacesextractedon the computationalservers needtobe renderedby the rendering servers and compositedby theMetabuffer. Thecomputationalprocessesandrenderingprocessesmight not bepresenton thesamemachines.Thereforeisosurfacesneedto betransmittedfrom theback-endto renderingprocessesviathe network. Furthermorewe needto save andcachethoselargeisosurfacesextractedfrom largedatasetto examinethemfrom dif-ferentviewing directions.Compactrepresentationof theisosurfaceshouldbeusedin the transmissionin orderto meetthe time limit.We employ a methodof edgeindex encodingto compresstheiso-surface. Given the function valuesat its u verticesgreateror lessthan the isovalue | , a cell has its inside isosurfacetopology de-terminedasoneof the rt}~��r�w�v possibleconfigurations,whichcanbefurtherreducedaccordingto rotationalsymmetryandvertexcomplement[21]. We caneasily derive what edgesof a cell areintersectedby the isosurfacefrom its index andconfiguration. Acell intersectedby theisosurfaceis calleda valid cell. Givenfunc-

tion valueson the two endpointsof the intersectededge,thepointon this edgeintersectedby theisosurfacecanbecomputed.A ver-tex that is oneendpoint of anedgeintersectedby theisosurfaceiscalleda relevantvertex. Hencetherepresentationof theisosurfacecanbereducedasencodingtheconfigurationsof valid cellsandthefunction valueson the relevant vertices. We further notice that acell configurationcanbedeterminedif we know which verticesofits 8 cornersarerelevantvertices.Thereforeall necessaryinforma-tion for reconstructingtheisosurfaceis known: therelevantverticesandvalid cells,andthefunctionvaluesontherelevantvertices.Thestepsof theedgeindex compressionalgorithmare:

1. For everyvertex onaslice,wesetabit � ��x if it is arelevantvertex, � ��� otherwise. Similarly for eachcell in a layerbetweentwo slices,we seta bit � ��x if it is a valid cell,� �-� otherwise.

2. Encodethevertex bitmapof asliceusinganentropy encodingmethod,suchasadaptive run-lengthcoder[7] or arithmeticcoder[33].

3. Encodethefunctionvaluesof therelevantverticesonthesliceusingthesecond-orderdifferenceof theirquantizedvalues.

4. Encodethecell bitmapof a layersimilarly.

5. Repeatupperstepsuntil thereareno moreslicesandlayers.

Model Value DataType Orig. Size Gen.Alg. EdgeIndex

Hipip 0.0 float 1,465,862 129,068 88,804Foot 600. u_short 7,536,227 587,344 407,658Foot 1500. u_short 8,645,243 761,413 386,506

Engine 180. u_char 3,601,040 224,893 121,504Engine 89. u_char 15,888,473 1,012,491 618,492

Table1: Comparisonof compressedisosurfacesizesin bytesusingedgeindex codingwith a generalsurfacecompressionalgorithm.Thegeneralcompressionalgorithmusesu bits for eachcoordinatepervertex. For unsignedshortandchardatatype,we encodethemdirectlyusingthepredictive coder. Floatvaluesarenormalizedandquantizedusing ��r bits for thefirst pointand x{s bits for difference.

This methodhasyieldeda very goodcompressionratio to iso-surfaceof regular3D meshcomparedto thegeneralpurposetrian-gular surfacecompressionalgorithmsasshown in table1, wherethe generalalgorithmis that of [4]. Furthermore,the edgeindex

Page 6: Scalab le Isosurface Visualization of Massive Datasets on

Figure6: Speedupof isosurfaceextractionandrenderingfor twoisovalues(800and1200)comparedto theidealcase.Thedatausedis theVisibleMaleMRI data.

methodhasthe advantageof incrementalencodinganddecodingbecausetwo slicesandonelayerarenecessaryin mainmemoryatany time for the encodinganddecodingprocesses.Thusboth thecompressionanddecompressionof theisosurfaceusingedgeindexencodingneedonly a small amountof main memoryand the re-constructionof the surfacecan start far beforethe whole surfacetransmissionis finished. Furthermorethe encodingof indicesandfunction valuesis doneduring isosurfaceextraction,suchthat noexpensive post-extractioncompressionprocessis necessary.

4 Implementation and Results

Here we presentsome experimentalresults of the parallel andout-of-coreisocontouringalgorithmand the parallelvisualizationframework. Our primary interestin thosetestsis to show thescal-ability of our algorithmsto thesizeof thedatasetandthenumberof computers.Our implementationplatformis aclusterof CompaqSP750PCworkstations,eachof which hasa 800MHZ PentiumIIIprocessorwith 256M of Rambusmemory, a 9GB systemdisk anda 18GB datadisk, andan nVdia GeforceII graphicscard. Thesemachinesareconnectedby 100Mb/sethernetandServernetII. Ourtestsusethe100Mb/sethernetfor communication.Thesemachinesrun Linux (kernel 2.2.18)as the operatingsystemand eachdiskblock sizeis 4,096bytes.

We useseveraldatasetsof differentsize,shown in table2, andadifferentnumberof machinesto test theperformanceof the algo-rithm. Thethreetestdatasets,whosesizerangesarefrom vkwtv , &

Name Dimension SizeMale MRI w`x@r�M�w`x@r�M�x@r�w�r 656MB

Malecryosection x�u�����M�x{������M�x�u���u 6.6GBFemalecryosection x�v�����M�x{������M�wyx{u�v 16.5GB

Table2: Thesizesof our testimagingdatasets.

to morethan x{vk��& , areall from thevisible humanprojectof theNationalLibrary of Medicine.Eachof thosedatasetsis toolargefora singlePCto handlein its mainmemory. While all thosedatasetsarefrom medicalimaging,ouralgorithmcanbecertainlyappliedtoothertypesof data,for examplethosefrom largescalesimulation.

Thefirst testdatais theMaleMRI dataset.Thesurfaceextractedononemachineis renderedby thesamemachineandtheimagesarethencompositedaccordingtheirz values.Everyrenderingserver inthis configurationhastheviewport of thewholedisplayspace.In

this configurationevery renderingserver rendersat thesamereso-lution. We measureits isosurfaceextractionandrenderingtime byusingfrom oneprocessorto �kr processors.Beingableto run effi-cientlyona singleprocessordemonstratestheout-of-corepropertyof ourmethod.Figure6 shows thespeedupof isosurfaceextractionandrenderingfor two typical isovaluesu���� and x@rt��� , correspond-ing to theskinandbonerespectively, for theMRI dataset.Figure7shows theactualtimeof eachprocessorfor theisovalue u���� with adifferentnumberof processors.

Figure7: Individual processortime for extractingandrenderinganisosurface(isovalue= 800) from theMale MRI datawith 1, 8, 16and32 processors.

The slopesof the two speedupcurves are very similar, whichdemonstratesthe the isocontour extraction algorithm appliesequallywell to differentisovalues.

Thecontourspectrumsof datapartitioningfor thecasesof u , x{vand �kr processorsareshown in figure 8. The first row of figuresshow thediagramsof the trianglenumbersextractedby eachpro-cessorfor therangeof isovaluefrom 0 to 1900,wherethecurvesofthereal experimentalresultin solid thin linesarecomparedto theideal casein thick dashedline. Thesecondrow of figuresaretheactualsurfaceextractionandrenderingtime for thewholerangeofisovalues.Thesimilarity of the two setof curvesjustifiesour useof thespectrumof trianglenumbersasthework loaddiagram.

Figure9: Isosurfaceextractionandrenderingtime for the VisibleMalecryosectionandFemalecryosectiondataat two differentiso-values(13000and29000).

We also test our implementationon the much larger MalecryosectionandFemalecryosectiondatasets.We startat partition-ing theMalecryosectiondatainto u piecesandtheFemalecryosec-tion datainto x�v pieces,becauseof the r���& file sizelimit of Linux.Figure9 shows thetime of isosurfaceextractionandrenderingforthe two datasetsfor two different isovalues x{������� and rt������� . Itgivesvery goodspeedupwhenthenumberof processorsanddisks

Page 7: Scalab le Isosurface Visualization of Massive Datasets on

(a) Triangledistribution for 8 processors (b)Triangledistributionfor 16processors (c) Triangledistribution for 32processors

(d) Isosurface extraction and renderingtime for 8 processors

(e) Isosurface extraction and renderingtime for 16processors

(f) Isosurface extraction and renderingtime for 32processors

Figure8: Thehistogramof triangledistribution andisosurfaceextractionandrenderingtime for thedatapartitioningof theMale MRI data,wherethethick dashedlinesrepresenttheaveragedidealcase.

increases.Thesharpdropof computationaltime from u processorsto x�v processorsfor the Male cryosectiondataand from x�v pro-cessorsto �kr processorsfor theFemalecryosectiondatais duetobetteroperatingsystemdiskcacheperformancefor thesmallerpar-titions. A very large isosurface( s�uk� \ v��kw \ ��skr triangles)extractedfrom theFemalecryosectiondatais shown in Figure10. While thesurfaceis noisydueto thedataset,it demonstratesthescalabilityofoursystemto very largedataandvery largesurface.

5 Conclusion and Future Work

In this paperwe proposea scalabletime-critical renderingframe-work of massive datastreamsbasedaroundthe Metabuffer imagecompositionhardware.Thetime-critical renderingprocesscan bethoughtof asa chainfrom parallelandprogressive meshgenera-tion, parallelrenderingto parallelimagecomposition.Wedescribein detaila scalableisocontouringalgorithmby takingadvantageofthe parallelprocessorsandparalleldisks. It partitionsthe volumedataaccordingto its workloadspectrumfor loadbalancingandcre-atesan I/O-optimal external interval treeto minimize the numberof I/O operationsof loadinglargedatafrom disk.

Thereareimprovementsnecessaryfor real interactivity for verylargedatasets.Therearealsotheremainingproblemsof introduc-ing dynamicload balancingby replicatingsomedataon differentdisksandaccessingdatafrom remotedisksat runtime. It is an in-terestingproblemto seehow muchimprovementwecanachievebychoosingtheright replicationfactoranddatadistribution.

Acknowledgments: This researchis supportedin part by grantsACI-9982297,DMS-9873326from the NationalScienceFounda-tion, a DOE-ASCI grant BD4485-MOID from SandiaNationalLaboratory, LawrenceLivermoreNationalLaboratory, from grantUCSD 1018140aspart of the NationalPartnershipfor Advanced

Figure10: Two views of anisosurface(isovalue= 29000)extractedand renderedfrom the full resolutionVisible Femalecryosectiondata.Theisosurfacecontains���k�y�����k�y���t�`� triangles

Page 8: Scalab le Isosurface Visualization of Massive Datasets on

ComputationalInfrastructure,agrantBD 781from theTexasBoardof Higher� Education,anda gift andclustersupportfrom CompaqComputerCorporation.

References

[1] ARGE, L ., AND V ITTER, J. S. Optimalinterval managementin externalmemory. In Proc.IEEE Foundationsof ComputerScience(1996),pp.560–569.

[2] BAJAJ, C., PASCUCCI , V., AND SCHIKORE, D. The con-tourspectrum.In Proceedingsof the1997IEEEVisualizationConference(October1997).

[3] BAJAJ, C., PASCUCCI , V., THOMPSON, D., AND ZHANG,X. Parallel accleratedisocontouringfor out-of-corevisual-ization. In Proceedingsof IEEE Parallel VisualizationandGraphicsSymposium(October1999),pp.97–104.

[4] BAJAJ, C., PASCUCCI , V., AND ZHUANG, G. Singleresolu-tion compressionof arbitrarytriangularmesheswith proper-ties. In IEEEDataCompressionConference(1999),pp.247–256.

[5] BAJAJ, C., AND ZHANG, X. Streamingcompressedsurfacesextractedfrom compressedvolumes.Tech.rep.,UniversityofTexasatAustin,2000.

[6] BLANKE, W. J., FUSSELL , D. S., BAJAJ, C., AND ZHANG,X. The metabuffer: A scalablemultiresolutionmultidis-play3-dgraphicssystemusingcommodityrenderingengines.Tr2000-16,Universityof Texasat Austin,February2000.

[7] BOTTOU, L ., HOWARD, P., AND BENGIO, Y. The z-coderadaptive binary coder. In Proceedingsof IEEE Data Com-pressionConferenceDCC’98 (March1998),pp.13–22.

[8] CHIANG, Y., AND SILVA , C. T. I/O optimal isosurfaceex-traction. In IEEE Visualization9́7 (Nov. 1997),R. YagelandH. Hagen,Eds.,IEEE,pp.293–300.

[9] CHIANG, Y.-J., SILVA , C. T., AND SCHROEDER, W. J. In-teractive out-of-coreisosurfaceextraction. In Proceedingsofthe 9th Annual IEEE Conferenceon Visualization(VIS-98)(Oct.18–231998),ACM Press,pp.167–174.

[10] CHOW, M. M. Optimizedgeometrycompressionfor real-time rendering. In Proceedingsof IEEE Visualization ’97(Phoenix,1997),pp.347–354.

[11] DEERING, M. Geometrycompression.ComputerGraphics(SIGGRAPH95Proceedings)(1995),13–20.

[12] ELDRIDGE, M., IGEHY, H., AND HANRAHAN, P.Pomegranate:A fully scalablegraphicsarchitecture. Com-puterGraphics(SIGGRAPH2000Proceedings)(2000),443–454.

[13] ELLSIEPEN, P. Parallel isosurfacing in large unstructeddatasets. In Visualization in ScientificComputing(1995),Springer-Verlag,pp.9–23.

[14] EYLES, J., MOLNAR, S., POULTON, J., GREER, T., LAS-TRA , A., ENGLAND, N., AND WESTOVER, L . Pixelflow:Therealization.In Proceedingsof theSiggraph/EurographicsWorkshoponGraphicsHardware (August1997),pp.57–68.

[15] FUSSELL , D. S., AND RATHI , B. D. A vlsi-orientedarchi-tecturefor real-timerasterdisplay of shadedpolygons. InGraphicsInterface’82 (May 1982).

[16] GUEZIEC, A., TAUBIN, G., LAZARUS, F., AND HORN, W.Cuttingandstitching:Efficient conversionof a non-manifoldpolygonalsurfaceto a manifold. Tech.Rep.RC-20935,IBMT.J.WatsonResearchCenter, 1997.

[17] HANSEN, C., AND HINKER, P. Massively parallelisosurfaceextraction. In Visualization’92 (September1992).

[18] HEIRICH, A., AND MOLL , L . Scalabledistributedvisualiza-tion usingoff-the-shelfcomponents.In Parallel VisualizationandGraphicsSymposium– 1999(SanFrancisco,California,October1999),J.Ahrens,A. Chalmers,andH.-W. Shen,Eds.

[19] HUMPHREYS, G., BUCK , I ., ELDRIDGE, M., AND HAN-RAHAN, P. Distributed renderingfor scalabledisplays. InProceedingsof Supercomputing2000(October2000).

[20] HUMPHREYS, G., AND HANRAHAN, P. A distributedgraph-ics systemfor large tiled displays. In Proceedingsof IEEEVisualizationConference(1999),pp.215–223.

[21] LORENSEN, W., AND CL INE, H. Marching cubes: Ahigh resolution3d surfaceconstructionalgorithm. ComputerGraphics21, 4 (July 1987),163–169.

[22] MOLNAR, S., COX , M., ELLSWORTH, D., AND FUCHS, H.A sortingclassificationof parallelrendering.IEEEComputerGraphicsandApplications14, 4 (July1994).

[23] MOLNAR, S. E. Imagecompositionarchitecturesfor real-time imagegeneration. Ph.d.dissertation,technicalreporttr91-046,Universityof NorthCarolina,1991.

[24] PARKER, S., SHIRLEY, P., L IVNAT, Y., HANSEN, C., ANDSLOAN, P. Interactive ray tracingfor isosurfacerendering.InVisualization’98 (October1998).

[25] RAMACHANDRAN, V. Personalcommunication,November1999.

[26] SAMANTA , R., ZHENG, J., FUNKHOUSER, T., L I , K., ANDSINGH, J. P. Load balancingfor multi-projectorrenderingsystems.In SIGGRAPH/EurographicsWorkshoponGraphicsHardware (August1999).

[27] SCHNEIDER, B.-O. Parallelrenderingonpcworkstations.InParallel andDistributedProcessingTechniquesandApplica-tions(July1998),pp.1281–1288.

[28] SHEN, H., HANSEN, C., L IVNAT, Y., AND JOHNSON, C.Isosurfacingin spanspacewith utmostefficiency (issue). InVisualization’96 (1996),pp.287–294.

[29] TAUBIN, G., AND ROSSIGNAC, J. Geometriccompressionthroughtopologicalsurgery. ACM Trans.Gr. 17, 2 (1996),84–115.

[30] TOUMA , C., AND GOTSMAN, C. Trianglemeshcompres-sion. In Proceedingsof the24thConferenceon GraphicsIn-terface(GI-98) (1998),W. Davis, K. Booth,andA. Fourier,Eds.,MorganKaufmannPublishers,pp.26–34.

[31] VALIANT, L . G. A bridgingmodelfor parallelcomputation.Communicationsof theACM 33 (September1990),103–111.

[32] V ITTER, J. S. Externalmemoryalgorithmsanddatastruc-ture. In DIMACSSeriesin DiscreteMathematicsandTheo-retical ComputerScience(1999).

[33] WITTEN, I ., NEAL , R., AND CLEARY, J. Arithmetic codingfor datacompression.Commun.ACM 30 (June1987),520–540.

Page 9: Scalab le Isosurface Visualization of Massive Datasets on

(a) (b)

(c) (d)

Figure 1: (a),(c)Parallel isosurfacevisualizationof the Male MRI datashowing the skin andskeletonstructures.(a) Skin:isovalue= 800,numberof triangles= 9,128,803,(d) Skeleton:isovalue= 1200,numberof triangles= 6,442,810.(b),(d)Visualizinga time varyinggashydrodynamicssimulationof largestructuregalaxyformation(datacourtesyof Dr. Paul ShapiroandDr. HugoMartel) . (b) Translucentisosurfacestracegasdensity, andspheresrepresentasamplingof thesimulatedgasparticles.(d) A full-resolutionisosurfaceof gasdensity(isovalue= 160)showing thefilamentarystructurecontains21,329,730triangles.

(a) (b)

Figure 2: (a)VisualizingtheVisible Male dataon theUT multi-tiled backprojectionsystemusingtheMetabuffer simulator.(b)Parallel visualizationof a 3D seismicsimulationdata(courtesyof Dr. Paul Stoffa) of the northernBarbadosRidgeusingcombinedvolumerenderingandisocontouringshown on theUT front projectionsystem.