
EDGE-BASED FINITE ELEMENT TECHNIQUES FOR NONLINEAR SOLID MECHANICS PROBLEMS

Alvaro L.G.A. Coutinho 1,2 (*), Marcos A.D. Martins 1,2, José L.D. Alves 1,2, Luiz Landau 2 and Anderson Moraes 3

1 Center for Parallel Computing COPPE/Federal University of Rio de Janeiro PO Box 68516, Rio de Janeiro, RJ 21945-970, Brazil

2 Laboratory for Computational Methods in Engineering Department of Civil Engineering COPPE/Federal University of Rio de Janeiro PO Box 68506, Rio de Janeiro, RJ 21945-970, Brazil

3 PETROBRAS/CENPES/DIVEX/SETEX Cidade Universitária, Quadra 7, Ilha do Fundão, 21949-900, Rio de Janeiro, RJ, Brazil

E-mail: alvaro, marcos, [email protected], [email protected], [email protected]

Voice: +55-21-290-7116
Fax: +55-21-290-6626
(*) Corresponding author

ABSTRACT

Edge-based data structures are used to improve the computational efficiency of Inexact Newton methods for solving finite element nonlinear solid mechanics problems on unstructured meshes. Edge-based data structures are employed to store the stiffness matrix coefficients and to compute the sparse matrix-vector products needed in the inner iterative driver of the Inexact Newton method. Numerical experiments on three-dimensional plasticity problems have shown that memory and computer time are reduced respectively by factors of 4 and 6, compared with solutions using element-by-element storage and matrix-vector products.

Keywords: Edge-based data structures; Inexact Newton methods; Finite elements.

1. INTRODUCTION

Solving three-dimensional large-scale solid mechanics problems undergoing plastic deformations is of fundamental importance in several science and engineering applications. Particularly in the Oil and Gas Industry, solid mechanics is being used to improve the understanding of complex geologic problems, thus helping to reduce risks and operational costs in exploration and production activities (Arguello, 1998).

Traditional finite element technology for nonlinear quasi-static problems involves the repeated solution of systems of sparse linear equations by a direct solution method, that is, some variant of Gauss elimination. The updating and factorization of the sparse global stiffness matrix can result in extremely large storage requirements and a very large number of floating point operations.

Explicit quasi-static nonlinear finite element technologies (Biffle, 1993), on the other hand, may be employed, considerably reducing memory requirements. Although robust and straightforward to implement, explicit schemes based on dynamic relaxation or nonlinear conjugate gradients may suffer from low convergence rates.

In this paper we employ an Inexact Newton method (Kelley, 1995) to solve large-scale three-dimensional incremental elastic-plastic finite element problems found in geologic applications. In the Inexact Newton method, at each nonlinear iteration, a linear system of finite element equations is approximately solved by the preconditioned conjugate gradient method. Convergence of two variants of Inexact Newton methods, the Inexact Initial Stress and the Inexact Tangent Stiffness methods, has recently been established for J2-plasticity problems with return mapping stress computation (Blaheta and Axelsson, 1997; Blaheta, 1997). Axelsson et al (1996) also provided convergence proofs for the cases of explicit and implicit stress computation, but restricted to von Mises materials. Regarding computational aspects, the kernels of the Inexact Newton methods are the same as those of the iterative driver, that is, matrix-vector products and preconditioning, besides residual evaluations and stiffness matrix updatings.

In the finite element method the implementation of global matrix-vector products is easily parallelized on different computer architectures, performing element-level products followed by global assembly. This type of implementation is often referred to as an element-by-element (EBE) scheme. Matrix-vector products computed by EBE schemes are memory intensive and need more operations than the product with the assembled matrix, because element matrices have many overlapping non-zero entries. However, particularly for large-scale nonlinear problems, EBE methods have been very successful, because they handle large sparse matrices in a simple and straightforward manner. Besides, efficient preconditioners may be derived keeping the same data structure. For a recent review of such topics please see Ferencz and Hughes (1998).

Nevertheless, matrix-vector products can be further optimized using edge-based data structures. These data structures have been introduced for residual evaluations in explicit finite volume computations of compressible flows on unstructured grids (Barth, 1991; Mavriplis, 1992), and extended to finite elements by Peraire et al (1992) and Luo et al (1994). The use of edge-based data structures for the solution of Euler and Navier-Stokes flows in complex 3D configurations improved the overall efficiency of the algorithm and enabled a straightforward implementation of upwind schemes in the finite element method. On the other hand, Venkatakrishnan (1994) has shown that sparse matrix-vector products, where the non-symmetric sparse matrix comes from an unstructured 2D finite volume discretization of a partial differential equation, may be efficiently computed in parallel using an edge-based data structure. Further, the edge-based product with the transpose matrix is also more efficient than the product using standard sparse matrix data structures. Venkatakrishnan has also studied the partitioning of the sparse matrix, interpreting the edge-based data structure as a graph representation of the global sparse matrix. A good description of the relations between unstructured grids, graphs and sparse matrices can be found in Saad (1996). Martins et al (1997) compared the performance of EBE and edge-based data structures in the finite element solution of large-scale 3D potential flow problems. They have found that edge-based data structures reduced solution time by a factor of 3 compared with EBE strategies. Memory requirements were also reduced. Catabriga et al (1998) derived new edge-based preconditioners for non-symmetric finite element equations. Computational experiments have shown that edge-based preconditioners improve solution time and require less memory than EBE preconditioners.

In the present work we extend the edge-based data structures to nonlinear quasi-static solid mechanics problems on unstructured finite element meshes. Edge-based data structures are used to improve the computational efficiency of the sparse matrix-vector products needed in the inner iterative driver of the Inexact Newton method. The remainder of this work is organized as follows. In the next section we briefly review the governing nonlinear finite element equations and the Inexact Newton methods. Section 3 describes the edge-based data structures for solid mechanics. Section 4 shows the numerical examples. The paper ends with a summary of the main conclusions.

2. INCREMENTAL EQUILIBRIUM EQUATIONS AND THE INEXACT NEWTON METHOD

The governing equation for the quasi-static deformation of a body occupying a volume Ω is,

\frac{\partial \sigma_{ij}}{\partial x_j} + \rho b_i = 0 \quad \text{in } \Omega    (1)

where σ_ij is the Cauchy stress tensor, x_j is the position vector, ρ is the weight per unit volume and b_i is a specified body force vector. Equation (1) is subjected to the kinematic and traction boundary conditions,

u_i(x,t) = \bar{u}_i(x,t) \ \text{in } \Gamma_u; \qquad \sigma_{ij} n_j = h_i(x,t) \ \text{in } \Gamma_h    (2)

where Γ_u represents the portion of the boundary where displacements are prescribed (\bar{u}_i) and Γ_h represents the portion of the boundary on which tractions are specified (h_i). The boundary of the body is given by Γ = Γ_u ∪ Γ_h, and t represents a pseudo-time (or increment). Discretizing the above equations by a displacement-based finite element method we arrive at the discrete equilibrium equation,

F^{int} + F^{ext} = 0    (3)

where F^int is the internal force vector and F^ext is the external force vector, accounting for applied forces and boundary conditions. Assuming that external forces are applied incrementally and restricting ourselves to material nonlinearities only, we arrive, after a standard linearization procedure, at the nonlinear finite element system of equations to be solved at each load increment,

K_T \Delta u = R    (4)

where K_T is the tangent stiffness matrix, a function of the current displacements, Δu is the vector of displacement increments and R is the unbalanced residual vector, that is, the difference between internal and external forces.

Remark: We consider here perfect-plastic materials described by the Mohr-Coulomb yield criterion. Stress updating is performed by an explicit, Euler-forward subincremental technique (Crisfield, 1991).

Some form of Newton method is generally used to solve the nonlinear finite element system of equations given by Eq. (4), where the tangent stiffness matrix has to be updated and factorized at every nonlinear iteration. This approach is known as the Tangent Stiffness (TS) method. The burden of repeated stiffness matrix updatings and factorizations is alleviated, at the expense of more iterations, by keeping the tangent stiffness matrix frozen within a load increment, or by iterating with the elastic stiffness matrix, which is known as the Initial Stress (IS) method. Further improvements, especially in 3D problems, may be achieved by replacing the direct solver with an iterative solution method such as preconditioned conjugate gradients (PCG) or multigrid (Ferencz and Hughes, 1998; Parsons, 1997). The iterative drivers are usually implemented employing EBE strategies, computing matrix-vector products as,

Kp = \sum_{e=1}^{nel} K^e p_e    (5)

where nel is the total number of elements in the mesh.
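For reference, the EBE product of Eq. (5) can be summarized in a few lines. The sketch below is a minimal illustration in Python, assuming tetrahedra with 3 degrees of freedom per node; the array names (conn, Ke, p) are our own choices, not taken from the paper:

import numpy as np

def ebe_matvec(conn, Ke, p):
    """Element-by-element product of Eq. (5): Kp as the sum of K^e p_e over elements.

    conn : (nel, 4) int array, global node numbers of each tetrahedron
    Ke   : (nel, 12, 12) float array, element stiffness matrices
    p    : (nnodes, 3) float array, global vector (3 dof per node)
    """
    Kp = np.zeros_like(p)
    for e in range(conn.shape[0]):
        nodes = conn[e]                   # the 4 global nodes of tetrahedron e
        pe = p[nodes].reshape(12)         # gather: localize p to the element dof
        qe = Ke[e] @ pe                   # element-level product K^e p_e
        Kp[nodes] += qe.reshape(4, 3)     # scatter-add into the global vector
    return Kp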

When the finite element system of linear equations is solved iteratively, it is straightforward to employ inexact versions of the standard Newton-like methods (Kelley, 1995; Papadrakakis, 1993; Axelsson et al, 1996; Blaheta and Axelsson, 1997; Blaheta, 1997). In this case, tolerances for the inner iterative driver may be adaptively selected to minimize the computational effort towards the solution, giving rise to the following algorithm:

Given utol, rtol, relative displacement and residual tolerances
for k = 1, 2, ..., number of load increments do
    Compute external force vector, F_ext^k
    do i while convergence
        Compute internal force vector, F_int^i
        Compute residual vector, R^i = F_ext^k - F_int^i
        Update tangent stiffness matrix, K_T^i
        Compute tolerance for iterative driver, η_i
        Solve K_T^i Δu^i = R^i for tolerance η_i
        Update solution, u ← u + Δu
        If ||Δu|| / ||u|| ≤ utol and ||R^i|| / ||F_ext^k|| ≤ rtol then convergence
    End while
End do.

According to the above algorithm, an approximate solution is obtained when the Inexact Newton termination criterion is satisfied, that is, when,

\| K_T^i \Delta u^i - R^i \| \leq \eta_i \| R^i \|    (6)

The tolerance ηi may be selected using the expression (Papadrakakis, 1993),

\eta_i = \max\left\{ \eta_{min},\ \min\left[ \eta_{max},\ \left( \frac{\|R^i\|}{\|R^0\|} \right)^{\alpha} \right] \right\}, \qquad 0 < \alpha \leq 1    (7)

where η_min, η_max and α are problem dependent parameters. Selecting η_i by Eq. (7) results in an adaptive procedure for solving the nonlinear systems of equations within each load increment. Equation (7) forces the tolerance η_i to lie within the interval [η_min, η_max]. When the nonlinear problem is far from the solution, the iterative driver computes a low cost, low accuracy approximation. As the nonlinear iterations progress towards the solution, the iterative driver produces better approximations.

Another method for selecting η_i, suggested by Kelley (1995), is based on a measure of how far the nonlinear iteration is from the solution, that is,

\eta_i^A = \min\left[ \eta_{max},\ \gamma\, \frac{\|R^i\|^2}{\|R^0\|^2} \right], \qquad 0 < \gamma < 1    (8)

If η_i^A is uniformly bounded away from 1, and taking η_i = max(η_min, η_i^A), Kelley (1995) has shown general convergence properties when Eq. (8) is used. To prevent η_i^A from being too small when the nonlinear iteration is far from the solution, Kelley also suggests the following modification,

\eta_i^B = \begin{cases}
    \min\left( \eta_{max},\ \eta_i^A \right), & \text{if } \gamma \eta_{i-1}^2 < 0.1 \\
    \min\left( \eta_{max},\ \max\left( \eta_i^A,\ \gamma \eta_{i-1}^2 \right) \right), & \text{if } \gamma \eta_{i-1}^2 \geq 0.1
\end{cases}    (9)

In some cases ||R^i|| can be very small, well beyond the required accuracy, resulting in undesired work. To remedy this oversolving, Kelley (1995) proposes to compute η_i^C using,

\eta_i^C = \min\left[ \eta_{max},\ \max\left( \eta_i^B,\ \frac{0.5\, r_{tol}\, \|F_{ext}^k\|}{\|R^i\|} \right) \right]    (10)

and finally taking η_i = max(η_min, η_i^C).
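The tolerance selection of Eqs. (7)-(10), as reconstructed above, can be condensed into a short sketch. The following Python fragment is a minimal illustration with our own function and parameter names (the residual norms are simply passed in as scalars); it is not the paper's code:

def eta_papadrakakis(r_i, r_0, eta_min=1.0e-6, eta_max=0.1, alpha=0.5):
    """Adaptive tolerance of Eq. (7): clamp (||R^i||/||R^0||)^alpha to [eta_min, eta_max]."""
    return max(eta_min, min(eta_max, (r_i / r_0) ** alpha))

def eta_kelley(r_i, r_0, eta_prev, f_ext, rtol,
               eta_min=1.0e-6, eta_max=0.1, gamma=0.1):
    """Kelley-type choice of Eqs. (8)-(10), with the safeguards against abrupt
    tolerance reduction (Eq. 9) and against oversolving (Eq. 10)."""
    eta_a = min(eta_max, gamma * (r_i / r_0) ** 2)                 # Eq. (8)
    if gamma * eta_prev ** 2 < 0.1:                                # Eq. (9)
        eta_b = min(eta_max, eta_a)
    else:
        eta_b = min(eta_max, max(eta_a, gamma * eta_prev ** 2))
    eta_c = min(eta_max, max(eta_b, 0.5 * rtol * f_ext / r_i))     # Eq. (10)
    return max(eta_min, eta_c)                                     # final eta_i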

Remarks:

i. Our experience indicates that selecting η_max = 0.1 and 10^-6 ≤ η_min ≤ 10^-3, for utol and rtol in the usual range, that is, 10^-3 to 10^-2, is enough for practical engineering computations. Typical values for α and γ are 0.5 and 0.1, respectively.

ii. Inexact variants of the Tangent Stiffness and Initial Stress methods, respectively identified as ITS and IIS, are naturally defined. The works of Axelsson et al (1996), Blaheta and Axelsson (1997) and Blaheta (1997) provide rigorous convergence proofs of the ITS and IIS methods for von Mises materials and different stress updating algorithms.

iii. In our implementation of the internal forces evaluation we first split the elements into elastic and plastic lists. For each list we compute internal forces in parallel. For the elements in the plastic list we also store the plastic flow vectors to update the element tangent stiffness matrices.

iv. We employ PCG as the iterative driver. Since a small number of iterations is usually performed at each PCG solve, we adopted a simple nodal block-diagonal preconditioner. Therefore, the most expensive computational kernel in the linear solver is the matrix-vector product (a sketch of the resulting driver is given after these remarks).
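Putting Section 2 together, the sketch below outlines one load increment of the Inexact Newton loop with PCG as the inner iterative driver (remark iv). It is an illustrative skeleton under our own naming: internal_force, tangent_matvec, precond and choose_eta stand in for the problem-dependent pieces (choose_eta could be, for example, the eta_kelley sketch above), and in older SciPy versions the rtol keyword of cg is named tol:

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def inexact_newton_increment(u, f_ext, internal_force, tangent_matvec, precond,
                             choose_eta, utol=1.0e-3, rtol=1.0e-3, max_iter=500):
    """One load increment of the Inexact Newton algorithm of Section 2 (illustrative)."""
    n = u.size
    eta_prev, r0_norm = 0.1, None
    for i in range(max_iter):
        r = f_ext - internal_force(u)                  # residual R^i = F_ext^k - F_int^i
        r_norm = np.linalg.norm(r)
        r0_norm = r_norm if r0_norm is None else r0_norm
        eta = choose_eta(r_norm, r0_norm, eta_prev,
                         np.linalg.norm(f_ext), rtol)  # adaptive tolerance, Eqs. (7)-(10)
        A = LinearOperator((n, n), matvec=lambda x: tangent_matvec(u, x), dtype=float)
        du, _ = cg(A, r, rtol=eta, M=precond)          # inner PCG solve to tolerance eta
        u = u + du                                     # update solution
        eta_prev = eta
        if (np.linalg.norm(du) <= utol * np.linalg.norm(u)
                and r_norm <= rtol * np.linalg.norm(f_ext)):
            break                                      # Inexact Newton convergence test
    return u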

3. EDGE-BASED DATA STRUCTURES

Edge-based finite element data structures have been introduced for explicit computations of compressible flow on unstructured grids composed of triangles and tetrahedra (Peraire, 1992; Luo, 1994). It was observed in these works that residual computations with edge-based data structures were faster and required less memory than standard element-based residual evaluations. Martins et al (1997) have studied edge-based data structures for the implicit finite element solution of potential flow problems in large-scale industrial applications. They use the concept of disassembling the finite element matrices to build the edge matrices. Following these developments, for 3D solid mechanics problems on unstructured meshes, we may derive an edge-based finite element scheme by noting that the element matrices can be disassembled into their edge contributions as,

K^e = \sum_{s=1}^{m} K_s^e    (11)

where K_s^e is the contribution of edge s to K^e and m is the number of element edges (6 for tetrahedra). For instance, Figure 1 shows a tetrahedron and its disassembling into its six edges.

[Figure 1 shows the 12×12 element stiffness matrix K^e, written as a symmetric 4×4 arrangement of 3×3 blocks k̃_ij (i, j = 1, ..., 4), and its disassembling into six edge matrices K_1^e, ..., K_6^e. Each edge matrix is associated with one node pair of the tetrahedron, in the order (1,2), (2,3), (1,3), (1,4), (2,4), (3,4), and is built from the corresponding off-diagonal block k̃_ij and its transpose k̃_ij^T.]

Figure 1. Edge-based disassembling of a tetrahedral finite element for elasticity problems.
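Under the reading of Figure 1 given above (each edge carries the off-diagonal block coupling its two nodes), the disassembling and accumulation of Eqs. (11)-(12) can be sketched as follows. This is a minimal illustration with our own array names; edge_index is assumed to map each sorted global node pair to its edge number:

import numpy as np

# Local node pairs of the six tetrahedron edges, in the ordering of Figure 1
TET_EDGES = [(0, 1), (1, 2), (0, 2), (0, 3), (1, 3), (2, 3)]

def assemble_edge_blocks(conn, Ke, edge_index):
    """Disassemble element matrices into edges (Eq. 11) and accumulate the
    resulting 3x3 off-diagonal blocks per global edge (Eq. 12)."""
    Ks = np.zeros((len(edge_index), 3, 3))
    for e in range(conn.shape[0]):
        for a, b in TET_EDGES:
            ga, gb = int(conn[e, a]), int(conn[e, b])
            block = Ke[e][3 * a:3 * a + 3, 3 * b:3 * b + 3]   # off-diagonal block k~_ab
            if ga > gb:
                block = block.T            # orient the block from lower to higher node
            s = edge_index[(min(ga, gb), max(ga, gb))]
            Ks[s] += block                 # Eq. (12): sum over elements sharing edge s
    return Ks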

Denoting by E the set of all elements sharing a given edge s, we may add their contributions, arriving at the edge matrix,

K_s = \sum_{e \in E} K_s^e    (12)

The resulting matrix is symmetric, preserving the structure of the edge matrices given in Figure 1. Thus, we need to store just one 3×3 block per edge. It is important to note that the edge matrix corresponds to the off-diagonal blocks of the assembled stiffness matrix. The corresponding edge-by-edge matrix-vector product may be written as,

Kp = \sum_{s=1}^{nedges} K_s p_s    (13)

where nedges is the total number of edges in the mesh and p_s is the restriction of p to the edge degrees-of-freedom. Implementation of the edge-by-edge matrix-vector product follows the standard EBE implementation. We used the standard pointers, localization matrices and destination arrays of finite element implementations (Hughes, 1987), in this case for a two-node element, that is, an edge. In Table 1 we compare the storage requirements to hold the coefficients of the element stiffness matrices and the edge stiffness matrices, as well as the flop count and indirect addressing (i/a) operations for computing matrix-vector products using element and edge-based data structures. All data in this table are referred to nnodes, the number of nodes in the finite element mesh. According to Lohner (1994), the following estimates are valid for unstructured 3D grids: nel ≈ 5.5×nnodes, nedges ≈ 7×nnodes.

Data structure    Memory          flop            i/a
EBE               429×nnodes      1386×nnodes     198×nnodes
Edges              63×nnodes       252×nnodes     126×nnodes

Table 1. Memory to hold the stiffness matrix coefficients and computational costs for element and edge-based matrix-vector products for tetrahedral finite element meshes.
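A minimal sketch of the edge-by-edge product of Eq. (13) is given next. It assumes the storage implied by Table 1, one 3×3 off-diagonal block per edge, plus nodal diagonal blocks kept in a separate nodewise array, which is our assumption about how the diagonal is handled; all names are ours:

import numpy as np

def edge_matvec(edges, Ks, Kdiag, p):
    """Edge-by-edge product Kp of Eq. (13), with one 3x3 block stored per edge.

    edges : (nedges, 2) global node pairs (a, b) of each edge
    Ks    : (nedges, 3, 3) off-diagonal block of the assembled matrix per edge
    Kdiag : (nnodes, 3, 3) nodal diagonal blocks (assumed stored separately)
    p     : (nnodes, 3) global vector
    """
    Kp = np.einsum('nij,nj->ni', Kdiag, p)    # nodal diagonal contribution, nodewise
    for s in range(edges.shape[0]):
        a, b = edges[s]
        Kp[a] += Ks[s] @ p[b]                 # row a receives K_ab p_b
        Kp[b] += Ks[s].T @ p[a]               # row b receives K_ab^T p_a (symmetry)
    return Kp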

Clearly the data in Table 1 favor the edge-based scheme. However, compared to the EBE data structure, the edge scheme does not present a good balance between flop and i/a operations. Indirect addressing represents a major CPU overhead on vector, RISC and cache-based parallel machines. To improve this ratio, Lohner (1994) has proposed several alternatives to the single edge scheme. The underlying concept of such alternatives is that once data has been gathered, it should be reused as much as possible. This idea, combined with node renumbering strategies (Lohner, 1998), introduces further enhancements in the finite element edge-based scheme. Martins et al (1997) have found that, among the suggested alternatives, a structure formed by gathering edges in spatial triangular and tetrahedral arrangements, the superedges, presents a high data reutilization ratio and is simple to implement. The superedges are formed by reordering the edge list, gathering edges with common nodes to form tetrahedra and triangles. To make a distinction between elements and superedges, we call a triangular superedge a superedge3 and a tetrahedral superedge a superedge6. The matrix-vector product for a superedge3 may be expressed as,

Kp = \sum_{s=1,4,7,\ldots}^{ned3} \left( K_s p_s + K_{s+1} p_{s+1} + K_{s+2} p_{s+2} \right)    (14)

and for a superedge6,

Kp = \sum_{s=1,7,13,\ldots}^{ned6} \left( K_s p_s + K_{s+1} p_{s+1} + K_{s+2} p_{s+2} + K_{s+3} p_{s+3} + K_{s+4} p_{s+4} + K_{s+5} p_{s+5} \right)    (15)

where ned3 and ned6 are respectively the number of edges grouped as superedge3's and superedge6's. Equations (14) and (15) may be regarded as a loop unrolling technique applied to the edge loop. Table 2 gives the estimates for i/a reduction and flop increase for both types of superedges. We may see that we achieved a good reduction of i/a operations per edge, with a negligible increase of floating point operations.

Type          Edges   Nodes   ia/edge   ia reduction   flop/edge   flop increase
Edge          1       2       18:1      1.00           36:1        1.00
Superedge3    3       3       27:3      0.50           110:3       1.02
Superedge6    6       4       36:6      0.33           221:6       1.02

Table 2. Indirect addressing reduction and flop increase for the superedges.
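The unrolling of Eqs. (14)-(15) can be sketched as follows for superedge6's; the superedge3 case is analogous with groups of three edges and three nodes. The sketch assumes the edge list has already been reordered so that its first ned6 entries form consecutive groups of six edges sharing only four nodes; names are ours, and in Python the gather/reuse is only illustrative of what a compiled kernel would exploit:

def superedge6_matvec(edges, Ks, p, Kp, ned6):
    """Unrolled product over superedge6 groups, Eq. (15): the first ned6 edges of the
    reordered edge list are processed six at a time, reusing the four gathered nodes."""
    for s in range(0, ned6, 6):
        nodes = sorted({int(n) for k in range(6) for n in edges[s + k]})
        pg = {n: p[n] for n in nodes}         # gather the four nodal values once...
        for k in range(6):                    # ...and reuse them for all six edges
            a, b = edges[s + k]
            Kp[a] += Ks[s + k] @ pg[b]
            Kp[b] += Ks[s + k].T @ pg[a]
    return Kp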

For a given finite element mesh we first reorder the nodes by the Reverse Cuthill-McKee algorithm to improve data locality. Then we extract the edges, forming as many superedge6's and superedge3's as possible. After that we color each set of edges by a greedy algorithm to allow parallelization on shared vector multiprocessors and scalable shared memory machines (a sketch of such a coloring pass is given after Figure 2). To illustrate how a finite element mesh is subdivided into edges, superedge6's and superedge3's, Figure 2 shows an unstructured mesh for a cube and the resulting groups of edges. We have observed that for general unstructured grids more than 50% of all edges can be grouped into superedge6's, while 5-10% are grouped into superedge3's.

(a) Finite element mesh (b) Superedge6’s

(c) Superedge3’s (d) Single edges

Figure 2. Subdivision of a finite element mesh into edges and superedges.
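The greedy coloring mentioned above can be sketched generically as below. This is our own illustration of a standard greedy pass, not the paper's implementation; it colors single edges, and superedge groups would be handled the same way by considering their whole node set:

def greedy_edge_coloring(edges, nnodes):
    """Greedy coloring: edges with the same color share no node, so each color
    can be processed in parallel without write conflicts in the scatter-add."""
    colors = []                    # colors[c] lists the edge ids assigned to color c
    touched = []                   # touched[c][n] is True if node n appears in color c
    for s, (a, b) in enumerate(edges):
        c = 0
        while c < len(colors) and (touched[c][a] or touched[c][b]):
            c += 1                 # first color whose edges avoid nodes a and b
        if c == len(colors):       # no existing color fits: open a new one
            colors.append([])
            touched.append([False] * nnodes)
        colors[c].append(s)
        touched[c][a] = touched[c][b] = True
    return colors

Within the matrix-vector product of Eq. (13), the edges (or superedge groups) of each color can then be swept concurrently.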

4. NUMERICAL EXAMPLES

4.1 Performance assessment of edge-based matrix-vector product

The performance of each edge-based matrix-vector product algorithm on a Cray J90 superworkstation and on an SGI Origin 2000 is shown in Tables 3 and 4, respectively. In these experiments we employed randomly generated indirect addressing to map global to local quantities, that is, edge quantities. Table 3 lists the CPU times for the matrix-vector products on the Cray J90 for an increasing number of edges, supposing that all edges in the mesh may be grouped as superedge3's or superedge6's.

We may observe that gathering the edges in superedges reduces the CPU times considerably, particularly for superedge6's. Another set of experiments was conducted on the SGI Origin 2000, a scalable shared memory multiprocessor. The average results of 5 runs, in non-dedicated mode, considering a total number of 2,000,000 edges, are shown in Table 4. We may observe that all data structures present good scalability, but the superedges are faster. For 32 CPU's the superedge6 matrix-vector product is almost 3 times faster than the product with single edges.

Number of edges    Edges       Superedge3   Superedge6
3,840              1.92        1.89         1.87
38,400             2.81        2.42         2.23
384,000            11.96       8.12         5.82
3,840,000          102.4       59.64        42.02
38,400,000         1,005.19    579.01       399.06

Table 3. CPU times in seconds for edge-based matrix-vector products on the Cray J90.

Procs    Edges    Superedge3    Superedge6
4        48.0     22.8          15.6
8        31.0     15.8          11.1
16       19.0     9.6           6.8
32       10.9     5.8           3.9

Table 4. CPU times in seconds for edge-based matrix-vector products on the SGI Origin 2000.

4.2 Three-dimensional Elastic-Plastic Halfspace under Point Load

This problem concerns the application of a point load normal to the surface of a homogeneous, isotropic elastic-plastic halfspace. The domain, a unit cube, is modeled with a cubic mesh, where each cube is subdivided into 5 tetrahedra. Symmetry boundary conditions are assumed on three faces. Material properties are: Young's modulus 208.33 MPa; Poisson ratio 0.3; internal friction angle 20°; cohesion 0.07 MPa. We consider 4 different meshes, with increasing refinement. Topological data for the meshes are gathered in Table 5.

Mesh         Elements     Edges        Nodes      Equations
8×8×8        2,560        3,672        729        1,408
16×16×16     20,480       26,928       4,913      11,776
32×32×32     163,840      205,920      35,937     96,256
64×64×64     1,310,720    1,609,920    274,625    778,240

Table 5. Topological data for the meshes of the three-dimensional elastic-plastic halfspace under point load problem.

In all meshes the number of superedge6's is around 60% of the total number of edges, and they are concentrated inside the domain. The number of superedge3's varies between 8% of the total edges for the coarse mesh and 5% for the fine mesh. The superedge3's appear mostly on the domain boundary. We solve this problem on the Cray J90, considering 10 load increments, up to complete failure, using the IIS method, comparing computational costs for element and edge-based strategies. Displacement and residual tolerances were set to 10^-3. We stop the nonlinear solution procedure on the last load increment after 500 nonlinear iterations. Table 6 lists the memory requirements for the solution of these problems using both strategies. The edge-based solutions required less than one third of the memory of the element-based strategies.

Mesh         Elements    Edges    Ratio (%) edges/elements
8×8×8        0.24        0.07     29.67
16×16×16     1.91        0.53     27.70
32×32×32     15.26       4.08     26.75
64×64×64     123.98      32.07    26.28

Table 6. Memory requirements (Mwords) on the Cray J90 for the solution of the three-dimensional elastic-plastic halfspace under point load problem.

It is important to stress here that we use the edge-based data structure only in the matrix-vector products needed in the iterative driver. Plasticity computations are performed elementwise, as usual. Table 7 shows the total number of IIS iterations and corresponding CPU times on one CPU of a Cray J90, for the element and edge-based solutions. We selected the tolerances for PCG using Eq. (10) and fixed η_max = 0.1 and η_min = 10^-6. We observed that selecting tolerances by Eq. (7) yields more nonlinear iterations without any accuracy improvements.

             Element                     Edge
Mesh         IIS iterations   CPU (s)    IIS iterations   CPU (s)
8×8×8        557              96         539              39
16×16×16     727              1,167      728              434
32×32×32     729              12,326     729              3,983
64×64×64     N/A              N/A        1,339            84,361

Table 7. IIS iterations and CPU times for the element and edge-based solutions on the Cray J90 of the three-dimensional elastic-plastic halfspace under point load problem.

We may note that the edge-based solutions are typically 3 to 4 times faster than their element-based counterparts. We also observed that the PCG driver required very few iterations to converge. The edge-based solution on the fine mesh required on average 32 PCG iterations per linear solve. The total number of PCG iterations, or matrix-vector products, was 43,126. Considering that we are solving nonlinear systems of 778,240 unknowns, we achieved a very efficient solution strategy. We also solved the fine mesh problem in parallel. Figure 3 shows the speed-up curves obtained on the Cray J90.

[Figure 3 plots parallel speed-up against the number of processors for the matvec and mohr3d routines, compared with the perfect speed-up line.]

Figure 3. Parallel speed-ups on the Cray J90.

In Figure 3, matvec stands for the edge-based matrix-vector product routine and mohr3d for the internal forces evaluation, which involves elementwise computations. We may observe that good parallel speed-up was obtained in both cases. However, matrix-vector products dominate the solution costs. Figure 4 shows a plot of yield ratio isosurfaces for the last increment. The yield ratio is defined as the ratio between the equivalent stress and the yield stress. If the yield ratio is one, plasticity is fully developed.

Figure 4. Yield ratio isosurfaces for the three-dimensional elastic-plastic halfspace under point load problem on the fine mesh.

4.3 Extensional Behavior of a Sedimentary Basin

We analyze the extensional behavior of a sedimentary basin presenting a sedimentary cover (4 km) over a basement (2 km), with a length of 15 km and a thickness of 6 km. The model has an ancient fault between the two material layers, parallel to the right and left faces, with 500 m length and 60° of slope. A two-dimensional elastic analysis of this model was performed earlier by Moraes (1995). The relevant material properties, from Moraes (1995), are compatible with the sediment pre-rift sequence and basement. We have densities of 2,450 kg/m3 and 2,800 kg/m3 respectively for the sediment layer and basement; Young's modulus of 20 GPa for the sedimentary cover and 60 GPa for the basement; and Poisson's ratio of 0.3 for both rocks. The ratio between initial horizontal and vertical (gravitational) stresses is 0.429. We assume that both materials are under undrained conditions and modeled by the Mohr-Coulomb failure criterion. Thus, we have a sedimentary cover cohesion of 30 MPa, a basement cohesion of 60 MPa, and an internal friction angle of 30° for both materials. We consider the model simply supported at its left and bottom faces, and we apply tension stresses at the right face and shear stresses at the basement, opposing the basin extension. The loads are applied in 15 increments, ranging from 0.01 to 9.90, and the analysis is performed until the complete failure of the model. The finite element mesh comprises 25,001 tetrahedra, 5,257 nodal points and 32,133 edges. The numbers of edges grouped into superedge6's and superedge3's are 56.6% and 4.3% of the total number of edges. Figure 5 shows a view of the mesh and the deformed configuration for the last load increment.

Figure 5. Deformed finite element mesh for the sedimentary basin with fault at the 15th load increment.

Table 8 presents a comparison of the memory requirements to hold the stiffness matrix by a profile solver, by the PCG-EBE storage scheme and by our PCG-Edges scheme. The profile solver is a standard implementation of symmetric Crout factorization usually employed in Newton's method. We may see that our solution strategy required 1.5% of the storage area needed by the profile solver and 6 times less memory than the PCG-EBE scheme.

Profile Solver        PCG-EBE             PCG-Edges
19,381,711 (1.000)    1,950,078 (0.100)   289,197 (0.015)

Table 8. Memory requirements to hold the stiffness matrix (64-bit words).

Memory is a very important limiting factor in current large-scale 3D analyses, and our scheme presents an inexpensive memory demand compared to the usual procedures. Tables 9 and 10 compare the total number of nonlinear and PCG iterations and the relative CPU times for the Initial Stress (IS) and Tangent Stiffness (TS) methods, respectively, as well as their Inexact variants (IIS and ITS). We show in these Tables IIS and ITS solutions where PCG tolerances are adaptively chosen between 10^-6 and 10^-2 and between 10^-6 and 10^-1, according to the convergence of the outer (nonlinear) iterations. We have used Eq. (10) to select these tolerances. In the IS and TS solutions PCG tolerances were fixed at 10^-6.

                  IS         IIS [10^-6, 10^-2]   IIS [10^-6, 10^-1]
Nonlinear Iters   1,587      1,589                1,589
PCG Iters         271,846    146,381              68,626
CPU               1.00       0.55                 0.28

Table 9. Edge-based Initial Stress solutions – 13 load increments, halted after 950 nonlinear iterations; nonlinear tolerances 10^-3.

                   TS        ITS [10^-6, 10^-2]   ITS [10^-6, 10^-1]
Nonlinear Iters    116       119                  156
PCG Iters          27,806    15,726               12,897
Max Stress (MPa)   122.0     122.0                122.0
CPU                0.10      0.07                 0.06

Table 10. Edge-based Tangent Stiffness solutions – 15 load increments, nonlinear tolerances 10^-3.

We may see in these Tables that the IIS and ITS solutions are faster than the IS and TS solutions, although requiring more nonlinear iterations. The fastest solution is the ITS solution where the PCG tolerance varies between 10^-6 and 10^-1. In Table 10, we may also note that the maximum stress is the same for all solutions. Further, we observed that matrix-vector products are responsible for most of the computer costs, while updating the stiffness matrix is inexpensive. The accuracy of the IIS and ITS methods may be observed in Figures 6 and 7, where we show the load-displacement curves for all analyses. In these Figures, the vertical displacements are measured for a node at the model midsection, at the top of the sediment layer, over the fault.

We may note the complete rupture of the model, particularly in the ITS analysis. The IIS analyses were halted in the 13th load increment, after 950 nonlinear iterations. Figure 8 shows the deformed mesh obtained by the ITS method at the last load increment. We may also see in this figure the yield ratio fringes at the basin midsection. The development of failure zones at the sedimentary cover and basement can be observed, mostly in the right portion of the model, with an incision to the left underneath the fault. Thus, the fault is acting as a barrier for the spread of the failure zones. Similar behavior has been noted earlier by Moraes (1995) for two-dimensional models.

Figure 6. Initial Stress Load-displacement curves.

Figure 7. Tangent Stiffness Load-displacement curves.

Figure 8. Fringes of Mohr-Coulomb yield ratio at basin midsection for the 15th load increment.

We now consider the effects of a skew fault on the extensional behavior of a sedimentary basin. The model also presents a sedimentary cover (4 km) over a basement (2 km), with a length of 15 km and a thickness of 6 km, but with an ancient skew fault with 500 m length and 60° of slope. The skew fault starts at 5 km from the right face and ends at 5 km from the left face. The relevant material properties are the same as in the previous analysis. The finite element mesh (see Figure 9) comprises 2,611,036 tetrahedra, 445,752 nodal points and 3,916,554 edges. We may see in Figure 9 the mesh refinement at the skew fault. The number of superedge6's is 57% of the total number of edges, while the number of superedge3's is just 6% of the total. Boundary conditions and loads are exactly the same as in the previous analysis. However, loads are applied in 12 increments. The memory requirements to solve this problem are 203.9 and 35.3 Mwords, respectively, for the element and edge-based schemes.

We solve this problem on a 16-CPU Cray J90se using the ITS method and the edge-based strategy. Displacement and residual tolerances were set to 10^-3. We again selected PCG tolerances using Eq. (10) and fixed their variation to the interval [10^-6, 10^-1]. The parallel solution took only 15 minutes of elapsed time, corresponding to 36 nonlinear ITS iterations and 9,429 PCG iterations. Figure 10 shows the yield ratio contours in the last load increment. The effects of the skew fault on the two layers of the sedimentary basin may be clearly seen.

Figure 9. Finite element mesh for the sedimentary basin with skew fault.

Figure 10. Yield ratio contours for the sedimentary basin at the 12th load increment.

5. CONCLUSIONS

In this paper we have presented edge-based finite element techniques to improve the efficiency of Inexact Newton methods in the solution of large-scale three-dimensional nonlinear solid mechanics problems on unstructured meshes. We have found by numerical experimentation that the Inexact Initial Stress and Inexact Tangent Stiffness methods present good convergence and accuracy for solving plasticity problems where materials are modeled by the Mohr-Coulomb criterion with explicit Euler-forward stress updating. In our numerical experiments the fastest solutions were obtained by the Inexact Tangent Stiffness method.

The use of edge-based data structures to store the stiffness matrix coefficients and to compute matrix-vector products in the inner iterative driver of the Inexact Newton method improved the overall computational efficiency of the solution algorithm. We have observed that solution times and memory were reduced respectively by factors of 4 and 6, compared with solutions using element-by-element stiffness matrix storage and element-by-element matrix-vector products. The use of superedges also contributes to improving computer performance, reducing the effects of indirect addressing. The most effective superedge is the one formed by grouping 6 edges into a tetrahedron. In our numerical experiments 50 to 60% of all edges could be grouped into this type of superedge. However, we made no attempt to optimize our grouping algorithm to improve this ratio.

Finally, we may anticipate that, by using the concept of disassembling finite element matrices into their edge contributions, edge-based techniques can also be easily implemented for other types of finite elements.

ACKNOWLEDGEMENTS

This work is partially supported by CNPq grant 522692/95-8. Computer time on the Cray J90 is provided by the Center of Parallel Computing at COPPE/UFRJ. The authors are indebted to SGI Brazil for providing computer time on a Cray J90se and an Origin 2000 at Eagan, MN, USA. We would like to thank Dr. F.L.B. Ribeiro, from the Laboratory for Computational Methods in Engineering, for his invaluable help in the visualization of the sedimentary basin results.

REFERENCES

Arguello, J.G., Stone, C.M., Fossum, A.F., Progress on the development of a three-dimensional capability for simulating large-scale complex geologic process, 3rd North-American Rock Mechanics Symposium, Int. S. Rock Mechanics, paper USA 327-3, 1998.

Axelsson, O., Blaheta, R., Kohut, R., Inexact Newton solvers in plasticity, theory and experiments, Report 9606, Dept. of Mathematics, University of Nijmegen, The Netherlands, 1996.

Biffle, J.H., JAC3D - A three-dimensional finite element computer program for the nonlinear quasi-static response of solids with the conjugate gradient method, Sandia Report SAND87-1305, 1993.

Blaheta, R., Axelsson, O., Convergence of inexact Newton-like iterations in incremental finite element analysis of elasto-plastic problems, Comp. Meth. Appl. Mech. Engrg., 141, pp. 281-295, 1997.

Blaheta, R., Convergence of Newton-type methods in incremental return mapping analysis of elasto-plastic problems, Comp. Meth. Appl. Mech. Engrg., 147, pp. 167-185, 1997.

Barth, T.J., Numerical aspects of computing viscous high Reynolds number flows on unstructured meshes, AIAA 91-0721, 1991.

Catabriga, L., Martins, M.A.D., Coutinho, A.L.G.A., Alves, J.L.D., Clustered edge-based preconditioners for non-symmetric finite element computations, Computational Mechanics, New Trends and Applications, Editors, S.R. Idelsohn, E. Onate, E.N. Dvorkin, CD-ROM, CIMNE, UPC, Barcelona, Spain, 1998.

Crisfield, M.A., Nonlinear finite element analysis of solids and structures, John Wiley and Sons, 1991.

Ferencz, R.M., Hughes, T.J.R., Iterative finite element solutions in nonlinear solid mechanics, in Handbook for Numerical Analysis, Vol. VI, Editors, P.G. Ciarlet and J.L. Lions, Elsevier Science BV, 1998.

Hughes, T.J.R., The finite element method, Prentice-Hall, NJ, 1987.

Kelley, C.T., Iterative Methods for Linear and Nonlinear Equations, SIAM, Philadelphia, 1995.

Mavriplis, D.J., Three-dimensional unstructured multigrid for the Euler equations, AIAA Journal, 30(7), pp. 1753-1761, 1992.

Lohner, R., Edges, stars, superedges and chains, Comp. Meth. Appl. Mech. Engrg., Vol. 111, pp. 255-263, 1994.

Lohner, R., Renumbering strategies for unstructured-grid solvers operating on shared-memory, cache-based parallel machines, Comp. Meth. Appl. Mech. Engrg., Vol. 163, pp. 95-109, 1998.

Luo, H., Baum, J.D., Lohner, R., Edge-based finite element scheme for the Euler equations, AIAA Journal, 32(6), pp. 1183-1190, 1994.

Martins, M.A.D., Coutinho, A.L.G.A., Alves, J.L.D., Parallel iterative solution of finite element systems of equations employing edge-based data structures, 8th SIAM Conference on Parallel Processing for Scientific Computing, Editors, M. Heath et al, 1997.

Moraes, A., A study of local stress fields and fault generation in extensional regimes by the finite element method, MSc Thesis, Federal University of Ouro Preto, Ouro Preto, Brazil (in Portuguese), 1995.

Papadrakakis, M., Solving large-scale problems in mechanics: the development and application of computational solution procedures, John Wiley and Sons, 1993.

Parsons, I.D., Parallel adaptive multigrid methods for elasticity, plasticity and eigenvalue problems, in Solving Large-Scale Problems in Mechanics - Volume II: Parallel and Distributed Computer Applications, M. Papadrakakis, editor, J. Wiley, 1997.

Peraire, J., Peiro, J., Morgan, K., A 3D finite element multigrid solver for the Euler equations, AIAA Paper 92-0449, 1992.

Saad, Y., Iterative methods for sparse linear systems, PWS Publishing Company, 1996.

Venkatakrishnan, V., Parallel computation of Ax and A^T x, Int. Journal for High Speed Computing, 6, pp. 324-342, 1994.