1
Fig. 5. Spatial distribution of vertical permeability values in the MOSYS V1 modelling system mesh in the clustering experiment territory in Latvia. Screenshot fragment of MOSYS visualization platform HiFiGeo, log scale, EPSG:25884 projection. 300000 350000 400000 450000 500000 550000 600000 650000 700000 750000 6180000 6200000 6220000 6240000 6260000 6280000 6300000 6320000 6340000 6360000 6380000 6400000 6420000 6440000 5E-008 1E-007 5E-007 1E-006 5E-006 1E-005 5E-005 0.0001 0.0005 0.001 0.005 0.01 0.05 0.1 0.5 1 5 10 PermeabilityZ [m/day] Introduction Introduction The cover of Quaternary sediments, especially in formerly glaciated territories usually is the most complex part of the sedimentary sequences. In regional hydro-geological models it is often assumed as a single layer with uniform or calibrated properties (Valner, 2003). However, the properties and structure of Quaternary sediments control the groundwater recharge: it can either direct the groundwater flow horizontally towards discharge in topographic lows or vertically, recharging groundwater in the bedrock. Building a representative structure of Quaternary strata in a regional hydrogeological model is important as this affects all the underlying strata. Scope and objective Scope and objective This work aims to present calibration results and detail our experience while integrating a scalable generalization of hydraulic conductivity for Quaternary strata into the regional groundwater modelling system for the Baltic artesian basin – MOSYS V1. Methods Methods In this study the main unit of generalization is the spatial cluster and the main criteria for finding adequate representative borehole columns is the Normalized Copression Distance metric – a nonparametric datamining technique calculated by CompLearn ncd utility (Cilibrasi, 2007). Spatial clusters were obtained from flattened complete- link hierarchical clustering dendrogram representations, supplying cluster membership identifiers for each borehole. Using Voronoi tesselation and GIS dissolve spatial analysis functionality on this data spatial clusters were obtained in GIS format. The procedure is described in detail in “Lithological Uncertainty Expressed by Normalized Compression Distance” by Jatnieks et al. 2012. Full overview of the experiment workflow with software components and key stages is shown in Figure 1. We used two experiment matrices for hierarchical clustering. One where spatial clusters are obtained from Normalized Compression Distance (further identified as NCD matrix) matrix alone and a second where matrix is combined with range normalized Euclidean distance matrix of borehole locations in BalticTM93 projection (further identified as NCD+E experiment matrix). INVESTING IN YOUR FUTURE This work was supported by the European Social Fund project "Establishment of interdisciplinary scientist group and modelling system for ground-water research", 2009/0212/1DP/1.1.1.2.0/09/APIA/VIAA/060 Fig. 6. Middle fragment of vertical crossection AB (overview in Figure 5) across the Vidzeme region, showing horizontal permeability values on the right Y axis in Quaternary represented by 8 modelling system layers (only Q layers shown, for simplicity). Left Y axis - Z values in meters above sea level. Distance from crossection A point on X axis. Fig. 9. Squared error distribution by model calibration layers for each of the geometries derived from different logical clustering solutions using NCD matrix. 12 13 14 15 17 16 23 50 51 52 53 55 54 22 56 60 61 57 58 20 21 59 48 49 64 9 63 62 6 11 39 19 38 18 37 10 70 71 75 74 73 72 67 47 65 66 46 5 36 35 69 68 24 34 33 32 4 45 77 76 43 42 29 41 28 25 26 40 27 30 8 31 44 85 84 7 82 80 81 83 78 79 0 100 200 300 400 500 600 700 Squared error sum distribution by clustering solution and associated experimental geometry structure - NCD matrix D2nr D2ar D2br D3gj D3am D3pl D3slp D3dg D3ogkt D3stel D3jnsk C1 P Q Number of logical clusters after dendrogram flattening Squared error sum in meters by model calibration layers (smaller is better) Fig. 11. Squared error distribution by model calibration layers for each of the geometries derived from different logical clustering solutions using NCD+E matrix. 70 69 67 68 42 43 51 50 47 49 46 45 44 81 82 48 66 57 52 55 56 53 62 54 61 64 63 85 83 84 60 65 59 58 78 77 79 80 72 71 73 74 76 37 40 41 39 38 33 32 31 30 29 27 28 35 34 36 26 25 24 23 22 20 21 17 18 19 16 11 14 12 13 15 10 9 8 5 6 4 7 0 200 400 600 800 1000 1200 1400 Squared error sum distribution by clustering solution and associated experimental geometry structure - NCD+E matrix D2nr D2ar D2br D3gj D3am D3pl D3slp D3dg D3ogkt D3stel D3jnsk C1 P Q Squared error sum in meters by model calibration layers (smaller is better) Number of logical clusters after dendrogram flattening References References 1.Bennett, C. H., Gacs P., Li M., Vitanyi P., Zurek W. 1998, Information Distance, IEEE Transactions on Information Theory, 44(4), 1407-1423., IEEE. 2. Cilibrasi, R. 2007., Statistical Inference Through Data Compression, ILLC Dissertation Series DS-2007-01, Institute for Logic, Language and Computation, Universiteit van Amsterdam. 3. Klints I., Virbulis J., Timuhins A., Sennikovs J. and Bethers U., Calibration of the hydrogeological model of the Baltic Artesian Basin, EGU Geophysical research abstracts EGU2012-10003 [this section poster board A255]. 4. Takčidi, E. 1999. Datu bāzes "Urbumi" dokumentācija [Documentation of the database "Boreholes"]. Valsts ģeoloģijas dienests, Rīga. [In Latvian]. 5. Valner, L. 2003. Hydrogeological model of Estonia and its applications. Proc. Estonian Acad. Sci. Geol., 52, 3, 179-192. 6. Virbulis, J., Bethers, U., Saks, T., Sennikovs, J. & Timuhins, A. Hydrogeological model of the Baltic Artesian Basin. Hydrogeology Journal. [Under revision]. 7. C. Zhu, R. H. Byrd and J. Nocedal. L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound constrained optimization (1997), ACM Transactions on Mathematical Software, 23, 4, pp. 550 - 560. Results Results The experiment geometries based on NCD matrix perform better overall. The various Quaternary representations tested have a measurable effect on model results - the worst geometries, such as NCD+E 7, 3 and others with low logical cluster count, perform more than twice as bad compared to the best NCD clustering based geometries. NCD clustering based geometries with the better initial calculation results (in Figures 8 and 10), such as the 12 and 13 cluster variants, also tend to respond better to optimization with MOSYS auto-calibration routine (Figure 12). Some the geometries with lower results from intial model run, show very minimal improvement with calibration ( Figure 12). For our experiment territory, consisting of formerly glaciated territory of Latvia, the best performing generalizations have 12 and 13 clusters. This could have been successfully deduced from the clustering dendrogram in Figure 2. The best performing model geometry before applying the NCD metric based clustering solution, had squared error sum of 814 meters. The best NCD clustering geometry provides 9.7% improvement to 735. This may seem modest in nominal terms. It isn't, because the experimental Quaternary strata were inserted in a previously calibrated model geometry with initial conductivity values as estimates. Model autocalibration works by multiplying layer material properties with coefficients determined by the the Scipy L-BFGS-B routine (Zhu et al. 1997). Since this works on a layer basis, the proportional differences in conductivity values between elements in each layer remain constant. Ideally, the model geometry area built using clustering and the rest of the territory would be calibrated seperately. This could yield further improvements. A B Fig. 3. Algorithm for calculating Z values for including spatial clustering results into the MOSYS finite-element mesh geometry and resolving structure conflicts, arising from different lithological structures, geological columns selected from the most typical borehole log in this cluster. 380000 390000 400000 410000 420000 430000 440000 6355000 6360000 6365000 6370000 6375000 6380000 6385000 6390000 6395000 6400000 6405000 5E-008 1E-007 5E-007 1E-006 5E-006 1E-005 5E-005 0.0001 0.0005 0.001 0.005 0.01 0.05 0.1 0.5 1 5 10 50 100 PermeabilityXY [m/day] Fig. 4. After the algorithm described in Fig. 3 creates the layer structure into the model geometry, the respective conductivity values are assigned to mesh elements, created by the new placeholder layers. The cluster membership for each of mesh elements is determined by intersecting the element centroids, shown here in white with spatial clusters in GIS Shapefile layer. Log scale for conductivity values on Y axis. Fig. 2. Complete-link agglomerative hierarchical clustering dendrogram for Normalized Compression Distance matrix of serialized borehole log lithological structures. SCALABLE GENERALIZATION OF HYDRAULIC CONDUCTIVITY IN QUATERNARY SCALABLE GENERALIZATION OF HYDRAULIC CONDUCTIVITY IN QUATERNARY STRATA FOR USE IN A REGIONAL GROUNDWATER MODEL STRATA FOR USE IN A REGIONAL GROUNDWATER MODEL J nis J TNIEKS ā Ā J nis J TNIEKS ā Ā 1 1 , Konr ds POPOVS ā , Konr ds POPOVS ā 1 1 , Ilze KLINTS , Ilze KLINTS 2 2 , Andrejs TIMUHINS , Andrejs TIMUHINS 3 3 , Andis KALV NS Ā , Andis KALV NS Ā 1 1 , Aija D LI A Ē Ņ , Aija D LI A Ē Ņ 1 1 , Tomas SAKS , Tomas SAKS 1 1 1 1 Faculty of Geography and Earth Sciences, Faculty of Geography and Earth Sciences, 2 2 Faculty of Physics and Mathematics, Faculty of Physics and Mathematics, 3 3 Laboratory for Mathematical Modelling of Environmental and Technological Processes. Laboratory for Mathematical Modelling of Environmental and Technological Processes. University of Latvia, Riga, Latvia. Contact: University of Latvia, Riga, Latvia. Contact: [email protected] [email protected] , , www.puma.lu.lv www.puma.lu.lv Class Lithology Hori- zontal conduc- tivity, m/day Vertical conduc- tivity, m/day 1 sandstone 5 1 2 sandstone- low conductivity 1 0.5 3 sandstone- high conductivity 50 20 4 limestone 30 10 5 limestone-low conductivity 1 0.1 6 dolomite-high conductivity 100 30 7 dolomite 30 10 8 domerite (clayey dolomite) 0.1 0.01 9 clay 0.00001 0.0000 1 10 loam 0.001 0.001 11 sandy loam 0.001 0.001 12 silt 0.01 0.01 13 sand 5 5 14 gravel 20 20 15 gypsum 10 10 16 peat 10 10 17 gyttja and other biogenic sediments 0.0003 0.0003 18 soil 1 1 19 other 1 1 20 concrete 0 0 Table 1. Initial conductivity values from lithology classifier, used for generalization of borehole log structures for this study. This degree of similarity and the spatial heterogeneity of the cluster polygons can be varied by different flattening of the hierarchical cluster model into variable number of clusters. Such an approach provides a scalable generalization solution. It can be scaled from broad to more detailed generalization, depending on model and structural characteristics of the modelling territory, with the help of the clustering dendrogram ( Figure 2). Using the dissimilarity matrix of the NCD metric, a borehole, most similar to all the others from the lithological structure point of view, can be identified from the subset of all cluster member boreholes. The log structure of this borehole then is applied throughout the territory of the corresponding spatial cluster. The spatial locations of the same logical clusters can be disjoint - in different parts of the modelling territory and have the same borehole log column, representing the lithological structure throughout spatial clusters with the same logical cluster identifier. 162 model geometries were prepared, incorporating different candidate representations of Quaternary strata made from spatial clusters with the most typical borehole column, selected using NCD metric. The borehole log column lithological structure is composed using aggregate lithology types as shown in Table 1 with their conductivity values. To integrate these results into the geometry of regional groundwater model MOSYS V1, the algorithm, shown in Figure 3, calculates mesh point heights in meters above sea level for every mesh point in every newly created Quaternary layer. After the layer surface calculations, the mesh element cluster memebership is determined by intersecting element centroids (white dots in Figure 4) with the spatial clusters. The performance of resulting geometries is detailed in Figures 8-11. For both matrix variants (NCD and NCD+E) 82 model geometries, based on different flattenings of cluster dendrogram ( Figure 2) were prepared and run in MOSYS V1 modelling environment. From the inital model run results in Figures 8 and 10, 3 better perfoming, one median and also the worst performing geometry was run through MOSYS autocalibration routine (Klints et al. 2012). This provided insight in their sensitivity to calibration through further optimization of hydraulic conductivity values in Quaternary strata (Figure 12). The model geometry, in which the Quaternary clustering experiment geometries were integrated, had already been calibrated until convergence before these experiements. Fig. 7. Previous implementation of Quaternary strata in the MOSYS V1 modelling system consisted of 4 layers - two sets of aquifer-aquitard transitions. The arrow indicates the border between the previous Quaternary implementation (left) and the new Quaternary implementation, using hierarhical clustering of Normalized Compression Distanece, on the right. Fig. 12. Selected experimental model geometries and their response to calibration of model Quaternary strata. The response of the selected new model geometries to calibration indicates their influence on the underlying model strata, not just improved performance in the part of the model representing the Quaternary deposits. Geometries NCD 12c, NCD 13c were calibrated further, including the remaining bedrock layers as optimization parameters and yielding further decrease of squared error sum, used as overall fitness function for model performance against observed pressure heads. Examples of crossections from NCD 12c structure are shown in Figures 6 and 7. Fig. 1. Overview of the full experiment workflow for using Normalized Compression Distance and hierarchical clustering as an approch for scalable generalization of Quaternary sediment lithological structure in the MOSYS regional groundwater modelling system for the Baltic artesian basin territory. 12 13 14 15 17 16 23 50 51 52 53 55 54 22 56 60 61 57 58 20 21 59 48 49 64 9 63 62 6 11 39 19 38 18 37 10 70 71 75 74 73 72 67 47 65 66 46 5 36 35 69 68 24 34 33 32 4 45 77 76 43 42 29 41 28 25 26 40 27 30 8 31 44 85 84 7 82 80 81 83 78 79 1050 1060 1070 1080 1090 1100 1110 1120 1130 1140 Number of logical clusters against squared sum of head errors - NCD matrix Number of logical clusters after dendrogram flattening Squared error sum in meters (smaller is better) Fig. 8. Squared sum of errors against geometries prepared from different clustering solutions using NCD matrix. 70 69 67 68 42 43 51 50 47 49 46 45 44 81 82 48 66 57 52 55 56 53 62 54 61 64 63 85 83 84 60 65 59 58 78 77 79 80 72 71 73 74 76 37 40 41 39 38 33 32 31 30 29 27 28 35 34 36 26 25 24 23 22 20 21 17 18 19 16 11 14 12 13 15 10 9 8 5 6 4 7 1050 1100 1150 1200 1250 1300 1350 1400 1450 1500 1550 1600 1650 1700 1750 Number of logical clusters against squared sum of head errors - NCD+E matrix Number of logical clusters after dendrogram flattening Squared error sum in meters (smaller is better) Fig. 10. Squared sum of errors against geometries prepared from different clustering solutions using NCD+E matrix. 1 7 13 19 25 31 37 43 49 55 61 67 73 79 85 91 97 103 109 115 121 127 133 139 145 151 157 163 169 175 181 187 193 199 205 211 217 223 229 235 241 247 253 259 265 271 277 283 289 295 301 307 313 319 325 331 337 343 349 355 361 367 373 379 385 391 850 900 950 1000 1050 1100 1150 Target function values against optimization iteration count Optimization runs for Quaternary strata in 5 model geometries for each of NCD and NCD + E matrices NCD 12c NCD 13c NCD 14c NCD 74c NCD 79c NCD+E 7c NCD+E 67c NCD+E 69c NCD+E 70c NCD+E 73c Number of iterations for MOSYS auto-calibration routine Squared error sum in meters (smaller is better)

SCALABLE GENERALIZATION OF HYDRAULIC CONDUCTIVITY IN ... · Fig. 5. Spatial distribution of vertical permeability values in the MOSYS V1 modelling system mesh in the clustering experiment

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

  • Fig. 5. Spatial distribution of vertical permeability values in the MOSYS V1 modelling system mesh in the clustering experiment territory in Latvia. Screenshot fragment of MOSYS visualization platform HiFiGeo, log scale, EPSG:25884 projection.

    300000 350000 400000 450000 500000 550000 600000 650000 700000 750000

    6180000

    6200000

    6220000

    6240000

    6260000

    6280000

    6300000

    6320000

    6340000

    6360000

    6380000

    6400000

    6420000

    6440000

    5E-008

    1E-007

    5E-007

    1E-006

    5E-006

    1E-005

    5E-005

    0.0001

    0.0005

    0.001

    0.005

    0.01

    0.05

    0.1

    0.5

    1

    5

    10

    PermeabilityZ [m

    /day]

    IntroductionIntroductionThe cover of Quaternary sediments, especially in formerly

    glaciated territories usually is the most complex part of the sedimentary sequences. In regional hydro-geological models it is often assumed as a single layer with uniform or calibrated properties (Valner, 2003). However, the properties and structure of Quaternary sediments control the groundwater recharge: it can either direct the groundwater flow horizontally towards discharge in topographic lows or vertically, recharging groundwater in the bedrock. Building a representative structure of Quaternary strata in a regional hydrogeological model is important as this affects all the underlying strata.

    Scope and objectiveScope and objectiveThis work aims to present calibration results and detail

    our experience while integrating a scalable generalization of hydraulic conductivity for Quaternary strata into the regional groundwater modelling system for the Baltic artesian basin – MOSYS V1.

    MethodsMethodsIn this study the main unit of generalization is the spatial

    cluster and the main criteria for finding adequate representative borehole columns is the Normalized Copression Distance metric – a nonparametric datamining technique calculated by CompLearn ncd utility (Cilibrasi, 2007).

    Spatial clusters were obtained from flattened complete-link hierarchical clustering dendrogram representations, supplying cluster membership identifiers for each borehole. Using Voronoi tesselation and GIS dissolve spatial analysis functionality on this data spatial clusters were obtained in GIS format. The procedure is described in detail in “Lithological Uncertainty Expressed by Normalized Compression Distance” by Jatnieks et al. 2012. Full overview of the experiment workflow with software components and key stages is shown in Figure 1.

    We used two experiment matrices for hierarchical clustering. One where spatial clusters are obtained from Normalized Compression Distance (further identified as NCD matrix) matrix alone and a second where matrix is combined with range normalized Euclidean distance matrix of borehole locations in BalticTM93 projection (further identified as NCD+E experiment matrix).

    INVESTING IN YOUR FUTURE

    This work was supported by the European Social Fund project "Establishment of interdisciplinary scientist group and

    modelling system for ground-water research", 2009/0212/1DP/1.1.1.2.0/09/APIA/VIAA/060

    Fig. 6. Middle fragment of vertical crossection AB (overview in Figure 5) across the Vidzeme region, showing horizontal permeability values on the right Y axis in Quaternary represented by 8 modelling system layers (only Q layers shown, for simplicity). Left Y axis - Z values in meters above sea level. Distance from crossection A point on X axis.

    Fig. 9. Squared error distribution by model calibration layers for each of the geometries derived from different logical clustering solutions using NCD matrix.

    12 13 14 15 17 16 23 50 51 52 53 55 54 22 56 60 61 57 58 20 21 59 48 49 64 9 63 62 6 11 39 19 38 18 37 10 70 71 75 74 73 72 67 47 65 66 46 5 36 35 69 68 24 34 33 32 4 45 77 76 43 42 29 41 28 25 26 40 27 30 8 31 44 85 84 7 82 80 81 83 78 79

    0

    100

    200

    300

    400

    500

    600

    700

    Squared error sum distribution by clustering solution and associated experimental geometry structure - NCD matrix

    D2nr D2ar D2br D3gj D3am D3pl D3slp D3dg D3ogkt D3stel D3jnsk C1 P Q

    Number of logical clusters after dendrogram flattening

    Squa

    red

    erro

    r sum

    in m

    eter

    s by

    mod

    el c

    alib

    ratio

    n la

    yers

    (sm

    alle

    r is

    bette

    r)

    Fig. 11. Squared error distribution by model calibration layers for each of the geometries derived from different logical clustering solutions using NCD+E matrix.

    70 69 67 68 42 43 51 50 47 49 46 45 44 81 82 48 66 57 52 55 56 53 62 54 61 64 63 85 83 84 60 65 59 58 78 77 79 80 72 71 73 74 76 37 40 41 39 38 33 32 31 30 29 27 28 35 34 36 26 25 24 23 22 20 21 17 18 19 16 11 14 12 13 15 10 9 8 5 6 4 7

    0

    200

    400

    600

    800

    1000

    1200

    1400

    Squared error sum distribution by clustering solution and associated experimental geometry structure - NCD+E matrix

    D2nr D2ar D2br D3gj D3am D3pl D3slp D3dg D3ogkt D3stel D3jnsk C1 P Q

    Squa

    red

    erro

    r sum

    in m

    eter

    s by

    mod

    el c

    alib

    ratio

    n la

    yers

    (sm

    alle

    r is

    bette

    r)

    Number of logical clusters after dendrogram flattening

    ReferencesReferences1. Bennett, C. H., Gacs P., Li M., Vitanyi P., Zurek W. 1998, Information Distance, IEEE Transactions on Information Theory,

    44(4), 1407-1423., IEEE.2. Cilibrasi, R. 2007., Statistical Inference Through Data Compression, ILLC Dissertation Series DS-2007-01, Institute for

    Logic, Language and Computation, Universiteit van Amsterdam.3. Klints I., Virbulis J., Timuhins A., Sennikovs J. and Bethers U., Calibration of the hydrogeological model of the Baltic

    Artesian Basin, EGU Geophysical research abstracts EGU2012-10003 [this section poster board A255].4. Takčidi, E. 1999. Datu bāzes "Urbumi" dokumentācija [Documentation of the database "Boreholes"]. Valsts ģeoloģijas

    dienests, Rīga. [In Latvian].5. Valner, L. 2003. Hydrogeological model of Estonia and its applications. Proc. Estonian Acad. Sci. Geol., 52, 3, 179-192.6. Virbulis, J., Bethers, U., Saks, T., Sennikovs, J. & Timuhins, A. Hydrogeological model of the Baltic Artesian Basin.

    Hydrogeology Journal. [Under revision].7. C. Zhu, R. H. Byrd and J. Nocedal. L-BFGS-B: Algorithm 778: L-BFGS-B, FORTRAN routines for large scale bound

    constrained optimization (1997), ACM Transactions on Mathematical Software, 23, 4, pp. 550 - 560.

    ResultsResultsThe experiment geometries based on NCD matrix perform

    better overall.The various Quaternary representations tested have a

    measurable effect on model results - the worst geometries, such as NCD+E 7, 3 and others with low logical cluster count, perform more than twice as bad compared to the best NCD clustering based geometries.

    NCD clustering based geometries with the better initial calculation results (in Figures 8 and 10), such as the 12 and 13 cluster variants, also tend to respond better to optimization with MOSYS auto-calibration routine (Figure 12).

    Some the geometries with lower results from intial model run, show very minimal improvement with calibration (Figure 12).

    For our experiment territory, consisting of formerly glaciated territory of Latvia, the best performing generalizations have 12 and 13 clusters.

    This could have been successfully deduced from the clustering dendrogram in Figure 2.

    The best performing model geometry before applying the NCD metric based clustering solution, had squared error sum of 814 meters. The best NCD clustering geometry provides 9.7% improvement to 735. This may seem modest in nominal terms. It isn't, because the experimental Quaternary strata were inserted in a previously calibrated model geometry with initial conductivity values as estimates. Model autocalibration works by multiplying layer material properties with coefficients determined by the the Scipy L-BFGS-B routine (Zhu et al. 1997). Since this works on a layer basis, the proportional differences in conductivity values between elements in each layer remain constant. Ideally, the model geometry area built using clustering and the rest of the territory would be calibrated seperately. This could yield further improvements.

    A

    B

    Fig. 3. Algorithm for calculating Z values for including spatial clustering results into the MOSYS finite-element mesh geometry and resolving structure conflicts, arising from different lithological structures, geological columns selected from the most typical borehole log in this cluster.

    380000 390000 400000 410000 420000 430000 4400006355000

    6360000

    6365000

    6370000

    6375000

    6380000

    6385000

    6390000

    6395000

    6400000

    6405000

    5E-008

    1E-007

    5E-007

    1E-006

    5E-006

    1E-005

    5E-005

    0.0001

    0.0005

    0.001

    0.005

    0.01

    0.05

    0.1

    0.5

    1

    5

    10

    50

    100

    PermeabilityXY [m

    /day]

    Fig. 4. After the algorithm described in Fig. 3 creates the layer structure into the model geometry, the respective conductivity values are assigned to mesh elements, created by the new placeholder layers. The cluster membership for each of mesh elements is determined by intersecting the element centroids, shown here in white with spatial clusters in GIS Shapefile layer. Log scale for conductivity values on Y axis.

    Fig. 2. Complete-link agglomerative hierarchical clustering dendrogram for Normalized Compression Distance matrix of serialized borehole log lithological structures.

    SCALABLE GENERALIZATION OF HYDRAULIC CONDUCTIVITY IN QUATERNARY SCALABLE GENERALIZATION OF HYDRAULIC CONDUCTIVITY IN QUATERNARY STRATA FOR USE IN A REGIONAL GROUNDWATER MODELSTRATA FOR USE IN A REGIONAL GROUNDWATER MODEL

    J nis J TNIEKSā ĀJ nis J TNIEKSā Ā 11, Konr ds POPOVSā, Konr ds POPOVSā 11, Ilze KLINTS, Ilze KLINTS22, Andrejs TIMUHINS, Andrejs TIMUHINS33, Andis KALV NSĀ, Andis KALV NSĀ 11, Aija D LI AĒ Ņ, Aija D LI AĒ Ņ 11, Tomas SAKS, Tomas SAKS1111Faculty of Geography and Earth Sciences, Faculty of Geography and Earth Sciences, 22Faculty of Physics and Mathematics, Faculty of Physics and Mathematics, 33Laboratory for Mathematical Modelling of Environmental and Technological Processes. Laboratory for Mathematical Modelling of Environmental and Technological Processes.

    University of Latvia, Riga, Latvia. Contact: University of Latvia, Riga, Latvia. Contact: [email protected]@lu.lv, , www.puma.lu.lvwww.puma.lu.lv

    Class Lithology

    Hori-zontal

    conduc-tivity, m/day

    Vertical conduc-

    tivity, m/day

    1 sandstone 5 1

    2sandstone-

    low conductivity

    1 0.5

    3sandstone-

    high conductivity

    50 20

    4 limestone 30 10

    5 limestone-low conductivity 1 0.1

    6 dolomite-high conductivity 100 30

    7 dolomite 30 10

    8domerite (clayey

    dolomite)0.1 0.01

    9 clay 0.00001 0.0000110 loam 0.001 0.00111 sandy loam 0.001 0.00112 silt 0.01 0.0113 sand 5 514 gravel 20 2015 gypsum 10 1016 peat 10 10

    17gyttja and

    other biogenic sediments

    0.0003 0.0003

    18 soil 1 119 other 1 120 concrete 0 0

    Table 1. Initial conductivity values from lithology classifier, used for generalization of borehole log structures for this study.

    This degree of similarity and the spatial heterogeneity of the cluster polygons can be varied by different flattening of the hierarchical cluster model into variable number of clusters. Such an approach provides a scalable generalization solution. It can be scaled from broad to more detailed generalization, depending on model and structural characteristics of the modelling territory, with the help of the clustering dendrogram (Figure 2).

    Using the dissimilarity matrix of the NCD metric, a borehole, most similar to all the others from the lithological structure point of view, can be identified from the subset of all cluster member boreholes. The log structure of this borehole then is applied throughout the territory of the corresponding spatial cluster. The spatial locations of the same logical clusters can be disjoint - in different parts of the modelling territory and have the same borehole log column, representing the lithological structure throughout spatial clusters with the same logical cluster identifier.

    162 model geometries were prepared, incorporating different candidate representations of Quaternary strata made from spatial clusters with the most typical borehole column, selected using NCD metric. The borehole log column lithological structure is composed using aggregate lithology types as shown in Table 1 with their conductivity values.

    To integrate these results into the geometry of regional groundwater model MOSYS V1, the algorithm, shown in Figure 3, calculates mesh point heights in meters above sea level for every mesh point in every newly created Quaternary layer.

    After the layer surface calculations, the mesh element cluster memebership is determined by intersecting element centroids (white dots in Figure 4) with the spatial clusters.

    The performance of resulting geometries is detailed in Figures 8-11. For both matrix variants (NCD and NCD+E) 82 model geometries, based on different flattenings of cluster dendrogram (Figure 2) were prepared and run in MOSYS V1 modelling environment. From the inital model run results in Figures 8 and 10, 3 better perfoming, one median and also the worst performing geometry was run through MOSYS autocalibration routine (Klints et al. 2012).

    This provided insight in their sensitivity to calibration through further optimization of hydraulic conductivity values in Quaternary strata (Figure 12). The model geometry, in which the Quaternary clustering experiment geometries were integrated, had already been calibrated until convergence before these experiements.

    Fig. 7. Previous implementation of Quaternary strata in the MOSYS V1 modelling system consisted of 4 layers - two sets of aquifer-aquitard transitions. The arrow indicates the border between the previous Quaternary implementation (left) and the new Quaternary implementation, using hierarhical clustering of Normalized Compression Distanece, on the right.

    Fig. 12. Selected experimental model geometries and their response to calibration of model Quaternary strata.

    The response of the selected new model geometries to calibration indicates their influence on the underlying model strata, not just improved performance in the part of the model representing the Quaternary deposits.

    Geometries NCD 12c, NCD 13c were calibrated further, including the remaining bedrock layers as optimization parameters and yielding further decrease of squared error sum, used as overall fitness function for model performance against observed pressure heads.

    Examples of crossections from NCD 12c structure are shown in Figures 6 and 7.

    Fig. 1. Overview of the full experiment workflow for using Normalized Compression Distance and hierarchical clustering as an approch for scalable generalization of Quaternary sediment lithological structure in the MOSYS regional groundwater modelling system for the Baltic artesian basin territory.

    12 13 14 15 17 16 23 50 51 52 53 55 54 22 56 60 61 57 58 20 21 59 48 49 64 9 63 62 6 11 39 19 38 18 37 10 70 71 75 74 73 72 67 47 65 66 46 5 36 35 69 68 24 34 33 32 4 45 77 76 43 42 29 41 28 25 26 40 27 30 8 31 44 85 84 7 82 80 81 83 78 79

    1050

    1060

    1070

    1080

    1090

    1100

    1110

    1120

    1130

    1140Number of logical clusters against squared sum of head errors - NCD matrix

    Number of logical clusters after dendrogram flattening

    Squa

    red

    erro

    r sum

    in

    met

    ers

    (sm

    alle

    r is

    bette

    r)

    Fig. 8. Squared sum of errors against geometries prepared from different clustering solutions using NCD matrix.

    70 69 67 68 42 43 51 50 47 49 46 45 44 81 82 48 66 57 52 55 56 53 62 54 61 64 63 85 83 84 60 65 59 58 78 77 79 80 72 71 73 74 76 37 40 41 39 38 33 32 31 30 29 27 28 35 34 36 26 25 24 23 22 20 21 17 18 19 16 11 14 12 13 15 10 9 8 5 6 4 7

    105011001150120012501300135014001450150015501600165017001750

    Number of logical clusters against squared sum of head errors - NCD+E matrix

    Number of logical clusters after dendrogram flattening

    Squa

    red

    erro

    r sum

    in

    met

    ers

    (sm

    alle

    r is

    bette

    r)

    Fig. 10. Squared sum of errors against geometries prepared from different clustering solutions using NCD+E matrix.

    1 7 13 19 25 31 37 43 49 55 61 67 73 79 85 91 97 103

    109

    115

    121

    127

    133

    139

    145

    151

    157

    163

    169

    175

    181

    187

    193

    199

    205

    211

    217

    223

    229

    235

    241

    247

    253

    259

    265

    271

    277

    283

    289

    295

    301

    307

    313

    319

    325

    331

    337

    343

    349

    355

    361

    367

    373

    379

    385

    391

    850

    900

    950

    1000

    1050

    1100

    1150

    Target function values against optimization iteration count

    Optimization runs for Quaternary strata in 5 model geometries for each of NCD and NCD + E matrices

    NCD 12c

    NCD 13c

    NCD 14c

    NCD 74c

    NCD 79c

    NCD+E 7c

    NCD+E 67c

    NCD+E 69c

    NCD+E 70c

    NCD+E 73c

    Number of iterations for MOSYS auto-calibration routine

    Squa

    red

    erro

    r sum

    in

    met

    ers

    (sm

    alle

    r is

    bette

    r)

    mailto:[email protected]://www.puma.lu.lv/

    Slide 1