Supporting Information - Nature Research · 2016-05-17 · Supporting Information ... Prediction...

Supporting Information

Predicting taxonomic and functional structure of microbial community in acid mine drainage

Jialiang Kuang, Linan Huang, Zhili He, Linxing Chen, Zhengshuang Hua, Pu Jia, Shengjin Li, Jun Liu, Jintian Li, Jizhong Zhou and Wensheng Shu

Supplementary Methods

Sampling procedure, physicochemical analyses and DNA extraction

Amplification and bar-coded pyrosequencing of bacterial and archaeal 16S rRNA genes

Processing of pyrosequencing data

GeoChip analysis

Prediction model of microbial assemblages and functional metabolic potentials

Table S1 Functional genes selected for the statistical analyses in this study.

Table S2 Site locations and environmental conditions of acid mine drainage (AMD) samples.

Table S3 Relative abundance (%) of dominant lineages across AMD microbial communities.

Table S4 Summary of statistics (R2) from dissimilarity test (Adonis) between two mining areas on the functional community structure.

Table S5 Environmental and taxonomic variable loadings on the PCs across the AMD samples.

Table S6 Multiple linear regression (MLR) of environmental variables and relative abundance of dominant microbial lineages on metabolic potential of functional genes.

Table S7 Validation of predictive models for relative abundances of dominant microbial taxa (Phylum level, mean relative abundance > 1%) based on the artificial neural network (ANN).

Table S8 Validation of predictive models for relative abundances of dominant microbial taxa (Order level, mean relative abundance > 0.1%) based on the artificial neural network (ANN).

Table S9 Validation of predictive models for relative abundances of key microbial taxa (OTU level, observed in at least half of the total samples) based on the artificial neural network (ANN).

Table S10a Validation of predictive models for metabolic potentials (original signals) of key functional genes based on the artificial neural network (ANN).

Table S10b Validation of predictive models for metabolic potentials (normalized data) of key functional genes based on the artificial neural network (ANN).

Table S11 Predictive equations and functional parameters that provide the best prediction for relative abundances of dominant microbial taxa based on the artificial neural network (ANN).

Table S12 Predictive equations and functional parameters that provide the best prediction for functional metabolic potentials based on ANN.

Table S13 Predictive equations and functional parameters that provide the best prediction for environmental properties based on ANN.

Table S14 Functional genes that revealed consistent or fluctuant relative metabolic potentials along the gradient of pH levels.

Figure S1 The consensus networks of environmental (a) and taxonomic (b) variables generated by Bayesian network inference.

Figure S2 The scatter plots show the cross-validation of predicted and observed values for relative microbial abundances at different taxonomic levels.

Figure S3 The scatter plots show the cross-validation of predicted and observed values for functional metabolic potentials of different functional gene categories.

Figure S4 Bray-Curtis similarity between predicted and observed values of relative microbial abundances (phylum level, a) and gene metabolic potentials of different functional categories (with relative abundance information of microbial phyla, b).

Figure S5 The changes of relative metabolic potential of functional genes in sulfur cycling (a), stress response (b), energy process and membrane transport (c) and antibiotic resistance (d) along the gradient of pH levels.

Figure S6 The comparison of predicted and observed metabolic potentials of different functional gene categories including carbon cycling (a), phosphorus (b) and sulfur cycling (c) along the gradient of pH levels.

Figure S7 The comparison of predicted and observed metabolic potentials of nitrogen cycling

Figure S8 The comparison of predicted and observed metabolic potentials of different functional gene categories including energy process (a) and membrane transport (b) along the gradient of pH levels.

Figure S9 The comparison of predicted and observed metabolic potentials of metal resistance

Figure S10 The comparison of predicted and observed metabolic potentials of stress response

Figure S11 The comparison of predicted and observed metabolic potentials of antibiotic resistance along the gradient of pH levels.

Supplementary Methods

Sampling procedure, physicochemical analyses and DNA extraction

Acid mine drainage (AMD) samples were previously collected from 14 mining areas with different mineralogy across Southeast China; the distances between sampling sites ranged from about 10 m to over 1600 km (Kuang et al., 2013). Briefly, water samples were taken using sterile serum bottles and immediately kept on ice for transport to the laboratory. For DNA extraction, 500 ml of each water sample was coarse-filtered through a 3 μm fiber filter and then filtered through a 0.22 μm polyethersulfone (PES) membrane filter. The cell pellets on the PES membranes were used for DNA extraction following the protocol described by Frias-Lopez et al. (2008), with an additional homogenizing step for cell lysis using a Fast Prep-24 Homogenisation System; the filtrates were used for the chemical analyses.

Temperature, solution pH, dissolved oxygen (DO) and electrical conductivity (EC) were measured on site using specific electrodes. Ferric and ferrous iron were measured by ultraviolet-colorimetric assay with 1,10-phenanthroline at 530 nm. Total organic carbon (TOC) was measured by high-temperature catalytic oxidation and infrared detection with a TOC analyzer, and sulfate was determined by a BaSO4-based turbidimetric method. Elemental analysis was performed by inductively coupled plasma optical emission spectrometry (ICP-OES) after the filtrates were digested at 180 °C with concentrated HNO3 and HCl (1:3, v/v).

Amplification and bar-coded pyrosequencing of bacterial and archaeal 16S rRNA genes

PCR amplification, purification, pooling and pyrosequencing of a region of the 16S rRNA gene were performed following the procedure described by Fierer et al. (2008). The primer set F515 (5’-GTGCCAGCMGCCGCGGTAA-3’, with an 8-bp error-correcting tag (Hamady et al., 2008)) and R806 (5’-GGACTACVSGGGTATCTAAT-3’) was used to amplify the V4 hypervariable region. Triplicate PCR reactions for each sample were amplified, pooled and purified. Finally, all PCR products were combined in approximately equimolar amounts and sequenced on a 454 GS FLX Titanium pyrosequencer.

Processing of pyrosequencing data

Raw data generated from the 454-pyrosequencing run were processed and analyzed following the pipelines of Mothur (Schloss et al., 2009) and QIIME (Caporaso et al., 2010). Pyrosequences were denoised using the ‘shhh.flows’ (a translation of the PyroNoise algorithm; Quince et al., 2009) and ‘pre.cluster’ (Huse et al., 2010) commands in Mothur. Chimeric sequences were identified and removed using UCHIME in de novo mode (Edgar et al., 2011). Quality sequences were subsequently assigned to samples according to their unique 8-bp barcodes and binned into phylotypes using the average clustering algorithm (Huse et al., 2010) at the 97% similarity level. Taxonomic classification of phylotypes was determined based on the Ribosomal Database Project at the 80% threshold (Wang et al., 2007). The relative abundance (%) of individual taxa within each community was estimated as the number of sequences assigned to a specific taxon divided by the total number of sequences obtained for that sample.
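As a minimal sketch of the relative-abundance calculation described above, the following Python function divides the number of sequences assigned to each taxon by the total number of sequences in the sample and expresses the result as a percentage. The function name and the example labels are hypothetical and are not part of the original pipeline.

```python
from collections import Counter


def relative_abundance(taxon_assignments):
    """Return {taxon: percentage of sequences} for a single sample.

    taxon_assignments holds one taxon label per quality-filtered sequence.
    """
    counts = Counter(taxon_assignments)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Hypothetical per-sequence labels for one sample (illustration only).
sample = ["Betaproteobacteria"] * 6 + ["Nitrospira"] * 3 + ["Firmicutes"]
abundance = relative_abundance(sample)
print(abundance)
```

Because each sample is normalized by its own sequence total, the percentages are comparable between samples of different sequencing depth.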

GeoChip analysis

The general pipeline of DNA labeling, GeoChip processing and data normalization was described previously (He et al., 2007). Specifically, to obtain sufficient genomic DNA for hybridization, whole-community genome amplification (WCGA) (TempliPhi Amplification kit, Amersham Biosciences, Piscataway, NJ) was conducted using approximately 1.0 ng of community DNA from each sample following the procedure of Wu et al. (2006). Notably, appropriate manipulation of community DNA is necessary when applying microarray-based genomic technology, especially to samples with very low microbial biomass such as AMD. Although a previous report showed that WCGA could produce significant biases in community composition (Bodelier et al., 2009), our previous experimental study indicated that the amplification procedure used here is representative and quantitative (Wu et al., 2006). Thus, any WCGA bias in this study may not have significantly affected the observed functional structure. Equal amounts of amplified DNA (1.0 μg) were then used for GeoChip 4.0 hybridization as previously described (Lu et al., 2012; Chan et al., 2013). Signal intensity was normalized by the average control dye across samples, and spots with a signal-to-noise ratio [SNR = (signal intensity - background)/standard deviation of background] greater than 2 were considered positive signals for further analysis (He et al., 2007).
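The SNR criterion can be illustrated with a short Python sketch. It applies only the formula and the threshold of 2 stated above; the dye-based normalization is not reproduced, and the field names and probe values are hypothetical.

```python
def snr(signal, background, background_sd):
    """SNR = (signal intensity - background) / standard deviation of background."""
    if background_sd <= 0:
        raise ValueError("background standard deviation must be positive")
    return (signal - background) / background_sd


def positive_spots(spots, threshold=2.0):
    """Return the ids of spots whose SNR is strictly greater than the threshold."""
    return [
        spot["id"]
        for spot in spots
        if snr(spot["signal"], spot["background"], spot["background_sd"]) > threshold
    ]


# Hypothetical spot measurements (illustration only).
spots = [
    {"id": "probe_A", "signal": 1500.0, "background": 300.0, "background_sd": 100.0},
    {"id": "probe_B", "signal": 450.0, "background": 300.0, "background_sd": 100.0},
    {"id": "probe_C", "signal": 500.0, "background": 300.0, "background_sd": 100.0},
]
kept = positive_spots(spots)
print(kept)
```

A spot whose SNR equals the threshold exactly (probe_C here) is not retained, since the criterion is "greater than 2".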

Prediction model of microbial assemblages and functional metabolic potentials

The modeling approach developed by Larsen et al. (2012) was applied to predict microbial assemblages and functional metabolic potentials. In this study, the dynamics of microbial community structure and the signal intensity of functional genes were modeled separately. Because our results indicated that the patterns of microbial community composition and functional gene structure were largely determined by environmental conditions, the prediction of microbial assemblages and functional metabolic potentials was based on the environmental properties. The biotic interactions between different microbial taxa, or between relevant genes involved in the same functional subcategory, were also incorporated into the modeling. Additionally, because other analyses in this study indicated a potential influence of relative microbial abundances on metabolic potentials, we constructed models of functional metabolic potentials both with and without these microbial interactions.

Environmental variables, relative abundances of dominant microbial lineages and/or the signal intensities of functional genes were merged into input matrices, and the relationships between the variables were estimated using Bayesian network inference with Java Objects (BANJO v2.2.0) (Smith et al., 2006; Larsen et al., 2012). The networks generated by the Bayesian network inference were directed acyclic graphs (DAGs), in which nodes were environmental parameters, microbial taxa or functional genes. The directed edges of these DAGs revealed the relationships between nodes: a change in the value of a child node has a significant conditional dependence on a change in the value of its parent node (Larsen et al., 2012). In this study, the maximum number of parents in BANJO was set to three, and simulated annealing with the AllLocalMoves proposer was used with randomly configured networks. The top-10 highest-scoring networks were subsequently used to generate the consensus network.
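The rule used to combine the top-10 networks into a consensus network is not detailed in this section. One simple possibility, shown here purely as an assumption, is an edge-frequency consensus that keeps a directed edge only if it appears in at least a chosen fraction of the top-scoring networks. The function names, the `min_fraction` parameter and the example edges are hypothetical, and the sketch does not reproduce BANJO's own procedure.

```python
from collections import Counter


def consensus_edges(networks, min_fraction=1.0):
    """Keep directed (parent, child) edges present in >= min_fraction of networks.

    networks is a list of edge sets, one per top-scoring DAG.
    """
    if not networks:
        return set()
    counts = Counter(edge for net in networks for edge in set(net))
    needed = min_fraction * len(networks)
    return {edge for edge, n in counts.items() if n >= needed}


def parents_of(edges):
    """Map each child node to the set of its parent nodes."""
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    return parents


# Three hypothetical top-scoring networks (illustration only).
top_networks = [
    {("pH", "Beta"), ("Fe2+", "Nitro"), ("Beta", "Gamma")},
    {("pH", "Beta"), ("Fe2+", "Nitro")},
    {("pH", "Beta"), ("Fe2+", "Nitro"), ("Cu", "Eury")},
]
consensus = consensus_edges(top_networks)
print(parents_of(consensus))
```

With the default `min_fraction=1.0`, only edges shared by every network survive; lowering the fraction gives a more permissive consensus.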

The relationships revealed by the consensus network can be expressed as a set of formulas such that the value of every node is a function of the values of its parent nodes. These functions were derived using Eureqa v 0.99.9 beta software (Schmidt and Lipson, 2009). Constants, addition, subtraction, multiplication and division were the operations permitted in the equations. In the formula search, data from 30 randomly selected samples were used for training, while the remaining 10 samples were used for validation (see below). The best-fitting equations were searched for 2 CPU hours; when a node had more than one parent, not all parents were necessarily incorporated into the equation that best fit the observed data. All candidate solutions were ranked according to Pearson’s correlation coefficient. The final equation selected for prediction was defined by the following optimality criteria: it best fitted an obvious peak or drop in the observed data; it had the highest correlation with the observed data; it included more function parameters; and it had the fewest terms (Larsen et al., 2012). After the final formula had been generated and selected using the 30 training samples, the data from the remaining 10 samples were used to validate it. Additionally, since only a few taxa are consistently of high relative abundance and many taxa are consistently of low relative abundance, it is possible to obtain deceptively high correlations between predicted and observed values so long as the model correctly identifies the small number of high-abundance taxa (Larsen et al., 2015). Therefore, two null models were used to test whether the predictive model correlated better with the biological observations than the null models did: i) setting every taxon's predicted relative abundance/metabolic potential equal to the average abundance/metabolic potential across all samples; and ii) setting every taxon's abundance/metabolic potential equal to the minimum observed value across all samples (Larsen et al., 2015).
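The comparison against the two null models can be sketched in Python as follows. This is one reading of the description, under stated assumptions: each null model is applied to a single taxon (or gene) across samples, agreement is scored with a plain Bray-Curtis similarity, and the permutation-based estimation mentioned in the table footnotes is omitted. All function names and values are hypothetical.

```python
def bray_curtis_similarity(observed, predicted):
    """1 - sum|o - p| / sum(o + p) for two equal-length non-negative vectors."""
    if len(observed) != len(predicted):
        raise ValueError("vectors must have the same length")
    total = sum(o + p for o, p in zip(observed, predicted))
    if total == 0:
        return 1.0
    diff = sum(abs(o - p) for o, p in zip(observed, predicted))
    return 1.0 - diff / total


def null_mean(observed):
    """Null model i: predict the mean observed value for every sample."""
    mean = sum(observed) / len(observed)
    return [mean] * len(observed)


def null_minimum(observed):
    """Null model ii: predict the minimum observed value for every sample."""
    return [min(observed)] * len(observed)


# Hypothetical relative abundances (%) of one taxon in four validation samples.
observed = [10.0, 30.0, 20.0, 40.0]
model_prediction = [12.0, 28.0, 22.0, 38.0]

sim_model = bray_curtis_similarity(observed, model_prediction)
sim_mean = bray_curtis_similarity(observed, null_mean(observed))
sim_min = bray_curtis_similarity(observed, null_minimum(observed))
print(sim_model, sim_mean, sim_min)
```

A predictive model is informative in this sense only if its similarity to the observations exceeds that of both null models.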

References

Bodelier PLE, Kamst M, Meima-Franke M, Stralis-Pavese N, Bodrossy L. (2009). Whole-community genome amplification (WCGA) leads to compositional bias in methane-oxidizing communities as assessed by pmoA-based microarray analyses and QPCR. Environ Microbiol Rep 1: 434-441.

Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK et al. (2010). QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7: 335-336.

Chan Y, Van Nostrand JD, Zhou J, Pointing SB, Farrell RL. (2013). Functional ecology of an Antarctic Dry Valley. Proc Natl Acad Sci USA 110: 8990-8995.

Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. (2011). UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27: 2194-2200.

Fierer N, Hamady M, Lauber CL, Knight R. (2008). The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proc Natl Acad Sci USA 105: 17994-17999.

Frias-Lopez J, Shi Y, Tyson GW, Coleman ML, Schuster SC, Chisholm SW et al. (2008). Microbial community gene expression in ocean surface waters. Proc Natl Acad Sci USA 105: 3805-3810.

Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R. (2008). Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat Methods 5: 235-237.

He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC et al. (2007). GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1: 67-77.

Huse SM, Welch DM, Morrison HG, Sogin ML. (2010). Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol 12: 1889-1898.

Kuang JL, Huang LN, Chen LX, Hua ZS, Li SJ, Hu M et al. (2013). Contemporary environmental variation determines microbial diversity patterns in acid mine drainage. ISME J 7: 1038-1050.

Larsen PE, Dai Y, Collart FR. (2015). Predicting bacterial community assemblages using an artificial neural network approach. Meth Mol Bio 1260: 33-43.

Larsen PE, Field D, Gilbert JA. (2012). Predicting bacterial community assemblages using an artificial neural network approach. Nat Methods 9: 621-625.

Lu Z, Deng Y, Van Nostrand JD, He Z, Voordeckers J, Zhou A et al. (2012). Microbial gene functions enriched in the Deepwater Horizon deep-sea oil plume. ISME J 6: 451-460.

Quince C, Lanzen A, Curtis TP, Davenport RJ, Hall N, Head IM et al. (2009). Noise and the accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods 6: 639-641.

Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75: 7537-7541.

Schmidt M, Lipson H. (2009). Distilling free-form natural laws from experimental data. Science 324: 81-85.

Smith VA, Yu J, Smulders TV, Hartemink AJ, Jarvis ED. (2006). Computational inference of neural information flow networks. PLoS Comput Biol 2: e161.

Wang Q, Garrity GM, Tiedje JM, Cole JR. (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73: 5261-5267.

Wu L, Liu X, Schadt CW, Zhou J. (2006). Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72: 4931-4941.

Table S1. Functional genes selected for the statistical analyses in this study.

Category Subcategory Gene Abbreviations

Carbon cycling Carbon fixation

aclB aclB

CODH CODH

Pcc Pcc

RubisCo RubisCo

Nitrogen cycling

Nitrogen fixation nifH nifH

Ammonification

(mineralization)

gdh gdh

ureC ureC

Nitrification amoA amoA

Denitrification

narG narG

nirK nirK

nirS nirS

norB norB

nosZ nosZ

Assimilatory N reduction

nasA nasA

NiR NiR

nirA nirA

nirB nirB

Dissimilatory N reduction napA napA

nrfA nrfA

Phosphorus Phosphorus utilization

phytase phytase

ppk ppk

ppx ppx

Sulfur cycling

Adenylylsulfate reductase aprA aprA

aprB aprB

Sulfite reductase dsrA dsrA

dsrB dsrB

Sulfur oxidation sox sox

Metal resistance

silA silA

silC silC

silP silP

Al al al

aoxB aoxB

arsA arsA

arsB arsB

arsC arsC

arsM arsM

Cd cadA cadA

cadBD cadBD

Cd_Co_Zn

czcA czcA

czcC czcC

czcD czcD

Co corC corC

Cr chrA chrA

copA copA

cueO cueO

cusA cusA

mer mer

merB merB

merP merP

Ni nreB nreB

Pb pbrA pbrA

pbrT pbrT

tehB tehB

terC terC

terD terD

terZ terZ

Zn zitB zitB

zntA zntA

Energy process

Electron transport

Fe-S cluster binding protein fes

ferredoxin fer

ferredoxin oxidoreductase fero

NADH ubiquinone oxidoreductase NADH

terminal quinol oxidase quio

Cytochrome cytochrome cyt

Hydrogenase hydrogenase hyd

Ni-Fe hydrogenase Nfhyd

Table S1. Functional genes selected for the statistical analyses in this study (continued).

Category Subcategory Gene Abbreviations

Membrane transport EPS glycosyl transferase glyt

other category ABC transporter ABCt

Stress response

Cold cspA cspA

cspB cspB

dnaK dnaK

groEL groEL

groES groES

grpE grpE

hrcA hrcA

Glucose limitation bglH bglH

bglP bglP

Nitrogen limitation glnA glnA

glnR glnR

Oxygen limitation

arcA arcA

arcB arcB

cydA cydA

cydB cydB

narH narH

narI narI

narJ narJ

Oxygen stress

ahpC ahpC

ahpF ahpF

fnr fnr

katA katA

katE katE

oxyR oxyR

perR perR

Osmotic stress proV proV

proX proX

Phosphate limitation

phoA phoA

phoB phoB

pstA pstA

pstB pstB

pstC pstC

pstS pstS

Protein stress clpC clpC

ctsR ctsR

Radiation stress obgE obgE

Antibiotic resistance

Transporter

ABC antibiotic transporter ABCat

MatE antibiotics MatE

MFS antibiotics MFS

SMR antibiotics SMR

Mex Mex

Beta-lactamases

beta-lactamase lac

class A beta-lactamase lacA

class C beta-lactamase lacC

other category Tet Tet

Van Van

Table S2. Site locations and environmental conditions of acid mine drainage (AMD) samples.

Sample ID Location Mining area Latitude Longitude (E) pH EC DO TOC SO42- Fe3+ Fe2+ Al As Cd Cu Pb Zn

NS Maanshan, Anhui AHMAS 31.64 118.62 4.1 3224 2.2 6.0 1319 1 0 0 0.00 0.00 0.5 0.00 0

JGS1 Tongling, Anhui AHTL 30.90 117.83 2.0 20000 0.9 67.0 7530 29283 589 2531 136.51 6.03 1028.0 1.60 1834

JGS2 Tongling, Anhui AHTL 30.90 117.83 2.2 16259 1.1 19.0 7443 3570 6 1891 64.86 7.22 699.0 0.92 1469

XSC1 Tongling, Anhui AHTL 30.91 117.89 2.9 2908 1.4 2.2 712 42 2 9 0.00 0.04 2.9 0.83 47

XSC3 Tongling, Anhui AHTL 30.90 117.90 2.9 4342 2.1 6.8 2852 219 10 90 0.00 0.00 12.0 0.14 3

YSC1 Tongling, Anhui AHTL 30.90 117.90 2.3 5113 2.5 7.5 4579 721 35 174 14.90 0.00 19.0 0.12 41

YSC2 Tongling, Anhui AHTL 30.90 117.83 2.2 6794 0.4 13.0 5931 1664 25 157 62.11 0.43 52.0 0.22 97

ZJ1 Zijin, Fujian FJZJ 25.19 116.38 2.0 16770 4.6 12.0 6823 3183 32 1297 10.41 0.00 268.0 0.50 82

DBS1 Dabaoshan, Guangdong GDDBS 24.52 113.72 2.6 2850 5.2 2.5 3469 427 9 168 0.00 0.02 6.3 0.67 144

DBS3 Dabaoshan, Guangdong GDDBS 24.57 113.72 2.5 3610 5.0 2.7 4632 559 7 132 0.00 0.00 16.0 0.10 27

FK1 Fankou, Guangdong GDFK 25.05 113.66 1.9 5890 4.8 10.0 6173 2541 252 53 0.74 0.32 0.0 0.26 427

YF1 Yunfu, Guangdong GDYF 22.97 112.01 2.4 2290 4.0 6.3 2785 281 147 114 0.00 0.00 0.0 0.33 0

Table S2. Site locations and environmental conditions of acid mine drainage (AMD) samples (continued).

Sample ID Location Mining area Latitude Longitude (E) pH EC DO TOC SO42- Fe3+ Fe2+ Al As Cd Cu Pb Zn

DC1 Dachang, Guangxi GXDC 24.86 107.58 2.7 3820 1.3 1.9 4031 890 145 0 0.00 0.00 0.0 0.12 38

PD1 Puding, Guizhou GZPD 26.58 105.72 3.0 3300 2.2 ND 3062 265 5 81 0.00 0.00 0.8 0.06 127

PD3 Puding, Guizhou GZPD 26.48 105.89 2.5 3510 1.9 14.0 3600 499 4 111 0.00 0.00 1.2 0.06 16

SL Shilu, Hainan HNSL 19.24 109.04 2.8 3155 1.2 3.3 699 150 9 8 0.00 0.00 6.7 0.11 0

DX1 Dexing, Jiangxi JXDX 29.01 117.73 2.0 3690 1.3 14.0 2766 506 5 124 0.00 0.00 19.0 0.05 0

YP1 Yongping, Jiangxi JXYP 28.21 117.77 2.7 4430 1.3 15.0 4685 91 19 321 0.00 0.00 25.0 0.17 16

All values are in mg L-1, except pH, Latitude, Longitude (in standard units) and EC (in μS cm-1).

EC: electrical conductivity. DO: dissolved oxygen. TOC: total organic carbon.

ND, not determined.

Table S3. Relative abundance (%) of dominant lineages across acid mine drainage (AMD) microbial communities.

Sample ID Euryarchaeota Alphaproteobacteria Betaproteobacteria Gammaproteobacteria Nitrospira Firmicutes Actinobacteria Acidobacteria Others Unclassified

NS 0.48 12.93 77.76 2.52 2.38 0.48 0.34 0.00 3.06 0.07

JGS1 0.60 0.65 0.05 41.69 47.56 6.77 0.15 0.05 2.49 0.00

JGS2 0.82 9.45 1.45 82.17 3.72 0.25 0.19 0.06 1.89 0.00

XSC1 0.01 1.50 95.96 2.26 0.21 0.00 0.02 0.00 0.04 0.00

XSC3 0.13 0.67 87.16 10.95 0.45 0.24 0.07 0.02 0.31 0.00

YSC1 10.15 43.18 3.88 7.80 27.35 0.69 0.03 0.75 4.54 1.63

YSC2 17.55 13.80 0.07 19.18 39.63 5.24 0.07 0.28 2.48 1.70

ZJ1 3.82 2.32 0.00 20.11 10.62 4.64 0.31 0.00 57.97 0.21

ZJ2 0.16 10.40 21.33 18.39 5.28 6.13 6.59 8.92 22.73 0.08

ZJ3 0.00 14.78 33.29 11.55 0.37 8.32 2.36 3.98 25.34 0.00

ZJ8 0.00 6.09 24.54 23.62 2.03 10.70 1.85 1.11 30.07 0.00

DBS1 0.21 3.22 17.37 70.73 1.32 0.13 0.08 0.81 6.10 0.02

DBS3 0.16 18.66 74.29 4.38 0.90 0.09 0.03 1.15 0.31 0.03

FK1 32.33 0.42 0.07 18.12 27.99 0.42 0.42 0.42 10.15 9.66

YF1 2.36 0.26 83.28 5.18 2.56 0.52 0.26 0.07 5.44 0.07

YF2 4.57 0.00 60.62 8.15 12.64 1.52 0.08 0.46 11.88 0.08

YF3 11.10 0.00 0.40 41.27 18.37 1.61 0.10 0.00 27.04 0.10

YF4 1.23 2.03 54.34 39.51 0.86 0.31 0.00 0.06 1.60 0.06

YF5 0.97 2.59 57.20 37.30 0.57 0.24 0.00 0.00 1.05 0.08

YF7 8.21 1.79 67.68 8.41 4.17 3.11 0.46 0.66 5.17 0.33

YF8 0.06 26.67 67.36 1.35 3.43 0.06 0.00 0.96 0.11 0.00

Table S3. Relative abundance (%) of dominant lineages across acid mine drainage (AMD) microbial communities (continued).

Sample ID Euryarchaeota Alphaproteobacteria Betaproteobacteria Gammaproteobacteria Nitrospira Firmicutes Actinobacteria Acidobacteria Others Unclassified

DC1 3.76 10.94 35.21 12.31 2.22 2.05 1.71 1.20 30.26 0.34

DC2 5.40 2.61 42.51 2.96 0.17 0.35 0.35 0.70 43.73 1.22

DC3 0.09 15.82 76.14 1.07 0.09 0.00 0.00 0.09 6.70 0.00

DC5 0.41 0.34 94.87 1.98 0.07 0.68 0.00 0.07 1.44 0.14

DC7 2.63 2.21 60.28 4.80 4.32 19.34 0.28 0.66 5.42 0.07

DC8 1.49 5.60 76.87 7.20 0.96 0.36 0.16 0.13 7.01 0.23

PD1 17.72 1.15 30.13 28.84 6.17 1.58 0.50 0.93 12.12 0.86

PD3 0.59 1.51 86.69 2.43 1.44 3.21 0.33 1.77 1.97 0.07

PD4 16.65 0.35 62.60 2.01 2.27 1.31 0.35 1.05 13.34 0.09

PD7 24.59 1.66 20.99 12.15 20.03 6.35 0.28 1.66 11.46 0.83

SL 3.06 0.72 4.69 46.84 28.40 0.98 1.63 0.07 13.55 0.07

DX1 4.00 7.21 35.19 10.28 13.28 2.43 2.93 7.49 16.35 0.86

DX2 21.78 0.34 1.24 3.84 11.51 17.72 19.19 0.34 20.43 3.61

DX3 11.68 3.09 0.20 23.65 22.65 16.27 7.09 0.30 14.37 0.70

YP1 0.47 6.72 54.10 25.69 1.28 1.34 2.82 4.34 3.19 0.03

YP2 6.61 2.15 17.12 39.19 27.97 1.78 0.71 0.45 3.57 0.45

YP3 1.76 4.11 19.31 3.44 68.50 2.35 0.08 0.04 0.38 0.04

YP4 4.09 4.04 11.96 18.86 46.92 6.70 0.79 2.51 3.79 0.34

YP5 8.08 1.66 21.62 21.78 29.91 2.78 1.44 1.07 10.43 1.23

All phylotypes were classified at the phylum level (subphylum for the Proteobacteria).

Others include 12 phyla: Bacteroidetes, Chlamydiae, Chloroflexi, Crenarchaeota, Cyanobacteria, Deinococcus-Thermus, Gemmatimonadetes, OD1, OP11,

Planctomycetes, TM7, Verrucomicrobia; and two subphyla for Proteobacteria: Deltaproteobacteria and Epsilonproteobacteria.

Table S4. Summary of statistics (R2) from dissimilarity test (Adonis) between two mining areas on the functional community structure.

                AHTL       JXDX       JXYP       FJZJ       GDYF       GXDC       GZPD
Longitude (E)   118        117        117        116        112        108        106
Latitude (N)    31         29         28         25         23         25         26
Distancea       0.01-6.85  0.04-0.60  0.01-1.83  1.09-2.30  0.02-2.38  0.01-4.46  0.45-21.08
No. of samples  6          3          5          4          7          6          4
AHTL                       0.155      0.079      0.119      0.186*     0.089      0.080
JXDX                                  0.195      0.281      0.222      0.105      0.281*
JXYP                                             0.117      0.159      0.095      0.114
FJZJ                                                        0.206      0.187      0.119
GDYF                                                                   0.233*     0.242*
GXDC                                                                              0.108

Samples in a mining area were clustered into a group and compared with others based on Bray-Curtis dissimilarity of the log-transformed signal intensity of the GeoChip data using Adonis (*, P < 0.05). Mining areas with fewer than 3 samples were excluded from this analysis (i.e., a total of 5 samples in 4 mining areas were excluded).

a The range of distance (km) between two samples within the mining area.
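The R2 statistic reported by an Adonis-type test can be sketched in pure Python as the share of the total sum of squared Bray-Curtis dissimilarities that is not within-group, with significance assessed by permuting the group labels. This is a simplified illustration and not the software used in the study: `log(x + 1)` stands in for the unspecified log transformation, and the function names, the number of permutations and the example profiles are hypothetical.

```python
import math
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative profiles."""
    total = sum(a + b for a, b in zip(x, y))
    if total == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(x, y)) / total


def permanova_r2(dist, labels):
    """R2 = 1 - SS_within / SS_total, computed from a square distance matrix."""
    n = len(labels)
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    if ss_total == 0:
        return 0.0
    ss_within = 0.0
    for group in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == group]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    return 1.0 - ss_within / ss_total


def adonis_pair(profiles, labels, permutations=199, seed=0):
    """Return (R2, permutation P value) for samples grouped by mining area."""
    # Assumption: log(x + 1) stands in for the unspecified log transformation.
    logged = [[math.log(v + 1.0) for v in row] for row in profiles]
    n = len(logged)
    dist = [[bray_curtis(logged[i], logged[j]) for j in range(n)] for i in range(n)]
    observed = permanova_r2(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        # With a fixed number of groups, ranking by R2 matches ranking by pseudo-F.
        if permanova_r2(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical signal intensities for three samples from each of two areas.
profiles = [
    [100, 10, 5], [110, 12, 4], [95, 9, 6],
    [5, 90, 100], [6, 100, 110], [4, 95, 90],
]
labels = ["A", "A", "A", "B", "B", "B"]
r2, p_value = adonis_pair(profiles, labels)
print(round(r2, 3), p_value)
```

With only three samples per group there are few distinct label arrangements, so the permutation P value cannot become very small even when R2 is high.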

Table S5. Environmental and taxonomic variable loadings on the PCs across the AMD samples.

Environmental properties (Abbr.)  PCEnv1  PCEnv2 (E2)  Microbial taxa (Abbr.)       PCTaxa1  PCTaxa2  PCTaxa3  PCTaxa4
exp.*                             0.522   0.203        exp.                         0.280    0.169    0.141    0.118
                                  0.636   -0.001       Euryarchaeota (Eury)         -1.050   0.251    -0.691   0.062
Dissolved Oxygen (DO)             0.103   0.148        Acidobacteria (Acido)        -0.088   -0.769   0.368    -0.647
Total Organic Carbon (TOC)        -0.397  -0.175       Actinobacteria (Actino)      -0.842   -0.812   0.019    0.081
Electrical Conductivity (EC)      -0.503  -0.040       Firmicutes (Firm)            -0.808   -0.690   0.180    0.184
Sulfate (SO42-)                   -0.592  0.118        Nitrospira (Nitro)           -0.712   0.711    0.133    -0.420
Ferric ion (Fe3+)                 -1.519  0.387        Alphaproteobacteria (Alpha)  0.262    -0.074   -0.083   -1.214
Ferrous ion (Fe2+)                -1.174  0.983        Betaproteobacteria (Beta)    1.151    -0.370   -0.620   0.298
Aluminum (Al)                     -1.047  -0.152       Gammaproteobacteria (Gamma)  -0.222   0.708    0.980    0.230
Copper (Cu)                       -0.669  -1.246
Zinc (Zn)                         -1.032  -0.342
Arsenic (As)                      -0.615  -0.348
Cadmium (Cd)                      -0.194  -0.205
Lead (Pb)                         -0.376  -0.278
Phosphorus (P)                    -0.100  -0.024

Variables in bold show the dominant influence (top-50%) on each PC.

* Proportion explained.

Table S6. Multiple linear regression (MLR) of environmental variables and relative abundance of dominant microbial lineages on metabolic potential of functional genes.

Category Subcategory Gene PCsa AIC Best modelb Environmental properties (pH, Fe3+, Fe2+, Al, Cu, Zn) Microbial taxa (Eury, Acido, Nitro, Alpha, Beta, Gamma)

Nitrogen cycling Denitrification narG E1 112.51 pH + Al + Cu -0.470 -0.452

Nitrogen cycling Denitrification nirK E1 112.06 Cu 0.386

Nitrogen cycling Assimilatory N reduction nirB E1 112.46 Fe2+ + Fe3+ 0.666 -0.571

Sulfur cycling Sulfite reductase dsrB E1 107.26 pH + Cu -0.589

Energy process Electron transport Fe-S cluster binding protein E1 114.04 pH -0.325

Energy process Electron transport ferredoxin E1 114.04 Fe2+ + pH -0.365 0.447

Energy process Electron transport NADH ubiquinone oxidoreductase E1 113.82 pH -0.331

Energy process Electron transport terminal quinol oxidase E1 114.34 pH -0.314

Energy process Hydrogenase hydrogenase E1 113.57 pH -0.340

Metal resistance As arsB E1 113.17 pH -0.353

Metal resistance As arsM E1 113.80 pH -0.333

Metal resistance Cd cadA E1 112.78 pH -0.365

Metal resistance Cr chrA E1 112.66 pH -0.369

Metal resistance Cu copA E1 114.03 pH + Al -0.529

Metal resistance Te terC E1 114.33 Cu

Metal resistance Te terD E1 110.58 Zn + Fe2+ + pH + Al -0.553 -0.368 0.326

Stress response Heat dnaK E1 113.66 Cu

Stress response Nitrogen limitation glnA E1 113.37 pH + Fe2+ + Zn -0.398

Stress response Oxygen stress fnr E1 107.60 pH -0.488

Stress response Oxygen stress oxyR E1 114.20 pH -0.320

Stress response Protein stress clpC E1 112.74 Al + pH 0.415

Antibiotic resistance Transporter MatE antibiotics E1 115.38 pH + Zn -0.352

Antibiotic resistance Transporter SMR antibiotics E1 113.91 pH -0.329

Nitrogen cycling Denitrification norB T1 112.60 Eury + Firm 0.316

Nitrogen cycling Dissimilatory N reduction nrfA T1 113.90 Eury

Stress response Heat hrcA T1 114.24 Eury -0.318

Stress response Nitrogen limitation glnR T1 111.51 Eury

Stress response Phosphate limitation pstC T1 102.82 Eury 0.570

Antibiotic resistance Transporter ABC antibiotic transporter T1 113.35 Beta -0.348

Nitrogen cycling Assimilatory N reduction nirA T3 113.81 Gamma -0.333

Phosphorus Phosphorus utilization ppk T3 112.60 Gamma + Eury -0.303

Sulfur cycling Sulfur oxidation sox T3 113.18 Gamma 0.353

Metal resistance Ag silP T3 109.03 Eury + Beta

Metal resistance Cd cadBD T3 111.11 Gamma + Beta 0.366 0.530

Metal resistance Ni nreB T3 111.07 Gamma -0.412

Stress response Heat groES T3 106.08 Gamma + Acido + Eury -0.309 -0.405

Stress response Glucose limitation bglH T3 113.75 Gamma + Beta

Antibiotic resistance Transporter Mex T3 112.56 Gamma + Eury 0.318

Nitrogen cycling Ammonification gdh T4 112.59 Nitro + Acido 0.297 0.333

Energy process Electron transport ferredoxin oxidoreductase T4 114.35 Nitro 0.314

Metal resistance Hg mer T4 113.13 Alpha

Stress response Oxygen limitation cydB T4 114.32 Alpha 0.315

Antibiotic resistance other category Van T4 111.90 Beta + Alpha -0.304 0.307

a The PCs most important to the metabolic potential of each functional gene, as determined by the ABT model; the variables with dominant influence based on the PC loadings were selected as input to the multiple linear regression (MLR) models.

b The best model is based on the AIC value.

Only significant estimates (P < 0.05) for the best model obtained with the stepwise method are reported, to show the environmental properties and dominant taxa most important to the metabolic potential of each functional gene.

Table S7. Validation of predictive models for relative abundances of dominant microbial taxa (Phylum level, mean relative abundance > 1%) based on the artificial neural network (ANN).

Taxa Occurrenceb Bray-Curtis similaritya (Predicted vs Observed; Null model (Mean)c; Null model (Minimum)d)

Euryarchaeota 95.0% 0.685

Acidobacteria 87.5% 0.709

Actinobacteria 87.5% 0.694

Firmicutes 95.0% 0.725

Nitrospira 100.0% 0.757

Alphaproteobacteria 95.0% 0.657

Betaproteobacteria 97.5% 0.783

Gammaproteobacteria 100.0% 0.801 0.620 0.109

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with predicted relative microbial abundances.

b The occurrence shows the percentage of the total samples where the given taxa were detected.

c This null model is to set all taxa’s predicted relative abundance equal to the average taxa abundance across all samples.

d This null model is to set all taxa abundances equal to the minimum observed taxa’s abundance.

Table S8. Validation of predictive models for relative abundances of dominant microbial taxa (Order level, mean relative abundance > 0.1%) based on the artificial neural network (ANN).

Taxa Occurrenceb

Predicted vs Observed Null model (Mean)c Null model (Minimum)d

Nitrospirales 100.0% 0.735

Acidithiobacillales 97.5% 0.606

Thermoplasmatales 95.0% 0.493

Rhodospirillales 95.0% 0.456

Ferrovales 90.0% 0.717

Acidobacteria_Gp1 85.0% 0.709

Bacillales 85.0% 0.536

Burkholderiales 85.0% 0.813

Clostridiales 82.5% 0.782

Xanthomonadales 82.5% 0.630

Legionellales 65.0% 0.546

Rhizobiales 60.0% 0.797

Acidimicrobiales 52.5% 0.737

Pseudomonadales 52.5% 0.376

Chlamydiales 47.5% 0.739

Actinomycetales 45.0% 0.663

Sphingomonadales 42.5% 0.828

Rhodocyclales 42.5% 0.746

Chloroplast 32.5% 0.513

Hydrogenophilales 30.0% 0.906

Desulfuromonadales 30.0% 0.735

Sphingobacteriales 27.5% 0.868

Gemmatimonadales 22.5% 0.485

Planctomycetales 22.5% 0.701

Caulobacterales 22.5% 0.159

Desulfobacterales 22.5% 0.488

Holophagales 20.0% 0.806

Campylobacterales 17.5% 0.974

Opitutales 17.5% 0.671

Bacteroidales 15.0% 0.800

Enterobacteriales 15.0% 0.870

Acidobacteria_Gp16 12.5% 0.857

Neisseriales 12.5% 0.452

Rhodobacterales 7.5% 0.905

Aeromonadales 5.0% 0.975 0.174 0.039

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with predicted relative microbial abundances.

b The occurrence shows the percentage of the total samples where the given taxa were detected.

Table S9. Validation of predictive models for relative abundances of key microbial taxa (OTU level, observed in at least half of the total samples) based on the artificial neural network (ANN).

Taxa  Occurrence^b  Bray-Curtis similarity^a: Predicted vs Observed  Null model (Mean)^c  Null model (Minimum)^d

OTU2197 97.5% 0.688

OTU1 90.0% 0.569

OTU3 90.0% 0.526

OTU2196 80.0% 0.694

OTU5 77.5% 0.529

OTU0 75.0% 0.585

OTU10 57.5% 0.208

OTU12 57.5% 0.794

OTU4 55.0% 0.805

OTU2 52.5% 0.463

OTU11 52.5% 0.702

OTU17 52.5% 0.551

OTU21 52.5% 0.564

OTU26 52.5% 0.595 0.410 0.004

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with predicted relative microbial abundances.

b The occurrence shows the percentage of the total samples where the given taxa were detected.

Table S10a. Validation of predictive models for metabolic potentials (original signals) of key functional genes based on the artificial neural network (ANN).

Genes  Abbr.  Bray-Curtis similarity^a (original signals), in four column groups:
  ENV^b, Occurrence^d (> 50%): Predicted vs Observed, Null model (Mean)^e, Null model (Minimum)^f
  ENV^b, Occurrence (100%): Predicted vs Observed, Null model (Mean), Null model (Minimum)
  TAXA^c, Occurrence (> 50%): Predicted vs Observed, Null model (Mean), Null model (Minimum)
  TAXA^c, Occurrence (100%): Predicted vs Observed, Null model (Mean), Null model (Minimum)

aclB aclB 0.940 0.299 0.141 0.930 0.345 0.076 0.950 0.299 0.141 0.930 0.345 0.076

CODH CODH 0.969 0.317 0.263 0.954 0.429 0.237 0.975 0.317 0.263 0.954 0.429 0.237

Pcc Pcc 0.974 0.294 0.193 0.953 0.357 0.119 0.970 0.294 0.193 0.953 0.357 0.119

RubisCo RubisCo 0.961 0.290 0.215 0.959 0.353 0.195 0.965 0.290 0.215 0.959 0.353 0.195

nifH nifH 0.931 0.297 0.239 0.949 0.347 0.206 0.965 0.297 0.239 0.958 0.347 0.206

gdh gdh 0.916 0.262 0.172 0.905 0.348 0.178 0.941 0.262 0.172 0.905 0.348 0.178

ureC ureC 0.924 0.275 0.222 0.949 0.394 0.157 0.956 0.275 0.222 0.949 0.394 0.157

amoA amoA 0.962 0.326 0.196 0.954 0.372 0.187 0.976 0.326 0.196 0.958 0.372 0.187

narG narG 0.971 0.312 0.274 0.961 0.362 0.210 0.982 0.312 0.274 0.961 0.362 0.210

nirK nirK 0.965 0.289 0.235 0.950 0.339 0.202 0.971 0.289 0.235 0.950 0.339 0.202

nirS nirS 0.946 0.285 0.181 0.941 0.335 0.107 0.940 0.285 0.181 0.941 0.335 0.107

norB norB 0.929 0.249 0.139 0.920 0.299 0.065 0.943 0.249 0.139 0.917 0.299 0.065

nosZ nosZ 0.930 0.282 0.151 0.948 0.332 0.077 0.934 0.282 0.151 0.948 0.332 0.077

nasA nasA 0.930 0.243 0.143 0.915 0.293 0.110 0.913 0.243 0.143 0.925 0.293 0.110

NiR NiR 0.918 0.283 0.198 0.954 0.333 0.165 0.946 0.283 0.198 0.956 0.333 0.165

nirA nirA 0.900 0.235 0.033 0.914 0.285 0.000 0.914 0.235 0.033 0.914 0.285 0.000

nirB nirB 0.866 0.200 0.016 0.857 0.250 0.017 0.876 0.200 0.016 0.857 0.250 0.017

napA napA 0.950 0.293 0.204 0.950 0.342 0.140 0.945 0.293 0.204 0.950 0.342 0.140

nrfA nrfA 0.929 0.257 0.159 0.919 0.320 0.085 0.931 0.257 0.159 0.919 0.320 0.085

phytase phytase 0.797 0.255 0.133 0.918 0.318 0.131 0.868 0.255 0.133 0.950 0.318 0.131

ppk ppk 0.912 0.267 0.151 0.930 0.329 0.150 0.974 0.267 0.151 0.930 0.329 0.150

ppx ppx 0.961 0.287 0.277 0.954 0.350 0.275 0.965 0.287 0.277 0.954 0.350 0.275

aprA aprA 0.950 0.318 0.175 0.944 0.365 0.166 0.964 0.318 0.175 0.958 0.365 0.166

aprB aprB 0.914 0.283 0.142 0.912 0.329 0.133 0.916 0.283 0.142 0.912 0.329 0.133

dsrA dsrA 0.966 0.321 0.270 0.964 0.408 0.191 0.970 0.321 0.270 0.964 0.408 0.191

dsrB dsrB 0.953 0.308 0.205 0.955 0.395 0.125 0.954 0.308 0.205 0.955 0.395 0.125

sox sox 0.944 0.270 0.177 0.936 0.390 0.157 0.949 0.270 0.177 0.936 0.390 0.157

Fe-S cluster binding protein fes 0.975 0.209 0.127 0.870 0.296 0.207 0.980 0.209 0.127 0.870 0.296 0.207

ferredoxin fer 0.847 0.173 0.014 0.902 0.260 0.094 0.907 0.173 0.014 0.902 0.260 0.094

ferredoxin oxidoreductase fero 0.908 0.188 0.029 0.872 0.275 0.109 0.925 0.188 0.029 0.872 0.275 0.109

NADH ubiquinone oxidoreductase NADH 0.973 0.173 0.055 0.889 0.223 0.119 0.989 0.173 0.055 0.889 0.223 0.119

terminal quinol oxidase quio 0.980 0.257 0.020 0.943 0.377 0.085 0.987 0.257 0.020 0.943 0.377 0.085

cytochrome cyt 0.902 0.311 0.193 0.946 0.423 0.173 0.912 0.311 0.193 0.946 0.423 0.173

hydrogenase hyd 0.918 0.268 0.025 0.919 0.355 0.032 0.931 0.268 0.025 0.917 0.355 0.032

Ni-Fe hydrogenase NFhyd 0.884 0.213 0.124 0.863 0.263 0.127 0.904 0.213 0.124 0.879 0.263 0.127

glycosyl transferase glyt 0.912 0.201 0.077 0.899 0.288 0.083 0.913 0.201 0.077 0.892 0.288 0.083

ABC transporter ABCt 0.974 0.272 0.003 0.918 0.319 0.006 0.974 0.272 0.003 0.938 0.319 0.006

silA silA 0.762 0.161 0.119 0.855 0.281 0.121 0.829 0.161 0.119 0.866 0.281 0.121

silC silC 0.926 0.233 0.020 0.932 0.353 0.010 0.948 0.233 0.020 0.932 0.353 0.010

silP silP 0.928 0.229 0.140 0.940 0.349 0.120 0.943 0.229 0.140 0.940 0.349 0.120

al al 0.933 0.281 0.066 0.912 0.328 0.057 0.928 0.281 0.066 0.912 0.328 0.057

aoxB aoxB 0.949 0.317 0.184 0.951 0.364 0.175 0.972 0.317 0.184 0.951 0.364 0.175

arsA arsA 0.797 0.203 0.122 0.845 0.249 0.123 0.831 0.203 0.122 0.875 0.249 0.123

arsB arsB 0.921 0.241 0.075 0.899 0.287 0.066 0.956 0.241 0.075 0.928 0.287 0.066

arsC arsC 0.973 0.334 0.273 0.968 0.380 0.264 0.986 0.334 0.273 0.971 0.380 0.264

arsM arsM 0.976 0.190 0.041 0.901 0.236 0.032 0.977 0.190 0.041 0.901 0.236 0.032

cadA cadA 0.952 0.305 0.213 0.943 0.417 0.187 0.974 0.305 0.213 0.943 0.417 0.187

cadBD cadBD 0.849 0.235 0.072 0.885 0.347 0.047 0.937 0.235 0.072 0.885 0.347 0.047

czcA czcA 0.970 0.323 0.190 0.949 0.435 0.111 0.981 0.323 0.190 0.949 0.435 0.111

czcC czcC 0.858 0.193 0.080 0.849 0.305 0.160 0.866 0.193 0.080 0.849 0.305 0.160

czcD czcD 0.965 0.335 0.272 0.969 0.447 0.193 0.966 0.335 0.272 0.974 0.447 0.193

corC corC 0.885 0.263 0.029 0.903 0.375 0.109 0.884 0.263 0.029 0.903 0.375 0.109

chrA chrA 0.967 0.334 0.261 0.967 0.446 0.235 0.969 0.334 0.261 0.966 0.446 0.235

copA copA 0.964 0.312 0.265 0.946 0.424 0.185 0.985 0.312 0.265 0.949 0.424 0.185

cueO cueO 0.745 0.316 0.255 0.755 0.128 0.044 0.799 0.316 0.255 0.755 0.128 0.044

cusA cusA 0.894 0.232 0.120 0.879 0.344 0.284 0.914 0.232 0.120 0.879 0.344 0.284

mer mer 0.925 0.283 0.241 0.940 0.370 0.177 0.932 0.283 0.241 0.940 0.370 0.177

merB merB 0.821 0.163 0.099 0.847 0.213 0.164 0.879 0.163 0.099 0.847 0.213 0.164

merP merP 0.886 0.192 0.093 0.904 0.242 0.029 0.907 0.192 0.093 0.904 0.242 0.029

nreB nreB 0.869 0.155 0.025 0.838 0.218 0.132 0.879 0.155 0.025 0.835 0.218 0.132

pbrA pbrA 0.918 0.233 0.180 0.918 0.296 0.254 0.918 0.233 0.180 0.925 0.296 0.254

pbrT pbrT 0.812 0.212 0.104 0.876 0.274 0.178 0.850 0.212 0.104 0.876 0.274 0.178

tehB tehB 0.796 0.265 0.249 0.972 0.385 0.229 0.870 0.265 0.249 0.972 0.385 0.229

terC terC 0.960 0.280 0.215 0.967 0.400 0.195 0.971 0.280 0.215 0.967 0.400 0.195

terD terD 0.963 0.275 0.216 0.953 0.395 0.152 0.985 0.275 0.216 0.951 0.395 0.152

terZ terZ 0.917 0.195 0.020 0.873 0.315 0.085 0.922 0.195 0.020 0.873 0.315 0.085

zitB zitB 0.914 0.218 0.048 0.921 0.337 0.017 0.961 0.218 0.048 0.921 0.337 0.017

zntA zntA 0.945 0.265 0.116 0.938 0.385 0.052 0.953 0.265 0.116 0.938 0.385 0.052

cspA cspA 0.891 0.260 0.151 0.904 0.372 0.231 0.917 0.260 0.151 0.904 0.372 0.231

cspB cspB 0.833 0.214 0.089 0.860 0.326 0.169 0.878 0.214 0.089 0.860 0.326 0.169

dnaK dnaK 0.940 0.293 0.194 0.946 0.380 0.114 0.947 0.293 0.194 0.946 0.380 0.114

groEL groEL 0.925 0.274 0.141 0.946 0.361 0.147 0.952 0.274 0.141 0.946 0.361 0.147

groES groES 0.927 0.256 0.050 0.911 0.343 0.043 0.962 0.256 0.050 0.929 0.343 0.043

grpE grpE 0.962 0.318 0.223 0.960 0.405 0.229 0.967 0.318 0.223 0.960 0.405 0.229

hrcA hrcA 0.956 0.320 0.249 0.963 0.407 0.255 0.964 0.320 0.249 0.964 0.407 0.255

bglH bglH 0.893 0.262 0.013 0.895 0.308 0.038 0.882 0.262 0.013 0.895 0.308 0.038

bglP bglP 0.905 0.256 0.081 0.905 0.302 0.055 0.935 0.256 0.081 0.905 0.302 0.055

glnA glnA 0.976 0.328 0.309 0.974 0.415 0.316 0.982 0.328 0.309 0.974 0.415 0.316

glnR glnR 0.911 0.236 0.057 0.886 0.323 0.064 0.950 0.236 0.057 0.919 0.323 0.064

arcA arcA 0.935 0.174 0.022 0.933 0.220 0.013 0.950 0.174 0.022 0.933 0.220 0.013

arcB arcB 0.905 0.259 0.122 0.917 0.306 0.226 0.918 0.259 0.122 0.917 0.306 0.226

cydA cydA 0.905 0.277 0.112 0.919 0.389 0.032 0.933 0.277 0.112 0.919 0.389 0.032

cydB cydB 0.947 0.273 0.020 0.906 0.385 0.060 0.955 0.273 0.020 0.906 0.385 0.060

narH narH 0.886 0.200 0.015 0.909 0.250 0.018 0.890 0.200 0.015 0.927 0.250 0.018

narI narI 0.816 0.297 0.265 0.967 0.347 0.232 0.975 0.297 0.265 0.967 0.347 0.232

narJ narJ 0.904 0.244 0.065 0.895 0.294 0.032 0.911 0.244 0.065 0.895 0.294 0.032

ahpC ahpC 0.973 0.333 0.235 0.972 0.379 0.226 0.976 0.333 0.235 0.972 0.379 0.226

ahpF ahpF 0.948 0.292 0.219 0.948 0.339 0.210 0.963 0.292 0.219 0.949 0.339 0.210

fnr fnr 0.978 0.315 0.247 0.977 0.402 0.167 0.985 0.315 0.247 0.977 0.402 0.167

katA katA 0.918 0.272 0.158 0.922 0.359 0.164 0.927 0.272 0.158 0.924 0.359 0.164

katE katE 0.967 0.313 0.249 0.962 0.400 0.185 0.966 0.313 0.249 0.962 0.400 0.185

oxyR oxyR 0.969 0.280 0.221 0.966 0.343 0.147 0.975 0.280 0.221 0.968 0.343 0.147

perR perR 0.824 0.140 0.030 0.819 0.202 0.130 0.825 0.140 0.030 0.819 0.202 0.130

proV proV 0.937 0.293 0.226 0.950 0.356 0.224 0.942 0.293 0.226 0.950 0.356 0.224

proX proX 0.909 0.227 0.111 0.908 0.289 0.109 0.911 0.227 0.111 0.908 0.289 0.109

phoA phoA 0.956 0.274 0.165 0.955 0.337 0.163 0.980 0.274 0.165 0.955 0.337 0.163

phoB phoB 0.953 0.286 0.227 0.962 0.349 0.225 0.959 0.286 0.227 0.962 0.349 0.225

pstA pstA 0.974 0.288 0.252 0.981 0.350 0.250 0.982 0.288 0.252 0.981 0.350 0.250

pstB pstB 0.976 0.306 0.244 0.964 0.369 0.242 0.981 0.306 0.244 0.964 0.369 0.242

pstC pstC 0.964 0.282 0.186 0.955 0.344 0.166 0.970 0.282 0.186 0.955 0.344 0.166

pstS pstS 0.894 0.243 0.139 0.932 0.306 0.119 0.896 0.243 0.139 0.932 0.306 0.119

clpC clpC 0.964 0.333 0.190 0.964 0.445 0.165 0.974 0.333 0.190 0.964 0.445 0.165

ctsR ctsR 0.901 0.277 0.087 0.923 0.389 0.007 0.912 0.277 0.087 0.923 0.389 0.007

obgE obgE 0.955 0.279 0.186 0.945 0.342 0.112 0.956 0.279 0.186 0.949 0.342 0.112

ABC antibiotic transporter ABCat 0.919 0.288 0.060 0.924 0.334 0.051 0.921 0.288 0.060 0.924 0.334 0.051

MatE antibiotics MatE 0.948 0.296 0.131 0.946 0.383 0.067 0.986 0.296 0.131 0.948 0.383 0.067

MFS antibiotics MFS 0.966 0.317 0.234 0.966 0.367 0.170 0.970 0.317 0.234 0.966 0.367 0.170

SMR antibiotics SMR 0.974 0.295 0.261 0.974 0.415 0.241 0.986 0.295 0.261 0.974 0.415 0.241

Mex Mex 0.815 0.148 0.037 0.811 0.198 0.043 0.867 0.148 0.037 0.881 0.198 0.043

beta-lactamase lac 0.895 0.276 0.007 0.907 0.323 0.019 0.907 0.276 0.007 0.915 0.323 0.019

class A beta-lactamase lacA 0.942 0.312 0.125 0.949 0.424 0.099 0.953 0.312 0.125 0.949 0.424 0.099

class C beta-lactamase lacC 0.920 0.313 0.224 0.944 0.425 0.199 0.921 0.313 0.224 0.950 0.425 0.199

Tet Tet 0.942 0.265 0.091 0.937 0.385 0.027 0.951 0.265 0.091 0.942 0.385 0.027

Van Van 0.806 0.106 0.017 0.771 0.225 0.134 0.835 0.106 0.017 0.814 0.225 0.134

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with the predicted metabolic potential of functional genes.

b Models were constructed without information on the microbial abundances of dominant phyla.

c Models were constructed with information on the microbial abundances of dominant phyla.

d The occurrence shows the percentage of the total samples in which the probes of a given gene were detected.

e This null model sets all predicted metabolic potentials equal to the average value across all samples.

f This null model sets all metabolic potentials equal to the minimum observed value.

Table S10b. Validation of predictive models for metabolic potentials (normalized data) of key functional genes based on the artificial neural network (ANN).

Genes  Abbr.  Bray-Curtis similarity^a (normalized data: values between 1 and 100 according to the formula listed in Materials and Methods), in four column groups:
  ENV^b, Occurrence^d (> 50%): Predicted vs Observed, Null model (Mean)^e, Null model (Minimum)^f
  ENV^b, Occurrence (100%): Predicted vs Observed, Null model (Mean), Null model (Minimum)
  TAXA^c, Occurrence (> 50%): Predicted vs Observed, Null model (Mean), Null model (Minimum)
  TAXA^c, Occurrence (100%): Predicted vs Observed, Null model (Mean), Null model (Minimum)

aclB aclB 0.755 0.104 0.054 0.818 0.175 0.055 0.810 0.104 0.054 0.818 0.175 0.055

CODH CODH 0.869 0.213 0.056 0.818 0.112 0.055 0.869 0.213 0.056 0.818 0.112 0.055

Pcc Pcc 0.791 0.170 0.042 0.863 0.214 0.044 0.791 0.170 0.042 0.863 0.214 0.044

RubisCo RubisCo 0.865 0.206 0.040 0.858 0.180 0.040 0.876 0.206 0.040 0.858 0.180 0.040

nifH nifH 0.786 0.175 0.050 0.800 0.139 0.051 0.786 0.175 0.050 0.836 0.139 0.051

gdh gdh 0.744 0.078 0.062 0.753 0.191 0.062 0.773 0.078 0.062 0.776 0.191 0.062

ureC ureC 0.859 0.246 0.059 0.814 0.155 0.058 0.859 0.246 0.059 0.814 0.155 0.058

amoA amoA 0.792 0.108 0.036 0.831 0.208 0.037 0.792 0.108 0.036 0.852 0.208 0.037

narG narG 0.839 0.187 0.041 0.819 0.145 0.040 0.839 0.187 0.041 0.819 0.145 0.040

nirK nirK 0.891 0.207 0.052 0.844 0.113 0.051 0.891 0.207 0.052 0.844 0.113 0.051

nirS nirS 0.863 0.253 0.043 0.824 0.164 0.042 0.873 0.253 0.043 0.824 0.164 0.042

norB norB 0.850 0.189 0.043 0.794 0.105 0.042 0.871 0.189 0.043 0.843 0.105 0.042

nosZ nosZ 0.849 0.134 0.041 0.867 0.179 0.041 0.849 0.134 0.041 0.876 0.179 0.041

nasA nasA 0.839 0.170 0.068 0.776 0.089 0.067 0.834 0.170 0.068 0.817 0.089 0.067

NiR NiR 0.703 0.153 0.064 0.857 0.144 0.067 0.726 0.153 0.064 0.868 0.144 0.067

nirA nirA 0.800 0.174 0.048 0.836 0.147 0.049 0.800 0.174 0.048 0.836 0.147 0.049

nirB nirB 0.777 0.112 0.047 0.766 0.091 0.047 0.777 0.112 0.047 0.766 0.091 0.047

napA napA 0.794 0.165 0.042 0.876 0.165 0.043 0.794 0.165 0.042 0.876 0.165 0.043

nrfA nrfA 0.814 0.208 0.047 0.779 0.137 0.046 0.814 0.208 0.047 0.779 0.137 0.046

phytase phytase 0.775 0.111 0.045 0.797 0.153 0.046 0.775 0.111 0.045 0.797 0.153 0.046

ppk ppk 0.842 0.201 0.046 0.826 0.167 0.046 0.842 0.201 0.046 0.826 0.167 0.046

ppx ppx 0.867 0.261 0.066 0.727 0.265 0.063 0.883 0.261 0.066 0.727 0.265 0.063

aprA aprA 0.858 0.203 0.036 0.839 0.201 0.036 0.858 0.203 0.036 0.875 0.201 0.036

aprB aprB 0.725 0.180 0.058 0.763 0.134 0.059 0.746 0.180 0.058 0.763 0.134 0.059

dsrA dsrA 0.857 0.235 0.063 0.823 0.167 0.062 0.857 0.235 0.063 0.823 0.167 0.062

dsrB dsrB 0.784 0.231 0.036 0.865 0.193 0.037 0.784 0.231 0.036 0.865 0.193 0.037

sox sox 0.873 0.284 0.043 0.820 0.183 0.042 0.884 0.284 0.043 0.836 0.183 0.042

Fe-S cluster binding protein fes 0.821 0.251 0.043 0.786 0.154 0.042 0.847 0.251 0.043 0.786 0.154 0.042

ferredoxin fer 0.790 0.289 0.110 0.865 0.261 0.112 0.790 0.289 0.110 0.865 0.261 0.112

ferredoxin oxidoreductase fero 0.805 0.182 0.064 0.757 0.186 0.063 0.805 0.182 0.064 0.757 0.186 0.063

NADH ubiquinone oxidoreductase NADH 0.992 0.293 0.058 0.737 0.284 0.053 0.992 0.293 0.058 0.737 0.284 0.053

terminal quinol oxidase quio 0.987 0.337 0.051 0.861 0.285 0.048 0.988 0.337 0.051 0.861 0.285 0.048

cytochrome cyt 0.815 0.131 0.042 0.846 0.194 0.043 0.815 0.131 0.042 0.846 0.194 0.043

hydrogenase hyd 0.849 0.330 0.073 0.725 0.242 0.071 0.849 0.330 0.073 0.686 0.242 0.071

Ni-Fe hydrogenase NFhyd 0.840 0.144 0.049 0.853 0.188 0.049 0.840 0.144 0.049 0.871 0.188 0.049

glycosyl transferase glyt 0.829 0.170 0.042 0.859 0.183 0.042 0.861 0.170 0.042 0.844 0.183 0.042

ABC transporter ABCt 0.827 0.143 0.039 0.855 0.200 0.040 0.827 0.143 0.039 0.855 0.200 0.040

silA silA 0.753 0.180 0.047 0.798 0.142 0.048 0.807 0.180 0.047 0.823 0.142 0.048

silC silC 0.814 0.170 0.040 0.881 0.186 0.041 0.832 0.170 0.040 0.881 0.186 0.041

silP silP 0.732 0.112 0.044 0.879 0.106 0.046 0.808 0.112 0.044 0.879 0.106 0.046

al al 0.829 0.146 0.043 0.844 0.178 0.043 0.829 0.146 0.043 0.844 0.178 0.043

aoxB aoxB 0.877 0.221 0.042 0.862 0.191 0.042 0.877 0.221 0.042 0.862 0.191 0.042

arsA arsA 0.798 0.128 0.053 0.802 0.150 0.054 0.840 0.128 0.053 0.858 0.150 0.054

arsB arsB 0.868 0.211 0.063 0.794 0.092 0.062 0.868 0.211 0.063 0.823 0.092 0.062

arsC arsC 0.873 0.164 0.042 0.859 0.145 0.041 0.873 0.164 0.042 0.869 0.145 0.041

arsM arsM 0.839 0.212 0.080 0.843 0.220 0.080 0.839 0.212 0.080 0.843 0.220 0.080

cadA cadA 0.792 0.138 0.043 0.854 0.143 0.044 0.810 0.138 0.043 0.854 0.143 0.044

cadBD cadBD 0.779 0.178 0.075 0.758 0.195 0.074 0.822 0.178 0.075 0.758 0.195 0.074

czcA czcA 0.874 0.271 0.035 0.840 0.191 0.035 0.885 0.271 0.035 0.840 0.191 0.035

czcC czcC 0.828 0.220 0.059 0.769 0.101 0.057 0.828 0.220 0.059 0.769 0.101 0.057

czcD czcD 0.784 0.158 0.049 0.851 0.181 0.050 0.784 0.158 0.049 0.841 0.181 0.050

corC corC 0.864 0.200 0.044 0.868 0.192 0.044 0.882 0.200 0.044 0.868 0.192 0.044

chrA chrA 0.842 0.181 0.045 0.848 0.192 0.045 0.842 0.181 0.045 0.848 0.192 0.045

copA copA 0.760 0.262 0.053 0.807 0.285 0.054 0.809 0.262 0.053 0.785 0.285 0.054

cueO cueO 0.723 0.108 0.124 0.775 0.212 0.125 0.723 0.108 0.124 0.775 0.212 0.125

cusA cusA 0.848 0.156 0.039 0.840 0.190 0.039 0.843 0.156 0.039 0.885 0.190 0.039

mer mer 0.803 0.147 0.067 0.758 0.156 0.066 0.803 0.147 0.067 0.758 0.156 0.066

merB merB 0.764 0.143 0.070 0.745 0.189 0.069 0.781 0.143 0.070 0.745 0.189 0.069

merP merP 0.803 0.203 0.080 0.813 0.223 0.080 0.803 0.203 0.080 0.813 0.223 0.080

nreB nreB 0.832 0.172 0.058 0.806 0.138 0.058 0.832 0.172 0.058 0.824 0.138 0.058

pbrA pbrA 0.888 0.196 0.039 0.901 0.224 0.039 0.888 0.196 0.039 0.902 0.224 0.039

pbrT pbrT 0.812 0.166 0.047 0.817 0.177 0.047 0.812 0.166 0.047 0.817 0.177 0.047

tehB tehB 0.890 0.280 0.054 0.888 0.275 0.054 0.890 0.280 0.054 0.888 0.275 0.054

terC terC 0.864 0.124 0.038 0.891 0.179 0.038 0.864 0.124 0.038 0.891 0.179 0.038

terD terD 0.791 0.188 0.044 0.855 0.161 0.045 0.820 0.188 0.044 0.829 0.161 0.045

terZ terZ 0.778 0.115 0.045 0.802 0.136 0.045 0.805 0.115 0.045 0.802 0.136 0.045

zitB zitB 0.872 0.171 0.051 0.859 0.145 0.051 0.872 0.171 0.051 0.859 0.145 0.051

zntA zntA 0.850 0.203 0.037 0.853 0.209 0.037 0.850 0.203 0.037 0.853 0.209 0.037

cspA cspA 0.867 0.220 0.033 0.869 0.218 0.033 0.874 0.220 0.033 0.869 0.218 0.033

cspB cspB 0.801 0.162 0.045 0.787 0.134 0.044 0.801 0.162 0.045 0.787 0.134 0.044

dnaK dnaK 0.839 0.148 0.042 0.844 0.157 0.042 0.839 0.148 0.042 0.844 0.157 0.042

groEL groEL 0.832 0.149 0.054 0.834 0.152 0.054 0.832 0.149 0.054 0.834 0.152 0.054

groES groES 0.886 0.275 0.035 0.859 0.203 0.034 0.907 0.275 0.035 0.862 0.203 0.034

grpE grpE 0.881 0.260 0.043 0.857 0.213 0.042 0.881 0.260 0.043 0.857 0.213 0.042

hrcA hrcA 0.824 0.140 0.040 0.848 0.193 0.040 0.838 0.140 0.040 0.867 0.193 0.040

bglH bglH 0.782 0.158 0.040 0.853 0.174 0.041 0.808 0.158 0.040 0.853 0.174 0.041

bglP bglP 0.777 0.203 0.048 0.813 0.117 0.048 0.835 0.203 0.048 0.813 0.117 0.048

glnA glnA 0.846 0.173 0.042 0.813 0.108 0.041 0.846 0.173 0.042 0.813 0.108 0.041

glnR glnR 0.813 0.107 0.050 0.804 0.123 0.050 0.856 0.107 0.050 0.882 0.123 0.050

arcA arcA 0.890 0.326 0.075 0.879 0.304 0.075 0.890 0.326 0.075 0.879 0.304 0.075

arcB arcB 0.852 0.160 0.049 0.890 0.223 0.049 0.865 0.160 0.049 0.890 0.223 0.049

cydA cydA 0.755 0.124 0.064 0.836 0.158 0.065 0.784 0.124 0.064 0.836 0.158 0.065

cydB cydB 0.887 0.308 0.045 0.830 0.193 0.044 0.887 0.308 0.045 0.830 0.193 0.044

narH narH 0.799 0.305 0.058 0.852 0.293 0.059 0.821 0.305 0.058 0.866 0.293 0.059

narI narI 0.884 0.179 0.063 0.813 0.193 0.062 0.828 0.179 0.063 0.813 0.193 0.062

narJ narJ 0.819 0.193 0.043 0.796 0.147 0.042 0.819 0.193 0.043 0.796 0.147 0.042

ahpC ahpC 0.861 0.195 0.045 0.911 0.195 0.046 0.861 0.195 0.045 0.911 0.195 0.046

ahpF ahpF 0.883 0.200 0.076 0.819 0.275 0.075 0.882 0.200 0.076 0.822 0.275 0.075

fnr fnr 0.853 0.161 0.037 0.908 0.172 0.038 0.853 0.161 0.037 0.908 0.172 0.038

katA katA 0.814 0.152 0.064 0.796 0.135 0.064 0.814 0.152 0.064 0.814 0.135 0.064

katE katE 0.811 0.158 0.039 0.820 0.162 0.039 0.826 0.158 0.039 0.820 0.162 0.039

oxyR oxyR 0.842 0.169 0.041 0.886 0.139 0.041 0.858 0.169 0.041 0.885 0.139 0.041

perR perR 0.825 0.190 0.058 0.795 0.131 0.057 0.825 0.190 0.058 0.795 0.131 0.057

proV proV 0.795 0.110 0.048 0.831 0.180 0.049 0.795 0.110 0.048 0.831 0.180 0.049

proX proX 0.833 0.192 0.047 0.788 0.102 0.046 0.833 0.192 0.047 0.788 0.102 0.046

phoA phoA 0.881 0.181 0.045 0.879 0.177 0.045 0.881 0.181 0.045 0.879 0.177 0.045

phoB phoB 0.838 0.196 0.043 0.867 0.153 0.043 0.838 0.196 0.043 0.867 0.153 0.043

pstA pstA 0.892 0.203 0.049 0.909 0.122 0.049 0.906 0.203 0.049 0.909 0.122 0.049

pstB pstB 0.839 0.177 0.036 0.854 0.208 0.037 0.839 0.177 0.036 0.854 0.208 0.037

pstC pstC 0.872 0.176 0.041 0.874 0.181 0.041 0.872 0.176 0.041 0.874 0.181 0.041

pstS pstS 0.817 0.113 0.056 0.820 0.119 0.056 0.817 0.113 0.056 0.820 0.119 0.056

clpC clpC 0.905 0.269 0.032 0.897 0.252 0.031 0.905 0.269 0.032 0.897 0.252 0.031

ctsR ctsR 0.788 0.199 0.055 0.824 0.171 0.056 0.788 0.199 0.055 0.824 0.171 0.056

obgE obgE 0.840 0.168 0.042 0.843 0.174 0.042 0.842 0.168 0.042 0.844 0.174 0.042

ABC antibiotic transporter ABCat 0.828 0.168 0.046 0.847 0.194 0.046 0.840 0.168 0.046 0.847 0.194 0.046

MatE antibiotics MatE 0.866 0.239 0.048 0.866 0.212 0.048 0.916 0.239 0.048 0.887 0.212 0.048

MFS antibiotics MFS 0.873 0.216 0.044 0.876 0.221 0.044 0.873 0.216 0.044 0.876 0.221 0.044

SMR antibiotics SMR 0.808 0.246 0.036 0.878 0.181 0.038 0.814 0.246 0.036 0.878 0.181 0.038

Mex Mex 0.872 0.205 0.071 0.809 0.144 0.070 0.865 0.205 0.071 0.867 0.144 0.070

beta-lactamase lac 0.835 0.196 0.035 0.833 0.192 0.035 0.835 0.196 0.035 0.833 0.192 0.035

class A beta-lactamase lacA 0.816 0.161 0.033 0.882 0.228 0.034 0.832 0.161 0.033 0.833 0.228 0.034

class C beta-lactamase lacC 0.828 0.230 0.055 0.796 0.156 0.054 0.838 0.230 0.055 0.796 0.156 0.054

Tet Tet 0.795 0.176 0.043 0.865 0.221 0.045 0.795 0.176 0.043 0.871 0.221 0.045

Van Van 0.772 0.161 0.120 0.737 0.173 0.119 0.809 0.161 0.120 0.756 0.173 0.119

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with the predicted metabolic potential of functional genes.

b Models were constructed without information on the microbial abundances of dominant phyla.

c Models were constructed with information on the microbial abundances of dominant phyla.

d The occurrence shows the percentage of the total samples in which the probes of a given gene were detected.

e This null model sets all predicted metabolic potentials equal to the average value across all samples.

f This null model sets all metabolic potentials equal to the minimum observed value.

Table S11. Predictive equations and functional parameters that provide the best prediction for relative abundances of dominant microbial taxa based on the artificial neural network (ANN).

Taxa  Abbr.  Functional parameters  Predictive equations  Bray-Curtis similarity^a: Train  P value^b  Validation  P value  Average  P value

Euryarchaeota Eury pH, EC, Fe2+ 95.9*EC*Fe2+ + 26.7*pH*pH*Fe2+ - 134*pH*Fe2+ - 13.6*EC*EC*Fe2+ 0.712 < 0.001 0.598 0.027 0.685 < 0.001

Acidobacteria Acido Fe3+, Gamma 2.44/(Fe3+*Fe3+ - 3.27) - 1.39/(55.2 - 2*Fe3+*Gamma) 0.728 < 0.001 0.650 0.021 0.709 < 0.001

Actinobacteria Actino Eury, Fe3+ - 68/(Eury*Eury - 478) - 0.816/(1010*Eury*Eury - 476*Eury) 0.784 < 0.001 0.660 0.047 0.694 < 0.001

Firmicutes Firm Actino, Nitro 1.08*Actino + 0.046*Nitro - 9.87*Nitro/(51.7*Actino + Nitro*Nitro - 50.1*Nitro) 0.746 < 0.001 0.649 0.014 0.725 < 0.001

Nitrospira Nitro pH, Cu, Zn Cu - 5.67/(Cu - 2.06) + 61*Zn/pH - 23*Zn 0.778 < 0.001 0.692 0.008 0.757 < 0.001

Alphaproteobacteria Alpha pH, TOC, Beta pH + 2.03/(3.93 - Beta) - 0.011/(1.49*TOC - 0.556) 0.657 < 0.001 0.650 0.023 0.657 < 0.001

Betaproteobacteria Beta pH, Fe2+, Eury 29.5*Eury + 13.4*pH + 32.7*pH*Fe2+ + 4.88*Eury*pH*pH - 71.3*Fe2+ - 24.9*pH*Eury 0.807 < 0.001 0.701 0.007 0.783 < 0.001

Gammaproteobacteria Gamma Fe2+, Beta 21.6 + 3.02/(Beta - 1.4) - 10.2/(Beta - 17.6) - 0.219*Beta 0.865 < 0.001 0.766 < 0.001 0.801 < 0.001

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with the predicted relative microbial abundances.

b The observed values were randomly permuted 10,000 times and compared with the predicted values. The significance (P value) of the Bray-Curtis similarity is defined as the number of resamples in which random permutation gave a greater similarity than the observed values did, divided by the total number of resamples.
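The following is a minimal sketch of how one predictive equation and the permutation test in footnote b fit together. It evaluates the Table S11 equation for Gammaproteobacteria, scores the prediction with Bray-Curtis similarity, and estimates the P value from 10,000 random permutations of the observed values. The similarity formula 1 - sum|o - p| / sum(o + p), the function names, the random seed and all abundance values are assumptions made for illustration and are not taken from the study.

```python
import random

def bray_curtis_similarity(observed, predicted):
    """Return 1 - sum|o - p| / sum(o + p) for two equal-length sequences."""
    diff = sum(abs(o - p) for o, p in zip(observed, predicted))
    total = sum(o + p for o, p in zip(observed, predicted))
    return 1.0 - diff / total

def predict_gamma(beta):
    """Table S11 equation for Gammaproteobacteria; beta is the Betaproteobacteria abundance."""
    return 21.6 + 3.02 / (beta - 1.4) - 10.2 / (beta - 17.6) - 0.219 * beta

def permutation_p_value(observed, predicted, n_resamples=10_000, seed=0):
    """Fraction of random permutations of the observed values whose similarity
    to the predictions is greater than that of the unpermuted observations."""
    rng = random.Random(seed)
    actual = bray_curtis_similarity(observed, predicted)
    shuffled = list(observed)
    greater = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        if bray_curtis_similarity(shuffled, predicted) > actual:
            greater += 1
    return greater / n_resamples

# Hypothetical relative abundances (%) in eight samples.
beta_abundance = [3.0, 5.0, 10.0, 25.0, 30.0, 40.0, 50.0, 60.0]
observed_gamma = [24.0, 22.5, 20.5, 15.5, 13.8, 12.0, 10.0, 8.0]

predicted_gamma = [predict_gamma(b) for b in beta_abundance]
similarity = bray_curtis_similarity(observed_gamma, predicted_gamma)
p_value = permutation_p_value(observed_gamma, predicted_gamma)
```

A small P value means that shuffling the observations rarely matches the predictions as well as the true sample order does. Note that the equation is undefined at Beta = 1.4 and Beta = 17.6, so inputs near those values should be treated with care.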

Table S12. Predictive equations and functional parameters that provide the best prediction for functional metabolic potentials based on the artificial neural network (ANN).

Genes  Abbr.  Functional parameters  Predictive equations  Train  P value  Validation  P value  Average  P value

aclB aclB Al, Acido 232961 + 14107*Al + 1130*Al*Acido*Acido*Acido - 186*Acido*Acido*Acido*Acido - 8207*Acido*Al*Al 0.950 < 0.001 0.920 0.045 0.931 < 0.001

CODH CODH pH, DO, TOC 508125 + 191961*pH*pH/(87.8 + 1.92*pH*pH - 9.03*TOC - 36.2*DO*DO) 0.975 < 0.001 0.965 0.042 0.969 < 0.001

Pcc Pcc pH, TOC 1176937 + (241871 - 100000*pH)/(1.89*TOC + pH*TOC - pH) 0.970 < 0.001 0.950 0.031 0.964 < 0.001

RubisCo RubisCo DO, Alpha 643499 + 18027*Alpha + 12.9*DO*Alpha*Alpha*Alpha - 1764*Alpha*Alpha 0.965 < 0.001 0.945 0.024 0.956 < 0.001

nifH nifH DO 3066210 + 65545*DO + 1684063/DO - 410346/DO*DO 0.965 < 0.001 0.919 0.045 0.931 < 0.001

gdh gdh Fe3+, Nitro 200000*Fe3+ + 1182*Nitro + 1242/(2.23 - Nitro) - 35179*Fe3+*Fe3+ 0.941 < 0.001 0.934 0.026 0.936 < 0.001

ureC ureC Firm, Beta 1284444 + 2890/(0.013 - Firm) + (28972159 - 346594*Beta)/(Beta - 84.6) 0.956 < 0.001 0.932 0.066 0.945 < 0.001

amoA amoA pH 1031769 + 1000000*pH - 45446/(3.80 - pH) - 199433*pH*pH 0.976 0.010 0.951 0.012 0.962 < 0.001

narG narG nirK 428136 + 2.04*nirK + (547048 - 0.912*nirK)/(0.007*nirK - 5.59) - 0.006*nirK*nirK 0.982 < 0.001 0.960 0.038 0.971 < 0.001

nirK nirK Cu, Acido, Gamma 671138 + 203679*Acido + 164056*Cu + 0.964*Gamma*Gamma*Gamma - 3641*Cu*Gamma - 76175*Acido*Acido 0.971 < 0.001 0.926 0.002 0.938 < 0.001

nirS nirS Eury, Gamma, narG 705537 + 0.038*Eury*narG + 0.0002*narG*Gamma*Gamma - 2362*Gamma - 38255*Eury - 181*Gamma*Gamma - 760*Eury*Eury 0.940 < 0.001 0.935 0.020 0.939 < 0.001

norB norB Fe2+, Cd, Eury 159990 + 172363*Cd + 16726*Fe2+ + 1462*Eury + 100439*Fe2+*Fe2+*Cd - 357357*Fe2+*Cd 0.943 < 0.001 0.929 0.015 0.932 < 0.001

nosZ nosZ pH 4299268*pH + 49942917/pH - 39083676/(pH*pH) - 22315156 - 288282*pH*pH 0.934 < 0.001 0.917 0.042 0.930 < 0.001

nasA nasA Fe2+, Acido, NiR 518458 + 61193*Acido + 0.000006*NiR*NiR - 2.46*NiR - 61193*Fe2+*Acido*Acido 0.913 < 0.001 0.912 0.008 0.912 < 0.001

NiR NiR Acido 224601 + 137056*Acido + 26674*Acido*Acido*Acido - 1765*Acido*Acido*Acido*Acido - 113093*Acido*Acido 0.946 0.003 0.919 0.005 0.925 < 0.001

nirA nirA pH, nirB 60902 + 0.304*nirB - 2733*pH/(0.00001*nirB - pH) 0.914 0.002 0.895 0.033 0.900 < 0.001

nirB nirB pH 250137*pH + 86173*pH*pH + 10915*pH*pH*pH*pH - 276426 - 75289*pH*pH*pH 0.876 0.002 0.863 0.033 0.866 < 0.001

napA napA Pb, Acido 453943 - 152*Acido/(0.013 - 0.341*Pb) 0.945 0.003 0.944 0.034 0.944 < 0.001

nrfA nrfA pH 585744*pH - 114373*pH*pH 0.931 0.009 0.929 0.026 0.929 < 0.001

phytase phytase EC, As 9590644 + 70474754*EC*As + 709869*EC*EC + 53127*As*As - 5191343*EC -

132295841*As - 9389329*As*EC*EC 0.868 0.015

0.775 0.020

0.797 0.011

ppk ppk DO 844411 + 137339/(DO*DO) + (51307*DO - 43871)/(0.968*DO*DO*DO -

DO*DO*DO*DO) 0.974 0.005 0.936 0.011 0.912 < 0.001

ppx ppx pH 1071301 + (35959*pH*pH - 11965*pH*pH*pH)/(pH - 3.02) 0.965 < 0.001

0.948 0.045

0.961 < 0.001

aprA aprA EC, Acido 96020459*EC + 425725*EC*Acido + 2284143*EC*EC*EC + 15880*Acido*Acido*Acido - 118817246 - 1574124*Acido - 25690773*EC*EC 0.964 < 0.001 0.958 0.003 0.959 < 0.001

aprB aprB Fe2+ 170011 - 3967/(8.93 - 3.44*Fe2+) - 13422*Fe2+ 0.916 0.003 0.907 0.014 0.914 0.002

dsrA dsrA DO, Al, P 1854633 + 39512/(0.684 + 0.035*DO*DO*DO - DO) - 12937*DO 0.970 < 0.001 0.954 0.009 0.966 < 0.001

dsrB dsrB EC, As 742540*EC + (4891561*EC + 543507*EC*EC*EC - 3668671 - 45292*EC*EC*EC*EC - 2445781*EC*EC)/As - 1227621 - 86116*As 0.954 0.010 0.951 0.013 0.953 0.004

sox sox Pb, Gamma 586406 + 100000*Pb + 1473*Gamma + 4391/(Gamma - 2.98) 0.949 < 0.001 0.946 0.011 0.948 < 0.001

Fe-S cluster binding protein fes Actino 5970955 + 31584*Actino + 20213/Actino - 19.8/(Actino*Actino) + 2803/(1.94*Actino - 0.134) 0.980 < 0.001 0.974 0.026 0.979 < 0.001

ferredoxin fer Fe2+ 51937 + 98.3/(0.005 + 0.701*Fe2+*Fe2+ - 0.167*Fe2+ - 0.539*Fe2+*Fe2+*Fe2+) 0.907 < 0.001 0.828 0.014 0.847 < 0.001

ferredoxin oxidoreductase fero Actino, fes 4293408 + 1368359*Actino + 1.01e-6*fes*fes + 6.74e-21*fes*fes*fes*fes + 0.169*fes*Actino*Actino + 5.47e-8*Actino*fes*fes - 3.37*fes - 0.547*Actino*fes - 1.35e-13*fes*fes*fes - 421588*Actino*Actino - 1.69e-8*Actino*Actino*fes*fes 0.925 < 0.001 0.901 0.012 0.918 < 0.001

NADH ubiquinone oxidoreductase NADH Actino, quio, fes 12816 + 2.04*fes 0.989 < 0.001 0.979 < 0.001 0.980 < 0.001

terminal quinol oxidase quio Actino, fes 23382*Actino + 3.75*fes - 488737 0.987 < 0.001 0.978 < 0.001 0.980 < 0.001

cytochrome cyt Fe3+, P 1834354*Fe3+ + 1000000*Fe3+*P - 62.1/(P*P) - 1434077 - 2956422*P - 336122*Fe3+*Fe3+ 0.912 0.008 0.908 0.009 0.902 0.016

hydrogenase hyd pH 1137811 - 1117019/pH - 169985*pH 0.931 0.006 0.913 0.042 0.918 0.024

Ni-Fe hydrogenase NFhyd pH 26001 + 35284*pH + 1250/(3.463 - pH) - 3057/(3.1 - pH) - 1167*pH*pH*pH 0.904 0.002 0.878 0.003 0.884 < 0.001

glycosyl transferase glyt Actino 167223 + 11945/(151*Actino - 4.61) + 79.4/(0.614*Actino - 0.043) 0.913 0.010 0.885 0.018 0.892 < 0.001

ABC transporter ABCt pH, DO 50289301 + 5156265/(5.43*pH - pH*DO) 0.974 0.011 0.973 0.012 0.974 0.004

silA silA Gamma 42797 + 6873/(Gamma - 4.32) + 3619/(0.062*Gamma - 4.35) 0.829 0.006 0.811 0.041 0.816 < 0.001

silC silC Firm 403314 + 970*Firm - 49358/(Firm - 11) - 2067/(10.2*Firm*Firm - 9.96*Firm) 0.948 0.001 0.933 0.053 0.937 < 0.001

silP silP Acido, Gamma 84496 + 123593*Acido + 23245*Acido*Acido*Acido - 48225*Acido*Acido*Acido*Acido - 328429*Acido*Acido 0.943 < 0.001 0.923 0.009 0.928 < 0.001

al al Actino 336538 + 50386*Actino/(0.21 - 5.92*Actino) 0.928 0.010 0.923 0.021 0.924 0.007

aoxB aoxB DO 275300 + 8228/DO - 1572*DO 0.972 0.028 0.941 0.037 0.949 0.005

arsA arsA SO42-, Acido 26815 + 10000*Acido + 32.7/(Acido - 0.451*Acido*Acido*Acido - 1.21*Acido*Acido) - 7268*Acido*Acido 0.831 < 0.001 0.807 0.001 0.823 < 0.001

arsB arsB P, arsM 84471 + (0.014*arsM - 579840 - 9.06e-11*arsM*arsM)/(P - 1.12) 0.956 < 0.001 0.909 0.002 0.921 < 0.001

arsC arsC pH, Firm 1190543 + 155*Firm*Firm - 35325/(Firm - 5.37) + (1195 - 393*pH)/Firm - 42731*pH 0.986 0.001 0.973 0.002 0.977 < 0.001

arsM arsM pH, DO 70000000 + (17476356 + 20185331*DO - 7970875*pH)/(DO - 0.162) 0.977 0.003 0.970 0.072 0.976 < 0.001

cadA cadA pH, Nitro 1353931 + 71.8*Nitro*Nitro + 15345/(5.54*Nitro - 4.74 - pH) - 100000*pH 0.974 < 0.001 0.949 0.003 0.956 < 0.001

cadBD cadBD Gamma 113340 + 189*Gamma*Gamma + 677/(0.282 - 0.002*Gamma*Gamma) - 4031*Gamma - 1.69*Gamma*Gamma*Gamma 0.937 < 0.001 0.889 0.003 0.901 < 0.001

czcA czcA Cd, Eury 1350240 + 1491*Eury + (3.11 - 415972*Cd)/(Eury - 12.6*Cd) - 572026*Cd 0.981 0.047 0.936 0.023 0.948 0.010

czcC czcC pH, Zn, Cd (2321245*Zn - 3460027)/(24.2*Zn - 35.3) 0.866 0.002 0.833 0.021 0.858 0.001

czcD czcD pH, Zn, Cd 2742379 - 298027/Zn + 1937665*Zn*Zn - 3086079*Zn - 100000*pH*Zn - 367520*Zn*Zn*Zn 0.966 < 0.001 0.965 0.028 0.965 < 0.001

corC corC DO, EC, Alpha 284882 + 14513*DO*DO + 7243*DO*EC*Alpha - 102100*DO - 21730*DO*Alpha - 36.9*Alpha*Alpha*Alpha 0.884 0.002 0.869 0.038 0.872 < 0.001

chrA chrA pH, Pb 1949385 + 949385*pH*Pb - 203945*pH - 2609204*Pb 0.969 < 0.001 0.963 0.026 0.967 < 0.001

copA copA Fe3+, Beta, cueO 307566 + 1429111*Fe3+ + 23.6*cueO + 3.2e-8*cueO*cueO*cueO - 0.002*cueO*cueO - 228820*Fe3+*Fe3+ 0.985 < 0.001 0.972 0.001 0.975 < 0.001

cueO cueO Al, cusA 20182 + 53375/(9.85 - 2e-8*cusA*cusA) 0.799 0.002 0.632 0.071 0.741 0.002

cusA cusA Cu, Acido 42529 + 962/(4.53 - 72.5*Acido) 0.914 0.006 0.872 0.019 0.882 0.002

mer mer As, P 816722 + 190355*P + 27.4/(0.916*P - 0.036) - 2881875*As*P 0.932 0.006 0.923 0.068 0.925 0.001

merB merB TOC, Eury (2695077*Eury - 32425627)/(71.1*Eury - 837) 0.879 0.007 0.834 0.018 0.847 0.003

merP merP As, mer 113795 + 0.121*As*mer + 2.03e-7*mer*mer + 4.85e-13*As*mer*mer*mer - 0.203*mer - 15146*As - 2.42e-19*As*mer*mer*mer*mer - 3.63e-7*As*mer*mer 0.907 < 0.001 0.878 0.030 0.886 < 0.001

nreB nreB DO, Al, Pb (153719 - 4213612*Pb)/(1.78 - 45.9*Pb) 0.879 < 0.001 0.865 0.021 0.869 0.000

pbrA pbrA Eury, Actino, Pb 36297 + 61190*Pb + 6318*Actino + 17.2*Eury*Eury - 2131*Eury*Actino - 359701*Pb*Pb*Pb 0.918 < 0.001 0.913 0.004 0.914 < 0.001

pbrT pbrT SO42- 4060888 + 2626557*SO42-*SO42- + 48640*SO42-*SO42-*SO42-*SO42- + 198/(1.82*SO42- - 6.66) - 5274654*SO42- - 583679*SO42-*SO42-*SO42- 0.850 0.004 0.801 0.012 0.812 0.004

tehB tehB As, P, terD 456011 + P*terD + (0.182*As*terD + 100000*As*As - 109482*As)/P - 600000*P 0.870 < 0.001 0.774 0.031 0.796 < 0.001

terC terC DO, S, Fe3+ 740223 + (307529*DO*S - 922586*DO)/(7.44*DO - 1.47 - DO*DO) 0.971 0.003 0.957 0.006 0.960 < 0.001

terD terD Al, Cu, Acido 961472 - 74.5/Acido + 226398*Al*Cu + 298408*Cu*Cu - 86290*Al - 594027*Cu - 100000*Al*Cu*Cu 0.985 < 0.001 0.960 < 0.001 0.966 < 0.001

terZ terZ pH, Eury 182167*pH + 23708*Eury - 144660 - 8700*pH*Eury - 6455*pH*pH*pH 0.922 0.002 0.917 0.015 0.918 < 0.001

zitB zitB Actino, Firm, zntA 33082 + 7649*Actino*Actino + 1.01e-13*zntA*zntA*zntA - 0.032*Actino*zntA 0.961 < 0.001 0.913 < 0.001 0.925 < 0.001

zntA zntA Zn, Firm 697816 + 21491*Zn - 3427*Zn/(Firm - 0.046) 0.953 0.003 0.938 0.008 0.949 0.001

cspA cspA Fe3+, Gamma 13182 + 10000*Fe3+ + (10000*Fe3+ - 36835)/(Gamma - 4.05) 0.917 < 0.001 0.875 0.037 0.906 < 0.001

cspB cspB pH 26739 - 462/(2.08 - pH) 0.878 < 0.001 0.818 0.050 0.833 < 0.001

dnaK dnaK DO, EC 521331 - 130258/(20*DO - 22.6) 0.947 < 0.001 0.918 0.043 0.940 < 0.001

groEL groEL groES 3603648 - 34228689570/groES + 0.001*groES*groES - 96.4*groES - 4.64e-9*groES*groES*groES 0.952 0.003 0.916 0.019 0.925 0.001

groES groES As, Cd, Acido 96478 - 715/As - 10937*Acido - 23920*As 0.962 0.002 0.907 0.003 0.922 < 0.001

grpE grpE As, groEL 53703153 - 10454992600000/groEL + 8.91e-5*groEL*groEL + 7.5e17/(groEL*groEL) - 114*groEL 0.967 < 0.001 0.949 0.023 0.962 < 0.001

hrcA hrcA pH, Acido 5386545*pH + 204351*pH*pH*pH - 3861904 - 68858*Acido - 1860173*pH*pH 0.964 0.018 0.933 0.014 0.941 0.029

bglH bglH P, Beta 35250 + 250*P*P*Beta*Beta - 403*P*P*P*Beta*Beta 0.882 0.001 0.880 0.040 0.882 0.003

bglP bglP Acido 74057 + 87020*Acido + 156039*Acido*Acido*Acido - 31786*Acido*Acido*Acido*Acido - 222839*Acido*Acido 0.935 < 0.001 0.906 0.016 0.928 < 0.001

glnA glnA pH 2421028 + 463/(pH - 3.52) - 28547*pH*pH 0.982 0.002 0.975 0.007 0.976 < 0.001

glnR glnR TOC, Eury, Acido 197839 + 35819/TOC + 238469*TOC*Acido + 5004*TOC*Eury - 100000*Acido - 100000*Acido*TOC*TOC 0.950 < 0.001 0.912 0.002 0.922 < 0.001

arcA arcA cydB, narH 1.21*narH + 0.146*cydB + 20028857980/narH - 265934 0.950 < 0.001 0.930 < 0.001 0.935 < 0.001

arcB arcB pH, DO (14146781 - 49361*pH - 19318332*DO)/(247 - 329*DO) 0.918 0.008 0.866 0.017 0.905 0.001

cydA cydA Fe3+, Eury, arcB 68721 + 15787*Fe3+ + 1542*Eury + 0.492*arcB/(0.504 + Eury) 0.933 < 0.001 0.926 0.010 0.928 < 0.001

cydB cydB Zn, cydA, narH 129718 + 1.46*cydA + 0.764*narH*narH/cydA - 1.59*narH 0.955 < 0.001 0.945 < 0.001 0.947 < 0.001

narH narH pH, DO 165785 + (100000*pH*DO - 256025*pH*pH)/(209 - 42.3*DO) 0.890 0.002 0.877 0.079 0.886 < 0.001

narI narI Cu, Nitro 585479 + 2740*Cu*Nitro + 217156*Cu*Cu - 231782*Cu - 90.3*Nitro*Nitro - 53473*Cu*Cu*Cu 0.975 < 0.001 0.955 0.007 0.960 < 0.001

narJ narJ pH, DO 104334 + 29712*DO + 110*pH*DO*DO*DO - 5813*DO*DO 0.911 0.009 0.881 0.046 0.904 0.010

ahpC ahpC TOC, ahpF 5491077 + 313927*TOC + 3.83e-5*ahpF*ahpF - 22.2*ahpF - 2.05e-11*ahpF*ahpF*ahpF - 170148*TOC*TOC 0.976 < 0.001 0.964 0.027 0.973 < 0.001

ahpF ahpF Eury, katE 26198241 + 8.08e-6*katE*katE - 867392672100/(katE - 1000000) - 28.1*katE 0.963 < 0.001 0.931 0.007 0.955 < 0.001

fnr fnr pH, ahpF 28.4*ahpF + 2742781560000/ahpF - 11621325 - 313758*pH - 1.52e-5*ahpF*ahpF 0.985 < 0.001 0.975 < 0.001 0.978 < 0.001

katA katA pH, DO 97242*pH + (165326 - 26217*pH*pH)/DO 0.927 0.021 0.915 0.044 0.918 0.002

katE katE pH, Beta 1744706 - 104681*pH/(Beta - 36.1) - 78046*pH 0.966 < 0.001 0.956 0.016 0.964 < 0.001

oxyR oxyR pH, Fe2+, Eury 704238 + 19746*Eury + 12592*Fe2+ - 5938*pH*Eury - 155*Eury*Eury 0.975 < 0.001 0.958 0.013 0.967 < 0.001

perR perR pH, DO 31987 + 42136/(DO*DO - 5.94*DO) 0.825 0.002 0.823 0.057 0.824 < 0.001

proV proV SO42- 116643*SO42- + 986/(9 + SO42-*SO42- - 6*SO42-) - 234/(8.71 + SO42-*SO42- - 6*SO42-) - 99930 0.942 0.027 0.922 0.038 0.937 0.002

proX proX DO, Cu 93424 + 1762*DO - 931/(0.469*DO - 0.39) 0.911 0.001 0.909 0.029 0.909 < 0.001

phoA phoA pstC 25907 + 0.362*pstC + 9997/(87.7 + 1e-10*pstC*pstC - 0.0002*pstC) 0.980 < 0.001 0.948 0.008 0.956 < 0.001

phoB phoB pH, P 14430174 - 12263863/pH + 531416*pH*pH - 4765475*pH 0.959 0.015 0.951 0.041 0.953 0.010

pstA pstA Eury, pstC, pstS 1541*Eury + 3.84*pstS + 1.04*pstC - 699470 - 2.29e-6*pstC*pstS 0.982 < 0.001 0.961 0.002 0.971 < 0.001

pstB pstB pH, P 4612569 + 10199/(pH - 1.96) - 155833*pH 0.981 0.004 0.974 0.011 0.976 0.002

pstC pstC pH, Eury 627618 + 378470*pH + 27194*Eury - 7403*pH*Eury - 65401*pH*pH 0.970 < 0.001 0.966 0.042 0.967 < 0.001

pstS pstS pH, DO, P 645118 + 360948*P + 33388*DO - 100000*pH - 1845*P*DO*DO*DO 0.896 0.008 0.889 0.041 0.894 0.001

clpC clpC ctsR 400000 + 1.61*ctsR - 5911987830/ctsR - 2.17e-6*ctsR*ctsR 0.974 < 0.001 0.961 < 0.001 0.964 < 0.001

ctsR ctsR pH, TOC 296028 - 30766/TOC - 10758*pH/(13.1 - 13.8*pH*TOC) - 71532*TOC 0.912 < 0.001 0.866 0.008 0.901 < 0.001

obgE obgE Fe2+, Al, Actino 2612455 + (490435*Fe2+ - 391327)/Al - 31709*Actino - 361380*Fe2+ 0.956 < 0.001 0.953 0.018 0.954 < 0.001

ABC antibiotic transporter ABCat pH 268357 + 7868/(3.82*pH - 7.71) - 26174*pH 0.921 0.023 0.912 0.012 0.919 0.011

MatE antibiotics MatE Cd, Actino, Nitro 389315 + 3159*Nitro + 23655/(Nitro - 11.9) + (23655*Cd - 63.7*Nitro)/Actino - 1088625*Cd*Actino 0.986 < 0.001 0.925 0.017 0.941 < 0.001

MFS antibiotics MFS SMR, Mex 47580 + 0.601*SMR + (37107 + 2.15e-8*SMR*SMR - 0.059*SMR)/(6.92 - 4.06e-6*SMR) 0.970 < 0.001 0.954 0.018 0.966 < 0.001

SMR antibiotics SMR Pb, Acido, Beta 1741816 + 741613*Acido + 203*Acido*Beta*Beta - 21910*Acido*Beta - 206028*Acido*Acido 0.986 < 0.001 0.981 0.026 0.985 < 0.001

Mex Mex Eury, ABCat 90394 + 0.031*Eury*ABCat + 20.8*Eury*Eury*Eury - 0.08*ABCat - 0.004*ABCat*Eury*Eury 0.867 0.010 0.831 0.012 0.841 < 0.001

beta-lactamase lac Al, Alpha 76626 + 2628*Al*Al*Al + 16.1*Al*Alpha*Alpha - 1222*Alpha - 8560*Al*Al 0.907 0.010 0.861 0.014 0.895 0.003

class A beta-lactamase lacA pH, Nitro 319145*pH + 9936*Nitro - 7123/Nitro - 234250 - 4061*pH*Nitro - 48191*pH*pH 0.953 < 0.001 0.936 0.009 0.949 < 0.001

class C beta-lactamase lacC pH, Acido 524718 + 2144746*Acido*Acido + 241702*pH*pH*Acido*Acido - 35611*pH - 1447706*pH*Acido*Acido 0.921 < 0.001 0.919 0.042 0.921 < 0.001

Tet Tet pH 709163 + 4117/(37.6*pH - 103) 0.951 0.018 0.939 0.083 0.942 0.012

Van Van EC, Beta 21198 + (1433 + 36.7*Beta)/(EC + 0.13*Beta*Beta - 3 - 2.28*Beta) 0.835 0.018 0.746 0.053 0.769 0.002

b The observed values were randomly permuted 10,000 times and compared with predicted values. The significance of the Bray-Curtis similarity is defined as the number of resamples in which random permutation gave a greater similarity than the observed values, divided by the total number of resamples.
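The permutation procedure in footnote b can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the fixed random seed, and the use of the standard Bray-Curtis formula on non-negative values are all assumptions made here.

```python
import random

def bray_curtis_similarity(x, y):
    """Bray-Curtis similarity: 1 - sum|x_i - y_i| / sum(x_i + y_i).

    Assumes non-negative values with a non-zero total.
    """
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return 1.0 - numerator / denominator

def permutation_p_value(observed, predicted, n_resamples=10_000, seed=0):
    """Return (similarity, P) following the definition in footnote b.

    P is the fraction of resamples in which randomly permuted observed
    values are more similar to the predictions than the real observed
    values are. The seed is a hypothetical choice for reproducibility.
    """
    rng = random.Random(seed)
    reference = bray_curtis_similarity(observed, predicted)
    shuffled = list(observed)
    greater = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)  # permute the observed values in place
        if bray_curtis_similarity(shuffled, predicted) > reference:
            greater += 1
    return reference, greater / n_resamples
```

With this definition, a small P means that few random orderings of the observed values match the predictions better than the real ordering does.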

Table S13. Predictive equations and functional parameters that provide the best prediction for environmental properties based on the artificial neural network (ANN).

Environmental properties Abbr. Functional parameters

Dissolved Oxygen DO Fe2+ 1.69 + 0.024/(2.54 - Fe2+) + Fe2+/(4.17*Fe2+ - 13.7) 0.785 < 0.001 0.528 0.059 0.724 < 0.001

Total Organic Carbon TOC P (20.1*P*P - 3.97*P)/(16.8*P*P - 0.242 - 2.31*P) 0.927 < 0.001 0.829 0.002 0.853 < 0.001

Electrical Conductivity EC Fe3+ Fe3+ + 6.14/Fe3+ - 1.44 0.978 < 0.001 0.971 0.004 0.977 < 0.001

Sulfate SO42- TOC, EC 1.61 + 0.539*EC + 0.218/TOC - 0.093/(TOC*EC - 3*TOC) 0.978 < 0.001 0.965 < 0.001 0.975 < 0.001

Ferric ion Fe3+ pH, Fe2+ 2.45 + 1.61*Fe2+ - 0.562*pH*Fe2+ 0.968 < 0.001 0.963 0.003 0.967 < 0.001

Ferrous ion Fe2+ pH 7.97 + 0.474*pH*pH - 3.89*pH 0.753 < 0.001 0.734 0.011 0.744 < 0.001

Aluminum Al Fe3+ 0.761*Fe3+ + 0.092/(51.1*Fe3+ - 174) 0.897 < 0.001 0.866 0.002 0.888 < 0.001

Copper Cu Fe2+, Cd, P 8.34*Cd*P/(Cd + 0.198*Fe2+) 0.638 < 0.001 0.496 0.010 0.554 < 0.001

Zinc Zn Fe3+, Cd Fe3+ - 0.004/Cd - 0.702 0.813 < 0.001 0.802 0.016 0.805 < 0.001

Arsenic As pH, DO, Cd (0.104 + 2.34*Cd)/(pH + pH*DO - 2.75) 0.813 < 0.001 0.611 0.047 0.770 < 0.001

Cadmium Cd DO, Pb DO*Pb*Pb/(0.253 + DO*DO - DO) 0.722 < 0.001 0.567 0.053 0.652 < 0.001

Lead Pb pH, DO, Fe3+ 0.006*DO + 0.001*Fe3+*Fe3+*Fe3+*Fe3+ 0.641 0.004 0.628 0.041 0.638 0.001

Phosphorus P Al, Cd 0.048 + 0.041*Al + 5.09*Al*Cd - 8.45*Cd 0.762 < 0.001 0.748 < 0.001 0.757 < 0.001

a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed with predicted environmental properties.

b The observed values were randomly permuted 10,000 times and compared with predicted values. The significance of the Bray-Curtis similarity is defined as the number of resamples in which random permutation gave a greater similarity than the observed values, divided by the total number of resamples.
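Because several equations in Table S13 take other predicted properties as inputs, they can be evaluated in sequence. The sketch below chains the Ferrous ion, Ferric ion and Electrical Conductivity rows starting from pH alone. The function names and the chaining order are illustrative assumptions, and the inputs must be on whatever scale and units the authors used when fitting, which this section does not state.

```python
def predict_fe2(ph):
    """Ferrous ion row of Table S13: 7.97 + 0.474*pH*pH - 3.89*pH."""
    return 7.97 + 0.474 * ph * ph - 3.89 * ph

def predict_fe3(ph, fe2):
    """Ferric ion row of Table S13: 2.45 + 1.61*Fe2+ - 0.562*pH*Fe2+."""
    return 2.45 + 1.61 * fe2 - 0.562 * ph * fe2

def predict_ec(fe3):
    """Electrical Conductivity row of Table S13: Fe3+ + 6.14/Fe3+ - 1.44.

    Undefined at Fe3+ = 0 because of the reciprocal term.
    """
    return fe3 + 6.14 / fe3 - 1.44

def predict_chain(ph):
    """Hypothetical helper: feed each prediction into the next equation."""
    fe2 = predict_fe2(ph)
    fe3 = predict_fe3(ph, fe2)
    ec = predict_ec(fe3)
    return {"Fe2+": fe2, "Fe3+": fe3, "EC": ec}

if __name__ == "__main__":
    for name, value in predict_chain(2.5).items():
        print(f"{name}: {value:.4f}")
```

Chaining compounds the error of each step, so the similarity values reported per row in the table do not carry over to a chained estimate.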

Table S14. Functional genes that revealed consistent or fluctuating relative metabolic potentials along the gradient of pH levels.

Category Subcategory Gene Abbreviations pH range

< 2.0 2.0 - 2.5 2.5 - 3.0 3.0 - 3.5 3.5 - 4.0 > 4.0

Carbon cycling Carbon fixation

aclB aclB 90.70 69.28 55.40 50.78 50.64 53.12

CODH CODH 1.69 2.90 20.70 43.06 10.98 5.68

Pcc Pcc 39.31 38.65 35.67 25.02 7.31 93.04

RubisCo RubisCo 83.25 76.79 85.18 90.13 94.13 96.82

Nitrogen cycling Ammonification ureC ureC 65.31 65.42 65.83 67.71 69.93 66.85

Denitrification nirK nirK 62.99 43.57 44.40 55.63 65.92 89.04

Energy process Electron transport

Fe-S cluster binding protein fes 51.68 60.40 69.80 65.28 69.44 57.42

ferredoxin fer 35.28 35.05 36.67 40.66 30.93 69.42

NADH ubiquinone oxidoreductase NADH 61.68 60.40 69.80 65.28 69.44 57.42

terminal quinol oxidase quio 61.62 59.43 69.03 64.40 69.07 56.49

Hydrogenase Ni-Fe hydrogenase NFhyd 25.32 30.54 31.02 59.18 35.24 30.64

Membrane transport EPS glycosyl transferase glyt 53.69 54.32 62.79 58.25 61.25 50.99

Metal resistance

Ag silA silA 37.01 38.40 40.48 42.32 44.98 42.45

silC silC 41.39 34.30 33.40 33.35 33.75 34.77

As arsA arsA 68.34 73.17 67.11 72.26 72.56 72.83

Cd cadBD cadBD 37.31 28.54 60.33 40.59 37.35 39.88

Cd_Co_Zn

czcA czcA 20.47 27.96 29.40 31.70 29.89 30.43

czcC czcC 64.25 55.37 76.49 71.08 71.55 71.96

czcD czcD 37.50 76.65 80.97 46.80 81.82 93.41

Cu cueO cueO 81.74 89.71 92.89 96.18 96.14 95.79

cusA cusA 81.43 89.55 92.76 96.10 96.06 95.71

mer mer 49.63 95.97 99.01 99.41 99.76 99.97

merB merB 47.61 64.54 64.63 67.61 64.90 63.49

merP merP 33.26 89.90 97.35 98.43 99.35 99.92

Ni nreB nreB 1.79 7.99 39.71 92.23 64.10 51.87

Pb pbrA pbrA 54.47 82.31 70.83 64.55 68.31 73.56

Te terD terD 53.10 23.97 55.05 60.79 33.66 7.26

Zn zitB zitB 71.92 92.73 70.69 61.81 64.09 60.32

Stress response

Cold cspB cspB 25.88 37.96 33.86 33.09 32.80 32.67

dnaK dnaK 39.81 86.46 96.86 99.07 99.84 99.98

groEL groEL 31.54 34.88 17.05 39.18 69.70 91.97

grpE grpE 89.23 93.45 94.09 73.13 16.68 5.53

hrcA hrcA 60.98 91.69 73.49 37.05 20.94 60.00

Glucose limitation

bglH bglH 5.11 18.46 31.42 37.51 46.43 50.21

cydA cydA 77.23 26.21 10.05 1.22 9.17 47.09

cydB cydB 74.23 29.90 18.21 9.63 10.60 47.96

narH narH 97.89 89.49 74.27 55.28 32.71 10.63

narI narI 46.69 78.77 99.49 97.20 72.44 52.38

narJ narJ 14.61 49.41 76.50 88.58 95.54 99.10

Oxygen stress

ahpC ahpC 82.83 78.48 82.87 84.35 86.78 88.59

ahpF ahpF 22.95 10.67 18.11 31.81 46.51 55.67

katA katA 63.86 69.28 90.33 98.81 82.59 45.69

katE katE 55.82 51.33 41.53 36.84 31.53 26.21

perR perR 19.20 61.52 87.64 96.08 99.31 99.93

Osmotic stress proX proX 82.17 66.28 89.47 96.66 99.41 99.94

Protein stress clpC clpC 64.00 67.99 76.60 76.35 76.18 76.09

ctsR ctsR 56.15 61.39 70.02 69.74 69.54 69.43

Radiation stress obgE obgE 60.19 76.87 90.13 95.71 98.62 96.95

Antibiotic resistance

Transporter Mex Mex 57.56 55.22 57.93 66.66 60.13 51.09

Beta-lactamases class C beta-lactamase lacC 24.03 21.79 12.08 3.73 24.86 59.47

other category Tet Tet 51.33 50.84 51.78 53.02 52.42 52.27

Van Van 30.01 27.33 27.49 27.46 27.45 27.45

Figure S1. The consensus networks of environmental (a) and taxonomic (b) variables generated by Bayesian network inference.

[Figure S2 image omitted. Axis labels: Relative abundance (log10(x + 1)); Predicted values. Panels: R2 (Phylum, n = 8) = 0.70; R2 (Order, n = 35) = 0.62; R2 (OTU, n = 14) = 0.52.]

Figure S2. The scatter plots show the cross-validation of predicted and observed values for relative microbial abundances at different taxonomic levels.

[Figure S3 image omitted. Axis labels: Functional metabolic potentials (log10(x)); Predicted values. Panels: Carbon cycling, Nitrogen cycling, Phosphorus, Sulfur cycling, Energy process, Membrane transport, Antibiotic resistance, Metal resistance, Stress response. Panel R2 values in extraction order: 0.977, 0.967, 0.952, 0.981, 0.997, 0.996, 0.956, 0.998, 0.966.]

Figure S3. The scatter plots show the cross-validation of predicted and observed values for functional metabolic potentials of different functional gene categories.

[Figure S4 image omitted. Legend: Train, Validation, Average.]

Figure S4. Bray-Curtis similarity between predicted and observed values of relative microbial abundances (phylum level, a) and gene metabolic potentials of different functional categories (with relative abundance information of microbial phyla, b). The similarity of the overall microbial community composition was calculated based on these eight microbial phyla. Average includes the data sets for training and validation. Values are mean ± SE and the significances of the similarity were listed in supplementary tables.

[Figure S5 image omitted. Color key (pH range): < 2.0, 2.0 - 2.5, 2.5 - 3.0, 3.0 - 3.5, 3.5 - 4.0, > 4.0. Gene labels that survive extraction: cspA, groES, glnR, arcA, proV, aprB, dsrA.]

Figure S5. The changes of relative metabolic potential of functional genes in sulfur cycling (a), stress response (b), energy process and membrane transport (c) and antibiotic resistance (d) along the gradient of pH levels. The metabolic potentials were normalized to relative values.

[Figure S6 image omitted. Legend: Observed values, Predicted values. x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Panel labels that survive extraction: RubisCo, phytase, aprB, dsrA.]

Figure S6. The comparison of predicted and observed metabolic potentials of different functional gene categories including carbon cycling (a), phosphorus (b) and sulfur cycling (c) along the gradient of pH levels. Values were mean ± SE.

[Figure S7 image omitted. x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Panel labels that survive extraction: gdh, ureC, narG, norB, nosZ, NiR, nirA, napA.]

Figure S7. The comparison of predicted and observed metabolic potentials of nitrogen cycling along the gradient of pH levels. Values were mean ± SE.

[Figure S8 image omitted. x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Panel labels that survive extraction: fer, fero, quio, cyt, ABCt.]

Figure S8. The comparison of predicted and observed metabolic potentials of different functional gene categories including energy process (a) and membrane transport (b) along the gradient of pH levels. Values were mean ± SE.

[Figure S9 image omitted. x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Panel labels that survive extraction: silP, al, arsC, cadA, cadBD, czcA, czcD, corC, chrA, cueO, cusA, mer, nreB, pbrA, tehB, terC, terD, zntA.]

Figure S9. The comparison of predicted and observed metabolic potentials of metal resistance along the gradient of pH levels. Values were mean ± SE.

[Figure S10 image omitted. x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Panel labels that survive extraction: dnaK, groEL, groES, glnA, arcA, arcB, cydA, cydB, narI, narJ, ahpC, ahpF, katA, katE, oxyR, phoA, phoB, pstS, clpC, ctsR.]

Figure S10. The comparison of predicted and observed metabolic potentials of stress response along the gradient of pH levels. Values were mean ± SE.

[Figure S11 image omitted. x-axis: pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0). Panel labels that survive extraction: MatE, MFS, lac, lacC, Tet.]

Figure S11. The comparison of predicted and observed metabolic potentials of antibiotic resistance along the gradient of pH levels. Values were mean ± SE.
