Supporting Information
Predicting taxonomic and functional structure of microbial community in acid mine drainage
Jialiang Kuang, Linan Huang, Zhili He, Linxing Chen, Zhengshuang Hua, Pu Jia, Shengjin Li, Jun Liu,
Jintian Li, Jizhong Zhou and Wensheng Shu
Supplementary Methods
Sampling procedure, physicochemical analyses and DNA extraction
Amplification and bar-coded pyrosequencing of bacterial and archaeal 16S rRNA genes
Processing of pyrosequencing data
GeoChip analysis
Prediction model of microbial assemblages and functional metabolic potentials
Table S1 Functional genes selected for the statistical analyses in this study.
Table S2 Site locations and environmental conditions of acid mine drainage (AMD) samples.
Table S3 Relative abundance (%) of dominant lineages across AMD microbial communities.
Table S4 Summary of statistics (R²) from the dissimilarity test (Adonis) between pairs of mining areas on the functional community structure.
Table S5 Environmental and taxonomic variable loadings on the PCs across the AMD samples.
Table S6 Multiple linear regression (MLR) of environmental variables and relative abundance of dominant microbial lineages on the metabolic potential of functional genes.
Table S7 Validation of predictive models for relative abundances of dominant microbial taxa (phylum level, mean relative abundance > 1%) based on the artificial neural network (ANN).
Table S8 Validation of predictive models for relative abundances of dominant microbial taxa (order level, mean relative abundance > 0.1%) based on the artificial neural network (ANN).
Table S9 Validation of predictive models for relative abundances of key microbial taxa (OTU level, observed in at least half of the total samples) based on the artificial neural network (ANN).
Table S10a Validation of predictive models for metabolic potentials (original signals) of key functional genes based on the artificial neural network (ANN).
Table S10b Validation of predictive models for metabolic potentials (normalized data) of key functional genes based on the artificial neural network (ANN).
Table S11 Predictive equations and functional parameters that provide the best prediction for relative abundances of dominant microbial taxa based on the artificial neural network (ANN).
Table S12 Predictive equations and functional parameters that provide the best prediction for functional metabolic potentials based on ANN.
Table S13 Predictive equations and functional parameters that provide the best prediction for environmental properties based on ANN.
Table S14 Functional genes that revealed consistent or fluctuating relative metabolic potentials along the pH gradient.
Figure S1 The consensus networks of environmental (a) and taxonomic (b) variables generated by Bayesian network inference.
Figure S2 Scatter plots showing the cross-validation of predicted and observed values for relative microbial abundances at different taxonomic levels.
Figure S3 Scatter plots showing the cross-validation of predicted and observed values for functional metabolic potentials of different functional gene categories.
Figure S4 Bray-Curtis similarity between predicted and observed values of relative microbial abundances (phylum level, a) and gene metabolic potentials of different functional categories (with relative abundance information of microbial phyla, b).
Figure S5 Changes in the relative metabolic potential of functional genes in sulfur cycling (a), stress response (b), energy process and membrane transport (c) and antibiotic resistance (d) along the pH gradient.
Figure S6 Comparison of predicted and observed metabolic potentials of different functional gene categories, including carbon cycling (a), phosphorus (b) and sulfur cycling (c), along the pH gradient.
Figure S7 Comparison of predicted and observed metabolic potentials of nitrogen cycling genes along the pH gradient.
Figure S8 Comparison of predicted and observed metabolic potentials of different functional gene categories, including energy process (a) and membrane transport (b), along the pH gradient.
Figure S9 Comparison of predicted and observed metabolic potentials of metal resistance genes along the pH gradient.
Figure S10 Comparison of predicted and observed metabolic potentials of stress response genes along the pH gradient.
Figure S11 Comparison of predicted and observed metabolic potentials of antibiotic resistance genes along the pH gradient.
Supplementary Methods
Sampling procedure, physicochemical analyses and DNA extraction
Acid mine drainage (AMD) samples were previously collected from 14 mining areas of differing mineralogy across Southeast China; distances between sampling sites ranged from about 10 m to over 1600 km (Kuang et al., 2013). Briefly, water samples were taken using sterile serum bottles and immediately kept on ice for transport to the laboratory. For DNA extraction, 500 ml of each water sample was coarse-filtered through a 3 μm fiber filter and then filtered through a 0.22 μm polyethersulfone (PES) membrane filter. The cell pellets on the PES membranes were used for DNA extraction following the protocol described by Frias-Lopez et al. (2008), with an additional homogenization step for cell lysis using the FastPrep-24 homogenization system, and the filtrates were used for the chemical analyses.
Temperature, solution pH, dissolved oxygen (DO) and electrical conductivity (EC) were measured on site using specific electrodes. Ferric and ferrous iron were measured by a UV-visible colorimetric assay with 1,10-phenanthroline at 530 nm. Total organic carbon (TOC) was measured by high-temperature catalytic oxidation and infrared detection with a TOC analyzer, and sulfate was determined by a BaSO4-based turbidimetric method. Elemental analysis was performed by inductively coupled plasma optical emission spectrometry (ICP-OES) after the filtrates were digested at 180 °C with concentrated HNO3 and HCl (1:3, v/v).
Amplification and bar-coded pyrosequencing of bacterial and archaeal 16S rRNA genes
PCR amplification, purification, pooling and pyrosequencing of a region of the 16S rRNA gene were performed following the procedure described by Fierer et al. (2008). The primer set F515 (5’-GTGCCAGCMGCCGCGGTAA-3’, carrying an 8-bp error-correcting barcode (Hamady et al., 2008)) and R806 (5’-GGACTACVSGGGTATCTAAT-3’) was used to amplify the V4 hypervariable region. PCR was performed in triplicate for each sample, and the products were pooled and purified. Finally, all PCR products were combined in approximately equimolar amounts and sequenced on a 454 GS FLX Titanium pyrosequencer.
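Both primers contain IUPAC degenerate bases (M in F515; V and S in R806), so each primer is in practice a small mixture of oligonucleotides. The Python sketch below is illustrative only and not part of the original workflow: it expands a degenerate primer into its concrete sequences using the standard IUPAC single-base codes. The function name expand_primer is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes (single-base ambiguity symbols)
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_primer(primer):
    """Return every concrete sequence encoded by a degenerate primer (hypothetical helper)."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]

F515 = "GTGCCAGCMGCCGCGGTAA"
R806 = "GGACTACVSGGGTATCTAAT"

for name, seq in [("F515", F515), ("R806", R806)]:
    variants = expand_primer(seq)
    print(name, len(variants), variants)
```

For these primers, F515 encodes 2 variants (M = A or C) and R806 encodes 6 (V = A, C or G, times S = C or G).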
Processing of pyrosequencing data
Raw data generated from the 454 pyrosequencing run were processed and analyzed following the Mothur (Schloss et al., 2009) and QIIME (Caporaso et al., 2010) pipelines. Pyrosequences were denoised using the ‘shhh.flows’ (a translation of the PyroNoise algorithm; Quince et al., 2009) and ‘pre.cluster’ (Huse et al., 2010) commands in Mothur. Chimeric sequences were identified and removed using UCHIME in de novo mode (Edgar et al., 2011). Quality sequences were subsequently assigned to samples according to their unique 8-bp barcodes and binned into phylotypes at the 97% similarity level using the average-linkage clustering algorithm (Huse et al., 2010). Taxonomic classification of phylotypes was determined with the Ribosomal Database Project (RDP) classifier at an 80% confidence threshold (Wang et al., 2007). The relative abundance (%) of each taxon within a community was calculated as the number of sequences assigned to that taxon divided by the total number of sequences obtained for that sample.
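The relative-abundance calculation described above amounts to dividing each taxon's read count by the total read count of the sample. A minimal Python sketch, assuming each quality-filtered read has already been given a taxonomic label; the function name and the example reads are hypothetical:

```python
from collections import Counter

def relative_abundance(taxon_labels):
    """Relative abundance (%) of each taxon within one sample.

    taxon_labels: one taxonomic label per quality-filtered sequence in the sample.
    """
    counts = Counter(taxon_labels)
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}

# Hypothetical sample with 8 classified reads
reads = ["Gammaproteobacteria"] * 5 + ["Nitrospira"] * 2 + ["Euryarchaeota"]
abund = relative_abundance(reads)
print(abund)  # {'Gammaproteobacteria': 62.5, 'Nitrospira': 25.0, 'Euryarchaeota': 12.5}
```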
GeoChip analysis
The general pipeline of DNA labeling, GeoChip processing and data normalization has been described previously (He et al., 2007). Specifically, to obtain sufficient genomic DNA for hybridization, whole-community genome amplification (WCGA) (TempliPhi Amplification kit, Amersham Biosciences, Piscataway, NJ) was performed on approximately 1.0 ng of community DNA from each sample following the procedure of Wu et al. (2006). Careful handling of community DNA is necessary when applying microarray-based genomic technology, especially to samples with very low microbial biomass such as AMD. Although a previous report showed that WCGA could introduce significant biases in community composition (Bodelier et al., 2009), our earlier experimental study indicated that the amplification procedure used here is representative and quantitative (Wu et al., 2006). Thus, WCGA biases are unlikely to have significantly affected the functional structure observed in this study. Equal amounts of amplified DNA (1.0 μg) were then used for GeoChip 4.0 hybridization as previously described (Lu et al., 2012; Chan et al., 2013). Signal intensities were normalized by the average control dye signal across samples, and spots with a signal-to-noise ratio [SNR = (signal intensity - background)/standard deviation of background] greater than 2 were considered positive signals for further analysis (He et al., 2007).
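The positive-signal criterion follows directly from the SNR definition above. The sketch below assumes the background mean and standard deviation of an array are already known; the function name, the example intensities and the NumPy implementation are illustrative assumptions rather than the GeoChip processing software, and the dye-based normalization step is not shown.

```python
import numpy as np

def positive_spots(signal, background, background_sd, min_snr=2.0):
    """Flag spots whose SNR = (signal - background) / SD(background) exceeds min_snr."""
    signal = np.asarray(signal, dtype=float)
    snr = (signal - background) / background_sd
    return snr > min_snr

# Hypothetical spot intensities on one array (background mean 100, SD 20)
signal = np.array([90.0, 130.0, 141.0, 500.0])
mask = positive_spots(signal, background=100.0, background_sd=20.0)
print(mask)  # [False False  True  True]
```

Here the SNR values are -0.5, 1.5, 2.05 and 20, so only the last two spots pass the SNR > 2 cutoff.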
Prediction model of microbial assemblages and functional metabolic potentials
The modeling approach developed by Larsen et al. (2012) was applied to predict microbial assemblages and functional metabolic potentials. In this study, the dynamics of microbial community structure and the signal intensities of functional genes were modeled separately. Because our results indicated that the patterns of microbial community composition and functional gene structure were largely determined by environmental conditions, microbial assemblages and functional metabolic potentials were predicted from environmental properties. The biotic interactions between different microbial taxa, or between genes involved in the same functional subcategory, were also incorporated into the models. Additionally, because other analyses in this study indicated that relative microbial abundances may influence metabolic potentials, we constructed models of functional metabolic potentials both with and without these microbial interactions.
Environmental variables, relative abundances of dominant microbial lineages and/or the signal intensities of functional genes were merged into input matrices, and the relationships among the variables were estimated using Bayesian network inference with Java Objects (BANJO v2.2.0) (Smith et al., 2006; Larsen et al., 2012). The networks generated by Bayesian network inference are directed acyclic graphs (DAGs), in which nodes are environmental parameters, microbial taxa or functional genes. The directed edges of these DAGs represent relationships between nodes: an edge from a parent node to a child node indicates a significant conditional dependence between changes in their values, with the child modeled as depending on its parent(s) (Larsen et al., 2012). In this study, the maximum number of parents in BANJO was set to three, and the simulated annealing searcher with the AllLocalMoves proposer was used, starting from randomly configured networks. The 10 highest-scoring networks were subsequently used to generate the consensus network.
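The exact rule BANJO applies when merging the top-scoring networks is not restated here, so the following Python sketch assumes a simple majority rule (an edge is kept if it occurs in at least half of the top networks). The threshold min_fraction, the helper names and the example networks are hypothetical. The sketch also checks that the merged graph is still acyclic and lists each node's parents, which is the form needed for the equation search described in the next paragraph.

```python
from collections import Counter, defaultdict, deque

def consensus_edges(networks, min_fraction=0.5):
    """Keep directed (parent, child) edges found in >= min_fraction of the networks.

    min_fraction is a hypothetical majority threshold, not a documented BANJO setting.
    """
    counts = Counter(edge for net in networks for edge in set(net))
    needed = min_fraction * len(networks)
    return {edge for edge, n in counts.items() if n >= needed}

def is_acyclic(edges):
    """Kahn's algorithm: True if the directed graph contains no cycle."""
    children = defaultdict(list)
    indegree = Counter()
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))
    queue = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)

def parents_of(edges):
    """Map each child node to the sorted list of its parent nodes."""
    parents = defaultdict(list)
    for parent, child in edges:
        parents[child].append(parent)
    return {child: sorted(ps) for child, ps in parents.items()}

# Hypothetical top-scoring networks over environmental and taxon nodes
top_networks = [
    {("pH", "Gamma"), ("Fe3+", "Eury"), ("pH", "Nitro")},
    {("pH", "Gamma"), ("Fe3+", "Eury")},
    {("pH", "Gamma"), ("Cu", "Eury")},
]
edges = consensus_edges(top_networks)
print(sorted(edges))  # [('Fe3+', 'Eury'), ('pH', 'Gamma')]
print(is_acyclic(edges), parents_of(edges))
```

A union of acyclic graphs is not guaranteed to be acyclic, which is why the merged edge set is checked explicitly.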
The relationships revealed by the consensus network can be expressed as a set of formulas in which the value of every node is a function of the values of its parent nodes. These functions were derived using Eureqa v0.99.9 beta (Schmidt and Lipson, 2009). Constants, addition, subtraction, multiplication and division were permitted in the equations. In the formula search, data from 30 randomly selected samples were used for training, and the remaining 10 samples were used for validation (see below). The best-fitting equations were searched for 2 CPU hours; when a node had more than one parent, not every parent was necessarily incorporated into the equations that best fit the observed data. All candidate solutions were ranked by Pearson's correlation coefficient. The final equation selected for prediction was chosen using the following optimality criteria: the equation that best fits any obvious peak or drop in the observed data; the highest correlation with the observed data; more functional parameters; and the fewest terms (Larsen et al., 2012). After the final formula had been generated and selected using the 30 training samples, the data from the remaining 10 samples were used to validate the equation. Additionally, because only a few taxa are consistently of high relative abundance and many taxa are consistently of low relative abundance, deceptively high correlations between predicted and observed values can be obtained as long as the model correctly identifies the small number of highly abundant taxa (Larsen et al., 2015). Therefore, two null models were used to test whether the predictive model correlates better with the biological observations than these null models do: i) setting the predicted relative abundance/metabolic potential of all taxa equal to the average taxon abundance/metabolic potential across all samples; ii) setting all taxon abundances/metabolic potentials equal to the minimum observed values across all samples (Larsen et al., 2015).
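To make the null-model comparison concrete, the following Python sketch scores one taxon's predictions with Bray-Curtis similarity (the measure reported in Tables S7 and S8) and compares them with the two null models. It assumes the mean and minimum are taken over the supplied observed values, and it omits the permutation-based estimation mentioned in the table footnotes. The function names and numbers are hypothetical.

```python
import numpy as np

def bray_curtis_similarity(x, y):
    """Return 1 - Bray-Curtis dissimilarity for two non-negative vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    total = (x + y).sum()
    if total == 0:
        return 1.0  # both vectors all zero: treat as identical
    return 1.0 - np.abs(x - y).sum() / total

def compare_with_null_models(observed, predicted):
    """Similarity of predictions vs. the mean and minimum null models for one taxon."""
    observed = np.asarray(observed, dtype=float)
    null_mean = np.full_like(observed, observed.mean())  # null model i
    null_min = np.full_like(observed, observed.min())    # null model ii
    return {
        "predicted": bray_curtis_similarity(observed, predicted),
        "null_mean": bray_curtis_similarity(observed, null_mean),
        "null_min": bray_curtis_similarity(observed, null_min),
    }

# Hypothetical observed and predicted relative abundances (%) of one taxon
obs = [40.0, 10.0, 0.0, 25.0, 5.0]
pred = [35.0, 12.0, 1.0, 20.0, 6.0]
scores = compare_with_null_models(obs, pred)
print({k: round(v, 3) for k, v in scores.items()})
```

When a taxon is absent from at least one sample its minimum is zero, so the minimum null model scores a similarity of 0, in line with the near-zero values in the last column of Tables S7 and S8.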
References
Bodelier PLE, Kamst M, Meima-Franke M, Stralis-Pavese N, Bodrossy L. (2009). Whole-community genome amplification (WCGA) leads to compositional bias in methane-oxidizing communities as assessed by pmoA-based microarray analyses and QPCR. Environ Microbiol Rep 1: 434-441.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK et al. (2010). QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7: 335-336.
Chan Y, Van Nostrand JD, Zhou J, Pointing SB, Farrell RL. (2013). Functional ecology of an Antarctic Dry Valley. Proc Natl Acad Sci USA 110: 8990-8995.
Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R. (2011). UCHIME improves sensitivity and speed of chimera detection. Bioinformatics 27: 2194-2200.
Fierer N, Hamady M, Lauber CL, Knight R. (2008). The influence of sex, handedness, and washing on the diversity of hand surface bacteria. Proc Natl Acad Sci USA 105: 17994-17999.
Frias-Lopez J, Shi Y, Tyson GW, Coleman ML, Schuster SC, Chisholm SW et al. (2008). Microbial community gene expression in ocean surface waters. Proc Natl Acad Sci USA 105: 3805-3810.
Hamady M, Walker JJ, Harris JK, Gold NJ, Knight R. (2008). Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat Methods 5: 235-237.
He Z, Gentry TJ, Schadt CW, Wu L, Liebich J, Chong SC et al. (2007). GeoChip: a comprehensive microarray for investigating biogeochemical, ecological and environmental processes. ISME J 1: 67-77.
Huse SM, Welch DM, Morrison HG, Sogin ML. (2010). Ironing out the wrinkles in the rare biosphere through improved OTU clustering. Environ Microbiol 12: 1889-1898.
Kuang JL, Huang LN, Chen LX, Hua ZS, Li SJ, Hu M et al. (2013). Contemporary environmental variation determines microbial diversity patterns in acid mine drainage. ISME J 7: 1038-1050.
Larsen PE, Dai Y, Collart FR. (2015). Predicting bacterial community assemblages using an artificial neural network approach. Methods Mol Biol 1260: 33-43.
Larsen PE, Field D, Gilbert JA. (2012). Predicting bacterial community assemblages using an artificial neural network approach. Nat Methods 9: 621-625.
Lu Z, Deng Y, Van Nostrand JD, He Z, Voordeckers J, Zhou A et al. (2012). Microbial gene functions enriched in the Deepwater Horizon deep-sea oil plume. ISME J 6: 451-460.
Quince C, Lanzen A, Curtis TP, Davenport RJ, Hall N, Head IM et al. (2009). Noise and the accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods 6: 639-641.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol 75: 7537-7541.
Schmidt M, Lipson H. (2009). Distilling free-form natural laws from experimental data. Science 324: 81-85.
Smith VA, Yu J, Smulders TV, Hartemink AJ, Jarvis ED. (2006). Computational inference of neural information flow networks. PLoS Comput Biol 2: e161.
Wang Q, Garrity GM, Tiedje JM, Cole JR. (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol 73: 5261-5267.
Wu L, Liu X, Schadt CW, Zhou J. (2006). Microarray-based analysis of subnanogram quantities of microbial community DNAs by using whole-community genome amplification. Appl Environ Microbiol 72: 4931-4941.
Table S1. Functional genes selected for the statistical analyses in this study.
Category Subcategory Gene Abbreviations
Carbon cycling Carbon fixation
aclB aclB
CODH CODH
Pcc Pcc
RubisCo RubisCo
Nitrogen cycling
Nitrogen fixation nifH nifH
Ammonification (mineralization)
gdh gdh
ureC ureC
Nitrification amoA amoA
Denitrification
narG narG
nirK nirK
nirS nirS
norB norB
nosZ nosZ
Assimilatory N reduction
nasA nasA
NiR NiR
nirA nirA
nirB nirB
Dissimilatory N reduction napA napA
nrfA nrfA
Phosphorus Phosphorus utilization
phytase phytase
ppk ppk
ppx ppx
Sulfur cycling
Adenylylsulfate reductase aprA aprA
aprB aprB
Sulfite reductase dsrA dsrA
dsrB dsrB
Sulfur oxidation sox sox
Metal resistance
Ag
silA silA
silC silC
silP silP
Al al al
As
aoxB aoxB
arsA arsA
arsB arsB
arsC arsC
arsM arsM
Cd cadA cadA
cadBD cadBD
Cd_Co_Zn
czcA czcA
czcC czcC
czcD czcD
Co corC corC
Cr chrA chrA
Cu
copA copA
cueO cueO
cusA cusA
Hg
mer mer
merB merB
merP merP
Ni nreB nreB
Pb pbrA pbrA
pbrT pbrT
Te
tehB tehB
terC terC
terD terD
terZ terZ
Zn zitB zitB
zntA zntA
Energy process
Electron transport
Fe-S cluster binding protein fes
ferredoxin fer
ferredoxin oxidoreductase fero
NADH ubiquinone oxidoreductase NADH
terminal quinol oxidase quio
Cytochrome cytochrome cyt
Hydrogenase hydrogenase hyd
Ni-Fe hydrogenase Nfhyd
Table S1. Functional genes selected for the statistical analyses in this study (continued).
Category Subcategory Gene Abbreviations
Membrane transport EPS glycosyl transferase glyt
other category ABC transporter ABCt
Stress response
Cold cspA cspA
cspB cspB
Heat
dnaK dnaK
groEL groEL
groES groES
grpE grpE
hrcA hrcA
Glucose limitation bglH bglH
bglP bglP
Nitrogen limitation glnA glnA
glnR glnR
Oxygen limitation
arcA arcA
arcB arcB
cydA cydA
cydB cydB
narH narH
narI narI
narJ narJ
Oxygen stress
ahpC ahpC
ahpF ahpF
fnr fnr
katA katA
katE katE
oxyR oxyR
perR perR
Osmotic stress proV proV
proX proX
Phosphate limitation
phoA phoA
phoB phoB
pstA pstA
pstB pstB
pstC pstC
pstS pstS
Protein stress clpC clpC
ctsR ctsR
Radiation stress obgE obgE
Antibiotic resistance
Transporter
ABC antibiotic transporter ABCat
MatE antibiotics MatE
MFS antibiotics MFS
SMR antibiotics SMR
Mex Mex
Beta-lactamases
beta-lactamase lac
class A beta-lactamase lacA
class C beta-lactamase lacC
other category Tet Tet
Van Van
Table S2. Site locations and environmental conditions of acid mine drainage (AMD) samples.
Sample ID Location Mining area Latitude (N) Longitude (E) pH EC DO TOC SO₄²⁻ Fe3+ Fe2+ Al As Cd Cu Pb Zn
NS Maanshan, Anhui AHMAS 31.64 118.62 4.1 3224 2.2 6.0 1319 1 0 0 0.00 0.00 0.5 0.00 0
JGS1 Tongling, Anhui AHTL 30.90 117.83 2.0 20000 0.9 67.0 7530 29283 589 2531 136.51 6.03 1028.0 1.60 1834
JGS2 Tongling, Anhui AHTL 30.90 117.83 2.2 16259 1.1 19.0 7443 3570 6 1891 64.86 7.22 699.0 0.92 1469
XSC1 Tongling, Anhui AHTL 30.91 117.89 2.9 2908 1.4 2.2 712 42 2 9 0.00 0.04 2.9 0.83 47
XSC3 Tongling, Anhui AHTL 30.90 117.90 2.9 4342 2.1 6.8 2852 219 10 90 0.00 0.00 12.0 0.14 3
YSC1 Tongling, Anhui AHTL 30.90 117.90 2.3 5113 2.5 7.5 4579 721 35 174 14.90 0.00 19.0 0.12 41
YSC2 Tongling, Anhui AHTL 30.90 117.83 2.2 6794 0.4 13.0 5931 1664 25 157 62.11 0.43 52.0 0.22 97
ZJ1 Zijin, Fujian FJZJ 25.19 116.38 2.0 16770 4.6 12.0 6823 3183 32 1297 10.41 0.00 268.0 0.50 82
ZJ2 Zijin, Fujian FJZJ 25.20 116.38 2.9 970 3.1 2.9 842 7 0 54 0.00 0.00 36.0 0.21 7
ZJ3 Zijin, Fujian FJZJ 25.18 116.37 3.5 134 4.4 13.0 22 0 0 5 0.00 0.00 0.2 0.01 0
ZJ8 Zijin, Fujian FJZJ 25.18 116.38 3.4 1093 6.4 3.1 813 1 0 32 0.00 0.00 18.0 0.07 3
DBS1 Dabaoshan, Guangdong GDDBS 24.52 113.72 2.6 2850 5.2 2.5 3469 427 9 168 0.00 0.02 6.3 0.67 144
DBS3 Dabaoshan, Guangdong GDDBS 24.57 113.72 2.5 3610 5.0 2.7 4632 559 7 132 0.00 0.00 16.0 0.10 27
FK1 Fankou, Guangdong GDFK 25.05 113.66 1.9 5890 4.8 10.0 6173 2541 252 53 0.74 0.32 0.0 0.26 427
YF1 Yunfu, Guangdong GDYF 22.97 112.01 2.4 2290 4.0 6.3 2785 281 147 114 0.00 0.00 0.0 0.33 0
YF2 Yunfu, Guangdong GDYF 22.97 112.01 2.3 3450 15.0 5.4 4268 1019 346 117 0.00 0.00 0.0 0.50 0
YF3 Yunfu, Guangdong GDYF 22.97 112.01 2.1 9220 9.0 13.0 7085 7686 2561 1675 0.45 0.00 0.0 0.88 408
YF4 Yunfu, Guangdong GDYF 22.97 112.01 2.4 3930 2.0 3.6 5747 1490 331 266 0.00 0.00 0.0 0.78 62
YF5 Yunfu, Guangdong GDYF 22.97 112.01 2.5 3830 3.0 3.3 5611 1439 453 265 0.00 0.00 0.0 0.69 59
YF7 Yunfu, Guangdong GDYF 22.98 112.01 2.7 4910 11.0 5.0 6823 328 241 1878 0.40 0.00 0.1 0.55 256
YF8 Yunfu, Guangdong GDYF 22.99 112.01 2.6 3910 3.5 2.4 5863 251 8 0 0.09 0.00 0.1 0.46 0
Table S2. Site locations and environmental conditions of acid mine drainage (AMD) samples (continued).
Sample ID Location Mining area Latitude (N) Longitude (E) pH EC DO TOC SO₄²⁻ Fe3+ Fe2+ Al As Cd Cu Pb Zn
DC1 Dachang, Guangxi GXDC 24.86 107.58 2.7 3820 1.3 1.9 4031 890 145 0 0.00 0.00 0.0 0.12 38
DC2 Dachang, Guangxi GXDC 24.86 107.58 3.0 2350 0.9 1.6 1845 63 2 0 0.00 0.00 0.0 0.16 25
DC3 Dachang, Guangxi GXDC 24.86 107.58 2.8 2980 0.9 3.4 3144 973 202 16 0.13 0.00 0.1 0.10 46
DC5 Dachang, Guangxi GXDC 24.82 107.58 3.1 890 0.8 1.5 531 22 0 0 0.00 0.00 0.0 0.02 0
DC7 Dachang, Guangxi GXDC 24.85 107.57 2.5 5080 0.3 16.0 6028 2547 509 48 29.46 0.81 3.1 0.78 263
DC8 Dachang, Guangxi GXDC 24.85 107.57 2.5 4930 0.4 13.0 6018 2775 417 50 38.34 2.67 4.6 0.37 589
PD1 Puding, Guizhou GZPD 26.58 105.72 3.0 3300 2.2 ND 3062 265 5 81 0.00 0.00 0.8 0.06 127
PD3 Puding, Guizhou GZPD 26.48 105.89 2.5 3510 1.9 14.0 3600 499 4 111 0.00 0.00 1.2 0.06 16
PD4 Puding, Guizhou GZPD 26.48 105.89 3.0 4300 1.0 2.8 5155 708 327 232 0.00 0.00 0.1 0.12 0
PD7 Puding, Guizhou GZPD 26.47 105.87 2.9 2880 1.8 2.2 2873 184 6 62 0.00 0.00 0.9 0.04 0
SL Shilu, Hainan HNSL 19.24 109.04 2.8 3155 1.2 3.3 699 150 9 8 0.00 0.00 6.7 0.11 0
DX1 Dexing, Jiangxi JXDX 29.01 117.73 2.0 3690 1.3 14.0 2766 506 5 124 0.00 0.00 19.0 0.05 0
DX2 Dexing, Jiangxi JXDX 29.01 117.73 1.9 10330 0.8 48.0 6997 2451 271 1601 0.48 0.00 33.0 0.41 0
DX3 Dexing, Jiangxi JXDX 29.01 117.73 1.9 10200 1.2 55.0 6687 2573 240 1572 0.34 0.00 33.0 0.42 0
YP1 Yongping, Jiangxi JXYP 28.21 117.77 2.7 4430 1.3 15.0 4685 91 19 321 0.00 0.00 25.0 0.17 16
YP2 Yongping, Jiangxi JXYP 28.20 117.76 2.4 4390 1.4 14.0 4331 262 4 80 0.00 0.45 54.0 0.10 50
YP3 Yongping, Jiangxi JXYP 28.20 117.76 2.1 5740 1.1 6.2 5412 915 12 154 0.00 1.85 90.0 0.16 89
YP4 Yongping, Jiangxi JXYP 28.20 117.76 2.7 2740 1.3 3.1 2611 47 1 70 0.00 2.00 85.0 2.40 199
YP5 Yongping, Jiangxi JXYP 28.20 117.76 2.6 4510 1.4 16.0 4021 205 6 58 0.00 2.27 44.0 0.13 123
All values are in mg L⁻¹, except pH (standard units), latitude and longitude (decimal degrees) and EC (μS cm⁻¹).
EC: electrical conductivity. DO: dissolved oxygen. TOC: total organic carbon.
ND, not determined.
Table S3. Relative abundance (%) of dominant lineages across acid mine drainage (AMD) microbial communities.
Sample ID Euryarchaeota Alphaproteobacteria Betaproteobacteria Gammaproteobacteria Nitrospira Firmicutes Actinobacteria Acidobacteria Others Unclassified
NS 0.48 12.93 77.76 2.52 2.38 0.48 0.34 0.00 3.06 0.07
JGS1 0.60 0.65 0.05 41.69 47.56 6.77 0.15 0.05 2.49 0.00
JGS2 0.82 9.45 1.45 82.17 3.72 0.25 0.19 0.06 1.89 0.00
XSC1 0.01 1.50 95.96 2.26 0.21 0.00 0.02 0.00 0.04 0.00
XSC3 0.13 0.67 87.16 10.95 0.45 0.24 0.07 0.02 0.31 0.00
YSC1 10.15 43.18 3.88 7.80 27.35 0.69 0.03 0.75 4.54 1.63
YSC2 17.55 13.80 0.07 19.18 39.63 5.24 0.07 0.28 2.48 1.70
ZJ1 3.82 2.32 0.00 20.11 10.62 4.64 0.31 0.00 57.97 0.21
ZJ2 0.16 10.40 21.33 18.39 5.28 6.13 6.59 8.92 22.73 0.08
ZJ3 0.00 14.78 33.29 11.55 0.37 8.32 2.36 3.98 25.34 0.00
ZJ8 0.00 6.09 24.54 23.62 2.03 10.70 1.85 1.11 30.07 0.00
DBS1 0.21 3.22 17.37 70.73 1.32 0.13 0.08 0.81 6.10 0.02
DBS3 0.16 18.66 74.29 4.38 0.90 0.09 0.03 1.15 0.31 0.03
FK1 32.33 0.42 0.07 18.12 27.99 0.42 0.42 0.42 10.15 9.66
YF1 2.36 0.26 83.28 5.18 2.56 0.52 0.26 0.07 5.44 0.07
YF2 4.57 0.00 60.62 8.15 12.64 1.52 0.08 0.46 11.88 0.08
YF3 11.10 0.00 0.40 41.27 18.37 1.61 0.10 0.00 27.04 0.10
YF4 1.23 2.03 54.34 39.51 0.86 0.31 0.00 0.06 1.60 0.06
YF5 0.97 2.59 57.20 37.30 0.57 0.24 0.00 0.00 1.05 0.08
YF7 8.21 1.79 67.68 8.41 4.17 3.11 0.46 0.66 5.17 0.33
YF8 0.06 26.67 67.36 1.35 3.43 0.06 0.00 0.96 0.11 0.00
Table S3. Relative abundance (%) of dominant lineages across acid mine drainage (AMD) microbial communities (continued).
Sample ID Euryarchaeota Alphaproteobacteria Betaproteobacteria Gammaproteobacteria Nitrospira Firmicutes Actinobacteria Acidobacteria Others Unclassified
DC1 3.76 10.94 35.21 12.31 2.22 2.05 1.71 1.20 30.26 0.34
DC2 5.40 2.61 42.51 2.96 0.17 0.35 0.35 0.70 43.73 1.22
DC3 0.09 15.82 76.14 1.07 0.09 0.00 0.00 0.09 6.70 0.00
DC5 0.41 0.34 94.87 1.98 0.07 0.68 0.00 0.07 1.44 0.14
DC7 2.63 2.21 60.28 4.80 4.32 19.34 0.28 0.66 5.42 0.07
DC8 1.49 5.60 76.87 7.20 0.96 0.36 0.16 0.13 7.01 0.23
PD1 17.72 1.15 30.13 28.84 6.17 1.58 0.50 0.93 12.12 0.86
PD3 0.59 1.51 86.69 2.43 1.44 3.21 0.33 1.77 1.97 0.07
PD4 16.65 0.35 62.60 2.01 2.27 1.31 0.35 1.05 13.34 0.09
PD7 24.59 1.66 20.99 12.15 20.03 6.35 0.28 1.66 11.46 0.83
SL 3.06 0.72 4.69 46.84 28.40 0.98 1.63 0.07 13.55 0.07
DX1 4.00 7.21 35.19 10.28 13.28 2.43 2.93 7.49 16.35 0.86
DX2 21.78 0.34 1.24 3.84 11.51 17.72 19.19 0.34 20.43 3.61
DX3 11.68 3.09 0.20 23.65 22.65 16.27 7.09 0.30 14.37 0.70
YP1 0.47 6.72 54.10 25.69 1.28 1.34 2.82 4.34 3.19 0.03
YP2 6.61 2.15 17.12 39.19 27.97 1.78 0.71 0.45 3.57 0.45
YP3 1.76 4.11 19.31 3.44 68.50 2.35 0.08 0.04 0.38 0.04
YP4 4.09 4.04 11.96 18.86 46.92 6.70 0.79 2.51 3.79 0.34
YP5 8.08 1.66 21.62 21.78 29.91 2.78 1.44 1.07 10.43 1.23
All phylotypes were classified at the phylum level (subphylum for the Proteobacteria).
Others include 12 phyla: Bacteroidetes, Chlamydiae, Chloroflexi, Crenarchaeota, Cyanobacteria, Deinococcus-Thermus, Gemmatimonadetes, OD1, OP11,
Planctomycetes, TM7, Verrucomicrobia; and two subphyla for Proteobacteria: Deltaproteobacteria and Epsilonproteobacteria.
Table S4. Summary of statistics (R²) from the dissimilarity test (Adonis) between pairs of mining areas on the functional community structure.
                 AHTL        JXDX        JXYP        FJZJ        GDYF        GXDC        GZPD
Longitude (E)    118         117         117         116         112         108         106
Latitude (N)     31          29          28          25          23          25          26
Distance (km)ᵃ   0.01-6.85   0.04-0.60   0.01-1.83   1.09-2.30   0.02-2.38   0.01-4.46   0.45-21.08
No. of samples   6           3           5           4           7           6           4
AHTL             -           0.155       0.079       0.119       0.186*      0.089       0.080
JXDX             -           -           0.195       0.281       0.222       0.105       0.281*
JXYP             -           -           -           0.117       0.159       0.095       0.114
FJZJ             -           -           -           -           0.206       0.187       0.119
GDYF             -           -           -           -           -           0.233*      0.242*
GXDC             -           -           -           -           -           -           0.108
Samples in a mining area were clustered into a group and compared with each other group based on Bray-Curtis dissimilarity of the log-transformed signal intensities of the GeoChip data using Adonis (*, P < 0.05). Mining areas with fewer than 3 samples were excluded from this analysis (i.e., a total of 5 samples from 4 mining areas were excluded).
a The range of distance (km) between two samples within the mining area.
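Adonis is the PERMANOVA test implemented in the R package vegan. A rough Python/NumPy sketch of the same statistic for one pairwise comparison is shown below. It is not the authors' code: the log1p transform (log(x + 1)) is an assumption about how the signals were "log-transformed", and the function names, example signal matrix and permutation count are hypothetical. The R² in Table S4 corresponds to the between-group share of the total sum of squares.

```python
import numpy as np

def bray_curtis_matrix(X):
    """Pairwise Bray-Curtis dissimilarities between the rows (samples) of X."""
    X = np.asarray(X, dtype=float)
    diff = np.abs(X[:, None, :] - X[None, :, :]).sum(axis=2)
    total = (X[:, None, :] + X[None, :, :]).sum(axis=2)
    return diff / total

def permanova(D, groups, n_perm=999, seed=0):
    """One-way PERMANOVA on a distance matrix; returns (R2, pseudo-F, permutation p)."""
    D = np.asarray(D, dtype=float)
    groups = np.asarray(groups)
    n = len(groups)
    labels = np.unique(groups)
    a = len(labels)
    sq = D ** 2
    ss_total = sq[np.triu_indices(n, 1)].sum() / n

    def r2_and_f(g):
        ss_within = 0.0
        for lab in labels:
            idx = np.flatnonzero(g == lab)
            block = sq[np.ix_(idx, idx)]
            ss_within += block[np.triu_indices(len(idx), 1)].sum() / len(idx)
        ss_between = ss_total - ss_within
        f = (ss_between / (a - 1)) / (ss_within / (n - a))
        return ss_between / ss_total, f

    r2, f_obs = r2_and_f(groups)
    rng = np.random.default_rng(seed)
    # Count label permutations whose pseudo-F reaches the observed value (small tolerance for rounding)
    hits = sum(r2_and_f(rng.permutation(groups))[1] >= f_obs - 1e-12 for _ in range(n_perm))
    return r2, f_obs, (hits + 1) / (n_perm + 1)

# Hypothetical GeoChip signal intensities: 6 samples (rows) x 4 genes (columns)
signals = np.array([
    [1200, 300, 0, 800], [1100, 350, 50, 750], [1300, 280, 10, 820],
    [400, 900, 600, 100], [450, 950, 580, 120], [380, 880, 620, 90],
], dtype=float)
D = bray_curtis_matrix(np.log1p(signals))  # log(x + 1) is an assumed transform
r2, f, p = permanova(D, ["AHTL"] * 3 + ["GDYF"] * 3)
print(round(r2, 3), round(f, 2), p)
```

With only 3 samples per group there are few distinct label permutations, so the smallest attainable P value is limited; this is one reason areas with fewer than 3 samples were excluded.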
Table S5. Environmental and taxonomic variable loadings on the PCs across the AMD samples.
Environmental properties (Abbr.)  PCEnv1 (E1)   PCEnv2 (E2)
exp.*                             0.522         0.203
pH                                0.636         -0.001
Dissolved Oxygen (DO)             0.103         0.148
Total Organic Carbon (TOC)        -0.397        -0.175
Electrical Conductivity (EC)      -0.503        -0.040
Sulfate (SO₄²⁻)                   -0.592        0.118
Ferric ion (Fe3+)                 -1.519        0.387
Ferrous ion (Fe2+)                -1.174        0.983
Aluminum (Al)                     -1.047        -0.152
Copper (Cu)                       -0.669        -1.246
Zinc (Zn)                         -1.032        -0.342
Arsenic (As)                      -0.615        -0.348
Cadmium (Cd)                      -0.194        -0.205
Lead (Pb)                         -0.376        -0.278
Phosphorus (P)                    -0.100        -0.024

Microbial taxa (Abbr.)            PCTaxa1 (T1)  PCTaxa2 (T2)  PCTaxa3 (T3)  PCTaxa4 (T4)
exp.*                             0.280         0.169         0.141         0.118
Euryarchaeota (Eury)              -1.050        0.251         -0.691        0.062
Acidobacteria (Acido)             -0.088        -0.769        0.368         -0.647
Actinobacteria (Actino)           -0.842        -0.812        0.019         0.081
Firmicutes (Firm)                 -0.808        -0.690        0.180         0.184
Nitrospira (Nitro)                -0.712        0.711         0.133         -0.420
Alphaproteobacteria (Alpha)       0.262         -0.074        -0.083        -1.214
Betaproteobacteria (Beta)         1.151         -0.370        -0.620        0.298
Gammaproteobacteria (Gamma)       -0.222        0.708         0.980         0.230
Variables in bold show the dominant influence (top-50%) on each PC.
* Proportion explained.
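The proportions explained (exp.) in Table S5 are eigenvalue shares from a principal component analysis. A minimal NumPy sketch on standardized variables is given below. The loadings in Table S5 evidently use a different scaling (several exceed 1 in magnitude, which unit-length eigenvectors cannot), so this sketch illustrates only the proportion-explained calculation and will not reproduce those loadings. The function name and random data are hypothetical.

```python
import numpy as np

def pca_proportion_explained(X):
    """PCA of standardized columns via SVD.

    Returns the proportion of variance explained by each PC and the
    unit-length eigenvectors (one column per PC).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score each variable
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    eigenvalues = s ** 2 / (len(Z) - 1)
    return eigenvalues / eigenvalues.sum(), vt.T

# Hypothetical matrix: 40 samples x 5 environmental variables
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
X[:, 1] += 2 * X[:, 0]  # correlate two variables so PC1 dominates
prop, vectors = pca_proportion_explained(X)
print(np.round(prop, 3))
```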
Table S6. Multiple linear regression (MLR) of environmental variables and relative abundance of dominant microbial lineages on metabolic potential of functional genes.
Category Subcategory Gene PCsᵃ AIC Best modelᵇ Environmental properties (pH, Fe3+, Fe2+, Al, Cu, Zn) Microbial taxa (Eury, Acido, Nitro, Alpha, Beta, Gamma)
Nitrogen cycling Denitrification narG E1 112.51 pH + Al + Cu -0.470 -0.452
Nitrogen cycling Denitrification nirK E1 112.06 Cu 0.386
Nitrogen cycling Assimilatory N reduction nirB E1 112.46 Fe2+ + Fe3+ 0.666 -0.571
Sulfur cycling Sulfite reductase dsrB E1 107.26 pH + Cu -0.589
Energy process Electron transport Fe-S cluster binding protein E1 114.04 pH -0.325
Energy process Electron transport ferredoxin E1 114.04 Fe2+ + pH -0.365 0.447
Energy process Electron transport NADH ubiquinone oxidoreductase E1 113.82 pH -0.331
Energy process Electron transport terminal quinol oxidase E1 114.34 pH -0.314
Energy process Hydrogenase hydrogenase E1 113.57 pH -0.340
Metal resistance As arsB E1 113.17 pH -0.353
Metal resistance As arsM E1 113.80 pH -0.333
Metal resistance Cd cadA E1 112.78 pH -0.365
Metal resistance Cr chrA E1 112.66 pH -0.369
Metal resistance Cu copA E1 114.03 pH + Al -0.529
Metal resistance Te terC E1 114.33 Cu 0.315
Metal resistance Te terD E1 110.58 Zn + Fe2+ + pH + Al -0.553 -0.368 0.326
Stress response Heat dnaK E1 113.66 Cu 0.338
Stress response Nitrogen limitation glnA E1 113.37 pH + Fe2+ + Zn -0.398
Stress response Oxygen stress fnr E1 107.60 pH -0.488
Stress response Oxygen stress oxyR E1 114.20 pH -0.320
Stress response Protein stress clpC E1 112.74 Al + pH 0.415 0.577
Antibiotic resistance Transporter MatE antibiotics E1 115.38 pH + Zn -0.352
Antibiotic resistance Transporter SMR antibiotics E1 113.91 pH -0.329
Nitrogen cycling Denitrification norB T1 112.60 Eury + Firm 0.316
Nitrogen cycling Dissimilatory N reduction nrfA T1 113.90 Eury 0.330
Stress response Heat hrcA T1 114.24 Eury -0.318
Stress response Nitrogen limitation glnR T1 111.51 Eury 0.401
Stress response Phosphate limitation pstC T1 102.82 Eury 0.570
Antibiotic resistance Transporter ABC antibiotic transporter T1 113.35 Beta -0.348
Nitrogen cycling Assimilatory N reduction nirA T3 113.81 Gamma -0.333
Phosphorus Phosphorus utilization ppk T3 112.60 Gamma + Eury -0.303
Sulfur cycling Sulfur oxidation sox T3 113.18 Gamma 0.353
Metal resistance Ag silP T3 109.03 Eury + Beta 0.552 0.326
Metal resistance Cd cadBD T3 111.11 Gamma + Beta 0.366 0.530
Metal resistance Ni nreB T3 111.07 Gamma -0.412
Stress response Heat groES T3 106.08 Gamma + Acido + Eury -0.309 -0.405
Stress response Glucose limitation bglH T3 113.75 Gamma + Beta 0.464
Antibiotic resistance Transporter Mex T3 112.56 Gamma + Eury 0.318
Nitrogen cycling Ammonification gdh T4 112.59 Nitro + Acido 0.297 0.333
Energy process Electron transport ferredoxin oxidoreductase T4 114.35 Nitro 0.314
Metal resistance Hg mer T4 113.13 Alpha 0.355
Stress response Oxygen limitation cydB T4 114.32 Alpha 0.315
Antibiotic resistance other category Van T4 111.90 Beta + Alpha -0.304 0.307
a The PCs most important to the metabolic potential of each functional gene were identified with the ABT model, and the variables with dominant influence based on the PC loadings were selected as inputs to the multiple linear regression (MLR) models.
b The best model was selected on the basis of the lowest AIC value.
Only significant estimates (P < 0.05) from the best stepwise model are reported, to show the environmental properties and dominant taxa most important to the metabolic potential of each functional gene.
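Footnotes a and b state that the best MLR model was chosen by AIC using a stepwise method. The sketch below illustrates only that selection idea; it assumes forward-only selection, ordinary least squares with an intercept, and the Gaussian form AIC = n*ln(RSS/n) + 2k with the constant term dropped. The function names, variable names and synthetic data are hypothetical, and the original analysis may have searched in both directions or used different software.

```python
import numpy as np


def ols_aic(X, y, cols):
    """Fit OLS of y on an intercept plus X[:, cols]; return n*ln(RSS/n) + 2k."""
    n = len(y)
    design = np.ones((n, 1 + len(cols)))  # first column is the intercept
    if cols:
        design[:, 1:] = X[:, cols]
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    k = design.shape[1]  # number of fitted coefficients
    return n * np.log(rss / n) + 2 * k


def forward_stepwise_aic(X, y, names):
    """Greedily add the predictor that lowers AIC most; stop when none does."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, remaining = [], list(range(X.shape[1]))
    best_aic = ols_aic(X, y, selected)  # intercept-only model
    while remaining:
        aic, j = min((ols_aic(X, y, selected + [c]), c) for c in remaining)
        if aic >= best_aic:
            break  # no remaining predictor improves AIC
        best_aic = aic
        selected.append(j)
        remaining.remove(j)
    return [names[j] for j in selected], best_aic


# Hypothetical example: a gene signal driven by pH and Zn, with Cu as an unrelated input.
rng = np.random.default_rng(1)
n = 40
pH, Zn, Cu = rng.uniform(2, 4, n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)
y = 1.5 * pH - 0.8 * Zn + rng.normal(0, 0.1, n)
chosen, aic = forward_stepwise_aic(np.column_stack([pH, Zn, Cu]), y, ["pH", "Zn", "Cu"])
print(chosen, round(aic, 2))
```

In this simulation pH and Zn carry the signal, so both are selected; Cu is usually, though not always, left out, because a penalty of 2 per coefficient can occasionally admit a pure-noise predictor.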
Table S7. Validation of predictive models for relative abundances of dominant microbial taxa (Phylum level, mean relative abundance > 1%) based on the artificial neural network (ANN).
Columns: Taxa, Occurrence^b, and Bray-Curtis similarity^a for Predicted vs Observed, Null model (Mean)^c and Null model (Minimum)^d.
Euryarchaeota 95.0% 0.685 0.489 0.001
Acidobacteria 87.5% 0.709 0.485 0.002
Actinobacteria 87.5% 0.694 0.369 0.001
Firmicutes 95.0% 0.725 0.497 0.001
Nitrospira 100.0% 0.757 0.489 0.011
Alphaproteobacteria 95.0% 0.657 0.518 0.001
Betaproteobacteria 97.5% 0.783 0.640 0.001
Gammaproteobacteria 100.0% 0.801 0.620 0.109
a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed and predicted relative microbial abundances.
b Occurrence is the percentage of all samples in which the given taxon was detected.
c This null model sets the predicted relative abundance of a taxon in every sample equal to its mean abundance across all samples.
d This null model sets the predicted relative abundance of a taxon in every sample equal to its minimum observed abundance.
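Footnotes a, c and d define how predictions are scored against two baselines. A minimal sketch of one literal reading follows: Bray-Curtis similarity is taken as 1 minus the Bray-Curtis dissimilarity, sum|o - p| / sum(o + p), computed over the samples for one taxon, and the two null models replace every prediction with that taxon's mean or minimum observed abundance. The function names and example values are hypothetical, and the permutation step mentioned in footnote a is not reproduced here.

```python
import numpy as np


def bray_curtis_similarity(observed, predicted):
    """Return 1 - sum|o - p| / sum(o + p) for two non-negative vectors."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    total = np.sum(o + p)
    if total == 0:
        return 1.0  # two all-zero profiles are treated as identical
    return float(1.0 - np.sum(np.abs(o - p)) / total)


def null_model_similarities(observed):
    """Similarity of the observed profile to the mean and minimum null models."""
    o = np.asarray(observed, dtype=float)
    mean_null = np.full_like(o, o.mean())  # footnote c: every prediction = mean
    min_null = np.full_like(o, o.min())    # footnote d: every prediction = minimum
    return bray_curtis_similarity(o, mean_null), bray_curtis_similarity(o, min_null)


observed = [2.0, 4.0, 6.0]   # hypothetical relative abundances (%) in three samples
predicted = [2.5, 3.5, 6.0]  # hypothetical ANN predictions
print(round(bray_curtis_similarity(observed, predicted), 3))  # 0.958
print([round(s, 3) for s in null_model_similarities(observed)])  # [0.833, 0.667]
```

Under this reading, a taxon that is absent from at least one sample has a minimum of zero, so its minimum-null similarity is exactly 0.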
Table S8. Validation of predictive models for relative abundances of dominant microbial taxa (Order level, mean relative abundance > 0.1%) based on the artificial neural network (ANN).
Columns: Taxa, Occurrence^b, and Bray-Curtis similarity^a for Predicted vs Observed, Null model (Mean)^c and Null model (Minimum)^d.
Nitrospirales 100.0% 0.735 0.475 0.001
Acidithiobacillales 97.5% 0.606 0.464 0.002
Thermoplasmatales 95.0% 0.493 0.123 0.037
Rhodospirillales 95.0% 0.456 0.186 0.022
Ferrovales 90.0% 0.717 0.413 0.007
Acidobacteria_Gp1 85.0% 0.709 0.313 0.010
Bacillales 85.0% 0.536 0.101 0.010
Burkholderiales 85.0% 0.813 0.230 0.011
Clostridiales 82.5% 0.782 0.350 0.011
Xanthomonadales 82.5% 0.630 0.226 0.012
Legionellales 65.0% 0.546 0.398 0.002
Rhizobiales 60.0% 0.797 0.395 0.002
Acidimicrobiales 52.5% 0.737 0.218 0.029
Pseudomonadales 52.5% 0.376 0.489 0.011
Chlamydiales 47.5% 0.739 0.199 0.036
Actinomycetales 45.0% 0.663 0.136 0.032
Sphingomonadales 42.5% 0.828 0.275 0.005
Rhodocyclales 42.5% 0.746 0.081 0.014
Chloroplast 32.5% 0.513 0.436 0.001
Hydrogenophilales 30.0% 0.906 0.340 0.015
Desulfuromonadales 30.0% 0.735 0.343 0.001
Sphingobacteriales 27.5% 0.868 0.148 0.005
Gemmatimonadales 22.5% 0.485 0.113 0.019
Planctomycetales 22.5% 0.701 0.614 0.001
Caulobacterales 22.5% 0.159 0.262 0.007
Desulfobacterales 22.5% 0.488 0.169 0.008
Holophagales 20.0% 0.806 0.282 0.023
Campylobacterales 17.5% 0.974 0.095 0.007
Opitutales 17.5% 0.671 0.492 0.001
Bacteroidales 15.0% 0.800 0.072 0.050
Enterobacteriales 15.0% 0.870 0.118 0.018
Acidobacteria_Gp16 12.5% 0.857 0.408 0.005
Neisseriales 12.5% 0.452 0.341 0.009
Rhodobacterales 7.5% 0.905 0.351 0.002
Aeromonadales 5.0% 0.975 0.174 0.039
a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed and predicted relative microbial abundances.
b Occurrence is the percentage of all samples in which the given taxon was detected.
c This null model sets the predicted relative abundance of a taxon in every sample equal to its mean abundance across all samples.
d This null model sets the predicted relative abundance of a taxon in every sample equal to its minimum observed abundance.
Table S9. Validation of predictive models for relative abundances of key microbial taxa (OTU level, observed in at least half of the total samples) based on the artificial neural network (ANN).
Columns: Taxa, Occurrence^b, and Bray-Curtis similarity^a for Predicted vs Observed, Null model (Mean)^c and Null model (Minimum)^d.
OTU2197 97.5% 0.688 0.493 0.001
OTU1 90.0% 0.569 0.436 0.001
OTU3 90.0% 0.526 0.436 0.001
OTU2196 80.0% 0.694 0.522 0.001
OTU5 77.5% 0.529 0.439 0.001
OTU0 75.0% 0.585 0.338 0.001
OTU10 57.5% 0.208 0.166 0.001
OTU12 57.5% 0.794 0.347 0.002
OTU4 55.0% 0.805 0.294 0.001
OTU2 52.5% 0.463 0.281 0.001
OTU11 52.5% 0.702 0.343 0.003
OTU17 52.5% 0.551 0.368 0.001
OTU21 52.5% 0.564 0.471 0.009
OTU26 52.5% 0.595 0.410 0.004
a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed and predicted relative microbial abundances.
b Occurrence is the percentage of all samples in which the given taxon was detected.
c This null model sets the predicted relative abundance of a taxon in every sample equal to its mean abundance across all samples.
d This null model sets the predicted relative abundance of a taxon in every sample equal to its minimum observed abundance.
Table S10a. Validation of predictive models for metabolic potentials (original signals) of key functional genes based on the artificial neural network (ANN).
Columns: Genes, Abbr., then Bray-Curtis similarity^a (original signals) for four model sets in the order ENV^b with occurrence^d > 50%, ENV^b with occurrence 100%, TAXA^c with occurrence > 50%, TAXA^c with occurrence 100%. Each set lists Predicted vs Observed, Null model (Mean)^e and Null model (Minimum)^f.
aclB aclB 0.940 0.299 0.141 0.930 0.345 0.076 0.950 0.299 0.141 0.930 0.345 0.076
CODH CODH 0.969 0.317 0.263 0.954 0.429 0.237 0.975 0.317 0.263 0.954 0.429 0.237
Pcc Pcc 0.974 0.294 0.193 0.953 0.357 0.119 0.970 0.294 0.193 0.953 0.357 0.119
RubisCo RubisCo 0.961 0.290 0.215 0.959 0.353 0.195 0.965 0.290 0.215 0.959 0.353 0.195
nifH nifH 0.931 0.297 0.239 0.949 0.347 0.206 0.965 0.297 0.239 0.958 0.347 0.206
gdh gdh 0.916 0.262 0.172 0.905 0.348 0.178 0.941 0.262 0.172 0.905 0.348 0.178
ureC ureC 0.924 0.275 0.222 0.949 0.394 0.157 0.956 0.275 0.222 0.949 0.394 0.157
amoA amoA 0.962 0.326 0.196 0.954 0.372 0.187 0.976 0.326 0.196 0.958 0.372 0.187
narG narG 0.971 0.312 0.274 0.961 0.362 0.210 0.982 0.312 0.274 0.961 0.362 0.210
nirK nirK 0.965 0.289 0.235 0.950 0.339 0.202 0.971 0.289 0.235 0.950 0.339 0.202
nirS nirS 0.946 0.285 0.181 0.941 0.335 0.107 0.940 0.285 0.181 0.941 0.335 0.107
norB norB 0.929 0.249 0.139 0.920 0.299 0.065 0.943 0.249 0.139 0.917 0.299 0.065
nosZ nosZ 0.930 0.282 0.151 0.948 0.332 0.077 0.934 0.282 0.151 0.948 0.332 0.077
nasA nasA 0.930 0.243 0.143 0.915 0.293 0.110 0.913 0.243 0.143 0.925 0.293 0.110
NiR NiR 0.918 0.283 0.198 0.954 0.333 0.165 0.946 0.283 0.198 0.956 0.333 0.165
nirA nirA 0.900 0.235 0.033 0.914 0.285 0.000 0.914 0.235 0.033 0.914 0.285 0.000
nirB nirB 0.866 0.200 0.016 0.857 0.250 0.017 0.876 0.200 0.016 0.857 0.250 0.017
napA napA 0.950 0.293 0.204 0.950 0.342 0.140 0.945 0.293 0.204 0.950 0.342 0.140
nrfA nrfA 0.929 0.257 0.159 0.919 0.320 0.085 0.931 0.257 0.159 0.919 0.320 0.085
phytase phytase 0.797 0.255 0.133 0.918 0.318 0.131 0.868 0.255 0.133 0.950 0.318 0.131
ppk ppk 0.912 0.267 0.151 0.930 0.329 0.150 0.974 0.267 0.151 0.930 0.329 0.150
ppx ppx 0.961 0.287 0.277 0.954 0.350 0.275 0.965 0.287 0.277 0.954 0.350 0.275
aprA aprA 0.950 0.318 0.175 0.944 0.365 0.166 0.964 0.318 0.175 0.958 0.365 0.166
aprB aprB 0.914 0.283 0.142 0.912 0.329 0.133 0.916 0.283 0.142 0.912 0.329 0.133
dsrA dsrA 0.966 0.321 0.270 0.964 0.408 0.191 0.970 0.321 0.270 0.964 0.408 0.191
dsrB dsrB 0.953 0.308 0.205 0.955 0.395 0.125 0.954 0.308 0.205 0.955 0.395 0.125
sox sox 0.944 0.270 0.177 0.936 0.390 0.157 0.949 0.270 0.177 0.936 0.390 0.157
Fe-S cluster binding protein fes 0.975 0.209 0.127 0.870 0.296 0.207 0.980 0.209 0.127 0.870 0.296 0.207
ferredoxin fer 0.847 0.173 0.014 0.902 0.260 0.094 0.907 0.173 0.014 0.902 0.260 0.094
ferredoxin oxidoreductase fero 0.908 0.188 0.029 0.872 0.275 0.109 0.925 0.188 0.029 0.872 0.275 0.109
NADH ubiquinone oxidoreductase NADH 0.973 0.173 0.055 0.889 0.223 0.119 0.989 0.173 0.055 0.889 0.223 0.119
terminal quinol oxidase quio 0.980 0.257 0.020 0.943 0.377 0.085 0.987 0.257 0.020 0.943 0.377 0.085
cytochrome cyt 0.902 0.311 0.193 0.946 0.423 0.173 0.912 0.311 0.193 0.946 0.423 0.173
hydrogenase hyd 0.918 0.268 0.025 0.919 0.355 0.032 0.931 0.268 0.025 0.917 0.355 0.032
Ni-Fe hydrogenase NFhyd 0.884 0.213 0.124 0.863 0.263 0.127 0.904 0.213 0.124 0.879 0.263 0.127
glycosyl transferase glyt 0.912 0.201 0.077 0.899 0.288 0.083 0.913 0.201 0.077 0.892 0.288 0.083
ABC transporter ABCt 0.974 0.272 0.003 0.918 0.319 0.006 0.974 0.272 0.003 0.938 0.319 0.006
silA silA 0.762 0.161 0.119 0.855 0.281 0.121 0.829 0.161 0.119 0.866 0.281 0.121
silC silC 0.926 0.233 0.020 0.932 0.353 0.010 0.948 0.233 0.020 0.932 0.353 0.010
silP silP 0.928 0.229 0.140 0.940 0.349 0.120 0.943 0.229 0.140 0.940 0.349 0.120
al al 0.933 0.281 0.066 0.912 0.328 0.057 0.928 0.281 0.066 0.912 0.328 0.057
aoxB aoxB 0.949 0.317 0.184 0.951 0.364 0.175 0.972 0.317 0.184 0.951 0.364 0.175
arsA arsA 0.797 0.203 0.122 0.845 0.249 0.123 0.831 0.203 0.122 0.875 0.249 0.123
arsB arsB 0.921 0.241 0.075 0.899 0.287 0.066 0.956 0.241 0.075 0.928 0.287 0.066
arsC arsC 0.973 0.334 0.273 0.968 0.380 0.264 0.986 0.334 0.273 0.971 0.380 0.264
arsM arsM 0.976 0.190 0.041 0.901 0.236 0.032 0.977 0.190 0.041 0.901 0.236 0.032
cadA cadA 0.952 0.305 0.213 0.943 0.417 0.187 0.974 0.305 0.213 0.943 0.417 0.187
cadBD cadBD 0.849 0.235 0.072 0.885 0.347 0.047 0.937 0.235 0.072 0.885 0.347 0.047
czcA czcA 0.970 0.323 0.190 0.949 0.435 0.111 0.981 0.323 0.190 0.949 0.435 0.111
czcC czcC 0.858 0.193 0.080 0.849 0.305 0.160 0.866 0.193 0.080 0.849 0.305 0.160
czcD czcD 0.965 0.335 0.272 0.969 0.447 0.193 0.966 0.335 0.272 0.974 0.447 0.193
corC corC 0.885 0.263 0.029 0.903 0.375 0.109 0.884 0.263 0.029 0.903 0.375 0.109
chrA chrA 0.967 0.334 0.261 0.967 0.446 0.235 0.969 0.334 0.261 0.966 0.446 0.235
copA copA 0.964 0.312 0.265 0.946 0.424 0.185 0.985 0.312 0.265 0.949 0.424 0.185
cueO cueO 0.745 0.316 0.255 0.755 0.128 0.044 0.799 0.316 0.255 0.755 0.128 0.044
cusA cusA 0.894 0.232 0.120 0.879 0.344 0.284 0.914 0.232 0.120 0.879 0.344 0.284
mer mer 0.925 0.283 0.241 0.940 0.370 0.177 0.932 0.283 0.241 0.940 0.370 0.177
merB merB 0.821 0.163 0.099 0.847 0.213 0.164 0.879 0.163 0.099 0.847 0.213 0.164
merP merP 0.886 0.192 0.093 0.904 0.242 0.029 0.907 0.192 0.093 0.904 0.242 0.029
nreB nreB 0.869 0.155 0.025 0.838 0.218 0.132 0.879 0.155 0.025 0.835 0.218 0.132
pbrA pbrA 0.918 0.233 0.180 0.918 0.296 0.254 0.918 0.233 0.180 0.925 0.296 0.254
pbrT pbrT 0.812 0.212 0.104 0.876 0.274 0.178 0.850 0.212 0.104 0.876 0.274 0.178
tehB tehB 0.796 0.265 0.249 0.972 0.385 0.229 0.870 0.265 0.249 0.972 0.385 0.229
terC terC 0.960 0.280 0.215 0.967 0.400 0.195 0.971 0.280 0.215 0.967 0.400 0.195
terD terD 0.963 0.275 0.216 0.953 0.395 0.152 0.985 0.275 0.216 0.951 0.395 0.152
terZ terZ 0.917 0.195 0.020 0.873 0.315 0.085 0.922 0.195 0.020 0.873 0.315 0.085
zitB zitB 0.914 0.218 0.048 0.921 0.337 0.017 0.961 0.218 0.048 0.921 0.337 0.017
zntA zntA 0.945 0.265 0.116 0.938 0.385 0.052 0.953 0.265 0.116 0.938 0.385 0.052
cspA cspA 0.891 0.260 0.151 0.904 0.372 0.231 0.917 0.260 0.151 0.904 0.372 0.231
cspB cspB 0.833 0.214 0.089 0.860 0.326 0.169 0.878 0.214 0.089 0.860 0.326 0.169
dnaK dnaK 0.940 0.293 0.194 0.946 0.380 0.114 0.947 0.293 0.194 0.946 0.380 0.114
groEL groEL 0.925 0.274 0.141 0.946 0.361 0.147 0.952 0.274 0.141 0.946 0.361 0.147
groES groES 0.927 0.256 0.050 0.911 0.343 0.043 0.962 0.256 0.050 0.929 0.343 0.043
grpE grpE 0.962 0.318 0.223 0.960 0.405 0.229 0.967 0.318 0.223 0.960 0.405 0.229
hrcA hrcA 0.956 0.320 0.249 0.963 0.407 0.255 0.964 0.320 0.249 0.964 0.407 0.255
bglH bglH 0.893 0.262 0.013 0.895 0.308 0.038 0.882 0.262 0.013 0.895 0.308 0.038
bglP bglP 0.905 0.256 0.081 0.905 0.302 0.055 0.935 0.256 0.081 0.905 0.302 0.055
glnA glnA 0.976 0.328 0.309 0.974 0.415 0.316 0.982 0.328 0.309 0.974 0.415 0.316
glnR glnR 0.911 0.236 0.057 0.886 0.323 0.064 0.950 0.236 0.057 0.919 0.323 0.064
arcA arcA 0.935 0.174 0.022 0.933 0.220 0.013 0.950 0.174 0.022 0.933 0.220 0.013
arcB arcB 0.905 0.259 0.122 0.917 0.306 0.226 0.918 0.259 0.122 0.917 0.306 0.226
cydA cydA 0.905 0.277 0.112 0.919 0.389 0.032 0.933 0.277 0.112 0.919 0.389 0.032
cydB cydB 0.947 0.273 0.020 0.906 0.385 0.060 0.955 0.273 0.020 0.906 0.385 0.060
narH narH 0.886 0.200 0.015 0.909 0.250 0.018 0.890 0.200 0.015 0.927 0.250 0.018
narI narI 0.816 0.297 0.265 0.967 0.347 0.232 0.975 0.297 0.265 0.967 0.347 0.232
narJ narJ 0.904 0.244 0.065 0.895 0.294 0.032 0.911 0.244 0.065 0.895 0.294 0.032
ahpC ahpC 0.973 0.333 0.235 0.972 0.379 0.226 0.976 0.333 0.235 0.972 0.379 0.226
ahpF ahpF 0.948 0.292 0.219 0.948 0.339 0.210 0.963 0.292 0.219 0.949 0.339 0.210
fnr fnr 0.978 0.315 0.247 0.977 0.402 0.167 0.985 0.315 0.247 0.977 0.402 0.167
katA katA 0.918 0.272 0.158 0.922 0.359 0.164 0.927 0.272 0.158 0.924 0.359 0.164
katE katE 0.967 0.313 0.249 0.962 0.400 0.185 0.966 0.313 0.249 0.962 0.400 0.185
oxyR oxyR 0.969 0.280 0.221 0.966 0.343 0.147 0.975 0.280 0.221 0.968 0.343 0.147
perR perR 0.824 0.140 0.030 0.819 0.202 0.130 0.825 0.140 0.030 0.819 0.202 0.130
proV proV 0.937 0.293 0.226 0.950 0.356 0.224 0.942 0.293 0.226 0.950 0.356 0.224
proX proX 0.909 0.227 0.111 0.908 0.289 0.109 0.911 0.227 0.111 0.908 0.289 0.109
phoA phoA 0.956 0.274 0.165 0.955 0.337 0.163 0.980 0.274 0.165 0.955 0.337 0.163
phoB phoB 0.953 0.286 0.227 0.962 0.349 0.225 0.959 0.286 0.227 0.962 0.349 0.225
pstA pstA 0.974 0.288 0.252 0.981 0.350 0.250 0.982 0.288 0.252 0.981 0.350 0.250
pstB pstB 0.976 0.306 0.244 0.964 0.369 0.242 0.981 0.306 0.244 0.964 0.369 0.242
pstC pstC 0.964 0.282 0.186 0.955 0.344 0.166 0.970 0.282 0.186 0.955 0.344 0.166
pstS pstS 0.894 0.243 0.139 0.932 0.306 0.119 0.896 0.243 0.139 0.932 0.306 0.119
clpC clpC 0.964 0.333 0.190 0.964 0.445 0.165 0.974 0.333 0.190 0.964 0.445 0.165
ctsR ctsR 0.901 0.277 0.087 0.923 0.389 0.007 0.912 0.277 0.087 0.923 0.389 0.007
obgE obgE 0.955 0.279 0.186 0.945 0.342 0.112 0.956 0.279 0.186 0.949 0.342 0.112
ABC antibiotic transporter ABCat 0.919 0.288 0.060 0.924 0.334 0.051 0.921 0.288 0.060 0.924 0.334 0.051
MatE antibiotics MatE 0.948 0.296 0.131 0.946 0.383 0.067 0.986 0.296 0.131 0.948 0.383 0.067
MFS antibiotics MFS 0.966 0.317 0.234 0.966 0.367 0.170 0.970 0.317 0.234 0.966 0.367 0.170
SMR antibiotics SMR 0.974 0.295 0.261 0.974 0.415 0.241 0.986 0.295 0.261 0.974 0.415 0.241
Mex Mex 0.815 0.148 0.037 0.811 0.198 0.043 0.867 0.148 0.037 0.881 0.198 0.043
beta-lactamase lac 0.895 0.276 0.007 0.907 0.323 0.019 0.907 0.276 0.007 0.915 0.323 0.019
class A beta-lactamase lacA 0.942 0.312 0.125 0.949 0.424 0.099 0.953 0.312 0.125 0.949 0.424 0.099
class C beta-lactamase lacC 0.920 0.313 0.224 0.944 0.425 0.199 0.921 0.313 0.224 0.950 0.425 0.199
Tet Tet 0.942 0.265 0.091 0.937 0.385 0.027 0.951 0.265 0.091 0.942 0.385 0.027
Van Van 0.806 0.106 0.017 0.771 0.225 0.134 0.835 0.106 0.017 0.814 0.225 0.134
a Bray-Curtis similarity with randomized permutation-based estimation was used to compare the observed and predicted metabolic potentials of functional genes.
b Models were constructed without information on the relative abundances of dominant phyla.
c Models were constructed with information on the relative abundances of dominant phyla.
d Occurrence is the percentage of all samples in which the probes of a given gene were detected.
e This null model sets all predicted metabolic potentials equal to the mean observed value across all samples.
f This null model sets all predicted metabolic potentials equal to the minimum observed value.
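Footnote d defines occurrence as the share of samples in which a gene's probes were detected, and the table compares models built on probes with occurrence > 50% and with occurrence of 100%. The sketch below shows that filtering step only, under the assumption that a probe counts as detected when its signal is greater than zero; the probe-by-sample matrix and probe names are hypothetical.

```python
import numpy as np


def occurrence(signals):
    """Fraction of samples (columns) in which each probe (row) has a signal > 0."""
    return (np.asarray(signals, dtype=float) > 0).mean(axis=1)


def select_probes(signals, probe_ids):
    """Return probe IDs with occurrence > 50% and with occurrence of 100%."""
    occ = occurrence(signals)
    over_half = [pid for pid, f in zip(probe_ids, occ) if f > 0.5]
    everywhere = [pid for pid, f in zip(probe_ids, occ) if f == 1.0]
    return over_half, everywhere


signals = [
    [1.2, 0.8, 2.0, 1.1],  # detected in 4/4 samples
    [0.0, 0.5, 0.7, 0.0],  # 2/4 samples: exactly 50%, so excluded from "> 50%"
    [0.0, 1.4, 0.9, 1.0],  # 3/4 samples
]
print(select_probes(signals, ["probe_a", "probe_b", "probe_c"]))
# (['probe_a', 'probe_c'], ['probe_a'])
```

The strict "> 0.5" comparison mirrors the "> 50%" column label; Table S9 instead uses "at least half of the total samples", which would correspond to ">= 0.5".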
Table S10b. Validation of predictive models for metabolic potentials (normalized data) of key functional genes based on the artificial neural network (ANN).
Columns: Genes, Abbr., then Bray-Curtis similarity^a (normalized data: values between 1 and 100, according to the formula listed in Materials and Methods) for four model sets in the order ENV^b with occurrence^d > 50%, ENV^b with occurrence 100%, TAXA^c with occurrence > 50%, TAXA^c with occurrence 100%. Each set lists Predicted vs Observed, Null model (Mean)^e and Null model (Minimum)^f.
aclB aclB 0.755 0.104 0.054 0.818 0.175 0.055 0.810 0.104 0.054 0.818 0.175 0.055
CODH CODH 0.869 0.213 0.056 0.818 0.112 0.055 0.869 0.213 0.056 0.818 0.112 0.055
Pcc Pcc 0.791 0.170 0.042 0.863 0.214 0.044 0.791 0.170 0.042 0.863 0.214 0.044
RubisCo RubisCo 0.865 0.206 0.040 0.858 0.180 0.040 0.876 0.206 0.040 0.858 0.180 0.040
nifH nifH 0.786 0.175 0.050 0.800 0.139 0.051 0.786 0.175 0.050 0.836 0.139 0.051
gdh gdh 0.744 0.078 0.062 0.753 0.191 0.062 0.773 0.078 0.062 0.776 0.191 0.062
ureC ureC 0.859 0.246 0.059 0.814 0.155 0.058 0.859 0.246 0.059 0.814 0.155 0.058
amoA amoA 0.792 0.108 0.036 0.831 0.208 0.037 0.792 0.108 0.036 0.852 0.208 0.037
narG narG 0.839 0.187 0.041 0.819 0.145 0.040 0.839 0.187 0.041 0.819 0.145 0.040
nirK nirK 0.891 0.207 0.052 0.844 0.113 0.051 0.891 0.207 0.052 0.844 0.113 0.051
nirS nirS 0.863 0.253 0.043 0.824 0.164 0.042 0.873 0.253 0.043 0.824 0.164 0.042
norB norB 0.850 0.189 0.043 0.794 0.105 0.042 0.871 0.189 0.043 0.843 0.105 0.042
nosZ nosZ 0.849 0.134 0.041 0.867 0.179 0.041 0.849 0.134 0.041 0.876 0.179 0.041
nasA nasA 0.839 0.170 0.068 0.776 0.089 0.067 0.834 0.170 0.068 0.817 0.089 0.067
NiR NiR 0.703 0.153 0.064 0.857 0.144 0.067 0.726 0.153 0.064 0.868 0.144 0.067
nirA nirA 0.800 0.174 0.048 0.836 0.147 0.049 0.800 0.174 0.048 0.836 0.147 0.049
nirB nirB 0.777 0.112 0.047 0.766 0.091 0.047 0.777 0.112 0.047 0.766 0.091 0.047
napA napA 0.794 0.165 0.042 0.876 0.165 0.043 0.794 0.165 0.042 0.876 0.165 0.043
nrfA nrfA 0.814 0.208 0.047 0.779 0.137 0.046 0.814 0.208 0.047 0.779 0.137 0.046
phytase phytase 0.775 0.111 0.045 0.797 0.153 0.046 0.775 0.111 0.045 0.797 0.153 0.046
ppk ppk 0.842 0.201 0.046 0.826 0.167 0.046 0.842 0.201 0.046 0.826 0.167 0.046
ppx ppx 0.867 0.261 0.066 0.727 0.265 0.063 0.883 0.261 0.066 0.727 0.265 0.063
aprA aprA 0.858 0.203 0.036 0.839 0.201 0.036 0.858 0.203 0.036 0.875 0.201 0.036
aprB aprB 0.725 0.180 0.058 0.763 0.134 0.059 0.746 0.180 0.058 0.763 0.134 0.059
dsrA dsrA 0.857 0.235 0.063 0.823 0.167 0.062 0.857 0.235 0.063 0.823 0.167 0.062
dsrB dsrB 0.784 0.231 0.036 0.865 0.193 0.037 0.784 0.231 0.036 0.865 0.193 0.037
sox sox 0.873 0.284 0.043 0.820 0.183 0.042 0.884 0.284 0.043 0.836 0.183 0.042
Fe-S cluster binding protein fes 0.821 0.251 0.043 0.786 0.154 0.042 0.847 0.251 0.043 0.786 0.154 0.042
ferredoxin fer 0.790 0.289 0.110 0.865 0.261 0.112 0.790 0.289 0.110 0.865 0.261 0.112
ferredoxin oxidoreductase fero 0.805 0.182 0.064 0.757 0.186 0.063 0.805 0.182 0.064 0.757 0.186 0.063
NADH ubiquinone oxidoreductase NADH 0.992 0.293 0.058 0.737 0.284 0.053 0.992 0.293 0.058 0.737 0.284 0.053
terminal quinol oxidase quio 0.987 0.337 0.051 0.861 0.285 0.048 0.988 0.337 0.051 0.861 0.285 0.048
cytochrome cyt 0.815 0.131 0.042 0.846 0.194 0.043 0.815 0.131 0.042 0.846 0.194 0.043
hydrogenase hyd 0.849 0.330 0.073 0.725 0.242 0.071 0.849 0.330 0.073 0.686 0.242 0.071
Ni-Fe hydrogenase NFhyd 0.840 0.144 0.049 0.853 0.188 0.049 0.840 0.144 0.049 0.871 0.188 0.049
glycosyl transferase glyt 0.829 0.170 0.042 0.859 0.183 0.042 0.861 0.170 0.042 0.844 0.183 0.042
ABC transporter ABCt 0.827 0.143 0.039 0.855 0.200 0.040 0.827 0.143 0.039 0.855 0.200 0.040
silA silA 0.753 0.180 0.047 0.798 0.142 0.048 0.807 0.180 0.047 0.823 0.142 0.048
silC silC 0.814 0.170 0.040 0.881 0.186 0.041 0.832 0.170 0.040 0.881 0.186 0.041
silP silP 0.732 0.112 0.044 0.879 0.106 0.046 0.808 0.112 0.044 0.879 0.106 0.046
al al 0.829 0.146 0.043 0.844 0.178 0.043 0.829 0.146 0.043 0.844 0.178 0.043
aoxB aoxB 0.877 0.221 0.042 0.862 0.191 0.042 0.877 0.221 0.042 0.862 0.191 0.042
arsA arsA 0.798 0.128 0.053 0.802 0.150 0.054 0.840 0.128 0.053 0.858 0.150 0.054
arsB arsB 0.868 0.211 0.063 0.794 0.092 0.062 0.868 0.211 0.063 0.823 0.092 0.062
arsC arsC 0.873 0.164 0.042 0.859 0.145 0.041 0.873 0.164 0.042 0.869 0.145 0.041
arsM arsM 0.839 0.212 0.080 0.843 0.220 0.080 0.839 0.212 0.080 0.843 0.220 0.080
cadA cadA 0.792 0.138 0.043 0.854 0.143 0.044 0.810 0.138 0.043 0.854 0.143 0.044
cadBD cadBD 0.779 0.178 0.075 0.758 0.195 0.074 0.822 0.178 0.075 0.758 0.195 0.074
czcA czcA 0.874 0.271 0.035 0.840 0.191 0.035 0.885 0.271 0.035 0.840 0.191 0.035
czcC czcC 0.828 0.220 0.059 0.769 0.101 0.057 0.828 0.220 0.059 0.769 0.101 0.057
czcD czcD 0.784 0.158 0.049 0.851 0.181 0.050 0.784 0.158 0.049 0.841 0.181 0.050
corC corC 0.864 0.200 0.044 0.868 0.192 0.044 0.882 0.200 0.044 0.868 0.192 0.044
chrA chrA 0.842 0.181 0.045 0.848 0.192 0.045 0.842 0.181 0.045 0.848 0.192 0.045
copA copA 0.760 0.262 0.053 0.807 0.285 0.054 0.809 0.262 0.053 0.785 0.285 0.054
cueO cueO 0.723 0.108 0.124 0.775 0.212 0.125 0.723 0.108 0.124 0.775 0.212 0.125
cusA cusA 0.848 0.156 0.039 0.840 0.190 0.039 0.843 0.156 0.039 0.885 0.190 0.039
mer mer 0.803 0.147 0.067 0.758 0.156 0.066 0.803 0.147 0.067 0.758 0.156 0.066
merB merB 0.764 0.143 0.070 0.745 0.189 0.069 0.781 0.143 0.070 0.745 0.189 0.069
merP merP 0.803 0.203 0.080 0.813 0.223 0.080 0.803 0.203 0.080 0.813 0.223 0.080
nreB nreB 0.832 0.172 0.058 0.806 0.138 0.058 0.832 0.172 0.058 0.824 0.138 0.058
pbrA pbrA 0.888 0.196 0.039 0.901 0.224 0.039 0.888 0.196 0.039 0.902 0.224 0.039
pbrT pbrT 0.812 0.166 0.047 0.817 0.177 0.047 0.812 0.166 0.047 0.817 0.177 0.047
tehB tehB 0.890 0.280 0.054 0.888 0.275 0.054 0.890 0.280 0.054 0.888 0.275 0.054
terC terC 0.864 0.124 0.038 0.891 0.179 0.038 0.864 0.124 0.038 0.891 0.179 0.038
terD terD 0.791 0.188 0.044 0.855 0.161 0.045 0.820 0.188 0.044 0.829 0.161 0.045
terZ terZ 0.778 0.115 0.045 0.802 0.136 0.045 0.805 0.115 0.045 0.802 0.136 0.045
zitB zitB 0.872 0.171 0.051 0.859 0.145 0.051 0.872 0.171 0.051 0.859 0.145 0.051
zntA zntA 0.850 0.203 0.037 0.853 0.209 0.037 0.850 0.203 0.037 0.853 0.209 0.037
cspA cspA 0.867 0.220 0.033 0.869 0.218 0.033 0.874 0.220 0.033 0.869 0.218 0.033
cspB cspB 0.801 0.162 0.045 0.787 0.134 0.044 0.801 0.162 0.045 0.787 0.134 0.044
dnaK dnaK 0.839 0.148 0.042 0.844 0.157 0.042 0.839 0.148 0.042 0.844 0.157 0.042
groEL groEL 0.832 0.149 0.054 0.834 0.152 0.054 0.832 0.149 0.054 0.834 0.152 0.054
groES groES 0.886 0.275 0.035 0.859 0.203 0.034 0.907 0.275 0.035 0.862 0.203 0.034
grpE grpE 0.881 0.260 0.043 0.857 0.213 0.042 0.881 0.260 0.043 0.857 0.213 0.042
hrcA hrcA 0.824 0.140 0.040 0.848 0.193 0.040 0.838 0.140 0.040 0.867 0.193 0.040
bglH bglH 0.782 0.158 0.040 0.853 0.174 0.041 0.808 0.158 0.040 0.853 0.174 0.041
bglP bglP 0.777 0.203 0.048 0.813 0.117 0.048 0.835 0.203 0.048 0.813 0.117 0.048
glnA glnA 0.846 0.173 0.042 0.813 0.108 0.041 0.846 0.173 0.042 0.813 0.108 0.041
glnR glnR 0.813 0.107 0.050 0.804 0.123 0.050 0.856 0.107 0.050 0.882 0.123 0.050
arcA arcA 0.890 0.326 0.075 0.879 0.304 0.075 0.890 0.326 0.075 0.879 0.304 0.075
arcB arcB 0.852 0.160 0.049 0.890 0.223 0.049 0.865 0.160 0.049 0.890 0.223 0.049
cydA cydA 0.755 0.124 0.064 0.836 0.158 0.065 0.784 0.124 0.064 0.836 0.158 0.065
cydB cydB 0.887 0.308 0.045 0.830 0.193 0.044 0.887 0.308 0.045 0.830 0.193 0.044
narH narH 0.799 0.305 0.058 0.852 0.293 0.059 0.821 0.305 0.058 0.866 0.293 0.059
narI narI 0.884 0.179 0.063 0.813 0.193 0.062 0.828 0.179 0.063 0.813 0.193 0.062
narJ narJ 0.819 0.193 0.043 0.796 0.147 0.042 0.819 0.193 0.043 0.796 0.147 0.042
ahpC ahpC 0.861 0.195 0.045 0.911 0.195 0.046 0.861 0.195 0.045 0.911 0.195 0.046
ahpF ahpF 0.883 0.200 0.076 0.819 0.275 0.075 0.882 0.200 0.076 0.822 0.275 0.075
fnr fnr 0.853 0.161 0.037 0.908 0.172 0.038 0.853 0.161 0.037 0.908 0.172 0.038
katA katA 0.814 0.152 0.064 0.796 0.135 0.064 0.814 0.152 0.064 0.814 0.135 0.064
katE katE 0.811 0.158 0.039 0.820 0.162 0.039 0.826 0.158 0.039 0.820 0.162 0.039
oxyR oxyR 0.842 0.169 0.041 0.886 0.139 0.041 0.858 0.169 0.041 0.885 0.139 0.041
perR perR 0.825 0.190 0.058 0.795 0.131 0.057 0.825 0.190 0.058 0.795 0.131 0.057
proV proV 0.795 0.110 0.048 0.831 0.180 0.049 0.795 0.110 0.048 0.831 0.180 0.049
proX proX 0.833 0.192 0.047 0.788 0.102 0.046 0.833 0.192 0.047 0.788 0.102 0.046
phoA phoA 0.881 0.181 0.045 0.879 0.177 0.045 0.881 0.181 0.045 0.879 0.177 0.045
phoB phoB 0.838 0.196 0.043 0.867 0.153 0.043 0.838 0.196 0.043 0.867 0.153 0.043
pstA pstA 0.892 0.203 0.049 0.909 0.122 0.049 0.906 0.203 0.049 0.909 0.122 0.049
pstB pstB 0.839 0.177 0.036 0.854 0.208 0.037 0.839 0.177 0.036 0.854 0.208 0.037
pstC pstC 0.872 0.176 0.041 0.874 0.181 0.041 0.872 0.176 0.041 0.874 0.181 0.041
pstS pstS 0.817 0.113 0.056 0.820 0.119 0.056 0.817 0.113 0.056 0.820 0.119 0.056
clpC clpC 0.905 0.269 0.032 0.897 0.252 0.031 0.905 0.269 0.032 0.897 0.252 0.031
ctsR ctsR 0.788 0.199 0.055 0.824 0.171 0.056 0.788 0.199 0.055 0.824 0.171 0.056
obgE obgE 0.840 0.168 0.042 0.843 0.174 0.042 0.842 0.168 0.042 0.844 0.174 0.042
ABC antibiotic transporter ABCat 0.828 0.168 0.046 0.847 0.194 0.046 0.840 0.168 0.046 0.847 0.194 0.046
MatE antibiotics MatE 0.866 0.239 0.048 0.866 0.212 0.048 0.916 0.239 0.048 0.887 0.212 0.048
MFS antibiotics MFS 0.873 0.216 0.044 0.876 0.221 0.044 0.873 0.216 0.044 0.876 0.221 0.044
SMR antibiotics SMR 0.808 0.246 0.036 0.878 0.181 0.038 0.814 0.246 0.036 0.878 0.181 0.038
Mex Mex 0.872 0.205 0.071 0.809 0.144 0.070 0.865 0.205 0.071 0.867 0.144 0.070
beta-lactamase lac 0.835 0.196 0.035 0.833 0.192 0.035 0.835 0.196 0.035 0.833 0.192 0.035
class A beta-lactamase lacA 0.816 0.161 0.033 0.882 0.228 0.034 0.832 0.161 0.033 0.833 0.228 0.034
class C beta-lactamase lacC 0.828 0.230 0.055 0.796 0.156 0.054 0.838 0.230 0.055 0.796 0.156 0.054
Tet Tet 0.795 0.176 0.043 0.865 0.221 0.045 0.795 0.176 0.043 0.871 0.221 0.045
Van Van 0.772 0.161 0.120 0.737 0.173 0.119 0.809 0.161 0.120 0.756 0.173 0.119
a Bray-Curtis similarity, with significance estimated by random permutation, was used to compare the observed and predicted metabolic potentials of functional genes.
b Models were constructed without information on the microbial abundances of the dominant phyla.
c Models were constructed with information on the microbial abundances of the dominant phyla.
d Occurrence is the percentage of all samples in which probes for a given gene were detected.
e This null model sets every predicted metabolic potential to the mean value across all samples.
f This null model sets every predicted metabolic potential to the minimum observed value.
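Footnotes a, e and f can be illustrated with a minimal Python sketch. It assumes the common definition of Bray-Curtis similarity, 1 - Σ|o_i - p_i| / Σ(o_i + p_i), applied to the observed (o) and predicted (p) values of one gene across samples. The supplement names the index but does not give the formula, and the function names below are hypothetical.

```python
from typing import List, Sequence


def bray_curtis_similarity(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Return 1 - Bray-Curtis dissimilarity between two non-negative vectors (assumed definition)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("vectors must be non-empty and of equal length")
    total = sum(observed) + sum(predicted)
    if total == 0:
        raise ValueError("similarity is undefined when both vectors are all zero")
    abs_diff = sum(abs(o - p) for o, p in zip(observed, predicted))
    return 1.0 - abs_diff / total


def mean_null_model(observed: Sequence[float]) -> List[float]:
    """Null model (footnote e): predict the across-sample mean for every sample."""
    mean = sum(observed) / len(observed)
    return [mean] * len(observed)


def minimum_null_model(observed: Sequence[float]) -> List[float]:
    """Null model (footnote f): predict the minimum observed value for every sample."""
    return [min(observed)] * len(observed)


if __name__ == "__main__":
    # Hypothetical toy values for one gene across four samples (not data from this study).
    obs = [2.0, 4.0, 6.0, 8.0]
    pred = [3.0, 4.0, 5.0, 8.0]
    for name, guess in [("model", pred),
                        ("mean null", mean_null_model(obs)),
                        ("minimum null", minimum_null_model(obs))]:
        print(name, round(bray_curtis_similarity(obs, guess), 3))
```

With this setup, a model counts as informative only if its similarity clearly exceeds that of the two constant baselines.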
Table S11. Predictive equations and functional parameters that provide the best prediction for relative abundances of dominant microbial taxa based on the artificial neural network (ANN).
Taxa Abbr. Functional parameters Predictive equations Bray-Curtis similarity^a
Train P value^b Validation P value Average P value
Euryarchaeota Eury pH, EC, Fe2+ 95.9*EC*Fe2+ + 26.7*pH*pH*Fe2+ - 134*pH*Fe2+ - 13.6*EC*EC*Fe2+ 0.712 < 0.001 0.598 0.027 0.685 < 0.001
Acidobacteria Acido Fe3+, Gamma 2.44/(Fe3+*Fe3+ - 3.27) - 1.39/(55.2 - 2*Fe3+*Gamma) 0.728 < 0.001 0.650 0.021 0.709 < 0.001
Actinobacteria Actino Eury, Fe3+ -68/(Eury*Eury - 478) - 0.816/(1010*Eury*Eury - 476*Eury) 0.784 < 0.001 0.660 0.047 0.694 < 0.001
Firmicutes Firm Actino, Nitro 1.08*Actino + 0.046*Nitro - 9.87*Nitro/(51.7*Actino + Nitro*Nitro - 50.1*Nitro) 0.746 < 0.001 0.649 0.014 0.725 < 0.001
Nitrospira Nitro pH, Cu, Zn Cu - 5.67/(Cu - 2.06) + 61*Zn/pH - 23*Zn 0.778 < 0.001 0.692 0.008 0.757 < 0.001
Alphaproteobacteria Alpha pH, TOC, Beta pH + 2.03/(3.93 - Beta) - 0.011/(1.49*TOC - 0.556) 0.657 < 0.001 0.650 0.023 0.657 < 0.001
Betaproteobacteria Beta pH, Fe2+, Eury 29.5*Eury + 13.4*pH + 32.7*pH*Fe2+ + 4.88*Eury*pH*pH - 71.3*Fe2+ - 24.9*pH*Eury 0.807 < 0.001 0.701 0.007 0.783 < 0.001
Gammaproteobacteria Gamma Fe2+, Beta 21.6 + 3.02/(Beta - 1.4) - 10.2/(Beta - 17.6) - 0.219*Beta 0.865 < 0.001 0.766 < 0.001 0.801 < 0.001
a Bray-Curtis similarity, with significance estimated by random permutation, was used to compare the observed and predicted relative microbial abundances.
b The observed values were randomly permuted 10,000 times and compared with the predicted values. The P value is the number of permutations yielding a greater similarity than the observed values, divided by the total number of permutations.
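The permutation test in footnote b can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. It assumes Bray-Curtis similarity = 1 - Σ|o_i - p_i| / Σ(o_i + p_i). It counts only strictly greater similarities, following the footnote wording. Some implementations instead add one to both the count and the number of permutations so that P is never exactly zero. The function names are hypothetical.

```python
import random
from typing import Optional, Sequence, Tuple


def bray_curtis_similarity(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - Bray-Curtis dissimilarity (assumed definition, see lead-in)."""
    total = sum(observed) + sum(predicted)
    if len(observed) != len(predicted) or total == 0:
        raise ValueError("need equal-length vectors with a non-zero total")
    return 1.0 - sum(abs(o - p) for o, p in zip(observed, predicted)) / total


def permutation_p_value(observed: Sequence[float], predicted: Sequence[float],
                        n_permutations: int = 10_000,
                        seed: Optional[int] = None) -> Tuple[float, float]:
    """Return (observed similarity, permutation P value) following footnote b.

    The observed values are shuffled n_permutations times; P is the fraction of
    shuffles whose similarity to the predictions is strictly greater than the
    similarity of the unshuffled observations.
    """
    rng = random.Random(seed)
    reference = bray_curtis_similarity(observed, predicted)
    shuffled = list(observed)
    n_greater = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # in-place reordering; both totals stay unchanged
        if bray_curtis_similarity(shuffled, predicted) > reference:
            n_greater += 1
    return reference, n_greater / n_permutations
```

Shuffling leaves both totals unchanged, so only the numerator Σ|o_i - p_i| varies between permutations. A well-matched prediction therefore gives a small P value.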
Table S12. Predictive equations and functional parameters that provide the best prediction for functional metabolic potentials based on the artificial neural network (ANN).
Genes Abbr. Functional parameters Predictive equations Bray-Curtis similarity^a
Train P value^b Validation P value Average P value
aclB aclB Al, Acido 232961 + 14107*Al + 1130*Al*Acido*Acido*Acido - 186*Acido*Acido*Acido*Acido - 8207*Acido*Al*Al 0.950 < 0.001 0.920 0.045 0.931 < 0.001
CODH CODH pH, DO, TOC 508125 + 191961*pH*pH/(87.8 + 1.92*pH*pH - 9.03*TOC - 36.2*DO*DO) 0.975 < 0.001 0.965 0.042 0.969 < 0.001
Pcc Pcc pH, TOC 1176937 + (241871 - 100000*pH)/(1.89*TOC + pH*TOC - pH) 0.970 < 0.001 0.950 0.031 0.964 < 0.001
RubisCo RubisCo DO, Alpha 643499 + 18027*Alpha + 12.9*DO*Alpha*Alpha*Alpha - 1764*Alpha*Alpha 0.965 < 0.001 0.945 0.024 0.956 < 0.001
nifH nifH DO 3066210 + 65545*DO + 1684063/DO - 410346/DO*DO 0.965 < 0.001 0.919 0.045 0.931 < 0.001
gdh gdh Fe3+, Nitro 200000*Fe3+ + 1182*Nitro + 1242/(2.23 - Nitro) - 35179*Fe3+*Fe3+ 0.941 < 0.001 0.934 0.026 0.936 < 0.001
ureC ureC Firm, Beta 1284444 + 2890/(0.013 - Firm) + (28972159 - 346594*Beta)/(Beta - 84.6) 0.956 < 0.001 0.932 0.066 0.945 < 0.001
amoA amoA pH 1031769 + 1000000*pH - 45446/(3.80 - pH) - 199433*pH*pH 0.976 0.010 0.951 0.012 0.962 < 0.001
narG narG nirK 428136 + 2.04*nirK + (547048 - 0.912*nirK)/(0.007*nirK - 5.59) - 0.006*nirK*nirK 0.982 < 0.001 0.960 0.038 0.971 < 0.001
nirK nirK Cu, Acido, Gamma 671138 + 203679*Acido + 164056*Cu + 0.964*Gamma*Gamma*Gamma - 3641*Cu*Gamma - 76175*Acido*Acido 0.971 < 0.001 0.926 0.002 0.938 < 0.001
nirS nirS Eury, Gamma, narG 705537 + 0.038*Eury*narG + 0.0002*narG*Gamma*Gamma - 2362*Gamma - 38255*Eury - 181*Gamma*Gamma - 760*Eury*Eury 0.940 < 0.001 0.935 0.020 0.939 < 0.001
norB norB Fe2+, Cd, Eury 159990 + 172363*Cd + 16726*Fe2+ + 1462*Eury + 100439*Fe2+*Fe2+*Cd - 357357*Fe2+*Cd 0.943 < 0.001 0.929 0.015 0.932 < 0.001
nosZ nosZ pH 4299268*pH + 49942917/pH - 39083676/(pH*pH) - 22315156 - 288282*pH*pH 0.934 < 0.001 0.917 0.042 0.930 < 0.001
nasA nasA Fe2+, Acido, NiR 518458 + 61193*Acido + 0.000006*NiR*NiR - 2.46*NiR - 61193*Fe2+*Acido*Acido 0.913 < 0.001 0.912 0.008 0.912 < 0.001
NiR NiR Acido 224601 + 137056*Acido + 26674*Acido*Acido*Acido - 1765*Acido*Acido*Acido*Acido - 113093*Acido*Acido 0.946 0.003 0.919 0.005 0.925 < 0.001
nirA nirA pH, nirB 60902 + 0.304*nirB - 2733*pH/(0.00001*nirB - pH) 0.914 0.002 0.895 0.033 0.900 < 0.001
nirB nirB pH 250137*pH + 86173*pH*pH + 10915*pH*pH*pH*pH - 276426 - 75289*pH*pH*pH 0.876 0.002 0.863 0.033 0.866 < 0.001
napA napA Pb, Acido 453943 - 152*Acido/(0.013 - 0.341*Pb) 0.945 0.003 0.944 0.034 0.944 < 0.001
nrfA nrfA pH 585744*pH - 114373*pH*pH 0.931 0.009 0.929 0.026 0.929 < 0.001
phytase phytase EC, As 9590644 + 70474754*EC*As + 709869*EC*EC + 53127*As*As - 5191343*EC - 132295841*As - 9389329*As*EC*EC 0.868 0.015 0.775 0.020 0.797 0.011
ppk ppk DO 844411 + 137339/(DO*DO) + (51307*DO - 43871)/(0.968*DO*DO*DO - DO*DO*DO*DO) 0.974 0.005 0.936 0.011 0.912 < 0.001
ppx ppx pH 1071301 + (35959*pH*pH - 11965*pH*pH*pH)/(pH - 3.02) 0.965 < 0.001 0.948 0.045 0.961 < 0.001
aprA aprA EC, Acido 96020459*EC + 425725*EC*Acido + 2284143*EC*EC*EC + 15880*Acido*Acido*Acido - 118817246 - 1574124*Acido - 25690773*EC*EC 0.964 < 0.001 0.958 0.003 0.959 < 0.001
aprB aprB Fe2+ 170011 - 3967/(8.93 - 3.44*Fe2+) - 13422*Fe2+ 0.916 0.003 0.907 0.014 0.914 0.002
dsrA dsrA DO, Al, P 1854633 + 39512/(0.684 + 0.035*DO*DO*DO - DO) - 12937*DO 0.970 < 0.001 0.954 0.009 0.966 < 0.001
dsrB dsrB EC, As 742540*EC + (4891561*EC + 543507*EC*EC*EC - 3668671 - 45292*EC*EC*EC*EC - 2445781*EC*EC)/As - 1227621 - 86116*As 0.954 0.010 0.951 0.013 0.953 0.004
sox sox Pb, Gamma 586406 + 100000*Pb + 1473*Gamma + 4391/(Gamma - 2.98) 0.949 < 0.001 0.946 0.011 0.948 < 0.001
Fe-S cluster binding protein fes Actino 5970955 + 31584*Actino + 20213/Actino - 19.8/(Actino*Actino) + 2803/(1.94*Actino - 0.134) 0.980 < 0.001 0.974 0.026 0.979 < 0.001
ferredoxin fer Fe2+ 51937 + 98.3/(0.005 + 0.701*Fe2+*Fe2+ - 0.167*Fe2+ - 0.539*Fe2+*Fe2+*Fe2+) 0.907 < 0.001 0.828 0.014 0.847 < 0.001
ferredoxin oxidoreductase fero Actino, fes 4293408 + 1368359*Actino + 1.01e-6*fes*fes + 6.74e-21*fes*fes*fes*fes + 0.169*fes*Actino*Actino + 5.47e-8*Actino*fes*fes - 3.37*fes - 0.547*Actino*fes - 1.35e-13*fes*fes*fes - 421588*Actino*Actino - 1.69e-8*Actino*Actino*fes*fes 0.925 < 0.001 0.901 0.012 0.918 < 0.001
NADH ubiquinone oxidoreductase NADH Actino, quio, fes 12816 + 2.04*fes 0.989 < 0.001 0.979 < 0.001 0.980 < 0.001
terminal quinol oxidase quio Actino, fes 23382*Actino + 3.75*fes - 488737 0.987 < 0.001 0.978 < 0.001 0.980 < 0.001
cytochrome cyt Fe3+, P 1834354*Fe3+ + 1000000*Fe3+*P - 62.1/(P*P) - 1434077 - 2956422*P - 336122*Fe3+*Fe3+ 0.912 0.008 0.908 0.009 0.902 0.016
hydrogenase hyd pH 1137811 - 1117019/pH - 169985*pH 0.931 0.006 0.913 0.042 0.918 0.024
Ni-Fe hydrogenase NFhyd pH 26001 + 35284*pH + 1250/(3.463 - pH) - 3057/(3.1 - pH) - 1167*pH*pH*pH 0.904 0.002 0.878 0.003 0.884 < 0.001
glycosyl transferase glyt Actino 167223 + 11945/(151*Actino - 4.61) + 79.4/(0.614*Actino - 0.043) 0.913 0.010 0.885 0.018 0.892 < 0.001
ABC transporter ABCt pH, DO 50289301 + 5156265/(5.43*pH - pH*DO) 0.974 0.011 0.973 0.012 0.974 0.004
silA silA Gamma 42797 + 6873/(Gamma - 4.32) + 3619/(0.062*Gamma - 4.35) 0.829 0.006 0.811 0.041 0.816 < 0.001
silC silC Firm 403314 + 970*Firm - 49358/(Firm - 11) - 2067/(10.2*Firm*Firm - 9.96*Firm) 0.948 0.001 0.933 0.053 0.937 < 0.001
silP silP Acido, Gamma 84496 + 123593*Acido + 23245*Acido*Acido*Acido - 48225*Acido*Acido*Acido*Acido - 328429*Acido*Acido 0.943 < 0.001 0.923 0.009 0.928 < 0.001
al al Actino 336538 + 50386*Actino/(0.21 - 5.92*Actino) 0.928 0.010 0.923 0.021 0.924 0.007
aoxB aoxB DO 275300 + 8228/DO - 1572*DO 0.972 0.028 0.941 0.037 0.949 0.005
arsA arsA SO42-, Acido 26815 + 10000*Acido + 32.7/(Acido - 0.451*Acido*Acido*Acido - 1.21*Acido*Acido) - 7268*Acido*Acido 0.831 < 0.001 0.807 0.001 0.823 < 0.001
arsB arsB P, arsM 84471 + (0.014*arsM - 579840 - 9.06e-11*arsM*arsM)/(P - 1.12) 0.956 < 0.001 0.909 0.002 0.921 < 0.001
arsC arsC pH, Firm 1190543 + 155*Firm*Firm - 35325/(Firm - 5.37) + (1195 - 393*pH)/Firm - 42731*pH 0.986 0.001 0.973 0.002 0.977 < 0.001
arsM arsM pH, DO 70000000 + (17476356 + 20185331*DO - 7970875*pH)/(DO - 0.162) 0.977 0.003 0.970 0.072 0.976 < 0.001
cadA cadA pH, Nitro 1353931 + 71.8*Nitro*Nitro + 15345/(5.54*Nitro - 4.74 - pH) - 100000*pH 0.974 < 0.001 0.949 0.003 0.956 < 0.001
cadBD cadBD Gamma 113340 + 189*Gamma*Gamma + 677/(0.282 - 0.002*Gamma*Gamma) - 4031*Gamma - 1.69*Gamma*Gamma*Gamma 0.937 < 0.001 0.889 0.003 0.901 < 0.001
czcA czcA Cd, Eury 1350240 + 1491*Eury + (3.11 - 415972*Cd)/(Eury - 12.6*Cd) - 572026*Cd 0.981 0.047 0.936 0.023 0.948 0.010
czcC czcC pH, Zn, Cd (2321245*Zn - 3460027)/(24.2*Zn - 35.3) 0.866 0.002 0.833 0.021 0.858 0.001
czcD czcD pH, Zn, Cd 2742379 - 298027/Zn + 1937665*Zn*Zn - 3086079*Zn - 100000*pH*Zn - 367520*Zn*Zn*Zn 0.966 < 0.001 0.965 0.028 0.965 < 0.001
corC corC DO, EC, Alpha 284882 + 14513*DO*DO + 7243*DO*EC*Alpha - 102100*DO - 21730*DO*Alpha - 36.9*Alpha*Alpha*Alpha 0.884 0.002 0.869 0.038 0.872 < 0.001
chrA chrA pH, Pb 1949385 + 949385*pH*Pb - 203945*pH - 2609204*Pb 0.969 < 0.001 0.963 0.026 0.967 < 0.001
copA copA Fe3+, Beta, cueO 307566 + 1429111*Fe3+ + 23.6*cueO + 3.2e-8*cueO*cueO*cueO - 0.002*cueO*cueO - 228820*Fe3+*Fe3+ 0.985 < 0.001 0.972 0.001 0.975 < 0.001
cueO cueO Al, cusA 20182 + 53375/(9.85 - 2e-8*cusA*cusA) 0.799 0.002 0.632 0.071 0.741 0.002
cusA cusA Cu, Acido 42529 + 962/(4.53 - 72.5*Acido) 0.914 0.006 0.872 0.019 0.882 0.002
mer mer As, P 816722 + 190355*P + 27.4/(0.916*P - 0.036) - 2881875*As*P 0.932 0.006 0.923 0.068 0.925 0.001
merB merB TOC, Eury (2695077*Eury - 32425627)/(71.1*Eury - 837) 0.879 0.007 0.834 0.018 0.847 0.003
merP merP As, mer 113795 + 0.121*As*mer + 2.03e-7*mer*mer + 4.85e-13*As*mer*mer*mer - 0.203*mer - 15146*As - 2.42e-19*As*mer*mer*mer*mer - 3.63e-7*As*mer*mer 0.907 < 0.001 0.878 0.030 0.886 < 0.001
nreB nreB DO, Al, Pb (153719 - 4213612*Pb)/(1.78 - 45.9*Pb) 0.879 < 0.001 0.865 0.021 0.869 0.000
pbrA pbrA Eury, Actino, Pb 36297 + 61190*Pb + 6318*Actino + 17.2*Eury*Eury - 2131*Eury*Actino - 359701*Pb*Pb*Pb 0.918 < 0.001 0.913 0.004 0.914 < 0.001
pbrT pbrT SO42- 4060888 + 2626557*SO42-*SO42- + 48640*SO42-*SO42-*SO42-*SO42- + 198/(1.82*SO42- - 6.66) - 5274654*SO42- - 583679*SO42-*SO42-*SO42- 0.850 0.004 0.801 0.012 0.812 0.004
tehB tehB As, P, terD 456011 + P*terD + (0.182*As*terD + 100000*As*As - 109482*As)/P - 600000*P 0.870 < 0.001 0.774 0.031 0.796 < 0.001
terC terC DO, S, Fe3+ 740223 + (307529*DO*S - 922586*DO)/(7.44*DO - 1.47 - DO*DO) 0.971 0.003 0.957 0.006 0.960 < 0.001
terD terD Al, Cu, Acido 961472 - 74.5/Acido + 226398*Al*Cu + 298408*Cu*Cu - 86290*Al - 594027*Cu - 100000*Al*Cu*Cu 0.985 < 0.001 0.960 < 0.001 0.966 < 0.001
terZ terZ pH, Eury 182167*pH + 23708*Eury - 144660 - 8700*pH*Eury - 6455*pH*pH*pH 0.922 0.002 0.917 0.015 0.918 < 0.001
zitB zitB Actino, Firm, zntA 33082 + 7649*Actino*Actino + 1.01e-13*zntA*zntA*zntA - 0.032*Actino*zntA 0.961 < 0.001 0.913 < 0.001 0.925 < 0.001
zntA zntA Zn, Firm 697816 + 21491*Zn - 3427*Zn/(Firm - 0.046) 0.953 0.003 0.938 0.008 0.949 0.001
cspA cspA Fe3+, Gamma 13182 + 10000*Fe3+ + (10000*Fe3+ - 36835)/(Gamma - 4.05) 0.917 < 0.001 0.875 0.037 0.906 < 0.001
cspB cspB pH 26739 - 462/(2.08 - pH) 0.878 < 0.001 0.818 0.050 0.833 < 0.001
dnaK dnaK DO, EC 521331 - 130258/(20*DO - 22.6) 0.947 < 0.001 0.918 0.043 0.940 < 0.001
groEL groEL groES 3603648 - 34228689570/groES + 0.001*groES*groES - 96.4*groES - 4.64e-9*groES*groES*groES 0.952 0.003 0.916 0.019 0.925 0.001
groES groES As, Cd, Acido 96478 - 715/As - 10937*Acido - 23920*As 0.962 0.002 0.907 0.003 0.922 < 0.001
grpE grpE As, groEL 53703153 - 10454992600000/groEL + 8.91e-5*groEL*groEL + 7.5e17/(groEL*groEL) - 114*groEL 0.967 < 0.001 0.949 0.023 0.962 < 0.001
hrcA hrcA pH, Acido 5386545*pH + 204351*pH*pH*pH - 3861904 - 68858*Acido - 1860173*pH*pH 0.964 0.018 0.933 0.014 0.941 0.029
bglH bglH P, Beta 35250 + 250*P*P*Beta*Beta - 403*P*P*P*Beta*Beta 0.882 0.001 0.880 0.040 0.882 0.003
bglP bglP Acido 74057 + 87020*Acido + 156039*Acido*Acido*Acido - 31786*Acido*Acido*Acido*Acido - 222839*Acido*Acido 0.935 < 0.001 0.906 0.016 0.928 < 0.001
glnA glnA pH 2421028 + 463/(pH - 3.52) - 28547*pH*pH 0.982 0.002 0.975 0.007 0.976 < 0.001
glnR glnR TOC, Eury, Acido 197839 + 35819/TOC + 238469*TOC*Acido + 5004*TOC*Eury - 100000*Acido - 100000*Acido*TOC*TOC 0.950 < 0.001 0.912 0.002 0.922 < 0.001
arcA arcA cydB, narH 1.21*narH + 0.146*cydB + 20028857980/narH - 265934 0.950 < 0.001 0.930 < 0.001 0.935 < 0.001
arcB arcB pH, DO (14146781 - 49361*pH - 19318332*DO)/(247 - 329*DO) 0.918 0.008 0.866 0.017 0.905 0.001
cydA cydA Fe3+, Eury, arcB 68721 + 15787*Fe3+ + 1542*Eury + 0.492*arcB/(0.504 + Eury) 0.933 < 0.001 0.926 0.010 0.928 < 0.001
cydB cydB Zn, cydA, narH 129718 + 1.46*cydA + 0.764*narH*narH/cydA - 1.59*narH 0.955 < 0.001 0.945 < 0.001 0.947 < 0.001
narH narH pH, DO 165785 + (100000*pH*DO - 256025*pH*pH)/(209 - 42.3*DO) 0.890 0.002 0.877 0.079 0.886 < 0.001
narI narI Cu, Nitro 585479 + 2740*Cu*Nitro + 217156*Cu*Cu - 231782*Cu - 90.3*Nitro*Nitro - 53473*Cu*Cu*Cu 0.975 < 0.001 0.955 0.007 0.960 < 0.001
narJ narJ pH, DO 104334 + 29712*DO + 110*pH*DO*DO*DO - 5813*DO*DO 0.911 0.009 0.881 0.046 0.904 0.010
ahpC ahpC TOC, ahpF 5491077 + 313927*TOC + 3.83e-5*ahpF*ahpF - 22.2*ahpF - 2.05e-11*ahpF*ahpF*ahpF - 170148*TOC*TOC 0.976 < 0.001 0.964 0.027 0.973 < 0.001
ahpF ahpF Eury, katE 26198241 + 8.08e-6*katE*katE - 867392672100/(katE - 1000000) - 28.1*katE 0.963 < 0.001 0.931 0.007 0.955 < 0.001
fnr fnr pH, ahpF 28.4*ahpF + 2742781560000/ahpF - 11621325 - 313758*pH - 1.52e-5*ahpF*ahpF 0.985 < 0.001 0.975 < 0.001 0.978 < 0.001
katA katA pH, DO 97242*pH + (165326 - 26217*pH*pH)/DO 0.927 0.021 0.915 0.044 0.918 0.002
katE katE pH, Beta 1744706 - 104681*pH/(Beta - 36.1) - 78046*pH 0.966 < 0.001 0.956 0.016 0.964 < 0.001
oxyR oxyR pH, Fe2+, Eury 704238 + 19746*Eury + 12592*Fe2+ - 5938*pH*Eury - 155*Eury*Eury 0.975 < 0.001 0.958 0.013 0.967 < 0.001
perR perR pH, DO 31987 + 42136/(DO*DO - 5.94*DO) 0.825 0.002 0.823 0.057 0.824 < 0.001
proV proV SO42- 116643*SO42- + 986/(9 + SO42-*SO42- - 6*SO42-) - 234/(8.71 + SO42-*SO42- - 6*SO42-) - 99930 0.942 0.027 0.922 0.038 0.937 0.002
proX proX DO, Cu 93424 + 1762*DO - 931/(0.469*DO - 0.39) 0.911 0.001 0.909 0.029 0.909 < 0.001
phoA phoA pstC 25907 + 0.362*pstC + 9997/(87.7 + 1e-10*pstC*pstC - 0.0002*pstC) 0.980 < 0.001 0.948 0.008 0.956 < 0.001
phoB phoB pH, P 14430174 - 12263863/pH + 531416*pH*pH - 4765475*pH 0.959 0.015 0.951 0.041 0.953 0.010
pstA pstA Eury, pstC, pstS 1541*Eury + 3.84*pstS + 1.04*pstC - 699470 - 2.29e-6*pstC*pstS 0.982 < 0.001 0.961 0.002 0.971 < 0.001
pstB pstB pH, P 4612569 + 10199/(pH - 1.96) - 155833*pH 0.981 0.004 0.974 0.011 0.976 0.002
pstC pstC pH, Eury 627618 + 378470*pH + 27194*Eury - 7403*pH*Eury - 65401*pH*pH 0.970 < 0.001 0.966 0.042 0.967 < 0.001
pstS pstS pH, DO, P 645118 + 360948*P + 33388*DO - 100000*pH - 1845*P*DO*DO*DO 0.896 0.008 0.889 0.041 0.894 0.001
clpC clpC ctsR 400000 + 1.61*ctsR - 5911987830/ctsR - 2.17e-6*ctsR*ctsR 0.974 < 0.001 0.961 < 0.001 0.964 < 0.001
ctsR ctsR pH, TOC 296028 - 30766/TOC - 10758*pH/(13.1 - 13.8*pH*TOC) - 71532*TOC 0.912 < 0.001 0.866 0.008 0.901 < 0.001
obgE obgE Fe2+, Al, Actino 2612455 + (490435*Fe2+ - 391327)/Al - 31709*Actino - 361380*Fe2+ 0.956 < 0.001 0.953 0.018 0.954 < 0.001
ABC antibiotic transporter ABCat pH 268357 + 7868/(3.82*pH - 7.71) - 26174*pH 0.921 0.023 0.912 0.012 0.919 0.011
MatE antibiotics MatE Cd, Actino, Nitro 389315 + 3159*Nitro + 23655/(Nitro - 11.9) + (23655*Cd - 63.7*Nitro)/Actino - 1088625*Cd*Actino 0.986 < 0.001 0.925 0.017 0.941 < 0.001
MFS antibiotics MFS SMR, Mex 47580 + 0.601*SMR + (37107 + 2.15e-8*SMR*SMR - 0.059*SMR)/(6.92 - 4.06e-6*SMR) 0.970 < 0.001 0.954 0.018 0.966 < 0.001
SMR antibiotics SMR Pb, Acido, Beta 1741816 + 741613*Acido + 203*Acido*Beta*Beta - 21910*Acido*Beta - 206028*Acido*Acido 0.986 < 0.001 0.981 0.026 0.985 < 0.001
Mex Mex Eury, ABCat 90394 + 0.031*Eury*ABCat + 20.8*Eury*Eury*Eury - 0.08*ABCat - 0.004*ABCat*Eury*Eury 0.867 0.010 0.831 0.012 0.841 < 0.001
beta-lactamase lac Al, Alpha 76626 + 2628*Al*Al*Al + 16.1*Al*Alpha*Alpha - 1222*Alpha - 8560*Al*Al 0.907 0.010 0.861 0.014 0.895 0.003
class A beta-lactamase lacA pH, Nitro 319145*pH + 9936*Nitro - 7123/Nitro - 234250 - 4061*pH*Nitro - 48191*pH*pH 0.953 < 0.001 0.936 0.009 0.949 < 0.001
class C beta-lactamase lacC pH, Acido 524718 + 2144746*Acido*Acido + 241702*pH*pH*Acido*Acido - 35611*pH - 1447706*pH*Acido*Acido 0.921 < 0.001 0.919 0.042 0.921 < 0.001
Tet Tet pH 709163 + 4117/(37.6*pH - 103) 0.951 0.018 0.939 0.083 0.942 0.012
Van Van EC, Beta 21198 + (1433 + 36.7*Beta)/(EC + 0.13*Beta*Beta - 3 - 2.28*Beta) 0.835 0.018 0.746 0.053 0.769 0.002
a Bray-Curtis similarity, with significance estimated by random permutation, was used to compare the observed and predicted metabolic potentials of functional genes.
b The observed values were randomly permuted 10,000 times and compared with the predicted values. The P value is the number of permutations yielding a greater similarity than the observed values, divided by the total number of permutations.
Table S13. Predictive equations and functional parameters that provide the best prediction for environmental properties based on the artificial neural network (ANN).
Environmental properties Abbr. Functional parameters Predictive equations Bray-Curtis similarity^a
Train P value^b Validation P value Average P value
Dissolved Oxygen DO Fe2+ 1.69 + 0.024/(2.54 - Fe2+) + Fe2+/(4.17*Fe2+ - 13.7) 0.785 < 0.001 0.528 0.059 0.724 < 0.001
Total Organic Carbon TOC P (20.1*P*P - 3.97*P)/(16.8*P*P - 0.242 - 2.31*P) 0.927 < 0.001 0.829 0.002 0.853 < 0.001
Electrical Conductivity EC Fe3+ Fe3+ + 6.14/Fe3+ - 1.44 0.978 < 0.001 0.971 0.004 0.977 < 0.001
Sulfate SO42- TOC, EC 1.61 + 0.539*EC + 0.218/TOC - 0.093/(TOC*EC - 3*TOC) 0.978 < 0.001 0.965 < 0.001 0.975 < 0.001
Ferric ion Fe3+ pH, Fe2+ 2.45 + 1.61*Fe2+ - 0.562*pH*Fe2+ 0.968 < 0.001 0.963 0.003 0.967 < 0.001
Ferrous ion Fe2+ pH 7.97 + 0.474*pH*pH - 3.89*pH 0.753 < 0.001 0.734 0.011 0.744 < 0.001
Aluminum Al Fe3+ 0.761*Fe3+ + 0.092/(51.1*Fe3+ - 174) 0.897 < 0.001 0.866 0.002 0.888 < 0.001
Copper Cu Fe2+, Cd, P 8.34*Cd*P/(Cd + 0.198*Fe2+) 0.638 < 0.001 0.496 0.010 0.554 < 0.001
Zinc Zn Fe3+, Cd Fe3+ - 0.004/Cd - 0.702 0.813 < 0.001 0.802 0.016 0.805 < 0.001
Arsenic As pH, DO, Cd (0.104 + 2.34*Cd)/(pH + pH*DO - 2.75) 0.813 < 0.001 0.611 0.047 0.770 < 0.001
Cadmium Cd DO, Pb DO*Pb*Pb/(0.253 + DO*DO - DO) 0.722 < 0.001 0.567 0.053 0.652 < 0.001
Lead Pb pH, DO, Fe3+ 0.006*DO + 0.001*Fe3+*Fe3+*Fe3+*Fe3+ 0.641 0.004 0.628 0.041 0.638 0.001
Phosphorus P Al, Cd 0.048 + 0.041*Al + 5.09*Al*Cd - 8.45*Cd 0.762 < 0.001 0.748 < 0.001 0.757 < 0.001
a Bray-Curtis similarity, with significance estimated by random permutation, was used to compare the observed and predicted environmental properties.
b The observed values were randomly permuted 10,000 times and compared with the predicted values. The P value is the number of permutations yielding a greater similarity than the observed values, divided by the total number of permutations.
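The closed-form equations in Table S13 can be evaluated directly. The sketch below transcribes three of them (ferrous ion, ferric ion and electrical conductivity) and, purely as a hypothetical illustration, chains them from pH to Fe2+ to Fe3+ to EC. The supplement does not state the units or any scaling applied to the inputs before fitting. The outputs should therefore not be read as concentrations. The function names are assumptions, and the chaining is not presented as the authors' procedure.

```python
def ferrous_from_ph(ph: float) -> float:
    """Table S13, ferrous ion: Fe2+ = 7.97 + 0.474*pH*pH - 3.89*pH."""
    return 7.97 + 0.474 * ph * ph - 3.89 * ph


def ferric_from_ph_and_ferrous(ph: float, fe2: float) -> float:
    """Table S13, ferric ion: Fe3+ = 2.45 + 1.61*Fe2+ - 0.562*pH*Fe2+."""
    return 2.45 + 1.61 * fe2 - 0.562 * ph * fe2


def conductivity_from_ferric(fe3: float) -> float:
    """Table S13, electrical conductivity: EC = Fe3+ + 6.14/Fe3+ - 1.44."""
    if fe3 == 0:
        raise ZeroDivisionError("the EC equation is undefined at Fe3+ = 0")
    return fe3 + 6.14 / fe3 - 1.44


def chain_from_ph(ph: float) -> dict:
    """Hypothetical chained use of the three equations: pH -> Fe2+ -> Fe3+ -> EC."""
    fe2 = ferrous_from_ph(ph)
    fe3 = ferric_from_ph_and_ferrous(ph, fe2)
    return {"Fe2+": fe2, "Fe3+": fe3, "EC": conductivity_from_ferric(fe3)}


if __name__ == "__main__":
    # Illustrative input only; the scale of pH used during fitting is not stated.
    print(chain_from_ph(3.0))
```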
Table S14. Functional genes that revealed consistent or fluctuating relative metabolic potentials along the pH gradient.
Category Subcategory Gene Abbreviations pH range
< 2.0 2.0 - 2.5 2.5 - 3.0 3.0 - 3.5 3.5 - 4.0 > 4.0
Carbon cycling Carbon fixation
aclB aclB 90.70 69.28 55.40 50.78 50.64 53.12
CODH CODH 1.69 2.90 20.70 43.06 10.98 5.68
Pcc Pcc 39.31 38.65 35.67 25.02 7.31 93.04
RubisCo RubisCo 83.25 76.79 85.18 90.13 94.13 96.82
Nitrogen cycling Ammonification ureC ureC 65.31 65.42 65.83 67.71 69.93 66.85
Denitrification nirK nirK 62.99 43.57 44.40 55.63 65.92 89.04
Energy process Electron transport
Fe-S cluster binding protein fes 51.68 60.40 69.80 65.28 69.44 57.42
ferredoxin fer 35.28 35.05 36.67 40.66 30.93 69.42
NADH ubiquinone oxidoreductase NADH 61.68 60.40 69.80 65.28 69.44 57.42
terminal quinol oxidase quio 61.62 59.43 69.03 64.40 69.07 56.49
Hydrogenase Ni-Fe hydrogenase Nfhyd 25.32 30.54 31.02 59.18 35.24 30.64
Membrane transport EPS glycosyl transferase glyt 53.69 54.32 62.79 58.25 61.25 50.99
Metal resistance
Ag silA silA 37.01 38.40 40.48 42.32 44.98 42.45
silC silC 41.39 34.30 33.40 33.35 33.75 34.77
As arsA arsA 68.34 73.17 67.11 72.26 72.56 72.83
Cd cadBD cadBD 37.31 28.54 60.33 40.59 37.35 39.88
Cd_Co_Zn
czcA czcA 20.47 27.96 29.40 31.70 29.89 30.43
czcC czcC 64.25 55.37 76.49 71.08 71.55 71.96
czcD czcD 37.50 76.65 80.97 46.80 81.82 93.41
Cu cueO cueO 81.74 89.71 92.89 96.18 96.14 95.79
cusA cusA 81.43 89.55 92.76 96.10 96.06 95.71
Hg
mer mer 49.63 95.97 99.01 99.41 99.76 99.97
merB merB 47.61 64.54 64.63 67.61 64.90 63.49
merP merP 33.26 89.90 97.35 98.43 99.35 99.92
Ni nreB nreB 1.79 7.99 39.71 92.23 64.10 51.87
Pb pbrA pbrA 54.47 82.31 70.83 64.55 68.31 73.56
Te terD terD 53.10 23.97 55.05 60.79 33.66 7.26
Zn zitB zitB 71.92 92.73 70.69 61.81 64.09 60.32
Stress response
Cold cspB cspB 25.88 37.96 33.86 33.09 32.80 32.67
Heat
dnaK dnaK 39.81 86.46 96.86 99.07 99.84 99.98
groEL groEL 31.54 34.88 17.05 39.18 69.70 91.97
grpE grpE 89.23 93.45 94.09 73.13 16.68 5.53
hrcA hrcA 60.98 91.69 73.49 37.05 20.94 60.00
Glucose limitation
bglH bglH 5.11 18.46 31.42 37.51 46.43 50.21
cydA cydA 77.23 26.21 10.05 1.22 9.17 47.09
cydB cydB 74.23 29.90 18.21 9.63 10.60 47.96
narH narH 97.89 89.49 74.27 55.28 32.71 10.63
narI narI 46.69 78.77 99.49 97.20 72.44 52.38
narJ narJ 14.61 49.41 76.50 88.58 95.54 99.10
Oxygen stress
ahpC ahpC 82.83 78.48 82.87 84.35 86.78 88.59
ahpF ahpF 22.95 10.67 18.11 31.81 46.51 55.67
katA katA 63.86 69.28 90.33 98.81 82.59 45.69
katE katE 55.82 51.33 41.53 36.84 31.53 26.21
perR perR 19.20 61.52 87.64 96.08 99.31 99.93
Osmotic stress proX proX 82.17 66.28 89.47 96.66 99.41 99.94
Protein stress clpC clpC 64.00 67.99 76.60 76.35 76.18 76.09
ctsR ctsR 56.15 61.39 70.02 69.74 69.54 69.43
Radiation stress obgE obgE 60.19 76.87 90.13 95.71 98.62 96.95
Antibiotic resistance
Transporter Mex Mex 57.56 55.22 57.93 66.66 60.13 51.09
Beta-lactamases class C beta-lactamase lacC 24.03 21.79 12.08 3.73 24.86 59.47
other category Tet Tet 51.33 50.84 51.78 53.02 52.42 52.27
Van Van 30.01 27.33 27.49 27.46 27.45 27.45
Figure S1. The consensus networks of environmental (a) and taxonomic (b) variables generated by Bayesian network inference.
[Figure S2 panels: observed vs. predicted relative abundance (log10(x+1)) at the phylum (n = 8, R2 = 0.70), order (n = 35, R2 = 0.62) and OTU (n = 14, R2 = 0.52) levels.]
Figure S2. The scatter plots show the cross-validation of predicted and observed values for relative microbial abundances at different taxonomic levels.
[Figure S3 panels: observed vs. predicted functional metabolic potentials (log10(x)) for carbon cycling, nitrogen cycling, phosphorus, sulfur cycling, energy process, membrane transport, antibiotic resistance, metal resistance and stress response genes; R2 ranges from 0.952 to 0.998 across panels.]
Figure S3. The scatter plots show the cross-validation of predicted and observed values for functional metabolic potentials of different functional gene categories.
[Figure S4 panels: Bray-Curtis similarity (Train, Validation, Average) for the eight dominant phyla (a) and for functional gene categories (b).]
Figure S4. Bray-Curtis similarity between predicted and observed values of relative microbial abundances (phylum level, a) and of gene metabolic potentials in different functional categories (models including the relative abundances of microbial phyla, b). The similarity of the overall microbial community composition was calculated from these eight microbial phyla. Average includes both the training and validation data sets. Values are mean ± SE; the significance of each similarity is listed in the supplementary tables.
[Figure S5 panels: relative metabolic potential in each pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0) for sox, aprA, aprB, dsrA and dsrB (a); cspA, groES, bglP, glnA, glnR, arcA, arcB, fnr, oxyR and proV (b); fero, cyt, hyd and ABCt (c); ABCat, MatE, MFS, SMR, lac and lacA (d).]
Figure S5. Changes in the relative metabolic potentials of functional genes involved in sulfur cycling (a), stress response (b), energy process and membrane transport (c) and antibiotic resistance (d) along the pH gradient. The metabolic potentials were normalized to relative values.
[Figure S6 panels: mean signal of metabolic potentials (log10(x)), observed vs. predicted, in each pH range (< 2.0, 2.0-2.5, 2.5-3.0, 3.0-3.5, 3.5-4.0, > 4.0) for aclB, CODH, Pcc and RubisCo (a); phytase, ppk and ppx (b); aprA, aprB, dsrA, dsrB and sox (c).]
Figure S6. Comparison of predicted and observed metabolic potentials of functional gene categories including carbon cycling (a), phosphorus (b) and sulfur cycling (c) along the pH gradient. Values are mean ± SE.
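Figures S6 to S11 summarize each gene as a mean ± SE of log10-transformed signals within six pH ranges. The following is a minimal sketch of that summary. It assumes lower-inclusive bins [lower, upper) at the edges 2.0, 2.5, 3.0, 3.5 and 4.0, so that a sample at exactly pH 4.0 falls in "> 4.0". It also assumes that the log10 transform is applied before averaging. The supplement states neither choice. The helper names are hypothetical.

```python
import bisect
import math
import statistics

# Bin edges and labels follow the pH ranges shown in the figures; edge handling is assumed.
PH_EDGES = [2.0, 2.5, 3.0, 3.5, 4.0]
PH_LABELS = ["< 2.0", "2.0-2.5", "2.5-3.0", "3.0-3.5", "3.5-4.0", "> 4.0"]


def ph_bin(ph: float) -> str:
    """Map a pH value to its range label; bisect_right sends values equal to an edge to the upper bin."""
    return PH_LABELS[bisect.bisect_right(PH_EDGES, ph)]


def mean_se_by_ph(samples):
    """Summarize (pH, raw_signal) pairs as {label: (mean of log10 signal, standard error)}.

    Empty bins are omitted; a bin with a single sample gets SE = NaN.
    """
    groups = {label: [] for label in PH_LABELS}
    for ph, signal in samples:
        if signal <= 0:
            raise ValueError("log10 requires a positive signal")
        groups[ph_bin(ph)].append(math.log10(signal))
    summary = {}
    for label, values in groups.items():
        if not values:
            continue
        mean = statistics.fmean(values)
        # Standard error of the mean: sample standard deviation / sqrt(n).
        se = statistics.stdev(values) / math.sqrt(len(values)) if len(values) > 1 else math.nan
        summary[label] = (mean, se)
    return summary
```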
[Figure S7 panels: mean signal of metabolic potentials (log10(x)), observed vs. predicted, in each pH range for nifH, gdh, ureC, amoA, narG, nirK, nirS, norB, nosZ, nasA, NiR, nirA, nirB, napA and nrfA.]
Figure S7. Comparison of predicted and observed metabolic potentials of nitrogen cycling genes along the pH gradient. Values are mean ± SE.
[Figure S8 panels: mean signal of metabolic potentials (log10(x)), observed vs. predicted, in each pH range for fes, fer, fero, NADH, quio, cyt, hyd and Nfhyd (a); glyt and ABCt (b).]
Figure S8. Comparison of predicted and observed metabolic potentials of functional gene categories including energy process (a) and membrane transport (b) along the pH gradient. Values are mean ± SE.
[Figure S9 panels: mean signal of metabolic potentials (log10(x)), observed vs. predicted, in each pH range for silA, silP, al, aoxB, arsA, arsB, arsC, arsM, cadA, cadBD, czcA, czcC, czcD, corC, chrA, copA, cueO, cusA, mer, merB, merP, nreB, pbrA, pbrT, tehB, terC, terD, terZ, zitB, zntA and silC.]
Figure S9. Comparison of predicted and observed metabolic potentials of metal resistance genes along the pH gradient. Values are mean ± SE.
[Figure S10 panels: mean signal of metabolic potentials (log10(x)), observed vs. predicted, in each pH range for cspA, cspB, dnaK, groEL, groES, grpE, hrcA, bglH, bglP, glnA, glnR, arcA, arcB, cydA, cydB, narH, narI, narJ, ahpC, ahpF, fnr, katA, katE, oxyR, perR, proV, proX, phoA, phoB, pstA, pstB, pstC, pstS, clpC, ctsR and obgE.]
Figure S10. Comparison of predicted and observed metabolic potentials of stress response genes along the pH gradient. Values are mean ± SE.
[Figure S11 panels: mean signal of metabolic potentials (log10(x)), observed vs. predicted, in each pH range for ABCat, MatE, MFS, SMR, Mex, lac, lacA, lacC, Tet and Van.]
Figure S11. Comparison of predicted and observed metabolic potentials of antibiotic resistance genes along the pH gradient. Values are mean ± SE.