DECIPHERING COMPLEX BIOLOGICAL PROCESSES USING GENE COEXPRESSION NETWORKS
By
LIHE LIU
A THESIS PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE
UNIVERSITY OF FLORIDA
2020
© 2020 Lihe Liu
To my family and friends
ACKNOWLEDGMENTS
First and foremost, I would like to express my most profound appreciation to my
advisor, Dr. Francisco Peñagaricano. This work would not have been so novel and
fascinating without his mentorship. He proposed great ideas in the beginning, worked
closely with me during my exploration and gave me valuable advice to accomplish my
research projects. During my master’s program, I have learned not only how to solve
problems but also how to think creatively. I would also like to extend my deepest
gratitude to my committee members, Dr. Samantha Brooks and Dr. Matias Kirst, for their
advice, suggestions and support.
I am incredibly thankful to the colleagues I have worked with in the
Department of Animal Sciences at the University of Florida, especially my labmates:
Dr. Rostam Abdollahi-Arpanahi, Dr. Juan Pablo Nani, Dr. Fernanda Rezende, Dr.
Helder Louvandini, Dr. Rodrigo De Almeida Teixeira, Dr. Laila Talarico Dias, Anil Sigdel,
Rocio Amorin, Hendyel Pacheco, Mariana Nehme Marinho, Simone Da Silva, Eugenio
Jara and Gastón Rijo. Many thanks to the Animal Sciences Graduate Student
Association in the Department of Animal Sciences.
I am deeply indebted to my parents who supported me all the time no matter
which city I am in on the planet.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF FIGURES
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER
1 INTRODUCTION
2 LITERATURE REVIEW
    High Throughput Technologies and Omics Data
        Overview and History
        Gene Expression Profiling and Analysis Methods
        Epigenetic Modifications and Analysis Methods
        Other Types of Omics Data
        Multi-Omics Data Integration and Systems Biology Methods
    Biological Networks
        Biological Networks Overview
        Gene Coexpression Network Analysis
        Unweighted and Weighted Gene Correlation Network Analysis
        Differential Network Analysis
        Limitations of Correlation Networks and Conditional Independencies
    Over-representation Analysis (ORA)
        Fisher’s Exact Test and Hypergeometric Distribution
        ORA Example
        EnrichKit R software
3 DIFFERENTIAL NETWORK ANALYSIS OF BOVINE MUSCLE REVEALS CHANGES IN GENE COEXPRESSION PATTERNS IN RESPONSE TO CHANGES IN MATERNAL NUTRITION
    Introduction
    Materials and Methods
        Ethics Statement
        Animals and Experimental Design
        RNA Extraction, Library Preparation and Sequencing
        RNA-Seq Quality Control and Mapping
        Read Counting, Processing and Normalization
        Gene Coexpression Network Construction
        Module Preservation
        Module Characterization
        DNA Extraction, Library Preparation and Sequencing
        Bisulfite-Seq Quality Control and Mapping
        Differentially Methylated Cytosines and Genes
    Results
        RNA-sequencing Analysis
        Network Construction and Module Identification
        Module Preservation
        Module Characterization
        DNA Methylation Analysis
        DNA Methylation and Module Preservation
        DNA Methylation and Network Properties
    Discussion
4 CONCLUSIONS
LIST OF REFERENCES
BIOGRAPHICAL SKETCH
LIST OF FIGURES
Figure
2-1 Spermatogenesis coexpression network analysis: distributions of node connectivity and node clustering coefficient between maternal diets.
2-2 Functional terms and pathways significantly enriched with genes associated with dairy cow conception under thermal-stress conditions. Four gene annotation databases were analyzed.
3-1 Soft-thresholding powers and module detection.
3-2 Network visualization.
3-3 Permutation-based composite preservation statistics. (A) Summary statistic Zsummary (y-axis) as a function of module size. (B) Summary statistic medianRank (y-axis) as a function of module size.
3-4 Functional characterization of non-preserved modules. Six gene annotation databases were analyzed: Gene Ontology (GO), KEGG, Medical Subject Headings (MeSH), InterPro, Reactome and Molecular Signatures Database.
3-5 Comparison of methylation levels between preserved and unpreserved modules. Methylation level was calculated as differentially methylated cytosines divided by all the cytosines evaluated.
3-6 Relationship between methylation level and intramodular connectivity. Methylation level was calculated as differentially methylated cytosines divided by all the cytosines evaluated.
LIST OF ABBREVIATIONS
ATP         Adenosine Triphosphate
DAGs        Directed and Acyclic Graphs
DEGs        Differentially Expressed Genes
DMCs        Differentially Methylated Cytosines
GN          Gene Network
GO          Gene Ontology
HTO         High Throughput Omics
KEGG        Kyoto Encyclopedia of Genes and Genomes
MeSH        Medical Subject Headings
MSigDB      Molecular Signatures Database
NAD(P)H     Nicotinamide Adenine Dinucleotide Phosphate
NGS         Next Generation Sequencing
ODE         Ordinary Differential Equation
ORA         Over-Representation Analysis
TOM         Topological Overlap Matrix
WGCNA       Weighted Gene Coexpression Network Analysis
Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science
DECIPHERING COMPLEX BIOLOGICAL PROCESSES USING GENE
COEXPRESSION NETWORKS
By
Lihe Liu
August 2020
Chair: Francisco Peñagaricano
Major: Animal Sciences
Coexpression network analysis is a powerful tool to reveal transcriptional
regulatory mechanisms, identify transcription factors, and discover gene functions. It
can also be used to investigate changes in coexpression patterns in response to
environmental insults or changes in experimental conditions. Maternal nutrition is
considered a major intrauterine regulator of fetal developmental programming. The
objective of this study was to investigate structural changes in gene coexpression
networks in the muscle of bull beef calves gestated under diets with or without
methionine supplementation. Both muscle transcriptome and methylome were
evaluated using next generation sequencing. Maternal methionine supplementation
significantly perturbed coexpression patterns in the offspring’s muscle. Indeed, we
found that neither the connection strength nor the connectivity pattern of six modules
(subnetworks) detected in the control diet was preserved in the methionine-rich diet.
Functional characterization revealed that some of the unpreserved modules are
implicated in myogenesis, adipogenesis, fibrogenesis, canonical Wnt/β-catenin
pathway, ribosome structure, rRNA binding and processing, mitochondrial activities,
ATP synthesis and NAD(P)H oxidoreductases, among other functions. The bisulfite
sequencing analysis showed that nearly 2% of all evaluated cytosines were differentially
methylated between maternal diets. Interestingly, there were significant differences in
the levels of gene body DNA methylation between preserved and unpreserved modules.
Overall, our findings provide evidence that maternal nutrition can significantly alter gene
coexpression patterns in the offspring, and changes in DNA methylation mediate some
of these perturbations.
CHAPTER 1 INTRODUCTION
Modern biological research has changed profoundly since the advent of
high throughput technologies (“omics” technologies),
such as genomics, epigenomics, transcriptomics, proteomics and metabolomics,
which enable the concurrent identification and quantification of a large number of
molecular entities within the organism. “Omics” spans an increasingly broad scope of
studies. This considerable amount of data provides innovative insights only if one is
armed with the essential analytical techniques and computing resources to integrate
and disentangle multiple layers of information. Systems biology intends to recognize a
whole cellular system as a working unit, as opposed to investigating the elements as
isolated individuals; this perspective emphasizes holism over reductionism.
Systems biology approaches, e.g., multi-omics data integration, usually incorporate
mathematics, statistical modeling and computational techniques to mine vast amounts
of data. In biology research, perturbation experiments play a vital role in investigating
gene functions and regulatory pathways. The traditional differential expression (DE)
analysis tests genes individually; however, genes never work in isolation and rarely
have a solitary function. Coexpression network analysis is a powerful tool to
reveal transcriptional regulatory mechanisms, identify transcription factors, and discover
gene functions, as well as to characterize changes in global coexpression patterns in
response to external perturbations. Over-Representation Analysis (ORA) is a commonly
used test to determine whether known functional pathways or protein families are
over-represented (enriched) in a gene list obtained from an experiment. The combination of
coexpression network modeling and over-representation analysis greatly facilitates the
identification and functional interpretation of genes disturbed by treatments.
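As a minimal sketch of the over-representation test mentioned above, the enrichment p-value can be computed as the upper tail of a hypergeometric distribution. The function name `ora_p_value` and all gene counts below are hypothetical values chosen for illustration; they are not results from this study.

```python
from math import comb


def ora_p_value(n_background, n_in_term, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_background: genes in the background set
    n_in_term:    background genes annotated to the functional term
    n_selected:   genes in the list of interest
    n_overlap:    genes of interest that are annotated to the term
    """
    max_overlap = min(n_in_term, n_selected)
    total = comb(n_background, n_selected)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 20,000 background genes, 200 annotated to a term,
# 500 genes in the list of interest, 15 of which carry the annotation.
p = ora_p_value(20000, 200, 500, 15)
print(f"ORA p-value: {p:.3g}")
```

Under these assumed counts the expected overlap by chance is 5 genes, so an observed overlap of 15 yields a small p-value. In practice such p-values would also need to be corrected for testing many terms.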
In this study, we evaluated the impact of maternal methionine supplementation
on gene coexpression patterns in the offspring’s muscle tissue. Both muscle
transcriptome and methylome were examined using next-generation sequencing
technologies. Changes in gene coexpression patterns between maternal control and
methionine-rich diets were evaluated using differential gene coexpression network
analysis. The disturbed (sub)networks were functionally characterized using over-
representation analysis. Additionally, we investigated the link between network
perturbation and DNA methylation. Our results provide evidence that maternal nutrition
can significantly perturb gene coexpression patterns in the offspring, and some of these
changes might be mediated by alterations in DNA methylation.
CHAPTER 2 LITERATURE REVIEW
High Throughput Technologies and Omics Data
Overview and History
In 1977, the “first-generation” DNA sequencing technology, the chain-termination
method, was developed by Frederick Sanger and colleagues, and it was
subsequently adopted as the predominant methodology in both laboratory applications and
commercial sequencing platforms. In 2001, this method was used to produce the initial
sequence of the human genome (Lander et al. 2001; Venter et al. 2001),
and the complete human genome was released soon after, in 2004 (International Human
Genome Sequencing Consortium 2004). At the same time, the genome sequences of
several other model organisms were published (Mouse Genome Sequencing
Consortium 2002; Gibbs et al. 2004; Chimpanzee Sequencing and Analysis Consortium
2005). Despite this unprecedented progress, “first-generation” sequencing technology
has limitations in throughput and cost. Consequently, the development of more powerful
sequencing approaches was stimulated in order to improve speed and precision while
reducing cost and labor. For example, the 454 platform was launched by 454 Life
Sciences in 2005, and Solexa released the Genome Analyzer and Agencourt developed
SOLiD (Sequencing by Oligo Ligation Detection) in 2006. These alternatives
offered improved precision and throughput; consequently, they became the
three most widely used massively parallel sequencing systems of the next-generation
sequencing (NGS) era. Afterward, these originator companies were acquired by others:
Agencourt was acquired by Applied Biosystems in 2006, 454 was purchased by Roche in 2007,
and Solexa was later acquired by Illumina. Inspired by the goal of cost-efficient and high-throughput
analysis approaches, technological advances have been made in other “-omics” fields
besides genomics. For instance, the “expression array” was introduced in the late
1990s; soon after, refined array technologies proved capable of
measuring the expression levels of all protein-coding transcripts in a particular tissue.
In the early 2000s, the mapping of expression quantitative trait loci (eQTL, loci
controlling gene expression) was first accomplished with array technologies. Also, Nobel
Prize winners John Fenn and Koichi Tanaka developed two ionization techniques,
electrospray (Whitehouse et al. 1985) and MALDI (matrix-assisted laser desorption
ionization) mass spectrometry (Karas et al. 1989), which made it possible to
vaporize, ionize and analyze relatively large, non-volatile
biomolecules such as proteins and peptides. Gradually, numerous other “omics”
technologies were developed that can interrogate entire pools of
transcripts, proteins, metabolites and the genome.
Empowered by such new technologies, “omics” studies are gaining
ever-increasing popularity. Here the term “omics” is a suffix denoting the assessment of
the complete complement of a given level of biological information flow. In these
scenarios, high-throughput (HT) screening techniques are used to produce large
amounts of “omics” data and to provide a system-level understanding of associations and
dependencies between molecular entities. “Omics” spans an increasingly broad
scope of studies, particularly genomics (the quantitative investigation of a complete set
of genetic sequences), transcriptomics (the measurement of all the messenger RNA
produced in a particular cell), proteomics (the study of the entire set of proteins,
focusing on protein structures, functions and abundance) and metabolomics
(metabolites and metabolic networks).
Gene Expression Profiling and Analysis Methods
Gene expression is the spatial and temporal pattern of a gene product. The
transcriptome is the complete collection of mRNA readouts, or “transcripts”, present
in one cell or a group of cells. Transcriptomics, also known as gene expression profiling,
measures the expression level of transcripts in a specific cell and in a particular state.
Gene regulation can induce phenotypic changes in animals by changing the
relationships among proteins and metabolites. The DNA microarray is one reliable
approach to assess the abundance of transcripts: nucleic acid probes are
covalently bound to microarray slides, and fluorescently labeled target sequences
are then hybridized and scanned. The signal intensities in the scanned images
represent expression information (Mantione et al. 2014). As an
alternative to microarray technology, RNA-sequencing (RNA-Seq) does not require
species- or transcript-specific probes and can detect novel transcripts and genetic
variants, as well as quantify isoform, exon, allele and
haplotype expression.
The well-known differential expression analysis (DE analysis)
(Kadarmideen et al. 2007) compares individual gene expression
levels across two conditions, e.g., normal versus diseased or high-performance versus
low-performance animals, and identifies genes showing statistically significant
differences. Transcriptomic studies can measure the effect of external perturbations and
compare different tissues at the gene expression level, thus providing novel insights into
the health, production traits and behaviors of animals. In addition, transcriptomic studies
also investigate interactions among genes by establishing global coexpression networks
that focus on the identification of coexpressed gene modules, where coexpression
suggests co-regulation (functional similarities) of genes involved in the same biological
processes.
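The gene-by-gene comparison across two conditions can be sketched as follows. The text does not specify a particular test, so this example assumes an exact two-sided permutation test on the difference in group means as a simple stand-in. The gene names and expression values are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_difference(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_difference(sum(group_a))
    extreme = 0
    n_splits = 0
    # Enumerate every way of relabeling the samples into two groups.
    for idx in combinations(range(n_a + n_b), n_a):
        relabeled = abs_mean_difference(sum(pooled[i] for i in idx))
        if relabeled >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
        n_splits += 1
    return extreme / n_splits


# Hypothetical log2 expression values: (control samples, treated samples).
expression = {
    "gene_1": ([5.1, 4.9, 5.3, 5.0], [6.8, 7.1, 6.9, 7.3]),
    "gene_2": ([5.0, 5.2, 4.8, 5.1], [5.1, 4.9, 5.0, 5.2]),
}

# Each gene is tested on its own, independently of all other genes.
p_values = {gene: permutation_p_value(a, b) for gene, (a, b) in expression.items()}
for gene, p in p_values.items():
    print(gene, round(p, 4))
```

Because every gene is tested separately, this kind of analysis ignores relationships among genes, which is the gap that coexpression network approaches aim to fill. A real analysis would also adjust the p-values for multiple testing.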
Epigenetic Modifications and Analysis Methods
Originally defined as “the branch of biology which studies the causal interactions
between genes and their products, which bring the phenotype into being” (Waddington
1942), epigenetics now refers to changes in gene
expression or cellular phenotype that are stable and heritable but do not stem from
changes in the Watson-Crick base-pairing of DNA (Goldberg et al. 2007). Epigenetic
variation contributes to phenotypic variation through chromatin marks at the gene level,
which can affect local transcriptional potential and cause changes in the expression of
genes that underlie the trait. So far, the most comprehensively characterized
chemical modification of chromatin is DNA methylation. This modification
features the covalent addition of a methyl group to the C5 position of the
pyrimidine ring of cytosine, which converts cytosines to
5-methylcytosines (5meC) through the action of DNA methyltransferases (DNMTs). This is a
common epigenetic mark in eukaryotes and it often occurs in the CpG or CpHpG
(H = A, T, C) context. DNA methylation at gene regulatory regions, i.e.,
promoters, is usually believed to have a repressive effect on gene expression (Laird
2010).
A wide range of strategies is employed in various DNA methylation profiling
approaches. DNA methylation status determination could be local or global. In genome-
wide DNA methylation mapping, NGS technologies have greatly influenced the
strategies utilized, for example, 5mC-specific immunoprecipitation (IP),
methylation-sensitive restriction enzyme (MSRE) digestion, sodium bisulfite modification (SBM)
and third-generation sequencing (TGS) technologies (Marzese et al. 2015). Regardless of the
starting method, either microarray or NGS technologies can subsequently be applied to
evaluate DNA methylation status. Among these methods, the most effective approach
to assess methylation status is DNA bisulfite conversion combined with NGS, which
reaches single-base resolution; hence, it yields a comprehensive measurement of DNA
methylation at almost every cytosine in the genome (Marzese et al.
2015). Whole-genome bisulfite sequencing (WGBS) generates the most comprehensive
DNA methylation maps; however, it is very costly, and the subsequent data
analysis remains challenging. As such, different adaptations of WGBS have been
applied, for example, reduced representation bisulfite sequencing (RRBS) (Meissner et
al. 2005) and bisulfite sequencing of padlock probes (BSPP) (Diep et al. 2012).
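To make single-base resolution concrete: bisulfite treatment converts unmethylated cytosines to uracil, which is read as thymine after sequencing, whereas methylated cytosines remain cytosine. The methylation level at one site can therefore be estimated from read counts, as in this minimal sketch. The site coordinates and counts are hypothetical.

```python
def methylation_level(methylated_reads, unmethylated_reads):
    """Fraction of reads supporting methylation at one cytosine.

    methylated_reads:   reads still showing C after bisulfite conversion
    unmethylated_reads: reads showing T (converted, unmethylated C)
    Returns None when the site has no coverage.
    """
    coverage = methylated_reads + unmethylated_reads
    if coverage == 0:
        return None
    return methylated_reads / coverage


# Hypothetical per-site counts: (chromosome, position) -> (C reads, T reads)
sites = {
    ("chr1", 1005): (18, 2),
    ("chr1", 1020): (3, 27),
    ("chr1", 1044): (0, 0),
}

levels = {site: methylation_level(c, t) for site, (c, t) in sites.items()}
for site, level in levels.items():
    print(site, level)
```

Sites with low coverage give unstable estimates, so analyses commonly apply a minimum coverage filter before comparing conditions.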
Other Types of Omics Data
Proteomics aims to characterize the whole collection of proteins in a cell, tissue
or living organism (Haoudi et al. 2006). Proteomics topics fall into
three primary areas: (1) extensive identification of proteins and protein
post-translational modifications; (2) comparative study of protein levels and identification of
differentially displayed proteins across conditions, e.g., health and disease; and (3)
identification of protein-protein interactions using techniques such as the yeast two-hybrid
system or MS (mass spectrometry) (Pandey et al. 2000). At the center of
functional proteomics analysis is the identification of the components of a protein complex.
Note that it is often challenging to make functional predictions for a protein with only the
homology information or even three-dimensional structure available (Pandey et al.
2000).
Metabolomics studies focus on characterizing the global metabolite profiles in a
cell. Typically, all the metabolites with all chemical natures, e.g., amino acids,
antibodies, aptamers, small biomolecules, etc., will be examined (Rochfort 2005).
Metabolomics can significantly facilitate the in vivo metabolic network modeling because
it provides not only a descriptive data source but a quantitative data source of
intracellular metabolites (Suravajhala et al. 2016).
“Omics” spans an increasingly broad scope of studies, which include general
topics, for example, genomics, transcriptomics, proteomics, and metabolomics, as well
as more specific topics regarding post-genomic biological research and modern medical
science. Such studies include, but are not limited to, pharmacogenomics (the quantitative
study of the relationship between genetic factors and the host response to drugs),
physiomics (dynamics and functions of cell physiology), nutrigenomics (the quantitative
study of the relationship between genetic factors and the organism’s response to diet),
phylogenomics (combining evolution and genomic information) and interactomics
(networks of molecule interactions) (Schneider et al. 2011).
Multi-Omics Data Integration and Systems Biology Methods
The advent of “omics” technologies has empowered biological sciences to
identify and quantify the molecular components of a cell at different layers with
ever-increasing speed and precision. “Omics” technologies and the subsequent analysis
techniques have significantly increased the speed and volume of information we can
acquire about cellular components, enabling a comprehensive multi-level characterization
of molecular entities. However, it is not appropriate to assume a linear combination of
the biological properties of cellular components determines the trait we observe.
What underlies the “trait system” is the collaboration of numerous interacting biological
entities at different levels, for example, the genome, transcriptome, metabolome and
proteome, and even beyond. This has required new predictive models and
computational approaches for investigating how the genome itself and various external
conditions may impact these functional associations and the observable behaviors of
the “trait system”. This holistic multi-omics integration approach could provide valuable
insights.
The term “systems biology” refers to a set of strategies and techniques that
endeavor to decipher complex biological processes by integrating cellular information
from different levels. Such an approach relies on the acquisition of observations of huge
numbers of cell components and the appropriate modeling of the interactions among
them (Baitaluk et al. 2006; Bruggeman et al. 2007). This approach usually incorporates
mathematics, statistical modeling and computational techniques to integrate vast
amounts of data so as to disentangle complex biological processes and predict the
behaviors of cellular systems. From a multi-omics integration point of view, this
approach takes into consideration data from different sources and tries to link isolated
“omics” levels with functional annotation of genes and pathways; thus, all “omics” levels can be
related to cell phenotypes. For example, Suravajhala et al. (2016) proposed that “a
typical data integration process goes from genome → epigenome → transcriptome →
metabolome → proteome → phenotype or disease.”
Biological Networks
Biological Networks Overview
Systematic investigations of large groups of molecules, e.g., genes, proteins,
metabolites and their modifications, have elucidated the interaction and
regulation of complex molecular networks, thus explaining the functionalities of complex
cellular processes. With prominent topological and hierarchical properties, these
complex biological networks are distinct from random networks and can
potentially provide innovative insights in deciphering fundamental mechanisms
underlying biological processes and disease pathologies. To date, several
kinds of biological networks have been extensively studied and applied, for
example, genetic interaction networks, protein-protein interaction networks,
transcription factor binding networks, metabolic interaction networks and protein
phosphorylation networks (Zhu et al. 2007). Network objects (variables/molecules) are referred to
as “nodes” and interactions as “edges”. Network edges are usually represented as
binary measurements (unweighted networks); however, different strengths can also be
assigned to edges (weighted networks). The directions of edges depend
on the logic of the network and usually imply causality. Undirected networks are often
used to characterize protein-protein and genetic interactions, while directed
networks are usually used to depict transcription factor binding networks,
metabolic interaction networks and protein phosphorylation networks.
Network topology is essential in understanding network architecture, and different
statistics can be used to characterize the topological properties of a node, such as
connectivity, betweenness, clustering coefficient, shortest path length, etc. Biological
networks are found to have a “scale-free” organization rather than a random one, where
the connectivity distribution, i.e., the distribution of the number of directly connected nodes,
follows a power law (Barabasi et al. 2003). As a result, the distribution of connectivity
has a “fat-tailed” characteristic, where the nodes with high connectivity are termed
“hubs”. Specifically, a biological network contains a small number of hubs with many
connections and a large number of nodes that have relatively few connections. This
type of organization is highly robust; for example, the random loss of a single
non-hub node in a scale-free network will induce less disruption than in a random
network (Zhu et al. 2007). Presumably, a hub component in a scale-free network
is relatively more involved in the process and therefore usually plays a vital role in the
interacting system, for example, a transcription factor in a gene-gene interaction
network.
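Two of the node statistics named above, connectivity and clustering coefficient, can be illustrated on a small undirected, unweighted network. The following is a minimal sketch on a hypothetical toy graph in which a node labeled "TF" acts as the hub. The node names and edges are invented for illustration.

```python
from itertools import combinations


def connectivity(adjacency):
    """Number of directly connected neighbors for every node."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))


# Hypothetical toy network: "TF" is linked to five genes.
edges = [
    ("TF", "g1"), ("TF", "g2"), ("TF", "g3"), ("TF", "g4"), ("TF", "g5"),
    ("g1", "g2"), ("g5", "g6"),
]

# Build an undirected adjacency structure (each edge stored in both directions).
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

degree = connectivity(adjacency)
hub = max(degree, key=degree.get)
print("hub:", hub, "connectivity:", degree[hub])
print("clustering coefficient of hub:", clustering_coefficient(adjacency, hub))
```

In this toy graph the hub has the highest connectivity but a low clustering coefficient, because most of its neighbors are not connected to each other.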
Gene Coexpression Network Analysis
With a large volume of gene expression data becoming available, gene networks
(GNs) have become a powerful method for processing, interpreting and
predicting interactions between genes. Since genes never work in isolation and rarely
have a single role, network theory can greatly facilitate the modeling of complicated
cellular processes. A typical gene network analysis starts with the gene measurement
input, which usually consists of messenger RNA expression data that can either be obtained
from the NCBI Gene Expression Omnibus (GEO) (Barrett et al. 2013) or generated in a
given experiment. After data acquisition, the data may be pre-processed using different
computational (quality control) methods to improve the quality of the input.
Then, the preferred computational algorithms (machine learning approaches) are
applied to the preprocessed datasets in order to infer one or multiple networks. Finally,
the obtained network is optimized and validated using a single network approach (focus
on mechanisms); alternatively, networks from different conditions can be compared
using a differential network approach (focus on changes), thus revealing network
topological differences/changes found in different tissues or induced by external
perturbation, e.g., diseases, nutrition. Gene network construction using gene expression
levels as variables is intuitive and preferred since transcription is recognized as the
primary mechanism underlying gene expression regulation (Lappalainen et al. 2013).
The choice of a gene network reconstruction algorithm depends on the
logic underlying the gene network and shapes the behavior of the resulting
network. The main gene network inference approaches include, but are not
limited to: (i) information theory, (ii) Boolean networks, (iii) ordinary differential equations
(ODEs), (iv) Bayesian networks and (v) neural models (Delgado et al. 2019).
Information theory-based networks, also known as coexpression networks, use
pairwise correlation coefficients, for example, Pearson, Spearman or Kendall coefficients,
as the primary information for inferring dependencies between genes. Other
types of pairwise similarity measures, such as Euclidean distance or mutual information,
can also be applied in gene network reconstruction. These models have
obvious advantages: they are intuitive, less computationally demanding and require
only a small number of samples. As an example, 14 hub genes were successfully identified in the
regulation of postmenopausal osteoporosis (Wang et al. 2020).
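The correlation-based construction described above can be sketched as follows: compute the absolute Pearson correlation for every gene pair, then either apply a hard cutoff (unweighted network) or raise the correlation to a power (weighted network). The cutoff of 0.8, the power of 6, the gene names and the expression values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient (assumes both profiles vary)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def coexpression_network(expression, hard_threshold=0.8, soft_power=6):
    """Return unweighted (0/1) and weighted edge dictionaries for all gene pairs."""
    genes = sorted(expression)
    unweighted, weighted = {}, {}
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            r = abs(pearson(expression[g1], expression[g2]))
            unweighted[(g1, g2)] = 1 if r >= hard_threshold else 0
            weighted[(g1, g2)] = r ** soft_power
    return unweighted, weighted


# Hypothetical expression profiles of three genes across five samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.8],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}

unweighted, weighted = coexpression_network(expression)
for pair in unweighted:
    print(pair, unweighted[pair], round(weighted[pair], 4))
```

The hard cutoff discards all information about how strongly two genes are correlated, while the power transformation keeps a continuous edge strength and pushes weak correlations toward zero.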
In Boolean networks, different variables represent genes and Boolean binary
values are used to discretize expression levels, where silenced genes are annotated as
0 and activated genes are annotated as 1 (Melkman et al. 2018). Although the Boolean
network approach was proven useful (Orlando et al. 2008; Moignard et al. 2015; Simak
et al. 2017), its main limitation lies in the discretization procedure: gene expression is
usually considered a continuous variable and is rarely fully activated or
fully silenced, so such discretization results in massive information loss. Also, Boolean
networks are not robust to noisy data (Maheshri et al. 2007).
Ordinary differential equation (ODE) methods use continuous variables instead of
binary variables to measure gene expression. They characterize changes in the
expression level of a gene as a function of the expression levels of other genes,
and they can also take external factors into account. This approach yields models with
higher precision and facilitates the quantitative dynamic modeling of gene regulation
relationships (Chai et al. 2014). ODE models utilize continuous-time variables with
constraints and prohibit negative values (Chai et al. 2014). One of the disadvantages of
the ODE approach is that only the linear relationships or few specific kinds of non-linear
relationships are taken into consideration (Voit 2000; Hecker et al. 2009), while
biological regulations are often depicted as complex non-linear dynamics. Moreover, the
computational complexity of ODE models limits their ability to decipher large gene
networks and the value estimation for model parameters is challenging (Delgado et al.
2019).
Bayesian GNs combine Bayes' theorem of probability with graph theory, which
makes it feasible to qualitatively measure the properties of GNs
(Chai et al. 2014). Generally, Bayesian networks are directed acyclic graphs
(DAGs), which represent the joint distribution of the nodes as a set of local node
probability distributions (Kaderali et al. 2008). The main advantage of
Bayesian networks is the robustness of GN reconstruction since they have great
adaptability to combine various sources of data and prior knowledge. The Bayesian
network approach has been applied in numerous areas extending from evolutionary
development to medicine (Acerbi et al. 2014; Chekouo et al. 2015; Chudasama et al.
2018).
Two main models are used in applications of neural models: Artificial
Neural Networks (ANN) and Recurrent Neural Networks (RNN). The first model is
completely neural while the second model involves fuzzy logic (Biswas et al. 2016).
RNNs are capable of modeling non-linear and dynamic pairwise gene relationships
(Kordmahalleh et al. 2017). Neural models can fit continuous variables and their
outputs resemble the neural connections that can be observed in
natural processes. For a particular gene, the model can consider it as the source of
regulatory impact and compute a weighted sum of all potential regulating genes. Also, a
scoring function is used to evaluate the results, for example, for network performance
optimization and error minimization.
Unweighted and Weighted Gene Correlation Network Analysis
Correlation network approaches have been widely applied in transcriptomics
studies. Typically, gene expression is measured as the total amount of mRNA corresponding
to each gene. In most scenarios, gene coexpression networks are reconstructed at the gene
level; however, expression can also be measured at the probe level in microarray
applications. Gene expression data are usually pre-processed/normalized using one of
many available methods. As a result, the normalized gene expression measurements
are stored and communicated in an m × n dimensional matrix exprDatX whose ith
column xi corresponds to the ith gene (or probe) expression profile (across samples).
The kth component of xi represents the measurement in the kth sample. The correlations
between gene expression profiles across samples indicate gene expression similarities.
Inspired by the "guilt-by-association" heuristic, the highly correlated expression profiles
of two genes may indicate their functional associations, e.g., corresponding to the same
protein complexes, involved in the same pathways or participating in the same
regulatory and signaling system (Eisen et al. 1998).
An unweighted coexpression network is established by applying a cutoff value to
the absolute values of the correlation matrix. To construct an unweighted network, an
n × n dimensional matrix $R = (r_{ij})$ with pairwise correlations as entries is computed
using a correlation coefficient, e.g., Pearson's correlation. By applying a threshold to
$|r_{ij}|$, the correlation matrix is converted to an adjacency matrix $A = (a_{ij})$, where
entries are either 0 or 1. Finally, an unweighted network is established using the
pairwise connection information obtained from the adjacency matrix, where two
genes i and j are either connected ($a_{ij} = 1$) or disconnected ($a_{ij} = 0$). Such a dichotomous
approach (hard thresholding) can potentially result in information loss; hence,
approaches that preserve the continuous nature of gene expression
are preferred in some cases.
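The hard-thresholding step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy expression values, the helper names `pearson` and `unweighted_adjacency`, and the cutoff name `tau` are hypothetical and do not come from any specific software package.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def unweighted_adjacency(profiles, tau):
    """Set a_ij = 1 when |r_ij| >= tau, else 0; profiles[i] is gene i across samples."""
    n = len(profiles)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pearson(profiles[i], profiles[j])) >= tau:
                adj[i][j] = adj[j][i] = 1
    return adj

# Toy data (hypothetical): three genes measured in five samples.
profiles = [
    [1.0, 2.0, 3.0, 4.0, 5.0],  # gene 0
    [2.1, 3.9, 6.2, 8.0, 9.8],  # gene 1, rises together with gene 0
    [3.0, 1.0, 4.0, 1.0, 3.5],  # gene 2, only weakly related to the others
]
adjacency = unweighted_adjacency(profiles, tau=0.8)
```

With this toy input, only genes 0 and 1 end up connected. The weak correlations involving gene 2 are discarded entirely, which is the information loss that motivates weighted networks.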
Weighted Gene Coexpression Network Analysis (WGCNA) is a commonly
applied unsupervised learning approach in transcriptomics studies, especially for
correlation-based biological networks (Zhang et al. 2005; Langfelder et al. 2008). A
weighted coexpression network can be defined by a weighted adjacency matrix,
and it can adequately characterize pairwise gene relationships while preserving
the continuous nature of the expression data. Besides this, the weighted network approach
has other advantages: (i) weighted networks focus on clusters of genes (modules)
instead of single genes; this approach is more intuitive than testing genes individually
because genes never work in isolation, and it dramatically alleviates the
problem of multiple testing; (ii) the network is highly robust to the choice of soft threshold;
(iii) such networks enable a geometric interpretation of the relationships detected; (iv)
network statistics can enhance standard data-mining approaches and intuitively
characterize the network properties; and (v) a parsimonious parametrization is feasible
because weighted networks can be approximated by "factorizable"
networks, while sparse (unweighted) networks cannot. Additionally,
the WGCNA algorithm provides a set of intuitive and persuasive module preservation
statistics that can be used to statistically quantify the cross-condition preservation status
of a given module or subnetwork. Module preservation is an essential step of the differential
network analysis approach, which relies on module preservation statistics
because they facilitate statistical significance inference and the biological
interpretation of network topological changes.
Differential Network Analysis
In the scenario where samples are collected from two or more different
groups/conditions, a differential network analysis is applied and comparisons of global
gene coexpression profiles can be made. This approach compares the set of
interconnections of genes across conditions, thus demonstrating the changes in network
topology due to external perturbations. In our previous work (Louvandini et al. 2020), we
performed a differential network (unweighted) analysis in order to compare the process
of spermatogenesis in lambs exposed to either a maternal control diet or a maternal
gossypol-rich diet. A total of 145 genes that showed expression in the RNA-Seq data
and belong to the Gene Ontology (GO) term spermatogenesis (GO:0007283) were
evaluated. For each maternal diet, a correlation matrix $R = (r_{ij})$ with dimensions
145 × 145 was constructed using Pearson correlation coefficients. Each correlation
matrix was then translated into an adjacency matrix $A = (a_{ij})$, a 145 × 145 matrix with
entries either 0 or 1. Here, if $|r_{ij}| \geq 0.5$ and P-value ≤ 0.05, then $a_{ij} = 1$; otherwise $a_{ij} = 0$.
Finally, for each maternal diet, an unweighted network was constructed based on the
adjacency matrix, where two genes i and j were either connected ($a_{ij} = 1$) or
disconnected ($a_{ij} = 0$). The structure and topology of each network were evaluated
using node connectivity and node cluster coefficient (Langfelder et al. 2011). The
connectivity $k_i$ of gene $i$, defined as $k_i = \sum_{j \neq i} a_{ij}$, measures the connection strength of
gene $i$ with all the other genes. The cluster coefficient $C_i$ of gene $i$, defined as
$$C_i = \frac{\sum_{j \neq i} \sum_{k \neq i,j} a_{ij} a_{jk} a_{ki}}{\left(\sum_{j \neq i} a_{ij}\right)^2 - \sum_{j \neq i} (a_{ij})^2},$$
represents the local density of a network, with $0 \leq C_i \leq 1$.
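As a rough illustration of the two node statistics defined above, the following Python sketch computes connectivity and cluster coefficient directly from an adjacency matrix. The four-gene toy network and the function names are hypothetical, and returning 0 when the denominator is zero (a gene with fewer than two neighbors) is a convention assumed here, not one stated in the text.

```python
def connectivity(adj):
    """k_i: sum over j != i of a_ij."""
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]

def cluster_coefficient(adj):
    """C_i as defined above; assumed to be 0 when the denominator is 0."""
    n = len(adj)
    coeffs = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Numerator: sum over j != i and k != i, j of a_ij * a_jk * a_ki.
        num = sum(adj[i][j] * adj[j][k] * adj[k][i]
                  for j in others for k in others if k != j)
        total = sum(adj[i][j] for j in others)
        squares = sum(adj[i][j] ** 2 for j in others)
        den = total ** 2 - squares
        coeffs.append(num / den if den > 0 else 0.0)
    return coeffs

# Toy network (hypothetical): genes 0, 1 and 2 form a triangle,
# and gene 3 is connected only to gene 0.
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
k = connectivity(adj)
c = cluster_coefficient(adj)
```

In this toy network, gene 0 has three neighbors of which only one pair is connected, so its cluster coefficient is 1/3, whereas genes 1 and 2 sit in a fully connected neighborhood and reach 1.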
Figure 2-1 shows the distribution of the node connectivity and node cluster
coefficient, two classical network preservation statistics, in each maternal diet.
Interestingly, the exposure to gossypol in utero and throughout lactation caused
noticeable changes in the coexpression patterns of genes directly implicated in
spermatogenesis. Indeed, the maternal gossypol-rich diet altered the spermatogenesis
network decreasing both gene connectivity and gene clustering. The gene coexpression
network analysis revealed that the maternal exposure to gossypol significantly altered
the coexpression patterns of spermatogenesis-related genes, suggesting that gossypol
might disrupt gene coregulation mechanisms.
Limitations of Correlation Networks and Conditional Independencies
Conventionally, gene correlation networks are obtained by learning the pairwise
correlations of gene expression profiles. However, it should be noted that this
approach fails to distinguish direct relationships (which persist after conditioning on
other genes) from indirect ones (which appear only marginally) based on the observed
expression similarities. The resulting networks do not capture the regulatory
relationships because the observed undirected relationships
can be explained by multiple regulatory mechanisms, and we have no way to choose
between them from expression data alone using correlation coefficients (Markowetz et
al. 2007). However, some statistical methods based on the concept of conditional
independencies can be applied to reveal the most likely possibilities (Dawid 1979). Let
$X$, $Y$, $Z$ be random variables with a joint distribution $P$. If and only if
$$P(X = x, Y = y \mid Z = z) = P(X = x \mid Z = z) \cdot P(Y = y \mid Z = z)$$
for all values $x$, $y$, $z$, we can conclude that $(X \perp Y \mid Z)$, which means that $X$ is conditionally independent of $Y$
given $Z$. Note that this can also be written as
$$P(X = x \mid Y = y, Z = z) = P(X = x \mid Z = z),$$
which is a direct generalization of the independence condition for $X$ and $Y$,
$$P(X = x, Y = y) = P(X = x) \cdot P(Y = y).$$
The same conclusions hold if we condition on a set of variables $\mathbf{Z}$ instead of a single
variable $Z$. This gives us a chance to reveal the true regulatory relationships
between two genes while controlling for the effect of other genes. Using this approach, we
can retrieve causal relationships between genes instead of obtaining sets of undirected
coexpression connections.
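The difference between marginal and conditional dependence can be made concrete with a small simulation. The sketch below assumes Gaussian data, for which a zero partial correlation corresponds to conditional independence. It uses the first-order partial correlation as one possible conditional measure, which is an illustrative choice and not necessarily the method used in the cited works. The variable roles (a regulator `z` driving two targets `x` and `y`) and all numeric settings are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y given z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

rng = random.Random(1)  # fixed seed so the sketch is reproducible
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]   # hypothetical regulator
x = [zi + rng.gauss(0, 0.5) for zi in z]  # first target, driven by z
y = [zi + rng.gauss(0, 0.5) for zi in z]  # second target, driven by z

marginal = pearson(x, y)             # strong: x and y share the regulator z
conditional = partial_corr(x, y, z)  # close to zero once z is accounted for
```

A correlation network would connect `x` and `y` because their marginal correlation is high, even though the two are related only through `z`. Conditioning on `z` removes that indirect edge.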
Over-representation Analysis (ORA)
Fisher's Exact Test and Hypergeometric Distribution
Named after its inventor Sir Ronald Fisher, Fisher's exact test is one of a class of
exact tests commonly used in the analysis of contingency tables, where categorical data
are generated by classifying objects in two different ways. By applying this test, the
significance level (e.g., P-value) of the association (contingency) between the two types of
classification can be calculated exactly instead of relying on an approximation.
In the scenario of functional characterization, Over-Representation Analysis
(ORA) (based on Fisher's exact test) is a broadly utilized analysis to decide whether
known cellular functions or biological processes are over-represented (enriched) in an
experimentally derived gene list, e.g., differentially expressed genes (DEGs), genes
located in an unpreserved coexpression module or genes flagged by significant SNPs
from whole-genome scans. Given a specific biological pathway, the P-value of
observing $k$ significant genes in the pathway can be calculated using the hypergeometric
distribution
$$P\text{-value} = 1 - \sum_{i=0}^{k-1} \frac{\binom{S}{i}\binom{N-S}{m-i}}{\binom{N}{m}}$$
where $S$ refers to the total number of significant genes, $N$ refers to the total number of genes
tested and $m$ refers to the number of genes in the pathway/gene-set of interest.
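The hypergeometric tail probability above translates directly into code. The following is a minimal Python sketch using only the standard library; the function name `ora_pvalue` and the toy counts are hypothetical, and the argument names follow the symbols in the formula (k significant genes in the gene-set, S significant genes in total, N genes tested, m genes in the gene-set).

```python
from math import comb

def ora_pvalue(k, S, N, m):
    """Probability of observing at least k significant genes in a gene-set of size m."""
    if not (0 <= k <= min(S, m) and S <= N and m <= N):
        raise ValueError("inconsistent counts")
    # Sum the hypergeometric probabilities of observing 0, 1, ..., k-1 significant genes.
    lower_tail = sum(comb(S, i) * comb(N - S, m - i) for i in range(k))
    return 1 - lower_tail / comb(N, m)

# Toy example (hypothetical counts): 10 genes tested, 4 of them significant,
# and a gene-set of 5 genes that contains 3 of the significant ones.
p = ora_pvalue(k=3, S=4, N=10, m=5)
```

For this toy input the result is 66/252, roughly 0.26, so a gene-set this small would not be called enriched. Real analyses involve thousands of genes and typically a correction for testing many gene-sets.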
ORA Example
In our recent study (Sigdel et al. 2020), we applied ORA to a list of candidate
genes associated with cows' ability to conceive under thermal stress conditions. In this
analysis, four different annotation databases were explored, including GO (Ashburner et
al. 2000), Medical Subject Headings (Nelson et al. 2004), InterPro (Mitchell et al. 2019)
and Reactome (Jassal et al. 2020). As a result, at least five different groups of gene-sets
were identified; these functional terms are related to fertilization, development, heat
shock proteins, cellular response to oxidative stress, and calcium ion homeostasis.
Figure 2-2 shows the most relevant functional terms and biological pathways enriched
with thermotolerant genes.
EnrichKit R software
In order to facilitate ORA analyses and improve collaboration, an R software
package for ORA named EnrichKit (0.1.0) was built. This package serves as an
interface that enables the user to perform ORA using six annotation databases, namely,
GO (Ashburner et al. 2000), KEGG (Kanehisa et al. 2000), Medical Subject Headings
(Nelson et al. 2004), InterPro (Mitchell et al. 2019), Reactome (Jassal et al. 2020)
and Molecular Signatures Database (MSigDB) (Liberzon et al. 2011). Currently, only
Bos taurus (cow) is supported, but other organisms may be included in future releases.
This R package is hosted on GitHub (https://github.com/liulihe954/EnrichKit).
Figure 2-1. Spermatogenesis coexpression network analysis: distributions of node connectivity and node clustering coefficient between maternal diets. (a) Node connectivity; (b) Node clustering coefficient. Control diet is represented in blue while gossypol-rich cottonseed diet is represented in red. Treatment means are marked by the dashed vertical lines: the blue line is for the control diet and the red line is for the gossypol-rich cottonseed diet. Adapted from Louvandini et al. (2020).
Figure 2-2. Functional terms and pathways significantly enriched with genes associated with dairy cow conception under thermal-stress conditions. Four gene annotation databases were analyzed: GO, Medical Subject Headings, InterPro and Reactome. The y-axis displays the names and the total number of genes of each gene-set. The black dots represent the significance of
enrichment (−log10 P-value, Fisher's exact test, top x-axis) and the bars represent the number of significant genes in each functional term (bottom x-axis). Adapted from Sigdel et al. (2020).
CHAPTER 3 DIFFERENTIAL NETWORK ANALYSIS OF BOVINE MUSCLE REVEALS CHANGES
IN GENE COEXPRESSION PATTERNS IN RESPONSE TO CHANGES IN MATERNAL NUTRITION
Introduction
Transcriptome analysis is an essential tool to uncover the molecular basis of
phenotypic variation. The advent of RNA sequencing has dramatically improved the
characterization and quantification of transcriptomes (Ozsolak et al. 2011). The most
common use of RNA sequencing is the identification of differentially expressed genes,
that is, genes that show differences in expression between conditions. However, genes
and gene products do not usually work in isolation, but they are connected in complex
networks. There is increasing interest in moving beyond differential expression and
examining transcriptional profiles in the context of molecular networks (Han 2008). It is
well-accepted that genes that are controlled by the same set of transcription factors or
are involved in the same biological processes tend to have similar expression profiles
(Allocco et al. 2004). This principle is known as guilt-by-association and represents the
basis for the reconstruction of gene networks using RNA sequencing data. These
networks, commonly called gene coexpression networks, are undirected graphs where
nodes correspond to genes and edges represent pairwise expression similarities.
Gene coexpression networks can be used for different purposes. One popular
application consists in the characterization of the topology of the reconstructed network
and examination of interesting nodes and coexpression structures. This single network
analysis focuses on the mechanisms allowing the identification of transcription factors
(hub genes), the functional annotation of unknown genes, i.e., the association of genes
of unknown function with well-described biological processes, and the detection of
transcriptional regulatory programs (van Dam et al. 2018). Another application consists
of comparing gene coexpression networks across conditions. Here, the term
condition is very broad and can refer to different tissues, different developmental
stages, or even different treatments. This application, commonly known as differential
network analysis, focuses on determining changes in the topology of the networks
across conditions. For instance, it is possible to examine whether connections or
subnetworks defined under normal conditions (control group) are reproducible and
preserved in the testing group (treatment group) (Langfelder et al. 2011). In this
scenario, differences in the topology of these two networks would indicate that
coexpression patterns were significantly perturbed by the treatment. Note that
expression similarities (coexpression) hint at common regulatory mechanisms
(coregulation), and hence, changes in the network might indicate that the treatment has
disrupted coregulation mechanisms, functional links and biological processes.
Undoubtedly, this shift in focus from differentially expressed genes to differentially
connected genes provides more holistic insights about gene regulation.
It is well-documented that different intrauterine insults can induce permanent
changes to the structure, physiology, and metabolism of the offspring. This
phenomenon has been termed fetal programming and may have lasting or lifelong
consequences (Godfrey et al. 2001). Maternal nutrition is considered a major
intrauterine environmental factor and it is now known that maternal nutritional status
during pregnancy can induce remarkable effects on fetal development (Wu et al. 2004).
There is growing evidence that maternal nutrition can alter epigenetic marks of the fetal
genome, such as DNA methylation. Indeed, this link between maternal nutrition and
subsequent modification of the fetal epigenome, including changes in gene expression,
is one of the molecular mechanisms proposed to explain the phenomenon of fetal
programming (Burdge et al. 2010; Chmurzynska 2010).
The main objective of this study was to assess whether maternal nutrition can
disrupt gene coexpression patterns in the offspring. Maternal nutritional treatments
consisted of a control diet or a methionine-rich diet offered during the periconceptional and
early gestation periods. Both muscle transcriptome and methylome were evaluated in
the offspring using next generation sequencing. Note that DNA methylation depends on
the availability of methyl donors, such as methionine, and hence, we hypothesized that
maternal methionine supplementation could alter the fetal epigenome, which in turn
could induce significant changes in the topology of gene networks.
Materials and Methods
Ethics Statement
The University of Florida approved the procedures used in this study (IACUC
#2014408583).
Animals and Experimental Design
Beef cows from the University of Florida Range Cattle Research and Education
Center (Ona, Florida, US) were assigned to one of two nutritional treatments from days
-30 to +90 relative to the beginning of the breeding season. These treatments consisted
of a control diet based on limpograss hay supplemented with molasses and urea and a
methionine-rich diet equal to the control diet but supplemented with 10 g/head/day of
MetaSmart Liquid (Adisseo, Alpharetta, GA) providing 3.7 g/head/day of rumen-
protected methionine. Longissimus dorsi muscle samples were collected from 20 bull
calves, 10 per maternal diet, at one month of age. Maternal diets did not affect birth or
weaning weight but altered post-weaning calf growth performance. Calves derived from
the methionine-rich diet had a greater average daily gain and feed conversion rate
post-weaning.
RNA Extraction, Library Preparation and Sequencing
Total RNA was extracted using the Qiagen RNeasy Mini kit. RNA yield and
quality were evaluated using the Agilent 2100 Bioanalyzer (Agilent Technologies, Inc.).
RNA-sequencing libraries were prepared from 50 ng RNA samples using a poly(A)
capture method and then sequenced using Illumina's HiSeq 3000 at the University of
Florida. A total of 19 muscle samples from 19 bull calves, 9 derived from the control and 10
from the methionine-rich maternal diet, were successfully processed and sequenced, and hence
used for subsequent RNA-sequencing analyses. RNA-sequencing data are available from
NCBI GEO under accession number GSE116974.
RNA-Seq Quality Control and Mapping
The quality of the sequencing reads was evaluated using the software FastQC
(v0.11.7, Babraham Bioinformatics, UK). Adaptor removal and trimming were conducted
with Trim Galore (version 0.4.4, Babraham Bioinformatics, UK) using the following
parameters: --paired, --clip_R1 10, --clip_R2 10, --three_prime_clip_R1 10,
--three_prime_clip_R2 10, and --length 20. The resulting paired-end sequencing reads
were mapped to the latest bovine reference genome (ARS-UCD1.2) using the software
Hisat2 (v2.1.0) (Kim et al. 2015).
Read Counting, Processing and Normalization
The number of reads that mapped to each annotated gene in the bovine GTF file
(ARS-UCD1.2) was obtained using the Python script htseq-count (v0.6.1p1) with the
option intersection-nonempty (Anders et al. 2015). Both highly abundant genes
(myosins, n = 25 genes) and lowly expressed genes (read counts ≤ 5 in at least 9
biological replicates) were removed from the raw expression data and not included in
subsequent analyses. After data processing, read counts were normalized using the
trimmed mean of M-values (TMM) normalization method available in the R package
edgeR (v3.14) (Robinson et al. 2010).
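The low-expression filter described above can be illustrated with a short Python sketch. It covers only the count-based rule (a gene is dropped when its read count is 5 or less in at least 9 samples); the removal of myosin genes and the TMM normalization performed by edgeR are not reproduced here. The function name, the gene labels and the count values are hypothetical.

```python
def keep_expressed(counts, max_low_count=5, min_low_samples=9):
    """Return the IDs of genes that are not flagged as lowly expressed.

    counts maps a gene ID to its list of raw read counts, one per sample.
    A gene is dropped when at least min_low_samples samples have a count
    less than or equal to max_low_count.
    """
    kept = []
    for gene, row in counts.items():
        n_low = sum(1 for c in row if c <= max_low_count)
        if n_low < min_low_samples:
            kept.append(gene)
    return kept

# Toy counts (hypothetical) for three genes across 19 samples.
counts = {
    "geneA": [100] * 19,            # well expressed in every sample
    "geneB": [3] * 9 + [50] * 10,   # low in 9 samples, so it is removed
    "geneC": [3] * 8 + [50] * 11,   # low in only 8 samples, so it is kept
}
kept = keep_expressed(counts)
```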
Gene Coexpression Network Construction
Genes with high expression variance across samples (top 50%) were used for
network modeling. The R package WGCNA (v1.69) was used for network construction
(Zhang et al. 2005; Langfelder et al. 2008). First, an unsigned adjacency matrix was
constructed based on pairwise Pearson correlation coefficients using the function
adjacency. This adjacency matrix was then remodeled as a Topological Overlap Matrix
(TOM) using the function TOMsimilarity. The TOM-based dissimilarity matrix, simply
defined as $dissTOM = 1 - TOM$, was used as a pairwise distance matrix for
hierarchical clustering. Note that genes in the same cluster share strong
interconnections, and they might define modules or subnetworks. Module detection was
performed by cutting the branches of the clustering dendrogram using the function
cutreeDynamic. Genes that could not be assigned to any module were considered as
background genes (grey module) and were not included in subsequent module
preservation analyses. For each module, the variable module eigengene (ME) was
calculated as the first principal component of the module expression data. Modules
detected in the dynamic cutting process were merged if their eigengene values were
highly correlated (Pearson correlation ≥ 0.8). The module membership was calculated
using the function signedKME, where the expression profile of each gene was
correlated with the module eigengene in order to quantify how connected a gene was to
a given module.
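To make the first steps of this pipeline more concrete, the sketch below builds an unsigned weighted adjacency matrix, converts it into a topological overlap matrix and derives the dissimilarity used for clustering. It is a plain-Python illustration, not the WGCNA implementation: it assumes the unsigned adjacency a_ij = |r_ij|^beta and the topological overlap formula commonly given for unsigned networks (Zhang et al. 2005), and the toy correlation matrix and beta = 2 are hypothetical.

```python
def soft_adjacency(corr, beta):
    """Unsigned weighted adjacency: a_ij = |r_ij| ** beta, with a zero diagonal."""
    n = len(corr)
    return [[0.0 if i == j else abs(corr[i][j]) ** beta for j in range(n)]
            for i in range(n)]

def tom_similarity(adj):
    """Topological overlap: (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    n = len(adj)
    k = [sum(row) for row in adj]  # the diagonal is 0, so this sums over u != i
    tom = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # l_ij: connection strength shared through every third gene u.
            shared = sum(adj[i][u] * adj[u][j] for u in range(n) if u not in (i, j))
            tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom

# Toy correlation matrix (hypothetical): gene 0 correlates strongly with
# genes 1 and 2, while genes 1 and 2 are almost uncorrelated with each other.
corr = [
    [1.0, 0.9, 0.8],
    [0.9, 1.0, 0.1],
    [0.8, 0.1, 1.0],
]
adj = soft_adjacency(corr, beta=2)
tom = tom_similarity(adj)
diss_tom = [[1 - w for w in row] for row in tom]  # distance for hierarchical clustering
```

In this toy case the direct adjacency between genes 1 and 2 is only 0.01, yet their topological overlap is about 0.32 because both are tightly connected to gene 0. This is the sense in which the overlap measure accounts for shared neighbors and not just pairwise correlation.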
Module Preservation
The function modulePreservation implemented in the R package WGCNA was
used to evaluate the preservation of each module (subnetwork) across conditions, i.e.,
between maternal control and maternal methionine-rich diets. A total of 12 different
module preservation statistics were investigated using a permutation test with 2000
resamples. These 12 different preservation statistics were combined in two composite
preservation significance scores, namely $Z_{density}$ and $Z_{connectivity}$. Finally, $Z_{summary}$ was
computed as the mean of $Z_{density}$ and $Z_{connectivity}$, representing the general
preservation status of a given module across the two experimental conditions
(Langfelder et al. 2011). In addition, the alternative module-size-independent statistic
$medianRank$ was also calculated to assess module preservation. This rank-based
statistic is defined as the mean of the observed median ranks for the individual connectivity and
density preservation statistics in each module; thus, modules with low median rank
values are considered as preserved between conditions/treatments. Finally, following
Langfelder and colleagues, an unpreserved module was defined as a module with
$Z_{summary} \leq 2$ and $medianRank$ greater than half of the total number of modules
detected (Langfelder et al. 2011).
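The decision rule stated above can be written as a small Python sketch. It only combines already-computed statistics; the permutation procedure that produces the Z scores and the median rank is not reproduced. The function names and the example values are hypothetical.

```python
def z_summary(z_density, z_connectivity):
    """Composite preservation score: the mean of the two composite Z statistics."""
    return (z_density + z_connectivity) / 2

def is_unpreserved(z_sum, median_rank, n_modules):
    """Rule described above: Zsummary <= 2 and medianRank above half the module count."""
    return z_sum <= 2 and median_rank > n_modules / 2

# Hypothetical module statistics, assuming 13 modules were evaluated.
score = z_summary(z_density=1.0, z_connectivity=2.0)
flag = is_unpreserved(score, median_rank=10, n_modules=13)
```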
Module Characterization
The functional characterization of the unpreserved modules was performed using
a Fisher's exact test, a test of proportions based on the cumulative hypergeometric
distribution. Genes within each unpreserved module were scrutinized using six different
databases, including Gene Ontology (GO) (Ashburner et al. 2000), KEGG (Kanehisa et
al. 2000), InterPro (Mitchell et al. 2019), Reactome (Jassal et al. 2020), Medical Subject
Headings (MeSH) (Nelson et al. 2004) and Molecular Signatures Database (MSigDB)
(Liberzon et al. 2011). The P-values of significant overrepresentation/enrichment of
biological pathways, molecular functions and functional terms were calculated using
Fisher's exact test.
DNA Extraction, Library Preparation and Sequencing
Total DNA was extracted from muscle samples for whole-genome bisulfite
sequencing analysis. Extraction, library construction, bisulfite treatment and sequencing
were performed by Novogene Bioinformatics Technology Co., Ltd (Beijing, China).
Libraries were sequenced with Illumina's HiSeq 3000 using 150-bp paired-end reads. A
total of 16 muscle samples from 16 bull calves, 7 derived from the control and 9 from the
methionine-rich maternal diet, were successfully processed and sequenced, and hence used for
subsequent bisulfite-sequencing analyses. Whole-genome bisulfite sequencing data
are available from NCBI GEO under accession number GSE117194.
Bisulfite-Seq Quality Control and Mapping
The quality of the sequencing reads was evaluated using the software FastQC
(v0.11.7, Babraham Bioinformatics, UK). Adaptor removal and trimming were performed
when needed using the software Trim Galore (v0.4.4, Babraham Bioinformatics, UK).
After quality control and processing, the resulting paired-end sequencing reads were
aligned to the ARS-UCD1.2 bovine reference genome using the software Bismark
(v0.17.0, Babraham Bioinformatics, UK) (Krueger et al. 2011). The tool
deduplicate_bismark was used to remove duplicate read alignments. Methylation calls
were performed using Bismark methylation extractor (v0.17.0, Babraham
Bioinformatics) using the following parameters: --paired-end, --comprehensive,
--bedGraph, and --cytosine_report (Krueger et al. 2011).
Differentially Methylated Cytosines and Genes
Differential methylation between maternal diets was analyzed using a logistic
regression implemented in the R package methylKit (v1.0.0) (Akalin et al. 2012). Only
cytosines with read coverage greater than 8 in a CpG context were evaluated.
Differentially methylated cytosines were defined as those having methylation
percentage changes between treatments greater than 20% and q-values ≤ 0.10. The
software Rgmatch was used to match the cytosines to different gene features, such as
transcription start site, exons, introns, and upstream regions (Furio-Tari et al. 2016). In
addition, we defined methylation level as the ratio of differentially methylated cytosines
to all the cytosines evaluated in a given region. For each gene, we calculated the
methylation level for the regulatory region (transcription start site, promoter and
upstream region) and also inside the gene body (exons and introns).
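The methylation level defined above is a simple ratio, which the following Python sketch illustrates for one region. It assumes that the 20% cutoff applies to the absolute methylation difference, which the text does not state explicitly. The function names and the example values are hypothetical, and the differential methylation test itself (logistic regression in methylKit) is not reproduced.

```python
def is_dmc(meth_diff, q_value, min_diff=20.0, max_q=0.10):
    """Differentially methylated cytosine: |change| > 20% (assumed absolute) and q <= 0.10."""
    return abs(meth_diff) > min_diff and q_value <= max_q

def methylation_level(cytosines):
    """Ratio of differentially methylated cytosines to all evaluated cytosines.

    cytosines is a list of (methylation difference in percent, q-value) pairs
    for one region; None is returned when no cytosine was evaluated.
    """
    if not cytosines:
        return None
    n_dmc = sum(1 for diff, q in cytosines if is_dmc(diff, q))
    return n_dmc / len(cytosines)

# Hypothetical gene body with four evaluated cytosines.
gene_body = [(25.0, 0.01), (-30.0, 0.05), (10.0, 0.001), (40.0, 0.50)]
level = methylation_level(gene_body)
```

Here two of the four cytosines pass both cutoffs, so the methylation level of this hypothetical gene body is 0.5. The same function would be applied separately to the cytosines of the regulatory region.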
Results
RNA-sequencing Analysis
The RNA-sequencing of the muscle transcriptome yielded about 50 million
paired-end reads per sample. Roughly 87% of the reads were mapped to the ARS-
UCD1.2 bovine genome assembly using the software Hisat2. After removing the
myosins (highly abundant genes in muscle) and lowly expressed genes (genes with 5 or
fewer read counts in at least 9 biological replicates), a total of 12,786 genes were
retained for the network analysis.
Network Construction and Module Identification
We first characterized the muscle transcriptome under normal conditions, i.e., no
nutritional treatment. As such, the inference of the gene coexpression network and
subsequent module identification was performed using only samples derived from the
maternal control diet. A total of 7,034 genes with high across-sample expression
variance were included in this network analysis. A soft-threshold equal to 24 was
derived from the high scale-free fitting index (R² ≥ 0.8; Figure 3-1A), resulting in a mean
connectivity equal to 31.45 (Figure 3-1B). A total of 147 preliminary modules were
detected using a dynamic tree cut process, and after merging highly correlated modules
(Pearson's correlation ≥ 0.8; Figure 3-1C), a total of 14 modules, including the
background set (grey module), were retained for subsequent analysis (Figure 3-2A).
Module Preservation
After we characterized the muscle transcriptome in the control diet, we
investigated the impact of the nutritional treatment (maternal methionine
supplementation) on gene coexpression patterns. As such, we compared the structure
of coexpression networks between control and methionine diets to identify changes in
the topology. These changes were evaluated using a permutation test with 2000
iterations. Twelve module preservation statistics were calculated for each of the 14
modules previously identified in the control diet. A total of six modules were considered
as unpreserved ($Z_{summary} \leq 2$), five modules were considered as preserved with weak
to moderate evidence of preservation ($2 < Z_{summary} \leq 10$), and finally two modules
showed strong evidence of preservation ($Z_{summary} > 10$) (Figure 3-3). Overall, the six
unpreserved modules were considered as gene coexpression modules or subnetworks
that were significantly perturbed by the maternal methionine diet.
Module Characterization
The six unpreserved modules were further investigated to reveal their functional
roles and gain insights about the biological processes that were impacted by maternal
methionine supplementation. This functional characterization was performed using a
Fisher's exact test, a hypergeometric-based overrepresentation test commonly used to
evaluate 2×2 contingency tables. Six different biological databases were evaluated,
including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG),
Reactome, InterPro, Medical Subject Headings (MeSH), and Molecular Signatures
Database (MSigDB). Figure 3-4 shows the functional characterization for
"antiquewhite2", the most perturbed module. Interestingly, our analysis revealed that
genes in this unpreserved module are closely related to (i) myogenesis, adipogenesis,
and fibrogenesis, (ii) ribosome structure, (iii) rRNA binding and processing, (iv)
mitochondrial activities, (v) ATP synthesis, and (vi) NAD(P)H oxidoreductases.
Moreover, genes in the module "violet" are implicated in the regulation of the canonical
Wnt signaling pathway, a signal transduction pathway that is involved in different
embryonic processes, such as cell fate specification, cell proliferation, and cell
migration. The full list of significant functional terms, including term ID, term name, total
number of genes in the module, and Fisher's P-value, is reported.
DNA Methylation Analysis
Whole-genome bisulfite sequencing produced roughly 350M paired-end reads
per sample. The software Bismark was used to map the reads to the ARS-UCD1.2
bovine genome assembly, yielding a 70% mapping rate. A total of 5,136,556 cytosines
(CpG context) were evaluated (read coverage ≥ 8), and 101,094 were identified as
differentially methylated between maternal diets (methylation change ≥ 20%, q-value ≤
0.10). Based on the ARS-UCD1.2 annotation file, cytosines were classified as (i) within
a gene (gene body: exons and introns), (ii) within the regulatory region (5.5 kb upstream
of the gene), or (iii) located in an intergenic region. As a result, we targeted a total of
25,491 genes annotated in the cow genome that had at least one evaluated cytosine
(either gene body or regulatory region), and 10,247 of the 25,491 had at least one
differentially methylated cytosine. Of interest, a total of 6,735 of the 7,034 genes used in
the network analysis had methylation data.
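The three-way classification of cytosines can be sketched as follows. This Python example is a simplified illustration and not the annotation procedure used in this study: it considers a single gene at a time, and it assumes that "upstream" is interpreted relative to the gene strand, which the text does not state explicitly. The `Gene` class, the function name, and the coordinates are hypothetical.

```python
from dataclasses import dataclass

UPSTREAM_BP = 5_500  # 5.5 kb regulatory window, as stated in the text


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def classify_cytosine(pos: int, gene: Gene) -> str:
    """Classify a position relative to one gene on the same chromosome."""
    if gene.start <= pos <= gene.end:
        return "gene body"
    # Assumption: the regulatory window lies upstream with respect to the strand.
    if gene.strand == "+" and gene.start - UPSTREAM_BP <= pos < gene.start:
        return "regulatory region"
    if gene.strand == "-" and gene.end < pos <= gene.end + UPSTREAM_BP:
        return "regulatory region"
    return "intergenic"


# Hypothetical gene on the forward strand.
example_gene = Gene(name="GENE_X", start=10_000, end=12_000, strand="+")
for position in (11_000, 6_000, 2_000):
    print(position, classify_cytosine(position, example_gene))
```

A genome-wide implementation would also need to handle chromosomes, overlapping genes, and cytosines that fall in the regulatory window of one gene and the body of another.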
DNA Methylation and Module Preservation
We investigated if there were significant differences in DNA methylation between
genes in preserved and unpreserved modules. For each gene, we calculated the
methylation level as differentially methylated cytosines divided by all the cytosines
evaluated. We calculated the methylation level either for the gene body or the regulatory
region. Notably, the distribution of methylation level in the gene body was significantly
different in genes located in unpreserved modules (n = 1146) compared to genes in
preserved modules (n = 5589) (Kolmogorov-Smirnov test, P-value ≤ 0.01, Figure 3-5).
We observed the same result when only transcription factors were considered. On the other
hand, there was no difference in the distribution of methylation level in the regulatory
region between preserved and unpreserved modules, either for all genes or for
transcription factors only.
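The per-gene methylation level and the comparison of its distribution between two groups of genes can be illustrated with a short Python sketch. The code below computes the methylation level as defined above and the two-sample Kolmogorov-Smirnov statistic (the maximum distance between the two empirical distribution functions). It does not compute the P-value, for which a statistical library would normally be used. Function names and example values are hypothetical.

```python
from bisect import bisect_right


def methylation_level(n_differential: int, n_evaluated: int) -> float:
    """Fraction of evaluated cytosines that are differentially methylated."""
    if n_evaluated <= 0:
        raise ValueError("gene has no evaluated cytosines")
    return n_differential / n_evaluated


def ks_statistic(sample_a: list[float], sample_b: list[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic (maximum gap between empirical CDFs)."""
    a, b = sorted(sample_a), sorted(sample_b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


# Hypothetical (differential, evaluated) cytosine counts per gene.
preserved = [methylation_level(d, n) for d, n in [(1, 50), (0, 40), (2, 80), (1, 100)]]
unpreserved = [methylation_level(d, n) for d, n in [(4, 50), (3, 60), (5, 70)]]
print(ks_statistic(preserved, unpreserved))
```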
DNA Methylation and Network Properties
We also investigated the relationship between DNA methylation and three
different gene network properties, namely differential coexpression score, module
membership, and intramodular connectivity. Interestingly, for those genes located in
preserved modules, we found a negative relationship between methylation level and
intramodular connectivity, considering either the gene body (regression coefficient β =
−0.23, P-value = 0.009, Figure 3-6) or the regulatory region (regression coefficient β =
−0.10, P-value = 0.013, Figure 3-6). On the contrary, there was no relationship between
methylation level and intramodular connectivity for genes in unpreserved modules.
Moreover, a significant negative relationship was found between methylation level in the
regulatory region and module membership, but again only for genes in preserved
modules.
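The relationship between methylation level and a network property can be illustrated with an ordinary least squares fit. The Python sketch below assumes a single-predictor model with intramodular connectivity as the response and methylation level as the predictor. The exact model specification (scaling, covariates) is not described in this section, so this is an assumption, and the function name and data are hypothetical.

```python
def simple_linear_regression(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values: methylation level (x) and intramodular connectivity (y).
methylation = [0.00, 0.10, 0.20, 0.30]
connectivity = [0.90, 0.80, 0.60, 0.50]
slope, intercept = simple_linear_regression(methylation, connectivity)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A negative slope, as in this toy example, corresponds to the pattern reported for preserved modules: genes with higher methylation levels tend to show lower intramodular connectivity.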
Discussion
Global coexpression network analysis provides a powerful approach to uncover
the molecular basis of phenotypic variation. Gene coexpression networks are typically
used to infer and annotate gene functions, prioritize candidate regulatory genes, and
reveal transcriptional regulatory mechanisms. Lately, there has been a greater
emphasis on the use of network analysis to elucidate the changes in gene expression
patterns in response to changes in experimental conditions or environmental insults.
The present study was specifically designed to reveal structural changes in gene
coexpression networks due to a maternal methionine-rich diet. Maternal nutrition
represents a major intrauterine environmental insult that can induce permanent changes
in the offspring. Here, we evaluated gene coexpression networks in the muscle of bull
beef calves gestated under a control or methionine-rich diet, we functionally
characterized the subnetworks altered by maternal methionine supplementation, and we
investigated the link between network perturbation and DNA methylation. Our results
provide evidence that maternal nutrition can significantly perturb gene coexpression
patterns in the offspring, and some of these changes might be mediated by alterations
in DNA methylation.
Maternal methionine supplementation significantly disturbed gene coexpression
patterns in the offspring’s muscle. In fact, following the methodology proposed by
Langfelder et al. (2011), we identified six modules or subnetworks that significantly
changed between experimental conditions. This permutation-based method
evaluates if the two most important aspects of module topology, namely density and
connectivity, are preserved between a reference condition and a test condition. Here,
we found that neither the connection strength nor the connectivity pattern of these six
subnetworks detected in the control diet was preserved in the methionine-rich diet.
There is growing evidence that certain intrauterine insults impact gene coexpression
patterns, which in turn may alter fetal developmental programming. For instance,
Deyssenroth et al. (2017) reported that alterations in gene coexpression networks in the
human placenta are associated with abnormal fetal growth and development. Lombardo
et al. (2018) showed that maternal immune activation via infection during pregnancy
disrupts fetal brain gene coexpression networks, and this disruption is associated with
an increased risk for autism spectrum disorder. Recently, we reported that exposure to
gossypol in utero and during lactation altered the development and gene expression of
the testicles, including a significant perturbation of coexpression patterns among
spermatogenesis-related genes (Louvandini et al. 2020). Overall, our findings provide
further evidence that intrauterine insults, such as diet, not only can change gene
expression but also alter coexpression patterns, which in turn suggests alterations in
coexpression mechanisms.
The functional characterization revealed that some of the unpreserved modules
are directly implicated in myogenesis, adipogenesis, and fibrogenesis. Notably, it is
well-documented that maternal nutrition alters fetal skeletal muscle development by
interfering with these three important processes (Du et al. 2010). For instance, Zhu and
collaborators have shown that nutrient deficiency in ruminants from early to mid-
gestation negatively impacts myogenesis, reducing muscle fiber number and also
muscle mass (Zhu et al. 2004; Zhu et al. 2006). In addition, Tong and collaborators
have reported that maternal overnutrition enhances adipogenesis in fetal skeletal
muscle (Tong et al. 2008; Tong et al. 2009). Similarly, Du et al. (2010) reported that
maternal undernutrition with supplementation of ruminal bypass protein from day 60 to
day 180 of gestation significantly affects adipogenesis, changing marbling scores in the
steer progeny. Moreover, pigs with reduced birth weight due to malnutrition in utero
have a greater content of collagen in their skeletal muscle (Karunaratne et al. 2005).
Note that myogenesis, adipogenesis and fibrogenesis are vital aspects of muscle
physiology, directly impacting lean muscle mass, marbling and also collagen content,
and hence, any disturbances in these processes may have long-term consequences,
impacting muscle growth and meat quality.
We also found unpreserved subnetworks related to the regulation of the
canonical Wnt/β-catenin pathway. By acting through autocrine and/or paracrine
mechanisms, the Wnt family of secreted glycoproteins affects different aspects of cell
physiology, such as cell proliferation, cell differentiation or maintenance of precursor
cells (Novakofski 2004; Johnson & Rajamannan 2006). Interestingly, in skeletal muscle, β-catenin
regulates the expression of two transcription factors, namely PAX3 and GLI, which are
essential for skeletal myogenesis (Capdevila et al. 1998; Borycki et al. 2000). Indeed,
some studies have shown that blocking the β-catenin pathway reduces the total number
of myocytes (Pan et al. 2005; Yamanouchi et al. 2007). Shang and colleagues showed
that the upregulation of Wnt signaling promotes myogenesis, whereas downregulation
of this pathway promotes adipogenesis (Shang et al. 2007). Moreover, we identified
terms related to basic cell structures/functions, including ribosome structure, rRNA
binding and processing, mitochondrial activities, ATP synthesis and NAD(P)H
oxidoreductases. Notably, previous studies have shown that these cell activities can be
affected by maternal nutrient restrictions. For instance, Peñagaricano et al. (2014)
reported that maternal diets significantly impact functional terms closely related to the
ribosome in fetal muscle. Moreover, Mayeur et al. (2013) found that maternal
undernutrition induces placental mitochondrial abnormalities and reduced ATP levels in
rats. Similarly, Zhu et al. (2006) found that maternal nutrient restriction can
induce downregulation of key enzymes involved in mitochondrial function in the
offspring’s muscle.
The exact mechanisms by which maternal diet can affect gene coexpression
patterns in the offspring are not yet known. Here, we examined the hypothesis that
changes in DNA methylation cause changes in the topology of gene networks. Nearly
2% of all the evaluated cytosines in a CpG context were found to be differentially
methylated between maternal diets. Notably, we found significant differences in the
level of gene body methylation between genes in preserved modules versus genes in
unpreserved modules. The same trend was observed when only transcription factors
were considered. Although the function of DNA methylation in regulatory regions is well-
known, the role of DNA methylation within the gene is not yet well understood. Some
studies have suggested that DNA methylation in the gene body might be involved in the
regulation of alternative splicing (Shukla et al. 2011; Maunakea et al. 2013). Of special
interest, Saha et al. (2017) reported that the regulation of alternative splicing is
coordinated across functionally related genes. Therefore, changes in DNA methylation
might cause changes in isoform expression, which in turn can alter gene coexpression
patterns. Moreover, changes in DNA methylation also altered subnetwork properties.
Indeed, in preserved modules, there was a clear negative relationship between
methylation level and intramodular connectivity, i.e., more methylation, less connectivity,
that is, less gene activity. Notably, this functional relationship completely disappeared in
unpreserved modules. Overall, our findings suggest that maternal methionine
supplementation may induce changes in the offspring epigenome, such as changes in
DNA methylation, which in turn alter coexpression patterns and gene network
properties.
Overall, our study has shown that maternal nutrition levels during preconception
and early pregnancy can significantly impact gene coexpression patterns in the
offspring. Some of the perturbed gene functions are directly implicated in the
development of skeletal muscle, such as myogenesis, adipogenesis, and the Wnt/β-catenin
pathway. Notably, some of the changes in gene coexpression patterns are associated
with changes in DNA methylation. To the best of our knowledge, this is the first study
that investigates the link between maternal nutrition, DNA methylation and gene
coexpression networks. Our findings suggest that maternal nutrition perturbs gene
coexpression patterns, and these alterations are in part mediated by changes in the
epigenome.
Figure 3-1. Soft-thresholding powers and module detection. (A) Scale-free topology fitting index (y-axis) as a function of the soft-thresholding power (x-axis). (B) Mean connectivity (y-axis) as a function of the soft-thresholding power (x-axis). (C) Tree plot of initial module eigengenes.
Figure 3-2. Network visualization. (A) Gene cluster dendrogram in the maternal control diet. (B) Gene cluster dendrogram in the maternal methionine diet.
Figure 3-3. Permutation-based composite preservation statistics. (A) Summary statistic Zsummary (y-axis) as a function of module size. Non-preserved modules were identified using Zsummary ≤ 2. (B) Summary statistic medianRank (y-axis) as a function of module size. High median rank values suggest strong evidence of perturbation.
Figure 3-4. Functional characterization of non-preserved modules. Six gene annotation databases were analyzed: Gene Ontology (GO), KEGG, Medical Subject Headings (MeSH), InterPro, Reactome and Molecular Signatures Database. The y-axis displays the term ID and the total number of genes in each term. The black dots represent the significance of enrichment (-log10 P-value, Fisher’s exact test, top x-axis) and the bars represent the percentage of significant genes in each functional term (bottom x-axis).
Figure 3-5. Comparison of methylation levels between preserved and unpreserved modules. Methylation level was calculated as differentially methylated cytosines divided by all the cytosines evaluated.
Figure 3-6. Relationship between methylation level and intramodular connectivity. Methylation level was calculated as differentially methylated cytosines divided by all the cytosines evaluated. Regressions in preserved modules (red) were significant (P-value < 0.05).
CHAPTER 4 CONCLUSIONS
The era of systems biology is inspired by the fact that genes and proteins do not
work in isolation but are connected in intricate networks. Therefore, deciphering gene
coexpression patterns is crucial to understand the response of the system to external
perturbations. Indeed, differential network analysis allows us to determine changes in
the topology of the gene networks across conditions. Expression similarities hint at
common regulatory mechanisms, and hence, changes in gene networks might indicate
disruptions in coregulation mechanisms.
Our findings suggest that maternal nutrition status during preconception and
early pregnancy can significantly influence gene coexpression patterns in the offspring,
and some of these perturbations are mediated by changes in DNA methylation. Indeed,
our module preservation analysis revealed several changes in gene coexpression
patterns and network topology in response to maternal methionine supplementation.
Functional characterizations of the perturbed subnetworks revealed that these gene
modules are involved in processes directly implicated in the development and
physiology of the skeletal muscle, such as myogenesis, adipogenesis, and the
Wnt/β-catenin pathway, as well as some other basic cell structures/functions, including
ribosome structure, rRNA binding and processing, mitochondrial activities, ATP
synthesis and NAD(P)H oxidoreductases. Disturbances in these processes may have
long-term consequences, impacting muscle growth and meat quality.
The exact mechanisms by which external perturbations affect gene coexpression
patterns are not yet known. Here, we found significant differences in the level of DNA
methylation between genes in preserved modules versus genes in unpreserved
modules. In addition, we revealed some linear relationships between network properties
and gene methylation levels. Overall, our results suggest that some of the changes in
gene coexpression patterns are mediated by changes in DNA methylation.
LIST OF REFERENCES
Acerbi E., Zelante T., Narang V. & Stella F. (2014) Gene network inference using continuous time Bayesian networks: a comparative study and application to Th17 cell differentiation. BMC Bioinformatics 15, 387.
Akalin A., Kormaksson M., Li S., Garrett-Bakelman F.E., Figueroa M.E., Melnick A., et al. (2012) methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol 13, R87.
Allocco D.J., Kohane I.S. & Butte A.J. (2004) Quantifying the relationship between co-expression, co-regulation and gene function. BMC Bioinformatics 5, 18.
Anders S., Pyl P.T. & Huber W. (2015) HTSeq--a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166-9.
Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 25, 25-9.
Baitaluk M., Qian X., Godbole S., Raval A., Ray A. & Gupta A. (2006) PathSys: integrating molecular interaction graphs for systems biology. BMC Bioinformatics 7, 55.
Barabasi A.L. & Bonabeau E. (2003) Scale-free networks. Sci Am 288, 60-9.
Barrett T., Wilhite S.E., Ledoux P., Evangelista C., Kim I.F., Tomashevsky M., et al. (2013) NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res 41, D991-5.
Biswas S. & Acharyya S. (2016) Neural model of gene regulatory network: a survey on supportive meta-heuristics. Theory Biosci 135, 1-19.
Borycki A.G., Brown A.M.C. & Emerson C.P. (2000) Shh and Wnt signaling pathways converge to control Gli gene activation in avian somites. Development 127, 2075-87.
Bruggeman F.J. & Westerhoff H.V. (2007) The nature of systems biology. Trends Microbiol 15, 45-50.
Burdge G.C. & Lillycrop K.A. (2010) Nutrition, epigenetics, and developmental plasticity: implications for understanding human disease. Annu Rev Nutr 30, 315-39.
Capdevila J., Tabin C. & Johnson R.L. (1998) Control of dorsoventral somite patterning by Wnt-1 and β-catenin. Developmental biology 193, 182-94.
Chai L.E., Loh S.K., Low S.T., Mohamad M.S., Deris S. & Zakaria Z. (2014) A review on the computational approaches for gene regulatory network construction. Comput Biol Med 48, 55-65.
Chekouo T., Stingo F.C., Doecke J.D. & Do K.A. (2015) miRNA-target gene regulatory networks: A Bayesian integrative approach to biomarker selection with application to kidney cancer. Biometrics 71, 428-38.
Chimpanzee Sequencing and Analysis Consortium (2005) Initial sequence of the chimpanzee genome and comparison with the human genome. Nature 437, 69-87.
Chmurzynska A. (2010) Fetal programming: link between early nutrition, DNA methylation, and complex diseases. Nutr Rev 68, 87-98.
Chudasama D., Bo V., Hall M., Anikin V., Jeyaneethi J., Gregory J., et al. (2018) Identification of cancer biomarkers of prognostic value using specific gene regulatory networks (GRN): a novel role of RAD51AP1 for ovarian and lung cancers. Carcinogenesis 39, 407-17.
Dawid A.P. (1979) Conditional independence in statistical theory. Journal of the Royal Statistical Society: Series B (Methodological) 41, 1-15.
Delgado F.M. & Gomez-Vela F. (2019) Computational methods for Gene Regulatory Networks reconstruction and analysis: A review. Artif Intell Med 95, 133-45.
Deyssenroth M.A., Peng S.E., Hao K., Lambertini L., Marsit C.J. & Chen J. (2017) Whole-transcriptome analysis delineates the human placenta gene network and its associations with fetal growth. BMC Genomics 18.
Diep D., Plongthongkum N., Gore A., Fung H.L., Shoemaker R. & Zhang K. (2012) Library-free methylation sequencing with bisulfite padlock probes. Nat Methods 9, 270-2.
Du M., Tong J., Zhao J., Underwood K.R., Zhu M., Ford S.P., et al. (2010) Fetal programming of skeletal muscle development in ruminant animals. Journal of Animal Science 88, E51-E60.
Eisen M.B., Spellman P.T., Brown P.O. & Botstein D. (1998) Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences 95, 14863-68.
Furio-Tari P., Conesa A. & Tarazona S. (2016) RGmatch: matching genomic regions to proximal genes in omics data integration. BMC Bioinformatics 17, 427.
Gibbs R.A., Weinstock G.M., Metzker M.L., Muzny D.M., Sodergren E.J., Scherer S., et al. (2004) Genome sequence of the Brown Norway rat yields insights into mammalian evolution. Nature 428, 493-520.
Godfrey K.M. & Barker D.J. (2001) Fetal programming and adult health. Public health nutrition 4, 611-24.
Goldberg A.D., Allis C.D. & Bernstein E. (2007) Epigenetics: a landscape takes shape. Cell 128, 635-8.
Han J.D. (2008) Understanding biological functions through molecular networks. Cell Research 18, 224-37.
Haoudi A. & Bensmail H. (2006) Bioinformatics and data mining in proteomics. Expert Rev Proteomics 3, 333-43.
Hecker M., Lambeck S., Toepfer S., van Someren E. & Guthke R. (2009) Gene regulatory network inference: data integration in dynamic models-a review. Biosystems 96, 86-103.
International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431, 931-45.
Jassal B., Matthews L., Viteri G., Gong C., Lorente P., Fabregat A., et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res 48, D498-D503.
Johnson M.L. & Rajamannan N. (2006) Diseases of Wnt signaling. Reviews in Endocrine & Metabolic Disorders 7, 41-49.
Kadarmideen H.N. & Reverter A. (2007) Combined genetic, genomic and transcriptomic methods in the analysis of animal traits. CABI review: perspectives in agriculture, veterinary science, nutrition and natural resources 2, 16.
Kaderali L. & Radde N. (2008) Inferring gene regulatory networks from expression data. In: Computational Intelligence in Bioinformatics, pp. 33-74. Springer.
Kanehisa M. & Goto S. (2000) KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27-30.
Karas M., Ingendoh A., Bahr U. & Hillenkamp F. (1989) Ultraviolet-laser desorption/ionization mass spectrometry of femtomolar amounts of large proteins. Biomedical & environmental mass spectrometry 18, 841-43.
Karunaratne J.F., Ashton C.J. & Stickland N.C. (2005) Fetal programming of fat and collagen in porcine skeletal muscles. Journal of Anatomy 207, 763-68.
Kim D., Langmead B. & Salzberg S.L. (2015) HISAT: a fast spliced aligner with low memory requirements. Nat Methods 12, 357-60.
Kordmahalleh M.M., Sefidmazgi M.G., Harrison S.H. & Homaifar A. (2017) Identifying time-delayed gene regulatory networks via an evolvable hierarchical recurrent neural network. BioData mining 10, 29.
Krueger F. & Andrews S.R. (2011) Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571-2.
Laird P.W. (2010) Principles and challenges of genome-wide DNA methylation analysis. Nature Reviews Genetics 11, 191-203.
Lander E.S., Linton L.M., Birren B., Nusbaum C., Zody M.C., Baldwin J., et al. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860-921.
Langfelder P. & Horvath S. (2008) WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics 9, 559.
Langfelder P., Luo R., Oldham M.C. & Horvath S. (2011) Is my network module preserved and reproducible? PLoS Comput Biol 7, e1001057.
Lappalainen T., Sammeth M., Friedlander M.R., t Hoen P.A., Monlong J., Rivas M.A., et al. (2013) Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506-11.
Liberzon A., Subramanian A., Pinchback R., Thorvaldsdottir H., Tamayo P. & Mesirov J.P. (2011) Molecular signatures database (MSigDB) 3.0. Bioinformatics 27, 1739-40.
Lombardo M.V., Moon H.M., Su J., Palmer T.D., Courchesne E. & Pramparo T. (2018) Maternal immune activation dysregulation of the fetal brain transcriptome and relevance to the pathophysiology of autism spectrum disorder. Molecular Psychiatry 23, 1001-13.
Louvandini H., Correa P.S., Amorin R., Liu L., Ieda E.H., Jimenez C.R., et al. (2020) Gestational and lactational exposure to gossypol alters the testis transcriptome. BMC Genomics 21, 59.
Maheshri N. & O’Shea E.K. (2007) Living with noisy genes: how cells function reliably with inherent variability in gene expression. Annual review of biophysics and biomolecular structure 36.
Mantione K.J., Kream R.M., Kuzelova H., Ptacek R., Raboch J., Samuel J.M., et al. (2014) Comparing bioinformatic gene expression profiling methods: microarray and RNA-Seq. Med Sci Monit Basic Res 20, 138-42.
Markowetz F. & Spang R. (2007) Inferring cellular networks--a review. BMC Bioinformatics 8 Suppl 6, S5.
Marzese D.M. & Hoon D.S. (2015) Emerging technologies for studying DNA methylation for the molecular diagnosis of cancer. Expert Rev Mol Diagn 15, 647-64.
Maunakea A.K., Chepelev I., Cui K.R. & Zhao K.J. (2013) Intragenic DNA methylation modulates alternative splicing by recruiting MeCP2 to promote exon recognition. Cell Research 23, 1256-69.
Mayeur S., Lancel S., Theys N., Lukaszewski M.A., Duban-Deweer S., Bastide B., et al. (2013) Maternal calorie restriction modulates placental mitochondrial biogenesis and bioenergetic efficiency: putative involvement in fetoplacental growth defects in rats. American Journal of Physiology-Endocrinology and Metabolism 304, E14-E22.
Meissner A., Gnirke A., Bell G.W., Ramsahoye B., Lander E.S. & Jaenisch R. (2005) Reduced representation bisulfite sequencing for comparative high-resolution DNA methylation analysis. Nucleic Acids Res 33, 5868-77.
Melkman A.A., Cheng X., Ching W.K. & Akutsu T. (2018) Identifying a Probabilistic Boolean Threshold Network From Samples. IEEE Trans Neural Netw Learn Syst 29, 869-81.
Mitchell A.L., Attwood T.K., Babbitt P.C., Blum M., Bork P., Bridge A., et al. (2019) InterPro in 2019: improving coverage, classification and access to protein sequence annotations. Nucleic Acids Research 47, D351-D60.
Moignard V., Woodhouse S., Haghverdi L., Lilly A.J., Tanaka Y., Wilkinson A.C., et al. (2015) Decoding the regulatory network of early blood development from single-cell gene expression measurements. Nat Biotechnol 33, 269-76.
Mouse Genome Sequencing Consortium (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520-62.
Nelson S.J., Schopen M., Savage A.G., Schulman J.L. & Arluk N. (2004) The MeSH translation maintenance system: structure, interface design, and implementation. Stud Health Technol Inform 107, 67-9.
Novakofski J. (2004) Adipogenesis: Usefulness of in vitro and in vivo experimental models. Journal of Animal Science 82, 905-15.
Orlando D.A., Lin C.Y., Bernard A., Wang J.Y., Socolar J.E., Iversen E.S., et al. (2008) Global control of cell-cycle transcription by coupled CDK and network oscillators. Nature 453, 944-7.
Ozsolak F. & Milos P.M. (2011) RNA sequencing: advances, challenges and opportunities. Nat Rev Genet 12, 87-98.
Pan W.J., Jia Y.Y., Wang J.Y., Tao D.L., Gan X.Q., Tsiokas L., et al. (2005) beta-Catenin regulates myogenesis by relieving I-mfa-mediated suppression of myogenic regulatory factors in P19 cells. Proceedings of the National Academy of Sciences of the United States of America 102, 17378-83.
Pandey A. & Mann M. (2000) Proteomics to study genes and genomes. Nature 405, 837-46.
Peñagaricano F., Wang X., Rosa G.J.M., Radunz A.E. & Khatib H. (2014) Maternal nutrition induces gene expression changes in fetal muscle and adipose tissues in sheep. BMC Genomics 15, 1034.
Robinson M.D., McCarthy D.J. & Smyth G.K. (2010) edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139-40.
Rochfort S. (2005) Metabolomics reviewed: a new "omics" platform technology for systems biology and implications for natural products research. J Nat Prod 68, 1813-20.
Saha A., Kim Y., Gewirtz A.D.H., Jo B., Gao C., McDowell I.C., et al. (2017) Co-expression networks reveal the tissue-specific regulation of transcription and splicing. Genome Research 27, 1843-58.
Schneider M.V. & Orchard S. (2011) Omics technologies, data and bioinformatics principles. In: Bioinformatics for omics Data, pp. 3-30. Springer.
Shang Y., Zhang C., Wang S., Xiong F., Zhao C., Peng F., et al. (2007) Activated β-catenin induces myogenesis and inhibits adipogenesis in BM-derived mesenchymal stromal cells. Cytotherapy 9, 667-81.
Shukla S., Kavak E., Gregory M., Imashimizu M., Shutinoski B., Kashlev M., et al. (2011) CTCF-promoted RNA polymerase II pausing links DNA methylation to splicing. Nature 479, 74-U99.
Sigdel A., Liu L., Abdollahi-Arpanahi R., Aguilar I. & Peñagaricano F. (2020) Genetic dissection of reproductive performance of dairy cows under heat stress. Anim Genet.
Simak M., Yeang C.H. & Lu H.H. (2017) Exploring candidate biological functions by Boolean Function Networks for Saccharomyces cerevisiae. PLoS One 12, e0185475.
Suravajhala P., Kogelman L.J. & Kadarmideen H.N. (2016) Multi-omic data integration and analysis using systems genomics approaches: methods and applications in animal production, health and welfare. Genetics Selection Evolution 48, 38.
Tong J., Zhu M.J., Underwood K.R., Hess B.W., Ford S.P. & Du M. (2008) AMP-activated protein kinase and adipogenesis in sheep fetal skeletal muscle and 3T3-L1 cells. Journal of Animal Science 86, 1296-305.
Tong J.F., Yan X., Zhu M.J., Ford S.P., Nathanielsz P.W. & Du M. (2009) Maternal obesity downregulates myogenesis and β-catenin signaling in fetal skeletal muscle. American Journal of Physiology-Endocrinology and Metabolism 296, E917-E24.
van Dam S., Vosa U., van der Graaf A., Franke L. & de Magalhaes J.P. (2018) Gene co-expression analysis for functional classification and gene-disease predictions. Briefings in Bioinformatics 19, 575-92.
Venter J.C., Adams M.D., Myers E.W., Li P.W., Mural R.J., Sutton G.G., et al. (2001) The sequence of the human genome. Science 291, 1304-51.
Voit E.O. (2000) Computational analysis of biochemical systems: a practical guide for biochemists and molecular biologists. Cambridge University Press.
Waddington C.H. (1942) The epigenotype. Endeavour 1, 18-20.
Wang X.-L., Liu Y.-M., Zhang Z.-D., Wang S.-S., Du Y.-B. & Yin Z.-S. (2020) Utilizing benchmarked dataset and gene regulatory network to investigate hub genes in postmenopausal osteoporosis.
Whitehouse C.M., Dreyer R.N., Yamashita M. & Fenn J.B. (1985) Electrospray interface for liquid chromatographs and mass spectrometers. Anal Chem 57, 675-9.
Wu G., Bazer F.W., Cudd T.A., Meininger C.J. & Spencer T.E. (2004) Maternal nutrition and fetal development. The Journal of nutrition 134, 2169-72.
Yamanouchi K., Hosoyama T., Murakami Y. & Nishihara M. (2007) Myogenic and adipogenic properties of goat skeletal muscle stem cells. Journal of Reproduction and Development 53, 51-58.
Zhang B. & Horvath S. (2005) A general framework for weighted gene co-expression network analysis. Stat Appl Genet Mol Biol 4, Article17.
Zhu M.J., Ford S.P., Means W.J., Hess B.W., Nathanielsz P.W. & Du M. (2006) Maternal nutrient restriction affects properties of skeletal muscle in offspring. Journal of Physiology-London 575, 241-50.
Zhu M.J., Ford S.P., Nathanielsz P.W. & Du M. (2004) Effect of maternal nutrient restriction in sheep on the development of fetal skeletal muscle. Biology of Reproduction 71, 1968-73.
Zhu X., Gerstein M. & Snyder M. (2007) Getting connected: analysis and principles of biological networks. Genes Dev 21, 1010-24.
BIOGRAPHICAL SKETCH
Lihe Liu was born and raised in Zibo, Shandong province, People’s Republic of
China. After he completed high school, he was admitted to Huazhong Agricultural
University (HZAU), where he earned his bachelor’s degree in animal sciences. In the fall
of 2016, he joined China Agricultural University (CAU) as an exchange student. After
one year of study, he came to the United States and joined the University of Florida
(UF) as an international exchange student in the fall of 2017. In the spring of 2018, he
joined Dr. Francisco Peñagaricano’s lab as an intern, and soon afterward he joined the lab
officially as a master’s student. At UF, he has been conducting research in the
animal science field using computational biology approaches. He has experience in
statistical computing and bioinformatic analysis.