brain-coX: investigating and visualising gene co-expression in seven human brain transcriptomic datasets

Designing and performing scientific experiments to generate functional evidence for the involvement of a gene in a mental or neurological disorder is often challenging. The cost of such experiments is typically high, in particular when human brain tissue is necessary. Furthermore, because of the large number of putative disease genes, only a subset of genes can be followed up. Fortunately, computational methods as well as visualisation techniques exist that can help to prioritise which candidate genes to pursue experimentally. These computational methods are referred to as in silico prioritisation. They typically rely on knowledge collected in genomic databases, such as the Online Mendelian Inheritance in Man (OMIM) database [7], as well as gene expression data from healthy individuals [8, 9]. Popular examples of computational methods offering in silico prioritisation include Endeavour [10] and ToppGene [11]. One of the most frequently employed visualisation tools for gene networks is STRING [12].

Both in silico prioritisation and gene network visualisation tools have been successfully applied to many diseases [13, 14]. Nevertheless, most tools are biased towards what is already known due to their reliance on genomic databases and literature searches via databases such as PubMed [15]. Because of this known bias, some tools also integrate gene expression data from healthy individuals to implicate disease pathways discovered de novo from the data. However, such gene expression data are usually generated from easily obtained sources, such as blood or lymphocytes, and thus may not reflect the pathways in the relevant disease tissue [16, 17]. Gene expression in the brain is distinct from that in other tissues, reflecting the complex biological processes in the brain [18]. Leveraging such brain-specific gene signatures has indeed been shown to be beneficial in uncovering disease genes for epileptic encephalopathies [19]. Furthermore, many available tools do not take into account that gene expression varies considerably over the course of an individual’s development, especially in the brain. For example, in the human brain, Kang et al. [20] observed that gene expression is regulated to a large degree temporally and only to a lesser extent spatially.

Very few tools offer both in silico prioritisation and gene network visualisation, which hinders interpretation and the design of functional downstream analyses [8] (one notable exception is the downloadable application NETBAG [21]). brain-coX is a novel web application focusing on gene prioritisation and exploration of gene networks for diseases that originate in human brain tissue. Unlike any of the existing tools, brain-coX’s prioritisations are based solely on brain expression data, making use of up to seven available large datasets measuring gene expression in the developing and ageing human brain. These datasets were processed and cleaned in a homogeneous manner, ensuring maximal reproducibility of results across datasets. To our knowledge, this is the first time results from these seven valuable brain expression datasets are directly comparable within one resource. Besides prioritisations, brain-coX also allows users to investigate pathway membership and to explore changes in gene networks throughout brain development via interactive visualisations. Such temporal changes in gene networks have been hypothesised to play a key role in many neurological and mental disorders, with many such disorders showing distinct ages of onset. Finally, we designed brain-coX to be user-friendly and easily accessible through a website to facilitate use by researchers who are not comfortable with command line tools.

We downloaded seven published and publicly available datasets of gene expression from post-mortem human brain tissue samples (Table 1). For six of the seven datasets, samples were collected from individuals deemed to be normal with respect to mental and neurological disorders. The Hernandez et al. dataset [22] contains some individuals with unknown disease status. Datasets differ widely with regard to the age range of individuals, the number of individuals and the number of samples, as well as the tissue types collected from each brain. To cater for this, brain-coX allows the user to select any combination of these datasets. Furthermore, users can subset the data further by specifying developmental periods of interest. To this end, the individuals contributing samples were assigned to 15 different developmental periods according to their age at death (Table 2). This option facilitates targeted prioritisation and gene exploration for specific diseases. An example would be a disease with onset in childhood, where a focus on brain samples from this time period is likely to be much more informative than samples from other time periods.

Different experimental protocols and microarray platforms were used in the generation of these seven datasets, leading to diverse sources of unwanted biological and technical variation. Thus, homogeneous pre-processing and data cleaning are vital to ensure that these heterogeneous datasets are comparable [23]. During pre-processing, each sample was assessed for its quality, and samples with poor-quality spot plots, unusual plots of log-intensity ratios versus log-intensity averages, or abnormal gene expression distributions were excluded. After pre-processing, each dataset is treated separately with one of two implemented cleaning strategies. Users can choose between conventional background correction [24] in combination with quantile normalisation [25] or removal of unwanted variation (RUV) [26], a data-driven approach. Unlike most other cleaning approaches, such as ComBat [27], these two approaches do not require metadata (i.e. batches, laboratory, etc.) on the samples, which in most datasets were only partially available or not available at all.
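To make the first cleaning option concrete, the following is a minimal sketch of quantile normalisation on a small genes-by-samples matrix. It is a generic textbook version written for illustration, not the implementation used in brain-coX; the function name `quantile_normalise`, the toy matrix and the tie-handling rule (ties broken by row order) are assumptions.

```python
from statistics import mean


def quantile_normalise(matrix):
    """Quantile-normalise a genes x samples matrix given as a list of rows.

    Every sample (column) is mapped onto the same reference distribution,
    namely the mean of the sorted columns. Ties are broken by row order,
    which is a simplification of what dedicated packages do.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [mean(col[k] for col in sorted_cols) for k in range(n_genes)]

    # Replace each value by the reference value that has the same rank.
    normalised = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalised[i][j] = reference[rank]
    return normalised


# Toy data (hypothetical): 4 genes measured in 3 samples.
expr = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 4.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalised = quantile_normalise(expr)
```

After this step every sample has the same set of values, so differences in overall intensity distribution between arrays no longer affect the comparison; only the ranking of genes within each sample is retained.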

RUV removes unwanted variation in an adaptive manner with the help of negative control genes. Such genes are affected by unwanted variation, but crucially not by the biological variation of interest. The default setting in brain-coX is to take all housekeeping genes as negative control genes, but these can also be empirically chosen. When unwanted variation and biological variation of interest are correlated with each other, RUV removes some of the biological signal along with the unwanted variation. To account for such correlation to a degree, brain-coX applies a version of RUV with a regularisation parameter, as previously described [28]. To prevent further removal of biological variation of interest, brain-coX also excludes known disease genes, candidates and further genes specified by the user from being negative control genes. This method has been demonstrated to reliably recover gene-gene correlations, which form the basis of in silico prioritisation and network visualisations. Furthermore, its application to a subset of the brain datasets demonstrated improved reproducibility across datasets compared to other cleaning strategies [28]. Here, we also demonstrate increased accuracy in prioritising known pathway genes compared to background correction and quantile normalisation (Additional file 1: Figures S1 and S2). RUV also considerably reduces differences between datasets. When the seven datasets are combined, the remaining clustering can be attributed to developmental differences rather than data sources (compare Additional file 1: Figures S3, S4, S5 and S6).
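The role of the negative control genes and of the regularisation parameter can be illustrated with a generic RUV-style model. The notation below is an assumption made for this sketch (Y for the samples-by-genes expression matrix, W for k unwanted factors, ν for the regularisation parameter) and is not taken from the description in [28]; it shows the general idea of a ridge-type correction rather than the exact estimator used in brain-coX.

```latex
% Assumed notation: Y is m samples x n genes, X is the biology of interest,
% W holds k unwanted factors, c indexes the negative control genes.
\begin{align}
  Y &= X\beta + W\alpha + \varepsilon
      && \text{(expression = biology + unwanted variation + noise)} \\
  Y_c &= W\alpha_c + \varepsilon_c
      && \text{(control genes: } \beta_c = 0\text{)} \\
  Y_c &= U\Lambda V^{\top}, \quad \hat{W} = U_k\Lambda_k
      && \text{(estimate } W \text{ from the control genes only)} \\
  \hat{\alpha} &= \bigl(\hat{W}^{\top}\hat{W} + \nu I_k\bigr)^{-1}\hat{W}^{\top}Y
      && \text{(ridge-type estimate, } \nu \ge 0\text{)} \\
  \tilde{Y} &= Y - \hat{W}\hat{\alpha}
      && \text{(cleaned data used for gene-gene correlations)}
\end{align}
```

In this sketch, ν = 0 corresponds to removing everything that can be explained by the estimated unwanted factors, whereas larger values of ν shrink the correction and therefore remove less, which is one way to retain biological signal that is correlated with the unwanted variation.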

We followed the leave-one-out cross-validation approach described by Aerts et al. [10]. In this approach, one gene is deleted from the known set of genes and termed the defector gene. The ability to prioritise this gene in a list of 99 other candidates, made up of random genes not known to be associated, determines the accuracy. Unlike Aerts et al., we used two different types of known gene sets that reflect a spectrum of networks, with some gene sets likely to be connected within one network and other gene sets showing very little connection. The former gene sets will do well with our approach; the latter will not. The first set of genes consists of 37 KEGG pathways [36] that function in the human brain, as judged by keyword search (Additional file 1: Table S2). The second set of genes was automatically mined from the PsyGeNET database [37], a resource that stores genes associated with psychiatric diseases (for a full list see Additional file 1: Table S3). This set contained 17 diseases, such as major affective disorder and anhedonia, and their known genes.
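The leave-one-out procedure described above can be sketched as follows. This is an illustration on simulated data only: the scoring rule (mean absolute Pearson correlation with the remaining known genes), the function names and the toy expression values are assumptions for the sketch and should not be read as the scoring used by brain-coX or by Aerts et al.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def score(candidate, seeds, expr):
    """Assumed scoring rule: mean absolute correlation with the seed genes."""
    return sum(abs(pearson(expr[candidate], expr[s])) for s in seeds) / len(seeds)


def defector_ranks(known, background, expr, n_random=99, seed=0):
    """Rank each left-out known gene among n_random random candidates."""
    rng = random.Random(seed)
    ranks = {}
    for defector in known:
        seeds = [g for g in known if g != defector]      # remaining known genes
        decoys = rng.sample(background, n_random)        # random candidates
        candidates = [defector] + decoys
        ordered = sorted(candidates, key=lambda g: score(g, seeds, expr),
                         reverse=True)
        ranks[defector] = ordered.index(defector) + 1    # 1 = top of the list
    return ranks


# Simulated data (hypothetical): 5 co-expressed known genes, 200 random genes.
rng = random.Random(1)
signal = [math.sin(i) for i in range(30)]
known = [f"K{i}" for i in range(5)]
background = [f"R{i}" for i in range(200)]
expr = {g: [s + rng.gauss(0, 0.1) for s in signal] for g in known}
expr.update({g: [rng.gauss(0, 1) for _ in range(30)] for g in background})

ranks = defector_ranks(known, background, expr)
```

A gene set that forms one tightly co-expressed network, as simulated here, places each defector gene near the top of its candidate list; a gene set with little shared co-expression would not.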

Our prioritisation approach (at a 20% correlation threshold) has mean specificity above 0.70 for any one of the brain array resources, or combinations thereof, for both the KEGG and PsyGeNET sets of known genes (compare Figs. 1 and 2). Sensitivity rapidly decreases with the number of datasets required to prioritise the defector genes. Requiring a gene to be prioritised in at least two datasets seems to result in the best trade-off between specificity and sensitivity of the method when precision and negative predictive value (Additional file 1: Figures S8 and S9) are also considered. It is interesting to note that the sensitivity values for the PsyGeNET sets are only slightly below those found for the KEGG gene sets. This suggests that gene expression networks can be utilised for the identification of disease genes in psychiatric diseases much like for the construction of pathways. A large gene co-expression network functionally related to synaptic transmission and recently identified to be differentially regulated in schizophrenia is further testament to this [38]. It also suggests the utility of brain-coX for the interpretation of neuropsychiatric GWAS results.
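The trade-off between sensitivity and specificity when a gene must be prioritised in several datasets can be illustrated with a small sketch. The gene names, the per-dataset calls and the function names are hypothetical and chosen only to show the mechanics; they are not results from the seven datasets.

```python
def prioritised_in_enough(calls_per_dataset, min_datasets=2):
    """Genes prioritised in at least min_datasets of the given datasets.

    calls_per_dataset is a list of sets, one set of prioritised genes
    per dataset.
    """
    counts = {}
    for calls in calls_per_dataset:
        for gene in calls:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, c in counts.items() if c >= min_datasets}


def sensitivity_specificity(called, true_genes, decoy_genes):
    """Sensitivity over the true genes and specificity over the decoys."""
    tp = len(called & true_genes)
    fn = len(true_genes - called)
    tn = len(decoy_genes - called)
    fp = len(decoy_genes & called)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical calls from three datasets.
true_genes = {"A", "B", "C", "D"}                       # left-out known genes
decoy_genes = {"R1", "R2", "R3", "R4", "R5", "R6"}      # random candidates
calls = [
    {"A", "B", "C", "R1"},
    {"A", "B", "R2"},
    {"A", "C", "D", "R1"},
]

results = {
    k: sensitivity_specificity(prioritised_in_enough(calls, k),
                               true_genes, decoy_genes)
    for k in (1, 2, 3)
}
```

In this toy example, raising the required number of datasets from one to three lowers sensitivity from 1.0 to 0.25 while specificity rises from about 0.67 to 1.0, which mirrors the qualitative behaviour described above.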

We compare brain-coX to two competing web-based prioritisation approaches, Endeavour and ToppGene. Both approaches rank candidate genes with the help of known disease genes. We chose not to compare the performance of brain-coX to prioritisation approaches that do not offer web interfaces. We acknowledge that such tools could potentially offer superior performance, but they cannot be used without some knowledge of programming. One example is weighted gene co-expression network analysis (WGCNA) [14], the most commonly used network construction tool. While this tool requires extensive optimisation of parameters, it allows temporal effects to be taken into account. We did, however, investigate how the performance of WGCNA compares to our approach for a subset of the 37 KEGG pathways described above. We found that brain-coX is better than WGCNA at distinguishing between genuine pathway genes and random genes (Additional file 1). Note that we also do not compare brain-coX to prioritisation tools specialised for only one disease, such as the algorithm detecting association with network (DAWN) for autism [6], which uses expert knowledge. Such tools might outperform brain-coX. However, for most neurological and mental diseases, specialised tools do not yet exist.