Case study 2: Allergic Asthma

DIABLO for repeated measures designs and module-based analyses

Next, we demonstrate the flexibility of DIABLO by extending it to a repeated measures cross-over study [27] and to module-based analyses that incorporate prior biological knowledge [28–30]. We use a small multi-omics asthma dataset with pre- and post-intervention timepoints to compare a DIABLO model that accounts for repeated measures (multilevel DIABLO) with the standard DIABLO model described above [20,21]. An allergen inhalation challenge was performed in 14 subjects as previously described [20,21], and blood samples were collected before (pre) and two hours after (post) the challenge; cell-type frequencies, leukocyte gene transcript expression and plasma metabolite abundances were determined for all samples.

Overview of multi-omics datasets analyzed for method benchmarking and in two case studies. The breast cancer case study includes training and test datasets for all omics types except proteins.

FEV1 profiles

We observed a net decline in lung function after allergen inhalation challenge (Supplementary Fig. 9), and the goal of this study was to identify perturbed molecular mechanisms in the blood in response to allergen inhalation challenge.

DIABLO

A module-based approach (also known as eigengene summarization [18], see Methods) was used to transform both the gene expression and metabolite datasets into pathway datasets. Consequently, each variable in those two datasets represents the scaled pathway activity level for each sample rather than the expression or abundance of an individual gene or metabolite. The mRNA dataset was transformed into a dataset of metabolic pathways based on the Kyoto Encyclopedia of Genes and Genomes (KEGG), whereas the metabolite dataset was transformed into a metabolite pathway dataset based on annotations provided by Metabolon Inc. (Durham, North Carolina, USA).
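The module summarization step can be illustrated with a minimal sketch. It assumes that a module's activity is the first principal component of its standardized member variables, which is the usual meaning of an eigengene; the Methods may differ in details such as scaling or sign conventions. The function names and the toy values are hypothetical and are not taken from the asthma data.

```python
import math

def standardize(column):
    """Center one variable to mean 0 and scale it to unit sample SD.

    Assumes the variable is not constant (SD > 0).
    """
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def module_score(samples_by_genes, n_iter=200):
    """Return first principal component scores for one module.

    samples_by_genes: list of rows (samples), each a list of member genes.
    """
    n_genes = len(samples_by_genes[0])
    # Standardize each gene (column), then rebuild the samples x genes layout.
    cols = [standardize([row[j] for row in samples_by_genes])
            for j in range(n_genes)]
    z = [list(row) for row in zip(*cols)]
    # Power iteration on Z^T Z to approximate the leading loading vector.
    w = [1.0 / math.sqrt(n_genes)] * n_genes
    for _ in range(n_iter):
        scores = [sum(zi * wi for zi, wi in zip(row, w)) for row in z]
        w = [sum(z[i][j] * scores[i] for i in range(len(z)))
             for j in range(n_genes)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    # One activity score per sample for this module.
    return [sum(zi * wi for zi, wi in zip(row, w)) for row in z]

# Hypothetical module: 4 samples x 3 positively correlated member genes.
toy_module = [
    [1.0, 2.1, 0.9],
    [2.0, 3.9, 2.2],
    [3.0, 6.2, 2.8],
    [4.0, 8.0, 4.1],
]
pathway_activity = module_score(toy_module)
```

Repeating this for every pathway would give a samples-by-pathways matrix that replaces the original samples-by-genes matrix. Any further scaling of the scores is left out of this sketch.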

To account for the repeated measures experimental design, a multilevel approach [27] was first used to isolate the within-subject variation from each dataset (see Methods). DIABLO was then applied to identify a multi-omics biomarker panel consisting of cell types, gene modules and metabolite modules that discriminated pre- from post-challenge samples.
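As a rough illustration of the multilevel step, the sketch below assumes a single within-subject factor (pre vs. post), in which case isolating the variation within each subject amounts to subtracting each subject's own mean from that subject's measurements. The function name, subject labels and values are hypothetical.

```python
from collections import defaultdict

def within_subject(values, subjects):
    """Remove each subject's mean from that subject's measurements.

    values:   measurements of one variable, one entry per sample
    subjects: subject identifier for each sample, in the same order
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, s in zip(values, subjects):
        totals[s] += v
        counts[s] += 1
    # What remains is the deviation of each sample from its subject's mean.
    return [v - totals[s] / counts[s] for v, s in zip(values, subjects)]

# Hypothetical variable measured pre and post challenge in two subjects.
values = [10.0, 12.0, 20.0, 26.0]
subjects = ["A", "A", "B", "B"]
deviations = within_subject(values, subjects)
```

In this toy example the two subjects have very different baselines (11 and 23), yet both increase after challenge. Once the subject means are removed, the pre and post values differ in sign for both subjects, which is the signal a paired analysis can exploit.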

Tune keepX

## $cells
## [1] 28  9
## 
## $gene.module
## [1]  28 229
## 
## $metabolite.module
## [1] 28 60

Error rate of optimal keepX

We contrast the resulting ‘multilevel DIABLO’ (mDIABLO) with a standard DIABLO model that disregards the paired nature of this study by comparing their cross-validated classification performance.

DIABLO - unpaired (DIABLO) vs paired (mDIABLO)

Component plots

mDIABLO outperformed DIABLO (AUC=98.5% vs. AUC=62.2%, leave-one-out cross-validation, see Methods), and we observed a greater degree of separation between the pre- and post-challenge samples for mDIABLO compared to DIABLO.
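For readers less familiar with the metric, the sketch below shows one way an AUC can be computed from held-out predictions. It assumes that, under leave-one-out cross-validation, each sample is scored once by a model fitted without it and that the AUC is then computed over the pooled held-out scores; whether the Methods pool predictions in exactly this way is an assumption. The scores and labels are made up for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted score per sample (higher means more 'post'-like)
    labels: 1 for post-challenge, 0 for pre-challenge
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # post sample correctly ranked above pre sample
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))

# Hypothetical held-out scores for two pre and two post samples.
example_auc = auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Here three of the four post/pre pairs are ranked correctly, so the example AUC is 0.75. An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.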

Compare DIABLO vs. mDIABLO multi-omic panels

Common features (pathways) were identified across omics-types in the mDIABLO model, but not in the standard DIABLO model. Tryptophan metabolism and Valine, leucine and isoleucine metabolism pathways were identified in both the gene and metabolite module datasets using mDIABLO.

## [1] "Valine, leucine and isoleucine metabolism"
## [2] "Tryptophan metabolism"
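The overlap reported above is a set intersection between the pathway names selected in the gene module block and those selected in the metabolite module block. A minimal sketch follows; the two shared names come from the output above, while the remaining entries are placeholders and do not reflect the actual panels.

```python
# Hypothetical selections; only the two shared names are taken from the text.
gene_module_selected = [
    "Tryptophan metabolism",
    "Valine, leucine and isoleucine metabolism",
    "placeholder gene pathway",
]
metabolite_module_selected = [
    "Valine, leucine and isoleucine metabolism",
    "Tryptophan metabolism",
    "placeholder metabolite pathway",
]

# Pathways selected in both blocks, sorted for a stable display order.
common = sorted(set(gene_module_selected) & set(metabolite_module_selected))
```

This comparison relies on both datasets using identical pathway names, which should be checked whenever the annotations come from different sources (here KEGG and Metabolon).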

Heatmap

The heatmap of pairwise associations between all features identified with mDIABLO demonstrated the ability of DIABLO to select groups of correlated features that discriminated pre- from post-challenge samples.

The Asthma pathway was also identified, even though its individual gene members were not significantly altered post-challenge, and was negatively associated with Butanoate metabolism and positively associated with basophils, a hallmark cell type in asthma. These findings illustrate DIABLO’s flexibility and its sensitivity to subtle differences in repeated measures designs, as well as its ability to identify common molecular processes spanning different biological layers. The biological pathways identified suggest a mechanistic link with the response to allergen challenge.

Circosplot

Asthma KEGG pathway

References

  1. Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods [Internet]. 2014 [cited 2016 Jan 19];11:333–7. Available from: http://www.nature.com/doifinder/10.1038/nmeth.2810
  2. Langfelder P, Horvath S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinformatics [Internet]. 2008 [cited 2016 Apr 4];9:559. Available from: http://www.biomedcentral.com/1471-2105/9/559
  3. The TCGA Research Network. The Cancer Genome Atlas [Internet]. Available from: http://cancergenome.nih.gov/
  4. Singh A, Yamamoto M, Kam SHY, Ruan J, Gauvreau GM, O’Byrne PM, et al. Gene-metabolite expression in blood can discriminate allergen-induced isolated early from dual asthmatic responses. Hsu Y-H, editor. PLoS ONE [Internet]. 2013 [cited 2015 Jul 18];8:e67907. Available from: http://dx.plos.org/10.1371/journal.pone.0067907
  5. Singh A, Yamamoto M, Ruan J, Choi JY, Gauvreau GM, Olek S, et al. Th17/Treg ratio derived using DNA methylation analysis is associated with the late phase asthmatic response. Allergy Asthma Clin Immunol [Internet]. 2014 [cited 2016 Mar 2];10:32. Available from: http://www.biomedcentral.com/content/pdf/1710-1492-10-32.pdf
  6. Liquet B, Lê Cao K-A, Hocini H, Thiébaut R. A novel approach for biomarker selection and the integration of repeated measures experiments from two assays. BMC Bioinformatics [Internet]. 2012 [cited 2015 Jul 18];13:325. Available from: http://www.biomedcentral.com/1471-2105/13/325/
  7. Allahyar A, de Ridder J. FERAL: network-based classifier with application to breast cancer outcome prediction. Bioinformatics [Internet]. 2015 [cited 2018 Feb 1];31:i311–9. Available from: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btv255
  8. Cun Y, Fröhlich H. Network and data integration for biomarker signature discovery via network smoothed t-statistics. Boccaletti S, editor. PLoS ONE [Internet]. 2013 [cited 2017 May 30];8:e73074. Available from: http://dx.plos.org/10.1371/journal.pone.0073074
  9. Sokolov A, Carlin DE, Paull EO, Baertsch R, Stuart JM. Pathway-based genomics prediction using generalized elastic net. PLoS Comput Biol [Internet]. 2016 [cited 2017 May 30];12:e1004790. Available from: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004790