Introduction
Brief Introduction
The i-GSEA4GWAS (improved GSEA for GWAS) web server is a web-based resource for analyzing GWAS data (typically each SNP's -log(P-value)) to identify pathways/gene sets correlated with traits of interest by implementing an improved Gene Set Enrichment Analysis (i-GSEA) approach. i-GSEA4GWAS aims to provide an open platform that helps interpret GWAS data further and offers new insights into complex disease studies, especially as a complement to standard single variant/gene based analysis.
Methods
Genome-wide association studies (GWAS) have become a popular approach that uses genome-wide genotyping arrays to map susceptibility effects by examining the associations between genotype frequencies and phenotypes/traits [1]. Traditional single variant/gene based analysis of GWAS data makes it difficult to explore the underlying biological mechanisms and ignores the combined effect of variants/genes with modest effects. To address these key issues and make full use of GWAS data produced at high cost, gene set enrichment analysis (GSEA) [2] has been introduced to GWAS to identify correlations between pathways/gene sets and phenotypes/traits [3]. However, GSEA usually analyzes genotype data, which are not available for most GWASs. To perform GSEA on the GWAS data that are available (SNP P-values, odds ratios or other statistics), we implemented it using SNP label permutation. We further improved GSEA (i-GSEA) by emphasizing gene sets with a high proportion of significant genes, so that combinations of modest effects can be detected with improved sensitivity.
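The idea of a permutation-based enrichment test with extra weight on gene sets rich in significant genes can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's actual implementation: the function names, the rule of representing a gene by its largest SNP statistic, the use of the maximum of the running sum, the significance cutoff, and the scaling of the enrichment score by the ratio of significant-gene proportions are all assumptions made for this example.

```python
import random

def gene_scores(snp_stats, gene_to_snps):
    """Represent each gene by the largest statistic among its mapped SNPs (assumed rule)."""
    return {g: max(snp_stats[s] for s in snps)
            for g, snps in gene_to_snps.items() if snps}

def enrichment_score(scores, gene_set, weight=1.0):
    """GSEA-style weighted running sum over genes ranked by decreasing score."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    hit_total = sum(abs(scores[g]) ** weight for g in ranked if g in gene_set)
    n_miss = sum(g not in gene_set for g in ranked)
    if hit_total == 0 or n_miss == 0:
        return 0.0
    running = best = 0.0
    for g in ranked:
        if g in gene_set:
            running += abs(scores[g]) ** weight / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss                          # step down on a miss
        best = max(best, running)
    return best

def significance_ratio(scores, gene_set, cutoff):
    """Proportion of significant genes in the set divided by the proportion overall.
    The cutoff defining a 'significant' gene is a hypothetical parameter."""
    sig = [g for g, s in scores.items() if s >= cutoff]
    members = [g for g in scores if g in gene_set]
    if not sig or not members:
        return 0.0
    in_set = sum(g in gene_set for g in sig) / len(members)
    overall = len(sig) / len(scores)
    return in_set / overall

def permutation_test(snp_stats, gene_to_snps, gene_set, cutoff, n_perm=1000, seed=0):
    """Shuffle statistics across SNP labels to build a null distribution."""
    rng = random.Random(seed)

    def statistic(stats):
        scores = gene_scores(stats, gene_to_snps)
        return (enrichment_score(scores, gene_set)
                * significance_ratio(scores, gene_set, cutoff))

    observed = statistic(snp_stats)
    labels = list(snp_stats)
    values = [snp_stats[s] for s in labels]
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(values)  # SNP label permutation: no genotype data needed
        if statistic(dict(zip(labels, values))) >= observed:
            exceed += 1
    # Add-one correction keeps the empirical P-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Because only the SNP-level statistics are shuffled, this kind of test needs no individual genotypes, which is the reason label permutation suits summary GWAS data.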
Input, processing, and output
The i-GSEA4GWAS web server implements i-GSEA to help researchers explore GWAS data efficiently. With GWAS data as input, the program performs three key steps: 1) map variants to genes across the genome; 2) perform i-GSEA to identify pathways/gene sets correlated with traits; 3) display significant pathways/gene sets graphically with links to detailed text information.
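Step 1 above, mapping variants to genes, might look like the sketch below. It is illustrative only: the tuple layouts, the `flank` window size, the base-10 logarithm, and the choice to keep the largest -log10(P) per gene are assumptions for this example and are not taken from the server's documentation.

```python
import math

def map_snps_to_genes(snps, genes, flank=20_000):
    """Assign each SNP to every gene whose region (plus a flanking window) contains it.

    snps:  iterable of (snp_id, chrom, position, p_value)
    genes: iterable of (gene_id, chrom, start, end)
    Returns {gene_id: largest -log10(P) among its mapped SNPs}.
    """
    gene_stat = {}
    for snp_id, chrom, pos, p_value in snps:
        stat = -math.log10(p_value)  # base 10 is an assumption
        for gene_id, g_chrom, start, end in genes:
            if chrom == g_chrom and start - flank <= pos <= end + flank:
                if stat > gene_stat.get(gene_id, float("-inf")):
                    gene_stat[gene_id] = stat
    return gene_stat
```

Genes with no SNP inside the window are simply absent from the result, so they would not take part in the later ranking. The nested loop is quadratic and is meant for clarity; a genome-wide run would normally use sorted coordinates or an interval index.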
Additional notes
It should be noted that i-GSEA4GWAS does not take into account the linkage disequilibrium (LD) patterns of SNP arrays and does not prune the set of SNPs for LD, since both can only be done when genotype data are available. Users are therefore recommended to input SNPs that are not in LD (say r2 < 0.2) to reduce the possibility of results biased by LD patterns. In addition, as the name of the web server implies, i-GSEA4GWAS is only applicable to whole-genome SNP arrays.
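For users who already have pairwise r2 values from their own genotype data or another source, one possible way to prepare an input list before upload is a simple greedy filter. This is a sketch under assumptions, not a procedure prescribed by the server: the function name, the input layout, the treatment of missing pairs as r2 = 0, and the choice to keep the SNP with the smaller P-value are all hypothetical.

```python
def prune_by_ld(p_values, r2_pairs, threshold=0.2):
    """Greedily keep SNPs so that no two kept SNPs have r2 >= threshold.

    p_values: {snp_id: p_value}
    r2_pairs: {frozenset({snp_a, snp_b}): r2}; pairs not listed are treated as r2 = 0
    SNPs are visited from smallest to largest P-value (an assumed priority rule).
    """
    kept = []
    for snp in sorted(p_values, key=p_values.get):
        if all(r2_pairs.get(frozenset((snp, other)), 0.0) < threshold for other in kept):
            kept.append(snp)
    return kept
```

The default threshold of 0.2 mirrors the value suggested above; the r2 values themselves must be computed elsewhere, since they cannot be derived from P-values alone.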
References
[1] McCarthy MI, et al., 2008 Nat Rev Genet 9 356-369.
[2] Subramanian A, et al., 2005 Proc Natl Acad Sci U S A 102 (43) 15545-15550.
[3] Wang K, et al., 2007 Am J Hum Genet 81 (6) 1278-1283.