1. Input data
The input data should be a plain-text file containing exactly two tab-separated columns and no header line. Gzip-compressed text files are also supported. Two types of input data are supported:
1.1 SNP association data
The first column is the SNP ID and the second column is the association value: -log (P-value), a test statistic, or an odds ratio.
If your input contains raw P-values, the server can transform them to -log (P-value). Simply tick the "-logarithm transformation" option (necessary ONLY for P-value data).
The format is as follows (SNP ID, -log (P-value)):
rs1000000	0.49471432586
rs10000010	0.51215487989
rs10000023	1.11367851344
rs10000030	0.35713994742
rs10000041	0.20210951694
rs1000007	0.04436034698
rs10000081	0.37110043558
rs10000092	0.40197592767
rs10000121	0.43937612545
rs1000014	0.45892023222
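For users who prefer to prepare the file locally, the following is a minimal sketch of reading this two-column format and applying the -log transformation. The function names `parse_association` and `read_association` are hypothetical, and base 10 is an assumption (this page only says "-log"); it is not the server's own code.

```python
import gzip
import math


def parse_association(lines, neg_log_transform=False):
    """Parse tab-separated (ID, value) lines with no header line.

    If neg_log_transform is True, the value is treated as a P-value and
    converted to -log10(P). Base 10 is an assumption, not stated on this page.
    """
    scores = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        identifier, value = line.split("\t")
        value = float(value)
        if neg_log_transform:
            value = -math.log10(value)
        scores[identifier] = value
    return scores


def read_association(path, neg_log_transform=False):
    """Open a plain or gzip-compressed text file and parse it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_association(handle, neg_log_transform)
```

A call such as `read_association("snps.txt.gz", neg_log_transform=True)` (hypothetical file name) would return a dictionary from SNP ID to -log10 (P-value).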
1.2 Gene association data
The first column is the HUGO gene symbol (http://www.genenames.org/) and the second column is the association value, e.g. -log (P-value), a test statistic, or an odds ratio. The format is as follows (gene symbol, maximum -log (P-value) of SNPs mapped to the gene):
GDA	1.947306
SCN3A	1.6901569
SCN3B	1.5979106
RPLP2	0.5395532
BTBD1	0.87419355
BTBD2	1.6567885
BTBD3	1.7276942
RPLP1	1.4337983
ACAA2	2.0501711
TMEFF2	1.7416022
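If you start from SNP-level values and want to build a gene-level file like the one above yourself, the gene value can be taken as the maximum over the SNPs mapped to that gene. The sketch below assumes a hypothetical `snp_to_genes` dictionary that you supply; it is an illustration, not the server's implementation.

```python
def gene_scores_from_snps(snp_scores, snp_to_genes):
    """Return the maximum SNP value for each gene.

    snp_scores:   dict of SNP ID -> -log (P-value)
    snp_to_genes: dict of SNP ID -> iterable of gene symbols (hypothetical input)
    """
    gene_scores = {}
    for snp, score in snp_scores.items():
        for gene in snp_to_genes.get(snp, ()):
            # keep the largest value seen so far for this gene
            if gene not in gene_scores or score > gene_scores[gene]:
                gene_scores[gene] = score
    return gene_scores
```

SNPs with no entry in the mapping are simply ignored.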
2.1 Optional multiple-level broad-to-narrow SNPs->genes mapping rules
Several SNPs->genes mapping rules are available: "500 kb upstream and downstream range of gene", "100 kb upstream and downstream range of gene", "5 kb upstream and downstream range of gene", "within gene", and "functional SNPs". They are ordered from broad to narrow, that is, from rough to accurate. The SNPs->genes mapping is established based on SNP and gene annotations from the Ensembl BioMart database (Release 56 - 15th September 2009, http://www.ensembl.org/biomart/martview). Only one option can be chosen per run, and the option applies only to SNP data.
Figure 2.1 Option of SNPs->genes mapping rules.
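The distance-based rules can be pictured as a window test around each gene. The sketch below is only an illustration under stated assumptions: the coordinate dictionaries and the rule keys are hypothetical, the window boundaries are treated as inclusive, and the "functional SNPs" rule is not covered because it depends on SNP annotation rather than distance.

```python
# Flank size in base pairs for each distance-based rule (keys are hypothetical).
FLANK_BP = {"500kb": 500_000, "100kb": 100_000, "5kb": 5_000, "within_gene": 0}


def map_snps_to_genes(snps, genes, rule="500kb"):
    """Map each SNP to the genes whose window contains it.

    snps:  dict of SNP ID -> (chromosome, position)
    genes: dict of gene symbol -> (chromosome, start, end)
    """
    flank = FLANK_BP[rule]
    mapping = {}
    for snp_id, (chrom, pos) in snps.items():
        hits = [
            gene
            for gene, (gene_chrom, start, end) in genes.items()
            if gene_chrom == chrom and start - flank <= pos <= end + flank
        ]
        if hits:
            mapping[snp_id] = sorted(hits)
    return mapping
```

With a narrower rule, fewer SNPs map to each gene, which is the broad-to-narrow trade-off described above.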
2.2 Choose gene set database
Figure 2.2
2.2.1 Canonical pathways
The canonical pathways are from MSigDB v2.5, which contains pathways integrated and curated from a variety of online resources, as follows:
Signaling pathway database
Signal transduction knowledge environment
Human protein reference database
Gene arrays, BioScience Corp
Human cancer genome anatomy consortium
2.2.2 Curated gene ontology (GO) terms
The GO biological process, GO molecular function, and GO cellular component gene sets are from MSigDB v2.5. Only GO terms that carry one of the evidence codes IDA, IPI, IMP, IGI, IEP, ISS, or TAS and that belong to reasonable categories are included. The reasonable categories are defined by MSigDB as: "GO gene sets for very broad categories, such as Biological Process, have been omitted from MSigDB. GO gene sets with fewer than 10 genes have also been omitted. Gene sets with the same members have been resolved based on the GO tree structure: if a parent term has only one child term and their gene sets have the same members, the child gene set is omitted; if the gene sets of sibling terms have the same members, the sibling gene sets are omitted".
2.2.3 Customized gene sets
Additionally, users can upload their own gene set data. The format requirements for the gene set file are: 1) a text file without a header line; 2) one gene set per line, with tab-separated fields; 3) the first column is the gene set ID, the second column is the gene set description (use "na" or leave it blank if not available), and the remaining columns are HUGO gene symbols.
GO0045726	GO0045726	NOX1	P61812	Q9Y5S8	TGFB2
GO0016045	GO0016045	CD1D	NLRC4	NOD1	NOD2	O75594	P15813	PARG	PGLYRP1	PGLYRP2
GO0048536	GO0048536	BCL3	JARID2	NFKB2	NKX3-2	P20749	P31314	P78367
GO0010460	GO0010460	ADRA1A	ADRA1B	ADRB1	B1N7G2	B1N7G7	CHRNA7	CHRNA7-2
GO0035090	GO0035090	A0PJG1	A7MBM7	ANK1	LLGL1	P16157	Q15334
GO0050982	GO0050982	A2A3D9	A9Z1W1	GRIN2B	MKKS	MYC	O15273	P01106	P48431	P55011	P98161	P98161-2
GO0007346	GO0007346	A6NDV4	AFAP1L2	APBB1	APBB2	ATM	BCL6	BLM	BRCA2
GO0001890	GO0001890	AKT1	ANG	ARNT	BIRC2	CDX2	CDX4	CEBPB	CITED1
GO0016189	GO0016189	EEA1	Q15075
GO0008406	GO0008406	A6NKD2	ACVR2A	AMH	ANKRD7	AR	BAX	BRCA2	CSDE1	DMRT1	DMRT2
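Before uploading, it can help to check that a customized gene set file parses as intended. The following is a minimal sketch of a parser for the format described above; the function name `parse_gene_sets` and the choice to skip lines with fewer than three fields are assumptions, not the server's validation rules.

```python
def parse_gene_sets(lines):
    """Parse tab-separated gene set lines: ID, description, then gene symbols.

    A description of "na" or an empty second column is stored as "".
    Lines with fewer than three fields are skipped (an assumption).
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3 or not fields[0]:
            continue
        set_id = fields[0]
        description = fields[1].strip()
        if description.lower() == "na":
            description = ""
        genes = [gene for gene in fields[2:] if gene]
        gene_sets[set_id] = {"description": description, "genes": genes}
    return gene_sets
```

For example, passing an open file handle for your gene set file to `parse_gene_sets` returns a dictionary keyed by gene set ID.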
2.2.4 MHC/xMHC region masking for gene sets
If "Mask MHC/xMHC region" is selected, all genes in the MHC/xMHC (major histocompatibility complex / extended major histocompatibility complex) region are removed from the selected gene set database. The list of genes in the MHC/xMHC region is taken from Horton R, et al., Nature Reviews Genetics 2004 5, 889-899.
Figure 2.2.4 The option of masking genes in the MHC/xMHC region.
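In effect, masking removes a fixed list of genes from every gene set. A minimal sketch follows, assuming gene sets are held as a dictionary of lists; the function name `mask_genes` is hypothetical, and the real gene list would come from Horton et al. rather than from anything shown here.

```python
def mask_genes(gene_sets, masked_genes):
    """Return a copy of gene_sets with every masked gene removed.

    gene_sets:    dict of gene set ID -> list of gene symbols
    masked_genes: iterable of gene symbols to remove (e.g. an MHC/xMHC list)
    """
    masked = set(masked_genes)
    return {
        set_id: [gene for gene in genes if gene not in masked]
        for set_id, genes in gene_sets.items()
    }
```

The input dictionary is left unchanged, so masked and unmasked versions can be compared.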
2.2.5 Filter gene sets by set size
The size of gene sets can be restricted to avoid overly narrow or overly broad functional categories. The default minimum and maximum numbers of genes in a gene set are 20 and 200, respectively (Wang et al., 2007 Am J Hum Genet 81 (6) 1278-1283; Fellay et al., 2009 PLoS Genet 5(12) e1000791).
Figure 2.2.5 The option of filtering gene sets by set size.
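The size filter can be sketched as below, using the default limits of 20 and 200. Treating both limits as inclusive and counting distinct symbols in the set are assumptions; this page does not say how the server counts genes for this purpose.

```python
def filter_by_size(gene_sets, min_size=20, max_size=200):
    """Keep gene sets whose number of distinct genes lies in [min_size, max_size].

    gene_sets: dict of gene set ID -> list of gene symbols
    Inclusive bounds are an assumption.
    """
    return {
        set_id: genes
        for set_id, genes in gene_sets.items()
        if min_size <= len(set(genes)) <= max_size
    }
```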
3. Output and display (example of result page)
The output interface contains a download link, from which all the results (both text and figures) can be downloaded, and a summary table that lists the pathways/gene sets with FDR < 0.25 in order of increasing FDR. The threshold of FDR < 0.25 denotes the confidence level of 'possible' or 'hypothesis', while the threshold of FDR < 0.05 is regarded as 'high confidence' or 'with statistical significance'. You can visit http://gsea4gwas.psych.ac.cn/getResult.do?result=9DA3BCD71BDB4CC5DEC84F64927C20EE.s3_1265892314763 to see an example result.
Figure 3 The result page.
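If you post-process the downloaded results yourself, the logic of the summary table can be reproduced roughly as follows. This is a sketch only: the function name `summarize` and the input dictionary are hypothetical, and the layout of the downloadable result files is not described on this page.

```python
def summarize(fdr_by_set, max_fdr=0.25, high_conf_fdr=0.05):
    """List gene sets with FDR < max_fdr in order of increasing FDR.

    fdr_by_set: dict of pathway/gene set name -> FDR
    Each row is (name, FDR, label), where the label follows the two
    thresholds described above.
    """
    rows = []
    for name, fdr in sorted(fdr_by_set.items(), key=lambda item: item[1]):
        if fdr < max_fdr:
            label = "high confidence" if fdr < high_conf_fdr else "possible"
            rows.append((name, fdr, label))
    return rows
```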
3.1 Manhattan plot of pathway/gene set
A Manhattan plot is a type of scatter plot, usually used to display data with a large number of data points, many of non-zero amplitude and with a distribution of higher-magnitude values, for instance in genome-wide association studies (http://en.wikipedia.org/wiki/Manhattan_plot). In the Manhattan plot of a GWAS, the x-axis is divided by chromosome and the y-axis shows the association data (typically -log (P-value)), so the plot maps the association test results to chromosomal locations.
Here, the Manhattan plot of a gene set uses the Manhattan plot of the GWAS as background and highlights the association test results for a given pathway/gene set. It helps users to graphically compare the association test results of the given pathway/gene set with the genome-scale data, and provides an interactive panel in which users can view information on the genes of interest belonging to the pathway/gene set.
Figure 3.1 Gene set Manhattan plot.
3.2 The number of Significant genes/Selected genes/All genes
Significant genes: genes mapped with at least one of the top 5% of all SNPs.
These numbers give users a clear overview of each pathway/gene set: how many genes belong to the pathway/gene set (All genes), how many genes are included in the i-GSEA analysis (Selected genes), and how many genes are significant (Significant genes).
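The three counts can be illustrated with the sketch below. Several points are assumptions rather than documented behaviour: "selected" genes are taken to be set members with at least one mapped SNP, the top 5% cutoff is rounded up to a whole number of SNPs, and ties at the cutoff are broken arbitrarily. The function name `count_genes` is hypothetical.

```python
import math


def count_genes(gene_set, snp_scores, snp_to_genes, top_fraction=0.05):
    """Count all, selected, and significant genes for one gene set.

    gene_set:     iterable of gene symbols in the pathway/gene set
    snp_scores:   dict of SNP ID -> -log (P-value)
    snp_to_genes: dict of SNP ID -> iterable of gene symbols
    """
    # Rank SNPs from most to least associated and take the top fraction.
    ranked = sorted(snp_scores, key=snp_scores.get, reverse=True)
    n_top = math.ceil(len(ranked) * top_fraction)
    top_snps = set(ranked[:n_top])

    genes_with_data = {g for s in snp_scores for g in snp_to_genes.get(s, ())}
    genes_with_top = {g for s in top_snps for g in snp_to_genes.get(s, ())}

    members = set(gene_set)
    return {
        "all": len(members),
        "selected": len(members & genes_with_data),    # assumption, see lead-in
        "significant": len(members & genes_with_top),  # mapped to a top SNP
    }
```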
Copyright: Bioinformatics Lab, Institute of Psychology, Chinese Academy of Sciences
Last update: June 23, 2010