GSEA
If you chose GSEA on the landing window of GSEACompass, keep reading.
Once you click the Open button beneath GSEA, the following window appears on your screen.

Each graphical element in this window has its own meaning and use; they are all explained in detail below.
Gene set database
The Choose file field under Gene set database lets you select the gene sets file to use when running the analysis. To better understand what this file must contain, see the official GSEA description below.
A gene sets file defines one or more gene sets. For each gene set, the file contains the gene set name and the list of genes in that gene set. A gene sets file is a tab-delimited text file in gmx or gmt format. For descriptions and examples of each file format, see GSEA file formats.
Note that GSEACompass doesn't validate the format of the selected file, so a gseapy exception may be thrown if the selected file cannot be interpreted.
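Since the file format is not validated up front, it can help to sanity-check a gmt file before selecting it. The following is a minimal sketch, not part of GSEACompass or gseapy: the helper name parse_gmt is hypothetical, and it assumes the usual gmt layout of one gene set per line, with the gene set name, a description, and then the genes, all separated by tabs.

```python
def parse_gmt(text):
    """Parse gmt-formatted text into a {gene_set_name: [genes]} dictionary.

    Assumed layout (one gene set per line, tab-separated):
        name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    gene_sets = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(
                f"line {line_number}: expected a name, a description "
                "and at least one gene"
            )
        name, _description, *genes = fields
        # Ignore empty trailing fields left by stray tabs.
        gene_sets[name] = [gene for gene in genes if gene]
    return gene_sets


example = "SET_A\tsome description\tTP53\tBRCA1\nSET_B\tna\tEGFR\n"
print(parse_gmt(example))
```

A file that passes this check is not guaranteed to be accepted by the analysis, but one that fails it is very likely malformed.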
Number of permutations
This field accepts only positive integers. The value is used to estimate the statistical significance of the enrichment scores obtained.
The following explanation can be found on the official MSigDB GSEA website.
Number of permutations. Specify the number of permutations to perform in assessing the statistical significance of the enrichment score. It is best to start with a small number, such as 10. After the analysis completes successfully, run it again with a full set of permutations. The GSEA team recommends 1000 permutations.
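To see why the number of permutations matters, here is a generic permutation test sketched in Python. It is an illustration only: the statistic is a plain absolute difference of means, used as a stand-in for the enrichment score, and the function name permutation_p_value is hypothetical. It is not how GSEACompass or gseapy compute their p-values.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations, seed=0):
    """Estimate a p-value by shuffling the group labels n_permutations times."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled_a = pooled[: len(group_a)]
        shuffled_b = pooled[len(group_a):]
        # Count shuffles at least as extreme as the observed statistic.
        if abs(mean(shuffled_a) - mean(shuffled_b)) >= observed:
            hits += 1
    # The +1 correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


quick = permutation_p_value([1, 2, 3], [10, 11, 12], n_permutations=10)
full = permutation_p_value([1, 2, 3], [10, 11, 12], n_permutations=1000)
print(quick, full)
```

With this estimator the smallest reportable p-value is 1 / (n_permutations + 1), so 10 permutations cannot go below roughly 0.09 while 1000 can reach roughly 0.001. That is the trade-off behind running a quick test first and the full set afterwards.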
Expression dataset
The Choose file field under Expression dataset lets you select the expression dataset to feed to GSEA. It is essentially a matrix with genes on one axis and samples on the other; each cell contains the expression of that gene in that sample.
A more general definition is the one on the MSigDB website, reported below.
An expression dataset file contains features (genes or probes), samples, and an expression value for each feature in each sample. It is a tab-delimited text file in gct, res, pcl, or txt format. For descriptions and examples of each file format, see GSEA file formats.
GSEACompass inspects this file as thoroughly as it can, in order to notify the user of any abnormalities found in it. Such defects include missing expression values, invalid expression values (a string instead of a number), etc.
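The kind of inspection described above can be sketched as follows. This is not GSEACompass's actual code: the function name find_expression_problems is hypothetical, and the sketch assumes a simple tab-delimited layout with one header row (a name column followed by sample names) and one row per gene. Formats such as gct carry extra header lines and columns that this sketch does not handle.

```python
import csv
import io


def find_expression_problems(text):
    """Return (gene, sample, problem) tuples for missing or non-numeric cells."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader, [])
    samples = header[1:]  # first column holds the gene names
    problems = []
    for row in reader:
        if not row:
            continue  # skip blank lines
        gene, values = row[0], row[1:]
        for index, sample in enumerate(samples):
            # A short row counts as missing values for the trailing samples.
            value = values[index] if index < len(values) else ""
            if value.strip() == "":
                problems.append((gene, sample, "missing"))
                continue
            try:
                float(value)
            except ValueError:
                problems.append((gene, sample, "not a number"))
    return problems


example = "NAME\tS1\tS2\nTP53\t1.5\t\nEGFR\tabc\t2.0\n"
print(find_expression_problems(example))
```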
Phenotype labels
The Choose file field under Phenotype labels lets you select a phenotype labels file, which is essentially a single row in which the two variants of a phenotype are repeated, one entry per sample (e.g. M M F F M, where M means male and F means female).
As usual, here is the official MSigDB GSEA description of it.
A phenotype label file, also known as a class file or template file, defines phenotype labels and assigns those labels to the samples in your expression dataset. A phenotype label file is a tab-delimited text file in cls format. For descriptions and examples of the cls file format, see GSEA file formats.
This kind of file usually contains two rows before the actual phenotype data; these two rows contain metadata about the phenotypes (e.g. how many there are, the number of samples, etc.). This metadata is ignored by GSEACompass.
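The structure just described can be sketched in a few lines of Python. This is an assumption-laden illustration rather than GSEACompass's implementation: the helper name read_phenotype_labels is hypothetical, and it assumes the common categorical cls layout of two metadata rows followed by one row of labels separated by spaces or tabs.

```python
def read_phenotype_labels(text):
    """Return the per-sample labels of a cls file, skipping the two metadata rows."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        raise ValueError("expected two metadata rows followed by a labels row")
    # Rows 1 and 2 hold metadata and are ignored, as in the text above.
    return lines[2].split()


example = "5 2 1\n# M F\nM M F F M\n"
labels = read_phenotype_labels(example)
print(labels)       # ['M', 'M', 'F', 'F', 'M']
print(set(labels))  # the two phenotype variants
```

The number of labels should match the number of samples in the expression dataset, which is a quick check worth doing before submitting.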
Remap to gene symbols
The Remap to gene symbols option behaves just like the remap option in the official Broad Institute GSEA software: when Remap is selected, it simply tells GSEACompass to convert each gene symbol in the expression dataset from its original gene notation to the corresponding symbol in another notation. Internally, the result is an expression dataset whose gene symbols use a notation different from the original one. If a gene symbol has no correspondence, it is simply dropped.
The file that maps each gene in one notation to the same gene in another notation is called the chip platform file and is described in the following section.
Chip platform
The Choose file field under Chip platform lets you select the chip platform file used to convert gene symbols from one notation to another.
Apart from descriptive columns, which GSEACompass ignores, it contains two main columns representing two gene symbol notations: each row holds a gene symbol in the first field and its counterpart in the other notation in the second. The two notations may even refer to different species; genes that map to each other across species are called orthologs.
Keep in mind that this file must be selected if and only if Remap has been selected for the Remap to gene symbols option.
GSEACompass performs some checks on this file, verifying that no value is missing.
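The remapping behaviour described in this section and the previous one can be sketched as follows. This is a simplified illustration, not GSEACompass's implementation: the function names build_mapping and remap_expression and the gene identifiers are hypothetical, the chip platform file is assumed to be already reduced to its two main columns, and the expression dataset is represented as a plain dictionary.

```python
def build_mapping(chip_rows):
    """Build a {source_symbol: target_symbol} dictionary from two-column rows."""
    mapping = {}
    for source, target in chip_rows:
        if source and target:  # skip rows with a missing value
            mapping[source] = target
    return mapping


def remap_expression(expression, mapping):
    """Rename genes using the mapping; genes without a correspondence are dropped."""
    remapped = {}
    for gene, values in expression.items():
        target = mapping.get(gene)
        if target is None:
            continue  # no correspondence in the chip platform file
        # If two source genes share a target, this sketch keeps the later one.
        remapped[target] = values
    return remapped


chip_rows = [("ENSG_A", "TP53"), ("ENSG_B", "EGFR")]
expression = {
    "ENSG_A": [1.0, 2.0],
    "ENSG_B": [3.0, 4.0],
    "ENSG_C": [5.0, 6.0],  # absent from the chip rows, so it is dropped
}
print(remap_expression(expression, build_mapping(chip_rows)))
```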
Question mark
You may have noticed the small question mark icons; clicking any of them opens a help popup, such as the one below.

They are there to remind you, when needed, of the meaning of each requested file.
Once every file has been selected, click the Submit button to start the analysis and wait until a new window appears.