Welcome to the Fun Interactive RNA-Seq Expression Tool by the Molecular and Genomics Informatics Core. This tool is intended to be used with data generated by the Core's pipelines so that you can dive deeper into your data! All available modules will load once you have completed the prerequisite steps, starting with uploading your data.
You should receive this raw hit count table from the Core in a format that can be uploaded directly. It should contain a header for each column. The first two columns should be the Ensembl stable ID and the gene symbol, respectively. Each subsequent column holds one sample. Each row should be a unique gene, followed by that gene's raw read count in each sample. The Core provides this as a tab-delimited file, but opening and re-saving it in other programs may change the format.
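As a rough illustration of the layout described above, the following Python sketch reads a tab-delimited count table and checks the basic expectations (a header row, two annotation columns, unique genes, integer counts). The function name read_count_table and the example IDs, symbols, and sample names are made up for illustration; this is not the tool's own import code.

```python
import csv
import io

# Placeholder data: IDs, symbols, and sample names are invented for this sketch.
EXAMPLE = (
    "ensembl_id\tgene_symbol\tsample1\tsample2\n"
    "GENE_ID_1\tGENEA\t10\t25\n"
    "GENE_ID_2\tGENEB\t0\t3\n"
)


def read_count_table(handle):
    """Return (sample_names, counts); counts maps gene ID -> list of raw counts."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if len(header) < 3:
        # A single wide column usually means the delimiter was changed on re-save.
        raise ValueError("expected ID, symbol, and at least one sample column")
    samples = header[2:]
    counts = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        if len(row) != len(header):
            raise ValueError(f"row for {row[0]} has {len(row)} fields, "
                             f"expected {len(header)}")
        gene_id = row[0]
        if gene_id in counts:
            raise ValueError(f"duplicate gene ID: {gene_id}")
        # Raw read counts should be whole numbers; int() raises otherwise.
        counts[gene_id] = [int(value) for value in row[2:]]
    return samples, counts


samples, counts = read_count_table(io.StringIO(EXAMPLE))
```

If a file edited in another program fails a check like this, re-exporting it as tab-delimited text is usually the first thing to try.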
Your experimental metadata is a table that contains the non-RNA-seq data for the experiment. It can be customized to include many kinds of experimental data for grouping samples. Its primary purpose is to define your sample groups, so at minimum it must contain a column with the sample names (identical to the sample column headers in the gene count table) and a column with each sample's group. As you add more metadata columns, you can also explore how they correlate with the variance across samples.
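Because the sample names in the metadata must match the count table headers exactly, a quick comparison before uploading can save a failed run. The sketch below is one way to do that check by hand; check_sample_names and the sample names are hypothetical and not part of the tool.

```python
def check_sample_names(count_samples, metadata_samples):
    """Return the names found in only one of the two tables."""
    count_set = set(count_samples)
    meta_set = set(metadata_samples)
    missing_from_metadata = sorted(count_set - meta_set)
    missing_from_counts = sorted(meta_set - count_set)
    return missing_from_metadata, missing_from_counts


# Hypothetical names; note the case difference between "KO_1" and "ko_1".
count_samples = ["WT_1", "WT_2", "KO_1"]
metadata_samples = ["WT_1", "WT_2", "ko_1"]

only_in_counts, only_in_metadata = check_sample_names(count_samples, metadata_samples)
```

Both lists are empty when the two tables agree. Differences in case or stray whitespace count as mismatches, since the names must be identical.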
Once you have imported your data, you must run the DESeq2 workflow. To do this, select the metadata column that contains your comparators. For example, in a drug response study you would have a column defining which drug group each sample belongs to (see the example). You would then select that column's name for DESeq2. Once DESeq2 has finished, you can begin to look at the holistic data visualization tools, such as basic heatmaps and clustering plots.
Additionally, you must set which comparisons you wish to view, such as Knockout vs Wild Type control, or Drug A vs No Drug. This is done in the comparisons tab of DESeq2. Define each comparison, then allow the tool to pull out the data and assign significance. For each comparison you wish to continue with, click the button to add the comparison to the analysis so that it is accessible in other modules. Once you have defined all of your comparisons, you can explore the full set of visualizations.
You can then explore the visualizations by switching between the tabs at the top. Each visualization tool has its own customization options.
PCA stands for principal component analysis. It is a method of reducing high-dimensional data to 2D or 3D visualizations. The principal components capture the variability across samples, with the largest share captured by PC1, followed by PC2. Points that are closer together are more similar in their gene expression profiles, based on the normalized counts. This is often a good quality control step: if your experiment was properly designed and controlled, the samples should cluster clearly by group. For more variable samples, such as human-derived samples, you might expect more variability and a less clear-cut divide.
A distance matrix is another way of measuring sample-to-sample relatedness. Here the tool computes the Euclidean distance between each pair of samples, showing which samples are more closely related and which are more distant. As with PCA, this should show the anticipated groupings of your samples.
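To make the idea concrete, here is a small Python sketch of a pairwise Euclidean distance matrix. The sample names and two-gene expression profiles are invented, and the values stand in for whatever normalized or transformed expression values the tool actually uses; distance_matrix is a hypothetical helper, not the tool's implementation.

```python
import math


def distance_matrix(profiles):
    """Map each (sample_a, sample_b) pair to the Euclidean distance between them."""
    names = list(profiles)
    return {
        (a, b): math.dist(profiles[a], profiles[b])
        for a in names
        for b in names
    }


# Toy profiles: each list holds one value per gene (two genes here).
profiles = {
    "ctrl_1": [0.0, 0.0],
    "ctrl_2": [3.0, 4.0],
    "drug_1": [6.0, 8.0],
}

distances = distance_matrix(profiles)
```

In this toy example ctrl_1 is closer to ctrl_2 than to drug_1, which is the kind of pattern you would hope to see between replicates of the same group.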
The vectors that define the principal components are called eigenvectors. Eigen correlation analysis correlates those vectors with the various metadata columns. For smaller experiments this will most likely be skipped, but for more exploratory studies with extensive metadata it can be very helpful. For example, if you have a large patient study with various clinical features (e.g., death during study, weight loss, secondary infections), eigen correlations will help show which metadata variables are driving the variability seen in the data. Note that this is not meant to replace your sample groupings. You cannot redefine your groups based on which samples look more or less similar in the PCA or eigen output. It is meant to expand on your sample groupings and provide extra insight.
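The sketch below illustrates the general idea by correlating per-sample PC1 scores with two numeric metadata columns. It assumes a plain Pearson correlation, which may differ from the statistic the tool reports, and every value here (the PC1 scores, weight_loss, secondary_infection) is invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical PC1 scores for four samples, in the same order as the metadata.
pc1_scores = [-2.0, -1.0, 1.0, 2.0]
metadata = {
    "weight_loss": [1.0, 2.0, 4.0, 5.0],
    "secondary_infection": [1.0, 0.0, 0.0, 1.0],  # yes/no coded as 1/0
}

correlations = {name: pearson(pc1_scores, values) for name, values in metadata.items()}
```

A coefficient near 1 or -1 suggests that the metadata column tracks that principal component, while a value near 0 suggests it does not.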
Volcano plots are a staple of RNA-seq analysis. They visualize the differences seen in your direct comparisons. For example, in a treatment vs control comparison you can see where every gene falls: whether it is higher or lower in treatment, and whether the change is significant. Note that each volcano plot shows a single comparison at a time.
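As a minimal sketch of how a gene is typically placed on a volcano plot, the Python below computes the plotted y-value and labels each gene as up, down, or not significant. The cutoffs (adjusted p-value below 0.05 and absolute log2 fold change of at least 1) are common conventions assumed here, not necessarily the tool's defaults, and the gene names and statistics are made up.

```python
import math


def classify(log2fc, padj, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Label a gene for one comparison using assumed, adjustable cutoffs."""
    # DESeq2 can report NA for the adjusted p-value; treat that as not significant.
    if padj is None or padj >= padj_cutoff or abs(log2fc) < lfc_cutoff:
        return "not significant"
    return "up" if log2fc > 0 else "down"


def volcano_y(padj):
    """Y-axis value: larger means more significant."""
    return -math.log10(padj)


# Hypothetical results for one comparison: gene -> (log2 fold change, adjusted p).
results = {
    "GENEA": (2.5, 0.001),
    "GENEB": (-1.8, 0.01),
    "GENEC": (0.3, 0.0001),
    "GENED": (3.0, None),
}

labels = {gene: classify(lfc, padj) for gene, (lfc, padj) in results.items()}
```

GENEC shows why both cutoffs matter: it is highly significant but barely changes, so it sits high on the plot near the center rather than out on either arm.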
Venn diagrams allow effective visualization across comparisons. For example, if you wanted to see what DrugA vs Untreated control and DrugB vs Untreated control have in common, you could use a Venn diagram to see which significant genes overlap. Venn diagrams are only effective for five or fewer comparisons.
UpSet plots are an alternative to Venn diagrams that scale beyond five comparisons. For each combination of comparisons, the plot shows how many genes are shared.
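The counting behind this kind of plot can be sketched in a few lines of Python. This version groups genes by the exact set of comparisons in which they are significant (one common convention, assumed here rather than taken from the tool); the comparison names and gene lists are invented.

```python
def exclusive_intersections(gene_sets):
    """Group genes by the exact combination of comparisons that contain them."""
    membership = {}
    for comparison, genes in gene_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(comparison)
    combos = {}
    for gene, comparisons in membership.items():
        combos.setdefault(frozenset(comparisons), set()).add(gene)
    return combos


# Hypothetical significant genes per comparison.
significant = {
    "DrugA_vs_Untreated": {"GENEA", "GENEB", "GENEC"},
    "DrugB_vs_Untreated": {"GENEB", "GENEC", "GENED"},
    "DrugC_vs_Untreated": {"GENEC"},
}

combos = exclusive_intersections(significant)
# Bar heights for the plot: number of genes in each combination.
sizes = {tuple(sorted(combo)): len(genes) for combo, genes in combos.items()}
```

Each key in sizes corresponds to one column of dots in the plot, and its value is the height of the bar above that column.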
Heatmaps are another staple of RNA-seq analysis. They let you see the expression levels of individual genes across all samples. Here, the heatmap can also show sample relationships based on hierarchical clustering.
Box plots enable visualization of individual genes across the selected groupings. They show both the central value of each group and the spread across the samples within that group.
Violin plots enable visualization of individual genes across the selected groupings. They are very similar to box plots, but with larger sample numbers the width of the violin can represent the distribution of values more accurately. Note that this will not work for groups with fewer than 3 samples.
Dot plots are an excellent tool for visualizing GO/KEGG/MSigDB pathways. They show the significantly altered pathways from enrichment or over-representation analysis. The same results can also be viewed as ridge plots or bar charts. This is not currently available for species other than human and mouse.
Enrichment map plots are a useful tool for visualizing the overlapping gene sets of various networks, making functional outputs easier to view. This is not currently available for species other than human and mouse.
GSEA plots are the classic enrichment plots for a single GSEA term. They show where the genes belonging to that term fall along the ranked gene list. This is not currently available for species other than human and mouse.
Pathview is an incredible tool for visualizing your comparison overlaid on a specific KEGG pathway. This is not currently available for species other than human and mouse.
Example data can be found in the local repository.
Please select the cross comparisons you wish to view.
These are based on the comparator column chosen for the DESeq2 workflow.
To make a comparison available in later analyses, click the 'Add Comparison to Analysis' button.