MaGIC FIRE-Tool

Introduction

Welcome to the Fun Interactive RNA-Seq Expression Tool by the Molecular and Genomics Informatics Core. This tool is intended to be used with data generated by the Core's pipelines so that you can dive deeper into your data! Modules load as you complete the prerequisite steps, starting with uploading your data.

Step 1: Data Upload

Hit Count File

You should receive this raw hit count table directly from the Core in a format that can be uploaded as-is. The table must have a header for every column. The first two columns should be the Ensembl stable ID and the gene symbol, respectively; each subsequent column holds one sample. Each row should be a unique gene, followed by that gene's raw read counts in each sample. The Core delivers this as a tab-delimited file, but opening and re-saving it in another program may change the delimiter, so confirm the separator before uploading.
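The layout described above can be checked before upload. The following is a minimal sketch using only the Python standard library, not the tool's own loader; the function name, column headers, and sample names are hypothetical.

```python
import csv
import io


def read_hit_counts(handle, delimiter="\t"):
    """Parse a raw hit count table into (sample_names, rows).

    Assumes a header row, then one gene per row: Ensembl stable ID,
    gene symbol, and one raw integer count per sample.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    if len(header) < 3:
        raise ValueError("expected ID, symbol, and at least one sample column; "
                         "check the separator")
    sample_names = header[2:]
    rows = {}
    for line_no, fields in enumerate(reader, start=2):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} fields, "
                             f"got {len(fields)}")
        gene_id, symbol = fields[0], fields[1]
        if gene_id in rows:
            raise ValueError(f"line {line_no}: duplicate gene ID {gene_id}")
        rows[gene_id] = (symbol, [int(value) for value in fields[2:]])
    return sample_names, rows


# Hypothetical three-sample table, held in memory for illustration.
example = (
    "ensembl_id\tsymbol\tctrl_1\tctrl_2\ttreated_1\n"
    "ENSG00000141510\tTP53\t120\t98\t310\n"
    "ENSG00000012048\tBRCA1\t45\t51\t12\n"
)
samples, counts = read_hit_counts(io.StringIO(example))
print(samples)
print(counts["ENSG00000141510"])
```

If the file was re-saved as comma-separated, passing `delimiter=","` would be the corresponding change.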

Metadata File

Your experimental metadata is a table containing the non-RNA-seq data for the experiment. It can be customized to include any experimental variables useful for grouping samples. Its primary purpose is to define your sample groups, so at minimum it must contain one column with the sample names (identical to the sample column headers in the hit count table) and one column with each sample's group. With additional columns, the tool can also correlate metadata with the variance observed across samples.
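Because the sample names must match between the two files, a quick consistency check can save a failed upload. This is a sketch under assumed column names (`sample`, `group`); the helper is hypothetical and not part of the tool.

```python
import csv
import io


def check_sample_names(count_samples, metadata_rows, sample_column, group_column):
    """Return {sample: group} after confirming both files name the same samples."""
    groups = {row[sample_column]: row[group_column] for row in metadata_rows}
    missing = sorted(set(count_samples) - set(groups))
    extra = sorted(set(groups) - set(count_samples))
    if missing or extra:
        raise ValueError(f"missing from metadata: {missing}; "
                         f"not in count table: {extra}")
    return groups


# Hypothetical metadata table matching a three-sample count table.
metadata_text = (
    "sample\tgroup\n"
    "ctrl_1\tcontrol\n"
    "ctrl_2\tcontrol\n"
    "treated_1\tdrug_A\n"
)
metadata_rows = list(csv.DictReader(io.StringIO(metadata_text), delimiter="\t"))
sample_groups = check_sample_names(
    ["ctrl_1", "ctrl_2", "treated_1"], metadata_rows, "sample", "group"
)
print(sample_groups)
```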

Step 2: DESeq2 and assign comparisons

DESeq Execution

Once you have imported your data, you must run the DESeq2 workflow. To do so, select the metadata column that contains your comparators. For example, in a drug response study you would have a column defining the drug group of each sample (see the example), and you would select that column's name for DESeq2. Once DESeq2 has run, you can begin using the holistic data visualization tools, such as basic heatmaps and clustering plots.

Comparators

You must also set the comparisons you wish to view, such as Knock Out vs Wild Type control, or Drug A vs No Drug. This is done in the comparisons tab of DESeq2. Define a comparison, then allow the tool to pull out the data and assign significance. For each comparison you wish to carry forward, click the 'Add Comparison to Analysis' button so that it is accessible in other modules. Once you have defined all of your comparisons, you can explore the full set of visualizations.
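To make the direction of a comparison concrete: in "Knock Out vs Wild Type", the first group acts as the numerator and the second as the denominator, so a negative log2 fold change means lower expression in the knock out. The sketch below uses a naive ratio of group means with a pseudocount. It is only an illustration of the sign convention and is not how DESeq2 estimates fold changes or significance; the counts are made up.

```python
import math


def naive_log2_fold_change(numerator_counts, denominator_counts, pseudocount=1.0):
    """Log2 ratio of group means; the pseudocount avoids division by zero."""
    numerator_mean = sum(numerator_counts) / len(numerator_counts)
    denominator_mean = sum(denominator_counts) / len(denominator_counts)
    return math.log2((numerator_mean + pseudocount) / (denominator_mean + pseudocount))


# Hypothetical normalized counts for one gene.
knock_out = [7, 7, 7]
wild_type = [31, 31, 31]
lfc = naive_log2_fold_change(knock_out, wild_type)
print(lfc)  # negative: lower in Knock Out than in Wild Type
```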

Step 3: Visualization

The visualizations can then be explored by switching between the tabs at the top. Each visualization tool has its own customization options.

Clustering Visualization

PCA is principal component analysis, a method of reducing high-dimensional data to 2D or 3D visualizations. The principal components capture the variability across samples, with the largest share captured in PC1, followed by PC2. Points that are closer together have more similar gene expression profiles, based on the normalized counts. This is often a good quality control step: if your experiment was properly designed and controlled, the samples should cluster clearly by group. For more variable samples, such as human-derived ones, you should expect more variability and a less clear-cut divide.
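The idea that samples from the same group land close together on PC1 can be illustrated with a small pure-Python sketch. It finds the first principal component by power iteration on the centered data, which is a simple stand-in chosen for readability and not necessarily the method the tool uses. The expression values are invented.

```python
import math


def first_principal_component(matrix, iterations=200):
    """Return each sample's score on PC1.

    matrix: rows are samples, columns are genes (normalized, log-scale values).
    """
    n_samples, n_genes = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n_samples for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in matrix]

    # Power iteration on X^T X; its leading eigenvector is the PC1 loading vector.
    vector = [float(j + 1) for j in range(n_genes)]
    for _ in range(iterations):
        scores = [sum(x * v for x, v in zip(row, vector)) for row in centered]
        vector = [sum(centered[i][j] * scores[i] for i in range(n_samples))
                  for j in range(n_genes)]
        norm = math.sqrt(sum(v * v for v in vector))
        if norm == 0:
            raise ValueError("data has no variance")
        vector = [v / norm for v in vector]
    return [sum(x * v for x, v in zip(row, vector)) for row in centered]


# Hypothetical values: two control samples, then two treated samples.
expression = [
    [1.0, 5.0, 3.0],
    [1.2, 5.1, 3.1],
    [4.0, 2.0, 3.0],
    [4.1, 2.2, 2.9],
]
pc1_scores = first_principal_component(expression)
print(pc1_scores)
```

The sign of a principal component is arbitrary, so only the relative positions of the samples are meaningful: here the two controls share one sign and the two treated samples share the other.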

A distance matrix is another way of measuring sample-to-sample relatedness. Here the tool computes the Euclidean distance between each pair of samples, showing which are more closely related and which are more distant. As with PCA, this should show the anticipated groupings of your samples.
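As a small illustration of the calculation, the sketch below computes pairwise Euclidean distances between sample profiles. The sample names and values are hypothetical, and real profiles would contain one value per gene.

```python
import math


def distance_matrix(profiles):
    """Pairwise Euclidean distance between samples.

    profiles: {sample_name: [normalized expression values, one per gene]}
    """
    names = list(profiles)
    return {a: {b: math.dist(profiles[a], profiles[b]) for b in names}
            for a in names}


# Hypothetical two-gene profiles.
profiles = {
    "ctrl_1": [0.0, 0.0],
    "ctrl_2": [0.3, 0.4],
    "treated_1": [3.0, 4.0],
}
distances = distance_matrix(profiles)
print(distances["ctrl_1"]["ctrl_2"])
print(distances["ctrl_1"]["treated_1"])
```

The two control samples are much closer to each other than either is to the treated sample, which is the pattern you would hope to see in the tool's distance heatmap.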

The individual vectors used to calculate principal components are called eigenvectors. Eigen correlation analysis correlates those vectors with the metadata columns. For smaller experiments this will most likely be skipped, but for more exploratory studies with extensive metadata it can be very helpful. For example, if you have a large patient study with various clinical features (e.g., death during study, weight loss, secondary infections), eigen correlations help reveal which metadata variables are driving the variability seen in the data. Note that this is not meant to replace your sample groupings: you cannot regroup samples according to which appear more or less similar in the PCA or eigen correlation output. It is meant to expand on your sample groupings and provide extra insight.
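One way to picture an eigen correlation is as a Pearson correlation between each sample's score on a principal component and a numeric metadata column. The sketch below assumes Pearson correlation and uses invented PC1 scores and metadata; the tool may use a different correlation measure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical PC1 scores and two metadata columns for four patients.
pc1 = [-2.0, -1.0, 1.0, 2.0]
metadata = {
    "weight_loss_kg": [1.0, 2.0, 4.0, 5.0],
    "age": [50.0, 40.0, 40.0, 50.0],
}
eigen_correlations = {name: pearson(pc1, values) for name, values in metadata.items()}
print(eigen_correlations)
```

In this toy example weight loss tracks PC1 closely while age does not, suggesting that weight loss is associated with the main axis of variability.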

Comparison Visualizations

Volcano plots are a staple of RNA-seq analysis. They visualize the differences seen in a direct comparison. For example, in a treatment vs control comparison you can see the full spread of genes: whether each is higher or lower in treatment, and whether the change is significant. Each volcano plot shows a single comparison, so it cannot show overlap across comparisons.
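The logic behind a volcano plot can be sketched as follows: each gene is placed at its log2 fold change on the x-axis and the negative log10 of its p-value on the y-axis, then labeled by thresholds. The cutoffs below (an absolute log2 fold change of 1 and an adjusted p-value of 0.05) are common illustrative choices and are assumptions, not the tool's defaults.

```python
import math


def classify_gene(log2_fold_change, adjusted_p, lfc_cutoff=1.0, p_cutoff=0.05):
    """Label one gene for a volcano plot; the cutoffs are illustrative."""
    # A missing adjusted p-value is treated as not significant.
    if adjusted_p is None or adjusted_p >= p_cutoff:
        return "unchanged"
    if log2_fold_change >= lfc_cutoff:
        return "up"
    if log2_fold_change <= -lfc_cutoff:
        return "down"
    return "unchanged"


def volcano_point(log2_fold_change, adjusted_p):
    """x is the log2 fold change; y is -log10 of the adjusted p-value."""
    return log2_fold_change, -math.log10(adjusted_p)


# Hypothetical results for a treatment vs control comparison.
results = {"GENE_A": (2.0, 0.01), "GENE_B": (-1.5, 0.001), "GENE_C": (0.2, 0.8)}
labels = {gene: classify_gene(lfc, p) for gene, (lfc, p) in results.items()}
print(labels)
print(volcano_point(*results["GENE_A"]))
```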

Venn diagrams allow effective visualization across comparisons. For example, to see what DrugA vs Untreated control and DrugB vs Untreated control have in common, you can use a Venn diagram to see which significant genes overlap. Venn diagrams are only effective for five or fewer comparisons.

UpSet plots are an alternative to Venn diagrams that scale beyond five comparisons. For each combination of comparisons, the plot shows what is shared.
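Both Venn diagrams and UpSet plots are built from the same set arithmetic: for every combination of comparisons, find the genes that are significant in exactly those comparisons and no others. A minimal sketch, with hypothetical comparison names and gene lists:

```python
from itertools import combinations


def exclusive_intersections(gene_sets):
    """Map each combination of comparisons to the genes found in exactly that combination."""
    names = list(gene_sets)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            shared = set.intersection(*(gene_sets[name] for name in combo))
            others = set().union(*(gene_sets[name] for name in names
                                   if name not in combo))
            exclusive = shared - others
            if exclusive:
                result[combo] = exclusive
    return result


# Hypothetical significant gene lists from two comparisons.
significant = {
    "DrugA_vs_Untreated": {"TP53", "MYC", "EGFR"},
    "DrugB_vs_Untreated": {"MYC", "EGFR", "KRAS"},
}
overlaps = exclusive_intersections(significant)
for combo, genes in overlaps.items():
    print(combo, sorted(genes))
```

The number of combinations doubles with each added comparison, which is why overlapping circles become unreadable and a bar-per-combination layout scales better.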

Gene Visualizations

Heatmaps are another staple of RNA-seq analysis. They let you see the expression levels of individual genes across all samples. Here they can also show sample relationships based on hierarchical clustering.

Box plots enable visualization of individual genes across the selected groupings. They show both the mean of each group and the variance across samples within the group.

Violin plots enable visualization of individual genes across the selected groupings. They are very similar to box plots, but with higher sample numbers the width of the violin can represent the variance more accurately. Note that this will not work if any group has fewer than three samples.

Pathway Visualizations

Dot plots are an excellent tool for visualizing GO/KEGG/MSigDB pathway results. They show the significant pathway alterations from Enrichment or Over Representation analysis. The same results can also be viewed as ridge plots or bar charts. This is currently available only for human and mouse.
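Over representation is commonly tested with a one-sided hypergeometric test: given how many background genes belong to a pathway, how surprising is it to see at least this many pathway genes among the significant ones? The sketch below shows that calculation. Whether the tool uses exactly this test is an assumption, and the numbers are invented.

```python
from math import comb


def over_representation_p(background, in_pathway, significant, overlap):
    """P(at least `overlap` pathway genes among the significant genes).

    background:  total number of genes considered
    in_pathway:  background genes annotated to the pathway
    significant: number of significant genes
    overlap:     significant genes that are in the pathway
    """
    upper = min(in_pathway, significant)
    total = comb(background, significant)
    favorable = sum(
        comb(in_pathway, k) * comb(background - in_pathway, significant - k)
        for k in range(overlap, upper + 1)
    )
    return favorable / total


# Hypothetical example: 4 background genes, 2 in the pathway,
# 2 significant genes, both of which are in the pathway.
p_value = over_representation_p(background=4, in_pathway=2, significant=2, overlap=2)
print(p_value)
```

In practice the p-values for all tested pathways would then be adjusted for multiple testing before plotting.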

Enrichment map plots are a useful tool for visualizing the overlapping gene sets of various networks, making functional outputs easier to view. This is currently available only for human and mouse.

GSEA plots are the classic enrichment plots for a single GSEA term. They show the balance of that term's genes across the ranked genes of the comparison. This is currently available only for human and mouse.
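The curve in a GSEA plot is a running sum over the ranked gene list: it steps up at each gene that belongs to the term and down at each gene that does not. The sketch below uses the simplified, unweighted form of that running sum; standard GSEA additionally weights each step by the gene's ranking metric, which is omitted here. The gene names are placeholders.

```python
def running_enrichment(ranked_genes, gene_set):
    """Unweighted running sum across a ranked gene list."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(hits) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, total = [], 0.0
    for is_hit in hits:
        # Steps are sized so the running sum returns to zero at the end.
        total += 1.0 / n_hits if is_hit else -1.0 / n_misses
        running.append(total)
    return running


def enrichment_score(running):
    """The enrichment score is the largest deviation from zero."""
    return max(running, key=abs)


# Hypothetical ranking, most up-regulated gene first.
ranked = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
term_genes = {"GENE_A", "GENE_B"}
curve = running_enrichment(ranked, term_genes)
print(curve)
print(enrichment_score(curve))
```

A peak near the start of the list, as here, indicates that the term's genes are concentrated among the most up-regulated genes; a trough near the end indicates the opposite.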

Pathview is an incredible tool for visualizing your comparison overlaid on a specific KEGG pathway. This is currently available only for human and mouse.

Input Set up

Example data can be found in the local repository.
