Utah State University Bioinformatics Facility

How VAR-seq data analysis works

Detecting single nucleotide polymorphisms (SNPs) is an important step in understanding the relationship between genotype and phenotype. A typical workflow in genetic variation studies identifies and analyzes variants associated with a specific trait or population. Input data can come from whole-genome or whole-exome sequencing. SNPs are useful because they provide information about polymorphism within a population, genetic changes that influence common diseases, and drug efficacy.

A. SNP

SNP-seq data analysis includes, but is not limited to:

Data Management/Quality Control and Trimming
Alignment to Reference Genome
Variant Detection
Variant Filtering
Data Visualization
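
As an illustration of the variant filtering step, the sketch below keeps only the VCF records that pass a quality and a depth cutoff. It is a minimal example, not the facility's pipeline: the function names `parse_info` and `passes_filter`, the thresholds (QUAL of at least 30, DP of at least 10), and the three sample records are all assumptions made for demonstration. It relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO).

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=35;AF=0.5' into a dict."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def passes_filter(vcf_line, min_qual=30.0, min_depth=10):
    """Return True if a VCF data line meets both illustrative thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual = fields[5]                          # column 6 of a VCF record is QUAL
    depth = parse_info(fields[7]).get("DP")   # column 8 is INFO
    if qual == "." or depth is None:
        # Missing quality or depth: treat the record as failing the filter.
        return False
    return float(qual) >= min_qual and int(depth) >= min_depth


# Hypothetical records, made up for this example.
records = [
    "chr1\t1000\t.\tA\tG\t50.0\t.\tDP=35",
    "chr1\t2000\t.\tC\tT\t12.5\t.\tDP=40",   # fails on QUAL
    "chr2\t3000\t.\tG\tA\t60.0\t.\tDP=4",    # fails on depth
]
kept = [r for r in records if passes_filter(r)]
print(len(kept), "of", len(records), "records kept")
```

In practice, suitable thresholds depend on the sequencing depth and the variant caller used, so the defaults here should be treated as placeholders.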

**Figure #1:** An illustration of a Manhattan plot depicting several strongly associated risk loci. Credit: M. Kamran Ikram et al., Creative Commons Attribution 2.5 Generic License

B. Additional Analysis for GWAS-like Data

Sample QC Task Checking

Discordant sex information
Calculating missingness
Heterozygosity scores
Relatedness
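
Two of the sample-level checks above, missingness and heterozygosity, can be sketched in a few lines. This is a simplified illustration under stated assumptions: genotypes are coded as the count of alternate alleles (0, 1, or 2) with `None` for a missing call, and the function names and sample data are hypothetical. Dedicated tools typically report an inbreeding coefficient rather than the raw heterozygosity rate computed here.

```python
def sample_missingness(genotypes):
    """Fraction of a sample's genotype calls that are missing (None)."""
    return sum(g is None for g in genotypes) / len(genotypes)


def heterozygosity_rate(genotypes):
    """Fraction of non-missing calls that are heterozygous (coded 1)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return float("nan")
    return sum(g == 1 for g in called) / len(called)


# Hypothetical genotypes for one sample across eight SNPs.
sample = [0, 1, 2, None, 1, 0, 1, None]
print("missingness:", sample_missingness(sample))      # 2 of 8 calls missing
print("heterozygosity:", heterozygosity_rate(sample))  # 3 of 6 called sites
```

Samples whose missingness or heterozygosity is far from the rest of the cohort are usual candidates for exclusion; the cutoff is study-specific.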

Batch Reports

Remove duplicates
Minor allele frequencies
SNP missingness
Differential missingness
Hardy-Weinberg equilibrium deviations
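
For the SNP-level checks, minor allele frequency and Hardy-Weinberg deviation can both be computed from the three genotype counts at a site. The sketch below uses a chi-square goodness-of-fit statistic as an approximation; association toolkits often use an exact test instead, so treat this as illustrative only. The function names and the genotype counts are assumptions.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele, given counts of AA, AB, and BB genotypes."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_a = (2 * n_aa + n_ab) / total_alleles
    return min(freq_a, 1 - freq_a)


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic comparing observed genotype counts with
    Hardy-Weinberg expectations (p^2, 2pq, q^2)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expected count (monomorphic sites).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical counts: the first site matches HWE, the second lacks heterozygotes.
print(minor_allele_frequency(10, 20, 70))
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(40, 20, 40))
```

The statistic has one degree of freedom, so values above roughly 3.84 correspond to p < 0.05; GWAS quality control usually applies a much stricter threshold.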

Basic PLINK association tests, producing Manhattan and Q-Q plots

CMH (Cochran-Mantel-Haenszel) association test: association analysis that accounts for clusters
Permutation testing
Logistic regression
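
To show how a stratified test accounts for clusters, the sketch below computes the Cochran-Mantel-Haenszel statistic over a set of 2x2 tables, one per cluster. It is a plain-Python illustration, not PLINK's implementation: the function name `cmh_statistic`, the table layout, and the counts are assumptions, and no continuity correction is applied. Each table is given as `(a, b, c, d)`, where `a` and `b` are counts of the two alleles in cases and `c` and `d` are the counts in controls.

```python
def cmh_statistic(strata):
    """Cochran-Mantel-Haenszel chi-square (1 df, no continuity correction)
    for a list of 2x2 tables given as (a, b, c, d) tuples."""
    sum_diff = 0.0
    sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        expected_a = row1 * col1 / n
        variance_a = row1 * row2 * col1 * col2 / (n * n * (n - 1))
        sum_diff += a - expected_a
        sum_var += variance_a
    if sum_var == 0:
        return float("nan")
    return sum_diff ** 2 / sum_var


# Hypothetical allele counts in two clusters with the same direction of effect.
clusters = [(10, 20, 5, 25), (10, 20, 5, 25)]
stat = cmh_statistic(clusters)
print(round(stat, 3))
```

Because observed and expected counts are compared within each cluster before being summed, differences in allele frequency between clusters do not by themselves inflate the statistic.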
