Detection of single nucleotide polymorphisms (SNPs) is an important step in understanding the relationship between genotype and
phenotype. A common workflow in genetic variation studies is the identification and analysis of variants associated with a specific trait or
population. Input data can come from whole-genome or whole-exome sequencing. SNPs are useful because they provide
information about polymorphism within a population, genetic changes influencing common diseases, and
drug efficacy.
A. SNP
SNP-seq data analysis includes, but is not limited to:
Data Management/Quality Control and Trimming
Alignment to Reference Genome
Variant Detection
Variant Filtering
Data Visualization
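As an illustration of the variant filtering step listed above, the following is a minimal Python sketch that drops low-confidence records from VCF-formatted text. It is not the facility's pipeline: the function names and the thresholds (`min_qual=30.0`, `min_depth=10`) are assumptions chosen for the example, and it relies only on the standard VCF column order (QUAL in column 6, INFO in column 8) and the conventional `DP` depth key in the INFO field.

```python
def parse_info(info_field):
    """Turn a VCF INFO string such as 'DP=35;AF=0.5' into a dict."""
    info = {}
    for entry in info_field.split(";"):
        if "=" in entry:
            key, value = entry.split("=", 1)
            info[key] = value
        elif entry and entry != ".":
            info[entry] = True  # flag-style entry with no value
    return info


def passes_filters(vcf_line, min_qual=30.0, min_depth=10):
    """Return True if a VCF data line meets the illustrative thresholds.

    min_qual and min_depth are hypothetical defaults, not recommended values.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    qual, info = fields[5], parse_info(fields[7])
    if qual == ".":  # missing quality score: treat as failing
        return False
    depth = int(info.get("DP", 0))
    return float(qual) >= min_qual and depth >= min_depth


def filter_vcf_lines(lines, **thresholds):
    """Keep header lines unchanged and drop data lines that fail the filters."""
    return [
        line for line in lines
        if line.startswith("#") or passes_filters(line, **thresholds)
    ]
```

In practice, appropriate thresholds depend on the sequencing depth and the variant caller used, so the defaults here should be read as placeholders.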
B. Additional Analysis for GWAS-like Data
Sample QC Task Checking
Discordant sex information
Calculating missingness
Heterozygosity scores
Relatedness
Batch Reports
Remove duplicates
Minor allele frequencies
SNP missingness
Differential missingness
Hardy-Weinberg equilibrium deviations
Basic PLINK association tests, producing Manhattan and Q-Q plots
Cochran-Mantel-Haenszel (CMH) association test: association analysis that accounts for clusters
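Several of the per-SNP checks listed above (SNP missingness, minor allele frequency, and Hardy-Weinberg equilibrium deviation) reduce to simple counts over genotype calls. The sketch below shows one way to compute them for a single SNP. It assumes genotypes are coded as the number of alternate alleles (0, 1, or 2) with `None` for a missing call; the function name `snp_qc` is hypothetical. The Hardy-Weinberg check uses the one-degree-of-freedom chi-square approximation for simplicity, whereas dedicated tools such as PLINK may use an exact test instead.

```python
import math


def snp_qc(genotypes):
    """Compute basic QC statistics for one SNP.

    genotypes: list of 0, 1, 2 (alternate allele count) or None (missing call).
    """
    if not genotypes:
        raise ValueError("genotypes must not be empty")

    called = [g for g in genotypes if g is not None]
    n = len(called)
    missing_rate = 1 - n / len(genotypes)
    if n == 0:
        return {"missing_rate": missing_rate, "maf": None,
                "hwe_chi2": None, "hwe_p": None}

    # Observed genotype counts: homozygous reference, heterozygous, homozygous alternate
    observed = [called.count(0), called.count(1), called.count(2)]

    # Alternate allele frequency, then fold to the minor allele frequency
    p_alt = (2 * observed[2] + observed[1]) / (2 * n)
    maf = min(p_alt, 1 - p_alt)

    # Expected genotype counts under Hardy-Weinberg equilibrium
    q_ref = 1 - p_alt
    expected = [q_ref ** 2 * n, 2 * p_alt * q_ref * n, p_alt ** 2 * n]

    # Chi-square statistic, skipping classes with zero expected count
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

    # Upper-tail p-value of a chi-square distribution with 1 degree of freedom
    hwe_p = math.erfc(math.sqrt(chi2 / 2))

    return {"missing_rate": missing_rate, "maf": maf,
            "hwe_chi2": chi2, "hwe_p": hwe_p}
```

A SNP would typically be flagged for removal when its missing rate is too high, its minor allele frequency is too low, or its Hardy-Weinberg p-value is very small; the cutoffs themselves are study-specific and are not fixed by this sketch.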
At USU’s high-performance computing and bioinformatics facility, we have developed a series of parallel, multicore, CPU-based,
open-source pipelines for large-scale -omics data analysis, which enable efficient, parallel analysis of multiple datasets
in a short period of time.