Utah State University Bioinformatics Facility

How Prokaryote Assembly and Annotation Works

Genome assembly is the process of taking many short, disconnected fragments of sequenced DNA (reads) and putting them back together to reconstruct the original genome. High-quality genome assembly and annotation have become vital tools for improving the biological understanding of all species. The biggest challenge in genome assembly is assembly error, which occurs for a variety of reasons: pieces are frequently discarded incorrectly as mistakes or repeats, while others are joined in the wrong places or orientations. Using long, high-quality reads addresses many of these issues.

Single and paired reads are used to assemble a genome. Single reads are short sequenced fragments that can be joined through overlapping regions to form a continuous sequence known as a 'contig'. Paired reads are roughly the same length as single reads, but they come from opposite ends of the same DNA fragment, so the two reads of a pair are separated by an approximately known distance. Depending on the sequencer used, this distance can range from 200 base pairs to several tens of kilobases. Knowing that both reads of a pair were generated from the same piece of DNA helps to order and orient contigs into 'scaffolds'. Paired-read data can also be used to estimate the size of repetitive regions.
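To make the idea of joining reads through overlapping regions concrete, here is a minimal sketch in Python. It assumes error-free reads and exact suffix-to-prefix matches, which real assemblers do not require. The function names and the `min_overlap` parameter are illustrative and are not part of the facility's pipeline.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Return the length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_reads(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads into one contig through their overlap, or return None."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    # Keep `left` whole and append only the part of `right` past the overlap.
    return left + right[size:]


# Hypothetical reads sharing the 4-base overlap "TACG".
read_a = "ATGCGTACG"
read_b = "TACGGATTC"
contig = merge_reads(read_a, read_b)
print(contig)  # ATGCGTACGGATTC
```

Requiring a minimum overlap length is a simple guard against joining reads that match by chance over only one or two bases.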

The quality of a genome assembly is judged on the basis of:

The number of scaffolds and contigs that represent the genome
The proportion of reads that are assembled
The absolute length of contigs and scaffolds
The length of contigs and scaffolds relative to the size of the genome

The most commonly used metric for evaluating a new genome assembly is N50: the length of the shortest scaffold or contig such that all scaffolds or contigs of that length or longer together represent at least 50% of the assembly.

Our de novo assembly pipeline for prokaryotes consists of two major parts:

Pre-assembly analysis:

Overlap raw subreads for error correction
Preassembly and error correction
Overlap detection among the error-corrected reads
Overlap filtering
Construct the graph from overlaps
Construct contigs from the graph
Construct scaffolds or the chromosome
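The overlap detection, overlap filtering, graph construction and contig construction steps above can be sketched in a few lines of Python. This is a toy illustration under strong assumptions (error-free reads, exact matches, a single unbranched path) and uses a simple greedy walk. It is not the algorithm used by the pipeline's assembler, and every name in it (`build_overlap_graph`, `walk_contig`, `min_overlap`) is hypothetical.

```python
from itertools import permutations


def suffix_prefix_overlap(a: str, b: str) -> int:
    """Longest proper suffix of `a` that is also a prefix of `b` (0 if none)."""
    for size in range(min(len(a), len(b)) - 1, 0, -1):
        if a[-size:] == b[:size]:
            return size
    return 0


def build_overlap_graph(reads, min_overlap):
    """Detect all pairwise overlaps and keep only those of at least `min_overlap` bases."""
    graph = {}
    for a, b in permutations(reads, 2):
        size = suffix_prefix_overlap(a, b)
        if size >= min_overlap:  # overlap filtering step
            graph.setdefault(a, {})[b] = size
    return graph


def find_start(reads, graph):
    """Pick a read that no other read overlaps into (no incoming edge)."""
    targets = {t for edges in graph.values() for t in edges}
    for read in reads:
        if read not in targets:
            return read
    return reads[0]


def walk_contig(graph, start):
    """Follow the largest remaining overlap from each read to spell out a contig."""
    contig, current, visited = start, start, {start}
    while True:
        candidates = {n: s for n, s in graph.get(current, {}).items() if n not in visited}
        if not candidates:
            return contig
        nxt = max(candidates, key=candidates.get)
        contig += nxt[candidates[nxt]:]  # append only the non-overlapping tail
        visited.add(nxt)
        current = nxt


# Hypothetical 8-base reads sampled from the 16-base sequence ATGGCGTGCAATCCGA.
reads = ["CGTGCAAT", "CAATCCGA", "ATGGCGTG"]
graph = build_overlap_graph(reads, min_overlap=3)
contig = walk_contig(graph, find_start(reads, graph))
print(contig)  # ATGGCGTGCAATCCGA
```

In this example the 2-base overlap from "CGTGCAAT" back into "ATGGCGTG" is discarded by the filter, which is what keeps the graph a simple path. The error-correction steps at the top of the list are not modelled here.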

Post-assembly analysis:

Step 1 - Assembly QC assessment
Step 2 - Gene Prediction
Step 3 - Identification of protein-coding regions
Step 4 - Functional annotation
Step 5 - Results as tables, mapping BAM files, and summary statistics
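As an illustration of the kind of summary statistics an assembly QC step can report, the sketch below computes N50 and a few basic length metrics from a list of contig lengths. The function names and the example lengths are made up for illustration and do not describe the facility's actual reporting code.

```python
def n50(lengths):
    """Length of the shortest contig in the smallest set of longest contigs
    that together cover at least half of the total assembly length."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def assembly_summary(lengths):
    """Collect basic contiguity metrics for an assembly."""
    return {
        "num_contigs": len(lengths),
        "total_length": sum(lengths),
        "longest_contig": max(lengths),
        "n50": n50(lengths),
    }


# Hypothetical contig lengths in kilobases.
contig_lengths = [80, 70, 50, 40, 30, 20]
summary = assembly_summary(contig_lengths)
print(summary["n50"])  # 70
```

Here the total length is 290, so the two longest contigs (80 + 70 = 150) are the first to cover at least half of it, and N50 is 70. Fewer, longer contigs push N50 up, which is why it is read together with the contig count and the expected genome size.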
