Kastor: a reference-based comparative approach for assessment and correction of gene-fragmenting errors in long-read assemblies of small genomes
Janet S. H. Lorv, Brendan J. McConkey
Published 2025 in BMC Genomics
ABSTRACT
Long-read sequencing technologies provide an efficient approach to generating highly contiguous and informative assemblies. However, their higher relative error rates can introduce frameshifts and premature stop codons that pseudogenize genes, hindering downstream analyses. We developed a software tool that detects gene-fragmenting errors in draft assemblies of small genomes through comparison with a curated set of reference genome sequences and raw read information. In the example presented, the detected errors represent less than 0.05% of the genome, but correcting them reduced the rate of pseudogenes from 23.3% to 5.6% in example long-read assemblies, comparable to the rate of pseudogenes in short-read assemblies. We demonstrate that this software can detect assembly errors in long-read assemblies generated from small genomes and correct them to de-fragment genes.
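As an illustration of the kind of gene-fragmenting error the abstract describes, the following minimal sketch shows how a single inserted base can shift the reading frame and create a premature stop codon. It is not the tool's implementation: the function names `first_stop_index` and `is_truncated`, the toy sequences, and the use of the standard stop codons are all assumptions made for this example.

```python
from typing import Optional

# Assumption: the three standard stop codons; real annotation pipelines
# choose a translation table appropriate to the organism.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_index(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop, or None."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for pos in range(0, usable, 3):
        if cds[pos:pos + 3].upper() in STOP_CODONS:
            return pos // 3
    return None


def is_truncated(draft_cds: str, reference_cds: str) -> bool:
    """Flag a draft gene whose first in-frame stop precedes the reference's."""
    draft_stop = first_stop_index(draft_cds)
    ref_stop = first_stop_index(reference_cds)
    if draft_stop is None or ref_stop is None:
        return False  # nothing to compare in this simplified sketch
    return draft_stop < ref_stop


# Hypothetical toy gene: ATG GCT GAT AAG CTT TAA (stop is the 6th codon).
reference = "ATGGCTGATAAGCTTTAA"
# Same gene with one spurious "A" inserted after the start codon,
# which shifts the frame: ATG AGC TGA ... (stop is now the 3rd codon).
draft = reference[:3] + "A" + reference[3:]

print(first_stop_index(reference))   # codon index of the reference stop
print(first_stop_index(draft))       # codon index of the premature stop
print(is_truncated(draft, reference))
```

A single-base error is a tiny fraction of the sequence yet truncates the whole gene, which is consistent with the abstract's point that errors covering under 0.05% of the genome can account for a large share of pseudogenes.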
PUBLICATION RECORD
- Publication year: 2025
- Venue: BMC Genomics
- Publication date: 2025-04-18
- Fields of study: Biology, Medicine, Computer Science
- Source metadata: Semantic Scholar, PubMed