Calling structural variants (SVs), i.e., genomic alterations of ≥50 bp, from whole-genome short-read data remains challenging, as existing callers are known to lack accuracy and robustness. Therefore, meta-caller approaches, which combine the results of multiple standalone tools into a consensus set of reported SV calls, are widely used. Here, SV-MeCa (Structural Variant Meta-Caller) is presented, the first SV meta-caller incorporating variant-specific quality metrics from individual VCF outputs, rather than relying solely on the number and combination of tools supporting consensus SV calls. In addition, SV-MeCa offers a score for ranking the obtained consensus SV calls according to the evidence that they represent true positive calls, i.e., real-world variants. SV-MeCa applies seven standalone SV callers and merges the resulting deletion and insertion calls into a union VCF file using SURVIVOR. For each entry in the SURVIVOR-generated consensus, caller-specific quality measures are extracted from the corresponding standalone VCF files and serve as input for a deletion- or insertion-specific XGBoost decision tree classifier, which was previously trained on the HG002 SV benchmark data provided by the Genome in a Bottle consortium. The SV-MeCa XGBoost models assign each consensus SV call a probability of representing a true positive call, which can be used to rank the final output according to evidence. The performance of SV-MeCa and four previously published meta-caller approaches was evaluated based on autosomal SV calls in samples curated by the Human Genome Structural Variation Consortium, Phase 2.
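The feature-extraction and ranking steps described above can be pictured with a minimal sketch. It is not the SV-MeCa implementation: the caller names (`caller_a` etc.), the metric names (`qual`, `support_reads`), and both helper functions are hypothetical, since the abstract names neither the seven callers nor the extracted metrics. The sketch assumes that a caller which did not report a consensus call contributes missing values, which gradient-boosted tree learners such as XGBoost can handle as NaN.

```python
import math

# Hypothetical caller and metric names; the abstract does not list them.
CALLERS = ["caller_a", "caller_b", "caller_c"]
METRICS = ["qual", "support_reads"]


def feature_row(per_caller_metrics):
    """Flatten the per-caller metrics of one consensus call into a fixed-length row.

    Metrics that a caller did not report (or callers that did not report
    the call at all) are filled with NaN to mark them as missing.
    """
    row = []
    for caller in CALLERS:
        metrics = per_caller_metrics.get(caller, {})
        for metric in METRICS:
            row.append(float(metrics.get(metric, math.nan)))
    return row


def rank_by_probability(calls, probabilities):
    """Return the calls sorted by descending predicted true-positive probability."""
    paired = sorted(zip(probabilities, calls), key=lambda pair: pair[0], reverse=True)
    return [call for _, call in paired]


# Toy input: one consensus call reported by two of the three hypothetical callers.
example_row = feature_row({
    "caller_a": {"qual": 60, "support_reads": 12},
    "caller_c": {"qual": 35},
})

# Toy probabilities standing in for classifier output.
ranked = rank_by_probability(["sv1", "sv2", "sv3"], [0.2, 0.9, 0.5])
```

In a real pipeline, rows like `example_row` would be stacked into a matrix and passed to the trained deletion- or insertion-specific model, and the probabilities given to `rank_by_probability` would come from that model rather than from toy values.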
With regard to F1 scores, which were 0.58 on average for deletions and 0.42 on average for insertions, SV-MeCa outperformed the other meta-callers. With regard to precision, only ConsensuSV achieved higher values (0.97 versus 0.64 on average for deletions, 0.75 versus 0.53 on average for insertions), and with regard to recall, SV-MeCa was outperformed exclusively by Meta-SV for deletions (0.55 versus 0.53). SV-MeCa, publicly available at https://github.com/ccfboc-bioinformatics/SV-MeCa, outperforms existing SV meta-caller approaches by taking variant-specific quality measures into account. Moreover, due to the XGBoost prediction probabilities serving as scores, the output of SV-MeCa can be continuously adjusted to user needs in terms of sensitivity and precision.
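As a rough illustration of how the reported metrics relate and how a probability score lets users trade precision against sensitivity, consider the sketch below. The functions and the toy calls are invented for illustration and do not come from the paper. Note also that the F1 score of averaged precision and recall need not equal the average of per-sample F1 scores; for the deletion values quoted above (precision 0.64, recall 0.53) the two happen to agree after rounding.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_at(threshold, scored_calls, n_truth):
    """Precision and recall when keeping only calls with probability >= threshold.

    scored_calls: list of (probability, is_true_positive) pairs.
    n_truth: total number of variants in the truth set.
    """
    kept = [is_tp for prob, is_tp in scored_calls if prob >= threshold]
    true_positives = sum(kept)
    precision = true_positives / len(kept) if kept else 0.0
    recall = true_positives / n_truth if n_truth else 0.0
    return precision, recall


# Invented toy data: (predicted probability, whether the call matches the truth set).
TOY_CALLS = [(0.95, True), (0.80, True), (0.60, False), (0.40, True), (0.10, False)]
TOY_TRUTH_SIZE = 4  # invented number of true variants

# Average deletion precision and recall reported for SV-MeCa in the abstract.
deletion_f1 = f1_score(0.64, 0.53)

for threshold in (0.3, 0.5, 0.9):
    precision, recall = precision_recall_at(threshold, TOY_CALLS, TOY_TRUTH_SIZE)
    print(f"threshold={threshold}: precision={precision:.2f}, recall={recall:.2f}")
```

On the toy data, raising the threshold from 0.3 to 0.9 removes the false positive at the cost of dropping true calls, which is the kind of continuous adjustment the probability scores make possible.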
SV-MeCa: an XGBoost-based meta-caller approach for structural variant calling from short-read data
Rudel Christian Nkouamedjo-Fankep,Arda Söylev,Anna-Lena Kobiela,Jochen Blom,Corinna Ernst,Susanne Motameny
Published 2025 in BMC Bioinformatics
PUBLICATION RECORD
- Publication year
2025
- Venue
BMC Bioinformatics
- Publication date
2025-08-20
- Fields of study
Medicine, Computer Science
- Source metadata
Semantic Scholar, PubMed