Introduction i-Motifs (iMs) are cytosine-rich, four-stranded DNA structures with emerging roles in gene regulation and genome stability. Despite their biological relevance, genome-wide prediction of iM-forming sequences remains limited by low specificity and high false-positive rates, leading to considerable experimental burden. Method To address this, we developed a refined computational approach that prioritizes high-confidence iM candidates using a Position-Specific Similarity Matrix (PSSM) derived from multiple sequence alignments. The human reference genome (hg38) was scanned using a custom regular expression targeting cytosine-rich motifs, followed by scoring each sequence with the PSSM. Statistical significance was assessed via permutation testing, one-sided t-tests, Benjamini-Hochberg correction, and Z-scores. Results This pipeline identified 37,075 candidate sequences (15–46 nucleotides) with strong iM-forming potential. Validation against experimentally confirmed iMs and known G-quadruplexes (G4s) demonstrated significant differences in alignment scores and sequence similarity, confirming structural specificity. A random forest classifier trained on nucleotide features further supported the distinctiveness of the candidates, achieving a high classification performance. Conclusion This work presents a scalable and statistically robust method to enrich for biologically relevant iM sequences, providing a valuable resource for future experimental validation and the rational design of ligands targeting iMs to modulate gene expression in contexts such as cancer.
Sequence-based prioritization of i-Motif candidates in the human genome
Veronica Remori,M. Prest,Mauro Fasano
Published 2025 in Frontiers in Bioinformatics
ABSTRACT
PUBLICATION RECORD
- Publication year
2025
- Venue
Frontiers in Bioinformatics
- Publication date
2025-08-12
- Fields of study
Biology, Medicine, Computer Science
- Identifiers
- External record
- Source metadata
Semantic Scholar, PubMed
CITATION MAP
EXTRACTION MAP
CLAIMS
- No claims are published for this paper.
CONCEPTS
- No concepts are published for this paper.
REFERENCES
Showing 1-42 of 42 references · Page 1 of 1
CITED BY
- No citing papers are available for this paper.
Showing 0-0 of 0 citing papers · Page 1 of 1