ABSTRACT

Six subspecies are currently recognized in Salmonella enterica. Subspecies I (subspecies enterica) is responsible for nearly all infections in humans and warm-blooded animals, while five other subspecies are isolated principally from cold-blooded animals. We sequenced 21 phylogenetically diverse strains, including two representatives from each of the previously unsequenced five subspecies and 11 diverse new strains from S. enterica subspecies enterica, to put this species into an evolutionary perspective. The phylogeny of the subspecies was partly obscured by abundant recombination events between lineages and a relatively short period of time within which subspeciation took place. Nevertheless, a variety of different tree-building methods gave congruent evolutionary tree topologies for subspeciation. A total of 285 gene families were identified that were recruited into subspecies enterica, and most of these are of unknown function. At least 2,807 gene families were identified in one or more of the other subspecies that are not found in subspecies I or Salmonella bongori. Among these gene families were 13 new candidate effectors and 7 new candidate fimbrial clusters. A third complete type III secretion system not present in subspecies enterica (I) isolates was found in both strains of subspecies salamae (II). Some gene families had complex taxonomies, such as the type VI secretion systems, which were recruited from four different lineages in five of six subspecies. Analysis of nonsynonymous-to-synonymous substitution rates indicated that the more recently acquired regions in S. enterica are undergoing faster fixation rates than the rest of the genome. Recently acquired AT-rich regions, which often encode virulence functions, are under ongoing selection to maintain their high AT content.

IMPORTANCE

We have sequenced 21 new genomes which encompass the phylogenetic diversity of Salmonella, including strains of the previously unsequenced subspecies arizonae, diarizonae, houtenae, salamae, and indica as well as new diverse strains of subspecies enterica. We have deduced possible evolutionary paths traversed by this very important zoonotic pathogen and identified novel putative virulence factors that are not found in subspecies I. Gene families gained at the time of the evolution of subspecies enterica are of particular interest because they include mechanisms by which this subspecies adapted to warm-blooded hosts.

Received 1 February 2013 Accepted 5 February 2013 Published 5 March 2013
Citation Desai PT, Porwollik S, Long F, Cheng P, Wollam A, Clifton S, Weinstock GM, McClelland M. 2013. Evolutionary genomics of the Salmonella enterica subspecies. mBio 4(2):e00579-12. doi:10.1128/mBio.00579-12.
Editor B. Brett Finlay, The University of British Columbia
Copyright © 2013 Desai et al. This is an open-access article distributed under the terms of the Creative Commons Attribution-Noncommercial-ShareAlike 3.0 Unported license, which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original author and source are credited.
Address correspondence to Michael McClelland, mmcclelland@sdibr.org.

Salmonella spp. cause about 1.3 billion cases of nontyphoidal salmonellosis worldwide each year (1). The economic burden due to salmonellosis in the United States alone is estimated to be ~$2.3 billion annually (2). Salmonella is also a major pathogen of domestic animals causing huge economic losses and providing a source of infection for humans (3). Serotyping-based identification of somatic (O) and flagellar (H) antigens was among the first methods used for taxonomic classification of Salmonella, and each serovar was initially considered a different species.
However, cell surface antigens are sometimes horizontally transferred, a phenomenon that can cause classification of genetically unrelated strains within the same serovar (4, 5). Salmonella taxonomy took a major stride when Falkow and colleagues (6) used DNA hybridization to demonstrate that all tested serovars were related at the species level and identified five distinct subgenera within the species. Salmonella is now considered to consist of two species, Salmonella bongori and Salmonella enterica, and S. enterica is further classified into six subspecies, arizonae (IIIa), diarizonae (IIIb), houtenae (IV), salamae (II), indica (VI), and enterica (I) (7). S. enterica subsp. enterica (I) strains represent the vast majority of Salmonella strains isolated from humans and warm-blooded animals, while all the other subspecies and S. bongori are more typically (though not exclusively) isolated from cold-blooded animals (8). Approximately 50 of the nearly 2,600 known Salmonella serovars account for ~99% of all clinical isolates of Salmonella from humans and domestic mammals (9), and all of these 50 serovars are in subspecies I. Genome sequencing efforts in S. enterica have so far focused on the most prevalent serovars of subspecies enterica (I). We have sequenced the genomes of eleven additional members of subspecies enterica (I), selected based on their diversity, and those of two different serovars from each of the other five known subspecies. We compared these sequences to the whole-genome sequences of S. bongori (10) and to seven previously sequenced subspecies enterica (I) strains. We construct phylogenetic hypotheses for these 29 genomes while taking into account the high rate of recombination among Salmonella strains (11–15).
Because acquisition and loss of genes is a major force driving the evolution of virulence in Salmonella (16), we modeled the gain and loss of gene families at each ancestral node. We reconstructed hypothetical gene contents of the most recent common ancestor (MRCA) of each subspecies. Gene families gained at the subspecies enterica (I) node may provide clues to the strategies and virulence factors that contributed to the formation of a lineage which has evolved to infect principally warm-blooded hosts. We also identified gene families undergoing accelerated evolution based on pairwise synonymous-to-nonsynonymous single nucleotide polymorphism (SNP) ratios. This group of gene families is particularly interesting because some of this selection may be driven by newly acquired life strategies or by interactions of the bacterium with the host.

RESULTS AND DISCUSSION

We sequenced 21 new Salmonella genomes, 10 of which were sequenced to completion while the remaining genomes were sequenced to obtain improved high-quality drafts. The strains and sequencing statistics are summarized in Table S1 in the supplemental material along with information regarding other previously published strains and species to which these new genomes were compared. Among the 21 new genomes, two strains were selected from each of the five previously unsequenced subspecies. In addition, 11 genomes were selected from 305 strains within subspecies enterica (I) that lacked many genes found in S. enterica subsp. enterica serovars Typhimurium LT2 and Typhi CT18 as well as strains representing distant genomovars within a single serovar based on comparative genomic hybridization (17–19) (data accessible at https://dl.dropbox.com/u/99836585/MMCC_all_CGH_100407.xlsx).
The fact that some of the genomes have not been sequenced to completion means that some sequencing errors, misassemblies, annotation errors due to collapsing of duplicate genes, and duplicated annotations of genes that span contig boundaries still exist in a few locations in a few genomes in this data set. The analyses we perform below are designed to mitigate but not eliminate these limitations. The numbers presented are all estimates constrained by these caveats, and some analyses, such as strict tests of orthology and studies of gene duplication, are not possible on our draft genomes. Nevertheless, the obtained high-quality drafts permit a fascinating insight into evolutionary processes during Salmonella subspeciation.
Phylogenetic analysis. We used three different approaches to predict phylogenetic relationships between orthologous “core” regions shared by all subspecies of Salmonella. We first used Mauve (20) to align the 29 Salmonella genomes included in our study and used Escherichia coli K-12 as an outgroup. We identified 737,062 SNPs in the “core” regions of ~2.6 Mb that were present and aligned with high confidence in all taxa. We used these data to construct a bootstrapped maximum likelihood (ML) tree using RAxML (version 7.2.6) (21). Figure 1A shows the cladogram constructed using this approach. The relationship between the subspecies was supported in all 1,000 bootstrap replicates using random samples of 50% of the SNP data.
To estimate the divergence times of each subspecies, codon alignments were constructed for 2,025 genes present across all 30 genomes in single copies, as annotated by automated RAST (22) or annotated in the publicly available genomes. A total of 348,642 synonymous SNPs, which did not change the amino acid sequence, were identified.
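The distinction between synonymous and nonsynonymous differences in a codon alignment can be illustrated with a minimal Python sketch. It is not the pipeline used in this study, whose implementation is not described here. The function name `classify_codon_columns` and the toy sequences are hypothetical, and the sketch makes two simplifying assumptions: it scores whole codon columns rather than individual SNPs, and it skips any column containing a gap or an ambiguous base.

```python
from itertools import product

# Standard genetic code in TCAG order; the bacterial table (NCBI table 11)
# assigns the same amino acids and differs only in permitted start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_columns(aligned_seqs):
    """Count variable codon columns as synonymous or nonsynonymous.

    aligned_seqs: equal-length, in-frame nucleotide strings (hypothetical input).
    Returns (synonymous, nonsynonymous). Columns with gaps or ambiguous
    bases are skipped (an assumption of this sketch).
    """
    length = len(aligned_seqs[0])
    if length % 3 != 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be in frame and of equal length")

    synonymous = nonsynonymous = 0
    for start in range(0, length, 3):
        codons = {s[start:start + 3].upper() for s in aligned_seqs}
        if len(codons) == 1:
            continue  # invariant column, no polymorphism
        if any(c not in CODON_TABLE for c in codons):
            continue  # gap or ambiguous base
        residues = {CODON_TABLE[c] for c in codons}
        if len(residues) == 1:
            synonymous += 1      # codons differ, amino acid unchanged
        else:
            nonsynonymous += 1   # at least one amino acid change
    return synonymous, nonsynonymous


# Toy alignment: column 2 is CTT/CTC (both Leu), column 3 is GGA/GAA (Gly/Glu).
toy = ["ATGCTTGGA", "ATGCTCGGA", "ATGCTTGAA"]
print(classify_codon_columns(toy))  # (1, 1)
```

Restricting a clock analysis to the synonymous class, as described above, reduces the influence of selection on amino acid changes. A production analysis would also need to handle codons carrying more than one substitution, which this sketch counts as a single column.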
Figure 1B shows a condensed and linearized version of the tree built using these SNPs (with 1,000 bootstraps on 50% of the data), calibrated based on a previously estimated 140-million-year divergence time between Salmonella and E. coli (23). The topology of this tree (Fig. 1B) was in agreement with the tree of all “core” SNPs (Fig. 1A). Using synonymous SNPs, it was estimated that subspecies enterica (I) diverged from its last common ancestor ~27 million years ago and that the most recent common ancestor (MRCA) of all analyzed subspecies enterica strains arose ~12 million years ago. This estimate supports the possibility that the subspecies evolved long after their respective preferred hosts. As a distinct alternative strategy to determine phylogeny, we built individual maximum likelihood (ML) trees for each of the 2,025 core genes. DNA sequences fo
