

1 Penn State Astrobiology Research Center and Department of Geosciences, The Pennsylvania State University, University Park, PA, USA.
2 Department of Molecular, Cell, and Developmental Biology, University of California, Los Angeles, Los Angeles, CA, USA.
3 Department of Molecular, Cell, and Developmental Biology, Institute of Genomics and Proteomics, University of California, Los Angeles, Los Angeles, CA, USA.

Initially using 143 genomes, we developed a method for calculating the pairwise distance between prokaryotic genomes using a Monte Carlo method to estimate the conservation of gene order. The method was based on repeatedly selecting five or six non-adjacent random orthologs from each of two genomes and determining whether the chosen orthologs were in the same order. The raw distances were then corrected for gene order convergence using an adaptation of the Jukes-Cantor model, as well as using the common distance correction D′ = −ln(1 − D). First, we compared the distances found via the order of six orthologs to distances found based on ortholog gene content and small subunit rRNA sequences. The Jukes-Cantor gene order distances are reasonably well correlated with the divergence of rRNA (R² = 0.24), especially at rRNA Jukes-Cantor distances of less than 0.2 (R² = 0.52). Gene content is only weakly correlated with rRNA divergence (R² = 0.04) over all distances; however, it is strongly correlated at rRNA Jukes-Cantor distances of less than 0.1 (R² = 0.67). This initial work suggests that gene order may be useful in conjunction with other methods to help understand the relatedness of genomes. Using the gene order distances in 143 genomes, the relations of prokaryotes were studied using neighbor joining and agreement subtrees. We then repeated our study of the relations of prokaryotes using gene order in 172 complete genomes that better represent a wider diversity of prokaryotes. Consistently, our trees show the Actinobacteria as a sister group to the bulk of the Firmicutes. In fact, the robustness of gene order support was found to be considerably greater for uniting these two phyla than for uniting any of the proteobacterial classes together. The results are supportive of the idea that Actinobacteria and Firmicutes are closely related, which in turn implies a single origin for the gram-positive cell.

For the past three decades, the comparisons of ribosomal RNA (rRNA) between microorganisms have largely provided the taxonomic and phylogenetic basis for bacteriology (Woese, 1987). During the past 15 years, however, considerable effort has been placed on comparing the similarity of organisms with genome-wide methods or, at least, with methods that use more than a single gene. These methods include the estimation of genomic distances based on the content of genomes, either orthologs, homologs, folds, or protein domains (Gerstein, 1998; Fitz-Gibbon and House, 1999; Snel et al., 1999; Tekaia et al., 1999; Wolf et al., 2002; Deeds et al., 2005; Yang et al., 2005; House, 2009).
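As a rough illustration of the Monte Carlo gene order comparison described above, the following Python sketch estimates a raw distance D between two genomes and applies the correction D′ = −ln(1 − D). It is a minimal sketch under several assumptions that the text does not specify: each genome is a linear list of ortholog identifiers (circularity and strand are ignored), "non-adjacent" means that no two sampled orthologs are neighbors in either genome, and "same order" means identical relative order in both genomes. The function names, the default trial count, and the retry limit are hypothetical, and the Jukes-Cantor adaptation is not reproduced here because its details are not given in the text.

```python
import math
import random


def positions(genome):
    """Map each ortholog ID to its index in the genome's gene order."""
    return {gene: i for i, gene in enumerate(genome)}


def non_adjacent(indices):
    """True if no two sampled positions are immediate neighbors."""
    s = sorted(indices)
    return all(b - a > 1 for a, b in zip(s, s[1:]))


def same_order(sample, pos_a, pos_b):
    """True if the sampled orthologs have the same relative order in both genomes."""
    by_a = sorted(sample, key=pos_a.__getitem__)
    by_b = sorted(sample, key=pos_b.__getitem__)
    return by_a == by_b


def gene_order_distance(genome_a, genome_b, k=6, trials=10_000, rng=None):
    """Raw distance D = 1 - (fraction of random k-ortholog samples in the same order)."""
    rng = rng or random.Random()
    pos_a, pos_b = positions(genome_a), positions(genome_b)
    shared = sorted(set(pos_a) & set(pos_b))  # orthologs present in both genomes
    if len(shared) < k:
        raise ValueError("not enough shared orthologs to sample")
    conserved = accepted = attempts = 0
    max_attempts = trials * 100  # assumed guard against endless rejection
    while accepted < trials and attempts < max_attempts:
        attempts += 1
        sample = rng.sample(shared, k)
        # Reject samples containing neighbors in either genome (assumed meaning
        # of "non-adjacent").
        if not (non_adjacent([pos_a[g] for g in sample])
                and non_adjacent([pos_b[g] for g in sample])):
            continue
        accepted += 1
        conserved += same_order(sample, pos_a, pos_b)
    if accepted == 0:
        raise ValueError("no non-adjacent sample could be drawn")
    return 1.0 - conserved / accepted


def log_correction(d):
    """Common distance correction D' = -ln(1 - D); saturated distances map to infinity."""
    return math.inf if d >= 1.0 else -math.log(1.0 - d)
```

For example, `gene_order_distance(a, b, k=6, trials=1000, rng=random.Random(0))` returns a value in [0, 1] for two ortholog lists `a` and `b`, and `log_correction` stretches that value so that distances near saturation are no longer compressed toward 1.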

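The correlations reported above are given both over all genome pairs and restricted to pairs whose rRNA Jukes-Cantor distance falls below a cutoff (0.2 for gene order, 0.1 for gene content). The sketch below shows one way such a comparison could be computed. It assumes that R² is the squared Pearson correlation between two lists of pairwise distances, which the text does not state explicitly; the function names are hypothetical and no values from the study are reproduced.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation of two equal-length sequences (assumed meaning of R^2)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return (sxy * sxy) / (sxx * syy)


def r_squared_below(rrna_dist, other_dist, cutoff):
    """R^2 restricted to genome pairs whose rRNA distance is below `cutoff`."""
    pairs = [(x, y) for x, y in zip(rrna_dist, other_dist) if x < cutoff]
    if len(pairs) < 2:
        raise ValueError("fewer than two genome pairs fall below the cutoff")
    xs, ys = zip(*pairs)
    return r_squared(xs, ys)
```

Calling `r_squared(rrna_dist, gene_order_dist)` and `r_squared_below(rrna_dist, gene_order_dist, 0.2)` on the same pairwise distance lists gives the overall and close-range correlations side by side.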