Assigning strain names was also based on exact matching, since strain names were deemed too short to allow for partial matches. We considered it beneficial, however, to relax this rule in three ways: (i) case-insensitive matching; (ii) equivalence of strain names that differed only by a "T" in the last position (which is often appended to indicate a type strain); and (iii) equivalence of strain names that differed only by characters other than letters, digits and underscores.

Post-processing of the initial ranking

The 1,000 target strains for the main GEBA project were selected from the 8,029 ranked strains as follows. First, for obvious reasons, strains with genome projects registered in GOLD were removed. Second, strains not available in the DSMZ collection were removed.
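The three relaxations of exact strain-name matching, and the first two filtering steps applied to the ranking, can be sketched in Python as follows. This is a minimal sketch under assumptions not stated in the text: the helper names (`strain_key`, `same_strain`, `select_targets`) are hypothetical, the three rules are assumed to be combined into a single comparison key, names that reduce to an empty key are assumed never to match, and the same key is assumed to be used when checking ranked strains against GOLD and the DSMZ catalogue. The `excluded` parameter is a hypothetical hook for further exclusions such as fastidious strains.

```python
import re

# Hypothetical helpers illustrating the matching rules; not the authors' code.

def strain_key(name):
    """Reduce a strain name to a comparison key using the three relaxations."""
    # (iii) ignore every character that is not an ASCII letter, digit or underscore
    key = re.sub(r"[^A-Za-z0-9_]", "", name)
    # (i) case-insensitive matching
    key = key.lower()
    # (ii) ignore a single trailing "T", often appended to mark a type strain
    if key.endswith("t"):
        key = key[:-1]
    return key

def same_strain(a, b):
    """True if two strain names are equivalent under the relaxed rules."""
    ka, kb = strain_key(a), strain_key(b)
    # Assumption: names that reduce to nothing (e.g. "T" or "-") never match
    return bool(ka) and ka == kb

def _keys(names):
    """Set of non-empty comparison keys for a collection of strain names."""
    return {k for k in map(strain_key, names) if k}

def select_targets(ranked_names, gold_names, dsmz_names, n=1000, excluded=()):
    """Walk the ranking in order and keep the first n strains passing all filters."""
    gold, dsmz, skip = _keys(gold_names), _keys(dsmz_names), _keys(excluded)
    selected = []
    for name in ranked_names:          # assumed to be sorted by rank already
        key = strain_key(name)
        if key in gold:                # first filter: genome project registered in GOLD
            continue
        if key not in dsmz:            # second filter: not available from the DSMZ
            continue
        if key in skip:                # hypothetical hook for further exclusions
            continue
        selected.append(name)
        if len(selected) == n:
            break
    return selected
```

For example, `same_strain("DSM 12345T", "dsm-12345")` is true under these rules, whereas `same_strain("Abc_1", "abc 1")` is false because underscores are kept.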

Because not only immediate access to cryopreserved material but also the generation of sufficient cell mass and the subsequent extraction of ultra-pure gDNA were necessary, it was deemed practical to postpone inaccessible strains to later phases of the project [10]. For the same reason, a small number of strains held by the DSMZ but known to be extremely challenging to cultivate ("fastidious") were also disregarded in this phase of the project. This necessary post-processing was considerably eased because the ranking was independent of the subsequent selection of organisms.

Target selection for genome sequencing within the Roseobacter clade

The Roseobacter clade is a major lineage within the Rhodobacteraceae (Alphaproteobacteria) [17,19].

At the time of target selection (spring 2011), the clade included about 95 species [36]. The clade is of interest mainly because of its important role in marine environments, where its members form one of the most abundant and successful groups of non-obligately phototrophic prokaryotes [18,38]. A phylogenomic assessment of the group required a suitable selection of organisms. To this end, a phylogenetic tree comprising a total of 99 species was inferred from 1,366 aligned characters [39,40] of the 16S rRNA gene sequence under the maximum-likelihood criterion [29,41,42]. For rooting, the genus Labrenzia (which belongs to the family Rhodobacteraceae but not to the clade) was included but ignored when calculating the scores. (One of the advantages of these methods is that the ranking of the ingroup scores is independent of the ranking of the outgroup scores.)

Results

Interrelationships of phylogeny-based indexes for target selection

Table 2 shows the correlations between the two measures, bRPD and uRPD, the height of each leaf in the tree, the number of nodes between the root and each leaf, and the residuals of the regression with the latter two factors as the dependent and the independent variable, respectively. Whereas bRPD and uRPD were highly correlated, their correlation with the number of nodes was moderately strong and negative.

