The clusters, based on 16S rRNA identities and compared with genome metrics: in silico DNA hybridization (isDDH), Mash similarity, average nucleotide identity (ANI) and conserved marker genes (CGS)
A: Methods
Although relatively few genome sequences are available for cyanobacteria, and many draft genomes do not contain 16S rRNA genes, we can correlate data derived from four methods of genome analysis with 16S rRNA identities within 16S-based clusters that contain more than one genome sequence. We briefly describe some of the results below.
The following 16S rRNA identity cutoff values were employed to define genera and species: members of the same species show greater than 98.70-99.0% 16S rRNA sequence identity; members of a single genus share greater than 96.50-96.90% 16S rRNA sequence identity. The genus value is higher than that (94.5%) proposed by Yarza et al. (2014) from a detailed study of many bacterial phyla; however, that study seemingly included few cyanobacteria, since it was based only on the type strains of bacteria and archaea validly published under the rules of the ICNP. Inter-specific values lie between the lower limit of the intra-specific range (98.70%) and the upper limit of the genus cutoff (96.90%).
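As a sketch only, the cutoffs above can be written as a small Python function that assigns a pairwise 16S rRNA identity to a rank. The function name and the treatment of the 98.70-99.0% and 96.50-96.90% bands as "borderline" zones are assumptions made for illustration; the example values are pairwise identities quoted later on this page.

```python
# Hypothetical helper: rank assignment from pairwise 16S rRNA identity (%),
# using the species (98.70-99.0 %) and genus (96.50-96.90 %) bands above.
def classify_16s(identity_pct: float) -> str:
    if not 0.0 <= identity_pct <= 100.0:
        raise ValueError("identity must be a percentage between 0 and 100")
    if identity_pct > 99.0:
        return "same species"
    if identity_pct > 98.70:
        # inside the 98.70-99.0 % band: treated here as borderline
        return "borderline: same species"
    if identity_pct > 96.90:
        return "same genus, different species"
    if identity_pct > 96.50:
        # inside the 96.50-96.90 % band: treated here as borderline
        return "borderline: same genus"
    return "different genera"


# Pairwise identities quoted elsewhere in this document
for pct in (99.66, 98.78, 97.31, 92.54):
    print(pct, classify_16s(pct))
```

The bands are kept as explicit "borderline" outcomes rather than being collapsed onto a single threshold, mirroring the way borderline cases are treated in the text.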
For genome sequences, isDDH values were calculated with the DSMZ GGDC 2.1 server. GGDC uses three formulae for the calculation of DDH, of which we employ formula 2 (sum of identities in HSPs/total HSP length) for incomplete (draft) genomes and formula 3 (sum of identities in HSPs/total genome length) for completed chromosomes; the latter is equivalent to a wet-lab hybridization, except that plasmid sequences that are not integrated into the chromosome are absent. Unless otherwise stated, the values calculated by formula 2 are given, since complete chromosomal sequences, which formula 3 requires, make up only about 10% of the total. However, formula 2 may, especially around the cutoff values, give results that are higher than the true value, since only the identities within aligned regions (HSPs) are considered, with no correction for genome size. The accepted cutoff value for species distinction by wet-lab DDH is an arbitrary 70% (Wayne et al., 1987), and this value is also employed by the isDDH method. However, the 70% value was derived from 16S rRNA-based phylogenies of the type species of a broad range of bacterial phyla, with the Cyanobacteria largely neglected; there is no proof that this is a good discriminatory value in individual phyla.
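The difference between the two GGDC formulae can be illustrated with a minimal Python sketch that computes only the raw identity ratios from a hypothetical list of HSPs. It is not the GGDC implementation: GGDC works with distances derived from such ratios and converts them into DDH estimates with a fitted model, which is not attempted here, and this sketch simplifies "total genome length" to a single length supplied by the caller. The `HSP` class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    length: int      # aligned length of one high-scoring segment pair (bp)
    identities: int  # identical positions within that HSP


def formula2_ratio(hsps: list[HSP]) -> float:
    """Identities / total HSP length: regions without HSPs are ignored."""
    return sum(h.identities for h in hsps) / sum(h.length for h in hsps)


def formula3_ratio(hsps: list[HSP], genome_length: int) -> float:
    """Identities / total genome length: unmatched regions lower the value."""
    return sum(h.identities for h in hsps) / genome_length


# Toy case: only 800 kbp of a 2.5 Mbp genome aligns, but at 98 % identity
hsps = [HSP(400_000, 392_000), HSP(400_000, 392_000)]
print(formula2_ratio(hsps))             # high: unaligned DNA is ignored
print(formula3_ratio(hsps, 2_500_000))  # much lower: unaligned DNA counts
```

The toy numbers show why formula 2 can be artifactually high: the 1.7 Mbp that found no HSP never enters its denominator.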
Data from wet-lab DDH studies are included where available; if thermal elution of the hybridization products from hydroxyapatite was employed, two values are required: % relative binding and the shift in the mean elution temperature, ΔTm(e), where a shift of 1-2.2 °C in Tm(e) represents a 1% base mismatch in the hybrid (see Stackebrandt & Goebel, 1994).
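As a worked illustration of the 1-2.2 °C per 1% rule, the following hypothetical Python helper converts an observed ΔTm(e) into the range of base mismatch it implies; the function name and default parameters are assumptions for this sketch.

```python
def mismatch_range(delta_tm_c: float,
                   deg_per_pct_low: float = 1.0,
                   deg_per_pct_high: float = 2.2) -> tuple[float, float]:
    """Estimated % base mismatch implied by a ΔTm(e) shift (°C), assuming
    each 1 % mismatch lowers Tm(e) by 1.0-2.2 °C."""
    if delta_tm_c < 0:
        raise ValueError("ΔTm(e) must be non-negative")
    # a larger shift per 1 % mismatch implies fewer mismatches, and vice versa
    return delta_tm_c / deg_per_pct_high, delta_tm_c / deg_per_pct_low


low, high = mismatch_range(4.4)
print(f"ΔTm(e) = 4.4 °C -> about {low:.1f}-{high:.1f} % mismatch")
```

Because the conversion factor itself spans a range, a single ΔTm(e) measurement maps to a band of mismatch values rather than one number.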
Where appropriate, genomic sequences were further checked for completeness and contamination with CheckM (Parks et al., 2015). The JSpecies web server was employed for ANI (Average Nucleotide Identity) values; the inconvenience of uploading large genomes to the server can be overcome by copying the genomes of interest to a local folder, zipping them into a single file from within that folder with the command "zip -r cyano.zip ." (note that the space and dot in the expression are mandatory; the name "cyano" is only an example), and then uploading the zipfile to the server. Compression reduces the size of the input genomes by about 70%, so they upload more rapidly. Other available ANI calculation methods include fastANI, which uses the alignment-free MashMap in place of BLAST (Jain et al., 2018), and OAU, which employs USEARCH in place of BLAST. Since both identity values and the length of homologous fragments differ slightly between methods (see also Palmer et al., 2020), it is wise to use the same method throughout. The accepted cutoff value for species demarcation with ANI is around 96% (Richter and Rosselló-Móra, 2009), calibrated largely against the 70% DDH value. However, the ANI method evaluates only the similarity of regions shared between two genomes, yields no information on regions that are not shared, and is not corrected for total genome size. Consequently, the results, like those of formula 2 of the GGDC for isDDH, may be artifactually high.
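Why ANI can be artifactually high is easy to see in a simplified model of the BLAST-based (ANIb) approach offered by JSpecies: the query genome is cut into fragments, each fragment is searched against the reference, and ANI is the mean identity of those fragments whose hits pass identity and alignment-length filters. The Python sketch below assumes 1020-bp fragments and filters of >30% identity over at least 70% of the fragment, commonly cited ANIb settings that are treated here as assumptions; it skips the BLAST search itself (hits are supplied as tuples) and is not the JSpecies code.

```python
FRAGMENT_LEN = 1020  # ANIb-style fragment size (assumed for this sketch)


def fragments(seq: str, size: int = FRAGMENT_LEN) -> list[str]:
    """Cut a query genome into consecutive fragments; the trailing partial
    fragment is dropped in this sketch."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def ani_from_hits(hits, min_identity=30.0, min_aligned_fraction=0.70):
    """Mean identity over fragments whose hit passes both filters.

    hits: one (percent_identity, aligned_fraction) tuple per query fragment,
          or None if the fragment found no hit in the reference genome.
    Returns (ani, fraction_of_fragments_used); ani is None if nothing passes.
    """
    kept = []
    for hit in hits:
        if hit is None:
            continue  # unshared region: silently excluded from the average
        identity, aligned_fraction = hit
        if identity > min_identity and aligned_fraction >= min_aligned_fraction:
            kept.append(identity)
    if not kept:
        return None, 0.0
    return sum(kept) / len(kept), len(kept) / len(hits)


# Three of five fragments have no usable hit, yet ANI stays high.
hits = [(98.0, 1.0), (97.0, 0.95), None, None, (40.0, 0.5)]
ani, used = ani_from_hits(hits)
print(f"ANI = {ani:.1f} % over {used:.0%} of fragments")
```

Reporting the fraction of fragments used alongside the ANI value (as JSpecies does with its alignment percentage) is what exposes the lack of correction for unshared regions.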
Neither DDH nor ANI is normally used to distinguish taxa above the species level, both being usually employed only to decide whether two taxa can be assigned to the same species. We show below that both appear to discriminate efficiently at the inter-specific level in most comparisons, although they occasionally give conflicting results for the reasons described above. Further studies involved the use of (a) the alignment-free Mash (Ondov et al., 2016) and (b) our set of 79 conserved marker genes (CGS), which clearly does not measure the similarity of entire genomes. Both of these are described on the Main page of this site. The Mash measurements are output as distance matrices by default, and are presented here as uncorrected similarity values. For the CGS estimates, we also employ uncorrected values throughout. These were obtained by building a tree without applying a correction, then converting the finished genomic tree to a distance matrix with the cophenetic.phylo function of the R package ape. For both types of matrix, a single custom script was then employed to extract the distance value for each pairwise combination of taxa and convert it to a similarity value. We have attempted to correlate the results obtained by these different methods in the following figure, where pairwise comparisons of taxa assigned to the same species on the basis of 16S rRNA sequence identity are shown as blue symbols, those assigned to different species of a single genus as red symbols, and those of separate genera as green symbols. Draft genomes and metagenomes have been excluded from these calculations, and the same dataset was employed for all methods. We emphasize that the 16S rRNA gene sequences are those extracted (by our custom shell script) from the genomes of interest, and not those obtained by normal PCR amplification, to avoid errors. Overall, the results confirm the value of our 16S rRNA-based cutoff values, since the three categories of taxa are clearly separated.
However, the isDDH values calculated with formula 3 (top left) fail to discriminate between taxa of different genera. Using our set of 79 conserved marker proteins (CGS, bottom right) greatly increases the sensitivity for taxa of different genera, but compresses the range of intra-specific values. The results obtained from analysis of genome similarity with Mash (top right) and ANI (bottom left) appear to avoid the biases of the other methods and correlate reasonably well with the 16S rRNA identity estimates. We stress that only complete chromosomes should be used for these analyses; draft genomes may by definition contain duplicated regions or lack segments, thus obscuring the results. This is easily demonstrated by artificially duplicating or deleting a segment of a draft genome: if the isDDH value obtained with the original genome falls just above or below the species or genus cutoff value, the value obtained with the artificial construct places it on the wrong side of that cutoff. The minimum and maximum percentage similarity values obtained from our dataset with each method are summarized in the following table, together with a minimum value obtained by comparing organisms from the top and bottom of the tree.
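The custom script that extracts pairwise values from the Mash and cophenetic distance matrices is not described in detail. A minimal Python sketch of the same idea follows, assuming a square, symmetric matrix held as a list of rows in the same order as the taxon names, and taking similarity as 1 minus distance, expressed as a percentage. The function name, strain names and distances are hypothetical.

```python
import itertools


def pairwise_similarities(names, dist, percent=True):
    """Yield (taxon_a, taxon_b, similarity) for every unordered pair of taxa.

    dist: square, symmetric distance matrix (list of rows) whose order matches
    `names`, e.g. Mash distances or a cophenetic matrix exported from R.
    Similarity is taken as 1 - distance (optionally scaled to %).
    """
    n = len(names)
    if len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError("distance matrix must be square and match names")
    scale = 100.0 if percent else 1.0
    for i, j in itertools.combinations(range(n), 2):
        yield names[i], names[j], scale * (1.0 - dist[i][j])


names = ["strain_A", "strain_B", "strain_C"]  # hypothetical taxa
dist = [[0.00, 0.05, 0.20],
        [0.05, 0.00, 0.25],
        [0.20, 0.25, 0.00]]
for a, b, sim in pairwise_similarities(names, dist):
    print(f"{a}\t{b}\t{sim:.2f}")
```

Iterating over the upper triangle only (`itertools.combinations`) gives each pair once, which is what a scatter plot of pairwise comparisons needs.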
The gap in the region of species separation is caused by the
paucity of available complete chromosome sequences; results which fall into
this region are treated as borderline cases throughout this document. In
addition to isDDH with formula 3 (f3), Mash-similarity, ANI and CGS, the table
contains the isDDH ranges obtained with the same dataset using formula 2 (f2). Notice
that the isDDH results obtained with the two formulae are not identical, and
that there is a marked overlap of the formula 2 values around the level of
generic separation, indicated in red in the table. This is clearly shown in the
following figure (left panel).
We would have expected similar results for isDDH values calculated with the two formulae; their plot in the figure (right panel) shows that this is not the case. This is the result of the different corrections applied in each formula. That formula 2 results are not reliable at low values is shown by the observed minimum (see above) of 26.1%, which falls into the overlap region. For these reasons, we give preference to the formula 3 results (calculated as total identities divided by genome size) for complete chromosomes throughout this document, since the formula 2 (identities divided by HSP length) estimates are only approximate. We are forced to use the latter for draft genomes, but results obtained by all other methods should take priority.
We report Mash similarity data only for complete chromosomes, since this method compares k-mers and is unlikely to be valid for draft genomes, in which k-mers spanning the junctions between contigs are lost. In contrast, CGS results are valid for all genomes.
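The effect of contig breaks on a k-mer method can be shown with a small Python sketch. It uses exact k-mer sets rather than the MinHash sketches that Mash actually compares, together with the Mash distance formula of Ondov et al. (2016), D = -(1/k) ln(2J/(1 + J)), where J is the Jaccard index of the two k-mer sets. The random test genome and the 200-bp contig size are arbitrary choices for illustration; k = 21 is Mash's default k-mer size.

```python
import math
import random


def kmer_set(seqs, k):
    """All k-mers found within the given sequences (contigs are not joined)."""
    return {s[i:i + k] for s in seqs for i in range(len(s) - k + 1)}


def jaccard(a, b):
    return len(a & b) / len(a | b)


def mash_distance(j, k):
    """Mash distance D = -(1/k) ln(2J / (1 + J)); D = 1 if nothing is shared."""
    if j == 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k


K = 21
rng = random.Random(42)
genome = "".join(rng.choice("ACGT") for _ in range(2000))
# The same sequence as a "draft": ten 200-bp contigs, nothing else changed
contigs = [genome[i:i + 200] for i in range(0, len(genome), 200)]

complete = kmer_set([genome], K)
draft = kmer_set(contigs, K)
d = mash_distance(jaccard(complete, draft), K)
print(f"k-mers lost: {len(complete) - len(draft)}, "
      f"similarity: {100 * (1 - d):.2f} %")
```

Although the draft contains exactly the same bases, each break removes the k - 1 k-mers that would span it, so an identical genome no longer scores 100% similarity.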
For many taxa, we calculated the number of repeat sequences in the genome using RepSeek with a seed length of 25.
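RepSeek detects repeats starting from exact seed matches, which it then extends. The Python sketch below reproduces only a crude form of that first step: it lists the 25-bp seeds (counting both strands) that occur more than once in a sequence. It illustrates what a seed length of 25 means and is not a substitute for RepSeek; the function names and the test sequence are hypothetical.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def repeated_seeds(seq: str, seed_len: int = 25) -> set[str]:
    """Canonical seed_len-mers occurring more than once on either strand.

    A crude proxy only: RepSeek extends seeds into full repeats, which this
    sketch does not attempt.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - seed_len + 1):
        kmer = seq[i:i + seed_len]
        if set(kmer) <= set("ACGT"):  # skip seeds containing N or other codes
            counts[min(kmer, revcomp(kmer))] += 1
    return {kmer for kmer, n in counts.items() if n > 1}


rng = random.Random(0)


def rand_seq(n: int) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


# A 30-bp element present once directly and once inverted
repeat = rand_seq(30)
genome = rand_seq(500) + repeat + rand_seq(500) + revcomp(repeat) + rand_seq(500)
seeds = repeated_seeds(genome)
print(len(seeds), "repeated 25-bp seeds")
```

Storing each seed in canonical form (the lexically smaller of the seed and its reverse complement) lets direct and inverted copies be counted together.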
Members of some genera (as defined by 16S rRNA sequence
identity) do not fit into the above scheme. These, described below in section B1, include the following:
Picocyanobacteria: Prochlorococcus and the Synechococcus OMF sister group.
Members of the genus Planktothrix.
Sphaerospermopsis spp.
The extremophiles: Euhalothece natronophila strain Z-M001 and Halothece sp. PCC 7418 of genus 5.2.4.3, and possibly Dactylococcopsis salina PCC 8305 (genus 5.2.4.1), all halophiles; and the thermophiles Thermosynechococcus vestitus strains BP-1 and PKUAC-SCTE542, T. vulcanus strain NIES-2134 and Thermosynechococcus sp. strains CL-1, NK55a and TA-1 (genus 8.1.1), all isolated from thermal springs.
Synechocystis spp.
Pseudanabaena spp.
Some Calothrix, Crocosphaera, Leptolyngbya and Nostoc clusters.
B: 16S rRNA sequence identities and genomic identities
Note that we are unable to compare many incomplete
genomes or those which lack, or contain only short fragments of, the 16S rRNA gene.
B1: Clusters for which metrics are not congruent
Picocyanobacteria
The OMF clade
Within the "picocyanobacteria" (genus 7.1.1), for which we
use the term "OMF" (Oceanic, Marine, Freshwater, as
defined by Laloui et al., 2002), we show 62 genomic sequences; only 17 of the
genomes are complete, and 8 are metagenomes. The complete chromosomes show a
range of DNA base composition of 52.45 to 67.98 mol% G+C and vary in size from
2.11 to 2.98 Mbp. Genome sequences are available for only 18 of the 32 species
defined by 16S rRNA sequence identity. Comparison of genome metrics obtained
with complete chromosomes within individual species clusters is possible for only two species (7.1.1T, 7.1.1Z).
Species 7.1.1T contains two strains represented by complete chromosomes, LTW-R and CB0101 (2.42 and 2.79 Mbp; 62.56 and 64.06 mol% G+C, respectively). Values
of 26.2% isDDH (GGDC formula 3), 76.44% ANI and 95.05% CGS place these strains
on the borderline between different species and genera, despite their 16S rRNA
sequence identity of 98.7%.
The five complete chromosomes in species 7.1.1Z, with 54.2-61.4
mol% G+C and size 2.11-2.59 Mbp, show Mash similarity 76.26-92.29%, CGS 76.26-98.56%,
isDDH 16.6-66.9% (GGDC formula
3) and ANI 71.79-87.58%. They are therefore not con-specific. Strains CC9605
and KORDI-52 are distinguished by medium values of Mash similarity (89.86%),
CGS (98.56%), isDDH (53.9% with
GGDC formula 3) and ANI (87.58%); strain WH 8109 shows Mash similarity 89.95%,
CGS 98.39%, isDDH 54.5% (GGDC
formula 3) and ANI 87.96% with KORDI-52. The genome metrics show that these three strains are members of different species, the remaining two being assigned to a different genus.
The inclusion of draft genomes permits the comparison of two
pairs of Cyanobium spp. strains. Isolates PCC 7001 and NIES-981 share
98.78% 16S rRNA sequence identity and are therefore borderline members of a
single species (7.1.1O). However, values of 32.4% isDDH, 86.80% ANI and 97.29% CGS assign them to distinct species. Strains PCC 6307 and Copco Reservoir LC18 (7.1.1J) are identical in
16S rRNA sequence, their con-specificity being confirmed by 50.5% isDDH and 92.61% ANI values.
In contrast, inter-specific comparisons may be performed with
complete chromosomes for 11 of the defined clusters of the OMF clade. Mash
similarities of 72.61-78.50%, CGS 86.85-95.29%, isDDH 13.6-17.2% (GGDC formula 3) and ANI 69.06-74.20% show that all belong to different genera.
The species not described above (7.1.1C, G, P, V, X, Y, AA
and AC), defined on the basis of 16S rRNA sequence identities, are all also poorly
supported by genome metrics.
Coutinho et al. (2016a) compared
the genomes of 24 Synechococcus strains with six methods: ANI, AAI (Average
Amino acid Identity), dinucleotide signature, isDDH, 16S rRNA identity and MLSA (Multi-locus sequence analysis) with genes rrsA, gyrB, pyrH, recA and rpoB. The authors assigned the generic
name Parasynechococcus to 15 strains (the sister group of Prochlorococcus),
with WH8102 as type strain. Although the names were published under the rules of the ICNP, the strains are not axenic, and the names are therefore invalid under that code. The genus was
also invalid under the rules of the ICN, since a type species was not named. However,
Komárek et al. (2020) validated the genus, and we have accepted this in all other phylogenetic trees on this site. Subsequently, Coutinho et al.
(2016b) assigned
specific names to each of the 15 strains. These specific names are shown in RED
in the figure below, but note that some of the generic names were later changed
by Walter et al. (2017); these have been retained in the same colour in
the figure. Walter et al. (2017) produced a tree of 100 genomes, using
the 31 marker genes of Wu & Eisen, and validated the circumscription of
species and genera with measurements of isDDH
and AAI values. Strains were considered to belong to the same species when they shared at least 98.8% 16S rRNA sequence identity, ≥95% AAI and ≥70% isDDH; AAI values ≥70% were employed for genus delimitation. The AAI values are similar to the ANI values that we use throughout this page: e.g. 70-90% AAI vs 71.8-91.2% ANI within Parasynechococcus; 63-85% AAI vs 70.8-90.8% ANI within Pseudosynechococcus; and 59-62% AAI vs ~69.1-71.0% ANI between "genera".
The new generic names were Pseudosynechococcus, Magnicoccus, Regnicoccus
and Inmanicoccus; none of these are validly published. Salazar et al.
(2020) markedly expanded the coverage of the OMF group by adding a large number
of genomic sequences, shown in VIOLET in the figure. These authors also added
the generic name Synechospongium to replace Candidatus Synechococcus spongiarum (16S rRNA cluster 7.1.5), as shown in GREEN in the figure.
Unfortunately, these authors included a large number of incomplete genomes of
this cluster, which we have excluded from our genome trees. Synechococcus
lacustris strain Tous was renamed to Lacustricoccus sp., marked in
the same colour. The generic names Synechospongium and Lacustricoccus
are again invalidly published.
This figure was extracted from our genomic tree, with the 16S rRNA cluster numbers added as they appear in the 16S rRNA tree. Strain PCC 6301 was employed as outgroup.
Since there is little correlation between 16S rRNA similarity values and genome metrics, there is clearly an unusual mode of evolution within this group. This is also true for the Prochlorococcus
sister group, described below. The members of both clusters may be considered
as being among the most abundant photoautotrophs in the oceans, having achieved
buoyancy by marked reduction in cell size. However, the genomes of most members
of the Prochlorococcus group appear to have undergone extensive
streamlining (reduction in size),
which is less evident in the OMF cluster. Although we do not know the true ancestor of these groups, the complete chromosomes of related strains (Synechococcus spp. strains PCC 6312, PCC 6715, JA-2-3B'a, JA3-3Ab) range from 2.66 to 3.77 Mbp and from 48.52 to 60.24 mol% G+C, whereas the genomes of the OMF cluster strains are 2.11-2.98 Mbp in size, with 52.45-67.98 mol% G+C. Most of
the completed chromosomes of strains of the OMF group contain two copies of the
rrn operon, like many other unicellular cyanobacteria, and therefore
extensive genome reduction has not occurred. Unlike strains of the Microcystis
and Moorena clusters described below, we find few repeat sequences (39-467)
in members of the OMF clade, insufficient to affect the genome metrics. Again
unlike the situation in Microcystis, the core genomes prepared from
members of the OMF clade show relatively unchanged ANI values, implying that
the evolutionary changes have occurred in all parts of the genome.
Outside of the OMF cluster, Walter et al.
(2017) renamed many taxa, in many cases unwisely; for example, seemingly based
only on their clustering in the tree and ignoring the 16S rRNA identity values
and genome similarity data, Cyanobacterium stanieri PCC 7202 was
transferred to the genus Geminocystis (represented by strain PCC 6308),
even though these isolates share only 93.52-94.54% 16S rRNA identity, 72.87%
ANI and 13.8% isDDH (GGDC formula 3), and Halothece sp. PCC 7418 was transferred to the genus Dactylococcopsis (PCC 8305), although the strains involved share only 96.26% 16S rRNA identity, 77.93% ANI and 16.2% isDDH (GGDC formula 3). Since these values are representative of distinct genera, such transfers are invalid.
Prochlorococcus
Prochlorococcus spp. strains (genus 7.1.3) are divided into 5 species in the 16S rRNA tree on the basis of 16S rRNA sequence identity (within species, around 99.7%; between
species 97.3-98.5%). The genus Prochlorococcus was validated under the
rules of the ICN by Komárek et al. (2020), having been previously
invalidly published by Chisholm et al. (1992). The latter authors
designated strain CCMP1375 as the reference strain; Komárek et al. (2020) changed this to strain MIT 9313, on the grounds that "the original reference strain ... is below the 98.7% threshold in comparison to all other strains." Although we are forced to accept this change, our results show that strain MIT 9313 is less closely related to the other strains than is CCMP1375. This is also
evident from the two phylogenetic trees shown by these authors: of the 8
isolates included in the tree inferred from 16S rRNA sequences, 5 fall into
species 7.1.3A, the exceptions being strain 0801 (species 7.1.3B) and the two
reference strains which appear in species 7.1.3C and 7.1.3E. The phylogenomic
tree of these authors includes only 6 of the 8 strains. Strain MIT 9313 (50.74
mol% G+C, 2.41 Mbp in genome size) is a member of the group of strains
containing large genomes of high mol% G+C (see below), whereas CCMP1375 has a
genome with 36.44 mol% G+C and 1.75 Mbp, thereby being more representative of
the genus. Most members possess small genomes (approx. 1.6 Mbp) of low G+C content (30-38 mol%) and a single copy of the rrn operon; members of species 7.1.3E have larger genomes (around 2.6 Mbp) of higher G+C content (around 50 mol%) with two copies of the operon, as calculated from the only two complete chromosomes representing this group (including the new reference strain). With such large differences, particularly in G+C content, it is pointless to perform isDDH between
members of species 7.1.3E and the others. The tree of complete plus draft
genomes, based on our set of 79
core marker genes, contains genomes of 54 Prochlorococcus strains (14
complete), as shown in the figure below (where the Prochlorococcus clade
has been extracted from the genome tree, and the species cluster numbers are
derived from those of the 16S rRNA tree). The members of species 7.1.3A
represented by complete chromosome sequences show a minimum of 99.12% 16S rRNA
sequence identity. However, again using only the complete chromosomes, 2 strains
(CCMP1986 and MIT9515, sharing 63.4% isDDH
[GGDC formula 3] and therefore probably representing two different species)
form a sub-clade of species 7.1.3A, rather than being included within the clade;
this is supported by their isDDH
values of 41.7 to 48.8% with the other strains of species 7.1.3A, which
themselves show 68.3-82.5% isDDH
and are bracketed in the figure as "DDH 1". isDDH analysis with the 3
complete chromosomes of species 7.1.3B confirms the similarity of strains
NATL1A, NATL2A and MIT0801 (sharing a minimum isDDH value of 80.7%) and is thus in agreement with the results of 16S rRNA study ("DDH 2" in the figure). Unfortunately, there are
insufficient complete chromosomal sequences available for members of species
7.1.3C and 7.1.3D to permit isDDH analysis within each species, but the single complete
chromosomes of strains CCMP 1375 and MIT 9211 show only 15.9% isDDH, Mash similarity 76.74%, ANI 72.3% and CGS 87.80%, placing them into two distinct genera. Clades 7.1.3A to 7.1.3D (as defined by 16S rRNA similarity values) show longer branches in the genomic tree than in the 16S rRNA sequence-based tree, reflecting the increased resolution obtained by increasing the number of genes employed for the analysis. Clade
7.1.3E is therefore separated more clearly from the others here than in the 16S
rRNA-based tree; strains MIT 9303 and MIT 9313, represented by complete chromosomes
of greater size and higher mol% G+C content than all other "species",
exhibit 78.3% isDDH, 95.84%
Mash similarity, 94.83% ANI and 98.89% CGS, thus representing a single species
("DDH 3"). The reference strain designated by Komárek et al.
(2020) is not, therefore, representative of the majority of strains of the
genus, as mentioned above. In addition, these authors claim that strain MIT
9313 was isolated from the Sargasso Sea, whereas it was found in the N.
Atlantic.
#: not present in 16S rRNA tree (containing only
fragments of the 16S rRNA gene). The genome of strain MIT9202 is draft,
erroneously included by Thompson et al.
In summary, the genome metrics further divide
"species" 7.1.3A into two species, maintain cluster 7.1.3B as a
single species and define "species" 7.1.3C and 7.1.3D as distinct
genera. A single name, P. marinus, is applied to all strains; extensive renaming at both taxonomic levels is evidently required. An important step in this direction was made by
Thompson et al. (2013), who proposed division of the genus into 10
species based on isDDH and ANI
analysis of complete chromosomes; these species have not been validated under
the ICN. The proposed specific names are shown in parentheses and highlighted
in red in the figure; note that 2 complete chromosomes (marked in black) have
more recently become available. Our more extensive isDDH/ANI studies, including
draft genomes, show that many more species can be distinguished by ANI values,
indicated in the figure as ANIb 1 to ANIb 25. There is no effect of repeated
sequences on species circumscription by ANI in this genus, since the strains of
species 7.1.3A to 7.1.3D contain only low numbers (37-190) of these elements
and even the strains with larger genomes (species 7.1.3E) contain only 425-533;
ANI values of the core and accessory genomes are unchanged. The necessity of
whole-genome sequencing to achieve a taxonomic scheme, rather than relying on a
16S rRNA-based phylogeny, is an open question: the strains of clusters 7.1.3A
to 7.1.3D have undergone extensive genome streamlining (see e.g. Giovannoni et
al., 2014; Sun & Blanchard, 2014, and references cited), and different
lineages may have lost and/or gained different genomic regions (see Humbert et
al., 2013, and references therein) making comparisons difficult and perhaps
explaining the apparent discrepancies between genome metrics in defining
species boundaries. Li et al. (2015) suggested that ANI and DDH values
vary in different genera and should be replaced by the sequence identity of
homologous genes as the criterion for delineation of species; this is exactly
our approach for inference of genomic phylogenetic trees.
Thompson et al. employed GGDC formula 2 (identities/HSP
length), which is equivalent to ANI, whereas we have used formula 3 (identities/total
genome length), which is more appropriate for complete chromosomes. ANI, if the
analysis is performed with default parameters, splits species established by isDDH
(formula 3) into further specific clusters (as shown in our tree, above) and
would appear to be inappropriate. This can be shown by decreasing the fragment length setting in JSpecies, or the fragment length (fragLen) or k-mer size settings in fastANI, to the point where the ANI values match those of isDDH. isDDH does not suffer from
this artifact. If we consider only the isDDH values obtained with formula 3,
the species P. chisholmii, P. ponticus, P. neptunius and P.
nereus proposed by Thompson et al. become a single species; the six
remaining species appear to be validly delineated. Walter et al. (2017),
including only 13 strains in their tree, further divided the genus Prochlorococcus
into four genera, their names being shown after the "/" in the
figure. None of these genera are valid under the ICN. Eurycolium is equivalent
to clusters DDH 1 plus ANIb 13 in the figure, which together form 16S rRNA
group 7.1.3A; Prolificoccus represents DDH group 2 and 16S rRNA cluster
7.1.3B; Prochlorococcus was retained as two species (equivalent to ANIb
groups 18 and 21, 16S rRNA clusters 7.1.3C and 7.1.3D); Thaumococcus
(including only the two P. swingsii strains described above) forms ANIb clusters 22 and 25 (16S rRNA group 7.1.3E). Tschoeke et
al. (2020), with a tree of 208 genomes inferred from an unspecified set of 249 marker genes, further expanded the studies of Walter et al. (2017). These authors
proposed the creation of a new genus, Riococcus, and greatly increased
the coverage of this group by the addition of many draft genomes; their
suggestions are shown in violet in the figure. None of the nomenclatural
changes were validly published under the rules of the ICN.
The 5 clusters shown in the figure are distinguished by genomic G+C content, as summarized in the table below, where we employ the generic names proposed by Walter et al. (2017). Genome size, in contrast, is not discriminatory for the clusters of low G+C content, and in many cases is based only on the summed lengths of the multiple contigs of draft genomes.
Calothrix spp. (part)
The two genomes representing species 1.15.15D, Calothrix parasitica NIES-267 and Rivularia sp. PCC 7116 (each containing 3 identical rrn operons), are similar in size (8.95 and 8.70 Mbp, respectively); they show only 82.23% ANI (measured over 54% of the genome) and 97.39% CGS, which places them in two distinct species. The isDDH formula 2 value of 26.9% is not discriminatory in this range. However, their extracted 16S rRNA gene sequences exhibit 99.66% identity. Genome metrics and 16S rRNA identity values therefore disagree in this cluster.
Genus 1.11.2 contains the sequences of 16S rRNA genes extracted from the complete genome of Calothrix brevissima NIES-22 (5 operons, sharing 99.73-100% identity) and the draft genome of Calothrix sp. FACHB-1219 (1 operon). They fall into two species (1.11.2A and 1.11.2C, respectively). However, all genome metrics indicate their con-specificity: 57.8% isDDH, 96.20% ANI, 99.63% CGS. Other clusters of Calothrix spp., for which all similarity values agree, are described below.
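Incongruence of the kind described above can be made explicit with a short Python sketch that applies the species-level cutoffs from section A (16S rRNA 98.7%, isDDH 70%, ANI around 96%) to each metric separately and reports whether they agree. Treating the cutoffs as inclusive, and leaving out CGS (for which no species cutoff is stated), are assumptions; the values are those quoted on this page for the two genomes of Calothrix species 1.15.15D and for C. fritschii PCC 6912 and PCC 9212.

```python
# Species-level cutoffs as quoted in section A (treated as inclusive here)
SPECIES_CUTOFFS = {"16S": 98.7, "isDDH": 70.0, "ANI": 96.0}


def species_calls(metrics: dict[str, float]) -> dict[str, bool]:
    """True where a metric places the pair within one species."""
    return {name: value >= SPECIES_CUTOFFS[name]
            for name, value in metrics.items()}


def congruent(metrics: dict[str, float]) -> bool:
    """Metrics agree if they all give the same species-level call."""
    return len(set(species_calls(metrics).values())) <= 1


calothrix_1_15_15D = {"16S": 99.66, "ANI": 82.23}             # NIES-267 vs PCC 7116
chlorogloeopsis = {"16S": 100.0, "isDDH": 99.8, "ANI": 99.96}  # PCC 6912 vs PCC 9212
for label, m in [("Calothrix 1.15.15D", calothrix_1_15_15D),
                 ("Chlorogloeopsis 1.15.3", chlorogloeopsis)]:
    print(label, species_calls(m),
          "congruent" if congruent(m) else "incongruent")
```

Keeping one call per metric, rather than a single combined verdict, shows which metric disagrees with the 16S rRNA-based assignment.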
Chlorogloeopsis
The two draft genomes of Chlorogloeopsis fritschii strain PCC 6912 and the draft genome of C. fritschii strain PCC 9212 (monospecific genus 1.15.3) give high isDDH (99.8%), ANI (99.96%) and CGS (100%) values, as expected from their 16S rRNA sequence
identity of 100%. The strain represented by the metagenome C. fritschii
C084 (lacking the 16S rRNA gene) is a member of the same species, as shown by
values of 81.3% isDDH, 97.2% ANI and 99.76% CGS with the above. The C.
fritschii strains are only distantly related to the unidentified HTF (High
Temperature Form) PCC 7702 in genus 1.15.2 (23.1% isDDH, 77.99-78.97% ANI, 92.54% 16S rRNA identity). The isDDH value may be ambiguous in this range, but the ANI
value and 95.42% CGS clearly indicate that these strains are members of
distinct species, whereas the 16S rRNA identity value places them into
different genera. Organisms of the HTF type are often incorrectly named Chlorogloeopsis sp. in the literature. Similar results were obtained by Lachance (1981) in wet-lab DDH studies with thermal elution from hydroxyapatite for C. fritschii strains PCC 6718, PCC 6912 and PCC 9212 in genus 1.15.3 (93% DDH, no change in ΔTm(e)). Our unpublished results (Raum, Herdman & Rippka) using optical thermal renaturation DDH again show the con-specificity of strains PCC 6912 and PCC 9212 (100% DDH) and that of the unidentified HTF strains PCC 7517, PCC 7518, PCC 7519 and PCC 7702 (86-100% DDH) in genus 1.15.2.
Crocosphaera (part)
Within species 5.1.4.1A, the complete chromosome of Crocosphaera subtropica ATCC 51142 (previously Cyanothece sp.) shows 98.0% isDDH and 99.99% ANI (and 100% 16S rRNA sequence identity) with the draft genome of C. subtropica ATCC 51472; these two strains are therefore clearly con-specific. However, strain ATCC 51142 shows only 33.5-34.8% isDDH, 86.71-87.15% ANI and 98.02-98.25% CGS with strains Cyanothece sp. BG0011 (draft), Crocosphaera chwakensis CCY0110 (draft) and the unidentified unicellular cyanobacterium SU3 (draft). These three strains are in turn each assigned to a different species by all genome metrics (34.8-38.6% isDDH, 87.51-88.76% ANI and 98.23-98.44% CGS). On
the basis of genome metrics, the single species delineated by 16S rRNA analysis
should therefore be divided into four. In contrast, members of Crocosphaera
species 5.1.4.1B (below) appear to be supported as con-specific by
all similarity measures.
Extremophiles
The halophiles
In genus 5.2.4.3, the 16S rRNA genes extracted from the
complete chromosomes of Euhalothece natronophila Z-M001 (species
5.2.4.3A) and Halothece sp. PCC 7418 (species 5.2.4.3B), together with the
draft metagenome of "Cyanobacteria bacterium" GSL.Bin1 (species 5.2.4.3E),
from alkaline or hypersaline habitats, share 96.81-97.52% sequence identity and
show low isDDH and ANI values
(~15% and ~75%, respectively). Although the 16S rRNA identity value of 97.31% between the strains represented by complete chromosomal sequences suggests that they belong to two species of the same genus, the low isDDH (15.1%, GGDC formula 3), ANI (75.76%), Mash similarity (78.42%) and CGS (93.79%) values suggest that they belong to different genera. The separation of the halophilic strains Dactylococcopsis
salina PCC 8305 and Halothece sp. PCC 7418 into 2 genera (5.2.4.1 and
5.2.4.3), justified by low (96.36%) 16S rRNA sequence identity, is accompanied
by slightly higher isDDH (16.2%),
ANI (77.93%), Mash similarity (81.62%) and CGS (94.44%) values of their
complete chromosomes, which place them on the borderline between species and
genus. Monospecific genus 5.2.4.8 contains 16S rRNA genes extracted from the draft genomes of Sodalinema stali HE10JO (previously Oscillatoria sp. or Phormidium sp.; only one of four genes shown), S. gerasimenkoae IPPAS B-353 (only one of two shown) and Geitlerinema sp. P-1104 (a single gene), and from the metagenomes labelled Cyanobacteria bacterium T3Sed10_304 and Phormidium sp. CSSed162cmB_426, all from hypersaline or alkaline environments. The 2 metagenomes are incomplete, containing respectively
only 72 and 63 of our 79-gene core marker set, and have been excluded from the
genome tree and calculation of genome metrics. The 16S rRNA sequences from
the first three genomes share 99.33-100% identity, but the genomes show only 81.74-87.72%
ANI and 97.69-98.53% CGS (the values of 28.2-35.1% isDDH should be considered as ambiguous in this region), placing
them in separate species of the genus. This rapid divergence of the genomes (in
whole or in part) of all the extremophiles described above, with little change
in 16S rRNA identity, suggests a mechanism of evolution in which some or many
gene products have become adapted to the stress imposed by hypersaline
environments.
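The comparison made in this section, asking whether the rank implied by 16S rRNA identity agrees with the rank implied by a genome metric, can be sketched in Python as follows. The 16S rRNA cutoffs are the lower ends of the ranges given in the Methods (species > 98.70%, genus > 96.50%); the ANI cutoffs (95% for species, 78% for genus) are hypothetical values chosen for illustration only and are not thresholds defined in this study. The example values are those quoted above for Euhalothece natronophila Z-M001 and Halothece sp. PCC 7418.

```python
# Sketch: does the rank implied by 16S rRNA identity agree with that implied by ANI?

# Lower ends of the 16S rRNA cutoff ranges given in the Methods (%).
SPECIES_16S = 98.70
GENUS_16S = 96.50

# HYPOTHETICAL ANI cutoffs (%), for illustration only; not defined in this study.
SPECIES_ANI = 95.0
GENUS_ANI = 78.0


def rank_from_16s(identity: float) -> str:
    """Rank implied by pairwise 16S rRNA sequence identity (%)."""
    if identity > SPECIES_16S:
        return "same species"
    if identity > GENUS_16S:
        return "same genus"
    return "different genera"


def rank_from_ani(ani: float) -> str:
    """Rank implied by ANI (%), using the hypothetical cutoffs above."""
    if ani >= SPECIES_ANI:
        return "same species"
    if ani >= GENUS_ANI:
        return "same genus"
    return "different genera"


def compare_ranks(identity_16s: float, ani: float) -> tuple[bool, str, str]:
    """Return (congruent?, rank from 16S rRNA, rank from ANI)."""
    r16 = rank_from_16s(identity_16s)
    r_ani = rank_from_ani(ani)
    return r16 == r_ani, r16, r_ani


# Euhalothece natronophila Z-M001 vs Halothece sp. PCC 7418 (values from the text).
print(compare_ranks(97.31, 75.76))  # (False, 'same genus', 'different genera')
```

The same function applied to the Sodalinema values above (99.33% 16S rRNA identity, 81.74% ANI) also reports an incongruence under these assumed cutoffs.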
Organisms which tolerate lower salt concentrations are
widespread in the 16S rRNA tree. Of over 250 such strains, few are represented by genomic sequences.
In all cases, the 16S rRNA sequence identities and genome metrics are
congruent; these organisms are described in the appropriate paragraphs below.
The thermophiles
Cluster 8.1.1A of the 16S rRNA tree contains the sequences extracted from the genomes of 44 strains named as Thermosynechococcus vestitus, T. vulcanus and Thermosynechococcus spp., most isolated from thermal environments. Like the halophiles described above, the thermophilic members do not show congruent relationships between 16S rRNA and genomic phylogenies.
For comparison of genome metrics we use here only the complete genomes, since this permits their Mash similarity values to be used as an additional criterion. The tree of all 30 complete genomes is included below. Many of these are also shown in the tree inferred from all genomes. The genomes are similar in G+C content (53.3-53.9 mol%) and size (2.59-2.68 Mbp).
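Mash estimates a distance D between two genomes from the Jaccard index j of their k-mer sets, D = -(1/k) ln(2j/(1 + j)). The sketch below computes this exactly from full sets of canonical k-mers rather than estimating j from MinHash sketches as Mash itself does, and it assumes that the "Mash similarity" quoted in this section is 100 × (1 − D); both are simplifications for illustration. Because j is computed over whole k-mer sets, sequence missing from an incomplete genome lowers j directly.

```python
import math

# Sketch of the Mash distance formula, computed from exact k-mer sets
# (Mash itself estimates the Jaccard index from MinHash sketches).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = frozenset("ACGT")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Canonical k-mers: lexicographic minimum of each k-mer and its reverse complement."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _BASES:  # skip k-mers containing N or other ambiguity codes
            rev_comp = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, rev_comp))
    return kmers


def mash_distance(seq_a: str, seq_b: str, k: int = 21) -> float:
    """Mash distance from the exact Jaccard index of canonical k-mer sets."""
    a, b = canonical_kmers(seq_a, k), canonical_kmers(seq_b, k)
    union = a | b
    if not union:
        raise ValueError("no valid k-mers in either sequence")
    j = len(a & b) / len(union)
    if j == 0:
        return 1.0  # no shared k-mers: maximal distance
    return -math.log(2 * j / (1 + j)) / k


def mash_similarity(seq_a: str, seq_b: str, k: int = 21) -> float:
    """ASSUMPTION: 'Mash similarity' (%) taken as 100 * (1 - D)."""
    return 100.0 * (1.0 - mash_distance(seq_a, seq_b, k))
```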
Despite forming a single species in the 16S rRNA tree (98.72-100% sequence identity), genome metrics divide the 30 strains into two genera. Synechococcus lividus PCC 6715 was included in the genus Thermosynechococcus by Komárek et al. (2020); this was clearly an error, since this strain is a member of a different genus. Comparison of the completed genome of strain PCC 6715 shows 93.98% Mash similarity with its nearest relatives in species 1A in the figure below. In the all-genome tree, strain PCC 6715 is joined by strains PCC 6716 and PCC 6717. These share 99.62-100% 16S rRNA sequence identity among themselves and 98.60-99.18% with the other strains, and therefore lie at the limit of species separation.
The strains of genus 1 are divided into two species (1A and 1B in the figure below) by all genome metrics. Their genomes range in size from 2.32 to 2.71 Mbp, and in G+C content from 52.4 to 55.5 mol%.
Species 1A contains 24 complete genomic sequences, which share 96.06-99.52% ANI, 95.6-100% isDDH (GGDC formula 3), 99.31-100% CGS and 97.56-100% Mash similarity.
Species 1B contains four complete genomes, separated from the strains of species 1A by values of 87.50-90.11% ANI, 66.9-79.3% isDDH (GGDC formula 3), 98.02-98.80% CGS and 89.57-92.21% Mash similarity. The isDDH value is higher than expected for separation at the species level. With GGDC formula 2, the isDDH value is 33.2-34.5%, confirming specific separation. Two genomes representing the type species, T. vestitus, including that of the reference strain T. vestitus BP-1 (renamed from T. elongatus by Komárek et al., 2020) lie in this species together with that of a strain named as T. vulcanus.
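The difference between GGDC formulae 2 and 3 lies in the denominator: formula 2 divides the identities within HSPs by the total HSP length, formula 3 by the total genome length. The sketch below computes the corresponding distances in the bidirectional form we understand GBDP to use (HSPs found with genome X as query plus those with genome Y as query; denominators summed over both directions or both genome lengths); treat this as an assumption. It stops at the distances: GGDC converts them to DDH estimates with a statistical model not reproduced here, and the toy HSP values are hypothetical. Because unaligned regions do not enter the formula 2 denominator, a large unaligned fraction still yields a small formula 2 distance, which illustrates why formula 2 can give higher isDDH values than formula 3.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    length: int      # aligned length of the high-scoring segment pair (bp)
    identities: int  # number of identical positions within it


def gbdp_distances(hsps_xy, hsps_yx, len_x, len_y):
    """Return (formula-2 distance, formula-3 distance) for genomes X and Y.

    hsps_xy: HSPs with X as query; hsps_yx: HSPs with Y as query.
    Assumes HSPs are already filtered so that they do not overlap.
    """
    hsp_len = sum(h.length for h in hsps_xy) + sum(h.length for h in hsps_yx)
    ident = sum(h.identities for h in hsps_xy) + sum(h.identities for h in hsps_yx)
    if hsp_len == 0:
        return 1.0, 1.0  # nothing aligned
    d_formula2 = 1.0 - ident / hsp_len          # identities / total HSP length
    d_formula3 = 1.0 - ident / (len_x + len_y)  # identities / total genome length
    return d_formula2, d_formula3


# Hypothetical toy example: half of each 2,000 bp "genome" aligns, at 90 % identity.
d2, d3 = gbdp_distances([HSP(1000, 900)], [HSP(1000, 900)], 2000, 2000)
print(round(d2, 3), round(d3, 3))  # 0.1 0.55
```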
Shown as the outgroup in the Figure, Synechococcus sp. strain PCC 6312 is joined in the all-genome tree by Pseudocalidococcus azoricus BACA0444 (Luz et al., 2023). The two strains are almost identical in 16S rRNA sequence (99.8% identity of their extracted genes), but are members of different species of a single genus, as shown by genome metrics (89.41% ANI, 39.5% isDDH, 98.09% CGS). These strains differ from all members of the genus Thermosynechococcus in having lower G+C contents of 48.5 and 48.7 mol%.
The conflicts between 16S rRNA sequence identities and genome metrics in this cluster perhaps indicate an adaptation to thermal stress of many gene products. Other strains of thermal origin, such as Leptolyngbya sp. O-77, Thermoleptolyngbya sp. PKUAC-SCTA121 and PKUAC-SCTA183 (species 6.2.8A) and Synechococcus spp. JA-3-3Ab (species 10.1.1A) and JA-2-3B′a (species 10.1.1B), described below, show speciation patterns that are coherent when measured with all available methods. Similar conclusions for some of these strains were reported by Cheng et al. (2020). Li et al. (2014) have commented on the increased rates of evolution in microbial communities of extreme environments, and a thorough review of cyanobacterial halophiles was provided by Oren (2015).
Laspinema
In the monospecific genus 5.2.3.1, Laspinema sp. strains D2a and D3b, both represented by draft genomes (Stanojkovič et al. 2022), share 98.86% 16S rRNA sequence identity and are therefore members of a single species. They share 98.79-98.93% 16S rRNA sequence identity with Oscillatoria acuminata PCC 6304 (completed genome) and 98.73-99.46% identity with the draft genome of strain Oscillatoria sp. HE19RPO. On the basis of 16S rRNA sequence identity, the four strains would therefore be placed in a single species. However, on the basis of genome metrics, the two Laspinema strains represent two species (89.75% ANI, 39.8% isDDH, 98.72% CGS). The Oscillatoria strains form two further species (89.91% ANI, 41.0% isDDH, 98.80% CGS), both separated from the two Laspinema species (88.67% ANI, 37.2% isDDH, 98.61% CGS). The genomes of other strains of Laspinema spp. (D2b, D2c, D2d, D3a, D3c and D3d) are shown in the tree inferred from genome sequences, but are not included in the 16S rRNA tree since they contain no 16S rRNA genes.
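The contrast drawn in this paragraph, one species by 16S rRNA identity but four by genome metrics, amounts to grouping the same strains with two different pairwise measures. A minimal single-linkage sketch (union-find) is shown below. The 16S rRNA threshold (> 98.70%) is the lower end of the species range in the Methods; the 95% ANI species threshold is an assumption for illustration. Where the text gives a range, its lower bound is used for each pair, the single Laspinema vs Oscillatoria ANI value quoted above is applied to all four cross pairs, and the 16S rRNA identity between the two Oscillatoria strains, which is not given, is omitted.

```python
def single_linkage(strains, pair_values, threshold):
    """Group strains, linking any pair whose value is strictly above threshold."""
    parent = {s: s for s in strains}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), value in pair_values.items():
        if value > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


strains = ["D2a", "D3b", "PCC 6304", "HE19RPO"]
rrna_16s = {  # % identity; lower bounds of the ranges quoted in the text
    ("D2a", "D3b"): 98.86,
    ("D2a", "PCC 6304"): 98.79, ("D3b", "PCC 6304"): 98.79,
    ("D2a", "HE19RPO"): 98.73, ("D3b", "HE19RPO"): 98.73,
}
ani = {  # % ANI as quoted in the text
    ("D2a", "D3b"): 89.75,
    ("PCC 6304", "HE19RPO"): 89.91,
    ("D2a", "PCC 6304"): 88.67, ("D2a", "HE19RPO"): 88.67,
    ("D3b", "PCC 6304"): 88.67, ("D3b", "HE19RPO"): 88.67,
}
print(len(single_linkage(strains, rrna_16s, 98.70)))  # 1 species
print(len(single_linkage(strains, ani, 95.0)))        # 4 species
```

Single linkage is used only because it matches the pairwise "same species if above the cutoff" logic of the text; other clustering rules could split the strains differently.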
"Leptolyngbya" (part)
Species 5.1.7.9B contains the filamentous Leptolyngbya
sp. PCC 7376, which shares 97.51% 16S rRNA sequence identity with, for example, Picosynechococcus
sp. strain PCC 7002 in species 5.1.7.9A, suggesting their assignment to
distinct species of the same genus. That this is unlikely is shown by their
different genome properties: mean DNA base composition 43.87 and 49.17
mol% G+C, respectively; genome size 5.13 Mbp and 3.43 Mbp. The isDDH value for their completed chromosomes
is only 13.9% (GGDC formula 3), with only 71.91% ANI, 93.21% CGS and 73.77%
Mash similarity, placing them as members of different genera. Another
filamentous organism, Limnothrix rosea strain IAM M-220 (NIES-208), also
in species 5.1.7.9B, is represented by a draft genome sequence; this strain is
closely related to Leptolyngbya sp. PCC 7376 on the basis of both 16S rRNA
sequence identity (99.33%) and 23S rRNA sequence identity (99.03%); however,
their full genomes share only 21.4% isDDH,
77.39% ANI and 97.02% CGS, assigning them to two distinct species. The
filamentous members of this genus have therefore retained high 16S rRNA sequence
identity but have undergone more rapid divergence in other parts of the genome,
each in different regions. These differences between unicellular and
filamentous members may have arisen by either of two scenarios: (1) an HGT
event in which a filamentous ancestor of the Leptolyngbya and Limnothrix
strains acquired the rrn operon of a unicellular organism, followed by
extensive genomic evolution; (2) the ancestor was unicellular, evolving in
morphology and gene content while retaining the rrn operon.
Species 6.1.1C contains
two draft genomic sequences, Phormidium tenue NIES-30 and Leptolyngbya
sp. ULC186, that share 98.92% 16S rRNA sequence identity, 87.35% ANI and 98.53% CGS (the isDDH value of 34.0% obtained with GGDC formula 2 is not discriminatory in this region). They should
therefore be considered as members of different species. Within genus 6.1.10, two
of the six species contain only genome sequences of Leptolyngbya strains:
Leptolyngbya sp. BC1307 (species 6.1.10D) and L. foveolarum
ULC129 (species 6.1.10E) are defined as members of different species in sharing
97.24% 16S rRNA sequence identity. However, the results obtained with the
genome metric methods applicable to their draft genomes suggest them to be
members of two genera: 21.7% isDDH (again ambiguous in this region), 73.49% ANI
and 93.54% CGS. The extracted 16S rRNA gene of Synechococcus sp. PCC
7335 (species 6.1.10B) shares 97.51% sequence identity with that of Leptolyngbya
sp. BC1307 and 97.04% sequence identity with that of L. foveolarum ULC129.
Their draft genomes show only 19.1-19.7% isDDH
(again ambiguous) and 71.4-73.49% ANI, suggesting their assignment to
different genera. However, the CGS values (92.85-93.31%) define them as members
of a single genus. Other genomic sequences of Leptolyngbya spp. and
related genera, showing congruent results with all methods, are described below.
Nostoc spp. (part)
Although most clusters of Nostoc spp. are congruent
in terms of 16S rRNA identities and genome metrics (see below), several clusters are problematic.
Two strains of the type species, N.
commune (species 1.8.1L), NIES-4072 (with 3 identical rrn operons) and
HK-02 (NIES-2114, with 4 identical rrn operons), are identical in 16S
rRNA sequence and their draft genomes show 99.8% isDDH, 99.99% ANI (reported for 99% of the genome) and 100% CGS,
confirming their con-specificity. They share only 97.92% 16S rRNA sequence identity,
49.7-51.0% isDDH and 91.16-91.82%
ANI with strain Nostoc flagelliforme CCNUN1 which, on the basis of 16S
rRNA identity, lies in a separate species (species 1.8.1AD). The genome
metrics, however, are conflicting: isDDH (with GGDC formula 2) suggests con-specificity
of the 3 strains (although we emphasize our concerns regarding the use of
formula 2); ANI places the strains on the borderline between species. The ANI
value seems to be artificially high, since only the identities of around 60% of
the genome are reported in comparisons involving strain CCNUN1. The CGS results
are also confusing, since the three strains are combined into a single species with
values of 99.38-100%. Strain CCNUN1 is represented by a complete chromosome
sequence (CP024785) with six 16S rRNA genes but only 5 rrn operons (rrnA-rrnE)
whose almost completely identical (99.8-100%) 16S rRNA sequences fall into
species 1.8.1AD (whose members show 99.8-100% 16S rRNA sequence identity); only
one (rrnA) is shown in the tree based on 16S rRNA sequences. The
additional 16S rRNA sequence (labelled as "(no operon)" in place of
an operon designation) falls into species 1.8.1AC.
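The concern raised in this paragraph, that an ANI value may be inflated when only part of the genome is compared, can be checked from fastANI's tabular output, whose columns are query, reference, ANI, the number of fragments mapped and the total number of query fragments. The sketch below computes the aligned fraction and flags comparisons below a cutoff; the 70% cutoff and the example line (file names and fragment counts) are hypothetical.

```python
import csv
import io


def read_fastani(handle):
    """Parse fastANI tab-separated output into dicts with an aligned fraction."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue
        query, reference, ani, mapped, total = fields[:5]
        rows.append({
            "query": query,
            "reference": reference,
            "ani": float(ani),
            "aligned_fraction": int(mapped) / int(total),
        })
    return rows


def low_coverage(rows, min_fraction=0.70):
    """Comparisons whose ANI rests on less than min_fraction of the query (hypothetical cutoff)."""
    return [r for r in rows if r["aligned_fraction"] < min_fraction]


# Hypothetical output line: 600 of 1,000 query fragments mapped (about 60 %).
example = io.StringIO("genomeA.fna\tgenomeB.fna\t91.50\t600\t1000\n")
for row in low_coverage(read_fastani(example)):
    print(row["query"], row["reference"], row["ani"], row["aligned_fraction"])
```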
The symbiotic 'Nostoc azollae' strain 0708 was described
by Ran et al. (2010). The quotation marks are as shown in the NCBI
entry, and presumably indicate that the authors were unsure of the identity of
this organism. It is listed in NCBI as 'Nostoc azollae' (TITLE field of the
flatfile) but as Trichormus (ORGANISM field). In a tree of 10 taxa
extracted from a larger genomic tree built with 476 marker genes and 53
organisms, the authors show strain 0708 clustering with Cylindrospermopsis.
This clustering is similar to our own, if we omit all intervening sequences. The
complete chromosome contains 4 identical copies of the 16S rRNA gene and lies
in species 1.2.5A of the 16S rRNA tree. Strain 0708 is separated at the generic
level from Sphaerospermopsis strains NIES-1949, NIES-73 and LEGE 00249 on
the basis of a mean value of 96.24% 16S rRNA sequence identity; however, the
genome metrics (81.24-81.45% ANI and borderline 95.52-95.58% CGS) suggest
separation at the specific, not generic, level. Values of 25.5-31.5% isDDH give
no useful information. A logical conclusion would be that, based on the genome
metrics, the organism Nostoc azollae strain 0708 is a member of the
genus Sphaerospermopsis. The organism was collected as an extracellular cyanobiont
of Azolla filiculoides and in terms of taxonomy has a mixed history,
being otherwise known as Anabaena azollae and Trichormus azollae,
but is clearly different from all other symbiotic Nostoc strains, most
of which lie in genus 1.8.1, as described below. The genome is also the smallest (5.35
Mbp) of all Nostoc strains in our database, which range from 6.33 to 8.42
Mbp, measured for complete chromosomes only, and is identical in size to the
"nearly complete" genome of S. kisseleviana NIES-73 (5.35 Mbp).
However, N. azollae strain 0708, although suggested to separate
at the generic level from Anabaena sp. PCC 7108 and Trichormus
sp. strain NMC-1 in genus 1.1.4, sharing 95.92% 16S rRNA sequence identity with
both, appears to be separated only at the specific level from these strains by
genome metrics (80.95-81.45% ANI, 95.56-95.75% CGS (borderline); the isDDH
values of 25.7-26.3% do not permit discrimination within this range).
Since strains NMC-1 and PCC 7108 were isolated from saline
habitats, Sphaerospermopsis spp. from freshwater, and strain 0708 is an obligate
cyanobiont, it is unlikely that this similarity can be explained by convergent
evolution in the genomes of these genera. Strain 0708 produces motile,
small-celled hormogonia (never described for Sphaerospermopsis spp. or
the two members of genus 1.1.4), which permit the formation of symbiosis. Strain
0708 cannot be a member of two distinct genera, as implied by the genome
metrics. The genome appears to be in a state of erosion (Ran et al.,
2010) and contains a large number of pseudogenes; such genome erosion
may greatly influence measures of similarity.
Nostoc strains UCD121 and UCD122, both cyanobionts, lie in species 1.8.1A of the 16S rRNA tree. This con-specificity is confirmed by values of 99.88-99.90% ANI, 99.3-99.4% isDDH and
100% CGS of their draft genomes, a comparison that also includes strain UCD120, not shown in the
16S rRNA tree. They show 80.95-81.49% ANI with Nostoc sp. 2RC, Nostoc
sp. Moss 3 and N. linkia strain z1 in genus 1.8.2 (all cyanobionts). The
isDDH values are 27.3-27.6%, the CGS values are 96.33-96.34% and the 16S rRNA
identities are 95.16-95.23%. isDDH is not discriminatory in this range of
values, but the ANI and CGS values both contradict the generic separation of
these strains. However, strains UCD120, UCD121 and UCD122 show 75.91-75.95% ANI,
22.2-22.4% isDDH, 94.70-94.71% CGS and 93.81% 16S rRNA sequence identity with
the symbiotic strain Nostoc sp. TLC 26-01 in species 1.11.3A; separation
at the generic level is therefore confirmed by both ANI and CGS. They are also
separated at the generic level from the cyanobiont 'Nostoc azollae' strain 0708
(species 1.2.5A, 75.54-75.65% ANI, 93.62-93.63% CGS) and the freshwater strain Nostoc
sp. PCC 7937 (ATCC 29413, 76.16-76.24% ANI, 94.73-94.74% CGS) in genus 1.10.1. Note
that Nostoc sp. PCC 7937 is incorrectly listed as A. variabilis
in NCBI (deposited by an author who received it from ATCC under that name), and
has since been changed to Trichormus in the ORGANISM field of the NCBI flatfile.
The three Peltigera
cyanobionts (Nostoc sp. strains 213, 232 and N6) are con-specific with
other members of species 1.8.1A described below, if comparison is based on their ranges of
98.63-99.33% 16S rRNA sequence identity and 99.18-99.56% CGS. However, they show
unexpectedly low isDDH (40.6-40.9%) and ANI (87.77-88.43%) values, which
conflict with their con-specificity.
Planktothrix spp.
Within genus 5.2.1.1, genomic sequences of Planktothrix
spp. are available for 6 of the 10 species in the tree, some described in
detail by Gaget et al. (2015); only six of these genomes (those of strains 82, 108, 758, NIES-204, NIVA-CYA 56/3 and PCC 7805) are complete. The genomes of the 20 strains assigned to P. agardhii,
P. rubescens, P. prolifica and Planktothrix sp. in species
5.2.1.1A cannot be distinguished by 16S rRNA sequence identity (99.05-100%), isDDH (70.3-89.9%,
GGDC formula 2), ANI (95.57-99.92%) or CGS (99.22-99.74%); this cluster
therefore corresponds to a single species, which we propose as P. agardhii.
The cohesion of species 5.2.1.1A is further confirmed by isDDH of the 6 complete chromosomes
(71.1-92.0%, GGDC formula 3), ANI values of 96.07-98.67%, Mash similarity 96.97-99.11% and CGS
99.26-99.90%. The 2 members of species 5.2.1.1B (P. paucivesiculata) whose
genome sequences are available share 99.49% 16S rRNA sequence identity.
However, they show a relatively low (43.1%) isDDH value, only 90.41% ANI and 98.70% CGS, suggesting that they
are members of different species. There is therefore a conflict in the distance
estimates obtained with these draft genomes by the different methods. The two
strains differ from the first species by 16S rRNA sequence identities of 97.91%
and 98.12%, isDDH values of
44.5%, 88.88-90.36% ANI and 98.55-98.69% CGS. Within species 5.2.1.1B, the
inclusion of further strains gives internal values of 98.74-99.51% 16S rRNA
sequence identity. The genomic sequence of strain P. tepida PCC 9214 is
sufficiently distinct from all others (isDDH
values 27.4-32.6%, ANI 82.1-82.28% and 96.10-96.29% CGS) to support the
assignment of this strain to a separate species (5.2.1.1I); the eight 16S rRNA
sequences extracted from this genome share 99.06-100% identity. Species 5.2.1.1J (P.
serta) contains 2 strains, and a single genome sequence is available
(strain PCC 8927). This shows only 29.1-32.6% isDDH (ANI 83.63-86.11%) with members of species 5.2.1.1A, 5.2.1.1B and 5.2.1.1I, with which the strain shares 96.93-97.42% 16S rRNA
sequence identity. However, strains FACHB-1365, LEGE 06226 and PCC 9214,
assigned to 3 different species in the 16S rRNA tree, are con-specific on the
basis of genome metrics (55.2-65.5% isDDH, 93.41-98.13% ANI and 98.81-99.46%
CGS); these strains cluster together in the genomic tree, clearly separated
from the other members of the genus. The available methods of distance
estimation therefore give conflicting results for the genus Planktothrix,
with the exception of species 5.2.1.1A. Note that the genomes differ
dramatically in size, the six complete genomes being 4.70-6.27 Mbp, the draft genomes ranging from 5.97 to 6.73 Mbp, and that a second 16S rRNA gene extracted from strain LEGE 06226 falls into the bacterial outgroup. In view of
these uncertainties, we await further genomic sequences for clarification.
Pseudanabaena
Genus 5.4.1.1 contains 17 strains for which both genome sequences and their extracted 16S rRNA genes are jointly available (Pseudanabaena spp. ABRG5-3, CCNP1313, Chao 1806 and Chao 1811 (completed chromosomes), O-153, BC1403, GIHE-NHR1, lw0831, PCC 7429, ULC066, UWO310, UWO311 and Roaring Creek (all draft) and the metagenomic sequences Pseudanabaena spp. M085S1SP2A07QC, Salubria-1 and ULC187, together with the draft genome of Limnothrix redekeii PCC 9416).
Species 5.4.1.1A (as defined by 16S rRNA sequence identity) is divided into 4 species by genome metrics. The first comprises L. redekeii strain PCC 9416 and the metagenomes Pseudanabaena sp. Salubria-1 and M085S1SP2A07QC, sharing 54.7-86.5% isDDH, 93.37-99.66% ANI and 99.34-99.89% CGS. Strains CCNP1313 and ULC066 each form a separate species, showing 25.2-29.0% isDDH, 78.62-84.64% ANI and 97.22-98.10% CGS with the first species. A further species contains strains PCC 7429, Roaring Creek and UWO310, showing 54.9-57.4% isDDH, 93.36-94.18% ANI and 99.48-99.51% CGS. Although strain UWO311 is also a member of this species on the basis of 16S rRNA sequence identity, genome metrics (44.6-46.9% isDDH, 91.04-91.14% ANI and 97.20-97.21% CGS) do not support con-specificity.
Strains ABRG5-3 and Chao 1811 (species 5.4.1.1G), both represented by complete genome sequences, are con-specific according to all methods: 99.33% 16S rRNA sequence identity, 69.2% isDDH (GGDC formula 3), 92.36% ANI, 93.95% Mash similarity and 99.5% CGS.
Strains BC1403 and lw0831 (5.4.1.1C), O-153 (species 5.4.1.1F), GIHE-NHR1 and Chao 1806 (5.4.1.1H), and the metagenome ULC187 (5.4.1.1D) clearly fall into 4 additional species of this genus, as shown in the 16S rRNA tree, with their low 16S rRNA sequence identities being confirmed by isDDH (23.7-40.7%), ANI (76.32-89.19%) and CGS (96.18-98.98%) values. There are therefore 12 genomospecies within genus 5.4.1.1 in contrast to the 6 species detected by 16S rRNA analysis. This is illustrated in the figure below.
Phylogenomic tree of the major Pseudanabaena cluster.
Complete genomes are coloured red, draft genomes violet,
metagenomes orange, type species green, outgroup grey.
The branch leading to the outgroup has been reduced by 90%.
Many other genomes are available for the Pseudanabaena
cluster (e.g. strain SR411, the metagenomes UBA1465, JADBBY000000000-JADBWY000000000);
these are all incomplete or lack a complete rrn operon. Note that other strains assigned to the genus "Pseudanabaena"
fall into many different generic clusters in addition to genus 5.4.1.1 (1.3.7, 5.4.1.2,
5.4.1.3, 5.4.1.5, 6.1.1, 6.1.4, 6.1.5, 6.1.8) on the basis of 16S rRNA
sequence identity; the single member of genus 1.3.7 is presumably a heterocyst-negative (het-) mutant. Strains named as Limnothrix are also dispersed in the ribosomal
tree: genera 5.1.7.9, 5.4.1.1 and 5.4.2.2, with most being found in species 5.4.2.1A;
the 6 strains represented by genomic
sequences and two metagenomes in the latter species are described below.
Sphaerospermopsis
The draft genomic sequences of two axenic strains of Sphaerospermopsis,
S. reniformis NIES-1949 and S. kisseleviana NIES-73, together
with that of the non-axenic Sphaerospermopsis sp. LEGE 00249 (species
1.2.3A, which also contains many other members of this genus, not represented
by genomic sequences), despite their different specific epithets, are shown to
be con-specific by values of 74.8-83.2% isDDH,
97.05-98.21% ANI, 99.60-99.68%
CGS and 98.70-99.87% identity of their extracted 16S rRNA sequences. Other
strains of Sphaerospermopsis, also represented by draft genomic sequences
but not known to be axenic, are Sphaerospermopsis sp. LEGE 08334
(species 1.2.3D) and S. aphanizomenoides BCCUSP55 (species 1.2.5C). Strain
LEGE 08334 shares 53.9-54.3% isDDH, 93.48-93.77% ANI and 99.12-99.20% CGS with
the 3 members of the first species, therefore being a member of that species
despite low 16S rRNA identity. Analysis of the circular chromosome of S. torques-reginae
ITEP-024 (species 1.2.3B) shows that this strain is
con-specific with the members of species 1.2.3A, sharing 52.7-52.8% isDDH,
93.25-93.52% ANI. The genome of strain BCCUSP55 is incomplete, containing only
77 of our 79 core marker genes (the other genomes representing this genus all
contain 79); this is also shown by the apparent genome size, measured for this
draft genome as the sum of contig lengths, of 4.31 Mbp versus 5.35-6.03
Mbp for the two other strains, and the CheckM estimate of only 90.7%
completeness. The genome metrics values obtained with this strain (28.4-29.1% isDDH,
83.1-83.3% ANI, 96.35-96.41% CGS with the members of species 1.2.3A; 28.3% isDDH,
83.25% ANI, 96.40% CGS with strain LEGE 08334) suggest that it is a member of
the same genus as the other members, but the 16S rRNA identities (95.86-96.24%)
place it into a different genus. The final member of the Sphaerospermopsis
cluster is a strain named as Anabaena aphanizomenoides LEGE 00250; the
genome metrics (isDDH 82.0%, ANIb 97.67%, CGS 99.70% with strain NIES-1949)
clearly place both strains into the same species. Although separated into 2
genera on the basis of 16S rRNA sequence identity, genome metrics suggest that
the strains of Sphaerospermopsis lie together in a single generic
cluster with several species.
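The apparent size of a draft genome used in this paragraph is simply the sum of its contig lengths, and incompleteness is indicated by the number of core markers recovered. The sketch below computes both from FASTA text; the marker count is supplied by the caller (marker detection is not reproduced), and CheckM's completeness estimate, which relies on its own lineage-specific marker sets, is a separate calculation not shown here. The toy input is hypothetical.

```python
def contig_lengths(fasta_text: str) -> list[int]:
    """Length of each record in a FASTA string (sequence lines only)."""
    lengths: list[int] = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def draft_summary(fasta_text: str, markers_found: int, markers_total: int = 79) -> dict:
    """Apparent genome size and core-marker recovery for a draft assembly."""
    lengths = contig_lengths(fasta_text)
    return {
        "contigs": len(lengths),
        "size_mbp": sum(lengths) / 1e6,  # apparent size = sum of contig lengths
        "markers": f"{markers_found}/{markers_total}",
        "marker_fraction": markers_found / markers_total,
    }


toy = ">contig_1\nACGTACGT\nACGT\n>contig_2\nGGCC\n"
print(draft_summary(toy, markers_found=77))
```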
Synechocystis spp.
Although genus 5.1.7.11 (“Synechocystis”) is divided
into 6 species on the basis of 16S rRNA sequence identity, genome sequences
have been obtained for only 16 isolates, most being members of species 5.1.7.11A;
five (IPPAS B-1465, PCC 6714, PCC 6803, PCC 7338 and PCC 7339) are completed. Complete
chromosomal sequences are additionally available for sub-strains of PCC 6803;
note that strains IPPAS B-1465 and FACHB-898 are also sub-strains of PCC
6803. The complete genomes share 98.85-100% 16S rRNA sequence identity. Synechocystis
sp. PCC 7509 is unusual, lying in genus 3.4.2 of the 16S rRNA tree and grouping
closely with Chlorogloea sp. CCALA 695 in the genome tree; this strain
is not further considered here. The 16S rRNA of strain PCC 6803 is identical to
that of strains FACHB-898, FACHB-908, FACHB-929 and IPPAS B-1465; the five
strains are confirmed as con-specific by high isDDH, ANI and CGS values (99.9-100%,
99.86-99.88% and 100%, respectively). Strain CACIAM 05 (closed genome) is also
a member of this species, showing 99.66% 16S rRNA identity, 88.6% isDDH
(formula 3), 96.56% ANI and 99.55% CGS with strain PCC 6803. However, the
apparent placement of strain FACHB-383 in this species (based on 99.39% 16S
rRNA sequence identity with strain PCC 6803) is not confirmed by the low values
of isDDH (32.1%) and ANI (86.22%); although isDDH should be considered as
non-discriminatory in this range, the value of 98.48% CGS supports this
specific separation. Strain FACHB-383 groups at the specific level with 3 other
strains (LEGE 00031, LEGE 00041, LEGE 06083), as shown by isDDH values of
77.7-78.2%, ANI 97.24-97.39% and CGS 99.58-99.62%. S. salina strain LEGE
06155 is separated from the above species by values of 40.4-41.0% isDDH,
89.90-90.19% ANI and 98.36-99.05% CGS, although the latter value falls into the
borderline zone. Strains LEGE 06099, PCC 7338 and PCC 7339 are con-specific, as
shown by values of 60.0-98.9% isDDH, 94.64-99.41% ANI and 99.30-99.96% CGS; the
higher values of the ranges of genome metrics are exhibited by strains PCC 7338
and PCC 7339. The 3 strains show only low values of all genome metrics with the species described
above. Strain PCC 6803 and the sub-strains GT-G, GT-I, GT-S, GT-T, LT-G, PCC-N and PCC-P show 100% ANI, 100% isDDH, 99.998-99.999% Mash similarity and 99.96-100% CGS, showing conservation of genome content in organisms that have been maintained in separate laboratories for up to 55 years. The con-specificity of Synechocystis sp. strains PCC 6714 and PCC
6803 (99.60% 16S rRNA sequence identity) is not confirmed by isDDH of their complete genome
sequences (47.5%; GGDC formula 3). This low isDDH value is supported by a low ANI value of 84.16%, Mash
similarity 87.44% and CGS 98.05%. All genome metrics, therefore, place strains
PCC 6714 and PCC 6803 into two distinct species, giving five species in total,
as shown in the figure below.
This figure was extracted from our genomic tree; strains represented by complete chromosomes are coloured red, and those represented by circular sequences not stated to be complete are shown in blue.
The separation of strains PCC 6714 and PCC 6803 is
surprising. Their chromosomes are similar in mol% G+C (47.73-47.81) and size (3.49-3.57
Mbp), with strain PCC 6803 having the larger genome of lower mol% G+C; they are
both axenic and have similar morphological and physiological characteristics
(Rippka et al., 1979). Kopf et al. (2014a) described extensive
differences between the complete chromosome of Synechocystis sp. PCC
6803 and the draft genome of Synechocystis sp. PCC 6714, later confirmed
with the complete chromosome of the latter strain (Kopf et al. 2014b).
The strains were shown to share 2838
protein-coding genes, with 845 unique genes in Synechocystis sp. PCC
6803 and 895 unique genes in Synechocystis sp. PCC 6714; the strains
further differed in the composition of their pools of transposable
elements and in the presence of a prophage in the genome of strain PCC 6714.
We can show rearrangements
by changes in position of many of our 79 core marker genes (see figure above),
although these alone should have no effect on isDDH values. Interestingly, most of the regions missing from the PCC 6714 chromosome (figure below, outer ring) are occupied by areas that diverge greatly in base composition (second black
ring) from most parts of the genome of strain PCC 6803.
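The observation that regions absent from PCC 6714 coincide with regions of atypical base composition can be explored with a sliding-window G+C profile. The sketch below reports windows whose G+C content departs from the genome-wide mean by at least a chosen margin; the window size, step and 5-percentage-point margin are hypothetical defaults, not the settings used to draw the figure, and the toy sequence is illustrative only.

```python
def gc_percent(seq: str) -> float:
    """mol% G+C over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("no unambiguous bases")
    return 100.0 * gc / acgt


def atypical_gc_windows(seq, window=10_000, step=5_000, min_deviation=5.0):
    """Return (genome G+C, [(start, end, window G+C), ...]) for atypical windows."""
    genome_gc = gc_percent(seq)
    flagged = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        try:
            gc = gc_percent(chunk)
        except ValueError:  # window made entirely of ambiguous bases
            continue
        if abs(gc - genome_gc) >= min_deviation:
            flagged.append((start, start + window, round(gc, 2)))
    return genome_gc, flagged


# Toy sequence: an AT-rich half followed by a GC-rich half.
genome_gc, flagged = atypical_gc_windows("AT" * 50 + "GC" * 50, window=100, step=100)
print(genome_gc, flagged)  # 50.0 [(0, 100, 0.0), (100, 200, 100.0)]
```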
The figure below compares the graphical output of fastANI
for pairwise comparisons (left: strain PCC 6803 with IPPAS B-1465, right: PCC
6803 with PCC 6714). The red lines indicate regions of identity between a pair
of genomes. Rearrangements and low similarity between strains PCC 6803 and PCC
6714 are evidenced by the slanted lines and their low density compared to the
comparison between two identical strains on the left. Unlike the situation in
the genus Microcystis described below, neither the core genomes nor the
accessory genomes of strains PCC 6714 and PCC 6803 show significant change in
ANI values; the repeat sequences (2764, 1008 and 1026 per genome for strains
PCC 6714, PCC 6803 and IPPAS B-1465, respectively) are too few to markedly
change the low ANI and isDDH values of the first two strains. Interestingly,
the number of paired repeats in the substrains of PCC 6803, maintained in
different laboratories, shows little change, ranging from 1008-1034.
B2: Most clusters are supported by all metrics
The ADA clade
Isolates known as Anabaena, Aphanizomenon,
Dolichospermum and Cuspidothrix lie in genus 1.1.1. Despite their
major morphological differences and generic names, they are resolved only as
species (1.1.1A to 1.1.1N) of a single genus; the 16S rRNA data are supported
in all cases by genome metrics. These values cannot be compared with the 16S
rRNA identities for some strains (AWQC131C, LEGE 00246, UHCC 0167, UHCC 0406) and metagenomes (AL09 39864, AL93, Clear-A1, CRKS33, I3-5-bin-073, LEO11-02 39865) that lack, or contain only partial copies of,
the gene; additionally, strains Dolichospermum spp. UHCC 0299 and UHCC
0352 have a large gap in the 16S rRNA sequence and have also been excluded from
16S rRNA identity calculations. All of these sequences, except that of the
metagenome I3-5-bin-073, are shown in the figure below, derived by analysis
with our 79 conserved marker genes. Österholm et al. (2020) showed a similar
genomic phylogeny, using the 31 marker genes of Wu & Eisen (2008), and
termed this cluster the ADA clade, derived from a larger tree of 94 genomes. This
definition of the ADA clade excluded the strains of Cuspidothrix. The
two trees are reasonably comparable, except that the clusters are more clearly
resolved with our 79 marker genes. The tree of Österholm et al. contains
in addition Aph. flos-aquae strain 2012/KM1/D3, excluded from our tree
because it is incomplete, containing only 76 of the 79 genes of our core marker set.
The species designations are taken from our 16S
rRNA tree; colours are as in our genomic trees. Sequences labeled with
"$" are not shown in the 16S rRNA tree; most lack (or contain short
fragments of) the gene. The 16S rRNA sequences extracted from the metagenomes Anabaena sp. MDT14b and Anabaena sp. WA113 fall into the
bacterial outgroup of the 16S rRNA phylogeny. The metagenomic sequence D. circinale Clear-D4 (incomplete, 75 markers) is not included in the genomic tree.
The tree of Österholm et al. lacks D. circinale strains ACBU02 (available from the JGI, not from the NCBI) and
ACFR02 (not in NCBI or JGI). Many sequences from the NCBI, not available at the
time of publication of Österholm et al., have been added here. Many of
these are taken from the thorough studies of Dreher et al. (2021a, b),
who resolved 10 ADA clades, including Cuspidothrix spp., in a tree of
216 genomes inferred with 91 unlisted marker genes. Our analysis shows the
assignment of the ADA genomes to 12 clades, including Cuspidothrix. The inclusion
of members of the latter genus is justified by genome metrics, which indicate
separation at the inter-specific level of all clades:
Genome metrics (%) observed with Cuspidothrix strain LEGE 03284 (333 contigs) and other members of the ADA clade
Cuspidothrix was created by Rajaniemi et
al. (2005) to accommodate strains then known as Aph. issatschenkoi. All
members are bloom-forming and are now known to produce neurotoxins. Since Cuspidothrix issatschenkoi strain CHARLIE-1 does not contain an rrn operon, we can
only perform a complete comparison of genome metrics with strain C. issatschenkoi
LEGE 03284.
Of the nine complete chromosomal sequences of
this genus, Dolichospermum sp. strain UHCC 0090 (1.1.1E) is clearly
separated at the specific level from Dolichospermum sp DET69 (represented
by a complete metagenomic sequence, species 1.1.1G), Anabaena sp. strain
WA102 and Aph. flos-aquae AFA KM1/D3 (1.1.1A) and Aph. flos-aquae
DEX188 (complete metagenome, species 1.1.1L) with 16S rRNA sequence identities of
97.77-98.52%, isDDH values of 37.8-45.7%
(GGDC formula 3), ANI 87.65-90.43%, Mash similarity 89.51-92.43% and CGS 97.61-98.39%.
For these DDH studies, it was necessary to artificially concatenate the two
circular chromosomes of strain UHCC 0090 (chromosome 1, CP003284, 4.33 Mbp and
chromosome 2, CP003285, 0.82 Mbp). Chromosome 1 contains 5 identical 16S rRNA
genes, of which only one is shown in the trees.
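The concatenation step described above can be illustrated with a minimal Python sketch. It assumes that both chromosomes are supplied as plain FASTA text; the helper names (`parse_fasta`, `concatenate_chromosomes`, `to_fasta`) and the header string are hypothetical, and joining the records end-to-end without a spacer is an assumption rather than a documented GGDC requirement. Because both chromosomes are circular, the join point is arbitrary; since formula 3 divides by total genome length, what matters is that the single pseudo-chromosome carries the combined length of the two replicons.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def concatenate_chromosomes(text, new_header="concatenated"):
    """Hypothetical helper: join all records end-to-end into one pseudo-chromosome."""
    records = parse_fasta(text)
    merged = "".join(seq for _, seq in records)
    return new_header, merged


def to_fasta(header, seq, width=60):
    """Format one record as FASTA with fixed-width sequence lines."""
    lines = [">" + header]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy stand-ins for chromosome 1 and chromosome 2.
    example = ">chr1\nACGTACGT\nAAAA\n>chr2\nGGCC\n"
    header, seq = concatenate_chromosomes(example, "UHCC_0090_concat")
    print(len(seq))  # 16
    print(to_fasta(header, seq))
```

The resulting single record would then be submitted in place of the two separate chromosome files.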
Within species 1.1.1E, the three strains
represented by complete chromosomes and three by complete metagenomic sequences
are confirmed as members of a single species, showing isDDH (65.5-74.2% with GGDC formula 3), ANI 95.82-96.93%, Mash
similarity of 96.87-97.72% and CGS 99.44-99.59%. The five 16S rRNA genes of the
complete chromosome of strain UHCC 0315 share only 98.79-99.19% identity, thus
exhibiting microheterogeneity. These sequences, of which we show 3 in the rRNA
tree, share 98.72-99.13% identity with the 16S rRNA of Dolichospermum
sp. strain UHCC 0090; isDDH values
of 65.8% (GGDC formula 3), ANI 96.42%, Mash similarity 97.31% and CGS 99.45% confirm
the con-specificity of the strains. Similar microheterogeneity may be seen in the
complete chromosome of D. flos-aquae CCAP 1403/13F (also in species
1.1.1E), which also contains 5 rrn operons; the 16S rRNA sequences share
98.99-99.73% identity; only two are included in the 16S rRNA tree. These share
98.65-98.92% identity with strain UHCC 0090. These values lie at, or just below, the lower limit of
our species cutoff; the isDDH (68.5%, GGDC formula 3), ANI (96.49%), Mash (97.38%)
and CGS (99.54%) values establish them as members of a single species.
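To make the 16S rRNA cutoffs quoted in the methods concrete, the following sketch classifies a single pairwise identity value. It assumes, as an interpretation rather than a rule stated by the authors, that values inside the quoted ranges (98.70-99.0% for species, 96.50-96.90% for genera) are treated as borderline; the function and label names are hypothetical. Applied to the values above, 99.13% and 98.72% (strain UHCC 0315 against UHCC 0090) fall above or just inside the species boundary, whereas 98.65% (CCAP 1403/13F) falls just below it, which is why the genome metrics decide con-specificity in such cases.

```python
# Cutoffs quoted in the methods section (percent 16S rRNA sequence identity).
SPECIES_LOW, SPECIES_HIGH = 98.70, 99.00
GENUS_LOW, GENUS_HIGH = 96.50, 96.90


def classify_16s(identity):
    """Map one pairwise 16S rRNA identity (%) to a rank-level verdict.

    Values inside a quoted cutoff range are labelled "borderline"; this
    treatment of the ranges is an assumption made for illustration.
    """
    if identity > SPECIES_HIGH:
        return "same species"
    if identity > SPECIES_LOW:
        return "same species (borderline)"
    if identity > GENUS_HIGH:
        return "same genus, different species"
    if identity > GENUS_LOW:
        return "same genus (borderline)"
    return "different genera"


if __name__ == "__main__":
    # Pairwise values quoted in the text for species 1.1.1E.
    for value in (99.13, 98.72, 98.65):
        print(value, classify_16s(value))
```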
Anabaena sp. strain WA102 (whose complete chromosome
contains five 16S rRNA genes sharing 99.9-100% identity, only one being shown
in the tree) is con-specific (species 1.1.1A) with D. heterosporum TAC447 (NIES-1647, also containing 5 rrn operons) and Aph. flos-aquae AFA
KM1/D3 (complete metagenome), as shown by values of 64.4-67.5% isDDH, 95.47-96.18% ANI,
96.50-97.12% Mash similarity and 99.51-99.53% CGS.
The organism Dolichospermum sp. DET69,
represented by a complete metagenome, lies in species 1.1.1G of the 16S rRNA
tree. Comparison with the other complete chromosomes shows that it is clearly separated
at the specific level from those in species 1.1.1E (40.4-41.7% isDDH with GGDC
formula 3, 88.31-88.62% ANI, 90.06-90.14% Mash similarity and 97.99-98.02%
CGS), 1.1.1K (32.6% isDDH with GGDC formula 3, 85.96% ANI, 87.54% Mash
similarity and 97.62% CGS) and 1.1.1C (39.6% isDDH with GGDC formula 3, 88.05%
ANI, 89.66% Mash similarity and 98.13% CGS).
With the complete genomic and complete metagenomic sequences excluded, 49
further genomes, all draft, remain in genus 1.1.1. Aph. flos-aquae
strain 2012/KM1/D3 (species 1.1.1A) has been further excluded from DDH
comparisons and the genome tree since the genome is incomplete. Eight species
(defined by 16S rRNA identity values) contain genome sequences, as described
below.
Within species 1.1.1E, 13 of the 14 remaining strains
(only 2 being shown in the 16S rRNA tree) share 98.62-99.79% 16S rRNA sequence
identity, more than 97.26% ANI and a minimum of 74.9% isDDH, confirming their
placement into a single species. Anabaena sp. strain UHCC 0187 shares 98.63-99.38%
16S rRNA sequence identity with the above, seemingly being a member of the same
species, but is confirmed as a different species by isDDH values of 40.3-40.5%,
ANI values of 89.20-89.38% and 98.67-98.69% CGS. Note that, although the metagenome Anabaena sp. MDT14b lies
in this species, the two 16S rRNA genes extracted from the genome sequence fall
into the bacterial outgroup. Ten further metagenomes also lie in this
species, but the 16S rRNA sequences of three are not available for comparison;
however, they are confirmed as con-specific by values of 82.4% isDDH and 97.86%
ANI. Strains UHCC 0299 and UHCC 0352 fall into this species in the 16S rRNA
tree if only the 5-prime end of the gene is considered, but have long branches
due to the gaps in their rRNA sequences; they have been omitted from this tree. Some of the metagenomes show microheterogeneity among their 16S rRNA genes, the most pronounced being OL03 (5 rrn operons, 98.7-99.5% 16S rRNA sequence identity).
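Intra-genomic microheterogeneity, as reported for OL03 and for the complete chromosomes of strains UHCC 0315 and CCAP 1403/13F, is expressed as the range of pairwise identities among the rrn copies of a single genome. A minimal sketch, assuming the copies have already been aligned to equal length with "-" as the gap character; how gaps are treated in the original analysis is not stated here, so skipping only columns that are gaps in both sequences is an assumption, as are the function names.

```python
from itertools import combinations


def percent_identity(a, b):
    """Identity (%) over aligned columns, skipping columns that are gaps in both."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_range(copies):
    """Minimum and maximum pairwise identity among aligned 16S rRNA copies."""
    values = [percent_identity(a, b) for a, b in combinations(copies, 2)]
    return min(values), max(values)


if __name__ == "__main__":
    # Toy aligned copies standing in for the rrn operons of one genome.
    copies = ["ACGTACGTAC", "ACGTACGTAA", "ACGAACGTAC"]
    print(identity_range(copies))  # (80.0, 90.0)
```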
The single complete and three draft metagenomic sequences in species
1.1.1G share 99.2-100% 16S rRNA sequence identity and show 69.9-97.3% isDDH and 96.18-99.7% ANI values. All
four strains are therefore members of a single species. The metagenome D.
circinale Clear-D4 fits into this species on the basis of genome metrics
(96.05% ANI, 70.9% isDDH with strain ACFR02), but is incomplete, containing
only 75 of our 79 core marker genes and no 16S rRNA gene, and has been omitted
from the rRNA and genome trees.
The eight strains of species 1.1.1A for which
16S rRNA data are available are confirmed as members of a single species (99.74-99.93%
16S rRNA sequence identity, 68.2-73.4% isDDH,
95.43-98.67% ANI and 99.46-99.93% CGS). The incomplete draft genome of Aph.
flos-aquae strain 2012/KM1/D3 falls into this cluster in the 16S rRNA tree,
but has been excluded from the genome tree and DDH calculation. One 16S rRNA
gene of the metagenome labeled Aph. flos-aquae WA102 incorrectly places that
genome into cluster 1.1.1A owing to extreme heterogeneity among its rRNA genes (see below); this metagenome has
been excluded from the DDH calculations for cluster 1.1.1A.
Species 1.1.1B, 1.1.1C and 1.1.1D in the 16S
rRNA tree contain the sequences of 3 draft genomes: Dolichospermum sp.
LEGE 00240, NIES-806 and UHCC 0259, all present in the genome tree. Values of
50.4-53.8% isDDH, 92.36-93.11% ANI and 98.93-99.04% CGS lie on the borderline
of inter-specific separation.
The draft metagenome labeled as WA102, but named
as Aph. flos-aquae, contains four 16S rRNA sequences, only two
being shown in the 16S rRNA tree; one (incomplete) is found in species 1.1.1A
(sharing 99.5% identity with the sequence from the complete chromosome of Anabaena
sp. strain WA102) and the other in 1.1.1L. The 2 sequences thus assign this
organism to different species. On the basis of isDDH, this draft metagenome fits into species 1.1.1L, sharing
82.3-82.6% isDDH and 97.44-97.81% ANI with Anabaena sp. UBA 12330 and
WA113, Aphanizomenon flos-aquae Clear-A1, CP01, DEX188, MDT13, MDT14a
and UKL13-PB (again, all metagenomic assemblies); it shows only 35.3% isDDH
and 87.51% ANI with the complete chromosome of Anabaena sp. strain WA102 in
species 1.1.1A. Strains MDT13 and MDT14a share 92.3% isDDH and 98.99% ANI. The (almost) full-length 16S
rRNA genes extracted from the metagenomic sequences Anabaena spp.
strains MDT14b (2 sequences) and WA113 (3 sequences) are not of cyanobacterial
origin, since they fall into the bacterial outgroup of the tree. The metagenome
Aph. flos-aquae Clear-A1, showing 98.00% ANI and 83.8% isDDH to Aphanizomenon
MDT14a, places the organism into species 1.1.1L but, lacking an rrn
operon, is not shown in the RNA tree.
The four D. circinale strains represented
by genomes in species 1.1.1I are con-specific, as shown by values of 70.9-97.3%
isDDH and 96.21-99.77% ANI. The single metagenome of this species, Anabaena
sp. CRKS33, gives isDDH of 58.8-60.6% and ANI values of 94.62-94.81% with the D.
circinale strains, confirming membership of the same species.
D. planctonicum strain NIES-80 lies in species 1.1.1H; although it cannot be compared with
strain UHCC 0167 on the basis of 16S rRNA identity, the two are members of a
single species, showing an isDDH
value of 67.9% and 95.99% ANI. Strain NIES-80 shares 84.5% isDDH, 98.20% ANI
and 99.81% CGS with D. flos-aquae strain LEGE 04289.
The positioning of the eight sequences of species 1.1.1H and
1.1.1I is debatable. Dreher et al. (2021a, b) combined them into a
single species (ADA-1), but mention that "A case might be made in the
future to split ADA-1 into subspecies". Genome metrics for the two species
are conflicting: isDDH and ANI values suggest that they be combined, but 16S
rRNA sequence identities and CGS values do not. We have maintained specific
separation of these sequences, pending further study.
Species 1.1.1M contains two draft genomic
sequences (Anabaena sp. strains UHCC 0204 and UHCC 0253) that show
99.46% 16S rRNA sequence identity, 84.0% isDDH
and 98.05% ANI, confirming their assignment to the same species.
Genus 1.1.1 also contains 6 strains of Dolichospermum sp., each cultured from a pool of 10 akinetes. Since these are non-clonal, they are not included here.
The two strains of Cuspidothrix issatschenkoi
of species 1.1.1N are con-specific (isDDH 69.6%, ANI 96.18%). Strain CHARLIE-1
lacks not only the rrn operon but also the nif and gvp
gene clusters, which are present in strain LEGE 03284.
The problem of inter-mixing of the generic names
appended to members of genus 1.1.1 has been reported in several publications
(see Driscoll et al., 2018, and references therein) and may arise, in
part, from the over-zealous transfer of strains with a variety of specific
epithets into Dolichospermum by various authors, and in part from the renaming
of strains originally identified as those species of Anabaena to Dolichospermum, which the NCBI does automatically. Dreher et al. (2021a) suggested
that this genus be called Anabaena, which appears to be incorrect, since
Rajaniemi et al. (2005) suggested that "If these subclusters would be
classified into a single widely conceived genus, its name must be Aphanizomenon
MORREN [1836] ex BORNET et FLAHAULT, Ann. Sci. Nat. Bot. 7, ser 7:241, 1888 (type-species=
Aphanizomenon flos-aquae [L.] RALFS ex BORNET et FLAHAULT ibid., 1888".
Acaryochloris
Within species 8.2.3A, Acaryochloris sp. CCMEE 5410 (draft
genome), Acaryochloris marina MBIC10699 and MBIC11017 (complete chromosomes) share 99.33-100% 16S rRNA sequence
identity, 53.2-95.4% isDDH, 92.83-99.35% ANI and 99.43-99.8% CGS. Acaryochloris thomasi
RCC1774 (draft), despite the generic name, has a unique pigment composition (lacking
Chl d, a pigment characteristic of the genus Acaryochloris) and is
clearly a member of a distinct genus, showing only 94.53-94.66% 16S rRNA
sequence identity and around 20.8% isDDH,
68% ANI and 88.2% CGS with the 2 members of genus 8.2.3. A. marina strain
S15, represented by a complete chromosomal sequence, is more problematic. It
clusters, although loosely, with strains MBIC10699, MBIC11017 and CCMEE 5410 in the genome
tree, and shows 48.0% isDDH (GGDC formula 3), 84.76% ANI and 98.19% CGS with
strain MBIC11017, and is thus a distinct species. However, the 16S rRNA sequence
identity is 99.00%. Note that three metagenomes (CRU 2 0, RU 4 1 and SU 5 25)
are incomplete (containing 74, 73 and 37, respectively, of our 79 core markers)
and have been excluded from the genome tree and genome metric calculations.
Anabaena
Two strains (CA=ATCC 33047 and 4-3) named as Anabaena
sp. and found in the same environment (estuary, Port Aransas, Texas) lie in the
monospecific genus 1.9.2; their draft genomes share 99.31% ANI, 94.4% isDDH, 99.98% CGS and 99.93% 16S rRNA sequence identity and are therefore members of a single species.
Anabaenopsis
In cluster 1.3.2B, Anabaenopsis tanganyikae CS531 and A. arnoldii CS1033 (both draft) are identical on the basis of 16S rRNA sequence comparison. A. elenkinii strain CCIBt3563 (complete genome) shares 99.06-100% 16S rRNA sequence identity with strains CS531 and CS1033. The three strains share 44.3-97.3% isDDH, 99.20-99.60% ANI and 99.20-99.98% CGS. They are therefore con-specific, despite their three different specific epithets. Anabaenopsis sp. FSS-46 does not contain a 16S rRNA gene, but shows 44.3% isDDH, 98.12-98.15% ANI and 99.83% CGS to the other members; the isDDH value is at the upper limit of specific separation, whereas the ANI and CGS values support con-specificity. Further genome sequences are required in order to resolve this conflict. The complete genome of A. circularis NIES-21 (6.57 Mbp, 40.3% G+C) falls elsewhere in the tree (family 1.12), showing only 94.76% 16S rRNA sequence identity, 21.3% isDDH, 75.10% ANI and 94.04% CGS to, for example, strain CS1033. Unfortunately, genome sequences of Cyanospira (species 1.3.2A of the 16S rRNA tree) are not available.
Aphanocapsa
The mono-specific genus 7.1.4 contains the metagenomic
sequences Aphanocapsa feldmannii 277cI and 277cV. These are identical on
the basis of 16S rRNA sequence identity and show 90.9% isDDH, 98.27% ANI. However, the genome sequence of 277cI is
incomplete, containing only 70 of our 79 core marker genes; this is reflected
by a large difference in apparent genome size (impossible to determine
precisely with draft genomes), that of 277cV being 2.55 Mbp versus only 1.92
Mbp for 277cI. The latter sequence has been excluded from the genome tree. Two
uncultured organisms, for which genomic sequences are not available, also fall
into this genus on the basis of 16S rRNA sequence identity.
Aphanothece
The genus 5.1.4.6 contains the draft genome sequences of 2
organisms, Aphanothece sacrum strains FPU1 and FPU3. These show 100% 16S
rRNA sequence identity, 99.99% ANI, 100% isDDH and 99.90% CGS, and are therefore identical.
Arthrospira
As described on the main page, members of the genus Arthrospira
(5.2.1.3) have been renamed to Limnospira (with L. fusiformis as
the type species) since some uncultured organisms named as the type species of Arthrospira,
A. jenneri, fall elsewhere in the tree (genus 5.3.3.1). We do not accept
this change, which is based only on environmental samples. Of the many cyanobacterial strains
assigned to the genus, only 15 extracted 16S rRNA sequences, plus three extracted
from the complete chromosomes of L. fusiformis strain SAG 85.79 and A. platensis strain C1 (PCC 9438) and the
draft genome of L. fusiformis strain KN, are shown in the 16S rRNA
trees. A second 16S rRNA sequence extracted from the metagenome Limnospira
sp. RM-2019 contains an implausibly high proportion (2.98%) of mismatches in paired regions
and is not shown. The strains of genus 5.2.1.3 are all closely related, sharing
99.13-100% 16S rRNA sequence identity, suggesting that all members of Arthrospira
can be assigned to a single species. This conclusion is supported by our genome
trees based on 79 conserved marker genes, and by isDDH analysis of the available genomes. The genomes show 89.8-99.2% isDDH, despite the diverse
geographical origins of the strains (Africa, China, Mexico, Switzerland). ANI
values of 93.37-99.10%, Mash similarities of 95.58-99.52% and CGS values of 98.99-99.79%
confirm that all Arthrospira strains (carrying 6 specific epithets) can
be assigned to a single species, for which we suggest the name A. platensis.
A similar conclusion was reached by Xu et al. (2016) using a more restricted dataset, and these authors showed "unprecedented extensive chromosomal
rearrangements" among the genomes. The latter are evident in the figure
below (reciprocal comparisons of strains PCC 8005 and YZ).
These rearrangements are also evident in our CGS analysis: the positions of
the 79 conserved marker genes on the same genomes are shown in the following
figure. Note that strain O9.13F has been excluded from DDH calculation because
the genome is only 84.1% complete as estimated by CheckM and contains only 72
of our 79 conserved marker genes. A. maxima strain CS-328, 96.6%
complete and containing only 73 of the 79 marker genes, has also been excluded.
Candidatus Atelocyanobacterium
The endosymbiont Candidatus Atelocyanobacterium thalassa
ALOHA (UCYN-A1), submitted as a complete genome (CP001842), contains only 76 of
79 core marker genes and cannot be shown in the genome trees. The two complete rrn
operons fall into species 5.1.4.2A; Candidatus Atelocyanobacterium thalassa
SIO64986 (=UCYN-A2, JPSP00000000), also containing 2 complete rrn
operons, lies in species 5.1.4.2B. The genomes show 26.1%
isDDH, 84.1% ANI. The isDDH value (GGDC formula 2) is ambiguous in this region,
but the ANI value confirms specific separation of the two organisms represented
by these metagenomes. The complete genome of Cyanobacterium endosymbiont of Braarudosphaera
bigelowii CPSB-1 (AP024987), containing 77 core marker genes, lies in
species 5.1.4.2B and is con-specific with the organism represented by the
metagenome SIO64986 (99.93% 16S rRNA sequence
identity, 94.2% isDDH, 99.27% ANI, 99.80% CGS). Genus 5.1.4.3 contains 3 complete
genome sequences of comparable strains: endosymbiont
of Epithemia turgida EtSB (AP012549), endosymbiont of Epithemia clementina EcSB (CP123991) and endosymbiont of Rhopalodia gibberula
RgSB (AP018341). All contain 79 core marker genes. These are members of different
species of the genus, sharing 98.65% 16S rRNA sequence identity, 54.9-61.0% isDDH
(GGDC formula 3), 87.26-88.02% ANI, 89.43-90.25% Mash similarity and 95.82-96.93% CGS. Generic
separation of the two groups of symbionts (genera 5.1.4.2 and 5.1.4.3) is confirmed by low values of all
metrics (95.26% 16S rRNA sequence identity, 13.1% isDDH, 70.04-70.20% ANI,
82.60-83.37% CGS).
Baaleninema
Baaleninema simplex strains FC II and PCC 7105 (species 5.2.4.7B) are identical
on the basis of their 16S rRNA sequence identity (100%); their con-specificity
is confirmed by values of 74.3% isDDH
and 96.57% ANI of their draft genomes; the sequence of strain PCC 7105 contains
only 75 of the 79 marker genes and has been excluded from the genome tree.
Brasilonema
The genomes of Brasilonema sennae CENA114 and B. octagenarum UFV-E1 are identical, with 100% isDDH, 99.96% ANI and 100% CGS, and their extracted 16S rRNA genes are identical. Their genome sizes are 7.78-7.82
Mbp. These strains, despite their different specific names, are therefore
members of a single species (1.19.4A). They show 93.1-93.2% isDDH and
99.11-99.14% ANI (with about 93% coverage) with the genome of B.
octagenarum strain UFV-OR1, which is therefore a member of the same
species. The 16S rRNA gene extracted from the latter genome, however, shares
only 97.78% identity with those of strains CENA114 and UFV-E1; this gene sequence
shows many mispairings in helices 36-39, indicative of the presence of foreign
DNA, and does not correspond to the gene recovered by 16S rRNA-specific PCR,
although the PCR-derived sequence also falls into the same species. Strain CENA114 shows
40.7% isDDH, 97.77% CGS and 98.52% 16S rRNA sequence identity with Scytonema
tolypothrichoides VB-61278, showing them to be members of different species
of a single genus (species 1.19.4A and 1.19.4C), despite their different
appended generic names; the ANI value, calculated for the latter strain after the
removal of an excessive number of ambiguous sites, is 88.73%, confirming their
assignment to different species.
Calothrix
In addition to those described above, strains named as Calothrix spp. are widely dispersed
throughout order 1, with the majority falling into family 1.15. We show only 62
members in the 16S rRNA tree. Genera 1.15.7, 1.15.8, 1.15.14 and 1.15.15 are each divided into several species on the basis of their 16S rRNA sequence identities. Unfortunately, genomic sequences are available only for 7 species
of Calothrix and 1 of Rivularia.
The complete chromosome of Calothrix sp. NIES-4101,
containing 4 rrn operons of which the 16S rRNA of only one is shown,
forms a singleton genus in family 1.15. Species 1.15.7A contains 5 identical
copies of the rrn operon of the complete chromosome of Calothrix sp. PCC
7716, of which the 16S rRNA sequences of only 2 are shown. Species 1.15.7B
contains the single draft genomic sequence of Calothrix sp. PCC 7103. Species
1.15.7C, containing the chromosomal sequences of two strains (Calothrix
sp. NIES-4071, complete, and NIES-4105), is one of two clusters for which
intra-specific comparison is possible. Both genomes contain 5 copies of the rrn
operon; only 2 16S rRNA sequences are shown for each, sharing 99.87% sequence
identity within each strain and 99.87-100% between strains. Species 1.15.7D
contains 2 draft genome sequences of C. desertica PCC 7102. Genus 1.15.13 is represented by the single complete chromosome of Fulbrightiella
sp. PCC 6303, which contains 4 rrn operons; we show two 16S rRNA
sequences. Species 1.15.14C contains the complete chromosome sequence of Calothrix
sp. NIES-3974 which possesses 3 rrn operons, of which only one 16S rRNA
sequence is shown.
The complete chromosomes of Calothrix
spp. NIES-4071 and NIES-4101, Fulbrightiella sp. PCC 6303 and Rivularia sp. PCC 7116 vary in
size from 6.77 to 11.06 Mbp. Their extracted 16S rRNA genes show 90.77-92.79%
sequence identity. The isDDH values of 13.1-14.5% and ANI values of 68.67-73.7%
confirm the generic separation of these strains.
Within genus 1.15.7, strains of the 4 species have genomes
of size 11.06-11.58 Mbp (measured as the sum of contig lengths for the draft
genomes). They show 97.85-98.72% 16S rRNA sequence identity, 82.31-91.01% ANI
and 97.60-99.05% CGS, confirming their specific separation, although the
highest values for both ANI and CGS fall into the borderline region for species
separation. The isDDH values of 29.2-29.5% (formula 2) are not discriminatory in
this range. Within species 1.15.7C, strains NIES-4071 and NIES-4105 are identical
in genome size (11.06 Mbp); their chromosomes are identical (100% isDDH, ANI
and CGS). The two genomes representing species 1.15.15D, whose metrics do not
agree with the 16S rRNA identity value, are described above.
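The genome sizes quoted for draft genomes are, as stated above, the sum of contig lengths. A minimal sketch of that calculation from FASTA text follows; the function name is hypothetical, and the result is only an estimate for incomplete assemblies, since missing regions lower it and duplicated or contaminating contigs inflate it.

```python
def genome_size_mbp(fasta_text):
    """Sum the lengths of all sequence lines in FASTA text and return Mbp.

    Every residue character is counted, including any N runs inside contigs.
    """
    total = 0
    for line in fasta_text.splitlines():
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total / 1_000_000


if __name__ == "__main__":
    # Two toy contigs of 0.6 and 0.5 Mbp.
    draft = ">contig1\n" + "A" * 600_000 + "\n>contig2\n" + "C" * 500_000 + "\n"
    print(genome_size_mbp(draft))
```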
Specific separation of strains PCC 7102 and PCC 7103 was
shown in the wet-lab DDH results of Lachance (1981), whose data demonstrate
(for the isolates available at that time) the division of this “genus” into
multiple clusters: strains PCC 7103 and PCC 7713 (species 1.15.7B), showing 80%
DDH with a ΔTm(e) of 1°C; strain PCC 7102 (species 1.15.7D), with 65% DDH and a ΔTm(e) of 5°C to the above; strains PCC 7709, PCC 7715 and PCC 7716 (85-86% DDH, ΔTm(e) 5°C, species 1.15.7A); and strains PCC 7111, PCC 7116, PCC 7204, PCC 7426, PCC 7711, PCC 7810 and PCC 7815, with lower DDH values of 40-60% and higher (13-15°C) ΔTm(e) (Rivularia species 1.15.15A, 1.15.15D and 1.15.15F).
Low DDH values of only 10-27% were
observed for strains PCC 6303 (genus 1.15.13), PCC 7507 (species 1.5.1B) and
PCC 7714 (species 1.15.9A) in crosses with representatives of the above
clusters. Strains PCC 7102 and PCC 7103 showed 52% DDH (ΔTm(e)
5°C), close to the isDDH value.
Candidatus Synechococcus spongiarum
The 16S rRNA genes
extracted from four metagenomes of Candidatus Synechococcus spongiarum, Cyanobacteria bacterium bin9 and
one named as Synechococcus sp. (mono-specific genus 7.1.5) share a
minimum of 99.12% identity. The genomes vary in size (estimated as the combined
lengths of their contigs) from 1.45 Mbp to 2.27 Mbp, resembling in this respect many Prochlorococcus strains and the Synechococcus OMF sister group described above. Of the genomes common to
both 16S rRNA and genomic trees, we can compare only 3: the Candidatus Synechococcus
spongiarum metagenomes 15L and 142, and Cyanobacteria bacterium bin9. The genome 15L is identical in
16S rRNA sequence to bin9; the two genomes show 91.0% isDDH, 99.24% ANI and 99.52% CGS,
confirming their con-specificity. Although 15L is suggested by 16S rRNA identity
of 99.33% to be con-specific with 142, the ANI value of 86.35% and 95.37% CGS indicate that
the organisms represented by these metagenomes should be assigned to different
species. The value of 32.2% obtained with isDDH formula 2 is ambiguous in this
region. Although both metagenomes lack 2 of our 79 core marker genes, the
missing genes are different: 15L lacks gmk and rimM, whereas 142
lacks purM and smpB. This suggests that the two genomes lack
different segments, which would be expected to decrease both the ANI and CGS values, in contrast to the Candidatus
Atelocyanobacterium thalassa metagenomes of genus 5.1.4.2 which both
lack the same pair of marker genes (purM and rbcS). As described
in the discussion of the Synechococcus OMF clade above, and shown in the accompanying figure, Salazar et al. (2020) suggested that Synechococcus spongiarum be renamed Synechospongium.
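The comparison of missing marker genes made above amounts to simple set operations. The sketch below assumes that the names of the absent core markers are already known for each genome (for example from the marker-gene search used to build the genome tree); the function name and output keys are hypothetical. An empty intersection, as for metagenomes 15L and 142, means that each genome lacks segments present in the other; a full overlap, as for the two Candidatus Atelocyanobacterium thalassa metagenomes, means that both lack the same markers.

```python
CORE_MARKER_COUNT = 79  # size of the core marker set used in this study


def compare_missing(missing_a, missing_b):
    """Summarise which core markers are absent from one or both genomes."""
    a, b = set(missing_a), set(missing_b)
    return {
        "shared": sorted(a & b),   # absent from both genomes
        "only_a": sorted(a - b),   # absent from genome A only
        "only_b": sorted(b - a),   # absent from genome B only
        "present_a": CORE_MARKER_COUNT - len(a),
        "present_b": CORE_MARKER_COUNT - len(b),
    }


if __name__ == "__main__":
    # Candidatus Synechococcus spongiarum 15L versus 142 (markers named in the text).
    print(compare_missing(["gmk", "rimM"], ["purM", "smpB"]))
    # Both Atelocyanobacterium metagenomes lack the same pair.
    print(compare_missing(["purM", "rbcS"], ["purM", "rbcS"])["shared"])
```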
Chamaesiphon
Genomic sequences are available for Chamaesiphon minutus
strain PCC 6605 (complete chromosome, species 2.1.1A) and C. polymorphus
CCALA 037 (draft, species 2.1.1D). Their extracted 16S rRNA sequences share
98.58% identity and the entire genomes show 87.45% ANI (but with only ~54%
coverage), 98.41% CGS and 36.8% isDDH,
confirming their assignment to two species. Note that the genome metrics values
may be slightly inaccurate, since the genome of strain CCALA 037 is incomplete,
containing only 77 of our 79 core marker genes.
Chroococcidiopsis
Within genus 3.1.14, Chroococcidiopsis spp. CCNUC1, PCC 7203 (both having completed chromosomal sequences), FACHB-1243, PCC 8201 and SAG
39.79 (PCC 7433) fall into species 3.1.14A. They share 99.33-100% 16S rRNA sequence
identity, 93.53-98.69% ANI, 58.3-90.1% isDDH and 99.49-99.92% CGS values. The two complete chromosomes share 98.69% ANI, 97.4% isDDH (GGDC formula 3), 99.93% CGS and 99.18% Mash similarity. The 16S rRNA gene extracted from a sixth Chroococcidiopsis sp. genome, PCC 7434, lies in a distinct species (3.1.14B), sharing only 98.12-98.39% 16S rRNA sequence identity with the members of species 3.1.14A; the genome metrics are 90.47-90.73% ANI and 43.7% isDDH.
Species 3.1.14D contains two identical draft genomic sequences of "Scytonema
millei" VB511283 (JTJC00000000.3 and QVFW00000000); their extracted 16S rRNA genes show 97.79-98.32% sequence identity to the member of species 3.1.14B;
the genome metrics are 39.6% isDDH and 89.7% ANI. JTJC00000000.3 is the third version of the genome of this organism to be submitted (by the same authors) to NCBI; the first two (JTJC00000000.1 and JTJC00000000.2), showing 56.5% isDDH,
seem to be chimeric (containing cyanobacterial and bacterial segments),
although a 16S rRNA gene (truncated to 1086 nt at the end of a contig) extracted
from the second groups with other Scytonema sp. strains in genus 1.19.11.
As is evident from the similarity values given above, the latest versions appear to be genomes of a Chroococcidiopsis sp.,
not of a heterocystous organism. This is supported by our NCBI BLAST searches with the extracted
marker genes, which always return Chroococcidiopsis sp. PCC 7203 at
identities greater than 90%, by high ANI values with other members of genus 3.1.14,
and by the placement of this genome by CheckM and pplacer as a close relative of
strain PCC 7203. The genome shares 91.3% isDDH
with the first version but only 21.7% with the second. A fourth species of
genus 3.1.14 contains the isolate named as unidentified cyanobacterium strain
TDX16; the high degree of contamination of the genome sequence makes DDH
studies pointless, and it has been excluded from the genome tree. The Chroococcidiopsis spp. of genus 3.1.14 are distinct from Chroococcidiopsis sp. strain PCC
6712 (currently [Jung et al., 2021] incorrectly named as Hyella disjuncta, genus 5.1.5.8), with which they share < 90% 16S rRNA sequence identity. Further members named as Chroococcidiopsis sp. lie in genera 3.1.17 and 3.1.18; only one (strain CCMEE 029, species 3.1.18B) is represented by a genomic sequence. This organism is shown only in the ribosomal and complete genome trees.
Coleofasciculus
The 16S rRNA sequences extracted from the draft genomes of two strains of Coleofasciculus chthonoplastes (LEGE 07081, LEGE 07092) fall into genus 3.1.25, and that of strain PCC 7420 into species 3.1.4A. One sequence of strain LEGE 07081 falls into the bacterial outgroup. The genomes of the two strains of genus 3.1.25 are identical, showing 99.9% isDDH, 99.96% ANI and 99.99% CGS, and share only 95.03% 16S rRNA sequence identity, 22.2% isDDH, 73.59-73.60% ANI and 92.99-93.00% CGS with strain PCC 7420, which lies in a different genus. The 16S rRNA sequences from the draft genomes of strains FACHB-64 and FACHB-1120 lie in species 3.3.2A and share 99.06% identity; the genomes show 47.1% isDDH, 91.74% ANI and 99.23% CGS, suggesting that these strains are borderline members of a single species. The sequence (obtained by 16S rRNA-specific PCR) of the type strain (SAG 2209) is found in species 3.1.4A.
Crocosphaera
Species 5.1.4.1B contains six strains (Crocosphaera watsonii
WH 0003, WH 0005, WH 0401, WH 0402, WH 8501 and WH 8502) that share 99.64-100%
16S rRNA sequence identity. These strains show 63.3-93.7% isDDH, 98.29-99.58%
ANI and 99.78-100% CGS; all genome metrics therefore confirm their
con-specificity. This is unlike the situation in the Crocosphaera
members of species 5.1.4.1A, described above.
Cyanobacterium
Separation of Cyanobacterium aponinum (represented by
strain PCC 10605) and C. stanieri (strain PCC 7202) at the generic
rather than specific level is confirmed by 16S rRNA sequence identity (93.40%),
ANI (72.85%), isDDH (13.9%,
GGDC formula 3), Mash similarity (75.70%) and CGS (91.39%) with their completed
chromosomal sequences; these strains fall into genera 5.1.7.4 (species 5.1.7.4A)
and 5.1.7.6 (species 5.1.7.6A) in the tree. The four strains of species
5.1.7.4A, PCC 10605 (complete chromosome), 0216 (draft), FACHB-4101 (draft) and
IPPAS B-1201 (draft), are almost identical (99.80-100% 16S rRNA sequence
identity, 80.4-86.3% isDDH,
98.15-98.43% ANI, 99.80-99.89% CGS). The genome of strain 0216 may be chimeric,
since both the 16S and 23S rRNA genes of one of its two
operons fall into the bacterial outgroup. Within genus 5.1.7.6, C. stanieri
strain PCC 7202 (species 5.1.7.6A) is separated at the specific level from Cyanobacterium
sp. strains IPPAS B-1200 and HL69 (species 5.1.7.6B), sharing 97.91% 16S rRNA sequence identity, 83.01-83.11%
ANI, 27.1-27.2% isDDH and 97.41-97.42% CGS. The latter two strains show 99.46% 16S rRNA sequence identity, 95.81% ANI, 66.2% isDDH and 99.53% CGS. The
complete chromosomes of the strains of the two genera show a marked difference
in size (4.11 Mbp for C. aponinum PCC 10605, 3.16 Mbp for the two
members of genus 5.1.7.6) and mol% GC (35.0 and 37.8-38.7, respectively). The
draft genome of C. stanieri strain LEGE 03274 is seen in species
5.1.7.6A, sharing 98.86% 16S rRNA sequence identity with strain PCC 7202. However,
genome metrics suggest that these strains be separated into 2 species (84.40%
ANI, 97.83% CGS; the isDDH value of 30.6% is not discriminatory at this level).
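Cases such as strains LEGE 03274 and PCC 7202, where 16S rRNA identity and the genome metrics point in different directions, can be flagged automatically. The sketch below compares a 16S verdict, using the lower species limit of 98.70% from the methods, with an ANI verdict. The 95% ANI species boundary is an assumption (a commonly used value in the literature, not one stated in this section); isDDH is deliberately left out because, as noted above, formula 2 values are not discriminatory in this range; the function name is hypothetical.

```python
SPECIES_16S = 98.70  # lower limit of the intra-specific 16S range (methods)
ANI_SPECIES = 95.0   # assumed ANI species boundary; not taken from this text


def _label(same):
    return "same species" if same else "different species"


def flag_conflict(identity_16s, ani):
    """Compare the species-level verdicts of 16S rRNA identity and ANI."""
    same_by_16s = identity_16s > SPECIES_16S
    same_by_ani = ani >= ANI_SPECIES
    if same_by_16s == same_by_ani:
        return "consistent: " + _label(same_by_16s)
    return "conflict: 16S says {}, ANI says {}".format(
        _label(same_by_16s), _label(same_by_ani)
    )


if __name__ == "__main__":
    # C. stanieri LEGE 03274 vs PCC 7202 (values from the text above).
    print(flag_conflict(98.86, 84.40))
    # Cyanobacterium sp. IPPAS B-1200 vs HL69.
    print(flag_conflict(99.46, 95.81))
```

A flagged pair is not resolved by the function; it only marks cases that, like this one, need the full set of genome metrics.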
Cylindrospermum
Cylindrospermum spp. strains PCC 7417 (represented by a complete chromosome sequence) and NIES-4074 ("nearly complete") share only 97.64% 16S rRNA sequence identity;
the isDDH value of 29.0% (GGDC
formula 3), ANI of 83.41% and CGS of 97.17% confirm their assignment to
different species (1.7.1A and 1.7.1F). The 16S rRNA gene extracted from the
draft genome of Cylindrospermum sp. FACHB-282 shows 99.06% identity with
strain PCC 7417, but the genomes share only 38.8% isDDH, 88.63% ANI and 98.50% CGS, not
supporting con-specificity; this genome shows only 97.11% 16S rRNA sequence
identity, 28.8% isDDH, 83.22% ANI and 97.09% CGS with NIES-4074. The Cylindrospermum spp. strains therefore represent 3 species on the basis of genome metrics.
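Turning pairwise genome metrics into a species count, as done above for the three Cylindrospermum strains, can be sketched as single-linkage grouping: strains are joined whenever a pairwise value reaches a threshold. The 95% ANI threshold, the use of ANI alone and the function name are assumptions for illustration; the text weighs several metrics together. Single linkage can chain strains that are not directly above the threshold with each other, so borderline groups still need inspection.

```python
def group_by_ani(strains, pairwise_ani, threshold=95.0):
    """Single-linkage grouping of strains whose pairwise ANI (%) reaches threshold."""
    parent = {s: s for s in strains}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), ani in pairwise_ani.items():
        if ani >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    strains = ["PCC 7417", "NIES-4074", "FACHB-282"]
    ani = {
        ("PCC 7417", "NIES-4074"): 83.41,
        ("FACHB-282", "PCC 7417"): 88.63,
        ("FACHB-282", "NIES-4074"): 83.22,
    }
    print(group_by_ani(strains, ani))  # three singleton groups
```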
Desertifilum
Four strains of Desertifilum sp. (FACHB-866,
FACHB-868, FACHB-1129 and IPPAS B-1220) lie in cluster 4.3.1 and are identical
in 16S rRNA sequence. They show 98.8-99.9% isDDH, 100% CGS and 99.94-99.99%
ANI, confirming their con-specificity. For comparison, they share only 92.44% 16S
rRNA sequence identity, 68.96% ANI, 88.50% CGS and 19.7% isDDH with their
nearest relative, the metagenome Hormoscilla sp. GUM202, in genus 4.4.1.
Generic separation is therefore confirmed by all metrics.
Desikacharya
Genus 1.12.3 ("Desikacharya") contains 3
species, of which two are represented by genomic sequences: D. piscinale
strain CENA21 (complete chromosome), Nostoc
spongiaeforme FACHB-130 and Nostoc sp. MBR 210
(draft genomes), species 1.12.3C, and Nostoc sp. LEGE 06077
(draft), species 1.12.3A. Strain FACHB-130 is not shown in the genome tree, and is not further considered here. The species are separated by a maximum value of 98.57% 16S rRNA sequence identity. Strain CENA21 is con-specific with MBR 210 (93.08% ANI, 52.9% isDDH, 99.01% CGS). Specific separation of strain LEGE 06077 from CENA21 is confirmed by values of 85.48% ANI, 31.6% isDDH, 98.01% CGS.
Fischerella
Among the strains named as Fischerella sp., F. ambigua, F. major, F. muscicola, F. thermalis, Mastigocladus
laminosus, Mastigocladus sp., Hapalosiphon sp., H. welwitschii, Westiella intricata and Westiellopsis
prolifica (genus 1.15.1), genome sequences are available for one member of species 1.15.1A,
1 of species 1.15.4E, 5 of species 1.15.1G, 1 of species 1.15.1I, 5 of species 1.15.1J, 1 of species 1.15.1K, 22 of species 1.15.4A, 1 of species 1.15.4B, 3 of species 1.15.4C, 2 of species 1.15.4D and 1 of cluster 1.15.5. Strains within a single species show isDDH values of 76.3-99.9%, whereas members of separate species have isDDH values of
41.0-46.9%. The metagenome Hapalosiphon sp. JJU2 is not shown in the
rRNA phylogeny; the organism is con-specific with Fischerella sp. PCC 9339 (species 1.15.1G), as shown by values of 69.8% isDDH, 95.37% ANI and 99.34% CGS. Within species 1.15.1G, the sequences of 5 strains assigned to three genera (Fischerella, Mastigocladus and Westiellopsis) share 99.71-100% 16S rRNA sequence identity, their con-specificity being confirmed by isDDH values of 81.0-85.3%, 96.74-98.22% ANI and 99.57-99.98% CGS, illustrating the current state of nomenclatural
confusion. Note that, of the 2 rrn operons recovered from the genomic
sequence of Westiella intricata strain UH HT-29-1, one falls into
species 1.15.1J together with a sequence obtained by 16S rDNA-specific PCR; the
other (from rrnA) lies in family 1.13. The other members of species
1.15.1J, named as Fischerella sp., Hapalosiphon welwitschii, Hapalosiphon sp. and Westiella intricata, show 16S rRNA sequence identities of 99.79-100%; the
isDDH (76.3-77.2%), ANI (96.59-100%) and CGS (99.73-99.86%) results are in perfect agreement with the 16S rRNA sequence identities. In summary, isDDH between all available genome sequences confirms the
con-specificity or division of the strains at the specific and generic levels defined by 16S rRNA sequence identity. The following strains are not shown in the RNA phylogeny, because they lack, or contain only short fragments of, rRNA genes: F. thermalis CCMEE 5196, CCMEE 5198, CCMEE 5273, CCMEE 5318, CCMEE 5328, CCMEE 5330 and Fischerella sp. L.E.CL.63. The members of species 1.15.4A, most of which are named as F. thermalis, all contain a characteristic insert in the paired stem-loop structure of helix 6.
In wet-lab DDH studies using thermal elution from hydroxyapatite, Lachance (1981)
previously demonstrated the division of the Fischerella
spp. strains of the PCC into two species clusters: strains PCC 7521, PCC 7522 and
PCC 7523, with 87-91% DDH and 0-2 °C ΔTm(e) (here species 1.15.4A), and strains PCC 7115, PCC 7414, PCC 7520 and PCC 7603, with 98-100% DDH and no change in ΔTm(e) (species 1.15.4D). However, Fischerella muscicola
strain PCC 73103 showed 99.0% DDH, with 0 °C ΔTm(e), with members of our species 1.15.4D, whereas this strain here falls into species 1.15.1A. The isDDH
result (32.7%, not discriminatory in this range) and the values of 85.53% ANI and 98.24% CGS
confirm the assignment of strains PCC 73103 and PCC 7414 to different species. We
conclude that an error involving strain provision or mislabeling occurred. Comparison
of wet-lab and isDDH values for other strains is not possible, since no other strains were examined by both methods.
Geminocystis
Geminocystis spp. strains NIES-3708, NIES-3709 and PCC 6308 (genus 5.1.7.1) are assigned to three species on the basis of their 16S rRNA sequence identities of
97.64-98.04%. This is confirmed by low values of isDDH (25.8-27.4%), ANI (79.75-82.40%) and CGS (95.03-96.44%).
Gloeobacter
The phylogenetic relationships inferred from genome sequences of Gloeobacter spp. and related genera in the Gloeobacteriales are shown in the figure below.
Phylogenetic tree inferred with the available genomes of the Gloeobacteriales.
Complete genomes are coloured red; type sp. black; metagenomes orange; outgroup grey. The branch length of the outgroup has been reduced by 90%.
At the base of the 16S rRNA and genome trees, the complete
genomes of Gloeobacter violaceus strain PCC 7421 and Gloeobacter morelensis
MG652769 (species 11.1.1A), described respectively by Nakamura et al.
(2003) and Saw et al. (2021), and the metagenome Gloeobacter sp. Hawaii (Grettenberger, 2021) share 99.93% 16S rRNA sequence identity,
despite their different specific names. This con-specificity is confirmed by
genome metrics: 67.1% isDDH (GGDC formula 3), 91.96% ANI (a borderline value) and
94.04% Mash similarity. The value of 98.31% CGS may, however, suggest an inter-specific
separation. The complete genome of G. kilaueensis strain JS1 (species 11.1.1B)
shares 98.65% 16S rRNA sequence identity, 17.0% isDDH (GGDC formula 3), 73.6% ANI, 78.69% Mash similarity and 90.67% CGS with strain PCC 7421. Strain JS1, fully described by Saw et al. (2013), was not axenic, and the authors reported genomic regions that were not
recognized by BLAST or that derived from other bacteria or from viruses. In contrast,
strain PCC 7421 is held in the PCC in an axenic state. The regions described by
Saw et al. (2013) obviously make genome metrics inaccurate and difficult
to calculate. A fifth genomic sequence attributed to Gloeobacter is the
metagenome SpSt-379 (Zhou et al., 2020); this is smaller in size (4.03
Mbp versus 4.66-4.72 Mbp for the two isolates), has a much higher mol%
G+C content (65.3 versus 60.5-62.0), contains only 62 of the 79 genes of our core
marker set and lacks the rrn operon. It has been excluded from all
trees and from DDH/ANI calculations. Aphanocapsa lilacina strain
HA4352-LM1 (Ward et al., 2021), represented by a draft genomic sequence,
shares 99.46% 16S rRNA sequence identity with G. violaceus strain PCC
7421, suggesting con-specificity. The genome metrics (48.9% isDDH, 91.98% ANI,
98.21% CGS) all fall into the borderline area of specific separation. This
genome is not shown in the genome tree. Anthocerotibacter panamensis C109
(Rahmatpour et al., 2021; complete chromosome, genus 11.2.2), and Candidatus
Aurora vandensis MP9P1 (Grettenberger et al., 2020; draft, species
11.2.1A) are assigned to different genera by all estimates (96.62% 16S rRNA
sequence identity, 19.6% isDDH, 69.6% ANI). However, the genome of MP9P1
contains only 76 of the 79 genes of our core marker set and is smaller in size (3.07 Mbp); it
has been excluded from the genome tree. The metagenome Candidatus Aurora vandensis
LV9 (Grettenberger et al., 2020) is clearly a member of a different genus, sharing 13.1-20.7% isDDH, 65.16-69.46%
ANI and 76.28-76.81% CGS with all others. The metagenome Gloeobacterales cyanobacterium
ES-bin-313 (Zeng, unpublished) forms an additional genus, sharing only 16.7-18.6%
isDDH, 65.61-68.26% ANI, 76.28-86.66% CGS with the Gloeobacter strains
and 16.7% isDDH, 65.34% ANI and 76.28% CGS with the LV9 metagenome. Both of these
metagenomes lack the 16S rRNA gene and are not shown in the RNA tree. Three metagenomes assigned to Candidatus Sivonenia alaskensis in the thorough study of Pessi et al. (2023) share 97.54-99.13% ANI, 81.0-95.0% isDDH and 99.28-99.88% CGS with one another, but only 65.32-65.45% ANI, 20.8-25.0% isDDH and 78.21-78.57% CGS with G. violaceus PCC 7421. The latter values place these metagenomes in a different genus. The 3 metagenomes are similar in size (2.77-3.39 Mbp), much smaller than the complete genome of the reference strain PCC 7421 (4.66 Mbp). Unfortunately, 2 (PMM_0042 and PMM_0068) of the 3 Candidatus S. alaskensis genomes do not contain an rrn operon, so 16S rRNA sequence identities cannot be calculated for them. However, the metagenome IMG_3300025548_6 and strain PCC 7524 share only 91.78% 16S rRNA sequence identity.
Gloeocapsopsis
Of five genomic sequences available for members of this genus, three are of the same strain and fall into species 3.1.16A of the 16S rRNA phylogeny. The remaining two lie in species 3.1.16D, although one (G. crepidinum strain LEGE 06123) has been excluded from the genomic phylogeny. Comparisons are available, therefore, for only 2 strains: G. dulcis AAB1 (3.1.16A) and Gloeocapsopsis sp. IPPAS B-1203 (3.1.16D). These share 98.59% 16S rRNA sequence identity, their separation into two species of a single genus being confirmed by genome metrics values of 34.3% isDDH, 86.54% ANI and 98.41% CGS. They also share 96.81-98.05% 16S rRNA sequence identity with an organism known as
Chroogloeocystis siderophila (species 3.1.16B). That the latter is a member of the same genus as the Gloeocapsopsis strains is confirmed by genome metrics: 27.1-27.2% isDDH, 82.73-82.81% ANI and 97.59-97.61% CGS. The genus Gloeocapsopsis Geitler ex Komárek, 1993 clearly has priority over Chroogloeocystis I.I.Brown, D.Mummey & K.E.Cooksey, 2005, a genus which is currently considered invalid according to AlgaeBase (it lacks a description or diagnosis, and a type).
Gloeomargarita
Genus 10.2.1 contains the 16S rRNA genes extracted from the complete genomes of 2 strains of Gloeomargarita ahousahtiae and from the draft genome of Synechococcus sp. C9. One strain of Gloeomargarita, V14D9, is represented by two genomic sequences (CP088016 and OV696605) that are identical by all genome metrics; both were deposited in NCBI as Gloeomargaritales cyanobacterium. They show 99.12% 16S rRNA sequence identity (in comparisons of the extracted 16S rRNA genes), 92.64% ANI and 50.0% isDDH with Synechococcus sp. C9. Both strains, isolated from hot springs, are therefore con-specific and lie in species 10.2.1A. The sequence of strain V14D9 obtained by 16S rRNA-specific PCR is short (1156 nt) but identical to the gene extracted from the genome. The type strain of the genus, G. lithophora Alchichica-D10, is represented by a 16S rRNA sequence in species 10.2.1F; the complete genome (CP017675) shows 97.84-98.01% 16S rRNA identity (extracted 16S rRNA genes), 81.66-81.93% ANI, 24.9-25.0% isDDH and 94.66% CGS with the members of species 10.2.1A. Four Gloeomargarita sp. metagenomes (SKY B31, SKY B120, SKY G98, SKY G116) have been omitted from the genome tree. Full details of the Gloeomargarita strains are given by Bacchetta et al. (2023).
Gloeothece
The two available genomic sequences of Gloeothece, Gloeothece
citriformis PCC 7424 and Gloeothece verrucosa PCC 7822, both
previously named as Cyanothece spp., form two genera (species 5.1.3.3C
and 5.1.3.4A). Their completed chromosomes share only 17.4% isDDH (GGDC formula 3), 78.22% ANI (with 50% coverage), 80.39% Mash similarity, 94.65% CGS and 95.2% identity of their
extracted 16S rRNA genes. The genome metrics values suggest that they are either
borderline members of different genera or assignable to two species of a single
genus.
Hormoscilla
Family 4.4 contains a single genus and named species, Hormoscilla
spongeliae, whose members are all cyanobionts with various marine sponges
as hosts. The 16S rRNA genes extracted from the metagenomes GM7CHS1pb, GUM202, SP5CHS1
and SP12CHS1 share 98.99-100% sequence identity, assigning them to a single
species (4.4.1C). The four metagenomes are divided into two sub-clusters by genome metrics:
(a) GM7CHS1pb with GUM202, with 99.94% ANI, isDDH 98.6%, 100% CGS and (b) SP5CHS1 and SP12CHS1, showing 98.12-99.54% ANI, 83.2-83.8% isDDH, 99.81% CGS. However, values of 92.88-93.22% ANI and 55.2-55.3% isDDH unite the sub-clusters into a single species.
The additional GUM007 metagenome (species 4.4.1B) is chimeric and has been excluded
from similarity estimation. The 16S rRNA sequence extracted from this genome
shows 97.90-97.98% identity to the others, suggesting placement into a second
species. Two other species of this genus are defined by 16S rRNA sequence
identity in our tree, but are not represented by genomic sequences.
Hydrococcus
Genus 5.1.3.5 contains two draft metagenomic sequences of pleurocapsalean
taxa named as Hydrococcus sp. These are identical in 16S rRNA sequence and
show 99.61% ANI, 99.2% isDDH and 99.39% CGS. Their nearest relatives are the complete chromosome of Gloeothece verrucosa PCC 7822 (genus
5.1.3.4), with only 93.54% 16S rRNA sequence identity (70.0-70.18% ANI, 19.4% isDDH, 89.51-89.69% CGS), and
the singleton genus Pleurocapsa sp. PCC 7327, with 94.88% 16S rRNA sequence identity (76.39-76.64% ANI, 22.6% isDDH, 93.53-93.71% CGS).
Kamptonema
Within species 5.3.2.1A, the draft genomes of Kamptonema
spp. strains PCC 6407 and PCC 6506 show 96.1% isDDH, 99.95% ANI and 99.98% CGS, their 16S rRNA sequences being identical. Strain PCC 6407 shows only 27.2% isDDH
(80.34% ANI) and 97.91% 16S rRNA sequence identity with the unidentified
Oscillatoriales strain USR001, a member of the second species of this genus.
The latter has been excluded from the genome tree, since it contains only 31 of our 76
conserved markers; the isDDH and ANI values are evidently approximate, since
the genome is incomplete. Strain NRMCF 0138 is identified by CheckM as a non-cyanobacterial bacterium (probably Chloroflexus); of its two rrn operons, one 16S rRNA gene lies in genus 4.3.1 with the genomes of Desertifilum, and the second in cluster env2.2. Both 23S rRNA genes fall into the bacterial outgroup.
Leptolyngbya
Although many genomic sequences of Leptolyngbya
isolates, and of generic clusters thereof that have recently been renamed (e.g. Nodosilinea),
are available, they fall as single members of separate genera and species;
comparisons are therefore not possible, with the following few exceptions. Species
6.1.1F contains two Nodosilinea sp. sequences (FACHB-13 and FACHB-141) that share 99.93% 16S rRNA
sequence identity; the values of 58.7% isDDH, 95.50% ANI and 99.53% CGS clearly support
their con-specificity. The two sequences in species 6.1.1M, Leptolyngbya
antarctica ULC041 and Leptolyngbya sp. ULC073, are both metagenomic
assemblages. They are identical in 16S rRNA sequence and show values of 86.1% isDDH, 98.08% ANI and 99.45% CGS, confirming their con-specificity. Halomicronema excentricum strain
Lakshadweep and Lyngbya confervoides BDU141951 (species 6.1.5A) are similarly
identical in 16S rRNA sequence and share 98.7% isDDH and 99.98% ANI. Unfortunately, the sequence of the former genome is chimeric and version 1 of the latter genome is 23.1% contaminated; both have been excluded from the genome tree. Version 2 of the Lyngbya
confervoides BDU141951 genome is identical to version 1 in 16S rRNA
sequence, has little contamination, and is included in the genome tree. The
genome of this organism shares 82.04% ANI and 97.93% CGS with that of Halomicronema
sp. CCY15110 (species 6.1.5E), thus being confirmed as a member of a different
species. The value of 25.5% isDDH is not discriminatory.
Genus 6.3.25 contains the extracted 16S rRNA gene sequences
of two strains, Leptolyngbya sp. ULC077 and Phormidium sp. CLA17,
in two different species. They show 96.77% identity, the draft genomes sharing
28.0% isDDH, 83.23% ANI and 97.39% CGS. Genome metrics therefore confirm the
specific separation of these strains.
The 16S rRNA gene extracted from the genomic sequence of Leptolyngbya
sp. strain 2LT21S03, occupying a single branch within family 6.2, does not
match the normal 16S rRNA sequence derived by PCR amplification (FM177494,
family 6.3), sharing only 87.5% rRNA sequence identity; both are included in
the 16S rRNA tree. The genome sequence of strain 2LT21S03 is too incomplete to
be included in the genome tree. Strains Leptolyngbya sp. O-77 and Thermoleptolyngbya
sp. PKUAC-SCTA183 (species 6.2.8A), both from hot or thermal springs, share 99.53%
16S rRNA sequence identity and are similar in genome size (5.48 and 5.52 Mbp) and
G+C content (55.9 and 56.4 mol%); their complete chromosomes show 66.7% isDDH, 89.65% ANI, 92.75% Mash similarity and 99.15% CGS, confirming their con-specificity. Three strains of
"Candidatus curcubocaldaceae cyanobacterium", Thermoleptolyngbya
albertanoea ETS-08 and two strains identified only as "thermophilic
cyanobacterium", not represented by genome sequences, also fall within
species 6.2.8A; all members of this species were isolated from thermal
environments. The position of Thermoleptolyngbya sp. strain PKUAC-SCTB121
in the tree is uncertain, since the two copies of the 16S rRNA gene appear in
different genera: one in genus 6.2.8, the second in genus 6.3.33. Although the
latter shows 98.92% sequence identity with the two identical 16S rRNA genes of Leptodesmis sichuanensis PKUAC-SCTA121, the genome metrics clearly define the strains
as members of different genera (isDDH 13.1%, ANI 69.49%, Mash similarity
72.60%, CGS 89.22%). Strain PKUAC-SCTB121 is more likely a member of genus
6.2.8, sharing 97.15-97.63% 16S rRNA sequence identity, 68.1-99.8% isDDH (GGDC
formula 3, complete genomes), 89.47-99.70% ANI, 92.76-99.82% Mash similarity and
99.16-99.93% CGS with strains O-77 and PKUAC-SCTA183.
Within the mono-specific genus 6.3.1 (Leptolyngbya sensu
stricto), the 5 strains of L. boryana (4 having completed genome
sequences) are identical (100% 16S rRNA sequence identity) and show high values
of 97.4-100.0% isDDH, 99.14-100%
ANI and 99.99-100% CGS. The complete chromosomes are virtually identical in size
(6.18-6.26 Mbp), whereas the 6 draft genomes (Leptolyngbya boryana or Leptolyngbya sp.) are larger (6.65-7.26 Mbp); these show 99.93-100% 16S rRNA sequence identity, 83-100% isDDH, 97.25-100% ANI and 99.87-100% CGS. The con-specificity of these isolates is therefore confirmed by all metrics. Including strains that are not represented by genomic sequences, isolates
within this cluster are unfortunately named as 6 different genera and 8 species.
Strain NIES-2135 carries a 16S rRNA gene on a plasmid, in addition to the
chromosomal copy; the two are identical. The monospecific genus 6.3.2 contains a single strain (Leptolyngbya sp. NIES-2104), which shows 94.50-95.88% 16S rRNA sequence identity with the members of genus 6.3.1. See Section B2 (above) for strains named as Leptolyngbya whose placement in other genera is ambiguous.
Leptothoe
In species 6.1.6A, two strains, Leptolyngbyaceae
cyanobacterium CCMR0081 and CCMR0082, are identical in 16S rRNA sequence and
show 87.8% isDDH, 98.27% ANI, 99.91% CGS; they share 99.33% rRNA sequence identity, 63.2% isDDH, 94.30-94.37% ANI and 99.37-99.41% CGS with Leptolyngbya sp. PCC 7375
in the same species. The metagenomic Phormidesmis priestleyi Ana ITZX
contains no rrn operons and cannot be included in the 16S rRNA tree. Species
6.1.6B and 6.1.6D contain the draft genomic sequences of two strains of Leptothoe
(Leptothoe spongobia TAU-MAC 1115 and Leptothoe kymatousa TAU-MAC
1615). These share 97.36% 16S rRNA sequence identity, but this apparent
specific, rather than generic, separation cannot be confirmed by genome
metrics: the low (20.2%) isDDH value is non-discriminatory in this region, but
the 74.23% ANI value places the strains into two different genera. The strains
are not included in the genome tree, because they contain only 66 and 58 of the
79 genes of our marker set; a CGS value therefore cannot be calculated. These
sponge-associated organisms were suggested (Konstantinou et al., 2021)
to have undergone reduction of genome size; this suggestion is surely invalid, being
based on incomplete draft genomes, and this incompleteness most likely also explains the low genome
metric values obtained with genomes that lack different regions.
Limnothrix
The 16S rRNA sequences extracted from the genomes of six Limnothrix sp. strains lie in species 5.4.2.1A (the isolates FACHB-406, FACHB-881, FACHB-1083 and PR1529, and the metagenomes CACIAM 69d and P13C2). These share 99.73-100% 16S rRNA sequence identity, 54.2-99.9% isDDH, 93.83-99.96% ANI and 99.52-100% CGS; the genome metrics therefore confirm their con-specificity. This species also contains the 16S rRNA sequences of 6 additional Limnothrix sp. strains obtained by specific PCR, carrying two different specific epithets. The draft genomes of two other strains, L. rosea IAM M-220 (NIES-208) and L. redekei PCC 9416 (the type species), lie in clusters 5.1.7.9B and 5.4.1.1A, respectively. These show only 19.7-31.1% isDDH, 65.67-66.30% ANI and 77.06-80.57% CGS with the members of cluster 5.4.2.1A. Cluster 5.1.7.9B mostly contains strains named as Picosynechococcus sp., and cluster 5.4.1.1 is composed almost entirely of Pseudanabaena spp. L. rosea strain IAM M-220 is closely related to Leptolyngbya sp. PCC 7376 in the 16S rRNA phylogeny, but genome metrics place these strains into two different species (see below). The nearest relatives of L. redekei PCC 9416 in the 16S rRNA phylogeny are Pseudanabaena spp. strains PCC 7429 and ULC066 and the metagenome Salubria-1 in cluster 5.4.1.1A, with 99.33-100% 16S rRNA sequence identity. On the basis of genome metrics, however, the closest relative is the metagenome Salubria-1 (86.5% isDDH, 98.12% ANI, 99.90% CGS); the remaining strains show only 25.7-26.3% isDDH, 79.50-79.89% ANI and 97.23-97.40% CGS and are therefore members of different species. The reasons for this discrepancy are discussed above, in the description of Pseudanabaena spp.
Lyngbya
The genus Lyngbya is polyphyletic, as described on the main text page. The 2 available genome sequences in species 5.2.1.5B, Lyngbya
aestuarii BL J and Lyngbya sp. PCC 8106, appear to be con-specific, sharing
99.79% 16S rRNA sequence identity, 91.69% ANI, 46.8% isDDH and 98.76% CGS. The genome metrics all fall onto the borderline of specific separation. A third organism, named as the type species L. confervoides (strain BDU141951), is described above, in the paragraph on Leptolyngbya.
Microcoleus
Oscillatoria nigro-viridis PCC 7112, Microcoleus vaginatus strains FGP-2 and PCC 9802, and Microcoleus sp. strains AT9b-C3, B4-C1, D3_18_C1, F12-A1, herbarium5, LAD1_D3, LEGE 07076, M2-B5, MOSTC5, N9_B2, Pol12A4 and w2-18bC1, in species 5.3.1.1A, share a minimum of 98.93% 16S rRNA sequence identity, their genomes showing 49.4-96.8% isDDH, 91.53-99.97% ANI and 97.38-98.71% CGS. The two M. vaginatus strains within this species show 96.8% isDDH, 99.97% ANI and 100% CGS, their 16S rRNA sequences being identical. The genome metrics therefore
confirm the con-specificity of these strains. M. vaginatus strain HSN003 is not shown in the genome tree. M. vaginatus strain PCC
9802 shows only 91.36% 16S rRNA sequence identity with Allocoleopsis franciscana (Microcoleus sp.)
PCC 7113 in genus 3.1.10; generic separation is confirmed by values of 24.4%
isDDH (not discriminatory at this low level), 68.63% ANI and 87.89% CGS. Strain
PCC 7113 (species 3.1.10A) shares 98.72-98.79% 16S rRNA sequence identity with
strains FACHB-1 and FACHB-53 in species 3.1.10B; specific separation is confirmed by 41.8% isDDH, 89.19-89.25% ANI and 98.85-98.96% CGS. That strains FACHB-1 and FACHB-53 are con-specific is shown
by values of 99.93% 16S rRNA sequence identity, 85.0% isDDH, 97.72% ANI and 99.80% CGS. Allocoleopsis franciscana PCC 7113 shows only 90.51% 16S rRNA sequence identity with Microcoleus
sp. strain IPPAS B-353 in genus 5.2.4.8 (described in more detail above with
other halophiles); although isDDH (27.8%) may not be of great value within this
range, generic separation is supported by values of 66.96% ANI and 85.67% CGS. Species 5.3.1.1B contains Microcoleus sp. strains ARII-A1, Aus8_D4, herbarium12 and CAWBG27 (metagenome), which are virtually identical in 16S rRNA sequence; their con-specificity is confirmed by all genome metrics. The 16S rRNA sequences of three strains of Microcoleus (the second sequence of ARII-A1 together with those of K4-B3 and Z1_C3) lie in the bacterial outgroup.
Microcystis
The 50 strains represented by genomic sequences and 5
metagenomes of Microcystis spp. present in the 16S rRNA tree (genus
5.1.1.1) exhibit a minimum 16S rRNA sequence identity of 98.85% and therefore appear
to be members of a single species, which by consensus would be M. aeruginosa.
The specific names given for many members of genus 5.1.1.1 (bengalensis,
flos-aquae, ichthyoblabe, panniformis, protocystis,
pseudofilamentosa, ramosa, robusta, smithii, viridis, wesenbergii) may therefore be considered invalid. The gene extracted from the draft
metagenome TA09 gives lower 16S rRNA identity values; this may reflect poor
assembly of the genome, which consequently contains foreign DNA. Of the 46
genomes shown in the genome tree, only 11 are complete (strains Chao1910, FACHB-1757, FD4,
MC19, NIES-88, NIES-102, NIES-298, NIES-843, NIES-2481, NIES-2549 and PCC
7806SL). Their high (99.60-100%) 16S rRNA sequence identity is mirrored by ANI
values of 94.8-99.49%, 95.31-99.69% Mash similarity and 99.20-99.97% CGS,
confirming the con-specificity of the strains. However, isDDH (GGDC formula 3) values greater than the intra-species limit of 62.10% are observed only for strains NIES-102, NIES-298, NIES-1757 and PCC 7806SL, the remainder being placed into a second species.
Using the Spine software (Ozer et al.,
2014), we determined the core and accessory genomes of each strain, using only
the 10 available complete sequences. The core genomes
represent 56.39-68.63% of the total genome and show slightly higher mol% G+C
(43.5-44.0 vs 42.09-42.92%). The addition of the draft genomic sequences
would further slightly reduce the size of the core genome (see Humbert et al.,
2013). With the core genomes, ANI values increased only slightly to a minimum
of 95.81%, while isDDH values
increased dramatically to 96.6-100% and Mash similarities to 100%. All core
genomes contain our 79 marker genes (therefore the CGS values were unchanged) plus
two complete rrn operons. In contrast, all pairwise crosses of the
accessory genomes gave ANI values of at most 94.32% and a maximum isDDH value of 37.8%. Unlike the core genomes, the accessory genomes did not contain rrn operons or our marker genes. We have shown the presence of an unusually high number of repeat
sequences in the complete chromosomes of members of this genus, using the RepSeek
software (Achaz et al., 2006) with a minimum seed value of 25. The
complete chromosomes vary not only in size (4.29-5.87 Mbp) and number of genes
(4606-6671) but also in their content of paired repeats (3.46 × 10⁴ to 1.54 × 10⁵), the highest number being found in M. panniformis strain FACHB-1757 (CP011339);
see the figure included in the description of Moorena. The majority (66-70%)
of the repeat sequences fall into the core genome, and within a single strain
the core and accessory genomes share 28-36% of these elements, showing that
they occur at least twice within the genome. The core genomes themselves share
49-82% of their total repeats between strains, whereas the accessory genomes
share only 18-25%. The combined results show the presence of a conserved core
genome and a variable accessory genome in M. aeruginosa, and
collectively unite all the strains into a single species. These results confirm
and extend those of Humbert et al. (2013), who described a genome
evolutionary strategy that combines high genome plasticity, characterized by
a high number of repeated sequences and numerous rearrangements, with an ability to
acquire new adaptive genes by horizontal gene transfer. Our unpublished studies
using wet-lab optical thermal renaturation DDH show the con-specificity of 5
strains of the PCC (PCC 7005, PCC 7806, PCC 7813, PCC 7820 and PCC 7941),
having DDH values of 64-100%. For 3 of these strains, the wet-lab results can be compared
with the isDDH values obtained from their incomplete genomes (given in parentheses): PCC 7806 x PCC 7941, 64%
(72.4%); PCC 7806 x PCC 7005, 74% (72.1%); PCC 7941 x PCC 7005, 76% (80.1%). The
members of this toxin-producing genus require further detailed study. Several
strains, apparently misidentified as Sphaerocavum brasiliense, also fall
into genus 5.1.1.1 but are not represented by genomic sequences; more details
of these are given on the main page of this site.
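The three wet-lab/isDDH comparisons quoted above for Microcystis can be tabulated with a few lines of Python. This is only simple arithmetic on the published values, shown as an illustration; the isDDH values were obtained from incomplete genomes, presumably with GGDC formula 2, the default stated in the Methods for draft genomes.

```python
# (wet-lab DDH %, isDDH %) for the PCC strain pairs quoted in the Microcystis paragraph
comparisons = {
    ("PCC 7806", "PCC 7941"): (64.0, 72.4),
    ("PCC 7806", "PCC 7005"): (74.0, 72.1),
    ("PCC 7941", "PCC 7005"): (76.0, 80.1),
}

# Difference isDDH minus wet-lab DDH, in percentage points (rounded to 0.1)
differences = {
    pair: round(isddh - wet_lab, 1) for pair, (wet_lab, isddh) in comparisons.items()
}
mean_abs_difference = round(
    sum(abs(d) for d in differences.values()) / len(differences), 1
)

for (a, b), diff in differences.items():
    print(f"{a} x {b}: isDDH - wet-lab = {diff:+.1f} points")
print(f"mean absolute difference: {mean_abs_difference:.1f} points")
```

Two of the three isDDH values exceed the corresponding wet-lab values, and the largest discrepancy (8.4 points) involves the lowest wet-lab value.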
Moorena
Sequenced genomes exist for Moorena producens strains 3L (represented by two
sequences, both draft), JHB (two sequences, one linear, one complete) and PAL-8-15-08-1 (complete), together with
the draft sequence of M. bouillonii strain PNG5-198 (genus 3.1.3).
Twelve metagenomic sequences have been excluded from this analysis. The two draft
genome sequences of strain 3L show 99.90% isDDH, 99.99% ANI and 100% CGS, their extracted
16S rRNA sequences being identical. The two genomes of strain JHB show 99.93% 16S rRNA sequence identity, 98.2% isDDH and 99.94% ANI. They show 98.92-100% rRNA sequence identity and 67.7% isDDH, 94.64-94.65% ANI with M. producens strain 3L, which is therefore a member of the same
species. The lower values of 98.65-98.79% rRNA sequence identity and 48.0-49.0%
isDDH, 92.02-92.50% ANI, 98.94% CGS with M.
bouillonii PNG5-198 confirm their con-specificity. Despite their very different geographic origins (Leao et al., 2017), strains JHB and PAL-8-15-08-1 share 99.06% rRNA sequence identity; however, they give only 47.6% isDDH,
99.01% CGS and 91.62% ANI (with 62% coverage). The genome metrics, unlike the
high 16S rRNA sequence identity, suggest these strains to be borderline members
of the same species. Analysis with the RepSeek programme shows that they contain an
exceptionally high number of repeat sequences, as shown in the figure below.
The two M. producens genomes contain 3.85 × 10⁵
and 4.44 × 10⁵ repeat sequences (black lozenge symbols), many more
than any other cyanobacterial strain. The complete chromosome of their nearest
relative in the genome tree, Allocoleopsis franciscana (Microcoleus sp.) strain PCC 7113, contains
only 3.35 × 10³ repeat sequences. Only three of seven strains of Microcystis
(red lozenge symbols) contain significantly more than 1 × 10⁵, these
being described in more detail in the appropriate paragraph. In the case of M.
producens, one would expect this high number of repeats to prevent accurate
determination of genetic relationships. The M. producens genomes contain
a complete hgl gene cluster and hetR, but not nif genes
(Leao et al., 2017); their genomes, of 9.37-9.67 Mbp, are similar in size
to those of many heterocystous cyanobacteria. Leao et al. suggested
that these organisms may have evolved from a heterocystous organism, a
situation analogous to that observed in Raphidiopsis brookii D9.
However, the latter organism lacks nifH and hglD, whereas hetR
is found in many cyanobacteria. If the hypothesis is true, one would expect
members of the genus Moorena to cluster with the heterocystous
cyanobacteria; this is not the case for any of the trees shown on this site.
Nodularia
The strains of Nodularia for which genomes are available fall into two species of the 16S rRNA tree. Within species 1.3.1A, the complete chromosome of Nodularia spumigena strain UHCC 0039, the circular (not listed as complete) genome of N. spumigena strain CCY9414 and the draft genomes of Nodularia
sp. CENA596 and N. sphaerocarpa UHCC 0038 share 98.52-99.73% 16S rRNA sequence identity and show isDDH values of 58.2-98.4%, ANI 98.75-99.45% and CGS 98.75-99.94%, confirming their con-specificity. This species appears to be divided into three subclusters: 1a, strain UHCC 0038; 1b, strain LEGE 06071, which shows 98.86-99.10% 16S rRNA sequence identity, 38.2-38.4% isDDH, 88.67-88.84% ANI and 98.75-98.77% CGS with subcluster 1c; and 1c, strains CCY9414, UHCC 0039 and CENA596, sharing 99.46-99.73% 16S rRNA sequence identity, 74.3-98.4% isDDH, 89.80-99.45% ANI and 99.56-99.94% CGS. The lower values of 98.52-98.93% 16S rRNA sequence identity, 38.0-38.4% isDDH, 87.91-88.96% ANI and 98.27-98.84% CGS shared by the members of species 1.3.1A with the draft genome of strain NIES-3585 (species 1.3.1B) confirm their specific separation.
Nostoc
The "Nostoc" strains for which sequenced genomes are
available mostly fall into dispersed generic/specific clusters of the tree as
single members, and show only low (around 22%) isDDH values. The 16S rRNA identity values are consistent with genome metrics in all cases, except for three clusters (described above).
Genus 1.8.1 in the ARB 16S rRNA tree contains
28 "Nostoc" strains represented by genomic sequences, 10 being complete,
and two genomes from herbaria; their grouping into the same or different
species is, in almost all cases, supported by both 16S rRNA sequence identity
and isDDH and ANI values. Some of these sequences have been omitted from the phylogenomic tree.
Nelson et al. (2019), in a tree of 100 genomes inferred with 834 unspecified single-copy markers, described the complete chromosomes of four Nostoc cyanobionts,
isolated from hornworts and a liverwort. Nostoc sp. strains TCL240-02
and C057 lie in species 1.8.1A of the 16S rRNA tree, with Nostoc sp.
strain C052 in species 1.8.1I. Nostoc sp. strain TCL26-01 is unrelated
to the first three, falling into a different family (species 1.11.3A). The
con-specificity of strains TCL240-02 and C057 is confirmed by genome metrics
(isDDH 65.1%, ANI 93.19%), and the separation of strain C052 at the specific
level by lower values (isDDH 37.2%, ANI 86.43%). Similarity measures involving
strain TCL26-01 are more difficult to calculate, but the estimates of 14.6-14.7%
isDDH and 75.84-75.92% ANI support generic separation. The isDDH results
described above were obtained with GGDC formula 3.
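The choice between GGDC formula 2 and formula 3 matters when draft and complete genomes are compared. Our methods describe the two formulas in simplified form:

- formula 2 is the sum of identities in HSPs divided by the total HSP length;
- formula 3 is the sum of identities in HSPs divided by the total genome length.

The sketch below computes both ratios from a list of HSPs. It is an illustration only. GGDC converts its distance values into DDH estimates with its own fitted model, which is not reproduced here. The `HSP` class and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    """Hypothetical record of one high-scoring segment pair."""
    length: int       # aligned length of the HSP
    identities: int   # identical positions within the HSP


def formula2_ratio(hsps: list) -> float:
    """Identities / total HSP length: unaligned regions do not lower the value."""
    total_length = sum(h.length for h in hsps)
    if total_length == 0:
        raise ValueError("no HSP length")
    return sum(h.identities for h in hsps) / total_length


def formula3_ratio(hsps: list, genome_length: int) -> float:
    """Identities / total genome length: unaligned regions lower the value."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return sum(h.identities for h in hsps) / genome_length


if __name__ == "__main__":
    # Hypothetical pair in which HSPs cover only half of a 4 Mbp genome
    hsps = [HSP(1_000_000, 950_000), HSP(1_000_000, 930_000)]
    print(f"formula 2: {formula2_ratio(hsps):.3f}")
    print(f"formula 3: {formula3_ratio(hsps, 4_000_000):.3f}")
```

In this toy case the formula 2 ratio (0.940) is twice the formula 3 ratio (0.470). This illustrates why formula 2 can overestimate relatedness when large parts of a genome find no match.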
Species 1.8.1A contains, in
addition to Nostoc sp. strains TCL240-02 and C057, Nostoc sp. strains
ATCC 53789 (represented by 2 sequences, 1 complete), N6, PCC 73102 (both
represented by complete chromosomes), and strains 213, 232, LEGE 12447, LEGE
12450, UCD121, UCD122 and UIC 10630 (all with draft genomes). Nostoc sp.
strain UCD120 also falls into this species on the basis of genome metrics, but
lacks rrn operons and is not shown in the 16S rRNA phylogeny. The
strains are all cyanobionts, except strain UIC 10630 which was isolated from
soil. Also present are several cyanobionts named as Nostoc sp. (16S rRNA
only). For Nostoc sp. strains 213, 232, N6, UCD120, UCD121 and UCD122,
genome metrics conflict either with their con-specificity within species 1.8.1A
or with their assignment to different genera when compared with the cyanobiont members
of genus 1.8.2, as described above.
The remaining members of species 1.8.1A
are truly con-specific, as shown by their ranges of 98.59-99.93% 16S rRNA
sequence identity, 52.2-99.6% isDDH and 91.93-99.90% ANI. Their inter-specific relationship
to other members of genus 1.8.1 is shown unambiguously by values of 97.38-98.79%
16S rRNA sequence identity, 35.8-37.8% isDDH, 86.36-87.13% ANI and 98.28-98.56%
CGS. Strain N6 and Nostoc sp. 'Lobaria pulmonaria cyanobiont'
strain 5183 (species 1.8.1S), both represented by complete chromosomes (Gagunashvili and Andrésson, 2018), are also placed into different species by
all available measures (96.97% 16S rRNA sequence identity, 36.5% isDDH (GGDC
formula 3), 87.12% ANI, 89.02% Mash similarity and 98.18% CGS).
N. edaphicum strain CCNP1411 (complete chromosome) and Nostoc sp.
strain KVJ20 (draft) appear to be con-specific in species 1.8.1F, showing
99.46% 16S rRNA sequence identity, 59.4% isDDH and 93.59% ANI, but we have
shown that a small segment of the genome of strain CCNP1411 is chimeric. None
of the other Nostoc cyanobionts (e.g. strains 0708 from Azolla (described
above in detail) and KVJ20 from Blasia) are closely related to the
members of genus 1.8.1.
The complete chromosomes of N.
sphaeroides strains CCNUC1 and Kützing En lie in species 1.8.1AJ. Their
extracted 16S rRNA genes show microheterogeneity and give identity values of 98.86-99.66%.
Con-specificity is confirmed by all genome metrics (ANI 98.60%, isDDH 94.1% (GGDC
formula 3), Mash similarity 99.06%, CGS 99.92%).
The draft genomes of Nostoc sp.
strains BAE, CCCryo 231-06 and HG1 form species 1.8.1N. Their con-specificity (16S rRNA sequence
identity 99.58-99.75%) is confirmed by an ANI value of 92.44% and isDDH 59.9%, the
latter value being borderline. CGS values are not available, since only strain
BAE is shown in the genomic tree.
The cyanobiont Nostoc
sp. strain 2RC lies in species 1.8.2E with the N. linckia strain Z series
(13 genomes, of which we show only 4, all cyanobionts) plus the metagenome Nostoc
sp. Moss3 (moss epiphyte). Additionally present are some Nostoc spp., Desmonostoc
spp. and Dolichospermum spp. (represented by their 16S rRNA genes) from
varied habitats, only some of which are cyanobionts. Strain 2RC shares 96.48%
ANI (74.7% isDDH) with strain z1 and 96.3% ANI (74.4% isDDH) with strain Moss 3,
and 99.66% 16S rRNA sequence identity with both. The genome metrics therefore agree
with their con-specificity. Nostoc sp. Moss 3 itself shows low similarity to another
metagenome from a similar habitat, Nostoc sp. Moss 2 (found in species
1.8.1AD), showing only 95.50% 16S rRNA sequence identity, a value below our genus cutoff that would place them in different genera. However, values of 82.31% ANI and 96.27% CGS do not support this
generic separation. A value of 28.9% isDDH with GGDC formula 2 should not be
considered to be accurate in this range. The 13 N. linckia strain Z
series genomes are virtually identical in size (8.9-9.2 Mbp, deduced by
addition of the lengths of their contigs) and exhibit 99.91-99.99% ANI,
99.7-99.9% isDDH. The genomic sequence of strain NIES-25 (whose extracted 16S
rRNA sequence lies in species 1.8.2G, only one of three being shown in the 16S
rRNA tree) is incomplete, containing only 47 of our 79 marker genes, and has
been excluded from the similarity calculations.
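Genome sizes quoted for draft genomes in this survey are sums of contig lengths. A minimal sketch of that arithmetic follows. It assumes a plain FASTA file has already been read into a string, and the function names are hypothetical. For a draft assembly this total only approximates the true genome size: duplicated contigs inflate it and missing regions reduce it.

```python
def contig_lengths(fasta_text: str) -> list:
    """Return the length of each contig in FASTA-formatted text, in file order."""
    lengths = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)          # header line: start a new contig
        elif lengths:
            lengths[-1] += len(line)   # sequence line: extend the current contig
    return lengths


def genome_size_mbp(fasta_text: str) -> float:
    """Approximate genome size in Mbp as the combined length of all contigs."""
    return sum(contig_lengths(fasta_text)) / 1e6


if __name__ == "__main__":
    example = ">contig_1\nACGTACGT\nACGT\n>contig_2 note\nAAAA\n"
    print(contig_lengths(example), genome_size_mbp(example))
```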
The genome of Nostoc sp. ATCC 43529 contains two 16S
rRNA genes, one falling into species 1.8.2H, the other into the bacterial
outgroup. The genome shares 98.45% 16S rRNA sequence identity, 42.8% isDDH and 90.04% ANI with that of Nostoc sp. Hyl-09-2-6 in species 1.8.2I. The separation of these strains into two species is therefore confirmed by all methods. The two species (including sequences obtained by conventional PCR) each show 98.8-99.9% internal 16S rRNA sequence identity and are separated
from each other at 98.5%. The genome of strain PA-18-2419, whose 16S rRNA sequence falls into
species 1.8.2H, is chimeric; it has been excluded from the tree based on
genomic sequences and from calculation of genome metrics.
The genome of Nostoc minutum NIES-26, like that of
strain ATCC 43529, contains two 16S rRNA genes, the first appearing in genus 1.8.6,
the second as a single "genus" in family 1.13. Genomic sequences of
close relatives are not available; therefore, isDDH cannot be
used to decide which of the two 16S rRNA sequences is correct.
Two genomic sequences in 16S rRNA species 1.3.7E, the
metagenome Nostoc sp. 996 and the draft Nostocaceae cyanobacterium
CENA357, show 46.0% isDDH, 90.93% ANI, 98.99% CGS, in agreement with a 16S
rRNA identity of 98.22%, placing them as borderline members of a single
species.
Nostoc sp. NIES-2111 and Nostoc sp. NIES-3756 (both terrestrial isolates) are con-specific (species 1.11.1A, 99.9% 16S rRNA sequence identity and 85.4% isDDH, 98.20% ANI). Within species 1.12.1A, Nostoc sp. HK-01 (complete chromosome), N. cycadae WK-1 (draft) and a strain named as Anabaenopsis circularis NIES-21 (= IAM M-4, Nostoc PCC 6720, draft genome) are con-specific, having identical 16S rRNA sequences;
they show 82.6-84.9% isDDH and 97.44-97.96%
ANI. However, Nostoc sp. strain PCC 7107, represented by a complete chromosome,
shows a high (99.60%) rRNA sequence identity with the other 3 members, 62.1% isDDH with Nostoc sp. strain HK-01 (GGDC formula 3) and 47.0-47.1% isDDH
with the remaining strains. The ANI values between strain PCC 7107 and the
others are 91.13-91.43%. The genome metrics (Mash similarity 93.25%, ANI 91.3% and
CGS 99.17%) between strains PCC 7107 and HK-01 suggest that these strains are
borderline members of a single species. Lachance
(1981) demonstrated a wet-lab DDH value of 67% (ΔTm(e) 4°C) between
strains Nostoc sp. PCC 7107 and PCC 6720 (= A. circularis NIES-21),
confirming their con-specificity. Nostoc sp. strain FACHB-190 is
clearly a member of the same species (1.12.5C) as Aulosira sp. FACHB-615
(99.93% rRNA sequence identity, 84.0% isDDH, 97.50% ANI and 99.91% CGS).
In conclusion, the 16S rRNA sequence similarities, isDDH and
ANI values are in good agreement for defining species of this large clade of
Nostocacean cyanobacteria. However, genome sequences of further Nostoc
isolates are urgently required. Lachance (1981) first demonstrated by wet-lab DDH
the wide genetic dispersal of the Nostoc spp. strains of the PCC, although several
clusters were apparent. The first grouped strains PCC 6302, PCC 7121 and PCC
7422 with 52-56% DDH and high ΔTm(e) values of 9-16°C, suggesting that these strains are members of different species of the same genus; their PCR-derived 16S rRNA sequences are found here in three species of genus 1.8.2. In the second, strains PCC 6720, PCC 7107 and PCC 7416 showed 69-93% DDH and low (0-4°C) ΔTm(e) values and should be considered members of a single species;
this is confirmed by the placement of their PCR-derived 16S rRNA sequences in
species 1.12.1A of the tree. High (93-100%) DDH values with low ΔTm(e) values (0-1°C) show the near-identity of the strains of the third cluster (PCC
6411, PCC 6705, PCC 6719, PCC 7118, PCC 7119 and PCC 7120); their PCR-derived
16S rRNA sequences are found here within genus 1.10.1.
Okeania
Metagenomes of four organisms attributed to the genus Okeania are found in two species of genus 5.2.2.1. Species 5.2.2.1C is composed of 16S rRNA genes extracted
from three Okeania sp. metagenomes, showing 99.60-100% sequence
identity. A further 6 are shown in the genome tree. The 9 genomes show 76.1-99.8%
isDDH, 94.87-99.97% ANI and 98.77-100% CGS, confirming their con-specificity. Separation of SIO1I7 into a distinct species is supported by low values of isDDH (47.1%), ANI (90.88-90.98%) and CGS (98.77-98.84%).
Phormidesmis
The genus Phormidesmis is problematic, since strains
bearing this name are found in genera of two families: 6.1.9, 6.1.10, 6.1.11,
6.3.18, 6.3.20 and 6.3.27, with 3 additional loner genera in family 6.3. In
genus 6.3.18, the only cluster that contains strains represented by genomic
sequences, two strains named as Phormidesmis (BC1401 and ULC007, the
latter represented by 3 identical sequences) are assigned to different species
(6.3.18A and 6.3.18B, respectively, showing only 97.31% 16S rRNA sequence identity)
in the 16S rRNA tree. Values of 26.3% isDDH (not discriminatory in this range), 81.90% ANI and 97.55% CGS confirm specific separation. They show even lower values (93.5-94.1% 16S rRNA sequence identity, 20.0-21.9% isDDH, 68.34-68.48%
ANI and 88.99-89.35% CGS) with the metagenomic sequence Alkalinema sp.
CACIAM 70d, which therefore lies in a distinct genus (6.3.22), sharing 99.66%
16S rRNA sequence identity with the draft genomic sequence of strain FACHB-965.
The genome sequence of Phormidesmis priestleyi strain Ana ITZX contains
neither 16S rRNA nor 23S rRNA genes and cannot be shown in the 16S rRNA trees; this
sequence is clearly also a member of a different genus from strains BC1401 and
ULC007, showing only 19.6-20.1% isDDH, 66.79-66.85% ANI and 84.04-84.40% CGS. The
metagenome CACIAM 70d and strain Ana ITZX, showing 66.43% ANI and 82.95% CGS,
are themselves assigned to different genera. Two genome sequences of Phormidesmis
priestleyi strain ULC027 are incomplete and lack both 16S rRNA and 23S rRNA
genes; they are shown neither in the trees based on 16S rRNA nor in the genome
tree.
Picosynechococcus
Within genus 5.1.7.9 (renamed to Picosynechococcus
by Komárek et al., 2020), in which all strains are of marine origin, 2
species (5.1.7.9A and C) defined by 16S rRNA sequence identity (sharing only
98.05% identity) are supported by low isDDH
(20.8-30.7%) and ANI (77.76-77.95%) values of their genomes. In species 5.1.7.9A,
the six 16S rRNA sequences extracted from the respective completed chromosomes
of unicellular strains (Picosynechococcus sp. strains PCC 7002,
PCC 7003, PCC 7117, PCC 8807, PCC 11901 and PCC 73109) show a range of 99.53-100%
identity; the isDDH values for
these genomes are 77.6-93.7% (GGDC formula 3), with ANI estimates ranging from
96.10 to 99.91%, CGS from 99.14 to 99.80% and Mash similarities from 97.28 to
97.60%. These strains are therefore confirmed as con-specific by all estimates.
The two strains of species 5.1.7.9C, Picosynechococcus sp.
NIES-970 and NKBG15041c, show 100% rRNA sequence identity, 98.51% ANI, 87.6% isDDH and 99.88% CGS, and are therefore again con-specific.
Planktothricoides
Three genomic sequences, identical in the sequence of their
extracted 16S rRNA genes, lie in species 5.2.3.4A, named as Planktothricoides
sp. They share 91.5-91.6% isDDH, 98.85-99.98% ANI and 99.95-100% CGS, confirming their con-specificity.
Prochloron
Three Prochloron didemni metagenomes (genus 5.1.6.1),
from wide geographic origins, exhibit 99.70-100% 16S rRNA sequence identity,
96.90-99.19% ANI and isDDH values of 78.4-93.4%, confirming their membership of a single species. The genome P1 was omitted from the genome tree, since it is incomplete. The two remaining strains share 99.74% CGS.
Raphidiopsis
The members of genus 1.2.1 were renamed from Cylindrospermopsis
to Raphidiopsis by Aguilera et al. (2018) in a detailed study of
16S rRNA, 16S–23S ITS and cpcBA-IGS sequences in which no evidence of
generic separation was found. Raphidiopsis has clear
priority over Cylindrospermopsis, and R. curvata was retained as
the type species. 16S rRNA and draft genome sequences are jointly available
in our phylogenetic trees for 12 strains of R. raciborskii (CS-505,
CS-508, CS-509, CENA302, CENA303, CYLP, CYRF, GIHE 2018, ITEP-A1, MVCC14,
MVCC19 and UNSW 506 [=CS-506]), together with R. brookii D9 and R.
curvata NIES-932 (represented by a draft genome and circular [possibly
complete] chromosome, respectively), the complete chromosomes of R.
curvispora GIHE-G1, R. raciborskii Cr2010, KLL07, N8 and the draft metagenome
Cyanobacteria bacterium REEB494. The sequences share 99.33-100% 16S rRNA sequence
identity (the complete chromosomes sharing 99.53-99.93%) and, on the basis of this
character, are members of a single species. The genome tree below shows that the
strains form two closely related clusters. Within each cluster, the strains are clearly con-specific (cluster 1: isDDH 61.6-95.4%, ANI 94.77-99.23%, CGS 99.07-99.94%; cluster 2: isDDH 71.1-72.2%, ANI 95.88-99.52%, CGS 99.37-99.39%). The clusters are separated by genome metric values that fall on our borderline for species delineation: isDDH 50.0-53.6%, ANI 91.17-92.49%, CGS 98.76-98.83%.
Extracted from our all-genome tree; rooted with Sphaerospermopsis spp.
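The ranges reported for the two Raphidiopsis clusters summarize two sets of pairwise comparisons: all pairs within each cluster, and all pairs between the clusters. The sketch below shows that bookkeeping. It assumes the pairwise values (isDDH, ANI or CGS) are already available in a dictionary keyed by strain pairs. The strain labels and numbers in the example are hypothetical, not the values reported above.

```python
from itertools import combinations


def metric_ranges(values, clusters):
    """Summarize pairwise metric values within and between clusters.

    values:   {frozenset({strain_a, strain_b}): value} for each computed pair
    clusters: {strain: cluster_label}
    Returns ({cluster_label: (min, max)}, (min, max) between clusters, or None).
    """
    within = {}
    between = []
    for a, b in combinations(sorted(clusters), 2):
        value = values.get(frozenset((a, b)))
        if value is None:
            continue  # pair not computed, e.g. an incomplete genome
        if clusters[a] == clusters[b]:
            within.setdefault(clusters[a], []).append(value)
        else:
            between.append(value)
    within_ranges = {label: (min(v), max(v)) for label, v in within.items()}
    between_range = (min(between), max(between)) if between else None
    return within_ranges, between_range


if __name__ == "__main__":
    # Hypothetical isDDH values for four strains in two clusters
    clusters = {"s1": "cluster 1", "s2": "cluster 1", "s3": "cluster 2", "s4": "cluster 2"}
    values = {
        frozenset(("s1", "s2")): 95.0,
        frozenset(("s3", "s4")): 72.0,
        frozenset(("s1", "s3")): 51.0,
        frozenset(("s1", "s4")): 52.0,
        frozenset(("s2", "s3")): 50.0,
        frozenset(("s2", "s4")): 53.0,
    }
    print(metric_ranges(values, clusters))
```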
Our BLAST results show that R. brookii D9,
R. curvata NIES-932 and R. raciborskii CENA303 lack the nifH
(N2 fixation) and hglD (heterocyst glycolipid synthase) genes, whose absence was
previously considered (Stucken et al., 2010) to be a diagnostic feature of
Raphidiopsis, although these genes are present in all other members of
the species. Consistent with the results of Stucken et al. (2010), the
chromosomes of the complete Raphidiopsis strains are small (3.43-3.91
Mbp), as shown in magenta in the following histogram; all other heterocystous
cyanobacteria possess larger chromosomes, ranging from 4.54 to 11.06 Mbp (pale
blue in the histogram).
Note that the chromosomes of the sister group of
Sphaerospermopsis spp. vary in size from 5.14 to 6.10
Mbp, heterocystous cyanobionts from 5.35 to 8.23 Mbp and the ADA clade members
(which, like Raphidiopsis spp., also form water blooms and toxins) from
4.54 to 6.07 Mbp.
Although we do not know the genome size of the
ancestor, it is clear that the Raphidiopsis strains have undergone
genome reduction. Stucken et al. (2010) remarked that the complete chromosomes
of two Raphidiopsis spp. share a specific set of only 2539 genes with
greater than 90% ANI, and 3 copies of the rrn operon.
Richelia
Members of the genus Richelia (1.15.18), N2-fixing
endosymbionts of marine diatoms, divide into two species: the first contains
strains HH01 and HM01 (67.9% isDDH,
99.92% ANI, 99.26% 16S rRNA sequence identity; the latter was excluded from the
genome tree because it contains only 63 of 79 marker genes) together with the
metagenomic sequence Richelia UBA3481, which shows 99.7% isDDH with strain HH01
(unfortunately, 16S rRNA sequences are not present in this genome); the second
contains the single strain RC01, related to the first by only 96.98% 16S rRNA sequence
identity (DDH and ANI values were not compared for the incomplete genome
sequence of this strain, and it was excluded from the genome tree). A strain named as Calothrix
rhizosoleniae, SC01, is related to strain HH01 by 20.6% isDDH, 75.78% ANI, 91.43% CGS and 96.84% 16S rRNA sequence identity; all metrics therefore place this organism in a separate genus. A further cluster, containing metagenomic sequences DT 104, UBA3308, UBA3957 and UBA3958,
is difficult to place due to the absence of 16S rRNA sequences in the genomes,
but probably represents a distinct genus. A final strain, FACHB-800, isolated
from a freshwater environment, lies in a different family (1.18) and is
presumably misidentified.
Rippkaea
Within genus 5.1.4.5, the complete chromosomes of Rippkaea
orientalis (formerly Cyanothece sp.) strains PCC 8801 and PCC 8802 are
con-specific, showing 96.2% isDDH
(GGDC formula 3) and 98.89% ANI, in good agreement with the 99.93% identity of
their extracted 16S rRNA genes, Mash similarity of 99.17% and CGS of 99.91%.
Roseofilum
Four organisms represented by draft metagenomes of genus
4.2.1 (Roseofilum reptotaenium AO1-A, RR100-1 and the unidentified
cyanobacteria UBA1583 and UBA2566) appear to be con-specific, with 16S rRNA
sequence identities of 99.66-100%, isDDH values of 56.3%-57.1%, 93.58-99.53% ANI and 99.06-99.97% CGS. The UBA1583 and UBA2566 metagenomes show an isDDH
value of 97.7%, 99.53% ANI and 99.97% CGS. Eleven other metagenomes assigned to
this genus (not shown in the 16S rRNA-based tree, because they do not contain
the gene) group with AO1-A, RR100-1, UBA1583 and UBA2566 with isDDH values of 56.8-78.7%, 93.51-100% ANI and 99.36-100% CGS. All members of this cluster, most named as Roseofilum sp., therefore represent a single species, despite their diverse geographic origins.
Scytonema
Within monospecific genus 1.19.1, the draft genomes of Scytonema
spp. strains HK-05 (2 sequences) and NIES-4073 show 99.06% 16S rRNA sequence
identity, 91.94% ANI (with 65% coverage), 57% isDDH and 98.91% CGS, and are therefore confirmed as members of a single species.
The metagenome labeled as Scytonema sp.
RU_4_4 (species 1.19.4B) shares 96.77% 16S rRNA sequence identity and 33.4% isDDH with S. tolypothrichoides VB-61278, suggesting that they are
members of different species, although the latter value may not be
discriminatory in this range. However, the genome of strain RU_4_4 (7.33 Mbp
from the combined lengths of the contigs) is potentially chimeric and has been
excluded from the genome tree; the genome of strain VB-61278 has a size
(calculated as the combined lengths of the contigs) of 10.01 Mbp, of which 7.9%
of the bases are ambiguous. JSpeciesWS is unable to calculate ANI values for
this strain unless the ambiguous sites are removed; the ANI value between the
genome of strain VB-61278 and the metagenome RU_4_4 is then 84.85%, confirming
their specific separation.
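The text does not state how the ambiguous sites in the VB-61278 genome were removed before the JSpeciesWS analysis. One possible approach is to split each contig at runs of non-ACGT characters rather than deleting them in place. Splitting avoids creating artificial junctions between flanking sequences. The sketch below, with hypothetical function names and an optional minimum fragment length, illustrates this approach and also computes the fraction of ambiguous bases.

```python
import re

# Runs of unambiguous bases; anything else (N, R, Y, ...) counts as ambiguous
UNAMBIGUOUS_RUN = re.compile(r"[ACGT]+")


def ambiguous_fraction(sequence: str) -> float:
    """Fraction of positions that are not A, C, G or T."""
    seq = sequence.upper()
    if not seq:
        return 0.0
    unambiguous = sum(len(run) for run in UNAMBIGUOUS_RUN.findall(seq))
    return 1 - unambiguous / len(seq)


def split_at_ambiguous(sequence: str, min_length: int = 1) -> list:
    """Drop ambiguous sites by splitting the contig at them, keeping long enough pieces."""
    return [run for run in UNAMBIGUOUS_RUN.findall(sequence.upper()) if len(run) >= min_length]


if __name__ == "__main__":
    contig = "ACGTNNRYACG"
    print(round(ambiguous_fraction(contig), 3), split_at_ambiguous(contig))
```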
Note that four genomic sequences are available for "Scytonema millei" strain VB511283; one (JTJC00000000.1) is not of cyanobacterial origin or is chimeric; the single 16S rRNA sequence extracted from the second (JTJC00000000.2) lies with other Scytonema strains in genus 1.19.11; the third (JTJC00000000.3) and fourth (QVFW00000000) fall into Chroococcidiopsis species 3.1.14D, described above. The latter sequences share 99.93% 16S rRNA sequence identity, 100% isDDH and 100% ANI.
Spirulina
Two strains of Spirulina, PCC 6313 and PCC 9445, carrying
different specific epithets (major and subsalsa), belong to clearly
separate genera, sharing only 92.4% 16S rRNA sequence identity, 69.26% ANI, 88.16%
CGS and an isDDH value of
20.3%. Although isDDH values are not discriminatory in this range, the ANI and
CGS values confirm the generic separation of these strains. They are shown in
the 16S rRNA tree in genera 5.1.7.19 and 5.1.7.16, respectively. A third
genomic sequence, Spirulina major CCY15215, lies in genus 5.1.7.17. This
shows only 19.6% isDDH, 68.84% ANI and 88.41% CGS with strain PCC 9445, and
23.6% isDDH, 67.97% ANI and 87.14% CGS with strain PCC 6313, and is thereby confirmed as
representing a third genus. Cyanobacteria bacterium SBLK, represented by a draft metagenome,
shows 78.81% ANI with strain CCY15215 and is possibly a second species of the
same genus; the CGS value of 94.59% is on the borderline between species and genus
discrimination, and the isDDH value of 22.0% is not discriminatory. This
metagenome lacks an rrn operon and is consequently excluded from the 16S
rRNA tree. The metagenome Spirulina sp. SIO3F2 also lacks an rrn
operon; it represents a fourth genus, showing maximum values of 23.6% isDDH,
69.54% ANI and 87.87% CGS with members of the other genera. Two incomplete
metagenomic sequences, Spirulina spp. DLM2.Bin59 and S17.Bin059, have
been excluded from all trees.
Stanieria
Two Stanieria spp. strains (NIES-3757 and PCC 7437, genus
5.1.5.10) with completed chromosomes share 98.72% 16S rRNA sequence identity,
and fall into species 5.1.5.10B and 5.1.5.10A, respectively. Their genomes show
56.1% isDDH (GGDC formula 3), 89.64% ANI, 91.31% Mash similarity and 98.61% CGS, confirming their specific separation.
Synechococcus
In the mono-specific genus 7.2.1, Synechococcus elongatus
strains PCC 6301 (2 sequences), PCC 6311, PCC 7942 (2 sequences), PCC 7943 and UTEX 3055, together with Synechococcus sp. strain
UTEX 2973 (all complete chromosome sequences), as well as strain FACHB-1061 and a third sequence of strain PCC 7942 (draft sequences), are identical, showing 100% 16S rRNA sequence identity,
96-100% isDDH, 98.31-99.99%
ANI, 98.74-99.93% Mash similarity (for the complete chromosomes only) and 99.88-100%
CGS. Strain UTEX 2973 is a mutant of strain UTEX 625, of which the axenic
isolate is strain PCC 6301; the identity values for these strains are therefore
not surprising. Strain UTEX 3055 was isolated from the same habitat as strain
PCC 6301 (Waller's Creek, Texas, USA), but over 50 years later. We have been
unable to confirm the origin of strain PCC 7942. Strain PCC 11802 (draft genome
sequence) is not shown in the 16S rRNA tree since the genome lacks 16S rRNA
genes; values of 25.5% isDDH, 82.20% ANI and 98.44% CGS place this strain and PCC
6301 into different species. The draft genome of strain PCC 11801 shows only
52.4% isDDH, 98.47% CGS and
82.19% ANI with strain PCC 6301; the genome of strain PCC 11801 contains only a
short (618 nt) 16S rRNA fragment that shares 99.35% identity with that of
strain PCC 6301 over the comparable region. Since the strain is similar in
genome size to the other members of this cluster (2.69 Mbp to 2.77 Mbp), the
low genome metric values must indicate that it has both duplicated and missing
regions. Strain FACHB-242 is UTEX 625, from which the axenic strain PCC 6301
was derived. Prochlorothrix hollandica strains CALU 1027 and PCC
9006, which are identical isolates held in different culture collections, in
the single species of genus 7.3.1, share 100% rRNA sequence identity. Since the
genome sequence of strain CALU 1027 is incomplete, a DDH value was not computed
and the sequence was omitted from the genome tree.
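Identity values for a short fragment, such as the 618 nt 16S rRNA fragment of strain PCC 11801, are only meaningful over the region both sequences cover. The sketch below computes percent identity between two already aligned sequences (for example, exported from an ARB alignment) and counts only columns in which both sequences have a base. The gap characters ('-' and '.') and the exclusion of internal gap columns are assumptions. They are not necessarily the convention used for the values in this text.

```python
GAP_CHARS = "-."


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two aligned sequences over their shared (comparable) region."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x in GAP_CHARS or y in GAP_CHARS:
            continue  # outside the shared region, or a gap in either sequence
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # A short fragment (leading gaps) compared with a longer sequence
    print(round(percent_identity("--ACGTAC", "GGACGTTC"), 2))
```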
Two genomes from Synechococcus sp. strains PCC 7502 (complete
chromosome) and PCC 9635 (draft genome) in genus 5.4.1.4, falling
among the "Pseudanabaena" strains, share 99.73% 16S rRNA
sequence identity but only 24.1% isDDH, 79.14% ANI and 96.29% CGS. The genome metrics therefore suggest that the strains are not con-specific. However, the metrics may be artificially low, since the genome of strain PCC 9635 contains extensive duplications, as detected by CheckM and further evidenced by its larger genome size (4.81 Mbp versus 3.51 Mbp for strain PCC 7502).
Unlike the situation in Thermosynechococcus described
above (see section on Extremophiles), a second cluster of thermophilic Synechococcus spp. shows a
speciation pattern that is coherent with all methods of homology measurement. The
separation of Synechococcus spp. JA-3-3Ab (species 10.1.1A) and
JA-2-3B’a (species 10.1.1B) into two species is justified by their low (96.65%)
16S rRNA sequence identity, isDDH
(44.9%), ANI (84.61%), Mash similarity (89.44%) and CGS (96.19%) values of
their complete chromosome sequences. Strain JA-3-3Ab shares 99.86-100% 16S rRNA
sequence identity and isDDH values of 82.9-93.4% with the draft genomes of 6
other thermophilic members of the same species. Genus 10.1.2 contains only a single strain (S. bigranulatus Rupite) whose metrics can be compared in the ribosomal and genomic trees; this shares 94.39-94.80% 16S rRNA sequence identity, 20.8% isDDH and 76.74-77.12% ANI with the members of genus 10.1.1. All of these strains have a 17 nt deletion in helix 49 of the 16S rRNA sequence. The distance estimates from the
different methods therefore agree for this cluster. This is also true for the Thermoleptolyngbya spp. described above.
Tolypothrix
Of the few genome sequences of strains assigned to the genus Tolypothrix, that of Tolypothrix sp. strain PCC 9009 (in species 1.14.1E, draft genome) clusters with Hassallia byssoides strain VB512170, also represented by a draft genome sequence, in the same species (99.19% 16S rRNA sequence identity, but low values of 40.9% isDDH,
87.14% ANI and 98.34% CGS); note that the latter genome
is contaminated by bacterial DNA and is far larger (13 Mbp) than expected (e.g.
8 Mbp for PCC 9009). Strain PCC 9009 also shows low relatedness to Tolypothrix
sp. NIES-4075 in species 1.14.1A (isDDH 38.2%, ANI 88.07%, CGS 98.39%, 98.64% 16S rRNA sequence identity), confirming their assignment to different species. Tolypothrix sp. strain PCC 7601 (in species 1.8.4A, draft genome) shows only 23.1% isDDH
(not discriminatory in this range), 74.98% ANI (with 38% coverage),
94.42% CGS and 95.80% 16S rRNA sequence identity with strain PCC 9009, confirming
them to be members of different genera. The members of species 1.8.4A (strains Tolypothrix sp. PCC 7601, represented by a complete and a draft genome sequence; PCC 7712, completed genome; LEGE 11397 and Fremyella diplosiphon NIES-3275, both draft) show 99.85-100% 16S rRNA sequence identity,
54.2-99.7% isDDH, 93.5-99.9% ANI, 99.49-100% CGS and are therefore con-specific, despite
carrying two different generic names. Strain PCC 7601 is separated at the
specific level from Nostoc sp. 106C (species 1.8.4B, draft genome), Calothrix
spp. NIES-2098 (species 1.8.4C, complete chromosome) and NIES-2100 (species
1.8.4D, draft genome) by low (96.8-98.8%) 16S rRNA sequence identity, low isDDH values of 27.5-28.1% (not discriminatory in this range), ANI values of only 80.88-81.16% and CGS 97.35-98.03%. Strain NIES-2100 is separated at the specific level from Tolypothrix sp. FACHB-123 (the single member of species 1.8.4J): 97.87% 16S rRNA sequence identity, 31.1% isDDH, 84.68% ANI, 98.22% CGS. The incomplete genome sequence of Nostoc carneum NIES-2107, containing
only 75 of the 79 genes in our standard marker set, has been excluded from DDH
comparison and from the genome tree.
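Several genomes in this survey are excluded from the genome tree because they lack part of the 79-gene marker set. Examples are Nostoc carneum NIES-2107 (75 genes), Nostoc NIES-25 (47) and Richelia HM01 (63). The exact completeness threshold is not stated here. For illustration only, the sketch below assumes that the full set is required, and it uses hypothetical string labels for the marker genes.

```python
def marker_completeness(found: set, marker_set: set) -> float:
    """Percentage of the standard marker genes detected in a genome."""
    if not marker_set:
        raise ValueError("marker set is empty")
    return 100.0 * len(found & marker_set) / len(marker_set)


def include_in_genome_tree(found: set, marker_set: set, min_fraction: float = 1.0) -> bool:
    """Hypothetical inclusion rule: keep a genome only if it carries at least
    min_fraction of the marker genes (default assumption: the complete set)."""
    return len(found & marker_set) >= min_fraction * len(marker_set)


if __name__ == "__main__":
    markers = {f"marker_{i}" for i in range(79)}   # hypothetical labels for the 79 markers
    genome = {f"marker_{i}" for i in range(75)}    # a genome carrying 75 of them
    print(round(marker_completeness(genome, markers), 1), include_in_genome_tree(genome, markers))
```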
Tolypothrix campylonemoides VB511288 is represented by two versions of the genome sequence. The first (JXCB00000000.1) contains no rrn operon and
cannot be shown in the tree based on 16S rRNA sequences, but does group with
the above strains in the genome tree. Of the three 16S rRNA genes of the second
version (JXCB00000000.2), one unfortunately shows high similarity to Rhodopseudomonas
spp., and the others contain uncharacteristic segments and are impossible to align.
Other strains, previously named as Calothrix,
were studied in wet-lab DDH experiments by Lachance (1981) and assigned to the
genus Tolypothrix; the strains then available formed several clusters,
as we observe in the tree: strains PCC 7101, PCC 7504, PCC 7601 and PCC 7710 (species
1.8.4A of the tree) with strain PCC 7708 (in species 1.8.4B); strains PCC 6305
and PCC 6601 (species 1.8.3A).
Trichormus
Trichormus strains are grouped in at least 2 generic clusters
in the 16S rRNA tree (1.1.4 and 1.10.1). Single strains are also found in 1.2.6B
(Sherwood et al., unpublished), 1.3.7B (Miscoe et al., 2016), 1.4.1A
(Johansen et al., unpublished), 1.12.1B (Miscoe et al., 2016) and 1.13.4 (Rajaniemi
et al., 2005a). This makes a total of 7 generic clusters in 7 families. Rajaniemi
et al. (2005a) remarked on the polyphyly of this "genus".
Komárek and Anagnostidis (1989) transferred Anabaena variabilis
(and many other species) into the genus Trichormus, with T. variabilis as type
species. Two genomes
(Trichormus sp. NMC-1 and Anabaena sp. PCC 7108, both draft and from
saline habitats) and the 16S rRNA sequences of 13 strains (named as Anabaena sp. and Trichormus
sp., many as T. variabilis), mostly isolated from freshwater
habitats, fall into genus 1.1.4. The draft genome of T. variabilis SAG 1403-b also fits into
this genus in the genome tree, but cannot be shown in the 16S rRNA tree. The
genomes share 89.42% ANI, 41.2% isDDH and 97.11% 16S rRNA sequence identity, as
expected from their placement into two species. The con-specificity of Anabaena
sp. strains PCC 6309 and PCC 7122 (species 1.2.9A in the tree) and their
separation from strain PCC 7108 (species 1.1.4B) with only 32% DDH and high
(11°C) ΔTm(e) of the hybrids was
shown by thermal hydroxyapatite elution DDH studies (Lachance, 1981).
Eight genomes of T. variabilis
are in the mono-specific genus 1.10.1: 6 (strains 9RC, ARAD, FSR, N2B, PNB and
V5, from Azolla caroliniana, A. filiculoides, A. pinnata and Azolla sp.;
Pratte and Thiel, 2021) are cyanobionts, 1 (strain 0441) is from a thermal
spring and 1 (strain A2) is of unspecified origin. Other genomes in this genus are: Anabaena sp. YBSO1 (terrestrial); Anabaena
variabilis NIES-23 (no information on habitat; listed in NCBI as A. variabilis
in the TITLE field of the flatfile, but as Trichormus in the ORGANISM field); Nostoc
sp. PCC 7937 (= ATCC 29413, T. variabilis; = UTEX 1444, A. variabilis;
freshwater; listed in NCBI as A. variabilis in the TITLE field of the flatfile, but as Trichormus in the ORGANISM field, although RefSeq has changed the TITLE to T. variabilis;
isolated as A. flos-aquae and received by the PCC from C.P. Wolk as A.
variabilis; sensitive to cyanophage N1, like PCC 7120, and thus closely
related to it, but, unlike PCC 7120, forms hormogonia); Nostoc sp. PCC 7120 (freshwater); and the metagenomes Nostoc sp. Hyl-06-6-4 and Pleu-09-321, both from organisms epiphytic on moss. The genomes of Trichormus sp. strains 9RC, ARAD, FSR, N2B, PNB, V5 and A2 show 100% 16S rRNA sequence identity, 99.9% isDDH, 99.81-99.98% ANI and 99.99-100% CGS with each other and are
also identical to Nostoc sp. PCC 7937, Anabaena sp. YBSO1 and T.
variabilis 0441; they share slightly lower 16S rRNA sequence identity values of 98.92%
with Nostoc sp. PCC 7120, 98.92% with Anabaena variabilis NIES-23 and 98.99% with the identical Hyl-06-6-4 and Pleu-09-321, and are therefore con-specific. This is confirmed by isDDH (63.2-100%), ANI (92.15-99.99%) and CGS (99.50-99.82%) values.
Tychonema
Eight metagenomic sequences each named as Oscillatoriales
cyanobacterium in species 5.3.1.1B share 99.33-100% 16S rRNA sequence identity
and 85.7-99.3% isDDH. Two further metagenomic sequences, in species 5.3.1.1J, have identical 16S rRNA sequences and show 98.1% isDDH, 99.85%
ANI; they separate from the members of the preceding species with 98.84-99.30%
16S rRNA sequence identity, ~27.5% isDDH and ~85% ANI. Species 5.3.1.1B also contains four draft genomes of Tychonema
sp., only three of which are shown in the genome tree, sharing 99.06-100% 16S
rRNA sequence identity, 47.0-65.8% isDDH, 90.37-95.61% ANI and 98.36-99.32% CGS.
The genome metrics therefore place the strains on the borderline of species
separation. These strains share 97.92-98.19% 16S rRNA sequence identity, 27.8-28.3%
isDDH (ambiguous in this range), 82.37-82.69% ANI and 96.38-96.42% CGS with
strain FEM_GT703 and the metagenome B0820 in species 5.3.1.1H, confirming separation
at the species level. Although the two latter genomes show only 98.64% 16S rRNA sequence identity (placing them on the borderline of species separation), they share 96.7% isDDH, 99.62% ANI and 99.73% CGS.
Umezakia
McGregor et al. (2023) demonstrated that Umezakia natans TAC611 shows 99.92% 16S rRNA identity with Chrysosporum ovalisporum strains, and combined the two genera to give U. ovalisporum (type species: Umezakia ovalisporum (Forti) McGregor, Sendall, Niiyama, Tuji & Willis); a reference strain was not given. These strains are in species 1.3.3A of our 16S rRNA phylogeny, together with 10 strains plus one most likely misnamed as U. bergii whose 16S rRNA sequences were obtained by specific PCR. The strains are confirmed as con-specific by values of 99.33-99.97% ANIb, 96.1-99.6% isDDH and 99.95-99.98% CGS. U. ovalisporum FSS62 (McGregor et al.) is not included in the genome tree.
Since the authors found a fastANI value of 83% between the genome of C. bergii strain ANA360D and those of all members of C. ovalisporum, they retained C. bergii in a distinct genus, Chrysosporum. We find JSpecies ANI values of 83.00-83.25%, but place the genome not in a separate genus but in a separate species (1.3.3B), whose members we have renamed Umezakia bergii. This placement is supported by CGS values of 96.10-96.12%; the isDDH values (27.2-27.7%) are ambiguous. The 16S rRNA gene extracted from the genome of C. bergii strain ANA360D falls into the bacterial outgroup of our 16S rRNA phylogeny, whereas the 23S rRNA gene falls into the U. ovalisporum cluster, indicating that the assembly contains contaminating bacterial DNA. If the contaminating regions are large, the true similarity will be higher than the values given above. In the absence of a cyanobacterial 16S rRNA gene from this genome, we show in this cluster the 16S rRNA sequences of 8 strains obtained by specific PCR, including that of ANA360D. These share 98.29-100% 16S rRNA sequence identity.
Species 1.3.3C contains a single strain, the original Umezakia natans TAC101 (AY897614). Unfortunately, this strain was lost and replaced by isolate TAC611 (Niiyama et al., 2011), now represented by three 16S rRNA sequences obtained by specific PCR (AB608023, MT566431, MT566432) and by the sequence extracted from the genome sequence of McGregor et al. (2023) (JANQDI000000000). The morphological characteristics of strain TAC611 were said to be "in a good agreement with the original description of U. natans." Strain TAC101 is not included in the phylogenetic tree of Niiyama et al. (2011). The 16S rRNA sequence of strain TAC101 (only 910 nt) shows 97.58-97.69% sequence identity with those of strain TAC611, indicating that the two strains are not identical but are members of different species. A longer (1418 nt) sequence of strain TAC101 (AF516748) is unfortunately chimeric, and is not included in the 16S rRNA tree.
Uncultured cyanobacteria
Within genus 5.2.4.9, the 23 metagenomes have identical 16S rRNA sequences (100% sequence identity) and share 93.1-100% isDDH; they are separated at the generic level from the sister taxon QS_8_64_29 (90.26% 16S rRNA sequence identity, 19.3% isDDH, 67.57% ANI). These genomes are not shown in our genome tree.
Species 5.3.1.1E contains two metagenomes (not shown in the genome tree) named Microcoleaceae bacterium UBA9251 and UBA11344; these show 64.0% isDDH and 94.41% ANI, and their extracted 16S rRNA genes share 99.4% sequence identity. The data therefore confirm the con-specificity of the uncultured organisms. However, the 16S rRNA sequence extracted from the UBA11344 genome is short (1115 nt), revealing the fragmented nature of this genome.
C: Conclusions
There is good agreement between 16S rRNA sequence identity and genome metrics for many of the strain clusters. Discrepancies mostly involve draft genomes and metagenomes, which may be incomplete or contaminated.
Using the 16S rRNA gene alone as a marker, we can show that many draft genomes are incomplete because they lack this gene entirely. These include Cyanobium sp. CACIAM 14 (JMRP00000000), Prochlorococcus marinus SCGC AAA795-J16 (CVSX00000000), Prochloron didemni P2-Fiji (from JGI), Hydrococcus rivularis NIES-593 (MRCB00000000), Hydrocoleum sp. CS-953 (LGSU00000000), Nostoc calcicola FACHB-389 (MRBZ00000000), Calothrix elsteri CCALA 953 (NTFS00000000), Calothrix sp. NIES-2101 (MRCD00000000), Phormidium ambiguum NIES-2119 (MRCE00000000), Leptolyngbya valderiana BDU 20041 (LSYZ00000000), Leptolyngbya sp. JSC-1 (JMKF00000000), Trichodesmium thiebautii H9-4 (LAMW00000000), Anabaena sp. 39858 (LJOR00000000), Anabaena sp. CRK533 (LJOT00000000), Anabaena sp. AL93 (LJOU00000000), Cylindrospermopsis sp. CR12 (LMVE00000000), Pseudanabaena sp. SR411 (NDHW00000000), Microcystis spp. strains LE013-01 (MTBU00000000), LSC13-02 (MTBT00000000) and LE3 (MTBS00000000), and Cyanobacteria bacterium UBA791 (DBIX00000000).
Additionally, many genomes contain only a short fragment of the gene, including those of Crocosphaera watsonii WH 0003 (AESD00000000), Gloeocapsa sp. PCC 73106 (ALVY00000000), Parasynechococcus sp. CB0101 (ADXL00000000), Synechocystis sp. PCC 7509 (ALVU00000000), Leptolyngbya sp. Heron Island (JAWNH00000000) and the Candidatus Synechococcus spongiarum isolates mentioned above.
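The two completeness checks above (gene absent, or only a short fragment present) can be expressed as a small classification. This is a sketch under stated assumptions: the input is the list of lengths of the 16S rRNA copies extracted from a genome, and the 1200 nt threshold separating a fragment from a near-full-length copy is hypothetical, chosen only so that the 1115 nt UBA11344 sequence counts as short.

```python
from typing import Sequence

# Hypothetical threshold (nt): extracted copies shorter than this are treated
# as fragments. Complete 16S rRNA genes are roughly 1,500 nt long.
MIN_FULL_LENGTH_NT = 1200


def rrn_status(copy_lengths: Sequence[int]) -> str:
    """Flag a genome from the lengths of the 16S rRNA copies extracted from it."""
    if not copy_lengths:
        # Following the text, a genome lacking the gene entirely is
        # treated as incomplete.
        return "incomplete: 16S rRNA gene absent"
    if max(copy_lengths) < MIN_FULL_LENGTH_NT:
        return "incomplete: 16S rRNA fragment only"
    return "16S rRNA gene present"


print(rrn_status([]))           # no copy extracted
print(rrn_status([1115]))       # length reported above for UBA11344
print(rrn_status([640, 1490]))  # hypothetical genome with one full-length copy
```

Only the longest copy is tested, so a genome with one fragment and one complete copy is not flagged. A short copy alongside a full-length one is more likely an assembly artefact than evidence of a missing operon.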
Contamination is evident when at least one copy of the 16S or 23S rRNA gene derives from a non-cyanobacterial bacterium or from an unrelated cyanobacterium, as in Anabaena aphanizomenoides LEGE 00250 (JADEWB000000000), Anabaena sp. MDT14b (LJOV00000000), Aphanocapsa montana BDHKU210001 (JTJD00000000), Chrysosporum ovalisporum UAM-MAO (CDHJ00000000), Coleofasciculus sp. LEGE 07081 (JADEWW000000000), Hydrococcus rivularis NIES-593 (MRCB00000000), Lyngbya confervoides BDU141951 (JTHE00000000), Mastigocladus laminosus UU774 (JXIJ00000000), Nodosilinea sp. LEGE 07088 (JADEWX000000000), Nodularia sp. LEGE 06071 (JADEWH000000000), Phormidium tenue NIES-30 (MRCG00000000), Planktothrix mougeotii LEGE 06226 (JADEWU000000000), Prochlorothrix hollandica CALU 1027 (AJTX02000000), Scytonema tolypothrichoides VB-61278 (JXCA00000000) and Tolypothrix campylonemoides VB511288 (JXCB00000000).
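The contamination rule stated above can be written as a simple filter over the rRNA copies of one genome. This sketch assumes that each extracted copy has already been assigned the phylum and genus of its closest reference sequence (for example, by a similarity search against a curated database). That assignment step, the `RrnCopy` record, the set of "related" genera and the example genome are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class RrnCopy:
    gene: str    # "16S" or "23S"
    phylum: str  # phylum of the closest reference sequence (assumed input)
    genus: str   # genus of the closest reference sequence (assumed input)


def contamination_flags(copies: Iterable[RrnCopy], related_genera: Set[str]) -> List[str]:
    """Report every rRNA copy whose closest match points to foreign DNA."""
    flags = []
    for rrn in copies:
        if rrn.phylum != "Cyanobacteria":
            flags.append(f"{rrn.gene}: non-cyanobacterial ({rrn.phylum})")
        elif rrn.genus not in related_genera:
            flags.append(f"{rrn.gene}: unrelated cyanobacterium ({rrn.genus})")
    return flags


# Hypothetical Anabaena-like draft genome: the 23S copy matches its own
# relatives, while the two 16S copies match a bacterium and a distant cyanobacterium.
copies = [
    RrnCopy("16S", "Bacteroidota", "Flavobacterium"),
    RrnCopy("16S", "Cyanobacteria", "Synechococcus"),
    RrnCopy("23S", "Cyanobacteria", "Anabaena"),
]
for flag in contamination_flags(copies, related_genera={"Anabaena", "Dolichospermum"}):
    print(flag)
```

A single flagged copy is enough to mark the genome as contaminated, matching the "at least one copy" criterion in the text. Defining which genera count as related is the judgment call that the phylogenetic placement of each copy resolves in practice.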
The above examples use only the rrn operon to demonstrate incomplete status. The CheckM programme finds additional incomplete genomes not listed above, for example Synechococcus sp. GFB01 (LFEK00000000), 84.81% complete; Aphanizomenon flos-aquae 2012/KM1/D3 (JSDP00000000), 87.52%; Calothrix sp. XPORK 5E (from JGI), 81.69%; unidentified endosymbiont of Epithemia turgida (AP012549), 84.86%; Prochlorococcus sp. HOT208 60m 813E23 (MWOT00000000), 25.86%; and unidentified cyanobacterium 13 1 40CM 2 61 4 (MNHI00000000), 13.79%. We note that the genome of the Epithemia turgida symbiont was deposited in NCBI as complete.
CheckM also detects contamination in many genomes; where this exceeds 10%, the sequences have been excluded from the genome tree. These include Fischerella thermalis CCMEE 5319 (NMQD00000000), Leptolyngbya valderiana BDU 20041 (LSYZ00000000), Limnoraphis robusta CS-951 (LATL00000000), Microcystis aeruginosa strain SPC777 (ASZQ00000000) and Mastigocoleus testarum BC008 (AXAQ00000000). Other genomes are heavily contaminated with bacterial DNA and fall into the bacterial outgroup of genome trees, for example Aphanocapsa montana BDHKU210001 (JTJD00000000, version 1), Chrysosporum ovalisporum UAM-MAO (CDHJ00000000), Cyanobacterium TDX16 (NDGV00000000), Fischerella ambigua UTEX 1903 (obtained from the author), Oscillatoriales cyanobacterium MTP1 (LNAA00000000) and Scytonema millei VB511283 (1 of 2 genomic sequences: JTJC00000000.1). Full listings are given in the excluded genomes documents.
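The exclusion rule above (contamination above 10% removes a genome from the genome tree) can be applied mechanically to a table of CheckM estimates. This sketch assumes a tab-separated table with `Bin Id`, `Completeness` and `Contamination` columns, as in CheckM's tabular summary output. The 90% completeness flag is a hypothetical threshold, and the text does not say how incomplete genomes were handled, so the sketch merely lists them. The genome names and values are invented for illustration.

```python
import csv
import io
from typing import List, Tuple

MAX_CONTAMINATION = 10.0  # exclusion threshold stated in the text (%)
MIN_COMPLETENESS = 90.0   # hypothetical flagging threshold (%), not from the text


def screen_checkm(table_text: str) -> Tuple[List[str], List[str], List[str]]:
    """Split genomes into kept and excluded lists, and list kept genomes below the completeness flag."""
    kept, excluded, incomplete = [], [], []
    for row in csv.DictReader(io.StringIO(table_text), delimiter="\t"):
        name = row["Bin Id"]
        if float(row["Contamination"]) > MAX_CONTAMINATION:
            excluded.append(name)  # too contaminated for the genome tree
            continue
        kept.append(name)
        if float(row["Completeness"]) < MIN_COMPLETENESS:
            incomplete.append(name)  # flagged only; handling is an assumption
    return kept, excluded, incomplete


# Hypothetical genomes and values, for illustration only.
table = (
    "Bin Id\tCompleteness\tContamination\n"
    "genome_A\t99.10\t0.85\n"
    "genome_B\t84.81\t1.20\n"
    "genome_C\t97.40\t14.60\n"
)
print(screen_checkm(table))
```

Contamination is tested first, so a genome that is both contaminated and incomplete is reported only as excluded.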
There are almost certainly other genomes in our dataset that we have been unable to identify as incomplete, and some may contain undetected contaminant DNA. Other possible causes of conflict include marked differences in genome size or mol% G+C content, and major or multiple deletions. All of these factors will lower the values of the genome metrics. Where relatedness estimates based on 16S rRNA sequence identity and on genome metrics diverge dramatically, one or both of the genomes compared are often still at the draft stage.
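Why a difference in genome size lowers some genome metrics can be seen from the two GGDC normalisations described in the Methods. The sketch below computes only the raw ratios: sum of identities over total HSP length (formula-2-style) and over the total length of both genomes (formula-3-style). The numbers are hypothetical, and the GGDC server further converts such values into distances and DDH estimates with a calibrated model that is not reproduced here.

```python
from typing import Sequence


def identity_over_hsp(identities: int, hsp_length: int) -> float:
    """Formula-2-style ratio: identical bases / total HSP length."""
    return identities / hsp_length


def identity_over_genomes(identities: int, genome_lengths: Sequence[int]) -> float:
    """Formula-3-style ratio: identical bases / total length of both genomes."""
    return identities / sum(genome_lengths)


# Hypothetical pair: two 5 Mb genomes, 9 Mb of HSPs (summed over both
# directions) containing 8.1 Mb of identical bases.
identities, hsp_length = 8_100_000, 9_000_000
print(identity_over_hsp(identities, hsp_length))                  # 0.9
print(identity_over_genomes(identities, [5_000_000, 5_000_000]))  # 0.81

# Add 1 Mb with no counterpart to the second genome (a size difference or
# foreign DNA): the HSPs are unchanged, so only the genome-length ratio falls.
print(round(identity_over_genomes(identities, [5_000_000, 6_000_000]), 3))  # 0.736
```

The HSP-normalised ratio ignores sequence that does not align. This is why it tolerates draft and size-mismatched genomes, but it may also overestimate relatedness near the cutoffs.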
Leptolyngbya sp. PCC 7376 is a special case worthy of
further investigation; this may be the first clear example of HGT involving the
acquisition, by a filamentous organism, of the 16S rRNA gene from a unicellular
organism (see discussion of species 5.1.7.9, above). The alternative
possibility, that an ancestor was unicellular, is equally plausible.
In addition, the con-specificity of Synechocystis sp. strains PCC 6714 and PCC 6803 (99.60% 16S rRNA sequence identity) is not confirmed by the genome metrics of their complete chromosomal sequences; although an HGT event involving the ancestors of both groups may have occurred, further studies are required to confirm this possibility. Also, the planktonic cyanobacteria of the genus Prochlorococcus, the OMF sister group of Prochlorococcus, and the halophiles show major conflicts between the genetic distances obtained from 16S rRNA sequence identity and from genome metrics. These conflicts are discussed in the relevant paragraphs on this page. The two isolates of Gloeobacter are also unusually divergent; we have no explanation for this other than the possible contamination of the genome of strain JS1, mentioned above.
We should bear in mind the statement of Hayashi Sant’Anna et al. (2019), in which the term "dDDH" corresponds to our isDDH: "The species circumscription threshold of many genomic metrics such as ANI and dDDH was
defined using the wet-lab DNA-DNA hybridization criterion as reference, which
in turn was defined by observing phenotypic coherence of strains.
Paradoxically, along the years studies exploring bacterial diversity showed
that genomic cohesion is not always accompanied by phenotypic homogeneity.
Therefore, it is clear that genomic metrics are not a panacea, and they could
not be appropriate in some cases, particularly when circumscribing a
genomospecies (species defined based on genome data), whose members have
undergone ecological diversification and substantial genome modifications (e.g.
reduction), forming subgroups that could be interpreted as distinct species".
D: References.
Achaz, G., Boyer, F., Rocha, E.P.C., Viari, A. and Coissac, E. (2006). Repseek, a tool to retrieve approximate repeats from large DNA sequences. Bioinformatics 23: 119–121.
Aguilera, A., Gómez, E.B., Kaštovský, J., Echenique, R.O. and Salerno, G.L. (2018). The polyphasic analysis of two native Raphidiopsis isolates supports the unification of the genera Raphidiopsis and Cylindrospermopsis (Nostocales, Cyanobacteria). Phycologia 57: 130–146.
Bacchetta, T., López-García, P., Gutiérrez-Preciado, A., Mehta, N., Skouri-Panet, F., Benzerara, K., Ciobanu, M., Yubuki, N., Tavera, R. and Moreira, D. (2023). Description of Gloeomargarita ahousahtiae sp. nov. (Gloeomargaritales), a thermophilic cyanobacterium with intracellular carbonate inclusions. European Journal of Phycology. https://doi.org/10.1080/09670262.2023.2216257
Cheng, Y.-I., Chou, L., Chiu, Y.-F., Hsueh, H.-T., Kuo, C.-H. and Chu, H.A. (2020). Comparative genomic analysis of a novel strain of Taiwan hot-spring cyanobacterium Thermosynechococcus sp. CL-1. Frontiers in Microbiology 11: article 82.
Chisholm, S.W., Frankel, S.L., Goericke, R., Olson, R.J., Palenik, B., Waterbury, J.B., West-Johnsrud, L. and Zettler, E.R. (1992). Prochlorococcus marinus nov. gen. nov. sp.: an oxyphototrophic marine prokaryote containing divinyl chlorophyll a and b. Archives of Microbiology 157: 297–300.
Coutinho, F., Tschoeke, D.A., Thompson, F. and Thompson, C. (2016a). Comparative genomics of Synechococcus and proposal of the new genus Parasynechococcus. PeerJ 4: e1522.
Coutinho, F., Dutilh, B., Thompson, C. and Thompson, F. (2016b). Proposal of fifteen new species of Parasynechococcus based on genomic, physiological and ecological features. Archives of Microbiology 198: 973–986.
Dreher, T.W., Davis, E.W. and Mueller, R.S. (2021a). Complete genomes derived by directly sequencing freshwater bloom populations emphasize the significance of the genus level ADA clade within the Nostocales. Harmful Algae 103: 102005.
Dreher, T.W., Davis, E.W., Mueller, R.S. and Otten, T.G. (2021b). Comparative genomics of the ADA clade within the Nostocales. Harmful Algae 104: 102037.
Driscoll, C.B., Meyer, K.A., Šulčius, S., Brown, N.M., Dick, G.J. and 8 other authors (2018). A closely-related clade of globally distributed bloom-forming cyanobacteria within the Nostocales. Harmful Algae 77: 93–107.
Gaget, V., Welker, M., Rippka, R. and Tandeau de Marsac, N. (2015). A polyphasic approach leading to the revision of the genus Planktothrix (Cyanobacteria) and its type species, P. agardhii, and proposal for integrating the emended valid botanical taxa, as well as three new species, Planktothrix paucivesiculata sp. nov.ICNP, Planktothrix tepida sp. nov.ICNP, and Planktothrix serta sp. nov.ICNP, as genus and species names with nomenclatural standing under the ICNP. Systematic and Applied Microbiology 38: 141–158.
Gagunashvili, A.N. and Andrésson, O.S. (2018). Distinctive characters of Nostoc genomes in cyanolichens. BMC Genomics 19: 434.
Giovannoni, S.J., Cameron Thrash, J. and Temperton, B. (2014). Implications of streamlining theory for microbial ecology. The ISME Journal 8: 1553–1565.
Grettenberger, C.L. (2021). Novel Gloeobacterales spp. from diverse environments across the globe. mSphere 6: 4.
Grettenberger, C.L., Sumner, D.Y., Wall, K., Brown, C.T., Eisen, J.A. and 4 other authors (2020). A phylogenetically novel cyanobacterium most closely related to Gloeobacter. The ISME Journal 14: 2142–2152.
Hayashi Sant’Anna, F., Bach, E., Porto, R.Z., Guella, F., Hayashi Sant’Anna, E. and Passaglia, L.M.P. (2019). Genomic metrics made easy: what to do and where to go in the new era of bacterial taxonomy. Critical Reviews in Microbiology 45: 182–200.
Humbert, J.-F., Barbe, V., Latifi, A., Gugger, M., Calteau, A. and 9 other authors (2013). A tribute to disorder in the genome of the bloom-forming freshwater cyanobacterium Microcystis aeruginosa. PLoS ONE 8: e70747.
Jain, C., Rodriguez, R.L.M., Phillippy, A.M., Konstantinidis, K.T. and Aluru, S. (2018). High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9: 5114.
Jung, P., D’Agostino, P.M., Brust, K., Büdel, B. and Lakatos, M. (2021). Final destination? Pinpointing Hyella disjuncta sp. nov. PCC 6712 (Cyanobacteria) based on taxonomic aspects, multicellularity, nitrogen fixation and biosynthetic gene clusters. Life 11: 916.
Komárek, J. and Anagnostidis, K. (1989). Modern approach to the classification system of cyanophytes. 4 – Nostocales. Arch. Hydrobiol. Suppl. 82: 247–345.
Komárek, J., Johansen, J.R., Šmarda, J. and Strunecký, O. (2020). Phylogeny and taxonomy of Synechococcus-like cyanobacteria. Fottea 20: 171–191.
Konstantinou, D., Popin, R.V., Fewer, D.P., Sivonen, K. and Gkelis, S. (2021). Genome reduction and secondary metabolism of the marine sponge-associated cyanobacterium Leptothoe. Marine Drugs 19: 298.
Kopf, M., Klahn, S., Pade, N., Weingartner, C., Hagemann, M., Voss, B. and Hess, W.R. (2014a). Comparative genome analysis of the closely related Synechocystis strains PCC 6714 and PCC 6803. DNA Research 21: 255–266.
Kopf, M., Klahn, S., Voss, B., Stuber, K., Huettel, B., Reinhardt, R. and Hess, W.R. (2014b). Finished genome sequence of the unicellular cyanobacterium Synechocystis sp. strain PCC 6714. Genome Announcements 2: e00757-14.
Lachance, M.A. (1981). Genetic relatedness of heterocystous cyanobacteria by deoxyribonucleic acid-deoxyribonucleic acid reassociation. International Journal of Systematic Bacteriology 31: 139–147.
Laloui, W., Palinska, K.A., Rippka, R., Partensky, F., Tandeau de Marsac, N., Herdman, M. and Iteman, I. (2002). Genotyping of axenic and non-axenic isolates of the genus Prochlorococcus and the OMF-'Synechococcus' clade by size, sequence analyses or RFLP of the internal transcribed spacer of the ribosomal operon. Microbiology 148: 453–465.
Leao, T., Castelão, G., Korobeynikov, A., Monroe, E.A., Podell, S., Glukhov, E., Allen, E.E., Gerwick, W.H. and Gerwick, L. (2017). Comparative genomics uncovers the prolific and distinctive metabolic potential of the cyanobacterial genus Moorea. Proceedings of the National Academy of Sciences 114: 3198–3203.
Li, S.-J., Hua, Z.-S., Huang, L.-N., Li, J., Shi, S.-H. and 5 other authors (2014). Microbial communities evolve faster in extreme environments. Scientific Reports 4: 1–9.
Li, X., Huang, Y. and Whitman, W.B. (2015). The relationship of the whole genome sequence identity to DNA hybridization varies between genera of prokaryotes. Antonie van Leeuwenhoek 107: 241–249.
Luz, R., Cordeiro, R., Kaštovský, J., Fonseca, A., Urbatzka, R., Vasconcelos, V. and Gonçalves, V. (2023). Description of Pseudocalidococcus azoricus gen. sp. nov. (Thermosynechococcaceae, Cyanobacteria), a rare but widely distributed coccoid cyanobacteria. Diversity 15: 1157.
McGregor, G.B., Sendall, B.C., Niiyama, Y., Tuji, A. and Willis, A. (2023). Chrysosporum ovalisporum is synonymous with the true-branching cyanobacterium Umezakia natans (Nostocales/Aphanizomenonaceae). Journal of Phycology 59: 326–341.
Mikhodyuk, O.S., Zavarzin, G.A. and Ivanovsky, R.N. (2008). Transport systems for carbonate in the extremely natronophilic cyanobacterium Euhalothece sp. Microbiology 77: 412–418.
Miscoe, L.H., Johansen, J.R., Kociolek, J.P., Pietrasiak, N., Sherwood, A.R. and Vaccarino, M.A. (2016). Diatom flora and cyanobacteria from caves on Kauai, Hawaii. Bibliotheca Phycologica 120: 3–152.
Nakamura, Y., Kaneko, T., Sato, S., Mimuro, M., Miyashita, H. and 14 other authors (2003). Complete genome structure of Gloeobacter violaceus PCC 7421, a cyanobacterium that lacks thylakoids. DNA Research 10: 137–145.
Nelson, J.M., Hauser, D.A., Gudiño, J.A., Guadalupe, Y.A., Meeks, J.C., Allen, N.S., Villarreal, J.C. and Li, F.-W. (2019). Complete genomes of symbiotic cyanobacteria clarify the evolution of vanadium-nitrogenase. Genome Biology and Evolution 11: 1959–1964.
Niiyama, Y., Tuji, A. and Tsujimura, S. (2011). Umezakia natans M.Watan. does not belong to Stigonemataceae but to Nostocaceae. Fottea 11: 163–169.
Ondov, B.D., Treangen, T.J., Melsted, P., Mallonee, A.B., Bergman, N.H., Koren, S. and Phillippy, A.M. (2016). Mash: fast genome and metagenome distance estimation using MinHash. Genome Biology 17: 132.
Oren, A. (2015). Cyanobacteria in hypersaline environments: biodiversity and physiological properties. Biodiversity and Conservation 24: 781–798.
Österholm, J., Popin, R.V., Fewer, D.P. and Sivonen, K. (2020). Phylogenomic analysis of secondary metabolism in the toxic cyanobacterial genera Anabaena, Dolichospermum and Aphanizomenon. Toxins 12: 248.
Ozer, E.A., Allen, J.P. and Hauser, A.R. (2014). Characterization of the core and accessory genomes of Pseudomonas aeruginosa using bioinformatic tools Spine and AGEnt. BMC Genomics 15: 737.
Palmer, M., Steenkamp, E.T., Blom, J., Hedlund, B.P. and Venter, S.N. (2020). All ANIs are not created equal: implications for prokaryotic species boundaries and integration of ANIs into polyphasic taxonomy. International Journal of Systematic and Evolutionary Microbiology 70: 2937–2948.
Parks, D.H., Imelfort, M., Skennerton, C.T., Hugenholtz, P. and Tyson, G.W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research 25: 1043–1055.
Pessi, I.S., Popin, R.V., Durieu, B., Lara, Y., Tytgat, B., Savaglia, V., Roncero-Ramos, B., Hultman, J., Verleyen, E., Vyverman, W. and Wilmotte, A. (2023). Novel diversity of polar Cyanobacteria revealed by genome-resolved metagenomics. Microbial Genomics 9: 001056.
Pratte, B.S. and Thiel, T. (2021). Comparative genomic insights into culturable symbiotic cyanobacteria from the water fern Azolla. Microbial Genomics 7: 000595.
Rahmatpour, N., Hauser, D.A., Nelson, J.M., Chen, P.Y., Villarreal, J.C., Ho, M.-Y. and Li, F.-W. (2021). A novel thylakoid-less isolate fills a billion-year gap in the evolution of Cyanobacteria. Current Biology 31: 2857–2867.e4.
Rajaniemi, P., Hrouzek, P., Kaštovská, K., Willame, R., Rantala, A. and 3 other authors (2005a). Phylogenetic and morphological evaluation of the genera Anabaena, Aphanizomenon, Trichormus and Nostoc (Nostocales, Cyanobacteria). International Journal of Systematic and Evolutionary Microbiology 55: 11–26.
Rajaniemi, P., Komárek, J., Willame, R., Hrouzek, P., Kaštovská, K., Hoffmann, L. and Sivonen, K. (2005). Taxonomic consequences from the combined molecular and phenotype evaluation of selected Anabaena and Aphanizomenon strains. Algological Studies 117: 371–391.
Ran, L., Larsson, J., Vigil-Stenman, T., Nylander, J.A.A., Ininbergs, K. and 5 other authors (2010). Genome erosion in a nitrogen-fixing vertically transmitted endosymbiotic multicellular cyanobacterium. PLoS ONE 5: e11486.
Richter, M. and Rosselló-Móra, R. (2009). Shifting the genomic gold standard for the prokaryotic species definition. Proceedings of the National Academy of Sciences 106: 19126–19131.
Rippka, R., Deruelles, J., Waterbury, J.B., Herdman, M. and Stanier, R.Y. (1979). Generic assignments, strain histories and properties of pure cultures of cyanobacteria. Journal of General Microbiology 111: 1–61.
Salazar, V.W., Tschoeke, D.A., Swings, J., Cosenza, C.A., Mattoso, M., Thompson, C.C. and Thompson, F.L. (2020). A new genomic taxonomy system for the Synechococcus collective. Environmental Microbiology 22: 4557–4570.
Saw, J.H.W., Schatz, M., Brown, M.V., Kunkel, D.D., Foster, J.S. and 6 other authors (2013). Cultivation and complete genome sequencing of Gloeobacter kilaueensis sp. nov., from a lava cave in Kilauea Caldera, Hawai'i. PLoS ONE 8: e76376.
Saw, J.H., Cardona, T. and Montejano, G. (2021). Complete genome sequencing of a novel Gloeobacter species from a waterfall cave in Mexico. Genome Biology and Evolution 13: evab264.
Stackebrandt, E. and Goebel, B.M. (1994). Taxonomic note: a place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. International Journal of Systematic Bacteriology 44: 846–849.
Stanojković, A., Skoupý, S., Škaloud, P. and Dvořák, P. (2022). High genomic differentiation and limited gene flow indicate recent cryptic speciation within the genus Laspinema (cyanobacteria). Frontiers in Microbiology 13: 977454.
Stucken, K., John, U., Cembella, A., Murillo, A.A., Soto-Liebe, K. and 5 other authors (2010). The smallest known genomes of multicellular and toxic cyanobacteria: comparison, minimal gene sets for linked traits and the evolutionary implications. PLoS ONE 5: e9235.
Thompson, C.C., Silva, G.G.Z., Vieira, N.M., Edwards, R., Vicente, A.C.P. and Thompson, F.L. (2013). Genomic taxonomy of the genus Prochlorococcus. Microbial Ecology 66: 752–762.
Tschoeke, D., Salazar, V.W., Vidal, L., Campeão, M., Swings, J., Thompson, F. and Thompson, C. (2020). Unlocking the genomic taxonomy of the Prochlorococcus collective. Microbial Ecology 80: 546–558.
Walter, J.M., Coutinho, F.H., Dutilh, B.E., Swings, J., Thompson, F.L. and Thompson, C.C. (2017). Ecogenomics and taxonomy of Cyanobacteria phylum. Frontiers in Microbiology 8: 2132.
Walter, J.M., Coutinho, F.H., Leomil, L., Hargreaves, P.I., Campeão, M.E. and 13 other authors (2020). Ecogenomics of the marine benthic filamentous cyanobacterium Adonisia. Microbial Ecology 80: 249–265.
Ward, R.D., Stajich, J.E., Johansen, J.R., Huntemann, M., Clum, A. and 16 other authors (2021). Metagenome sequencing to explore phylogenomics of terrestrial cyanobacteria. Microbiology Resource Announcements 10: e00258-21.
Wayne, L.G., Brenner, D.J., Colwell, R.R., Grimont, P.A.D., Kandler, O., Krichevsky, M.I., Moore, L.H. and 6 other authors (1987). Report of the Ad Hoc Committee on Reconciliation of Approaches to Bacterial Systematics. International Journal of Systematic Bacteriology 37: 463–464.
Xu, T., Qin, S., Hu, Y., Song, Z., Ying, J., Li, P., Dong, W., Zhao, F., Yang, H. and Bao, Q. (2016). Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity. DNA Research 23: 325–338.
Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F.O., Ludwig, W., Schleifer, K.-H., Whitman, W.B., Euzéby, J., Amann, R. and Rosselló-Móra, R. (2014). Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology 12: 635–645.
Zeng, Y. (2021). Metagenome-assembled genomes from the Lille Firn glacier at the Villum Research Station in northeast Greenland. Unpublished.
Zhou, Z., Liu, Y., Xu, W., Pan, J., Luo, Z.H. and Li, M. (2020). Genome- and community-level interaction insights into carbon utilization and element cycling functions of Hydrothermarchaeota in hydrothermal sediment. mSystems 5: e00795-19.