cyanobacterial phylogeny

A: Methods
B: 16S rRNA sequence identities and genomic identities
C: Conclusions
D: References

The clusters, based on 16S rRNA identities,
compared with genome metrics:
in silico DNA-DNA hybridization (isDDH), Mash similarity,
average nucleotide identity (ANI) and conserved marker genes (CGS)

A: Methods

Although genome sequences are still rather few for cyanobacteria, and many draft genomes do not contain 16S rRNA genes, we can correlate data derived from four methods of genome analysis with 16S rRNA identities within 16S-based clusters which contain more than one genome sequence. We briefly describe some of the results below.

The following 16S rRNA identity cutoff values were employed to define genera and species: members of the same species share more than 98.70-99.0% 16S rRNA sequence identity; members of a single genus share more than 96.50-96.90%. The genus value is higher than that (94.5%) proposed by Yarza et al. (2014) from a detailed study of many bacterial phyla; however, that study seemingly included few cyanobacteria, since it was based only on the type strains of bacteria and archaea validly published under the rules of the ICNP. Inter-specific values (different species of a single genus) lie between the lower limit of the intra-specific range and the upper limit of the genus cutoff.
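The cutoff bands above can be expressed as a simple classification rule. The following is a minimal Python sketch, not the script used for this site: the function name classify_16s and the decision to report values lying inside either cutoff band as "borderline" are assumptions made for illustration.

```python
# Hypothetical sketch: classify a pairwise 16S rRNA identity (%) with the
# cutoff bands quoted in the text. Values inside a band are reported as
# borderline instead of being forced into one category (an assumption).

SPECIES_BAND = (98.70, 99.00)  # lower/upper edge of the species cutoff (%)
GENUS_BAND = (96.50, 96.90)    # lower/upper edge of the genus cutoff (%)


def classify_16s(identity: float) -> str:
    """Return the category implied by a pairwise 16S rRNA identity (%)."""
    if identity > SPECIES_BAND[1]:
        return "same species"
    if identity >= SPECIES_BAND[0]:
        return "borderline species"
    if identity > GENUS_BAND[1]:
        return "same genus, different species"
    if identity >= GENUS_BAND[0]:
        return "borderline genus"
    return "different genera"


if __name__ == "__main__":
    # Identities taken from comparisons discussed elsewhere in this document
    for value in (99.60, 98.78, 97.91, 96.26, 93.52):
        print(f"{value:.2f}% -> {classify_16s(value)}")
```

Under this rule the 98.78% identity of strains PCC 7001 and NIES-981 falls into the borderline species band, in line with how that pair is treated in Section B1a1.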

For genome sequences, isDDH values were calculated with the DSMZ GGDC 2.1 server. GGDC offers three formulae for the calculation of DDH, of which we employ formula 2 (sum of identities in HSPs/total HSP length) for incomplete (draft) genomes and formula 3 (sum of identities in HSPs/total genome length) for completed chromosomes; the latter is equivalent to a wet-lab hybridization, except that plasmid sequences that are not integrated into the chromosome are absent. Unless otherwise stated, the values calculated by formula 2 are given, since complete chromosomal sequences, which formula 3 requires, are relatively few (about 10% of the total). However, formula 2 may, especially around the cutoff values, give results that are higher than the true value, since only the identities within HSPs are counted, without correction for genome size. The accepted cutoff value for species distinction by wet-lab DDH is an arbitrary 70% (Wayne et al., 1987), and this value is also employed by the isDDH method. However, the 70% value was derived from 16S rRNA-based phylogenies of the type species of a broad range of bacterial phyla, with the Cyanobacteria largely neglected; there is no proof that it discriminates well within individual phyla.
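The effect of the two denominators can be illustrated with a few invented HSPs. This Python sketch computes only the raw ratios named above (identities per total HSP length, identities per genome length); it does not reproduce GGDC, which derives distances from such sums and converts them into DDH estimates with its own fitted model. The HSP values and the use of a single genome length are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    """One high-scoring segment pair from a genome-vs-genome comparison."""
    length: int      # aligned length (bp)
    identities: int  # identical positions within the alignment


def identities_per_hsp_length(hsps: list[HSP]) -> float:
    """Formula-2-style ratio: sum of identities / total HSP length."""
    return sum(h.identities for h in hsps) / sum(h.length for h in hsps)


def identities_per_genome_length(hsps: list[HSP], genome_length: int) -> float:
    """Formula-3-style ratio: sum of identities / total genome length."""
    return sum(h.identities for h in hsps) / genome_length


# Hypothetical comparison against a 2 Mbp genome
hsps = [HSP(1_000_000, 950_000), HSP(500_000, 470_000)]
print(identities_per_hsp_length(hsps))                    # about 0.947
print(identities_per_genome_length(hsps, 2_000_000))      # 0.71

# Drop the second HSP, as if that region were missing from a draft genome
print(identities_per_hsp_length(hsps[:1]))                # 0.95
print(identities_per_genome_length(hsps[:1], 2_000_000))  # 0.475
```

Losing an aligned region leaves the formula-2-style ratio high (here it even rises slightly), while the formula-3-style ratio falls, which is why formula 2 can overestimate similarity when parts of the genomes are not shared.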

Data from wet-lab DDH studies are included where available. If thermal elution of the hybridization products from hydroxyapatite was employed, two values are required (% relative binding and Tm(e)), where a shift of 1-2.2 °C in the mean elution temperature (Tm(e)) represents a 1% base mismatch in the hybrid (see Stackebrandt & Goebel, 1994).
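The Tm(e) relationship above translates directly into an estimate of base mismatch. A minimal sketch, assuming the quoted 1-2.2 °C per 1% mismatch applies linearly; the function name is hypothetical.

```python
def mismatch_range(delta_tm: float) -> tuple[float, float]:
    """Estimate the % base mismatch implied by a shift in Tm(e) (°C).

    Assumes each 1 % mismatch lowers Tm(e) by 1-2.2 °C, so the
    mismatch lies between delta_tm / 2.2 and delta_tm / 1.0.
    """
    if delta_tm < 0:
        raise ValueError("the Tm(e) shift must be non-negative")
    return delta_tm / 2.2, delta_tm / 1.0


low, high = mismatch_range(4.4)  # hypothetical 4.4 °C shift
print(f"{low:.1f}-{high:.1f}% mismatch")  # 2.0-4.4% mismatch
```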

Where appropriate, genomic sequences were further examined with CheckM for degree of completeness and contamination (Parks et al., 2015). The JSpecies web server was employed for ANI (average nucleotide identity) values. The inconvenience of uploading large genomes to the server can be overcome by copying the genomes of interest to a local folder and, from within that folder, zipping them into a single file with the command "zip -r cyano ." (the space and dot in the expression are mandatory; the name "cyano" is only an example), then uploading the zip file to the server. The compressed genomes are about 70% smaller and therefore upload more rapidly. Other available ANI calculation methods include fastANI, which uses Mashmap-based mapping in place of BLAST (Jain et al., 2018), and OAU (OrthoANIu), which employs USEARCH in place of BLAST. Since both identity values and the length of homologous fragments differ slightly between methods (see also Palmer et al., 2020), it is wise to use the same method throughout. The accepted cutoff value for species demarcation with ANI is around 96% (Richter and Rosselló-Móra, 2009), based largely on the 70% DDH value. However, the ANI method evaluates only the similarity of elements shared between two genomes, yields no information for regions that are not shared, and is not corrected for total genome size. In consequence, its results, like those of GGDC formula 2 for isDDH, may be artifactually high.
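The ANI principle described above, and the reason it ignores unshared regions, can be sketched as follows. This is a simplified Python illustration, not JSpecies itself: it assumes an ANIb-style procedure in which the query genome is cut into consecutive 1020-nt fragments and the identities of fragments whose best hit exceeds 30% identity over more than 70% of the fragment length are averaged. The search step (BLAST in JSpecies) is not shown, and the hit values are invented.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class FragmentHit:
    identity: float          # % identity of the best hit for one query fragment
    aligned_fraction: float  # fraction of the fragment covered by that hit


def fragment_genome(seq: str, size: int = 1020) -> list[str]:
    """Cut a genome into consecutive fragments; the last may be shorter."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def ani_from_hits(hits: list[FragmentHit],
                  min_identity: float = 30.0,
                  min_fraction: float = 0.70) -> float:
    """Mean identity of fragments whose hits pass both filters.

    Fragments without a qualifying hit are ignored, so regions that are
    not shared between the genomes do not lower the value.
    """
    kept = [h.identity for h in hits
            if h.identity > min_identity and h.aligned_fraction > min_fraction]
    if not kept:
        raise ValueError("no fragment passed the filters")
    return mean(kept)


print([len(f) for f in fragment_genome("ACGT" * 600)])  # [1020, 1020, 360]

hits = [FragmentHit(98.0, 0.95), FragmentHit(96.0, 0.90),
        FragmentHit(40.0, 0.20),  # poorly covered fragment: discarded
        FragmentHit(94.0, 0.85)]
print(ani_from_hits(hits))  # 96.0
```

Changing the fragment size or the filters changes which fragments are counted, one reason why ANI values differ slightly between tools.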

Neither DDH nor ANI is normally used to distinguish taxa at the inter-specific level, both usually being employed only to decide whether two taxa can be assigned to the same species. We show below that both appear to discriminate efficiently at the inter-specific level in most comparisons, although they occasionally give conflicting results, for the reasons described above. Further studies involved the use of (a) the alignment-free Mash (Ondov et al., 2016) and (b) our set of 79 conserved marker genes (CGS), which clearly does not measure the similarity of entire genomes. Both are described on the Main page of this site. Mash measurements are produced as distance matrices by default, and are presented here as uncorrected similarity values. For the CGS estimates, we also employ uncorrected values throughout; these were obtained by building a tree without applying a correction, then converting the finished genomic tree to a distance matrix with the function cophenetic.phylo of the R package ape. For both types of matrix, a single custom script was then employed to extract the distance value for each pairwise combination of taxa and convert it to a similarity value. We have attempted to correlate the results obtained by these different methods in the following figure, where pairwise comparisons of taxa assigned to the same species on the basis of 16S rRNA sequence identity are shown as blue symbols, those assigned to different species of a single genus as red symbols, and those of separate genera as green symbols. Draft genomes and metagenomes have been excluded from these calculations, and the same dataset was employed for all methods. We emphasize that the 16S rRNA gene sequences are those extracted (by our custom shell script) from the genomes of interest, and not those obtained by PCR amplification, to avoid errors. Overall, the results confirm the value of our 16S rRNA-based cutoff values, since the three categories of taxa are clearly separated:

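[Figure: pairwise isDDH (formula 3, top left), Mash similarity (top right), ANI (bottom left) and CGS (bottom right) values; blue, same species; red, different species of a single genus; green, different genera]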

However, the isDDH values calculated with formula 3 (top left) fail to discriminate between taxa of different genera. Our set of 79 conserved marker proteins (CGS, bottom right) greatly increases the sensitivity for taxa of different genera, but compresses the range of intra-specific values. The results obtained from analysis of genome similarity with Mash (top right) and ANI (bottom left) appear to avoid the biases of the other two methods and correlate reasonably well with the 16S rRNA identity estimates. We stress that only complete chromosomes should be used in these analyses: draft genomes may contain duplicated regions or lack segments, thus obscuring the results. This is easily demonstrated by artificially duplicating or deleting a segment of a draft genome: if the isDDH value obtained with the original genome falls just above or below the inter-species or generic cutoff value, the value obtained with the artificial construct can fall on the wrong side of the cutoff. The minimum and maximum percentage similarity values obtained from our dataset with each method are summarized in the following table, together with a minimum value obtained by comparing organisms from the top and bottom of the tree.

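[Table: minimum and maximum percentage similarity values obtained with each method (isDDH formulae 2 and 3, Mash similarity, ANI, CGS) for each category of taxa]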

The gap in the region of species separation is caused by the paucity of available complete chromosome sequences; results which fall into this region are treated as borderline cases throughout this document. In addition to isDDH with formula 3 (f3), Mash-similarity, ANI and CGS, the table contains the isDDH ranges obtained with the same dataset using formula 2 (f2). Notice that the isDDH results obtained with the two formulae are not identical, and that there is a marked overlap of the formula 2 values around the level of generic separation, indicated in red in the table. This is clearly shown in the following figure (left panel).

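[Figure: left panel, isDDH formula 2 values overlapping around the level of generic separation; right panel, isDDH formula 2 plotted against formula 3]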

We would have expected similar results for isDDH values calculated with the two formulae; that this is not the case is shown in their plot in the figure (right panel). This is the result of the different corrections applied in each formula. That formula 2 results are not reliable at low values is shown by the observed minimum (see above) of 26.1%, which falls into the overlap region. For these reasons, we give preference to the formula 3 results (calculated as total identities divided by genome size) for complete chromosomes throughout this document, considering that the formula 2 (identities divided by HSP length) estimates are only approximate. We are forced to use the latter for draft genomes, but results obtained by all other methods should be considered as having priority.
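The custom script mentioned in the methods, which extracts pairwise values from the Mash and CGS distance matrices and converts them to similarities, is not reproduced here. The following hypothetical Python sketch shows one way to perform the same task, assuming a tab-delimited square matrix with taxon names in the first row and column (for instance, a cophenetic matrix written out from R) and uncorrected distances expressed as fractions, so that similarity (%) = 100 × (1 − d).

```python
import csv
import io
from itertools import combinations


def read_square_matrix(text: str, delimiter: str = "\t"):
    """Parse a labelled square distance matrix into (labels, nested dict)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    labels = rows[0][1:]  # first header cell is empty
    matrix = {}
    for row in rows[1:]:
        name, values = row[0], [float(v) for v in row[1:]]
        matrix[name] = dict(zip(labels, values))
    return labels, matrix


def pairwise_similarities(labels, matrix):
    """Yield (taxon_a, taxon_b, % similarity) once per unordered pair."""
    for a, b in combinations(labels, 2):
        yield a, b, round(100.0 * (1.0 - matrix[a][b]), 2)


# Hypothetical 3-taxon matrix
example = (
    "\tstrain_A\tstrain_B\tstrain_C\n"
    "strain_A\t0\t0.0156\t0.12\n"
    "strain_B\t0.0156\t0\t0.13\n"
    "strain_C\t0.12\t0.13\t0\n"
)
labels, matrix = read_square_matrix(example)
for a, b, sim in pairwise_similarities(labels, matrix):
    print(a, b, sim)
```

Iterating over unordered pairs means each comparison appears once, which matches the pairwise plots described above.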

We report Mash similarity data only for complete chromosomes, since this method compares k-mers and is unlikely to be valid for draft genomes, in which k-mers spanning the ends of contigs are broken. In contrast, CGS results are valid for all genomes.
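Why contig breaks matter for a k-mer method can be shown with a toy example. The Python sketch below compares exact k-mer sets instead of Mash's MinHash sketches and, unlike Mash, does not merge a k-mer with its reverse complement. It converts the Jaccard index j into a distance with the Mash formula D = −(1/k)·ln(2j/(1 + j)) (Ondov et al., 2016) and reports similarity as 100 × (1 − D). The random test genome, the contig length and k = 21 (Mash's default) are assumptions.

```python
import math
import random


def kmer_set(contigs: list[str], k: int) -> set[str]:
    """Distinct k-mers over all contigs; k-mers spanning a break are lost."""
    found = set()
    for contig in contigs:
        found.update(contig[i:i + k] for i in range(len(contig) - k + 1))
    return found


def mash_distance(a: set[str], b: set[str], k: int) -> float:
    """Mash distance from the exact Jaccard index of two k-mer sets."""
    j = len(a & b) / len(a | b)
    if j == 0:
        return 1.0
    return math.log((1 + j) / (2 * j)) / k  # equals -ln(2j/(1+j)) / k


K = 21
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(10_000))
contigs = [genome[i:i + 500] for i in range(0, len(genome), 500)]  # 20 pieces

complete = kmer_set([genome], K)
draft = kmer_set(contigs, K)
d = mash_distance(complete, draft, K)
print(f"Similarity of a genome to itself split into contigs: {100 * (1 - d):.2f}%")
```

Even though the sequence is identical, the draft version lacks the k-mers that span its 19 breaks, so the distance is greater than zero.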

For many taxa we calculated the number of repeat sequences in the genome with RepSeek with a seed length of 25.
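RepSeek itself detects approximate direct and inverted repeats starting from exact seeds, and its algorithm is not reproduced here. As a rough, hypothetical proxy, the Python sketch below only counts exact 25-nt seeds that occur more than once in a sequence, treating a seed and its reverse complement as the same so that inverted copies are included. Counts from this sketch will not match RepSeek output.

```python
import random
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(seed: str) -> str:
    """Smaller of a seed and its reverse complement, so direct and inverted copies match."""
    return min(seed, seed.translate(COMPLEMENT)[::-1])


def repeated_seeds(seq: str, seed_length: int = 25) -> dict[str, int]:
    """Canonical seeds of the given length that occur more than once."""
    seq = seq.upper()
    counts = Counter(canonical(seq[i:i + seed_length])
                     for i in range(len(seq) - seed_length + 1))
    return {seed: n for seed, n in counts.items() if n > 1}


rng = random.Random(7)


def random_seq(n: int) -> str:
    """Random DNA of length n from the seeded generator above."""
    return "".join(rng.choice("ACGT") for _ in range(n))


# Toy genome: a 30-nt element inserted twice, the second time inverted
element = random_seq(30)
genome = (random_seq(2_000) + element + random_seq(2_000)
          + element.translate(COMPLEMENT)[::-1] + random_seq(2_000))
print(len(repeated_seeds(genome)))  # seeds contributed by the inserted element
```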

Members of some genera (as defined by 16S rRNA sequence identity) do not fit into the above scheme. These are described below in Section B1.

B: 16S rRNA sequence identities and genomic identities

Note that we are unable to make this comparison for many incomplete genomes or those which lack, or contain only short fragments of, the 16S rRNA gene.

B1: Clusters for which metrics are not congruent

B1a: Picocyanobacteria

B1a1: The OMF clade

Within the "picocyanobacteria" (genus 7.1.1), for which we use the term "OMF" (Oceanic, Marine, Freshwater, as defined by Laloui et al., 2002), we show 62 genomic sequences; only 17 of the genomes are complete, and 8 are metagenomes. The complete chromosomes range in DNA base composition from 52.45 to 67.98 mol% G+C and in size from 2.11 to 2.98 Mbp. Genome sequences are available for only 18 of the 32 species defined by 16S rRNA sequence identity. Comparison of genome metrics obtained with complete chromosomes within individual species clusters is possible for only two species (7.1.1T, 7.1.1Z).

Species 7.1.1T contains two strains represented by complete chromosomes, LTW-R and CB0101 (2.42 and 2.79 Mbp, 62.56 and 64.06 mol% G+C, respectively). Values of 26.2% isDDH (GGDC formula 3), 76.44% ANI and 95.05% CGS place these strains on the borderline between different species and genera, despite their 16S rRNA sequence identity of 98.7%.

The five complete chromosomes in species 7.1.1Z, with 54.2-61.4 mol% G+C and size 2.11-2.59 Mbp, show Mash similarity 76.26-92.29%, CGS 76.26-98.56%, isDDH 16.6-66.9% (GGDC formula 3) and ANI 71.79-87.58%. They are therefore not con-specific. Strains CC9605 and KORDI-52 are distinguished by medium values of Mash similarity (89.86%), CGS (98.56%), isDDH (53.9% with GGDC formula 3) and ANI (87.58%); strain WH 8109 shows Mash similarity 89.95%, CGS 98.39%, isDDH 54.5% (GGDC formula 3) and ANI 87.96% with KORDI-52. The genome metrics show that the latter 3 strains are members of different species, the others being assigned to a different genus.

The inclusion of draft genomes permits the comparison of two pairs of Cyanobium spp. strains. Isolates PCC 7001 and NIES-981 share 98.78% 16S rRNA sequence identity and are therefore borderline members of a single species (7.1.1O). However, values of 32.4% isDDH, 86.80% ANI and 97.29% CGS assign them to distinct species. Strains PCC 6307 and Copco Reservoir LC18 (7.1.1J) are identical in 16S rRNA sequence, their con-specificity being confirmed by 50.5% isDDH and 92.61% ANI values.
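Comparisons like those above, in which 16S rRNA identity and genome metrics disagree, can be flagged automatically. The Python sketch below is hypothetical: it applies only the species cutoffs quoted in Section A (98.70% 16S rRNA identity, 70% isDDH, about 96% ANI), ignores the borderline bands and the formula 2/formula 3 distinction, and uses values quoted in this document as examples.

```python
from dataclasses import dataclass

SPECIES_16S = 98.70  # lower edge of the 16S rRNA species band (%)
SPECIES_DDH = 70.0   # isDDH species cutoff (%)
SPECIES_ANI = 96.0   # approximate ANI species cutoff (%)


@dataclass
class PairMetrics:
    pair: str
    identity_16s: float
    isddh: float
    ani: float


def genome_verdict(p: PairMetrics) -> str:
    """'same', 'different' or 'mixed' species according to isDDH and ANI."""
    ddh_same = p.isddh >= SPECIES_DDH
    ani_same = p.ani >= SPECIES_ANI
    if ddh_same and ani_same:
        return "same"
    if not ddh_same and not ani_same:
        return "different"
    return "mixed"


def congruence(p: PairMetrics) -> str:
    """Compare the 16S rRNA verdict with the genome-metric verdict."""
    by_16s = "same" if p.identity_16s >= SPECIES_16S else "different"
    by_genome = genome_verdict(p)
    if by_genome == "mixed":
        return "genome metrics disagree with each other"
    return "congruent" if by_16s == by_genome else "incongruent"


pairs = [
    PairMetrics("LTW-R / CB0101", 98.70, 26.2, 76.44),
    PairMetrics("PCC 7001 / NIES-981", 98.78, 32.4, 86.80),
    PairMetrics("PCC 6803 / CACIAM 05", 99.66, 88.6, 96.56),
]
for p in pairs:
    print(f"{p.pair}: {congruence(p)}")
```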

In contrast, inter-specific comparisons may be performed with complete chromosomes for 11 of the defined clusters of the OMF clade. Mash similarities of 72.61-78.50%, CGS 86.85-95.29%, isDDH 13.6-17.2% (GGDC formula 3), ANI 69.06-74.20% show that all are different genera.

The species not described above (7.1.1C, G, P, V, X, Y, AA and AC), defined on the basis of 16S rRNA sequence identities, are all also poorly supported by genome metrics.

Coutinho et al. (2016a) compared the genomes of 24 Synechococcus strains with six methods: ANI, AAI (average amino acid identity), dinucleotide signature, isDDH, 16S rRNA identity and MLSA (multi-locus sequence analysis) with the genes rrsA, gyrB, pyrH, recA and rpoB. The authors assigned the generic name Parasynechococcus to 15 strains (the sister group of Prochlorococcus), with WH8102 as type strain. Although the name was published under the rules of the ICNP, the strains are not axenic, and the name is therefore invalid under that code. The genus was also invalid under the rules of the ICN, since a type species was not named. However, Komárek et al. (2020) validated the genus, and we have accepted this in all other phylogenetic trees on this site. Subsequently, Coutinho et al. (2016b) assigned specific names to each of the 15 strains. These specific names are shown in RED in the figure below; note that some of the generic names were later changed by Walter et al. (2017), and these have been retained in the same colour in the figure. Walter et al. (2017) produced a tree of 100 genomes, using the 31 marker genes of Wu & Eisen, and validated the circumscription of species and genera with measurements of isDDH and AAI values. Strains were considered to belong to the same species when they shared at least 98.8% 16S rRNA sequence identity, ≥95% AAI and ≥70% isDDH. The AAI values are similar to the ANI values that we use throughout this page: e.g. 70-90% AAI within Parasynechococcus vs 71.8-91.2% ANI; 63-85% AAI vs 70.8-90.8% ANI for Pseudosynechococcus; between "genera", 59-62% AAI vs ~69.1-71.0% ANI. AAI values ≥70% were employed for genus delimitation. The new generic names were Pseudosynechococcus, Magnicoccus, Regnicoccus and Inmanicoccus; none of these is validly published. Salazar et al. (2020) markedly expanded the coverage of the OMF group by adding a large number of genomic sequences, shown in VIOLET in the figure.
These authors also introduced the generic name Synechospongium to replace Candidatus Synechococcus spongiarum (16S rRNA cluster 7.1.5), as shown in GREEN in the figure. Unfortunately, these authors included a large number of incomplete genomes of this cluster, which we have excluded from our genome trees. Synechococcus lacustris strain Tous was renamed Lacustricoccus sp., marked in the same colour. The generic names Synechospongium and Lacustricoccus are again invalidly published.

[Figure: the OMF clade and related taxa]

This figure was extracted from our genomic tree, with the 16S rRNA cluster numbers added as they appear in the 16S rRNA tree.
Strain PCC 6301 was employed as outgroup.

Since there is little correlation between the similarity values provided by 16S rRNA analysis and genome metrics, there is clearly an unusual mode of evolution within this group. This is also true for the Prochlorococcus sister group, described below. The members of both clusters may be considered to be among the most abundant photoautotrophs in the oceans, having achieved buoyancy by a marked reduction in cell size. However, the genomes of most members of the Prochlorococcus group appear to have undergone extensive streamlining (reduction in size), which is less evident in the OMF cluster. Although we do not know the true ancestor of these groups, the complete chromosomes of related strains (Synechococcus spp. strains PCC 6312, PCC 6715, JA-2-3B'a, JA-3-3Ab) range from 2.66 to 3.77 Mbp and 48.52 to 60.24 mol% G+C, whereas those of the OMF cluster strains range from 2.11 to 2.98 Mbp and 52.45 to 67.98 mol% G+C. Most of the complete chromosomes of strains of the OMF group contain two copies of the rrn operon, like those of many other unicellular cyanobacteria, and extensive genome reduction has therefore not occurred. Unlike strains of the Microcystis and Moorena clusters described below, members of the OMF clade contain few repeat sequences (39-467), insufficient to affect the genome metrics. Again unlike the situation in Microcystis, the core genomes prepared from members of the OMF clade show relatively unchanged ANI values, implying that the evolutionary changes have occurred in all parts of the genome.

Outside the OMF cluster, Walter et al. (2017) renamed many taxa, in many cases unwisely, seemingly on the basis of their clustering in the tree alone and ignoring the 16S rRNA identity values and genome similarity data. For example, Cyanobacterium stanieri PCC 7202 was transferred to the genus Geminocystis (represented by strain PCC 6308), even though these isolates share only 93.52-94.54% 16S rRNA identity, 72.87% ANI and 13.8% isDDH (GGDC formula 3), and Halothece sp. PCC 7418 was transferred to the genus Dactylococcopsis (PCC 8305), although the strains involved share only 96.26% 16S rRNA identity, 77.93% ANI and 16.2% isDDH (GGDC formula 3). Since these values are representative of distinct genera, such transfers are invalid.

B1a2: Prochlorococcus

Prochlorococcus spp. strains (genus 7.1.3) are divided into 5 species in the 16S rRNA tree on the basis of 16S rRNA sequence identity (within species, around 99.7%; between species, 97.3-98.5%). The genus Prochlorococcus was validated under the rules of the ICN by Komárek et al. (2020), having been previously invalidly published by Chisholm et al. (1992). The latter authors designated strain CCMP1375 as the reference strain; Komárek et al. (2020) changed this to strain MIT 9313, on the grounds that "the original reference below the 98.7% threshold in comparison to all other strains." Although we are forced to accept this change, our results show that strain MIT 9313 is less closely related than CCMP1375 to the other strains. This is also evident from the two phylogenetic trees shown by these authors: of the 8 isolates included in the tree inferred from 16S rRNA sequences, 5 fall into species 7.1.3A, the exceptions being strain 0801 (species 7.1.3B) and the two reference strains, which appear in species 7.1.3C and 7.1.3E. The phylogenomic tree of these authors includes only 6 of the 8 strains. Strain MIT 9313 (50.74 mol% G+C, genome size 2.41 Mbp) is a member of the group of strains with large genomes of high G+C content (see below), whereas CCMP1375 has a genome of 36.44 mol% G+C and 1.75 Mbp, and is thus more representative of the genus. Most members possess small genomes (approx. 1.6 Mbp) of low G+C content (30-38 mol%) and a single copy of the rrn operon; members of species 7.1.3E have larger genomes (around 2.6 Mbp) of higher G+C content (around 50 mol%) with two copies of the operon, as calculated from the only two complete chromosomes representing this group (including the new reference strain). With such large differences, particularly in G+C content, it is pointless to perform isDDH between members of species 7.1.3E and the others.
The tree of complete plus draft genomes, based on our set of 79 core marker genes, contains genomes of 54 Prochlorococcus strains (14 complete), as shown in the figure below (the Prochlorococcus clade has been extracted from the genome tree, and the species cluster numbers are derived from those of the 16S rRNA tree). The members of species 7.1.3A represented by complete chromosome sequences share a minimum of 99.12% 16S rRNA sequence identity. However, again using only the complete chromosomes, 2 strains (CCMP1986 and MIT9515, which share 63.4% isDDH [GGDC formula 3] and therefore probably represent two different species) form a sub-clade of species 7.1.3A, rather than being included within the clade; this is supported by their isDDH values of 41.7 to 48.8% with the other strains of species 7.1.3A, which themselves show 68.3-82.5% isDDH and are bracketed in the figure as "DDH 1". isDDH analysis of the 3 complete chromosomes of species 7.1.3B confirms the similarity of strains NATL1A, NATL2A and MIT0801 (minimum shared isDDH value 80.7%) and is thus in agreement with the results of the 16S rRNA study ("DDH 2" in the figure). Unfortunately, there are insufficient complete chromosomal sequences for members of species 7.1.3C and 7.1.3D to permit isDDH analysis within each species, but the single complete chromosomes of strains CCMP 1375 and MIT 9211 show only 15.9% isDDH, 76.74% Mash similarity, 72.3% ANI and 87.80% CGS, placing them into two distinct genera. Clades 7.1.3A to 7.1.3D (as defined by 16S rRNA similarity values) show longer branches in the genomic tree than in the 16S rRNA sequence-based tree: increasing the number of genes employed for the analysis increases the resolution.
Clade 7.1.3E is therefore separated more clearly from the others here than in the 16S rRNA-based tree; strains MIT 9303 and MIT 9313, represented by complete chromosomes of greater size and higher mol% G+C content than all other "species", exhibit 78.3% isDDH, 95.84% Mash similarity, 94.83% ANI and 98.89% CGS, thus representing a single species ("DDH 3"). The reference strain designated by Komárek et al. (2020) is not, therefore, representative of the majority of strains of the genus, as mentioned above. In addition, these authors claim that strain MIT 9313 was isolated from the Sargasso Sea, whereas it was found in the N. Atlantic.

[Figure: the Prochlorococcus clade, extracted from the genomic tree]

#: not present in 16S rRNA tree (containing only fragments of the 16S rRNA gene).
The genome of strain MIT9202 is a draft, erroneously included by Thompson et al.

In summary, the genome metrics further divide "species" 7.1.3A into two species, maintain cluster 7.1.3B as a single species and define "species" 7.1.3C and 7.1.3D as distinct genera. The only species name available, P. marinus, is applied to all strains; it is evident that extensive renaming at both taxonomic levels is required. An important step in this direction was made by Thompson et al. (2013), who proposed division of the genus into 10 species based on isDDH and ANI analysis of complete chromosomes; these species have not been validated under the ICN. The proposed specific names are shown in parentheses and highlighted in red in the figure; note that 2 complete chromosomes (marked in black) have become available more recently. Our more extensive isDDH/ANI studies, including draft genomes, show that many more species can be distinguished by ANI values, indicated in the figure as ANIb 1 to ANIb 25. Repeated sequences have no effect on species circumscription by ANI in this genus, since the strains of species 7.1.3A to 7.1.3D contain only low numbers (37-190) of these elements and even the strains with larger genomes (species 7.1.3E) contain only 425-533; ANI values of the core and accessory genomes are unchanged. Whether whole-genome sequencing is necessary to achieve a taxonomic scheme, rather than relying on a 16S rRNA-based phylogeny, is an open question: the strains of clusters 7.1.3A to 7.1.3D have undergone extensive genome streamlining (see e.g. Giovannoni et al., 2014; Sun & Blanchard, 2014, and references cited), and different lineages may have lost and/or gained different genomic regions (see Humbert et al., 2013, and references therein), making comparisons difficult and perhaps explaining the apparent discrepancies between genome metrics in defining species boundaries. Li et al. (2015) suggested that ANI and DDH values vary in different genera and should be replaced by the sequence identity of homologous genes as the criterion for delineation of species; this is exactly our approach for the inference of genomic phylogenetic trees.

Thompson et al. employed GGDC formula 2 (identities/HSP length), which is equivalent to ANI, whereas we have used formula 3 (identities/total genome length), which is more appropriate for complete chromosomes. ANI, if performed with default parameters, splits species established by isDDH (formula 3) into further specific clusters (as shown in our tree, above) and would therefore appear to be inappropriate. This can be shown by decreasing the fragment length setting in JSpecies, or the --fragLen or k-mer size (-k) settings in fastANI, to the point where the ANI values match those of isDDH; isDDH does not suffer from this artifact. If we consider only the isDDH values obtained with formula 3, the species P. chisholmii, P. ponticus, P. neptunius and P. nereus proposed by Thompson et al. become a single species; the six remaining species appear to be validly delineated. Walter et al. (2017), including only 13 strains in their tree, further divided the genus Prochlorococcus into four genera, whose names are shown after the "/" in the figure. None of these genera is valid under the ICN. Eurycolium is equivalent to clusters DDH 1 plus ANIb 13 in the figure, which together form 16S rRNA group 7.1.3A; Prolificoccus represents DDH group 2 and 16S rRNA cluster 7.1.3B; Prochlorococcus was retained as two species (equivalent to ANIb groups 18 and 21, 16S rRNA clusters 7.1.3C and 7.1.3D); Thaumococcus includes only the two P. swingsii strains described above (ANIb clusters 22 and 25, 16S rRNA group 7.1.3E). Tschoeke et al. (2020), with a tree of 208 genomes created with an unspecified set of 249 marker genes, further expanded the studies of Walter et al. (2017). These authors proposed the creation of a new genus, Riococcus, and greatly increased the coverage of this group by adding many draft genomes; their suggestions are shown in violet in the figure. None of the nomenclatural changes was validly published under the rules of the ICN.

The 5 clusters shown in the figure are distinguished by genomic G+C content, as summarized in the table below, where we employ the generic names proposed by Walter et al. Genome size, in contrast, does not discriminate between the clusters of low G+C content, and in many cases is based on the summed lengths of the multiple contigs of draft genomes.

[Table: genomic G+C content of the 5 Prochlorococcus clusters]

B1b: Synechocystis spp.

Although genus (“Synechocystis”) is divided into 6 species on the basis of 16S rRNA sequence identity, genome sequences have been obtained for only 16 isolates, most being members of species; five of these (IPPAS B-1465, PCC 6714, PCC 6803, PCC 7338 and PCC 7339) are complete. Complete chromosomal sequences are additionally available for sub-strains of PCC 6803; note that strain IPPAS B-1465 may also be considered as a sub-strain of PCC 6803. The complete genomes share 98.85-100% 16S rRNA sequence identity. Synechocystis sp. PCC 7509 is unusual, lying in genus 3.4.2 of the 16S rRNA tree and grouping closely with Chlorogloea sp. CCALA 695 in the genome tree; this strain is not considered further here. The 16S rRNA of strain PCC 6803 is identical to that of strains FACHB-898, FACHB-908, FACHB-929 and IPPAS B-1465; the five strains are confirmed as con-specific by high isDDH, ANI and CGS values (99.9-100%, 99.86-99.88% and 100%, respectively). Strain CACIAM 05 (closed genome) is also a member of this species, showing 99.66% 16S rRNA identity, 88.6% isDDH (formula 3), 96.56% ANI and 99.55% CGS with strain PCC 6803. However, the apparent placement of strain FACHB-383 in this species (based on 99.39% 16S rRNA sequence identity with strain PCC 6803) is not confirmed by the low values of isDDH (32.1%) and ANI (86.22%); although isDDH should be considered as non-discriminatory in this range, the value of 98.48% CGS supports this specific separation. Strain FACHB-383 groups at the specific level with 3 other strains (LEGE 00031, LEGE 00041, LEGE 06083), as shown by isDDH values of 77.7-78.2%, ANI 97.24-97.39% and CGS 99.58-99.62%. S. salina strain LEGE 06155 is separated from the above species by values of 40.4-41.0% isDDH, 89.90-90.19% ANI and 98.36-99.05% CGS, although the CGS values fall into the borderline zone.
Strains LEGE 06099, PCC 7338 and PCC 7339 are con-specific, as shown by values of 60.0-98.9% isDDH, 94.64-99.41% ANI and 99.30-99.96% CGS; the higher values of these ranges are exhibited by strains PCC 7338 and PCC 7339. The 3 strains show only low values of all genome metrics with the species described above. The con-specificity of Synechocystis sp. strains PCC 6714 and PCC 6803 (99.60% 16S rRNA sequence identity) is not confirmed by isDDH of their complete genome sequences (47.5%; GGDC formula 3). This low isDDH value is supported by a low ANI value of 84.16%, a Mash similarity of 87.44% and a CGS value of 98.05%. All genome metrics therefore place strains PCC 6714 and PCC 6803 into two distinct species, giving five species in total, as shown in the figure below, rooted with Snowella sp. ULC335.

[Figure: the Synechocystis clade, extracted from the genomic tree and rooted with Snowella sp. ULC335]

This figure was extracted from our genomic tree;
strains represented by complete chromosomes are coloured red.

The separation of strains PCC 6714 and PCC 6803 is surprising. Their chromosomes are similar in G+C content (47.73-47.81 mol%) and size (3.49-3.57 Mbp), with strain PCC 6803 having the larger genome of lower G+C content; both strains are axenic and have similar morphological and physiological characteristics (Rippka et al., 1979). Kopf et al. (2014a) described extensive differences between the complete chromosome of Synechocystis sp. PCC 6803 and the draft genome of Synechocystis sp. PCC 6714, later confirmed with the complete chromosome of the latter strain (Kopf et al., 2014b).

The strains were shown to share 2838 protein-coding genes, with 845 unique genes in Synechocystis sp. PCC 6803 and 895 in Synechocystis sp. PCC 6714; the strains further differed in the composition of their pools of transposable elements and in the presence of a prophage in the genome of strain PCC 6714.

[Figure: positions of the 79 core marker genes in the chromosomes of strains PCC 6714 and PCC 6803]

We can show rearrangements through changes in the position of many of our 79 core marker genes (see figure above), although these rearrangements alone should have no effect on isDDH values. It is interesting that most of the regions missing from the PCC 6714 chromosome (figure below, outer ring) are occupied by areas that diverge greatly in base composition (second black ring) from most parts of the genome of strain PCC 6803.

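[Figure: circular comparison of the chromosomes of strains PCC 6803 and PCC 6714; outer ring, regions missing from PCC 6714; second black ring, base composition]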
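Regions of atypical base composition, such as those in the second ring of the figure above, are usually located with a sliding window. The method used to draw that ring is not stated; the Python sketch below is a hypothetical illustration in which the window size, step and deviation threshold are arbitrary choices and the test sequence is synthetic.

```python
def gc_fraction(seq: str) -> float:
    """G+C fraction among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def deviating_windows(seq: str, window: int = 10_000, step: int = 5_000,
                      threshold: float = 0.2):
    """Yield (start, gc) for windows whose G+C fraction differs from the
    whole-sequence value by more than `threshold`."""
    genome_gc = gc_fraction(seq)
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if abs(gc - genome_gc) > threshold:
            yield start, gc


# Synthetic 50 kb sequence: 50% G+C background with a 10 kb A+T-only island
background = "ACGT" * 10_000
island = "AT" * 5_000
seq = background[:20_000] + island + background[20_000:]
print(list(deviating_windows(seq)))  # [(20000, 0.0)]
```

Windows that only partly overlap the island are diluted by the flanking background, so the threshold and window size together decide how small a divergent region can be detected.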

The figure below compares the graphical output of fastANI for two pairwise comparisons (left: strain PCC 6803 with IPPAS B-1465; right: PCC 6803 with PCC 6714). The red lines indicate regions of identity between a pair of genomes. Rearrangements and low similarity between strains PCC 6803 and PCC 6714 are evidenced by the slanted lines and their low density, compared with the comparison of two identical strains on the left. Unlike the situation in the genus Microcystis described below, neither the core genomes nor the accessory genomes of strains PCC 6714 and PCC 6803 show significant change in ANI values; the repeat sequences (2764, 1008 and 1026 per genome for strains PCC 6714, PCC 6803 and IPPAS B-1465, respectively) are too few to markedly change the low ANI and isDDH values of the first two strains. Interestingly, the number of paired repeats in the sub-strains of PCC 6803, maintained in different laboratories, shows little change, ranging from 1008 to 1034.

[Figures: fastANI output; left, PCC 6803 vs IPPAS B-1465; right, PCC 6803 vs PCC 6714]

B1c: Planktothrix spp.

Within genus, genomic sequences of Planktothrix spp. are available for 6 of the 10 species in the tree, some described in detail by Gaget et al. (2015), and only two (those of strains NIES-204 and PCC 7805) are complete. The genomes of the 14 strains assigned to P. agardhii, P. rubescens, P. prolifica and Planktothrix sp. in species cannot be distinguished by 16S rRNA sequence identity (99.05-100%), isDDH, which gives values of 70.3-89.9% (GGDC formula 2), ANI (95.57-99.92%) or CGS (99.22-99.74%); this cluster therefore corresponds to a single species, which we propose as P. agardhii. The cohesion of species is further confirmed by isDDH of the 2 complete chromosomes (91.8%, GGDC formula 3), an ANI value of 98.67%, Mash similarity 99.11% and CGS 99.90%. The 2 members of species (P. paucivesiculata) whose genome sequences are available share 99.49% 16S rRNA sequence identity. However, they show a relatively low (43.1%) isDDH value, only 90.41% ANI and 98.70% CGS, suggesting that they are members of different species. There is therefore a conflict in the distance estimates obtained with these draft genomes by the different methods. The two strains differ from the first species by 16S rRNA sequence identities of 97.91% and 98.12%, isDDH values of 44.5%, 88.88-90.36% ANI and 98.55-98.69% CGS. Within species, the inclusion of further strains gives internal values of 98.74-99.51% 16S rRNA sequence identity. The genomic sequence of strain P. tepida PCC 9214 is sufficiently distinct from all others (isDDH values 27.4-32.6%, ANI 82.1-82.28% and 96.10-96.29% CGS) to support the assignment of this strain to a separate species (; the two 16S rRNA sequences extracted from this genome share 99.20% identity. Species (P. serta) contains 2 strains, and a single genome sequence is available (strain PCC 8927). This shows only 29.1-32.6% isDDH (ANI 83.63-86.11%) with members of species, and, with which the strain shares 96.93-97.42% 16S rRNA sequence identity.
However, strains FACHB-1365, LEGE 06226 and PCC 9214, assigned to 3 different species in the 16S rRNA tree, are con-specific on the basis of genome metrics (55.2-65.5% isDDH, 93.41-98.13% ANI and 98.81-99.46% CGS); these strains cluster together in the genomic tree, clearly separated from the other members of the genus. The available methods of distance estimation therefore give conflicting results for the genus Planktothrix, with the exception of species. Note that the genomes differ dramatically in size, the two complete genomes being 4.70 and 4.78 Mbp and the draft genomes ranging from 5.97 to 6.73 Mbp, and that a second 16S rRNA gene extracted from strain LEGE 06226 falls into the bacterial outgroup. In view of these uncertainties, we await further genomic sequences for clarification.
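The conflicts described for Planktothrix amount to comparing, for each pair of genomes, the rank implied by 16S rRNA identity with the rank implied by each genome metric. The Python sketch below shows such a check. The 16S rRNA cutoffs are the lower bounds of the ranges given in the Methods (98.70% for species, 96.50% for genus) and the 70% isDDH cutoff is that of Wayne et al. (1987); the 95% ANI species cutoff is an assumption (a commonly used value, not stated in this section), no CGS cutoff is assumed, and the function names are hypothetical.

```python
# Hypothetical helpers for flagging disagreements between 16S rRNA identity
# and genome metrics for a single pair of genomes.

# 16S rRNA cutoffs (%) from the Methods: lower bounds of the stated ranges.
SPECIES_16S = 98.70
GENUS_16S = 96.50

# Species cutoffs (%) for genome metrics. isDDH: 70 % (Wayne et al., 1987).
# ANI: 95 % is an ASSUMED, commonly used value. No CGS cutoff is assumed.
SPECIES_CUTOFFS = {"isDDH": 70.0, "ANI": 95.0}


def rank_from_16s(identity: float) -> str:
    """Lowest shared rank implied by a 16S rRNA sequence identity (%)."""
    if identity > SPECIES_16S:
        return "same species"
    if identity > GENUS_16S:
        return "same genus"
    return "different genera"


def conflicting_metrics(identity_16s: float, metrics: dict) -> list:
    """Names of metrics whose species call disagrees with the 16S rRNA call."""
    same_species_16s = rank_from_16s(identity_16s) == "same species"
    conflicts = []
    for name, value in metrics.items():
        cutoff = SPECIES_CUTOFFS.get(name)
        if cutoff is None:
            continue  # no species cutoff assumed for this metric (e.g. CGS)
        if (value >= cutoff) != same_species_16s:
            conflicts.append(name)
    return conflicts


# The two P. paucivesiculata genomes: 99.49 % 16S rRNA identity,
# 43.1 % isDDH, 90.41 % ANI and 98.70 % CGS.
print(rank_from_16s(99.49))  # same species
print(conflicting_metrics(99.49, {"isDDH": 43.1, "ANI": 90.41, "CGS": 98.70}))  # ['isDDH', 'ANI']
```

Using the lower bound of each Methods range is a design choice; values falling inside the 98.70-99.0% or 96.50-96.90% ranges are borderline and might better be reported separately.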

B1d: Sphaerospermopsis

The draft genomic sequences of two axenic strains of Sphaerospermopsis, S. reniformis NIES-1949 and S. kisseleviana NIES-73, together with that of the non-axenic Sphaerospermopsis sp. LEGE 00249 (species 1.2.3A, which also contains many other members of this genus not represented by genomic sequences), are shown, despite their different specific epithets, to be con-specific by values of 74.8-83.2% isDDH, 97.05-98.21% ANI, 99.60-99.68% CGS and 98.70-99.87% identity of their extracted 16S rRNA sequences. Other strains of Sphaerospermopsis, also represented by draft genomic sequences but not known to be axenic, are Sphaerospermopsis sp. LEGE 08334 (species 1.2.3D) and S. aphanizomenoides BCCUSP55 (species 1.2.5C). Strain LEGE 08334 shares 53.9-54.3% isDDH, 93.48-93.77% ANI and 99.12-99.20% CGS with the 3 members of the first species, and is therefore a member of that species despite its low 16S rRNA identity. Analysis of the circular chromosome of S. torques-reginae ITEP-024 (species 1.2.3B) shows that this strain is con-specific with the members of species 1.2.3A, sharing 52.7-52.8% isDDH and 93.25-93.52% ANI. The genome of strain BCCUSP55 is incomplete, containing only 77 of our 79 core marker genes (the other genomes representing this genus all contain 79); this is also reflected in its apparent genome size, measured for this draft genome as the sum of contig lengths, of 4.31 Mbp versus 5.35-6.03 Mbp for the two other strains, and in the CheckM estimate of only 90.7% completeness. The genome metrics obtained with this strain (28.4-29.1% isDDH, 83.1-83.3% ANI and 96.35-96.41% CGS with the members of species 1.2.3A; 28.3% isDDH, 83.25% ANI and 96.40% CGS with strain LEGE 08334) suggest that it belongs to the same genus as the other strains, but the 16S rRNA identities (95.86-96.24%) place it into a different genus.
The final member of the Sphaerospermopsis cluster is a strain named Anabaena aphanizomenoides LEGE 00250; the genome metrics (isDDH 82.0%, ANIb 97.67% and CGS 99.70% with strain NIES-1949) clearly place both strains into the same species. Although the strains of Sphaerospermopsis are separated into 2 genera on the basis of 16S rRNA sequence identity, genome metrics suggest that they lie together in a single generic cluster with several species.
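The apparent genome size quoted above for the draft genome of strain BCCUSP55 is simply the sum of its contig lengths. A minimal Python sketch, assuming the assembly is available as FASTA text (the function names are hypothetical); for an incomplete draft this sum underestimates the true genome size, while for a contaminated metagenomic bin it may overestimate it.

```python
def contig_lengths(fasta_text: str) -> list:
    """Length (bp) of each record in a FASTA-formatted string."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def apparent_genome_size_mbp(fasta_text: str) -> float:
    """Apparent size (Mbp) of a draft assembly: the sum of its contig lengths."""
    return sum(contig_lengths(fasta_text)) / 1_000_000


toy_assembly = ">contig_1\nACGTACGT\nACGT\n>contig_2\nGGCC\n"
print(contig_lengths(toy_assembly))  # [12, 4]
```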

B1e: the extremophiles

In genus, the 16S rRNA genes extracted from the complete chromosomes of Euhalothece natronophila Z-M001 (species and Halothece sp. PCC 7418 (species, together with the draft metagenome of "Cyanobacteria bacterium" GSL.Bin1 (species, from alkaline or hypersaline habitats, share 96.81-97.52% sequence identity and show low isDDH and ANI values (~15% and ~75%, respectively). Although the 16S rRNA identity value of 97.31% between the strains represented by complete chromosomal sequences suggests that they belong to two species of the same genus, the low values obtained with isDDH (15.1%, GGDC formula 3), ANI (75.76%), Mash (78.42%) and CGS (93.79%) suggest that they belong to different genera. The separation of the halophilic strains Dactylococcopsis salina PCC 8305 and Halothece sp. PCC 7418 into 2 genera ( and, justified by low (96.36%) 16S rRNA sequence identity, is accompanied by slightly higher isDDH (16.2%), ANI (77.93%), Mash similarity (81.62%) and CGS (94.44%) values of their complete chromosomes, which place them on the borderline between species and genus. Monospecific genus contains 16S rRNA genes extracted from the draft genomes of strain Sodalinema stali (previously Oscillatoria or Phormidium) sp. HE10JO (only one of four being shown), S. gerasimenkoae IPPAS B-353 (only one of two being shown), Geitlerinema sp. P-1104 (a single gene) and the metagenomes labelled as Cyanobacteria bacterium T3Sed10_304 and Phormidium sp. CSSed162cmB_426, all from hypersaline or alkaline environments. The 2 metagenomes are incomplete, containing respectively only 72 and 63 of our 79-gene core marker set, and have been excluded from the genome tree and the calculation of genome metrics. The 16S rRNA sequences from the first three genomes share 99.33-100% identity, but the genomes show only 81.74-87.72% ANI and 97.69-98.53% CGS (the isDDH values of 28.2-35.1% should be considered ambiguous in this range), placing them in separate species of the genus.
These rapid changes in the genomes (in whole or in part) of all of the extremophiles described above, accompanied by little change in 16S rRNA identity, suggest a mechanism of evolution in which some or many gene products have become adapted to the stress imposed by hypersaline environments.

Organisms which tolerate lower salt concentrations are widespread in the 16S rRNA tree, but few of the more than 250 such strains are represented by genomic sequences. In all cases, the 16S rRNA sequence identities and genome metrics are congruent; these organisms are described in the appropriate paragraphs below.

Like the halophiles, the thermophilic members Thermosynechococcus vestitus strains BP-1 and PKUAC-SCTE542, T. vulcanus strain NIES-2134, Thermosynechococcus sp. strains CL-1, HN-54, NK55a and TA-1, together with 3 metagenomes (only 2 [M3746_W2019_013 and M46_R2017_013] are shown in the trees), all isolated from thermal springs, share 99.06-100% 16S rRNA sequence identity (members of the pairs BP-1/NIES-2134 and NK55a/M3746_W2019_013 being identical), and on the basis of this comparison alone are therefore members of the same species (species 8.1.1A). All except the metagenomes are represented by complete sequences. The genomes are all similar in size (2.52-2.71 Mbp) and G+C content (53.1-53.9 mol%). Genome analysis by isDDH confirms the con-specificity of all strains (64.4-78.6% isDDH with GGDC formula 3 for complete chromosomes). Strains BP-1, NK55a and NIES-2134 are confirmed as con-specific by values of 99.09% ANI, 94.02-99.34% Mash similarity and 99.51-99.87% CGS. Strains CL-1 and TA-1 are also con-specific, with 96.06% ANI, 97.56% Mash similarity and 99.27% CGS. However, the remaining strains show only 85.98-91.75% ANI, 88.27-92.21% Mash similarity and 96.94-98.54% CGS with strains BP-1, NK55a and NIES-2134. Among themselves they show 86.29-90.34% ANI, 88.26-92.38% Mash similarity and 97.68-98.70% CGS, each thus behaving as an individual species, in contrast to the con-specificity suggested by 16S rRNA sequence analysis. The metagenome M46_R2017_013 also behaves as an individual species with all genome metrics. Similar conclusions for some of these strains were reported by Cheng et al. (2020). Strains BP-1 and NIES-2134 both show low BLAST query coverage with the marker recN, whereas the other strains contain a full-length copy of this gene, as described on the main page of this site. The complete chromosome of Synechococcus lividus PCC 6715 (species 8.1.1B; renamed Thermosynechococcus sp. by Komárek et al., 2020), also isolated from a thermal spring, is similar in size (2.66 Mbp) and genomic G+C content (53.5 mol%) to the Thermosynechococcus strains listed above. It shares 98.72-98.86% 16S rRNA sequence identity with all Thermosynechococcus strains, but only 16.3-17.0% isDDH (GGDC formula 3); these values are on the borderline of discrimination between species and genera. The results with ANI (74.76-75.19%), Mash similarity (78.02-78.77%) and CGS (93.62-94.18%) clearly place strain PCC 6715 into a different genus. This genus also contains strains PCC 6716 and PCC 6717, both represented by draft genomic sequences. The three strains are co-identic on the basis of 16S rRNA sequence identity, isDDH (86.3-93.7%), ANI (98.39-99.22%) and CGS (99.83-99.91%). The conflicts between 16S rRNA sequence identities and genome metrics in genus 8.1.1 perhaps indicate an adaptation of many gene products to thermal stress.
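The Mash similarities quoted above derive from the Mash distance of Ondov et al. (2016), which estimates a mutation distance from the Jaccard index j of the k-mer sets of two genomes as D = -(1/k) ln(2j/(1+j)). The sketch below computes an exact Jaccard index on forward-strand k-mers of short toy sequences, whereas Mash itself uses canonical k-mers and estimates j from MinHash sketches; expressing the similarity as 100 × (1 - D) is an assumption about how the percentages are derived, and the function names are hypothetical.

```python
import math


def kmer_set(seq: str, k: int) -> set:
    """All forward-strand k-mers of a sequence (Mash uses canonical k-mers)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard index; Mash estimates this from MinHash sketches."""
    return len(a & b) / len(a | b)


def mash_distance(j: float, k: int) -> float:
    """Mash distance D = -(1/k) * ln(2j / (1 + j)) (Ondov et al., 2016)."""
    if j == 0:
        return 1.0  # no shared k-mers: maximum distance, as reported by Mash
    return -math.log(2 * j / (1 + j)) / k


def mash_similarity_percent(j: float, k: int = 21) -> float:
    """ASSUMED conversion to percent similarity; 21 is the Mash default k."""
    return (1 - mash_distance(j, k)) * 100


a = kmer_set("ACGTACGTTGCA", 4)
b = kmer_set("ACGTACGATGCA", 4)
j = jaccard(a, b)
print(round(j, 3), round(mash_similarity_percent(j, k=4), 2))
```

Because D depends on k, Mash similarities are only comparable when computed with the same k-mer size.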

Other strains of thermal origin, such as Leptolyngbya sp. O-77, Thermoleptolyngbya sp. PKUAC-SCTA121 and PKUAC-SCTA183 (species 6.2.8A) and Synechococcus spp. JA-3-3Ab (species 10.1.1A) and JA-2-3B’a (species 10.1.1B), described in Section B3 below, show speciation patterns that are coherent when measured with all available methods.

Li et al. (2014) have commented on the increased rates of evolution in microbial communities of extreme environments, and a thorough review of cyanobacterial halophiles was provided by Oren (2015).

B1f: Nostoc spp. (part)

Although most clusters of Nostoc spp. are congruent in terms of 16S rRNA identities and genome metrics (see below), several clusters are problematic.

Two strains of the type species, N. commune (species 1.8.1L), NIES-4072 (with 3 identical rrn operons) and HK-02 (NIES-2114, with 4 identical rrn operons), are identical in 16S rRNA sequence and their draft genomes show 99.8% isDDH, 99.99% ANI (reported for 99% of the genome) and 100% CGS, confirming their con-specificity. They share only 97.92% 16S rRNA sequence identity, 49.7-51.0% isDDH and 91.16-91.82% ANI with strain Nostoc flagelliforme CCNUN1 which, on the basis of 16S rRNA identity, lies in a separate species (species 1.8.1AD). The genome metrics, however, are conflicting: isDDH (with GGDC formula 2) suggests con-specificity of the 3 strains (although we emphasize our concerns regarding the use of formula 2); ANI places the strains on the borderline between species. The ANI value seems to be artificially high, since only the identities of around 60% of the genome are reported in comparisons involving strain CCNUN1. The CGS results are also confusing, since the three strains are combined into a single species with values of 99.38-100%. Strain CCNUN1 is represented by a complete chromosome sequence (CP024785) with six 16S rRNA genes but only 5 rrn operons (rrnA-rrnE), whose almost identical (99.8-100%) 16S rRNA sequences fall into species 1.8.1AD (whose members show 99.8-100% 16S rRNA sequence identity); only one (rrnA) is shown in the tree based on 16S rRNA sequences. The additional 16S rRNA sequence (labelled "(no operon)" in place of an operon designation) falls into species 1.8.1AC.

The symbiotic 'Nostoc azollae' strain 0708 was described by Ran et al. (2010). The quotation marks are as shown in the NCBI entry, and presumably indicate that the authors were unsure of the identity of this organism. The organism is listed in NCBI as 'Nostoc azollae' in the TITLE field of the flatfile, but as Trichormus in the ORGANISM field. In a tree of 10 taxa extracted from a larger genomic tree built with 476 marker genes and 53 organisms, the authors show strain 0708 clustering with Cylindrospermopsis. This clustering is similar to our own if we omit all intervening sequences. The complete chromosome contains 4 identical copies of the 16S rRNA gene and lies in species 1.2.5A of the 16S rRNA tree. Strain 0708 is separated at the generic level from Sphaerospermopsis strains NIES-1949, NIES-73 and LEGE 00249 on the basis of a mean value of 96.24% 16S rRNA sequence identity; however, the genome metrics (81.24-81.45% ANI and borderline 95.52-95.58% CGS) suggest separation at the specific, not generic, level. Values of 25.5-31.5% isDDH give no useful information. A logical conclusion would be that, on the basis of the genome metrics, the organism Nostoc azollae strain 0708 is a member of the genus Sphaerospermopsis. The organism was collected as an extracellular cyanobiont of Azolla filiculoides and has a mixed taxonomic history, being otherwise known as Anabaena azollae and Trichormus azollae, but is clearly different from all other symbiotic Nostoc strains, most of which lie in genus 1.8.1, as described below. Its genome is also the smallest (5.35 Mbp) of all Nostoc strains in our database, which range from 6.33 to 8.42 Mbp (measured for complete chromosomes only), and is identical in size to the "nearly complete" genome of S. kisseleviana NIES-73 (5.35 Mbp). However, strain 0708 shares 95.92% 16S rRNA sequence identity with both Anabaena sp. PCC 7108 and Trichormus sp. strain NMC-1 in genus 1.1.4, which suggests separation at the generic level, yet the genome metrics (80.95-81.45% ANI and borderline 95.56-95.75% CGS) separate it from these strains only at the specific level; the isDDH values of 25.7-26.3% do not permit discrimination within this range.

Since strains NMC-1 and PCC 7108 were isolated from saline habitats, Sphaerospermopsis spp. from freshwater, and strain 0708 is an obligate cyanobiont, it is unlikely that these similarities can be explained by some degree of convergent evolution in the genomes of these genera. Strain 0708 produces motile, small-celled hormogonia (never described for Sphaerospermopsis spp. or the two members of genus 1.1.4), which permit the formation of the symbiosis. Strain 0708 cannot be a member of two distinct genera, as implied by the genome metrics. The genome appears to be in a state of erosion (Ran et al., 2010) and contains a large number of pseudogenes; such genome erosion may greatly influence measures of similarity.

Nostoc strains UCD121 and UCD122, both cyanobionts, lie in species 1.8.1A of the 16S rRNA tree. This con-specificity is confirmed by values of 99.88-99.90% ANI, 99.3-99.4% isDDH and 100% CGS obtained with their draft genomes and that of strain UCD120, which is not shown in the 16S rRNA tree. They show 80.95-81.49% ANI with Nostoc sp. 2RC, Nostoc sp. Moss 3 and N. linkia strain z1 in genus 1.8.2 (all cyanobionts). The isDDH values are 27.3-27.6%, the CGS values 96.33-96.34% and the 16S rRNA identities 95.16-95.23%. isDDH is not discriminatory in this range of values, but the ANI and CGS values both contradict the generic separation of these strains. However, strains UCD120, UCD121 and UCD122 show 75.91-75.95% ANI, 22.2-22.4% isDDH, 94.70-94.71% CGS and 93.81% 16S rRNA sequence identity with the symbiotic strain Nostoc sp. TLC 26-01 in species 1.11.3A; separation at the generic level is therefore confirmed by both ANI and CGS. They are also separated at the generic level from the cyanobiont 'Nostoc sp.' strain 0708 (species 1.2.5A; 75.54-75.65% ANI, 93.62-93.63% CGS) and the freshwater strain Nostoc sp. PCC 7937 (ATCC 29413; 76.16-76.24% ANI, 94.73-94.74% CGS) in genus 1.10.1. Note that Nostoc sp. PCC 7937 is incorrectly listed as A. variabilis in NCBI (deposited by an author who received it from ATCC under that name) and has been changed to Trichormus in the ORGANISM field of the NCBI flatfile.

The three Peltigera cyanobionts (Nostoc sp. strains 213, 232 and N6) are con-specific with other members of species 1.8.1A described below, if comparison is based on their ranges of 98.63-99.33% 16S rRNA sequence identity and 99.18-99.56% CGS. However, they show unexpectedly low isDDH (40.6-40.9%) and ANI (87.77-88.43%) values, which conflict with their con-specificity.

B1g: Chlorogloeopsis

The two draft genomes of Chlorogloeopsis fritschii strain PCC 6912 and the draft genome of C. fritschii strain PCC 9212 (monospecific genus 1.15.3) give high (99.8%) isDDH values, ANI values of 99.96% and 100% CGS, as expected from their 16S rRNA sequence identity of 100%. The strain represented by the metagenome C. fritschii C084 (lacking the 16S rRNA gene) is a member of the same species, as shown by values of 81.3% isDDH, 97.2% ANI and 99.76% CGS with the above. The C. fritschii strains are only distantly related to the unidentified HTF (High Temperature Form) strain PCC 7702 in genus 1.15.2 (23.1% isDDH, 77.99-78.97% ANI, 92.54% 16S rRNA identity). The isDDH value may be ambiguous in this range, but the ANI value and 95.42% CGS clearly indicate that these strains are members of distinct species, whereas the 16S rRNA identity value places them into different genera. Organisms of the HTF type are often incorrectly named as Chlorogloeopsis sp. in the literature. Similar results were obtained by Lachance (1981) in wet-lab thermal hydroxyapatite elution DDH studies of C. fritschii strains PCC 6718, PCC 6912 and PCC 9212 in genus 1.15.3 (93% DDH, no change in ΔTm(e)); our unpublished results (Raum, Herdman & Rippka) using optical thermal renaturation DDH again show the con-specificity of strains PCC 6912 and PCC 9212 (100% DDH) and that of the unidentified HTF strains PCC 7517, PCC 7518, PCC 7519 and PCC 7702 (86-100% DDH) in genus 1.15.2.

B1h: Calothrix spp. (part)

The two genomes representing species 1.15.15D, Calothrix parasitica NIES-267 (containing 3 identical rrn operons) and Rivularia sp. PCC 7116 (also 3 identical rrn operons), are similar in size (8.95 and 8.70 Mbp, respectively); they show only 82.23% ANI (measured over 54% of the genome) and 97.39% CGS, which places them as members of two distinct species. The isDDH formula 2 value of 26.9% is not discriminatory in this range. However, their extracted 16S rRNA gene sequences exhibit 99.66% identity. Genome metrics and 16S rRNA identity values therefore disagree in this cluster. Other clusters of Calothrix spp., for which all similarity values agree, are described below.

B1i: Crocosphaera (part)

Within species, the complete chromosome of Crocosphaera subtropica ATCC 51142 (previously Cyanothece sp.) shares 98.0% isDDH and 99.99% ANI (and 100% 16S rRNA sequence identity) with the draft genome of Crocosphaera subtropica ATCC 51472; these two strains are therefore clearly con-specific. However, strain ATCC 51142 shows only 33.5-34.8% isDDH, 86.71-87.15% ANI and 98.02-98.25% CGS with strains Cyanothece sp. BG0011 (draft), Crocosphaera chwakensis CCY0110 (draft) and the unidentified unicellular cyanobacterium SU3 (draft). The latter strains are each assigned to different species by all genome metrics (34.8-38.6% isDDH, 87.51-88.76% ANI and 98.23-98.44% CGS). On the basis of genome metrics, the single species delineated by 16S rRNA analysis should therefore be divided into four. In contrast, members of Crocosphaera species (below) appear to be supported as con-specific by all similarity measures.

B1j: Pseudanabaena

Genus contains 15 strains for which both 16S rRNA and genome sequences are available (Pseudanabaena spp. ABRG5-3 (complete chromosome), 0153, BC1403, GIHE-NHR1, lw0831, PCC 7429, ULC066, UWO310, UWO311 and Roaring Creek (all draft), the metagenomic sequences Pseudanabaena sp. Salubria-1, ULC068 (incomplete), M53BS1SP1A06MG and ULC187, and the draft genome of Limnothrix redekeii PCC 9416). Species (as defined by 16S rRNA sequence identity) is divided into 4 species by genome metrics. The first contains strains PCC 7429, Roaring Creek and UWO310 (sharing 98.85-100% 16S rRNA sequence identity, 57.4-63.5% isDDH, 94.18-94.88% ANI and 99.49-99.57% CGS). The second comprises L. redekeii strain PCC 9416 and the metagenome Pseudanabaena sp. Salubria-1, sharing 86.5% isDDH, 98.12% ANI and 99.91% CGS. The third contains the single strain ULC066, showing 25.2-25.7% isDDH, 78.62-79.89% ANI and 97.17-97.43% CGS with the 2 genomospecies described above; the single member of the fourth (strain UWO311) shares 25.4-40.9% isDDH, 79.7-89.% ANI and 97.71-99.03% CGS with the first 3 species. The 4 species are well separated by genome metrics (24.9-26.3% isDDH, 78.62-79.50% ANI and 97.07-97.21% CGS). Strains 0153 (species, ABRG5-3 (, GIHE-NHR1 and the metagenome M53BS1SP1A06MG (, BC1403 and the metagenome ULC068 (, excluded from the genome tree), the metagenome ULC187 ( clearly fall into 5 additional species of this genus, as shown in the 16S rRNA tree, with their low 16S rRNA sequence identities being confirmed by isDDH (23.7-34.3%), ANI (76.32-78.12%) and CGS (96.18-98.98%) values. However, strain lw0831 (species of the 16S rRNA tree) shares only 40.7% isDDH, 89.19% ANI and 98.98% CGS with BC1403, and can therefore be assigned to a different species.

Many other genomes are available for the Pseudanabaena cluster (e.g. strain SR411, the metagenomes UBA1465, JADBBY000000000-JADBWZ000000000); these are all incomplete or lack a complete rrn operon. We note that the genome of Limnothrix redekeii PCC 9416 may be composed of parts of two different organisms (as warned by GGDC in calculating the isDDH values). Note that other strains assigned to the genus "Pseudanabaena" fall into many different generic clusters in addition to genus (1.3.7,,,, 6.1.1, 6.1.4, 6.1.5, 6.1.8) on the basis of 16S rRNA sequence identity; the single member of genus 1.3.7 is presumably a het- mutant. Strains named as Limnothrix are also dispersed in the ribosomal tree: genera, and, with most being found in species; the 4 strains represented by genomic sequences in the latter species are described below.
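Splitting a 16S rRNA species into several genomospecies, as described above for Pseudanabaena, can be viewed as grouping genomes whose pairwise genome metric exceeds a species cutoff. The Python sketch below uses single linkage (union-find) on pairwise ANI values; the linkage rule, the 95% default cutoff and the strain names and values in the example are all hypothetical, and the grouping criteria applied in this section may differ (strains PCC 7429, Roaring Creek and UWO310, for example, are placed in one species at 94.18-94.88% ANI).

```python
from itertools import combinations


def genomospecies(strains, ani, cutoff=95.0):
    """Single-linkage grouping of strains whose pairwise ANI (%) >= cutoff.

    `ani` maps frozenset({strain_a, strain_b}) to a pairwise ANI value;
    missing pairs are treated as below the cutoff.
    """
    parent = {s: s for s in strains}

    def find(x):
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(strains, 2):
        if ani.get(frozenset((a, b)), 0.0) >= cutoff:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for s in strains:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical strains and ANI values, for illustration only.
pairs = {
    frozenset(("A", "B")): 98.1,
    frozenset(("B", "C")): 96.0,
    frozenset(("A", "C")): 93.0,
    frozenset(("C", "D")): 80.0,
}
print(genomospecies(["A", "B", "C", "D"], pairs))  # [['A', 'B', 'C'], ['D']]
```

Single linkage allows chaining: A and C fall into one group here although their own ANI (93.0%) is below the cutoff, which is one reason why borderline values deserve caution.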

B1k: "Leptolyngbya" (part)

Species 6.1.1C contains two draft genomic sequences, Phormidium tenue NIES-30 and Leptolyngbya sp. ULC186, that share 98.92% 16S rRNA sequence identity, 87.35% ANI and 98.53% CGS. The isDDH value of 34.0% obtained with GGDC formula 2 is not discriminatory in this region. They should therefore be considered as members of different species. Within genus 6.1.10, 2 of the 6 species contain uniquely the genome sequences of strains of Leptolyngbya; Leptolyngbya sp. BC1307 (species 6.1.10D) and L. foveolarum ULC129 (species 6.1.10E) are defined as members of different species by their 97.24% 16S rRNA sequence identity. However, the results obtained with the genome metric methods applicable to their draft genomes suggest that they are members of two different genera: 21.7% isDDH (again ambiguous in this region), 73.49% ANI and 93.54% CGS. The extracted 16S rRNA gene of Synechococcus sp. PCC 7335 (species 6.1.10B) shares 97.51% sequence identity with that of Leptolyngbya sp. BC1307 and 97.04% sequence identity with that of L. foveolarum ULC129. Their draft genomes show only 19.1-19.7% isDDH (again ambiguous) and 71.4-73.49% ANI, suggesting their assignment to different genera. However, the CGS values (92.85-93.31%) define them as members of a single genus. Other genomic sequences of Leptolyngbya spp. and related genera, showing congruent results with all methods, are described below.

B1L: single strain

B1L1: Leptolyngbya sp. PCC 7376
Species contains the filamentous Leptolyngbya sp. PCC 7376, which shares 97.51% 16S rRNA sequence identity with, for example, Picosynechococcus sp. strain PCC 7002 in species, suggesting their assignment to distinct species of the same genus. That this is unlikely is shown by their different genome properties: genomic G+C contents of 43.87 and 49.17 mol%, and genome sizes of 5.13 and 3.43 Mbp, respectively. The isDDH value for their complete chromosomes is only 13.9% (GGDC formula 3), with only 71.91% ANI, 93.21% CGS and 73.77% Mash similarity, placing them as members of different genera. Another filamentous organism, Limnothrix rosea strain IAM M-220 (NIES-208), also in species, is represented by a draft genome sequence; this strain is closely related to Leptolyngbya sp. PCC 7376 on the basis of both 16S rRNA sequence identity (99.33%) and 23S rRNA sequence identity (99.03%); however, their full genomes share only 21.4% isDDH, 77.39% ANI and 97.02% CGS, assigning them to two distinct species. The filamentous members of this genus have therefore retained high 16S rRNA sequence identity but have undergone more rapid divergence in other parts of the genome, each in different regions. These differences between unicellular and filamentous members may have arisen by either of two scenarios: (1) an HGT event in which a filamentous ancestor of the Leptolyngbya and Limnothrix strains acquired the rrn operon of a unicellular organism, followed by extensive genomic evolution; (2) the ancestor was unicellular, evolving in morphology and gene content while retaining the rrn operon.

B2: Most clusters are supported by all metrics

Isolates known as Anabaena, Aphanizomenon, Dolichospermum and Cuspidothrix lie in genus 1.1.1. Despite their major morphological differences and generic names, they are resolved only as species (1.1.1A to 1.1.1N) of a single genus; the 16S rRNA data are supported in all cases by genome metrics. These values cannot be compared with the 16S rRNA identities for some strains (AWQC131C, LEGE 00246, UHCC 0167, UHCC 0406) and metagenomes (AL09 39864, AL93, Clear-A1, CRKS33, I3-5-bin-073, LEO11-02 39865) that lack the gene or contain only partial copies of it; additionally, strains Dolichospermum spp. UHCC 0299 and UHCC 0352 have a large gap in the 16S rRNA sequence and have also been excluded from 16S rRNA identity calculations. All of these sequences, except that of the metagenome I3-5-bin-073, are shown in the figure below, derived by analysis with our 79 conserved marker genes. Österholm et al. (2020) showed a similar genomic phylogeny, using the 31 marker genes of Wu & Eisen (2008), and termed this cluster the ADA clade, derived from a larger tree of 94 genomes. This definition of the ADA clade excluded the strains of Cuspidothrix. The two trees are reasonably comparable, except that the clusters are more clearly resolved with our 79 marker genes. The tree of Österholm et al. additionally contains Aph. flos-aquae strain 2012/KM1/D3, excluded from our tree because it is incomplete, containing only 76 of the 79 genes of our core marker set.


The species designations are taken from our 16S rRNA tree; colours are as in our genomic trees. Sequences labeled with "$"
are not shown in the 16S rRNA tree; most lack (or contain short fragments of) the gene. The 16S rRNA sequences extracted
from the metagenomes Anabaena sp. MDT14b and Anabaena sp. WA113 fall into the bacterial outgroup of the RNA phylogeny.
The metagenomic sequence D. circinale Clear-D4 (incomplete, 75 markers) is not included in the genomic tree.

Their tree lacks D. circinale strains ACBU02 (available from the JGI, not from the NCBI) and ACFR02 (not in NCBI or JGI). Many sequences from the NCBI, not available at the time of publication of Österholm et al., have been added here. Many of these are taken from the thorough studies of Dreher et al. (2021a, b), who resolved 10 ADA clades, including Cuspidothrix spp., in a tree of 216 genomes inferred with 91 unlisted marker genes. Our analysis shows the assignment of the ADA genomes to 12 clades, including Cuspidothrix. The inclusion of members of the latter genus is justified by genome metrics, which indicate separation at the inter-specific level of all clades:

Genome metrics (%) observed with Cuspidothrix strain LEGE 03284 (333 contigs) and other members of the ADA clade


Cuspidothrix was created by Rajaniemi et al. (2005) to accommodate strains then known as Aph. issatschenkoi. All members are bloom-forming and are now known to produce neurotoxins. Since Cuspidothrix issatschenkoi strain CHARLIE-1 does not contain an rrn operon, we can perform a complete comparison of genome metrics only with strain C. issatschenkoi LEGE 03284.

Of the nine complete chromosomal sequences of this genus, Dolichospermum sp. strain UHCC 0090 (1.1.1E) is clearly separated at the specific level from Dolichospermum sp. DET69 (represented by a complete metagenomic sequence, species 1.1.1G), Anabaena sp. strain WA102 and Aph. flos-aquae AFA KM1/D3 (1.1.1A) and Aph. flos-aquae DEX188 (complete metagenome, species 1.1.1L), with 16S rRNA sequence identities of 97.77-98.52%, isDDH values of 37.8-45.7% (GGDC formula 3), 87.65-90.43% ANI, 89.51-92.43% Mash similarity and 97.61-98.39% CGS. For these DDH calculations, it was necessary to artificially concatenate the two circular chromosomes of strain UHCC 0090 (chromosome 1, CP003284, 4.33 Mbp; chromosome 2, CP003285, 0.82 Mbp). Chromosome 1 contains 5 identical 16S rRNA genes, of which only one is shown in the trees.
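Concatenating the two chromosomes of strain UHCC 0090 (CP003284 and CP003285) into a single sequence, as required for the DDH calculation described above, can be sketched as follows. Whether a spacer (for example a run of N) was inserted between the chromosomes is not stated, so this sketch simply joins the sequences end to end; the function name and the new header are hypothetical.

```python
def concatenate_records(fasta_text: str, new_header: str, line_width: int = 80) -> str:
    """Join all records of a multi-FASTA string into a single FASTA record."""
    # Keep only sequence lines, in file order, dropping headers and blank lines.
    sequence = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.lstrip().startswith(">")
    )
    # Re-wrap the joined sequence to a fixed line width.
    wrapped = [sequence[i:i + line_width] for i in range(0, len(sequence), line_width)]
    return "\n".join([">" + new_header] + wrapped) + "\n"


two_chromosomes = ">chromosome_1\nACGT\nAC\n>chromosome_2\nGGTT\n"
print(concatenate_records(two_chromosomes, "UHCC0090_concatenated", line_width=4))
```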

Within species 1.1.1E, the three strains represented by complete chromosomes and the three represented by complete metagenomic sequences are confirmed as members of a single species, showing 65.5-74.2% isDDH (GGDC formula 3), 95.82-96.93% ANI, 96.87-97.72% Mash similarity and 99.44-99.59% CGS. The five 16S rRNA genes of the complete chromosome of strain UHCC 0315 share only 98.79-99.19% identity, thus exhibiting microheterogeneity. These sequences, of which we show 3 in the rRNA tree, share 98.72-99.13% identity with the 16S rRNA of Dolichospermum sp. strain UHCC 0090; isDDH of 65.8% (GGDC formula 3), 96.42% ANI, 97.31% Mash similarity and 99.45% CGS confirm the con-specificity of the strains. Similar microheterogeneity may be seen in the complete chromosome of D. flos-aquae CCAP 1403/13F (also in species 1.1.1E), which also contains 5 rrn operons; the 16S rRNA sequences share 98.99-99.73% identity, and only two are included in the 16S rRNA tree. These share 98.65-98.92% identity with strain UHCC 0090. These values are at the limit of our cutoff value; the isDDH (68.5%, GGDC formula 3), ANI (96.49%), Mash (97.38%) and CGS (99.54%) values establish them as members of a single species.
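The ranges of identity among the rrn copies of a single genome (for example 98.79-99.19% for the five 16S rRNA genes of strain UHCC 0315) are the minimum and maximum of all pairwise identities. A sketch, assuming the copies are already aligned to equal length; counting a gap opposite a base as a mismatch and skipping columns where both sequences have gaps is only one of several conventions, and this section does not state which one was used. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Identity (%) of two pre-aligned sequences of equal length.

    Columns where both sequences have a gap ('-') are skipped; a gap
    opposite a base counts as a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" and y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100 * matches / compared


def identity_range(copies: list) -> tuple:
    """Minimum and maximum pairwise identity (%) among aligned rrn copies."""
    values = [percent_identity(a, b) for a, b in combinations(copies, 2)]
    return min(values), max(values)


# Toy aligned fragments (hypothetical), standing in for the copies of one genome.
copies = ["ACGTACGT", "ACGTACGA", "ACGT-CGT"]
print(identity_range(copies))  # (75.0, 87.5)
```

With real data, the copies would first be aligned and identities computed over the full gene length.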

Anabaena sp. strain WA102 (whose complete chromosome contains five 16S rRNA genes sharing 99.9-100% identity, only one being shown in the tree) is con-specific (species 1.1.1A) with Aph. flos-aquae AFA KM1/D3 (complete metagenome), as shown by values of 67.5% isDDH, 96.18% ANI, 97.11% Mash similarity and 99.53% CGS.

The organism Dolichospermum sp. DET69, represented by a complete metagenome, lies in species 1.1.1G of the 16S rRNA tree. Comparison with the other complete chromosomes shows that it is clearly separated at the specific level from those in species 1.1.1E (40.4-41.7% isDDH with GGDC formula 3, 88.31-88.62% ANI, 90.06-90.14% Mash similarity and 97.99-98.02% CGS), 1.1.1K (32.6% isDDH with GGDC formula 3, 85.96% ANI, 87.54% Mash similarity and 97.62% CGS) and 1.1.1C (39.6% isDDH with GGDC formula 3, 88.05% ANI, 89.66% Mash similarity and 98.13% CGS).

With the complete genomic and complete metagenomic sequences excluded, 49 further genomes, all draft, remain in genus 1.1.1. Aph. flos-aquae strain 2012/KM1/D3 (species 1.1.1A) has been further excluded from DDH comparisons and the genome tree since the genome is incomplete. Eight species (defined by 16S rRNA identity values) contain genome sequences, as described below.

Within species 1.1.1E, 13 of the 14 remaining strains (only 2 being shown in the 16S rRNA tree) share 98.62-99.79% 16S rRNA sequence identity, more than 97.26% ANI and a minimum of 74.9% isDDH, confirming their placement into a single species. Strain Anabaena sp. UHCC 0187 shares 98.63-99.38% 16S rRNA sequence identity with the above, and would seemingly be a member of the same species, but is confirmed as a different species by isDDH values of 40.3-40.5%, ANI values of 89.20-89.38% and 98.67-98.69% CGS. Note that, although the metagenome Anabaena sp. MDT14b lies in this species, the two 16S rRNA genes extracted from the genome sequence fall into the bacterial outgroup. Ten further metagenomes also lie in this species, but the 16S rRNA sequences of three are not available for comparison; however, these are confirmed as co-identic by values of 82.4% isDDH and 97.86% ANI. Strains UHCC 0299 and UHCC 0352 fall into this species in the 16S rRNA tree if only the 5-prime end of the gene is considered, but have long branches due to the gaps in their rRNA sequence; they have been omitted from this tree. Some of the metagenomes show microheterogeneity of the 16S rRNA gene, the most pronounced being OL03 (5 rrn operons, 98.7-99.5% 16S rRNA sequence identity).

The single complete and three draft metagenomic sequences in species 1.1.1G share 99.2-100% 16S rRNA sequence identity and show 69.9-97.3% isDDH and 96.18-99.7% ANI values. All four strains are therefore members of a single species. The metagenome D. circinale Clear-D4 fits into this species on the basis of genome metrics (96.05% ANI, 70.9% isDDH with strain ACFR02), but is incomplete, containing only 75 of our 79 core marker genes and no 16S rRNA gene, and has been omitted from the rRNA and genome trees.
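Several genomes in this section (the metagenome D. circinale Clear-D4 with 75, and Aph. flos-aquae strain 2012/KM1/D3 with 76, of the 79 core marker genes) are excluded as incomplete. A minimal screening sketch follows; the minimum number of markers required is left as a parameter because the exact threshold is not stated here, and the second genome name in the example is hypothetical. Marker-set completeness is a coarse measure: for strain BCCUSP55, 77 of 79 markers (97.5%) contrasts with its CheckM estimate of 90.7%.

```python
CORE_MARKERS = 79  # size of the conserved marker set described in the text


def marker_completeness(found: int, total: int = CORE_MARKERS) -> float:
    """Percentage of the core marker set detected in a genome."""
    return 100 * found / total


def screen(marker_counts: dict, min_found: int) -> tuple:
    """Split genomes into (kept, excluded) by the number of core markers found."""
    kept = sorted(name for name, n in marker_counts.items() if n >= min_found)
    excluded = sorted(name for name, n in marker_counts.items() if n < min_found)
    return kept, excluded


counts = {"D. circinale Clear-D4": 75, "hypothetical_complete_genome": 79}
print(round(marker_completeness(75), 1))  # 94.9
print(screen(counts, min_found=CORE_MARKERS))
```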

The eight strains of species 1.1.1A for which 16S rRNA data are available are confirmed as members of a single species (99.74-99.93% 16S rRNA sequence identity, 68.2-73.4% isDDH, 95.43-98.67% ANI and 99.46-99.93% CGS). The incomplete draft genome of Aph. flos-aquae strain 2012/KM1/D3 falls into this cluster in the 16S rRNA tree, but has been excluded from the genome tree and DDH calculation. The 16S rRNA gene of the metagenome Aph. flos-aquae WA102 incorrectly places this genome into cluster 1.1.1A due to extreme heterogeneity (see below); this genome has been excluded from the DDH calculations for species 1.1.1A.

Species 1.1.1B, 1.1.1C and 1.1.1D in the 16S rRNA tree contain the sequences of 3 draft genomes: Dolichospermum sp. LEGE 00240, NIES-806 and UHCC 0259, all present in the genome tree. Values of 50.4-53.8% isDDH, 92.36-93.11% ANI and 98.93-99.04% CGS fall on the borderline of interspecific separation.

The draft metagenome labeled as WA102, but named as Aph. flos-aquae, contains four 16S rRNA sequences, only two of which are shown in the 16S rRNA tree; one (incomplete) is found in species 1.1.1A (sharing 99.5% identity with the sequence from the complete chromosome of Anabaena sp. strain WA102) and the other in 1.1.1L. These two sequences thus assign this organism to different species. On the basis of isDDH, this draft metagenome fits into species 1.1.1L, sharing 82.3-82.6% isDDH and 97.44-97.81% ANI with Anabaena sp. UBA 12330 and WA113 and Aphanizomenon flos-aquae Clear-A1, CP01, DEX188, MDT13, MDT14a and UKL13-PB (all again metagenomic assemblies); it shows only 35.3% isDDH and 87.51% ANI with the complete chromosome of Anabaena sp. strain WA102 in species 1.1.1A. Strains MDT13 and MDT14a show a high isDDH value of 92.3% and 98.99% ANI. The almost full-length 16S rRNA genes extracted from the metagenomic sequences of Anabaena spp. strains MDT14b (2 sequences) and WA113 (3 sequences) are not of cyanobacterial origin, since they fall into the bacterial outgroup of the tree. The metagenome Aph. flos-aquae Clear-A1, showing 98.00% ANI and 83.8% isDDH with Aphanizomenon MDT14a, falls into species 1.1.1L but, lacking an rrn operon, is not shown in the rRNA tree.
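
Cases such as WA102, where the copies extracted from one (meta)genome fall into different species or into the bacterial outgroup, can be screened for systematically once each extracted copy has been placed in the tree. A minimal Python sketch follows, assuming the placements are already available as (genome, copy, cluster) records; the copy identifiers are hypothetical, while the cluster labels are those quoted in this section.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def heterogeneous_genomes(
    copies: Iterable[Tuple[str, str, str]]
) -> Dict[str, List[str]]:
    """Report genomes whose extracted 16S rRNA copies fall into >1 cluster.

    Each item is (genome, copy_id, cluster), where cluster is the label the
    copy receives in the 16S rRNA tree, e.g. "1.1.1A" or "outgroup".
    """
    clusters = defaultdict(set)
    for genome, _copy_id, cluster in copies:
        clusters[genome].add(cluster)
    return {g: sorted(c) for g, c in clusters.items() if len(c) > 1}


copies = [
    ("Aph. flos-aquae WA102", "copy1", "1.1.1A"),
    ("Aph. flos-aquae WA102", "copy2", "1.1.1L"),
    ("Nostoc sp. ATCC 43529", "copy1", "1.8.2H"),
    ("Nostoc sp. ATCC 43529", "copy2", "outgroup"),
    ("Calothrix sp. PCC 7716", "copy1", "1.15.7A"),
    ("Calothrix sp. PCC 7716", "copy2", "1.15.7A"),
]
print(heterogeneous_genomes(copies))
```

Flagged genomes still need individual inspection (chimeric assembly, contamination or genuine rrn heterogeneity); the sketch only finds them.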

The four D. circinale strains represented by genomes in species 1.1.1I are con-specific, as shown by values of 70.9-97.3% isDDH and 96.21-99.77% ANI. The single metagenome of this species, Anabaena sp. CRKS33, gives isDDH of 58.8-60.6% and ANI values of 94.62-94.81% with the D. circinale strains, confirming membership of the same species.

D. planctonicum strain NIES-80 lies in species 1.1.1H; although it cannot be compared with strain UHCC 0167 on the basis of 16S rRNA identity, the two are members of a single species, showing an isDDH value of 67.9% and 95.99% ANI. Strain NIES-80 shares 84.5% isDDH, 98.20% ANI and 99.81% CGS with D. flos-aquae strain LEGE 04289.

The positioning of the eight sequences of species 1.1.1H and 1.1.1I is debatable. Dreher et al. (2021a, b) combined them into a single species (ADA-1), but mention that "A case might be made in the future to split ADA-1 into subspecies". Genome metrics for the two species are conflicting: isDDH and ANI values suggest that they be combined, but 16S rRNA sequence identities and CGS values do not. We have maintained specific separation of these sequences, pending further study.

Species 1.1.1M contains two draft genomic sequences (Anabaena sp. strains UHCC 0204 and UHCC 0253) that show 99.46% 16S rRNA sequence identity, 84.0% isDDH and 98.05% ANI, confirming their assignment to the same species.

The two strains of Cuspidothrix issatschenkoi of species 1.1.1N are con-specific (isDDH 69.6%, ANI 96.18%). Strain CHARLIE-1 lacks not only the rrn operon but also the nif and gvp gene clusters, which are present in strain LEGE 03284.

The problem of the inter-mixing of generic names appended to members of genus 1.1.1 has been reported in several publications (see Driscoll et al., 2018, and references therein) and may arise in part from the over-zealous transfer of strains with a variety of specific epithets into Dolichospermum by various authors, and in part from the renaming of strains originally identified as those species of Anabaena to Dolichospermum, which is done automatically by the NCBI. Dreher et al. (2021a) suggested that this genus be called Anabaena, which appears to be incorrect, since Rajaniemi et al. (2005) suggested that "If these subclusters would be classified into a single widely conceived genus, its name must be Aphanizomenon MORREN [1836] ex BORNET et FLAHAULT, Ann. Sci. Nat. Bot. 7, ser 7:241, 1888 (type-species= Aphanizomenon flos-aquae [L.] RALFS ex BORNET et FLAHAULT ibid., 1888)".

Trichormus strains are grouped in at least 2 generic clusters in the 16S rRNA tree (1.1.4 and 1.10.1). Single strains are also found in 1.2.6B (Sherwood et al., unpub), 1.3.7B (Miscoe et al., 2016), 1.4.1A (Johansen et al., unpub), 1.12.1B (Miscoe et al., 2016) and 1.13.4 (Rajaniemi et al., 2005a). This makes a total of 7 generic clusters in 7 families. Rajaniemi et al. (2005a) remarked on the polyphyly of this "genus". Komárek and Anagnostidis (1989) transferred Anabaena variabilis (and many other species) into the genus, with T. variabilis as type species. Two genomes (Trichormus sp. NMC-1 and Anabaena sp. PCC 7108, both draft and from saline habitats) and the 16S rRNA sequences of 13 strains (named as Anabaena sp., Trichormus sp. and, in many cases, T. variabilis), mostly isolated from freshwater habitats, fall into genus 1.1.4. The draft genome of T. variabilis SAG 1403-b also fits into this genus in the genome tree, but cannot be shown in the 16S rRNA tree. The genomes of strains NMC-1 and PCC 7108 share 89.42% ANI, 41.2% isDDH and 97.11% 16S rRNA sequence identity, as expected from their placement into two species. The con-specificity of Anabaena sp. strains PCC 6309 and PCC 7122 (species 1.2.9A in the tree) and their separation from strain PCC 7108 (species 1.1.4B), with only 32% DDH and a high (11 °C) ΔTm(e) of the hybrids, were shown by thermal hydroxyapatite elution DDH studies (Lachance, 1981).

The members of genus 1.2.1 were renamed from Cylindrospermopsis to Raphidiopsis by Aguilera et al. (2018) in a detailed study of 16S rRNA, 16S–23S ITS and cpcBA-IGS sequences in which no evidence of generic separation was found. Raphidiopsis has clear priority over Cylindrospermopsis, and R. curvata was retained as the type species. 16S rRNA and draft genome sequences are jointly available in our phylogenetic trees for 12 strains of R. raciborskii (CS-505, CS-508, CS-509, CENA302, CENA303, CYLP, CYRF, GIHE 2018, ITEP-A1, MVCC14, MVCC19 and UNSW 506 [=CS-506]), together with R. brookii D9 and R. curvata NIES-932 (represented by a draft genome and a circular [possibly complete] chromosome, respectively), the complete chromosomes of R. curvispora GIHE-G1 and R. raciborskii Cr2010, KLL07 and N8, and the draft metagenome Cyanobacteria bacterium REEB494. The sequences share 99.33-100% 16S rRNA sequence identity (the complete chromosomes sharing 99.53-99.93%) and, on the basis of this character, are members of a single species. The genome tree below shows that the strains form two closely related clusters. Within each cluster, the strains are clearly con-specific (cluster 1: isDDH 61.6-95.4%, ANI 94.77-99.23%, CGS 99.07-99.94%; cluster 2: isDDH 71.1-72.2%, ANI 95.88-99.52%, CGS 99.37-99.39%). The clusters are separated by genome metrics values that fall on our borderline for species delineation: isDDH 50.0-53.6%, ANI 91.17-92.49%, CGS 98.76-98.83%.


Extracted from our all-genome tree; rooted with Sphaerospermopsis spp.
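
Whether the R. raciborskii strains form one genome cluster or two depends on where the cut-off is drawn within the borderline region. A minimal Python sketch of threshold-based grouping (single linkage, implemented with union-find) follows; the genome names and pairwise ANI values are hypothetical, chosen only to resemble the ranges quoted above, and this is not the method used to build the genome tree.

```python
from typing import Dict, List, Tuple


def single_linkage_clusters(
    names: List[str], pairwise: Dict[Tuple[str, str], float], threshold: float
) -> List[List[str]]:
    """Connected components of the graph linking pairs with value >= threshold.

    Pairs missing from `pairwise` are treated as unlinked.
    """
    parent = {n: n for n in names}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), value in pairwise.items():
        if value >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical ANI values (%): two tight groups linked only at about 92 %
names = ["s1", "s2", "s3", "s4"]
ani = {("s1", "s2"): 99.2, ("s3", "s4"): 96.0, ("s1", "s3"): 92.0, ("s2", "s4"): 91.5}
print(single_linkage_clusters(names, ani, threshold=95.0))  # two clusters
print(single_linkage_clusters(names, ani, threshold=91.0))  # one cluster
```

Single linkage can chain distinct groups through a single intermediate pair, which is one reason for checking several metrics (isDDH, ANI, CGS) rather than relying on one.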

Our BLAST results show that R. brookii D9, R. curvata NIES-932 and R. raciborskii CENA303 lack the nifH (N2 fixation) and hglD (heterocyst glycolipid synthase) genes, the absence of which was previously considered (Stucken et al., 2010) to be a diagnostic feature of Raphidiopsis, although these genes are present in all other members of the species. Consistent with the results of Stucken et al. (2010), the chromosomes of the complete Raphidiopsis strains are small (3.43-3.91 Mbp), as shown in magenta in the following histogram; all other heterocystous cyanobacteria possess larger chromosomes, ranging from 4.54 to 11.06 Mbp (pale blue in the histogram).

Histogram of chromosome sizes: Raphidiopsis strains in magenta, other heterocystous cyanobacteria in pale blue.

Note that the chromosomes of the sister group of Sphaerospermopsis spp. (described below) vary in size from 5.14 to 6.10 Mbp, those of heterocystous cyanobionts from 5.35 to 8.23 Mbp and those of the ADA clade members (which, like Raphidiopsis spp., also form water blooms and produce toxins) from 4.54 to 6.07 Mbp.

Although we do not know the genome size of the ancestor, it is clear that the Raphidiopsis strains have undergone genome reduction. Stucken et al. (2010) remarked that the complete chromosomes of two Raphidiopsis spp. share a specific set of only 2539 genes with greater than 90% ANI, and 3 copies of the rrn operon.

The complete chromosome of Nodularia spumigena strain UHCC 0039 and the circular (not listed as complete) genome of strain CCY9414 in species 1.3.1A share 99.73% 16S rRNA sequence identity and show an isDDH value of 98.4%, ANI of 99.45% over 94.6% of the genome and CGS of 99.94%, confirming their con-specificity. Strain UHCC 0039 shares a slightly lower (99.46%) 16S rRNA sequence identity, 74.2% isDDH, 96.85% ANI (calculated on 78.5% of the genome) and 99.54% CGS with the draft genome of Nodularia sp. CENA596, supporting their con-specificity, but lower values of 98.79% 16S rRNA sequence identity, 38.0% isDDH, 87.98% ANI (over only 58.7% of the genome) and 98.75% CGS with the draft genome of strain NIES-3585. The genome metrics suggest that the latter strain falls below the values employed here for species demarcation.
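
Values such as "99.45% ANI over 94.6% of the genome" report both an identity and the fraction of the genome that contributed to it. One common way of computing this is the ANIb scheme of Goris et al. (2007): the query genome is cut into 1020-nt fragments, each fragment is searched with BLASTN against the other genome, and only hits with more than 30% identity over at least 70% of the fragment length are averaged. The Python sketch below applies those filters to precomputed best hits; it assumes fixed-length fragments, does not run BLAST, reports coverage as the fraction of fragments used, and is not necessarily the ANI implementation behind the values in this section.

```python
from typing import List, Optional, Tuple

FRAGMENT_LEN = 1020  # ANIb fragment length (Goris et al., 2007)


def anib_from_hits(
    hits: List[Tuple[int, float]], n_fragments: int
) -> Tuple[Optional[float], float]:
    """ANIb-style average from precomputed best BLASTN hits.

    hits: one (aligned_length, percent_identity) pair per query fragment
          that produced a hit; n_fragments: total fragments in the query.
    Returns (ani, fraction_of_fragments_used); ani is None if no hit passes.
    """
    if n_fragments <= 0:
        raise ValueError("n_fragments must be positive")
    used = [
        identity
        for aligned_length, identity in hits
        if identity > 30.0 and aligned_length >= 0.7 * FRAGMENT_LEN
    ]
    if not used:
        return None, 0.0
    return sum(used) / len(used), len(used) / n_fragments


# Hypothetical hits for a 5-fragment query: two pass, two fail the filters
hits = [(1020, 99.5), (1000, 98.9), (600, 97.0), (1020, 25.0)]
ani, fraction = anib_from_hits(hits, n_fragments=5)
print(round(ani, 2), fraction)
```

A high ANI computed over a small fraction of the genome (as for NIES-3585 above, 58.7%) therefore carries less weight than the same value over most of the genome.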

Two genomic sequences in 16S rRNA species 1.3.7E, the metagenome Nostoc sp. 996 and the draft Nostocaceae cyanobacterium CENA357, show 46.0 % isDDH, 90.93% ANI, 98.99% CGS, in agreement with a 16S rRNA identity of 98.22%, placing them as borderline members of a single species.

Cylindrospermum spp. strains PCC 7417 (represented by a complete chromosome sequence) and NIES-4074 ("nearly complete") share only 97.64% 16S rRNA sequence identity; the isDDH value of 29.0% (GGDC formula 3), ANI of 83.41% and CGS of 97.17% confirm their assignment to different species (1.7.1A and 1.7.1F). The 16S rRNA gene extracted from the draft genome of Cylindrospermum sp. FACHB-282 shows 99.06% identity with that of strain PCC 7417, but the genomes share only 38.8% isDDH, 88.63% ANI and 98.50% CGS, not supporting con-specificity; this genome shows only 97.11% 16S rRNA sequence identity, 28.8% isDDH, 83.22% ANI and 97.09% CGS with NIES-4074. The Cylindrospermum spp. strains therefore represent 3 species on the basis of genome metrics.

Two strains (CA=ATCC 33047 and 4-3) named as Anabaena sp. and found in the same environment (estuary, Port Aransas, Texas) lie in the monospecific genus 1.9.2; their draft genomes share 99.31% ANI, 94.4% isDDH and 99.93% 16S rRNA sequence identity and are therefore members of a single species.

The "Nostoc" strains for which sequenced genomes are available mostly fall into dispersed generic/specific clusters of the tree as single members, and show only low (around 22%) isDDH values. The 16S rRNA identity values are consistent with genome metrics in all cases, except for three clusters (described above).

Genus 1.8.1 in the ARB 16S rRNA tree contains 28 "Nostoc" strains represented by genomic sequences, 10 being complete, and two genomes from herbaria; their grouping into the same or different species is, in almost all cases, supported by both 16S rRNA sequence identity and isDDH and ANI values. Some of these sequences have been omitted from the phylogenomic tree.

Nelson et al. (2019), in a tree of 100 genomes inferred with 834 unspecified single copy markers, described the complete chromosomes of four Nostoc cyanobionts, isolated from hornworts and a liverwort. Nostoc sp. strains TCL240-02 and C057 lie in species 1.8.1A of the 16S rRNA tree, with Nostoc sp. strain C052 in species 1.8.1I. Nostoc sp. strain TCL26-01 is unrelated to the first three, falling into a different family (species 1.11.3A). The con-specificity of strains TCL240-02 and C057 is confirmed by genome metrics (isDDH 65.1%, ANI 93.19%), and the separation of strain C052 at the specific level by lower values (isDDH 37.2%, ANI 86.43%). Similarity measures involving strain TCL26-01 are more difficult to calculate, but the estimates of 14.6-14.7% isDDH and 75.84-75.92% ANI support generic separation. The isDDH results described above were obtained with GGDC formula 3.
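
The choice between GGDC formulas matters most for distant or incomplete genomes. Following the definitions given in the Methods (formula 2: identities in HSPs divided by total HSP length; formula 3: identities divided by total genome length), a minimal Python sketch of the underlying distances follows. Summing the two search directions is an assumption of this sketch, and the GGDC step that converts a distance into an isDDH estimate is not reproduced, so the outputs are distances, not isDDH values.

```python
from typing import Tuple


def ggdc_style_distances(
    ident_xy: int, ident_yx: int,
    hsp_xy: int, hsp_yx: int,
    len_x: int, len_y: int,
) -> Tuple[float, float]:
    """Distances corresponding to GGDC formulas 2 and 3 (sketch).

    ident_xy / hsp_xy: identical bases and total HSP length when genome x is
    searched against genome y; *_yx is the reverse search. len_x, len_y are
    genome lengths in bp.
    """
    identities = ident_xy + ident_yx
    d_formula2 = 1.0 - identities / (hsp_xy + hsp_yx)  # ignores unaligned DNA
    d_formula3 = 1.0 - identities / (len_x + len_y)    # penalises unaligned DNA
    return d_formula2, d_formula3


# Hypothetical pair: 5 Mbp genomes, 2 Mbp aligned per direction at 90 % identity
d2, d3 = ggdc_style_distances(1_800_000, 1_800_000, 2_000_000, 2_000_000,
                              5_000_000, 5_000_000)
print(round(d2, 3), round(d3, 3))
```

Here formula 2 yields a small distance (0.1) because only the aligned 40% of each genome is considered, whereas formula 3 also counts the unaligned 60%; this is consistent with the statement in the Methods that formula 2 can give values higher than the true relatedness.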

Species 1.8.1A contains, in addition to Nostoc sp. strains TCL240-02 and C057, Nostoc sp. strains ATCC 53789 (represented by 2 sequences, 1 complete), N6 and PCC 73102 (both represented by complete chromosomes), and strains 213, 232, LEGE 12447, LEGE 12450, UCD121, UCD122 and UIC 10630 (all with draft genomes). Nostoc sp. strain UCD120 also falls into this species on the basis of genome metrics, but lacks rrn operons and is not shown in the 16S rRNA phylogeny. The strains are all cyanobionts, except strain UIC 10630, which was isolated from soil. Also present are several cyanobionts named as Nostoc sp. (16S rRNA only). For Nostoc sp. strains 213, 232, N6, UCD120, UCD121 and UCD122, genome metrics conflict either with their con-specificity within species 1.8.1A or with their assignment to different genera when compared to the cyanobiont members of genus 1.8.2, as described above.

The remaining members of species 1.8.1A are truly con-specific, as shown by their ranges of 98.59-99.93% 16S rRNA sequence identity, 52.2-99.6% isDDH and 91.93-99.90% ANI. Their inter-specific relationship to other members of genus 1.8.1 is shown unambiguously by values of 97.38-98.79% 16S rRNA sequence identity, 35.8-37.8% isDDH, 86.36-87.13% ANI and 98.28-98.56% CGS. Strain N6 and Nostoc sp. 'Lobaria pulmonaria cyanobiont' strain 5183 (species 1.8.1S), both represented by complete chromosomes (Gagunashvili and Andrésson, 2018), are also placed into different species by all available measures (96.97% 16S rRNA sequence identity, 36.5% isDDH (GGDC formula 3), 87.12% ANI, 89.02% Mash similarity and 98.18% CGS).

N. edaphicum strain CCNP1411 (complete chromosome) and Nostoc sp. strain KVJ20 (draft) appear to be con-specific in species 1.8.1F, showing 99.46% 16S rRNA sequence identity, 59.4% isDDH and 93.59% ANI, but we have shown that a small segment of the genome of strain CCNP1411 is chimeric. None of the other Nostoc cyanobionts (e.g. strains 0708 from Azolla (described above in detail) and KVJ20 from Blasia) are closely related to the members of genus 1.8.1.

The complete chromosomes of N. sphaeroides strains CCNUC1 and Kützing En lie in species 1.8.1AJ. Their extracted 16S rRNA genes show microheterogeneity and give identity values of 98.86-99.66%. Con-specificity is confirmed by all genome metrics (ANI 98.60%, isDDH 94.1% (GGDC formula 3), Mash similarity 99.06%, CGS 99.92%).

The draft genomes of Nostoc sp. strains BAE and HG1 form species 1.8.1N. Their con-specificity (16S rRNA sequence identity 99.53%) is confirmed by an ANI value of 92.44% and isDDH 59.9%, the latter value being borderline. CGS values are not available, since only strain BAE is shown in the genomic tree.

The cyanobiont Nostoc sp. strain 2RC lies in species 1.8.2E with the N. linckia strain Z series (13 genomes, of which we show only 4, all cyanobionts) plus the metagenome Nostoc sp. Moss3 (moss epiphyte). Also present are some Nostoc spp., Desmonostoc spp. and Dolichospermum spp. (represented by their 16S rRNA genes) from varied habitats, only some of which are cyanobionts. Strain 2RC shares 96.48% ANI (74.7% isDDH) with strain z1 and 96.3% ANI (74.4% isDDH) with strain Moss 3, and 99.66% 16S rRNA sequence identity with both. The genome metrics therefore agree with their con-specificity. Nostoc sp. Moss 3 itself shows low similarity to another metagenome from a similar habitat, Nostoc sp. Moss 2 (found in species 1.8.1AD), with only 95.50% 16S rRNA sequence identity. However, values of 82.31% ANI and 96.27% CGS do not support this generic separation. A value of 28.9% isDDH with GGDC formula 2 should not be considered accurate in this range. The 13 N. linckia strain Z series genomes are virtually identical in size (8.9-9.2 Mbp, deduced by addition of the lengths of their contigs) and exhibit 99.91-99.99% ANI and 99.7-99.9% isDDH. The genomic sequence of strain NIES-25 (whose extracted 16S rRNA sequence lies in species 1.8.2G, only one of three being shown in the 16S rRNA tree) is incomplete, containing only 47 of our 79 marker genes, and has been excluded from the similarity calculations.
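
For draft genomes, sizes such as the 8.9-9.2 Mbp quoted for the strain Z series are obtained by adding the lengths of the contigs. A minimal Python sketch using only the standard library follows; the FASTA content is a toy example and the file name in the comment is hypothetical. The result is an approximation: assembly gaps, unassembled repeats and contaminating contigs are not corrected for.

```python
import io
from typing import Iterable, Tuple


def assembly_stats(fasta_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (number_of_contigs, total_length_bp) for FASTA-formatted lines."""
    n_contigs, total = 0, 0
    for line in fasta_lines:
        if line.startswith(">"):
            n_contigs += 1                       # a header starts a new contig
        else:
            total += len("".join(line.split()))  # drop newline and whitespace
    return n_contigs, total


toy = io.StringIO(">contig1\nACGTAC\nGT\n>contig2\nGGGCC\n")
n, size = assembly_stats(toy)
print(n, size, f"{size / 1e6:.6f} Mbp")

# For a real assembly (hypothetical file name):
# with open("genome.fna") as fh:
#     print(assembly_stats(fh))
```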

The genome of Nostoc sp. ATCC 43529 contains 2 16S rRNA genes, one falling into species 1.8.2H, the other into the bacterial outgroup. The genome shares 98.5% 16S rRNA sequence identity, 42.8% isDDH and 90.04% ANI with that of Nostoc sp. Hyl-09-2-6 in species 1.8.2I. The separation of these strains into two species is therefore confirmed by all methods. The two species (including sequences obtained by conventional PCR) show internal 16S rRNA sequence identities of 98.8-99.9% and are separated at 98.5%. The genome of strain PA-18-2419, whose 16S rRNA sequence falls into species 1.8.2H, is chimeric; it has been excluded from the tree based on genomic sequences and from the calculation of genome metrics.

The genome of Nostoc minutum NIES-26, like that of strain ATCC 43529, contains 2 16S rRNA genes, the first appearing in genus 1.8.6, the second as a single "genus" in family 1.13. Genomic sequences of close relatives of these strains are not available; therefore, isDDH cannot be used to decide which sequences are correct.

Seven genomes of T. variabilis are in the mono-specific genus 1.10.1; 6 (strains 9RC, ARAD, FSR, N2B, PNB and V5, from Azolla caroliniana, A. filiculoides, A. pinnata and Azolla sp.; Pratte and Thiel, 2021) are cyanobionts, and 1 (strain 0441) is from a thermal spring. Other genomes are: Anabaena sp. YBSO1 (terrestrial); Anabaena variabilis NIES-23 (no information on habitat; in NCBI as A. variabilis in the TITLE field of the flatfile, but as Trichormus in the ORGANISM field); Nostoc sp. PCC 7937 (= ATCC 29413, T. variabilis; = UTEX 1444, A. variabilis; freshwater; in NCBI as A. variabilis in the TITLE field of the flatfile and as Trichormus in the ORGANISM field, although RefSeq changed the TITLE to T. variabilis; isolated as A. flos-aquae and received by the PCC from C.P. Wolk as A. variabilis; sensitive to cyanophage N1, like PCC 7120, and thus closely related to it, but, unlike PCC 7120, forms hormogonia); Nostoc sp. PCC 7120 (freshwater); and the metagenomes Nostoc sp. Hyl-06-6-4 and Pleu-09-321, both of organisms epiphytic on moss. The genomes of Trichormus sp. strains 9RC, ARAD, FSR, N2B, PNB and V5 show 100% 16S rRNA sequence identity, 99.9% isDDH and 99.81-99.98% ANI with each other and are also identical to Nostoc sp. PCC 7937, Anabaena sp. YBSO1 and T. variabilis 0041; they share a slightly lower 16S rRNA sequence identity (98.92%) with both Nostoc sp. PCC 7120 and Anabaena variabilis NIES-23, and are therefore con-specific with them. This is confirmed by isDDH (63.2-100%) and ANI (92.15-99.99%) values.

Nostoc sp. NIES-2111 and Nostoc sp. NIES-3756 (both terrestrial isolates) are con-specific (species 1.11.1A; 99.9% 16S rRNA sequence identity, 85.4% isDDH and 98.20% ANI). Within species 1.12.1A, Nostoc sp. HK-01 (complete chromosome), N. cycadae WK-1 (draft) and a strain named as Anabaenopsis circularis NIES-21 (= IAM M-4, Nostoc PCC 6720; draft genome) are con-specific, having identical 16S rRNA sequences; they show 82.6-84.9% isDDH and 97.44-97.96% ANI. However, Nostoc sp. strain PCC 7107, represented by a complete chromosome, shows a high (99.60%) 16S rRNA sequence identity with the other 3 members, 62.1% isDDH with Nostoc sp. strain HK-01 (GGDC formula 3) and 47.0-47.1% isDDH with the remaining strains. The ANI values between strain PCC 7107 and the others are 91.13-91.43%. The genome metrics (Mash similarity 93.25%, ANI 91.3% and CGS 99.17%) between strains PCC 7107 and HK-01 suggest that these strains are borderline members of a single species. Nostoc sp. strain FACHB-190 is clearly a member of the same species (1.12.5C) as Aulosira sp. FACHB-615 (99.93% 16S rRNA sequence identity, 84.0% isDDH, 97.50% ANI and 99.91% CGS). Lachance (1981) demonstrated a wet-lab DDH value of 67% (ΔTm(e) 4 °C) between strains Nostoc sp. PCC 7107 and PCC 6720 (= A. circularis NIES-21), confirming their con-specificity.

Genus 1.12.3 ("Desikacharya") contains 4 species, of which three are represented by genomic sequences: D. piscinale strain CENA21 (complete chromosome), species 1.12.3A; "Nostoc spongiaeforme" FACHB-130 and "Nostoc" sp. MBR 210 (draft genomes), species 1.12.3C; and "Nostoc" sp. LEGE 06077 (draft), species 1.12.3D. The species are separated by a maximum value of 98.57% 16S rRNA sequence identity, the 2 members of species 1.12.3A being unified by a 16S rRNA sequence identity of 99.73%. The con-specificity of strains FACHB-130 and MBR 210 in species 1.12.3C is confirmed by values of 74.7% isDDH and 96.57% ANI. Strains LEGE 06077 and FACHB-130 are maintained as different species (31.9% isDDH, 85.45% ANI). However, strain CENA21 appears to be a member of species 1.12.3C, with isDDH values of 52.8% and 52.9% and ANI values of 92.88% and 93.08% with strains FACHB-130 and MBR 210, respectively. Strain FACHB-130 is not shown in the genome tree, and the CGS value of 98.92% between strains CENA21 and MBR 210 falls into the borderline region for species separation.

In conclusion, the 16S rRNA sequence similarities, isDDH and ANI values are in good agreement for defining species of this large clade of Nostocacean cyanobacteria. However, genome sequences of further Nostoc isolates are urgently required. Lachance (1981) first demonstrated by wet-lab DDH the wide dispersal of Nostoc spp. strains of the PCC, but several clusters were apparent. The first grouped strains PCC 6302, PCC 7121 and PCC 7422 with 52-56% DDH and high ΔTm(e) values of 9-16 °C, which suggests that these strains are members of different species of the same genus; their PCR-derived 16S rRNA sequences are found here in three species of genus 1.8.2. In the second, strains PCC 6720, PCC 7107 and PCC 7416 showed 69-93% DDH and low (0-4 °C) ΔTm(e) values and should be considered to be members of a single species; this is confirmed by the placement of their PCR-derived 16S rRNA sequences into species 1.12.1A of the tree. High (93-100%) DDH values with low (0-1 °C) ΔTm(e) values show the near-identity of strains of the third cluster (PCC 6411, PCC 6705, PCC 6719, PCC 7118, PCC 7119 and PCC 7120); their PCR-derived 16S rRNA sequences are found here within genus 1.10.1.
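
The wet-lab results of Lachance (1981) can be read against the recommendation of Wayne et al. (1987), which places strains in one species at approximately 70% or greater DNA-DNA relatedness and a ΔTm of 5 °C or less. A minimal Python sketch of that rule follows; treating both limits as hard cut-offs is a simplification, since the recommendation speaks of approximate values, and the DDH/ΔTm(e) pairings below combine range ends quoted in the text for illustration only.

```python
def wetlab_same_species(ddh_percent: float, delta_tm_c: float,
                        min_ddh: float = 70.0, max_delta_tm: float = 5.0) -> bool:
    """Species criterion of Wayne et al. (1987), applied as strict cut-offs."""
    return ddh_percent >= min_ddh and delta_tm_c <= max_delta_tm


# Range ends quoted above for the three PCC "Nostoc" clusters
print(wetlab_same_species(56, 9))   # first cluster: highest DDH, lowest ΔTm(e)
print(wetlab_same_species(69, 4))   # second cluster: lowest DDH
print(wetlab_same_species(93, 1))   # third cluster
```

The lowest value of the second cluster (69%) fails a strict 70% limit by 1%, although the strains are treated above as con-specific; values this close to the cut-off are best read together with other evidence, here the 16S rRNA placement.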

Of the few genome sequences of strains assigned to the genus Tolypothrix, that of Tolypothrix sp. strain PCC 9009 (in species 1.14.1E, draft genome) clusters with Hassallia byssoides strain VB512170, also represented by a draft genome sequence, in the same species (99.19% 16S rRNA sequence identity, but a low isDDH value of 40.9% and 87.14% ANI, based on only 31% coverage); note that the latter genome is contaminated by bacterial DNA and is far larger (13 Mbp) than expected (e.g. 8 Mbp for PCC 9009). Strain PCC 9009 also shows a low relationship with Tolypothrix sp. NIES-4075 in species 1.14.1A (isDDH 38.2%, ANI 88.07%, 98.64% 16S rRNA sequence identity), confirming their assignment to different species. Tolypothrix sp. strain PCC 7601 (in species 1.8.4A, draft genome) shows only 23.1% isDDH (not discriminatory in this range), 74.98% ANI (with 38% coverage), 94.42% CGS and 95.80% 16S rRNA sequence identity with strain PCC 9009, confirming them to be members of different genera. The members of species 1.8.4A, all represented by draft genome sequences, show 99.54-100% 16S rRNA sequence identity, 54.2-99.7% isDDH and 93.5-99.9% ANI and are therefore con-specific, despite carrying three different generic names. Strain PCC 7601 is separated at the specific level from Nostoc sp. 106C (species 1.8.4B, draft genome), Calothrix spp. NIES-2098 (species 1.8.4C, complete chromosome) and NIES-2100 (species 1.8.4D, draft genome) by low (96.8-98.8%) 16S rRNA sequence identity, low isDDH values of 27.5-28.1% (not discriminatory in this range), ANI values of only 80.88-81.16% and CGS values of 97.35-98.03%. The incomplete genome sequence of Nostoc carneum NIES-2107, containing only 75 of the 79 genes of our standard marker set, has been excluded from DDH comparison and from the genome tree.
Other strains, previously named as Calothrix, were studied in wet-lab DDH experiments by Lachance (1981) and assigned to the genus Tolypothrix; the strains then available formed several clusters, as we observe in the tree: strains PCC 7101, PCC 7504, PCC 7601 and PCC 7710 (species 1.8.4A of the tree) with strain PCC 7708 (in species 1.8.4B); strains PCC 6305 and PCC 6601 (species 1.8.3A).

The two strains of genus 1.9.2 represented by 16S rRNA genes extracted from their draft genomes, Anabaena sp. CA and 4-3, are almost identical, as seen from the available metrics (99.93% 16S rRNA sequence identity, an isDDH value of 94.4% and 99.31% ANI).

Of the genomes of strains named as Fischerella spp., Mastigocladus laminosus, Hapalosiphon spp., Westiella intricata and Westiellopsis prolifica (genus 1.15.1), one is available for species 1.15.1A, 5 for species 1.15.1G, 5 for species 1.15.1J and 1 for species 1.15.1K. Strains within a single species show isDDH values of 76.3-99.9%, whereas members of separate species have isDDH values of 41.0-46.9%. The metagenome Hapalosiphon sp. JJU2 is not shown in the rRNA phylogeny; the organism is con-specific with Fischerella sp. PCC 9339 (species 1.15.1G), as shown by values of 69.8% isDDH, 95.37% ANI and 99.34% CGS. Within species 1.15.1G, the sequences of 5 strains assigned to three genera (Fischerella, Mastigocladus and Westiellopsis) share 99.71-100% 16S rRNA sequence identity, and their con-specificity is confirmed by isDDH values of 81.0-85.3% and 96.74-98.22% ANI, illustrating the current state of nomenclatural confusion. Note that, of the 2 rrn operons recovered from the genomic sequence of Westiella intricata strain UH HT-29-1, one falls into species 1.15.1J together with a sequence obtained by 16S rDNA-specific PCR; the other (from rrnA) lies in family 1.13. The other members of species 1.15.1J, named as Fischerella and Hapalosiphon spp., show, together with strain UH HT-29-1, 16S rRNA sequence identities of 99.79-100%; the isDDH (76.3-77.2%) and ANI (96.59-100%) results are in perfect agreement with the 16S rRNA sequence identities. A similar situation occurs for genus 1.15.4 (where genome sequences are available for all 4 species of Fischerella): isDDH between all available genome sequences confirms the con-specificity or division of the strains at the specific level on the basis of 16S rRNA sequence identity.
Note that, of the sequences of species 1.15.4A, all those from White Creek, Yellowstone NP, USA, have been omitted from the genome tree; in the absence of a publication, it is not clear whether they represent isolates or are metagenomes derived from identical environmental samples. Unfortunately, no Mastigocladus spp. genomic sequences are available for genus 1.15.4. Lachance (1981), in wet-lab thermal hydroxyapatite elution DDH studies, previously demonstrated the dispersal of Fischerella spp. strains of the PCC into two specific clusters: strains PCC 7521, PCC 7522 and PCC 7523 with 87-91% DDH and 0-2 °C ΔTm(e) (here species 1.15.4A); and strains PCC 7115, PCC 7414, PCC 7520 and PCC 7603 with 98-100% DDH and no change of ΔTm(e) (species 1.15.4D). However, Fischerella muscicola strain PCC 73103 showed 99.0% DDH, with 0 °C ΔTm(e), with members of our species 1.15.4D, whereas this strain here falls into species 1.15.1A. The isDDH result (32.7%, not discriminatory in this range), 85.53% ANI and 98.24% CGS confirm the assignment of strains PCC 73103 and PCC 7414 to different species. We conclude that an error involving strain provision or mislabeling occurred. Comparison of wet-lab and isDDH values for other strains is not possible, since there are no other results in common.

Strains named as Calothrix spp. are widely dispersed throughout order 1, with the majority falling into family 1.15. We show only 63 members in the 16S rRNA tree. Genera 1.15.7, 1.15.8, 1.15.14 and 1.15.15 are each divided into several species on the basis of their 16S rRNA sequence identities. Unfortunately, genomic sequences are available only for 7 species of Calothrix and 1 of Rivularia, representing 4 genera plus 1 singleton genus containing only one strain.

The complete chromosome of Calothrix sp. NIES-4101, containing 4 rrn operons of which the 16S rRNA of only one is shown, forms the singleton genus in family 1.15. Species 1.15.7A contains 5 identical copies of the rrn operon of the complete chromosome of Calothrix sp. PCC 7716, of which the 16S rRNA sequences of only 2 are shown. Species 1.15.7B contains the single draft genomic sequence of Calothrix sp. PCC 7103. Species 1.15.7C, containing the chromosomal sequences of two strains (Calothrix sp. NIES-4071, complete, and NIES-4105), is one of two clusters for which intra-specific comparison is possible. Both genomes contain 5 copies of the rrn operon; only 2 16S rRNA sequences are shown for each, sharing 99.87% sequence identity within each strain and 99.87-100% between strains. Species 1.15.7D contains 2 draft genome sequences of C. desertica PCC 7102. Monospecific genus 1.15.13 is represented by the single complete chromosome of Calothrix sp. PCC 6303, which contains 4 rrn operons; we show only one 16S rRNA sequence. Species 1.15.14C contains the complete chromosome sequence of Calothrix sp. NIES-3974 which possesses 3 rrn operons, of which only one 16S rRNA sequence is shown. Species 1.15.15D is represented by the draft genome of C. parasitica strain NIES-267, containing 3 rrn operons, of which only one 16S rRNA sequence is shown, and also contains the complete chromosome of Rivularia sp. PCC 7116, of which only one 16S rRNA sequence of the 3 rrn operons is shown.

The complete chromosomes representative of 5 genera, Calothrix spp. NIES-4071, NIES-4101, PCC 6303 and Rivularia sp. PCC 7116, vary in size from 6.77 to 11.06 Mbp. Their extracted 16S rRNA genes show 90.77-92.79% sequence identity. The isDDH values of 13.1-14.5% and ANI values of 68.67-73.7% confirm the generic separation of these strains.

Within genus 1.15.7, strains of the 4 species have genomes of size 11.06-11.58 Mbp (measured as the sum of contig lengths for the draft genomes). They show 97.85-98.72% 16S rRNA sequence identity, 82.31-91.01% ANI and 97.60-99.05% CGS, confirming their specific separation, although the highest values for both ANI and CGS fall into the borderline region for species separation. The DDH values of 29.2-29.5% (formula 2) are not discriminatory in this range. Within species 1.15.7C, strains NIES-4071 and NIES-4105 are identical in genome size (11.06 Mbp); their chromosomes are identical (100% isDDH, ANI and CGS). The two genomes representing species 1.15.15D, whose metrics do not agree with the 16S rRNA identity value, are described above.

Specific separation of strains PCC 7102 and PCC 7103 was shown in the wet-lab DDH results of Lachance (1981), whose data demonstrate (for the isolates available at that time) the division of this “genus” into multiple clusters: strains PCC 7103 and PCC 7713 (species 1.15.7B), showing 80% DDH with a ΔTm(e) of 1°C; and strain PCC 7102 (species 1.15.7D) with 65% DDH and 5°C ΔTm(e) to the above; strains PCC 7709, PCC 7715 and PCC 7716 (85-86% DDH, ΔTm(e) 5°C, species 1.15.7A); strains PCC 7111, PCC 7116, PCC 7204, PCC 7426, PCC 7711, PCC 7810 and PCC 7815, with lower DDH values of 40-60% and higher (13-15°C) ΔTm(e) (Rivularia species 1.15.15A, 1.15.15D and 1.15.15F). Low DDH values of only 10-27% were observed for strains PCC 6303 (genus 1.15.13), PCC 7507 (species 1.5.1B) and PCC 7714 (species 1.15.9A) in crosses with representatives of the above clusters. Strains PCC 7102 and PCC 7103 showed 52% DDH (ΔTm(e) 5°C), close to the isDDH value.

Members of the genus Richelia (1.15.18), N2-fixing endosymbionts of marine diatoms, divide into three species: the first contains strains HH01 and HM01 (67.9% isDDH, 99.92% ANI, 99.26% 16S rRNA sequence identity; the latter was excluded from the genome tree because it contains only 63 of 79 marker genes) together with the metagenomic sequence Richelia UBA3481, which shows 99.7% isDDH with strain HH01 (unfortunately, 16S rRNA sequences are not present in this genome); the second contains the single strain RC01, related to the first by only 96.98% 16S rRNA sequence identity (DDH and ANI values were not compared for the incomplete genome sequence of this strain, and it was excluded from the genome tree); the third (a borderline species of the genus) comprises a strain named Calothrix rhizosoleniae SC01, related to strain HH01 by 20.6% isDDH, 75.78% ANI and 96.84% 16S rRNA sequence identity. A further cluster, containing metagenomic sequences DT 104, UBA3308, UBA3957 and UBA3958, is difficult to place due to the absence of 16S rRNA sequences in the genomes, but probably represents a distinct genus. A final strain, FACHB-800, isolated from a freshwater environment, lies in a different family (1.18) and is presumably misidentified.

The remaining heterocystous cyanobacteria (genera 1.15.19 to 1.19.13) are represented by sequenced genomes of only 11 strains and one metagenome. Within monospecific genus 1.19.1, the draft genomes of Scytonema spp. strains HK-05 (2 sequences) and NIES-4073 show 99.06% 16S rRNA sequence identity, 91.94% ANI (with 65% coverage), 57% isDDH and 98.91% CGS, and are therefore confirmed as members of a single species. The draft genomes of Brasilonema sennae CENA114 and B. octagenarum UFV-E1 are identical, with 100% isDDH, 99.96% ANI and 100% CGS, and their extracted 16S rRNA genes are identical. Their genome sizes are 7.78-7.82 Mbp. These strains, despite their different specific names, are therefore members of a single species (1.19.4A). They show 93.1-93.2% isDDH and 99.11-99.14% ANI (with about 93% coverage) with the draft genome of B. octagenarum strain UFV-OR1, which is therefore a member of the same species. The 16S rRNA gene extracted from the latter genome, however, shares only 97.78% identity with those of the complete chromosomes; this gene sequence shows many mispairings in helices 36-39, indicative of the presence of foreign DNA, and does not correspond to the gene recovered from a 16S rRNA-specific PCR reaction, which nevertheless falls into the same species. Strain CENA114 shows 40.7% isDDH and 98.52% 16S rRNA sequence identity with Scytonema tolypothrichoides VB-61278, showing that they are members of different species of a single genus (species 1.19.4A and 1.19.4C), despite their different appended generic names; the ANI value, calculated for the latter strain after the removal of an excessive number of ambiguous sites, is 88.73%, confirming their assignment to different species. The metagenome labeled as Scytonema sp. RU_4_4 (species 1.19.4B) shares 96.77% 16S rRNA sequence identity and 33.4% isDDH with S. tolypothrichoides VB-61278, suggesting that they are members of different species, although the latter value may not be discriminatory in this region. 
However, the genome of strain RU_4_4 (7.33 Mbp from the combined lengths of the contigs) is potentially chimeric and has been excluded from the genome tree; the genome of strain VB-61278 has a size (calculated as the combined lengths of the contigs) of 10.01 Mbp, of which 7.9% of the bases are ambiguous. JSpeciesWS is unable to calculate ANI values for this strain unless the ambiguous sites are removed; the ANI value between the genome of strain VB-61278 and the metagenome RU_4_4 is then 84.85%, confirming their specific separation. The remaining genomes represent strains identified, on the basis of 16S rRNA sequence identity, as members of different genera: Nostocales cyanobacterium HT-58-2 (complete chromosome, genus 1.19.5), Mastigocladopsis repens PCC 10914 (genus 1.19.3), S. hofmanni PCC 7110 (genus 1.19.6), S. millei VB511283 (genus 1.19.11) and Scytonema sp. strain UIC 10036 (a "loner" genus in family 1.19). A plasmid-borne 16S rRNA gene of strain HK-05 lies in genus 1.19.11. Note that three genomic sequences are available for "Scytonema millei" strain VB511283; one (JTJC00000000.1) is not of cyanobacterial origin or is chimeric; the single 16S rRNA sequence extracted from the second (JTJC00000000.2) lies with other Scytonema strains in genus 1.19.11 and the third (QVFW00000000) falls into Chroococcidiopsis species 3.1.14B, described below. The first sequence version of Tolypothrix campylonemoides VB511288 (JXCB00000000.1) contains no rrn operon and cannot be shown in the tree based on 16S rRNA sequences, but does group with the above strains in the genome tree; one of three 16S rRNA genes of the second version (JXCB00000000.2) unfortunately shows high similarity to Rhodopseudomonas spp.; the others contain uncharacteristic segments and are impossible to align.
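The VB-61278 assembly illustrates why ambiguous bases matter: ANI could only be calculated after removing them. One way to handle this is sketched below, under the assumption that any character other than A, C, G or T counts as ambiguous. It reports the ambiguous fraction and splits each contig at runs of ambiguous bases rather than deleting them in place, so that no artificial junctions are created. This is an illustration, not necessarily the procedure used for the values above.

```python
import re

AMBIGUOUS_RUN = re.compile(r"[^ACGT]+")  # any run of non-ACGT characters


def ambiguous_percent(seq: str) -> float:
    """Percentage of positions that are not A, C, G or T."""
    seq = seq.upper()
    if not seq:
        return 0.0
    n_ambiguous = sum(1 for base in seq if base not in "ACGT")
    return 100.0 * n_ambiguous / len(seq)


def split_at_ambiguous(seq: str, min_len: int = 1) -> list[str]:
    """Split a contig at ambiguous runs, keeping pieces of at least min_len."""
    pieces = AMBIGUOUS_RUN.split(seq.upper())
    return [piece for piece in pieces if len(piece) >= min_len]


contig = "ACGTNNACRGT"
print(round(ambiguous_percent(contig), 2))  # 27.27
print(split_at_ambiguous(contig))  # ['ACGT', 'AC', 'GT']
```

In practice a larger `min_len` would be chosen so that tiny fragments between ambiguous runs do not produce spurious alignments.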

Genomic sequences are available for Chamaesiphon minutus strain PCC 6605 (complete chromosome, species 2.1.1A) and C. polymorphus CCALA 037 (draft, species 2.1.1D). Their extracted 16S rRNA sequences share 98.58% identity and the entire genomes show 87.45% ANI (but with only ~54% coverage), 98.41% CGS and 36.8% isDDH, confirming their assignment to two species. Note that the genome metric values may be slightly inaccurate, since the genome of strain CCALA 037 is incomplete, containing only 77 of our 79 core marker genes.
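Several ANI values here are qualified by a coverage figure, such as the ~54% for the Chamaesiphon pair. The sketch below shows how an ANIb-style average and its coverage can be derived from per-fragment best hits. It assumes the query genome has already been cut into 1020-nt fragments and aligned (for example with BLAST) against the reference, and it uses the filtering thresholds commonly quoted for ANIb (over 30% identity across more than 70% of the fragment length). These constants and the input format are assumptions, not the exact JSpeciesWS implementation.

```python
from typing import Optional, Sequence, Tuple

FRAGMENT_LEN = 1020      # assumed ANIb fragment length (nt)
MIN_IDENTITY = 30.0      # assumed minimum identity of a counted hit (%)
MIN_ALIGNED_FRAC = 0.70  # assumed minimum aligned fraction of a fragment

# Best hit of one query fragment: (percent identity, alignment length), or None if no hit
Hit = Optional[Tuple[float, int]]


def ani_with_coverage(best_hits: Sequence[Hit]) -> Tuple[float, float]:
    """Return (ANI %, coverage %) from per-fragment best hits."""
    if not best_hits:
        raise ValueError("no query fragments")
    kept = []
    for hit in best_hits:
        if hit is None:
            continue  # fragment found no match in the reference genome
        identity, aln_len = hit
        if identity > MIN_IDENTITY and aln_len > MIN_ALIGNED_FRAC * FRAGMENT_LEN:
            kept.append(identity)
    if not kept:
        return float("nan"), 0.0  # ANI undefined when no fragment qualifies
    ani = sum(kept) / len(kept)
    coverage = 100.0 * len(kept) / len(best_hits)
    return ani, coverage


hits = [(92.0, 1020), (90.0, 900), None, (95.0, 500)]
print(ani_with_coverage(hits))  # (91.0, 50.0)
```

Because ANI is averaged only over the fragments that align, a low coverage means the ANI describes a smaller, more conserved part of the genome, which is why the text flags such values.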

Sequenced genomes exist for Moorena producens strains 3L (represented by two sequences, both draft), JHB (draft) and PAL-8-15-08-1 (complete), together with the draft sequence of M. bouillonii strain PNG5-198 (genus 3.1.3). Twelve metagenomic sequences have been excluded from this analysis. The two draft genome sequences of strain 3L show 99.90% isDDH and 99.99% ANI, their extracted 16S rRNA sequences being identical. They show 98.99% 16S rRNA sequence identity, 67.7% isDDH and 94.64-94.65% ANI with M. producens strain JHB, which is therefore a member of the same species. The lower values of 98.65-98.79% 16S rRNA sequence identity, 48.0-49.0% isDDH and 92.02-92.50% ANI with M. bouillonii PNG5-198 confirm their con-specificity. Despite their very different geographic origins (Leao et al., 2017), strains JHB and PAL-8-15-08-1 share 99.06% 16S rRNA sequence identity; however, they give only 47.6% isDDH, 99.01% CGS and 91.62% ANI (with 62% coverage). The genome metrics, unlike the high 16S rRNA sequence identity, suggest that these strains are borderline members of the same species. As determined with the RepSeek programme, they contain an exceptionally high number of repeat sequences, as shown in the figure below.

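[Figure: numbers of repeat sequences, determined with RepSeek, in the genomes of M. producens (black lozenges), Microcystis spp. (red lozenges) and other cyanobacteria]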

The two M. producens genomes contain 3.85 × 10⁵ and 4.44 × 10⁵ repeat sequences (black lozenge symbols), many more than any other cyanobacterial strain. The complete chromosome of their nearest relative in the genome tree, Allocoleopsis franciscana (Microcoleus sp.) strain PCC 7113, contains only 3.35 × 10³ repeat sequences. Only three of seven strains of Microcystis (red lozenge symbols) contain significantly more than 1 × 10⁵; these are described in more detail in the appropriate paragraph. In the case of M. producens, this high number of repeats might be expected to hinder accurate determination of genetic relationships. The M. producens genomes contain a complete hgl gene cluster and hetR, but not nif genes (Leao et al., 2017); their genomes, of 9.37-9.67 Mbp, are similar in size to those of many heterocystous cyanobacteria. Leao et al. suggested that these organisms may have evolved from a heterocystous organism, a situation analogous to that observed in Raphidiopsis brookii D9. However, the latter organism lacks nifH and hglD, and hetR is found in many cyanobacteria. If the hypothesis is true, one would expect members of the genus Moorena to cluster with the heterocystous cyanobacteria; this is not the case for any of the trees shown on this site.
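RepSeek itself is not reproduced here. As a much simpler stand-in that conveys the idea of a seed-based repeat count, the sketch below counts distinct k-mers (k = 25, mirroring the minimum seed length given for RepSeek in the Microcystis paragraph) that occur at least twice, in either orientation. Its counts are not comparable to RepSeek's paired-repeat numbers; it only illustrates how repeat-rich genomes can be flagged.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def count_repeated_kmers(seq: str, k: int = 25) -> int:
    """Number of distinct k-mers seen at least twice, counting both strands.

    Each k-mer is reduced to its canonical form (the lexicographically smaller
    of itself and its reverse complement), so direct and inverted copies of
    the same sequence are pooled. K-mers containing non-ACGT bases are skipped.
    """
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if not set(kmer) <= set("ACGT"):
            continue
        counts[min(kmer, reverse_complement(kmer))] += 1
    return sum(1 for n in counts.values() if n >= 2)


print(count_repeated_kmers("AAAACCCC", k=3))  # 2 (AAA and CCC)
print(count_repeated_kmers("AACGTT", k=3))    # 2 (inverted copies)
```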

Within genus 3.1.14, Chroococcidiopsis spp. CCNUC1, PCC 7203 (both having completed chromosomal sequences), FACHB-1243, PCC 8201 and SAG 39.79 (PCC 7433) fall into species 3.1.14A. They share 99.33-100% 16S rRNA sequence identity, 93.53-98.69% ANI, 58.3-90.1% isDDH and 99.49-99.92% CGS values. The two complete chromosomes share 98.69% ANI, 97.4% isDDH (GGDC formula 3), 99.93% CGS and 99.18% Mash similarity. The 16S rRNA gene extracted from a sixth Chroococcidiopsis sp. genome, PCC 7434, lies in a distinct species (3.1.14B), sharing only 98.12-98.39% 16S rRNA sequence identity with the members of species 3.1.14A; the genome metrics are 90.47-90.73% ANI and 43.7% isDDH. Species 3.1.14D contains the draft genomic sequence of "Scytonema millei" VB511283 (QVFW00000000); the extracted 16S rRNA gene of this strain shows 97.79-98.32% sequence identity to the member of species 3.1.14B; the genome metrics are 39.6% isDDH and 89.7% ANI. This is the third version of the genome of this organism to be submitted (by the same authors) to NCBI; the first two (JTJC00000000.1 and JTJC00000000.2), showing 56.5% isDDH, seem to be chimeric (containing cyanobacterial and bacterial segments), although a 16S rRNA gene (truncated to 1086 nt at the end of a contig) extracted from the second groups with other Scytonema sp. strains in genus 1.19.11. The latest version appears to be the genome of a Chroococcidiopsis sp., as is evident from the similarity values given above, not of a heterocystous organism; this is supported by our NCBI BLAST studies using the extracted marker genes, which always find Chroococcidiopsis sp. PCC 7203 at identities greater than 90%, by high ANI values with other members of genus 3.1.14, and by the placement of this genome by CheckM and pplacer as a close relative of strain PCC 7203. The genome shares 91.3% isDDH with the first version but only 21.7% with the second. 
A fourth species of genus 3.1.14 contains the isolate named as unidentified cyanobacterium strain TDX16; the high degree of contamination of the genome sequence makes DDH studies pointless, and it has been excluded from the genome tree. The Chroococcidiopsis spp. of genus 3.1.14 are distinct from strain Chroococcidiopsis sp. PCC 6712 (currently [Jung et al., 2021] incorrectly named as Hyella disjuncta, genus), with which they share < 90% 16S rRNA sequence identity. Further members named as Chroococcidiopsis sp. lie in genera 3.1.17 and 3.1.18; only one (strain CCMEE 029, species 3.1.18B) is represented by a genomic sequence. This organism is shown only in the ribosomal and complete genome trees.

The complete chromosome of Gloeocapsa sp. PCC 7428 and the draft genome of Chroogloeocystis siderophila strain 5.2 s.c.1 (species 3.1.16B), both isolated from hot springs, show 99.40% 16S rRNA sequence identity, 92.55% ANI, 50.9% isDDH and 99.29% CGS. The 16S rRNA identity and genome metrics therefore assign them to a single species. The genomes of Gloeocapsopsis sp. AAB1 (represented by two identical sequences, isolated from the Atacama desert) and Chroococcales cyanobacterium IPPAS B-1203 (hot spring) share 98.59% 16S rRNA sequence identity, 86.54% ANI, 98.39% CGS and 34.3% isDDH, thereby representing two distinct species (3.1.16A, D) of the same genus. Chroococcidiopsis sp. TS-821 (hot spring) shows 81.58-87.24% ANI, 97.51-97.55% CGS and 25.8-34.1% isDDH with the other species, thus forming a fourth species (3.1.16C). This phylogenetically defined genus therefore contains strains assigned to 4 different named genera, plus one unidentified strain. All methods of similarity measurement agree for all clusters.
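The agreement checks in this section can be expressed as simple cutoff comparisons. The sketch below uses the 16S rRNA (98.70%) and isDDH (70%) species cutoffs from the Methods and an assumed ANI cutoff of 95%, a widely used value that this section does not state. It ignores the borderline ranges discussed in the text, so it is a simplification rather than the procedure used here.

```python
SPECIES_CUTOFFS = {
    "16S": 98.70,   # % identity, lower end of the intra-species range (Methods)
    "isDDH": 70.0,  # %, conventional DDH species cutoff (Methods)
    "ANI": 95.0,    # %, assumed cutoff, not stated in this section
}


def species_verdicts(metrics: dict[str, float]) -> dict[str, bool]:
    """True where a metric meets its species cutoff; unknown metrics are ignored."""
    return {
        name: value >= SPECIES_CUTOFFS[name]
        for name, value in metrics.items()
        if name in SPECIES_CUTOFFS
    }


def metrics_agree(verdicts: dict[str, bool]) -> bool:
    """True when every available metric gives the same species verdict."""
    return len(set(verdicts.values())) <= 1


# Gloeocapsopsis sp. AAB1 vs Chroococcales cyanobacterium IPPAS B-1203 (values from the text)
pair = {"16S": 98.59, "isDDH": 34.3, "ANI": 86.54}
verdicts = species_verdicts(pair)
print(verdicts)  # {'16S': False, 'isDDH': False, 'ANI': False}
print(metrics_agree(verdicts))  # True
```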

Monospecific genus 3.1.25 contains identical 16S rRNA sequences extracted from the draft genomes Coleofasciculus sp. LEGE 07081 and LEGE 07092, which share 99.9% isDDH and 99.96% ANI. The genome of strain LEGE 07081 appears to be chimeric, since one of two copies of the 16S rRNA gene falls into the bacterial outgroup; it has been excluded from the genome tree. Strain LEGE 07092 shows only 22.2% isDDH and 73.59% ANI with strain Coleofasciculus chthonoplastes PCC 7420 (species 3.1.4A).

Four organisms represented by draft metagenomes of genus 4.2.1 (Roseofilum reptotaenium AO1-A, RR100-1 and the unidentified cyanobacteria UBA1583 and UBA2566) appear to be con-specific, with 16S rRNA sequence identities of 99.66-100%, isDDH values of 56.3%-57.1%, 93.58-99.53% ANI and 99.06-99.97% CGS. The UBA1583 and UBA2566 metagenomes show an isDDH value of 97.7%, 99.53% ANI and 99.97% CGS. Eleven other metagenomes assigned to this genus (not shown in the 16S rRNA-based tree, because they do not contain the gene) group with AO1-A, RR100-1, UBA1583 and UBA2566 with isDDH values of 56.8-78.7%, 93.51-100% ANI and 99.36-100% CGS. All members of this cluster, most named as Roseofilum sp., therefore represent a single species, despite their diverse geographic origins.

Four strains of Desertifilum sp. (FACHB-866, FACHB-868, FACHB-1129 and IPPAS B-1220) lie in cluster 4.3.1 and are identical in 16S rRNA sequence. They show 98.8-99.9% isDDH, 100% CGS and 99.94-99.99% ANI, confirming their con-specificity. For comparison, they share only 92.44% 16S rRNA sequence identity, 68.96% ANI, 88.50% CGS and 19.7% isDDH with their nearest relative, the metagenome Hormoscilla sp. GUM202, in genus 4.4.1. Generic separation is therefore confirmed by all metrics.

Family 4.4 contains a single genus and named species, Hormoscilla spongeliae, whose members are all cyanobionts with various marine sponges as hosts. The 16S rRNA genes extracted from the metagenomes GM7CHS1pb, GUM202, SP5CHS1 and SP12CHS1 share 98.99-100% sequence identity, assigning them to a single species (4.4.1C). Con-specificity is confirmed by ANI values of 93.74-99.94% and isDDH values of 55.2-98.6%. The four metagenomes divide into two clusters: (a) GM7CHS1pb with GUM202, with 99.94% ANI, isDDH 98.6%, and (b) SP5CHS1 and SP12CHS1 showing 92.88-93.22% ANI, 55.2-55.3% isDDH with the members of cluster (a). The two members of cluster (b) internally show 98.12-99.54% ANI, 83.2-83.8% isDDH. The additional GUM007 metagenome (species 4.4.1B) is chimeric and has been excluded from similarity estimation. The 16S rRNA sequence extracted from this genome shows 97.90-97.98% identity to the others, suggesting placement into a second species. Two other species of this genus are defined by 16S rRNA sequence identity in our tree, but are not represented by genomic sequences.

The 49 strains represented by genomic sequences and 5 metagenomes of Microcystis spp. present in the 16S rRNA tree (genus) exhibit a minimum 16S rRNA sequence identity of 98.85% and therefore appear to be members of a single species, which by consensus would be M. aeruginosa. The specific names given for many members of genus (bengalensis, flos-aquae, ichthyoblabe, panniformis, protocystis, pseudofilamentosa, ramosa, robusta, smithii, viridis, wesenbergii) may therefore be considered invalid. The gene extracted from the draft metagenome TA09 gives lower 16S rRNA identity values; this may reflect poor assembly, the genome consequently containing foreign DNA. Of the 44 genomes shown in the genome tree, only 10 are complete (strains FACHB-1757, FD4, MC19, NIES-88, NIES-102, NIES-298, NIES-843, NIES-2481, NIES-2549 and PCC 7806SL). Their high (99.60-100%) 16S rRNA sequence identity is mirrored by ANI values of 94.8-99.49%, 95.31-99.69% Mash similarity and 99.20-99.97% CGS, confirming the con-specificity of the strains. However, isDDH (GGDC formula 3) values greater than the intra-species limit of 62.10% are observed only for strains NIES-102, NIES-298, FACHB-1757 and PCC 7806SL, the remainder being placed into a second species.
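The single-species argument for Microcystis rests on the smallest pairwise 16S rRNA identity in the group exceeding the species cutoff. A minimal sketch of that check follows; the strain labels and identities are made up purely for illustration.

```python
from itertools import combinations

SPECIES_CUTOFF = 98.70  # % 16S rRNA identity (Methods)


def min_pairwise_identity(identity: dict[frozenset, float], strains: list[str]) -> float:
    """Smallest identity over all strain pairs; identity is keyed by frozenset pairs."""
    return min(identity[frozenset(pair)] for pair in combinations(strains, 2))


# Hypothetical example values, not data from this study
identity = {
    frozenset({"strain_A", "strain_B"}): 99.90,
    frozenset({"strain_A", "strain_C"}): 98.85,
    frozenset({"strain_B", "strain_C"}): 99.20,
}
lowest = min_pairwise_identity(identity, ["strain_A", "strain_B", "strain_C"])
print(lowest, lowest > SPECIES_CUTOFF)  # 98.85 True
```

Requiring every pair, not just each strain's nearest neighbour, to exceed the cutoff is the stricter (complete-linkage) reading of the criterion.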

We have analyzed core and accessory genomes for each strain, generated with the Spine software (Ozer et al., 2014), using only the 10 available complete sequences. The core genomes represent 56.39-68.63% of the total genome and show slightly higher mol% G+C (43.5-44.0 vs 42.09-42.92%). The addition of the draft genomic sequences would further slightly reduce the size of the core genome (see Humbert et al., 2013). With the core genomes, ANI values increased only slightly, to a minimum of 95.81%, while isDDH values increased dramatically to 96.6-100% and Mash similarities to 100%. All core genomes contain our 79 marker genes (therefore the CGS values were unchanged) plus two complete rrn operons. In contrast, all pairwise crosses of the accessory genomes gave ANI values of at most 94.32% and a maximum isDDH value of 37.8%. Unlike the core genomes, the accessory genomes did not contain rrn operons or our marker genes. We have shown the presence of an unusually high number of repeat sequences in the complete chromosomes of members of this genus, using the RepSeek software (Achaz et al., 2006) with a minimum seed value of 25. The complete chromosomes vary not only in size (4.29-5.87 Mbp) and number of genes (4606-6671) but also in their content of paired repeats (3.46 × 10⁴ to 1.54 × 10⁵), the highest being found in strain M. panniformis FACHB-1757 (CP011339); see the figure included in the description of Moorena. The majority (66-70%) of the repeat sequences fall into the core genome, and within a single strain the core and accessory genomes share 28-36% of these elements, showing that they occur at least twice within the genome. The core genomes themselves share 49-82% of their total repeats between strains, whereas the accessory genomes share only 18-25%. The combined results show the presence of a conserved core genome and a variable accessory genome in M. aeruginosa, and collectively unite all the strains into a single species. 
These results confirm and extend those of Humbert et al. (2013), who commented on a genome evolutionary strategy that combines a high degree of genome plasticity, characterized by a high number of repeated sequences, numerous rearrangements and an ability to include new adaptive genes by horizontal gene transfer. Our unpublished studies using wet-lab optical thermal renaturation DDH show the con-specificity of 5 strains of the PCC (PCC 7005, PCC 7806, PCC 7813, PCC 7820 and PCC 7941), having DDH values of 64-100%. Results with 3 of these strains can be compared with those of their incomplete genomes obtained with isDDH (the latter in parentheses): PCC 7806 x PCC 7941 64% (72.4%); PCC 7806 x PCC 7005 74% (72.1%); PCC 7941 x PCC 7005 76% (80.1%). The members of this toxin-producing genus require further detailed study. Several strains, apparently misidentified as Sphaerocavum brasiliense, also fall into genus but are not represented by genomic sequences; more details of these are given on the main page of this site.
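Spine defines the core genome from whole-genome alignments. The sketch below swaps in a simpler gene-family presence/absence view to illustrate the same partition: families present in every genome form the core, and everything else in a genome is its accessory part. The family identifiers are hypothetical.

```python
def core_and_accessory(families: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Split gene families into a shared core and per-genome accessory sets."""
    if not families:
        raise ValueError("need at least one genome")
    core = set.intersection(*families.values())
    accessory = {genome: fams - core for genome, fams in families.items()}
    return core, accessory


def core_percent(families: dict[str, set[str]], genome: str) -> float:
    """Share (%) of a genome's families that belong to the core."""
    core, _ = core_and_accessory(families)
    return 100.0 * len(core) / len(families[genome])


families = {
    "genome_1": {"f1", "f2", "f3"},
    "genome_2": {"f1", "f2", "f4", "f5"},
}
core, accessory = core_and_accessory(families)
print(sorted(core))                      # ['f1', 'f2']
print(core_percent(families, "genome_2"))  # 50.0
```

As the text notes for the drafts, adding genomes can only shrink an intersection-based core, since each new genome must also contain every core family.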

The two available genomic sequences of Gloeothece, Gloeothece citriformis PCC 7424 and Gloeothece verrucosa PCC 7822, both previously named as Cyanothece spp., form two genera (species and Their completed chromosomes share only 17.4% isDDH (GGDC formula 3), 78.22% ANI (with 50% coverage), 80.39% Mash similarity, 94.65% CGS and 95.2% identity of their extracted 16S rRNA genes. The genome metrics values suggest that they are either borderline members of different genera or assignable to two species of a single genus.

Genus contains two draft metagenomic sequences of pleurocapsalean taxa named as Hydrococcus sp. These are identical in 16S rRNA sequence, show 99.61% ANI, 99.2% isDDH and share (with their nearest relatives) only 93.54% 16S rRNA sequence identity (70.0-70.18% ANI, 19.4% isDDH) with the complete chromosome of Gloeothece verrucosa PCC 7822 (genus and 94.88% 16S rRNA sequence identity (76.39-76.64% ANI, 22.6% isDDH) with the singleton Pleurocapsa sp. PCC 7327.

Species contains six strains (Crocosphaera watsonii WH 0003, WH 0005, WH 0401, WH 0402, WH 8501 and WH 8502) that share 99.64-100% 16S rRNA sequence identity. These strains show 63.3-93.7% isDDH, 98.29-99.58% ANI and 99.78-100% CGS; all genome metrics therefore confirm their con-specificity. This is unlike the situation in the Crocosphaera members of species

The endosymbiont Candidatus Atelocyanobacterium thalassa ALOHA (UCYN-A1), submitted as a complete genome (CP001842), contains only 76 of 79 core marker genes and cannot be shown in the genome trees. The two complete rrn operons fall into genus with Candidatus Atelocyanobacterium thalassa SIO64986 (=UCYN-A2, JPSP00000000), also containing 2 complete rrn operons. The extracted 16S rRNA genes share 98.72% sequence identity, suggesting that they probably represent two species; the genomes show 26.1% isDDH and 84.1% ANI. The isDDH value (GGDC formula 2) is ambiguous in this region, but the ANI value confirms specific separation of the two organisms represented by these metagenomes. CheckM finds them to be only 73.9% and 74.2% complete, respectively. The complete genome of Cyanobacterium endosymbiont of Braarudosphaera bigelowii CPSB-1 (AP024987), containing 77 core marker genes, lies in species and is con-specific with the organism represented by the metagenome SIO64986 (99.93% 16S rRNA sequence identity, 94.2% isDDH, 99.27% ANI, 99.80% CGS). Genus contains 2 complete genome sequences of comparable strains: endosymbiont of Epithemia turgida EtSB (AP012549) and endosymbiont of Rhopalodia gibberula RgSB (AP018341), estimated by CheckM to be 84.86% and 89.52% complete, respectively. Both contain 79 core marker genes. These are members of different species of the genus, sharing 98.65% 16S rRNA sequence identity, 58.8% isDDH (GGDC formula 3), 87.48% ANI, 89.73% Mash similarity and 96.23% CGS. Generic separation of these two groups of symbionts is confirmed by low values of all metrics (95.26% 16S rRNA sequence identity, 13.1% isDDH, 70.04-70.20% ANI, 82.60-83.37% CGS).
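Following the verbal definitions in the Methods, the raw ratios behind GGDC formulas 2 and 3 can be sketched as below. This is a simplification: GGDC combines both search directions and converts the resulting distances into DDH estimates with a statistical model, so these ratios are not the isDDH percentages reported here. The example shows why formula 2 can remain high for a fragmentary genome: it ignores how much of the genome is covered by HSPs.

```python
from dataclasses import dataclass


@dataclass
class HSP:
    length: int      # aligned length of the high-scoring segment pair
    identities: int  # identical positions within the HSP


def formula2_ratio(hsps: list[HSP]) -> float:
    """Sum of identities in HSPs divided by total HSP length."""
    total = sum(h.length for h in hsps)
    if total == 0:
        raise ValueError("no HSPs")
    return sum(h.identities for h in hsps) / total


def formula3_ratio(hsps: list[HSP], genome_length: int) -> float:
    """Sum of identities in HSPs divided by total genome length."""
    if genome_length <= 0:
        raise ValueError("genome length must be positive")
    return sum(h.identities for h in hsps) / genome_length


hsps = [HSP(length=1000, identities=950), HSP(length=500, identities=400)]
print(formula2_ratio(hsps))  # 0.9
print(formula3_ratio(hsps, genome_length=3000))  # 0.45
```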

Within genus, the complete chromosomes of Rippkaea orientalis (formerly Cyanothece sp.) strains PCC 8801 and PCC 8802 are con-specific, showing 96.2% isDDH (GGDC formula 3) and 98.89% ANI, in good agreement with the 99.93% identity of their extracted 16S rRNA genes, Mash similarity of 99.17% and CGS of 99.91%.

The genus contains the draft genome sequences of 2 organisms, Aphanothece sacrum strains FPU1 and FPU3. These show 100% 16S rRNA sequence identity, 99.99% ANI and 100% isDDH, and are therefore identical.

Two Stanieria spp. strains (NIES-3757 and PCC 7437, genus with completed chromosomes share 98.72% 16S rRNA sequence identity, and fall into species and, respectively. Their genomes show 56.1% isDDH (GGDC formula 3), 89.64% ANI, 91.31% Mash similarity and 98.61% CGS, confirming their specific separation.

Three Prochloron didemni metagenomes (genus, from wide geographic origins, exhibit 99.70-100% 16S rRNA sequence identity, 96.90-99.19% ANI and isDDH values of 78.4-93.4%, confirming their membership of a single species. The genome P1 was omitted from the genome tree, since it is incomplete.

Geminocystis spp. strains NIES-3708, NIES-3709 and PCC 6308 (genus are assigned to three species on the basis of their 16S rRNA sequence identities of 97.64-98.04%. This is confirmed by low values of isDDH (25.8-27.4%), ANI (79.75-82.40%) and CGS (95.03-96.44%).

Separation of Cyanobacterium aponinum (represented by strain PCC 10605) and C. stanieri (strain PCC 7202) at the generic rather than the specific level is confirmed by 16S rRNA sequence identity (93.40%), ANI (72.85%), isDDH (13.9%, GGDC formula 3), Mash similarity (75.70%) and CGS (91.39%) with their completed chromosomal sequences; these strains fall into genera (species and (species in the tree. The four strains of species, PCC 10605 (complete chromosome), 0216 (draft), FACHB-4101 (draft) and IPPAS B-1201 (draft), are almost identical (99.80-100% 16S rRNA sequence identity, 80.4-86.3% isDDH, 98.15-98.43% ANI). The potentially chimeric nature of the genome of strain 0216 cannot be excluded, since both 16S rRNA and 23S rRNA genes of one of two operons fall into the bacterial outgroup. Within genus, C. stanieri strain PCC 7202 is separated at the specific level from Cyanobacterium sp. strains IPPAS B-1200 and HL69, sharing 97.91% 16S rRNA sequence identity, 83.01-83.11% ANI and isDDH values of 27.1-27.2%. The latter two strains show 99.46% 16S rRNA sequence identity, 95.81% ANI and an isDDH value of 66.2%. The complete chromosomes of the strains of the two genera show a marked difference in size (4.11 Mbp for C. aponinum PCC 10605, 3.16 Mbp for the two members of genus and mol% GC (35.0 and 37.8-38.7, respectively). The draft genome of C. stanieri strain LEGE 03274 is seen in species, sharing 98.86% 16S rRNA sequence identity with strain PCC 7202. However, genome metrics suggest that these strains be separated into 2 species (84.40% ANI, 97.83% CGS; the isDDH value of 30.6% is not discriminatory at this level).

Within genus (renamed to Picosynechococcus by Komárek et al., 2020), in which all strains are of marine origin, 2 species ( and C) defined by 16S rRNA sequence identity (sharing only 98.05% identity) are supported by low isDDH (20.8-30.7%) and ANI (77.76-77.95%) values of their genomes. In species, the six 16S rRNA sequences extracted from the respective completed chromosomes of unicellular strains (Picosynechococcus sp. strains PCC 7002, PCC 7003, PCC 7117, PCC 8807, PCC 11901 and PCC 73109) show a range of 99.53-100% identity; the isDDH values for these genomes are 77.6-93.7% (GGDC formula 3), with ANI estimates ranging from 96.10 to 99.91%, CGS from 99.14 to 99.80% and Mash similarities from 97.28 to 97.60%. These strains are therefore confirmed as con-specific by all estimates. The two strains of species, Picosynechococcus sp. NIES-970 and NKBG15041c, show 100% rRNA sequence identity, 98.51% ANI and 87.6% isDDH, and are therefore again con-specific.
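Mash similarity is quoted for several pairs of complete chromosomes. Mash estimates the Jaccard index j of two k-mer sets from bottom-s MinHash sketches and converts it to a distance D = -(1/k) ln(2j/(1+j)). The sketch below implements that estimate with Python's `hashlib` (real Mash uses MurmurHash and canonical k-mers, which this version omits), and reading "Mash similarity" as 100 × (1 - D) is our assumption.

```python
import hashlib
import math


def kmer_hashes(seq: str, k: int) -> set[int]:
    """64-bit hashes of all forward-strand k-mers of seq."""
    seq = seq.upper()
    return {
        int.from_bytes(hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest(), "big")
        for i in range(len(seq) - k + 1)
    }


def bottom_sketch(seq: str, k: int = 21, s: int = 1000) -> set[int]:
    """The s smallest k-mer hashes (a bottom-s MinHash sketch)."""
    return set(sorted(kmer_hashes(seq, k))[:s])


def mash_distance(seq_a: str, seq_b: str, k: int = 21, s: int = 1000) -> float:
    sketch_a, sketch_b = bottom_sketch(seq_a, k, s), bottom_sketch(seq_b, k, s)
    union_sketch = sorted(sketch_a | sketch_b)[:s]  # bottom-s sketch of the union
    if not union_sketch:
        raise ValueError("sequences shorter than k")
    shared = sum(1 for h in union_sketch if h in sketch_a and h in sketch_b)
    j = shared / len(union_sketch)  # estimated Jaccard index
    if j == 0:
        return 1.0  # nothing shared: treated as maximal distance here
    return -math.log(2 * j / (1 + j)) / k


def mash_similarity(seq_a: str, seq_b: str, k: int = 21, s: int = 1000) -> float:
    """Assumed definition: 100 * (1 - Mash distance)."""
    return 100.0 * (1.0 - mash_distance(seq_a, seq_b, k, s))
```

Because D is a logarithmic function of the shared k-mer fraction, even modest drops in Mash similarity (for example from 100% to about 97%) correspond to substantial numbers of unshared k-mers.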

Two strains of Spirulina, PCC 6313 and PCC 9445, carrying different specific epithets (major and subsalsa), clearly belong to separate genera, sharing only 92.4% 16S rRNA sequence identity, 69.26% ANI, 88.16% CGS and an isDDH value of 20.3%. Although isDDH values are not discriminatory in this range, the ANI and CGS values confirm the generic separation of these strains. They are shown in the 16S rRNA tree in genera and, respectively. A third genomic sequence, Spirulina major CCY15215, lies in genus This shows only 19.6% isDDH, 68.84% ANI and 88.41% CGS with strain PCC 9445, and 23.6% isDDH, 67.97% ANI and 87.14% CGS with strain PCC 6313, and is thereby confirmed as a third genus. Cyanobacteria bacterium SBLK, represented by a draft metagenome, shows 78.81% ANI with strain CCY15215, and is possibly a second species of the genus; the CGS value of 94.59% is on the borderline between species and genus discrimination and the isDDH value of 22.0% is not discriminatory. This metagenome lacks an rrn operon and is consequently excluded from the 16S rRNA tree. The metagenome Spirulina sp. SIO3F2 also lacks an rrn operon; this represents a fourth genus, showing maximum values of 23.6% isDDH, 69.54% ANI and 87.87% CGS with members of the other genera. Two incomplete metagenomic sequences, Spirulina spp. DLM2.Bin59 and S17.Bin059, have been excluded from all trees.

As described on the main page, members of the genus Arthrospira ( have been renamed to Limnospira (with L. fusiformis as the type species) since some uncultured organisms named as the type species of Arthrospira, A. jenneri, fall elsewhere in the tree (genus We do not accept this change, based only on environmental samples. Of the many cyanobacterial strains assigned to the genus, only 12 extracted 16S rRNA sequences, plus two extracted from the complete chromosomes of L. fusiformis strain SAG 85.79 and the draft genome of L. fusiformis strain KN, are shown in the 16S rRNA trees. A second 16S rRNA sequence extracted from the metagenome Limnospira sp. RM-2019 contains an unlikely number (2.98%) of mismatches in paired regions and is not shown. The strains of genus are all closely related, sharing 99.13-100% 16S rRNA sequence identity, suggesting that all members of Arthrospira can be assigned to a single species. This conclusion is supported by our genome trees based on 79 conserved marker genes, and by isDDH analysis of the available genomes. The genomes show 89.8-99.2% isDDH, despite the diverse geographical origins of the strains (Africa, China, Mexico, Switzerland). ANI values of 93.37-99.10%, Mash similarities of 95.58-99.52% and CGS values of 98.99-99.79% confirm that all Arthrospira strains (carrying 6 specific epithets) can be assigned to a single species, for which we suggest the name A. platensis. A similar conclusion was reached by Xu et al. (2016) using a more restricted dataset, and these authors showed "unprecedented extensive chromosomal rearrangements" among the genomes. The latter are evident in the figure below (reciprocal comparisons of strains PCC 8005 and YZ).

[Figure: reciprocal genome comparisons of Arthrospira strains PCC 8005 and YZ, showing chromosomal rearrangements]

This is also evidenced by our CGS analysis: the positions of the 79 conserved marker genes on the same genomes are shown in the following figure. Note that strain O9.13F has been excluded from DDH calculation because the genome is only 84.1% complete as estimated by CheckM and contains only 72 of our 79 conserved marker genes. A. maxima strain CS-328, 96.6% complete and containing only 73 of the 79 marker genes, has also been excluded.

[Figure: positions of the 79 conserved marker genes on the genomes of Arthrospira strains PCC 8005 and YZ]
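Several genomes in this section are left out of the genome tree because they lack some of the 79 conserved marker genes or have low CheckM completeness. A minimal sketch of such a filter follows. The text does not state a single rule, so the default thresholds are hypothetical, and the completeness check is skipped unless a value is supplied.

```python
from typing import NamedTuple, Optional

TOTAL_MARKERS = 79  # conserved marker genes used for the genome trees


class GenomeQC(NamedTuple):
    name: str
    markers_found: int   # how many of the 79 marker genes were recovered
    completeness: float  # CheckM completeness estimate (%)


def eligible_for_genome_tree(
    qc: GenomeQC,
    min_markers: int = TOTAL_MARKERS,          # hypothetical default: all markers required
    min_completeness: Optional[float] = None,  # hypothetical; None disables the check
) -> bool:
    if qc.markers_found < min_markers:
        return False
    if min_completeness is not None and qc.completeness < min_completeness:
        return False
    return True


for qc in (GenomeQC("O9.13F", 72, 84.1), GenomeQC("CS-328", 73, 96.6)):
    print(qc.name, eligible_for_genome_tree(qc))  # both print False
```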

The 2 available genome sequences in species, Lyngbya aestuarii BL J and Lyngbya sp. PCC 8106, appear to be con-specific, sharing 99.79% 16S rRNA sequence identity, 91.69% ANI and 46.8% isDDH.

Species is composed of 16S rRNA genes extracted from three Okeania sp. metagenomes, showing 99.60-100% sequence identity. A further 7 are shown in the genome tree. The 10 genomes show 76.1-99.8% isDDH (except SIO1I7, 47.1%) and 94.87-99.97% ANI (SIO1I7 90.88-90.98%), confirming their con-specificity. The SIO1I7 genome is unexpectedly larger (calculated as the sum of contig lengths) than the others (9.50 vs 8.40-8.88 Mbp), indicating a degree of duplication.

Three genomic sequences, identical in the sequence of their extracted 16S rRNA genes, lie in species, named as Planktothicoides sp. They share 91.5-91.6% isDDH and 98.85-99.98% ANI.

Baaleninema simplex strains FC II and PCC 7105 (species are identical on the basis of their 16S rRNA sequence identity (100%); their con-specificity is confirmed by values of 74.3% isDDH and 96.57% ANI of their draft genomes; the sequence of strain PCC 7105 contains only 75 of the 79 marker genes and has been excluded from the genome tree.

Within genus, the 23 metagenomes are identical, with 100% 16S rRNA sequence identity, 93.1-100% isDDH, and separated at the generic level from the sister taxon QS_8_64_29 (90.26% 16S rRNA sequence identity, 19.3% isDDH, 67.57% ANI). These genomes are not shown in our genomic tree.

Oscillatoria nigro-viridis PCC 7112 and Microcoleus vaginatus strains FGP-2, LEGE 07076 and PCC 9802 in species share a minimum of 98.93% 16S rRNA sequence identity, their genomes showing 49.4-96.8% isDDH and 91.53-99.97% ANI. The two M. vaginatus strains within this species show 96.8% isDDH and 99.97% ANI, their 16S rRNA sequences being identical. The genome metrics therefore confirm the con-specificity of these strains. M. vaginatus strain PCC 9802 shows only 91.36% 16S rRNA sequence identity with Allocoleopsis franciscana (Microcoleus sp.) PCC 7113 in genus 3.1.10; generic separation is confirmed by values of 24.4% isDDH (not discriminatory at this low level), 68.63% ANI and 87.89% CGS. Strain PCC 7113 (species 3.1.10A) shares 98.72-98.79% 16S rRNA sequence identity with strains FACHB-1 and FACHB-53 in species 3.1.10B; these are borderline values for separation at the specific level, which is confirmed by 41.8% isDDH and 89.19-89.25% ANI. That strains FACHB-1 and FACHB-53 are con-specific is shown by values of 99.93% 16S rRNA sequence identity, 85.0% isDDH and 97.72% ANI. Allocoleopsis franciscana PCC 7113 shows only 90.51% 16S rRNA sequence identity with Microcoleus sp. strain IPPAS B-353 in genus (described in more detail above with other halophiles); although isDDH (27.8%) may not be of great value within this range, generic separation is supported by values of 66.96% ANI and 85.67% CGS.

Eight metagenomic sequences, each named Oscillatoriales cyanobacterium, in species share 99.33-100% 16S rRNA sequence identity and 85.7-99.3% isDDH. Two further metagenomic sequences, in species, have identical 16S rRNA sequences and show 98.1% isDDH and 99.85% ANI; they are separated from the members of the preceding species by 98.84-99.30% 16S rRNA sequence identity, ~27.5% isDDH and ~85% ANI. Species also contains four draft genomes of Tychonema sp., only three of which are shown in the genome tree; these share 99.06-100% 16S rRNA sequence identity, 47.0-65.8% isDDH, 90.37-95.61% ANI and 98.36-99.32% CGS. The genome metrics therefore place these Tychonema strains on the borderline of species separation. They share 97.92-98.19% 16S rRNA sequence identity, 27.8-28.3% isDDH (ambiguous in this range), 82.37-82.59% ANI and 96.38-96.42% CGS with strain FEM_GT703, the single member of species, confirming separation at the species level.

Species contains two metagenomes named Microcoleaceae bacterium UBA9251 and UBA11344; these show 64.0% isDDH and 94.41% ANI, and the extracted 16S rRNA genes share 99.4% sequence identity. The data therefore confirm the con-specificity of the uncultured organisms. However, the 16S rRNA sequence extracted from the UBA11344 genome is short (1115 nt), reflecting the fragmented nature of this genome.

Within species, the draft genomes of Kamptonema spp. strains PCC 6407 and PCC 6506 show 96.1% isDDH and 99.95% ANI, their 16S rRNA sequences being identical. Strain PCC 6407 shows only 27.2% isDDH (80.34% ANI) and 97.91% 16S rRNA sequence identity with the unidentified Oscillatoriales strain USR001, a member of the second species of this genus. The latter has been excluded from the genome tree, since it contains only 31 of our 79 conserved markers; because the genome is so incomplete, its isDDH and ANI values can only be approximate.

Species of the 16S rRNA tree contains the genes extracted from three Pseudanabaena sp. genomes (one being a metagenome, not shown in the genome tree), sharing 99.33-99.80% sequence identity. The two remaining genomes fall onto the borderline of species separation, sharing 40.7% isDDH, 89.19% ANI and 98.96% CGS.

Two genomes from Synechococcus sp. strains PCC 7502 (complete chromosome) and PCC 9635 (draft genome) in the mono-specific genus, which falls among the "Pseudanabaena" genera, share 99.73% 16S rRNA sequence identity but only 24.1% isDDH and 79.14% ANI. However, the genome metrics may be artificially low, since the genome of strain PCC 9635 contains extensive duplications, detected by CheckM and reflected in its larger size (4.81 Mbp versus 3.51 Mbp for strain PCC 7502).

Four Limnothrix sp. sequences of species (isolates FACHB-1083 and PR1529 and the metagenomes CACIAM 69d and P13C2) share 99.79-100% 16S rRNA sequence identity, 54.2-99.9% isDDH, 93.83-99.96% ANI and 99.52-100% CGS; the genome metrics therefore confirm their con-specificity.
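Ranges such as "99.79-100%" or "54.2-99.9% isDDH" summarize all pairwise comparisons within a cluster as a minimum and a maximum. A minimal Python sketch of that summary follows, assuming the pairwise values are held in a dictionary keyed by strain pairs stored in either order; the function name, the strain labels and the values in the usage lines are hypothetical.

```python
from itertools import combinations


def pairwise_range(values: dict[tuple[str, str], float],
                   members: list[str]) -> tuple[float, float]:
    """Return (min, max) of a pairwise metric over every pair of cluster members.

    A pair may be stored in either order; a missing pair raises KeyError so that
    an incomplete matrix cannot silently narrow the reported range.
    """
    observed = []
    for a, b in combinations(members, 2):
        if (a, b) in values:
            observed.append(values[(a, b)])
        elif (b, a) in values:
            observed.append(values[(b, a)])
        else:
            raise KeyError(f"no value for pair {a} / {b}")
    if not observed:
        raise ValueError("need at least two members")
    return min(observed), max(observed)


# Hypothetical 16S rRNA identities (%) for a three-member cluster.
identities = {("A", "B"): 99.79, ("A", "C"): 100.0, ("C", "B"): 99.86}
low, high = pairwise_range(identities, ["A", "B", "C"])
print(f"{low:.2f}-{high:.2f} %")  # 99.79-100.00 %
```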

Although many genomic sequences of Leptolyngbya isolates, and of generic clusters thereof that have recently been renamed (e.g. Nodosilinea), are available, they fall as single members of separate genera and species; comparisons are therefore not possible, with the following few exceptions. The 2 strains in species 6.1.1B, the unidentified filamentous strains CCP1 and CCT1, share 99.19% 16S rRNA sequence identity but only 22.0% isDDH and 68.07% ANI. Strain CCP1 was excluded from the genome tree since its genome sequence is incomplete, carrying only 58 of the 79 marker genes; this may explain the low isDDH and ANI values. Species 6.1.1F contains two Nodosilinea sp. sequences that share 99.93% 16S rRNA sequence identity; the values of 58.7% isDDH and 95.50% ANI clearly support their con-specificity. The two sequences in species 6.1.1M, Leptolyngbya antarctica ULC041 and Leptolyngbya sp. ULC073, are both metagenomic assemblies. They are identical in 16S rRNA sequence and show 86.1% isDDH and 98.08% ANI, confirming their con-specificity. Halomicronema excentricum strain Lakshadweep and Lyngbya confervoides BDU141951 (species 6.1.5A) are similarly identical in 16S rRNA sequence and share 98.7% isDDH and 99.98% ANI. Unfortunately, the sequence of the former genome is chimeric and version 1 of the latter genome is 23.1% contaminated; both have been excluded from the genome tree. Version 2 of the Lyngbya confervoides BDU141951 genome is identical to version 1 in 16S rRNA sequence, has little contamination, and is included in the genome tree. The genome of this organism shares 82.04% ANI and 97.93% CGS with that of Halomicronema sp. CCY15110 (species 6.1.5E), confirming that it belongs to a different species. The value of 25.5% isDDH is not discriminatory.

In species 6.1.6A, two strains, Leptolyngbyaceae cyanobacterium CCMR0081 and CCMR0082, are identical in 16S rRNA sequence and show 87.8% isDDH and 98.27% ANI; they share 99.33% 16S rRNA sequence identity, 63.2% isDDH and 94.30-94.37% ANI with Leptolyngbya sp. PCC 7375 in the same species. The metagenomic Phormidesmis priestleyi Ana ITZX contains no rrn operons and cannot be included in the 16S rRNA tree. Species 6.1.6B and 6.1.6D contain the draft genomic sequences of two strains of Leptothoe (Leptothoe spongobia TAU-MAC 1115 and Leptothoe kymatousa TAU-MAC 1615). These share 97.36% 16S rRNA sequence identity, but this apparent specific, rather than generic, separation cannot be confirmed by genome metrics: the low (20.2%) isDDH value is non-discriminatory in this region, while the 74.23% ANI value places the strains in two different genera. The strains are not included in the genome tree because they contain only 66 and 58, respectively, of our 79 marker genes; a CGS value therefore cannot be calculated. These sponge-associated organisms were suggested (Konstantinou et al., 2021) to have undergone reduction of genome size; this conclusion is questionable, since it was based on incomplete draft genomes, and incompleteness, with the two genomes lacking different regions, most likely also explains the low genome metric values obtained.

The 16S rRNA gene extracted from the genomic sequence of Leptolyngbya sp. strain 2LT21S03, occupying a single branch within family 6.2, does not match the 16S rRNA sequence derived by PCR amplification (FM177494, family 6.3), sharing only 87.5% sequence identity with it; both are included in the 16S rRNA tree. The genome sequence of strain 2LT21S03 is too incomplete to be included in the genome tree. Strains Leptolyngbya sp. O-77 and Thermoleptolyngbya sp. PKUAC-SCTA183 (species 6.2.8A), both from hot or thermal springs, share 99.53% 16S rRNA sequence identity and are similar in genome size (5.48 and 5.52 Mbp) and G+C content (55.9 and 56.4 mol%); their complete chromosomes show 66.7% isDDH, 89.65% ANI, 92.75% Mash similarity and 99.15% CGS, confirming their con-specificity. Three strains of "Candidatus curcubocaldaceae cyanobacterium", Thermoleptolyngbya albertanoea ETS-08 and two strains identified only as "thermophilic cyanobacterium", not represented by genome sequences, also fall within species 6.2.8A; all members of this species were isolated from thermal environments. The position of Thermoleptolyngbya sp. strain PKUAC-SCTB121 in the tree is uncertain, since its two copies of the 16S rRNA gene fall in different genera: one in genus 6.2.8, the other in genus 6.3.33. Although the latter copy shows 98.92% sequence identity with the two identical 16S rRNA genes of Leptodesmis sichuanensis PKUAC-SCTA121, the genome metrics clearly define the strains as members of different genera (isDDH 13.1%, ANI 69.49%, Mash similarity 72.60%, CGS 89.22%). Strain PKUAC-SCTB121 is more likely a member of genus 6.2.8, sharing 97.15-97.63% 16S rRNA sequence identity, 68.1-99.8% isDDH (GGDC formula 3, complete genomes), 89.47-99.70% ANI, 92.76-99.82% Mash similarity and 99.16-99.93% CGS with strains O-77 and PKUAC-SCTA183.
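Mash (Ondov et al., 2016) estimates the Jaccard index j between the k-mer sets of two genomes from small MinHash sketches and converts it to a distance D = -(1/k) ln(2j/(1+j)). The Python sketch below illustrates that idea with a bottom-s sketch of canonical k-mers. It is not Mash itself: the BLAKE2b hash stands in for the MurmurHash3 used by Mash, the toy sequences are random, and reading the "Mash similarity" quoted on this page as 100 × (1 - D) is our assumption.

```python
import hashlib
import math
import random

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int) -> set[str]:
    """Canonical k-mers: the smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip k-mers containing N or other codes
            kmers.add(min(kmer, kmer.translate(_COMPLEMENT)[::-1]))
    return kmers


def hash64(kmer: str) -> int:
    """64-bit hash (BLAKE2b here; Mash itself uses MurmurHash3)."""
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def bottom_sketch(kmers: set[str], s: int = 1000) -> set[int]:
    """Bottom-s MinHash sketch: the s smallest hash values."""
    return set(sorted(hash64(km) for km in kmers)[:s])


def jaccard_estimate(sk_a: set[int], sk_b: set[int], s: int = 1000) -> float:
    """Share of the s smallest hashes of the union that occur in both sketches."""
    union_bottom = sorted(sk_a | sk_b)[:s]
    if not union_bottom:
        return 0.0
    shared = sum(1 for h in union_bottom if h in sk_a and h in sk_b)
    return shared / len(union_bottom)


def mash_distance(j: float, k: int) -> float:
    """D = -(1/k) ln(2j / (1 + j)); reported as 1.0 when no k-mers are shared."""
    if j <= 0.0:
        return 1.0
    return max(0.0, -math.log(2 * j / (1 + j)) / k)


# Toy data: genome_b keeps the first half of genome_a and replaces the rest.
random.seed(0)
genome_a = "".join(random.choice("ACGT") for _ in range(20_000))
genome_b = genome_a[:10_000] + "".join(random.choice("ACGT") for _ in range(10_000))
k = 21
j = jaccard_estimate(bottom_sketch(canonical_kmers(genome_a, k)),
                     bottom_sketch(canonical_kmers(genome_b, k)))
d = mash_distance(j, k)
print(f"Jaccard ~ {j:.3f}, Mash distance {d:.4f}, similarity {100 * (1 - d):.2f} %")
```

Because only the s smallest hashes are kept, the comparison cost does not depend on genome size, which is what makes the method practical for large genome sets.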

Within the mono-specific genus 6.3.1 (Leptolyngbya sensu stricto), the 4 strains of L. boryana (3 with completed genome sequences) are identical (100% 16S rRNA sequence identity) and show high values of 97.4-100.0% isDDH and 99.14-100% ANI. Although the sizes of the complete chromosomes are virtually identical (6.18-6.26 Mbp), the draft genomes are larger (6.65-7.26 Mbp). The five draft genomes show 99.93-100% 16S rRNA sequence identity, 83-100% isDDH and 97.25-100% ANI. Isolates within this cluster are unfortunately assigned to 6 different genera and 8 species. Strain NIES-2135 carries a 16S rRNA gene on a plasmid in addition to the chromosomal copy; the two are identical. See Section B2 (above) for strains whose placement is ambiguous in other genera named as Leptolyngbya.

The genus Phormidesmis is problematic, since strains bearing this name are found in genera of two families: 6.1.9, 6.1.10, 6.1.11, 6.3.18, 6.3.20 and 6.3.27, with 3 additional loner genera in family 6.3. In genus 6.3.18, the only cluster that contains strains represented by genomic sequences, two strains named Phormidesmis (BC1401 and ULC007, the latter represented by 3 identical sequences) are assigned to different species (6.3.18A and 6.3.18B, respectively, sharing only 97.31% 16S rRNA sequence identity) in the 16S rRNA tree. Values of 26.3% isDDH (not discriminatory in this range), 81.90% ANI and 97.55% CGS confirm specific separation. They show even lower values (93.5-94.1% 16S rRNA sequence identity, 20.0-21.9% isDDH, 68.34-68.48% ANI and 88.99-89.35% CGS) with the metagenomic sequence Alkalinema sp. CACIAM 70d, which therefore lies in a distinct genus (6.3.22), sharing 99.66% 16S rRNA sequence identity with the draft genomic sequence of strain FACHB-965. The genome sequence of Phormidesmis priestleyi strain Ana ITZX contains neither 16S rRNA nor 23S rRNA genes and cannot be shown in the 16S rRNA trees; it clearly also belongs to a different genus from strains BC1401 and ULC007, showing only 19.6-20.1% isDDH, 66.79-66.85% ANI and 84.04-84.40% CGS. The metagenome CACIAM 70d and strain Ana ITZX, showing 66.43% ANI and 82.95% CGS with each other, are themselves assigned to different genera. Two genome sequences of Phormidesmis priestleyi strain ULC027 are incomplete and lack both 16S rRNA and 23S rRNA genes; they are shown neither in the 16S rRNA trees nor in the genome tree.

Genus 6.3.25 contains the extracted 16S rRNA gene sequences of two strains, Leptolyngbya sp. ULC077 and Phormidium sp. CLA17, in two different species. These sequences share 96.77% identity, and the draft genomes share 28.0% isDDH, 83.23% ANI and 97.39% CGS. The genome metrics therefore confirm the specific separation of these strains.

The mono-specific genus 7.1.4 contains the metagenomic sequences Aphanocapsa feldmannii 277cI and 277cV. These are identical on the basis of 16S rRNA sequence identity and show 90.9% isDDH, 98.27% ANI. However, the genome sequence of 277cI is incomplete, containing only 70 of our 79 core marker genes; this is reflected by a large difference in apparent genome size (impossible to determine precisely with draft genomes), that of 277cV being 2.55 Mbp versus only 1.92 Mbp for 277cI. The latter sequence has been excluded from the genome tree. Two uncultured organisms, for which genomic sequences are not available, also fall into this genus on the basis of 16S rRNA sequence identity. The 16S rRNA genes extracted from five metagenomes of Candidatus Synechococcus spongiarum and one named as Synechococcus sp. (mono-specific genus 7.1.5) share a minimum of 99.12% identity. The genomes vary in size (estimated as the combined lengths of their contigs) from 1.45 Mbp to 2.27 Mbp, resembling many of the Prochlorococcus and the Synechococcus OMF sister group described above. Of the genomes common to both 16S rRNA and genomic trees, we can compare only 3, all Candidatus Synechococcus spongiarum metagenomes: 15L, bin9 and 142. The genome 15L is identical in 16S rRNA sequence to bin9; the two genomes show 91.0% isDDH and 99.24% ANI, confirming their con-specificity. Although 15L is suggested by 16S rRNA identity of 99.33% to be con-specific with 142, the ANI value of 86.35% indicates that the organisms represented by these metagenomes should be assigned to different species. The value of 32.2% obtained with isDDH formula 2 is ambiguous in this region. Although both metagenomes lack 2 of our 79 core marker genes, the missing genes are different: 15L lacks gmk and rimM, whereas 142 lacks purM and smpB. 
This indicates that the two genomes lack different segments, which will decrease the ANI value, in contrast to the Candidatus Atelocyanobacterium thalassa metagenomes of genus, which both lack the same pair of marker genes (purM and rbcS). Showing 94.60% 16S rRNA sequence identity, 19.3% isDDH and 69.70% ANI with 15L, A. feldmannii 277cV is confirmed as a member of a distinct genus. As described in the discussion of the Synechococcus OMF clade above, and shown in the figure, Salazar et al. (2020) suggested that Synechococcus spongiarum be renamed Synechospongium.
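The reasoning about missing marker genes reduces to simple set operations. In this Python sketch (the function and key names are ours, not part of any published pipeline), the genes missing from each genome are compared to find which markers are absent from both, which are absent from only one, and how many of the 79 core markers remain present in both genomes. The gene names are those quoted in the paragraph above.

```python
CORE_MARKERS = 79  # size of the core marker gene set used on this page


def compare_missing_markers(missing_a: set[str], missing_b: set[str]) -> dict:
    """Summarize which core marker genes are absent from two genomes."""
    absent_from_both = missing_a & missing_b
    absent_from_one = missing_a ^ missing_b  # symmetric difference
    return {
        "absent_from_both": absent_from_both,
        "absent_from_one": absent_from_one,
        "present_in_both": CORE_MARKERS - len(missing_a | missing_b),
    }


# Candidatus Synechococcus spongiarum 15L versus 142: different genes missing.
print(compare_missing_markers({"gmk", "rimM"}, {"purM", "smpB"}))
# Two Candidatus Atelocyanobacterium thalassa metagenomes: the same pair missing.
print(compare_missing_markers({"purM", "rbcS"}, {"purM", "rbcS"}))
```

The first comparison leaves 75 markers present in both genomes, the second 77: each pair lacks two markers, but only in the first case do the gaps fall in different places.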

In the mono-specific genus 7.2.1, Synechococcus elongatus strains PCC 6301 (2 sequences), PCC 6311, PCC 7942, PCC 7943 and UTEX 3055, together with Synechococcus sp. strain UTEX 2973 (all complete chromosome sequences) and FACHB-1061 (draft sequence), are identical, showing 100% 16S rRNA sequence identity, 96-100% isDDH, 98.31-99.99% ANI, 98.74-99.93% Mash similarity (for the complete chromosomes only) and 99.88-100% CGS. Strain UTEX 2973 is a mutant of strain UTEX 625, of which the axenic isolate is strain PCC 6301; the identity values for these strains are therefore not surprising. Strain UTEX 3055 was isolated from the same habitat as strain PCC 6301 (Waller's Creek, Texas, USA), but over 50 years later. We have been unable to confirm the origin of strain PCC 7942. Strain PCC 11802 (draft genome sequence) is not shown in the 16S rRNA tree since its genome lacks 16S rRNA genes; values of 25.5% isDDH, 82.20% ANI and 98.44% CGS place this strain and PCC 6301 in different species. The draft genome of strain PCC 11801 shows only 52.4% isDDH, 82.19% ANI and 98.47% CGS with strain PCC 6301; it contains only a short (618 nt) 16S rRNA fragment, which shares 99.35% identity with that of strain PCC 6301 over the comparable region. Since the strain is similar in genome size to the other members of this cluster (2.69 Mbp to 2.77 Mbp), the low genome metric values must indicate that it has both duplicated and missing regions. Strain FACHB-242 is UTEX 625, from which the axenic strain PCC 6301 was derived. Prochlorothrix hollandica strains CALU 1027 and PCC 9006, identical isolates held in different culture collections, form the single species of genus 7.3.1 and share 100% 16S rRNA sequence identity. Since the genome sequence of strain CALU 1027 is incomplete, a DDH value was not computed and the sequence was omitted from the genome tree.
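For strain PCC 11801, 16S rRNA identity could only be measured over the region covered by its short fragment. A minimal Python sketch of identity restricted to the overlap of two aligned sequences follows. It assumes the sequences are already aligned to equal length with "-" as the gap character, ignores terminal gaps outside the overlap, and counts internal one-sided gaps as differences; these conventions are our assumptions, since the page does not state how its identities were counted.

```python
def overlap_identity(aln_a: str, aln_b: str, gap: str = "-") -> tuple[float, int]:
    """Percent identity of two aligned sequences over their overlapping region.

    Columns outside the span where both sequences have residues (terminal gaps,
    e.g. beyond the ends of a short fragment) are ignored. Inside that span,
    columns gapped in both sequences are skipped and one-sided gaps count as
    differences. Returns (percent identity, number of columns compared).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("sequences must be aligned to the same length")
    aln_a, aln_b = aln_a.upper(), aln_b.upper()

    def residue_span(seq: str) -> tuple[int, int]:
        positions = [i for i, c in enumerate(seq) if c != gap]
        if not positions:
            raise ValueError("sequence contains only gaps")
        return positions[0], positions[-1]

    a_start, a_end = residue_span(aln_a)
    b_start, b_end = residue_span(aln_b)
    start, end = max(a_start, b_start), min(a_end, b_end)
    if start > end:
        raise ValueError("the sequences do not overlap")

    columns = [(x, y) for x, y in zip(aln_a[start:end + 1], aln_b[start:end + 1])
               if not (x == gap and y == gap)]
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns), len(columns)


full_length = "ACGTACGTAC"
fragment = "---TACCTA-"  # short fragment: 5 of 6 overlapping columns match
print(overlap_identity(full_length, fragment))
```

Restricting the denominator to the overlap avoids penalizing a fragment for the sequence it simply does not contain, at the cost of resting on fewer positions.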

Within species 8.2.3A, Acaryochloris sp. CCMEE 5410 (draft genome), Acaryochloris marina MBIC10699 and MBIC11017 (complete chromosomes) share 99.33-100% 16S rRNA sequence identity, 53.2-95.4% isDDH, 92.83-99.35% ANI and 99.43-99.8% CGS. Acaryochloris thomasi RCC1774 (draft), despite its generic name, has a unique pigment composition (lacking Chl d, a pigment characteristic of the genus Acaryochloris) and is clearly a member of a distinct genus, showing only 94.53-94.66% 16S rRNA sequence identity and around 20.8% isDDH, 68% ANI and 88.2% CGS with the 2 members of genus 8.2.3. A. marina strain S15, represented by a complete chromosomal sequence, is more problematic. It clusters, although loosely, with strains MBIC10699, MBIC11017 and CCMEE 5410 in the genome tree, and shows 48.0% isDDH (GGDC formula 3), 84.76% ANI and 98.19% CGS with strain MBIC11017, thus being a distinct species. However, their 16S rRNA sequence identity is 99.00%, at the upper limit of the intra-specific cutoff range. Note that three metagenomes (CRU 2 0, RU 4 1 and SU 5 25) are incomplete, containing respectively 74, 73 and 37 of our 79 core markers, and have been excluded from the genome tree and genome metric calculations.

Unlike the situation in Thermosynechococcus described above, a second cluster of thermophilic Synechococcus spp. shows a speciation pattern that is coherent across all methods of homology measurement. The separation of Synechococcus spp. JA-3-3Ab (species 10.1.1A) and JA-2-3B’a (species 10.1.1B) into two species is justified by the low values of 16S rRNA sequence identity (96.65%), isDDH (44.9%), ANI (84.61%), Mash similarity (89.44%) and CGS (96.19%) obtained with their complete chromosome sequences. Strain JA-3-3Ab shares 99.86-100% 16S rRNA sequence identity and isDDH values of 82.9-93.4% with the draft genomes of 6 other thermophilic members of the same species. Genus 10.1.2 contains only a single strain (S. bigranulatus Rupite) whose metrics can be compared in the ribosomal and genomic trees; it shares 94.39-94.80% 16S rRNA sequence identity, 20.8% isDDH and 76.74-77.12% ANI with the members of genus 10.1.1. All of these strains have a 17 nt deletion in helix 49 of the 16S rRNA sequence. The distance estimates from the different methods therefore agree for this cluster, as they do for the Thermoleptolyngbya spp. described above.

Thermomargarita ahousahtiae VI4D9 (species 10.2.1A) and Gloeomargarita lithophora D10 (species 10.2.1D), both represented by complete genomes, share only 98.25% 16S rRNA sequence identity. The values of 32.4% isDDH (GGDC formula 3), 81.66% ANI and 94.61% CGS confirm this specific separation. T. ahousahtiae VI4D9 is con-specific with Synechococcus sp. strain C9 (not shown in the genome tree), sharing 99.12% 16S rRNA sequence identity, 50.0% isDDH and 92.64% ANI.

At the base of the 16S rRNA and genome trees, the complete genomes of Gloeobacter violaceus strain PCC 7421 and Gloeobacter morelensis MG652769 (species 11.1.1A), described respectively by Nakamura et al. (2003) and Saw et al. (2021), share 99.93% 16S rRNA sequence identity, despite their different specific names. This con-specificity is confirmed by genome metrics: 67.1% isDDH (GGDC formula 3), 91.96% ANI (a borderline value) and 94.04% Mash similarity. The value of 98.31% CGS may, however, suggest an inter-specific separation. The complete genome of G. kilaueensis strain JS1 (species 11.1.1B) shares 98.65% 16S rRNA sequence identity, 17.0% isDDH (GGDC formula 3), 73.6% ANI, 78.69% Mash similarity and 90.67% CGS with strain PCC 7421. Strain JS1, fully described by Saw et al. (2013), was not axenic, and the authors reported genomic regions that were not recognized by BLAST or that derived from other bacteria or from viruses. In contrast, strain PCC 7421 is held in the PCC in an axenic state. The regions described by Saw et al. (2013) obviously make genome metrics inaccurate and difficult to calculate. A fourth genomic sequence attributed to Gloeobacter is the metagenome SpSt-379 (Zhou et al., 2020); this is smaller (4.03 Mbp versus 4.66-4.72 Mbp for the two isolates), has a much higher G+C content (65.3 versus 60.5-62.0 mol%), contains only 62 of our 79 core marker genes and lacks the rrn operon. It has been excluded from all trees and from DDH/ANI calculations. Aphanocapsa lilacina strain HA4352-LM1 (Ward et al., 2021), represented by a draft genomic sequence, shares 99.46% 16S rRNA sequence identity with G. violaceus strain PCC 7421, suggesting con-specificity. The genome metrics (48.9% isDDH, 91.98% ANI, 98.21% CGS) all fall into the borderline area of specific separation. This genome is not shown in the genome tree.
Anthocerotibacter panamensis C109 (Rahmatpour et al., 2021; complete chromosome, genus 11.2.2) and Candidatus Aurora vandensis MP9P1 (Grettenberger et al., 2020; draft, species 11.2.1A) behave as members of different genera with all estimates (96.62% 16S rRNA sequence identity, 19.6% isDDH, 69.6% ANI). However, the genome of organism MP9P1 contains only 76 of our 79 core marker genes, is smaller (3.07 Mbp) and has been excluded from the genome tree. The metagenome Candidatus Aurora vandensis LV9 (Grettenberger et al., 2020) is also incomplete (2.96 Mbp), but has been retained in the genome tree since it contains more than 77 marker genes; it is clearly a member of a different genus, sharing 13.1-20.7% isDDH, 65.16-69.46% ANI and 76.28-76.81% CGS with all the others. The metagenome Gloeobacterales cyanobacterium ES-bin-313 (Zeng, unpublished) forms an additional genus, sharing only 16.7-18.6% isDDH, 65.61-68.26% ANI and 76.28-86.66% CGS with the Gloeobacter strains, and 16.7% isDDH, 65.34% ANI and 76.28% CGS with the LV9 metagenome. Neither of these metagenomes contains a 16S rRNA gene, and they are not shown in the 16S rRNA tree.
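Several genomes on this page are excluded from the genome tree because they carry too few of the 79 core markers. The marker counts quoted here are consistent with a cutoff of 77 (76 or fewer excluded; 15L and 142, with 77 each, retained), although LV9 is described as retained because it has "more than 77". The default threshold in this Python sketch is therefore our inference, not a stated rule, and the sketch ignores the other exclusion reasons used on this page (chimeric assemblies, contamination). The class and function names are hypothetical.

```python
from typing import NamedTuple


class GenomeRecord(NamedTuple):
    name: str
    markers_found: int  # number of the 79 core marker genes detected


def split_by_marker_count(records: list[GenomeRecord],
                          min_markers: int = 77) -> tuple[list[str], list[str]]:
    """Split genomes into those kept for, and excluded from, a marker-gene tree.

    min_markers is an assumed cutoff, not a value stated on this page.
    """
    kept = [r.name for r in records if r.markers_found >= min_markers]
    excluded = [r.name for r in records if r.markers_found < min_markers]
    return kept, excluded


# Marker counts quoted on this page.
records = [
    GenomeRecord("Baaleninema simplex PCC 7105", 75),
    GenomeRecord("Aphanocapsa feldmannii 277cI", 70),
    GenomeRecord("Candidatus Aurora vandensis MP9P1", 76),
    GenomeRecord("Gloeobacter metagenome SpSt-379", 62),
    GenomeRecord("Candidatus Synechococcus spongiarum 15L", 77),
    GenomeRecord("Candidatus Synechococcus spongiarum 142", 77),
]
kept, excluded = split_by_marker_count(records)
print("kept:", kept)
print("excluded:", excluded)
```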

C: Conclusions

There is good agreement between 16S rRNA sequence identity and genome metrics for many of the strain clusters. Discrepancies mostly involve draft genomes and metagenomes, whose possible incompleteness or contamination cannot be excluded.

There are almost certainly other genomes in our dataset that we are unable to detect as incomplete, and some may contain undetected contaminant DNA sequences. Other possible causes of conflict include marked differences in genome size or mol% G+C content, and major or multiple deletions. All of these factors will reduce genome metric values. Where relatedness estimates based on 16S rRNA sequence identity and genome metrics diverge dramatically, one or both of the genomes compared is often in the draft stage.
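The Methods describe GGDC formula 2 as the sum of identities in HSPs divided by total HSP length, and formula 3 as the same sum divided by total genome length. The Python sketch below computes those two ratios in a deliberately simplified, single-direction form, to show why unaligned or missing regions lower the formula 3 ratio but leave the formula 2 ratio untouched. It does not reproduce the GGDC server, which combines both directions of the comparison and converts distances into DDH estimates with its own model; the HSP values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HSP:
    length: int      # aligned length of one high-scoring segment pair (bp)
    identities: int  # identical positions within that HSP


def formula2_ratio(hsps: list[HSP]) -> float:
    """Identities in HSPs / total HSP length (ignores unaligned sequence)."""
    return sum(h.identities for h in hsps) / sum(h.length for h in hsps)


def formula3_ratio(hsps: list[HSP], genome_length: int) -> float:
    """Identities in HSPs / total genome length (penalizes unaligned sequence)."""
    return sum(h.identities for h in hsps) / genome_length


# Hypothetical: 2 Mbp of a 5 Mbp genome aligns to its partner at 98 % identity.
hsps = [HSP(1_000_000, 980_000), HSP(1_000_000, 980_000)]
print(f"formula 2 ratio: {formula2_ratio(hsps):.3f}")             # 0.980
print(f"formula 3 ratio: {formula3_ratio(hsps, 5_000_000):.3f}")  # 0.392
```

Because the formula 2 ratio is computed only over aligned segments, a pair of genomes sharing little sequence can still yield a high value, which is the over-estimation near the cutoff noted in the Methods.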

Leptolyngbya sp. PCC 7376 is a special case worthy of further investigation; this may be the first clear example of HGT involving the acquisition, by a filamentous organism, of the 16S rRNA gene from a unicellular organism (see discussion of species, above). The alternative possibility, that an ancestor was unicellular, is equally plausible.

In addition, the con-specificity of Synechocystis sp. strains PCC 6714 and PCC 6803 (99.60% 16S rRNA sequence identity) is not confirmed by the genome metrics of their complete chromosomal sequences; although an HGT event involving the ancestors of both groups may have occurred, further studies are required to confirm this possibility. Also, the planktonic cyanobacteria of the genus Prochlorococcus and its OMF sister group, as well as the halophiles, show major conflicts between the genetic distances obtained from 16S rRNA sequence identity and from genome metrics. These conflicts are discussed in the relevant paragraphs on this page. The two isolates of Gloeobacter are also unusually divergent; we have no explanation for this except the possible contamination of the genome of strain JS1, mentioned above.

We should bear in mind the statement of Hayashi Sant’Anna et al. (2019) [where the term "dDDH" means isDDH]: "The species circumscription threshold of many genomic metrics such as ANI and dDDH was defined using the wet-lab DNA-DNA hybridization criterion as reference, which in turn was defined by observing phenotypic coherence of strains. Paradoxically, along the years studies exploring bacterial diversity showed that genomic cohesion is not always accompanied by phenotypic homogeneity. Therefore, it is clear that genomic metrics are not a panacea, and they could not be appropriate in some cases, particularly when circumscribing a genomospecies (species defined based on genome data), whose members have undergone ecological diversification and substantial genome modifications (e.g. reduction), forming subgroups that could be interpreted as distinct species".

D: References.

Achaz, G., Boyer, F., Rocha, E.P.C., Viari, A. and Coissac, E. (2006). Repseek, a tool to retrieve approximate repeats from large DNA sequences. Bioinformatics 23: 119-121.
Aguilera, A., Gómez, E.B., Kaštovský, J., Echenique, R.O. and Salerno, G.L. (2018). The polyphasic analysis of two native Raphidiopsis isolates supports the unification of the genera Raphidiopsis and Cylindrospermopsis (Nostocales, Cyanobacteria). Phycologia 57: 130–146.
Cheng, Y.-I., Chou, L., Chiu, Y.-F., Hsueh, H.-T., Kuo, C.-H. and Chu, H.A. (2020). Comparative Genomic Analysis of a Novel Strain of Taiwan Hot-Spring Cyanobacterium Thermosynechococcus sp. CL-1. Frontiers in Microbiology 11: article 82.
Chisholm, S.W., Frankel, S.L., Goericke, R., Olson, R.J., Palenik, B., Waterbury, J.B., West-Johnsrud, L. and Zettler, E.R. (1992). Prochlorococcus marinus nov. gen. nov. sp.: An oxyphototrophic marine prokaryote containing divinyl chlorophyll a and b. Arch Microbiol 157: 297–300.
Coutinho, F., Tschoeke, D.A., Thompson, F. and Thompson, C. (2016a). Comparative genomics of Synechococcus and proposal of the new genus Parasynechococcus. PeerJ 4: e1522.
Coutinho, F., Dutilh, B., Thompson, C. and Thompson, F. (2016b). Proposal of fifteen new species of Parasynechococcus based on genomic, physiological and ecological features. Arch. Microbiol. 198: 973–986.
Dreher, T.W., Davis, E.W. and Mueller, R.S. (2021a). Complete genomes derived by directly sequencing freshwater bloom populations emphasize the significance of the genus level ADA clade within the Nostocales. Harmful Algae 103: 102005.
Dreher, T.W., Davis, E.W., Mueller, R.S. and Otten, T.G. (2021b). Comparative genomics of the ADA clade within the Nostocales. Harmful Algae 104: 102037.
Driscoll, C.B., Meyer, K.A., Šulčius, S., Brown, N.M., Dick, G.J. and 8 other authors (2018). A closely-related clade of globally distributed bloom-forming cyanobacteria within the Nostocales. Harmful Algae 77: 93–107.
Gaget, V., Welker, M., Rippka, R. and Tandeau de Marsac, N. (2015). A polyphasic approach leading to the revision of the genus Planktothrix (Cyanobacteria) and its type species, P. agardhii, and proposal for integrating the emended valid Botanical taxa, as well as three new species, Planktothrix paucivesiculata sp. nov.ICNP, Planktothrix tepida sp. nov.ICNP, and Planktothrix serta sp. nov.ICNP, as genus and species names with nomenclatural standing under the ICNP. Systematic and Applied Microbiology 38: 141–158.
Gagunashvili, A.N. and Andrésson, O.S. (2018). Distinctive characters of Nostoc genomes in cyanolichens. BMC Genomics 19: Article number 434.
Giovannoni, S.J., Cameron Thrash, J. and Temperton, B. (2014). Implications of streamlining theory for microbial ecology. The ISME Journal 8: 1553–1565.
Grettenberger, C.L., Sumner, D.Y., Wall, K., Brown, C.T., Eisen, J.A. and 4 other authors (2020). A phylogenetically novel cyanobacterium most closely related to Gloeobacter. The ISME Journal 14: 2142-2152.
Hayashi Sant’Anna, F., Bach, E., Porto, R.Z., Guella, F., Hayashi Sant’Anna, E. and Passaglia, L.M.P. (2019). Genomic metrics made easy: what to do and where to go in the new era of bacterial taxonomy. Critical Reviews in Microbiology 45: 182–200.
Humbert, J.-F., Barbe, V., Latifi, A., Gugger, M., Calteau, A., and 9 other authors (2013). A Tribute to Disorder in the Genome of the Bloom-Forming Freshwater Cyanobacterium Microcystis aeruginosa. PLoS ONE 8: e70747.
Jain, C., Rodriguez, R.L.M., Phillippy, A.M., Konstantinidis, K.T. and Aluru, S. (2018). High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9: 5114.
Jung, P., D’Agostino, P.M., Brust, K., Büdel, B. and Lakatos, M. (2021). Final Destination? Pinpointing Hyella disjuncta sp. nov. PCC 6712 (Cyanobacteria) Based on Taxonomic Aspects, Multicellularity, Nitrogen Fixation and Biosynthetic Gene Clusters. Life 11: 916.
Komárek, J. and Anagnostidis, K. (1989). Modern approach to the classification system of cyanophytes. 4 - Nostocales. Arch Hydrobiol Suppl. 82: 247–345.
Komárek, J., Johansen, J.R., Šmarda, J. and Strunecký, O. (2020). Phylogeny and taxonomy of Synechococcus-like cyanobacteria. Fottea 20: 171–191.
Konstantinou, D., Popin, R.V., Fewer, D.P., Sivonen, K. and Gkelis, S. (2021). Genome Reduction and Secondary Metabolism of the Marine Sponge-Associated Cyanobacterium Leptothoe. Marine Drugs 19: 298.
Kopf, M., Klahn, S., Pade, N., Weingartner, C., Hagemann, M., Voss, B. and Hess, W.R. (2014a). Comparative genome analysis of the closely related Synechocystis strains PCC 6714 and PCC 6803. DNA Research 21: 255–266.
Kopf, M., Klahn, S., Voss, B., Stuber, K., Huettel, B., Reinhardt, R. and Hess, W.R. (2014b). Finished Genome Sequence of the Unicellular Cyanobacterium Synechocystis sp. Strain PCC 6714. Genome Announcements 2: e00757-14.
Lachance, M.A. (1981). Genetic relatedness of heterocystous Cyanobacteria by Deoxyribonucleic Acid-Deoxyribonucleic Acid reassociation. International Journal of Systematic Bacteriology 31: 139–147.
Laloui, W., Palinska, K.A., Rippka, R., Partensky, F., Tandeau de Marsac, N., Herdman, M. and Iteman, I. (2002). Genotyping of axenic and non-axenic isolates of the genus Prochlorococcus and the OMF-′Synechococcus′ clade by size, sequence analyses or RFLP of the Internal Transcribed Spacer of the ribosomal operon. Microbiology 148: 453–465.
Leao, T., Castelão, G., Korobeynikov, A., Monroe, E.A., Podell, S., Glukhov, E., Allen, E.E., Gerwick, W.H. and Gerwick, L. (2017). Comparative genomics uncovers the prolific and distinctive metabolic potential of the cyanobacterial genus Moorea. Proceedings of the National Academy of Sciences 114: 3198–3203.
Li, S.-J., Hua, Z.-S., Huang, L.-N., Li, J., Shi, S.-H. and 5 other authors (2014). Microbial communities evolve faster in extreme environments. Scientific Reports 4: 1-9.
Li, X., Huang, Y. and Whitman, W.B. (2015). The relationship of the whole genome sequence identity to DNA hybridization varies between genera of prokaryotes. Antonie van Leeuwenhoek 107: 241–249.
Mikhodyuk, O.S., Zavarzin, G.A. and Ivanovsky, R.N. (2008). Transport systems for carbonate in the extremely natronophilic cyanobacterium Euhalothece sp. Microbiology 77: 412–418.
Miscoe, L.H., Johansen, J.R., Kociolek, J.P., Pietrasiak, N., Sherwood, A.R. and Vaccarino, M.A. (2016). Diatom flora and cyanobacteria from caves on Kauai, Hawaii. Bibliotheca Phycologica 120: 3-152.
Nakamura, Y., Kaneko, T., Sato, S., Mimuro, M., Miyashita, H. and 14 other authors (2003). Complete genome structure of Gloeobacter violaceus PCC 7421, a cyanobacterium that lacks thylakoids. DNA Research 10: 137–145.
Nelson, J.M., Hauser, D.A., Gudiño, J.A., Guadalupe, Y.A., Meeks, J.C., Allen, N.S., Villarreal, J.C. and Li, F.-W. (2019). Complete Genomes of Symbiotic Cyanobacteria Clarify the Evolution of Vanadium-Nitrogenase. Genome Biol. Evol. 11: 1959–1964.
Ondov, B.D., Treangen, T.J., Melsted, P., Mallonee, A.B., Bergman, N.H., Koren, S. and Phillippy, A.M. (2016). Mash: fast genome and metagenome distance estimation using MinHash. Genome Biology 17: 132.
Oren, A. (2015). Cyanobacteria in hypersaline environments: biodiversity and physiological properties. Biodiversity Conservation 24: 781–798.
Österholm, J., Popin, R.V., Fewer, D.P. and Sivonen, K. (2020). Phylogenomic Analysis of Secondary Metabolism in the Toxic Cyanobacterial Genera Anabaena, Dolichospermum and Aphanizomenon. Toxins 12: 248.
Ozer, E.A., Allen, J.P. and Hauser A.R. (2014). Characterization of the core and accessory genomes of Pseudomonas aeruginosa using bioinformatic tools Spine and AGEnt. BMC Genomics 15: 737.
Palmer, M., Steenkamp, E.T., Blom, J., Hedlund, B.P. and Venter, S.N. (2020). All ANIs are not created equal: implications for prokaryotic species boundaries and integration of ANIs into polyphasic taxonomy. International Journal of Systematic and Evolutionary Microbiology 70: 2937-2948.
Parks, D.H., Imelfort, M., Skennerton, C.T., Hugenholtz, P. and Tyson, G.W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research 25: 1043–1055.
Pratte, B.S. and Thiel, T. (2021). Comparative genomic insights into culturable symbiotic cyanobacteria from the water fern Azolla. Microbial Genomics 7: 000595.
Rahmatpour, N., Hauser, D.A., Nelson, J.M., Chen, P.Y., Villarreal, J.C., Ho, M.-Y. and Li, F.-W. (2021). A novel thylakoid-less isolate fills a billion-year gap in the evolution of Cyanobacteria. Current Biology 31: 2857-2867.e4.
Rajaniemi, P., Hrouzek, P., Kaštovská, K., Willame, R., Rantala, A. and 3 other authors (2005a). Phylogenetic and morphological evaluation of the genera Anabaena, Aphanizomenon, Trichormus and Nostoc (Nostocales, Cyanobacteria). International Journal of Systematic and Evolutionary Microbiology 55: 11–26.
Rajaniemi, P., Komárek, J., Willame, R., Hrouzek, P., Kaštovská, K., Hoffmann, L. and Sivonen, K. (2005). Taxonomic consequences from the combined molecular and phenotype evaluation of selected Anabaena and Aphanizomenon strains. Algological Studies 117: 371–391.
Ran, L., Larsson, J., Vigil-Stenman, T., Nylander, J.A.A., Ininbergs, K. and 5 other authors (2010). Genome erosion in a nitrogen-fixing vertically transmitted endosymbiotic multicellular Cyanobacterium. PLoS ONE 5: e11486.
Richter, M. and Rosselló-Móra, R. (2009). Shifting the genomic gold standard for the prokaryotic species definition. Proceedings of the National Academy of Sciences 106: 19126–19131.
Rippka, R., Deruelles, J., Waterbury, J.B., Herdman, M. and Stanier, R.Y. (1979). Generic assignments, strain histories and properties of pure cultures of Cyanobacteria. Journal of General Microbiology 111: 1–61.
Salazar, V.W., Tschoeke, D.A., Swings, J., Cosenza, C.A., Mattoso, M., Thompson, C.C. and Thompson, F.L. (2020). A new genomic taxonomy system for the Synechococcus collective. Environmental Microbiology 22: 4557–4570.
Saw, J.H.W., Schatz, M., Brown, M.V., Kunkel, D.D., Foster, J.S. and 6 other authors (2013). Cultivation and Complete Genome Sequencing of Gloeobacter kilaueensis sp. nov., from a Lava Cave in Kilauea Caldera, Hawai′i. PLoS ONE 8: e76376.
Saw, J.H., Cardona, T. and Montejano, G. (2021). Complete genome sequencing of a novel Gloeobacter species from a waterfall cave in Mexico. Genome Biology and Evolution 13: evab264.
Stackebrandt, E. and Goebel, B.M. (1994). Taxonomic note: A place for DNA-DNA reassociation and 16S rRNA sequence analysis in the present species definition in bacteriology. International Journal of Systematic Bacteriology 44: 846–849.
Stucken, K., John, U., Cembella, A., Murillo, A.A., Soto-Liebe, K. and 5 other authors (2010). The smallest known genomes of multicellular and toxic Cyanobacteria: comparison, minimal gene sets for linked traits and the evolutionary implications. PLoS ONE 5: e9235.
Thompson, C.C., Silva, G.G.Z., Vieira, N.M., Edwards, R., Vicente, A.C.P. and Thompson, F.L. (2013). Genomic Taxonomy of the Genus Prochlorococcus. Microbial Ecology 66: 752–762.
Tschoeke, D., Salazar, V.W., Vidal, L., Campeão, M., Swings, J., Thompson, F. and Thompson, C. (2020). Unlocking the Genomic Taxonomy of the Prochlorococcus Collective. Microbial Ecology 80: 546–558.
Walter, J.M., Coutinho, F.H., Dutilh, B.E., Swings, J., Thompson, F.L. and Thompson, C.C. (2017). Ecogenomics and Taxonomy of Cyanobacteria Phylum. Frontiers in Microbiology 8: 2132.
Walter, J.M., Coutinho, F.H., Leomil, L., Hargreaves, P.I., Campeão, M.E. and 13 other authors (2020). Ecogenomics of the Marine Benthic Filamentous Cyanobacterium Adonisia. Microbial Ecology 80: 249–265.
Ward, R.D., Stajich, J.E., Johansen, J.R., Huntemann, M., Clum, A. and 16 other authors (2021). Metagenome Sequencing to Explore Phylogenomics of Terrestrial Cyanobacteria. Microbiology Resource Announcements 10: e00258-21.
Wayne, L.G., Brenner, D.J., Colwell, R.R., Grimont, P.A.D., Kandler, O., Krichevsky, M.I., Moore, L.H. and 6 other authors (1987). Report of the Ad Hoc Committee on Reconciliation of Approaches to Bacterial Systematics. International Journal of Systematic Bacteriology 37: 463–464.
Xu, T., Qin, S., Hu, Y., Song, Z., Ying, J., Li, P., Dong, W., Zhao, F., Yang, H. and Bao, Q. (2016). Whole genomic DNA sequencing and comparative genomic analysis of Arthrospira platensis: high genome plasticity and genetic diversity. DNA Research 23: 325–338.
Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F.O., Ludwig, W., Schleifer, K.-H., Whitman, W.B., Euzéby, J., Amann, R. and Rosselló-Móra, R. (2014). Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology 12: 635–645.
Zeng, Y. (2021). Metagenome-assembled genomes from the Lille Firn glacier at the Villum Research Station in northeast Greenland. Unpublished.
Zhou, Z., Liu, Y., Xu, W., Pan, J., Luo, Z.H. and Li, M. (2020). Genome- and Community-Level Interaction Insights into Carbon Utilization and Element Cycling Functions of Hydrothermarchaeota in Hydrothermal Sediment. mSystems 5: e00795-19.