In many cases, viruses manage to spread so readily because they’re so compact, allowing hundreds of thousands of viral particles to explode from a single sneeze. That compact size comes in part from their limited needs. Since viruses use parts of their host cells for much of what they need to do, even the more complicated viruses tend to only need a few dozen specialized genes to do things like evade the immune system or remain dormant in cells. In fact, complexity would seem to go against one of a virus’ evolutionary advantages: the ability to make lots of copies of itself very quickly.
So it was a bit of a surprise to find that there are giant viruses that carry far more genetic material than they seemingly need. All cells carry the machinery needed to make proteins so, at most, viruses typically carry just a few genes that direct the machinery to focus on the virus’ needs. But the giant viruses seemed to carry replacements for much of the basic machinery itself. Those viruses were attacking complicated cells, with a lot of internal structures and many complex biological processes going on in different locations. Maybe carrying all those seemingly superfluous parts was advantageous in that context.
Or possibly not. In a study released today, researchers describe a large collection of giant viruses that target bacteria. While smaller than some of the largest eukaryotic viruses, they’re not that much smaller. And given that they infect bacteria, the genomes of the newly described viruses may be a substantial fraction of the size of their host’s genome.
In the mix
The work relies on what has come to be called metagenomics, which essentially involves blowing up all the cells in an environmental sample and sequencing any DNA that comes out. This will provide DNA sequence data on all of the different microbes living in it, as well as the viruses living in them. Software can search through that data and find pieces that overlap, stitching together larger sections of the genome from the smaller fragments of sequence. But it’s difficult to put together an entire genome this way, as any repeated sequences or difficult-to-sequence segments will confuse the computer. So even if giant viruses are in these samples, a metagenomic analysis would typically identify smaller fragments of them and not link them together to reveal their full size.
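To make the stitching step concrete, here is a minimal sketch of overlap-based assembly. It is a toy illustration, not the software the researchers used: the function names, the greedy merge strategy, and the `min_overlap` threshold are all assumptions chosen for brevity.

```python
def overlap_length(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 if no overlap of at least `min_overlap` bases exists.
    """
    max_len = min(len(left), len(right))
    for size in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the two fragments that share the longest overlap.

    Hypothetical toy assembler: stops when no remaining pair overlaps by
    at least `min_overlap` bases and returns whatever fragments are left.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_size, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                size = overlap_length(left, right, min_overlap)
                if size > best_size:
                    best_size, best_i, best_j = size, i, j
        if best_size == 0:
            return contigs
        # Join the pair, keeping the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

With three short reads that overlap end to end, the sketch returns a single longer fragment. If the same stretch of bases appeared in several places, more than one merge would look equally good, which is the kind of ambiguity that leaves real assemblies in pieces.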
Inspired by some earlier indications that bacteria-attacking viruses (technically termed “phages”) can get very large, an equally large research team got ahold of a lot of environmental samples and went searching for giant viruses. Sources included “human fecal and oral samples, fecal samples from other animals, freshwater lakes and rivers, marine ecosystems, sediments, hot springs, soils, deep subsurface habitats, and the built environment.”
Once software had assembled the short sequences of the original survey into longer fragments, the researchers checked for gene similarities to identify whether the fragment came from bacteria, complex cells, archaea, or viruses. Any sequences that were 200,000 bases long or more were tested to see if they might actually be circular (a common feature of large viral genomes in bacteria), and a handful of the largest ones were selected for detailed manual examination. “Manual” here meaning that grad students would have to confirm the sequencing and look for ways to deal with any repeated DNA or difficult sequences.
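The length cutoff and circularity test can be sketched in a few lines. This is only an illustration under stated assumptions: it uses one common heuristic (a circular genome assembled as a linear fragment often ends with the same bases it starts with), and the function names and the `min_overlap` and `max_overlap` parameters are hypothetical rather than taken from the study. Only the 200,000-base threshold comes from the article.

```python
MIN_LENGTH = 200_000  # length cutoff quoted in the article


def terminal_overlap(contig: str, min_overlap: int = 20, max_overlap: int = 1000) -> int:
    """Longest k (min_overlap <= k <= max_overlap) where the first k bases
    of the contig equal its last k bases; 0 if there is no such k."""
    longest = min(max_overlap, len(contig) // 2)
    for size in range(longest, min_overlap - 1, -1):
        if contig[:size] == contig[-size:]:
            return size
    return 0


def candidate_circular(
    contigs: dict[str, str],
    min_length: int = MIN_LENGTH,
    min_overlap: int = 20,
) -> list[str]:
    """Names of contigs that are long enough and whose two ends match,
    hinting that the underlying genome may be circular."""
    return [
        name
        for name, seq in contigs.items()
        if len(seq) >= min_length and terminal_overlap(seq, min_overlap) > 0
    ]
```

A matching pair of ends is only a hint. Repeated DNA can produce the same signal, which is one reason the largest candidates still needed checking by hand.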
In all, the researchers put together 350 sequences that they identified as viral because they carry genes involved in building the viruses’ coat or exploding their host cells in order to spread further. Four other long sequences were difficult to assign to any category.
Families of giants
Some of the apparent viruses were absolutely huge, with four being over 600,000 bases long, and the largest coming in at 735,000. This is in the same range as some of the large viruses that attack amoebae. But whereas amoebae can have genomes that are hundreds of billions of bases long, these viruses seem to be infecting bacteria with genomes less than 5 million bases long. For context, there are bacteria with genomes that are only about one-fifth the size of these viruses.
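The size comparison is easy to check with the figures quoted above. In this short sketch, the variable names are just labels for illustration, and the 5-million-base host genome is the upper bound given in the text, so the resulting fraction is a minimum.

```python
largest_phage = 735_000        # bases, largest genome reported
host_genome_max = 5_000_000    # bases, upper bound for the bacterial hosts

# Share of the host genome that the largest phage genome would equal.
fraction_of_host = largest_phage / host_genome_max

# A bacterial genome one-fifth the size of the largest phage genome.
one_fifth_of_phage = largest_phage // 5

print(f"Phage genome vs. host genome: at least {fraction_of_host:.1%}")
print(f"One-fifth of the phage genome: {one_fifth_of_phage:,} bases")
```

By these numbers, the largest phage genome is at least about 15 percent the size of its host's, and a bacterium one-fifth its size would have a genome of roughly 147,000 bases.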
One of the viruses had a gene that was over 2,300 bases long—1.5 times the size of the entire genome of some small viruses.
With the assembly complete, the researchers started comparing sequences to figure out what these viruses were related to. In many cases, the answer turned out to be “each other.” The largest viruses were all part of a family that the researchers termed “Mahaphages” (Maha being the Sanskrit word for huge). Significantly, there were no small viruses that grouped among the giants, indicating that these huge genomes are probably stable features of this family rather than being the result of a smaller virus that happened to gain a lot of extra DNA recently.
Many of these viral families have genes for the transfer RNAs used in making proteins, which are normally supplied by the cell. Other genes include those needed for the metabolism of nucleic acids, allowing them to make some of the DNA and RNA they’re dependent on. Normally, both of these classes of genes are provided by the host, although similar things are found in the giant viruses that infect amoebae. The authors note that this sort of gene content is similar to a group of tiny bacteria with small genomes that are thought to be symbiotic or parasitic. Whether this is simply a consequence of lifestyle or represents something more significant is left to future studies.
Many of the viruses also carry components of the CRISPR/Cas system that we’ve started using for genome editing. Bacteria typically use this system to protect themselves from viruses, which makes it odd to find viruses carrying their own version. Some of these systems seem to target genes that bacteria use to control gene activity, so the virus’ version may simply involve redirecting these control systems to focus on virus production. In other cases, they target different viruses, suggesting that they’re a way of limiting competitors.
Other families of viruses seem to carry proteins that shut the bacterial CRISPR system down, which is more in line with what you’d expect—a means of protecting the virus from the host’s defenses.
Perhaps the strangest things found in these viruses are genes that encode relatives of a protein called tubulin, which helps a cell organize its internal contents. Bacteria are rather notable for having a poorly defined internal organization, so seeing a virus leveraging something we don’t understand especially well is rather striking. Still, it’s easy to see how this protein could help get all the pieces needed for assembling a virus to the right place.
But there’s clearly a lot we don’t understand about these viruses more generally, including the specific cells they infect: we know the environment they came from and the genera of bacteria they’re generally found with, but not a whole lot more than that. Figuring out more and studying their dynamics in culture may help us understand how the viruses can sometimes outcompete their smaller and faster-moving relatives. In the process, they might teach us some lessons about the bacteria they’re infecting.
Nature, 2020. DOI: 10.1038/s41586-020-2007-4.