Researchers Store Digital Data in Synthetic DNA
Microsoft researchers have been exploring the role of biotechnology in IT via an end-to-end system that stores digital data in DNA.
Dr. Karin Strauss, a Senior Researcher at Microsoft Research in Redmond, explains how the unique properties of DNA could eventually enable us to store really big data in really small places for a really long time.
DNA, or deoxyribonucleic acid, is a long molecule. It is essentially a chain of units called bases, denoted A, T, C and G. These are monomers that, linked together into a chain, make up DNA. Each side of the double helix is made of complementary bases: A pairs with T on the other side, and C pairs with G. Dr. Strauss says that from an information and storage perspective we only need to look at one of the sides; the other is redundant because there is a direct correspondence between A and T, and between C and G.
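The redundancy of the second strand can be illustrated with a short Python sketch. It is only an illustration of the pairing rule described above, not code from the Microsoft project; the function name complement_strand is made up for this example, and the sketch ignores strand direction, which matters in real molecular biology.

```python
# Pairing rule from the text: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement_strand(strand: str) -> str:
    """Return the base-by-base complement of one side of the helix."""
    return "".join(COMPLEMENT[base] for base in strand)


one_side = "ATCGGA"
other_side = complement_strand(one_side)

# Knowing one side is enough: complementing twice gives the original back.
assert complement_strand(other_side) == one_side
```

Because each side fully determines the other, only one of them needs to be treated as carrying the stored information.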
The idea of storing data in DNA actually dates back to the 1960s, shortly after the structure of DNA became better understood. The question was whether DNA would be able to carry any kind of information besides the information about life. In principle it could, but at that time there was no technology available to fabricate or read DNA, at least not at reasonable speeds.
According to Dr. Strauss, DNA has some exciting properties.
The first one is density. Instead of storing bits in devices that we have to manufacture, we store the data in the molecule itself, and a molecule can be a lot smaller than the devices we make.
"Just to give you an example, you could store the information, today stored in a datacenter, one exabyte of data, into a cubic inch of DNA. So that’s quite tiny. Durability is the next interesting property of DNA. And so, DNA, if preserved under the right conditions, can keep for a very long time, which is not necessarily possible with media that’s commercial today. DNA, if encapsulated in the right conditions, has been shown to survive thousands of years. And so, it’s very interesting from a data preservation perspective as well. And then, one other property is that, now that we know how to read DNA, we’ll always have the technology to read it," she said.
Going back to the structure of DNA, it is a chain of the different bases A, T, C and G, so a strand can be thought of as a sequence of bases. Digital data, in turn, is essentially a sequence of bits. The science behind DNA storage therefore starts with translating those bits into bases. A very simple way to think about it is that A corresponds to 00, C to 01, G to 10 and T to 11. "And so, if we have a sequence of bits, we’ll take every two bits and translate it into a base. We use a lot more sophisticated methods, but that’s the first step," Dr. Strauss added.
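That first step can be sketched in a few lines of Python. The sketch uses only the simple two-bits-per-base mapping quoted above; the function names bytes_to_bases and bases_to_bytes are illustrative, and none of the more sophisticated methods the team actually uses are modeled here.

```python
# Simple mapping from the text: 00 -> A, 01 -> C, 10 -> G, 11 -> T.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_bases(data: bytes) -> str:
    """Translate digital data into a sequence of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def bases_to_bytes(strand: str) -> bytes:
    """Translate a sequence of bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


message = b"Hi"
strand = bytes_to_bases(message)

# Each byte becomes four bases, and decoding recovers the original data.
assert len(strand) == 4 * len(message)
assert bases_to_bytes(strand) == message
```

With this mapping every byte turns into exactly four bases, so the length of the strand grows in direct proportion to the amount of data.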
Once the binary code has been translated into DNA code, the DNA has to be manufactured, in a process where multiple chemicals are flowed and the DNA gradually grows.
We know which sequences need to be grown and those sequences are grown from a surface. "Once we grow the DNA, we’ll remove it from where it was grown, and we’ll encapsulate it," Dr. Strauss said.
The DNA can be encapsulated in glass, which is actually silicon dioxide, using a dedicated type of chemistry. Researchers have developed nanoparticles that the DNA gets attached to, and then a layer of glass is grown around it. This keeps the DNA away from water and UV light, which degrade it, and protects it from degrading when the temperature rises.
But how do we access the data stored in DNA?
Dr. Strauss said that the data is stored in a certain location, organized spatially, so there is a way to retrieve the encapsulated molecules. The glass that was added for stability has to be removed and the DNA extracted. That is the first part of random access, which is a hierarchical process.
"First, you physically find the smaller set of DNA molecules you are interested in, but then, within that, there are many molecules that may belong to different movies that you’ve stored, and you’re just interested in one movie, you don’t want to read the whole collection. Right? And reading the whole collection would actually be wasteful. And so, we would like the ability to further select that particular movie you want to watch. And it turns out that there’s a process to do that. We do it chemically. And so, it’s just another reaction that comes from nature, actually, and is repurposed for this goal, for this purpose," according to Dr. Strauss. The chemical process is borrowed from the biotech industry. It’s a pretty standard process called Polymerase Chain Reaction and it’s the process that copies DNA.
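The selection step can be pictured with a toy Python sketch. It rests on an assumption that is not spelled out in the article: that every strand belonging to one file starts with the same short tag sequence, which a matching primer can target. Real Polymerase Chain Reaction copies the matching strands many times rather than filtering a list, so the code below only mimics the outcome; the tag values and the function name select_by_primer are hypothetical.

```python
# Hypothetical tags: one short sequence per stored file (assumption).
MOVIE_A_TAG = "ACGTAC"
MOVIE_B_TAG = "TGCATG"

# A mixed pool of strands, each made of a file tag followed by its payload.
pool = [
    MOVIE_A_TAG + "CAAC",
    MOVIE_B_TAG + "GGTT",
    MOVIE_A_TAG + "TTAA",
]


def select_by_primer(strands: list[str], primer: str) -> list[str]:
    """Keep only strands that start with the primer and strip the tag off."""
    return [s[len(primer):] for s in strands if s.startswith(primer)]


# Only the payloads of the first movie are read; the rest of the pool is skipped.
movie_a_payloads = select_by_primer(pool, MOVIE_A_TAG)
assert movie_a_payloads == ["CAAC", "TTAA"]
```

The point of the sketch is that one file can be picked out without reading the whole collection, which is what makes the access random rather than sequential.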
In her research, Dr. Strauss made headlines when she managed to store 200 megabytes on strands of DNA. The challenges to move further include raising throughput and lowering costs.
"DNA manufacturing today is still quite costly. But for both of these challenges, they sort of go hand in hand: if you get the speed up, you also get the cost down. We see no fundamental, physical reason why you couldn’t really scale it to the level of being acceptable or being suitable for DNA data storage," Dr. Strauss said.
Algorithms also play a big role here, and a team of coding theorists at Microsoft Research is working on this problem and on the project itself. They developed algorithms that substantially reduced the effort needed to recover the data from DNA.
"One of the big contributions there was to encode the data in a way that, once we read it on the way out, we need to process minimal amounts of information to really recover the data," Dr. Strauss explained.
"Success would look like everyone in the world has access to DNA data storage. And so, really at Microsoft, our mission is to empower every person and organization to achieve more. With DNA, we would empower every person and organization to store more!" she added.