Computer Breakthrough: Code of Life Becomes Databank
Scientists in Britain on Wednesday announced a breakthrough in the quest to turn DNA into a revolutionary form of data storage.
A speck of man-made DNA can hold mountains of data that can be freeze-dried, shipped and stored, potentially for thousands of years, they said.
The contents are "read" by sequencing the DNA -- as is routinely done today, in genetic fingerprinting and so on -- and turning it back into computer code.
"We already know that DNA is a robust way to store information because we can extract it from bones of woolly mammoths, which date back tens of thousands of years, and make sense of it," said Nick Goldman of the European Bioinformatics Institute (EBI) in Cambridge.
"It's also incredibly small, dense and does not need any power for storage, so shipping and keeping it is easy."
DNA is the famous double helix of compounds -- a long, coiled molecular "ladder" comprising four chemical rungs, adenine, cytosine, guanine and thymine, which team up in pairs. C teams up with G, and T teams up with A.
The letter sequence comprises the genome, or the chemical blueprint for making and sustaining life. Human DNA has more than three billion letters, coiled into packages of 23 pairs of chromosomes.
The project entails taking data in the form of 0s and 1s in computing's binary code and transcribing it into "Base-3" code, which uses 0s, 1s and 2s.
The data is then transcribed a second time into DNA code, which is based on the letters A, C, G and T. A block of five or six letters encodes a single byte, or eight binary digits.
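The two-step transcription can be illustrated with a minimal Python sketch. It is not the researchers' actual scheme: it assumes a fixed six base-3 digits per byte, and a simple rule in which each base-3 digit picks one of the three letters that differ from the previous letter, so that no letter ever appears twice in a row. The function names and the starting letter "A" are hypothetical choices made for illustration.

```python
BASES = "ACGT"
TRITS_PER_BYTE = 6  # assumption of this sketch: 3**6 = 729 values, enough for one byte


def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits, most significant digit first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))
    return trits


def trits_to_dna(trits: list[int], prev: str = "A") -> str:
    """Map each base-3 digit to one of the three letters that differ from the previous one."""
    letters = []
    for trit in trits:
        choices = [b for b in BASES if b != prev]  # never repeat the previous letter
        prev = choices[trit]
        letters.append(prev)
    return "".join(letters)


def dna_to_trits(dna: str, prev: str = "A") -> list[int]:
    """Invert trits_to_dna by replaying the same choice rule."""
    trits = []
    for letter in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(letter))  # raises ValueError on a repeated letter
        prev = letter
    return trits


def trits_to_bytes(trits: list[int]) -> bytes:
    """Rebuild each byte from its group of six base-3 digits."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)


def encode(data: bytes) -> str:
    return trits_to_dna(bytes_to_trits(data))


def decode(dna: str) -> bytes:
    return trits_to_bytes(dna_to_trits(dna))
```

Because every letter depends on the one before it, decoding has to replay the same rule from the same starting letter. Avoiding repeated letters is the property the researchers describe as a coding scheme that "doesn't allow repeats".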
The letters are then turned into molecules, using lab-dish chemicals.
The work does not use any living DNA, nor does it seek to create any life form. In fact, the man-made code would be quite useless in anything biological, the researchers said.
"We have absolutely no intention of messing with life," said Goldman.
Only short strings of DNA can be made, so the message has to be chopped into small sections of 117 letters. Each section carries a tiny address tag, rather like packet switching in Internet data, so that the sections can be reassembled in the right order.
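The address-tag idea can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the team's implementation: the payload length of 100 letters is a hypothetical parameter, and the address is kept as a plain integer instead of being written in DNA letters as it would be in the lab.

```python
def split_with_addresses(message: str, payload_len: int = 100) -> list[tuple[int, str]]:
    """Cut the message into sections and tag each one with its position number."""
    return [
        (start // payload_len, message[start:start + payload_len])
        for start in range(0, len(message), payload_len)
    ]


def reassemble(sections: list[tuple[int, str]]) -> str:
    """Put the sections back in address order, whatever order they arrive in."""
    return "".join(payload for _, payload in sorted(sections))
```

Since each section names its own place in the message, the sections can be read back in any order, which matters because a tube of DNA fragments has no inherent ordering.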
To prove their concept, the team encoded an MP3 recording of Martin Luther King's "I Have A Dream" speech; a digital photo of their lab; a PDF of the landmark study in 1953 that described the structure of DNA; a file of all of Shakespeare's sonnets; and a document that describes the data storage technique.
"We downloaded the files from the Web and used them to synthesize hundreds of thousands of pieces of DNA. The result looks like a tiny piece of dust," said Emily Leproust of Agilent, a U.S. biotech company that took the digital data and used it to synthesize molecules of DNA in the lab.
Agilent then mailed the sample back across the Atlantic to the EBI, where the researchers soaked the DNA in water to reconstitute it and used standard sequencing machines to unravel the code. They recovered and read the files with 100-percent accuracy.
The work follows a big step last year when scientists at Harvard announced they had stored 700 terabytes of data -- enough for around 70,000 movies -- in a gram of DNA.
The new method all but eliminates the risk of error when the DNA is read, say the researchers, whose work appears in the journal Nature.
"We figured, let's break up the code into lots of overlapping fragments going in both directions, with indexing information showing where each fragment belongs in the overall code, and make a coding scheme that doesn't allow repeats," said co-author Ewan Birney.
"That way, you would have to have the same error on four different fragments for it to fail, and that would be very rare."
Data is piling up around the world, and storing it is a headache. Magnetic and optical discs are bulky, need to be kept in cool, dry conditions and are prone to decay.
"The only limit (for DNA storage) is the cost," said Birney.
Sequencing and reading the DNA takes a couple of weeks with present technology, so it is not suitable for jobs needing instant data retrieval.
Instead, it would be appropriate for data that would be stored for between 500 and 5,000 years, such as a doomsday encyclopedia of knowledge and culture.
But on current trends, sequencing costs could fall by a factor of 20 within a decade, making DNA storage economically feasible for timeframes of less than 50 years, the authors claim.