Old, But Still Around
By David Rubinstein
April 23, 2008 — A rare gateway to ancient history was stumbled upon in 1947, when a group of Bedouins, looking for a stray goat, entered into caves near the Dead Sea between Israel and Jordan and found jars containing religious scrolls that were determined to be about 2,000 years old.
These scrolls were written in Hebrew and Aramaic on parchment, wrapped in linens, and placed in ceramic jars for safekeeping when hidden in the caves. And when they were found, it was not difficult to make sense of the words within the scrolls and fragments; people knowledgeable about those languages could easily understand their meaning.
Now, imagine if the words in those scrolls were preserved on 8-track tape. It would be significantly more difficult to retrieve the information, as players for that kind of tape are harder and harder to locate. Some day, when there are no functional players left, that information would be irretrievable.
This scenario concerns the Data Management Forum of the not-for-profit Storage Networking Industry Association (SNIA), which has seated a "100-Year Archive Committee" to set standards for retrieving data a century from now, when today's tape readers are obsolete and the tapes themselves have degraded to the point of being ruined.
"Data needs to be readable and understandable in the future, even when the people who created it, or the systems it ran on, are no longer around," said Craig Mullins, co-chairman of the archive committee and chief technologist at NEON Enterprise Software.
If you look back 35 years—and Mullins said most people think of long-term storage as 10 to 15 years—you'd find most people working in an organization of any real size would have used a mainframe computer with data stored on punch cards. "To read those cards today, you'd have to go to a museum" that had a punch card reader, he said.
There are two parts to the problem: physical and logical. It's one thing to preserve the data on a medium from which it can be retrieved. It's another to be able to interpret that data and understand what it means.
The SNIA is working on something called the Self-Describing Self-Contained Data Format, which it hopes to put forth as a standard for long-term data storage. The standard, according to Mullins, will define a preservation-oriented data container that will hold the data and the metadata describing it, including reference information, authenticity controls and reader security.
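To make the idea of a self-describing container concrete, here is a minimal Python sketch. It is not the SNIA format, whose details the article does not give; the function names `pack` and `unpack`, the header fields, and the one-line JSON header layout are all hypothetical choices made for illustration. It covers only two of the elements Mullins lists: descriptive metadata stored alongside the data, and a checksum as a simple authenticity control. Reader security is left out.

```python
import hashlib
import json
from typing import Tuple


def pack(payload: bytes, description: dict) -> bytes:
    """Wrap a payload in a container that carries its own description.

    Hypothetical layout: one line of plain-text JSON metadata, a newline,
    then the raw payload bytes.
    """
    header = {
        "format_version": 1,          # illustrative field, not from any standard
        "description": description,   # human-readable reference information
        "length": len(payload),
        "sha256": hashlib.sha256(payload).hexdigest(),  # integrity check
    }
    # json.dumps without indent never emits a raw newline, so the first
    # newline in the container always marks the end of the header.
    header_bytes = json.dumps(header, sort_keys=True).encode("utf-8")
    return header_bytes + b"\n" + payload


def unpack(container: bytes) -> Tuple[dict, bytes]:
    """Split a container into (metadata, payload) and verify the checksum."""
    header_bytes, payload = container.split(b"\n", 1)
    header = json.loads(header_bytes.decode("utf-8"))
    if hashlib.sha256(payload).hexdigest() != header["sha256"]:
        raise ValueError("payload does not match the recorded checksum")
    return header, payload


if __name__ == "__main__":
    box = pack(b"1947,Qumran,scroll fragments",
               {"title": "Sample record", "columns": ["year", "site", "item"]})
    meta, data = unpack(box)
    print(meta["description"]["columns"], data)
```

The point of the design is that a future reader needs nothing but the container itself: the description of what the bytes mean travels with the bytes, in plain text, rather than living in a separate system that may no longer exist.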
Data storage problems will only worsen as the amount of data that is created and needs to be stored explodes. Mullins cited IDC statistics showing that 161 exabytes of data were created in 2006; by 2010, IDC expects 988 exabytes to be created (an exabyte is 10 to the 18th power bytes). "We're in this age where we're moving data to be stored digitally, but we're actually putting it more at risk," he said, by not having retrieval standards in place.
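For a sense of scale, the short Python calculation below works out what the two IDC figures imply. It assumes both numbers are annual totals for 2006 and 2010, which the article does not state outright, and the variable names are illustrative.

```python
# Figures cited in the article, in exabytes.
created_2006 = 161
created_2010 = 988
years = 2010 - 2006

# Overall growth factor across the four-year span.
growth_factor = created_2010 / created_2006

# Equivalent compound annual growth rate, assuming steady growth.
annual_rate = growth_factor ** (1 / years) - 1

print(f"Growth factor: {growth_factor:.1f}x")
print(f"Implied annual growth: {annual_rate:.0%}")
```

Under that assumption, the volume of data created grows roughly sixfold in four years, or by more than half again each year.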
He also noted that from the perspective of database archiving, "There are issues with IMS that make it tough to continue to work with." IT managers are being charged with repurposing data in different formats and then having to archive it, he said. An industry standard for storage, Mullins posited, will allow data to be archived today and easily converted to any standards that might come down the pike in the future.