The Perils of Polyethylene for Posterity
By Mark Howells
Every week, my PC dutifully makes backup DAT tapes of the important information on my household network. The family genealogy, digital photographs, e-mail, favorite recipes, and my son’s website bookmarks are saved in case one of our computers fails. One backup tape gets taken a few miles away from the house to a safe place so that if a disaster befalls our home, we will be able to recover the information.
In the short run, this little routine serves its purpose. We could recover our most important data after an unhappy event relatively quickly. In the long run, however, all of this data will be lost.
Ideally, if my great-great-grandchildren came across one of those old DAT tapes from the early twenty-first century, the tapes might still have data that is retrievable. That’s assuming the tapes were kept under ideal storage conditions, periodically re-tensioned, and technical obsolescence hadn’t made DAT tape readers impossible to find or the files on the tapes unreadable with future systems. (Have you tried to find a punch card reader lately?) But in the reality of my not-so-careful usage and storage of these tapes, they may not last more than ten or twenty years.
Backup tapes are not a viable long-term storage solution if I want to save my genealogical research for the distant future. Obviously, we’re going to have to do something different if we want the results of our research to survive long-term.
Useful Lives
How long a particular type of media will remain readable is an educated guessing game. Honest people will disagree as to what the results of accelerated stress testing mean to the actual useful lives of the media in question. Unfortunately, only time will generate the actual answers.
Various factors will influence the useful life of a piece of media. Its manufactured quality, the amount of use the media gets during its lifetime, how it is handled, how it is stored, and the quality of the equipment used to write or read off of the media all affect how long it will last. Environmental variations in temperature, humidity, and exposure to light, as well as biological, chemical, or electro-mechanical contamination are going to reduce media lifetimes as well.
Media life expectancy is most directly determined by its component materials. The composition of the protective layers, encoding surfaces, substrates, dyes, inks, and storage containers all affect its useful lifetime. The nature of the data encoding—whether human or machine readable, magnetic or optical, active or passive—is usually secondary in determining media life span.
Hard drive disks have the benefit of being contained within a tightly controlled environment: the drive itself. Floppy disks may be exposed to air, dirt, and fingerprints if the slide cover is pulled back. Magnetic media dependent on polymer substrates are at the mercy of the substrate’s rate of deterioration. In the old days of the 1960s, this was usually polyethylene; now it’s mostly Mylar®. Static memory devices such as thumb drives have no moving parts like hard drives or floppies, so they tend to last longer but are still at risk from magnetic fields. Optical storage devices such as CD-ROMs are not susceptible to failure due to magnetic fields, since magnetism is not used to encode optical discs. However, optical discs can have their protective surfaces scratched, making the underlying data unreadable.
It’s interesting to note that “burning a CD-R” on your own PC utilizes a medium that has less longevity than commercially created CD-ROMs. This is because the writable CD-Rs use a form of dye to record the digital information whereas the commercial CD-ROMs use physical peaks and valleys on their substrate. The dye is less stable than the physical encoding, yet both are subject to wear and tear on their plastic coated surfaces.
What to Do in the Short-Term
Electronic media is great because it can be read and written to so quickly. Speed of access is its strength, but durability is its weakness. Electronic formats should not be used for preservation purposes when there are other formats available. Ultimately, genealogical data destined for future preservation should be migrated off of electronic formats and onto non-electronic formats with greater life expectancies, as shown in the chart below.
Given that most of us are not yet planning for posterity, here are some short-term suggestions for preserving electronic media.
When making regular periodic backups (the kind you use if your PC were to drop dead tomorrow), be sure to back up the software programs needed to read your files as well as the data files themselves. Keeping handy copies of software programs such as your genealogy program, e-mail program, etc., will increase the ease with which you can recover. If you use any compression utilities to fit more on your media, make copies of these as well so you may successfully uncompress your data if required. Having the backup and restore software you use on a different form of media will also increase your data’s chances of being readable in the near future. For example, if you regularly back up your data files to a CD-R disc, have a Zip® disk with your backup and restore software accompany the CD-R into storage.
Make more than one backup. Make duplicate backups and store them in different locations—as far apart geographically as possible. Better yet, make duplicate backups on different media formats. For instance, if you regularly back up to DAT tapes, perform the same backup using a CD-R disc. Data survivability increases with every dispersed copy of your data and with every different media format used.
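As a rough sketch of the duplicate-backup routine, the short Python function below copies one research folder to several destinations in a single pass. The name `copy_to_destinations` and the folder paths are hypothetical, and the sketch assumes each destination (a second drive, a removable disk, a network share) shows up on your PC as an ordinary folder.

```python
import shutil
from pathlib import Path


def copy_to_destinations(source, destinations):
    """Copy the source folder into each destination folder.

    Returns the list of backup folders that were written.
    """
    source = Path(source)
    written = []
    for dest_root in destinations:
        target = Path(dest_root) / source.name
        # dirs_exist_ok lets a later run refresh an existing backup (Python 3.8+).
        shutil.copytree(source, target, dirs_exist_ok=True)
        written.append(target)
    return written


# Hypothetical usage; substitute your own folders:
# copy_to_destinations("C:/genealogy", ["E:/backups", "F:/backups"])
```

Copying to two destinations in one step only helps if the destinations really are separate devices that end up in separate places; two folders on the same drive fail together.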
Test your backup media every six months. Perform an actual restore from your backup media to your PC. This will help detect if your backup media is failing earlier rather than later. Humorously enough, as I wrote this article, one of my DAT backup tapes physically failed.
The Migration Trail
Our ancestors left migration trails that genealogists today follow in the records they left behind. Down a river, across an ocean, then overland by wagon, etc. Your data has a migration trail as well. If you’ve been researching your family history for any length of time, your data has probably migrated a few times. I entered my earliest research on handwritten family group sheets. In time, I transcribed these to a DOS-based software program with data files stored on 5¼-inch floppy disks. Now, my software and genealogical data are on several different forms of media. That’s at least two separate migration steps between media. While I was migrating to gain the benefits of new and faster software and hardware, I was unintentionally keeping my data “fresh.” Migrating your data between media formats over time is a critical way to avoid technical obsolescence and media failure.
Observe the migration trail of others first. What are your friends and relations using for electronic media? Some of the new formats may turn out to be technical or business model dead ends, so pay attention to media formats that build up their consumer popularity and are accepted into the mainstream. You don’t want to wind up recording your data on the equivalent of Betamax VCR tapes that did not survive the marketplace battle with VHS. Find out what your local library or your place of work uses for electronic media.
Migrate to two different formats if possible. When considering a system upgrade, take the time to consider including more than one method of electronic media storage. When you buy a new system, get a CD burner and a thumb drive so you have two formats on which to make copies of your data. This guards against single device hardware failure and can extend the useful life of your data by betting on two media horses.
When you migrate between formats, double-check the accuracy of the new media. Compare the data on the new media to the data on the old media to ensure that the transfer was successful. Have another set of eyes do this for you as well so neither of you misses anything. It’s fairly typical for data to “drop” between formats due to software errors.
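When the migration is a straight file copy from old media to new, the comparison can be partly automated. The Python sketch below is one possible approach, not a prescribed method: it computes a SHA-256 checksum for every file in both folders and reports files that are missing from the new copy or whose contents differ. The function names are hypothetical, and the sketch assumes both the old and new media can be read as ordinary folders.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in root.rglob("*")
        if p.is_file()
    }


def compare_trees(old_root, new_root):
    """Return (missing, changed) file lists for the new copy."""
    old = tree_digests(old_root)
    new = tree_digests(new_root)
    # Files on the old media that never arrived on the new media.
    missing = sorted(name for name in old if name not in new)
    # Files present in both places whose bytes no longer match.
    changed = sorted(
        name for name in old if name in new and old[name] != new[name]
    )
    return missing, changed
```

Two empty lists mean every file arrived byte for byte. A check like this cannot tell you whether a conversion between software programs quietly dropped a field or a note, which is where the second set of eyes still earns its keep.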
Document your data. Record information about the data on or with the container for your new media. At the very least, record the name of the software used to create the data, software vendor, software version, date, and vendor and model number of the hardware system used to write the data. If possible, include the source, content, and structure of the data such as the creator’s name, field names and formats, relationships to other data, etc. This information will become important in future attempts to recover the data.
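One way to keep that record with the data is to write it as a small plain-text file on the media itself, then print a copy to tuck into the container. The Python sketch below assumes a JSON file is acceptable; the function name, the file name `ABOUT_THIS_DATA.json`, and the field names are all hypothetical choices for illustration.

```python
import json
from datetime import date
from pathlib import Path


def write_media_record(media_root, software, vendor, version, hardware, notes=""):
    """Write a small description of the data stored under media_root.

    Returns the path of the record file that was written.
    """
    record = {
        "software": software,
        "software_vendor": vendor,
        "software_version": version,
        "date_written": date.today().isoformat(),
        "hardware": hardware,
        # Free text: creator's name, field names and formats, related files.
        "notes": notes,
    }
    path = Path(media_root) / "ABOUT_THIS_DATA.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

JSON is used here only because it stays legible in any text editor; a handwritten label carrying the same facts serves the same purpose and does not depend on the media surviving.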
For the Future
Unless we are extremely diligent and extremely well-funded, our electronic media will ultimately fail. The long-term goal of preservation-minded genealogists should be to migrate their data away from electronic format and onto something more permanent. Laser-inscribed granite makes an excellent long-term choice. But we usually save that technology for our tombstones. Publishing your genealogical data in book format will certainly help to preserve it. Archival quality paper and microfilm are the most common solutions to the preservation problem right now. Realistically, there are limits to what individuals can do to preserve their research for posterity.
Making the preservation of your research someone else’s responsibility is one way to overcome the limits of individual resources. You could find that one young cousin who shares your passion for the family’s history and turn your research over to him. This will ensure its transmission across at least one generation. You are then dependent on the cousin’s ability to find a successor as data custodian in the subsequent generation. “Keeping it in the family” may only be a one- or two-generation solution.
Donating the results of your research to one of the many institutions that collect this sort of data is undoubtedly the best choice. Local and national libraries, genealogical and historical societies, and international institutions such as the Family History Library are aware of the issues surrounding data preservation. Often, they are equipped with experts and the institutional wherewithal to acquire, migrate between media, preserve, and make your research available to future generations. Be sure to investigate the institution’s capabilities in these areas before making arrangements to donate your research. Ask these questions:
• How will the institution preserve your research?
• Does it require permission to duplicate your data in order to preserve it?
• What, if any, rights as the creator of your research results are you granting to the institution?
• If the institution migrates your data between media formats, will you be able to obtain your own copy of the new format? Will you or the institution keep your original?
• How will your relations and/or the general public obtain access to your data at the institution?
Generational Laughter
My great-great-great-grandchildren will probably laugh at the quaintness of the DAT tapes I am currently using to record their ancestry. By then, their biological- or crystalline-based storage systems will no doubt make DAT cartridges look like toys. I enjoy a good joke as much as the next ancestor, but as a family historian, I have a responsibility to ensure that my research is saved for posterity. Electronic media is not a long-term solution to the problems of data survivability. As counter-intuitive as it appears, data preservation is dependent on relatively low-tech solutions. High-tech storage allows for fast speeds and huge capacities, but it simply can’t go the distance when carrying data into the future.
Mark Howells watches his data deteriorate at markhow@oz.net.