Using the Web as a Research Tool

Relying on family information provided via the Internet can be hazardous unless you take the proper precautions.

As most of us know, the Internet started as a Defense Department project that would let researchers on government projects quickly share their research findings. It proved its value when it was extended to the scholarly community generally through university and college networks.

It is only within the past ten years, with the development of the World Wide Web, that those of us outside government and academia have been able to enjoy the same benefits. Once the doors were opened, we flocked in by the millions, and neither the world nor family history will ever be quite the same again.

Most family historians now use the Internet’s resources–especially the World Wide Web–as one of the essential tools in the array available to them. But can we consider the Web a cornerstone of research? Opinions on this question vary, with strongly held and forcefully expressed views at both extremes.

At one extreme you’ll hear, “Of course it’s a cornerstone–it’s the only tool I use for my family research, except perhaps for some data on disks that I use on my computer.” At the other end, you’ll get an incredulous, “Cornerstone? No way. Most of what’s on the Web is garbage and even good data so seldom has a source citation that you can’t tell it from the trash.”

Whatever your personal position, there’s no denying that most of us are continuing to increase our use of the Web, especially as databases become available that give us access to information in previously unindexed records, and as more and more images of original documents are being placed online, notably the U.S. Census population schedules and Ellis Island immigrant passenger lists.

In truth, the Web is such a vast resource, and its users so different in their expectations, that at best we can only recognize that the usefulness of the Web depends on how much information we can find that will be useful in solving some particular research problem. In some cases, we will find no more than a clue to publications or records not yet available online. In other cases, we’ll find databases compiled from books and original records that will direct us to the sources in which we can find a particular name of interest. In a few cases, we can then go to a Web site and view an image of the original source, which hopefully will give us enough information to decide whether and how it applies to the person we’re interested in.

Purposes for Web Use
In considering our use of the Web as a research tool, it may be helpful to consider separately the three purposes of the Web and how we use it in our research.

The earliest purpose was person-to-person communication in the form of messaging or e-mail. Its primary advantages over earlier means of person-to-person communication were its speed, low cost, and provision of a record of the exchange. Most of us have already switched much of our communication from the postal service, overnight courier, and fax to this low-cost alternative when it isn’t necessary to place an original document or object in the hands of the recipient.

The second purpose, which we can call focused publication, occupies a position somewhere between person-to-person communication and general publication, and is addressed to a relatively small group of people who share some interest in common. Traditionally this took the form of publications like the newsletters of family associations or small special-interest groups. Emerging early as digital equivalents on the Internet were the mail list and the bulletin board, of which there are now thousands in genealogy alone, focused on such common interests as surnames, localities, ethnic origins, or shared experiences like immigration or military service.

The third purpose for using the Web is publication generally equivalent to what traditionally has been done through printed journals, pamphlets, and books. Many of the compiled family histories being published on Web sites can be considered an equivalent of the queries that appear in print journals–broadly distributed notices seeking other researchers with information about the same family. In print journals, queries have traditionally been limited to abbreviated and condensed paragraphs listing a few names and relationships with relevant dates and locations. The Web has no such limits, and it is easy to include much more information about a family on a Web site in the hopes of attracting responses from others interested in the same family who might have information from sources unavailable to the Web site author.

Value of Online GEDCOM Files
In fact, we can post a GEDCOM database that will display our entire family tree, or we can submit our GEDCOM file to an online service or digital publisher that will make it available for searching. Many GEDCOM files and other compiled genealogies on the Web contain no source citations that would permit some assessment of their reliability or credibility. Some compilers unsuccessfully try to justify the omission with disclaimers noting that they are works in progress, subject to correction and revision, but with or without the warning, such works can still be useful, even though we can’t accept them as fact without further verification.

We can take the basic data as a possible clue to names and places that may be helpful in our research, and as an invitation to contact the originator who may have family records or other unpublished information, but we cannot rely on such data until we have found where it came from, and have judged how reliably it was originally reported and how accurately it was transmitted through subsequent repetition and copying.
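One way to see how much of a downloaded file is undocumented is to look for the GEDCOM source tag, SOUR, inside each individual (INDI) record. The following is only a minimal sketch under stated assumptions: the function names and the sample data are hypothetical, the check is coarse (a single SOUR line anywhere in a person's record counts as documentation for that person), and it says nothing about whether a cited source is itself reliable.

```python
def split_records(gedcom_text):
    """Group GEDCOM lines into records; a new record starts at each level-0 line."""
    records = []
    for raw in gedcom_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("0 ") or not records:
            records.append([])
        records[-1].append(line)
    return records


def uncited_individuals(gedcom_text):
    """Return the NAME value of each INDI record that contains no SOUR line."""
    uncited = []
    for record in split_records(gedcom_text):
        header = record[0].split()
        if len(header) < 3 or header[2] != "INDI":
            continue  # skip the header, trailer, family and source records
        name, has_source = "(unnamed)", False
        for line in record[1:]:
            parts = line.split(maxsplit=2)
            if len(parts) < 2:
                continue
            if parts[1] == "SOUR":
                has_source = True
            elif parts[1] == "NAME" and len(parts) == 3 and name == "(unnamed)":
                name = parts[2]
        if not has_source:
            uncited.append(name)
    return uncited


# Hypothetical sample data, invented for illustration only.
SAMPLE = """\
0 HEAD
0 @I1@ INDI
1 NAME Anna /Example/
1 BIRT
2 PLAC Sampletown
0 @I2@ INDI
1 NAME Pieter /Sample/
1 BIRT
2 PLAC Sampletown
2 SOUR @S1@
0 @S1@ SOUR
1 TITL Hypothetical church register
0 TRLR
"""

print(uncited_individuals(SAMPLE))
```

Names returned by such a check are the ones to treat strictly as clues until their origin has been traced.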

Proliferating Errors
Whatever the good intentions of the originator, danger arises when others indiscriminately copy names and asserted relationships, and add them to their own GEDCOM files without identifying either the originator or any source information that may have accompanied the original version. If one of the relationships is in error, the error is multiplied each time the pedigree is copied to a new file, and if corrections are later made, they rarely catch up with the original error. For example, if a son was earlier attributed to the wrong wife of his father, we’re likely to find more GEDCOM files that repeat the error than files that show corrected information based on solid evidence that proved the earlier assertion wrong.

As a rule of thumb, if you find multiple GEDCOM files with conflicting information, suspect the version that occurs in the greater number of files, on the basis that earlier errors have probably multiplied more than any later corrections. Above all, don’t accept either undocumented assertion without investigating which is more likely to be true.
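The rule of thumb can be illustrated with a small tally. This is a sketch only, with hypothetical file names and values: it counts how many files repeat each version of a disputed fact and notes whether any file offers a source for it. The count shows which version has been copied most, not which is true, so every undocumented version stays on the list to investigate.

```python
from collections import Counter


def tally_versions(assertions):
    """Summarize competing versions of one disputed fact.

    assertions: list of (file_name, claimed_value, has_source) tuples.
    Returns one row per version, most frequently copied first.
    """
    counts = Counter(value for _, value, _ in assertions)
    documented = {value for _, value, has_source in assertions if has_source}
    return [
        {"value": value, "files": n, "documented": value in documented}
        for value, n in counts.most_common()
    ]


def needs_verification(report):
    """Every version that no file documents must still be investigated."""
    return [row["value"] for row in report if not row["documented"]]


# Hypothetical example: four files disagree about a son's mother.
claims = [
    ("file_a.ged", "first wife", False),
    ("file_b.ged", "first wife", False),
    ("file_c.ged", "first wife", False),
    ("file_d.ged", "second wife", False),
]

report = tally_versions(claims)
for row in report:
    print(row)
print(needs_verification(report))
```

Here the version found in three files is the one copied most often, which by the rule of thumb makes it the first to suspect rather than the one to accept.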

My own research has in a way contributed to just such a multiplication of error in online or CD-ROM pedigrees. Here’s what happened: In 1719 there were two girls named Catherine/Catharina/Katrina Van Tassel/Van Thexel baptized the same day at the old Dutch Church at Sleepy Hollow, Tarrytown, New York, and another girl of the same name baptized a year later, in each case with their parents named. All three were later married in the same church, but with their parents unnamed. My problem was determining which of the three girls married my ancestor, John/Johannes/Jan DeRevere/de Reviere, in 1740.

I agreed with an earlier researcher that the Catherine baptized in 1720 had married someone else because their children’s names were consistent with the traditional Dutch naming pattern, but I differed with the earlier researcher on which of the remaining two married John DeRevere, because the Van Tassel baptismal sponsor for their first child was too distantly related to the Catherine he chose to be a logical candidate. I concluded that it was more probable that the other Catherine baptized in 1719 was the one who married John, and I published my findings. My conclusion, which I labeled only probable, is the one that most frequently appears in online pedigrees.

Several years later, a newly discovered Van Tassel family record made it clear that my conclusion had been wrong. The Catherine Van Tassel I had picked as the probable wife of John DeRevere was in fact his half-sister–a legal impediment that made such a marriage impossible. The Van Tassel baptismal sponsor whom I had rejected as too distantly related to the wife was, in fact, the husband John DeRevere’s stepfather, through a marriage recorded nowhere but in the newly discovered family record, which was published in the same journal in which my original paper appeared.

My correction to my earlier choice of the wrong Catherine as John’s wife was published in the next issue of that journal, with the evidence supporting it. Unfortunately, the correction never caught up with the original error. Of some two dozen references found in a recent online search for pedigrees including the three Catherines, only one incorporated the corrected conclusion from my 1989 correction–vividly demonstrating how errors multiply more than later corrections. Some of the pedigrees list names and addresses of submitters; others give no clue as to who submitted them to the digital database. Where the database allows the submitter to indicate a data source, it is seldom used. In the one I noted, a submission by someone completely unknown to me cited the page reference to my original, uncorrected article.

Protecting Your Work
How do we protect ourselves from contributing to the further spread of such errors? We can treat data that lacks any indication of its origin as undependable, even in the absence of contradictory information. The lack of credibility is immediately apparent when conflicting information comes from different sources, even when the immediate submitter or contributor is identified. However, an individual undocumented information item is no more worthy of belief than either of two contradictory items of similarly unlisted origin.

We can protect our own work against inadvertent errors, and keep them out of our own database, by never adopting someone’s assertion or conclusion about a genealogical fact unless it meets one of two conditions:

1) The source information shows that it originated with a believable observer who was in a position to know the facts, and it has not been significantly altered in subsequent retelling or recopying; or

2) When there is no such information that directly answers a genealogical question, there is a persuasive written account of how the conclusion was reached from other believable information that touched on the question only indirectly.
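The two conditions can be written down as a simple checklist. The sketch below is an illustration only: the class and field names are hypothetical, and each flag records a judgment the researcher has already made about the evidence, which no program can make for us.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    """One asserted genealogical fact, with the researcher's own judgments."""
    claim: str
    has_direct_source: bool               # information that directly answers the question
    informant_in_position_to_know: bool = False
    unaltered_in_copying: bool = False
    has_written_reasoning: bool = False   # persuasive account built on indirect evidence


def acceptable(a):
    """Return True only if the assertion meets condition 1 or condition 2."""
    condition_1 = (
        a.has_direct_source
        and a.informant_in_position_to_know
        and a.unaltered_in_copying
    )
    condition_2 = (not a.has_direct_source) and a.has_written_reasoning
    return condition_1 or condition_2


# Hypothetical examples.
copied_pedigree = Assertion("parentage copied from an online file", has_direct_source=False)
reasoned_case = Assertion(
    "parentage argued from indirect evidence",
    has_direct_source=False,
    has_written_reasoning=True,
)

print(acceptable(copied_pedigree))
print(acceptable(reasoned_case))
```

An undocumented entry copied from someone else's file fails both tests and stays out of our database until its origin is found.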

With this defense to protect our research, we can surf the Web’s genealogical resources without fear that we will be affected by, or contribute further to, the proliferation of unreliable information on the Web. By doing so, we can take full advantage of the Web’s marvelous index and search capabilities and its ever-increasing store of original document images without hesitation.

Donn Devine, GC, CGI, a genealogical consultant from Wilmington, Delaware, is an attorney for that city and archivist of the Catholic Diocese of Wilmington. He is a director of the National Genealogical Society and chair of its Standards Committee, and is a trustee of the Board for Certification of Genealogists.
