The Day of the Search Engine

By Matthew L. Helm

Goodbye, Link Sites?

Link sites may be the preferred guide for online genealogical research now, but that's only because you haven't used the new genealogy search engines.

The day started typically enough. I had my trusty research log by my side and decided to sign onto my Internet account. I went through my normal routine of checking e-mail and going to my favorite link site. I scanned the usual categories but, like so many times before, I saw nothing specific on my elusive ancestor. I noticed one new interesting family site so I decided to take a chance. After arriving at the site and reading the initial paragraph, I was pretty sure this was not the branch of my ancestor. However, still hoping to find some morsel of useful information, I scrolled down the page, to no avail. Near the bottom of the page, I noticed the usual list of links to other genealogical pages.

The first five links in the list were usual suspects–sites many people link to from their personal Web pages. However, number six was new–it was a link with an unfamiliar title. Not only was the name different, but the description was different as well. The site was described as a "genealogically-focused search engine." Thinking it must be a new concept, I clicked on the link.

The resulting page looked a lot like the typical search engines I’ve used to find non-genealogical information. I put a search term into the appropriate box and within seconds I was presented with a page containing ten links, each with a description of the site. Rather than the bland descriptions I’ve seen on some pages, these descriptions contained the search term I used. Instantly I knew the context of my search term within the Web page associated with the link.

After going through a couple of the search engine result pages, I finally found a reference to my elusive ancestor at a Web site that has an online GEDCOM database. I was then able to contact the maintainer of the database for more information. This was the dawning of a new day for me–the "Day of the Search Engine."

Has this happened to you? If not, it may happen in the not-so-distant future. The "Day of the Search Engine" is coming soon.

Tip of the Iceberg
If I asked you to name five comprehensive genealogical link sites (also called lists or indexes), you could probably do so within a few seconds. But what if I asked you to name five genealogically focused search engines? Could you do it?

The general Internet community discovered the utility of search engines three or four years ago. Currently, there are hundreds of them available, ranging from whole-Internet search engines (such as Lycos and AltaVista) to smaller ones tailored for searching specific interests or resources in a geographical area. However, the genealogical community has been slow to embrace search engines–especially those that are "genealogically-focused." Instead, the community has favored link sites.

Comprehensive link sites (sites that attempt to list all known genealogical Internet resources) found their start in late 1994 with the introduction of Stephen Wood’s Genealogy Home Page. Shortly thereafter came Helm’s Genealogy Toolbox by yours truly. And, in late February 1995, Genealogy Resources on the Internet was introduced by Christine Gaunt and John Fuller. In the ever-changing world of the Internet, it is interesting to note that all three of these sites are still actively maintained. And a whole host of others have been introduced in the past couple of years. A reflection that genealogists love their comprehensive link sites? I think so.

Although link sites have survived the test of "Internet" time, they have some shortcomings that make them a less than perfect research tool for genealogists. First, and probably least important, is that maintainers of link sites require genealogists to think like they do. Link sites are organized in many different ways (by subject, alphabetically, or by type of Internet resource) but in each case the maintainer makes certain assumptions about which resources go into which classifications. As a user, you then have to think like the maintainer or guess in which category the maintainer placed a resource you’re looking for. Some link sites have added search functionality that helps break down this barrier, but most have not.

The second problem: link sites don’t give users enough information to do the types of research that genealogists really want to do. For the most part, genealogists are looking for information about specific individuals. How easy is it to find information on a specific individual on a link site? Unless your individual has a Web site that specifically names them in the title, you are pretty much out of luck.

Some link sites have abstracts that help identify major surnames connected with the site or they list the major people the site is built around. Really, this is not enough. For example, suppose I’m interested in finding an individual named James Madison Gardner. If I go to a link site, I will look for James Gardner under a category such as "Surnames–G" or "Personal Sites–G." After arriving at that categorical page, I might see a link to a surname board for Gardner, a mailing list, and maybe a couple of Gardner family pages or one-name studies. But my chances of finding a page entitled "The History of James Madison Gardner and his Ancestors" are pretty remote. My best hope is that one of these general resources can point me to a specific record or contact person.

The third and perhaps most important shortcoming of link sites is that the maintainers can’t keep up with the number of genealogical sites being produced. Most maintainers do the best they can to place as many sites on their lists as possible. But in order to be thorough and to ensure sites are what they say they are, maintainers must check each site individually. This takes a lot of time.

Unfortunately, people expect too much from link sites–a case of using the wrong tool for the job. Link sites are great for browsing to see how much information is available on a certain subject, or for finding the locations of sites that you either know or suspect exist. They are also good for finding resources that are not easily indexed by search engines, such as mailing lists (unless the actual lists are archived by the search engine maintainer) and telnet/hytelnet resources. What they are not good for is finding information contained on sites that are not named for the individual ancestor whom you are seeking, such as transcribed record sites and online GEDCOM databases.

Searching for a Better Way
Search engines are composed of two parts. The first part consists of programs, often referred to as robots, crawlers, or spiders, that the search engine maintainer sends to remote Web sites to index the text on those sites and store the results in a catalog. The second part is the search interface or mechanism, which lets you type in a search term; this in turn executes a script that searches the robot's catalog and returns the results to you.
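The two parts can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a description of how any particular search engine is built: the page addresses and page texts are made-up stand-ins for what a robot would fetch, and the names `build_catalog` and `search` are hypothetical.

```python
import re
from collections import defaultdict

# Made-up stand-ins for pages a robot would fetch from remote Web sites.
PAGES = {
    "http://example.org/census": "1860 Carroll Co Census, Page 29: James M. Gardner, farmer",
    "http://example.org/roster": "Company C 25th AL roster: James Madison Gardner, private",
    "http://example.org/recipes": "Favorite family recipes and holiday photos",
}

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_catalog(pages):
    """Part one (the robot's job): map each word to the pages that contain it."""
    catalog = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            catalog[word].add(url)
    return catalog

def search(catalog, query):
    """Part two (the search interface): return pages containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(catalog.get(word, set()) for word in words))
    return sorted(hits)

catalog = build_catalog(PAGES)
print(search(catalog, "Gardner"))
print(search(catalog, "James Madison Gardner"))
```

The first search returns both the census page and the roster page; the second returns only the roster page, because this simple catalog matches whole words and knows nothing about abbreviated middle names.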

In the case of genealogically-focused search engines, I’m referring to search engines that index content specifically related to researching one’s genealogy or family history. Some have filters that throw out sites that do not meet specific criteria, and others have robots that are "fed" only a list of sites that are genealogically relevant.
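The two approaches to keeping a catalog genealogical might look something like the following sketch. Everything specific here is an assumption made for illustration: the term list, the threshold of two terms, the host names, and the function names `passes_filter` and `is_on_seed_list` are all hypothetical, and real search engines may use quite different criteria.

```python
import re
from urllib.parse import urlparse

# Hypothetical criteria: words a page must mention to count as genealogical.
GENEALOGY_TERMS = {"genealogy", "ancestor", "ancestors", "census", "gedcom", "surname"}

# Hypothetical hand-picked list of hosts the robot is "fed".
SEED_HOSTS = {"example.org", "records.example.net"}

def passes_filter(page_text, minimum_terms=2):
    """Approach one: keep a page only if it mentions enough genealogy terms."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return len(words & GENEALOGY_TERMS) >= minimum_terms

def is_on_seed_list(url):
    """Approach two: visit a page only if its host is on the curated list."""
    return urlparse(url).hostname in SEED_HOSTS

print(passes_filter("Census records and GEDCOM files for every surname"))
print(passes_filter("Holiday photos and recipes"))
print(is_on_seed_list("http://example.org/gardner.html"))
print(is_on_seed_list("http://shopping.example.com/"))
```

The filter lets the robot roam and discards what it does not want afterward, while the seed list keeps the robot from wandering in the first place. The trade-off is that a seed list still depends on someone deciding which sites belong on it.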

As you may recall, the first problem with link sites is that you need to think like the site maintainer. Search engines usually do not have any kind of categorization (or taxonomy) as they are just looking for words that match, or closely match, your search terms. All you have to do is type in what you’re looking for.

The second problem with link sites is that they do not give you enough information. If the search engine you are using indexes the full text of a document, this is not a problem. You can easily find information buried on sites that you would never think to check. Let’s take the example of James Madison Gardner again. I typed the search term "James Madison Gardner" into the Internet FamilyFinder on genealogy.com. The results included five links to "James Madison Gardner" and nine links to "James M. Gardner." The titles of these links included "Linked Family Tree including Syles Jones METCALF," "Company C 25th AL: Shelby County," "1860 Carroll Co Census, Page 29," "Genealogy Index for surnames beginning with G," and the ever popular page title "Genealogy Data." Unless I knew that James Madison Gardner was in some way related to Syles Jones Metcalf, was a member of the 25th Alabama, or lived in Carroll County, Virginia, I could not have found these references through a link site. Yet these are specifically the kinds of sites and information genealogists need to find in order to complete their research.
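How a full-text search might turn up both "James Madison Gardner" and "James M. Gardner," and show each hit in context, can be sketched as follows. I do not know how the Internet FamilyFinder actually matches names; this is one plausible approach, assuming a middle name may appear either spelled out or as an initial. The function names `name_pattern` and `snippet` and the sample page text are hypothetical.

```python
import re

def name_pattern(full_name):
    """Build a pattern matching a name with middle names spelled out or abbreviated."""
    parts = full_name.split()
    if len(parts) < 2:
        return re.compile(re.escape(full_name), re.IGNORECASE)
    first, middles, last = parts[0], parts[1:-1], parts[-1]
    pieces = [re.escape(first)]
    for middle in middles:
        # Accept the whole middle name, or its initial with an optional period.
        pieces.append(f"(?:{re.escape(middle)}|{re.escape(middle[0])}\\.?)")
    pieces.append(re.escape(last))
    return re.compile(r"\s+".join(pieces), re.IGNORECASE)

def snippet(text, full_name, width=30):
    """Return the matched name with up to `width` characters of context on each side."""
    match = name_pattern(full_name).search(text)
    if match is None:
        return None
    start = max(match.start() - width, 0)
    end = min(match.end() + width, len(text))
    return text[start:end]

# Made-up page text for illustration.
page = "1860 census, page 29: household of James M. Gardner, farmer, age 41"
print(snippet(page, "James Madison Gardner"))
```

A match on an initial is weaker evidence than a match on the full name, since "James M. Gardner" could just as well be a James Monroe Gardner, so results like these are leads to verify rather than conclusions.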

While link site maintainers can’t keep up with all the new pages, search engines use robots to index sites. Remote Web sites can be indexed up to twenty-four hours a day, without a lot of human interaction. Depending upon the speed of the Internet line, the amount of Internet traffic, the configuration of the robot, and the processing power of the server housing the search engine, it’s conceivable that robots can index two to three thousand pages in an hour. This is a rate that can easily keep up with the number of new genealogical sites introduced every day. Plus, search engines can periodically revisit sites automatically to ensure they are still active (although there is never a guarantee the site won’t disappear the very next day), again without human interaction.
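To put that hourly rate in daily terms, here is the arithmetic as a short Python sketch. It assumes the robot really does run all twenty-four hours at the quoted rate, so the result is an upper bound rather than a measurement; the function name `pages_per_day` is hypothetical.

```python
HOURS_PER_DAY = 24

def pages_per_day(pages_per_hour, hours=HOURS_PER_DAY):
    """Pages indexed in a day at a steady hourly rate (an optimistic upper bound)."""
    return pages_per_hour * hours

low, high = pages_per_day(2000), pages_per_day(3000)
print(f"{low:,} to {high:,} pages per day")  # prints "48,000 to 72,000 pages per day"
```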

Is the Day of the Search Engine Coming?
So, if search engines are so useful, why isn’t everyone using them instead of link sites? Here are a few reasons why I believe search engines have been overlooked.

The first reason is simply hype. There’s an awful lot of hype generated by the media and other genealogists about link sites. When was the last genealogical conference you attended where search engines were mentioned before link sites? In fact, when was the last time genealogically-focused search engines were mentioned more than in a passing reference? More often than not genealogical lecturers only mention the resources that seem popular.

A second reason is marketing. Maintainers of search engine sites have not effectively marketed their sites. Some maintainers fear being labeled "commercial" if they post information about their search engines on newsgroups and mailing lists. Other efforts may be too small to mount a marketing campaign while also paying for the bandwidth and equipment necessary to create and maintain a search engine. Also, those that have marketed their sites as search engines–when they truly are not–have hurt the credibility of legitimate search engine sites.

A third and perhaps most important reason is that there has not been much competition. Since 1997, perhaps only a handful of genealogically focused search engines have been created. Once competition begins, users will begin seeing improvements in search technology just as they have seen with the general Internet search engines.

Is the "Day of the Search Engine" coming? Yes, it’s coming, and it’s coming soon. There are already several efforts underway to produce genealogically-focused search engines that will go into production before the end of this year. And, after this day fully arrives, I’m sure you will see a fundamental shift in the way genealogy is researched on the Internet.

Matthew L. Helm, former editor of Genealogical Computing, is the co-author of the popular genealogy beginner’s reference book, Genealogy Online for Dummies.

