ChatGPT = Advanced Search – Part 4 of 4


Everything was new when I wrote my first book on PDF, “From Paper to Web” (Adobe Press, 1997) – not just the amazing potential of the emerging Portable Document Format, but also the exploding popularity of the Internet and the spectacular new functionality of Web search.

Exercising ChatGPT, I recognize the same roots in classic Information Retrieval that the web search engines drew on. That history is long. Sixty years ago Gerard Salton, “the father of information retrieval,” developed the SMART test bed at Harvard to pursue advanced search techniques. In the ’90s, I analyzed the leading search vendors and the most advanced search web sites to understand their powerful functionality. Then, in my book, I imagined how search would work in a sci-fi embodiment. These approaches to information retrieval are recognizable in ChatGPT’s performance today. Here are snips of what I wrote then on Advanced Search. Does this sound familiar now vis-à-vis ChatGPT?

“Libraries of digital information are expected to offer human-like accessibility. The user thinks: ‘If I’m going to a digital library, I expect to be able to search every single word in every book in the whole library. Otherwise, I could just go to the old library with the paper card catalog.’

This is perhaps a primal reaction to computers: “Okay, if you’re so smart, prove it.” If the computer is hard to use, or if the user can’t find what he is looking for, the computer is the dumb partner, not the user.

People expect to be able to search for ideas much the way they do in common conversation:

DAVE: I want to read about that inventor who made those great spy planes…

HAL: Did he build the Blackbird spy planes?

DAVE: Right, and he built the P-38 and the U-2.

HAL: Did he work for Lockheed?

DAVE: Right, Lockheed Skunkworks.

HAL: Kelly Johnson appears in most articles on the Skunkworks and Blackbird, so you must be looking for Kelly Johnson.

In the above example, we used celebrity stand-ins from the Stanley Kubrick film 2001: A Space Odyssey. The human star was Dave, and the famous movie star computer was HAL 9000. Advanced text-searching systems provide the kind of access that was fantasized about in this 1968 sci-fi epic and Academy Award-winning movie.

Two levels of query expansion:

Lexical: Word stemming, wildcards, fuzzy, pattern recognition

Logical: Word thesaurus, dictionary, concept relations
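An illustrative aside, not part of the 1997 text: the lexical level can be sketched in a few lines of Python. The vocabulary, the function names, and the suffix-stripping rule below are all assumptions made for illustration; a real engine would use a proper stemmer and its own index.

```python
import fnmatch

# Hypothetical vocabulary; a real engine would read this from its index.
VOCABULARY = ["plane", "planes", "planet", "spy", "spying", "speed", "speeds"]


def expand_wildcard(pattern, vocabulary=VOCABULARY):
    """Return every vocabulary word matching a wildcard pattern such as 'plane*'."""
    return [word for word in vocabulary if fnmatch.fnmatchcase(word, pattern)]


def crude_stem(word):
    """Strip a trailing 'ing' or 's'. Deliberately naive, not a real stemmer."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_stem(term, vocabulary=VOCABULARY):
    """Return every vocabulary word that shares a crude stem with the term."""
    stem = crude_stem(term)
    return [word for word in vocabulary if crude_stem(word) == stem]
```

With this toy vocabulary, expand_wildcard("plane*") picks up "planet" as well as "plane" and "planes", while expand_stem("planes") returns only "plane" and "planes". That difference is roughly why wildcards and stemming are listed as separate tools.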

Natural Language Query

A natural language query capability allows you to “speak” to the computer in the same language commonly used to speak to humans. This is usually accomplished by a program that “parses” the user input query by stripping out stopwords and inferring relationships between the words in the query.
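An illustrative aside, not part of the 1997 text: the stopword-stripping half of that parsing step might look like the Python below. The stopword list and the name parse_query are hypothetical, and this sketch makes no attempt at the harder half, inferring relationships between the remaining words.

```python
import re

# Hypothetical stopword list, just large enough for the example sentence.
STOPWORDS = {
    "i", "want", "to", "read", "about", "that", "who",
    "made", "those", "the", "a", "an", "of",
}


def parse_query(text):
    """Lowercase the query, split it into words, and drop the stopwords."""
    words = re.findall(r"[a-z0-9'-]+", text.lower())
    return [word for word in words if word not in STOPWORDS]
```

Fed Dave's opening line, "I want to read about that inventor who made those great spy planes", this leaves the terms inventor, great, spy and planes for the search engine to work with.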

Concept Searching

If a computer is going to think like a human, then it should be able to handle many related ideas as if they are all part of one big concept. This is the quantum leap where advanced text searching slips the surly bonds of conventional computer databases.

Building on the Dave and Hal interaction:

The term “spy planes” might expand via a thesaurus to include the most popular terms such as Blackbird, U-2, Stealth and so on. The advanced text search software serves up Blackbird as a potential “expanded” search topic. By having the computer continue the conversation, by sensitively mentioning relevant topics, the text search proceeds in a way that intuitively blends the user’s intentions with the computer’s ability.
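An illustrative aside, not part of the 1997 text: thesaurus expansion of this kind can be sketched with a plain dictionary. The single thesaurus entry reuses the terms named above; the structure and the name expand_concept are assumptions for illustration only.

```python
# Hypothetical one-entry thesaurus built from the terms mentioned in the text.
THESAURUS = {
    "spy planes": ["Blackbird", "U-2", "Stealth"],
}


def expand_concept(query, thesaurus=THESAURUS):
    """Return the original query followed by related terms for each concept found in it."""
    lowered = query.lower()
    expanded = [query]
    for concept, related_terms in thesaurus.items():
        if concept in lowered:
            expanded.extend(related_terms)
    return expanded
```

A query for "great spy planes" comes back with Blackbird, U-2 and Stealth appended, which is the list a conversational front end could offer the user as suggested topics.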

Automatic Summary

When talking to another person, large bodies of information can be assumed in just a few words.

DAVE: Do you know the fastest planes that were ever built?

HAL: Yes, I’m a big fan of all forms of supersonic flight. Which ones are you most interested in? The experimental planes, the military planes or other high-performance aircraft?

DAVE: I’d like to know about the fastest plane that takes off and lands like a regular plane and gets the fastest speed records.

HAL: Do you mean sustained speed, operational speed, or some special post-to-post race speed?

DAVE: I mean the plane that is the fastest overall. The fastest plane in my definition has the highest number of common, day-in and day-out missions that register the highest speeds and altitudes.

HAL: The SR-71 Blackbird appears in most of the highest absolute speed records for conventional aircraft.

Automatic summary provides assistance in understanding the contents of the hit articles. In a way, this can be considered a form of advanced browsing rather than searching. Rather than just providing a ranked list of hit terms in response to a query, automatic summary will write a small report or abstract of each hit.
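An illustrative aside, not part of the 1997 text: one very simple way to produce such an abstract is extractive, picking the sentences whose words occur most often in the article. The Python below is a sketch under that assumption; the name summarize and the "longer than three letters" filter are hypothetical choices, and the scoring favors long sentences.

```python
import re
from collections import Counter


def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = re.findall(r"[a-z0-9-]+", text.lower())
    # Count only longer words as a crude stand-in for a stopword list.
    frequency = Counter(word for word in words if len(word) > 3)

    def score(sentence):
        return sum(frequency[w] for w in re.findall(r"[a-z0-9-]+", sentence.lower()))

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return [sentence for sentence in sentences if sentence in top]
```

This only selects existing sentences; it does not write new prose the way the report or abstract described above would.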

Fuzzy Logic

DAVE: I wonder what is the fastest plane that I could fly on?

HAL: Fly on as a regular passenger, or a custom flight on some special deal, or just what is physically possible?

DAVE: What plane flies faster than anything else, that I personally, a nonpilot, could fly in?

HAL: Of several two-seater supersonic aircraft, the fastest is the SR-71B trainer. It can exceed Mach 3 in normal flight.

Fuzzy logic can be thought of as an “Expanded Or” search, where you want to find as many of your query terms as possible, but not necessarily all of them. Fuzzy logic attempts to perform the rough grouping that a human mind can.
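An illustrative aside, not part of the 1997 text: the “Expanded Or” idea can be sketched as ranking documents by how many query terms they contain, without requiring all of them. The function name and the sample documents in the note below are hypothetical.

```python
def expanded_or(query_terms, documents):
    """Rank documents by the number of distinct query terms they contain.

    documents maps a document id to its text. Documents that match no
    term are left out; ties are broken by document id.
    """
    terms = {term.lower() for term in query_terms}
    results = []
    for doc_id, text in documents.items():
        matched = terms & set(text.lower().split())
        if matched:
            results.append((doc_id, len(matched)))
    return sorted(results, key=lambda result: (-result[1], result[0]))
```

Given the terms fastest, supersonic, trainer and passenger, a document reading "fastest supersonic trainer" scores 3, one reading "supersonic passenger jet" scores 2, and one reading "paper card catalog" is dropped. A strict AND search would have returned nothing at all.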

Intelligent Agents

Intelligent agents, also known as persistent searches, continuously or periodically perform the same queries, which may employ any of the methods discussed previously. For example, the intelligent agent could be constructed by a user to collect specific information from any number of sources. This is particularly relevant in constantly updating pools of new information.”
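As a rough present-day illustration of that last idea (my sketch, not from the book), a persistent search can be modeled as a stored query that remembers what it has already reported. The class name and the simple any-term matching rule are assumptions; scheduling the periodic runs is left out.

```python
class PersistentSearch:
    """A stored query that reports only hits it has not reported before."""

    def __init__(self, terms):
        self.terms = {term.lower() for term in terms}
        self.seen = set()

    def run(self, documents):
        """Check a pool of documents (id -> text) and return the ids of new hits."""
        new_hits = []
        for doc_id, text in documents.items():
            if doc_id in self.seen:
                continue
            if self.terms & set(text.lower().split()):
                self.seen.add(doc_id)
                new_hits.append(doc_id)
        return new_hits
```

Run against a growing pool, the agent returns each matching document once, on the first run after it appears, and nothing when the pool has not changed.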

It looks like Gerard Salton’s efforts 60 years ago to advance Information Retrieval, aka Advanced Search, have evolved into the sci-fi-like functionality of ChatGPT today.

PDF Expert – Master PDF and OCR

Copyright © 2023 Tony McKinley. All rights reserved.

Email: amckinley1@verizon.net