Full-Text Search and Retrieval

FILEFLEX PROGRAMMER MANUAL

CHAPTER 16
Full-Text Search and Retrieval

While FileFlex character fields are fixed in width at the time of database design, and can store only up to 256 characters each, memo fields have no such restrictions. You can comfortably place descriptions, notes, reports, letters, and other long information into FileFlex memo fields.

Easy Full-Text Search

FileFlex provides a simple, yet powerful function that allows you to find text anywhere in a memo field. The DBFindMemo function searches the specified memo field in the current database for the string specified. This is something of a brute-force searching mechanism. FileFlex checks every record, scanning each field of each record for the matching search string:

DBFindMemo("NOTES","status on FileFlex") into dbResult

Building a Complex Indexed Full-Text Search

If your database is very large, you'll find that DBFindMemo will take some time to find your record. But you can combine a number of FileFlex features together to create a rather sophisticated fully-indexed full-text search.

Note: Unlike normal indexes, this mechanism doesn't dynamically update the full-text index unless the index is rebuilt. Therefore it's most appropriate for use with CD-ROM or unchanging information. There are two phases to performing a complex indexed full-text search:

At development time, construct the index itself
At runtime, search the index for your words

Constructing a Full-Text Index

Before you can build your index, you need to think through the indexing architecture. The most important database is the file where your memo field is located. Let's call this the "memo database". When the user enters a search word, we want to move the record pointer in the memo database to the first matching record.

You'll need to create two additional files: a database file (let's describe it as the "word database") containing the words to be searched and an index into the word database file.

Define the word database with two fields: a fixed width field for each word in the memo file, and a fixed width field containing the physical record number of the record in the memo file containing the word.

Alternatively, the you can replace the record number with any other unique key that you can search for, but you'll need an extra index file if you use this technique.

Finally, you'll need a standard index file that uses the fixed word field as the indexed field and the word file as the data file. So, here's what you've got so far:

Memo database: The database containing your memo field. This is where the actual information you're looking for resides;
Word database: The database containing in one field all the words in the memo database memo field, and a field containing a pointer to the appropriate record in the memo database;
Index-on-word file: An index file indexing the word field in the word database (make sure you use the UPPER intrinsic function to ignore case).

Now you have to write a utility routine that follows the following algorithm (remember that this is a routine that you as the developer will run, not one used by the end-user of your software):

For every record in the memo database...read the contents of the memo field into a variable

For every word in the variable...write the word and the current memo database record number to a new record in the word database

Make sure the index file is up-to-date with the data in the word database.

At this point, you've got everything you need to do an incredibly fast, indexed full-text search. Let's look at how it'll run in your application to retrieve a value.

Searching Using a Full-Text Index

You get the word to be searched via some form of user input. You then pass that word to a full-text seek routine that does the following:

Selects the word database (DBSelect)
Selects the index-on-word tag (DBSelectIndex)
Prepares a seek expression, converting to upper case
Does the seek, moving to the first match (DBSeek)
Grabs the pointer to the memo database (DBGetFieldByName)
Selects the memo database (DBSelect)
Moves (DBGo) to the physical record containing the memo field

You can, of course, write the routine to iterate through the list of matching words if you want to build a table of all the records that contain a given word.

Discuss this chapter on the FileFlex Boards.

Casa de Bender