Overview of Voyager Keyword Indexes
- Product: Voyager
- Relevant for Installation Type: Multi-Tenant Direct, Dedicated-Direct, Local, TotalCare
Question
What are the Voyager keyword indexes, and how do they work?
Answer
An index is a pointer to data in a table, much like the index at the back of a book: it directs a query to the exact physical location of the data in an underlying file of the database. This makes searching much faster and more efficient, but only if the index is kept current and correct. If you add a new chapter to a book, the book's index must be updated as well. The same is true of databases.
The Voyager keyword indexes are stored outside of Oracle for better performance. They are located in the /m1/voyager/xxxdb/data directory on the server (where xxxdb is the Voyager database instance).
The /m1/voyager/xxxdb/data/ directory contains two main types of files:
- Turbo Bib Text Files - not technically keyword index files. They hold the text of bib records for fast retrieval (in pairs; there can be multiple pairs):
  - bib_text.#.tdr
  - bib_text.#.tbt
- Keyword Index Files - lists of keywords, either static or dynamic. A keyword search looks at both the static and the dynamic lists:
  - Static - the sorted list of keywords created at each regen; it is indexed (in pairs; there can be multiple pairs):
    - xxxdb.#.bif
    - xxxdb.#.dc
  - Dynamic - the unsorted list of keywords from records added or changed between regens; it is not indexed (one set of three):
    - dynamic.bif
    - dynamic.dc
    - dynamic.que
  - Kill file - the list of words no longer used (one for each pair of static files):
    - xxxdb.#.kil
Flow:
- The static dc (dictionary) file is created at each regen and is indexed.
- The dynamic dc file is what changes between regens and is not indexed. It is "searchable", though. The larger it gets, the less efficient the searching.
- Bulk Import (including GDC) and the Cataloging client write new words to the que file via keysvr. keysvr checks the que periodically and copies new words to the dynamic dc file. New words are not searchable until they appear in the dictionary; once they are copied to the dc file, they are searchable via the keyword index.
- In Bulk Import and GDC, when you opt to run new records through the keyword index, the dynamic file is updated and grows. Eventually it becomes so large that a keyword regen is needed. If you do not run records through the keyword index, the new records are not added to the dynamic file and are not searchable until you run a keyword regen.
- After a keyword regen, the .dc and .kil files are not reset to 0 bytes, because they must still contain internal structures.
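The flow above can be illustrated with a small toy model. This is only a sketch of the concept, not Voyager's actual file format or keysvr logic: the class name `ToyKeywordIndex` and all of its methods are hypothetical, and in-memory Python lists stand in for the static dc, dynamic dc, and que files.

```python
import bisect


class ToyKeywordIndex:
    """Toy model of the static/dynamic/queue flow (hypothetical, in memory only)."""

    def __init__(self, static_words):
        self.static = sorted(set(static_words))  # sorted at "regen"; binary-searchable
        self.dynamic = []                        # unsorted; scanned linearly
        self.queue = []                          # words waiting to be copied

    def enqueue(self, word):
        # Stands in for Bulk Import or the Cat client writing to the que.
        self.queue.append(word)

    def flush_queue(self):
        # Stands in for keysvr's periodic copy from the que to the dynamic dc.
        self.dynamic.extend(self.queue)
        self.queue.clear()

    def search(self, word):
        # Static list: binary search. Dynamic list: linear scan, slower as it grows.
        i = bisect.bisect_left(self.static, word)
        if i < len(self.static) and self.static[i] == word:
            return True
        return word in self.dynamic

    def regen(self):
        # A regen folds the dynamic words into a freshly sorted static list.
        self.static = sorted(set(self.static) | set(self.dynamic))
        self.dynamic = []


index = ToyKeywordIndex(["history", "maps"])
index.enqueue("voyager")
print(index.search("voyager"))  # still only in the queue
index.flush_queue()
print(index.search("voyager"))  # now reachable through the dynamic list
index.regen()
print(index.search("voyager"), len(index.dynamic))
```

The linear scan over the dynamic list is what makes searching less efficient as that list grows, which is why a regen eventually becomes necessary.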
The individual files in /data are:
- bib_text.#.tbt - TurboBibText - Contains the actual bib text
- bib_text.#.tdr - TurboDiRectory - Index to the bib_text.#.tbt file
- *.dc - DiCtionary words - The actual indexed words (static and dynamic files) and pointers to records
- *.bif - BinaryInfoFiles - Info for the matching .dc file. Links to the dictionary and describes its size and structure
- *.kil - KILl file - List of words that no longer exist in the database
- dynamic.que - QUEue file - Temporary list of words to be indexed, awaiting access to the dynamic dictionary file
The keyword index is made up of the static, dynamic and kill files, but only the first two are searched.
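As a rough aid for reading a listing of the data directory, the naming conventions above can be turned into a small classifier. This is a minimal sketch that assumes the `#` in each name is a plain decimal number; the `ROLES` table and the `classify` function are illustrative names, not part of Voyager.

```python
import re

# Patterns follow the file naming described above; "#" is assumed to be numeric.
# Order matters: the fixed "dynamic.*" names are checked before the generic ones.
ROLES = [
    (re.compile(r"^bib_text\.\d+\.tbt$"), "turbo bib text"),
    (re.compile(r"^bib_text\.\d+\.tdr$"), "turbo directory"),
    (re.compile(r"^dynamic\.dc$"), "dynamic dictionary"),
    (re.compile(r"^dynamic\.bif$"), "dynamic info"),
    (re.compile(r"^dynamic\.que$"), "dynamic queue"),
    (re.compile(r"^.+\.\d+\.dc$"), "static dictionary"),
    (re.compile(r"^.+\.\d+\.bif$"), "static info"),
    (re.compile(r"^.+\.\d+\.kil$"), "kill file"),
]


def classify(filename):
    """Return the role of a file in the data directory, or "unknown"."""
    for pattern, role in ROLES:
        if pattern.match(filename):
            return role
    return "unknown"


for name in ["bib_text.1.tbt", "unknowdb.3.dc", "dynamic.que", "unknowdb.2.kil"]:
    print(name, "->", classify(name))
```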
Size limits:
- The dynamic.dc file stops working at 2GB. From Voyager 9 on, no new records are indexed once that limit is reached (before Voyager 9, the consequences were more severe).
- Static files are capped at 2GB. Larger databases create more static files.
- Turbo text files generally are kept below 1.5GB. Larger databases create more turbo files.
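One way to keep an eye on the dynamic.dc limit is to compare its size against 2GB. The sketch below makes several assumptions that do not come from Voyager itself: "2GB" is read as 2 GiB (2 * 1024**3 bytes), the 80% warning threshold is an arbitrary illustrative value, and the function names are hypothetical.

```python
from pathlib import Path

TWO_GB = 2 * 1024**3  # assumption: "2GB" interpreted as 2 GiB


def read_sizes(data_dir):
    """Map each file name in data_dir to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(data_dir).iterdir() if p.is_file()}


def dynamic_dc_usage(sizes):
    """Return the dynamic.dc size as a fraction of the 2GB limit."""
    return sizes.get("dynamic.dc", 0) / TWO_GB


def needs_regen(sizes, threshold=0.8):
    """True when dynamic.dc has reached the (hypothetical) warning threshold."""
    return dynamic_dc_usage(sizes) >= threshold


# Size taken from the dynamic.dc entry in the example listing in this article.
sizes = {"dynamic.dc": 233009152}
print(f"dynamic.dc uses {dynamic_dc_usage(sizes):.1%} of the limit")
print("regen suggested:", needs_regen(sizes))
```

On a server, `read_sizes("/m1/voyager/xxxdb/data")` (with xxxdb replaced by the database instance) would supply the `sizes` mapping instead of the hard-coded value.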
The voyager.ini file on the server:
- The voyager.ini file found in /m1/voyager/xxxdb/ini is a server configuration file.
- It provides configuration settings for the keyword indexes, including the port number and whether multiple keyword index files exist.
- If configured incorrectly, it can cause problems with keyword searches.
Additional Information
Example of keyword index files in data/ for "UNKNOWDB":
-rw-rw-r-- 1 voyager exlibris 23812160 Apr 1 16:15 bib_text.1.tdr
-rw-rw-r-- 1 voyager exlibris 1629102566 Apr 1 16:15 bib_text.1.tbt
-rw-rw-r-- 1 voyager exlibris 1544 Feb 12 23:14 bib_text.2.tdr
-rw-rw-r-- 1 voyager exlibris 0 Feb 12 23:14 bib_text.2.tbt
-rw--w---- 1 voyager exlibris 264 Feb 12 23:15 unknowdb.1.bif
-rw--w---- 1 voyager exlibris 264 Feb 12 23:15 unknowdb.2.bif
-rw------- 1 voyager exlibris 264 Feb 12 23:15 unknowdb.3.bif
-rw------- 1 voyager exlibris 264 Feb 12 23:15 unknowdb.4.bif
-rw--w---- 1 voyager exlibris 1061131984 Feb 12 23:15 unknowdb.1.dc
-rw--w---- 1 voyager exlibris 1037491048 Feb 12 23:16 unknowdb.2.dc
-rw------- 1 voyager exlibris 1024779136 Feb 12 23:16 unknowdb.3.dc
-rw------- 1 voyager exlibris 432824496 Feb 12 23:16 unknowdb.4.dc
-rw------- 1 voyager exlibris 31604 Apr 1 16:15 unknowdb.1.kil
-rw------- 1 voyager exlibris 7520 Apr 1 15:30 unknowdb.2.kil
-rw------- 1 voyager exlibris 8316 Apr 1 15:30 unknowdb.3.kil
-rw------- 1 voyager exlibris 12296 Apr 1 16:14 unknowdb.4.kil
-rw------- 1 voyager exlibris 144 Apr 1 16:15 dynamic.bif
-rw------- 1 voyager exlibris 2 Apr 1 16:15 dynamic.que
-rw--w---- 1 voyager exlibris 233009152 Apr 1 16:15 dynamic.dc
- Article last edited: 08-Jan-2020