How many records can be imported at a time using bulkimport?
- Product: Voyager
- Product Version: All
- Relevant for Installation Type: Multi-Tenant Direct, Dedicated-Direct, Local, Total Care
Question
When importing records via Bulk Import, what is the maximum number of records that should be imported in a single file?
Answer
For optimum importing performance, import 10,000 records or fewer at one time. If your record file contains more than 10,000 records, break it into smaller sets (using the -b and -e parameters) and import them one after the other.
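For example, a wrapper script along the following lines can drive the slices automatically. This is a minimal sketch only: the -b and -e parameters are those described above, but the record total and the bulkimport command line itself (file name, import code, and any other arguments) are placeholders that must be replaced with your site's actual values.

#!/bin/sh
# Minimal sketch: run bulkimport in 10,000-record slices using the
# -b (begin) and -e (end) parameters described above.
# TOTAL and the commented-out command line are placeholders only.

TOTAL=45000          # total number of records in the source file (example value)
BATCH=10000          # recommended maximum records per run

begin=1
while [ "$begin" -le "$TOTAL" ]; do
    end=$(( begin + BATCH - 1 ))
    if [ "$end" -gt "$TOTAL" ]; then
        end=$TOTAL
    fi

    echo "Importing records $begin through $end"
    # Substitute your site's full bulkimport command here; only the
    # -b and -e slice parameters are shown.
    # Pbulkimport ...your usual arguments... -b "$begin" -e "$end"

    begin=$(( end + 1 ))
done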
Sites may elect to import more than 10,000 records at a time, but this is not a recommended best practice. If you do, proceed judiciously: increase the batch size incrementally rather than jumping from 10,000 records to 100,000, and monitor your log files carefully.
If issues occur with larger batch sizes, Support's first troubleshooting step will be to ask the site to return to batches of 10,000 records or fewer and confirm whether the issue still occurs at the recommended size.
- This rule of thumb of limiting imports to groups of 10,000 records or fewer also applies to Global Data Change, which uses Bulk Import.
- If you are using WebAdmin, which also uses Bulk Import, the same general rules apply; however, because you are uploading the file through your browser, keep imports in the range of 1,000-5,000 records.
- Pick and Scan is different: it runs in real time on your desktop, with the changes made on the server. Constant connectivity is required between client and server, but there is no upper limit on the number of records you can process other than locally imposed constraints (for example, disconnects due to timeouts).
Additional Information
Consider opening a Case with Support before embarking on a large import project. We can offer additional advice about Oracle, tablespaces, and other topics that can help ensure your project is a success.
This advice is based on the understanding that, in certain bulk imports (depending on the arguments used in the import rule and duplicate detection profile), bulkimport may leak memory, leading to program failure. If the operation you are performing does not leak memory, or does not leak too much, it will succeed. There is no easy way to monitor whether the program is leaking memory; the symptoms are core dump or possibly timeout errors in the import log file.
One option for some large import projects, such as a reclamation project, is to use the -M flag to run concurrent import sessions. For more information, see: Overview of "Allow Multiple Instances of Bulk Import"
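As an illustration only, two concurrent slices might be launched along these lines. The -M, -b, and -e parameters are those mentioned in this article; the record ranges and the commented-out command lines are placeholders for your site's actual bulkimport invocation.

#!/bin/sh
# Illustration only: launch two bulkimport slices in parallel using the -M flag
# ("Allow Multiple Instances of Bulk Import"), then wait for both to finish.
# The commented-out command lines are placeholders, not the documented syntax.

# Pbulkimport ...your usual arguments... -M -b 1     -e 10000 &
# Pbulkimport ...your usual arguments... -M -b 10001 -e 20000 &

# Wait for both background sessions to complete before starting more
wait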
A suggested course of action is to use the training database for benchmark testing. This should provide a fair estimate of how similar imports will perform in production and what the boundaries are for your particular environment, and give you an opportunity to adjust your import plans before the production environment is impacted.
Always keep an eye on your available disk space and avoid letting your server reach the 85%-full point. The "df -h" command will help you monitor usage.
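A small check along the following lines can flag filesystems approaching that threshold. This is a sketch only; it assumes the standard "df -h" layout with the capacity percentage in the fifth column and the mount point in the sixth, which can vary by operating system.

#!/bin/sh
# Sketch: flag any filesystem at or above the 85% threshold mentioned above.
# Assumes standard "df -h" column layout (capacity in column 5, mount point
# in column 6); adjust the field numbers if your OS formats df differently.

df -h | awk 'NR > 1 {
    use = $5
    gsub(/%/, "", use)
    if (use + 0 >= 85)
        print "WARNING: " $6 " is " use "% full"
}'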
For large bulk import projects, do not run the bibliographic records through keyword indexing during the loads. Instead, run regens later. Authority records are not keyword indexed.
See also:
- Bulk import fails with coredump, 'mgvMemoryAlloc' errors
- Why are bulk imports of MARC records running slowly?
- How do I determine if keyword regen is needed?
- Article last edited: 18-Mar-2021