Splitting a large file of MARC records into smaller files
- Article Type: General
- Product: Aleph
- Product Version: 16.02
Description:
I am working with a set of 136,000 MARC records. When the records are converted to Aleph sequential format, the resulting file is too large to view with vi or other tools.
Is there a way to break these files into smaller pieces?
Resolution:
There are two ways to do this:
(1) Use the UNIX "split" command on the output of file-01 ("bigfile01"):
>> split -l 20000 bigfile01 splitfile
This will generate 7 files of up to 20,000 records each:
splitfileaa  records 00001-20000
splitfileab  records 20001-40000
splitfileac  records 40001-60000
splitfilead  records 60001-80000
splitfileae  records 80001-100000
splitfileaf  records 100001-120000
splitfileag  records 120001-136000
Then run each of these files through p_file_02.
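The split step can be sketched as follows. This uses generated stand-in data in place of real file-01 output, since bigfile01 here holds one record per line; the filenames and the 20,000-line chunk size come from the example above:

```shell
# Simulate bigfile01 with one record per line (stand-in data, not real MARC).
seq 1 136000 | sed 's/^/record /' > bigfile01

# Split into 20,000-line pieces named splitfileaa, splitfileab, ...
split -l 20000 bigfile01 splitfile

# Inspect the result: 7 files, the last holding the 16,000-record remainder.
ls splitfile*
wc -l splitfileaa splitfileag
```

Each resulting splitfile* piece can then be fed to p_file_02 in turn.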
OR
(2) Once a file has been run through p_file_02, each line holds an individual field rather than a whole record, and each line begins with the nine-digit record number.
The grep qualifier "^" anchors the match to the beginning of the line, so matching on the first six digits selects a 1,000-record block.
You can break the output of p_file_02 (bigfile02) into 1,000-record pieces like this:
>> grep ^000000 bigfile02 > file000000 - this will give you records 000000001 - 000000999.
>> grep ^000001 bigfile02 > file000001 - this will give you records 000001000 - 000001999.
>> grep ^000002 bigfile02 > file000002 - this will give you records 000002000 - 000002999.
Etc.
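The grep step above can be automated with a small loop. This is a sketch that assumes nine-digit record numbers at the start of each line, as in the example; the prefixes 000000-000136 cover the 136,000 records, and the sample data here stands in for real p_file_02 output:

```shell
# Stand-in for p_file_02 output: each line starts with a nine-digit record number.
printf '%09d FMT   L BK\n' 1 999 1000 1999 2000 > bigfile02

# One output file per 1,000-record block, keyed on the six-digit prefix.
# "|| true" keeps the loop going when a block has no matching records.
for prefix in $(seq -f '%06g' 0 136); do
    grep "^$prefix" bigfile02 > "file$prefix" || true
done

# file000000 now holds the lines whose record numbers start with 000000.
wc -l file000000
```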
(Keywords: file-01 file_01 file-02 file_02)
- Article last edited: 12/1/2016