Bibliographic Rank Algorithm
Alma evaluates the completeness and richness of MARC 21 bibliographic records based on information such as identifiers, names, subjects, informative LDR and 008 fields, and publication details. This is reflected in the Bibliographic Rank, which is meant to help libraries identify records that may need attention. The new bibliographic rank is displayed in the record view and in the Metadata Editor.
The bibliographic rank ranges from 1 to 120. Generally, records ranked higher than 75 are considered good records.
The bibliographic ranking is generated through an algorithm that is further described below.
General Model
This is a two-level approach:
- Level 1 - Breadth: The focus here is on coverage. Fields are grouped into categories, and when a record has at least one of the fields in a category, it receives a score according to the importance of that category.
- LOW importance gives 1 point
- MEDIUM importance gives 3 points
- HIGH importance gives 7 points
For example, the Subjects category has high importance, so it is assigned a score of 7. The Canceled identifier category is less important, so it has a score of only 1. There are 27 categories. The full list is described below.
- Level 2 - Depth: The second focus is on depth. For example, rather than just checking that there is a 6XX field, attention is paid to how many 6XX fields are included.
Depth is relevant only for some of the categories. When a record has such a category, the fields in the category are counted. The number of fields is the depth score of the category. Each relevant category has a "depth limit" to avoid giving too much weight to having many fields.
The total score is breadth score + depth score.
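The two-level model can be sketched as follows. This is a minimal illustration, not Alma's implementation: the `Category` class and `rank` function are hypothetical names, and the sketch assumes that the depth limit simply caps the field count. The three sample categories take their importance and depth limit from the category list below.

```python
from dataclasses import dataclass
from typing import Optional

# Breadth points per importance level, as stated in the model description.
IMPORTANCE_POINTS = {"LOW": 1, "MEDIUM": 3, "HIGH": 7}


@dataclass(frozen=True)
class Category:
    name: str
    importance: str                    # "LOW", "MEDIUM", or "HIGH"
    depth_limit: Optional[int] = None  # None: category is not relevant for depth


def rank(field_counts: dict, categories: list) -> int:
    """Return breadth score + depth score.

    field_counts maps a category name to the number of qualifying
    fields found in the record for that category.
    """
    breadth = 0
    depth = 0
    for cat in categories:
        count = field_counts.get(cat.name, 0)
        if count == 0:
            continue  # category absent: no breadth and no depth points
        breadth += IMPORTANCE_POINTS[cat.importance]
        if cat.depth_limit is not None:
            # Assumption: the depth limit caps the counted fields.
            depth += min(count, cat.depth_limit)
    return breadth + depth


SAMPLE_CATEGORIES = [
    Category("Subjects", "HIGH", depth_limit=15),
    Category("Names", "HIGH", depth_limit=5),
    Category("Canceled identifier", "LOW"),
]

# 4 subject fields, 8 name fields (capped at 5), 2 canceled identifiers:
# breadth = 7 + 7 + 1 = 15, depth = 4 + 5 = 9
print(rank({"Subjects": 4, "Names": 8, "Canceled identifier": 2}, SAMPLE_CATEGORIES))  # 24
```

The deductions described under Validations and Accuracy are not modeled in this sketch.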
Categories
The following is the full list of categories. For each category, this information is included:
- List of the fields in the category
- Importance
- Whether it is relevant for depth and, if so, its depth limit
# | Category Name | Fields | Importance | Relevant for depth? | Depth limit |
---|---|---|---|---|---|
1 | Canceled identifier | | LOW | No | |
2 | Classification and Call Number | | HIGH | Yes | 3 |
3 | Coded language/place/time | | LOW | Yes | 3 |
4 | Control fields | | MEDIUM | No | |
5 | 008 Common data | One or more of the following, must have a value that is not \| and not #: | HIGH | Yes | 5 |
6 | 008 Books data (Leader/06 = a and Leader/07 = a, c, d, or m) | One or more of the following, must have a value that is not \| and not #: | LOW | No | |
7 | 008 Computer files data (Leader/06 = m) | One or more of the following, must have a value that is not \| and not #: | LOW | No | |
8 | 008 Music data (Leader/06 = c, d, i, or j) | One or more of the following, must have a value that is not \| and not #: | MEDIUM | Yes | 5 |
9 | 008 Visual Materials data (Leader/06 = g, k, o, or r) | One or more of the following, must have a value that is not \| and not #: | MEDIUM | Yes | 5 |
10 | 008 Maps data (Leader/06 = e or f) | One or more of the following, must have a value that is not \| and not #: | MEDIUM | Yes | 5 |
11 | 008 Continuing Resources (Leader/06 = a and Leader/07 = b, i, or s) | One or more of the following, must have a value that is not \| and not #: | MEDIUM | No | |
12 | Edition | 250 - Edition Statement | HIGH | No | |
13 | Identifier | | HIGH | Yes | 10 |
14 | Leader | | HIGH | No | |
15 | Names | | HIGH | Yes | 5 |
16 | Note | | LOW | No | |
17 | Bibliography | | LOW | No | |
18 | Subjects | One or more of the following, must have a second indicator that is 0/1/2/3/5/6/7. | HIGH | Yes | 15 |
19 | Other Physical information | | MEDIUM | Yes | 3 |
20 | Physical description | 300 - Physical Description | MEDIUM | Yes | 5 |
21 | Publication details | 260 - Publication, Distribution, etc. (Imprint) | HIGH | No | |
22 | Related items | One or more of the following. Must include $a or $t: | LOW | No | |
23 | Series | One or more of the following. Must include $a: 780 - Preceding Entry 785 - Succeeding Entry | MEDIUM | Yes | 3 |
24 | Summary | 520 - Summary, etc. | MEDIUM | No | |
25 | Table of contents | 505 - Formatted Contents Note | MEDIUM | No | |
26 | Title | 245 with a minimum of either $a or $k | HIGH | No | |
27 | Uniform title | 130 - Main Entry - Uniform Title | LOW | No | |
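Several categories only count a field when it meets a condition: 008 elements must hold a value other than | or #, subject fields need a second indicator of 0/1/2/3/5/6/7, and the title needs $a or $k. The following sketch shows one way such conditions could be expressed. All function names are hypothetical, and it assumes that # denotes a blank, that a multi-character 008 element counts as uncoded only when every character is a fill character or blank, and that the caller supplies the 008 element value, because the position lists are not reproduced in the table above.

```python
# Second indicator values that qualify a subject field, per the Subjects row.
SUBJECT_SECOND_INDICATORS = set("0123567")


def is_coded(value: str) -> bool:
    """True if an 008 element carries real data.

    Assumption: '|' is the fill character and a blank may appear either
    as a space or as the '#' placeholder used in the table.
    """
    return any(ch not in "| #" for ch in value)


def count_qualifying_subjects(second_indicators: list) -> int:
    """Count subject fields whose second indicator is one of 0/1/2/3/5/6/7."""
    return sum(1 for ind in second_indicators if ind in SUBJECT_SECOND_INDICATORS)


def title_qualifies(subfield_codes: list) -> bool:
    """True if a 245 field contains at least $a or $k."""
    return "a" in subfield_codes or "k" in subfield_codes


print(is_coded("eng"), is_coded("|||"), is_coded("   "))   # True False False
print(count_qualifying_subjects(["0", "4", "7", " "]))     # 2
print(title_qualifies(["k", "f"]), title_qualifies(["c"])) # True False
```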
Validations
In addition to the breadth and depth scores, Alma runs some validations to check the basics of the MARC 21 format. The following validations are invoked:
- Mandatory fields exist (LDR and 245)
- Control fields have legitimate data
- Indicators have legitimate data
- Only fields that are repeatable appear multiple times
- Only subfields that are repeatable appear multiple times
- All subfields are valid according to the MARC standard
If there is an issue, the total score is reduced by 1 point.
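As a rough illustration, two of the validations listed above (mandatory fields and field repeatability) could look like the sketch below. The function names are hypothetical, `NON_REPEATABLE` is only a small illustrative subset of non-repeatable MARC 21 tags, and the sketch reads the deduction as a single point regardless of how many issues are found, which is one possible interpretation of the rule.

```python
from collections import Counter

MANDATORY = {"LDR", "245"}               # mandatory fields named above
NON_REPEATABLE = {"LDR", "001", "245"}   # illustrative subset, not the full list


def structural_issues(tags: list) -> list:
    """Return descriptions of missing mandatory or wrongly repeated fields."""
    counts = Counter(tags)
    issues = [f"missing {tag}" for tag in sorted(MANDATORY) if counts[tag] == 0]
    issues += [f"repeated {tag}" for tag in sorted(NON_REPEATABLE) if counts[tag] > 1]
    return issues


def apply_validation_penalty(score: int, issues: list) -> int:
    """Assumption: any number of issues costs exactly 1 point in total."""
    return score - 1 if issues else score


tags = ["LDR", "001", "245", "245", "650"]
issues = structural_issues(tags)
print(issues)                                # ['repeated 245']
print(apply_validation_penalty(80, issues))  # 79
```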
Accuracy
In addition to the above validation, there is a check to make sure the data is accurate:
- ISBN check digit
- ISSN check digit
- "Other Standard Number" check digit
- Form of material in 006 field (position 0) matches the material type in the leader (LDR)
If there is an issue, the total score is reduced by 1 point.
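The ISBN and ISSN check-digit tests follow the standard published algorithms, sketched below. The function names are hypothetical and this is not Alma's code. The "Other Standard Number" check is omitted because the scheme depends on the number type, which this page does not specify.

```python
ASCII_DIGITS = set("0123456789")


def _value(ch: str) -> int:
    """Numeric value of a check character; 'X' stands for 10."""
    return 10 if ch == "X" else int(ch)


def _mod11_ok(s: str, length: int) -> bool:
    """Shared ISBN-10 / ISSN test: weights length..1, sum divisible by 11."""
    if len(s) != length or not set(s[:-1]) <= ASCII_DIGITS:
        return False
    if s[-1] not in ASCII_DIGITS and s[-1] != "X":
        return False
    total = sum(w * _value(ch) for w, ch in zip(range(length, 0, -1), s))
    return total % 11 == 0


def isbn10_ok(isbn: str) -> bool:
    return _mod11_ok(isbn.replace("-", "").upper(), 10)


def issn_ok(issn: str) -> bool:
    return _mod11_ok(issn.replace("-", "").upper(), 8)


def isbn13_ok(isbn: str) -> bool:
    """ISBN-13 test: alternating weights 1 and 3, sum divisible by 10."""
    s = isbn.replace("-", "")
    if len(s) != 13 or not set(s) <= ASCII_DIGITS:
        return False
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(s))
    return total % 10 == 0


print(isbn10_ok("0-306-40615-2"))      # True
print(isbn13_ok("978-0-306-40615-7"))  # True
print(issn_ok("0378-5955"))            # True
print(isbn10_ok("0-306-40615-3"))      # False
```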