Publishing Information background
Just like for Primo Central Index (PCI), there is an institutional holdings file available for the Central Discovery Index (CDI).
Ex Libris has not yet documented this in detail. Some of the information here was provided through SalesForce case requests, some in meetings with Ex Libris staff, and some was gleaned from local testing (which is ongoing).
This CKC is written to share this information, fill this documentation gap, and support the international community.
Note: As with PCI, CDI Publishing Information is also available directly in Alma for 51* (electronic title) records. However, Ex Libris has advised that 61* (electronic collection) information, ie records for full text assigned at the collection level and for searchable collections, is NOT available in Alma, and that providing it would be an enhancement request. This file is therefore vital as the only confirmation of correct publishing.
Known issue during CDI Enablement: If you cannot see the Publishing to Central Discovery Publishing Profile in your Publishing Information in Alma for 51*, go to Resources > Publishing Profile, Edit the Profile and Save (also advised in SalesForce).
CDI Institutional Holdings File
The holdings file structure for CDI is more complex than for PCI.
Alma now also publishes search activations, as well as full text activations for database packages where the subscription is at the package level.
BE AWARE! Unlike PCI, CDI does NOT match by title only. So, if a record does not have an identifier, it will NEVER be marked as 'Available online' if it is in the holding.xml file with a title alone.
- This is why Ex Libris recommends settings such as 'We subscribe to only some titles' of No (aka selective false) to "improve matching" for those collections which are known not to have identifiers: titles with selective false in the holding.xml file are assigned 'Available online' with no further checks. The same will apply for the new setting introduced in August 2020, 'Add CDI-only Full Text activation'
- HOWEVER, if these collections are Link Resolver collections, this may result in an availability assignment of 'Available online' together with a View It of 'No full text available', because the Link Resolver also requires those identifiers in order to match correctly
- In sum, this recommendation accepts the occasional Link Resolver failure, and you should decide whether that is acceptable to you before using it for Link Resolver collections
- The danger of false positives in filtered search is heightened significantly if you have enabled the setting introduced in March 2020 to match only when there are standard identifiers, and not on title only
- This is the 'Find GetIt services based on standard identifiers only' parameter (Configuration Menu > Fulfillment > Discovery Interface Display Logic > Other Settings)
- This adds <key id="skip_bib_match_for_getit">true</key> to your OpenURL call
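The matching behaviour described above can be sketched in a few lines of Python. This is only an illustrative model built from the observations in this article, not Ex Libris's actual logic: the `HoldingEntry` class and both functions are hypothetical names, the CDI record is assumed to be already associated with the entry's collection, and the Link Resolver check is reduced to "shares at least one identifier" even though the real resolver also parses titles and other data.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class HoldingEntry:
    """Hypothetical, simplified view of one holding.xml entry."""
    title: str
    identifiers: Set[str] = field(default_factory=set)  # ISSNs / ISBNs
    selective: Optional[bool] = None  # None: entry has no collection section


def marked_available(entry: HoldingEntry, cdi_identifiers: Set[str]) -> bool:
    """Availability flag for a CDI record already tied to this entry's collection."""
    if entry.selective is False:
        # selective=false: assigned 'Available online' with no further checks
        return True
    # Otherwise an identifier must match; a title alone never matches
    return bool(entry.identifiers & cdi_identifiers)


def link_resolver_matches(entry: HoldingEntry, cdi_identifiers: Set[str]) -> bool:
    """Simplification: the Link Resolver still needs a shared identifier."""
    return bool(entry.identifiers & cdi_identifiers)


# A title-only entry in a selective=false Link Resolver collection:
entry = HoldingEntry(title="Picture Show", selective=False)
print(marked_available(entry, set()))       # True: shown as 'Available online'
print(link_resolver_matches(entry, set()))  # False: View It has no full text
```

The last two lines reproduce the false positive: the record is flagged as available, yet the Link Resolver has nothing to match on.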
The URL for the CDI holdings file is
Also adjust your base URL as needed for your environment, as ap01 is for APAC
The CDI holdings file is a tar.gz archive which contains three files
- File 1
- An expanded holdings file - holding.xml
- This information is available via this file, as well as within Alma via the Publishing Information, where the user has the full Repository Administrator, Catalog Administrator, or General System Administrator roles (not read only)
- NB: If you also think it is ridiculous to restrict this non-editable information, rather than also showing it to Electronic Inventory Operators to help with troubleshooting, vote here: Improve visibility of Publishing Information for electronic holdings
- This file is the equivalent of the current holdings file for Primo Central, but is expanded to include additional fields where the collection is shared ie Community Zone (CZ) with a DBID
- This file contains the electronic holdings of an institution for Aggregator package and Selective package Electronic Collection Types, where the collections contain full text active portfolios
- The outcome of an electronic holding being in this file is to trigger every CDI record associated with the identified resource to be marked as 'Available online', dependent on additional factors such as coverage ranges and embargoes
- Each entry contains either just an item section, for those without DBIDs, or both an item section and a collection section, for those with DBIDs
- Each entry starts with </item>
- Item section
- item type="electronic"
- may be multiple, on separate lines, including separate lines for 245a and then 245a and b
- includes at least 245a,b and 246 (maybe more - needs more testing)
- isbn / issn
- may be multiple, on separate lines
- includes both raw and normalized forms, even if not visually present in the bibliographic record in Alma
- includes both 10 and 13 forms, even if not visually present in the bibliographic record in Alma
- includes at least 020a / 022a / 776x (maybe more; this needs more testing, as some records are seen to publish 020z and 776z and others not. Perhaps a coding issue in only sending those with no indicators?)
- for example BOOK
- not for IZ records, only CZ, and is found in the bibliographic record in Alma in the 001
- starts with 99
- individual to the site
- starts with 99 and ends with your 4 digit institution code
- Collection section
- This section comes after the item section when the record is from a CZ DBID collection, starts with <collection> and ends with </collection>
- starts with 53
- may be multiple, on same line separated by commas
- true or false
- Not sure about this one, as it appears to come from the portfolio level, even though it may be inherited from the service, ie entries are false for the portfolio in the file, but true in Alma as inherited from the service
- true or false
- this is determined by the setting in the Electronic Collection CDI Tab of 'We subscribe to only some titles in this collection'
- 'We subscribe to only some titles in this collection' NO = the file will contain <selective>false</selective>
- 'We subscribe to only some titles in this collection' YES = the file will contain <selective>true</selective>
- Practical example: Brepols Journals (611000000000001395), a Link Resolver collection, with 'CDI Search Activation Status' set to Active and 'We subscribe to only some titles in this collection' set to Yes. This collection has two DBIDs: AUPNH,RVB. There are 62 portfolios available in this CZ collection, but only 15 are active for full text in our IZ. In the searchable_dbids file, we find one of the DBIDs (RVB), as expected for a collection active for search in CDI. In the db_ids file, we find neither DBID, as expected for a collection which is not a Database. Searching the holdings file by the Public Name, Brepols Journals, gives 15 hits for these 15 titles, as expected given that the electronic holdings file contains only our full text coverage at the title level. The selective field is set to true, as expected per the CDI tab setting. Checking in Primo by individual ISSN, we find Article records in filtered search for the 15 ISSNs which are active for full text, marked as 'Available online' and correctly matched by the Link Resolver to provide service links. For example, 1378-2274 has coverage in the holdings file of 1997 to 2011: CDI records in Primo within this range are 'Available online', and CDI records later than 2011 are 'No full-text'. In contrast, if we take from the CZ an ISSN for one of the 47 other titles which we don't have in our IZ (2507-0371) and search for it in Primo, the CDI records appear only in expanded search as 'No full-text', with no match by CTO Link Resolver
- Practical example: British Periodicals Collection IV (613790000000000673), a Link Resolver collection, with 'CDI Search Activation Status' set to Active and 'We subscribe to only some titles in this collection' set to No. This collection has one DBID: AUIGY. There are 10 portfolios available in this CZ collection, using completely global coverage as provided by Ex Libris, and all 10 are active for full text in our IZ (so this is not quite an ideal scenario, but we'll go with it). In the searchable_dbids file, we find the DBID, as expected for a collection active for search in CDI. In the db_ids file, we do not find the DBID, as expected for a collection which is not a Database. We search the holdings file by AUIGY, as this collection doesn't have identifiers (the whole point, per Ex Libris guidance, of using this 'No' setting for matching Alma holdings to CDI records), and find 10 hits for these 10 titles, as expected given that the electronic holdings file contains our full text coverage at the title level. The selective field is set to false, as expected per the CDI tab setting. Checking in Primo by individual title is THE BIT WHICH DOESN'T MAKE SENSE. For example, searching for "Picture Show; London" and limiting to a portion of our three coverage ranges (1921 to 1939), we find Article records in filtered search which are marked as 'Available online', but opening the records shows a View It of 'No full text available'. This makes complete sense given that this is a Link Resolver collection, which still needs to match on identifiers, titles, and various other parsed data in the CDI record, which are often not present. It does not make sense given that Ex Libris recommends this setting for collections without identifiers: the setting will 'work' to assign the availability status without relying on identifiers, but it will NOT work at the most critical point of actually delivering a valid service link to users in Primo at the OpenURL call stage. Examples: cdi_proquest_miscellaneous_1880321797, and for another title, Top Spot, cdi_proquest_miscellaneous_1879612879
- (needs more testing, as the double negatives are confusing, and some entries don't match between Alma and the holdings file (defect?). A fix in June 2020 addressed an issue wherein changes to 'We subscribe to only some titles' in the CDI tab were not being taken into account when publishing, which caused portfolios which were not active to be marked in CDI as available. But what happens if portfolio settings and CDI settings contradict each other?)
- File 2
- A new Database file - db_ids.xml
- This information is ONLY available via this file, and is not available in Alma, even for those with Administrator roles
- This file is the equivalent to the concept of activating Link in Record collections in the PCI Interface for Database type
- This contains DBIDs for Electronic Collection Type of Database, where the collection is active for full text, and where it is also NOT set to 'Do not show as Full Text in CDI even if active in Alma'
- Note: For CDI purposes, a Database type electronic collection counts as full text active when it has an unsuppressed bibliographic record as its Additional Descriptive Record and has an Electronic Collection URL, OR when the setting introduced in August 2020, 'Add CDI-only Full Text activation', is used
- This file will also include the DBIDs of collections which have the special PCI to CDI transition-only setting of 'Active for full text in CDI only', until this is removed. This special status was created for Link in Record collections which were added to Alma by Ex Libris during CDI Enablement, for collections previously found only in the PCI Interface, in order to maintain the same full text active discovery environment during the transition from PCI to CDI.
- The outcome of a DBID being in this file is to trigger every CDI record associated with this DBID to be marked as 'Available online' in your Primo environment
- File 3
- A new Searchable file - searchable_dbids.xml
- This information is ONLY available via this file, and is not available in Alma, even for those with Administrator roles
- This file is the equivalent to the concept of activating collections in the PCI Interface, so that the records are discoverable, but has NO impact on availability assignment
- This contains the DBIDs for all collections, regardless of Electronic Collection Type, which are active for search in CDI
- The outcome of a DBID being in this file is to trigger every CDI record associated with this DBID to be discoverable in your Primo environment
- Note: Unless the title or DBID is also indicated in either the holding.xml or db_ids.xml file respectively as full text active, the records will be only in expanded search as 'No full-text'
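How the three files combine can be summarised as a small decision function. This is a minimal sketch of the outcomes described above, not the CDI implementation: `discovery_status` and its parameters are hypothetical names, coverage ranges and embargoes are ignored, and treating a collection with no DBID in searchable_dbids.xml as "not discoverable" is an assumption inferred from the description of File 3.

```python
def discovery_status(dbids, searchable_dbids, full_text_dbids, title_in_holdings):
    """Classify how CDI records for a collection would appear in Primo.

    dbids             -- DBIDs of the collection the CDI record belongs to
    searchable_dbids  -- DBIDs listed in searchable_dbids.xml
    full_text_dbids   -- DBIDs listed in db_ids.xml
    title_in_holdings -- True if the title is matched in holding.xml
    """
    dbids = set(dbids)
    if not dbids & set(searchable_dbids):
        # Assumption: nothing in the searchable file means no discovery at all
        return "not discoverable"
    if dbids & set(full_text_dbids) or title_in_holdings:
        return "Available online"
    return "No full-text (expanded search only)"


# Values taken from the Brepols Journals example: RVB is searchable,
# neither DBID is in db_ids.xml, and only some titles are in holding.xml.
brepols = ["AUPNH", "RVB"]
print(discovery_status(brepols, {"RVB"}, set(), title_in_holdings=True))
print(discovery_status(brepols, {"RVB"}, set(), title_in_holdings=False))
```

The first call corresponds to one of the 15 titles active for full text, and the second to one of the 47 titles which are not in the IZ.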
Opening the file/s
This can be very difficult, given the massive size of File 1
- Add your URL to a document as a link, or open it as a webpage, and then click it to generate the download file
- Use 7-zip to Open archive
- The archive initially shows a single holdings tar file; click on this file to reveal the three xml files described above
- The two smaller files can be opened easily in Notepad, but the larger file needs Notepad++ (even then, you might need to be patient when searching!)
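As an alternative to 7-zip and Notepad++, the archive can be inspected with a short Python script using only the standard library. This is a sketch under assumptions: the archive has already been downloaded to a local path of your choosing, the member file names end as described above, and the XML is UTF-8 encoded. The function names are illustrative. The search is a plain text count, much like a Notepad++ search, so it makes no assumptions about the XML structure.

```python
import tarfile


def list_members(archive_path):
    """Return the names of the regular files inside the tar.gz archive."""
    with tarfile.open(archive_path, mode="r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def count_hits(archive_path, needle, member_suffix="holding.xml"):
    """Count occurrences of needle in one member, reading it line by line."""
    needle_bytes = needle.encode("utf-8")  # assumes the XML is UTF-8 encoded
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(member_suffix):
                stream = tar.extractfile(member)
                # Iterating the stream avoids loading the whole file at once
                return sum(line.count(needle_bytes) for line in stream)
    raise FileNotFoundError(f"no member ending in {member_suffix!r}")
```

For example, `count_hits(path, "Brepols Journals")` counts matches in holding.xml, and `count_hits(path, "RVB", member_suffix="searchable_dbids.xml")` checks whether a DBID is present in the searchable file.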
Currency of the file
The file is updated daily after the running of the scheduled automated job: "Publishing electronic records to Central Discovery Index", which for AP01 is 3am AEST.
This starts the clock ticking for the associated changes to appear in CDI for Primo, which should take a maximum of 48 hours (currently 72+ hours, with a goal of reducing this to 48 hours by the end of 2020). The file therefore reflects Alma activations, not the associated Primo records.
The same should apply (as advised in SalesForce) when manually running the job "Publish electronic records to CDI", but this job currently fails when run, and is completely undocumented. Update: SalesForce has now also advised that this job was made visible in error, so we cannot run it and must wait for the scheduled job.
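The timing above is simple arithmetic, sketched here in Python for anyone wanting to estimate when an Alma change should be visible in Primo. The function names and the sample date are hypothetical. The 3am run time is the AP01 schedule, AEST is modelled as a fixed UTC+10 offset, and the 48 hour default is the stated target (pass 72 for the current observed delay).

```python
from datetime import datetime, timedelta, timezone

AEST = timezone(timedelta(hours=10), "AEST")  # fixed UTC+10 offset


def next_job_run(changed_at, job_hour=3):
    """First scheduled run strictly after the change, at job_hour local time."""
    run = changed_at.replace(hour=job_hour, minute=0, second=0, microsecond=0)
    if run <= changed_at:
        run += timedelta(days=1)
    return run


def visible_in_primo_by(changed_at, max_delay_hours=48):
    """Latest expected time for the change to show in CDI records in Primo."""
    return next_job_run(changed_at) + timedelta(hours=max_delay_hours)


change = datetime(2020, 9, 1, 14, 30, tzinfo=AEST)  # illustrative change time
print(next_job_run(change))         # 2020-09-02 03:00:00+10:00
print(visible_in_primo_by(change))  # 2020-09-04 03:00:00+10:00
```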
Feedback and suggestions
- Stacey van Groll
- Discovery and Access Coordinator
- University of Queensland