Why is there a discrepancy between the publisher's metadata and the metadata captured by Esploro automatically?

Product: Esploro

Question

Answer

The Central Discovery Index (CDI) is the data source for Smart Harvesting and the auto-population processes. CDI holds billions of records and ingests more every day from over 30,000 sources (publishers, aggregators, and repositories of various kinds). CDI is inclusive and harvests records from all subject domains.

CDI harvests records from its sources on an ongoing basis. Harvest frequency varies by source, from daily to monthly; some of the key sources are harvested daily. Keep in mind that CDI is indexed twice a week, so even when a source is harvested daily, its records are added to the index only after the next semiweekly indexing run.
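The delay between harvesting and indexing can be illustrated with a small sketch. The indexing weekdays used here (Monday and Thursday) are purely an assumption for illustration, as is the rule that a record harvested on an indexing day waits for the following run; the actual CDI schedule is not stated in this article, and `next_index_run` is a hypothetical helper, not an Esploro or CDI function.

```python
from datetime import date, timedelta

# ASSUMPTION: indexing runs on Monday (0) and Thursday (3).
# The real CDI indexing days are not documented in this article.
ASSUMED_INDEX_WEEKDAYS = frozenset({0, 3})


def next_index_run(harvested_on: date,
                   index_weekdays=ASSUMED_INDEX_WEEKDAYS) -> date:
    """Return the first assumed indexing date strictly after the harvest date."""
    if not index_weekdays:
        raise ValueError("at least one indexing weekday is required")
    day = harvested_on + timedelta(days=1)
    # Walk forward one day at a time until an indexing weekday is reached.
    while day.weekday() not in index_weekdays:
        day += timedelta(days=1)
    return day


if __name__ == "__main__":
    harvested = date(2020, 6, 23)  # a Tuesday
    visible = next_index_run(harvested)
    print(f"Harvested {harvested}, searchable from about {visible}")
```

Under these assumptions, a record harvested from a daily source can still wait several days before it is available for Smart Harvesting.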

Esploro uses CDI data from all sources we can legally use. CDI often receives multiple versions of the same record from different sources and creates a single logical record that merges elements from those versions. As a result, the logical record can differ from the metadata on the publisher's own site.
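To see why a merged record can differ from any single source, consider the following minimal sketch. It is not CDI's actual merge logic, which is not described here; the field names, the "first non-empty value wins" rule, and the `merge_versions` helper are all assumptions made for illustration.

```python
def merge_versions(versions,
                   fields=("title", "authors", "affiliations", "journal")):
    """Build one logical record from several versions of the same record.

    ASSUMPTION: versions are ordered by preference, and for each field the
    first non-empty value wins. The real CDI rules may differ.
    """
    merged = {}
    for field in fields:
        for version in versions:
            value = version.get(field)
            if value:  # skip missing, empty-string, and empty-list values
                merged[field] = value
                break
    # Record which sources contributed a version, for traceability.
    merged["sources"] = [v["source"] for v in versions if "source" in v]
    return merged


if __name__ == "__main__":
    versions = [
        {"source": "publisher", "title": "A study",
         "authors": ["Smith, J."], "affiliations": []},
        {"source": "aggregator", "title": "A Study",
         "authors": [], "affiliations": ["Example University"]},
    ]
    print(merge_versions(versions))
```

In this example the publisher version has no affiliations, so the logical record takes them from the aggregator version. The merged record therefore matches neither source exactly.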

With so many sources, the quality of the records in CDI varies; even records from a single source can vary. Most of the records in CDI have high-quality metadata, but some are missing data or contain errors. This can happen even with records from very trustworthy sources, including publishers.

Examples of known metadata issues:

Duplicate records occur occasionally
Author affiliations are missing or inconsistent
Author names are inverted
Corporate authors are incorrect

The Esploro team is working on improving the results where possible.

Article last edited: 22-Jun-2020