Esploro Asset Matching Rules
This page describes the asset matching rules used by Esploro. The matching algorithm is invoked when an asset is added to the Esploro repository via Smart Harvesting, Smart Expansion, or manual deposit.
Rules
Keys |
---|
DOI + Brief Title |
PMID + Brief Title |
ISBN + Brief Title |
DOI + 1st Author Last Name |
PMID + 1st Author Last Name |
DOI + “Is Part of” Start Page |
PMID + “Is Part of” Start Page |
“Is Part of” ISSN + “Is Part of” Volume + “Is Part of” Issue + “Is Part of” Start Page |
“Is Part of” ISBN + Brief Title + “Is Part of” Volume |
“Is Part of” Title + Brief Title + “Is Part of” Volume + “Is Part of” Issue + “Is Part of” Start Page |
“Is Part of” Title + Brief Title + “Is Part of” Start Page + Year |
“Is Part of” Title + Brief Title + Year + 1st Author Last Name + “Is Part of” Start Page |
“Is Part of” Title + Title + Year + 1st Author Last Name |
Title + Year + 1st Author Last Name + Asset Category |
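
As an illustration only, each rule above can be read as a composite key that exists when every element it names is present. The Python sketch below is not Esploro's implementation: the dictionary field names, the subset of rules shown, and the assumption that two assets match when they share at least one key are all hypothetical. Multi-valued elements such as ISBN produce one key per occurrence.

```python
from itertools import product

# Hypothetical subset of the rules table; each rule is a tuple of element names.
RULES = [
    ("doi", "brief_title"),
    ("pmid", "brief_title"),
    ("isbn", "brief_title"),
    ("doi", "first_author_last_name"),
    ("title", "year", "first_author_last_name", "asset_category"),
]

def build_keys(asset: dict) -> set:
    """Return one composite key per rule (and per occurrence) that is fully populated."""
    keys = set()
    for rule in RULES:
        options = []
        for name in rule:
            value = asset.get(name)
            # Treat single values and lists alike so repeated ISBNs/ISSNs work.
            values = value if isinstance(value, list) else [value]
            options.append([v for v in values if v])
        if all(options):  # skip the rule if any element has no value
            keys.update((rule, combo) for combo in product(*options))
    return keys

def is_match(a: dict, b: dict) -> bool:
    """Assumed semantics: assets match if they share at least one composite key."""
    return bool(build_keys(a) & build_keys(b))
```

The inputs are expected to be already-normalized element values; an asset with two ISBNs and a brief title yields two ISBN + Brief Title keys, either of which can match another asset.
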
Data Elements
Data Element | Field in Schema | Normalizations | Notes |
---|---|---|---|
DOI | identifier.doi | None | |
PMID | identifier.pmid | None | |
Title | title + subtitle | | |
Brief Title | Take the first 25 characters of the normalized Title | | |
“Is Part of” Title | relationship /relationtype=ispartof /relationtitle | Same as Title. | |
ISBN | identifier.isbn + identifier.eisbn | | Take all occurrences. Each is the basis for a separate rule. |
ISSN | identifier.issn + identifier.eissn | Delete any text that may be in the field. Delete hyphens and remove spaces. | Take all occurrences. Each is the basis for a separate rule. |
“Is Part of” ISSN | relationship /relationtype=ispartof /identifier.issn or .eissn | Same as ISSN. | |
“Is Part of” ISBN | relationship /relationtype=ispartof /identifier.isbn or .eisbn | Same as ISBN. | |
“Is Part of” Volume | relationship /relationtype=ispartof /volume | Delete all punctuation and non-digit characters. Take only the first remaining number. | |
“Is Part of” Issue | relationship /relationtype=ispartof /issue | Delete all punctuation and non-digit characters. Take only the first remaining number, for example: Iss. 1 -> 1; 2-3 -> 2 | |
“Is Part of” Start Page | relationship /relationtype=ispartof /spage | Delete all punctuation and non-digit characters. Take only the first remaining number, for example: (23-29) -> 23 | |
Year | Take the most recent year. | Take only the year (YYYY) from the date. | |
Asset Category | Category from the RESEARCH_ASSET record. | | |
1st Author Last Name | Take the first author’s last name. | | |
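
The normalizations in the table can be sketched in Python as follows. This is an illustrative reading, not Esploro's code: the function names are hypothetical, the ISSN pattern assumes the standard eight-character form (seven digits plus a digit or X), `brief_title` expects a title that has already been normalized (the Title normalization itself is not specified above), and `latest_year` assumes each date contains a four-digit year.

```python
import re
from typing import List, Optional

def normalize_issn(raw: str) -> Optional[str]:
    """Drop surrounding text, the hyphen and spaces; keep the eight ISSN characters."""
    m = re.search(r"(\d{4})\s*-?\s*(\d{3}[\dXx])", raw)
    return (m.group(1) + m.group(2)).upper() if m else None

def first_number(raw: str) -> Optional[str]:
    """Volume, issue and start page: keep only the first run of digits."""
    m = re.search(r"\d+", raw)
    return m.group(0) if m else None

def brief_title(normalized_title: str) -> str:
    """First 25 characters of an already-normalized title."""
    return normalized_title[:25]

def latest_year(dates: List[str]) -> Optional[str]:
    """Extract a four-digit year from each date and return the most recent one."""
    years = []
    for d in dates:
        m = re.search(r"\d{4}", d)
        if m:
            years.append(m.group(0))
    return max(years) if years else None
```

For instance, `first_number("(23-29)")` returns `"23"` and `first_number("Iss. 1")` returns `"1"`, in line with the examples in the table.
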