Additional XML tags in Dublin Core (DC) fields
- Product: Rosetta
- Product Version: 6.0+
Symptoms
Searching the permanent repository returns no results for IEs whose DC fields contain additional XML tags (when the field is included in the search columns).
Example - SIP METS and resulting IE METS XML (view on the server) contain a DC field like:
<dc:title>
this is title
<span>of a book</span>
</dc:title>
A search that should return IEs with such DC field values fails if the title is among the selected search columns. These IEs are not included in the search results, and the server log shows:
2019-08-26 10:25:38,956 ERROR [com.exlibris.core.repository.dao.impl.HDeControlDaoImpl] (https-jsse-nio-8443-exec-4) [] | RPS-il-dtldev08c.corp.exlibrisgroup.com | javax.persistence.QueryTimeoutException: could not execute query
at org.hibernate.ejb.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1351)
....
Caused by: java.sql.SQLException: ORA-19025: EXTRACTVALUE returns value of only one node
...
Workaround
Ingesting SIPs whose DC records contain unescaped XML reserved characters is not supported. Rosetta does not validate DC fields as part of general METS validation.
1) Escape XML reserved characters before ingest.
Example - SIP METS and resulting IE METS XML (view on the server) contain an escaped DC record like:
<dc:title>
this is title
&lt;span&gt;of a book&lt;/span&gt;
</dc:title>
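The escaping step can be done with any XML-aware tool before the SIP is built. The following is a minimal Python sketch, assuming the raw field value is available as a string; the helper name `escape_dc_value` is hypothetical and not part of Rosetta.

```python
from xml.sax.saxutils import escape


def escape_dc_value(raw: str) -> str:
    """Escape the XML reserved characters &, < and > in a DC field value.

    Hypothetical helper for pre-ingest processing; not a Rosetta API.
    """
    return escape(raw)


title = "this is title <span>of a book</span>"
escaped = escape_dc_value(title)

# The escaped value can be placed in the DC element as plain text.
print(f"<dc:title>{escaped}</dc:title>")
```

After escaping, the element contains a single text node, so the markup is stored as literal characters rather than as a nested XML element.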
2) Wrap the problematic field values in CDATA.
<dc:description>
<![CDATA[
<div>Lorem ipsum dolor sit amet</div><div><br></div><div> In convallis<i> Curabitur sagittis hendrerit ante. </i>Curabitur sagittis <b>43</b>(9) 2142-2145 (2018)<br></div><div><br></div><div> <a href="https://doi.org/XXXXX">https://doi.org/XXXX</a> (also https://arxiv.org/abs/XXXX)</div>
]]>
</dc:description>
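A minimal Python sketch of the CDATA wrapping, assuming the raw HTML fragment is available as a string. The helper name `wrap_in_cdata` is hypothetical. Because a CDATA section cannot itself contain the sequence `]]>`, the sketch splits any occurrence across two adjacent sections.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(raw: str) -> str:
    """Wrap a value in a CDATA section (hypothetical helper, not a Rosetta API).

    "]]>" would end the section early, so it is split across two sections.
    """
    safe = raw.replace("]]>", "]]]]><![CDATA[>")
    return f"<![CDATA[{safe}]]>"


description = "<div>Lorem ipsum dolor sit amet</div><div><br></div>"
wrapped = wrap_in_cdata(description)
print(f"<dc:description>{wrapped}</dc:description>")

# Round trip: a parser sees the original string as a single text value.
parsed = ET.fromstring(f"<d>{wrap_in_cdata('a]]>b <br>')}</d>")
print(parsed.text == "a]]>b <br>")
```

Note that the CDATA content does not need to be well-formed XML (for example, the unclosed `<br>` above), because the parser treats it as character data.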
3) Fix existing problematic data.
Use the Rosetta Metadata Update job to export the problematic DC records, fix them outside Rosetta, and then update them.
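To find which exported records need fixing, the DC elements that contain child elements can be detected with a short script. This is a sketch only: it assumes the exported record is available as an XML string, that the fields use the standard DC elements namespace `http://purl.org/dc/elements/1.1/`, and the `record` wrapper element in the sample is hypothetical. The actual export layout may differ.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core elements namespace (assumed to be the one in use).
DC_NS = "http://purl.org/dc/elements/1.1/"


def find_dc_fields_with_markup(xml_text: str) -> list[str]:
    """Return local names of DC elements that contain child elements."""
    root = ET.fromstring(xml_text)
    flagged = []
    for elem in root.iter():
        # len(elem) is the number of direct child elements.
        if elem.tag.startswith("{" + DC_NS + "}") and len(elem) > 0:
            flagged.append(elem.tag.split("}", 1)[1])
    return flagged


# Hypothetical sample record; "record" is an illustrative wrapper element.
sample = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>this is title <span>of a book</span></dc:title>
  <dc:creator>Jane Doe</dc:creator>
</record>"""

print(find_dc_fields_with_markup(sample))  # ['title']
```

Fields reported by this check can then be escaped or wrapped in CDATA as described in steps 1 and 2 before the records are updated.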
- Article last edited: 22-AUG-2019