LaTeX in the Central Discovery Index (CDI)

To improve the readability of mathematical and scientific expressions in search results, CDI now processes LaTeX markup in the title, abstract, and subject fields.

However, CDI does not include a LaTeX rendering engine. Instead, CDI applies normalization rules that convert recognizable LaTeX patterns into readable text.
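As a rough illustration of what rule-based normalization means (as opposed to rendering), the sketch below replaces a handful of LaTeX commands with Unicode characters. It is not CDI's implementation: the `SYMBOLS` table and the `normalize` function are hypothetical names, the table is a tiny illustrative subset, and stripping every `$` is a deliberate simplification.

```python
import re

# Hypothetical lookup table for illustration only; not CDI's actual rule set.
SYMBOLS = {"alpha": "α", "beta": "β", "mu": "μ", "times": "×", "leq": "≤"}

def normalize(text: str) -> str:
    """Replace known LaTeX commands with Unicode and drop math delimiters."""
    def swap(match: re.Match) -> str:
        name = match.group(1)
        # Commands missing from the table are left exactly as written.
        return SYMBOLS.get(name, match.group(0))

    text = re.sub(r"\\([A-Za-z]+)", swap, text)
    # Simplification: remove every "$", assuming it is a math delimiter.
    return text.replace("$", "")

print(normalize(r"$\alpha \times \beta$"))
print(normalize(r"\frac{a}{b}"))
```

With this sketch, a recognized pattern such as `$\alpha \times \beta$` becomes readable text, while an unrecognized command such as `\frac{a}{b}` passes through unchanged.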

The outcome depends heavily on how the original publisher encoded the markup.

Because of data inconsistencies and the wide variance in LaTeX syntax (both structural syntax and accent variations), some LaTeX characters may still display incorrectly in these fields.

Some expressions cannot be accurately converted to Unicode - for example, d(μ), where no Unicode subscript version of μ exists. Complex expressions, especially those that are nested or rely on formatting instructions, are less likely to be preserved.
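The subscript limitation can be sketched as follows. This is a minimal example under stated assumptions, not CDI's code: the `SUB` table and the `subscript` function are hypothetical, and the table covers only a few of the characters that do have Unicode subscript forms. When any character lacks a subscript form, the sketch keeps the LaTeX-style markup instead of guessing.

```python
# Hypothetical, partial table of characters that have a Unicode subscript form.
SUB = {"0": "₀", "1": "₁", "2": "₂", "3": "₃", "i": "ᵢ", "n": "ₙ"}

def subscript(base: str, sub: str) -> str:
    """Attach sub to base as a Unicode subscript when every character maps."""
    if all(ch in SUB for ch in sub):
        return base + "".join(SUB[ch] for ch in sub)
    # No complete Unicode mapping (for example μ): keep the markup visible.
    return f"{base}_{{{sub}}}"

print(subscript("x", "2"))
print(subscript("d", "μ"))
```

Here `subscript("x", "2")` can be converted cleanly, whereas `subscript("d", "μ")` falls back to `d_{μ}` because μ has no entry in the table.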

LaTeX is used inconsistently across publishers. The same visual expression may be encoded using different commands or structures. Publishers like Springer may address this by embedding images, but CDI cannot apply the same approach.
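To make the encoding variance concrete, the sketch below treats two common spellings of the same accented letter, `\'e` and `\'{e}`, as equivalent. It is an assumption-laden illustration rather than CDI's rule set: `ACCENTS`, `ACCENT_RE`, and `fold_accents` are hypothetical names, and only five accent commands are covered.

```python
import re
import unicodedata

# Combining marks for a few LaTeX accent commands (illustrative subset).
ACCENTS = {"'": "\u0301", "`": "\u0300", '"': "\u0308", "^": "\u0302", "~": "\u0303"}

# Accept the accent followed by either a braced letter or a bare letter.
ACCENT_RE = re.compile(r"""\\(['`"^~])(?:\{([A-Za-z])\}|([A-Za-z]))""")

def fold_accents(text: str) -> str:
    """Convert simple LaTeX accent commands to precomposed Unicode letters."""
    def compose(match: re.Match) -> str:
        mark = ACCENTS[match.group(1)]
        letter = match.group(2) or match.group(3)
        # NFC merges the letter and combining mark into one code point.
        return unicodedata.normalize("NFC", letter + mark)

    return ACCENT_RE.sub(compose, text)

print(fold_accents(r"caf\'e"))
print(fold_accents(r"caf\'{e}"))
```

Both inputs yield the same word. A third spelling, `{\'e}`, would come out of this sketch as `{é}`, because the outer braces are not covered by the pattern; each additional encoding variant needs its own rule.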

When expressions rely on layout or presentation commands, rather than pure semantic meaning, normalization becomes unreliable. This is especially common with chemical formulas and accented characters.

Where normalization cannot produce a clean result, it may leave parts of the original markup visible.
This is expected and reflects the boundaries of what normalization can achieve.

Normalization applies only to the title, abstract, and subject fields.