Search Configurations for Different Languages

When using the Alma repository search (or when searching users, purchase requests, and fulfillment requests), you can search for special characters and characters with diacritics. Search language configuration (set by Ex Libris) is available in Alma for the languages listed below. For most of these languages, Alma uses the standard implementation for working with special characters.

Alma's handling of special characters is relevant for searching in the institution zone only.
Only one language for special characters search can be defined.

Normalization for the languages listed below is specific to text fields, such as title and author. Normalization is not done for numeric fields or fields that contain a normalized value, such as a call number.

German Characters

When your system is configured for German as the default searching language, German language characters are treated by the system as follows:

German Language Character/Character Combinations	Stored in the Alma Database
ß	ss
ä, Ä	ae
ö, Ö	oe
ü, Ü	ue
ae	ae
oe	oe
ue (when not following a vowel or q)	ue

With your system configured for German as the default language and the special German language characters stored in the system as identified in the above table, you can search using the special umlaut and Eszett characters or the extended Latin version of these characters (as shown in the second column above) and results will be treated equally. So, for example, if you search for Müller, the system will return results for both Müller and Mueller, but not Muller.

For systems whose default language is not German, a search for Müller will return search results for Müller and Muller but not Mueller.
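
The difference between the two configurations can be illustrated with a minimal sketch. This is not Alma's implementation; the function names `german_key` and `default_key` are hypothetical, and the non-German behavior is assumed to be plain diacritic stripping, which matches the Müller/Muller example above.

```python
import unicodedata

# Mapping taken from the German table above.
GERMAN_MAP = {"ß": "ss", "ä": "ae", "ö": "oe", "ü": "ue"}

def german_key(text: str) -> str:
    """Hypothetical search key when German is the searching language."""
    return "".join(GERMAN_MAP.get(ch, ch) for ch in text.lower())

def default_key(text: str) -> str:
    """Hypothetical search key otherwise (assumption: diacritics are dropped)."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# German configuration: Müller and Mueller share a key, Muller does not.
print(german_key("Müller"), german_key("Mueller"), german_key("Muller"))
# Other configurations: Müller and Muller share a key, Mueller does not.
print(default_key("Müller"), default_key("Muller"), default_key("Mueller"))
```

Two terms are treated as equal in this sketch when their keys are identical.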

When your institution (and Network Zone, if you are working with one) is configured for German as the default searching language, this standard German language special character search capability is available in the Institution Zone, Network Zone, and Community Zone.

For institutions that have German configured as the default searching language, the repository search results, user search results, and fulfillment request search results are sorted using the DIN 5007-1/2, section 6.1.1.4.1/2 standard. In addition to consideration for the special German language characters, hyphens are ignored when search results are sorted.

When sorting bibliographic and authority headings content, Alma removes dashes for sorting purposes in institutions that have the searching language parameter set to German.

When you use the Alma Browse Bibliographic Headings feature, the same sorting standard (DIN 5007-1/2, section 6.1.1.4.1/2) is used to sort the bibliographic headings. See Browsing Bibliographic Headings for more information.

Spanish and Catalan Characters

When your system is configured for Spanish as the default searching language (set by Ex Libris), special Spanish language characters are treated as fully independent letters for repository search, browse bibliographic headings/F3, and sort. Standard English characters are not substituted for the special Spanish characters. The following table describes how Alma handles special Spanish characters:

Letter	Search	Sort
Ñ/ñ	Searching for Ñ/ñ does not retrieve results for N/n and vice versa.	Sorted after n.
Ç/ç	Searching for Ç/ç does not retrieve results of C/c and vice versa.	Sorted after c.
L·L/l·l	Searched for as if it were the digraph ll.	Sorted as ll.

Diacritics are sorted in the following order:

Without diacritics
Acute
Grave
Dieresis
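
The Spanish sort rules above can be approximated with a two-level sort key. This is a sketch under assumptions, not Alma's collation: the name `spanish_sort_key` is hypothetical, letters are compared first, and the diacritic order (none, acute, grave, dieresis) is only used to break ties.

```python
import unicodedata

ALPHABET = "abcçdefghijklmnñopqrstuvwxyz"  # ç after c, ñ after n
# Combining acute, grave, dieresis, ranked in the order listed above.
MARK_RANK = {"\u0301": 1, "\u0300": 2, "\u0308": 3}

def spanish_sort_key(word: str):
    # l·l (with a middle dot, U+00B7) is sorted as the digraph ll.
    word = unicodedata.normalize("NFC", word.lower()).replace("l\u00b7l", "ll")
    primary, secondary = [], []
    for ch in word:
        if ch in "ñç":
            base, marks = ch, ""  # independent letters, never decomposed
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            base, marks = decomposed[0], decomposed[1:]
        pos = ALPHABET.find(base)
        # Characters outside the alphabet are placed after all letters.
        primary.append(pos if pos >= 0 else len(ALPHABET) + ord(base))
        secondary.append(max((MARK_RANK.get(m, 0) for m in marks), default=0))
    return primary, secondary

print(sorted(["oso", "ñu", "nube", "nada"], key=spanish_sort_key))
print(sorted(["más", "màs", "mas"], key=spanish_sort_key))
```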

Scandinavian Characters (Swedish, Norwegian, Danish)

Alma normalizes the interchangeable Scandinavian characters æ Æ ä Ä ö Ö ø Ø and folded variants (aa, ao, ae, oe and oo) by transforming them to æ Æ å Å ø Ø.
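
As a rough illustration of the folding described in the sentence above, the following sketch maps the interchangeable letters and folded variants to æ, å, and ø. The name `scandinavian_normalize` is hypothetical, case is collapsed to lowercase for brevity, and the sketch applies the folds symmetrically; it does not model the language-specific and one-directional rules that Alma applies on top of this.

```python
# Folded variants and interchangeable letters, per the sentence above.
FOLDS = [
    ("aa", "å"), ("ao", "å"),
    ("ae", "æ"), ("ä", "æ"),
    ("oe", "ø"), ("oo", "ø"), ("ö", "ø"),
]

def scandinavian_normalize(text: str) -> str:
    """Hypothetical helper: fold Scandinavian variants to æ, å, ø."""
    text = text.lower()
    for source, target in FOLDS:
        text = text.replace(source, target)
    return text

print(scandinavian_normalize("Aalborg"))  # same key as Ålborg
print(scandinavian_normalize("baad"))     # same key as båd
```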

Special characters cataloged in non-Scandinavian languages (such as French letters with accents) are normalized during indexing. This means that a search for a term that includes these special characters behaves as if the search was done without them. (However, the opposite does not happen: a search for a term without these special characters is not treated as if it was done with them.) The table below lists the rules for all characters with diacritics for institutions configured for Scandinavian characters:

Language	Upper case	Lower case	Folded variant	The character (in Upper/Lower) is searchable with the following formulas:
Swedish	Å	å	Aa/aa	Å/å/Aa/aa
	Ä	ä	Ae/ae	Ä/Ae/ä/ae
	Ö	ö	Oe/oe	Ø/Ö/Oe/ø/ö/oe
	Æ	æ	Ae/ae	Æ/æ/Ae/ae
	Ø	ø	Oe/oe	Ø/Ö/Oe/ø/ö/oe
	Other accents (e.g È)	Other accents (e.g è)	Base characters (e.g. E/e)	Base characters (e.g. E/e)
	For example: Ö is searchable with Oe, but Oe is not searchable with Ö [Ö and Oe are not equivalent]. This means that a search for the term Edgar Allan Poe will return results for Edgar Allan Pö, but not the opposite: search for the term Pötry does not return results for Poetry.
Norwegian/Danish	Å	å	Aa/aa	Å/Aa/å/aa Exception: Å/å is equivalent to Aa/aa, and they are searchable interchangeably. A search for Aalborg returns results for Ålborg, and search for Ålborg returns results for Aalborg as well.
	Æ	æ	Ae/ae	Æ/Ä/Ae/æ/ae/ä
	Ø	ø	Oe/oe	Ø/Ö/Oe/ø/ö/oe
	Ö	ö	Oe/oe	Ø/Ö/Oe/ø/ö/oe
	Ä	ä	Ae/ae	Æ/Ä/Ae/æ/ae/ä
	Other accents (e.g È)	Other accents (e.g è)	Base characters (e.g. E/e)	Base characters (e.g. E/e)
	For example: A search for the term båd returns results for båd/baad, but not for bad. A search for the term haan returns results for haan/hån, but not for han. A search for the term søn returns results for søn/soen, but not for son or soon. A search for the term baer returns results for bær/baer, but not vice versa (bær does not return baer).

Sorting the Norwegian and Danish special language characters for staff search and for Browse Bibliographic Headings/F3 functionality is handled differently from sorting the Swedish special language characters. See the sorting for each language below:

Norwegian/Danish Sorting	Swedish Sorting
a/A-z/Z (with Ü/ü sorted as Y/y) æ/Æ ; ä/Ä ø/Ø ; ö/Ö å/Å ; aa/Aa	a/A-z/Z (with æ/Æ sorted as ae/Ae; Ü/ü is sorted as Y/y) å/Å ä/Ä ö/Ö ; ø/Ø

Normalization for Norwegian and Danish is handled in the manner described in the Scandinavian Normalization Filter. For Swedish, Alma normalizes the Scandinavian characters in the same manner.

Icelandic Characters

When your system is configured for Icelandic as the default searching language (set by Ex Libris), Icelandic language characters are treated as fully independent letters for repository search, browse bibliographic headings/F3, and sorting. Standard English characters are not substituted for the special Icelandic characters. For example, a does not return á (and vice versa) in search results (when searching for “sál”, Alma does not return “sal”). They are not considered the same characters.

The following characters are converted as follows:

The character Ø/ø is converted to Ö/ö. It is sorted after Ó/ó.
The character Å/å is converted to AA/aa.
All other special characters with accents and umlauts, such as ä, ë, ü, û, è, are converted to their default values (a, e, u, etc.)
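
The three conversion rules above can be sketched as follows. The function name `icelandic_convert` and the set of protected Icelandic letters (taken from the Icelandic alphabet) are assumptions made for illustration; this is not Alma's code.

```python
import unicodedata

# Letters of the Icelandic alphabet that must stay independent.
ICELANDIC = set("áðéíóúýþæöÁÐÉÍÓÚÝÞÆÖ")
# Explicit conversions from the list above.
CONVERT = {"Ø": "Ö", "ø": "ö", "Å": "AA", "å": "aa"}

def icelandic_convert(text: str) -> str:
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in CONVERT:
            out.append(CONVERT[ch])
        elif ch in ICELANDIC:
            out.append(ch)
        else:
            # Any other accented character falls back to its base letter.
            decomposed = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)

print(icelandic_convert("sál"), icelandic_convert("Århus"), icelandic_convert("über"))
```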

The results list is sorted based on the Icelandic alphabetical order (see the Icelandic Characters table below). This feature also applies to staff search for users, purchase and fulfillment requests, and deposits.

Icelandic Characters
Uppercase	Lowercase	Diacritics
A	a
Á	á	acute
B	b
C	c
D	d
Ð	ð	eth
E	e
É	é	acute
F	f
G	g
H	h
I	i
Í	í	acute
J	j
K	k
L	l
M	m
N	n
O	o
Ó	ó	acute
P	p
Q	q
R	r
S	s
T	t
U	u
Ú	ú	acute
V	v
W	w
X	x
Y	y
Ý	ý	acute
Z	z
Þ	þ	thorn
Æ	æ	ae
Ö	ö	Diaeresis

CJK Languages

Chinese and Korean Characters

Alma does hiragana to katakana transliteration, traditional Chinese to simplified Chinese transliteration, and splits words into bigrams and unigrams. See the ICU Transform Filter for more information.

Alma also does Hanja to Hangul transliteration. The sorting is unique to the Korean language.
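
Two of the steps mentioned above, hiragana to katakana conversion and splitting into unigrams and bigrams, are simple enough to sketch. The function names are hypothetical, and the sketch only illustrates the idea; the actual processing is done by the search engine's filters.

```python
def hiragana_to_katakana(text: str) -> str:
    # Hiragana U+3041..U+3096 sits exactly 0x60 below the matching katakana.
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )

def unigrams_and_bigrams(text: str) -> list:
    # Every single character, followed by every pair of adjacent characters.
    unigrams = list(text)
    bigrams = [text[i:i + 2] for i in range(len(text) - 1)]
    return unigrams + bigrams

print(hiragana_to_katakana("ひらがな"))
print(unigrams_and_bigrams("東京都"))
```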

Japanese Characters

For institutions that have the Japanese searching setup for Repository Search, Browse Bib Headings, and Browse Auth Headings, Alma performs the following:

Punctuation removal
Normalization between Hiragana and Katakana
Iterated character normalization
Normalization of variant Kanji characters

CJK Punctuation Handling

For institutions that have the Chinese, Hong Kong, Japanese, or Korean searching setup, all punctuation marks are removed during indexing when they appear within CJK text. However, the punctuation remains when you are searching. This helps to ensure that the best results are retrieved. Note that the display of CJK content continues to show the punctuation.
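
A minimal sketch of index-time punctuation removal, assuming that "punctuation" means any character in a Unicode punctuation category. The name `strip_punctuation_for_index` is hypothetical, and for simplicity the sketch removes punctuation everywhere rather than only within CJK text.

```python
import unicodedata

def strip_punctuation_for_index(text: str) -> str:
    # Unicode general categories starting with "P" are punctuation.
    return "".join(
        ch for ch in text if not unicodedata.category(ch).startswith("P")
    )

# The indexed form loses its punctuation; the displayed record is unchanged.
print(strip_punctuation_for_index("東京、大阪。"))
```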

See the known search issue related to punctuation that is described in the note in the Using Advanced Search section.

Hong Kong TSVCC

Alma implements the Hong Kong Innovative Users Group (HKIUG) TSVCC (Traditional, Simplified, and Variant Chinese Characters) standard, Version 1.0, released on 18 July 2006. In addition to handling the traditional and simplified Chinese characters, Alma also handles the variant Chinese characters when doing the following:

Searching metadata records
Browsing bibliographic headings
This includes searching for TSVCC characters entered as a value in Browse Bibliographic Headings and properly sorting headings that appear for browsing. When the same title occurs in different Chinese forms (including variant Chinese characters), all titles that are equivalent are sorted together in the headings list for browsing.
Searching for Chinese user names

TSVCC Chinese character handling is available for institutions that have the Alma searching language parameter set for Hong Kong. Contact Ex Libris Support if you need to have this institution parameter enabled.

For the complete HKIUG TSVCC table (UNICODE version), see http://hkiug-archive.lib.hku.hk/unicode/hkiug_tsvcc_table-UnicodeVersion-1.0.html.

Polish Characters

When your system is configured for Polish as the default searching language (set by Ex Libris), Polish language characters are treated as fully independent letters for repository search, browse bibliographic headings/F3, and sorting. Standard English characters are not substituted for the special Polish characters. For example, C does not return Ć (and vice versa) in search results. They are not considered the same characters.

The results list is sorted based on the Polish alphabetical order (see the Polish Characters table below). For example, być comes after bycie. This feature also applies to staff search for users, purchase and fulfillment requests, and deposits.

Polish Characters
Uppercase	Lowercase	Diacritics
A	a
Ą	ą	ogonek
B	b
C	c
Ć	ć	acute
D	d
E	e
Ę	ę	ogonek
F	f
G	g
H	h
I	i
J	j
K	k
L	l
Ł	ł	stroke
M	m
N	n
Ń	ń	acute
O	o
Ó	ó	acute
P	p
Q	q
R	r
S	s
Ś	ś	acute
T	t
U	u
V	v
W	w
X	x
Y	y
Z	z
Ź	ź	acute
Ż	ż	dot
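
Sorting by a language-specific alphabet can be sketched with a key that replaces each letter by its position in the Polish Characters table. The names `POLISH_ALPHABET` and `polish_sort_key` are hypothetical; this is an illustration, not Alma's collation.

```python
# Letter order taken from the Polish Characters table.
POLISH_ALPHABET = "aąbcćdeęfghijklłmnńopqrsśtuvwxyzźż"

def polish_sort_key(word: str):
    # Characters outside the alphabet get -1 and therefore sort first.
    return [POLISH_ALPHABET.find(ch) for ch in word.lower()]

print(sorted(["być", "bycie"], key=polish_sort_key))
print(sorted(["mama", "łódź", "lato"], key=polish_sort_key))
```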

Czech Characters

When your system is configured for Czech as the default searching language (set by Ex Libris), Czech language characters are treated as fully independent letters for repository search, browse bibliographic headings/F3, and sorting. Standard English characters are not substituted for the special Czech characters. For example, C does not return Č (and vice versa) in search results. They are not considered the same characters.

The results list is sorted based on the Czech alphabetical order (see the Czech Characters table below). This means that words starting with the digraph ch (chemie) are sorted between H and I. This feature also applies to staff search for users, purchase and fulfillment requests, and deposits.

Czech Characters
Uppercase	Lowercase	Diacritics
A	a
Á	á	acute
B	b
C	c
Č	č	caron
D	d
Ď	ď	caron
E	e
É	é	acute
Ě	ě	caron
F	f
G	g
H	h
Ch	ch
I	i
Í	í	acute
J	j
K	k
L	l
M	m
N	n
Ň	ň	caron
O	o
Ó	ó	acute
P	p
Q	q
R	r
Ř	ř	caron
S	s
Š	š	caron
T	t
Ť	ť	caron
U	u
Ú	ú	acute
Ů	ů	ring
V	v
W	w
X	x
Y	y
Ý	ý	acute
Z	z
Ž	ž	caron
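
Because ch counts as a single letter between H and I, a Czech sort key has to read the word letter by letter and treat the digraph as one unit. The following is a sketch with hypothetical names (`CZECH_ALPHABET`, `czech_sort_key`), using the order from the Czech Characters table.

```python
# Letter order taken from the Czech Characters table; "ch" is one letter.
CZECH_ALPHABET = (
    "a á b c č d ď e é ě f g h ch i í j k l m n ň o ó "
    "p q r ř s š t ť u ú ů v w x y ý z ž"
).split()
RANK = {letter: i for i, letter in enumerate(CZECH_ALPHABET)}

def czech_sort_key(word: str):
    word = word.lower()
    key, i = [], 0
    while i < len(word):
        # Consume the digraph "ch" as a single letter when it occurs.
        letter = "ch" if word.startswith("ch", i) else word[i]
        # Unknown characters are placed after the whole alphabet.
        key.append(RANK.get(letter, len(RANK)))
        i += len(letter)
    return key

print(sorted(["jablko", "chemie", "hora", "cesta"], key=czech_sort_key))
```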

Lithuanian Characters

When your system is configured for Lithuanian as the default searching language (set by Ex Libris), Lithuanian language characters are treated as fully independent letters for repository search, browse bibliographic headings/F3, and sorting. Standard English characters are not substituted for the special Lithuanian characters. This pertains to all the following Lithuanian characters: ą č ę ė į š ų ū ž Ą Č Ę Ė Į Š Ų Ū Ž.

Lithuanian characters can be searched and found using the corresponding Latin letters in queries:

Aa => Ąą
Cc => Čč
Ee => ĘĖęė
Ii => Įį
Ss => Šš
Uu => ŲŪųū
Zz => Žž
The same rule applies to all nonstandard Latin-based letters: German, Polish, Latvian, etc.

For example: Š is indexed as Š and S (and lower-case options); the text Šarūnas can be found with all of the following search queries: Šarūnas, Sarūnas, sarunas, saruNAS.
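
One way to read "Š is indexed as Š and S" is that each query character may match either the indexed letter itself or its base Latin letter. The sketch below implements that reading; the names `fold` and `matches` are hypothetical and the logic is an assumption about the behavior, not Alma's implementation.

```python
import unicodedata

def fold(ch: str) -> str:
    """Strip combining marks, for example š -> s and ū -> u."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def matches(query: str, indexed: str) -> bool:
    q = unicodedata.normalize("NFC", query).lower()
    t = unicodedata.normalize("NFC", indexed).lower()
    # Each query character must equal the indexed letter or its base letter.
    return len(q) == len(t) and all(a == b or a == fold(b) for a, b in zip(q, t))

for query in ["Šarūnas", "Sarūnas", "sarunas", "saruNAS"]:
    print(query, matches(query, "Šarūnas"))
```

Under this reading a query containing Š does not match an indexed S, which keeps the special letters independent in that direction.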

Letters that are not part of the official Lithuanian alphabet (Q/W/X) are sorted by their natural places in Latin. For example:

Q is sorted between P and R
W is sorted between V and Z

The sorting order of letters is as follows:

Lithuanian letters (includes both Latin and special Lithuanian letters above) with other Latin-based letters that are not pure Latin (e.g. Polish, German, Scandinavian).
Cyrillic and all non-Latin based alphabets
Chinese is at the end

The results list is sorted based on the Lithuanian alphabetical order (see the Lithuanian Characters table below). This sorting also applies to staff search for users, purchase and fulfillment requests, and deposits.

Lithuanian Characters
Uppercase	Lowercase
A	a
Ą	ą
B	b
C	c
Č	č
D	d
E	e
Ę	ę
Ė	ė
F	f
G	g
H	h
I	i
Į	į
Y	y
J	j
K	k
L	l
Ł	ł
M	m
N	n
O	o
P	p
R	r
S	s
Š	š
T	t
U	u
Ų	ų
Ū	ū
V	v
Z	z
Ż	ż

Lithuanian quotation marks

Lithuanian quotation marks are interpreted in the same way as Latin. So for indexing, search, and ordering, the terms "Great Britain" and „Great Britain“ are interpreted the same.

In a search query, only the Latin quotation marks are interpreted by Alma as an exact phrase search. The Lithuanian quotation marks do not indicate an exact phrase search, so users are expected to use the regular quotation marks when searching for an exact phrase.
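
The two behaviors can be sketched separately: quotation marks are unified for indexing and ordering, while exact-phrase detection only looks at the Latin quotation mark. The function names are hypothetical and the sketch is illustrative only.

```python
# Lithuanian opening and closing quotation marks (U+201E, U+201C).
LITHUANIAN_QUOTES = str.maketrans({"\u201e": '"', "\u201c": '"'})

def normalize_quotes(text: str) -> str:
    """Indexing and ordering: both quotation styles are treated alike."""
    return text.translate(LITHUANIAN_QUOTES)

def is_exact_phrase_query(query: str) -> bool:
    """Query parsing: only Latin quotation marks request an exact phrase."""
    q = query.strip()
    return len(q) >= 2 and q.startswith('"') and q.endswith('"')

print(normalize_quotes("\u201eGreat Britain\u201c"))
print(is_exact_phrase_query('"Great Britain"'))
print(is_exact_phrase_query("\u201eGreat Britain\u201c"))
```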

Russian letters transliteration

Alma supports transliteration of Russian letters, so that results include both Lithuanian and Russian phrases. For example, when the query text is kaunas, Alma finds both kaunas (Latin) and каунас (Cyrillic).
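
A transliteration step of this kind can be sketched as a character mapping applied to the Cyrillic form so that it shares a key with the Latin form. The mapping below is a small, assumed subset chosen for the example; the table Alma actually uses is not documented here.

```python
# Assumed subset of a Cyrillic-to-Latin table, for illustration only.
CYRILLIC_TO_LATIN = {
    "а": "a", "в": "v", "и": "i", "к": "k", "л": "l",
    "н": "n", "с": "s", "у": "u",
}

def transliterate(text: str) -> str:
    # Unmapped characters (including Latin ones) pass through unchanged.
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text.lower())

print(transliterate("каунас"))
print(transliterate("kaunas"))
```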

Arabic and Persian Characters

Similar Arabic/Persian characters are treated as the same characters for repository search, browse bibliographic headings/F3, and sorting. For example, ڤ returns ف (and vice versa) in search results.

The following character groups are treated as the same character and are interchangeable:
ا – أ – إ – آ
ى – ي - ئ
ه - ة - ۀ
و - ؤ
ك – گ – ک
ف - ڤ
ز - ژ
ب - پ
ج - چ
ق - ڨ
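
Treating each group as one character can be sketched by mapping every member of a group to the group's first character. The names `GROUPS`, `CANONICAL`, and `normalize_arabic` are hypothetical; the groups themselves are copied from the list above.

```python
import unicodedata

# Interchangeable character groups, copied from the list above.
GROUPS = ["اأإآ", "ىيئ", "هةۀ", "وؤ", "كگک", "فڤ", "زژ", "بپ", "جچ", "قڨ"]

# Map each character to the first character of its group.
CANONICAL = {}
for group in GROUPS:
    members = unicodedata.normalize("NFC", group)
    for ch in members:
        CANONICAL[ch] = members[0]

def normalize_arabic(text: str) -> str:
    text = unicodedata.normalize("NFC", text)
    return "".join(CANONICAL.get(ch, ch) for ch in text)

# U+06A4 (veh) and U+0641 (feh) end up with the same key.
print(normalize_arabic("\u06a4") == normalize_arabic("\u0641"))
```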

Croatian Characters

When your system is configured for Croatian as the default searching language (set by Ex Libris), Croatian language characters are treated as fully independent letters for repository search, browse bibliographic headings/F3, and sorting. Standard English characters are not substituted for the special Croatian characters. This pertains to all the following Croatian characters: č, ć, dž, đ, lj, nj, š, ž.

Croatian characters can be searched for and found using the corresponding Latin letters in queries:

Cc => ČĆčć
Dd => DžĐdžđ
Ee => ĘĖęė
Ll => Ljlj
Nn => Njnj
Ss => Šš
Zz => Žž

The same rule applies to all nonstandard Latin-based letters: German, Polish, Latvian, etc.

For example: Š is indexed as Š and S (and lower-case options); the text Šrętan can be found with all of the following search queries: Šrętan, Srętan, sretan, sreTAN.

Letters that are not part of the official Croatian alphabet (Q/W/X/Y) are sorted by their natural places in Latin. For example:

Q is sorted between P and R
W is sorted between V and Z

The sorting order of letters is as follows:

Croatian letters (includes both Latin and special Croatian letters above) with other Latin-based letters that are not pure Latin (e.g. Polish, German, Scandinavian).
Cyrillic and all non-Latin based alphabets
Chinese is at the end

The results list is sorted based on the Croatian alphabetical order (see the Croatian Characters table below). This sorting also applies to staff search for users, purchase and fulfillment requests, and deposits.

Croatian Characters
Uppercase	Lowercase
A	a
B	b
C	c
Č	č
Ć	ć
D	d
Dž	dž
Đ	đ
E	e
F	f
G	g
H	h
I	i
J	j
K	k
L	l
Lj	lj
M	m
N	n
Nj	nj
O	o
P	p
R	r
S	s
Š	š
T	t
U	u
V	v
Z	z
Ž	ž
