campusM EU01 Performance Issues - RCA - September 30 - October 15, 2024
Introduction
This document serves as a Root Cause Analysis for the campusM EU01 service interruption experienced by Ex Libris customers.
The goal of this document is to share our findings regarding the event, identify the root cause, outline the actions taken to resolve the event, and describe the preventive measures Ex Libris is taking to avoid similar cases in the future.
Event Timeline
Service interruption was experienced by Ex Libris customers served by the campusM EU01 instance at the Amsterdam Data Center during the following times:
September 30 - October 15, 2024.
During the event, customers experienced performance issues.
Root Cause Analysis
Ex Libris engineers investigated this event to determine the root cause, with the following results:
On September 30, performance alerts were triggered in all European datacenters. The Ex Libris teams quickly investigated and found a network issue affecting campusM instances that use the Attendance and Timetable application features, especially during peak times.
To address this, we restarted the system and increased network capacity by moving servers to faster machines on September 30 and October 4.
Despite these upgrades, traffic issues persisted. The Development team conducted various tests and, on October 9, deployed a software fix. Internal traffic was further reduced on October 10.
Additional application fixes were applied in North America on October 14 and rolled out gradually in Europe, completing by October 15. The remaining datacenters will be updated in November. The entire resolution process took two weeks.
Technical Action Items and Preventive Measures
Ex Libris has taken the following actions and preventive measures to avoid such an occurrence in the future:
- Additional alerts will be added to the monitoring system to provide early warning of extreme load at specific times.
- We plan to install a new version of the Palo Alto Firewall and implement monitoring that proactively alerts in similar situations.
Customer Communication
Ex Libris is committed to providing customers with prompt and ongoing updates during Cloud events. Updates on service interruptions are posted on the system status portal: http://status.exlibrisgroup.com/
These updates are automatically sent as emails to registered customers.

