Primo VE AP02 - RCA - July 7, 2025
Introduction
This document serves as a Root Cause Analysis for the service interruption experienced by Ex Libris customers on HEP AP02.
The goal of this document is to share our findings regarding the event, specify the root cause, outline the actions taken to resolve the downtime, and describe the preventive measures Ex Libris is taking to avoid similar cases in the future.
Affected Products
Primo VE
Event Timeline
Service degradation was experienced by Ex Libris Primo VE customers served by the Higher-Ed Platform AP02 instance at the Sydney Data Center on July 7, 2025, at the following times (Sydney time):
- 12:40 PM: The first system-down case was received at the 24X7Hub and was escalated to our Ex Libris support team for troubleshooting.
- 2:03 PM: As the issue was identified as impacting multiple customers, a cross-instance event bridge was created to manage communication and coordination.
- 4:30 PM: The Primo VE R&D team identified the problematic SQL query as the root cause and started to work on a code fix.
- 4:30 PM: The first Status Page notification was published to inform affected Primo VE customers.
- 5:12 PM: A more efficient execution plan was pinned to temporarily stabilize system performance while the permanent fix was being developed. From this point on, the system was available again.
- 6:55 PM: The code fix was completed and deployed.
- 9:35 PM: The Status Page was updated to reflect that the issue had been resolved.
During the event, customers experienced slowness and delayed load times in the environment.
Root Cause Analysis
Ex Libris engineers investigated this event to determine the root cause, with the following results:
The root cause was identified as an unoptimized SQL query introduced in the July 2025 release. Under high load conditions, this query led to significant performance degradation in Primo VE, and briefly impacted Alma performance as well.
To mitigate the impact, our DBA team identified and terminated long-running queries. A more efficient execution plan was pinned to stabilize performance, and the R&D team implemented a manual rollback of the problematic query.
This fix was deployed globally, and deployment was completed by the end of the day on July 7, 2025, restoring normal service levels.
Due to a misconfiguration, posts were not accurately updated on the Status Page itself and were only sent as emails to registered customers. This issue has since been fixed and tested.
Technical Action Items and Preventive Measures
Ex Libris has taken the following actions and preventive measures to avoid such an occurrence in the future:
- Enforce mandatory performance benchmarking for all new SQL queries.
- Refreshed escalation protocols for performance-related incidents, ensuring clarity and consistency across teams.
- Review and enhance our troubleshooting guide and processes for performance issues, to improve investigation and shorten resolution time.
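As an illustration of the first measure above, the following is a minimal sketch of what an automated check for a new SQL query could look like. It is not the Ex Libris benchmarking process. It uses an in-memory SQLite database as a stand-in for the production database, and the table, the helper names (has_full_scan, median_runtime_ms, check_query, build_demo_db) and the 50 ms budget are all hypothetical.

```python
import sqlite3
import statistics
import time


def has_full_scan(conn, sql, params=()):
    """Return True if SQLite's plan for `sql` contains a full scan step."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is a textual description such as
    # "SCAN items" (full scan) or "SEARCH items USING INDEX ..." (index lookup).
    return any(row[-1].startswith("SCAN") for row in rows)


def median_runtime_ms(conn, sql, params=(), runs=5):
    """Run the query `runs` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def check_query(conn, sql, params=(), budget_ms=50.0):
    """Return a list of problems found; an empty list means the query passes."""
    problems = []
    if has_full_scan(conn, sql, params):
        problems.append("full table scan")
    elapsed = median_runtime_ms(conn, sql, params)
    if elapsed > budget_ms:
        problems.append(
            f"median runtime {elapsed:.1f} ms exceeds budget {budget_ms:.1f} ms"
        )
    return problems


def build_demo_db(rows=20000):
    """Create a hypothetical demo table with no index on `owner`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, owner INTEGER, title TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (owner, title) VALUES (?, ?)",
        ((i % 500, f"title-{i}") for i in range(rows)),
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    query = "SELECT title FROM items WHERE owner = ?"
    print("before index:", check_query(conn, query, (7,)))
    conn.execute("CREATE INDEX idx_items_owner ON items (owner)")
    print("after index:", check_query(conn, query, (7,)))
```

A check of this kind would typically run in a release pipeline against a production-sized data set, since a query that is fast on a small test table can still degrade under high load.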
Customer Communication
Ex Libris is committed to providing customers with prompt and ongoing updates during Cloud events. Updates on service interruptions appear in the system status portal at this address: http://status.exlibrisgroup.com/
These updates are automatically sent as emails to registered customers.

