    Primo VE AP02 - RCA - July 7, 2025

    1. Introduction
    2. Affected Products
    3. Event Timeline
    4. Root Cause Analysis
    5. Technical Action Items and Preventive Measures
    6. Customer Communication

    Introduction

    This document serves as a Root Cause Analysis for the service interruption experienced by Ex Libris customers on the Higher-Ed Platform (HEP) AP02 instance.

    The goal of this document is to share our findings regarding the event, specify the root cause, outline the actions taken to resolve the downtime event, and describe the preventive measures Ex Libris is taking to avoid similar cases in the future.

    Affected Products

    Primo VE

    Event Timeline

     

    Service degradation was experienced by Ex Libris Primo VE customers served by the Higher-Ed Platform AP02 instance at the Sydney Data Center on July 7, 2025, during the following time frame (Sydney time):

    • 12:40 PM: The first system-down case was received at the 24X7Hub and escalated to the Ex Libris support team for troubleshooting.

    • 2:03 PM: As the issue was identified to impact multiple customers, a cross-instance event bridge was created to manage communication and coordination.

    • 4:30 PM: Primo VE R&D team identified the problematic SQL query as the root cause and started to work on a code fix.

    • 4:30 PM: First Status Page notification was published to inform affected Primo VE customers.

    • 5:12 PM: A more efficient execution plan was pinned to temporarily stabilize system performance while the permanent fix was being developed. From this point on, the system was available again.

    • 6:55 PM: Code fix was completed and deployed.

    • 9:35 PM: Status Page was updated to reflect that the issue had been resolved.

    During the event, customers experienced slowness and delayed load times in the environment.

    Root Cause Analysis

    Ex Libris Engineers investigated this event to determine the root cause with the following results:

    The root cause was identified as an unoptimized SQL query introduced in the July 2025 release. Under high load conditions, this query led to significant performance degradation in Primo VE, and briefly impacted Alma performance as well.

    To mitigate the impact, our DBA team identified and terminated long-running queries. A more efficient execution plan was pinned to stabilize performance and the R&D team implemented a manual rollback of the problematic query.

    This fix was deployed globally, and the deployment was completed by the end of the day on July 7, 2025, restoring normal service levels.
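
To illustrate why the choice of execution plan matters for a query like the one described above, here is a minimal sketch using Python's built-in sqlite3 module. It is not the Primo VE schema, query, or database: the `records` table, the `owner` column, and the index name are hypothetical, and SQLite's `EXPLAIN QUERY PLAN` stands in for whatever plan inspection the DBA team actually used. The sketch shows the same query moving from a full table scan to an index search once a suitable index exists.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The detail text is the fourth column of each plan row.
    return [row[3] for row in rows]


# Hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records (id INTEGER PRIMARY KEY, owner TEXT, title TEXT)"
)
conn.executemany(
    "INSERT INTO records (owner, title) VALUES (?, ?)",
    [(f"inst{i % 50}", f"title {i}") for i in range(5000)],
)

query = "SELECT title FROM records WHERE owner = ?"

# Without an index, the planner has to scan every row.
plan_before = plan_details(conn, query, ("inst7",))

# With an index on the filter column, the planner can search instead.
conn.execute("CREATE INDEX idx_records_owner ON records (owner)")
plan_after = plan_details(conn, query, ("inst7",))

print("before:", plan_before)
print("after: ", plan_after)
```

Under load, the difference between scanning and searching is multiplied by every concurrent request, which is consistent with the degradation appearing only under high load conditions.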

    Due to a misconfiguration, Status Page posts were not updated accurately; updates were only sent as emails to registered customers. This issue has since been fixed and tested.

    Technical Action Items and Preventive Measures

    Ex Libris has taken the following actions and preventive measures to avoid such an occurrence in the future:

    • Enforce mandatory performance benchmarking for all new SQL queries.

    • Refreshed escalation protocols for performance-related incidents, ensuring clarity and consistency across teams.

    • Review and enhance our troubleshooting guide and processes for performance issues, to improve investigation and shorten resolution time.
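
As an illustration of the first measure above, a performance benchmark for a new SQL query could take the form of a latency check that runs before release. The following is a minimal sketch under stated assumptions, not Ex Libris tooling: it uses Python's built-in sqlite3 module, and the `loans` table, the helper names `benchmark_query` and `within_budget`, and the 50 ms budget are all hypothetical.

```python
import sqlite3
import statistics
import time


def benchmark_query(conn, sql, params=(), runs=20):
    """Run the query `runs` times; return the median wall-clock seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def within_budget(conn, sql, params=(), budget_seconds=0.05, runs=20):
    """Return True when the median latency stays within the budget."""
    return benchmark_query(conn, sql, params, runs) <= budget_seconds


# Hypothetical data set standing in for a production-sized table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loans (id INTEGER PRIMARY KEY, patron TEXT)")
conn.executemany(
    "INSERT INTO loans (patron) VALUES (?)",
    [(f"patron{i % 100}",) for i in range(2000)],
)

new_query = "SELECT COUNT(*) FROM loans WHERE patron = ?"
median_seconds = benchmark_query(conn, new_query, ("patron42",))
print(f"median latency: {median_seconds:.6f}s")
print("within budget:", within_budget(conn, new_query, ("patron42",)))
```

The median is used rather than the mean so that a single slow run does not decide the outcome. A realistic check would also need representative data volumes and concurrent load, since the query in this incident degraded only under high load.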

    Customer Communication

    Ex Libris is committed to providing customers with prompt and ongoing updates during Cloud events. Updates on service interruptions are posted to the system status portal at this address: http://status.exlibrisgroup.com/

    These updates are automatically sent as emails to registered customers.
