RCA: Springer Title Deletions, August 2021
Introduction
This document serves as a Root Cause Analysis for the incident reported on August 3, 2021. Its goal is to share our findings regarding the incident, specify the root cause, describe how the event was handled, and outline the mitigation actions and preventive measures Ex Libris is taking to avoid similar cases in the future.
Event Timeline
Date | Activity |
August 3, 2021 | Unintentional deletion of 2 Springer collections: 'SpringerNature Complete Journals' (3282 portfolios deleted) and 'SpringerLink Open Access eBooks' (840 portfolios deleted) |
August 6, 2021 | Ex Libris communicated an update to the SFX Listserv, acknowledging the issue, its impact, and the expected resolution time |
August 6, 2021 | Analysis was completed and a fix was ready for release |
August 9, 2021 | Ex Libris communicated an update to the SFX Listserv, informing customers that the analysis was complete and a solution was in place, along with the expected timeline for local and cloud installations |
August 9, 2021 | Release was ready for local customers |
August 14, 2021 | Release was applied on SFX cloud environments |
Root Cause Analysis
Ex Libris investigated this event to determine its impact and root cause, with the following results: during the automated process that ingests providers' updated content, the Springer file was sent with a change in its structure. As a result, the matching mechanism against the KB collection portfolios failed, causing valid portfolios to be deleted.
Findings
Ex Libris investigated this event and determined the following:
- The Springer update file was sent with a change in its structure, causing the matching mechanism against the SFX KB to fail. As a result, new corrupted records were created and the valid ones were deleted.
- The automated QA validation did not detect the issue because of a bug, so the alert notification expected in such a scenario was not sent.
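The second finding concerns a QA check that should have raised an alert before valid portfolios were deleted. As an illustration only, one common form of such a check is a deletion-ratio guard that holds an update for review when it would delete an unusually large share of a collection. The sketch below is not Ex Libris's implementation; the `UpdatePlan` structure, the function names, the 20% threshold, and the numbers in the usage note are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class UpdatePlan:
    """Hypothetical summary of what one ingest run intends to do to a collection."""
    collection: str
    existing_count: int  # portfolios currently in the collection
    to_delete: int       # portfolios the run would delete
    to_create: int       # portfolios the run would create


def deletion_ratio(plan: UpdatePlan) -> float:
    """Share of the existing portfolios that the run would delete."""
    if plan.existing_count == 0:
        return 0.0
    return plan.to_delete / plan.existing_count


def needs_review(plan: UpdatePlan, max_ratio: float = 0.2) -> bool:
    """Return True when the run should be held and an alert sent.

    max_ratio is an assumed threshold, not a value taken from the report.
    """
    return deletion_ratio(plan) > max_ratio
```

With made-up numbers, `needs_review(UpdatePlan("Example Collection", 1000, 1000, 1000))` returns `True`, because the run would replace every existing portfolio, while a routine run that deletes 10 of 1000 portfolios passes.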
Technical Action Items and Preventive Measures
Ex Libris has taken the following actions and preventive measures to avoid such an occurrence in the future:
- Enhance prevention capabilities to perform actions outside of the routine procedure to significantly reduce the risk of manual error
- The bug detected in the automated QA routine was fixed, tested, and applied (Done)
- A new automated validation routine that allows detection of structure changes at an early stage will be added to the process (October)
- Overall review of all automatic QA routines to ensure they perform as expected (September)
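To make the planned structure validation more concrete, the following is a minimal sketch of how a structure change in a provider file could be detected before any matching against the KB takes place. It assumes a tab-delimited file with a header row; the report does not describe the actual Springer file format, so the column names in `EXPECTED_COLUMNS` and the function name `structure_problems` are hypothetical.

```python
import csv
import io
from typing import List, Sequence

# Hypothetical column layout; the real provider file format is not given in the report.
EXPECTED_COLUMNS = ["title", "issn", "url"]


def structure_problems(
    file_text: str,
    expected: Sequence[str] = EXPECTED_COLUMNS,
    delimiter: str = "\t",
) -> List[str]:
    """Return a list of structure problems; an empty list means the file looks as expected."""
    reader = csv.reader(io.StringIO(file_text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return ["file is empty"]

    problems = []
    # A renamed, reordered, added, or dropped column shows up as a header mismatch.
    if header != list(expected):
        problems.append(f"header mismatch: expected {list(expected)}, got {header}")
    # Every data row must have the same number of fields as the expected layout.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(expected):
            problems.append(
                f"line {line_no}: expected {len(expected)} fields, got {len(row)}"
            )
    return problems
```

An ingest process built along these lines would stop and alert whenever the returned list is non-empty, rather than proceeding to create and delete portfolios from a file it can no longer interpret reliably.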
Conclusion
Ex Libris is treating this incident with high priority, including evaluation, assessment, mitigation, and lessons learned. We are determined to improve the quality of the KB and the value it provides to our customers.