ORA-00494: enqueue [CF] held for too long (more than 900 seconds)


Article Type: General
Product: Aleph
Product Version: 20, 21, 22, 23

Description:
Our system went down this morning. We were able to bring it back up, but we want to know how to prevent
this Oracle error ("ORA-00494: enqueue [CF] held for too long (more than 900 seconds)") from bringing the system down again.

Oracle alert log:

Wed Jan 27 15:10:55 2010
Thread 1 advanced to log sequence 1083 (LGWR switch)
Current log# 3 seq# 1083 mem# 0: /ora01/oradata/aleph0/aleph0_redo03.log
Wed Jan 27 22:16:59 2010
Errors in file /exlibris/app/oracle/admin/aleph0/bdump/aleph0_arc1_31636.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 31585'
Wed Jan 27 22:17:00 2010
Errors in file /exlibris/app/oracle/admin/aleph0/bdump/aleph0_arc0_31634.trc:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 31585'
Wed Jan 27 22:17:07 2010
System State dumped to trace file /exlibris/app/oracle/admin/aleph0/bdump/aleph0_arc1_31636.trc
Wed Jan 27 22:17:08 2010
Killing enqueue blocker (pid=31585) on resource CF-00000000-00000000
by killing session 165.1
Killing enqueue blocker (pid=31585) on resource CF-00000000-00000000
by terminating the process
Wed Jan 27 22:17:08 2010
Killing enqueue blocker (pid=31585) on resource CF-00000000-00000000
ARC0: terminating instance due to error 2103
Wed Jan 27 22:17:13 2010
Termination issued to instance processes. Waiting for the processes to exit
Wed Jan 27 22:17:19 2010
Instance termination failed to kill one or more processes
Instance terminated by ARC0, pid = 31634

Resolution:
Metalink note "Database Crashes With ORA-00494" [ID 753290.1] describes the following:

Cause:

LGWR killed the CKPT process, causing the instance to crash.

From the alert.log we can see that the database waited too long for a CF (control file) enqueue, so the following error was reported:
ORA-00494: enqueue [CF] held for too long (more than 900 seconds) by 'inst 1, osid 38356'

LGWR then killed the blocker, which in this case was the CKPT process, and this caused the
instance to crash.

The alert.log also shows that redo log switches are very frequent (almost one per minute).
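
One way to check the log switch frequency yourself is to measure the time between "advanced to log sequence" entries in the alert log. The following Python sketch is only an illustration: it assumes the alert log layout shown in the excerpt above (a timestamp line followed by the message lines), and the function names are hypothetical, not part of any Oracle or Aleph tool.

```python
from datetime import datetime

# Timestamp layout seen in the alert log excerpt, e.g. "Wed Jan 27 15:10:55 2010".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def log_switch_times(lines):
    """Return the timestamps of all 'advanced to log sequence' entries."""
    switches = []
    current_ts = None
    for raw in lines:
        line = raw.strip()
        try:
            # A line that parses as a timestamp dates the entries that follow it.
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass
        if "advanced to log sequence" in line and current_ts is not None:
            switches.append(current_ts)
    return switches


def switch_intervals_minutes(switches):
    """Return the minutes elapsed between consecutive log switches."""
    return [(later - earlier).total_seconds() / 60.0
            for earlier, later in zip(switches, switches[1:])]
```

To use it, read the alert log into a list of lines (for example with open(path).readlines(), where path is the location of your own alert log) and pass the list to log_switch_times. Intervals far below the 20 to 30 minutes suggested in the solution indicate that the online redo logs are probably too small.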

Solution:

1. We usually suggest configuring redo log switches to occur every 20 to 30 minutes to reduce contention on the control files.

You can use the OPTIMAL_LOGFILE_SIZE column of the V$INSTANCE_RECOVERY view to
determine the size of your online redo logs. This column shows the redo log file size, in megabytes, that is
considered optimal based on the current setting of FAST_START_MTTR_TARGET. If it consistently shows a value
greater than the size of your smallest online log, configure all your online logs to be at least this size.
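
The comparison described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the two query strings use the standard V$INSTANCE_RECOVERY and V$LOG views, but running them and collecting the results is left to you; "consistently" is interpreted here as "in every sample taken"; and the function name is hypothetical.

```python
# Queries a DBA could run to collect the inputs (not executed by this sketch).
OPTIMAL_SIZE_SQL = "SELECT optimal_logfile_size FROM v$instance_recovery"
LOG_SIZES_SQL = "SELECT group#, bytes FROM v$log"

BYTES_PER_MB = 1024 * 1024


def recommended_log_size_mb(optimal_samples_mb, log_sizes_bytes):
    """Suggest a minimum online redo log size in MB, or None if no change is indicated.

    optimal_samples_mb: OPTIMAL_LOGFILE_SIZE values (MB) sampled over time.
    log_sizes_bytes:    BYTES values from V$LOG, one per online log group.
    """
    smallest_mb = min(log_sizes_bytes) / BYTES_PER_MB
    # Skip empty samples (assumed to mean that no value was reported).
    samples = [s for s in optimal_samples_mb if s is not None]
    # Assumption: "consistently greater" means every sample exceeds the smallest log.
    if samples and all(s > smallest_mb for s in samples):
        return max(samples)
    return None
```

Returning the largest sample is a conservative choice made for this sketch; the note itself only says the logs should be at least the size that the column reports.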

This issue is still open and was escalated to Second line support for further analysis <2010-02-28 01:00:04>.

Article last edited: 10/8/2013