
Fix It Yourself

Shared Memory Errors

Key already exists SHMGET[17]: Sometimes onmode -ky or onclean -ky will clear the shared memory segments if the engine did not come down gracefully. Otherwise, you will need root access to run ipcrm and remove them manually. If the server is running multiple instances, check the SERVERNUM, convert it to hex and add 0x5256 to ensure the correct keys are removed.
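The SERVERNUM arithmetic above can be sketched in shell. This is a minimal sketch only: the function name shm_key_prefix is hypothetical, and treating the result as the leading hex digits of the keys listed by ipcs -m is an assumption to verify on your own system before removing anything.

```sh
# Hypothetical helper: add 0x5256 to SERVERNUM and print the result in hex.
shm_key_prefix() {
    # $1 is the SERVERNUM value from ONCONFIG (decimal).
    printf '0x%04x\n' $((0x5256 + $1))
}

# Example (assumption: matching keys in ipcs -m start with this prefix).
# Review the list by eye before passing anything to ipcrm.
# ipcs -m | grep -i "$(shm_key_prefix 3)"
```

With two instances on SERVERNUM 0 and 1, the two prefixes differ by one, which is why it pays to compute the value rather than guess.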

SHMAT[22] or SHMAT[24] is probably a problem with the kernel setup. Check the release notes, typically $INFORMIXDIR/release/en_us/0333/*machine* or similar.

Data Errors

Cannot open the root chunk: errno 2 no such file, errno 5 I/O error, errno 6 no device. Check that the link is correct and still points to a valid device. As user informix, run dd if=<root device> bs=2048 count=1 | strings | more and you should see the Informix version from the reserved pages. Otherwise, there might be a hardware problem.

Errno 13 is a permission problem; check the permissions on both the link and the device it points to. Run the dd command above as informix to confirm the device can be read.
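As a quick reference, the errno values above can be turned into a small lookup. This is a sketch only: the function name root_chunk_hint and the wording of each hint are illustrative, not Informix output.

```sh
# Hypothetical helper: map the errno reported for the root chunk
# to the first thing worth checking.
root_chunk_hint() {
    case "$1" in
        2)  echo "no such file: check the link and its target" ;;
        5)  echo "I/O error: possible hardware problem" ;;
        6)  echo "no device: check the link points to a valid device" ;;
        13) echo "permission problem: check permissions on the link and device" ;;
        *)  echo "not covered here" ;;
    esac
}

# Example: root_chunk_hint 13
```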

Connectivity Errors

If the engine was not shut down gracefully then a stray oninit process might be holding the TCP port open - error number -25572. If the service has been removed from /etc/services since the last engine restart then the error might be -25507.

If the engine is up but users can't connect, check that it is fully online, i.e. not still in Quiescent. Also check the NETTYPE for shared memory connections: if the system is using shared memory connections then the number of concurrent users is specified in the NETTYPE.

DBSpace or Chunk Down

Check the ONDBSPACEDOWN setting in ONCONFIG. A setting of 1 or 2 will prevent the engine from coming up and staying up. This can be hard to spot: the engine comes online but hangs (ONDBSPACEDOWN 2) or comes up and goes straight back down (ONDBSPACEDOWN 1).
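A minimal sketch of that check, assuming the ONCONFIG file uses the usual "PARAMETER value" layout with one parameter per line. The function names are hypothetical, and the descriptions simply restate the symptoms listed above.

```sh
# Hypothetical helper: print the ONDBSPACEDOWN value from an ONCONFIG file.
ondbspacedown_value() {
    # $1 is the path to the ONCONFIG file; commented-out lines do not match.
    awk '$1 == "ONDBSPACEDOWN" { print $2; exit }' "$1"
}

# Hypothetical helper: describe the symptom to expect for the value found.
describe_ondbspacedown() {
    case "$(ondbspacedown_value "$1")" in
        1) echo "1: engine may come up and go straight back down" ;;
        2) echo "2: engine may come online but hang" ;;
        *) echo "not 1 or 2" ;;
    esac
}

# Example: describe_ondbspacedown "$INFORMIXDIR/etc/$ONCONFIG"
```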

First Attempt at Starting Engine

Check sqlhosts and onconfig - there is probably just a simple typo.

Engine Is Hung

Most of the time there is not much that can be done, but there are a few things worth checking.

AFDEBUG

Is AFDEBUG set in the environment? If yes, then kill -9 the master oninit process, issue onmode -ky or onclean -ky to ensure all the shared memory is cleaned up, and then bring the engine online.

Physical Recovery

If a physical recovery has been executed [ontape -p] then the engine will be sitting in FAST RECOVERY; just run onmode -m.

If a physical recovery has been executed and the engine has been bounced instead of issuing onmode -m, and onstat -d shows all the chunks marked as PI, then just redo the restore.

ONDBSPACEDOWN

If ONDBSPACEDOWN is set to 2 then the engine can hang on a down chunk; issue onmode -O to override the wait.

Long Transaction

Is there a long transaction rolling back? onstat - will show LONGTX and onstat -p -r will show increasing reads/writes. The engine has reached the exclusive high water mark and now all users are waiting until the long transaction rolls back. All you can do is wait - do NOT restart the engine.

PDQ

Is PDQ turned on? An onstat -g mgm will show if a PDQ query is running with 100% of the resources.

Error Trapping

Is onmode -I set? Check the online log to see which error is being trapped.

Logical Logs

Are the logical logs backed up? onstat -l will show the status of the logs; if all of the logs are U------ instead of U-B----- then just back up the logs. If ontape -a doesn't clear the problem then set LTAPEDEV to /dev/null and bounce the engine. This should clear the logs but the log files will be lost. Remember to reset LTAPEDEV.
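The flag check can be roughly automated. This is a sketch under assumptions: the function name count_log_flags is hypothetical, and it assumes each log line in the onstat -l output carries one flags field that starts with U for a used log and has B as its third character once backed up, as in the U-B----- example above.

```sh
# Hypothetical helper: read onstat -l style output on stdin and count used
# logs by whether the flags field shows B (backed up) in the third position.
count_log_flags() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                # A flags field: starts with U, only letters and dashes,
                # and contains at least one dash.
                if ($i ~ /^U[-A-Z][-A-Z][-A-Z]+$/ && index($i, "-") > 0) {
                    if (substr($i, 3, 1) == "B") backed++; else pending++
                }
            }
        }
        END { printf "backed_up=%d not_backed_up=%d\n", backed, pending }
    '
}

# Example: onstat -l | count_log_flags
```

The current log is also counted as not backed up, so a non-zero count is only a problem when nearly every log falls in that bucket.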

Engine Is Online

The engine is online but there is no connectivity.

Environment

Many times this is just a simple mismatch in the environment.

If possible, try telnetting to the server on the DB port. If you get 'Connection refused' then you are on the wrong port or the wrong server, or there is a firewall in the way. If you get 'Connection closed' after pressing return then you have made a successful connection, and the online.log will show error 401.
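Where telnet is not installed, a similar first check can be sketched with bash's /dev/tcp redirection. This is a substitute technique, not the telnet test itself: it only reports whether the TCP connection succeeds and will not show the 'Connection closed' behaviour. The function name probe_db_port and the 5 second limit are illustrative choices, and the sketch assumes bash and the timeout utility are available.

```sh
# Hypothetical helper: report whether a TCP connection to host:port succeeds.
probe_db_port() {
    # $1 = host, $2 = port. Gives up after 5 seconds.
    if timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
        echo "connected"
    else
        echo "refused or unreachable"
    fi
}

# Example (host name and port are placeholders):
# probe_db_port dbhost 9088
```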

Listener Errors

If a listener thread has hung or timed out then historically the only solution was an engine restart. Later engine versions have

onmode -P [start|stop|restart] dynamic listen thread control

The typical symptom is that existing users are working fine but no new users are able to connect.

Chunk is Offline

Can you dd from the chunk?

dd if= of=/dev/null count=1000 bs=2k (bs is the dbspace page size). If this fails then the problem is most likely an OS or hardware problem. If it works then dd if= count=1000 bs=2k | strings is always a good command to try; you should be able to see your 'data'.
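The read test can be wrapped in a small function so it is easy to repeat across chunks. A minimal sketch: the name chunk_readable is hypothetical, and the 2048 byte default simply mirrors the bs=2k used above, so pass the real dbspace page size if it differs.

```sh
# Hypothetical helper: try to read the first 1000 pages of a chunk.
chunk_readable() {
    # $1 = chunk path, $2 = dbspace page size in bytes (default 2048).
    if dd if="$1" of=/dev/null count=1000 bs="${2:-2048}" 2>/dev/null; then
        echo "readable"
    else
        echo "read failed"
    fi
}

# Example (path is a placeholder): chunk_readable /dev/informix/chunk1 2048
```

A "read failed" result points at the OS or hardware rather than at the engine.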

You can always try to bring the chunk online, if it is part of a mirror pair, using onspaces -s <dbspacename> -p <path> -o <offset> -O.

Failing that, you can patch the chunk back into the system; if you don't know how to do that then please contact Oninit. However, if the chunk has a fundamental integrity issue the engine will probably mark it down again, and in these cases a more detailed analysis is required.

To discuss how Oninit ® can assist please call on +1-913-674-0360 or alternatively just send an email specifying your requirements.


© Copyright 2006 - 2022 by the Oninit LLC. All rights reserved. Privacy
Unless otherwise noted, this Web Site and its contents are the property of Oninit LLC and are protected, without limitation, pursuant to United States of America and International copyright and trademark laws.
Oninit ® is a Registered Trademark of Oninit LLC