CRS-0184: Cannot communicate with the CRS daemon
During a routine inspection the database cluster was found to be abnormal, although the database itself was still running normally. Executing crs_stat -t returned the error CRS-0184: Cannot communicate with the CRS daemon.
Resolution steps:
1: crs_stat -t reports an error when querying cluster status
2: In asmcmd, the disk group holding the cluster data (crsdg) has disappeared
3: crsctl check crs reports a cluster problem
4: Review the cluster's crsd.log
5: Check the disk information
6: Check the cluster disks in v$asm_disk
7: mount -a in asmcmd to mount all disk groups
8: crs_stat -t still reports the error
9: crsctl query css votedisk and ocrcheck are both normal
10: crsctl check crs still reports the error
11: Reboot both nodes
12: After the reboot crsdg is normal, but the database cannot start
13: After rebooting again the cluster is normal and the database starts normally
1: crs_stat -t reports an error when querying cluster status
[grid@qidong1 ~]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
[grid@qidong2 ~]$ crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.
crs_stat fails on both nodes, yet everything else looks healthy: the cluster is up, ASM is normal, and the database is running normally.
2: In asmcmd, the disk group holding the cluster data has disappeared
Entering asmcmd shows only datadg and archdg; crsdg has disappeared.
The situation is the same on both nodes.
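How this was observed, as a minimal sketch (run as the grid OS user, with the ASM environment set as in step 6 below):
[grid@qidong1 ~]$ asmcmd lsdg
(only DATADG and ARCHDG are listed; CRSDG does not appear)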
3: crsctl check crs reports a cluster problem
[grid@qidong2 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
[grid@qidong2 ~]$
4: Review the cluster's crsd.log
Checking one node is enough: since crsdg has disappeared from ASM on both nodes, both nodes will show the same error in their logs.
CRS log directory layout:
The log root is $CRS_HOME/log/[node] under the CRS installation directory, where node is the node name. It contains:
1) alert.log: similar to the database's alert.log; usually the starting point of any investigation.
2) crsd, cssd, evmd: three directories corresponding to the three CRS daemons of the same names; the log files are crsd.log, ocssd.log, and evmd.log respectively.
3) racg: a directory holding the logs of all the nodeapps, including ONS and VIP; each log's name makes it easy to identify the corresponding nodeapp.
4) client: logs written by the command-line tools that Oracle Clusterware ships, such as ocrcheck, ocrconfig, ocrdump, oifcfg, and clscfg.
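For reference, on this system the log root would be listed roughly as follows (a sketch; note that in 11.2 the directory also contains per-daemon subdirectories such as ohasd and agent, and the alert log is named alert<node>.log rather than alert.log):
[grid@qidong2 ~]$ ls /u01/app/11.2.0.4/grid/log/qidong2/
alertqidong2.log  agent  client  crsd  cssd  evmd  ohasd  racg  ...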
View crsd.log at $GRID_HOME/log/<node>/crsd/crsd.log:
/u01/app/11.2.0.4/grid/log/qidong2/crsd:
2015-10-24 11:27:29.017: [ CRSMAIN] Checking the OCR device
2015-10-24 11:27:29.017: [ CRSMAIN] Sync-up with OCR
2015-10-24 11:27:29.017: [ CRSMAIN] Connecting to the CSS Daemon
2015-10-24 11:27:29.017: [ CRSMAIN] Getting local node number
2015-10-24 11:27:29.018: [ CRSMAIN] Initializing OCR
[ CLWAL]clsw_Initialize: OLR initlevel
2015-10-24 11:27:29.347: [ OCRASM]proprasmo: Error in open/create file in dg [crsdg]
[ OCRASM]SLOS : SLOS: cat=8, opn=kgfoOpen01, dep=15056, loc=kgfokge
2015-10-24 11:27:29.347: [ OCRASM]ASM Error Stack :
2015-10-24 11:27:29.395: [ OCRASM]proprasmo: kgfoCheckMount returned
2015-10-24 11:27:29.395: [ OCRASM]proprasmo: The ASM disk group crsdg is not found or not mounted
2015-10-24 11:27:29.396: [ OCRRAW]proprioo: Failed to open [+crsdg]. Returned proprasmo() with . Marking location as UNAVAILABLE.
2015-10-24 11:27:29.396: [ OCRRAW]proprioo: No OCR/OLR devices are usable
2015-10-24 11:27:29.396: [ OCRASM]proprasmcl: asmhandle is NULL
2015-10-24 11:27:29.397: [ GIPC] gipcCheckInitialization: possible incompatible non-threaded init from [prom.c : 690], original from [clsss.c : 5343]
2015-10-24 11:27:29.398: [ default]clsvactversion:4: Retrieving Active Version from local storage.
2015-10-24 11:27:29.401: [ CSSCLNT]clssgsgrppubdata: group (ocr_qidong-cluster) not found
2015-10-24 11:27:29.401: [ OCRRAW]proprio_repairconf: Failed to retrieve the group public data. CSS ret code
2015-10-24 11:27:29.402: [ OCRRAW]proprioo: Failed to auto repair the OCR configuration.
2015-10-24 11:27:29.402: [ OCRRAW]proprinit: Could not open raw device
2015-10-24 11:27:29.403: [ OCRASM]proprasmcl: asmhandle is NULL
2015-10-24 11:27:29.405: [ OCRAPI]a_init:16!: Backend init unsuccessful :
2015-10-24 11:27:29.405: [ CRSOCR] OCR context init failure. Error: PROC-26: Error while accessing the physical storage
2015-10-24 11:27:29.405: [ CRSD] Created alert : (:CRSD00111:) : Could not init OCR, error: PROC-26: Error while accessing the physical storage
2015-10-24 11:27:29.405: [ CRSD][PANIC] CRSD exiting: Could not init OCR, code: 26
2015-10-24 11:27:29.405: [ CRSD] Done.
From crsd.log we can determine that CRSD could not read the OCR from physical storage because the crsdg disk group was not found or not mounted; OCR initialization failed with PROC-26 and CRSD exited.
5: Check the disk information
fdisk -l on both nodes shows the storage is presented normally to both nodes, and the udev one-to-one mappings are intact as well (a cross-check sketch follows the listing).
[root@qidong2 dev]# ll ora-*
brwxrwxrw-. 1 grid asmadmin 8, 17 Dec 1 10:34 ora-crsa1
brwxrwxrw-. 1 grid asmadmin 8, 33 Dec 1 10:34 ora-crsa2
brwxrwxrw-. 1 grid asmadmin 8, 49 Dec 1 10:34 ora-crsa3
brwxrwxrw-. 1 grid asmadmin 8, 65 Oct 24 11:26 ora-crsa4
brwxrwxrw-. 1 oracle oinstall 8, 81 Oct 24 11:26 ora-dataa5
brwxrwxrw-. 1 oracle oinstall 8, 97 Dec 1 10:34 ora-datab1
brwxrwxrw-. 1 oracle oinstall 8, 113 Dec 1 10:34 ora-datab2
brwxrwxrw-. 1 oracle oinstall 8, 129 Dec 1 10:34 ora-datab3
brwxrwxrw-. 1 oracle oinstall 8, 145 Dec 1 10:34 ora-datab4
brwxrwxrw-. 1 oracle oinstall 8, 161 Dec 1 10:34 ora-datab5
brwxrwxrw-. 1 oracle oinstall 8, 177 Dec 1 10:27 ora-datab6
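To cross-check that each mapped device still resolves to the same LUN, something like the following can be run on both nodes (a sketch; the scsi_id path and flags are as on RHEL/OEL 6, and the ora-* device names are the ones listed above):
[root@qidong2 ~]# for d in /dev/ora-*; do echo "$d $(/sbin/scsi_id --whitelisted --replace-whitespace --device=$d)"; done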
6: Check the cluster disks in v$asm_disk
[oracle@qidong1 ~]$
[oracle@qidong1 ~]$ export ORACLE_SID=+ASM1
[oracle@qidong1 ~]$ export ORACLE_HOME=/u01/app/11.2.0.4/grid/
[oracle@qidong1 ~]$ export PATH=$PATH:$ORACLE_HOME/bin
[oracle@qidong1 ~]$ sqlplus sys as sysasm
SQL*Plus: Release 11.2.0.4.0 Production on Tue Dec 1 10:49:38 2015
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Enter password:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> col name format a15
SQL> col path format a17
SQL> select name,path,header_status,mount_status,state from v$asm_disk;
NAME PATH HEADER_STATU MOUNT_S STATE
--------------- ----------------- ------------ ------- --------
/dev/ora-crsa3 MEMBER CLOSED NORMAL
/dev/ora-crsa2 MEMBER CLOSED NORMAL
/dev/ora-crsa1 MEMBER CLOSED NORMAL
/dev/ora-dataa5 CANDIDATE CLOSED NORMAL
/dev/ora-crsa4 CANDIDATE CLOSED NORMAL
DATADG_0002 /dev/ora-datab3 MEMBER CACHED NORMAL
DATADG_0003 /dev/ora-datab4 MEMBER CACHED NORMAL
DATADG_0004 /dev/ora-datab5 MEMBER CACHED NORMAL
ARCHDG_0000 /dev/ora-datab6 MEMBER CACHED NORMAL
DATADG_0001 /dev/ora-datab2 MEMBER CACHED NORMAL
DATADG_0000 /dev/ora-datab1 MEMBER CACHED NORMAL
The three crsa member disks show no disk group name and a MOUNT_STATUS of CLOSED: crsdg is not mounted.
Note: the environment settings at the start of this step show how to connect to the ASM instance with sqlplus sys as sysasm.
7: mount -a in asmcmd to mount all disk groups
ASMCMD> mount -a
ORA-15032: not all alterations performed
ORA-15017: diskgroup "DATADG" cannot be mounted
ORA-15013: diskgroup "DATADG" is already mounted
ORA-15017: diskgroup "ARCHDG" cannot be mounted
ORA-15013: diskgroup "ARCHDG" is already mounted (DBD ERROR: OCIStmtExecute)
ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 16777216 563184 450832 0 450832 0 N ARCHDG/
MOUNTED NORMAL N 512 4096 16777216 61392 60000 20464 19768 0 Y CRSDG/
MOUNTED EXTERN N 512 4096 16777216 2815920 1158096 0 1158096 0 N DATADG/
ASMCMD>
Running this on one node is enough. Afterwards, v$asm_disk shows crsdg mounted normally, and lsdg in asmcmd lists crsdg again.
Note:
If you would rather not use mount -a in asmcmd, you can connect with sqlplus sys as sysasm as in step 6 and run alter diskgroup crsdg mount; on each of the two nodes, as sketched below.
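A minimal sketch of that alternative, run on each node as the grid user:
[grid@qidong1 ~]$ sqlplus sys as sysasm
Enter password:
SQL> alter diskgroup crsdg mount;
Diskgroup altered.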
8: crs_stat -t still reports the error when querying cluster status (the same CRS-0184 output as in step 1, on both nodes)
9: crsctl query css votedisk and ocrcheck are both normal
[grid@qidong1 ~]$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 5f065dff8dc04f4cbfadfb4e5d805957 (/dev/ora-crsa1) [CRSDG]
2. ONLINE ab51b2fba8ad4fe2bf126f3d9ca8d9b4 (/dev/ora-crsa2) [CRSDG]
3. ONLINE 8d555f4c64b24f63bf019e5e40320ecf (/dev/ora-crsa3) [CRSDG]
Located 3 voting disk(s).
[grid@qidong1 ~]$
[grid@qidong1 ~]$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3100
Available space (kbytes) : 259020
ID : 2044633163
Device/File Name : +crsdg
Device/File integrity check succeeded
Device/File not configured
Device/File not configured
Device/File not configured
Device/File not configured
Cluster registry integrity check succeeded
Logical corruption check bypassed due to non-privileged user
10: crsctl check crs still reports the error
[root@qidong1 grid]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
11: Reboot both nodes
Since the database had been running normally all along while the cluster never recovered, srvctl stop database could not be executed; the only option was to run shutdown immediate on each node. (A sketch of the full sequence follows the list.)
1: Run shutdown immediate on each node to stop the database
2: Run crsctl stop crs to stop the clusterware on both nodes; if that fails, use crsctl stop crs -f
3: reboot both nodes
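As a sketch, the full sequence on each node (run sqlplus as the oracle user and the rest as root; the grid home path is the one seen in the logs above):
[oracle@qidong1 ~]$ sqlplus / as sysdba
SQL> shutdown immediate;
SQL> exit
[root@qidong1 ~]# /u01/app/11.2.0.4/grid/bin/crsctl stop crs
(if this fails: /u01/app/11.2.0.4/grid/bin/crsctl stop crs -f)
[root@qidong1 ~]# reboot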
12: After the reboot crsdg is normal, but the database cannot start
After the reboot the clusterware started normally, but the database would not start. Investigation showed that datadg and archdg were not mounted.
Under /dev/ the primary partitions of those disks could no longer be seen; fdisk -l showed the primary partition on some disks, but on others no partition was displayed.
The single-path udev one-to-one mapping used here requires one primary partition on each disk before udev can map it, and after this reboot the partitions on the disks backing datadg and archdg had, for no apparent reason, been deleted. The disks therefore had to be repartitioned (a sketch follows the list):
1: crsctl stop crs to stop the clusterware on both nodes
2: On node 1, repartition the disks backing datadg and archdg. After all the disks are partitioned, udev can map them again on node 1, but node 2 still cannot map them; at that point the recommendation is to reboot node 1 and node 2.
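As a sketch, recreating a single primary partition spanning one of the affected disks (the device name /dev/sdf is an assumption; as long as the new partition has the same boundaries as the old one, the ASM data on it is untouched, because fdisk only rewrites the partition table):
[root@qidong1 ~]# printf 'n\np\n1\n\n\nw\n' | fdisk /dev/sdf
[root@qidong1 ~]# partprobe /dev/sdf
After repartitioning, udev can re-create the ora-* device nodes either on reboot (the route taken here) or, on RHEL/OEL 6, via udevadm trigger.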
13: After rebooting again, the cluster is normal and the database starts normally
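A final verification pass might look like this (a sketch; the database name orcl is an assumption, as the database is never named above):
[grid@qidong1 ~]$ crsctl check crs
[grid@qidong1 ~]$ crs_stat -t
[oracle@qidong1 ~]$ srvctl status database -d orcl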