Clearing up questions about the Oracle RAC heartbeat (Oracle RAC heartbeat requirements)

25-04-25 1

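Many readers have recently asked about the Oracle RAC heartbeat and its requirements, so this article answers those questions in detail. It also covers several related topics: 11gR2 RAC new features (SCAN, GNS, RAC One Node, and others), the ORA-16698 error raised by the DG broker on 12cR1 RAC-RAC, ORA-16854 on 12cR2 RAC+RAC+ADG, and backup-based duplicate (RAC-RAC).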

Contents:

Clearing up questions about the Oracle RAC heartbeat (Oracle RAC heartbeat requirements)
11gR2 RAC new features: SCAN, GNS, RAC One Node, and others
12cR1 RAC-RAC DG broker error ORA-16698
12cR2 RAC+RAC+ADG ORA-16854
Backup-based duplicate (RAC-RAC)

Clearing up questions about the Oracle RAC heartbeat (Oracle RAC heartbeat requirements)

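1. What the RAC heartbeat does:
It monitors the network health between cluster nodes, and the same private interconnect also carries cache synchronization and global resource management traffic. Since Cache Fusion was introduced, data blocks are shipped over it as well, so the interconnect traffic is fairly heavy. Gigabit Ethernet is the usual choice, and 10 Gigabit is better.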

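2. Can the RAC heartbeat use a direct (crossover) cable?
A crossover cable limits RAC to two nodes and is also unstable. Oracle does not provide technical support for bugs and problems that result from it.
Oracle's official explanation is given in
RAC: Frequently Asked Questions [ID 220970.1]: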

Is crossover cable supported as an interconnect with RAC on any platform ?

NO. CROSS OVER CABLES ARE NOT SUPPORTED. The requirement is to use a switch:

Detailed Reasons:

1) cross-cabling limits the expansion of RAC to two nodes

2) cross-cabling is unstable:

a) Some NIC cards do not work properly with it. They are not able to negotiate the DTE/DCE clocking, and will thus not function. These NICS were made cheaper by assuming that the switch was going to have the clock. Unfortunately there is no way to know which NICs do not have that clock.

b) Media sense behaviour on various OS's (most notably Windows) will bring a NIC down when a cable is disconnected. Either of these issues can lead to cluster instability and lead to ORA-29740 errors (node evictions).

Due to the benefits and stability provided by a switch, and their affordability ($200 for a simple 16 port GigE switch), and the expense and time related to dealing with issues when one does not exist, this is the only supported configuration.

From a purely technology point of view Oracle does not care if the customer uses cross over cable or router or switches to deliver a message. However, we know from experience that a lot of adapters misbehave when used in a crossover configuration and cause a lot of problems for RAC. Hence we have stated on certify that we do not support crossover cables to avoid false bugs and finger pointing amongst the various parties: Oracle, Hardware vendors, Os vendors etc...

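3. High availability for the RAC heartbeat
The heartbeat can be made highly available by bonding two network ports at the operating system level. The two common bonding setups are load balancing and active-backup. Load balancing nominally doubles the bandwidth (in practice it falls short of that and is only somewhat faster), but from a reliability point of view active-backup is recommended. In active-backup mode, when one network path fails (for example, the primary switch loses power), the network is not interrupted: the system keeps working with the NICs in the order specified in /etc/rc.d/rc.local and the machine continues to serve requests, which gives failover protection.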
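To check which NIC an active-backup bond is currently using, the Linux bonding driver exposes its state as text under /proc/net/bonding/. The following is a minimal Python sketch that parses such text. It assumes the usual "Key: value" layout of that file; the sample content and the interface names eth0 and eth1 are illustrative and not taken from the systems discussed here.

```python
# Illustrative status text in the style of /proc/net/bonding/bond0 (assumed sample).
SAMPLE_STATUS = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up

Slave Interface: eth0
MII Status: up

Slave Interface: eth1
MII Status: down
"""


def parse_bond_status(text):
    """Return (mode, active_slave, {slave: mii_status}) from bonding status text."""
    mode = active = current = None
    slaves = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without "Key: value"
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Currently Active Slave":
            active = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # Only record link status once we are inside a slave section.
            slaves[current] = value
    return mode, active, slaves


mode, active, slaves = parse_bond_status(SAMPLE_STATUS)
print(mode, active, slaves)
```

On a real host the same function could be fed the contents of the bond's status file instead of the sample string.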

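Supplementary material:

Linux bond mode parameter (mode=4 is recommended when the switch supports LACP, since it gives better performance and stability):

0 - Round-robin mode (balance-rr): traffic is spread evenly across the bonded NICs in turn.
1 - High-availability mode (active-backup): only one NIC is in use at a time and the others stand by as backups. Recommended when the load does not exceed the bandwidth of a single NIC.
2 - Hash-based load-balancing mode (balance-xor): the outgoing NIC is chosen by a hash computed at the protocol layer selected by xmit_hash_policy (layer2 by default), so that traffic for the same peer stays on the same NIC as far as possible.
3 - Broadcast mode (broadcast): every bonded NIC sends the same data. It is normally used only for very special network requirements, such as sending the same data to two switches that are not connected to each other.
4 - 802.3ad load-balancing mode (LACP): the switch must also support 802.3ad. In theory, when both server and switch support it, NIC bandwidth can at most double (for example from 1 Gbps to 2 Gbps).
5 - Adaptive transmit load-balancing mode (balance-tlb): outgoing data is sent through all bonded NICs, while incoming data is received on only one of them. If the receiving NIC fails, another NIC takes over. The NICs and their drivers must report speed information through ethtool.
6 - Adaptive transmit/receive load-balancing mode (balance-alb): on top of mode 5, incoming traffic is load balanced as well. Besides speed information from ethtool, the NICs must also support changing their MAC address dynamically.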
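The mode numbers above correspond to the names the Linux bonding driver accepts for its mode option. A small Python sketch of that mapping follows. The helper name bonding_opts and the miimon value of 100 ms are assumptions chosen for illustration, not settings from this article.

```python
# Numeric bond modes and the names used by the Linux bonding driver.
BOND_MODES = {
    0: "balance-rr",
    1: "active-backup",
    2: "balance-xor",
    3: "broadcast",
    4: "802.3ad",
    5: "balance-tlb",
    6: "balance-alb",
}


def bonding_opts(mode, miimon=100):
    """Build an option string such as 'mode=active-backup miimon=100'.

    miimon is the link-monitoring interval in milliseconds; 100 is only an
    illustrative default here.
    """
    if mode not in BOND_MODES:
        raise ValueError(f"unknown bond mode: {mode}")
    return f"mode={BOND_MODES[mode]} miimon={miimon}"


print(bonding_opts(1))
```

For the active-backup setup recommended for the heartbeat, bonding_opts(1) gives the string one would typically place in the bond interface's options.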

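4. Feasibility of a dual heartbeat
With two bonded ports, the RAC heartbeat has a single private address that belongs to one VLAN, runs in active-backup mode, and has its two cables connected to two different switches. This can be done entirely at the operating system level. If the heartbeat instead uses two private VLANs, it has two private addresses, and load balancing or failover between the two is handled by Oracle itself (no bonding is done at the OS level). Oracle supports this in 11g R2 starting with 11.2.0.2. Because the HAIP feature had bugs when it was first released, 11.2.0.4 is recommended as the more stable version. Oracle's own example targets multiple database instances with high interconnect bandwidth requirements.
For the official description, see http://docs.oracle.com/database/121/RACAD/admin.htm#RACAD7295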

Document ID 1210883.1 covers HAIP in detail and describes it as follows:
Redundant Interconnect without any 3rd-party IP failover technology (bond, IPMP or similar) is supported natively by Grid Infrastructure starting from 11.2.0.2. Multiple private network adapters can be defined either during the installation phase or afterward using the oifcfg. Oracle Database, CSS, OCR, CRS, CTSS, and EVM components in 11.2.0.2 employ it automatically.

Grid Infrastructure can activate a maximum of four private network adapters at a time even if more are defined. The ora.cluster_interconnect.haip resource will start one to four link local HAIP on private network adapters for interconnect communication for Oracle RAC, Oracle ASM, and Oracle ACFS etc.

Grid automatically picks free link local addresses from reserved 169.254.*.* subnet for HAIP. According to RFC-3927, link local subnet 169.254.*.* should not be used for any other purpose. With HAIP, by default, interconnect traffic will be load balanced across all active interconnect interfaces, and corresponding HAIP address will be failed over transparently to other adapters if one fails or becomes non-communicative.

The number of HAIP addresses is decided by how many private network adapters are active when Grid comes up on the first node in the cluster. If there's only one active private network, Grid will create one; if two, Grid will create two; and if more than two, Grid will create four HAIPs. The number of HAIPs won't change even if more private network adapters are activated later, a restart of clusterware on all nodes is required for the number to change, however, the newly activated adapters can be used for fail over purpose.
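The counting rule and the reserved subnet in the quoted note can be written down as a short Python sketch. The function names haip_count and is_link_local are hypothetical helpers for illustration; the logic only restates the note (one adapter gives one HAIP, two give two, more than two give four, all taken from 169.254.0.0/16).

```python
import ipaddress

# Link-local subnet reserved by RFC 3927, from which HAIP addresses are picked.
LINK_LOCAL = ipaddress.ip_network("169.254.0.0/16")


def haip_count(active_private_adapters):
    """Number of HAIPs created for the adapters active at first-node startup."""
    if active_private_adapters < 1:
        raise ValueError("at least one active private adapter is required")
    if active_private_adapters <= 2:
        return active_private_adapters
    return 4  # more than two active adapters always yields four HAIPs


def is_link_local(address):
    """True if the address lies in the subnet that should be left to HAIP."""
    return ipaddress.ip_address(address) in LINK_LOCAL


for n in (1, 2, 3, 4):
    print(n, "adapter(s) ->", haip_count(n), "HAIP(s)")
print(is_link_local("169.254.37.12"), is_link_local("192.168.10.1"))
```

A check like is_link_local can be useful when planning addresses, since nothing else on the private network should be given an address from that range.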

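5. Does the RAC heartbeat of each business system's database need its own VLAN?
Oracle gives no explicit guidance on this. If specific security requirements call for it, you can isolate each one in its own VLAN, but a large number of small VLANs adds some management and configuration cost.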

This article is also published on the blog "xjsunjie" (51CTO).

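11gR2 RAC new features: SCAN, GNS, RAC One Node, and others

11gR2 RAC new features:

1. SCAN (Single Client Access Name):

An extra layer is placed between the client and RAC so that the RAC IP details are hidden from the client.
Access becomes simpler and more transparent: users no longer need to care which instance they connect to, and adding or removing a node requires no change on the client side.
The goal is scalability.
For example, a cloud infrastructure platform contains many devices and tolerates the failure of any of them (fault tolerance); devices can be added or removed internally without user intervention.
A failed device corresponds to a single point of failure in Oracle high availability terms, and this technique masks such failures.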

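2. GNS (Grid Naming Service)

Used together with SCAN. The RAC IP addresses do not have to be configured by hand; a DHCP service assigns them.

3. OCR and Voting on ASM storage

In 10g these files had to be placed on raw or block devices or on a cluster file system, which was awkward to manage. In 11gR2 they can be stored in ASM, which is easier to manage.

4. RAC One Node

Within a RAC architecture, a single-instance database can be created that runs on one node of the cluster.

5. Clusterware and ASM share the same Oracle Home

In 10g, Clusterware used CRS_HOME and ASM used ORACLE_HOME.
In 11gR2, Clusterware and ASM are packaged together as Grid Infrastructure (GI) and share one home. Failures in a cloud infrastructure platform are managed by the GI cluster software, which provides the redundancy.

6. Rebootless Restart: fewer machine reboots

In 10g, when a heartbeat problem left nodes unable to detect each other's state, one node was evicted (the split-brain case) and the evicted node was rebooted.
Because rebooting a machine is slow, 11g improved this: Oracle handles the evicted node by restarting its own services, normally without rebooting the machine.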

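7. SCAN in detail:

In 10g, the client's TNS file had to list the IP addresses of all instances, and the client connected directly to the listener of one particular instance.
In 11g the client no longer needs to do this. DNS sits in between as an extra layer: the client is configured with only the SCAN name, which DNS resolves to the SCAN addresses. The request then reaches a SCAN listener (the local listeners register with the SCAN listeners, which work much like a load balancer), and the SCAN listener hands the connection to a local listener.
When a cluster has a very large number of nodes, maintaining every address in the client TNS file becomes impractical. With this mechanism the client connects to one fixed name, which is simple and easy to scale.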
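The difference on the client side can be illustrated by generating the two styles of connect descriptor. This is a minimal Python sketch; the host names (rac1-vip, rac-scan.example.com), the service name orcl and port 1521 are placeholder assumptions, and the helper names are hypothetical.

```python
def address(host, port=1521):
    """One TNS ADDRESS clause for a TCP listener endpoint."""
    return f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"


def descriptor_per_node(vip_hosts, service):
    """10g style: every node's VIP must be listed in the client descriptor."""
    addresses = "".join(address(h) for h in vip_hosts)
    return (
        f"(DESCRIPTION=(ADDRESS_LIST=(LOAD_BALANCE=ON)(FAILOVER=ON){addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service})))"
    )


def descriptor_scan(scan_name, service):
    """11gR2 style: a single SCAN name, unchanged when nodes are added or removed."""
    return f"(DESCRIPTION={address(scan_name)}(CONNECT_DATA=(SERVICE_NAME={service})))"


print(descriptor_per_node(["rac1-vip", "rac2-vip", "rac3-vip"], "orcl"))
print(descriptor_scan("rac-scan.example.com", "orcl"))
```

Adding a fourth node changes the first descriptor on every client, while the SCAN descriptor stays the same.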

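8. RAC One Node in detail:

a. Online Database Relocation: several databases are installed across several nodes, all on shared storage. When server A is heavily loaded and server B lightly loaded, Oracle can use this feature to move a database instance from server A to server B, shifting the resource use.

b. Online Rolling Patches: patches are applied in a rolling fashion. The database instance is first moved away, the Oracle database software on the original server is patched, and the instance is then moved back.

c. Cluster Failover: when a machine crashes, this feature moves the database instance to another machine.
Original link: http://blog.csdn.net/q947817003/article/details/11558785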

12cR1 RAC-RAC DG broker error ORA-16698

When configuring Data Guard in Oracle 11g, we had to set

log_archive_dest_1=xxx

log_archive_dest_2=xxx

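From 12.1 onward, however, log_archive_dest_2 does not need to be set when the configuration is managed by the DG broker. If it is set, adding the database to the broker configuration fails with the error below. Clearing the parameter resolves it.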

ORA-16698: LOG_ARCHIVE_DEST_n parameter set for object to be added
 
Failed.

To clear LOG_ARCHIVE_DEST_n settings, use the ALTER SYSTEM SET LOG_ARCHIVE_DEST_n=" " SQL*Plus command.
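As a rough aid, the clean-up can be sketched in Python: given the current LOG_ARCHIVE_DEST_n values, produce the ALTER SYSTEM statements that clear the remote ones. This sketch assumes that only destinations carrying a SERVICE attribute need clearing, and it uses an empty string with SCOPE=BOTH SID='*' as the clearing form; the function name and the sample parameter values are hypothetical.

```python
import re


def clear_statements(params):
    """Return ALTER SYSTEM statements that clear remote (SERVICE=) destinations.

    params maps parameter names to their current values, e.g. as read from
    v$parameter. Local destinations (LOCATION=...) are left untouched.
    """
    statements = []
    for name in sorted(params):
        value = params[name] or ""
        is_dest = re.match(r"log_archive_dest_\d+$", name, re.IGNORECASE)
        is_remote = re.search(r"\bservice\s*=", value, re.IGNORECASE)
        if is_dest and is_remote:
            statements.append(f"ALTER SYSTEM SET {name}='' SCOPE=BOTH SID='*';")
    return statements


# Hypothetical settings carried over from an 11g-style manual configuration.
current = {
    "log_archive_dest_1": "LOCATION=USE_DB_RECOVERY_FILE_DEST",
    "log_archive_dest_2": "SERVICE=stby ASYNC DB_UNIQUE_NAME=stby",
}
for stmt in clear_statements(current):
    print(stmt)
```

The generated statements are meant to be reviewed and then run in SQL*Plus on the database being added to the broker configuration.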

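12cR2 RAC+RAC+ADG ORA-16854

I recently built a RAC+RAC+ADG+broker production system at a bank and ran into quite a few pitfalls, so I am recording them here. If you have questions, leave me a comment and we can discuss and learn together.

1. Environment
Oracle 12cR2 RAC+RAC ADG broker

2. A similar symptom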
DGMGRL> show configuration verbose

Configuration - FSF

Protection Mode: MaxAvailability
Members:
test_a - Primary database
standby_c - Physical standby database
standby_b - Physical standby database
Warning: ORA-16854: apply lag could not be determined
## Warning ORA-16854: the apply lag cannot be determined. This is exactly the symptom I ran into.
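When this check is run regularly, it can help to pull the ORA- messages out of the DGMGRL output automatically. A minimal Python sketch follows; the sample text mirrors the output above, and how the output is captured (for example from a scheduled dgmgrl call) is left open as an assumption.

```python
import re

# Sample text modeled on the "show configuration verbose" output shown above.
SAMPLE_OUTPUT = """\
Configuration - FSF

Protection Mode: MaxAvailability
Members:
test_a - Primary database
standby_c - Physical standby database
standby_b - Physical standby database
Warning: ORA-16854: apply lag could not be determined
"""


def find_ora_messages(output):
    """Return (code, message) pairs for every ORA-nnnnn line in the output."""
    return [
        (code, message.strip())
        for code, message in re.findall(r"(ORA-\d{5}):\s*(.+)", output)
    ]


for code, message in find_ora_messages(SAMPLE_OUTPUT):
    print(code, "->", message)
```

An empty result means the broker reported no ORA- warnings or errors in that output.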

3. Solution
MOS offers two approaches. Which one applies needs further analysis of the specific case.
1) It may be a bug
Workaround: set the ApplyLagThreshold=0 but this means you will not receive notifications of apply lag in broker for the specified Standby database:

DGMGRL> edit database <standby db_unique_name> set property ApplyLagThreshold=0;

Please download and apply existing fix for BUG 28803345 or open SR and request backport of BUG 28803345 to resolve the issue.
2) Recreate the control file
To solve this issue, recreate the standby controlfile.

Here are the steps to recreate the standby controlfile:

Steps to recreate a Physical Standby Controlfile (Doc ID 459411.1)

Please note in few cases apply lag in v$dataguard_stats was null in the database and Clearing ORLs in standby helped resolve the NULL value issue.

4. Open an SR
If you have a support contract, you can open an SR to get the problem confirmed.

Backup-based duplicate (RAC-RAC)

Environment

Production RAC (target):

ORACLE_SID=prodb

Test RAC (auxiliary):

ORACLE_SID=testdb

Oracle version: 11.2.0.1.0

1. On node1 of the production RAC, take a full backup of database prodb

RMAN> backup database include current controlfile plus archivelog delete all input;

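2. Copy the backup sets and the password file to /tmp/oracle on node1 of the test RAC.

3. On the test environment, prepare for the restore

SQL> alter system set cluster_database=false scope=spfile;

SQL> shutdown immediate;

SQL> startup nomount;

4. Restore the data

$ rman auxiliary /

RMAN> duplicate database to testdb backup location '/tmp/oracle';

5. Replace the test environment's password file with the production password file

6. On node1 of the test RAC, set cluster_database back to true

SQL> alter system set cluster_database=true scope=spfile;

7. Restart the instances on node1 and node2 of the test RAC

SQL> startup force;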
