How to diagnose HA out of sync messages


This section describes how to use the commands diagnose sys ha showcsum and diagnose debug to diagnose the cause of HA out of sync messages.

If HA synchronization is not successful, use the following procedures on each cluster unit to find the cause.

 

To determine why HA synchronization does not occur

1. Connect to each cluster unit CLI by connecting to the console port.

2. Enter the following commands to enable debugging and display HA out of sync messages.

diagnose debug enable

diagnose debug console timestamp enable
diagnose debug application hatalk -1
diagnose debug application hasync -1

Collect the console output and compare the out of sync messages with the information on page 203.

3. Enter the following commands to turn off debugging.

diagnose debug disable
diagnose debug reset

 

 

To determine what part of the configuration is causing the problem

If the previous procedure displays messages that include sync object 0x03 (for example, HA_SYNC_SETTING_CONFIGURATION = 0x03), there is a synchronization problem with the configuration. Use the following steps to determine the part of the configuration that is causing the problem.

If your cluster consists of two cluster units, use this procedure to capture the configuration checksums for each unit. If your cluster consists of more than two cluster units, repeat this procedure for all cluster units that returned messages that include 0x03 sync object messages.

1. Connect to each cluster unit CLI by connecting to the console port.

2. Enter the following command to enable debugging output:

diagnose debug enable

3. Enter the following command to stop HA synchronization.

execute ha sync stop

4. Enter the following command to display configuration checksums.

diagnose sys ha showcsum 1

5. Copy the output to a text file.

6. Repeat for all affected units.

7. Compare the text file from the primary unit with the text file from each cluster unit to find the checksums that do not match.

You can use a diff function to compare text files.

8. Repeat steps 4 to 7 for each checksum level:

diagnose sys ha showcsum 2
diagnose sys ha showcsum 3
diagnose sys ha showcsum 4
diagnose sys ha showcsum 5
diagnose sys ha showcsum 6
diagnose sys ha showcsum 7
diagnose sys ha showcsum 8

9. When the non-matching checksum is found, attempt to drill down further. This is possible for objects that have sub-components.

For example, you can enter the following commands:

diagnose sys ha showcsum system.global
diagnose sys ha showcsum system.interface

Generally it is the first non-matching checksum in one of the levels that is the cause of the synchronization problem.

10. Attempt to remove or change the part of the configuration that is causing the problem. You can do this by making configuration changes from the primary unit or subordinate unit CLI.

11. Enter the following commands to restart HA synchronization and stop debugging:

execute ha sync start
diagnose debug disable
diagnose debug reset
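If you saved the showcsum output from steps 4 to 7 as text files, a short script can narrow the comparison to just the lines that differ. The following is an illustrative sketch using Python's standard difflib module, not a Fortinet tool; the captured text is whatever you saved in step 5.

```python
# Sketch: isolate the differing lines between two "diagnose sys ha showcsum"
# captures. Uses only the Python standard library.
import difflib

def checksum_diff(primary_text, subordinate_text):
    """Return the added/removed lines from a unified diff of two captures."""
    diff = difflib.unified_diff(
        primary_text.splitlines(),
        subordinate_text.splitlines(),
        fromfile="primary", tofile="subordinate", lineterm="",
    )
    # Keep only real +/- lines, not the "+++"/"---" file headers.
    return [line for line in diff
            if line.startswith(("+", "-"))
            and not line.startswith(("+++", "---"))]

# Example with inline captures; in practice read the saved text files instead.
print(checksum_diff("a: 11\nb: 22", "a: 11\nb: 33"))
```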

 

Recalculating the checksums to resolve out of sync messages

Sometimes an error can occur when checksums are being calculated by the cluster. As a result of this calculation error the CLI console could display out of sync error messages even though the cluster is otherwise operating normally. You can also sometimes see checksum calculation errors in diagnose sys ha showcsum command output when the checksums listed in the debugzone output don’t match the checksums in the checksum part of the output.

One solution to this problem could be to re-calculate the checksums. The re-calculated checksums should match and the out of sync error messages should stop appearing.

You can use the following command to re-calculate HA checksums:

diagnose sys ha csum-recalculate [<vdom-name> | global]

Just entering the command without options recalculates all checksums. You can specify a VDOM name to just recalculate the checksums for that VDOM. You can also enter global to recalculate the global checksum.



Synchronizing the configuration


The FGCP uses a combination of incremental and periodic synchronization to make sure that the configuration of all cluster units is synchronized to that of the primary unit.

 

The following settings are not synchronized between cluster units:

  • HA override.
  • HA device priority.
  • The virtual cluster priority.
  • The FortiGate unit host name.
  • The HA priority setting for a ping server (or dead gateway detection) configuration.
  • The system interface settings of the HA reserved management interface.
  • The HA default route for the reserved management interface, set using the ha-mgmt-interface-gateway option of the config system ha command.

 

The primary unit synchronizes all other configuration settings, including the other HA configuration settings.

 

Disabling automatic configuration synchronization

In some cases you may want to use the following command to disable automatic synchronization of the primary unit configuration to all cluster units.

config system ha

set sync-config disable
end

When this option is disabled the cluster no longer synchronizes configuration changes. If a device failure occurs, the new primary unit may not have the same configuration as the failed primary unit. As a result, the new primary unit may process sessions differently or may not function on the network in the same way.

In most cases you should not disable automatic configuration synchronization. However, if you have disabled this feature you can use the execute ha synchronize command to manually synchronize a subordinate unit’s configuration to that of the primary unit.

You must enter execute ha synchronize commands from the subordinate unit that you want to synchronize with the primary unit. Use the execute ha manage command to access a subordinate unit CLI.

For example, to access the first subordinate unit and force a synchronization at any time, even if automatic synchronization is disabled, enter:

execute ha manage 0

execute ha synchronize start

You can use the following command to stop a synchronization that is in progress.

 

execute ha synchronize stop

 

Incremental synchronization

When you log into the cluster web-based manager or CLI to make configuration changes, you are actually logging into the primary unit. All of your configuration changes are first made to the primary unit. Incremental synchronization then immediately synchronizes these changes to all of the subordinate units.

When you log into a subordinate unit CLI (for example using execute ha manage) all of the configuration changes that you make to the subordinate unit are also immediately synchronized to all cluster units, including the primary unit, using the same process.

Incremental synchronization also synchronizes other dynamic configuration information such as the DHCP server address lease database, routing table updates, IPsec SAs, MAC address tables, and so on. See An introduction to the FGCP on page 1310 for more information about DHCP server address lease synchronization and Synchronizing kernel routing tables on page 1523 for information about routing table updates.

Whenever a change is made to a cluster unit configuration, incremental synchronization sends the same configuration change to all other cluster units over the HA heartbeat link. An HA synchronization process running on each cluster unit receives the configuration change and applies it to the cluster unit. The HA synchronization process makes the configuration change by entering a CLI command that appears to be entered by the administrator who made the configuration change in the first place.

Synchronization takes place silently, and no log messages are recorded about the synchronization activity. However, log messages can be recorded by the cluster units when the synchronization process enters CLI commands. You can see these log messages on the subordinate units if you enable event logging and set the minimum severity level to Information and then check the event log messages written by the cluster units when you make a configuration change.

You can also see these log messages on the primary unit if you make configuration changes from a subordinate unit.

 

Periodic synchronization

Incremental synchronization makes sure that as an administrator makes configuration changes, the configurations of all cluster units remain the same. However, a number of factors could cause one or more cluster units to go out of sync with the primary unit. For example, if you add a new unit to a functioning cluster, the configuration of this new unit will not match the configuration of the other cluster units. It's not practical to use incremental synchronization to change the configuration of the new unit.

Periodic synchronization is a mechanism that looks for synchronization problems and fixes them. Every minute the cluster compares the configuration file checksum of the primary unit with the configuration file checksums of each of the subordinate units. If all subordinate unit checksums are the same as the primary unit checksum, all cluster units are considered synchronized.

If one or more of the subordinate unit checksums is not the same as the primary unit checksum, the subordinate unit configuration is considered out of sync with the primary unit. The checksum of the out of sync subordinate unit is checked again every 15 seconds. This re-checking occurs in case the configurations are out of sync because an incremental configuration sequence has not completed. If the checksums do not match after 5 checks the subordinate unit that is out of sync retrieves the configuration from the primary unit. The subordinate unit then reloads its configuration and resumes operating as a subordinate unit with the same configuration as the primary unit.

The configuration of the subordinate unit is reset in this way because when a subordinate unit configuration gets out of sync with the primary unit configuration there is no efficient way to determine what the configuration differences are and to correct them. Resetting the subordinate unit configuration becomes the most efficient way to resynchronize the subordinate unit.
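The timing described above (a checksum comparison every minute, re-checks every 15 seconds, and a configuration reload after 5 failed re-checks) can be sketched as the following illustrative loop. This is not Fortinet code; the checksum and reload callables are placeholders for what the FGCP does internally.

```python
# Illustrative sketch of the periodic-synchronization logic: if a subordinate
# unit's checksum does not match the primary unit's, re-check every 15 seconds
# and, after 5 failed re-checks, retrieve and reload the configuration.
import time

RECHECK_INTERVAL = 15   # seconds between re-checks of an out-of-sync unit
MAX_RECHECKS = 5        # failed re-checks before the config is reloaded

def resync_if_needed(get_primary_csum, get_subordinate_csum, reload_config,
                     sleep=time.sleep):
    """Return True if a full configuration reload was triggered."""
    if get_subordinate_csum() == get_primary_csum():
        return False                  # cluster already in sync
    for _ in range(MAX_RECHECKS):
        sleep(RECHECK_INTERVAL)       # allow an in-flight incremental sync to finish
        if get_subordinate_csum() == get_primary_csum():
            return False              # an incremental change completed
    reload_config()                   # retrieve the configuration from the primary unit
    return True
```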

Synchronization requires that all cluster units run the same FortiOS firmware build. If some cluster units are running different firmware builds, then unstable cluster operation may occur and the cluster units may not be able to synchronize correctly.

Re-installing the firmware build running on the primary unit forces the primary unit to upgrade all cluster units to the same firmware build.

 

Console messages when configuration synchronization succeeds

When a cluster first forms, or when a new unit is added to a cluster as a subordinate unit, the following messages appear on the CLI console to indicate that the unit joined the cluster and had its configuration synchronized with the primary unit.

slave’s configuration is not in sync with master’s, sequence:0
slave’s configuration is not in sync with master’s, sequence:1
slave’s configuration is not in sync with master’s, sequence:2
slave’s configuration is not in sync with master’s, sequence:3
slave’s configuration is not in sync with master’s, sequence:4
slave starts to sync with master
logout all admin users
slave succeeded to sync with master

 

 

Console messages when configuration synchronization fails

If you connect to the console of a subordinate unit that is out of synchronization with the primary unit, messages similar to the following are displayed.

slave is not in sync with master, sequence:0. (type 0x3)
slave is not in sync with master, sequence:1. (type 0x3)
slave is not in sync with master, sequence:2. (type 0x3)
slave is not in sync with master, sequence:3. (type 0x3)
slave is not in sync with master, sequence:4. (type 0x3)
global compared not matched

If synchronization problems occur the console message sequence may be repeated over and over again. The messages all include a type value (in the example type 0x3). The type value can help Fortinet Support diagnose the synchronization problem.

 

HA out of sync object messages and the configuration objects that they reference

Out of Sync Message                      Configuration Object

HA_SYNC_SETTING_CONFIGURATION = 0x03     /data/config
HA_SYNC_SETTING_AV = 0x10
HA_SYNC_SETTING_VIR_DB = 0x11            /etc/vir
HA_SYNC_SETTING_SHARED_LIB = 0x12        /data/lib/libav.so
HA_SYNC_SETTING_SCAN_UNIT = 0x13         /bin/scanunitd
HA_SYNC_SETTING_IMAP_PRXY = 0x14         /bin/imapd
HA_SYNC_SETTING_SMTP_PRXY = 0x15         /bin/smtp
HA_SYNC_SETTING_POP3_PRXY = 0x16         /bin/pop3
HA_SYNC_SETTING_HTTP_PRXY = 0x17         /bin/thttp
HA_SYNC_SETTING_FTP_PRXY = 0x18          /bin/ftpd
HA_SYNC_SETTING_FCNI = 0x19              /etc/fcni.dat
HA_SYNC_SETTING_FDNI = 0x1a              /etc/fdnservers.dat
HA_SYNC_SETTING_FSCI = 0x1b              /etc/sci.dat
HA_SYNC_SETTING_FSAE = 0x1c              /etc/fsae_adgrp.cache
HA_SYNC_SETTING_IDS = 0x20               /etc/ids.rules
HA_SYNC_SETTING_IDSUSER_RULES = 0x21     /etc/idsuser.rules
HA_SYNC_SETTING_IDSCUSTOM = 0x22
HA_SYNC_SETTING_IDS_MONITOR = 0x23       /bin/ipsmonitor
HA_SYNC_SETTING_IDS_SENSOR = 0x24        /bin/ipsengine
HA_SYNC_SETTING_NIDS_LIB = 0x25          /data/lib/libips.so
HA_SYNC_SETTING_WEBLISTS = 0x30
HA_SYNC_SETTING_CONTENTFILTER = 0x31     /data/cmdb/webfilter.bword
HA_SYNC_SETTING_URLFILTER = 0x32         /data/cmdb/webfilter.urlfilter
HA_SYNC_SETTING_FTGD_OVRD = 0x33         /data/cmdb/webfilter.fgtd-ovrd
HA_SYNC_SETTING_FTGD_LRATING = 0x34      /data/cmdb/webfilter.fgtd-ovrd
HA_SYNC_SETTING_EMAILLISTS = 0x40
HA_SYNC_SETTING_EMAILCONTENT = 0x41      /data/cmdb/spamfilter.bword
HA_SYNC_SETTING_EMAILBWLIST = 0x42       /data/cmdb/spamfilter.emailbwl
HA_SYNC_SETTING_IPBWL = 0x43             /data/cmdb/spamfilter.ipbwl
HA_SYNC_SETTING_MHEADER = 0x44           /data/cmdb/spamfilter.mheader
HA_SYNC_SETTING_RBL = 0x45               /data/cmdb/spamfilter.rbl
HA_SYNC_SETTING_CERT_CONF = 0x50         /etc/cert/cert.conf
HA_SYNC_SETTING_CERT_CA = 0x51           /etc/cert/ca
HA_SYNC_SETTING_CERT_LOCAL = 0x52        /etc/cert/local
HA_SYNC_SETTING_CERT_CRL = 0x53          /etc/cert/crl
HA_SYNC_SETTING_DB_VER = 0x55
HA_GET_DETAIL_CSUM = 0x71
HA_SYNC_CC_SIG = 0x75                    /etc/cc_sig.dat
HA_SYNC_CC_OP = 0x76                     /etc/cc_op
HA_SYNC_CC_MAIN = 0x77                   /etc/cc_main
HA_SYNC_FTGD_CAT_LIST = 0x7a             /migadmin/webfilter/ublock/ftgd/data/
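When triaging console output, it can help to map the type code in an out of sync message back to the object in the table above. The following is an illustrative Python sketch, not a Fortinet tool; only a few table entries are reproduced for brevity.

```python
# Sketch: look up the "(type 0xNN)" code from an HA console message.
# Only a subset of the sync-object table is included here as an example.
import re

SYNC_OBJECTS = {
    0x03: ("HA_SYNC_SETTING_CONFIGURATION", "/data/config"),
    0x11: ("HA_SYNC_SETTING_VIR_DB", "/etc/vir"),
    0x20: ("HA_SYNC_SETTING_IDS", "/etc/ids.rules"),
    0x32: ("HA_SYNC_SETTING_URLFILTER", "/data/cmdb/webfilter.urlfilter"),
    0x50: ("HA_SYNC_SETTING_CERT_CONF", "/etc/cert/cert.conf"),
}

def describe_sync_type(message):
    """Extract 'type 0xNN' from a console message and describe the object."""
    m = re.search(r"type 0x([0-9a-fA-F]+)", message)
    if not m:
        return None
    code = int(m.group(1), 16)
    name, obj = SYNC_OBJECTS.get(code, ("unknown", "unknown"))
    return f"0x{code:02x} {name} ({obj})"
```

For example, the console message "slave is not in sync with master, sequence:0. (type 0x3)" maps to the configuration object /data/config.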

 

Comparing checksums of cluster units

You can use the diagnose sys ha showcsum command to compare the configuration checksums of all cluster units. The output of this command shows checksums labelled global and all as well as checksums for each of the VDOMs including the root VDOM. The get system ha-nonsync-csum command can be used to display similar information; however, this command is intended to be used by FortiManager.

The primary unit and subordinate unit checksums should be the same. If they are not you can use the execute ha synchronize command to force a synchronization.

The following command output is for the primary unit of a cluster that does not have multiple VDOMs enabled:

diagnose sys ha showcsum
is_manage_master()=1, is_root_master()=1
debugzone
global: a0 7f a7 ff ac 00 d5 b6 82 37 cc 13 3e 0b 9b 77
root: 43 72 47 68 7b da 81 17 c8 f5 10 dd fd 6b e9 57
all: c5 90 ed 22 24 3e 96 06 44 35 b6 63 7c 84 88 d5

checksum
global: a0 7f a7 ff ac 00 d5 b6 82 37 cc 13 3e 0b 9b 77
root: 43 72 47 68 7b da 81 17 c8 f5 10 dd fd 6b e9 57
all: c5 90 ed 22 24 3e 96 06 44 35 b6 63 7c 84 88 d5

The following command output is for a subordinate unit of the same cluster:

 

diagnose sys ha showcsum
is_manage_master()=0, is_root_master()=0
debugzone
global: a0 7f a7 ff ac 00 d5 b6 82 37 cc 13 3e 0b 9b 77
root: 43 72 47 68 7b da 81 17 c8 f5 10 dd fd 6b e9 57
all: c5 90 ed 22 24 3e 96 06 44 35 b6 63 7c 84 88 d5

checksum
global: a0 7f a7 ff ac 00 d5 b6 82 37 cc 13 3e 0b 9b 77
root: 43 72 47 68 7b da 81 17 c8 f5 10 dd fd 6b e9 57
all: c5 90 ed 22 24 3e 96 06 44 35 b6 63 7c 84 88 d5

The following example shows using this command for the primary unit of a cluster with multiple VDOMs. Two VDOMs have been added, named test and Eng_vdm. From the primary unit:

config global
diagnose sys ha showcsum
is_manage_master()=1, is_root_master()=1
debugzone
global: 65 75 88 97 2d 58 1b bf 38 d3 3d 52 5b 0e 30 a9
test: a5 16 34 8c 7a 46 d6 a4 1e 1f c8 64 ec 1b 53 fe
root: 3c 12 45 98 69 f2 d8 08 24 cf 02 ea 71 57 a7 01
Eng_vdm: 64 51 7c 58 97 79 b1 b3 b3 ed 5c ec cd 07 74 09
all: 30 68 77 82 a1 5d 13 99 d1 42 a3 2f 9f b9 15 53

checksum
global: 65 75 88 97 2d 58 1b bf 38 d3 3d 52 5b 0e 30 a9
test: a5 16 34 8c 7a 46 d6 a4 1e 1f c8 64 ec 1b 53 fe
root: 3c 12 45 98 69 f2 d8 08 24 cf 02 ea 71 57 a7 01
Eng_vdm: 64 51 7c 58 97 79 b1 b3 b3 ed 5c ec cd 07 74 09
all: 30 68 77 82 a1 5d 13 99 d1 42 a3 2f 9f b9 15 53

From the subordinate unit:

 

config global

diagnose sys ha showcsum
is_manage_master()=0, is_root_master()=0
debugzone
global: 65 75 88 97 2d 58 1b bf 38 d3 3d 52 5b 0e 30 a9
test: a5 16 34 8c 7a 46 d6 a4 1e 1f c8 64 ec 1b 53 fe
root: 3c 12 45 98 69 f2 d8 08 24 cf 02 ea 71 57 a7 01
Eng_vdm: 64 51 7c 58 97 79 b1 b3 b3 ed 5c ec cd 07 74 09
all: 30 68 77 82 a1 5d 13 99 d1 42 a3 2f 9f b9 15 53

checksum
global: 65 75 88 97 2d 58 1b bf 38 d3 3d 52 5b 0e 30 a9
test: a5 16 34 8c 7a 46 d6 a4 1e 1f c8 64 ec 1b 53 fe
root: 3c 12 45 98 69 f2 d8 08 24 cf 02 ea 71 57 a7 01
Eng_vdm: 64 51 7c 58 97 79 b1 b3 b3 ed 5c ec cd 07 74 09
all: 30 68 77 82 a1 5d 13 99 d1 42 a3 2f 9f b9 15 53
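Captured showcsum output can also be checked programmatically, including for the case described earlier where the debugzone checksums do not match the checksum-section values. The following is an illustrative sketch that assumes one zone per line, as in the outputs above; it is not a Fortinet tool.

```python
# Sketch: parse "diagnose sys ha showcsum" output and flag any zone whose
# debugzone value differs from its checksum-section value.
def parse_showcsum(output):
    """Return {'debugzone': {...}, 'checksum': {...}} maps of zone -> hex string."""
    sections, current = {}, None
    for line in output.splitlines():
        line = line.strip()
        if line in ("debugzone", "checksum"):
            current = sections.setdefault(line, {})
        elif current is not None and ": " in line:
            zone, _, csum = line.partition(": ")
            current[zone] = csum
    return sections

def mismatched_zones(output):
    """Return the zones where debugzone and checksum values disagree."""
    s = parse_showcsum(output)
    dz, cs = s.get("debugzone", {}), s.get("checksum", {})
    return [zone for zone in dz if cs.get(zone) != dz[zone]]
```

If this function reports mismatches within a single unit's output, that is the calculation-error symptom that diagnose sys ha csum-recalculate is meant to resolve.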



Diagnosing packet loss with two FortiGate HA clusters in the same broadcast domain


A network may experience packet loss when two FortiGate HA clusters have been deployed in the same broadcast domain. Deploying two HA clusters in the same broadcast domain can result in packet loss because of MAC address conflicts. The packet loss can be diagnosed by pinging from one cluster to the other or by pinging both of the clusters from a device within the broadcast domain. You can resolve the MAC address conflict by changing the HA Group ID configuration of the two clusters. The HA Group ID is sometimes also called the Cluster ID.

This section describes a topology that can result in packet loss, how to determine if packets are being lost, and how to correct the problem by changing the HA Group ID.

Packet loss on a network can also be caused by IP address conflicts. Finding and fixing IP address conflicts can be difficult. However, if you are experiencing packet loss and your network contains two FortiGate HA clusters you can use the information in this article to eliminate one possible source of packet loss.

 

Changing the HA group ID to avoid MAC address conflicts

Change the Group ID to change the virtual MAC address of all cluster interfaces. You can change the Group ID from the FortiGate CLI using the following command:

config system ha

set group-id <id_integer>

end

 

Example topology

The topology below shows two clusters. The Cluster_1 internal interfaces and the Cluster_2 port 1 interfaces are both connected to the same broadcast domain. In this topology the broadcast domain could be an internal network. Both clusters could also be connected to the Internet or to different networks.

 

Example HA topology with possible MAC address conflicts

 

Ping testing for packet loss

If the network is experiencing packet loss, it is possible that you will not notice a problem unless you are constantly pinging both HA clusters. During normal operation of the network you also might not notice packet loss because the loss rate may not be severe enough to time out TCP sessions. Also, many common types of TCP traffic, such as web browsing, may not be greatly affected by packet loss. However, packet loss can have a significant effect on real time protocols that deliver audio and video data.

To test for packet loss you can set up two constant ping sessions, one to each cluster. If packet loss is occurring the two ping sessions should show alternating replies and timeouts from each cluster.

 

Cluster_1          Cluster_2

reply              timeout
reply              timeout
reply              timeout
timeout            reply
timeout            reply
reply              timeout
reply              timeout
timeout            reply
timeout            reply
timeout            reply
timeout            reply
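Given recorded results from the two ping sessions, the alternating pattern above can be detected automatically. The following is an illustrative sketch; the result lists are assumed to be captured externally (for example, from two concurrent ping loops), with True meaning a reply and False a timeout.

```python
# Sketch: flag the alternating reply/timeout pattern that suggests a
# MAC-address conflict between two clusters in the same broadcast domain.
def looks_like_mac_flap(cluster1_results, cluster2_results):
    """Return True if replies consistently alternate between the clusters."""
    pairs = list(zip(cluster1_results, cluster2_results))
    if not pairs:
        return False
    # In a MAC-address conflict, each ping interval tends to reach exactly
    # one of the two clusters: one replies while the other times out.
    exclusive = sum(1 for a, b in pairs if a != b)
    return exclusive / len(pairs) > 0.9
```

The 0.9 threshold is an arbitrary illustrative choice; occasional simultaneous replies or timeouts from other causes should not trigger the check.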

 

Viewing MAC address conflicts on attached switches

If two HA clusters with the same virtual MAC address are connected to the same broadcast domain (L2 switch or hub), the MAC address will conflict and bounce between the two clusters. This example Cisco switch MAC address table shows the MAC address flapping between different interfaces (1/0/1 and 1/0/4).

  • 0009.0f09.0002 DYNAMIC Gi1/0/1
  • 0009.0f09.0002 DYNAMIC Gi1/0/4


Displaying the virtual MAC address


Every FortiGate unit physical interface has two MAC addresses: the current hardware address and the permanent hardware address. The permanent hardware address cannot be changed, it is the actual MAC address of the interface hardware. The current hardware address can be changed. The current hardware address is the address seen by the network. For a FortiGate unit not operating in HA, you can use the following command to change the current hardware address of the port1 interface:

config system interface
edit port1
set macaddr <mac_address>
end

For an operating cluster, the current hardware address of each cluster unit interface is changed to the HA virtual MAC address by the FGCP. The macaddr option is not available for a functioning cluster. You cannot change an interface MAC address and you cannot view MAC addresses from the system interface CLI command.

You can use the get hardware nic <interface_name_str> command to display both MAC addresses for any FortiGate interface. This command displays hardware information for the specified interface. Depending on their hardware configuration, this command may display different information for different interfaces. You can use this command to display the current hardware address as Current_HWaddr and the permanent hardware address as Permanent_HWaddr. For some interfaces the current hardware address is displayed as MAC. The command displays a great deal of information about the interface so you may have to scroll the output to find the hardware addresses.

You can also use the diagnose hardware deviceinfo nic <interface_str> command to display both MAC addresses for any FortiGate interface.

Before HA configuration the current and permanent hardware addresses are the same. For example for one of the units in Cluster_1:

FGT60B3907503171 # get hardware nic internal

.

.

.

MAC: 02:09:0f:78:18:c9

Permanent_HWaddr: 02:09:0f:78:18:c9

.

.

.

During HA operation the current hardware address becomes the HA virtual MAC address, for example for the units in Cluster_1:

FGT60B3907503171 # get hardware nic internal

.

.

.

MAC: 00:09:0f:09:00:02

Permanent_HWaddr: 02:09:0f:78:18:c9

.

.

.

The following command output for Cluster_2 shows the same current hardware address for port1 as for the internal interface of Cluster_1, indicating a MAC address conflict.

FG300A2904500238 # get hardware nic port1

.

.

.

MAC: 00:09:0f:09:00:02

Permanent_HWaddr: 00:09:0F:85:40:FD

.

.

.
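If you capture the get hardware nic output for interfaces on both clusters, a short script can extract the two addresses and confirm the conflict. This is an illustrative sketch, not a Fortinet tool; the field names follow the sample output above.

```python
# Sketch: pull the current and permanent hardware addresses out of captured
# "get hardware nic <interface>" output and compare two captures.
import re

def extract_macs(nic_output):
    """Return (current, permanent) MAC addresses, or None for missing fields."""
    current = re.search(r"^(?:MAC|Current_HWaddr):\s*([0-9a-fA-F:]{17})",
                        nic_output, re.MULTILINE)
    permanent = re.search(r"^Permanent_HWaddr:\s*([0-9a-fA-F:]{17})",
                          nic_output, re.MULTILINE)
    return (current.group(1) if current else None,
            permanent.group(1) if permanent else None)

def same_virtual_mac(output_a, output_b):
    """True when two interfaces report the same current (virtual) MAC."""
    a, b = extract_macs(output_a)[0], extract_macs(output_b)[0]
    return a is not None and a == b
```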



How the virtual MAC address is determined


The virtual MAC address is determined based on the following formula:

00-09-0f-09-<group-id_hex>-(<vcluster_integer> + <idx>)

where <group-id_hex> is the HA Group ID for the cluster converted to hexadecimal. The following table lists the virtual MAC address set for each group ID.

 

HA group ID in integer and hexadecimal format

Integer Group ID                         Hexadecimal Group ID

0                                        00
1                                        01
2                                        02
3                                        03
4                                        04
…                                        …
10                                       0a
11                                       0b
…                                        …
63                                       3f
…                                        …
255                                      ff

 

<vcluster_integer> is 0 for virtual cluster 1 and 20 (hexadecimal) for virtual cluster 2. If virtual domains are not enabled, HA sets the virtual cluster to 1 and by default all interfaces are in the root virtual domain. Including virtual cluster and virtual domain factors in the virtual MAC address formula means that the same formula can be used whether or not virtual domains and virtual clustering are enabled.

<idx> is the index number of the interface. Interfaces are numbered from 0 to x (where x is the number of interfaces). Interfaces are numbered according to their hash map order. See Interface index and display order on page 1503. The first interface has an index of 0. The second interface in the list has an index of 1 and so on.

Only the <idx> part of the virtual MAC address is different for each interface. The <vcluster_integer> would be different for different interfaces if multiple VDOMs have been added.

Between FortiOS releases interface indexing may change so the virtual MAC addresses assigned to individual FortiGate interfaces may also change.
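The formula above can be sketched as a small helper. This is an illustrative calculation, not a Fortinet utility; the interface index must come from the unit's own hash map ordering, which (as noted above) can change between releases.

```python
# Sketch of the virtual MAC formula:
# 00-09-0f-09-<group-id_hex>-(<vcluster_integer> + <idx>)
def ha_virtual_mac(group_id, idx, vcluster=1):
    """Return the HA virtual MAC address as a dash-separated string.

    group_id: integer HA group ID (0-255), converted to hex in the address.
    idx:      interface index in the unit's hash map ordering.
    vcluster: 1 or 2; virtual cluster 2 adds a 0x20 offset to the last byte.
    """
    vcluster_offset = 0x00 if vcluster == 1 else 0x20
    return "00-09-0f-09-{:02x}-{:02x}".format(group_id & 0xFF,
                                              (vcluster_offset + idx) & 0xFF)
```

For example, with the default group ID of 0 and interface index 0 this yields 00-09-0f-09-00-00, matching the port1 address in the example below.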

 

Example virtual MAC addresses

An HA cluster with HA group ID unchanged (default=0) and virtual domains not enabled would have the following virtual MAC addresses for interfaces port1 to port12:

  • port1 virtual MAC: 00-09-0f-09-00-00
  • port10 virtual MAC: 00-09-0f-09-00-01
  • port2 virtual MAC: 00-09-0f-09-00-02
  • port3 virtual MAC: 00-09-0f-09-00-03
  • port4 virtual MAC: 00-09-0f-09-00-04
  • port5 virtual MAC: 00-09-0f-09-00-05
  • port6 virtual MAC: 00-09-0f-09-00-06
  • port7 virtual MAC: 00-09-0f-09-00-07
  • port8 virtual MAC: 00-09-0f-09-00-08
  • port9 virtual MAC: 00-09-0f-09-00-09
  • port11 virtual MAC: 00-09-0f-09-00-0a
  • port12 virtual MAC: 00-09-0f-09-00-0b

 

If the group ID is changed to 34 these virtual MAC addresses change to:

  • port1 virtual MAC: 00-09-0f-09-22-00
  • port10 virtual MAC: 00-09-0f-09-22-01
  • port2 virtual MAC: 00-09-0f-09-22-02
  • port3 virtual MAC: 00-09-0f-09-22-03
  • port4 virtual MAC: 00-09-0f-09-22-04
  • port5 virtual MAC: 00-09-0f-09-22-05
  • port6 virtual MAC: 00-09-0f-09-22-06
  • port7 virtual MAC: 00-09-0f-09-22-07
  • port8 virtual MAC: 00-09-0f-09-22-08
  • port9 virtual MAC: 00-09-0f-09-22-09
  • port11 virtual MAC: 00-09-0f-09-22-0a
  • port12 virtual MAC: 00-09-0f-09-22-0b

A cluster with virtual domains enabled where the HA group ID has been changed to 23 (17 hexadecimal), port5 and port6 are in the root virtual domain (which is in virtual cluster 1), and port7 and port8 are in the vdom_1 virtual domain (which is in virtual cluster 2) would have the following virtual MAC addresses:

  • port5 interface virtual MAC: 00-09-0f-09-17-05
  • port6 interface virtual MAC: 00-09-0f-09-17-06
  • port7 interface virtual MAC: 00-09-0f-09-17-27
  • port8 interface virtual MAC: 00-09-0f-09-17-28

Having trouble configuring your Fortinet hardware or have some questions you need answered? Check Out The Fortinet Guru Youtube Channel! Want someone else to deal with it for you? Get some consulting from Fortinet GURU!

Disabling gratuitous ARP packets after a failover


You can use the following command to turn off sending gratuitous ARP packets after a failover:

config system ha

set gratuitous-arps disable
end

Sending gratuitous ARP packets is turned on by default.

In most cases you would want to send gratuitous ARP packets because it's a reliable way for the cluster to notify the network to send traffic to the new primary unit. However, in some cases, sending gratuitous ARP packets may be less optimal. For example, if you have a cluster of FortiGate units in Transparent mode, after a failover the new primary unit will send gratuitous ARP packets to all of the addresses in its Forwarding Database (FDB). If the FDB has a large number of addresses it may take extra time to send all the packets and the sudden burst of traffic could disrupt the network.

If you choose to disable sending gratuitous ARP packets you must first enable the link-failed-signal setting. The cluster must have some way of informing attached network devices that a failover has occurred.

For more information about the link-failed-signal setting, see Updating MAC forwarding tables when a link failover occurs on page 1531.



Cluster virtual MAC addresses


When a cluster is operating, the FGCP assigns virtual MAC addresses to each primary unit interface. HA uses virtual MAC addresses so that if a failover occurs, the new primary unit interfaces will have the same virtual MAC addresses and IP addresses as the failed primary unit. As a result, most network equipment would identify the new primary unit as the exact same device as the failed primary unit.

If the MAC addresses changed after a failover, the network would take longer to recover because all attached network devices would have to learn the new MAC addresses before they could communicate with the cluster.

If a cluster is operating in NAT/Route mode, the FGCP assigns a different virtual MAC address to each primary unit interface. VLAN subinterfaces are assigned the same virtual MAC address as the physical interface that the VLAN subinterface is added to. Redundant interfaces or 802.3ad aggregate interfaces are assigned the virtual MAC address of the first interface in the redundant or aggregate list.

If a cluster is operating in Transparent mode, the FGCP assigns a virtual MAC address for the primary unit management IP address. Since you can connect to the management IP address from any interface, all of the FortiGate interfaces appear to have the same virtual MAC address.

A MAC address conflict can occur if two clusters are operating on the same network. See Diagnosing packet loss with two FortiGate HA clusters in the same broadcast domain on page 1512 for more information.

Subordinate unit MAC addresses do not change. You can verify this by connecting to the subordinate unit CLI and using the get hardware interface nic command to display the MAC addresses of each FortiGate interface.
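For example, assuming a two-unit cluster, you could connect to the subordinate unit from the primary unit CLI and then check an interface. The subordinate unit ID and the port1 interface name below are illustrative, and the exact form of the hardware command varies by FortiOS release (on many releases it is get hardware nic <interface>):

```
execute ha manage 1      # connect to the subordinate unit CLI (ID is illustrative)
get hardware nic port1   # the Current_HWaddr field shows the address in use
```

On a subordinate unit the current address should match the interface's permanent (factory) MAC address rather than a cluster virtual MAC address.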

The MAC address of a reserved management interface is not changed to a virtual MAC address. Instead the reserved management interface keeps its original MAC address.
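As a sketch, reserving a management interface looks roughly like the following. The port9 interface name is illustrative, and the exact option names depend on the FortiOS release (some releases use an ha-mgmt-interfaces table instead of a single setting):

```
config system ha
    set ha-mgmt-status enable
    set ha-mgmt-interface port9
end
```

The reserved interface keeps its original MAC address and can be used to manage each cluster unit individually.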

When the new primary unit is selected after a failover, the primary unit sends gratuitous ARP packets to update the devices connected to the cluster interfaces (usually layer-2 switches) with the virtual MAC address. Gratuitous ARP packets configure connected network devices to associate the cluster virtual MAC addresses and cluster IP address with primary unit physical interfaces and with the layer-2 switch physical interfaces. Using gratuitous ARP packets (also called GARP packets) in this way is sometimes called training the network. The gratuitous ARP packets sent from the primary unit are intended to make sure that the layer-2 switch forwarding databases (FDBs) are updated as quickly as possible.

Sending gratuitous ARP packets is not required for routers and hosts on the network because the new primary unit will have the same MAC and IP addresses as the failed primary unit. However, since the new primary unit interfaces are connected to different switch interfaces than the failed primary unit, many network switches will update their FDBs more quickly after a failover if the new primary unit sends gratuitous ARP packets.

 

Changing how the primary unit sends gratuitous ARP packets after a failover

When a failover occurs it is important that the devices connected to the primary unit update their FDBs as quickly as possible to reestablish traffic forwarding.

Depending on your network configuration, you may be able to change the number of gratuitous ARP packets and the time interval between ARP packets to reduce the cluster failover time.

You cannot disable sending gratuitous ARP packets, but you can use the following command to change the number of packets that are sent. For example, enter the following command to send 20 gratuitous ARP packets:

config system ha
    set arps 20
end

You can use this command to configure the primary unit to send from 1 to 60 ARP packets. Usually you would not change the default setting of 5. In some cases, however, you might want to reduce the number of gratuitous ARP packets. For example, because gratuitous ARP packets are broadcast, a cluster with a large number of VLAN interfaces and virtual domains can generate a lot of network traffic by sending a higher number of gratuitous ARP packets. As long as the cluster still fails over successfully, you could reduce the number of gratuitous ARP packets that are sent to reduce the amount of traffic produced after a failover.

If failover is taking longer than expected, you may be able to reduce the failover time by increasing the number of gratuitous ARP packets sent.

You can also use the following command to change the time interval in seconds between gratuitous ARP packets. For example, enter the following command to change the time between ARP packets to 3 seconds:

config system ha
    set arps-interval 3
end

The time interval can be in the range of 1 to 20 seconds. The default is 8 seconds between gratuitous ARP packets. Normally you would not need to change the time interval. However, you could decrease the interval to send more packets in less time if your cluster takes a long time to fail over.

There may also be a number of reasons to set the interval higher. For example, because gratuitous ARP packets are broadcast, a cluster with a large number of VLAN interfaces and virtual domains can generate a lot of network traffic by sending them. As long as the cluster still fails over successfully, you could increase the interval to reduce the amount of traffic produced after a failover.

For more information about gratuitous ARP packets see RFC 826 and RFC 3927.



HA heartbeat and communication between cluster units

The HA heartbeat keeps cluster units communicating with each other. The heartbeat consists of hello packets that are sent at regular intervals by the heartbeat interface of all cluster units. These hello packets describe the state of the cluster unit and are used by other cluster units to keep all cluster units synchronized.

HA heartbeat packets are non-TCP packets that use Ethertype values 0x8890, 0x8891, and 0x8893. The default time interval between HA heartbeats is 200 ms. The FGCP uses link-local IPv4 addresses in the 169.254.0.x range for HA heartbeat interface IP addresses.
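The heartbeat timing is configurable. A minimal sketch, assuming the hb-interval (in units of 100 ms) and hb-lost-threshold settings available in most FortiOS releases; the values shown are the usual defaults:

```
config system ha
    set hb-interval 2         # 2 x 100 ms = 200 ms between heartbeat packets
    set hb-lost-threshold 6   # missed heartbeats before a unit is considered failed
end
```

Lowering hb-interval detects failures faster at the cost of more heartbeat traffic; raising hb-lost-threshold makes the cluster more tolerant of congested heartbeat links.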

For best results, isolate the heartbeat devices from your user networks by connecting the heartbeat devices to a separate switch that is not connected to any network. If the cluster consists of two FortiGate units you can connect the heartbeat device interfaces directly using a crossover cable. Heartbeat packets contain sensitive information about the cluster configuration. Heartbeat packets may also use a considerable amount of network bandwidth. For these reasons, it is preferable to isolate heartbeat packets from your user networks.

On startup, a FortiGate unit configured for HA operation broadcasts HA heartbeat hello packets from its HA heartbeat interface to find other FortiGate units configured to operate in HA mode. If two or more FortiGate units operating in HA mode connect with each other, they compare HA configurations (HA mode, HA password, and HA group ID). If the HA configurations match, the units negotiate to form a cluster.

While the cluster is operating, the HA heartbeat confirms that all cluster units are functioning normally. The heartbeat also reports the state of all cluster units, including the communication sessions that they are processing.
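For two FortiGate units to negotiate and form a cluster, the HA settings compared during negotiation must match on both units. A minimal sketch, in which the mode, group ID, group name, password, and heartbeat interface names are all illustrative values:

```
config system ha
    set mode a-p
    set group-id 25
    set group-name "example_cluster"
    set password ExamplePassword
    set hbdev port3 50 port4 50
end
```

Entered on both units with the heartbeat interfaces connected, a configuration like this allows the units to find each other and negotiate to form a cluster.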

 

Heartbeat interfaces

A heartbeat interface is an Ethernet network interface in a cluster that is used by the FGCP for HA heartbeat communications between cluster units.

To change the HA heartbeat configuration go to System > HA and select the FortiGate interfaces to use as HA heartbeat interfaces.

Do not use a switch port for the HA heartbeat traffic. This configuration is not supported.

From the CLI, enter the following command to make port4 and port5 HA heartbeat interfaces and give both interfaces a heartbeat priority of 150:

 

config system ha
    set hbdev port4 150 port5 150
end

The following example shows how to change the default heartbeat interface configuration so that the port4 and port1 interfaces can be used for HA heartbeat communication and to give the port4 interface the highest heartbeat priority so that port4 is the preferred HA heartbeat interface.

 

config system ha
    set hbdev port4 100 port1 50
end

By default, for most FortiGate models two interfaces are configured to be heartbeat interfaces. You can change the heartbeat interface configuration as required. For example you can select additional or different heartbeat interfaces. You can also select only one heartbeat interface.

In addition to selecting the heartbeat interfaces, you also set the Priority for each heartbeat interface. In all cases, the heartbeat interface with the highest priority is used for all HA heartbeat communication. If the interface fails or becomes disconnected, the selected heartbeat interface that has the next highest priority handles all heartbeat communication.

If more than one heartbeat interface has the same priority, the heartbeat interface with the highest priority that is also highest in the heartbeat interface list is used for all HA heartbeat communication. If this interface fails or becomes disconnected, the selected heartbeat interface with the highest priority that is next highest in the list handles all heartbeat communication.

The default heartbeat interface configuration sets the priority of two heartbeat interfaces to 50. You can accept the default heartbeat interface configuration if one or both of the default heartbeat interfaces are connected. You can select different heartbeat interfaces, select more heartbeat interfaces and change heartbeat priorities according to your requirements.

For the HA cluster to function correctly, you must select at least one heartbeat interface, and this interface of all of the cluster units must be connected together. If heartbeat communication is interrupted and cannot fail over to a second heartbeat interface, the cluster units will not be able to communicate with each other and more than one cluster unit may become a primary unit. As a result, the cluster stops functioning normally because multiple devices on the network may be operating as primary units with the same IP and MAC addresses, creating a kind of split-brain scenario.

The heartbeat interface priority range is 0 to 512. The default priority when you select a new heartbeat interface is 0. The higher the number, the higher the priority.

In most cases you can maintain the default heartbeat interface configuration as long as you can connect the heartbeat interfaces together. Configuring HA heartbeat interfaces is the same for virtual clustering and for standard HA clustering.

You can enable heartbeat communications for physical interfaces, but not for VLAN subinterfaces, IPsec VPN interfaces, redundant interfaces, or for 802.3ad aggregate interfaces. You cannot select these types of interfaces in the heartbeat interface list.

Selecting more heartbeat interfaces increases reliability. If a heartbeat interface fails or is disconnected, the HA heartbeat fails over to the next heartbeat interface.

You can select up to 8 heartbeat interfaces. This limit only applies to FortiGate units with more than 8 physical interfaces.

HA heartbeat traffic can use a considerable amount of network bandwidth. If possible, enable HA heartbeat traffic on interfaces used only for HA heartbeat traffic or on interfaces connected to less busy networks.

