Troubleshooting SD-WAN – FortiOS 6.2

Troubleshooting

Tracking SD-WAN sessions

Check the destination interface in FortiView to see which port traffic is being forwarded to.

The example below demonstrates a source-based load-balance between two SD-WAN members.

  • If the source IP address is an even number, it will go to port13.
  • If the source IP address is an odd number, it will go to port12.
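As a rough illustration of the even/odd behavior in this example, the following Python sketch maps a source address to a member by the parity of its integer value. The function name and the parity rule are assumptions drawn only from this two-member example; the sketch does not claim to reproduce the hashing the FortiGate actually performs.

```python
import ipaddress

# Hypothetical mapping taken from the example above:
# even source address -> port13, odd source address -> port12.
MEMBERS = {0: "port13", 1: "port12"}

def expected_member(source_ip: str) -> str:
    """Return the member this example says an IPv4 source should use."""
    value = int(ipaddress.IPv4Address(source_ip))
    return MEMBERS[value % 2]

print(expected_member("10.1.100.10"))
print(expected_member("10.1.100.11"))
```

Comparing the result with the destination interface shown in FortiView is a quick way to spot sessions that did not follow the expected member.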

For information on other features of FortiView, see the FortiView topic.

Understanding SD-WAN related logs

This topic lists the SD-WAN related logs and explains when the logs will be triggered.

Health-check detects a failure:

  • When health-check detects a failure, it will record a log:

34: date=2019-03-23 time=17:26:06 logid="0100022921" type="event" subtype="system" level="critical" vd="root" eventtime=1553387165 logdesc="Routing information changed" name="test" interface="R150" status="down" msg="Static route on interface R150 may be removed by health-check test. Route: (10.100.1.2->10.100.2.22 ping-down)"

  • When health-check detects a recovery, it will record a log:

32: date=2019-03-23 time=17:26:54 logid="0100022921" type="event" subtype="system" level="critical" vd="root" eventtime=1553387214 logdesc="Routing information changed" name="test" interface="R150" status="up" msg="Static route on interface R150 may be added by health-check test. Route: (10.100.1.2->10.100.2.22 ping-up)"
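Because every log entry is a sequence of key=value pairs, the entries can be turned into dictionaries for filtering. The sketch below is one minimal way to do that with the Python standard library; the function name parse_log is hypothetical, and the quote normalization is only there in case the log text was copied from a web page that replaced the ASCII quotes.

```python
import shlex

def parse_log(line: str) -> dict:
    """Split a key=value log line into a dictionary of strings."""
    # Replace typographic quotes with ASCII quotes so shlex can group values.
    for bad in ("\u201c", "\u201d", "\u2033"):
        line = line.replace(bad, '"')
    fields = {}
    for token in shlex.split(line):
        # Tokens without '=' (such as the leading "34:") are skipped.
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = value
    return fields

sample = ('34: date=2019-03-23 time=17:26:06 logid="0100022921" type="event" '
          'subtype="system" level="critical" vd="root" eventtime=1553387165 '
          'logdesc="Routing information changed" name="test" interface="R150" '
          'status="down" msg="Static route on interface R150 may be removed by '
          'health-check test. Route: (10.100.1.2->10.100.2.22 ping-down)"')
record = parse_log(sample)
print(record["interface"], record["status"])
```

Note that shlex treats an apostrophe as a quote character, so a message containing one would need extra handling.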

Health-check has an SLA target and detects SLA qualification changes:

  • When a health-check with an SLA target detects that a member no longer meets the SLA:

5: date=2019-04-11 time=11:48:39 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555008519816639290 logdesc="Virtual WAN Link status" msg="SD-WAN Health Check(ping) SLA(1): number of pass members changes from 2 to 1."

  • When a health-check with an SLA target detects that a member meets the SLA again:

2: date=2019-04-11 time=11:49:46 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555008586149038471 logdesc="Virtual WAN Link status" msg="SD-WAN Health Check(ping) SLA(1): number of pass members changes from 1 to 2."
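The direction of an SLA change can be read from the two numbers in the msg field. The following sketch extracts them with a regular expression; the pattern is written against the two sample messages above only, and the function name sla_trend is hypothetical.

```python
import re

PASS_CHANGE = re.compile(
    r"SD-WAN Health Check\((?P<check>[^)]+)\) SLA\((?P<sla>\d+)\): "
    r"number of pass members changes from (?P<old>\d+) to (?P<new>\d+)\."
)

def sla_trend(msg: str):
    """Return (health-check name, SLA id, direction), or None if msg does not match."""
    m = PASS_CHANGE.search(msg)
    if m is None:
        return None
    old, new = int(m["old"]), int(m["new"])
    if new < old:
        direction = "degraded"
    elif new > old:
        direction = "recovered"
    else:
        direction = "unchanged"
    return m["check"], int(m["sla"]), direction

print(sla_trend("SD-WAN Health Check(ping) SLA(1): "
                "number of pass members changes from 2 to 1."))
```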

SD-WAN calculates a link’s session/bandwidth over/under its ratio and stops/resumes traffic:

  • When SD-WAN calculates a link’s session/bandwidth over its configured ratio and stops forwarding traffic:

3: date=2019-04-10 time=17:15:40 logid="0100022924" type="event" subtype="system" level="notice" vd="root" eventtime=1554941740185866628 logdesc="Virtual WAN Link volume status" interface="R160" msg="The member(3) enters into conservative status with limited ablity to receive new sessions for too much traffic."

  • When SD-WAN calculates a link’s session/bandwidth according to its ratio and resumes forwarding traffic:

1: date=2019-04-10 time=17:20:39 logid="0100022924" type="event" subtype="system" level="notice" vd="root" eventtime=1554942040196041728 logdesc="Virtual WAN Link volume status" interface="R160" msg="The member(3) resume normal status to receive new sessions for internal adjustment."

The SLA mode service rule’s SLA qualified member changes:

  • When the SLA mode service rule’s SLA qualified member changes. In this example R150 fails the SLA check, but is still alive:

14: date=2019-03-23 time=17:44:12 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388252 logdesc="Virtual WAN Link status" msg="Service2() prioritized by SLA will be redirected in seq-num order 2(R160) 1(R150)."

15: date=2019-03-23 time=17:44:12 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388252 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) SLA order changed from 1 to 2. "

16: date=2019-03-23 time=17:44:12 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388252 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) SLA order changed from 2 to 1. "

  • When the SLA mode service rule’s SLA qualified member changes. In this example R150 changes from fail to pass:

1: date=2019-03-23 time=17:46:05 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388365 logdesc="Virtual WAN Link status" msg="Service2() prioritized by SLA will be redirected in seq-num order 1(R150) 2(R160)."

2: date=2019-03-23 time=17:46:05 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388365 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) SLA order changed from 1 to 2. "

3: date=2019-03-23 time=17:46:05 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388365 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) SLA order changed from 2 to 1. "

The priority mode service rule member’s link status changes:

  • When priority mode service rule member’s link status changes. In this example R150 changes to better than R160, and both are still alive:

1: date=2019-03-23 time=17:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387603 logdesc="Virtual WAN Link status" msg="Service2() prioritized by packet-loss will be redirected in seq-num order 1(R150) 2(R160)."

2: date=2019-03-23 time=17:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387603 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) link quality packet-loss order changed from 1 to 2."

3: date=2019-03-23 time=17:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387603 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) link quality packet-loss order changed from 2 to 1. "

  • When priority mode service rule member’s link status changes. In this example R160 changes to better than R150, and both are still alive:

6: date=2019-03-23 time=17:32:01 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387520 logdesc="Virtual WAN Link status" msg="Service2() prioritized by packet-loss will be redirected in seq-num order 2(R160) 1(R150)."

7: date=2019-03-23 time=17:32:01 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387520 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) link quality packet-loss order changed from 1 to 2."

8: date=2019-03-23 time=17:32:01 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387520 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) link quality packet-loss order changed from 2 to 1. "

SD-WAN member is used in service and it fails the health-check:

  • When SD-WAN member fails the health-check, it will stop forwarding traffic:

6: date=2019-04-11 time=13:33:21 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555014801844089814 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) link is unreachable or miss threshold. Stop forwarding traffic. "

  • When SD-WAN member passes the health-check again, it will resume forwarding traffic:

2: date=2019-04-11 time=13:33:36 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555014815914643626 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) link is available. Start forwarding traffic. "

Load-balance mode service rule’s SLA qualified member changes:

  • When load-balance mode service rule’s SLA qualified member changes. In this example R150 changes to not meet SLA:

2: date=2019-04-11 time=14:11:16 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555017075926510687 logdesc="Virtual WAN Link status" msg="Service1(rule2) will be load balanced among members 2(R160) with available routing."

3: date=2019-04-11 time=14:11:16 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555017075926508676 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) SLA order changed from 1 to 2. "

4: date=2019-04-11 time=14:11:16 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555017075926507182 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) SLA order changed from 2 to 1. "

  • When load-balance mode service rule’s SLA qualified member changes. In this example R150 changes to meet SLA:

1: date=2019-04-11 time=14:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555017075926510668 logdesc="Virtual WAN Link status" msg="Service1(rule2) will be load balanced among members 1(R150) 2(R160) with available routing."

2: date=2019-03-23 time=14:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387603592651068 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) link quality packet-loss order changed from 1 to 2."

3: date=2019-03-23 time=14:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387603592651068 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) link quality packet-loss order changed from 2 to 1. "

SLA link status logs, generated with interval sla-fail-log-period or sla-pass-log-period:

  • When SLA fails, SLA link status logs will be generated with interval sla-fail-log-period:

7: date=2019-03-23 time=17:45:54 logid="0100022925" type="event" subtype="system" level="notice" vd="root" eventtime=1553388352 logdesc="Link monitor SLA information" name="test" interface="R150" status="up" msg="Latency: 0.016, jitter: 0.002, packet loss: 21.000%, inbandwidth: 0Mbps, outbandwidth: 200Mbps, bibandwidth: 200Mbps, sla_map: 0x0"

  • When SLA passes, SLA link status logs will be generated with interval sla-pass-log-period:

5: date=2019-03-23 time=17:46:05 logid="0100022925" type="event" subtype="system" level="information" vd="root" eventtime=1553388363 logdesc="Link monitor SLA information" name="test" interface="R150" status="up" msg="Latency: 0.017, jitter: 0.003, packet loss: 0.000%, inbandwidth: 0Mbps, outbandwidth: 200Mbps, bibandwidth: 200Mbps, sla_map: 0x1"
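The msg field of the SLA information logs carries the measured values and an sla_map bitmask. The sketch below pulls both out of the message text. It assumes that bit n of sla_map being set means SLA target n+1 is met, which is consistent with the two samples above (0x0 on failure, 0x1 on pass) but is an interpretation, not something this topic states. The function name is hypothetical.

```python
import re

SLA_MSG = re.compile(
    r"Latency: (?P<latency>[\d.]+), jitter: (?P<jitter>[\d.]+), "
    r"packet loss:\s*(?P<loss>[\d.]+)%.*sla_map: (?P<sla_map>0x[0-9a-fA-F]+)",
    re.DOTALL,
)

def parse_sla_msg(msg: str) -> dict:
    """Extract latency, jitter, packet loss, and met SLA targets from msg."""
    m = SLA_MSG.search(msg)
    if m is None:
        raise ValueError("not an SLA information message")
    sla_map = int(m["sla_map"], 16)
    return {
        "latency": float(m["latency"]),
        "jitter": float(m["jitter"]),
        "packet_loss_pct": float(m["loss"]),
        # Assumption: bit n set means SLA target n+1 is currently met.
        "sla_targets_met": [bit + 1 for bit in range(32) if (sla_map >> bit) & 1],
    }

failing = ("Latency: 0.016, jitter: 0.002, packet loss: 21.000%, inbandwidth: 0Mbps, "
           "outbandwidth: 200Mbps, bibandwidth: 200Mbps, sla_map: 0x0")
print(parse_sla_msg(failing))
```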

SD-WAN related diagnose commands

This topic lists the SD-WAN related diagnose commands and related output.

To check SD-WAN health-check status:

FGT # diagnose sys virtual-wan-link health-check

Health Check(server):

Seq(1): state(alive), packet-loss(0.000%) latency(15.247), jitter(5.231) sla_map=0x0

Seq(2): state(alive), packet-loss(0.000%) latency(13.621), jitter(6.905) sla_map=0x0

FGT # diagnose sys virtual-wan-link health-check

Health Check(ping):

Seq(1): state(alive), packet-loss(0.000%) latency(0.683), jitter(0.082) sla_map=0x0

Seq(2): state(dead), packet-loss(100.000%) sla_map=0x0

FGT # diagnose sys virtual-wan-link health-check google

Health Check(google):

Seq(1): state(alive), packet-loss(0.000%) latency(14.563), jitter(4.334) sla_map=0x0

Seq(2): state(alive), packet-loss(0.000%) latency(12.633), jitter(6.265) sla_map=0x0
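When the health-check output is captured from many devices, dead members can be picked out automatically. The following sketch scans the Seq(...) entries shown above; the pattern is based only on these samples, where a dead member reports no latency or jitter, and the function name is hypothetical.

```python
import re

SEQ_LINE = re.compile(
    r"Seq\((?P<seq>\d+)\): state\((?P<state>\w+)\), "
    r"packet-loss\((?P<loss>[\d.]+)%\)"
    r"(?: latency\((?P<latency>[\d.]+)\), jitter\((?P<jitter>[\d.]+)\))?"
)

def dead_members(output: str) -> list:
    """Return the sequence numbers of members whose state is not 'alive'."""
    return [int(m["seq"]) for m in SEQ_LINE.finditer(output)
            if m["state"] != "alive"]

ping_output = """Health Check(ping):
Seq(1): state(alive), packet-loss(0.000%) latency(0.683), jitter(0.082) sla_map=0x0
Seq(2): state(dead), packet-loss(100.000%) sla_map=0x0
"""
print(dead_members(ping_output))
```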

To check SD-WAN member status:

  • When SD-WAN load-balance mode is source-ip-based/source-dest-ip-based.

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port13, gateway: 10.100.1.1 2004:10:100:1::1, priority: 0, weight: 0

Member(2): interface: port15, gateway: 10.100.1.5 2004:10:100:1::5, priority: 0, weight: 0

  • When SD-WAN load-balance mode is weight-based.

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port13, gateway: 10.100.1.1 2004:10:100:1::1, priority: 0, weight: 33

Member(2): interface: port15, gateway: 10.100.1.5 2004:10:100:1::5, priority: 0, weight: 66

  • When SD-WAN load-balance mode is measured-volume-based.

    • Both members are under volume and still have room:

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port13, gateway: 10.100.1.1 2004:10:100:1::1, priority: 0, weight: 33

Config volume ratio: 33, last reading: 8211734579B, volume room 33MB

Member(2): interface: port15, gateway: 10.100.1.5 2004:10:100:1::5, priority: 0, weight: 66

Config volume ratio: 66, last reading: 24548159B, volume room 66MB

    • Some members are overloaded and some still have room:

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port1, gateway: 10.10.0.2, priority: 0, weight: 0

Config volume ratio: 10, last reading: 10297221000B, overload volume 1433MB

Member(2): interface: port2, gateway: 10.11.0.2, priority: 0, weight: 38

Config volume ratio: 50, last reading: 45944239916B, volume room 38MB

  • When SD-WAN load-balance mode is usage-based/spillover.

    • When no spillover occurs:

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port13, gateway: 10.100.1.1 2004:10:100:1::1, priority: 0, weight: 255

Egress-spillover-threshold: 400kbit/s, ingress-spillover-threshold: 300kbit/s

Egress-overbps=0, ingress-overbps=0

Member(2): interface: port15, gateway: 10.100.1.5 2004:10:100:1::5, priority: 0, weight: 254

Egress-spillover-threshold: 0kbit/s, ingress-spillover-threshold: 0kbit/s

Egress-overbps=0, ingress-overbps=0

    • When member has reached limit and spillover occurs:

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port13, gateway: 10.100.1.1 2004:10:100:1::1, priority: 0, weight: 255

Egress-spillover-threshold: 400kbit/s, ingress-spillover-threshold: 300kbit/s

Egress-overbps=1, ingress-overbps=1

Member(2): interface: port15, gateway: 10.100.1.5 2004:10:100:1::5, priority: 0, weight: 254

Egress-spillover-threshold: 0kbit/s, ingress-spillover-threshold: 0kbit/s

Egress-overbps=0, ingress-overbps=0

  • You can also use the diagnose netlink dstmac list command to check if you are over the limit.

FGT # diag netlink dstmac list port13

dev=port13 mac=08:5b:0e:ca:94:9d rx_tcp_mss=0 tx_tcp_mss=0 egress_overspill_threshold=51200 egress_bytes=103710 egress_over_bps=1 ingress_overspill_threshold=38400 ingress_bytes=76816 ingress_over_bps=1 sampler_rate=0
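The thresholds in this output appear to be the configured spillover thresholds expressed in bytes per second: 400 kbit/s × 1024 / 8 = 51200 and 300 kbit/s × 1024 / 8 = 38400. The sketch below reproduces that arithmetic and reads the over-limit flags from the output line. The conversion factor is inferred from these sample values rather than taken from product documentation, and all function names are hypothetical.

```python
def kbit_to_bytes_per_sec(kbit_per_sec: int) -> int:
    """Convert kbit/s to bytes/s, assuming 1 kbit = 1024 bits."""
    return kbit_per_sec * 1024 // 8

def parse_dstmac(line: str) -> dict:
    """Turn one 'key=value key=value ...' output line into a dictionary."""
    return dict(tok.split("=", 1) for tok in line.split() if "=" in tok)

def over_limit(line: str) -> dict:
    """Report whether the egress and ingress over-limit flags are set."""
    fields = parse_dstmac(line)
    return {
        "egress": fields.get("egress_over_bps") == "1",
        "ingress": fields.get("ingress_over_bps") == "1",
    }

line = ("dev=port13 mac=08:5b:0e:ca:94:9d rx_tcp_mss=0 tx_tcp_mss=0 "
        "egress_overspill_threshold=51200 egress_bytes=103710 egress_over_bps=1 "
        "ingress_overspill_threshold=38400 ingress_bytes=76816 ingress_over_bps=1 "
        "sampler_rate=0")
print(kbit_to_bytes_per_sec(400), kbit_to_bytes_per_sec(300))
print(over_limit(line))
```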

To check SD-WAN service rules status:

  • Manual mode service rules.

FGT # diagnose sys virtual-wan-link service

Service(1): Address Mode(IPV4) flags=0x0

TOS(0x0/0x0), Protocol(0: 1->65535), Mode(manual)

Members:

1: Seq_num(2), alive, selected

Dst address: 10.100.21.0-10.100.21.255

  • Auto mode service rules.

FGT # diagnose sys virtual-wan-link service

Service(1): Address Mode(IPV4) flags=0x0

TOS(0x0/0x0), Protocol(0: 1->65535), Mode(auto), link-cost-factor(latency), link-cost-threshold(10), health-check(ping)

Members:

1: Seq_num(2), alive, latency: 0.011

2: Seq_num(1), alive, latency: 0.018, selected

Dst address: 10.100.21.0-10.100.21.255

  • Priority mode service rules.

FGT # diagnose sys virtual-wan-link service

Service(1): Address Mode(IPV4) flags=0x0

TOS(0x0/0x0), Protocol(0: 1->65535), Mode(priority), link-cost-factor(latency), link-cost-threshold(10), health-check(ping)

Members:

1: Seq_num(2), alive, latency: 0.011, selected

2: Seq_num(1), alive, latency: 0.017, selected

Dst address: 10.100.21.0-10.100.21.255

  • Load-balance mode service rules.

FGT # diagnose sys virtual-wan-link service

Service(1): Address Mode(IPV4) flags=0x0

TOS(0x0/0x0), Protocol(0: 1->65535), Mode(load-balance)

Members:

1: Seq_num(1), alive, sla(0x1), num of pass(1), selected

2: Seq_num(2), alive, sla(0x1), num of pass(1), selected

Dst address: 10.100.21.0-10.100.21.255

  • SLA mode service rules.

FGT # diagnose sys virtual-wan-link service

Service(1): Address Mode(IPV4) flags=0x0

TOS(0x0/0x0), Protocol(0: 1->65535), Mode(sla)

Members:

1: Seq_num(1), alive, sla(0x1), cfg_order(0), cost(0), selected

2: Seq_num(2), alive, sla(0x1), cfg_order(1), cost(0), selected

Dst address: 10.100.21.0-10.100.21.255
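In each of the service outputs above, the members that the rule currently forwards to are marked selected. A small sketch such as the following can list them from captured output. It assumes one member entry per line, as in the examples, and the function name is hypothetical.

```python
import re

# Matches member entries such as "1: Seq_num(2), alive, latency: 0.011, selected".
MEMBER = re.compile(r"^\s*\d+: Seq_num\((\d+)\), (\w+)(.*)$")

def selected_members(output: str) -> list:
    """Return the Seq_num of every member entry flagged as selected."""
    selected = []
    for line in output.splitlines():
        m = MEMBER.match(line)
        if m and "selected" in m.group(3):
            selected.append(int(m.group(1)))
    return selected

auto_mode = """Service(1): Address Mode(IPV4) flags=0x0
Members:
1: Seq_num(2), alive, latency: 0.011
2: Seq_num(1), alive, latency: 0.018, selected
Dst address: 10.100.21.0-10.100.21.255
"""
print(selected_members(auto_mode))
```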

To check interface logs from the past 15 minutes:

FGT (root) # diagnose sys virtual-wan-link intf-sla-log R150

Timestamp: Fri Apr 12 11:08:36 2019, used inbandwidth: 0bps, used outbandwidth: 0bps, used bibandwidth: 0bps, tx bytes: 860bytes, rx bytes: 1794bytes.

Timestamp: Fri Apr 12 11:08:46 2019, used inbandwidth: 1761bps, used outbandwidth: 1710bps, used bibandwidth: 3471bps, tx bytes: 2998bytes, rx bytes: 3996bytes.

Timestamp: Fri Apr 12 11:08:56 2019, used inbandwidth: 2452bps, used outbandwidth: 2566bps, used bibandwidth: 5018bps, tx bytes: 7275bytes, rx bytes: 7926bytes.

Timestamp: Fri Apr 12 11:09:06 2019, used inbandwidth: 2470bps, used outbandwidth: 3473bps, used bibandwidth: 5943bps, tx bytes: 13886bytes, rx bytes: 11059bytes.

Timestamp: Fri Apr 12 11:09:16 2019, used inbandwidth: 2433bps, used outbandwidth: 3417bps, used bibandwidth: 5850bps, tx bytes: 17946bytes, rx bytes: 13960bytes.

Timestamp: Fri Apr 12 11:09:26 2019, used inbandwidth: 2450bps, used outbandwidth: 3457bps, used bibandwidth: 5907bps, tx bytes: 22468bytes, rx bytes: 17107bytes.
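In this sample, used bibandwidth equals the sum of the inbound and outbound figures (for example, 1761 + 1710 = 3471). The sketch below extracts the three values from each entry so that relationship, or a peak value, can be checked over a longer capture. The function name is hypothetical, and the pattern is written against the sample output only.

```python
import re

SAMPLE = re.compile(
    r"used inbandwidth: (\d+)bps, used outbandwidth: (\d+)bps, "
    r"used bibandwidth: (\d+)bps"
)

def bandwidth_samples(output: str) -> list:
    """Return (in_bps, out_bps, bi_bps) tuples from intf-sla-log output."""
    return [tuple(map(int, m.groups())) for m in SAMPLE.finditer(output)]

capture = (
    "Timestamp: Fri Apr 12 11:08:46 2019, used inbandwidth: 1761bps, used outbandwidth: "
    "1710bps, used bibandwidth: 3471bps, tx bytes: 2998bytes, rx bytes: 3996bytes.\n"
    "Timestamp: Fri Apr 12 11:08:56 2019, used inbandwidth: 2452bps, used outbandwidth: "
    "2566bps, used bibandwidth: 5018bps, tx bytes: 7275bytes, rx bytes: 7926bytes.\n"
)
samples = bandwidth_samples(capture)
print(max(bi for _, _, bi in samples))
```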

To check SLA logs in the past 15 minutes:

FGT (root) # diagnose sys virtual-wan-link sla-log ping 1

Timestamp: Fri Apr 12 11:09:27 2019, vdom root, health-check ping, interface: R150, status: up, latency: 0.014, jitter: 0.003, packet loss: 16.000%.

Timestamp: Fri Apr 12 11:09:28 2019, vdom root, health-check ping, interface: R150, status: up, latency: 0.015, jitter: 0.003, packet loss: 15.000%.

Timestamp: Fri Apr 12 11:09:28 2019, vdom root, health-check ping, interface: R150, status: up, latency: 0.014, jitter: 0.003, packet loss: 14.000%.

Timestamp: Fri Apr 12 11:09:29 2019, vdom root, health-check ping, interface: R150, status: up, latency: 0.015, jitter: 0.003, packet loss: 13.000%.
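To summarize a longer SLA log capture, the packet loss values can be averaged. This is a minimal sketch with a hypothetical function name. It matches on the "packet loss:" label only, so it works whether or not an entry is wrapped across lines.

```python
import re
from statistics import mean

LOSS = re.compile(r"packet loss:\s*([\d.]+)%")

def average_loss(output: str) -> float:
    """Return the mean packet loss percentage across all entries in output."""
    values = [float(v) for v in LOSS.findall(output)]
    if not values:
        raise ValueError("no packet loss values found")
    return mean(values)

capture = (
    "status: up, latency: 0.014, jitter: 0.003, packet loss: 16.000%.\n"
    "status: up, latency: 0.015, jitter: 0.003, packet loss: 15.000%.\n"
    "status: up, latency: 0.014, jitter: 0.003, packet loss: 14.000%.\n"
    "status: up, latency: 0.015, jitter: 0.003, packet loss: 13.000%.\n"
)
print(average_loss(capture))
```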

To check application control used in SD-WAN and the matching IP addresses:

FGT # diagnose sys virtual-wan-link internet-service-app-ctrl-list

Ctrl application(Microsoft.Authentication 41475):Internet Service ID(4294836224)

Protocol(6), Port(443)

Address(2): 104.42.72.21 131.253.61.96

Ctrl application(Microsoft.CDN 41470):Internet Service ID(4294836225)

Ctrl application(Microsoft.Lync 28554):Internet Service ID(4294836226)

Ctrl application(Microsoft.Office.365 33182):Internet Service ID(4294836227)

Ctrl application(Microsoft.Office.365.Portal 41468):Internet Service ID(4294836228)

Ctrl application(Microsoft.Office.Online 16177):Internet Service ID(4294836229)

Ctrl application(Microsoft.OneNote 40175):Internet Service ID(4294836230)

Ctrl application(Microsoft.Portal 41469):Internet Service ID(4294836231)

Protocol(6), Port(443)

Address(8): 23.58.134.172 131.253.33.200 23.58.135.29 204.79.197.200 64.4.54.254 23.59.156.241 13.77.170.218 13.107.22.200

Ctrl application(Microsoft.Sharepoint 16190):Internet Service ID(4294836232)

Ctrl application(Microsoft.Sway 41516):Internet Service ID(4294836233)

Ctrl application(Microsoft.Tenant.Namespace 41471):Internet Service ID(4294836234)

To check IPsec aggregate interface when SD-WAN uses the per-packet distribution feature:

# diagnose sys ipsec-aggregate list

agg1 algo=L3 member=2 run_tally=2

members:

vd1-p1

vd1-p2

To check BGP learned routes and determine if they are used in SD-WAN service:

FGT # get router info bgp network

FGT # get router info bgp network 10.100.11.0

BGP routing table entry for 10.100.10.0/24

Paths: (2 available, best 1, table Default-IP-Routing-Table)

Advertised to non peer-group peers:

172.10.22.2

20

10.100.20.2 from 10.100.20.2 (6.6.6.6)

Origin EGP metric 200, localpref 100, weight 10000, valid, external, best

Community: 30:5

Last update: Wed Mar 20 18:45:17 2019

FGT # get router info route-map-address

Extend-tag: 15, interface(wan2:16)

10.100.11.0/255.255.255.0

FGT # diagnose firewall proute list

list route policy info(vf=root):

id=4278779905 vwl_service=1(DataCenter) flags=0x0 tos=0x00 tos_mask=0x00 protocol=0 sport=0:65535 iif=0 dport=1-65535 oif=16

source wildcard(1): 0.0.0.0/0.0.0.0

destination wildcard(1): 10.100.11.0/255.255.255.0
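The final check is whether the prefix learned through BGP and listed by route-map-address falls inside the destination of the SD-WAN policy route. The following sketch performs that comparison with Python's ipaddress module, which accepts the address/netmask notation printed by these commands. The function name covered_by is hypothetical.

```python
import ipaddress

def covered_by(prefix: str, policy_destination: str) -> bool:
    """Return True if prefix lies entirely within policy_destination."""
    route = ipaddress.IPv4Network(prefix)
    destination = ipaddress.IPv4Network(policy_destination)
    return route.subnet_of(destination)

# Values taken from the route-map-address and proute outputs above.
print(covered_by("10.100.11.0/255.255.255.0", "10.100.11.0/255.255.255.0"))
```

A result of False for a prefix you expect SD-WAN to steer suggests that the route tag or the service rule's destination needs another look.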

 


About Mike

Michael Pruett, CISSP has a wide range of cyber-security and network engineering expertise. The plethora of vendors that resell hardware but have zero engineering knowledge resulting in the wrong hardware or configuration being deployed is a major pet peeve of Michael's. This site was started in an effort to spread information while providing the option of quality consulting services at a much lower price than Fortinet Professional Services. Owns PacketLlama.Com (Fortinet Hardware Sales) and Office Of The CISO, LLC (Cybersecurity consulting firm).
