Troubleshooting SD-WAN – FortiOS 6.2

Troubleshooting

Tracking SD-WAN sessions

To see which port traffic is being forwarded to, check the destination interface in FortiView.

The example below demonstrates a source-based load-balance between two SD-WAN members.

  • If the source IP address is an even number, it will go to port13.
  • If the source IP address is an odd number, it will go to port12.
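The even/odd split in this example can be written as a quick check to compare against what FortiView shows. The following is a minimal Python sketch that only mirrors the example above: it assumes parity is taken over the whole IPv4 address, the port names are those of the example, and expected_port is a hypothetical helper. It does not reproduce the hashing that FortiOS actually performs.

```python
import ipaddress

# Port names taken from the example above; other deployments will differ.
EVEN_PORT = "port13"
ODD_PORT = "port12"


def expected_port(source_ip: str) -> str:
    """Return the port the example predicts for a source IPv4 address."""
    value = int(ipaddress.IPv4Address(source_ip))
    return EVEN_PORT if value % 2 == 0 else ODD_PORT


print(expected_port("10.1.100.22"))
print(expected_port("10.1.100.11"))
```

The two sample addresses are illustrative only; substitute the source addresses seen in FortiView.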

For information on other features of FortiView, see FortiView on page 91.

Understanding SD-WAN related logs

This topic lists the SD-WAN related logs and explains when the logs will be triggered.

Health-check detects a failure:

  • When health-check detects a failure, it will record a log:

34: date=2019-03-23 time=17:26:06 logid="0100022921" type="event" subtype="system" level="critical" vd="root" eventtime=1553387165 logdesc="Routing information changed" name="test" interface="R150" status="down" msg="Static route on interface R150 may be removed by health-check test. Route: (10.100.1.2->10.100.2.22 ping-down)"

  • When health-check detects a recovery, it will record a log:

32: date=2019-03-23 time=17:26:54 logid="0100022921" type="event" subtype="system" level="critical" vd="root" eventtime=1553387214 logdesc="Routing information changed" name="test" interface="R150" status="up" msg="Static route on interface R150 may be added by health-check test. Route: (10.100.1.2->10.100.2.22 ping-up)"
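When many of these events need to be filtered, it can help to split each line into its key=value fields. The following Python sketch is one way to do that, assuming the raw log uses plain ASCII double quotes and contains no apostrophes; parse_log is a hypothetical helper, not a Fortinet tool.

```python
import shlex


def parse_log(line: str) -> dict:
    """Split a key=value event log line into a dictionary of strings."""
    # Drop the leading display index (for example "34:") if present.
    head, sep, rest = line.partition(": ")
    if sep and head.isdigit():
        line = rest
    fields = {}
    for token in shlex.split(line):  # shlex keeps quoted values together
        key, eq, value = token.partition("=")
        if eq:
            fields[key] = value
    return fields


sample = ('34: date=2019-03-23 time=17:26:06 logid="0100022921" type="event" '
          'subtype="system" level="critical" vd="root" eventtime=1553387165 '
          'logdesc="Routing information changed" name="test" interface="R150" '
          'status="down" msg="Static route on interface R150 may be removed by '
          'health-check test. Route: (10.100.1.2->10.100.2.22 ping-down)"')

entry = parse_log(sample)
print(entry["interface"], entry["status"])
```

Filtering on logid and status is then a plain dictionary lookup.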

Health-check has an SLA target and detects SLA qualification changes:

  • When a health-check has an SLA target and a member changes from pass to fail:

5: date=2019-04-11 time=11:48:39 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555008519816639290 logdesc="Virtual WAN Link status" msg="SD-WAN Health Check(ping) SLA(1): number of pass members changes from 2 to 1."

  • When a health-check has an SLA target and a member changes from fail to pass:

2: date=2019-04-11 time=11:49:46 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555008586149038471 logdesc="Virtual WAN Link status" msg="SD-WAN Health Check(ping) SLA(1): number of pass members changes from 1 to 2."
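The direction of an SLA change can be read from the two counts in the message text. The following Python sketch extracts them with a regular expression written against the two sample messages above; the pattern and the pass_member_delta helper are assumptions and may need adjusting for other firmware versions.

```python
import re

# Pattern derived from the sample messages in this topic.
PASS_CHANGE = re.compile(
    r"SD-WAN Health Check\((?P<check>[^)]+)\) SLA\((?P<sla>\d+)\): "
    r"number of pass members changes from (?P<before>\d+) to (?P<after>\d+)\."
)


def pass_member_delta(msg: str):
    """Return (health-check name, SLA id, change in passing members) or None."""
    m = PASS_CHANGE.search(msg)
    if m is None:
        return None
    return m["check"], int(m["sla"]), int(m["after"]) - int(m["before"])


print(pass_member_delta(
    "SD-WAN Health Check(ping) SLA(1): number of pass members changes from 2 to 1."
))
```

A negative value means a member dropped out of SLA; a positive value means a member recovered.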

SD-WAN calculates a link’s session/bandwidth over/under its ratio and stops/resumes traffic:

  • When SD-WAN calculates a link’s session/bandwidth over its configured ratio and stops forwarding traffic:

3: date=2019-04-10 time=17:15:40 logid="0100022924" type="event" subtype="system" level="notice" vd="root" eventtime=1554941740185866628 logdesc="Virtual WAN Link volume status" interface="R160" msg="The member(3) enters into conservative status with limited ablity to receive new sessions for too much traffic."

  • When SD-WAN calculates a link’s session/bandwidth according to its ratio and resumes forwarding traffic:

1: date=2019-04-10 time=17:20:39 logid="0100022924" type="event" subtype="system" level="notice" vd="root" eventtime=1554942040196041728 logdesc="Virtual WAN Link volume status" interface="R160" msg="The member(3) resume normal status to receive new sessions for internal adjustment."

The SLA mode service rule’s SLA qualified member changes:

  • When the SLA mode service rule’s SLA qualified member changes. In this example R150 fails the SLA check, but is still alive:

14: date=2019-03-23 time=17:44:12 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388252 logdesc="Virtual WAN Link status" msg="Service2() prioritized by SLA will be redirected in seq-num order 2(R160) 1(R150)."

15: date=2019-03-23 time=17:44:12 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388252 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) SLA order changed from 1 to 2. "

16: date=2019-03-23 time=17:44:12 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388252 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) SLA order changed from 2 to 1. "

  • When the SLA mode service rule’s SLA qualified member changes. In this example R150 changes from fail to pass:

1: date=2019-03-23 time=17:46:05 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388365 logdesc="Virtual WAN Link status" msg="Service2() prioritized by SLA will be redirected in seq-num order 1(R150) 2(R160)."

2: date=2019-03-23 time=17:46:05 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388365 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) SLA order changed from 1 to 2. "

3: date=2019-03-23 time=17:46:05 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553388365 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) SLA order changed from 2 to 1. "
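The "redirected in seq-num order" message lists the members in their new preference order. The following Python sketch pulls that order out of the message text; the regular expression is written against the samples in this topic and redirect_order is a hypothetical helper.

```python
import re

# Matches entries such as "2(R160)"; optional whitespace tolerates wrapped text.
ORDER = re.compile(r"(\d+)\s*\(([^)]+)\)")


def redirect_order(msg: str) -> list:
    """Return [(seq_num, interface), ...] in the order listed in the message."""
    # Only look at the text after "seq-num order" so "Service2()" is skipped.
    _, _, tail = msg.partition("seq-num order")
    return [(int(seq), name) for seq, name in ORDER.findall(tail)]


msg = ("Service2() prioritized by SLA will be redirected "
       "in seq-num order 2(R160) 1(R150).")
print(redirect_order(msg))
```

The first tuple in the result is the member that the rule now prefers.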

The priority mode service rule member’s link status changes:

  • When priority mode service rule member’s link status changes. In this example R150 changes to better than R160, and both are still alive:

1: date=2019-03-23 time=17:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387603 logdesc="Virtual WAN Link status" msg="Service2() prioritized by packet-loss will be redirected in seq-num order 1(R150) 2(R160)."

2: date=2019-03-23 time=17:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387603 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) link quality packet-loss order changed from 1 to 2. "

3: date=2019-03-23 time=17:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387603 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) link quality packet-loss order changed from 2 to 1. "

  • When priority mode service rule member’s link status changes. In this example R160 changes to better than R150, and both are still alive:

6: date=2019-03-23 time=17:32:01 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387520 logdesc="Virtual WAN Link status" msg="Service2() prioritized by packet-loss will be redirected in seq-num order 2(R160) 1(R150)."

7: date=2019-03-23 time=17:32:01 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387520 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) link quality packet-loss order changed from 1 to 2. "

8: date=2019-03-23 time=17:32:01 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387520 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) link quality packet-loss order changed from 2 to 1. "

SD-WAN member is used in service and it fails the health-check:

  • When an SD-WAN member fails the health-check, it stops forwarding traffic:

6: date=2019-04-11 time=13:33:21 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555014801844089814 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) link is unreachable or miss threshold. Stop forwarding traffic. "

  • When the SD-WAN member passes the health-check again, it resumes forwarding traffic:

2: date=2019-04-11 time=13:33:36 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555014815914643626 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) link is available. Start forwarding traffic. "

Load-balance mode service rule’s SLA qualified member changes:

  • When load-balance mode service rule’s SLA qualified member changes. In this example R150 changes to not meet SLA:

2: date=2019-04-11 time=14:11:16 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555017075926510687 logdesc="Virtual WAN Link status" msg="Service1(rule2) will be load balanced among members 2(R160) with available routing."

3: date=2019-04-11 time=14:11:16 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555017075926508676 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) SLA order changed from 1 to 2. "

4: date=2019-04-11 time=14:11:16 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555017075926507182 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) SLA order changed from 2 to 1. "

  • When load-balance mode service rule’s SLA qualified member changes. In this example R150 changes to meet SLA:

1: date=2019-04-11 time=14:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1555017075926510668 logdesc="Virtual WAN Link status" msg="Service1(rule2) will be load balanced among members 1(R150) 2(R160) with available routing."

2: date=2019-03-23 time=14:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387603592651068 logdesc="Virtual WAN Link status" interface="R160" msg="The member2(R160) link quality packet-loss order changed from 1 to 2. "

3: date=2019-03-23 time=14:33:23 logid="0100022923" type="event" subtype="system" level="notice" vd="root" eventtime=1553387603592651068 logdesc="Virtual WAN Link status" interface="R150" msg="The member1(R150) link quality packet-loss order changed from 2 to 1. "

SLA link status logs, generated with interval sla-fail-log-period or sla-pass-log-period:

  • When SLA fails, SLA link status logs will be generated with interval sla-fail-log-period:

7: date=2019-03-23 time=17:45:54 logid="0100022925" type="event" subtype="system" level="notice" vd="root" eventtime=1553388352 logdesc="Link monitor SLA information" name="test" interface="R150" status="up" msg="Latency: 0.016, jitter: 0.002, packet loss: 21.000%, inbandwidth: 0Mbps, outbandwidth: 200Mbps, bibandwidth: 200Mbps, sla_map: 0x0"

  • When SLA passes, SLA link status logs will be generated with interval sla-pass-log-period:

5: date=2019-03-23 time=17:46:05 logid="0100022925" type="event" subtype="system" level="information" vd="root" eventtime=1553388363 logdesc="Link monitor SLA information" name="test" interface="R150" status="up" msg="Latency: 0.017, jitter: 0.003, packet loss: 0.000%, inbandwidth: 0Mbps, outbandwidth: 200Mbps, bibandwidth: 200Mbps, sla_map: 0x1"
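The msg field of these SLA logs is a comma-separated list of metrics ending in sla_map. The following Python sketch splits it into fields and decodes sla_map under the assumption that bit N-1 is set when SLA target N is met. That reading is consistent with the two samples here (0x0 while failing, 0x1 while passing) but is not confirmed by this topic, and both helpers are hypothetical.

```python
def parse_sla_msg(msg: str) -> dict:
    """Turn 'Latency: 0.016, jitter: 0.002, ...' into a dict of strings."""
    fields = {}
    for part in msg.split(","):
        key, sep, value = part.partition(":")
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields


def sla_targets_met(sla_map: str) -> list:
    """ASSUMPTION: bit N-1 of sla_map is set when SLA target N is met."""
    bits = int(sla_map, 16)
    return [n + 1 for n in range(bits.bit_length()) if (bits >> n) & 1]


msg = ("Latency: 0.016, jitter: 0.002, packet loss: 21.000%, inbandwidth: 0Mbps, "
       "outbandwidth: 200Mbps, bibandwidth: 200Mbps, sla_map: 0x0")
fields = parse_sla_msg(msg)
print(float(fields["packet loss"].rstrip("%")), sla_targets_met(fields["sla_map"]))
```

An empty list means that no SLA target is currently met on that interface.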

SD-WAN related diagnose commands

This topic lists the SD-WAN related diagnose commands and related output.

To check SD-WAN health-check status:

FGT # diagnose sys virtual-wan-link health-check

Health Check(server):

Seq(1): state(alive), packet-loss(0.000%) latency(15.247), jitter(5.231) sla_map=0x0

Seq(2): state(alive), packet-loss(0.000%) latency(13.621), jitter(6.905) sla_map=0x0

FGT # diagnose sys virtual-wan-link health-check

Health Check(ping):

Seq(1): state(alive), packet-loss(0.000%) latency(0.683), jitter(0.082) sla_map=0x0

Seq(2): state(dead), packet-loss(100.000%) sla_map=0x0

FGT # diagnose sys virtual-wan-link health-check google

Health Check(google):

Seq(1): state(alive), packet-loss(0.000%) latency(14.563), jitter(4.334) sla_map=0x0

Seq(2): state(alive), packet-loss(0.000%) latency(12.633), jitter(6.265) sla_map=0x0
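When the health-check output is captured from a script, dead members can be picked out automatically. The following Python sketch parses the Seq lines shown above; the regular expressions are derived from this sample output only and parse_health_check is a hypothetical helper.

```python
import re

SEQ_LINE = re.compile(r"Seq\((?P<seq>\d+)\): state\((?P<state>\w+)\), (?P<rest>.*)")
METRIC = re.compile(r"([\w-]+)\(([\d.]+)%?\)")


def parse_health_check(output: str) -> list:
    """Return one dict per Seq line with its state and numeric metrics."""
    members = []
    for line in output.splitlines():
        m = SEQ_LINE.search(line)
        if not m:
            continue
        member = {"seq": int(m["seq"]), "state": m["state"]}
        for name, value in METRIC.findall(m["rest"]):
            member[name] = float(value)
        members.append(member)
    return members


output = """\
Health Check(ping):
Seq(1): state(alive), packet-loss(0.000%) latency(0.683), jitter(0.082) sla_map=0x0
Seq(2): state(dead), packet-loss(100.000%) sla_map=0x0
"""
dead = [m["seq"] for m in parse_health_check(output) if m["state"] != "alive"]
print(dead)
```

Dead members report no latency or jitter, so those keys are simply absent from their dictionaries.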

To check SD-WAN member status:

  • When SD-WAN load-balance mode is source-ip-based/source-dest-ip-based.

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port13, gateway: 10.100.1.1 2004:10:100:1::1, priority: 0, weight: 0

Member(2): interface: port15, gateway: 10.100.1.5 2004:10:100:1::5, priority: 0, weight: 0

  • When SD-WAN load-balance mode is weight-based.

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port13, gateway: 10.100.1.1 2004:10:100:1::1, priority: 0, weight: 33

Member(2): interface: port15, gateway: 10.100.1.5 2004:10:100:1::5, priority: 0, weight: 66

  • When SD-WAN load-balance mode is measured-volume-based.

    • Both members are under volume and still have room:

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port13, gateway: 10.100.1.1 2004:10:100:1::1, priority: 0, weight: 33

Config volume ratio: 33, last reading: 8211734579B, volume room 33MB

Member(2): interface: port15, gateway: 10.100.1.5 2004:10:100:1::5, priority: 0, weight: 66

Config volume ratio: 66, last reading: 24548159B, volume room 66MB

    • Some members are overloaded and some still have room:

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port1, gateway: 10.10.0.2, priority: 0, weight: 0

Config volume ratio: 10, last reading: 10297221000B, overload volume 1433MB

Member(2): interface: port2, gateway: 10.11.0.2, priority: 0, weight: 38

Config volume ratio: 50, last reading: 45944239916B, volume room 38MB

  • When SD-WAN load balance mode is usage-based/spillover.

    • When no spillover occurs:

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port13, gateway: 10.100.1.1 2004:10:100:1::1, priority: 0, weight: 255

Egress-spillover-threshold: 400kbit/s, ingress-spillover-threshold: 300kbit/s

Egress-overbps=0, ingress-overbps=0

Member(2): interface: port15, gateway: 10.100.1.5 2004:10:100:1::5, priority: 0, weight: 254

Egress-spillover-threshold: 0kbit/s, ingress-spillover-threshold: 0kbit/s

Egress-overbps=0, ingress-overbps=0

    • When member has reached limit and spillover occurs:

FGT # diagnose sys virtual-wan-link member

Member(1): interface: port13, gateway: 10.100.1.1 2004:10:100:1::1, priority: 0, weight: 255

Egress-spillover-threshold: 400kbit/s, ingress-spillover-threshold: 300kbit/s

Egress-overbps=1, ingress-overbps=1

Member(2): interface: port15, gateway: 10.100.1.5 2004:10:100:1::5, priority: 0, weight: 254

Egress-spillover-threshold: 0kbit/s, ingress-spillover-threshold: 0kbit/s

Egress-overbps=0, ingress-overbps=0

  • You can also use the diagnose netlink dstmac list command to check if you are over the limit.

FGT # diag netlink dstmac list port13

dev=port13 mac=08:5b:0e:ca:94:9d rx_tcp_mss=0 tx_tcp_mss=0 egress_overspill_threshold=51200 egress_bytes=103710 egress_over_bps=1 ingress_overspill_threshold=38400 ingress_bytes=76816 ingress_over_bps=1 sampler_rate=0
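The thresholds in the dstmac output appear to be the configured spillover thresholds expressed in bytes per second: 400 kbit/s × 1024 / 8 = 51200 and 300 kbit/s × 1024 / 8 = 38400. This relationship is inferred from the numbers in this example, not from documented behaviour. The following Python sketch checks it and reads the over-limit flags; both helper names are hypothetical.

```python
def kbit_to_bytes_per_sec(kbit: int) -> int:
    """Convert kbit/s to bytes/s, ASSUMING 1 kbit = 1024 bits (inferred)."""
    return kbit * 1024 // 8


def parse_dstmac(line: str) -> dict:
    """Split the space-separated key=value pairs of one dstmac entry."""
    return dict(token.split("=", 1) for token in line.split() if "=" in token)


line = ("dev=port13 mac=08:5b:0e:ca:94:9d rx_tcp_mss=0 tx_tcp_mss=0 "
        "egress_overspill_threshold=51200 egress_bytes=103710 egress_over_bps=1 "
        "ingress_overspill_threshold=38400 ingress_bytes=76816 ingress_over_bps=1 "
        "sampler_rate=0")
stats = parse_dstmac(line)

print(kbit_to_bytes_per_sec(400) == int(stats["egress_overspill_threshold"]))
print(kbit_to_bytes_per_sec(300) == int(stats["ingress_overspill_threshold"]))
print("egress over limit:", stats["egress_over_bps"] == "1")
```

A value of 1 in egress_over_bps or ingress_over_bps corresponds to the overbps flags shown by the member command above.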

To check SD-WAN service rules status:

  • Manual mode service rules.

FGT # diagnose sys virtual-wan-link service

Service(1): Address Mode(IPV4) flags=0x0

TOS(0x0/0x0), Protocol(0: 1->65535), Mode(manual)

Members:

1: Seq_num(2), alive, selected

Dst address: 10.100.21.0-10.100.21.255

  • Auto mode service rules.

FGT # diagnose sys virtual-wan-link service

Service(1): Address Mode(IPV4) flags=0x0

TOS(0x0/0x0), Protocol(0: 1->65535), Mode(auto), link-cost-factor(latency), link-cost-threshold(10), health-check(ping)

Members:

1: Seq_num(2), alive, latency: 0.011

2: Seq_num(1), alive, latency: 0.018, selected

Dst address: 10.100.21.0-10.100.21.255

  • Priority mode service rules.

FGT # diagnose sys virtual-wan-link service

Service(1): Address Mode(IPV4) flags=0x0

TOS(0x0/0x0), Protocol(0: 1->65535), Mode(priority), link-cost-factor(latency), link-cost-threshold(10), health-check(ping)

Members:

1: Seq_num(2), alive, latency: 0.011, selected

2: Seq_num(1), alive, latency: 0.017, selected

Dst address: 10.100.21.0-10.100.21.255

  • Load-balance mode service rules.

FGT # diagnose sys virtual-wan-link service

Service(1): Address Mode(IPV4) flags=0x0

TOS(0x0/0x0), Protocol(0: 1->65535), Mode(load-balance)

Members:

1: Seq_num(1), alive, sla(0x1), num of pass(1), selected

2: Seq_num(2), alive, sla(0x1), num of pass(1), selected

Dst address: 10.100.21.0-10.100.21.255

  • SLA mode service rules.

FGT # diagnose sys virtual-wan-link service

Service(1): Address Mode(IPV4) flags=0x0

TOS(0x0/0x0), Protocol(0: 1->65535), Mode(sla)

Members:

1: Seq_num(1), alive, sla(0x1), cfg_order(0), cost(0), selected

2: Seq_num(2), alive, sla(0x1), cfg_order(1), cost(0), selected

Dst address: 10.100.21.0-10.100.21.255
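In every mode, the members that the rule currently uses are flagged with "selected". The following Python sketch lists them from captured output. It assumes each member is printed on its own line, as in the CLI, and selected_members is a hypothetical helper written against the samples above.

```python
import re

# One member per line: "<index>: Seq_num(<n>), <state>, <details...>"
MEMBER = re.compile(r"^\s*\d+: Seq_num\((\d+)\), (\w+)(.*)$")


def selected_members(output: str) -> list:
    """Return the seq numbers of members flagged 'selected', in listed order."""
    picked = []
    for line in output.splitlines():
        m = MEMBER.match(line)
        if m and "selected" in m.group(3):
            picked.append(int(m.group(1)))
    return picked


output = """\
Service(1): Address Mode(IPV4) flags=0x0
TOS(0x0/0x0), Protocol(0: 1->65535), Mode(auto), link-cost-factor(latency), link-cost-threshold(10), health-check(ping)
Members:
1: Seq_num(2), alive, latency: 0.011
2: Seq_num(1), alive, latency: 0.018, selected
Dst address: 10.100.21.0-10.100.21.255
"""
print(selected_members(output))
```

For the auto mode sample, only member 1 is reported; the priority, load-balance, and SLA samples would return both members.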

To check interface logs from the past 15 minutes:

FGT (root) # diagnose sys virtual-wan-link intf-sla-log R150

Timestamp: Fri Apr 12 11:08:36 2019, used inbandwidth: 0bps, used outbandwidth: 0bps, used bibandwidth: 0bps, tx bytes: 860bytes, rx bytes: 1794bytes.

Timestamp: Fri Apr 12 11:08:46 2019, used inbandwidth: 1761bps, used outbandwidth: 1710bps, used bibandwidth: 3471bps, tx bytes: 2998bytes, rx bytes: 3996bytes.

Timestamp: Fri Apr 12 11:08:56 2019, used inbandwidth: 2452bps, used outbandwidth: 2566bps, used bibandwidth: 5018bps, tx bytes: 7275bytes, rx bytes: 7926bytes.

Timestamp: Fri Apr 12 11:09:06 2019, used inbandwidth: 2470bps, used outbandwidth: 3473bps, used bibandwidth: 5943bps, tx bytes: 13886bytes, rx bytes: 11059bytes.

Timestamp: Fri Apr 12 11:09:16 2019, used inbandwidth: 2433bps, used outbandwidth: 3417bps, used bibandwidth: 5850bps, tx bytes: 17946bytes, rx bytes: 13960bytes.

Timestamp: Fri Apr 12 11:09:26 2019, used inbandwidth: 2450bps, used outbandwidth: 3457bps, used bibandwidth: 5907bps, tx bytes: 22468bytes, rx bytes: 17107bytes.
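In each sample line above, the bibandwidth value equals inbandwidth plus outbandwidth (for example, 1761 + 1710 = 3471). The following Python sketch extracts the three values from one line and checks that relationship; the pattern is derived from this output and bandwidth is a hypothetical helper.

```python
import re

FIELD = re.compile(r"used (inbandwidth|outbandwidth|bibandwidth): (\d+)bps")


def bandwidth(line: str) -> dict:
    """Return the used in/out/bidirectional bandwidth of one log line, in bps."""
    return {name: int(value) for name, value in FIELD.findall(line)}


line = ("Timestamp: Fri Apr 12 11:08:46 2019, used inbandwidth: 1761bps, "
        "used outbandwidth: 1710bps, used bibandwidth: 3471bps, "
        "tx bytes: 2998bytes, rx bytes: 3996bytes.")
bw = bandwidth(line)
print(bw["inbandwidth"] + bw["outbandwidth"] == bw["bibandwidth"])
```

The same function can be applied to every line of the output to chart usage over the 15-minute window.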

To check SLA logs in the past 15 minutes:

FGT (root) # diagnose sys virtual-wan-link sla-log ping 1

Timestamp: Fri Apr 12 11:09:27 2019, vdom root, health-check ping, interface: R150, status: up, latency: 0.014, jitter: 0.003, packet loss: 16.000%.

Timestamp: Fri Apr 12 11:09:28 2019, vdom root, health-check ping, interface: R150, status: up, latency: 0.015, jitter: 0.003, packet loss: 15.000%.

Timestamp: Fri Apr 12 11:09:28 2019, vdom root, health-check ping, interface: R150, status: up, latency: 0.014, jitter: 0.003, packet loss: 14.000%.

Timestamp: Fri Apr 12 11:09:29 2019, vdom root, health-check ping, interface: R150, status: up, latency: 0.015, jitter: 0.003, packet loss: 13.000%.

To check application control used in SD-WAN and the matching IP addresses:

FGT # diagnose sys virtual-wan-link internet-service-app-ctrl-list

Ctrl application(Microsoft.Authentication 41475):Internet Service ID(4294836224)

Protocol(6), Port(443)

Address(2): 104.42.72.21 131.253.61.96

Ctrl application(Microsoft.CDN 41470):Internet Service ID(4294836225)

Ctrl application(Microsoft.Lync 28554):Internet Service ID(4294836226)

Ctrl application(Microsoft.Office.365 33182):Internet Service ID(4294836227)

Ctrl application(Microsoft.Office.365.Portal 41468):Internet Service ID(4294836228)

Ctrl application(Microsoft.Office.Online 16177):Internet Service ID(4294836229)

Ctrl application(Microsoft.OneNote 40175):Internet Service ID(4294836230)

Ctrl application(Microsoft.Portal 41469):Internet Service ID(4294836231)

Protocol(6), Port(443)

Address(8): 23.58.134.172 131.253.33.200 23.58.135.29 204.79.197.200 64.4.54.254 23.59.156.241 13.77.170.218 13.107.22.200

Ctrl application(Microsoft.Sharepoint 16190):Internet Service ID(4294836232)

Ctrl application(Microsoft.Sway 41516):Internet Service ID(4294836233)

Ctrl application(Microsoft.Tenant.Namespace 41471):Internet Service ID(4294836234)

To check IPsec aggregate interface when SD-WAN uses the per-packet distribution feature:

# diagnose sys ipsec-aggregate list

agg1 algo=L3 member=2 run_tally=2

members:

vd1-p1

vd1-p2

To check BGP learned routes and determine if they are used in SD-WAN service:

FGT # get router info bgp network

FGT # get router info bgp network 10.100.11.0

BGP routing table entry for 10.100.10.0/24

Paths: (2 available, best 1, table Default-IP-Routing-Table)

Advertised to non peer-group peers:

172.10.22.2

20

10.100.20.2 from 10.100.20.2 (6.6.6.6)

Origin EGP metric 200, localpref 100, weight 10000, valid, external, best

Community: 30:5

Last update: Wen Mar 20 18:45:17 2019

FGT # get router info route-map-address

Extend-tag: 15, interface(wan2:16)

10.100.11.0/255.255.255.0

FGT # diagnose firewall proute list

list route policy info(vf=root):

id=4278779905 vwl_service=1(DataCenter) flags=0x0 tos=0x00 tos_mask=0x00 protocol=0 sport=0:65535 iif=0 dport=1-65535 oif=16

source wildcard(1): 0.0.0.0/0.0.0.0

destination wildcard(1): 10.100.11.0/255.255.255.0
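The route-map address and the policy route destination are both printed as address/netmask pairs, while BGP prefixes are usually written in CIDR form. The following Python sketch converts between the two with the standard ipaddress module so the values can be compared directly; to_cidr is a hypothetical helper.

```python
import ipaddress


def to_cidr(address_and_mask: str) -> str:
    """Convert 'address/netmask' as printed by the CLI into CIDR notation."""
    return str(ipaddress.ip_network(address_and_mask))


route_map_address = "10.100.11.0/255.255.255.0"   # from route-map-address
proute_destination = "10.100.11.0/255.255.255.0"  # from proute list

print(to_cidr(proute_destination))
print(to_cidr(route_map_address) == to_cidr(proute_destination))
```

If the two prefixes match, the BGP-learned route tagged by the route map is the destination used by the SD-WAN service rule.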

 

