
Understanding SD-WAN and the challenges to test and assure SD-WAN performance

SD-WAN solutions offer their own performance KPIs, but usually not at the depth required to fully understand the underlay. Many SD-WAN users also choose to verify SD-WAN performance independently rather than simply trusting the vendor to measure itself.

Overview of SD-WAN

The key function of SD-WAN is to abstract the physical infrastructure from the software services that use it. The move to SD-WAN is largely driven by a handful of key benefits, which also account for the rapid growth of the SD-WAN market:

1. Lower cost than dedicated MPLS.
2. Use of automation to reduce errors and increase agility.
3. A single security control point.
4. Built-in resilience.
5. Application prioritisation.
6. Support for multiple virtual links, useful for separating traffic in large organisations.

To establish a common reference point for the discussion, we first define the main components of an SD-WAN solution and their functions:

SD-WAN Controller

The SD-WAN Controller is the management function that allows network operations to configure policies centrally for:

  1. Traffic routing through the physical underlay networks, i.e. which traffic types are transported over which physical links.
  2. End-to-end security policies, e.g. multiple VPNs, automatic key rotation etc.
  3. Failure case policies: fall-back and resilience configurations.
  4. Service chaining and application- and QoS-based routing.

The controller typically acts as a single integration point and can aggregate management KPIs from connected SD-WAN edge units under its control. In most cases, the controller can be used to push out software updates to connected edge devices, to allow for full remote management and control. 

 

SD-WAN Edge

The SD-WAN Edge is the inline application that implements the design pushed down from the SD-WAN Controller. The Edge application is responsible for responding in real time to changes in network conditions, such as congestion and failures. In these cases, pre-configured fall-back scenarios are implemented.
Traffic carried over SD-WAN is usually in an IPsec tunnel that terminates at the SD-WAN Edge, where key rotation policies pushed out by the SD-WAN Controller can be implemented.

 

Traffic routing

The primary function of the SD-WAN Edge is routing traffic across the various underlay transport technologies according to different rules. The rules could reflect:

  1. Security considerations
  2. Performance / QoS for different applications
  3. Failure scenarios

In the latter two cases, there are test, monitoring and troubleshooting implications, to ensure that the SD-WAN operates optimally and correctly.
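
As an illustration of rule-based routing with fall-back, the following is a minimal sketch of how an Edge might pick an underlay link for one traffic class. The names `LinkState`, `Rule` and `select_link`, and the link names used in the usage note, are hypothetical and not taken from any SD-WAN product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LinkState:
    """Current health of one underlay link (hypothetical structure)."""
    name: str
    up: bool
    latency_ms: float
    loss_pct: float


@dataclass
class Rule:
    """Routing rule for one traffic class (hypothetical structure)."""
    traffic_class: str
    preferred: List[str]      # underlay links in order of preference
    max_latency_ms: float
    max_loss_pct: float


def select_link(rule: Rule, links: List[LinkState]) -> Optional[str]:
    """Return the link to use for this rule, or None if no link is up."""
    by_name = {link.name: link for link in links}

    # First pass: the most preferred link that is up and within limits.
    for name in rule.preferred:
        link = by_name.get(name)
        if (link is not None and link.up
                and link.latency_ms <= rule.max_latency_ms
                and link.loss_pct <= rule.max_loss_pct):
            return name

    # Fall-back: the most preferred link that is at least up.
    for name in rule.preferred:
        link = by_name.get(name)
        if link is not None and link.up:
            return name

    return None
```

With a rule such as `Rule("voip", ["mpls", "broadband"], 50.0, 1.0)`, traffic stays on `mpls` while it meets the limits and moves to `broadband` when `mpls` becomes congested or fails. A test solution needs to exercise both branches to confirm the configured behaviour.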

SD-WAN Test, Monitoring and Troubleshooting

We will consider the three key testing phases here:

  1. Activation testing, pre-launch
  2. Continuous monitoring
  3. Troubleshooting

A critical benefit of SD-WAN is its agility in areas such as adding new capacity and changing application prioritisation. The ability to rapidly roll out a new branch or service is only half of the story. Without a matching capability to quickly and thoroughly verify the new offering is correct, the SD-WAN agility advantage is lost.

The same set of traffic handling policies configured in the SD-WAN Controller needs to be configured in a suitable testing solution, ideally programmatically via automation or orchestration.

Test probes need to:

  1. Be virtualised, supporting all the major platforms (AWS, Azure, Google).
  2. Be capable of automatically self-starting (zero_init).
  3. Use the smallest possible resource footprint in vCPU, memory and storage.
  4. Be remotely managed in terms of configuration and software lifecycle.
  5. Be capable of injecting both high and low volumes of synthetic test traffic, to stress test and to continuously monitor.

SD-WAN Testing

To verify the application performance over the SD-WAN, the only guaranteed test method is to pass the different applications / DSCP marked traffic through the SD-WAN. Manually, this could involve physical presence at the remote location, to trigger traffic types that are measured across the SD-WAN. This is neither efficient nor in the “spirit” of SD-WAN that focusses on virtualisation and automation.

An ideal solution is a tool that is remotely installable and configurable to send multiple different traffic types to a measurement point at the opposite end of an SD-WAN link. This tool should be capable of stress testing the link and ideally, in conjunction with other tools, of performing analysis during failure cases, to ensure continuous operation.
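
To make the idea of DSCP-marked synthetic traffic concrete, here is a minimal sketch of a sender using Python's standard `socket` module. The DSCP value occupies the upper six bits of the IPv4 ToS byte, so it is shifted left by two bits before being set with `IP_TOS`. The function names and the probe payload layout (a 4-byte sequence number plus padding) are assumptions made for illustration. Some operating systems ignore or restrict this socket option.

```python
import socket


def dscp_to_tos(dscp: int) -> int:
    """Convert a 6-bit DSCP value to the 8-bit ToS byte value."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    return dscp << 2


def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Open an IPv4 UDP socket whose outgoing packets carry the given DSCP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock


def send_probe(sock: socket.socket, host: str, port: int,
               seq: int, size: int = 200) -> int:
    """Send one probe: a 4-byte sequence number padded to `size` bytes."""
    payload = seq.to_bytes(4, "big").ljust(size, b"\x00")
    return sock.sendto(payload, (host, port))
```

A probe stream marked as Expedited Forwarding (DSCP 46) could then be opened with `open_marked_udp_socket(46)`. The receiving measurement point, not shown here, would record which sequence numbers arrive and with which marking.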

Testing the underlay

SD-WAN presents an overlay traffic path for an application, but a correct overlay depends on a correct underlay. Therefore, for each application type, a testing solution should be able to confirm:

  1. Correct MTU end to end
  2. Packet loss at different throughput rates
  3. Latency and jitter for different applications/throughputs
  4. DSCP/QoS marking transparency through the underlay
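
The metrics in this list can be derived from simple probe records. The sketch below assumes the test tool has already collected the number of probes sent and the delay of each probe received, in send order. The function names are hypothetical. Jitter is computed here as the mean absolute difference between consecutive delays, which is one common definition among several. The payload calculation covers plain IPv4 or IPv6 with UDP only, and any tunnel overhead would reduce it further.

```python
from statistics import mean
from typing import Dict, List, Optional


def max_udp_payload(mtu: int, ipv6: bool = False) -> int:
    """Largest UDP payload that fits in one packet of the given MTU.

    Assumes a 20-byte IPv4 header without options (40 bytes for IPv6)
    and an 8-byte UDP header; tunnel overhead is not accounted for.
    """
    ip_header = 40 if ipv6 else 20
    return mtu - ip_header - 8


def summarise(sent: int, delays_ms: List[float]) -> Dict[str, Optional[float]]:
    """Summarise loss, mean latency and jitter for one probe stream."""
    received = len(delays_ms)
    loss_pct = 100.0 * (sent - received) / sent if sent else 0.0
    if not delays_ms:
        return {"loss_pct": loss_pct, "latency_ms": None, "jitter_ms": None}

    # Jitter: mean absolute difference between consecutive delays.
    diffs = [abs(b - a) for a, b in zip(delays_ms, delays_ms[1:])]
    return {
        "loss_pct": loss_pct,
        "latency_ms": mean(delays_ms),
        "jitter_ms": mean(diffs) if diffs else 0.0,
    }
```

Running `summarise` once per throughput rate and per DSCP class gives the per-application view described above. Sending probes sized with `max_udp_payload(1500)` with fragmentation disallowed is one way to check that a 1500-byte MTU holds end to end.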

The output of initial testing should be a report confirming correct performance and, in a fully automated architecture, a signal to an automation engine that the SD-WAN can be released into production.
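
A minimal sketch of such an activation gate is shown below. It compares measured results against per-class limits and returns a small report that an automation engine could consume. The structure of `results` and `thresholds`, and the function name `evaluate`, are assumptions made for illustration.

```python
from typing import Dict, List


def evaluate(results: Dict[str, Dict[str, float]],
             thresholds: Dict[str, Dict[str, float]]) -> Dict[str, object]:
    """Check measured metrics against upper limits for each traffic class.

    Both arguments map a traffic class name to a {metric: value} dict.
    A missing measurement counts as a failure.
    """
    failures: List[str] = []
    for traffic_class, limits in thresholds.items():
        measured = results.get(traffic_class)
        if measured is None:
            failures.append(f"{traffic_class}: no measurement")
            continue
        for metric, limit in limits.items():
            value = measured.get(metric)
            if value is None:
                failures.append(f"{traffic_class}: {metric} not measured")
            elif value > limit:
                failures.append(
                    f"{traffic_class}: {metric}={value} exceeds limit {limit}")
    return {"passed": not failures, "failures": failures}
```

The `passed` flag is the signal. The `failures` list is the body of the report for an engineer to read when activation is blocked.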

Monitoring the underlay in live use

Confirming correct ongoing operation of an SD-WAN link is the logical next step after activation testing. Once the initial spec is verified, lower volumes of synthetic application traffic should be continuously sent through the underlay.
The objectives are similar:

  1. Verify loss, latency and jitter in the underlay
  2. Confirm correct DSCP markings are being respected
  3. Confirm correct MTU
  4. Monitor specific services such as VoIP, Video, HTTP and DNS

Changes to equipment and configurations can happen at any time, so it’s essential to check for correct behaviour before end users report issues. 
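
One simple way to turn continuous measurements into alerts without reacting to every isolated spike is to require several breaches within a sliding window. The sketch below assumes one sample per measurement interval. The class name and default parameters are hypothetical.

```python
from collections import deque


class ThresholdMonitor:
    """Raise an alert when enough recent samples exceed a limit."""

    def __init__(self, limit: float, window: int = 5, min_breaches: int = 3):
        self.limit = limit
        self.min_breaches = min_breaches
        # Only the most recent `window` samples are kept.
        self.samples = deque(maxlen=window)

    def add(self, value: float) -> bool:
        """Record a sample and return True if the alert condition holds."""
        self.samples.append(value)
        breaches = sum(1 for v in self.samples if v > self.limit)
        return breaches >= self.min_breaches
```

One monitor instance per metric and traffic class, for example latency for VoIP, keeps the alert logic aligned with the objectives listed above.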

SD-WAN Troubleshooting

SD-WAN presents new challenges to operations. Using ping as a simple debug tool may give very misleading results, since ping packets could be directed down a different route from the traffic experiencing the issue. More capable debugging tools are needed.

SD-WAN monitoring will typically flag up an issue seen in end-to-end testing. To resolve it efficiently, being able to locate the problem is critical, and for this, hop-by-hop testing is key. Various techniques can be used to try to inspect hop-by-hop performance. In a fully managed system, detailed TWAMP testing can be performed against each network element in a path. TWAMP testing provides highly accurate loss, latency and jitter measurements to each compatible network element.
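
Given per-hop measurements, locating a fault can be as simple as walking the path and finding the first hop where loss appears or where latency jumps. The sketch below assumes each hop record holds the loss and latency measured from the test point to that hop, listed in path order. The record layout, function name and default limits are hypothetical.

```python
from typing import Dict, List, Optional


def first_degraded_hop(hops: List[Dict[str, object]],
                       max_loss_pct: float = 1.0,
                       max_step_ms: float = 20.0) -> Optional[str]:
    """Return the name of the first hop that looks degraded, or None.

    A hop is flagged when loss to it exceeds `max_loss_pct`, or when the
    latency to it exceeds the latency to the previous hop by more than
    `max_step_ms`.
    """
    previous_latency = 0.0
    for hop in hops:
        step = hop["latency_ms"] - previous_latency
        if hop["loss_pct"] > max_loss_pct or step > max_step_ms:
            return hop["name"]
        previous_latency = hop["latency_ms"]
    return None
```

The flagged hop marks where to start looking rather than proving the root cause, since the fault may equally lie on the link leading to that hop.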

For parts of an end-to-end system that are outside of an operator's control, tools such as traceroute and ICMP Echo can also give indications of where problems may lie. Tools that can establish trends before an issue occurs will provide the best chance of finding a live issue later.
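
As a sketch of what establishing a trend could mean in practice, the following fits a least-squares line to evenly spaced samples and reports whether the metric is rising faster than a chosen rate. The function names and the idea of a fixed slope limit are assumptions, and a production tool would likely use something more robust to outliers.

```python
from typing import List


def slope(values: List[float]) -> float:
    """Least-squares slope of evenly spaced samples, in units per sample."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def is_trending_up(values: List[float], min_slope: float) -> bool:
    """True if the metric rises by more than `min_slope` per sample."""
    return slope(values) > min_slope
```

Applied to, say, hourly latency averages for a hop, a steadily positive slope can highlight a path that is degrading before any threshold is crossed.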

Summary

  1. Testing before operation and during SD-WAN live use is critical to achieve high productivity.
  2. A testing solution must be fully virtualised and be capable of full automation.
  3. The end to end services must be tested and monitored, to be sure of correct and consistent behaviour.
  4. Automated troubleshooting to drill down to individual network elements should be possible. 

Best practice is to be proactive: don't wait for users to find faults.

Netrounds

Software based active test and assurance platform for enterprises, communication service providers and cloud providers.
