In-Service Software Upgrades eliminate downtime from wireless network upgrades


In collaboration with Vishal Gupta

Cisco IT has designed Wi-Fi networks with overlapping Access Point (AP) coverage and stateful, redundant Wireless Controllers for years. However, our AP upgrades have always required system reboots outside of business hours – creating scheduling headaches for IT, and potentially disruptive downtime for end users.

Fortunately, the In-Service Software Upgrades (ISSU) capability in Cisco Catalyst 9800 Series Wireless Controllers removes both problems: IT doesn’t have to schedule an after-hours upgrade, and end users don’t experience downtime during work hours.

ISSU leverages Stateful Switchover (SSO) redundancy and intelligent AP reboots to minimize or eliminate downtime for end users. Protocols like 802.11v help gracefully transition clients from rebooting APs to neighboring APs. Combined with the controller’s Radio Resource Management (RRM) neighbor relationship intelligence and its knowledge of where each client is associated, this makes the Catalyst 9800 uniquely capable of reducing or eliminating service interruptions during AP reboots.

Cisco DNA Center triggers and monitors the ISSU workflow. Here’s how it works:

  1. First, Cisco DNA Center sends the golden image to the Catalyst 9800 from its internal image repository. The Catalyst 9800 installs it on both chassis in the SSO pair, and Cisco DNA Center initiates the AP pre-download.
  2. Next, Cisco DNA Center starts ISSU on the Catalyst 9800. The Catalyst 9800 leverages SSO to upgrade both chassis without dropping CAPWAP[1] sessions to the APs or disrupting any clients.
  3. Finally, the controller begins a “staggered” rebooting of APs in small increments. Network administrators select the percentage of total APs to reboot in each iteration. The controller also avoids rebooting neighboring APs and uses 802.11v to gracefully roam clients. Cisco DNA Center monitors the progress and commits the software upgrade when completed.
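
The staggered reboot in step 3 can be illustrated with a small sketch. This is not the Catalyst 9800's actual algorithm, which the post does not describe; it is a simple greedy grouping under two assumptions: the neighbor map is symmetric, and each batch is capped at the chosen percentage of all APs. The function name, the AP names, and the neighbor map are hypothetical.

```python
import math


def plan_staggered_reboots(neighbors, percent_per_iteration):
    """Group APs into reboot batches (illustrative greedy sketch).

    neighbors: dict mapping AP name -> set of neighboring AP names,
               assumed symmetric (if A lists B, then B lists A).
    percent_per_iteration: percentage of all APs allowed per batch.

    No two neighboring APs land in the same batch, so clients on a
    rebooting AP still have a neighbor to roam to.
    """
    all_aps = sorted(neighbors)
    # At least one AP per batch so the plan always makes progress.
    batch_size = max(1, math.floor(len(all_aps) * percent_per_iteration / 100))

    pending = list(all_aps)
    batches = []
    while pending:
        batch = []
        blocked = set()  # neighbors of APs already chosen for this batch
        for ap in pending:
            if len(batch) == batch_size:
                break
            if ap in blocked:
                continue
            batch.append(ap)
            blocked.update(neighbors[ap])
        chosen = set(batch)
        pending = [ap for ap in pending if ap not in chosen]
        batches.append(batch)
    return batches


# Hypothetical floor with four APs in a row: ap1 - ap2 - ap3 - ap4
example_neighbors = {
    "ap1": {"ap2"},
    "ap2": {"ap1", "ap3"},
    "ap3": {"ap2", "ap4"},
    "ap4": {"ap3"},
}
for number, batch in enumerate(plan_staggered_reboots(example_neighbors, 50), start=1):
    print(f"iteration {number}: {batch}")
```

With a smaller percentage the same function simply produces more, smaller batches. A real controller would also weigh client load and RF conditions, which this sketch ignores.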

Customer Zero team assesses ISSU’s impact

To measure ISSU’s impact on wireless network upgrades, Cisco IT’s Customer Zero team tested this feature by upgrading two campus wireless networks (comprising about 725 APs). We then validated the success of the software upgrade by measuring the following metrics:

  • Client count and client health: AP reboots did not cause any disruption to client health or client count. Client health was within our normal range (upper 80th percentile to lower 90th percentile). Client count was comparable to days on which no software upgrade was conducted.
    [Figure: ISSU timeline overlaid on client health (top) and client count per SSID (bottom)]
  • Cisco DNA Center machine learning baselines: The Cisco DNA Center machine learning (ML) engine securely learns the baseline state of a network across several parameters. We chose to focus on onboarding time and onboarding failures. The ML engine did not register any deviations from the baseline for either metric. Further, the baseline was comparable to that of the two previous days.
    [Figure: No deviations from expected Cisco DNA Center machine learning baselines]
  • Staggered AP reboots distribution: We looked at the output of “show ap upgrade” on the Catalyst 9800 to see which APs rebooted during each iteration of the staggered AP reboots. We wanted to ensure the distribution was balanced across all buildings and floors. The Catalyst 9800’s execution of staggered reboots closely matched predictions. We chose to reboot 5% of all APs during each iteration and expected reboot counts of three APs/floor/iteration in the first network, and two APs/floor/iteration in the second network. Below are the detailed numbers for the second network:
    • Predicted: (278 APs x 0.05) / 7 floors ≈ 1.99 APs/floor/iteration
    • Actual: 86 APs/floor/iteration
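
The predicted figure for the second network is simple arithmetic, and it can be reproduced with a short sketch. The function names are hypothetical, and the second helper assumes the per-iteration reboot counts have already been tallied from the controller output, since the post does not show that output's format.

```python
def expected_reboots_per_floor(total_aps, fraction_per_iteration, floors):
    """Average APs expected to reboot per floor in one iteration,
    assuming reboots are spread evenly across floors."""
    return total_aps * fraction_per_iteration / floors


def observed_reboots_per_floor(reboots_per_iteration, floors):
    """Average APs actually rebooted per floor per iteration, given a
    list with the number of APs rebooted in each iteration."""
    mean_per_iteration = sum(reboots_per_iteration) / len(reboots_per_iteration)
    return mean_per_iteration / floors


# Numbers from the second network: 278 APs, 5% per iteration, 7 floors.
predicted = expected_reboots_per_floor(278, 0.05, 7)
print(f"predicted: {predicted:.4f} APs/floor/iteration")
```

Feeding the observed per-iteration counts into `observed_reboots_per_floor` gives a number directly comparable to the prediction, which is the comparison this bullet describes.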

Testing validates several ISSU benefits

Cisco IT has committed to a NetDevOps approach, in which we strive for automated changes with predictable risk. As we approach this goal, we gain the ability to make network changes at any time – without compromising our service level to end users or overburdening our network engineers. Cisco DNA Center, Catalyst 9800, and ISSU allow us to replace “fear of change” with newfound confidence based on more predictable outcomes.

Customer Zero also conducted a detailed analysis to validate the benefits of ISSU, with the following findings:

  • Enhanced end-user experience: We received no complaints or helpdesk tickets during this upgrade. Furthermore, no incidents have been retroactively tied to the upgrade. In fact, for as long as we have been using ISSU on the Catalyst 9800, we have never received a single end-user complaint.
  • Improved IT experience: With ISSU, there is no need to schedule wireless infrastructure upgrades outside business hours, or to schedule Wi-Fi network downtime.
  • Ease of upgrade: Cisco DNA Center and the Catalyst 9800 work together to automate the ISSU process. Network engineers can trigger upgrades with just a few clicks within Cisco DNA Center’s Software Image Management (SWIM) feature and monitor progress in the SWIM dashboard.
  • Improved security: Easier upgrades help keep the wireless infrastructure current, reducing exposure to security vulnerabilities in outdated software.
  • Network downtime avoidance: Intelligent, staggered rebooting of APs reduces or eliminates Wi-Fi network downtime for end users.

Smoother process. Happier engineers and end users. Enhanced security. ISSU and the Catalyst 9800 combine to take the dread out of wireless network upgrades.

Inside Cisco IT Blog



[1] Control and Provisioning of Wireless Access Points