Intro Catch up
In a previous post, I described how we used Nautobot as a dynamic inventory for Nornir, through a Nornir plugin made for this purpose, to build our client location history for the legacy access network. As I mentioned there, we were preparing to migrate our metro area access network to SD-Access (Software-Defined Access), a vendor-specific solution that lets a network operate in an overlay architecture and lets you define access security policy in an intent-based manner. The policy is deployed to the network by a controller, in this case Cisco DNA Center (now rebranded as Cisco Catalyst Center).
What is this post about?
This post describes how we used Python code to make sure the migrations from the legacy network to SDA went smoothly and no hosts or users were left behind. It sets the scene, explains the concepts and the reasoning behind the code, points to the repositories where a public version of the code is available along with the utilities that helped set this blog post up, and concludes with lessons learned and what’s next.
Reality check
Let’s clear up something right away. Migration checks are not the most important or the most difficult part of this project. There were great technical challenges to overcome in networking and security, and equally great challenges in project management, as the project included an enormous list of tasks to coordinate, human resources to manage and even plain old grunt work.
However, the migrations themselves were the part of the project where the work of the engineers involved finally touched the users, and in this way came into the light for everyone to see. In this context, the migration checks are the means by which one makes sure that the enormous amount of work everyone put into the project is not diminished and devalued by a few bad results.
I realize that such a statement may have been more suited for the end of this post. But I think it’s important to set the scale from the start. With that out of the way, let’s look at the environment for a bit.
The playground
SDA Components
Cisco DNAC is the tool network admins use to define access policy in an intuitive manner, and also to monitor and manage the network as a whole, but Cisco ISE plays an equally important role in this architecture. DNAC and ISE are tightly coupled:
- DNAC and ISE are integrated through pxGrid.
- DNAC is where network admins declare intent, turning the network into an intent-based network (IBN).
- ISE receives those policy changes from DNAC, translates them into config and pushes the config down to devices.
- ISE is also where authentication policies are defined in detail, and it is in charge of the authentication & authorization processes.
- Both DNA Center and ISE offer a significant number of programmable, queryable APIs for every part of the network management cycle, allowing for smart integrations with other products as well as custom code.
I really won’t go into how SD-Access works and how we set it up for our environment: besides going off topic, it’s also classified info. You can find a ton of information about it in Cisco documentation, the channel I linked, and the good old Cisco Live On-Demand Session Library, where every single use case you may have thought of has probably been addressed. (It will take time to really get everything you need beyond the marketing summary, and please don’t decide what to do based on the marketing summary.) Also, let me remind you once again: I don’t work for Cisco, nor do I sell or endorse their products. This is about automation and integration, and I am just describing the playground at the moment.
Is that really what you think?
OK, OK, if you definitely need to hear this: yes, it’s a vendor lock-in, and unless you definitely know you need it, don’t do it. There are a lot of pitfalls to avoid and lots of issues to discuss about design, and also about deployment and operations. However, I have to say that in the current time and place, not having a way to segment your access network is a bad idea. Also, it works! And it’s cool (and expensive)! So my advice is: choose your vendor according to what you need, get used to the idea, take a deep breath and dive in. Good luck!
Moving stuff around
I think it’s enough to say that after going through:
- the study & the planning phase,
- the phase where deciding and buying occurs and the project really starts, and
- the phase where the main components are laid out and the management tools are set up and configured,
there comes a time when a lot of migrations take place (A LOT!).
(Did I mention it wasn’t really a greenfield deployment?)
In our case we had to deal with two different kinds of migrations at each equipment site:
- Migrating the legacy devices from old network racks to new network racks without loss of end-host/user connectivity
- Migrating the users from the legacy network to SD-Access without loss of end-host/user connectivity
The most difficult part about changes in a network is to make the change totally transparent to what is serviced by the network.
The change itself was inevitable. The legacy network had outlived the life expectancy set by the vendor, the technology, the capabilities of the specific electronic components and common sense. Once the official start button was pushed for the project, it was nothing short of a merciless race against time to reach the point, ASAP, where all users would be migrated to the SD-Access network, without losing anyone in the process and without any loss of service on the legacy network.
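To make "without losing anyone in the process" a bit more concrete, here is a minimal sketch of the kind of before/after comparison a migration check can boil down to. It is not the code from the project: the function names, the MAC addresses and the `switch:port` strings are all made up for illustration, and it assumes you already collected the hosts seen on the legacy network and on the SD-Access side.

```python
def normalize_mac(mac: str) -> str:
    """Reduce a MAC address to 12 lowercase hex digits, whatever the separator."""
    return "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")


def find_missing_hosts(before: dict, after: dict) -> dict:
    """Return the hosts seen before the migration that were not seen after it.

    Both arguments map a MAC address to the place it was seen
    (hypothetical 'switch:port' strings).
    """
    seen_after = {normalize_mac(mac) for mac in after}
    return {
        mac: port
        for mac, port in before.items()
        if normalize_mac(mac) not in seen_after
    }


# Hypothetical data: hosts learned on the legacy switch before the migration.
legacy = {
    "aabb.cc00.0100": "legacy-sw1:Gi1/0/1",
    "aabb.cc00.0200": "legacy-sw1:Gi1/0/2",
    "aabb.cc00.0300": "legacy-sw1:Gi1/0/3",
}

# Hypothetical data: hosts seen on the new network after the migration.
sda = {
    "AA:BB:CC:00:01:00": "edge-1:Gi1/0/1",
    "AA:BB:CC:00:03:00": "edge-1:Gi1/0/7",
}

missing = find_missing_hosts(legacy, sda)
print(missing)  # {'aabb.cc00.0200': 'legacy-sw1:Gi1/0/2'}
```

The MAC normalization is there because different sources tend to print the same address in different notations, so comparing the raw strings would report hosts as missing when they are not.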
What else should I know?
One more thing to add: the documentation about the cabling of hosts was scarce, such as which port on the wall or cable coming out of the floor corresponds to which port on the patch panel. It was also administered by a separate department, so we had neither access nor any claim to it. In a few areas it was in a good state, so it was important to make sure it was not tainted by the migration. In all other areas it lived only in the mind and memory of fellow cabling technicians, so if you lost the cable connection because, e.g., the old cable came out while you were tracing it, it would be a nice treasure hunt, with people on both ends using signal generators and cable testers to recover it. We did manage to skip some of those. You can guess how (gathering data by automation and cross-referencing it with other sources in code, ofc, what else?).
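As a rough illustration of that cross-referencing idea, the sketch below joins what automation learned from the switches with whatever cabling records exist, and flags the ports that have no record at all. Everything in it is hypothetical: the data, the room and outlet names and the function names are mine, not the project's.

```python
# Hypothetical data: what automation learned from the legacy switches
# (switch port -> MAC address of the connected host).
learned_hosts = {
    "legacy-sw1:Gi1/0/1": "aabb.cc00.0100",
    "legacy-sw1:Gi1/0/2": "aabb.cc00.0200",
    "legacy-sw1:Gi1/0/3": "aabb.cc00.0300",
}

# Hypothetical data: the cabling records that do exist
# (switch port -> wall outlet).
cabling_records = {
    "legacy-sw1:Gi1/0/1": "Room 101 / outlet A",
    "legacy-sw1:Gi1/0/3": "Room 102 / outlet C",
}


def cross_reference(hosts, cabling):
    """Attach the documented wall outlet (or None) to every learned host."""
    return [
        {"port": port, "mac": mac, "outlet": cabling.get(port)}
        for port, mac in sorted(hosts.items())
    ]


def undocumented_ports(report):
    """List the ports with a live host but no cabling record."""
    return [row["port"] for row in report if row["outlet"] is None]


report = cross_reference(learned_hosts, cabling_records)
at_risk = undocumented_ports(report)
print(at_risk)  # ['legacy-sw1:Gi1/0/2']
```

The ports in `at_risk` are the ones where losing the cable would start a treasure hunt, so they are the ones worth handling with extra care.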
Controlling Chaos
Imagine a network infrastructure growing all the time for a period of almost 20 years. A living network, deployed constantly to serve a growing number of users, spread around in old buildings with rich history but little provision for cabling shafts and network growth, and little space to accommodate network racks at all the right places (in fact, in some cases there were no right places). Imagine also that the network grows in waves, adding new devices every few years, with different features, capacity, connection capabilities and access and security mechanisms.
Sounds like chaos, right? Well, it’s how the real world works. You can’t always change the environment. What you may call a “mess” and an “obstacle”, others may call “history” and “cultural heritage” (and they may actually be correct too, but they still need network and IT). Sometimes you just have to adapt to it. Network and Field Engineers often have to embrace this truth, and when they do, they bring a lot of value to the organization, constantly striving to bring order into chaos, almost developing an extra layer of control and logic above the infrastructure:
- They know all the particularities of buildings, shafts, cabling and power.
- They understand what is needed to deploy new services or extend existing ones to other areas.
- They know the capabilities and limits of the network infrastructure and how to try to normalize the user experience, as users move around and new users are added in the organization.
- They know it’s a mess, but can navigate through it.
Where does network automation fit into this?
Can network automation and programmability help in this kind of environment? Before I start to answer that question, let me answer another burning question (for some at least):
Can automation replace the work of a network engineer?
I think I can still say “Absolutely not!” and make sense to a lot of people (and even more will cheer me for it, because it’s what they want to hear). Of course I don’t know how long I will be able to maintain that position, as things are certainly moving around us at breakneck speed:
- https://www.nvidia.com/en-eu/research/ai-playground/
- https://www.cognition-labs.com/introducing-devin
For now however, I think we are safe to assume there is no way that an engineer that provides all the added value described earlier can be replaced by AI or automation code (or both). More than that, most times, my first choice in automating tasks is following exactly what an engineer would do for each task by hand. In that way:
- Tasks are standardized -> documentation/templates/version control
- Execution time is minimized, and so are errors.
- Code is easier to follow/troubleshoot/augment for network engineers by network engineers
- The engineers’ work load is alleviated, provided enough time and effort is spent to reach that stage with automation code (the well known problem of spending time to make time or reaching critical mass).
I do have to say that I am afraid there is a chance that, by the time this blog post is published, things may have already started to change about this, but I hope we still have time ahead.
- https://www.ericsson.com/en/network-automation/network-automation-and-ai
- https://www.cisco.com/c/en/us/solutions/artificial-intelligence/artificial-intelligence-machine-learning-in-networking.html
- https://blogs.cisco.com/developer/using-the-power-of-artificial-intelligence-to-augment-network-automation
- https://www.redhat.com/zh-tw/blog/why-do-you-need-network-automation-ai-world
- https://www.juniper.net/us/en/research-topics/what-is-network-automation.html
- https://www.nokia.com/networks/automation/
So can it help then?
Absolutely yes!! Both kinds of migration demanded a great deal of preparation, the flexibility to adapt to changes in planning during the migration itself, and a way to make sure that no mistakes remain and nothing (machine/user) is left behind. Besides the obvious need to avoid service disruption, when running such projects that impact a large number of locations and take a long time to finish (almost 2 years since project start), the credibility of the decision to migrate and the credibility of the network engineering team itself are both in the balance. I often paraphrase an old western movie quote: “A good network engineer is an invisible network engineer” or, to put it in simple terms: “If all goes well, users should never have to see you or talk to you” (although that doesn’t really work in your favor, does it?).
So although automation and programmability can’t (or shouldn’t) replace the human engineer, they can make their work a lot easier and can help produce more reliable results, time after time after time. As I usually say (I have said it before about using Cisco pyATS for network migrations), “it can also help you get home sooner and sleep better”. At least it does that much for me, time and again!
Can I use the code linked here to try this out?
Well... yes and no (or more likely no and yes). The integrations and the code mentioned here were developed to serve a real project, with real teams doing very real migrations for real devices and real users. Even if I tried to develop a simulated network where you could test the Nornir inventory that is produced from Nautobot, there is no way I can simulate the user data collected from the switches or ISE. I also don’t have a sandbox to provide where DNAC and ISE are running with mock data that you can query. (But hey, you can still query the Cisco DevNet Sandboxes and explore the APIs!)
So you will be able to clone the GitHub repo mentioned here, and I did my best to provide mock data so you can better understand how the code works. I will also try to produce some pics to help you understand the code flow and the philosophy behind it. But after that, you are on your own in trying to apply part of this to your own use case, whether it’s an exact match or something similar. It’s not a product. It’s a concept. (You can still contact me and I will try to explain things better, try Twitter.)
Is the code any good?
Well, it’s not pizza. Programming is not my main focus. I do have some level of skill, too low to earn a good living as a programmer but good enough for a network engineer, considering the level of knowledge most network engineers have. I am nowhere near the level of known trainers in this field like Kirk Byers, Nicolas Russo, Eric Chou, etc. I just like making things that work for a specific use case, especially across tools and fields, and then sharing them with others. Not for the glory of it. For the community’s sake. I think we all get better and rise higher when we share (“We are all in this together!” – Stuart Clark, philosopher).
Also, it’s about spreading the message that “if I can do this, you probably can too!”.
Can I use this at all?
There is a chance you have a case that matches up exactly. A very slim chance. But it’s there. If you were searching for it on Google and came here, then congratulations, this is it!
Besides that, you may find things to borrow from this, such as how the code is structured so that it can be reused for different use cases:
- sending multiple commands over Nornir to a subset of devices tagged in Nautobot, or a single command over Nornir to a subset of devices tagged in Nautobot,
- treating function names as parameters to define two different levels of flow and control,
- handling output for different environments in the same code, so that it can display data in the CLI, store it in files or send it to chat tools (MS-Teams in this case).
It’s true that I did develop the code with this mindset, thinking about my team. But it can be helpful to you as well.
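Here is a small sketch of what "function names as parameters" and "output for different environments" can look like. It is plain Python with no Nornir or Nautobot involved: the device list, the tags, the task functions and the renderers are all invented for this example, and the real code is organized differently. The idea is only that the name of the task and the name of the output handler are both looked up at run time, so the same runner serves several use cases.

```python
import json

# Hypothetical inventory; in the post this would come from Nautobot via Nornir.
DEVICES = [
    {"name": "edge-1", "tags": {"site-a"}},
    {"name": "edge-2", "tags": {"site-b"}},
    {"name": "edge-3", "tags": {"site-a"}},
]


def single_command(device):
    """Stand-in for sending one command to a device."""
    return {device["name"]: ["show version"]}


def multiple_commands(device):
    """Stand-in for sending several commands to a device."""
    return {device["name"]: ["show mac address-table", "show interfaces status"]}


def to_cli(result):
    """Render the result as plain text for a terminal."""
    return "\n".join(f"{name}: {', '.join(cmds)}" for name, cmds in result.items())


def to_json(result):
    """Render the result as JSON, e.g. to store in a file or post to a chat tool."""
    return json.dumps(result, sort_keys=True)


# Two registries, so callers can pass plain names (e.g. from CLI arguments).
TASKS = {"single": single_command, "multiple": multiple_commands}
RENDERERS = {"cli": to_cli, "json": to_json}


def run(tag, task_name, output_name):
    """Run the named task on every device carrying `tag`, then render the result."""
    task = TASKS[task_name]
    render = RENDERERS[output_name]
    result = {}
    for device in DEVICES:
        if tag in device["tags"]:
            result.update(task(device))
    return render(result)


print(run("site-a", "single", "cli"))
print(run("site-b", "multiple", "json"))
```

Adding a new use case then means writing one more task function or one more renderer and registering it, without touching the runner.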
Finally you may get inspired to do something different using a concept you see here. At least I hope you get some fun out of reading this. It may all sound confusing without examples so these will probably make sense later or when you look at the code more closely.
The text above also serves as a disclaimer before you take a look at the code. If the code gives you a headache or nausea, I am not to be held responsible!
Time’s up!
A friend of mine has often complained about how long my posts are. This doesn’t fix it, as the next two parts are published together with part 1, but at least it gives the reader the feeling that they can read to the end of... part 1. So please move on to part 2 to read more about the planning for the project.