When and where we met
It must have been over 1.5 years ago, probably closer to 2. I was watching one of @dmfigol’s video streams on Twitch (most of them are also published on his YouTube channel), where I heard him mention Netbox for the first time.
So I looked it up! I won’t go into Netbox history here, but essentially Jeremy Stretch is the one who created it and started to maintain it with a small team of devs. He developed it while working at Digital Ocean and later wrote code for it full time at NTC (Network to Code). As this post hit the internet, a good friend, whose work will be referenced in Part 2, let me know that Jeremy now has his own company offering Netbox services, NetVerity (https://netverity.dev/). Best of luck Jeremy! I hope you do great!!
I knew immediately this was a unique tool. It was also the first time I had heard of the term Network Single Source of Truth (NSSoT) or SSoNT (Single Source of Network Truth).
You see, I have been working for a long time in an institution where network documentation plays a really important part, and it is usually what audits are based on. In most cases, there were no systems to cover that role.
What role is that?
Most of the time in the past, network teams either used static files, like Excel or doc files, or used their own NMS or Fault Management systems as inventories, and when they needed data, they exported it out of those NMS databases.
Since it’s hard enough to catalog that kind of data or keep it up to date, most teams either didn’t know or didn’t care about the difference between what is active and discoverable on the network and how the network is supposed or designed to be. When you have systems that keep discovering the network continuously, exactly the way you want them to, ignoring that difference or pretending it doesn’t exist becomes an easy choice.
But the moment inevitably comes when that difference plays a big part:
- Sometimes the data is out of date.
- Sometimes, the network is at fault or evolving, so the discovered data doesn’t tell the exact truth.
- How do you keep track of network equipment that is not even active on the network, e.g. in storage?
- What about the design or the intent of the Network Engineering Team? The Delta (difference) between the intended status/design of the network and the current status, tells a story that is important to tell and being able to capture that story is equally important.
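To make that Delta a bit more concrete, here is a minimal sketch in Python. It assumes you can get two plain lists of device names, one from your documentation and one from whatever discovers the live network. The device names and the function name are made up for illustration.

```python
# Hypothetical inventories: all device names are invented for this example.
intended = {"core-sw-01", "core-sw-02", "edge-rt-01", "spare-sw-09"}  # documented / designed
discovered = {"core-sw-01", "edge-rt-01", "lab-sw-77"}                # seen live by an NMS


def inventory_delta(intended, discovered):
    """Split two sets of device names into the three groups that matter."""
    return {
        "documented_and_live": sorted(intended & discovered),
        "documented_not_live": sorted(intended - discovered),  # in storage, faulty, planned...
        "live_not_documented": sorted(discovered - intended),  # nobody wrote it down
    }


delta = inventory_delta(intended, discovered)
for group, names in delta.items():
    print(f"{group}: {', '.join(names) or '-'}")
```

The last two groups are the story worth telling: equipment you know about that the network cannot see, and equipment the network sees that nobody documented.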
Is that all?
No, there is more. What happens when you want to plan ahead about IP address space? Vlans? Rack space? There have been tools out there that usually try to capture what has already been deployed, to help you manage fault and status. But what if you want to plan first? Isn’t that how it’s supposed to go? Plan first?
And then comes the time when you want to manage the data itself. Use it, export it, maintain it. All tedious and error-prone processes. So what do we need for those kinds of processes? You probably guessed it: Automation and APIs. Netbox can offer those.
Didn’t you say onboarding?
It’s easy to recognize all the good things about an SSoNT system and, in this case, Netbox. There is a lot of buzz about it lately and there have been events (Netbox day, current state of Netbox), demos/podcasts/videos (like Your network can handle the truth with Netbox) and blog posts (The Truth Will Set You Free with NetBox) about it from a lot of well known people or teams in the Network Automation Community that make a great case for it.
Omg, another evangelist..
I wouldn’t mind that, but it’s not what I am doing. Like any other engineer, when I see a great tool, I get excited and want to get my hands on it and try it out for myself. However, the most difficult thing about flying, is getting off the ground.
I showed the videos, demos and docs to my head of section and he thought it was very promising. I even got a test installation going. It looked like exactly what we needed to cover that role of Dynamic Network Documentation: easy to query, share and maintain.
But if we had to do everything manually every time, all the data entry, the updating, the comparison to the actual network status, then we would not have gained that much. Our habits would remain the same, only our tools would be different.
So what exactly is the problem?
In such cases people get excited about new tech and don’t realize that, when you try to innovate in your work/production environment, the problem is not about mastering the inner workings of the tech, the languages, the APIs, etc.
The real problem is realizing how you need to change the way you think and the way you work. Of course, this comes down to processes and methods, doing things differently across the board, but it’s also about people and mentality.
The people problem
It’s very hard to convince people, the people you work with or the people you work for, not just your bosses but also your internal “clients”, that things need to be done differently, especially if they feel “safe” in how things are being done so far. Maybe they don’t share your vision on how you want to do things or maybe they fail to see the shortcomings and issues in the way you are doing things so far.
I hate to be the one to break it to you, but like I have often said before, no matter how much automation and change gospels you try to shove down people’s throats about what’s coming down the road and why they need to change their mentality, they won’t change their minds, just because you say so!
Of course, you DO need to say so! So when they do realize what is happening, they will trust you to help them on that path. But before that you have no other option than to … just build the damn things! On your own if necessary, or with whoever is willing to help. They will understand later. Seeing is believing.
I am not saying you should do this behind your boss’s back and get fired or lose your credibility. Do what you have to do to get the minimum amount of approval and resources to get your point across and then get to it!
The system problem
By system, I mean everything you use to do your work. Tools, processes, habits. You need to rethink all of it. If you really want to take advantage of what Netbox has to offer, it’s necessary.
There is always the possibility that you were already aware you needed a tool just like that. And you found it with Netbox. But let’s be realistic. Most of us didn’t really know that we needed it until we saw what it can do.
So why do we need to rethink everything? Well, you may already have tools, habits, documents that you use/apply to do things. Most of them were never tied together or used to exchange data. In our case we have two different NMSs, Cisco Prime Infrastructure (just trying to define exactly what category Prime belongs to is sure to get you in a fight or two), excel files, documents, contracts, etc.
Can you be a little more specific?
- 2 NMSs for Network Fault Management (sorry, can’t say which).
- Cisco Prime Infrastructure for Device Inventory, Configuration Change Management, Config Archiving, Vulnerability Alerts, Device Fault Management, Compliance, Log handling and Automation, and many more
- Excel files for IP Address Management and Vlan Management
- Excel files for Datacenter Infrastructure Management (Racks)
- Documents and Excel files for Support Contracts
- Prayers for not forgetting things and how they are tied together
Ok, ok, please get back to the system problem
Right, so besides the problem of using overlapping and disconnected tools and processes, which can lead to duplicate or multiple copies of the same data that you normally have to crosscheck every so often (but, come on, you never do that.. right?), what happens when you need to do fact checking and your different “sources of truth” don’t agree with each other?
I am not the first one to say it, but in order for this to work, you do need to pick exactly one source of truth for each kind of information. Only one should be “right” in every argument and every situation at any given time.
You don’t have to choose the same source for all kinds of data, but it’s better if you manage to do so, for organizational reasons.
So there are your first battles right there, your first questions and decisions: how exactly does the information flow through your methods, habits and systems, and which source of information should be right?
Those and the answer to the following question will help you reach the first step. It’s actually also a decision you have to make.
What will you use Netbox for?
You must decide now. Hint: You can start small..
We decided to go with the IPAM set as a start (back when we were still thinking about using Netbox). What does IPAM mean? You will find a lot of definitions on Google.
The Wikipedia definition for IPAM is one I like a lot, but you can find a lot of other ones, by Infoblox, ManageEngine, Bluecat, Microsoft, etc. What do all of those have in common? They all correspond to the structure of those companies’ own IPAM products! (I mean really.. DNS in IPAM? Get over yourselves please).
I would rather read what the Netbox documentation says about the IPAM functionality of Netbox, here. The letters mean IP Address Management, so essentially it’s about how you organize your IP address space for your networks: the IP prefix hierarchy, the individual IP addresses assigned to devices and interfaces, and the connection between IP prefixes and Vlans. You can do VRF and Vlan management as well!
Netbox is one of the few key products that can give you a connection between IP Prefixes and Vlans. Very important for network engineers! It’s also open source and free and very powerful and programmable with APIs. I don’t see why you should bother with any other product for that kind of thing.
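If the idea of a prefix hierarchy tied to Vlans feels abstract, the following is a small sketch using only Python’s standard `ipaddress` module. It is not how Netbox stores things internally. It only illustrates the relationships you will be entering. Every prefix and Vlan ID in it is invented.

```python
import ipaddress

# Hypothetical address plan: prefix -> Vlan ID (None means no Vlan attached).
plan = {
    "10.20.0.0/16": None,    # aggregate for one site
    "10.20.10.0/24": 110,    # user Vlan
    "10.20.20.0/24": 120,    # voice Vlan
    "10.20.20.128/25": 120,  # a carve-out inside the voice prefix
}


def parent_of(prefix, candidates):
    """Return the most specific candidate that strictly contains prefix, or None."""
    net = ipaddress.ip_network(prefix)
    parents = [
        c for c in map(ipaddress.ip_network, candidates)
        if c != net and net.subnet_of(c)
    ]
    return str(max(parents, key=lambda c: c.prefixlen)) if parents else None


hierarchy = {prefix: parent_of(prefix, plan) for prefix in plan}
for prefix, parent in hierarchy.items():
    print(f"{prefix:<18} parent={parent}  vlan={plan[prefix]}")
```

Working this out on paper (or in a few lines of code) before typing anything into the GUI makes it much easier to spot overlaps and orphans in an old spreadsheet.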
So the first thing you need to do, is figure out how you are going to organize your Sites.
Create Sites? What do those have to do with IPAM?
I know, it sounds confusing.. The thing is that every tool, every piece of work done by anyone, contains a small portion of information they consider a given, so they never say it out loud. What I mean is that you have to start somewhere when entering data and continue a certain way, taking into account how your different types of data connect to each other.
I can give you several examples (ip prefixes and vlans, devices/interfaces/ip addresses, etc) but about the ip prefixes, the truth is you could start typing them in as soon as Netbox is installed. However, at some point you will need to separate your prefixes per site/location. If you don’t do this at the very beginning, you will have a problem doing it in the middle of your data entry and at the very least you will be going back and forth, correcting and modifying entries.
In other cases you may run into problems when trying to synchronize data between Netbox and other sources containing location information for your network data : devices, racks, ip prefixes, vlans, everything there is normally tied to a Location or Site. So you better start thinking about that.
We went with a geographical hierarchy in Netbox for a few reasons. In my company this makes sense for the network, location information is already in numerous other tools, and we can use the possibilities in Netbox to enter geographical locations and regions so we can link to addresses on maps (like Google Maps), for example when we want to use that address in a letter to a partner.
How did we create the sites? You can insert any data directly on the GUI, with a CSV approach using the import option on each relevant item (be careful, the format is specific and there are mandatory and optional values you have to put in), or via the API, using the REST API directly, pynetbox or Ansible. All are valid options we can discuss in more detail in Part 2. We just entered our main sites by hand on the GUI. The rest went in via API (pynetbox).
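As a taste of the API route, here is a rough sketch of preparing site data in Python before sending it anywhere. The site names, the URL and the helper functions are hypothetical, and the slug rule is a simplification of my own. The pynetbox calls are left as comments so the snippet runs without a Netbox instance.

```python
import re


def slugify(name):
    """Lowercase the name and replace runs of non-alphanumerics with single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def site_payload(name):
    """Build the fields for one site: a name, a URL-friendly slug and a status."""
    return {"name": name, "slug": slugify(name), "status": "active"}


# Hypothetical site names, e.g. read from an old spreadsheet.
sites = [site_payload(name) for name in ["Athens HQ", "Thessaloniki DC-2"]]
for site in sites:
    print(site)

# With pynetbox installed and an instance you can reach, each payload
# could then be sent along these lines (URL and token are placeholders):
#   import pynetbox
#   nb = pynetbox.api("https://netbox.example.com", token="<your-token>")
#   nb.dcim.sites.create(**site)
```

Building the payloads separately from sending them means you can print and review everything first, which is a cheap safety net.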
Great.. what about IPAM?
For IPAM we could have gone a lot of ways. But we had kept around a set of MS Excel files for 15 years containing just that. Which is a big problem in itself.
Don’t get me wrong. I like Excel. To store data and process them numerically. Which means using it for what it was originally meant for (it’s no Mathcad, no matter what anyone says, like these folks for example: Microsoft’s New Programming Language for Excel Now Turing Complete). I have even used it for a few years as a GUI for network automation with a combination of Visual Basic for Applications (embedded VBA in MS Office apps), DOS scripts and TCL-Expect, allowing even inexperienced employees to perform basic network troubleshooting, tailored to our company’s network, or save a lot of time during operation procedures (like massive password changes along with testing).
Even so, I can go 15 rounds against anybody about why using Excel as a way to store data for automation is a very bad idea. I sound like an aggressive sob there, I know.. But just imagine thinking you have entered your data in there and then running your automation code without doing sufficient testing. Excel can throw your code out the window, before you know what happened. It’s all done to “help you” but when it happens, believe me, you will have plenty of feelings but gratitude will not be one of them.
Let me give you a couple of examples and then let’s drop the whole thing:
- You just start typing IP addresses in cells. As long as at least one of the 2nd, 3rd or 4th octets has fewer than 3 digits, everything is fine: e.g. 192.168.1.12 is stored as text, so great so far. However, when all three of those octets have 3 digits, Excel thinks you are storing a number as text and “helps you” by removing the dots, without telling you or giving you a choice, turning the IP address into a number. Result: when you finally use that data for automation, your code crashes and you can’t understand why unless you add more testing. When you do figure out what happened, what can you do? You will either waste a lot of time correcting data by hand, or use a different piece of code to remedy the situation (not that easy, as it requires some form of pattern matching, regex or other).
- The data is entered by different people, or by the same person at different times. Result: values in the same column differ between entries, as spelling, case and word choices can be different. Again, you have no way of knowing when running code, as “it sort of looks the same when I look at it!”. Looking the same and being the same are different things. When using code to automate, you have to be able to expect things in a specific format, not react to arbitrary conditions (that would take an AI or some ML, would it not?).
In short, when you enter data from Excel in an environment such as Netbox using code, you have to normalize the data first. Make sure the format is identical everywhere. Or it will not work. In fact, I suggest also learning how to back up and restore your database in Netbox, before you start playing around with entering or updating data automatically.
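To show what normalizing can look like, here is a minimal sketch using only the Python standard library. It is an illustration and not the code we actually ran. It relies on one observation about the mangled case described above: if the last three octets all had 3 digits, the final 9 digits of the number split cleanly and whatever is left is the first octet. The function names are made up.

```python
import ipaddress


def normalize_ip(value):
    """Return a dotted IPv4 string, repairing the 'dots removed' case when possible."""
    text = str(value).strip()
    try:
        return str(ipaddress.IPv4Address(text))
    except ValueError:
        pass
    # Mangled case: only digits left, 10 to 12 of them. The last 9 digits are
    # the three 3-digit octets, the remainder is the first octet.
    if text.isdigit() and 10 <= len(text) <= 12:
        head, tail = text[:-9], text[-9:]
        octets = [head, tail[0:3], tail[3:6], tail[6:9]]
        # IPv4Address raises ValueError if any octet is out of range.
        return str(ipaddress.IPv4Address(".".join(str(int(o)) for o in octets)))
    raise ValueError(f"cannot interpret {value!r} as an IPv4 address")


def normalize_label(value):
    """Collapse whitespace and case so ' Core  Switch ' and 'core switch' compare equal."""
    return " ".join(str(value).split()).casefold()


print(normalize_ip("192168100200"))       # a cell Excel turned into a number
print(normalize_ip(" 192.168.1.12 "))     # a cell that survived, with stray spaces
print(normalize_label(" Core  Switch "))
```

Anything the function cannot interpret raises an error instead of being guessed at, so bad rows surface before they reach Netbox.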
For IPAM, if you haven’t guessed it already, we used Python and pynetbox, a great SDK for Netbox, to enter IP prefixes and IP addresses, in that order. We also used some extra Python libraries to help us determine network information from IP address and mask and vice versa. More information on that in Part 2 of this series.
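For the “network information from IP address and mask” part, one option is the `ipaddress` module from the Python standard library. Whether that matches the libraries used for our own import is left for Part 2, so treat this as a sketch under that assumption. The address and the helper name are invented.

```python
import ipaddress

# Hypothetical spreadsheet row: an address and a dotted mask in separate columns.
iface = ipaddress.ip_interface("192.168.1.12/255.255.255.0")

print(iface.with_prefixlen)    # the address in CIDR notation
print(iface.network)           # the prefix that contains the address
print(iface.network.netmask)   # and back from prefix length to dotted mask


def prefix_for(address, mask):
    """Return the containing prefix, in CIDR notation, for an address and mask."""
    return str(ipaddress.ip_interface(f"{address}/{mask}").network)
```

With a helper like `prefix_for` you can derive the list of prefixes to create first, and only then attach the individual addresses to them.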
DCIM means Data Center Infrastructure Management. I won’t cite any company’s explanation or product. This is not a marketing campaign and you are all more than capable of doing your own Google-based research.
In Netbox, DCIM is essentially about Devices and interfaces/components, Device Roles, Vendors, Device Types/Models, Cables/Connections and Power. To be honest, that’s where we first started our data entry. We entered our devices in Netbox, after having them exported from Cisco Prime Infrastructure in a “CSV” format (be careful, those are not commas!). Then we used the import capability to get them in.
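About those “not commas”: the sketch below assumes the export uses semicolons, which is only a guess for illustration, so check your own file. It rewrites such a file as real comma-separated text with Python’s `csv` module. The column names and rows are made up.

```python
import csv
import io

# Hypothetical export: semicolon-separated, with invented column names and rows.
raw_export = (
    "Device Name;IP Address;Location\n"
    "sw-01;10.0.0.1;Rack 4, Room B\n"
)


def to_comma_csv(text, delimiter=";"):
    """Re-write delimiter-separated text as comma-separated CSV."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerows(reader)  # fields that contain commas get quoted automatically
    return out.getvalue()


print(to_comma_csv(raw_export))
```

Letting the `csv` module do the quoting matters here, because a simple find-and-replace would break any field that already contains a comma.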
I just wanted at that point to delay my dive into code with either the REST API or the pynetbox SDK, which I could not figure out at first. But even with this approach, I ran into a lot of issues, as I realized the intricacies of it and how I needed to define a lot of things first before importing the devices. More on all of those in Part 2, but let’s say for now that Roles, Manufacturers, Device Types and Platforms come before the devices themselves.
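One way to keep that ordering straight is to write the dependencies down and let Python sort them. The dependency map below is my own simplified reading of the paragraph above (plus Sites, which devices also need), not an official Netbox schema, and it requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Object type -> the types that should exist before it (simplified assumption).
depends_on = {
    "manufacturer": set(),
    "device_role": set(),
    "site": set(),
    "platform": {"manufacturer"},
    "device_type": {"manufacturer"},
    "device": {"site", "device_role", "device_type", "platform"},
}

# static_order() yields every type only after all of its dependencies.
import_order = list(TopologicalSorter(depends_on).static_order())
print(" -> ".join(import_order))
```

Whatever the exact order printed, devices always come last, which is the point: importing them first is what produces the errors.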
You may have guessed, if you used the link above, that DCIM as a term is not mentioned explicitly in the Netbox documentation. So why am I mentioning it here? It’s not just for completeness. You will understand when you use the API or the Netbox shell. More in Part 2.
Oh, that was a present, really! Again, lots of data in Excel files over the years (15 or more years of data), poor maintenance, multiple versions of files with the same data (another present from the Excel gods; use databases, people, please, and if you can’t, then go for CSV or text files and use version control like git).
Netbox helps a lot with this, but besides being the single source of network truth, there’s more you can do with circuits: you can define their endpoints, thus defining your topology! There used to be a capability for creating dynamic topology graphs from that. It has since been removed from Netbox, but you can have it back if you install a plugin for it.
What did we do? Again, python scripts for importing the data with pynetbox. More on that (you know, right?) in Part 2.
There are other ways to store passwords, for sure. Vault type of software (Hashicorp Vault, Symantec Vault, etc) is one solution for it. You can easily do this with netbox too, though. I won’t go into that but it’s an interesting idea to retrieve your passwords from Netbox automatically before using automation on something else (devices included in netbox or other platforms).
You can develop your own plugins, of course. But there is more, you can also develop forms that run some custom Python script also getting data from Netbox about your network as input for your script. You could turn Netbox into an automation platform if you want. But I don’t recommend that. Netbox does not belong in the OPS category. Leave it in Live Doc.
Custom menus are also an option, I believe, and NTC does Netbox customizations for customers, as far as I know.
The Infrastructure and Schema
You need to identify the sections that Netbox uses to categorize information. You will realize that the Netbox API (along with corresponding tools, like pynetbox) uses the same structure. Those sections are:
Organization/ Devices (DCIM) /IPAM / Virtualization / Circuits / Power / Secrets.
Under those sections, there are subsections per type of information, where you can find fields of info. Those fields correspond to specific items of information tied to your infrastructure. I already mentioned this, so pay attention: it’s important to identify those that can help you tie your instance of Netbox to your other sources of data, your “foreign keys”, speaking in DB terms.
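To see how the sections show up in the API, here is a small sketch that builds REST API URLs for a few object types. The base URL is a placeholder and the mapping is a hand-picked subset, written from memory of the 2.x API layout, so verify it against the API root of your own instance.

```python
from urllib.parse import urljoin

BASE_URL = "https://netbox.example.com/"  # hypothetical instance

# Object type -> "<api section>/<endpoint>" (a subset, for illustration).
ENDPOINTS = {
    "sites": "dcim/sites",
    "devices": "dcim/devices",
    "prefixes": "ipam/prefixes",
    "ip-addresses": "ipam/ip-addresses",
    "vlans": "ipam/vlans",
    "circuits": "circuits/circuits",
    "virtual-machines": "virtualization/virtual-machines",
}


def endpoint_url(kind, base=BASE_URL):
    """Return the REST API list URL for one kind of object."""
    return urljoin(base, f"api/{ENDPOINTS[kind]}/")


for kind in ENDPOINTS:
    print(f"{kind:<17} {endpoint_url(kind)}")
```

Notice that Sites and Devices both live under `dcim`, even though the GUI shows them under different menus. pynetbox follows the same layout, with attribute paths such as `nb.dcim.sites` and `nb.ipam.prefixes`.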
Just remember, getting the data in just once, is not going to drastically change things for you. I already told you, you need to change the way you think and the way you operate. That means you need to keep updating the data in netbox or use netbox data to update your other sources. Those fields are the key for that. You need to make sure you don’t compromise them. For example, if you use device hostnames for that, they need to be consistent across tools. Same thing for Site/Location names, etc.
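A quick way to find out whether your key fields are consistent is to look for entries that are the same device but spelled differently in two tools. The sketch below assumes a simple rule of my own (compare on the lowercased short hostname, ignoring any domain suffix). Your own rule may well differ, and all names are invented.

```python
def canonical(hostname):
    """Hypothetical matching rule: lowercased short name, domain suffix ignored."""
    return hostname.strip().lower().split(".")[0]


def spelling_conflicts(source_a, source_b):
    """Return (name_in_a, name_in_b) pairs that match by rule but differ as written."""
    by_key = {canonical(name): name for name in source_a}
    return sorted(
        (by_key[canonical(name)], name)
        for name in source_b
        if canonical(name) in by_key and by_key[canonical(name)] != name
    )


# Hypothetical exports from two tools.
netbox_names = ["core-sw-01", "edge-rt-01"]
nms_names = ["CORE-SW-01.corp.example", "edge-rt-01"]

conflicts = spelling_conflicts(netbox_names, nms_names)
for ours, theirs in conflicts:
    print(f"same device, different spelling: {ours!r} vs {theirs!r}")
```

Every pair this prints is a place where an automated sync would either miss the device or create a duplicate, so it is worth fixing at the source.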
There are two main options for where to install Netbox. You can install it on a Linux machine (VM or physical) or on Docker. The netbox-docker is considered a separate project that comes in different flavors (tags). For example, if you want LDAP/AD integration you need to use a specific netbox-docker image, tagged accordingly. I will let you know more in Part 2, but if you do wanna go that way on your own before we talk again, be sure to read the project wiki! Not only will you understand more about what you need to do (LDAP or not), but you will also get access to more interesting information.. I won’t say more for now, but go ahead and take a look if you want.
I have used both methods. I started without Docker. Pretty soon, I figured out that while trying to keep your system in a good state, if you want to make sure you don’t cause trouble with your mistakes, you need a backup/restore procedure, not just for your data but for your system, as well. You sure don’t need me to tell you that doing this with a physical system in this day and age is too much trouble. Using a VM, your obvious choice is using snapshots.
This can be helpful, but doesn’t make it safe, or give you the flexibility that Docker does. Infrastructure As Code, remember? You can rebuild your system from scratch in minutes, the way you want it to be built. Mess up, delete, go again. When you are building your system, Docker wins all the way. There is only one thing to seriously take into consideration: Netbox upgrades. Sometimes Netbox-Docker’s components change from one version to the next. The data are held in Docker volumes. If the version of the files is incompatible with the new component versions, you get failures (yeah, I got a few, especially with PostgreSQL and Nginx). Even if your customization only includes LDAP and HTTPS, you still have to do a lot in order to get your Netbox on Docker instance upgraded.
So the plain form of Netbox on Linux (I use Ubuntu) can be useful. I keep one as a test system just in case. It did come in handy a while ago during an upgrade so I could keep my data intact while PostgreSQL was updated. The upgrade script included in the original Netbox code usually takes care of those issues. Just don’t wait forever before you upgrade. Also, better test it first. Don’t turn your production system into a brick!
After getting through the first steps, I started with a setup of a Netbox instance on Docker as prod system (which is very fast btw, very good performance, better than the regular one!) and a regular Netbox instance on Ubuntu 18.04.4 LTS as a DR system, using PG dumps to export/import data between systems for syncing. Now I am using Docker for both instances, I have developed syncing scripts to be run with Cron and I am keeping the regular version around as a test system, just in case.
If you want to keep a regular system along with a Netbox on Docker system, make sure the regular version and Docker – with ldap or not – are on the same number: check the release notes page and then go over to the docker hub page for the netbox-docker images to check for the ldap one. Usually Netbox-Docker follows the regular version in upgrades after a few days, but better make sure to look before you leap!
More on that.. well you know where but maybe Part 3 for that!
The Ways In
As already mentioned, besides using the GUI to enter data, you can use import for CSV data, the native Netbox REST API, pynetbox and the Netbox shell. In our case, we used import with CSV for very few cases and for the rest went with Python and pynetbox.
The Big Picture
The Big Picture for me is an Automaton. Netbox is right at the center of it as a SSoNT (Single Source of Network Truth) and stays connected with every other source of information on the Network doing complete and regular update cycles. The flow of information is designed and tailored on the philosophy of the team running the Network. Let me give you an example:
How do you make a new vlan?
- Create it on the network devices first? Then Netbox will be one step back and will need to be updated.
- Do you want to create the vlan on Netbox and then let it be created on the Devices? Then you need to be thorough in Netbox about where that vlan is present exactly (Devices, interfaces). Then you can run update cycles towards your devices using Network Automation tools and techniques.
- Do you want to create it in Netbox and then create it by hand or automatically on the Devices? Then you have to run checks for consistency between Netbox and the Devices and alert for differences, or even offer the ability for automatic remediation if you want to.
- You could even keep your old sources of information and recreate them as copies each time you build Netbox from scratch, just to give your team a queryable online version of the data that can offer an access rights schema according to their place in the organization. But I don’t recommend that; it’s so much more powerful when you give it the role it was supposed to have.
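Whichever option you pick, at some point you end up comparing what Netbox says with what the devices say. Here is a bare-bones sketch of such a consistency check for Vlans. Both dictionaries are hand-written stand-ins: in real life the first would come from Netbox and the second from your devices, using whatever tooling you prefer.

```python
# Hypothetical data: Vlan IDs per device, as documented and as actually found.
intended = {"sw-01": {10, 20, 30}, "sw-02": {10, 40}}
actual = {"sw-01": {10, 20}, "sw-02": {10, 40, 99}}


def vlan_drift(intended, actual):
    """Return, per device, the Vlans missing on it and the ones nobody documented."""
    report = {}
    for device in sorted(intended.keys() | actual.keys()):
        want = intended.get(device, set())
        have = actual.get(device, set())
        if want != have:
            report[device] = {
                "missing": sorted(want - have),
                "unexpected": sorted(have - want),
            }
    return report


drift = vlan_drift(intended, actual)
for device, diff in drift.items():
    print(f"{device}: missing={diff['missing']} unexpected={diff['unexpected']}")
```

What you do with the report is the philosophy question from the list above: alert a human, push the missing Vlans, or update Netbox with the unexpected ones.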
The point is that you need to automate your flows and create update procedures. What would be the trigger? Change.
For example, a new device is installed. It goes live and creates log entries on your NMS systems and is added as a managed node.
Then your automated consistency checks pick the change up, and alert the network admin team to consider adding the device to Netbox, or go a step further and add it in using some logic.. draw your own picture there.
So what does that mean? It means you need a codebase, constant updates to it, a delivery mechanism and automation. It does look like a CI/CD system and NetDevOps at its best, doesn’t it? I can tell you one more truth about that.. It takes time and effort. A lot of it. But you don’t have to reach the end of the journey to benefit from it. Whether you are a one-person team, or a full-fledged team of NetDevOps Engineers, you can reach a point where this is useful to you. Create a plan and stick to it.
Sorry, this is long enough; examples come in Part 2. The code was created with older Netbox versions in mind, so I will try to get it tested up to version 2.10.3. On 2.10.4 there are big changes (no more classic Nginx) so I still need a little time there.
How much time?
Well at least 15 days, 3 – 4 weeks max I think. I will do my best, but can’t promise much right now. So what do you do until then?
Besides checking out all the links I gave you, you can go on and check Ethan Banks’s video on installing Netbox and doing a walkthrough in his home lab, or check a demo site with a fairly recent version (2.10.1), so you can get a feel of Netbox before you take the plunge. Here is a small post about the demo site by Packet Pushers.
You can also join the Slack channel for Netbox and Netbox on docker from the NTC slack workspace (join here), and start watching those spaces for the latest discussions on Netbox and how you can use it.
Don’t forget to check the Netbox-docker wiki and check all the info included there!
I could give you more links now but that would ruin the journey (I can’t wait to tell you about a very special Gentleman and his work about automating data management with Netbox). So hang in there, I will be writing more soon.
Netbox is a great tool that you can slowly integrate in your setup to help better organize the information about your network, using APIs and other automation methods, and tie it to your tools for added benefit. I have already done as much with my team, so I try to pass on the information so I can help others.
So what are we doing next time? We have discussed the matter enough, so time to apply what we talked about. Examples per step, with sufficient explanations on the choices made, issues that came up and how they were dealt with. The relevant code will be published, either with the article or a small amount of time later. We will also give references about resources used to understand the issues and create the code.
Part 3 will be about updating and creating the automaton. But that may take significantly more time..
On finishing up let me say again that I found out about Jeremy Stretch’s news as the post hit the internet, from a good friend. I think Jeremy deserves a great start, so here are links to 2 of his posts, an Introduction of NetVerity and the availability of Netbox as a Service on the Digital Ocean Cloud platform.
Cu soon! If you need to ask anything, you can find me on Twitter under @mythryll.
PS: Just a side-note. If you read the wiki, you have probably noticed a section about how to use TLS with Netbox. Let me just say that I disagree with that approach entirely. Using an external reverse proxy just to add TLS/SSL functionality to a web server is not a good idea. If you want security, end your tunnels on the web server, not a step before. Also, if you want to put an independent reverse proxy ahead of your web component, you can do it regardless of where you terminate your TLS tunnel. If you want it to be independent, then using a container on the same Docker host is a poor and dumb man’s HTTPS solution, not a good plan for security. If you want to outsource your security to a stronger component, go ahead and use another platform ahead of it, probably consolidating security for a lot of web servers.
Under-using components because you want to avoid configuring them is a bad idea and has consequences on scalability, resource management and security. Make sure you know why you are choosing each component and for which reason.