{"id":922,"date":"2019-12-11T22:13:03","date_gmt":"2019-12-11T20:13:03","guid":{"rendered":"http:\/\/www.mythryll.com\/?p=922"},"modified":"2020-01-19T21:26:43","modified_gmt":"2020-01-19T19:26:43","slug":"migrating-from-mrtg-rrd-cgi-to-grafana-influxdb-telegraf-on-docker-containers-for-enterprise-network-performance-monitoring-with-snmp-part-1","status":"publish","type":"post","link":"https:\/\/www.mythryll.com\/?p=922","title":{"rendered":"Migrating from MRTG\/RRD\/CGI to Grafana\/InfluxDB\/Telegraf on Docker containers for Enterprise Network Performance Monitoring with SNMP &#8211; Part 1"},"content":{"rendered":"\n<p>(if you are looking for part2, it&#8217;s <a href=\"http:\/\/www.mythryll.com\/?p=963\">here<\/a>)<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Introduction <\/h2>\n\n\n\n<p>I like reading guides, not writing them (it takes too long for me). So this isn&#8217;t a guide, it&#8217;s more like an adventure tale, it&#8217;s a &#8220;I got out one day for a stroll and look what happened!&#8221;. I had no knowledge of Grafana, InfluxDB, Telegraf or Docker before doing this, apart from a simple definition for each term. The previous monitoring infrastructure was old, difficult to maintain and limited in capabilities, its components slowly dying and out of support. Furthermore, new monitoring needs and novel tech such as Streaming Telemetry and Network Automation were almost impossible to integrate. 
A big change was due, and it needed to be done soon.<\/p>\n\n\n\n<p>Everything that follows was implemented in a production environment, although test VMs were used for the first experiments with installing and using the software.<\/p>\n\n\n\n<p>For those of you who don&#8217;t know me, I am a network engineer. I mostly work with Cisco network equipment and software, I have a strong background in network performance and fault management and monitoring, and I try to keep an open mind, combining tech from different areas to solve operational challenges and deploy new services. I also have a keen interest in Network Automation. You can find me on Twitter under the @mythryll handle.<br \/><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A little history<\/h2>\n\n\n\n<p>A little over a month ago (maybe two) I took a big plunge. I had been working at my current employer for more than 15 years, and almost from the beginning, bringing in a lot of experience with SNMP, I had deployed SNMP network throughput monitoring on <a href=\"https:\/\/oss.oetiker.ch\/mrtg\/\">MRTG<\/a>, the well-known network traffic grapher created by Tobi Oetiker, probably the best friend of most network engineers for a long, long time. Deploying MRTG was also the start of my involvement with Linux, as I used openSUSE Linux for it, putting everything (MRTG+Apache) in one server box, and began a parallel life as a semi-serious Linux system admin, taking care of my systems. 
<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"531\" height=\"194\" src=\"http:\/\/www.mythryll.com\/wp-content\/uploads\/2019\/12\/mrtg.png\" alt=\"\" class=\"wp-image-936\" srcset=\"https:\/\/www.mythryll.com\/wp-content\/uploads\/2019\/12\/mrtg.png 531w, https:\/\/www.mythryll.com\/wp-content\/uploads\/2019\/12\/mrtg-300x110.png 300w\" sizes=\"auto, (max-width: 531px) 100vw, 531px\" \/><figcaption>A familiar image for most network engineers<\/figcaption><\/figure>\n\n\n\n<p>That monitoring infrastructure grew little by little in order to provide metrics for every port in the network (a few thousand) on every network device (a few hundred). I added more servers, spread out, using physical machines at first and then migrating them to virtual machines as VMware took over the datacenter (I had a hand in that). CPU load, memory, temperatures, power supply status, pretty much everything important that had an available SNMP gauge parameter was displayed, even custom metrics produced by running Perl scripts on the core switches (measuring MLS QoS queues on Cat 6500s), or composite metrics (products of several SNMP-gathered metrics). <\/p>\n\n\n\n<p>During the migration to the virtual world, the servers were trimmed down as much as possible and RRD\/CGI was added to the mix, to remove the load of constantly creating PNG files when transforming SNMP metrics into graphs. Instead, by using the MRTG-CGI script created by  <em><a href=\"http:\/\/www.fi.muni.cz\/~kas\/\">Jan &#8220;Yenya&#8221; Kasprzak<\/a><\/em>  and using RRD to store the data, monitored object web pages and graphs were only created on demand. I had created a decent hierarchical site structure, calling the <a href=\"https:\/\/www.fi.muni.cz\/~kas\/mrtg-rrd\/\">MRTG-CGI <\/a>script whenever a link was used, with just the right parameters for every object in order to dynamically create its page. 
I finally added a little JavaScript to create a light structure, added AD authentication and migrated to HTTPS to secure access for the right users and groups. <\/p>\n\n\n\n<p>One central web server (Apache) served as the website skeleton and three measuring servers were running MRTG\/RRD\/CGI, gathering the SNMP metrics and serving them through Apache by responding to requests from links used on the main web server.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The old dog<\/h3>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"alignleft is-resized\"><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/townsquare.media\/site\/112\/files\/2014\/03\/dog-Eric-Issel%C3%83%C6%92%C3%82%C2%A9e.jpg\" alt=\"\" width=\"195\" height=\"209\"\/><figcaption>Can you teach an old dog new tricks?<\/figcaption><\/figure><\/div>\n\n\n\n<p>I had reached the limit of that infrastructure. There were constant problems: RRD files becoming corrupted and having to be deleted, file rights often overwritten, system file updates causing conflicts with the Perl SNMP libraries used by MRTG and needing manual patching, etc. I had little notification of such issues; only connectivity problems would generate emails for the local mrtg user. Beyond that, provisioning new items to monitor was a pain, since every OID required one specific entry for every network node in the MRTG config (if you ever did that, you know how it is), and once that was done, the web UI part was another task to complete, one that usually lagged behind and fell out of date. By the way, I never liked other solutions like Cacti or Routers2 (sorry Steve), as they didn&#8217;t allow enough freedom with the UI. <\/p>\n\n\n\n<p>Upgrading the OS became a nightmare, as RRD files don&#8217;t behave well once the system has changed, so history could be lost. The time needed to perform the upgrade, set everything back as it was, copy over config and data and start the engine again, was too long. 
There would be gaps in the monitoring data and I had to look for such &#8220;free&#8221; periods in my schedule, which rarely came up. If the OS upgrade procedure failed (it often did) I had to revert to a snapshot or recreate everything from scratch. <\/p>\n\n\n\n<p>Updating Perl modules with CPAN was also an issue, as the Security Division&#8217;s tools in my organization would often cut off part of the access to the CPAN mirrors, resulting in file corruption (Security will get you, every time..).<\/p>\n\n\n\n<p>On top of that I had to learn a lot about Apache as well, as significant changes in the software made it necessary to learn the new access rights model and how it would affect the tool. The dynamic part, based on CGI, was not getting any younger either: the <a href=\"https:\/\/www.fi.muni.cz\/~kas\/mrtg-rrd\/\">mrtg-cgi<\/a> script used for dynamic page and graph creation had seen its last changes back in 2003, and there were even known security issues involved; you can read about that <a href=\"https:\/\/www.cvedetails.com\/cve\/CVE-2002-0232\/\">here<\/a>.<\/p>\n\n\n\n<p>After I started playing with automation (Python, <a href=\"https:\/\/www.ansible.com\/\">Ansible<\/a>, <a href=\"https:\/\/pynet.twb-tech.com\/blog\/automation\/netmiko.html\">netmiko<\/a>, <a href=\"https:\/\/nornir.readthedocs.io\/en\/latest\/tutorials\/intro\/overview.html\">nornir<\/a>, <a href=\"https:\/\/netbox.readthedocs.io\/en\/stable\/\">netbox<\/a>, <a href=\"https:\/\/developer.cisco.com\/docs\/pyats\/#!introduction\">PyATS\/Genie<\/a>, etc) I developed big plans to use Ansible to automatically manage the configuration part. I had already created Python scripts to produce the web pages for each network node with every interface on it, had begun studying Django and databases, which could in fact come in handy later, and generally sensed that such old tech could not go much further. RRD is great, but it is not very flexible and not queryable. 
The look was also not so great, but I told myself I am not a web developer, so that was OK. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Meeting with Grafana<\/h3>\n\n\n\n<p>I had seen Grafana before. I believe it was Cisco&#8217;s <a href=\"https:\/\/twitter.com\/SNMPguy\">Jason Davis<\/a> who published pics from a Cisco Live NOC at some point (probably Jan 2018), and I began taking a look at it. Then Cisco Devnet&#8217;s <a href=\"https:\/\/twitter.com\/bigevilbeard\">Stuart Clark<\/a> mentioned it during a NetDevOps Live session and I realized that there are finally software solutions for network engineers who monitor the network and seek simpler, information-rich tools to visualize performance metrics.<\/p>\n\n\n\n<p>At first I thought of going just a small step further, still using MRTG for the collection of data but replacing RRD with a similar component, like <a href=\"https:\/\/github.com\/doublemarket\/grafana-rrd-server\">this<\/a>. Shortly after starting my search, I realized there was a solution that provided everything in one package and had enough simplicity and robustness to fit my goals:<br \/><a href=\"https:\/\/www.influxdata.com\/time-series-platform\/telegraf\/\">Telegraf<\/a>, <a href=\"https:\/\/www.influxdata.com\/\">InfluxDB <\/a>and <a href=\"https:\/\/grafana.com\/\">Grafana<\/a>, the so-called TIG Stack.<\/p>\n\n\n\n<p>I also came across a <a href=\"https:\/\/blogs.cisco.com\/developer\/getting-started-with-model-driven-telemetry\">post <\/a>by Jeremy Cohoe on the Cisco Devnet developer blog, about setting up Streaming Telemetry using the TIG stack to consume and visualize the data. As I wanted to deploy Streaming Telemetry for our network as well (I will not expand on this here and now), I figured that stepping towards the TIG stack with SNMP would probably ease that transition later too.<\/p>\n\n\n\n<p>So I decided to try it, and began looking for more information. 
The official documentation for each tool was rich but offered no guidance on combining them into one solution. <\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Ok, I am convinced.. so where do I start?<\/h3>\n\n\n\n<p>I found a few posts about how it worked and how to set it up. Some used it to monitor host performance metrics on isolated systems, others for home network monitoring (set up on a Raspberry Pi), others for limited SNMP metrics, and others for virtual environments, like <a href=\"https:\/\/jorgedelacruz.uk\/2018\/10\/01\/looking-for-the-perfect-dashboard-influxdb-telegraf-and-grafana-part-xii-native-telegraf-plugin-for-vsphere\/\">VMware\/vCenter<\/a>. I didn&#8217;t find any posts or comments about it being used for a large network infrastructure with SNMP. Nothing came up on Twitter either. The DevOps world has produced a lot of monitoring tools lately, but it&#8217;s probably still early for network engineers who traditionally used simpler tools to adopt them, so I guess it&#8217;s normal. <\/p>\n\n\n\n<p>At first I started with a test server in a VM, Ubuntu 18.04 LTS. I decided to try the procedure suggested in this <a href=\"https:\/\/lkhill.com\/telegraf-influx-grafana-network-stats\/\">post<\/a>, explaining how to set up most things for simple SNMP monitoring with only a few agents (nodes), and combined it with this <a href=\"https:\/\/www.howtoforge.com\/tutorial\/how-to-install-tig-stack-telegraf-influxdb-and-grafana-on-ubuntu-1804\/\">post <\/a>in order to better prepare InfluxDB, as the first one does not create a separate database. Looking into more posts, I started to realize how close InfluxDB is to a regular database, providing a lot more features but at the same time needing some administration and housekeeping. Besides doing regular queries and deleting test or erroneous data, I found that a very important issue is the data retention policy. But more on that later. 
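<\/p>\n\n\n\n<p>To give you a small taste of that housekeeping, this is roughly what it looks like in the influx CLI. Mind that the database, policy, measurement and field names here are made up for illustration; adapt them to your own setup:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE DATABASE telegraf\nCREATE RETENTION POLICY \"two_years\" ON \"telegraf\" DURATION 104w REPLICATION 1 DEFAULT\nSHOW RETENTION POLICIES ON \"telegraf\"\nSELECT mean(\"ifHCInOctets\") FROM \"interface\" WHERE time > now() - 1h GROUP BY time(5m)<\/code><\/pre>\n\n\n\n<p>Nothing fancy, but it shows that this is a real, queryable database, unlike RRD.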
<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What were the main installation steps?<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>Setting up the server (DHCP reservation using the MAC address after modifying the netplan cfg, packages up to date, NTP setup, .bashrc modification, blacklist floppy)<\/li><li>Set up the package repo for InfluxDB and Telegraf<\/li><li>Install snmp-mibs-downloader to get the standard MIB-II MIBs (for standard OIDs like the interface table and the system variables), which are not included in the standard server OS.<\/li><li>Install snmptranslate, if you need to. It can help with checking whether you have the right MIBs installed for every object you need to monitor and with troubleshooting your Telegraf configuration.<\/li><li>Install InfluxDB, create a separate telegraf database (besides the default one), create the telegraf DB user, assign access rights<\/li><li>Install Telegraf (the SNMP input plugin is included in Telegraf by default; other plugins are also available), create a sample config towards an SNMP-enabled node (I used a Cat 6500) and test it. Apply the changes.<\/li><li>Add the Grafana package repo, install Grafana<\/li><li>Make standard changes to the default config if you need to (e.g. 
domain, root URL, syslog, etc.)<\/li><li>Login as admin, create a datasource, create a simple dashboard and graph, enjoy and get acquainted with the new environment.<\/li><\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"957\" height=\"809\" src=\"http:\/\/www.mythryll.com\/wp-content\/uploads\/2019\/12\/grafana-datasources.png\" alt=\"\" class=\"wp-image-942\" srcset=\"https:\/\/www.mythryll.com\/wp-content\/uploads\/2019\/12\/grafana-datasources.png 957w, https:\/\/www.mythryll.com\/wp-content\/uploads\/2019\/12\/grafana-datasources-300x254.png 300w, https:\/\/www.mythryll.com\/wp-content\/uploads\/2019\/12\/grafana-datasources-768x649.png 768w, https:\/\/www.mythryll.com\/wp-content\/uploads\/2019\/12\/grafana-datasources-850x719.png 850w\" sizes=\"auto, (max-width: 957px) 100vw, 957px\" \/><figcaption>Add a datasource to Grafana<\/figcaption><\/figure>\n\n\n\n<p>So for my test server, the process was not too complicated; I had done similar things before, as I have an aptitude for reading up on new tools and software and integrating them into our environment. What had we gained so far?<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The plus side<\/h3>\n\n\n\n<ul class=\"wp-block-list\"><li>A lot less config effort and time. Configuration on Telegraf for new items to monitor was minimal:<ul><li>Every new OID had to be defined only once and it would be monitored for every SNMP agent declared in the config.<\/li><li>Furthermore, one can define whole tables to be monitored without the need to define every OID in the table.<\/li><li>The only reason to create different config files for groups of nodes and OIDs would be to specify different polling intervals, SNMP versions or communities.<\/li><\/ul><\/li><li>Configuration on the visualization part (Grafana) will only have to be done once per new item type. 
Once the dashboards and panels are created, you only have to make sure you are receiving the data from the appropriate data source. For example, a new switch only has to be added to the list of agents in the existing Telegraf config. Unless you are filtering the data in Grafana, you will have that switch&#8217;s data available for visualization as well, with no extra effort on your part.<\/li><li>Telegraf allows for more customization of the data, defining tags that can help with indexing (in fact they are necessary for visualizing multiple instances of monitored data).<\/li><li>InfluxDB is a queryable database. In fact there is a query language for it, InfluxQL, similar to SQL. So managing the database is possible, and so are custom queries that can combine things, and we can go back in time as far as we want (provided there is data for it). No more zooming in on graphs that hold average values and trying to guess what the data values were three months ago by interpolating graphs.<\/li><li>Grafana is a very flexible data visualization tool, incredibly customizable. The types of visualizations are very diverse and dashboards really come alive on the screen, but on top of that there are many more options and features to be configured and used:<ul><li>intuitive UI (I know how many times you may have read something like that, it depends on the person, but trust me it&#8217;s easy to get)<\/li><li>customizable time periods<\/li><li>user groups and access levels<\/li><li>various types of datasources and plugins<\/li><li>integration with other tools at various levels<\/li><li>programmable APIs<\/li><li>thresholds and alerts<\/li><li>notifications<\/li><li>exporting data as CSV<\/li><\/ul><\/li><li>All the Stack components can operate independently (same or different machines, close or far), are extensible with plugins, use structured data (e.g. 
JSON) for communicating, provide APIs that can help with automation, and have rich documentation and community support. And oh yeah, all of that is free, of course.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Wait.. what do we do about dashboards and panels?<\/h3>\n\n\n\n<figure class=\"wp-block-image\"><img loading=\"lazy\" decoding=\"async\" width=\"593\" height=\"306\" src=\"http:\/\/www.mythryll.com\/wp-content\/uploads\/2019\/12\/grafana-panel.png\" alt=\"\" class=\"wp-image-941\" srcset=\"https:\/\/www.mythryll.com\/wp-content\/uploads\/2019\/12\/grafana-panel.png 593w, https:\/\/www.mythryll.com\/wp-content\/uploads\/2019\/12\/grafana-panel-300x155.png 300w\" sizes=\"auto, (max-width: 593px) 100vw, 593px\" \/><\/figure>\n\n\n\n<p>Sounds like magic, right? It looks good too, and it&#8217;s reasonably fast. But in fact, Grafana dashboards and panels are not so easy to create or understand when you first take a look at the tool. In reality, what you are seeing is defined in JSON; data is retrieved with queries on the data source, using parameters defined in the dashboard JSON code, and then processed and displayed by Grafana itself, whose server side is written in Go (I am not 100% sure about the rendering internals). There are two ways to go:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Learn how to create your own, by reading either independent user posts or the doc (good luck, it takes time; you may end up having to read the doc anyway)<\/li><li>Download a <a href=\"https:\/\/grafana.com\/grafana\/dashboards\">ready-made dashboard<\/a> from the Grafana website and start fooling around with it in order to understand the structure and what needs to be modified to adapt it to your needs. For example, here is an interface stats dashboard; mind the dependencies mentioned (InfluxDB, Telegraf): <a href=\"https:\/\/grafana.com\/grafana\/dashboards\/10668\">https:\/\/grafana.com\/grafana\/dashboards\/10668<\/a><\/li><\/ul>\n\n\n\n<p>If you have a lot of time, start with the first option. 
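<\/p>\n\n\n\n<p>To give you an idea of what you are in for, a single panel inside a dashboard&#8217;s JSON looks roughly like this. This is a trimmed, hypothetical fragment; the datasource, measurement and field names here are illustrative, yours will differ:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>{\n  \"type\": \"graph\",\n  \"title\": \"Inbound traffic\",\n  \"datasource\": \"InfluxDB-telegraf\",\n  \"targets\": [\n    {\n      \"measurement\": \"interface\",\n      \"groupBy\": [ { \"type\": \"time\", \"params\": [ \"$__interval\" ] } ],\n      \"select\": [ [ { \"type\": \"field\", \"params\": [ \"ifHCInOctets\" ] },\n                    { \"type\": \"mean\", \"params\": [] } ] ]\n    }\n  ]\n}<\/code><\/pre>\n\n\n\n<p>Every panel and every template variable carries a fragment like that, with the datasource mentioned by name. 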
If you know a few things about JSON or have played with Python dictionaries, you may have an easier time with the second option. No matter what you choose, eventually you will probably try both. I will comment more on dashboards and panels in the next parts of this series of posts. For now let&#8217;s say it was a necessary but confusing part of my journey.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Ok.. Are we ready to go then?<\/h3>\n\n\n\n<p>I could have stopped there and deployed to production. But there were more things to consider:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>OK, so we have lessened the burden of configuration and provisioning for monitoring sources and their visualization. But did we get rid of the maintenance overhead for the OS and the libraries? Certainly not. <ul><li>By streamlining the installation process we could of course automate it, using either large bash scripts or Ansible (or your system automation tool of choice), but we would still have to do something about the VM, like creating a VM template ready to be deployed in minutes in case things go terribly wrong.<\/li><li>We still have to worry about the dependencies of the software. If things change in that area we may find ourselves in trouble trying to fix it, while our monitoring is halted and data is lost.<\/li><li>We definitely need a backup procedure to preserve the performance DB data and schema. That can&#8217;t be helped no matter what you do. VMware backup solutions usually back up the whole VM, but if one goes for file backup, DB manipulation will be necessary (essentially stopping the tools)<\/li><\/ul><\/li><li>What about updating the tools themselves? Well, in that area one has to go with what the software creator has to offer. In our case let&#8217;s not forget that two of the tools are DB-based, so there is an update effort there too. Sometimes the package update procedure is enough.<\/li><li>What about HTTPS? 
After all this, a standard installation leaves us with an HTTP-based service. Well, that&#8217;s the easy part. If you have ever set up HTTPS on Apache or Nginx, doing the same with Grafana is a piece of cake. The documentation is simple enough. If, however, you have no idea how to create a CSR, create certificates for your server and install them, then track me down on Twitter and I will do my best to provide some directions so you can find your own way with it.<\/li><li>What about LDAP\/AD integration? That part was not so easy to figure out. I will expand on this in the next part of the series, but getting the software to let your AD users in is one thing; correctly reading group information and assigning roles is another. It turned out to be easier than I thought, but at that point I had tried using my Netbox experience and failed (in fact I believe I had done things wrong with Netbox in that area as well). So my confusion remained for the time being.<\/li><li>What happens if we need to recreate our datasources, dashboards and panels? Would all that work go to waste? Well, given that there are dashboards you can <a href=\"https:\/\/grafana.com\/grafana\/dashboards\/10488\">download <\/a>and that Grafana has multiple points of access to the JSON dashboard and panel code, as well as options for importing and exporting dashboards as JSON (mind the option for &#8220;sharing externally&#8221;), it seems possible to save the dashboards one by one as JSON files and then insert them back into a &#8220;clean&#8221; system. But what about the datasources? Well, there is no export option in the GUI for that, but what the heck, it&#8217;s only a few sources, right? Well, not exactly. The dashboards and panels depend on your datasources (they are explicitly mentioned in the JSON code). 
When you try to change a datasource in a dashboard (for example in the variables section) or in a panel, you will probably end up losing your query text and may have to recreate everything. Even if you don&#8217;t face that problem, it&#8217;s still a lot of work and there is much room for error. You may have guessed what I am getting at: APIs and auto-provisioning (using YAML files). Grafana provides both. You can choose to use either or both. But more on that too in the next parts of the series.<\/li><li>What about adding more MIBs, custom vendor ones? What about providing notifications? What about alerts? Adding more MIBs was not so hard to figure out, but the rest of the questions, and more issues besides, remained a source of confusion.<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">The calm before the storm<\/h3>\n\n\n\n<p>At about that time I came to a realization: there was only so much I could do in the test environment. I could not transfer the historical data from the old system to the new one, and I certainly didn&#8217;t want a full test deployment to burden the network devices by running the SNMP queries twice, once for the old and once for the new system. So there was no point in waiting any more.<\/p>\n\n\n\n<p>While studying with the Cisco Devnet Streaming Telemetry Sandbox back in August and September, I came across Docker containers, as they were heavily used in the lab. I knew what containers were conceptually but had never actually put my hands on a Docker installation. In order to better understand how the Streaming Telemetry infrastructure was staged, I started to scratch the surface. 
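<\/p>\n\n\n\n<p>The basics turned out to be a handful of commands. This is roughly what my first session looked like; the image name is the official Grafana one, the rest is illustrative and not my actual setup:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>docker pull grafana\/grafana\ndocker run -d --name=grafana -p 3000:3000 grafana\/grafana\ndocker ps\ndocker logs grafana\ndocker exec -it grafana \/bin\/sh\ndocker stop grafana\ndocker start grafana<\/code><\/pre>\n\n\n\n<p>That was enough to get going. 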
I learned how to start and stop containers, how to open a pseudo-terminal inside one in order to execute commands, and got a general idea of how they are created and orchestrated.<\/p>\n\n\n\n<p>After verifying the availability of Docker images for all components of the Stack, I finally decided to take the plunge: <\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Engage!<\/h2>\n\n\n\n<ul class=\"wp-block-list\"><li>I would install Docker on the main web server of the old system and deploy Grafana in a Docker container. <\/li><li>I would scrap one of the measuring servers and replace it with a new Ubuntu 18.04 VM, using the same network parameters, deploy Docker on that one too, and then InfluxDB and Telegraf, again in Docker containers.<\/li><li>I would recreate the structure of the test system, transferring the measuring duties of the old measuring server to the new one and visualize metrics on the new container-based Grafana system.<\/li><li>I would repeat the process with the rest of the measuring servers, until every performance parameter previously monitored by MRTG was monitored by Grafana.<\/li><li>As a last step I would scrap the old web server, replace it with a new Ubuntu 18.04 virtual machine and recreate the same Grafana Docker container on that Docker host.<\/li><\/ul>\n\n\n\n<p>I did face some difficulties that forced me to change the plan a bit. But this is where the first part of the series will stop. Before I leave you, I promise that the next part will come a little before Christmas (hopefully).<\/p>\n\n\n\n<p>See you in the next part!<\/p>\n\n\n\n<p>PS: I am including below a processed sample of the command line history from the test VM, to help you get a better idea of how it went. Obviously there are parts missing or modified (sanitized). 
Do not copy\/paste, just read.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#Initial Setup part\n    1  vim \/etc\/ssh\/sshd_config\n    6  vim .bashrc\n    9  vim \/etc\/netplan\/50-cloud-init.yaml\n   12  vim \/etc\/modprobe.d\/blacklist.conf\n   14  apt-get update\n   15  apt-get upgrade\n\n#This is where the installation begins. You can see I tried \n#going the apt way.. \n   17  sudo curl -sL https:\/\/repos.influxdata.com\/influxdb.key | sudo apt-key add -\n   18  source \/etc\/lsb-release\n   19  echo \"deb https:\/\/repos.influxdata.com\/${DISTRIB_ID,,} ${DISTRIB_CODENAME} stable\" | sudo tee \/etc\/apt\/sources.list.d\/influxdb.list\n   20  sudo apt update\n   21  sudo apt install influxdb\n   22  apt-get update\n\n#And this is where the Security Division ruined my day, so I went \n#with downloading and installing the packages manually\n   23  wget https:\/\/dl.influxdata.com\/influxdb\/releases\/influxdb_1.7.8_amd64.deb\n   24  sudo dpkg -i influxdb_1.7.8_amd64.deb\n   25  systemctl status influxdb\n   26  systemctl start influxdb\n   27  systemctl enable influxdb\n   28  systemctl status influxdb\n\n#At this point we need to get inside influxdb and create database\n#dbuser and assign access rights\n   29  influx\n   30  wget https:\/\/dl.influxdata.com\/telegraf\/releases\/telegraf_1.12.1-1_amd64.deb\n   31  sudo dpkg -i telegraf_1.12.1-1_amd64.deb\n\n#Don't forget that if your telegraf service is not in the same host \n#with your influxdb service, you need to modify the main telegraf \n#config in \/etc\/telegraf\/\n   32  sudo systemctl start telegraf\n   33  sudo systemctl status telegraf\n   34  sudo apt install snmp snmp-mibs-downloader\n\n#after you get the default mibs, manipulation for this file is \n#needed and the same happens with other mibs (like vendor mibs).\n#you also need to put the extra mibs in the same place, \/usr\/share\/snmp\/mibs or you will have a lot of trouble making it work.\n   35  vim \/etc\/snmp\/snmp.conf\n   36  snmpwalk 
-v 2c -c community ipaddress system\n\n#you can put the extra conf files in this dir, it's enabled by \n#default in the packaged version (not the same on the Docker image, \n#be careful)\n   37  cd \/etc\/telegraf\/telegraf.d\/\n   38  vim accore6509.conf\n   39  telegraf --test --config \/etc\/telegraf\/telegraf.d\/accoreMK6509.conf\n   40  sudo systemctl reload telegraf\n   41  cd\n   42  wget https:\/\/dl.grafana.com\/oss\/release\/grafana_6.3.5_amd64.deb\n   43  sudo dpkg -i grafana_6.3.5_amd64.deb\n\n#I don't remember what this was for.\n   44  apt-get install libfontconfig1\n   45  apt --fix-broken install\n   46  sudo dpkg -i grafana_6.3.5_amd64.deb\n   47  sudo systemctl enable grafana-server\n\n#this checks for active service ports on your host\n   48  netstat -plntu\n\n#this is the part for the certificates installation for https. You \n#can't see the whole thing here, I am mostly fooling around.\n   68  cd \/etc\/grafana\n   70  vim grafana.ini\n   71  scp root@xxxxx:\/etc\/apache2\/ssl\/* .\/\n   72  ls\n   73  ls -al ssl\n   74  vim grafana.ini\n   75  systemctl restart grafana\n   76  systemc\n   77  systemctl restart grafana-server\n   78  systemctl status grafana-server\n  \n   82  ls\n   83  vim grafana.ini\n   84  systemctl restart grafana-server\n   85  systemctl status grafana-server\n\n   99  cd ssl\/\n  100  ls\n  101  openssl genrsa -aes256 -out test.key.pem 2048\n  102  openssl rsa -in test.key.pem -out test.nopasswd.key.pem\n  103  vim my_openssl.cnf\n  104  openssl req -config my_openssl.cnf -key test.nopasswd.key.pem -new -sha256 -out test.csr.pem -subj \"\/C=GR\/ST=TheProvince\/L=City\/O=Organization\/OU=YELS\/CN=test.ourdomain.gr\/emailAddress=nocteam@whereIwork.gr\"\n  105  ls -al\n\n#this is where I started checking the snmp part for the host. 
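As an illustration, a minimal\n#telegraf.d snmp input for one node looks roughly like this (hypothetical\n#agent address, community and names, adapt to your own setup):\n#\n#  [[inputs.snmp]]\n#    agents = [ \"10.1.1.1:161\" ]\n#    version = 2\n#    community = \"community\"\n#    [[inputs.snmp.table]]\n#      name = \"interface\"\n#      oid = \"IF-MIB::ifTable\"\n#      [[inputs.snmp.table.field]]\n#        name = \"ifDescr\"\n#        oid = \"IF-MIB::ifDescr\"\n#        is_tag = true\n#\n#Test the snmpwalks below first. 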
If \n#those work you can put those OIDs in your telegraf configs\n  106  cd \/etc\/telegraf\/\n  107  ls\n  108  vim telegraf.d\/accore6509.conf\n  109  systemctl restart telegraf.service\n  110  systemctl restart influxd\n  111  systemctl restart grafana-server\n  112  vim telegraf.d\/accore6509.conf\n  113  systemctl restart telegraf.service\n  114  systemctl restart influxd\n  115  systemctl restart grafana-server\n  116  snmpwalk -v 2c -c community ipaddress IF-MIB::ifTable\n  117  snmpwalk -v 2c -c community ipaddress IF-MIB::ifDescr\n  118  snmpwalk -v 2c -c community ipaddress IF-MIB::ifAlias\n  119  vim telegraf.d\/accore6509.conf\n\n#you can test the config before you activate it in telegraf. \n#Activation happens upon restarting the telegraf service\n  123  telegraf --test --config \/etc\/telegraf\/telegraf.d\/accore6509.conf\n  124  systemctl restart influxd\n  125  systemctl restart grafana-server\n  138  influx\n  139  systemctl stop grafana-server\n  140  systemctl stop telegraf.service\n\n  147  systemctl start telegraf.service\n  149  systemctl start grafana-server\n\n  157  systemctl restart telegraf.service\n  158  systemctl restart influxd\n  159  systemctl restart influxdb\n  160  systemctl start grafana-server\n  161  systemctl restart grafana-server\n\n#this is the part where I started fooling around with ldap and AD\n  174  locate ldap.toml\n  175  cd \/etc\/grafana\/\n  176  vim ldap.toml\n  177  systemctl restart grafana-server\n  178  vim ldap.toml\n  179  vim grafana.ini\n  189  more \/var\/log\/syslog\n  190  tail -500 \/var\/log\/syslog | grep ldap\n  191  systemctl restart grafana-server\n\n#at this point I realized I should have installed ntp service from \n#the start. It's not included by default in Ubuntu server. 
The logic \n#is the same for a workstation: it syncs with internet-based \n#servers; if you need to configure anything more, you need the ntp package.\n  224  date\n  225  echo $LOCALE\n  226  echo LOCALE\n  227  timedatectl\n  228  timedatectl list-timezones\n  229  systemctl stop grafana-server.service\n  230  systemctl stop influxdb\n  231  systemctl stop telegraf\n  232  sudo timedatectl set-timezone Europe\/Athens\n  233  timedatectl\n  234  apt-get install ntpd\n  235  apt-cache search ntp\n  257  apt-get install ldapsearch\n  258  apt-get update\n  259  apt-get upgrade\n  260  sudo apt install ldap-utils\n  261  ldapsearch -H ldap:\/\/ldapserver:ldapport -D \"CN=ldapusername,OU=Special Purpose,OU=UsersOU,DC=yourdomain,DC=gr\" -w \"bigstrongpassword\" -v -d 1  -LLL \"(sn=theodoridis)\" -u cn sn telephoneNumber sAMAccountName\n  262  ldapsearch -H ldap:\/\/ldapserver:ldapport -D \"CN=ldapusername,OU=Special Purpose,OU=UsersOU,DC=yourdomain,DC=gr\" -w \"bigstrongpassword\" -v -d 1  -LLL \"(cn=itheodoridis)\" -u cn sn telephoneNumber sAMAccountName\n  263  vim ldap.toml\n  264  systemctl restart grafana-server.service\n  277  tail -500 \/var\/log\/syslog | grep ldap\n  294  cd \/etc\/grafana\/\n  377  cd \/etc\/snmp\n  378  ls\n  379  vim snmp.conf\n  380  cd \/usr\/share\/\n  381  ls\n  382  cd snmp\/\n  383  ls\n  385  cd mibs\n  386  ls\n  387  cd ..\n  388  ls -al\n\n#lots of snmp tests\n  393  snmptranslate .1.3.6.1.4.1.9.9.109\n  394  vim \/etc\/snmp\/snmp.conf\n  395  snmptranslate .1.3.6.1.4.1.9.9.109\n  396  vim \/etc\/snmp\/snmp.conf\n  397  snmptranslate .1.3.6.1.4.1.9.9.109\n  398  snmptranslate .1.3.6.1.4.1.9.9\n  399  snmptranslate .1.3.6.1.4.1\n  400  snmptranslate .1.3.6.1.4.1.9.9\n  455  snmptranslate 1.3.6.1.4.1.9.9.109\n  456  snmptranslate CPULoadAverage\n  457  snmptranslate cpmCPUTotal5min\n  458  vim \/etc\/snmp\/snmp.conf\n  459  net-snmp-config --default-mibdirs\n  460  apt install libsnmp-dev\n  461  net-snmp-config --default-mibdirs\n  462  
snmptranslate cpmCPUTotal5min\n  463  snmptranslate 1.3.6.1.4.1.9.9.109\n  464  snmptranslate -IR -0n cpmCPUTotal5min\n  465  snmptranslate -IR -On cpmCPUTotal5min\n  470  snmptranslate -On CISCO-PROCESS-MIB::cpmCPUTotal5min\n  484  snmptranslate -On CISCO-PROCESS-MIB::cpmCPUTotal5min\n  485  snmptranslate -On \/usr\/share\/snmp\/CISCO-PROCESS-MIB::cpmCPUTotal5min\n  486  snmptranslate -On \/usr\/share\/snmp\/othermibs\/CISCO-PROCESS-MIB::cpmCPUTotal5min\n\n#more tests, you can see vendor mibs too below. Putting the mibs in \n#separate directories didn't work, no matter how much I tried adding \n#the dirs in the config.\n  510  snmptranslate CISCO-PROCESS-MIB::cpmCPUTotal5min\n  511  snmptranslate CISCO-PROCESS-MIB.my::cpmCPUTotal5min\n  512  snmptranslate cpmCPUTotal5min\n  513  vim \/etc\/snmp\/snmp.conf\n  514  snmptranslate cpmCPUTotal5min\n  515  snmptranslate CISCO-PROCESS-MIB.my::cpmCPUTotal5min\n  516  snmptranslate cpmCPUTotal5min\n  517  vim \/etc\/snmp\/snmp.conf\n  518  snmptranslate cpmCPUTotal5min\n  519  snmptranslate CISCO-PROCESS-MIB::cpmCPUTotal5min\n  520  snmptranslate -IR CISCO-PROCESS-MIB::cpmCPUTotal5min\n  521  snmptranslate -Ir CISCO-PROCESS-MIB::cpmCPUTotal5min\n  522  snmptranslate -R CISCO-PROCESS-MIB::cpmCPUTotal5min\n  523  snmptranslate -IR cpmCPUTotal5min\n  524  snmptranslate -On CISCO-PROCESS-MIB::cpmCPUTotal5min\n  525  snmptranslate -On cpmCPUTotal5min\n  526  snmptranslate -Ir-On cpmCPUTotal5min\n  527  snmptranslate -Ir -On cpmCPUTotal5min\n  528  snmptranslate -IR -On cpmCPUTotal5min\n  529  snmptranslate -IR -On cpmCPUTotal1min\n  530  snmpwalk .1.3.6.1.4.1.9.9.109\n  532  snmpwalk -v 2c -c community ipaddress .1.3.6.1.4.1.9.9.109\n  553  snmptranslate -IR -On snmptranslate -IR -On cevChassisCat6509\n  554  snmptranslate -IR -On snmptranslate -IR -On CISCO-ENTITY-VENDORTYPE-OID-MIB::cevChassisCat6509\n  555  snmptranslate  1.3.6.1.4.1.9.12.3.1.3.144\n  557  snmptranslate cpmCPUTotal5min\n  558  snmptranslate -IR -On 
cpmCPUTotal5min\n  559  snmptranslate -IR -On ifConnectorPresent\n  560  snmptranslate -IR -On cpmCPUTotal5minRev7\n  561  snmptranslate -IR -On cpmCPUTotal5minRev\n  562  snmptranslate -IR -On cpmCPUTotal5min\n  563  snmpwalk cpmCPUTotal1minRev\n  564  history | grep snmpwalk\n  565  snmpwalk -v 2c -c community ipaddress cpmCPUTotal5min\n  566  snmpwalk -v 2c -c community ipaddress cpmCPUTotal5minRev\n  567  snmpwalk -v 2c -c community ipaddress cpmCPUTotal1minRev\n  572  snmptranslate -IR -On CISCO-PROCESS-MIB::cpmCPUTotal1minRev\n  573  snmptranslate -On CISCO-PROCESS-MIB::cpmCPUTotal1minRev\n  574  snmptranslate -On cpmCPUTotal1minRev\n  575  snmptranslate -On CISCO-PROCESS-MIB::cpmCPUTotal1minRev\n  576  vim \/etc\/telegraf\/telegraf.d\/accoreProcStats.conf\n  577  history | grep telegraf\n  578  telegraf --test --config \/etc\/telegraf\/telegraf.d\/accoreProcStats.conf\n  579  vim \/etc\/telegraf\/telegraf.d\/accoreProcStats.conf\n  580  snmpwalk -v 2c -c community ipaddress cpmCPULoadAvg1min\n  581  snmpwalk -v 2c -c community ipaddress cpmCPULoadAvg5min\n  582  snmpwalk -v 2c -c community ipaddress cpmCPULoadAvg15min\n  583  snmpwalk -v 2c -c community ipaddress cpmCPUTotal1minRev\n  584  snmpget -v 2c -c community ipaddress cpmCPUTotal1minRev\n  585  systemctl restart telegraf\n  586  systemctl status telegraf\n\n#I checked for disk space usage as I didn't want things to blow up. \n#Remember the retention policy? 
Look for that discussion in the next \n#part of the series\n  587  du\n  588  df\n\n# More tests\n  600  snmpget -v 2c -c community ipaddress cpmCPUTotalIndex\n  601  snmpget -v 2c -c community ipaddress cpmCPUPhysicalIndex\n  602  snmpget -v 2c -c community ipaddress cpmCPUTotalPhysicalIndex\n  603  snmpwalk -v 2c -c community ipaddress cpmCPUPhysicalIndex\n  604  snmpwalk -v 2c -c community ipaddress cpmCPUTotalIndex\n  605  snmpwalk -v 2c -c community ipaddress cpmCPUTotalPhysicalIndex<\/code><\/pre>\n\n\n\n<p>(the adventure continues on <a href=\"http:\/\/www.mythryll.com\/?p=963\">part 2<\/a>)<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(if you are looking for part2, it&#8217;s here) Introduction I like reading guides, not writing them (it takes too long for me). So this isn&#8217;t a guide, it&#8217;s more like an adventure tale, it&#8217;s a &#8220;I got out one day for a stroll and look what happened!&#8221;. I had no knowledge of Grafana, InfluxDB, Telegraf&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[27,14,21,7],"tags":[34,37,36,39],"class_list":["post-922","post","type-post","status-publish","format-standard","hentry","category-automation","category-it","category-monitoring-tools","category-professional","tag-grafana","tag-influxdb","tag-telegraf","tag-tig"],"_links":{"self":[{"href":"https:\/\/www.mythryll.com\/index.php?rest_route=\/wp\/v2\/posts\/922","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.mythryll.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.mythryll.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.mythryll.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.mythryll.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=922"}],"ver
sion-history":[{"count":28,"href":"https:\/\/www.mythryll.com\/index.php?rest_route=\/wp\/v2\/posts\/922\/revisions"}],"predecessor-version":[{"id":998,"href":"https:\/\/www.mythryll.com\/index.php?rest_route=\/wp\/v2\/posts\/922\/revisions\/998"}],"wp:attachment":[{"href":"https:\/\/www.mythryll.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=922"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.mythryll.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=922"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.mythryll.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=922"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}