Home Assistant - 'ESXi Server and Devices' Dashboard Breakdown

Hello hello hello and welcome back to my SWAKES blog! In today's breakdown post I'll be going through the setup and components used to monitor and manage my ESXi server as well as a few other devices I have at home. Before we delve into the nitty gritty, let me quickly go over the bits I'm monitoring/managing on this dashboard.

  • Dell PowerEdge T330 Server - This is where HA lives amongst other virtual machines on ESXi.
  • APC Smart-UPS - A much needed piece of kit, I've luckily got this hooked up to the Dell server as well as my data/network cabinet.
  • ESXi - Where all the magic happens, I've got around 8-10 VMs running which I aim to monitor on this dashboard.
  • Office Workstation - A way to easily lock, restart or shutdown my desktop workstation as well as showing some additional stats (got to love those stats!)
  • Pi-KVM - Currently running a Raspberry Pi 4 hooked up to my pfSense router in the data cabinet. This allows me to remotely manage the router/host without having to connect and trail keyboard, mouse or display cables!

With that out of the way, let's list the components used on this dashboard to pull these stats and manage the devices ...

  • 'ESXi Stats' (HACS Integration) - This integration provides me with stats for each virtual machine in ESXi such as CPU, RAM, Disk space and others. Each VM is a single sensor entity with all the stats as attributes so I've had to do some templating to pull and separate these out which I'll go into later.
  • 'Network UPS Tools' (Supervisor Addon) - This addon gives me info around my APC Smart UPS SMT1000.
  • 'RPI System Sensors' - A handy little thing I found on someone's Reddit post a few weeks back, this service runs on a Raspberry Pi and relays stats to HA via MQTT.  
  • 'HASS Workstation Service' - Another handy service, this runs on my desktop PC and, you've guessed right, provides stats and controls for Windows 10.
  • 'Linux Service Monitoring' - I've got some custom 'command_line' entities setup to SSH into certain VMs and check the status of key services such as Docker, NGINX, Zabbix/OSSEC agents plus more.

In terms of layout and functionality, I've used some clever cards and coding to display and access all these wonderful stats using the following:

  • 'custom:button-card' - One of my favourite and most used custom cards, nearly everything on this dashboard is using a 'custom:button-card'. Mainly due to the fact they are easily customisable for sizing etc but also provide some extra functionality (we will go balls deep into this later).
  • 'custom:mini-graph-card' - Used to display stats for CPU, RAM, Uptime and UPS charge. Another favourite custom card used throughout my HA setup.
  • 'Template Sensors' - I've had to painstakingly modify some of the values for certain stats as they weren't what I wanted. For example, the value for the 'Uptime' entity was producing '2021-03-24T13:21:11' which isn't pretty so I've managed to swap this to hours instead (a lot more readable!). Also as mentioned above, the VM stats are all in one sensor (as attributes) so I've had to use a Template sensor again to split these attributes out into their own sensors.
  • 'command_line' Entities - These require some extra magic in the form of SSH key copying/sharing but they ultimately run commands remotely from HA to each VM to check up on active/key services.

Now we've got that out of the way let's start to break this shit down!

Dell PowerEdge T330

This panel is made up of two parts, firstly the graphs which use the 'custom:mini-graph-card' cards. The three stats I've ended up displaying are the overall CPU and RAM usage as well as Uptime (in hours). The stats are pulled into Home Assistant via the 'ESXi Stats' component. I won't go into how to set this up as it's relatively easy but I will show you what to do next once it's set up and integrated.

As mentioned above, each VM and the overall ESXi host is setup as a single sensor, with all the stats under that as an attribute.

In order to separate these attributes into individual sensors, you'll need to perform some magic in the form of a template sensor.

platform: template
sensors:
  esxi_stats_cpu:
    friendly_name: "ESXi CPU Usage"
    unit_of_measurement: 'GHz'
    value_template: "{{ state_attr('sensor.esxi_vmhost_localhost_localdomain', 'cpuusage_ghz') }}"

The template sensor shown above essentially creates a new sensor for 'cpuusage_ghz' and once created, saved and HA restarted, will produce 'sensor.esxi_stats_cpu'.

Fucking A! The next step is to repeat the process for each attribute you wish to separate and display in HA. In my case I've repeated this for the RAM, Uptime and number of VMs. Whilst we're here, it may be worth noting that certain values from these attributes may not be to your liking. For example, Uptime. The value for this shows the number of hours the server has been up and ideally would be better displayed as days instead. Don't worry, some more magic is needed to correct this yet again in the form of a template sensor.

platform: template
sensors:
  esxi_uptime_days:
    friendly_name: "ESXi Uptime Days"
    unit_of_measurement: 'days'
    value_template: "{{ (states('sensor.esxi_stats_uptime') | float / 24) | round(0) }}"

Another handy template sensor later and we've managed to convert the number of hours into days! For more details on how to convert different time and date values, check out this thread on the Home Assistant Community forum as I sure can't be arsed to go through them all!
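
To give you an idea of what 'repeating the process' looks like, here's a rough sketch of the extra template sensors for RAM, Uptime and the number of VMs. Bear in mind the attribute names ('memusage_gb', 'uptime_hours' and 'vms') and the new sensor names are assumptions about what 'ESXi Stats' exposes, so check the attributes on your own host sensor under Developer Tools > States before copying anything.

```yaml
# Sketch only: the attribute names below are assumptions, check them against your own host sensor.
platform: template
sensors:
  esxi_stats_ram:
    friendly_name: "ESXi RAM Usage"
    unit_of_measurement: 'GB'
    value_template: "{{ state_attr('sensor.esxi_vmhost_localhost_localdomain', 'memusage_gb') }}"
  esxi_stats_uptime:
    friendly_name: "ESXi Uptime"
    unit_of_measurement: 'hours'
    value_template: "{{ state_attr('sensor.esxi_vmhost_localhost_localdomain', 'uptime_hours') }}"
  esxi_stats_vms:
    friendly_name: "ESXi VMs"
    value_template: "{{ state_attr('sensor.esxi_vmhost_localhost_localdomain', 'vms') }}"
```

With something like this in place, 'sensor.esxi_stats_uptime' is the hours value that the days conversion above divides by 24.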

Next on the list are the three boxes showing the online/offline status for iDRAC and ESXi, plus the number of VMs in ESXi. The first two boxes are ultimately 'ping' checks against the web portal URLs for iDRAC and ESXi which I've previously written about in this post. The last box is using a template sensor again to pull the number of VMs from the ESXi sensor entity.

The way I've created the two 'ping' check boxes for iDRAC and ESXi is shown below. Using the ever so handy 'custom:button-card', I've managed to also include a green/red bar underneath the name to display whether the URL is online or offline. If they are online, the bar displays green; if offline, the bar turns red and an opacity filter is applied to 'grey' out the card entirely.

type: 'custom:button-card'
entity: sensor.esxi_online
entity_picture: /local/esxilogo.png
show_icon: false
show_entity_picture: true
name: ESXi
state:
  - styles:
      card:
        - height: 120px
      entity_picture:
        - width: 40%
        - opacity: 0.2
      name:
        - padding-bottom: 10px
        - font-size: 15px
        - text-overflow: unset
        - white-space: unset
        - word-break: break-word
        - opacity: 0.2
      custom_fields:
        notification:
          - background-color: |
              [[[
                if (states['sensor.esxi_online'].state == 'Offline')
                  return "red";
                return "#AD5C5C";
              ]]]
          - border-radius: 50%
          - position: absolute
          - left: 10%
          - top: 93%
          - opacity: 0.5
          - height: 3px
          - width: 80%
    custom_fields:
      notification: |
        [[[
          return `<ha-icon
            icon="mdi:robot"
            style="display: block; width: 1px; height: 1px; color: white; margin: auto; position: relative;">
            </ha-icon>`
        ]]]         
    value: Offline
  - color: white
    operator: default
    styles:
      card:
        - height: 100px
      entity_picture:
        - width: 25%
      name:
        - padding-bottom: 10px
        - font-size: 15px
        - text-overflow: unset
        - white-space: unset
        - word-break: break-word
      custom_fields:
        notification:
          - background-color: |
              [[[
                if (states['sensor.esxi_online'].state == 'Online')
                  return "green";
                return "#AD5C5C";
              ]]]
          - border-radius: 50%
          - position: absolute
          - left: 10%
          - top: 93%
          - opacity: 0.5
          - height: 3px
          - width: 80%
    custom_fields:
      notification: |
        [[[
          return `<ha-icon
            icon="mdi:robot"
            style="display: block; width: 1px; height: 1px; color: white; margin: auto; position: relative;">
            </ha-icon>`
        ]]]         
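
The card above relies on 'sensor.esxi_online' reporting either 'Online' or 'Offline'. The proper write-up lives in the post mentioned earlier, but as a rough sketch (and not necessarily identical to that setup), one possible way to get such a sensor is a 'command_line' sensor that uses curl against the web UI. The IP address, the '/ui/' path and the 60 second interval are placeholders I've assumed for illustration.

```yaml
# Sketch only: the URL and interval are placeholders, swap in your own ESXi/iDRAC address.
- platform: command_line
  name: ESXi Online
  # -k skips certificate checks (self-signed certs), --max-time stops a dead host hanging the check
  command: "curl -k -s -o /dev/null --max-time 5 https://192.168.12.10/ui/ && echo Online || echo Offline"
  scan_interval: 60
```

A sensor named 'ESXi Online' ends up as 'sensor.esxi_online', which is the entity the card is looking at.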

APC Smart-UPS

To be honest I'll quickly skim over this as there isn't anything special (or not already mentioned above) that I've used to create this card. The stats are all gathered from the 'Network UPS Tools' addon (found in Supervisor Addons) and put into 'custom:button-card' and 'custom:mini-graph-card' cards. Simples.
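
For anyone wanting a starting point, here's a minimal sketch of a UPS charge graph. The entity name 'sensor.ups_battery_charge' is an assumption (yours will depend on what you've called the UPS in NUT), and the display options are just one reasonable set rather than what's on my dashboard.

```yaml
# Sketch only: 'sensor.ups_battery_charge' is an assumed entity name.
type: 'custom:mini-graph-card'
name: UPS Charge
entities:
  - entity: sensor.ups_battery_charge
hours_to_show: 24
points_per_hour: 2
line_width: 3
show:
  fill: true
```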

ESXi Virtual Machine

Now onto the middle column, this shows a selection button panel at the top and the virtual machine details beneath. When I initially started to knock up this dashboard, I soon found that attempting to display 8 virtual machines on one page became very busy and left little room for anything else. To combat this, I copied an element I had already created for my 'Floor Plan' dashboard where you are able to easily select a button at the top (using 'input_select') to show the relevant card underneath.
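
The select-and-show idea can be sketched roughly as below: an 'input_select' helper holds the currently chosen VM, a button sets it and a 'conditional' card only appears when its VM is selected. The helper name 'esxi_vm_view', the option names and the entities inside the conditional card are all made up for illustration, so treat this as a pattern rather than my exact config.

```yaml
# Sketch only: helper, option and entity names are hypothetical.
# Part 1 (configuration.yaml): the helper that remembers which VM is selected
input_select:
  esxi_vm_view:
    name: ESXi VM View
    options:
      - Web Server
      - Docker Host
---
# Part 2 (Lovelace): one selector button plus the card it reveals
type: vertical-stack
cards:
  - type: 'custom:button-card'
    name: Web Server
    tap_action:
      action: call-service
      service: input_select.select_option
      service_data:
        entity_id: input_select.esxi_vm_view
        option: Web Server
  - type: conditional
    conditions:
      - entity: input_select.esxi_vm_view
        state: Web Server
    card:
      type: entities
      entities:
        - sensor.web_server_package_updates
        - sensor.web_server_security_updates
```

Repeat the button and conditional card pair for each VM and only one set of details is ever on screen at a time.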

The first two on the list are gauges for CPU and RAM. As you can probably guess, these as well as all the other cards are 'custom:button-card' elements. Like before, the stats are pulled via 'ESXi Stats' and then separated using a template sensor. The same goes for the 'LAN Address' and 'System Uptime' stats/cards.
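
The per-VM version of the template sensor looks much the same as the host one. In this sketch the source entity 'sensor.esxi_vm_web_server' and the attribute names 'cpu_use_pct' and 'guest_ip' are assumptions, so check what your own VM sensors actually expose before using it.

```yaml
# Sketch only: entity and attribute names are assumptions.
platform: template
sensors:
  web_server_cpu:
    friendly_name: "Web Server CPU"
    unit_of_measurement: '%'
    value_template: "{{ state_attr('sensor.esxi_vm_web_server', 'cpu_use_pct') }}"
  web_server_lan_address:
    friendly_name: "Web Server LAN Address"
    value_template: "{{ state_attr('sensor.esxi_vm_web_server', 'guest_ip') }}"
```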

To display the 'OS' and 'Security' updates, we'll need to get these via a 'command_line' entity, which is essentially done via an SSH command. Before creating the 'command_line' entity, you'll first need to set up 'SSH keys' between the remote host and HA in order for them to communicate without any 'password entering' intervention. I'm lazy and won't document how to do it here but check out this post to get the ball rolling. Once that is done, we can now create the 'command_line' entities.

- platform: command_line
  name: Web Server Package Updates
  command: ssh -i /config/.ssh/id_rsa -o 'StrictHostKeyChecking=no' user@192.168.12.34 -T /usr/lib/update-notifier/apt-check 2>&1 | cut -d ';' -f 1
  
- platform: command_line
  name: Web Server Security Updates
  command: ssh -i /config/.ssh/id_rsa -o 'StrictHostKeyChecking=no' user@192.168.12.34 -T /usr/lib/update-notifier/apt-check 2>&1 | cut -d ';' -f 2

Once created, saved and HA restarted, you'll now have two new fancy sensors displaying the OS and Security updates available on the remote host.
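
One thing to be aware of: newer Home Assistant releases (2023.6 onwards, as far as I'm aware) moved 'command_line' to its own top-level key, so the 'platform:' style above may need converting. A rough sketch of the same package updates sensor in that newer layout is below. The hourly 'scan_interval' is my own assumption to avoid hammering the VM over SSH.

```yaml
# Sketch only: same command as above, rewritten for the newer top-level 'command_line' layout.
command_line:
  - sensor:
      name: Web Server Package Updates
      # The folded scalar (>-) joins these lines with a space and keeps quotes, pipes and semicolons intact
      command: >-
        ssh -i /config/.ssh/id_rsa -o 'StrictHostKeyChecking=no' user@192.168.12.34
        -T /usr/lib/update-notifier/apt-check 2>&1 | cut -d ';' -f 1
      scan_interval: 3600
```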

The last part of this card shows the status of the key/critical services currently running on the virtual machine. You've got a few ways of doing this depending on the service in question. Firstly, you could go down the 'ping check' route as mentioned previously (and in this blog post) if the service has some sort of web/portal UI. Secondly, if the service is hosted and running in Docker, you could use 'Monitor Docker' which I've covered in my 'System Dashboard Breakdown' post (found here). Or finally, if the two above are not applicable, use a 'command_line' entity to run an SSH command and check the status of the desired service.

- platform: command_line
  name: Web Server Zabbix Service Check
  command: ssh -i /config/.ssh/id_rsa -o 'StrictHostKeyChecking=no' user@192.168.12.34 -t systemctl show -p ActiveState zabbix-agent | sed 's/ActiveState=//g'

- platform: command_line
  name: Web Server Docker Service Check
  command: ssh -i /config/.ssh/id_rsa -o 'StrictHostKeyChecking=no' user@192.168.12.34 -t systemctl show -p ActiveState docker | sed 's/ActiveState=//g'

- platform: command_line
  name: Web Server Webmin Service Check
  command: ssh -i /config/.ssh/id_rsa -o 'StrictHostKeyChecking=no' user@192.168.12.34 -t systemctl show -p ActiveState webmin | sed 's/ActiveState=//g'

The examples above are to check the service status for Zabbix agent, Docker and Webmin which are running on my Ubuntu 'Web Server' virtual machine.

Similar to the 'custom:button-card' elements mentioned under Dell PowerEdge T330, these also have the green/red status bar underneath the names and 'grey' out to show whether they are online or offline.
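
As these sensors report the systemd state (for example 'active') rather than 'Online'/'Offline', the card has to match on that value instead. Below is a cut-down sketch of the idea using a simple coloured border in place of the full 'custom_fields' bar. The styling is simplified on purpose and isn't a copy of my actual card.

```yaml
# Sketch only: simplified styling, matching on the 'active' state returned by systemctl.
type: 'custom:button-card'
entity: sensor.web_server_zabbix_service_check
name: Zabbix Agent
icon: 'mdi:server'
show_state: true
state:
  - value: active
    styles:
      card:
        - border-bottom: 3px solid green
  - operator: default
    styles:
      card:
        - border-bottom: 3px solid red
        - opacity: 0.4
```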

Windows Desktop

Not hard to figure out what host/device this is, the Windows Desktop card is comprised of both 'custom:mini-graph-card' and 'custom:button-card'. The method of getting these stats into HA is by installing the 'HASS Workstation Service'. This handy piece of software runs in the background on your Windows machine and uses MQTT to provide HA with stats and also provides the ability to run commands as well. As you can see, the top row consists of CPU, RAM and Uptime info which is being displayed in a 'custom:mini-graph-card'. Below that shows the currently active window and just below that, three buttons to either Lock, Restart or Shutdown the PC. Again, I won't go into how to install/integrate this into HA as it's fairly simple, all you need is to have an MQTT instance set up and you're on your way! Once you've entered the MQTT details and added some sensors or commands, you'll soon see the newly created sensors in HA under Configuration > Integrations > MQTT.
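
For the control buttons, something along these lines should do the trick. This is a sketch that assumes the Shutdown command shows up in HA as a switch called 'switch.workstation_shutdown' (a made-up name, yours will match whatever you named the command in the service). I've added a confirmation prompt as it's a bit too easy to fat-finger a shutdown on a touchscreen.

```yaml
# Sketch only: 'switch.workstation_shutdown' is a hypothetical entity name.
type: 'custom:button-card'
name: Shutdown
icon: 'mdi:power'
tap_action:
  action: call-service
  service: switch.turn_on
  service_data:
    entity_id: switch.workstation_shutdown
  confirmation:
    text: Shut down the workstation?
```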

Pi-KVM

So last on our list is the 'Pi-KVM' card. As you may have guessed from the title, this device is running on a Raspberry Pi and the method of getting these stats is using a service called 'RPI System Sensors'. This is a small script that runs in the background on the Raspberry Pi and sends the details over via MQTT. Like everything on this dashboard, it uses 'custom:mini-graph-card' and 'custom:button-card' elements to display the relevant details. Once set up, you'll then get all these juicy stats under Configuration > Integrations > MQTT.

So that's all for today folks! If you've got any questions or queries, please feel free to drop me a message on Reddit or Facebook.

Also, if you like what you've read and want to support me in knocking up more posts, please share or 'Buy me a coffee' below!

Buy Me A Coffee