paritybit.ca

OpenBSD Server Monitoring

Prometheus, coupled with Grafana, makes a solid solution for monitoring a fleet of servers. Resource usage is low and both are easy to configure.

This guide is tailored to OpenBSD, but the steps aren’t that different on other operating systems.

Server

Prometheus and Grafana will both be running on one server. Install the prometheus, node_exporter, and grafana packages and enable all of their daemons.
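On OpenBSD that amounts to something like the following (run as root; package names match the ports tree at the time of writing):

```shell
# Install the monitoring stack and enable the daemons so they start at boot.
pkg_add prometheus node_exporter grafana
rcctl enable prometheus node_exporter grafana
```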

Prometheus

Configure the Prometheus daemon by editing /etc/prometheus/prometheus.yml:

global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
        labels:
          group: "monitoring"
  - job_name: "node"
    static_configs:
      - targets: ["localhost:9100"]
        labels:
          name: "prometheus"

This is set up to display the Prometheus web interface at localhost:9090 and collect data from the server that Prometheus is running on via a node_exporter on localhost:9100.

Once this is done, start the node_exporter and prometheus daemons and check /var/log/messages for any errors. The Prometheus web interface should then be available at localhost:9090 with some collected metrics.
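Concretely, that looks something like this (the /-/healthy endpoint is Prometheus's built-in liveness check, so it's a quick way to confirm the daemon is up):

```shell
# Start the daemons and check that Prometheus came up cleanly (run as root).
rcctl start node_exporter prometheus
tail /var/log/messages                    # watch for startup errors
curl -s http://localhost:9090/-/healthy   # should report that Prometheus is healthy
```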

Grafana

Simply start the grafana daemon and navigate to localhost:3000 in your web browser. Log in as admin:admin and change your password.

Then, go to Configuration -> Data Sources and add a new data source. Select Prometheus and enter http://localhost:9090 as the URL. Click "Save & test" and Grafana will now be able to display the metrics collected by your Prometheus server.
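As an alternative to clicking through the UI, Grafana can also provision the data source from a file at startup; a minimal sketch (the provisioning path is an assumption and may differ on OpenBSD):

```yaml
# /etc/grafana/provisioning/datasources/prometheus.yml (path is an assumption)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
```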

Now you are free to create dashboards/panels that contain your data. The customization options are extensive; there are many guides on building dashboards, plus a large library of community-made ones here: Grafana.com - Dashboards.

Alertmanager

[I have to figure out Alertmanager later]

Client

On any client that you want to monitor, install node_exporter, enable and start it, and make sure it’s available over port 9100. Then, in your Prometheus configuration, add another targets section or another target to the targets array:

      - targets: ["example.com:9100"]
        labels:
          group: "websites"

Once you have done this, restart the prometheus daemon to load the new configuration.
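Concretely, that looks something like this (promtool ships with the prometheus package and will catch YAML mistakes before you restart):

```shell
# On the client (run as root): install and start the exporter.
pkg_add node_exporter
rcctl enable node_exporter
rcctl start node_exporter

# On the monitoring server: validate the edited config, then reload it.
promtool check config /etc/prometheus/prometheus.yml
rcctl restart prometheus
```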

Login Alerts

Although Prometheus has the ability to collect various metrics, it’s not really a solution for log monitoring. I wanted something that would notify me of any successful login to my servers via SSH, so I wrote the following script:

#!/bin/sh

# Watches SSH's auth log for successful logins and sends an email on login

email="jbauer@paritybit.ca"

case "$(uname -s)" in
"OpenBSD")
    tail_flags="-f"
    logfile="/var/log/authlog";;
*)
    # Linux, Free/Net/DragonflyBSD
    tail_flags="-F"
    logfile="/var/log/auth.log";;
esac

tail $tail_flags "$logfile" | while read -r line; do
    if echo "$line" | grep -q "sshd.*: Accepted"; then
        subject=$(echo "$line" | awk '{print "ALERT: Login to "$9"@"$4" from "$11" on "$1" "$2" at "$3" EOM"}')
        echo "" | mail -s "$subject" "$email"
    fi
done

This is an efficient solution (compared to the ones below) which analyzes each line of the logfile as it is written and sends an email whenever it detects a login. It should also be fairly portable thanks to the small amount of OS detection code at the top. It does have a flaw: when the daemon is restarted, it will alert on log lines that have already been seen, because tail prints a few of the most recent lines from the file on startup. There are a few ways this could be addressed (keeping a memory of the last alert in a file, analyzing the date of the log line, etc.) but this is good enough for my needs, especially since my servers don't need to reboot often.
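Keeping a memory of the last alert in a file could be sketched like this; filter_replayed and the state-file path are hypothetical names I've introduced, not part of the script above:

```shell
#!/bin/sh
# Sketch: skip the lines tail replays after a restart by remembering the
# last line that triggered an alert in a state file.

# filter_replayed LAST: discard stdin lines up to and including LAST,
# then pass everything after it through; if LAST is empty, pass all lines.
filter_replayed() {
    last="$1"
    seen=0
    [ -z "$last" ] && seen=1
    while read -r line; do
        if [ "$seen" -eq 0 ]; then
            [ "$line" = "$last" ] && seen=1
            continue
        fi
        printf '%s\n' "$line"
    done
}

# In the daemon this would sit between tail and the alert loop, e.g.:
#   tail -f "$logfile" | filter_replayed "$(cat "$state" 2>/dev/null)" | ...
# with each alerted line written back to "$state" afterwards.
```

One caveat: if the log rolled over since the state file was written, the remembered line never appears and everything gets discarded, so a real implementation would want a fallback.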

This is run like so from cron:

@reboot tmux new-session -d '/usr/local/bin/authalert'

(I should write an rc.d file for this so it can be a daemon.)
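A minimal rc.d sketch for that might look like the following (untested; assumes the script is installed as /usr/local/bin/authalert):

```shell
#!/bin/ksh
#
# /etc/rc.d/authalert -- rc.d wrapper so authalert runs as a daemon

daemon="/usr/local/bin/authalert"

. /etc/rc.d/rc.subr

# The script doesn't daemonize itself, so have rc.subr background it.
rc_bg=YES
rc_reload=NO

rc_cmd $1
```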

Here is a different script I also came up with:

#!/bin/sh

# Watches /var/log/authlog for successful logins and sends an email

email="jbauer@paritybit.ca"

count=$(grep -c -e "sshd.*: Accepted" /var/log/authlog)

while true; do
    lines="$(grep -e "sshd.*: Accepted" /var/log/authlog)"
    if [ -n "$lines" ]; then
        newcount=$(echo "$lines" | wc -l | awk '{print $1}')
    else
        newcount=0
    fi
    if [ "$newcount" -ne "$count" ]; then
        difference=$((newcount - count))
        if [ "$difference" -lt 0 ]; then
            # Log rolled over
            count=0
            continue
        fi
        for i in $(seq "$difference"); do
            line=$(echo "$lines" | tail -n "$i" | head -n 1)
            subject=$(echo "$line" | awk '{print "ALERT: Login to "$9"@"$4" from "$11" on "$1" "$2" at "$3" EOM"}')
            echo "" | mail -s "$subject" "$email"
        done
    fi
    count=$newcount
    sleep 1
done

This solution uses more RAM and CPU than the other iterations, but it alerts on every login without repeats, and the absolute resource usage is still small (I measured roughly a couple hundred kilobytes for the log data and 0.5% CPU every second). It also doesn't require detecting the OS for tail flags, though the logfile name would still need to change depending on the OS. It remains the least efficient approach, which is why I ultimately went with the first script.

Prior to the above solution, I stumbled upon fwa for OpenBSD which is able to watch files for changes. I coupled that program with the following script:

#!/bin/sh

# Watches /var/log/authlog for successful logins and sends an email

email="mail@example.com"

/usr/local/bin/fwa /var/log/authlog | while read -r discard; do
    lastline="$(tail -3 /var/log/authlog | grep -e "sshd.*: Accepted" | tail -1)"
    if [ -n "$lastline" ]; then
        subject=$(echo "$lastline" | awk '{print "ALERT: Login to "$9"@"$4" from "$11" on "$1" "$2" at "$3" EOM"}')
        echo "" | mail -s "$subject" "$email"
    fi
done

This is the least portable of the three solutions since fwa is specific to OpenBSD. Also, it reads the last three lines of authlog instead of just the last one because I found that it wouldn't alert on very quick operations, like when a user would scp a file: the connection would open and close so quickly that the log would already contain the "Disconnected" message instead of the "Accepted" message by the time the log was tailed. It is also slightly flawed in that it will repeat an alert if you, for example, ssh into the machine in two different terminals and then exit one of those sessions.