How To Setup Automated Backups For Plausible

Published: 11.09.2022

Are you interested in setting up automated backups for Plausible Analytics? In this post, I will show you why and how to do it. The setup is based on the self-hosted Docker version that I explained in an earlier post.

Basics
Automated Backups for Plausible
Recover the data from Backups
Conclusion

Basics

The foundation for this post is the Plausible setup in docker from this post here. In addition to the containers there, we need to create a companion container for the PostgreSQL database and configure the Clickhouse container to take care of the backups.

But why should you back up your statistics data? Data loss can happen at any time, whether by accident or because someone deliberately deletes your data. With backups, you can recover from such a loss and, in most cases, go back a few states in time.

In the next section, we will set up the companion container and configure the data backup the way we want it.

Automated Backups for Plausible

Before we set up the databases and their backups, we create the container configurations that Plausible itself needs. To do so, we create a docker-compose.yml file in a directory of our choice. Inside the file, we configure the containers plausible and plausible_mail:

services:
    plausible_mail:
        container_name: plausible_mail
        image: bytemark/smtp
        restart: unless-stopped

    plausible:
        container_name: plausible
        image: plausible/analytics:latest
        restart: unless-stopped
        command: sh -c "sleep 10 && /entrypoint.sh db createdb && /entrypoint.sh db migrate && /entrypoint.sh db init-admin && /entrypoint.sh run"
        ports:
            - "8000:8000"
        environment: 
            ADMIN_USER_EMAIL: mail@programonaut.com
            ADMIN_USER_NAME: admin
            ADMIN_USER_PWD: admin
            BASE_URL: http://localhost:8000
            SECRET_KEY_BASE: <key>
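
The <key> placeholder for SECRET_KEY_BASE has to be replaced with a long random string. Here is a minimal sketch of one way to generate such a value on the host, assuming that /dev/urandom, head, base64 and tr are available. The function name generate_secret_key is hypothetical and not part of Plausible:

```shell
# Read 64 random bytes and encode them as one base64 line.
# Assumption: /dev/urandom, head, base64 and tr exist on the host.
generate_secret_key() {
    head -c 64 /dev/urandom | base64 | tr -d '\n'
}

SECRET_KEY_BASE="$(generate_secret_key)"
echo "$SECRET_KEY_BASE"
```

Paste the printed value into docker-compose.yml in place of <key>.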

After that, we create the PostgreSQL database, which stores the Plausible configuration such as users and websites, together with its automated backup. For this we create the plausible_db container and the pg-backup container. The latter is based on the postgres-backup-local image, which allows for automated, scheduled backups.

    plausible_db:
        container_name: plausible_db
        image: postgres:12
        restart: always
        volumes:
            - ./plausible/data:/var/lib/postgresql/data
            - ./plausible/backup:/var/lib/postgresql/backup
        environment:
            - POSTGRES_PASSWORD=postgres

    pg-backup:
        container_name: pg-backup
        image: prodrigestivill/postgres-backup-local
        restart: always
        volumes:
            - ./plausible/backup/postgres/:/backups/
        links:
            - plausible_db:plausible_db
        depends_on:
            - plausible_db
        environment:
            - POSTGRES_HOST=plausible_db
            - POSTGRES_DB=plausible_db
            - POSTGRES_USER=postgres
            - POSTGRES_PASSWORD=postgres
            - POSTGRES_EXTRA_OPTS=-Z9 --schema=public --blobs -a
            - SCHEDULE=@daily
            - BACKUP_KEEP_DAYS=14
            - BACKUP_KEEP_WEEKS=4
            - BACKUP_KEEP_MONTHS=6
            - HEALTHCHECK_PORT=81

The environment variables of the pg-backup container make it easy to configure the automated backups. With the current configuration, a backup is created once a day (SCHEDULE=@daily) and kept for 14 days (BACKUP_KEEP_DAYS). In addition to the daily backups, the image automatically generates weekly and monthly backups, which are kept for 4 weeks and 6 months respectively (BACKUP_KEEP_WEEKS and BACKUP_KEEP_MONTHS). You can find all available variables here.
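
A backup only helps if it can actually be read, so it is worth checking the files from time to time. The following sketch looks up the newest .sql.gz file below a backup directory and tests whether it is a valid gzip archive. The function names newest_backup and verify_newest_backup are hypothetical helpers and not part of the postgres-backup-local image, and the sketch assumes that file names contain no newlines:

```shell
# Print the most recently modified *.sql.gz file below directory $1.
# Prints nothing if no such file exists.
newest_backup() {
    find "$1" -type f -name '*.sql.gz' -exec ls -t {} + 2>/dev/null | head -n 1
}

# Check that the newest backup below $1 exists and is a valid gzip archive.
verify_newest_backup() {
    newest_file="$(newest_backup "$1")"
    if [ -z "$newest_file" ]; then
        echo "no backup found in $1" >&2
        return 1
    fi
    gzip -t "$newest_file" && echo "OK: $newest_file"
}

# Usage on the host, from the directory with the docker-compose.yml:
# verify_newest_backup ./plausible/backup/postgres
```

The check only confirms that the archive is intact, not that the SQL inside it restores cleanly. For that, follow the recovery steps later in this post on a test setup.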

Next up is the Clickhouse database. When setting up the system, I found that a companion container did not work well because of the way Clickhouse stores its data: Clickhouse keeps some “shadow” instances of the data that are not retrievable through the companion. Thus I decided to extend the Clickhouse server base image with some files and the backup functionality.

For that, I first got the entrypoint.sh script from the Clickhouse server repository and added the following two lines, which start the cron daemon, near the end of the script:

echo "Running crond"
crond -b -c /etc/crontabs

The final file looks like this:

#!/bin/bash

set -eo pipefail
shopt -s nullglob

DO_CHOWN=1
if [ "${CLICKHOUSE_DO_NOT_CHOWN:-0}" = "1" ]; then
    DO_CHOWN=0
fi

CLICKHOUSE_UID="${CLICKHOUSE_UID:-"$(id -u clickhouse)"}"
CLICKHOUSE_GID="${CLICKHOUSE_GID:-"$(id -g clickhouse)"}"

# support --user
if [ "$(id -u)" = "0" ]; then
    USER=$CLICKHOUSE_UID
    GROUP=$CLICKHOUSE_GID
    if command -v gosu &> /dev/null; then
        gosu="gosu $USER:$GROUP"
    elif command -v su-exec &> /dev/null; then
        gosu="su-exec $USER:$GROUP"
    else
        echo "No gosu/su-exec detected!"
        exit 1
    fi
else
    USER="$(id -u)"
    GROUP="$(id -g)"
    gosu=""
    DO_CHOWN=0
fi

# set some vars
CLICKHOUSE_CONFIG="${CLICKHOUSE_CONFIG:-/etc/clickhouse-server/config.xml}"

if ! $gosu test -f "$CLICKHOUSE_CONFIG" -a -r "$CLICKHOUSE_CONFIG"; then
    echo "Configuration file '$CLICKHOUSE_CONFIG' isn't readable by user with id '$USER'"
    exit 1
fi

# get CH directories locations
DATA_DIR="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=path || true)"
TMP_DIR="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=tmp_path || true)"
USER_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=user_files_path || true)"
LOG_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=logger.log || true)"
LOG_DIR=""
if [ -n "$LOG_PATH" ]; then LOG_DIR="$(dirname "$LOG_PATH")"; fi
ERROR_LOG_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=logger.errorlog || true)"
ERROR_LOG_DIR=""
if [ -n "$ERROR_LOG_PATH" ]; then ERROR_LOG_DIR="$(dirname "$ERROR_LOG_PATH")"; fi
FORMAT_SCHEMA_PATH="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=format_schema_path || true)"

CLICKHOUSE_USER="${CLICKHOUSE_USER:-default}"
CLICKHOUSE_PASSWORD="${CLICKHOUSE_PASSWORD:-}"
CLICKHOUSE_DB="${CLICKHOUSE_DB:-}"
CLICKHOUSE_ACCESS_MANAGEMENT="${CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT:-0}"

for dir in "$DATA_DIR" \
  "$ERROR_LOG_DIR" \
  "$LOG_DIR" \
  "$TMP_DIR" \
  "$USER_PATH" \
  "$FORMAT_SCHEMA_PATH"
do
    # check if variable not empty
    [ -z "$dir" ] && continue
    # ensure directories exist
    if ! mkdir -p "$dir"; then
        echo "Couldn't create necessary directory: $dir"
        exit 1
    fi

    if [ "$DO_CHOWN" = "1" ]; then
        # ensure proper directories permissions
        # but skip it if the directory already has proper permissions, because a recursive chown may be slow
        if [ "$(stat -c %u "$dir")" != "$USER" ] || [ "$(stat -c %g "$dir")" != "$GROUP" ]; then
            chown -R "$USER:$GROUP" "$dir"
        fi
    elif ! $gosu test -d "$dir" -a -w "$dir" -a -r "$dir"; then
        echo "Necessary directory '$dir' isn't accessible by user with id '$USER'"
        exit 1
    fi
done

# if clickhouse user is defined - create it (user "default" already exists out of box)
if [ -n "$CLICKHOUSE_USER" ] && [ "$CLICKHOUSE_USER" != "default" ] || [ -n "$CLICKHOUSE_PASSWORD" ]; then
    echo "$0: create new user '$CLICKHOUSE_USER' instead 'default'"
    cat <<EOT > /etc/clickhouse-server/users.d/default-user.xml
    <yandex>
      <!-- Docs: <https://clickhouse.tech/docs/en/operations/settings/settings_users/> -->
      <users>
        <!-- Remove default user -->
        <default remove="remove">
        </default>

        <${CLICKHOUSE_USER}>
          <profile>default</profile>
          <networks>
            <ip>::/0</ip>
          </networks>
          <password>${CLICKHOUSE_PASSWORD}</password>
          <quota>default</quota>
          <access_management>${CLICKHOUSE_ACCESS_MANAGEMENT}</access_management>
        </${CLICKHOUSE_USER}>
      </users>
    </yandex>
EOT
fi

if [ -n "$(ls /docker-entrypoint-initdb.d/)" ] || [ -n "$CLICKHOUSE_DB" ]; then
    # port is needed to check if clickhouse-server is ready for connections
    HTTP_PORT="$(clickhouse extract-from-config --config-file "$CLICKHOUSE_CONFIG" --key=http_port)"

    # Listen only on localhost until the initialization is done
    $gosu /usr/bin/clickhouse-server --config-file="$CLICKHOUSE_CONFIG" -- --listen_host=127.0.0.1 &
    pid="$!"

    # check if clickhouse is ready to accept connections
    # will try to send ping clickhouse via http_port (max 12 retries by default, with 1 sec timeout and 1 sec delay between retries)
    tries=${CLICKHOUSE_INIT_TIMEOUT:-12}
    while ! wget --spider -T 1 -q "http://127.0.0.1:$HTTP_PORT/ping" 2>/dev/null; do
        if [ "$tries" -le "0" ]; then
            echo >&2 'ClickHouse init process failed.'
            exit 1
        fi
        tries=$(( tries-1 ))
        sleep 1
    done

    clickhouseclient=( clickhouse-client --multiquery --host "127.0.0.1" -u "$CLICKHOUSE_USER" --password "$CLICKHOUSE_PASSWORD" )

    echo

    # create default database, if defined
    if [ -n "$CLICKHOUSE_DB" ]; then
        echo "$0: create database '$CLICKHOUSE_DB'"
        "${clickhouseclient[@]}" -q "CREATE DATABASE IF NOT EXISTS $CLICKHOUSE_DB";
    fi

    for f in /docker-entrypoint-initdb.d/*; do
        case "$f" in
            *.sh)
                if [ -x "$f" ]; then
                    echo "$0: running $f"
                    "$f"
                else
                    echo "$0: sourcing $f"
                    # shellcheck source=/dev/null
                    . "$f"
                fi
                ;;
            *.sql)    echo "$0: running $f"; "${clickhouseclient[@]}" < "$f" ; echo ;;
            *.sql.gz) echo "$0: running $f"; gunzip -c "$f" | "${clickhouseclient[@]}"; echo ;;
            *)        echo "$0: ignoring $f" ;;
        esac
        echo
    done

    if ! kill -s TERM "$pid" || ! wait "$pid"; then
        echo >&2 'Finishing of ClickHouse init process failed.'
        exit 1
    fi
fi

# start crond in the background so that the scheduled backups run;
# this has to happen before the exec calls below, because exec replaces this shell
echo "Running crond"
crond -b -c /etc/crontabs

# if no args passed to `docker run` or first argument start with `--`, then the user is passing clickhouse-server arguments
if [[ $# -lt 1 ]] || [[ "$1" == "--"* ]]; then
    exec $gosu /usr/bin/clickhouse-server --config-file="$CLICKHOUSE_CONFIG" "$@"
fi

# Otherwise, we assume the user want to run his own process, for example a `bash` shell to explore this image
exec "$@"

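This is required so that the cron jobs are run. In addition to that, I created a backup.sh script containing the following code:

#!/bin/bash
# Name the backup after the configured prefix and the current UTC time
BACKUP_NAME="${BACKUP_PRE}-$(date -u +%Y-%m-%dT%H-%M-%S)"
clickhouse-backup create "$BACKUP_NAME"
# Store the exit code right away, because the next command overwrites $?
exit_code=$?
if [[ $exit_code != 0 ]]; then
  echo "clickhouse-backup create $BACKUP_NAME FAILED and returned exit code $exit_code"
fi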

And a crontab file called cron that runs the backup.sh script every day at 1 am:

# min   hour    day     month   weekday command
0   1   *   *   *   sh /var/lib/backup.sh

To incorporate all these changes I created a Dockerfile:

FROM yandex/clickhouse-server:21.3.20.1-alpine

RUN apk add --no-cache busybox-suid

# Download clickhouse-backup, install the binary and clean up in the same layer
RUN wget https://github.com/AlexAkulov/clickhouse-backup/releases/download/v1.5.2/clickhouse-backup-linux-amd64.tar.gz \
    && tar -xzf clickhouse-backup-linux-amd64.tar.gz \
    && cp build/linux/amd64/clickhouse-backup /bin/clickhouse-backup \
    && rm -rf clickhouse-backup-linux-amd64.tar.gz build

COPY ./cron /etc/crontabs/root
COPY ./backup.sh /var/lib/backup.sh
COPY ./entrypoint.sh /entrypoint.sh

Note: The entrypoint.sh and backup.sh scripts have to be executable before you build the image. Running chmod +x on them is enough; chmod 777 is not needed.

This Dockerfile is then the base for the Clickhouse database image. To build it we will add the following container configuration to our docker-compose.yml:

    plausible_events_db:
        container_name: plausible_events_db
        build: ./
        image: clickhouse-server
        restart: unless-stopped
        volumes:
            - ./plausible/event-data:/var/lib/clickhouse
        environment:
            BACKUPS_TO_KEEP_LOCAL: 14
            BACKUP_PRE: plausible
        ulimits:
            nofile:
                soft: 262144
                hard: 262144

With this configuration of the cron file and the environment variable BACKUPS_TO_KEEP_LOCAL, we create one backup a day and keep the 14 most recent ones, which corresponds to 14 days.

The backups for Postgres can be found in ./plausible/backup/postgres and the backups for Clickhouse can be found in ./plausible/event-data/backup.

With this, we set up automated backups for plausible. Now let’s have a look at how to recover the data in case of a data loss!

Recover the data from Backups

In this section, we will have a look at how to recover the data of the two different databases.

PostgreSQL

docker stop plausible_db
docker rm plausible_db
rename the old data folder
docker compose up -d plausible_db
docker restart plausible
check in the browser if everything works as expected (no websites there)
docker exec -it plausible_db bash -c "zcat /var/lib/postgresql/backup/postgres/<backup-dir>/<backup-file>.sql.gz | psql --username=postgres --dbname=plausible_db -W"
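
The restore command reads the backup from /var/lib/postgresql/backup inside the container because ./plausible/backup is mounted there in the plausible_db configuration. The following sketch translates a backup path on the host into the matching path inside the container. The helper name container_backup_path and the file name in the example are hypothetical:

```shell
# Translate a host path below ./plausible/backup into the path that the
# plausible_db container sees (volume ./plausible/backup:/var/lib/postgresql/backup).
container_backup_path() {
    case "$1" in
        ./plausible/backup/*)
            echo "/var/lib/postgresql/backup/${1#./plausible/backup/}"
            ;;
        *)
            echo "not below ./plausible/backup: $1" >&2
            return 1
            ;;
    esac
}

# Example with a hypothetical file name:
container_backup_path "./plausible/backup/postgres/daily/example.sql.gz"
```

The printed path is what goes after zcat in the docker exec command above.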

Clickhouse

docker stop plausible_events_db
docker rm plausible_events_db
move the backup directory out of the event-data folder
rename the old event-data folder
docker compose up -d plausible_events_db
docker restart plausible
check in the browser if everything works as expected (websites there, but no data)
move the backup folder back into the new event-data folder
docker exec plausible_events_db bash -c "clickhouse-backup restore <backup-dir-name>"
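
The Clickhouse steps can also be collected in a small script. This is only a sketch under several assumptions: it is run from the directory that contains the docker-compose.yml, the temporary location ./plausible/event-data-backup and the helper names run and restore_clickhouse are hypothetical, and the manual browser check is left to you. By default it only prints the commands; set DRY_RUN=0 to execute them.

```shell
# Print a command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Replay the Clickhouse recovery steps for the backup named in $1.
restore_clickhouse() {
    backup_name="$1"
    data_dir="./plausible/event-data"
    parked_backups="./plausible/event-data-backup"  # hypothetical temporary location
    run docker stop plausible_events_db
    run docker rm plausible_events_db
    run mv "$data_dir/backup" "$parked_backups"
    run mv "$data_dir" "$data_dir.old"
    run docker compose up -d plausible_events_db
    run docker restart plausible
    # The manual browser check from the list above belongs here.
    run mv "$parked_backups" "$data_dir/backup"
    run docker exec plausible_events_db bash -c "clickhouse-backup restore $backup_name"
}

restore_clickhouse "<backup-dir-name>"
```

Depending on who owns the freshly created event-data folder, the mv commands may need elevated permissions on the host.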

With this, you can recover data after a data loss!

Conclusion

In this post, we created automated backups for plausible by creating a companion container and by updating the clickhouse instance required for plausible. We learned how to set up the backup creation and we also learned how to recover the data in case of a data loss!

I hope this post helped you set up automated backups and that it saves you from the struggle I had when I lost my data.

