Enabling concurrent `ghe-backup` tasks from separate backup hosts #441

taz · 2018-09-25T02:32:52Z

Consider a scenario where two backup hosts exist:

One for regular snapshots, situated in the same datacenter as the appliance.
One for taking less frequent off-site backups (where copying the snapshots from the host above is not viable).

There's currently no check in `ghe-backup` that prevents simultaneous runs from different hosts. However, it looks like there's at least one place where a semaphore file placed on the appliance (to suspend repository maintenance) may be removed prematurely if the backup tasks overlap during specific phases: https://github.com/github/backup-utils/blob/master/share/github-backup-utils/ghe-gc-enable#L35. For example, if:

Backup A starts the repository backup phase and creates a semaphore file.
The background maintenance queue on GitHub Enterprise is suspended.
Backup B commences its backup and overwrites the same semaphore file.
Backup A completes its repository backup phase and removes the semaphore file.
The background maintenance queue on GitHub Enterprise starts draining.
Repository data is potentially modified by a maintenance task.
Backup B completes its repository backup phase; there is no semaphore file to remove.
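
The sequence above can be reproduced locally with a small simulation. This is only a sketch: a temporary directory stands in for the appliance, `gc-disabled` is a hypothetical name for the semaphore file, and the helper functions are made up for illustration rather than taken from backup-utils.

```shell
# Local simulation only; nothing here touches a real appliance.
appliance_dir=$(mktemp -d)
semaphore="$appliance_dir/gc-disabled"   # hypothetical semaphore file name

# Each backup run suspends maintenance by (over)writing the same file.
start_repo_phase() { echo "$1" > "$semaphore"; }

# Each backup run removes the file unconditionally when its own phase ends.
finish_repo_phase() { rm -f "$semaphore"; }

# Maintenance is considered suspended only while the file exists.
maintenance_state() {
  if [ -e "$semaphore" ]; then echo "suspended"; else echo "running"; fi
}

start_repo_phase "backup-A"   # A creates the semaphore file
start_repo_phase "backup-B"   # B overwrites the same file
finish_repo_phase             # A finishes first and removes it
state_while_b_running=$(maintenance_state)
echo "maintenance while B is still copying: $state_while_b_running"
finish_repo_phase             # B finishes; there is nothing left to remove

rm -rf "$appliance_dir"
```

In this simulation the maintenance queue is reported as running while backup B is still in its repository phase, which is the overlap described in the steps.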

Is this the only place where such a condition exists? If so, would it be possible to keep a count of the active backup tasks instead and remove the file only when it reaches 0? Or are there other considerations or complexities that would make this approach unpredictable or unreliable?

lildude · 2018-09-27T17:41:32Z

> Is this the only place where such a condition exists? If so, would it be possible to keep a count of the active backup tasks instead and remove the file only when it reaches 0?

This is an interesting idea, and off the top of my head, I think this is the only location where we're putting in place any sort of locking/semaphore on the appliance side of things. If I'm remembering correctly, keeping a count is probably a good solution.
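
A minimal sketch of the counting idea, assuming the count lives in a file next to where the semaphore would be. The file names, the `mkdir`-based lock, and the helper functions are all hypothetical and only simulated in a local temporary directory; they are not part of backup-utils.

```shell
# Local simulation only; file and function names are hypothetical.
appliance_dir=$(mktemp -d)
counter="$appliance_dir/gc-disabled.count"
lock="$appliance_dir/gc-disabled.lock"

# mkdir is atomic, so a lock directory serialises updates to the counter.
with_lock() {
  until mkdir "$lock" 2>/dev/null; do sleep 1; done
  "$@"
  rmdir "$lock"
}

read_count() {
  if [ -f "$counter" ]; then cat "$counter"; else echo 0; fi
}

# A backup entering its repository phase increments the count.
acquire() {
  n=$(read_count)
  echo $((n + 1)) > "$counter"
}

# A backup leaving the phase decrements it; the file goes away at zero.
release() {
  n=$(read_count)
  if [ "$n" -le 1 ]; then
    rm -f "$counter"
  else
    echo $((n - 1)) > "$counter"
  fi
}

maintenance_state() {
  if [ -e "$counter" ]; then echo "suspended"; else echo "running"; fi
}

with_lock acquire                    # backup A starts
with_lock acquire                    # backup B starts
with_lock release                    # backup A finishes
state_after_a=$(maintenance_state)   # B still holds a count
with_lock release                    # backup B finishes
state_after_b=$(maintenance_state)
echo "after A: $state_after_a, after B: $state_after_b"

rm -rf "$appliance_dir"
```

Here maintenance stays suspended until the last backup releases its count. The lock matters because two hosts updating the file at the same moment could otherwise lose an increment.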

kathodos · 2019-05-24T07:07:04Z

> Or are there other considerations or complexities that would make this approach unpredictable or unreliable?

A counter may not be sufficient. For example, if a backup were interrupted and the counter was never decremented, the appliance would continue to believe that a backup was running.

Adding the hostname (or some other identifying details) may help to detect collisions and resolve them more logically. For example, in the above case, if the interrupted backup was rerun, it could "add or replace" its hostname in the list, then know to "remove" its hostname when complete.

Extending this to hostname+Backup PID would also account for accidental overlap of multiple backup runs from the same host.

As a side effect, if the semaphore file was left behind, it would identify the backup host that did so.
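
One way to picture this is a directory holding one entry per backup run, rather than a single file or a bare count. The following is only a sketch against a local temporary directory; the directory name, the host names, and the helper functions are hypothetical. Entries are keyed by hostname here, and the same layout would work with a `hostname.PID` key.

```shell
# Local simulation only; names below are hypothetical.
appliance_dir=$(mktemp -d)
holders="$appliance_dir/gc-disabled.d"
mkdir -p "$holders"

# "Add or replace": registering the same id again overwrites its entry.
register() { date -u +%Y-%m-%dT%H:%M:%SZ > "$holders/$1"; }

# "Remove": a backup deletes only its own entry when it completes.
unregister() { rm -f "$holders/$1"; }

active_holders() { ls -A "$holders"; }

# Maintenance stays suspended while any entry remains.
maintenance_state() {
  if [ -n "$(active_holders)" ]; then echo "suspended"; else echo "running"; fi
}

register "onsite-host"
register "offsite-host"
unregister "onsite-host"              # on-site backup completes normally
# The off-site backup is interrupted here and never unregisters.
state_with_stale=$(maintenance_state)
leftover=$(active_holders)            # names the host that left an entry behind
echo "state: $state_with_stale, left behind by: $leftover"

register "offsite-host"               # rerun replaces the stale entry
holders_after_rerun=$(active_holders) # still a single entry, not two
unregister "offsite-host"             # rerun completes and cleans up
state_final=$(maintenance_state)
echo "final state: $state_final"

rm -rf "$appliance_dir"
```

Unlike a bare counter, the stale entry is attributable to a specific host, and a rerun from that host clears it rather than adding to it.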

taz added enhancement question backup labels Sep 25, 2018
