Enabling concurrent `ghe-backup` tasks from separate backup hosts #441

Open
taz opened this issue Sep 25, 2018 · 3 comments

taz (Member) commented Sep 25, 2018

Consider a scenario where two backup hosts exist:

  • One for regular snapshots, situated in the same datacenter as the appliance.
  • One for taking less frequent off-site backups (where copying the snapshots from the host above is not viable).

There's currently no check in `ghe-backup` that prevents simultaneous execution from different hosts. However, it looks like there's at least one place where a semaphore file placed on the appliance (to suspend repository maintenance) may be removed prematurely if the backup tasks overlap during specific phases: https://github.com/github/backup-utils/blob/master/share/github-backup-utils/ghe-gc-enable#L35. For example, if:

  1. Backup A starts the repository backup phase and creates a semaphore file.
  2. The background maintenance queue on GitHub Enterprise is suspended.
  3. Backup B commences its backup and overwrites the same semaphore file.
  4. Backup A completes its repository backup phase and removes the semaphore file.
  5. The background maintenance queue on GitHub Enterprise starts draining.
  6. Repository data is potentially modified by a maintenance task.
  7. Backup B completes its repository backup phase, but there is no semaphore file left to remove.
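The sequence above can be replayed in a few lines. This is only a sketch under stated assumptions: the semaphore is modeled as a file in a temporary directory, the file name `gc-disabled.example` and the function `replay_overlap` are hypothetical, and "maintenance suspended" is reduced to "the file exists".

```python
import tempfile
from pathlib import Path


def replay_overlap(appliance_dir: Path) -> dict:
    """Replay steps 1-7 against one shared semaphore file.

    Returns whether maintenance would still be suspended at each point.
    """
    # Hypothetical file name; the actual path is whatever ghe-gc-enable uses.
    semaphore = appliance_dir / "gc-disabled.example"
    observed = {}

    # Steps 1-2: Backup A creates the file, so maintenance is suspended.
    semaphore.write_text("backup A\n")
    observed["after_a_starts"] = semaphore.exists()

    # Step 3: Backup B overwrites the same file; A's claim is lost.
    semaphore.write_text("backup B\n")
    observed["after_b_starts"] = semaphore.exists()

    # Steps 4-6: Backup A finishes and removes the file while B is still
    # copying repositories, so maintenance may resume underneath B.
    semaphore.unlink()
    observed["while_b_still_running"] = semaphore.exists()

    # Step 7: Backup B finishes and finds nothing to remove.
    semaphore.unlink(missing_ok=True)
    observed["after_b_finishes"] = semaphore.exists()
    return observed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for step, suspended in replay_overlap(Path(tmp)).items():
            print(f"{step}: maintenance suspended = {suspended}")
```

The interesting entry is `while_b_still_running`: the file is already gone even though Backup B has not finished.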

Is this the only place where such a condition exists? If so, would it be possible to keep a count of the active backup tasks instead and remove the file only when it reaches 0? Or are there other considerations / complexities to take into account which make the approach unpredictable or unreliable?
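As a rough illustration of the counting idea, the sketch below stores the number of active backups in the semaphore file itself and deletes the file only when the count drops to 0. Everything named here is an assumption for illustration (`backup-count.example`, `backup_started`, `backup_finished`); it is not how backup-utils works today. It also assumes a Unix host so that `fcntl.flock` can serialize the read-modify-write.

```python
import fcntl
from contextlib import contextmanager
from pathlib import Path

COUNTER_NAME = "backup-count.example"  # hypothetical semaphore/counter file


@contextmanager
def _locked(directory: Path):
    """Hold an exclusive lock so two backups cannot update the count at once."""
    with open(directory / (COUNTER_NAME + ".lock"), "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def _read_count(counter: Path) -> int:
    return int(counter.read_text()) if counter.exists() else 0


def backup_started(directory: Path) -> int:
    """Increment the count; the file's existence suspends maintenance."""
    counter = directory / COUNTER_NAME
    with _locked(directory):
        count = _read_count(counter) + 1
        counter.write_text(str(count))
        return count


def backup_finished(directory: Path) -> int:
    """Decrement the count and remove the file only when it reaches 0."""
    counter = directory / COUNTER_NAME
    with _locked(directory):
        remaining = max(_read_count(counter) - 1, 0)
        if remaining == 0:
            counter.unlink(missing_ok=True)
        else:
            counter.write_text(str(remaining))
        return remaining
```

With this shape, Backup A finishing while Backup B is still running leaves the file in place with a count of 1, and only the last backup to finish removes it.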

lildude (Member) commented Sep 27, 2018

> Is this the only place where such a condition exists? If so, would it be possible to keep a count of the active backup tasks instead and remove the file only when it reaches 0?

This is an interesting idea, and off the top of my head, I think this is the only location where we're putting in place any sort of locking/semaphore on the appliance side of things. If I'm remembering correctly, keeping a count is probably a good solution.

kathodos (Contributor) commented May 24, 2019

> Or are there other considerations / complexities to take into account which make the approach unpredictable or unreliable?

A counter may not be sufficient. For example, if a backup was interrupted and the counter wasn't decremented, the appliance would continue to believe that backup was still running.
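A small in-memory model makes the failure mode concrete. It is a sketch only: `ActiveBackupCounter` and `run_backup` are hypothetical names, and the `interrupted` flag stands in for anything that stops a backup before its cleanup runs (a killed process, a dropped connection).

```python
class ActiveBackupCounter:
    """Plain count of running backups; maintenance resumes only at 0."""

    def __init__(self) -> None:
        self.count = 0

    def start(self) -> None:
        self.count += 1

    def finish(self) -> None:
        self.count = max(self.count - 1, 0)

    @property
    def maintenance_suspended(self) -> bool:
        return self.count > 0


def run_backup(counter: ActiveBackupCounter, interrupted: bool = False) -> None:
    counter.start()
    if interrupted:
        # Simulated hard interruption: the decrement below never runs.
        return
    counter.finish()


if __name__ == "__main__":
    counter = ActiveBackupCounter()
    run_backup(counter, interrupted=True)  # leaks one count
    run_backup(counter)                    # a later, clean run
    print(counter.count, counter.maintenance_suspended)
```

After one interrupted run the count never returns to 0, so every later clean run still leaves maintenance suspended, and nothing in the counter says which backup leaked it.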

Adding the hostname (or some other identifying details) may help to detect collisions and resolve them more logically. For example, in the above case if the interrupted backup was rerun, it could "add or replace" its hostname to the list, then know to "remove" its hostname when complete. The

Extending this to hostname+Backup PID would also account for accidental overlap of multiple backup runs from the same host.

As a side effect, if the semaphore file was left behind, it would identify the backup host that did so.
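One way to picture the hostname-based suggestion is a semaphore file that lists one holder per line and is removed only when the list is empty. The sketch below is an assumption-laden illustration, not a proposed patch: `register`, `unregister`, the `hostname:pid` line format, and the file name used in the example are all hypothetical, and a real version would need to make the read-modify-write atomic on the appliance.

```python
from pathlib import Path
from typing import Optional


def _read_holders(semaphore: Path) -> set:
    if not semaphore.exists():
        return set()
    return {line for line in semaphore.read_text().splitlines() if line}


def _write_holders(semaphore: Path, holders: set) -> None:
    if holders:
        semaphore.write_text("\n".join(sorted(holders)) + "\n")
    else:
        # Last holder gone: remove the file so maintenance can resume.
        semaphore.unlink(missing_ok=True)


def register(semaphore: Path, hostname: str, pid: Optional[int] = None) -> str:
    """Add or replace this backup's entry; returns the holder ID to remove later."""
    holder = hostname if pid is None else f"{hostname}:{pid}"
    holders = _read_holders(semaphore)
    holders.add(holder)  # set semantics: re-adding the same ID replaces it
    _write_holders(semaphore, holders)
    return holder


def unregister(semaphore: Path, holder: str) -> bool:
    """Remove one entry; returns True if other holders keep the file alive."""
    holders = _read_holders(semaphore)
    holders.discard(holder)
    _write_holders(semaphore, holders)
    return semaphore.exists()
```

With hostname-only IDs, a rerun of an interrupted backup re-registers the same entry and clears it on completion. With `hostname:pid` IDs, two overlapping runs from the same host get separate entries. In both cases a file left behind names the holder that did not clean up.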
