Enabling concurrent `ghe-backup` tasks from separate backup hosts #441
This is an interesting idea. Off the top of my head, I think this is the only place where we put any sort of locking/semaphore in place on the appliance side. If I'm remembering correctly, keeping a count is probably a good solution.
A counter may not be sufficient. For example, if a backup was interrupted and the counter wasn't decremented, the appliance would continue to believe a backup was running. Adding the hostname (or some other identifying detail) may help to detect collisions and resolve them more logically. In the case above, if the interrupted backup was rerun, it could "add or replace" its hostname in the list, then know to "remove" its hostname when complete. Extending this to hostname+backup PID would also account for accidental overlap of multiple backup runs from the same host. As a side effect, if the semaphore file was left behind, it would identify the backup host that left it.
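A minimal Python sketch of the hostname+PID variant, assuming the appliance keeps one `host:pid` line per running backup in a registry file. The path and the `register`/`release`/`read_entries` names are hypothetical and not part of backup-utils. Because the commands would run on the appliance over SSH, the backup host has to pass in its own hostname and PID. An exclusive `fcntl.flock` serializes each update, and the registry file itself doubles as the maintenance-suspension marker, so "remove the file when nobody is left" happens in the same locked step as the last removal.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def _locked(registry):
    """Hold an exclusive flock on a sidecar lock file for one read-modify-write."""
    with open(registry + ".lock", "a") as fh:
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def read_entries(registry):
    """Return the host:pid entries of backups believed to be running."""
    try:
        with open(registry) as fh:
            return [line.strip() for line in fh if line.strip()]
    except FileNotFoundError:
        return []


def _write_entries(registry, entries):
    # The registry file doubles as the "maintenance suspended" marker:
    # it exists while at least one entry remains and is deleted when empty.
    if entries:
        tmp = registry + ".tmp"
        with open(tmp, "w") as fh:
            fh.write("\n".join(entries) + "\n")
        os.replace(tmp, registry)  # atomic swap, readers never see a partial file
    elif os.path.exists(registry):
        os.remove(registry)


def register(registry, host, pid):
    """Add this run's host:pid entry; registering the same entry twice is a no-op."""
    entry = f"{host}:{pid}"
    with _locked(registry):
        entries = read_entries(registry)
        if entry not in entries:
            entries.append(entry)
        _write_entries(registry, entries)


def release(registry, host, pid):
    """Remove this run's entry; return True if it was the last active backup."""
    entry = f"{host}:{pid}"
    with _locked(registry):
        remaining = [e for e in read_entries(registry) if e != entry]
        _write_entries(registry, remaining)
        return not remaining


if __name__ == "__main__":
    reg = os.path.join(tempfile.mkdtemp(), "active-backups")  # hypothetical path
    register(reg, "backup-a", 101)
    register(reg, "backup-b", 202)
    print(release(reg, "backup-a", 101), read_entries(reg))
    print(release(reg, "backup-b", 202), os.path.exists(reg))
```

In the demo, the first `release` returns `False` because `backup-b` is still registered; the second returns `True` and the registry file is gone. An entry left behind by an interrupted run stays in the file and names the backup host responsible, which is the side effect described above; clearing it would still need a manual or timeout-based step.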
Consider a scenario where two backup hosts exist:
There's currently no check in `ghe-backup` that would prevent simultaneous execution from different hosts. However, it looks like there's at least one place where a semaphore file placed on the appliance (to suspend repository maintenance) may be removed prematurely if the backup tasks overlap during specific phases: https://github.com/github/backup-utils/blob/master/share/github-backup-utils/ghe-gc-enable#L35. For example, if:

Is this the only place where such a condition exists? If so, would it be possible to keep a count of the active backup tasks instead, and remove the file only when the count reaches 0? Or are there other considerations or complexities that would make this approach unpredictable or unreliable?