
Commit files to WORM automatically #6464


Open: wants to merge 1 commit into master


27149chen (Contributor)

What problem are we solving?

This PR implements the WORM auto-commit feature proposed in #6404.

How are we solving the problem?

  1. Create a queue and several workers to handle the auto-commit logic.
  2. When a file is created or updated, send it to the queue if WORM is enabled for it.
  3. A worker picks the file from the queue and checks whether it should be committed now. If yes, it commits the file by setting the WORMEnforcedAtTsNs field; if not, it re-adds the file to the queue to be checked later.
  4. When the filer starts, loop through all WORM paths to find files that need to be committed and send them to the queue.

About the queue:

  1. It is a priority queue: files with an older mtime are handled first.
  2. It ensures that each file is queued at most once. If the file is being processed, a new add request is dropped; if the file is already in the queue, it is replaced by the new add request.

How is the PR tested?

Checks

  • I have added unit tests if possible.
  • I will add related wiki document changes and link to this PR after merging.

chrislusf (Collaborator) left a comment


  • There are multiple filers.
  • The queue will be busy most of the time.
  • Filer restarts are common; iterating through all the files is not acceptable.

27149chen (Contributor Author) commented Jan 22, 2025

  1. Multiple filers are fine: if a change is sent to one of the filers, the item is added to the queue on that filer.

27149chen (Contributor Author)

  2. A worker does the following for each item:
    a. get the entry from the store
    b. update the entry and save it to the store if necessary

That is only one or two store queries/updates per item, so I think it is fast enough. We can also increase the number of workers, and the queue helps deduplicate items.

27149chen (Contributor Author) commented Jan 22, 2025

  3. A complete loop is necessary because, during filer downtime, some files may have exceeded the grace period; without new changes to them, they would never be committed as WORM.

The loop runs in the background and only picks files that are not yet WORM-enforced, adding them to the queue, so I think the impact is relatively small.

27149chen (Contributor Author)

@chrislusf, please take a look.

Signed-off-by: lou <alex1988@outlook.com>

update after review

Signed-off-by: lou <alex1988@outlook.com>

typo

Signed-off-by: lou <alex1988@outlook.com>

Revert "typo"

This reverts commit b141393.

Revert "update after review"

This reverts commit 06da532.

update after review

Signed-off-by: lou <alex1988@outlook.com>
2 participants