
Drop dirty dbengine pages if disk cannot keep up #7777

Merged: 4 commits into netdata:master from mfundul:dbengine-drop-dirty-pages on Feb 6, 2020

Conversation

@mfundul (Contributor) commented on Jan 17, 2020

Summary

Fixes #7268
Fixes #7468

Component Name

database/engine

Additional Information

Introduce dirty page pressure handling in the dbengine page cache that invalidates pages when the disk cannot keep up with the flushing speed. Ensure progress is made by managing real threads instead of relying on libuv workers, since those workers would be exhausted by internal I/O queuing.
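The core idea can be illustrated with a minimal sketch (hypothetical names, thresholds, and structures; not the actual dbengine code): when the fraction of dirty pages crosses a pressure threshold because flushing has fallen behind, the cache drops dirty pages so collection can keep making progress instead of blocking.

```c
/* Minimal sketch of dirty page pressure handling (illustrative only; all
 * names, thresholds and structures here are hypothetical, not netdata's). */
#include <stdio.h>

#define CACHE_PAGES        1024  /* total pages the cache can hold */
#define DIRTY_PRESSURE_PCT 50    /* act once more than half the cache is dirty */

struct page_cache {
    unsigned dirty_pages;                 /* pages waiting to be flushed */
    unsigned long dropped_pages;          /* metrics lost because the disk fell behind */
    unsigned long over_half_dirty_events; /* times pressure handling kicked in */
};

/* Called when a page becomes dirty; returns 1 if dirty pages were dropped. */
static int handle_dirty_page_pressure(struct page_cache *pg_cache)
{
    if (pg_cache->dirty_pages * 100 <= CACHE_PAGES * DIRTY_PRESSURE_PCT)
        return 0;  /* the disk is keeping up, nothing to do */

    /* The disk cannot keep up: invalidate the dirty pages so that data
     * collection can keep making progress instead of blocking. */
    pg_cache->over_half_dirty_events++;
    pg_cache->dropped_pages += pg_cache->dirty_pages;
    pg_cache->dirty_pages = 0;  /* the real code unlinks and frees each page */
    return 1;
}

int main(void)
{
    struct page_cache pg_cache = { .dirty_pages = 600 };

    if (handle_dirty_page_pressure(&pg_cache))
        printf("dropped %lu dirty pages (%lu pressure events so far)\n",
               pg_cache.dropped_pages, pg_cache.over_half_dirty_events);
    return 0;
}
```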

@mfundul mfundul requested review from amoss, thiagoftsm and stelfrag Jan 17, 2020
@mfundul mfundul requested review from cosmix, ktsaou and vlvkobal as code owners Jan 17, 2020
@squash-labs (bot) commented on Jan 17, 2020

Manage this branch in Squash

Test this branch here: https://mfunduldbengine-drop-dirty-pag-n9rmv.squash.io
@mfundul mfundul removed request for cosmix, ktsaou and vlvkobal Jan 17, 2020
@mfundul mfundul force-pushed the mfundul:dbengine-drop-dirty-pages branch from b43998e to c1648af Jan 20, 2020
@mfundul mfundul changed the title [WIP] Drop dirty dbengine pages if disk cannot keep up Drop dirty dbengine pages if disk cannot keep up Jan 20, 2020
@amoss (Contributor) commented on Jan 21, 2020

I've started reviewing the code, but this will take a few days. Last time this code was changed we saw some subtle bugs, so I need to read around it more this time. Conveniently, I'm examining the dbengine code for the roadmap task anyway, so there is some overlap...

caller-graph: [image]

call-graph: [image]

@thiagoftsm thiagoftsm self-requested a review Jan 29, 2020
@mfundul mfundul force-pushed the mfundul:dbengine-drop-dirty-pages branch from b145560 to 1054a6e Jan 30, 2020
Review thread on the following diff excerpt:

    array[33] = (uint64_t)ctx->stats.flushing_errors;
    array[34] = (uint64_t)global_flushing_errors;
    assert(RRDENG_NR_STATS == 35);
    array[33] = (uint64_t)ctx->stats.pg_cache_over_half_dirty_events;

@amoss (Contributor) commented on Jan 30, 2020
There is a comment noting that changing the indices can break user code. Does changing the use of index 33 from the number of flushing errors to the number of over_half_dirty events affect any user code?

@mfundul (Author, Contributor) commented on Jan 30, 2020
All corresponding statistics have been adjusted. The memory corruption hazard concerns the number of the actual statistics; changing individual statistics only changes the semantics.

@amoss (Contributor) commented on Jan 30, 2020
Ah ha. I see that you have changed the number of the actual statistics as well...
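To make the point about the statistics count concrete, here is a minimal sketch (hypothetical layout, not the actual netdata source): the receiving array is sized from RRDENG_NR_STATS, so adding a statistic without bumping that constant would write past the end of the array, while repurposing an existing slot such as index 33 only changes what that slot means.

```c
/* Illustrative sketch of why RRDENG_NR_STATS must track the number of
 * statistics written; the slot assignments below are examples, not the
 * actual netdata layout. */
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RRDENG_NR_STATS 35  /* must equal the number of slots filled below */

static void fill_stats(uint64_t *array, uint64_t over_half_dirty_events,
                       uint64_t another_statistic)
{
    memset(array, 0, RRDENG_NR_STATS * sizeof(*array));
    /* ... slots 0..32 hold the other statistics ... */
    array[33] = over_half_dirty_events; /* index 33 repurposed by this PR */
    array[34] = another_statistic;      /* placeholder for whatever slot 34 holds */
    assert(RRDENG_NR_STATS == 35);      /* guard: keep the constant in sync */
}

int main(void)
{
    uint64_t stats[RRDENG_NR_STATS];    /* callers size their buffer from the constant */
    fill_stats(stats, 7, 0);
    return stats[33] == 7 ? 0 : 1;
}
```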

@ilyam8 (Member) commented on Jan 30, 2020

This #7777 is a lucky one; we need to merge it ASAP 😄

@thiagoftsm thiagoftsm self-requested a review Jan 30, 2020
@stelfrag (Contributor) left a comment

I am approving. I cannot find anything that shouldn't work as expected. Steps:

  1. Check condition that will trigger invalidation of pages
  2. Queue command
  3. Start the cleanup (judy magic and more) 😄
  4. Cleanup
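A rough sketch of the flow those steps describe (hypothetical names and a simplified single-threaded queue, not the actual dbengine code): the collector notices dirty-page pressure, queues an "invalidate oldest dirty pages" command, and the dbengine worker dequeues it and performs the cleanup.

```c
/* Illustrative command-queue flow; all names and structures are hypothetical. */
#include <stdio.h>

enum cmd_opcode { CMD_NOOP = 0, CMD_INVALIDATE_OLDEST_DIRTY };

struct cmd { enum cmd_opcode opcode; };

#define CMD_QUEUE_SIZE 8
static struct cmd queue[CMD_QUEUE_SIZE];
static unsigned q_head, q_tail;  /* no locking: single-threaded demo only */

/* Step 2: queue a command for the dbengine worker. */
static void enqueue_cmd(enum cmd_opcode opcode)
{
    queue[q_tail % CMD_QUEUE_SIZE].opcode = opcode;
    q_tail++;
}

/* Steps 3 and 4: the worker dequeues commands and runs the cleanup. */
static void worker_run_once(void)
{
    while (q_head != q_tail) {
        struct cmd c = queue[q_head++ % CMD_QUEUE_SIZE];
        if (c.opcode == CMD_INVALIDATE_OLDEST_DIRTY) {
            /* Walk the page index (the "judy magic"), unlink the oldest
             * dirty pages, and free their buffers so the cache drops back
             * below the pressure threshold. */
            printf("invalidating oldest dirty pages\n");
        }
    }
}

int main(void)
{
    /* Step 1: the condition check (dirty pages above threshold) goes here. */
    enqueue_cmd(CMD_INVALIDATE_OLDEST_DIRTY);
    worker_run_once();
    return 0;
}
```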
@mfundul mfundul merged commit 6b119d9 into netdata:master Feb 6, 2020
16 checks passed:

  * runner / golangci-lint (GitHub Actions)
  * runner / eslint (GitHub Actions)
  * runner / shellcheck (GitHub Actions)
  * Header rules - netdata (Netlify): No header rules processed
  * LGTM analysis: JavaScript - No code changes detected
  * LGTM analysis: Python - No code changes detected
  * Pages changed - netdata (Netlify): 4 new files uploaded
  * Codacy/PR Quality Review: Up to standards. A positive pull request.
  * LGTM analysis: C/C++ - No new or fixed alerts
  * Mixed content - netdata (Netlify): No mixed content detected
  * Redirect rules - netdata (Netlify): 2 redirect rules processed
  * Squash environment: netdata - Successful in 3.21 minutes, received a success response
  * Travis CI - Pull Request: Build Passed
  * WIP: Ready for review
  * license/cla: Contributor License Agreement is signed.
  * netlify/netdata/deploy-preview: Deploy preview ready!
@mfundul mfundul deleted the mfundul:dbengine-drop-dirty-pages branch Feb 6, 2020
Saruspete added a commit to Saruspete/netdata that referenced this pull request May 21, 2020
* Introduce dirty page pressure handling in the dbengine page cache that invalidates pages when the disk cannot keep up with the flushing speed.
5 participants