Lot of ERROR messages. #6038

Open
HolgerHees opened this issue May 17, 2019 · 2 comments

Comments

@HolgerHees
Contributor

@HolgerHees HolgerHees commented May 17, 2019

Hi,

First, thanks for this awesome software. I'm having some trouble and don't know how to proceed, so I'd like to find out what else I could provide to make this a good bug report.

I use the latest version (v1.14.0) from the openSUSE Build Service. I know I should compile it myself to rule out other causes for this behavior, but for now I am still using that package.

Now let me describe the behavior. The good news is that netdata seems to run without any noticeable problems. But I get a lot of ERROR log entries, and I can't ignore them because otherwise I would overlook other ERROR entries. My policy on my system is to have 0 ERROR logs, and netdata is the last piece of software that keeps me from getting there.

From time to time I get ERROR messages like "child pid 13701 exited with code 1." with no indication of which child process it was, so I don't know which plugin generated the error.

Another error I get from time to time is "read failed" followed by "'/usr/lib/netdata/plugins.d/apps.plugin' (pid 11781) disconnected after 792055 successful data collections (ENDs)." or "'/usr/lib/netdata/plugins.d/nfacct.plugin' (pid 11780) disconnected after 72005 successful data collections (ENDs).". As these binaries are used to collect many different kinds of data, I also don't know what exactly the problem was.
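To see which plugin is affected most often, the "disconnected" messages could be tallied per plugin path. This is only a minimal sketch: it assumes the messages appear in the log in exactly the form quoted above, and the function name `count_disconnects` is made up for illustration.

```python
import re
from collections import Counter

# Matches the message form quoted in this report, for example:
# '/usr/lib/netdata/plugins.d/apps.plugin' (pid 11781) disconnected after
# 792055 successful data collections (ENDs).
DISCONNECT_RE = re.compile(
    r"'(?P<plugin>[^']+)' \(pid (?P<pid>\d+)\) disconnected after "
    r"(?P<count>\d+) successful data collections"
)


def count_disconnects(lines):
    """Return a Counter mapping plugin path -> number of disconnect messages."""
    counts = Counter()
    for line in lines:
        match = DISCONNECT_RE.search(line)
        if match:
            counts[match.group("plugin")] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the messages quoted above.
    sample = [
        "'/usr/lib/netdata/plugins.d/apps.plugin' (pid 11781) disconnected "
        "after 792055 successful data collections (ENDs).",
        "'/usr/lib/netdata/plugins.d/nfacct.plugin' (pid 11780) disconnected "
        "after 72005 successful data collections (ENDs).",
        "child pid 13701 exited with code 1.",
    ]
    for plugin, n in count_disconnects(sample).most_common():
        print(plugin, n)
```

Lines that do not match, such as the "child pid ... exited" message, are simply skipped, because that message carries no plugin name to group by.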

If I check my process list I see that the command line for the parent process was "netdata CLEAR on at Wed May 15 13:00:26 CEST 2019: test.chart new value".

By "from time to time" I mean ~10 times per day, sometimes more, sometimes less. I already tried changing the process scheduler to "other" with a nice value of 0, because I'm running it on a 4-core Atom-based server with 16 GB of memory and a typical load between 0.5 and 1.5.

Is there anything else I can provide without compiling it with enabled debug flags?

best

@cakrit cakrit self-assigned this May 17, 2019
@cakrit
Contributor

@cakrit cakrit commented May 17, 2019

We haven't taken the time to review all the messages that go to error.log and you're absolutely correct, netdata is very chatty there. We certainly need to review those messages and do fixes, but it's not a trivial task. I think we can keep this as a feature request and ask the community as well as the team to look into improvements that we can do in this area, with multiple PRs.

@ilyam8
Member

@ilyam8 ilyam8 commented May 17, 2019

related: #5480

I don't know how to fix it, because it depends heavily on the context. For example, python.d.plugin produces a lot of error-level messages during auto-detection, and this is OK: all those messages come from jobs, and in the context of the jobs they are certainly errors, but in the context of the plugin or netdata they are just warnings or even infos.

The easiest solution is to split log files and have a separate file for external plugins.
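A rough sketch of that idea as a post-processing step, not netdata's actual logging code: lines that mention an external plugin go to one list, everything else to another. The marker strings and the function name `split_log_lines` are assumptions for illustration; real log lines may identify their source differently.

```python
def split_log_lines(lines, plugin_markers=("python.d", "plugins.d/")):
    """Partition log lines into (plugin_lines, core_lines).

    plugin_markers is a hypothetical set of substrings assumed to identify
    messages that originate from external plugins.
    """
    plugin_lines, core_lines = [], []
    for line in lines:
        if any(marker in line for marker in plugin_markers):
            plugin_lines.append(line)
        else:
            core_lines.append(line)
    return plugin_lines, core_lines


if __name__ == "__main__":
    sample = [
        "'/usr/lib/netdata/plugins.d/apps.plugin' (pid 11781) disconnected "
        "after 792055 successful data collections (ENDs).",
        "child pid 13701 exited with code 1.",
    ]
    plugin_lines, core_lines = split_log_lines(sample)
    print(len(plugin_lines), "plugin line(s),", len(core_lines), "core line(s)")
```

Each list could then be written to its own file, so that job-level errors from plugins no longer hide errors from the daemon itself.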

4 participants