Nondeterministic "Cannot fetch logs" error in Nomad UI #27821

@nh2

Nomad version

v1.11.3

Operating system and Environment details

NixOS Linux

Issue

Often, when trying to view logs in the Nomad web UI, I get this error:

Cannot fetch logs
The logs for this task are inaccessible. Check the condition of the node the allocation is on.

Image

The issue is nondeterministic:

  • Within the same browser (e.g. Firefox): With enough refreshing, I can toggle between the error being shown and the logs actually loading. Sometimes that takes a long time, though.
  • Across different browsers: As I type this, refreshing in Firefox shows the error, while Chromium (refreshed simultaneously) shows the logs successfully, on the same URL.
    • This is after logs were shown successfully in that same Firefox 10 minutes earlier.

Observations:

  • In the browser devtools, I can see that the browser first fails to load http://10.0.5.10:4646/v1/client/fs/logs/7eee1a15-...?follow=true..., which is the address of the allocation's client and is not reachable from my browser. This is apparently expected, as explained in "Nomad UI - Why are logs piped directly from agents?" #6409 (comment).
    • The log warnings differ slightly (not sure whether that's relevant):
      • In a log-showing Chromium: LOG FETCH: Couldn't connect to //10.0.5.10:4646/v1/client/fs/logs/7eee1a15...
      • In a failing Firefox: LOG FETCH: Couldn't connect to /v1/client/fs/logs/7eee1a15
  • I notice that the underlying errors that prevent connecting to 10.0.5.10 are slightly different:
    • In a log-showing Chromium: GET http://10.0.5.10:4646/v1/client/fs/logs/7eee1a15-70c5-9fe5-b9b3-7fce76cc0ae0?follow=true&offset=50000&origin=end&task=script-task&type=stderr net::ERR_CONNECTION_REFUSED
    • In a failing Firefox: Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at http://10.0.5.10:4646/v1/client/fs/logs/7eee1a15-70c5-9fe5-b9b3-7fce76cc0ae0?follow=true&offset=50000&origin=end&task=script-task&type=stderr. (Reason: CORS request did not succeed). Status code: (null).
      That said, in Firefox's Network panel it also says NS_ERROR_CONNECTION_REFUSED for that request in the Transferred column.
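The difference between the two "LOG FETCH" warnings can be made explicit with a small sketch. It assumes (this is my reading of the warnings, not something confirmed from the UI source) that the UI tries the client agent's address first and then falls back to a path relative to the server the page was loaded from. `describe_target` and `fetch_with_fallback` are hypothetical helper names, and `fetch` is a stand-in for whatever actually performs the request.

```python
from urllib.parse import urlsplit


def describe_target(url: str) -> str:
    """Classify a URL from a LOG FETCH warning by whether it names a host."""
    parts = urlsplit(url)
    if parts.netloc:
        return f"direct to client agent {parts.netloc}"
    return "relative to the page origin (proxied by the server)"


def fetch_with_fallback(urls, fetch):
    """Try each URL in order and return (url, body, failures) for the first success.

    `fetch` is a hypothetical callable that returns the body or raises OSError.
    If every URL fails, ConnectionError is raised with the list of failures.
    """
    failures = []
    for url in urls:
        try:
            body = fetch(url)
        except OSError as exc:
            failures.append((url, str(exc)))
            continue
        return url, body, failures
    raise ConnectionError(failures)


# URLs as they appear in the two warnings (alloc ID truncated as in the report).
chromium_warning = "//10.0.5.10:4646/v1/client/fs/logs/7eee1a15"
firefox_warning = "/v1/client/fs/logs/7eee1a15"

print(describe_target(chromium_warning))
print(describe_target(firefox_warning))
```

Under that assumption, the Chromium warning names the unreachable client address (after which the fallback succeeds), whereas the Firefox warning names the server-relative path, which would mean the fallback request is the one that failed there.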

Failing screenshot:

Image

Succeeding screenshot:

Image

Suffice it to say, the actual log file on the Nomad client is available at all times in /var/lib/nomad/alloc/....
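To check whether the server-proxied log endpoint is itself flaky, independent of any browser, something like the following sketch could be run from the machine the browser is on. The query parameters are the ones visible in the failing requests above, with `follow=false` so each request returns instead of streaming. The server address `http://nomad-server.example:4646` is a placeholder, the helper names are hypothetical, and the sketch assumes no ACL token is required.

```python
import urllib.error
import urllib.request
from urllib.parse import urlencode


def build_logs_url(base, alloc_id, task, log_type="stderr"):
    """Build a logs URL against `base` using the parameters seen in the report."""
    query = urlencode({
        "task": task,
        "type": log_type,
        "origin": "end",
        "offset": 50000,
        "follow": "false",  # return once instead of streaming
    })
    return f"{base.rstrip('/')}/v1/client/fs/logs/{alloc_id}?{query}"


def probe(url, attempts=20, timeout=5.0):
    """Request `url` repeatedly and count outcomes by HTTP status or error type."""
    outcomes = {}
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                key = f"HTTP {resp.status}"
        except urllib.error.HTTPError as exc:
            key = f"HTTP {exc.code}"
        except OSError as exc:  # covers URLError, refused connections, timeouts
            key = type(exc).__name__
        outcomes[key] = outcomes.get(key, 0) + 1
    return outcomes


# Placeholder server address; the alloc ID and task name are from the report.
url = build_logs_url(
    "http://nomad-server.example:4646",
    "7eee1a15-70c5-9fe5-b9b3-7fce76cc0ae0",
    "script-task",
)
# Example use: print(probe(url))
```

If `probe` reports only successes while the browser still shows the error, that would point towards the UI or browser side rather than the server-to-client path.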

Reproduction steps

  • Unclear

Expected Result

  • Logs load reliably.

Actual Result

  • Logs load unreliably as described above.

Possibly related issues

The same error message appears in:

However, these issues were unhelpfully closed and locked by the github-actions stalebot, making it impossible to ask follow-up questions there.
