@bio-boris commented Oct 17, 2025

This PR adds Basic Authentication to the Elasticsearch client, allowing the service to connect to secured clusters.

The startup script has been updated to use the new authentication variables, resolving the connection issues when pointed at the elastic-next-spike instance. I've deployed and verified this on the next environment.
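The PR doesn't name the new authentication variables. Here is a minimal sketch of how a Python config module might read them from the environment. It assumes the hypothetical names `ELASTICSEARCH_USERNAME`, `ELASTICSEARCH_PASSWORD` and `ELASTICSEARCH_TOKEN` and treats empty values as unset. It is not the PR's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ESAuthConfig:
    """Elasticsearch credentials; every field is optional (hypothetical structure)."""
    username: Optional[str] = None
    password: Optional[str] = None
    token: Optional[str] = None


def _get(env: Mapping[str, str], name: str) -> Optional[str]:
    # Unset and empty variables are both treated as "not configured".
    return env.get(name) or None


def load_es_auth_config(env: Mapping[str, str] = os.environ) -> ESAuthConfig:
    # NOTE: these variable names are assumptions, not the names used in the PR.
    return ESAuthConfig(
        username=_get(env, "ELASTICSEARCH_USERNAME"),
        password=_get(env, "ELASTICSEARCH_PASSWORD"),
        token=_get(env, "ELASTICSEARCH_TOKEN"),
    )
```

Passing the mapping in as a parameter lets tests exercise the function without touching the real process environment.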

Next Steps:

  • Populate a test Elasticsearch instance with real data and authentication enabled, to fully verify end-to-end functionality.
  • Create a user in Elasticsearch for the service.

Closes #90

Add authentication (#96)
Copilot AI review requested due to automatic review settings October 17, 2025 18:03

Copilot AI left a comment

Pull Request Overview

This PR adds authentication support for Elasticsearch connections throughout the application. The changes implement both Basic authentication (username/password) and Bearer token authentication, with Basic taking precedence when both are configured.

  • Adds authentication configuration options for Elasticsearch (username, password, and token)
  • Updates all Elasticsearch client code to include authentication headers
  • Implements comprehensive test coverage for the new authentication functionality
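The precedence described here (spelled out in the docstring quoted later in this thread: Basic when username and password are both set, otherwise Bearer when only a token is set) can be sketched as follows. The name mirrors `get_elasticsearch_auth_header()` from `src/utils/config.py`. The parameters and the `None` return for "no credentials" are assumptions, not the PR's actual signature.

```python
import base64
from typing import Optional


def get_elasticsearch_auth_header(
    username: Optional[str] = None,
    password: Optional[str] = None,
    auth_token: Optional[str] = None,
) -> Optional[str]:
    """Return an Authorization header value, or None if no credentials are set.

    Precedence (as described in the PR):
      1. username and password both set -> Basic
      2. only a token set                -> Bearer
    """
    if username and password:
        # Basic auth sends base64("username:password"), per RFC 7617.
        credentials = f"{username}:{password}".encode("utf-8")
        base64_credentials = base64.b64encode(credentials).decode("ascii")
        return f"Basic {base64_credentials}"
    elif auth_token:
        return f"Bearer {auth_token}"
    return None
```

Callers would add `{"Authorization": value}` to their request headers only when the result is not `None`, so unauthenticated clusters keep working.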

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Summary per file:

  • src/utils/config.py: Adds auth configuration variables and the get_elasticsearch_auth_header() function
  • src/utils/wait_for_service.py: Updates the service connection function to accept and use auth tokens
  • src/server/main.py: Updates service startup to use authentication when connecting to Elasticsearch
  • src/es_client/query.py: Adds authentication headers to Elasticsearch search requests
  • tests/helpers/init_elasticsearch.py: Updates the test helper to use the centralized auth header function
  • tests/unit/utils/test_config.py: Adds comprehensive tests for authentication header generation
  • tests/unit/utils/test_wait_for_service.py: Adds tests for service connection with and without authentication
  • .github/workflows/docker-image.yml: Adds a CI/CD workflow for building the Docker image
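`src/utils/wait_for_service.py` now sends credentials when probing Elasticsearch at startup. Below is a minimal standard-library sketch of such a readiness loop. The function name, parameters, retry interval and the injectable `probe` hook are all assumptions and are not the PR's code. The hook exists only so the loop can be exercised without a live cluster.

```python
import time
import urllib.request
from typing import Callable, Optional


def _http_probe(url: str, auth_header: Optional[str], timeout: float) -> bool:
    """Return True if the URL answers with a 2xx status."""
    request = urllib.request.Request(url)
    if auth_header:
        request.add_header("Authorization", auth_header)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # URLError and HTTPError (e.g. 401 for bad credentials) subclass
        # OSError, as do connection errors and socket timeouts.
        return False


def wait_for_service(
    url: str,
    auth_header: Optional[str] = None,
    max_wait: float = 60.0,
    interval: float = 1.0,
    probe_timeout: float = 5.0,
    probe: Callable[[str, Optional[str], float], bool] = _http_probe,
) -> bool:
    """Poll `url` until it responds successfully or `max_wait` seconds elapse."""
    deadline = time.monotonic() + max_wait
    while True:
        if probe(url, auth_header, probe_timeout):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

In this sketch a 401 counts as "not ready", so wrong credentials surface as a startup timeout rather than a distinct error. A real implementation might prefer to fail fast on 401/403.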

codecov bot commented Oct 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (23ec7c1) to head (ec06906).

@@            Coverage Diff            @@
##           develop       #97   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           21        21           
  Lines          660       685   +25     
=========================================
+ Hits           660       685   +25     

        logger.info("Using Basic Authentication for Elasticsearch.")
        return f"Basic {base64_credentials}"
    elif auth_token:
        # Bearer authentication
Author

There is currently no way to test this live: generating API keys requires SSL (HTTPS) to be enabled.

Member

So with our current Elasticsearch configuration, we can't use auth tokens?

Author

Correct. We would have to create and distribute SSL certificates and enable HTTPS. @jbezouska-ANL, were there any plans to enable HTTPS on the backend?

Comment on lines 68 to 70
Priority:
1. If username and password are set, use Basic authentication
2. If only auth token is set, use Bearer authentication
Member

Why this precedence?

Author

I picked it arbitrarily.

Member

My thinking is that tokens are more secure than username/password, so they should be preferred.

Author

If we can get tokens working, we can delete the username/password code too. Since tokens aren't working yet, I just got it working with what we have.

Member

Might as well leave it in if it works, and just swap the preference.

Author

I looked this up the other day, and an AI assistant says they aren't more secure:

TL;DR: With One Username Per Service, They're Identical
If you have one username per service, then API keys and username/password are literally the same thing security-wise:
Both are:

  • Base64-encoded strings in headers
  • Unique per service (no shared credentials)
  • Equally vulnerable if intercepted
  • Equally easy to revoke (just disable that one user/key)
  • Able to have identical RBAC permissions

The only remaining differences are cosmetic:

API Key: Authorization: ApiKey <base64(id:key)>
Basic Auth: Authorization: Basic <base64(username:password)>

Both decode to two strings separated by a colon. Both can be scoped, rotated, and revoked independently.
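The claim that both header values "decode to two strings separated by a colon" is easy to check. The sketch below builds both headers from hypothetical credentials and decodes them again. It demonstrates only the encoding, not any security property of either scheme.

```python
import base64
from typing import Tuple


def encode_pair(first: str, second: str) -> str:
    """Base64-encode "first:second", as both header schemes do."""
    return base64.b64encode(f"{first}:{second}".encode("utf-8")).decode("ascii")


def decode_header(header_value: str) -> Tuple[str, str, str]:
    """Split "<Scheme> <base64>" and return (scheme, first, second)."""
    scheme, _, encoded = header_value.partition(" ")
    decoded = base64.b64decode(encoded).decode("utf-8")
    # Split on the first colon only (RFC 7617 forbids colons in the user-id).
    first, _, second = decoded.partition(":")
    return scheme, first, second


# Hypothetical credentials, for illustration only.
api_key_header = "ApiKey " + encode_pair("my-key-id", "my-key-secret")
basic_header = "Basic " + encode_pair("search-service", "s3cret")

print(decode_header(api_key_header))  # ('ApiKey', 'my-key-id', 'my-key-secret')
print(decode_header(basic_header))    # ('Basic', 'search-service', 's3cret')
```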

Member

Equally easy to revoke (just disable that one user/key)

I don't think that's true; it's easier to revoke a token than to change a password.

Author

How is it easier? Wouldn't both require shutting down and restarting the service?

Member

I assumed it was easier on the ES side, like how in KBase revoking a token is trivial.

If that's not the case, and you're saying that passwords and tokens are completely indistinguishable for this use case, then we should just drop token support.

Author

I don't know; I haven't tested it. I was thinking more about what happens on the service side. OK, I will remove it.

bio-boris and others added 4 commits October 17, 2025 15:59
Refactor tests for wait_for_service to reduce duplication and improve clarity.
@bio-boris
Author

@MrCreosote Can we merge this?

@MrCreosote
Member

This thread is still open: #97 (comment)

bio-boris and others added 16 commits October 21, 2025 12:00
Updated the docstring to clarify test purpose.
Removed unused logging import and logger initialization.
Remove comment about calculating Elasticsearch auth header.
Replaced centralized auth header function with config-based authorization header retrieval.
Removed unused import of get_elasticsearch_auth_header.
Refactor authentication header retrieval in show_indexes function.
Refactor Elasticsearch authentication header retrieval.
params = {'allow_no_indices': 'true'}

-resp = requests.post(url, data=json.dumps(options), params=params, headers=headers)
+resp = requests.post(url, data=json.dumps(options), params=params, headers=headers, timeout=(120, 600))
Author

I added timeouts. Maybe I should instead add an annotation telling bandit to ignore these calls and skip the timeouts, since I don't know what the real timeouts should be?
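`requests` expects `timeout` to be either a single number or a `(connect, read)` tuple. Any other sequence, such as a list, is rejected with a `ValueError` when the request is sent. One way to keep explicit timeouts (which also satisfies bandit's B113 `request_without_timeout` check) without pretending the values are final is to make them configurable. The environment variable names below are hypothetical. The defaults simply echo the 120/600 seconds from the diff above. The alternative is a `# nosec B113` comment on each call.

```python
import os
from typing import Mapping, Tuple


def es_request_timeout(env: Mapping[str, str] = os.environ) -> Tuple[float, float]:
    """Return a (connect, read) timeout tuple for requests' `timeout=` argument.

    ES_CONNECT_TIMEOUT / ES_READ_TIMEOUT are hypothetical variable names.
    """
    connect = float(env.get("ES_CONNECT_TIMEOUT", "120"))
    read = float(env.get("ES_READ_TIMEOUT", "600"))
    # Must be a tuple: requests treats any non-tuple value as a single timeout.
    return (connect, read)
```

Usage: `requests.post(url, data=json.dumps(options), params=params, headers=headers, timeout=es_request_timeout())`.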

@bio-boris
Author

bio-boris commented Oct 21, 2025

Closing in favor of #100

@bio-boris closed this Oct 21, 2025

Development

Successfully merging this pull request may close these issues.

Add authentication for elasticsearch

3 participants