-
Notifications
You must be signed in to change notification settings - Fork 21
feature/rbac-2 #405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feature/rbac-2 #405
Conversation
|
Not yet addressed:
@Avantol13 - could you review and comment 🧾 User Story 1: Control Whether Records Are DiscoverableTitle: Configurable Discovery of Indexd Records As a platform operator, Acceptance Criteria
🧾 User Story 2: Global Discovery Authorization ControlTitle: Global Discovery Authz for Indexd Records, Support Discovery Access Independent from File Access As a system administrator, As a data commons architect, Acceptance CriteriaAssuming ARE_RECORDS_DISCOVERABLE=False
📌 Configuration Summary# Whether any records are discoverable at all
ARE_RECORDS_DISCOVERABLE = True # default: True
# Override per-record authz for GET/read
# Only applies to record discovery (not file access)
# If None, use per-record `authz`
GLOBAL_DISCOVERY_AUTHZ = ["/indexd/discovery"] |
|
In general the comments above look good, thanks for all the detail. This part:
I think needs to actually behave similar to READ filtering based on config. In other words, if you request a did and you do have access to authz, this should return 200. If you request a did and do you have access to the global authz that's configured, this should return 200. Basically this should only 403 in situations where this record itself would've been filtered out. |
indexd/auth/drivers/alchemy.py
Outdated
| resources = self.arborist.auth_mapping() | ||
| return resources | ||
|
|
||
| @timed_cache(1800) # Cache for 30 minutes (typical JWT expiration time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't hard-code this because we absolutely cannot have a response cached beyond the expiration. This has to be dynamic based on the expiration of the token. Our security is heavily reliant on the guarantee that the expiration ensures no access beyond that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re. requirements
has to be dynamic based on the expiration of the token
Understood. At the same time, previous feedback stated:
Arborist allows no token to be sent on purpose, it allows assignment of anonymous access.
Additionally, AFAIK, no validation of the token occurs now in indexd. ie no calls to authutils.token.validate_jwt()
So, if there is a token:
- 🆕 we can check to ensure it has not expired, use expiry time as ttl
- already being used as a cache key
If there is no token:
- 🆕 use maximum_ttl_seconds as ttl
- 🆕 add authentication header to cache key (for basic and no auth)
Other:
- 🆕 clean up any unused cache entries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Thanks. I edited the comment above |
|
|
@Avantol13 I've addressed all PR items. Please see #405 (comment) for a followup question. |
Avantol13
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re: db driver and how to centrally organize things. Here's my current thinking:
We should, theoretically, be able to move all the new code we need with stateful decisions out of the db driver b/c nothing really needs the db.
Here's my idea:
Put the authz check for discovery in a similar authorize decorator to this
indexd/indexd/auth/__init__.py
Line 9 in b5b198a
| def authorize(*p): |
authorize_discovery and add that decorator everywhere you need. The logic in there should look like this:
- if is_discovery_enabled (check config only)
- Get user's authz (perhaps cache, could even use flask's per-request cache flask.g if that somehow simplifies - I know that won't save beyond the request)
- if config GLOBAL_AUTHZ set
- Check if the GLOBAL_authz is in the user's authz
- if config GLOBAL not set
- Check if user's authz contains records authz (this will require making a db call based on the request's ID)
done. Now we have appropriately denied access pre-blueprint logic with this decorator.
Within the blueprints that need the logic for filtering, now we can implement a shared set of util functions.
Before making a db query:
* if is_discovery_enabled (check config only)
* get user authz (perhaps from cache)
* don't worry about GLOBAL AUTHZ at all b/c we already handled that in the new decorator of this endpoint, so we only get here if they are authorized
* Add filter for records to db query based on user authz
This way, the stateful logic stays out of the db driver and in the request handling (which is where it should be) and we have minimized duplication of code as much as possible.
What do you think?
indexd/auth/drivers/__init__.py
Outdated
|
|
||
| return result | ||
|
|
||
| def calculate_ttl(now, token) -> int | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel we might be overcomplicating this cache. Can't we simply keep the exp in the cache itself and instead of doing ttl math, just invalidate entries before now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Avantol13 - I might be missing what you are saying here. simply keep the exp in the cache
could you expand a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't calculate ttl, just put the token and exp in the cache and instead of calculating it every time you check the cache, just get the entries with exp that isn't past now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would simplify a lot of this code imo
Using ContextVar to decouple the web tier from the database driverWhy this patternCurrent request-scoped values (Bearer token, Arborist resources) are collected in the web tier but consumed in the data tier (for row-level security, audit columns, tenancy filters, per-request logging, etc.). Passing these values explicitly through every call (blueprints → services → repositories) scatters boilerplate and causes wide signature churn. contextvars.ContextVar provides implicit, request-scoped propagation without coupling the DB code to Flask. It gives you: Separation of concerns: web code gathers state once; DB/ORM code reads it when needed.
|
Avantol13
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at the contextvar design. I appreciate the considerations and do see where you're coming from on some of the points, but I still don't agree with the approach.
The main thing I'm trying to ensure is appropriate separation between the web and data layer. We don't want explicit or implicit coupling. The data layer should not need to understand web layer. The endpoint handling the incoming request should be translating what it can and interacting with data through an explicit, well-defined interface, not relying on an implicit context being available. The contextvar would be a sort of hidden dependency in the data layer, which complicates this data layer abstraction.
I do understand and appreciate the attempt to reduce churn, but the implicit coupling imo is not worth the drawbacks.
I'm open to counters to the idea I had, but I really want to land on a solution that doesn't (explicitly or implicitly) couple the data layer to the web request context.
indexd/auth/drivers/__init__.py
Outdated
|
|
||
| return result | ||
|
|
||
| def calculate_ttl(now, token) -> int | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't calculate ttl, just put the token and exp in the cache and instead of calculating it every time you check the cache, just get the entries with exp that isn't past now
indexd/auth/drivers/__init__.py
Outdated
|
|
||
| return result | ||
|
|
||
| def calculate_ttl(now, token) -> int | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would simplify a lot of this code imo
API Endpoints DocumentationA preliminary list of blueprint endpoints that are impacted indexd/index/drivers/alchemy.py
indexd/index/drivers/query/urls.py:
|
sequenceDiagram
autonumber
actor Client
participant Flask as Flask App
participant Hook as "before_request: ensure_auth_context()"
participant Arborist as "Arborist (auth service)"
participant Ctx as "ContextVar<request_ctx>"
participant Repo as "Driver.method()"
participant Decorator as "@authorize_discovery"
participant DB as "Indexd DB"
Client->>Flask: HTTP GET /ids …
note over Flask,Hook: middleware, called before all blueprints
Flask->>Hook: Trigger before_request
Hook->>Flask: Read config\nARE_RECORDS_DISCOVERABLE, GLOBAL_DISCOVERY_AUTHZ
alt are_records_discoverable == false
Hook->>Arborist: auth.get_authorized_resources()
Arborist-->>Hook: {resource -> [permissions]}
Hook->>Hook: filter for read@indexd|*\ncompute can_user_discover
else are_records_discoverable == true
Hook->>Hook: can_user_discover = true\nauthorized_resources = []
end
Hook->>Ctx: request_ctx.set({\n are_records_discoverable,\n can_user_discover,\n authorized_resources\n})
note over Flask: call specific endpoint e.g. `/index`
Flask->>Decorator: call Driver.ids(...)
note over Decorator,Repo: Inject can_user_discover, authorized_resources only if args missing/None
Decorator->>Ctx: get_auth_context()
Decorator->>Repo: ids(...,\n can_user_discover, authorized_resources)
Repo->>DB: query with discovery rules\n(ACL/authz filters)
DB-->>Repo: results
Repo-->>Flask: IDs payload
Flask-->>Client: 200 OK (JSON)
note over Ctx: Framework-agnostic state\n(no Flask import in Driver)
Request hits Flask → ensure_auth_context seeds a ContextVar with discovery flags. The @authorize_discovery decorator then pulls can_user_discover and authorized_resources and injects them into wrapped function, for example ids() , so the repo/DB layer stays Flask-agnostic and only consumes plain function parameters. |
|
Thanks for the thoughtful review, @Avantol13 I’ve aligned the behavior and added a small infrastructure layer to make it reliable and testable. What changed (referencing commit
|
Avantol13
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, here's where I'm at:
- I am happy with the general adoption of the idea of the separating web/data
- The implementation details using this contextvar approach could be simplified
The key items to solve still (let me know if I'm missing anything):
We don't want to hit the data layer twice for indexd records when checking their authz for discovery.
Solution: Use flask.g to store the record (already handles all the context properly per request) after hitting DAL in request decorator by extracting GUID from request in web layer and hitting existing DAL to get a record.
We don't want to hit arborist to get resources twice.
Solution: Use the cache that was built for persisting this - don't bother with any context, that's what the cache is for.
With the above: I don't think we need any home-grown parameter injection or manual context management at all. Sure we have to update existing blueprints to add a new decorator, I'm fine with that.
| if any(resource in authorized_resources for resource in global_discovery_authz): | ||
| can_user_discover = True | ||
|
|
||
| # Remove global discovery authz resources from authorized_resources to avoid confusion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh, don't do this. It's possible for people for authorize data the same way they authorize discovery, in which case we would want the resource(s) in there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave the authorized resources untouched, as they should represent the users full authz
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Will do.
|
|
||
| from indexd import auth | ||
|
|
||
| discovery_context = ContextVar("discovery_context", default={}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should just use the web framework's context capabilities already built-in. I'm worried about edge cases around this global.
https://flask.palletsprojects.com/en/stable/appcontext/
use flask.g instead: https://flask.palletsprojects.com/en/stable/appcontext/#storing-data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's already built for this exact purpose and manages the clearing of the context as part of the general framework
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, ensure_auth_context will no longer be called. Note that a "context" var holder is no longer as we are passing the variables explicitly
The main thing I'm trying to ensure is appropriate separation between the web and data layer. We don't want explicit or implicit coupling. The data layer should not need to understand web layer. The endpoint handling the incoming request should be translating what it can and interacting with data through an explicit, well-defined interface, not relying on an implicit context being available
|
|
||
| set_auth_context(are_records_discoverable=are_records_discoverable, | ||
| can_user_discover=can_user_discover, | ||
| authorized_resources=authorized_resources) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this duplicating some of the logic in the caching? There's got to be a cleaner way to do all this.
Flask → ensure_auth_context seeds a ContextVar with discovery flags
We don't really need the context to contain discovery flags, we should just appropriately react if they don't have access by intercepting the request - I think we may be overcomplicating things with the context now that I'm thinking more.
We don't need any context persisted other than ideally the user's authorization resources (so we don't have to hit arborist twice). We already have a cache for that information. So we just need to ensure that we use the cache in the inception of the request where we make the call and then we don't need any of this context.
And we may need the interaction with arborist to happen in a decorator capturing the request (not running before_request). We could keep the global config check before_request, but if there's a global authz then we have to handle that later with user info, so it might be better to actually not use before_request.
so I think the original idea still stands: #405 (review)
and to address this: #405 (comment)
You can use flask.g to store the record after interaction with the DAL and check for that later in the endpoints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, web tier will resolve config and authorized_resources on demand via auth_context() and pass to db layer. See latest change to get_all_index_record_versions:
Updates the get_all_index_record_versions view to call auth_context() and pass can_user_discover and authorized_resources to the index driver’s get_all_versions() method.
| - Only inject for parameters actually present in the target signature. | ||
| - Only fill when the argument is missing or None (does not override explicit values). | ||
| """ | ||
| sig = inspect.signature(func) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This all scares me a bit. I don't really want to add home-grown injection that relies on downstream functions to know to use these parameter names. I would rather rely on the web framework's context management (even if that logic ends up in the endpoints) and a pattern of decorators (which already exists).
Then, in endpoints where you need to handle some per request context that may or may not exist, do:
indexd_record = flask.g.indexd_record or get_record()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, ensure_auth_context will no longer be called.
deprecated methods (ensure_auth_context, authorize_discovery) removed.
Summary of the changes made in commit
|
|
Hey @bwalsh, per
Could you go ahead and make all those necessary changes so we can see the overall new approach without it being mixed in with the old approach? We'll have the git history if we need to refer back or revert, but it's hard to understand the implications without making the full change. |


New Features
adds RBAC to db operations).add cached call to Arborist).Adds RBAC config).Breaking Changes
Bug Fixes
Improve error handling).Improvements
developer documentation).test RBAC).Dependency updates
Deployment changes
RBAC. When set toTrue, RBAC enforcement is active for protected records.