[RFC-0010] Multi-Tenant Workload Identity #5209

Merged
merged 1 commit into from
Apr 14, 2025

Conversation

matheuscscp
Member

@matheuscscp matheuscscp commented Feb 23, 2025

In this RFC we aim to add support for multi-tenant workload identity in Flux, i.e. the ability to specify at the object-level which set of cloud provider permissions must be used for interacting with the respective cloud provider on behalf of the reconciliation of the object. In this process, credentials must be obtained automatically, i.e. this feature must not involve the use of secrets. This would be useful in a number of Flux APIs that need to interact with cloud providers, including all controllers except helm-controller.

Link: https://github.com/fluxcd/flux2/blob/main/rfcs/0010-multi-tenant-workload-identity/README.md

Umbrella issue for implementation: #5022

@matheuscscp matheuscscp added the area/rfc Feature request proposals in the RFC format label Feb 23, 2025
@matheuscscp matheuscscp force-pushed the rfc-multi-tenant-workload-identity branch 2 times, most recently from b795adf to 3034741 Compare February 23, 2025 04:33
@stefanprodan stefanprodan changed the title [RFC-0010] Multi-Tenant Workload Identity [RFC] Multi-Tenant Workload Identity Feb 23, 2025
@stefanprodan stefanprodan marked this pull request as draft February 23, 2025 10:39
@matheuscscp matheuscscp force-pushed the rfc-multi-tenant-workload-identity branch 15 times, most recently from 5a5f4ac to 614bbc8 Compare March 1, 2025 17:03
@matheuscscp matheuscscp force-pushed the rfc-multi-tenant-workload-identity branch 10 times, most recently from f505036 to 3e9b36a Compare March 8, 2025 01:36
@matheuscscp matheuscscp force-pushed the rfc-multi-tenant-workload-identity branch 2 times, most recently from f3dc52d to 2f0b445 Compare March 29, 2025 01:51
Member

@pjbgf pjbgf left a comment

@matheuscscp good job putting this together. From a high level, it is a great feature and would further bolster Flux's least-privilege model.

I added a few nits, but overall multi-tenancy is a charged term, so it is quite important to define what precisely we mean by that and what guarantees we are providing as part of this RFC.

My take is that this supports the multiple teams model out of the box. The multiple customers model is achievable, but requires stronger security boundaries at the cluster topology level to cope with a more hostile environment.

It would be nice to explicitly call out in the RFC:

  1. The cache key format, so that it is crystal clear that it will be resistant to cross-tenant takeovers. As you already pointed out, when sharing the same cache storage, the cache key will require some tenant-level or object-level information to provide isolation. What in the chosen format prevents a malicious tenant from forging it (e.g. the fully qualified service account name plus x, y and z)?
  2. What security controls are in place (or can be implemented by the user) to ensure that even if tenant A knows the Cloud Provider annotations for tenant B, it won't be able to impersonate tenant B.
     • This can be as simple as enforcing rules via Admission Controllers (e.g. Kubewarden), as per some examples of Flux multi-tenancy, or something more sophisticated as part of the new feature.
  3. The limitations of this feature for multi-tenancy in hostile environments where the tenants are not trustworthy. Suggestions on how to overcome them are optional, as that can be a larger topic - e.g. stronger isolation so that tenants can't impersonate each other's identities even if they know them and can bypass controls that may be operating in a degraded state (i.e. admission controllers).

@matheuscscp
Member Author

matheuscscp commented Mar 29, 2025

@pjbgf Thank you very much for the review!

> multi-tenancy is a charged term, so it is quite important to define what precisely we mean by that and what guarantees we are providing as part of this RFC.
>
> My take is that this supports out-of-the-box the multiple teams model. The multiple customers model is achievable, but requires stronger security boundaries at cluster topology level - for coping with more hostile environment.

Agreed! Added a section at the beginning to clarify this 👍

> The cache keys format, so that it is crystal clear that it will be resistant to cross-tenant takeovers. As you already pointed out, when sharing the same cache storage, the cache key will require some level of tenant or object level information to provide isolation. What in the format picked will result in a malicious tenant not being able to forge it (e.g. the fully qualified service account name plus x, y and z)?

Format added 👍 And specifically about:

> What in the format picked will result in a malicious tenant not being able to forge it (e.g. the fully qualified service account name plus x, y and z)?

In the Cache Key section, the five paragraphs following the list of components justify the presence of each component, either to prevent malicious tenants from forging or stealing tokens from other tenants, or to make sure that served tokens match the request specifications and hence are valid for the use case in question. I added a ##### Justification header at the beginning of these paragraphs to make it clearer.
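To make the isolation argument concrete, here is a minimal sketch of how a cache key could be derived so that one tenant cannot craft inputs that collide with another tenant's key. The struct, its field names and the length-prefixed encoding are assumptions made for illustration only; the actual components and format are the ones listed in the RFC's Cache Key section.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// tokenRequest lists hypothetical inputs that could identify a cached token.
// The field names are assumptions for illustration, not the RFC's format.
type tokenRequest struct {
	Provider           string // e.g. "gcp", "aws" or "azure"
	ServiceAccountNS   string // namespace of the Kubernetes ServiceAccount
	ServiceAccountName string // name of the Kubernetes ServiceAccount
	Audience           string // audience the token is requested for
}

// cacheKey hashes a length-prefixed encoding of every field, so a value
// chosen by one tenant cannot spill over into a neighboring field.
func cacheKey(r tokenRequest) string {
	var b strings.Builder
	for _, part := range []string{r.Provider, r.ServiceAccountNS, r.ServiceAccountName, r.Audience} {
		fmt.Fprintf(&b, "%d:%s;", len(part), part)
	}
	sum := sha256.Sum256([]byte(b.String()))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := cacheKey(tokenRequest{Provider: "gcp", ServiceAccountNS: "tenant-a", ServiceAccountName: "reconciler", Audience: "sts.example"})
	b := cacheKey(tokenRequest{Provider: "gcp", ServiceAccountNS: "tenant-b", ServiceAccountName: "reconciler", Audience: "sts.example"})
	fmt.Println("keys differ across tenants:", a != b)
}
```

Because the ServiceAccount namespace is part of the key, a tenant that can only create objects in its own namespace cannot produce a request that maps to a neighbor's cache entry.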

> What security controls are in place (or can be implemented by the user) to ensure that even if tenant A knows the Cloud Provider annotations for tenant B, it won't be able to impersonate them.
> This can be as simple as they enforcing rules via Admission Controllers (e.g. Kubewarden), as per some examples of Flux multi-tenancy. Or something more sophisticated as part of the new feature.

Great point. There's a paragraph in the Technical Background section pointing out how this works. This is inherently built into the workload identity features of each cloud provider: you must create a strong link between the Kubernetes ServiceAccount and the cloud provider identity by granting the former permission to impersonate the latter. See the original paragraph I wrote:

> Another aspect of workload identity that is important for this RFC is how the cloud identities are associated with the Kubernetes ServiceAccounts. In most cases, an identity from the IAM service of the cloud provider (e.g. a GCP IAM Service Account, or an AWS IAM Role) is associated with a Kubernetes ServiceAccount by the process of impersonation. Permission to impersonate the cloud identity is granted to the ServiceAccount.

Here's the updated paragraph with a bit more detail at the end:

Another aspect of workload identity that is important for this RFC is how the cloud
identities are associated with the Kubernetes ServiceAccounts. In most cases, an
identity from the IAM service of the cloud provider (e.g. a GCP IAM Service Account,
or an AWS IAM Role) is associated with a Kubernetes ServiceAccount by the process
of impersonation. Permission to impersonate the cloud identity is granted to the
ServiceAccount through a configuration that points to the fully qualified name of
the Kubernetes ServiceAccount, i.e. the name and namespace of the ServiceAccount
and which cluster it belongs to in the name/address system of the cloud provider.

So essentially the identities are not secret: knowing which cloud identity a tenant uses gives a malicious neighbor tenant no advantage whatsoever.
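As a rough illustration of what "fully qualified name" means in practice, the sketch below builds the subject that Kubernetes places in the `sub` claim of a ServiceAccount token, plus the IAM member string that, to my knowledge, GKE Workload Identity uses in impersonation bindings. The helper names, the project ID and the namespace are made up for the example.

```go
package main

import "fmt"

// kubernetesSubject returns the subject Kubernetes sets in the "sub" claim
// of a ServiceAccount token: system:serviceaccount:<namespace>:<name>.
func kubernetesSubject(namespace, name string) string {
	return fmt.Sprintf("system:serviceaccount:%s:%s", namespace, name)
}

// gkeMember returns the IAM member string used when binding a Kubernetes
// ServiceAccount to a GCP IAM Service Account with GKE Workload Identity.
// The project ID ties the binding to the workload identity pool of a
// specific project.
func gkeMember(projectID, namespace, name string) string {
	return fmt.Sprintf("serviceAccount:%s.svc.id.goog[%s/%s]", projectID, namespace, name)
}

func main() {
	// "my-project", "tenant-a" and "reconciler" are placeholder values.
	fmt.Println(kubernetesSubject("tenant-a", "reconciler"))
	fmt.Println(gkeMember("my-project", "tenant-a", "reconciler"))
}
```

Both strings embed the namespace and name of the ServiceAccount, which is why a ServiceAccount in another namespace cannot match a binding it was never granted.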

> The limitations of this feature for multi-tenancy in hostile environments where the tenants are not trustworthy. Suggestions on how to overcome them is optional, as that can be a larger topic - e.g. stronger isolation so that tenants can't impersonate each other's identities even if they know them and can bypass controls that may be operating at a degraded state (i.e. admission controllers).

As discussed right above, no admission controllers are required: the impersonation permission is implemented and enforced by the cloud provider. You grant ServiceAccount A permission to impersonate cloud identity X at the cloud provider level. ServiceAccount B will never be able to impersonate cloud identity X unless you give it the same permission. The model here is zero trust: ServiceAccount B has no impersonation permissions by default.
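The default-deny behavior can be modeled in a few lines. This is only a toy model of the check the cloud provider performs on its side, not code from Flux or from any provider SDK; the `bindings` type and the identity names are hypothetical.

```go
package main

import "fmt"

// bindings maps a cloud identity to the set of ServiceAccount subjects that
// were explicitly granted permission to impersonate it.
type bindings map[string]map[string]bool

// grant records that subject may impersonate cloudIdentity.
func (b bindings) grant(cloudIdentity, subject string) {
	if b[cloudIdentity] == nil {
		b[cloudIdentity] = map[string]bool{}
	}
	b[cloudIdentity][subject] = true
}

// canImpersonate reports whether subject was granted cloudIdentity.
// Anything that was never granted is denied.
func (b bindings) canImpersonate(subject, cloudIdentity string) bool {
	return b[cloudIdentity][subject]
}

func main() {
	b := bindings{}
	b.grant("identity-x", "system:serviceaccount:tenant-a:reconciler")

	fmt.Println(b.canImpersonate("system:serviceaccount:tenant-a:reconciler", "identity-x"))
	fmt.Println(b.canImpersonate("system:serviceaccount:tenant-b:reconciler", "identity-x"))
}
```

Knowing the name `identity-x` does not help the ServiceAccount in `tenant-b`, because the lookup is keyed on who is asking, not on what they know.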

Workload identity is pretty solid :)

@matheuscscp
Member Author

SOPS version 3.10.0 has been released with the GCP KMS oauth2.TokenSource authentication method, and it has already been bumped in kustomize-controller:

fluxcd/kustomize-controller#1410

@matheuscscp matheuscscp force-pushed the rfc-multi-tenant-workload-identity branch from af55097 to ac8a8de Compare April 7, 2025 09:11
@matheuscscp
Member Author

matheuscscp commented Apr 7, 2025

I'm now addressing the offline comments I got during KubeCon EU 2025.

From @stealthybox:

An alternative to using service account tokens would be using a token whose subject string encodes a direct reference to the respective Flux resource, so that a resource would be its own identity. This is more secure than having a configuration knob to define another resource (a service account in this case) as the object identity, as it prevents another object in the same namespace from abusing the same service account and its cloud permissions.

From @hiddeco:

We should move the interfaces in interfaces.go to multiple files with names matching the interface names themselves.

From @stefanprodan:

To avoid introducing kustomizations.spec.decryption.key.serviceAccountName for disambiguating single-tenant and multi-tenant workload identity, we should instead introduce a binary flag in the controllers to switch to and enforce multi-tenant workload identity, e.g. --require-service-account-for-provider-auth.
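A minimal sketch of how such a flag could gate reconciliation, assuming a boolean flag with the name suggested above and a hypothetical `validateProviderAuth` helper. This is not the controllers' actual implementation; it only illustrates the "switch to and enforce" idea.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// validateProviderAuth is a hypothetical check: when enforcement is on,
// an object that does not name a ServiceAccount is rejected instead of
// falling back to the controller's own identity.
func validateProviderAuth(requireServiceAccount bool, serviceAccountName string) error {
	if requireServiceAccount && serviceAccountName == "" {
		return errors.New("a ServiceAccount must be specified for provider authentication")
	}
	return nil
}

func main() {
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	require := fs.Bool("require-service-account-for-provider-auth", false,
		"reject objects that use provider auth without naming a ServiceAccount")

	// Arguments are passed explicitly here to simulate a controller started
	// with enforcement enabled.
	if err := fs.Parse([]string{"--require-service-account-for-provider-auth"}); err != nil {
		panic(err)
	}

	fmt.Println(validateProviderAuth(*require, ""))                    // rejected
	fmt.Println(validateProviderAuth(*require, "tenant-a-reconciler")) // accepted
}
```

With the flag left at its default, both calls would pass, which matches the single-tenant behavior where the controller's own identity is used.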

@matheuscscp
Copy link
Member Author

@stealthybox @hiddeco @stefanprodan Comments from KubeCon addressed, please feel free to do another pass/approve.

@matheuscscp matheuscscp force-pushed the rfc-multi-tenant-workload-identity branch from 6de8de0 to c76ea14 Compare April 7, 2025 17:34
Member

@pjbgf pjbgf left a comment

LGTM

@matheuscscp matheuscscp force-pushed the rfc-multi-tenant-workload-identity branch 4 times, most recently from 64e29f8 to 0f4cf9f Compare April 11, 2025 02:17
Member

@hiddeco hiddeco left a comment

🚀

@matheuscscp matheuscscp force-pushed the rfc-multi-tenant-workload-identity branch from 0f4cf9f to d0a69fe Compare April 12, 2025 13:41
@stefanprodan stefanprodan changed the title [RFC] Multi-Tenant Workload Identity [RFC-0010] Multi-Tenant Workload Identity Apr 14, 2025
Member

@stefanprodan stefanprodan left a comment

LGTM

Thanks @matheuscscp 🏅

Signed-off-by: Matheus Pimenta <matheuscscp@gmail.com>
@matheuscscp matheuscscp force-pushed the rfc-multi-tenant-workload-identity branch from 0ca7ef6 to a7e41df Compare April 14, 2025 10:34
@matheuscscp matheuscscp merged commit 9127181 into main Apr 14, 2025
6 checks passed
@matheuscscp matheuscscp deleted the rfc-multi-tenant-workload-identity branch April 14, 2025 10:44
Labels
area/rfc Feature request proposals in the RFC format
6 participants