Have kubelet pass pod annotations directly to CNI plugins #69882

leblancd · 2018-10-16T14:50:17Z

Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature

What happened:
Currently, if a CNI network plugin needs access to annotations that are configured on a pod, it needs to do a GET/read of the pod manifest from the Kubernetes API server, based on the pod's name.

What you expected to happen:
With the introduction of CNI release 0.4.0 (which supports CNI API version 0.2.0), it should be possible to make changes to kubelet such that pod annotations (and possibly labels) can be dynamically inserted into the CNI network configuration that is passed to CNI plugin binaries via stdin. In many cases, this would eliminate the need for CNI plugins to perform Kubernetes API reads of pod manifests.

This can be achieved by using the new args field in the CNI network configuration. This field is intended to allow runtimes (e.g. kubelet) to dynamically insert per-container (per-pod, in the case of Kubernetes) configuration when a container/pod is being created.

The CNI conventions for the args field currently suggest the use of a "labels" field to convey mappings to the plugin:

   "args":{  
      "cni":{  
         "labels": [{"key": "app", "value": "myapp"}]
      }
   },

It's not clear whether this "labels" field should be used ("overloaded") for Kubernetes pod annotations... maybe the CNI maintainers would prefer that we create a new "annotations" field in parallel with "labels".

The use of this feature would also require the use of CNI version 0.4.0 binaries, so we should consider bumping up the Kubernetes reference version for CNI from 0.3.0.

How to reproduce it (as minimally and precisely as possible):
Refer to code here. This is approximately where changes would be needed.

Anything else we need to know?:
Use of this feature will require CNI versions 0.4.0 or newer. The Kubernetes reference version for CNI should be bumped from 0.3.0 to 0.4.0 at some point.

Environment:

Kubernetes version (use kubectl version):
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others:

The text was updated successfully, but these errors were encountered:

leblancd · 2018-10-16T14:51:22Z

/sig network

fejta-bot · 2019-01-15T14:41:15Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

danwinship · 2019-01-25T12:58:18Z

/remove-lifecycle stale
/lifecycle frozen

Multus would make use of this too (cc @dcbw), to figure out if it needs to create additional networks for a pod.

And we could just pass the entire pod JSON. I think plenty of network plugins could make use of that. (HostPort pods are one thing in particular that the plugin might need to deal with that aren't labels or annotations.)

kmala · 2019-08-08T22:04:25Z

Do we have a priority for this? and i can work on this if needed

danwinship · 2019-08-12T11:54:49Z

I mean, we're surviving just fine without this feature, so I guess
/priority backlog
but it would be nice to have

kmala · 2019-08-12T17:59:40Z

@danwinship Thanks for confirming my assumption
This is kind of important for us as there is a usecase to write custom CNI plugins similar to existing meta plugins and it requires more information from POD than just the name and namespace.
Is it okay if i can implement this feature and create PR or do you think i should start discussion on the feature before implementation?

danwinship · 2019-08-13T12:25:54Z

Well, the discussion above is incomplete; no one ever decided exactly how the data would be passed, and it's not clear to me if you're talking about just passing the annotations or if you mean the entire pod json. So, if you want to describe what you're planning to implement before writing too much code, that might be good.

ofiliz · 2019-08-13T22:39:32Z

I am also interested in helping with defining the requirements and the specific mechanisms. I thought passing the entire pod json is probably too much, but maybe we can pass pod labels matching a well-known pattern ("cni*"?) as args to libcni?

Additionally, kubelet could call different CNI plugins (not just the same plugin with a different configuration) based on a well-known pod label. This would be useful for launching pods with different networking requirements on the same node.

In any case, I'm interested in discussing and helping with the implementation.

kmala · 2019-08-13T23:40:48Z

Thanks @danwinship and @ofiliz
This is what i have in mind:

we should have naming convention such that we can pass only those annotations to the cni plugin. I am thinking containernetwork.k8s.io or cni.k8s.io as the prefix for the annotations

We can add those annotations as key value pairs to the args field in addition to the existing pod name and namespace while creating the runtime conf, both cni and kubenet already have annotations available to them

kubernetes/pkg/kubelet/dockershim/network/cni/cni.go

Line 365 in d2eecfe

    
           func (plugin *cniNetworkPlugin) buildCNIRuntimeConf(podName string, podNs string, podSandboxID kubecontainer.ContainerID, podNetnsPath string, annotations, options map[string]string) (*libcni.RuntimeConf, error) {

kubernetes/pkg/kubelet/dockershim/network/kubenet/kubenet_linux.go

Line 342 in d2eecfe

    
           func (plugin *kubenetNetworkPlugin) setup(namespace string, name string, id kubecontainer.ContainerID, annotations map[string]string) error {

danwinship · 2019-08-15T14:17:31Z

1. we should have naming convention such that we can pass only those annotations to the cni plugin.

No, there are already other annotations that don't match that naming convention that plugins want access to. Eg, the Network Attachment annotation or the pod bandwidth annotations. You should pass all of the annotations.

kmala · 2019-08-15T18:49:20Z

Thanks for giving more context. I will make a PR passing all the annotations.

mars1024 · 2019-08-19T09:43:58Z

@danwinship @kmala I don't think we need to pass annotations to CNI plugin, because it's CRI's job to translate pod spec or annotations to CNI RuntimeConfig.
For example, in different scenes, bandwidth may have different annotation keys, like bandwidth.k8s.io or bandwidth.cncf.io, actually bandwidth CNI plugin can't and shouldn't recognize both of them, so we should define some new CNI RuntimeConfigs and translate annotations to it, then pass them to CNI plugin, this will keep CNI plugin more general.

cc @dcbw @squeed

khenidak · 2019-08-19T16:54:48Z

I am worried about the amount of information we pass into CNI (and how updated they are at the time of Setup call). Sure we can work around that by passing the entire pod spec. But do we ensure that pod status updates are fed into CNI as well?

I think we should keep the setup and teardown interfaces as simple as possible, maybe pass in a service account token (with prober default RBAC roles for it). If the CNI functionality depends on pod spec (or other parts of the data that may be linked to that particular pod) that were not originally transmitted in setup then it is up to the CNI runtime to read the api-server.

The above represents a challenge to CNI developers because CNI is currently stateless, so data will have to be re-read as needed.

danwinship · 2019-08-19T17:23:14Z

@mars1024 This is about CNI plugins that are specifically interested in Kubernetes-specific things. There are already other bits of kubernetes-specific information that get passed to plugins (eg, the namespace of the pod). This would just be one more. CNI plugins that want to be container-platform-agnostic would just ignore the kubernetes-specific information

@khenidak No, we would not pass updates to the CNI plugin. Plugins that want to track the state of a Pod after creation time would have to continue doing that on their own, just like they do now. The goal here is just to simplify things for CNI plugins that need the pod annotations (or more) at setup time. eg, Multus (or any other "CNI Delegating Plugin" as defined in the Kubernetes Network Custom Resource Definition De-facto Standard) needs to look at the pod annotations to see if it is supposed to only delegate to the primary network plugin, or if it needs to also call additional CNI plugins to create additional network interfaces. Currently it does this by making an apiserver request whenever it receives a CNI_ADD request, but if kubelet passed the pod annotations to it as part of the request, then it wouldn't have to do that.

khenidak · 2019-08-19T17:32:01Z

@danwinship i understand your point, thank you.

The point i was trying to make. today we will add annotation, because they are needed. Later some CNI will be working with labels. Even later some other CNI will will need to work with owner-ref etc.. So what i was trying to say is to find a process where the entire system is accessible in a regulated fashion to CNI.

dcbw · 2019-08-21T21:59:08Z

@khenidak @kmala @danwinship we've periodically discussed the issue in SIG Network in the past. And I think the reasons not to pass annotations still hold. Those are roughly that:

there can be lots of annotations, most of which are generic and the CNI plugin likely doesn't care about. Do we really want/need to push all of them to the plugin?
why are annotations special and things like the whole PodSpec or labels or other information not special? eg why stop at annotations?
plugins that need these annotations almost always already have a KubeClient anyway, so they can go retrieve the whole PodSpec if they want it. And they probably also need CRDs and stuff like that.

Basically we drew a line between plugins that are pretty dumb (eg flannel, bridge, kubenet, etc) and plugins that are more capable and already use KubeClients. And decided that it wasn't worth the downsides (complexity, size, processing, etc) to put annotations and/or other stuff into the CNI plugin. I don't think I've seen compelling arguments to revisit that decision :(

kmala · 2019-08-21T23:17:18Z

one of my first proposal is pass only specific annotations which have a domain prefix like containernetwork.k8s.io or cni.k8s.io. So, i am okay to make changes to push only few of them.
The reason for only annotations is because it is the preferred way of adding metadata to the pod. Though labels can also be used to add metadata, its mainly used for identifying the objects and selection. I am okay to even pass the whole podspec if we think it is useful for the plugins.
The plugin's can call apiserver to get the entire podspec but this would mean there would be an api call to apiserver for every add, delete event of pod and also much more if the CNI plugin needs information during handling the check method. We are already doing this and this is increasing a lot of load on the apiserver and decreasing the performance.
I understand that the existing plugins doesn't need much metadata except the pod name and namespace name, but for people who are writing their own custom plugins for their internal networking and firewall restrictions, the handling would be different for different workloads(apps) and information is stored in metadata. The current way of calling kubernetes for each pod wouldn't scale in large clusters and if the CNI already supports passing of metadata using extra args then that should be the preferred way is what i think.

ofiliz · 2019-08-21T23:25:43Z

@dcbw I think while your point 3 above, "plugins that need these annotations already have a KubeClient anyway" is certainly true, this is only because we provide no other way for such plugins to achieve their goals. In complex Kubernetes networking setups there is now an explosion of daemons running on each node for this very reason. Common examples are daemons to configure sidecar proxies and network policy agents. These daemons consume node resources and increase deployment complexity. Every watch and get on API server also causes node scalability problems.

Another related problem we are solving here is not just what config to pass to CNI plugins, but also which CNI plugins to call. Existing design forces all plugins to be called for all pods. The common solution to that problem is to use a delegating CNI plugin, but because those plugins have no in-context information to decide which plugin(s) to call, they also have to fetch pod specs, inflating the problem.

danwinship · 2019-08-22T12:05:45Z

"plugins that need these annotations already have a KubeClient anyway" is certainly true, this is only because we provide no other way for such plugins to achieve their goals.

But most of them are still going to need to need the KubeClient even if we provide them the annotations. eg:

The common solution to that problem is to use a delegating CNI plugin, but because those plugins have no in-context information to decide which plugin(s) to call, they also have to fetch pod specs, inflating the problem.

But if the pod has the network attachment annotation, then that will point to a CRD which the delegating plugin will also need to look at. So it's still going to need a KubeClient anyway.

kmala · 2019-08-22T21:28:19Z

I understand that some plugins will/might still need to call KubeClient which we may not be able to solve but still there are multiple use-cases like custom firewall and network CNI plugins which would require the metadata of the pod and this feature would help them a lot.

khenidak · 2019-08-26T18:16:02Z

I'd avoid tying in pod annotations via prefixes like cni.XX or containernetwork.XX or something else. its implicit dependancies that the CNI can actually read these values and there are no errors thrown if it can't. Also it negatively impacts portability because the assumption - which may or may not be valid - is all CNIs are the same. Just a thought.

kmala · 2019-08-26T21:42:10Z

The prefixes is just an example and am okay to change it depending on what everyone agrees but the important point i want to talk about is that CNI spec already supports passing of metadata by the runtimes and now instead of asking the plugins to get the information by talking to apiserver all the time it only makes sense to pass generic metadata to the CNI plugins. Since we anyway pass the metadata like kubernetes namespace name and pod name, i think we can pass more metadata for the CNI plugins to use.

squeed · 2019-08-27T13:56:54Z

For dynamic selection of network (e.g. Multus-style), I don't see any way how we can avoid a Kube-client in the near future. Mostly because of status reporting. Some day, we should probably standardize on a way to get multiple networks in the CRI, but I don't think that's going to happen now.

However, I'm more sympathetic to the allowing per-pod tunables. There is some precedent for this with the special bandwidth annotation, which is directly translated in to a CNI capability argument. However, not all possible parameters should be capability args.

When the CNI was originally conceived, it was definitely expected that end-users would be able to pass runtime arguments directly to the CNI plugins. It was intended to look something like rkt run --net=bridge:IP=1.2.3.4:FOO=BAR. I think there is a precedent for exposing this mechanism.

So I would be in favor of adding a cni.k8s.io/ annotation space. I think this is better than every single CNI binary doing it differently.

ofiliz · 2019-08-27T18:16:09Z

@danwinship "But most of them are still going to need to need the KubeClient" I don't think we have any data to support that claim. I can speak for EKS and the scenarios we are considering are all solvable without a kubeclient.

I agree with the annotations (not the whole podspec) as CNI args approach, or the cni.k8s.io prefixed annotations. With a tactical 20-line change here we can fix the majority of common use cases. The remaining cases that require kubeclient can continue to do so.

danwinship · 2019-12-13T15:29:03Z

OK, as discussed here and in the corresponding PR, there are issues with this idea/implementation:

some people think it's wrong in general
it's definitely wrong to do it only for dockershim and not for CRI
confusion in the PR about exactly what the best way of passing the args would be

If we are going to do this, it needs a KEP so we can hash all of that out (and get input from CRI people as well).
I'm going to close the PR but leave the issue open for now.

reith · 2020-01-05T11:47:24Z

I've been watching this issue since I liked our CNI to not watch for Pods but we already had kubeclient to manage CRDs representing our container links, so I was in agreement with @ofiliz . Now I'm changing our CNI for a nested environment, e.g. a Pod of a kubernetes in VMs which have their interfaces managed by another kubernetes. This inner CNI should talk with two API servers, inner kubernetes for Pods and upper kubernetes for Links of VMs. If I had Pod annotations from kubelet, I could make CNI to talk with just upper kubernetes. It's still a narrow situation and relates to our design decisions, so probably not a requirement. I just wanted to share a case that kubeclient is not really needed and some Pod information can suffice. Still how much information is needed is a subject of debate.

tallclair · 2020-12-18T19:24:50Z

Related issue: #84248 (node local apicache)

This seems like a strong use case for a node-local API. Basically, address the problems with CNI plugins needing to watch the apiserver, without needing to load everything into the CNI API.

k8s-ci-robot added kind/feature needs-sig labels Oct 16, 2018

k8s-ci-robot added sig/network and removed needs-sig labels Oct 16, 2018

leblancd changed the title ~~Allow kubelet to pass pod annotations directly to CNI plugins~~ Have kubelet pass pod annotations directly to CNI plugins Oct 16, 2018

leblancd mentioned this issue Oct 28, 2018

Add IPv4/IPv6 dual stack KEP kubernetes/community#2254

Closed

k8s-ci-robot added the lifecycle/stale label Jan 15, 2019

k8s-ci-robot added lifecycle/frozen and removed lifecycle/stale labels Jan 25, 2019

thockin added the triage/unresolved label Mar 8, 2019

freehan removed the triage/unresolved label May 2, 2019

k8s-ci-robot added the priority/backlog label Aug 12, 2019

kmala mentioned this issue Aug 19, 2019

Pass pod annotations to the cni plugins #81583

Closed

kiranmeduri mentioned this issue Oct 1, 2019

[request]: Security feature request to be able to run"proxy/init" container outside the "App mesh Injector" aws/aws-app-mesh-roadmap#88

Open

kmala mentioned this issue Feb 25, 2020

Kubelet deletes the pod before cleaning the CNI resources #88543

Open

chiragtayal mentioned this issue Apr 19, 2020

Pass pod annotations as extra args to CNI plugins containerd/cri#1452

Open

SaranBalaji90 mentioned this issue Feb 8, 2021

Existing pod network not cleanedup when using security groups with stateful sets aws/amazon-vpc-cni-k8s#1374

Closed

smarterclayton mentioned this issue Apr 29, 2021

Node-local API cache #84248

Open

kubernetes / kubernetes

Have kubelet pass pod annotations directly to CNI plugins #69882

Have kubelet pass pod annotations directly to CNI plugins #69882

leblancd commented Oct 16, 2018 •

edited

leblancd commented Oct 16, 2018

fejta-bot commented Jan 15, 2019

danwinship commented Jan 25, 2019

kmala commented Aug 8, 2019

danwinship commented Aug 12, 2019

kmala commented Aug 12, 2019

danwinship commented Aug 13, 2019

ofiliz commented Aug 13, 2019

kmala commented Aug 13, 2019

danwinship commented Aug 15, 2019

kmala commented Aug 15, 2019

mars1024 commented Aug 19, 2019

khenidak commented Aug 19, 2019 •

edited

danwinship commented Aug 19, 2019

khenidak commented Aug 19, 2019 •

edited

dcbw commented Aug 21, 2019

kmala commented Aug 21, 2019

ofiliz commented Aug 21, 2019

danwinship commented Aug 22, 2019

kmala commented Aug 22, 2019

khenidak commented Aug 26, 2019

kmala commented Aug 26, 2019

squeed commented Aug 27, 2019

ofiliz commented Aug 27, 2019

danwinship commented Dec 13, 2019

reith commented Jan 5, 2020 •

edited

tallclair commented Dec 18, 2020

kubernetes / kubernetes

Have kubelet pass pod annotations directly to CNI plugins #69882

Have kubelet pass pod annotations directly to CNI plugins #69882

Comments

leblancd commented Oct 16, 2018 • edited

leblancd commented Oct 16, 2018

fejta-bot commented Jan 15, 2019

danwinship commented Jan 25, 2019

kmala commented Aug 8, 2019

danwinship commented Aug 12, 2019

kmala commented Aug 12, 2019

danwinship commented Aug 13, 2019

ofiliz commented Aug 13, 2019

kmala commented Aug 13, 2019

danwinship commented Aug 15, 2019

kmala commented Aug 15, 2019

mars1024 commented Aug 19, 2019

khenidak commented Aug 19, 2019 • edited

danwinship commented Aug 19, 2019

khenidak commented Aug 19, 2019 • edited

dcbw commented Aug 21, 2019

kmala commented Aug 21, 2019

ofiliz commented Aug 21, 2019

danwinship commented Aug 22, 2019

kmala commented Aug 22, 2019

khenidak commented Aug 26, 2019

kmala commented Aug 26, 2019

squeed commented Aug 27, 2019

ofiliz commented Aug 27, 2019

danwinship commented Dec 13, 2019

reith commented Jan 5, 2020 • edited

tallclair commented Dec 18, 2020

leblancd commented Oct 16, 2018 •

edited

khenidak commented Aug 19, 2019 •

edited

khenidak commented Aug 19, 2019 •

edited

reith commented Jan 5, 2020 •

edited