-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Enabling ILB/ELB on windows using per-node, per-network LB endpoint. #34674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
libnetwork PR: moby/libnetwork#1925 |
@pradipd can you please fix the tests ? |
@@ -360,6 +387,34 @@ func (daemon *Daemon) createNetwork(create types.NetworkCreateRequest, id string | |||
} | |||
daemon.LogNetworkEvent(n, "create") | |||
|
|||
if agent && !n.Info().Ingress() && n.Type() == "overlay" { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this code (and the code in deleteNetwork) to a new function. A separate function will make it easier to unit test, and help the reader understand the different steps performed as part of createNetwork
and deleteNetwork
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
daemon/network.go
Outdated
sb, err := c.NewSandbox(sandboxName) | ||
if err != nil { | ||
logrus.Errorf("Failed creating %s sandbox: %v", sandboxName, err) | ||
return nil, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the error needs to be logged. The conventional way of handling errors is to return them with errors.Wrapf(err, "failed creating %s sandbox", sandboxName)
.
The errors should end up in the logs.
Thanks for the feedback. I'm addressing it now. I had to fix some changes in libnetwork PR (moby/libnetwork#1925) and swarmkit PR (moby/swarmkit#2363) |
daemon/network.go
Outdated
endpointName := prefix + "-endpoint" | ||
ep, err := n.CreateEndpoint(endpointName, libnetwork.CreateOptionIpam(ip, nil, nil, nil), libnetwork.CreateOptionLoadBalancer()) | ||
if err != nil { | ||
logrus.Errorf("Failed creating %s in sandbox %s: %v", endpointName, sandboxName, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I left the logrus error messages here, because they were here before for when we were creating the ingress network.
@vieux In order to get the tests passing I need changes from moby/libnetwork#1925 and moby/swarmkit#2363 |
Please sign your commits following these rules: $ git clone -b "windows_routingmesh" git@github.com:pradipd/moby.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354127368
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f Amending updates the existing PR. You DO NOT need to open a new one. |
@vieux : Can you help me understand the integration tests? So, I moved the test to docker_api_swarm_test.go, but the test is exactly the same. That fixed the Janky error, but, I don't think that was the proper way to fix the test. I also tried to add the test to integration/service, but, I can't figure out how to run the tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How to I run the tests under integration/service?
The same way you would the cli suite, make test-integration
d := s.AddDaemon(c, true, true) | ||
|
||
// Create a new overlay network | ||
out, err := d.Cmd("network", "create", "-d", "overlay", "new-overlay") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API suite should not use d.Cmd()
please use testEnv.APIClient()
to get a client, and make calls to the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the clarification. That is what I thought, but, I saw other tests in the .Cmd() and hence the confusion. I will clean this up.
9f7bd27
to
a70fad7
Compare
50f273d
to
6fda10b
Compare
ping @tiborvass @fcrisciani @vdemeester can you take a look ? |
6fda10b
to
704610a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some nits and a question
return err | ||
} | ||
|
||
err = e.backend.AddLBAttachments(lbAttachments) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this can be just
return e.backend.AddLBAttachments(lbAttachments)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in latest PR.
daemon/network.go
Outdated
if err := sb.EnableService(); err != nil { | ||
logrus.Errorf("Failed enabling service for ingress sandbox") | ||
} | ||
daemon.createLoadBalancerSandbox("ingress", create.ID, ip, n, libnetwork.OptionIngress()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the returned error not handled here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed this in the latest PR
daemon/network.go
Outdated
@@ -235,6 +214,32 @@ func (daemon *Daemon) releaseIngress(id string) { | |||
} | |||
} | |||
|
|||
//AddLBAttachments sets the mapping of LB IP address per network | |||
func (daemon *Daemon) AddLBAttachments(lbAttachments map[string]string) error { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this empty line here (and other functions you added) to make the coding style match the existing code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
daemon/network.go
Outdated
func (daemon *Daemon) AddLBAttachments(lbAttachments map[string]string) error { | ||
|
||
for nid, nodeIP := range lbAttachments { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, could you remove this empty line?
@@ -136,23 +136,34 @@ func (e *executor) Describe(ctx context.Context) (*api.NodeDescription, error) { | |||
} | |||
|
|||
func (e *executor) Configure(ctx context.Context, node *api.Node) error { | |||
na := node.Attachment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not too familiar with this, but is node.Attachment
no longer needed? (is this now included in node.LbAttachments
?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct. node.Attachment has been deprecated https://github.com/docker/swarmkit/blob/master/api/objects.proto#L64
e.backend.ReleaseIngress() | ||
e.backend.ClearLBAttachments() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this function clearing the attachments of all the networks or only the Ingress?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is clearing all the attachments for all networks --- which is wrong.
I fixed this in the latest PR.
The lb attachments are always "reset". I.e. they are cleared and then updated with the latest node.lbattachments passed in from the swarm manager.
return err | ||
} | ||
|
||
err = e.backend.AddLBAttachments(lbAttachments) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The feeling here is that if there is not ingress network there is no lb configured, is it correct or there is a different code path followed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it is not ingress network, then we save the lb attachment (actually, only the per network IP address).
We only use the lb IP address when we create a service.
daemon/network.go
Outdated
@@ -496,6 +543,28 @@ func (daemon *Daemon) DeleteNetwork(networkID string) error { | |||
return daemon.deleteNetwork(networkID, false) | |||
} | |||
|
|||
func (daemon *Daemon) deleteLoadBalancerSandbox(n libnetwork.Network) { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a disableService here, wondering why there is an asymmetry between the EnableService in the Create and no need for the disableService here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. I think I missed this in the last PR. I'll update it. sorry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address in the latest PR.
@pradipd vendor check is failing:
|
Oops. I shouldn't have done s/sanbox/sandbox/ in the vendored packages. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for addressing my nits; don't forget to squash commits to two commits; 1 for "vendoring" and 1 for the local changes. (We don't have to preserve the "fix vendor" and "address review comments" when merging)
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>
e5e3089
to
f314530
Compare
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>
f314530
to
4c1b079
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making those changes to the test case.
I think the test still needs a bit of cleanup. Would you mind opening a PR to address these points?
poll.WaitOn(t, serviceRunningTasksCount(client, serviceID, instances)) | ||
|
||
_, _, err = client.ServiceInspectWithRaw(context.Background(), serviceID, types.ServiceInspectOptions{}) | ||
require.NoError(t, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These 2 lines seem to be unnecessary. We already know the service can be inspected from the poll.WaitOn()
. I think these 2 lines need to be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove in new PR.
require.NoError(t, err) | ||
assert.Contains(t, network.Containers, overlayName+"-sbox") | ||
|
||
err = client.ServiceRemove(context.Background(), serviceID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything from this line to the end of the test case seems to be test teardown, which should already be handled by defer setupTest(t)()
. Why is all this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to validate that the network is removed cleanly.
I see setupTest() is tearing everything down, but, it is forcefully shutting things down. I want to make sure the typical user pattern works correctly.
} | ||
} | ||
|
||
func serviceRunningTasksCount(client client.ServiceAPIClient, serviceID string, instances uint64) func(log poll.LogT) poll.Result { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a duplicate of serviceContainerCount()
in inspect_test.go. I'm fine if you want to rename it, but I don't think we need both. We can just add the state check, which seems to be the only difference.
Will open up a new PR to address these. |
#34898 is the PR. |
@@ -61,4 +62,5 @@ type Backend interface { | |||
LookupImage(name string) (*types.ImageInspect, error) | |||
PluginManager() *plugin.Manager | |||
PluginGetter() *plugin.Store | |||
GetLBAttachmentStore() *networkSettings.LBAttachmentStore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetAttachmentStore
@@ -121,6 +122,8 @@ type Daemon struct { | |||
pruneRunning int32 | |||
hosts map[string]bool // hosts stores the addresses the daemon is listening on | |||
startupDone chan struct{} | |||
|
|||
lbAttachmentStore network.LBAttachmentStore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
attachmentStore
@@ -31,3 +34,36 @@ type EndpointSettings struct { | |||
*networktypes.EndpointSettings | |||
IPAMOperational bool | |||
} | |||
|
|||
// LBAttachmentStore stores the load balancer IP address for a network id. | |||
type LBAttachmentStore struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AttachmentStore
.
@pradipd I fixed the naming in swarmkit to remove LB. Could you please do so in moby, as well? |
I have made the changes. How do I sync with your changes in swarmkit? |
@pradipd Wait till we merge the swarmkit changes and submit a PR with the updates. |
Note: This is 1 of 3 PRs (the other 2 are in libnetwork and swarmkit repos).
Signed-off-by: Pradip Dhara pradipd@microsoft.com
- What I did
Enabled ILB/ELB (routing mesh) functionality for windows.
- How I did it
This change enables ILB/ELB functionality on windows using a per-node, per-network LB endpoint. The per-node, per-network LB endpoint was discussed with docker
as a solution to #30820. Note that only windows is using the LB endpoint. We have not changed the linux networking code to take advantage of the LB endpoint.
- How to verify it
make test on linux (note: I'm still seeing 2 tests occasionally fail. But, I'm fairly certain they are unrelated to my change. I'm sending this out now and will continue to debug).
We have added a bunch of powershell scripts to test out functionality. However, they are not public. We will work with docker folks in figuring out how to make them public. Ideally, we should get the docker_cli_swarm_tests running on windows, but, that is much bigger task and requires changes in windows.
- Description for the changelog
Enabling ILB/ELB on windows using per-node, per-network LB endpoint.
- A picture of a cute animal (not mandatory but encouraged)
