Skip to content

Enabling ILB/ELB on windows using per-node, per-network LB endpoint. #34674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 18, 2017

Conversation

pradipd
Copy link
Contributor

@pradipd pradipd commented Aug 29, 2017

Note: This is 1 of 3 PRs (the other 2 are in libnetwork and swarmkit repos).

Signed-off-by: Pradip Dhara pradipd@microsoft.com

- What I did
Enabled ILB/ELB (routing mesh) functionality for windows.

- How I did it
This change enables ILB/ELB functionality on windows using a per-node, per-network LB endpoint. The per-node, per-network LB endpoint was discussed with docker
as a solution to #30820. Note that only windows is using the LB endpoint. We have not changed the linux networking code to take advantage of the LB endpoint.

- How to verify it
make test on linux (note: I'm still seeing 2 tests occasionally fail. But, I'm fairly certain they are unrelated to my change. I'm sending this out now and will continue to debug).
We have added a bunch of powershell scripts to test out functionality. However, they are not public. We will work with docker folks in figuring out how to make them public. Ideally, we should get the docker_cli_swarm_tests running on windows, but, that is much bigger task and requires changes in windows.

- Description for the changelog
Enabling ILB/ELB on windows using per-node, per-network LB endpoint.

- A picture of a cute animal (not mandatory but encouraged)
tubby shaved

@pradipd
Copy link
Contributor Author

pradipd commented Aug 29, 2017

libnetwork PR: moby/libnetwork#1925
swarmkit PR: moby/swarmkit#2363

@pradipd
Copy link
Contributor Author

pradipd commented Aug 29, 2017

@vieux
Copy link
Contributor

vieux commented Aug 30, 2017

@pradipd can you please fix the tests ?

@@ -360,6 +387,34 @@ func (daemon *Daemon) createNetwork(create types.NetworkCreateRequest, id string
}
daemon.LogNetworkEvent(n, "create")

if agent && !n.Info().Ingress() && n.Type() == "overlay" {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move this code (and the code in deleteNetwork) to a new function. A separate function will make it easier to unit test, and help the reader understand the different steps performed as part of createNetwork and deleteNetwork.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

sb, err := c.NewSandbox(sandboxName)
if err != nil {
logrus.Errorf("Failed creating %s sandbox: %v", sandboxName, err)
return nil, err
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the error needs to be logged. The conventional way of handling errors is to return them with errors.Wrapf(err, "failed creating %s sandbox", sandboxName).

The errors should end up in the logs.

@pradipd
Copy link
Contributor Author

pradipd commented Aug 31, 2017

Thanks for the feedback. I'm addressing it now. I had to fix some changes in libnetwork PR (moby/libnetwork#1925) and swarmkit PR (moby/swarmkit#2363)

endpointName := prefix + "-endpoint"
ep, err := n.CreateEndpoint(endpointName, libnetwork.CreateOptionIpam(ip, nil, nil, nil), libnetwork.CreateOptionLoadBalancer())
if err != nil {
logrus.Errorf("Failed creating %s in sandbox %s: %v", endpointName, sandboxName, err)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: I left the logrus error messages here, because they were here before for when we were creating the ingress network.

@pradipd
Copy link
Contributor Author

pradipd commented Sep 1, 2017

@vieux In order to get the tests passing I need changes from moby/libnetwork#1925 and moby/swarmkit#2363
Do I have to wait for those 2 pull requests to merge?
Or do i vendor in those changes and point vendor.conf to my github repo?

@GordonTheTurtle
Copy link

Please sign your commits following these rules:
https://github.com/moby/moby/blob/master/CONTRIBUTING.md#sign-your-work
The easiest way to do this is to amend the last commit:

$ git clone -b "windows_routingmesh" git@github.com:pradipd/moby.git somewhere
$ cd somewhere
$ git rebase -i HEAD~842354127368
editor opens
change each 'pick' to 'edit'
save the file and quit
$ git commit --amend -s --no-edit
$ git rebase --continue # and repeat the amend for each commit
$ git push -f

Amending updates the existing PR. You DO NOT need to open a new one.

@GordonTheTurtle GordonTheTurtle added the dco/no Automatically set by a bot when one of the commits lacks proper signature label Sep 7, 2017
@pradipd
Copy link
Contributor Author

pradipd commented Sep 12, 2017

@vieux : Can you help me understand the integration tests?
Originally I added a test to docker_cli_swarm_test.go, but that gave a janky error.
integration-cli is deprecated. Please add an API integration test to
17:19:20 ./integration/COMPONENT/. See ./TESTING.md for more details

So, I moved the test to docker_api_swarm_test.go, but the test is exactly the same. That fixed the Janky error, but, I don't think that was the proper way to fix the test.

I also tried to add the test to integration/service, but, I can't figure out how to run the tests.
How to I run the tests under integration/service?

Copy link
Member

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How to I run the tests under integration/service?

The same way you would the cli suite, make test-integration

d := s.AddDaemon(c, true, true)

// Create a new overlay network
out, err := d.Cmd("network", "create", "-d", "overlay", "new-overlay")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The API suite should not use d.Cmd() please use testEnv.APIClient() to get a client, and make calls to the API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the clarification. That is what I thought, but, I saw other tests in the .Cmd() and hence the confusion. I will clean this up.

@GordonTheTurtle GordonTheTurtle added the dco/no Automatically set by a bot when one of the commits lacks proper signature label Sep 13, 2017
@pradipd pradipd force-pushed the windows_routingmesh branch from 9f7bd27 to a70fad7 Compare September 14, 2017 22:25
@GordonTheTurtle GordonTheTurtle removed the dco/no Automatically set by a bot when one of the commits lacks proper signature label Sep 14, 2017
@pradipd pradipd force-pushed the windows_routingmesh branch 2 times, most recently from 50f273d to 6fda10b Compare September 14, 2017 22:41
@vieux
Copy link
Contributor

vieux commented Sep 15, 2017

ping @tiborvass @fcrisciani @vdemeester can you take a look ?

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some nits and a question

return err
}

err = e.backend.AddLBAttachments(lbAttachments)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this can be just

return e.backend.AddLBAttachments(lbAttachments)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest PR.

if err := sb.EnableService(); err != nil {
logrus.Errorf("Failed enabling service for ingress sandbox")
}
daemon.createLoadBalancerSandbox("ingress", create.ID, ip, n, libnetwork.OptionIngress())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the returned error not handled here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I addressed this in the latest PR

@@ -235,6 +214,32 @@ func (daemon *Daemon) releaseIngress(id string) {
}
}

//AddLBAttachments sets the mapping of LB IP address per network
func (daemon *Daemon) AddLBAttachments(lbAttachments map[string]string) error {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remove this empty line here (and other functions you added) to make the coding style match the existing code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

func (daemon *Daemon) AddLBAttachments(lbAttachments map[string]string) error {

for nid, nodeIP := range lbAttachments {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, could you remove this empty line?

@@ -136,23 +136,34 @@ func (e *executor) Describe(ctx context.Context) (*api.NodeDescription, error) {
}

func (e *executor) Configure(ctx context.Context, node *api.Node) error {
na := node.Attachment
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too familiar with this, but is node.Attachment no longer needed? (is this now included in node.LbAttachments ?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. node.Attachment has been deprecated https://github.com/docker/swarmkit/blob/master/api/objects.proto#L64

e.backend.ReleaseIngress()
e.backend.ClearLBAttachments()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this function clearing the attachments of all the networks or only the Ingress?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is clearing all the attachments for all networks --- which is wrong.
I fixed this in the latest PR.
The lb attachments are always "reset". I.e. they are cleared and then updated with the latest node.lbattachments passed in from the swarm manager.

return err
}

err = e.backend.AddLBAttachments(lbAttachments)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The feeling here is that if there is not ingress network there is no lb configured, is it correct or there is a different code path followed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If it is not ingress network, then we save the lb attachment (actually, only the per network IP address).
We only use the lb IP address when we create a service.

@@ -496,6 +543,28 @@ func (daemon *Daemon) DeleteNetwork(networkID string) error {
return daemon.deleteNetwork(networkID, false)
}

func (daemon *Daemon) deleteLoadBalancerSandbox(n libnetwork.Network) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a disableService here, wondering why there is an asymmetry between the EnableService in the Create and no need for the disableService here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I think I missed this in the last PR. I'll update it. sorry.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Address in the latest PR.

@thaJeztah
Copy link
Member

@pradipd vendor check is failing:

18:12:05 The result of vndr differs
18:12:05 
18:12:05  M vendor/github.com/docker/libnetwork/drivers/overlay/peerdb.go
18:12:05 
18:12:05 Please vendor your package with github.com/LK4D4/vndr.

@pradipd
Copy link
Contributor Author

pradipd commented Sep 18, 2017

Oops. I shouldn't have done s/sanbox/sandbox/ in the vendored packages.
will fix.

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for addressing my nits; don't forget to squash commits to two commits; 1 for "vendoring" and 1 for the local changes. (We don't have to preserve the "fix vendor" and "address review comments" when merging)

Signed-off-by: Pradip Dhara <pradipd@microsoft.com>
@pradipd pradipd force-pushed the windows_routingmesh branch from e5e3089 to f314530 Compare September 18, 2017 20:29
Signed-off-by: Pradip Dhara <pradipd@microsoft.com>
@pradipd pradipd force-pushed the windows_routingmesh branch from f314530 to 4c1b079 Compare September 18, 2017 20:39
@vieux vieux merged commit a2ee40b into moby:master Sep 18, 2017
@pradipd pradipd deleted the windows_routingmesh branch September 19, 2017 04:54
Copy link
Member

@dnephin dnephin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for making those changes to the test case.

I think the test still needs a bit of cleanup. Would you mind opening a PR to address these points?

poll.WaitOn(t, serviceRunningTasksCount(client, serviceID, instances))

_, _, err = client.ServiceInspectWithRaw(context.Background(), serviceID, types.ServiceInspectOptions{})
require.NoError(t, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 lines seem to be unnecessary. We already know the service can be inspected from the poll.WaitOn(). I think these 2 lines need to be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove in new PR.

require.NoError(t, err)
assert.Contains(t, network.Containers, overlayName+"-sbox")

err = client.ServiceRemove(context.Background(), serviceID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything from this line to the end of the test case seems to be test teardown, which should already be handled by defer setupTest(t)(). Why is all this necessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to validate that the network is removed cleanly.
I see setupTest() is tearing everything down, but, it is forcefully shutting things down. I want to make sure the typical user pattern works correctly.

}
}

func serviceRunningTasksCount(client client.ServiceAPIClient, serviceID string, instances uint64) func(log poll.LogT) poll.Result {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a duplicate of serviceContainerCount() in inspect_test.go. I'm fine if you want to rename it, but I don't think we need both. We can just add the state check, which seems to be the only difference.

@pradipd
Copy link
Contributor Author

pradipd commented Sep 19, 2017

Will open up a new PR to address these.

@pradipd
Copy link
Contributor Author

pradipd commented Sep 19, 2017

#34898 is the PR.

@@ -61,4 +62,5 @@ type Backend interface {
LookupImage(name string) (*types.ImageInspect, error)
PluginManager() *plugin.Manager
PluginGetter() *plugin.Store
GetLBAttachmentStore() *networkSettings.LBAttachmentStore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetAttachmentStore

@@ -121,6 +122,8 @@ type Daemon struct {
pruneRunning int32
hosts map[string]bool // hosts stores the addresses the daemon is listening on
startupDone chan struct{}

lbAttachmentStore network.LBAttachmentStore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

attachmentStore

@@ -31,3 +34,36 @@ type EndpointSettings struct {
*networktypes.EndpointSettings
IPAMOperational bool
}

// LBAttachmentStore stores the load balancer IP address for a network id.
type LBAttachmentStore struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AttachmentStore.

@stevvooe
Copy link
Contributor

@pradipd I fixed the naming in swarmkit to remove LB. Could you please do so in moby, as well?

@pradipd
Copy link
Contributor Author

pradipd commented Sep 22, 2017

I have made the changes. How do I sync with your changes in swarmkit?
Do I vendor in your branch/commit in vendor.conf?
Or do I wait till you merge your changes into swarmkit, then vendor in the new master commit hash?

@stevvooe
Copy link
Contributor

@pradipd Wait till we merge the swarmkit changes and submit a PR with the updates.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants