
TCP window scaling and broken routers

Every TCP packet includes, in the header, a "window" field which specifies how much data the system which sent the packet is willing and able to receive from the other end. The window is the flow control mechanism used by TCP; it controls the maximum amount of data which can be "in flight" between two communicating systems and keeps one side from overwhelming the other with data.

In the early days of TCP, windows tended to be relatively small. The computers of that age did not have huge amounts of memory to dedicate toward buffering network data, and the available networking technology was not fast enough to make use of a larger window in any case. Modern network interfaces can handle larger packets and keep more of them in flight at any given time; they will perform better with a larger window. Some kinds of high-speed long-haul links can have very high bandwidth, but also high latency. Keeping that sort of pipe filled can require a very large window; if a sending system cannot have a large number of packets in transit at any given time, it will not be able to make use of the bandwidth available. For these reasons, good performance can often require very large windows.
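The relationship between window size, latency, and throughput comes down to simple arithmetic: a sender can have at most one window of data outstanding per round trip. The sketch below is illustrative only; the function names and the example link (100 Mbit/s with a 250 ms round-trip time) are assumptions chosen for the example, not measurements.

```python
def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput: at most one full window per round trip."""
    return window_bytes * 8 / rtt_seconds


def window_needed_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to fill the pipe."""
    return bandwidth_bps * rtt_seconds / 8


# Hypothetical link: 100 Mbit/s with a 250 ms round-trip time.
unscaled_limit = max_throughput_bps(65535, 0.250)
needed = window_needed_bytes(100e6, 0.250)

print(f"65535-byte window caps throughput at {unscaled_limit / 1e6:.2f} Mbit/s")
print(f"filling the link needs a window of about {needed / 1024:.0f} KB")
```

Under these assumed numbers, the largest unscaled window limits a connection to roughly 2 Mbit/s, however fast the underlying link is.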

The TCP window field, however, is only 16 bits wide, allowing for a maximum window size of 64KB. The TCP designers must have thought that nobody would ever need a larger window than that. But 64KB is not even close to what is needed in many situations today. The solution to this problem is called "window scaling." It is not new; window scaling was codified in RFC 1323 back in 1992. It is also not complicated: a system wanting to use window scaling sets a TCP option containing an eight-bit scale factor. All window values used by that system thereafter should be left-shifted by that scale factor; a window scale of zero, thus, implies no scaling at all, while a scale factor of five implies that window sizes should be shifted five bits, or multiplied by 32. With this scheme, a 128KB window could be expressed by setting the scale factor to five and putting 4096 in the window field.
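The shift arithmetic can be sketched in a few lines. This is a minimal illustration of the scheme described above, not code from any TCP stack; the helper names `effective_window` and `encode_window` are made up for the example.

```python
MAX_WINDOW_FIELD = 0xFFFF  # the header field is 16 bits wide


def effective_window(window_field: int, scale: int) -> int:
    """Real window in bytes: the 16-bit field left-shifted by the scale factor."""
    if not 0 <= window_field <= MAX_WINDOW_FIELD:
        raise ValueError("window field must fit in 16 bits")
    return window_field << scale


def encode_window(window_bytes: int, scale: int) -> int:
    """Field value to advertise for a given window; low-order bits are lost
    and the result is clamped to what 16 bits can hold."""
    return min(window_bytes >> scale, MAX_WINDOW_FIELD)


# The example from the text: scale factor 5, window field 4096.
print(effective_window(4096, 5))       # 128KB expressed in bytes
print(encode_window(128 * 1024, 5))    # back to the field value
```

Note that encoding discards the low-order bits, so a scaled window can only be expressed in multiples of 2 to the power of the scale factor.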

To keep from breaking TCP on systems which do not understand window scaling, the TCP option can only be provided in the initial SYN packet which initiates the connection, and scaling can only be used if the SYN+ACK packet sent in response also contains that option. The scale factor is thus set as part of the setup handshake, and cannot be changed thereafter.

The details are still being figured out, but it would appear that some routers on the net are rewriting the window scale TCP option on SYN packets as they pass through. In particular, they seem to be setting the scale factor to zero, but leaving the option in place. The receiving side sees the option, and responds with a window scale factor of its own. At this point, the initiating system believes that its scale factor has been accepted, and scales its windows accordingly. The other end, however, believes that the scale factor is zero. The result is a misunderstanding over the real size of the receive window, with the system behind the firewall believing it to be much smaller than it really is. If the expected scale factor (and thus the discrepancy) is large, the result is, at best, very slow communication. In many cases, the small window can cause no packets to be transmitted at all, breaking TCP between the two affected systems entirely.
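The size of the misunderstanding is easy to work out. The sketch below assumes a hypothetical initiating system with a real receive window of 128KB that sent a scale factor of seven, and a peer that, because of the rewritten option, interprets the window field with a scale factor of zero. The function names are illustrative only.

```python
def advertised_field(real_window_bytes: int, sender_scale: int) -> int:
    """What the sender puts in the 16-bit window field, given its own scale."""
    return min(real_window_bytes >> sender_scale, 0xFFFF)


def window_seen_by_peer(field: int, scale_peer_believes: int) -> int:
    """How the peer interprets that field, using the scale it saw in the SYN."""
    return field << scale_peer_believes


real_window = 128 * 1024                      # assumed real receive window
field = advertised_field(real_window, 7)      # sender scales by 7
perceived = window_seen_by_peer(field, 0)     # router rewrote the factor to 0

print(f"real window:      {real_window} bytes")
print(f"perceived window: {perceived} bytes")
print(f"discrepancy:      {real_window // perceived}x")
```

With these assumed values the peer believes it may send only about a kilobyte before waiting for an acknowledgment, which is smaller than a typical 1460-byte Ethernet segment and consistent with the stalled connections described above.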

In the 2.6.7 kernel, the default scale factor is zero; in Linus's BitKeeper tree and the 2.6.7-mm kernels, instead, it has been increased to seven. This change has brought the broken router behavior to light; suddenly people running current kernels are finding that they cannot talk to a number of systems out there. One of the higher-profile affected sites is packages.gentoo.org. Gentoo users are, unsurprisingly, not pleased.

As a way of making things work, Stephen Hemminger has proposed a patch which adds a calculation to select the smallest scale factor which covers the largest possible window size. The result on most systems is that the scale factor gets set to two. This factor will still be corrupted by broken routers, but the resulting window size (¼ of what it should be) is still large enough to allow communication to happen.
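The idea of choosing the smallest sufficient scale factor can be sketched as follows. This is not the code from the patch; it is a minimal illustration, and the 256000-byte maximum window used in the example is a made-up value. The cap of 14 is the largest shift count RFC 1323 allows.

```python
MAX_WINDOW_FIELD = 0xFFFF  # largest value the 16-bit window field can hold
MAX_SCALE = 14             # largest shift count permitted by RFC 1323


def smallest_scale(max_window_bytes: int) -> int:
    """Smallest scale factor whose 16-bit field can still express the
    largest window this system will ever advertise."""
    scale = 0
    while scale < MAX_SCALE and (max_window_bytes >> scale) > MAX_WINDOW_FIELD:
        scale += 1
    return scale


# Hypothetical maximum receive window of 256000 bytes.
print(smallest_scale(256000))
```

Keeping the factor small limits the damage when a router zeroes it: the peer's view of the window is off by a factor of 2 to the power of the scale, so a factor of two costs a 4x reduction where a factor of seven costs 128x.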

The patch makes networking with systems behind broken routers work again, but it has been rejected anyway. The networking maintainers (and David Miller in particular) believe that the patch simply papers over a problem, and that adding hacks to the Linux network stack to accommodate broken routers is a mistake. If, instead, the situation is left as it is, pressure on the router manufacturers should get the problem fixed relatively quickly. It has been a few years, now, that Linux has a strong enough presence in the networking world that it can get away with taking this sort of position.

In the meantime, anybody running a current kernel who is having trouble connecting to a needed site can work around the problem with a command like:

    echo 0 > /proc/sys/net/ipv4/tcp_default_win_scale 

or by adding a line like:

    net.ipv4.tcp_default_win_scale = 0

to /etc/sysctl.conf.




TCP window scaling and broken routers

Posted Jul 8, 2004 4:24 UTC (Thu) by Baylink (guest, #755) [Link] (1 responses)

> It has been a few years, now, that Linux has a strong enough presence in the networking
world that it can get away with taking this sort of position.

I wonder if the same thing is true of the hackers who maintain the large clusters of said
machines: can we 'down tools and tell em go to Hell' if they persist in misnaming criminals as
hackers? :-)

TCP window scaling and broken routers

Posted Jul 12, 2004 20:42 UTC (Mon) by ron.flory@adtran.com (guest, #22995) [Link]

careful: hacker != cracker

hacker (hobbyist) is not necessarily bad.
cracker is always bad.

Please see: http://info.astrian.net/jargon/terms/h/hacker.html

TCP window scaling and broken routers

Posted Jul 8, 2004 7:49 UTC (Thu) by ekj (guest, #1524) [Link]

64KB, the max with zero scaling ain't going to be enough for good performance in many situations.

For example, if your ping-time to the destination is 250ms, which is fairly typical for me for accessing US sites from Germany over DSL, then a 64KB window is going to be full if my available bandwidth is 2Mbit/s or more. That's not a particularly rare condition.

When downloading from my Uni (which has triple redundant 622Mbit/s ATM links) the limiting factor is typically the 100Mbit ethernet. At that speed a 64KB window would be full for any transfer where the ping-time is more than 5ms, which is essentially everywhere.

I'm with the kernel-hackers on this one. A router that doesn't handle larger window-sizes properly is broken and needs to be fixed or replaced as soon as possible, anything else will just be band-aids.

Broken routers and firewalls

Posted Jul 8, 2004 20:15 UTC (Thu) by Ross (guest, #4065) [Link]

"If ... the situation is left as it is, pressure on the router
manufacturers should get the problem fixed relatively quickly.
... Linux has a strong enough presence in the networking world
that it can get away with taking this sort of position."

Really? Is that why all of the broken firewalls stopped blocking packets
with ECN bits? Well, all of them except for a few tiny obscure places like
Sun, Sprint, CitiBank, Cornell, SAE, ISOC, Iomega, US DoJ, Wells Fargo, and
Checker Auto Parts :)

But seriously, while I hope this does force vendors to fix their broken
code I just don't have a lot of faith that it will work.

I _still_ find websites behind broken firewalls which stop all ICMP
packets, including "must fragment" errors. This doesn't just affect Linux
users. Well I can't reliably visit some of those sites (iptables PMTU
clamping helps considerably). The same thing with ECN. I once went to the
trouble of actually calling a network admin at Southwest Airlines to help
them fix the problem. It worked, in less than one week they had patched
their router, but now it is broken again.

The basic problem is that it doesn't affect them, and they have little
incentive to fix it. There's no clear communications channel to get the
information to the people who need it.

If you want to report ECN problems here's a good resource:
http://urchin.earth.li/ecn/
(follow the link to the "ECN Hall of Shame")

This sounds a lot like the IE / Mozilla / Opera / etc dilemma to me

Posted Jul 9, 2004 19:37 UTC (Fri) by HeathPetersen (guest, #14116) [Link]

For a few years now, I've expected web sites to support non-IE web browsers. If that hasn't happened, why should this?

Don't get me wrong, I am outraged at the commercial entities forcing these things down our throats. I just don't expect quick resolutions to their errors.

TCP window scaling and broken routers

Posted Jul 10, 2004 0:19 UTC (Sat) by dlang (guest, #313) [Link] (4 responses)

let's see, with ECN people were up in arms about those nasty firewalls that blocked packets that did unknown things with the undefined bits in the header (even before ECN was an approved spec) and said that they should have zeroed out the bits instead of blocking the packet.

in this case the nasty firewalls zero out the bits in the unknown option and people are complaining

putting these two together it sounds like what people really want is for the firewalls/routers to just let everything through and not try to enforce anything.

why am I not surprised that this doesn't "just happen"

No

Posted Jul 10, 2004 2:53 UTC (Sat) by Ross (guest, #4065) [Link]

You misunderstand the standard: "must be zero" means that they must be
set to zero for software adhering to this version of the standard. They
absolutely must be ignored by software adhering to that version of the
standard. If they are dropped when not zero then the implementation is
broken because it is not upwards-compatible with other implementations.
TCP/IP is specifically designed to be upwards compatible. If you don't
understand options you are supposed to ignore them, otherwise there is
no point in having an extendable protocol.

But in any case, even for those who can't read RFCs properly, there was a
draft RFC at the time and it's now official. So there are absolutely,
positively no excuses anymore but there are still lots of broken vendors
and even more unpatched routers and firewalls (one only wonders what
security problems these systems must have).

TCP window scaling and broken routers

Posted Jul 21, 2004 18:31 UTC (Wed) by schabi (guest, #14079) [Link] (2 responses)

"this case the nasty firewalls zero out the bits in the unknown option and people are complaining"

It's different. With ECN, the router had two different, valid options: leave the bits in the flag word as they are, or clear them and thus delete the option. ECN was designed carefully enough that both ways worked. Blocking or dropping the packet is not an option.

Window scaling is not bits in the flag word, but a separately added option field. There, the firewall has two valid options: let the packet pass as it is, or remove the window scaling option field entirely. Communication continues to work with both options. Fiddling around inside the header field and wildly mangling the values is not an option.

TCP window scaling and broken routers

Posted Dec 1, 2005 23:06 UTC (Thu) by walken (subscriber, #7089) [Link]

That sounds like a good idea, but - is there any way to get iptables to do what you describe ? From my own little netfilter experience, I know how to pass, drop or reject packets, but not how to filter bits (well, I think there is an option to do that with ECN, but what about OTHER must-be-zero bits) or how to drop arbitrary unknown tcp options.

Sounds a bit hypocritical for linux developers to complain about firewalls in the field if their own firewalling functionality does not allow this either.

But then again I'm not a netfilter expert so I could be mistaken.

TCP window scaling and broken routers

Posted Feb 7, 2008 21:12 UTC (Thu) by shemminger (subscriber, #5739) [Link]

The problem is firewalls that want to enforce window sizes but are too stupid and try to do
this without tracking the state of window scaling of the connection.

I will pick out OpenBSD as particularly broken in that regard, and they haven't fixed it.

Does Linux have that much clout?

Posted Jul 10, 2004 0:45 UTC (Sat) by giraffedata (guest, #1954) [Link]

It has been a few years, now, that Linux has a strong enough presence in the networking world that it can get away with taking this sort of position

Even if Linux in general has a strong enough presence, does kernel.org Linux have it? The rejection in question is only for kernel.org Linux, which almost nobody runs. The Linuxes that matter -- Red Hat, Suse, etc. -- probably will not follow suit, since their customers want their computers to talk to the existing Internet more than they want to take a stand against bad router owners.

TCP window scaling and broken routers

Posted Dec 10, 2004 0:02 UTC (Fri) by gene_wood (guest, #26577) [Link]

I just wanted to thank the author for detailing this phenomenon. I've been banging my head against this for about 2 months. I implemented the workaround you described and everything works perfectly.

The symptoms that I was experiencing are detailed here.

TCP window scaling and broken routers

Posted Dec 14, 2006 0:15 UTC (Thu) by pcharlan (guest, #29128) [Link]

With kernel 2.6.17.13 or higher, you can also do:

THEIR_IP=1.2.3.4
MY_GATEWAY=5.6.7.8

ip route add $THEIR_IP/32 via $MY_GATEWAY window 65535

which only limits window scaling for that destination without interfering with your other connections.

[It has been a while since the original article, but this still shows up first in Google when searching for "linux tcp window scaling broken router", so perhaps this will help someone.]

TCP window scaling and broken routers

Posted Jul 16, 2007 8:01 UTC (Mon) by ssabchew (guest, #46279) [Link]

I made a workaround by decreasing the TCP receive buffer:
from
net.ipv4.tcp_rmem = 4096 87380 4194304

to this
net.ipv4.tcp_rmem = 4096 87380 207520

TCP window scaling and broken routers

Posted Aug 21, 2007 7:17 UTC (Tue) by dcam (guest, #46922) [Link]

Putting net.ipv4.tcp_window_scaling=0 into /etc/sysctl.conf and rebooting worked for me

TCP window scaling and broken routers

Posted Oct 18, 2007 15:35 UTC (Thu) by jsl123 (guest, #48520) [Link] (2 responses)

Just for the records...

Vista has the same problem too! (Found this with broken ssh connections)
A possible solution is described here:
http://www.tech-recipes.com/rx/1744/vista_tcp_cannot_comm...
(beware of line breaks in the above URL)

HTH. Salut, Joerg

Hotel access systems

Posted Apr 9, 2008 4:27 UTC (Wed) by shemminger (subscriber, #5739) [Link] (1 responses)

A common vendor of hotel access control systems (ibahn) seems to be particularly problematic.
I have already experienced that in 3 separate cases the wireless or wired access does not work
with window scaling.

Probably doesn't work with Vista either, unless Microsoft has turned off window scaling in
Vista SP1.

Hotel access systems

Posted Apr 9, 2008 17:02 UTC (Wed) by zlynx (guest, #2285) [Link]

Microsoft went to quite a lot of trouble to auto-detect network routes which fail to work with
window scaling, IPv6 and whatever else, and work around the problems.

Linux relies on the user to set things.  I believe there are some iptables rules you can apply
to selectively disable scaling and ECN.  However, I don't know of any distros with scripts
that detect problems and apply those rules automatically.

TCP window scaling and broken routers

Posted Jan 18, 2010 10:49 UTC (Mon) by PolyPeter (guest, #63026) [Link] (2 responses)

My setup
  • I have developed a web application, now with more than 1000 members.
  • The web application runs on my own server with Linux Redhat.
  • The members can upload images (jpeg) to a gallery.
My problem
  • 3 members complain that they cannot upload images. On other web pages they can upload images just fine. They all have a Windows Vista machine.
  • The upload works just fine from the members network if they use Windows XP.
  • When uploading, we tested with a Flash component that makes it possible to follow the uploaded bytes. The upload stops at approximately 64KB every time (64KB is described in the article!).
  • Other members with a Windows Vista machine do not have a similar problem.
My conclusion
  • The upload problem only exists if the member has a broken router and at the same time has an OS where TCP window scaling is enabled by default (for example Vista).
A partial solution

After I found this article I tried this:
echo 1 > /proc/sys/net/ipv4/tcp_window_scaling

After I did that, it solved the problem for one of the members! But only one. The two other members still have the problem...
Now I wonder what else I should do?

TCP window scaling and broken routers

Posted Jan 18, 2010 11:01 UTC (Mon) by PolyPeter (guest, #63026) [Link]


What I did was of course:
echo 0 > /proc/sys/net/ipv4/tcp_window_scaling
(turning OFF tcp window scaling)

TCP window scaling and broken routers

Posted Feb 11, 2010 10:23 UTC (Thu) by PolyPeter (guest, #63026) [Link]

After an automatic Windows Vista update on the local PC, the uploads work! I can only conclude that Windows Vista is really crappy!


Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds