[Bug 589034] Re: nbd-proxy hangs the nbd-connection to server
Stéphane Graber
stgraber at stgraber.org
Wed Jul 6 09:30:24 UTC 2011
Marking as Fix Released, since nbd-proxy has been disabled for a while now, both upstream and in more recent Ubuntu releases.
A rewrite of nbd-proxy has been done and should fix most of these issues, so we might turn nbd-proxy back on in a later release.
** Changed in: ltsp
Status: Confirmed => Fix Released
** Changed in: ltsp (Ubuntu)
Status: New => Fix Released
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to ltsp in Ubuntu.
https://bugs.launchpad.net/bugs/589034
Title:
nbd-proxy hangs the nbd-connection to server
Status in Linux Terminal Server Project:
Fix Released
Status in “ltsp” package in Ubuntu:
Fix Released
Bug description:
I am running an ltsp server on Ubuntu (10.04) Lucid Lynx, with a
Primergy TX120 S2 as an ltsp server, and HP Probook 4310s as a
terminal connecting to server. The server installation has the amd64
architecture, but the terminal image is using i386. This problem
could also be reproduced with a kvm virtual machine functioning as a
server, with a similar installation, and has been observed with
another type of terminal machine as well (XPC shuttle X27D).
The server contains ltsp-server and ltsp-server-standalone packages,
in version 5.2.1-0ubuntu9. The terminal image contains matching
versions (5.2.1-0ubuntu9) of ltsp-client and ltsp-client-core
packages. Kernel version on the server side is 2.6.32-22-server, and
on the terminal side it is 2.6.32-22-generic.
I am using dnsmasq as the dhcp-server, and the following settings in
/var/lib/tftpboot/ltsp/i386/lts.conf:
[default]
LDM_DIRECTX = True
LDM_LANGUAGE = "fi_FI.UTF-8"
LOCAL_APPS = True
LOCALDEV = True
LTSP_FATCLIENT = False
NBD_SWAP = True
REMOTE_APPS = True
SSH_FOLLOW_SYMLINKS = False
SSH_OVERRIDE_PORT = 222
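For anyone comparing their own settings against the ones above, here is a minimal sketch that loads such a file and reads back a few values. It uses Python's configparser purely as a stand-in; this is not how LTSP itself reads lts.conf, and the helper names read_lts_conf and option are made up for illustration.

```python
import configparser

def read_lts_conf(text):
    """Parse lts.conf-style text (INI-like sections with KEY = value lines)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names upper-case, as written in the file
    parser.read_string(text)
    return parser

def option(parser, name, section="default"):
    """Return a value with surrounding double quotes removed."""
    return parser.get(section, name).strip('"')

# The settings quoted in this report.
LTS_CONF = """\
[default]
LDM_DIRECTX = True
LDM_LANGUAGE = "fi_FI.UTF-8"
LOCAL_APPS = True
LOCALDEV = True
LTSP_FATCLIENT = False
NBD_SWAP = True
REMOTE_APPS = True
SSH_FOLLOW_SYMLINKS = False
SSH_OVERRIDE_PORT = 222
"""

conf = read_lts_conf(LTS_CONF)
nbd_swap_enabled = conf.getboolean("default", "NBD_SWAP")
ssh_port = conf.getint("default", "SSH_OVERRIDE_PORT")
language = option(conf, "LDM_LANGUAGE")
```

In practice the text would be read from /var/lib/tftpboot/ltsp/i386/lts.conf rather than from a string.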
On the server side Linux reports the following about the network
interface connected to the terminal (some dmesg-snippets here):
[ 1.862987] 0000:30:00.0: eth1: (PCI Express:2.5GB/s:Width x1) 00:15:17:cf:5e:de
[ 1.862989] 0000:30:00.0: eth1: Intel(R) PRO/1000 Network Connection
[ 1.863069] 0000:30:00.0: eth1: MAC: 1, PHY: 4, PBA No: d50858-004
[ 20.324320] ADDRCONF(NETDEV_UP): eth1: link is not ready
[ 22.891005] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[ 22.892038] ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
On the terminal side Linux reports the following about the network
interface connected to the server (dmesg-snippets):
[ 1.451527] sky2 eth0: addr 00:26:55:c4:06:95
[ 4.535708] sky2 eth0: enabling interface
[ 4.535949] ADDRCONF(NETDEV_UP): eth0: link is not ready
[ 7.029456] sky2 eth0: Link is up at 1000 Mbps, full duplex, flow control both
[ 7.029693] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
With this configuration, the nbd connection to the server works quite
well, without any significant problems (it does appear to hang, but
only rarely). However, when a switch (ZyXEL Desktop Ethernet Switch
10/100Mbps) is placed between these computers, the network interface
state changes on the server:
[18989.100157] e1000e: eth1 NIC Link is Down
[18994.101017] e1000e: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[18994.101023] 0000:30:00.0: eth1: 10/100 speed: disabling TSO
And on the terminal side:
[ 248.484539] sky2 eth0: Link is down.
[ 254.785883] sky2 eth0: Link is up at 100 Mbps, full duplex, flow control both
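To spot this kind of renegotiation automatically, the link-up messages can be pulled out of dmesg output. The following is only a sketch: the regular expression is written against the two driver formats quoted in this report (e1000e and sky2) and may not match other drivers, and the function names are hypothetical.

```python
import re

# Matches both message formats quoted in this report:
#   e1000e: eth1 NIC Link is Up 100 Mbps Full Duplex, ...
#   sky2 eth0: Link is up at 100 Mbps, full duplex, ...
LINK_UP = re.compile(
    r"(?P<iface>eth\d+):?\s+(?:NIC\s+)?Link is up(?:\s+at)?\s+(?P<mbps>\d+)\s*Mbps",
    re.IGNORECASE,
)

def link_speeds(dmesg_lines):
    """Return (interface, Mbps) for every link-up message, in log order."""
    speeds = []
    for line in dmesg_lines:
        match = LINK_UP.search(line)
        if match:
            speeds.append((match.group("iface"), int(match.group("mbps"))))
    return speeds

def downgraded(speeds):
    """True if the latest negotiated speed is lower than an earlier one."""
    if len(speeds) < 2:
        return False
    return speeds[-1][1] < max(mbps for _, mbps in speeds[:-1])

# Server-side lines taken from this report.
SERVER_LOG = [
    "[ 20.324320] ADDRCONF(NETDEV_UP): eth1: link is not ready",
    "[ 22.891005] e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX",
    "[18989.100157] e1000e: eth1 NIC Link is Down",
    "[18994.101017] e1000e: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX",
    "[18994.101023] 0000:30:00.0: eth1: 10/100 speed: disabling TSO",
]

server_speeds = link_speeds(SERVER_LOG)
```

Here downgraded(server_speeds) is true, because the link came back at 100 Mbps after having been up at 1000 Mbps.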
On this slower network connection between the server and the terminal,
nbd-connection frequently hangs. Loading the kernel and initial
ramdisk is always reliable, but the nbd connection may stop
transferring data at some point, and this point appears to change
randomly, yet often before the login screen comes up. Note that the
nbd connection does remain open --- at least on the server side a
socket connection remains established to the terminal, but nothing is
transferred between the machines.
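One way to confirm the "established but idle" state on the server is to look at /proc/net/tcp. The sketch below lists established connections on a given local port together with their send and receive queue sizes. The port number 2000 and the addresses in the sample are assumptions for illustration only (substitute whatever port nbd-server listens on), and the address decoding assumes a little-endian host such as the amd64 server described here.

```python
import socket
import struct

TCP_ESTABLISHED = "01"  # state code used in /proc/net/tcp

def decode_addr(addr):
    """Turn 'HEXIP:HEXPORT' into ('a.b.c.d', port); assumes a little-endian host."""
    ip_hex, port_hex = addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)

def established_on_port(proc_net_tcp, port):
    """Return (peer_ip, tx_queue, rx_queue) for established connections on a local port."""
    results = []
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 5 or fields[3] != TCP_ESTABLISHED:
            continue
        _, local_port = decode_addr(fields[1])
        if local_port != port:
            continue
        peer_ip, _ = decode_addr(fields[2])
        tx_queue, rx_queue = (int(q, 16) for q in fields[4].split(":"))
        results.append((peer_ip, tx_queue, rx_queue))
    return results

# Hypothetical sample in /proc/net/tcp format: one established connection
# from 192.168.0.20 to local port 2000, plus a listening socket.
SAMPLE = (
    "  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode\n"
    "   0: 0100A8C0:07D0 1400A8C0:8A3C 01 00000000:00000000 00:00000000 00000000     0        0 1234\n"
    "   1: 00000000:07D0 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 1200\n"
)

connections = established_on_port(SAMPLE, 2000)
```

On a real server the input would be the contents of /proc/net/tcp. An established entry whose queues stay at zero across repeated reads would be consistent with the hang described above: the socket is open, but neither side is sending.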
With the switch in place, only about 30-40% of boots reach the ldm
login screen. Without the switch in between, over a direct gigabit
link, the success rate is somewhere between 90% and 100%.
It seems this problem is due to nbd-proxy, because the issue goes
away when nbd-proxy is disabled in the initial ramdisk downloaded by
the terminal. With nbd-client connecting directly to the server,
bypassing nbd-proxy, the success rate of reaching the ldm login
screen appears to be pretty close to 100%.
I suspect there is a correlation between the terminal CPU speed and the
network speed that affects this issue. Perhaps if a terminal machine
is comparatively slow and the network is fast, this problem occurs
very rarely?
This problem can be worked around by disabling nbd-proxy. This can be done by applying the attached patch to the terminal tree (under /opt/ltsp/i386 for the i386 architecture), and then rebuilding the terminal image with
"sudo ltsp-update-image --arch i386".
To manage notifications about this bug go to:
https://bugs.launchpad.net/ltsp/+bug/589034/+subscriptions