[Bug 1071910] Re: lxc stop will hang forever
Tim
iceczd at gmail.com
Tue Oct 30 16:02:52 UTC 2012
** Description changed:
Background:
This issue occurs during an automated process, with roughly a 1-in-20 chance per iteration
I have one lxc-container on the machine
It is backed with an lvm2 snapshot
Running Ubuntu 12.10 on an EC2 small instance, upgraded from a fresh 12.04 instance
This is a new issue that first appeared after migrating my code from 11.10
Process:
create snapshot "lvcreate"
mount snapshot "mount"
lxc-start
do actions in container
lxc-stop
unmount snapshot "umount"
remove snapshot "lvremove"
-repeat
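
The steps above can be sketched as a small driver, which may help anyone trying to reproduce the loop. This is only a rough sketch, not the reporter's actual script: the volume group "vmg1" appears in the stderr output in this report, and the origin and snapshot names "vm" and "snap" are inferred from the device-mapper names there, while the mount point, container name, and snapshot size are hypothetical placeholders.

```python
import shlex
import subprocess


def iteration_commands(vg="vmg1", origin="vm", snap="snap",
                       mountpoint="/mnt/snap", container="test", size="1G"):
    """Return the argv lists for one iteration, in the order listed above.

    mountpoint, container, and size are hypothetical placeholders.
    """
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        ["lvcreate", "--snapshot", "--name", snap, "--size", size,
         f"/dev/{vg}/{origin}"],
        ["mount", snap_dev, mountpoint],
        ["lxc-start", "-n", container, "-d"],
        # The actions performed inside the container would go here.
        ["lxc-stop", "-n", container],
        ["umount", mountpoint],
        ["lvremove", "-f", snap_dev],
    ]


def run_iteration(dry_run=True, **kwargs):
    """Print the commands (dry run) or execute them, stopping on first failure."""
    for argv in iteration_commands(**kwargs):
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)
```

Calling run_iteration() with the default dry_run=True only prints the commands, so the sequence can be reviewed before anything is run as root.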
The issue can occur at either lxc-stop or lvremove.
When it occurs with lxc-stop:
ps -A shows that lxc-start is still running, along with kdmflush, kjournald, and an init that appears to be the container's init process
kdmflush, kjournald, init, and its sub-processes cannot be killed with "kill -9 pid", but lxc-start can
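
One way to narrow this down is to check whether the unkillable processes are in uninterruptible sleep (state "D" in ps). That they are in this state is an assumption; the report only says that kill -9 has no effect. A minimal sketch that parses the output of `ps -eo pid,stat,comm` (the function name is made up for illustration):

```python
def uninterruptible(ps_output):
    """Return (pid, comm) pairs for processes whose STAT field starts with 'D'.

    Expects the text produced by `ps -eo pid,stat,comm`.
    """
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        # Skip the header and any line that does not start with a numeric PID.
        if len(parts) != 3 or not parts[0].isdigit():
            continue
        if parts[1].startswith("D"):
            found.append((int(parts[0]), parts[2]))
    return found
```

The input text can be captured with subprocess.run(["ps", "-eo", "pid,stat,comm"], capture_output=True, text=True).stdout.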
When it occurs with lvremove, the hang happens on the second lvremove call, after the first call has failed with this stderr:
Using logical volume(s) on command line
- Archiving volume group "vmg1" metadata (seqno 272).
- Removing snapshot snap
- Found volume group "vmg1"
- Found volume group "vmg1"
- Loading vmg1-vm table (252:0)
- Loading vmg1-snap table (252:1)
- /sbin/dmeventd: stat failed: No such file or directory
- vmg1/snapshot0 already not monitored.
- Suspending vmg1-vm (252:0) with device flush
- Suspending vmg1-snap (252:1) with device flush
- Suspending vmg1-vm-real (252:2) with device flush
- Suspending vmg1-snap-cow (252:3) with device flush
- Found volume group "vmg1"
- Resuming vmg1-snap-cow (252:3)
- Resuming vmg1-vm-real (252:2)
- Resuming vmg1-snap (252:1)
- Removing vmg1-snap-cow (252:3)
- device-mapper: remove ioctl on failed: Device or resource busy
- Unable to deactivate vmg1-snap-cow (252:3)
- Failed to resume snap.
- libdevmapper exiting with 1 device(s) still suspended.
+ Archiving volume group "vmg1" metadata (seqno 272).
+ Removing snapshot snap
+ Found volume group "vmg1"
+ Found volume group "vmg1"
+ Loading vmg1-vm table (252:0)
+ Loading vmg1-snap table (252:1)
+ /sbin/dmeventd: stat failed: No such file or directory
+ vmg1/snapshot0 already not monitored.
+ Suspending vmg1-vm (252:0) with device flush
+ Suspending vmg1-snap (252:1) with device flush
+ Suspending vmg1-vm-real (252:2) with device flush
+ Suspending vmg1-snap-cow (252:3) with device flush
+ Found volume group "vmg1"
+ Resuming vmg1-snap-cow (252:3)
+ Resuming vmg1-vm-real (252:2)
+ Resuming vmg1-snap (252:1)
+ Removing vmg1-snap-cow (252:3)
+ device-mapper: remove ioctl on failed: Device or resource busy
+ Unable to deactivate vmg1-snap-cow (252:3)
+ Failed to resume snap.
+ libdevmapper exiting with 1 device(s) still suspended.
lvremove spawns the lvm process, and neither can be killed with "kill -9
pid", which suggests to me that they are waiting on something in the
kernel. I am guessing this has the same cause as the lxc-stop hang,
where the container's processes also cannot be killed.
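
Since the hang follows a failed first lvremove, an automated loop could inspect that first failure before retrying. The sketch below only looks for two messages quoted in the stderr above; the function name and the returned dictionary are illustrative choices, not part of LVM.

```python
import re

# Matches the final libdevmapper message quoted in the stderr above.
SUSPENDED_RE = re.compile(
    r"libdevmapper exiting with (\d+) device\(s\) still suspended")


def summarize_lvremove_stderr(stderr_text):
    """Report whether lvremove hit a busy device and left devices suspended."""
    match = SUSPENDED_RE.search(stderr_text)
    return {
        "busy": "Device or resource busy" in stderr_text,
        "still_suspended": int(match.group(1)) if match else 0,
    }
```

A non-zero "still_suspended" count would be a reason to stop the loop and inspect the machine rather than call lvremove a second time.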
- This is all I can report for now, but I'll try getting some log info
- from lxc next Friday, let me know if you have any suggestions in the
- meantime.
+ Here is an excerpt from the syslog. lxc-stop hangs because of this
+ kernel error. The error doesn't always occur on the cat command; it
+ can happen on others as well.
+
+ --START
+ Oct 30 14:43:12 domU-12-31-39-14-64-79 kernel: [ 1094.406366] kjournald starting. Commit interval 5 seconds
+ Oct 30 14:43:12 domU-12-31-39-14-64-79 kernel: [ 1094.406929] EXT3-fs (dm-1): using internal journal
+ Oct 30 14:43:12 domU-12-31-39-14-64-79 kernel: [ 1094.406931] EXT3-fs (dm-1): mounted filesystem with ordered data mode
+ Oct 30 14:43:12 domU-12-31-39-14-64-79 kernel: [ 1094.435218] device vethyurrCc entered promiscuous mode
+ Oct 30 14:43:12 domU-12-31-39-14-64-79 kernel: [ 1094.435613] IPv6: ADDRCONF(NETDEV_UP): vethyurrCc: link is not ready
+ Oct 30 14:43:12 domU-12-31-39-14-64-79 kernel: [ 1094.534518] IPv6: ADDRCONF(NETDEV_CHANGE): vethyurrCc: link becomes ready
+ Oct 30 14:43:12 domU-12-31-39-14-64-79 kernel: [ 1094.534543] br0: port 1(vethyurrCc) entered forwarding state
+ Oct 30 14:43:12 domU-12-31-39-14-64-79 kernel: [ 1094.534547] br0: port 1(vethyurrCc) entered forwarding state
+
+ --STOP
+ Oct 30 14:43:17 domU-12-31-39-14-64-79 kernel: [ 1099.112881] br0: port 1(vethyurrCc) entered disabled state
+ Oct 30 14:43:17 domU-12-31-39-14-64-79 kernel: [ 1099.115187] device vethyurrCc left promiscuous mode
+ Oct 30 14:43:17 domU-12-31-39-14-64-79 kernel: [ 1099.115190] br0: port 1(vethyurrCc) entered disabled state
+
+ --START
+ Oct 30 14:43:18 domU-12-31-39-14-64-79 kernel: [ 1100.188337] kjournald starting. Commit interval 5 seconds
+ Oct 30 14:43:18 domU-12-31-39-14-64-79 kernel: [ 1100.188852] EXT3-fs (dm-1): using internal journal
+ Oct 30 14:43:18 domU-12-31-39-14-64-79 kernel: [ 1100.188859] EXT3-fs (dm-1): mounted filesystem with ordered data mode
+ Oct 30 14:43:18 domU-12-31-39-14-64-79 kernel: [ 1100.310142] device vethfsh25j entered promiscuous mode
+ Oct 30 14:43:18 domU-12-31-39-14-64-79 kernel: [ 1100.310539] IPv6: ADDRCONF(NETDEV_UP): vethfsh25j: link is not ready
+ Oct 30 14:43:18 domU-12-31-39-14-64-79 kernel: [ 1100.319210] IPv6: ADDRCONF(NETDEV_CHANGE): vethfsh25j: link becomes ready
+ Oct 30 14:43:18 domU-12-31-39-14-64-79 kernel: [ 1100.319240] br0: port 1(vethfsh25j) entered forwarding state
+ Oct 30 14:43:18 domU-12-31-39-14-64-79 kernel: [ 1100.319244] br0: port 1(vethfsh25j) entered forwarding state
+
+ --STOP
+ Oct 30 14:43:23 domU-12-31-39-14-64-79 kernel: [ 1105.073237] br0: port 1(vethfsh25j) entered disabled state
+ Oct 30 14:43:23 domU-12-31-39-14-64-79 kernel: [ 1105.075541] device vethfsh25j left promiscuous mode
+ Oct 30 14:43:23 domU-12-31-39-14-64-79 kernel: [ 1105.075544] br0: port 1(vethfsh25j) entered disabled state
+
+ --START
+ Oct 30 14:43:24 domU-12-31-39-14-64-79 kernel: [ 1106.091653] kjournald starting. Commit interval 5 seconds
+ Oct 30 14:43:24 domU-12-31-39-14-64-79 kernel: [ 1106.092173] EXT3-fs (dm-1): using internal journal
+ Oct 30 14:43:24 domU-12-31-39-14-64-79 kernel: [ 1106.092176] EXT3-fs (dm-1): mounted filesystem with ordered data mode
+ Oct 30 14:43:24 domU-12-31-39-14-64-79 kernel: [ 1106.119867] device vethYI2DWn entered promiscuous mode
+ Oct 30 14:43:24 domU-12-31-39-14-64-79 kernel: [ 1106.120382] IPv6: ADDRCONF(NETDEV_UP): vethYI2DWn: link is not ready
+ Oct 30 14:43:24 domU-12-31-39-14-64-79 kernel: [ 1106.128936] IPv6: ADDRCONF(NETDEV_CHANGE): vethYI2DWn: link becomes ready
+ Oct 30 14:43:24 domU-12-31-39-14-64-79 kernel: [ 1106.128964] br0: port 1(vethYI2DWn) entered forwarding state
+ Oct 30 14:43:24 domU-12-31-39-14-64-79 kernel: [ 1106.128968] br0: port 1(vethYI2DWn) entered forwarding state
+
+ --STOP
+ Oct 30 14:43:28 domU-12-31-39-14-64-79 kernel: [ 1110.816859] br0: port 1(vethYI2DWn) entered disabled state
+ Oct 30 14:43:28 domU-12-31-39-14-64-79 kernel: [ 1110.819087] device vethYI2DWn left promiscuous mode
+ Oct 30 14:43:28 domU-12-31-39-14-64-79 kernel: [ 1110.819090] br0: port 1(vethYI2DWn) entered disabled state
+
+ --Why is this happening occasionally?
+ Oct 30 14:43:29 domU-12-31-39-14-64-79 udevd[2811]: inotify_add_watch(6, /dev/dm-1, 10) failed: No such file or directory
+
+ --START
+ Oct 30 14:43:29 domU-12-31-39-14-64-79 kernel: [ 1111.748495] kjournald starting. Commit interval 5 seconds
+ Oct 30 14:43:29 domU-12-31-39-14-64-79 kernel: [ 1111.748933] EXT3-fs (dm-1): using internal journal
+ Oct 30 14:43:29 domU-12-31-39-14-64-79 kernel: [ 1111.748936] EXT3-fs (dm-1): mounted filesystem with ordered data mode
+ Oct 30 14:43:29 domU-12-31-39-14-64-79 kernel: [ 1111.868572] device vethSaApSo entered promiscuous mode
+ Oct 30 14:43:29 domU-12-31-39-14-64-79 kernel: [ 1111.869304] IPv6: ADDRCONF(NETDEV_UP): vethSaApSo: link is not ready
+ Oct 30 14:43:30 domU-12-31-39-14-64-79 kernel: [ 1111.874370] IPv6: ADDRCONF(NETDEV_CHANGE): vethSaApSo: link becomes ready
+ Oct 30 14:43:30 domU-12-31-39-14-64-79 kernel: [ 1111.874394] br0: port 1(vethSaApSo) entered forwarding state
+ Oct 30 14:43:30 domU-12-31-39-14-64-79 kernel: [ 1111.874398] br0: port 1(vethSaApSo) entered forwarding state
+
+ --STOP
+ Oct 30 14:43:34 domU-12-31-39-14-64-79 kernel: [ 1116.749280] br0: port 1(vethSaApSo) entered disabled state
+ Oct 30 14:43:34 domU-12-31-39-14-64-79 kernel: [ 1116.751502] device vethSaApSo left promiscuous mode
+ Oct 30 14:43:34 domU-12-31-39-14-64-79 kernel: [ 1116.751505] br0: port 1(vethSaApSo) entered disabled state
+
+ --START
+ Oct 30 14:43:36 domU-12-31-39-14-64-79 kernel: [ 1118.774270] kjournald starting. Commit interval 5 seconds
+ Oct 30 14:43:36 domU-12-31-39-14-64-79 kernel: [ 1118.774709] EXT3-fs (dm-1): using internal journal
+ Oct 30 14:43:36 domU-12-31-39-14-64-79 kernel: [ 1118.774711] EXT3-fs (dm-1): mounted filesystem with ordered data mode
+ Oct 30 14:43:36 domU-12-31-39-14-64-79 kernel: [ 1118.803322] device vethC8ic4K entered promiscuous mode
+ Oct 30 14:43:36 domU-12-31-39-14-64-79 kernel: [ 1118.803718] IPv6: ADDRCONF(NETDEV_UP): vethC8ic4K: link is not ready
+ Oct 30 14:43:36 domU-12-31-39-14-64-79 kernel: [ 1118.812401] IPv6: ADDRCONF(NETDEV_CHANGE): vethC8ic4K: link becomes ready
+ Oct 30 14:43:36 domU-12-31-39-14-64-79 kernel: [ 1118.812458] br0: port 1(vethC8ic4K) entered forwarding state
+ Oct 30 14:43:36 domU-12-31-39-14-64-79 kernel: [ 1118.812464] br0: port 1(vethC8ic4K) entered forwarding state
+
+ --KERNEL ERROR
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252907] ------------[ cut here ]------------
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252921] kernel BUG at /build/buildd/linux-3.5.0/arch/x86/mm/fault.c:396!
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252926] invalid opcode: 0000 [#1] SMP
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252932] CPU 0
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252934] Modules linked in: veth dm_snapshot xt_tcpudp iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables x_tables bridge stp llc isofs microcode acpiphp
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252958]
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252960] Pid: 8140, comm: cat Not tainted 3.5.0-17-generic #28-Ubuntu
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252966] RIP: e030:[<ffffffff8168533f>] [<ffffffff8168533f>] vmalloc_fault+0x11f/0x208
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252979] RSP: e02b:ffff880002f1d9b8 EFLAGS: 00010046
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252983] RAX: ffff880026caeff8 RBX: ffffe8ffffc00ac8 RCX: 0000000000000000
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252988] RDX: 00003ffffffff000 RSI: ffff880000000ff8 RDI: 0000000000000000
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252993] RBP: ffff880002f1d9d8 R08: ffff880017c6ae70 R09: 00007f7b4d46e000
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.252998] R10: 0000000000000001 R11: 0000000000000000 R12: ffff880066231e88
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253003] R13: ffff880026caeff8 R14: ffff880000000ff8 R15: 0000000000000002
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253012] FS: 00007f7b4d68c700(0000) GS:ffff88006a000000(0000) knlGS:0000000000000000
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253017] CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253021] CR2: ffffe8ffffc00ac8 CR3: 0000000066231000 CR4: 0000000000002660
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253027] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253033] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253038] Process cat (pid: 8140, threadinfo ffff880002f1c000, task ffff88002470dc00)
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253044] Stack:
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253046] ffffe8ffffc00ac8 0000000000000029 ffff880002f1daf8 0000000000000000
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253055] ffff880002f1dae8 ffffffff816858f9 0000000000000657 ffffffff812e79e1
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253064] ffff88002470dc00 0000000000000060 ffff880055ecdd1c ffff88005636b540
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253072] Call Trace:
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253078] [<ffffffff816858f9>] do_page_fault+0x3b9/0x4e0
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253087] [<ffffffff812e79e1>] ? aa_path_name+0x71/0x440
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253094] [<ffffffff8107e86a>] ? lg_local_unlock+0x1a/0x20
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253100] [<ffffffff8168b14b>] ? xen_hypervisor_callback+0x1b/0x20
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253106] [<ffffffff81004eec>] ? xen_mc_extend_args+0xec/0x110
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253112] [<ffffffff810046c0>] ? load_TLS_descriptor+0x40/0xc0
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253118] [<ffffffff81004bd2>] ? xen_mc_flush+0xb2/0x1b0
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253123] [<ffffffff816821e5>] page_fault+0x25/0x30
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253130] [<ffffffff81176e54>] ? mem_cgroup_charge_statistics.isra.15+0x14/0x50
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253137] [<ffffffff81178ebc>] __mem_cgroup_uncharge_common+0xcc/0x2c0
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253143] [<ffffffff8100761d>] ? xen_pte_val+0x1d/0x40
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253149] [<ffffffff8117c242>] mem_cgroup_uncharge_page+0x22/0x30
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253155] [<ffffffff81153c97>] page_remove_rmap+0xb7/0x140
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253162] [<ffffffff8114797a>] ? vm_normal_page+0x1a/0x80
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253168] [<ffffffff81148c31>] unmap_page_range+0x4b1/0x740
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253173] [<ffffffff81148f4b>] unmap_single_vma+0x8b/0xd0
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253179] [<ffffffff81149762>] unmap_vmas+0x52/0xa0
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253184] [<ffffffff81150cf2>] exit_mmap+0x92/0x150
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253191] [<ffffffff81681dbe>] ? _raw_spin_lock_irqsave+0x2e/0x40
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253198] [<ffffffff8104ef24>] mmput+0x74/0x110
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253204] [<ffffffff810577ba>] exit_mm+0x10a/0x130
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253208] [<ffffffff81057939>] do_exit+0x159/0x8e0
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253213] [<ffffffff8105841f>] do_group_exit+0x3f/0xa0
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253218] [<ffffffff81058497>] sys_exit_group+0x17/0x20
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253224] [<ffffffff81689d29>] system_call_fastpath+0x16/0x1b
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253229] Code: 4c 89 e7 e8 71 e4 fe ff 4c 89 ef 48 89 de 49 89 c6 e8 63 e4 fe ff 48 83 38 00 49 89 c5 0f 84 e5 00 00 00 49 8b 3e 48 85 ff 75 02 <0f> 0b ff 14 25 e0 dd c1 81 48 89 c2 4$
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253285] RIP [<ffffffff8168533f>] vmalloc_fault+0x11f/0x208
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253291] RSP <ffff880002f1d9b8>
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253307] ---[ end trace 0e83c1ffecd3a6f4 ]---
+ Oct 30 14:43:39 domU-12-31-39-14-64-79 kernel: [ 1121.253312] Fixing recursive fault but reboot is needed!
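
Because the error can hit commands other than cat, it may help to scan the syslog after each iteration and record which process triggered each report. A minimal sketch, assuming the log lines have the same shape as the excerpt above (the function name and result layout are made up for illustration):

```python
import re

BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
PID_RE = re.compile(r"Pid: (\d+), comm: (\S+)")


def find_kernel_bugs(lines):
    """Collect 'kernel BUG at' reports with the pid and comm that follow them."""
    reports = []
    for line in lines:
        match = BUG_RE.search(line)
        if match:
            reports.append({"file": match.group(1),
                            "line": int(match.group(2)),
                            "pid": None, "comm": None})
            continue
        match = PID_RE.search(line)
        # Attach the first "Pid: ..., comm: ..." line to the most recent report.
        if match and reports and reports[-1]["pid"] is None:
            reports[-1]["pid"] = int(match.group(1))
            reports[-1]["comm"] = match.group(2)
    return reports
```

The lines can come from an open file object, for example the syslog file on the affected machine.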
--
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to lxc in Ubuntu.
https://bugs.launchpad.net/bugs/1071910
Title:
lxc stop will hang forever
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1071910/+subscriptions