user namespace delta over 3.7
Eric W. Biederman
ebiederm at xmission.com
Mon Nov 19 15:48:36 UTC 2012
Colin Ian King <colin.king at canonical.com> writes:
> On 14/11/12 20:55, Serge Hallyn wrote:
>> Quoting Tim Gardner (tim.gardner at canonical.com):
>>> On 11/06/2012 09:36 AM, Serge Hallyn wrote:
>>>> Hi,
>>>>
>>>> the core of user namespace code has landed upstream, however some more
>>>> is needed to run full ubuntu containers in a user namespace. Some of
>>>> this will land in 3.8, but probably not all. Eric's development tree
>>>> is at http://git.kernel.org/?p=linux/kernel/git/ebiederm/user-namespace.git;a=summary
>>>>
>>>> I have pushed that tree on top of a recent raring tree at
>>>> git://kernel.ubuntu.com/serge/quantal-userns.git in branch
>>>> master.oct25.userns-v70. It consists of 84 patches (including 5 just
>>>> updating under debian/, one by me fix to account for ubuntu delta, and
>>>> one not (yet) in Eric's tree to allow tmpfs mounts in a container),
>>>> which I can git-email if desired. The built kernel is in
>>>> ppa:serge-hallyn/userns-natty and does allow me to boot a full ubuntu
>>>> container in a user namespace - meaning every root owned process and
>>>> file is actually owned by userid 100000 on the host and contained.
>>>>
>>>> I'm sending this now in the hopes that whatever bits don't land in
>>>> 3.8 can be pushed onto the raring kernel. Our goal this cycle is to
>>>> support user namespaces, and next cycle to support completely
>>>> unprivileged creation and starting of containers.
>>>>
>>>> -serge
>>>>
>>>
>>> Serge - how about a pull request for a branch that has been rebased
>>> on Raring master-next ? I took a quick stab at it and quickly ran
>>> into uapi transition conflicts (I think).
>>
>> A successfully built kernel is at
>> git://kernel.ubuntu.com/serge/quantal-userns.git (branch
>> master-next.nov14.userns which should be the default).
>>
>> -serge
>>
>
> I've got some questions and/or observations about the following commits:
>
> b3f4f523c8c20f2ca2ac031900f1a252d750ec1d
> debian changes to build in ppa
>
> ..this fiddles around with the skipabi, skipmodules to allow building in
> a PPA, but we should not pull that into the raring kernel.
>
> 1c428901dcae93832f13a01492539cb77fea6c85
> net: Allow opening an af_unix socket
>
> sock_open() has:
> /* file->f_flags??? */
> //file->f_flags = O_RDWR | (flags & O_NONBLOCK);
>
> ..the comment seems to be alluding to the fact we're not sure if we should be
> setting f->f_flags and that the code was put in during development (for testig?)
> and then commented out. Anyhow, it's confusing and I'm now not sure what this
> is meant to be doing. Should this be removed?
Yes. That patch is one of the patches at the end of my user namespace
tree that I have been playing with but is not fully baked. The goal of
that change is to allow us to just open a unix domain sockect with open
instead of connect.
>
> 8b16d00119a210ad9ed6d62fd0addb37f23c683b
> fuse: Teach fuse how to handle the pid namespace.
>
> convert_fuse_file_lock() has:
>
> fl->fl_pid = pid_vnr(find_pid_ns(ffl->pid, fc->pid_ns));
>
> is it seems possible (but unlikely) for find_pid_ns() to return NULL which
> passes NULL into pid_vnr() which in turn passes NULL into
> pid_nr_ns() which returns 0. Is a zero pid of fl->fl_pid valid?
Good question. My fuse state is paged out at the moment.
> 05ee1be568bf321167fe93219aaac029a49b3d3a
> devpts: Remove the devpty cleanup special case.
>
> in ptmx_open():
>
> /* Find the devpts instance we are working with */
> mnt = devpts_mntget(filp);
>
> getpts_mntget() can return ERR_PTR(-ENODEV), so mnt probably needs checking for
> this kind of unlikely failure case.
You mean the IS_ERR(mnt) check on the next line in ptmx_open?
Certainly I see it there in my tree.
> ebb713c0e5b2bbedcb4c94715a3973c0b084e723
> devpts: Make the newinstance option historical
>
> parse_mount_options():
>
> case Opt_newinstance: this is now a historical mount option and now silently
> does nothing. Perhaps we should print some kind of warning or info message to
> indicate this just to warn users that this is now being ignored, however this is
> documented in the changes in Documentation/filesystems/devpts.txt so maybe this
> is totally unnecessary.
> 7350816a4fea9a9368f94a81cd44c3eb31fbbf3d
> net: Push capable(CAP_NET_ADMIN) into the rtnl methods
>
> Comment in this patch:
> "Later patches will remove the extra capable calls from methods
> that are safe for unprivilged users."
>
> ..are these later patches in this patch set, if so, which ones are they?
I assume Serge has picked them up. They are the net patches where
capable(CAP_NET_ADMIN) in netlink methods are deleted. Basically for
ipv4, ipv6 and bridge as I recall. With all of the weird protocols that
don't work in network namespaces were left with capable(CAP_NET_ADMIN)
calls.
> 375cb2956423a172e017c1bfbc0c32d96250f41f
> net: Don't export sysctls to unprivileged users
>
> __ip_vs_lblc_init():
>
> the following change added a '\' which looks like a typo:
>
> - kfree(ipvs->lblc_ctl_table);
> + kfree(ipvs->lblc_ctl_table);\
> return -ENOMEM;
It is definitely a typo.
It is the best kind of typo that causes no build problems and gets
merged before anyone notices. Ouch!
I have just sent a patch to David so this doesn't linger in net-next.
> d4856b6128ec5735e654259407fd8573faec4b88
> userns: Convert nfs and nfsd to use kuid/kgid where appropriate
>
> find_guid():
>
> When a gid is not found then a new one is added to the aces[] array. I don't see
> any bounds checking on this, so can this potentially fall off the end of the
> aces[] array at some point?
init_state allocates enough space so that this is not needed. This
logic is unchanged from the existing nfsd logic. The only practical
change here is that find_uid and find_gid are split into separate
functions for type safety.
Eric
More information about the kernel-team
mailing list