ec2 networking containers

Clint Byrum clint at ubuntu.com
Thu May 17 16:46:50 UTC 2012


Excerpts from Kapil Thangavelu's message of Tue May 15 16:38:20 -0700 2012:
> Excerpts from Clint Byrum's message of 2012-05-15 14:38:07 -0700:
> > Excerpts from Kapil Thangavelu's message of Fri May 11 10:59:40 -0700 2012:
> > > Hi Folks,
> > > 
> > > Noodling on networking containers in EC2 (only TCP, UDP, ICMP). Soren
> > > recommended some things to investigate at the OpenStack conference.
> > > 
> > > http://www.tinc-vpn.org/
> > > http://www.ntop.org/products/n2n/
> > > 
> > > There's a nice presentation on using tinc on EC2 with some modifications, sans crypto:
> > > http://www.dtmf.com/ec2_vpn_fosdem2011.pdf
> > 
> > Thanks for sending these, Kapil. Tinc definitely looks like it is focused
> > on the simplest solution to this fairly complex problem.
> > 
> 
> Indeed it is complex, and even then it doesn't fully solve the issue. Deploy
> multiple units of wordpress to a machine and expose the service: it fails
> without more complexity. Alternatively, try units of different web apps on the
> same machine. Only dynamic port allocation per unit really allows for this,
> given the port conflicts that are going to be common around ports 80/443.
> 

Yeah, the port problem is huge, and as long as we have to deal with IPv4 we'll
have to deal with port starvation at this level.
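
To make the dynamic port allocation you describe concrete, here is a minimal
sketch of units on one machine sitting behind a frontend that owns port 80
(nginx is just an example frontend; the addresses and ports are invented):

    # Frontend owns port 80; units bind whatever ports they were allocated.
    upstream wordpress {
        server 10.0.0.5:8001;   # wordpress/0
        server 10.0.0.5:8002;   # wordpress/1, same machine, different port
    }

    server {
        listen 80;
        location / {
            proxy_pass http://wordpress;
        }
    }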

> > I still feel very strongly that EC2 is properly segmented for any
> > real-world workload. The m1.small is about as powerful as a netbook,
> > and anything for which one cannot justify matching dollars to CPU time
> > can be run on a t1.micro. I don't really think juju should be focused
> > on smaller use cases than what a t1.micro can solve.
> 
> While I agree with that in principle, i.e. that we should use provider
> instance sizing to allow for appropriate workloads, it tends to run into
> real-world problems. In EC2, for example, t1.micros are a joke with respect
> to getting real-world work done, and beyond that, larger machines have better
> CPU/IO capability even ignoring the instance-size differential: the bigger
> you get, the less you have to share. There's a reason that the various
> PaaS/SaaS offerings in the cloud use larger instance types and segment for
> isolation. None of them, AFAIK, bothers with an overlay network in EC2,
> though; they just use dynamic port allocation in conjunction with front-end
> load balancers/reverse proxy caches.
> 
> http://www.quora.com/Do-different-EC2-instance-types-have-different-EBS-I-O-performance-characteristics
> http://perfcap.blogspot.com/2011/03/understanding-and-using-amazon-ebs.html
> 
> https://github.com/cloudfoundry/vcap/tree/master/warden
> https://devcenter.heroku.com/articles/dyno-isolation
> 

Thanks for the links; I had not seen this argument or those measurements
before. It makes a lot of sense, even though I think it's just working
around limitations in Amazon's implementation. Perhaps users that need
better I/O should go to a cloud provider that gives it to them... that
may very well lead Amazon to offer it too.

That said, it would definitely be a plus if Juju users could also easily
work around Amazon's issues if, for whatever reason, they cannot move to
another cloud vendor.

> > 
> > I also don't expect that other cloud providers will deviate much from
> > the mix that EC2 has established, as they seem to have spent the last 5+
> > years getting this right and responding to real customer needs. I'm sure
> > RAX will have their own definition of what a "CPU" is and so will HP.
> > However, ultimately, people will ask for partial CPU instances, and juju
> > should expect that much from them, or expect that users will migrate
> > to EC2.
> 
> While that's true in some respects, I suspect other cloud providers based on
> OpenStack will in future open richer networking APIs to consumers, but I doubt
> that will give us anything approaching the capability we're looking for
> (segmenting provider instance types onto a soft overlay network).
> 

Some day, sure. For now, all cloud providers that are not Amazon are
trying to catch up to them, and will probably focus on offering a slightly
better version of the same model. Once there is competition on price,
then we'll see some really interesting differentiation.

> > 
> > For bare metal, juju should be focused on either HPC cases, or deploying
> > virtualization solutions.
> 
> Agreed, although I'd broaden HPC to anything I/O- or CPU-bound, i.e. big data
> as well.
> 

When I say HPC, I mean big data. :)

> > 
> > For HPC, putting two services on a node makes no sense because the
> > entire node should be 100% taxed on the HPC service. This is useful in
> > bare metal because the virtualization overhead implied by "the cloud"
> > may be enough to justify dedicated servers. Of course, one might argue
> > for OpenStack+LXC at that point.
> > 
> 
> Agreed; perhaps even juju with LXC (a single unit per machine for easy re-use).
> 
> > For OpenStack, I see a real need to be able to combine mysql+rabbit+cloud
> > controller because they will eat up a real server each. This won't matter
> > in real deployments, but users seem to report having 2 - 5 machines
> > for test clouds, not 9. This also is probably the case for test HPC
> > deployments too.
> 
> Agreed. This came up in a few conversations at UDS as one of the primary
> drivers for having some additional placement consideration, namely the lack
> of physical hardware for OpenStack deploys. I think we can handle this case
> fairly well via a juju-jitsu deploy-to extension outside of the core...
> 

Interesting idea! :)
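
To make that concrete, I could imagine something like the following (purely
hypothetical syntax; neither the subcommand nor its arguments exist yet):

    # Deploy the first service normally, then co-locate the others on the
    # machine it landed on (machine 1 here is an assumption).
    juju deploy mysql
    jitsu deploy-to 1 rabbitmq-server
    jitsu deploy-to 1 nova-cloud-controller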

> > 
> > Given the type of network that one can expect with MaaS vs. EC2, I
> > think we can reasonably expect that we can just configure containers in
> > bridged mode and use them without an extra virtual network for them to
> > ride on. This is where I would suggest juju devote resources before
> > we drag a whole virtual network into play.
> 
> So you're saying focus on juju units with LXC bridged mode only on MaaS, and
> ignore non-bare-metal use cases around this?
> 

I'm saying provide the option to use multiple LXC units per machine with
bridged networking, but have it turned *off* by default. We can certainly
provide the option in the EC2 provider, and then if a person's private cloud
offers the ability for this to work, it just works too. For MaaS, maybe we
turn it on by default, or maybe we just suggest that people turn it on when
they are in test mode and can use it as such.
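
For clarity, the bridged mode I mean is just the stock LXC veth-on-a-host-bridge
setup, which only works where the provider's network allows it (bare metal/MaaS
today). A rough sketch, with the bridge name and container name assumed:

    # Host side (/etc/network/interfaces): bridge the physical NIC.
    auto br0
    iface br0 inet dhcp
        bridge_ports eth0

    # Container side (/var/lib/lxc/wordpress-0/config):
    lxc.network.type = veth
    lxc.network.link = br0
    lxc.network.flags = up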

> In the future, MaaS has all kinds of options and possible integrations at the
> physical and soft networking levels, with respect to both topology discovery
> and manipulation.
> 
> > 
> > Also, for the "I just want to clean up the charm in the 1:1 machine:service
> > deployment" case, I think we should take a good look at using chroot.
> > Upstart already supports it, so we can install and run all the upstart
> > jobs we need. Schroot is already able to kill all processes and clean up
> > all files in a schroot. Since we wouldn't be isolating the chroot from
> > other services, just using it for cleanup purposes, this would be a very
> > simple way to get "containers" without the network complexity.
> > 
> 
> We could just as easily use LXC without network namespaces (Heroku does
> this), and that gives the added resource-accounting benefits of LXC. In the
> context of juju I can't think of any place where a chroot is preferable
> to LXC.
> 

LXC without network namespacing does not work today, because upstart cannot
start inside an LXC container that is not network-namespaced away from the
system upstart.

So "works in an LTS" vs. "could be made to work" is the only reason I bring
this up. We can use chroot now, and then when upstart grows support for
running in an LXC container without network namespacing in the next LTS, we
can switch to it.
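
A minimal sketch of the chroot route, with made-up paths, a hypothetical job
name, and a stand-in service process; upstart's chroot stanza runs the job
inside the tree, and ending an schroot session handles the teardown:

    # /etc/init/wordpress-0.conf  (job name and paths are hypothetical)
    description "unit confined to a chroot for cleanup, not isolation"
    start on runlevel [2345]
    stop on runlevel [!2345]
    chroot /srv/juju/units/wordpress-0
    exec /usr/bin/python -m SimpleHTTPServer 8001   # stand-in for the real service

    # Teardown: ending the schroot session kills any leftover processes and
    # removes the session's files (chroot name "precise" is assumed):
    #   schroot --begin-session -c precise -n wordpress-0
    #   schroot --end-session -c wordpress-0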


