[ec2-beta] Ec2-beta Digest, Vol 5, Issue 14
Ben Hendrickson
ben at seomoz.org
Wed Apr 15 22:36:58 BST 2009
Chuck Short <chuck.short at canonical.com> wrote:
> Can you describe how you were setup so I can try reproducing the problem
> here?
Sure, although if you'd rather wait for a second person to report this
problem, that would also seem reasonable to me :-)
My email to the list on Mon, Apr 13 lists all of the commands I run to
set up a machine. That includes installing packages, setting up RAID
0, installing ReiserFS, and mounting md0. The commands were copied
out of a .sh file that my software copies to all of the machines via
scp and then executes via SSH connections, so it should be fairly
precise. The actual software we run is a lot of custom C++, so I'd
try reproducing the problem without it. Indeed, if you can't, then
perhaps this isn't worth your time!
Anyway, details on our software:
1) We read and write using the Linux open, read, write, and close
system calls.
2) We generally read and write in 1MB chunks.
3) We use the lzo library to implement lzop compatible compression.
4) We generally run between 2 and 10 processes at a time on each
machine to maximize throughput. Generally each process has two
pthreads, one that does compression and one that does everything else.
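For anyone without access to the C++ code, the access pattern in the
list above can be roughly approximated in shell. This is only a sketch
under stated assumptions: it uses GNU dd for the 1 MB reads and a
separate compressor process in place of the compression thread. The
names compress_worker, run_workers and COMPRESSOR are hypothetical and
are not part of the original software, which uses the lzo library
directly.

```shell
# Hypothetical stand-in for the custom C++ workers; assumes GNU dd.
# COMPRESSOR is an illustrative knob (the real software links the lzo library).
COMPRESSOR="${COMPRESSOR:-lzop}"

# Stream one file through the compressor in 1 MB blocks.
# $1 = input file, $2 = output file
compress_worker() {
    dd if="$1" bs=1M 2>/dev/null | "$COMPRESSOR" -c > "$2"
}

# Run several workers at once and report failure if any of them fails.
# $1 = worker count, $2 = input file, $3 = output directory
run_workers() {
    pids=""
    i=0
    while [ "$i" -lt "$1" ]; do
        compress_worker "$2" "$3/out.$i.lzo" &
        pids="$pids $!"
        i=$((i + 1))
    done
    failed=0
    for pid in $pids; do
        wait "$pid" || failed=1
    done
    return "$failed"
}
```

Something like "run_workers 4 /mnt/big.dat /mnt/out" would then keep
four compressing pipelines busy at once (both paths are placeholders).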
Anyway, if I was trying to reproduce this without our software, I
might try something like:
1) Start some number of large EC2 instances.
2) Run the commands from my email on Monday to set each up with RAID 0
and ReiserFS
3) Run: apt-get install lzop
4) Start up two screen sessions on each machine that loop compressing
and decompressing large files via lzop. That should ensure both
processors are maxed out, and lzop should exit with a nonzero status
when a checksum fails. For instance, make a unique empty directory
under /mnt for each screen session and in it run:
cat /dev/urandom | head -c 200000000000 | lzop -c > lf.lzo
bash -c "set -e -o pipefail; while true; do cat lf.lzo | lzop -d | lzop -c > lf.2.lzo; rm lf.lzo; cat lf.2.lzo | lzop -d | lzop -c > lf.lzo; rm lf.2.lzo; echo .; done"
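The same round trip can also be written as a small function, which
makes it easier to try with a small file before committing to a
multi-hour run. This is an untested-on-EC2 sketch that assumes bash.
The names roundtrip_stress and COMPRESSOR, and the size and round-count
arguments, are hypothetical additions for illustration.

```shell
# Sketch of the compress/decompress loop as a function; assumes bash.
# COMPRESSOR is a hypothetical knob; the steps above use lzop.
COMPRESSOR="${COMPRESSOR:-lzop}"

# $1 = working directory, $2 = bytes of random data, $3 = rounds (0 = forever)
roundtrip_stress() (
    # Runs in a subshell so pipefail and cd do not leak into the caller.
    # pipefail makes a failing decompressor fail the whole pipeline.
    set -o pipefail
    cd "$1" || exit 1
    head -c "$2" /dev/urandom | "$COMPRESSOR" -c > lf.lzo || exit 1
    round=0
    while [ "$3" -eq 0 ] || [ "$round" -lt "$3" ]; do
        "$COMPRESSOR" -d < lf.lzo | "$COMPRESSOR" -c > lf.2.lzo || exit 1
        rm lf.lzo
        "$COMPRESSOR" -d < lf.2.lzo | "$COMPRESSOR" -c > lf.lzo || exit 1
        rm lf.2.lzo
        round=$((round + 1))
        echo .
    done
)
```

For a quick check, something like
roundtrip_stress "$(mktemp -d /mnt/stress.XXXXXX)" 100000000 3
runs three rounds on a 100 MB file; passing 200000000000 and 0 instead
matches the size and endless loop of the commands above.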
Now, I haven't actually tried the above steps so no reason to follow
them literally! But our code does read and write files that are lzop
compatible, and we frequently have two processes running each with two
threads (one for compressing the output, one for everything else). So
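the above steps should get something fairly close to what we do. I
think the "set -e" should make the last command stop looping when lzop
fails its checksum, although bash only checks the exit status of the
last command in a pipeline unless "set -o pipefail" is also set.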
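To illustrate why the exit status of a pipeline matters here, the short
check below uses "false" as a stand-in for a decompressor that fails in
the middle of a pipeline. It assumes bash is installed; the variable
names are only for illustration.

```shell
# "false" plays the role of a failing "lzop -d" in the middle of a pipeline.
# By default bash reports the status of the last command (cat), so the
# failure is hidden; with pipefail the failure is reported.
status_without=$(bash -c 'false | cat; echo $?')
status_with=$(bash -c 'set -o pipefail; false | cat; echo $?')
echo "without pipefail: $status_without"   # prints 0
echo "with pipefail:    $status_with"      # prints 1
```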
Anyway, let me know if there is any information you'd find useful that
I've left out. And good luck! I know these sorts of issues are
incredibly hard to track down.
Ben