[Bug 1449951] Re: tomcat7 in ubuntu trusty is affected by the '50 days timeout bug' when run by the Oracle JDK 7

Bug Watch Updater 1449951 at bugs.launchpad.net
Thu Apr 30 09:45:45 UTC 2015


Launchpad has imported 14 comments from the remote bug at
https://bz.apache.org/bugzilla/show_bug.cgi?id=56684.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2014-06-30T12:21:22+00:00 Gustavo-l wrote:

After upgrading to Java 7, I've noticed some applications fail after
about 50 days f (for instance May 8 till June 27). This is suspiciously
close to 2 ^ 32 -1 milliseconds:

Jun 27, 2014 9:24:47 AM org.apache.catalina.core.StandardServer await
SEVERE: StandardServer.await: accept:
java.net.SocketTimeoutException: Accept timed out
        at java.net.PlainSocketImpl.socketAccept(Native Method)
        at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
        at java.net.ServerSocket.implAccept(ServerSocket.java:530)
        at java.net.ServerSocket.accept(ServerSocket.java:498)
        at org.apache.catalina.core.StandardServer.await(StandardServer.java:431)
        at org.apache.catalina.startup.Catalina.await(Catalina.java:676)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:628)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Jun 27, 2014 9:24:47 AM org.apache.coyote.http11.Http11AprProtocol pause
INFO: Pausing Coyote HTTP/1.1 on http-51080
Jun 27, 2014 9:24:47 AM org.apache.coyote.ajp.AjpAprProtocol pause
INFO: Pausing Coyote AJP/1.3 on ajp-51009
Jun 27, 2014 9:24:48 AM org.apache.catalina.core.StandardService stop
INFO: Stopping service Catalina

After this, the application is no longer listening.

I suspect this may be due to a change in behavior in Java. See:

$ $JAVA_HOME/bin/java -version
java version "1.7.0_60"
Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)

$ strace -f -o /tmp/hh groovy -e 'new ServerSocket(60001, 1, InetAddress.getByName("localhost")).accept()'
File contains:
2212  bind(47, {sa_family=AF_INET6, sin6_port=htons(60001), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
2212  listen(47, 1)                     = 0
2212  poll([{fd=47, events=POLLIN|POLLERR}], 1, 4294967295 <unfinished ...>

$ $JAVA_HOME/bin/java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

2661  bind(46, {sa_family=AF_INET6, sin6_port=htons(60001), inet_pton(AF_INET6, "::ffff:127.0.0.1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
2661  listen(46, 1)                     = 0
2661  accept(46,  <unfinished ...>

While java6 calls accept(), java7 first waits for an event on the socket
with poll(). However, it uses a timeout of 4294967295 milliseconds.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/0

------------------------------------------------------------------------
On 2014-07-07T21:36:26+00:00 Markt-u wrote:

Hmm. My initial impression is that that is a JRE bug. I've contacted
someone at Oracle (see the dev list) who has been helpful in pointing us
in the right direction to report bugs. Keep an eye on the dev list for
progress on that.

However, I can't repeat the results you report. When I run strace on the
same Java version as you are using, I see a call to poll with a timeout
of -1 which I am assuming is an infinite timeout.

You'll need to provide enough information about your environment to
enable us (and Oracle) to repeat this otherwise this is going to get
resolved as WORKSFORME.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/1

------------------------------------------------------------------------
On 2014-07-07T22:44:30+00:00 Gustavo-l wrote:

I tested this on Ubuntu Trusty amd64. I can also reproduce on my home
machine, which runas arch (amd64).

You can try this Vagrantfile:

Vagrant.configure('2') do |config|
  config.vm.box = "ubuntu/trusty64"

  provision_script = <<-eos
      cat > ~vagrant/run-test << eod
#!/bin/bash

# only one version of Java installed; no need to set JAVA_HOME
# so Groovy picks the correct one
aptitude install -y oracle-java7-installer groovy
java -version
strace -f -o /tmp/hh groovy -e 'new ServerSocket(60001, 1, InetAddress.getByName("localhost")).accept()'
eod
      chmod +x ~vagrant/run-test

      apt-get install -y python-software-properties strace vim
      add-apt-repository -y ppa:webupd8team/java
      apt-get update
    eos

  config.vm.provision "shell", inline: provision_script
end

after ssh'ing, run sudo ./run-test, accept the license and then Ctrl+C.
You can then instpect /tmp/hh

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/2

------------------------------------------------------------------------
On 2014-07-07T23:37:22+00:00 Gustavo-l wrote:

The strace output may actually be a bug in strace because poll() takes a
signed int as timeout; it would seem strace is interpreting the value as
an unsigned int. Indeed:

$ cat a.c && gcc a.c && strace ./a.out 
#include <poll.h>
#include <stdio.h>

void main() {
	struct pollfd fd = { .fd = fileno(stdin), .events = POLLIN };
	int timeout = -1;

	poll(&fd, 1, timeout);
}
execve("./a.out", ["./a.out"], [/* 57 vars */]) = 0
brk(0)                                  = 0xeb5000
(...)
poll([{fd=0, events=POLLIN}], 1, 4294967295^CProcess 6565 detached

That leaves me (short of a bug in glibc or the kernel) with no
explanation for the exception I got, which takes quite a while to
reproduce, though if poll doesn't use the monotonic clock, there may be
an faster way. I'll check tomorrow.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/3

------------------------------------------------------------------------
On 2014-07-07T23:53:04+00:00 Gustavo-l wrote:

A quick observation: the poll syscall used to take a long, and now takes
an int. Maybe strace hasn't been updated yet:

http://lwn.net/Articles/483078/

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/4

------------------------------------------------------------------------
On 2014-07-08T11:59:21+00:00 Markt-u wrote:

I was testing on Ubuntu Precise. After updating to Trusty I see the same
value as you of 4294967295. That, at least, explains why we were seeing
different values.

I've looked through both the Tomcat code and the Java 7 code and I don't
see anything wrong (although I am no C/C++ expert). Neither do I see
anything that explain the behaviour you are seeing.

I think the next steps are to see if you can reproduce this issue with a
test case that doesn't take ~50 days to run.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/5

------------------------------------------------------------------------
On 2014-07-08T12:08:51+00:00 Knst-kolinko wrote:

Javadoc for ServerSocket.accept() says that SocketTimeoutException can
be thrown by the method "if a timeout was previously set with
setSoTimeout". [1]


[1] http://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#accept%28%29

BTW,
Javadoc for ServerSocket.setSoTimeout() says that to set an infinite timeout one uses the value of "0". Is somebody confusing '0' and '-1'?

I think the javadoc for ServerSocket.accept() implies that the default
timeout is infinite. So I think it is a JRE bug.


In general it makes some sense to protect against this. A timeout is not a "receiving a valid shutdown command" event, and encountering a timeout does not make the server socket an invalid one, so we should be able to continue.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/6

------------------------------------------------------------------------
On 2014-07-08T12:37:43+00:00 Markt-u wrote:

(In reply to Konstantin Kolinko from comment #6)
> BTW,
> Javadoc for ServerSocket.setSoTimeout() says that to set an infinite timeout
> one uses the value of "0". Is somebody confusing '0' and '-1'?

Not that I can see having looked at the JRE code.

> I think the javadoc for ServerSocket.accept() implies that the default
> timeout is infinite. So I think it is a JRE bug.

That is a possibility, but having looked at the JRE code, I don't see
where the problem is.

> In general it makes some sense to protect against this. A timeout is not a
> "receiving a valid shutdown command" event, and encountering a timeout does
> not make the server socket an invalid one, so we should be able to continue.

Agreed. There is no reason I can think of that accept should time out so
we can certiainly add some protection against this issue here. We can
probably add some useful debug info as well.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/7

------------------------------------------------------------------------
On 2014-07-08T20:32:22+00:00 Markt-u wrote:

Note that the protection and debug info is going to go into Tomcat 8. It
would help a lot if you could upgrade at least one system experiencing
this error to Tomcat 8.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/8

------------------------------------------------------------------------
On 2014-07-08T22:09:11+00:00 Markt-u wrote:

Workaround added to 8.0.x for 8.0.10 onwards. If the timout occurs a
warning is logged that reports how long accept was blocking for before
the timeout.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/9

------------------------------------------------------------------------
On 2014-07-09T10:35:47+00:00 Markt-u wrote:

The workaround and simple debug code was pretty simple so I have
backported it to 7.0.x (it will be in 7.0.55 onwards) and proposed if
for 6.0.x.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/10

------------------------------------------------------------------------
On 2014-07-29T09:32:11+00:00 Markt-u wrote:

Work around added to 6.0.x for 6.0.42 onwards.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/11

------------------------------------------------------------------------
On 2014-08-05T03:20:21+00:00 Fachhoch wrote:

is this fix available in tomcat 7.0.50?  I am getting this error in
jdk1.7 64 bit tomcat 7.0.50

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/12

------------------------------------------------------------------------
On 2014-08-05T03:27:38+00:00 Chuck-caldarale wrote:

(In reply to fach hoch from comment #12)
> is this fix available in tomcat 7.0.50?

No, it's in 7.0.55, as stated in comment #10.

Reply at:
https://bugs.launchpad.net/ubuntu/+source/tomcat7/+bug/1449951/comments/13


** Changed in: tomcat7
       Status: Unknown => Fix Released

** Changed in: tomcat7
   Importance: Unknown => High

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to tomcat7 in Ubuntu.
https://bugs.launchpad.net/bugs/1449951

Title:
  tomcat7 in ubuntu trusty is affected by the '50 days timeout bug' when
  run by the Oracle JDK 7

To manage notifications about this bug go to:
https://bugs.launchpad.net/tomcat7/+bug/1449951/+subscriptions



More information about the Ubuntu-server-bugs mailing list