Saturday, April 30, 2011

Intermittent HTTP 500 Error (Response Code)

In this blog post, I would like to write about an intermittent HTTP 500 error happening in an AJAX-based application. This is not the typical internal server error thrown by the application server, which also shows up as an HTTP response with 500 as the response code. The typical symptom is that the request does not even reach the back-end server, but a 500 response code is still returned.

The application is an AJAX application, built using GWT in the front end and Spring in the back end. It was accessed using Internet Explorer on the client desktop.
The back-end infrastructure has an Apache web server routing requests to the WebSphere application server. The application is heavily interactive and makes multiple RPCs (Remote Procedure Calls) to the back end to fetch data. This essentially means that the application was making a large number of HTTP requests to communicate with the back-end server.

The issue we were facing was that 1% of the requests were getting a 500 response code.

We tried analyzing the reason for the 500 response code. To understand the issue, we need to know the details of the communication between the browser and the server. When the browser makes an HTTP request to the server, it sends the HTTP request header and the HTTP request body. The web server at the back end receives the header and body and forwards them to the application server. In the cases where the HTTP 500 status code was returned to the application, it turned out that the error was thrown by the Apache web server. Apache has a thread (connection) waiting for the request header and body to arrive so that it can forward them to the application server. In very rare scenarios, the header of the request arrives and the request body never does. Apache waits for 1 minute (the timeout duration) and then responds with a 500 error code.
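To make the "header arrives, body never arrives" situation concrete, here is a minimal Python sketch that compares the Content-Length promised in a request header with the body bytes actually received. The function name, the request path, the host and the payload are all made up for illustration; this is not how Apache implements the check, only a way to picture what the server is waiting for.

```python
def body_bytes_missing(raw_request: bytes) -> int:
    """Return how many body bytes are still missing from a raw HTTP request.

    0 means the request is complete; a positive number means the header
    promised more body than has arrived so far.
    """
    header_part, separator, body = raw_request.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("header is not terminated yet")

    content_length = 0
    # Skip the request line, then scan the header fields.
    for line in header_part.split(b"\r\n")[1:]:
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())

    return max(content_length - len(body), 0)


# Hypothetical GWT-RPC style POST; path, host and payload are assumptions.
payload = b"7|0|4|example|payload|"
header = (
    b"POST /app/rpc HTTP/1.1\r\n"
    b"Host: example.invalid\r\n"
    b"Content-Type: text/x-gwt-rpc; charset=utf-8\r\n"
    b"Content-Length: " + str(len(payload)).encode("ascii") + b"\r\n"
    b"\r\n"
)

complete_request = header + payload   # the normal case
header_only_request = header          # what the server saw in the failing case
```

For `complete_request` nothing is missing, while for `header_only_request` the whole payload is still outstanding, so a server that waits for the body will sit there until its timeout fires.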

The next challenge was to find out why, in some specific cases, only the request header was arriving and the request body was not.
We found that this was happening only with Internet Explorer (IE6, IE7 and IE8) and not with Firefox. After placing a network monitoring tool on the client side, we figured out that the issue occurred when IE tried to retransmit the data on a new connection after its attempt to send the data on the first connection had failed. In the scenarios we saw, IE was trying to send the request on an already open (HTTP 1.1 persistent) connection to the back-end server. However, this connection had already been closed by the back-end server after the KeepAliveTimeout period was reached. IE, not knowing that the connection was closed, attempted to send the data on that connection. By design, IE should then retransmit the data on a new connection to the back-end server. In some scenarios, IE forgets to send the HTTP body and sends only the HTTP header. (IE Bug #). Though this IE bug is reported only for IE6, we found that it can occur in all versions of IE, including IE8.
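The timing window behind this failure can be sketched as a small Python model. It assumes that the client drops its own idle connections after 60 seconds (the IE default) and that the server closes them after its keep-alive timeout. The function name and the three state labels are invented for this illustration and do not come from IE or Apache.

```python
def connection_state_on_reuse(idle_seconds, server_keepalive_timeout,
                              client_idle_timeout=60):
    """Classify a kept-alive connection at the moment the client wants to reuse it.

    "discarded": the client already dropped it and will open a new connection.
    "stale":     the server closed it, but the client still believes it is open.
    "open":      both sides still consider the connection usable.
    """
    if idle_seconds >= client_idle_timeout:
        return "discarded"
    if idle_seconds >= server_keepalive_timeout:
        return "stale"
    return "open"


# Hypothetical server timeout of 15 seconds, chosen only for the example.
for idle in (10, 30, 70):
    print(idle, connection_state_on_reuse(idle, server_keepalive_timeout=15))
```

Only the "stale" state forces the retransmission on a new connection, which is where the missing body was observed. In this model that state exists only while the server timeout is shorter than the client's idle limit.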

There are multiple fixes to the problem:

The first, and the preferred one, is the fix recommended by Microsoft in the bug details. It involves registry changes, along with the headache of applying them across all the end users' machines.

The second fix is to ensure that IE only ever works with valid connections (i.e. connections not closed by the back-end server). This is possible by increasing the KeepAliveTimeout value in the back end to more than 60 seconds. The 60-second figure comes from the fact that IE, by default, removes unused open connections to the back-end server after 60 seconds. However, a higher KeepAliveTimeout also means that Apache would hold a large number of idle connections, each reserved for a client.
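As a rough way to check the second fix, the Python sketch below reads the `KeepAliveTimeout` directive out of Apache configuration text and tests whether it exceeds IE's 60-second idle limit. The helper names and the sample configuration values (including the 65) are assumptions made for the example, and the parser only understands a plain number of seconds.

```python
import re

IE_IDLE_TIMEOUT_SECONDS = 60  # IE default mentioned above


def keepalive_timeout_seconds(config_text):
    """Return the last KeepAliveTimeout value (in seconds), or None if unset."""
    value = None
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore commented-out directives
        # Directive names are case-insensitive in Apache configuration.
        match = re.match(r"(?i)KeepAliveTimeout\s+(\d+)\s*$", stripped)
        if match:
            value = int(match.group(1))
    return value


def outlasts_ie(config_text):
    """True if the server keeps idle connections open longer than IE does."""
    timeout = keepalive_timeout_seconds(config_text)
    return timeout is not None and timeout > IE_IDLE_TIMEOUT_SECONDS


# Sample configuration text; the values are illustrative only.
sample_config = """
KeepAlive On
MaxKeepAliveRequests 100
# KeepAliveTimeout 15
KeepAliveTimeout 65
"""

print(keepalive_timeout_seconds(sample_config), outlasts_ie(sample_config))
```

The value that passes this check should still be weighed against the number of idle connections Apache can afford to hold.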
