program dies when RPC server is down

by Anonymous on 13 Dec 2008


When the RPC server is down or busy, the program exits with an error like this:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/", line 442, in __bootstrap
  File "/usr/share/videocache/", line 792, in run
  File "/usr/share/videocache/", line 534, in squid_part
  File "/usr/lib64/python2.4/", line 1096, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.4/", line 1383, in __request
  File "/usr/lib64/python2.4/", line 1129, in request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.4/", line 1243, in send_content
  File "/usr/lib64/python2.4/", line 804, in endheaders
  File "/usr/lib64/python2.4/", line 685, in _send_output
  File "/usr/lib64/python2.4/", line 652, in send
  File "/usr/lib64/python2.4/", line 636, in connect
    raise socket.error, msg
error: (110, 'Connection timed out')

The line numbers may vary, depending on the stage at which the RPC server stopped responding.

I suggest catching these exceptions and simply returning the original URL to squid (if in squid_part), so that service continues to flow.
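A minimal sketch of that fallback, written for Python 3 (where xmlrpclib is named xmlrpc.client). The function name rewrite_url and the remote method name lookup are hypothetical, chosen only for illustration; they are not taken from the videocache source.

```python
import xmlrpc.client


def rewrite_url(proxy, url):
    """Ask the RPC server for a rewritten URL; fall back to the original.

    `proxy` is expected to behave like xmlrpc.client.ServerProxy.
    `lookup` is a hypothetical remote method name used for illustration.
    """
    try:
        return proxy.lookup(url)
    except (OSError, xmlrpc.client.Error):
        # OSError covers socket errors such as "Connection timed out" or
        # "Connection refused"; xmlrpc.client.Error covers faults and
        # protocol errors reported by the RPC layer.
        return url
```

With this shape, a dead RPC server costs one failed call per request, and squid still receives a usable URL.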

There is a small issue with this approach: if the RPC server is listening and accepting connections but does not respond to the request, the call still hangs. I couldn't find any variable that sets a "response timeout".
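One option that may help here is the process-wide default socket timeout from the standard library. socket.setdefaulttimeout() applies to sockets created after the call, including those opened by the XML-RPC client, so a blocked connect or read raises a socket timeout error instead of hanging forever. The sketch below assumes a process-wide setting is acceptable; the helper name set_rpc_timeout and the 5-second value are illustrative assumptions.

```python
import socket

# Assumed value for illustration; tune it to how long squid can wait.
RPC_TIMEOUT_SECONDS = 5.0


def set_rpc_timeout(seconds=RPC_TIMEOUT_SECONDS):
    """Set the default timeout for all sockets created from now on.

    This is a process-wide setting: it affects every new socket in the
    program, not only the ones used for RPC calls.
    """
    socket.setdefaulttimeout(seconds)
    return socket.getdefaulttimeout()
```

Calling set_rpc_timeout() once at startup, before any RPC proxy is used, should be enough. The resulting timeout error is a socket error, so it can be caught together with the other connection failures.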

5 Answers

by Kulbir Saini on 13 Dec 2008

I'll have a look at it and try to catch exceptions at all stages. Thanks for the feedback :)

by imriz on 13 Dec 2008

I'm not much of a Python programmer, but here is a diff containing my changes.

My changes include more 'try' blocks around the RPC calls.
With further testing, it seems that if SO_REUSEADDR is used with the RPC server, it will bind to the port, but on the next request it hangs on recvfrom.

Therefore, I've removed the SO_REUSEADDR option, and instead made the script exit if it cannot bind to the port - squid will rerun the process, and at some point (when all the TIME_WAIT connections are closed) it will be able to rebind to the port and start normally.
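The bind-or-exit behaviour described above could look roughly like the following. This is a sketch using a plain TCP socket, not the actual patch; the function name bind_or_exit and the exit code are assumptions for illustration.

```python
import socket
import sys


def bind_or_exit(host, port):
    """Bind a listening socket without SO_REUSEADDR; exit if the port is busy."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR is deliberately not set here.
    try:
        sock.bind((host, port))
    except OSError as exc:
        sock.close()
        sys.stderr.write("cannot bind to %s:%d: %s\n" % (host, port, exc))
        # Exit so that the parent process (squid, in this thread) can
        # restart the helper and retry the bind later.
        sys.exit(1)
    sock.listen(5)
    return sock
```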

Your project is amazing, and I'm really really thankful for your efforts.

I think that the main weakness of the project lies within the RPC server - it is not robust, and doesn't recover from errors easily.

by Kulbir Saini on 14 Dec 2008

Thanks a lot, Imriz, for the patch. I have included all your suggestions in the base code. You'll see all of them in the next version :)

Thanks for the compliments and suggestions.

by imriz on 14 Dec 2008

Hi Kulbir,

For some reason, the RPC server freezes after some time (could be a couple of hours). By freezing I mean that the RPC server's TCP port still answers (i.e. connections are accepted), but there is no response from the RPC server.

Could you please go over my changes and make sure I did not introduce a new bug? :)

by Kulbir Saini on 14 Dec 2008

Yeah. Sure. I'll test it before committing :)
