VideoCache

program dies when RPC server is down

by Anonymous on 13 Dec 2008

Hi,

When the RPC server is down or busy, the program exits with an error like this:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/share/videocache/videocache.py", line 792, in run
    squid_part()
  File "/usr/share/videocache/videocache.py", line 534, in squid_part
    video_id_pool.add(video_id)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1129, in request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1243, in send_content
    connection.endheaders()
  File "/usr/lib64/python2.4/httplib.py", line 804, in endheaders
    self._send_output()
  File "/usr/lib64/python2.4/httplib.py", line 685, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.4/httplib.py", line 652, in send
    self.connect()
  File "/usr/lib64/python2.4/httplib.py", line 636, in connect
    raise socket.error, msg
error: (110, 'Connection timed out')

The line numbers may vary, depending on the stage at which the RPC server stopped responding.

I suggest catching the exceptions and simply returning the original URL to squid (if in squid_part), so that service continues to flow.
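To illustrate the idea, here is a minimal sketch, not the actual videocache code. The names safe_rpc, squid_reply and queue_video are made up for the example; queue_video stands in for the video_id_pool.add call from the traceback. It is written with Python 3 module names (on Python 2 the module is xmlrpclib), and the exact set of exceptions worth catching is an assumption.

```python
import socket
import xmlrpc.client  # named xmlrpclib on Python 2


def safe_rpc(call, *args, default=None):
    """Run an RPC call; return `default` instead of raising if it fails.

    `call` is any callable that performs the RPC (hypothetical helper,
    not part of videocache).
    """
    try:
        return call(*args)
    except (socket.error, xmlrpc.client.Error):
        # socket.error is an alias of OSError on Python 3 and covers
        # refused connections and timeouts; xmlrpc.client.Error covers
        # faults and protocol errors reported by the XML-RPC layer.
        return default


def squid_reply(url, video_id, queue_video):
    """Try to queue the video, but always hand squid a usable URL."""
    safe_rpc(queue_video, video_id)
    # Whatever happened to the RPC call, squid gets the original URL back.
    return url
```

With something like this, a dead RPC server costs one failed call per request instead of killing the thread.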

There is one small issue with this approach: if the RPC server is listening and accepting connections but does not respond to the request, the call will hang. I couldn't find any variable that sets a "response timeout".
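One possible workaround, as far as I can tell, is the process-wide default from socket.setdefaulttimeout(), which applies to every socket created after the call, including the one the XML-RPC client opens. A minimal sketch, assuming Python 3 module names (xmlrpclib on Python 2); call_with_timeout is a made-up helper, not something that exists in videocache:

```python
import socket
import xmlrpc.client  # named xmlrpclib on Python 2


def call_with_timeout(url, timeout, method, *args):
    """Call an XML-RPC method, giving up after `timeout` seconds of silence.

    Hypothetical helper: `url` is the RPC server address and `method`
    the remote method name.
    """
    previous = socket.getdefaulttimeout()
    # Applies to sockets created from now on, so the connection that
    # ServerProxy opens lazily during the call inherits it.
    socket.setdefaulttimeout(timeout)
    try:
        proxy = xmlrpc.client.ServerProxy(url)
        return getattr(proxy, method)(*args)
    finally:
        # Restore the old default so other sockets are not affected.
        socket.setdefaulttimeout(previous)
```

If the server accepts the connection but never answers, the call raises socket.timeout, which can then be caught like any other socket error. Note that the default timeout is global to the process, so in a threaded program it may be simpler to set it once at startup rather than toggling it around each call.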

5 Answers

by Kulbir Saini on 13 Dec 2008

I'll have a look at it and try to catch exceptions at all stages. Thanks for the feedback :)

by imriz on 13 Dec 2008

I'm not much of a Python programmer, but here is a diff containing my changes.

My changes include more 'try' blocks around the RPC calls.
After further testing, it seems that if SO_REUSEADDR is used with the RPC server, it will bind to the port, but on the next request it will hang on recvfrom.

Therefore, I've removed the SO_REUSEADDR option, and instead made the script exit if it cannot bind to the port - squid will rerun the process, and at some point (when all the TIME_WAIT connections are closed) it would be able to rebind to the port, and start normally.
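Roughly, the bind logic looks like the sketch below. This is a simplified illustration, not the code from the patch; bind_or_exit and the exit status of 1 are names and values chosen for the example.

```python
import socket
import sys


def bind_or_exit(host, port):
    """Bind a listening TCP socket, or exit so the parent can restart us."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR is deliberately NOT set here.
    try:
        sock.bind((host, port))
    except socket.error as exc:
        sock.close()
        sys.stderr.write("cannot bind to %s:%d: %s\n" % (host, port, exc))
        # Exit with a non-zero status; the process is expected to be
        # started again later, when the port may be free.
        sys.exit(1)
    sock.listen(5)
    return sock
```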

Your project is amazing, and I'm really really thankful for your efforts.

I think that the main weakness of the project lies within the RPC server - it is not robust, and doesn't recover from errors easily.

http://mariska.inter.net.il/~imriz/videocach.py.patch

by Kulbir Saini on 14 Dec 2008

Thanks a lot, Imriz, for the patch. I have included all your suggestions in the base code. You'll see all of them in the next version :)

Thanks for the compliments and suggestions.

by imriz on 14 Dec 2008

Hi Kulbir,

For some reason, the RPC server freezes after some time (could be a couple of hours). By freezing I mean that the RPC server's TCP port still accepts connections (i.e. it opens), but the server never sends a response.

Could you please go over my changes and make sure I did not introduce a new bug? :)

by Kulbir Saini on 14 Dec 2008

Yeah. Sure. I'll test it before committing :)
