VideoCache
Videocache is no longer in development.

program dies when RPC server is down

by Anonymous on 13 Dec 2008

Hi,

When the RPC server is down or busy, the program exits with an error like this:

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/share/videocache/videocache.py", line 792, in run
    squid_part()
  File "/usr/share/videocache/videocache.py", line 534, in squid_part
    video_id_pool.add(video_id)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1129, in request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1243, in send_content
    connection.endheaders()
  File "/usr/lib64/python2.4/httplib.py", line 804, in endheaders
    self._send_output()
  File "/usr/lib64/python2.4/httplib.py", line 685, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.4/httplib.py", line 652, in send
    self.connect()
  File "/usr/lib64/python2.4/httplib.py", line 636, in connect
    raise socket.error, msg
error: (110, 'Connection timed out')

The line numbers may vary, depending on the stage at which the RPC server stopped responding.

I suggest catching the exceptions and simply returning the original URL to squid (if in squid_part), so that service continues to flow.
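A minimal sketch of that suggestion, not the actual videocache code: the helper name call_or_fallback and its arguments are hypothetical, and the module is spelled xmlrpc.client as in Python 3 (it is xmlrpclib on the Python 2.4 shown in the traceback). It assumes the RPC call can be wrapped in a callable and that the original URL is a sensible fallback.

```python
import socket
import xmlrpc.client  # called xmlrpclib on Python 2.x


def call_or_fallback(rpc_call, fallback):
    """Run rpc_call(); if the RPC layer fails, return fallback instead.

    rpc_call is a hypothetical zero-argument callable wrapping something
    like video_id_pool.add(video_id); fallback would be the original URL.
    """
    try:
        return rpc_call()
    except (socket.error, xmlrpc.client.Error):
        # Server down, busy, or returning a fault: keep the thread alive
        # and hand the fallback back to the caller.
        return fallback
```

With this shape, a connection error such as (110, 'Connection timed out') no longer kills the thread; the caller simply gets the fallback value back.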

There is a small issue with this approach: if the RPC server is listening and accepting connections but does not respond to requests, the call still hangs. I couldn't find any variable that sets a "response timeout".
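One possible way to get a response timeout is socket.setdefaulttimeout(), which sets a default timeout for every socket created afterwards in the process, including the ones the XML-RPC client opens. The sketch below is only an illustration under that assumption: the helper names and the ping method are made up, the silent server exists purely to reproduce the "accepts but never answers" case, and the module names are the Python 3 ones.

```python
import socket
import threading
import xmlrpc.client  # called xmlrpclib on Python 2.x


def start_silent_server():
    """Listen on a free local port, accept connections, never reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(5)
    held = []  # keep accepted sockets open so clients just wait

    def accept_forever():
        while True:
            try:
                conn, _ = listener.accept()
            except OSError:
                return
            held.append(conn)

    worker = threading.Thread(target=accept_forever)
    worker.daemon = True
    worker.start()
    return listener, listener.getsockname()[1]


def call_with_timeout(port, seconds):
    """Make one RPC call, giving up after `seconds` without a response."""
    socket.setdefaulttimeout(seconds)  # process-wide default for new sockets
    try:
        proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d/" % port)
        try:
            proxy.ping()  # hypothetical method name
            return "answered"
        except socket.timeout:
            return "timed out"
    finally:
        socket.setdefaulttimeout(None)  # restore blocking behaviour
```

Because the setting is global, it also affects any other sockets the process creates while it is in force, so it may be better to set it once at startup than to toggle it around each call.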

5 Answers

by Kulbir Saini on 13 Dec 2008

I'll have a look at it and try to catch exceptions at all stages. Thanks for the feedback :)

by imriz on 13 Dec 2008

I'm not much of a python programmer, but here is a diff containing my changes.

My changes include more 'try' blocks around the RPC calls.
With further testing, it seems that if SO_REUSEADDR is used with the RPC server, it will bind to the port, but on the next request it will hang on recvfrom.

Therefore, I've removed the SO_REUSEADDR option and instead made the script exit if it cannot bind to the port. Squid will rerun the process, and at some point (when all the TIME_WAIT connections are closed) it will be able to rebind to the port and start normally.
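The bind-or-exit idea can be sketched roughly as follows. This is not the patch itself: it uses a plain TCP socket rather than whatever server class videocache uses, and bind_or_none is a hypothetical helper name.

```python
import socket


def bind_or_none(host, port):
    """Try to bind a listening socket without SO_REUSEADDR.

    Returns the listening socket, or None if the port is unavailable.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Deliberately no setsockopt(SOL_SOCKET, SO_REUSEADDR, 1) here.
    try:
        sock.bind((host, port))
    except socket.error:
        sock.close()
        return None
    sock.listen(5)
    return sock
```

A caller would check for None and call sys.exit(1), leaving it to squid to start the process again later.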

Your project is amazing, and I'm really really thankful for your efforts.

I think that the main weakness of the project lies within the RPC server - it is not robust, and doesn't recover from errors easily.

http://mariska.inter.net.il/~imriz/videocach.py.patch

by Kulbir Saini on 14 Dec 2008

Thanks a lot, Imriz, for the patch. I have included all your suggestions in the base code. You'll see all of them in the next version :)

Thanks for the compliments and suggestions.

by imriz on 14 Dec 2008

Hi Kulbir,

For some reason, the RPC server freezes after some time (it could be a couple of hours). By freezing I mean that the RPC server's TCP port still accepts connections (i.e. it opens), but the server never sends a response.

Could you please go over my changes and make sure I did not introduce a new bug? :)

by Kulbir Saini on 14 Dec 2008

Yeah. Sure. I'll test it before committing :)