Hi,
When the RPC server is down or busy, the program exits with an error like this:
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/share/videocache/videocache.py", line 792, in run
    squid_part()
  File "/usr/share/videocache/videocache.py", line 534, in squid_part
    video_id_pool.add(video_id)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1129, in request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1243, in send_content
    connection.endheaders()
  File "/usr/lib64/python2.4/httplib.py", line 804, in endheaders
    self._send_output()
  File "/usr/lib64/python2.4/httplib.py", line 685, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.4/httplib.py", line 652, in send
    self.connect()
  File "/usr/lib64/python2.4/httplib.py", line 636, in connect
    raise socket.error, msg
error: (110, 'Connection timed out')
The line numbers may vary, depending on the stage at which the RPC server stopped responding.
I suggest catching these exceptions and simply returning the original URL to squid (if in squid_part), so that traffic keeps flowing even when the RPC server is unavailable.
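To make the suggestion concrete, here is a rough sketch of the fallback. Only video_id_pool.add comes from the traceback; the function name and the original_url/rewritten_url parameters are made up for illustration, and the real squid_part surely looks different.

```python
import socket
try:
    import xmlrpclib                    # Python 2 name, as in the traceback
except ImportError:
    import xmlrpc.client as xmlrpclib   # same module under its Python 3 name


def add_video_or_passthrough(pool, video_id, original_url, rewritten_url):
    """Hypothetical helper: try the RPC call, fall back to the original URL.

    `pool` is assumed to be an XML-RPC proxy such as video_id_pool.
    """
    try:
        pool.add(video_id)
    except (socket.error, xmlrpclib.Error):
        # RPC server is down, busy, or returned a fault: do not let the
        # thread die, just hand squid the URL it asked for.
        return original_url
    return rewritten_url
```

Catching socket.error covers the 'Connection timed out' case above, and xmlrpclib.Error covers faults and protocol errors reported by the server.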
There is a small issue with this approach - if the RPC server is listening and accepting connections but never responds to the request, the call blocks, and I couldn't find any setting in xmlrpclib that acts as a "response timeout".
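One possible workaround, as far as I can tell, is socket.setdefaulttimeout(), which applies to every socket created afterwards, including the one xmlrpclib opens. The sketch below is only an illustration: call_with_timeout and the 5 second value are my own assumptions, not anything from videocache.

```python
import socket
try:
    import xmlrpclib                    # Python 2 name
except ImportError:
    import xmlrpc.client as xmlrpclib   # Python 3 name

RPC_TIMEOUT = 5.0  # seconds; an arbitrary value picked for this sketch


def call_with_timeout(url, method_name, *args):
    """Hypothetical helper: call an XML-RPC method, return (ok, result)."""
    previous = socket.getdefaulttimeout()
    # The default timeout is picked up by sockets created from now on,
    # so it bounds both the connect and the wait for a response.
    socket.setdefaulttimeout(RPC_TIMEOUT)
    try:
        try:
            proxy = xmlrpclib.ServerProxy(url)
            return True, getattr(proxy, method_name)(*args)
        except (socket.error, xmlrpclib.Error):
            # socket.timeout is a subclass of socket.error.
            return False, None
    finally:
        # Restore the old default so other sockets are not affected.
        socket.setdefaulttimeout(previous)
```

Note that the default timeout is process-wide, so in a threaded program other sockets created while it is set will pick it up too.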
5 Answers
I'll have a look at it and try to catch exceptions at all stages. Thanks for the feedback :)
I'm not much of a Python programmer, but here is a diff containing my changes.
My changes include more 'try' blocks around the RPC calls.
With further testing it seems that if SO_REUSEADDR is used with the RPC server, it will bind to the port, but on the next request, it will hang on recvfrom.
Therefore, I've removed the SO_REUSEADDR option, and instead made the script exit if it cannot bind to the port - squid will rerun the process, and at some point (when all the TIME_WAIT connections are closed) it would be able to rebind to the port, and start normally.
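Roughly, the bind-or-exit idea looks like the sketch below. It uses a plain socket rather than the actual RPC server class, and bind_or_exit is a name I made up, so treat it as an illustration only.

```python
import socket
import sys


def bind_or_exit(host, port):
    """Hypothetical helper: return a listening socket or exit the process."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR is deliberately NOT set here.
    try:
        listener.bind((host, port))
    except socket.error:
        # Port still busy: give up and let squid restart the process,
        # which retries the bind later.
        listener.close()
        sys.exit(1)
    listener.listen(5)
    return listener
```

The exit status of 1 is arbitrary; the point is only that the process ends instead of hanging on a half-working socket.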
Your project is amazing, and I'm really really thankful for your efforts.
I think that the main weakness of the project lies within the RPC server - it is not robust, and doesn't recover from errors easily.
Thanks a lot, Imriz, for the patch. I have included all your suggestions in the base code. You'll see all of them in the next version :)
Thanks for the compliments and suggestions.
Hi Kulbir,
For some reason, the RPC server freezes after some time (could be a couple of hours). By freezing I mean that the RPC server's TCP port still accepts connections, but the server never sends a response.
Could you please go over my changes and make sure I did not introduce a new bug? :)