When the RPC server is down or busy, the program exits with an error like this:
Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib64/python2.4/threading.py", line 442, in __bootstrap
    self.run()
  File "/usr/share/videocache/videocache.py", line 792, in run
    squid_part()
  File "/usr/share/videocache/videocache.py", line 534, in squid_part
    video_id_pool.add(video_id)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1096, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1383, in __request
    verbose=self.__verbose
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1129, in request
    self.send_content(h, request_body)
  File "/usr/lib64/python2.4/xmlrpclib.py", line 1243, in send_content
    connection.endheaders()
  File "/usr/lib64/python2.4/httplib.py", line 804, in endheaders
    self._send_output()
  File "/usr/lib64/python2.4/httplib.py", line 685, in _send_output
    self.send(msg)
  File "/usr/lib64/python2.4/httplib.py", line 652, in send
    self.connect()
  File "/usr/lib64/python2.4/httplib.py", line 636, in connect
    raise socket.error, msg
error: (110, 'Connection timed out')
The line numbers may vary, depending on the stage at which the RPC server failed to respond.
I suggest catching the exceptions and just returning the original URL to Squid (if in squid_part), so that service continues to flow.
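To illustrate the idea, here is a minimal sketch of such a fallback. It is written for Python 3 (`xmlrpc.client`; on Python 2.4 the module is `xmlrpclib` and the socket error is `socket.error`). The names `url_for_squid` and `rpc_step` are hypothetical and do not come from videocache.py; only the general pattern is meant.

```python
import xmlrpc.client


def url_for_squid(original_url, rpc_step):
    """Run rpc_step() and return its result.

    If the RPC layer fails, return original_url unchanged so the
    request is still answered. Both names are illustrative only.
    """
    try:
        return rpc_step()
    except (OSError, xmlrpc.client.Error):
        # OSError covers refused connections and timeouts;
        # xmlrpc.client.Error covers Fault and ProtocolError.
        return original_url


def unreachable_server():
    """Stand-in for an RPC call against a server that is down."""
    raise ConnectionRefusedError("RPC server is down")


print(url_for_squid("http://example.com/video", unreachable_server))
```

Catching only the network and XML-RPC error classes, rather than a bare `except`, keeps real programming errors visible while still letting the request pass through.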
There is a small issue with this approach - if the RPC server is listening and accepting connections, but does not respond to requests, the call will block. I couldn't find any variable that sets a "response timeout".
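One possible way to get a response timeout is to set it on the socket itself. The blunt option is `socket.setdefaulttimeout(seconds)`, which applies to every socket created afterwards. A narrower option, sketched below for Python 3, is a transport that sets the timeout on its own HTTP connection. The class name `TimeoutTransport`, the URL and the 5 second value are assumptions for illustration, not part of videocache.

```python
import xmlrpc.client


class TimeoutTransport(xmlrpc.client.Transport):
    """Transport whose HTTP connection gives up after `timeout` seconds.

    Hypothetical helper; the timeout covers both connecting and
    waiting for the server's reply.
    """

    def __init__(self, timeout, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._timeout = timeout

    def make_connection(self, host):
        conn = super().make_connection(host)
        # The connection is opened lazily, so setting the attribute
        # here takes effect before the socket is created.
        conn.timeout = self._timeout
        return conn


# Creating the proxy does not contact the server; the URL is a placeholder.
proxy = xmlrpc.client.ServerProxy(
    "http://127.0.0.1:9100/", transport=TimeoutTransport(5.0)
)
```

With this in place, a server that accepts the connection but never answers should raise a timeout error (a subclass of `OSError`) instead of blocking forever, so it can be handled by the same `except` clause as the other network errors.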
I'll have a look at it and try to catch exceptions at all stages. Thanks for the feedback :)
I'm not much of a Python programmer, but here is a diff containing my changes.
My changes include more 'try' blocks around the RPC calls.
With further testing it seems that if SO_REUSEADDR is used with the RPC server, it will bind to the port, but on the next request, it will hang on recvfrom.
Therefore, I've removed the SO_REUSEADDR option, and instead made the script exit if it cannot bind to the port - squid will rerun the process, and at some point (when all the TIME_WAIT connections are closed) it would be able to rebind to the port, and start normally.
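Roughly, the exit-on-bind-failure behaviour could look like the sketch below. It is written for Python 3, where `SimpleXMLRPCServer` sets `allow_reuse_address = True` by default, so the subclass turns that off. The names `NoReuseXMLRPCServer` and `start_or_exit` are made up for this example and are not taken from the diff.

```python
import sys
from xmlrpc.server import SimpleXMLRPCServer


class NoReuseXMLRPCServer(SimpleXMLRPCServer):
    # Do not set SO_REUSEADDR before binding.
    allow_reuse_address = False


def start_or_exit(host, port):
    """Return a bound server, or exit with status 1 if the port is taken.

    Exiting lets the parent (Squid, in this thread) restart the
    process and retry the bind later.
    """
    try:
        return NoReuseXMLRPCServer((host, port), logRequests=False)
    except OSError as exc:
        sys.stderr.write("cannot bind to %s:%s: %s\n" % (host, port, exc))
        sys.exit(1)
```

The caller would then run `start_or_exit(host, port).serve_forever()`; if the bind fails, the process ends immediately instead of starting in a half-working state.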
Your project is amazing, and I'm really really thankful for your efforts.
I think that the main weakness of the project lies within the RPC server - it is not robust, and doesn't recover from errors easily.
Thanks a lot, Imriz, for the patch. I have included all your suggestions in the base code. You'll see all of them in the next version :)
Thanks for the compliments and suggestions.
For some reason, the RPC server freezes after some time (could be a couple of hours). By freezing I mean that the RPC server's TCP port still answers (i.e. connections are accepted), but the RPC server never sends a response.
Could you please go over my changes and make sure I did not introduce a new bug? :)