VideoCache
Videocache is no longer in development.

Scaling videocache

by Anonymous on 15 Apr 2009

Hi,

We are an educational ISP in the UK. At present YouTube is consuming around 870GB of bandwidth per week; unfortunately we can't profile individual video hits at the moment, which is making savings/disk space projections difficult. Over the summer we are planning on setting up a videocache server to try to reduce this and to improve the performance of YouTube within the WAN. The idea is that this server would be set up as an upstream server to the main proxy core, and we would send all traffic for youtube.com and googlevideo.com to the videocache (and everything else direct).
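
Roughly, the forwarding rules we have in mind on the core proxy would look something like the squid.conf fragment below (the peer address 192.168.0.10 and port here are just placeholders, not our real addressing):

    # Upstream videocache box (placeholder address/port)
    cache_peer 192.168.0.10 parent 3128 0 no-query no-digest

    # The video domains we want to route via the videocache peer
    acl videodomains dstdomain .youtube.com .googlevideo.com

    # Only those domains go to the peer; everything else goes direct
    cache_peer_access 192.168.0.10 allow videodomains
    cache_peer_access 192.168.0.10 deny all
    never_direct allow videodomains
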
We would ideally like to have a redundant setup with 2 servers to ensure availability of the service, though we would prefer not to duplicate the files across the 2 servers. Do you have any ideas on how we could implement that?
The specification of the servers will be something like a single-core Xeon 2.8GHz with 2GB of RAM, 2 x 36GB HDDs in RAID 1 for the OS/logs and 3 x 146GB HDDs in RAID 0 for the cache dir. Does this look sufficient?
Videocache will be configured to remove videos that go unwatched for 6 days, and the Squid server would be configured with a null cache dir to ensure it does not consume space. Do you have any thoughts on how we can optimise this system to ensure the best performance?
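
As an illustration of the null cache dir idea, something along these lines on the videocache box (we still need to check what our Squid build supports - the null store type only exists if it was included in --enable-storeio at build time):

    # Option 1: Squid 2.x built with the null store type - no on-disk cache at all
    cache_dir null /tmp

    # Option 2: simply refuse to cache anything ("all" is the usual catch-all acl)
    cache deny all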

We have a proof of concept working, but are only at the early planning stage; any thoughts/suggestions would be much appreciated.

Thanks,

Tris

3 Answers

by imriz on 18 Apr 2009

Hi Tris,

Scaling videocache is problematic, but possible. Making videocache REDUNDANT (but not scalable) is easier. Let's start with the former -

  1. You need a clustered file system, in order to allow two instances of videocache/squid/apache to access the same files at the same time. This part is easy - most OSes offer some kind of solution for that.
  2. The tricky part is the downloader - you need to somehow make sure there is only one instance of the downloader running, in order to avoid race conditions (COMMENT - this would be easier if the downloader were split into a separate daemon AND if a locking mechanism were implemented). Currently, the way videocache is designed, you could achieve that by putting the RPC server behind a load balancer - this way the second instance won't try to start another RPC daemon thread if the first one is already running. The problem with this approach is that you will have to change the code so that the RPC daemon tries to bind to a LOCAL address, while the squidpart() tries to connect to the VIRTUAL IP (held by the LB). This is a minor code change, even if you are not a Python expert (see the sketch after this list).
  3. You need to LB the apache servers on each machine, and make sure the squidpart() redirects to the VIRTUAL IP. This is pretty easy.
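
To make point 2 a bit more concrete, here is a rough Python 2 sketch of the "bind locally, connect to the virtual IP" idea. The names (LOCAL_IP, VIRTUAL_IP, queue_download) are illustrative only - they are not videocache's actual identifiers - so treat this as the shape of the change, not a patch:

    from SimpleXMLRPCServer import SimpleXMLRPCServer
    import xmlrpclib

    LOCAL_IP   = '192.168.0.11'   # this node's own address (example value)
    VIRTUAL_IP = '192.168.0.10'   # address held by the load balancer (example value)
    RPC_PORT   = 9100

    def queue_download(url):
        # Placeholder for the real download-queueing logic.
        return True

    def start_rpc_daemon():
        # Each node binds its downloader RPC daemon to its own LOCAL address.
        server = SimpleXMLRPCServer((LOCAL_IP, RPC_PORT), logRequests=False)
        server.register_function(queue_download, 'queue_download')
        server.serve_forever()

    def squid_part_submit(url):
        # The squid-side code always talks to the VIRTUAL IP, so only the
        # node currently selected by the load balancer queues the download.
        proxy = xmlrpclib.ServerProxy('http://%s:%d/' % (VIRTUAL_IP, RPC_PORT))
        return proxy.queue_download(url)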

Now, as for making it REDUNDANT only - this is pretty simple, and could be done with any filesystem and cluster software - you will still need to use external storage, so both nodes can access it.

As for your hardware - it is a bit weak - please remember that while Squid cannot benefit from an SMP machine, videocache as a whole CAN. In an ideal videocache configuration, Squid does almost nothing, and most of the work is done by the Python part and the Apache server.

I have a videocache-like solution, pushing more than 1.5 Gbit per second of traffic (mostly Apache traffic), and I am fully utilizing 2.5 Xeon E5410 @ 2.33GHz CPUs.
I would suggest a quad-core machine - this will leave you some spare cycles for management and peaks.

I hope this answers some of your questions.

by Anonymous on 23 Apr 2009

Hi Imriz,

Thanks for the reply - some good info, and we now have some ideas we can work on for ensuring redundancy of the system. The project is on hold until the summer, but I think we will now look at getting new hardware for it.

Best regards,

Tris

by imriz on 27 Apr 2009

Hi Tris,

I've misread your post - you were talking about 870 gigabytes per week, which works out to roughly 10 Mbit/s on average.
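
To spell out the back-of-envelope arithmetic:

    870 GB/week x 8 bits/byte = 6,960 Gbit/week
    6,960 Gbit / 604,800 seconds = ~0.0115 Gbit/s = ~11.5 Mbit/s average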

You don't need expensive hardware for that.