We are an educational ISP in the UK. At present YouTube is consuming around 870GB of bandwidth per week. Unfortunately we can't profile individual video hits at the moment, which is making savings/disk space projections difficult. Over the summer we are planning to set up a videocache server to try to reduce this and to improve the performance of YouTube within the WAN. The idea is that this server would be set up as an upstream server to the main proxy core, and we would send all traffic for youtube.com and googlevideo.com to the videocache (and everything else direct).
We would ideally like to have a redundant setup with 2 servers to ensure availability of the service, though we would prefer not to duplicate the files across the 2 servers. Do you have any ideas on how we can implement that?
The specification of the servers will be something like a single-core Xeon 2.8GHz with 2GB of RAM, 2*36GB HDDs in RAID 1 for OS / logs and 3*146GB HDDs in RAID 0 for the cache dir. Does this look sufficient?
The videocache conf will be configured to remove videos unwatched after 6 days, and the Squid server would be configured with a null cache dir to ensure it does not consume space. Do you have any thoughts on how we can optimise this system to ensure best performance?
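For the disk space projection, a rough back-of-envelope check is possible even without per-video hit data. The sketch below uses only the figures quoted in this post (870GB per week, 6-day retention, 3*146GB in RAID 0). It assumes traffic is spread evenly over the week and that every byte is a unique, cacheable video, and it ignores filesystem overhead and videos that stay in the cache because they keep being watched. The function names are made up for illustration.

```python
# Rough cache sizing estimate from the figures quoted in the post.
# Assumptions (not measured): traffic is spread evenly over the week,
# every byte is a unique cacheable video, and filesystem overhead is ignored.

WEEKLY_TRAFFIC_GB = 870   # YouTube traffic per week
RETENTION_DAYS = 6        # unwatched videos are removed after this many days
DISKS = 3                 # number of disks in the RAID 0 cache array
DISK_SIZE_GB = 146        # size of each cache disk


def worst_case_cache_gb(weekly_gb, retention_days):
    """Data written to the cache over one retention window if nothing repeats."""
    return weekly_gb / 7 * retention_days


def raid0_capacity_gb(disks, disk_size_gb):
    """RAID 0 stripes across all members, so capacity is the plain sum."""
    return disks * disk_size_gb


def required_repeat_fraction(worst_case_gb, capacity_gb):
    """Share of traffic that must be repeat views (or uncacheable) to fit."""
    return max(0.0, 1 - capacity_gb / worst_case_gb)


if __name__ == "__main__":
    worst = worst_case_cache_gb(WEEKLY_TRAFFIC_GB, RETENTION_DAYS)
    capacity = raid0_capacity_gb(DISKS, DISK_SIZE_GB)
    fraction = required_repeat_fraction(worst, capacity)
    print(f"worst-case cache fill: {worst:.0f} GB")
    print(f"RAID 0 capacity:       {capacity} GB")
    print(f"repeat share needed:   {fraction:.0%}")
```

Under those assumptions the worst case is about 746GB against 438GB of cache disk, so roughly 41% of the traffic would need to be repeat views or uncacheable for a full 6-day window to fit. Real hit data would be needed to firm this up.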
We have a proof of concept working but are only at the early planning stage, so any thoughts/suggestions would be much appreciated.
Scaling videocache is problematic, but possible. Making videocache REDUNDANT (but not scalable) is easier. Let's start with the former -
Now, as for making it REDUNDANT only - this is pretty simple, and could be done with any filesystem and cluster software - you will still need to use external storage, so both nodes can access it.
As for your hardware - it is a bit weak - please remember that while Squid cannot benefit from an SMP machine, videocache as a whole CAN. In an ideal videocache configuration, Squid does almost nothing, and most of the work is done by the Python part and the Apache server.
I have a videocache-like solution, pushing more than 1.5 Gbit per second of traffic (mostly apache traffic), and I am fully utilizing 2.5 Xeon E5410 @ 2.33GHz CPUs.
I would suggest a quad-core machine - this will leave you some spare cycles for management and peaks.
I hope this answers some of your questions.
Thanks for the reply, some good info, and we now have some ideas we can work on for ensuring redundancy of the system. The project is on hold till the summer, but I think we will look at getting new hardware for this now.