VideoCache
Videocache is no longer in development.

Videocache will download the same YouTube movie twice

by imriz on 21 Feb 2009

Hi,

Due to a recent change in youtube, the video_id in the first request (to the get_video script) and the video id in the resulting redirect are different, which makes videocache download both of them. I suggest changing the recommended regex in squid.conf to reject google's cache servers.

13 Answers

by Kulbir Saini on 22 Feb 2009

Imriz,

Can you please cite an example? I didn't really get this.

Thank You!

by imriz on 22 Feb 2009

Consider the following get:

--13:29:54--  http://www.youtube.com/get_video?video_id=eMvDax3AuP4&t=vjVQa1PpcFN_VmUBILxH3sDqc-eMoDh9gTA8XTOgTco=&el=detailpage&ps=&fmt=34
Connecting to www.youtube.com|208.65.153.238|:80... connected.
HTTP request sent, awaiting response... 303 See Other
Location: http://v6.cache.googlevideo.com/videoplayback?id=78cbc36b1dc0b8fe&itag=34&ip=212.199.24.93&region=0&signature=47438D8A39678E6C830127C5ADFB25E605887C24.7DD4CA62178EE9504414B83E8DA430FA80E324DF&sver=2&expire=1235323795&key=yt1&ipbits=0 [following]
--13:29:55--  http://v6.cache.googlevideo.com/videoplayback?id=78cbc36b1dc0b8fe&itag=34&ip=212.199.24.93&region=0&signature=47438D8A39678E6C830127C5ADFB25E605887C24.7DD4CA62178EE9504414B83E8DA430FA80E324DF&sver=2&expire=1235323795&key=yt1&ipbits=0
Resolving v6.cache.googlevideo.com... 74.125.99.223
Connecting to v6.cache.googlevideo.com|74.125.99.223|:80... connected.

Notice that the video ids in the get_video request and in the videoplayback request are different. Videocache will try to download them as separate items.
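
A minimal sketch of the mismatch, using the two URLs from the trace above with the long `t`, `ip` and `signature` parameters trimmed for readability. The `cache_key` helper and its rule (prefer `video_id`, fall back to `id`) are assumptions made for illustration, not videocache's actual code.

```python
from urllib.parse import urlparse, parse_qs

# URLs taken from the wget trace, with long parameters trimmed.
GET_VIDEO = ("http://www.youtube.com/get_video?video_id=eMvDax3AuP4"
             "&el=detailpage&ps=&fmt=34")
PLAYBACK = ("http://v6.cache.googlevideo.com/videoplayback?id=78cbc36b1dc0b8fe"
            "&itag=34&sver=2&expire=1235323795&key=yt1&ipbits=0")


def cache_key(url):
    """Return (host, id) for a video URL.

    Hypothetical keying rule: use video_id if present, otherwise id.
    """
    parts = urlparse(url)
    query = parse_qs(parts.query)
    ids = query.get("video_id") or query.get("id") or [None]
    return parts.hostname, ids[0]


first = cache_key(GET_VIDEO)
second = cache_key(PLAYBACK)
print(first)
print(second)
print("same video id:", first[1] == second[1])
```

Because the two ids differ, any cache keyed on the id alone sees two unrelated videos, which is why only one of the two hosts should be handed to videocache.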

by Kulbir Saini on 22 Feb 2009

Imriz,

OK. Then we should cache videos only from youtube and deny the ones from the googlevideo servers. Can you try that once in your setup? Just deny them in squid.conf and they'll never reach videocache for download.

But I am confused about which to choose: should we deny the youtube ones or the googlevideo ones? What's your insight on this?

Thank You!

by imriz on 23 Feb 2009

Hi,

With a transparent proxy in mind, I would suggest denying the googlevideo ones.

by bellera on 1 Mar 2009

Kulbir, Imriz, ...

I wrote a page about ... after a lot of testing ...

http://www.bellera.cat/josep/videocache/squid_videocache_youtube.html

Regards,

Josep Pujadas

by Gandi on 5 Mar 2009

Great stuff.

After a lot of testing I managed to force youtube caching to work well.

This is my config:

# --BEGIN-- videocache config for squid
#url_rewrite_program /usr/bin/python /usr/local/videocache/videocache.py
url_rewrite_program /usr/local/squid/bin/zapchain "/usr/local/squid/bin/gg_rewrite" "/usr/bin/python /usr/local/videocache/videocache.py"
url_rewrite_children 5

acl videocache_allow_url url_regex -i www\.youtube\.com\/get_video\?
#acl videocache_allow_url url_regex -i \.googlevideo\.com\/videoplayback \.googlevideo\.com\/get_video\?
#acl videocache_allow_url url_regex -i \.google\.com\/videoplayback \.google\.com\/get_video\?
#acl videocache_allow_url url_regex -i \.google\.[a-z][a-z]\/videoplayback \.google\.[a-z][a-z]\/get_video\?
#acl videocache_allow_url url_regex -i (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\
#acl videocache_allow_url url_regex -i (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\
#acl videocache_allow_url url_regex -i proxy[a-z0-9\-][a-z0-9][a-z0-9][a-z0-9]?\.dailymotion\.com\/
#acl videocache_allow_url url_regex -i vid\.akm\.dailymotion\.com\/
#acl videocache_allow_url url_regex -i [a-z0-9][0-9a-z][0-9a-z]?[0-9a-z]?[0-9a-z]?\.xtube\.com\/(.*)flv
#acl videocache_allow_url url_regex -i bitcast\.vimeo\.com\/vimeo\/videos\/
acl videocache_allow_url url_regex -i va\.wrzuta\.pl\/wa[0-9][0-9][0-9][0-9]?
#acl videocache_allow_url url_regex -i \.files\.youporn\.com\/(.*)\/flv\/
#acl videocache_allow_url url_regex -i \.msn\.com\.edgesuite\.net\/(.*)\.flv
#acl videocache_allow_url url_regex -i media[a-z0-9]?[a-z0-9]?[a-z0-9]?\.tube8\.com\/ mobile[a-z0-9]?[a-z0-9]?[a-z0-9]?\.tube8\.com\/
#acl videocache_allow_url url_regex -i \.mais\.uol\.com\.br\/(.*)\.flv
#acl videocache_allow_url url_regex -i \.video[a-z0-9]?[a-z0-9]?\.blip\.tv\/(.*)\.(flv|avi|mov|mp3|m4v|mp4|wmv|rm|ram)
#acl videocache_allow_url url_regex -i video\.break\.com\/(.*)\.(flv|mp4)

#acl videocache_allow_dom dstdomain v.mccont.com dl.redtube.com .cdn.dailymotion.com
acl videocache_allow_dom dstdomain dl.redtube.com

#acl videocache_deny_url url_regex -i http:\/\/[a-z][a-z]\.youtube\.com http:\/\/www\.youtube\.com
acl videocache_deny_url url_regex -i http:\/\/[a-z][a-z]\.youtube\.com

url_rewrite_access deny videocache_deny_url
url_rewrite_access allow videocache_allow_url
url_rewrite_access allow videocache_allow_dom
url_rewrite_access allow GG_banner

redirector_bypass on
# --END-- videocache config for squid

As you can see, I commented out all the googlevideo ACLs and the ones I don't use.
I changed the youtube ACL to match www.youtube.com.
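
A rough Python sketch of how the url_rewrite_access rules in this config decide which URLs reach videocache. It assumes the usual Squid behaviour that access rules are checked in order and the first match wins, and it treats "no rule matched" as a deny, which is a simplification. Only the active url_regex patterns are modelled, written with single backslashes as Python raw strings; the dstdomain and GG_banner rules are left out, and `sent_to_rewriter` is a made-up helper name.

```python
import re

# Active (uncommented) url_regex patterns from the config.
DENY_URL = [r"http:\/\/[a-z][a-z]\.youtube\.com"]
ALLOW_URL = [
    r"www\.youtube\.com\/get_video\?",
    r"va\.wrzuta\.pl\/wa[0-9][0-9][0-9][0-9]?",
]


def matches(patterns, url):
    """url_regex -i: case-insensitive search, any pattern may match."""
    return any(re.search(p, url, re.IGNORECASE) for p in patterns)


def sent_to_rewriter(url):
    """Hypothetical helper: True if the URL would be passed to videocache."""
    if matches(DENY_URL, url):    # url_rewrite_access deny videocache_deny_url
        return False
    if matches(ALLOW_URL, url):   # url_rewrite_access allow videocache_allow_url
        return True
    return False                  # simplification: nothing matched


for url in (
    "http://www.youtube.com/get_video?video_id=eMvDax3AuP4&fmt=34",
    "http://uk.youtube.com/get_video?video_id=eMvDax3AuP4",
    "http://v6.cache.googlevideo.com/videoplayback?id=78cbc36b1dc0b8fe",
):
    print(sent_to_rewriter(url), url)
```

Under these assumptions only the www.youtube.com get_video request is rewritten; the two-letter country subdomains and the googlevideo redirect target are skipped.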

After that, youtube URLs are requested only once, but I had to change hit_threshold to 3 because some youtube URLs reply with "we're sorry this video is no longer available". Users usually click refresh to fix this, which causes another request and unnecessary caching. It happened before, so it's not a videocache issue.

Big thanks for this stuff. I searched for something like this for a very long time.

by Kulbir Saini on 6 Mar 2009

Gandi,

Thanks for the compliments and sharing your experience with videocache.

Keep caching :D

by Gandi on 6 Mar 2009

I found another bug.
When a user seeks through a youtube or redtube video, it makes another request.

I had to modify the config to something like this:

[...]
acl videocache_allow_url url_regex -i www\.youtube\.com\/get_video\?
acl videocache_allow_url url_regex -i dl\.redtube\.com\/(.*)\.flv\?start=0
acl videocache_deny_url url_regex -i http:\/\/[a-z][a-z]\.youtube\.com www\.youtube\.com\/get_video\?video_id=.{11}&(start|begin)=
[...]

by Kulbir Saini on 6 Mar 2009

Gandi,

That's not actually the case. When a user seeks, the client requests the video again. If the video is already in the queue (it was requested previously), re-requesting it just increases its priority and does not force another download. So it can be ignored.

Thank You!

by Gandi on 6 Mar 2009

Hi again!
I realize that it will not force another download, but it does request the video another time.

Let's see an example :)

hit_threshold = 2

A user starts to watch a youtube video: that is the first request.
The user seeks through the video: that is the second request, and videocache starts the download.
That's not desirable, I think.
One user can trigger caching of a video just by seeking through the file.
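
The scenario can be replayed with a small counter. This is only a sketch that assumes a download starts when the number of requests for a video reaches hit_threshold, as described in the example; `replay` is a hypothetical helper and not part of videocache.

```python
from collections import Counter

HIT_THRESHOLD = 2  # value used in the example


def replay(requests, threshold=HIT_THRESHOLD):
    """Return the video ids whose download would start, in order."""
    hits = Counter()
    started = []
    for video_id in requests:
        hits[video_id] += 1
        # Assumed rule: the download starts when the count reaches the threshold.
        if hits[video_id] == threshold:
            started.append(video_id)
    return started


# One user: the initial play, then a single seek in the same video.
one_user = ["eMvDax3AuP4", "eMvDax3AuP4"]
print(replay(one_user))                # ['eMvDax3AuP4']
print(replay(one_user, threshold=3))   # []
```

With a threshold of 2, a single viewer who seeks once is enough to start the download; raising the threshold to 3 absorbs one extra request of this kind.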

Sorry for off topic.

by Kulbir Saini on 7 Mar 2009

Gandi,

Your argument is valid, but the regex you proposed will create problems when seeking in videos that have already been cached by videocache. Moreover, when one sets some value for hit_threshold, I think being off by +1 or -1 in practice is acceptable.
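
To see which requests the proposed deny pattern catches, it can be tried against a few URLs. The seek URLs below are hypothetical, with their shape inferred from the regex itself (an 11-character video_id followed directly by start= or begin=). One plausible reading of the problem is that such requests never reach videocache at all, whether or not the video is already cached.

```python
import re

# Second pattern of the proposed videocache_deny_url acl (url_regex -i).
SEEK_DENY = re.compile(
    r"www\.youtube\.com\/get_video\?video_id=.{11}&(start|begin)=",
    re.IGNORECASE,
)


def is_seek_request(url):
    """True if the proposed deny rule would keep this URL away from videocache."""
    return SEEK_DENY.search(url) is not None


initial = "http://www.youtube.com/get_video?video_id=eMvDax3AuP4&el=detailpage&fmt=34"
seek_a = "http://www.youtube.com/get_video?video_id=eMvDax3AuP4&start=120"    # hypothetical
seek_b = "http://www.youtube.com/get_video?video_id=eMvDax3AuP4&begin=30000"  # hypothetical

for url in (initial, seek_a, seek_b):
    print(is_seek_request(url), url)
```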

Thank You!

by Gandi on 7 Mar 2009

Hi again!

You are right, it causes problems when seeking in cached videos :( but they are served from cache and load into the browser very fast.
Would it be possible to add a feature that ignores repeated requests from a single IP within a given time window? I think that would fix it.

Big thanks again!

by Kulbir Saini on 7 Mar 2009

Gandi,

That's actually difficult to implement given the variety of ways videocache is used. Imagine a cascading proxy setup. A time limit would be overkill.
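
A sketch of the per-IP time window that was suggested, to illustrate one way the cascading proxy case can go wrong: a parent proxy sees every user behind a child proxy under the child's single IP address. The class, the 300 second window and the addresses are all assumptions for illustration; nothing like this exists in videocache as far as this thread shows.

```python
class PerClientWindow:
    """Count at most one request per (client IP, video) per time window."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.last_counted = {}  # (client_ip, video_id) -> time of last counted hit

    def should_count(self, client_ip, video_id, now):
        key = (client_ip, video_id)
        last = self.last_counted.get(key)
        if last is not None and now - last < self.window:
            return False  # repeat within the window, e.g. a seek or a refresh
        self.last_counted[key] = now
        return True


# Direct clients: the seek is ignored, the second user still counts.
direct = PerClientWindow(300)
print(direct.should_count("192.0.2.10", "eMvDax3AuP4", now=0))    # first play
print(direct.should_count("192.0.2.10", "eMvDax3AuP4", now=60))   # same user seeks
print(direct.should_count("192.0.2.11", "eMvDax3AuP4", now=90))   # another user

# Cascading proxy: both users arrive with the child proxy's address.
cascaded = PerClientWindow(300)
print(cascaded.should_count("10.0.0.1", "eMvDax3AuP4", now=0))    # user A
print(cascaded.should_count("10.0.0.1", "eMvDax3AuP4", now=90))   # user B, dropped
```

In the cascaded case the second, genuinely distinct viewer is not counted, so a popular video could stay below hit_threshold for as long as requests keep arriving inside the window.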

Thank You!