VideoCache

Video keywords

by bellera on 31 Jan 2009

Hello!

At our school we use squid with squidGuard + videocache.

Using the two caches (squid and videocache) together with squidGuard (no add-ons), we can optimize our Internet connection. squidGuard also filters content that is unsuitable for children.

But in some circumstances, especially on large video sites like youtube.com, there is adult content that is difficult to detect.

For example, I found this video in the cache:
http://www.youtube.com/watch?v=-5SglF2dazA
It is not marked as adult content at youtube.com.

Moreover, the “Related videos” links on the youtube.com page lead to a lot of similar videos. squidGuard has no way to detect this problem.

I was wondering whether it would be possible to store the keywords for cached videos. It would be a very useful tool for browsing the cache by keyword.

Perhaps a simple text file or XML file in the site's cache folder would be enough.

The youtube.com API makes it possible to obtain all the information about a video: http://code.google.com/intl/en/apis/youtube/1.0/developers_guide_python.html

The ideal (for us) would be to integrate this video database with squidGuard filtering.

Doing this in real time is perhaps impossible. We redirect first to squidGuard and then to videocache, using zapchain.

The easiest approach would (perhaps) be to process the database with a shell script and add the undesired videos to the squidGuard filtering.
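A minimal sketch of that batch step, written in Python rather than shell. It assumes a hypothetical keyword index (a dict mapping video_id to a list of keywords) and produces regex lines that could be saved to a file used as a squidGuard expression list. The index layout and the function names are illustrative assumptions, not part of videocache or squidGuard.

```python
import re

# Assumption: YouTube video ids only use these characters, so they are
# safe to paste into a regular expression without escaping.
VIDEO_ID_RE = re.compile(r"^[A-Za-z0-9_-]+$")

def blocked_video_ids(index, forbidden):
    """Return the ids whose keywords contain at least one forbidden word.

    index is a hypothetical dict: video_id -> list of keywords.
    """
    forbidden = {word.lower() for word in forbidden}
    return sorted(
        video_id
        for video_id, keywords in index.items()
        if forbidden & {keyword.lower() for keyword in keywords}
    )

def expression_lines(video_ids):
    """Build one regex line per video id, skipping ids that look malformed."""
    return [
        r"youtube\.com/watch\?(.*&)?v=" + video_id
        for video_id in video_ids
        if VIDEO_ID_RE.match(video_id)
    ]
```

The resulting lines would be written one per line to a file, which squidGuard would then need to be told about in its own configuration.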

The most complete solution would be to add a list of forbidden keywords to videocache. That way videocache could test the keywords for a video and decide whether it should be cached/served or redirected to an “access denied” page (as squidGuard does).
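The per-video decision could look roughly like the following sketch. It is not videocache code: DENIED_URL, is_forbidden and target_url are hypothetical names, and the matching is a simple whole-word comparison on the title and keywords.

```python
import re

# Hypothetical address of the "access denied" page.
DENIED_URL = "http://proxy.example/denied.html"

def words(text):
    """Split text into a set of lower-case words."""
    return set(re.findall(r"\w+", text.lower()))

def is_forbidden(title, keywords, forbidden):
    """True if the title or any keyword contains a forbidden word."""
    found = words(title)
    for keyword in keywords:
        found |= words(keyword)
    return bool(found & {word.lower() for word in forbidden})

def target_url(original_url, title, keywords, forbidden):
    """Return the denied page for forbidden videos, else the original URL."""
    if is_forbidden(title, keywords, forbidden):
        return DENIED_URL
    return original_url
```

Whole-word matching is a deliberate simplification here. Real regex expressions, as squidGuard uses, would catch more variants but also produce more false positives.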

Regards,

Josep Pujadas

5 Answers

by Kulbir Saini on 31 Jan 2009

Hi Josep,

Well, I would say this is a really ambitious feature. But it involves a lot of hard work, dedication and, above all, a lot of time. Since I don't work full time on videocache and I have a lot of other commitments, I can't really promise this feature in the near future.

Thank You!

by bellera on 1 Feb 2009

Ok, thanks!

Looking at the YouTube Data API, I found a way to obtain the information about a video as an XML file:

http://gdata.youtube.com/feeds/api/videos/VIDEO_ID

The page http://gdata.youtube.com/demo/index.html lets you see the result. Just edit the [Resulting URI:] box with

/feeds/api/videos/VIDEO_ID

and you will see the results in the [Response] box.
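As a rough sketch, such a response could be read in Python with the standard library. The element layout is an assumption on my part: an Atom entry with the title at the top level and a comma-separated keyword list under media:group/media:keywords (Media RSS namespace). The SAMPLE document is made up for illustration, so check it against a real response.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces of the feed entry.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "media": "http://search.yahoo.com/mrss/",
}

# Made-up sample entry, not a real API response.
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:media="http://search.yahoo.com/mrss/">
  <title>Example video</title>
  <media:group>
    <media:keywords>school, maths, lesson</media:keywords>
  </media:group>
</entry>"""

def parse_entry(xml_text):
    """Return (title, keywords) extracted from one video entry."""
    root = ET.fromstring(xml_text)
    title = root.findtext("atom:title", default="", namespaces=NS)
    raw = root.findtext("media:group/media:keywords", default="", namespaces=NS)
    keywords = [part.strip() for part in raw.split(",") if part.strip()]
    return title, keywords
```

Calling parse_entry(SAMPLE) gives the title and a list of three keywords. Fetching the XML for a given VIDEO_ID is left out on purpose.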

Regards,

Josep Pujadas

by Kulbir Saini on 2 Feb 2009

Hi Josep,

Thanks for doing all the research :) I would like to rephrase what I said earlier to: "Well, it's a cool feature. I would love to take some time out for it in the near future, after I finish up the current todo list."

Thanks for your interest :)

by bellera on 6 Feb 2009

Hello!

I’m writing a python redirector for squid that:

  • tests whether the URL belongs to youtube.com
  • extracts the video_id
  • queries gdata (Google/YouTube DB)
  • extracts the title and keywords for the video
  • evaluates the title and keywords against the squidGuard denied regex expressions.
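The first two steps and the surrounding loop might be sketched as follows. This is not the actual video_redirector.py. It assumes the classic squid redirector convention: one request per line on stdin with the URL as the first field, answered by a replacement URL or by an empty line to leave the request unchanged. The is_denied callback (standing in for the gdata query and the regex test) and the denied URL are hypothetical.

```python
import sys
from urllib.parse import urlparse, parse_qs

def youtube_video_id(url):
    """Return the video id of a youtube.com watch URL, or None."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    if host != "youtube.com" and not host.endswith(".youtube.com"):
        return None
    if parts.path != "/watch":
        return None
    values = parse_qs(parts.query).get("v")
    return values[0] if values else None

def rewrite(line, is_denied, denied_url):
    """Return the reply for one request line ("" means: do not change)."""
    fields = line.split()
    if not fields:
        return ""
    video_id = youtube_video_id(fields[0])
    if video_id and is_denied(video_id):
        return denied_url
    return ""

def main(is_denied, denied_url="http://proxy.example/denied.html"):
    """Answer requests from stdin one line at a time, flushing each reply."""
    for line in sys.stdin:
        sys.stdout.write(rewrite(line, is_denied, denied_url) + "\n")
        sys.stdout.flush()
```

Flushing after every reply matters because squid waits for each answer. Any URL that is not a youtube.com watch page is passed through untouched, so the next program in the chain still sees it.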

I called it video_redirector.py

I will try to put everything together (working with squid), using zapchain:

video_redirector.py -> squidGuard -> videocache.py

I think this will solve my problem: testing whether a video from youtube.com must be filtered or not at our school.

Regards,

Josep Pujadas

by Kulbir Saini on 6 Feb 2009

Hi Josep,

First of all, best of luck with your plugin task.

Secondly, I would suggest the name videofilter instead of video_redirector, because you are filtering the videos rather than redirecting them :)

Also, try to avoid underscores and hyphens in program or software names, because they sometimes cause conflicts with certain operating systems. For example, check this.

Later, it can be merged into videocache as an optional feature. I would have helped you with this, but lack of time is becoming a huge problem for me :(

Thank You!
