Hello there,
When I created Statistics from youtube_cache.log by day (http://cachevideos.com/forum/post/statistics-youtubecachelog-day), I added a routine that updates a file's modification date on each day the file is requested. This makes it possible to delete files that have not been accessed for a long time.
To run it, use: ./script.sh A B
where:
A = Select files whose data was last modified more than A*24 hours ago.
B = Use Y to delete without being asked for confirmation.
#!/bin/bash
# lopan dot eti at gmail dot com (Author: Lopan)
# GPL2
# Note: "read -p ... -n 1" is a bash feature, so the script runs under bash.

# Variables
VIDEO_CACHE_DIR=/var/spool/squid/video_cache
DAYS=$1
DL=$2

# Sum the sizes (in bytes) of regular files last modified more than $DAYS days ago.
# -type f keeps directories (including the cache directory itself) out of the selection.
MBT=0
for MB in $(find "$VIDEO_CACHE_DIR" -type f -mtime +"$DAYS" -exec ls -l {} ';' | awk '{print $5}'); do
    MBT=$((MBT+MB))
done

# Print the total size of the selected files in GB (bytes / 1024^3)
echo "You have selected *$(echo "$MBT/1024/1024/1024" | bc)GB* of videos to delete!"

# Ask for confirmation unless the answer was given as the second argument
if [ -z "$DL" ]; then
    read -p "Can I delete the selected files? [Y/n] " -n 1 DL
    echo
fi

# Delete the files only on an explicit "Y"
if [ "$DL" = "Y" ]; then
    echo "Wait! Deleting the selected files..."
    echo "This process can take a long time!"
    find "$VIDEO_CACHE_DIR" -type f -mtime +"$DAYS" -exec rm -f {} \;
fi
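For readers who prefer Python, the selection step of the script above can be sketched roughly as follows. This is only an illustration, not part of the original script: the function names find_stale_files and total_gib are made up for the example, and the cutoff is a plain "older than N*24 hours" comparison, which does not reproduce the exact day rounding that find -mtime applies.

```python
import os
import time


def find_stale_files(cache_dir, days, now=None):
    """Return (path, size_in_bytes) for regular files last modified more than `days` days ago."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    stale = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                st = os.stat(path)
            except OSError:
                # The file vanished or is a broken symlink; skip it.
                continue
            if st.st_mtime < cutoff:
                stale.append((path, st.st_size))
    return stale


def total_gib(stale):
    """Total size of the selected files in GiB (bytes / 1024^3)."""
    return sum(size for _path, size in stale) / 1024.0 ** 3
```

Nothing is deleted here; the list can be reviewed first and then passed to os.remove one path at a time.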
19 Answers
Hi Lopan,
Could you please explain more about your script for those of us who are new to this: how to use it, where to paste it, and how to run it?
We would be thankful for it. Thanks in advance.
Salah,
To use this script, you need to run this other script (http://cachevideos.com/forum/post/statistics-youtubecachelog-day) every day.
That script (http://cachevideos.com/forum/post/statistics-youtubecachelog-day) updates a file's modification date whenever the file is requested.
Then this script (http://cachevideos.com/forum/post/script-delete-files-not-requested-long-time) selects the files that have not been requested for N days and deletes them.
To run this script (http://cachevideos.com/forum/post/script-delete-files-not-requested-long-time), use the syntax:
./script.sh A B
A = Select files whose data was last modified more than A*24 hours ago.
B (optional) = Use Y to delete without being asked for confirmation.
I plan to merge the two scripts (http://cachevideos.com/forum/post/script-delete-files-not-requested-long-time) and (http://cachevideos.com/forum/post/statistics-youtubecachelog-day) soon.
Thanks man, these good scripts will help a lot, and they are what we were missing.
10x again, and I wish you more good ideas and scripts :)
Lopan,
Great work man!!! You offloaded a lot of my work :) Thanks again and keep up the good work !!!!
Hey Kulbir,
I think that I can merge the two scripts, Statistic and Cleaner.
What do you think?
So, I'll work on it!
Hi!
I think that would be another feather in the cap :) Go ahead!
Thank you for the support!!!
Based on your script I've made one, but in Python...
[SEE SCRIPT BELOW]
Some parts could be reworked, such as the regexps and some code in the main class, but it's working...
For those who don't know how to run it, just type...
python script.py N
where N is the number of days a video must have gone unrequested before it is deleted
Bye...
Edited by admin : Added script here.
#!/usr/bin/env python
# videocache cleaner
import re
import time
import os
import sys

__version__ = 0.01

log_dir = '/var/log/videocache/'
log_dir_files = os.listdir(log_dir)
cache_dir = '/var/spool/videocache/'

# Groups 1-6: date and time of the log entry, group 7: video id, group 8: site name
hit_pattern = r'(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}),\d{3} \w+ \d+\.\d+\.\d+\.\d+ ([a-zA-Z0-9\._-]+) CACHE_HIT (\w+)'
download_pattern = r'(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}),\d{3} \w+ \d+\.\d+\.\d+\.\d+ ([a-zA-Z0-9\._-]+) DOWNLOAD (\w+)'


class TouchNotCompleteError(Exception):
    pass


class GetHitsNotCompleteError(Exception):
    pass
class CacheCleaner(object):
    def __init__(self, delete_age=90):
        self.all_logs = ''
        self.touch_complete = False
        self.get_hits_complete = False
        self.delete_age = delete_age
        self.now = time.mktime(time.localtime())
        self.last_hits = {}
        self.last_downloads = {}

    def _get_downloads(self):
        # Record the most recent download of every video that has no cache hit.
        self._check_get_hits()
        download_list = re.findall(download_pattern, self.all_logs)
        for download in download_list:
            # Pad the six date fields to the 9-tuple that time.mktime expects.
            t = tuple([int(i) for i in download[:6]]) + (2, 35, 1)
            # Timestamp in the YYMMDDhhmm.SS form accepted by "touch -t".
            date = '%s%s%s%s%s.%s' % (str(download[0])[-2:], download[1],
                                      download[2], download[3], download[4],
                                      download[5])
            if download[6] not in self.last_hits:
                if download[6] in self.last_downloads:
                    delta = self.now - time.mktime(t)
                    if self.last_downloads[download[6]]['last_download'] > delta:
                        self.last_downloads[download[6]]['last_download'] = delta
                        self.last_downloads[download[6]]['last_download_date'] = date
                else:
                    self.last_downloads[download[6]] = {
                        'site': download[7].lower(),
                        'last_download': self.now - time.mktime(t),
                        'last_download_date': date
                    }

    def _get_hits(self):
        # Record the most recent cache hit of every video.
        hit_list = re.findall(hit_pattern, self.all_logs)
        for hit in hit_list:
            t = tuple([int(i) for i in hit[:6]]) + (2, 35, 1)
            date = '%s%s%s%s%s.%s' % (str(hit[0])[-2:], hit[1],
                                      hit[2], hit[3], hit[4], hit[5])
            if hit[6] in self.last_hits:
                delta = self.now - time.mktime(t)
                if self.last_hits[hit[6]]['last_hit'] > delta:
                    self.last_hits[hit[6]]['last_hit'] = delta
                    self.last_hits[hit[6]]['last_hit_date'] = date
            else:
                self.last_hits[hit[6]] = {
                    'site': hit[7].lower(),
                    'last_hit': self.now - time.mktime(t),
                    'last_hit_date': date
                }
        self.get_hits_complete = True

    def _touch_files(self):
        # Set each cached file's mtime to its last download or hit time.
        for download in self.last_downloads:
            if self.last_downloads[download]['site'] == 'youtube':
                cmd = 'touch %s%s/%s -t %s' % (cache_dir, self.last_downloads[download]['site'],
                                               download, self.last_downloads[download]['last_download_date'])
            else:
                cmd = 'touch %s%s/%s.flv -t %s' % (cache_dir, self.last_downloads[download]['site'],
                                                   download, self.last_downloads[download]['last_download_date'])
            os.system(cmd)
        for hit in self.last_hits:
            if self.last_hits[hit]['site'] == 'youtube':
                cmd = 'touch %s%s/%s -t %s' % (cache_dir, self.last_hits[hit]['site'],
                                               hit, self.last_hits[hit]['last_hit_date'])
            else:
                cmd = 'touch %s%s/%s.flv -t %s' % (cache_dir, self.last_hits[hit]['site'],
                                                   hit, self.last_hits[hit]['last_hit_date'])
            os.system(cmd)
        self.touch_complete = True

    def run(self):
        self._get_logs()
        self._get_hits()
        self._get_downloads()
        self._touch_files()
        self._clear()

    def _check_get_hits(self):
        if not self.get_hits_complete:
            raise GetHitsNotCompleteError

    def _check_touch(self):
        if not self.touch_complete:
            raise TouchNotCompleteError

    def _clear(self):
        # Remove everything whose mtime is older than delete_age days.
        self._check_touch()
        cmd = "find %s -mtime +%s -exec rm -rf {} ';'" % (cache_dir, self.delete_age)
        os.system(cmd)

    def _clear_logs(self):
        """
        TODO
        """
        pass

    def _get_logs(self):
        # Concatenate every file in the log directory into one string.
        self.all_logs = ''
        for log in log_dir_files:
            self.all_logs += open(log_dir + log).read()
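def main():
    if len(sys.argv) > 1:
        # int() rejects anything that is not a whole number of days
        # before it is interpolated into the find command.
        c = CacheCleaner(delete_age=int(sys.argv[1]))
    else:
        c = CacheCleaner()
    c.run()


if __name__ == '__main__':
    main()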
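To make the log parsing in the script above easier to follow, here is a small sketch that applies the same hit pattern to a single line. The sample line is invented for this example so that it matches the regexp; the real videocache log format may contain other fields. The helper name parse_hit is also hypothetical.

```python
import re

# Same expression as hit_pattern in the script, written as a raw string.
HIT_PATTERN = re.compile(
    r'(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2}),\d{3} \w+ '
    r'\d+\.\d+\.\d+\.\d+ ([a-zA-Z0-9\._-]+) CACHE_HIT (\w+)'
)

# Hypothetical log line, constructed only to match the pattern above.
SAMPLE = '2009-02-09 17:49:48,123 INFO 192.168.1.10 0954c0554eb4d59f CACHE_HIT YOUTUBE'


def parse_hit(line):
    """Return (video_id, site, touch_stamp) for a CACHE_HIT line, or None."""
    match = HIT_PATTERN.search(line)
    if match is None:
        return None
    year, month, day, hour, minute, second, video_id, site = match.groups()
    # YYMMDDhhmm.SS, the timestamp form that "touch -t" accepts.
    touch_stamp = '%s%s%s%s%s.%s' % (year[-2:], month, day, hour, minute, second)
    return video_id, site.lower(), touch_stamp


if __name__ == '__main__':
    print(parse_hit(SAMPLE))
```

For the sample line this yields the video id 0954c0554eb4d59f, the site youtube and the stamp 0902091749.48, which is what the script then hands to touch.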
Thiago,
Cool script!!! Thank you very much for taking some time out to write this script. I hope this will be helpful for users :)
PS : If you register, you can nicely format the code snippets :)
Hi, I'm that unregistered user...
I had another idea... if your main script ran a touch command when a video is downloaded and on every hit, these scripts could be replaced by a simple find command...
Bye...
EDIT: after googling a bit I found a Python solution for the touch that is OS independent: you can use os.utime(file, None)
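A minimal sketch of that idea, assuming the file already exists in the cache (unlike the touch command, os.utime does not create a missing file, it raises OSError). The helper name touch and its timestamp parameter are only for illustration.

```python
import os
import time


def touch(path, timestamp=None):
    """Set the access and modification time of an existing file.

    With timestamp=None both times become "now"; otherwise both are set
    to the given number of seconds since the epoch.
    """
    if timestamp is None:
        os.utime(path, None)
    else:
        os.utime(path, (timestamp, timestamp))
```

Calling touch(video_path) on every download and cache hit would keep the modification time equal to the last request, so a plain find -mtime is enough for cleaning.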
Kimble,
That's a nice idea indeed. We can update the last modified time every time there is a cache hit, and then, while cleaning, remove videos based on their last modified time instead of their last access time. The last access time may have changed for other reasons. For example, when you take a backup, the last access time gets updated even though the video was not served.
I'll try to incorporate this in the next version :)
Thank You!
Thiago, Kimble and Kulbir,
Wonderful!
Now it is easy to clean old videos from the cache.
It is a very nice function! :P
os.utime(Code_This, NOW) lol
c u
Hello!
Good idea to refresh the cache to save disk space!
However, I think this job should be done without system commands, only with Python code. I'm not a Python expert, but I tried this example:
#!/usr/local/bin/python2.6
import os, time

video_file = '/var/spool/videocache/youtube/0954c0554eb4d59f'
st = os.stat(video_file)
print(time.ctime(st.st_ctime))  # inode change time
print(time.ctime(st.st_mtime))  # data modification time
print(time.ctime(st.st_atime))  # last access time
./test.py
Mon Feb 9 17:49:48 2009
Thu Jan 1 20:33:59 2009
Tue Feb 10 04:43:32 2009
The result shows:
- The video was downloaded at 2009-02-09_17:49:48 (ctime)
- The video has been on YouTube since 2009-01-01_20:33:59 (mtime)
- The video was last accessed at 2009-02-10_04:43:32 (atime)
I hope this helps!
Regards,
Josep Pujadas
Josep and all,
I have completed a script (still in the testing stage) to remove unused videos from the cache. It'll be included in the next version. So, I'd ask all of you to invest your time in the stats calculation part. I have not given it any thought yet.
Thank you for your hard work guyz!!!
Hi Kulbir,
There's a small bug in the script:
if cur_time - os.stat(video)[stat.ST_MTIME] > expire*86400:
    age = int((cur_time - os.stat(video)[stat.ST_ATIME]) / 86400)
The if should check ATIME, not MTIME.
Imriz,
Actually, that's a hack to get around accesses made by other agents, such as the copy command (cp) when backing up the cached videos. If you look at videocache.py, you'll notice that we update both the access time and the modification time whenever there is a CACHE_HIT. So, using MTIME in vccleaner does no harm at all.
Thank you for looking at the code :)
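To illustrate the difference, here is a small sketch of the expiry check. It is not the vccleaner code: the function is_expired and its use_atime switch are hypothetical, and the only assumption taken from the posts above is that both times are updated on a CACHE_HIT.

```python
import os
import time


def is_expired(path, expire_days, now=None, use_atime=False):
    """True if the file's reference time is more than expire_days days old.

    The modification time is the default reference. A backup that only
    reads the file may refresh the access time, but it leaves the
    modification time alone.
    """
    now = time.time() if now is None else now
    st = os.stat(path)
    reference = st.st_atime if use_atime else st.st_mtime
    return now - reference > expire_days * 86400
```

For a video whose last hit was 40 days ago but which was copied by a backup job today, the mtime check with expire_days=30 still reports it as expired, while an atime check could keep it in the cache.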
Vccleaner Problem
Hi,
I've been setting up a new squid server and using videocache to cache videos. Great work, thanks!
I found a problem in the vccleaner script. It uses the "base_dir" parameter from videocache.conf, which works fine. However, if base_dir is set up with a maximum cache size, e.g. base_dir = /videocache/:35000, then vccleaner tries to use /videocache/:35000/youtube etc. as the cache directory. This obviously doesn't work very well!
Thanks again
Dale
Dale,
Please apply the following patch to the vccleaner file.
diff --git a/scripts/vccleaner b/scripts/vccleaner
index 8bae0be..2512b70 100644
--- a/scripts/vccleaner
+++ b/scripts/vccleaner
@@ -115,7 +115,7 @@ def main(root, etc_dir):
         return (None, None, None, None)
     else:
         video_lifetime = int(mainconf.video_lifetime)
-        base_dir = [apply_install_root(root, dir.strip()) for dir in mainconf.base_dir.split('|')]
+        base_dir = [apply_install_root(root, dir_tup.split(':')[0].strip()) for dir_tup in mainconf.base_dir.strip().split('|')]
         logdir = apply_install_root(root, mainconf.logdir)

     # Youtube specific options
It'll work fine after that.
Thank you for reporting the problem.
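For anyone curious what the patched line does, here is a standalone sketch of the same parsing. The function name parse_base_dirs is made up for this example, the apply_install_root call from vccleaner is left out, and skipping empty entries is an addition of the sketch, not something the patch does.

```python
def parse_base_dirs(base_dir_value):
    """Split a base_dir setting into plain directory paths.

    Entries are separated by '|', and each entry may carry an optional
    ':<max size>' suffix that has to be dropped.
    """
    directories = []
    for entry in base_dir_value.strip().split('|'):
        directory = entry.split(':')[0].strip()
        if directory:
            directories.append(directory)
    return directories


if __name__ == '__main__':
    print(parse_base_dirs('/videocache/:35000'))
```

With the value from Dale's post, /videocache/:35000, this gives the single directory /videocache/, so the cleaner would look in /videocache/youtube instead of /videocache/:35000/youtube.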