Using cron and cgi to back up log files

By Oli
At 2:42 PM · Wednesday, 19 November · 2003
To Coding · Unix

Unfortunately my ISP only provides raw log files for a single day, and suggests that users wanting to archive them set up an FTP program to download them daily. This is not very convenient ;-) However, they also allow me to use crontab to make my own cron file, and using this plus a little CGI kiddie scripting I have almost made a work-around. Please keep in mind I have no idea what I’m talking about ;-)

Cron is a Unix tool that lets you run commands at regular intervals. It’s normally used for system maintenance, for example to do time-consuming tasks around 2-4am when the computer is idle. On web servers it’s also commonly used for analyzing and archiving log files. In this case I’m using it to append my daily log file to another file once a day.

First here are some cron resources:

I modified the script above to this:

#!/usr/bin/perl
# 2003-11-16
# attempt at a script that appends the daily access log file
# to another log file that isn't deleted
use strict;
use warnings;

# create path and filename variables
my $stats_dir   = "path_to_new_logs";
my $monthly_log = "name_of_new.log";
my $daily_dir   = "path_to_provided_logs";
my $daily_log   = "daily_access.log";

# open the daily log and copy it into an array called @text;
# stop with an error message if the file can't be read
open(my $daily, '<', "$daily_dir/$daily_log")
    or die "Can't read $daily_dir/$daily_log: $!";
my @text = <$daily>;
close($daily);

# work out a time stamp using the subroutine below
my $date = get_date();

# open the log file I'm adding to using >> so I append
# to it, add a time stamp, then add the daily log info from
# the @text array
open(my $monthly, '>>', "$stats_dir/$monthly_log")
    or die "Can't append to $stats_dir/$monthly_log: $!";
print $monthly "Appended $date\n";
print $monthly @text;
close($monthly) or die "Can't close $stats_dir/$monthly_log: $!";

# this is the date subroutine to generate a nice timestamp,
# adapted from the Fantomaster script
sub get_date {
    my ($sec, $min, $hour, $mday, $mon, $year) = localtime();
    # localtime counts months from 0 and years from 1900
    return sprintf("%04d-%02d-%02d, %02d:%02d:%02d",
        $year + 1900, $mon + 1, $mday, $hour, $min, $sec);
}

This script should copy the text of my daily log file and append it to another log file that won’t be deleted. To get it to work I needed to place the script as a text file in my cgi folder, and chmod it to make it executable (chmod 755 script_name.cgi). I tested it by typing the script’s URL into my browser, then looking at the target file on the server to see if anything was added. Don’t worry about the 500 Internal Server Error in the browser: it is most likely because the script prints no HTTP headers, so the server has nothing valid to send back even though the script itself has run.

Next comes the automation. To create/edit your crontab file use crontab -e, which will probably dump you into vi (a command line text editor of amazing power completely lacking in ease of use). Follow the O’Reilly guide to add/paste in your new command, which for this example is 10 1 * * * /home/path_to_your_cgi/script.cgi (execute script.cgi at 1:10am server time every day), then save and exit. You should see a message saying your crontab file was successfully saved. You can check it using crontab -l (that’s a lower case L, not the number 1), which will show your crontab file. Now, theoretically, everything going to plan, cron should run the script every morning, copying the daily log file text to a cumulative log file. If I actually knew what I was doing, I’d then write another script to compress and email myself the cumulative log file monthly. Maybe next time ;-)
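To make the five schedule fields in a crontab line a bit less cryptic, here is a rough shell sketch that splits a line into its parts. The function name describe_cron is made up for this illustration and is not part of cron; it only handles the plain five-field format used in this post.

```shell
#!/bin/sh
# describe_cron is a hypothetical helper, not a real cron command.
# It labels the five schedule fields of a crontab line and prints
# whatever is left over as the command.
describe_cron() {
    set -f        # stop the shell expanding the * fields as filenames
    set -- $1     # split the line on whitespace into $1, $2, ...
    set +f
    printf 'minute=%s hour=%s day_of_month=%s month=%s day_of_week=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
    shift 5
    printf 'command=%s\n' "$*"
}

describe_cron '10 1 * * * /home/path_to_your_cgi/script.cgi'
# minute=10 hour=1 day_of_month=* month=* day_of_week=*
# command=/home/path_to_your_cgi/script.cgi
```

A * means "every", so this line reads as minute 10 of hour 1, on every day of every month, whatever the day of the week.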

What I’d really like to do is put log data into a database, as described in Web Logs Using DBI by Ben Trott. Unfortunately the method he uses requires httpd.conf access, so you basically need your own server. I wonder if it’s possible to do via .htaccess, or whether it’s worth using a Perl script to insert the raw log file text into the database instead?

Ben just replied to my email. He says:

I’m pretty sure that you can use Perl*Handler tags within .htaccess files if it’s allowed by the main server configuration, but I’m not sure what impact that has upon the server—ie I don’t know if the module needs to be reloaded into the server on every request. But I think it should work in principle, and if it doesn’t, it’s probably related to the way your host has the httpd.conf configured.

Thanks!

Discussion...

1. Comment by oli  · 1 Dec, 2003 · 1:36 PM

Kevin Cameron from Bastish.net suggested an even easier way of doing this using only cron. Just add the following to your crontab (crontab -e):

30 2 * * * cat /home/user_name/logs/access.log >> /home/user_name/logs/my_log.log

This will use the program cat to append (>>) the file access.log to the file my_log.log at 2:30am server time every day. You’ll have to edit the paths, but they should be something similar to this. Works for me!
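If you want the cat approach to also write a time stamp line, like the Perl script does, one option is to wrap it in a small shell script and point cron at that. This is only a sketch: the function name append_log and the throwaway demo directory are invented for the example, and the real paths would be the ones from the crontab line above.

```shell
#!/bin/sh
# append_log SOURCE TARGET (hypothetical helper)
# Appends a time stamp line and then the contents of SOURCE to TARGET.
append_log() {
    src=$1
    dest=$2
    # do nothing if the daily log is missing or empty
    [ -s "$src" ] || return 1
    {
        printf 'Appended %s\n' "$(date '+%Y-%m-%d, %H:%M:%S')"
        cat "$src"
    } >> "$dest"
}

# Demonstration in a throwaway directory (assumes mktemp is available)
demo_dir=$(mktemp -d)
printf 'line one\nline two\n' > "$demo_dir/access.log"
append_log "$demo_dir/access.log" "$demo_dir/my_log.log"
append_log "$demo_dir/access.log" "$demo_dir/my_log.log"
cat "$demo_dir/my_log.log"
```

After the two calls the target file holds six lines: a time stamp plus the two log lines, twice over. Keeping the date command inside a script rather than in the crontab line itself also sidesteps a cron quirk, where a bare % in a crontab command is treated as a newline unless it is escaped.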

2. Comment by Ultrabob  · 10 Dec, 2003 · 4:10 PM

One thing to add to the methods you list above: you need to run them right before your server updates its logs, or you will lose the data recorded between your copying of the file and the server cleaning its logs.

I would have messed this up, so I thought I’d better add it.

3. Comment by oli  · 12 Dec, 2003 · 12:18 PM

Thanks Ultrabob - good point. My host provides the previous day’s log file, so I don’t end up copying a file that is being appended to. Also my host runs reverse lookup on the raw log files, to convert (some of) the IP numbers into host names. It’s a good idea to get the log after this has happened ;-)