Automatically Back Up Your Web Server Doc Root with Tar and Cron

How To Back Up a Web Server Doc Root Automatically with Tar and Cron

It’s important to make frequent automated backups of your web server’s document root in case you ever accidentally delete files or suffer a hack. In this guide, we will make daily and monthly automated backups of our document root using tar to make “tarball” archives and crontab to automate the entire process.

1. Prepare Backup

There is no particular recommended folder to back up to in Linux so you can choose this yourself. In this guide, we are saving backups to /var/www_backups/. Ideally you would store these on an external drive or an offsite server, but in this guide we will focus on creating backups locally.

Begin by creating your backup folder.

sudo mkdir /var/www_backups/

Before initiating any backups, make sure you have sufficient disk space to store your backups.

To check for available disk space on your system, run:

df -h

To check the size of the directory you want to back up, in this example, the web document root:

du -s -h /var/www/html 
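
The two checks above can be combined into one comparison. The following is a minimal sketch rather than part of the original guide: has_room is a hypothetical helper name, and it compares the uncompressed size of the source folder with the free space at the destination, which overestimates what a compressed tarball needs. It runs against throwaway directories so it is safe to try; for real use, point src and dest at /var/www/html and /var/www_backups.

```shell
#!/bin/sh
# has_room SRC DEST: succeed if DEST has more free space than SRC occupies.
# (Hypothetical helper name, for illustration only.)
has_room() {
    # du -sk prints the total size of SRC in KiB as the first field.
    need_kb=$(du -sk "$1" | awk '{print $1}')
    # df -Pk prints one POSIX-format line per filesystem;
    # field 4 of the second line is the available space in KiB.
    free_kb=$(df -Pk "$2" | awk 'NR==2 {print $4}')
    [ "$free_kb" -gt "$need_kb" ]
}

# Demo on throwaway directories; replace with your real paths.
src=$(mktemp -d)
dest=$(mktemp -d)
echo "hello" > "$src/index.html"

if has_room "$src" "$dest"; then
    room=yes
else
    room=no
fi
echo "Enough space for a backup: $room"
```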

2. Understanding tar

The tar command in Linux creates archive files. With gzip compression enabled, it produces compressed .tar.gz archives (called “tarballs”) suitable for storing backups. The archive filename and extension can be whatever you want, but we recommend using your domain name followed by .tar.gz.

Here is an example of a typical tar command backing up this website’s web document root in /var/www/html to an archive /var/www_backups/devanswe.rs.tar.gz.

sudo tar -cvpzf /var/www_backups/devanswe.rs.tar.gz -C /var/www/ html

It’s important you get this command right so it might help to break down each part:

  • sudo tar = execute  tar as superuser
  • c = create new file (overwrites old file)
  • v = verbose (will show backup progress on screen)
  • p = preserve permissions (755, 777, etc)
  • z = compress (gzip-compress the archive into a “tarball”)
  • f = filename
  • /var/www_backups/devanswe.rs.tar.gz = the path and name of the backup archive that will be created.
  • -C = (uppercase C) = change to this directory
  • /var/www/ = the path to the folder above your document root
  •   = SPACE (Important!)
  • html = the name of the actual document root folder

The last four parts here, -C /var/www/ html, are required to keep your tarball directory structure concise; otherwise it will include the entire path (/var/www/html) in the tarball. As another example, if your doc root is somewhere like /var/www/example.com/public_html, then this part would be -C /var/www/example.com public_html. That space before the doc root folder is important.
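
To see the effect of -C for yourself, you can list an archive's contents with the t option instead of extracting it. The sketch below is an illustration only: it builds a small stand-in for /var/www/html inside a throwaway directory (the work variable and file names are made up), archives it the same way as above, and prints the member names, which should start with html/ rather than with the full path.

```shell
#!/bin/sh
# Build a small stand-in for /var/www/html inside a throwaway directory.
work=$(mktemp -d)
mkdir -p "$work/www/html"
echo "<h1>test</h1>" > "$work/www/html/index.html"

# Same shape as the command above: change into the parent, then archive "html".
tar -cpzf "$work/site.tar.gz" -C "$work/www" html

# -t lists the archive members without extracting them.
listing=$(tar -tzf "$work/site.tar.gz")
echo "$listing"
```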

Excluding Files or Folders

If you need to exclude a particular file or folder from your backup, use --exclude=file_or_directory_name, where the name is written relative to the directory given to -C. For example, in our daily backups of this website, we exclude the WordPress cache folder located in /var/www/html/wp-content/cache:

sudo tar -cvpzf /var/www_backups/devanswe.rs.tar.gz --exclude=html/wp-content/cache -C /var/www/ html

Notice --exclude=html: we don’t include the full path here, only the name of the doc root folder itself. As another example, let’s say your doc root is located in /var/www/example.com/public_html, then your command would be:

sudo tar -cvpzf /var/www_backups/example.com.tar.gz --exclude=public_html/wp-content/cache -C /var/www/example.com/ public_html

To exclude more than one file or folder, repeat --exclude=file_or_directory_name as many times as you like, separating each option with a space.
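
As a rough illustration of repeating the option, the sketch below excludes a cache folder and a wildcard pattern in one command, then lists the archive to confirm both are gone. Everything here is an assumption for demonstration: the throwaway directory, the debug.log file and the '*.log' pattern are made up, and the wildcard behaviour shown is that of GNU tar.

```shell
#!/bin/sh
# Throwaway stand-in for a WordPress doc root.
work=$(mktemp -d)
mkdir -p "$work/www/html/wp-content/cache" "$work/www/html/wp-content/uploads"
echo "page"   > "$work/www/html/index.php"
echo "cached" > "$work/www/html/wp-content/cache/page.html"
echo "image"  > "$work/www/html/wp-content/uploads/logo.png"
echo "noise"  > "$work/www/html/debug.log"

# Two --exclude options: one folder (relative to the -C directory) and
# one wildcard pattern, quoted so the shell does not expand it.
tar -cpzf "$work/site.tar.gz" \
    --exclude=html/wp-content/cache \
    --exclude='*.log' \
    -C "$work/www" html

# List the result to confirm what was left out.
listing=$(tar -tzf "$work/site.tar.gz")
echo "$listing"
```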

Once you have your tar command prepared, continue to the next step to test.

3. Test Backup

Before configuring cron to automate your backup, you should first ensure your backup and restore are working correctly.

Execute your tar command in terminal. In the example below, we are backing up this website’s doc root.

sudo tar -cvpzf /var/www_backups/devanswe.rs.tar.gz -C /var/www/ html

Once your command has executed, you should see a list of files being processed by tar.

When done, list the backup folder to make sure your tarball is there. In this example, we’ll check the backup for this website. (The -lh option here shows us file permissions and file sizes in human-readable format.)

ls -lh /var/www_backups/

Output:

-rw-r--r-- 1 root root 69M Dec  3 00:16 devanswe.rs.tar.gz

Above we can see our .tar.gz tarball in this folder.
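
Seeing the file in the folder does not prove it is readable. One possible extra check, sketched below under the assumption that gzip is installed alongside tar, is to test the compressed stream with gzip -t and walk the archive with tar -tzf. The throwaway directory and file names are made up for the demonstration.

```shell
#!/bin/sh
# Create a small tarball in a throwaway directory to check.
work=$(mktemp -d)
mkdir -p "$work/www/html"
echo "page" > "$work/www/html/index.html"
tar -cpzf "$work/site.tar.gz" -C "$work/www" html

# gzip -t tests the compressed stream; tar -tzf reads through the archive
# and lists its members (discarded here, only the exit status matters).
if gzip -t "$work/site.tar.gz" && tar -tzf "$work/site.tar.gz" > /dev/null; then
    status=ok
else
    status=damaged
fi
echo "Archive check: $status"
```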

4. Test Restore

Create a folder /var/www_backups/restore/ to restore the backup to.

sudo mkdir -p /var/www_backups/restore/

The command to extract and restore a tarball is very similar to the one used to create it. The only difference is that x (extract) replaces c (create); with x, the z option decompresses the archive instead of compressing it.

In the example below, we are going to extract devanswe.rs.tar.gz to our restore folder.

sudo tar -xvpzf /var/www_backups/devanswe.rs.tar.gz -C /var/www_backups/restore/

To break this down again:

  • sudo tar = execute  tar as superuser
  • x = extract tarball
  • v = verbose (will show extract progress on screen)
  • p = preserve permissions (755, 777, etc)
  • z = uncompress
  • f = filename
  • /var/www_backups/devanswe.rs.tar.gz = the path and name of the tarball you want to extract and restore
  • -C = (uppercase C) = change to this directory
  • /var/www_backups/restore/ = the path where you want to restore to

Once the tarball has extracted, list the restore folder.

ls -l /var/www_backups/restore/

You should see a list of your doc root files and folders.
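
If you want more than a visual check, diff -r can compare the restored tree with the live one. The sketch below is only an illustration using a throwaway directory with made-up files; on a real server you would compare /var/www/html with /var/www_backups/restore/html. Note that diff compares file contents, not permissions or ownership.

```shell
#!/bin/sh
# Throwaway source tree and restore folder.
work=$(mktemp -d)
mkdir -p "$work/www/html/css" "$work/restore"
echo "page"  > "$work/www/html/index.html"
echo "style" > "$work/www/html/css/site.css"

# Back up, then restore into a separate folder.
tar -cpzf "$work/site.tar.gz" -C "$work/www" html
tar -xpzf "$work/site.tar.gz" -C "$work/restore"

# diff -r walks both trees; it prints nothing and exits 0 when they match.
if diff -r "$work/www/html" "$work/restore/html"; then
    result=identical
else
    result=different
fi
echo "Restore is $result to the source"
```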

Once you’ve verified your restore, you can delete the restore folder and continue to the next step.

sudo rm -r /var/www_backups/restore/

5. Configure cron

cron is a service in Linux used to schedule automated commands. These are stored in a cron table called crontab.

To open crontab, run:

sudo crontab -e

Scroll to the bottom of the file and add your cron schedule and tar command. In the example below, we are backing up this website’s doc root daily.

/tmp/crontab.QMOot4/crontab
00 01 * * * sudo tar -cvpzf /var/www_backups/devanswe.rs.tar.gz -C /var/www/ html

00 01 * * * will run the command at 1 a.m. every day and overwrite the existing tarball. As a test, you can change this to run within the next 3 minutes. If the time now is 16:30, enter 33 16 * * * for it to run at 16:33. (For more information on how to configure cron schedules, see Step 6 below.)

Save and close crontab to initiate cron. (if using nano, press CTRL + X, press Y and then press ENTER)

Wait for your test cron to run and list the backup folder until you see your tarball.

ls -lh /var/www_backups/
-rw-r--r-- 1 root root 69M Dec 3 16:33 devanswe.rs.tar.gz

If your tarball doesn’t appear after a while, make sure your command and crontab time are correct. You can also check the cron log with:

sudo grep CRON /var/log/syslog

Once you’ve verified the cron is running, you can change the schedule back to your preferred time.
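
Because the cron entry above overwrites the tarball in place, a backup that fails halfway would replace a good archive with a broken one. One way around this, offered here as a sketch and not as the method used on this site, is to have cron call a small script that writes to a temporary name and renames it only on success. The function name backup_docroot, the .partial suffix and the log file are all hypothetical; the demo runs in a throwaway directory.

```shell
#!/bin/sh
# backup_docroot PARENT NAME ARCHIVE LOG (hypothetical helper)
# Archives PARENT/NAME into ARCHIVE and appends tar's messages to LOG.
backup_docroot() {
    parent=$1
    name=$2
    archive=$3
    log=$4
    tmp="$archive.partial"
    # No -v here, so the log does not grow by one line per file.
    if tar -cpzf "$tmp" -C "$parent" "$name" >> "$log" 2>&1; then
        # Replace the previous tarball only after tar succeeded.
        mv "$tmp" "$archive"
        echo "$(date '+%Y-%m-%d %H:%M:%S') backup ok: $archive" >> "$log"
    else
        rm -f "$tmp"
        echo "$(date '+%Y-%m-%d %H:%M:%S') backup FAILED: $archive" >> "$log"
        return 1
    fi
}

# Demo in a throwaway directory.
work=$(mktemp -d)
mkdir -p "$work/www/html" "$work/backups"
echo "page" > "$work/www/html/index.html"
backup_docroot "$work/www" html "$work/backups/site.tar.gz" "$work/backups/backup.log"
cat "$work/backups/backup.log"
```

Saved as an executable script, the crontab line would then call the script path instead of tar directly.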

6. Backup Frequency and Retention

In the previous steps, we learned how to configure cron to run every 24 hours and overwrite our tarball. However, you may want to retain multiple backups spread out over a week or longer.

We will now learn how to configure schedules in crontab and include a timestamp in the tarball filename to allow for more organized archiving of multiple files.

Crontab schedule

A crontab entry consists of five schedule fields followed by a command. With all five fields set to * (five stars), the command runs once a minute. You can change these to suit your exact schedule by minute, hour, day of month, month, and day of week.

.------------ minute (0-59) (* = every minute)
| .---------- hour (0-23) (* = every hour)
| | .-------- day of month (1-31) (* = every day)
| | | .------ month (1-12 or jan-dec) (* = every month)
| | | | .---- day of week (0-6 or sun-sat) (Sunday=0) (* = every day)
| | | | |
* * * * * command_to_run

Examples:

30 23 * * *       Every day at 11.30pm
0 0 * * *         Every day at midnight (00:00)
*/10 * * * *      Every 10 mins
0 */12 * * *      Every 12 hours 
0 17 * * sun      Every Sunday at 5pm
0 17 * * sun,mon  Every Sunday and Monday at 5pm
0 5,17 * * *      At 5am and 5pm daily
0 12 1 jan,feb *  At 12pm on the 1st of every Jan and Feb
0 0 1 * *         The 1st day of every month at midnight
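
To make the field order concrete, the short sketch below splits one of the example schedules into its five fields with the shell's read builtin. It is purely illustrative: the command path /usr/local/bin/backup.sh is hypothetical, and cron itself does not parse entries this way.

```shell
#!/bin/sh
# A sample schedule from the list above with a hypothetical command appended.
entry='0 5,17 * * * /usr/local/bin/backup.sh'

# read splits on whitespace; the last variable receives the rest of the line.
read -r minute hour day_of_month month day_of_week command <<EOF
$entry
EOF

printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$day_of_month" "$month" "$day_of_week"
printf 'command=%s\n' "$command"
```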

Timestamps in Filename

To better manage multiple tarball archives, it’s recommended that you append a timestamp to the filename. We can do this using the date command in Linux.

Let’s test the date command using echo. To echo the current date in the format YYYY-MM-DD:

echo `date +%Y-%m-%d`
2018-12-07

To get the number of the day of the week (0 to 6, where 0 is Sunday and 6 is Saturday):

echo `date +%w`
2

Above we can see the day is 2 for Tuesday.

You can also echo the name of day of the week:

echo `date +%a`
Tue

For a full list of control characters supported by the date command, see: Linux Shell Script Date Format

We can use these date control characters in crontab to give our tarballs unique filenames.
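
As a small sketch of how a timestamp ends up in a filename, the snippet below builds two archive paths in an ordinary shell script. The site name example.com is a placeholder, and $(...) is used as the modern equivalent of the backticks shown above. Nothing is written to disk; the paths are only printed.

```shell
#!/bin/sh
site="example.com"               # hypothetical site name
backup_dir="/var/www_backups"    # backup folder used in this guide

# Command substitution inserts the output of date into the filename.
archive="$backup_dir/$site.$(date +%Y-%m-%d).tar.gz"
weekday_archive="$backup_dir/$site.$(date +%a).tar.gz"

echo "$archive"
echo "$weekday_archive"
```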

In the next step we will show some examples and backup scenarios using crontab schedules and timestamps in filenames.

7. Backup Examples

Just a Daily Backup

If you just want a daily backup, use the crontab below to create a backup at 1am every morning. The tarball will be overwritten daily.

/tmp/crontab.QMOot4/crontab
00 01 * * * sudo tar -cvpzf /var/www_backups/devanswe.rs.daily.tar.gz -C /var/www/ html

7-day rolling backup

In this scenario, we will run a backup of the doc root at 1am and keep a copy for each day of the week. This is the same backup configuration we have for this website. Make sure you have enough disk space before setting these cron tasks.

/tmp/crontab.QMOot4/crontab
00 01 * * * sudo tar -cvpzf /var/www_backups/devanswe.rs.`date +\%a`.tar.gz -C /var/www/ html

This will initiate a backup every day at 1 a.m. This part `date +\%a` will add the day of the week to the filename (Mon, Tue, Wed, etc.). This is so you have a tarball for each of the last 7 days and don’t have to worry about purging old copies. Note that in order to use the % symbol in crontab, it must be escaped with \, because cron treats an unescaped % as a newline and the command will fail.

Here is a list of our backup folder showing a backup for each day of the week. Older backups are overwritten automatically.

ls -lh /var/www_backups/
-rw-r--r-- 1 root root 69M Dec 3 01:00 devanswe.rs.Mon.tar.gz
-rw-r--r-- 1 root root 70M Dec 4 01:00 devanswe.rs.Tue.tar.gz
-rw-r--r-- 1 root root 70M Dec 5 01:00 devanswe.rs.Wed.tar.gz
-rw-r--r-- 1 root root 70M Dec 6 01:00 devanswe.rs.Thu.tar.gz
-rw-r--r-- 1 root root 72M Dec 7 01:00 devanswe.rs.Fri.tar.gz
-rw-r--r-- 1 root root 73M Dec 8 01:00 devanswe.rs.Sat.tar.gz
-rw-r--r-- 1 root root 73M Dec 9 01:00 devanswe.rs.Sun.tar.gz
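
If you prefer full dates in the filename (for example devanswe.rs.2018-12-07.tar.gz) instead of weekday names, old tarballs are no longer overwritten and need to be purged explicitly. The sketch below shows one way to do that with find. It is an alternative to the setup above, not the configuration used on this site; prune_old is a hypothetical helper name, and the demo works in a throwaway directory with empty placeholder files.

```shell
#!/bin/sh
# prune_old DIR DAYS (hypothetical helper): delete *.tar.gz files in DIR
# whose age in whole days is greater than DAYS.
prune_old() {
    find "$1" -maxdepth 1 -type f -name '*.tar.gz' -mtime +"$2" -exec rm -f {} +
}

# Demo in a throwaway directory with one fresh and one very old tarball.
backups=$(mktemp -d)
touch "$backups/example.com.$(date +%Y-%m-%d).tar.gz"
# touch -t sets the modification time (here: 1 January 2000, 00:00).
touch -t 200001010000 "$backups/example.com.2000-01-01.tar.gz"

prune_old "$backups" 7
ls "$backups"
```

Run the find command without the -exec part first to see which files it would remove.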

12-month rolling backup

Another cron we have configured for this website is a monthly backup going back one year.

/tmp/crontab.QMOot4/crontab
0 0 1 * * sudo tar -cvpzf /var/www_backups/devanswe.rs.`date +\%b`.tar.gz -C /var/www/ html

This part `date +\%b` will add the month name (e.g. ‘Jan’) to the tarball filename so you have a backup for every month of the year.

Here is a list of our folder showing the monthly backup. As the years roll on, the older tarballs will simply be overwritten.

ls -lh /var/www_backups/
-rw-r--r-- 1 root root 69M Jan 1 00:00 devanswe.rs.Jan.tar.gz
-rw-r--r-- 1 root root 70M Feb 1 00:00 devanswe.rs.Feb.tar.gz
-rw-r--r-- 1 root root 70M Mar 1 00:00 devanswe.rs.Mar.tar.gz
-rw-r--r-- 1 root root 70M Apr 1 00:00 devanswe.rs.Apr.tar.gz
-rw-r--r-- 1 root root 72M May 1 00:00 devanswe.rs.May.tar.gz
-rw-r--r-- 1 root root 73M Jun 1 00:00 devanswe.rs.Jun.tar.gz
-rw-r--r-- 1 root root 73M Jul 1 00:00 devanswe.rs.Jul.tar.gz

8. Offsite Backups

Now that you have your doc root backups stored locally in /var/www_backups, you should consider an offsite backup. For example, you could configure SFTP access to the /var/www_backups directory and then run an SFTP cron job on another server to pull these backups nightly.
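
Whichever transfer method you choose, it can help to confirm that the copy arrived intact. The sketch below records a SHA-256 checksum next to the tarball and verifies it on the receiving side. It assumes the sha256sum utility from GNU coreutils is available on both machines; the two throwaway directories and the plain cp merely stand in for your backup folder, the offsite server and the real transfer.

```shell
#!/bin/sh
local_dir=$(mktemp -d)    # stands in for /var/www_backups
remote_dir=$(mktemp -d)   # stands in for the offsite server

echo "pretend tarball" > "$local_dir/example.com.tar.gz"

# Record the checksum next to the tarball, using a relative file name.
( cd "$local_dir" && sha256sum example.com.tar.gz > example.com.tar.gz.sha256 )

# cp stands in for the real transfer (SFTP, scp, rsync, ...).
cp "$local_dir"/example.com.tar.gz* "$remote_dir"/

# On the receiving side, -c recomputes the hash and compares it.
if ( cd "$remote_dir" && sha256sum -c example.com.tar.gz.sha256 ); then
    transfer=verified
else
    transfer=corrupt
fi
echo "Transfer $transfer"
```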

What Next?

Now that you have your web server doc root safely backed up, you might also want to back up your databases.

Let me know if this helped.

3 replies

  1. That was very clear, well explained, well presented, and logically organized …
    Thank you! Thanks to you, I set up my first cron job…