Bits of Learning

Learning sometimes happens in big jumps, but mostly in little tiny steps. I share my baby steps of learning here, mostly on topics around programming, programming languages, software engineering, and computing in general. But occasionally, even on other disciplines of engineering or even science. I mostly learn through examples and doing. And this place is a logbook of my experiences in learning something. You may find several things interesting here: little cute snippets of (hopefully useful) code, a bit of backing theory, and a lot of gyan on how learning can be so much fun.

Wednesday, March 20, 2013

Backup Script in Python

Sometime back I put out a backup script written in bash shell script. After having used it successfully for a while, I decided to re-write it in Python. Several reasons:
It had bugs. The bugs were known (it's not as if it's a million lines of code). But I couldn't get rid of them. Simply because I could never understand the semantics of the language well enough to write what I wanted to, exactly.
On the other hand, Python has this elegant, readable, syntax. The things do as you would except them to. Compilation is not minimal as in the case of shell. So, it throws up errors when there are any, and you can catch them. Consequently, the python code that I came up with is smaller and more readable.
My initial experience in using it is that it seems to run quicker too. But I can't vouch for that.

I managed the re-implementation in a couple of hours, as opposed to probably several days worth of effort (with gaps) that went into the shell-script. Part of the reason was of-course that the design was absolutely clear in my mind as I had already done the implementation earlier. So, I don't repent having spent my effort building my first prototype and throwing it away. But another part is (please don't read this sentence if you are a shell-script fan): Python is more modern and better than shell-script.

Here are the main features:

  1. It will make the contents under sourceRoot identical to destinationRoot.
  2. It will do selective copying. Means, it will copy a file anywhere inside sourceRoot to the right place inside destinationRoot if and only if there's no corresponding copy in destinationRoot, or destination copy is found to be older than the source copy.
  3. It allows you to specify any subdirectory of sourceRoot that you want to backup. This is useful when you know that changes since the last backup are localised to a particular location, and you don't want to waste time scanning other locations. For large data, this saves a lot of time.

So, I invite you to prefer this new one to the old one. It does everything that the shell-script did. And it doesn't have the old bugs. So here goes the code:


import os
import sys
import shutil

from os import *
from string import *

sourceRoot = ""; # fill in appropriate value
destinationRoot = ""; # fill in appropriate value
backupDirectoryName = ""; # fill in appropriate value

def isSubdirectory(dir1, dir2):
 if(len(dir1) < len(dir2)):
  return False
 if(not dir1[:len(dir2)] == dir2):
  return False
 return True

def getRelativeName(dirName, root):
 if(not isSubdirectory(dirName, root)):
  raise Exception(dirName + ' is not a subdirectory of ' + root)
 if(len(dirName) < len(root)):
  return dirName
 if(root[-1] == '/'):
  return dirName[len(root):]
  return dirName[len(root) + 1:]

# always provide the full path name
def backupDir(dirName):
 global sourceRoot

# print 'going into directory: ' + dirName
 relativePathName = getRelativeName(dirName, sourceRoot)
 currentDestinationDir = destinationRoot + '/' + relativePathName
 if(not os.path.exists(currentDestinationDir)):
 allnames = listdir(dirName)
 for name in allnames:
  sourceName = dirName +'/' + name
   destinationFileName = currentDestinationDir + '/' + name
   if((not os.path.exists(destinationFileName)) or (os.path.getmtime(sourceName) > os.path.getmtime(destinationFileName))):
    print 'copying ' + sourceName + ' to ' + currentDestinationDir + ' ...'
    shutil.copyfile(sourceName, destinationFileName)
#   else:
#    print 'not copying ' + sourceName + ' to ' + currentDestinationDir + ' ...'
   print 'Something wrong with ' + name
# print 'going out of directory: ' + dirName

# path is relative to root, not an absolute path
def createPath(path, root):
 def getPathName(path):
  if(len(path) > 1):
   return path[0] + '/' + getPathName(path[1:])
  return path[0]

 if(not os.path.exists(root + '/' + getPathName(path))):
  if(len(path) > 1):
   createPath(path[:-1], root)
  os.mkdir(root + '/' + getPathName(path))

# given a full/absolute pathname dirName, this returns the path relative to root.
def getRelativePath(dirName, root):
 if(not isSubdirectory(dirName, root)):
  raise Exception(dirName + ' is not a subdirectory of ' + root)
 return split(getRelativeName(dirName, root), '/')

def backup(s, d):
 global backupDirectoryName
 global sourceRoot
 global destinationRoot

 sourceRoot = s
 destinationRoot = d

 if(len(sys.argv) != 2):
  print 'usage: dirname'

 backupDirectoryName = sys.argv[1]
 if(sourceRoot == ''):
  print "sourceRoot hasn't been set"
 if(destinationRoot == ''):
  print "destinationRoot hasn't been set"
 if(backupDirectoryName == ''):
  print "backupDirectoryName hasn't been set"
 if(not os.path.isdir(sourceRoot)):
  print "sourceRoot = " + sourceRoot + " doesn't exist."
 if(not os.path.isdir(destinationRoot)):
  print "destinationRoot = " + destinationRoot + " doesn't exist."
 if(not os.path.isdir(backupDirectoryName)):
  print "backupDirectory = " + backupDirectoryName + " doesn't exist."
 if(not isSubdirectory(backupDirectoryName, sourceRoot)):
  print "backupDirectoryName isn't a subdirectory of the sourceRoot."
 createPath(getRelativePath(backupDirectoryName, sourceRoot), destinationRoot)

The above code is a sort of library, which you use along with a driver script. An example follows:


from backup import *

sourceRoot = '/home/sujitkc/my_work'
destinationRoot = '/home/sujitkc/work_backup'

backup(sourceRoot, destinationRoot)

We usually have multiple backup scenarios. For example, I have the following scenarios:

  1. I take daily (approximately) backup of my work folder at office.
  2. I take fortnightly or monthly backup of my personal data at home.
  3. I take monthly backup of my entire data.

In the above three cases, the only thing that changes is the sourceRoot and destinationRoot. Therefore, I have a separate driver for each of the above scenarios: and respectively. In each of these drivers, I have set the sourceRoot and destinationRoot variables to a different appropriate value. Depending on my backup scenario, I just have to run the corresponding driver script. That's all!
All you need to do to use the script are the following:
  • Save script 1 as somewhere and make it executable using:
chmod +x
  • Save script 2 as, say,, at the same location and make it executable using:
chmod +x
  • In script 2, change the values of the sourceRoot and destinationRoot variables to appropriate locations.
  • You are all set! Now run the script as follows:
./ your_backup_directory
If you are just starting off with this script and are wondering what to put as your_backup_directory, here's a clue: often, the your_backup_directory parameter has the same value as sourceRoot.


If you find the above script useful, it will be highly appreciated if you drop me a word of acknowledgement as a comment to this post. Feel free to communicate if you find any problem with the code.

Related Post:

A Script for Backing Up

No comments: