Posts from April 2010.

Flickr Backup

As some of you are probably (way too) aware, I like to back up my social data from across the web (see what I do for backing up my google calendars). I actually dream of the day when there is an Ubuntu package I can install and give my credentials for a few websites (which it saves in my keyring), and which then proceeds to create an initial backup of all my data across all my services. Why do this? Well, aside from the mantra of “keep your own backups!” in case of service malfunction (remember when gmail went down for a few hours? I do; people went crazy), there is also the personal desire to be able to migrate to a new service should I wish to in the future. If I find a better photo sharing service for some reason, I want to migrate my data/photos to it easily.

Now, backing up flickr.

There are a few very important pieces of information on flickr that I can back up right now: my photos and my stats (views/referrals of my photos).

Photos
Tool: FlickrTouchr
Why #1: Do you back up your ~/Photos directory? If your answer is “No” or “Infrequently” you might really like this when your hard drive crashes and you don’t have local copies of those awesome photos from your awesome vacation.

Why #2: Do you take photos with your cell phone and upload them directly to flickr? Do you then clear them off your phone because they take up valuable space? This will make sure you have a copy of those on your own machine for easy editing/backup (see #1).

What it does: This one does what it does very well. It authorizes itself with your flickr account and then proceeds to download all of your photos (including your private ones, hence the need to authorize). Also, if you use the Sets feature of flickr, it keeps those associations by creating directories with the sets’ names. So, the directory structure that flickrtouchr creates for my account looks like this:

greg@rose:~/backup/flickr/photos$ ls -1
Bike Lock Fail Blog
Bike Ride – 20090628
Botanical Garden, July 4th, 2009
Bug Jam
Favourites
Gettysburg Trip
Jaunty Release Party
Mackinac Island Trip
No Set
SF – 2008
touchr.frob.cache
Traverse City – December ’09
UDS Karmic

You’ll see the “No Set” directory, which is where all the photos that are NOT part of any set end up.

How:
If you are going to run this script manually on your local machine with a web browser, you’ll be just fine; just follow the instructions it gives you. If, however, you are like me and want to run this via a cron job on a regular basis, you’ll need to take a few extra steps.
1. Start it on your local computer and it will authorize itself via your browser.
2. Kill it (CTRL+C) so you don’t have to sit there and wait for it to finish downloading all of your photos.
3. Copy the touchr.frob.cache file to your server and put it in the folder you’re going to backup your photos to.

Now when it runs it will pick up your credential information from that file and run as expected. Put this in your crontab and you will always have a backup of your photos:
5 23 * * * python /home/greg/src/scripts/flickrtouchr.py
Don’t worry about running it every night; if a photo is already downloaded it just skips it (i.e., It Does The Right Thing®).
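
The skip-if-already-there behaviour is what makes a nightly run cheap. Here is a minimal sketch of the idea. It is not flickrtouchr’s actual code: the function name and the `fetch` callable are made up for illustration.

```python
import os

def download_if_missing(url, target, fetch):
    """Save the photo at `url` to `target` unless `target` already exists.

    `fetch` is a hypothetical callable that takes a URL and returns bytes.
    Returns True if a download happened, False if the photo was skipped.
    """
    if os.path.exists(target):
        return False  # already backed up on an earlier run
    # Create the set directory (e.g. "No Set") if it is not there yet.
    os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
    with open(target, "wb") as handle:
        handle.write(fetch(url))
    return True
```

Because the check is a plain file-existence test, a second run over the same account only pays for the photos added since the last run.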

COOL! Now you have your photos backed up!

Statistics
Tool: My flickr-stats-export.sh based on this unnamed Github Gist
Why #1: I like numbers and these are the “raw” CSV files that flickr is producing for your photos. It tells you how many times your photos are viewed and what the referrer was.
Why #2: The stats are going away!

What it does: Pretty simply, it goes to your stats download page and downloads all the CSV files linked from it. You can see that page by going to this url: http://www.flickr.com/photos/YOURUSERNAME/stats/downloads/ (fill in your username). It then makes a tar.gz of these to save space.
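
To make those two steps concrete, here is a rough Python sketch: pull the CSV links out of the downloads page, then pack the downloaded files into one tar.gz. The real tool is a shell script built on curl, so this is only an illustration. The function names and the simple regex are assumptions, and the actual page markup may need a different pattern.

```python
import os
import re
import tarfile

def find_csv_links(html):
    """Return every href in the page that ends in .csv, in page order."""
    # Assumption: the links look like href="...something.csv"
    return re.findall(r'href="([^"]+\.csv)"', html)

def pack_csvs(csv_paths, archive_path):
    """Write the given CSV files into a gzip-compressed tarball."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in csv_paths:
            # Store only the file name, not the full local path.
            archive.add(path, arcname=os.path.basename(path))
```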

How:
How about we let it tell you:

greg@zen:~/src/scripts$ ./flickr-stats-export.sh --help

Usage: ./flickr-stats-export.sh DIRECTORY USERNAME COOKIES

DIRECTORY:
Directory to save the flickr-stats.tar.gz file of stats .CSVs

USERNAME:
Your flickr.com username

COOKIE:
See the -b flag from the CURL manpage.
It can be the contents of a cookie file or the full filename of the cookie file.
I recommend getting the cookie file from flickr using Firebug, then saving that
in the directory you plan to save the stats files.

If there is already a cookiejar.txt file in the download directory,
we will use that instead and this can be left blank.
See the -c flag from the CURL manpage for more on cookiejars.

As you can see, it needs your flickr cookies to run, so: 1) Install Firebug and Firecookie. 2) Log in to flickr. 3) Go to the cookie tab in Firebug, then the Cookies dropdown, and select “Export Cookies For This Site.” 4) Save that file somewhere.

I run this from my server, so I copied that cookies.txt file to the ~/backup/flickr/stats/ directory and then ran
./flickr-stats-export.sh ~/backup/flickr/stats/ grggrssmr ~/backup/flickr/stats/cookies.txt

I would suggest running this automatically so you don’t miss any stats. But you only need to do it monthly, as the stats CSV files are only updated on the first of each month. So, I have this in my crontab:
0 12 1 * * /home/greg/src/scripts/flickr-stats-export.sh /home/greg/backup/flickr/stats/ grggrssmr

Notice that I left off the cookies.txt? That is because after the first time it runs it saves the cookies in a “cookiejar.txt” file in the stats directory, and if that file is there, it uses it.
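
That fallback can be sketched in a few lines of Python. This is an illustration of the rule described above, not the script’s own code: the function name is hypothetical, and raising an error when neither source exists is an assumption about what a sensible tool would do.

```python
import os

def choose_cookie_source(directory, cookie_arg=None):
    """Prefer an existing cookiejar.txt in `directory`, else use `cookie_arg`."""
    jar = os.path.join(directory, "cookiejar.txt")
    if os.path.isfile(jar):
        return jar  # saved by an earlier run, so no argument is needed
    if cookie_arg:
        return cookie_arg  # first run: cookie file or cookie string from the user
    # Assumption: with neither source available there is nothing to send.
    raise ValueError("no cookiejar.txt found and no cookie argument given")
```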

That cron job runs at Noon (Eastern time zone, where my server is) on the 1st day of every month. Why? This data will only be available until June 1st, 2010 at Noon PDT (Pacific time zone). So, I picked a time 3 hours before the data disappears so that I A) won’t miss it and B) leave time for my data for the month of May to be generated. After June, you can remove this from your crontab, as it won’t do much once the files are gone.
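
If you want to double-check the 3-hour margin, the arithmetic is easy to verify. This small sketch assumes the IANA zone names America/New_York and America/Los_Angeles stand in for “Eastern” and “Pacific”, and it needs Python 3.9 or later (plus the tzdata package on systems without a time zone database).

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Noon on June 1st, 2010 in each zone (zone names are assumptions, see above).
cron_run = datetime(2010, 6, 1, 12, 0, tzinfo=ZoneInfo("America/New_York"))
cutoff = datetime(2010, 6, 1, 12, 0, tzinfo=ZoneInfo("America/Los_Angeles"))

# Subtracting two timezone-aware datetimes compares them in absolute time.
margin = cutoff - cron_run
print(margin)  # 3:00:00
```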

Luckily, if you forget to remove the script’s entry from your crontab after that date, it will just exit if it doesn’t have any .csv urls to download. So it won’t try to make a tar.gz of empty files and save empty data over your last good flickr-stats.tar.gz.
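
The guard itself is tiny. Here is a sketch of the same idea in Python, with a hypothetical `build_archive` callable standing in for the download-and-tar step (the real script does this in shell):

```python
def export_stats(csv_urls, build_archive):
    """Run `build_archive(csv_urls)` only when there is something to download.

    Returns True if the archive was rebuilt, False if the run was skipped.
    """
    if not csv_urls:
        return False  # nothing linked any more; leave the last good tarball alone
    build_archive(csv_urls)
    return True
```

Checking for an empty list before touching the archive is what keeps a stale cron entry harmless.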

Future Research
So, with those two things you have the photos and the statistics from your flickr account. However, that isn’t everything. I am working on extending flickrtouchr to also download the photo metadata (title, description, tags, comments, license), which it doesn’t currently save. With that metadata I will either create a metadata XML file associated with the jpg or embed the info INTO the jpg using the XMP standard (see: python-xmp-toolkit). You can see what I’m doing at this launchpad branch. Please feel free to branch it and help out!
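
For anyone curious what the XML-file option might look like, here is one possible shape, sketched with Python’s standard library. Nothing about the format is decided: the element names are invented for illustration, comments are left out for brevity, and this is not code from the branch.

```python
import xml.etree.ElementTree as ET

def build_sidecar(title, description, tags, license_name):
    """Return a small XML document describing one photo (hypothetical layout)."""
    photo = ET.Element("photo")
    ET.SubElement(photo, "title").text = title
    ET.SubElement(photo, "description").text = description
    ET.SubElement(photo, "license").text = license_name
    tags_element = ET.SubElement(photo, "tags")
    for tag in tags:
        ET.SubElement(tags_element, "tag").text = tag
    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(photo, encoding="unicode")
```

The resulting string could be written next to the photo, for example as a file with the same base name and an .xml extension.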