Thoughts on TDD

I want to know what the buzz around Test-Driven Development is all about. The proponents make a good case for it. I’ve read about it a good bit but have never worked with anyone who practiced it. I also tend to be leery when I see people becoming religiously fanatical about anything, and some of what I read sounds, if not fanatical, at least unrealistic. Nonetheless there are a lot of sensible voices promoting TDD as a way to build better, more maintainable software. That sounds good to me.

I have tried to use TDD on several small projects to get a feel for it. I am recording my (somewhat random) thoughts on it here for future reference:

  • I want to know that TDD really helps and works and that it’s not just a smartypants thing.
  • I’m not good at it yet. I haven’t done enough TDD (but I still have thoughts on it).
  • It’s not a panacea. Get your silver bullet here.
  • It IS helpful when refactoring.
  • I think there probably were fewer bugs in the first live versions because of the tests.
  • It can be a trap if you’re not careful to keep the end result of the system in mind. Seems easy to focus too much on writing tests. I feel a tendency toward myopia when writing a lot of tests before doing any integration.
  • It doesn’t replace the need to look at the results of your code (for example, writing results to text files, or CSV files to be imported into a spreadsheet to look at, visualize with your brain).
  • You’ll never write enough tests to catch every possible failure.
  • The wrong algorithm with 100% test coverage is still the wrong algorithm.
  • It doesn’t replace manual or automated acceptance testing.
  • It will take longer up front to build a system using TDD. It may lead to a more correct system at the first release. It may save time later when the system needs to change.
  • You will throw away more code. If you find that some code isn’t needed and remove it, that code may have multiple tests associated with it that will also be removed. Maybe this isn’t bad since we’re supposed to be throwing away the prototype but often the prototype ends up turning directly into the production version. Perhaps this is a way of throwing away the prototype a little piece at a time. But seriously…
  • I don’t like the idea of significantly changing or adding complexity to the architecture of a system solely to make it more amenable to unit testing. Maybe it’s worth it.
  • It’s easier to do unit testing when using dynamic languages.
  • You’re more likely to need those tests when using dynamic languages.
  • I’m not sold on writing tests as THE way to drive development, but then, like I said, I’m not good at it yet.
  • Having unit tests is good. Regardless of whether tests are in the driver seat I plan to take advantage of automated unit testing and automated acceptance testing.
  • At this point it seems unlikely that I’ll adopt test-first style TDD as my preferred method for building software (or as a religion) but I’m still going to try to do it as a first approach. I’ll also be willing to abandon it without regrets if it becomes cumbersome for the job at hand.
  • Finally, it will be good to know how to do TDD in case someday I end up with a boss who says I have to. BTW: If that’s the reason you’re doing TDD you’re probably doing it for the wrong reason.
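
To make the point about coverage concrete, here is a minimal sketch (Python 3 syntax). The function and test names are made up for illustration. Every line of the function is exercised by a passing test, yet the algorithm is still wrong for century years.

```python
def is_leap_year(year):
    # Deliberately wrong: ignores the century rules (1900 was not a leap year).
    return year % 4 == 0


def test_is_leap_year():
    # Both assertions pass and every line of is_leap_year is covered.
    assert is_leap_year(2008)
    assert not is_leap_year(2009)
```

The test suite is green, but is_leap_year(1900) still gives the wrong answer. The tests only check what I thought to ask.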

That’s enough for now. I also plan to look into Behavior-Driven Development (AKA smartypants TDD) and the associated tooling.

Pair Networks Database Backup Automation

I have a couple WordPress blogs, this being one of them, hosted at Pair Networks. I also have another non-blog site that uses a MySQL database. I have been doing backups of the databases manually through Pair’s Account Control Center (ACC) web interface on a somewhat regular basis, but it was bugging me that I hadn’t automated it. I finally got around to doing so.

A search led to this blog post by Brad Trupp. He describes how to set up an automated database backup on a Pair Networks host. I used “technique 2” from his post as the basis for the script I wrote.

Automating the Backup on the Pair Networks Host

First I connected to my assigned server at Pair Networks using SSH (I use PuTTY for that). There was already a directory named backup in my home directory where the backups done through the ACC were written. I decided to use that directory for the scripted backups as well.

In my home directory I created a shell script named dbbak.sh.

touch dbbak.sh

The script should have permissions set to make it private (it will contain database passwords) and executable.

chmod 700 dbbak.sh

I used the nano editor to write the script.

nano -w dbbak.sh

The script stores the current date and time (formatted as YYYYmmdd_HHMM) in a variable and then runs the mysqldump utility that creates the database backups. The resulting backup files are simply SQL text that will recreate the objects in a MySQL database and insert the data. The shell script I use backs up three different MySQL databases so the following example shows the same.

#!/bin/sh

dt=`/bin/date +%Y%m%d_%H%M`

/usr/local/bin/mysqldump -hDBHOST1 -uDBUSERNAME1 -pDBPASSWORD1 USERNAME_DBNAME1 > /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME1.sql

/usr/local/bin/mysqldump -hDBHOST2 -uDBUSERNAME2 -pDBPASSWORD2 USERNAME_DBNAME2 > /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME2.sql

/usr/local/bin/mysqldump -hDBHOST3 -uDBUSERNAME3 -pDBPASSWORD3 USERNAME_DBNAME3 > /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME3.sql

Substitute these tags in the above example with your database and account details:

  • DBHOSTn is the database server, such as db24.pair.com.
  • DBUSERNAMEn is the full access username for the database.
  • DBPASSWORDn is the password for that database user.
  • USERNAME_DBNAMEn is the full database name that has the account user name as the prefix.
  • USERNAME is the Pair Networks account user name.
  • DBNAMEn is the database name without the account user name prefix.

Once the script was written and tested manually on the host, I used the ACC (Advanced Features / Manage Cron jobs) to set up a cron job to run the script daily at 4:01 AM.

Automating Retrieval of the Backup Files

It was nice having the backups running daily without any further work on my part but, if I wanted a local copy of the backups, I still had to download them manually. Though FileZilla is easy to use, downloading files via FTP seemed like a prime candidate for automation as well. I turned to Python for that. Actually I turned to an excellent book that has been on my shelf for a few years now, Foundations of Python Network Programming by John Goerzen. Using the ftplib examples in the book as a foundation, I created a Python script named getdbbak.py to download the backup files automatically.

#!/usr/bin/env python
# getdbbak.py

from ftplib import FTP
from datetime import datetime
from DeleteList import GetDeleteList
import os, sys
import getdbbak_email

logfilename = 'getdbbak-log.txt'
msglist = []

def writelog(msg):
    scriptdir = os.path.dirname(sys.argv[0])
    filename = os.path.join(scriptdir, logfilename)
    logfile = open(filename, 'a')
    logfile.write("%s\n" % msg)
    logfile.close()

def say(what):
    print what
    msglist.append(what)
    writelog(what)

def retrieve_db_backups():
    host = sys.argv[1]
    username = sys.argv[2]
    password = sys.argv[3]
    local_backup_dir = sys.argv[4]
    
    say("START %s" % datetime.now().strftime('%Y-%m-%d %H:%M'))
    say("Connect to %s as %s" % (host, username))

    f = FTP(host)
    f.login(username, password)

    ls = f.nlst("dbbak_*.sql")
    ls.sort()
    say("items = %d" % len(ls))
    for filename in ls:
        local_filename = os.path.join(local_backup_dir, filename)
        if os.path.exists(local_filename):
            say("(skip) %s" % local_filename)
        else:
            say("(RETR) %s" % local_filename)
            local_file = open(local_filename, 'wb')
            f.retrbinary("RETR %s" % filename, local_file.write)
            local_file.close()
            
    date_pos = 6
    keep_days = 5
    keep_weeks = 6
    keep_months = 4    
    del_list = GetDeleteList(ls, date_pos, keep_days, keep_weeks, keep_months)
    if len(del_list) > 0:
        if len(ls) - len(del_list) >= keep_days:
            for del_filename in del_list:
                say("DELETE %s" % del_filename)
                f.delete(del_filename)
        else:
            say("WARNING: GetDeleteList failed sanity check. No files deleted.")
    
    f.quit()
    say("FINISH %s" % datetime.now().strftime('%Y-%m-%d %H:%M'))
    getdbbak_email.SendLogMessage(msglist)


if len(sys.argv) == 5:
    retrieve_db_backups()
else:
    print 'USAGE: getdbbak.py Host User Password LocalBackupDirectory'

This script runs via cron on a PC running Ubuntu 8.04 LTS that I use as a local file/subversion/trac server. The script does a bit more than just download the files. It deletes older files from the host based on rules for number of days, weeks, and months to keep. It also writes some messages to a log file and sends an email with the current session’s log entries.
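
The date_pos = 6 setting in the script works because the backup file names start with the six-character prefix dbbak_, so the timestamp begins at index 6. Here is a small sketch of that idea in Python 3 syntax (the listings in this post are Python 2). The function name backup_timestamp is made up for illustration, and unlike the GetDateFromFileName function shown later it also parses the time part.

```python
from datetime import datetime

PREFIX = "dbbak_"        # file name prefix written by dbbak.sh
DATE_POS = len(PREFIX)   # 6, the same value as date_pos in getdbbak.py


def backup_timestamp(filename, date_pos=DATE_POS):
    """Return the datetime embedded in a backup file name, or None."""
    # "YYYYmmdd_HHMM" is 13 characters long.
    stamp = filename[date_pos:date_pos + 13]
    try:
        return datetime.strptime(stamp, "%Y%m%d_%H%M")
    except ValueError:
        # The name does not carry a timestamp where one was expected.
        return None


if __name__ == "__main__":
    print(backup_timestamp("dbbak_20091110_0401_DBNAME1.sql"))
```

Files that don't match the naming pattern come back as None, so they are never candidates for deletion.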

To set up the cron job in Ubuntu I opened a terminal and ran the following command to edit the crontab file:

crontab -e

The crontab file specifies commands to run automatically at scheduled times. I added an entry to the crontab file that runs a script named getdbbak.sh at 6 AM every day. Here is the crontab file:

 
MAILTO="" 

# m h dom mon dow command 

0 6 * * * /home/bill/GetDbBak/getdbbak.sh 

The first line prevents cron from sending an email listing the output of any commands cron runs. The getdbbak.py script will send its own email so I don’t need one from cron. I can always enable the cron email later if I want to see that output to debug a failure in a script cron runs.

Here is the getdbbak.sh shell script that is executed by cron:

 
#!/bin/bash 

/home/bill/GetDbBak/getdbbak.py FTP.EXAMPLE.COM USERNAME PASSWORD /mnt/data2/files/Backup/PairNetworksDb 

This shell script runs the getdbbak.py Python script and passes the FTP login credentials and the destination directory for the backup files as command line arguments.

As I mentioned, the getdbbak.py script deletes older files from the host based on rules. The call to GetDeleteList returns a list of files to delete from the host. That function is implemented in a separate module, DeleteList.py:

#!/usr/bin/env python
# DeleteList.py

from datetime import datetime
import KeepDateList


def GetDateFromFileName(filename, datePos):
    """Expects filename to contain a date in the format YYYYMMDD starting 
       at position datePos.
    """   
    try:
        yr = int(filename[datePos : datePos + 4])
        mo = int(filename[datePos + 4 : datePos + 6])
        dy = int(filename[datePos + 6 : datePos + 8])
        dt = datetime(yr, mo, dy)
        return dt
    except ValueError:
        return None
 

def GetDeleteList(fileList, datePos, keepDays, keepWeeks, keepMonths):
    dates = []
    for filename in fileList:
        dt = GetDateFromFileName(filename, datePos)
        if dt is not None:
            dates.append(dt)
    keep_dates = KeepDateList.GetDatesToKeep(dates, keepDays, keepWeeks, keepMonths)        
    del_list = []
    for filename in fileList:
        dt = GetDateFromFileName(filename, datePos)
        if (dt is not None) and (dt not in keep_dates):
            del_list.append(filename)
    return del_list

That module in turn uses the function GetDatesToKeep defined in the module KeepDateList.py to decide which files to keep in order to maintain the desired days, weeks, and months of backup history. If a file’s name contains a date that’s not in the list of dates to keep then it goes in the list of files to delete.

#!/usr/bin/env python
# KeepDateList.py

from datetime import datetime


def ListHasOnlyDates(listOfDates):
    dt_type = type(datetime(2009, 11, 10))
    for item in listOfDates:
        if type(item) != dt_type:
            return False
    return True
    

def GetUniqueSortedDateList(listOfDates):
    if len(listOfDates) < 2:
        return listOfDates
    listOfDates.sort()
    result = [listOfDates[0]]
    last_date = listOfDates[0].date()
    for i in range(1, len(listOfDates)):
        if listOfDates[i].date() != last_date:
            last_date = listOfDates[i].date()
            result.append(listOfDates[i])
    return result
    
    
def GetDatesToKeep(listOfDates, daysToKeep, weeksToKeep, monthsToKeep):
    if daysToKeep < 1:
        raise ValueError("daysToKeep must be greater than zero.")
    if weeksToKeep < 0:
        raise ValueError("weeksToKeep must not be less than zero.")
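    if monthsToKeep < 0:
        raise ValueError("monthsToKeep must not be less than zero.")
    # The lines between the check above and the loop below were lost from
    # this listing. The setup here is reconstructed from the names the loops
    # use and is an assumption, not the original code.
    if not ListHasOnlyDates(listOfDates):
        raise ValueError("listOfDates must contain only datetime items.")
    dates = GetUniqueSortedDateList(listOfDates)
    keep = []
    if len(dates) == 0:
        return keep
    tail = len(dates)
    days_left = daysToKeep
    while (days_left > 0) and (tail > 0):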
        tail -= 1
        days_left -= 1
        keep.append(dates[tail])
        
    year, week_number, weekday = dates[tail].isocalendar()
    weeks_left = weeksToKeep
    while (weeks_left > 0) and (tail > 0):
        tail -= 1
        yr, wn, wd = dates[tail].isocalendar()
        if (wn != week_number) or (yr != year):
            weeks_left -= 1
            year, week_number, weekday = dates[tail].isocalendar()
            keep.append(dates[tail])
        
    month = dates[tail].month
    year = dates[tail].year
    months_left = monthsToKeep
    while (months_left > 0) and (tail > 0):
        tail -= 1
        if (dates[tail].month != month) or (dates[tail].year != year):
            months_left -= 1
            month = dates[tail].month
            year = dates[tail].year
            keep.append(dates[tail])
        
    return keep
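
The retention rule is easier to see with a worked example. What follows is my own compact sketch of the same idea in Python 3 syntax, not the code from the ZIP file. The name dates_to_keep and the sample dates are made up for illustration. It keeps the newest few dates, then the newest date from each of the next few earlier ISO weeks, then the newest date from each of the next few earlier months.

```python
from datetime import date, timedelta


def dates_to_keep(dates, keep_days, keep_weeks, keep_months):
    """Return the dates to retain, newest first."""
    if keep_days < 1:
        raise ValueError("keep_days must be greater than zero.")
    ordered = sorted(set(dates), reverse=True)   # unique, newest first
    keep = ordered[:keep_days]
    if not keep:
        return keep
    pos = len(keep)       # index of the next older date to examine
    current = keep[-1]    # most recently examined date

    weeks_left = keep_weeks
    while weeks_left > 0 and pos < len(ordered):
        candidate = ordered[pos]
        pos += 1
        # isocalendar()[:2] is (ISO year, ISO week number).
        if candidate.isocalendar()[:2] != current.isocalendar()[:2]:
            keep.append(candidate)
            weeks_left -= 1
        current = candidate

    months_left = keep_months
    while months_left > 0 and pos < len(ordered):
        candidate = ordered[pos]
        pos += 1
        if (candidate.year, candidate.month) != (current.year, current.month):
            keep.append(candidate)
            months_left -= 1
        current = candidate
    return keep


if __name__ == "__main__":
    # One backup per day from 2009-09-15 through 2009-11-10.
    daily = [date(2009, 9, 15) + timedelta(days=n) for n in range(57)]
    for kept in dates_to_keep(daily, keep_days=5, keep_weeks=2, keep_months=1):
        print(kept)
```

With those sample dates the sketch keeps November 6 through 10, then November 1 and October 25 (the newest dates in the two preceding ISO weeks), then September 30 (the newest date in the preceding month). Everything else would go in the delete list.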

I also put the function SendLogMessage that sends the session log via email in a separate module, getdbbak_email.py:

#!/usr/bin/env python
# getdbbak_email.py

from email.MIMEText import MIMEText
from email import Utils
import smtplib

def SendLogMessage(msgList):
    from_addr = 'atest@bogusoft.com'
    to_addr = 'wm.melvin@gmail.com'
    smtp_server = 'localhost'
    
    message = ""
    for s in msgList:
        message += s + "\n"

    msg = MIMEText(message)
    msg['To'] = to_addr 
    msg['From'] = from_addr 
    msg['Subject'] = 'Download results'
    msg['Date'] = Utils.formatdate(localtime = 1)
    msg['Message-ID'] = Utils.make_msgid()

    smtp = smtplib.SMTP(smtp_server)
    smtp.sendmail(from_addr, to_addr, msg.as_string())
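
The module above is Python 2. In Python 3 the email.MIMEText and email.Utils module names are gone, so here is a rough sketch of how the same message could be built there. The function names and the example.com addresses are placeholders I made up, and building the message is split from sending it so the first part can be tried without a mail server.

```python
from email.mime.text import MIMEText
from email.utils import formatdate, make_msgid
import smtplib


def build_log_message(lines, from_addr, to_addr):
    """Build the log email; one log entry per line of the body."""
    msg = MIMEText("".join(line + "\n" for line in lines))
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = "Download results"
    msg["Date"] = formatdate(localtime=True)
    msg["Message-ID"] = make_msgid()
    return msg


def send_log_message(lines, from_addr, to_addr, smtp_server="localhost"):
    """Send the log email through the given SMTP server."""
    msg = build_log_message(lines, from_addr, to_addr)
    # The with block closes the SMTP connection when done.
    with smtplib.SMTP(smtp_server) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    # Placeholder addresses; nothing is sent here.
    print(build_log_message(["START", "FINISH"],
                            "backup@example.com", "me@example.com"))
```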

Here is a ZIP file containing the set of Python scripts, including some unit tests (such as they are) for the file deletion logic: GetDbBak.zip

I hope this may be useful to others with a similar desire to automate MySQL database backups and FTP transfers who haven’t come up with their own solution yet. Even if you don’t use Pair Networks as your hosting provider some of the techniques may still apply. I’m still learning too so if you find mistakes or come up with improvements to this solution, please let me know.

Bookmarks Selective History

Some folks don’t keep bookmarks in their browsers anymore since you can always use a search engine to find things when you need them. Problem is, often the thing that interests me is not one of the top items in a search result set (even when I can remember the right search terms to use). If I’m looking for a specific thing I saw before I’m probably not going to be satisfied with search results showing me similar things but not that specific thing. I still use bookmarks.

I used to try to put bookmarks into folders based on category. Sorting out a categorization every time I create a bookmark is labor intensive. What I’ve been doing lately is this: When I first open Firefox (doesn’t have to be Firefox but that’s my main browser) I create a folder in the Bookmarks Toolbar named for the date such as 20090901. That folder is where I’ll drop any bookmarks collected during the day. I will also review the previous day’s folder for any items that I want to move to a category folder I already have (usually don’t move any). I then move the previous day’s folder to a folder named SelectiveHistory that is one level down from the Bookmarks Toolbar under a folder named Bill.

[Image: selectivehistoryexample3]

This has been working well. I have found that when I want to go back to a web page it is more likely to be a recent one so I don’t usually have to browse back too far in my SelectiveHistory. Firefox makes it easy to browse the bookmarks tree by simply hovering the pointer. I can also choose Tools – Organize Bookmarks to open the bookmarks Library and do a search when looking for something not so recent.

I do some maintenance on the SelectiveHistory folder by moving the daily folders into a previous month folder and monthly folders into a previous year folder. Doing so takes little time and not a lot of thought (easy enough to do before the coffee kicks in). I should also mention that I use Xmarks to synchronize my massive bookmarks collection across the computers I use Firefox on frequently. I really should mention that, because I don’t think my method of collecting bookmarks described above would work nearly as well without synchronization.

CONDG Meeting – August 2009

I enjoyed the presentation by Bill Sempf at this month’s Central Ohio .NET Developers Group even if it was a little disorganized. I think Mr. Sempf and I share a certain scatterbrain quality though his achievements point to an ability to focus deeply when needed. He’s like a smarter, more extroverted version of the Bill writing this post. And we share a similar hairline.

Bill discussed some of the changes coming in C# 4.0 and how some of the smaller changes that started in C# 3.0 are part of a larger strategy to make things like LINQ possible, and make COM interop work more smoothly. He also pointed out some additions to Visual Studio that may be helpful when doing Test Driven Development. Visual Studio has been rewritten in WPF for version 2010 so go get more RAM.

A lot of the changes to C# are to help it compete with dynamic languages like Ruby and Python, and to make it not suck for automating Microsoft Office. Oh no. It’s VB with braces. At least it doesn’t have DIMs and SUBs.

Pizza was provided by Information Control Corporation (ICC), a company based in Columbus that (if I heard right) Bill Sempf works with as a consultant. ICC has released an open source framework called MVC4WPF. Thanks for the framework, and the pizza.

PyOhio 2009

I don’t make it to many of the Central Ohio Linux User Group meetings but I happened to be at the one where Catherine Devlin stopped in to announce an upcoming Python conference in Ohio, strangely enough, named “PyOhio.” That was in 2008 and, though it sounded interesting, I couldn’t make it to the conference. Nonetheless I did remain interested and was able to attend the first day, July 25th, of the two-day conference this year.

PyOhio 2009 was held in a giant, oddly-shaped, glass-lined, marble-shingle-clad, cement block named Knowlton Hall on the OSU campus. The building is strangely proportioned. There are steps for giants and seats for little people. In fairness, it was a good place for the conference (except for that UPS that apparently lost power behind some locked door beeping constantly all day). The giant steps are meant for sitting on (not a special staircase for the basketball team) and college students typically don’t need as much seat space to be comfortable as I do. I just don’t share an architectural aesthetic with the designer of the building.

Getting Started with Django

The first presentation I attended was an introduction to Django by Alex Gaynor. According to the Django web site “Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design.” Alex did a nice job introducing Django. He has obviously worked with it a great deal as evidenced by the live demo he did at the end. The live demo didn’t go perfectly but it went well. It takes courage to do an impromptu live demo since it’s a huge opportunity to crash and burn.

According to Alex, Django doesn’t really use the Model-View-Controller (MVC) pattern precisely but rather uses a pattern more like Model-View-Template (or maybe Model-Template-View – MTV). Django is “opinionated” and aims to be secure by default, so you have to work harder to do things the wrong way (the old “falling into the pit of success” thing). Django views render templates. Templates use a custom template language within HTML. You define base templates that then reference child templates (it’s probably the other way around: child templates reference base templates, but I’m not sure – sorry). In Django a “project” is a web site and an “application” is an individual component of a web site. To learn more about Django (ignore what I’ve written here and) check out The Django Book online.

Django looks very interesting and appears to do a lot of the heavy lifting for you. I hope to explore it on the side sometime over the next year. I’ve done some work with PHP recently, a small project that didn’t use an existing framework. With frameworks like Django available it doesn’t make sense to not use one on anything but the simplest web project (and even then, those “simple” projects tend to become not so simple once you let them out of your head).

Python for Java Developers

The second session I went to was Python for Java Developers presented by Eric Floehr. Eric works at 3x Systems and has been working with both Java and Python. He is also working on starting a Python user group in central Ohio.

Eric talked about the similarities and differences between Java and Python, starting with the histories of the languages. He noted that James Gosling the creator of Java and Guido van Rossum the creator of Python are both cool guys.

The Python language was originally developed for the Amoeba distributed operating system developed by Andrew Tanenbaum who also developed Minix, the inspiration for the Linux kernel (apparently a rather influential fellow).

Eric also mentioned:

Python Operator Overloading

Neil Ludban talked (a little too quietly) about operator overloading in Python. He showed how the special methods, with names that begin and end with double underscores, implement operations on objects in Python. For example, the expression x + y is evaluated internally by passing y to the __add__ method of object x, like this: x.__add__(y)
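
Here is a small sketch of that in Python 3 syntax. The Vector class is my own toy example, not something from the talk.

```python
class Vector:
    """A two-dimensional vector that supports + and ==."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):
        # Called for: self + other
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __eq__(self, other):
        # Called for: self == other
        if not isinstance(other, Vector):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):
        return "Vector(%r, %r)" % (self.x, self.y)


if __name__ == "__main__":
    a = Vector(1, 2)
    b = Vector(3, 4)
    print(a + b)            # uses the + operator
    print(a.__add__(b))     # the same call, spelled out
```

Returning NotImplemented for an unsupported operand lets Python try the other operand's methods before it raises a TypeError.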

I have a few terse notes about some things I want to explore further (my note taking diminishes as the day goes on):

Equality is subset of Sortable
x[y] = x[y.__index__()]
__repr__()
__unicode__()
bool(x); __nonzero__(x)
import operator; help(operator)
PEP-3119
Explore: slice, functools
with statement (PEP-343) implicit try/finally block
Attributes versus properties?
f(*iterable, **mapping)

There. Isn’t that helpful?
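
A few of those notes are easier to remember with something runnable, so here is a rough sketch in Python 3 syntax. The class and function names are made up for illustration. Note that Python 3 spells __nonzero__ as __bool__.

```python
class Slot:
    """An object usable as a list index and in truth tests."""

    def __init__(self, n):
        self.n = n

    def __repr__(self):
        return "Slot(%d)" % self.n

    def __index__(self):
        # x[y] uses y.__index__() when y is not a plain int.
        return self.n

    def __bool__(self):
        # bool(x); this method is named __nonzero__ in Python 2.
        return self.n != 0


class Tracker:
    """Context manager: the with statement acts like try/finally."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")   # runs even if the block raised
        return False                 # do not swallow the exception


def describe(a, b, sep="-"):
    return "%s%s%s" % (a, sep, b)


if __name__ == "__main__":
    print(["a", "b", "c"][Slot(2)])
    print(bool(Slot(0)))
    print(describe(*[1, 2], **{"sep": "/"}))   # f(*iterable, **mapping)
```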

Python Not Harmful to CS Majors

Bill Punch from Michigan State University talked about the decision to replace C++ with Python in their CS1 class and the results thereof. Python seems to let them spend less time teaching the tooling and more time teaching problem solving. Bill asked us to try to remember what it was like to be a first time programmer (admittedly a hard thing to do after so many years). I appreciate that because over the years I have tried to write software that is accessible to non-computer people, and to do so requires seeing the use of the software from their perspective as best you can.

I enjoyed Bill’s description of “Dung Beetle Programmers” too: Instead of really understanding the program they are writing, and the problem they are solving, they just pile on code creating a ball of dung. Then to get it to work they pile on more for a bigger ball of dung. Finally they end up with an inordinate attachment to dung hence they don’t discard any bad or unnecessary code.

From the statistics they collected as students have progressed past CS1 since the switch to Python in fall 2007 they conclude that Python has not hurt the CS program. On the other hand, there doesn’t seem to be statistical evidence of a great improvement. Bill reported that he has seen a positive change in other ways. Students have come to him with stories of how they have used Python to solve real-world problems.

Form to Database Web Development

I caught the end of a session by Gloria W. Jacobs that I would like to have seen more of. Just a few links from that:

Game Development with Python and Pyglet

The quick and witty Steve Johnson talked about game development with Python. Unfortunately I only caught the end of this one as well, so only a few links:

Lightning Talks

The official PyOhio schedule ended with lightning talks:

Catherine Devlin showed off some of the cool stuff you can do with sqlpython, an open source command line interface to Oracle she contributes to (and I believe has taken the lead on).

I failed to get the name of the gentleman who spoke on scaling and suggested using log shipping to update a set of read-only databases from a single write-to database. The read-only databases feed web servers.

Joe Amenta talked about Python 3.0 and a project called lib3to2 he’s working on to help you port backward to the 2.x Python interpreter should you need to do that.

Steve Johnson – funny slides to remind us that the name comes from Monty Python, not the snake.

zsh guy showed us the power of the command line.

Disease modeling in Python guy had scary diagrams where the end result was death (sort of like – life). High powered Python libraries: SymPy, NumPy, SciPy, and wxPython for the GUI.

Gloria W. Jacobs talked about Kamaelia.

Somebody talked about introspection in PyGame but by then my note taking was almost as bad as my memory at the end of a learning-packed day. After the conclusion of the scheduled sessions there were open spaces, sprints, and other evening activities. I couldn’t stick around for those this year.

Wrapping Up

Watching the other attendees working with their notebook PCs (saw a lot of Apples there) made me think I’d like to have some sort of wee PC with good battery life and mobile broadband. My spiral bound steno pad from Wal-mart just didn’t afford me any coolness or connectedness.

I want to thank all the folks who made PyOhio happen. I’d like to thank Catherine Devlin in particular for her role in organizing and spreading the word, and doing so with great enthusiasm. Assuming there will be a PyOhio 2010, I hope to be there for the whole event and maybe even contribute in some way. I don’t know that my Python skills will be up to presenter level by then but maybe I’ll at least come prepared to do an open space of some sort.

Python Imaging Library – Introduction

Sometimes you run across an item in an article or blog post that you don’t take much notice of at the time but it makes just enough of an impression that you recall its existence later, though you may forget the source. I recall reading about working with image files in Python but I don’t remember the source. I do remember there was an example that appeared to be doing some significant image manipulation in just a few lines of code.

It was a few years ago, and some time after that initial encounter, that I found myself with directories full of Windows bitmap files of several megabytes each. These were screen shots captured using either a tool called Screen Seize or using the manual method of pressing Print Screen and pasting into Paint. Regardless of how they got there, it was bugging me that they were taking up so much space. Disks are huge and space is cheap these days but I still recall the first hard disk drive I used. It had a capacity of 5 MB and cost several thousand dollars. It’s ingrained that I don’t like wasting disk space.

Facing that listing of BMP files, the memory of that image manipulation example in Python came back to me. I searched and found the Python Imaging Library (PIL). You need to have Python installed first. Download and install the version of the PIL to match the installed version of Python and you’re good to go. The Python installer registers the .py extension so typing just the name of a Python script at a command prompt will invoke the Python interpreter to execute the script. I created a script named bmp2png.py (the old ‘2’ for ‘to’) and placed it in a directory that is in the PATH. To use the script, I simply opened a command prompt in the directory containing the bitmap files and ran bmp2png.py to create a smaller PNG file from each BMP file. Of course I looked at some of the PNG files to make sure the conversion went well before manually deleting the original BMP files.

To anyone familiar with Python, the following is a very obvious and simple script. It may also be non-Pythonic, or wrong in some way. I’m no Python guru, just a casual enthusiast at this point. There are a few “extra” lines in the script. The ones with the print statements are just for visual feedback. I like visual feedback (except from other drivers on the freeway).

import os, Image

print 'Converting BMP to PNG in ' + os.getcwd()
ls = os.listdir(os.getcwd())
for f in ls:
    name, ext = os.path.splitext(f)
    if ext.lower() == ".bmp":
        outfile = name + ".png"
        print '  ' + f + ' -> ' + outfile
        Image.open(f).save(outfile)
print 'Done.'

Line 1 imports the os module needed to work with directories and such, and the Image module from the Python Imaging Library. Line 4 gets a list of all files in the current working directory, and at line 5 we start working with each file in the list. Line 6 splits the file name and extension into separate variables. We’ll process only the files with a .bmp extension. After making a new file name with the .png extension we get to line 10 where the magic happens. The save method of the Image object will convert the format of the file based on the extension of the given file name. That’s all there is to converting the files. Actually there can be a lot more to it if you want. The PIL uses default options when you don’t specify otherwise, but there are options available if you want more control over the conversion.
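
As an example of passing an option, here is a sketch in Python 3 syntax. It assumes Pillow, the maintained fork of the PIL, which is imported as "from PIL import Image" rather than the plain "import Image" used above. The function names are made up for illustration, and optimize=True is just one of the PNG save options.

```python
import os


def png_name(filename):
    """Return the .png name for a .bmp file, or None for any other file."""
    name, ext = os.path.splitext(filename)
    if ext.lower() == ".bmp":
        return name + ".png"
    return None


def convert_directory(directory="."):
    """Convert every BMP file in directory to an optimized PNG file."""
    # Imported here so png_name can be used without Pillow installed.
    from PIL import Image

    for entry in sorted(os.listdir(directory)):
        target = png_name(entry)
        if target is None:
            continue
        print("  " + entry + " -> " + target)
        with Image.open(os.path.join(directory, entry)) as img:
            # optimize=True asks the PNG writer to spend extra effort
            # looking for smaller output.
            img.save(os.path.join(directory, target), optimize=True)


if __name__ == "__main__":
    convert_directory(os.getcwd())
```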

I have been impressed with what the Python Imaging Library can do, and I’ve just scratched the surface (oops, better buff that out – sorry). Though I use more efficient screen capture methods these days, I’ve found the above script useful from time to time. It was just a starting point. There are several similar, and slightly more advanced, scripts I plan to share in future posts.

CONDG Meeting – July 2009

Wow! I have been remiss as a blogger. No posts since April. I logged in and see there are five drafts I haven’t finished. I don’t know if this has anything to do with the fact that I started using Twitter in the meantime. Twitter: It’s like a sputtering of creative sparks, 140 character sparks at most (and mine not all that creative), that burn through the fuel of creative energy but never really get the fire going. There is something addictive about Twitter when you’re a geek. Maybe I shouldn’t blame my lack of writing on Twitter. There have been a lot of other things going on the last few months. On the bright side, I doubt many read this blog (if I checked metrics I’d know) so it’s not a big deal. But even if this is only a journal for my own future reference I should keep it up, right? Well, on to the meeting.

At this month’s meeting of the Central Ohio .NET Developers Group, Jeremiah Peschka (already following Jeremiah on Twitter) talked about SQL Server and Object-Relational Mapping. Jeremiah talked specifically about the NHibernate ORM tool. I’ve read a lot about NHibernate but so far have not worked on a project that used it. Prior to showing NHibernate, the support for hierarchical data in SQL Server was discussed. It seems that this hierarchical data could be useful in ORM scenarios. I really enjoyed the presentation and look forward to working with some of the tools and techniques that were discussed.

On a side note: Maybe it’s just me, but there’s something about Jeremiah’s mannerisms that reminds me of Clark Howard (just followed Clark Howard on Twitter). Of course Jeremiah is a much cooler guy than Clark, maybe not as rich. Of course I say that without really knowing either of them. And maybe I should be tweeting this instead.

How I Split Podcast Files

Update 2011-01-18: The Sansa m250 player finally died, and I now have a newer mp3 player that fast-forwards nicely, so I no longer do this goofy podcast splitting stuff.

Note: This is a “How-I” (works for me) not a “How-to” (do as I say) post.

I do goofy stuff sometimes. For example, I use Linux to download a couple podcasts targeted to Microsoft Windows developers. Specifically, I use µTorrent (that’s the “Micro” symbol so the name is pronounced “MicroTorrent”), a Windows BitTorrent client, running in Wine on Ubuntu to download the .NetRocks and Hanselminutes podcasts. I’ve had no problems running µTorrent in Wine. I got started doing this because my mp3 player was awkward to work with in Windows XP.

When I connected my Sansa m250 mp3 player to a Windows XP box, the driver software XP loaded wanted me to interact with the mp3 player as a media device. It has been a while, and I can’t recall exactly what it did, but I do recall it wanted me to use a media library application (one that would probably try to enforce DRM restrictions) and did not give me direct access to the file system on the player. There is probably a way around that, but I didn’t find it quickly at the time. What I did find was that when I connected the mp3 player to my old PC running Ubuntu it detected it and mounted it as a file system device that I could happily copy mp3 files to as I pleased. Good enough for me.

At first I was using the Azureus BitTorrent client, which is a Java app and runs on Ubuntu, to download the podcasts (and an occasional distro to play with). That application seemed to get more bloated with each release. It started displaying a bunch of flashy stuff and promoting things that you probably shouldn’t be downloading (but it’s okay if you don’t believe in copyright). I read about µTorrent and tried it on a Windows XP PC. It’s a lightweight program that does BitTorrent well without promoting piracy (personally, I do think copyright, with limits, is a good thing). While this worked well for downloading, I didn’t like the extra step of copying files from the PC running Windows to the other running Ubuntu to load them onto my mp3 player. After reading a timely article about Wine (the source of the article escapes me now), I decided to try running µTorrent using Wine. I don’t recall having any problems setting it up; it just worked. I did have to fiddle with my router to set up port forwarding, but that’s not related to Wine or Ubuntu, just something you may have to do for BitTorrent to work.

This method of downloading the podcasts works well, but that’s not the end of the story. Occasionally I would be part way through a podcast and, for some reason (maybe I was trying to rewind a little bit within the file but my finger slipped and it went back to the beginning of the file), I would have to fast-forward to where I left off. Hour-long podcasts in a single mp3 file are not easy to fast-forward with the Sansa player I have. It doesn’t forward faster the longer you hold the button like some devices do; it just goes at the same (painfully slow for a large file) pace. It seemed like splitting the mp3 files into sections would make that sort of thing easier. Bet there’s an app for that.

A search of the Ubuntu application repository turned up mp3splt. It has a GUI but I only wanted the command line executable which is available in the repository and can be installed from the command line (note that there’s no “i” in mp3splt):

sudo apt-get install mp3splt

After a couple trips to the man page to sort out which command line arguments to use, I had it splitting big mp3 files into sections in smaller mp3 files. That worked for splitting the files but I found that the player didn’t put those files in order when playing back. That’s not acceptable. I probably could just make a playlist file and use that to get the sections to play in order. I wondered if setting the ID3 tags in a way that numbered the tracks would make the player play them in order. Turns out it would. A search for “ID3” in the Ubuntu repository led to id3tool, a simple utility for editing the ID3 tags in mp3 files. I installed it too:

sudo apt-get install id3tool

I wrote a shell script named podsplit.sh to put this splitting apart all together. I use a specific directory to hold the mp3 files I want to split (but I’ll call it a “folder” since that’s the GUI metaphor, and I use the GNOME GUI to move the files around). I manually copy the downloaded mp3 files into the 2Split folder and then open a terminal and run the script. The script creates a sub-folder for each mp3 file that is split. When the script is finished I copy the sub-folders containing the resulting smaller mp3 files to the Sansa mp3 player.

Here’s the shell script:

#!/bin/bash

#------------------------------------------------------------
# podsplit.sh
#
# by Bill Melvin (bogusoft.com)
#
# BASH script for splitting mp3 podcasts into smaller pieces.
# I want to do this because it takes "forever" to fast-
# forward or rewind in a huge mp3 on my Sansa player.
#
# This script requires mp3splt and id3tool.
#
# This script, being a personal-use one-off utility, also 
# assumes some things:
# 1. mp3 files to be split are placed in ~/2Split
# 2. The file names are in the format showname_0001.mp3
#    or showname_0001_morestuff.mp3 where 0001 is the 
#    episode number.
# 
# I'm no nix wiz and I don't write many shell scripts so 
# this script also echoes a bunch of stuff so I can see 
# what's going on. 
#
#------------------------------------------------------------
# [2009-01-18] First version. 
#
# [2009-01-24] Use abbreviated show name for Artist.
#
# [2009-02-12] Changed split time from 3.0 to 5.0.   
#
# [2009-02-16] Use track number instead of end-time in track 
# title.
#
# [2009-02-19] Redirect some output to log file.
#------------------------------------------------------------

split_home=~/2Split
logfn="${split_home}/podsplit-log.txt"

ChangeID3() {
  filepath=$1
  filename=$2

  # Get track number from ID3.
  temp=`id3tool "$filepath" | grep Track: | cut -c9-`
  
  # Zero-pad to length of 3 characters.
  track=`printf "%03d" $temp`
    
  # Extract the name of the show and the episode number from 
  # the file name. This only works if the file naming follows 
  # the convention showname_0001_morestuff.mp3 where 0001 
  # is the episode number. The file name is split into fields 
  # delimited by the underscore character.
  show=$(echo "$filename" | cut -d'_' -f1)
  episode=$(echo "$filename" | cut -d'_' -f2)
  abbr="${show:0:6}"
  album="${abbr}_${episode}"
  title="${abbr}_${episode}_${track}"

  echo "ChangeID3"
  echo "filepath = $filepath" >> "$logfn"
  echo "filename = $filename" >> "$logfn"
  echo "show = $show" >> "$logfn"
  echo "abbr = $abbr" >> "$logfn"
  echo "episode = $episode" >> "$logfn"
  echo "album = $album" >> "$logfn"
  echo "title = $title" >> "$logfn"
  echo "track = $track" >> "$logfn"
  echo "BEFORE" >> "$logfn"
  id3tool "$filepath" >> "$logfn"
  
  id3tool --set-album="$album" --set-artist="$abbr" --set-title="$title" "$filepath"
  
  echo "AFTER" >> "$logfn"
  id3tool "$filepath" >> "$logfn"
}

SplitMP3() {  
  echo "SplitMP3"
  name1=$1
  echo "name1 = $name1"
  
  # Get file name and extension without directory path.
  name2=${name1#$split_home/}
  echo "name2 = $name2"
  
  # Get just the file name without the extension.
  name3=${name2%.mp3}
  echo "name3 = $name3"

  outdir=$split_home/$name3.split
  echo "Create $outdir"
  mkdir "$outdir"

  # Quote the input path so file names containing spaces still work.
  mp3splt -a -t 5.0 -d "$outdir" -o @t_@n "$name1"

  for MP3 in "$outdir"/*.mp3
  do
    ChangeID3 "$MP3" "$name3"
  done   
}

for FN in "$split_home"/*.mp3
do
  SplitMP3 "$FN"
done

echo "Done."

This is not a flexible script as my folder for splitting files is hard-coded and it assumes a file naming convention for the mp3 files being split. If you’re an experienced shell scripter I’m sure you can do better. I still consider myself a Linux “noob” (and offer proof as well), intermediate in some areas at best. I am posting this because someone else may be trying to solve a similar problem and this can serve as an example of what worked for one person, in one situation, to work around the limitations of one particular mp3 player. Someone less goofy would probably just buy an iPod and use iTunes to handle the podcast files.
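For anyone adapting the script, the file naming convention is the part most likely to bite. Here is a minimal sketch, not part of podsplit.sh, of how the show and episode could be pulled out of a file name with shell parameter expansion instead of cut, with a fallback for names that have no underscore. The function name parse_podcast_name and the "0000" fallback episode are assumptions made up for illustration.

```shell
#!/bin/bash

# Hypothetical helper (not in the original script): take a base file name
# such as showname_0001_morestuff and print "abbr episode".
parse_podcast_name() {
  local name="$1"
  local show episode rest abbr

  # Everything before the first underscore is the show name.
  show=${name%%_*}

  case $name in
    *_*)
      # Drop the show name, then keep the text up to the next underscore.
      rest=${name#*_}
      episode=${rest%%_*}
      ;;
    *)
      # Assumed fallback when the naming convention is not followed.
      episode="0000"
      ;;
  esac

  # Truncate the show name to six characters, as the script does for Artist.
  abbr=$(printf '%.6s' "$show")
  echo "$abbr $episode"
}

parse_podcast_name "hanselminutes_0150"
parse_podcast_name "dotnetrocks_0412_morestuff"
parse_podcast_name "nounderscore"
```

With a fallback like this a file that ignores the convention still gets usable tags, though whether "0000" is a sensible default depends on how you name your own files.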

The Handy Switch

I turned on the TV while eating lunch. CNN ran a little Earth Day segment. A question was asked about things you can do at the office to save energy. One of the items mentioned was that you can save about seven dollars’ worth of energy a month by turning off a printer. Besides reminding me that it’s Earth Day (which I would have already known were I truly Green™), it made me think about a simple device I have been using to stop some of the power leaks caused by devices that are plugged in all the time (many of which are on even when they’re “off”). So for my Earth Day submission I give you the GE Handy Switch.

HandySwitch01

There may be similar switches made by a company other than General Electric. It just happens that the ones I have found locally are the Handy Switch made by GE. They come in both three prong (grounded) and two prong versions. As you can see from the pictures, I like to draw a “0” and a “1” with a marker to make it easy to see at arm’s length which position is off or on.

HandySwitch02

If a device such as a printer can use about seven dollars (that number comes from the CNN story, I’ve not confirmed it in any way) worth of electricity a month then, at around five dollars each, these switches can quickly pay back their cost in energy savings. If you’re curious about how much electricity your devices are actually using you can get a meter such as the Kill A Watt to measure that.
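The payback arithmetic can be sketched in a few lines of shell. This takes the unconfirmed seven-dollars-a-month figure at face value and assumes the switch eliminates all of that use; the function name payback_days and the 30-day month are assumptions for illustration.

```shell
#!/bin/bash

# Hypothetical helper: days until a switch pays for itself.
# Amounts are in whole cents so integer arithmetic is enough.
payback_days() {
  local switch_cents="$1"
  local monthly_savings_cents="$2"

  # Assumes a 30-day month and rounds up to the next whole day.
  echo $(( (switch_cents * 30 + monthly_savings_cents - 1) / monthly_savings_cents ))
}

# A five-dollar switch on a device costing seven dollars a month.
payback_days 500 700
```

Under those assumptions the switch pays for itself in about three weeks. A device that leaks less power would take proportionally longer, which is where a meter like the Kill A Watt comes in.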

HandySwitch04

I keep buying more of these over time because they seem to actually be handy. They are available at Wal-Mart and other stores, at least here in the US. I assume there are similar devices suited to the local electrical connector specifications available in other countries.

HandySwitch03

If we all do little things to save energy, like plug small electrical leaks, over time the savings will add up, in our individual electric bills and in the world’s energy use.

Foxit Reader Gets The Boot

I have used Foxit Reader for a while on several PCs for viewing PDF files. I really liked that it was a single fast and lightweight executable that did not require an installer. Lately I ran into some problems with it crashing on a PC running Windows Vista. Not sure whether the problem was due to it running on Vista or due to PDF files using newer features of the PDF format (I have not looked to see if there are new features of the PDF format), I decided to download the latest version of Foxit Reader.

I noticed some bad signs early on in the installation.

foxit reader 1

“Install Firefox plugin” looks like a reasonable option. Some people probably do want to view PDF files in the browser. I prefer to download them by default. I’ll just turn off that option.

foxit reader 2

When I clicked the check box to turn off the Firefox plugin the other options that I had turned off above became selected again. I tried it a couple times and it repeated. Seems like a bug to me. I turned off the Firefox plugin, and then turned off the other options above, and moved on to the next step…

foxit reader 3

Oh no. AdCrap! And some sneaky wording too. See how you might think you have to “accept the License Terms” not for the toolbar but for the whole app?

They also assume I want Ask.com to be my default search provider. Why didn’t I think of that? At least they didn’t assume I always want to go to their start page, but they did give me the option.

foxit reader 4

More scary red text. This is not looking good.

foxit reader 5

But wait, there’s more… double AdCrap! Already checked is “Create desktop, quick launch and start menu icon to eBay.” Since I am installing this software to read documents it only makes sense that I want to go to eBay often, and quickly!

Well, I went ahead and installed it – slowly – reading the whole dialog – you can’t just next, next, next anymore.

foxit reader 6

After I opened a document that the older Foxit Reader had crashed on, and it seemed to be working now, I thought I might keep the new version. Then the automatic update dialog popped. This was the last straw. Am I going to trust the folks who just tried to get me to install AdCrap on my computer to do automatic updates? No. I am not.

foxit reader 7

And, yes, I am sure I want to uninstall it.

Is Acrobat Reader looking better to me than it did before? Not really, but Foxit Reader now looks worse. Sad to say, what was once a slick and unencumbered piece of software is now showing signs of desperation. If their business plan was to use the free PDF reader to gain name recognition and “mind share” to help them sell other products and services, they should have stuck with that plan. Trying to squeeze some revenue out of free software by putting in AdCrap turns people off. At least that’s what it does to me.