Columbus Ruby Brigade – June 2010

The Columbus Ruby Brigade met at Quick Solutions on 21 June, 2010.

Mike Doel, who works at VacationView, gave a talk on Capybara (a giant rodent that occasionally eats its own poop) and Capybara ("Son of Webrat"). One virtue of Capybara is that it facilitates testing the JavaScript bits on your site, which Webrat cannot do.

Alex Moore presented IronRuby. Some IronRuby performance and RubySpec stats are at ironruby.info.

Alex recommended the book IronRuby Unleashed by Shay Friedman and mentioned the not-yet-released IronRuby in Action by Ivan Porto Carrero and Adam Burmister.

After the meeting we stopped at the nearby Busty Rucket for a pint. I tried Lake Erie Monster from Great Lakes Brewing Co. and I have to say it was indeed a monster. Starts with a malty sweetness and finishes by biting your head off with some powerful hops. Not exactly my cup of tea, which is not surprising since it was a beer.

Columbus Ruby Brigade

Here is my link dump from the May 2010 meeting of the Columbus Ruby Brigade:

The erubycon conference will be held Oct 1-3, 2010.

Ben Wagaman presented Core of the CoreReflection.

Greg Malcolm showed us ruby-debug (cheat sheet).

Kevin Munc presented Method of the Month (methods, actually): empty?, nil?, blank?, and present?

Matt Forsythe gave a nice walkthrough on using regular expressions.

Rubular.com was also mentioned.

Elizabeth Naramore gave an enthusiastic presentation on Technical Writing featuring Giant Inflatable Poop.

Joe O’Brien recommended a book by Jerry Weinberg – Weinberg on Writing: The Fieldstone Method

Crowdsourced Internationalization

I listened recently to a couple of (not so recent) episodes of the Startup Success Podcast where the topic was “crowdsourced” testing. In episode 20, Bob Walsh and Patrick Foley interviewed Dave Garr and Darrell Benatar, founders of UserTesting.com. In episode 22, they interviewed Matt Johnston from uTest. These are both interesting services that facilitate a kind of hands-on testing that would otherwise be too expensive for smaller (not so well funded) companies, whether they’re startups or not.

This also got me thinking about translation and internationalization. Since these services enlist testers from around the globe, they could provide testing of translated versions of an application. There are crowdsourced translation services as well. It seems to me that combining such a service with a separate user testing service, which puts the translation in front of many more native speakers, could result in higher quality translated versions of an application. If an application is built on a (non-web) platform that these services do not support, it might be worth mocking up its menus and forms as web pages simply to make use of crowdsourced translation and testing services.

At this point I’m just thinking out loud. This is not something I have a use for today but I wanted to make a note here for future reference. If anyone reading this (not that I think anyone actually reads the Blue Cog Blog) has experience in this area I’d like to hear about it.

Git Resources

I have been learning to use Git. The following is a list of resources I found to be useful, interesting, or that I want to explore further as I get into Git:

Website: Git – Fast Version Control System – The home of Git. When you think source code management it’s only natural to picture a monster eating trees.

Book: Pro Git by Scott Chacon – Concise coverage of using Git. You can purchase the book or read the whole book online.

Book: Safari Books Online: Version Control with Git, 1st Edition

Tool: msysgit – Run Git on Windows from a specialized BASH prompt.

[Update 2010-07-03: Changed the order of the list so the resources I have used the most are above this note.]

Website: GitHub – Secure Git hosting and collaborative development

Video: Webcast: Git in One Hour – Scott Chacon shows a lot of what he covers in his book in this screencast.

Video: James Gregory on Git – James Gregory does a screencast on Git as well.

Website: git ready – learn git one commit at a time

Article: An introduction to git-svn for Subversion/SVK users and deserters

Article: scie.nti.st – Hosting Git repositories, The Easy (and Secure) Way – Gitosis.

Article: Deploying A Web Application with Git and FTP – Rob Conery shows one way he uses Git.

Article: Git For Windows Developers – Git Series – Part 1 – Jason Meridth – Los Techies – Describes using msysgit.

Article: Branch-Per-Feature – How I Manage Subversion With Git Branches – Los Techies

Article: Git's guts: Branches, HEAD, and fast-forwards – James Gregory's Blog – Los Techies

Article: Martin Fowler – Version Control Tools – Not about Git specifically.

Article: ReinH – A Git Workflow for Agile Teams

Article: Jer on Rails – My Git Workflow

Article: JustinFrench.com – Git Aliases Rock

Article: GitHub – Guides – Put your git branch name in your shell prompt

Article: A Note About Git Commit Messages | tpope.net

Article/Tool: Michael Bien's Weblog – NetBeans GIT support – I have not tried the NBGit plugin yet but I have been playing with NetBeans a bit.

Podcast: Hanselminutes Podcast 108 – Exploring Distributed Source Control with Git

Tool: git_remote_branch

Tool: TortoiseGit – Presumably similar to TortoiseSVN. I have not tried it.

Tool: EGit – Git plugin for Eclipse. I have not tried it.

Keith Hill’s Effective PowerShell Series

I’ve been aware of PowerShell for some time now but I haven’t had the need to use it much. As one who has written many batch files over the years I want to be ready to take that sort of automation to the much higher level PowerShell makes possible.

Windows PowerShell MVP Keith Hill's Blog is a great resource for learning PowerShell. He has written a series of posts titled "Effective PowerShell" and combined them into Effective Windows PowerShell: The Free eBook as well.

I just ran across these today and I look forward to exploring each in the series.

COhPy Meeting – December 2009

Here is my link dump from last night’s meeting of the Central Ohio Python Users Group:

The scheduled presenter, Brian Costlow, didn’t make it. Something about work being more important than a Python meeting. Priorities?

To fill the void, Eric Floehr showed a weather-related web application he has been working on that is built with Django. The app uses HTMLCalendar (Django, calendar – Stack Overflow).
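HTMLCalendar lives in Python's standard calendar module and renders a month as an HTML table. The sketch below is only an illustration and is not Eric's code. MarkedCalendar is a hypothetical subclass that overrides formatday to add a CSS class to selected days, which is one plausible way a weather app might flag days that have data.

```python
import calendar

class MarkedCalendar(calendar.HTMLCalendar):
    """Hypothetical subclass: adds a 'marked' CSS class to the chosen days of the month."""

    def __init__(self, marked_days, firstweekday=6):  # 6 = weeks start on Sunday
        calendar.HTMLCalendar.__init__(self, firstweekday)
        self.marked_days = set(marked_days)

    def formatday(self, day, weekday):
        # The base class returns cells like '<td class="thu">10</td>' (day 0 = padding cell).
        html = calendar.HTMLCalendar.formatday(self, day, weekday)
        if day in self.marked_days:
            html = html.replace('class="', 'class="marked ', 1)
        return html

cal = MarkedCalendar([10, 24])
html = cal.formatmonth(2009, 12)
print(html)
```

Because formatday is called once per cell, overriding just that method is enough to change how individual days render. The rest of the table layout is inherited unchanged.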

Mark Erbaugh showed the web application he built using web.py. He also uses ReportLab (ReportLab.org) to generate PDF files.

I had not run across this before: zipimport – Import modules from Zip archives.
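The following minimal sketch shows what zipimport makes possible. Once a ZIP archive is on sys.path, an ordinary import statement can load modules from inside it, and no explicit call to zipimport is required. The archive modules.zip and the module greet are throwaway examples, created in a temporary directory for illustration.

```python
import os
import sys
import tempfile
import zipfile

# Build a throwaway archive holding one small module (greet.py is a made-up example).
archive = os.path.join(tempfile.mkdtemp(), "modules.zip")
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("greet.py", "def hello(name):\n    return 'Hello, ' + name\n")

# With the .zip path on sys.path, the built-in zipimport hook can find greet.py inside it.
sys.path.insert(0, archive)
import greet

print(greet.hello("COhPy"))  # Hello, COhPy
print(greet.__file__)        # a path pointing inside modules.zip
```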

Catherine Devlin presented reStructuredText, S5, and Sphinx.

A few related links:
reStructuredText on Wikipedia
Quick reStructuredText
Easy Slide Shows With reST & S5
reStructuredText Primer — Sphinx v0.6.3 documentation

Catherine also mentioned:
PyCon 2010 Atlanta – A Conference for the Python Community
Python Package Index : PyPI, AKA the Cheese Shop

Also discussed was the construction of the COhPy web site:
Code at cohpy — bitbucket.org.
Using Google App Engine.

Finally, I haven’t used decorators in Python (nor in my house) but I’d like to read up on that:
PEP 318 — Decorators for Functions and Methods
Dr. Dobb's – Python 2.4 Decorators
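For reference, this is the basic shape PEP 318 describes. A decorator is a function that takes a function and returns a replacement, and it is applied with the @ syntax. The names logged and call_log are made up for this sketch. functools.wraps copies the original function's name and docstring onto the wrapper.

```python
import functools

call_log = []  # hypothetical: collects a record of each decorated call

def logged(func):
    """Example decorator: record the call, then run the original function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_log.append("%s%r" % (func.__name__, args))
        return func(*args, **kwargs)
    return wrapper

@logged          # equivalent to writing: add = logged(add)
def add(a, b):
    return a + b

print(add(2, 3))      # 5
print(call_log)       # ['add(2, 3)']
print(add.__name__)   # add (kept by functools.wraps)
```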

CbusPASS – November 2009

Last night I attended the CbusPASS (that’s the Columbus chapter of the Professional Association for SQL Server, aka the Columbus SQL Server Users Group) meeting. I’m not using SQL Server much these days, so the take-home value isn’t immediate for me. Still, I’m interested in databases in general, I have used SQL Server in the past, and I expect I will use it more in the future, so I do enjoy these meetings. The remote presentation almost failed due to audio problems, but fortunately a member of the group had a notebook PC that worked for both audio and video. Tim Ford presented on SQL Server Dynamic Management Views and Dynamic Management Functions. What follows is basically a link dump from my notes:

Group leader: Jeremiah Peschka, SQL Server Developer
Jeremiah Peschka (peschkaj) on Twitter

Tim Ford’s web site SQLAgentMan
Tim Ford (sqlagentman) on Twitter

Tim writes for MSSQLTips.com, among other things.

Tim said he will post the slides and examples from the presentation at SpeakerRate.

MSDN: Dynamic Management Views and Functions

SQLTeam: Dynamic Management Views

SQLTeam: SQL Server – Find missing and unused indexes

MSDN: Reorganizing and Rebuilding Indexes

Back up your Resource Database.

There was discussion after the meeting about PowerPivot, previously code named Project Gemini, and the PowerPivotPro site.

Thoughts on TDD

I want to know what the buzz around Test-Driven Development is all about. The proponents make a good case for it. I’ve read about it a good bit but have never worked with anyone who practiced it. I also tend to be leery when I see people becoming religiously fanatical about anything, and some of what I read sounds, if not fanatical, at least unrealistic. Nonetheless there are a lot of sensible voices promoting TDD as a way to build better, more maintainable software. That sounds good to me.

I have tried to use TDD on several small projects to get a feel for it. I am recording my (somewhat random) thoughts on it here for future reference:

  • I want to know that TDD really helps and works and that it’s not just a smartypants thing.
  • I’m not good at it yet. I haven’t done enough TDD (but I still have thoughts on it).
  • It’s not a panacea. Get your silver bullet here.
  • It IS helpful when refactoring.
  • I think there probably were fewer bugs in the first live versions because of the tests.
  • It can be a trap if you’re not careful to keep the end result of the system in mind. Seems easy to focus too much on writing tests. I feel a tendency toward myopia when writing a lot of tests before doing any integration.
  • It doesn’t replace the need to look at the results of your code (for example, by writing results to text files, or to CSV files you can import into a spreadsheet, and then looking them over and visualizing them with your brain).
  • You’ll never write enough tests to catch every possible failure.
  • The wrong algorithm with 100% test coverage is still the wrong algorithm.
  • It doesn’t replace manual or automated acceptance testing.
  • It will take longer up front to build a system using TDD. It may lead to a more correct system at the first release. It may save time later when the system needs to change.
  • You will throw away more code. If you find that some code isn’t needed and remove it, that code may have multiple tests associated with it that will also be removed. Maybe this isn’t bad since we’re supposed to be throwing away the prototype but often the prototype ends up turning directly into the production version. Perhaps this is a way of throwing away the prototype a little piece at a time. But seriously…
  • I don’t like the idea of significantly changing or adding complexity to the architecture of a system solely to make it more amenable to unit testing. Maybe it’s worth it.
  • It’s easier to do unit testing when using dynamic languages.
  • You’re more likely to need those tests when using dynamic languages.
  • I’m not sold on writing tests as THE way to drive development, but then, like I said, I’m not good at it yet.
  • Having unit tests is good. Regardless of whether tests are in the driver seat I plan to take advantage of automated unit testing and automated acceptance testing.
  • At this point it seems unlikely that I’ll adopt test-first style TDD as my preferred method for building software (or as a religion) but I’m still going to try to do it as a first approach. I’ll also be willing to abandon it without regrets if it becomes cumbersome for the job at hand.
  • Finally, it will be good to know how to do TDD in case someday I end up with a boss who says I have to. BTW: If that’s the reason you’re doing TDD you’re probably doing it for the wrong reason.
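The dynamic-language point can be made concrete with an illustrative sketch. Because Python uses duck typing, a test can pass a function a hand-written fake in place of ftplib.FTP, with no interface or mocking library needed. files_to_download and FakeFTP are hypothetical names. The fake implements only nlst, the one method the function calls.

```python
import unittest

def files_to_download(ftp, local_names):
    """Hypothetical helper: backup files on the server that are not already stored locally."""
    remote = sorted(ftp.nlst("dbbak_*.sql"))
    return [name for name in remote if name not in local_names]

class FakeFTP(object):
    """Hypothetical stand-in for ftplib.FTP; duck typing means only nlst() is needed."""
    def __init__(self, names):
        self.names = names

    def nlst(self, pattern):
        # A real server would filter by pattern; the fake just returns its canned list.
        return list(self.names)

class FilesToDownloadTest(unittest.TestCase):
    def test_skips_files_already_downloaded(self):
        ftp = FakeFTP(["dbbak_20091110_0401_blog.sql", "dbbak_20091109_0401_blog.sql"])
        local = set(["dbbak_20091109_0401_blog.sql"])
        self.assertEqual(files_to_download(ftp, local), ["dbbak_20091110_0401_blog.sql"])

    def test_returns_oldest_first(self):
        ftp = FakeFTP(["dbbak_20091110_0401_blog.sql", "dbbak_20091109_0401_blog.sql"])
        self.assertEqual(files_to_download(ftp, set()),
                         ["dbbak_20091109_0401_blog.sql", "dbbak_20091110_0401_blog.sql"])

# Run the tests explicitly (unittest.main() would call sys.exit when finished).
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FilesToDownloadTest)
result = unittest.TextTestRunner(verbosity=2).run(suite)
```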

That’s enough for now. I also plan to look into Behavior-Driven Development (AKA smartypants TDD) and the associated tooling.

Pair Networks Database Backup Automation

I have a couple WordPress blogs, this being one of them, hosted at Pair Networks. I also have another non-blog site that uses a MySQL database. I have been doing backups of the databases manually through Pair’s Account Control Center (ACC) web interface on a somewhat regular basis, but it was bugging me that I hadn’t automated it. I finally got around to doing so.

A search led to this blog post by Brad Trupp. He describes how to set up an automated database backup on a Pair Networks host. I used “technique 2” from his post as the basis for the script I wrote.

Automating the Backup on the Pair Networks Host

First I connected to my assigned server at Pair Networks using SSH (I use PuTTY for that). There was already a directory named backup in my home directory where the backups done through the ACC were written. I decided to use that directory for the scripted backups as well.

In my home directory I created a shell script named dbbak.sh.

touch dbbak.sh

The script should have permissions set to make it private (it will contain database passwords) and executable.

chmod 700 dbbak.sh

I used the nano editor to write the script.

nano -w dbbak.sh

The script stores the current date and time (formatted as YYYYmmdd_HHMM) in a variable and then runs the mysqldump utility to create the database backups. The resulting backup files are simply SQL text that will recreate the objects in a MySQL database and insert the data. The shell script I use backs up three different MySQL databases, so the following example does the same.

#!/bin/sh

dt=`/bin/date +%Y%m%d_%H%M`

/usr/local/bin/mysqldump -hDBHOST1 -uDBUSERNAME1 -pDBPASSWORD1 USERNAME_DBNAME1 > /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME1.sql

/usr/local/bin/mysqldump -hDBHOST2 -uDBUSERNAME2 -pDBPASSWORD2 USERNAME_DBNAME2 > /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME2.sql

/usr/local/bin/mysqldump -hDBHOST3 -uDBUSERNAME3 -pDBPASSWORD3 USERNAME_DBNAME3 > /usr/home/USERNAME/backup/dbbak_${dt}_DBNAME3.sql

Substitute these tags in the above example with your database and account details:

  • DBHOSTn is the database server, such as db24.pair.com.
  • DBUSERNAMEn is the full access username for the database.
  • DBPASSWORDn is the password for that database user.
  • USERNAME_DBNAMEn is the full database name that has the account user name as the prefix.
  • USERNAME is the Pair Networks account user name.
  • DBNAMEn is the database name without the account user name prefix.
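The small Python sketch below shows how these file names connect to the retrieval script later in this post. It is not part of the actual setup. It builds a name using the same pattern as the shell script and then pulls the date back out. The helper backup_filename and the database name "blog" are made up for illustration. The prefix "dbbak_" is six characters long, so the YYYYMMDD portion starts at index 6, which is what date_pos = 6 in getdbbak.py relies on.

```python
from datetime import datetime

def backup_filename(dbname, when):
    """Hypothetical helper: mirrors dbbak_${dt}_DBNAME.sql, with dt from date +%Y%m%d_%H%M."""
    return "dbbak_%s_%s.sql" % (when.strftime("%Y%m%d_%H%M"), dbname)

name = backup_filename("blog", datetime(2009, 11, 10, 4, 1))
print(name)  # dbbak_20091110_0401_blog.sql

# The fixed prefix "dbbak_" is 6 characters, so YYYYMMDD occupies name[6:14].
date_pos = len("dbbak_")
print(date_pos)                       # 6
print(name[date_pos:date_pos + 8])    # 20091110
```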

Once the script was written and tested manually on the host, I used the ACC (Advanced Features / Manage Cron jobs) to set up a cron job to run the script daily at 4:01 AM.

Automating Retrieval of the Backup Files

It was nice having the backups running daily without any further work on my part but, if I wanted a local copy of the backups, I still had to download them manually. Though FileZilla is easy to use, downloading files via FTP seemed like a prime candidate for automation as well. I turned to Python for that. Actually I turned to an excellent book that has been on my shelf for a few years now, Foundations of Python Network Programming by John Goerzen. Using the ftplib examples in the book as a foundation, I created a Python script named getdbbak.py to download the backup files automatically.

#!/usr/bin/env python
# getdbbak.py

from ftplib import FTP
from datetime import datetime
from DeleteList import GetDeleteList
import os, sys
import getdbbak_email

logfilename = 'getdbbak-log.txt'
msglist = []

def writelog(msg):
    scriptdir = os.path.dirname(sys.argv[0])
    filename = os.path.join(scriptdir, logfilename)
    logfile = open(filename, 'a')
    logfile.write("%s\n" % msg)
    logfile.close()

def say(what):
    print(what)
    msglist.append(what)
    writelog(what)

def retrieve_db_backups():
    host = sys.argv[1]
    username = sys.argv[2]
    password = sys.argv[3]
    local_backup_dir = sys.argv[4]
    
    say("START %s" % datetime.now().strftime('%Y-%m-%d %H:%M'))
    say("Connect to %s as %s" % (host, username))

    f = FTP(host)
    f.login(username, password)

    ls = f.nlst("dbbak_*.sql")
    ls.sort()
    say("items = %d" % len(ls))
    for filename in ls:
        local_filename = os.path.join(local_backup_dir, filename)
        if os.path.exists(local_filename):
            say("(skip) %s" % local_filename)
        else:
            say("(RETR) %s" % local_filename)
            local_file = open(local_filename, 'wb')
            f.retrbinary("RETR %s" % filename, local_file.write)
            local_file.close()
            
    date_pos = 6
    keep_days = 5
    keep_weeks = 6
    keep_months = 4    
    del_list = GetDeleteList(ls, date_pos, keep_days, keep_weeks, keep_months)
    if len(del_list) > 0:
        if len(ls) - len(del_list) >= keep_days:
            for del_filename in del_list:
                say("DELETE %s" % del_filename)
                f.delete(del_filename)
        else:
            say("WARNING: GetDeleteList failed sanity check. No files deleted.")
    
    f.quit()
    say("FINISH %s" % datetime.now().strftime('%Y-%m-%d %H:%M'))
    getdbbak_email.SendLogMessage(msglist)


if len(sys.argv) == 5:
    retrieve_db_backups()
else:
    print('USAGE: getdbbak.py Host User Password LocalBackupDirectory')

This script runs via cron on a PC running Ubuntu 8.04 LTS that I use as a local file/subversion/trac server. The script does a bit more than just download the files. It deletes older files from the host based on rules for number of days, weeks, and months to keep. It also writes some messages to a log file and sends an email with the current session’s log entries.

To set up the cron job in Ubuntu I opened a terminal and ran the following command to edit the crontab file:

crontab -e

The crontab file specifies commands to run automatically at scheduled times. I added an entry to the crontab file that runs a script named getdbbak.sh at 6 AM every day. Here is the crontab file:

MAILTO=""

# m h dom mon dow command

0 6 * * * /home/bill/GetDbBak/getdbbak.sh

The first line prevents cron from sending an email listing the output of any commands cron runs. The getdbbak.py script will send its own email so I don’t need one from cron. I can always enable the cron email later if I want to see that output to debug a failure in a script cron runs.

Here is the getdbbak.sh shell script that is executed by cron:

#!/bin/bash

/home/bill/GetDbBak/getdbbak.py FTP.EXAMPLE.COM USERNAME PASSWORD /mnt/data2/files/Backup/PairNetworksDb

This shell script runs the getdbbak.py Python script and passes the FTP login credentials and the destination directory for the backup files as command line arguments.

As I mentioned, the getdbbak.py script deletes older files from the host based on rules. The call to GetDeleteList returns a list of files to delete from the host. That function is implemented in a separate module, DeleteList.py:

#!/usr/bin/env python
# DeleteList.py

from datetime import datetime
import KeepDateList


def GetDateFromFileName(filename, datePos):
    """Expects filename to contain a date in the format YYYYMMDD starting 
       at position datePos.
    """   
    try:
        yr = int(filename[datePos : datePos + 4])
        mo = int(filename[datePos + 4 : datePos + 6])
        dy = int(filename[datePos + 6 : datePos + 8])
        dt = datetime(yr, mo, dy)
        return dt
    except ValueError:
        return None
 

def GetDeleteList(fileList, datePos, keepDays, keepWeeks, keepMonths):
    dates = []
    for filename in fileList:
        dt = GetDateFromFileName(filename, datePos)
        if dt is not None:
            dates.append(dt)
    keep_dates = KeepDateList.GetDatesToKeep(dates, keepDays, keepWeeks, keepMonths)
    del_list = []
    for filename in fileList:
        dt = GetDateFromFileName(filename, datePos)
        if (dt is not None) and (dt not in keep_dates):
            del_list.append(filename)
    return del_list

That module in turn uses the function GetDatesToKeep, defined in the module KeepDateList.py, to decide which files to keep in order to maintain the desired days, weeks, and months of backup history. If a file’s name contains a date that’s not in the list of dates to keep, then it goes in the list of files to delete.

#!/usr/bin/env python
# KeepDateList.py

from datetime import datetime


def ListHasOnlyDates(listOfDates):
    dt_type = type(datetime(2009, 11, 10))
    for item in listOfDates:
        if type(item) != dt_type:
            return False
    return True
    

def GetUniqueSortedDateList(listOfDates):
    if len(listOfDates) < 2:
        return listOfDates
    listOfDates.sort()
    result = [listOfDates[0]]
    last_date = listOfDates[0].date()
    for i in range(1, len(listOfDates)):
        if listOfDates[i].date() != last_date:
            last_date = listOfDates[i].date()
            result.append(listOfDates[i])
    return result
    
    
def GetDatesToKeep(listOfDates, daysToKeep, weeksToKeep, monthsToKeep):
    if daysToKeep < 1:
        raise ValueError("daysToKeep must be greater than zero.")
    if weeksToKeep < 0:
        raise ValueError("weeksToKeep must not be less than zero.")
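    if monthsToKeep < 0:
        raise ValueError("monthsToKeep must not be less than zero.")
    if not ListHasOnlyDates(listOfDates):
        raise ValueError("listOfDates must contain only datetime values.")

    # One entry per calendar day, oldest first; walk backward from the newest.
    dates = GetUniqueSortedDateList(listOfDates)
    keep = []
    if len(dates) == 0:
        return keep
    tail = len(dates)

    # Keep the most recent daysToKeep days.
    days_left = daysToKeep
    while (days_left > 0) and (tail > 0):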
        tail -= 1
        days_left -= 1
        keep.append(dates[tail])
        
    year, week_number, weekday = dates[tail].isocalendar()
    weeks_left = weeksToKeep
    while (weeks_left > 0) and (tail > 0):
        tail -= 1
        yr, wn, wd = dates[tail].isocalendar()
        if (wn != week_number) or (yr != year):
            weeks_left -= 1
            year, week_number, weekday = dates[tail].isocalendar()
            keep.append(dates[tail])
        
    month = dates[tail].month
    year = dates[tail].year
    months_left = monthsToKeep
    while (months_left > 0) and (tail > 0):
        tail -= 1
        if (dates[tail].month != month) or (dates[tail].year != year):
            months_left -= 1
            month = dates[tail].month
            year = dates[tail].year
            keep.append(dates[tail])
        
    return keep
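The week rule relies on isocalendar(), which returns the ISO year, the ISO week number, and the ISO weekday. The snippet below shows why GetDatesToKeep compares the ISO year as well as the week number. Around New Year, the ISO year can differ from the calendar year. Week numbers also repeat every year, so comparing the week number alone could treat dates a year apart as the same week.

```python
from datetime import datetime

# isocalendar() -> (ISO year, ISO week number, ISO weekday with Monday = 1)
for d in (datetime(2009, 12, 31), datetime(2010, 1, 1), datetime(2010, 1, 4)):
    print(d.date(), tuple(d.isocalendar()))

# 2009-12-31 (2009, 53, 4)
# 2010-01-01 (2009, 53, 5)   same ISO week as Dec 31, although the calendar year changed
# 2010-01-04 (2010, 1, 1)    the Monday that starts ISO week 1 of 2010
```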

I also put the function SendLogMessage that sends the session log via email in a separate module, getdbbak_email.py:

#!/usr/bin/env python
# getdbbak_email.py

from email.mime.text import MIMEText
from email import utils
import smtplib

def SendLogMessage(msgList):
    from_addr = 'atest@bogusoft.com'
    to_addr = 'wm.melvin@gmail.com'
    smtp_server = 'localhost'
    
    # One log entry per line in the message body.
    message = ""
    for s in msgList:
        message += s + "\n"

    msg = MIMEText(message)
    msg['To'] = to_addr 
    msg['From'] = from_addr 
    msg['Subject'] = 'Download results'
    msg['Date'] = utils.formatdate(localtime=True)
    msg['Message-ID'] = utils.make_msgid()

    smtp = smtplib.SMTP(smtp_server)
    smtp.sendmail(from_addr, to_addr, msg.as_string())
    smtp.quit()

Here is a ZIP file containing the set of Python scripts, including some unit tests (such as they are) for the file deletion logic: GetDbBak.zip

I hope this may be useful to others with a similar desire to automate MySQL database backups and FTP transfers who haven’t come up with their own solution yet. Even if you don’t use Pair Networks as your hosting provider some of the techniques may still apply. I’m still learning too so if you find mistakes or come up with improvements to this solution, please let me know.

CONDG Meeting – August 2009

I enjoyed the presentation by Bill Sempf at this month’s Central Ohio .NET Developers Group meeting, even if it was a little disorganized. I think Mr. Sempf and I share a certain scatterbrained quality, though his achievements point to an ability to focus deeply when needed. He’s like a smarter, more extroverted version of the Bill writing this post. And we share a similar hairline.

Bill discussed some of the changes coming in C# 4.0 and how some of the smaller changes that started in C# 3.0 are part of a larger strategy to make things like LINQ possible and to make COM interop work more smoothly. He also pointed out some additions to Visual Studio that may be helpful when doing Test-Driven Development. The Visual Studio 2010 user interface has been rewritten in WPF, so go get more RAM.

A lot of the changes to C# are to help it compete with dynamic languages like Ruby and Python, and to make it not suck for automating Microsoft Office. Oh no. It’s VB with braces. At least it doesn’t have DIMs and SUBs.

Pizza was provided by Information Control Corporation (ICC), a company based in Columbus that (if I heard right) Bill Sempf works with as a consultant. ICC has released an open source framework called MVC4WPF. Thanks for the framework, and the pizza.