Simple Parsing Script

Posted by: barbara | Date: Apr 04, 2009 | Category: python    

A simple script to parse plain text files (in this case pipe-delimited), then write the rows to a table. Take a look at the sample text file - the fields in the first row should match the column headers.

CUSTOMERORDERS_20090401.txt

#!/usr/bin/python
# usage: ./parser.py CUSTOMERORDERS_20090401.txt
# don't forget to run chmod +x on this file

import fileinput, string, sys, re, MySQLdb as Database

filename = sys.argv[1]
tablename = filename[:-13]  # strip a few chars from the end of the filename arg - your mileage may vary

records = []

for line in fileinput.input( filename ):
    if fileinput.isfirstline():  # first row becomes col names
        columns = line.split('|')  # split on the delimiter

    if not fileinput.isfirstline():
        try:
            line[1]  # only append populated rows
            otherline = line.split('|')  # split on the delimiter
            records.append(otherline)
        except IndexError:
            print "emptyline"

db = Database.connect("host", "username", "password", "dbname")
cursor = db.cursor()

insertcolumns = ", ".join(columns)
for row in records:
    insertrow = "', '".join(row)
    sql = """INSERT INTO %s (%s) VALUES ('%s')""" %(tablename, insertcolumns, insertrow)
    cursor.execute(sql)
db.close()
Comment by Doug Hellmann on Apr 05, 2009:
As part of my ongoing efforts to get people to use the standard library, I'll point out that the csv module will parse the file if you set up a dialect: import csv csv.register_dialect('pipes', delimiter='|') with open(filename, 'r') as f: reader = csv.reader(f, dialect='pipes') for row in reader: print row It even deals with quoted values and embedded newlines. (I wasn't able to get the code to format cleanly as a comment, but you should get the idea. I'll add it to the PyMOTW article for csv this week.)