Simple Parsing Script
A simple script to parse plain text files (in this case pipe-delimited), then write the rows to a table. Take a look at the sample text file - the fields in the first row should match the column headers.
#!/usr/bin/python
# usage: ./parser.py CUSTOMERORDERS_20090401.txt
# don't forget to run chmod +x on this file
import fileinput, string, sys, re, MySQLdb as Database
filename = sys.argv[1]
tablename = filename[:-13] # strip a few chars from the end of the filename arg - your mileage may vary
records = []
for line in fileinput.input( filename ):
if fileinput.isfirstline(): # first row becomes col names
columns = line.split('|') # split on the delimiter
if not fileinput.isfirstline():
try:
line[1] # only append populated rows
otherline = line.split('|') # split on the delimiter
records.append(otherline)
except IndexError:
print "emptyline"
db = Database.connect("host", "username", "password", "dbname")
cursor = db.cursor()
insertcolumns = ", ".join(columns)
for row in records:
insertrow = "', '".join(row)
sql = """INSERT INTO %s (%s) VALUES ('%s')""" %(tablename, insertcolumns, insertrow)
cursor.execute(sql)
db.close()
Comment by
Doug Hellmann
on Apr 05, 2009:
As part of my ongoing efforts to get people to use the standard library, I'll point out that the csv module will parse the file if you set up a dialect:
import csv
csv.register_dialect('pipes', delimiter='|')
with open(filename, 'r') as f:
reader = csv.reader(f, dialect='pipes')
for row in reader:
print row
It even deals with quoted values and embedded newlines.
(I wasn't able to get the code to format cleanly as a comment, but you should get the idea. I'll add it to the PyMOTW article for csv this week.)

