Getting OpenOffice.org CSV Data Into MySQL

Over the last few days, I've been working on getting data from CSV files to a database (related to Drupal), I got to write some PHP code to do it because... nobody else seemed to be doing what I was doing.

The scenario is this. Each row of the spreadsheet is related to fields in a table in the database, but each column item in the spreadsheet was to be mapped to a different row in the same table of the database. So, if the spreadsheet data looked like this to start:


1,2,3,
A,B,C,

The database table would look like this:

1,1,0
2,2,0
3,3,0
A,1,0
B,2,0
C,3,0

In the resulting database table, you'll note that the second row counts the column. The 0s are used to signify 'other data'. But here's the thing... in a world full of open source and Free Software, nobody seems to have created a way to allow people to map their data when importing. While it is easy to fumble with the spreadsheet to create a worksheet which has the data in the format one wants, even OpenOffice 2.0 gets hairy when dealing with large string manipulations - even with 512 megabytes of RAM sitting around (which had me angry about my 1 gigabyte 3 GHz machine being stripped after my father's death). But I got it done using PHP, and how I did that is what I'll write about.

Any programmer worth their weight in the programming language of their choice should be able to tackle this problem. Many people who call themselves programmers can't.

Because of time constraints, this ended up being more of a kludge than a hack; the deadline kept getting rolled forward by my client's customer. I tackled it in PHP because it was the available at the time; by and large a perl or Python script probably would have been better. Since I can't go into the details of the database - that information isn't for public consumption - I can go over how to handle such cases programmatically. And because I was doing Q&A beforehand, I had the script generate the MySQL INSERT statements so that I could review them before using them. Then I ran the script generated by my code in MySQL to enter the data.

A program writing a program is nothing new, but I don't think of SQL as a programming language. Some would disagree. It's really a very basic data description language, not too far removed from XML. It has additional commands that allow manipulation of the data. Generally speaking, SQL scripts are pretty stupid. Create. Insert. Modify. There's no 'Transmogrify', so you have to do that in preprocessing - which is all that I did.

The Generic Algorithm

When a program reads a CSV file, it reads in a text string which is delimited by commas. Almost all languages have a way of dealing with with CSV files, and if they don't - breaking up a string that is separated by commas is a pretty simple task.

In each row of the example data above, there are three data items which are mapped to separate rows in the database. Opening the input file and outputting to the screen are up to the programming language at hand; the answer for the data, as most seasoned programmers would immediately grasp, is about loop counters:

for (ii=0, ii<=NumberOfRows, ii++)
{
for (jj=0, jj<=3, jj++)

{
Read the data from the CSV file;
Generate MySQL INSERT statement with data from CSV,
Use value of jj for second row in output database,
Any additional rows added here.
}

}

That's really how simple it is. The devil is, as usual, in the debugging. If I had more time, I would have had the script connect to the database and insert the data directly - BUT - it was important data, and I wanted to review the generated SQL scripts before I ran them.

I've been thinking about how to do this generically, where the user selects which table data from every row in a spreadsheet goes to, with counters and so forth... and maybe I'll get around to coding something so that custom code isn't needed everytime. I'm honestly surprised that a clear solution for this has not been documented elsewhere... At a meta level, this is a matrix operation. Down in the trenches, it's all about loop control and string manipulation.

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.

Thanks for this article, I

Thanks for this article, I think thats waht Im looking for...

--
Staaten der Welt

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Lines and paragraphs break automatically.
  • Easily link to terms in various wikis. For help, see <a href="/interwiki/3">interwiki</a>.

More information about formatting options

CAPTCHA
Sorry, but you are required to have some math knowledge to use the internet.
4 + 11 =
Solve this simple math problem and enter the result. E.g. for 1+3, enter 4.

Syndicate content