Cleaning up the data file

Date: Fri Jul 20 2007
In theory, the CSV file format is very easy to use. In practice, I've run into a few problems with some of the files I've retrieved from ShareASale.

The first problem is documented by ShareASale. You have to search the file for the string YOURUSERID and replace it with your ShareASale ID number. This number is shown at the top of the affiliate account manager pages.
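The substitution itself is a one-liner. Here is a minimal Groovy sketch of it as a standalone helper. The name substituteUserId, the check that the ID is all digits, and the sample line are my own illustrative assumptions, not anything ShareASale specifies.

```groovy
// Hypothetical helper: replace every YOURUSERID placeholder in one datafeed line.
// Assumption: the affiliate ID is all digits, so anything else is rejected.
String substituteUserId(String line, String userId) {
    if (!(userId ==~ /\d+/)) {
        throw new IllegalArgumentException("Expected a numeric affiliate ID, got: " + userId)
    }
    // String.replace does a literal (non-regex) replacement of every occurrence
    return line.replace("YOURUSERID", userId)
}

// Made-up sample line, for illustration only
def sample = "1001,Sample widget,http://example.com/?id=YOURUSERID"
println substituteUserId(sample, "98765")   // prints 1001,Sample widget,http://example.com/?id=98765
```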

The second problem had me scratching my head. In some of the datafiles, a single record is split over several lines. It appears that some merchants supply product descriptions that contain line breaks, and because the description sits in the middle of a datafeed record, the whole record ends up spread over multiple lines.

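I use the following Groovy script to process the file. It replaces YOURUSERID with my user ID and makes sure each datafeed record ends up on a single line. It assumes that every record begins with a number, so any line that doesn't start with a digit is appended to the record before it.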

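File input = new File(args[0])

// Put your ShareASale ID here in place of #####
def userId = "#####"
def lastline = ""

input.eachLine { rawLine ->
    // Substitute the placeholder documented by ShareASale
    def line = rawLine.replace("YOURUSERID", userId)
    if (line =~ /^[0-9]+/) {
        // A line starting with a number begins a new record: print the previous one
        if (lastline != "") {
            println lastline
        }
        lastline = line
    } else {
        // Otherwise this line continues the previous record.
        // eachLine already strips line endings; this also drops any stray CR/LF.
        lastline = lastline.replaceAll(/[\r\n]+$/, "")
        lastline += line
    }
}

// Print the final record
if (lastline != "") {
    println lastline
}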
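To try the joining rule without a real datafeed, the same logic can be pulled out into a function that works on a list of lines. This is a sketch: joinRecords and its separator parameter are hypothetical names of mine, and the sample lines are made up.

```groovy
// Sketch of the record-joining rule: a line starting with a digit begins a new
// record; any other line is appended to the record before it.
// The separator parameter (hypothetical) controls what goes between joined pieces.
List<String> joinRecords(List<String> lines, String separator = "") {
    def records = []
    def current = null
    lines.each { line ->
        if ((line =~ /^[0-9]/).find()) {
            // Start of a new record: save the one we were building
            if (current != null) records << current
            current = line
        } else if (current != null) {
            // Continuation line: glue it onto the current record
            current = current + separator + line
        } else {
            // Text before the first numbered record (e.g. a header line)
            current = line
        }
    }
    if (current != null) records << current
    return records
}

// Made-up sample input, for illustration only
def raw = [
    "1001,Blue widget,A description that",
    "continues on a second line,19.95",
    "1002,Red widget,A one-line description,9.95"
]
joinRecords(raw, " ").each { println it }
// prints:
// 1001,Blue widget,A description that continues on a second line,19.95
// 1002,Red widget,A one-line description,9.95
```

Like the script, the default joins continuation lines with nothing in between, which runs the last word of one line into the first word of the next; passing " " as the separator avoids that. Either way the rule has a blind spot: a description line that happens to start with a digit will be mistaken for the start of a new record.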