Putting Shareasale datafeed information into the database

Date: Fri Jul 20 2007
The following is the method I use for inserting the product data from a Shareasale datafeed into the database. It is using the schema I described in another page.

The outline of the groovy script is this:



import java.io.*;
import groovy.sql.Sql;

// Open DB connection

def infile = new File(args[0]);

def db = Sql.newInstance('jdbc:derby:shareasale', 
    '', '', 'org.apache.derby.jdbc.EmbeddedDriver');

infile.each { 
    // ... process each dataline
}

// Shut down the database
def db2 = Sql.newInstance('jdbc:derby:;shutdown=true', 
    '', '', 'org.apache.derby.jdbc.EmbeddedDriver');


It's simply to open the database, for each line in the input file process it and insert into the database, and close the database. The meat of the script is the method for processing each line of input.


infile.each {     
    line = it.replaceAll(/YOURUSERID/, "####");
    def fields = line.split("\\|");
    
...
}

The first part of processing the line of input is to split it into fields based on the '|' character. It also replaces YOURUSERID with the affiliate ID.


infile.each {     
    line = it.replaceAll(/YOURUSERID/, "####");
    def fields = line.split("\\|");
    
    def item = [
        ProductID: fields[0],
        Name: fields[ 1],
        MerchantID: fields[ 2],
        Merchant: fields[ 3],
        Link: fields[ 4],
        Thumbnail: fields[ 5],
        BigImage: fields[ 6],
        Price: fields[ 7],
        RetailPrice: fields[8],
        Category: fields[ 9],
        SubCategory: fields[10],
        Description: fields[11],
        Custom1: fields[12],
        Custom2: fields[13],
        Custom3: fields[14],
        Custom4: fields[15],
        Custom5: fields[16],
        LastUpdated: fields[17],
        Status: fields.size() >= 19 ? fields[18] : "",
        Manufacturer: fields.size() >= 20 ? fields[19] : "",
        PartNumber: fields.size() >= 21 ? fields[20] : "",
        MerchantCategory: fields.size() >= 22 ? fields[21] : "",
        MerchantSubcategory: fields.size() >= 23 ? fields[22] : "",
        ShortDescription: fields.size() >= 24 ? fields[23] : "",
        ISBN: fields.size() >= 25 ? fields[24] : "",
        UPC: fields.size() >= 26 ? fields[25] : "",
        ];
...
}

Next, for convenience, I assign the individual fields into a groovy Map object. The advantage here is to make accessing fields more readable. Rather than accessing the field with code reading "fields[15]" you instead write item.Custom4". Which would you rather maintain?


infile.each {     
    line = it.replaceAll(/YOURUSERID/, "####");
    def fields = line.split("\\|");
    
    def item = [
        ProductID: fields[0],
        Name: fields[ 1],
        MerchantID: fields[ 2],
        Merchant: fields[ 3],
        Link: fields[ 4],
        Thumbnail: fields[ 5],
        BigImage: fields[ 6],
        Price: fields[ 7],
        RetailPrice: fields[8],
        Category: fields[ 9],
        SubCategory: fields[10],
        Description: fields[11],
        Custom1: fields[12],
        Custom2: fields[13],
        Custom3: fields[14],
        Custom4: fields[15],
        Custom5: fields[16],
        LastUpdated: fields[17],
        Status: fields.size() >= 19 ? fields[18] : "",
        Manufacturer: fields.size() >= 20 ? fields[19] : "",
        PartNumber: fields.size() >= 21 ? fields[20] : "",
        MerchantCategory: fields.size() >= 22 ? fields[21] : "",
        MerchantSubcategory: fields.size() >= 23 ? fields[22] : "",
        ShortDescription: fields.size() >= 24 ? fields[23] : "",
        ISBN: fields.size() >= 25 ? fields[24] : "",
        UPC: fields.size() >= 26 ? fields[25] : "",
        ];
...
    db.execute """
    INSERT INTO products (ProductID, Name, MerchantID, Merchant, Link,
        Thumbnail, BigImage, Price, RetailPrice, Category, SubCategory,
        Description, Custom1, Custom2, Custom3, Custom4, Custom5,
        LastUpdated, Status, Manufacturer, PartNumber,
        MerchantCategory, MerchantSubcategory, ShortDescription, ISBN, UPC
        )
        VALUES
        ( 
            ${item.ProductID}, ${item.Name}, ${item.MerchantID}, ${item.Merchant}, ${item.Link},
            ${item.Thumbnail}, ${item.BigImage}, ${item.Price}, ${item.RetailPrice}, ${item.Category}, ${item.SubCategory},
            ${item.Description}, ${item.Custom1}, ${item.Custom2}, ${item.Custom3}, ${item.Custom4}, ${item.Custom5},
            ${item.LastUpdated}, ${item.Status}, ${item.Manufacturer}, ${item.PartNumber},
            ${item.MerchantCategory}, ${item.MerchantSubcategory}, ${item.ShortDescription}, ${item.ISBN}, ${item.UPC}
        )
    """
}

The next part to this is entering the data into the database. This is simply to use an SQL INSERT INTO statement.

The part that's still missing is the processing in-between retrieving the product data from the feed file, and inserting into the database. As I said earlier this processing is unique to each merchant, because each merchant offers a differently mangled datafeed file. The rule to follow is that by the time of the INSERT INTO statement you must have the item object in the state you want it to be.

   
    if (item.Merchant =~ /UrbanScooters/) { cleanup_urbanScooters(item) }

My method is to first look to see which merchant offers this product. Based on the merchant the script calls a per-merchant cleanup function.


def cleanup_urbanScooters(item) {
    item.Category  = item.SubCategory;
    item.SubCategory = item.Custom1;
    if (item.Custom1 =~ /Scooters/) { item.Category = "Scooters" }    
}

Fortunately Urban Scooters gives a pretty reasonable datafeed file. But as you can see I like to change it slightly.


    if (item.Name =~ /Shogun/) { item.SubCategory = "Japanese" }
...
    if (item.Name =~ /NiCd.*Batteries/) {
        item.Category = "Batteries"
        item.SubCategory = "NiCad"
    }

These are examples of consulting the Name to determine better categorization.