Since the categorization is primarily being generated algorithmically, this leads us to an iterative process of generating the categorization, looking at the size of the resulting categories, and tweaking the categorization algorithms.
The following is a useful script to list the category/subcategory list along with a count of products in each one. You can readily see which categories have too many items in them, and it also shows you misplaced categories etc. The script also demonstrates how to use Groovy to make database queries and generate a report:
import groovy.sql.Sql;
def db = Sql.newInstance('jdbc:derby:shareasale', '', '', 'org.apache.derby.jdbc.EmbeddedDriver');
db.eachRow('SELECT DISTINCT Category, SubCategory FROM products ORDER BY Category') {
def cat = it.Category;
def subcat = it.SubCategory
try {
db.eachRow("SELECT COUNT(*) AS count FROM products WHERE Category = ${cat} AND SubCategory = ${subcat}") {
println cat+": "+subcat + " (${it.count})"
}
} catch (Exception e) {
println "SELECT ${cat}: ${subcat} produced ${e.toString()}"
}
}
Unfortunately I was unable to make the two database queries into one.