Save your server! Detect taxonomy term loops in Drupal 6!

Date: Tue Dec 21 2010

I don't know exactly how I did it, but the taxonomy hierarchy on one of my sites ended up with a loop in it. That server had been suffering periodic load spikes that killed performance, and eventually I realized the taxonomy loop was one of the causes. In taxonomy.module, the function taxonomy_get_parents_all can get stuck walking the hierarchy if it contains a loop. I'd found one looped term by hand, then wrote a little script to detect others.
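To see why a loop is so costly, here's a rough standalone sketch, not the actual Drupal code. The hypothetical walk_parents_unguarded function imitates an append-and-advance walk up the parents, using a plain array (my stand-in for the term_hierarchy table) and an artificial step cap that I added only so the demo terminates.

```php
<?php
// Hypothetical stand-in for term_hierarchy: tid => list of parent tids.
// Root terms simply have an empty parent list in this sketch.
$hierarchy = array(
  10 => array(),        // root term
  11 => array(10),      // child of 10
  1413 => array(1413),  // the broken term: it is its own parent
);

// Walks upward by appending each parent to the chain and advancing an index.
// The $max_steps cap is an assumption of this demo; without it, the walk
// over a looped term would keep growing the chain until memory ran out.
function walk_parents_unguarded($hierarchy, $tid, $max_steps = 1000) {
  $chain = array($tid);
  $n = 0;
  while (!empty($hierarchy[$chain[$n]]) && $n < $max_steps) {
    foreach ($hierarchy[$chain[$n]] as $parent) {
      $chain[] = $parent;
    }
    $n++;
  }
  return $chain;
}

print count(walk_parents_unguarded($hierarchy, 11)) . "\n";    // 2: the term and its one parent
print count(walk_parents_unguarded($hierarchy, 1413)) . "\n";  // 1001: stopped only by the cap
```

The healthy term finishes after one step, while the self-parented term adds itself again on every pass and only stops because of the cap.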

What's a taxonomy loop? It's where the chain of parent terms for a taxonomy term leads back to the term itself. We may think of the taxonomy as having a tree shape, but Drupal doesn't have enough safeguards to stop a term from claiming one of its own children (or itself) as its parent. The term_hierarchy table is simply 'tid' and 'parent', with no sanity check that the referenced parent term really sits above the term in the tree.
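Since the hierarchy is just (tid, parent) pairs, loop detection can be sketched without Drupal at all. The following is a minimal illustration under my own assumptions: the sample rows are made up, the function names term_is_own_ancestor and find_looped_tids are hypothetical, and a parent of 0 is treated as "no parent".

```php
<?php
// Made-up sample of term_hierarchy rows.
$rows = array(
  array('tid' => 1, 'parent' => 0),        // root term
  array('tid' => 2, 'parent' => 1),        // ordinary child
  array('tid' => 3, 'parent' => 4),        // 3 and 4 claim each other
  array('tid' => 4, 'parent' => 3),
  array('tid' => 1413, 'parent' => 1413),  // self-parent
);

// Returns TRUE if walking upward from $start eventually reaches $start again.
// The $seen set guarantees the walk itself cannot run forever.
function term_is_own_ancestor($parents_of, $start) {
  $seen = array();
  $queue = isset($parents_of[$start]) ? $parents_of[$start] : array();
  while ($queue) {
    $tid = array_shift($queue);
    if ($tid == $start) {
      return TRUE;
    }
    if ($tid == 0 || isset($seen[$tid])) {
      continue;  // reached a root, or already explored this term
    }
    $seen[$tid] = TRUE;
    if (isset($parents_of[$tid])) {
      foreach ($parents_of[$tid] as $parent) {
        $queue[] = $parent;
      }
    }
  }
  return FALSE;
}

// Returns every tid that sits on a loop, in the order the rows list them.
function find_looped_tids($rows) {
  $parents_of = array();
  foreach ($rows as $row) {
    $parents_of[$row['tid']][] = $row['parent'];
  }
  $looped = array();
  foreach (array_keys($parents_of) as $tid) {
    if (term_is_own_ancestor($parents_of, $tid)) {
      $looped[] = $tid;
    }
  }
  return $looped;
}

print implode(', ', find_looped_tids($rows)) . "\n";  // 3, 4, 1413
```

This only flags terms that are on a loop. A term that merely has a looped term among its ancestors is not flagged, even though walking its parents would hit the same problem.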

The observed behavior was that the page for any node tagged with the looped taxonomy term would never load, while other pages on the site loaded perfectly.

I'm not sure how I ended up with the following (I saw some indication that old versions of taxonomy_manager could be used to create loops):

mysql> select * from term_data where tid = 1413;
+------+-----+------------------------+-------------+--------+---------------+
| tid  | vid | name                   | description | weight | resolved_guid |
+------+-----+------------------------+-------------+--------+---------------+
| 1413 |   3 | Renewable Energy Books |             |      0 | NULL          |
+------+-----+------------------------+-------------+--------+---------------+
1 row in set (0.01 sec)

mysql> select * from term_hierarchy where tid = 1413;
+------+--------+
| tid  | parent |
+------+--------+
| 1413 |   1413 |
+------+--------+
1 row in set (0.03 sec)

mysql> delete from term_hierarchy where tid = 1413;
Query OK, 1 row affected (0.06 sec)

To confirm this fixes your problem, find a node that uses the term, then visit its page before and after deleting the term_hierarchy entry. The behavior should change from the page never loading to the page loading correctly.

I've written a bit of code that can be executed with "drush php-script". It inspects the taxonomy terms and prints out a little report about them, including any detected loops. The 'safe_get_parents_all' function is derived directly from 'taxonomy_get_parents_all' with a little bit of checking thrown in.

<?php
$result = db_query('SELECT tid, name FROM {term_data}');
while ($term = db_fetch_object($result)) {
  drush_print( '>>> '. $term->name);
  $parents  = safe_get_parents_all($term->tid);
  $related  = taxonomy_get_related($term->tid);
  $synonyms = taxonomy_get_synonyms($term->tid);
  drush_print( $term->name .'('. $term->tid .'): parents='. count($parents) .' related='. count($related) .' synonyms='. count($synonyms));
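}

// Derived from taxonomy_get_parents_all(), but a term that is already in the
// list is reported as a loop instead of being added again.
function safe_get_parents_all($tid) {
    $parents = array();
    if ($tid) {
        $parents[] = taxonomy_get_term($tid);
        $n = 0;
        // The isset() check stops the walk once every collected term has been
        // examined; when a looped parent is skipped, $parents[$n] may not exist.
        while (isset($parents[$n]) && ($parent = taxonomy_get_parents($parents[$n]->tid))) {
            foreach ($parent as $p) {
                if (! parents_has_term($parents, $p)) {
                    array_push($parents, $p);
                    //drush_print_r('... '. count($parents));
                } else {
                    drush_print_r('DETECTED LOOP tid='. $tid .' has parent '. $p->tid .' multiple times');
                    drush_print_r($parents);
                }
            }
            $n++;
        }
    }
    return $parents;
}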

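// Returns TRUE if $term is already present in $parents (compared by tid).
function parents_has_term($parents, $term) {
    foreach ($parents as $p) {
        // Loose comparison, since tids loaded from the database may be strings.
        if ($p->tid == $term->tid) {
            return TRUE;
        }
    }
    // Only report a miss after every collected term has been checked.
    return FALSE;
}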