Two ways to find Drupal taxonomy terms that have zero nodes (empty terms)

Date: Sun Oct 02 2011

Over the course of time you might add nodes that you later delete, and end up with taxonomy terms that have no nodes. These empty terms add overhead to your site, and add empty taxonomy listing pages. It's best to remove the empty terms.

I said "Overhead".. I don't know all the instances but have seen a couple places where all terms are loaded into memory. If your vocabulary includes empty terms (no nodes for the term) each empty vocabulary term makes for a bigger array that is loaded into memory.

The following are for Drupal 6.. haven't looked at how to do this with Drupal 7.

The first builds on this bit of SQL that's in the taxonomy_tree module.


SELECT COUNT(n.nid) AS c FROM term_node t 
INNER JOIN node n ON t.vid = n.vid 
WHERE n.status = 1 AND t.tid = %d

Turning it into a shell script :-


drush sqlq 'select tid from term_node' \
    | sed 1d \
    | uniq \
    | sed 's/.*/drush sqlq "SELECT t.tid, COUNT(n.nid) AS c FROM term_node t INNER JOIN node n ON t.vid = n.vid WHERE n.status = 1 AND t.tid = &"/' \
    | sh | grep -v tid

We first list all the terms irregardless of vocabulary. Because "drush sqlq" outputs, as the first line, a row giving the name of each column, so the "sed 1d" deletes that line. I noticed this often includes duplicate entries in the output, hence used "uniq" to remove duplicates. The last bit constructs a new "drush sqlq" command with the SQL to count the number of nodes for a given vocabulary term.

However there's a better way based on directly using the Drupal API.

<?php
$tree = taxonomy_get_tree(1);
foreach ($tree as $term) {
  $count = taxonomy_term_count_nodes($term->tid); 
  echo $term->name ." ". $term->tid ." ". $count ."\n";
  if ($count === 0) {
    taxonomy_del_term($term->tid);
  }
}

This code snippet takes the extra step of deleting any empty term. But let's take it line by line.

The "taxonomy_get_tree" function takes a vocabulary ID, and returns an array listing all the terms in the given vocabulary. It isn't a nested array, but a flat array, with each array entry listing its parent or related terms, meaning the vocabulary hierarchy is represented indirectly. You can get a list of vocabulary ID's this way:


drush sqlq 'select * from vocabulary;'

You can inspect the vocabulary tree using this command:-


drush eval 'print_r(taxonomy_get_tree(3));'

For our purpose it's enough to step across the array, operating on each term.

The "taxonomy_term_count_nodes" function gives us the number of nodes for a given term (as its function name implies). In fact that function contains an amazingly similar SQL statement to the one above.

It's simple enough that if the count is 0, then to use "taxonomy_del_term" to delete that empty term. If you instead want to list the empty terms, you can remove the call to taxonomy_del_term .. and do anything else you like.

Q.E.D.