Eliminating duplicates in Views 3 Drupal 6 when the "distinct" and "group by" features don't work

Date: Tue May 27 2014 Drupal Tutorial »»»» Drupal Views
Yes, Drupal 6 is water under the stream except for those of us still maintaining Drupal 6 websites. On one of my sites I'm trying to bring the module status into 2014, because they'd last been updated in 2012 and Views hadn't even been upgraded to Views 3. It was pretty straightforward to generate a Drush Make file to generate updated code for the site. A couple of the Views didn't get properly upgraded to Views 3, and in most cases minor tweaks were all that was needed. But one View had a major flaw - it generated lots of duplicated content.

Of course duplicate content is a side effect of some Views queries. Starting with Views 3 we've been able to specify "distinct" and "group by" attributes to the query, that turn into elements of the SQL SELECT statement ("SELECT DISTINCT(...) ... WHERE ... GROUP BY fieldName ..."). But when I made those settings in the View, the resulting SQL query didn't have the expected clauses, and therefore lots of duplicate values came out of the query.

What you're supposed to do is go to the Advanced Settings and work on the "Use grouping" choice, and the "Query settings". Between those two you can specify DISTINCT and GROUP BY support. But in the case of this View, that didn't work, and no I wasn't able to figure out why.

After a lot of yahoogling I found discussion about using different Views hooks to manipulate the Views query object in order to get it set up correctly. I've never been able to make heads or tails of this object, because as a user interface it's extremely non-user-friendly. I don't know of any place this object is documented, so when you export the View definition I see it as this magical piece of text that somehow magically tells Views what to do. But obviously it's something Views knows how to interpolate, and there might even be documentation for it. Somewhere.

In any case I did manage to write a Views hook, and it wasn't too terribly hard because the Views plugin code isn't that hard to read. These two options, and a bunch more, are controlled by code in views/plugins/views_plugin_query_default.inc (findable by grepping for "GROUP BY").

The next step is to create a module - in my case, the site already had a module for modifications to site behavior - that contains two things: a) hook_views_api implemention to tell Views this module has Views code, b) hook_views_query_alter implementation to alter the query. This hook is called just before the query array is converted into an SQL query that will be executed.

In MODULENAME/MODULENAME.module add this code:


/**
* Implementation of hook_views_query_alter().
*/
function MODULENAME_views_api() {
  return array(
    'api' => 3,
  );
}

This is just a signal to Views saying that Views code lives in this module. The code itself will be in MODULENAME.views.inc. Here's the code:


function MODULENAME_views_query_alter(&$view, &$query)
{
  if ($view->name == 'discussions') {  // change these 'name' and 'current_display' comparisons to match your needs
    if ($view->current_display == 'page_1' || $view->current_display == 'page_3') {
      $query->set_distinct(TRUE);
      $query->add_groupby('nid'); // Do this more than once if needed - specify the name of SQL table columns
      $query->has_aggregate = TRUE;
      // $query->groupby[0] = "nid";
    }
  }
  // These help you figure out what is in these objects - to use this, download and enable the Devel module
  // dpm($view);
  // dpm($query);
}

This worked fine, and being able to print the views object using the Devel module made it easier to know how to read the Views source.