Canonical Entity representation for Drupal, sounds like it could fix one core critique I have of Drupal

Date: Tue Dec 20 2011 Drupal Planet
This project just popped up on my radar - Canonical Entity Representation (see http://groups.drupal.org/node/197588 and http://groups.drupal.org/node/197583). The description sounds like it would resolve one of the key critiques I have of Drupal, namely the lack of an import/export format that's part of Drupal Core. I'd like to call some attention to this and hopefully, as the two posts above suggest, this can be a standard part of Drupal starting with v8.

The announcement I saw said: A discussion in the WSCCI and Services groups is looking to standardize how we represent nodes and other entities in JSON or XML form.

They offer these justifications:

  1. Including entities in exported configuration, or in configuration files.
  2. Taking a content snapshot in some form other than an SQL dump file (which, you know, kinda sucks for most uses).
  3. Transferring a node from one site to another for content sharing purposes.
  4. Aggregating content from many sites together for improved searching and cataloging.
  5. Exposing Drupal content to other non-Drupal systems. This is made easier by using non-Drupal-specific formats.
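To make the idea concrete, here is a minimal Python sketch of what exporting a node-like entity to JSON and reading it back could look like. Everything in it is an assumption for illustration: the field names (type, uuid, title, body, tags) and the functions export_entity and import_entity are hypothetical, and none of it is taken from the WSCCI discussion or from any Drupal API.

```python
import json

# Hypothetical minimum set of fields an importer would insist on.
REQUIRED_FIELDS = ("type", "uuid", "title")


def export_entity(entity):
    """Serialize a dict describing an entity to a stable JSON string."""
    return json.dumps(entity, sort_keys=True, indent=2)


def import_entity(text):
    """Parse a JSON string back into a dict, rejecting incomplete entities."""
    entity = json.loads(text)
    missing = [name for name in REQUIRED_FIELDS if name not in entity]
    if missing:
        raise ValueError("entity is missing fields: " + ", ".join(missing))
    return entity


# A made-up node; the values are placeholders, not real site data.
node = {
    "type": "article",
    "uuid": "00000000-0000-0000-0000-000000000001",
    "title": "Hello world",
    "body": "Some body text.",
    "tags": ["drupal", "export"],
}

exported = export_entity(node)
restored = import_entity(exported)
```

The point of the sketch is only that a plain, non-Drupal-specific format lets any other system read the content with a stock JSON parser, which is what use cases 3 through 5 depend on.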

This is great. The lack of a standard export/import is one of the things in Drupal that has rubbed me the wrong way for years. It's all great and wonderful that the data we've stored in our Drupal websites is in a database, and that everything is documented and visible as open source software, but the situation we have is one where our data is entrapped inside Drupal. Yes, it's possible to write code to extract our data from Drupal, but that introduces a lot of friction into what should be a frictionless, no-effort operation: exporting or importing content to/from a Drupal site.

An example of an attitude that I believe is correct is Google's Data Liberation Front - http://www.dataliberation.org/ - where their goal is "to make it easier for users to move their data in and out of Google products."

One of the things that website suggests is -

... we always encourage people to ask these three questions before starting to use a product that will store their data:
  1. Can I get my data out in an open, interoperable, portable format?
  2. How much is it going to cost to get my data out?
  3. How much of my time is it going to take to get my data out?

With the ideal answers being:

  1. Yes.
  2. Nothing more than I'm already paying.
  3. As little as possible.

Drupal is a "product" (well, an open source content management system; the word "product" isn't quite right) that stores our data to display on websites. If we apply those questions to Drupal, the answers would be:

  1. Not really.
  2. Depends on whether you're a coder. You may have to hire a programmer.
  3. Possibly quite a lot of time.

That covers exporting data from Drupal: you can write some custom code to export Drupal's data in any format you like, and there are some modules that might do what you want, but probably do not.

The other side of the equation is importing data. Again there are sometimes modules to do this, but often they're not really all that helpful.

Finally, there are a couple of kinds of data to be concerned about. One is the configuration settings, and there has already been quite a bit of work on that front, because some shops have multi-stage deployment practices and export modules (features) and configuration data from development to staging to production servers. But what about the content?

For an example of a "product" where the Data Liberation guys have done their magic, consider the Blogger platform. I have a few blogs hosted with Blogger, and the other day I wanted to merge the content from several of these blogs together to form one blog. If these blogs were on Drupal I'd be saying UGH and either launching, frustrated, into a conversion script, or else just throwing up my hands and going on to some other project that has an actual chance of being done. But because of the Data Liberation guys this was trivial on Blogger.

The steps were:

  1. Create new blog
  2. For each of the old blogs, export the content using the export button in the blog settings
  3. On the new blog, import each of the exported files

It was trivial and even preserved the URLs, so setting up redirects was really simple.
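Blogger did the merge for me, but the reason it can be that easy is that the export is a plain file in an open format. As a rough sketch, assuming each export is an Atom feed, combining several exports by hand could look something like this in Python. The function merge_atom_feeds and the sample feeds are hypothetical, and sorting on the published text only gives the right order when all timestamps use the same format and time zone.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)


def merge_atom_feeds(feed_xml_strings):
    """Collect the <entry> elements of several Atom feeds into one new feed."""
    merged = ET.Element("{%s}feed" % ATOM_NS)
    entries = []
    for xml_text in feed_xml_strings:
        root = ET.fromstring(xml_text)
        entries.extend(root.findall("{%s}entry" % ATOM_NS))
    # Order entries by their <published> text, oldest first.
    entries.sort(key=lambda e: e.findtext("{%s}published" % ATOM_NS, default=""))
    merged.extend(entries)
    return merged


# Two tiny made-up exports standing in for the old blogs.
blog_a = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry><title>A1</title><published>2011-01-05T00:00:00Z</published></entry>"
    "</feed>"
)
blog_b = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry><title>B1</title><published>2011-01-01T00:00:00Z</published></entry>"
    "<entry><title>B2</title><published>2011-02-01T00:00:00Z</published></entry>"
    "</feed>"
)

merged = merge_atom_feeds([blog_a, blog_b])
titles = [e.findtext("{%s}title" % ATOM_NS) for e in merged]
```

Nothing here needs knowledge of the system that produced the files, which is exactly what I'd like to be able to say about content coming out of Drupal.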