Reducing bandwidth use, decreasing page load times, for better Drupal user experience

Date: Thu Jan 14 2010
One of my sites has a high traffic load (2000 visits per day, over 6000 page views per day) and has been using up the bandwidth allotment on the shared hosting account where it's hosted. My concerns were that a large download per page would drive visitors away with long page load times, and the environmental impact of excessive bandwidth use. Initially the only measurement tool I had was the real-time bandwidth statistics provided by the hosting provider (see the screenshots below). Only later, once Firebug worked on Firefox 3.5, did YSlow (a Firebug extension) become usable. Firebug and YSlow turned out to be the critical tools for this project, and by using the following tips my site went from a 'D' YSlow score to a 'B' and drastically reduced its bandwidth use.


Bandwidth charts

[Figures: bandwidth charts for the last 30 days and the last 7 days]

These are the bandwidth charts for the last 30 days and the last 7 days before making the changes. Note how bandwidth use bounces around in a daily cycle but is generally steady at about 20 kbytes per second. This is the chart I ultimately wanted to reduce, and along the way I found some other things to decrease as well.
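As a rough worked example, assuming 1 kbyte = 1000 bytes and that all of this traffic comes from the roughly 6000 page views per day mentioned in the introduction, the 20 kbytes per second baseline works out to:

```latex
\begin{aligned}
20\ \tfrac{\text{kB}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}} &= 1\,728\,000\ \tfrac{\text{kB}}{\text{day}} \approx 1.7\ \tfrac{\text{GB}}{\text{day}} \\
\frac{1\,728\,000\ \text{kB/day}}{6000\ \text{page views/day}} &= 288\ \text{kB per page view}
\end{aligned}
```

Some of the traffic may not come from page views at all (crawlers, for example), so treat this only as an order-of-magnitude figure.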

Aggregating CSS and JS files

The first steps are in Drupal's Performance admin settings: turn on CSS and JavaScript optimization, and turn on caching. This made no difference to aggregate bandwidth, but CSS and JavaScript optimization does decrease the number of discrete objects downloaded. A typical Drupal install has many modules enabled, several of which supply their own CSS or JavaScript files. When the browser has to load each of them individually, it adds considerably to page load time even if the total data size stays small, because browsers limit how many files they download at once from a single host (the HTTP/1.1 specification recommends no more than two simultaneous connections per server).

Aggregating CSS and JavaScript earned a 'B' from YSlow on making fewer HTTP requests. It can't reach an 'A' because some of the modules on my site require CSS or JS files stored on third-party servers.
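A minimal Python sketch of what aggregation does, under stated assumptions: the helper name `aggregate`, the file contents, and the naming scheme are made up for illustration and are not Drupal's actual (PHP) code. It shows the point of the technique: three stylesheets become one file, and therefore one HTTP request.

```python
import hashlib


def aggregate(files: dict[str, str], ext: str = "css") -> tuple[str, str]:
    """Concatenate several files into one aggregate (hypothetical helper).

    `files` maps each file's path to its contents, in page order.
    Returns (aggregate_filename, combined_contents). The name comes from an
    MD5 hash of the combined contents, so editing any input yields a new URL.
    """
    combined = "\n".join(files.values())
    digest = hashlib.md5(combined.encode("utf-8")).hexdigest()
    return f"{ext}_{digest}.{ext}", combined


# Made-up stand-ins for the stylesheets a few modules might add.
stylesheets = {
    "modules/system/system.css": ".clear-block { clear: both; }",
    "modules/node/node.css": ".node-unpublished { background-color: #fff4f4; }",
    "modules/user/user.css": "#permissions td.module { font-weight: bold; }",
}

name, combined = aggregate(stylesheets)
print(f"{len(stylesheets)} stylesheet requests become 1: {name}")
```

Naming the aggregate after a hash of its contents is one common design choice: any edit produces a new URL, which pairs well with the far-future Expires headers discussed later.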

A bit about YSlow use

To examine this in YSlow, start with the 'Grade' tab, which breaks the overall grade into individual grades for each rule. A quick scan down the list shows the areas that need attention. The Components tab lists every object the page requires, grouped by component type, with details for each one: uncompressed and compressed size, URL, Expires header, and so on. Most of the work consists of going back and forth between the Grade and Components tabs, trimming the components to earn a better grade.

Browser caching of files

YSlow might report "Add Expires headers" because it looks for ways to use browser-side caching to minimize the data transferred. If visitors have to download each and every object all over again on every visit, that wastes not only their time but also your bandwidth. The HTML for the page itself generally has to be downloaded on each visit, but most pages also reference a common set of CSS and JavaScript files. Do those need to be downloaded every time? Nope. They only need to be downloaded in two cases: on the visitor's first visit to your site, and when the files change. CSS and JavaScript files change rarely, so why make your visitors download them more than once?

The Expires header defines how long a given file may be cached by the browser. For files that rarely change, setting an expiration date in the distant future lets the file be downloaded once and kept by the browser on the user's computer. YSlow's Components tab has a column showing the expiration date.
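To make the browser side concrete, here is a simplified Python sketch of the decision a browser makes with an Expires header. The function name `can_use_cached_copy` is hypothetical, and the model deliberately ignores Cache-Control and revalidation, which real browsers also take into account.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def can_use_cached_copy(expires_header: str, now: datetime) -> bool:
    """True while a cached file is still fresh according to its Expires header.

    Simplified model: real browsers let Cache-Control: max-age override
    Expires, and may revalidate with the server instead of re-downloading.
    """
    expires_at = parsedate_to_datetime(expires_header)  # aware datetime (GMT)
    return now < expires_at


expires = "Sat, 13 Feb 2010 00:00:00 GMT"
print(can_use_cached_copy(expires, datetime(2010, 1, 20, tzinfo=timezone.utc)))
print(can_use_cached_copy(expires, datetime(2010, 3, 1, tzinfo=timezone.utc)))
```

The first call prints True (no download needed); the second prints False (the file has expired and is fetched again).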

The article "Optimizing page load times using mod_deflate, mod_expires, and ETag on Apache2" covers mod_expires along with some other techniques.

In Drupal, one egregious example that mod_expires improves is the editor widgets used on the node add/edit form. My sites use BUEditor, which comes with about 10 small PNG files; without an Expires header, each of them is requested again every time the node add/edit page loads.

For the "Make fewer HTTP requests" grade, YSlow might recommend combining multiple images into CSS sprites. Instead of loading 10 small images, the page loads one larger image and uses CSS to show the relevant portion of it for each icon. Unfortunately BUEditor's design does not lend itself to sprites, but adding an Expires header goes a long way towards solving the problem.
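A small Python sketch of the sprite technique, assuming square icons stacked top to bottom in a single image. The icon names, the `/images/editor-sprite.png` path, and the `sprite_css` helper are hypothetical; as noted above, BUEditor itself doesn't work this way.

```python
def sprite_css(icons: list[str], sprite_url: str, size: int) -> str:
    """Build CSS rules for square icons stacked top-to-bottom in one sprite image.

    Every rule uses the same background image, shifted up by the icon's
    offset, so the browser downloads a single file instead of one per icon.
    """
    rules = []
    for index, icon in enumerate(icons):
        offset = -index * size  # the first icon sits at the top (offset 0)
        rules.append(
            f".icon-{icon} {{ background: url({sprite_url}) 0 {offset}px no-repeat; "
            f"width: {size}px; height: {size}px; }}"
        )
    return "\n".join(rules)


print(sprite_css(["bold", "italic", "link"], "/images/editor-sprite.png", 16))
```

Each generated rule points at the same image, so ten icons would cost one request plus a few bytes of CSS.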

To turn on Expires support, first make sure mod_expires is enabled, then use directives like these:


# Without ExpiresActive On, mod_expires sends no Expires headers at all.
ExpiresActive On
ExpiresByType image/jpeg "access 30 days"
ExpiresByType image/gif  "access 30 days"
ExpiresByType image/png  "access 30 days"

The article linked above has more information about this.
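Here is a Python sketch of what an "access 30 days" rule computes, assuming a request arrives at a known time. The `LIFETIMES` table mirrors the three ExpiresByType image lines; the `expiry_headers` function is hypothetical and is not how Apache is implemented. mod_expires sets both the Expires header and the max-age directive of Cache-Control, so the sketch returns both.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Mirrors the ExpiresByType image rules: "access 30 days" for each type.
LIFETIMES = {
    "image/jpeg": timedelta(days=30),
    "image/gif": timedelta(days=30),
    "image/png": timedelta(days=30),
}


def expiry_headers(content_type: str, accessed: datetime) -> dict[str, str]:
    """Return the caching headers an "access plus N" rule would add, or {} if none matches."""
    lifetime = LIFETIMES.get(content_type)
    if lifetime is None:
        return {}
    return {
        # format_datetime(..., usegmt=True) needs an aware UTC datetime.
        "Expires": format_datetime(accessed + lifetime, usegmt=True),
        "Cache-Control": f"max-age={int(lifetime.total_seconds())}",
    }


print(expiry_headers("image/png", datetime(2010, 1, 14, tzinfo=timezone.utc)))
```

For a PNG requested on Jan 14 2010 (UTC), this gives an Expires date of Sat, 13 Feb 2010 00:00:00 GMT and max-age=2592000 (30 × 86,400 seconds). A type with no rule, such as text/css here, gets no headers.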

Turning on compression

The discussion "Compression -- Drupal side or Apache side or both?" gives a very good overview of how to enable page compression. The idea is to compress the data sent from the web server to the browser with an algorithm such as gzip. The browser lists the algorithms it can decode in the Accept-Encoding request header, and the server compresses a response only when the browser supports a matching algorithm, labeling the result with a Content-Encoding header. Compression costs CPU time on the server (and some decompression work in the browser), so the tradeoff is CPU usage versus bandwidth and page load times.
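A simplified Python sketch of that negotiation, assuming a server that only knows gzip. The function name `encode_response` and the sample page are hypothetical, and the parsing ignores q-values, which a real server such as mod_deflate handles more carefully.

```python
import gzip


def encode_response(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Gzip the body only when the browser's Accept-Encoding lists gzip.

    Simplified: q-values are ignored, so even "gzip;q=0" counts as accepted.
    """
    offered = [token.split(";")[0].strip().lower() for token in accept_encoding.split(",")]
    headers = {"Vary": "Accept-Encoding"}  # tells caches the response depends on this header
    if "gzip" in offered:
        headers["Content-Encoding"] = "gzip"
        return gzip.compress(body), headers
    return body, headers


page = b"<html><body>" + b"<p>Hello from Drupal</p>" * 200 + b"</body></html>"
compressed, headers = encode_response(page, "gzip,deflate")
print(len(page), "->", len(compressed), headers)
```

A browser that doesn't advertise gzip simply gets the uncompressed page, which is why compression is safe to enable for all visitors.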

Note that Drupal's "Optimize CSS" and "Optimize JS" options under admin/settings/performance do not gzip. As noted above, optimizing those files did not decrease total bandwidth use, but it did decrease the number of individual files downloaded.

In the Components tab you can inspect compression levels by comparing the 'Size' and 'Size GZIP' columns.

Turning on compression increases the CPU load on the server, and a busy server could conceivably use up its CPU power performing compression. It's worth measuring whether, for your site, it makes more sense to compress or to simply use more bandwidth.
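One way to start measuring is a quick Python sketch that compresses a sample at several gzip levels and records size and time. The sample is made-up CSS-like text and the `measure` helper is hypothetical; timings depend on the machine running the script, so real decisions should be based on your own pages and server. On the Apache side, mod_deflate's DeflateCompressionLevel directive controls the same size-versus-CPU tradeoff.

```python
import gzip
import time


def measure(data: bytes, levels: tuple[int, ...] = (1, 6, 9)) -> dict[int, tuple[int, float]]:
    """Return {level: (compressed_size_in_bytes, seconds_taken)} for each gzip level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = gzip.compress(data, compresslevel=level)
        results[level] = (len(compressed), time.perf_counter() - start)
    return results


# Made-up sample resembling a stylesheet.
sample = b"".join(b".rule-%d { margin: %dpx; }\n" % (i, i % 7) for i in range(5000))
for level, (size, seconds) in measure(sample).items():
    print(f"level {level}: {len(sample)} -> {size} bytes in {seconds * 1000:.2f} ms")
```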

It is important to compress only once. Compressing an already compressed file doesn't make it even smaller; it usually makes it slightly larger. Compression can be enabled in two places: in Drupal (using PHP's gzip support) or in the web server (using mod_deflate in Apache).

The CSS GZIP module can compress CSS files, and the Javascript Aggregator module can compress JS files. However, it's somewhat simpler to do it in Apache.

To use mod_deflate, use something like the following, adapted from the discussion linked above. Also see the documentation: http://httpd.apache.org/docs/2.2/mod/mod_deflate.html


<IfModule mod_deflate.c>
  # Compress only text-based content types; images are already compressed.
  AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript application/x-javascript application/javascript
</IfModule>

This enables compression for the listed content types. Image files are left out because most image formats are already compressed, so the benefit is small or nonexistent. The effect is easy to demonstrate by using the gzip command-line tool to compress different kinds of files:


% ls -l greenhouse-gas.jpg 
-rw-r--r--  1 davidherron  staff  44849 Aug 24 20:29 greenhouse-gas.jpg
% gzip greenhouse-gas.jpg 
% ls -l greenhouse-gas.jpg*
-rw-r--r--  1 davidherron  staff  43875 Aug 24 20:29 greenhouse-gas.jpg.gz
% ls -l screenshot-drupal.org.png 
-rw-r--r--  1 davidherron  staff  27127 Oct 29  2007 screenshot-drupal.org.png
% gzip screenshot-drupal.org.png 
% ls -l screenshot-drupal.org.png*
-rw-r--r--  1 davidherron  staff  27176 Oct 29  2007 screenshot-drupal.org.png.gz
% ls -l perm\ pavement.pdf 
-rw-r--r--@ 1 davidherron  staff  40381 Jul  5 19:05 perm pavement.pdf
% gzip perm\ pavement.pdf 
% ls -l perm\ pavement.pdf*
-rw-r--r--@ 1 davidherron  staff  37579 Jul  5 19:05 perm pavement.pdf.gz
% ls -l system.css 
-rw-r--r--  1 davidherron  staff  10020 Jan  9  2008 system.css
% gzip system.css 
% ls -l system.css*
-rw-r--r--  1 davidherron  staff  2859 Jan  9  2008 system.css.gz
% ls -l jquery.js 
-rw-r--r--  1 davidherron  staff  31089 Jun 25  2008 jquery.js
% gzip jquery.js 
% ls -l jquery.js*
-rw-r--r--  1 davidherron  staff  15710 Jun 25  2008 jquery.js.gz
% ls -l tabledrag.js 
-rw-r--r--  1 davidherron  staff  39171 Jun 18 05:24 tabledrag.js
% gzip tabledrag.js 
% ls -l tabledrag.js*
-rw-r--r--  1 davidherron  staff  10360 Jun 18 05:24 tabledrag.js.gz
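To summarize that transcript, here is a short Python snippet computing the percentage saved for each file. The sizes are copied from the `ls -l` output above; the `saving_percent` helper is just for illustration.

```python
# (original bytes, gzipped bytes), copied from the ls -l output above
SIZES = {
    "greenhouse-gas.jpg": (44849, 43875),
    "screenshot-drupal.org.png": (27127, 27176),
    "perm pavement.pdf": (40381, 37579),
    "system.css": (10020, 2859),
    "jquery.js": (31089, 15710),
    "tabledrag.js": (39171, 10360),
}


def saving_percent(original: int, compressed: int) -> float:
    """Percentage of bytes saved by gzip; negative means the file grew."""
    return 100.0 * (original - compressed) / original


for name, (original, compressed) in SIZES.items():
    print(f"{name:28} {saving_percent(original, compressed):6.1f}% saved")
```

The text files (CSS and JS) save roughly 49% to 74%, while the JPEG, PNG, and PDF save under 7%, and the PNG actually grows slightly.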

Minifying CSS and JS files

Minification means squeezing out whitespace and comments so that a file gets smaller while staying semantically the same. The web browser doesn't care about human readability and groks CSS or JS files that lack whitespace just as readily as it groks human-readable ones. Turning on CSS optimization in Drupal takes care of this for CSS; Drupal 6 core's JavaScript optimization aggregates files but does not minify them, which is something contributed modules such as Javascript Aggregator add.

Unfortunately some themes embed CSS or JS directly in their template files. That inline code does not get optimized, and YSlow might complain about it. Sorry, there's not a lot you can do about that without hacking the theme.
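To show what minification does, here is a deliberately naive Python sketch. The `minify_css` function is hypothetical and only in the same spirit as Drupal's CSS optimizer, not a copy of it. It can break edge cases, for example strings containing these characters or a selector like `a :hover` where the space matters, so real sites should rely on a proper minifier.

```python
import re


def minify_css(css: str) -> str:
    """Naively shrink CSS: drop comments, collapse whitespace, trim around punctuation."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.DOTALL)  # remove /* comments */
    css = re.sub(r"\s+", " ", css)                        # collapse whitespace runs
    css = re.sub(r"\s*([{};:,>])\s*", r"\1", css)         # no spaces around punctuation
    css = css.replace(";}", "}")                          # last ';' in a block is optional
    return css.strip()


source = """body {
  margin : 0 ;
  color: #333;
}
/* footer links */
.footer a { color: red; }
"""
print(minify_css(source))
```

The space in `.footer a` survives because it is a descendant combinator and changes the meaning of the selector.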

Configuring ETags (entity tags)

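YSlow complains about my site's ETags and gives it a D on this score. As far as I can tell, YSlow's "Configure entity tags (ETags)" rule objects to ETags that contain server-specific details: Apache's default ETag includes the file's inode number, so the same file served from two different servers gets two different ETags, and browsers end up re-downloading files they already have. I've tried reading the Apache documentation about this several times and it just doesn't make any sense. In "Optimizing page load times using mod_deflate, mod_expires, and ETag on Apache2" the author documents this method of turning off ETags, and supposedly YSlow will then shut up about it.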


# Stop Apache from generating ETags for static files; browsers fall back on Last-Modified.
FileETag None
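A Python sketch of why inode-based ETags cause trouble behind more than one server. Both ETag formats here are assumptions for illustration: `inode_etag` is only loosely modeled on Apache's default (inode, size, and modification time) and is not its exact format, and the conditional-request check is simplified to a single exact match.

```python
import hashlib


def inode_etag(inode: int, size: int, mtime: int) -> str:
    """ETag built from filesystem metadata, loosely modeled on Apache's default."""
    return f'"{inode:x}-{size:x}-{mtime:x}"'


def content_etag(body: bytes) -> str:
    """ETag built only from the file's bytes, identical on every server."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def status_for(if_none_match: str, current_etag: str) -> int:
    """304 Not Modified if the browser's cached ETag still matches, else 200 (full download)."""
    return 304 if if_none_match == current_etag else 200


logo = b"...the same PNG bytes deployed on both servers..."
# Same file, same size and mtime, but a different inode on each server.
etag_a = inode_etag(inode=1048601, size=len(logo), mtime=1263427200)
etag_b = inode_etag(inode=2203312, size=len(logo), mtime=1263427200)

print(status_for(etag_a, etag_b))                         # cached on A, request lands on B
print(status_for(content_etag(logo), content_etag(logo)))  # content-based tags agree
```

On a single shared-hosting server this mismatch doesn't occur, which is why simply turning ETags off, as above, costs little.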

Use a Content Delivery Network (CDN)

CDNs are good for high-traffic sites because they distribute files to servers that are "near" the users in terms of network topology. However, CDNs are not within my budget of approximately $0, and Drupal core makes them nigh on impossible to use. There are some contributed modules that enable CDN usage, but I've never used them. I'm simply ignoring the 'D' YSlow gives me on this rule.

Enabling parallel loading

As noted above, browsers limit how many files they download at the same time from a single hostname. They can be tricked into doing more by spreading files across multiple subdomains. The Parallel module makes this possible if you set up three subdomains of your domain: it farms out file requests to these subdomains, letting the browser request more files at once. I haven't tried this. It won't decrease bandwidth use, but it should decrease page load time by letting the browser do more in parallel.
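A Python sketch of the general hostname-sharding idea, not the Parallel module's actual code. The `static1` to `static3` hostnames under `example.com` and both helper functions are hypothetical. The key design choice is that each file is always mapped to the same subdomain: choosing randomly would give one image different URLs on different pages, defeating browser caching.

```python
import zlib

# Hypothetical static-file hostnames; a real setup would use your own subdomains.
HOSTS = ["static1.example.com", "static2.example.com", "static3.example.com"]


def host_for(path: str) -> str:
    """Pick a hostname deterministically so a file always gets the same URL."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return HOSTS[zlib.crc32(path.encode("utf-8")) % len(HOSTS)]


def shard_url(path: str) -> str:
    return f"http://{host_for(path)}{path}"


print(shard_url("/misc/jquery.js"))
```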

Conclusion

There are other things YSlow measures, but Drupal already handles most of them well.

Of the items above, enabling compression did the most good. I turned on Expires headers 2-3 weeks before turning on compression and, watching the bandwidth usage, saw no noticeable decrease. Turning on compression, however, gave an instant and dramatic decrease in bandwidth use.

[Figure: bandwidth chart about 1 hour after enabling compression]

This chart was taken about 1 hour after enabling compression. Note the immediate dive in bandwidth use.

[Figure: 30-day bandwidth chart, two weeks after enabling compression]

This is the 30-day view after two weeks. The immediate bandwidth decrease has clearly held over a longer period. Furthermore, Google Analytics data shows that the number of page views has, if anything, increased slightly over the same period.

[Figure: 7-day bandwidth chart, two weeks after enabling compression]

Finally, this is the 7-day view taken two weeks after the change. It shows that general bandwidth use dropped from 12-15 kbytes per second to 7 kbytes per second, and the aggregate bandwidth use reported by the hosting provider is nowhere near being exhausted.
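Putting those numbers together as a rough estimate (assuming 1 kbyte = 1000 bytes and a 30-day month), the drop from 12-15 kbytes per second to 7 kbytes per second is a reduction of roughly 42% to 53%, and the new rate corresponds to about 18 GB per month:

```latex
\begin{aligned}
1 - \tfrac{7}{15} &\approx 0.53 && \text{(starting from 15 kbytes/s)} \\
1 - \tfrac{7}{12} &\approx 0.42 && \text{(starting from 12 kbytes/s)} \\
7\ \tfrac{\text{kB}}{\text{s}} \times 86\,400\ \tfrac{\text{s}}{\text{day}} \times 30\ \text{days} &= 18\,144\,000\ \text{kB} \approx 18\ \text{GB}
\end{aligned}
```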