The difference of length, size and count in ActiveRecord associations

Even experienced Ruby on Rails developers sometimes seems to forget the subtle difference between those methods.

The goal here is to count how many objects are there, persisted in database, for an association.

We have a few methods to do that, and all has pros and cons.

Given that:

class User < Account # let's forget I'm using STI in this example
  has_many :tracks

Let’s fire up a console and do some testing.

user = User.find(2)
  User Load (0.6ms)  SELECT `accounts`.* FROM `accounts` WHERE `accounts`.`type` IN ('User') AND `accounts`.`id` = 2 LIMIT

collection#count

user.tracks.count
 (5.6ms)  SELECT COUNT(*) FROM `tracks` WHERE `tracks`.`account_id` = 2
 => 3758

This method runs a query. More specifically a COUNT query, which is not bad if we need fresh data.

collection#length

user.tracks.length
 Track Load (299.5ms)  SELECT `tracks`.* FROM `tracks` WHERE `tracks`.`account_id` = 2
 => 3758

Uh.. Apparently this bad boy just loaded all the tracks for the given user in memory and counted them. Not good. Go check your code now if you are doing that.

collection#size

user.tracks.size
 => 3758

No query run. That is because in the accounts table I have set up a ‘tracks_count’ integer field and configured it to be used as counter_cache in Track model:

class Track < ActiveRecord::Base
 belongs_to :account, counter_cache: true # :account and not user because I'm using STI

Thoughts

Having a counter_cache column greatly improves the object count performance when we need this data because obviously it is cached directly in our table.

However it’s the way in which counter cache works that doesn’t makes me feel confortable at times, mainly because of the way it is calculated.

When ActiveRecord wants to update a counter cache it initializes the attribute to 0 if it’s NULL and then adds (+1) or subtract (-1) to the the actual value of this field.

While this may not be a bad thing per se I believe it doesn’t provide *real* data consistency.

In my eyes it would be much better, but with a performance penalty, to write out counter caches values that actually are the result of a fresh count.

In any case I kind of agree with ActiveRecord developers doing this +- game here, because it’s fairly easy to change this behaviour and code your custom counter caches with real SELECT COUNT(*) should you feel doing so.

I occasionally do that when I want to be absolutely sure the value has not been corrupted by a previous breakages, failed transactions or squirrels playing with power plugs.

The strategy you want to employ definitely should be tailored to the importance given to the counters in your project.

For example if those counters are used for billing I’d definitely go with a custom counter cache or another type of mechanisms (daily cron?) that guarantees the counter fields are always up to date.

It’s just another thing to be aware of and this post is a friendly reminder.

$1.99 domains with SSL purchase!

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong> <pre lang="" line="" escaped="" highlight="">