Maybe it is time to give this old feature another chance.
I’ve been working on Rails apps since the early days of Rails 3. My first introduction to fixtures came while I was learning to test with RSpec and figuring out how to create database records for those tests.
Everything I was reading at the time recommended FactoryBot over the test fixtures included in Rails. Thoughtbot, the company behind FactoryBot, has presented the case for factories wonderfully. They employ vocabulary I had learned from one of my earliest Rails heroes, Jim Weirich. The Betterspecs site recommends against using fixtures. Even one of the CI platforms I’ve used has a blog post siding with factories over fixtures.
Factories Are the de Facto Standard
No one framed it so harshly, but I came away with the impression that fixtures were universally hated. Across at least six different Rails apps and a decade of work, none of the coworkers, conference talks, or blog posts that shaped my worldview ever challenged that impression. Factories have been the de facto standard.
Fixtures, with their brief mention in the Rails Guides and an easily missed page in the Rails API docs, seem to have largely accepted defeat. Yet fixtures remain a default part of the Rails framework. I’ve been using factories all this time, often running into problems whose solutions were difficult and complicated. Maybe I’m using factories wrong, or maybe it is time to give fixtures a try.
Have We Thrown the Baby Out With the Bathwater?
Stable ID generation
Rails fixtures are YAML files whose top level is a hash, so every record needs a unique key, its label. Those labels are converted to integer ids in a repeatable way.
[1] pry(main)> require 'active_record/fixtures'
=> true
[2] pry(main)> ActiveRecord::FixtureSet.identify(:george)
=> 380982691
[3] pry(main)> ActiveRecord::FixtureSet.identify(:george)
=> 380982691
[4] pry(main)> ActiveRecord::FixtureSet.identify(:reginald)
=> 41001176
[5] pry(main)> ActiveRecord::FixtureSet.identify(:george)
=> 380982691
Not a lot to digest here. If you pass in the same name, you always get the same number. On its own this is only marginally useful for finding records while debugging tests, but it underpins my favorite feature.
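For the curious: my understanding is that, for integer primary keys, Rails computes the id as a CRC32 checksum of the label reduced modulo 2**30 - 1. The standalone sketch below mimics that with Ruby’s built-in Zlib so it runs without Rails. MAX_FIXTURE_ID and fixture_id are hypothetical names, and this is an approximation of the idea rather than the Rails source. The point is only that nothing random is involved.

```ruby
require "zlib"

# Hypothetical stand-in for ActiveRecord::FixtureSet.identify, assuming the
# integer-id strategy: CRC32 of the label, reduced modulo 2**30 - 1.
MAX_FIXTURE_ID = 2**30 - 1

def fixture_id(label)
  # Calling to_s first means :george and "george" produce the same id.
  Zlib.crc32(label.to_s) % MAX_FIXTURE_ID
end

puts fixture_id(:george)   # same value on every run and every machine
puts fixture_id("george")  # identical to the symbol form
puts fixture_id(:reginald) # a different label (almost always) gives a different id
```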
Foreign key references by name
Those same labels are used to reference associations. Every kind works: belongs_to, has_many, and even the more complicated has_many :through and polymorphic associations. The foreign key values are simply generated from the labels using the same algorithm as the primary keys. The referenced record doesn’t need to live in the same file or namespace, or be created by the same block of code. You can still specify explicit id values and opt in to tighter coupling by value, but the path of least resistance is referencing by name.
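To make the idea concrete, here is a hypothetical, stdlib-only imitation of what happens when a fixture refers to another record by label. Rails works out which attributes are associations by reflecting on the model. In this sketch a hand-written belongs_to: list stands in for that reflection, and the fixture contents, label_id, and to_rows are all made up for illustration. The label-to-id rule is the same CRC32 assumption as above.

```ruby
require "yaml"
require "zlib"

# Hypothetical fixture files, inlined as strings so the sketch runs anywhere.
USERS_YML = <<~YML
  george:
    name: George
  reginald:
    name: Reginald
YML

POSTS_YML = <<~YML
  hello_world:
    title: Hello world
    author: george
  second_post:
    title: A second post
    author: reginald
YML

# Assumed label-to-id rule (CRC32 of the label, mod 2**30 - 1).
def label_id(label)
  Zlib.crc32(label.to_s) % (2**30 - 1)
end

# Convert parsed fixtures into rows: each label becomes the primary key, and
# every attribute named in belongs_to is rewritten from a label into a
# "<name>_id" foreign key computed with the same rule.
def to_rows(fixtures, belongs_to: [])
  fixtures.map do |label, attrs|
    row = { "id" => label_id(label) }
    attrs.each do |key, value|
      if belongs_to.include?(key)
        row["#{key}_id"] = label_id(value)
      else
        row[key] = value
      end
    end
    row
  end
end

users = to_rows(YAML.safe_load(USERS_YML))
posts = to_rows(YAML.safe_load(POSTS_YML), belongs_to: ["author"])

# Neither file mentions a number, yet every author_id lines up with a user id.
posts.each do |post|
  author = users.find { |user| user["id"] == post["author_id"] }
  puts "#{post["title"]} -> #{author["name"]}"
end
```

Because both sides derive the number from the label, the two files never need to be loaded together or in any particular order to agree.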
This really starts to shine as the number of references grows. That is also precisely where factories begin to make life difficult, in particular with hierarchies of factories:
Foo has many Bar
Bar has many Baz
And so on…
Reasoning about when a factory will invoke an associated factory, or handling associations that should be shared between the instances it creates, gets harder with every level. If your site navigation relies on your model hierarchy, you’ll end up using all these factories in a lot of different tests.
Fixtures simply opt out of the difficult parts. When fixtures are loaded, Rails first disables foreign key constraints, inserts the records with their generated ids, and then re-enables the constraints.
Readability
Fixtures are plain YAML data, one file per table. Each record has a unique key, and each attribute is a named key. The structure is easy to follow.
Since we’re dealing with a data format, the data has to be written out literally. No random generation via Faker, no branching, no looping. Not having a Turing-complete language at your disposal means you don’t need a Turing-complete thought process to read your data.
YAML’s whitespace-based formatting makes diffs a breeze to read, and quoting and bracing syntax is minimal.
Fixtures are loaded a table at a time rather than a record at a time, which further reduces the cognitive load at the call site.
ERB syntax
Remember how I said fixtures don’t make you read a Turing-complete language? Yeah, that was great, but Rails runs the YAML files through ERB, the same templating language the view layer uses.
This gives you the full power (and responsibility) of Ruby programming, but you have to opt in. You can build a loop that creates a thousand records to test your pagination. You can add an ERB tag that calls Faker so a record isn’t rejected by a unique constraint in the database.
The ERB syntax also adds just enough friction to using loops or randomization like Faker. It’s there when you need it, but not so convenient that you’ll be tempted to make a mess by overusing it.
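Here is a rough sketch of the two steps, using a hypothetical articles fixture for a pagination test: render the file through ERB, then parse the resulting YAML. Rails performs both steps for you when it loads a fixture file; doing them by hand just shows what the YAML parser ends up seeing. SecureRandom stands in for Faker so the example needs no gems, and the template and field names are made up.

```ruby
require "erb"
require "securerandom"
require "yaml"

# Hypothetical articles fixture: the ERB loop writes 30 records for a
# pagination test, and SecureRandom stands in for Faker to produce values
# that won't trip a unique constraint.
ARTICLES_TEMPLATE = <<~'TEMPLATE'
  <% 30.times do |i| %>
  article_<%= i %>:
    title: Article <%= i %>
    position: <%= i %>
    slug: article-<%= i %>-<%= SecureRandom.hex(4) %>
  <% end %>
TEMPLATE

# Step 1: render the ERB tags into plain YAML text.
rendered = ERB.new(ARTICLES_TEMPLATE).result
# Step 2: parse the YAML into label => attributes, as a fixture loader would.
articles = YAML.safe_load(rendered)

puts articles.size                  # → 30
puts articles["article_0"]["title"] # → Article 0
```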
Off Label Uses
Fixtures were clearly built with testing in mind, but what we have here is a serialized data format, run through a template engine, that creates records in a database. Maybe it could be applied to other problems.
Development seed data
Rails has an established practice of using code in db/seeds.rb to create the data needed to use your development environment. If fixtures serve this purpose in the test environment, it stands to reason they would work for the development environment too.
Turns out there is a rake task that makes this pretty easy. You can even point it at a completely different fixtures directory if you want to keep your test and development data separate!
bin/rails db:fixtures:load FIXTURES_PATH=db/fixtures
This works well, and I would absolutely recommend it for capturing complicated record setups in version control and sharing them between developers. It’s much better than complicated, brittle seed code or a long manual process documented in the project README.
Intermediate representation for spreadsheet data import
I recently had a project that started with a large body of research in the form of dozens of spreadsheets. It is a pretty common progression from spreadsheets to a web application.
Spreadsheets don’t enforce references or foreign key relationships, so those take many different forms depending on the use case and the author. This collection of spreadsheets had semi-structured data within cells, manually maintained primary keys, and lists of foreign keys.
Writing parsing code to handle any one cell was manageable enough. The real challenge was handling all the different parsing strategies needed across columns, across the same column in different files, and even across small groups of differently formatted cells.
So I decided to use fixture YAML files as an intermediate representation for my data import. My parsing code ran fast because it wasn’t touching the database; it was only writing out YAML files. After each change to the code, I could diff the results against the previous run and easily spot places where a new parsing strategy worked better or where it broke other cases.
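A minimal sketch of that pipeline, assuming hypothetical spreadsheet rows already read into hashes (the real project had many files and far more parsing strategies than this). Each row becomes a fixture entry keyed by a label derived from the hand-maintained key column. Sorting the labels before writing keeps the output stable, so a diff between two runs shows only what a parsing change actually did. The comma-separated label list follows the format the Rails fixture docs describe for has_and_belongs_to_many associations; ROWS, to_label, parse_list, and species_fixtures are made-up names.

```ruby
require "yaml"

# Hypothetical rows from one spreadsheet: a hand-maintained key column and a
# cell holding a list of related keys with inconsistent separators.
ROWS = [
  { "Key" => "SP-002", "Name" => "Red fox",   "Habitats" => "Forest; Grassland" },
  { "Key" => "SP-001", "Name" => "Gray wolf", "Habitats" => "forest,tundra" },
]

# Normalize a spreadsheet key like "SP-001" or " Tundra" into a fixture label.
def to_label(raw)
  raw.to_s.strip.downcase.gsub(/[^a-z0-9]+/, "_")
end

# One parsing strategy: split a list cell on commas or semicolons.
def parse_list(cell)
  cell.to_s.split(/[;,]/).map { |value| to_label(value) }.reject(&:empty?)
end

def species_fixtures(rows)
  rows.to_h do |row|
    [to_label(row["Key"]), {
      "name" => row["Name"].strip,
      # Related records referenced by label, comma-separated.
      "habitats" => parse_list(row["Habitats"]).join(", "),
    }]
  end
end

# Sort by label so the YAML output is stable and diffs stay meaningful.
yaml = species_fixtures(ROWS).sort.to_h.to_yaml
puts yaml
# File.write("db/fixtures/species.yml", yaml) # write to disk in a real run
```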
Partial fixture loading
Change is the only constant.
While fixtures sped up my development feedback cycle and helped me see the big picture, they did not serve me well when the big picture changed. After the initial load of the body of research, updates started to happen in the newly minted web application. Things were going largely to plan. Then we found a couple more spreadsheets that had been missing from the initial load. Re-running my scripts easily regenerated the fixture intermediate representation. Unfortunately, fixtures load a whole table at a time, replacing whatever is already in it, so reloading would have wiped out the updates made in the application.
Inserting or updating (upserting) records would be a killer feature if fixtures were intended for this use case, but it simply doesn’t exist, so I had to build my own. The foreign key relationships that fixtures handle so easily also became a hurdle. Without the default loading strategy turning off constraints, the path of least resistance was to index the data structures myself with hashes and insert the records in the correct order.
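The ordering half of that workaround can be sketched as a topological sort: given which tables each table’s foreign keys point at, emit parents before children so every insert finds its parent row even with constraints left on. The table names and the DEPENDENCIES map are hypothetical. For the write step itself, Rails 6 and later offer Model.upsert_all, which could write each table’s rows in this order; it is left out here because it needs a database.

```ruby
# Hypothetical map of each table to the tables its foreign keys point at.
DEPENDENCIES = {
  "observations" => ["species", "sites"],
  "species"      => ["genera"],
  "sites"        => [],
  "genera"       => [],
}

# Depth-first topological sort: a table is emitted only after every table it
# references. A cycle raises, since it would need constraints disabled or a
# two-pass insert instead.
def insert_order(deps)
  order = []
  state = {} # table => :visiting or :done
  visit = lambda do |table|
    case state[table]
    when :done then return
    when :visiting then raise ArgumentError, "cycle at #{table}"
    end
    state[table] = :visiting
    deps.fetch(table, []).each { |parent| visit.call(parent) }
    state[table] = :done
    order << table
  end
  deps.keys.each { |table| visit.call(table) }
  order
end

p insert_order(DEPENDENCIES) # → ["genera", "species", "sites", "observations"]
```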
Without support for partial loading via upserts, I would not use fixtures this way again.
Conclusions
I’ve come to see the differences between factories and fixtures as more a matter of style than a case of one being strictly better than the other. Having a readable, serialized format for loading database records is really useful, even with the limitation of replacing a whole table at a time.
Fixtures get a lot of criticism from the factory crowd, which may be a bit overblown. They are definitely worth a try if you haven’t used them already.