ActiveRecord - Quick Tips

ActiveRecord - Quick Tips

I will use the data from an dump with 100.000 registers

pluck vs. map

Map is useful for arrays, and that is it. Rails select all columns to instantiate an object for each row

Pluck delegates that job to the SQL, which turns the process way much faster and effective.

Benchmark.ms { Purchase.all.map(&:id) }
=> 23967.25999999944

Benchmark.ms { Purchase.pluck(:id) }
=> 425.65599999943515

If you already had an array with the objects, maybe it is better to use the map than touch the database again

uniq

Order makes difference

Benchmark.ms { Purchase.pluck(:id).uniq }
 => 1645.4709999998158

Benchmark.ms { Purchase.uniq.pluck(:id) }
 => 446.65100000020175

At the first case, the pluck returns an array with an id for each row from the Purchases table, the duplicated are then filtered using the Array#uniq

In the second the uniq uses the ActiveRecord::QueryMethods#uniq that filter directly in the SQL with DISTINCT

find_each vs. find_in_batches vs. Each

For iterations over a small number of objects, using each solves it. But when the talk about thousands or millions of objects, using each becomes inviable taking in consideration that it loads all objects to just them iterate over it.

The solution is to “broke” that in blocks

Purchase.where(confirmed: false).each do |purchase|
  purchase.confirm!
end


Purchase.where(confirmed: false).find_in_batches do |group|
  group.each { |purchase| purchase.confirm! }
end

Purchase.where(confirmed: false).find_each do |puchase| 
  purchase.confirm!
end

each will block the row UNTIL gets it done, it can be a trouble if we are talking about a big table or if the table is used in other scenarios as well

Using the find_in_batches, it will make the queries by blocks of 1000 ( default, but can be changed ), in other words it will no longer block the table to other queries

find_each is basically an shortcut for find_in_batches that dont use the scope of the blocks returned

includes

Nice to avoid N+1 queries, since it also load the relation records

Purchase.includes(:person).map { |purchase| purchase.person.id }

joins vs includes

Joins is a good one if you are just filtering through the results, not accessing the records from the relationship directly. ( It dont avoid N+1 queries )

Purchase.joins(:person).where(:people => {confirmed: true}).map { |purchase| purchase.id }

Otherwise, if will access the associations directly its a good one to use includes since it avoids N+1 queries

Purchase.includes(:person).map { |purchase| purchase.person.id }

exists vs any

Benchmark.ms { Purchase.any? }
 => 31.580000009853393

Benchmark.ms { Purchase.exists? }
 => 3.292000008514151

any?: accepts a block and retrieves the records in the relation with it ( calls #to_a, calls the block, hits it with Enumerable#any? ), without a block its the same as !empty? and count the records of the relation

As few examples earlier, the exists? delegate it to the SQL setting a LIMIT 1 and becomes way faster.

Group Counting

Interesting way to get the count based on attributes of the model.

Purchase.group(:signed_contract).count
 => {false=>1527, true=>98484}