updates @ m.blog

Ruby & Bundler: They Took Our Jobs!

Since version 1.4.0, Ruby’s Bundler has allowed you to specify a number of --jobs to run concurrently, speeding up gem dependency installation. But what’s the right number of jobs to set?

The prevailing wisdom comes from these benchmarks by Jeff Dickey, from a year ago when Bundler 1.4.0 was in prerelease. His conclusion was to use N-1 jobs where N is the number of cores on your development machine.

However, since version 1.5.3, Bundler has baked part of this logic in: it automatically uses N-1 jobs, where N is the number requested with the --jobs= argument. So presumably you would want to pass your actual number of cores since that change…

But I was still curious: did this mean physical cores, or logical cores? My main dev machine is a quad-core Core i7 3.4GHz iMac, which has hyperthreading enabled by default. So as far as most applications are concerned, it has 8 logical cores, not 4. The performance implications of hyperthreading are still a bit weird (plus you should never trust any benchmarks you haven’t run in your own environment), so I decided to do some benchmarking of my own.

Benchmarks

All tests were done with ruby 2.1.3p242 and Bundler 1.7.3, installing the dev dependencies for the lolcommits gem over a 35/35Mbps fiber internet connection.

Core i7 3.4GHz iMac (quad core, with hyperthreading), 12GB RAM, SSD.

| jobs | cpu  | time (s) |
|:-----|-----:|---------:|
| 1    |  53% |  47.3 |
| 2    |  54% |  46.7 |
| 3    |  86% |  29.9 |
| 4    |  89% |  28.9 |
| 5    |  99% |  26.7 |
| 6    | 105% |  25.3 |
| 7    | 117% |  23.4 |
| 8    | 122% |  23.2 |
| 9    | 117% |  24.4 |
| 10   | 125% |  22.9 |
| 12   | 125% |  22.8 |
| 20   | 134% |  23.0 |
| 40   | 149% |  22.5 |

For comparison, here is a much lower-end system: a Core 2 Duo 2.1GHz MacBook Air (dual core), 4GB RAM, SSD.

| jobs | cpu  | time (s) |
|:-----|-----:|---------:|
| 1    |  64% |  79.9 |
| 2    |  68% |  75.6 |
| 3    |  98% |  53.0 |
| 4    | 102% |  50.8 |
| 5    | 106% |  49.4 |
| 6    | 104% |  50.6 |
| 7    | 109% |  48.6 |
| 8    | 123% |  46.7 |
| 9    | 108% |  49.2 |
| 10   | 110% |  48.3 |
| 12   | 112% |  47.6 |
| 20   | 113% |  48.7 |
| 40   | 116% |  49.3 |

Discussion

It appears things have changed in the world of Bundler 1.7.3. Some conclusions I draw from my results:

  1. Bundler jobs are IO bound, not CPU bound. Even with --jobs=40 on my iMac, CPU utilization never exceeded 150%; if we were efficiently using all the CPUs, utilization would have approached 400%.
  2. There appears to be no real speed penalty to going too high with the number of jobs.
  3. N-1 cores does not appear to be the ideal way to set the number of jobs in Bundler. For my MacBook Air, that would mean a single job, yet it kept getting noticeably faster up to 4 jobs (despite being only dual-core).
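
To put numbers on these conclusions, here is a small sketch that computes the speedup over the single-job baseline from the times in the tables above. The helper name speedup is hypothetical, and it assumes awk is available.

```shell
# Speedup of a run relative to the single-job baseline, to two decimals.
# Arguments: baseline time, then the time being compared.
speedup() {
  awk -v base="$1" -v t="$2" 'BEGIN { printf "%.2f\n", base / t }'
}

speedup 47.3 23.2   # iMac: 1 job vs 8 jobs, prints 2.04
speedup 79.9 50.8   # MacBook Air: 1 job vs 4 jobs, prints 1.57
```

Roughly a 2x gain on the quad-core machine and 1.6x on the dual-core one, both well short of what CPU-bound work would get from the same number of jobs.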

Note that more benchmarks should be done to confirm these results. I invite others to try to replicate them on their own hardware. In particular, I’d like to see a range of different gem dependencies tested, and for someone without an SSD to test so we can see the impact of lower IO speed.

In the meantime though, for my hardware at least, my recommendation is actually to set the number of jobs to N*2, where N is the number of physical CPU cores available to the workstation. On my iMac, this would equal 8, whereas on my MacBook Air, it would equal 4. This formula seems to get all the benefits of the added concurrency without going too crazy.
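
A minimal sketch of that rule of thumb in shell, assuming Darwin's sysctl is available. The helper names recommended_jobs and physical_cores are hypothetical, and the fallback of 1 core on other systems is only a placeholder, not a real detection method.

```shell
# Double a physical core count: 4 cores -> 8 jobs, 2 cores -> 4 jobs.
recommended_jobs() {
  echo $(( $1 * 2 ))
}

# Physical core count from Darwin's hw.physicalcpu.
# Assumption: fall back to 1 when the key is unavailable (e.g. not on a Mac).
physical_cores() {
  n=$(sysctl -n hw.physicalcpu 2>/dev/null) || n=""
  echo "${n:-1}"
}

# Only print the suggested command; nothing is installed or configured here.
echo "bundle install --jobs=$(recommended_jobs "$(physical_cores)")"
```

On the iMac above, where hw.physicalcpu is 4, this would suggest --jobs=8.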

TL;DR more jobs seem to be better. Don’t let them take your jobs.

Appendix

Determining number of CPU cores programmatically

This is especially pertinent because most scripts that set this automatically use the number of logical CPUs rather than physical CPU cores. The common command on Darwin, which scripts such as Boxen use, is sysctl hw.ncpu. Thankfully, there are actually many variables we can use to determine the actual CPU count on a Mac:

$ sysctl hw | grep cpu | grep -v frequency | grep -v type | grep -v family
hw.ncpu: 8
hw.activecpu: 8
hw.physicalcpu: 4
hw.physicalcpu_max: 4
hw.logicalcpu: 8
hw.logicalcpu_max: 8
hw.cpu64bit_capable: 1
hw.ncpu = 8
hw.availcpu = 8

The _max variants (e.g. logicalcpu vs. logicalcpu_max) are there to support hardware that steps down the number of active cores when in power save mode (e.g. a portable running on battery). A delta between those would be visible in that situation.
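
As a rough sketch of how a script might use these variables, the following compares logical to physical CPUs and reports any delta against the _max value. The helper names read_hw, threads_per_core and stepped_down are hypothetical, and printing 0 for a missing key is an assumption made so the script still runs off a Mac.

```shell
# Read one integer sysctl value by key.
# Assumption: print 0 if the key is unavailable (e.g. on a non-Darwin system).
read_hw() {
  v=$(sysctl -n "$1" 2>/dev/null) || v=""
  echo "${v:-0}"
}

# Logical CPUs per physical core; 2 suggests hyperthreading is enabled.
# Arguments: logical count, physical count.
threads_per_core() {
  if [ "$2" -gt 0 ]; then
    echo $(( $1 / $2 ))
  else
    echo 0
  fi
}

# Number of CPUs currently stepped down: the _max value minus the live value.
stepped_down() {
  echo $(( $1 - $2 ))
}

physical=$(read_hw hw.physicalcpu)
logical=$(read_hw hw.logicalcpu)
logical_max=$(read_hw hw.logicalcpu_max)

echo "physical=$physical logical=$logical threads/core=$(threads_per_core "$logical" "$physical")"
echo "logical CPUs stepped down: $(stepped_down "$logical_max" "$logical")"
```

With the iMac values shown above (4 physical, 8 logical, 8 logical max), this reports 2 threads per core and no stepped-down CPUs.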

Run your own tests

For my purposes, I tested via the following command in my project directory:

for j in {1..8}; do rm -rf .bundle; time bundle install --quiet --jobs=$j; done

Note that I have Bundler configured to install in the local directory; you will want to modify the remove command to reflect your own install path, which you can determine from bundle config path.
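
If you want the install path to be a parameter rather than hard-coded, something like the following sketch might work. The names time_run, bench and INSTALL_PATH are all hypothetical, and the timing uses date +%s, so it only has whole-second resolution (coarser than the numbers in the tables above).

```shell
# Run a command quietly and print "<label> <elapsed seconds>".
# Whole-second resolution only, since it relies on date +%s.
time_run() {
  label=$1
  shift
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo "$label $(( end - start ))"
}

# Benchmark one fresh install per job count given as an argument.
# INSTALL_PATH (hypothetical) must be set to your own gem install directory,
# e.g. the one reported by `bundle config path`; the run aborts if it is unset.
bench() {
  for j in "$@"; do
    rm -rf "${INSTALL_PATH:?set INSTALL_PATH to your bundle install path}"
    time_run "$j" bundle install --quiet --jobs="$j"
  done
}
```

After sourcing this in your project directory, a call such as INSTALL_PATH=.bundle bench 1 2 4 8 would print one "jobs seconds" line per run.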
