Reliable Ruby timeouts with System Timer:
Do not blindly trust timeout.rb

First published: May 2008
Last update: July 2009

timeout.rb, the library used by Ruby to ensure timeouts, is not 100% reliable. In particular, it is guaranteed not to work in Ruby M.R.I. [1] 1.8 when you are issuing system calls that can cause your Ruby process to sleep for a period of time that is longer than your timeout.

Update: system_timer is only relevant if you are running Ruby 1.8. You do not need it if you are running Ruby 1.9, JRuby, Rubinius or MacRuby. Actually, using system_timer with one of these Ruby interpreters would not even make sense since, as explained in this article, system_timer is designed to work around a fundamental limitation of the threading implementation in Ruby M.R.I. 1.8 (green threads). All other Ruby interpreters use native threads, and timeout.rb should work as expected as long as the global interpreter lock is released.

1. Background

In one ThoughtWorks project, while I was qualifying a large Ruby on Rails application for production, I realized that every now and then, some of our Rails instances would hang for a very, very long time when calling some web services. This surprised me and my teammates since we had configured a 20-second timeout for these web service calls and had even devised a fallback strategy. Clearly the system was not timing out as configured, and this had a direct impact on our system stability and scalability.

It turned out that timeout.rb, the library used by Ruby to ensure timeouts, is not 100% reliable. In particular, it is guaranteed not to work in M.R.I. 1.8 when you are issuing system calls that can cause your Ruby process to sleep for a period of time that is longer than your timeout.

As a consequence, David Vollbracht and I implemented an alternative timeout library, System Timer, which provides guaranteed timeouts for Ruby applications – even when these applications cross boundaries and go down to the system level. We based the implementation on UNIX signals, following up on an original idea by Kurtis Seebaldt.

Please also note that System Timer is not a drop-in replacement for timeout.rb, nor was it designed to be. Most of the time timeout.rb is just good enough, and relying on UNIX signals for all your timeouts would be overkill. Nevertheless, System Timer is a must for stable Ruby on Rails applications and a great fit when you need guaranteed timeouts on access to external resources that cross Ruby boundaries: web services, database calls, …

2. Why Use System Timer

2.1. Timeouts Keep Ruby on Rails Healthy

Making sure that your application will not hang during access to external resources is always a good thing, but for Ruby on Rails applications this is pretty much a question of life or death… at least for your Mongrel cluster.

Ruby on Rails is not thread-safe, so a Rails application is typically deployed as a collection of Mongrel processes. Each Mongrel server can only serve one Rails request at a time, but good concurrency is achieved by distributing requests across multiple Mongrel processes. Your average enterprise-grade Rails application typically runs 30 to 100 Mongrel servers. The catch is that if for some reason your application hangs while processing a Rails request, this server will be unable to process any more Rails requests. So if you are deploying your Rails application with 50 Mongrel servers, it will only take 50 requests that hang or are extremely slow to bring your entire application down… and this state of being may ignite a furious debate between bloggers on the scalability of Ruby on Rails.

The Rails community is well aware of the importance of not letting request processing run for too long, and typically, it addresses this problem by combining complementary techniques:

2.2. In Some Cases the Ruby Timeout Library Does Not Quite Work

As seasoned Rails developers, my ThoughtWorks teammates and I used all of the above-mentioned techniques on my last project. In particular, we set pretty aggressive timeouts on some of our web service calls: these calls provided an enhanced user experience but were not critical to the operation at hand. We had fallback strategies for these calls and thought that we would provide a snappy user experience with code like:

factory = SOAP::WSDLDriverFactory.new("A-Scary-WSDL.wsdl")
returning(factory.create_rpc_driver) do |driver|
   ...
   driver.options["protocol.http.connect_timeout"] = 5.seconds
   driver.options["protocol.http.send_timeout"] = 5.seconds
   driver.options["protocol.http.receive_timeout"] = 5.seconds
   ...
end

And, to be honest, we were feeling pretty good about our fallback strategies, and we were even confident that our application was ready to gracefully handle our customer’s ambitious user load. Nevertheless, our load-testing environment revealed that, unexpectedly, Rails was hanging on some of these web-service calls. This was quite a setback. We already knew that these (external) web services were not extremely reliable, but we were counting on our in-place timeouts and fallback strategy to handle that. Even worse, pretending nothing happened and relying on monitoring tools to cover up the mess by automatically restarting the servers was not really an option (assuming that we were ever tempted) since:

Moreover, we had timeouts specifically configured for these calls, so the system was clearly not behaving as expected. During a little troubleshooting session to figure out what was happening, we discovered that Rails was hanging on a system call, which was somewhat expected, but the timeout was never kicking in. So we started to take a closer look at how timeouts were implemented in Ruby.

It turned out that the soap4r library we were using for our web services calls relied on net/http to enforce the configured timeouts. In turn net/http, like pretty much all Ruby libraries, relied on timeout.rb for its timeout behavior with code like:

s = timeout(@open_timeout) { TCPSocket.open(conn_address(), conn_port()) }

Our investigation was converging on timeout.rb. It was time to examine its implementation in more depth:

# From lib/timeout.rb

def timeout(sec, exception=Error)
  return yield if sec == nil or sec.zero?
  raise ThreadError, "timeout within critical session" if Thread.critical
  begin
    x = Thread.current
    y = Thread.start {
      sleep sec
      x.raise exception, "execution expired" if x.alive?
    }
    yield sec
    #    return true
  ensure
    y.kill if y and y.alive?
  end
end

For the purpose of this article, you can ignore the first 2 lines of the timeout implementation: They just deal with corner cases such as when no timeout is specified or when the timeout method is used in a critical section (which prevents more than one thread from running). The interesting part here is that timeouts are implemented by starting a new thread (y) – which I often call the “homicidal thread” – which will sleep for the duration of the timeout. At this point we have a race between the current thread (x) to do its business and the “homicidal thread”.

The key information here is that for timeout.rb to work, a freshly created Ruby thread has to be scheduled [3] by the Ruby interpreter.
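To make the race concrete, here is a minimal sketch of the same thread-based technique, written for a current Ruby interpreter. The names `naive_timeout` and `SketchTimeout` are made up for this illustration and are not part of timeout.rb. Note that the slow case below only times out because `sleep` lets the interpreter schedule the timer thread, which is precisely the assumption examined in the next section.

```ruby
# Illustration only: hypothetical names, simplified from the idea in timeout.rb.
class SketchTimeout < StandardError; end

def naive_timeout(seconds)
  worker = Thread.current
  # The "homicidal thread": sleeps for the timeout, then interrupts the worker.
  killer = Thread.new do
    sleep seconds
    worker.raise SketchTimeout, 'execution expired' if worker.alive?
  end
  yield
ensure
  # The worker finished (or was interrupted): the timer thread is no longer needed.
  killer.kill if killer && killer.alive?
end

# The worker wins the race: the block returns long before the timer fires.
fast = naive_timeout(1) { :done }

# The timer thread wins the race: the block is interrupted after 0.1 second.
slow = begin
  naive_timeout(0.1) do
    sleep 2
    :done
  end
rescue SketchTimeout
  :timed_out
end
```

Everything here depends on the timer thread actually getting a chance to run while the worker is busy.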

2.3. Green Threads Are Not Guaranteed to be Scheduled

At this point, the problem was quite clear. When processing Rails requests, Mongrel uses Ruby’s threading system. Ruby threads can be implemented in very different ways by the various Ruby interpreters. But in practice, we almost always run Rails applications in production using M.R.I. 1.8, the “official” Ruby interpreter, which implements Ruby threads as green threads [4].

[Figure: Ruby Thread Scheduler]

The purpose of this article is not to debate the pros and cons of using green versus native threads. What really matters in our context is that, unfortunately, it is a well-known limitation of green threads that when a green thread performs a blocking system call to the underlying operating system, none of the green threads in the virtual machine will run until the system call returns. This is quite intuitive once you realize that all Ruby green threads run on top of a single native thread and the operating system will not schedule this native thread to run until the blocking, synchronous system call completes. From the operating system perspective, there is only a single thread in the Ruby interpreter (the native one), and there is no point scheduling this process until the system call completes.


In fact, to be fair, I should mention that, when Matz implemented Ruby threads, he was well aware of this drawback of green threads. As a consequence, whenever possible, the M.R.I. 1.8 implementation actually goes out of its way to prevent this problem from getting triggered. For I/O in particular, while the interpreter exposes a synchronous API to Ruby programmers, it actually uses non-blocking I/O internally for its system calls. In this way the Ruby interpreter can still schedule other green threads while the I/O operation is in progress. So for some system operations at least, the M.R.I. interpreter tries hard to avoid starving all other threads. Nevertheless, it cannot achieve this goal for all potential system calls that could be triggered. In particular, initiating a network connection, or resolving a host name through a broken or slow DNS server, will typically block the whole Ruby process until the call completes.
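The non-blocking trick can be sketched in plain Ruby. This is only an illustration of the idea, not interpreter code: `patient_read` is a made-up helper, and the block passed to it stands in for "let the scheduler run another green thread".

```ruby
# Illustration only: `patient_read` is a hypothetical helper.
def patient_read(io, max_bytes, poll_interval)
  loop do
    begin
      # Never blocks: either returns data or raises IO::WaitReadable.
      return io.read_nonblock(max_bytes)
    rescue IO::WaitReadable
      # No data yet. A green-thread scheduler would switch to another
      # thread here; this sketch simply hands control to the caller's block.
      yield if block_given?
      # Wait until the descriptor is readable, but never longer than poll_interval.
      IO.select([io], nil, nil, poll_interval)
    end
  end
end

reader, writer = IO.pipe

# A producer that only delivers its data after a short delay.
producer = Thread.new do
  sleep 0.2
  writer.write('hello')
  writer.close
end

other_work_runs = 0
data = patient_read(reader, 16, 0.05) { other_work_runs += 1 }

producer.join
reader.close
```

The caller sees an ordinary read that eventually returns the data, yet control came back to the "scheduler" at least once while waiting. A call that cannot be expressed this way, such as a blocking connect or name lookup, offers no such opportunity.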

2.4. Putting it All Together

At this point everything should make sense:

This is why timeout.rb does not provide guaranteed timeouts and why our Rails applications were hanging for so long on these web service calls.

Now that we understood what the problem was, it was time to engineer a solution. So David Vollbracht and I pair-programmed an alternative timeout implementation for M.R.I. 1.8, in the form of a native gem: system_timer.

3. How to Use system_timer

System Timer works around the green thread limitations by basing its implementation on underlying operating system mechanisms, not the Ruby threading system.

A traditional UNIX operating system already offers a native way to set up timers and interrupt processes, one which is not prone to the problem of blocking, synchronous system calls: UNIX signals. So we based the System Timer implementation on SIGALRM. In case you are wondering why we did not use SIGVTALRM, the M.R.I. 1.8 thread scheduler already makes extensive use of it.
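The general shape of a signal-based timeout can be sketched in pure Ruby. This is not the system_timer implementation: the names `alarm_sketch_timeout` and `AlarmSketchTimeout` are hypothetical, and a forked helper process stands in for the operating system timer that a native extension could arm directly. The sketch assumes a UNIX-like system where `fork` is available.

```ruby
# Illustration only: hypothetical names, not the system_timer API.
class AlarmSketchTimeout < StandardError; end

def alarm_sketch_timeout(seconds)
  # Install our SIGALRM handler and remember whatever was installed before.
  previous = Signal.trap('ALRM') { raise AlarmSketchTimeout, 'time is up' }

  # Assumption of this sketch: a helper process plays the role of the OS
  # timer. It lives outside the Ruby thread scheduler and delivers SIGALRM
  # to the parent once the delay has elapsed.
  timer_pid = fork do
    sleep seconds
    Process.kill('ALRM', Process.ppid)
    exit!(0) # skip at_exit handlers inherited from the parent
  end

  begin
    yield
  ensure
    begin
      Process.kill('KILL', timer_pid) # cancel the pending "timer"
    rescue Errno::ESRCH
      # The helper is already gone; nothing to cancel.
    end
    Process.wait(timer_pid)
    # Put the previous SIGALRM handler back in place.
    Signal.trap('ALRM', previous || 'DEFAULT')
  end
end

# The block finishes in time: its value is returned untouched.
quick = alarm_sketch_timeout(5) { :quick }

# The block overruns: the signal handler interrupts it.
slow = begin
  alarm_sketch_timeout(0.2) do
    sleep 5
    :finished
  end
rescue AlarmSketchTimeout
  :timed_out
end
```

The important property is that the interruption comes from outside the interpreter's own thread scheduling, so it does not depend on a Ruby timer thread being given a chance to run.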

OK, enough theory. What if you actually want to use System Timer on your project? How do you install it? How do you plug it in?

3.1. Installing System Timer

System Timer is just a standard Ruby gem hosted on Rubyforge. You can install it in the usual way with:

sudo gem install system_timer

You can even install it on Windows:

gem install system_timer

Nevertheless, the Windows gem is nothing but a “placebo” implementation using timeout.rb under the covers. Obviously, this will not fix any production problem (but you do not deploy Ruby applications on Windows, do you?). The placebo Windows gem is nevertheless very useful when you share a codebase relying on System Timer with teammates who run the application on Windows.

3.2. Securing External Resource Access with System Timer

The System Timer API is extremely similar to the one in timeout.rb:

require 'system_timer'

# 30.seconds relies on ActiveSupport (Rails); outside Rails, pass a plain number such as 30.
SystemTimer.timeout_after(30.seconds) do
  do_something_that_could_take_a_long_time
end

In fact, it is so similar to timeout.rb that you can even use the exact same API (if you want to use it as a drop-in replacement for timeout.rb):

require 'system_timer'

SystemTimer.timeout(30.seconds) do
  do_something_that_could_take_a_long_time
end

In this way your application or library can leverage System Timer when available, but gracefully fall back to timeout.rb otherwise:

begin
  require 'system_timer'
  MyTimer = SystemTimer
rescue LoadError
  require 'timeout'
  MyTimer = Timeout
end

# ...

MyTimer.timeout(30.seconds) do
  do_something_that_could_take_a_long_time
end

When you do not want to use System Timer as a drop-in replacement for timeout.rb, you can conveniently wrap all calls to an object with timing constraints:

class TimeBoundProxy

  def initialize(target, timeout)
    @target = target
    @timeout = timeout
  end

  def method_missing(a_method, *args, &block)
    SystemTimer.timeout_after(@timeout) do
      @target.send a_method, *args, &block
    end
  end

end

And then use this TimeBoundProxy to wrap calls on objects crossing system boundaries (e.g. a SOAP driver):

def create_time_bound_driver
  TimeBoundProxy.new create_rpc_driver, timeout_in_seconds
end
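To try the proxy idea without installing the gem, here is a self-contained sketch. It is an assumption-laden variation: it uses the standard library `Timeout.timeout` in place of `SystemTimer.timeout_after`, so it inherits the limitations of timeout.rb discussed above, and `SketchTimeBoundProxy` and `FakeSlowService` are hypothetical names introduced only for this example.

```ruby
require 'timeout'

# Illustration only: same shape as TimeBoundProxy, but built on the stdlib
# Timeout module so that it runs without the system_timer gem.
class SketchTimeBoundProxy
  def initialize(target, timeout)
    @target = target
    @timeout = timeout
  end

  def method_missing(a_method, *args, &block)
    return super unless @target.respond_to?(a_method)
    Timeout.timeout(@timeout) do
      @target.public_send(a_method, *args, &block)
    end
  end

  # Keep respond_to? consistent with what method_missing forwards.
  def respond_to_missing?(a_method, include_private = false)
    @target.respond_to?(a_method, include_private) || super
  end
end

# Hypothetical stand-in for an object crossing a system boundary.
class FakeSlowService
  def quick_call
    :ok
  end

  def slow_call
    sleep 2
    :ok
  end
end

proxy = SketchTimeBoundProxy.new(FakeSlowService.new, 0.2)

quick = proxy.quick_call

# A fallback strategy: recover with a default value when the call times out.
slow = begin
  proxy.slow_call
rescue Timeout::Error
  :fallback
end
```

The caller keeps using the wrapped object as before, and the timeout policy lives in one place.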



4. When to Use System Timer

Note that System Timer is not intended to be a drop-in replacement for timeout.rb. timeout.rb is good enough if you are not crossing boundaries (read this as not reaching for the operating system). Besides, using UNIX signals for all your timeouts would be overkill and the SIGALRM timer is shared between all the applications running on the same system. MySQL, in particular, also relies on SIGALRM. System Timer has been designed to preserve and restore any existing SIGALRM signal handler, and it plays nice with MySQL for instance (we have been using System Timer in production since January 2008). Nevertheless, if you were to define very long timeouts (say 15 minutes), you could potentially interfere with other applications running on the same box.

Nevertheless, System Timer is a must for stable Ruby on Rails applications and a great fit when you need guaranteed timeouts on access to external resources that cross Ruby boundaries: web services, database calls, etc.

5. References


  1. Matz’ Ruby Interpreter.

  2. Check all the options

  3. Operating systems achieve the grand illusion of multiple processes/threads running simultaneously by switching from one process/thread to another in a very short time frame. The scheduler is the part of the O.S. controlling when to switch and which process/thread to choose to run for the next time frame. So a thread is said to be scheduled when it is chosen by the scheduler to run. If a thread is never scheduled, it will never run.

  4. A green thread is a thread that is managed in user space instead of natively by the underlying operating system. This means that the Ruby interpreter manages multi-threading internally and implements its own thread scheduler. The Ruby interpreter creates and schedules the threads itself, and the underlying OS sees a Ruby interpreter as a single native thread.
