Introducing exq-scheduler, a distributed scheduler for Sidekiq
by rahul

One of our customers wanted to setup a time based job scheduling system, similar to cron, to reliably schedule critical tasks. They were already using sidekiq, to run some tasks in the background.

To improve fault tolerance, we tried running sidekiq-schedular (a popular scheduling library) in a distributed setup. But we found synchronization issues1 that could lead to jobs being scheduled multiple times under some scenarios.

In our effort to fix these issues, we built exq-scheduler.

We had to understand some finer aspects of distributed systems, to get a better understanding of how processes synchronize using shared memory. We plan to write about some of these understandings in subsequent notes.

About Sidekiq and Exq

Sidekiq is a job processing library. It has two components, a client and worker. It uses a redis LIST as storage. A job instruction is a json object complying with a schema. A sidekiq client creates a job instruction in the specified schema & pushes it to a queue (redis LIST). A sidekiq worker listening to the queue receives this instruction and performs a related task.

client and worker can be written in any language, as long as they work with the same schema, as implemented by sidekiq. Exq is an elixir library, which complies with the same storage schema. In other words, you could use sidekiq client with exq worker, or visa versa, or even use exq for both client and worker.

exq-scheduler is a sidekiq client that enqueues job instructions as per a time schedule.

These time schedules can be configured using cron syntax as follows.

config :exq_scheduler, :schedules,
  signup_report: %{
    cron: "0 * * * *",
    class: "SignUpReportWorker",
    args: ["arg1", "arg2"],
    queue: "default"
  }

Features

Here are some of exq-scheduler’s salient features. We might expand on implementation details in future posts.

  • Run multiple schedulers while avoiding duplicate job instructions.
    We do this by building a key deterministically using execution time and pushing to the queue in a transaction.

  • Redis sentinel support.
    This allows improving availability of shared redis instance.

  • Schedule missed jobs.
    If scheduler experiences down time for some reason (node restarted etc.), jobs missed in the last 1 hour are scheduled by default. The interval can be configured.

  • Define schedules for a timezone.
    eg:
    config :exq_scheduler,
    missed_jobs_window: 60 * 60 * 1000,
    time_zone: "Asia/Kolkata"
    
  • API consistent with sidekiq and sidekiq-scheduler.
    This enables any compatible job runner to pick up instructions. Also allows use of sidekiq and sidekiq-scheduler’s web UI.

  • Deploy with existing Exq workers without major changes in deployment setup.

Stability

exq-scheduler is currently being used in production with one of our customers. The library is backed by tests which introduces various faults and verifies that invariants are always maintained.

  1. Issues were caused by lack of determinism when building a unique key for scheduled tasks and not handling edge cases, which account for failure of scheduler, after acquiring a lock. https://github.com/moove-it/sidekiq-scheduler/issues/156, https://github.com/moove-it/sidekiq-scheduler/issues/181 

  |   A primer on Elixir Stream ⇒

About ActiveSphere

We help businesses build and scale complex web-apps, visualizations and high-throughput messaging systems. Check out some of our other open source projects or our work portfolio if it interests you.