A Practical Guide to Parallel Programming in Crystal (2025)

This article is based on content that kojix2 (a human) created by alternately querying DeepWiki and ChatGPT, but kojix2 has reviewed, edited, and proofread the entire text. The article was translated from Japanese to English using Claude. If you find any mistakes, please leave a comment. Thank you.

Crystal’s parallel processing is based on a hybrid model that primarily uses Fiber (cooperative and lightweight) and utilizes Thread (OS threads) when necessary.

ExecutionContext, which has been rapidly developed since around 2024-2025, provides a new abstraction layer for safely spreading Fibers across multiple threads.

This article organizes the latest parallel execution model in Crystal.

Building with Parallel Execution Enabled

As of November 19, 2025, you need to use the following two flags:

  • -Dpreview_mt: Enables parallel execution of Fibers
  • -Dexecution_context: Enables the use of ExecutionContext

crystal build -Dpreview_mt -Dexecution_context program.cr

While Crystal’s parallel execution is still labeled a preview, more than six years have passed since it was introduced, and it works without issues in many cases.

Overview of Crystal’s Concurrency and Parallelism

Crystal has five major execution models:

Model | Execution Unit | Characteristics
Fiber (default) | Fiber (lightweight thread) | Cooperative; switches automatically on I/O; lightweight
ExecutionContext::Concurrent | Fiber group | Sequential execution on one thread (concurrent)
ExecutionContext::Parallel | Fiber group | Execution on multiple threads (parallel)
ExecutionContext::Isolated | One Fiber + one dedicated thread | For GUI loops and blocking FFI calls
Thread | OS thread | For low-level operations

The standard design is as follows:

  • Use Fiber as the basis
  • Use ExecutionContext only where parallelism is needed

Cooperative Scheduling of Fiber and I/O

Fiber is the cooperative execution model Crystal has offered for years. By default (when parallel execution is disabled), a Fiber switches only when one of the following is triggered:

  • I/O
  • sleep
  • Channel receive/send
  • Fiber.yield

(At each of these points, Fiber.suspend is called and the Fiber is suspended.)

The basic approach in Crystal is to put I/O-bound processing on Fibers.

Each Fiber has its own stack memory. The stack has a virtual size of 8MiB, but it’s only reserved, and actual memory usage starts from 4KiB.
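As a rough illustration of how cheap this makes Fibers, the sketch below spawns ten thousand of them; since only the first 4 KiB page of each stack is committed up front, the physical memory footprint stays small (the count and channel usage are arbitrary choices for this example):

```crystal
# Spawn many fibers; each reserves 8 MiB of virtual address space
# but commits only 4 KiB of physical memory initially.
done = Channel(Nil).new

10_000.times do
  spawn do
    Fiber.yield # a cooperative switching point
    done.send(nil)
  end
end

# Receive one message per fiber, so we know all of them ran.
10_000.times { done.receive }
puts "all 10000 fibers finished"
```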

What is a “Stack” in Crystal?

When reading Crystal documentation, you’ll encounter the word “stack.” Note that this differs from the general meaning of “stack” – it refers to a “memory region that behaves like a stack,” which is actually memory allocated from the OS heap.

What is placed on the stack:

  • Value types: Struct, Tuple, StaticArray, etc.
  • Primitive types: Int32, Float64, Bool, Char, etc.
  • Pointers to reference types: Array, Hash, etc. (The reference type objects themselves are placed on the heap, but the pointers to them are placed on the stack)

Values placed on the stack are not directly targeted by GC, but they are scanned during GC execution to prevent heap objects referenced by stack variables from being mistakenly collected.

As described later, the key point is that when captured by a closure such as a spawn do ... end block, the value types above are exceptionally placed on the heap and become accessible from other threads.
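A minimal sketch of that rule: point below is a value type (a Struct) that would normally live on the stack, but because the spawn block captures it, the compiler moves it into a heap-allocated closure (the Point struct is made up for illustration):

```crystal
# Point is a plain value type; normally it would live on the stack.
struct Point
  getter x : Int32, y : Int32

  def initialize(@x, @y)
  end
end

point = Point.new(3, 4)
ch = Channel(Int32).new

# Capturing `point` in the spawn block forces it into a
# heap-allocated closure, so the fiber still sees a valid
# value even after this scope's stack frame is gone.
spawn do
  ch.send(point.x * point.x + point.y * point.y)
end

puts ch.receive # => 25
```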

Background Knowledge: Thread / Scheduler / Fiber

In Crystal, each thread has its own Crystal::Scheduler that manages the fibers to be executed.

Main Thread Creation and Initialization

The main thread is automatically created by the OS when the program starts. Subsequently, when Thread.current is called, a Thread object for the main thread is created. The stack address of the main thread is obtained with the stack_address method. This is the actual thread stack allocated by the OS when the process starts.

Main Fiber Creation

When the Thread object is initialized, the main Fiber is created simultaneously. The main Fiber uses a special constructor Fiber.new(stack : Void*, thread) to utilize the OS thread stack. Unlike normal Fibers, makecontext is not called, and it uses the already running context.

Lazy Initialization of Scheduler

The main thread’s scheduler is initialized when Thread#scheduler is called. The scheduler has:

  • @event_loop: Platform-specific event loop
  • @stack_pool: Fiber stack reuse pool
  • @runnable: Queue of runnable fibers
  • @main: Thread’s main fiber

Default Thread Configuration

Without using ExecutionContext and preview_mt, only the main thread exists. The main thread has its own Crystal::Scheduler instance, which manages all fibers.

Stack Allocation for New Fibers

When a new Fiber is created, stack memory is obtained from Fiber::StackPool. When a Fiber terminates, its stack is returned to the pool through StackPool.release for reuse by the next Fiber. Stack allocation reserves 8MiB of virtual address space. Only the bottom page of the stack (4KiB) is committed to physical memory. When the stack grows and reaches a guard page, that page’s guard status is removed and a new guard page is committed. This continues until reserved pages run out.

Parallel Execution with ExecutionContext

ExecutionContext is a “virtual thread group” that executes Fibers together.

ExecutionContext::Concurrent

This is the same concurrent execution as traditional Fibers. It’s safe and easy to handle.

ctx = Fiber::ExecutionContext::Concurrent.new("workers")

  • Only one Fiber executes at a time within the context
  • Therefore, access contention to shared variables doesn’t occur (though using Mutex/Atomic is still recommended as a precaution)

Suitable when parallelization is unnecessary but you want to use Fibers.
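A minimal usage sketch (requires building with -Dpreview_mt -Dexecution_context; the context name "workers" and the counts are arbitrary):

```crystal
ctx = Fiber::ExecutionContext::Concurrent.new("workers")
results = [] of Int32
done = Channel(Nil).new

# All three fibers run in the same context, one at a time,
# so pushing to the shared array here does not race.
3.times do |i|
  ctx.spawn do
    results << i * 10
    done.send(nil)
  end
end

3.times { done.receive }
p results.sort # => [0, 10, 20]
```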

ExecutionContext::Parallel

Parallel execution on multiple threads.

ctx = Fiber::ExecutionContext::Parallel.new("workers", 8)

Changing parallel size during execution:

ctx.resize(count)

  • Each thread runs its own scheduler
    • The scheduler is an instance of the Fiber::ExecutionContext::Parallel::Scheduler class, responsible for executing individual Fibers. It has a local queue and manages runnable Fibers. It searches for and executes Fibers in the main loop (run_loop).
  • Fibers within the context are moved to and executed on arbitrary threads
    • When a Fiber moves between threads, only the execution context (registers and stack pointer) actually moves. The Fiber’s stack memory (heap from the OS perspective) does not move. This memory region is fixed during the Fiber’s lifetime. When a Fiber resumes on a new thread, the saved stack pointer is loaded and points to the original stack memory region.
  • Due to parallelism, Atomic / Mutex is mandatory for shared mutable state.
    • Local variables and instance variables (pointers) captured from the closure that spawns the Fiber are placed in a closure data structure allocated on the heap, and that pointer moves with the Fiber. This means that value type local variables (like StaticArray) that would normally be allocated on the stack are exceptionally allocated on the heap.

Parallel is the central feature of Crystal’s goal of “safe and fast parallel execution.”

ExecutionContext::Isolated

1 Fiber = 1 dedicated thread

gui = Fiber::ExecutionContext::Isolated.new("GUI") do
  Gtk.main
end
gui.wait

  • A single Fiber monopolizes an OS thread
  • Safe to use blocking I/O (e.g., GUI event loops, blocking FFI calls)
  • Cannot add additional spawns within the context (they are forced to go to the default context)

Suitable for the main loops of GUI applications and for FFI calls to C functions that block on I/O.

Default Fiber Without Using ExecutionContext

When ExecutionContext is not specified, Fibers execute in the default ExecutionContext (Fiber::ExecutionContext.default). The default ExecutionContext is Parallel, but since the initial parallelism is set to 1, it behaves the same as Concurrent.

Fiber::ExecutionContext.default.size # => 1

Basic Patterns of Channel and WaitGroup

Crystal’s parallel processing is based on a Channel + WaitGroup pattern similar to Go.

Producer-Consumer (Parallel)

consumers = Fiber::ExecutionContext::Parallel.new("consumers", 8)
channel    = Channel(Int32).new(64)
wg         = WaitGroup.new(32)
result     = Atomic.new(0)

32.times do
  consumers.spawn do
    while value = channel.receive?
      result.add(value)
    end
  ensure
    wg.done
  end
end

1024.times { |i| channel.send(i) }
channel.close
wg.wait

p result.get  # => 523776

  • Communication via Channel
  • Synchronization via WaitGroup
  • Safe updates of shared state via Atomic

This is the basic form of parallel execution in Crystal.

Here, 32 consumer Fibers running in parallel receive 1024 integer values (0 to 1023) from the channel, add them atomically, and compute their sum (523776).

Protection of Shared Variables in Concurrent

Concurrent is serial execution so contention doesn’t occur, but Crystal officially states that using Atomic / Mutex is preferable.

Atomic / Mutex / SpinLock

Atomic

A variable whose value can be read and written safely even when accessed simultaneously from multiple threads; a basic synchronization primitive for preventing race conditions.

  • Directly mapped to LLVM atomic instructions
  • compare_and_set, add, sub, get, set
  • Same memory orders as C/C++: Acquire / Release / Relaxed, etc.

Types that cannot be used with Atomic include value types such as structures (Struct) and StaticArray.
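A short sketch of the operations listed above:

```crystal
counter = Atomic(Int32).new(0)

counter.add(5) # atomic fetch-and-add; returns the previous value
counter.sub(2)
puts counter.get # => 3

# compare_and_set stores the new value only if the current value
# matches the expected one; it returns {old_value, success}.
old, success = counter.compare_and_set(3, 100)
puts success     # => true
puts counter.get # => 100
```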

Mutex

A lock that protects code regions (critical sections) that must not be executed by multiple Fibers simultaneously, ensuring that only one Fiber executes them at a time.

  • Fiber-safe
  • Three modes: Checked / Reentrant / Unchecked
  • Re-entry prohibited by default (safe)

mutex = Mutex.new
shared_array = [] of Int32

10.times do |i|
  spawn do
    mutex.synchronize do
      # Only one Fiber executes at a time within this block
      shared_array << i
      sleep 0.001.seconds
    end
  end
end

sleep 1.second
puts shared_array.size  # => 10

Example of manually locking/unlocking:

mutex = Mutex.new
counter = 0

10.times do
  spawn do
    mutex.lock
    begin
      counter += 1
      sleep 0.001.seconds
    ensure
      mutex.unlock  # Always unlock
    end
  end
end

sleep 1.second
puts counter  # => 10

SpinLock

A lightweight lock specialized for very short-term locks. It continues to use CPU while waiting (spinning), so it’s unsuitable for long-term locks.

  • For very short critical sections
  • Only effective with preview_mt / win32

SpinLock is used in implementations such as Crystal::Scheduler, Crystal::ThreadLocalValue, Crystal::Once, Mutex, WaitGroup, EventLoop::Polling, and Fiber::StackPool. There are almost no scenarios where users would directly use SpinLock in code.

Areas to Be Careful About in the Standard Library

The following are areas in the Crystal standard library that may not guarantee complete thread safety and require caution.

What Qualifies as a Shared Variable Subject to Contention?

While we’ve used the term “shared variable,” Crystal doesn’t have user-accessible global variables, so the most typical shared variable is a class variable.

  • Class variables: Always shared variables (determined by variable type)
  • Instance variables and local variables: Determined by whether they are referenced from multiple Fibers or threads when spawned

If captured by spawn, local variables can also become shared variables.
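A sketch of that capture rule: total below is a local variable, but because every spawn block captures it, all fibers share one heap-allocated slot rather than per-fiber copies, so updates go through an Atomic (the counts are arbitrary):

```crystal
# `total` is captured by each closure, so all fibers share
# the same heap-allocated variable.
total = Atomic(Int32).new(0)
done = Channel(Nil).new

8.times do
  spawn do
    100.times { total.add(1) }
    done.send(nil)
  end
end

8.times { done.receive }
puts total.get # => 800
```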

ENV

  • The safety of Unix’s getenv/setenv/unsetenv is environment-dependent
  • Parallel modification is not recommended

This is also discussed in the Crystal Forum:

https://forum.crystal-lang.org/t/eliminate-environment-modifications/8533/29

Class Variables

In Crystal, you can use the @[ThreadLocal] annotation to make class variables thread-local.

class Foo
  @[ThreadLocal]
  @@var = 123

  def self.var
    @@var
  end
end

In this case, each thread has an independent copy of @@var, so changing the value in one thread doesn’t affect other threads.

Class variables without @[ThreadLocal] are shared. In this case, you need to use Atomic / Mutex for parallel updates.
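A sketch of the shared (non-ThreadLocal) case, using an Atomic-backed class variable (the Counter class is made up for illustration):

```crystal
class Counter
  # Shared across all threads, so updates go through Atomic.
  @@count = Atomic(Int32).new(0)

  def self.increment
    @@count.add(1)
  end

  def self.value
    @@count.get
  end
end

100.times { Counter.increment }
puts Counter.value # => 100
```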

IO (File, Socket, STDOUT/ERR)

Safety may not be guaranteed when simultaneously operating on the same IO from multiple threads.
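One conservative pattern is to funnel all writes to a shared IO through a single Mutex; a sketch (a Channel feeding one dedicated writer fiber works as well):

```crystal
stdout_mutex = Mutex.new
done = Channel(Nil).new

4.times do |i|
  spawn do
    # Serialize writes so lines from different fibers never interleave.
    stdout_mutex.synchronize do
      STDOUT.puts "fiber #{i} reporting"
    end
    done.send(nil)
  end
end

4.times { done.receive }
```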

Logger

Logger also uses IO internally. Writing to the same Logger from multiple threads may not be safe.

Report Any Issues You Find

Crystal is a programming language with far fewer users than languages like Python and Java, so user reports are especially valuable. Actively reporting bugs to the Crystal Forum and GitHub issues helps keep improving the language and its libraries.

Cases Where Thread Should Be Used

Thread directly represents the OS’s native thread. It can be used when low-level control is needed.

There are almost no cases where you should use Thread directly without using ExecutionContext.
It may be an option in cases such as:

  1. Want to parallelize compute-intensive tasks
  2. FFI is blocking and cannot suspend Fiber (however, if the FFI function is CPU-intensive processing, blocking is considered desirable behavior)
  3. C library requires thread-local initialization

Using Thread::Channel enables safe communication between threads.

FFI (C Library Calls) and Parallel Execution

Since C libraries are not necessarily thread-safe, following patterns like these is considered safe:

  • Wrap with Mutex
  • Isolate in ExecutionContext::Isolated context
  • Dedicated Thread + Thread::Channel
  • Use ThreadLocal state
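As a sketch of the first pattern, here C’s rand/srand (which share hidden global state and are not guaranteed thread-safe) are wrapped so every call goes through one Mutex; the LibRand and SafeRand names are made up for this example:

```crystal
# C's rand/srand keep hidden global state, so we guard every
# call with a single Mutex. libc is linked by default.
lib LibRand
  fun srand(seed : LibC::UInt) : Void
  fun rand : LibC::Int
end

module SafeRand
  @@lock = Mutex.new

  def self.seed(value : UInt32)
    @@lock.synchronize { LibRand.srand(value) }
  end

  def self.next : Int32
    @@lock.synchronize { LibRand.rand }
  end
end

SafeRand.seed(42_u32)
puts SafeRand.next >= 0 # => true
```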

Summary

Crystal’s parallel execution is currently in the midst of major evolution. In addition to Fiber, which has long handled concurrent execution of I/O-bound processing, ExecutionContext::Parallel now enables full-fledged parallel processing. Using Atomic / Mutex / Channel / WaitGroup, you can build safe parallel processing similar to Go. ExecutionContext::Isolated is effective for GUI / FFI. Thread can be used in special cases where OS threads need to be handled directly. Note that parts of the standard library remain ambiguous regarding thread safety.

Practical Guidelines for Parallel Execution in Crystal

  • Leave I/O to Fiber
    • No special action needed as Crystal’s I/O model is tightly integrated with Fiber.
  • Use Parallel or Thread for CPU-bound tasks
    • ExecutionContext::Parallel is the first choice.
  • Protect shared state with Atomic or Mutex
    • Treat gray zones like ENV and Logger conservatively
  • Test explicitly using -Dpreview_mt and -Dexecution_context

This concludes the article. Thank you for reading to the end.
