Operating Systems and Concurrency

Mar 18, 2026
Computer Science

I spent years writing Node.js without understanding why it was single-threaded, why that was a feature, or what the operating system was actually doing with my code. I knew async/await made things "non-blocking," but I couldn't explain what was being blocked in the first place, or who was doing the blocking. When I finally learned how operating systems manage processes, threads, and memory, a dozen mysteries about my own code suddenly made sense.

This post covers the OS concepts that matter most for application developers—not because you'll build a kernel, but because understanding the layer below your runtime makes you better at debugging, designing, and reasoning about performance.


Processes and Threads

A process is an instance of a running program. It has its own address space (memory), file descriptors, and at least one thread of execution. When you run node server.js, the OS creates a process.

A thread is a unit of execution within a process. All threads in a process share the same memory space but have their own call stack and program counter. Threads are lighter than processes—creating a thread is fast because there's no new address space to set up.

Process isolation matters. If Process A crashes, Process B is unaffected—they have separate memory. This is why browsers run each tab in its own process: a crashing tab doesn't bring down the browser.

Thread sharing matters. Threads within a process can read and write the same memory. This makes communication between threads fast (no copying data) but introduces the entire category of concurrency bugs—race conditions, deadlocks, data corruption.
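
You can see both halves of this directly. A minimal Python sketch: all four threads append to one heap-allocated list (shared memory), while each thread's `local` variable lives on its own stack:

```python
import threading

shared = []            # heap object, visible to every thread in the process

def worker(n):
    local = n * 2      # local variable lives on this thread's own stack
    shared.append(local)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads: t.start()
for t in threads: t.join()

print(sorted(shared))  # [0, 2, 4, 6] — every thread wrote to the same list
```

No copying, no message passing—which is exactly why unsynchronized writes to shared state are dangerous, as we'll see below.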

Why Node.js Is "Single-Threaded"

Node.js runs your JavaScript on a single main thread. But it isn't truly single-threaded—the runtime uses a thread pool (via libuv) for file I/O, DNS lookups, and other blocking operations. When you call fs.readFile(), the actual disk read happens on a background thread. When it completes, the result is placed on the event loop for your callback to process on the main thread.

This design avoids the need for locks in your application code. You never have two threads accessing the same JavaScript variable simultaneously. The tradeoff: CPU-intensive work on the main thread blocks everything—no requests get handled until that computation finishes.

// This blocks the event loop — no other requests are processed for ~2 seconds
app.get("/heavy", (req, res) => {
  const result = computeFibonacci(45) // CPU-bound, runs on main thread
  res.json({ result })
})
 
// Fix: offload to a worker thread
import { Worker } from "node:worker_threads"
 
app.get("/heavy", (req, res) => {
  const worker = new Worker("./fib-worker.js", { workerData: 45 })
  worker.on("message", (result) => res.json({ result }))
})

Context Switching

The CPU can only run one thread per core at a time. When there are more threads than cores (which is almost always true), the OS context switches—it saves the current thread's state (registers, program counter, stack pointer), loads another thread's state, and resumes that thread.

Context switching is fast (microseconds) but not free. Each switch involves:

  1. Saving registers and CPU state to memory.
  2. Losing warm CPU caches—the incoming thread's code and data must be re-fetched from memory (often the dominant cost).
  3. Loading the new thread's state.
  4. If switching between processes, flushing the TLB (translation lookaside buffer) for virtual memory.

This is why creating thousands of threads degrades performance—the OS spends more time switching between them than actually running them. It's also why event-driven architectures (Node.js, nginx) outperform thread-per-request architectures (Apache) for I/O-heavy workloads: one thread handling thousands of connections avoids thousands of context switches.


System Calls

Your code can't directly access hardware. It can't read a file, send a network packet, or allocate memory without going through the OS. System calls (syscalls) are the interface between user programs and the kernel.

When you call fs.readFile() in Node.js, the chain is:

  1. Your JavaScript calls a Node.js API.
  2. Node.js calls a C++ binding via libuv.
  3. libuv calls the OS read() syscall.
  4. The kernel accesses the disk (or filesystem cache) and returns data.
  5. Data flows back up through libuv to your callback.

Each syscall involves switching from user mode to kernel mode—the CPU changes privilege levels. This transition has overhead, which is why batching I/O operations (reading a file in large chunks instead of byte by byte) matters for performance.

Common syscalls: open, read, write, close (files), socket, connect, send, recv (networking), fork, exec (processes), mmap (memory mapping).
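
Python's os module exposes these almost directly—each call below is a thin wrapper around one kernel entry (sketch assumes a Unix-like system; the file path is arbitrary):

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "syscall-demo.txt")

# Each os.* call below maps to the corresponding syscall.
fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)  # open()
os.write(fd, b"hello, kernel\n")                                  # write()
os.close(fd)                                                      # close()

fd = os.open(path, os.O_RDONLY)   # open()
data = os.read(fd, 4096)          # read() — one syscall for up to 4096 bytes,
os.close(fd)                      # far cheaper than 4096 one-byte reads

print(data)  # b'hello, kernel\n'
os.unlink(path)
```

Note the file descriptor `fd`: an integer handle, exactly the abstraction discussed in the file systems section below.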


Virtual Memory

Every process thinks it has the entire address space to itself. This illusion is virtual memory—the OS maps each process's virtual addresses to physical RAM locations.

Process A sees:        Physical RAM:
0x0000 - code          [Process A code at 0x7F00]
0x1000 - heap          [Process B heap at 0x3200]
0x2000 - stack         [Process A heap at 0x8400]
                       [Process A stack at 0x1100]

Paging divides memory into fixed-size blocks (typically 4KB pages). The OS maintains a page table per process that maps virtual pages to physical frames. When your code accesses an address, the CPU's MMU (Memory Management Unit) translates it using the page table.

Page faults occur when a virtual page isn't currently in physical RAM:

  • Minor fault: The page exists but hasn't been mapped yet (e.g., first access to mmap'd file). The OS maps it without disk I/O.
  • Major fault: The page was swapped to disk. The OS loads it from swap, which is slow—disk access is orders of magnitude slower than RAM (roughly 1,000x for SSDs, far worse for spinning disks).

This is why "running out of memory" doesn't always mean your RAM is full. When the system swaps heavily, performance collapses because every memory access potentially involves a disk read. Monitoring swap usage is as important as monitoring RAM usage.
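
On Unix you can watch minor faults happen with the resource module. A rough sketch (exact counts vary by allocator and OS, but the order of magnitude holds—one fault per first-touched 4 KB page):

```python
import resource

def minor_faults():
    # ru_minflt: cumulative minor page faults for this process (Unix only)
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

before = minor_faults()
buf = bytearray(32 * 1024 * 1024)    # 32 MB — the kernel maps pages lazily
for i in range(0, len(buf), 4096):   # write one byte per 4 KB page
    buf[i] = 1
faults = minor_faults() - before

print(faults)  # thousands — roughly one minor fault per first-touched page
```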


File Systems

A file system organizes data on disk into files and directories. The OS abstracts the raw disk blocks into a hierarchical namespace you interact with through paths.

Key concepts:

  • Inodes: Metadata structures that store file attributes (size, permissions, timestamps) and pointers to the actual data blocks. The filename is stored in the directory, not the inode—this is why hard links work.
  • File descriptors: Integer handles the OS gives your process to reference open files. There's a per-process limit (typically 1024 by default on Linux). Running out of file descriptors is a common failure mode for servers handling many connections.
  • Buffering: The OS caches recently accessed disk blocks in the page cache (RAM). Reading a file that was recently read is fast because it's served from cache, not disk. This is also why free -m on Linux shows most of your RAM as "used"—it's being used for page cache, which is automatically reclaimed when applications need memory.

# Check file descriptor limits
ulimit -n                    # per-process soft limit
cat /proc/sys/fs/file-max    # system-wide limit
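
The inode/filename split is easy to demonstrate. A Python sketch (the file names are arbitrary): two directory entries pointing at one inode, and the data surviving removal of the original name:

```python
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "inode-demo.txt")
link = path + ".hard"
for p in (path, link):              # clean up leftovers from earlier runs
    if os.path.exists(p):
        os.unlink(p)

with open(path, "w") as f:
    f.write("one inode, two names\n")
os.link(path, link)                 # hard link: a second directory entry

same_inode = os.stat(path).st_ino == os.stat(link).st_ino
link_count = os.stat(path).st_nlink
print(same_inode, link_count)       # True 2 — one inode, link count of 2

os.unlink(path)                     # removes one name; the inode survives
with open(link) as f:
    contents = f.read()             # data still reachable via the other name
os.unlink(link)
```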

Race Conditions

A race condition occurs when the behavior of a program depends on the timing of events that aren't guaranteed to happen in a particular order.

import threading
 
counter = 0
 
def increment():
    global counter
    for _ in range(1_000_000):
        counter += 1  # NOT atomic: read, add, write
 
threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
 
print(counter)  # Expected: 4,000,000. Actual: usually less — increments are lost (exact value varies by run and Python version)

counter += 1 looks like one operation but it's three: read the current value, add 1, write the new value. Two threads can read the same value, both add 1, and both write the same result—losing an increment.

In JavaScript/Node.js, you're mostly safe from this because your code runs on one thread. But race conditions still happen with async operations:

let balance = 100
 
async function withdraw(amount) {
  if (balance >= amount) {
    // Check
    await processPayment(amount) // Async gap — other code runs here
    balance -= amount // Update
  }
}
 
// Two withdrawals racing
withdraw(80)
withdraw(80) // Both pass the check because balance hasn't been updated yet

The await creates a gap where other tasks can run. The check-then-act pattern is dangerous across async boundaries.


Locks and Mutexes

A mutex (mutual exclusion) is a lock that ensures only one thread can access a critical section at a time.

import threading
 
counter = 0
lock = threading.Lock()
 
def increment():
    global counter
    for _ in range(1_000_000):
        with lock:
            counter += 1
 
threads = [threading.Thread(target=increment) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
 
print(counter)  # Always 4,000,000

The lock guarantees that only one thread executes counter += 1 at a time. The others wait.

Semaphores are generalized locks that allow up to N threads to enter (a mutex is a semaphore with N=1). They're used for resource pools—for example, capping a service at 10 concurrent database connections.
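
A sketch of the resource-pool pattern with Python's threading.Semaphore (the sleep simulates a slow query; the peak counter is just instrumentation):

```python
import threading
import time

pool = threading.Semaphore(3)     # allow at most 3 "connections" at once
state = {"active": 0, "peak": 0}
state_lock = threading.Lock()

def query(_):
    with pool:                    # blocks while 3 threads already hold it
        with state_lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        time.sleep(0.01)          # simulate a slow query
        with state_lock:
            state["active"] -= 1

threads = [threading.Thread(target=query, args=(i,)) for i in range(10)]
for t in threads: t.start()
for t in threads: t.join()

print(state["peak"])  # never exceeds 3, despite 10 threads competing
```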

Read-write locks allow multiple readers or one writer. This is common in database systems: many queries can read simultaneously, but writes require exclusive access.


Deadlocks

A deadlock occurs when two or more threads each hold a lock the other needs, and none can make progress.

Thread A: holds Lock 1, waiting for Lock 2
Thread B: holds Lock 2, waiting for Lock 1
→ Neither can proceed.

Four conditions must all be true for deadlock (Coffman conditions):

  1. Mutual exclusion: Resources can't be shared.
  2. Hold and wait: A thread holds resources while waiting for more.
  3. No preemption: Resources can't be forcibly taken.
  4. Circular wait: A cycle of threads waiting for each other.

Prevention strategies:

  • Lock ordering: Always acquire locks in the same global order. If every thread acquires Lock 1 before Lock 2, circular wait is impossible.
  • Timeouts: Try to acquire a lock with a timeout. If it fails, release held locks and retry.
  • Avoid holding multiple locks when possible.
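
Lock ordering in practice—a Python sketch of two opposite transfers that could deadlock under a naive "lock frm, then to" order, made safe by always locking in sorted order:

```python
import threading

balances = {"A": 100, "B": 100}
locks = {"A": threading.Lock(), "B": threading.Lock()}

def transfer(frm, to, amount):
    # Always acquire in a global (alphabetical) order. Locking "frm then to"
    # would reverse between opposite transfers — that's the circular wait.
    first, second = sorted([frm, to])
    with locks[first]:
        with locks[second]:
            balances[frm] -= amount
            balances[to] += amount

t1 = threading.Thread(target=transfer, args=("A", "B", 30))
t2 = threading.Thread(target=transfer, args=("B", "A", 10))
t1.start(); t2.start()
t1.join(); t2.join()
print(balances)  # {'A': 80, 'B': 120}
```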

Databases handle deadlocks by detecting cycles in the wait-for graph and aborting one of the transactions. PostgreSQL does this automatically.


Thread Pools

Creating a new thread for every task is wasteful—thread creation has overhead, and too many threads cause excessive context switching. A thread pool maintains a fixed number of worker threads and a task queue.

Node.js's libuv uses a thread pool (default 4 threads, configurable via UV_THREADPOOL_SIZE) for file I/O and DNS. Database connection pools work the same way—instead of opening a new connection per query, you reuse connections from a pool.

The optimal pool size depends on the workload:

  • CPU-bound tasks: Number of cores (more threads just means more context switching).
  • I/O-bound tasks: More threads than cores is fine, because threads spend most of their time waiting.
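
A minimal sketch with Python's concurrent.futures: four reusable workers drain a queue of twelve simulated I/O tasks, with no per-task thread creation:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def fetch(n):
    time.sleep(0.05)    # simulate a blocking I/O call
    return n * n

# 4 workers pull 12 tasks from a shared queue — no thread churn.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, range(12)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
```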

Async I/O and the Event Loop

The event loop answers the question "how do I handle many I/O operations without many threads?"—and the OS provides the machinery that makes it possible.

Operating systems provide mechanisms for non-blocking I/O: epoll (Linux), kqueue (macOS/BSD), IOCP (Windows). These let a single thread ask the kernel: "notify me when any of these 10,000 sockets have data ready."
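
Python's selectors module wraps these mechanisms (epoll on Linux, kqueue on macOS/BSD). A minimal readiness-notification sketch, using a local socket pair in place of real client connections:

```python
import selectors
import socket

sel = selectors.DefaultSelector()   # epoll on Linux, kqueue on macOS/BSD

# A local socket pair stands in for real client connections.
a, b = socket.socketpair()
a.setblocking(False)
sel.register(a, selectors.EVENT_READ)

b.send(b"ping")                     # make one registered socket readable

# One call asks the kernel which registered sockets are ready — the thread
# sleeps here instead of polling each socket in a loop.
events = sel.select(timeout=1)
ready = [key.fileobj for key, _ in events]
data = a.recv(4)

print(a in ready, data)  # True b'ping'

sel.close(); a.close(); b.close()
```

Register ten thousand sockets instead of one and the call is the same: a single thread, woken only when there's work to do.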

This is the foundation of Node.js, nginx, and every high-performance network server. One thread, many connections, no context switching between request handlers. And no busy-waiting—the thread sleeps in the kernel until the next event is ready, then processes it.

The tradeoff: if your callback does CPU-heavy work (image processing, JSON parsing of a huge payload), it blocks the event loop and no other events get processed until it finishes. This is why Node.js is great for I/O-heavy services (APIs, proxies, real-time apps) and poor for CPU-heavy ones (video encoding, ML inference) unless you offload to worker threads.


The Pragmatic Takeaway

You don't need to write kernel code. But you need a mental model of what happens beneath your runtime.

When your Node.js server slows under load, is it context switching from too many worker threads? Is the event loop blocked by synchronous computation? Is the system swapping because memory is exhausted?

When your database queries slow down, is it lock contention from concurrent writes? Is it running out of connections from the pool? Is a long transaction holding locks that block others?

When your containers get killed, is it an OOM (out of memory) because the process exceeded its cgroup limit? Is it a file descriptor leak?

Every "mysterious" production issue I've debugged has had an OS-level explanation. Understanding processes, threads, memory, and I/O models doesn't just help in interviews—it's the difference between guessing and diagnosing.