Dynamic dispatch is the runtime process of selecting which function or method body to execute when multiple implementations could apply. It’s a cornerstone of object-oriented programming and polymorphism—and it also shows up in “static” languages via interfaces, virtual methods, trait objects, and protocol existentials.
Reader map (what this covers—and what it doesn’t)
- Covers: runtime target selection (dynamic dispatch), the main mechanisms (vtables, interface dispatch variants, inline caches, dictionary passing, multimethod caches), and the optimization hooks (ICs, devirtualization, specialization).
- Also covers: how popular languages map onto those mechanisms (C++/Java/.NET/JS/Python/Ruby/Rust/Swift/Haskell/Julia/CLOS).
- Doesn’t focus on: message-passing semantics as a language design philosophy (e.g., Smalltalk “everything is a message”), or a deep AOT-vs-JIT comparison beyond what matters for dispatch.
Terminology boundaries (so we don’t mix concepts)
A few terms get conflated:
- Dynamic dispatch: runtime selection of a call target among multiple candidates.
- Dynamic typing: types are checked/used primarily at runtime; often correlates with dynamic dispatch, but you can have dynamic dispatch in statically typed languages (interfaces/virtuals), and static dispatch in dynamically typed languages (after specialization/JIT in hot paths).
- Late binding: a broader term meaning “bound later than compile time” (could mean runtime dispatch, dynamic linking, etc.). In this post, we’ll be precise and say “dynamic dispatch” when we mean runtime target selection.
- Overload resolution: typically compile-time selection among multiple functions with the same name based on static types/signatures. This is not dynamic dispatch.
- Multiple dispatch vs overloading: multiple dispatch chooses based on runtime types of multiple arguments; overloading chooses based on compile-time types (see the contrast right after this list).
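To make the contrast concrete, here is a minimal C++ sketch (the `Base`/`Derived`/`describe` names are invented for illustration): overload resolution picks a `describe` from the static type of the argument, while the `virtual` call inside picks the override from the runtime type.

```cpp
#include <cstdio>

struct Base { virtual const char* name() { return "Base"; } };
struct Derived : Base { const char* name() override { return "Derived"; } };

void describe(Base& b)    { std::printf("overload: Base,    name(): %s\n", b.name()); }
void describe(Derived& d) { std::printf("overload: Derived, name(): %s\n", d.name()); }

int main() {
    Derived d;
    Base& ref = d;
    describe(ref); // overload chosen at compile time from the static type (Base&),
                   // but name() inside dispatches at runtime, printing "Derived"
}
```

Running it prints `overload: Base,    name(): Derived`: the overload was chosen at compile time, the method body at runtime.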
What “dispatch” means (anchored with a concrete example)
At a call site like `obj.m()` or `f(x)`, the program must resolve which implementation to run.
- Static dispatch: target known at compile time (or link time). Often inlinable.
- Dynamic dispatch: target chosen at runtime based on runtime type metadata, method tables, caches, etc.
A single call site can be static or dynamic depending on what the compiler knows:
```cpp
struct Base { virtual int f() { return 1; } };
struct Derived : Base { int f() override { return 2; } };

int call_known() {
    Derived d;
    // Compiler knows the exact type is Derived here.
    return d.f(); // can be devirtualized → direct call/inlined
}

int call_unknown(Base* b) {
    // Only know “some Base” at compile time.
    return b->f(); // dynamic dispatch via vtable
}
```
Same method name, same “shape” of call, different dispatch because compile-time knowledge differs.
Why you should care: dispatch is an optimization boundary
The direct overhead of dynamic dispatch can be small (often “a couple of loads + an indirect call”). The bigger cost is what it can block: inlining, constant propagation, escape analysis/scalar replacement, vectorization, etc. That’s why understanding dispatch mechanisms helps you reason about performance and abstraction design.
Core ingredients of dynamic dispatch
Most systems combine:
- Runtime type identity (class pointer, type tag, shape/map, metadata)
- Mapping from (type, operation) → code (vtables/itables/dictionaries/caches)
- Call protocol (how `this`/`self` is passed; ABI details)
- Optimization strategy (inline caches, devirtualization, guarded inlining, specialization)
The trade-off is consistent: more dynamism → more runtime work per call, unless the runtime can make common cases fast.
A quick comparison table
Below is a deliberately “typical” view; exact details vary by ABI/runtime.
| Mechanism | Lookup / call steps (simplified) | Typical cost characteristics | Commonly used in | Optimization hooks |
|---|---|---|---|---|
| Vtable (single dispatch) | load vptr → load slot → indirect call | predictable; inhibits inlining unless devirtualized | C++, Java/.NET (conceptually), Rust dyn Trait (trait vtable) | devirtualization, guarded inlining, PGO/type profiling |
| Interface dispatch (secondary tables / stubs) | resolve interface method to impl (itable/dictionary/stub) → indirect call | often slightly more work than vtable; varies widely | JVM invokeinterface, .NET interface calls, Swift existentials | call-site caches, stubs, devirtualization |
| Inline caches (ICs) for dynamic objects | check receiver shape/type (+ prototype validity) → fast path; else generic lookup | can be near-direct in monomorphic steady state; megamorphic slower | JavaScript, Ruby; increasingly Python bytecode caching | monomorphic/polymorphic ICs, shape stabilization |
| Hash/dict lookup | hash name → probe dict → resolve descriptor/bind → call | flexible but expensive without caches | Python/Ruby object models (baseline) | dict versioning, ICs, specialization |
| Dictionary passing (typeclass/witness) | pass dict/witness arg → load function pointer → call | often optimized away by inlining/specialization | Haskell typeclasses, Swift generics, some ML-family designs | specialization, inlining, whole-module optimization |
| Multimethod cache (multiple dispatch) | compute type-tuple key → cache lookup; miss triggers method search | hot path can be fast; miss can be expensive | Julia, CLOS | method caches, specialization, inline caches for type tuples |
Mechanism 1: vtables (classic OOP single dispatch)
A vtable is a table (often an array) of function pointers for virtual methods.
Typical model:
- Each object carries a hidden pointer (often `vptr`) to class metadata or directly to the vtable.
- A virtual call does:
  - load the vptr
  - load the function pointer from the method's slot
  - make an indirect call
ABI realities and “fixed slot” nuance
It’s common to say “method m is at a fixed vtable slot,” but the important nuance is:
- The slot is fixed for a given base/interface view, not necessarily “globally for the class.”
- With single inheritance, you often get one primary vptr/vtable, and slots line up predictably.
- With multiple inheritance, objects may contain multiple vptrs (one per base subobject view), and calls may go through thunks that adjust `this` before entering the final override.
So the “fixed offset” story is: fixed relative to the ABI’s chosen layout for that base subobject/interface.
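To demystify the mechanism, here is a hand-rolled C++ sketch of what a compiler effectively emits for a single-inheritance virtual call. The names (`VTable`, `Object`, `derived_vtable`) are invented, and real ABIs add RTTI pointers, offsets, and thunks on top.

```cpp
#include <cstdio>

struct Object; // forward declaration

struct VTable {
    int (*f)(Object*); // one slot per virtual method
};

struct Object {
    const VTable* vptr; // the hidden pointer the compiler inserts
    int payload;
};

int derived_f(Object* self) { return self->payload * 2; }

const VTable derived_vtable = { &derived_f };

int call_f(Object* obj) {
    // The three steps of a virtual call, spelled out:
    const VTable* vt = obj->vptr;  // 1. load vptr
    int (*fp)(Object*) = vt->f;    // 2. load slot
    return fp(obj);                // 3. indirect call
}

int main() {
    Object o{ &derived_vtable, 21 };
    std::printf("%d\n", call_f(&o)); // prints 42
}
```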
Where you see it
- C++: `virtual` functions (ABI-dependent). Multiple inheritance introduces multiple vptrs and thunks.
- Java/.NET: “virtual dispatch” exists conceptually, though the actual implementation is runtime/JIT-specific and may not be a literal C++-style vtable in memory.
- Rust: `dyn Trait` trait objects use a vtable-like structure (details below).
Mechanism 2: interface dispatch (when the slot isn’t a single fixed offset)
Dispatch through interfaces is trickier than class-virtual dispatch because a class can implement many interfaces, and the mapping from “interface method” to “concrete method entry point” may not be a single fixed slot in the class’s primary vtable.
Strategies differ substantially across runtimes. Two common shapes:
JVM-style sketch (invokeinterface)
A simplified mental model (not an exact spec of HotSpot internals):
- At the call site, you have an interface method (name+descriptor) and a receiver object.
- Runtime must find the receiver’s concrete class.
- It then resolves the interface method to the concrete implementation using per-class interface metadata (often described as itables or secondary interface method tables).
- JITs typically add inline caches / profiling so that once a call site is monomorphic, it becomes a quick “type check + direct call” or a short stub.
.NET-style sketch (stubs + caching)
A common approach is:
- The call goes through a small dispatch stub.
- The stub uses the receiver’s type and the interface method token/slot to find the target.
- The runtime caches the result (often effectively building a fast path for common receiver types).
- The JIT may devirtualize when it can prove the concrete type (or observes it via profiling).
The important point: “interface dispatch” is not one thing—it’s a family of strategies combining metadata + stubs + caches.
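As a toy illustration of the “metadata + search + cache” shape (not any real runtime's layout), here is an itable-style lookup in C++: each class carries a small list of per-interface method tables, and the call site searches it by interface id. All names and the interface-id scheme are invented.

```cpp
#include <cstdio>
#include <cstdint>
#include <cstddef>

// Toy model: one function pointer per interface method slot.
using Fn = void (*)(void* self);

struct ITableEntry {
    uint32_t interface_id; // which interface this table implements
    const Fn* methods;     // that interface's method slots for this class
};

struct Class {
    const ITableEntry* itables; // per-class interface metadata
    size_t itable_count;
};

struct Object {
    const Class* cls;
    int value;
};

// invokeinterface-style resolution: find the receiver's table for the
// interface, then index the method slot.
Fn resolve_interface(const Object* obj, uint32_t iface, size_t slot) {
    for (size_t i = 0; i < obj->cls->itable_count; ++i)
        if (obj->cls->itables[i].interface_id == iface)
            return obj->cls->itables[i].methods[slot];
    return nullptr; // a real runtime would raise an error here
}

void print_value(void* self) { std::printf("%d\n", static_cast<Object*>(self)->value); }

constexpr uint32_t PRINTABLE = 7; // hypothetical interface id
const Fn printable_methods[] = { &print_value };
const ITableEntry my_itables[] = { { PRINTABLE, printable_methods } };
const Class my_class = { my_itables, 1 };

int main() {
    Object o{ &my_class, 42 };
    Fn target = resolve_interface(&o, PRINTABLE, 0); // search (cold path)
    target(&o);                                      // indirect call → prints 42
}
```

A real runtime would cache the `resolve_interface` result at the call site (or in a dispatch stub) so the linear search only runs on the cold path.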
Mechanism 3: dynamic-object lookup + inline caches
Dynamic languages often allow objects to change shape (add/remove properties, change prototypes/metaclasses), so the runtime can’t rely on a static per-class vtable layout.
To avoid “hash lookup on every access,” modern runtimes combine object layout metadata with inline caches.
Separate the costs: property access vs method call
Readers often compress everything into “dispatch,” but there are usually multiple stages:
1. Property/attribute lookup: find `m` on the object/class/prototype chain.
2. Binding/descriptor step: turn a function into a bound method (or apply descriptor rules).
3. Call: invoke the resulting callable (which itself may have its own dispatch).
Optimizations may accelerate (1) and (2) separately from (3).
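As a loose C++ analogy (not how any dynamic runtime is implemented), the three stages map onto obtaining a member-function pointer, binding it to a receiver, and invoking the resulting callable:

```cpp
#include <cstdio>

struct Greeter {
    void hello() { std::puts("hi"); }
};

int main() {
    // 1. Lookup: find the "unbound" method (here, a member-function pointer).
    void (Greeter::*found)() = &Greeter::hello;

    // 2. Binding: pair the method with a receiver to get a callable.
    Greeter g;
    auto bound = [found, &g] { (g.*found)(); };

    // 3. Call: invoke the resulting callable.
    bound();
}
```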
JavaScript: shapes/hidden classes + prototype chains
Modern JS engines represent object layouts via hidden classes / shapes / maps.
- Own-property access: if the engine knows the receiver’s shape, it can load a property via a fixed offset.
- Prototype-chain lookup: if the property isn’t an own property, the engine walks the prototype chain.
Where ICs come in:
- Inline caches typically key on the receiver map/shape and include guards that the prototype chain is still valid (e.g., via a prototype validity cell or equivalent guard).
- Monomorphic IC: one shape → fast path.
- Polymorphic IC: a small set of shapes.
- Megamorphic: many shapes → generic lookup.
This is why “stable shapes” and avoiding highly polymorphic call sites matter in JS performance.
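Here is a minimal sketch of the inline-cache idea in C++, with shape ids and the cache layout invented for illustration: the call site remembers the last shape it saw and takes a guarded fast path when it matches, falling back to a generic lookup on a miss.

```cpp
#include <cstdio>
#include <cstdint>

using Fn = void (*)(void*);

struct Object {
    uint32_t shape_id; // stands in for a hidden class / map pointer
    Fn method;         // where the slow path would find the target
};

// Slow path: stands in for the full lookup (prototype walk, hash probes, ...).
Fn generic_lookup(Object* obj) { return obj->method; }

// One inline cache per call site.
struct InlineCache {
    uint32_t cached_shape = 0; // 0 = empty
    Fn cached_target = nullptr;
};

void call_site(Object* obj, InlineCache& ic) {
    if (obj->shape_id == ic.cached_shape) {
        ic.cached_target(obj);       // monomorphic hit: guard + near-direct call
        return;
    }
    Fn target = generic_lookup(obj); // miss: do the expensive lookup...
    ic.cached_shape = obj->shape_id; // ...and remember it for next time
    ic.cached_target = target;
    target(obj);
}

void greet(void*) { std::puts("hello"); }

int main() {
    Object o{ 17, &greet };
    InlineCache ic;
    call_site(&o, ic); // miss → generic lookup, cache fills
    call_site(&o, ic); // hit → fast path
}
```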
Python (CPython): attribute lookup, binding, and 3.11+ inline caches
In CPython, `obj.m` generally involves:
- instance `__dict__` (or managed/optimized storage)
- type dictionary
- MRO search
- descriptor protocol (functions become bound methods)
Since Python 3.11, CPython introduced and expanded inline caches in the bytecode interpreter. Concretely, caches accelerate common opcodes such as:
- `LOAD_ATTR` (attribute reads)
- `LOAD_METHOD` (method lookup optimized for calls)
- call-related opcodes (the call sequence was reorganized; caching helps reduce repeated resolution work)
A key detail: caches must be invalidated when relevant dictionaries change. CPython uses dictionary version tags (and related guards) so that when a class dict or instance dict mutates in a way that could affect lookup, cached assumptions are discarded.
The result: Python can be much faster on stable attribute patterns, but the model remains highly dynamic, so dispatch/lookup is still significant compared to ahead-of-time compiled code.
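The invalidation side can be sketched in the same spirit. Assume each dictionary carries a version counter that bumps on every mutation, and cached lookups are guarded on the version they were filled at; this is a simplified C++ model of version-tag guards with invented names, not CPython's actual data structures.

```cpp
#include <cstdio>
#include <cstdint>
#include <string>
#include <unordered_map>

// Simplified model of a versioned dictionary (names are illustrative).
struct VersionedDict {
    std::unordered_map<std::string, int> entries;
    uint64_t version = 1;

    void set(const std::string& key, int value) {
        entries[key] = value;
        ++version; // any mutation invalidates outstanding caches
    }
};

struct AttrCache {
    uint64_t guard_version = 0; // version the cache was filled at
    int cached_value = 0;
};

int load_attr(VersionedDict& d, const std::string& key, AttrCache& cache) {
    if (cache.guard_version == d.version)
        return cache.cached_value;   // guard holds: skip the hash lookup
    int value = d.entries.at(key);   // slow path: real dictionary probe
    cache.guard_version = d.version;
    cache.cached_value = value;
    return value;
}

int main() {
    VersionedDict cls;
    AttrCache cache;
    cls.set("x", 1);
    std::printf("%d\n", load_attr(cls, "x", cache)); // miss, fills cache
    std::printf("%d\n", load_attr(cls, "x", cache)); // hit
    cls.set("x", 2);                                 // bumps version
    std::printf("%d\n", load_attr(cls, "x", cache)); // guard fails → re-lookup → 2
}
```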
Mechanism 4: dictionary passing (typeclasses, witnesses, trait objects)
Here “dictionary” means a bundle of operations (function pointers / metadata) passed alongside a value—not a hash map.
This avoids “look up by name” at runtime. Instead, the caller supplies “here are the operations for this type.”
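A language-neutral sketch of the idea in C++ (all names invented): the “dictionary” is just a struct of function pointers supplied by the caller, so the callee never looks anything up by name at runtime.

```cpp
#include <cstdio>

// The "dictionary": a bundle of operations for some type T.
template <typename T>
struct ShowDict {
    void (*show)(const T&);
};

// A polymorphic function that receives the operations explicitly.
// No name-based lookup happens at runtime; the caller supplied everything.
template <typename T>
void print_twice(const T& value, const ShowDict<T>& dict) {
    dict.show(value);
    dict.show(value);
}

void show_int(const int& x) { std::printf("%d\n", x); }

int main() {
    ShowDict<int> int_dict{ &show_int }; // witness that "int implements Show"
    print_twice(42, int_dict);
    // A compiler that inlines print_twice and the dict field can turn these
    // indirect calls into direct ones: that is specialization in a nutshell.
}
```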
Rust: trait objects (dyn Trait)
Rust supports dynamic dispatch with `dyn Trait`:
- A trait object is a fat pointer: `(data_ptr, vtable_ptr)`.
- The vtable contains function pointers for the trait’s methods and typically includes required runtime metadata such as drop glue (and size/alignment information).
- Calls through `dyn Trait` are indirect and generally cannot be forced to inline with `#[inline]`, though the optimizer may still devirtualize in some cases.
Rust also heavily uses static dispatch via generics (`T: Trait`), which monomorphize code and often erase dispatch overhead entirely.
Haskell: typeclass dictionaries + specialization
Haskell typeclasses are commonly implemented as dictionary passing:
- A polymorphic function receives an implicit dictionary of operations.
- GHC can often inline dictionary fields and/or apply specialization to produce versions of functions for specific types.
That’s why “typeclass overhead disappears” is not magic—it’s the combination of dictionary passing plus aggressive inlining/specialization that can eliminate the indirection.
Swift: witness tables, generics vs existentials
Swift protocols use witness tables (a form of dictionary passing), but the performance story depends on how you use the protocol:
- Generics (`func f<T: P>(_ x: T)`): the compiler passes a witness table for `T` and can often specialize `f` for a concrete `T` (especially with whole-module optimization). This can make calls effectively static.
- Protocol existentials (`any P`): values carry runtime metadata and a witness-table-like reference; calls are typically dynamically dispatched through that table.
At a high level, whether Swift can turn indirection into direct calls depends on specialization opportunities and compilation mode (e.g., whole-module optimization, resilience boundaries, `@inlinable`).
Mechanism 5: multiple dispatch (multimethods) and caches
Dispatch doesn’t have to be based on a single receiver.
- Single dispatch: choose based on the dynamic type of one distinguished receiver (`obj.m()`).
- Multiple dispatch: choose based on the dynamic types of multiple arguments (`collide(a, b)`).
Ambiguity and “most specific” isn’t always trivial
Multiple dispatch typically orders methods by a specificity partial order (e.g., Int is more specific than Number). But partial orders can produce ties:
- Two candidates may be incomparable (neither is strictly more specific).
- Systems handle this via rules (signal ambiguity error, require disambiguation, or define tie-breakers).
Cost model: search vs cached fast path
A useful mental model:
- Cold path / cache miss: search method table for best match (can be nontrivial).
- Hot path / cache hit: use a cache keyed by the argument type tuple → jump to the resolved method (sketched below).
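Here is a toy C++ version of that model for two-argument dispatch, with the cache keyed on a `std::type_index` pair; the “most specific method” search is stubbed out, since the specificity ordering is the genuinely hard part.

```cpp
#include <cstdio>
#include <map>
#include <typeindex>
#include <utility>

struct Shape { virtual ~Shape() = default; };
struct Circle : Shape {};
struct Square : Shape {};

using Key = std::pair<std::type_index, std::type_index>;
using CollideFn = void (*)(Shape&, Shape&);

void circle_circle(Shape&, Shape&) { std::puts("circle/circle"); }
void circle_square(Shape&, Shape&) { std::puts("circle/square"); }

// Stand-in for the real "most specific method" search: the expensive,
// cold-path part that would walk a specificity ordering.
CollideFn search_method_table(const Key& key);

std::map<Key, CollideFn> dispatch_cache; // hot-path cache

void collide(Shape& a, Shape& b) {
    Key key{ typeid(a), typeid(b) };       // runtime types of BOTH arguments
    auto it = dispatch_cache.find(key);
    CollideFn fn;
    if (it != dispatch_cache.end()) {
        fn = it->second;                   // cache hit: cheap
    } else {
        fn = search_method_table(key);     // cache miss: full search
        dispatch_cache[key] = fn;
    }
    fn(a, b);
}

CollideFn search_method_table(const Key& key) {
    if (key == Key{ typeid(Circle), typeid(Circle) }) return &circle_circle;
    return &circle_square; // toy fallback; real systems handle ambiguity
}

int main() {
    Circle c1, c2;
    Square s;
    collide(c1, c2); // miss → search → "circle/circle"
    collide(c1, c2); // hit → cached → "circle/circle"
    collide(c1, s);  // miss → "circle/square"
}
```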
Julia
Julia is a flagship multiple-dispatch language:
- Methods are defined for combinations of argument types.
- The runtime/JIT specializes functions on concrete types.
Important nuance: specialization often makes dispatch effectively static per compiled method instance. However, dynamic dispatch can still occur when:
- arguments are abstractly typed (e.g., `::AbstractArray` without concrete parameters)
- values flow as `Any`
Common Lisp (CLOS)
CLOS supports generic functions with multimethods and method combination.
- Dispatch considers multiple arguments.
- Implementations rely heavily on caches; stable call patterns can be fast.
Devirtualization: turning dynamic dispatch into (almost) static dispatch
A major optimization in both compiled and JITed systems is devirtualization: prove (or speculate) that a call site has one likely target.
A common JIT pattern is guarded inlining:
```
// original: x.f()
if (type(x) == T) {
    // inline the body of T::f here
    ...
} else {
    // fall back to generic dispatch
    call_virtual(x, "f")
}
```
Common enablers
- Class hierarchy analysis (method not overridden)
- Sealed/final classes (closed set of subtypes; see the snippet after these lists)
- Type profiling / PGO (call site is monomorphic in practice)
- Whole-program optimization / LTO (more global visibility)
Common blockers
- Separate compilation (can’t see all overrides)
- Reflection / dynamic class loading (notably on the JVM)
- FFI boundaries
- Highly polymorphic or megamorphic call sites
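As a small example of the sealed/final enabler: marking a C++ class `final` gives the compiler a closed set of overrides for that type, so a call through a pointer to the final class has a provable single target (whether a given compiler actually devirtualizes it depends on optimization settings; the snippet is illustrative).

```cpp
struct Widget {
    virtual int area() const { return 0; }
};

// `final`: no subclass can override area() beyond this point.
struct Button final : Widget {
    int area() const override { return 42; }
};

int measure(const Button* b) {
    // Static type is Button, and Button is final, so the compiler can
    // prove the call target and emit a direct (often inlined) call.
    return b->area();
}
```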
Common misconceptions (quick reset)
- “Dynamic dispatch = dynamic typing.” No: interfaces/virtuals in static languages are dynamic dispatch.
- “Dynamic dispatch is slow by default.” Often the steady-state cost is small, especially with monomorphic ICs or successful devirtualization.
- “The JIT can’t inline virtual calls.” It can—via profiling + guarded inlining.
- “Multiple dispatch is just overloading.” Overloading is compile-time; multiple dispatch is runtime across multiple arguments.
Practical performance considerations (micro vs macro)
At the micro level, a vtable call might be “two loads + indirect branch.” That’s not always catastrophic.
At the macro level, the bigger hit is often lost inlining, which can cascade:
- Without inlining, the compiler may miss escape analysis/scalar replacement, keeping objects on the heap.
- Missed constant propagation can prevent loop optimizations.
Tie-back to mechanisms:
- Monomorphic ICs (JS/Ruby/Python caches) and devirtualization (C++/JVM/.NET) are so valuable because they restore inlining-like benefits.
Rules of thumb:
- Monomorphic call sites are optimization-friendly.
- Megamorphic call sites tend to stay expensive.
- Stable object shapes (JS) and stable attribute layouts (Python) help caches.
Mapping languages to mechanisms (without the “language tour” repetition)
- C++: vtables (ABI-defined), multiple inheritance complexities (multiple vptrs, thunks), plus templates for static dispatch.
- Java (JVM): virtual + interface dispatch with heavy JIT profiling, ICs, guarded inlining; interface dispatch is notably runtime-specific.
- C#/.NET: virtual + interface dispatch via stubs/caches; JIT devirtualization when types are known or profiled.
- JavaScript: shapes/hidden classes + prototype-chain lookup; ICs cache receiver map + prototype validity.
- Python (CPython): dict/MRO/descriptor lookup with bytecode inline caches (3.11+) guarded by dict versioning.
- Ruby: dynamic lookup with inline caches (runtime-dependent, but similar principles).
- Rust:
dyn Traitfat pointers + vtables (incl. drop glue); generics monomorphization for static dispatch. - Swift: witness tables; generics often specialize (static-ish), existentials typically dispatch through runtime metadata.
- Haskell (GHC): dictionary passing plus inlining/specialization often eliminates overhead.
- Julia / CLOS: multiple dispatch with caches; cold-path search + hot-path cached resolution.
Choosing the right dispatch model (design, not just runtime)
- Prefer static dispatch when predictability and peak performance matter and flexibility isn’t required.
- Prefer single dispatch for classic OOP encapsulation and substitutability.
- Prefer multiple dispatch when behavior naturally depends on multiple runtime types (and you want to avoid manual double-dispatch).
- Prefer dictionary/witness-table designs when you want polymorphism without inheritance and want specialization to erase overhead.
Concluding checklist (actionable)
When performance matters, aim to:
- Keep hot call sites monomorphic (fewer runtime types at the same call site).
- Use sealed/final where appropriate (helps devirtualization in many ecosystems).
- Prefer generics/monomorphization when the abstraction is performance-critical (e.g., Rust generics, Swift generics, Haskell specialization).
- Use trait objects / existentials intentionally when you need runtime heterogeneity.
- Stabilize object shapes (JS) and avoid mutating class/instance dictionaries in hot paths (Python) to keep caches valid.
Summary
Dynamic dispatch is runtime selection of behavior based on runtime information, enabling polymorphism and extensibility. The core mechanisms are remarkably reusable across ecosystems:
- Vtables: predictable indirect calls; great when devirtualized
- Interface dispatch variants: secondary tables/stubs + caches; runtime-specific
- Dynamic lookup + inline caches: shapes/maps + prototype/MRO rules + caching
- Dictionary passing: carry operations explicitly; often optimized away via specialization
- Multimethod caches: cold-path method search with hot-path cached resolution
Understanding which mechanism you’re actually using—and what optimizations it enables—helps you write faster code, design better abstractions, and explain why similar-looking patterns behave very differently across languages.