Tier 6 · Constellation Level

Mac Cluster

[Diagram: three or more M4 Mac Minis, plus an optional extra Mac, connected via a Gigabit switch]

Pooled Apple Silicon at Scale

Multiple Mac Minis networked together — each contributing its unified memory and Apple GPU cores to a shared inference pool. Run the largest open-source models and serve them to an entire organisation, privately and without cloud costs.

150–400+ tokens/second · pooled Apple Silicon
405B+ model parameters · largest open-source LLMs
Concurrent users · local network serving
Starting from ~$5,000 · scales with number of nodes
Built for: organisations · schools · research teams · anyone needing private LLM serving for multiple users simultaneously

How the cluster works

Node 1 (Primary)

Orchestrates inference, handles API requests

Node 2

Contributes unified memory + GPU cores to pool

Node 3+

Each additional node adds its unified memory and GPU cores to the pool

Gigabit Switch

Low-latency inter-node communication
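One way to realise this topology is llama.cpp's RPC backend: each worker node exposes its memory and GPU over the network, and the primary node offloads model layers across them. The sketch below assumes two hypothetical worker addresses (192.168.1.11 and .12), a chosen port of 50052, and a GGUF model file on the primary; adapt these to your own network.

```shell
# On each worker node (Node 2, Node 3, …):
# expose this machine's unified memory and GPU to the cluster.
rpc-server -H 0.0.0.0 -p 50052

# On the primary node (Node 1):
# serve the model, spreading layers across the workers listed in --rpc,
# and accept API requests from the local network on port 8080.
llama-server -m ./model.gguf \
  --rpc 192.168.1.11:50052,192.168.1.12:50052 \
  -ngl 99 --host 0.0.0.0 --port 8080
```

The Gigabit switch matters here: layer activations cross the wire on every token, so inter-node latency directly bounds pooled throughput.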

Specifications (per node)

Unit: Apple Mac Mini M4 or M4 Pro
Memory per node: 24–64 GB unified (pooled across cluster)
Inference speed: ~40–80 tok/s per node · 150–400+ tok/s pooled
Models supported: up to 405B+ parameters (across full cluster)
Networking: Gigabit Ethernet · low-latency cluster mode
Power per node: ~8–35 W (exceptionally efficient)
OS: macOS (latest) · llama.cpp cluster stack
Minimum nodes: 2 (expandable)
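To size a cluster before buying, you can estimate how many nodes are needed for the model weights alone. The sketch below uses illustrative assumptions: ~4.5 bits per weight for a typical 4-bit quantisation, and 75% of each node's RAM usable after macOS and KV-cache headroom. Real requirements vary with quantisation format and context length.

```python
import math

def nodes_needed(params_b: float, bits_per_weight: float,
                 node_ram_gb: float, usable_fraction: float = 0.75) -> int:
    """Estimate how many nodes must pool memory to hold the weights.

    params_b        model size in billions of parameters
    bits_per_weight quantisation width (assumed ~4.5 for 4-bit quants)
    node_ram_gb     unified memory per node
    usable_fraction share of RAM left for weights (assumed, not measured)
    """
    weights_gb = params_b * bits_per_weight / 8  # billions of params -> GB
    usable_per_node = node_ram_gb * usable_fraction
    return math.ceil(weights_gb / usable_per_node)

# A 405B model at ~4.5 bits/weight is ~228 GB of weights;
# with 64 GB nodes (~48 GB usable each), that calls for 5 nodes.
print(nodes_needed(405, 4.5, 64))  # → 5
```

The same arithmetic explains the two-node minimum: a 70B model at 4-bit quantisation (~40 GB of weights) already overflows a single 24 GB node but fits comfortably across two or three.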
Design your cluster