Pooled Apple Silicon at Scale
Multiple Mac Minis networked together — each contributing its unified memory and Apple GPU cores to a shared inference pool. Run the largest open-source models and serve them to an entire organisation, privately and without cloud costs.
- 150–400+ tokens/second — pooled Apple Silicon throughput
- 405B+ model parameters — runs the largest open-source LLMs
- Concurrent users — as many as the local network and cluster throughput can carry
- Starting from ~$5,000 — cost scales with the number of nodes
Built for: organisations · schools · research teams · anyone needing private LLM serving for multiple users simultaneously
How the cluster works
- Node 1 (Primary) — orchestrates inference and handles API requests
- Node 2 — contributes its unified memory and GPU cores to the pool
- Node 3+ — each additional node adds memory and compute to the pool
- Gigabit switch — low-latency inter-node communication
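In practice, a pool like this can be assembled with llama.cpp's RPC backend: each secondary node runs an rpc-server worker, and the primary launches llama-server pointed at those workers so model layers are spread across the pooled memory. Below is a minimal launch sketch for the primary node; the IP addresses, port, and model path are placeholders, not values from this page.

```python
# Minimal launch sketch for the primary node, assuming llama.cpp's
# rpc-server is already running on each secondary node, e.g.:
#   rpc-server --host 0.0.0.0 --port 50052
# IPs, port, and model path below are hypothetical placeholders.
import subprocess

WORKERS = ["192.168.1.11:50052", "192.168.1.12:50052"]  # Node 2, Node 3

subprocess.run([
    "llama-server",
    "-m", "models/model.gguf",     # GGUF model on the primary's disk
    "--rpc", ",".join(WORKERS),    # offload layers across pooled workers
    "-ngl", "99",                  # push all layers to the GPU backends
    "--host", "0.0.0.0",           # serve to the whole local network
    "--port", "8080",              # OpenAI-compatible HTTP endpoint
])
```

Any machine on the LAN can then send requests to the primary node's port 8080.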
- Pooled Apple Silicon — unified memory shared across nodes via llama.cpp's RPC backend
- Serve models to your whole team or organisation over the local network
- Extremely power efficient — ~8–35 W per node
- Silent, desktop-sized — no special server room needed
- Start with 2 nodes, expand anytime
- OpenAI-compatible API — drop-in for existing tools and SDKs
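Because the endpoint speaks the OpenAI wire format, existing client libraries work unchanged. Here is a minimal sketch using the official openai Python package; the host address and model name are placeholders for whatever the cluster actually serves.

```python
# Point the stock OpenAI client at the cluster's primary node.
# The base_url host and the model name are hypothetical placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="http://192.168.1.10:8080/v1",  # primary node on the LAN
    api_key="not-needed",                    # local server ignores the key
)

response = client.chat.completions.create(
    model="llama-3.1-405b-q4",  # placeholder: whatever model the cluster loads
    messages=[{"role": "user", "content": "Summarise our onboarding doc."}],
)
print(response.choices[0].message.content)
```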
Specifications (per node)
| Specification | Detail |
|---|---|
| Unit | Apple Mac Mini (M4 or M4 Pro) |
| Memory per node | 24–64 GB unified (pooled across cluster) |
| Inference speed | ~40–80 tok/s per node · 150–400+ tok/s pooled |
| Models supported | Up to 405B+ parameters (across the full cluster) |
| Networking | Gigabit Ethernet · low-latency RPC between nodes |
| Power per node | ~8–35 W (exceptionally efficient) |
| OS | macOS (latest) · llama.cpp RPC stack |
| Minimum nodes | 2 (expandable) |
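As a rough sizing check for the 405B+ claim: at ~4-bit quantization, weights take about 0.5 bytes per parameter, plus headroom for KV cache and runtime buffers. The figures in the sketch below are illustrative assumptions, not measured values.

```python
# Back-of-envelope node count for a quantized model. The 0.5 bytes/param
# (Q4) and 20% overhead figures are assumptions; real usage varies with
# context length and quantization scheme.
import math

def nodes_needed(params_billion: float, node_gb: int = 64,
                 bytes_per_param: float = 0.5, overhead: float = 1.2) -> int:
    weights_gb = params_billion * bytes_per_param   # weight memory in GB
    total_gb = weights_gb * overhead                # + KV cache / buffers
    return math.ceil(total_gb / node_gb)

print(nodes_needed(405))  # -> 4 nodes of 64 GB for a 405B model at ~Q4
```

By that estimate, a 405B model at 4-bit fits in roughly four 64 GB nodes, consistent with pooling memory across the cluster.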