The Prometheus server offers up to 128 terabytes of memory, exceeding the capacity of Nvidia's DGX B300 by 60 times. This scale targets the memory-bound nature of token generation in large language models. By widening the data pipeline, the hardware aims to eliminate the primary bottleneck slowing LLM inference performance.