ParaRNN enables parallel training of nonlinear recurrent neural networks, removing the sequential bottleneck that previously limited their scale. The framework lets nonlinear RNNs grow to billions of parameters while keeping the fixed-size recurrent state that gives RNNs a smaller inference memory footprint than Transformers, whose key-value cache grows with context length. For the Apple researchers behind the work, this makes RNN-based models a viable alternative for developers deploying LLMs on resource-constrained hardware.
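
The underlying idea is to treat the whole recurrence h_t = f(h_{t-1}, x_t) as one system of equations and solve it across all timesteps jointly, instead of stepping through time one position at a time. The sketch below is a minimal NumPy illustration of that parallel-in-time idea, assuming a toy tanh recurrence that is contractive enough for a simple fixed-point sweep to converge; it is not ParaRNN's actual solver, and every name in it is hypothetical.

```python
import numpy as np

# Toy nonlinear RNN: h_t = tanh(W @ h_{t-1} + U @ x_t).
# Illustration only: instead of stepping through time, all timesteps are
# updated together in repeated sweeps, each of which is parallel across t.
# This is a generic fixed-point scheme, not ParaRNN's specific algorithm.

rng = np.random.default_rng(0)
T, d = 64, 8
W = rng.normal(scale=0.3 / np.sqrt(d), size=(d, d))  # small scale keeps the map contractive
U = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))
x = rng.normal(size=(T, d))

def sequential(x):
    """Reference: the usual one-step-at-a-time recurrence."""
    h = np.zeros(d)
    out = []
    for t in range(T):
        h = np.tanh(W @ h + U @ x[t])
        out.append(h)
    return np.stack(out)

def parallel_fixed_point(x, n_iter=40):
    """All timesteps updated at once; each sweep is parallel across t."""
    h = np.zeros((T, d))
    for _ in range(n_iter):
        h_prev = np.vstack([np.zeros((1, d)), h[:-1]])  # shifted states h_{t-1}
        h = np.tanh(h_prev @ W.T + x @ U.T)
    return h

h_seq = sequential(x)
h_par = parallel_fixed_point(x)
print("max abs difference:", np.abs(h_seq - h_par).max())
```

The fixed-point sweeps converge to the same hidden states as the sequential loop, but each sweep is a batched operation over the whole sequence, which is the property that lets such solvers exploit parallel hardware during training.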