Apple researchers developed ParaRNN, a method that parallelizes the training of nonlinear Recurrent Neural Networks across the sequence dimension. This removes the sequential training bottleneck that had kept nonlinear RNNs from scaling to the billions of parameters typical of modern language models. By pairing the cheap, constant-state inference of RNNs with Transformer-style parallel training, the architecture offers a viable, low-memory alternative to Transformers for LLM deployment.
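For the technically curious, the mechanism (per the paper) is to recast the RNN's step-by-step recurrence as one large system of nonlinear equations over the whole trajectory and solve it with Newton's method; each Newton step linearizes the recurrence, and the resulting linear recurrence can be evaluated with an associative parallel scan instead of a sequential loop. The NumPy sketch below is an illustrative reconstruction of that idea on a toy tanh cell, not Apple's implementation; every name, dimension, and the Hillis-Steele scan are assumptions made for the example.

```python
import numpy as np

# Toy sketch of the ParaRNN idea (illustrative, not Apple's code):
# the nonlinear RNN  h_t = tanh(W h_{t-1} + U x_t)  is rewritten as a
# root-finding problem  F_t(H) = h_t - f(h_{t-1}, x_t) = 0  over the whole
# trajectory H = (h_1, ..., h_T). Each Newton step linearizes this into
#   h_t = J_t h_{t-1} + b_t,
# a *linear* recurrence whose combine rule is associative, so it can be
# evaluated with a parallel scan rather than a sequential loop.

rng = np.random.default_rng(0)
T, d = 64, 4                               # toy sequence length and state size
W = rng.normal(scale=0.3, size=(d, d))
U = rng.normal(scale=0.3, size=(d, d))
X = rng.normal(size=(T, d))
h0 = np.zeros(d)

# Reference: the sequential rollout that ParaRNN is designed to avoid.
H_seq, h = np.zeros((T, d)), h0
for t in range(T):
    h = np.tanh(W @ h + U @ X[t])
    H_seq[t] = h

def parallel_scan(J, b):
    """Inclusive Hillis-Steele scan for h_t = J_t h_{t-1} + b_t.
    Each doubling round is a batch of independent matrix products,
    i.e. parallel across t; the Python loop only simulates that."""
    P, c = J.copy(), b.copy()
    s = 1
    while s < len(P):
        Pn, cn = P.copy(), c.copy()
        Pn[s:] = np.einsum('tij,tjk->tik', P[s:], P[:-s])
        cn[s:] = np.einsum('tij,tj->ti', P[s:], c[:-s]) + c[s:]
        P, c, s = Pn, cn, 2 * s
    return P, c                            # h_t = P_t h_0 + c_t

# Newton iteration over the whole trajectory at once. For this triangular
# system it converges in at most T steps, and typically in far fewer.
H = np.zeros((T, d))
for _ in range(T):
    H_prev = np.vstack([h0, H[:-1]])       # h_{t-1} under the current guess
    F = np.tanh(H_prev @ W.T + X @ U.T)    # f(h_{t-1}, x_t)
    J = (1.0 - F**2)[:, :, None] * W       # J_t = diag(tanh') @ W
    b = F - np.einsum('tij,tj->ti', J, H_prev)
    P, c = parallel_scan(J, b)
    H_new = P @ h0 + c
    if np.max(np.abs(H_new - H)) < 1e-10:
        H = H_new
        break
    H = H_new

assert np.allclose(H, H_seq)               # matches the sequential rollout
```

The payoff is depth: the scan needs only O(log T) parallel rounds where the naive rollout needs T strictly sequential steps, which is what makes training over long sequences practical on modern accelerators.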