In recent years, autoregressive Transformers have been the workhorse of generative modeling. These models generate a sample one element at a time, whether pixels in an image or characters in text, with each new element conditioned on everything produced before it. That step-by-step conditioning is what keeps the generated output coherent.
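To make that concrete, here is a minimal sketch of an autoregressive sampling loop in JAX. The `model_apply` function and `params` are hypothetical stand-ins for any decoder that maps the elements generated so far to next-element logits; they are illustrative assumptions, not part of the released code.

```python
import jax
import jax.numpy as jnp

def sample_autoregressively(params, model_apply, prompt, num_steps, rng):
    """Generate `num_steps` new elements, one at a time.

    `model_apply(params, tokens)` is a hypothetical decoder returning
    next-element logits of shape [len(tokens), vocab_size].
    """
    tokens = prompt  # integer ids generated (or given) so far, shape [seq_len]
    for _ in range(num_steps):
        logits = model_apply(params, tokens)                        # [seq_len, vocab_size]
        rng, step_rng = jax.random.split(rng)
        next_token = jax.random.categorical(step_rng, logits[-1])   # sample the final position
        tokens = jnp.concatenate([tokens, next_token[None]])        # append and repeat
    return tokens
```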
Standard Transformers struggle with long sequences because the cost of self-attention grows quadratically with sequence length, which in practice restricts them to inputs of around 2,048 elements. Our Perceiver models sidestep this by using cross-attention to encode the input into a small latent space, decoupling compute from input length. This makes them a scalable, efficient alternative to standard Transformers, excelling on tasks with inputs of up to 100,000 elements.
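As a rough illustration of that encoding step (a sketch under assumed shapes and sizes, not the released implementation), the single-head cross-attention below lets a small latent array query a much longer input, so the cost scales with the product of the two lengths rather than with the square of the input length.

```python
import jax
import jax.numpy as jnp

def cross_attend(latents, inputs):
    """Single-head cross-attention: the latents query the inputs.

    Cost is num_latents * input_len rather than input_len ** 2, so the
    input can be far longer than standard self-attention could handle.
    """
    scores = latents @ inputs.T / jnp.sqrt(latents.shape[-1])  # [num_latents, input_len]
    weights = jax.nn.softmax(scores, axis=-1)
    return weights @ inputs                                    # [num_latents, d_model]

# Toy sizes for a quick run; the real models scale to inputs of ~100,000 elements.
key_inputs, key_latents = jax.random.split(jax.random.PRNGKey(0))
inputs = jax.random.normal(key_inputs, (16_384, 64))   # long input sequence
latents = jax.random.normal(key_latents, (512, 64))    # small latent array (assumed size)
encoded = cross_attend(latents, inputs)                # [512, 64]
```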
A significant advancement in this space is the Perceiver AR model, which makes this latent approach autoregressive. By aligning each latent with one of the final elements of the input and applying causal masking to the cross-attention, Perceiver AR preserves the ordering that autoregressive generation requires while attending to inputs up to 50 times longer than standard Transformers handle in practice, all while remaining straightforward to deploy.
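The sketch below adds the kind of causal mask this implies, with each latent aligned to one of the final input positions so it can only attend to elements at or before that position. The alignment follows the paper's description, but the code itself is an illustrative assumption, not the released implementation.

```python
import jax
import jax.numpy as jnp

def causal_cross_attend(latents, inputs):
    """Masked cross-attention where latent i is aligned with input position
    input_len - num_latents + i and may only attend to input elements at or
    before that position, preserving the autoregressive ordering."""
    num_latents, input_len = latents.shape[0], inputs.shape[0]
    scores = latents @ inputs.T / jnp.sqrt(latents.shape[-1])   # [num_latents, input_len]
    latent_positions = jnp.arange(input_len - num_latents, input_len)[:, None]
    input_positions = jnp.arange(input_len)[None, :]
    mask = input_positions <= latent_positions                  # True where attention is allowed
    scores = jnp.where(mask, scores, -1e30)                     # block future positions
    weights = jax.nn.softmax(scores, axis=-1)
    return weights @ inputs                                     # [num_latents, d_model]
```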
Perceiver AR's scalability shows up directly in its results: it achieves state-of-the-art performance on long-sequence benchmarks including ImageNet, PG-19, and MAESTRO. The longer context it can use also translates into better long-context modeling, surpassing dedicated long-context models such as Transformer-XL even at comparable scales.
One exciting application of Perceiver AR is music generation: prompted with a sequence of notes, the model produces continuations that hold onto melody and structure over long passages. This kind of musical coherence showcases the versatility of the Perceiver AR architecture.
To delve deeper into the capabilities of Perceiver AR, consider exploring the following:
- Access the JAX code for training Perceiver AR on GitHub
- Read the research paper on arXiv
- Watch the spotlight presentation at ICML 2022
For more insights on Perceiver AR, visit the Google Magenta blog post featuring additional music examples!