[2212.09748] Scalable Diffusion Models with Transformers
In addition to possessing good scalability properties, our largest DiT-XL/2 models outperform all prior diffusion models on the class-conditional ImageNet 512x512 and 256x256 benchmarks, achieving a state-of-the-art FID of 2.27 on the latter.