Supplementary Material: Implementation and Experiments for GAU-based Model

The approach centers on a performant layer named GAU (Gated Attention Unit), which combines the Attention layer and the FFN into a single unit. In this paper, some implementation details are re-analyzed both theoretically and practically. We then propose a novel GAU-based model and pre-train it on a Chinese corpus.
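To make "combines the Attention layer and FFN" concrete, here is a minimal NumPy sketch of one GAU layer. It assumes the original GAU formulation, not necessarily the variant used in this model. In that formulation, a gated FFN-style pair of branches U and V (expansion e = 2d) is mixed by a single-head attention map. The attention's Q and K are cheap per-dimension scale/offset transforms of one shared low-dimensional Z (s = 128), and its weights are squared ReLU scaled by 1/n. The helper names `init_gau_params` and `gau`, the random initialisation, and the omission of normalization, position bias, and the residual connection are all illustrative assumptions.

```python
import numpy as np

def silu(x):
    # SiLU / Swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def init_gau_params(d, e=None, s=128, seed=0):
    """Hypothetical random initialisation for one GAU layer."""
    e = 2 * d if e is None else e  # expansion size, assumed e = 2d
    rng = np.random.default_rng(seed)
    return {
        "W_u": rng.normal(0.0, 1.0 / np.sqrt(d), (d, e)),  # gate branch
        "W_v": rng.normal(0.0, 1.0 / np.sqrt(d), (d, e)),  # value branch
        "W_z": rng.normal(0.0, 1.0 / np.sqrt(d), (d, s)),  # shared Z
        # per-dimension scale/offset deriving Q (row 0) and K (row 1) from Z
        "gamma": rng.normal(1.0, 0.02, (2, s)),
        "beta": np.zeros((2, s)),
        "W_o": rng.normal(0.0, 1.0 / np.sqrt(e), (e, d)),  # output projection
    }

def gau(x, p):
    """One GAU layer (sketch): x has shape (n, d); returns shape (n, d).

    Normalization, position bias and the residual connection are omitted.
    """
    n = x.shape[0]
    u = silu(x @ p["W_u"])  # (n, e): gate, playing the FFN's role
    v = silu(x @ p["W_v"])  # (n, e): values to be mixed across tokens
    z = silu(x @ p["W_z"])  # (n, s): shared low-dimensional representation
    q = z * p["gamma"][0] + p["beta"][0]
    k = z * p["gamma"][1] + p["beta"][1]
    # single-head attention: squared ReLU of scores scaled by 1/n, no softmax
    a = np.maximum(q @ k.T / n, 0.0) ** 2  # (n, n)
    # element-wise gate on the attended values, then project back to d
    return (u * (a @ v)) @ p["W_o"]
```

Because U multiplies the attention output element-wise, one layer does the work of both the FFN (the gated U/V branches) and attention (the token mixing via A). Setting `W_u` to zero therefore zeroes the whole output, since SiLU(0) = 0.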