Google AI Proposes Transformer XL With 80 Percent Longer Dependency Than Recurrent Neural Networks

A team of Google AI researchers, in conjunction with Carnegie Mellon University, announced the details of their newly proposed architecture, named Transformer-XL. It is designed to advance natural language understanding beyond a fixed-length context using self-attention. In a fixed-length-context model, a long text sequence is truncated into fixed-length segments of a few hundred characters, and each segment is processed separately.

Transformer-XL combines two techniques: a segment-level recurrence mechanism and a relative positional encoding scheme. The segment-level recurrence mechanism addresses the limitations of using a fixed-length context. During training, the hidden state sequences computed for the preceding segment are fixed and cached. They are then reused as an extended context when the model processes the next segment. Although segment-level recurrence is effective, reusing the hidden states raises a technical challenge: keeping the positional information coherent across segments. The relative positional encoding scheme addresses this by encoding the distance between positions rather than their absolute positions. As a result, Transformer-XL achieved new state-of-the-art results on a variety of major language modeling (LM) benchmarks. It is the first self-attention model to achieve better results than recurrent neural networks (RNNs) on both character-level and word-level language modeling.
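The caching idea described above can be illustrated with a minimal sketch. It is not the researchers' implementation: it assumes a single attention head and a single layer, omits positional encodings entirely, and uses hypothetical names (`attend_with_memory`, `process_sequence`, `mem_len`) chosen for illustration only. It shows how states cached from the previous segment can be prepended as extra context for the next one.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend_with_memory(segment, memory, w_q, w_k, w_v):
    """Single-head causal attention over [memory; segment].

    segment: (seg_len, d) states of the current segment
    memory:  (mem_len, d) cached states from earlier segments (may be empty)
    """
    seg_len, mem_len = segment.shape[0], memory.shape[0]
    context = np.concatenate([memory, segment], axis=0)
    q = segment @ w_q          # queries come only from the current segment
    k = context @ w_k          # keys and values also cover the cached memory
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Causal mask: every position sees all of the memory, but only
    # itself and earlier positions inside the current segment.
    future = np.triu(np.ones((seg_len, seg_len), dtype=bool), k=1)
    mask = np.concatenate(
        [np.zeros((seg_len, mem_len), dtype=bool), future], axis=1
    )
    scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v

def process_sequence(states, seg_len, mem_len, w_q, w_k, w_v):
    """Process a long sequence segment by segment, reusing cached states."""
    d = states.shape[1]
    memory = np.zeros((0, d))
    outputs = []
    for start in range(0, len(states), seg_len):
        segment = states[start:start + seg_len]
        outputs.append(attend_with_memory(segment, memory, w_q, w_k, w_v))
        # Cache the most recent mem_len states as fixed values for the next
        # segment (in a trainable model no gradient would flow through them).
        combined = np.concatenate([memory, segment], axis=0)
        memory = combined[max(0, len(combined) - mem_len):]
    return np.concatenate(outputs, axis=0)
```

With `mem_len=0` each segment is processed in isolation, which corresponds to the fixed-length setting. With a non-zero `mem_len`, positions in a later segment can attend to states from the earlier one without those states being recomputed.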

Transformer-XL has three advantages. First, it learns dependencies that are about 80 percent longer than those of RNNs and 450 percent longer than those of vanilla Transformers. Second, it is up to 1,800+ times faster than a vanilla Transformer during evaluation on language modeling tasks, because no re-computation is required. Finally, Transformer-XL achieves better perplexity on long sequences, thanks to long-term dependency modeling, and on short sequences, by resolving the context fragmentation problem.