[R] Adaptive Methods in Transformers (Paper explained)

Found a really useful resource (https://youtu.be/_pYxa50HTBw) for anyone interested in adaptive methods in Transformers. It covers adaptive span, sparse mappings, and LayerDrop in detail. The implementation can be found here: https://github.com/prajjwal1/adaptive_transformer

submitted by /u/guardianultra