XLNet: Generalized Autoregressive Pretraining for Language Understanding
Department of Cybernetics

Jindřich Prokop presents XLNet: Generalized Autoregressive Pretraining for Language Understanding

On 2019-11-05 14:00:00 at G205, Karlovo náměstí 13, Praha 2
A reading group looking at the XLNet [4] neural language model and its
potential applications in:
❄ medical QA
❄ chatbots
❄ other areas of interest to the participants
will be held on 05/11 at 14:00 in the Karlovo náměstí compound, building G,
room 205. The paper will be presented by Jindřich Prokop. You are most welcome
to attend. If you plan to attend, please confirm your attendance by email to
prokojin@fel.cvut.cz. A more detailed rationale for the reading group follows.
--------------------------------------------------------------------------------

In recent years, the transformer architecture [1] has established itself as the
basis for state-of-the-art pre-trained neural language models. Several
improvements have since been made, notably:
❄ Transformer-XL [2], which re-introduced recurrence,
❄ the BERT [3] approach, which uses a different pre-training objective
(denoising auto-encoding) than previous approaches (unidirectional
auto-regression), capitalizing on context from both sides,
❄ and XLNet [4], a transformer model using generalized auto-regression, which:

1) was an attempt to combine the advantages of the two preceding approaches
while overcoming their deficiencies,
2) in June this year achieved SotA results on 18 tasks and outperformed BERT
on 20 tasks,
3) provided strong arguments for why its generalized AR approach can yield
better results than BERT.
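The difference between the three pre-training objectives above can be sketched on a toy example. This is a conceptual illustration only, not the XLNet code: the `predict` function is a hypothetical stand-in for a trained model, and the token positions and mask choices are made up for the example.

```python
import math
import random

# Toy 4-token "sentence"; positions 0..3.
tokens = ["the", "cat", "sat", "down"]

def predict(target_pos, visible_positions):
    """Dummy stand-in for a neural model: the probability assigned to the
    token at target_pos given the tokens at visible_positions.
    Here it is simply uniform over a 4-word toy vocabulary."""
    return 0.25

# 1) Unidirectional AR (e.g. a left-to-right LM):
#    log p(x) = sum_t log p(x_t | x_<t)
ar_ll = sum(math.log(predict(t, list(range(t))))
            for t in range(len(tokens)))

# 2) BERT-style denoising auto-encoding: mask some positions and predict
#    each masked token independently from the remaining two-sided context.
masked = [1, 3]
context = [i for i in range(len(tokens)) if i not in masked]
bert_ll = sum(math.log(predict(t, context)) for t in masked)

# 3) XLNet's generalized AR (permutation LM): an AR factorization under a
#    random order z, so that in expectation every token gets to condition
#    on context from both sides while keeping the product-of-conditionals
#    form (no independence assumption between predicted tokens).
z = list(range(len(tokens)))
random.shuffle(z)
xlnet_ll = sum(math.log(predict(z[t], z[:t]))
               for t in range(len(tokens)))

print(ar_ll, bert_ll, xlnet_ll)
```

Note that objective 3 keeps the full chain-rule factorization of objective 1 (it covers all tokens, unlike BERT's masked subset) while the shuffled order gives it BERT-like bidirectional context.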

Although most of the XLNet results were subsequently surpassed by
better-tuned BERT variants, the arguments given by the XLNet authors seem to
hold nevertheless, and a better-tuned XLNet may yet reclaim the lead. The
reading group aims at building an understanding of XLNet's components and
discussing applications relevant to the participants. If successful, further
meetings can look into the BERT variants (RoBERTa, ALBERT) that are currently
SotA on some tasks (RACE, SQuAD 2.0).
--------------------------------------------------------------------------------

[1] Vaswani et al.: Attention Is All You Need [https://arxiv.org/abs/1706.03762v5]
[2] Dai et al.: Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context [https://arxiv.org/abs/1901.02860v3]
[3] Devlin et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding [https://arxiv.org/abs/1810.04805v2]
[4] Yang et al.: XLNet: Generalized Autoregressive Pretraining for Language Understanding [https://arxiv.org/abs/1906.08237v1]
Responsible person: Petr Pošík