Part 2: Abstractions, Design, and Testing

Representing text as features: Tokenizers, TextFields, and TextFieldEmbedders

A deep dive into AllenNLP's core abstraction: how exactly we represent textual inputs, both on the data side and the model side.

1. Language to features

2. Tokenizers and TextFields

3. TokenIndexers

4. The model side: TextFieldEmbedders

5. Coordinating the three parts

6. Using pretrained contextualizers and embeddings

7. Doing word-level modeling with a wordpiece transformer

8. How padding and masking work

9. Interacting with TextField outputs in your model code

10. How to upload transformer weights and tokenizers to HuggingFace


  1. See the BucketBatchSampler for AllenNLP’s built-in way to organize batches