Part 2: Abstractions, Design, and Testing

Representing text as features: Tokenizers, TextFields, and TextFieldEmbedders

A deep dive into AllenNLP's core abstraction: how exactly we represent textual inputs, both on the data side and the model side.

1. Language to features

2. Tokenizers and TextFields

3. TokenIndexers

4. The model side: TextFieldEmbedders

5. Coordinating the three parts

6. Using pretrained contextualizers and embeddings

7. Doing word-level modeling with a wordpiece transformer

8. How padding and masking work

9. Interacting with TextField outputs in your model code


  1. See the BucketBatchSampler for AllenNLP’s built-in way to organize batches.