Probing & BERTology

Understanding linguistic structure in pre-trained language models, such as ELMo and BERT.

Pre-trained language models have led to dramatic advances in NLP in recent years, handily outperforming pipelined systems and non-contextual embeddings on most common tasks. What makes them so powerful? As a model like BERT learns to fill in blanks or predict the next word, what kind of linguistic or world knowledge does it acquire? How is this information organized: does the model learn the same rules a human might, or develop its own idiosyncratic understanding? And how is this information used when such a model is asked to classify text, answer questions, or perform other downstream NLP tasks?

For an excellent primer on what we do (and don't) know in this space, see A Primer in BERTology: What We Know About How BERT Works (Rogers et al., 2020).
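The core technique in much of this work is "probing" (diagnostic classification): a small classifier is trained on frozen model representations to test whether they encode a particular linguistic property. Below is a minimal sketch, assuming the Hugging Face transformers and scikit-learn libraries; the layer index, mean pooling, and toy binary task are illustrative choices of mine, not taken from any of the papers listed.

```python
# Minimal probing sketch: train a simple classifier on frozen BERT activations.
# The layer choice, pooling, and toy task below are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()  # the base model is never fine-tuned during probing

# Toy labeled data for a hypothetical binary property.
sentences = ["The cat sat on the mat.", "Colorless green ideas sleep furiously."]
labels = [0, 1]

features = []
with torch.no_grad():
    for text in sentences:
        inputs = tokenizer(text, return_tensors="pt")
        outputs = model(**inputs)
        # Take one frozen layer (here layer 8) and mean-pool over tokens.
        layer = outputs.hidden_states[8]              # shape [1, seq_len, hidden]
        features.append(layer.mean(dim=1).squeeze(0).numpy())

# The probe itself: a lightweight classifier over the frozen representations.
# High probe accuracy is read as evidence that the representations encode the
# probed property, since no information is added by updating BERT's weights.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print(probe.score(features, labels))
```

In practice, probing studies repeat this across layers and tasks and compare against baselines to separate what the representations encode from what the probe itself can learn.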

  1. Can Generative Multimodal Models Count to Ten?
    Sunayana Rane, Alexander Ku, Jason Michael Baldridge, Ian Tenney, Thomas L. Griffiths, and Been Kim
    In Proceedings of the Annual Meeting of the Cognitive Science Society, 2024
  2. The MultiBERTs: BERT Reproductions for Robustness Analysis
    Thibault Sellam, Steve Yadlowsky, Ian Tenney, Jason Wei, Naomi Saphra, Alexander D’Amour, Tal Linzen, Jasmijn Bastings, Iulia Turc, Jacob Eisenstein, Dipanjan Das, and Ellie Pavlick
    In International Conference on Learning Representations (spotlight), 2022
  3. Asking without Telling: Exploring Latent Ontologies in Contextual Representations
    Julian Michael, Jan A. Botha, and Ian Tenney
    In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
  4. Do Language Embeddings capture Scales?
    Xikun Zhang, Deepak Ramachandran, Ian Tenney, Yanai Elazar, and Dan Roth
    In Findings of the Association for Computational Linguistics: EMNLP, 2020
  5. What Happens To BERT Embeddings During Fine-tuning?
    Amil Merchant, Elahe Rahimtoroghi, Ellie Pavlick, and Ian Tenney
    In Proceedings of the Third BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, 2020
  6. BERT Rediscovers the Classical NLP Pipeline
    Ian Tenney, Dipanjan Das, and Ellie Pavlick
    In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019
  7. What do you learn from context? Probing for sentence structure in contextualized word representations
    Ian Tenney, Patrick Xia, Berlin Chen, Alex Wang, Adam Poliak, R Thomas McCoy, Najoung Kim, Benjamin Van Durme, Sam Bowman, Dipanjan Das, and Ellie Pavlick
    In International Conference on Learning Representations, 2019