YouTube: Open NLP meetup #7

Slides: A Practical Introduction to Image Retrieval

Colab: MultiModalRetriever - Live coding

All the material can also be found here.


A Practical Introduction to Image Retrieval

by Sara Zanzottera from deepset

Search should not be limited to text alone. Recently, Transformer-based NLP models have started crossing the boundaries of text data and exploring the possibilities of other modalities, like tabular data, images, audio files, and more. Text-to-text generation models like GPT now have their counterparts in text-to-image models, like Stable Diffusion. But what about search? In this talk we’re going to experiment with CLIP, a text-to-image search model, to look for animals matching specific characteristics in a dataset of pictures. Does CLIP know which one is “The fastest animal in the world”?
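To make that question concrete, here is a minimal sketch of text-to-image matching with CLIP through the sentence-transformers wrapper. This is illustrative only, not code from the talk: the image file names are hypothetical placeholders for pictures in a dataset.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# Load a CLIP checkpoint via the sentence-transformers wrapper
model = SentenceTransformer("clip-ViT-B-32")

# Embed a few candidate images (file names are hypothetical)
image_names = ["cheetah.jpg", "sloth.jpg", "eagle.jpg"]
image_embeddings = model.encode([Image.open(name) for name in image_names])

# Embed the text query into the same vector space as the images
query_embedding = model.encode("The fastest animal in the world")

# Rank the images by cosine similarity to the query
scores = util.cos_sim(query_embedding, image_embeddings)[0]
for name, score in sorted(zip(image_names, scores.tolist()), key=lambda x: -x[1]):
    print(f"{name}: {score:.3f}")
```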


For the 7th OpenNLP meetup I presented the topic of Image Retrieval, a capability I recently added to Haystack in the form of the MultiModalRetriever (see the Tutorial).

The talk consists of 5 parts:

  • An introduction to the topic of Image Retrieval
  • A mention of the current SOTA model (CLIP)
  • An overview of Haystack
  • A step-by-step description of how image retrieval applications can be implemented with Haystack (a sketch of the core steps follows this list)
  • A live coding session where I start from a blank Colab notebook and build a fully working image retrieval system from the ground up, to the point where I can run queries live.
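As a rough sketch of those core steps, the snippet below follows the shape of the MultiModalRetriever tutorial: embed images and text queries with the same CLIP checkpoint, index the images, then retrieve with plain-text queries. It assumes Haystack 1.x and a local data/images folder; both are assumptions for illustration, not a verbatim transcript of the live session.

```python
import os

from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.retriever.multimodal import MultiModalRetriever

# CLIP ViT-B/32 produces 512-dimensional embeddings
document_store = InMemoryDocumentStore(embedding_dim=512)

# Queries (text) and documents (images) are embedded with the same CLIP model
retriever = MultiModalRetriever(
    document_store=document_store,
    query_embedding_model="sentence-transformers/clip-ViT-B-32",
    query_type="text",
    document_embedding_models={"image": "sentence-transformers/clip-ViT-B-32"},
)

# Index the pictures: each image path becomes a Document of type "image"
# (the ./data/images folder is an assumed location)
images = [
    Document(content=f"./data/images/{name}", content_type="image")
    for name in os.listdir("./data/images")
]
document_store.write_documents(images)
document_store.update_embeddings(retriever=retriever)

# Query with plain text; the retriever returns the best-matching pictures
results = retriever.retrieve(query="The fastest animal in the world", top_k=3)
for doc in results:
    print(doc.content, doc.score)
```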

Towards the end I briefly mention an even more advanced version of this image retrieval system, which I had no time to implement live. However, I later built a notebook implementing such a system, and you can find it here: Cheetah.ipynb

The slides were generated from the linked Jupyter notebook with `jupyter nbconvert Dec_1st_OpenNLP_Meetup.ipynb --to slides --post serve`.

This was my most popular talk to date, with almost a hundred attendees watching live.