YouTube: Open NLP meetup #7

Slides: A Practical Introduction to Image Retrieval

Colab: MultiModalRetriever - Live coding

All the material can also be found here.


A Practical Introduction to Image Retrieval

by Sara Zanzottera from deepset

Search should not be limited to text alone. Recently, Transformer-based NLP models have started crossing the boundaries of text data and exploring the possibilities of other modalities, like tabular data, images, audio files, and more. Text-to-text generation models like GPT now have their counterparts in text-to-image models, like Stable Diffusion. But what about search? In this talk we’re going to experiment with CLIP, a text-to-image search model, to look for animals matching specific characteristics in a dataset of pictures. Does CLIP know which one is “The fastest animal in the world”?
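To make that question concrete, here is a minimal sketch of text-to-image matching with CLIP through the sentence-transformers wrapper. This is illustrative only, not code from the talk: the image file names are hypothetical placeholders for pictures in a dataset.

```python
from PIL import Image
from sentence_transformers import SentenceTransformer, util

# Load a CLIP checkpoint via the sentence-transformers wrapper
model = SentenceTransformer("clip-ViT-B-32")

# Embed a few candidate images (file names are hypothetical)
image_names = ["cheetah.jpg", "sloth.jpg", "eagle.jpg"]
image_embeddings = model.encode([Image.open(name) for name in image_names])

# Embed the text query into the same vector space as the images
query_embedding = model.encode("The fastest animal in the world")

# Rank the images by cosine similarity to the query
scores = util.cos_sim(query_embedding, image_embeddings)[0]
for name, score in sorted(zip(image_names, scores.tolist()), key=lambda x: -x[1]):
    print(f"{name}: {score:.3f}")
```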


For the 7th OpenNLP meetup I presented the topic of Image Retrieval, a capability I recently added to Haystack in the form of the MultiModalRetriever (see the Tutorial).

The talk consists of 5 parts:

  • An introduction to the topic of Image Retrieval
  • A mention of the current SOTA model (CLIP)
  • An overview of Haystack
  • A step-by-step description of how image retrieval applications can be implemented with Haystack (a sketch of the core steps follows this list)
  • A live coding session where I start from a blank Colab notebook and build a fully working image retrieval system from the ground up, to the point where I can run queries live.
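As a rough sketch of those core steps, the snippet below follows the shape of the MultiModalRetriever tutorial: embed images and text queries with the same CLIP checkpoint, index the images, then retrieve with plain-text queries. It assumes Haystack 1.x and a local data/images folder; both are assumptions for illustration, not a verbatim transcript of the live session.

```python
import os

from haystack import Document
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes.retriever.multimodal import MultiModalRetriever

# CLIP ViT-B/32 produces 512-dimensional embeddings
document_store = InMemoryDocumentStore(embedding_dim=512)

# Queries (text) and documents (images) are embedded with the same CLIP model
retriever = MultiModalRetriever(
    document_store=document_store,
    query_embedding_model="sentence-transformers/clip-ViT-B-32",
    query_type="text",
    document_embedding_models={"image": "sentence-transformers/clip-ViT-B-32"},
)

# Index the pictures: each image path becomes a Document of type "image"
# (the ./data/images folder is an assumed location)
images = [
    Document(content=f"./data/images/{name}", content_type="image")
    for name in os.listdir("./data/images")
]
document_store.write_documents(images)
document_store.update_embeddings(retriever=retriever)

# Query with plain text; the retriever returns the best-matching pictures
results = retriever.retrieve(query="The fastest animal in the world", top_k=3)
for doc in results:
    print(doc.content, doc.score)
```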

Towards the end I briefly mention an even more advanced version of this image retrieval system, which I had no time to implement live. However, I later built a notebook implementing such a system, and you can find it here: Cheetah.ipynb

The slides were generated from the linked Jupyter notebook with `jupyter nbconvert Dec_1st_OpenNLP_Meetup.ipynb --to slides --post serve`.

This was my most popular talk to date, with almost a hundred attendees watching live.