Posts

March 04, 2026

Phishing AI Agents

Most LLMs are hardened against classic prompt injection attacks. But AI agents also behave like naive humans sometimes...

February 04, 2026

How does LLM memory work?

All LLMs can keep track of a short conversation. But how do they remember things long-term?

January 07, 2026

From RAG to AI Agent

A step-by-step guide to transform your RAG pipelines into effective AI agents.

December 11, 2025

What are the "experts" in Mixture-of-Experts LLMs?

And how can 8 or 16 of them cover all possible domains of expertise?

November 04, 2025

What's hybrid retrieval good for?

We've been told embedding search is strictly superior to BM25 and all other keyword-search algorithms. But keyword search still has a role in modern search pipelines.

October 29, 2025

Making sense of KV Cache optimizations, Ep. 4: System-level

Let's make sense of the zoo of system-level techniques that exist out there.

October 28, 2025

Making sense of KV Cache optimizations, Ep. 3: Model-level

Let's make sense of the zoo of model-level techniques that exist out there.

October 27, 2025

Making sense of KV Cache optimizations, Ep. 2: Token-level

Let's make sense of the zoo of token-level techniques that exist out there.

October 26, 2025

Making sense of KV Cache optimizations, Ep. 1: An overview

Let's make sense of the zoo of techniques that exist out there.

October 23, 2025

How does prompt caching work?

Nearly all inference libraries can do it for you. But what's really going on under the hood?

October 17, 2025

What is prompt caching?

Caching prompts can have an outsized impact on the cost and latency of your AI apps. But what exactly to cache and how?

October 09, 2025

Why using a reranker?

And is the added latency worth it? Let's understand what they do and how they can improve the quality of your RAG pipelines so drastically.

September 15, 2025

Trying to play "Guess Who" with an LLM

I expected a different kind of fun.

June 02, 2025

Can you really interrupt an LLM?

You never see that in the demos... why?

May 21, 2025

A simple vibecoding exercise

Can GenAI help you finish your side-projects?

May 16, 2025

Using Llama Models in the EU

The ban's terms are surprisingly little known among users of these popular "open-source" LLMs.

May 12, 2025

Beyond the hype of reasoning models: debunking three common misunderstandings

This is a teaser for my upcoming talk at ODSC East 2025, "LLMs that Think: Demystifying Reasoning Models". If you want to learn more, join the webinar!

October 30, 2024

Building Reliable Voice Bots with Open Source Tools - Part 2

A practical guide on the best techniques to build performant and cost-effective voice bots.

September 20, 2024

Building Reliable Voice Bots with Open Source Tools - Part 1

A deep look at the main challenges of building performant and cost-effective voice bots.

June 10, 2024

The Agent Compass

Agent means everything and nothing in today's GenAI landscape. Let's shed some light on this topic.

May 06, 2024

Generating creatures with Teranoptia

Having fun with fonts doesn’t always mean obsessing over kerning and ligatures. Sometimes, writing text is not even the point!

April 29, 2024

RAG, the bad parts (and the good!)

A summary of my recent talk at ODSC East about RAG, just in case you haven't heard enough of it already.

April 14, 2024

Explain me LLMs like I'm five: build a story to help anyone get the idea

Let's explore a high-level way to explain clearly to the average pedestrian what LLMs are good for, and help them reason about it.

February 28, 2024

ClozeGPT: Write Anki cloze cards with a custom GPT

Writing good Anki cards is a chore. Let's bring LLMs to the rescue.

February 21, 2024

Is RAG all you need? A look at the limits of retrieval augmentation

This blogpost is a teaser for my upcoming talk at ODSC East 2024 in Boston, April 23-25.

January 06, 2024

Headless WiFi setup on Raspberry Pi OS "Bookworm" without the Raspberry Pi Imager

Setting up a headless Pi used to be simpler. Is it still possible to do it without the RPi Imager?

November 09, 2023

The World of Web RAG

What if our RAG application could fetch data directly from the web, live? Let's build this pipeline with Haystack 2.0.

November 05, 2023

Indexing data for RAG applications

RAG apps need data to work. Let's see how to pre-process our data to make our Haystack 2.0 RAG pipeline perform even better.

October 27, 2023

RAG Pipelines from scratch

Let's build a simple RAG Pipeline with Haystack 2.0 by just connecting three components: a Retriever, a PromptBuilder and a Generator.

October 26, 2023

A New Approach to Haystack Pipelines

Haystack 2.0 comes with a brand new pipeline concept. Let's discover it!

October 15, 2023

Haystack's Pipeline - A Deep Dive

What are Haystack's pipelines and how do they work?

October 11, 2023

Why rewriting Haystack?!

Before even diving into what Haystack 2.0 is, how it was built, and how it works, let’s spend a few words on the whats and the whys.

October 10, 2023

Haystack 2.0: What is it?

December is finally approaching, and with it the release of Haystack 2.0.

September 10, 2023

An (unofficial) Python SDK for Verbix

If you need a Python SDK for a verb conjugator, try this one while it's still alive.

December 11, 2021

My Dotfiles

What Linux developer would I be if I didn't also have my very own dotfiles repo?