Series: Practical Questions

July 05, 2026

Reasoning makes translations worse

We tend to believe that making the LLM reason more can only improve its answer, regardless of the task. However, when translating from one language to another, the opposite happens.

June 12, 2026

What's an agent harness?

The intelligence of modern AI agents comes from their LLMs, but their practical capabilities only exist thanks to their harness.

March 24, 2026

Does setting the temperature to zero make an LLM deterministic?

We all know LLMs don't always respond the same way to slight changes in the prompt. But why do their answers also differ when the prompt is identical? And what can we do to prevent it?

March 15, 2026

Is grep really better than a vector DB?

Some agentic applications don't use vector DBs for search. Is it a good idea?

February 04, 2026

How does LLM memory work?

All LLMs can keep track of a short conversation. But how do they remember things long-term?

December 11, 2025

What are the "experts" in Mixture-of-Experts LLMs?

And how can 8 or 16 of them cover all possible domains of expertise?

November 04, 2025

What's hybrid retrieval good for?

We've been told embedding search is strictly superior to BM25 and all other keyword-search algorithms. But keyword search still has a role in modern search pipelines.

October 23, 2025

How does prompt caching work?

Nearly all inference libraries can do it for you. But what's really going on under the hood?

October 17, 2025

What is prompt caching?

Caching prompts can have an outsized impact on the cost and latency of your AI apps. But what exactly should you cache, and how?

October 09, 2025

Why use a reranker?

And is the added latency worth it? Let's understand what rerankers do and how they can improve the quality of your RAG pipelines so drastically.