Series: Practical Questions

December 11, 2025

What are the "experts" in Mixture-of-Experts LLMs?

And how can 8 or 16 of them cover all possible domains of expertise?

November 04, 2025

What's hybrid retrieval good for?

We've been told embedding search is strictly superior to BM25 and all other keyword-search algorithms. But keyword methods still have a role in modern search pipelines.

October 23, 2025

How does prompt caching work?

Nearly all inference libraries can do it for you. But what's really going on under the hood?

October 17, 2025

What is prompt caching?

Caching prompts can have an outsized impact on the cost and latency of your AI apps. But what exactly should you cache, and how?

October 09, 2025

Why use a reranker?

And is the added latency worth it? Let's look at what rerankers do and how they can improve the quality of your RAG pipelines so drastically.