LLM Benchmark & Optimization

In-depth analysis of LLM cost/performance trade-offs for production RAG systems.

At Galadrim, I led a comprehensive study comparing the performance of LLMs from several major model families (OpenAI, Anthropic, Mistral, Llama).

The goal was to define cost/performance benchmarks to guide our RAG architecture choices.

I developed an automated testing protocol that measures latency (time to first token, TTFT, and tokens per second, TPS) and response quality on specific tasks (summarization, extraction, classification).
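To make the two latency metrics concrete, here is a minimal sketch of how TTFT and TPS can be computed from a streamed response. It is not the protocol used in the study: `measure_stream`, `LatencyResult`, and the injectable `clock` parameter are hypothetical names chosen for illustration, and the stream is assumed to be any iterable that yields one token (or chunk) at a time and only sends the request when iteration starts.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class LatencyResult:
    ttft_s: float   # seconds from request start to the first token
    tps: float      # tokens per second, measured after the first token
    n_tokens: int   # number of tokens (or chunks) received
    text: str       # concatenated response, kept for quality scoring


def measure_stream(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> LatencyResult:
    """Consume a token stream and derive TTFT and TPS from arrival times.

    Assumption: the request is issued lazily, when iteration begins,
    so taking the start time just before the loop captures the full wait.
    """
    start = clock()
    first = None
    last = start
    tokens = []
    for token in stream:
        now = clock()
        if first is None:
            first = now          # arrival of the first token
        last = now               # arrival of the most recent token
        tokens.append(token)

    if first is None:
        raise ValueError("stream produced no tokens")

    # Generation throughput excludes the initial wait, so only the
    # tokens that arrived after the first one are counted.
    generation_time = last - first
    tps = (len(tokens) - 1) / generation_time if generation_time > 0 else 0.0
    return LatencyResult(first - start, tps, len(tokens), "".join(tokens))


def fake_stream(n: int = 5, first_delay: float = 0.05, gap: float = 0.01):
    """Stand-in for a provider's streaming response (illustrative only)."""
    time.sleep(first_delay)
    for i in range(n):
        if i:
            time.sleep(gap)
        yield f"tok{i} "


if __name__ == "__main__":
    result = measure_stream(fake_stream())
    print(f"TTFT: {result.ttft_s:.3f}s, TPS: {result.tps:.1f}")
```

Separating TTFT from TPS matters because the two respond to different factors: TTFT mostly reflects queueing and prompt processing, while TPS reflects generation speed. In a real harness, the provider's streaming iterator would take the place of `fake_stream`, and chunk counts would need converting to token counts if the provider streams multi-token chunks.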