Posts tagged with "Llama"

Found 2 posts

May 18, 2026

Small Models in Production: When Phi-4 and 8B Llama Win

Frontier models are the default. Defaults are how teams overpay on LLM bills. This post covers three workloads where small models (Phi-4, Llama 3.x 8B, Mistral Small) beat frontier models on cost-per-decision without a meaningful loss in quality, three workloads where they do not, and the two-tier production pattern that cost-conscious teams converge on after a quarter of evaluation work.

AI & Cloud Infrastructure

February 17, 2026

Small Language Models On-Prem: The Phi-4 and Llama 3.3 ROI Math

When running small language models on-prem actually beats hosted inference: Phi-4, Llama 3.3, GPU sizing, Ollama and vLLM deployment patterns, and the honest cost math for Swedish data-residency workloads.