Large Language Models - AI Dev News | Machine Learning Engineering

LlamaIndex 0.12.28 QueryFusionRetriever Throws ValidationError After Pydantic 2.10 Bump

6 mins read

AI/ML

Originally reported: March 24, 2026, for llama_index 0.12.28. Overview | What changed between the prior and current release | Reproducing the ValidationError on a.

Mistral-7B-v0.3 QLoRA on Modal A100-40GB: nf4 + bf16_compute Beat My RunPod H100 Spot Cost Per Step

11 mins read

Cloud Computing

April 11, 2026 | May 22, 2026 | Jia Li Song | Tagged: Anthropic News, Cohere News, Hugging Face News, JAX News, Keras News, Mistral AI News, OpenAI News, PyTorch News, Stability AI News, TensorFlow News

TL;DR: For a Mistral-7B-v0.3 QLoRA fine-tune at sequence length 2048 and micro-batch 4, a Modal A100-40GB container running bitsandbytes nf4 with bfloat16.

vLLM 0.6 Continuous Batching Cut My Llama 3 Latency in Half

13 mins read

AI/ML

April 11, 2026 | May 22, 2026 | Li Mei Fong | Tagged: Anthropic News, Cohere News, Hugging Face News, JAX News, Keras News, Mistral AI News, OpenAI News, PyTorch News, Stability AI News, TensorFlow News

Upgrading a Llama 3 8B endpoint from vLLM 0.5.4 to 0.6.x is the rare dependency bump where the numbers on the dashboard actually move.

o3 Just Broke My Benchmarks (And Probably Yours Too)

8 mins read

AI/ML

January 26, 2026 | April 19, 2026 | Priya Sharma

I’ve been staring at evaluation curves for the better part of a decade. Usually, they creep up. You get a percent here, a percent there.

Unlocking Multimodal Reasoning: A Deep Dive into the New Wave of Thinking Models on Hugging Face

11 mins read

AI/ML

December 8, 2025 | December 26, 2025 | Jia Li Song | Tagged: Hugging Face News

Introduction: The landscape of artificial intelligence is undergoing a seismic shift, moving rapidly beyond text-only paradigms into a rich, multimodal.

Mastering ONNX 4-Bit Quantization: A Technical Deep Dive into Efficient Edge AI

12 mins read

Large Language Models

November 24, 2025 | December 28, 2025 | Elara Vance | Tagged: ONNX News

The landscape of artificial intelligence is shifting rapidly from massive, cloud-based training clusters to efficient, local inference.

Beyond Benchmarks: How New Open-Source Models are Revolutionizing AI Reasoning and Coding

6 mins read

AI/ML

October 31, 2025 | December 26, 2025 | Jia Li Song

Introduction: The landscape of artificial intelligence is in a state of perpetual, rapid evolution. For years, the most powerful large language models.