AI Engineering by Nirant
1.2K subscribers
About AI Engineering by Nirant
I share links and learning resources. No spam.
Posts
I knew this to be true for BERT-era and ResNet-style models, and was very pleasantly surprised that it holds for LLM pre-training too: the choice of pretraining data and tokenizer has the largest impact on scaling trends. Even switching from Llama (Transformer) to Mamba (State-Space Model) barely changes loss-to-loss relationships! In contrast, architecture, model size, context length, and optimizer settings have negligible impact. This suggests architectures can be freely optimized for efficiency, while data curation is the real key to strong generalization. Source: loss-to-loss line work by the Brendel group https://brendel-group.github.io/llm-line/
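To make "loss-to-loss relationship" concrete, here's a toy sketch (my own illustration, not the Brendel group's code): regress downstream loss on pretraining loss in log space across runs, then compare the fitted line across data mixes or architectures.

```python
# Toy loss-to-loss fit: regress downstream loss on pretraining loss in log space.
# The numbers below are hypothetical, just to show the shape of the analysis.
import numpy as np

train_loss = np.array([3.2, 2.9, 2.6, 2.4, 2.2])       # pretraining loss per run
downstream_loss = np.array([4.1, 3.6, 3.1, 2.8, 2.5])  # eval loss for the same runs

slope, intercept = np.polyfit(np.log(train_loss), np.log(downstream_loss), 1)
print(f"downstream_loss ~= {np.exp(intercept):.2f} * train_loss^{slope:.2f}")
```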
One-click search benchmarking over text datasets for BM25 and embedding models: https://github.com/machinelearningZH/semantic-search-eval
Design review: could use better synthetic data generation; as it stands it's quite brittle and heavily favours BM25.
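If you just want the gist of such a benchmark without the harness, here's a minimal sketch (my own, not from the repo) comparing BM25 and an embedding model on accuracy@1 over hand-labelled query/document pairs:

```python
# Minimal BM25 vs. embedding retrieval benchmark (illustrative, not the repo's harness).
# pip install rank-bm25 sentence-transformers
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

docs = [
    "BM25 is a lexical ranking function based on term frequencies.",
    "Dense embedding models map text into a vector space for semantic search.",
    "Synthetic queries can be generated to evaluate retrieval quality.",
]
queries = [("what is bm25", 0), ("semantic vector search", 1)]  # (query, relevant doc index)

bm25 = BM25Okapi([d.lower().split() for d in docs])
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, convert_to_tensor=True)

hits = {"bm25": 0, "embedding": 0}
for query, gold in queries:
    scores = bm25.get_scores(query.lower().split())
    bm25_best = max(range(len(docs)), key=lambda i: scores[i])
    emb_best = int(util.cos_sim(model.encode(query, convert_to_tensor=True), doc_emb).argmax())
    hits["bm25"] += int(bm25_best == gold)
    hits["embedding"] += int(emb_best == gold)

print({k: v / len(queries) for k, v in hits.items()})  # accuracy@1 per retriever
```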
For folks interested in the detailed technical blog with benchmarks, this is the one on the Sarvam LLM: https://www.sarvam.ai/blogs/sarvam-m
Lovable shows a sharp error drop with Claude 4; clearly Claude 4 is killer at tool-use setups like Lovable, Cursor, and Claude Code https://x.com/antonosika/status/1926719161935233139
This seems completely on point: RAG isn't the right fit for code agents or coding tools. https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-for So what works better? grep, file search, and AST-indexed search with iteration. That matches my experience building code-generation agents.
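A minimal sketch of what that looks like in practice: a grep-style search tool the agent can call repeatedly, refining its pattern between calls (my own illustration using only the standard library; the function name and interface are hypothetical):

```python
# Grep-style code search tool for an agent loop (illustrative; interface is hypothetical).
import re
from pathlib import Path

def grep_repo(pattern: str, root: str = ".", exts=(".py", ".ts", ".go"), max_hits: int = 20):
    """Return (path, line_no, line) matches; the agent searches, reads the files
    around the hits, refines the pattern, and searches again."""
    regex = re.compile(pattern)
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in exts:
            continue
        for no, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if regex.search(line):
                hits.append((str(path), no, line.strip()))
                if len(hits) >= max_hits:
                    return hits
    return hits

print(grep_repo(r"def .*search"))
```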
*Paras Chopra, Wingify, is hiring Research Interns at Lossfunk.*
- LLM x RL x evolutionary techniques
- World models
- Continual learning
- Novelty generation by AI systems
https://superform.co/form/Cx8u9Uj
Hahaha, ideas around continuous pre-training are now being formalized into teams called "mid-training" https://vintagedata.org/blog/posts/what-is-mid-training
Claude 4 models are meaningfully better on code benchmarks than o3, the best reasoning model so far. And Anthropic is bundling web search, a Python execution sandbox, and a files API 🤯 https://x.com/AnthropicAI/status/1925591523502301353
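On the API side, calling a Claude 4 model with the hosted web search tool looks roughly like this (a sketch; the model id and tool type string are assumptions from the launch announcement, so verify against the current API reference):

```python
# Rough sketch of Claude 4 + Anthropic's server-side web search tool.
# Model id and tool type string are assumptions from the launch docs; verify before use.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{"type": "web_search_20250305", "name": "web_search", "max_uses": 3}],
    messages=[{"role": "user", "content": "What shipped alongside Claude 4?"}],
)
print(response.content)
```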
Sarvam has a new model release: 24B, called "M". Built on Mistral 24B: https://huggingface.co/sarvamai/sarvam-m. Apache 2.0 license. Gemma 3 remains competitive for Indic use cases — but this is a good counterpart if you're looking for something with Indian values
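Standard Hugging Face usage should work for a quick try (a sketch assuming the usual causal-LM plus chat-template flow; check the model card for the recommended template and sampling settings):

```python
# Quick try of sarvam-m via transformers (assumes the usual chat-template flow).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Translate 'good morning' to Hindi."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```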
You can now use Claude Code inside Cursor, Windsurf, VS Code, JetBrains — this is amazing for senior engineers! https://docs.anthropic.com/en/docs/claude-code/ide-integrations