AI Engineering by Nirant

1.2K subscribers

About AI Engineering by Nirant

I share links and learning resources. No spam.


Posts

AI Engineering by Nirant
5/27/2025, 12:58:44 PM

I knew this to be true for BERT-era and ResNet-style models, and I was pleasantly surprised that it holds for LLM pre-training too: the choice of pretraining data and tokenizer has the largest impact on scaling trends. Even switching from Llama (Transformer) to Mamba (state-space model) barely changes loss-to-loss relationships! By contrast, architecture, model size, context length, and optimizer settings have negligible impact. This suggests architectures can be freely optimized for efficiency, while data curation is the real key to strong generalization. Source: loss-to-loss line work by the Brendel group https://brendel-group.github.io/llm-line/

👍 7
AI Engineering by Nirant
5/29/2025, 4:43:16 AM

One-click search benchmarking over text datasets for BM25 and embedding models: https://github.com/machinelearningZH/semantic-search-eval. Design review: it could use better synthetic data generation; as it stands, the setup is brittle and heavily favours BM25.
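For context, the BM25 baseline such benchmarks lean on is simple enough to sketch from scratch. Below is a minimal, self-contained Okapi BM25 scorer; the corpus, tokenization, and parameter defaults are illustrative and not taken from the linked repo:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, corpus_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    N = len(corpus_tokens)
    avgdl = sum(len(d) for d in corpus_tokens) / N
    # document frequency: in how many docs each term appears
    df = Counter()
    for doc in corpus_tokens:
        for term in set(doc):
            df[term] += 1
    scores = []
    for doc in corpus_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log((N - df[term] + 0.5) / (df[term] + 0.5) + 1)
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

docs = ["the cat sat on the mat",
        "dogs chase cats in the park",
        "neural rankers need training data"]
tokenized = [d.split() for d in docs]
scores = bm25_scores("cat mat".split(), tokenized)
best = max(range(len(scores)), key=scores.__getitem__)
```

Exact lexical matching like this is why synthetic eval queries that copy words from the documents will always flatter BM25 over embedding models.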

❤️ 1
AI Engineering by Nirant
5/23/2025, 1:54:02 PM

For folks interested in the detailed technical blog with benchmarks, this is the one on the Sarvam LLM: https://www.sarvam.ai/blogs/sarvam-m

AI Engineering by Nirant
5/27/2025, 10:51:53 AM

Lovable shows a sharp error drop with Claude 4; clearly Claude 4 excels at tool-use setups like Lovable, Cursor, and Claude Code https://x.com/antonosika/status/1926719161935233139

❤️ 😮 2
AI Engineering by Nirant
5/25/2025, 6:25:43 AM

This seems spot on: RAG isn't the right fit for code agents, or even coding tools. https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-for So what works better? grep, file search, and AST-indexed search with iteration. This matches my own experience building code-generation agents.
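The grep-style loop those agents run can be sketched as a plain regex scan over a source tree. This is a minimal illustration, not the approach from the linked post; the file names and contents are made up for the demo:

```python
import re
import tempfile
from pathlib import Path

def grep_repo(pattern, root, glob="*.py"):
    """Return (file, line_no, line) matches for a regex across a source tree,
    mimicking what a grep-style tool hands back to a code agent."""
    rx = re.compile(pattern)
    hits = []
    for path in Path(root).rglob(glob):
        for no, line in enumerate(path.read_text().splitlines(), 1):
            if rx.search(line):
                hits.append((path.name, no, line.strip()))
    return hits

# demo on a throwaway tree (file names are illustrative)
root = Path(tempfile.mkdtemp())
(root / "auth.py").write_text("def login(user):\n    return token_for(user)\n")
(root / "db.py").write_text("def connect():\n    pass\n")
hits = grep_repo(r"def login", root)
```

An agent can iterate on this: search, read the matching file, refine the pattern, and repeat, which avoids the chunking and staleness problems that make vector-store RAG awkward for code.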

❤️ 👍 4
AI Engineering by Nirant
5/23/2025, 1:53:17 PM

*Paras Chopra (Wingify) is hiring Research Interns at Lossfunk.*
- LLM x RL x evolutionary techniques
- World models
- Continual learning
- Novelty generation by AI systems
https://superform.co/form/Cx8u9Uj

❤️ 1
AI Engineering by Nirant
6/2/2025, 7:28:03 AM

Hahaha, ideas around continuous pre-training are now being formalized into teams called "mid-training" https://vintagedata.org/blog/posts/what-is-mid-training

😂 3
AI Engineering by Nirant
5/22/2025, 4:56:33 PM

Claude 4 models are meaningfully better on code benchmarks than o3, the best reasoning model so far. And Anthropic is bundling web search, a Python execution sandbox, and a files API 🤯 https://x.com/AnthropicAI/status/1925591523502301353

🔥 ❤️ 4
AI Engineering by Nirant
5/23/2025, 11:16:52 AM

Sarvam has a new model release: a 24B model called "M", built on Mistral 24B: https://huggingface.co/sarvamai/sarvam-m. Apache 2.0 license. Gemma 3 remains competitive for Indic use cases, but this is a good counterpart if you're looking for something tuned for Indian values.

❤️ 👍 8
AI Engineering by Nirant
5/27/2025, 6:36:00 AM

You can now use Claude Code inside Cursor, Windsurf, VS Code, and JetBrains IDEs; this is great news for senior engineers! https://docs.anthropic.com/en/docs/claude-code/ide-integrations

👍 ❤️ 7