AI Engineering by Nirant
1.2K subscribers
About AI Engineering by Nirant
I share links and learning resources. No spam.
Posts
I knew this to be true for BERT-era and ResNet-style models, and was very pleasantly surprised that it holds for LLM pre-training too: the choice of pretraining data and tokenizer has the largest impact on scaling trends. Even switching from Llama (Transformer) to Mamba (State-Space Model) barely changes loss-to-loss relationships! In contrast, architecture, model size, context length, and optimizer settings have negligible impact. This suggests architectures can be freely optimized for efficiency, while data curation is the real key to strong generalization. Source: loss-to-loss line work by the Brendel group https://brendel-group.github.io/llm-line/
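To make "loss-to-loss relationship" concrete, here's a toy sketch (my own illustration, not the Brendel group's code): regress downstream loss on pretraining loss in log space across runs, then compare the fitted line across data mixes or architectures.

```python
# Toy loss-to-loss fit: regress downstream loss on pretraining loss in log space.
# The numbers below are hypothetical, just to show the shape of the analysis.
import numpy as np

train_loss = np.array([3.2, 2.9, 2.6, 2.4, 2.2])       # pretraining loss per run
downstream_loss = np.array([4.1, 3.6, 3.1, 2.8, 2.5])  # eval loss for the same runs

slope, intercept = np.polyfit(np.log(train_loss), np.log(downstream_loss), 1)
print(f"downstream_loss ~= {np.exp(intercept):.2f} * train_loss^{slope:.2f}")
```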
One-click search benchmarking over text datasets for BM25 and embedding models: https://github.com/machinelearningZH/semantic-search-eval
Design review: could use better synthetic data generation; as it stands it's quite brittle and heavily favours BM25.
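If you just want the gist of such a benchmark without the harness, here's a minimal sketch (my own, not from the repo) comparing BM25 and an embedding model on accuracy@1 over hand-labelled query/document pairs:

```python
# Minimal BM25 vs. embedding retrieval benchmark (illustrative, not the repo's harness).
# pip install rank-bm25 sentence-transformers
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

docs = [
    "BM25 is a lexical ranking function based on term frequencies.",
    "Dense embedding models map text into a vector space for semantic search.",
    "Synthetic queries can be generated to evaluate retrieval quality.",
]
queries = [("what is bm25", 0), ("semantic vector search", 1)]  # (query, relevant doc index)

bm25 = BM25Okapi([d.lower().split() for d in docs])
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = model.encode(docs, convert_to_tensor=True)

hits = {"bm25": 0, "embedding": 0}
for query, gold in queries:
    scores = bm25.get_scores(query.lower().split())
    bm25_best = max(range(len(docs)), key=lambda i: scores[i])
    emb_best = int(util.cos_sim(model.encode(query, convert_to_tensor=True), doc_emb).argmax())
    hits["bm25"] += int(bm25_best == gold)
    hits["embedding"] += int(emb_best == gold)

print({k: v / len(queries) for k, v in hits.items()})  # accuracy@1 per retriever
```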
For folks interested in the detailed technical blog with benchmarks, this is the one on the Sarvam LLM: https://www.sarvam.ai/blogs/sarvam-m
Lovable shows a sharp error drop with Claude 4; clearly Claude 4 is killer at tool-use setups like Lovable, Cursor, and Claude Code https://x.com/antonosika/status/1926719161935233139
This seems completely on point: RAG isn't the right fit for code agents or coding tools. https://pashpashpash.substack.com/p/why-i-no-longer-recommend-rag-for So what works better? grep, file search, and AST-indexed search with iteration. That matches my experience building code-generation agents.
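A minimal sketch of what that looks like in practice: a grep-style search tool the agent can call repeatedly, refining its pattern between calls (my own illustration using only the standard library; the function name and interface are hypothetical):

```python
# Grep-style code search tool for an agent loop (illustrative; interface is hypothetical).
import re
from pathlib import Path

def grep_repo(pattern: str, root: str = ".", exts=(".py", ".ts", ".go"), max_hits: int = 20):
    """Return (path, line_no, line) matches; the agent searches, reads the files
    around the hits, refines the pattern, and searches again."""
    regex = re.compile(pattern)
    hits = []
    for path in Path(root).rglob("*"):
        if not path.is_file() or path.suffix not in exts:
            continue
        for no, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            if regex.search(line):
                hits.append((str(path), no, line.strip()))
                if len(hits) >= max_hits:
                    return hits
    return hits

print(grep_repo(r"def .*search"))
```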
*Paras Chopra, Wingify, is hiring Research Interns at Lossfunk.*
- LLM x RL x evolutionary techniques
- World models
- Continual learning
- Novelty generation by AI systems
https://superform.co/form/Cx8u9Uj
Hahaha, ideas around continuous pre-training are now being formalized into teams called "mid-training" https://vintagedata.org/blog/posts/what-is-mid-training
Claude 4 models are meaningfully better on code benchmarks than o3, the best reasoning model so far. And Anthropic is bundling web search, a Python execution sandbox, and a files API 🤯 https://x.com/AnthropicAI/status/1925591523502301353
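On the API side, calling a Claude 4 model with the hosted web search tool looks roughly like this (a sketch; the model id and tool type string are assumptions from the launch announcement, so verify against the current API reference):

```python
# Rough sketch of Claude 4 + Anthropic's server-side web search tool.
# Model id and tool type string are assumptions from the launch docs; verify before use.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=[{"type": "web_search_20250305", "name": "web_search", "max_uses": 3}],
    messages=[{"role": "user", "content": "What shipped alongside Claude 4?"}],
)
print(response.content)
```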
Sarvam has a new model release: 24B, called "M". Built on Mistral 24B: https://huggingface.co/sarvamai/sarvam-m. Apache 2.0 license. Gemma 3 remains competitive for Indic use cases — but this is a good counterpart if you're looking for something with Indian values
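Standard Hugging Face usage should work for a quick try (a sketch assuming the usual causal-LM plus chat-template flow; check the model card for the recommended template and sampling settings):

```python
# Quick try of sarvam-m via transformers (assumes the usual chat-template flow).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sarvamai/sarvam-m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

messages = [{"role": "user", "content": "Translate 'good morning' to Hindi."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```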
You can now use Claude Code inside Cursor, Windsurf, VS Code, JetBrains — this is amazing for senior engineers! https://docs.anthropic.com/en/docs/claude-code/ide-integrations