Tutorials | Source: MEDIUM LARGE LANGUAGE MODELS | Importance: 2/5
This tutorial provides a practical overview of core LLM concepts for machine learning engineers. It begins with foundational elements like tokens, transformer architectures, and embeddings, then covers advanced techniques including prompt engineering, retrieval-augmented generation (RAG), and fine-tuning. The guide emphasizes developing sound engineering judgment to move beyond trial-and-error prompting. No new research or product announcements are made; it serves as an educational resource.
Social | Source: V2EX | Importance: 2/5
Livid demonstrated that once a custom node such as /go/wunder is created on V2EX and detailed product feature descriptions are posted there, V2EX Chat can answer product-related questions using those posts as context. The example uses a specific edge.v2ex.com chat session link to show the AI replying solely on the basis of node content. This effectively turns V2EX Chat into a retrieval-augmented knowledge base, enabling product Q&A without building a separate chatbot.
Papers | Source: ARXIV | Importance: 4/5
The paper proposes Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that teaches language models to reason by analogy. It first trains a reasoning-aware retriever via gold-relevance distillation, so that contexts are ranked by expected reasoning benefit rather than semantic overlap. The policy model is then fine-tuned with reinforcement learning on retrieved analogous demonstrations under verifiable outcome rewards, which lets it make use of the reasoning traces in those demonstrations. Analysis shows that reasoning-aware retrieval surfaces complementary solution strategies that provide distinct scaffolding per problem. On AIME 2025, RA-RFT improves average@32 accuracy over GRPO by 7.1 points for Qwen3-1.7B and 2.8 points for Qwen3-4B, which the authors take as evidence that reasoning-aware retrieval is orthogonal to improvements in reward design or training curricula.
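For readers unfamiliar with the metric, the sketch below shows how an average@k score is commonly computed: each problem is sampled k times, the fraction of correct samples is taken per problem, and those fractions are averaged across problems. This is an assumption about the usual convention, not the paper's evaluation code; the function name and input layout are hypothetical.

```python
from statistics import mean


def average_at_k(correct_by_problem):
    """Return average@k as a percentage.

    correct_by_problem: one inner list per problem, holding k booleans,
    one per sampled response (True if that response was correct).
    This input layout is a hypothetical choice for illustration.
    """
    # Fraction of correct samples for each problem.
    per_problem = [
        mean(1.0 if ok else 0.0 for ok in samples)
        for samples in correct_by_problem
    ]
    # Average those fractions over all problems, as a percentage.
    return 100.0 * mean(per_problem)


# Toy example with k = 2 samples for each of two problems.
score = average_at_k([[True, False], [True, True]])
print(score)
```

Under this reading, a gain of 7.1 points means the mean per-sample accuracy rose by 7.1 percentage points.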
Papers | Source: ARXIV | Importance: 4/5
The paper introduces SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, comprising 31 datasets across 7 task types. Evaluation of 31 embedding models shows that large instruction-tuned multilingual models perform best, while existing Slovak-specific NLU models transfer poorly to embedding tasks. The authors develop e5-sk-small (45M parameters) and e5-sk-large (365M parameters) by vocabulary trimming and fine-tuning Multilingual E5 models. Despite size reductions of up to 62%, these open-source models perform competitively with proprietary APIs and are suitable for local deployment in semantic search and RAG. The benchmark, models, datasets, and code are released openly, offering a replicable path for other under-resourced languages.
Papers | Source: ARXIV | Importance: 4/5
The paper introduces FORGE, a benchmark that measures how often search-augmented LLMs recommend fake products when retrieved web pages are polluted. FORGE rewrites real product descriptions into fake ones across 225 products, 15 categories, and 5 consumer scenarios, then tests 12 commercial and open-weights LLMs. A single polluted page causes fooled-recommendation rates of up to 27%, and replacing the top-3 search results raises the rate to 73.8%. Vulnerability varies by category, with less familiar products more easily exploited, and reasoning models sometimes worsen the problem by fabricating social proof. Three defenses are evaluated: skepticism prompting, consensus filtering over model priors, and cross-document evidence. However, skepticism can backfire and filtering may suppress legitimate recommendations.
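As a rough illustration of the quantities discussed above, the sketch below computes a fooled-recommendation rate and applies a simple cross-document check that keeps only products mentioned on at least a minimum number of retrieved pages. Both functions, their names, and the `min_pages` threshold are hypothetical; they are not FORGE's implementation, and the product names are made up.

```python
from collections import defaultdict


def fooled_rate(recommended, fake_names):
    """Fraction of recommendations that name a fake product."""
    if not recommended:
        return 0.0
    return sum(name in fake_names for name in recommended) / len(recommended)


def supported_products(pages, min_pages=2):
    """Keep products mentioned on at least `min_pages` distinct pages.

    pages: one collection of product names per retrieved page
    (a hypothetical, already-extracted representation).
    """
    counts = defaultdict(int)
    for mentioned in pages:
        for name in set(mentioned):  # count each page at most once
            counts[name] += 1
    return {name for name, n in counts.items() if n >= min_pages}


# Made-up example: one polluted page pushes "FakeX".
pages = [{"A", "FakeX"}, {"A", "B"}, {"A", "B"}]
print(supported_products(pages))
print(fooled_rate(["A", "FakeX", "B", "FakeX"], {"FakeX"}))
```

A filter of this kind can only help while polluted pages are a minority of the retrieved set, and it may also drop legitimate products that happen to appear on a single page.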
This Towards Data Science tutorial presents a PDF parsing method that outputs relational DataFrames instead of flat text. It extracts structured elements including lines, pages, table of contents, images, cross-references, captions, spans, and a parsing summary. The relational shape is designed to improve retrieval-augmented generation (RAG) workflows by preserving document structure. The post is part of the 'Enterprise Document Intelligence' series.
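To make the relational idea concrete, here is a minimal sketch in plain Python, with lists of dictionaries standing in for DataFrames. The table names follow the elements listed above, but every column name, key, and row is a hypothetical illustration rather than the tutorial's actual schema.

```python
# Hypothetical rows; in the tutorial's setting each list would be a DataFrame.
pages = [
    {"page_id": 1, "page_number": 1},
    {"page_id": 2, "page_number": 2},
]
lines = [
    {"line_id": 1, "page_id": 1, "text": "1 Introduction"},
    {"line_id": 2, "page_id": 1, "text": "Figure 1 shows the pipeline."},
    {"line_id": 3, "page_id": 2, "text": "Results are summarised below."},
]
captions = [
    {"caption_id": 1, "page_id": 2, "label": "Figure 1",
     "text": "Figure 1: Parsing pipeline."},
]


def find_lines(keyword):
    """Return matching lines joined with their page number."""
    page_number = {p["page_id"]: p["page_number"] for p in pages}
    return [
        {"page_number": page_number[ln["page_id"]], "text": ln["text"]}
        for ln in lines
        if keyword.lower() in ln["text"].lower()
    ]


def resolve_reference(label):
    """Follow a cross-reference such as 'Figure 1' to its caption and page."""
    page_number = {p["page_id"]: p["page_number"] for p in pages}
    return [
        {"page_number": page_number[c["page_id"]], "text": c["text"]}
        for c in captions
        if c["label"] == label
    ]


print(find_lines("figure 1"))
print(resolve_reference("Figure 1"))
```

Because lines and captions share a page key, a retrieved chunk can carry its page number and be linked to the figure it mentions, which flat text extraction would lose.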