Little Brains, Big Feats: Exploring Compact Language Models
English summary
This study examines how small language models perform as generators within Retrieval-Augmented Generation (RAG) systems. Using both open-source and proprietary datasets spanning various subjects and question types, the authors benchmark the generation quality of compact models. The key finding is that a RAG system powered by a small language model can be executed directly on-device without GPU hardware, completing tasks within a reasonable time frame. Experimental code and supplementary materials are publicly available on GitHub.
Key points
Small language models can serve as effective generators in RAG pipelines.
A RAG system built on a small language model can run on-device without a GPU and still complete tasks in a reasonable time.
The evaluation covers both open-source and proprietary datasets, spanning various subjects and question types.
The experimental code and supplementary materials are available on GitHub for reproducibility.
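To make the retrieve-then-generate flow in the key points concrete, here is a minimal sketch of a CPU-only RAG loop. It is not the authors' code: the bag-of-words retriever, the prompt template, the function names (`retrieve`, `build_prompt`, `rag_answer`), and the placeholder `echo_generator` are all assumptions made for illustration. In practice the `generate` callable would wrap a small language model running locally, and the timing gives a rough check of the "reasonable time" claim.

```python
import math
import re
import time
from collections import Counter
from typing import Callable, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase word tokens; punctuation is dropped.
    return re.findall(r"\w+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, passages: List[str], k: int = 2) -> List[str]:
    # Toy retriever (assumption): rank passages by bag-of-words cosine similarity.
    q = Counter(tokenize(question))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, contexts: List[str]) -> str:
    # Hypothetical prompt template; the paper's actual prompt is not given here.
    joined = "\n".join(f"- {c}" for c in contexts)
    return (
        "Answer using only the context.\n"
        f"Context:\n{joined}\n"
        f"Question: {question}\nAnswer:"
    )


def rag_answer(
    question: str,
    passages: List[str],
    generate: Callable[[str], str],
    k: int = 2,
) -> Tuple[str, float]:
    # Retrieve, assemble the prompt, then time only the generation step.
    prompt = build_prompt(question, retrieve(question, passages, k))
    start = time.perf_counter()
    answer = generate(prompt)
    elapsed = time.perf_counter() - start
    return answer, elapsed


def echo_generator(prompt: str) -> str:
    # Placeholder generator (assumption): returns the first retrieved passage.
    # Swap in a call to a locally hosted small language model here.
    for line in prompt.splitlines():
        if line.startswith("- "):
            return line[2:]
    return ""
```

Passing the generator as a plain callable keeps the retrieval and timing code independent of any particular model runtime, so the same harness could compare several small models on the same questions.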