Little Brains, Big Feats: Exploring Compact Language Models
This study examines how small language models perform as generators within Retrieval-Augmented Generation (RAG) systems. Using both open-source and proprietary datasets spanning various subjects and question types, the authors benchmark the generation quality of compact models. The key finding is that a RAG system powered by a small language model can be executed directly on-device without GPU hardware, completing tasks within a reasonable time frame. Experimental code and supplementary materials are publicly available on GitHub.
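To make the setup concrete, the following is a minimal, CPU-only sketch of a RAG loop in which the generator is a pluggable component. It is an illustration under stated assumptions, not the authors' implementation: the bag-of-words retriever, the prompt template, and every name in it (`retrieve`, `build_prompt`, `answer`, `stub_generate`) are hypothetical, and `stub_generate` merely stands in for whichever small language model would actually be run on-device.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages most similar to the query (toy retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        passages,
        key=lambda p: cosine(q, Counter(tokenize(p))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Assemble a prompt from retrieved passages (hypothetical template)."""
    context_block = "\n".join(f"- {p}" for p in context)
    return (
        "Answer using only the context.\n"
        f"Context:\n{context_block}\n"
        f"Question: {query}\n"
        "Answer:"
    )


def answer(
    query: str,
    passages: List[str],
    generate: Callable[[str], str],
    k: int = 2,
) -> str:
    """Retrieve context, build the prompt, and hand it to the generator."""
    return generate(build_prompt(query, retrieve(query, passages, k)))


def stub_generate(prompt: str) -> str:
    """Placeholder for a small language model: echoes the top passage."""
    for line in prompt.splitlines():
        if line.startswith("- "):
            return line[2:]
    return ""


if __name__ == "__main__":
    docs = [
        "The Eiffel Tower is in Paris.",
        "Rust is a systems programming language.",
        "Bananas are rich in potassium.",
    ]
    print(answer("Where is the Eiffel Tower?", docs, stub_generate, k=1))
```

Because the generator is passed in as a plain callable, a compact model served through any local inference runtime could replace `stub_generate` without touching the retrieval or prompt-building steps.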