Microsoft Releases GELab-Zero-4B-Preview-Sico-Evolution: A 4B Vision-Language GUI Agent Fine-Tuned from Qwen3-VL
English summary
Microsoft has released GELab-Zero-4B-preview-Sico-Evolution, a 4-billion-parameter vision-language model specialized for GUI agent tasks. The model is built on Qwen3-VL using LoRA fine-tuning and targets mobile and general GUI agent use cases. It supports English and Chinese text inputs, and processes image-text-to-text pipelines. The release is open-source under the Apache 2.0 license and is noted as an early preview version.
Chinese summary
微软发布了GELab-Zero-4B-preview-Sico-Evolution,这是一个40亿参数的视觉语言模型,专精于GUI代理任务。该模型基于Qwen3-VL并采用LoRA进行微调,面向移动端和通用GUI代理场景。它支持中英双语文本输入,处理图像-文本至文本的流水线。模型以Apache 2.0开源许可发布,标注为早期预览版本。
Key points
Released a 4B parameter vision-language model specialized for GUI agent tasks.
发布了一个40亿参数、专精GUI代理任务的视觉语言模型。
Fine-tuned from Qwen3-VL using LoRA, targeting mobile and GUI agent applications.
基于Qwen3-VL通过LoRA微调,面向移动端和GUI代理应用。
Supports English and Chinese, and operates on image-text-to-text pipeline.
支持中英文,运行于图像-文本至文本的流水线。
Open-source under Apache 2.0 license, marked as a preview version.
以Apache 2.0许可开源,标记为预览版。