LFM2-VL

Liquid AI has introduced LFM2-VL, a new series of vision-language foundation models designed for efficient deployment across a range of devices, from phones to wearables. The models accept both text and image inputs and deliver up to 2x faster inference on GPUs than comparable vision-language models while maintaining competitive accuracy.

LFM2-VL ships in two sizes: LFM2-VL-450M for highly resource-constrained settings and LFM2-VL-1.6B for more capable but still lightweight use. Both are released as open weights, enabling customization and multi-platform edge deployment.

Architecturally, LFM2-VL is built from three components: a language model backbone, a vision encoder, and a multimodal projector that connects the two. The models were trained on a mix of open-source and in-house synthetic vision data totaling 100 billion multimodal tokens. On public benchmarks, they show strong results in high-resolution image understanding and multimodal instruction following.

The weights are released under an open license based on Apache 2.0, permitting academic and research use as well as commercial use by smaller companies; larger companies can obtain a commercial license by contacting Liquid AI. The models work with Hugging Face transformers and TRL (a minimal inference sketch follows below), with integrations into other popular frameworks planned. For custom solutions with edge deployment, interested parties can contact Liquid AI's sales team.
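To illustrate the transformers compatibility mentioned above, the sketch below loads a checkpoint and runs a single image-plus-text prompt. It assumes the Hugging Face repository id LiquidAI/LFM2-VL-1.6B, the AutoModelForImageTextToText interface, and a chat template that accepts interleaved image and text content; the exact arguments may differ from the model card, which should be treated as authoritative.

```python
# Minimal inference sketch for LFM2-VL with Hugging Face transformers.
# Assumes the checkpoint id "LiquidAI/LFM2-VL-1.6B" and the standard
# image-text-to-text API; check the model card for exact usage.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from transformers.image_utils import load_image

model_id = "LiquidAI/LFM2-VL-1.6B"  # or "LiquidAI/LFM2-VL-450M" for the smaller variant
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Any local path or URL works here; this one is purely illustrative.
image = load_image("https://example.com/sample.jpg")

conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }
]

# Build model inputs from the chat template, then generate a short reply.
inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=64)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```

The same conversation format can be reused with TRL for supervised fine-tuning, since both libraries consume the processor's chat template.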