Collaborative Language Model Runner
Petals is a framework that democratizes access to large language models (LLMs) with over 100 billion parameters. Developed under the BigScience project, Petals lets users run and fine-tune these models collaboratively over a decentralized network. Each participant loads only a small part of the model locally and contributes computational resources, teaming up with others to serve the rest. This removes the hardware barrier that normally keeps individual researchers from working with such massive models, making advanced NLP capabilities accessible to a much broader audience.
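For illustration, a minimal client session might look like the sketch below. It assumes the `petals` Python package is installed and that a public swarm is hosting the model; the class and model names used here (`AutoDistributedModelForCausalLM`, `petals-team/StableBeluga2`) follow the publicly documented Petals API but can vary between releases.

```python
# Minimal Petals client sketch: only the embeddings run locally,
# while the transformer blocks execute on remote peers in the swarm.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # example; any swarm-hosted model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Generation behaves like a regular Hugging Face causal LM.
inputs = tokenizer("A quick summary of distributed inference:",
                   return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))
```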
Petals operates by splitting a model's layers across multiple machines in a BitTorrent-style network, distributing execution across the swarm. It exposes a flexible PyTorch-based API that supports custom fine-tuning and sampling methods and gives access to model internals such as hidden states. Inference is efficient, running up to 10x faster than traditional offloading techniques. Petals also supports collaborative fine-tuning, letting users adapt large models to specific domains or tasks using distributed resources.
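Because gradients can flow back through the remote blocks to any parameters held locally, lightweight fine-tuning can be sketched as below. This is a hypothetical training step under the same assumptions as the previous example: the remote transformer weights stay frozen, and only locally trainable parameters (for example, prompt embeddings or adapters) are updated.

```python
# Hypothetical fine-tuning sketch: remote blocks stay frozen, while
# backpropagation runs through the swarm to locally trainable parameters.
import torch
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "petals-team/StableBeluga2"  # example; any swarm-hosted model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

# Optimize only the parameters that require gradients on this machine.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

batch = tokenizer("Domain-specific training text goes here.",
                  return_tensors="pt")
# Standard causal-LM objective: labels are shifted internally.
loss = model(input_ids=batch["input_ids"], labels=batch["input_ids"]).loss
loss.backward()   # gradients travel back through the remote servers
optimizer.step()
optimizer.zero_grad()
print(f"loss: {loss.item():.4f}")
```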