A New Way to Run LLMs
Tired of expensive NVIDIA GPUs holding back your deep learning projects? exo is an experimental, open-source program for distributed deep learning. It turns your iPhone, iPad, Android phone, Mac, Linux machine, and practically any other device into one large, virtual GPU: you pool the combined processing power and memory of multiple devices to run machine learning models that would be impossible on any single device due to memory or processing limitations.

exo is under active development, so expect occasional bugs. The exo team actively addresses issues reported through its GitHub repository, and community contributions are welcome, with a list of bounties available for those interested.

exo supports different partitioning strategies for splitting a model across devices. The default is ring memory-weighted partitioning: inference runs around a ring of devices, with each device handling a number of model layers proportional to its available memory.
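To make the idea concrete, here is a minimal sketch of memory-weighted layer assignment. This is illustrative only, not exo's actual implementation; the function name and the memory figures in the example are assumptions.

    # Sketch: split a model's layers across devices in proportion to memory.
    # Each device gets a contiguous slice of layers (the "ring" order).
    def partition_layers(device_memory_gb: list[float], num_layers: int) -> list[range]:
        total = sum(device_memory_gb)
        slices, start = [], 0
        for i, mem in enumerate(device_memory_gb):
            if i == len(device_memory_gb) - 1:
                count = num_layers - start  # last device absorbs rounding leftovers
            else:
                count = round(num_layers * mem / total)
            slices.append(range(start, start + count))
            start += count
        return slices

    # Example: a 32-layer model split across a 16 GB Mac and an 8 GB phone.
    print(partition_layers([16.0, 8.0], 32))  # [range(0, 21), range(21, 32)]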
Installation:
Currently, installing exo from source is the recommended approach. The main prerequisite is Python 3.12.0 or newer, because earlier versions have asyncio compatibility issues. Detailed installation instructions can be found in the exo GitHub repository.
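A typical from-source install looks something like the following (the repository URL is my assumption of the project's current location; check the README for up-to-date instructions):

    git clone https://github.com/exo-explore/exo.git
    cd exo
    pip install .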
Troubleshooting:
If you encounter issues running exo on Mac, refer to the MLX installation guide for troubleshooting steps.
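As a quick sanity check that MLX itself is working (assuming MLX is installed), you can print the default device from Python:

    python3 -c "import mlx.core as mx; print(mx.default_device())"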
Example Usage:
exo is incredibly easy to use. Here's an example demonstrating how to run a model across multiple macOS devices:
Device 1:
python3 main.py
Device 2:
python3 main.py
That's all! exo automatically discovers connected devices, eliminating the need for configuration.
Accessing Models:
The primary method for accessing models running on exo is through the exo library with peer handles. See the provided Llama 3 example for guidance.
exo also launches a ChatGPT-compatible API endpoint at http://localhost:8000. Currently, only tail nodes (those designated for the end of the ring topology) expose this endpoint.
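Since the endpoint is ChatGPT-compatible, you can talk to it with any OpenAI-style client. Below is a minimal sketch using only the Python standard library; the /v1/chat/completions path follows the OpenAI convention, and the model name "llama-3-8b" is a placeholder for whatever model your cluster is serving.

    # Query exo's ChatGPT-compatible endpoint with the standard library.
    import json, urllib.request

    payload = {
        "model": "llama-3-8b",  # placeholder; use the model your cluster runs
        "messages": [{"role": "user", "content": "Explain distributed inference in one sentence."}],
    }
    req = urllib.request.Request(
        "http://localhost:8000/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    print(reply["choices"][0]["message"]["content"])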