Terminus Group recently proposed a two-stage, multi-tier, multi-node scheduling algorithm that addresses both the challenges of deploying LLMs in multi-layer cloud-edge architectures and the limitations of traditional scheduling methods.
Methodology
The proposed two-stage multi-tier multi-node scheduling algorithm for collaborative AI computing consists of two main stages:
1. Inter-layer LLM Decoupling and Partitioning: This stage uses integer linear programming to allocate the model's memory footprint and computational requirements across the layers of the cloud-edge architecture (a sketch of one such formulation follows this list).
2. Intra-layer LLM Task Scheduling: This stage employs graph neural networks (GNNs) to assess resource utilization and network conditions and to determine the optimal scheduling nodes within each layer (see the GNN sketch after this list).
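The section does not spell out the exact integer linear program, so the following is a minimal sketch of how such a partitioning step could be formulated. The block count, tier names, capacities, compute costs, and communication penalty are illustrative placeholders, not values from the original work; the sketch uses the PuLP solver to place transformer blocks across tiers while respecting per-tier memory limits and penalizing tier changes between adjacent blocks.

```python
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary, value

# Hypothetical data (not from the paper): 8 transformer blocks and three tiers
# with illustrative memory capacities, per-block compute costs, and a
# communication penalty for sending activations across tiers.
blocks = range(8)
tiers = ["cloud", "edge", "device"]
mem_per_block = 2.0                                         # GB of weights per block (assumed)
capacity = {"cloud": 64.0, "edge": 12.0, "device": 4.0}     # GB available per tier (assumed)
compute_cost = {"cloud": 1.0, "edge": 2.5, "device": 6.0}   # latency units per block (assumed)
comm_cost = 4.0                                             # penalty when adjacent blocks change tier

prob = LpProblem("llm_partition", LpMinimize)

# x[b, t] = 1 iff block b is placed on tier t
x = {(b, t): LpVariable(f"x_{b}_{t}", cat=LpBinary) for b in blocks for t in tiers}
# y[b] = 1 iff blocks b and b+1 sit on different tiers (activations must cross the network)
y = {b: LpVariable(f"y_{b}", cat=LpBinary) for b in blocks if b + 1 in blocks}

# Objective: total compute cost plus communication penalty for tier changes
prob += lpSum(compute_cost[t] * x[b, t] for b in blocks for t in tiers) \
      + lpSum(comm_cost * y[b] for b in y)

# Each block is placed on exactly one tier
for b in blocks:
    prob += lpSum(x[b, t] for t in tiers) == 1

# Per-tier memory capacity
for t in tiers:
    prob += lpSum(mem_per_block * x[b, t] for b in blocks) <= capacity[t]

# Linearize the "tier change" indicator: y[b] >= |x[b, t] - x[b+1, t]| for every tier
for b in y:
    for t in tiers:
        prob += y[b] >= x[b, t] - x[b + 1, t]
        prob += y[b] >= x[b + 1, t] - x[b, t]

prob.solve()
for b in blocks:
    placed = next(t for t in tiers if value(x[b, t]) > 0.5)
    print(f"block {b} -> {placed}")
```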
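For the intra-layer stage, the section only states that a GNN scores nodes from resource utilization and network conditions; the architecture and features below are therefore assumptions. The sketch is a small untrained GCN-style scorer in plain PyTorch: node features (CPU, memory, bandwidth, latency) are propagated over the network graph of candidate nodes, and the node with the highest score is selected. In practice such a scorer would be trained against an objective such as observed throughput.

```python
import torch
import torch.nn as nn

class NodeScorer(nn.Module):
    """Minimal two-layer GCN-style model that scores candidate nodes for task placement."""
    def __init__(self, in_dim: int, hidden_dim: int = 16):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden_dim)
        self.lin2 = nn.Linear(hidden_dim, 1)

    def forward(self, features: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # Symmetrically normalized adjacency with self-loops (standard GCN propagation)
        a_hat = adj + torch.eye(adj.size(0))
        deg_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)
        norm = deg_inv_sqrt.unsqueeze(1) * a_hat * deg_inv_sqrt.unsqueeze(0)
        h = torch.relu(norm @ self.lin1(features))   # first graph convolution
        return (norm @ self.lin2(h)).squeeze(-1)     # per-node suitability score

# Illustrative inputs (assumed): 4 candidate nodes within one layer of the architecture.
# Features per node: [CPU utilization, memory utilization, link bandwidth, link latency]
features = torch.tensor([
    [0.30, 0.40, 0.90, 0.10],
    [0.80, 0.70, 0.60, 0.20],
    [0.20, 0.30, 0.40, 0.50],
    [0.50, 0.50, 0.80, 0.15],
])
adj = torch.tensor([   # network links between the candidate nodes (assumed topology)
    [0., 1., 1., 0.],
    [1., 0., 1., 1.],
    [1., 1., 0., 1.],
    [0., 1., 1., 0.],
])

model = NodeScorer(in_dim=4)
scores = model(features, adj)
print("chosen node:", int(scores.argmax()))   # node with the highest suitability score
```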
The results show that the proposed solution increases throughput by 9.1%-26.3% compared to traditional methods, enabling efficient deployment of LLMs in multi-layer networks and significantly improving system performance.
Original Thesis