In the rapidly evolving landscape of artificial intelligence, researchers Hemang Jain, Divyansh Pandey, and Karthik Vaidhyanathan are exploring ways to optimize the performance of language-model-based systems. Their recent work, titled “CALM: A Self-Adaptive Orchestration Approach for QoS-Aware Routing in Small Language Model based Systems,” is available as a preprint on arXiv.
AI systems, particularly those powered by language models, often face dynamic workloads and resource requirements that can degrade their overall quality of service (QoS). This is especially true for systems built on small language models (SLMs), which, while resource-efficient, may struggle individually to meet the diverse and shifting demands of real-world applications. The researchers argue that a coordinated fleet of SLMs, each with specialized strengths, could dynamically adapt to changing contexts and workload patterns, offering a more robust solution.
To achieve this, the team introduced CALM, a self-adaptive orchestration mechanism based on the MAPE-K (Monitor, Analyze, Plan, Execute, Knowledge) framework. CALM continuously monitors user queries, analyzes the QoS metrics of the SLMs, and identifies the optimal SLM to handle each query. It then routes the query to the selected SLM and uses caching and scheduling to decide which SLMs to keep in memory, enhancing both effectiveness and efficiency.
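To make this loop concrete, here is a minimal sketch of what such a MAPE-K-style orchestrator could look like. It is not the authors' implementation: the class names, the latency-only QoS record, the domain-tagging scheme, and the LRU eviction policy for loaded models are all simplifying assumptions introduced for illustration.

```python
# Illustrative sketch of a MAPE-K-style SLM orchestrator (not the authors' code).
# Assumptions: each SLM exposes generate(), QoS is tracked as a running latency
# average per model, and the in-memory cache uses simple LRU eviction.
from collections import OrderedDict
from dataclasses import dataclass
import time


@dataclass
class QoSRecord:
    """Running QoS statistics kept in the Knowledge base."""
    calls: int = 0
    avg_latency_s: float = 0.0

    def update(self, latency_s: float) -> None:
        self.calls += 1
        self.avg_latency_s += (latency_s - self.avg_latency_s) / self.calls


class SLMOrchestrator:
    def __init__(self, models, max_loaded=2):
        self.models = models                 # name -> (domain tags, model handle)
        self.max_loaded = max_loaded         # how many SLMs to keep in memory
        self.loaded = OrderedDict()          # LRU cache of currently loaded SLMs
        self.knowledge = {name: QoSRecord() for name in models}  # Knowledge

    def monitor(self, query: str) -> dict:
        """Monitor: capture the incoming query and its inferred domain."""
        return {"query": query, "domain": self._infer_domain(query)}

    def analyze_and_plan(self, ctx: dict) -> str:
        """Analyze + Plan: pick the SLM with the best observed QoS for the domain."""
        candidates = [n for n, (tags, _) in self.models.items() if ctx["domain"] in tags]
        if not candidates:
            candidates = list(self.models)   # fall back to any available model
        return min(candidates, key=lambda n: self.knowledge[n].avg_latency_s)

    def execute(self, name: str, query: str) -> str:
        """Execute: load the chosen SLM if needed, route the query, record QoS."""
        model = self._ensure_loaded(name)
        start = time.perf_counter()
        answer = model.generate(query)
        self.knowledge[name].update(time.perf_counter() - start)
        return answer

    def handle(self, query: str) -> str:
        ctx = self.monitor(query)
        return self.execute(self.analyze_and_plan(ctx), query)

    def _ensure_loaded(self, name: str):
        if name in self.loaded:
            self.loaded.move_to_end(name)        # refresh LRU position
        else:
            if len(self.loaded) >= self.max_loaded:
                self.loaded.popitem(last=False)  # evict least recently used SLM
            self.loaded[name] = self.models[name][1]
        return self.loaded[name]

    def _infer_domain(self, query: str) -> str:
        # Placeholder classifier; a real system would use a learned router.
        return "maintenance" if "sensor" in query.lower() else "general"
```

A fuller implementation would presumably fold energy consumption and domain-specific accuracy into the Analyze and Plan steps alongside latency, which is where the QoS-aware routing described in the paper takes effect.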
The evaluation of CALM showed promising results. Compared to single-LLM baselines, CALM reduced latency by approximately 40% and energy consumption by 50%, while maintaining domain-specific task performance. These improvements could have significant implications for the energy sector, where AI systems are increasingly being used for tasks such as predictive maintenance, energy trading, and customer service.
For instance, in predictive maintenance, CALM could optimize the routing of sensor data to different SLMs, reducing latency and energy consumption while maintaining accurate predictions. Similarly, in energy trading, CALM could enhance the efficiency of AI systems that analyze market trends and make trading decisions, potentially leading to cost savings and improved profitability.
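As a purely illustrative continuation of the sketch above, the snippet below shows how such an orchestrator might route a predictive-maintenance query to a maintenance-tuned SLM. The model names and the DummySLM stand-in are hypothetical and only serve to exercise the routing loop.

```python
# Hypothetical usage of the SLMOrchestrator sketch for a maintenance workload.
class DummySLM:
    """Stand-in for a real small language model."""
    def __init__(self, name):
        self.name = name

    def generate(self, query: str) -> str:
        return f"[{self.name}] response to: {query}"


orchestrator = SLMOrchestrator(
    models={
        "maintenance-slm": ({"maintenance"}, DummySLM("maintenance-slm")),
        "trading-slm": ({"trading"}, DummySLM("trading-slm")),
        "general-slm": ({"general"}, DummySLM("general-slm")),
    },
    max_loaded=2,
)

# The domain heuristic tags this as a maintenance query, so it is routed
# to the maintenance-specialized SLM rather than a general-purpose one.
print(orchestrator.handle("Sensor 14 reports abnormal vibration; is maintenance needed?"))
```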
In conclusion, the researchers’ work on CALM offers a promising approach to optimizing the performance of language model-based systems, with potential applications in the energy sector. By leveraging a coordinated fleet of SLMs and intelligent orchestration, CALM could help energy companies reduce costs, improve efficiency, and enhance the reliability of their AI systems.
This article is based on research available on arXiv.

