Parallax: Boosting Edge AI with 46% Latency Cut, 30% Energy Savings

Researchers Chong Tang, Hao Dai, and Jagmohan Chauhan from the University of California, San Diego, have developed Parallax, a framework that aims to improve the performance of deep neural network (DNN) applications on edge devices such as smartphones. Their work addresses the growing demand for real-time DNN applications and the challenges posed by complex models and heterogeneous hardware.

Edge devices often include specialized accelerators like mobile GPUs to speed up DNN inference, but some operations may not be supported by these accelerators and fall back to the CPU. Existing frameworks handle these fallbacks inefficiently, leading to idle CPU cores, high latency, and memory spikes. Parallax tackles this issue by partitioning the computation graph to expose parallelism, managing memory more effectively, and scheduling tasks adaptively based on device constraints. It also enables heterogeneous inference for dynamic models through fine-grained subgraph control.
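
To make the idea of exposing parallelism around fallback operations more concrete, here is a minimal Python sketch. It is not Parallax's actual algorithm or API. It assumes a toy computation graph with made-up operator names, a hypothetical set of accelerator-supported operators, and a hypothetical helper called partition_levels. The helper groups operators into dependency levels and splits each level into accelerator-supported operators and CPU fallbacks, so that independent fallbacks show up as candidates for running on separate CPU cores.

```python
from collections import defaultdict, deque


def partition_levels(ops, deps, gpu_supported):
    """Group ops into dependency levels, split by where each op can run.

    ops: list of op names (hypothetical, for illustration only)
    deps: dict mapping an op to the list of ops it depends on
    gpu_supported: set of ops assumed to run on the accelerator
    """
    indegree = {op: 0 for op in ops}
    children = defaultdict(list)
    for op, parents in deps.items():
        for parent in parents:
            children[parent].append(op)
            indegree[op] += 1

    # Start with ops that have no unmet dependencies.
    ready = deque(op for op in ops if indegree[op] == 0)
    levels = []
    while ready:
        level = list(ready)
        ready.clear()
        levels.append({
            "gpu": [op for op in level if op in gpu_supported],
            # Ops in the same level are mutually independent, so several
            # CPU fallbacks here could be dispatched to different cores.
            "cpu": [op for op in level if op not in gpu_supported],
        })
        for op in level:
            for child in children[op]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    return levels


# Toy graph: two unsupported ops ("nonzero", "topk") both depend on "relu".
ops = ["conv", "relu", "nonzero", "topk", "concat"]
deps = {
    "relu": ["conv"],
    "nonzero": ["relu"],
    "topk": ["relu"],
    "concat": ["nonzero", "topk"],
}
gpu_supported = {"conv", "relu", "concat"}

levels = partition_levels(ops, deps, gpu_supported)
for i, level in enumerate(levels):
    print(i, level)
```

In this toy graph the third level contains two CPU fallbacks with no dependency on each other, which is the kind of opportunity a framework that runs fallbacks one after another would leave unused. A real scheduler would also have to weigh memory limits and device constraints, which this sketch ignores.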

The researchers evaluated Parallax on five representative DNNs across three mobile devices. Compared to state-of-the-art frameworks, Parallax reduced latency by up to 46% and energy consumption by up to 30%, while keeping memory overhead controlled at an average of 26.5%. These improvements matter for real-time mobile inference, where responsiveness is crucial.

For the energy sector, this research could have practical applications in improving the efficiency of edge devices used for energy monitoring, predictive maintenance, and other tasks that rely on real-time DNN inference. By reducing latency and energy consumption, Parallax could help extend the battery life of mobile devices and reduce the overall energy footprint of edge computing infrastructure. The framework’s ability to handle complex models without requiring model refactoring or custom operator implementations makes it a promising tool for energy sector applications that demand real-time, accurate, and efficient data processing.

This research was published in the Proceedings of the ACM on Mobile Computing and Interaction (PACMIMC).

This article is based on research available at arXiv.
