Abstract
In large-scale distributed machine learning systems, the interconnection network between computing devices strongly influences the performance of neural network training. The ongoing growth of training data and model sizes has rapidly increased the number of computing devices in these systems, placing higher demands on network scalability. Moreover, the synchronization algorithms used for data exchange between devices impose different communication topologies, which traditional electrical networks struggle to match because their topology is fixed. Neural network models and model partitioning methods also affect the volume of inter-device communication, yet the overprovisioned bandwidth of traditional electrical networks incurs unnecessary cost. To address these issues, we propose X-NEST, a scalable, flexible, and high-performance network architecture. The flexibility of optical switching devices allows X-NEST to dynamically reconfigure its topology and the number of links between devices as traffic patterns vary, improving both network performance and resource utilization. Although reconfiguring the connections between devices depends on a controller, X-NEST's simple and flexible control plane responds quickly to communication demands. Extensive simulations with different traffic patterns demonstrate that X-NEST copes well with the communication characteristics of various synchronization algorithms.
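To make the reconfiguration idea concrete, the following Python sketch shows one way a controller could map a limited budget of optical circuits onto the heaviest-traffic device pairs. This is a minimal illustration under assumed inputs (a pairwise traffic matrix and a per-node optical port budget); the `assign_circuits` function and its greedy policy are hypothetical and do not represent X-NEST's actual control algorithm, which is described in the paper.

```python
def assign_circuits(traffic, ports_per_node):
    """Greedily assign optical circuits to the heaviest-traffic node pairs.

    traffic: dict mapping (i, j) node pairs to demand (arbitrary units).
    ports_per_node: optical ports each node can dedicate to direct circuits.
    Returns the list of (i, j) pairs that receive a direct circuit.
    """
    free_ports = {}
    circuits = []
    # Visit node pairs in descending order of demand.
    for (i, j), demand in sorted(traffic.items(), key=lambda kv: -kv[1]):
        if free_ports.get(i, ports_per_node) > 0 and free_ports.get(j, ports_per_node) > 0:
            circuits.append((i, j))
            free_ports[i] = free_ports.get(i, ports_per_node) - 1
            free_ports[j] = free_ports.get(j, ports_per_node) - 1
    return circuits

# Example: a ring all-reduce concentrates traffic on neighboring nodes,
# so the neighbor pairs receive circuits first.
ring_traffic = {(0, 1): 10.0, (1, 2): 10.0, (2, 3): 10.0, (3, 0): 10.0,
                (0, 2): 1.0, (1, 3): 1.0}
print(assign_circuits(ring_traffic, ports_per_node=2))
```

Run against a ring all-reduce traffic pattern, the sketch assigns all ports to the ring edges, matching the intuition that a reconfigurable fabric can align its topology with the synchronization algorithm in use.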