Optica Publishing Group
Journal of Lightwave Technology, Vol. 39, Issue 13, pp. 4247-4254 (2021)

X-NEST: A Scalable, Flexible, and High-Performance Network Architecture for Distributed Machine Learning

Abstract

In a large-scale distributed machine learning system, the interconnection network between computing devices has a significant impact on the performance of neural network training. The ongoing growth of training data and model size has led to a rapid increase in the number of computing devices used in distributed machine learning systems, which places higher demands on network scalability. In addition, the synchronization algorithms used for data exchange between computing devices have different communication topologies, which traditional electrical networks, with their fixed topology, have difficulty matching. The choice of neural network model and model partitioning method also changes the amount of communication between devices, yet traditional electrical networks overprovision bandwidth, which incurs unnecessary cost. To address these issues, we propose a scalable, flexible, and high-performance network architecture called X-NEST. The flexibility of optical switching devices allows X-NEST to dynamically change its topology and the number of links between devices according to traffic pattern variations, thereby improving network performance and resource utilization. Although reconfiguring the connections between devices depends on a controller, the simple and flexible control plane of X-NEST can respond quickly to network communication requirements. Extensive analytical simulations using different traffic patterns demonstrate that X-NEST copes well with the communication characteristics of various synchronization algorithms.
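To make concrete the claim that synchronization algorithms have different communication topologies, the following minimal sketch builds per-iteration traffic matrices for two widely used schemes, ring all-reduce and a parameter server. It is an illustration only and is not taken from the paper: the function names, the choice of device 0 as a dedicated server, and the byte counts are assumptions, and nothing here models X-NEST itself.

```python
def ring_allreduce_traffic(n, model_bytes):
    """Traffic matrix (bytes sent from row i to column j) for ring all-reduce.

    Assumption: the standard two-phase ring (reduce-scatter, then all-gather),
    in which each device sends 2 * (n - 1) chunks of size model_bytes / n
    to its single downstream neighbour.
    """
    per_link = 2 * (n - 1) * model_bytes / n
    traffic = [[0.0] * n for _ in range(n)]
    for i in range(n):
        traffic[i][(i + 1) % n] = per_link
    return traffic


def parameter_server_traffic(n, model_bytes, server=0):
    """Traffic matrix for a single parameter server among n devices.

    Assumption: device `server` is a dedicated server; every other device
    pushes a full gradient to it and pulls the full parameters back.
    """
    traffic = [[0.0] * n for _ in range(n)]
    for worker in range(n):
        if worker != server:
            traffic[worker][server] = float(model_bytes)  # gradient push
            traffic[server][worker] = float(model_bytes)  # parameter pull
    return traffic


def active_links(traffic):
    """Set of directed (source, destination) pairs that carry any traffic."""
    return {
        (i, j)
        for i, row in enumerate(traffic)
        for j, volume in enumerate(row)
        if volume > 0
    }
```

Comparing `active_links(ring_allreduce_traffic(4, 400))` with `active_links(parameter_server_traffic(4, 400))` shows a ring of neighbour-to-neighbour links in the first case and a star centred on the server in the second, with the server's inbound load growing with the number of workers. A fixed topology provisioned for one of these patterns is a poor match for the other.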

