Method for synthesis of fault-tolerant topologies with implicit clusters based on de Bruijn transformations in redundant numeral systems
DOI:
https://doi.org/10.18372/2073-4751.80.19784
Keywords:
topology, efficiency, fault tolerance, survivability, de Bruijn sequences
Abstract
This work develops a method for synthesizing topologies based on de Bruijn transformations in redundant numeral systems. The method produces fault-tolerant topologies of a given order, including topologies with implicit clusters. The work also develops a method for forming such clusters and a method for studying topology survivability in which failure probability is determined dynamically from betweenness centrality.
The proposed combined approach yields graphs that contain less redundancy yet have higher survivability, because the available redundancy is used more effectively. This makes it possible to increase fault tolerance at lower cost and to improve efficiency.
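To make the survivability idea concrete, the following is a minimal sketch and not the authors' method. It assumes a classical undirected de Bruijn graph over the alphabet {0..k-1} rather than a redundant numeral system, and it assumes a simple failure model in which a node's failure probability is proportional to `base + betweenness` and is recomputed after every failure. All function names and the `base` parameter are hypothetical and introduced only for illustration.

```python
import random
from collections import deque
from itertools import product


def de_bruijn_graph(k, n):
    """Classical undirected de Bruijn graph on words of length n over {0..k-1}.

    Each word w is linked to its left shifts w[1:] + (d,); self-loops are dropped.
    """
    adj = {w: set() for w in product(range(k), repeat=n)}
    for w in adj:
        for d in range(k):
            v = w[1:] + (d,)
            if v != w:
                adj[w].add(v)
                adj[v].add(w)
    return adj


def betweenness(adj):
    """Brandes' algorithm, unweighted and unnormalised.

    Every source is processed, so each unordered pair is counted twice.
    """
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        dist = {s: 0}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


def failure_probabilities(adj, base=1.0):
    """Assumed model: P(fail) is proportional to base + betweenness."""
    bc = betweenness(adj)
    total = sum(base + b for b in bc.values())
    return {v: (base + bc[v]) / total for v in adj}


def remove_node(adj, node):
    """Return a copy of the graph without the given node."""
    return {v: nbrs - {node} for v, nbrs in adj.items() if v != node}


def largest_component(adj):
    """Size of the largest connected component (0 for an empty graph)."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        queue, size = deque([s]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        best = max(best, size)
    return best


def simulate(adj, failures, seed=0):
    """Fail nodes one at a time, recomputing probabilities after each failure.

    Requires failures < number of nodes. Returns the largest-component size
    observed after each failure.
    """
    rng = random.Random(seed)
    history = []
    for _ in range(failures):
        probs = failure_probabilities(adj)
        nodes = list(probs)
        victim = rng.choices(nodes, weights=[probs[v] for v in nodes])[0]
        adj = remove_node(adj, victim)
        history.append(largest_component(adj))
    return history
```

For example, `simulate(de_bruijn_graph(2, 4), failures=5)` returns five largest-component sizes for a 16-node graph, one per failure. Comparing such curves across candidate topologies is one possible way to contrast their survivability under this assumed model.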
License
The scientific journal adheres to the principles of Open Access and provides free, immediate, and permanent access to all published materials without financial, technical, or legal barriers for readers.
All articles are published in Open Access under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Copyright
Authors who publish their works in the journal:
- retain the copyright to their publications;
- grant the journal the right of first publication of the article;
- agree to the distribution of their materials under the CC BY 4.0 license;
- have the right to reuse, archive, and distribute their works (including in institutional and subject repositories), provided that proper reference is made to the original publication in the journal.