[MGI37-P15] Performance Evaluation of Inter-node MPI-Parallel TensorFlow for Use with Large Amounts of Training Data
Keywords: machine learning, inter-node parallelism
Machine learning (ML) has recently achieved remarkable results in recognition tasks and plays an essential role in artificial intelligence (AI). In general, the training data is one of the most critical factors in ML, since the features extracted from each training sample strongly affect the recognition accuracy. A large amount of training data is therefore required to avoid biased recognition. However, learning from such massive data takes a long time and requires a large amount of memory. In this talk, inter-node MPI-parallel TensorFlow is introduced.
TensorFlow is an ML framework developed by Google and is widely used around the world. Google has also developed an inter-node parallel version of TensorFlow, but it is not well suited to supercomputers because, for example, the compute nodes that participate in the calculation must be specified directly. To overcome this problem, Uber has released Horovod, an MPI-based TensorFlow extension that can use the MPI library tuned for the supercomputer. Through a collaboration between Cray and Kyoto University, a further optimized MPI-based TensorFlow called the CPE ML Plugin has been installed on the supercomputer of Kyoto University. In this talk, we present a performance evaluation of these MPI-based TensorFlow implementations.
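As an illustration of how such inter-node data-parallel training is typically expressed, the minimal sketch below uses the public Horovod API with tf.keras; the CPE ML Plugin interface is not shown here, and the model, dataset, and hyperparameters are placeholders rather than the configuration evaluated in this work.

    # Minimal sketch of Horovod-style data-parallel training (illustrative only).
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()  # initialize MPI communication across ranks

    # Pin each rank to one GPU, if GPUs are present on the node.
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        tf.config.set_visible_devices(gpus[hvd.local_rank()], 'GPU')

    # Toy model and dataset; the actual evaluated network is not specified here.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(10, activation='softmax'),
    ])

    # Scale the learning rate by the number of ranks and wrap the optimizer
    # so that gradients are averaged across nodes with MPI allreduce.
    opt = tf.keras.optimizers.SGD(0.01 * hvd.size())
    opt = hvd.DistributedOptimizer(opt)
    model.compile(loss='sparse_categorical_crossentropy', optimizer=opt)

    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = x_train / 255.0

    # Broadcast initial weights from rank 0 so all ranks start identically.
    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]

    model.fit(x_train, y_train, batch_size=64, epochs=1,
              callbacks=callbacks, verbose=1 if hvd.rank() == 0 else 0)

Such a script is launched with one rank per node (or per GPU), e.g. via mpiexec or horovodrun, so that each rank trains on a shard of the data while gradients are exchanged over the supercomputer's tuned MPI.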