Increasing performance with AMD

Amazon EC2 Hpc7a instances benchmarking

Introduction

Do IT Now deployed an HPC cluster in the cloud, using AMD-based HPC-optimized instances on AWS, installed Altair applications on the infrastructure, and conducted benchmarks with reference models for each application. Do IT Now also compared results in terms of performance and costs for each application and instance combination.

During this analysis, Do IT Now also explored possible optimization patterns and parameters to tune the applications, the instances, and the overall HPC architecture in the cloud.

Objective

The objective of this project is to provide the results of the benchmarks conducted in the cloud environment, showing the baseline performance results and any additional optimizations that the Do IT Now team applied to the infrastructure, the software, and its parameters.

As an additional objective, Do IT Now will present and describe the various design steps, the issues encountered and how they were solved, and the lessons learned during the project execution.

About the cloud instances

The selected cloud instances are based on AMD EPYC™ processors of two different generations, all within the same AWS HPC instance family:

  • Amazon EC2 Hpc6a instances, featuring two 3rd Gen AMD EPYC™ 7003 series 48-core processors with up to 3.6 GHz all-core turbo frequency and 100 Gb/s networking
  • Amazon EC2 Hpc7a instances, featuring two 4th Gen AMD EPYC™ 9004 series 96-core processors (192 cores in total) with up to 3.7 GHz all-core turbo frequency, DDR5 RAM, and 300 Gb/s networking. Hpc7a is also available in smaller sizes that expose 96, 48, or 24 cores.

In the case of Hpc7a, AWS offers smaller instance sizes that let customers activate fewer CPU cores, according to their workload requirements, while keeping all other resources constant. As a result, the smaller sizes (with 96, 48, and 24 cores) provide more memory and more memory bandwidth per core. This can have a significant impact on solver performance, and it is especially valuable for commercial software that’s licensed on a per-core basis. Additionally, AWS automatically selects the best core topology for each size, maximizing performance and eliminating the need for complicated core pinning configurations at run time.
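As a rough illustration of why fewer active cores mean more memory per core, the ratio can be computed directly. This is only a sketch: the total memory figure (768 GiB) is an assumption that does not come from this page and should be checked against the current AWS specifications, and the function names are hypothetical.

```python
# Sketch: memory available per active core for each Hpc7a size.
# ASSUMPTION: every size exposes the same total memory. 768 GiB is an
# assumed value, not taken from the white paper; verify it with AWS.
TOTAL_MEMORY_GIB = 768

# Active core counts: the full instance (2 x 96 cores) plus the three
# smaller sizes mentioned in the text.
CORE_COUNTS = [192, 96, 48, 24]


def memory_per_core(total_memory_gib: float, cores: int) -> float:
    """Return the GiB of memory available to each active core."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_memory_gib / cores


def per_core_table(total_memory_gib: float, core_counts: list) -> dict:
    """Map each core count to its memory per core."""
    return {c: memory_per_core(total_memory_gib, c) for c in core_counts}


if __name__ == "__main__":
    for cores, gib in per_core_table(TOTAL_MEMORY_GIB, CORE_COUNTS).items():
        print(f"{cores:>3} cores: {gib:5.1f} GiB per core")
```

Because total memory is held constant, halving the active cores doubles the memory per core. The same reasoning applies to memory bandwidth per core, and to per-core licensing, where fewer but better-fed cores can reduce license cost for a given job.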

We ran the benchmarks on Hpc6a and all Hpc7a sizes.

About the software

The applications used for the benchmarks are part of the Altair CAE product line:

  • Altair® Radioss®, a leading analysis solution to evaluate and optimize product performance for highly nonlinear problems under dynamic loadings. Used worldwide across all industry sectors, it improves the crashworthiness, safety, and manufacturability of complex designs.
  • Altair® AcuSolve®, a leading general-purpose Computational Fluid Dynamics (CFD) solver that is capable of solving the most demanding industrial and scientific applications. Robust and scalable solver technology empowers users by providing unparalleled accuracy.

About the models

Altair® Radioss®

The model used for this software is the publicly available “Taurus 10 million finite elements” model, retrieved from the OpenRadioss™ public repository. With its refined mesh of 10 million finite elements, it is well suited to benchmarking performance on HPC clusters with many CPUs.

The model has three different settings, based on the simulation time:

  • Full simulation, 120 milliseconds.
  • Shorter simulation, 10 milliseconds: best suited for fast performance and scalability testing on many nodes/CPUs.
  • Very short simulation, 2 milliseconds: useful to test the HPC cluster functions.

For our benchmarks we used the “Shorter” simulation test case, with 10 milliseconds of simulation time.

Altair® AcuSolve®

The model used for AcuSolve was provided to us directly by Altair, who considered it a good example of a simulation that demonstrates high scalability in an HPC environment.

The “Impinging Nozzle” is a fairly substantial model (7.8 million nodes and 7.7 million elements) that simulates a steady water flow through a nozzle using a Navier–Stokes-based solver and a Spalart–Allmaras turbulence model.

The simulation for this benchmark was limited to 200 time steps.

Conclusions

In terms of pure performance, Hpc7a instances always have the edge over the still excellent Hpc6a. The higher core count per instance and the three times higher bandwidth of the high-speed interconnect (300 Gb/s versus 100 Gb/s) are powerful tools that both multi-threaded and multi-process solvers can use to achieve greater performance.

In terms of price, it’s important to note that the price per job grows more slowly than performance: adding a host to the computation to get better performance does not increase the price of the job by that instance’s full hourly price, but only by a fraction of it, because the job also finishes sooner. This fraction follows the scalability curve.

This can be achieved thanks to the scalability properties of the Altair solvers, the advanced architecture of AMD processors, and the AWS optimizations of the HPC-series instances and EFA network interfaces.
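The relationship between host count, run time, and job price described above can be sketched with a simple cost model. This is an illustrative reading of the argument, not the method used in the white paper: it assumes each host is billed at a flat hourly price for the whole run, and the price and run times below are hypothetical placeholders, not benchmark results.

```python
# Sketch: how the price of a job changes when hosts are added.
# ASSUMPTION: flat hourly billing per host for the full run time.
# All numbers are hypothetical and are NOT measured results.
HOURLY_PRICE = 10.0  # hypothetical price per host per hour

# Hypothetical run times in hours, keyed by number of hosts.
RUNTIME_HOURS = {1: 8.0, 2: 4.5, 4: 2.75}


def job_cost(nodes: int, hourly_price: float, runtime_hours: float) -> float:
    """Price of one job: every host is billed for the whole run time."""
    return nodes * hourly_price * runtime_hours


def cost_increase_fraction(base_nodes: int, base_runtime: float,
                           new_nodes: int, new_runtime: float) -> float:
    """Relative change in job price between two configurations.

    The hourly price cancels out, so only host counts and run times matter.
    """
    base = base_nodes * base_runtime
    new = new_nodes * new_runtime
    return (new - base) / base


if __name__ == "__main__":
    base_nodes = 1
    base_runtime = RUNTIME_HOURS[base_nodes]
    for nodes, runtime in RUNTIME_HOURS.items():
        cost = job_cost(nodes, HOURLY_PRICE, runtime)
        speedup = base_runtime / runtime
        delta = cost_increase_fraction(base_nodes, base_runtime, nodes, runtime)
        print(f"{nodes} host(s): speedup {speedup:.2f}x, "
              f"job price {cost:.2f}, change {delta:+.1%}")
```

With perfect scaling (run time halves when hosts double) the job price would not change at all. The further the measured run times fall short of perfect scaling, the larger the price increase, which is why the increase tracks the scalability curve.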

The price comparison favors the older generation of the AWS HPC instances for Altair® Radioss®, while the opposite is true for Altair® AcuSolve®. In both cases, the prices are close to each other even though performance on the Hpc7a is clearly better.

Additionally, the different sizes of Hpc7a instances help in achieving even faster performance and can provide more cost-efficient options.

© 2024. All rights reserved.
Altair® Radioss® and Altair® AcuSolve® are registered trademarks of Altair Engineering Inc.
