
DeltaAI Release: NCSA meets next-generation AI research needs

Author: LoRA | Date: 18 Dec 2024

The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign has just launched its highly anticipated DeltaAI system.

DeltaAI is an advanced artificial intelligence computing and data resource that will be a companion system to NCSA Delta, a 338-node HPE Cray-based supercomputer installed in 2021. The new DeltaAI is funded by the National Science Foundation (NSF) with nearly $30 million, and researchers across the country will use the system on a pilot basis through the NSF ACCESS program and the National Artificial Intelligence Research Resource (NAIRR).

The system will accelerate complex AI, machine learning, and HPC applications that process terabytes of data, using advanced AI hardware including the NVIDIA H100 Hopper GPU and the GH200 Grace Hopper Superchip.

We interviewed NCSA Director Bill Gropp at SC24 in Atlanta to get the inside story of the new DeltaAI system, which entered full production last Friday.

From Delta to DeltaAI: Meeting Growing GPU Demand

Gropp said the idea for DeltaAI grew out of NCSA's rising demand for GPUs, which became apparent while conceiving and deploying the original Delta system. "The name Delta comes from the fact that we're seeing these advances in computing architecture, particularly with GPUs and other interfaces. Some communities have adopted these, but not all communities, and we really felt that was an important direction for people to adopt."

"So, we proposed Delta to the National Science Foundation and got funding, basically taking almost all of the GPU resources. We had anticipated that it would be a hybrid of modeling simulations, like molecular dynamics, fluid flow, and artificial intelligence. . But as we deployed Delta, AI was just getting started and the demand was growing.”

Gropp said the original Delta system, equipped with NVIDIA A100 GPUs and a more modest amount of GPU memory, was state of the art at the time. But once large language models and other forms of generative AI (GenAI) emerged and took off, the game changed.

"We looked at people's needs and realized that AI research has huge demands on GPU resources, and these larger models will require more GPU memory," he said.

Scaling GPU power to demystify artificial intelligence

NCSA's original Delta system now serves as a companion to the new DeltaAI.

The new DeltaAI system will deliver approximately twice the performance of the original Delta: petaflops of double-precision (FP64) performance for tasks requiring high numerical accuracy, such as fluid dynamics or climate modeling, and 633 petaflops of half-precision (FP16) performance optimized for machine learning and AI workloads.
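
To make the FP64/FP16 tradeoff concrete, here is a minimal sketch (plain NumPy on any machine, nothing DeltaAI-specific) of why half precision is cheap but lossy: naively accumulating many small values in FP16 stalls long before the true answer, while FP64 stays accurate.

```python
import numpy as np

# Sum 100,000 copies of 1e-4; the exact answer is 10.0.
vals = [1e-4] * 100_000

# Naive sequential accumulation, rounding to FP16 after every add.
acc16 = np.float16(0.0)
for v in vals:
    acc16 = np.float16(acc16 + np.float16(v))

# The same accumulation in FP64.
acc64 = np.float64(0.0)
for v in vals:
    acc64 += v

print(f"FP16: {float(acc16):.4f}")  # stalls near 0.25: each 1e-4 rounds away
print(f"FP64: {acc64:.4f}")         # ~10.0000
```

This is why simulation codes that integrate millions of time steps want FP64, while ML training, which tolerates low precision, can run in FP16 at far higher throughput.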

This extraordinary computing power is driven by 320 NVIDIA Grace Hopper GPUs, each equipped with 96GB of memory, for a total of 384GB of GPU memory per node. The nodes are also supported by 14PB of storage with up to 1TB/sec bandwidth and interconnected with a highly scalable fabric.
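
Those published figures are internally consistent; a quick back-of-the-envelope pass (simple arithmetic on the article's numbers, not an official NCSA breakdown) implies four GPUs per node, roughly 80 GPU nodes, and about 30TB of aggregate GPU memory:

```python
gpus_total = 320        # NVIDIA Grace Hopper GPUs system-wide
mem_per_gpu_gb = 96     # GPU memory per superchip, in GB
mem_per_node_gb = 384   # GPU memory per node, in GB (as stated)

gpus_per_node = mem_per_node_gb // mem_per_gpu_gb    # -> 4
nodes = gpus_total // gpus_per_node                  # -> 80
aggregate_tb = gpus_total * mem_per_gpu_gb / 1000    # -> ~30.7 TB

print(f"{gpus_per_node} GPUs/node, {nodes} nodes, "
      f"{aggregate_tb:.1f} TB total GPU memory")
```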

Gropp said the NSF's supplemental funding for Delta and DeltaAI will allow them to deploy additional nodes with more than 1TB of GPU memory per node, which will support AI research, especially work dedicated to understanding LLM training and inference. Gropp hopes this aspect of DeltaAI's research potential will be a boon to explainable AI: these large memory resources let researchers work with bigger models, process more data simultaneously, and probe the mechanics of AI systems more deeply.
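
To see why more than 1TB of GPU memory per node matters for LLM work, consider a standard rule-of-thumb estimate (illustrative arithmetic, not NCSA-published sizing): FP16 weights take about 2 bytes per parameter, and full training with an Adam-style optimizer is commonly budgeted at around 16 bytes per parameter.

```python
def footprint_gb(params_billions: float, bytes_per_param: float) -> float:
    """Rough GPU-memory footprint for a model of the given size."""
    return params_billions * 1e9 * bytes_per_param / 2**30

for size_b in (7, 70):
    weights = footprint_gb(size_b, 2)    # FP16 weights only
    training = footprint_gb(size_b, 16)  # weights + grads + optimizer state
    print(f"{size_b}B params: ~{weights:.0f} GB weights, "
          f"~{training:.0f} GB training")
# 7B params:  ~13 GB weights,  ~104 GB training
# 70B params: ~130 GB weights, ~1043 GB training
```

By this estimate, a 70B-parameter model cannot even fit its FP16 weights on a single 96GB GPU, and naive full training approaches a terabyte, exactly the regime the >1TB nodes target.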

"We do a lot of research on explainable AI, trustworthy AI, and understanding how reasoning works," Gropp explained, highlighting the key questions driving this work: "Why do models work the way they do? How do you improve their quality and reliability?"

Understanding how AI models reach specific conclusions is critical to identifying bias to ensure fairness and improve accuracy, especially in high-stakes applications such as healthcare and finance. Explainable AI is a response to “black box” AI systems and models, which are not easily understood or accessible and often lack transparency in how inputs are processed to generate outputs.
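
As a toy illustration of the kind of question explainability methods ask (a generic gradient-saliency sketch in PyTorch, not a technique the article attributes to NCSA): given a trained model, how sensitive is its output to each input feature?

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny stand-in for a "black box": we can run it, but its weights
# don't explain themselves.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

x = torch.randn(1, 8, requires_grad=True)  # one input with 8 features
model(x).sum().backward()                  # d(output)/d(input)

# Gradient magnitude per feature: a crude first answer to
# "which inputs did this prediction depend on most?"
print(x.grad.abs().squeeze())
```

Real explainable-AI research goes far beyond this sketch, but even basic attribution methods applied to large models need the kind of GPU memory headroom DeltaAI provides.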

As AI adoption accelerates, the need for explainability and accuracy grows with it, Gropp said, prompting questions like "how to reduce the interpolation errors that are intrinsic to these models so that people can rely on what they're getting out of them." "Seeing the need is why we proposed this. I think that's why NSF funded it and why we're so excited."

Democratizing artificial intelligence and high-performance computing

DeltaAI will be available to researchers across the United States through the NSF ACCESS program and the National Artificial Intelligence Research Resource (NAIRR) pilot program. This broad accessibility is designed to facilitate collaboration and expand the reach of DeltaAI’s advanced computing capabilities.

"We're really looking forward to seeing more and more users taking advantage of our state-of-the-art GPUs, as well as taking advantage of the support we can provide and the ability to collaborate and share our resources with other groups," Gropp said.

Gropp said the new system will play a dual role, advancing both artificial intelligence and more traditional computational science. While DeltaAI's nodes are optimized for AI-specific workloads and tools, they are equally accessible to HPC users: the system is designed as a versatile platform that serves both AI research and traditional HPC applications.

HPC workloads such as molecular dynamics, fluid dynamics and structural mechanics will benefit greatly from the system's advanced architecture, particularly its multi-GPU nodes and unified memory. These features address common challenges in HPC, such as memory bandwidth limitations, by providing massive bandwidth to improve performance on compute-intensive tasks.
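
The standard way to reason about that benefit is the roofline model: delivered performance is capped by min(peak compute, arithmetic intensity × memory bandwidth). A hedged sketch with illustrative placeholder numbers (not published DeltaAI specs):

```python
def attainable_tflops(peak_tflops: float, bandwidth_tb_s: float,
                      flops_per_byte: float) -> float:
    """Roofline model: a kernel is compute-bound or bandwidth-bound."""
    # TB/s * FLOPs/byte = TFLOPS, since 1 TB/s = 1e12 bytes/s.
    return min(peak_tflops, bandwidth_tb_s * flops_per_byte)

peak = 60.0  # hypothetical FP64 peak, TFLOPS
# Stencil/stream-style kernels common in CFD move a lot of data per FLOP,
# e.g. on the order of 0.25 FLOPs per byte.
for bw in (2.0, 4.0):  # hypothetical memory bandwidths, TB/s
    print(f"{bw:.0f} TB/s -> {attainable_tflops(peak, bw, 0.25):.1f} TFLOPS")
```

For kernels like these, doubling memory bandwidth doubles delivered performance, which is why memory-bound HPC codes gain directly from faster memory rather than from higher peak FLOPs.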

Balancing AI hype with actual scientific progress

Integrated on the same network and shared file system as the original Delta system, DeltaAI represents a forward-thinking approach to infrastructure design. This interconnected setup not only maximizes resource efficiency but also sets the stage for future scalability.

Gropp said plans are in place to add new systems over the next year or two, reflecting a shift toward a continuous upgrade model rather than waiting for current hardware to become obsolete. While this approach may create challenges in managing more heterogeneous systems, the benefits of staying at the forefront of innovation far outweigh the complexity.

This approach to infrastructure design ensures that legacy computing workloads are maintained and seamlessly integrated with advances in AI, creating a balanced, versatile research environment in a computing landscape where AI hype can easily breed fatigue.

"The hype around AI can be exhausting," Gropp noted. "We do have to be careful, because there's tremendous value in the things AI can do. But there are a lot of things it can't do, and I don't think it ever will be able to, at least with the technology we have."

DeltaAI exemplifies NCSA's commitment to advancing the frontiers of scientific understanding and practical applications of artificial intelligence and high-performance computing technologies. Scientific applications such as turbulence modeling are benefiting from the combination of HPC and AI.

"I think this is an exciting example of what we really want to do," Gropp said. "Not only do we want to understand it and satisfy our curiosity about it, but we want to be able to use this knowledge to improve human life." ”
