By John Russell
The appliance hardware is an IBM Power9-based AC922 server, available in two sizes: one with two Nvidia V100 GPU accelerators and 6 gigabytes of memory, and a larger version with four V100s and one terabyte of memory. The appliance ships with all the required software and would likely sit in the same datacenter as the host HPC infrastructure.
Here’s how it would work. The host infrastructure performs the actual simulations (IBM has targeted some early domains) and sends information about each simulation to the IBM appliance, which applies its Bayesian expertise to optimize the simulation parameters and search space. The appliance returns the optimized parameters to the host, which uses them in the next round of simulation. The process repeats iteratively to converge on the best solution.
“Think of it as ping-ponging back and forth between the systems,” said Chris Porter, worldwide offering leader for IBM’s converged HPC-AI solutions.
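To make the ping-ponging concrete, here is a minimal sketch of such a loop in Python. The function and class names (run_simulation, ApplianceStub, and so on) are invented for illustration, and a trivial local suggester stands in for the appliance’s Bayesian optimizer; none of this reflects IBM’s actual API.

```python
import random

def run_simulation(params):
    """Stand-in for the expensive simulation run on the host HPC cluster.
    Returns an objective value (lower is better)."""
    x, y = params["x"], params["y"]
    return (x - 3.0) ** 2 + (y + 1.0) ** 2

class ApplianceStub:
    """Stand-in for the Bayesian optimizer running on the appliance."""
    def __init__(self):
        self.history = []  # (params, objective) pairs observed so far

    def observe(self, params, objective):
        """Record one simulation result (the 'ping' from the host)."""
        self.history.append((params, objective))

    def suggest(self):
        """Propose the next parameters to try (the 'pong' back to the host).
        A real appliance would fit a Bayesian surrogate model here; this stub
        simply perturbs the best point found so far."""
        if not self.history:
            return {"x": random.uniform(-5, 5), "y": random.uniform(-5, 5)}
        best_params, _ = min(self.history, key=lambda h: h[1])
        return {k: v + random.gauss(0, 0.5) for k, v in best_params.items()}

appliance = ApplianceStub()
params = appliance.suggest()
for _ in range(20):
    objective = run_simulation(params)    # host does the heavy lifting
    appliance.observe(params, objective)  # results go to the appliance
    params = appliance.suggest()          # appliance returns new parameters

best_params, best_objective = min(appliance.history, key=lambda h: h[1])
print("best parameters:", best_params, "objective:", best_objective)
```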
If effective, the approach would be a way for users to supercharge their simulation activities without disrupting their existing infrastructure or incurring the steep costs of an infrastructure upgrade. At least that’s the idea.
Porter said no participation from ISVs is required. “The way our optimizer works is we interact at the input file and the output file level. So think of applications such as a computational fluid dynamics, electronics, or protein docking simulator. In all of those cases, there are input files that define the problem, and then some sort of a solver, and then they create output files to be examined, usually by the user.
“We don’t need participation from the ISV [because] we just read and write input and output files. We do have to create what we call interface functions; you can think of them as plugins on those applications that do the parsing of the specific input and output files that come from a particular simulator. Those interface functions are simulator-specific. They’re also open source. When we launch the product, we will have a public GitHub housing these interface functions on a simulator-by-simulator basis.”
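As a rough illustration of what an interface function might do, here is a sketch for an imaginary solver whose input and output files are simple key-value text. The file format and function names are invented for the example and are not taken from IBM’s repository.

```python
def write_input_file(path, design_variables):
    """Write the design variables chosen by the optimizer into the solver's
    plain-text input file."""
    with open(path, "w") as f:
        for name, value in design_variables.items():
            f.write(f"{name} = {value}\n")

def read_output_file(path):
    """Parse the solver's output file and return the quantities the optimizer
    needs, such as objective and constraint values."""
    results = {}
    with open(path) as f:
        for line in f:
            if "=" in line:
                name, value = line.split("=", 1)
                results[name.strip()] = float(value)
    return results
```

A real interface function would have to understand the often proprietary formats of each commercial solver, which is why they are written on a per-simulator basis.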
Right now, said Porter, IBM is focused on automotive, aerospace, electronic design, and oil and gas. He noted that automotive and aerospace share many of the same commercial simulators for computational fluid dynamics, vehicle dynamics, forming, and so forth. “There’s also a little overlap in fluid dynamics for oil and gas, and a few ISVs straddle the auto, aero and electronic design [domains]. ANSYS, for instance,” he said.
With an eye toward eventually extending the IBM BOA capabilities to the cloud, and mindful of the latency challenges of inter-system communication, IBM says it has created a lightweight data model for exchanging information between the host and the appliance.
Porter explains, “We needed to make sure the data model is very light to be viable if the two [systems] are physically separated across the WAN or across the internet. Simulation data can be gigabytes, sometimes terabytes, and the latency associated with transferring that much data out to the cloud, even if you don’t transfer anything back, is so high that it wouldn’t be viable.”
All of the actual simulation, as well as the parsing of the needed output data, is done on the host HPC system. “We calculate things like objective function and constraint values in situ, in the HPC environment. The only things we send across to the cloud from the HPC are objective function values and constraint values. We’re talking about at maximum kilobytes of data, even if we’re running double precision, and what comes back would be up to a few thousand design variable values, which again is maybe up to a megabyte per optimization iteration. The data model is extremely light,” he said.
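To give a sense of how small that exchange could be, here is an illustrative sketch using an invented JSON layout rather than IBM’s actual data model: a handful of scalars go out per iteration, and at most a few thousand design-variable values come back.

```python
import json

# Host -> appliance: objective and constraint values computed in situ on the HPC side.
to_appliance = {
    "iteration": 12,
    "objective_values": [0.0347],        # e.g. a drag coefficient
    "constraint_values": [1.02, -0.15],  # e.g. lift and stress margins
}

# Appliance -> host: the next set of design-variable values to simulate.
from_appliance = {
    "iteration": 13,
    "design_variables": [0.21, -1.7, 3.05],  # could grow to a few thousand values
}

print(len(json.dumps(to_appliance).encode()), "bytes out")
print(len(json.dumps(from_appliance).encode()), "bytes back")
```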
Currently no one is using the IBM appliance, but Porter said there are users of IBM’s Bayesian optimization software. AstraZeneca is using it for some high-throughput virtual screening work with positive results, according to Porter. The University of Texas at Austin, he said, has used the software for optimizing oil reservoir simulations and was able to cut by a factor of three the number of simulations required to increase the amount of oil that could be extracted from a single well.
In making the case to potential users of the appliance, IBM is focused on two measures of value. One is simply better answers, as measured by a quantitative objective function. “For example, for an airplane that might be lower drag. For oil, it means extracting more oil for less cost,” Porter said. The other is the speedup of simulations and time-to-solution.
The new strategy is intriguing. As IBM explained last year, “[O]ur ambition is to produce solutions that are minimally or zero disruptive on installed base, just so that we can shrink that sale cycle and make life as easy as possible for whatever the client is and what they’re doing.”