NA-ASC-500-11 Issue 16
The Meisner Minute
Editorial by Bob Meisner
Greetings from HQ. It has been six months since our last newsletter, not only because the newsletter editor Reeta Garber retired, but also because we moved to a new web site.
During that time we received our FY2011 budget and completed planning for this year’s deliverables, signed a memorandum of understanding (MOU) with the Office of Science to work jointly towards exascale, delivered the final configuration of Cielo and signed a contract to deliver over 6 petaFLOPS of capacity computing under the Tri-lab Capacity Computing, or TLCC-2, project. I spent much of my time working with all three labs in identifying the National Security Enterprise drivers pushing us into exascale computing regimes.
For well over a year, we’ve been active in the DOE’s exascale initiative. We have formally joined with the Office of Science in a cooperative effort to define a unified program for the DOE through the MOU I mentioned above. The ASC program has a clear mission need for exascale-level computing, but we recognize that there are significant technical and administrative hurdles that we need to overcome, and our partnership with other major players will help us achieve our ends. We share the technical challenges, but our mission drivers differ.
Our need for exascale arises from our commitment to “deliver confidence” to support national security decisions. This commitment is captured nicely by this statement: “By 2025 provide the predictive simulation capabilities necessary to understand an aging US stockpile and assess the performance of foreign nuclear devices.” This short statement encapsulates four main points that define our drivers in an unclassified manner. First, 2025 is a time when stockpile advice will be given by designers and engineers one to two generations beyond those with test experience. Second, we must equip these future generations with simulation capabilities largely devoid of empiricism. This predictive capability must be sufficient to judge a stockpile with components older than those tested underground. And, finally, the predictive tools we develop must be applied to assessing devices for which we lack test experience.
During the past six months, there have also been staff changes in the ASC Program. We were sad to lose Sander Lee to his pursuit of other professional interests outside supercomputing, and we wish him well. Paul Henning from LANL has agreed to come to HQ for three months to fill in for Sander while we look for his replacement. As you read this, the Dans will have departed the Forrestal building. Dan Segalman has already returned to SNL-Livermore, while Dan Nikkel returns to LLNL after completing his final assignment assembling the “Need for Exascale” story. The Physical Engineering and Modeling program welcomes Jason Pruet as its new federal lead. Jason divides his time between ASC and the Science Campaign as we work to break down “cylinders-of-excellence” between programs. Finally, we are fortunate to have Mark Anderson on a year-long detail from LANL to finish his work on the V&V strategy and help us with the Defense Applications and Modeling program.
In closing, allow me to reiterate that your job by 2025 is to deliver technical confidence by providing the predictive simulation capabilities necessary to understand an aging US stockpile and to assess the performance of foreign nuclear devices. Nobody does it better than you.
PS: Sergey—It was a pleasure seeing you at SC10 last year. Happy Birthday.
Cielo Upgrade Positions ASC Well for CCC2
Cielo is a petascale resource for conducting NNSA weapons simulations in the 2011–2015 timeframe. Cielo provides a production, classified computational resource. It is located at Los Alamos National Laboratory and is operated by ACES: The New Mexico Alliance for Computing at Extreme Scale, a collaboration between Los Alamos and Sandia national laboratories.
During May 2011, the Cielo system was upgraded from 1.03 petaFLOPS (72 cabinets) to 1.37 petaFLOPS (96 cabinets). After the hardware upgrade, the Cielo operating system was also upgraded. Panasas file system software upgrades are part of the upgrade plans and are ongoing in preparation for Capability Computing Campaign 2 (CCC2). Users from all three laboratories — Los Alamos, Lawrence Livermore, and Sandia national laboratories — are using Cielo for CCC1 applications work until CCC2 begins in July 2011.
Replacing the Purple supercomputer, Cielo is the new NNSA ASC production, classified computational resource for 2011–2015. Cielo is over 13 times the size of Purple; it began providing capability-computing cycles in February 2011.
Green Computing at LANL
Power Use/Utilization Effectiveness (PUE) close to Industry’s best-in-class metric
Data collected in April 2011 show that the ASC Facility Operations and User Support Program at LANL has significantly improved the energy efficiency of operations in the Metropolis Center for Modeling and Simulation. The Center houses the Cielo, TLCC, and Roadrunner systems. This improvement addresses DOE Order 430.2B, Departmental Energy, Renewable Energy and Transportation Management, which calls for a 30% energy reduction across the entire DOE by 2015. Energy reduction actions also enable LANL to meet DOE Order 450.1A, Environmental Protection Program.
A new automated power and temperature-metering system installed in the Center has contributed to achieving an industry-standard power use/utilization effectiveness (PUE) rating of 1.3, which is close to Industry’s best-in-class metric of 1.2. While allowing data collection for building a baseline, the system also gives the ability to systematically turn facility equipment on and off based on changing computer-room platform and supporting infrastructure requirements. For example, the addition of 24 racks to Cielo recently increased the system size by about a third of a petaflop (PF), from 1.03 PF to 1.37 PF.
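To make the PUE figures above concrete, here is a minimal sketch based on the standard definition of PUE as total facility power divided by IT equipment power. The 1,000 kW IT load is a made-up illustrative number, not a measurement from the Metropolis Center, and the function names are hypothetical.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw: float, pue_value: float) -> float:
    """Power spent on cooling, distribution losses, etc. for a given PUE."""
    if pue_value < 1.0:
        raise ValueError("PUE cannot be below 1.0")
    return it_equipment_kw * (pue_value - 1.0)


if __name__ == "__main__":
    it_load_kw = 1000.0  # assumption: illustrative IT load only
    for rating in (1.3, 1.2):
        extra = overhead_kw(it_load_kw, rating)
        print(f"PUE {rating}: about {extra:.0f} kW of overhead per {it_load_kw:.0f} kW of IT load")
```

Under these assumed numbers, moving from a rating of 1.3 to the best-in-class 1.2 would cut the non-computing overhead by roughly a third.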
Sandia’s Quantified Margins and Uncertainties Assessment of W80 Abnormal Mechanical Nuclear Safety
In an assessment funded by the Advanced Simulation and Computing (ASC) Program, a W80 system model is being used to quantify the margins and uncertainties (QMU) of nuclear safety issues in abnormal mechanical environments such as handling drop accidents. Underpinning this project are the uncertainty quantification methods and computational tools developed by the ASC program, including SIERRA solid mechanics codes to perform the simulations. Uncertainties in the system model, which are prohibitively expensive to determine exclusively from comparisons with full system experiments, were derived in part by propagating uncertainties determined from model comparisons with subsystem and material characterization tests (such as the notched tension model shown below) up to the integrated system level.
In addition, novel mesh-independent methods for modeling failure propagation and material softening, recently incorporated in the SIERRA solid mechanics codes, are being tested for their effectiveness and efficiency in addressing long-standing computational issues bearing on this and other systems in abnormal mechanical environments. The project has undergone peer review by a panel of experts in a series of evaluations, most recently in late 2010. The project is supporting the 2011 and 2012 W80 Annual Assessment Reports.
ASC Codes Assess National Ignition Facility Risks
Lawrence Livermore scientists leveraged advanced physics models in the modern ASC codes to assess the risk to National Ignition Facility (NIF) chamber optics and diagnostics caused by debris generated during laser-driven materials science experiments. High-resolution 2D and 3D simulations were used in the decision-making process to determine whether it was safe to proceed with NIF experiments.
Debris risk assessment for the Material Strength Drive (HEDS) experiments was performed using a combination of simulation tools, including ALE3D and ParaDyn. The ARES code was used to evaluate the debris risk for a radiative supernova hydrodynamics experiment conducted in collaboration with the University of Michigan.
The goal of this work was to apply SIERRA Mechanics tools to understand the source of the observed damage and thereby help guide the design of the current DOI flash-lamp assembly. Application of high-level shock environments in a mechanical computational model of the CASA flash-lamp assembly (Figure 1) showed critical stress regions (shown as red in Figure 2) consistent with the location of fractures observed after testing. The analysis suggested that residual stresses from glass manufacturing and flash-lamp packaging are both critical factors in determining ultimate flash-lamp environmental response.
Critical challenges include the design of glass-to-metal seals, managing mismatches in thermal expansion properties of the various materials, and providing robust means for flash-lamp packaging that would protect glass components from expected shock and vibration environments. These results provide the DOI flash-lamp team with early input to identify robust manufacturing processes and to guide mechanical design decisions.
ParaDiS Code Extended for Face-Centered Cubic Systems
Lawrence Livermore scientists recently extended the ParaDiS code capability to simulate dislocation responses in face-centered cubic (FCC) systems, thus enabling prediction capabilities under extreme conditions for a wide range of materials.
Previously, the code was used only for predictions in body-centered cubic (BCC) metals. Those simulations were run on ASC’s BlueGene/L supercomputer to demonstrate ParaDiS’s capability as part of a multiscale modeling program for constructing predictive constitutive models.
Sixteen elemental solids exhibit the FCC crystal structure at ambient conditions, as do common engineering materials such as aluminum alloys and austenitic stainless steels. The new capability in ParaDiS will enable LLNL to predict the dynamic strength of common structural materials. A small-scale simulation is currently underway to investigate the relationship between the stress-strain response and the underlying microstructure in single-crystal FCC copper.
A snapshot of dislocation structure at the end of an early simulation stage. Many portions of the cell are occupied by dislocation tangles separated by empty space. This type of microstructure can lead to a significant increase in material strength.
Themis Approach Provides Remote Ensemble Analysis
Modeling and simulation are central to stockpile verification and validation. Groups of related ASC simulation runs (ensembles) are used to evaluate the sensitivity of computer models to variations in material properties or other input parameters. Sensitivity analysis examines how variation in simulation outputs can be mapped to, or understood in relation to, differences in the inputs. Given the scale of ensemble data, movement of ensembles off the High Performance Computing (HPC) systems where they are generated to separate systems for analysis is impractical.
In response, Sandia National Laboratories is developing Themis, a Web-based approach (see illustration), for conducting remote sensitivity analysis on ensemble data using a three-tiered architecture. The first tier consists of the big data or computationally expensive parallel analysis operations that must remain on the HPC. The second tier provides database storage and tracking of previous analysis results, integrated with a web server and access controls to authenticate users and user groups. The third tier provides interactive visualization and remote exploration of the analysis results through a standard Web browser.
The illustration shows Themis analysis of car data correlating physical characteristics (inputs) with performance data (outputs). Bar charts show high-level correlations between all inputs (top half labeled in green) and all outputs (bottom half labeled in purple). Positive correlation values are upward red bars, while negative values are downward blue bars. The length of each bar indicates its magnitude/relative importance. Opposite colors between inputs and outputs represent inverse relationships, whereas same colors show positive relationships. The scatter plot shows how well individual cars match the high-level correlations, with the best matches appearing closest to the diagonal. Cars are color-coded by their individual MPG values (from blue to red, with red high), where MPG has been selected in the linked table of raw values below.
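The input-to-output correlations described above can be illustrated with a minimal sketch. It computes a plain Pearson correlation between each input and a single output across an ensemble and ranks the inputs by magnitude. This is an assumption for illustration only: the source does not say which correlation method Themis uses, and the car data and names below (`weight_lb`, `horsepower`, `mpg`) are invented.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


def rank_inputs(inputs: Dict[str, List[float]],
                output: List[float]) -> List[Tuple[str, float]]:
    """Rank inputs by the absolute value of their correlation with the output."""
    scores = {name: pearson(values, output) for name, values in inputs.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical five-member ensemble (one entry per car); values are made up.
car_inputs = {
    "weight_lb": [2000.0, 2500.0, 3000.0, 3500.0, 4000.0],
    "horsepower": [90.0, 85.0, 130.0, 150.0, 160.0],
}
mpg = [38.0, 33.0, 27.0, 22.0, 18.0]

if __name__ == "__main__":
    for name, r in rank_inputs(car_inputs, mpg):
        direction = "inverse" if r < 0 else "positive"
        print(f"{name}: r = {r:+.3f} ({direction} relationship with MPG)")
```

The sign of each coefficient plays the role of the bar direction in the illustration, and its absolute value plays the role of the bar length. In a tiered setup like the one described, a computation of this kind would run next to the data, with only the small table of coefficients sent to the browser.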
For ASC-sized data ensembles, we expect remote analysis and visualization to become standard, as data sizes continue to increase. The tiered architecture that underlies Themis provides a scalable mechanism for supporting many use cases – from collaborative analysis to domain-specific, specialized applications. Themis serves as a concrete example of this architecture, providing researchers and analysts an opportunity to experiment with the algorithmic, visual, and cognitive challenges of ensemble analysis.
LLNL Researchers Find Way to Mitigate Traumatic Brain Injury in Study for Joint IED Defeat Organization
Researchers at Lawrence Livermore National Laboratory (LLNL) have found that soldiers using military helmets one size larger and with thicker pads could reduce the severity of traumatic brain injury (TBI) from blunt and ballistic impacts. Their results came after a one-year study funded by the U.S. Army and the Joint IED Defeat Organization (JIEDDO) to compare the effectiveness of various military and football helmet pads in mitigating the severity of impacts.
LLNL physicist Willy Moss and mechanical engineer Mike King used a combination of experiments and computational simulations to study how the various pad systems respond to battlefield-relevant impacts and to understand how helmet pads protect against them.
The findings have been presented to the Program Executive Office (PEO) Soldier, which is directed by Brig. Gen. Peter Fuller and is the U.S. Army acquisition agency responsible for everything a soldier wears or carries.
LLNL mechanical engineer Mike King (left) and physicist Willy Moss watch a compression test of a helmet pad. The pair has found a simple way to potentially reduce the severity of traumatic brain injury from blunt and ballistic impacts.
To learn more, see the ABC TV (KGO) report.
Roadrunner Open Science in Nature Physics
As the first petascale supercomputer, Roadrunner brought Los Alamos National Laboratory (LANL) fame and recognition in computing technology and scientific discovery. It has also brought success to research using the state-of-the-art plasma simulation code VPIC. One example is the study of magnetic reconnection, a process thought to play an important role in a diverse range of applications including solar flares, geomagnetic substorms, magnetic fusion devices, and a wide variety of astrophysical problems. An article describing the simulation results and their effect on the research is published in Nature Physics.* The petascale supercomputer has enabled a series of three-dimensional simulations using over 10^12 computational particles, nearly 10^3 times larger than previous two-dimensional studies.
A simulation of three-dimensional reconnection showing interacting flux ropes forming across the reconnection layer. Some sample magnetic field lines are shown in yellow and cutting planes along the perimeter also show current density.
*W. Daughton, et al., "Role of electron physics in the development of turbulent magnetic reconnection in collisionless plasmas," Nature Physics, published online 10 April 2011 (http://dx.doi.org/doi:10.1038/nphys1965).
Modeling Rotation of a Galaxy Wins Supercomputing Challenge Top Prize
The New Mexico Supercomputing Challenge, a school-year-long program for middle- and high-school students, is an opportunity to work on the most powerful computers in the world.
In April 2011, students, teachers, and volunteer judges conducted project evaluations that culminated in an awards ceremony. Los Alamos Middle School student Cole Kendrick won several prizes, including the top prize, for his research project in which he developed a computer program to model the rotation of a galaxy including dark matter.
Evoking memories of the Roadrunner Universe open-science project to shake down the Roadrunner supercomputer in 2009, Kendrick's project was set up to find out how dark matter affects rotational curves in galaxies and how accurately this effect could be modeled. Kendrick wanted to find out what happens when dark matter and galaxy masses are changed and whether this method would work for different galaxies. In addition to the top prize, he also received the Crowd Favorite Award and the Best Use of Visualization and Parallel Processing awards from the New Mexico Institute of Mining and Technology’s Computer Science and Engineering Department. To read more about the Challenge, visit http://www.challenge.nm.org/.
Computing Milestones Showcased at LANL’s Museum
For decades, Los Alamos National Laboratory has been synonymous with supercomputing, achieving a number of milestones along the way.
Those milestones and more are showcased in a supercomputing exhibit called “The Road to Roadrunner and Beyond” at LANL’s Bradbury Science Museum (http://www.lanl.gov/museum). Visitors at the museum get a comprehensive look at supercomputing from its origins up to and beyond Roadrunner, the world’s first computer to operate at speeds exceeding one petaflop — 1 million billion calculations per second. The updated exhibit includes interactive displays, artifacts from early computers like the FERMIAC mechanical computer, vacuum tubes from the MANIAC computer, and unique IBM cell blades from Roadrunner. To see a video about the exhibit, visit http://www.youtube.com/watch?v=pLu9FCb4964. Updating the exhibit was made possible by a grant from the Institute of Electrical and Electronics Engineers (IEEE).
Ribbon-cutting ceremony kicks off “The Road to Roadrunner and Beyond” exhibit on May 19, 2011. Left to right: David Israelevitz, Research Engineer; Dr. Gordon W. Day, IEEE President Elect; Andrew White, LANL Deputy Associate Director; and Linda Deck, Museum Director. Photo by Sandra Valdez, LANL.
Douglas (Doug) Doerfler is a Principal Member of Technical Staff in the Scalable Computer Architectures organization at Sandia National Laboratories. He has been with the ASC Program since 2000 working in the Computational Systems and Software Environment (CSSE) program element, formerly known as Simulation and Computer Science.
In his present position, Doug is the principal system architect for the Cielo project. Cielo, a petascale supercomputer, has replaced Lawrence Livermore National Laboratory’s Purple machine as the ASC Program’s production capability computing platform. The development of the Cielo system is the first and most visible activity for the Los Alamos National Laboratory (LANL)/Sandia Alliance for Computing at Extreme Scale (ACES), a collaboration established by a May 2008 MOU signed by then-Sandia Director Tom Hunter and LANL Director Michael Anastasio. As the Cielo system architect, Doug has served as the technical lead for the Request For Proposals and led the development of the acceptance criteria. He has had a key role in managing the integration and deployment of Cielo, which includes the successful completion of a very visible LANL/Sandia Level 2 programmatic milestone, “Platform Integration Readiness.”
For Cielo, Doug has assumed and addressed numerous technical and programmatic responsibilities associated with managing Cielo’s deployment across two laboratories — a first in which two DOE/NNSA laboratories have jointly deployed a supercomputer. “The Cielo project has been my highest profile assignment and has been a good match for my skill set: technical judgment, leadership and management,” says Doug. “And it has been my most satisfying project. I’ve really enjoyed having a key role in the ACES relationship with LANL, and even though Cielo is sited at Los Alamos, Sandia continues to demonstrate a high level of computer architecture and platform leadership and is able to continue to provide impact in the capability computing area for NNSA/ASC.”
Much in Doug’s background has prepared him for his current assignment and this subsequent recognition. Upon joining the ASC Program in 2000, Doug worked on the Computational Plant (CPlant) project. This project was an early ASC effort to explore using commodity hardware and open source software for large-scale scientific computing, i.e., an early instance of what we refer to today as cluster computing. The CPlant program deployed several clusters at Sandia. When Doug came to the ASC program, one of his first impacts was to develop an integration and test methodology for the Antarctica cluster, the largest and final CPlant deployment at over 2000 nodes. This effort was rewarded with a promotion to management for Doug. In addition to managing CPlant, his group was also responsible for maintaining one of the ASC Program’s first supercomputers, ASCI Red. The builder’s (Intel) contract had expired and Sandia had assumed the operational management for the program. Later, he was also part of the ASC’s Red Storm supercomputer acquisition team.
Much further back, as a graduate student, Doug worked on a Sandia contract, looking at low-power data acquisition techniques for unattended ground sensors. This led to a job at Sandia, where he was hired in the summer of 1985. “My first project was working with a team to develop a perimeter security system for the Air Force. I developed the control software that collected data from sensors placed around the perimeter of international Air Force bases. This system was installed on several bases worldwide. In 1988 I transferred to a group that was developing automatic target recognition (ATR) techniques and systems for the Army, and eventually the Air Force. I led the development of several real-time ATR systems, the highlight being a deployment on one of the Air Force's Joint STARS aircraft, a collaborative effort among Sandia, Northrop Grumman and the Air Force. These prior projects provided me with an introduction to high performance computing (HPC) at the embedded level and I wanted to better understand the technologies and challenges of large-scale computing, and the ASC program was a perfect fit. I transferred to Sandia’s Computing and Research Center in 2000 and began my third ‘career’ at Sandia working on a series of large-scale machines: CPlant, Red Storm, TLCC, Red Sky, Cielo.”
Doug also participates in DOE workshops, such as the current exascale workshops hosted by the NNSA and the Office of Science, and represents Sandia and ASC at numerous conferences and meetings with industry, academia and other government organizations. His primary research interest is in the area of performance analysis of HPC platforms and technologies, and he has published several conference papers and journal articles. Most recently he co-authored a paper published in the IEEE International Parallel and Distributed Processing Symposium Workshop on Large-Scale Parallel Processing. This paper, “Investigating the Impact of the Cielo Cray XE6 Architecture on Scientific Application Codes,” was also selected for publication in a special issue of Parallel Processing Letters Journal. Co-authors are Courtenay Vaughan, Mahesh Rajan, and Richard F. Barrett. He is also co-author of a paper titled “Application-Driven Acceptance of Cielo, an XE6 Petascale Capability Platform,” published at the 2011 Cray User Group (CUG) meeting, describing the methodology that was developed to use NNSA applications as a primary acceptance criterion for Cielo. Co-authors included Mahesh Rajan, Cindy Nuss (Cray), Cornell Wright (LANL), and Tom Spelce (LLNL).
Los Alamos’ Manuel Vigil (Cielo Project Manager) has this to say about his Sandia counterpart:
“It is a pleasure working with Doug on the Cielo Project. As the principal system architect for Cielo, Doug has provided outstanding leadership and guidance to the multi-site technical team on Cielo system integration. His deep understanding of supercomputing issues has allowed him to provide leadership to both Cray and Panasas, as well as the technical experts from both Sandia and Los Alamos. One key area where Doug has demonstrated outstanding leadership is his working with Sandia, LLNL, and LANL applications personnel to interact with Cray in demonstrating outstanding application performance on Cielo. I highly value and appreciate Doug's leadership and contributions to the Cielo project.”
Outside work, Doug enjoys “going to movies with my family, motorcycling, bicycling and keeping my house from falling apart.”
ASCeNews Quarterly Newsletter - March/June 2011