CIMdata, Inc. 3909 Research Park Drive Ann Arbor, MI 48108 +1 (734) 668-9922 www.CIMdata.com Beyond Simulation Data Management Why Innovative Approaches are Required September 2012

Copyright © 2012 by CIMdata, Inc.

Beyond Simulation Data Management

The importance and the use of simulation in product development are increasing, as companies are making more product development decisions based on simulation. The number of simulations will dramatically increase as techniques for optimization and robust engineering are adopted. The resulting massive increase in simulation data raises concerns about our capability to deal with it effectively. Simulation analytics provides a way to extract relevant information and understand trends across hundreds or thousands of simulations. Companies need to adopt improved business practices to manage their simulation data, and to guard against warehousing data that may, in fact, never be accessed.

Introduction

As companies drive to digital product development, simulation becomes more important. Simulation is used to evaluate virtual prototypes, enabling better decisions earlier in the product development cycle. Simulation can cut product development time and cost, and improve quality.

The nature of simulation is changing. Better computer hardware and software provide a closer coupling of design variables (such as geometry, materials, and topology) with physics-based simulation of product performance, enabling computer-based optimization of designs. Also, statistical methods for robust design ensure product quality and performance in the face of manufacturing, material property, and other variations. Rather than a few simulations to evaluate a single, nominal design, optimization and robust design require hundreds or even thousands more simulations than in the past.

Simulation is also being performed earlier in the product development process, during product ideation and concept selection. Here, on the left side of the systems engineering “Vee” in Figure 1, the product is less well defined, but complex, multi-discipline product trade decisions are being made.

Simulations of virtual prototypes cut development time and cost, and improve quality.

Figure 1: The Systems Engineering “Vee” Model for Product Development¹

These trends in the complexity of issues being addressed with simulation result in a massive increase in the volume of simulation data, and this raises a number of critical concerns:

• Methods to gain understanding from the data must be improved

• Business processes to effectively manage the data are generally not in place

• The sheer volume of data may simply overwhelm our capability to process it to extract useful information

Here we lay out and discuss these issues, and make recommendations to better exploit the information generated by simulation. Research for this paper was partially supported by Tecplot.

Simulation Data Management

With the rise of the importance of simulation, the need for simulation data management (SDM) has emerged as a critical priority. Companies are increasingly making product development decisions based on simulation rather than physical testing. Simulation data must be managed better for a number of reasons, including the documentation of engineering decisions, and configuration management and control. In short, simulation is achieving a more strategic role.

Solution providers have responded with a number of applications to meet data management requirements. A CIMdata study² showed that these tools are technically quite adequate for simulation data management and for integration in the PLM environment. However, adoption has been slow.

CIMdata’s research and analysis reveals myriad non-technical issues that face an effective SDM implementation. This paper discusses some of those issues, which are beyond the usual concepts of SDM, and concludes that companies may have to reevaluate their strategies for dealing with simulation data.

¹ Source: http://ops.fhwa.dot.gov/publications/seitsguide/images/image021.jpg
² CPDA Simulation Data and Process Management Scorecard, 2009

Simulation data must be better managed, since it is achieving a strategic role.

Tools are technically quite capable, but non-technical business issues have caused limited adoption.

The Simulation Data Deluge

Simulation has the potential to generate enormous amounts of data. Simulation results files are often very large. Also, the trend is towards optimization and robust design, which may require hundreds or thousands more simulations than are needed for the assessment of a single design.

Indeed, the threat is that simulation data will simply overwhelm us, and become a significant source of waste. The technology trends are not encouraging.

Moore’s Law is commonly used as a measure of growth in computing capability. For technical computing, the hardware capability has been doubling every eighteen months for the last four decades. Disk capacity, described by some as conforming to Kryder’s Law, is doubling every twelve months. The problem, as pointed out by Scott Imlay of Tecplot, is that disk read/write speed doubles only every three years. In other words, our ability to warehouse data is growing 70% faster than our ability to access and process that data!

There are indications that this is already a problem. An IT manager at a large automotive company claimed that over 30% of their simulation data is never accessed (not even once) before it is deleted. Simulations are run, the data is created, and then it is never used. If one includes the data that is accessed only once and not examined in detail, it is not hard to imagine that the waste is in excess of 50%, a staggering number.

Data storage for managed data is not cheap. An issue for SDM is that much of the data may be on local drives, on the individual engineers’ workstations. If that data is to be managed, it needs to be moved to a data center. A CIMdata study in 2009, referenced above, showed that the cost of desktop storage in a corporate setting was about $10 per terabyte per month. However, the cost for data center storage was two orders of magnitude greater, about $1,000 per terabyte per month. Desktop data is typically not backed up. In the data center, storage is provisioned on high-cost, high-availability redundant drives and is also backed up. Backups may be retained for years after the original data is deleted.
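The doubling times quoted above can be turned into annual growth rates to make the gap concrete. The following is a minimal sketch, not part of the original paper; the function and dictionary names are illustrative assumptions. The paper does not say how its 70% figure was derived; one possible reading is the difference between the annual growth rates of disk capacity and disk read/write speed.

```python
def annual_growth_factor(doubling_time_years: float) -> float:
    """Growth factor per year for a quantity that doubles every `doubling_time_years`."""
    return 2.0 ** (1.0 / doubling_time_years)


# Doubling times quoted in the text, in years.
DOUBLING_TIMES = {
    "compute (Moore's Law)": 1.5,
    "disk capacity (Kryder's Law)": 1.0,
    "disk read/write speed": 3.0,
}


def capacity_to_speed_gap(years: float) -> float:
    """Factor by which disk capacity outgrows disk read/write speed after `years`."""
    capacity = annual_growth_factor(DOUBLING_TIMES["disk capacity (Kryder's Law)"]) ** years
    speed = annual_growth_factor(DOUBLING_TIMES["disk read/write speed"]) ** years
    return capacity / speed


if __name__ == "__main__":
    for name, doubling_time in DOUBLING_TIMES.items():
        percent = (annual_growth_factor(doubling_time) - 1.0) * 100.0
        print(f"{name}: about {percent:.0f}% per year")
    for years in (1, 3, 6):
        print(f"after {years} years, capacity/speed gap: {capacity_to_speed_gap(years):.1f}x")
```

Under these assumptions, capacity grows about 100% per year and read/write speed about 26% per year, so the time needed to read a full disk keeps lengthening.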

The requirement for thousands of simulations for optimization and robust design is over and above the growth in simulation enabled by Moore’s Law and these other heuristics. The clear danger is that we will create data warehouses and then never extract the value that is in the data.
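The storage prices and the never-accessed fraction quoted above can be combined into a rough estimate of what unused data costs. This is only a sketch: the two prices and the 30% fraction come from the text, while the 50 TB archive size and all function names are hypothetical.

```python
DESKTOP_USD_PER_TB_MONTH = 10.0        # figure quoted in the text
DATA_CENTER_USD_PER_TB_MONTH = 1000.0  # figure quoted in the text


def monthly_cost_usd(terabytes: float, usd_per_tb_month: float) -> float:
    """Monthly storage cost for a given volume at a given unit price."""
    return terabytes * usd_per_tb_month


def wasted_monthly_cost_usd(terabytes: float, usd_per_tb_month: float,
                            never_accessed_fraction: float) -> float:
    """Share of the monthly cost spent on data that is never accessed."""
    return monthly_cost_usd(terabytes, usd_per_tb_month) * never_accessed_fraction


if __name__ == "__main__":
    archive_tb = 50.0  # hypothetical archive size, not from the paper
    managed = monthly_cost_usd(archive_tb, DATA_CENTER_USD_PER_TB_MONTH)
    wasted = wasted_monthly_cost_usd(archive_tb, DATA_CENTER_USD_PER_TB_MONTH, 0.30)
    print(f"managed storage: ${managed:,.0f} per month, of which ${wasted:,.0f} holds unused data")
```

The estimate ignores backup retention, which the text notes can extend for years, so it is likely a lower bound for managed data.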

The volume of simulation data threatens to be overwhelming.

Our capacity to generate and warehouse data is increasing much faster than our capacity to process it.

A lot of data is never accessed.

Costs for storing managed data are much higher than for unmanaged (desktop) data.

Information and Simulation Data

The reason simulation results files are so large is that they contain three-dimensional field data, often with time as a fourth dimension. These files are usually analyzed in a post-processing step, which involves using graphical tools to manually examine the data. Sometimes, standard summary reports may be automatically generated, but the detailed results are retained to examine interesting aspects or anomalies in the solution.

Simulation files are large, but the useful information is sparse and may be difficult to find.

The data deluge brings forth issues here also. It is often impractical to examine the details of each simulation, so conclusions are often based on gross or integrated quantities. The problem is compounded for optimization and robust design, where a surrogate model (e.g., a response surface) may provide little or no understanding of the physics behind a particular result. The “interesting” information contained in these large files is actually rather sparse. However, the location and nature of interesting results are not known ahead of time, and they may be missed by standard reports and integrated quantities. Also, the interesting results may be revealed only by comparisons across hundreds of simulations, which is something most post-processing tools cannot accomplish.

Information Lifecycle Management

Information Lifecycle Management (ILM) is the practice of applying certain policies to the effective management of information throughout its useful life. It has been used for decades by companies to manage important records, but is rarely, if ever, applied to simulation data. ILM should be a part of the SDM strategy. This may be relatively easy to accomplish, since companies already have ILM implemented for some of their product-related data and documents as well as other important business-related information records.

Again, the data deluge is a factor. It is impractical to keep all simulation data forever, but some data does need to be retained, for example if it documents product decisions or compliance. ILM should be applied to classify data when it is created. In particular, the classification (which may be changed during the lifecycle) will specify the disposition of the data at the end of its active lifecycle. It is extremely difficult, if not impossible, to correctly classify simulation data a significant time after it is created. As we note, the context is important, and the context is not explicitly contained in the data. As time passes, the context is lost or forgotten.
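The idea of classifying data at creation, with the classification determining what happens at the end of the active lifecycle, can be sketched as a small data model. Everything here is an illustrative assumption: the class names, the three classifications, and the retention periods are hypothetical and not taken from the paper or from any ILM product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Disposition(Enum):
    DELETE = "delete"
    ARCHIVE = "archive"


@dataclass(frozen=True)
class Policy:
    retention_days: int
    disposition: Disposition


# Hypothetical classifications and retention periods, for illustration only.
POLICIES = {
    "exploratory": Policy(30, Disposition.DELETE),
    "design_decision": Policy(365 * 5, Disposition.ARCHIVE),
    "compliance": Policy(365 * 10, Disposition.ARCHIVE),
}


@dataclass
class SimulationRecord:
    name: str
    created: date
    classification: str  # assigned when the data is created

    def __post_init__(self) -> None:
        if self.classification not in POLICIES:
            raise ValueError(f"unknown classification: {self.classification}")

    def reclassify(self, new_classification: str) -> None:
        """The classification may change during the lifecycle."""
        if new_classification not in POLICIES:
            raise ValueError(f"unknown classification: {new_classification}")
        self.classification = new_classification

    def end_of_active_life(self) -> date:
        return self.created + timedelta(days=POLICIES[self.classification].retention_days)

    def disposition_due(self, today: date) -> Optional[Disposition]:
        """Return the disposition once the active lifecycle has ended, else None."""
        if today >= self.end_of_active_life():
            return POLICIES[self.classification].disposition
        return None
```

For example, a record created on 1 September 2012 as "exploratory" would be due for deletion a month later, unless it is reclassified as "compliance" before then.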

Existing business processes for Information Lifecycle Management should be applied to simulation data.

Metrics, Metadata, and Context Information

The difficulty of dealing with and managing simulation data lies in the nature of the data itself. Simulation data is unlike other data that may be managed as a strategic asset. Because of this, companies struggle with establishing business processes for SDM.

For example, a CAD model is created, manually and interactively, by a skilled operator over a period of weeks or months. There are strong change control processes, and companies rarely, if ever, delete CAD data for products that have been produced. The size of a CAD model is measured in megabytes. CAE results files, on the other hand, are created in hours or days. They are very large, perhaps tens of gigabytes. They are generally deleted within weeks, and not kept beyond the end of the project. (Engineering reports containing simulation evaluations and design recommendations are kept.) Different processes are required to manage CAE and CAD data.

Simulation is the evaluation of product performance, and it then follows that context is the key to the value of simulation data. It is important to know many things about the simulation that are not easily accessible or are not explicitly contained in the simulation data files themselves. The context information includes metrics such as performance requirements and their assessed values, key results, and also details of the simulation project: the loads, the material, the design bill of materials, etc. This information should be retained as searchable metadata.

It is extremely important that the metadata be automatically gathered, to the greatest possible extent. Users will resist requirements to spend time manually entering more than a minimum amount of metadata. Data provided manually may be incomplete or incorrect. Some SDM solutions are able to parse input and output files to extract metadata such as design variables, materials used, mesh size, loads, and key results. Some simulation applications are able to generate standard summary reports, which can also be catalogued with the metadata.

Simulation data is unlike other managed data. Context is the key to the value of simulation data.

SDM solutions also maintain the parent-child relationship of input and output data. Thus, it is possible to trace the data pedigree throughout the simulation process. In the best implementations this pedigree is maintained even if some intermediate data is deleted. Then, for example, a report remains associated to a CAD bill of materials even if the detailed results files are deleted. In other words, the metadata should be managed as data entities distinct from the files. Some systems store metadata in the file headers, so the process history and pedigree are lost if a file is deleted.
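The distinction drawn above, metadata managed as entities separate from the files so that the pedigree survives deletion, can be illustrated with a toy catalog. This is a sketch under stated assumptions: `Entry`, `Catalog`, and the entry kinds are hypothetical names and do not describe any particular SDM product.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    entry_id: str
    kind: str                      # e.g. "cad_bom", "input_deck", "results", "report"
    parent_id: Optional[str]       # the entry this one was derived from
    metadata: Dict[str, str] = field(default_factory=dict)
    file_path: Optional[str] = None  # set to None once the payload file is deleted


class Catalog:
    """Keeps metadata and parent-child links apart from the payload files."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def add(self, entry: Entry) -> None:
        if entry.parent_id is not None and entry.parent_id not in self._entries:
            raise KeyError(f"unknown parent: {entry.parent_id}")
        self._entries[entry.entry_id] = entry

    def get(self, entry_id: str) -> Entry:
        return self._entries[entry_id]

    def delete_payload(self, entry_id: str) -> None:
        # Only the file reference is dropped here; removing the file itself is
        # left to the caller. The metadata and the parent link stay in place.
        self._entries[entry_id].file_path = None

    def pedigree(self, entry_id: str) -> List[str]:
        """Walk parent links from an entry back to its original source."""
        chain: List[str] = []
        current: Optional[str] = entry_id
        while current is not None:
            chain.append(current)
            current = self._entries[current].parent_id
        return chain
```

With a chain of bill of materials, input deck, results, and report, calling `delete_payload("results")` leaves `pedigree("report")` unchanged, so the report stays associated with the bill of materials.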

Discussion

Given the situation described above, we see two broad areas for improvement. They are business process improvements and innovative ways to deal with the deluge of simulation results data.

Companies struggle to implement SDM because they are coming from an environment where the data was unmanaged. Even though SDM solutions are available, companies now have to establish practices, processes, and procedures to manage the data. SDM projects thus usually violate one of the basic tenets of successful IT project management: Do not change both the tools and the process at the same time. Unfortunately, it is difficult to establish realistic manual processes to manage data in the absence of a data management system. The situation almost becomes one of deploying the tools and then discovering how to use them. Once this fact is recognized, it can be managed.

Business processes must be improved and implemented to deal with simulation data.

Metadata must be automatically gathered or generated.

The data pedigree and the process used to create it must be captured, and retained even after the data is deleted.

A first step is to establish ILM classifications for data, and then lifecycles for those data classifications. User roles also need to be defined. A few simple rules, and roles, will suffice to handle most data.

The second area for improvement is dealing with the deluge of data. This is not simply about post-processing; it is about innovative strategies to extract the maximum value from simulation.

Simulation results files may be many gigabytes in size. Post-processing tools such as VCollab³ are available which allow the user to (manually) specify the interesting results. These results are then saved in a special file, which generally turns out to be only 5 to 10% of the size of the original results file. This special file contains a distillation of the “interesting” parts of the results. It is not a static report.

We have already noted that maybe 50% of simulation data is not effectively utilized. Of the remainder, perhaps only 5 to 10% is interesting! This is an astounding conclusion, that only a few percent of the data is actually relevant. Organizations are drowning in data they do not have the capacity to process or interpret, yet the overwhelming part of that data is simply waste.
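The "few percent" conclusion follows from multiplying the two estimates above. The short calculation below is a sketch that, like the text, treats the size of the distilled file as a proxy for the share of interesting data; the function name is an illustrative assumption.

```python
def relevant_fraction(utilized_fraction: float, interesting_fraction: float) -> float:
    """Share of all simulation data that is both utilized and interesting."""
    return utilized_fraction * interesting_fraction


# About 50% of the data is effectively utilized; 5 to 10% of that is interesting.
low = relevant_fraction(0.5, 0.05)
high = relevant_fraction(0.5, 0.10)

if __name__ == "__main__":
    print(f"relevant share of all data: {low:.1%} to {high:.1%}")
```

Under these estimates the relevant share is roughly 2.5% to 5% of everything that is generated.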

The “Big Data” Problem and Simulation Analytics

Simulation thus faces a “Big Data” problem. There are vast amounts of data for which the relevant information content is very low. Also, the “interesting” aspects of the data can be hard to find and are not known a priori. In a recent Commentary⁴, CIMdata noted the need for Simulation Analytics to deal with this problem. Simulation analytics is the combination of processes and tools that will:

• Enrich the information content and perhaps dramatically reduce the volume of data that needs to be stored and retained.

• Compute CAE-specific metrics.

• Find or enable the user to find “interesting” features across multiple runs and projects.

• Provide the data in a form that can realistically be retained and archived.

Tools that are tailored or specialized for simulation data are required: the tools must handle field data and must also be able to compute simulation-specific metrics and metadata. As noted in Wikipedia, “Analytics is the discovery and communication of meaningful patterns in data. Analytics often favors data visualization to communicate insight. Since analytics can require extensive computation, the algorithms and software used for analytics harness the most current methods in computer science, statistics, and mathematics.”⁵ In other words, applying this definition, analytics is outside the usual notions of post-processing and is beyond the usual paradigm of manually examining the results.

³ VCollab is a tool for 3D data visualization and collaboration. http://www.vcollab.com/
⁴ CIMdata commentary, Avoid Drowning in a Deluge of Simulation Data: The Case for Simulation Analytics, August 2012. http://www.cimdata.com/publications/commentary.html?commentary_ID=93

Simulation analytics provides a means to manage the large volume of simulation data and to extract the relevant information.

We have to deal with the deluge of data, and to establish ways to extract the relevant information.

Our first proposed step is to enrich the information contained in the raw results. One goal is to reduce the file size so the data may be more easily analyzed. Another goal is to compute metrics, metadata, and integrated quantities that will help people understand the data and identify interesting features.

The second proposed step is to provide an interactive, visual environment where many different simulations may be analyzed and compared. The field data must be available, since analysis only of the metadata and integrated quantities is insufficient.

CIMdata is not aware of any single tool that accomplishes all of this vision for simulation analytics, but tools are available that accomplish parts.

There are various approaches to data compression for large files. Some of them are “lossy,” in that the full accuracy of the original data cannot be recovered. The data may be stored with lower precision, or it may be interpolated onto a different computational grid. Some tools are application-specific, for example FEMZIP compresses LS-Dyna files only.

Some simulation applications are able to generate standard reports. Such reports are a big step along the road to obtaining smaller files that are much richer in information. However, the reports are pre-scripted, and relevant information in the solution may be missed. They are useful for quick comparisons of the differences between simulations, but not useful to gain a detailed understanding of the underlying physics. In practice, setting up standard reports for a new project often involves a significant amount of scripting.
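The lower-precision form of lossy compression mentioned above can be illustrated with the Python standard library. This is a minimal sketch, not how FEMZIP or any other tool works: it re-stores a synthetic field variable in single precision instead of double precision and measures what is lost. The variable names and the synthetic data are assumptions.

```python
import math
from array import array


def reduce_precision(values: array) -> array:
    """Re-store double-precision values (typecode 'd') as single precision ('f')."""
    return array("f", values.tolist())


def max_abs_error(original: array, reduced: array) -> float:
    """Largest absolute difference introduced by the precision reduction."""
    return max(abs(a - b) for a, b in zip(original, reduced))


# Synthetic stand-in for one field variable at 100,000 nodes.
pressure = array("d", (math.sin(0.001 * i) for i in range(100_000)))
compact = reduce_precision(pressure)

full_bytes = pressure.itemsize * len(pressure)
compact_bytes = compact.itemsize * len(compact)

if __name__ == "__main__":
    print(f"size: {full_bytes} bytes -> {compact_bytes} bytes")
    print(f"maximum absolute error: {max_abs_error(pressure, compact):.2e}")
```

The storage is halved, and the original double-precision values cannot be recovered exactly, which is what makes the approach lossy. Whether the remaining accuracy is acceptable depends on the quantity and on what it will be used for.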

The file size issue is also important for communication and collaboration. There are emerging standards such as JT and 3D PDF for lightweight representations to visualize design data. These standards have not been adopted for simulation data and are incapable of transmitting detailed simulation information. Tools like VCollab do create simulation-specific lightweight files.

The second part of the issue is a visual platform for simulation analytics. Tecplot Chorus, discussed in the next section, is such a platform.

There are a number of tools available that will orchestrate simulation studies for optimization, design of experiments, or robust design. Examples are Altair’s Hyperstudy, SIMULIA’s Isight, and Phoenix Integration’s ModelCenter. These tools help to screen the results of many simulations based on metadata such as key results and integrated quantities, but they generally do not allow quick access to the underlying simulation field data. Understanding the physics behind the results is still a barrier.

⁵ http://en.wikipedia.org/wiki/Analytics

Tools are available to compress simulation files, and to provide standard reports that are much smaller than the original files.

Standards for lightweight representations to visualize design data are not capable of supporting simulation information.

Tools to orchestrate simulation studies focus on integrated measures, and not on field data.

Tecplot Chorus: A Platform for CFD Analytics

Tecplot Chorus provides an analytics platform for fluids simulation that fulfils much of the vision described here. Chorus provides a framework to examine the flow physics and the details of results across multiple simulations. It also computes and displays metadata such as moments and forces and key independent variables such as the parameters being varied in an optimization study. In a sense, Chorus combines analytics and traditional post-processing enabled across a large number of simulation cases.

Chorus is also able to compute surrogate models in the form of response surfaces. These estimate the functional relationship between a dependent variable and the independent variables. Surrogate models can be used to search for optimum configurations, to interpolate between cases, and to estimate sensitivity. They can also be used in subsequent analyses, such as control system development where it is not practical to include the full CFD simulation.
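To make the idea of a response surface concrete, the sketch below fits a quadratic in one independent variable by least squares and uses it to interpolate and to locate an optimum. It is a generic illustration in plain Python, not a description of how Chorus computes its surrogate models; all function names and the sample data are assumptions.

```python
from typing import Callable, List, Sequence, Tuple


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular, which holds when the sample points
    contain at least three distinct x values.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y ~ c0 + c1*x + c2*x**2 via the normal equations."""
    basis = [[1.0, x, x * x] for x in xs]
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(3)]
    c0, c1, c2 = solve_linear(ata, aty)
    return c0, c1, c2


def make_surrogate(coeffs: Tuple[float, float, float]) -> Callable[[float], float]:
    """Return a cheap function that stands in for the full simulation."""
    c0, c1, c2 = coeffs
    return lambda x: c0 + c1 * x + c2 * x * x


# Synthetic "simulation results": one design variable and one integrated quantity.
design_values = [0.0, 1.0, 2.0, 3.0, 4.0]
responses = [2.0 - 3.0 * x + 0.5 * x * x for x in design_values]

coefficients = fit_quadratic(design_values, responses)
surrogate = make_surrogate(coefficients)
stationary_point = -coefficients[1] / (2.0 * coefficients[2])
```

Here the surrogate can be evaluated between the sampled cases, and `stationary_point` estimates where the response is minimized. A real study would involve several independent variables and noisy data, so the fit would not be exact.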

It is worthwhile to repeat that Chorus is meeting the need to comprehend and interpret simulation results by simultaneously looking at integrated quantities and the field data. The need for such analytics tools is driven by ever more complex simulation scenarios, and the trend towards using simulation to seek optimal, robust designs.

User Interviews and Case Studies

CIMdata interviewed a number of end users, managers, and industry experts about the issues “beyond simulation data management.” All agreed on the problem, though those in smaller groups felt they were better able to personally negotiate resources like storage than those in larger companies.

Managers and industry experts agreed on the waste in storing large results files that were likely never to be further investigated. End users were less likely to see their practices as wasteful and were prepared to keep data “in case” they might need to look at it.

The interviews also included some users of Tecplot Chorus. It is interesting that they all acknowledged using Chorus as a single-user tool, but envisioned it as a collaborative solution to understand and explain simulation results. The users all stated the business case for Chorus as time saved in managing and finding data and in not having to write or customize scripts to compare results across projects. In addition, they quoted time saved and productivity gained in comprehending, interpreting, and communicating results. The users all felt Chorus was a good tool to identify invalid simulations, points that are “out of bed.” At the same time, some pointed to simulations that seemed invalid based on the metadata, but proved to be valid when examining the field data.

Tecplot Chorus is an analytics tool for CFD. It provides integrated quantities, response surfaces, and access to the field data.

Chorus is envisioned by its users as a tool to facilitate communication and collaboration.

These users also stressed the value of the response surfaces (surrogate models) to gain an understanding of the model parameters. This has enabled them to bring more design variables into their studies.

A case study is provided by Mr. Andy Luo, of Swift Engineering. At issue was the design and placement of the rear wing on the Nippon series of open-wheel race cars. The question was the performance of the wing on its own car, and also the effect the wake would have on cars following or attempting to pass. See Figure 2.

Figure 2: Study of Wing Geometry on the Aerodynamics and Wake of an Open-Wheel Race Car

(Courtesy of Tecplot)

As shown in Figure 3, Chorus was used to interpret the data. It was found that some points were not particularly well represented by the response surface. Had only the integrated metadata quantities been available, these points would likely have been disregarded as invalid simulations. Chorus allows immediate access to the field data, and it was found that there were flow instabilities, and that these points were actually valid.

A design was found where the wake’s interference with the aerodynamics of a following car was minimized. This enhanced passing ability, and the competitiveness of the races. This design also was more robust, in that it avoided the flow instabilities.
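The lesson of this case study, that points far from the response surface should be examined rather than discarded, can be expressed as a simple screening step. The sketch below is a generic illustration and not a Chorus feature: it flags cases whose residual against a surrogate is large relative to the median absolute deviation. The function name, the case tuple layout, and the threshold of 3 are assumptions.

```python
from statistics import median
from typing import Callable, List, Sequence, Tuple

Case = Tuple[str, float, float]  # (case id, design variable, observed response)


def flag_for_field_review(cases: Sequence[Case],
                          surrogate: Callable[[float], float],
                          threshold: float = 3.0) -> List[str]:
    """Return ids of cases that the surrogate represents poorly.

    Flagged cases are candidates for inspecting the field data, not for deletion.
    """
    residuals = [(case_id, observed - surrogate(x)) for case_id, x, observed in cases]
    center = median(r for _, r in residuals)
    mad = median(abs(r - center) for _, r in residuals)
    if mad == 0.0:
        # Degenerate spread: flag anything that differs from the typical residual.
        return [case_id for case_id, r in residuals if r != center]
    return [case_id for case_id, r in residuals if abs(r - center) / mad > threshold]
```

For example, with a surrogate of `lambda x: 2.0 * x` and five cases where only the last deviates by 1.5, only that last case is flagged. Whether it is an invalid run or a real effect such as a flow instability can then be decided by looking at its field data.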

Examining only the metadata may cause results to be incorrectly classified as invalid.

Figure 3: Project Analysis with Tecplot Chorus

(Courtesy of Tecplot)

Conclusion

The importance of simulation for product development and manufacturing continues to increase. With the increasing adoption of simulation for optimization and for robust design, the growth rate of simulation data is likely to exceed the growth predicted by measures like Moore’s Law.

The ensuing deluge of simulation data poses significant challenges. Data storage capacity is growing much faster than data access speed. There is a significant risk that companies will create warehouses of data that cannot be effectively analyzed. The danger is that simulation will be inadequately leveraged, and that there will be huge amounts of waste in generating and storing data that is not utilized.

Simulation analytics should be applied to better understand and interpret the results of complex simulations involving variation in operating conditions or design alternatives. Tecplot Chorus is an analytics tool that allows comparisons across many such simulations. It provides quick access to the field data, in addition to metadata such as design variables and integrated metrics.

Tools should be applied to increase the level of information in simulation results files, thereby decreasing file size and better enabling data mining for important features. Companies also need to strengthen their business processes for managing simulation data.

The importance of simulation for product development and manufacturing continues to increase, as does the volume of simulation data. Simulation analytics is key to coping with the glut of data that needs to be analyzed.

About CIMdata

CIMdata, a leading independent worldwide firm, provides strategic management consulting to maximize an enterprise’s ability to design and deliver innovative products and services through the application of Product Lifecycle Management (PLM) solutions. Since its founding nearly thirty years ago, CIMdata has delivered world-class knowledge, expertise, and best-practice methods on PLM solutions. These solutions incorporate both business processes and a wide-ranging set of PLM enabling technologies. CIMdata works with both industrial organizations and providers of technologies and services seeking competitive advantage in the global economy. CIMdata helps industrial organizations establish effective PLM strategies, assists in the identification of requirements and selection of PLM technologies, helps organizations optimize their operational structure and processes to implement solutions, and assists in the deployment of these solutions. For PLM solution providers, CIMdata helps define business and market strategies, delivers worldwide market information and analyses, provides education and support for internal sales and marketing teams, as well as overall support at all stages of business and product programs to make them optimally effective in their markets.

In addition to consulting, CIMdata conducts research, provides PLM-focused subscription services, and produces several commercial publications. The company also provides industry education through PLM certification programs, seminars, and conferences worldwide. CIMdata serves clients around the world from offices in North America, Europe, and Asia-Pacific. To learn more about CIMdata’s services, visit our website at www.CIMdata.com or contact CIMdata at: 3909 Research Park Drive, Ann Arbor, MI 48108, USA. Tel: +1 734.668.9922. Fax: +1 734.668.1957; or at Oogststraat 20, 6004 CV Weert, The Netherlands. Tel: +31 (0) 495.533.666.

This document is copyright 2012 by CIMdata, Inc. and is protected by U.S. and international copyright laws and conventions. This document may not be copied, reproduced, stored in a retrieval system, transmitted in any form, posted on a public or private website or bulletin board, or sublicensed to a third party without the written consent of CIMdata. No copyright may be obscured or removed from the paper. CIMdata® is a Registered Trademark of CIMdata, Inc. All trademarks and registered marks of products and companies referred to in this paper are protected.

This document was developed on the basis of information and sources believed to be reliable. This document is to be used “as is.” CIMdata makes no guarantees or representations regarding, and shall have no liability for the accuracy of, data, subject matter, quality, or timeliness of the content.