
CAE and Design Optimization – Advanced

Contents

Introduction
    About This Series
    About This Book
    Supporting Material
Optimization – Requirements And Approaches
    Basic Definitions
    “Mathematically Demanding” Approaches
    The Need For Alternative Approaches
    Summary
Design Improvement – A Designer’s View
    Concept vs. Existing Design
    Design Of Experiment
    Approximations: Building Better-Behaved Models
    Stochastics
    Optimization
    Summary
Statistics – A Worm’s Eye View
    Dealing With Populations
    Probability
    Experiments
Statistics – A Bird’s Eye View
    DOE
    Variable Screening
    Summary
Statistics – A Designer’s View
    Applications Of Approximate Models
    Types Of Approximate Models
    Optimization
    Reliability
    Summary
HyperStudy - Putting It All Together
    HyperStudy
    Before The Study
    Performing The Study
    Reviewing Results
Glossary And References


Introduction

About This Series

To make the most of this series you should be an engineering student in your third or final year of Mechanical Engineering. You should have access to licenses of HyperWorks, to the Altair website, and to an instructor who can guide you through your chosen projects or assignments.

Each book in this series is completely self-contained. References to other volumes are only for your interest and further reading. You need not be familiar with the Finite Element Method, with 3D Modeling or with Finite Element Modeling. Depending on the volumes you choose to read, however, you do need to be familiar with one or more of the relevant engineering subjects: Design of Machine Elements, Strength of Materials, Kinematics of Machinery, Dynamics of Machinery, Probability and Statistics, Manufacturing Technology and Introduction to Programming. A course on Operations Research or Linear Programming is useful but not essential.

About This Book

This volume introduces techniques to investigate reliability and robustness in engineering design, and provides an alternative interpretation of Design Optimization: the increasingly widely used method of Design Improvement. If you want to study design optimization in the context of concept-design, or New Product Design as it’s sometimes called, you will find the companion volume CAE And Design Optimization – Basics useful. The techniques outlined in this book are usually applied either to improve existing designs or to improve concepts that have been suggested by the techniques outlined for concept-design in the companion volume.

While it’s not essential, a good grasp of the basic principles of statistics and probability will help you tremendously. Several essential aspects are covered in this book, although in a qualitative fashion. You may want to treat the chapter titled Statistics – A Worm’s Eye View as a reference. If you choose to adopt this approach, at least a cursory reading of this chapter is strongly recommended.


The various references cited in the book will probably be most useful after you have worked through your project and are interpreting the results.

Supporting Material

Your instructor will have the Student Projects that accompany these volumes, and you should certainly make use of them. Further reading and references are indicated both in this book and in the Instructor’s Manual. If you find the material interesting, you should also look up the HyperWorks On-line Help System. The Altair website, www.altair.com, is also likely to be of interest to you, both for an insight into the evolving technology and to help you present your project better.

The two volumes of this series that cover manufacturing simulation and multi-body dynamics complement the techniques covered in this book. They should be interesting from an application perspective if your areas of work include non-linear analyses and multi-disciplinary optimization.

All models are wrong, some are useful. George E.P.Box


Optimization – Requirements And Approaches

The earlier volume in this series, CAE and Design Optimization – Basics, introduced the need for engineers to design products optimally, and presented one definition of “optimum” as “the greatest degree or best result obtained or obtainable under specific conditions”.

Basic Definitions

A design problem usually involves a set of resources, some limits on these resources, and one or more performance criteria to achieve with some required degree of precision. There is almost always more than one correct “answer” to a design problem. These answers are feasible in the sense that they provide the required functions and work with the available resources. Some answers are better than others, either because they use fewer resources or because they meet the performance criteria with a higher precision, with a higher reliability, or to a higher degree. The optimum design is one that achieves the function to the best degree using the fewest resources.

While traditional design approaches draw upon expertise or intuition to arrive at one or more feasible solutions, optimization techniques are valuable in design because they turn the problem around, using an inverse formulation. Instead of being given a design to check for feasibility, the computer program is given the specifications and asked to hunt for the best solution. To provide the specifications to the computer program we need to state:

1. The data: things that are given and cannot be changed. For instance, if a component is to be designed to carry a particular load, the designer must take the load as “data” and must not seek to change it (unless, of course, a feasible solution cannot be found). The design space is the region within which the designer must work. This data is captured in the analysis model, which could be a Finite Element model or any other mathematical model.

2. The design variables: things that can be controlled by the designer. For example, the thickness of sheet-steel or the cross-section of a beam.

3. The responses: calculated or derived values that must be tracked to check whether constraints have been violated, or to rank the quality of various design options.

4. The constraints: limits on the resources that must be observed if the design is to be judged feasible. For instance, the permissible stress in the chosen material or ranges of frequency that must be avoided. The constraint could be either on a design variable (maximum available thickness of sheet-steel, for example) or on a response (maximum displacement at a point, for example).

5. The objective: the measure of quality of the design. This may be the center of gravity, or the deflection at points of interest, etc. Usually we seek to minimize the objective. Remember that minimizing an objective is the same as maximizing its negative (or, for an objective that is always positive, its reciprocal). Also remember that the objective is distinct from a constraint. A constraint needs to be satisfied, while the objective must be minimized.
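To make the five ingredients concrete, here is a minimal sketch in Python of a sizing problem stated in exactly these terms. Everything specific in it is an assumption made for illustration: the cantilever beam, the numerical values, the function names and the brute-force search are hypothetical and are not taken from this book or from any particular optimization program.

```python
# --- Data: given, not to be changed (hypothetical values) ---
LOAD_N = 1000.0        # tip load on a cantilever beam, N
LENGTH_MM = 500.0      # beam length, mm
WIDTH_MM = 20.0        # fixed width of the rectangular section, mm
E_MPA = 210_000.0      # Young's modulus, MPa
DENSITY = 7.85e-6      # kg per cubic mm

# --- Constraints: limits a feasible design must respect ---
MAX_STRESS_MPA = 150.0       # constraint on a response
MAX_DEFLECTION_MM = 5.0      # constraint on a response
H_MIN, H_MAX = 5.0, 60.0     # constraint on the design variable


def responses(height_mm):
    """Analysis model: return (mass, stress, deflection) for a section height."""
    inertia = WIDTH_MM * height_mm ** 3 / 12.0
    stress = LOAD_N * LENGTH_MM * (height_mm / 2.0) / inertia
    deflection = LOAD_N * LENGTH_MM ** 3 / (3.0 * E_MPA * inertia)
    mass = DENSITY * WIDTH_MM * height_mm * LENGTH_MM
    return mass, stress, deflection


def is_feasible(height_mm):
    """A design is feasible only if every constraint is satisfied."""
    _, stress, deflection = responses(height_mm)
    return stress <= MAX_STRESS_MPA and deflection <= MAX_DEFLECTION_MM


def optimize(step=0.1):
    """Objective: minimize mass. Search the design variable on a simple grid."""
    best = None
    count = int(round((H_MAX - H_MIN) / step))
    for i in range(count + 1):
        height = H_MIN + i * step          # the design variable
        if not is_feasible(height):
            continue
        mass = responses(height)[0]        # the objective
        if best is None or mass < best[1]:
            best = (height, mass)
    return best


if __name__ == "__main__":
    print(optimize())
```

Note the inverse formulation: the program is never handed a candidate beam to check. It is handed the data, the limits and the objective, and it reports the lightest section height that violates nothing.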

[Figure: an original design and its optimized counterpart.]

Using optimization at the concept-design level itself is an attractive option. This gives us the capability to decide how and where to place material within the design space so as to achieve the objective to the best degree possible while satisfying as many constraints as possible without drawing on intuition or experience. There are various ways to use these powerful tools. Topology optimization, topography optimization, size optimization and shape optimization methods are used either singly or in combination to achieve remarkable economy and elegance of designs. Their extensions to discrete optimization make them an indispensable tool for applications such as design of composites.

“Mathematically Demanding” Approaches

The four methods of optimization named above will not do if the mathematics involved is ill-behaved.

A well-behaved function is one that is both predictable and tractable. That is, we can both understand its behavior and evaluate the behavior at all points in the domain. One measure that affects our ability to understand the behavior of a function is its continuity. As you will recall from your introductory course on Calculus, the continuity of a function is closely related to its differentiability – that is, whether or not the gradient of the function can be evaluated. If the function is differentiable we can use gradient search methods to locate the optimum solution. Another useful measure of the tractability is the number of turning points in the function: a function with just one minimum gives us the comfort that we will find the minimum regardless of where we start from. If the curve / surface has multiple turning points, though, we may find ourselves trapped in a local minimum, unable to reach the global minimum using gradient search methods.
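The trap described above is easy to reproduce. The sketch below is an illustration only: the quartic function, the step size and the starting points are assumptions chosen because the function has two minima of different depth, and the plain fixed-step gradient descent is a stand-in for gradient search methods in general.

```python
def f(x):
    """A differentiable function with two minima of different depth."""
    return x ** 4 - 3.0 * x ** 2 + x


def df(x):
    """Analytical gradient of f."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def gradient_descent(x0, step=0.01, iterations=2000):
    """Follow the negative gradient from x0 using a fixed step size."""
    x = x0
    for _ in range(iterations):
        x -= step * df(x)
    return x


if __name__ == "__main__":
    for start in (-2.0, 2.0):
        x_end = gradient_descent(start)
        print(f"start {start:+.1f} -> x = {x_end:+.4f}, f(x) = {f(x_end):+.4f}")
```

Started from the left, the search settles in the deeper minimum. Started from the right, it settles in the shallower one and stays there, because the gradient at that point is zero and gives no hint that a better solution exists elsewhere.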

The Need For Alternative Approaches

Unfortunately, mechanics is not always accommodating. Simple and essential aspects such as contact or friction mean that the mathematics is pushed into areas where the required degree of decorum cannot be counted on. We can, of course, build models that neglect these aspects. In this approach, used quite frequently, we linearize the problem. Often, a linear approximation serves very well, particularly since engineers use factors of safety to account for the approximation of the model. In many cases, even if linearization is recognized as inaccurate, we use it as a starting point to arrive at a conceptual design. Then we use more accurate models to verify and improve this concept.

There are times, however, when linearization is simply not possible. Several factors force us to step beyond linearization and look for advanced methods of design, analysis and optimization.


Multi-Disciplinary Optimization

Almost any problem in the physical world involves multiple effects. Quite often each of these is characterized by a different equation. For instance, we use Navier-Stokes Equations to describe the flow of strain energy in solids and fluids, but must resort to Maxwell’s Equations to model the flow of electromagnetic energy in the same solids or fluids. We simply do not have a single equation that adequately predicts the flow of both types of energy.

In some cases, we can decompose the effects because they are orthogonal and treat them independently. In the world of CAE, this means that we will use a different solver for each effect. In other cases, we cannot afford to decompose the effects because they are closely linked. In this case, we try to derive equations that work adequately to model all effects in a single model. Almost invariably, these equations are so complex that they require enormous computing resources to evaluate.

An excellent example of this is Fluid-Structure-Interaction (FSI). To compute the inflation characteristics of a commonly used safety device, the airbag used in cars[1], FSI must be used. The gas within the airbag is a fluid, described by an equation of state and following the gas laws. The bag itself is a structure (of fabric). It is impossible to analyze the two effects independently and still get meaningful results!

Multi-Objective Optimization

In some cases the quality of the design cannot be reduced to a single objective. There are multiple measures of product quality and it’s not obvious which solution is the best.

[1] The picture shown is from the Research Quarterly, Summer 2003, of the Los Alamos National Laboratory.


Consider the dilemma a car designer faces. The task may be to design a vehicle that provides a mileage of at least 20 km / liter and satisfies the safety statutes. Can we treat either of these functional requirements as a design constraint? Yes, that’s certainly possible, but it’s not the best way out. Satisfying the constraints means you will come up with a design that just satisfies both the mileage requirement and the safety statute. It does not mean you will achieve the best possible design. Your competition may come up with a design that has an even better mileage or is even safer.

Clearly, treating the requirements as constraints will not take us to the optimum because of the difference between a constraint and an objective. A constraint must be satisfied, while an objective must be minimized. If, as in the case of the vehicle, there are two or more responses to be minimized, this is called Multi-Objective-Optimization (MOO). Very often the objectives are opposing. In the car, an increase in safety by adding stiffness and weight usually results in a fall in fuel economy. A design in which no objective can be improved without worsening at least one other objective is said to be Pareto Optimal. The collection of all such designs is called a Pareto Frontier.

Quite often, in a design review, someone will ask “What if we were to increase this variable?” or “Can we gain here if we are willing to give up something else?” If design iterations start only after such questions are asked, development-time overruns may be the result. Pareto frontiers help answer such questions quantitatively. The designer’s task is to deliver the Pareto Frontier to the other members of the product team, so that an educated decision can be made on what to trade off.

There may be several designs that exceed both the fuel-economy target and the statutory safety requirements. Deciding which of these is “the” best calls for an evaluation that lies outside the design engineer’s realm. Most often, marketing strategies decide this. Choosing a solution invariably involves a trade-off to arrive at the best compromise.
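A Pareto Frontier can be extracted from a list of candidate designs with a few lines of code. The sketch below is illustrative only: the candidate designs and their numbers are invented, both objectives are expressed so that smaller is better (fuel consumption instead of mileage, and a hypothetical injury-risk index for safety), and the function names are assumptions.

```python
def dominates(a, b):
    """True if design a is no worse than b in every objective and strictly
    better in at least one. All objectives are to be minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(designs):
    """Keep only the designs that no other design dominates."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs)]


# Hypothetical candidates: (fuel consumption in L/100 km, injury-risk index)
CANDIDATES = [
    (5.0, 0.90),
    (5.5, 0.60),
    (6.0, 0.50),
    (6.0, 0.70),   # dominated by (5.5, 0.60)
    (7.0, 0.45),
    (7.5, 0.50),   # dominated by (6.0, 0.50)
]

if __name__ == "__main__":
    for design in pareto_front(CANDIDATES):
        print(design)
```

Every design that survives the filter is a legitimate answer. Moving from one to another always buys safety at the cost of fuel or the reverse, and that trade is precisely the decision the product team has to make.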


As an aside, engineers who tend to underestimate the importance of marketing would do well to remember Akin’s Laws of Spacecraft Design[2], one of which points out that “A bad design with a good presentation is doomed eventually. A good design with a bad presentation is doomed immediately.”

Non-Linear Analyses

Problems that involve large deformations, large strains, plastic flow, contact, radiative heat transfer, etc. are characterized by non-linear differential equations. Metal-forming is one such application. Most general non-linear differential equations do not even possess proofs of existence of a solution. These are challenging problems even to solve, let alone optimize using the inverse approach described earlier for concept-design (that is, specify the criteria and let the algorithm locate the best design). We may not be sure that an optimal solution exists (that is, we may not be sure that there is a global minimum), but would we be foolish to give up without a search? In cases like these it makes sense to understand the lay of the land first.

Design Space Exploration

There are several fields of engineering where mechanics is not yet up to the task. Many real-world complexities are not adequately understood, much less captured satisfactorily in mathematical models. Like an explorer in uncharted territory, we may need to explore the design space to understand how the various variables are linked, whether there are any combinations that lead to pathologically bad designs, etc. In many cases it is a foolish waste of resources to start searching for an optimum without first checking that an optimum even exists. Design Space Exploration is the first step towards this. Once the results have been reviewed, the search for an improved design can be embarked upon with a better estimate of the time and cost of finding such a solution.
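In its simplest form, exploring the design space means evaluating the model at every combination of a few chosen levels of each variable and then looking at what comes back. The sketch below assumes a hypothetical two-variable model: the `simulate` function is a made-up stand-in for an expensive analysis, and the levels and the stress limit are arbitrary illustration values.

```python
from itertools import product

THICKNESS_LEVELS = [1.0, 2.0, 3.0, 4.0]      # hypothetical design variable 1
RADIUS_LEVELS = [1.0, 2.0, 3.0, 4.0, 5.0]    # hypothetical design variable 2
STRESS_LIMIT = 200.0                         # above this a design is "bad"


def simulate(thickness, radius):
    """Made-up stand-in for an expensive analysis; returns a peak stress.
    The product term makes the two variables interact."""
    return 240.0 / thickness + 60.0 / radius + 8.0 * thickness * radius


def explore():
    """Run every combination of levels and collect the responses."""
    return {(t, r): simulate(t, r)
            for t, r in product(THICKNESS_LEVELS, RADIUS_LEVELS)}


def summarize(results):
    """Return the best combination found and the list of bad combinations."""
    best = min(results.items(), key=lambda item: item[1])
    bad = [combo for combo, stress in results.items() if stress > STRESS_LIMIT]
    return best, bad


if __name__ == "__main__":
    runs = explore()
    best, bad = summarize(runs)
    print("runs:", len(runs))
    print("best combination:", best)
    print("combinations above the limit:", bad)
```

The purpose of such a sweep is not to declare a winner but to see the lay of the land: which regions are hopeless, which variables interact, and whether a search for an optimum is worth starting at all.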

[2] Written by Dave Akin. See http://spacecraft.ssl.umd.edu/ for the complete list.


We can therefore use a numerical procedure to explore behavior, rather than to find a “right” answer. This is very well illustrated by the following abstract[3], which is drawn from the field of bio-mechanics. Since the language is outside usual engineering expertise, areas of particular importance to us have been emphasized:

“The use of a mesial-occlusal-distal (MOD) restoration in repairing a large carious lesion depends on many factors. Biomechanical performance is one of the most important. It has been recognized that resistance to restoration failure is not solely a biological concern (e.g. toxicity), but that the cavity shape, dimensions, and the state of stress must all be taken into account. In the present study, a newly developed auto-mesh program was used to generate 30 three-dimensional (3D) finite element (FE) models simulating the biomechanics for multiple factorial design of the MOD gold restoration in a maxillary second premolar. Stress levels were related to individual design factors (e.g. pulpal wall depth [P], isthmus width [W] and interaxial thickness [T]) and to their interactions under the worst physiological scenario: a concentrated bite force acting on lingual cusp with debonded interfaces between cavity walls and restorations. The results showed that enlarging the volume of the MOD cavity significantly increased stresses in enamel but did not intentionally affect stresses in dentin. The alternation of individual design parameters significantly changed the peak stresses (P < 0.05). For all three parameters, except for the width, the peak stress increased as the cavity dimension increased. Stress elevation rate (termed as 'volumetric stress rate' – stress elevation by increasing one unit volume of the restored materials) was different among three design factors. Depth was the most critical factor governing the stress elevation in enamel (1.76 MPa/mm³) while length (interaxial thickness) was the most important parameter in dentin (0.49 MPa/mm³). Width was the least compromising factor to the remaining tooth, 0.32 MPa/mm³ for enamel and −0.23 MPa/mm³ for dentin. The findings, at its core, did not fully agree with the traditional concept that the preservation of tooth substances will reduce risk of tooth fracture. This study leaves open possibility for the structural optimization of the MOD restoration”

Robust Design

Webster’s dictionary defines a Robust System as one that has “demonstrated an ability to recover gracefully from the whole range of exceptional inputs and situations in a given environment. One step below bulletproof. Carries the additional connotation of elegance in addition to just careful attention to detail.” Now recall the definition we used at the beginning of this chapter for “optimum”: “the greatest degree or best result obtained or obtainable under specific conditions”.

[3] From “Multifactorial analysis of an MOD restored human premolar using auto-mesh finite element approach”, Lin CL, Chang CH, Ko CC, J Oral Rehabil. 2001 Jun;28(6):576-85.


The last two words in that definition have often exercised engineers’ ingenuity. Since “all measurements are subject to variation”[4], to what extent can we rely on the existence of “specific” conditions? If the conditions that are used to arrive at the optimum design themselves are likely to vary, then how certain can we be that the design is indeed optimal? How will the design respond to the “exceptional inputs” that Webster mentions?

Consider the graph, which plots the response vs. a design variable. Let’s assume the objective is to minimize the response. That is, the best value for the design variable is the one that gives the least response. Variations in the design variable (x) result in variations in the response (z). In the figure, possible variations in x are shown at two points: at the optimum point and at the “robust” point.

[4] The observation is credited to W. E. Deming, the importance of whose work on Quality cannot be overestimated.

As shown in the figure, given the same range of variation in x, the spread in z at the optimal point exceeds the spread in z at the robust point. This means that the non-optimum-but-robust design is actually better than the optimal design! The optimal design is better only if variations in the input can be controlled. A conscientious designer should specify the maximum permitted variation within which the optimum remains optimal. Else, the claim to be an optimum design is clearly dubious.

Further, as we saw, mathematically demanding techniques to find optimal solutions make several assumptions: that the functions are all continuous, that a global minimum exists, and so on. While all models are incorrect to some extent, can we quantify the error in the approximation inherent in these assumptions?
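The comparison between the optimum point and the robust point can be reproduced numerically. The sketch below is an illustration under stated assumptions: the response curve is invented (a narrow, deep dip near x = 1 and a broad, shallower dip near x = 3), and the tolerance of ±0.2 on x is arbitrary. It is not the curve from the figure.

```python
import math


def response(x):
    """Invented response z(x): a sharp, deep minimum at x = 1 and a
    broad, slightly shallower minimum at x = 3."""
    sharp = 1.0 * math.exp(-((x - 1.0) / 0.1) ** 2)
    broad = 0.8 * math.exp(-((x - 3.0) / 1.0) ** 2)
    return 2.0 - sharp - broad


def response_range(x_nominal, tolerance, samples=201):
    """Smallest and largest z when x varies within x_nominal +/- tolerance."""
    values = []
    for i in range(samples):
        x = x_nominal - tolerance + 2.0 * tolerance * i / (samples - 1)
        values.append(response(x))
    return min(values), max(values)


if __name__ == "__main__":
    for label, x_nominal in (("optimum", 1.0), ("robust", 3.0)):
        low, high = response_range(x_nominal, 0.2)
        print(f"{label}: nominal z = {response(x_nominal):.3f}, "
              f"spread = {high - low:.3f}, worst case = {high:.3f}")
```

At its nominal value the optimum point has the lower response, but once x is allowed to wander by the same amount at both points, its spread and its worst case are both much larger than those of the robust point.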

Reliability

Reliability is usually defined as the probability of a measure lying within a specific range. Almost any product will fail sooner or later. Designs that rely on large factors of safety are often called conservative. That is, they are safe, but perhaps excessively so. Can they be made less safe without any reduction in salability?

The consumer-goods industry in particular has strong motivations to think this way. Most people would prefer a fragile-but-light cell-phone to one that is stronger-but-heavier because it’s over-designed by liberal use of safety factors. Not only is product failure not as dangerous as an air-crash, the products themselves are often designed to have a short life.

HP is said to have spent over US$125,000,000 to design lighter, and cheaper, printers. In an effort to convince his designers, “manager Tom Alexander finally grabbed an HP printer and set it on the conference room floor. Then he stood on it, all 200 pounds of him. The point behind his grandstanding? Customers aren't going to use printers as step stools, so don't add costs by building them strong enough to withstand the weight of a grown man.”[5]

Why is this important for our goal of optimizing? We need to recognize that if an event or condition is probable rather than dead certain, then we can quantify the probability of failure. In the example of the HP printers, it is improbable but definitely possible that a customer may pile something on a printer that is even heavier than the intrepid manager. If the probability of this is known, and if from this the probability of failure can be estimated, then the business manager can plan the warranty cost accordingly.

[5] From an article in Fortune, February 2003.


In other words, we need to account for the fact that in the real world, many events are better modeled as probable than certain, and we need to evaluate our designs in the light of these probabilities. These evaluations are measures of the reliability of the design. An infinitely reliable design (one that will never fail) is often much more expensive than one that is designed to be 99% reliable (i.e. one that is designed such that, on average, one out of every 100 will fail).

Consider the example of the “Spirit”, the Mars Rover. The exorbitant cost of launching a space probe to Mars[6] meant that the design goal was a life of 90 days only (a Martian day, called a sol, is 39 minutes and 35 seconds longer than an earth-day). The probe celebrated its 1000th day of active service on the 26th of October, 2006. Whether this was due to a happy combination of improbable conditions or an excellent design is a matter of conjecture!
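The chain of reasoning from a probable load to a probability of failure to a warranty cost can be sketched with a small Monte Carlo experiment. All numbers below are invented for illustration: the load a customer places on the product is assumed to follow a normal distribution, the capacity is assumed to be a fixed value, and the function names are hypothetical. The closed-form value is included only as a cross-check on the sampling.

```python
import math
import random


def estimate_failure_probability(capacity_kg, mean_load_kg, sd_load_kg,
                                 trials=200_000, seed=0):
    """Monte Carlo estimate of P(load > capacity) for a normally
    distributed load. Negative samples are physically meaningless but
    can never count as failures, so they are simply left in."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        if rng.gauss(mean_load_kg, sd_load_kg) > capacity_kg:
            failures += 1
    return failures / trials


def exact_failure_probability(capacity_kg, mean_load_kg, sd_load_kg):
    """Closed-form tail probability of the same normal distribution."""
    z = (capacity_kg - mean_load_kg) / sd_load_kg
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def expected_warranty_cost(p_fail, units_sold, cost_per_claim):
    """Expected total cost if every failure leads to one claim."""
    return p_fail * units_sold * cost_per_claim


if __name__ == "__main__":
    p_mc = estimate_failure_probability(90.0, 30.0, 20.0)
    p_exact = exact_failure_probability(90.0, 30.0, 20.0)
    print(f"estimated P(fail) = {p_mc:.5f}, exact = {p_exact:.5f}")
    print(f"reliability = {1.0 - p_mc:.5f}")
    print(f"expected warranty cost = {expected_warranty_cost(p_mc, 100_000, 50.0):.0f}")
```

The design choice worth noticing is that nothing here asks whether the product can fail. It asks how often, and what that frequency costs, which is the information a business manager needs to choose between a stronger design and a larger warranty budget.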

Summary

“Business for consumer electronics makers hasn't looked this good in two decades, with revenue rising well over 10 percent in major world markets, yet gadget makers are still turning in meager profits. Analysts said the industry has landed in a virtuous cycle where higher volumes are needed for better economies of scale and lower costs, which lead to more competitive prices. That drives consumer demand, but also causes oversupply, which leads to low margins.” From a Reuters article by Lucas van Grinsven January 8, 2007

The techniques that we have branded mathematically demanding are useful, but the cases described above clearly require alternative approaches. In several cases, we prefer the term Design Improvement to Design Optimization. This is sensible because while we cannot be certain that a particular design represents an optimum, we can be certain whether or not it represents an improvement over the previous design. We can, in fact, use the phrase design improvement interchangeably with design optimization.

[6] Mars probes have been marked by a failure rate that is high by terrestrial standards, but remarkably low given the lack of information and control. The cost of failure, unfortunately, is almost always enormous.


Finally, do not take the use of the phrase “mathematically demanding” to mean that the techniques that will be presented in this book are not rigorous. It’s just that they follow a different set of mathematical rules, to which we will devote three chapters!

Robust Design: Not just strong. Flexible! Idiot Proof! Simple! Efficient! A … high level performance despite … a wide range of changing client and manufacturing conditions. Genichi Taguchi


Design Improvement – A Designer’s View

Since the mathematics involved can be quite demanding, it’s useful to start with a qualitative understanding of the approaches that help us design in the face of the challenges outlined in the earlier chapter. You may find it useful to re-read this chapter after having covered the mathematics.

Concept vs. Existing Design

One of the advantages of using methods like topology optimization is that they can make the concept design stage come alive with the non-intuitive yet functional elegance of the solutions. However, not all designers have the freedom to work with fresh concepts. In many cases, existing designs are tweaked or nudged to better states. This is particularly true in the case of “one-off” designs like spacecraft, where the fear of failure inhibits the use of radical innovations. We can summarize the relevance of design-improvement methods using the following table:

Challenge                   Applicability of Basic Tools                                     Advanced Tools Required?
Concept Design              Very useful provided the mathematical models are well behaved    Not essential
Existing Designs            Very useful provided the mathematical models are well behaved    Useful, not essential
Design Space Exploration    None                                                             Design of Experiment
Non-linearity               Linearization provides an initial starting point                 Approximations, Optimization
MDO                         Decomposition may be applicable                                  Approximations, Optimization
MOO                         None                                                             Experiments, Approximations, Optimization
Robustness                  None                                                             Stochastic Studies
Reliability                 None                                                             Stochastic Studies


The rest of this chapter introduces the Advanced Tools listed above: Design of Experiments, Approximations, Optimization and Stochastic Studies.

Design Of Experiment

The current definition of the scientific approach is often traced back to Francis Bacon’s procedure “which by slow and faithful toil gathers information from things and brings it into understanding”. The use of the adjective “scientific” is taken to mean that there is a verifiable basis for any assumptions of behavior. Today’s step-by-step approach to understanding phenomena is:

1. Put forward a hypothesis
2. Analyze the data available
3. Conclude by rejecting or accepting the hypothesis

This approach is well recognized and widely applied to Quality Control, where engineers try to determine settings for the manufacturing process that will result in an acceptable quality. Most manufacturing processes are very complex and there is often little understanding of cause-and-effect. Despite this, the analysis of gathered data allows engineers to fine-tune processes.

More recently, this approach has been applied to computer simulations. There is one significant difference between computer simulations and experiments in the real world. In the latter, repeating an experiment even on the same subject invariably leads to at least a small difference: the galvanometer records a slightly different value, the subject of the interview reacts differently, etc. This inherent noise is absent from computer models[7]. Running the same analysis again on the same computer will give the same answer.

The next sections will cover the application of Design Of Experiments[8] (DOE) to computer simulation, and show why it has been adopted so successfully. The use of DOE is so widespread today that we often talk of “the result of the DOE” rather than the more correct “result of the experiment”!

Computers models too can be susceptible to noise. We neglect it in this discussion, but it is a topic of active research. 8 This method is usually traced back to Ronald Fisher, for whom the F-distribution, which we will encounter when we discuss ANOVA, is named. 16


Approximations: Building Better-Behaved Models

Any model is an approximation of physical behavior. There are many reasons we have to accept this limitation. Mathematics today does not have entirely satisfactory ways to deal with several classes of numbers, equations are based on observations which cannot be proved to be universal, measurements have limited accuracy, and so on.

The introduction of Calculus by Newton and Leibniz led to an explosion of equations to predict the rate of change of one variable given the rate of change in others. These differential equations were quickly applied to physics and mechanics. Unfortunately, several of these equations were not solvable. An equation could be recognized as comprehensive, but in the absence of an effective way to solve it, engineers benefited little. The introduction of series methods and the subsequent use of computers have made a dramatic difference, and continue to do so even today. Unfortunately, even today, several equations are so intractable that they cannot be solved effectively at an acceptable cost or in an acceptable time.

Why is this relevant to us? Remember that numerical methods like the Finite Element Method have been tremendously successful and have, in several applications, promoted the use of highly non-linear models. Now recall from our earlier discussion that the “ill behavior” of some mathematical models prevents us from using gradient-based optimization methods. This is a bit of an impasse.

We recognize that the models are acceptable in terms of simulating behavior. They do a good job of linking independent variables and dependent variables. That is, given a scenario described by the independent variables, we can use these models to calculate responses of interest. The power and utility of these models are clear. But we cannot use these models in optimization because they are not well-behaved enough. Either gradients don’t exist, or the model is so demanding that we cannot afford to evaluate it at enough configurations to search for the optimum.


One way out of this is based on the following steps⁹:

1. Use the mathematical model (such as an FE model) to calculate responses at selected points. DOE is used to reduce the number of these evaluation points as much as possible.
2. Use these evaluated responses to generate an approximate equation that is mathematically well-behaved. At a minimum, it should be quick to evaluate at any point of interest.
3. Perform further investigations using this approximate function. Since it’s easy to evaluate, various methods can be used to get at the improved design. These range from gradient-based methods to Monte Carlo methods, as will be detailed later.
4. Since the approximation is just that – approximate – use statistical measures to quantify the reliability of the approximation.
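The four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `expensive_model` is a hypothetical stand-in for a costly solver run, five evenly spaced points stand in for a proper DOE, and a quadratic is assumed as the approximate equation. None of these choices comes from the text.

```python
import math


def expensive_model(x):
    """Hypothetical stand-in for a costly solver run (e.g. an FE analysis)."""
    return (x - 2.0) ** 2 + 0.3 * math.sin(5.0 * x)


def solve(a, b):
    """Solve the small linear system a.c = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):
        c[i] = (m[i][n] - sum(m[i][k] * c[k] for k in range(i + 1, n))) / m[i][i]
    return c


def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x^2 via the normal equations."""
    basis = [[1.0, x, x * x] for x in xs]
    ata = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    aty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(3)]
    return solve(ata, aty)


# Step 1: evaluate the expensive model at a few chosen points.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [expensive_model(x) for x in xs]

# Step 2: build the cheap, well-behaved approximation.
c0, c1, c2 = fit_quadratic(xs, ys)


def approx(x):
    return c0 + c1 * x + c2 * x * x


# Step 3: search on the approximation; a quadratic's minimum is known in closed form.
x_best = -c1 / (2.0 * c2)

# Step 4: quantify the fit with the coefficient of determination R^2.
mean_y = sum(ys) / len(ys)
ss_res = sum((y - approx(x)) ** 2 for x, y in zip(xs, ys))
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1.0 - ss_res / ss_tot
print(f"x_best = {x_best:.3f}, R^2 = {r_squared:.3f}")
```

The search in step 3 never calls `expensive_model` again; only the five evaluations in step 1 carry the cost of the original model.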

Stochastics

Most problems in engineering textbooks are very well defined. Since the goal is usually to acquaint the student with the principles or theories being taught, complexities are usually ignored. Problems in the real world are rarely so accommodating. New product designers have to live with constant changes in specifications, giving rise to the joke that designing to a specification is like walking on water: easy to do if it’s frozen.

But even if specifications are removed from the list of things that can change, the engineer in the “real” world is presented with a huge list of items that cannot be treated as the absolute truth. A spread of 10% is often considered to be sufficiently accurate for engineering purposes.

Engineers have long recognized and dealt with several forms of variation or uncertainty. CAD users will be familiar with the stack-up analyses used to determine tolerances that should be assigned to nominal dimensions. These are techniques that designers employ to deal with uncertainties in manufacturing-related aspects.

⁹ Readers of the volume on CAE And Design Optimization – Basics will recognize that OptiStruct also uses approximate models.


Stress Analysts use factors of safety to deal with uncertainty in loads, material properties or environmental conditions. The magnitude of the factor-of-safety depends on the probable variations in conditions and on the cost of failure. Another approach is to construct analytical methods such as Random Response Methods, widely used in vibration studies. Sometimes, these approaches lead to an unnecessarily conservative, and therefore expensive, design.

From a design engineer’s perspective, then, rather than calculating the effect of a clearly defined condition (i.e. something that is deterministic) you need to work with a set of conditions that is known to vary. Some variations may be entirely random: you may know little of the event, except that it is possible. Earthquakes are an excellent example of this. No one, today, can predict when an earthquake will occur. Nor can they predict what the magnitude of the earthquake will be.

Many items in Mechanical Engineering are stochastic: that is, they are random, but in a predictable fashion¹⁰. You cannot say for certain, but you can conjecture, or guess, at when the event will occur, and with what severity. The word “stochastic” is relatively easy to understand, particularly if you look at its synonyms: hypothetical, assumed, indeterminate, postulated, speculative. The last synonym is excellent: a stochastic method is one that is speculative.

Usually, stochastic processes are treated as distinct from (and in some sense the “opposite” of) deterministic processes, which are defined by a set of equations that specify exactly how the process behaves. Examples of stochastic processes range from currency exchange rates to the spread of epidemics. What is common to all of these is that we cannot say for sure how the value will evolve, but can say what value it will probably take.

¹⁰ All random events can be modeled using stochastic approaches provided there is a large volume of data.


As design engineers, we need a way to simulate possible variations in conditions and study the impact of these variations on measures of design quality. Factors of safety have traditionally been used to compensate for such variations. One effect they have is to convert a stochastic problem to a deterministic one, but at a cost. If you don’t want to pay this cost, you’re going to have to find a way to accept the stochastic nature of variables that affect the performance of your product. It’s important to remember that this does not mean you should neglect factors of safety. Failure prevention is an essential part of design.

A stochastic treatment is particularly useful when we want to quantify the effects of variations on the design. We can either vary the controllable parameters, or investigate the effects of variations of uncontrollable parameters. In other words, either we are already aware that it won’t fail and we want to see how much we can tweak it towards better performance, or we are willing to tolerate some failure in return for better performance.
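To make the contrast with a factor of safety concrete, here is a minimal sketch that treats both the applied stress and the material strength as random and counts how often the stress exceeds the strength. The numbers are invented for illustration (a nominal factor of safety of 400/300), and both quantities are assumed to be independent and normally distributed, which the text does not state.

```python
import random
from statistics import NormalDist


def failure_probability(n_trials, seed=0):
    """Estimate P(stress > strength) by repeated random sampling."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_trials):
        stress = rng.gauss(300.0, 30.0)    # assumed stress: mean 300, std dev 30
        strength = rng.gauss(400.0, 40.0)  # assumed strength: mean 400, std dev 40
        if stress > strength:
            failures += 1
    return failures / n_trials


estimate = failure_probability(100_000)

# For two independent normal variables the margin (strength - stress) is also
# normal, with mean 100 and std dev sqrt(30^2 + 40^2) = 50, so an exact value
# is available to check the simulation against.
exact = NormalDist(mu=100.0, sigma=50.0).cdf(0.0)
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

The deterministic view says only that the design is "safe" with a margin of one third; the stochastic view attaches a probability of failure to the same design.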

Optimization

DOE, Approximate Models and Stochastic Models are all ways to investigate the behavior of our Design Space. We can use these investigations to improve our design in several ways.

First, if the original model is not mathematically well-behaved, we can apply gradient-based optimization tools to the approximate models to search for an optimum. The methods outlined in this book can also deal with the presence of multiple local minima.

Next, we can combine stochastic studies with optimization methods to perform Reliability Based Design Optimization (RBDO). In this approach, we include the uncertainty in data, in responses and in constraints in the investigations, and draw our conclusions based on these.

We can also perform trade-off studies to identify Pareto Optimal designs and construct Pareto Frontiers.

In cases of MDO, we may also need to work with multiple solvers. The variables of interest may have different behaviors in each “physics” and we need a method to search for the solution that is the best across all these multiple disciplines.

Summary

To optimize design problems that demand the use of complex mathematical models, or multiple disciplines, or that have multiple objectives, then, our requirements of computer software are:

• Support for various DOE approaches
• The ability to build and evaluate Approximate Models
• Capabilities to include and interpret Stochastic effects
• Capabilities to search for global minima, optionally using approximate models
• The ability to build models for and extract results from multiple solvers

After all, facts are facts, and although we may quote one to another with a chuckle the words of the Wise Statesman, “Lies - damn lies - and statistics,” still there are some easy figures the simplest must understand, and the astutest cannot wriggle out of. Lord Courtney


Statistics – A Worm’s Eye View

Henri Poincaré is said to have described mathematics as the art of giving the same name to different things¹¹. A cursory study of statistics can reinforce that impression. However, in order to effectively deploy the powerful methods described in the earlier chapters, a good grasp of some statistical techniques and their applications to CAE and Design Optimization is indispensable.

This chapter lists terms that we will encounter frequently in our usage of CAE for advanced design optimization. Most are defined without any preamble or context. These will be provided in the next two chapters.

Dealing With Populations

The word statistics is believed to have first been used in German, where the word Staat refers to a body of men. It’s natural, therefore, to start our revision of basic statistics with the definition of the term population. In our context, we take the word to describe all the variables (and their ranges) that form our study. Populations can be further classified.

Discrete vs. Continuous

If the members of the population are distinct, we treat them as discrete entities. An example would be the collection of all prime numbers. While the set is infinite in the sense that there is no “last” prime number, each prime is a discrete member of the set.

In some cases the data we are dealing with is continuous in the mathematical sense of the term. For instance if we choose to deal with the set of all possible temperatures, the set is truly continuous. In several cases even if the population is truly discrete, we prefer to model it as continuous for ease of manipulation.

¹¹ Quoted by E. T. Bell in “Men of Mathematics”

A good example of this would be the population of a small village vs. the population of a large country. Since the former is likely to be a few hundreds at the most, data can be easily manipulated by treating each inhabitant as a distinct member of the village. In the latter case, where the population lies in the hundreds of millions, it is much easier to treat it as a continuous set. In this case, all the theorems of differential and integral calculus can be applied.

Random Variable

We use an algebraic symbol to represent any member of the population. This symbol is called a variable. It can take on the characteristics of any member of the population, just as a variable in an algebraic equation can take on any value in the range / domain. If the variable is not predisposed to take on any particular values, then we call it a random variable, often abbreviated to RV. RVs are sometimes segregated into discrete random variables and continuous random variables, depending on whether the population is being treated as discrete or continuous.

There is little point in working with variables that are not random, just as there is little point in playing a game where the die is loaded: the outcome is a foregone conclusion. Therefore we focus our attention on random variables only.

Ways To Measure Data

Grouping data to form the population is the first step. Unfortunately, defining the basis for a group is by no means trivial or obvious. And different bases for inclusion in a population can dramatically affect the discernible trends. Discerning a trend, of course, is the main reason we collect the data in the first place: as with most applications of modern science, the goal is to understand the behavior of the population. Based on this understanding, predictions can be made and comparisons drawn.

Even though the variable is random, statistical studies show that values tend to cluster around the average. Measures of this tendency to cluster are important in our efforts to characterize populations.


Measures Of Central Tendency

There are three basic measures: the mean, the median, and the mode. While there are several measures of the mean, most often we look for a value that can be interpreted as the average value. While the mean, median and mode are often distinct from each other, they can sometimes be the same. In particular, for the Normal Distribution that is described below and that is our main focus of attention, all three are the same. The most common symbol for the mean is µ, while x̄ is sometimes used.

Standard Deviation

While measures of central tendency indicate where the center of the data lies, how can we measure the spread of the data? This is done using the standard deviation, for which the usual symbol is σ. It is a standard measure of the dispersion of the data from the mean.

Variance

The variance is nothing but the square of σ. It represents the average of the squares of deviations from the mean. Squaring the deviations prevents negative values from canceling out positive values. Beginners sometimes make the mistake of treating the standard deviation and variance as equivalent, since the former is the square-root of the latter. But it is important to remember that they have different statistical properties. We sometimes prefer to work with the variance, as we will see when we discuss ANOVA.

Coefficient of Variance

The variance or standard deviation by themselves may provide a misleading picture. For instance, consider a population with a standard deviation of 3.75 and a mean of 1,000. If we want to compare it with another population whose standard deviation is also 3.75, but whose mean is 10, it is “obvious” that in the former case the data is more tightly clustered around the mean. In mathematical terms, “obvious” means that we are normalizing the standard deviation with respect to the mean. The coefficient of variance (more commonly called the coefficient of variation) is the standard deviation divided by the mean, and provides a more reliable measure to compare populations.
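The measures above can be computed with Python's standard `statistics` module. The two data sets below are made up for illustration: they have the same spread but very different means, which is the situation the coefficient of variance is meant to expose. The helper name `summarize` is an assumption of this sketch.

```python
import statistics


def summarize(data):
    """Return mean, standard deviation, variance and coefficient of variance."""
    mean = statistics.fmean(data)
    sigma = statistics.pstdev(data)        # population standard deviation
    variance = statistics.pvariance(data)  # equals sigma squared
    return mean, sigma, variance, sigma / mean


# Two made-up populations with the same spread but different means.
large = [996.0, 998.0, 1000.0, 1002.0, 1004.0]
small = [6.0, 8.0, 10.0, 12.0, 14.0]

for name, data in (("large", large), ("small", small)):
    mean, sigma, variance, cov = summarize(data)
    print(f"{name}: mean={mean:.1f} sigma={sigma:.3f} "
          f"variance={variance:.3f} CoV={cov:.4f}")
```

Both sets report the same σ and variance, yet the coefficient of variance differs by a factor of 100, matching the intuition that the first set is far more tightly clustered relative to its mean.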


Probability

To define probability, it’s hard to improve on Aristotle, who said “the probable is what usually happens.” From a mathematical perspective, one of the advantages of statistics is that each individual member of the population need not be dealt with separately: we are always dealing with the population as a whole. As a result we need to measure the likelihood of encountering a particular member of the population: in other words, we need to estimate the probability of occurrence of any element or set of elements in the population.

Continuous Distributions

When we work with populations, we find it useful to characterize the variation of the random variable over the entire population in terms of a distribution. That is, a symbolic form or formula that we can use to calculate probabilities of occurrence of various values. There are several different distributions that have been encountered in scientific studies: Binomial, Normal, Weibull, Poisson, etc. The most commonly encountered distribution when working with continuous random variables is the Normal (or Gaussian or Bell-shaped) distribution. In this distribution, the range is from −∞ to +∞, and the distribution is symmetric about the mean.

The Normal Distribution: PDF

The probability density at any value x (loosely, the relative likelihood that the random variable will take a value near x) is given by the equation

$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}} $$

In this equation µ is the mean and σ is the standard deviation. The plot of this Probability Density Function (PDF) is the reason it is also called the Bell-Shaped Distribution.


CDF

The integral of the PDF is called the Cumulative Distribution Function (CDF), since it represents the sum of probabilities of all values that lie within the limits of the integral.

$$ F(x) = \int_{-\infty}^{x} f(t)\,dt $$

This is useful.

• F(x1) is the probability that the random variable will take values less than x1.
• F(x1) – F(x0) is the probability that the random variable will take values between x0 and x1.
• 1 – F(x1) is the probability that the random variable will take values above x1.

The sum of probabilities of occurrence of all values within the population must be 1. For the Normal distribution, this means that

$$ \int_{-\infty}^{+\infty} f(t)\,dt = 1 $$

Any Normal Distribution can be transformed or mapped to the standard normal distribution, which has µ = 0 and σ = 1, by replacing each value x with z = (x − µ)/σ. Values of the integral are tabulated in handbooks¹². For any given normal distribution, we first transform it to the standard distribution, then check the probability of occurrence of a particular range of values.

For instance suppose your manufacturing process has µ = 1001.5 and σ = 23.54. And suppose your customer will only accept components with a value within 1001.5 ± 20. You will transform these limits to the standard distribution and check the probability that the components you manufacture will be acceptable. If the probability that a component falls within 1001.5 ± 20 is 0.60, this means that 60% of the components you manufacture will be accepted, and 40% will be rejected.
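The manufacturing example can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a handbook table; the process values are the ones quoted in the example, and the variable names are chosen only for this illustration.

```python
from statistics import NormalDist

mu, sigma = 1001.5, 23.54          # process mean and standard deviation
lower, upper = mu - 20.0, mu + 20.0

# Transform the acceptance limits to the standard normal: z = (x - mu) / sigma.
z_lower = (lower - mu) / sigma
z_upper = (upper - mu) / sigma

standard = NormalDist()            # mean 0, standard deviation 1
accepted = standard.cdf(z_upper) - standard.cdf(z_lower)
print(f"z = +/-{z_upper:.3f}, accepted fraction = {accepted:.3f}")
```

The limits map to roughly z = ±0.85, and the accepted fraction works out to about 0.60, in line with the figure used in the example.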

¹² Obviously, it is easier to look up a table than to evaluate the integral! In fact, the integral cannot be evaluated in closed form.


The Central Limit Theorem

For the Normal Distribution, the Mean, the Median and the Mode are all equal. How about for other distributions? One of the difficulties that beginners face is that of deciding which distribution should be used to model a particular physical event¹³. This is not an easy question to answer. A reference to the theory of each distribution and research into earlier uses for the physical event are recommended.

However in most experiments, regardless of what distribution best characterizes the population, the Normal Distribution is still of overriding importance. The reason for this is the Central Limit Theorem, which tells us that regardless of the distribution followed by the random variable in question, the means of sufficiently large samples drawn from the population will approximately follow a Normal Distribution.
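A small simulation illustrates the theorem. The population here is assumed to be exponential with mean 1, chosen only because it is strongly skewed and clearly not Normal; the sample size of 50 and the count of 2000 samples are likewise arbitrary choices for this sketch.

```python
import random
import statistics

rng = random.Random(42)
sample_size = 50
n_samples = 2000

# Draw many samples from a skewed population and record each sample's mean.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
    for _ in range(n_samples)
]

mean_of_means = statistics.fmean(sample_means)
spread_of_means = statistics.stdev(sample_means)

# Fraction of sample means within one standard deviation of their own mean;
# for a Normal Distribution this fraction is about 0.68.
within_one_sigma = sum(
    abs(m - mean_of_means) <= spread_of_means for m in sample_means
) / n_samples
print(mean_of_means, spread_of_means, within_one_sigma)
```

The sample means cluster around the population mean of 1 with a spread close to 1/√50, and roughly two thirds of them fall within one standard deviation, as a Normal Distribution would predict.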

Experiments

Even with discrete variables, enumerating all possible values is often beyond the power of available resources. For continuous distributions, it is obviously out of the question.

Data is sometimes collected by observation. The observer makes no effort to control the variables, restricting effort to collecting data and drawing conclusions from these. If an independent variable does not change during the period of observation, the analyst can draw no conclusions about its effect on behavior of the population. To get around this, the statistician needs to control the independent variables. In other words, the statistician needs to get data from an experiment, where some variables are deliberately controlled.

Sample (vs. Census)

Enumerating every possible value that the random variable can take is called a census. While a census is sometimes used – for instance in critical healthcare – most often we use the power of statistics to draw conclusions from a sample. A sample is a subset of the population. The statistical measures of the samples are used to infer the statistical properties of the population.

¹³ Finite Element beginners will sympathize: the choice of elements is often just as confusing for a beginner.

Random Samples

In order to do this effectively, the sample must be free of bias. That is, it must not be predisposed to any particular sets of values. For instance, if you choose to conduct a survey to determine the most common names in your locality by looking up the telephone directory, you are restricting your survey to that part of the population that is accessible by telephone: perhaps this introduces a bias or prejudice into your observations. Samples that are free of bias are called random samples. Deciding which parts of the population to sample is an important question. Indeed, “the generation of random numbers is too important to be left to chance¹⁴.”

As mentioned above, the Central Limit Theorem tells us that the means of samples follow a Normal Distribution even if the population itself does not. Why is this important to us? To estimate the properties of a population, we take a number of samples, each of which consists of a finite number of members. Now suppose we treat the means of the samples themselves as a population, and calculate from this the sample mean. That is, if we have drawn 20 samples, and each has a mean µi, the sample mean is given by

$$ s = \frac{\sum_{i=1}^{20} \mu_i}{20} $$

Because the Central Limit Theorem tells us that the means of the samples themselves follow a Normal Distribution, regardless of the distribution the parent population follows, we can use the properties of the Normal Distribution to calculate a confidence interval. This tells us the probability that the mean of the population will lie within some specific distance from the sample-mean. For example, there is a 95% probability that the mean of the population will lie within the 95% confidence interval of the sample mean.

¹⁴ Robert Coveyou, of the Oak Ridge National Laboratory, quoted by Ivars Peterson in “The Jungles of Randomness: A Mathematical Safari”

For small normally distributed samples, the t-distribution is used in place of the Normal Distribution to construct the confidence interval. When using the t-distribution, it is customary to talk of the “100(1-p)” confidence interval, but the intent is the same: to measure the confidence with which we can assert that the mean of the population lies within some distance of the sample-mean. The smaller p is, the higher the confidence in the estimated values. p = 0.05 is the same as a 95% confidence, while p = 0.01 is the 99% confidence interval. If a report says that the “95% confidence interval” for a response R is 10.5 to 21.27, this means that we can be 95% confident that the mean value of R lies between 10.5 and 21.27.
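As a rough illustration of a confidence interval, the sketch below treats 20 made-up sample means as the data and builds a Normal-theory interval around their average. The helper `confidence_interval` and the numbers are assumptions of this example; for a small sample, the critical value `z` would be replaced by the corresponding value from t-distribution tables, which makes the interval somewhat wider.

```python
import math
import statistics
from statistics import NormalDist


def confidence_interval(values, confidence=0.95):
    """Normal-theory interval for the population mean, centred on the sample mean."""
    n = len(values)
    centre = statistics.fmean(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return centre - z * std_error, centre + z * std_error


# Made-up means of 20 samples.
sample_means = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1, 10.0,
                9.9, 10.2, 10.1, 9.8, 10.0, 10.3, 9.9, 10.1, 10.0, 10.2]

low95, high95 = confidence_interval(sample_means, 0.95)
low99, high99 = confidence_interval(sample_means, 0.99)
print(f"95%: [{low95:.3f}, {high95:.3f}]  99%: [{low99:.3f}, {high99:.3f}]")
```

Asking for more confidence (a smaller p) widens the interval: the 99% interval always contains the 95% interval computed from the same data.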

ANOVA

To make a meaningful comparison between measurements of different samples, the conditions should be the same for all samples. This is obvious. Unfortunately, any experiment is susceptible to noise. That is, even if the experimenter wants to keep everything the same, some variations inevitably creep in. Anyone who has worked in a lab knows that no galvanometer ever reports the exact same value twice.

In experiments, moreover, it is sometimes not enough to just measure the mean of the population from the means of samples. We also want to make predictions. That is, we want to change the values of control variables and study their impact on the mean of the population. The values of the control variable are sometimes referred to as the signal. It is often important to estimate the signal-to-noise ratio. A manufacturing process that has a low signal-to-noise ratio is hard to control, for example.

When evaluating the results of experiments, our interest lies in deciding whether the change in the measured values is because of noise in the measurements or because of changes in the control variable. The ANalysis Of VAriance, ANOVA, is the method used to do this. The control variables are called factors. Values that the control variables can take are called levels. The values of interest are called responses.

Let’s say the experimenter has 1 factor, which can take 3 levels. And let’s say the experimenter has constructed 5 samples (that is, 5 groups).


Then for each response we can present the results of the experiment as in the table below:

Group Number    Responses for:
                Level 1    Level 2    Level 3
1               R11        R12        R13
2               R21        R22        R23
3               R31        R32        R33
4               R41        R42        R43
5               R51        R52        R53

Note that R11 would have been the same as R21 in the absence of noise. Also, R12 would have been the same as R11 if the level had been the same.

The goal of an ANOVA is to compare the differences between levels (i.e. at different values of the control variable, which shows the effect of the “signal”) with the differences between repeated groups at the same level (i.e. the effect of the “noise”). That is, ANOVA helps us determine whether the variation in the response due to the change in the control variable is significant, when compared to the change in the response due to noise.

If you have multiple factors, then the number of possible combinations increases dramatically. This will be discussed in more detail in the next chapter. For now, we will only point out that in most experiments there is more than one factor.

Without going into the equations, we will state simply that ANOVA involves calculating variances in the response between columns (between levels) and between rows (between repeated groups at the same level). Then, using the F-distribution, the significance of the difference is estimated¹⁵. The results of the ANOVA are often displayed in a table showing the factors and the levels of significance.

            ss    df    MS    F    p
Factor 1
Factor 2
Factor 3
Factor 4
Error
Total SS

¹⁵ For details on the F-Distribution and why it is used here, look up the references.


In the table, ss is the sum-of-squares, df is the degrees-of-freedom, MS is the mean-sum-of-squares (ss divided by df), and F is the ratio of the factor’s MS to the MS of the error. The F-value is tested against a critical value from tables of the F-distribution (similar to the confidence-interval described above). The larger the F-value, the more significant the effect of the factor on the response. The last column, p, is related to the confidence-interval. For instance, to have 95% confidence in the effect of a factor, the p would need to be 0.05 or less for that factor.

If there were multiple responses, ANOVA would involve drawing up the table above for each response. When the analysis also adjusts for covariates (variables that are measured but not controlled), it is called ANCOVA, the Analysis of Covariance. If several responses are analyzed together rather than one at a time, the term MANOVA is used.
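For the single-factor case described earlier (1 factor, 3 levels, 5 repeated groups), the entries of the table can be computed directly. The sketch below is a plain one-way ANOVA on made-up responses; the function name `one_way_anova` and the data are assumptions of this example. The p column is left out because it requires the F-distribution, which the Python standard library does not provide.

```python
def one_way_anova(columns):
    """One-factor ANOVA; each inner list holds the responses at one level."""
    all_values = [r for col in columns for r in col]
    grand_mean = sum(all_values) / len(all_values)
    level_means = [sum(col) / len(col) for col in columns]

    # Variation explained by the factor (differences between level means).
    ss_factor = sum(len(col) * (m - grand_mean) ** 2
                    for col, m in zip(columns, level_means))
    # Variation left over within each level (repeatability, i.e. noise).
    ss_error = sum((r - m) ** 2
                   for col, m in zip(columns, level_means) for r in col)

    df_factor = len(columns) - 1
    df_error = len(all_values) - len(columns)
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {"ss": (ss_factor, ss_error), "df": (df_factor, df_error),
            "MS": (ms_factor, ms_error), "F": ms_factor / ms_error}


# Made-up responses: 3 levels, 5 repeated measurements at each level.
level_1 = [10.1, 9.9, 10.0, 10.2, 9.8]
level_2 = [11.0, 11.2, 10.9, 11.1, 10.8]
level_3 = [12.1, 11.9, 12.0, 12.2, 11.8]

table = one_way_anova([level_1, level_2, level_3])
print(table)
```

With these numbers the factor's sum-of-squares is 10 on 2 degrees of freedom and the error sum-of-squares is 0.3 on 12, giving F = 200. The tabulated critical value for 95% confidence with (2, 12) degrees of freedom is roughly 3.89, so the effect of the factor would be judged significant.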

Monte Carlo Methods

Constructing random samples is not an easy task. One alternative approach is based on the theory that if an experiment is repeated enough times using random values for the control variables, the responses can be used to calculate the properties of the distribution itself. This is called the Monte Carlo approach.

The main advantage of the Monte Carlo methods is that they are scalable; they can be applied as easily to a 1000-dimensional problem as to a 1-dimensional problem. In engineering practice, these methods are often deployed when there are a large number of variables. There are several variations on the method, all of which are sometimes referred to collectively as the Monte Carlo methods, since the principle behind all of them is the same.

When dealing with multiple variables, each of which can have multiple levels, the problem of designing an experiment itself can become intractable. As we will see in the next Chapter, designing a meaningful experiment is not a trivial task!


In other words, several problems are susceptible to the problems of scale: as the size of the problem rises, the resources required rise exponentially. The designer is faced with the choice of diluting the experiment and damaging the confidence interval, or omitting variables or levels at the cost of insight into the behavior.

In such scenarios, Monte Carlo methods used along with Approximate Models can be a very effective approach.

Reliability Engineering is one area where Monte Carlo methods are very widely used, because the use of random inputs turns a deterministic simulation into a stochastic one. To sum up, Monte Carlo methods use sequences of random numbers to simulate a process. The only requirement is that the process be described by a PDF.

Many assume wrongly that Monte Carlo methods can be applied only to problems that involve probability. As Buffon’s Needle illustrates, Monte Carlo methods can also be used to calculate the value of π, which is certainly not probabilistic! Interested readers should look up descriptions of Buffon’s Needle to get an idea as to how widely applicable the method is.
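Buffon's Needle is easy to simulate. A needle of length l is dropped at random on a floor ruled with parallel lines a distance d apart (with l no greater than d); the probability that it crosses a line is 2l/(πd), so counting crossings gives an estimate of π. The sketch below is one possible implementation, not a reference one: the function name, the seed and the number of throws are arbitrary, and the needle's direction is generated from a random point inside the unit circle so that π itself is never used as an input.

```python
import math
import random


def buffon_pi(n_throws, needle=1.0, spacing=1.0, seed=1):
    """Estimate pi by dropping needles on a floor ruled with parallel lines."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_throws):
        # Distance from the needle's centre to the nearest line.
        centre = rng.uniform(0.0, spacing / 2.0)
        # Random direction from a point inside the unit circle (rejection
        # sampling), so pi is not needed to generate the angle.
        while True:
            u, v = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
            r2 = u * u + v * v
            if 0.0 < r2 <= 1.0:
                break
        sin_theta = abs(v) / math.sqrt(r2)
        # The needle crosses a line if its half-length, projected
        # perpendicular to the lines, reaches the nearest line.
        if centre <= (needle / 2.0) * sin_theta:
            crossings += 1
    # P(crossing) = 2 * needle / (pi * spacing), solved for pi.
    return 2.0 * needle * n_throws / (spacing * crossings)


estimate = buffon_pi(200_000)
print(f"pi is approximately {estimate:.4f}")
```

The estimate converges slowly (the error shrinks with the square root of the number of throws), which is typical of Monte Carlo methods and is the price paid for their indifference to the number of dimensions.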

Uni-variate Analyses

If a distribution has only one independent variable, we say it is univariate. From the available data (gathered either through experiment or by observation) we calculate the correlation coefficient between the variable and the response. The correlation coefficient is usually represented by the symbol r or ρ, and can be calculated by a variety of methods. It’s important to remember that a correlation does not imply a direct cause-and-effect relationship.

If the correlation coefficient is non-zero, we then use regression to fit a curve through the data points, and use this regression equation to predict values of the response. Regression is often linear, and is most often calculated using a least-squares approach.
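A minimal sketch of both calculations follows: Pearson's correlation coefficient r and a least-squares straight line. The data points are made up to lie roughly along y = 2x + 1, and the function name `correlation_and_line` is an assumption of this example.

```python
import math


def correlation_and_line(xs, ys):
    """Return Pearson's r and the least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Made-up measurements that roughly follow y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

r, slope, intercept = correlation_and_line(xs, ys)
predicted = slope * 6.0 + intercept   # response predicted at x = 6
print(f"r = {r:.4f}, y = {slope:.2f} x + {intercept:.2f}, y(6) = {predicted:.2f}")
```

The fitted line is y = 1.97x + 1.09 with r close to 1, so the regression equation can reasonably be used to predict the response at a nearby untested value such as x = 6.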


Multi-variate Analyses

If there are multiple independent variables, as is normally the case, we use multi-variate statistics. In this case, the regression equation is a “surface” instead of a curve.

The One Factor at A Time (OFAT) approach is sometimes taken to reduce multi-variate distributions to uni-variate distributions. That is, all variables but one are held constant, and the effect of varying that one factor is studied by the experimenter. However, factors are often linked. Some may be independent, some may work in tandem, some in opposition. One of the jobs of the experimenter is to search for such linkages. The OFAT approach will not uncover linkages between variables. An alternate approach, called DOE (described in the next chapter), is better suited to the task.
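The weakness of OFAT is easiest to see on a deliberately extreme, hypothetical response in which the two factors matter only in combination. In the sketch below the function `response` and its coefficients are invented for illustration; varying one factor at a time from the baseline shows no effect at all, while evaluating every combination of levels exposes the linkage.

```python
from itertools import product


def response(x1, x2):
    """Hypothetical response in which the two factors act only in combination."""
    return 10.0 + 4.0 * x1 * x2


levels = (0.0, 1.0)
baseline = (0.0, 0.0)

# OFAT: vary one factor while the other stays at its baseline level.
ofat_effect_x1 = response(1.0, baseline[1]) - response(*baseline)
ofat_effect_x2 = response(baseline[0], 1.0) - response(*baseline)

# Full factorial: evaluate every combination of levels.
runs = {(x1, x2): response(x1, x2) for x1, x2 in product(levels, levels)}

# Interaction: does the effect of x1 change when x2 changes?
effect_x1_at_low_x2 = runs[(1.0, 0.0)] - runs[(0.0, 0.0)]
effect_x1_at_high_x2 = runs[(1.0, 1.0)] - runs[(0.0, 1.0)]
interaction = effect_x1_at_high_x2 - effect_x1_at_low_x2

print(ofat_effect_x1, ofat_effect_x2, interaction)
```

Both OFAT effects come out as zero, which would wrongly suggest that neither factor matters, while the factorial runs show that x1 changes the response by 4 whenever x2 is at its high level.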

It's easier to square the circle than to get around a mathematician. Augustus De Morgan


Statistics – A Bird’s Eye View

The previous chapter laid out several terms in statistics: samples, factors, levels, responses, and tests of significance. We saw how levels of factors are varied across samples, responses are measured, and the significance of the measurements is estimated. All these are based on “sound” mathematical principles. That is, the assumptions and proofs are well laid out.

There is one important question that the previous chapter did not address. How are the samples themselves to be constructed? We need a scientific basis that provides guidelines on three critical fronts:

• How many measurements must be made?
• At which levels must these measurements be made and which factors are important?
• How can we quantify the error?

The first question is answered by DOE. The second is partially answered by variable screening. The third is addressed by ANOVA.

DOE

Design Of Experiments is the procedure of selecting the points in the design space where responses are to be measured. First, let us recall the important terms:

• factors are the independent variables. They may be discrete (e.g. number of pills administered) or continuous (e.g. modulus of elasticity)
• levels are the values that the factors can take
• responses are the dependent variables

To understand the importance of DOE, let’s take some specific numbers. It will take some patience and concentration to follow the discussion, but the effort is well worth it. For the purpose of discussion, the scenario is as follows:

• We are studying the behavior of a rolling mill.
• We want to conduct the same experimental measurement 10 times on the same mill so that we allow for any error in measurements or other variations (i.e. noise). That is, there are 10 sample groups.
• There are 2 Factors, F1 and F2. F1 is temperature, while F2 is the number of passes.
• Factor 1 can have any of 3 levels – L1, L2 and L3. These represent the values 420° Celsius, 450° Celsius and 470° Celsius.
• Factor 2 can have any of 4 levels – L1, L2, L3, and L4. These represent the values 0, 2, 5 and 10.
• The response is the finish of the strip, as measured by the average surface roughness, Ra.

The combinations of different levels for different factors are best illustrated by the following table (sometimes a graphic display called a Latin Square is used to display the combinations):

                 Temperature
# of passes      420°      450°      470°
0
2
5
10

As the table shows, there are 12 combinations of factors that are possible – that is, 12 feasible process-settings (3*4, since there are 3 possible levels for the temperature and 4 for the number-of-passes). At 10 repetitions, this means 120 possible measurements. If the process designer changes the equipment to allow for 6 levels of temperature and 5 levels of passes, we now have 300 possible measurements.

The purpose of the example above is twofold. First, it brings in a new example for the “population”. In this case, the population consists of all the possible combinations of values of all the factors. Since the factors are discrete, a census is possible: that is, evaluate at all possible combinations and choose the best.

Second, it brings in a new example of “group” for ANOVA. In a large population, the groups would involve different subsets of the population. In this case, since we only have 1 mill, we treat repeated experiments on the same population as different groups. ANOVA, in this case, measures the variation-between-repeated-measurements and compares it with the variation-across-levels. The logic is similar to the earlier example, with the between-groups factor being replaced by the between-repeated-measures factor.

Note that if we had had access to 100 identical mills, we could have performed 10 tests, each on a different mill. Clearly this would define a different “population” since the “mill being tested” would be a variable in this case.
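The counting in the rolling-mill example can be checked by listing the process-settings explicitly. This is only a sketch: the variable names are invented here, and the levels are the ones quoted in the scenario.

```python
from itertools import product

temperatures_c = [420, 450, 470]   # levels of F1 (temperature, degrees Celsius)
passes = [0, 2, 5, 10]             # levels of F2 (number of passes)
repetitions = 10                   # repeated measurements per process-setting

# Every combination of one temperature level with one number-of-passes level.
settings = list(product(temperatures_c, passes))

# Each setting is measured `repetitions` times on the same mill.
measurements = [(t, p, rep)
                for (t, p) in settings
                for rep in range(1, repetitions + 1)]

print(len(settings), "process-settings")
print(len(measurements), "measurements")
for t, p in settings[:4]:
    print(f"temperature = {t} C, passes = {p}")
```

Listing the settings this way is the “census” referred to above: it is only practical because both factors are discrete and the number of levels is small.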

Full Factorial

An approach where all possible combinations of levels and factors are evaluated (that is, the responses are measured) is called a full factorial experiment. Obviously, it can increase in size very rapidly. For 20 factors and 3 levels per factor, the number of combinations is $3^{20}$, which is roughly $3.5 \times 10^{9}$. In general, if there are $n$ factors, and factor $i$ can have $l_i$ levels, then the total number of combinations in a full-factorial experiment is given by

$$\prod_{i=1}^{n} l_i$$
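The product formula for the size of a full-factorial experiment is easy to evaluate directly. The function name below is made up for this sketch.

```python
from math import prod

def full_factorial_size(levels_per_factor):
    """Number of runs in a full-factorial experiment: the product of the level counts."""
    return prod(levels_per_factor)

print(full_factorial_size([3, 4]))      # rolling-mill example: 3 temperatures, 4 pass counts
print(full_factorial_size([6, 5]))      # the modified mill: 6 temperatures, 5 pass counts
print(full_factorial_size([3] * 20))    # 20 factors at 3 levels each
print(full_factorial_size([2] * 7))     # a 2-level design with 7 factors
```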

For preliminary investigations, DOEs are sometimes applied to 2-level designs. That is, only two levels are used for each factor even if more are possible in the population. These 2 levels could be “present” vs. “absent”, “high” vs. “low”, etc. In these cases, the total number of possible combinations is $2^n$, where $n$ is the number of factors. Such experiments are called 2-level experiments, and are usually used to screen variables – to eliminate the trivial and retain the important ones.

An obvious way to reduce effort is to choose a subset of the full-factorial. That is, to select a subset of all the possible combinations of levels and factors. This approach is called a fractional factorial design. Depending on which subset is chosen, we end up with different designs. Most of these designs are named after their proposers.

Fractional Factorial

In this design, we choose a fraction of the full-factorial. This fraction can be ½, ¼, etc. For instance, a 2-level 3-factor design has 8 combinations if a full-factorial design is chosen. The full factorial design is called a $2^3$ design. If we choose to run a ½ factorial, we call it a $2^{3-1}$ design, and it will contain 4 measurements. A full discussion of how to choose the fraction is beyond the scope of this book. Look up one of the References for this detail, as well as for a discussion on which subset of the full-factorial to choose for a given fraction.

Standard literature often describes fractional-factorial designs as $2^{k-p}$, since there are usually 2 levels per factor. For example, a ½ factorial design with 6 factors is called a $2^{6-1}$ design. “k” is “6”, of course, because there are 6 factors. “p” is “1” because this is a “½ factorial design”. If it were a ¼ factorial design, “p” would be 2.

The tables below show the $2^3$ full factorial, and one possible ½ fractional factorial. In these tables, we have chosen +1 and –1 to represent the possible levels for each factor.

Full Factorial

Run    F1    F2    F3
1      +1    +1    +1
2      +1    +1    -1
3      +1    -1    +1
4      +1    -1    -1
5      -1    +1    +1
6      -1    +1    -1
7      -1    -1    +1
8      -1    -1    -1

½ Fractional Factorial

Run    F1    F2    F3
1      -1    -1    +1
2      +1    -1    -1
3      -1    +1    -1
4      +1    +1    +1

The fractional factorial design was generated by first choosing the 4 combinations of levels for F1 and F2. The levels for F3 were chosen by multiplying the chosen levels of F1 and F2. We say that we have “confounded F3 by F1 and F2”. We do pay a price for this simplification. The confounding (also called aliasing) means we lose the ability to determine some interactions between factors.

Fractional Factorial designs are suitable when some factors are considered more important than others. We are willing to give up some resolution in the weaker factors in return for the economy we gain by virtue of the shorter experiment. However, remember that there is a priori judgment involved in deciding which factors to treat as “weaker”. Since the effects of the lesser factors are reduced, such designs are sometimes called screening designs.

Sometimes, we may choose to hold some factors fixed at chosen levels so that we can measure the contribution of these factors to the total variation of the responses. This is called Blocking. The “blocks” are the levels at which the factors are held fixed. Different methods of choosing blocks to include give rise to different designs: complete block designs, incomplete block designs and randomized block designs.
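The construction of the ½ fraction, and the confounding that comes with it, can be sketched in a few lines. The variable names are chosen for this example only.

```python
from itertools import product

# Full 2-level design in F1 and F2; the level of F3 is generated as F1 * F2.
half_fraction = [(f1, f2, f1 * f2) for f1, f2 in product((-1, +1), repeat=2)]

# The complete 2^3 design, for comparison.
full_factorial = list(product((-1, +1), repeat=3))

for run, levels in enumerate(half_fraction, start=1):
    print(run, levels)

# Confounding: inside the half fraction, the F3 column is identical to the
# F1*F2 interaction column, so the two effects cannot be told apart.
f3_column = [f3 for (_, _, f3) in half_fraction]
f1f2_column = [f1 * f2 for (f1, f2, _) in half_fraction]
print("F3 aliased with F1*F2:", f3_column == f1f2_column)

# The same holds for F1 and the F2*F3 interaction.
f1_column = [f1 for (f1, _, _) in half_fraction]
f2f3_column = [f2 * f3 for (_, f2, f3) in half_fraction]
print("F1 aliased with F2*F3:", f1_column == f2f3_column)
```

The four runs produced are the same four rows as in the ½ Fractional Factorial table, in a different order.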

Plackett-Burman

Named for R.L. Plackett and J.P. Burman¹⁶, this design reduces the number of runs even further. The main effects are heavily confounded with two-factor interactions. For instance, you can construct a 12-run experiment with 11 factors. Standard designs are provided for various numbers of factors.

¹⁶ They published their paper, titled “The Design Of Optimum Multifactorial Experiments”, in 1946.
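The 12-run, 11-factor case can be sketched as follows. The generator row used here is the one commonly tabulated for the 12-run design (+ + - + + + - - - + -); the function name and the checks are additions for this example, and the standard tables in the References remain the authoritative source.

```python
# Commonly tabulated generator row for the 12-run design.
generator = [+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1]

def plackett_burman_12():
    """Build the 12-run design: 11 cyclic shifts of the generator plus a row of -1."""
    n = len(generator)
    rows = [generator[-k:] + generator[:-k] for k in range(n)]  # k = 0 is the generator itself
    rows.append([-1] * n)
    return rows

design = plackett_burman_12()

for row in design:
    print(" ".join(f"{level:+d}" for level in row))

# Each factor is at its high level in 6 runs and at its low level in 6 runs.
balanced = all(sum(row[j] for row in design) == 0 for j in range(11))

# Every pair of columns is orthogonal, so main effects can be estimated independently.
orthogonal = all(
    sum(row[j] * row[k] for row in design) == 0
    for j in range(11) for k in range(j + 1, 11)
)
print("balanced:", balanced, "orthogonal:", orthogonal)
```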

Central Composite

Also called Box-Wilson designs, there are several variations of this method: circumscribed, inscribed, and face centered. The first two require 5 levels per factor, while the third requires 3. CC designs can be full-factorial or fractional-factorial. Remember that a 2-level experiment can only capture linear effects, while a 3-level experiment can capture quadratic effects, and so on.
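The level counts quoted above can be checked with a small sketch. It assumes the usual layout of corner points, axial points and a center point, in coded units. The choice of axial distance `alpha = (2**k) ** 0.25` for the circumscribed variant is one common convention (it gives a rotatable design) and is an assumption of this example, as are the function names.

```python
from itertools import product

def central_composite(k, alpha):
    """Corner points at +/-1, axial points at +/-alpha on each axis, one center point."""
    corners = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    center = [tuple([0.0] * k)]
    return corners + axial + center

def levels_per_factor(design):
    """Number of distinct values taken by the first factor."""
    return len({point[0] for point in design})

k = 2
alpha = (2 ** k) ** 0.25            # assumed axial distance for the circumscribed design

circumscribed = central_composite(k, alpha)
face_centered = central_composite(k, 1.0)
# Inscribed: the circumscribed design scaled so that the axial points sit at +/-1.
inscribed = [tuple(v / alpha for v in point) for point in circumscribed]

for name, design in [("circumscribed", circumscribed),
                     ("inscribed", inscribed),
                     ("face centered", face_centered)]:
    print(f"{name}: {len(design)} runs, {levels_per_factor(design)} levels per factor")
```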

Box-Behnken

A Box-Behnken design, also called a quadratic design, is a slightly more economical variation of the full-factorial CC design but can be more expensive than a fractional-factorial CC Design. 3 levels are required per factor.

Other Designs

There are several other types of designs, such as the Latin Hypercube Design, Taguchi Design, and so on. Some designs are particularly suited for computer experiments.

The term D-Optimal is used to describe experiment designs that are generated based on the chosen model given the number of runs. The matrix that defines the experiment is generated so as to optimize results for the chosen number of runs. Generating such a matrix by hand is not feasible, so D-Optimal designs are invariably used only for computer-experiments.

Variable Screening

In many cases, we want to check whether selected factors have an effect on the responses or not. To do this, multiple samples are constructed, and the experiment is repeated across these samples. As described in the previous chapter, ANOVA is used to test levels of significance of the various factors. If the effect of the variation of a factor is significantly more important than the change of groups, then it is retained. Otherwise it is screened out.


Computer Experiments

In many cases, the experiment is conducted using a computer model. This, of course, is of particular interest to us.

In the physical world, an experiment consists of choosing levels for factors and measuring the responses. The measurements are repeated either for different samples in a population, as in a poll or an inspection in Quality Control, or for repeated measurements on the same sample, as described in the example of the rolling-mill. In process control the experiments can be repeated over the same machine. Either way the inherent noise generates different response values. And we then use ANOVA to test significance of the effect of noise on our conclusions.

On a computer, running the same set of levels on a computer model will generate the same response, since there is no inherent noise¹⁷. This is only to be expected: the computer is a deterministic machine, after all. Therefore you cannot use ANOVA to compare variance-between-groups with variance-within-groups. The goal of a computer model is not to generate such results.

Inherent in the computer model is the assumption that the model has already been fine-tuned so that it only contains important effects. Usually, this is done by a process called Parameter Identification or System Identification. This is explained in the next section.

One important outcome of a computer DOE is to generate an approximate model (e.g. a Response Surface) that can be used to conduct further numerical experiments. ANOVA is used, but unlike a physical-experiment where between-groups factors are investigated, here it is used to check which factors should be included in the approximate model. This is called variable screening, in the context of computer experiments.

Summary

The previous chapter covered essential terms in statistics and probability. This chapter put those terms together to explain how experiments are conducted, and outlined the issues involved in the design of experiments. The next chapter looks at the issue from an engineering design point of view, bringing together terms and techniques relevant to CAE.

¹⁷ As pointed out earlier, computer models too can be susceptible to noise. This is neglected in our discussion, as it is a different subject.

However beautiful the strategy, you should occasionally look at the results. Winston Churchill


Statistics – A Designer’s View

One of the main strengths of mathematical approaches is that they can be applied to any context, provided the assumptions are fulfilled. This applies to statistics too. When using the tools, it is essential that we keep in mind the implicit assumptions.

Statistics is commonly used in non-engineering applications. Prediction of outcomes of elections is an example that’s universally familiar. Pollsters use samples to understand and predict the behavior of the electorate, while candidates use samples to tailor their campaign promises.

Coming to engineering applications, statistics is widely used in manufacturing. Applications in process control are well documented, and are familiar to any engineer who has undergone an introductory course in Quality Control. The applicability to engineering design is not as widely known.

Accordingly, we will spend the rest of this chapter on two things. We will review some important trends that have led to an increase in the use of these tools. And we will use examples to show how the techniques of the previous two chapters can be used in the context of CAE and Design Optimization.

Applications Of Approximate Models

Approximate models, also called Meta-Models or Surrogate Models, are not essential for all applications. To understand how and why they are relevant to our application, we will break our discussion into two parts.

First, we’ll list the advantages Approximate Models have. Different scenarios may benefit from one or more of these. It is possible that a scenario may not need any of these benefits, in which case Approximate Models can be dispensed with completely.

Second, we will discuss the types of Approximate Models used in CAE.

Improvement In Mathematical Behavior

Non-linear behavior means the input and response are not linearly related. This means a small change in input could cause a sudden jump in output. Calculus describes these as sudden changes in gradient. Some situations can be even worse: gradients may not exist at all.

For cases like these, approximate models offer a good way out. We choose the form of the approximate model to ensure that it is differentiable or otherwise well-behaved. (Note that differentiability is important for gradient-based optimization methods, but is not required for other design improvement methods.) In other words, we give up some precision for an increase in decorum.

Reduction Of Computational Load

Any engineer who has used Finite Element Analysis will jump at the opportunity to use models that can reduce solution time. Analyses in nonlinear applications like vehicle-safety can take several hours of CPU time for a single run. Consider this extract from a technical publication¹⁸:

“A two-level, full factorial design would yield $2^7 = 128$ treatments, which is a prohibitive number to perform with FEA. Modifying the FE models tends to be extremely tedious, and the simulation run time would be unreasonably long.”

At several hours of CPU time per analysis, a numerical experiment of this size would be prohibitively expensive. And pity the engineer who finds a mistake in the experiment design at the end of the experiment.

Approximate models can reduce the required computational effort by orders of magnitude. What’s more, they offer a way out of the second problem too. If you find an error in the experiment-design, you can repair the approximate model: points that define the model can be added, removed or moved.

¹⁸ “Failure Analysis of Rapid Prototyped Tooling in Sheet Metal Forming – Cylindrical Cup Drawing”, Y. Park and J.S. Colton, Transactions of the ASME, Vol 127, February 2005

Variable Screening

Testing takes time and effort. It is expensive. The more factors you want to test, the higher the time and expense. What if some factors are unimportant? Can you conduct a preliminary investigation to rank the importance of various factors? Can you then save time and money by excluding the lower-ranked ones from more detailed studies?

Screening samples are carefully constructed to detect such effects. In the earlier chapters we saw how ANOVA is an effective method to quantify and compare the effects of factors.

With computer models, our approach must be a little different. Since a computer model is deterministic, repeating an experiment on the computer will yield the same results as long as there is no variation in the levels of factors. We cannot use ANOVA to compare between-groups variations to within-groups variations. How, then, can we use computer models for variable screening?

With specific reference to CAE, there are two scenarios we will consider. But first, let’s review the basics of modeling for CAE.

1. Behavior of a real-world situation is captured using observation or experiment.
2. Mathematical Models, which usually involve some approximation again, are used to reflect the observed behavior. These are not always well-behaved, but are often called high fidelity models.
3. We further build Approximate Models, which are derived from the Mathematical Models.

“Nobody believes analysis results except the analyst. Everybody believes test results except the test engineer.” M.Racicot

Now let’s examine our issue: variable screening. In view of the 3 steps listed above, let us state the question more precisely, recognizing that there are actually two different questions:


1. We want to know which variables affect the power of the high-fidelity model to reproduce observed data.
2. We want to know which variables affect the power of the approximate model to reproduce the high-fidelity model.

In the first case, we have some data from physical observations or experiments. We need to fine tune variables in the computer model. Take damping-factors or friction coefficients, for example. Mechanics is not well-developed enough for us to establish these material-data from fundamental considerations. They are usually set empirically – that is, to match data from an observation or an experiment.

In the second case, we have a computer model that is tried and tested. There is no doubt about its validity. This is the high fidelity model. It could be an analytical expression or a numerical model¹⁹. However, the high-fidelity model may have several input parameters. If we are to use it to conduct experiments, which of these variables should we include in the experiment?

If you have an analytical equation that relates responses and factors, calculus can be used to evaluate sensitivity. Unfortunately, it is not always possible to determine the sensitivity of a response to the factors even if an analytical model links the two. The equation may not be differentiable in the domain. Or, it may be impossible to evaluate the derivative, even if it exists. If numerical models such as FEM are used, there is a model that reliably calculates response from inputs, but is not analytical. So sensitivity must be calculated numerically.

If we include more factors than are essential, not only do we increase computing time, we also increase the difficulty of assimilating the results! Remember that sorting through the collected data is often a chore that experimenters dread.

The first case (screening between observed data and the high-fidelity model) is addressed by parameter identification. In this approach, the results of a physical experiment are set up as target values. The computer model is run with various levels of many factors. By inspecting the computer model against the available physical-experiment results, we determine which factors can be safely omitted from the computer model without hurting its ability to match the physical-experiment results. This method does not need approximate models, but uses the same techniques to check which values can “safely” be omitted or used in the computer model for further CAE.

In the second case, approximate models are extremely useful. A screening experiment is designed using the high fidelity model as the “target”. Screening experiments typically involve only two levels for each factor. The designer is encouraged to include as many factors as possible. ANOVA is conducted on the factors themselves to quantify their effects on the responses. Without presenting the mathematics here²⁰, we will summarize the method:

1. Construct the approximate model as a weighted sum that involves the factors.
2. Choose a number of sampling points using one of the DOE methods described earlier.
3. Use regression analysis (a least-squares approach is often used) to calculate the coefficients in the summation. The relative values of the coefficients in the summation represent the importance of each factor.
4. Inspect the residual (usually shown as a graph, this shows the difference between the high-fidelity model and the approximate model) to ensure the overall adequacy of the model.
5. Assume that the error in the approximate model (that is, the difference between the approximate model and the high-fidelity model at each sampling point) follows a Normal Distribution.
6. Use ANOVA to calculate the contribution of each factor to the approximate model, along with the “confidence” in these estimates.

Unlike the earlier example of ANOVA, the results of this screening are usually presented in graphical form. If the approximate model is to be used to calculate multiple responses, one graph is presented for each response. For instance, consider the histograms below²¹, in which the length of each bar indicates the effect of the corresponding factor. Remember that the estimated effect has some error. This error is calculated by the ANOVA. The F-values are used to estimate the “confidence” in the estimate. This is usually taken to be 95%. In the graphs below, the lighter part of each bar is that part of the effect that the analysis is 95% confident of. The darker part, which is the lower-confidence fraction of the total effect, is treated as error.

¹⁹ Transfer Functions (covered in most courses on Control Systems) provide excellent examples of high-fidelity analytical models. Many linear processes can be accurately described by numerical models.

²⁰ For an excellent description see “Automotive crashworthiness design using response surface-based variable screening and optimization”, K.J. Craig, N. Stander, D.A. Dooge, S. Varadappa, International Journal for Computer-Aided Engineering Software, Vol 22 No.1, 2005, pp.38-61

This type of chart is called a Pareto Chart of Effects. Sometimes a line is drawn across the bars to indicate how large an effect has to be in order to be statistically significant. From the charts shown above, the factor “R_Bracket_Gauge” has a significant effect on the “Mass”, but is almost irrelevant as far as the “Left Knee Force” is concerned. Since the “T_Flange_Depth” has a negligible effect on both responses, it can be screened from further experiments.
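Steps 1 to 3 of the screening method can be sketched on a toy problem. Everything in the block is hypothetical: the `high_fidelity` function stands in for an expensive solver run, the factor names are invented, and the 5% cut-off is an arbitrary choice for the example. No confidence estimate is computed here, so this is only the ranking part of the method, not the ANOVA.

```python
from itertools import product

def high_fidelity(gauge, depth, width):
    """Hypothetical stand-in for an expensive solver run (inputs coded as -1 or +1)."""
    return 50.0 + 8.0 * gauge + 0.1 * depth - 3.0 * width

names = ["gauge", "depth", "width"]

# Step 2: a two-level full-factorial screening experiment.
design = list(product((-1, 1), repeat=len(names)))
responses = [high_fidelity(*run) for run in design]
n = len(design)

# Steps 1 and 3: approximate model = constant + weighted sum of the factors.
# For an orthogonal two-level design, the least-squares coefficient of each
# factor reduces to sum(level * response) / n.
constant = sum(responses) / n
coefficients = {
    name: sum(run[j] * y for run, y in zip(design, responses)) / n
    for j, name in enumerate(names)
}

# Rank the factors by the size of their coefficients.
ranking = sorted(names, key=lambda name: abs(coefficients[name]), reverse=True)

# Arbitrary cut-off for this sketch: below 5% of the largest coefficient.
largest = max(abs(c) for c in coefficients.values())
screened_out = [name for name in names if abs(coefficients[name]) < 0.05 * largest]

for name in ranking:
    print(f"{name:6s} {coefficients[name]:+.2f}")
print("candidates for screening:", screened_out)
```

Sorting the coefficients by magnitude is what a Pareto Chart of Effects shows graphically; the confidence bands on such a chart come from the ANOVA step, which is omitted here.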

Types Of Approximate Models

Approximate models, often referred to as Response Surfaces, are usually constructed in one of three ways. Remember that this technology is quite recent, at least in comparison to the other methods commonly used in CAE, so these methods are still evolving.

Common to all approximate models is the fact that the high-fidelity model is used to evaluate responses at sample points. The sample points themselves are chosen using a DOE.

²¹ From “Automotive crashworthiness design using response surface-based variable screening and optimization”, cited above.

Least Squares Regression

In this approach, regression analysis is used to fit the surface to the sampling points using a least-squares approximation. It is believed that a D-Optimal design is the most suited experiment for this approach. The surface itself is usually a polynomial surface – either linear or quadratic or elliptical.

The least-squares approach means that the surface is unlikely to match the high-fidelity solution anywhere. A large number of sampling points and a higher order of polynomial help improve accuracy of the response surface. This method is usually good at capturing global minima, since it tends to smooth out local minima.
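A one-dimensional sketch of this behavior follows. The `high_fidelity` function is invented: a quadratic trend with small ripples that create local minima. NumPy's `polyfit` is used for the least-squares fit, and the nine evenly spaced sampling points are simply a convenient choice rather than a D-Optimal design.

```python
import numpy as np

def high_fidelity(x):
    """Hypothetical wavy response: a quadratic trend plus small local ripples."""
    return (x - 2.0) ** 2 + 0.3 * np.sin(8.0 * x)

# Sampling points (a plain evenly spaced set is assumed for this sketch).
x_samples = np.linspace(0.0, 4.0, 9)
y_samples = high_fidelity(x_samples)

# Quadratic response surface fitted by least squares.
coeffs = np.polyfit(x_samples, y_samples, deg=2)
surface = np.poly1d(coeffs)

# The surface does not pass through the samples: the residuals are non-zero.
residuals = y_samples - surface(x_samples)

# Minimum of the fitted quadratic a*x^2 + b*x + c lies at -b / (2a).
x_min = -coeffs[1] / (2.0 * coeffs[0])

print("largest residual:", float(np.max(np.abs(residuals))))
print("minimum of the response surface at x =", float(x_min))
```

The ripples are smoothed away and the minimum of the surface lands close to the bottom of the underlying trend, which is the sense in which the method favors global minima over local ones.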

Moving Least Squares Regression

This is a modification of the above method, in which the weights in the regression equation are a function of the distance of the point of interest from each DOE sampling point. Since the weights associated with each sampling point “decay” as the evaluation point moves away, an analytical expression is not possible but the approximation is still computationally efficient. Usually, the type of decay can be chosen by the analyst to vary the closeness of fit.
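A minimal sketch of the idea, assuming a Gaussian decay and a local straight-line fit (both are choices made for this example, not prescribed by the method). The `decay` parameter plays the role of the analyst's control over the closeness of fit.

```python
import numpy as np

def moving_least_squares(x_eval, x_samples, y_samples, decay=1.0):
    """Evaluate a locally weighted straight-line fit at the single point x_eval."""
    # Gaussian decay (an assumed choice): samples far from x_eval get small weights.
    weights = np.exp(-((x_samples - x_eval) / decay) ** 2)
    # Basis [1, x]; solve the weighted normal equations for this evaluation point.
    basis = np.column_stack([np.ones_like(x_samples), x_samples])
    weighted = basis * weights[:, None]
    coeffs = np.linalg.solve(basis.T @ weighted, weighted.T @ y_samples)
    return float(coeffs[0] + coeffs[1] * x_eval)

x_samples = np.linspace(0.0, 4.0, 9)
y_samples = np.sin(x_samples)          # hypothetical high-fidelity responses

# The fit is recomputed for every evaluation point, so there is no single formula.
tight = moving_least_squares(1.5, x_samples, y_samples, decay=0.3)
loose = moving_least_squares(1.5, x_samples, y_samples, decay=3.0)

print("response at the sample x = 1.5:", float(np.sin(1.5)))
print("fast decay (decay=0.3):", tight)
print("slow decay (decay=3.0):", loose)
```

A fast decay follows the nearby samples closely, while a slow decay approaches the ordinary least-squares fit over all the samples.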

Kriging

Named for D.G. Krige, who was trying to determine the grade of ore from samples, this method is sometimes preferred because of its improved accuracy. Unlike the least-squares fits, the surface interpolates the values at the sampling points. It provides for the inclusion of a stochastic component, with a given mean and variance, into the approximation model. It is believed to be less robust than the least squares method, particularly if the high-fidelity model’s results contain some noise.
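The interpolating property can be seen in a bare-bones sketch. It assumes a constant mean and a Gaussian correlation function with a fixed parameter `theta`; practical Kriging codes estimate such parameters from the data, which is not attempted here. The sample values are invented.

```python
import numpy as np

def fit_kriging(x_samples, y_samples, theta=1.0):
    """Return a predictor built from a constant mean plus a correlated deviation."""
    # Gaussian correlation between every pair of sampling points (assumed form).
    diff = x_samples[:, None] - x_samples[None, :]
    corr = np.exp(-theta * diff ** 2)
    ones = np.ones(len(x_samples))
    # Generalized least-squares estimate of the constant mean.
    mean = (ones @ np.linalg.solve(corr, y_samples)) / (ones @ np.linalg.solve(corr, ones))
    # Weights applied to the correlations between a new point and the samples.
    weights = np.linalg.solve(corr, y_samples - mean)

    def predict(x):
        r = np.exp(-theta * (x - x_samples) ** 2)
        return float(mean + r @ weights)

    return predict

x_samples = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_samples = np.array([1.0, 0.2, 0.8, 0.1, 0.9])   # hypothetical responses

predict = fit_kriging(x_samples, y_samples)

for xi, yi in zip(x_samples, y_samples):
    print(f"x = {xi:.1f}: sample = {yi:.3f}, surface = {predict(xi):.3f}")
print("between samples, x = 1.5:", predict(1.5))
```

Because the surface is forced through every sample, any noise in the samples is reproduced as well, which is one way to understand the robustness concern mentioned above.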


Optimization

Remember that the approximate model is not essential. If the high-fidelity model is well-behaved and computationally efficient, optimization can be performed without using an approximate model. Since well-behaved high-fidelity models are amenable to gradient-search methods, “mathematically demanding” techniques can be used.

The techniques described in this book gain importance either when gradient-search methods cannot be applied to the high-fidelity model, or when we have reason to believe there are several local minima.

In the first case, we can use the approximate model to provide a mathematically well-behaved function to the optimizer.

To address the second problem, where local minima make it hard to locate a global minimum, an adaptive response surface search is preferred. This is best explained with reference to the figure. f(x) is the objective function and the domain is between points 1 and 3. The goal is to locate the global maximum. The search starts with any two points, numbered 1 and 2 in the figure. Using these two points and the responses at these two points, RS1 is constructed. The maximum of RS1 is easily determined: it lies at point 3. Next, evaluate the response at the points 1, 2 and 3, and construct the quadratic curve RS2. The maximum of RS2, again easily determined, lies at point 4. We now evaluate the response at the 4 points 1, 2, 3 and 4. This allows us to construct RS3 and to locate its maximum. If we evaluate f(x) at this point, and if it turns out to be a maximum, we stop.

The combination of approximations, trade-off studies using Pareto Frontiers, and search methods like the above, allow us to apply optimization techniques to computationally demanding, non-linear and multi-objective problems.
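The search just described can be sketched in one dimension. Several details below are assumptions made for the example: the objective `x * sin(x)` on the domain 0 to 3, the cap on the polynomial degree, locating the maximum of each response surface on a fine grid, and the stopping rule (stop when the surface's maximum falls on a point that has already been evaluated).

```python
import math
import numpy as np

def objective(x):
    """Hypothetical objective; stands in for an expensive high-fidelity run."""
    return x * math.sin(x)

def adaptive_search(f, lo, hi, start, max_degree=4, tol=1e-3, max_iter=15):
    """Repeatedly fit a polynomial response surface and sample at its maximum."""
    xs = list(start)
    ys = [f(x) for x in xs]
    grid = np.linspace(lo, hi, 3001)
    for _ in range(max_iter):
        # RS1 is a line, RS2 a quadratic, and so on, up to max_degree.
        degree = min(len(xs) - 1, max_degree)
        surface = np.poly1d(np.polyfit(xs, ys, degree))
        candidate = float(grid[np.argmax(surface(grid))])
        # Assumed stopping rule: the surface maximum coincides with a known point.
        if min(abs(candidate - x) for x in xs) < tol:
            break
        xs.append(candidate)
        ys.append(f(candidate))
    best = int(np.argmax(ys))
    return xs[best], ys[best], len(xs)

x_best, f_best, n_runs = adaptive_search(objective, 0.0, 3.0, start=[0.0, 1.0])
print(f"best point x = {x_best:.3f}, f(x) = {f_best:.4f}, after {n_runs} evaluations")
```

As in the description, the first surface is a straight line whose maximum lies at the end of the domain, and each later surface is refitted with one more evaluated point.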

Reliability

Recall our earlier definition of reliability: the probability of a measure lying within a specific value. One of the main drawbacks of an “academic” knowledge of engineering is that most textbooks present well-defined problems. Loads are clearly specified, material properties are clearly specified, geometries are clearly specified, etc.

The real world is quite different. An engineer who starts to practice engineering design has to deal with an inherent uncertainty not just in the design data but also in the manufacturing process.

Manufacturing engineers have long lived with acceptable levels of uncertainty. The quality measures of components that are mass manufactured follow a Normal Distribution. For a Normal Distribution, a large percentage of the population (about 68%) lies within 1 standard deviation of the mean. An even larger fraction (about 99.7%) lies within 3 standard deviations of the mean. Manufacturing engineers seek to control process parameters so that they can achieve Six-Sigma quality. That is, they do not shoot for zero rejections. They shoot for an acceptance rate that matches the six-sigma spread of a Normal Distribution.
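The fraction of a Normal population lying within k standard deviations of the mean follows from the error function, so the figures are easy to reproduce. The function name is chosen for this sketch.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a Normal population lying within k standard deviations of the mean."""
    return erf(k / sqrt(2.0))

for k in (1, 2, 3, 6):
    inside = fraction_within(k)
    print(f"within {k} standard deviation(s): {inside:.10f}  (outside: {1.0 - inside:.2e})")
```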

Design For Six Sigma

This approach, applied to design (and abbreviated to DFSS), has been used quite widely. It is worth noting that there are differences of opinion on the use of statistics in life-threatening situations. Richard Feynman wrote²²:

“It appears that there are enormous differences of opinion as to the probability of a failure with loss of vehicle and of human life. The estimates range from roughly 1 in 100 to 1 in 100,000. The higher figures come from the working engineers, and the very low figures from management. …

Engineers at Rocketdyne, the manufacturer, estimate the total probability as 1/10,000. Engineers at Marshal estimate it as 1/300, while NASA management, to whom these engineers report, claims it is 1/100,000. An independent engineer consulting for NASA thought 1 or 2 per 100 a reasonable estimate.”

The cost of failure, clearly, is debatable. Whether a failure of an MP3 player is less critical than the failure of an axle of a car sometimes depends on who the owners are! However, remember that cost is not the only reason to adopt DFSS. The impossibility of eliminating variations in data means many problems cannot be treated as purely deterministic. A Design Engineer using CAE has to accept the fact that there will be variations in any of:

• Material properties
• Loads
• Boundary conditions
• Initial conditions
• Geometry errors
• Assembly errors
• Solver precision
• Choice of model (mesh, element type, algorithm, etc.)

²² In his report on the Shuttle disaster of 1986. It makes excellent reading for engineers! See http://science.ksc.nasa.gov/shuttle/missions/51-l/docs/rogerscommission/Appendix-F.txt for details.

Noting that the aerospace industry emphasizes safety over cost, an extract from an aerospace-application conference²³ presentation serves to highlight the fact that cost is not the only motivation for stochastic analyses:

Structural Material Scatter

Material                 Characteristic          CV
Metallic                 Rupture                 8-15%
                         Buckling                14%
Carbon Fiber             Rupture                 10-17%
Screw, Rivet, Welding    Rupture                 8%
Bonding                  Adhesive Strength       12-16%
                         Metal / metal           8-13%
Honeycomb                Tension                 16%
                         Shear, Compression      10%
                         Face-wrinkling          8%
Inserts                  Axial Loading           12%
Thermal Protection       In-plane Tension        12-24%
                         In-plane Compression    15-20%

Load Scatter

Load Type                          Origin Of Results            CV
Launch vehicle thrust              STS, Ariane                  5%
Launch vehicle quasistatic loads   STS, Ariane, Delta           30%
Transient                          Ariane 4                     60%
Thermal                            Thermal Tests                8-20%
Deployment Shocks (solar array)    Aerospatiale                 10%
Thruster Burn                      Calibration Tests            2%
Acoustic                           Ariane 4 and STS (flight)    30%
Vibration                          Satellite Tests              20%

²³ Klein M., Schueller G., “Probabilistic Approach to Structural Factors of Safety in Aerospace”, Proceedings of the CNES Spacecraft Structures and Mechanical Testing Conference, Paris, June 1994

We have already seen the issues related to the use of high-fidelity models, and have built methods to estimate the error if we use approximate models both to explore the design space (using DOE) and to optimize. To complete our toolbox, we also need a way to quantify the variation in response, given a variation in data. Called stochastic analysis, this helps us estimate the reliability of the design. If we deem the risk of failure too high, stochastic analysis also tells us which factors we should pay more attention to. These analyses are usually done using Monte Carlo methods. An enormous number of runs can be required to make the best use of statistical effects. Monte Carlo experiments frequently require thousands of runs.
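A Monte Carlo estimate of reliability can be sketched as follows. The problem is invented for the example: a part fails when the load exceeds the strength, both are taken to be Normally distributed, the mean values are arbitrary, and the coefficients of variation (8% and 20%) are merely of the same order as entries in the scatter tables above. In practice each run would be a solver evaluation rather than a one-line comparison.

```python
import random

def estimate_reliability(n_runs=20000, seed=0):
    """Estimate the probability that the strength exceeds the load."""
    rng = random.Random(seed)               # fixed seed so the sketch is repeatable
    failures = 0
    for _ in range(n_runs):
        # Hypothetical scattered inputs: mean value and coefficient of variation.
        strength = rng.gauss(400.0, 0.08 * 400.0)   # CV of 8%
        load = rng.gauss(250.0, 0.20 * 250.0)       # CV of 20%
        if load > strength:
            failures += 1
    return 1.0 - failures / n_runs

reliability = estimate_reliability()
print(f"estimated reliability: {reliability:.4f}")
print(f"estimated probability of failure: {1.0 - reliability:.4f}")
```

Failures are rare events, so only a small share of the runs contributes to the estimate. This is why the number of runs has to be so large.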

Summary

The first chapter discussed MOO, MDO, and non-linear models, and introduced the need for experiments. In the next three chapters, we looked at the mathematical principles involved. Our exploration first covered the applications of statistics and probability – DOE, ANOVA, Stochastics – to process control, which is usually addressed at the undergraduate level in relation to Quality Control. We then used the Engineering Design and Optimization context to understand how the same tools are applicable here too.


It is a simple logical extension, then, to conclude that CAE requires software that allows us to

• explore the design space
• search for optimum solutions
• estimate reliability
• perform trade-offs
• interface with multiple solvers

Depending on the problem at hand, of course, we will need to use one or more of these capabilities. But we certainly need these capabilities for our toolkit to be complete!

You'll never have all the information you need to make a decision. If you did, it would be a foregone conclusion, not a decision. David Mahoney


HyperStudy - Putting It All Together

We are now clear on two things. First, that we need to resort to advanced optimization tools if the design problem involves one or more of the following challenges:

• non-linear behavior, which results in large computation time, makes the use of gradient-search methods harder, and may possess multiple local minima
• multi-disciplinary analysis, with different solvers being used for different disciplines
• multi-objective optimization, which calls for a best compromise instead of a best design
• stochastic behavior, which means reliability and robustness must be quantified.

Second, that CAE tools to address such problems must provide:

• support for multiple DOE models, in order to
  o explore the design-space
  o build approximate models
  o conduct Monte-Carlo type stochastic analyses
• the ability to evaluate and improve approximate models
• algorithms to search for the global optimum among local minima, either with or without an approximate model
• capabilities to include and interpret stochastic effects, including support for reliability estimation
• the ability to perform trade-offs
• the ability to build models for and extract results from multiple solvers


HyperStudy

With direct connections to Altair’s Solvers [24], interfaces to several other solvers, and the ability to interface with any solver, HyperStudy is a good way to achieve most of the requirements we’ve studied so far.

If you have set up a model in HyperMesh, for example, you can invoke HyperStudy, which then has direct access to model variables.

If you’re using a solver that is not from Altair but that interfaces with HyperMesh and HyperView, then the approach is a little different. You would first create a data file for the solver (you could use HyperMesh to do this, though it’s not essential). Then you would read this data file in HyperStudy, and set up your study by choosing variables from the data file. HyperStudy uses its interfacing abilities both to generate additional data files for the solver (with changed data) for each experimental measurement, and to read in evaluated responses.

Finally, if you’re working with a solver that has no existing interface to HyperMesh or HyperView, you could use the Templex programming language to build a custom interface [25].

The assignments illustrate the steps in detail, so they will not be covered in this chapter. Here, we will look at the steps you should take to set up and review a problem. With the background laid out in the earlier chapters, you should be able to follow why we take the approaches described below. You should pay particular attention to the recommended methods to review results.

[24] Of which Finite Element analyses, kinematic and dynamic analyses of mechanisms, and sheet-metal forming simulation are covered by the other volumes of this series.
[25] This is covered in the volume titled Managing The CAE Process.

Before The Study

Remember that one of the main reasons we are using HyperStudy is that CAE is quite computationally intensive. It is prudent, therefore, to spend some time planning your attack strategy. Without adequate planning, it is easy to find you have invested more time and effort than you had intended to.

As with all experiments, gathering too much data is often as bad as gathering too little. The effort involved in collating and interpreting results is all too often underestimated.

To make things easier for you, HyperStudy follows a “wizard” approach. That is, the interface provides the various functions as a step-by-step sequence, ensuring that you complete the steps in the correct order.

Even before you start building your model, you should be clear on the answers to the following:

1. Are the objectives and constraints suitable? Are they physically meaningful?
2. Can the responses be measured quantitatively? What accuracy is required for a meaningful experiment?
3. Are there any effects that should be deliberately blocked or omitted? Should a screening-run be performed to verify assumptions on importance? What levels should be used for the screening run?
4. Which effects will be aliased or confounded by the screening run? Are they important?
5. What should the sequence of experiments be? Can the results of the first experiment be used to create the second, and so on?
6. How will findings be confirmed?

Also spend some time planning how you will collate and present results. FE models are often interpreted by displaying results on the 3D model of the component, but this may not be appropriate. Remember that you will have to review a much larger volume of results: you may have results of several dozen FE analyses! Also, there are numerous other forms of simulation and applications that do not use the kind of graphics that FEA uses. A polynomial transfer function may be the high-fidelity model!

It’s a good idea, therefore, to track the vital data only. Since dealing with a large volume of data is a common task, the accompanying assignments also introduce HyperGraph. A quick look at this application can help you plan your results-interpretation phase.

Performing The Study

There are 5 distinct steps in HyperStudy. The first is essential: called the Study Setup, this is where you define design variables and responses. The former are the factors in the experiments you will conduct.

Study Setup

A “study” is saved in an XML [26] file. Unless you’re working with a data management tool (such as Altair Data Manager) it’s a good idea to create a separate folder for each study. The definition of the study consists of

1. One or more high-fidelity models – an OptiStruct FEA model, a MotionSolve multi-body model, etc. Unless you’re performing an MDO, you will need only one model for the study.
2. The design variables, or factors. These can be continuous or discrete. A design variable can be a numerical value, as is common with numerical analyses. Since HyperStudy is a general-purpose tool, the design variable can be quite general - it can even be a text-string. Remember that the levels of a design variable are meaningful only if it is discrete. If you want to carry out a screening run with continuous design variables, you can define them as discrete for the purpose of the screening. If the variable is continuous, you specify its bounds instead of its levels. The subsequent DOE will decide which levels to use within these bounds, depending on the type of design specified for the experiment.
3. The responses. These are chosen from the results of the evaluation carried out by the high-fidelity model. For instance, the stress at a node can be selected either from a text-output file or a binary-results file.
4. Optionally, design variables can be linked. This could either be a design requirement (for example the specification that the fillet radius be dependent on the thickness) or because multiple solvers are used.
5. Sensitivity. This is optional. Many solvers calculate the sensitivity of responses to factors. Some solvers actually write this information out, so you can use it in a subsequent study.

[26] Short for Extensible Markup Language. Uses tags, similar to HTML files.

DOE Study

Once the study has been defined, you specify what type of experiment you want to use. In general, Fractional Factorial, Plackett-Burman, or D-Optimal are used for screening runs, while Box-Behnken, Central Composite, or D-Optimal are used to construct response surfaces. Remember that

• Full Factorial is not recommended if the number of factors exceeds 5, since the combination of the number of factors and their levels can make it prohibitively expensive
• Fractional Factorial is often used with just 2 levels for each factor
• Taguchi does not take into account interactions between variables (it uses “orthogonal arrays”)
• Plackett-Burman uses a number of runs that is a multiple of 4. It is called a geometric design if the number of runs is a power of 2, and a non-geometric design otherwise.
• Central Composite is recommended for construction of second order response surfaces
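To make the cost argument concrete, the sketch below counts the runs needed by a full factorial and by a 2-level half-fraction. It is a minimal illustration, not HyperStudy's implementation: the function names are made up for this example, and the half-fraction uses one common choice of defining relation (the product of all coded levels equals +1).

```python
from itertools import product
from math import prod

def full_factorial(levels_per_factor):
    """Every combination of the given levels: one tuple per run."""
    return list(product(*levels_per_factor))

def half_fraction(n_factors):
    """2-level half-fraction in coded units (-1 / +1).

    Keeps the runs whose coded levels multiply to +1, which confounds
    the highest-order interaction with the mean.
    """
    runs = product((-1, 1), repeat=n_factors)
    return [run for run in runs if prod(run) == 1]

six_factors_three_levels = full_factorial([(0, 1, 2)] * 6)
print(len(six_factors_three_levels))          # 3**6 = 729 runs
print(len(full_factorial([(-1, 1)] * 6)))     # 2**6 = 64 runs
print(len(half_fraction(6)))                  # half of 64 = 32 runs
```

If each run is an FE analysis that takes an hour, the difference between 729 runs and 32 runs is the difference between a month and a day and a half of solver time.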

For the study, you will need to distinguish between controlled and uncontrolled variables. The former consist of design variables that you want to manipulate as a part of your design, while the latter are due to uncontrollable noise. You can use different DOEs for the controlled and uncontrolled variables, depending on the amount of effort you can afford to spend.


Once you have done this, you fine-tune your experiment. Recall the table drawn up earlier when the Fractional Factorial design was discussed. Most DOEs are specified as a matrix, showing the levels that will be used for each factor, and which effects will be confounded. Fine-tuning means you can edit the matrix to change the allocations for each factor / level and the interactions between them. Once the DOE has been specified, HyperStudy runs the analyses and extracts responses.

Approximation

If the DOE is to be followed by an optimization or a stochastic analysis, it’s a good idea to build an approximation. Even if these are not planned, an approximation is useful for variable screening. You can define an approximation for any or all responses calculated by the DOE. Each response can have a different type of approximation. You can also build multiple approximations for each response, which you would do if you are unclear on which is best suited. Remember that a 2-level experiment can only support a linear-regression model.

The DOE points are now used to construct the approximation. You can use some of the points to define the approximation, and others to “validate” it. If, for instance, you have conducted 6 runs, you may use runs 1/3/5/6 to create the approximation. Then you can check the efficacy of the approximation by using runs 2/4 to validate it and calculate the residuals.

The next logical step is to perform an ANOVA of the variables to determine whether or not they should be retained for further studies. Also, you would normally perform trade-off studies to determine the impact of changes in factors on the objective or objectives – sometimes called “what-if studies”.
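The fit-and-validate idea can be sketched in a few lines. The example below assumes a single design variable and a straight-line approximation fitted by least squares; the six runs are invented data, and `fit_line` is a hypothetical helper rather than anything HyperStudy exposes.

```python
def fit_line(xs, ys):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical DOE results: run number -> (design variable, response)
runs = {
    1: (1.0, 10.2), 2: (1.5, 12.4), 3: (2.0, 14.1),
    4: (2.5, 16.3), 5: (3.0, 17.9), 6: (3.5, 20.1),
}
fit_ids = (1, 3, 5, 6)    # runs used to build the approximation
check_ids = (2, 4)        # runs held back to validate it

intercept, slope = fit_line(
    [runs[i][0] for i in fit_ids], [runs[i][1] for i in fit_ids]
)

# Residual = observed response minus the response predicted by the fit
residuals = {
    i: runs[i][1] - (intercept + slope * runs[i][0]) for i in check_ids
}
print(intercept, slope, residuals)
```

Small residuals at the held-back runs suggest the approximation can stand in for the high-fidelity model in this region; large ones suggest a different approximation type or more DOE points.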

Optimization Study

Remember that an optimization is not the only reason HyperStudy is used. Therefore it’s logical that none of the previous steps defined any of the optimization-specific terms: constraints and objectives. Constraints are limits on the design variables – any design that crosses these limits is infeasible. Objectives are the quantitative measures of design quality. HyperStudy allows you to perform either constrained or unconstrained optimization, and allows you to work with either a single objective or multiple objectives.

HyperStudy provides several optimization algorithms. One, called the Adaptive Response Surface, uses the sequential-search method described earlier. The other methods (Sequential Quadratic Programming, Method of Feasible Directions, Genetic Algorithm and the completely general user-defined) are beyond the scope of this book, and are described in the on-line help documentation.

You can choose to minimize or maximize the objective, to perform Min-Max optimization, or to perform a system identification. The last is when you want to minimize the deviation of the objective from a target value. In any case, you can choose to use either the high-fidelity model or the approximate model (provided one has been built, of course).

You will want to be judicious in your choice of the iteration limit for the analysis. It is better to use a small limit to start with. Even if this is unsuccessful, you can use it to see how the optimization is progressing. Subsequent optimizations can restart from this analysis, meaning the initial investigation is not wasted.
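The four formulations mentioned above differ only in the function that is handed to the search algorithm. The sketch below shows this with an invented one-variable approximate model and a coarse grid search; both are assumptions for illustration, and the grid search merely stands in for the algorithms listed above.

```python
def approx_model(x):
    """Hypothetical approximate model: two responses of one design variable."""
    return {"stress": (x - 2.0) ** 2 + 1.0, "mass": 0.5 * x + 1.0}

def minimize(objective, lower, upper, steps=1000):
    """Coarse grid search between the bounds (illustration only)."""
    candidates = [lower + (upper - lower) * i / steps for i in range(steps + 1)]
    return min(candidates, key=objective)

# Minimize one response
x_min = minimize(lambda x: approx_model(x)["stress"], 0.0, 4.0)

# Maximize a response by minimizing its negative
x_max = minimize(lambda x: -approx_model(x)["mass"], 0.0, 4.0)

# Min-Max: minimize the largest of several responses
x_minmax = minimize(lambda x: max(approx_model(x).values()), 0.0, 4.0)

# System identification: minimize the deviation from a target value
target_mass = 2.5
x_target = minimize(
    lambda x: (approx_model(x)["mass"] - target_mass) ** 2, 0.0, 4.0
)
print(x_min, x_max, x_minmax, x_target)
```

Because only the objective changes, the same approximate model (or the same high-fidelity model) can be reused across all four kinds of study.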

Stochastic Study

A stochastic study, which generates the PDF of responses based on PDFs of design variables, can be performed directly after the Study Setup is complete. Normal, Uniform, Triangular, Exponential and Weibull distributions are supported. A DOE is not required, since the sampling method is specified here: you can choose between

• a simple random sample, which is the basic Monte Carlo method. A large number of runs is required for meaningful results
• a Latin Hypercube sample, which reduces the number of runs by distributing samples using the PDF of the variables
• a Hammersley sample, which is an improvement over the Latin Hypercube while still being less expensive than the simple Monte Carlo

Once this is done, HyperStudy evaluates the responses using either the high-fidelity model or the approximation, depending on your choice. That is, you explicitly tell HyperStudy which to use.
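The three sampling schemes can be compared on the unit hypercube, where each coordinate is a cumulative probability between 0 and 1. The sketch below uses textbook constructions of each scheme; the function names are chosen for this example, and HyperStudy's own implementations may differ in detail.

```python
import random

def simple_random(n, dims, rng):
    """Basic Monte Carlo: every coordinate drawn independently."""
    return [[rng.random() for _ in range(dims)] for _ in range(n)]

def latin_hypercube(n, dims, rng):
    """One sample in each of n equal-probability strata, per dimension."""
    columns = []
    for _ in range(dims):
        column = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(column)  # decouple the strata of different dimensions
        columns.append(column)
    return [list(point) for point in zip(*columns)]

def radical_inverse(i, base):
    """Mirror the base-`base` digits of i about the radix point."""
    result, fraction = 0.0, 1.0 / base
    while i > 0:
        result += (i % base) * fraction
        i //= base
        fraction /= base
    return result

def hammersley(n, dims):
    """Deterministic low-discrepancy point set (supports up to 7 dimensions)."""
    primes = [2, 3, 5, 7, 11, 13]
    return [
        [i / n] + [radical_inverse(i, primes[d]) for d in range(dims - 1)]
        for i in range(n)
    ]

rng = random.Random(1)
print(simple_random(4, 2, rng))
print(latin_hypercube(4, 2, rng))
print(hammersley(4, 2))
```

To turn a coordinate u into a value of a design variable, it would be passed through the inverse CDF of that variable's distribution (for a Normal variable, `statistics.NormalDist(mu, sigma).inv_cdf(u)` does this for u strictly between 0 and 1). This is how the sampling comes to reflect the PDF of each variable.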

Reviewing Results

Statistics are notorious for their ability to allow the analyst to draw a variety of different conclusions from the same data. Before discussing what to review and how, it’s useful to summarize the motivations for each type of study.

DOE

The goals usually are one or more of the below:

1. to screen variables by looking for correlation between factors and responses, usually by running a fractional / reduced experiment for a large number of factors with a few levels (as low as 2) for each
2. to detect interactions between variables, usually by running a full factorial experiment after a screening run, with a small number of factors and more levels than a screening experiment
3. to construct an approximation for an optimization or a stochastic study
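For a 2-level design in coded units, screening and interaction detection both reduce to simple contrasts of averages. The sketch below illustrates this on a 3-factor full factorial; the response formula is invented so that the expected effects are known in advance, and the helper names are assumptions for this example.

```python
from itertools import product

def main_effect(design, responses, factor):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for run, y in zip(design, responses) if run[factor] == 1]
    low = [y for run, y in zip(design, responses) if run[factor] == -1]
    return sum(high) / len(high) - sum(low) / len(low)

def interaction_effect(design, responses, f1, f2):
    """The same contrast, taken on the product of two coded columns."""
    plus = [y for run, y in zip(design, responses) if run[f1] * run[f2] == 1]
    minus = [y for run, y in zip(design, responses) if run[f1] * run[f2] == -1]
    return sum(plus) / len(plus) - sum(minus) / len(minus)

# 2-level full factorial for factors A, B, C in coded units (-1 / +1)
design = list(product((-1, 1), repeat=3))

# Hypothetical response: strong effect of A, weak effect of C, an A*B interaction
responses = [10 + 4 * a + 0.2 * c + 1.5 * a * b for a, b, c in design]

for name, index in (("A", 0), ("B", 1), ("C", 2)):
    print(name, main_effect(design, responses, index))
print("A*B", interaction_effect(design, responses, 0, 1))
```

Here factor B has no main effect at all, yet it matters through its interaction with A. A screening run that confounds the A*B interaction would miss this, which is why the aliasing pattern deserves attention before B is discarded.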

Approximation

The main reasons we choose to build approximate models are:

1. to screen variables by using an ANOVA to detect significance of effect on response for a given confidence
2. to provide a well-behaved model that can be used for optimization or stochastic studies
3. to perform trade-off studies

Optimization

An optimization is just a way to locate a particular point on the response curve or surface. The two reasons we search for such a point are

1. to locate a global minimum, normally used for a computationally demanding model, an MDO or an MOO
2. to locate values for the design variables so that a target value is matched as closely as possible

Stochastic Study

While every problem is non-deterministic in reality, remember that it is often possible to get a good answer using a deterministic model and applying an appropriate factor of safety. It’s when the factor-of-safety approach is either too expensive or covers up too much detail that we turn to stochastic analyses. The main goals are:

1. to evaluate robustness of the design by comparing the coefficient of variation (CV) of the responses and the coefficient of variation of the variables
2. to estimate reliability of the design by calculating the probability that responses lie in selected bands
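Both goals can be computed directly from the samples produced by a stochastic study. The sketch below assumes the samples are already available as plain lists; the numbers are invented, and the response is simply the square of the variable so that the amplification of scatter is easy to see.

```python
import statistics

def coefficient_of_variation(samples):
    """Sample standard deviation divided by the magnitude of the mean."""
    return statistics.stdev(samples) / abs(statistics.fmean(samples))

def probability_in_band(samples, lower, upper):
    """Fraction of samples that fall inside [lower, upper]."""
    return sum(lower <= s <= upper for s in samples) / len(samples)

# Hypothetical study: sampled design variable and the resulting response
variable = [9.0, 9.5, 10.0, 10.5, 11.0]
response = [x ** 2 for x in variable]

cv_variable = coefficient_of_variation(variable)
cv_response = coefficient_of_variation(response)
print(cv_variable, cv_response)

# Reliability estimate: how often does the response stay within limits?
print(probability_in_band(response, 85.0, 115.0))
```

A response CV that is much larger than the variable CV indicates that the design amplifies scatter, which is the opposite of robust. A real study would of course use far more than five samples.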

Interpreting Graphics Displays

Histograms and ant-hill plots are the principal means of presenting data in an easy-to-comprehend fashion.

An ant-hill plot, also called a scatter diagram, plots markers on a graph without fitting any curve. If one axis is a design variable and another is a response, a quick look at the plot indicates the presence or absence of any correlation. As an example, consider the plots shown below. In both, the y-axis is a response, while the x-axis of each is a different design variable.

The plot on the right shows a negative correlation. As the design variable increases, the response decreases. If the points were distributed more or less parallel to the x-axis, we would conclude that the response is independent of the design variable.

The plot on the left shows a variation, but no discernible pattern. In all likelihood, there is another factor that is causing the change in the response.

The advantage of using an ant-hill plot is that it allows for a reasonable comparison between a computer-experiment and a physical experiment. Comparing individual values is normally not very realistic, since it doesn’t allow for “random” variations – that is, noise, which shows up as lack of repeatability in a physical test.

Histograms are bar-charts, with data clubbed into “classes”. Classes are also called “groups”, “buckets” or “bins”. In general, the more the bins, the finer the resolution of the data. Of course, you need to have a large number of runs for the number of bins to be high. Consider a two-level experiment with a finite number of trials. The results of this are best shown as a histogram. If the number of trials approaches ∞, the distribution approaches a Normal distribution and the histogram approaches a density function. Connecting the tops of each bin gives the density functions – PDFs and CDFs.
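What the eye reads off an ant-hill plot or a histogram can also be put into numbers. The sketch below computes a correlation coefficient for scatter data and bins a list of values into equal-width classes; the data and the helper names are assumptions made for this illustration.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient between -1 and +1 for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def histogram(values, n_bins):
    """Counts per equal-width bin between the smallest and largest value."""
    lowest, highest = min(values), max(values)
    width = (highest - lowest) / n_bins
    counts = [0] * n_bins
    for value in values:
        # The largest value sits on the upper edge, so clamp it into the last bin
        index = min(int((value - lowest) / width), n_bins - 1)
        counts[index] += 1
    return counts

# Hypothetical scatter data with a clear negative trend
design_variable = [1, 2, 3, 4, 5, 6]
measured_response = [9.8, 8.1, 6.2, 3.9, 2.1, 0.2]
print(pearson_r(design_variable, measured_response))

# Hypothetical responses grouped into 4 bins
print(histogram([1, 2, 2, 3, 3, 3, 4, 4, 5], 4))
```

A coefficient close to -1 corresponds to the negatively correlated plot described above, while a value near 0 corresponds to the plot with no discernible pattern.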

HyperGraph

HyperStudy provides quick and easy graphical display of most data, but there are times when you may want to generate different plots or forms of display of data. HyperGraph provides support for direct import of data from other HyperWorks applications, including HyperStudy.

Additionally, analytical curves can be plotted in HyperGraph, allowing for useful correlations between analytical models or theoretical values and the results of numerical experiments. A brief introduction to HyperGraph is provided with the assignments that accompany the Instructor’s Manual.

You got to be very careful if you don't know where you're going, because you might not get there. Yogi Berra


Glossary And References

ANOVA

Short for Analysis of Variance. Related terms are MANOVA (for Multivariate ANOVA), ANCOVA (for analysis of covariance) and MANCOVA.

Ant-hill Plot

Also called a scatter plot, this is a graph showing the change in one variable as the other changes. It is used to look for a correlation between the variables.

Confidence Interval

The interval within which a variable may lie with a particular confidence. A 95% confidence is often used in engineering applications.

Constraint

A limit on a design variable.

Convex Function

A function that has only one minimum in the domain. This minimum is the “global minimum”.

Correlation

Usually expressed as a coefficient which is normalized to lie between –1 and +1, indicates whether two variables are linked or not. A value of 0 implies there is no linear relationship between the variables. A value of +1 means a perfect positive correlation – that is, they increase together at the same rate. A value of –1 means as one increases the other decreases, at the same rate. The weaker the relationship between variables, the more the samples required to prove the existence of the relationship. If there’s no relationship, the sample will need to be ∞, i.e. the whole population.

Dependent Variable

The responses that are measured by the experimenter, who varies the independent variables as per a plan. The plan is called the DOE.

Design Variable

See “variable”

Factor

See “variable”.

Global Variable

If an analysis involves multiple solvers (as in MDO), a global variable is one that is relevant in all contexts.

Independent Variable

The factors that are manipulated or varied by the experimenter.

Interaction

Means that two or more independent variables are linked, not independent of each other. This does not mean there’s a cause-and-effect relationship. It only means that changing one means the other changes too. If an interaction is detected, further study is required to understand if there’s a cause-and-effect linkage between these two, or perhaps a third independent variable that’s the cause.

Latin Square

A square array in which each letter or symbol appears exactly once in each row and column. Used to design experiments by treating the different rows as levels of the first factor, the columns as levels of the second factor and the contents of each cell of the array as levels of the third factor.

Level

Values that a factor can take. Can be qualitative (“good” or “bad”) or quantitative. In CAE, levels are quantitative.

MDA

Multi Disciplinary Analysis

MDO

Multi-disciplinary Optimization. Used, for example, when your product needs to be designed for optimal performance as a mechanism and as a structure.

Min-Max

A formulation in which the maximum value of several responses is minimized.

MOO

Multi Objective Optimization

OFAT

One Factor At a Time. Method of testing for effect of variables on responses. Easier to do than a DOE, but may miss interactions if they exist.

Orthogonal Array

In the context of DOE, refers to a table of rows and columns such that for any pair of columns (i.e. factors) all combinations of levels occur, and all occur the same number of times.

Outlier

A point in the design space that does not follow the general pattern. This may be either due to noise, in which case the point should be ignored, or due to actual behavior of the system. In the latter case, the designer will need to decide how this affects design decisions.

Pareto Frontier

Relevant to MOO. Plot showing Pareto Optimal designs. Helps choose the best compromise, often on non-engineering bases.

Pareto Optimal

A design configuration in which none of the objectives can be improved without worsening at least one other objective.

RBDO

Reliability Based Design Optimization

Regression

Usually linear, it’s an equation that links two correlated variables. We calculate the regression line only after calculating the correlation coefficient and ensuring that the variables are indeed correlated.

Residual

The difference between observed and predicted values. The smaller the residual, the better the model used for prediction. For instance, you will want to check the difference between the approximation and the high-fidelity model, to check whether you can reliably use the approximation. Measures the goodness-of-fit.

Resolution

Measure of the ability of a DOE to capture interactions. The higher the resolution, the better the capability. A full-factorial design has an ∞ resolution. Confounding reduces the resolution of an experiment. In practice, a resolution of 5 is excellent, while a resolution of 3 is sufficient for screening.

Response Surface

In the absence of a continuous function relating the objective to design variables, numerical experiments can be used to generate a table of objective-function values vs. design-variable values. A surface fitted through this table of points, called the Response Surface, is then used to find optimal locations.

Robust Design

A design method to reduce sensitivity of the design to inherent unpredictability of design parameters.

Saturated Design

A DOE in which the number of evaluation points equals the number of unknown coefficients in the approximation. It is not possible to test for a lack of fit.

Sensitivity

Rate of change. Normally = gradient of the response with respect to the design variable(s).

Significance

See Statistical Significance

Statistical Significance

A measure of whether a relationship between variables is a result of chance or not. Usually measured as a p-value. A p-value of 0.05 indicates that, if there were actually no relationship, there would be a 5% probability of seeing one at least this strong purely by luck (i.e. sampling error, for instance). A p-value of 0.9 indicates that a relationship this strong would show up by luck 90% of the time. Obviously a smaller p-value is indicative of a higher significance. Tests of significance depend on the sample size. See correlation.

Stochastic

Something that involves chance or probability, but with an overall and measurable trend or direction – this makes it possible to predict the behavior. Engineers frequently encounter stochastic processes and stochastic variables.

Variable

Also called factor. Anything that we can measure or control in an experiment. Also see dependent and independent variables. In engineering design, we usually use the term design variable.


Worst Case Design

A formulation in which the objective is minimized with respect to some variables and maximized with respect to others. This is not the same as a Min-Max design, where the minimization or maximization is done with respect to the same variables.

References

Hill, T. & Lewicki, P. (2006). Statistics Methods and Applications. (http://www.statsoft.com/textbook/stathome.html)
Statistics For The Utterly Confused, L.R.Jaisingh
The Theory Of The Design Of Experiments, D.R.Cox and N.Reid
Total Quality Management, D.H.Besterfield, C.Besterfield, G.H.Besterfield, M.Besterfield-Sacre
NIST/SEMATECH e-Handbook of Statistical Methods, (http://www.itl.nist.gov/div898/handbook)

Other Resources

www.altair-india.com/edu, which is periodically updated, contains case studies of actual usage. It also carries tips on software usage.

Which Distribution Should You Use?

Judicious use of distributions helps in taking real-life decisions. For example, suppose a showroom sold, on average, 17 green cars a day in the past. What is the probability that 20 will be sold tomorrow? Many scenarios are documented in literature: for instance, failure processes are generally described using the Weibull distribution, and repair processes by the Lognormal distribution. In case of a lack of clarity on which distribution to use, one approach is to fit a curve to the data and then choose the closest distribution.

The Normal Distribution

z is used to denote the standard normal variable, which has a mean = 0 and a standard deviation = 1. For example, using the table below, the cell marked with an “&” (0.7257) gives the probability that z will be < 0.6, while the cell marked with a “$” (0.7357) is the probability that z will be < 0.63. Each column gives the first decimal of z, and each row adds the second decimal.

z | 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | 1.0 | 1.1
0.00 | 0.5000 | 0.5398 | 0.5793 | 0.6179 | 0.6554 | 0.6915 | 0.7257& | 0.7580 | 0.7881 | 0.8159 | 0.8413 | 0.8643
0.01 | 0.5040 | 0.5438 | 0.5832 | 0.6217 | 0.6591 | 0.6950 | 0.7291 | 0.7611 | 0.7910 | 0.8186 | 0.8438 | 0.8665
0.02 | 0.5080 | 0.5478 | 0.5871 | 0.6255 | 0.6628 | 0.6985 | 0.7324 | 0.7642 | 0.7939 | 0.8212 | 0.8461 | 0.8686
0.03 | 0.5120 | 0.5517 | 0.5910 | 0.6293 | 0.6664 | 0.7019 | 0.7357$ | 0.7673 | 0.7967 | 0.8238 | 0.8485 | 0.8708
0.04 | 0.5160 | 0.5557 | 0.5948 | 0.6331 | 0.6700 | 0.7054 | 0.7389 | 0.7704 | 0.7995 | 0.8264 | 0.8508 | 0.8729
0.05 | 0.5199 | 0.5596 | 0.5987 | 0.6368 | 0.6736 | 0.7088 | 0.7422 | 0.7734 | 0.8023 | 0.8289 | 0.8531 | 0.8749
0.06 | 0.5239 | 0.5636 | 0.6026 | 0.6406 | 0.6772 | 0.7123 | 0.7454 | 0.7764 | 0.8051 | 0.8315 | 0.8554 | 0.8770
0.07 | 0.5279 | 0.5675 | 0.6064 | 0.6443 | 0.6808 | 0.7157 | 0.7486 | 0.7794 | 0.8078 | 0.8340 | 0.8577 | 0.8790
0.08 | 0.5319 | 0.5714 | 0.6103 | 0.6480 | 0.6844 | 0.7190 | 0.7517 | 0.7823 | 0.8106 | 0.8365 | 0.8599 | 0.8810
0.09 | 0.5359 | 0.5753 | 0.6141 | 0.6517 | 0.6879 | 0.7224 | 0.7549 | 0.7852 | 0.8133 | 0.8389 | 0.8621 | 0.8830

The notation X = N(µ, σ) means that X is a random variable that follows a Normal Distribution, and has a mean = µ and a standard deviation = σ.

Using Excel, the formula NORMSDIST(0.63) gives you 0.7357. For a general normal distribution with a mean µ and a standard deviation σ, the formula NORMDIST(x, µ, σ, TRUE) gives the cumulative probability at x; with µ = 0 and σ = 1, NORMDIST(0.63, 0, 1, TRUE) again gives 0.7357.

To do this manually, convert the random variable to the standard form by subtracting the mean and dividing by the standard deviation. That is,

$$ z = \frac{x - \mu}{\sigma} $$

Then use this transformed variable z with the above table to find P(z). You can, of course, use the Normal Table backwards: that is, you can look up the value that z should have for, let’s say, a 95% probability of occurrence. Remember that the values above are cumulative densities. That is, they represent the integral from –∞ to the given value.

Some useful properties of the Normal Distribution: if X = N(µ1, σ1) and Y = N(µ2, σ2) are independent, then

$$ X - Y = N\left(\mu_1 - \mu_2,\ \sqrt{\sigma_1^2 + \sigma_2^2}\right) $$

$$ X + Y = N\left(\mu_1 + \mu_2,\ \sqrt{\sigma_1^2 + \sigma_2^2}\right) $$

$$ aX + bY = N\left(a\mu_1 + b\mu_2,\ \sqrt{a^2\sigma_1^2 + b^2\sigma_2^2}\right) $$

$$ aX = N\left(a\mu_1,\ \sqrt{a^2\sigma_1^2}\right) $$
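The table lookup and the Excel formulas can be cross-checked with a few lines of code. The sketch below standardizes the variable and evaluates the cumulative probability through the error function; `normal_cdf` is a name chosen for this example, while `math.erf` and `statistics.NormalDist` are part of the Python standard library.

```python
import math
from statistics import NormalDist

def normal_cdf(x, mean=0.0, stdev=1.0):
    """Cumulative probability P(X < x) for a Normal variable."""
    z = (x - mean) / stdev  # convert to the standard form
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(round(normal_cdf(0.6), 4))    # 0.7257, the cell marked "&"
print(round(normal_cdf(0.63), 4))   # 0.7357, the cell marked "$"

# A general Normal variable gives the same answer once standardized:
# x = 13.15 with mean 10 and standard deviation 5 corresponds to z = 0.63
print(round(normal_cdf(13.15, mean=10.0, stdev=5.0), 4))

# Using the table "backwards": the z below which 95% of the population lies
z_95 = NormalDist().inv_cdf(0.95)
print(z_95)
```

The last value is about 1.645, beyond the range of the table above, which stops at z = 1.19.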

The F Distribution

This is the distribution of the ratio of two estimates of the variance. It results when variables that follow a normal distribution are sampled, the measured deviations are squared and summed to give two variance estimates, and one estimate is divided by the other. The distribution depends on two “degrees of freedom”: one between groups (the number of groups minus one) and one within groups (the total number of samples minus the number of groups). The variability within a group is assumed to occur because of error, while the variability across groups is assumed to occur because of true variance and error. ANOVA is a way to separate the error from the true variance.
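The ratio described above can be computed by hand for a one-way ANOVA. The sketch below is a minimal version under that assumption, with invented data for three groups; comparing the resulting F value against tabulated critical values (or a p-value) is left out.

```python
def one_way_anova_f(groups):
    """F ratio and degrees of freedom for a one-way ANOVA.

    Returns (F, df_between, df_within).
    """
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variability across groups: true variance plus error
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Variability within groups: error only
    ss_within = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

    df_between = n_groups - 1
    df_within = n_total - n_groups
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Hypothetical responses measured at three levels of one factor
groups = [[10, 11, 12], [14, 15, 16], [20, 21, 22]]
print(one_way_anova_f(groups))
```

A large F ratio, as in this example, says that the scatter between groups is much bigger than the scatter within them, so the factor is unlikely to be irrelevant.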

70

View more...


Introduction

About This Series

To make the most of this series you should be an engineering student, in your third or final year of Mechanical Engineering. You should have access to licenses of HyperWorks, to the Altair website, and to an instructor who can guide you through your chosen projects or assignments.

Each book in this series is completely self-contained. References to other volumes are only for your interest and further reading. You need not be familiar with the Finite Element Method, with 3D Modeling or with Finite Element Modeling. Depending on the volumes you choose to read, however, you do need to be familiar with one or more of the relevant engineering subjects: Design of Machine Elements, Strength of Materials, Kinematics of Machinery, Dynamics of Machinery, Probability and Statistics, Manufacturing Technology and Introduction to Programming. A course on Operations Research or Linear Programming is useful but not essential.

About This Book

This volume introduces techniques to investigate reliability and robustness in engineering design, and provides an alternative interpretation of Design Optimization: the increasingly widely used method of Design Improvement. If you want to study design optimization in the context of concept-design, or New Product Design as it’s sometimes called, you will find the companion volume CAE And Design Optimization – Basics useful. The techniques outlined in this book are usually applied either to improve existing designs or to improve concepts that have been suggested by the techniques outlined for concept-design in the companion volume.

While it’s not essential, a good grasp of the basic principles of statistics and probability will help you tremendously. Several essential aspects are covered in this book, although in a qualitative fashion. You may want to treat the chapter titled Statistics – A Worm’s Eye View as a reference. If you choose to adopt this approach, at least a cursory reading of this chapter is strongly recommended.


The various references cited in the book will probably be most useful after you have worked through your project and are interpreting the results.

Supporting Material

Your instructor will have the Student Projects that accompany these volumes; they should certainly be made use of. Further reading and references are indicated both in this book and in the Instructor’s Manual. If you find the material interesting, you should also look up the HyperWorks On-line Help System. The Altair website, www.altair.com, is also likely to be of interest to you, both for an insight into the evolving technology and to help you present your project better.

The two volumes of this series that cover manufacturing simulation and multi-body dynamics complement the techniques covered in this book. They should be interesting from an application perspective if your areas of work include non-linear analyses and multi-disciplinary optimization.

All models are wrong, some are useful. George E.P.Box


Optimization – Requirements And Approaches

The earlier volume in this series, CAE and Design Optimization – Basics, introduced the need for engineers to design products optimally, and presented one definition of “optimum” as “the greatest degree or best result obtained or obtainable under specific conditions”.

Basic Definitions

A design problem usually involves a set of resources, some limits on these resources, and one or more performance criteria to achieve with some required degree of precision. There is almost always more than one correct “answer” to a design problem. These are feasible in the sense that they provide the required functions, and work with available resources. Some answers are better than others, either because they use fewer resources or because they meet the performance criteria with a higher precision, or with a higher reliability, or to a higher degree. The optimum design is one that achieves the function to the best degree using the fewest resources.

While traditional design approaches draw upon expertise or intuition to arrive at one or more feasible solutions, optimization techniques are valuable in design because they turn the problem around, using an inverse formulation. Instead of being given a design to check for feasibility, the computer program is given the specifications and asked to hunt for the best solution.

To provide the specifications to the computer program we need to state:

1. The data: things that are given and cannot be changed. For instance, if a component is to be designed to carry a particular load, the designer must take the load as “data” and must not seek to change it (unless, of course, a feasible solution cannot be found). The design space is the region within which the designer must work. This data is captured in the analysis model, which could be a Finite Element model or any other mathematical model.

2. The design variables: things that can be controlled by the designer. For example, the thickness of sheet-steel or the cross-section of a beam.

3. The responses: calculated or derived values that must be tracked to check whether constraints have been violated, or to rank the quality of various design options.

4. The constraints: limits on the resources that must be observed if the design is to be judged feasible. For instance, the permissible stress in the chosen material or ranges of frequency that must be avoided. The constraint could be either on a design variable (maximum available thickness of sheet-steel, for example) or on a response (maximum displacement at a point, for example).

5. The objective: the measure of quality of the design. This may be the center of gravity, or the deflection at points of interest, etc. Usually we seek to minimize the objective. Remember that minimizing an objective is the same as maximizing its negative or, if the objective is always positive, its reciprocal. Also remember that this is distinct from a constraint. A constraint needs to be satisfied, while the objective must be minimized.
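The five ingredients listed above can be made concrete with a small sketch. Everything in it is an illustrative assumption and not taken from the book: a cantilever of rectangular cross-section carrying a tip load, hand-formula responses in place of a Finite Element model, and a brute-force search over whole-millimetre sizes in place of a real optimizer.

```python
# Data: fixed by the problem statement (all values are illustrative assumptions).
LOAD_N = 1000.0          # tip load on a cantilever beam
LENGTH_MM = 500.0        # beam length
E_MPA = 210000.0         # Young's modulus of steel
DENSITY_KG_MM3 = 7.85e-6

# Constraints: limits on the design variables and on the responses.
WIDTH_LIMITS_MM = (10, 60)
HEIGHT_LIMITS_MM = (10, 60)
MAX_STRESS_MPA = 150.0
MAX_DEFLECTION_MM = 2.0


def responses(width, height):
    """Responses calculated from the two design variables (width, height)."""
    inertia = width * height ** 3 / 12.0
    return {
        "stress": LOAD_N * LENGTH_MM * (height / 2.0) / inertia,
        "deflection": LOAD_N * LENGTH_MM ** 3 / (3.0 * E_MPA * inertia),
        "mass": DENSITY_KG_MM3 * width * height * LENGTH_MM,
    }


def is_feasible(r):
    """A design is feasible only if every response constraint is satisfied."""
    return r["stress"] <= MAX_STRESS_MPA and r["deflection"] <= MAX_DEFLECTION_MM


def best_design():
    """Objective: the smallest mass among all feasible whole-millimetre sizes."""
    best = None
    for width in range(WIDTH_LIMITS_MM[0], WIDTH_LIMITS_MM[1] + 1):
        for height in range(HEIGHT_LIMITS_MM[0], HEIGHT_LIMITS_MM[1] + 1):
            r = responses(width, height)
            if is_feasible(r) and (best is None or r["mass"] < best[2]["mass"]):
                best = (width, height, r)
    return best


width, height, result = best_design()
print(width, height, result)
```

Note how the constraints only filter designs in or out, while the objective ranks the designs that survive. An exhaustive search like this is affordable only because each evaluation is a one-line formula.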

[Figure: the original design and the optimized design]

Using optimization at the concept-design level itself is an attractive option. This gives us the capability to decide how and where to place material within the design space so as to achieve the objective to the best degree possible while satisfying as many constraints as possible without drawing on intuition or experience. There are various ways to use these powerful tools. Topology optimization, topography optimization, size optimization and shape optimization methods are used either singly or in combination to achieve remarkable economy and elegance of designs. Their extensions to discrete optimization make them an indispensable tool for applications such as design of composites.

“Mathematically Demanding” Approaches

The four methods of optimization named above will not do if the mathematics involved is ill-behaved.


A well-behaved function is one that is both predictable and tractable. That is, we can both understand its behavior and evaluate the behavior at all points in the domain. One measure that affects our ability to understand the behavior of a function is its continuity. As you will recall from your introductory course on Calculus, the continuity of a function is closely related to its differentiability – that is, whether or not the gradient of the function can be evaluated. If the function is differentiable we can use gradient search methods to locate the optimum solution. Another useful measure of the tractability is the number of turning points in the function: a function with just one minimum gives us the comfort that we will find the minimum regardless of where we start from. If the curve / surface has multiple turning points, though, we may find ourselves trapped in a local minimum, unable to reach the global minimum using gradient search methods.
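The risk of being trapped in a local minimum can be seen in a minimal sketch. The quartic below, the step size and the starting points are all assumptions chosen for illustration; the function has two minima, and a plain gradient search ends up in whichever one is downhill from its starting point.

```python
def f(x):
    """An illustrative function with one local and one global minimum."""
    return x ** 4 - 3.0 * x ** 2 + x


def grad_f(x):
    """Analytical gradient of f; it exists everywhere because f is smooth."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def gradient_descent(x_start, step=0.01, iterations=5000):
    """Fixed-step gradient search: repeatedly move against the gradient."""
    x = x_start
    for _ in range(iterations):
        x -= step * grad_f(x)
    return x


from_left = gradient_descent(-2.0)   # settles in the global minimum
from_right = gradient_descent(2.0)   # settles in the shallower local minimum

print(from_left, f(from_left))
print(from_right, f(from_right))
```

Both runs converge, and both report a zero gradient at the end, yet only one of them has found the best design. Nothing in the gradient itself warns the second run that a better minimum exists elsewhere.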

The Need For Alternative Approaches

Unfortunately, mechanics is not always accommodating. Simple and essential aspects such as contact or friction mean that the mathematics is pushed into areas where the required degree of decorum cannot be counted on.

We can, of course, build models that neglect these aspects. In this approach, used quite frequently, we linearize the problem. Often, a linear approximation serves very well, particularly since engineers use factors of safety to account for the approximation of the model. In many cases, even if linearization is recognized as inaccurate, we use it as a starting point to arrive at a conceptual design. Then we use more accurate models to verify and improve this concept.

There are times, however, when linearization is simply not possible. Several factors force us to step beyond linearization and look for advanced methods of design, analysis and optimization.


Multi-Disciplinary Optimization

Almost any problem in the physical world involves multiple effects. Quite often each of these is characterized by a different equation. For instance, we use Navier-Stokes Equations to describe the flow of strain energy in solids and fluids, but must resort to Maxwell’s Equations to model the flow of electromagnetic energy in the same solids or fluids. We simply do not have a single equation that adequately predicts the flow of both types of energy.

In some cases, we can decompose the effects because they are orthogonal and treat them independently. In the world of CAE, this means that we will use a different solver for each effect. In other cases, we cannot afford to decompose the effects because they are closely linked. In this case, we try to derive equations that work adequately to model all effects in a single model. Almost invariably, these equations are so complex that they require enormous computing resources to evaluate.

An excellent example of this is Fluid-Structure-Interaction (FSI). To compute the inflation characteristics of a commonly used safety device, the airbag used in cars¹, FSI must be used. The gas within the airbag is a fluid, described by an equation of state and following the gas laws. The bag itself is a structure (of fabric). It is impossible to analyze the two effects independently and still get meaningful results!

Multi-Objective Optimization

In some cases the quality of the design cannot be reduced to a single objective. There are multiple measures of product quality and it’s not obvious which solution is the best.

¹ The picture shown is from the Research Quarterly, Summer 2003, of the Los Alamos National Laboratory.

Consider the dilemma a car designer faces. The task may be to design a vehicle that provides a mileage of at least 20 km / liter and satisfies the safety statutes. Can we treat either of these functional requirements as a design constraint? Yes, that’s certainly possible, but it’s not the best way out. Satisfying the constraints means you will come up with a design that just satisfies both the mileage requirement and the safety-statute. It does not mean you will achieve the best possible design. Your competition may come up with a design that has an even better mileage or is even safer.

Clearly, treating the requirements as constraints will not take us to the optimum because of the difference between a constraint and an objective. A constraint must be satisfied, while an objective must be minimized. If, as in the case of the vehicle, there are two or more responses to be minimized, this is called Multi-Objective-Optimization (MOO).

Very often the objectives are opposing. In the car, an increase in safety by adding stiffness and weight usually results in a fall in fuel economy. A design in which no objective can be improved without worsening at least one other objective is said to be Pareto Optimal. The collection of all such designs is called the Pareto Frontier.

Quite often, in a design review, someone will ask “What if we were to increase this variable?” or “Can we gain here if we are willing to give up something else?” If design iterations start only after such questions are asked, development-time overruns may be the result. Pareto frontiers help answer such questions quantitatively. The designer’s task is to deliver the Pareto Frontier to the other members of the product team, so that an educated decision can be made on what to trade off.

There may be several designs that exceed both the fuel-economy target and the statutory safety laws. Deciding which of these is “the” best calls for an evaluation that lies outside the design engineer’s realm. Most often, marketing strategies decide this. Choosing a solution invariably involves a trade-off to arrive at the best compromise.
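A Pareto Frontier can be extracted from a set of candidate designs with a simple dominance check, sketched below. The candidate values are invented for illustration, and both objectives are posed so that smaller is better (fuel consumption in place of mileage, and a hypothetical injury score in place of a safety rating).

```python
def dominates(a, b):
    """True if design a is no worse than b in every objective and strictly
    better in at least one (all objectives are to be minimized)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_frontier(designs):
    """Keep only the designs that no other design dominates."""
    return [d for d in designs if not any(dominates(other, d) for other in designs)]


# Hypothetical candidates: (fuel consumption in l/100 km, injury score).
candidates = [
    (5.0, 9.0),
    (5.5, 7.0),
    (6.0, 6.5),
    (6.0, 8.0),   # dominated by (5.5, 7.0)
    (7.0, 5.0),
    (7.5, 5.5),   # dominated by (7.0, 5.0)
]

frontier = pareto_frontier(candidates)
print(frontier)
```

Every design left on the frontier is a defensible choice. Picking one of them is the trade-off decision that the text assigns to the wider product team.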


As an aside, engineers who tend to underestimate the importance of marketing would do well to remember Akin’s Laws of Spacecraft Design², one of which points out that “A bad design with a good presentation is doomed eventually. A good design with a bad presentation is doomed immediately.”

Non-Linear Analyses

Problems that involve large deformations, large strains, plastic flow, contact, radiative heat transfer, etc. are characterized by non-linear differential equations. Metal-forming is one such application. Most general non-linear differential equations do not even possess proofs of existence of a solution. These are challenging problems even to solve, let alone optimize using the inverse approach described earlier for concept-design (that is, specify the criteria and let the algorithm locate the best design).

We may not be sure that an optimal solution exists (that is, we may not be sure that there is a global minimum), but would we be foolish to give up without a search? In cases like these it makes sense to understand the lay of the land first.

Design Space Exploration

There are several fields of engineering where mechanics is not yet up to the task. Many real-world complexities are not adequately understood, much less captured satisfactorily in mathematical models. Like an explorer in uncharted territory, we may need to explore the design space to understand how the various variables are linked, whether there are any combinations that lead to pathologically bad designs, etc.

In a lot of cases it is a foolish waste of resources to start searching for an optimum without first checking that an optimum even exists. Design Space Exploration is the first step towards this. Once the results have been reviewed, the search for an improved design can be embarked upon with a better estimate of the time and cost of finding such a solution.

² Written by Dave Akin. See http://spacecraft.ssl.umd.edu/ for the complete list.

We can therefore use a numerical procedure to explore behavior, rather than to find a “right” answer. This is very well illustrated by the following abstract³, which is drawn from the field of bio-mechanics. Since the language is outside usual engineering expertise, areas of particular importance to us have been emphasized:

“The use of a mesial-occlusal-distal (MOD) restoration in repairing a large carious lesion depends on many factors. Biomechanical performance is one of the most important. It has been recognized that resistance to restoration failure is not solely a biological concern (e.g. toxicity), but that the cavity shape, dimensions, and the state of stress must all be taken into account. In the present study, a newly developed auto-mesh program was used to generate 30 three-dimensional (3D) finite element (FE) models simulating the biomechanics for multiple factorial design of the MOD gold restoration in a maxillary second premolar. Stress levels were related to individual design factors (e.g. pulpal wall depth [P], isthmus width [W] and interaxial thickness [T]) and to their interactions under the worst physiological scenario: a concentrated bite force acting on lingual cusp with debonded interfaces between cavity walls and restorations. The results showed that enlarging the volume of the MOD cavity significantly increased stresses in enamel but did not intentionally affect stresses in dentin. The alternation of individual design parameters significantly changed the peak stresses (P < 0.05). For all three parameters, except for the width, the peak stress increased as the cavity dimension increased. Stress elevation rate (termed as 'volumetric stress rate' – stress elevation by increasing one unit volume of the restored materials) was different among three design factors. Depth was the most critical factor governing the stress elevation in enamel (1.76 MPa/mm³) while length (interaxial thickness) was the most important parameter in dentin (0.49 MPa/mm³). Width was the least compromising factor to the remaining tooth, 0.32 MPa/mm³ for enamel and −0.23 MPa/mm³ for dentin. The findings, at its core, did not fully agree with the traditional concept that the preservation of tooth substances will reduce risk of tooth fracture. This study leaves open possibility for the structural optimization of the MOD restoration”

³ From “Multifactorial analysis of an MOD restored human premolar using auto-mesh finite element approach”, Lin CL, Chang CH, Ko CC, J Oral Rehabil. 2001 Jun;28(6):576-85

Robust Design

Webster’s dictionary defines a Robust System as one that has “demonstrated an ability to recover gracefully from the whole range of exceptional inputs and situations in a given environment. One step below bulletproof. Carries the additional connotation of elegance in addition to just careful attention to detail.”

Now recall the definition we used at the beginning of this chapter for “optimum”: “the greatest degree or best result obtained or obtainable under specific conditions”.


The last two words in that definition have often exercised engineers’ ingenuity. Since “all measurements are subject to variation⁴”, to what extent can we rely on the existence of “specific” conditions? If the conditions that are used to arrive at the optimum design themselves are likely to vary, then how certain can we be that the design is indeed optimal? How will the design respond to the “exceptional inputs” that Webster mentions?

Consider the graph, which plots the response vs. a design variable. Let’s assume the objective is to minimize the response. That is, the best value for the design variable is the one that gives the least response. Variations in the design variable (x) result in variations in the response (z). In the figure, possible variations in x are shown at two points: at the optimum point and at the “robust” point.

⁴ The observation is credited to W. E. Deming, the importance of whose work on Quality cannot be overestimated.

As shown in the figure, given the same range of variation in x, the spread in z at the optimal point exceeds the spread in z at the robust point. This means that the non-optimum-but-robust design is actually better than the optimal design! The optimal design is better only if variations in the input can be controlled. A conscientious designer should specify the maximum permitted variation within which the optimum remains optimal. Else, the claim to be an optimum design is clearly dubious.

Further, as we saw, mathematically demanding techniques to find optimal solutions make several assumptions: that the functions are all continuous, that a global minimum exists, and so on. While all models are incorrect to some extent, can we quantify the error in the approximation inherent in these assumptions?
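The comparison between the optimal point and the robust point can be reproduced numerically. The sketch below uses an assumed response curve with a deep, narrow valley at x = 1 and a shallow, wide valley at x = 4; the curve and the tolerance of ±0.5 on x are invented for illustration and are not the curve in the book’s figure.

```python
def response(x):
    """Illustrative response z(x): a narrow valley at x = 1 (z = 0) and a
    wide valley at x = 4 (z = 1)."""
    narrow_valley = 20.0 * (x - 1.0) ** 2
    wide_valley = 1.0 + 0.5 * (x - 4.0) ** 2
    return min(narrow_valley, wide_valley)


def response_range(x_nominal, tolerance, samples=101):
    """Smallest and largest z when x varies within x_nominal +/- tolerance."""
    xs = [x_nominal - tolerance + 2.0 * tolerance * i / (samples - 1)
          for i in range(samples)]
    zs = [response(x) for x in xs]
    return min(zs), max(zs)


optimum_low, optimum_high = response_range(1.0, 0.5)
robust_low, robust_high = response_range(4.0, 0.5)

print("spread at optimum:", optimum_high - optimum_low)
print("spread at robust point:", robust_high - robust_low)
```

With these assumed numbers the nominal response at the optimum is better, but its worst case within the tolerance band is worse than the worst case at the robust point. Tightening the tolerance on x reverses the ranking, which is exactly the permitted variation a designer would need to state.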

Reliability

Reliability is usually defined as the probability of a measure lying within a specific range. Almost any product will fail sooner or later. Designs that rely on large factors of safety are often called conservative. That is, they are safe, but perhaps excessively so. Can they be made less safe without any reduction in salability?

The consumer-goods industry in particular has strong motivations to think this way. Most people would prefer a fragile-but-light cell-phone to one that is stronger-but-heavier because it’s over-designed by liberal use of safety factors. Not only is product-failure not as dangerous as an air-crash, the products themselves are often designed to have a short life.

HP is said to have spent over US$125,000,000 to design lighter, and cheaper, printers. In an effort to convince his designers, “manager Tom Alexander finally grabbed an HP printer and set it on the conference room floor. Then he stood on it, all 200 pounds of him. The point behind his grandstanding? Customers aren't going to use printers as step stools, so don't add costs by building them strong enough to withstand the weight of a grown man.”⁵

Why is this important for our goal of optimizing? We need to recognize that if an event or condition is probable rather than dead-certain, then we can quantify the probability of failure. In the example of the HP printers, it is improbable but definitely possible that a customer may pile something on a printer that is even heavier than the intrepid manager. If the probability of this is known, and if from this the probability of failure can be estimated, then the business manager can design a warranty cost accordingly.

⁵ From an article in Fortune, February 2003.

In other words, we need to account for the fact that in the real world, many events are better modeled as probable than certain, and we need to evaluate our designs in the light of these probabilities. These evaluations are measures of the reliability of the design. An infinitely reliable design (one that will never fail) is often much more expensive than one that is designed to be 99% reliable (i.e. one that is designed such that one out of every 100 will fail).

Consider the example of the “Spirit”, the Mars Rover. The exorbitant cost of launching a space probe to Mars⁶ meant that the design goal was a life of 90 days only (a Martian day, called a sol, is 39 minutes and 35 seconds longer than an earth-day). The probe celebrated its 1000th day of active-service on the 26th of October, 2006. Whether this was due to a happy combination of improbable conditions or an excellent design is a matter of conjecture!
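The idea of quantifying a probability of failure, rather than designing against a worst case, can be sketched with a small Monte Carlo study. All numbers below are assumptions made up for illustration: the load placed on a product is taken to follow a normal distribution (mean 30 kg, standard deviation 20 kg), the product is taken to fail above 90 kg, and the sampled estimate is compared with the closed-form normal tail probability.

```python
import math
import random

LOAD_MEAN_KG = 30.0      # assumed mean load placed on the product
LOAD_STD_KG = 20.0       # assumed standard deviation of that load
CAPACITY_KG = 90.0       # assumed load above which the product fails


def exact_failure_probability():
    """Closed-form tail probability of the normal load exceeding capacity."""
    z = (CAPACITY_KG - LOAD_MEAN_KG) / LOAD_STD_KG
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def sampled_failure_probability(trials, seed=12345):
    """Monte Carlo estimate: fraction of random loads that exceed capacity."""
    rng = random.Random(seed)
    failures = sum(
        1 for _ in range(trials)
        if rng.gauss(LOAD_MEAN_KG, LOAD_STD_KG) > CAPACITY_KG
    )
    return failures / trials


exact = exact_failure_probability()
estimate = sampled_failure_probability(200_000)
reliability = 1.0 - estimate

print("exact failure probability:", exact)
print("sampled failure probability:", estimate)
print("reliability:", reliability)
```

The closed form is available here only because the assumed model is so simple. The sampling approach carries over unchanged when the failure condition has to be evaluated by a simulation, at the price of many more model evaluations.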

Summary

“Business for consumer electronics makers hasn't looked this good in two decades, with revenue rising well over 10 percent in major world markets, yet gadget makers are still turning in meager profits. Analysts said the industry has landed in a virtuous cycle where higher volumes are needed for better economies of scale and lower costs, which lead to more competitive prices. That drives consumer demand, but also causes oversupply, which leads to low margins.” From a Reuters article by Lucas van Grinsven January 8, 2007

The techniques that we have branded mathematically demanding are useful, but obviously the above cases require alternate approaches. In several cases, we prefer the term Design Improvement to Design Optimization. This is sensible because while we cannot be certain that a particular design represents an optimum, we can be certain whether or not it represents an improvement over the previous design. We can, in fact, use the phrase design improvement interchangeably with design optimization.

⁶ Mars probes have been marked by a failure rate that is high by terrestrial standards, but remarkably low given the lack of information and control. The cost of failure, unfortunately, is almost always enormous.

Finally, do not take the use of the phrase “mathematically demanding” to mean that the techniques that will be presented in this book are not rigorous. It’s just that they follow a different set of mathematical rules, to which we will devote three chapters!

Robust Design: Not just strong. Flexible! Idiot Proof! Simple! Efficient! A … high level performance despite … a wide range of changing client and manufacturing conditions. Genichi Taguchi


Design Improvement – A Designer’s View

Since the mathematics involved can be quite demanding, it’s useful to start with a qualitative understanding of the approaches that help us design in the face of the challenges outlined in the earlier chapter. You may find it useful to re-read this chapter after having covered the mathematics.

Concept vs. Existing Design

One of the advantages of using methods like topology optimization is that they can make the concept design stage come alive with the non-intuitive yet functional elegance of the solutions. However, not all designers have the freedom to work with fresh concepts. In many cases, existing designs are tweaked or nudged to better states. This is particularly true in the case of “one-off” designs like spacecraft, where the fear of failure inhibits the use of radical innovations.

We can summarize the relevance of design-improvement methods using the following table:

Challenge | Applicability of Basic Tools | Advanced Tools Required?
Concept Design | Very useful provided the mathematical models are well behaved | Not essential
Existing Designs | Very useful provided the mathematical models are well behaved | Useful, not essential
Design Space Exploration | None | Design of Experiment
Non-linearity | Linearization provides an initial starting point | Approximations, Optimization
MDO | Decomposition may be applicable | Approximations, Optimization
MOO | None | Experiments, Approximations, Optimization
Robustness | None | Stochastic Studies
Reliability | None | Stochastic Studies


The rest of this chapter introduces the Advanced Tools listed above: Design of Experiments, Approximations, Optimization and Stochastic Studies.

Design Of Experiment

The current definition of the scientific approach is often traced back to Francis Bacon’s procedure “which by slow and faithful toil gathers information from things and brings it into understanding”. The use of the adjective “scientific” is taken to mean that there is a verifiable basis for any assumptions of behavior. Today’s step-by-step approach to understanding phenomena is:

1. Put forward a hypothesis
2. Analyze the data available
3. Conclude by rejecting or accepting the hypothesis

This approach is well recognized and widely applied to Quality Control, where engineers try to determine settings for the manufacturing process that will result in an acceptable quality. Most manufacturing processes are very complex and there is often little understanding of cause-and-effect. Despite this, the analysis of gathered data allows engineers to fine-tune processes.

More recently, this approach has been applied to computer simulations. There is one important difference between computer simulations and experiments in the real world. In the latter, repeating an experiment even on the same subject invariably leads to at least a small difference: the galvanometer records a slightly different value, the subject of the interview reacts differently, etc. This inherent noise is absent from computer models⁷. Running the same analysis again on the same computer will give the same answer.

The next sections will cover the application of Design Of Experiments⁸ (DOE) to computer simulation, and show why it has been adopted so successfully. The use of DOE is so widespread today that we often talk of “the result of the DOE” rather than the more correct “result of the experiment”!

⁷ Computer models too can be susceptible to noise. We neglect it in this discussion, but it is a topic of active research.

⁸ This method is usually traced back to Ronald Fisher, for whom the F-distribution, which we will encounter when we discuss ANOVA, is named.

Approximations: Building Better-Behaved Models

Any model is an approximation of physical behavior. There are many reasons we have to accept this limitation. Mathematics today does not have entirely satisfactory ways to deal with several classes of numbers, equations are based on observations which cannot be proved to be universal, measurements have limited accuracy, and so on.

The introduction of Calculus by Newton and Leibniz led to an explosion of equations to predict the rate of change of one variable given the rate of change in others. These differential equations were quickly applied to physics and mechanics. Unfortunately, several of these equations were not solvable. An equation could be recognized as comprehensive, but in the absence of an effective way to solve it, engineers did not benefit much. The introduction of series methods and the subsequent use of computers have made a dramatic difference, and continue to do so even today. Unfortunately, even today, several equations are so intractable that they cannot be solved effectively at an acceptable cost or in an acceptable time.

Why is this relevant to us? Remember that numerical methods like the Finite Element Method have been tremendously successful and have, in several applications, promoted the use of highly non-linear models. Now recall from our earlier discussion that the “ill behavior” of some mathematical models prevents us from using gradient-based optimization methods.

This is a bit of an impasse. We recognize that the models are acceptable in terms of simulating behavior. They do a good job of linking independent variables and dependent variables. That is, given a scenario described by the independent variables, we can use these models to calculate responses of interest. The power and utility of these models is clear. But we cannot use these models in optimization because they are not well-behaved enough. Either gradients don’t exist, or the model is so demanding that we cannot afford to evaluate it at enough configurations to search for the optimum.


One way out of this is based on the following steps9:
1. Use the mathematical model (such as an FE model) to calculate responses at selected points. DOE is used to reduce the number of these evaluation points as much as possible.
2. Use these evaluated responses to generate an approximate equation that is mathematically well-behaved. At a minimum, it should be quick to evaluate at any point of interest.
3. Perform further investigations using this approximate function. Since it’s easy to evaluate, various methods can be used to get at the improved design. These range from gradient-based methods to Monte Carlo methods, as will be detailed later.
4. Since the approximation is just that (approximate), use statistical measures to quantify the reliability of the approximation.
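The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `expensive_model` is a hypothetical stand-in for a solver run, a uniform grid stands in for a proper DOE, the approximation is a cubic polynomial fitted by least squares, and R² is used as one possible measure of fit quality. None of these choices come from the text.

```python
import math

def expensive_model(x):
    """Hypothetical stand-in for a costly solver run; returns one response."""
    return math.sin(3.0 * x) + 0.5 * x * x

def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (fine for tiny problems)."""
    n = degree + 1
    a = [[sum(x ** (j + k) for x in xs) for k in range(n)] for j in range(n)]
    b = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (b[row] - tail) / a[row][row]
    return coeffs  # coeffs[j] multiplies x**j

def evaluate(coeffs, x):
    return sum(c * x ** j for j, c in enumerate(coeffs))

# Step 1: evaluate the costly model at a few selected points.
x_train = [-1.0 + 0.25 * i for i in range(9)]
y_train = [expensive_model(x) for x in x_train]

# Step 2: build a cheap, well-behaved approximation.
coeffs = fit_polynomial(x_train, y_train, degree=3)

# Step 3: search the approximation densely (cheap, so brute force is affordable).
x_dense = [-1.0 + 2.0 * i / 2000 for i in range(2001)]
x_best = min(x_dense, key=lambda x: evaluate(coeffs, x))

# Step 4: quantify how well the approximation reproduces the evaluated points.
y_mean = sum(y_train) / len(y_train)
ss_res = sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(x_train, y_train))
ss_tot = sum((y - y_mean) ** 2 for y in y_train)
r_squared = 1.0 - ss_res / ss_tot
```

Only nine calls to `expensive_model` are made; every later evaluation uses the polynomial. In practice the candidate optimum `x_best` would be confirmed with one more run of the original model.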

Stochastics

Most problems in engineering textbooks are very well defined. Since the goal is usually to acquaint the student with the principles or theories being taught, complexities are usually ignored. Problems in the real world are rarely so accommodating. New product designers have to live with constant changes in specifications, giving rise to the joke that designing to a specification is like walking on water: easy to do if it’s frozen.

But even if specifications are removed from the list of things that can change, the engineer in the “real” world is presented with a huge list of items that cannot be treated as the absolute truth. A spread of 10% is often considered to be sufficiently accurate for engineering purposes.

Engineers have long recognized and dealt with several forms of variation or uncertainty. CAD users will be familiar with the stack-up analyses used to determine tolerances that should be assigned to nominal dimensions. These are techniques that designers employ to deal with uncertainties in manufacturing-related aspects.

9 Readers of the volume on CAE And Design Optimization – Basics will recognize that OptiStruct also uses approximate models.

Stress Analysts use factors of safety to deal with uncertainty in loads, material properties or environmental conditions. The magnitude of the factor-of-safety depends on the probable variations in conditions and on the cost of failure. Another approach is to construct analytical methods such as Random Response Methods, widely used in vibration studies. Sometimes, these approaches lead to an unnecessarily conservative, and therefore expensive, design.

From a design engineer’s perspective, then, rather than calculating the effect of a clearly defined condition (i.e. something that is deterministic) you need to work with a set of conditions that is known to vary. Some variations may be entirely random: you may know little of the event, except that it is possible. Earthquakes are an excellent example of this. No one, today, can predict when an earthquake will occur. Nor can they predict what the magnitude of the earthquake will be.

Many items in Mechanical Engineering are stochastic: that is, they are random, but in a predictable fashion10. You cannot say for certain, but you can conjecture, or guess, at when the event will occur, and with what severity. The word “stochastic” is relatively easy to understand, particularly if you look at its synonyms: hypothetical, assumed, indeterminate, postulated, speculative. The last synonym is excellent: a stochastic method is one that is speculative.

Usually, stochastic processes are treated as distinct from (and in some sense the “opposite” of) deterministic processes, which are defined by a set of equations that specify exactly how the process behaves. Examples of stochastic processes range from currency exchange rates to the spread of epidemics. What is common to all of these is that we cannot say for sure how the value will evolve, but can say what value it will probably take.

10 All random events can be modeled using stochastic approaches provided there is a large volume of data.

As design engineers, we need a way to simulate possible variations in conditions and study the impact of these variations on measures of design quality. Factors of safety have traditionally been used to compensate for such variations. One effect they have is to convert a stochastic problem to a deterministic one, but at a cost. If you don’t want to pay this cost, you’re going to have to find a way to accept the stochastic nature of variables that affect the performance of your product. It’s important to remember that this does not mean you should neglect factors of safety. Failure prevention is an essential part of design. Stochastic behavior comes in particularly useful when we want to quantify the effects of variations on the design. We can either vary the controllable parameters, or investigate the effects of variations of uncontrollable parameters. In other words, either we are already aware that it won’t fail and we want to see how much we can tweak it towards better performance, or we are willing to tolerate some failure in return for better performance.

Optimization

DOE, Approximate Models and Stochastic Models are all ways to investigate the behavior of our Design Space. We can use these investigations to improve our design in several ways.

First, if the original model is not mathematically well-behaved, we can apply gradient-based optimization tools to the approximate models to search for an optimum. The methods outlined in this book can also deal with the presence of multiple local minima.

Next, we can combine stochastic studies with optimization methods to perform Reliability Based Design Optimization (RBDO). In this approach, we include the uncertainty in data, in responses and constraints in the investigations, and draw our conclusions based on these.

We can also perform trade-off studies to identify Pareto Optimal designs and construct Pareto Frontiers.

In cases of MDO, we may also need to work with multiple solvers. The variables of interest may have different behaviors in each “physics” and we need a method to search for the solution that is the best across all these multiple disciplines.

Summary

To optimize design problems that demand the use of complex mathematical models, or multiple disciplines, or that have multiple objectives, then, our requirements of computer software are:
• Support for various DOE approaches
• The ability to build and evaluate Approximate Models
• Capabilities to include and interpret Stochastic effects
• Capabilities to search for global minima, optionally using approximate models
• The ability to build models for and extract results from multiple solvers

After all, facts are facts, and although we may quote one to another with a chuckle the words of the Wise Statesman, “Lies - damn lies - and statistics,” still there are some easy figures the simplest must understand, and the astutest cannot wriggle out of. Lord Courtney


Statistics – A Worm’s Eye View

Henri Poincaré is said to have described mathematics as the art of giving the same name to different things11. A cursory study of statistics can reinforce that impression. However, in order to effectively deploy the powerful methods described in the earlier chapters, a good grasp of some statistical techniques and their applications to CAE and Design Optimization is indispensable.

This chapter lists terms that we will encounter frequently in our usage of CAE for advanced design optimization. Most are defined without any preamble or context. These will be provided in the next two chapters.

Dealing With Populations

The word statistics is believed to have first been used in German, where the word Staat refers to a body of men. It’s natural, therefore, to start our revision of basic statistics with the definition of the term population. In our context, we take the word to describe all the variables (and their ranges) that form our study. Populations can be further classified.

Discrete vs. Continuous

If the members of the population are distinct, we treat them as discrete entities. An example would be the collection of all prime numbers. While the set is infinite in the sense that there is no “last” prime number, each prime is a discrete member of the set.

In some cases the data we are dealing with is continuous in the mathematical sense of the term. For instance if we choose to deal with the set of all possible temperatures, the set is truly continuous. In several cases even if the population is truly discrete, we prefer to model it as continuous for ease of manipulation.

A good example of this would be the population of a small village vs. the population of a large country. Since the former is likely to be a few hundred at most, data can be easily manipulated by treating each inhabitant as a distinct member of the village. In the latter case, where the population lies in the hundreds of millions, it is much easier to treat it as a continuous set. In this case, all the theorems of differential and integral calculus can be applied.

11 Quoted by E. T. Bell in “Men of Mathematics”

Random Variable

We use an algebraic symbol to represent any member of the population. This symbol is called a variable. It can take on the characteristics of any member of the population, just as a variable in an algebraic equation can take on any value in the range / domain. If the variable is not predisposed to take on any particular values, then we call it a random variable, often abbreviated to RV. RVs are sometimes segregated into discrete random variables and continuous random variables, depending on whether the population is being treated as discrete or continuous.

There is little point in working with variables that are not random, just as there is little point in playing a game where the die is loaded: the outcome is a foregone conclusion. Therefore we focus our attention on random variables only.

Ways To Measure Data

Grouping data to form the population is the first step. Unfortunately, defining the basis for a group is by no means trivial or obvious. And different bases for inclusion in a population can dramatically affect the discernible trends.

Discerning a trend, of course, is the main reason we collect the data in the first place: as with most applications of modern science, the goal is to understand the behavior of the population. Based on this understanding, predictions can be made and comparisons drawn.

Even though the variable is random, statistical studies show that values tend to cluster around the average. Measures of this tendency to cluster are important in our efforts to characterize populations.

Measures Of Central Tendency

There are three basic measures: the mean, the median, and the mode. While there are several measures of the mean, most often we look for a value that can be interpreted as the average value. While the mean, median and mode are often distinct from each other, they can sometimes be the same. In particular, for the Normal Distribution that is described below and that is our main focus of attention, all three are the same. The most common symbol for the mean is µ, while x̄ is sometimes used.

Standard Deviation

While measures of central tendency indicate where the mean lies, how can we measure the spread of the data? This is done using the standard deviation, for which the usual symbol is σ. It is a standard measure of the dispersion of the data from the mean.

Variance

The variance is nothing but the square of σ. It represents the average of the squares of deviations from the mean. Squaring the deviations prevents negative values from canceling out positive values. Beginners sometimes make the mistake of treating the standard deviation and variance as equivalent, since the former is the square-root of the latter. But it is important to remember that they have different statistical properties. We sometimes prefer to work with the variance, as we will see when we discuss ANOVA.

Coefficient of Variance

The variance or standard deviation by themselves may provide a misleading picture. For instance, consider a population with a standard deviation of 3.75 and a mean of 1,000. If we want to compare it with another population whose standard deviation is also 3.75, but whose mean is 10, it is “obvious” that in the former case the data is more tightly clustered around the mean. In mathematical terms, “obvious” means that we are normalizing the standard deviation with respect to the mean. The coefficient of variance (more commonly called the coefficient of variation) is the standard deviation divided by the mean, and provides a more reliable measure to compare populations.
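As a small illustration of these measures, the sketch below uses Python's standard `statistics` module. The data set is made up for the example; the two coefficient-of-variance calculations reuse the numbers quoted above (σ = 3.75 with means of 1,000 and 10).

```python
import statistics

# Made-up measurements, used only to show the calls.
data = [4, 8, 6, 5, 3, 8, 9, 5, 8]

mean = statistics.mean(data)          # arithmetic average
median = statistics.median(data)      # middle value of the sorted data
mode = statistics.mode(data)          # most frequent value
sigma = statistics.pstdev(data)       # population standard deviation
variance = statistics.pvariance(data) # population variance, equal to sigma squared

def coefficient_of_variance(sigma, mean):
    """Standard deviation normalized by the mean."""
    return sigma / mean

cv_first = coefficient_of_variance(3.75, 1000.0)   # tightly clustered population
cv_second = coefficient_of_variance(3.75, 10.0)    # same sigma, far looser clustering
```

The same σ gives a coefficient one hundred times larger for the population whose mean is 10, which is the "obvious" comparison made quantitative.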


Probability

To define probability, it’s hard to improve on Aristotle, who said “the probable is what usually happens.” From a mathematical perspective, one of the advantages of statistics is that each individual member of the population need not be dealt with separately: we are always dealing with the population as a whole. As a result we need to measure the likelihood of encountering a particular member of the population: in other words, we need to estimate the probability of occurrence of any element or set of elements in the population.

Continuous Distributions

When we work with populations, we find it useful to characterize the variation of the random variable over the entire population in terms of a distribution. That is, a symbolic form or formula that we can use to calculate probabilities of occurrence of various values. There are several different distributions that have been encountered in scientific studies: Binomial, Normal, Weibull, Poisson, etc.

The most commonly encountered distribution when working with continuous random variables is the Normal (or Gaussian or Bell-shaped) distribution. In this distribution, the range is from -∞ to +∞, and the distribution is symmetric about the mean.

The Normal Distribution: PDF

The probability density at any value x (loosely, the relative likelihood that the random variable will take a value near x) is given by the equation

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

In this equation µ is the mean and σ is the standard deviation. The plot of this Probability Density Function (PDF) is the reason it is also called the Bell-Shaped Distribution.


CDF

The integral of the PDF is called the Cumulative Distribution Function (CDF), since it represents the sum of probabilities of all values that lie within the limits of the integral.

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

This is useful:
• F(x1) is the probability that the random variable will take values less than x1.
• F(x1) – F(x0) is the probability that the random variable will take values between x0 and x1.
• 1 – F(x1) is the probability that the random variable will take values above x1.

The sum of probabilities of occurrence of all values within the population must be 1. For the Normal distribution, this means that

$$\int_{-\infty}^{+\infty} f(t)\,dt = 1$$

Any Normal Distribution can be transformed or mapped to the standard normal distribution, which has µ = 0 and σ = 1, by the substitution z = (x – µ)/σ. Values of the integral are tabulated in Handbooks12. For any given normal distribution, we first transform it to the standard distribution, then check the probability of occurrence of a particular value.

For instance suppose your manufacturing process has µ = 1001.5 and σ = 23.54. And suppose your customer will only accept components with a value within 1001.5 ± 20. You will transform these limits to the standard distribution and check the probability that the components you manufacture will be acceptable. If the probability of occurrence of 1001.5 ± 20 is 0.60, this means that 60% of the components you manufacture will be accepted, and 40% will be rejected.

12 Obviously, it is easier to look up a table than to evaluate the integral! In fact, the integral cannot be expressed in closed form using elementary functions.
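Instead of a printed table, the standard normal integral can be evaluated with the error function from Python's `math` module. The sketch below applies it to the manufacturing example (µ = 1001.5, σ = 23.54, acceptance band of ± 20); the function names are chosen for this illustration and are not from any particular package.

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of the Normal distribution at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(2.0 * math.pi * sigma ** 2)

def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x): map x to the standard normal variable z, then use the error function."""
    z = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma = 1001.5, 23.54
lower, upper = mu - 20.0, mu + 20.0

# F(x1) - F(x0): probability that a component falls inside the acceptance band.
p_accept = normal_cdf(upper, mu, sigma) - normal_cdf(lower, mu, sigma)
p_reject = 1.0 - p_accept
```

For these numbers `p_accept` comes out at roughly 0.60, in line with the 60% acceptance quoted in the example.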

The Central Limit Theorem

For the Normal Distribution, the Mean, the Median and the Mode are all equal. How about for other distributions?

One of the difficulties that beginners face is that of deciding which distribution should be used to model a particular physical event13. This is not an easy question to answer. A reference to the theory of each distribution and research into earlier uses for the physical event are recommended.

However in most experiments, regardless of what distribution best characterizes the population, the Normal Distribution is still of overriding importance. The reason for this is the Central Limit Theorem, which tells us that regardless of the distribution followed by the random variable in question (provided its variance is finite), the means of sufficiently large samples drawn from the population will approximately follow a Normal Distribution.

Experiments

Even with discrete variables, enumerating all possible values is often beyond the power of available resources. For continuous distributions, it is obviously out of the question.

Data is sometimes collected by observation. The observer makes no effort to control the variables, restricting effort to collecting data and drawing conclusions from these. If an independent variable does not change during the period of observation, the analyst can draw no conclusions about its effect on behavior of the population.

To get around this, the statistician needs to control the independent variables. In other words, the statistician needs to get data from an experiment, where some variables are deliberately controlled.

Sample (vs. Census)

Enumerating every possible value that the random variable can take is called a census. While a census is sometimes used (for instance in critical healthcare), most often we use the power of statistics to draw conclusions from a sample. A sample is a subset of the population. The statistical measures of the samples are used to infer the statistical properties of the population.

13 Finite Element beginners will sympathize: the choice of elements is often just as confusing for a beginner.

Random Samples

In order to do this effectively, the sample must be free of bias. That is, it must not be predisposed to any particular sets of values. For instance, if you choose to conduct a survey to determine the most common names in your locality by looking up the telephone directory, you are restricting your survey to that part of the population that is accessible by telephone: perhaps this introduces a bias or prejudice into your observations. Samples that are free of bias are called random samples. Deciding which parts of the population to sample is an important question. Indeed, “the generation of random numbers is too important to be left to chance14.”

As mentioned above, the Central Limit Theorem tells us that the means of samples follow a Normal Distribution even if the population itself does not. Why is this important to us?

To estimate the properties of a population, we take a number of samples, each of which consists of a finite number of members. Now suppose we treat the means of the samples themselves as a population, and calculate from this the sample mean. That is, if we have drawn 20 samples, and each has a mean µi, the sample mean is given by

$$s = \frac{1}{20} \sum_{i=1}^{20} \mu_i$$

Because the Central Limit Theorem tells us that the means of the samples themselves follow a Normal Distribution, regardless of the distribution the parent population follows, we can use the properties of the Normal Distribution to calculate a confidence interval. This tells us the probability that the mean of the population will lie within some specific distance from the sample-mean. For example, there is a 95% probability that the mean of the population will lie within the 95% confidence interval of the sample mean.

14 Robert Coveyou, of the Oak Ridge National Laboratory, quoted by Ivars Peterson in “The Jungles of Randomness: A Mathematical Safari”
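The claim that sample means behave normally even when the parent population does not can be checked numerically. The sketch below is an illustration only: it assumes an exponential parent population (strongly skewed, with mean 1 and standard deviation 1), and the sample size, number of samples and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

def sample_mean(sample_size):
    """Mean of one sample drawn from an exponential population (mean 1, std 1)."""
    return statistics.mean(random.expovariate(1.0) for _ in range(sample_size))

sample_size = 50
num_samples = 2000
means = [sample_mean(sample_size) for _ in range(num_samples)]

# The mean of the sample means estimates the population mean.
grand_mean = statistics.mean(means)

# The spread of the sample means should be close to 1 / sqrt(50), about 0.14.
spread = statistics.stdev(means)

# If the sample means are roughly normal, about 95% of them lie within
# 1.96 standard deviations of their own mean.
inside = sum(abs(m - grand_mean) <= 1.96 * spread for m in means) / num_samples
```

The fraction `inside` lands close to 0.95 although no individual measurement is normally distributed, which is what allows a confidence interval to be built from the Normal Distribution.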

For small normally distributed samples, the t-distribution is used to construct the confidence interval. When using the t-distribution, it is customary to talk of the “100(1-p)%” confidence interval, but the intent is the same: to measure the confidence with which we can assert that the mean of the population lies within some distance of the sample-mean. The smaller p is, the higher the confidence in the estimated values. p = 0.05 corresponds to the 95% confidence interval, while p = 0.01 corresponds to the 99% confidence interval. If a report says that the “95% confidence interval” for a response R is 10.5 to 21.27, this means that there is a 95% probability that R will lie between 10.5 and 21.27.

ANOVA

To make a meaningful comparison between measurements of different samples, the conditions should be the same for all samples. This is obvious. Unfortunately, any experiment is susceptible to noise. That is, even if the experimenter wants to keep everything the same, some variations inevitably creep in. Anyone who has worked in a lab knows that no galvanometer ever reports the exact same value twice.

In experiments, moreover, it is sometimes not enough to just measure the mean of the population from the means of samples. We also want to make predictions. That is, we want to change the values of control variables and study their impact on the mean of the population.

The values of the control variable are sometimes referred to as the signal. It is often important to estimate the signal-to-noise ratio. A manufacturing process that has a low signal-to-noise ratio is hard to control, for example.

When evaluating the results of experiments, our interest lies in deciding whether the change in the measured values is because of noise in the measurements or because of changes in the control variable. The ANalysis Of VAriance, ANOVA, is the method used to do this.

The control variables are called factors. Values that the control variables can take are called levels. The values of interest are called responses.

Let’s say the experimenter has 1 factor, which can take 3 levels. And let’s say the experimenter has constructed 5 samples (that is, 5 groups).

Then for each response we can present the results of the experiment as in the table below:

Group Number    Responses for:
                Level 1    Level 2    Level 3
1               R11        R12        R13
2               R21        R22        R23
3               R31        R32        R33
4               R41        R42        R43
5               R51        R52        R53

Note that R11 would have been the same as R21 in the absence of noise. Also, R12 would have been the same as R11 if the level had been the same.

The goal of an ANOVA is to study the difference within groups (i.e. across the levels of the control variable, which shows the effect of the “signal”), and the difference between groups (i.e. the effect of the “noise”). That is, ANOVA helps us determine whether the variation in the response due to the change in the control variable is significant, when compared to the change in the response due to noise.

If you have multiple factors, then the number of possible combinations increases dramatically. This will be discussed in more detail in the next chapter. For now, we will only point out that in most experiments there is more than one factor.

Without going into the equations, we will state simply that ANOVA involves calculating variances in the response between columns (across levels) and between rows (across groups). Then, using the F-distribution, the significance of the difference is estimated15.

The results of the ANOVA are often displayed in a table showing the factors and the levels of significance.

            ss    df    MS    F    p
Factor 1
Factor 2
Factor 3
Factor 4
Error
Total SS

15 For details on the F-Distribution and why it is used here, look up the references.

CAE and Design Optimization – Advanced

Statistics – A Worm’s Eye View

In the table, ss is the sum-of-squares, df is the degrees-of-freedom, MS is the mean-sum-of-squares (ss divided by df), and F is the ratio of the MS of a factor to the MS of the error. The F-value is tested against a critical value taken from tables of the F-distribution (similar to the confidence-interval described above). The larger the F-value, the more significant the effect of the factor on the response. The last column, p, is related to the confidence-interval. For instance, to have 95% confidence in the effect of a factor, p would have to be 0.05 or less for that factor.

If there were multiple responses, ANOVA would involve drawing up the table above for each response. When the analysis also adjusts for a covariate (a variable that is measured but not controlled), it is called ANCOVA, the analysis of covariance. If several responses are analyzed jointly rather than one at a time, the term MANOVA is used.
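To make the columns of the table concrete, the sketch below computes ss, df, MS and F for one factor at 3 levels with 5 repeated measurements per level. The response values are invented for the illustration, and the repeated measurements are treated as plain replicates at each level (a one-way analysis). The p column is left out because it needs the F-distribution, which is not in the Python standard library.

```python
# Rows: 5 repeated measurements; columns: levels 1..3 of a single factor.
# The numbers are invented for the illustration.
responses = [
    [10.1, 12.0, 15.2],
    [9.8, 12.3, 14.9],
    [10.3, 11.8, 15.1],
    [10.0, 12.1, 15.3],
    [9.9, 12.2, 14.8],
]

n_repeats = len(responses)
n_levels = len(responses[0])
columns = [[row[j] for row in responses] for j in range(n_levels)]

grand_mean = sum(sum(col) for col in columns) / (n_repeats * n_levels)
level_means = [sum(col) / n_repeats for col in columns]

# Sum of squares explained by the factor (the "signal").
ss_factor = sum(n_repeats * (m - grand_mean) ** 2 for m in level_means)
# Sum of squares left over at a fixed level (the "noise", the Error row).
ss_error = sum((r - m) ** 2 for col, m in zip(columns, level_means) for r in col)
# Total sum of squares about the grand mean.
ss_total = sum((r - grand_mean) ** 2 for col in columns for r in col)

df_factor = n_levels - 1
df_error = n_levels * (n_repeats - 1)

ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error
f_value = ms_factor / ms_error
```

The factor and error sums of squares add up to the total, and a large `f_value` indicates that the change in level moves the response far more than the noise does.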

Monte Carlo Methods

Constructing random samples is not an easy task. One alternative approach is based on the theory that if an experiment is repeated enough times using random values for the control variables, the responses can be used to calculate the properties of the distribution itself. This is called the Monte Carlo approach.

The main advantage of the Monte Carlo methods is that they are scalable; they can be applied as easily to a 1000-dimensional problem as to a 1-dimensional problem. In engineering practice, these methods are often deployed when there are a large number of variables. There are several variations on the method, all of which are sometimes referred to collectively as the Monte Carlo methods, since the principle behind all of them is the same.

When dealing with multiple variables, each of which can have multiple levels, the problem of designing an experiment itself can become intractable. As we will see in the next Chapter, designing a meaningful experiment is not a trivial task!

In other words, several problems are susceptible to the problems of scale: as the size of the problem rises, the resources required rise exponentially. The designer is faced with the choice of diluting the experiment and damaging the confidence interval, or omitting variables or levels at the cost of insight into the behavior. In such scenarios, Monte Carlo methods used along with Approximate Models can be a very effective approach.

Reliability Engineering is one area where Monte Carlo methods are very widely used, because the use of random inputs turns a deterministic simulation into a stochastic one. To sum up, Monte Carlo methods use sequences of random numbers to simulate a process. The only requirement is that the process be described by a PDF.

Many assume wrongly that Monte Carlo methods can be applied only to problems that involve probability. Interested readers should look up descriptions of Buffon’s Needle to get an idea as to how widely applicable the method is. As Buffon’s Needle illustrates, Monte Carlo methods can also be used to calculate the value of π, which is certainly not probabilistic!
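Buffon's Needle is easy to try. A needle of length l dropped on a floor ruled with parallel lines a distance d apart (l ≤ d) crosses a line with probability 2l/(πd), so counting crossings gives an estimate of π. The sketch below is a minimal version; the function names, the seed and the number of drops are choices made for this illustration, and the needle's direction is drawn by rejection sampling so that π itself is never used.

```python
import math
import random

def random_sine(rng):
    """Sine of a uniformly random angle in (0, 90 degrees], drawn without using pi."""
    while True:
        u, v = rng.random(), rng.random()
        r2 = u * u + v * v
        if 0.0 < r2 <= 1.0:          # keep only points inside the quarter disk
            return v / math.sqrt(r2)

def estimate_pi(num_drops, needle=1.0, spacing=1.0, seed=0):
    """Estimate pi from the fraction of dropped needles that cross a line."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_drops):
        # Distance from the needle's centre to the nearest line.
        centre = rng.uniform(0.0, spacing / 2.0)
        # The needle crosses if its half-length, projected across the lines, reaches that far.
        if centre <= (needle / 2.0) * random_sine(rng):
            hits += 1
    # P(cross) = 2 * needle / (pi * spacing), solved for pi.
    return 2.0 * needle * num_drops / (spacing * hits)

pi_estimate = estimate_pi(200_000)
```

The estimate improves only slowly with the number of drops (the error shrinks roughly with the square root of the count), which is typical of Monte Carlo methods.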

Uni-variate Analyses

If a distribution has only one independent variable, we say it is univariate. From the available data (gathered either through experiment or by observation) we calculate the correlation coefficient between the variable and the response. The correlation coefficient is usually represented by the symbol r or ρ, and can be calculated by a variety of methods. It’s important to remember that a correlation does not imply a direct cause-and-effect relationship.

If the correlation coefficient is non-zero, we then use regression to fit a curve through the data points, and use this regression equation to predict values of the response. Regression is often linear, and is most often calculated using a least-squares approach.
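A minimal sketch of both calculations follows, assuming the Pearson form of the correlation coefficient and a straight-line fit by least squares. The variable names and the data (a plate thickness and a measured deflection) are invented for the illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def least_squares_line(xs, ys):
    """Slope and intercept of the straight line that minimizes the squared residuals."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept

# Invented data: one independent variable and one response.
thickness = [1.0, 2.0, 3.0, 4.0, 5.0]
deflection = [9.8, 8.1, 6.2, 3.9, 2.1]

r = pearson_r(thickness, deflection)
slope, intercept = least_squares_line(thickness, deflection)

# Use the regression equation to predict the response at an untested value.
predicted = slope * 3.5 + intercept
```

Here r is close to -1, so a straight line is a reasonable model; a value of r near zero would suggest that a linear regression has little predictive value.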


Multi-variate Analyses

If there are multiple independent variables, as is normally the case, we use multi-variate statistics. In this case, the regression equation is a “surface” instead of a curve.

The One Factor at A Time (OFAT) approach is sometimes taken to reduce multi-variate distributions to uni-variate distributions. That is, all variables but one are held constant, and the effect of varying that one factor is studied by the experimenter. However, factors are often linked. Some may be independent, some may work in tandem, some in opposition. One of the jobs of the experimenter is to search for such linkages. The OFAT approach will not uncover linkages between variables. An alternate approach, called DOE (described in the next chapter), is better suited to the task.
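The weakness of OFAT can be shown with a deliberately extreme, made-up response in which two factors matter only in combination. The function `response` and its coefficients are hypothetical; the factors are coded as 0 (low) and 1 (high).

```python
from itertools import product

def response(a, b):
    """Hypothetical response: the factors act only through their interaction."""
    return 10.0 + 5.0 * a * b

# OFAT: start from the baseline (0, 0) and change one factor at a time.
baseline = response(0, 0)
ofat_effect_a = response(1, 0) - baseline
ofat_effect_b = response(0, 1) - baseline

# Evaluating every combination of levels exposes the linkage.
runs = {(a, b): response(a, b) for a, b in product((0, 1), repeat=2)}
interaction = (runs[(1, 1)] - runs[(0, 1)]) - (runs[(1, 0)] - runs[(0, 0)])
```

Both OFAT effects are zero, so an experimenter relying on OFAT would discard both factors, while the four-run study shows that switching them on together raises the response by 5.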

It's easier to square the circle than to get around a mathematician. Augustus De Morgan

Statistics – A Bird’s Eye View

The previous chapter laid out several terms in statistics: samples, factors, levels, responses, and tests of significance. We saw how levels of factors are varied across samples, responses are measured, and the significance of the measurements is estimated. All these are based on “sound” mathematical principles. That is, the assumptions and proofs are well laid out.

There is one important question that the previous chapter did not address. How are the samples themselves to be constructed? We need a scientific basis that provides guidelines on three critical fronts:
• How many measurements must be made?
• At which levels must these measurements be made and which factors are important?
• How can we quantify the error?

The first guideline is provided by DOE. The second question is partially answered by variable screening. The third is addressed by ANOVA.

DOE

Design Of Experiments is the procedure of selecting the points in the design space where responses are to be measured. First, let us recall the important terms:
• factors are the independent variables. They may be discrete (e.g. number of pills administered) or continuous (e.g. modulus of elasticity)
• levels are the values that the factors can take
• responses are the dependent variables

To understand the importance of DOE, let’s take some specific numbers. It will take some patience and concentration to follow the discussion, but the effort is well worth it. For the purpose of discussion, the scenario is as follows:
• We are studying the behavior of a rolling mill.
• We want to conduct the same experimental measurement 10 times on the same mill so that we allow for any error in measurements or other variations (i.e. noise). That is, there are 10 sample groups.
• There are 2 Factors, F1 and F2. F1 is temperature, while F2 is the number of passes.
• Factor 1 can have any of 3 levels: L1, L2 and L3. These represent the values 420° Celsius, 450° Celsius and 470° Celsius.
• Factor 2 can have any of 4 levels: L1, L2, L3, and L4. These represent the values 0, 2, 5 and 10.
• The response is the finish of the strip, as measured by the average surface roughness, Ra.

The combinations of different levels for different factors are best illustrated by the following table (sometimes a graphic display called a Latin Square is used to display the combinations):

# of passes    Temperature
               420°    450°    470°
0
2
5
10

As the table shows, there are 12 combinations of factors that are possible, that is, 12 feasible process-settings (3*4, since there are 3 possible levels for the temperature and 4 for the number-of-passes).

At 10 repetitions, this means 120 possible measurements. If the process designer changes the equipment to allow for 6 levels of temperature and 5 levels of passes, we now have 300 possible measurements. The purpose of the example above is twofold. First, it brings in a new example for the “population”. In this case, the population consists of all the possible combinations of values of all the factors. Since the factors are discrete, a census is possible: that is, evaluate at all possible combinations and choose the best. Second, it brings in a new example of “group” for ANOVA. In a large population, the groups would involve different subsets of the population. In this case since we only have 1 mill, we treat repeated experiments on the same population as different groups. ANOVA, in this case, measures the variation-between-repeated-measurements and compares it with the variation-across-levels. The logic is similar to the earlier example, with the between-groups factor being replaced by the between-repeated-measures factor. Note that if we had had access to 100 identical mills, we could have performed 10 tests, each on a different mill. Clearly this would define a different “population” since the “mill being tested” would be a variable in this case.
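The census described in the rolling-mill example can be enumerated directly. The short Python sketch below simply lists every temperature/passes combination from the scenario and counts the measurements; the variable names are illustrative and not part of the original text.

```python
from itertools import product

temperatures_c = [420, 450, 470]   # levels of factor F1 (temperature)
passes = [0, 2, 5, 10]             # levels of factor F2 (number of passes)
repetitions = 10                   # repeated measurements per process-setting

# Every feasible process-setting is one (temperature, passes) pair.
settings = list(product(temperatures_c, passes))
n_measurements = len(settings) * repetitions

print(len(settings))     # 12
print(n_measurements)    # 120
```

Changing the two lists to 6 and 5 entries reproduces the 300 measurements mentioned for the modified equipment.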

Full Factorial

An approach where all possible combinations of levels and factors are evaluated (that is, the responses are measured) is called a full factorial experiment. Obviously, it can increase in size very rapidly. For 20 factors and 3 levels per factor, the number of combinations is $3^{20}$, which is roughly $3.5 \times 10^9$. In general, if there are $n$ factors, and each factor can have $l_i$ levels, then the total number of combinations in a full-factorial experiment is given by

$$\prod_{i=1}^{n} l_i$$
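As a quick check of how fast a full factorial grows, the product of the level counts can be evaluated in a few lines of Python. The helper name `full_factorial_size` is only an illustration.

```python
from math import prod

def full_factorial_size(levels_per_factor):
    """Number of runs in a full factorial: the product of the level counts."""
    return prod(levels_per_factor)

print(full_factorial_size([3, 4]))      # rolling-mill example: 12
print(full_factorial_size([3] * 20))    # 20 factors at 3 levels: 3486784401
```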

For preliminary investigations, DOEs are sometimes applied to 2-level designs. That is, only two levels are used for each factor even if more are possible in the population. These 2 levels could be “present” vs. “absent”, “high” vs. “low”, etc. In these cases, the total number of possible combinations is $2^n$ where $n$ is the number of factors. Such experiments are called 2-level experiments, and are usually used to screen variables – to eliminate the trivial and retain the important ones.

An obvious way to reduce effort is to choose a subset of the full-factorial. That is, to select a subset of all the possible combinations of levels and factors. This approach is called a fractional factorial design. Depending on which subset is chosen, we end up with different designs. Most of these designs are named after their proposers.

Fractional Factorial

In this design, we choose a fraction of the full-factorial. This fraction can be ½, ¼, etc. For instance, a 2-level 3-factor design has 8 combinations if a full-factorial design is chosen. The full factorial design is called a $2^3$ design. If we choose to run a ½ factorial, we call it a $2^{3-1}$ design, and it will contain 4 measurements. A full discussion of how to choose the fraction is beyond the scope of this book. Look up one of the References for this detail, as well as for a discussion on which subset of the full-factorial to choose for a given fraction.

Standard literature often describes fractional-factorial designs as $2^{(k-p)}$, since there are usually 2 levels per factor. For example, a ½ factorial design with 6 factors is called a $2^{(6-1)}$ design. “k” is “6”, of course, because there are 6 factors. “p” is “1” because this is a “½ factorial design”. If it were a ¼ factorial design, “p” would be 2.

The tables below show the $2^3$ full factorial, and one possible ½ fractional factorial. In these tables, we have chosen +1 and –1 to represent the possible levels for each factor.

Full Factorial

Run   F1   F2   F3
1     +1   +1   +1
2     +1   +1   -1
3     +1   -1   +1
4     +1   -1   -1
5     -1   +1   +1
6     -1   +1   -1
7     -1   -1   +1
8     -1   -1   -1

½ Fractional Factorial

Run   F1   F2   F3
1     -1   -1   +1
2     +1   -1   -1
3     -1   +1   -1
4     +1   +1   +1

The fractional factorial design was generated by first choosing 4 values for F1 and F2. The levels for F3 were chosen by multiplying the chosen levels of F1 and F2. We say that we have “confounded F3 by F1 and F2”.

We do pay a price for this simplification. The confounding (also called aliasing) means we lose the ability to determine some interactions between factors. Fractional Factorial designs are suitable when some factors are considered more important than others. We are willing to give up some resolution in the weaker factors in return for the economy we gain by virtue of the shorter experiment. However, remember that there is a-priori judgment involved in deciding which factors to treat as “weaker”. Since the effects of the lesser factors are reduced, such designs are sometimes called screening designs.

Sometimes, we may choose to hold some factors fixed at chosen levels so that we can measure the contribution of these factors to the total variation of the responses. This is called Blocking. The “blocks” are the levels at which the factors are held fixed. Different methods of choosing blocks to include give rise to different designs: complete block designs, incomplete block designs and randomized block designs.
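The construction of the ½ fraction (all combinations of F1 and F2, with F3 set to their product) is easy to reproduce. The following Python sketch rebuilds the four runs in the same order as the ½ fractional factorial table and shows the aliasing numerically; the variable names are illustrative.

```python
from itertools import product

LEVELS = (-1, +1)

# Full 2^3 factorial: every combination of three two-level factors (8 runs).
full_factorial = list(product(LEVELS, repeat=3))

# Half fraction, 2^(3-1): take all 4 combinations of F1 and F2,
# then set F3 = F1 * F2. F1 is varied fastest to match the table order.
half_fraction = [(f1, f2, f1 * f2) for f2, f1 in product(LEVELS, repeat=2)]

for run_number, run in enumerate(half_fraction, start=1):
    print(run_number, run)

# Aliasing: in every run the F3 column equals the F1*F2 interaction column,
# so the two effects cannot be separated using these 4 measurements alone.
f3_column = [f3 for _, _, f3 in half_fraction]
f1f2_column = [f1 * f2 for f1, f2, _ in half_fraction]
print(f3_column == f1f2_column)   # True
```

Because the F3 column and the F1×F2 interaction column are identical, any effect estimated for F3 from these runs is really the sum of the F3 effect and the F1×F2 interaction. That is the price referred to above.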

Plackett-Burman

Named for R.L.Plackett and J.P.Burman[16], this design reduces the number of runs even further. The main effects are heavily confounded with two-factor interactions. For instance, you can construct a 12-run experiment with 11 factors. Standard designs are provided for various numbers of factors.

[16] They published their paper, titled “The Design of Optimum Multifactorial Experiments”, in 1946.

Central Composite

Central Composite (CC) designs are also called Box-Wilson designs. There are several variations of this method: circumscribed, inscribed, and face centered. The first two require 5 levels per factor, while the third requires 3. CC designs can be full-factorial or fractional-factorial. Remember that a 2-level experiment can only capture linear effects, while a 3-level experiment can capture quadratic effects, and so on.
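The level counts quoted for the circumscribed and face-centered variants can be checked with a small sketch. The Python code below builds a central composite design from its three usual ingredients (two-level factorial points, axial points at a distance alpha along each axis, and a center point). The function names, and the choice of alpha = √2 for the two-factor circumscribed case, are assumptions made for illustration.

```python
from itertools import product
from math import sqrt

def central_composite(k, alpha):
    """Factorial corners (+/-1), axial points (+/-alpha on each axis), one center."""
    factorial = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for distance in (-alpha, alpha):
            point = [0.0] * k
            point[i] = distance
            axial.append(tuple(point))
    center = [tuple([0.0] * k)]
    return factorial + axial + center

def levels_per_factor(design):
    """Count the distinct values each factor takes across all runs."""
    n_factors = len(design[0])
    return [len({round(run[i], 9) for run in design}) for i in range(n_factors)]

circumscribed = central_composite(2, sqrt(2))  # axial points outside the +/-1 box
face_centered = central_composite(2, 1.0)      # axial points on the faces of the box

print(len(circumscribed), levels_per_factor(circumscribed))  # 9 [5, 5]
print(len(face_centered), levels_per_factor(face_centered))  # 9 [3, 3]
```

An inscribed design is, in this picture, a circumscribed design scaled down so that the axial points sit at ±1, which is why it also needs 5 levels per factor.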

Box-Behnken

A Box-Behnken design, also called a quadratic design, is a slightly more economical variation of the full-factorial CC design but can be more expensive than a fractional-factorial CC Design. 3 levels are required per factor.

Other Designs

There are several other types of designs, such as the Latin Hypercube Design, Taguchi Design, and so on. Some designs are particularly suited for computer experiments.

The term D-Optimal is used to describe experiment designs that are generated based on the chosen model given the number of runs. The matrix that defines the experiment is generated so as to optimize results for the chosen number of runs. Generating such a matrix by hand is not feasible, so D-Optimal designs are invariably used only for computer-experiments.

Variable Screening

In many cases, we want to check whether selected factors have an effect on the responses or not. To do this, multiple samples are constructed, and the experiment is repeated across these samples. As described in the previous chapter, ANOVA is used to test levels of significance of the various factors. If the effect of the variation of a factor is significantly more important than the change of groups, then it is retained. Otherwise it is screened out.


Computer Experiments

In many cases, the experiment is conducted using a computer model. This, of course, is of particular interest to us.

In the physical world, an experiment consists of choosing levels for factors and measuring the responses. The measurements are repeated either for different samples in a population as in a poll or an inspection in Quality Control, or for repeated measurements on the same sample as described in the example of the rolling-mill. In process control the experiments can be repeated over the same machine. Either way the inherent noise generates different response values. And we then use ANOVA to test significance of the effect of noise on our conclusions.

On a computer, running the same set of levels on a computer model will generate the same response, since there is no inherent noise[17]. This is only to be expected: the computer is a deterministic machine, after all. Therefore you cannot use ANOVA to compare variance-between-groups with variance-within-groups. The goal of a computer model is not to generate such results.

Inherent in the computer model is the assumption that the model has already been fine-tuned so that it only contains important effects. Usually, this is done by a process called Parameter Identification or System Identification. This is explained in the next section.

One important outcome of a computer DOE is to generate an approximate model (e.g. a Response Surface) that can be used to conduct further numerical experiments. ANOVA is used, but unlike a physical-experiment where between-groups factors are investigated, here it is used to check which factors should be included in the approximate model. This is called variable screening, in the context of computer experiments.

Summary

The previous chapter covered essential terms in statistics and probability. This chapter put those terms together to explain how experiments are conducted, and outlined the issues involved in the design of experiments. The next chapter looks at the issue from an engineering design point of view, bringing together terms and techniques relevant to CAE.

[17] As pointed out earlier, computer models too can be susceptible to noise. We neglect this in our discussion, as it is a different subject.

However beautiful the strategy, you should occasionally look at the results. Winston Churchill


Statistics – A Designer’s View

One of the main strengths of mathematical approaches is that they can be applied to any context, provided the assumptions are fulfilled. This applies to statistics too. When using the tools, it is essential that we keep in mind the implicit assumptions.

Statistics is commonly used in non-engineering applications. Prediction of outcomes of elections is an example that’s universally familiar. Pollsters use samples to understand and predict the behavior of the electorate, while candidates use samples to tailor their campaign promises.

Coming to engineering applications, statistics is widely used in manufacturing. Applications in process control are well documented, and are familiar to any engineer who has undergone an introductory course in Quality Control. The applicability to engineering design is not as widely known.

Accordingly, we will spend the rest of this chapter on two things. We will review some important trends that have led to an increase in the use of these tools. And we will use examples to show how the techniques of the previous two chapters can be used in the context of CAE and Design Optimization.

Applications Of Approximate Models

Approximate models, also called Meta-Models or Surrogate Models, are not essential for all applications. To understand how and why they are relevant to our application, we will break our discussion into two parts.

First, we’ll list the advantages Approximate models have. Different scenarios may benefit from one or more of these. It is possible that a scenario may not need any of these benefits, in which case Approximate Models can be dispensed with completely.

Second, we will discuss the types of Approximate models used in CAE.

Improvement In Mathematical Behavior

Non-linear behavior means the input and response are not linearly related. This means a small change in input could cause a sudden jump in output. Calculus describes these as sudden changes in gradient. Some situations can be even worse: gradients may not exist at all.

For cases like these, approximate models offer a good way out. We choose the form of the approximate model to ensure that it is differentiable or otherwise well-behaved. (Note that differentiability is important for gradient-based optimization methods, but is not required for other design improvement methods.) In other words, we give up some precision for an increase in decorum.

Reduction Of Computational Load

Any engineer who has used Finite Element Analysis will jump at the opportunity to use models that can reduce solution time. Analyses in nonlinear applications like vehicle-safety can take several hours of CPU time for a single run. Consider this extract from a technical publication[18]:

“A two-level, full factorial design would yield $2^7 = 128$ treatments, which is a prohibitive number to perform with FEA. Modifying the FE models tends to be extremely tedious, and the simulation run time would be unreasonably long.”

When a single analysis takes several hours of CPU time, a numerical experiment becomes prohibitively expensive. And pity the engineer who finds a mistake in the experiment design at the end of the experiment.

Approximate models can reduce the required computational effort by orders of magnitude. What’s more, they offer a way out of the second problem too. If you find an error in the experiment-design, you can repair the approximate model: points that define the model can be added, removed or moved.

[18] “Failure Analysis of Rapid Prototyped Tooling in Sheet Metal Forming – Cylindrical Cup Drawing”, Y.Park and J.S.Colton, Transactions of the ASME, Vol 127, February 2005

Variable Screening

Testing takes time and effort. It is expensive. The more factors you want to test, the higher the time and expense. What if some factors are unimportant? Can you conduct a preliminary investigation to rank the importance of various factors? Can you then save time and money by excluding the lower-ranked ones from more detailed studies?

Screening samples are carefully constructed to detect such effects. In the earlier chapters we saw how ANOVA is an effective method to quantify and compare the effects of factors.

With computer models, our approach must be a little different. Since a computer model is deterministic, repeating an experiment on the computer will yield the same results as long as there is no variation in the levels of factors. We cannot use ANOVA to compare between-groups variations to within-groups variations. How, then, can we use computer models for variable screening?

With specific reference to CAE, there are two scenarios we will consider. But first, let’s review the basics of modeling for CAE.

1. Behavior of a real-world situation is captured using observation or experiment.
2. Mathematical Models, which usually involve some approximation again, are used to reflect the observed behavior. These are not always well-behaved, but are often called high fidelity models.
3. We further build Approximate Models, which are derived from the Mathematical Models.

“Nobody believes analysis results except the analyst. Everybody believes test results except the test engineer.” M.Racicot

Now let’s examine our issue: variable screening. In view of the 3 steps listed above, let us state the question more precisely, recognizing that there are actually two different questions:


1. We want to know which variables affect the power of the high-fidelity model to reproduce observed data.
2. We want to know which variables affect the power of the approximate model to reproduce the high-fidelity model.

In the first case, we have some data from physical observations or experiments. We need to fine tune variables in the computer model. Take damping-factors or friction coefficients, for example. Mechanics is not well-developed enough for us to establish these material-data from fundamental considerations. They are usually set empirically – that is, to match data from an observation or an experiment.

In the second case, we have a computer model that is tried and tested. There is no doubt about its validity. This is the high fidelity model. It could be an analytical expression or a numerical model[19]. However the high-fidelity model may have several input parameters. If we are to use it to conduct experiments, which of these variables should we include in the experiment?

If you have an analytical equation that relates responses and factors, calculus can be used to evaluate sensitivity. Unfortunately, it is not always possible to determine the sensitivity of a response to the factors even if an analytical model links the two. The equation may not be differentiable in the domain. Or, it may be impossible to evaluate the derivative, even if it exists. If numerical models such as FEM are used, there is a model that reliably calculates response from inputs, but is not analytical. So sensitivity must be calculated numerically.

If we include more factors than are essential, not only do we increase computing time, we also increase the difficulty of assimilating the results! Remember that sorting through the collected data is often a chore that experimenters dread.

The first case (screening between observed data and the high-fidelity model) is addressed by parameter identification. In this approach, the results of a physical experiment are set up as target values. The computer model is run with various levels of many factors. By inspecting the computer model against the available physical-experiment results, we determine which factors can be safely omitted from the computer model without hurting its ability to match the physical-experiment results. This method does not need approximate models, but uses the same techniques to check which values can “safely” be omitted or used in the computer model for further CAE.

In the second case, approximate models are extremely useful. A screening experiment is designed using the high fidelity model as the “target”. Screening experiments typically involve only two levels for each factor. The designer is encouraged to include as many factors as possible. ANOVA is conducted on the factors themselves to quantify their effects on the responses. Without presenting the mathematics here[20], we will summarize the method:

1. Construct the approximate model as a weighted sum that involves the factors.
2. Choose a number of sampling points using one of the DOE methods described earlier.
3. Use regression analysis (a least-squares approach is often used) to calculate the coefficients in the summation. The relative values of the coefficients in the summation represent the importance of each factor.
4. Inspect the residual (usually shown as a graph, this shows the difference between the high-fidelity model and the approximate model) to ensure the overall adequacy of the model.
5. Treat the error in the approximate model (that is, the difference between the approximate model and the high-fidelity model at each sampling point) as following a Normal Distribution. The ANOVA in the next step relies on this assumption.
6. Use ANOVA to calculate the contribution of each factor to the approximate model, along with the “confidence” in these estimates.

Unlike the earlier example of ANOVA, the results of this screening are usually presented in graphical form. If the approximate model is to be used to calculate multiple responses, one graph is presented for each response. For instance consider the histograms below[21], in which the length of each bar indicates the effect of the corresponding factor. Remember that the estimated effect has some error. This error is calculated by the ANOVA. The F-values are used to estimate the “confidence” in the estimate. This is usually taken to be 95%. In the graphs below, the lighter part of each bar is that part of the effect that the analysis is 95% confident of. The darker part, which is the lower-confidence fraction of the total effect, is treated as error.

[19] Transfer Functions (covered in most courses on Control Systems) provide excellent examples of high-fidelity analytical models. Many linear processes can be accurately described by numerical models.

[20] For an excellent description see “Automotive crashworthiness design using response surface-based variable screening and optimization”, K.J.Craig, N.Stander, D.A.Dooge, S.Varadappa, International Journal for Computer-Aided Engineering Software, Vol 22 No.1, 2005, pp.38-61

This type of chart is called a Pareto Chart of Effects. Sometimes a line is drawn across the bars to indicate how large an effect has to be in order to be statistically significant. From the charts shown above, the factor “R_Bracket_Gauge” has a significant effect on the “Mass”, but is almost irrelevant as far as the “Left Knee Force” is concerned. Since the “T_Flange_Depth” has a negligible effect on both responses, it can be screened from further experiments.
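The first three steps of the screening method (a weighted-sum model, a DOE, and a least-squares fit) can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the function `high_fidelity` is a made-up stand-in for an expensive solver run, and the design is a full two-level factorial in three factors. For such an orthogonal ±1 design, the least-squares coefficient of each factor reduces to the average of (level × response), which is what the code uses. The residual check and the ANOVA confidence estimate (steps 4 to 6) are not shown.

```python
from itertools import product

def high_fidelity(x1, x2, x3):
    # Hypothetical stand-in for an expensive solver run (not from the text).
    return 10.0 + 3.0 * x1 + 0.25 * x2 - 2.0 * x3

# Step 2: sampling points from a two-level full factorial in coded units (+/-1).
runs = list(product((-1, 1), repeat=3))
responses = [high_fidelity(*run) for run in runs]
n = len(runs)

# Step 3: least-squares coefficients of the weighted sum b0 + b1*x1 + b2*x2 + b3*x3.
# With orthogonal +/-1 columns the normal equations decouple, so each
# coefficient is simply sum(x_j * y) / n.
coefficients = {
    f"x{j + 1}": sum(run[j] * y for run, y in zip(runs, responses)) / n
    for j in range(3)
}

# Rank the factors by the magnitude of their coefficients.
ranking = sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)

print(coefficients)   # {'x1': 3.0, 'x2': 0.25, 'x3': -2.0}
print(ranking)        # ['x1', 'x3', 'x2']
```

In this made-up case x2 has by far the smallest coefficient, so it would be the natural candidate to screen out, in the same way that a factor with a short bar is dropped from a Pareto Chart of Effects.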

Types Of Approximate Models

Approximate models, often referred to as Response Surfaces, are usually constructed in one of three ways. Remember that this technology is quite recent, at least in comparison to the other methods commonly used in CAE, so these methods are still evolving.

Common to all approximate models is the fact that the high-fidelity model is used to evaluate responses at sample points. The sample points themselves are chosen using a DOE.

[21] From “Automotive crashworthiness design using response surface-based variable screening and optimization”, cited above.

Least Squares Regression

In this approach, regression analysis is used to fit the surface to the sampling points using a least-squares approximation. It is believed that a D-Optimal design is the most suited experiment for this approach. The surface itself is usually a polynomial surface – either linear or quadratic or elliptical. The least-squares approach means that the surface is unlikely to pass exactly through the high-fidelity solution at any sampling point. A large number of sampling points and a higher order of polynomial help improve the accuracy of the response surface. This method is usually good at capturing global minima, since it tends to smooth out local minima.
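A one-dimensional sketch makes the point about the surface not matching the sampled responses. The Python code below fits a straight line by least squares to five sampled responses; the sample data (y = x² at x = 0..4) is a hypothetical stand-in for high-fidelity results, and the closed-form slope and intercept are the standard textbook expressions for a single factor.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = b0 + b1*x through the sampled points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]          # hypothetical high-fidelity responses

b0, b1 = fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print(b0, b1)       # -2.0 4.0
print(residuals)    # [2.0, -1.0, -2.0, -1.0, 2.0]
```

None of the residuals is zero, so the fitted line misses every sampling point, yet the residuals sum to zero: the fit balances the errors rather than reproducing any individual response.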

Moving Least Squares Regression

This is a modification of the above method, in which the weights in the regression equation are a function of the distance of the point of interest from each DOE sampling point. Since the weights associated with each sampling point “decay” as the evaluation point moves away, an analytical expression is not possible but the approximation is still computationally efficient. Usually, the type of decay can be chosen by the analyst to vary the closeness of fit.
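A minimal one-dimensional sketch of the idea follows, in Python. It refits a weighted straight line at every evaluation point, with weights that decay with distance from each sampling point. The Gaussian decay, the parameter name `h` (which plays the role of the analyst-chosen decay) and the sample data are all assumptions for illustration; practical implementations differ in the weight function and in the polynomial order.

```python
from math import exp

def mls_predict(x0, xs, ys, h=1.0):
    """Weighted least-squares line, refitted for each evaluation point x0."""
    # Weights decay with distance from x0; smaller h gives a closer, more local fit.
    w = [exp(-((x0 - x) / h) ** 2) for x in xs]
    sw = sum(w)
    mean_x = sum(wi * x for wi, x in zip(w, xs)) / sw
    mean_y = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx
    return mean_y + slope * (x0 - mean_x)

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]            # hypothetical high-fidelity responses

local_estimate = mls_predict(2.0, xs, ys, h=1.0)
print(local_estimate)                # roughly 4.5; the true value is 4
```

For the same data, an ordinary (unweighted) least-squares line predicts 6 at x = 2, so the distance-weighted fit is noticeably closer to the sampled response there. Because the weights change with the evaluation point, there is no single formula for the whole surface, which is the sense in which an analytical expression is not available.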

Kriging

Named for D.G.Krige, who was trying to determine the grade of ore from samples, this method is sometimes preferred because of its improved accuracy. Unlike the least-squares fits, the surface interpolates the values at the sampling points. It provides for the inclusion of a stochastic component, with a given mean and variance, into the approximation model. It is believed to be less robust than the least squares method, particularly if the high-fidelity model’s results contain some noise.


Optimization

Remember that the approximate model is not essential. If the high-fidelity model is well-behaved and computationally efficient, optimization can be performed without using an approximate model. Since well-behaved high-fidelity models are amenable to gradient-search methods, “mathematically demanding” techniques can be used.

The techniques described in this paper gain importance either when gradient-search methods cannot be applied to the high-fidelity model, or when we have reason to believe there are several local minima. In the first case, we can use the approximate model to provide a mathematically well-behaved function to the optimizer.

To address the second problem, where local minima make it hard to locate a global minimum, an adaptive response surface search is preferred. This is best explained with reference to the figure. f(x) is the objective function and the domain is between points 1 and 3. The goal is to locate the global maximum.

The search starts with any two points, numbered 1 and 2 in the figure. Using these two points and the responses at these two points, RS1 is constructed. The maximum of RS1 is easily determined: it lies at point 3. Next, evaluate the response at the points 1, 2 and 3, and construct the quadratic curve RS2. The maximum of RS2, again easily determined, lies at point 4. We now evaluate the response at the 4 points 1, 2, 3 and 4. This allows us to construct RS3 and to locate its maximum. If we evaluate f(x) at this point, and if it turns out to be a maximum, we stop.

The combination of approximations, trade-off studies using Pareto Frontiers, and search methods like the above, allows us to apply optimization techniques to computationally demanding, non-linear and multi-objective problems.
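The adaptive search described above can be sketched for a one-dimensional problem. In the Python code below, each response surface is the polynomial that interpolates all points evaluated so far (a line for two points, a quadratic for three, and so on), and its maximum is located on a fine grid. Several things are assumptions made only for illustration: the objective `f`, the grid-based maximization, and the stopping rule (stop when the surface proposes a point that has already been evaluated), which is a simplification of the check described in the text.

```python
def interpolate(xs, ys, x):
    """Lagrange polynomial through all evaluated points, evaluated at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def adaptive_search(f, a, b, start, max_iter=10, grid=400, tol=1e-6):
    xs = list(start)
    ys = [f(x) for x in xs]                      # expensive evaluations
    candidates = [a + i * (b - a) / grid for i in range(grid + 1)]
    for _ in range(max_iter):
        # Maximum of the current response surface (cheap to evaluate).
        x_new = max(candidates, key=lambda x: interpolate(xs, ys, x))
        if any(abs(x_new - x) < tol for x in xs):
            break                                # surface proposes a known point
        xs.append(x_new)
        ys.append(f(x_new))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], xs

def f(x):
    # Hypothetical objective, standing in for the high-fidelity model.
    return 3.0 - (x - 2.5) ** 2

best_x, best_y, evaluated = adaptive_search(f, 0.0, 4.0, start=[0.0, 1.0])
print(best_x, best_y)    # 2.5 3.0
print(evaluated)         # [0.0, 1.0, 4.0, 2.5]
```

The sequence of evaluated points mirrors the description: the linear RS1 through the two starting points sends the search to the end of the domain, the quadratic RS2 through three points then locates the interior maximum, and RS3 proposes the same point again, so the search stops after only four evaluations of f. For less well-behaved objectives, a high-order interpolating polynomial can oscillate, so a practical implementation would use a more robust surface.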

Reliability

Recall our earlier definition of reliability: the probability of a measure lying within a specific value. One of the main drawbacks of an “academic” knowledge of engineering is that most textbooks present well-defined problems. Loads are clearly specified, material properties are clearly specified, geometries are clearly specified, etc.

The real world is quite different. An engineer who starts to practice engineering design has to deal with an inherent uncertainty not just in the design data but also in the manufacturing process.

Manufacturing engineers have long lived with acceptable levels of uncertainty. The quality measures of components that are mass manufactured follow a Normal Distribution. For a Normal Distribution, a large percent of the population (about 68%) lies within 1 standard deviation of the mean. An even larger fraction (about 99.7%) lies within 3 standard deviations of the mean. Manufacturing engineers seek to control process parameters so that they can achieve Six-Sigma quality. That is, they do not shoot for zero rejections. They shoot for an acceptance rate that matches the six-sigma spread of a Normal Distribution.
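The fraction of a Normal population lying within k standard deviations of the mean is given by the error function, erf(k/√2). The short Python sketch below evaluates it for the spreads mentioned above; the function name is illustrative.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a Normal population within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 3, 6):
    print(k, fraction_within(k))
# k = 1 gives about 0.6827, k = 3 about 0.9973,
# and k = 6 differs from 1 by only about 2 parts in a billion.
```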

Design For Six Sigma

This approach, applied to design (and abbreviated to DFSS) has been used quite widely. It is worth noting that there are differences of opinion on the use of statistics in life-threatening situations. Richard Feynman wrote[22]:

“It appears that there are enormous differences of opinion as to the probability of a failure with loss of vehicle and of human life. The estimates range from roughly 1 in 100 to 1 in 100,000. The higher figures come from the working engineers, and the very low figures from management. …

Engineers at Rocketdyne, the manufacturer, estimate the total probability as 1/10,000. Engineers at Marshal estimate it as 1/300, while NASA management, to whom these engineers report, claims it is 1/100,000. An independent engineer consulting for NASA thought 1 or 2 per 100 a reasonable estimate.”

The cost of failure, clearly, is debatable. Whether a failure of an MP3 player is less critical than the failure of an axle of a car sometimes depends on who the owners are! However, remember that cost is not the only reason to adopt DFSS. The impossibility of eliminating variations in data means many problems cannot be treated as purely deterministic.

A Design Engineer using CAE has to accept the fact that there will be variations in any of

• Material properties
• Loads
• Boundary conditions
• Initial conditions
• Geometry errors
• Assembly errors
• Solver precision
• Choice of model (mesh, element type, algorithm, etc.)

[22] In his report on the Shuttle disaster of 1986. It makes excellent reading for engineers! See http://science.ksc.nasa.gov/shuttle/missions/51-l/docs/rogerscommission/Appendix-F.txt for details.

Noting that the aerospace industry emphasizes safety over cost, an extract from an aerospace-application conference[23] presentation serves to highlight the fact that cost is not the only motivation for stochastic analyses. In both tables, CV is the coefficient of variation.

Structural Material Scatter

Material                 Characteristic          CV
Metallic                 Rupture                 8-15%
                         Buckling                14%
Carbon Fiber             Rupture                 10-17%
Screw, Rivet, Welding    Rupture                 8%
Bonding                  Adhesive Strength       12-16%
                         Metal / metal           8-13%
Honeycomb                Tension                 16%
                         Shear, Compression      10%
                         Face-wrinkling          8%
Inserts                  Axial Loading           12%
Thermal Protection       In-plane Tension        12-24%
                         In-plane Compression    15-20%

Load Scatter

Load Type                           Origin Of Results            CV
Launch vehicle thrust               STS, Ariane                  5%
Launch vehicle quasistatic loads    STS, Ariane, Delta           30%
Transient                           Ariane 4                     60%
Thermal                             Thermal Tests                8-20%
Deployment Shocks (solar array)     Aerospatiale                 10%
Thruster Burn                       Calibration Tests            2%
Acoustic                            Ariane 4 and STS (flight)    30%
Vibration                           Satellite Tests              20%

[23] Klein M., Schueller G., “Probabilistic Approach to Structural Factors of Safety in Aerospace”, Proceedings of the CNES Spacecraft Structures and Mechanical Testing Conference, Paris, June 1994

We have already seen the issues related to the use of high-fidelity models, and have built methods to estimate the error if we use approximate models both to explore the design space (using DOE) and to optimize. To complete our toolbox, we also need a way to quantify the variation in response, given a variation in data. Called stochastic analysis, this helps us estimate the reliability of the design. If we deem the risk of failure too high, stochastic analysis also tells us which factors we should pay more attention to. These analyses are usually done using Monte Carlo methods. An enormous number of runs can be required to make the best use of statistical effects. Monte Carlo experiments frequently require thousands of runs.
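A Monte Carlo reliability estimate can be sketched in a few lines of Python. The scenario is hypothetical: strength and load are treated as independent Normal variables, the mean values (100 and 70, in arbitrary units) are invented, and the coefficients of variation (8% for strength, 30% for load) are borrowed from the scatter tables purely as plausible magnitudes. Because both variables are Normal, a closed-form failure probability exists and is printed alongside as a check on the sampling.

```python
import random
from math import erf, sqrt

def failure_probability(mean_strength, cv_strength, mean_load, cv_load,
                        runs=50_000, seed=0):
    """Fraction of random draws in which the load exceeds the strength."""
    rng = random.Random(seed)            # fixed seed so the sketch is repeatable
    sd_strength = cv_strength * mean_strength
    sd_load = cv_load * mean_load
    failures = 0
    for _ in range(runs):
        strength = rng.gauss(mean_strength, sd_strength)
        load = rng.gauss(mean_load, sd_load)
        if load > strength:
            failures += 1
    return failures / runs

# Hypothetical means; CVs of 8% and 30% chosen as representative scatter values.
estimate = failure_probability(100.0, 0.08, 70.0, 0.30)

# Closed-form check, valid only because both variables are Normal here.
beta = (100.0 - 70.0) / sqrt((0.08 * 100.0) ** 2 + (0.30 * 70.0) ** 2)
exact = 0.5 * (1.0 + erf(-beta / sqrt(2.0)))

print(estimate, exact)                   # both close to 0.09
print(1.0 - estimate)                    # estimated reliability
```

In a real study each “draw” would be a solver run (or an evaluation of an approximate model), which is why the thousands of runs mentioned above make approximate models so attractive for stochastic analysis.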

Summary

The first chapter discussed MOO, MDO, and non-linear models, and introduced the need for experiments. In the next three chapters, we looked at the mathematical principles involved. Our exploration first covered the applications of statistics and probability – DOE, ANOVA, Stochastics – to process control, which is usually addressed at the undergraduate level in relation to Quality Control. We then used the Engineering Design and Optimization context to understand how the same tools are applicable here too.

It is a simple logical extension, then, to conclude that CAE requires software that allows us to

• explore the design space
• search for optimum solutions
• estimate reliability
• perform trade-offs
• interface with multiple solvers

Depending on the problem at hand, of course, we will need to use one or more of these capabilities. But we certainly need these capabilities for our toolkit to be complete!

You'll never have all the information you need to make a decision. If you did, it would be a foregone conclusion, not a decision. David Mahoney


HyperStudy - Putting It All Together

We are now clear on two things. First, that we need to resort to advanced optimization tools if the design problem involves one or more of the following challenges:

• non-linear behavior, which results in large computation time, makes the use of gradient-search methods harder, and may give rise to multiple local minima
• multi-disciplinary analysis, with different solvers being used for different disciplines
• multi-objective-optimization, which calls for a best-compromise instead of a best-design
• stochastic behavior, which means reliability and robustness must be quantified.

Second, that CAE tools to address such problems must provide:
• support for multiple DOE models, in order to
    o explore the design-space
    o build approximate models
    o conduct Monte-Carlo type stochastic analyses
• the ability to evaluate and improve approximate models
• algorithms to search for the global optimum among local minima, either with or without an approximate model
• capabilities to include and interpret stochastic effects, including support for reliability estimation
• the ability to perform trade-offs
• the ability to build models for and extract results from multiple solvers


HyperStudy

With direct connections to Altair’s Solvers[24], interfaces to several other solvers, and the ability to interface with any solver, HyperStudy is a good way to achieve most of the requirements we’ve studied so far. If you have set up a model in HyperMesh, for example, you can invoke HyperStudy, which then has direct access to model variables.

If you’re using a solver that is not from Altair but that interfaces with HyperMesh and HyperView, then the approach is a little different. You would first create a data file for the solver (you could use HyperMesh to do this, though it’s not essential). Then you would read this data file in HyperStudy, and set up your study by choosing variables from the data file. HyperStudy uses its interfacing abilities both to generate additional data files for the solver (with changed data) for each experimental measurement, and to read in evaluated responses.

Finally, if you’re working with a solver that has no existing interface to HyperMesh or HyperView, you could use the Templex programming language to build a custom interface[25].

The assignments illustrate the steps in detail, so they will not be covered in this chapter. Here, we will look at the steps you should take to set up and review a problem. With the background laid out in the earlier chapters, you should be able to follow why we take the approaches described below. You should pay particular attention to the recommended methods to review results.

Before The Study

Remember that one of the main reasons we are using HyperStudy is that CAE is quite computationally intensive. It is prudent, therefore, to spend some time planning your attack strategy. Without adequate planning, it is easy to find you have invested more time and effort than you had intended to.

[24] Of which Finite Element analyses, kinematic and dynamic analyses of mechanisms, and sheet-metal forming simulation are covered by the other volumes of this series.
[25] This is covered in the volume titled Managing The CAE Process.


As with all experiments, gathering too much data is often as bad as gathering too little. The effort involved in collating and interpreting results is all too often underestimated. To make things easier for you, HyperStudy follows a “wizard” approach. That is, the interface provides the various functions as a step-by-step sequence, ensuring that you complete the steps in the correct order.

Even before you start building your model, you should be clear on the answers to the following:
1. Are the objectives and constraints suitable? Are they physically meaningful?
2. Can the responses be measured quantitatively? What accuracy is required for a meaningful experiment?
3. Are there any effects that should be deliberately blocked or omitted? Should a screening run be performed to verify assumptions on importance? What levels should be used for the screening run?
4. Which effects will be aliased or confounded by the screening run? Are they important?
5. What should the sequence of experiments be? Can the results of the first experiment be used to create the second, and so on?
6. How will findings be confirmed?

Also spend some time planning how you will collate and present results. FE models are often interpreted by displaying results on the 3D model of the component, but this may not be appropriate. Remember that you will have to review a much larger volume of results: you may have results of several dozen FE analyses! Also, there are numerous other forms of simulation and applications that do not use the kind of graphics that FEA uses. A polynomial transfer function may be the high-fidelity model!


It’s a good idea, therefore, to track the vital data only. Since dealing with a large volume of data is a common task, the accompanying assignments also introduce HyperGraph. A quick look at this application can help you plan your results-interpretation phase.

Performing The Study

There are 5 distinct steps in HyperStudy. The first is essential: called the Study Setup, this is where you define design variables and responses. The former are the factors in the experiments you will conduct.

Study Setup

A “study” is saved in an XML[26] file. Unless you’re working with a data management tool (such as Altair Data Manager) it’s a good idea to create a separate folder for each study. The definition of the study consists of
1. One or more high-fidelity models – an OptiStruct FEA model, a MotionSolve multi-body model, etc. Unless you’re performing an MDO, you will need only one model for the study.
2. The design variables, or factors. These can be continuous or discrete. A design variable can be a numerical value, as is common with numerical analyses. Since HyperStudy is a general-purpose tool, the design variable can be quite general - it can even be a text-string. Remember that the levels of a design variable are meaningful only if it is discrete. If you want to carry out a screening run with continuous design variables, you can define them as discrete for the purpose of the screening. If the variable is continuous, you specify its bounds instead of its levels. The subsequent DOE will decide which levels to use within these bounds, depending on the type of design specified for the experiment.
3. The responses. These are chosen from the results of the evaluation carried out by the high-fidelity model. For instance, the stress at a node can be selected either from a text-output file or a binary-results file.

[26] Short for Extensible Markup Language. Uses tags, similar to HTML files.


4. Optionally, design variables can be linked. This could either be a design requirement (for example the specification that the fillet radius be dependent on the thickness) or because multiple solvers are used.
5. Sensitivity. This is optional. All solvers calculate the sensitivity of responses to factors. Some solvers actually write this information out, so you can use it in a subsequent study.
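The distinction between continuous and discrete design variables, and the idea of a linked variable, can be sketched in a few lines of Python. This is only an illustration of the concepts listed above: the class `DesignVariable`, the helper `as_screening_variable` and the 0.5 ratio in `fillet_radius` are hypothetical names and values, and none of this is HyperStudy's own data model or file format.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple, Union

Value = Union[float, str]


@dataclass
class DesignVariable:
    """Hypothetical container for one factor (not a HyperStudy object)."""
    name: str
    bounds: Optional[Tuple[float, float]] = None   # continuous: (lower, upper)
    levels: Optional[Sequence[Value]] = None       # discrete: explicit levels

    def __post_init__(self):
        # A variable is either continuous (bounds) or discrete (levels).
        if (self.bounds is None) == (self.levels is None):
            raise ValueError(f"{self.name}: give exactly one of bounds or levels")

    @property
    def is_discrete(self) -> bool:
        return self.levels is not None


def as_screening_variable(var: DesignVariable) -> DesignVariable:
    """Treat a continuous variable as a 2-level discrete one for screening."""
    if var.is_discrete:
        return var
    low, high = var.bounds
    return DesignVariable(var.name, levels=(low, high))


def fillet_radius(thickness: float) -> float:
    """Linked variable: radius follows thickness (assumed ratio of 0.5)."""
    return 0.5 * thickness


thickness = DesignVariable("thickness", bounds=(1.0, 3.0))
material = DesignVariable("material", levels=("steel", "aluminium"))  # text levels
screening = as_screening_variable(thickness)
```

Here `screening` carries only the two bounds as its levels, which is what defining a continuous variable as discrete for a screening run amounts to.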

DOE Study

Once the study has been defined, you specify what type of experiment you want to use. In general, Fractional Factorial, Plackett-Burman, or D-Optimal are used for screening runs, while Box-Behnken, Central Composite, or D-Optimal are used to construct response surfaces. Remember that
• Full Factorial is not recommended if the number of factors exceeds 5, since the combination of the number of factors and their levels can make it prohibitively expensive
• Fractional Factorial is often used with just 2 levels for each factor
• Taguchi does not take into account interactions between variables (it uses “orthogonal arrays”)
• Plackett-Burman is called a geometric design if the number of runs is a power of 2. If the number of runs is a multiple of 4 but not a power of 2, it is called a non-geometric design.
• Central Composite is recommended for construction of second order response surfaces

For the study, you will need to distinguish between controlled and uncontrolled variables. The former consist of design variables that you want to manipulate as a part of your design, while the latter are due to uncontrollable noise. You can use different DOEs for the controlled and uncontrolled variables, depending on the amount of effort you can afford to spend.


Once you have done this, you fine-tune your experiment. Recall the table drawn up earlier when the Fractional Factorial design was discussed. Most DOEs are specified as a matrix, showing the levels that will be used for each factor, and which effects will be confounded. Fine-tuning means you can edit the matrix to change the allocations for each factor / level and the interactions between them. Once the DOE has been specified, HyperStudy runs the analyses and extracts responses.
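To see what such a matrix looks like, the sketch below builds a 2-level full factorial and a half fraction in coded levels (-1 and +1). The function names are illustrative only, and the choice of generator (last factor = product of the others) is an assumption made for the example, not a statement of what HyperStudy does internally.

```python
from itertools import product


def full_factorial(n_factors):
    """All 2**n combinations of the coded levels -1 and +1."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]


def half_fraction(n_factors):
    """2**(n-1) runs: a full factorial in the first n-1 factors, with the
    last factor set equal to the product of the others."""
    runs = []
    for base in product((-1, 1), repeat=n_factors - 1):
        last = 1
        for level in base:
            last *= level
        runs.append(list(base) + [last])
    return runs


full = full_factorial(3)   # 8 runs for factors A, B, C
half = half_fraction(3)    # 4 runs, with C = A*B

# In the half fraction the column for C is identical to the A*B interaction
# column, so the main effect of C cannot be separated from the AB interaction.
ab_column = [a * b for a, b, _ in half]
c_column = [c for _, _, c in half]
```

Comparing `ab_column` with `c_column` shows the confounding directly: the two columns are the same, which is the price paid for halving the number of runs.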

Approximation

If the DOE is to be followed by an optimization or a stochastic analysis, it’s a good idea to build an approximation. Even if these are not planned, the use of approximations for variable screening can provide very useful insight. You can define an approximation for any or all responses calculated by the DOE. Each response can have a different type of approximation. You can also build multiple approximations for each response, which you would do if you are unclear on which is best suited. Remember that a 2-level experiment can only support a linear-regression model.

The DOE points are now used to construct the approximation. You can use some of the points to define the approximation, and others to “validate” it. If, for instance, you have conducted 6 runs, you may use runs 1/3/5/6 to create the approximation. Then you can check the efficacy of the approximation by using runs 2/4 to validate it and calculate the residuals.

The next logical step is to perform an ANOVA of the variables to determine whether or not they should be retained for further studies. Also, you would normally perform trade-off studies to determine the impact of changes in factors on the objective or objectives – sometimes called “what-if studies”.
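The fit-then-validate idea can be sketched with a straight-line approximation. The six data points below are made up purely for illustration, and the least-squares routine is a plain textbook implementation, not HyperStudy's own fitting code.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Illustrative data only: run number -> (factor value, response).
runs = {1: (1.0, 2.1), 2: (2.0, 3.9), 3: (3.0, 6.2),
        4: (4.0, 8.1), 5: (5.0, 9.8), 6: (6.0, 12.1)}
fit_ids, check_ids = (1, 3, 5, 6), (2, 4)

# Build the approximation from runs 1/3/5/6 only.
intercept, slope = fit_line([runs[i][0] for i in fit_ids],
                            [runs[i][1] for i in fit_ids])

# Residual = observed response minus the response predicted by the fit,
# evaluated at the validation runs 2/4 that were held back.
residuals = {i: runs[i][1] - (intercept + slope * runs[i][0])
             for i in check_ids}
```

Small residuals at the held-back runs suggest the approximation is usable between the fitted points; large ones suggest trying a different type of approximation or adding runs.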

Optimization Study

Remember that an optimization is not the only reason HyperStudy is used. Therefore it’s logical that none of the previous steps defined any of the optimization-specific terms: constraints and objectives. Constraints are limits on the design variables – any design that crosses these limits is infeasible. Objectives are the quantitative measures of design quality. HyperStudy allows you to perform either constrained or unconstrained optimization, and allows you to work with either a single objective or multiple objectives.

HyperStudy provides several optimization algorithms. One, called the Adaptive Response Surface, uses the sequential-search method described earlier. The other methods (Sequential Quadratic Programming, Method of Feasible Directions, Genetic Algorithm and the completely general user-defined method) are beyond the scope of this book, and are described in the on-line help documentation.

You can choose to minimize or maximize the objective, to perform Min-Max optimization, or to perform a system identification. The last is when you want to minimize the deviation of the objective from a target value. In any case, you can choose to use either the high-fidelity model or the approximate model (provided one has been built, of course).

You will want to be judicious in your choice of iterations for the analysis. It is better to use a small limit to start with. Even if this is unsuccessful, you can use it to see how the optimization is progressing. Subsequent optimizations can restart from this analysis, meaning the initial investigation is not wasted.
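The sketch below illustrates two of the ideas above: a system identification (minimizing the deviation from a target value) and a restart after a run with a small iteration limit. The quadratic `response` is an assumed stand-in for an approximate model, and the step-halving search is a deliberately simple substitute for the algorithms named above, chosen only because it fits in a few lines.

```python
def response(x):
    """Stand-in approximate model (an assumed quadratic response surface)."""
    return 3.0 + 2.0 * x + 0.5 * x * x


def identify(target, start, step=1.0, max_iter=5):
    """Minimize (response(x) - target)**2 with a simple step-halving search.

    Returns the best x found, the step size reached and the remaining
    squared deviation, so a later call can restart where this one stopped.
    """
    def deviation(x):
        return (response(x) - target) ** 2

    x = start
    for _ in range(max_iter):
        candidates = (x - step, x, x + step)
        best = min(candidates, key=deviation)
        if best == x:
            step /= 2.0          # no neighbour is better: refine the step
        x = best
    return x, step, deviation(x)


# First attempt with a small iteration limit ...
x1, step1, dev1 = identify(target=10.0, start=0.0, max_iter=5)
# ... then restart from that result instead of starting from scratch.
x2, step2, dev2 = identify(target=10.0, start=x1, step=step1, max_iter=40)
```

The first call already shows how the search is progressing, and the second call reuses its end point and step size, so none of the early evaluations is wasted.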

Stochastic Study

A stochastic study, which generates the PDF of responses based on PDFs of design variables, can be performed directly after the Study Setup is complete. Normal, Uniform, Triangular, Exponential and Weibull distributions are supported. A DOE is not required, since the sampling method is specified here: you can choose between
• a simple random sample, which is the basic Monte Carlo method. A large number of runs is required for meaningful results
• a Latin Hypercube sample, which reduces the number of runs by distributing samples using the PDF of the variables
• a Hammersley sample, which is an improvement over the Latin Hypercube while still being less expensive than the simple Monte Carlo

Once this is done, HyperStudy evaluates the responses using either the high-fidelity model or the approximation, depending on your choice. That is, you explicitly tell HyperStudy which to use.
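The difference between a simple random sample and a Latin Hypercube sample can be sketched for a single variable as follows. The Normal distribution chosen for the thickness, the sample size of 20 and the function names are assumptions made for the example; the Hammersley sample is not shown.

```python
import random
from statistics import NormalDist, mean


def _clip(p):
    """Keep a probability strictly inside (0, 1) so inv_cdf is defined."""
    return min(max(p, 1e-12), 1.0 - 1e-12)


def simple_random_sample(dist, n, rng):
    """Basic Monte Carlo: n independent draws from the distribution."""
    return [dist.inv_cdf(_clip(rng.random())) for _ in range(n)]


def latin_hypercube_sample(dist, n, rng):
    """One draw from each of n equal-probability slices of the PDF."""
    probs = [_clip((i + rng.random()) / n) for i in range(n)]
    rng.shuffle(probs)   # remove the ordering before pairing with other variables
    return [dist.inv_cdf(p) for p in probs]


rng = random.Random(0)                      # fixed seed, for repeatability only
thickness = NormalDist(mu=2.0, sigma=0.1)   # assumed PDF of a design variable

mc = simple_random_sample(thickness, 20, rng)
lhs = latin_hypercube_sample(thickness, 20, rng)
lhs_mean = mean(lhs)
```

Because every slice of the distribution is visited exactly once, the Latin Hypercube sample covers the tails and the centre even with few runs, whereas a simple random sample of the same size may leave whole regions unsampled.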

Reviewing Results

Statistics are notorious for their ability to allow the analyst to draw a variety of different conclusions from the same data. Before discussing what to review and how, it’s useful to summarize the motivations for each type of study.

DOE

The goals usually are one or more of the below:
1. to screen variables by looking for correlation between factors and responses, usually by running a fractional / reduced experiment for a large number of factors with a few levels (as low as 2) for each
2. to detect interactions between variables, usually by running a full factorial experiment after a screening run, with a small number of factors and more levels than a screening experiment
3. to construct an approximation for an optimization or a stochastic study

Approximation

The main reasons we choose to build approximate models are:
1. to screen variables by using an ANOVA to detect significance of effect on response for a given confidence
2. to provide a well-behaved model that can be used for optimization or stochastic studies
3. to perform trade-off studies


Optimization

An optimization is just a way to locate a particular point on the response curve or surface. The two reasons we search for such a point are
1. to locate a global minimum, normally used for a computationally demanding model, an MDO or an MOO
2. to locate values for the design variables so that a target value is matched as closely as possible

Stochastic Study

While every problem is non-deterministic in reality, remember that it is often possible to get a good answer using a deterministic model and applying an appropriate factor of safety. It’s when the factor-of-safety approach is either too expensive or covers up too much detail that we turn to stochastic analyses. The main goals are:
1. to evaluate robustness of the design by comparing the coefficient of variation of the responses and the coefficient of variation of the variables
2. to estimate reliability of the design by calculating the probability that responses lie in selected bands
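Both goals reduce to simple calculations on the sampled values. In the sketch below the response model (axial stress in a bar), the distributions of load and area, and the allowable stress of 130 are all assumed values chosen for illustration.

```python
import random
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (a relative measure of scatter)."""
    return stdev(values) / abs(mean(values))


def probability_in_band(values, lower, upper):
    """Fraction of sampled responses that fall inside [lower, upper]."""
    inside = sum(1 for v in values if lower <= v <= upper)
    return inside / len(values)


def stress(load, area):
    """Stand-in response: axial stress in a bar (assumed model)."""
    return load / area


rng = random.Random(1)                                    # fixed seed
loads = [rng.gauss(1000.0, 100.0) for _ in range(5000)]   # N, assumed PDF
areas = [rng.gauss(10.0, 0.2) for _ in range(5000)]       # mm^2, assumed PDF
stresses = [stress(p, a) for p, a in zip(loads, areas)]   # MPa

cov_load = coefficient_of_variation(loads)
cov_stress = coefficient_of_variation(stresses)
reliability = probability_in_band(stresses, 0.0, 130.0)   # assumed allowable
```

If the coefficient of variation of the response is not much larger than that of the variables, the design does not amplify scatter; the band probability is a direct estimate of reliability for the chosen limit.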

Interpreting Graphics Displays

Histograms and ant-hill plots are the principal means of presenting data in an easy-to-comprehend fashion. An ant-hill plot, also called a scatter diagram, plots markers on a graph without fitting any curve. If one axis is a design variable and another is a response, a quick look at the plot indicates the presence or absence of any correlation. As an example, consider the plots shown below. In both, the y-axis is a response, while the x-axis of each is a different design variable. The plot on the right shows a negative correlation. As the design variable increases, the response decreases. If the points were distributed more or less parallel to the x-axis, we would conclude that the response is independent of the design variable.


The plot on the left shows a variation, but no discernible pattern. In all likelihood, there is another factor that is causing the change in the response. The advantage of using an ant-hill is that it allows for a reasonable comparison between a computer-experiment and a physical experiment.

Comparing individual values is normally not very realistic, since it doesn’t allow for “random” variations – that is, noise, which shows up as lack of repeatability in a physical test.

Histograms are bar-charts, with data clubbed into “classes”. Classes are also called “groups”, “buckets” or “bins”. In general, the more the bins, the finer the resolution of the data. Of course, you need to have a large number of runs for the number of bins to be high. Consider a two-level experiment with a finite number of trials. The results of this are best shown as a histogram. If the number of trials approaches ∞, the distribution approaches a Normal distribution and the histogram approaches a density function. Connecting the tops of each bin gives the density functions – PDFs and CDFs.
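What the eye reads off an ant-hill plot or a histogram can also be computed. The sketch below calculates the correlation coefficient for a scatter of points and counts values into equal-width bins. The data are made up for illustration, and equal-width binning is just one common choice.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient, between -1 and +1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def histogram(values, n_bins):
    """Count how many values fall in each of n_bins equal-width classes."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The largest value sits on the top edge; place it in the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Illustrative data only.
design_variable = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
response = [9.8, 8.1, 6.9, 5.2, 3.9, 2.1]

r = correlation(design_variable, response)   # strongly negative for this data
counts = histogram(response, 3)
```

With only six runs, three bins is about as fine as the histogram can usefully be, which is the point made above about needing many runs before many bins are meaningful.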

HyperGraph

HyperStudy provides quick and easy graphical display of most data, but there are times when you may want to generate different plots or forms of display of data. HyperGraph provides support for direct import of data from other HyperWorks applications, including HyperStudy.


Additionally, analytical curves can be plotted in HyperGraph, allowing for useful correlations between analytical models or theoretical values and the results of numerical experiments. A brief introduction to HyperGraph is provided with the assignments that accompany the Instructor’s Manual.

You got to be very careful if you don't know where you're going, because you might not get there. Yogi Berra


Glossary And References

ANOVA

Short for Analysis of Variance. Related terms are MANOVA (for Multivariate ANOVA), ANCOVA (for analysis of covariance) and MANCOVA.

Ant-hill Plot

Also called a scatter plot, this is a graph showing the change in one variable as the other changes. It is used to look for a correlation between the variables.

Confidence Interval

The interval within which a variable may lie with a particular confidence. A 95% confidence is often used in engineering applications.

Constraint

A limit on a design variable.

Convex Function

A function that has only one minimum in the domain. This minimum is the “global minimum”.

Correlation

Usually expressed as a coefficient which is normalized to lie between –1 and +1, indicates whether two variables are linked or not. A value of 0 implies there is no linear relationship between the variables. A value of +1 means a perfect positive correlation – that is, they increase together at the same rate. A value of –1 means as one increases the other decreases, at the same rate. The weaker the relationship between variables, the more the samples required to prove the existence of the relationship. If there’s no relationship, the sample will need to be ∞, i.e. the whole population.

Dependent Variable

The responses that are measured by the experimenter, who varies the independent variables as per a plan. The plan is called the DOE.

Design Variable

See “variable”

Factor

See “variable”.

Global Variable

If an analysis involves multiple solvers (as in MDO), a global variable is one that is relevant in all contexts.

Independent Variable

The factors that are manipulated or varied by the experimenter.

Interaction

Means that two or more independent variables are linked, not independent of each other. This does not mean there’s a cause-and-effect relationship. It only means that changing one means the other changes too. If an interaction is detected, further study is required to understand if there’s a cause-and-effect linkage between these two, or perhaps a third independent variable that’s the cause.

Latin Square

A square array in which each letter or symbol appears exactly once in each row and column. Used to design experiments by treating the different rows as levels of the first factor, the columns as levels of the second factor and the contents of each cell of the array as levels of the third factor.

Level

Values that a factor can take. Can be qualitative (“good” or “bad”) or quantitative. In CAE, levels are quantitative.

MDA

Multi Disciplinary Analysis

MDO

Multi-disciplinary Optimization. Used, for example, when your product needs to be designed for optimal performance as a mechanism and as a structure.

Min-Max

A formulation in which the maximum value of several responses is minimized.

MOO

Multi Objective Optimization

OFAT

One Factor At a Time. Method of testing for effect of variables on responses. Easier to do than a DOE, but may miss interactions if they exist.

Orthogonal Array

In the context of DOE, refers to a table of rows and columns such that for any pair of columns (i.e. factors) all combinations of levels occur, and all occur the same number of times.

Outlier

A point in the design space that does not follow the general pattern. This may be either due to noise, in which case the point should be ignored, or due to actual behavior of the system. In the latter case, the designer will need to decide how this affects design decisions.

Pareto Frontier

Relevant to MOO. Plot showing Pareto Optimal designs. Helps choose the best compromise, often on non-engineering bases.

Pareto Optimal

A design configuration in which none of the objectives can be improved without worsening at least one other objective.

RBDO

Reliability Based Design Optimization

Regression

Usually linear, it’s an equation that links two correlated variables. We calculate the regression line only after calculating the correlation coefficient and ensuring that the variables are indeed correlated.


Residual

The difference between observed and predicted values. The smaller the residual, the better the model used for prediction. For instance, you will want to check the difference between the approximation and the high-fidelity model, to check whether you can reliably use the approximation. Measures the goodness-of-fit.

Resolution

Measure of the ability of a DOE to capture interactions. The higher the resolution, the better the capability. A full-factorial design has an ∞ resolution. Confounding reduces the resolution of an experiment. In practice, a resolution of 5 is excellent, while a resolution of 3 is sufficient for screening.

Response Surface

In the absence of a continuous function relating the objective to design variables, numerical experiments can be used to generate a table of objective-function values vs. design-variable values. A surface fitted through this table of points, called the Response Surface, is then used to find optimal locations.

Robust Design

A design method to reduce sensitivity of the design to inherent unpredictability of design parameters.

Saturated Design

A DOE in which the number of evaluation points equals the number of unknown coefficients in the approximation. It is not possible to test for a lack of fit.

Sensitivity

Rate of change. Normally = gradient of the response with respect to the design variable(s).

Significance

See Statistical Significance

Statistical Significance

A measure of whether a relationship between variables is a result of chance or not. Usually measured as a p-value: the probability of observing a relationship at least as strong as the one measured if, in reality, no relationship existed (that is, if the result were due only to luck, such as sampling error). A p-value of 0.05 means such a result would arise by luck about 5% of the time. A p-value of 0.9 means it would arise by luck about 90% of the time. Obviously a smaller p-value is indicative of a higher significance. Tests of significance depend on the sample size. See correlation.

Stochastic

Something that involves chance or probability, but with an overall and measurable trend or direction – this makes it possible to predict the behavior. Engineers frequently encounter stochastic processes and stochastic variables.

Variable

Also called factor. Anything that we can measure or control in an experiment. Also see dependent and independent variables. In engineering design, we usually use the term design variable.


Worst Case Design

A formulation in which the objective is minimized with respect to some variables and maximized with respect to others. This is not the same as a Min-Max design, where the minimization or maximization is done with respect to the same variables.

References

Hill, T. & Lewicki, P. (2006). Statistics Methods and Applications. (http://www.statsoft.com/textbook/stathome.html)
Statistics For The Utterly Confused, L.R.Jaisingh
The Theory Of The Design Of Experiments, D.R.Cox and N.Reid
Total Quality Management, D.H.Besterfield, C.Besterfield, G.H.Besterfield, M.Besterfield-Sacre
NIST/SEMATECH e-Handbook of Statistical Methods, (http://www.itl.nist.gov/div898/handbook)

Other Resources

www.altair-india.com/edu, which is periodically updated, contains case studies of actual usage. It also carries tips on software usage.

Which Distribution Should You Use?

Judicious use of distributions helps you make decisions in real life. For example, suppose a showroom sold, on average, 17 green cars a day in the past. What is the probability that 20 will be sold tomorrow? Many scenarios are documented in literature: for instance, failure processes are generally described using the Weibull distribution, and repair processes by the Lognormal distribution. In case of a lack of clarity on which distribution to use, one approach is to fit a curve to the data and then choose the closest distribution.

The Normal Distribution

z is used to denote the standard normal variable, which has a mean = 0 and a standard deviation = 1. For example, using the table below, the cell marked with an “&” (0.7257) gives the probability that the random variable will be < 0.6, while the cell marked with a “$” (0.7357) is the probability that z will be < 0.63. Each row gives z to one decimal place, and each column adds the second decimal place.

z     0        0.01     0.02     0.03     0.04     0.05     0.06     0.07     0.08     0.09
0.0   0.5000   0.5040   0.5080   0.5120   0.5160   0.5199   0.5239   0.5279   0.5319   0.5359
0.1   0.5398   0.5438   0.5478   0.5517   0.5557   0.5596   0.5636   0.5675   0.5714   0.5753
0.2   0.5793   0.5832   0.5871   0.5910   0.5948   0.5987   0.6026   0.6064   0.6103   0.6141
0.3   0.6179   0.6217   0.6255   0.6293   0.6331   0.6368   0.6406   0.6443   0.6480   0.6517
0.4   0.6554   0.6591   0.6628   0.6664   0.6700   0.6736   0.6772   0.6808   0.6844   0.6879
0.5   0.6915   0.6950   0.6985   0.7019   0.7054   0.7088   0.7123   0.7157   0.7190   0.7224
0.6   0.7257&  0.7291   0.7324   0.7357$  0.7389   0.7422   0.7454   0.7486   0.7517   0.7549
0.7   0.7580   0.7611   0.7642   0.7673   0.7704   0.7734   0.7764   0.7794   0.7823   0.7852
0.8   0.7881   0.7910   0.7939   0.7967   0.7995   0.8023   0.8051   0.8078   0.8106   0.8133
0.9   0.8159   0.8186   0.8212   0.8238   0.8264   0.8289   0.8315   0.8340   0.8365   0.8389
1.0   0.8413   0.8438   0.8461   0.8485   0.8508   0.8531   0.8554   0.8577   0.8599   0.8621
1.1   0.8643   0.8665   0.8686   0.8708   0.8729   0.8749   0.8770   0.8790   0.8810   0.8830

The notation X = N(µ, σ) means that X is a random variable that follows a Normal Distribution, and has a mean = µ and a standard deviation = σ. Using Excel, the formula NORMSDIST(0.63) gives you 0.7357. For a general normal distribution with a mean µ and a standard deviation σ, the formula NORMDIST(x, µ, σ, TRUE) gives the cumulative probability that X will be < x. With µ = 0 and σ = 1, NORMDIST(0.63, 0, 1, TRUE) gives 0.7357 again. To do this manually, convert the random variable to the standard form by subtracting the mean and dividing by the standard deviation. That is,

z = (x − µ) / σ

Then use this transformed variable z with the above table to find P(z). You can, of course, use the Normal Table backwards: that is, you can look up the value that z should have for, let’s say, a 95% probability of occurrence. Remember that the values above are cumulative densities. That is, they represent the integral from –∞ to the given value.

Some useful properties of the Normal Distribution: if X = N(µ1, σ1) and Y = N(µ2, σ2) are independent, then each of the following is also Normal. Note that it is the variances that add, so the standard deviation of each result is the square root of the variance shown.

X – Y:    mean = µ1 – µ2,     variance = σ1² + σ2²
X + Y:    mean = µ1 + µ2,     variance = σ1² + σ2²
aX + bY:  mean = aµ1 + bµ2,   variance = a²σ1² + b²σ2²
aX:       mean = aµ1,         variance = a²σ1²
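The table lookups and the properties above can be checked with Python's standard `statistics.NormalDist` class, as an alternative to Excel. The means and standard deviations used for X and Y below are assumed example values.

```python
from statistics import NormalDist

z = NormalDist()               # standard normal: mean 0, standard deviation 1
p_063 = z.cdf(0.63)            # same lookup as the table cell for z = 0.63
z_95 = z.inv_cdf(0.95)         # the table used "backwards"

# General normal variable: standardize by hand and compare with a direct call.
x_dist = NormalDist(mu=50.0, sigma=4.0)     # assumed example values
x = 52.52
by_hand = z.cdf((x - x_dist.mean) / x_dist.stdev)
direct = x_dist.cdf(x)

# Sum and difference of independent normal variables: the variances add,
# so the standard deviation is sqrt(4**2 + 3**2) = 5 in both cases.
y_dist = NormalDist(mu=30.0, sigma=3.0)     # assumed example values
total = x_dist + y_dist
difference = x_dist - y_dist
```

`NormalDist` stores the standard deviation, not the variance, so `total.stdev` and `difference.stdev` are the square roots of the summed variances.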

The F Distribution

This is the distribution of the ratio of two estimates of the variance. It results when variables that follow a normal distribution are sampled, and the measured values are squared and summed. The distribution depends on the “degrees of freedom”: that is, the number of levels (within groups) and the number of samples (between groups). The variability within a group is assumed to occur because of error, while the variability across groups is assumed to occur because of true variance and error. ANOVA is a way to separate the error from the true variance.
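The ratio described above can be computed directly for a one-way ANOVA. The sketch below uses made-up responses at three levels of a single factor; the function name and the data are illustrative only, and the resulting ratio would still have to be compared with an F table for the two degrees of freedom.

```python
def f_ratio(groups):
    """One-way ANOVA: between-group variance estimate divided by the
    within-group variance estimate, plus the two degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variability across groups: true effect plus error.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variability inside each group: error only.
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Illustrative responses at three levels of one factor (made-up numbers).
groups = [[10.1, 9.9, 10.0], [12.0, 12.2, 11.8], [14.1, 13.9, 14.0]]
f_value, df1, df2 = f_ratio(groups)
```

A ratio far above 1, as in this example, indicates that the variation between levels is much larger than the scatter within a level, so the factor has a significant effect on the response.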
