Failure Mode and Effects Analysis (FMEA) is a systematic approach for performing failure analysis of a product, process, or system. An FMEA begins by evaluating all possible failure modes of the item being analyzed, and then determining the possible causes and resulting effects of those failures. You then assess the risk level associated with each of the failure modes, based on a set of established criteria. Finally, you find ways to detect, mitigate, or prevent the failures deemed most critical.
FMEA is a structured, step-by-step approach to identifying potential failures of a product, service, design, or process, and then addressing the failures that represent the highest risk. FMEA was first developed by the U.S. military in the late 1940s. Since then, it has been adopted as a valuable tool to help companies establish and meet their quality objectives.
FMEAs help to support and advance sound engineering design principles. FMEA is not meant to replace good engineering design, but to augment design practices by identifying and addressing potential design deficiencies. The effective use of FMEAs results in high-quality products, a central aim for reliability- and quality-conscious organizations. This is why you see FMEAs performed across a broad range of industries, and why FMEA remains one of the most popular methodologies employed in reliability engineering.
Many types of FMEA exist. Although there are differences in use and implementation, all FMEAs have the same essential goal. The aim is to assess risk and to detect, mitigate, or eliminate critical risk events.
Some examples of FMEA types include:
- Design (DFMEA): A DFMEA takes place during the design stage of a product. It aims to detect and mitigate the effects of failures before a product is manufactured and shipped out.
- Process (PFMEA): A PFMEA is performed on a process (as opposed to a design). The goal is to identify problems in a process and eliminate them.
- Manufacturing: A manufacturing FMEA is a type of PFMEA that analyzes risks that are part of the manufacturing process of a product.
- Service: A service FMEA performs the steps of FMEA on a service provided by your company.
- Software: Software FMEAs analyze potential failures with software and the effects of those failures on a system.
- System or functional: A system or functional FMEA analyzes the function of a system. A system or functional FMEA might be performed to assess the function of a design before the design itself is finalized.
- FMECA: A Failure Mode, Effects, and Criticality Analysis (FMECA) is a type of FMEA that assesses failure risk through the use of criticality values based on failure rates of system components.
- FMEA-MSR: FMEA Monitoring and System Response (FMEA-MSR) was introduced in the 2019 AIAG & VDA FMEA Handbook. FMEA-MSR analyzes the risk of failure while a product is in use by a customer.
Many companies create custom FMEAs based on their specific needs. Often a company will take an FMEA standard that defines how an FMEA should be completed (such as ARP5580, AIAG, AIAG & VDA, MIL-STD-1629A, or SAE J1739) and adapt it to their requirements.
Typically, FMEAs are done in a worksheet format using a team-based approach. The worksheet compiles all the potential ways the system may fail. The team then determines the causes and resulting effects of those failures, and reviews the effects to determine the level of risk associated with each.
There are various ways to assess risk. Two commonly used FMEA risk assessment methods are Risk Priority Number (RPN) and Action Priority (AP). RPN is based on Severity, Occurrence, and Detection ratings for each item. The AP metric was introduced in the 2019 AIAG & VDA FMEA Handbook, and its values can be Low, Medium, or High. Alternatively, FMECA uses criticality numbers to assess risk.
By categorizing risk levels, the team can determine which items require action and then develop a plan accordingly.
Sometimes the FMEA worksheet is completed in a spreadsheet application. However, analysts typically turn to FMEA-specific software applications, which offer more power and efficiency than spreadsheets. FMEA software is designed expressly to guide the FMEA process and provide the features necessary for complete analysis.
Lastly, no matter what type of FMEA is being used, it is most effective when viewed as a living document. It should be continually updated as new information emerges, processes change, or product design evolves.
Quality and reliability are essential drivers for any company, and FMEAs are a proven and effective technique used throughout industries all over the world to help achieve those objectives.
Some governmental and regulatory bodies require FMEAs, as do certain industries, such as medical devices and aerospace. In other sectors, such as the automotive industry, FMEAs are widely accepted and considered essential. In fact, the automotive industry is a leading driver in the development of FMEA protocols and procedures.
Even when FMEAs are not required for compliance reasons, organizations often decide to use them as a standard practice for quality improvement objectives. Many companies worldwide recognize the value of FMEAs, especially as it relates to the cost of poor quality (COPQ).
Some of the benefits of using FMEA are:
- Improves products: Using FMEA before a product hits the market gives you the ability to fine-tune the design for optimal quality.
- Enhances quality and reliability: FMEAs improve the quality of a product or service by ensuring you have taken the time to assess potential problem areas and address them before they become safety issues or catastrophic failures.
- Offers a baseline: FMEAs give you a baseline to refer to when developing new products by providing a lessons-learned reference to avoid repeating past mistakes.
- Boosts customer satisfaction: Product reliability consistently rates as one of the most important elements in customer satisfaction. FMEAs help to ensure you are delivering products customers are happy with.
How does FMEA work?
Although the details vary from one company to another, the steps in the process are generally similar across all types of FMEA. For best results, FMEA should be a collaborative process that takes place across the lifecycle of a product or process, from design to deployment.
1. Identify Potential Failures
You start an FMEA by breaking down the item you are analyzing, whether it is a product or process, into its component parts. In the case of a product, the breakdown may delineate each of the hardware components that make up the system. For a process, the breakdown delineates the steps of the process. You decide how finely itemized the breakdown is depending on how detailed you want your FMEA to be.
You then systematically go through those elements and determine the failure modes, or the possible ways the given item can fail. A team approach is helpful at this point to brainstorm all potential failure modes.
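As a rough sketch of the breakdown described above, the item can be modeled as a tree and walked to a chosen depth, so that the depth controls how finely itemized the FMEA is. The `Item` class, the `items_to_analyze` function, and the product structure are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    """One node in the breakdown: a component, subassembly, or process step."""
    name: str
    children: list[Item] = field(default_factory=list)


def items_to_analyze(root: Item, max_depth: int) -> list[str]:
    """Return the names of the items to analyze, stopping at max_depth.

    A node is selected if it is a leaf or sits at the depth limit, so a
    smaller max_depth gives a coarser FMEA and a larger one a finer FMEA.
    """
    selected: list[str] = []

    def walk(node: Item, depth: int) -> None:
        if not node.children or depth == max_depth:
            selected.append(node.name)
            return
        for child in node.children:
            walk(child, depth + 1)

    walk(root, 0)
    return selected


# Hypothetical product breakdown, for illustration only.
device = Item("Device", [
    Item("Power supply", [Item("Transformer"), Item("Fuse")]),
    Item("Motherboard"),
])

coarse = items_to_analyze(device, max_depth=1)  # ["Power supply", "Motherboard"]
fine = items_to_analyze(device, max_depth=2)    # ["Transformer", "Fuse", "Motherboard"]
```

Each selected item would then get its own set of failure modes in the worksheet.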
2. Analyze Causes and Effects of the Failures
After you’ve identified the possible failure modes, the next step is to identify the possible causes of the failures and the effects of those failures on your system or process. There may be more than one cause and more than one effect for each failure mode.
Some questions to consider when identifying the causes:
- What components of the system may be at fault?
- Are there environmental factors that are a concern?
- How could a customer mistakenly cause the failure?
Once again, a team approach for cause analysis is helpful in order to fully identify all possible causes. For example, causes could be:
- Defective component on motherboard
- Power surge
- Customer forgot to calibrate the device prior to use
Some questions to ask when analyzing the effects of failure include:
- What happens when the failure takes place?
- How does the failure impact a customer?
- How much waste is generated by the failure?
Effects will be unique to your organization and analysis. For example, effects may be something like:
- Operator injury
- Strange noises
- System crash
There may even be “No effect” in some cases.
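One way to record this step is a small structure in which each failure mode carries any number of causes and effects. The sketch below assumes a worksheet layout with one row per cause and effect pair, which is only one of several possible layouts. The `FailureMode` class and the failure mode "Device does not power on" are hypothetical; the causes and effects are taken from the example lists above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FailureMode:
    """A failure mode with any number of causes and effects."""
    item: str
    description: str
    causes: list[str] = field(default_factory=list)
    effects: list[str] = field(default_factory=list)

    def worksheet_rows(self) -> list[tuple[str, str, str, str]]:
        """Expand into one row per (cause, effect) pair.

        If no effect was identified, the row records "No effect".
        """
        effects = self.effects or ["No effect"]
        return [
            (self.item, self.description, cause, effect)
            for cause in self.causes
            for effect in effects
        ]


# Hypothetical failure mode; the pairing of causes and effects is illustrative.
mode = FailureMode(
    item="Motherboard",
    description="Device does not power on",
    causes=["Defective component on motherboard", "Power surge"],
    effects=["System crash"],
)

rows = mode.worksheet_rows()  # two rows: one per cause, each with "System crash"
```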
3. Rank the Risk Level
Once you have the failure modes and effects recorded, the next step is to determine the risk level associated with each item. Risk can be assessed in more than one way, and you can decide which is the best approach for your organization.
One commonly used approach for risk assessment is RPN, or Risk Priority Number. RPN is often used in DFMEAs, PFMEAs, and FMEAs based on automotive standards. Three factors are taken into account to compute RPN, and you assess each of these for each item in your FMEA:
- Severity: Denotes the seriousness of the problem if it happens, with a focus on the consequences. The higher the number, the greater the severity.
- Occurrence: Denotes how likely the issue is to occur. To determine the rate of occurrence, you’ll want to look at all the potential causes of a failure and the chance that those causes will occur.
- Detection: Denotes how easy or difficult it is to identify the problem. A higher rating means an issue is less likely to be detected either by engineers during the test phases of product development or by customers.
RPN is usually calculated as Severity * Occurrence * Detection. Using a 1 to 10 scale for each of these results in RPN values ranging from 1 to 1000.
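The calculation is simple enough to sketch directly. The function below assumes the 1 to 10 scale mentioned above and rejects ratings outside it; the function name `rpn` is illustrative.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number on a 1-10 scale for each factor."""
    ratings = (
        ("severity", severity),
        ("occurrence", occurrence),
        ("detection", detection),
    )
    for name, value in ratings:
        if not 1 <= value <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {value}")
    return severity * occurrence * detection


lowest = rpn(1, 1, 1)       # 1
highest = rpn(10, 10, 10)   # 1000
```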
Another method of risk assessment is Action Priority, or AP. AP was introduced in the 2019 AIAG & VDA FMEA Handbook. AP uses the same Severity, Occurrence, and Detection factors that RPN is based on. However, instead of multiplying them, AP uses them to assign a rank to each item:
- High (H): Action is needed to correct or review the issue.
- Medium (M): Action should be taken to review or correct the concern.
- Low (L): Action could be taken to review the issue, but it is low priority.
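The sketch below shows only the shape of an AP assessment: the three ratings go in, and a rank of H, M, or L comes out. The handbook's actual lookup tables are not reproduced here. The thresholds in `example_ap_rule` are invented purely for illustration and should be replaced with the tables from the standard you follow.

```python
from typing import Callable

# Rank descriptions as given in the list above.
AP_MEANING = {
    "H": "Action is needed to correct or review the issue.",
    "M": "Action should be taken to review or correct the concern.",
    "L": "Action could be taken to review the issue, but it is low priority.",
}


def example_ap_rule(severity: int, occurrence: int, detection: int) -> str:
    """INVENTED thresholds for illustration only; not the handbook's tables."""
    if severity >= 9 and occurrence >= 4:
        return "H"
    if severity >= 5 and occurrence >= 4 and detection >= 5:
        return "M"
    return "L"


def action_priority(
    severity: int,
    occurrence: int,
    detection: int,
    rule: Callable[[int, int, int], str] = example_ap_rule,
) -> tuple[str, str]:
    """Return the AP rank and its meaning, using a pluggable ranking rule."""
    rank = rule(severity, occurrence, detection)
    return rank, AP_MEANING[rank]


# Both items have an RPN of 80, yet the invented rule ranks them differently.
high_severity = action_priority(10, 4, 2)  # rank "H" under the invented rule
low_severity = action_priority(2, 4, 10)   # rank "L" under the invented rule
```

Passing the rule in as a parameter keeps the ranking logic separate from the worksheet code, so the real tables can be dropped in without other changes.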
Criticality is a numerical approach to risk assessment typically used in FMECAs based on MIL-STD-1629A. Both failure mode criticality and item criticality values are computed based on failure rates, failure mode probability, and failure mode percentage rates.
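A minimal sketch of a criticality calculation follows. It assumes the quantitative form commonly associated with MIL-STD-1629A, in which failure mode criticality is the product of the failure effect probability (beta), the failure mode ratio (alpha), the part failure rate, and the operating time, and item criticality is the sum over the item's failure modes. The standard groups that sum by severity classification, which this sketch omits. All names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModeData:
    """Inputs for one failure mode (all values here are illustrative)."""
    beta: float            # conditional probability that the effect occurs
    alpha: float           # fraction of the item's failures in this mode
    failure_rate: float    # item failure rate, in failures per hour
    operating_time: float  # operating time, in hours


def mode_criticality(mode: ModeData) -> float:
    """Failure mode criticality: beta * alpha * failure rate * time."""
    return mode.beta * mode.alpha * mode.failure_rate * mode.operating_time


def item_criticality(modes: list) -> float:
    """Item criticality: sum of the criticalities of the item's failure modes."""
    return sum(mode_criticality(m) for m in modes)


# Hypothetical item with two failure modes whose ratios sum to 1.
modes = [
    ModeData(beta=0.5, alpha=0.25, failure_rate=0.001, operating_time=1000.0),
    ModeData(beta=1.0, alpha=0.75, failure_rate=0.001, operating_time=1000.0),
]

per_mode = [mode_criticality(m) for m in modes]  # approximately [0.125, 0.75]
total = item_criticality(modes)                  # approximately 0.875
```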
Using Other Measures
Organizations often define their own risk assessment protocol. For example, some may tweak the RPN calculation, or customize the Severity, Occurrence, and Detection values that typically range from 1 to 10. Others may overweight Severity or ignore Detection. In some cases, they may define a completely new metric.
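Two of the variations mentioned above can be sketched in a few lines. Both schemes below are hypothetical examples of a custom metric, not formulas taken from any standard: one overweights Severity by raising it to a power, and the other ignores Detection.

```python
def weighted_rpn(severity: int, occurrence: int, detection: int,
                 severity_exponent: int = 2) -> int:
    """Overweight Severity by raising it to a power (hypothetical scheme)."""
    return severity ** severity_exponent * occurrence * detection


def severity_occurrence(severity: int, occurrence: int) -> int:
    """Ignore Detection and rank on Severity and Occurrence only."""
    return severity * occurrence


# Two hypothetical items with the same standard RPN (8*2*5 == 2*8*5 == 80).
severe_but_rare = weighted_rpn(8, 2, 5)   # 640
mild_but_common = weighted_rpn(2, 8, 5)   # 160
```

With the standard RPN the two items tie at 80, while the Severity-weighted variant puts the severe item clearly first.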
Whatever method you choose, the idea is to be able to rank the items in your FMEA in order to prioritize the work necessary to make product or process changes in order to lower risk.
4. Come up with a Recommended Action Plan
Once the potential problems are ranked, you have a starting point to determine which items to focus on for improvement. For example, if you are using RPN, which ranges from 1 to 1000, you may decide that all items with an RPN over 300 must be addressed. Or, you may use AP and decide that all items designated as High must be changed in order to lower their AP to Medium.
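The RPN threshold rule described above might be sketched as a filter over worksheet lines, sorted so that the highest risk comes first. The `WorksheetLine` class, the failure mode names, and the ratings are hypothetical; the threshold of 300 is the example value from the text.

```python
from dataclasses import dataclass


@dataclass
class WorksheetLine:
    failure_mode: str
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


def items_requiring_action(lines, threshold=300):
    """Return lines whose RPN exceeds the threshold, highest RPN first."""
    flagged = [line for line in lines if line.rpn > threshold]
    return sorted(flagged, key=lambda line: line.rpn, reverse=True)


# Hypothetical worksheet contents.
worksheet = [
    WorksheetLine("Failure mode A", 5, 7, 10),  # RPN 350
    WorksheetLine("Failure mode B", 3, 4, 5),   # RPN 60
    WorksheetLine("Failure mode C", 9, 6, 8),   # RPN 432
]

to_address = [line.failure_mode for line in items_requiring_action(worksheet)]
# ["Failure mode C", "Failure mode A"]
```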
This is the planning part of FMEA: the team must come up with a list of Recommended Actions, or action plans, in order to lower the risk associated with failures. In some cases, you may determine you can eliminate the risk entirely. In other cases, you may only be able to minimize or mitigate the level of risk. Or, you may determine your only reasonable course of action is to detect the issue and notify the customer.
Whatever the plan of attack, the recommended actions are then added to the FMEA and assigned to team members to complete, typically with an attached due date.
5. Implement Recommended Action Plan and Assess Risk Reduction
As the team members complete the action items, the FMEA is updated to describe the work done and when it was completed. Then, the item is reassessed based on this design or process change. Typically, revised values of the risk assessment method being used, such as Revised RPN or Revised AP, are then computed.
The process is repeated across all items you have deemed as requiring action. At that point, the overall risk levels in your FMEA should fall within your established guidelines.
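The bookkeeping for this step can be sketched as follows, assuming RPN is the risk method in use. Each worksheet line keeps its initial ratings, an assigned action with owner and due date, and the revised ratings recorded once the action is complete. All class names, people, dates, and ratings are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Ratings:
    severity: int
    occurrence: int
    detection: int

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection


@dataclass
class RecommendedAction:
    description: str
    owner: str
    due: date
    completed_on: Optional[date] = None


@dataclass
class WorksheetLine:
    failure_mode: str
    initial: Ratings
    action: Optional[RecommendedAction] = None
    revised: Optional[Ratings] = None

    def complete_action(self, when: date, revised: Ratings) -> None:
        """Record completion and the reassessed ratings."""
        if self.action is None:
            raise ValueError("no recommended action to complete")
        self.action.completed_on = when
        self.revised = revised

    def current_rpn(self) -> int:
        """Revised RPN if the item was reassessed, otherwise the initial RPN."""
        return (self.revised or self.initial).rpn


def all_within_guideline(lines, threshold: int) -> bool:
    return all(line.current_rpn() <= threshold for line in lines)


# Hypothetical line: initial RPN 6 * 8 * 8 = 384, above a guideline of 300.
line = WorksheetLine(
    "Hypothetical failure mode",
    Ratings(6, 8, 8),
    RecommendedAction("Hypothetical design change", "A. Engineer", date(2024, 6, 1)),
)
before = all_within_guideline([line], threshold=300)  # False
line.complete_action(date(2024, 5, 20), Ratings(6, 2, 8))  # revised RPN 96
after = all_within_guideline([line], threshold=300)   # True
```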
6. Update FMEA Document over Time
As the product, system, or process grows and changes over time, the resulting FMEA document needs to be kept up to date. In other words, as your product or process evolves, so too should your FMEA. As changes occur, go back and repeat the FMEA process to account for the design or process modification. This ensures that your risk level remains under control and that your reliability and quality objectives are met.
Walking through the FMEA process with a very simple example may be helpful. Let’s assume we manufacture car batteries.
Step 1: Identify Potential Failures
Clearly, there are a multitude of potential failure modes! However, for our simple example, we’ll consider just a single failure mode: the battery goes dead.
Step 2: Analyze the Causes and Effects of the Failure
What are the possible causes? One cause could be that the driver consistently forgets to turn the headlights off.
What are the effects of a dead battery? One effect is that the car does not start.
Step 3: Rank the Risk Level
We are going to use the standard RPN definition for risk assessment, and therefore need to determine the Severity, Occurrence, and Detection. For this case, there is no Detection: the driver attempts to start the car and it doesn’t work, so Detection is a 10 (not detected at all). The Severity of the item is not catastrophic, but definitely an annoyance to the driver and not good for customer satisfaction. We’ll assess the Severity at a 5. Lastly, in this situation (our forgetful driver), the Occurrence is high. We’ll assess Occurrence at a 7.
Our resulting RPN = Severity * Occurrence * Detection = 5 * 7 * 10 = 350.
We also have decided that RPNs above 300 are above an acceptable level. So, we need to address this problem.
Step 4: Come up with a Recommended Action Plan
There are a couple of actions we could take to reduce this risk. We could make the car sound an alarm when the car is turned off, the headlights are on, and the door is opened. Or, we could automatically turn the headlights off after a period of inactivity. We decide the second action is easier and less costly (we don’t need an extra sensor to determine when the door is opened), so we assign our software engineer the task of adding a routine that turns the lights off after 5 minutes of inactivity.
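The auto-off logic chosen here could look something like the sketch below. It assumes that "inactivity" means the time elapsed since the ignition was switched off, which the example does not define. The function and parameter names are hypothetical, and a real vehicle controller would be considerably more involved.

```python
AUTO_OFF_SECONDS = 5 * 60  # 5 minutes of inactivity, as in the example


def headlights_should_stay_on(headlights_on: bool,
                              ignition_on: bool,
                              seconds_inactive: float) -> bool:
    """Decide whether the headlights may remain on.

    Assumption: seconds_inactive counts time since the ignition was
    switched off.
    """
    if not headlights_on:
        return False
    if ignition_on:
        return True  # car is in use, never force the lights off
    return seconds_inactive < AUTO_OFF_SECONDS


just_parked = headlights_should_stay_on(True, False, 30)    # True
forgotten = headlights_should_stay_on(True, False, 5 * 60)  # False
```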
Step 5: Implement Recommended Action and Assess Risk Reduction
Once our engineer completes the programming task, we can go back and determine our Revised RPN value. The Severity and Detection remain the same. However, the Occurrence drops to a 1 because the driver can no longer leave the lights on.
Our Revised RPN = 5 * 1 * 10 = 50.
We’ve greatly reduced our risk with this small change!
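The arithmetic of the car battery example can be checked in a few lines. The ratings and the threshold of 300 come from the example itself; the variable names are illustrative.

```python
def rpn(severity: int, occurrence: int, detection: int) -> int:
    return severity * occurrence * detection


THRESHOLD = 300  # RPNs above this level require action

initial = rpn(5, 7, 10)  # 350: forgetful driver, no detection
revised = rpn(5, 1, 10)  # 50: lights now switch off automatically

needed_action = initial > THRESHOLD      # True
still_needs_action = revised > THRESHOLD  # False

# Relative reduction in RPN: (350 - 50) / 350, roughly 86%.
reduction = (initial - revised) / initial
```

Only Occurrence changed, yet the RPN fell from above the action threshold to well below it.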
From this very simple example, you can get an idea of how large and complex an FMEA can become, especially when looking in detail at a design and considering how complex systems can be. This is why it is vital that your FMEAs remain organized and well managed. The best way to do that is to use a software tool built for FMEAs.
Contact Relyence to Learn More About Our FMEA Software
Relyence FMEA is the most comprehensive and advanced software package for failure mode and effects analysis available today. Just a few of the innovative and unique-to-Relyence capabilities include always-in-sync™ technology for control of your FMEA data throughout your process, autocomplete for data consistency, and the powerful Knowledge Bank for FMEA data reusability. Our completely browser-based package means not only that you have access anytime, anywhere, but also that you are free from weighty installations.