Contribution to Failure Description as the Phenomena

In safety and risk assessment we frequently work with descriptions of events alongside other assessments. In purely technical applications these events are related to the occurrence of a failure of equipment, a device, a system or an item. This contribution addresses the term "failure" and its related characteristics, which can be a complex problem. The paper covers the functions of an object and their description, the classification of failures, the main characteristics of a failure, its possible causes, mechanisms and consequences, and other closely related topics.


Introduction
Before we introduce the topic of failure, let us ask a simple question: why do things actually break? Answers can vary. One answer, which we develop further here, is that the applied load usually exceeds the dimensioning (robustness) of the product. The load can be purely mechanical (force, tension, etc.), purely electrical (power, electromagnetic field, etc.), purely chemical (effect of chemical substances, etc.), generally physical (heat, radiation, etc.), or of a totally different nature. Whenever the applied load exceeds the assumed dimensioning of the item, unwanted (usually irreversible) processes start, and sooner or later a failure occurs. The load can be applied once or a number of times. In the first case an overload failure occurs; in the second, a fatigue failure. Unless a failure occurs immediately, the product can also become weaker over time for any one of many reasons.
One of the basic assumptions about failure is as follows. Before any failure due to an inner cause (e.g. operating or using an item) can occur, the device has to be in operation. An idle item or system can also fail due to natural ageing, but in this case the initiating mechanism is not properly understood. A relevant failure mostly occurs only during operation.
Failure is a term widely used in technical practice, especially in dependability theory. For reliability practitioners it is a basic term, and it is key to observing the stochastic relations of item behavior. It is an event in the sense used by probability theory in general, that is, a random event. In dependability theory it is necessary to treat failure as a stochastic term, to understand its meaning, and to understand its links to other terms. Only then are the mathematical tools used in dependability more than a dead and boring "set" of formulas, relations and graphical expressions.
While observing a technical item we concentrate on the possible causes of failures, their development over time, their process and mechanism, and of course the impact, effect, or other influences which might result from a failure occurrence. It is essential to realize that a failure is of key importance for the operation and function of technical items. Theory, and practice in particular, shows that failures occur in different situations and under various circumstances and conditions. In theory, we can describe their possible causes, nature of occurrence and process of development, and we can also model them. We can see connections between individual groups of failures and their profiles. We can assign a degree of importance and numerical values to failures, and we can sort them into groups, sets, etc. However, our biggest and continual effort is to eliminate failure occurrence, or at least to limit the number of occurrences (frequency) over a specified time period or in relation to another observed quantity (mileage, cycles, etc.). Our intention is to determine their occurrence so exactly that we can be prepared to face it as well as possible. In short, our aim is to get a better profile of an observed item from the viewpoint of its dependability and related properties.
Furthermore, we would like to describe possible classes of failures, their profiles, courses, development, consequences, and other relations which might be important for dependability theory and especially for this paper. The phenomena covered in this article are by no means a complete and synoptic list of all known and possible events accompanying a failure. The aim of this article is to introduce a topic which is usually believed to be obvious, familiar and clear. However, reality need not fully match our ideas or the ideas of other people. The purpose of the paper is also to introduce the reader to the topic of failure and at the same time to popularize it. We would not like the reader to absorb an isolated piece of information without understanding its full context, because a frequently used term might then take on a totally different meaning. It would be valuable if, when meeting the term in a book or when using theoretical tools, profiles, graphs, models, and other descriptions and contexts, we were able to see that there is definitely something more to it [1, 2, 3, 4].

Current terminology situation
This part briefly describes the current terminology situation, which results from the work of the ISO/IEC representatives and national bodies. Failure according to the present version of IEC 60050 (191) is defined as follows: "termination of the ability of an item to perform a required function".
Note 1: After failure the item has a fault.
Note 2: Failure is an event, as distinguished from fault, which is a state.
Note 3: This concept as defined does not apply to items consisting of software only.
Failure according to the newly updated version of IEC 60050 (191) is defined as follows: "loss of ability to perform as required".
Note 1: When the loss of ability is caused by a pre-existing condition, the failure occurs when a particular set of circumstances is encountered (see latent fault 191-44-07).
Note 2: A failure of an item is an event, as distinct from a fault of an item (191-44-01), which is a state.
Note 3: Qualifiers may be used to classify failures according to the severity of consequences, such as catastrophic, critical, major, minor, marginal and insignificant, the definitions depending upon the field of application.
It follows from these definitions and further analysis that the term "failure" is understood as an event which leads directly to either a partial or a complete loss of the ability of an item to fulfill a required function. Most of the terms used in the introduction to describe failure factors and profiles can also be found in the basic source document mentioned above.
At present, because the terminology is being modified and updated, the existing understanding of a failure and related facts may change. To demonstrate the complexity of the present state we note the following. According to Note 1 to the term failure quoted above, see IEC 60050 (191)/1990, an item after failure has a fault. However, owing to continuing discussions on this topic, it is impossible to ignore the idea that a fault does not follow a failure but precedes it. This technical incompatibility, together with many others, has not been resolved yet, although its form has been much discussed. A decision in favour of the new view would radically influence the existing approach to, conception of and observation of failure.
When working with the term failure, as well as with related states, it is necessary to take the current terminology mismatch into account and to adapt decisions to it. A possible change has to be accepted along with all its consequences. Unfortunately, such a change would disrupt the understanding of all existing terms and disciplines that deal with proper function, failure and dependability.

What might the failure affect
In this part it is necessary to draw attention to some related events. We are dealing with a failure which prevents an item from performing a required function (the main one, a minor one, or some other one, as detailed below). It follows from all the definitions in the paper that the inability of a system or a product to operate in a required way is the key term determining a failure.
Based on many studies and approaches, a comprehensive scale describing the individual functions of a system was formed. On this basis it is also essential to distinguish the influence of a failure on the function performed by an item, because a failure occurrence might affect the range of the function. An outline of item functions is provided to make this easier to understand; failure occurrence is not strictly limited to one kind of item function.
A required function -specifies the task of an item. A correct, exact and unequivocal definition of it is the starting point for all dependability definitions as well as for a correct failure definition. Operating conditions significantly affect both dependability and, especially, possible failure occurrence, which is why they have to be determined very thoroughly.

1) Main function: an intended (required) or primary function.
2) Minor function: needed to provide the main function.
3) Supporting function: its aim is to protect people and the environment from potential damage caused by a main or minor function failure, and to provide common support (brakes, circuit breakers, filters, etc.).
4) Information function: it provides condition monitoring, measuring, diagnostics, etc. (it refers to displays, indicators, etc.).
5) Interface function: it provides an interface between the assessed item and other items (cabling, operating elements, switches, breakers, etc.).
The required function and/or operating conditions might be time dependent. In this case a mission profile has to be determined and all dependability viewpoints have to be related to it. A representative mission profile and the corresponding dependability targets have to be stated in the item's specification. The mission duration is usually considered as a parameter t, that is, time. The dependability function, especially the reliability function, is designated R(t). R(t) is the probability that no failure at item level will occur in the interval (0, t], often with the assumption R(0) = 1, which means that at time t = 0 the object was in the state of operation.
In order to avoid confusion, a distinction should be made between predicted and estimated (assessed) dependability. The predicted dependability is calculated on the basis of the item's dependability structure and the failure rates of its components. The estimated dependability is obtained from a real evaluation, that is, a statistical evaluation of dependability tests or field data under known operating and environmental conditions.
Failure: -it occurs when an item terminates its ability to perform its required function. However simple the definition might look, it is difficult to apply to complex items/systems. The failure-free operating time is generally a random variable. It is often reasonably long, but it might also be very short, for example because of a systematic failure or an early failure resulting from a transient event at turn-on. A general presumption in investigating failure-free operating times is that at the instant t = 0 the object is free of defects and systematic failures and is therefore fully able to operate. Besides their relative frequency, failures can be categorized according to one of the views mentioned before (mode, course, cause, consequences, mechanisms, etc.) [2,3].
According to the place of occurrence, a failure may arise: -during a test; -during operation.
Degradation -physical, chemical, or other processes leading to a failure.
Valis D., Bartlett M. L.
These are the very basic failure categories and the factors they fall into, and this is the common way of working with them. We could define other (supplementary) failure categories, but they cannot be presented here because of the space limits of the paper. The authors may provide more information to those who are interested.
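
The reliability function R(t) and the distinction between predicted and estimated dependability described in this section can be illustrated with a minimal Python sketch. It assumes a constant failure rate (the exponential model), which the paper does not prescribe; the function names and all numerical values are hypothetical and serve only as an illustration.

```python
import math


def reliability(t, failure_rate):
    """R(t) = exp(-failure_rate * t), assuming a constant failure rate.

    The exponential model is an assumption made for this sketch only.
    """
    if t < 0 or failure_rate < 0:
        raise ValueError("t and failure_rate must be non-negative")
    return math.exp(-failure_rate * t)


def estimated_failure_rate(n_failures, total_operating_time):
    """Point estimate of a constant failure rate from test or field data."""
    if total_operating_time <= 0:
        raise ValueError("total_operating_time must be positive")
    return n_failures / total_operating_time


# Hypothetical figures, for illustration only.
predicted_rate = 2.0e-5  # failures per hour, e.g. from a component model
estimated_rate = estimated_failure_rate(3, 120_000.0)  # 3 failures in 120 000 h

mission_time = 1_000.0  # hours
print("R(0)               =", reliability(0.0, predicted_rate))
print("predicted R(1000h) =", round(reliability(mission_time, predicted_rate), 4))
print("estimated R(1000h) =", round(reliability(mission_time, estimated_rate), 4))
```

With R(0) = 1 the sketch reproduces the usual assumption that the item is operating at t = 0. Comparing the two values at the end of the mission shows how a prediction and an estimate obtained from test or field data can differ for the same item.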

Failure occurrence cause
According to IEC 60050 (191), the cause of a failure is the set of circumstances during design, manufacture or use which have resulted in the failure. Knowing the cause of a failure is useful when we want to decide how to prevent the failure or its reoccurrence. Failure causes can be classified in relation to the life cycle of the system (see also figure 1 below) [3,4].
Cause -the cause of a failure can be intrinsic, due to weaknesses in the item and/or wearout, or extrinsic, due to errors, misuse or mishandling during design, production and especially use. Extrinsic causes often lead to systematic failures, which are deterministic and might be considered as defects (dynamic defects in software quality). Defects are present at t = 0, even if they cannot be discovered at t = 0. Failures always appear in time, even if the time to failure is very short, as it can be with systematic or early failures.

1) Design failure -occurs due to inadequate design. It is basically any failure directly related to the item design: because of the design, a part of the whole degraded or was damaged, and this resulted in a failure of the whole.
2) Weakness failure -occurs due to a weakness (internal), inherent or induced in the system, so that the system cannot withstand the stress it encounters in its normal environment.
3) Manufacturing failure -a failure caused by nonconformity during manufacturing and processing. It is basically any failure caused by faulty processing, inadequate manufacturing, or an error made while controlling the process during manufacturing, tests and repairs.

Failure mechanism
The failure mechanism is a very complex and extensive part of the failure profile. It can be sudden or gradual, with corresponding manifestations. Failure mechanism -a physical, chemical, electrical, thermal or other process that results in failure.
Mode (manifestation, course) -the mode of a failure is the symptom (local effect) by which a failure is observed, for example opens, shorts, or drifts (for electronic components), or brittle rupture, creep, cracking, seizure, or fatigue (for mechanical components).
The connections between these aspects of a failure are shown in the following description:
1. Intermittent (incoherent) failure -a failure which lasts only for a short time. A good example is a fault that appears only under certain conditions which occur intermittently (irregularly).
2. Extended failure -a failure that persists until some corrective action rectifies it. Extended failures can be divided into the following two categories:
a) Sudden failure -a failure which occurs without warning.
b) Gradual failure -a failure which occurs with warning signals. Usually these are significant changes in behavior (decreasing performance, increasing temperature, rising vibrations, etc.).
We have to distinguish among the different failure mechanisms of mechanical, electrical and hydraulic parts. The differentiation is so complex that it cannot easily be presented in this paper.
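
The classification above (intermittent versus extended, and sudden versus gradual for extended failures) can be captured in a small data structure. The following Python sketch is only one possible encoding; the class names, field names and example records are hypothetical and are not taken from any standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Duration(Enum):
    INTERMITTENT = "intermittent"  # lasts only for a short time
    EXTENDED = "extended"          # persists until corrective action


class Onset(Enum):
    SUDDEN = "sudden"    # occurs without warning
    GRADUAL = "gradual"  # preceded by warning signals


@dataclass(frozen=True)
class FailureRecord:
    item: str
    mode: str  # observed symptom, e.g. "open", "short", "seizure"
    duration: Duration
    onset: Optional[Onset] = None  # defined for extended failures only

    def __post_init__(self):
        # Enforce the hierarchy from the text: sudden/gradual are
        # sub-categories of extended failures.
        if self.duration is Duration.EXTENDED and self.onset is None:
            raise ValueError("an extended failure must be sudden or gradual")
        if self.duration is Duration.INTERMITTENT and self.onset is not None:
            raise ValueError("onset applies to extended failures only")

    def label(self) -> str:
        if self.onset is None:
            return self.duration.value
        return f"{self.duration.value}, {self.onset.value}"


# Hypothetical example records.
bearing = FailureRecord("bearing", "seizure", Duration.EXTENDED, Onset.GRADUAL)
relay = FailureRecord("relay", "open", Duration.INTERMITTENT)
print(bearing.label())
print(relay.label())
```

Keeping the mode (the observed symptom) separate from the duration and onset mirrors the distinction the text draws between how a failure is observed and how it develops.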

Failure consequences
Many information sources use the term failure consequence, and many standards define failure consequences and work with them differently. The following part should help to clarify the concept of failure consequences as we know them from many reliability analyses.
Effect -the effect (consequence) of a failure can differ depending on whether it is considered at the level of the item itself or at a higher level. A usual qualitative classification is: nonrelevant, partial, complete, …, critical failure. Since a failure can also cause further failures in an item or a system, a distinction between primary and secondary failure is important.
A classification of the severity of a failure mode in accordance with MIL-STD 882 is as follows:
1) Catastrophic failure -a failure that can lead to death or can cause total system (item) loss.
2) Critical failure -a failure which results in many serious injuries or major system damage. Sometimes it is understood as a failure, or combination of failures, that prevents an item from performing a required mission.
3) Marginal failure -a failure that leads to minor injury or minor system damage.
4) Negligible failure -a failure that leads to less than minor injury or system damage.
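
The four severity categories listed above can be encoded as an ordered enumeration, for example to find the worst category among several failure modes of one item. The Python sketch below is only an illustration: the numbering follows the list in this text, the descriptions are shortened from it, and the failure modes are hypothetical.

```python
from enum import IntEnum


class Severity(IntEnum):
    # Numbering follows the list in the text; a lower number is more severe.
    CATASTROPHIC = 1
    CRITICAL = 2
    MARGINAL = 3
    NEGLIGIBLE = 4


DESCRIPTION = {
    Severity.CATASTROPHIC: "death or total system (item) loss",
    Severity.CRITICAL: "serious injuries or major system damage",
    Severity.MARGINAL: "minor injury or minor system damage",
    Severity.NEGLIGIBLE: "less than minor injury or system damage",
}


def worst_severity(severities):
    """Return the most severe category (lowest number) among those given."""
    severities = list(severities)
    if not severities:
        raise ValueError("at least one severity is required")
    return min(severities)


# Hypothetical failure modes of one item and their assigned categories.
modes = {
    "sensor drift": Severity.MARGINAL,
    "loss of braking": Severity.CATASTROPHIC,
    "indicator lamp out": Severity.NEGLIGIBLE,
}
top = worst_severity(modes.values())
print(top.name, "->", DESCRIPTION[top])
```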
Another classification can be found in the RCM approach where the following classes are used: Failures with safety consequences; Failures with environmental consequences; Failures with operational consequences; Failures with non-operational consequences.
A classification of failure severity into groups (categories) is given in further standards. Each is specific in some way and corresponds to its intended application. IEC 61882, IEC 60812, IEC 50 126 and many others are examples. We do not aim to give a complete list of failure consequences and their classification.

Sources of failure profile determination
We do not want to discuss the basic failure measures and characteristics, which are well known in our community. Our aim is to present the different sources from which failure data, measures and characteristics can be obtained. The main sources are:
1) Data on elements' reliability guaranteed by a producer -there is no need to expand on it.
2) Conclusive test results (observation) of the reliability of the same (comparable) item. These are based on the standardized assessment of reliability tests of technical items. The methods and methodologies for conducting tests are standardized for different equipment.
3) Predictions -standardised calculation of an item's reliability based on a reliable source (MIL HDBK 217F). This is an American military handbook that enables the reliability data of electronic elements to be estimated. It is commonly used for estimating the failure rate of elements, especially in military applications.
4) Specialized information databases on elements' reliability (specialized in terms of the elements' profile or conditions of usage). These databases are usually established and kept to meet the needs of single industrial branches or technical areas. They collect the data acquired when observing items in operation or the results of specialized dependability tests. One of the most respected and frequently used sources in this area is the Reliability Analysis Center (RAC), which at present distributes the following important databases on a commercial basis: EPRD-97; NPRD-95; FMD-97; SPIDR 2007 [5,6,7,8].
5) General information database on elements' reliability. These databases are usually published as parts of specialized literature in the dependability area. The information put in them is usually very general.
6) Expert estimations. Expert estimations of numerical values of reliability measures might be used only when appropriate values cannot be specified by a different, more reliable method. The authors of the article know from experience that this solution is accepted only as an exception because in most cases the numerical values of reliability measures can be determined by other methods described in this paper.
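
As an illustration of the prediction approach in item 3 above, the following Python sketch adds up component failure rates for an item whose components all have to work (a series structure). It assumes constant failure rates and does not reproduce the models or adjustment factors of any handbook; the function name, the parts list and all numbers are hypothetical.

```python
def series_failure_rate(parts):
    """Sum the failure rates of all parts of a series structure.

    parts: iterable of (quantity, failure_rate_per_hour) tuples.
    Assumes constant failure rates and that any single part failure
    causes a failure of the item.
    """
    total = 0.0
    for quantity, rate in parts:
        if quantity < 0 or rate < 0:
            raise ValueError("quantity and rate must be non-negative")
        total += quantity * rate
    return total


# Hypothetical parts list: (quantity, failure rate in failures per hour).
parts = [
    (10, 2.0e-8),  # e.g. resistors
    (4, 5.0e-8),   # e.g. capacitors
    (1, 3.0e-7),   # e.g. an integrated circuit
]

item_rate = series_failure_rate(parts)
mean_time_to_failure = 1.0 / item_rate  # valid for a constant failure rate

print(f"item failure rate: {item_rate:.2e} per hour")
print(f"mean time to failure: {mean_time_to_failure:.0f} h")
```

The result depends entirely on the quality of the input failure rates, which is why the choice among the data sources listed above matters.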

Conclusion
This contribution is intended to give a general overview of the basic term "failure" as described above. Because all the related matters are very complex, it is not possible to present complete knowledge and experience here. Some reliability and safety engineers might be confused when beginning a specific analysis (e.g. FMECA, PHA, JSA, OSHA, etc.). The main benefit of this contribution is to serve as general, introductory material for understanding a failure and its full profile with all related characteristics. A further purpose of the paper is to offer guidance that points the analyst to the appropriate information sources needed for the analysis. Due to the limited space, the information provided is an overview rather than a complete account; those who are interested are kindly asked to contact the authors.