FFORT: A Benchmark Suite for Fault Tree Analysis

This paper presents FFORT (the Fault tree FOResT): A large, diverse, extendable, and open benchmark suite consisting of fault tree models, together with relevant metadata. Fault trees are a common formalism in reliability engineering, and the FFORT benchmark brings together a large and representative suite of fault tree models. The benchmark provides each fault tree model in standard Galileo format, together with references to its origin, and a textual and/or graphical description of the tree. This includes quantitative information such as failure rates, and the results of quantitative analyses of standard reliability metrics, such as the system reliability, availability and mean time to failure. Thus, the FFORT benchmark provides: (1) Examples of how fault trees are used in various domains; (2) A large class of tree models to evaluate fault tree methods and tools; (3) Results of analyses to compare newly developed methods with the benchmark results. Currently, the benchmark suite contains 202 fault tree models of great diversity in terms of size, type, and application domain. The benchmark offers statistics on several relevant model features, indicating e.g. how often such features occur in the benchmark, as well as search facilities for fault tree models with the desired features. In addition to the trees already collected, the website provides a user-friendly submission page, allowing the general public to contribute with more fault trees and/or analysis results with new methods. Thereby, we aim to provide an open-access, representative collection of fault trees at the state of the art in modeling and analysis.


Introduction
Fault trees (FTs) are a widely-used formalism for safety and reliability analysis (Ericson (1999); Stamatelatos et al. (2002); Ruijters and Stoelinga (2015)). An FT is a graphical representation of the possible failure modes of a system, i.e. the distinct processes by which determined system functionality failures can be observed (Rausand and Høyland (2004)), broken down into intermediate failures and their interactions. From quantitative information about elementary failure behavior (like component failure rates), fault tree analysis provides quantitative results on metrics such as time-dependent system failure probability and average system downtime.
Fault trees are popular as a clear graphical formalism to analyze the RAMS (reliability, availability, maintainability, and safety) metrics of complex systems. Many extensions, analysis methods, and tools have been developed from the original FT concept (Ruijters and Stoelinga (2015)). However, a systematic way of comparing all these methods is lacking: many published papers use their own examples and case studies to evaluate the merits of their new techniques. From a methodological point of view this practice has significant disadvantages: (1) It is possible to present case studies that are biased in favor of the newly introduced methods and tools; (2) When papers use their own examples, an objective comparison of different methods becomes difficult; (3) Due to lack of sources, the number of examples and case studies used in publications is often relatively small. This paper takes an important step towards a more systematic comparison in fault tree analysis research, namely by providing a large, open, quantitative, searchable, and extensible benchmark suite for fault tree models. A major feature of FFORT is the diversity of its content: its FTs differ in size (number of basic events and gates), type (static vs. dynamic, repairable or not), failure behavior (diverse gates and probability distributions in basic events), and RAMS metrics computed. Table 1 offers an overview of this.
Furthermore, reproducibility has gained renewed interest in computer science, particularly in research on formal methods (Schlick et al. (2018)). According to the ACM (2018), such reproducibility should be tested by a different team using a different experimental setup from the original research. So far, there has been no systematic effort to reproduce published results in fault tree analysis. Thus, in addition to the contributions mentioned above, we reproduce parts of previously published papers by analyzing their fault trees in a systematic way, thereby validating the published results.
To these aims, the benchmark suite provides for each FT it contains:
• A complete textual representation in the standard Galileo format, following the syntactic guidelines detailed in Sect. 4.
• A summarized description and pictorial illustration (taken from the authors when available) to facilitate understanding.
• Quantitative information such as rates and failure probabilities of basic events.
• Values of RAMS metrics, namely previously published values if available, plus newly computed values for reference purposes.
The FTs (and their metadata) that FFORT thus provides come from an unbiased collection of case studies gathered from scientific literature, ranging from the classic NASA fault tree handbook (Stamatelatos et al. (2002)) to modern papers on advanced analysis techniques. These include several industrial cases modelling software or physical assets of companies, such as PCBAs, vehicle guidance, railways, ship mooring, and tank storage. Search and filter facilities are provided to select FTs with specific characteristics, an essential feature given the great diversity of the FTs offered.
Thus, FFORT makes significant steps towards: ( In addition to the trees already collected, we provide a user-friendly submission page, allowing modelers to upload more examples of FTs or analysis results with new methods. Thereby, we aim to continue to provide an open-access, representative collection of fault trees at the state of the art in modeling and analysis.
The structure of this paper is as follows: Sect. 2 provides a brief overview of fault trees. Sect. 3 describes the data and metadata that are stored in FFORT. Sect. 4 explains the methodology used to collect the FTs currently in FFORT, while Sect. 5 provides some statistics about these FTs. Sect. 6 shows the user interface to access FFORT, before ending with a conclusion and discussion in Sect. 7.

Fault Trees
Fault tree analysis is an industry-standard, widely used formalism to graphically model systems and analyse them for reliability and safety (IEC61025 (2006)). Entities like NASA, NRC, ProRail, Boeing, etc. use fault tree analysis to ensure compliance with both national and international safety regulations. FTs model the interactions of component failures that may lead to (sub-)system failures, thereby supporting a wide range of qualitative and quantitative dependability analyses.
A fault tree is a directed acyclic graph, in which the leaves are called basic events and the remaining nodes are called gates. Basic events specify elementary failure causes (e.g., failures of individual components, external causes), while the gates specify how these failures combine to cause system level failure. The root of the FT is called the top level event and denotes system failure. Fig. 1 shows an example of a fault tree. The top level event is called "System". Its symbol denotes an OR-gate, representing that a failure of either child causes a system failure. Its children are the basic event "HCR" and the AND-gate "Batteries". The latter denotes that the battery subsystem requires both batteries B1 and B2 to fail for the subsystem (and thus the entire system) to fail. Standard (or static) FTs support only boolean gates (i.e., AND, OR, and k-out-of-N). Various extensions have been developed supporting more complex combinations. The most prominent is the dynamic fault tree (DFT) by Dugan et al. (1990), which adds PAND (priority-AND) gates imposing temporal requirements, SPARE gates used to denote spare parts, FDEP (functional dependency) gates denoting subtrees that cause other subtrees to fail, and SEQ (sequence-enforcer) gates that denote that certain failures can only occur in a particular order. A more recent extension by Ruijters et al. (2016) is the fault maintenance tree, which adds inspection and repair modules to specify complex maintenance and repair policies.
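To make the gate semantics concrete, the following minimal Python sketch evaluates the static example tree of Fig. 1 for a given set of failed basic events. The dictionary layout and the function name has_failed are illustrative assumptions, not part of FFORT or of any analysis tool, and only OR and AND gates are covered.

```python
from typing import Dict, List, Set, Tuple

# Gate table for the example tree of Fig. 1: name -> (gate type, children).
# Any name that is not listed here is treated as a basic event (a leaf).
GATES: Dict[str, Tuple[str, List[str]]] = {
    "System": ("or", ["HCR", "Batteries"]),
    "Batteries": ("and", ["B1", "B2"]),
}


def has_failed(node: str, failed_basic_events: Set[str]) -> bool:
    """Return True if `node` is failed, given the set of failed basic events."""
    if node not in GATES:  # leaf: a basic event
        return node in failed_basic_events
    gate_type, children = GATES[node]
    states = [has_failed(child, failed_basic_events) for child in children]
    if gate_type == "or":   # fails as soon as one child has failed
        return any(states)
    if gate_type == "and":  # fails only when all children have failed
        return all(states)
    raise ValueError(f"unsupported gate type: {gate_type}")
```

For instance, has_failed("System", {"B1"}) is False, because a single battery failure is masked by the AND-gate, whereas a failure of "HCR" alone brings the system down.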
Fault trees can be analyzed to obtain various metrics relevant to reliability engineering. Qualitative analysis provides information such as cut sets: sets of components that, if failed, cause the system to fail. If basic events are decorated with failure probabilities or rates, quantitative metrics can be calculated. Which properties are applicable depends on the provided information: given failure probabilities, the system reliability (probability of no system failure) can be computed. If failure rates over time are available, one can compute timed reliability (probability of no system failure occurring before a given mission time) or mean time to failure (MTTF). When repair information is available as well, the system availability (average proportion of time that the system is not failed when operating under standard conditions) can also be computed.
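As a worked illustration of these metrics, the sketch below computes timed reliability and MTTF for the example tree of Fig. 1, whose minimal cut sets are {HCR} and {B1, B2}. It assumes independent, non-repairable basic events with exponentially distributed failure times, and that both batteries share one rate; the function names are hypothetical and the time unit is whatever unit the rates are expressed in.

```python
import math


def reliability(t: float, rate_hcr: float, rate_b: float) -> float:
    """Probability that the example system has not failed by time t."""
    p_hcr = 1.0 - math.exp(-rate_hcr * t)  # HCR failed by time t
    p_b = 1.0 - math.exp(-rate_b * t)      # one battery failed by time t
    p_batteries = p_b * p_b                # AND-gate: both batteries failed
    # OR-gate: the system survives only if neither child has failed.
    return (1.0 - p_hcr) * (1.0 - p_batteries)


def mttf(rate_hcr: float, rate_b: float) -> float:
    """Closed-form integral of reliability(t) over t from 0 to infinity."""
    # R(t) = 2*exp(-(h + b)*t) - exp(-(h + 2*b)*t), integrated term by term.
    return 2.0 / (rate_hcr + rate_b) - 1.0 / (rate_hcr + 2.0 * rate_b)
```

With the rates of the example Galileo file (2.8e-5 for HCR and 1.13e-6 for each battery), reliability(1.0, 2.8e-5, 1.13e-6) gives the reliability at mission time t = 1.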

The FFORT Benchmark Suite
In addition to storing the trees themselves, FFORT also stores metadata to explain each FT and its origins. Since we collect FTs from published literature, FFORT does not need to store the full details of each tree, rather referring to the original publication for full details. Nonetheless, we try to provide sufficient context for each tree to be reasonably understandable without reading the entire paper.
FFORT stores all FTs as variants of a particular family, with each variant possibly having some associated set of results. We have identified two patterns to the variants: (1) Different FTs modeling the same system, e.g. to suit different analysis tools; (2) FTs modeling similar systems, e.g. to explore the effects of adding certain redundancies. Many families contain only a single variant, as that is the only published FT.
The FTs themselves (i.e. the data) are stored in the Galileo format (as described by Sullivan and Dugan (1998); Sullivan et al. (1999)), while the associated metadata is stored in JSON. We store the following metadata (required fields indicated in bold) for each family:
• Name and short description of the FT.
• Reference to the publication first describing the FT (title, author names, publication year, and DOI or website link if available online).
• Name and e-mail address of the person who submitted the FT to FFORT.
• Date on which the FT was added to FFORT.
• Additional references providing further information.
In turn, for each variant in a family we store the following metadata:
• Name and description if the family has multiple variants.
In addition, we automatically extract quantitative information about the fault tree data, for statistical and search/filtering purposes. In particular, we calculate for each FT:
• The number of BEs and their attributes.
• The number of gates of each type.
• Whether the FT supports any type of repairs.
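The family-level and variant-level metadata described above could be laid out roughly as in the following Python sketch. All key names, the placeholder values, and the choice of required fields are assumptions made for illustration; the actual JSON schema used by FFORT may differ.

```python
import json

# Hypothetical field names and placeholder values; not FFORT's actual schema.
family = {
    "name": "Example family",
    "description": "Placeholder description of the modeled system.",
    "reference": {"title": "<title>", "authors": ["<author>"],
                  "year": None, "doi": None},
    "submitter": {"name": "<name>", "email": "<e-mail>"},
    "date_added": "<YYYY-MM-DD>",
    "additional_references": [],
    "variants": [
        {"name": "variant-1", "description": "",
         "file": "variant-1.dft", "results": []},
    ],
}

# Assumed set of required family-level fields, for illustration only.
REQUIRED_FAMILY_FIELDS = ("name", "description", "reference")


def missing_fields(record, required=REQUIRED_FAMILY_FIELDS):
    """List the required fields that are absent or empty in `record`."""
    return [field for field in required if not record.get(field)]


encoded = json.dumps(family, indent=2)  # text as it could be stored on disk
decoded = json.loads(encoded)           # round trip back to Python objects
```

A check such as missing_fields(decoded) could then be run before a submission is accepted.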
Furthermore, we provide reference results of standard RAMS metrics for each FT in the collection. We apply two tools to every FT for this: Storm-DFT by Volk et al. (2018) and DFTCalc by Arnold et al. (2013), the latter with its 'exact' backend if possible, otherwise with its IMCA backend. For non-repairable FTs we calculate mean time to failure and reliability; for repairable FTs we calculate availability in addition. In the case of reliability, if a particular time point was used in previous results, we calculate reliability for that time as well; otherwise we compute reliability for time t = 1. Some FTs could not be analyzed using one or both of these tools due to restrictions on the supported FT features (e.g., Storm-DFT cannot process repairable FTs) or computational resource limits.

Collection Methodology
To populate FFORT, we have applied specific criteria in our literature survey, to ensure a consistent and homogeneous representation of the data in line with the required features described in Sect. 3.

Validation rules
When considering a fault tree FT (family or variant) for inclusion in the benchmark, the following conditions were checked:
(1) FT must have been introduced in a referenceable publication, namely a scientific article published in a journal or conference or workshop proceedings, or a published book.
(2) Nontrivial size; specifically, FT must have at least 10 nodes, unless it is part of a family where there are other trees that satisfy this condition.
(3) Structural unambiguity, i.e. there must be either a complete graphical representation or a clear description (or a combination of these) describing FT entirely.
(4) Analyzability by public software, i.e. there must exist some publicly available tool that can compute the metrics for FT that appear in the publication it was taken from †. Notice this does not necessarily rule out trees appearing in publications where results were computed analytically ("by hand"), as long as the corresponding Galileo encoding of FT can theoretically be analyzed by existing tools.
(5) FT must contain quantitative information that admits the computation of some standard RAMS metric, e.g. availability, reliability, (untimed) failure probability, etc.
† We target theoretical analyzability, disregarding practical hardships like tree size or computation time.
Item 5 rules out fault trees which, due to the available published information, are amenable to qualitative analyses alone, for instance those studied by Zhang et al. (2018). This is motivated by the focus of FFORT on quantitative fault tree analysis, oriented to the development and improvement of software tools that target this goal.
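Of the conditions above, the size rule (2) is the one most easily checked mechanically. The following sketch shows one possible reading of it, assuming the node counts of all trees in a family are already known; the function name and its parameters are hypothetical.

```python
from typing import Sequence


def satisfies_size_rule(family_node_counts: Sequence[int], index: int,
                        minimum: int = 10) -> bool:
    """Check rule (2) for the tree at position `index` within its family.

    A tree passes if it has at least `minimum` nodes itself, or if some
    other tree in the same family does.
    """
    if family_node_counts[index] >= minimum:
        return True
    return any(count >= minimum
               for i, count in enumerate(family_node_counts)
               if i != index)
```

Under this reading, a 6-node tree is admissible when it belongs to a family that also contains, say, a 40-node tree, but not when it is the only member of its family.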

Naming and structural conventions
The data content of the fault trees in FFORT is stored as plain text files, written in the standard Galileo format ‡. The file extension is dft and, to facilitate parsing by software tools, we use the following conventions for the nodes of the tree:
• Names, be these of basic events or gates, are enclosed in double quotes "like this".
• The top level event (i.e. the root node of the tree) is named "System".
• The names of all other nodes follow the publication from which the tree was extracted:
  ◮ if the node name in the publication is an abbreviation (e.g. HCR_2), the string is used verbatim enclosed in double quotes as per the first item (e.g. "HCR_2");
  ◮ if a long name is used instead, possibly including spaces (e.g. higher cabin relay 2), spaces are stripped and the string is written in camel case (e.g. "higherCabinRelay2").
• After the tree root on the first line, each node of the tree appears as a single line in the file, describing it according to the Galileo format.
• The order of declaration of the nodes in the file follows a preorder (i.e. root-first order) of the tree hierarchy; that is, if gates G_1, G_2, ..., G_N are children of gate G, then the line declaring G in the file appears before the lines declaring G_1, ..., G_N.
• Basic events, i.e. the fault tree leaves, are declared in the file after all gates, and in their left-to-right order of definition; e.g. for the (sub-)tree PAND(BE1,BE2) the line declaring the PAND gate appears first in the file, then the line declaring the basic event BE1 appears on a lower (not necessarily consecutive) line, and immediately below it is the line declaring the basic event BE2.
‡ Described at https://dftbenchmarks.utwente.nl/galileo.html
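The naming convention for node names can be sketched as a small helper. The function name galileo_node_name is hypothetical, and lowering the first letter of a multi-word name is an assumption about what "camel case" means here; the two examples from the text are reproduced by it.

```python
def galileo_node_name(publication_name: str) -> str:
    """Turn a node name from a publication into a quoted Galileo name."""
    words = publication_name.split()
    if len(words) <= 1:
        # Abbreviations such as HCR_2 are used verbatim.
        name = publication_name.strip()
    else:
        # Long names: strip the spaces and write the result in camel case.
        first = words[0][:1].lower() + words[0][1:]
        rest = "".join(word[:1].upper() + word[1:] for word in words[1:])
        name = first + rest
    return f'"{name}"'
```

Calling galileo_node_name("higher cabin relay 2") yields "higherCabinRelay2" in double quotes, and galileo_node_name("HCR_2") yields "HCR_2" in double quotes.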
To provide a full concrete example, the fault tree in Fig. 1 is translated into the content of the tree.dft file shown in Fig. 3, where the failure rates of the basic events (i.e. the values assigned to the lambda constants) are assumed given in the text of the corresponding publication.
    toplevel "System";
    "System" or "HCR" "batteries";
    "batteries" and "B1" "B2";
    "HCR" lambda=2.8e-5;
    "B1" lambda=1.13e-6;
    "B2" lambda=1.13e-6;
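Because of these conventions, files of this shape are simple to process. The following sketch parses the example file and extracts the kind of counts used for the statistics. It is an illustration under stated assumptions, not the parser used by FFORT or by any analysis tool: it handles only quoted names, a single gate keyword per gate line, and numeric key=value attributes on basic events.

```python
import re
from collections import Counter

EXAMPLE = '''toplevel "System";
"System" or "HCR" "batteries";
"batteries" and "B1" "B2";
"HCR" lambda=2.8e-5;
"B1" lambda=1.13e-6;
"B2" lambda=1.13e-6;
'''


def parse_galileo(text):
    """Return (top level name, gates, basic events) for a simple file."""
    top, gates, basic_events = None, {}, {}
    for statement in text.split(";"):
        statement = statement.strip()
        if not statement:
            continue
        names = re.findall(r'"([^"]+)"', statement)        # quoted names
        rest = re.sub(r'"[^"]+"', " ", statement).split()  # other tokens
        if rest and rest[0] == "toplevel":
            top = names[0]
        elif any("=" in token for token in rest):
            # Basic event: numeric attributes such as lambda=2.8e-5.
            pairs = (token.split("=", 1) for token in rest)
            basic_events[names[0]] = {key: float(value) for key, value in pairs}
        else:
            # Gate: the first name is the gate, the others are its children.
            gates[names[0]] = (rest[0], names[1:])
    return top, gates, basic_events


top, gates, basic_events = parse_galileo(EXAMPLE)
gate_counts = Counter(gate_type for gate_type, _ in gates.values())
```

Here gate_counts records one "or" gate and one "and" gate, and basic_events holds the three basic events with their lambda values.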

Statistics
FFORT is a diverse benchmark suite with fault trees that differ in size (i.e. the number of nodes in the tree), type (static vs. dynamic, repairable or not, with maintenance support or not), failure behaviour (diverse failure probability distributions for the basic events, and several gate types), and metrics computed (untimed failure probability, reliability for a certain time horizon, availability, mean time to failure). Table 1 offers an overview of this diversity.
At the time of publication, FFORT contains 202 FTs from a total of 24 families. There is considerable variation in the number of FTs per family, with the largest family containing 68 FTs, and many 'families' having only one FT. The families containing many variants are mainly those that were used as benchmarks for analysis tools (where variants of different sizes demonstrate scalability

FT elements
We observe a balance between discrete-time and continuous-time FTs, with 16 families specifying basic events' failure rates, and the remaining 8 specifying simple failure probabilities. We did not encounter any FTs that mixed these types. In terms of size, the FTs vary from 6 to 253 basic events (median 22), and 4 to 161 gates (median 14). We note that the largest FTs are contained in families that also have more modest sizes, with the smallest FTs per family ranging from 6 to 54 BEs (median 10) and 4 to 50 gates (median 9).
With respect to the FT types, we have a mix of static, dynamic, and maintenance FTs, as shown in Fig. 4. For the purpose of this classification, we consider each FT a member of the most restrictive class it fits in. So for instance, a tree containing only static gates is considered a static fault tree, even though it also meets the formal definition of a dynamic fault tree.
The gate types used are shown in Fig. 5. As one would expect, the AND- and OR-gates are by far the most common, with relatively even numbers for the more complex types. The sequence-enforcer gate and inspection module are the least used, which is understandable given their poor support by many analysis tools.
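The "most restrictive class" rule can be sketched as a function over the set of element types occurring in a tree. The labels used for the element types and the exact membership of each set are assumptions for illustration; FFORT's own classification may use additional information, such as whether basic events are repairable.

```python
STATIC_GATES = {"and", "or", "vot"}
DYNAMIC_GATES = {"pand", "spare", "fdep", "seq"}
MAINTENANCE_ELEMENTS = {"inspection", "repair"}  # hypothetical labels


def classify(element_types) -> str:
    """Return the most restrictive class that fits the given element types."""
    present = set(element_types)
    unknown = present - STATIC_GATES - DYNAMIC_GATES - MAINTENANCE_ELEMENTS
    if unknown:
        raise ValueError(f"unknown element types: {sorted(unknown)}")
    if present & MAINTENANCE_ELEMENTS:
        return "maintenance"
    if present & DYNAMIC_GATES:
        return "dynamic"
    return "static"  # only AND, OR, and voting gates occur
```

A tree with only AND- and OR-gates is thus reported as static, although it is formally also a dynamic fault tree.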
The average number of each type of gate per FT can be found in Fig. 6. The OR-gate is by far the most common, which was expected as this is what usually connects different failure modes (e.g. a general system failure may take place if functionality A or B is lost, which can be caused by failure modes F_A or F_B respectively). Next are the AND-, SPARE-, and PAND-gates, often connecting failure modes related via redundancy. The FDEP, voting (VOT), and sequence-enforcer (SEQ) gates occur more rarely, as many systems do not include these features. Inspection modules (IM) occur very infrequently, as most FTs that include maintenance specify only one policy represented by an IM.
[Fig. 4: FT types. (a) Per family: 4 maintenance, 11 dynamic (+2 also maintenance gates), 9 static (+11 also dynamic, +4 also maintenance). Second panel: Static (22), Dynamic (152), Maint.]
§ Capitals of the tree's Name, see Fig. 2

Quantitative results
Of the 24 families in FFORT, 16 include at least one quantitative result. As described in Sect. 3, we provide published results that are taken from scientific literature (generally the same paper that describes the FT itself) and reference results computed ourselves for the sake of the benchmark. Of the 16 FTs with results, 11 contain published re- Of three FTs that do not have results, one contains a feature not supported by any of the analysis tools available to us (the Restoration factor), and the other two are too large for the tools to analyze them in two hours on our computers ¶ . Fig. 7 shows the percentage of FT (families) for which a particular type of result is available. An interesting remark is that the published results contain only metrics on reliability and availability. The mean time to failure is included as a reference result for many FTs, but is apparently not published in general. We did not compute reference results for availability, as the only tool available to us to compute the availability for repairable FTs (DFTCalc) is the same tool that generated the published results.
In computing reference results, we have already identified one published result (by Arnold et al. (2013)) that did not match the reference result. In collaboration with the original authors, we identified that the published value was erroneous, caused by a typographical error in the program invocation.

User Interface
The FFORT website consists of three web pages: the main page where (filtered subsets of) FTs can be viewed and downloaded, the statistics page where various statistics of the collection can be found in real time, and the submission page where new FTs can be submitted.
In addition to the website, a git repository can be downloaded containing all the models with their JSON metadata (as well as the source code of the website). This feature allows e.g. tool authors to automatically execute their tool on all available FTs. Fig. 2 shows part of the front page of FFORT. At the bottom of the image, the list of FTs can be seen. Information about each model can be seen by clicking its name, which expands the entry to show its description, reference, and image if available (an example description is shown at the top of Fig. 8  showing the variants with links to their Galileo files.

Main page
At the top of the page is the search box. Here, the user can search for FTs by name, description, or publication author or year. In addition, the user can choose to show only FTs of a particular type (static, dynamic, or maintenance), containing particular gates, and/or for which a particular type of result is available. Furthermore, the FTs can be restricted by the date they were added to FFORT. In this way, a tool author can, for example, try their tools on all FTs with certain properties, and later easily check whether any new FTs were added with those properties.
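The same kind of selection can be reproduced offline on the metadata from the repository. The sketch below filters a list of in-memory records by type, required gates, and date added; the record keys "type", "gates", and "date_added" are hypothetical and do not necessarily match FFORT's JSON fields.

```python
from datetime import date


def filter_fts(entries, ft_type=None, required_gates=(), added_after=None):
    """Select the entries that match every filter that was given."""
    selected = []
    for entry in entries:
        if ft_type is not None and entry["type"] != ft_type:
            continue
        if not set(required_gates) <= set(entry["gates"]):
            continue  # at least one required gate type is missing
        if added_after is not None and entry["date_added"] <= added_after:
            continue  # already present at the time of the previous check
        selected.append(entry)
    return selected


# Placeholder records, invented purely for this illustration.
entries = [
    {"name": "example-a", "type": "static", "gates": ["and", "or"],
     "date_added": date(2020, 1, 1)},
    {"name": "example-b", "type": "dynamic", "gates": ["or", "pand"],
     "date_added": date(2021, 6, 1)},
]
new_dynamic = filter_fts(entries, ft_type="dynamic",
                         added_after=date(2021, 1, 1))
```

Re-running such a filter with the date of the previous run as added_after mirrors the use case of checking for newly added FTs with given properties.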

Statistics page
The statistics page of FFORT shows various statistics similar to those reported in Sect. 5, computed in real time over the models present in FFORT. In addition, the statistics can be calculated over subsets of the submitted trees using all the filtering capabilities of the main page as described in the previous section.

Submission page
Fig. 8  page (except for the information that is automatically calculated from the tree). The submission page allows multiple variants to be submitted by adding additional models, and one or more results can be submitted for each variant. New variants can be submitted by naming the model identically to an already-included one, and new results can be submitted by using an identical name and omitting the model file.
For expert users, the JSON-formatted metadata can be edited manually to perform tasks that would otherwise be cumbersome (e.g., adding large numbers of variants by copying and pasting one variant and making minor adjustments) or unsupported (e.g., specifying a different submission date for re-submissions with corrected models).
Completed forms can be submitted automatically to the maintainers of FFORT. After submission, the maintainers verify that the submitted tree and metadata are valid (i.e., that the submitted Galileo file has the correct syntax, DOIs refer to the correct papers, etc.), generate the model-derived metadata (i.e., number of gates, etc.), and add the submission to the main page.

Conclusion
This paper has presented FFORT, a compilation of diverse fault trees for benchmark purposes. We have collected FTs from the scientific literature, described them in a uniform input language, and we make them publicly available together with metadata about the FTs. We provide the metadata both on a user-friendly website and in machine-readable form. We further hope to expand FFORT both by collecting further FTs ourselves and by soliciting contributions from other researchers on fault tree analysis.
Discussion
One of the goals of FFORT is to provide validation for analysis techniques. Already during the construction of FFORT, the calculation of reference results identified software bugs in both tools used for the task (DFTCalc and Storm-DFT). Furthermore, we identified a case where