We consider a repairable system for which the repair (or corrective maintenance) may be carried out to different degrees of completeness, ranging from a ‘minimal’ to a ‘complete’ repair. Our question is: to what extent must the system be repaired after a failure for the long-run availability to be optimal? The system evolves in time according to a Markov process as long as it is running, whereas the durations of repairs follow general distributions. After repair, the system starts again in the up-state i with probability d(i). We observe from numerical examples that the optimal restarting distribution d_opt (the one that makes the long-run availability optimal) is, in general, random and does not correspond to a new start in a fixed up-state. Sufficient conditions under which the optimal restarting distribution is non-random are given. The optimal restarting distribution is also provided for two classical structures in reliability.
Let us consider a repairable system for which the repair (or corrective maintenance) may be carried out to different degrees of completeness, ranging from a ‘minimal’ to a ‘complete’ repair. One may think, for instance, of a system with redundant components. Our questions are: in case of failure, is it worth carrying out complete repairs, which may be long (or costly), or is it better to repair the system as quickly as possible? To what extent should the corrective maintenance be performed? The answer to such questions depends strongly on the criterion used to measure the performance of the system. We are interested here in the long-run availability, that is, the probability that the system is up in the long run. Our problem is then to find the degree of repair for which the long-run availability is optimal.
In Ref. [1], we studied problems of this kind, but we concentrated there on finding conditions under which complete repairs are optimal: we showed that, for a system with some kind of ageing property, complete repairs are optimal. As for other papers on maintenance optimization, most of them deal with preventive maintenance and only a few with corrective maintenance. The problems closest to ours may be found in papers dealing with redundancy optimization; see, for instance, Chapter 6 of Ref. [2], Ref. [3], or Ref. [4] and the references therein. In such papers, the authors are mainly interested in optimizing reliability under constraints or under the assumption of two failure modes, and their aim is to provide algorithms for finding the optimal redundancy. The work closest to ours is Ref. [5], where the authors consider a system composed of N identical parallel units. They show (among other results) that, even in the case of units with constant failure rate, the cost may be improved by deliberately taking some non-failed units out of operation. The question then arises of finding the optimal number of units to be put into operation (or to be repaired in case of failure), which they compute under different assumptions.
Here, we do not fix the structure of the system as the previous authors did, but we assume that the system evolves in time according to a Markov process as long as it is running. When the system fails, a repair begins, the duration of which follows a general distribution. After repair, we assume that the system always starts again in the same way. More precisely, if the up-states of the system are denoted by 1,2,…,m, the system starts again after any repair in state i (1≤i≤m) with the same probability, denoted by d(i). This means that we allow the new starts after repair to be random. As for the technical realization of such random restarts, one may think, for instance, of a system composed of two parallel subsystems with two repair facilities. In case of failure, the repairs of both subsystems are begun simultaneously, and we decide to let the system start again as soon as one of the two repairs is over. We may also adjust the new start of the system to the desired restarting distribution by adding repair facilities for one of the subsystems.
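To make the setting concrete, here is a minimal numerical sketch under assumptions that are not taken from the paper. It uses a standard renewal-reward ratio (mean up-time per cycle divided by mean cycle length), which is not a statement of Theorem 1. The function names, the two-state generator and the repair durations are all hypothetical, and the mean repair duration attached to a restarting policy is simply supplied by the caller, since how it depends on d is model-specific.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def mean_up_times(q_up):
    """Mean time to failure from each up-state.

    q_up is the generator of the Markov process restricted to the up-states;
    the vector t of mean times to failure solves (-q_up) t = (1, ..., 1).
    """
    n = len(q_up)
    neg = [[-q_up[i][j] for j in range(n)] for i in range(n)]
    return solve(neg, [1.0] * n)


def availability(d, q_up, mean_repair):
    """Renewal-reward availability: mean up-time / (mean up-time + mean repair).

    d is the restarting distribution on the up-states; mean_repair is the
    mean repair duration under this policy (an assumption supplied by the caller).
    """
    mut = sum(di * ti for di, ti in zip(d, mean_up_times(q_up)))
    return mut / (mut + mean_repair)


# Hypothetical example: two parallel units, each with failure rate 1.
# Up-state 1: both units work; up-state 2: one unit works.
q_up = [[-2.0, 2.0],
        [0.0, -1.0]]

# Assumed mean repair durations: 1.0 for a complete repair (restart in state 1),
# 0.25 for a minimal repair (restart in state 2).
a_complete = availability([1.0, 0.0], q_up, mean_repair=1.0)
a_minimal = availability([0.0, 1.0], q_up, mean_repair=0.25)
print(a_complete, a_minimal)
```

With these made-up numbers the minimal repair gives the higher availability (about 0.8 against 0.6), which illustrates why a complete repair need not be optimal when it takes much longer.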
For such a system (see Section 2 for more details), we compute the long-run availability A∞(d) (Section 3, Theorem 1). Our problem is then to find the restarting distribution d_opt that makes the long-run availability optimal. We first observe from a numerical example (Example 1) that this optimal distribution does not always correspond to a new start in a fixed up-state and may be random. This justifies the introduction of a random distribution for the new starts after repair, though we also observe that the optimal distribution is often non-random. A natural problem is then to look for conditions under which the optimization may be restricted to such non-random distributions. Indeed, from a practical point of view, it is easier to know exactly which components to repair in case of failure. Besides, from a theoretical point of view, the search for the optimal restarting distribution is greatly simplified under such conditions: there are then only m possible restarting distributions, whereas all the possible distributions on {1,…,m} have to be considered in the general case. Such sufficient conditions are given in Theorem 2 (Section 4). They are tested on some examples, and then used to study ‘k out of n’ standby structures (in Section 5): for both of them, we show that the optimal restarting distribution is non-random and corresponds to an optimal number of components to be repaired in case of failure, which we compute. This readily yields the optimal number of redundant components to be set up in those structures in the case of complete repairs.
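The difference in size between the two search spaces can be illustrated by a short sketch. It enumerates the m non-random (Dirac) restarting distributions and, for comparison, a regular grid on the set of all distributions on {1,…,m}. The function names and the grid discretization are hypothetical choices made for illustration; they are not the method used in the paper.

```python
from itertools import combinations


def dirac_distributions(m):
    """The m non-random restarting distributions: restart in a fixed up-state."""
    return [[1.0 if j == i else 0.0 for j in range(m)] for i in range(m)]


def simplex_grid(m, steps):
    """All distributions on m states whose probabilities are multiples of 1/steps.

    Uses the stars-and-bars construction: choose the positions of m - 1 bars
    among steps + m - 1 slots; the gaps between bars give the integer weights.
    """
    points = []
    total_slots = steps + m - 1
    for bars in combinations(range(total_slots), m - 1):
        counts = []
        prev = -1
        for b in bars:
            counts.append(b - prev - 1)
            prev = b
        counts.append(total_slots - prev - 1)
        points.append([c / steps for c in counts])
    return points


def best(distributions, objective):
    """Return the candidate distribution that maximizes the given objective."""
    return max(distributions, key=objective)


m = 3
vertices = dirac_distributions(m)
grid = simplex_grid(m, steps=10)
print(len(vertices), len(grid))
```

Any availability function of d can be passed to `best` as the objective. When the optimum is known to be non-random, only the m Dirac candidates need to be evaluated; otherwise the number of grid points grows quickly with m and with the grid resolution.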
We now specify our assumptions and notations.