Penalty Search
Juha-Miikka NURMILAAKSO
Product Modeling and Realization Group
Laboratory of Information Processing Science
Helsinki University of Technology
FIN-02150 Espoo, Finland
Abstract: This paper describes penalty search, which is based on a modified neighborhood search where move and solution penalty functions attempt to keep the search process out of recently "explored" regions. During the penalty search process, trajectories which may cause a cycle or drive the search process towards recently found local optima are made "less attractive" by using penalty parameters. Penalty search is extended by the idea of reactive search, which is based on a "reactive" mechanism for the determination of search parameters. In reactive penalty search, penalty parameters are determined and readjusted considering the past behavior of the search process.
Keywords: Modified neighborhood search; Penalty function; Reactiveness
1. Introduction
During the last thirty years many flexible, problem-independent methods for combinatorial optimization problems have been developed. These methods are based either on an imitation of a process appearing in nature (genetic algorithms, neural networks, simulated annealing) or on a heuristic search procedure (A*, beam search, tabu search). On the other hand, these methods are either constructive or iterative. In the late 1980s Glover [6, 7, 8] summarized tabu search, which is based on a modified neighborhood and is able to go beyond local optima by using prohibition-based techniques with diversification and intensification. Tabu search was preceded by memoryless methods, where the past history of the search process does not influence its future behavior, and by rigid memory methods based on a search in the solution tree.
Usually a problem-dependent implementation of an optimization method requires that search parameters are fitted to the structure of a problem. When the structure, e.g. the coefficients of an objective function and constraints, is changed, the required search parameters are often predetermined by human trial-and-error adjustment. Battiti [3] proposed the idea of reactive search: during the search process, search parameters are determined according to the information obtained from the history of the search process. Most papers on reactive search [1, 2] study the "reactive" determination of prohibition parameters. As a result of these studies, the reactive tabu search framework has also been presented in a version for continuous optimization problems [4].
Penalty methods [5] are known in linear and non-linear programming, where a constrained problem is transformed into an unconstrained one and the purpose of the penalty function is to keep the search process in the feasible solution space.
Tabu search and the new heuristic search procedure, penalty search, contain similar actions such as diversification and intensification, but the tabus of the former are replaced with move penalties in the latter. The fundamental difference lies in their behavior: penalty search penalizes transformations nullifying the influences of recently made transformations, while tabu search prohibits such transformations of a solution. In addition, penalty search contains solution penalties, which can prevent the search process from being driven towards recently found local optima. The advantage of penalties is flexibility, which makes penalty search a good alternative to tabu search. Penalty search is suggested to be extended by a reactive mechanism which determines and readjusts penalty parameters, when necessary, by using information about the past behavior of the search process.
This paper first presents the functioning and behavior of neighborhood search (section 2). Tabu search is roughly described to show the difference between tabus and penalties in a modified neighborhood search (section 3). The goals and requirements of reactive search are also presented (section 4). Finally, penalty search is described so that the idea of move and solution penalty functions as well as the purpose of their reactive determination is presented (section 5).
2. Neighborhood search
Neighborhood (local) search is a well-known optimization method, especially in non-linear programming; it is also a basis for some combinatorial optimization methods such as simulated annealing [9] and tabu search.
Let an optimization problem (1) be defined so that an objective function f(x) is to be minimized by a neighborhood search in a constrained search space X consisting of feasible solutions x with integer (or binary) variables x_1, ..., x_n.
min f(x)  s.t. x ∈ X  (1)

Neighborhood search starts from a feasible solution x(0) and proceeds in an iterative way by selecting a move y(t) from a finite set of moves Y(x(t)) for transforming a solution x(t) into another one x(t+1) in a finite set of neighborhood solutions X(x(t)) (2). Sets of moves and neighborhood solutions are usually constructed to contain only moves transforming a solution into a feasible one (X(x(t)) ⊆ X).
y(t)(x(t)) = x(t+1)    where x(t+1) ∈ X(x(t)), y(t) ∈ Y(x(t)) and x(t) ≠ x(t+1)  (2)

A sequence of solutions and moves is generated in a greedy way (3), i.e. in each iteration t the move y(t) directing to the solution with the lowest value of the objective function f(x(t+1)) in the neighborhood is selected. The best move y(t) yields the highest improvement to the solution x(t). If several moves yield an equal improvement, they should be ordered from best to worst by the same principle in each iteration. If ties are broken randomly, possible modifications of neighborhood search may not work successfully.
f(t+1) = f(x(t+1)) = min {f(y(t)(x(t))) | y(t) ∈ Y(x(t))}  (3)

This greediness makes the selected move improve the current solution f(t) and the generated trajectory decrease (4) until the search process reaches a solution x(t_gen) (5), which is a local optimum. Unfortunately, in some cases it is time-consuming to verify that this solution x(t_gen) really is locally optimal, because there may exist a trajectory which changes and decreases the value of the objective function only after a certain number of iterations d_cha (≥ 1) (6a, 6b).

f(t+1) < f(t)    for t = 0, ..., t_gen - 1  (4)
f(x(t_gen)) ≤ f(x)    for all x ∈ X(x(t_gen))  (5)
f(t+1) = f(t)    for t = t_gen, ..., t_gen + d_cha - 1  (6a)
f(t_gen + d_cha + 1) < f(t_gen + d_cha)  (6b)

Each move y(t) consists of a certain number of elements e(y(t)) (1 ≤ e(y(t)) ≤ n), where an element e_i indicates an increase of variable x_i and an element e_(n+i) indicates its decrease. A combination of elements can thus contain either an element e_i (= e_(n+i)^-1) or e_(n+i) (= e_i^-1), because they are inverse elements of each other. A move y(t+1) is an inverse transformation of the latest selected move y(t) if and only if it transforms the current solution x(t+1) back into the previous one x(t) (7a). Such a full inverse move y(t+1) therefore consists only of the inverse elements e_1^-1, ..., e_(2n)^-1 with respect to y(t) (7b).
y(t+1)(x(t+1)) = y(t+1)(y(t)(x(t))) = x(t+2) = x(t)  (7a)
y(t+1) = y(t)^-1    where y(t)^-1 ∈ Y(x(t+1))  (7b)

Neighborhood search without modifications is often useless due to a well-known defect: the search process becomes trapped in a local optimum. This means that the search process is in a cycle, an endless cyclic repetition of a certain set of solutions X_cyc(x(t_cyc)). In a cycle a move y(t) is selected again after a certain number of iterations c (= 2, 3, ...), and it belongs to a certain set of moves Y_cyc(x(t_cyc)) which are full or partial inverses of each other (8a, 8b).

x(t) = x(t+q*c)    where x(t) ∈ X_cyc(x(t_cyc)), t = t_cyc, ..., t_cyc + c - 1 and q = 1, 2, ...  (8a)
y(t) = y(t+q*c)    where y(t) ∈ Y_cyc(x(t_cyc)), t = t_cyc, ..., t_cyc + c - 1 and q = 1, 2, ...  (8b)

However, there are two known modifications which help the neighborhood search process to go beyond local optima:
• restart the search process from a different (e.g. random) solution,
• permit (e.g. simulated annealing) or compel (e.g. tabu search) the search process to select an impairing move.
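The greedy iteration (2)–(3) and the stopping condition (5) can be sketched as follows; `f` and `moves` are assumed problem-supplied callables (illustrative names, not from the original text), where `moves(x)` returns only feasibility-preserving moves, and ties are broken by list order rather than randomly:

```python
def neighborhood_search(f, moves, x0, max_iters=1000):
    """Greedy neighborhood search: in each iteration select the move whose
    resulting solution has the lowest objective value (3), stopping at the
    first local optimum (5)."""
    x = x0
    for t in range(max_iters):
        # evaluate f(y(x)) for every move y in the neighborhood Y(x(t))
        candidates = [(f(y(x)), y) for y in moves(x)]
        best_val, best_move = min(candidates, key=lambda c: c[0])
        if best_val >= f(x):      # no improving move: x is locally optimal
            return x
        x = best_move(x)          # x(t+1) = y(t)(x(t))
    return x
```

Without the modifications listed above, this procedure simply returns the first local optimum it reaches.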
3. Tabu search
Tabu search is a heuristic search procedure based on a modified neighborhood search containing both restarting and compelling actions to prevent it from becoming trapped in a local optimum. The past history of the search process is collected, processed and stored in different kinds of memory, and this information influences the future behavior of the search process. According to Glover [8], short-term memory is used to control the neighborhood search, while intermediate- and long-term memory is used to diversify and intensify the search process.
Restarting is carried out by diversifying or intensifying the search process [8]; diversification drives the search process into new regions, while intensification reinforces the move combinations and solution features that are found historically good. In the simplest case intensification drives the search process into a region lying close to the recently found local optimum and containing "uncharted" solutions, while diversification seeks distant and historically "unexplored" regions. Diversification or intensification is started when the solution doesn't improve during a certain period, and the search process is finished usually after a predetermined number of restarts.
Compelling means that a move which is an inverse transformation of a recently selected move is prohibited for a given period. In each iteration t the inverse move y(t)^-1 of the just selected move y(t) is set tabu by inserting it into the set of prohibited moves M(t+1). After restarting, or after a predetermined number of iterations d (≥ 1), the tabu is cancelled by removing the prohibited move from this set (10). Tabus have a quite absolute influence on the search process because a move y(t) is usually selected from the set of admissible moves Y(x(t)) \ M(t); a prohibited move can be selected if and only if it belongs to the set of moves A(t) (⊆ Y(x(t))) fulfilling an aspiration criterion (9). For example, an aspiration criterion may allow selecting a tabu move if this move directs to a solution with the lowest objective function value found so far.
f(t+1) = min {f(y(t)(x(t))) | y(t) ∈ Y(x(t)) \ (M(t) \ A(t))}  (9)
M(t+1) = M(t) ∪ {y(t)^-1}    where M(0) = ∅  (10a)
M(t+1) = M(t) \ {y(t-d)^-1}    if t - d ≥ 0  (10b)

The determination of the prohibition parameter d has proven to be a challenging problem: a fixed value can be too restrictive, making the set of admissible moves empty, or too loose, allowing a long cycle of search states. A reactive mechanism has been proposed to solve the determination of search parameters.
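The set updates (10a, 10b) amount to a first-in, first-out discipline of length d; a minimal sketch, assuming moves are hashable values and the caller supplies each inverse move y(t)^-1 (all names illustrative):

```python
from collections import deque

def make_tabu_memory(d):
    """Tabu memory with prohibition period d: each iteration inserts the
    inverse of the just selected move (10a) and cancels the tabu inserted
    d iterations earlier (10b)."""
    queue = deque()   # inverse moves in insertion order
    tabu = set()      # M(t): currently prohibited moves

    def update(inverse_move):
        queue.append(inverse_move)
        tabu.add(inverse_move)
        if len(queue) > d:                 # t - d >= 0: cancel oldest tabu
            tabu.discard(queue.popleft())

    def is_tabu(move):
        return move in tabu

    return update, is_tabu
```

This is the list-structure implementation discussed below, reduced to its update and recognition operations.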
The set of prohibited moves M(t) can be implemented using a list structure or a tree structure. A list structure is updated by removing the oldest inverse move y(t-d)^-1, if the defined size d has been reached, and by inserting the inverse transformation y(t)^-1 of the just selected move y(t) in each iteration t. A move y(t) is recognized as prohibited if the list contains an inverse move y(t_s)^-1 (= y(t)) which consists of exactly the same elements as the move y(t). Unfortunately, this recognition process is often time-consuming, and a list structure supports reactiveness quite poorly even if the size of the list can be dynamically adjusted. A tree structure is simple to update and its recognition process is fast, because this structure consists of memory items for each possible move, and these items can be set tabu for a given period. The problem of a tree structure is its huge memory requirement, because the size of this structure can be exponentially proportional to the number of variables x_i.
Instead of a list or a tree structure, tabus can be implemented using an array structure [10], whose fixed size 2*n does not influence the length of the prohibition period. In each iteration t_i a prohibition parameter d_i (≥ 1) is determined for each element e_i whose inverse element e_i^-1 is contained in the just selected move y(t_i). The determined prohibition parameters are stored in an array which consists of memory items for each element. The problem of tabus with an array structure appears in the recognition process: should a move be recognized as prohibited if it consists only of prohibited elements, or if it contains at least one prohibited element? The prohibition level d(y(t)) (11) provides one solution to this recognition problem: a move y(t) is recognized as prohibited if and only if the prohibition level d(y(t)) exceeds the limit h_t * e(y(t)), which contains a new search parameter h_t (0 < h_t < 1).
d(y(t)) = ∑ (d_i + t_i + 1 - t) / d_i    for all e_i in y(t) with t_i < t ≤ t_i + d_i  (11)
To this recognition problem the present paper provides another solution where tabus are replaced with penalties. The advantage of penalties compared with tabus is flexibility: the intensity of penalties can vary, while tabus only work at an on-off level. In addition, penalties support reactiveness and an array structure.
4. Reactive search
The fundamental idea of reactive search is to control the behavior of the search process by utilizing its past history. Battiti [3] proposed that the past history of the search process is used for feedback-based parameter tuning and automated balancing of diversification and intensification.
In each iteration a reactive mechanism adjusts the search parameters by considering the past behavior of the search process. Often further parameters influence the behavior of the search process through the aspiration criterion and the conditions of diversification and intensification. These parameters can also be adjusted by a reactive mechanism.
Information about the reached solution x(t+1) and the selected move y(t) must be collected in each iteration t to determine search parameters reactively. The collected information is then processed and stored in memories designed for the reactive mechanism. In the simplest case, the cumulative sums of occurrences both of elements e_i in selected moves y(t) and of the values of variables x_i in reached solutions x(t+1) over a defined number of iterations can be calculated and stored in memory [10]. This can be difficult, in particular, if the search space of the problem is infinite. The structure of the memories should support continuous reprocessing; stored information must often be compressed or deleted because its utility decreases with time while its amount increases, requiring more and more computing time and computer memory.
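The cumulative occurrence counts described above could be kept, for example, as follows; the halving-based compression is one illustrative scheme for letting old information fade, not something prescribed by the paper:

```python
from collections import Counter

class ReactiveMemory:
    """Cumulative occurrence counts of move elements and of (variable, value)
    pairs, compressed periodically: every `window` iterations all counts are
    halved so that their utility decays over time."""
    def __init__(self, window=100):
        self.window = window
        self.t = 0
        self.element_counts = Counter()   # occurrences of elements e_i
        self.value_counts = Counter()     # occurrences of variable values

    def record(self, move_elems, solution):
        self.t += 1
        self.element_counts.update(move_elems)
        self.value_counts.update(solution.items())   # solution as a dict
        if self.t % self.window == 0:                # compress stored information
            for counts in (self.element_counts, self.value_counts):
                for key in list(counts):
                    counts[key] //= 2
```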
The four goals of reactive search can be intuitively defined as follows, where the last two goals indicate "good" decisions during the search process:
• make a human predetermination of search parameters unnecessary,
• readjust search parameters to be suitable for the current search state,
• avoid the occurrence of cycles, also long ones,
• explore the search space of a problem in an "even" but reasonable way.
5. Penalty search
5.1. Penalty method
In linear and non-linear programming, penalty methods [5] are used to transform a constrained problem (12a, 12b) into an unconstrained problem where the sum of an objective function f(x) and a penalty function p(x) is minimized (13a). In this conventional approach the purpose of the penalty function p(x) is to penalize a solution x for violating constraints and to keep the search process in the feasible space X, so that feasible solutions are, in most cases, "more attractive" than infeasible ones (13b).
min f(x)  s.t. x ∈ X  (12a)
X = {x ∈ R^n | g_j(x) = 0, h_k(x) ≤ 0 for all j and k}  (12b)
min (f(x) + p(x))  s.t. x ∈ R^n  (13a)
p(x) = ∑_j u_j * |g_j(x)| + ∑_k v_k * max {0, h_k(x)}    where u_j, v_k ≥ 0 for all j and k  (13b)
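The conventional penalty function (13b) can be written directly as code, assuming the equality and inequality constraints are given as lists of callables (an illustrative interface):

```python
def penalty(x, eq_constraints, ineq_constraints, u, v):
    """Classical penalty function (13b):
    p(x) = sum_j u_j * |g_j(x)| + sum_k v_k * max(0, h_k(x)),
    where eq_constraints are the g_j and ineq_constraints the h_k."""
    p = sum(u_j * abs(g(x)) for u_j, g in zip(u, eq_constraints))
    p += sum(v_k * max(0.0, h(x)) for v_k, h in zip(v, ineq_constraints))
    return p
```

Feasible solutions incur zero penalty; the weights u_j, v_k determine how strongly violations are discouraged.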
The penalty method is a basis for the new approach, penalty search, where penalties are used for even exploration of a search space and for cycle avoidance. In penalty search the purpose of a penalty function is to keep the search process out of recently explored regions.
5.2. Penalty search procedure
Like tabu search, penalty search is a memory-intensive heuristic search procedure based on a modified neighborhood search with restarting and permitting actions. Restarting contains diversification or intensification, as in tabu search, and a third possibility is to combine both strategies and determine the degree of diversification or intensification. Restarting is needed when a given number of iterations has been done or a local optimum has been found after the initial start or the last restart. Permitting is carried out during the search process so that selecting a move which is, or may be, against a "good" decision is penalized.
The fundamental idea of penalty search is simple when an objective function f(x) is minimized (1) by a modified neighborhood search: in each iteration t the move y(t) directing to the solution with the lowest total value of the objective function f(x(t+1)) and a non-negative penalty function p(x(t),y(t),t) in the neighborhood Y(x(t)) is selected (14), whereas in tabu search the move is selected from the set of admissible moves (9).
f(t+1) = min {f(y(t)(x(t))) + p(x(t),y(t),t) | y(t)∈Y(x(t))}                                            (14)
Penalty function p(x(t),y(t),t) is a control mechanism which often gives an impairing move y(t) (≠ y_best(t)) a better chance of being selected when the best move y_best(t) may cause a cycle or drive the search process into a recently explored region and the impairing move does not (15a, 15b).
f(y(t)(x(t))) + p(x(t),y(t),t) ≤ f(y_best(t)(x(t))) + p(x(t),y_best(t),t)  (15a)
f(y_best(t)(x(t))) < f(y(t)(x(t)))  (15b)

A non-negative penalty function p(x(t),y(t),t) (16) consists of move penalty functions m_j(y(t),t), each of which is related to a given element e_j and its occurrence in a move y(t) in an iteration t, and solution penalty functions s_k(y(t)(x(t))), each of which is related to the distance ||y(t)(x(t)) - x(t_k)|| between the possible next solution y(t)(x(t)) and a center solution x(t_k), usually a local optimum, found in an iteration t_k (≤ t). Due to the effectiveness requirement the number of solution penalty functions k_max (= 1, 2, ...) should be defined proportional to the size of the problem.

p(x(t),y(t),t) = ∑ m_j(y(t),t) + ∑ s_k(x(t+1))    where j = 1, ..., 2*n and k = 1, ..., k_max  (16)
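The selection rule (14) with the combined penalty (16) might be sketched as follows, with `move_penalty` and `solution_penalties` standing in for the m_j and s_k functions detailed in sections 5.3 and 5.4 (all names illustrative):

```python
def select_move(f, x, moves, move_penalty, solution_penalties, t):
    """Penalized move selection (14): choose the move y minimizing
    f(y(x)) + p(x, y, t), where the penalty (16) is the sum of the move
    penalties and the solution penalties evaluated at the next solution."""
    def total(y):
        x_next = y(x)
        return (f(x_next)
                + move_penalty(y, t)
                + sum(s(x_next) for s in solution_penalties))
    return min(moves(x), key=total)
```

With all penalties zero this reduces to the greedy rule (3) of plain neighborhood search.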
5.3. Move penalties
The purpose of move penalty functions is to make a move less attractive for a given period when the move is an inverse transformation of a recently selected move or contains its elements; analogously, tabus in tabu search prohibit the search process from selecting such a move for a given period. In this way the search process attempts to prevent itself from becoming trapped in a local optimum.
In each iteration t_j a move penalty function m_j(y(t),t) is determined for each element e_j whose inverse element e_j^-1 is contained in the just selected move y(t_j). The value of this function is usually positive for a determined period, a number of iterations d_j, and it can also be zero temporarily if an aspiration criterion is fulfilled. Otherwise the value of the move penalty function is always zero. In the simplest form a move penalty function is constant (17) or linearly decreasing (18), requiring two move penalty parameters a_j (≥ 0) and d_j (≥ 1).
m_j(y(t),t) = a_j    if t_j < t ≤ t_j + d_j and move y(t) contains element e_j  (17)
m_j(y(t),t) = a_j * (d_j + t_j + 1 - t) / d_j    if t_j < t ≤ t_j + d_j and move y(t) contains element e_j  (18)
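The linearly decreasing form (18), summed over the elements of a move, can be sketched as follows with an array-style memory; `last_used`, `a` and `d` are illustrative names for the stored t_j, a_j and d_j:

```python
def move_penalty_linear(move_elems, t, last_used, a, d):
    """Sum of linearly decreasing move penalties (18) over a move's
    elements: m_j = a_j * (d_j + t_j + 1 - t) / d_j while t_j < t <= t_j + d_j,
    and zero outside that period."""
    total = 0.0
    for j in move_elems:
        t_j = last_used.get(j)
        if t_j is not None and t_j < t <= t_j + d[j]:
            total += a[j] * (d[j] + t_j + 1 - t) / d[j]
    return total
```

Immediately after iteration t_j the penalty is the full a_j; it decays to a_j/d_j in the last iteration of the period and then vanishes.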
If the move penalty functions m_j(y(t),t) are "properly" determined for all elements e_j and the set of moves Y(x(t+1)) contains at least two moves in the next iteration t+1, there will exist some move y(t+1) (≠ y(t)^-1) which is more attractive than the full inverse move y(t)^-1 of the just selected move (19).
f(y(t+1)(x(t+1))) + ∑ m_j(y(t+1),t+1) < f(y(t)^-1(x(t+1))) + ∑ m_j(y(t)^-1,t+1)    where the sums run over all e_j in y(t+1) and in y(t)^-1, respectively  (19)
This shows that move penalties are able to prevent the search process from selecting the full inverse move y(t)^-1 in the next iteration t+1. If the values of the move penalty functions are very large, move penalties have quite the same influence on the search process as tabus which prohibit selecting any move containing a prohibited element. However, successful long-cycle avoidance requires readjustment of the move penalty parameters, which prevents partial inverse moves from being selected one after another.
5.4. Solution penalties
The purpose of solution penalty functions is to prevent the search process from generating a trajectory which drives it into an explored region containing a recently found local optimum. Solution penalties may also help the search process to escape from the just found local optimum.
When the search process finds a solution x(t) which is locally optimal in an iteration t_k, a solution penalty function s_k(x(t+1)) is determined with the current solution as its center (x(t_k) = x(t)). This function replaces the oldest solution penalty function if all functions are in use. The value of a solution penalty function is usually positive within a given distance c_k from its center solution, and it can also be zero temporarily if an aspiration criterion is fulfilled. Otherwise the value of this function is always zero. In the simplest form a solution penalty function depends linearly (20) on the distance ||x(t+1) - x(t_k)||, which can be calculated using p-metrics (21), requiring two solution penalty parameters b_k (≥ 0) and c_k (> 0).
s_k(x(t+1)) = b_k * (c_k - ||x(t+1) - x(t_k)||)    if ||x(t+1) - x(t_k)|| < c_k  (20)
||x(t+1) - x(t_k)||_p = (∑ |x_i(t+1) - x_i(t_k)|^p)^(1/p)    summed over i = 1, ..., n, where p = 1, 2 or ∞  (21)
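Equations (20) and (21) translate directly into a small function; solutions are represented here as tuples of variable values (an illustrative choice):

```python
def solution_penalty(x_next, center, b, c, p=1):
    """Solution penalty (20) using the p-metric distance (21):
    s_k = b_k * (c_k - ||x_next - center||_p) when the distance is below
    c_k, and zero otherwise. p = float('inf') gives the max metric."""
    if p == float("inf"):
        dist = max(abs(a - z) for a, z in zip(x_next, center))
    else:
        dist = sum(abs(a - z) ** p for a, z in zip(x_next, center)) ** (1 / p)
    return b * (c - dist) if dist < c else 0.0
```

The penalty is largest at the center solution (b_k * c_k) and falls off linearly to zero at distance c_k.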
A solution penalty function s_k(x(t+1)) works within a given distance c_k so that its value increases if a move y_inc(t) directing towards the center solution x(t_k) of the penalty function is selected. This can prevent the selection of a move which drives the search process into an explored region containing the already found solution x(t_k). In the opposite situation, when the search process has just found a locally optimal solution, which is also a center solution x(t_k), a move y_dec(t) directing to a solution with a lower value of the solution penalty function s_k(x(t+1)) has a better chance of being selected (22a, 22b).
f(y_dec(t)(x(t))) + s_k(y_dec(t)(x(t))) < f(y_inc(t)(x(t))) + s_k(y_inc(t)(x(t)))  (22a)
where ||y_inc(t)(x(t)) - x(t_k)|| < ||y_dec(t)(x(t)) - x(t_k)|| < c_k  (22b)

5.5. Determination of penalty parameters and reactiveness
Fixed values are the easiest way to implement the determination of penalty parameters, but finding suitable values which are robust to changes in the structure of a problem is time-consuming. In addition, fixed penalty parameters can cause the same kind of problems as fixed prohibition parameters in tabu search. One way to make this problem less likely is to use random values; if there were information about the behavior of the search process, a probability density function could be constructed for each search parameter so that the parameter would be randomly determined using its cumulative distribution function.
The idea of reactive search combined with penalty search, i.e. reactive penalty search, provides a promising way to determine penalty parameters. Penalty parameters can then be determined and readjusted, when necessary, by adapting probabilistic methods to the stored information. This reactiveness has considerable advantages which make the search process adaptive to its current state.
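One possible reactive rule, sketched in the spirit of reactive tabu search and not prescribed by this paper, grows a penalty period when a solution repeats (a symptom of cycling) and lets it decay slowly otherwise; all constants are illustrative:

```python
def reactive_adjust(d, solution, seen, grow=1.2, shrink=0.95, d_min=1):
    """Reactively readjust a penalty period d from the search history:
    a repeated solution suggests a cycle, so d is increased; otherwise
    d relaxes gradually towards d_min."""
    key = tuple(solution)
    if key in seen:
        d = d * grow                    # repetition detected: penalize longer
    else:
        d = max(d_min, d * shrink)      # no repetition: relax gradually
    seen.add(key)
    return d
```

The set `seen` plays the role of the reactive memory of visited solutions; in practice it would be compressed or aged as discussed in section 4.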
