In this chapter we return to the control problem (policy improvement) and estimate action value function $\hat{q}(s, a, \boldsymbol{w}) \approx q_\star(s,a)$, and still restrict our attention to the ...
Abstract: This paper presents a short (and not exhaustive) introduction to the most used exact, approximation, and metaheuristic algorithms for solving hard combinatorial optimization problems. After ...