**Location:** Faculty Hall

**Indian Institute of Science**

**Bangalore**

**cordially invites you to the**

**INSTITUTE COLLOQUIUM**

**(Division of Electrical Sciences)**

**by**

**Professor Shalabh Bhatnagar**

**Department of Computer Science & Automation**

**Title: “Optimization via Simulation”**

**Date: Monday, 21st November 2016**

**Prof Anurag Kumar, Director**

**will preside**

**—————————————————————————————**

Abstract:

Optimization methods play an important role in many disciplines, such as signal processing, communication networks, neural networks, manufacturing systems, economics, and operations research. A standard way to model an optimization problem is through an objective or cost function whose optimum constitutes the desired solution. A stochastic optimization problem usually has the feature that its objective function is the expectation of a noisy cost function. Model-free algorithms work under the premise that the distribution of the noise is not known; however, samples of the noisy cost objective can be observed or simulated. This is the setting on which our work focuses.
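As a small, purely illustrative sketch of this setting (the quadratic cost and Gaussian noise below are stand-ins, not from the talk): the true objective is an expectation that is never available in closed form, but individual noisy samples can be simulated and averaged.

```python
import random

def noisy_cost(theta, rng):
    """One simulated sample of the cost: the (unknown) true objective
    plus zero-mean noise. The quadratic form is only a stand-in."""
    true_cost = (theta - 2.0) ** 2
    return true_cost + rng.gauss(0.0, 0.5)

def estimate_objective(theta, n_samples, rng):
    """Sample-average (Monte Carlo) estimate of E[noisy_cost(theta)]."""
    return sum(noisy_cost(theta, rng) for _ in range(n_samples)) / n_samples

rng = random.Random(0)
est = estimate_objective(2.0, 10_000, rng)  # true objective value here is 0.0
```

A model-free algorithm only ever sees outputs of `noisy_cost`, never `true_cost` or the noise distribution.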

An important strand of our work at IISc concerns the development of random search techniques for stochastic optimization. For high-dimensional parameters, random search techniques turn out to be extremely useful because they speed up the search for the optimum. The objectives we consider are general nonlinear functions with some smoothness properties. I shall discuss in some detail our work on random perturbation algorithms, where the perturbation directions are chosen according to certain distributions. The speed-up results from the fact that such algorithms require only one or two simulations of the system per parameter update. We discuss our contributions to both gradient-based and Newton-based algorithms. We will also briefly discuss the adaptation of these algorithms to problems of optimization under functional inequality constraints, as well as to problems where the parameter set is discrete. For the latter setting, we proposed a random projection technique that helps adapt the continuous optimization algorithms to the discrete setting. We will also briefly touch upon our recent work on algorithms for global optimization based on the cross-entropy method.
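A minimal sketch of one such random perturbation scheme, in the spirit of simultaneous perturbation stochastic approximation (SPSA): all parameter components are perturbed at once along a random Rademacher direction, so each gradient estimate costs two simulations regardless of dimension. The step-size constants and the quadratic test cost are illustrative choices, not the talk's.

```python
import random

def noisy_quadratic(theta, rng):
    """Noisy simulation of a cost whose underlying objective is minimized
    at theta = (1, 1). Purely an illustrative stand-in."""
    return sum((t - 1.0) ** 2 for t in theta) + rng.gauss(0.0, 0.01)

def spsa_minimize(cost, theta0, n_iters, rng, a=2.0, big_a=10.0, c=0.1):
    """Simultaneous-perturbation gradient descent: every component is
    perturbed together along a random Rademacher direction, so each
    update needs only two cost simulations, whatever the dimension."""
    theta = list(theta0)
    d = len(theta)
    for k in range(1, n_iters + 1):
        a_k = a / (k + big_a)      # step size, diminishing
        c_k = c / k ** 0.25        # perturbation size, diminishing more slowly
        delta = [rng.choice((-1.0, 1.0)) for _ in range(d)]
        y_plus = cost([t + c_k * dl for t, dl in zip(theta, delta)], rng)
        y_minus = cost([t - c_k * dl for t, dl in zip(theta, delta)], rng)
        for i in range(d):
            # Same two measurements yield every gradient component.
            theta[i] -= a_k * (y_plus - y_minus) / (2.0 * c_k * delta[i])
    return theta

rng = random.Random(1)
theta_hat = spsa_minimize(noisy_quadratic, [5.0, -3.0], 4000, rng)
```

Contrast with coordinate-wise finite differences, which would need 2d simulations per update; this two-simulation property is the source of the speed-up in high dimensions.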

It turns out that stochastic search algorithms of the type we study belong, in general, to the broad class of stochastic approximation algorithms. Many analyses of stochastic approximation assume that the stochastic iterates are synchronous (i.e., all parameter components are updated simultaneously with a global clock) and stable (i.e., uniformly bounded with probability one), that the noise is an 'additive' martingale difference sequence, and they largely cater to point-to-point costs or objectives that are normally assumed Lipschitz continuous. We build on work by Borkar and Meyn to show the stability and convergence of stochastic approximation under the following settings: (a) asynchronous, distributed updates; (b) recursions involving multiple timescales; (c) cost functions that are point-to-set maps; and (d) algorithms with a Markov noise component in addition to martingale difference sequences. Algorithms with Markov noise are useful, for instance, in the context of reinforcement learning, while those based on point-to-set costs are useful for the analysis of stochastic search algorithms with biases resulting from a sensitivity parameter, as well as for Markov decision processes under partial state observations.
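A toy illustration of setting (b), recursions on multiple timescales: the fast iterate, run with the larger step size, tracks its equilibrium for the current value of the slow iterate, which in turn moves quasi-statically. The specific dynamics, noise levels, and step-size schedules below are hypothetical.

```python
import random

def two_timescale(n_iters, rng):
    """Coupled stochastic approximation recursions on two step-size
    schedules: the fast iterate w chases theta (the fixed point of its
    recursion), while the slow iterate theta sees w as quasi-static.
    Both converge to 1.0 here. Toy dynamics only."""
    theta, w = 5.0, 0.0
    for k in range(1, n_iters + 1):
        b_k = 1.0 / k ** 0.6   # fast step size
        a_k = 1.0 / k          # slow step size: a_k / b_k -> 0
        # Fast timescale: noisy fixed-point iteration toward theta.
        w += b_k * (theta - w + rng.gauss(0.0, 0.1))
        # Slow timescale: theta drifts toward 1 using the tracked w.
        theta += a_k * (1.0 - w + rng.gauss(0.0, 0.1))
    return theta, w

rng = random.Random(2)
theta, w = two_timescale(100_000, rng)
```

The key condition is that the ratio of step sizes tends to zero, so the fast recursion equilibrates before the slow one moves appreciably.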

Reinforcement learning algorithms are a broad class of data-driven algorithms for dynamic decision making under uncertainty. An important goal here is to find an optimal policy, i.e., a sequence of actions given the states of the environment, which are revealed to the decision maker one at a time. An important quantity of interest is the value function, i.e., the optimal cost, which is often a function of the system state. Since most real-life systems are 'large-scale', one usually considers instead parametrized value functions that depend on certain state features, and finds policies that optimize the parametrized objectives. In this context, we discuss a data-driven feature adaptation algorithm that updates the features on the Grassmann manifold and converges, for any given policy update, to the globally optimal feature. Thus, the parametrized class of functions is itself tuned so that the true value function sits within the optimal class, to which the algorithm converges.
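The feature-adaptation algorithm itself is beyond a short sketch, but the object it builds on, a linearly parametrized value function evaluated by temporal-difference learning for a fixed policy, can be illustrated. The two-state chain, one-hot features, and step sizes below are hypothetical choices for which the parametrized class happens to contain the true value function.

```python
import random

def td0_linear(n_steps, rng, gamma=0.9):
    """TD(0) with a linear value-function approximation on a two-state
    chain: 0 -> 1 (reward 0), 1 -> 0 (noisy reward, mean 1). One-hot
    features make the parametrized class exact, so the weights converge
    to the true values V(0) = gamma/(1-gamma^2), V(1) = 1/(1-gamma^2)."""
    w = [0.0, 0.0]   # one weight per (one-hot) feature, i.e., per state
    s = 0
    for k in range(1, n_steps + 1):
        s_next = 1 - s
        r = rng.gauss(1.0, 0.1) if s == 1 else 0.0
        alpha = 1.0 / k ** 0.7                 # diminishing step size
        delta = r + gamma * w[s_next] - w[s]   # TD error
        w[s] += alpha * delta                  # update the active weight
        s = s_next
    return w

rng = random.Random(3)
w = td0_linear(100_000, rng)
```

When the features are poorly chosen, no weight vector can represent the true value function well; that approximation gap is what the feature-adaptation work addresses by tuning the feature subspace itself.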

Finally, we will discuss an engineering application in the context of vehicular traffic control. The problem is to tune the red and green signal durations at the junctions of a road network, with the goal of minimizing the level of congestion on the roads. We assume that some coarse information on the congestion levels of the various lanes is available, and devise an algorithm that combines the discrete parameter optimization scheme described above with a reinforcement learning algorithm. Numerical results indicate significant performance improvements.
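A toy, hypothetical version of this setup (not the talk's algorithm, which combines discrete signal-duration optimization with reinforcement learning) can illustrate the role of coarse congestion information: the learner observes only whether each queue exceeds a threshold, and picks which approach gets green.

```python
import random

def run_junction(n_steps, rng, epsilon=0.1, alpha=0.1, gamma=0.95):
    """Toy single junction with two approaches. State: coarse congestion
    flags (is each queue above a threshold?); action: which approach gets
    green; cost: total queue length. Tabular Q-learning on the coarse
    state. All rates and thresholds are illustrative."""
    q = {}                       # (state, action) -> estimated cost-to-go
    queues = [0, 0]

    def coarse(queues):
        return tuple(int(n > 3) for n in queues)

    s = coarse(queues)
    for _ in range(n_steps):
        # Epsilon-greedy choice of which approach gets the green signal.
        if rng.random() < epsilon:
            a = rng.randrange(2)
        else:
            a = min((0, 1), key=lambda act: q.get((s, act), 0.0))
        # Random arrivals; the green approach serves up to 2 vehicles.
        queues[0] += int(rng.random() < 0.4)
        queues[1] += int(rng.random() < 0.2)
        queues[a] = max(0, queues[a] - 2)
        cost = float(sum(queues))
        s_next = coarse(queues)
        best_next = min(q.get((s_next, act), 0.0) for act in (0, 1))
        q[(s, a)] = q.get((s, a), 0.0) + alpha * (
            cost + gamma * best_next - q.get((s, a), 0.0))
        s = s_next
    return q, queues

rng = random.Random(4)
q, queues = run_junction(20_000, rng)
```

Because the state is only a coarse summary, the table stays tiny even though the underlying queue lengths are unbounded; this is the kind of aggregation that makes learning tractable on real road networks.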

———————————————————————————————————————

**Tea: 5:00 p.m.**

**ALL ARE WELCOME**