Institute Colloquium by Prof. Shalabh Bhatnagar

Location: Faculty Hall

Shalab Bhatnagar

Indian Institute of Science

Cordially Invites you to the                    


(Divn. Of Electrical  Sciences)
Professor Shalabh Bhatnagar
Department of Computer Science & Automation

Title : “Optimization via Simulation”
Date : Monday, 21st November 2016
Venue: Faculty Hall, Main Building
Time : 4-00 p.m.

Prof Anurag Kumar, Director
will preside



Optimization methods play an important role in many disciplines such as signal processing, communication networks, neural networks, manufacturing systems, economics, operations research etc. A standard way to model optimization problems is through an objective or a cost function whose optimum constitutes the desired solution. A stochastic optimization problem usually has the feature that its objective function is an expectation of a certain noisy cost function. Model free algorithms work under the premise that the distribution of noise is not known, however, samples of the noisy cost objective can be observed or simulated. This is the setting that our work focuses on.

An important strand of our work at IISc has to do with the development of random search optimization techniques for stochastic optimization. In the face of high-dimensional parameters, random search techniques turn out to be extremely useful because they speed up the search process (for the optimum). The objectives we consider are general non-linear functions with some smoothness properties. I shall discuss in some detail here our work on the random perturbation algorithms where the perturbation directions are chosen according to some  distributions. The speed-up results from the fact that using such  algorithms, only one or two simulations of the system are required for parameter updates. We discuss our contributions to  both gradient and Newton based algorithms. We will also briefly discuss our contributions towards the adaptation of the aforementioned algorithms  to problems of optimization under functional inequality constraints as well as those for which the parameter set is discrete. For the latter setting, we proposed a random projection technique   that helps in the adaptation of the continuous optimization algorithms  to the discrete setting. We will also briefly touch upon our recent  work  on the development of algorithms for global optimization —  that are  based on the cross entropy method.

It turns out that stochastic search algorithms of the type we study belong in general to the broad class of stochastic approximation algorithms. Many analyses of stochastic approximation assume that the stochastic iterates are synchronous (i.e, all parameter components updated simultaneously with a global clock), are stable  (i.e., uniformly bounded with probability one), the noise is an `additive’ martingale difference sequence, and they largely cater to  the case of point-to-point costs or objectives that are normally assumed Lipschitz continuous. We build on work by Borkar and Meyn  to show the stability and convergence of stochastic approximation  under the following settings: (a) asynchronous, distributed updates,  (b) recursions involving multiple time-scales, (c) the cost functions  being point-to-set maps, and (d) algorithms with a Markov noise component in addition to martingale difference sequences. Algorithms with Markov noise are for instance useful in the context of reinforcement learning while those based on point-to-set costs are useful for the analysis of stochastic search algorithms with biases resulting from a sensitivity parameter as well as for Markov decision processes under partial state observations.

Reinforcement learning algorithms are a broad class of data-driven algorithms for dynamic decision making under uncertainty. An important goal here is to find an optimal policy or a sequence of actions given the states of the environment — revealed to the decision maker one at a time. An important quantity of interest here is the value function, i.e., the optimal cost — that is often a function of the system state. Since most real-life systems are `large-scale’, one usually considers instead parametrized value functions that depend on certain state features and find policies that optimize the parametrized objectives. In this context we discuss a data-driven feature adaptation algorithm that updates the features in the Grassmann manifold and converges,

for any given policy update, to the globally optimal feature. Thus, the parametrized class of functions itself is tuned in a way that the true value function sits within the optimal class and to which the algorithm converges.

Finally, we will discuss an engineering application in the context of vehicular traffic control. The problem is to tune the red and green signal durations at the junctions on a road network with the goal of minimizing the level of congestion encountered on the roads. We assume that some coarse information on the congestion levels on the various lanes is available and device an algorithm that combines the discrete parameter optimization scheme described before with a reinforcement learning algorithm. The results indicate significant performance improvements numerically.


Tea : 5-00 p.m.