Analysis of Thompson sampling for the multi-armed bandit problem
The multi-armed bandit problem is a popular model for studying exploration/exploitation
trade-off in sequential decision problems. Many algorithms are now available for this well-…
Thompson sampling for contextual bandits with linear payoffs
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a
randomized algorithm based on Bayesian ideas, and has recently generated significant …
Bandits with concave rewards and convex knapsacks
S Agrawal, NR Devanur - Proceedings of the fifteenth ACM conference …, 2014 - dl.acm.org
In this paper, we consider a very general model for exploration-exploitation tradeoff which
allows arbitrary concave rewards and convex constraints on the decisions across time, in …
A framework for high-accuracy privacy-preserving mining
S Agrawal, JR Haritsa - 21st International Conference on Data …, 2005 - ieeexplore.ieee.org
To preserve client privacy in the data mining process, a variety of techniques based on
random perturbation of individual data records have been proposed recently. In this paper, we …
Near-optimal regret bounds for Thompson sampling
Thompson Sampling (TS) is one of the oldest heuristics for multiarmed bandit problems. It is
a randomized algorithm based on Bayesian ideas and has recently generated significant …
piRNABank: a web resource on classified and clustered Piwi-interacting RNAs
S Sai Lakshmi, S Agrawal - Nucleic acids research, 2008 - academic.oup.com
Piwi-interacting RNAs (piRNAs) are expressed in mammalian germline cells and have been
identified as key players in germline development. These molecules, typically of length 25–…
A dynamic near-optimal algorithm for online linear programming
A natural optimization model that formulates many online resource allocation problems is the
online linear programming (LP) problem in which the constraint matrix is revealed column …
Dyslipidaemia in nephrotic syndrome: mechanisms and treatment
S Agrawal, JJ Zaritsky, A Fornoni… - Nature Reviews …, 2018 - nature.com
Nephrotic syndrome is a highly prevalent disease that is associated with high morbidity
despite notable advances in its treatment. Many of the complications of nephrotic syndrome, …
Optimistic posterior sampling for reinforcement learning: worst-case regret bounds
S Agrawal, R Jia - Advances in Neural Information …, 2017 - proceedings.neurips.cc
We present an algorithm based on posterior sampling (aka Thompson sampling) that achieves
near-optimal worst-case regret bounds when the underlying Markov Decision Process (…
Reinforcement learning for integer programming: Learning to cut
Integer programming is a general optimization framework with a wide variety of applications,
e.g., in scheduling, production planning, and graph optimization. As Integer Programs (IPs) …