The 15th European Workshop on Reinforcement Learning (EWRL 2022)
Dates: 19-21 September 2022
Location: **Aula De Carli – Politecnico di Milano – Campus Bovisa**, Building B9
Via Durando, 10 – 20158 – Milano (MI) – Italy
The campus has many entrances; we suggest using the entrance at Via Durando 10, which is the easiest way to reach the venue.
Schedule
Monday – 19/09/2022
8:30 – 9:30 Check-in
9:30 – 10:30 Tutorial 1 (part 1) Matteo Pirotta: “Exploration in Reinforcement Learning”
10:30 – 11:00 coffee break
11:00 – 12:00 Tutorial 1 (part 2)
12:00 – 13:00 Sponsor Talks 1
13:00 – 14:30 Lunch break
14:30 – 15:30 Tutorial 2 (part 1) Matthieu Geist: “Regularization in Reinforcement Learning”
15:30 – 16:00 coffee break
16:00 – 17:00 Tutorial 2 (part 2)
17:00 – 18:00 Sponsor Talks 2
18:00 – 20:00 Welcome reception
Tuesday – 20/09/2022
8:00 – 9:00 Check-in
8:45 – 9:00 Opening remarks
9:00 – 9:40 Invited talk 1 Sarah Perrin: “Scaling up MARL with MFGs and vice versa!”
9:40 – 10:00 Contributed talk 1 (Scalable Deep Reinforcement Learning Algorithms for Mean Field Games)
10:00 – 11:00 Poster session 1 (with Coffee break)
11:00 – 11:40 Invited talk 2 Niao He: “Complexities of Actor-critic Methods for Regularized MDPs and POMDPs”
11:40 – 12:00 Contributed talk 2 (IQ-Learn: Inverse soft-Q Learning for Imitation)
12:00 – 12:20 Contributed talk 3 (Newton-based Policy Search for Networked Multi-agent Reinforcement Learning)
12:20 – 14:00 Lunch break
14:00 – 14:40 Invited talk 3 Ann Nowé: “Beyond the optimal action in Reinforcement Learning”
14:40 – 15:00 Contributed talk 4 (Group Fairness in Reinforcement Learning)
15:00 – 15:20 Contributed talk 5 (Direct Advantage Estimation)
15:20 – 16:00 Invited talk 4 Jan Peters: “Robot RL: Lessons from the Physical World”
16:00 – 18:00 Poster session 2 (with Coffee break)
20:00 Social Dinner
Wednesday – 21/09/2022
8:00 – 9:00 Check-in
9:00 – 9:40 Invited talk 1 Alessandro Lazaric: “Understanding (unsupervised) exploration in goal-based Reinforcement Learning”
9:40 – 10:00 Contributed talk 1 (Optimistic PAC Reinforcement Learning: the Instance-Dependent View)
10:00 – 11:00 Poster session 1 (with Coffee break)
11:00 – 11:40 Invited talk 2 Ciara Pike-Burke: “Multi-armed bandits with history dependent rewards”
11:40 – 12:00 Contributed talk 2 (A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits)
12:00 – 12:20 Contributed talk 3 (Dynamic Pricing with Online Data Aggregation and Learning)
12:20 – 14:00 Lunch break
14:00 – 14:40 Invited talk 3 Gergely Neu: “Primal-Dual Methods for Reinforcement Learning”
14:40 – 15:00 Contributed talk 4 (Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality)
15:00 – 15:20 Contributed talk 5 (Local Feature Swapping for Generalization in Reinforcement Learning)
15:20 – 16:00 Invited talk 4 Richard Sutton: “An Architecture for Intelligence”
16:00 – 18:00 Poster session 2 (with Coffee break)
Poster Session Assignment
Each poster is assigned a day (either September 20 or September 21) and will be presented in both the morning and afternoon poster sessions of that day.
Poster Session 20 September
- Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
- Curriculum Reinforcement Learning via Constrained Optimal Transport
- Multi-Objective Coordination Graphs for the Expected Scalarised Returns with Generative Flow Models
- Rate-Optimal Online Convex Optimization in Adaptive Linear Control
- Mixture of Interpretable Experts for Continuous Control
- Adaptive Belief Discretization for POMDP Planning
- IQ-Learn: Inverse soft-Q Learning for Imitation
- On Bayesian Value Function Distributions
- Minimax-Bayes Reinforcement Learning
- Formulation and validation of a complete car-following model based on deep reinforcement learning
- General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States
- A Deep Reinforcement Learning Approach to Supply Chain Inventory Management
- On learning history-based policies for controlling Markov Decision Processes
- Belief states of POMDPs and internal states of recurrent RL agents: an empirical analysis of their mutual information
- Get Back Here: Robust Imitation by Return-to-Distribution Planning
- Semi-Counterfactual Risk Minimization Via Neural Networks
- Dynamic Pricing with Online Data Aggregation and Learning
- Newton-based Policy Search for Networked Multi-agent Reinforcement Learning
- A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management
- Group Fairness in Reinforcement Learning
- Cross-Entropy Soft-Risk Reinforcement Learning
- Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets
- $Q$-Learning for $L_p$ Robust Markov Decision Processes
- Learning Efficiently Function Approximation for Contextual MDP
- Risk-aware linear bandits with convex loss
- Local Advantage Networks for Multi-Agent Reinforcement Learning in Dec-POMDPs
- Bilinear Exponential Family of MDPs: Frequentist Regret Bound with Tractable Exploration \& Planning
- RLDesigner: Toward Framing Spatial Layout Planning as a Markov Decision Process
- Optimistic Risk-Aware Model-based Reinforcement Learning
- Quantification of Transfer in Reinforcement Learning via Regret Bounds for Learning Agents
- Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies
- Scalable Deep Reinforcement Learning Algorithms for Mean Field Games
- Cooperative Online Learning in Stochastic and Adversarial MDPs
- Interactive Inverse Reinforcement Learning
- A Unifying Framework for Reinforcement Learning and Planning
- Neural Distillation as a State Representation Bottleneck in Reinforcement Learning
Poster Session 21 September
- When Privacy Meets Partial Information: A Refined Analysis of Differentially Private Bandits
- Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act
- Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs
- Optimistic PAC Reinforcement Learning: the Instance-Dependent View
- Active Exploration for Inverse Reinforcement Learning
- In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications
- Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation
- A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs
- Boosting reinforcement learning with sparse and rare rewards using Fleming-Viot particle systems
- Look where you look! Saliency-guided Q-networks for visual RL tasks
- Local Feature Swapping for Generalization in Reinforcement Learning
- On Convergence of Neural asynchronous Q-iteration
- On Reward Binarisation and Bayesian Agents
- Goal-Conditioned Generators of Deep Policies
- Tabular and Deep Learning of Whittle Index
- Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularization
- A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits
- Direct Advantage Estimation
- Offline Credit Assignment in Deep Reinforcement Learning with Hindsight Discriminator Networks
- Continuous Control with Action Quantization from Demonstrations
- SFP: State-free Priors for Exploration in Off-Policy Reinforcement Learning
- Learning Generative Models with Goal-conditioned Reinforcement Learning
- Analyzing Thompson Sampling for Contextual Bandits via the Lifted Information Ratio
- A Sparse Linear Program for Global Planning in Large MDPs
- Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP
- Sample-Efficient Reinforcement Learning of Partially Observable Markov Games
- Entropy Regularized Reinforcement Learning with Cascading Networks
- Regret Bounds for Satisficing in Multi-Armed Bandit Problems
- A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback
- Analysis of Stochastic Processes through Replay Buffers
- Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
- Reinforcement Learning with a Terminator
- Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality
- Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses
- Deep Coherent Exploration for Continuous Control
- First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation
Registration
Registrations for the 15th European Workshop on Reinforcement Learning are now open! Registration includes participation in the main event activities, lunch on all days of the event, and the social dinner on September 20th. The early bird registration period has been extended from July 31st to August 5th. Thanks to the generosity of our sponsors, we are able to offer students a limited number of participation grants in the form of fee waivers. Grants will be awarded based on merit and D&I considerations. If you come from an underrepresented group or have financial needs, please consider applying for a grant. The grant application deadline is July 21st, and notifications will be sent by July 28th, so that students who do not receive a grant can still complete the payment for the early bird registration.
Description
The 15th European Workshop on Reinforcement Learning (EWRL 2022) invites reinforcement learning researchers to participate in the revival of this world-class event. We plan to make this an exciting event for researchers worldwide, not only for the presentation of top-quality papers, but also as a forum for ample discussion of open problems and future research directions.
Reinforcement learning is an active field of research that deals with sequential decision making in unknown, often stochastic and/or partially observable, environments. Recently there has been a wealth of impressive empirical results as well as significant theoretical advances. Both types of advances are of great importance, and we would like to create a forum to discuss them.
The workshop will cover a range of sub-topics including (but not limited to):
- MDPs and Dynamic Programming
- Temporal Difference Methods
- Policy Optimization
- Model-based RL and Planning
- Exploration in RL
- Offline RL
- Unsupervised and Intrinsically Motivated RL
- Representation Learning in RL
- Lifelong and Non-stationary RL
- Hierarchical RL
- Partially observable RL
- Multi-agent RL
- Multi-objective RL
- Transfer and Meta RL
- Deep RL
- Imitation Learning and Inverse RL
- Risk-sensitive and robust RL
- Theoretical aspects of RL
- Applications and Real-life RL
Paper Submission
We invite submissions for the 15th European Workshop on Reinforcement Learning (EWRL 2022) from the entire reinforcement learning spectrum. Papers may present new work or summarize recent work by the author(s). There will be no proceedings of EWRL15; as such, papers that are intended for or have been submitted to other conferences or journals are also welcome. Submitted papers will be reviewed by the program committee in a double-blind procedure.
Submissions should follow the JMLR format adapted for EWRL, linked below. There is a limit of 9 pages, excluding acknowledgments, references, and appendix. Authors of accepted papers will be allowed an additional page for the camera-ready version. All accepted papers will be considered for the poster sessions; outstanding papers will also be considered for a 20-minute oral presentation.
Please send your inquiries by email to the organizers at ewrl2022@gmail.com.
- Submission deadline: 8 June 2022, 11:59pm AoE (extended from 1 June 2022)
- Page limit: 9 pages excluding acknowledgments, references, and appendix
- Paper format: EWRL 2022 Author Kit
- Paper Submissions: CMT
Important Dates
- Paper submissions due: 8 June 2022, 11:59pm AoE (extended from 1 June 2022)
- Early Registration begins: 1 July 2022
- Participation grant application begins: 1 July 2022
- Paper notification: 14 July 2022
- Participation grant application ends: 21 July 2022
- Participation grant notification: 28 July 2022
- Early registration ends: 5 August 2022 (extended from 31 July 2022)
- Camera ready due: 1 September 2022
- Workshop begins: 19 September 2022
- Workshop ends: 21 September 2022
Confirmed Invited Speakers
- Sarah Perrin (Inria Lille)
- Topic: Scaling up MARL with MFGs and vice versa!
- Niao He (ETH Zurich)
- Topic: Complexities of Actor-critic Methods for Regularized MDPs and POMDPs
- Alessandro Lazaric (Facebook AI Research)
- Topic: Understanding (unsupervised) exploration in goal-based RL
- Gergely Neu (Universitat Pompeu Fabra)
- Topic: Primal-Dual Methods for Reinforcement Learning
- Ann Nowé (Vrije Universiteit Brussel)
- Topic: Beyond the optimal action in RL
- Jan Peters (Technische Universität Darmstadt)
- Topic: Robot RL: Lessons from the Physical World
- Ciara Pike-Burke (Imperial College London)
- Topic: Multi-armed bandits with history dependent rewards
- Richard Sutton (University of Alberta – DeepMind)
- Topic: An Architecture for Intelligence
Confirmed Tutorial Sessions
- Matthieu Geist (Google Research)
- Topic: Regularization in Reinforcement Learning
- Matteo Pirotta (Facebook AI Research)
- Topic: Exploration in Reinforcement Learning
Organizing Committee
General Chair
- Marcello Restelli (Politecnico di Milano – Milan, Italy)
Organizing Chair
- Francesco Trovò (Politecnico di Milano – Milan, Italy)
Program Chair
- Alberto Maria Metelli (Politecnico di Milano – Milan, Italy)
Program Co-Chairs
- Mirco Mutti (Università di Bologna – Bologna, Italy; Politecnico di Milano – Milan, Italy)
- Pierre Liotet (Politecnico di Milano – Milan, Italy)
Diversity and Inclusion Chairs
- Giorgia Ramponi (ETH AI Center)
- Riccardo Zamboni (Politecnico di Milano – Milan, Italy)
Workflow Chairs
- Lorenzo Bisi (Politecnico di Milano – Milan, Italy)
- Luca Sabbioni (Politecnico di Milano – Milan, Italy)
Communication Chairs
- Amarildo Likmeta (Università di Bologna – Bologna, Italy; Politecnico di Milano – Milan, Italy)
- Marco Mussi (Politecnico di Milano – Milan, Italy)
Sponsorship
EWRL 2022 invites companies and research institutions involved in fundamental research on, or applications of, reinforcement learning to become official sponsors of the event. EWRL 2022 offers a single level of sponsorship, at a cost of 5000€, with the following benefits:
- Logo display on the official EWRL 2022 website
- Logo display on the Welcome Kit distributed during the event
- A poster session slot for presenting your research or applications
- Access to the EWRL recruitment database
- Two full-access registrations to the event
Workshop Venue
EWRL 2022 takes place in Milan, Italy. The precise address is:
**Aula De Carli – Politecnico di Milano – Campus Bovisa**
Via Candiani, 72 – 20158 – Milano (MI) – Italy
Reaching the Venue
Milan is easy to reach by car, train, or airplane. The simplest option is the train, with many daily services arriving at Milano Centrale, Milano Porta Garibaldi, or Milano Cadorna. By airplane, the most convenient airports are Milano Malpensa and Milano Linate; Orio al Serio (Bergamo) Airport is also an option. Once you have reached Milan by train or airplane, you can get to the workshop venue as follows:
- From Milano Malpensa Airport, take the Malpensa Express train, which departs directly from the airport every 30 minutes. Its final destination is either Milano Cadorna or Milano Centrale, but in both cases it stops at Milano Bovisa Politecnico, the station serving the workshop venue; we suggest getting off there rather than riding to the final destination. Milan can also be reached by bus from Malpensa Airport, arriving at Milano Centrale train station in around 1 hour. From Milano Centrale, take any train on local lines S1, S2, or S13.
- From Milano Linate Airport, first reach a train station by taxi or by bus number 73. The easiest option is to reach Milano Centrale by taking bus 73 at the airport and then switching to bus 91. From any train station, take a train on local lines S1, S2, or S13, all of which stop at the Milano Bovisa Politecnico station. These buses and trains are accessible with a regular single-use ATM metro ticket.
- From Orio al Serio (Bergamo) Airport, there is unfortunately no railway connection to Milan. You can take a taxi or, better yet, a bus from the airport directly to Milano Centrale; buses depart from the airport exit every 20-30 minutes and reach Milano Centrale Station in 50-60 minutes. From Milano Centrale, take any train on local lines S1, S2, or S13 to the Milano Bovisa Politecnico station.
- If you arrive in Milan by train and your route does not pass through the Milano Bovisa Politecnico station, the easiest way to reach the workshop venue is to take any train on local lines S1, S2, or S13.
Program Committee
| Name |
|---|
| Aditya Modi |
| Ahmed Touati |
| Alain Dutech |
| Aldo Pacchiano |
| Alessandro Lazaric |
| Alessio Russo |
| Alexis Jacq |
| Amarildo Likmeta |
| André Biedenkapp |
| Andrea Tirinzoni |
| Boris Belousov |
| Brendan O’Donoghue |
| Carlo D’Eramo |
| Christos Dimitrakakis |
| Ciara Pike-Burke |
| Claire Vernade |
| Conor F Hayes |
| David Abel |
| David Brandfonbrener |
| David Meger |
| Davide Tateo |
| Debabrota Basu |
| Divya Grover |
| Dongruo Zhou |
| Dylan R Ashley |
| Elena Smirnova |
| Emilie Kaufmann |
| Emmanuel Esposito |
| Eugenio Bargiacchi |
| Felipe Leno da Silva |
| Felix Berkenkamp |
| Francesco Faccio |
| Fredrik Heintz |
| Gergely Neu |
| Germano Gabbianelli |
| Gianluca Drappo |
| Giorgia Ramponi |
| Giorgio Manganini |
| Giuseppe Canonaco |
| Glen Berseth |
| Hannes Eriksson |
| Hao Liu |
| Harsh Satija |
| Hélène Plisnier |
| Ido Greenberg |
| Jens Kober |
| Johan Källström |
| Jonathan J Hunt |
| Julien Perolat |
| Kamyar Azizzadenesheli |
| Khaled Eldowa |
| Khazatsky Alexander |
| Khimya Khetarpal |
| Kianté Brantley |
| Léonard Hussenot |
| Lior Shani |
| Martin Klissarov |
| Martino Bernasconi |
| Mathieu Reymond |
| Matteo Papini |
| Matteo Pirotta |
| Matthew E. Taylor |
| Matthieu Geist |
| Nico Montali |
| Nicolò A Cesa-Bianchi |
| Olivier Bachem |
| Omar Darwiche Domingues |
| Paolo Bonetti |
| Patrick Mannion |
| Patrick Saux |
| Peter Vamplew |
| Philippe Preux |
| Pierluca D’Oro |
| Pierre Liotet |
| Pierre Menard |
| Prashanth L.A. |
| Puze Liu |
| Quanquan Gu |
| Rafael Rodriguez Sanchez |
| Rahul Savani |
| Riad Akrour |
| Riccardo Poiani |
| Richard S Sutton |
| Robert Dadashi |
| Roberta Raileanu |
| Romina Abachi |
| Ronald Ortner |
| Roxana Radulescu |
| Rui YUAN |
| Samuele Tosatto |
| Shangdong Yang |
| Simon Du |
| Tal Lancewicki |
| Taylor W Killian |
| Tengyang Xie |
| Thanh Nguyen-Tang |
| Tian Xu |
| Tianwei Ni |
| Tom Schaul |
| Tom Zahavy |
| Tommaso R Cesari |
| Weitong ZHANG |
| Yannis Flet-Berliac |
| Yi Su |
| Yishay Mansour |
| Younggyo Seo |
Accepted Papers
- Direct Advantage Estimation – Pan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard – Accept (Oral)
- Newton-based Policy Search for Networked Multi-agent Reinforcement Learning – Manganini, Giorgio; Fioravanti, Simone; Ramponi, Giorgia – Accept (Oral)
- A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits – Laforgue, Pierre; Clerici, Giulia; Cesa-Bianchi, Nicolò; Gilad-Bachrach, Ran – Accept (Oral)
- Local Feature Swapping for Generalization in Reinforcement Learning – Bertoin, David; Rachelson, Emmanuel – Accept (Oral)
- Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality – Zahavy, Tom; Schroecker, Yannick; Behbahani, Feryal; Baumli, Kate; Flennerhag, Sebastian; Hou, Shaobo; Singh, Satinder – Accept (Oral)
- Dynamic Pricing with Online Data Aggregation and Learning – Genalti, Gianmarco; Mussi, Marco; Nuara, Alessandro; Gatti, Nicola – Accept (Oral)
- Scalable Deep Reinforcement Learning Algorithms for Mean Field Games – Lauriere, Mathieu; Perrin, Sarah; Girgin, Sertan; Muller, Paul; Jain, Ayush; Cabannes, Théophile; Piliouras, Georgios; Perolat, Julien; Élie, Romuald; Pietquin, Olivier; Geist, Matthieu – Accept (Oral)
- Optimistic PAC Reinforcement Learning: the Instance-Dependent View – Tirinzoni, Andrea; Al Marjani, Aymen; Kaufmann, Emilie – Accept (Oral)
- Group Fairness in Reinforcement Learning – Satija, Harsh; Lazaric, Alessandro; Pirotta, Matteo; Pineau, Joelle – Accept (Oral)
- IQ-Learn: Inverse soft-Q Learning for Imitation – Garg, Divyansh; Chakraborty, Shuvam; Cundy, Chris; Song, Jiaming; Geist, Matthieu; Ermon, Stefano – Accept (Oral)
- In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications – G. León, Borja; Shanahan, Murray; Belardinelli, Francesco – Accept (Poster)
- A Deep Reinforcement Learning Approach to Supply Chain Inventory Management – Stranieri, Francesco; Stella, Fabio – Accept (Poster)
- Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act – Jacq, Alexis; Ferret, Johan; Geist, Matthieu; Pietquin, Olivier – Accept (Poster)
- Continuous Control with Action Quantization from Demonstrations – Dadashi, Robert; Hussenot, Léonard; Vincent, Damien; Girgin, Sertan; Raichuk, Anton; Geist, Matthieu; Pietquin, Olivier – Accept (Poster)
- A Unifying Framework for Reinforcement Learning and Planning – Moerland, Thomas M; Broekens, Joost; Plaat, Aske; Jonker, Catholijn M – Accept (Poster)
- Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets – Strupl, Miroslav; Faccio, Francesco; Ashley, Dylan R; Schmidhuber, Jürgen; Srivastava, Rupesh Kumar – Accept (Poster)
- Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses – Shang, Xuedong; Shao, Han; Qian, Jian – Accept (Poster)
- Semi-Counterfactual Risk Minimization Via Neural Networks – Aminian, Gholamali; Vega, Roberto I; Rivasplata, Omar; Toni, Laura; Rodrigues, Miguel – Accept (Poster)
- When Privacy Meets Partial Information: A Refined Analysis of Differentially Private Bandits – Azize, Achraf; Basu, Debabrota – Accept (Poster)
- Deep Coherent Exploration for Continuous Control – Zhang, Yijie; van Hoof, Herke – Accept (Poster)
- Bilinear Exponential Family of MDPs: Frequentist Regret Bound with Tractable Exploration \& Planning – Ouhamma, Reda; Basu, Debabrota; Maillard, Odalric – Accept (Poster)
- Neural Distillation as a State Representation Bottleneck in Reinforcement Learning – Guillet, Valentin; Wilson, Dennis; Aguilar-Melchor, Carlos; Rachelson, Emmanuel – Accept (Poster)
- Tabular and Deep Learning of Whittle Index – Robledo, Francisco; Ayesta, Urtzi; Avrachenkov, Konstantin; Borkar, Vivek – Accept (Poster)
- Learning Efficiently Function Approximation for Contextual MDP – Levy, Orin; Mansour, Yishay – Accept (Poster)
- Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP – Levy, Orin; Mansour, Yishay – Accept (Poster)
- Look where you look! Saliency-guided Q-networks for visual RL tasks – Bertoin, David; Zouitine, Adil; Zouitine, Mehdi; Rachelson, Emmanuel – Accept (Poster)
- Quantification of Transfer in Reinforcement Learning via Regret Bounds for Learning Agents – Tuynman, Adrienne; Ortner, Ronald – Accept (Poster)
- Regret Bounds for Satisficing in Multi-Armed Bandit Problems – Michel, Thomas; Hajiabolhassan, Hossein; Ortner, Ronald – Accept (Poster)
- Risk-aware linear bandits with convex loss – Saux, Patrick; Maillard, Odalric – Accept (Poster)
- Interactive Inverse Reinforcement Learning – Kleine Büning, Thomas; George, Anne-Marie; Dimitrakakis, Christos – Accept (Poster)
- Reinforcement Learning with a Terminator – Guy, Tennenholtz; Merlis, Nadav; Shani, Lior; Mannor, Shie; Shalit, Uri; Chechik, Gal; Hallak, Assaf; Dalal, Gal – Accept (Poster)
- On Convergence of Neural asynchronous Q-iteration – Smirnova, Elena – Accept (Poster)
- Curriculum Reinforcement Learning via Constrained Optimal Transport – Klink, Pascal; Yang, Haoyi; D’Eramo, Carlo; Peters, Jan; Pajarinen, Joni – Accept (Poster)
- Cross-Entropy Soft-Risk Reinforcement Learning – Greenberg, Ido; Chow, Yinlam; Ghavamzadeh, Mohammad; Mannor, Shie – Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Active Exploration for Inverse Reinforcement LearningLindner, David; Krause, Andreas; Ramponi, Giorgia Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Offline Credit Assignment in Deep Reinforcement Learning with Hindsight Discriminator NetworksFerret, Johan; Pietquin, Olivier; Geist, Matthieu Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Local Advantage Networks for Multi-Agent Reinforcement Learning in Dec-POMDPsAvalos, Raphael; Reymond, Mathieu; Nowé, Ann; Roijers, Diederik M Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Sample-Efficient Reinforcement Learning of Partially Observable Markov GamesLiu, Qinghua; Szepesvari, Csaba; Jin, Chi Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): General Policy Evaluation and Improvement by Learning to Identify Few But Crucial StatesFaccio, Francesco; Ramesh, Aditya; Herrmann, Vincent; Harb, Jean; Schmidhuber, Jürgen Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Goal-Conditioned Generators of Deep PoliciesFaccio, Francesco; Herrmann, Vincent; Ramesh, Aditya; Kirsch, Louis; Schmidhuber, Jürgen Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory ManagementMeisheri, Hardik; Nath, Somjit; Baranwal, Mayank; Khadilkar, Harshad Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Near-Optimal Regret for Adversarial MDP with Delayed Bandit FeedbackJin, Tiancheng; Lancewicki, Tal; Luo, Haipeng; Mansour, Yishay; Rosenberg, Aviv Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Rate-Optimal Online Convex Optimization in Adaptive Linear ControlCassel, Asaf B; Cohen, Alon; Koren, Tomer Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Cooperative Online Learning in Stochastic and Adversarial MDPsLancewicki, Tal; Rosenberg, Aviv; Mansour, Yishay Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Multi-Objective Coordination Graphs for the Expected Scalarised Returns with Generative Flow ModelsHayes, Conor F; Verstraeten, Timothy; Roijers, Diederik M; Howley, Enda; Mannion, Patrick Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Get Back Here: Robust Imitation by Return-to-Distribution PlanningCideron, Geoffrey; Pietquin, Olivier; Dadashi, Robert; Dulac-Arnold, Gabriel; Tabanpour, Baruch; Geist, Matthieu; Hussenot, Léonard; Curi, Sebastian; Girgin, Sertan Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Analysis of Stochastic Processes through Replay BuffersDi-Castro, Shirli; Mannor, Shie; Di Castro, Dotan Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Mixture of Interpretable Experts for Continuous ControlTateo, Davide; Akrour, Riad; Peters, Jan Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): On Reward Binarisation and Bayesian AgentsCatt, Elliot; Hutter, Marcus; Veness, Joel Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): $Q$-Learning for $L_p$ Robust Markov Decision ProcessesKumar, Navdeep; Wang, Kaixin; Levy, Kfir; Mannor, Shie Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic MotivationYang, Zhao; Moerland, Thomas M; Preuss, Mike; Plaat, Aske Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru*; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Curious Exploration via Structured World Models Yields Zero-Shot Object ManipulationSancaktar, Cansu *; Blaes, Sebastian; Martius, Georg Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Linear Convergence of Natural Policy Gradient Methods with Log-Linear PoliciesYUAN, Rui; Gower, Robert M; Lazaric, Alessandro; Du, Simon; Xiao, Lin Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru*; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback GraphsRouyer , Chloé *; van der Hoeven, Dirk; Cesa-Bianchi, Nicolò; Seldin, Yevgeny Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): RLDesigner: Toward Framing Spatial Layout Planning as a Markov Decision ProcessKakooee, Reza; Dillenburger, Benjamin Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Belief states of POMDPs and internal states of recurrent RL agents: an empirical analysis of their mutual informationLambrechts, Gaspard; Bolland, Adrien; Ernst, Damien Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual BanditsNeu, Gergely; Olkhovskaya, Julia; Papini, Matteo; Schwartz, Ludovic Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Formulation and validation of a complete car-following model based on deep reinforcement learningHart, Fabian Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPsTirinzoni, Andrea; Al Marjani, Aymen; Kaufmann, Emilie Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Scalable Representation Learning in Linear Contextual Bandits with Constant Regret GuaranteesTirinzoni, Andrea; Papini, Matteo; Touati, Ahmed; Lazaric, Alessandro; Pirotta, Matteo Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Learning Generative Models with Goal-conditioned Reinforcement LearningVargas Vieyra, Mariana; Menard, Pierre Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Adaptive Belief Discretization for POMDP PlanningGrover, Divya; Dimitrakakis, Christos Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Boosting reinforcement learning with sparse and rare rewards using Fleming-Viot particle systemsMastropietro, Daniel G; Majewski, Szymon; Ayesta, Urtzi; Jonckheere, Matthieu Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): A Sparse Linear Program for Global Planning in Large MDPsNeu, Gergely; Okolo, Nneka M Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): A Best-of-Both-Worlds Algorithm for Bandits with Delayed FeedbackMasoudian, Saeed; Zimmert, Julian; Seldin, Yevgeny Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Entropy Regularized Reinforcement Learning with Cascading NetworksShilova, Alena; Della Vecchia, Riccardo; Preux, Philippe; Akrour, Riad Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularizationPatil, Gandharv; L.A., Prashanth; Precup, Doina Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): On learning history-based policies for controlling Markov Decision ProcessesPatil, Gandharv; Mahajan, Aditya; Precup, Doina Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Minimax-Bayes Reinforcement LearningKleine Büning, Thomas; Dimitrakakis, Christos; Eriksson, Hannes; Grover, Divya; Jorge, Emilio Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): On Bayesian Value Function Distributions.Jorge, Emilio; Eriksson, Hannes; Dimitrakakis, Christos; Basu, Debabrota; Grover, Divya Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): TempRL: Temporal Priors for Exploration in Off-Policy Reinforcement LearningBagatella, Marco; Christen, Sammy; Hilliges, Otmar Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Optimistic Risk-Aware Model-based Reinforcement LearningAbachi, Romina; Farahmand, Amir-massoud Accept (Poster)
Photos from the Workshop
Code of Conduct
The official EWRL 2022 Code of Conduct can be found here.