The 15th European Workshop on Reinforcement Learning (EWRL 2022)
Dates: 19-21 September 2022
Location: **Aula De Carli – Politecnico di Milano – Campus Bovisa**, Building B9
Via Durando, 10 – 20158 – Milano (MI) – Italy
The campus has many entrances; we suggest using the entrance at Via Durando 10, which is the easiest way to reach the venue.
Schedule
Monday – 19/09/2022
8:30 – 9:30 Check-in
9:30 – 10:30 Tutorial 1 (part 1) Matteo Pirotta: “Exploration in Reinforcement Learning”
10:30 – 11:00 coffee break
11:00 – 12:00 Tutorial 1 (part 2)
12:00 – 13:00 Sponsor Talks 1
13:00 – 14:30 Lunch break
14:30 – 15:30 Tutorial 2 (part 1) Matthieu Geist: “Regularization in Reinforcement Learning”
15:30 – 16:00 coffee break
16:00 – 17:00 Tutorial 2 (part 2)
17:00 – 18:00 Sponsor Talks 2
18:00 – 20:00 Welcome reception
Tuesday – 20/09/2022
8:00 – 9:00 Check-in
8:45 – 9:00 Opening remarks
9:00 – 9:40 Invited talk 1 Sarah Perrin: “Scaling up MARL with MFGs and vice versa!”
9:40 – 10:00 Contributed talk 1 (Scalable Deep Reinforcement Learning Algorithms for Mean Field Games)
10:00 – 11:00 Poster session 1 (with Coffee break)
11:00 – 11:40 Invited talk 2 Niao He: “Complexities of Actor-critic Methods for Regularized MDPs and POMDPs”
11:40 – 12:00 Contributed talk 2 (IQ-Learn: Inverse soft-Q Learning for Imitation)
12:00 – 12:20 Contributed talk 3 (Newton-based Policy Search for Networked Multi-agent Reinforcement Learning)
12:20 – 14:00 Lunch break
14:00 – 14:40 Invited talk 3 Ann Nowé: “Beyond the optimal action in Reinforcement Learning”
14:40 – 15:00 Contributed talk 4 (Group Fairness in Reinforcement Learning)
15:00 – 15:20 Contributed talk 5 (Direct Advantage Estimation)
15:20 – 16:00 Invited talk 4 Jan Peters: “Robot RL: Lessons from the Physical World”
16:00 – 18:00 Poster session 2 (with Coffee break)
20:00 Social Dinner
Wednesday – 21/09/2022
8:00 – 9:00 Check-in
9:00 – 9:40 Invited talk 1 Alessandro Lazaric: “Understanding (unsupervised) exploration in goal-based Reinforcement Learning”
9:40 – 10:00 Contributed talk 1 (Optimistic PAC Reinforcement Learning: the Instance-Dependent View)
10:00 – 11:00 Poster session 1 (with Coffee break)
11:00 – 11:40 Invited talk 2 Ciara Pike-Burke: “Multi-armed bandits with history dependent rewards”
11:40 – 12:00 Contributed talk 2 (A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits)
12:00 – 12:20 Contributed talk 3 (Dynamic Pricing with Online Data Aggregation and Learning)
12:20 – 14:00 Lunch break
14:00 – 14:40 Invited talk 3 Gergely Neu: “Primal-Dual Methods for Reinforcement Learning”
14:40 – 15:00 Contributed talk 4 (Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality)
15:00 – 15:20 Contributed talk 5 (Local Feature Swapping for Generalization in Reinforcement Learning)
15:20 – 16:00 Invited talk 4 Richard Sutton: “An Architecture for Intelligence”
16:00 – 18:00 Poster session 2 (with Coffee break)
Poster Session Assignment
Each poster is assigned a day (either September 20 or September 21) and will be presented in both the morning and afternoon poster sessions of that day.
Poster Session 20 September
- Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees
- Curriculum Reinforcement Learning via Constrained Optimal Transport
- Multi-Objective Coordination Graphs for the Expected Scalarised Returns with Generative Flow Models
- Rate-Optimal Online Convex Optimization in Adaptive Linear Control
- Mixture of Interpretable Experts for Continuous Control
- Adaptive Belief Discretization for POMDP Planning
- IQ-Learn: Inverse soft-Q Learning for Imitation
- On Bayesian Value Function Distributions
- Minimax-Bayes Reinforcement Learning
- Formulation and validation of a complete car-following model based on deep reinforcement learning
- General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States
- A Deep Reinforcement Learning Approach to Supply Chain Inventory Management
- On learning history-based policies for controlling Markov Decision Processes
- Belief states of POMDPs and internal states of recurrent RL agents: an empirical analysis of their mutual information
- Get Back Here: Robust Imitation by Return-to-Distribution Planning
- Semi-Counterfactual Risk Minimization Via Neural Networks
- Dynamic Pricing with Online Data Aggregation and Learning
- Newton-based Policy Search for Networked Multi-agent Reinforcement Learning
- A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory Management
- Group Fairness in Reinforcement Learning
- Cross-Entropy Soft-Risk Reinforcement Learning
- Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets
- $Q$-Learning for $L_p$ Robust Markov Decision Processes
- Learning Efficiently Function Approximation for Contextual MDP
- Risk-aware linear bandits with convex loss
- Local Advantage Networks for Multi-Agent Reinforcement Learning in Dec-POMDPs
- Bilinear Exponential Family of MDPs: Frequentist Regret Bound with Tractable Exploration \& Planning
- RLDesigner: Toward Framing Spatial Layout Planning as a Markov Decision Process
- Optimistic Risk-Aware Model-based Reinforcement Learning
- Quantification of Transfer in Reinforcement Learning via Regret Bounds for Learning Agents
- Linear Convergence of Natural Policy Gradient Methods with Log-Linear Policies
- Scalable Deep Reinforcement Learning Algorithms for Mean Field Games
- Cooperative Online Learning in Stochastic and Adversarial MDPs
- Interactive Inverse Reinforcement Learning
- A Unifying Framework for Reinforcement Learning and Planning
- Neural Distillation as a State Representation Bottleneck in Reinforcement Learning
Poster Session 21 September
- When Privacy Meets Partial Information: A Refined Analysis of Differentially Private Bandits
- Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act
- Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs
- Optimistic PAC Reinforcement Learning: the Instance-Dependent View
- Active Exploration for Inverse Reinforcement Learning
- In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications
- Curious Exploration via Structured World Models Yields Zero-Shot Object Manipulation
- A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback Graphs
- Boosting reinforcement learning with sparse and rare rewards using Fleming-Viot particle systems
- Look where you look! Saliency-guided Q-networks for visual RL tasks
- Local Feature Swapping for Generalization in Reinforcement Learning
- On Convergence of Neural asynchronous Q-iteration
- On Reward Binarisation and Bayesian Agents
- Goal-Conditioned Generators of Deep Policies
- Tabular and Deep Learning of Whittle Index
- Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularization
- A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits
- Direct Advantage Estimation
- Offline Credit Assignment in Deep Reinforcement Learning with Hindsight Discriminator Networks
- Continuous Control with Action Quantization from Demonstrations
- SFP: State-free Priors for Exploration in Off-Policy Reinforcement Learning
- Learning Generative Models with Goal-conditioned Reinforcement Learning
- Analyzing Thompson Sampling for Contextual Bandits via the Lifted Information Ratio
- A Sparse Linear Program for Global Planning in Large MDPs
- Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP
- Sample-Efficient Reinforcement Learning of Partially Observable Markov Games
- Entropy Regularized Reinforcement Learning with Cascading Networks
- Regret Bounds for Satisficing in Multi-Armed Bandit Problems
- A Best-of-Both-Worlds Algorithm for Bandits with Delayed Feedback
- Analysis of Stochastic Processes through Replay Buffers
- Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback
- Reinforcement Learning with a Terminator
- Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality
- Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses
- Deep Coherent Exploration for Continuous Control
- First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic Motivation
Registration
Registrations for the 15th European Workshop on Reinforcement Learning are now open! Registration includes participation in the main event activities, lunch on all days of the event, and the social dinner on September 20th. The early bird registration period has been extended from July 31st to August 5th. Thanks to the generosity of our sponsors, we are able to offer students a limited number of participation grants in the form of fee waivers. Grants will be awarded based on merit and D&I considerations. If you come from an underrepresented group or have financial needs, please consider applying for a grant. The grant application deadline is July 21st, and notifications will be sent by July 28th, so that students who do not receive a grant can still complete the payment for the early bird registration.
Description
The 15th European Workshop on Reinforcement Learning (EWRL 2022) invites reinforcement learning researchers to participate in the revival of this world-class event. We plan to make this an exciting event for researchers worldwide, not only for the presentation of top-quality papers, but also as a forum for ample discussion of open problems and future research directions.
Reinforcement learning is an active field of research that deals with sequential decision making in unknown, often stochastic and/or partially observable, environments. Recently there has been a wealth of impressive empirical results as well as significant theoretical advances. Both types of advances are of great importance, and we would like to create a forum to discuss them.
The workshop will cover a range of sub-topics including (but not limited to):
- MDPs and Dynamic Programming
- Temporal Difference Methods
- Policy Optimization
- Model-based RL and Planning
- Exploration in RL
- Offline RL
- Unsupervised and Intrinsically Motivated RL
- Representation Learning in RL
- Lifelong and Non-stationary RL
- Hierarchical RL
- Partially observable RL
- Multi-agent RL
- Multi-objective RL
- Transfer and Meta RL
- Deep RL
- Imitation Learning and Inverse RL
- Risk-sensitive and robust RL
- Theoretical aspects of RL
- Applications and Real-life RL
Paper Submission
We invite submissions for the 15th European Workshop on Reinforcement Learning (EWRL 2022) from the entire reinforcement learning spectrum. Papers may present new work or summarize recent work by the author(s). There will be no proceedings of EWRL15; as such, papers that are intended for or have been submitted to other conferences or journals are also welcome. Submitted papers will be reviewed by the program committee in a double-blind procedure.
Submissions should follow the JMLR format adapted for EWRL, linked below. There is a limit of 9 pages, excluding acknowledgments, references, and appendix. Authors of accepted papers will be allowed an additional page for the camera-ready version. All accepted papers will be considered for the poster sessions; outstanding papers will also be considered for a 20-minute oral presentation.
Please send your inquiries by email to the organizers at ewrl2022@gmail.com.
- Submission deadline: 8 June 2022, 11:59pm AoE (extended from 1 June 2022)
- Page limit: 9 pages excluding acknowledgments, references, and appendix
- Paper format: EWRL 2022 Author Kit
- Paper Submissions: CMT
Important Dates
- Paper submissions due: 8 June 2022, 11:59pm AoE (extended from 1 June 2022)
- Early Registration begins: 1 July 2022
- Participation grant application begins: 1 July 2022
- Paper notification: 14 July 2022
- Participation grant application ends: 21 July 2022
- Participation grant notification: 28 July 2022
- Early registration ends: 5 August 2022 (extended from 31 July 2022)
- Camera ready due: 1 September 2022
- Workshop begins: 19 September 2022
- Workshop ends: 21 September 2022
Confirmed Invited Speakers
- Sarah Perrin (Inria Lille)
- Topic: Scaling up MARL with MFGs and vice versa!
- Niao He (ETH Zurich)
- Topic: Complexities of Actor-critic Methods for Regularized MDPs and POMDPs
- Alessandro Lazaric (Facebook AI Research)
- Topic: Understanding (unsupervised) exploration in goal-based RL
- Gergely Neu (Universitat Pompeu Fabra)
- Topic: Primal-Dual Methods for Reinforcement Learning
- Ann Nowé (Vrije Universiteit Brussel)
- Topic: Beyond the optimal action in RL
- Jan Peters (Technische Universität Darmstadt)
- Topic: Robot RL: Lessons from the Physical World
- Ciara Pike-Burke (Imperial College London)
- Topic: Multi-armed bandits with history dependent rewards
- Richard Sutton (University of Alberta – DeepMind)
- Topic: An Architecture for Intelligence
Confirmed Tutorial Sessions
- Matthieu Geist (Google Research)
- Topic: Regularization in Reinforcement Learning
- Matteo Pirotta (Facebook AI Research)
- Topic: Exploration in Reinforcement Learning
Organizing Committee
General Chair
- Marcello Restelli (Politecnico di Milano – Milan, Italy)
Organizing Chair
- Francesco Trovò (Politecnico di Milano – Milan, Italy)
Program Chair
- Alberto Maria Metelli (Politecnico di Milano – Milan, Italy)
Program Co-Chairs
- Mirco Mutti (Università di Bologna – Bologna, Italy; Politecnico di Milano – Milan, Italy)
- Pierre Liotet (Politecnico di Milano – Milan, Italy)
Diversity and Inclusion Chairs
- Giorgia Ramponi (ETH AI Center)
- Riccardo Zamboni (Politecnico di Milano – Milan, Italy)
Workflow Chairs
- Lorenzo Bisi (Politecnico di Milano – Milan, Italy)
- Luca Sabbioni (Politecnico di Milano – Milan, Italy)
Communication Chairs
- Amarildo Likmeta (Università di Bologna – Bologna, Italy; Politecnico di Milano – Milan, Italy)
- Marco Mussi (Politecnico di Milano – Milan, Italy)
Sponsorship
EWRL 2022 invites companies and research institutions involved in fundamental research on, or applications of, reinforcement learning to become official sponsors of the event. EWRL 2022 offers a single level of sponsorship, at a cost of 5000€, with the following benefits:
- Logo display on the official EWRL 2022 website
- Logo display on the Welcome Kit distributed during the event
- A poster session slot for presenting your research or applications
- Access to the EWRL recruitment database
- Two full-access registrations to the event
Workshop Venue
EWRL 2022 takes place in Milan, Italy. The precise address is:
**Aula De Carli – Politecnico di Milano – Campus Bovisa**
Via Candiani, 72 – 20158 – Milano (MI) – Italy
Reaching the Venue
Milan is easy to reach by car, train, or airplane. The simplest option is the train, with many daily services arriving at Milano Centrale, Milano Porta Garibaldi, or Milano Cadorna. By airplane, the most convenient airports are Milano Malpensa and Milano Linate; Orio al Serio (Bergamo) Airport is also an option. Once you have reached Milan by train or airplane, you can get to the workshop venue as follows:
- From Milano Malpensa Airport, take the Malpensa Express train, which departs directly from the airport every 30 minutes. Its final destination is either Milano Cadorna or Milano Centrale, but in both cases it stops at Milano Bovisa Politecnico, the station serving the workshop venue; we suggest getting off there rather than riding to the final destination. Milan can also be reached by bus from Malpensa Airport, arriving at Milano Centrale train station in around 1 hour. From Milano Centrale, take any train on local lines S1, S2, or S13.
- From Milano Linate Airport, first reach a train station by taxi or by bus number 73. The easiest option is to reach Milano Centrale by taking bus 73 at the airport and then switching to bus 91. From any train station, take a train on local lines S1, S2, or S13, all of which stop at the Milano Bovisa Politecnico station. These buses and trains are accessible with a regular single-use ATM metro ticket.
- From Orio al Serio (Bergamo) Airport, there is unfortunately no railway connection to Milan. You can take a taxi or, better yet, a bus from the airport directly to Milano Centrale; buses depart from the airport exit every 20-30 minutes and reach Milano Centrale Station in 50-60 minutes. From Milano Centrale, take any train on local lines S1, S2, or S13 to the Milano Bovisa Politecnico station.
- If you arrive in Milan by train and your route does not pass through the Milano Bovisa Politecnico station, the easiest way to reach the workshop venue is to take any train on local lines S1, S2, or S13.
Program Committee
| Name |
|---|
| Aditya Modi |
| Ahmed Touati |
| Alain Dutech |
| Aldo Pacchiano |
| Alessandro Lazaric |
| Alessio Russo |
| Alexis Jacq |
| Amarildo Likmeta |
| André Biedenkapp |
| Andrea Tirinzoni |
| Boris Belousov |
| Brendan O’Donoghue |
| Carlo D’Eramo |
| Christos Dimitrakakis |
| Ciara Pike-Burke |
| Claire Vernade |
| Conor F Hayes |
| David Abel |
| David Brandfonbrener |
| David Meger |
| Davide Tateo |
| Debabrota Basu |
| Divya Grover |
| Dongruo Zhou |
| Dylan R Ashley |
| Elena Smirnova |
| Emilie Kaufmann |
| Emmanuel Esposito |
| Eugenio Bargiacchi |
| Felipe Leno da Silva |
| Felix Berkenkamp |
| Francesco Faccio |
| Fredrik Heintz |
| Gergely Neu |
| Germano Gabbianelli |
| Gianluca Drappo |
| Giorgia Ramponi |
| Giorgio Manganini |
| Giuseppe Canonaco |
| Glen Berseth |
| Hannes Eriksson |
| Hao Liu |
| Harsh Satija |
| Hélène Plisnier |
| Ido Greenberg |
| Jens Kober |
| Johan Källström |
| Jonathan J Hunt |
| Julien Perolat |
| Kamyar Azizzadenesheli |
| Khaled Eldowa |
| Khazatsky Alexander |
| Khimya Khetarpal |
| Kianté Brantley |
| Léonard Hussenot |
| Lior Shani |
| Martin Klissarov |
| Martino Bernasconi |
| Mathieu Reymond |
| Matteo Papini |
| Matteo Pirotta |
| Matthew E. Taylor |
| Matthieu Geist |
| Nico Montali |
| Nicolò A Cesa-Bianchi |
| Olivier Bachem |
| Omar Darwiche Domingues |
| Paolo Bonetti |
| Patrick Mannion |
| Patrick Saux |
| Peter Vamplew |
| Philippe Preux |
| Pierluca D’Oro |
| Pierre Liotet |
| Pierre Menard |
| Prashanth L.A. |
| Puze Liu |
| Quanquan Gu |
| Rafael Rodriguez Sanchez |
| Rahul Savani |
| Riad Akrour |
| Riccardo Poiani |
| Richard S Sutton |
| Robert Dadashi |
| Roberta Raileanu |
| Romina Abachi |
| Ronald Ortner |
| Roxana Radulescu |
| Rui YUAN |
| Samuele Tosatto |
| Shangdong Yang |
| Simon Du |
| Tal Lancewicki |
| Taylor W Killian |
| Tengyang Xie |
| Thanh Nguyen-Tang |
| Tian Xu |
| Tianwei Ni |
| Tom Schaul |
| Tom Zahavy |
| Tommaso R Cesari |
| Weitong ZHANG |
| Yannis Flet-Berliac |
| Yi Su |
| Yishay Mansour |
| Younggyo Seo |
Accepted Papers
- Direct Advantage Estimation – Pan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard – Accept (Oral)
- Newton-based Policy Search for Networked Multi-agent Reinforcement Learning – Manganini, Giorgio; Fioravanti, Simone; Ramponi, Giorgia – Accept (Oral)
- A Last Switch Dependent Analysis of Satiation and Seasonality in Bandits – Laforgue, Pierre; Clerici, Giulia; Cesa-Bianchi, Nicolò; Gilad-Bachrach, Ran – Accept (Oral)
- Local Feature Swapping for Generalization in Reinforcement Learning – Bertoin, David; Rachelson, Emmanuel – Accept (Oral)
- Discovering Policies with DOMiNO: Diversity Optimization Maintaining Near Optimality – Zahavy, Tom; Schroecker, Yannick; Behbahani, Feryal; Baumli, Kate; Flennerhag, Sebastian; Hou, Shaobo; Singh, Satinder – Accept (Oral)
- Dynamic Pricing with Online Data Aggregation and Learning – Genalti, Gianmarco; Mussi, Marco; Nuara, Alessandro; Gatti, Nicola – Accept (Oral)
- Scalable Deep Reinforcement Learning Algorithms for Mean Field Games – Lauriere, Mathieu; Perrin, Sarah; Girgin, Sertan; Muller, Paul; Jain, Ayush; Cabannes, Théophile; Piliouras, Georgios; Perolat, Julien; Élie, Romuald; Pietquin, Olivier; Geist, Matthieu – Accept (Oral)
- Optimistic PAC Reinforcement Learning: the Instance-Dependent View – Tirinzoni, Andrea; Al Marjani, Aymen; Kaufmann, Emilie – Accept (Oral)
- Group Fairness in Reinforcement Learning – Satija, Harsh; Lazaric, Alessandro; Pirotta, Matteo; Pineau, Joelle – Accept (Oral)
- IQ-Learn: Inverse soft-Q Learning for Imitation – Garg, Divyansh; Chakraborty, Shuvam; Cundy, Chris; Song, Jiaming; Geist, Matthieu; Ermon, Stefano – Accept (Oral)
- In a Nutshell, the Human Asked for This: Latent Goals for Following Temporal Specifications – G. León, Borja; Shanahan, Murray; Belardinelli, Francesco – Accept (Poster)
- A Deep Reinforcement Learning Approach to Supply Chain Inventory Management – Stranieri, Francesco; Stella, Fabio – Accept (Poster)
- Lazy-MDPs: Towards Interpretable Reinforcement Learning by Learning When to Act – Jacq, Alexis; Ferret, Johan; Geist, Matthieu; Pietquin, Olivier – Accept (Poster)
- Continuous Control with Action Quantization from Demonstrations – Dadashi, Robert; Hussenot, Léonard; Vincent, Damien; Girgin, Sertan; Raichuk, Anton; Geist, Matthieu; Pietquin, Olivier – Accept (Poster)
- A Unifying Framework for Reinforcement Learning and Planning – Moerland, Thomas M; Broekens, Joost; Plaat, Aske; Jonker, Catholijn M – Accept (Poster)
- Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets – Strupl, Miroslav; Faccio, Francesco; Ashley, Dylan R; Schmidhuber, Jürgen; Srivastava, Rupesh Kumar – Accept (Poster)
- Stochastic Bandits with Vector Losses: Minimizing $\ell^\infty$-Norm of Relative Losses – Shang, Xuedong; Shao, Han; Qian, Jian – Accept (Poster)
- Semi-Counterfactual Risk Minimization Via Neural Networks – Aminian, Gholamali; Vega, Roberto I; Rivasplata, Omar; Toni, Laura; Rodrigues, Miguel – Accept (Poster)
- When Privacy Meets Partial Information: A Refined Analysis of Differentially Private Bandits – Azize, Achraf; Basu, Debabrota – Accept (Poster)
- Deep Coherent Exploration for Continuous Control – Zhang, Yijie; van Hoof, Herke – Accept (Poster)
- Bilinear Exponential Family of MDPs: Frequentist Regret Bound with Tractable Exploration \& Planning – Ouhamma, Reda; Basu, Debabrota; Maillard, Odalric – Accept (Poster)
- Neural Distillation as a State Representation Bottleneck in Reinforcement Learning – Guillet, Valentin; Wilson, Dennis; Aguilar-Melchor, Carlos; Rachelson, Emmanuel – Accept (Poster)
- Tabular and Deep Learning of Whittle Index – Robledo, Francisco; Ayesta, Urtzi; Avrachenkov, Konstantin; Borkar, Vivek – Accept (Poster)
- Learning Efficiently Function Approximation for Contextual MDP – Levy, Orin; Mansour, Yishay – Accept (Poster)
- Optimism in Face of a Context: Regret Guarantees for Stochastic Contextual MDP – Levy, Orin; Mansour, Yishay – Accept (Poster)
- Look where you look! Saliency-guided Q-networks for visual RL tasks – Bertoin, David; Zouitine, Adil; Zouitine, Mehdi; Rachelson, Emmanuel – Accept (Poster)
- Quantification of Transfer in Reinforcement Learning via Regret Bounds for Learning Agents – Tuynman, Adrienne; Ortner, Ronald – Accept (Poster)
- Regret Bounds for Satisficing in Multi-Armed Bandit Problems – Michel, Thomas; Hajiabolhassan, Hossein; Ortner, Ronald – Accept (Poster)
- Risk-aware linear bandits with convex loss – Saux, Patrick; Maillard, Odalric – Accept (Poster)
- Interactive Inverse Reinforcement Learning – Kleine Büning, Thomas; George, Anne-Marie; Dimitrakakis, Christos – Accept (Poster)
- Reinforcement Learning with a Terminator – Guy, Tennenholtz; Merlis, Nadav; Shani, Lior; Mannor, Shie; Shalit, Uri; Chechik, Gal; Hallak, Assaf; Dalal, Gal – Accept (Poster)
- On Convergence of Neural asynchronous Q-iteration – Smirnova, Elena – Accept (Poster)
- Curriculum Reinforcement Learning via Constrained Optimal Transport – Klink, Pascal; Yang, Haoyi; D’Eramo, Carlo; Peters, Jan; Pajarinen, Joni – Accept (Poster)
- Cross-Entropy Soft-Risk Reinforcement Learning – Greenberg, Ido; Chow, Yinlam; Ghavamzadeh, Mohammad; Mannor, Shie – Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Active Exploration for Inverse Reinforcement LearningLindner, David; Krause, Andreas; Ramponi, Giorgia Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Offline Credit Assignment in Deep Reinforcement Learning with Hindsight Discriminator NetworksFerret, Johan; Pietquin, Olivier; Geist, Matthieu Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Local Advantage Networks for Multi-Agent Reinforcement Learning in Dec-POMDPsAvalos, Raphael; Reymond, Mathieu; Nowé, Ann; Roijers, Diederik M Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Sample-Efficient Reinforcement Learning of Partially Observable Markov GamesLiu, Qinghua; Szepesvari, Csaba; Jin, Chi Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): General Policy Evaluation and Improvement by Learning to Identify Few But Crucial StatesFaccio, Francesco; Ramesh, Aditya; Herrmann, Vincent; Harb, Jean; Schmidhuber, Jürgen Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Goal-Conditioned Generators of Deep PoliciesFaccio, Francesco; Herrmann, Vincent; Ramesh, Aditya; Kirsch, Louis; Schmidhuber, Jürgen Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): A Learning Based Framework for Handling Uncertain Lead Times in Multi-Product Inventory ManagementMeisheri, Hardik; Nath, Somjit; Baranwal, Mayank; Khadilkar, Harshad Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Near-Optimal Regret for Adversarial MDP with Delayed Bandit FeedbackJin, Tiancheng; Lancewicki, Tal; Luo, Haipeng; Mansour, Yishay; Rosenberg, Aviv Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Rate-Optimal Online Convex Optimization in Adaptive Linear ControlCassel, Asaf B; Cohen, Alon; Koren, Tomer Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Cooperative Online Learning in Stochastic and Adversarial MDPsLancewicki, Tal; Rosenberg, Aviv; Mansour, Yishay Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Multi-Objective Coordination Graphs for the Expected Scalarised Returns with Generative Flow ModelsHayes, Conor F; Verstraeten, Timothy; Roijers, Diederik M; Howley, Enda; Mannion, Patrick Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Get Back Here: Robust Imitation by Return-to-Distribution PlanningCideron, Geoffrey; Pietquin, Olivier; Dadashi, Robert; Dulac-Arnold, Gabriel; Tabanpour, Baruch; Geist, Matthieu; Hussenot, Léonard; Curi, Sebastian; Girgin, Sertan Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Analysis of Stochastic Processes through Replay BuffersDi-Castro, Shirli; Mannor, Shie; Di Castro, Dotan Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Mixture of Interpretable Experts for Continuous ControlTateo, Davide; Akrour, Riad; Peters, Jan Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): On Reward Binarisation and Bayesian AgentsCatt, Elliot; Hutter, Marcus; Veness, Joel Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): $Q$-Learning for $L_p$ Robust Markov Decision ProcessesKumar, Navdeep; Wang, Kaixin; Levy, Kfir; Mannor, Shie Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): First Go, then Post-Explore: the Benefits of Post-Exploration in Intrinsic MotivationYang, Zhao; Moerland, Thomas M; Preuss, Mike; Plaat, Aske Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru*; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Curious Exploration via Structured World Models Yields Zero-Shot Object ManipulationSancaktar, Cansu *; Blaes, Sebastian; Martius, Georg Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Linear Convergence of Natural Policy Gradient Methods with Log-Linear PoliciesYUAN, Rui; Gower, Robert M; Lazaric, Alessandro; Du, Simon; Xiao, Lin Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru*; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): A Near-Optimal Best-of-Both-Worlds Algorithm for Online Learning with Feedback GraphsRouyer , Chloé *; van der Hoeven, Dirk; Cesa-Bianchi, Nicolò; Seldin, Yevgeny Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): RLDesigner: Toward Framing Spatial Layout Planning as a Markov Decision ProcessKakooee, Reza; Dillenburger, Benjamin Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Belief states of POMDPs and internal states of recurrent RL agents: an empirical analysis of their mutual informationLambrechts, Gaspard; Bolland, Adrien; Ernst, Damien Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual BanditsNeu, Gergely; Olkhovskaya, Julia; Papini, Matteo; Schwartz, Ludovic Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Formulation and validation of a complete car-following model based on deep reinforcement learningHart, Fabian Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPsTirinzoni, Andrea; Al Marjani, Aymen; Kaufmann, Emilie Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Scalable Representation Learning in Linear Contextual Bandits with Constant Regret GuaranteesTirinzoni, Andrea; Papini, Matteo; Touati, Ahmed; Lazaric, Alessandro; Pirotta, Matteo Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Learning Generative Models with Goal-conditioned Reinforcement LearningVargas Vieyra, Mariana; Menard, Pierre Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Adaptive Belief Discretization for POMDP PlanningGrover, Divya; Dimitrakakis, Christos Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Boosting reinforcement learning with sparse and rare rewards using Fleming-Viot particle systemsMastropietro, Daniel G; Majewski, Szymon; Ayesta, Urtzi; Jonckheere, Matthieu Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): A Sparse Linear Program for Global Planning in Large MDPsNeu, Gergely; Okolo, Nneka M Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): A Best-of-Both-Worlds Algorithm for Bandits with Delayed FeedbackMasoudian, Saeed; Zimmert, Julian; Seldin, Yevgeny Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Entropy Regularized Reinforcement Learning with Cascading NetworksShilova, Alena; Della Vecchia, Riccardo; Preux, Philippe; Akrour, Riad Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularizationPatil, Gandharv; L.A., Prashanth; Precup, Doina Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): On learning history-based policies for controlling Markov Decision ProcessesPatil, Gandharv; Mahajan, Aditya; Precup, Doina Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Minimax-Bayes Reinforcement LearningKleine Büning, Thomas; Dimitrakakis, Christos; Eriksson, Hannes; Grover, Divya; Jorge, Emilio Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): On Bayesian Value Function Distributions.Jorge, Emilio; Eriksson, Hannes; Dimitrakakis, Christos; Basu, Debabrota; Grover, Divya Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): TempRL: Temporal Priors for Exploration in Off-Policy Reinforcement LearningBagatella, Marco; Christen, Sammy; Hilliges, Otmar Accept (Poster)
- Direct Advantage EstimationPan, Hsiao-Ru; Gürtler, Nico; Neitz, Alexander; Schölkopf, Bernhard Accept (Oral): Optimistic Risk-Aware Model-based Reinforcement LearningAbachi, Romina; Farahmand, Amir-massoud Accept (Poster)
Photos from the Workshop
Code of Conduct
The official EWRL 2022 Code of Conduct can be found here.