Denis Tarasov

I obtained my BSc in Computer Science at Constructor (ex Jacobs) University Bremen, having initially studied at the Higher School of Economics. I'm currently pursuing my Master's degree at ETH Zurich, where I also work as a research assistant at the ETH NLPED lab.

Previously I did Machine Learning research projects at Meta AI, InstaDeep, Tinkoff AI, Yandex and JetBrains Research.

Email  /  CV  /  Google Scholar  /  Semantic Scholar  /  Twitter  /  LinkedIn  /  GitHub

profile photo
Research

I'm interested in Reinforcement Learning, Natural Language Processing and Bioinformatics.

Offline RL for generative design of protein binders
Denis Tarasov, Ulrich Mbou, Miguel Arbesú, Nima Siboni, Sebastien Boyer, Dries Smit, Oliver Bent, Arnu Pretorius
NeurIPS 2023 Workshop on New Frontiers of AI for Drug Discovery and Development, 2023 (Oral Presentation)

Offline Reinforcement Learning (RL) offers a compelling avenue for solving RL problems without the need for interactions with the environment, which can be costly or risky. While online RL methods have found success in various domains, such as de-novo drug generation, they struggle when it comes to optimizing essential properties like drug docking efficiency. The high computational cost associated with the docking process makes it impractical for online RL, which typically requires hundreds of thousands of interactions to learn. In this study, we propose the application of offline RL to address the bottleneck posed by the docking process, leveraging RL's capability to optimize non-differentiable properties. Our preliminary investigation focuses on using offline RL to generate drugs with improved docking and chemical characteristics.

Revisiting the Minimalist Approach to Offline Reinforcement Learning
Previous title: Revisiting Behavior Regularized Actor-Critic
Denis Tarasov, Vladislav Kurenkov, Alexander Nikulin, Sergey Kolesnikov
NeurIPS, 2023

Recent years have witnessed significant advancements in offline reinforcement learning (RL), resulting in the development of numerous algorithms with varying degrees of complexity. While these algorithms have led to noteworthy improvements, many incorporate seemingly minor design choices that impact their effectiveness beyond core algorithmic advances. However, the effect of these design choices on established baselines remains understudied. In this work, we aim to bridge this gap by conducting a retrospective analysis of recent works in offline RL and propose ReBRAC, a minimalistic algorithm that integrates such design elements built on top of the TD3+BC method. We evaluate ReBRAC on 51 datasets with both proprioceptive and visual state spaces using D4RL and V-D4RL benchmarks, demonstrating its state-of-the-art performance among ensemble-free methods. To further illustrate the efficacy of these design choices, we perform a large-scale ablation study and hyperparameter sensitivity analysis on the scale of thousands of experiments.
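
To make the starting point concrete, below is a minimal sketch of a TD3+BC-style actor update of the kind ReBRAC builds on; the network sizes and the behavior-cloning coefficient are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a TD3+BC-style actor update (the base that ReBRAC extends).
# Dimensions and the behavior-cloning coefficient are illustrative assumptions.
import torch
import torch.nn as nn

state_dim, action_dim = 17, 6  # assumed dimensions
actor = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                      nn.Linear(256, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
                       nn.Linear(256, 1))
opt = torch.optim.Adam(actor.parameters(), lr=3e-4)

def actor_loss(states, dataset_actions, bc_coef=1.0):
    """Maximize Q while staying close to dataset actions (BC regularization)."""
    pi = actor(states)
    q = critic(torch.cat([states, pi], dim=-1))
    lmbda = 1.0 / (q.abs().mean().detach() + 1e-6)  # TD3+BC-style Q normalization
    bc = ((pi - dataset_actions) ** 2).mean()
    return -(lmbda * q.mean()) + bc_coef * bc

states = torch.randn(256, state_dim)
actions = torch.rand(256, action_dim) * 2 - 1
opt.zero_grad()
actor_loss(states, actions).backward()
opt.step()
```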

CORL: Research-oriented Deep Offline Reinforcement Learning Library
Denis Tarasov, Alexander Nikulin, Dmitry Akimov, Vladislav Kurenkov, Sergey Kolesnikov
NeurIPS, 2023
code

CORL is an open-source library that provides single-file implementations of Deep Offline Reinforcement Learning algorithms. It emphasizes a simple development experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate method implementations into distinct single files, making performance-relevant details easier to recognise. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking them on the commonly employed D4RL benchmark.
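
For illustration, here is a generic sketch of the single-file-plus-cloud-tracking workflow described above; this is not CORL's actual interface, just an assumed minimal layout using the wandb logging library.

```python
# Generic sketch of a single-file training script with cloud experiment tracking.
# NOT CORL's actual interface; config values are illustrative assumptions.
from dataclasses import dataclass, asdict
import wandb

@dataclass
class TrainConfig:
    env: str = "halfcheetah-medium-v2"  # assumed D4RL dataset name
    batch_size: int = 256
    num_updates: int = 1_000_000

def train(config: TrainConfig):
    wandb.init(project="offline-rl", config=asdict(config))  # log hyperparameters
    for step in range(config.num_updates):
        loss = 0.0  # placeholder for the algorithm's update step
        if step % 1000 == 0:
            wandb.log({"loss": loss}, step=step)  # metrics synced to the cloud

if __name__ == "__main__":
    train(TrainConfig())
```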

Katakomba: Tools and Benchmarks for Data-Driven NetHack
Vladislav Kurenkov, Alexander Nikulin, Denis Tarasov, Sergey Kolesnikov
NeurIPS, 2023

NetHack is known as the frontier of reinforcement learning research where learning-based methods still need to catch up to rule-based solutions. One of the promising directions for a breakthrough is using pre-collected datasets similar to recent developments in robotics, recommender systems, and more under the umbrella of offline reinforcement learning (ORL). Recently, a large-scale NetHack dataset was released; while it was a necessary step forward, it has yet to gain wide adoption in the ORL community. In this work, we argue that there are three major obstacles for adoption: tool-wise, implementation-wise, and benchmark-wise. To address them, we develop an open-source library that provides workflow fundamentals familiar to the ORL community: pre-defined D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation tools with accompanying configs and logs synced to the cloud.

Anti-Exploration by Random Network Distillation
Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Sergey Kolesnikov
ICML, 2023

Despite the success of Random Network Distillation (RND) in various domains, it was shown as not discriminative enough to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.
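
A minimal sketch of the idea, assuming illustrative network sizes: an RND predictor and a frozen random target share a FiLM-conditioned architecture, and the prediction error serves as an anti-exploration penalty for out-of-distribution actions.

```python
# Sketch of an anti-exploration bonus from Random Network Distillation with
# FiLM-style action conditioning; architecture sizes are illustrative assumptions.
import torch
import torch.nn as nn

class FiLMRND(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=256, out=128):
        super().__init__()
        self.state_net = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.film = nn.Linear(action_dim, 2 * hidden)  # produces (gamma, beta)
        self.head = nn.Linear(hidden, out)

    def forward(self, s, a):
        h = self.state_net(s)
        gamma, beta = self.film(a).chunk(2, dim=-1)
        return self.head(gamma * h + beta)  # feature-wise linear modulation

state_dim, action_dim = 17, 6
predictor = FiLMRND(state_dim, action_dim)
target = FiLMRND(state_dim, action_dim)
for p in target.parameters():  # the target network stays frozen and random
    p.requires_grad_(False)

def anti_exploration_bonus(s, a):
    """Large for actions the predictor was never fit on (out-of-distribution)."""
    return ((predictor(s, a) - target(s, a)) ** 2).mean(dim=-1)
```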

Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Dmitry Akimov, Sergey Kolesnikov
NeurIPS 3rd Offline RL Workshop: Offline RL as a "Launchpad", 2022

Training large neural networks is known to be time-consuming, with the learning duration taking days or even weeks. To address this problem, large-batch optimization was introduced. This approach demonstrated that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude. While long training time was not typically a major issue for model-free deep offline RL algorithms, recently introduced Q-ensemble methods achieving state-of-the-art performance made this issue more relevant, notably extending the training duration. In this work, we demonstrate how this class of methods can benefit from large-batch optimization, which is commonly overlooked by the deep offline RL community. We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time, effectively shortening training duration by 2.5x on average.
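
The underlying recipe can be sketched in a few lines; the base batch size and learning rate below are assumptions for illustration, and the naive linear scaling rule stands in for the adjustments studied in the paper.

```python
# Sketch of the large-batch recipe: scale the mini-batch size and adjust the
# learning rate accordingly. Base values are illustrative assumptions.
base_batch_size = 256
base_lr = 3e-4

def scaled_lr(batch_size: int, base_bs: int = base_batch_size,
              lr: float = base_lr) -> float:
    """Naive linear scaling rule: a k-times larger batch gets a k-times larger lr."""
    return lr * batch_size / base_bs

print(scaled_lr(2048))  # 8x larger batch -> learning rate 2.4e-3
```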

Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows
Dmitry Akimov, Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Sergey Kolesnikov
NeurIPS 3rd Offline RL Workshop: Offline RL as a "Launchpad", 2022

Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions. There are two major challenges in this setting: (1) extrapolation error caused by approximating the value of state-action pairs not well-covered by the training data and (2) distributional shift between behavior and inference policies. One way to tackle these problems is to induce conservatism, i.e., to keep the learned policies closer to the behavioral ones. To achieve this, we build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model, which we use as a conservative action encoder. This Normalizing Flows action encoder is pre-trained in a supervised manner on the offline dataset, and then an additional policy model (a controller in the latent space) is trained via reinforcement learning. This approach avoids querying actions outside of the training dataset and therefore does not require additional regularization for out-of-dataset actions. We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms with generative action models on a large portion of datasets.
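
A rough sketch of the training setup, with a plain MLP standing in for the pre-trained Normalizing Flow decoder: the RL policy acts in the latent space, and the frozen generative model maps latent codes back to dataset-like actions.

```python
# Sketch of a latent-space controller on top of a frozen, pre-trained action
# decoder; a plain MLP stands in for the Normalizing Flow, and all sizes are
# illustrative assumptions.
import torch
import torch.nn as nn

state_dim, action_dim, latent_dim = 17, 6, 6
policy = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                       nn.Linear(256, latent_dim), nn.Tanh())  # outputs latent z
decoder = nn.Sequential(nn.Linear(latent_dim + state_dim, 256), nn.ReLU(),
                        nn.Linear(256, action_dim))  # pre-trained, kept frozen
for p in decoder.parameters():
    p.requires_grad_(False)

def act(state):
    """The policy picks a latent code; the frozen generative model maps it to
    an action close to the behavior data it was trained on."""
    z = policy(state)
    return decoder(torch.cat([z, state], dim=-1))
```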

Prompts and Pre-Trained Language Models for Offline Reinforcement Learning
Denis Tarasov, Vladislav Kurenkov, Sergey Kolesnikov
ICLR Workshop GPL, ACL Workshop LNLS, 2022
poster

In this preliminary study, we introduce a simple way to leverage pre-trained language models in deep offline RL settings that are not naturally suited for textual representation. We propose transforming the state into human-readable text and minimally fine-tuning the pre-trained language model when training with deep offline RL algorithms. This approach shows consistent performance gains on the NeoRL MuJoCo datasets. Our experiments suggest that LM fine-tuning is crucial for good performance on robotics tasks; however, we also show that it is not necessary for retaining a significant improvement in final performance on finance environments.
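
A minimal sketch of the state-to-text idea; the prompt template and the choice of GPT-2 are illustrative assumptions rather than the paper's exact setup.

```python
# Sketch: format a numeric observation as readable text and embed it with a
# pre-trained LM. The prompt template and GPT-2 are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModel.from_pretrained("gpt2")

def state_to_prompt(state):
    """Turn a vector observation into a human-readable prompt."""
    parts = [f"feature {i} is {x:.2f}" for i, x in enumerate(state.tolist())]
    return "Current state: " + ", ".join(parts) + "."

state = torch.randn(11)
inputs = tokenizer(state_to_prompt(state), return_tensors="pt")
with torch.no_grad():
    embedding = lm(**inputs).last_hidden_state.mean(dim=1)  # pooled state features
```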

Fixing 1-bit Adam and 1-bit LAMB algorithms
Denis Tarasov, Vasily Ershov
SEIM, 2022 (Oral Presentation)

Today, many neural network models are trained using distributed learning in order to reduce training time. Slow network communication between devices can significantly reduce the efficiency of such distribution. Recent studies propose one-bit versions of the Adam and LAMB algorithms, which substantially reduce the amount of transmitted information and thereby improve training scalability. However, it turned out that these algorithms diverge on some neural network architectures. The goal of this work is an empirical study of these algorithms, a solution to the discovered divergence problem, and a proposal of new aspects for testing gradient descent algorithms.
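
For intuition, here is a small sketch of 1-bit compression with error feedback, the mechanism underlying 1-bit Adam and LAMB: only signs and a scale are transmitted, while the quantization error is kept locally and added to the next update.

```python
# Sketch of 1-bit compression with error feedback. Purely illustrative and
# framework-agnostic; real 1-bit Adam/LAMB also handle warmup and all-reduce.
import torch

error = None  # local error-feedback buffer

def compress_1bit(update: torch.Tensor):
    """Return (signs, scale) to transmit and keep the quantization residual locally."""
    global error
    if error is None:
        error = torch.zeros_like(update)
    corrected = update + error          # add back what was lost last step
    scale = corrected.abs().mean()
    signs = corrected.sign()
    error = corrected - scale * signs   # residual carried to the next step
    return signs, scale

signs, scale = compress_1bit(torch.randn(1000))
decompressed = scale * signs            # what the receiving workers apply
```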

Predicting ethnicity with data on personal names in Russia
Alexey Bessudnov, Denis Tarasov, Viacheslav Panasovets, Veronica Kostenko, Ivan Smirnov, Vladimir Uspenskiy
Journal of Computational Social Science, 2023
code

In this paper we develop a machine learning classifier that predicts perceived ethnicity from data on personal names for the major ethnic groups populating Russia. We collect data from VK, the largest Russian social media website. Ethnicity has been determined from the languages spoken by users and their geographical location, with the data manually cleaned by crowd workers. The classifier shows an accuracy of 0.82 for a scheme with 24 ethnic groups and 0.92 for 15 aggregated ethnic groups. It can be used for research on ethnicity and ethnic relations in Russia, in particular with VK and other social media data.
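
As a toy illustration of such a name-based classifier, the sketch below trains a character n-gram model with scikit-learn; the names and labels are made up and not taken from the paper's VK data.

```python
# Toy sketch of a character n-gram name classifier; names and labels are
# hypothetical examples, not data or results from the paper.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

names = ["Ivan Petrov", "Aisha Magomedova", "Sergey Smirnov", "Timur Khasanov"]
labels = ["group_a", "group_b", "group_a", "group_c"]  # hypothetical group labels

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),  # character n-grams
    LogisticRegression(max_iter=1000),
)
clf.fit(names, labels)
print(clf.predict(["Olga Ivanova"]))
```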


Based on http://jonbarron.info