Denis Tarasov

I obtained my BSc in Computer Science at Constructor (ex Jacobs) University Bremen, after initially studying at the Higher School of Economics. I am currently pursuing my Master's degree at ETH Zurich.

Previously, I worked on Machine Learning research projects at Meta AI, ETH Zurich, EPFL, InstaDeep, Tinkoff AI, Yandex, and JetBrains Research.

Email  /  CV  /  Google Scholar  /  Semantic Scholar  /  Twitter  /  LinkedIn  /  GitHub

profile photo
Research

I'm interested in Reinforcement Learning, Natural Language Processing and Bioinformatics.

The Role of Deep Learning Regularizations on Actors in Offline RL
Denis Tarasov, Anja Surina, Caglar Gulcehre
Preprint, 2024

Deep learning regularization techniques, such as dropout, layer normalization, or weight decay, are widely adopted in the construction of modern artificial neural networks, often resulting in more robust training processes and improved generalization capabilities. However, in the domain of Reinforcement Learning (RL), the application of these techniques has been limited, usually applied to value function estimators, and may result in detrimental effects. This issue is even more pronounced in offline RL settings, which bear greater similarity to supervised learning but have received less attention. Recent work in continuous offline RL has demonstrated that while we can build sufficiently powerful critic networks, the generalization of actor networks remains a bottleneck. In this study, we empirically show that applying standard regularization techniques to actor networks in offline RL actor-critic algorithms yields improvements of 6% on average across two algorithms and three different continuous D4RL domains.
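As a rough illustration (the layer sizes, dropout rate, and weight-decay coefficient below are placeholders, not the exact configuration from the paper), the studied regularizers can be attached to an actor network roughly as follows:

```python
import torch
import torch.nn as nn

class RegularizedActor(nn.Module):
    """Deterministic actor MLP with LayerNorm and Dropout (illustrative sizes)."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256, p_drop: float = 0.1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden, hidden),
            nn.LayerNorm(hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(hidden, action_dim),
            nn.Tanh(),  # actions scaled to [-1, 1]
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

actor = RegularizedActor(state_dim=17, action_dim=6)
# Weight decay is the third regularizer mentioned above; here applied via AdamW.
optimizer = torch.optim.AdamW(actor.parameters(), lr=3e-4, weight_decay=1e-4)
```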

Is Value Functions Estimation with Classification Plug-and-play for Offline Reinforcement Learning?
Denis Tarasov, Kirill Brilliantov, Dmitrii Kharlapenko
ICML AutoRL Workshop, 2024

In deep Reinforcement Learning (RL), value functions are typically approximated using deep neural networks and trained via mean squared error regression objectives to fit the true value functions. Recent research has proposed an alternative approach, utilizing the cross-entropy classification objective, which has demonstrated improved performance and scalability of RL algorithms. However, existing studies have not extensively benchmarked the effects of this replacement across various domains, as their primary objective was to demonstrate the efficacy of the concept across a broad spectrum of tasks, without delving into in-depth analysis. Our work seeks to empirically investigate the impact of such a replacement in an offline RL setup and analyze the effects of different aspects on performance. Through large-scale experiments conducted across a diverse range of tasks using different algorithms, we aim to gain deeper insights into the implications of this approach. Our results reveal that incorporating this change can lead to superior performance over state-of-the-art solutions for some algorithms in certain tasks, while maintaining comparable performance levels in other tasks; for other algorithms, however, this modification can lead to a dramatic performance drop. These findings are crucial for the further application of the classification approach in research and practical tasks.
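A minimal sketch of the classification-based value objective, using a simple two-hot encoding of scalar targets over fixed bins (the bin range and encoding details are illustrative assumptions, not necessarily the exact scheme evaluated in the paper):

```python
import torch
import torch.nn.functional as F

def two_hot(targets: torch.Tensor, v_min: float, v_max: float, num_bins: int) -> torch.Tensor:
    """Encode scalar value targets as a distribution over fixed bins (two-hot)."""
    bins = torch.linspace(v_min, v_max, num_bins, device=targets.device)
    targets = targets.clamp(v_min, v_max)
    # Index of the bin to the left of each target.
    idx = torch.clamp(torch.bucketize(targets, bins) - 1, 0, num_bins - 2)
    left, right = bins[idx], bins[idx + 1]
    w_right = (targets - left) / (right - left)
    probs = torch.zeros(targets.shape[0], num_bins, device=targets.device)
    probs.scatter_(1, idx.unsqueeze(1), (1.0 - w_right).unsqueeze(1))
    probs.scatter_add_(1, (idx + 1).unsqueeze(1), w_right.unsqueeze(1))
    return probs

def classification_value_loss(logits, td_target, v_min=-100.0, v_max=100.0):
    """Cross-entropy between the critic's bin logits and the two-hot encoded TD target."""
    target_probs = two_hot(td_target, v_min, v_max, logits.shape[-1])
    return F.cross_entropy(logits, target_probs)
```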

Distilling LLMs’ Decomposition Abilities into Compact Language Models
Denis Tarasov, Kumar Shridhar
ICML Workshops AI4MATH and AutoRL, 2024

Large Language Models (LLMs) have demonstrated proficiency in their reasoning abilities, yet their large size presents scalability challenges and limits any further customization. In contrast, compact models offer customized training but often fall short in solving complex reasoning tasks. This study focuses on distilling the LLMs' decomposition skills into compact models using offline reinforcement learning. We leverage the advancements in LLMs' capabilities to provide feedback and generate a specialized task-specific dataset for training compact models. The development of an AI-generated dataset and the establishment of baselines constitute the primary contributions of our work, underscoring the potential of compact models in replicating complex problem-solving skills.

Offline RL for generative design of protein binders
Denis Tarasov, Ulrich Mbou, Miguel Arbesú, Nima Siboni, Sebastien Boyer, Dries Smit, Oliver Bent, Arnu Pretorius
NeurIPS 2023 Workshop on New Frontiers of AI for Drug Discovery and Development, 2023 (Oral Presentation)

Offline Reinforcement Learning (RL) offers a compelling avenue for solving RL problems without the need for interactions with the environment, which can be costly or risky. While online RL methods have found success in various domains, such as de-novo drug generation, they struggle when it comes to optimizing essential properties like drug docking efficiency. The high computational cost associated with the docking process makes it impractical for online RL, which typically requires hundreds of thousands of interactions to learn. In this study, we propose the application of offline RL to address the bottleneck posed by the docking process, leveraging RL's capability to optimize non-differentiable properties. Our preliminary investigation focuses on using offline RL to generate drugs with improved docking and chemical characteristics.

Revisiting the Minimalist Approach to Offline Reinforcement Learning
Previously titled: Revisiting Behavior Regularized Actor-Critic
Denis Tarasov, Vladislav Kurenkov, Alexander Nikulin, Sergey Kolesnikov
NeurIPS, 2023

Recent years have witnessed significant advancements in offline reinforcement learning (RL), resulting in the development of numerous algorithms with varying degrees of complexity. While these algorithms have led to noteworthy improvements, many incorporate seemingly minor design choices that impact their effectiveness beyond core algorithmic advances. However, the effect of these design choices on established baselines remains understudied. In this work, we aim to bridge this gap by conducting a retrospective analysis of recent works in offline RL and propose ReBRAC, a minimalistic algorithm that integrates such design elements built on top of the TD3+BC method. We evaluate ReBRAC on 51 datasets with both proprioceptive and visual state spaces using D4RL and V-D4RL benchmarks, demonstrating its state-of-the-art performance among ensemble-free methods. To further illustrate the efficacy of these design choices, we perform a large-scale ablation study and hyperparameter sensitivity analysis on the scale of thousands of experiments.
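For context, here is a simplified sketch of the behavior-regularized actor objective that TD3+BC-style methods build on; the normalization and the penalty weight are illustrative, and ReBRAC's actual design choices are described in the paper:

```python
import torch
import torch.nn.functional as F

def bc_regularized_actor_loss(actor, critic, states, dataset_actions, beta: float = 0.05):
    """Actor loss = -Q(s, pi(s)) + beta * MSE(pi(s), dataset action).

    A simplified TD3+BC-style objective; ReBRAC layers additional design
    choices on top of this basic idea (see the paper for details).
    """
    policy_actions = actor(states)
    q_values = critic(states, policy_actions)
    # Normalize the Q term so the penalty weight has a comparable scale across tasks.
    lmbda = 1.0 / (q_values.abs().mean().detach() + 1e-8)
    bc_penalty = F.mse_loss(policy_actions, dataset_actions)
    return -(lmbda * q_values).mean() + beta * bc_penalty
```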

CORL: Research-oriented Deep Offline Reinforcement Learning Library
Denis Tarasov, Alexander Nikulin, Dmitry Akimov, Vladislav Kurenkov, Sergey Kolesnikov
NeurIPS, 2023
code

CORL is an open-source library that provides single-file implementations of Deep Offline Reinforcement Learning algorithms. It emphasizes a simple development experience with a straightforward codebase and a modern analysis tracking tool. In CORL, we isolate method implementations into distinct single files, making performance-relevant details easier to recognise. Additionally, an experiment tracking feature is available to help log metrics, hyperparameters, dependencies, and more to the cloud. Finally, we have ensured the reliability of the implementations by benchmarking them on the commonly employed D4RL benchmark.

Katakomba: Tools and Benchmarks for Data-Driven NetHack
Vladislav Kurenkov, Alexander Nikulin, Denis Tarasov, Sergey Kolesnikov
NeurIPS, 2023

NetHack is known as the frontier of reinforcement learning research where learning-based methods still need to catch up to rule-based solutions. One of the promising directions for a breakthrough is using pre-collected datasets similar to recent developments in robotics, recommender systems, and more under the umbrella of offline reinforcement learning (ORL). Recently, a large-scale NetHack dataset was released; while it was a necessary step forward, it has yet to gain wide adoption in the ORL community. In this work, we argue that there are three major obstacles for adoption: tool-wise, implementation-wise, and benchmark-wise. To address them, we develop an open-source library that provides workflow fundamentals familiar to the ORL community: pre-defined D4RL-style tasks, uncluttered baseline implementations, and reliable evaluation tools with accompanying configs and logs synced to the cloud.

Anti-Exploration by Random Network Distillation
Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Sergey Kolesnikov
ICML, 2023

Despite the success of Random Network Distillation (RND) in various domains, it has been shown to be insufficiently discriminative to be used as an uncertainty estimator for penalizing out-of-distribution actions in offline reinforcement learning. In this paper, we revisit these results and show that, with a naive choice of conditioning for the RND prior, it becomes infeasible for the actor to effectively minimize the anti-exploration bonus and discriminativity is not an issue. We show that this limitation can be avoided with conditioning based on Feature-wise Linear Modulation (FiLM), resulting in a simple and efficient ensemble-free algorithm based on Soft Actor-Critic. We evaluate it on the D4RL benchmark, showing that it is capable of achieving performance comparable to ensemble-based methods and outperforming ensemble-free approaches by a wide margin.
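A minimal sketch of what FiLM-style conditioning looks like in this context: the action is embedded and its features are scaled and shifted by parameters predicted from the state (network sizes and the exact placement of the modulation are assumptions for illustration):

```python
import torch
import torch.nn as nn

class FiLMConditionedNetwork(nn.Module):
    """Embeds the action, then modulates its features with a state-conditioned scale/shift."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256, out_dim: int = 32):
        super().__init__()
        self.action_embed = nn.Sequential(nn.Linear(action_dim, hidden), nn.ReLU())
        # One linear layer predicts both gamma (scale) and beta (shift) from the state.
        self.film = nn.Linear(state_dim, 2 * hidden)
        self.head = nn.Sequential(nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        h = self.action_embed(action)
        gamma, beta = self.film(state).chunk(2, dim=-1)
        return self.head(gamma * h + beta)

# In RND, a fixed random "prior" network of this form is distilled by a trained predictor;
# the prediction error then serves as the anti-exploration penalty.
```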

Q-Ensemble for Offline RL: Don't Scale the Ensemble, Scale the Batch Size
Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Dmitry Akimov, Sergey Kolesnikov
NeurIPS 3rd Offline RL Workshop: Offline RL as a "Launchpad", 2022

Training large neural networks is known to be time-consuming, with the learning duration taking days or even weeks. To address this problem, large-batch optimization was introduced. This approach demonstrated that scaling mini-batch sizes with appropriate learning rate adjustments can speed up the training process by orders of magnitude. While long training times were not typically a major issue for model-free deep offline RL algorithms, recently introduced Q-ensemble methods achieving state-of-the-art performance made this issue more relevant by notably extending the training duration. In this work, we demonstrate how this class of methods can benefit from large-batch optimization, which is commonly overlooked by the deep offline RL community. We show that scaling the mini-batch size and naively adjusting the learning rate allows for (1) a reduced size of the Q-ensemble, (2) stronger penalization of out-of-distribution actions, and (3) improved convergence time, effectively shortening training duration by 2.5x on average.
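As a toy illustration of the naive learning-rate adjustment under mini-batch scaling (the linear scaling heuristic; the base values below are placeholders, not the settings used in the paper):

```python
# Naive learning-rate adjustment when scaling the mini-batch size
# (linear scaling heuristic; base values are placeholders).
base_batch_size = 256
base_lr = 3e-4

scale = 16                       # e.g. train with 16x larger mini-batches
batch_size = base_batch_size * scale
lr = base_lr * scale             # naive linear adjustment of the learning rate

print(f"batch_size={batch_size}, lr={lr:.1e}")
```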

Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows
Dmitry Akimov, Alexander Nikulin, Vladislav Kurenkov, Denis Tarasov, Sergey Kolesnikov
NeurIPS 3rd Offline RL Workshop: Offline RL as a "Launchpad", 2022

Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions. There are two major challenges in this setting: (1) extrapolation error caused by approximating the value of state-action pairs not well-covered by the training data and (2) distributional shift between behavior and inference policies. One way to tackle these problems is to induce conservatism - i.e., keeping the learned policies closer to the behavioral ones. To achieve this, we build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model, which we use as a conservative action encoder. This Normalizing Flows action encoder is pre-trained in a supervised manner on the offline dataset, and then an additional policy model - controller in the latent space - is trained via reinforcement learning. This approach avoids querying actions outside of the training dataset and therefore does not require additional regularization for out-of-dataset actions. We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms with generative action models on a large portion of datasets.

Prompts and Pre-Trained Language Models for Offline Reinforcement Learning
Denis Tarasov, Vladislav Kurenkov, Sergey Kolesnikov
ICLR Workshop GPL, ACL Workshop LNLS, 2022
poster

In this preliminary study, we introduce a simple way to leverage pre-trained language models in deep offline RL settings that are not naturally suited for textual representation. We propose transforming the state into human-readable text and minimally fine-tuning the pre-trained language model when training with deep offline RL algorithms. This approach shows consistent performance gains on the NeoRL MuJoCo datasets. Our experiments suggest that LM fine-tuning is crucial for good performance on robotics tasks, but is not necessary in finance environments to retain a significant improvement in final performance.
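A toy example of the state-to-text transformation, rendering a numeric observation as a human-readable prompt for the language model (the template and feature names are made up for illustration):

```python
def state_to_text(state, feature_names):
    """Render a numeric state vector as a human-readable prompt (toy template)."""
    parts = [f"{name} is {value:.2f}" for name, value in zip(feature_names, state)]
    return "Current observation: " + ", ".join(parts) + "."

# Hypothetical state with made-up feature names.
print(state_to_text([0.12, -1.30, 0.98], ["position", "velocity", "angle"]))
# -> "Current observation: position is 0.12, velocity is -1.30, angle is 0.98."
```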

Fixing 1-bit Adam and 1-bit LAMB algorithms
Denis Tarasov, Vasily Ershov
SEIM, 2022 (Oral Presentation)

Today, various neural network models are trained using distributed learning in order to reduce training time. Slow network communication between devices can significantly reduce the efficiency of such distribution. Recent studies propose one-bit versions of the Adam and LAMB algorithms, which significantly reduce the amount of transmitted information and thereby improve the scalability of training. However, it turned out that these algorithms diverge on some neural network architectures. The goals of this work are an empirical study of these algorithms, a solution to the discovered divergence problem, and a proposal of new aspects for testing gradient descent algorithms.

Predicting ethnicity with data on personal names in Russia
Alexey Bessudnov, Denis Tarasov, Viacheslav Panasovets, Veronica Kostenko, Ivan Smirnov, Vladimir Uspenskiy
Journal of Computational Social Science, 2023
code

In this paper we develop a machine learning classifier that predicts perceived ethnicity from data on personal names for major ethnic groups populating Russia. We collect data from VK, the largest Russian social media website. Ethnicity has been determined from the languages spoken by users and their geographical location, with the data manually cleaned by crowd workers. The classifier shows an accuracy of 0.82 for a scheme with 24 ethnic groups and 0.92 for 15 aggregated ethnic groups. It can be used for research on ethnicity and ethnic relations in Russia, in particular with VK and other social media data.


Based on http://jonbarron.info