The goal of the Intelligent Systems Department is to pave our students' road to a high-quality professional life. The Ph.D. degree requires three publications in peer-reviewed journals, and these publications form the core of the student's thesis. This year, each of our bachelor students delivered at least one publication, paving the road to their Ph.D. To support this, the Department provides state-of-the-art research topics, scientific advisors of proven excellence, and fine-tuned educational courses. Below, we are proud to recognize our students for their outstanding achievements.
This year our department continued its active development. We graduated seven master's students, three of whom continued their careers in Ph.D. studies. All 15 undergraduate students of our department continued their studies at the master's level, and 13 of them stayed in our department's master's program. For several years now, students and graduates of our department have been defending candidate's dissertations. We congratulate Vasiliy Novitsky on the defense of his Ph.D. thesis, "New Bounds for One-point Stochastic Gradient-free Methods"!
We adhere to complete openness of scientific research; therefore, all defenses are presented on our YouTube channel and on our department site.
The publication activity of our department is worth mentioning separately. Over the past year, the number of publications in our department has almost doubled!
Applied methods in machine learning
Applied research in machine learning is popular due to its direct contribution to everyday life, and research in our department is no exception.
A series of publications on machine-generated text detection opens with solutions to several competitions. The paper authored by our student Anastasia Voznyuk describes a solution for the Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection shared task at SemEval-2024, which aims to tackle the problem of misusing collaborative human-AI writing. The paper considers the boundary detection problem: in particular, the authors present a pipeline for augmenting data for supervised fine-tuning of DeBERTaV3 and achieve a new best MAE score on the competition leaderboard with this pipeline. The next paper, authored by a group of researchers including our student Ksenia Petrushina, presents novel systems developed for the SemEval-2024 hallucination detection task. Their investigation spans a range of strategies to compare model predictions with reference standards, encompassing diverse baselines, the refinement of pre-trained encoders through supervised learning, and an ensemble approach utilizing several high-performing models. Through these explorations, they introduce three distinct methods that exhibit strong performance. Notably, their premier method achieved 9th place in the competition's model-agnostic track and 20th place in the model-aware track, highlighting its effectiveness and potential.
The paper authored by German Gritsai and Ildar Khabutdinov describes a system designed to distinguish between AI-generated and human-written scientific excerpts in the DAGPap24 competition, hosted within the Fourth Workshop on Scientific Document Processing. The authors focused on a multi-task learning architecture with two heads, an approach justified by the specificity of the task, where class spans are continuous over several hundred characters. They considered different encoder variations for obtaining a state vector for each token in the sequence, as well as variations in splitting fragments into tokens to be fed into a transformer-based encoder. This approach allowed them to achieve a 9% quality improvement relative to the baseline score on the development set (from 0.86 to 0.95 average macro F1), as well as a score of 0.96 on the closed test part of the competition dataset.

The next paper, authored by German Gritsai, describes a solution for the Automated Text Identification on Languages of the Iberian Peninsula competition held as part of the IberLEF 2024 conference. The article presents a model for detecting machine-generated fragments based on aggregating responses from the large language model BLOOM and two BERT-like encoders, Multilingual E5 and XLM-RoBERTa. Given the specificity of the task, namely the presence of different languages of the Iberian Peninsula, the authors fine-tuned distinct models for different subgroups of languages. The method described in the paper helped the team achieve about 67% on the binary classification dataset covering 6 languages in the final competition results.

The next competition concerns binary classification of machine-generated versus human-written text. The research provides an approach based on aggregating QLoRA adapters trained for multiple distributions of generative model families. The LAVA method, proposed by German Gritsai and Galina Boeva, demonstrates results comparable with the primary baseline provided by the PAN organizers. The method provides an efficient and fast detector with high target-metric performance, owing to the possibility of training adapters for the language models in parallel. This makes the detection process straightforward and flexible: an adapter can be tailored to newly appearing distributions and added to the existing approach.

The next conference paper, authored by German Gritsai, Anastasia Voznyuk, and Ildar Khabutdinov, describes a system for recognizing machine-generated and human-written texts in the monolingual subtask of the GenAI Detection Task 1 competition. The developed system is a multi-task architecture with a Transformer encoder shared between several classification heads. One head is responsible for binary classification between human-written and machine-generated texts, while the other heads are auxiliary multiclass classifiers for texts of different domains from particular datasets. As the multiclass heads were trained to distinguish the domains presented in the data, they provided a better understanding of the samples. The approach achieved first place in the official ranking, with an 83.07% macro F1 score on the test set, surpassing the baseline by 10%. The authors further study the obtained system through ablation, error, and representation analyses, finding that multi-task learning outperforms single-task mode and that simultaneous tasks form a cluster structure in the embedding space.
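To make the shared-encoder idea concrete, here is a minimal PyTorch sketch of such a multi-task model. It is an illustration under assumptions, not the authors' code: the encoder, dimensions, and the loss weight alpha are hypothetical placeholders.

```python
import torch
import torch.nn as nn

class MultiTaskDetector(nn.Module):
    """Shared encoder with a binary head (human vs. machine) and an
    auxiliary multiclass domain head, in the spirit of the system above."""
    def __init__(self, encoder, hidden_dim, num_domains):
        super().__init__()
        self.encoder = encoder                        # any module: ids -> (batch, hidden_dim)
        self.binary_head = nn.Linear(hidden_dim, 2)
        self.domain_head = nn.Linear(hidden_dim, num_domains)

    def forward(self, input_ids):
        h = self.encoder(input_ids)                   # pooled sequence representation
        return self.binary_head(h), self.domain_head(h)

def joint_loss(binary_logits, domain_logits, y_binary, y_domain, alpha=0.5):
    """Sum of the main and auxiliary losses; the domain task regularizes
    the shared encoder and sharpens its representation of the samples."""
    ce = nn.functional.cross_entropy
    return ce(binary_logits, y_binary) + alpha * ce(domain_logits, y_domain)
```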
These competitions led to research into machine generation in general. The paper authored by German Gritsai presents a historical overview of the development of text generation algorithms. The material is presented in a popular form, so that the principles of generative services can be understood with general erudition and basic computer skills. The paper authored by German Gritsai and Anastasia Voznyuk presents a systematic review of datasets from competitions dedicated to AI-generated content detection and proposes methods for evaluating the quality of datasets containing AI-generated fragments. In addition, the authors discuss the possibility of using high-quality generated data to achieve two goals: improving the training of detection models and improving the training datasets themselves. Their contribution aims to facilitate a better understanding of the dynamics between human and machine text, which will ultimately support the integrity of information in an increasingly automated world.
One of the popular fields of research in the department is topic modelling, under the guidance of Professor Konstantin Vorontsov. In their paper, Vasiliy Alekseev and his co-authors investigate the problem of estimating the number of topics in topic models, comparing various methods from the literature. They find that intrinsic methods for topic number estimation are neither reliable nor accurate. Additionally, the study shows that the number of topics depends on the chosen method and model rather than being an inherent property of the corpus. In another paper, Vasiliy Alekseev and his co-authors propose an iterative training process for topic models, where each model is connected to the previous one through additive regularization, resulting in improved performance. This approach outperforms popular topic models such as LDA, ARTM, and BERTopic in both topic quality and diversity. Maria Nikitina gave a talk on automatic term extraction for scientific papers. She proposed combining the collocation model TopMine with the topic modelling library BigARTM and demonstrated that the resulting model works efficiently on corpora of scientific texts.
A related problem is considered in the paper authored by our alumni Alexey Grishanov and Aleksei Goncharov. It addresses the problem of unsupervised topic segmentation in dialogues, aiming to identify points in a dialogue stream where topics change. The authors propose a novel approach that leverages dialogue summarization, combined with smoothing techniques, and demonstrate its robustness on noisy text streams. The method outperforms many baseline algorithms, which often heavily depend on the quality of the input text.
Another area of applied interest is the study of large language models. The paper considers the multiple-choice setting, where the model is presented with a question and the option with the highest logit is selected as the model's predicted answer. A team including our student Anastasia Voznyuk introduced new scores that better capture and reveal the model's underlying knowledge: the Query-Key Score (QK-score), derived from the interaction between query and key representations in attention heads, and the Attention Score, based on attention weights. These scores are extracted from specific heads, which show consistent performance across popular Multi-Choice Question Answering (MCQA) datasets.
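As a rough illustration of how such head-level scores can work, here is a hypothetical sketch: given the query and key projections of a single chosen attention head, each answer option is scored by the interaction between its token's query vector and the key vector of an anchor token. The function name, tensor layout, and anchor choice are all assumptions, not the paper's exact procedure.

```python
import torch

def qk_score_predict(q, k, option_positions, anchor_position):
    """Score each answer option by the query-key interaction in one head.
    q, k: (seq_len, head_dim) query/key projections from a chosen head;
    option_positions: token index of each option; anchor_position: e.g.,
    the final prompt token. Returns the index of the predicted option."""
    scores = torch.stack([q[pos] @ k[anchor_position] for pos in option_positions])
    return int(scores.argmax())
```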
In the paper, our Ph.D. student Konstantin Yakovlev and his colleagues propose Toolken+, a modification of the ToolkenGPT method for integrating external tools, such as database retrieval or symbolic computation, into large language models. The enhanced version introduces a reranking mechanism, enabling more confident tool selection. Additionally, Toolken+ incorporates tool documentation to provide users with comprehensive guidance on tool usage and examples.
Grammatical error correction is one of the core natural language processing tasks. In their paper, Ildar Khabutdinov and co-authors propose an adaptation of the GECToR architecture to the Russian language, calling the resulting models RuGECToR. The presented model achieves an F-score of 82.5 on synthetic data and 22.2 on the RULEC dataset, which was not used at the training stage.
The achievements of our students in natural language processing go far beyond publications. A team featuring our students Sergey Firsov and Vadim Kasiuk took first place at a hackathon hosted by MIPT in collaboration with VK. The challenge focused on text ranking and presented a compelling problem: given a dataset with numerous queries, each paired with several responses labeled as correct or incorrect, the goal was to develop an effective ranking function. The competition encouraged a creative and flexible approach to the formal task formulation.
Many interesting studies lie at the intersection of different sciences, such as mathematics, psychology, and sociology. In their paper, Alina Samokhina and her colleagues investigate a specialised instructional approach to communication skills that combines empathic listening and culturally nuanced communication. The experiment compared the effects of conventional language teaching and the proposed communication training on Japanese language learners, assessing outcomes through written and blind oral tests conducted by a native speaker after nine months of instruction. This language education approach enhances cultural understanding and empathy, equipping learners with adaptive communication skills that foster inclusivity, reduce cultural misunderstandings, and build globally aware communities.
Other interesting research lies at the intersection of mathematics and biology. The paper, authored by Dmitry Muzalevskiy and Dr. Ivan Torshin, addresses the classification of cellular images for detecting leukemic (blast) cells in peripheral blood smears, an important task in practical hematology. The proposed method integrates graph theory, XGBoost, and convolutional neural networks (CNNs). Images are converted into weighted graphs, and graph invariants are used as features for an XGBoost model, which, when combined with CNNs such as ResNet-50, achieves high classification performance, with sensitivity and specificity reaching 99%.
The next paper, authored by Daniil Dorin and Nikita Kiselev under the supervision of Dr. Andrey Grabovoy and Prof. Vadim Strijov, investigates the correlation between videos presented to participants during an experiment and the resulting fMRI images. To this end, the authors propose a method for creating a linear model that predicts changes in fMRI signals from the video sequence images. A linear model is constructed for each individual voxel in the fMRI image, under the assumption that the image sequence has the Markov property. In their paper, Filipp Nikitin and co-authors study the problem of de novo 3D molecule generation, a critical task in drug discovery. The authors introduce Megalodon, a model designed to enhance the quality of generated 3D molecular structures. They demonstrate that Megalodon achieves state-of-the-art performance in 3D molecule generation, conditional structure generation, and energy-based benchmarks, leveraging diffusion and flow matching techniques.
Currently, a lot of research is aimed at creating open libraries and frameworks. In their paper, Anastasia Voznyuk and her team present DeepPavlov 1.0, an open-source framework for using natural language processing models by leveraging transfer learning techniques. DeepPavlov 1.0 was created for modular and configuration-driven development of state-of-the-art NLP models and supports a wide range of NLP applications.
Another team comprising our student Maria Kovaleva and alumnus Andrey Filatov has released a new series of models, Kandinsky 4.0. These models address various tasks, including text-to-video, image-to-video, and video-to-audio.
Our PhD student Pavel Severilov delivered an excellent series of talks on modern machine learning technologies, covering large language models, audio AI technologies, and workflows for solving industrial NLP tasks.
Optimisation
This year has been highly productive for our students working in the field of optimization, with several novel methods and extensions to existing techniques proposed.
For instance, our student Andrey Veprikov and Bachelor alumnus Alexander Bogdanov, along with their colleagues, proposed JAGUAR, a new method for black-box optimization in scenarios where the gradient of the objective function is unavailable. Their method effectively leverages information from previous iterations and has been successfully integrated into classical optimization algorithms such as Frank-Wolfe and gradient descent; the resulting approach is robust in stochastic settings and outperforms existing methods (a toy sketch of the idea is given below). In another paper [https://arxiv.org/pdf/2408.01848], Andrey addresses non-Euclidean optimization settings with Markovian noise in the first-order stochastic oracle. The paper proposes methods based on the Mirror Descent and Mirror Prox algorithms via the MLMC gradient estimation technique and obtains optimal results for both minimization and variational inequality problems.
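The following toy sketch conveys the general flavor of such memory-based black-box estimates; it is a simplified illustration, not the authors' exact JAGUAR algorithm. A vector of coordinate-wise finite-difference estimates is kept between iterations, one random coordinate is refreshed per step, and the whole vector drives a gradient-descent update.

```python
import numpy as np

def memory_zeroth_order_gd(f, x0, step=0.1, tau=1e-4, iters=1000, seed=0):
    """Gradient descent driven by a reused finite-difference memory:
    only one coordinate of the gradient estimate is refreshed per step,
    the rest is carried over from previous iterations."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    h = np.zeros_like(x)                       # gradient-estimate memory
    e = np.eye(x.size)
    for _ in range(iters):
        i = rng.integers(x.size)               # coordinate to refresh
        h[i] = (f(x + tau * e[i]) - f(x - tau * e[i])) / (2 * tau)
        x = x - step * h                       # descent with the reused estimate
    return x

# Example: minimize a simple quadratic without access to its gradient.
x_min = memory_zeroth_order_gd(lambda v: ((v - 1.0) ** 2).sum(), np.zeros(5))
```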
In their paper, Igor Ignashin and his co-author Demyan Yarmoshik address the problem of finding the equilibrium distribution of traffic flows and propose several modifications of the Frank-Wolfe algorithm. They show that these modifications, especially the use of multiple previous directions, lead to better convergence on modeled urban datasets, demonstrating the advantage of the proposed algorithms over the classic Frank-Wolfe method.
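For context, the classic Frank-Wolfe baseline that these modifications build on looks as follows; the modifications reuse several previous directions rather than only the newest one. The oracle and step-size schedule here are the textbook choices.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, iters=100):
    """Classic Frank-Wolfe: each step calls a linear minimization oracle
    (lmo) over the feasible set and moves toward its output."""
    x = np.asarray(x0, dtype=float)
    for t in range(iters):
        s = lmo(grad(x))                 # cheapest feasible direction
        gamma = 2.0 / (t + 2.0)          # standard step-size schedule
        x = (1 - gamma) * x + gamma * s
    return x

# Example: minimize ||x - c||^2 over the probability simplex.
c = np.array([0.2, 0.5, 0.3])
grad = lambda x: 2 * (x - c)
lmo = lambda g: np.eye(g.size)[np.argmin(g)]   # best simplex vertex
x_star = frank_wolfe(grad, lmo, np.ones(3) / 3)
```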
Several papers published this year focus on analyzing contemporary standard optimization methods widely used in the deep learning community. One such paper, authored by a group of researchers including our student Andrei Semenov, investigates the issue of heavy-tailed noise in stochastic gradients when using adaptive step-size optimization methods like AdaGrad and Adam. They prove that AdaGrad may converge poorly under heavy-tailed noise and introduce new variants, Clip-RAdaGradD (Clipped Reweighted AdaGrad with Delay) and Clip-Adam. Empirical results show that these clipped versions outperform the original optimization methods across multiple tasks (a toy sketch of the clipping idea appears below). Andrei also contributed to the paper, which introduces a modification of the recently proposed Mixed Newton Method, originally designed for minimizing real-valued functions of complex variables. The proposed modification extends real-valued functions into the complex space, enabling their minimization while preserving favorable convergence properties; additionally, the authors design a special regularization to prevent the model from converging to complex minima. This paper was presented at this year's NeurIPS.

Another significant contribution in the area of Newton-based methods is the paper authored by our alumnus Petr Ostroukhov and our PhD student and lecturer Konstantin Yakovlev. For the smooth and monotone case, they establish a lower bound with explicit dependence on the level of Jacobian inaccuracy and propose an optimal algorithm for this key setting. When derivatives are exact, their method converges at the same rate as exact optimal second-order methods. To reduce the cost of solving the auxiliary problem, which arises in all high-order methods with global convergence, they introduce several quasi-Newton approximations; the method with quasi-Newton updates achieves a global sublinear convergence rate.

Matvei Kreinin, in his talk at the MIPT conference, analyzed the convergence behavior of optimization methods with preconditioning that incorporate weight decay, focusing on popular variants like AdamW and OASIS. The study explored alternatives to these methods, examining their convergence speed and accuracy, and provides insights into the development of regularization methods with preconditioning and weight decay. At the same conference, Konstantin Yakovlev presented a talk on gradient-based hyperparameter optimization. Gradient-based methods enable the efficient tuning of a large number of hyperparameters, even in the range of thousands or millions, by leveraging approximate gradient techniques. Konstantin introduced a novel method that optimizes the full parameter trajectory while achieving faster performance compared to competing algorithms.

The authors of the paper including Andrey Veprikov address the challenges of stochastic optimisation in reinforcement learning, where the assumption of independently and identically distributed data is violated due to the temporal dependencies of Markov decision processes (MDPs). They propose MAdam, an algorithm extending the classical Adam optimizer to average-reward reinforcement learning, leveraging multi-level Monte Carlo techniques to control variance without requiring knowledge of the MDP's mixing time or assumptions about decay rates. The authors provide theoretical analysis and demonstrate the effectiveness of MAdam through experiments in challenging environments.
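As a toy illustration of the clipping idea (a sketch of the general recipe, not necessarily the exact Clip-Adam of the paper), one can clip the stochastic gradient before it enters the usual Adam moment estimates:

```python
import numpy as np

def clip(g, c):
    """Scale the gradient down to norm c if it exceeds c (heavy-tail protection)."""
    n = np.linalg.norm(g)
    return g if n <= c else g * (c / n)

def clipped_adam_step(x, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, c=1.0):
    """One step of a clipped Adam variant: clip the stochastic gradient g,
    then run the standard Adam update. t is the 1-based step counter."""
    g = clip(g, c)
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)                  # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)                  # bias-corrected second moment
    return x - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```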
Significant progress has been made this year in the field of distributed optimization. In the paper authored by our students Nikita Kiselev, Daniil Dorin, and their colleagues, the authors consider optimization of a separable function in a decentralized setting, where parts of the optimized function and constraint terms are located in different nodes of a computational network. The authors propose the first linearly convergent first-order decentralized algorithm for this problem with general affine coupled constraints.
Another important contribution comes from Andrei Semenov in the paper, which introduces a stochastic distributed method for monotone and strongly monotone variational inequalities with Lipschitz operators and convex regularizers, which are applicable in fields like game theory and adversarial training. Unlike previous methods that rely on the Euclidean metric, the proposed method uses Bregman proximal maps, making it compatible with arbitrary problem geometries. Additionally, Andrey Veprikov presented a talk on the Zero-Order Algorithm for Decentralised Optimization Problems. This paper was accepted by the AI Journey conference.
The paper led by Nikita Kornilov as the first author considers non-smooth convex optimization with a zeroth-order oracle corrupted by symmetric stochastic noise. Their results match the best-known ones for the bounded-variance case: they use a mini-batched median estimate of the sampled gradient differences, apply gradient clipping to the result, and plug the final estimate into an accelerated method (a simplified sketch of such an estimator is given below). They apply this technique to the stochastic multi-armed bandit problem with a heavy-tailed reward distribution and derive regret bounds under the additional assumption of noise symmetry. Research on optimal transport is presented in the paper by Nikita, our alumnus Petr Mokrov, and their co-authors. This paper, based on Nikita's Master's work, develops and theoretically justifies the novel Optimal Flow Matching approach, which allows recovering the straight Optimal Transport displacement for the quadratic transport cost in just one FM step. The main idea of their approach is the employment of vector fields for FM that are parameterized by convex functions. Their approach was presented at NeurIPS 2024.
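A simplified version of such a robust zeroth-order estimator might look as follows, assuming a Euclidean setup; the batch sizes and clipping constant are illustrative, not the paper's tuned values.

```python
import numpy as np

def median_clipped_grad(f, x, tau=1e-3, batches=5, batch_size=8, c=1.0, rng=None):
    """Average two-point zeroth-order estimates inside each mini-batch,
    take the coordinate-wise median across mini-batches (robust to
    symmetric heavy-tailed noise), then clip the result before it is
    handed to an accelerated method."""
    rng = rng or np.random.default_rng()
    means = []
    for _ in range(batches):
        ests = []
        for _ in range(batch_size):
            u = rng.standard_normal(x.size)
            u /= np.linalg.norm(u)             # random direction on the sphere
            d = (f(x + tau * u) - f(x - tau * u)) / (2 * tau)
            ests.append(d * u * x.size)        # two-point gradient estimate
        means.append(np.mean(ests, axis=0))
    g = np.median(means, axis=0)               # median of mini-batch means
    n = np.linalg.norm(g)
    return g if n <= c else g * (c / n)        # final clipping step
```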
In addition, Nikita contributed to the paper that introduces the Implicitly Normalized Forecaster (INF) algorithm. The authors establish convergence results under mild assumptions on the reward distribution and demonstrate that INF-clip is optimal for linear heavy-tailed stochastic MAB problems and works well for non-linear ones. Furthermore, they show that INF-clip outperforms the best-of-both-worlds algorithm in cases where it is difficult to distinguish between different arms.
In their paper, a team including our student Ilgam Latypov and our alumnus Dr. Aleksandr Katrutsa proposes a new method for constructing UCB-type algorithms for stochastic multi-armed bandits based on general convex optimization methods with an inexact oracle. They propose a new algorithm, Clipped-SGD-UCB, and show, both theoretically and empirically, that in the case of symmetric noise in the reward one can achieve a better regret bound (a simplified sketch of the clipped-UCB idea appears below). Another topic that Ilgam is currently researching is multi-objective optimization. His paper introduces an extension of the concept of competitive solutions and proposes the Scalarization With Competitiveness Method for multi-criteria problems. This method is highly interpretable, eliminates the need for hyperparameter tuning, and is useful when computational resources are limited or re-computation is not feasible. Optimal page replacement is an important problem in efficient buffer management and was studied in a paper with the contribution of Ilgam Latypov. The authors propose a new family of page replacement algorithms for a DB buffer manager that demonstrates superior performance with respect to competitors on custom data access patterns and implies a low computational overhead on TPC-C. They provide theoretical foundations and an extensive experimental study of the proposed algorithms, covering synthetic benchmarks and an implementation in an open-source DB kernel evaluated on TPC-C.
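For intuition, here is plain UCB with clipped reward estimates, a simplified stand-in for Clipped-SGD-UCB; the actual algorithm builds its estimates via clipped SGD with an inexact oracle, which this sketch does not reproduce.

```python
import numpy as np

def clipped_ucb(pull, n_arms, horizon, c=10.0):
    """UCB over running means of clipped rewards. `pull(a)` returns a
    (possibly heavy-tailed) noisy reward for arm a."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for t in range(1, horizon + 1):
        if t <= n_arms:
            a = t - 1                              # try every arm once first
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)
            a = int(np.argmax(means + bonus))      # optimism in face of uncertainty
        r = np.clip(pull(a), -c, c)                # clipping guards against heavy tails
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]     # running mean of clipped rewards
    return int(np.argmax(means))
```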
Machine learning fundamentals and computational mathematics
Our students conduct research not only in applied fields but also achieve significant results in more theoretical areas of mathematics and computer science.
Our students Dmitry Protasov and Alexander Tolmachev, together with their co-author Vsevolod Voronov, published a preprint that examines the problem of partitioning a two-dimensional flat torus into multiple subsets to minimize the maximum diameter of each part. This problem is a specific case of the classical Borsuk problem, which asks whether any bounded subset of n-dimensional Euclidean space can be divided into n+1 parts of strictly smaller diameter. The authors present numerical estimates for the maximum diameters across different numbers of subsets. Alexander Tolmachev also published a preprint that explores a variation of the Hadwiger–Nelson problem, which asks for the minimum number of colors needed to color the Euclidean plane so that no two points at unit distance share the same color. The paper focuses on a specific two-dimensional case, reformulating the problem as a Maximal Independent Set (MIS) problem on graphs derived from a flat torus. The authors evaluate multiple numerical software packages for solving the MIS problem and provide theoretical justification for their approach.
Another notable contribution comes from our student Iryna Zabarianska, who, together with her supervisor Anton Proskurnikov, published a paper discussing a method for finding a common point of multiple convex sets in Euclidean space. This method, originally derived from algorithms for solving systems of linear equations, has since gained prominence in applications such as image processing and tomography (a minimal sketch of the classical method is given below). The paper focuses on a specialized multi-agent scenario where each convex set is associated with a specific agent and remains inaccessible to the others. The authors provide a comprehensive overview of these methods and explore their connection to previously established theoretical results. The same author collective presented a paper that explores the Hegselmann-Krause opinion dynamics model, a deterministic averaging consensus algorithm applicable across various scientific fields, including sociology, complex physical modeling, and multi-agent systems. The authors introduce a special multidimensional extension of the model, which reveals unique behaviors, such as convergence to non-equilibrium points and periodic oscillations, thoroughly analyzed in the study.
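The classical (single-agent) method itself is easy to state: cyclically project the current point onto each set. A minimal sketch, assuming a Euclidean projection is available for every set; the multi-agent setting studied in the paper is more subtle, since no agent can project onto the others' sets.

```python
import numpy as np

def alternating_projections(projections, x0, iters=100):
    """Cyclic projections onto convex sets (Kaczmarz-style): when the
    intersection is nonempty, the iterates converge to a common point."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        for proj in projections:
            x = proj(x)                 # move to the nearest point of the next set
    return x

# Example: intersect the unit ball with the half-plane {x[0] >= 0.5}.
ball = lambda x: x if np.linalg.norm(x) <= 1 else x / np.linalg.norm(x)
half = lambda x: np.array([max(x[0], 0.5), *x[1:]])
print(alternating_projections([ball, half], np.array([-2.0, 2.0])))
```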
Another series of theoretical results was proposed by this year's alumna Polina Barabanschikova, addressing the max-sum matching problem within the framework of Tverberg graph theory. The first paper proves that a max-sum tree of any finite point set in Euclidean space is a Tverberg graph, which generalises a recent result of Abu-Affash et al., who established this claim in the plane. Additionally, they provide a new proof of a theorem by Bereg et al., which states that a max-sum matching of any even point set in the plane is a Tverberg graph; moreover, they prove a slightly stronger version of this theorem. The next paper considers, for each edge of such a matching, the ellipse with foci at the edge's endpoints and a certain fixed eccentricity. Using an optimization approach, the authors prove that the convex sets bounded by these ellipses intersect, answering a Tverberg-type question of Andy Fingerhut from 1995. Finally, the last paper proves a tight colorful dimension-free Tverberg theorem.
Significant research has been conducted in the field of machine learning fundamentals and model analysis. For instance, in a paper stemming from his Bachelor study, our student Andrey Veprikov, under the supervision of Dr. Anton Khritankov, and his colleagues propose a dynamical system that describes the iterative learning process in machine learning. This system reveals phenomena such as feedback loops, error amplification, and induced concept drift, and provides researchers with tools to analyze training workflows and address issues of trustworthiness and safety in the resulting models.
In their paper, our student Galina Boeva, our alumnus Dr. Alexey Zaytsev, and colleagues present an innovative approach to modeling events, viewing them not as standalone phenomena but as observations of a Gaussian Process that governs the actor's dynamics. The paper is based on her Master's study. They propose integrating these dynamics to create a continuous-trajectory extension of the widely successful Neural ODE model. Through Gaussian Process theory, they evaluate the uncertainty in an actor's representation that arises from not observing the actor between events; this estimate led to the development of a novel, theoretically backed negative feedback mechanism.
Another important contribution comes from Nikita Kiselev and his supervisor, our alumnus and Dr. Andrey Grabovoy, who analyse the loss landscape of neural networks — a critical aspect of their training — highlighting its importance for improving performance. Their work, based on Nikita’s Bachelor thesis, investigates how the loss surface evolves as the sample size increases, addressing a previously unexplored issue in the field. The paper theoretically analyzes the convergence of the loss landscape in a fully connected neural network and derives upper bounds for the difference in loss function values when adding a new object to the sample. Their empirical study confirms these results on various datasets, demonstrating the convergence of the loss function surface for image classification tasks.
The paper by Nikita Kiselev and Vladislav Meshkov proposes a method for estimating the Hessian matrix norm for specific types of neural networks, such as convolutional ones. They obtain results for both 1D and 2D convolutions, as well as for the fully connected heads in these networks, and their empirical analysis supports these findings, demonstrating convergence of the loss function landscape. Approaches to determining the sample size for training are presented in their paper [Nikita Kiselev, Andrey Grabovoy. Sample Size Determination: Likelihood Bootstrapping. Computational Mathematics and Mathematical Physics]. The paper proposes two methods based on the likelihood values on resampled subsets. They demonstrate the validity of one of these methods in a linear regression model, and computational experiments show the convergence of the proposed functions as the sample size increases.
The paper from NeurIPS by Alexander Tolmachev and his colleagues addresses the problem of mutual information estimation, a fundamental challenge in modern probabilistic modeling with applications in generative deep learning. Mutual information measures how much knowing one variable reduces uncertainty about another and is relevant in contexts such as relationships between random variables in graphs or layers of deep learning models. The authors propose a novel estimation method leveraging normalizing flows, which are powerful tools in contemporary deep probabilistic modeling, and provide theoretical guarantees and experimental validation for their approach. Another paper by Alexander Tolmachev extends Deep InfoMax, a self-supervised representation learning method based on mutual information, to address the challenge of aligning learned representations with a specific target distribution. The authors propose injecting independent noise into the encoder’s normalized outputs, enabling the representations to match a chosen prior distribution while preserving the original InfoMax objective. The method is shown to produce representations that conform to various continuous distributions and is evaluated across downstream tasks, highlighting a moderate trade-off between task performance and distribution matching quality. Both of these papers are closely related to another work by Alexander, presented at ICLR 2024, which is one of the results of his Master’s study. In this paper, the authors consider the information bottleneck principle, an information-theoretic framework for analyzing the training process of deep neural networks. The core idea of this framework is to track and analyze the dynamics of mutual information between the outputs of hidden layers and either the input or output of the neural network. Alexander extended this principle to a wide range of neural networks by combining mutual information estimation with a compression step, making the process more efficient and effective.
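For reference, the quantity all of these works estimate is the mutual information

```latex
I(X;Y) = \mathbb{E}_{p(x,y)}\left[\log \frac{p(x,y)}{p(x)\,p(y)}\right] = H(X) - H(X \mid Y),
```

which is exactly the "reduction in uncertainty about one variable from knowing the other" described above.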
The paper by Grigoriy Ksenofontov and his colleagues investigates the problem of optimal transport through the lens of stochastic processes, focusing on the well-known Schrödinger Bridge problem, which has applications in diffusion generative models. The authors propose a new method, called Iterative Proportional Markovian Fitting (IPMF), which unifies existing approaches and demonstrates convergence under more general conditions.
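The discrete ancestor of such bridge-fitting procedures is classical Iterative Proportional Fitting, i.e., the Sinkhorn algorithm for entropic optimal transport. A minimal sketch, to convey what "iterative proportional" fitting does; IPMF itself operates on Markov processes rather than on this simple matrix scaling.

```python
import numpy as np

def sinkhorn_ipf(C, a, b, eps=0.1, iters=200):
    """Alternately rescale a kernel so its row/column sums match the
    prescribed marginals a and b. C: cost matrix; eps: entropic regularizer."""
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):
        v = b / (K.T @ u)          # fit the target marginal
        u = a / (K @ v)            # fit the source marginal
    return u[:, None] * K * v[None, :]   # transport plan with both marginals

# Example: transport between two 3-point uniform distributions.
C = np.abs(np.subtract.outer(np.arange(3.0), np.arange(3.0)))
P = sinkhorn_ipf(C, np.full(3, 1 / 3), np.full(3, 1 / 3))
```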
In their paper, Andrei Semenov and his co-authors address security risks in Vertical Federated Learning, a decentralised machine learning framework designed to protect data privacy. They focus on feature reconstruction attacks that compromise input data and theoretically claim that such attacks require prior knowledge of the data distribution to succeed. Through their study, the authors demonstrate that simple transformations in model architecture, such as employing multilayer perceptron-based models, can significantly enhance data protection. Experimental results confirm that these models are resistant to state-of-the-art feature reconstruction attacks. Additionally, Andrei Semenov proposed a novel architecture and method for explainable classification using Concept Bottleneck Models (CBMs), which incorporate additional knowledge about data through class-specific concepts. To address the traditionally lower performance of CBMs, they leverage pre-trained multi-modal encoders and CLIP-like architectures to develop CBMs with sparse concept representations. This approach significantly enhances the accuracy of CLIP-based bottleneck models, highlighting the effectiveness of sparse concept activation vectors.
The above-mentioned paper actively employs the Gumbel-Softmax distribution, a relaxation of discrete variables that allows backpropagation through them. A team of our students (Daniil Dorin, Igor Ignashin, Nikita Kiselev, and Andrey Veprikov) developed a PyTorch library called relaxit, which gathers and implements a wide range of methods for relaxing discrete variables. Discrete variable relaxation is crucial for generative models, where researchers employ surrogate continuous variables to approximate discrete ones, enabling parameter optimization with standard backpropagation. Details can be found on GitHub, in a Medium article, and in its extended version.
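As an example of what such a relaxation looks like, here is the textbook Gumbel-Softmax sampler; this sketch illustrates the technique itself and is not taken from relaxit.

```python
import torch

def gumbel_softmax_sample(logits, temperature=1.0):
    """Perturb logits with Gumbel noise and apply a temperature-controlled
    softmax: a differentiable approximation of categorical sampling."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return torch.softmax((logits + gumbel) / temperature, dim=-1)

logits = torch.tensor([1.0, 2.0, 0.5], requires_grad=True)
y = gumbel_softmax_sample(logits, temperature=0.5)
loss = (y * torch.arange(3.0)).sum()   # toy downstream objective
loss.backward()                        # gradients flow through the relaxed sample
```

As the temperature goes to zero, the samples approach one-hot vectors, at the cost of higher-variance gradients.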
At the MIPT conference, numerous papers on the analysis of machine learning methods were presented, all of which were later published as graduate work. Our students Petr Babkin, Kseniia Petrushina, and Konstantin Yakovlev addressed the problem of automatic ensemble search, a specific case of neural architecture search where the goal is to find an ensemble of architectures rather than a single one. They proposed a gradient-based method for ensemble search with a regularization term that promotes model diversity, ensuring that the resulting ensemble comprises diverse architectures while maintaining high performance.
Anton Bishuk and his supervisor, our alumna Dr. Anastasia Zukhba, presented a novel graph generation method conditioned on the statistical characteristics of graphs. The authors propose separating these characteristics into two groups: simple statistics, which can be computed efficiently using deterministic algorithms, and complex statistics, which capture intricate regularities within the graph population. The proposed method is particularly applicable to social graphs, making it valuable for applications in social sciences.
Alexander Terentyev reported on dynamical system trajectory classification using Physics-Informed Neural Networks (PINNs), a type of neural network that incorporates prior knowledge of physical systems to model physically consistent solutions. Alexander focused on the specific problem of classifying time series representing trajectories of mechanical systems. Kirill Semkin delivered a talk on time series prediction using tensor decomposition at the MIPT conference. He introduced a novel model architecture for time series analysis called TSSA, which combines the classical Singular Spectrum Analysis (SSA) algorithm with Canonical Polyadic Decomposition. The resulting model has low computational complexity and performs effectively across various types of time series data.
Another paper on time series analysis, by our PhD student Denis Tikhonov and his supervisor Prof. Vadim Strijov, explores the properties of dynamic system reconstruction using time-delay embedding and multilinear tensor algebra. The key idea is to use a tensor as a multilinear map from a set of phase spaces to one subspace. Owing to the simplicity of the linear approach and the linear dependencies between components, the method in several cases allows for a better reconstruction of the original attractor from an incomplete set of variables.
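The time-delay embedding underlying this line of work is simple to construct: lagged copies of an observed scalar series become coordinates of a reconstructed phase space. A minimal sketch with illustrative dimension and lag values:

```python
import numpy as np

def delay_embedding(series, dim=3, lag=1):
    """Standard time-delay (Takens) embedding: stack lagged copies of a
    scalar series as coordinates of a reconstructed phase space.
    Returns an array of shape (n_points, dim)."""
    n = len(series) - (dim - 1) * lag
    return np.stack([series[i * lag : i * lag + n] for i in range(dim)], axis=1)

# Example: reconstruct a 3-D trajectory from a single observed variable.
t = np.linspace(0, 20, 2000)
emb = delay_embedding(np.sin(t) + 0.5 * np.sin(3 * t), dim=3, lag=25)
```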
Finally, our student Yuri Sapronov and his co-author Nikita Yudin presented a paper on finding the optimal policy for Average Reward Markov Decision Processes, a key challenge in many reinforcement learning algorithms. The authors propose a method that tolerates inaccuracies in solving the empirical Bellman equation, which is central to reinforcement learning algorithms of this type, while maintaining theoretical guarantees on the complexity of the approach.
Conclusion
Knowledge flows among peer professionals, the students of our department, as actively as between students and supervisors. Spreading and expanding our style of learning and scientific research is the primary challenge for the Intelligent Systems Department. Twice a year we organize informal student meetings, hold a section at the scientific conference, and review the students' research progress. We welcome new students and researchers to join us. The entry point to student research activity is the spring semester course "My First Scientific Paper". New students and potential collaborators are welcome to follow the Department's events on the YouTube channel, website, and Telegram.
Link to the original article: https://habr.com/ru/articles/871802/