Projects
Here I describe the large projects I have been involved in and give links to the materials and software.
PostDoc project, Institute of Medical Genetics and Applied Genomics, Tuebingen, Germany, 2024-now
ERDERA is a pan-European consortium that aims to diagnose patients with undiagnosed rare diseases. My goal as a bioinformatician is to use innovative methods to assist clinician scientists in diagnostics.
PostDoc project, Institute of Medical Genetics and Applied Genomics, Tuebingen, Germany, 2020-2024
The goal of my PostDoc project was to download and process genomic data from the pan-European consortium Solve-RD: approximately 20,000 germline exome sequencing results, 3,000 genome sequencing results, several hundred long-read (PacBio) datasets, and around 800 deep exome sequencing results from tumors and, when mosaic variants were suspected, sometimes from germline samples. The data came with metadata such as HPO-based phenotypes and relatedness structures and comprised almost a petabyte of sequencing data altogether. All these genomic data belong to rare disease patients who had previously undergone some form of genetic diagnostics without a diagnosis being found. Our goal was therefore to process the dataset with a special focus on variants that are typically neglected or difficult to detect reliably, such as copy-number variants, structural variants, mobile element insertions, or repeat expansions (in long reads).
As a result of the first half of this project, a paper was published in Nature Medicine, with me as second author. Hundreds of previously undiagnosed patients received a diagnosis after years of waiting.
Internship project: Washington University in St Louis, USA, 2016-2017
I spent a short six-month stay at Washington University in St Louis, in the Artyomov lab of the Immunology department. During this research visit I had to choose the optimal pipeline for analyzing a data type that was new to me: human methylation data obtained with eRRBS. I tested several tools and chose MethPipe because of its unusual but smart approach to this specific data. I also designed the visualizations of the results, with the invaluable help of the head of the lab. Several years after my stay, this approach was applied to the final dataset, and the resulting findings were described by the first author of the Nature Aging publication.
PhD project: Center for Genomic Regulation, Barcelona, Spain, 2015-2017, and Institute of Medical Genetics and Applied Genomics, Tuebingen, Germany, 2018-2020
My PhD project was to develop a tool for detecting copy-number variants in germline and somatic contexts. At first my KPI was simply to develop a tool with higher analytical performance than the existing solutions, but after I moved to Germany in 2018 the tool was immediately introduced into clinical use, and a secondary goal was set: making the results interpretable for clinicians. I completed the task and defended a thesis based on my results in December 2019. The code of the tool, called ClinCNV, is available on GitHub and is part of the MegSAP pipeline. As part of MegSAP it has passed several validation and accreditation rounds, and our clinic and partner clinics use it to detect copy-number variants in tens of thousands of rare disease patients annually, as well as in thousands of cancer patients. The tool is widely used by the broader research community and is described in two preprints, one for the germline framework and one for the somatic framework.
Parseq Lab and Saint Petersburg Academic University, Russia, 2015
I started working with the Russian biotech startup Parseq Lab in 2013, was hired as an intern in 2014, and was promoted to full-time bioinformatician two months later. My goal was to develop and validate a tool for detecting copy-number variants within routine NGS-based screening of newborns for particular rare disorders (cystic fibrosis, phenylketonuria, galactosemia). I developed a novel two-phase (unsupervised, then supervised) statistical method under supervision but almost fully independently. The method was implemented in Java and Python in accordance with the best bioinformatic coding practices of that time. The code was reviewed by the company's bioinformatics department, and the software was used for routine diagnostics in several facilities in Russia. The tool, called CONVector, is openly available, and an accompanying paper was published in BMC Bioinformatics in 2016.
Moscow State University, Moscow, Russia, 2013, Mathematical faculty project
Unfortunately, I cannot boast of any achievements in mathematics during my studies at the university’s mathematics faculty. However, during one year of work at the English Department of the Faculty of Mechanics and Mathematics, Lomonosov Moscow State University, in 2012-2013 (2 days per week, 5-hour working days), I developed a website (with minimal prior experience) and typeset a 260-page book, “Practicum on Reading Specialized Literature for Mechanics and Mathematics Students”, in the LaTeX publishing system.