Alina Selega

Research Interests

My PhD project investigates the structure of ribonucleic acid (RNA). RNA plays a vital role in gene decoding, expression, and regulation. Its ability to form a continuum of highly complex three-dimensional shapes facilitates interactions with other molecules, giving rise to a wealth of post-transcriptional control functions. Understanding RNA structure is therefore essential to explaining how these molecular processes normally work and what goes wrong in disease.

Combining chemical structure probing methods with next-generation sequencing can identify accessible regions of RNA that might interact with other molecules. In addition, recently developed methods have made it possible to identify the sites at which RNA interacts with a protein of interest. This project aims to integrate these two data sources to assess how dynamic protein-binding events can complement information about RNA structural conformation. The resulting model is planned to be applied within a collaborative project studying Motor Neurone Disease.

My other interests include memory. My background is in computer science and mathematics, and I aim to apply techniques such as probabilistic modelling and machine learning to biological data. I am particularly drawn to clinical applications.

Publications:
2017
  Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments
Selega, A, Sirocchi, C, Iosub, I, Granneman, S & Sanguinetti, G 2017, 'Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments' Nature Methods, vol 14, no. 1, pp. 83-89. DOI: 10.1038/nmeth.4068
Structure probing coupled with high-throughput sequencing could revolutionize our understanding of the role of RNA structure in regulation of gene expression. Despite recent technological advances, intrinsic noise and high sequence coverage requirements greatly limit the applicability of these techniques. Here we describe a probabilistic modeling pipeline that accounts for biological variability and biases in the data, yielding statistically interpretable scores for the probability of nucleotide modification transcriptome wide. Using two yeast data sets, we demonstrate that our method has increased sensitivity, and thus our pipeline identifies modified regions on many more transcripts than do existing pipelines. Our method also provides confident predictions at much lower sequence coverage levels than those recommended for reliable structural probing. Our results show that statistical modeling extends the scope and potential of transcriptome-wide structure probing experiments.
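The abstract does not state how biological variability enters the scores. Purely as an illustration, the Python sketch below assumes that each nucleotide is scored by the log-ratio of drop-off rates between two samples, that control-versus-control log-ratios form a null distribution, and that separate nulls are built for nucleotides of similar coverage; each treatment-versus-control log-ratio then receives an empirical p-value against the matching null. The names `stratified_pvalues`, `empirical_pvalue`, `EPS` and `n_strata` are hypothetical, and this is not presented as the published pipeline.

```python
import bisect
import math
from itertools import combinations

EPS = 1e-6  # hypothetical pseudocount so that zero drop-off rates do not give log(0)


def log_ratio(a, b):
    return math.log((a + EPS) / (b + EPS))


def empirical_pvalue(null_sorted, x):
    """Upper-tail empirical p-value: (1 + #{null >= x}) / (1 + len(null))."""
    n_ge = len(null_sorted) - bisect.bisect_left(null_sorted, x)
    return (1 + n_ge) / (1 + len(null_sorted))


def stratified_pvalues(controls, treatments, coverage, n_strata=2):
    """Per-nucleotide p-values, one vector per treatment/control replicate pair.

    controls, treatments: lists of replicates; each replicate is a list of
    per-nucleotide drop-off rates (hypothetical input format, >= 2 controls).
    coverage: per-nucleotide coverage, used to stratify the null distribution.
    """
    n_pos = len(coverage)
    # Assign each nucleotide to an equal-size coverage stratum by rank.
    order = sorted(range(n_pos), key=lambda i: coverage[i])
    stratum = [0] * n_pos
    for rank, i in enumerate(order):
        stratum[i] = rank * n_strata // n_pos
    # Null: log-ratios between every pair of control replicates, per stratum.
    null = [[] for _ in range(n_strata)]
    for c1, c2 in combinations(controls, 2):
        for i in range(n_pos):
            null[stratum[i]].append(log_ratio(c1[i], c2[i]))
    for values in null:
        values.sort()
    # Observed: treatment-vs-control log-ratios scored against the same-stratum null.
    pvals = []
    for t in treatments:
        for c in controls:
            pvals.append([empirical_pvalue(null[stratum[i]], log_ratio(t[i], c[i]))
                          for i in range(n_pos)])
    return pvals


controls = [[0.1, 0.1, 0.1, 0.1], [0.1, 0.1, 0.1, 0.1]]
treatments = [[0.5, 0.1, 0.1, 0.5]]
pvals = stratified_pvalues(controls, treatments, coverage=[10, 20, 30, 40])
print(pvals[0])  # p = 1/3 at positions 0 and 3, 1.0 elsewhere
```

An empirical p-value can never fall below 1 / (1 + size of the null), so in practice each coverage stratum needs many nucleotides for the scores to be informative.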
General Information
Organisations: School of Informatics.
Authors: Selega, Alina, Sirocchi, Christel, Iosub, Ira, Granneman, Sander & Sanguinetti, Guido.
Number of pages: 10
Pages: 83-89
Publication Date: 2017
Publication Information
Category: Article
Journal: Nature Methods
Volume: 14
Issue number: 1
ISSN: 1548-7091
Original Language: English
DOIs: 10.1038/nmeth.4068
2016
  Robust statistical modeling greatly improves sensitivity of high-throughput RNA structure probing experiments
Selega, A, Sirocchi, C, Iosub, I, Granneman, S & Sanguinetti, G 2016, 'Robust statistical modeling greatly improves sensitivity of high-throughput RNA structure probing experiments' Intelligent Systems for Molecular Biology, Orlando, FL, United States, 8/07/16 - 12/07/16.
Structure probing coupled with high-throughput sequencing holds the potential to revolutionise our understanding of the role of RNA structure in regulation of gene expression. Despite major technological advances, intrinsic noise and high coverage requirements greatly limit the applicability of these techniques. Existing methods [1, 2, 3] do not provide strategies for correcting biases of the technology and are not sufficiently informed by inter-replicate variability in order to perform justifiable statistical assessments.
We developed a probabilistic modelling pipeline which specifically accounts for biological variability and provides automated empirical strategies to correct coverage- and sequence-dependent biases in the data. The output of our method yields statistically interpretable scores for the probability of nucleotide modification transcriptome-wide, obviating the need for arbitrary thresholds and post-processing. We demonstrate on two yeast data sets that our method has greatly increased sensitivity, enabling the identification of modified regions on many more transcripts than existing pipelines. Our method also provides accurate and confident predictions at much lower coverage levels than those recommended in recent studies [3, 4], which in transcriptome-wide experiments are normally met for only a handful of transcripts. Our results show that statistical modelling greatly extends the scope and potential of transcriptome-wide structure probing experiments.

[1] Ding, Yiliang, et al. Nature 505.7485 (2014).
[2] Kielpinski, Lukasz Jan, and Jeppe Vinther. Nucleic Acids Research (2014).
[3] Talkish, Jason, et al. RNA 20.5 (2014).
[4] Siegfried, Nathan A., et al. Nature Methods 11.9 (2014).

General Information
Organisations: Institute for Adaptive and Neural Computation.
Authors: Selega, Alina, Sirocchi, Christel, Iosub, Ira, Granneman, Sander & Sanguinetti, Guido.
Publication Date: 8 Jul 2016
Publication Information
Category: Poster
Original Language: English
  Trends and challenges in Computational RNA biology
Selega, A & Sanguinetti, G 2016, 'Trends and challenges in Computational RNA biology' Genome Biology, vol 17, no. 253, pp. 1-4. DOI: 10.1186/s13059-016-1117-7
A report on the Wellcome Trust Conference on Computational RNA Biology held in Hinxton, UK, 17 to 19 October 2016.
General Information
Organisations: Institute for Adaptive and Neural Computation.
Authors: Selega, Alina & Sanguinetti, Guido.
Number of pages: 4
Pages: 1-4
Publication Date: 7 Dec 2016
Publication Information
Category: Meeting abstract
Journal: Genome Biology
Volume: 17
Issue number: 253
ISSN: 1465-6906
Original Language: English
DOIs: 10.1186/s13059-016-1117-7
  Robust statistical modeling greatly improves sensitivity of high-throughput RNA structure probing experiments
Selega, A, Sirocchi, C, Iosub, I, Granneman, S & Sanguinetti, G 2016, 'Robust statistical modeling greatly improves sensitivity of high-throughput RNA structure probing experiments' 11th Women in Machine Learning Workshop, Barcelona, Spain, 5/12/16 - 6/12/16.
RNA structure plays a key role in regulating many mechanisms crucial for correct cellular functioning, such as RNA stability, transcription, and mRNA translation rates. In order to identify RNA structural regulatory elements, chemical and enzymatic structure probing is routinely used to interrogate RNA structure both in vivo and in vitro [1]. In these structure probing experiments, a chemical agent reacts with the RNA molecule in a structure-dependent way, cleaving or otherwise modifying its flexible parts. These modified positions can then be detected by primer extension analyses, providing valuable structural information that can be used to constrain RNA energy-based structure prediction software and significantly improve prediction accuracy [2, 3].
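To make the detection step concrete, here is a minimal Python sketch of how reverse-transcription stops might be turned into per-nucleotide drop-off rates (stops divided by coverage). It assumes a convention not stated in the abstract: a modification at position i halts reverse transcription, so the sequenced read's 5' end maps to i + 1. The function name `dropoff_rates` and its input format are hypothetical.

```python
def dropoff_rates(read_starts, read_ends, length):
    """Count RT stops and coverage per nucleotide and return drop-off rates.

    Hypothetical input: read_starts[k] and read_ends[k] are the 0-based,
    inclusive transcript coordinates of the 5'-most and 3'-most mapped
    positions of read k.
    Assumed convention: a modification at position i stops reverse
    transcription, so the read's 5' end maps to i + 1.
    """
    stops = [0] * length
    coverage = [0] * length
    for start, end in zip(read_starts, read_ends):
        stop_pos = start - 1  # nucleotide assumed to have blocked reverse transcription
        if stop_pos >= 0:
            stops[stop_pos] += 1
        # Every position the enzyme reached, including the stop position, counts as covered.
        for pos in range(max(stop_pos, 0), end + 1):
            coverage[pos] += 1
    # Drop-off rate: fraction of reverse-transcription events that stopped at each position.
    rates = [s / c if c else 0.0 for s, c in zip(stops, coverage)]
    return stops, coverage, rates


stops, coverage, rates = dropoff_rates([3, 3, 1], [5, 4, 5], length=6)
print(stops)     # [1, 0, 2, 0, 0, 0]
print(coverage)  # [1, 1, 3, 3, 3, 2]
```

Normalising stops by coverage rather than using raw counts keeps highly expressed transcripts from dominating, which is why rates of this kind are a natural input to replicate-level comparisons.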

Coupled with high-throughput sequencing, structure probing allows interrogation of thousands of molecules in a single reaction, holding the potential to revolutionise our understanding of the role of RNA structure in regulation of gene expression. However, despite major technological advances, intrinsic noise and high coverage requirements greatly limit the applicability of these techniques. Existing methods [4, 5, 6] do not provide strategies for correcting biases of the technology and are not sufficiently informed by inter-replicate variability in order to perform justifiable statistical assessments.

We developed a probabilistic modelling pipeline which specifically accounts for biological variability and provides automated empirical strategies to correct coverage- and sequence-dependent biases in the data. Our model supports multiple experimental replicates in both control and treatment conditions and computes an empirical p-value for each nucleotide by comparing a measure of variability between conditions. These p-values are then used as observations in a Beta-Uniform mixture hidden Markov model, which outputs posterior probabilities of modification transcriptome-wide. This obviates the need for setting arbitrary thresholds and other post-processing.
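The final step can be illustrated with a small two-state hidden Markov model in which an unmodified nucleotide emits Uniform(0, 1) p-values and a modified one emits Beta(alpha, 1) p-values with alpha < 1, so that small p-values favour modification. This is a minimal Python sketch, not the authors' implementation: the parameter values are fixed placeholders rather than estimated (a real model would fit them, e.g. by EM), the p-values at one nucleotide are assumed independent given the state, and `bum_hmm_posterior` is a hypothetical name. Posteriors are obtained with the standard scaled forward-backward recursions.

```python
def bum_hmm_posterior(pvals, alpha=0.5, p_stay=0.9, prior_mod=0.5):
    """Posterior P(modified) per nucleotide from a 2-state Beta-Uniform HMM.

    pvals: list over nucleotides; each entry is a list of p-values
    (e.g. one per treatment/control replicate pair).
    State 0 (unmodified): p ~ Uniform(0, 1), density 1.
    State 1 (modified):   p ~ Beta(alpha, 1), density alpha * p**(alpha - 1).
    alpha, p_stay and prior_mod are illustrative fixed values (hypothetical).
    """
    trans = [[p_stay, 1 - p_stay], [1 - p_stay, p_stay]]
    init = [1 - prior_mod, prior_mod]

    def emission(ps):
        # p-values at one nucleotide are assumed conditionally independent given the state;
        # clamping at 1e-12 keeps the Beta density finite.
        e_mod = 1.0
        for p in ps:
            e_mod *= alpha * max(p, 1e-12) ** (alpha - 1)
        return [1.0, e_mod]

    n = len(pvals)
    em = [emission(ps) for ps in pvals]

    # Forward pass, normalised at each position to avoid underflow.
    fwd = []
    for t in range(n):
        if t == 0:
            f = [init[k] * em[t][k] for k in range(2)]
        else:
            prev = fwd[t - 1]
            f = [sum(prev[j] * trans[j][k] for j in range(2)) * em[t][k] for k in range(2)]
        total = sum(f)
        fwd.append([x / total for x in f])

    # Backward pass, also normalised per position.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        b = [sum(trans[j][k] * em[t + 1][k] * bwd[t + 1][k] for k in range(2))
             for j in range(2)]
        total = sum(b)
        bwd[t] = [x / total for x in b]

    # Per-position scaling cancels when forward and backward terms are renormalised.
    post = []
    for f, b in zip(fwd, bwd):
        g = [f[k] * b[k] for k in range(2)]
        post.append(g[1] / sum(g))
    return post


post = bum_hmm_posterior([[0.9], [0.8], [0.001], [0.002], [0.7]])
print([round(x, 2) for x in post])
```

Because neighbouring nucleotides share the hidden-state chain, a strongly significant run raises the posterior of adjacent positions as well; setting `p_stay=0.5` removes that coupling and reduces the model to an independent Beta-Uniform mixture.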

We demonstrate on two yeast data sets that our method has greatly increased sensitivity, enabling the identification of modified regions on many more transcripts compared with existing pipelines. Our method also provides accurate and confident predictions at much lower coverage levels than those recommended in recent studies [6, 7], which are normally only met for a handful of transcripts in transcriptome-wide experiments. Our results show that statistical modelling greatly extends the scope and potential of transcriptome-wide structure probing experiments.
[1] Kubota et al. "Progress and challenges for chemical probing of RNA structure inside living cells." Nature Chemical Biology (2015).

[2] Wu et al. "Improved prediction of RNA secondary structure by integrating the free energy model with restraints derived from experimental probing data." Nucleic Acids Research (2015).

[3] Ouyang et al. "SeqFold: genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data." Genome Research (2013).

[4] Ding et al. "In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features." Nature (2014).

[5] Kielpinski et al. "Chapter Six - Reproducible Analysis of Sequencing-Based RNA Structure Probing Data with User-Friendly Tools." Methods in Enzymology (2015).

[6] Talkish et al. "Mod-seq: high-throughput sequencing for chemical probing of RNA structure." RNA (2014).

[7] Siegfried et al. "RNA motif discovery by SHAPE and mutational profiling (SHAPE-MaP)." Nature Methods (2014).
General Information
Organisations: Institute for Adaptive and Neural Computation.
Authors: Selega, Alina, Sirocchi, Christel, Iosub, Ira, Granneman, Sander & Sanguinetti, Guido.
Publication Date: 5 Dec 2016
Publication Information
Category: Poster
Original Language: English

Projects:
Investigation of RNA structure by integrating different data sources: RNA chemical conformation and RNA-protein interactions (PhD)