Analytical modeling and deep learning approaches to estimating RNA SHAPE reactivity from 3D structure

Hurst, Travis; Zhou, Yuanzhe; Chen, Shi-Jie

doi:10.4310/CIS.2019.v19.n3.a4

Communications in Information and Systems

Volume 19 (2019)

Number 3

Pages: 299 – 319

DOI: https://dx.doi.org/10.4310/CIS.2019.v19.n3.a4

Authors

Travis Hurst (Department of Physics, University of Missouri, Columbia, Mo., U.S.A.)

Yuanzhe Zhou (Department of Physics, University of Missouri, Columbia, Mo., U.S.A.)

Shi-Jie Chen (Departments of Physics and Biochemistry, and Institute of Data Sciences & Informatics, University of Missouri, Columbia, Mo., U.S.A.)

Abstract

The selective 2’-hydroxyl acylation analyzed by primer extension (SHAPE) chemical probing method provides information about RNA structure and dynamics at single-nucleotide resolution. To facilitate understanding of the relationship between nucleotide flexibility, SHAPE reactivity, and RNA 3D structure, we developed an analytical 3D Structure-SHAPE Relationship (3DSSR) method and a predictive convolutional neural network (CNN) model that predict the SHAPE reactivity from RNA 3D structures. Starting from an RNA 3D structure, the analytical model combines key factors into a composite function to predict the conformational flexibility of each nucleotide and calculates the correlation between the prediction and the experimental SHAPE reactivity. Here, we apply the 3DSSR and the deep learning SHAPE model to SHAPE data-assisted RNA 3D structure prediction. We show that the models provide an effective sieve to exclude 3D structures that are incompatible with experimental SHAPE data. Additionally, we compare the 3DSSR analytical model with the CNN deep learning model that recognizes structural and physical/chemical patterns to predict SHAPE data from RNA 3D structure. Depending on the training data set, the analytical model outperforms the deep learning approach for most test cases, indicating that insufficient data is available to adequately train the CNN at this juncture. For other test cases, the deep learning approach provides better predictions than the analytical model, suggesting that the deep learning approach may become increasingly promising as more SHAPE data becomes available.
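
The sieve idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which correlation measure or cutoff is used, so the Pearson coefficient, the `min_r` threshold, the function names `pearson` and `sieve`, and the toy reactivity values are all assumptions made for illustration. Each candidate 3D structure is assumed to come with a predicted per-nucleotide reactivity profile (from 3DSSR or the CNN), which is compared against the experimental profile.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length reactivity profiles."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no ranking information
    return cov / math.sqrt(var_x * var_y)


def sieve(candidates, experimental, min_r=0.5):
    """Keep candidates whose predicted profile correlates with experiment.

    `candidates` maps a structure name to its predicted per-nucleotide
    reactivities. `min_r` is a hypothetical cutoff, not a published value.
    Returns (name, r) pairs sorted from best to worst correlation.
    """
    scored = [(name, pearson(pred, experimental))
              for name, pred in candidates.items()]
    kept = [(name, r) for name, r in scored if r >= min_r]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy data (invented for illustration only).
experimental = [0.1, 0.9, 0.2, 0.8]
candidates = {
    "candidate_a": [0.2, 0.7, 0.1, 0.9],  # tracks the experimental pattern
    "candidate_b": [0.9, 0.1, 0.8, 0.2],  # inverted pattern, should be excluded
}

for name, r in sieve(candidates, experimental):
    print(f"{name}: r = {r:.2f}")
```

In this sketch, a structure whose predicted profile is anti-correlated with the experimental data falls below the cutoff and is dropped, while the surviving structures are ranked by agreement with experiment.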

T. Hurst and Y. Zhou contributed equally to this work.

The research of T. Hurst was supported by the NSF Graduate Research Fellowship Program under Grant 1443129.

The research of S.-J. Chen was supported by NIH Grants R01-GM063732 and R01-GM117059.

Received 1 August 2019

Published 6 December 2019