DIRECT Zhao group at Central China Normal University
DIRECT: RNA Contact Predictions by Integrating Structural Patterns
It is widely believed that tertiary nucleotide-nucleotide interactions are essential in determining RNA structure and function. Currently, direct coupling analysis (DCA) infers nucleotide contacts in a sequence from an alignment of its homologous sequences across different species. DCA and similar approaches that use sequence information alone typically yield low accuracy, especially when few homologous sequences are available. New methods for RNA structural contact inference are therefore desirable, because even a single correctly predicted tertiary contact can make the difference between a correctly and an incorrectly predicted structure. Here we present DIRECT (Direct Information REweighted by Contact Templates), a hybrid method that incorporates a Restricted Boltzmann Machine (RBM) to augment the information on sequence co-variations with structural templates in contact inference. Benchmark tests demonstrate that DIRECT achieves better overall performance than DCA approaches: on average, it improves contact prediction accuracy by 41% and 18% compared with mfDCA and plmDCA, respectively. DIRECT also improves predictions for long-range contacts and captures more tertiary structural features.
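The idea of reweighting DI scores with a contact template can be illustrated with a minimal sketch. The element-wise product used here is an assumption made for illustration; the actual combination rule is the one implemented in the DIRECT scripts. The function names, the sequence-separation threshold, and all numbers are hypothetical.

```python
# Illustrative only: the element-wise product below is an assumed
# combination rule, not necessarily the one used by DIRECT.

def reweight_scores(di, weight):
    """Multiply each DI score by the matching template weight."""
    n = len(di)
    return [[di[i][j] * weight[i][j] for j in range(n)] for i in range(n)]


def rank_pairs(scores, min_separation=4):
    """Return (i, j, score) tuples for i < j, highest score first.

    Pairs closer than min_separation along the sequence are skipped
    (the threshold is an assumption for this sketch).
    """
    n = len(scores)
    pairs = [
        (i, j, scores[i][j])
        for i in range(n)
        for j in range(i + min_separation, n)
    ]
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Toy 6-nucleotide example with made-up numbers.
di = [[0.0] * 6 for _ in range(6)]
weight = [[1.0] * 6 for _ in range(6)]
di[0][5] = di[5][0] = 0.30
di[1][5] = di[5][1] = 0.40
weight[0][5] = weight[5][0] = 2.0  # template favors this pair
weight[1][5] = weight[5][1] = 0.5  # template disfavors this pair

ranked = rank_pairs(reweight_scores(di, weight))
print(ranked[0])  # the pair (0, 5) now outranks (1, 5)
```

In this toy case the pair with the lower raw DI score ends up on top because the template weight favors it, which is the qualitative effect the reweighting is meant to achieve.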
(in collaboration with Prof. Chen Zeng at George Washington University)
DIRECT is available:
Instructions for using DIRECT
1. Get rbm_weight (contact template): run rbmDCA_getweight.m
2. Reproduce Figure 1: run rbmDCA_riboswotch.m (note: change the PDB ID on lines 5, 21, 24, and 87 accordingly)
3. Conservation of residues in the top predicted pairs: run contact_pair_conservation.m; the result is rbmDI_correct_psotion (5 columns: res1, res2, DIRECT score, conservation of res1, conservation of res2; note: change the PDB ID on line 6)
4. Contact type for each predicted pair (the results can be used to reproduce Figure 3): run contact_conservation_type.m; the results are combination_DI_correct_pred.mat and combination_rbmDI_correct_pred.mat (each is an array of 6 entries: low-low, low-mid, low-high, mid-mid, mid-high, high-high, where low, mid, and high stand for residues with conservation 1-3, 4-6, and 7-9; note: change the PDB ID on lines 5, 20, 23, and 26 accordingly)
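The six-entry contact-type array described in the last step can be sketched as follows. This is a Python illustration, not the MATLAB code in contact_conservation_type.m. It assumes rows laid out like the five-column result of contact_pair_conservation.m; the function names and the sample rows are hypothetical.

```python
from collections import Counter

PAIR_TYPES = ["low-low", "low-mid", "low-high",
              "mid-mid", "mid-high", "high-high"]
_ORDER = {"low": 0, "mid": 1, "high": 2}


def conservation_level(score):
    """Map a conservation score (1-9) to 'low', 'mid', or 'high'."""
    if not 1 <= score <= 9:
        raise ValueError(f"conservation score out of range: {score}")
    if score <= 3:
        return "low"
    if score <= 6:
        return "mid"
    return "high"


def pair_type(cons1, cons2):
    """Return the order-independent type of a residue pair, e.g. 'low-high'."""
    a, b = sorted(
        (conservation_level(cons1), conservation_level(cons2)),
        key=_ORDER.get,
    )
    return f"{a}-{b}"


def count_pair_types(rows):
    """Count pair types for rows of (res1, res2, score, cons1, cons2)."""
    counts = Counter(pair_type(r[3], r[4]) for r in rows)
    return [counts[t] for t in PAIR_TYPES]


# Made-up rows in the five-column layout described above.
rows = [
    (3, 40, 0.91, 9, 8),   # high-high
    (10, 55, 0.75, 2, 7),  # low-high
    (12, 30, 0.60, 5, 1),  # low-mid
    (18, 44, 0.42, 4, 6),  # mid-mid
]
print(count_pair_types(rows))  # [0, 1, 1, 1, 0, 1]
```

Sorting the two levels before joining them makes the type independent of which residue is listed first, so (5, 1) and (1, 5) both count as low-mid.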
"./direct_information" contains DI scores (mat files) from DCA.
"./distance_matrix" contains distance matrices (txt files) for testing.
"./riboswitch_conservation_score" contains the conservation scores for each residue (txt files) in the testing riboswitches.
"./riboswitch_MSA" contains multiple sequence alignments (fasta) of riboswitches used by DCA.
"./training_riboswitch_data_and_script" contains the distance matrices (txt files) for training.
"./weight" contains RBM-based weights (mat files) for riboswitches (contact templates).
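Predicted pairs are typically scored against a distance matrix like the ones listed above. The sketch below shows one way to do that in Python. The 8 Å contact cutoff, the minimum sequence separation, the function names, and the toy data are all assumptions for illustration, not values taken from the DIRECT scripts.

```python
def true_contacts(dist, cutoff=8.0, min_separation=4):
    """Return the set of (i, j) pairs, i < j, closer than the cutoff.

    cutoff (in angstroms) and min_separation are assumed values.
    """
    n = len(dist)
    return {
        (i, j)
        for i in range(n)
        for j in range(i + min_separation, n)
        if dist[i][j] <= cutoff
    }


def ppv(predicted_pairs, contacts):
    """Fraction of predicted pairs that are true contacts."""
    if not predicted_pairs:
        return 0.0
    hits = sum(
        1 for i, j in predicted_pairs
        if (min(i, j), max(i, j)) in contacts
    )
    return hits / len(predicted_pairs)


# Toy 6-nucleotide distance matrix: an extended chain with one
# long-range contact between positions 0 and 5 (made-up numbers).
n = 6
dist = [[3.5 * abs(i - j) for j in range(n)] for i in range(n)]
dist[0][5] = dist[5][0] = 6.0

contacts = true_contacts(dist)
predicted = [(0, 5), (1, 5)]
print(ppv(predicted, contacts))  # 0.5
```

Here one of the two predicted pairs falls within the cutoff, so the positive predictive value is 0.5.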
RNA tertiary structure prediction
We predicted RNA tertiary structures using 3dRNA, RNAComposer, SimRNA, and Vfold3D. For each prediction, we submitted the corresponding sequence and secondary structure to the RNA structure modeling servers. All tertiary structures were predicted automatically.
For any questions about the DIRECT program, please email firstname.lastname@example.org or email@example.com.
Prof. Yunjie Zhao
Central China Normal University
Email: firstname.lastname@example.org

Prof. Chen Zeng
George Washington University
Email: email@example.com
Copyright 2019, Lab of Biophysics