DeepMethyl's Guide
Last updated: 11/21/2015
Abstract
DeepMethyl is a web server for predicting the DNA methylation state of CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We have built stacked denoising autoencoder (SdA) models for the immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines, based on reduced representation bisulfite sequencing (RRBS) experimental data.
Prediction Performance.
A total of 109 features are used for the SdA models, including 18 genome topological features. The accuracies of the models were tested on chromosomes 1 and 21 using 5-fold cross-validation.
With genome topological features, the SdA prediction model for CpG sites achieves an accuracy of 84.82% for GM12878 and 72.01% for K562.
Without any genome topological features, the SdA prediction model for CpG sites achieves an accuracy of 84.25% for GM12878 and 69.95% for K562.
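To make the comparison concrete, the short sketch below computes how many percentage points the genome topological features add for each cell line. Only the four accuracy figures come from this guide; the dictionary layout and the function name topology_gain are illustrative assumptions, not part of DeepMethyl.

```python
# Accuracies (%) as quoted in this guide.
# The data layout and function name are illustrative assumptions.
ACCURACY = {
    "GM12878": {"with_topology": 84.82, "without_topology": 84.25},
    "K562": {"with_topology": 72.01, "without_topology": 69.95},
}


def topology_gain(cell_line: str) -> float:
    """Accuracy gain (percentage points) from adding topological features."""
    acc = ACCURACY[cell_line]
    # Round to two decimals to hide binary floating-point noise.
    return round(acc["with_topology"] - acc["without_topology"], 2)


if __name__ == "__main__":
    for cell_line in ACCURACY:
        print(f"{cell_line}: +{topology_gain(cell_line):.2f} percentage points")
```

By these figures the gain is 0.57 percentage points for GM12878 and 2.06 percentage points for K562.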
User Instructions.
1. Enter your e-mail address.
The job result will be sent to this address once the job is done.
2. Specify the species.
Currently, our models are available only for human.
3. Specify the cell line.
We have built SdA models for the immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines, based on RRBS experimental data.
4. Specify the chromosome.
5. Specify the strand.
We benchmarked and tested our SdA models only for CpG sites on the positive strand of the chromosomes. The program now supports predictions for CpG sites on both strands, so you can specify the strand on which your target dinucleotide site lies.
6. Specify the genome position of your target site.
The genome position must be given in human reference genome coordinates (version GRCh37/hg19). The target site should be a CpG (or TpG, which is a CpG mutation) site. If your input position is neither, we will search for the first CpG (or TpG) site in the 100-bp downstream sequence and use the position of that site instead. The result email will indicate the position of the target CpG site.
If the target CpG site is not covered by any three-dimensional genome topology information (no Hi-C contacts within 50,000 bp of the site), we will use an alternate model instead and note this in the email.
If the target CpG site is already covered by the RRBS experiment, we will send the experimental result instead of making a prediction.
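The rules in step 6 can be sketched as follows. This is a minimal illustration and not the server's actual code. The function names, the 0-based indexing into a plain sequence string, the inclusive window boundaries, and the order of the checks (RRBS coverage before Hi-C coverage) are all assumptions. Only the 100-bp search window and the 50,000-bp Hi-C distance come from this guide.

```python
from typing import Iterable, Optional, Set

TARGET_DINUCLEOTIDES = ("CG", "TG")  # CpG, or TpG (a CpG mutation)
SEARCH_WINDOW_BP = 100               # downstream search window from the guide
HIC_WINDOW_BP = 50_000               # Hi-C coverage distance from the guide


def resolve_target_site(seq: str, pos: int,
                        window: int = SEARCH_WINDOW_BP) -> Optional[int]:
    """Return the index of the first CpG/TpG at pos or within `window` bp
    downstream of it, or None if there is none.

    Assumption: `seq` is the positive-strand sequence and `pos` is a
    0-based index into it (the server itself takes hg19 coordinates).
    """
    if pos < 0:
        raise ValueError("pos must be non-negative")
    seq = seq.upper()
    # Last index at which a full dinucleotide can still start.
    last_start = min(pos + window, len(seq) - 2)
    for i in range(pos, last_start + 1):
        if seq[i:i + 2] in TARGET_DINUCLEOTIDES:
            return i
    return None


def choose_result_source(site: int, rrbs_sites: Set[int],
                         hic_contacts: Iterable[int],
                         hic_window: int = HIC_WINDOW_BP) -> str:
    """Decide which kind of result would be reported for a site.

    Assumption: RRBS coverage is checked first, because an experimental
    result replaces any prediction.
    """
    if site in rrbs_sites:
        return "rrbs_experiment"
    if any(abs(contact - site) <= hic_window for contact in hic_contacts):
        return "topology_model"
    return "alternate_model"


if __name__ == "__main__":
    site = resolve_target_site("AAACGTTTG", 0)
    print(site, choose_result_source(site, set(), [40_000]))
```

In this sketch a position that already sits on a CpG or TpG is returned unchanged, and a position with no such site in range yields None, which a caller would need to report as an error.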