DeepMethyl's Guide

Last updated: 11/21/2015

 

Abstract

DeepMethyl is a web server for predicting the DNA methylation state of CpG dinucleotides using features inferred from three-dimensional genome topology (based on Hi-C) and DNA sequence patterns. We have built stacked denoising autoencoder (SdA) models for the immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878) cell lines, based on reduced representation bisulfite sequencing (RRBS) experimental data.

 

Prediction Performance.

The SdA models use 109 features in total, including 18 genome topological features. Model accuracy was tested on chromosomes 1 and 21 using 5-fold cross-validation.

The SdA prediction model for CpG sites using genome topological features can achieve an accuracy of 84.82% for GM12878 and 72.01% for K562.

The SdA prediction model for CpG sites without any genome topological features can achieve an accuracy of 84.25% for GM12878 and 69.95% for K562.
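To make the comparison between the two feature sets explicit, the short sketch below computes the accuracy gain attributable to the topological features. The numbers are the ones reported above; the dictionary layout and the function name `topology_gain` are illustrative only and are not part of DeepMethyl.

```python
# Accuracies (%) as reported in this guide.
ACCURACY = {
    "GM12878": {"with_topology": 84.82, "without_topology": 84.25},
    "K562": {"with_topology": 72.01, "without_topology": 69.95},
}


def topology_gain(cell_line):
    """Return the accuracy gain, in percentage points, from adding topological features."""
    acc = ACCURACY[cell_line]
    return round(acc["with_topology"] - acc["without_topology"], 2)


for cell_line in ACCURACY:
    print(f"{cell_line}: +{topology_gain(cell_line):.2f} percentage points")
```

On these figures the gain is larger for K562 than for GM12878.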

 

User Instructions.

1. Enter your e-mail address.

The result will be emailed to this address once the job is done.

2. Specify the species.

Currently, our models are available only for human.

3. Specify the cell line.

Models are available for two cell lines: immortalised myelogenous leukemia (K562) and healthy lymphoblastoid (GM12878). Both were built from RRBS experimental data.

4. Specify the chromosome.

5. Specify the strand.

We benchmarked and tested our SdA models only on CpG sites on the positive strand of the chromosomes. The program now supports predictions for CpG sites on both strands, so you can specify the strand on which your target dinucleotide site lies.

6. Specify the genome position of your target site.

The genome position must be given in coordinates of the human reference genome (version GRCh37/hg19). The target site should be a CpG site (or a TpG site, which is a mutated CpG). If the input position is neither, we search for the first CpG (or TpG) site in the 100-bp downstream sequence and use its position instead. The result email indicates the position of the target CpG site that was actually used.
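The downstream search described here can be sketched as follows. This is a minimal illustration under stated assumptions, not the server's actual code: the function name `resolve_target_site` is hypothetical, positions are treated as 1-based on the positive strand, the input position itself is checked before moving downstream, and `None` is returned when no site is found (the guide does not say what the server does in that case).

```python
def resolve_target_site(sequence, position, window=100):
    """Return the 1-based position of the CpG/TpG site to use, or None.

    `sequence` is the positive-strand chromosome sequence and `position`
    is 1-based. Both conventions are assumptions made for this sketch.
    """
    seq = sequence.upper()
    start = position - 1  # convert to a 0-based index
    if not 0 <= start < len(seq):
        raise ValueError("position is outside the sequence")
    # Check the given position first, then up to `window` bases downstream.
    # A dinucleotide needs two bases, so stop one base before the end.
    last = min(start + window, len(seq) - 2)
    for i in range(start, last + 1):
        if seq[i:i + 2] in ("CG", "TG"):
            return i + 1  # back to 1-based coordinates
    return None


# The input position 1 is 'A', so the search moves downstream to the CpG.
print(resolve_target_site("AAACGTT", 1))
```

When the input position is already a CpG or TpG site, the function returns it unchanged.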

If the target CpG site is not covered by any three-dimensional genome topology information (no Hi-C contacts within 50,000 bp of the site), we use an alternate model instead and note this in the email.

If the target CpG site is already covered by the RRBS experiment, we send the experimental result instead of making a prediction.
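The two fallback rules above amount to a small decision procedure, sketched below. This is only an illustration: the names `Outcome` and `choose_outcome` and the data structures are hypothetical, the order of the checks (RRBS coverage first) is an assumption, and so is treating the 50,000-bp limit as inclusive.

```python
from enum import Enum


class Outcome(Enum):
    EXPERIMENTAL = "RRBS experimental result"
    TOPOLOGY_MODEL = "SdA model with genome topological features"
    ALTERNATE_MODEL = "alternate model (no Hi-C coverage)"


HIC_WINDOW_BP = 50_000  # distance threshold stated in this guide


def choose_outcome(site, rrbs_sites, hic_contact_positions):
    """Decide what is reported for a target CpG site.

    `rrbs_sites` is a set of positions covered by the RRBS experiment and
    `hic_contact_positions` is an iterable of Hi-C contact positions on the
    same chromosome. Both inputs are hypothetical stand-ins for this sketch.
    """
    # Assumed precedence: an experimental measurement replaces any prediction.
    if site in rrbs_sites:
        return Outcome.EXPERIMENTAL
    # Otherwise, use the topology-aware model only if a Hi-C contact is nearby.
    has_contact = any(abs(p - site) <= HIC_WINDOW_BP for p in hic_contact_positions)
    return Outcome.TOPOLOGY_MODEL if has_contact else Outcome.ALTERNATE_MODEL


print(choose_outcome(1_000_000, rrbs_sites={2_000_000}, hic_contact_positions=[1_030_000]).value)
```

In the example call, the site has no RRBS coverage but has a Hi-C contact 30,000 bp away, so the topology-aware model is selected.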