Topologically Associating Domain Knowledge Base (TADKB) is an integrated resource for exploring topologically associating domains (TADs).

TADKB provides genome-wide TADs of eleven cell types: seven human (GM12878, HMEC, NHEK, IMR90, KBM7, K562, and HUVEC) and four mouse (CH12-LX, ES, NPC, and CN). Domains are called from normalized Hi-C data at 50 kb and 10 kb resolution using three domain callers: (1) directionality index, (2) GMAP, and (3) insulation score.
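
The per-bin signals behind two of these callers can be sketched as follows. This is a minimal illustration of the standard formulations (directionality index from Dixon et al. 2012, insulation score from Crane et al. 2015), not TADKB's own code. The function names, the `window` parameter, and the placement of the insulation square are assumptions, and the later steps that turn these signals into domains (an HMM over DI, boundary calling at insulation minima) are omitted.

```python
def directionality_index(matrix, window):
    """Directionality index per bin of a symmetric contact matrix.

    A = contacts of bin i with the `window` bins upstream,
    B = contacts of bin i with the `window` bins downstream,
    E = (A + B) / 2,  DI = sign(B - A) * ((A - E)^2 / E + (B - E)^2 / E).
    """
    n = len(matrix)
    di = []
    for i in range(n):
        a = sum(matrix[i][j] for j in range(max(0, i - window), i))
        b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
        e = (a + b) / 2
        if e == 0 or a == b:
            di.append(0.0)  # no bias in either direction
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


def insulation_score(matrix, window):
    """Raw insulation score per bin (mean contact in a window x window square).

    The square for bin i spans rows i-window..i-1 and columns i+1..i+window,
    so it measures contacts crossing bin i. Bins too close to either end get
    None. Low values indicate strong insulation (candidate TAD boundaries).
    """
    n = len(matrix)
    scores = [None] * n
    for i in range(window, n - window):
        cells = [matrix[r][c]
                 for r in range(i - window, i)
                 for c in range(i + 1, i + window + 1)]
        scores[i] = sum(cells) / len(cells)
    return scores
```

On a toy matrix with two 3-bin domains, DI flips from negative to positive at the domain boundary and the insulation score drops there.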

The 3D structures of TADs are predicted with multidimensional scaling (MDS) in three steps:

  1. Linearly rescale the Hi-C contacts into the range [1, 30], ignoring missing Hi-C contacts.
  2. Convert the rescaled contacts into wish spatial distances using y = (1/x)^(1/3), where x is a rescaled contact and y is the corresponding wish distance.
  3. Convert the wish spatial distances into 3D coordinates (the MDS step).
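
The three steps can be sketched in plain Python as below. This is an illustration under stated assumptions, not TADKB's implementation: the function names are hypothetical, missing contacts are passed as `None`, missing pairs are assigned the largest observed wish distance so the matrix is complete, and step 3 uses classical (Torgerson) MDS with power iteration, which may differ from the MDS variant TADKB actually used.

```python
import math
import random


def rescale_contacts(contacts, lo=1.0, hi=30.0):
    """Step 1: linearly map observed contacts into [lo, hi]; None marks a missing contact."""
    observed = [c for row in contacts for c in row if c is not None]
    cmin, cmax = min(observed), max(observed)
    span = (cmax - cmin) or 1.0  # guard against all contacts being equal
    return [[None if c is None else lo + (c - cmin) * (hi - lo) / span for c in row]
            for row in contacts]


def wish_distances(rescaled):
    """Step 2: wish distance y = (1/x)^(1/3) for each rescaled contact x.

    Assumption: missing off-diagonal pairs get the largest observed wish
    distance and the diagonal is 0, so classical MDS sees a complete matrix.
    """
    n = len(rescaled)
    dist = [[None if x is None else (1.0 / x) ** (1.0 / 3.0) for x in row] for row in rescaled]
    fill = max(dist[i][j] for i in range(n) for j in range(n)
               if i != j and dist[i][j] is not None)
    for i in range(n):
        for j in range(n):
            if i == j:
                dist[i][j] = 0.0
            elif dist[i][j] is None:
                dist[i][j] = fill
    return dist


def classical_mds(dist, dims=3, iters=1000, seed=0):
    """Step 3: classical MDS via shifted power iteration with deflation."""
    n = len(dist)
    d2 = [[d * d for d in row] for row in dist]
    mean = [sum(row) / n for row in d2]  # row means (= column means, D is symmetric)
    grand = sum(mean) / n
    # Double centring: B = -1/2 * J D^2 J, written element-wise.
    b = [[-0.5 * (d2[i][j] - mean[i] - mean[j] + grand) for j in range(n)] for i in range(n)]
    # Gershgorin bound: B + shift*I has no negative eigenvalues, so power
    # iteration returns the largest algebraic eigenvalues of B.
    shift = max(sum(abs(x) for x in row) for row in b)
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.random() - 0.5 for _ in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        for i in range(n):
            coords[i][k] = v[i] * math.sqrt(max(lam, 0.0))  # negative eigenvalue -> flat axis
        # Deflate: v's eigenvalue becomes -shift in B, i.e. 0 in B + shift*I.
        for i in range(n):
            for j in range(n):
                b[i][j] -= (lam + shift) * v[i] * v[j]
    return coords


if __name__ == "__main__":
    contacts = [[None, 40, 12, 3], [40, None, 35, 9], [12, 35, None, 28], [3, 9, 28, None]]
    for xyz in classical_mds(wish_distances(rescale_contacts(contacts))):
        print([round(c, 3) for c in xyz])
```

Classical MDS recovers coordinates only up to rotation and reflection, and it reproduces the wish distances exactly only when they are Euclidean; otherwise it gives a least-distortion approximation.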

We downloaded protein-coding gene data from Ensembl and lncRNA data from three databases: NONCODE 2016, LNCipedia 4.0, and lncRNAdb 2.0. Genes are mapped onto the TADs of each cell type by comparing their genomic positions. Because the three lncRNA databases do not share the same ID scheme, we merged LNCipedia 4.0 and lncRNAdb 2.0 into NONCODE 2016. As a result, for a selected NONCODE lncRNA, users can find the alternative lncRNA entries from the other two databases under its information tab, together with protein-binding data integrated from lncRNAtor.
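
The text does not state the exact rule for "comparing genomic positions". The sketch below assumes a gene is mapped to every TAD on the same chromosome that it overlaps, using half-open [start, end) coordinates, so a gene spanning a TAD boundary is assigned to both TADs. The function name and tuple layouts are hypothetical.

```python
import bisect


def map_genes_to_tads(tads, genes):
    """Assign each gene to every TAD it overlaps on the same chromosome.

    tads:  list of (tad_id, chrom, start, end), half-open [start, end)
    genes: list of (gene_id, chrom, start, end), half-open [start, end)
    Returns {tad_id: [gene_id, ...]} (TADs with no genes map to []).
    """
    by_chrom = {}
    for tad_id, chrom, start, end in tads:
        by_chrom.setdefault(chrom, []).append((start, end, tad_id))
    for chrom_tads in by_chrom.values():
        chrom_tads.sort()  # sort TADs by start position
    starts = {chrom: [t[0] for t in ts] for chrom, ts in by_chrom.items()}

    result = {tad_id: [] for tad_id, *_ in tads}
    for gene_id, chrom, gstart, gend in genes:
        chrom_tads = by_chrom.get(chrom, [])
        # Only TADs that start before the gene ends can overlap it.
        hi = bisect.bisect_left(starts.get(chrom, []), gend)
        for start, end, tad_id in chrom_tads[:hi]:
            if end > gstart:  # TAD ends after the gene starts -> overlap
                result[tad_id].append(gene_id)
    return result
```

The candidate scan is linear rather than stopping early, so nested or overlapping TADs from different callers are still handled correctly.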

TADKB uses the TM-score between the reconstructed 3D structures of two TADs as their structural similarity, and the Pearson correlation coefficient between their fold enrichments of chromatin states as their functional (chromatin-state) similarity. The TADs of a cell line are then clustered with a spectral clustering algorithm, once based on structural similarity and once based on functional similarity.
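
The functional-similarity input can be sketched as a pairwise Pearson matrix over per-TAD fold-enrichment vectors, assuming every vector lists the chromatin states in the same order. The names `pearson` and `functional_similarity_matrix` and the dict input are hypothetical, and the TM-score and spectral clustering steps are not shown. Spectral clustering generally expects non-negative affinities, so negative correlations would need some transformation first; the text does not say which one TADKB applies.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant profile; treated as 0 here (assumption)
    return sxy / math.sqrt(sxx * syy)


def functional_similarity_matrix(enrichment):
    """enrichment: {tad_id: [fold enrichment per chromatin state]}.

    Returns (sorted tad_ids, matrix) where matrix[i][j] is the Pearson
    correlation between the profiles of tad_ids[i] and tad_ids[j].
    """
    ids = sorted(enrichment)
    return ids, [[pearson(enrichment[a], enrichment[b]) for b in ids] for a in ids]
```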


A TADKB family is defined as a set of TADs that belong both to the same chromatin-state cluster and to the same structural cluster.
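
Given the two cluster assignments for one cell line, this definition amounts to grouping TADs by their (structural cluster, chromatin-state cluster) pair. The sketch below assumes each assignment is a dict from a TAD ID to a cluster label; the function name is hypothetical, and whether single-TAD groups count as families is not stated in the text, so all groups are returned here.

```python
from collections import defaultdict


def tad_families(structural_labels, chromatin_labels):
    """Group TADs that share both a structural and a chromatin-state cluster.

    structural_labels, chromatin_labels: {tad_id: cluster_label} for one cell line.
    Returns {(structural_label, chromatin_label): sorted list of tad_ids}.
    TADs missing from either assignment are skipped.
    """
    families = defaultdict(list)
    for tad_id, s_label in structural_labels.items():
        if tad_id in chromatin_labels:
            families[(s_label, chromatin_labels[tad_id])].append(tad_id)
    return {key: sorted(members) for key, members in families.items()}
```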