Binding

Description

The datasets provided here target binary (2-state) prediction of protein binding residues. We provide the following four datasets:

  • binding_metal.fasta: Binding to metal ions (0/1)
  • binding_nuclear.fasta: Binding to nucleic acids (0/1)
  • binding_small.fasta: Binding to small molecules (0/1)
  • binding_combined.fasta: Binding to metal ions, nucleic acids OR small molecules (0/1)
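The combined task is defined as the union of the three single-class tasks. The sketch below only illustrates that OR definition. It assumes per-residue labels stored as equally long 0/1 strings for the same sequence. That encoding and the function name `combine_labels` are hypothetical and are not taken from the dataset files. A residue gets label 1 in the combined task if it is labeled 1 for at least one ligand class:

```python
def combine_labels(metal: str, nuclear: str, small: str) -> str:
    """Residue-wise OR of three 0/1 label strings (hypothetical per-residue encoding)."""
    # All three strings are assumed to describe the same residues in the same order.
    if not (len(metal) == len(nuclear) == len(small)):
        raise ValueError("label strings must cover the same residues")
    # A residue counts as binding if any of the three ligand classes marks it with "1".
    return "".join(
        "1" if "1" in (m, n, s) else "0"
        for m, n, s in zip(metal, nuclear, small)
    )


print(combine_labels("0100", "0010", "0000"))  # 0110
```

In the released files the combined set may have been compiled independently rather than derived this way. Treat the function as a description of the label semantics, not as the compilation procedure.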

Dataset Compilation

The datasets were compiled from the data published in the bindEmbed repository.

Dataset Format

The datasets are provided in biotrainer-ready FASTA format. Each entry consists of a header line and the protein sequence. The header provides the sequence ID, the set the entry belongs to (train/val/test) and the target label.
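Below is a minimal sketch for reading such a file and grouping entries by set. It assumes the header starts with the sequence ID, followed by whitespace-separated `KEY=VALUE` attributes named `TARGET` and `SET`. Those attribute names, the per-residue 0/1 target strings and the example IDs `P1`/`P2` are illustrative assumptions, so check a few headers in the actual files before relying on them. The parser uses only the Python standard library:

```python
from collections import defaultdict


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    entries = []
    header, seq_parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if any.
            if header is not None:
                entries.append((header, "".join(seq_parts)))
            header, seq_parts = line[1:], []
        else:
            # Sequences may be wrapped over several lines.
            seq_parts.append(line)
    if header is not None:
        entries.append((header, "".join(seq_parts)))
    return entries


def parse_header(header):
    """Split a header into the sequence ID and its KEY=VALUE attributes (assumed layout)."""
    seq_id, *tokens = header.split()
    attributes = dict(token.split("=", 1) for token in tokens if "=" in token)
    return seq_id, attributes


def group_by_set(text):
    """Group (seq_id, sequence, target) records by their SET attribute."""
    splits = defaultdict(list)
    for header, sequence in parse_fasta(text):
        seq_id, attrs = parse_header(header)
        splits[attrs.get("SET")].append((seq_id, sequence, attrs.get("TARGET")))
    return dict(splits)


# Hypothetical two-entry file used only to demonstrate the parser.
example = """>P1 TARGET=00110 SET=train
MKVLA
>P2 TARGET=100 SET=test
GHS
"""
splits = group_by_set(example)
print(sorted(splits))  # ['test', 'train']
```

For real files, read the text with `open(path).read()` and pass it to `group_by_set`. biotrainer itself handles this parsing when training, so a helper like this is only useful for inspecting the data.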

Dataset Benchmarks

The bindEmbed paper reports benchmarks for these binding prediction tasks. TestSetNew46 serves as the independent test set for these datasets.

Citations

@Article{Littmann2021b,
  author    = {Littmann, Maria and Heinzinger, Michael and Dallago, Christian and Weissenow, Konstantin and Rost, Burkhard},
  journal   = {Scientific Reports},
  title     = {Protein embeddings and deep learning predict binding residues for various ligand classes},
  year      = {2021},
  issn      = {2045-2322},
  month     = dec,
  number    = {1},
  volume    = {11},
  doi       = {10.1038/s41598-021-03431-4},
  publisher = {Springer Science and Business Media LLC},
}

Data licensing

The raw data downloaded from the aforementioned publication is subject to the MIT license. Modified data available in this repository falls under the AFL-3.0 license.