
Available databases

Jamie Morton edited this page Feb 8, 2023 · 10 revisions

We have constructed TM-vec encoded databases for both CATH S100 and Swissprot.

The CATH S100 database can be found here and the corresponding metadata can be found here.

The CATH S100 files can also be downloaded with wget as follows:

wget https://users.flatironinstitute.org/thamamsy/public_www/cath_large.npy
wget https://users.flatironinstitute.org/thamamsy/public_www/cath_large_metadata.npy
wget https://users.flatironinstitute.org/thamamsy/public_www/cath-domain-seqs-large.fa
wget https://users.flatironinstitute.org/thamamsy/public_www/cath-domain-seqs-large.fai

The Swissprot database can be found here and the corresponding metadata can be found here.

The Swissprot files can also be downloaded with wget as follows:

wget https://users.flatironinstitute.org/thamamsy/public_www/swiss_large.npy
wget https://users.flatironinstitute.org/thamamsy/public_www/swiss_large_metadata.npy
wget https://users.flatironinstitute.org/thamamsy/public_www/swissprot_seq.fasta
wget https://users.flatironinstitute.org/thamamsy/public_www/swissprot_seq.fai

The metadata files contain protein annotation information and are not required for running the search command.
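Since the metadata is optional, the downloads above can be wrapped in a small script that skips it unless asked. The following is a minimal sketch, not part of TM-vec: the helper names `tmvec_urls` and `tmvec_download` and the `with-metadata` argument are hypothetical, and the URLs are simply the ones listed on this page. It uses `wget -c` to resume interrupted downloads and `-P` to choose a target directory.

```shell
#!/usr/bin/env bash
# Base URL copied from the wget commands on this page.
BASE_URL="https://users.flatironinstitute.org/thamamsy/public_www"

# Hypothetical helper (not part of TM-vec): print the download URLs for one
# database. Usage: tmvec_urls <cath|swiss> [with-metadata]
tmvec_urls() {
  case "$1" in
    cath)
      echo "$BASE_URL/cath_large.npy"
      echo "$BASE_URL/cath-domain-seqs-large.fa"
      echo "$BASE_URL/cath-domain-seqs-large.fai"
      if [ "${2:-}" = "with-metadata" ]; then
        echo "$BASE_URL/cath_large_metadata.npy"
      fi
      ;;
    swiss)
      echo "$BASE_URL/swiss_large.npy"
      echo "$BASE_URL/swissprot_seq.fasta"
      echo "$BASE_URL/swissprot_seq.fai"
      if [ "${2:-}" = "with-metadata" ]; then
        echo "$BASE_URL/swiss_large_metadata.npy"
      fi
      ;;
    *)
      echo "unknown database: $1 (expected cath or swiss)" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: download every URL for a database into a directory.
# Usage: tmvec_download <cath|swiss> [dest_dir] [with-metadata]
tmvec_download() {
  dest="${2:-.}"
  tmvec_urls "$1" "${3:-}" | while read -r url; do
    # -c resumes partial downloads; -P sets the output directory.
    wget -c -P "$dest" "$url"
  done
}
```

After sourcing the script, `tmvec_download swiss ./db` would fetch the three Swissprot files without the metadata, and `tmvec_download cath ./db with-metadata` would fetch all four CATH S100 files.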
