docs/source/adding_embedders.rst (+2 -2)
@@ -13,15 +13,15 @@ To add a new DNA LM to BEND, you need to implement a new Embedder class in ``be


 Implementing the ``load_model`` method
-======================================
+**************************************

 The ``load_model`` method should load the pretrained model and tokenizer, and store them as attributes of the class.
 Additionally, it should ensure that the model is in eval mode and move the model to ``device``, which is a global variable defined in ``bend.utils.embedders.py``.
 If there are other configurations that need to be set for the model, they should be set here as well, and if necessary be part of the ``load_model`` method's signature.


 Implementing the ``embed`` method
-=================================
+*********************************

 The ``embed`` method should take a list of DNA sequences as input and return a list of embeddings for each sequence. The input sequences are provided as a list of strings, where each string is a DNA sequence. The output embeddings should be a list of numpy arrays, where each numpy array is the embedding for the corresponding input sequence.
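
The interface described in this file can be sketched as follows. This is a minimal illustration, not BEND's actual code: ``BaseEmbedderSketch`` and ``OneHotEmbedder`` are hypothetical names, and a one-hot lookup table stands in for a pretrained model so that the example needs only numpy. A real embedder would load pretrained weights and a tokenizer in ``load_model``, put the model in eval mode, and move it to ``device``.

```python
import numpy as np


class BaseEmbedderSketch:
    """Hypothetical stand-in for an embedder base class (an assumption, not BEND's class)."""

    def __init__(self, *args, **kwargs):
        # Loading happens once, at construction time.
        self.load_model(*args, **kwargs)

    def load_model(self, *args, **kwargs):
        raise NotImplementedError

    def embed(self, sequences, *args, **kwargs):
        raise NotImplementedError


class OneHotEmbedder(BaseEmbedderSketch):
    """Toy embedder: a fixed lookup table plays the role of the pretrained model."""

    def load_model(self, alphabet="ACGT"):
        # Store "tokenizer" and "model" as attributes of the class instance.
        # With a real torch model, this is where eval mode and the device move would go.
        self.tokenizer = {base: i for i, base in enumerate(alphabet)}
        self.model = np.eye(len(alphabet), dtype=np.float32)

    def embed(self, sequences):
        # Input: list of DNA strings. Output: list of numpy arrays,
        # one array of shape (sequence_length, embedding_dim) per sequence.
        embeddings = []
        for seq in sequences:
            token_ids = [self.tokenizer[base] for base in seq.upper()]
            embeddings.append(self.model[token_ids])
        return embeddings


embedder = OneHotEmbedder()
embeddings = embedder.embed(["ACGT", "GGA"])
```

Each returned array has one row per input position, so sequences of different lengths yield arrays of different shapes, which is why the result is a list rather than a single stacked array.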
docs/source/hydra.rst (+7 -7)
@@ -9,7 +9,7 @@ supervised models on new tasks and datasets, you can do so by creating new Hydra

 A Note
 *********************
-Please be consistent in the naming of new tasks an embedders across the different configuration files.
+Please be consistent in the naming of new tasks and embedders across the different configuration files.
 This is required for the code to function correctly.

 Running new embedders
@@ -18,7 +18,7 @@ Running new embedders
 First, an embedder needs to be implemented as laid out in the tutorial on adding new embedders. To run a new embedder on tasks, you should extend the `conf/embedding/embed.yaml <https://github.com/frederikkemarin/BEND/tree/main/conf/embedding/embed.yaml>`_ file following the example below.
 This config file is used by the ``precompute_embeddings.py`` script to generate embeddings for the different tasks, as shown in the GitHub README.

-.. code-block::
+.. code-block:: yaml

     embedder_name:
         _target_ : bend.utils.embedders.NewEmbedderClass
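
Hydra uses the ``_target_`` key to name the class to construct and passes the remaining keys as keyword arguments. The sketch below imitates that idea with the standard library only. ``instantiate_sketch`` is a hypothetical helper, not Hydra's implementation, and ``fractions.Fraction`` stands in for an embedder class purely so the example runs without BEND installed.

```python
import importlib


def instantiate_sketch(config):
    """Build an object from a dict with a dotted ``_target_`` path (simplified illustration)."""
    cfg = dict(config)  # copy so the caller's dict is not modified
    module_path, _, class_name = cfg.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    # All remaining keys become keyword arguments of the constructor.
    return cls(**cfg)


# Stand-in for an entry such as ``embedder_name`` in the config above.
obj = instantiate_sketch(
    {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
)
```

This is why the argument names in the config entry need to match the parameters that the embedder's constructor, and hence its ``load_model`` method, accepts.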
@@ -33,7 +33,7 @@ E.g. typically, ``arg_1`` will be a name or path of the model. ``arg_2`` could b

 The ``embedder_name`` must also be added in the `conf/embedding/embed.yaml <https://github.com/frederikkemarin/BEND/tree/main/conf/embedding/embed.yaml>`_ under ``models``:

-.. code-block::
+.. code-block:: yaml

     models:
         - resnetlm
@@ -66,7 +66,7 @@ column can also indicate folds for cross-validation, as seen in BEND's Enhancer
 The `conf/embedding/embed.yaml <https://github.com/frederikkemarin/BEND/tree/main/conf/embedding/embed.yaml>`_ file configures for which
 ``splits`` embeddings are generated when running ``precompute_embeddings.py``:

-.. code-block::
+.. code-block:: yaml

     splits :
         - train
@@ -77,7 +77,7 @@ If ``splits`` is set to ``null``, all splits in the ``split`` column will be gen

 In this file, add the name of the new dataset/task under ``tasks``, and append a new config entry indicating the files and how to process them:

-.. code-block::
+.. code-block:: yaml

     tasks :
         - new_task_name
@@ -106,7 +106,7 @@ To train models on the new task, you should add a ``new_task`` directory to
 This directory needs to be populated with a config file for each model that should be trained on the task.
 Below is an example of one such config file.

-.. code-block::
+.. code-block:: yaml

     defaults:
         - datadims : [label_dims,embedding_dims]
@@ -149,4 +149,4 @@ Below is an example of one such config file.
     wandb:
         mode : disabled

-After having run ``precompute_embeddings.py``, you can run ``train_on_task.py`` as indicated in the GitHub README!
+After having run ``precompute_embeddings.py``, you can now run ``train_on_task.py`` as indicated in the GitHub README!