Commit d378a55

Merge pull request frederikkemarin#41 from frederikkemarin/docs
Merge docs to main
2 parents 7854f1a + acd3423 commit d378a55

4 files changed: +14 -14 lines changed

bend/utils/__init__.py

+2 -5
@@ -4,11 +4,8 @@
 This module contains a collection of utilities used throughout the project for
 data processing, model training, and evaluation.

-- :class:`~bend.utils.retrieve_from_bed.Annotation`: a class for retrieving
-sequences from a reference genome based on a bed file.
-
-- :class:`~bend.utils.task_trainer.TaskTrainer`: a class for training a model
-on a given task.
+- :class:`~bend.utils.retrieve_from_bed.Annotation`: a class for retrieving sequences from a reference genome based on a bed file.
+- :class:`~bend.utils.task_trainer.TaskTrainer`: a class for training a model on a given task.


 """

docs/source/adding_embedders.rst

+2 -2
@@ -13,15 +13,15 @@ To add a new DNA LM to BEND, you need to implement a new Embedder class in ``be


 Implementing the ``load_model`` method
-======================================
+**************************************

 The ``load_model`` method should load the pretrained model and tokenizer, and store them as attributes of the class.
 Additionally, it should ensure that the model is in eval mode and move the model to ``device``, which is a global variable defined in ``bend.utils.embedders.py``.
 If there are other configurations that need to be set for the model, they should be set here as well, and if necessary be part of the ``load_model`` method's signature.


 Implementing the ``embed`` method
-=================================
+*********************************

 The ``embed`` method should take a list of DNA sequences as input and return a list of embeddings for each sequence. The input sequences are provided as a list of strings, where each string is a DNA sequence. The output embeddings should be a list of numpy arrays, where each numpy array is the embedding for the corresponding input sequence.

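
The ``load_model`` / ``embed`` contract described in this file can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: ``ToyEmbedder`` is a hypothetical standalone class (it does not inherit from any BEND class), a random lookup table stands in for a pretrained model and tokenizer, and the per-position output shape ``(sequence length, embedding_dim)`` is a choice of this sketch rather than something the documentation above specifies.

```python
import numpy as np


class ToyEmbedder:
    """Hypothetical stand-in for an embedder; not part of BEND."""

    def __init__(self, *args, **kwargs):
        # Assumption: construction simply forwards its arguments to load_model.
        self.load_model(*args, **kwargs)

    def load_model(self, embedding_dim=8, seed=0):
        # A real implementation would load a pretrained model and tokenizer,
        # put the model in eval mode, and move it to the device.
        # Here a random lookup table plays the role of the model.
        rng = np.random.default_rng(seed)
        self.vocab = {base: i for i, base in enumerate("ACGTN")}
        self.table = rng.normal(size=(len(self.vocab), embedding_dim))

    def embed(self, sequences):
        # Return one numpy array per input sequence, in the same order.
        embeddings = []
        for seq in sequences:
            # Unknown characters are mapped to the "N" token.
            ids = [self.vocab.get(base, self.vocab["N"]) for base in seq.upper()]
            embeddings.append(self.table[ids])
        return embeddings


embedder = ToyEmbedder(embedding_dim=4)
embeddings = embedder.embed(["ACGT", "GGNTTA"])
print([e.shape for e in embeddings])  # [(4, 4), (6, 4)]
```

The point of the sketch is the interface: a list of strings goes in, and a list of numpy arrays of the same length comes out.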

docs/source/conf.py

+3
@@ -39,3 +39,6 @@
 # html_theme = 'alabaster'
 html_static_path = ['_static']
 autodoc_mock_imports = ['wandb']
+
+
+autoclass_content = 'both'
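
For context, Sphinx autodoc's ``autoclass_content = 'both'`` setting makes ``autoclass`` output contain the class docstring followed by the ``__init__`` docstring. The sketch below only mimics that concatenation with the standard library to show the effect; ``Example`` and ``combined_doc`` are hypothetical names, and this is not how Sphinx implements the option.

```python
import inspect


class Example:
    """Class-level summary."""

    def __init__(self, path):
        """Constructor docstring describing ``path``."""
        self.path = path


def combined_doc(cls):
    # Approximates the visible effect of autoclass_content = 'both':
    # the class docstring, then the __init__ docstring.
    parts = [inspect.getdoc(cls), inspect.getdoc(cls.__init__)]
    return "\n\n".join(part for part in parts if part)


doc = combined_doc(Example)
print(doc)
```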

docs/source/hydra.rst

+7 -7
@@ -9,7 +9,7 @@ supervised models on new tasks and datasets, you can do so by creating new Hydra

 A Note
 *********************
-Please be consistent in the naming of new tasks an embedders across the different configuration files.
+Please be consistent in the naming of new tasks and embedders across the different configuration files.
 This is required for the code to function correclty.

 Running new embedders
@@ -18,7 +18,7 @@ Running new embedders
 First, an embedder needs to be implemented as laid out in the tutorial on adding new embedders. To run a new embedder on tasks, you should extend the `conf/embedding/embed.yaml <https://github.com/frederikkemarin/BEND/tree/main/conf/embedding/embed.yaml>`_ file following the example below.
 This config file is used by the ``precompute_embeddings.py`` script to generate embeddings for the different tasks, as shown in the GitHub README.

-.. code-block::
+.. code-block:: yaml

 embedder_name:
 _target_ : bend.utils.embedders.NewEmbedderClass
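
In Hydra configs, the ``_target_`` key names the class to build, and the remaining keys are passed to it as keyword arguments. The sketch below is a simplified, standard-library illustration of that idea, not Hydra's implementation (``hydra.utils.instantiate`` handles nesting, partials, and more). ``instantiate_from_config`` is a hypothetical helper, and ``fractions.Fraction`` is a stand-in target chosen only so the example runs without BEND installed.

```python
import importlib


def instantiate_from_config(config):
    """Build an object from a dict with a dotted ``_target_`` path.

    Simplified illustration only; not Hydra's own implementation.
    """
    # Every key except _target_ becomes a keyword argument.
    kwargs = {key: value for key, value in config.items() if key != "_target_"}
    # Split "package.module.ClassName" into module path and class name.
    module_path, _, class_name = config["_target_"].rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)(**kwargs)


# Stand-in target from the standard library; a BEND config would instead
# point at something like bend.utils.embedders.NewEmbedderClass.
config = {"_target_": "fractions.Fraction", "numerator": 3, "denominator": 4}
obj = instantiate_from_config(config)
print(obj)  # 3/4
```

This is why the class path under ``_target_`` must be importable and why the other keys must match the constructor's argument names.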
@@ -33,7 +33,7 @@ E.g. typically, ``arg_1`` will be a name or path of the model. ``arg_2`` could b

 The ``embedder_name`` must also be added in the `conf/embedding/embed.yaml <https://github.com/frederikkemarin/BEND/tree/main/conf/embedding/embed.yaml>`_ under ``models``:

-.. code-block::
+.. code-block:: yaml

 models:
 - resnetlm
@@ -66,7 +66,7 @@ column can also indicate folds for cross-validation, as seen in BEND's Enhancer
 The `conf/embedding/embed.yaml <https://github.com/frederikkemarin/BEND/tree/main/conf/embedding/embed.yaml>`_ file configures for which
 ``splits`` embeddings are generated when running ``precompute_embeddings.py``:

-.. code-block::
+.. code-block:: yaml

 splits :
 - train
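
The ``splits`` behavior can be sketched in a few lines. According to the documentation quoted in the next hunk header, setting ``splits`` to ``null`` generates all splits found in the ``split`` column. The function below is a hypothetical illustration of that rule, not code from ``precompute_embeddings.py``; raising an error for unknown split names is a design choice of this sketch.

```python
def select_splits(split_column, splits=None):
    """Return the split names to process, in order of first appearance."""
    # Unique values of the split column, preserving order.
    available = list(dict.fromkeys(split_column))
    if splits is None:  # YAML ``null`` is loaded as Python ``None``
        return available
    missing = [name for name in splits if name not in available]
    if missing:
        raise ValueError(f"Unknown splits requested: {missing}")
    return list(splits)


split_column = ["train", "train", "valid", "test", "train"]
print(select_splits(split_column))             # ['train', 'valid', 'test']
print(select_splits(split_column, ["train"]))  # ['train']
```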
@@ -77,7 +77,7 @@ If ``splits`` is set to ``null``, all splits in the ``split`` column will be gen

 In this file, add the name of the new dataset/task under ``tasks``, and append a new config entry indicating the files and how to process them:

-.. code-block::
+.. code-block:: yaml

 tasks :
 - new_task_name
@@ -106,7 +106,7 @@ To train models on the new task, you should add a ``new_task`` directory to
 This directory needs to be populated with a config file for each model that should be trained on the task.
 Below is an example of one such config file.

-.. code-block::
+.. code-block:: yaml

 defaults:
 - datadims : [label_dims,embedding_dims]
@@ -149,4 +149,4 @@ Below is an example of one such config file.
 wandb:
 mode : disabled

-After having run ``precompute_embeddings.py``, you can run ``train_on_task.py`` as indicated in the GitHub README!
+After having run ``precompute_embeddings.py``, you can now run ``train_on_task.py`` as indicated in the GitHub README!
