Merge pull request frederikkemarin#41 from frederikkemarin/docs

frederikkemarin · web-flow · commit d378a55b1e65 · 2023-10-03T22:23:15.000+02:00
Merge docs to main
diff --git a/bend/utils/__init__.py b/bend/utils/__init__.py
@@ -4,11 +4,8 @@
 This module contains a collection of utilities used throughout the project for 
 data processing, model training, and evaluation.
 
-- :class:`~bend.utils.retrieve_from_bed.Annotation`: a class for retrieving
-    sequences from a reference genome based on a bed file.
-
-- :class:`~bend.utils.task_trainer.TaskTrainer`: a class for training a model
-    on a given task.
+- :class:`~bend.utils.retrieve_from_bed.Annotation`: a class for retrieving sequences from a reference genome based on a bed file.
+- :class:`~bend.utils.task_trainer.TaskTrainer`: a class for training a model on a given task.
     
 
 """
diff --git a/docs/source/adding_embedders.rst b/docs/source/adding_embedders.rst
@@ -13,15 +13,15 @@ To add a new DNA LM to BEND, you need to implement a new Embedder class in  ``be
 
 
 Implementing the ``load_model`` method
-======================================
+**************************************
 
 The ``load_model`` method should load the pretrained model and tokenizer, and store them as attributes of the class.
 Additionally, it should ensure that the model is in eval mode and move the model to ``device``, which is a global variable defined in ``bend.utils.embedders.py``.
 If there are other configurations that need to be set for the model, they should be set here as well, and if necessary be part of the ``load_model`` method's signature.
 
 
 Implementing the ``embed`` method
-=================================
+*********************************
 
 The ``embed`` method should take a list of DNA sequences as input and return a list of embeddings for each sequence. The input sequences are provided as a list of strings, where each string is a DNA sequence. The output embeddings should be a list of numpy arrays, where each numpy array is the embedding for the corresponding input sequence.
 
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -39,3 +39,6 @@
 # html_theme = 'alabaster'
 html_static_path = ['_static']
 autodoc_mock_imports = ['wandb']
+
+
+autoclass_content = 'both'
diff --git a/docs/source/hydra.rst b/docs/source/hydra.rst
@@ -9,7 +9,7 @@ supervised models on new tasks and datasets, you can do so by creating new Hydra
 
 A Note 
 *********************
-Please be consistent in the naming of new tasks an embedders across the different configuration files. 
+Please be consistent in the naming of new tasks and embedders across the different configuration files. 
 This is required for the code to function correctly.
 
 Running new embedders
@@ -18,7 +18,7 @@ Running new embedders
 First, an embedder needs to be implemented as laid out in the tutorial on adding new embedders. To run a new embedder on tasks, you should extend the `conf/embedding/embed.yaml <https://github.com/frederikkemarin/BEND/tree/main/conf/embedding/embed.yaml>`_ file following the example below.
 This config file is used by the ``precompute_embeddings.py`` script to generate embeddings for the different tasks, as shown in the GitHub README.
 
-.. code-block::
+.. code-block:: yaml
 
     embedder_name:
     _target_ : bend.utils.embedders.NewEmbedderClass
@@ -33,7 +33,7 @@ E.g. typically, ``arg_1`` will be a name or path of the model. ``arg_2`` could b
 
 The ``embedder_name`` must also be added in the `conf/embedding/embed.yaml <https://github.com/frederikkemarin/BEND/tree/main/conf/embedding/embed.yaml>`_ under ``models``:
 
-.. code-block::
+.. code-block:: yaml
 
     models:
     - resnetlm
@@ -66,7 +66,7 @@ column can also indicate folds for cross-validation, as seen in BEND's Enhancer
 The `conf/embedding/embed.yaml <https://github.com/frederikkemarin/BEND/tree/main/conf/embedding/embed.yaml>`_ file configures for which 
 ``splits`` embeddings are generated when running ``precompute_embeddings.py``:
 
-.. code-block::
+.. code-block:: yaml
 
   splits : 
     - train
@@ -77,7 +77,7 @@ If ``splits`` is set to ``null``, all splits in the ``split`` column will be gen
 
 In this file, add the name of the new dataset/task under ``tasks``, and append a new config entry indicating the files and how to process them:
 
-.. code-block::
+.. code-block:: yaml
 
   tasks : 
     - new_task_name 
@@ -106,7 +106,7 @@ To train models on the new task, you should add a ``new_task`` directory to
 This directory needs to be populated with a config file for each model that should be trained on the task.
 Below is an example of one such config file.
 
-.. code-block::
+.. code-block:: yaml
 
   defaults:
     - datadims : [label_dims,embedding_dims]
@@ -149,4 +149,4 @@ Below is an example of one such config file.
   wandb:
     mode : disabled 
 
-After having run ``precompute_embeddings.py``, you can run ``train_on_task.py`` as indicated in the GitHub README!
+After having run ``precompute_embeddings.py``, you can now run ``train_on_task.py`` as indicated in the GitHub README!
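
The ``load_model`` / ``embed`` contract described in the ``docs/source/adding_embedders.rst`` hunk can be illustrated with a minimal sketch. Everything here is hypothetical: ``ToyOneHotEmbedder`` is not part of BEND, it does not subclass BEND's real base class, and it assumes that the constructor forwards its arguments to ``load_model``. A one-hot lookup stands in for a pretrained model, so the sketch only shows the expected input and output shapes (a list of strings in, a list of numpy arrays out).

```python
from typing import Dict, List

import numpy as np


class ToyOneHotEmbedder:
    """Hypothetical stand-in for an embedder; not BEND's actual base class."""

    def __init__(self, *args, **kwargs):
        # Assumption: construction simply delegates to load_model.
        self.load_model(*args, **kwargs)

    def load_model(self, alphabet: str = "ACGT") -> None:
        # A real embedder would load the pretrained model and tokenizer here,
        # switch the model to eval mode, and move it to `device`.
        # This toy version only stores a base-to-index lookup as attributes.
        self.vocab: Dict[str, int] = {base: i for i, base in enumerate(alphabet)}
        self.dim = len(alphabet)

    def embed(self, sequences: List[str]) -> List[np.ndarray]:
        # One array per input sequence, each of shape (sequence_length, dim).
        embeddings = []
        for seq in sequences:
            arr = np.zeros((len(seq), self.dim), dtype=np.float32)
            for pos, base in enumerate(seq.upper()):
                idx = self.vocab.get(base)
                if idx is not None:  # unknown bases such as N stay all-zero
                    arr[pos, idx] = 1.0
            embeddings.append(arr)
        return embeddings


embedder = ToyOneHotEmbedder()
out = embedder.embed(["ACGT", "GGN"])
```

Extra model settings would be added as further ``load_model`` parameters, which is where the ``arg_1`` and ``arg_2`` entries of the YAML config would plausibly end up under the assumption above.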