Merge pull request #10 from NCEAS/coding-tips-dev

angelchen7 · web-flow · commit c4e09e5fc38e · 2024-04-29T12:50:08.000-07:00
Coding tips dev
diff --git a/_freeze/best_practices/execute-results/html.json b/_freeze/best_practices/execute-results/html.json
diff --git a/_freeze/modules_best-practices/file-paths/execute-results/html.json b/_freeze/modules_best-practices/file-paths/execute-results/html.json
@@ -1,7 +1,7 @@
 {
-  "hash": "6c3f32b1566130c17d7b77285dc44aae",
+  "hash": "0b74bcb0ee0f9019d9c30ea7a67b3169",
   "result": {
-    "markdown": "\nThis section contains our recommendations for handling **file paths**. When you code collaboratively (e.g., with GitHub), accounting for the difference between your folder structure and those of your colleagues becomes critical. Ideally your code should be completely agnostic about (1) the operating system of the computer it is running on (i.e., Windows vs. Mac) and (2) the folder structure of the computer. We can--fortunately--handle these two considerations relatively simply.\n\nThis may seem somewhat dry but it is worth mentioning that failing to use relative file paths is a significant hindrance to reproducibility (see [Trisovic et al. 2022](https://www.nature.com/articles/s41597-022-01143-6)).\n\n### 1. Preserve File Paths as Objects Using `file.path`\n\nDepending on the operating system of the computer, the slashes between folder names are different (`\\` versus `/`). The `file.path` function automatically detects the computer operating system and inserts the correct slash. We recommend using this function and assigning your file path to an object.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_path <- file.path(\"path\", \"to\", \"my\", \"file\")\nmy_path\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"path/to/my/file\"\n```\n:::\n:::\n\n\nOnce you have that path object, you can use it everywhere you import or export information to/from the code (with another use of `file.path` to get the right type of slash!).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Import\nmy_raw_data <- read.csv(file = file.path(my_path, \"raw_data.csv\"))\n\n# Export\nwrite.csv(x = data_object, file = file.path(my_path, \"tidy_data.csv\"))\n```\n:::\n\n\n### 2. Create Necessary Sub-Folders in the Code with `dir.create`\n\nUsing `file.path` guarantees that your code will work regardless of the upstream folder structure but what about the folders that you need to export or import things to/from? 
For example, say your `graphs.R` script saves a couple of useful exploratory graphs to the \"Plots\" folder, how would you guarantee that everyone running `graphs.R` *has* a \"Plots folder\"? You can use the `dir.create` function to create the folder in the code (and include your path object from step 1!).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create needed folder\ndir.create(path = file.path(my_path, \"Plots\"), showWarnings = FALSE)\n\n# Then export to that folder\nggplot2::ggsave(filename = file.path(my_path, \"Plots\", \"my_plot.png\"))\n```\n:::\n\n\nThe `showWarnings` argument of `dir.create` simply warns you if the folder you're creating already exists or not. There is no negative to \"creating\" a folder that already exists (nothing is overwritten!!) but the warning can be confusing so we can silence it ahead of time.\n\n### File Paths Summary\n\nWe strongly recommend following these guidelines so that your scripts work regardless of (1) the operating system, (2) folders \"upstream\" of the working directory, and (3) folders within the project. This will help your code by flexible and reproducible when others are attempting to re-run your scripts!\n",
+    "markdown": "\nThis section contains our recommendations for handling **file paths**. When you code collaboratively (e.g., with GitHub), accounting for the difference between your folder structure and those of your colleagues becomes critical. Ideally your code should be completely agnostic about (1) the operating system of the computer it is running on (i.e., Windows vs. Mac) and (2) the folder structure of the computer. We can--fortunately--handle these two considerations relatively simply.\n\nThis may seem somewhat dry but it is worth mentioning that failing to use relative file paths is a significant hindrance to reproducibility (see [Trisovic et al. 2022](https://www.nature.com/articles/s41597-022-01143-6)).\n\n### 1. Preserve File Paths as Objects Using `file.path`\n\nDepending on the operating system of the computer, the slashes between folder names are different (`\\` versus `/`). The `file.path` function automatically detects the computer operating system and inserts the correct slash. We recommend using this function and assigning your file path to an object.\n\n\n::: {.cell}\n\n```{.r .cell-code}\nmy_path <- file.path(\"path\", \"to\", \"my\", \"file\")\nmy_path\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n[1] \"path/to/my/file\"\n```\n:::\n:::\n\n\nOnce you have that path object, you can use it everywhere you import or export information to/from the code (with another use of `file.path` to get the right type of slash!).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Import\nmy_raw_data <- read.csv(file = file.path(my_path, \"raw_data.csv\"))\n\n# Export\nwrite.csv(x = data_object, file = file.path(my_path, \"tidy_data.csv\"))\n```\n:::\n\n\n### 2. Create Necessary Sub-Folders in the Code with `dir.create`\n\nUsing `file.path` guarantees that your code will work regardless of the upstream folder structure but what about the folders that you need to export or import things to/from? 
For example, say your `graphs.R` script saves a couple of useful exploratory graphs to the \"Plots\" folder, how would you guarantee that everyone running `graphs.R` *has* a \"Plots folder\"? You can use the `dir.create` function to create the folder in the code (and include your path object from step 1!).\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Create needed folder\ndir.create(path = file.path(my_path, \"Plots\"), showWarnings = FALSE)\n\n# Then export to that folder\nggplot2::ggsave(filename = file.path(my_path, \"Plots\", \"my_plot.png\"))\n```\n:::\n\n\nThe `showWarnings` argument of `dir.create` simply warns you if the folder you're creating already exists or not. There is no negative to \"creating\" a folder that already exists (nothing is overwritten!!) but the warning can be confusing so we can silence it ahead of time.\n\n### File Paths Summary\n\nWe strongly recommend following these guidelines so that your scripts work regardless of (1) the operating system, (2) folders \"upstream\" of the working directory, and (3) folders within the project. This will help your code by flexible and reproducible when others are attempting to re-run your scripts!\n\nAlso, for more information on how to read files in cloud storage locations such as Google Drive, Box, Dropbox, etc., please refer to our [Other Tutorials](https://nceas.github.io/scicomp.github.io/tutorials.html).",
     "supporting": [],
     "filters": [
       "rmarkdown/pagebreak.lua"
diff --git a/_freeze/tutorial-scaffold_jsonlite/execute-results/html.json b/_freeze/tutorial-scaffold_jsonlite/execute-results/html.json
@@ -0,0 +1,14 @@
+{
+  "hash": "c0f63e8f7edc4c7bbdaeaab27ed88ab8",
+  "result": {
+    "markdown": "The above section shows how R users can access their data stored on Google Drive, but how about other types of cloud storage like Box or Dropbox? If your data or your team's data is synced to the cloud through those tools, we recommend that all group members store relevant file paths in their own respective JSON file. Then everyone can read those file paths using the [`jsonlite` R package](https://cran.r-project.org/web/packages/jsonlite/index.html)! \n\nThe main advantage of this method is that you and your group members do not have to manually change the file paths in each script whenever a different person runs it!\n\n### Prerequisites\n\nTo follow along with this tutorial you will need to take the following steps:\n\n- [Download R](https://cran.r-project.org/)\n\n- [Download RStudio](https://www.rstudio.com/products/rstudio/download/)\n\n- Make sure you have access to your cloud storage files on your local machine\n\nFeel free to skip any steps that you have already completed!\n\n### Copy the desired file paths\n\nFirst, navigate to the folder(s) that contain the files that you and your team most frequently need to access. Copy the absolute path to each needed folder. On Mac, you can right-click and then \"Copy ... as Pathname\" (see below).\n\n<p align=\"center\">\n<img src=\"images/tutorial_jsonlite/jsonlite-1.png\" width = \"90%\" />\n</p>\n\nIf you have multiple paths, feel free to paste them into an empty text file for now.\n\n### Create the JSON file\n\nOnce you have the absolute file paths, open RStudio to the main working directory for your project. At the top left corner, click on File -> New File -> Text File. \n\nType the following lines into the file, except replace `YOUR_ABSOLUTE_PATH` with your path. Keep the quotation marks around the path. 
\n\n\n::: {.cell}\n\n```{.r .cell-code}\n{\n\"data_path\":\"YOUR_ABSOLUTE_PATH\"\n}\n```\n:::\n\n\nYou can customize the name of this path but make sure everyone in your team have the same name(s)! For example, if `data_path` refers to the folder containing all of the data for the group, then everyone should have a `data_path` in their own respective JSON file pointing to the same data folder. The absolute file path will be unique for each person, though. \n\nIf you have multiple paths, you can save them like so:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n{\n\"raw_data_path\":\"YOUR_ABSOLUTE_PATH\",\n\"tidy_data_path\":\"YOUR_ABSOLUTE_PATH\"\n}\n```\n:::\n\n\nSave this file as `paths.json` in your main working directory. \n\n### Put the JSON file in `gitignore`\n\nNavigate to the `gitignore` file of your project and list `paths.json` as one of the files to ignore. We don't want to push this file to GitHub since everyone's own `paths.json` will look different and you wouldn't want to accidentally overwrite your teammate's custom absolute path!\n\n### Install `jsonlite`\n\nIf you don't have `jsonlite` already, install it with:\n\n\n::: {.cell}\n\n```{.r .cell-code}\ninstall.packages(\"jsonlite\")\n```\n:::\n\n\n### Access your files in cloud storage\n\nNow whenever you want to access the files for your group, you can load `jsonlite` and run its `read_json()` function. If your path was not saved as `data_path` then in the code below, make sure to replace `data_path` with the actual name.\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Load jsonlite\nlibrary(\"jsonlite\")\n\n# Get the path to your files\npath_to_data <- jsonlite::read_json(\"paths.json\")$data_path\n```\n:::\n\n\nAnd `path_to_data` will contain the path to the folder where all your relevant files live! 
\n\nIf you combine this path with the [`file.path()` function](https://nceas.github.io/scicomp.github.io/best_practices.html#preserve-file-paths-as-objects-using-file.path) then you'll have a powerful, flexible tool for managing file paths!\n\nFor example, if `example.csv` lives in the folder that `path_to_data` points to, then you **and your team members** can read `example.csv` like so:\n\n\n::: {.cell}\n\n```{.r .cell-code}\n# Read the csv reproducibly\nexample <- read.csv(file = file.path(path_to_data, \"example.csv\"))\n```\n:::",
+    "supporting": [],
+    "filters": [
+      "rmarkdown/pagebreak.lua"
+    ],
+    "includes": {},
+    "engineDependencies": {},
+    "preserve": {},
+    "postProcess": true
+  }
+}
diff --git a/_freeze/tutorials/execute-results/html.json b/_freeze/tutorials/execute-results/html.json
diff --git a/best_practices.qmd b/best_practices.qmd
@@ -30,6 +30,10 @@ Check the headings below or in the table of contents on the right of this page t
 <img src="images/lter-photos/penguins.jpg" width="100%"/>
 </p>
 
+## Good Naming Conventions
+
+{{< include /modules_best-practices/naming-conventions.qmd >}}
+
 ## Package Loading
 
 {{< include /modules_best-practices/pkg-loading.qmd >}}
diff --git a/images/tutorial_jsonlite/jsonlite-1.png b/images/tutorial_jsonlite/jsonlite-1.png
diff --git a/modules_best-practices/file-paths.qmd b/modules_best-practices/file-paths.qmd
@@ -39,3 +39,5 @@ The `showWarnings` argument of `dir.create` simply warns you if the folder you'r
 ### File Paths Summary
 
 We strongly recommend following these guidelines so that your scripts work regardless of (1) the operating system, (2) folders "upstream" of the working directory, and (3) folders within the project. This will help your code be flexible and reproducible when others are attempting to re-run your scripts!
+
+Also, for more information on how to read files in cloud storage locations such as Google Drive, Box, Dropbox, etc., please refer to our [Other Tutorials](https://nceas.github.io/scicomp.github.io/tutorials.html).
diff --git a/modules_best-practices/naming-conventions.qmd b/modules_best-practices/naming-conventions.qmd
@@ -0,0 +1,12 @@
+When you first start working on a project with your group members, figuring out what to name your folders/files may not be at the top of your priority list. However, following a good naming convention will allow team members to quickly locate files and figure out what they contain. The organized naming structure will also allow new members of the group to be onboarded more easily! 
+
+Here is a summary of some naming tips that we recommend. These were taken from the [Reproducibility Best Practices module](https://lter.github.io/ssecr/mod_reproducibility.html#naming-tips) in the LTER's SSECR course. Please feel free to refer to the aforementioned link for more information.
+
+- Names should be informative
+  - An ideal file name should give some information about the file’s contents, purpose, and relation to other project files.
+  - For example, if you have a bunch of scripts that need to be run in order, consider adding step numbers to the start of each file name (e.g., "01_harmonize_data.R" or "step01_harmonize_data.R"). 
+- Names should avoid spaces and special characters
+  - Spaces and special characters (e.g., é, ü, etc.) in folder/file names may cause errors when someone with a Windows computer tries to read those file paths. You can replace spaces with delimiters like underscores or hyphens to increase machine readability. 
+- Follow a consistent naming convention throughout!
+  - If you and your group members find a naming convention that works, stick with it! Having a consistent naming convention is key to getting new collaborators to follow it. 
+  
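+The tips above can be sketched in a few lines of base R. This is only an illustration: `check_names` is a hypothetical helper (not part of any package), the script names are made up, and the set of "allowed" characters is one reasonable assumption rather than a rule from the linked module.
+
+```{r naming-sketch, eval = F}
+# Build zero-padded, ordered script names (the step names are made up)
+step_names <- sprintf("%02d_%s.R", 1:3, c("harmonize", "analyze", "visualize"))
+
+# Hypothetical helper: return the names that contain anything other than
+# letters, digits, periods, underscores, or hyphens
+check_names <- function(file_names) {
+  ok <- grepl(pattern = "^[A-Za-z0-9._-]+$", x = file_names, perl = TRUE)
+  file_names[!ok]
+}
+
+# Flags the name that contains a space
+check_names(file_names = c("01_harmonize_data.R", "my plot.png"))
+```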
diff --git a/tutorial-scaffold_jsonlite.qmd b/tutorial-scaffold_jsonlite.qmd
@@ -0,0 +1,85 @@
+The above section shows how R users can access their data stored on Google Drive, but what about other types of cloud storage like Box or Dropbox? If your data or your team's data is synced to the cloud through those tools, we recommend that each group member store the relevant file paths in their own JSON file. Then everyone can read those file paths using the [`jsonlite` R package](https://cran.r-project.org/web/packages/jsonlite/index.html)! 
+
+The main advantage of this method is that you and your group members do not have to manually change the file paths in each script whenever a different person runs it!
+
+### Prerequisites
+
+To follow along with this tutorial you will need to take the following steps:
+
+- [Download R](https://cran.r-project.org/)
+
+- [Download RStudio](https://www.rstudio.com/products/rstudio/download/)
+
+- Make sure you have access to your cloud storage files on your local machine
+
+Feel free to skip any steps that you have already completed!
+
+### Copy the desired file paths
+
+First, navigate to the folder(s) that contain the files that you and your team most frequently need to access. Copy the absolute path to each needed folder. On Mac, you can right-click the folder, hold the Option key, and then choose "Copy ... as Pathname" (see below).
+
+<p align="center">
+<img src="images/tutorial_jsonlite/jsonlite-1.png" width = "90%" />
+</p>
+
+If you have multiple paths, feel free to paste them into an empty text file for now.
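+
+If you would rather stay in R, base R's `normalizePath()` can report an absolute path for you. This is a minimal sketch: `"."` (the current working directory) is only a stand-in for the folder you actually need, and `winslash = "/"` asks for forward slashes on Windows.
+
+```{r copy-path-sketch, eval = F}
+# "." is a stand-in for the folder whose absolute path you need
+abs_path <- normalizePath(path = ".", winslash = "/", mustWork = TRUE)
+abs_path
+```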
+
+### Create the JSON file
+
+Once you have the absolute file paths, open RStudio to the main working directory for your project. At the top left corner, click on File -> New File -> Text File. 
+
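+Type the following lines into the file, replacing `YOUR_ABSOLUTE_PATH` with your path. Keep the quotation marks around the path. On Windows, write the path with forward slashes (or doubled backslashes) because a single backslash is an escape character in JSON. 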
+
+```{r create-json-1, eval = F}
+{
+"data_path":"YOUR_ABSOLUTE_PATH"
+}
+```
+
+You can customize the name of this path but make sure everyone on your team uses the same name(s)! For example, if `data_path` refers to the folder containing all of the data for the group, then everyone should have a `data_path` in their own JSON file pointing to the same data folder. The absolute file path will be unique for each person, though. 
+
+If you have multiple paths, you can save them like so:
+
+```{r create-json-2, eval = F}
+{
+"raw_data_path":"YOUR_ABSOLUTE_PATH",
+"tidy_data_path":"YOUR_ABSOLUTE_PATH"
+}
+```
+
+Save this file as `paths.json` in your main working directory. 
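+
+As an alternative to typing the file by hand, the same structure could be written from R with `jsonlite::write_json()`. This is a sketch rather than a required step, and the path shown is a made-up placeholder. One possible benefit is that `write_json()` escapes any backslashes for you.
+
+```{r create-json-sketch, eval = F}
+# Hypothetical absolute path; replace it with your own
+my_paths <- list(data_path = "/Users/me/Box/project-data")
+
+# auto_unbox = TRUE stores each path as a plain string instead of a one-element array
+jsonlite::write_json(x = my_paths, path = "paths.json", auto_unbox = TRUE, pretty = TRUE)
+```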
+
+### Put the JSON file in `.gitignore`
+
+Navigate to the `.gitignore` file of your project and list `paths.json` as one of the files to ignore. We don't want to push this file to GitHub since everyone's own `paths.json` will look different and you wouldn't want to accidentally overwrite your teammate's custom absolute path!
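+
+If you prefer to do this from R, the base R sketch below adds the entry only when it is not already listed. It assumes your working directory is the top level of the project, and it rewrites the whole ignore file with the new entry appended.
+
+```{r gitignore-sketch, eval = F}
+# Read the existing ignore list, if there is one
+ignore_file <- ".gitignore"
+ignored <- if (file.exists(ignore_file)) readLines(ignore_file, warn = FALSE) else character(0)
+
+# Add paths.json unless it is already listed
+if (!"paths.json" %in% ignored) {
+  writeLines(text = c(ignored, "paths.json"), con = ignore_file)
+}
+```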
+
+### Install `jsonlite`
+
+If you don't have `jsonlite` already, install it with:
+
+```{r install-json, eval = F}
+install.packages("jsonlite")
+```
+
+### Access your files in cloud storage
+
+Now whenever you want to access the files for your group, you can load `jsonlite` and run its `read_json()` function. If you saved your path under a name other than `data_path`, replace `data_path` in the code below with that name.
+
+```{r read-json-1, eval = F}
+# Load jsonlite
+library("jsonlite")
+
+# Get the path to your files
+path_to_data <- jsonlite::read_json("paths.json")$data_path
+```
+
+Now `path_to_data` contains the path to the folder where all your relevant files live! 
+
+If you combine this path with the [`file.path()` function](https://nceas.github.io/scicomp.github.io/best_practices.html#preserve-file-paths-as-objects-using-file.path) then you'll have a powerful, flexible tool for managing file paths!
+
+For example, if `example.csv` lives in the folder that `path_to_data` points to, then you **and your team members** can read `example.csv` like so:
+
+```{r read-json-2, eval = F}
+# Read the csv reproducibly
+example <- read.csv(file = file.path(path_to_data, "example.csv"))
+```
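+
+The `read_json()` call shown earlier assumes that `paths.json` exists and contains the expected name. If the name is missing, `$` returns `NULL` without an error. The sketch below wraps the lookup in a hypothetical helper, `get_path` (our own illustration, not a `jsonlite` function), that stops with a clearer message in either case.
+
+```{r read-json-sketch, eval = F}
+# Hypothetical helper: read one named path from a JSON file of paths
+get_path <- function(name, json_file = "paths.json") {
+  # Fail early if the JSON file has not been created yet
+  if (!file.exists(json_file)) {
+    stop("Could not find '", json_file, "'; create it in the main working directory first.")
+  }
+
+  # Read every stored path, then check that the requested name is present
+  paths <- jsonlite::read_json(path = json_file)
+  if (is.null(paths[[name]])) {
+    stop("No entry named '", name, "' in '", json_file, "'.")
+  }
+
+  paths[[name]]
+}
+```
+
+With this in place, `get_path(name = "data_path")` could stand in for the `read_json()` line, and its result can be passed to `file.path()` in the same way.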
diff --git a/tutorials.qmd b/tutorials.qmd
@@ -8,6 +8,10 @@ Some of the content that we produce is not as detailed as our full workshops but
 
 {{< include /tutorial-scaffold_googledrive-auth.qmd >}}
 
+## Using the `jsonlite` R Package
+
+{{< include /tutorial-scaffold_jsonlite.qmd >}}
+
 ## Building a Website with Quarto
 
 {{< include /tutorial-scaffold_quarto-website.qmd >}}