|
2 | 2 | "cells": [ |
3 | 3 | { |
4 | 4 | "cell_type": "markdown", |
5 | | - "id": "302fd57e-ae4f-4a22-95ad-a69573212a98", |
| 5 | + "id": "509e7882-7bc8-4180-9634-0be59e4baad7", |
6 | 6 | "metadata": {}, |
7 | 7 | "source": [ |
8 | 8 | "<div style=\"overflow: hidden;\">\n", |
|
22 | 22 | "\n", |
23 | 23 | "### Before we begin\n", |
24 | 24 | "\n", |
25 | | - "Currently (November, 2024) the required versions of gcr-catalogs and dataregistry are only available in the `desc-python-bleed` kernel. Make sure you have selected that kernel while running this tutorial.\n", |
| 25 | + "As of September 2025, the required versions of gcr-catalogs and dataregistry are available in the `desc-python` and `desc-python-bleed` kernels. Make sure you have selected one of those kernels while running this tutorial.\n", |
26 | 26 | "\n", |
27 | 27 | "If you haven't done so already, check out the [getting setup](https://lsstdesc.org/dataregistry/tutorial_setup.html) page from the documentation if you want to run this tutorial interactively." |
28 | 28 | ] |
|
143 | 143 | "outputs": [], |
144 | 144 | "source": [ |
145 | 145 | "from dataregistry import DataRegistry\n", |
146 | | - "from dataregistry.schema import DEFAULT_SCHEMA_PRODUCTION\n", |
| 146 | + "from dataregistry.schema import DEFAULT_NAMESPACE\n", |
147 | 147 | "\n", |
148 | 148 | "# Establish connection to the production schema\n", |
149 | | - "datareg = DataRegistry(schema=DEFAULT_SCHEMA_PRODUCTION)" |
| 149 | + "prod_schema = DEFAULT_NAMESPACE + \"_production\"\n", |
| 150 | + "datareg = DataRegistry(schema=prod_schema)" |
150 | 151 | ] |
151 | 152 | }, |
152 | 153 | { |
|
176 | 177 | }, |
177 | 178 | { |
178 | 179 | "cell_type": "markdown", |
179 | | - "id": "fa586592-2c2e-428b-b443-33ca26038add", |
| 180 | + "id": "f08bc754-4fee-44c0-8468-f2d82d2a9283", |
180 | 181 | "metadata": {}, |
181 | 182 | "source": [ |
182 | | - "That is a list of __all__ columns from __all__ tables, maybe more than we bargained for. Let's restrict it to columns in the `dataset` table." |
| 183 | + "By default that prints only the columns in the `dataset` table, which is the most interesting one for most purposes.\n", |
| 184 | + "Datasets can be associated with an \"execution\" - in practice this could be a run of a script or a job step in a pipeline. Here are the columns for that table:" |
183 | 185 | ] |
184 | 186 | }, |
185 | 187 | { |
|
191 | 193 | }, |
192 | 194 | "outputs": [], |
193 | 195 | "source": [ |
194 | | - "dataset_columns = [col for col in all_columns if col.startswith('dataset.')]\n", |
195 | | - "print(dataset_columns)" |
| 196 | + "execution_columns = datareg.Query.get_all_columns(table=\"execution\")\n", |
| 197 | + "print(execution_columns)" |
196 | 198 | ] |
197 | 199 | }, |
198 | 200 | { |
199 | 201 | "cell_type": "markdown", |
200 | | - "id": "ad32a278-694a-4364-8dcd-39cdc702039c", |
| 202 | + "id": "b43623bd-d903-4fab-881c-ec41a81e46b7", |
201 | 203 | "metadata": {}, |
202 | 204 | "source": [ |
203 | | - "Among the more interesting for our purposes are `name`, `relative_path`, `access_api`, `access_api_configuration` and `location_type`. In the case of catalogs registered with GCRCatalogs, `name` in the data registry is the same name GCRCatalogs uses to refer to it: the basename of the corresponding config file, not including the suffix `.yaml`. But keep in mind that, unlike GCRCatalog, the dataregistry always respects case in names\n", |
| 205 | + "Among the more interesting dataset columns for our purposes are `name`, `relative_path`, `access_api`, `access_api_configuration` and `location_type`. In the case of catalogs registered with GCRCatalogs, `name` in the data registry is the same name GCRCatalogs uses to refer to it: the basename of the corresponding config file, not including the suffix `.yaml`. But keep in mind that, unlike GCRCatalogs, the dataregistry always respects case in names.\n", |
204 | 206 | "\n", |
205 | 207 | "Let's look at those properties for the dataset `cosmoDC2_v1.1.4`." |
206 | 208 | ] |
|
294 | 296 | "source": [ |
295 | 297 | "It all looks pretty much as you would expect, except what happened to the value of `dataset.relative_path`? That doesn't look like a path. You can see the reason in the catalog's configuration: it's based on another catalog. Or you can see it in the value for `dataset.location_type`. \"meta_only\" means that the data registry is only storing metadata for the catalog; it is not keeping track of the (indirectly) associated files. The same thing would happen for a composite catalog: the data registry just stores the catalog's configuration. It doesn't know how to parse it as GCRCatalogs would." |
296 | 298 | ] |
297 | | - }, |
298 | | - { |
299 | | - "cell_type": "markdown", |
300 | | - "id": "5721858e-8e42-4285-9ef0-ead3d780e918", |
301 | | - "metadata": {}, |
302 | | - "source": [] |
303 | 299 | } |
304 | 300 | ], |
305 | 301 | "metadata": { |
306 | 302 | "kernelspec": { |
307 | | - "display_name": "desc-python-bleed", |
| 303 | + "display_name": "desc-python", |
308 | 304 | "language": "python", |
309 | | - "name": "desc-python-bleed" |
| 305 | + "name": "desc-python" |
310 | 306 | }, |
311 | 307 | "language_info": { |
312 | 308 | "codemirror_mode": { |
|
318 | 314 | "name": "python", |
319 | 315 | "nbconvert_exporter": "python", |
320 | 316 | "pygments_lexer": "ipython3", |
321 | | - "version": "3.12.7" |
| 317 | + "version": "3.12.11" |
322 | 318 | } |
323 | 319 | }, |
324 | 320 | "nbformat": 4, |
|