55 changes: 55 additions & 0 deletions Configuration.md
@@ -0,0 +1,55 @@
# GCR Catalogs Configuration

GCRCatalogs has several configuration parameters. For most users the
default settings will be appropriate, but some may need more control.
The parameters, and how to modify them, are described below.

## Site

Defaults for certain other parameters depend on where you're running.
The sites GCRCatalogs currently recognizes are "nersc", "in2p3"
and "nersc_public". If you are running at NERSC, GCRCatalogs will detect that
(from the `NERSC_HOST` environment variable) and use the value "nersc".

### How and When to Set Site

You may set the site explicitly by giving the
environment variable `DESC_GCR_SITE` one of the allowed values.
This is likely necessary if you are running at in2p3, or are
running at NERSC but want to access only the publicly released catalogs.
If none of the standard site values is appropriate, you'll need to set
values independently for the configuration parameters described in the
remaining sections.
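
A minimal sketch of setting the site from Python, assuming the variable is set before GCRCatalogs first looks up the site. The helper `resolve_site` is hypothetical and is not part of the package API. It only mirrors the lookup order described above: `DESC_GCR_SITE` first, then NERSC detection through `NERSC_HOST`.

```python
import os


def resolve_site(environ):
    """Illustrative only: return the site name using the documented order."""
    site = environ.get("DESC_GCR_SITE", "")
    if site:
        return site
    # No explicit site: fall back to detecting NERSC.
    if environ.get("NERSC_HOST", ""):
        return "nersc"
    return None


# Ask for the publicly released catalogs, even when running at NERSC.
os.environ["DESC_GCR_SITE"] = "nersc_public"
print(resolve_site(os.environ))  # prints nersc_public
```

An explicit `DESC_GCR_SITE` always wins over the detected value, which is why it is the way to reach "nersc_public" from a NERSC node.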

## Root Directory

Production catalog datasets are stored in a hierarchy under a dedicated
protected directory at the site. Within the catalog metadata the path
to the data is given relative to the root directory. Hence to access the
data one must know the location of the root directory in the local file
system. By default that value is determined from the site. However, there
are a couple of ways to set the root directory explicitly if
the per-site default is not what you need:

- It can be set from Python code (`GCRCatalogs.set_root_dir`).
- It can be set from a previously written user config file.
  See the docstring for the class `RootDirManager` for details.
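
As a rough illustration of why the root directory matters, the sketch below expands a root-relative path. It assumes that such paths are marked with the `^/` prefix (the value of `RootDirManager._ROOT_DIR_SIGNAL`) and that expansion is a simple join. The helper `resolve_path` and the catalog path are hypothetical, not the package's own code.

```python
import posixpath

ROOT_DIR_SIGNAL = "^/"  # same literal as RootDirManager._ROOT_DIR_SIGNAL


def resolve_path(path, root_dir):
    """Hypothetical helper: expand a root-relative path, leave others as-is."""
    if path.startswith(ROOT_DIR_SIGNAL):
        return posixpath.join(root_dir, path[len(ROOT_DIR_SIGNAL):])
    return path


root_dir = "/global/cfs/cdirs/lsst/shared"  # per-site default for "nersc"
print(resolve_path("^/example_catalog/data.hdf5", root_dir))
# prints /global/cfs/cdirs/lsst/shared/example_catalog/data.hdf5
```

Changing the root directory therefore relocates every root-relative path at once, without editing the catalog metadata.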

## Catalog Config Source

GCRCatalogs accesses production catalog data by keying off the catalog name
to discover the metadata needed to read the associated dataset. That metadata
has traditionally been stored in catalog config files in a directory of
the `gcr-catalogs` package. The same information is now also stored
in the DESC Data Registry database. GCRCatalogs needs to know whether to
use the files or the Data Registry as the source of this metadata. By
default GCRCatalogs will use the site to determine a value, currently
"dataregistry" when the site is "nersc" and "files" otherwise. One
may override this value by using either of the following methods:

- Set the environment variable `GCR_CONFIG_SOURCE` to the desired
  value ("dataregistry" or "files") before using GCRCatalogs.
- From within Python, invoke `GCRCatalogs.ConfigSource.set_config_source()`
  to use files, or
  `GCRCatalogs.ConfigSource.set_config_source(dr=True)`
  to use the Data Registry.
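
The order in which these settings are consulted can be sketched as a small pure function. This is a simplified reading of `check_for_reg` in `register.py`, not the function itself. The name `choose_config_source` and its parameters are hypothetical.

```python
import warnings

ALLOWED_SOURCES = ("dataregistry", "files")


def choose_config_source(dr_available, env_value, site_default):
    """Illustrative only: pick a config source using the documented order."""
    # Without the Data Registry code, files are the only option.
    if not dr_available:
        return "files"
    # The environment variable takes precedence over the per-site default.
    source = env_value or site_default
    if not source:
        warnings.warn("Unable to determine config source. Defaulting to 'files'")
        source = "files"
    if source not in ALLOWED_SOURCES:
        raise RuntimeError(f"Unknown value '{source}' for config source.")
    return source


print(choose_config_source(True, None, "dataregistry"))  # prints dataregistry
print(choose_config_source(True, "files", "dataregistry"))  # prints files
```

An explicit call to `set_config_source` sits outside this function: the automatic choice is only made when no source has been established yet.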
45 changes: 37 additions & 8 deletions GCRCatalogs/register.py
@@ -3,7 +3,7 @@
import yaml # now needed only for error reporting
import requests # now needed only for error reporting
from collections import namedtuple
from .root_dir_manager import RootDirManager
from .root_dir_manager import RootDirManager, get_site_name
from .catalog_helpers import load_yaml_local, load_yaml
from .base_config import BaseConfig, BaseConfigManager
from .dr_register import DR_AVAILABLE
@@ -20,7 +20,8 @@
_HERE = os.path.dirname(__file__)
_CONFIG_DIRNAME = "catalog_configs"
_CONFIG_DIRPATH = os.path.join(_HERE, _CONFIG_DIRNAME)
_SITE_CONFIG_PATH = os.path.join(_HERE, "site_config", "site_rootdir.yaml")
_SITE_CONFIG_DIR = os.path.join(_HERE, "site_config")
_SITE_CONFIG_INFO_PATH = os.path.join(_SITE_CONFIG_DIR, "site_info.yaml")
_CONFIG_SOURCE_ENV = "GCR_CONFIG_SOURCE"
_DR_SCHEMA_ENV = "GCR_DR_SCHEMA"
_DR_SCHEMA_DEFAULT = "lsst_desc_production"
@@ -111,18 +112,45 @@ def retrieve_paths(self, **kwargs):

# module-level functions that access/manipulate ConfigSource.config_source
def check_for_reg():
'''
Look to see if config source has already been established. If not,
attempt to establish it (must be either "dataregistry" or "files")
* if dataregistry code can't be imported, choose files
* else if environment variable GCR_CONFIG_SOURCE has a value, use that
* else try to make sensible choice based on site. Per-site default
values are stored in a file. For null or unrecognized site, issue
warning and use "files"
'''
if not ConfigSource.config_source:
if not DR_AVAILABLE:
ConfigSource.set_config_source()
return
else:
msg = f'''
Set env variable {_CONFIG_SOURCE_ENV} to acceptable value
("dataregistry" or "files") or call ConfigSource.set_config_source'''
("dataregistry" or "files"), revise file {_SITE_CONFIG_INFO_PATH} or
call ConfigSource.set_config_source with acceptable value'''
# See if user has set environment variable to select source
source = os.getenv(_CONFIG_SOURCE_ENV, None)
if not source:
raise RuntimeError("Registry source has not been established." + msg)
# Attempt to establish source from site

def get_config_source_from_site():
site = get_site_name()
if site and os.path.isfile(_SITE_CONFIG_INFO_PATH):
site_config = load_yaml_local(_SITE_CONFIG_INFO_PATH)
for k, v in site_config.items():
if k in site:
return v["config_source"]
return None

source = get_config_source_from_site()
if not source:
warnings.warn(
"Unable to determine config source. Defaulting to 'files'"
)
source = "files"

if source == "dataregistry":
ConfigSource.set_config_source(dr=True)
return
@@ -131,7 +159,7 @@ def check_for_reg():
return
else:
raise RuntimeError(
f"Unknown value {source} for GCR_CONFIG_SOURCE." + msg)
f"Unknown value '{source}' for config source." + msg)


def get_root_dir():
@@ -420,16 +448,17 @@ def set_config_source(dr=False, dr_root=None,
return elt[0]
# No existing config source with these parameters so make
# a new one
reg = DrConfigRegister(_SITE_CONFIG_PATH,
reg = DrConfigRegister(site_config_path=_SITE_CONFIG_INFO_PATH,
dr_root=dr_root,
dr_site=dr_site)
ConfigSource.dr_sources.append((reg, dr_params))
ConfigSource.config_source = reg
return reg
else:
if not ConfigSource.file_source:
ConfigSource.file_source = ConfigRegister(_CONFIG_DIRPATH,
_SITE_CONFIG_PATH)
ConfigSource.file_source = ConfigRegister(
_CONFIG_DIRPATH,
site_config_path=_SITE_CONFIG_INFO_PATH)
reg = ConfigSource.file_source
ConfigSource.config_source = reg
return reg
46 changes: 24 additions & 22 deletions GCRCatalogs/root_dir_manager.py
@@ -4,6 +4,24 @@
from .catalog_helpers import load_yaml_local
from .utils import is_string_like

_DESC_SITE_ENV = "DESC_GCR_SITE"

def get_site_name():
"""
Return a string which, when executing at a recognized site with
well-known name, will include the name for that site
"""

site_from_env = os.getenv(_DESC_SITE_ENV, "")
if site_from_env:
return site_from_env

if os.getenv("NERSC_HOST", ""):
site_from_node = 'nersc'
else:
site_from_node = None
return site_from_node


class RootDirManager:
_ROOT_DIR_SIGNAL = "^/"
@@ -20,7 +38,6 @@ class RootDirManager:
"meta_path",
)
_DICT_LIST_KEYS = ("catalogs",)
_DESC_SITE_ENV = "DESC_GCR_SITE"
_NO_DEFAULT_ROOT_WARN = """
Default root dir has not been set; catalogs may not be found.

@@ -64,34 +81,19 @@ def __init__(self, site_config_path=None, user_config_name=None):

# Try to set self._root_dir_from_site
if self._site_config:
site_info = self._get_site_info()
if site_info:
site_name = get_site_name()
if site_name:
for k, v in self._site_config.items():
if k in site_info:
self._default_root_dir = v
if k in site_name:
self._default_root_dir = v['root_dir']
break

def _get_site_info(self):
"""
Return a string which, when executing at a recognized site with
well-known name, will include the name for that site
"""
site_from_env = os.getenv(self._DESC_SITE_ENV, "")
if site_from_env:
return site_from_env

if os.getenv("NERSC_HOST", ""):
site_from_node = 'nersc'
else:
site_from_node = None
return site_from_node

@property
def root_dir(self):
current_root_dir = self._custom_root_dir or self._default_root_dir
if not current_root_dir:
site_string = ' '.join(self.site_list)
warnings.warn(self._NO_DEFAULT_ROOT_WARN.format(self._DESC_SITE_ENV, site_string))
warnings.warn(self._NO_DEFAULT_ROOT_WARN.format(_DESC_SITE_ENV, site_string))

return current_root_dir

@@ -193,7 +195,7 @@ def has_valid_root_dir_in_site_config(self):
root_dir = self.root_dir
return bool(
root_dir and
os.path.abspath(root_dir) in self._site_config.values() and
os.path.abspath(root_dir) in [v['root_dir'] for v in self._site_config.values()] and
os.path.isdir(root_dir) and
os.access(root_dir, os.R_OK)
)
11 changes: 11 additions & 0 deletions GCRCatalogs/site_config/site_info.yaml
@@ -0,0 +1,11 @@
nersc:
root_dir: /global/cfs/cdirs/lsst/shared
config_source: dataregistry

in2p3:
root_dir: /sps/lsst/groups/desc/shared
config_source: files

nersc_public:
root_dir: /global/cfs/cdirs/lsst/gsharing
config_source: files
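
Both `register.py` and `root_dir_manager.py` pick a site entry with a substring test (`k in site`). The sketch below runs that test against the entries above, assuming the YAML loader preserves the key order of the file. The function names are hypothetical, and `lookup_exact` is only one possible alternative, not what this change does.

```python
SITE_INFO = {
    "nersc": {"root_dir": "/global/cfs/cdirs/lsst/shared",
              "config_source": "dataregistry"},
    "in2p3": {"root_dir": "/sps/lsst/groups/desc/shared",
              "config_source": "files"},
    "nersc_public": {"root_dir": "/global/cfs/cdirs/lsst/gsharing",
                     "config_source": "files"},
}


def lookup_substring(site, site_info):
    """Return the first entry whose key is contained in the site string."""
    for key, value in site_info.items():
        if key in site:
            return value
    return None


def lookup_exact(site, site_info):
    """Return the entry whose key equals the site string, if any."""
    return site_info.get(site)


print(lookup_substring("nersc_public", SITE_INFO)["config_source"])  # prints dataregistry
print(lookup_exact("nersc_public", SITE_INFO)["config_source"])  # prints files
```

If the key order assumption holds, a site value of "nersc_public" would match the "nersc" entry first, because "nersc" is a substring of "nersc_public". That may be worth checking against the real loader and the intended behaviour.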
3 changes: 0 additions & 3 deletions GCRCatalogs/site_config/site_rootdir.yaml

This file was deleted.

2 changes: 1 addition & 1 deletion GCRCatalogs/version.py
@@ -1 +1 @@
__version__ = '1.10.0'
__version__ = '1.10.1'
8 changes: 6 additions & 2 deletions README.md
@@ -53,9 +53,9 @@ Confluence page (*DESC member only*).
#### DiffSky Series
*by Andrew Hearin, Eve Kovacs, Patricia Larsen, Esteban Rangel, Katrin Heitmann et al.*

- `roman_rubin_v1.1.3_elais`: This catalog is a variant of roman_rubin_v1.1.2_elais that uses an improved tuning of the SED model parameters. The resulting luminosity distributions are in better agreement with the validation data (COSMOS 2020 and DESCQA tests).
- `roman_rubin_v1.1.3_elais`: This catalog is a variant of roman_rubin_v1.1.2_elais that uses an improved tuning of the SED model parameters. The resulting luminosity distributions are in better agreement with the validation data (COSMOS 2020 and DESCQA tests).

- `roman_rubin_v1.1.2_elais`: This catalog was produced for the joint roman-desc image simulations using differentiable, forward modeling techniques. Predictions for the galaxy SEDs are based on their star-formation histories. Sky area covers the ELAIS field and is ~110 sq. deg.
- `roman_rubin_v1.1.2_elais`: This catalog was produced for the joint roman-desc image simulations using differentiable, forward modeling techniques. Predictions for the galaxy SEDs are based on their star-formation histories. Sky area covers the ELAIS field and is ~110 sq. deg.

- `roman_rubin_v1.1.1_elais`: DEPRECATED. This catalog was produced for the joint roman-desc image simulations using differentiable, forward modeling techniques. This catalog is deprecated as it contains a serious bug resulting in satellite galaxies that are too bright.

@@ -187,6 +187,10 @@ Note: DR3 processing is not fully completed; a few tracts are missing. Here `dr3
- `dc2_eimages_run1.2i_visit-181898`: one visit of e-images for Run 1.2i
- `dc2_eimages_run1.2p_visit-181898`: one visit of e-images for Run 1.2p

## GCRCatalogs Configuration

See [Configuration.md](./Configuration.md) for details.

## Using GCRCatalogs at NERSC

All catalogs available in `GCRCatalogs` are physically located at NERSC