Add initial documentation (#4)
sjperkins authored Sep 10, 2024
1 parent cff65e3 commit d527d26
Showing 11 changed files with 382 additions and 0 deletions.
22 changes: 22 additions & 0 deletions .github/workflows/readthedocs.yml
@@ -0,0 +1,22 @@
# .github/workflows/documentation-links.yml

name: readthedocs/actions
on:
  pull_request_target:
    types:
      - opened
    # Execute this action only on PRs that touch
    # documentation files.
    # paths:
    #   - "doc/**"

permissions:
  pull-requests: write

jobs:
  documentation-links:
    runs-on: ubuntu-latest
    steps:
      - uses: readthedocs/actions/preview@v1
        with:
          project-slug: "xarray-ms"
21 changes: 21 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,21 @@
# .readthedocs.yaml
# Read the Docs configuration file
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the OS, Python version and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"
  jobs:
    post_install:
      - pip install poetry==1.8.3
      - poetry config virtualenvs.create false
      # The optional dependency group is named "doc" in pyproject.toml
      - poetry install --with doc

# Build documentation in the "doc/source/" directory with Sphinx
sphinx:
  configuration: doc/source/conf.py
116 changes: 116 additions & 0 deletions README.rst
@@ -0,0 +1,116 @@
xarray-ms
=========

xarray-ms presents a Measurement Set v4 view (MSv4) over
`CASA Measurement Sets <https://casa.nrao.edu/Memos/229.html>`_ (MSv2).
It provides access to MSv2 data via the xarray API, allowing MSv4 compliant applications
to be developed on well-understood MSv2 data.

.. code-block:: python

    >>> import xarray_ms
    >>> import xarray
    >>> ds = xarray.open_dataset("/data/L795830_SB001_uv.MS/",
    ...                          chunks={"time": 2000, "baseline": 1000})
    >>> ds
    <xarray.Dataset> Size: 70GB
    Dimensions:                     (time: 28760, baseline: 2775, frequency: 16,
                                     polarization: 4, uvw_label: 3)
    Coordinates:
        antenna1_name               (baseline) object 22kB dask.array<chunksize=(1000,), meta=np.ndarray>
        antenna2_name               (baseline) object 22kB dask.array<chunksize=(1000,), meta=np.ndarray>
        baseline_id                 (baseline) int64 22kB dask.array<chunksize=(1000,), meta=np.ndarray>
      * frequency                   (frequency) float64 128B 1.202e+08 ... 1.204e+08
      * polarization                (polarization) <U2 32B 'XX' 'XY' 'YX' 'YY'
      * time                        (time) float64 230kB 1.601e+09 ... 1.601e+09
    Dimensions without coordinates: baseline, uvw_label
    Data variables:
        EFFECTIVE_INTEGRATION_TIME  (time, baseline) float64 638MB dask.array<chunksize=(2000, 1000), meta=np.ndarray>
        FLAG                        (time, baseline, frequency, polarization) uint8 5GB dask.array<chunksize=(2000, 1000, 16, 4), meta=np.ndarray>
        TIME_CENTROID               (time, baseline) float64 638MB dask.array<chunksize=(2000, 1000), meta=np.ndarray>
        UVW                         (time, baseline, uvw_label) float64 2GB dask.array<chunksize=(2000, 1000, 3), meta=np.ndarray>
        VISIBILITY                  (time, baseline, frequency, polarization) complex64 41GB dask.array<chunksize=(2000, 1000, 16, 4), meta=np.ndarray>
        WEIGHT                      (time, baseline, frequency, polarization) float32 20GB dask.array<chunksize=(2000, 1000, 16, 4), meta=np.ndarray>
    Attributes:
        antenna_xds:          <xarray.Dataset> Size: 4kB\nDimensions: (...
        version:              0.0.1
        creation_date:        2024-09-10T14:29:22.587984+00:00
        data_description_id:  0

Measurement Set v4
------------------

NRAO_/SKAO_ are developing a new xarray-based `Measurement Set v4 specification <msv4-spec_>`_.
While there are many changes, some of the major highlights are:

* xarray_ is used to define the specification.
* MSv4 data consists of Datasets of ndarrays on a regular time-channel grid.
  MSv2 data is tabular and, while in many instances the time-channel grid is regular,
  this is not guaranteed, especially after MSv2 datasets have been transformed by various tasks.


xarray_ Datasets are self-describing and they are therefore easier to reason about and work with.
Additionally, the regularity of data will make writing MSv4-based software less complex.
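
To make the tabular-versus-gridded distinction concrete, here is a minimal sketch of pivoting MSv2-style rows onto a regular (time, baseline) grid. It is not xarray-ms's implementation: the ``rows_to_grid`` helper, the row layout and the use of ``None`` as the padding value are assumptions made purely for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical row layout: (time, antenna1, antenna2, visibility)
Row = Tuple[float, int, int, complex]


def rows_to_grid(rows: List[Row]):
  """Pivot tabular rows onto a regular (time, baseline) grid.

  Grid cells with no matching row are padded with None.
  """
  times = sorted({t for t, _, _, _ in rows})
  baselines = sorted({(a1, a2) for _, a1, a2, _ in rows})
  time_index = {t: i for i, t in enumerate(times)}
  baseline_index = {b: i for i, b in enumerate(baselines)}

  grid: List[List[Optional[complex]]] = [[None] * len(baselines) for _ in times]
  for t, a1, a2, vis in rows:
    grid[time_index[t]][baseline_index[(a1, a2)]] = vis
  return times, baselines, grid


rows = [
  (0.0, 0, 1, 1 + 0j),
  (0.0, 0, 2, 2 + 0j),
  (1.0, 0, 1, 3 + 0j),
  # No row exists for time 1.0 on baseline (0, 2): the table is irregular
]
times, baselines, grid = rows_to_grid(rows)
```

The missing (time, baseline) combination becomes a padded cell, so every time step ends up with the same number of baselines.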

xradio
------

`casangi/xradio <xradio_>`_ provides a reference implementation that converts
CASA v2 Measurement Sets to Zarr v4 Measurement Sets using the python-casacore_
package.

Why xarray-ms?
--------------

* By developing against an MSv4 xarray view over MSv2 data,
  developers can build applications on well-understood data,
  and then seamlessly transition to newer formats.
  Data can also be exported to newer formats (principally zarr_) via xarray's
  native I/O routines.
  However, the xarray view of either format looks the same to the software developer.

* xarray-ms builds on xarray's
  `backend API <https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html>`_.
  Implementing a formal CASA MSv2 backend has a number of automatic benefits:

  * Use of xarray's internal I/O routines such as ``open_dataset`` or ``to_zarr``.
  * Use of xarray's `lazy loading mechanism <xarray_lazy_>`_.
  * Automatic access to any `chunked array types <xarray_chunked_arrays_>`_
    supported by xarray including, but not limited to, dask_.
  * Arbitrary chunking along any xarray dimension.

* xarray-ms uses arcae_, a high-performance backend to CASA Tables implementing
  a subset of python-casacore_'s interface.
* Some limited support for irregular MSv2 data via padding.
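
As a worked example of what a chunking choice implies, the following sketch computes the number of chunks per dimension and the size of one ``VISIBILITY`` chunk for the dataset shown in the README example above. The dimension and chunk sizes are copied from that example; the helper names ``chunk_counts`` and ``chunk_nbytes`` are hypothetical and not part of xarray or xarray-ms.

```python
import math

# Dimension sizes and chunk sizes taken from the README example
dims = {"time": 28760, "baseline": 2775, "frequency": 16, "polarization": 4}
chunks = {"time": 2000, "baseline": 1000}


def chunk_counts(dims, chunks):
  """Number of chunks per dimension; unlisted dimensions form a single chunk."""
  return {
    name: math.ceil(size / chunks.get(name, size)) for name, size in dims.items()
  }


def chunk_nbytes(dims, chunks, itemsize):
  """Size in bytes of one full (non-edge) chunk."""
  return itemsize * math.prod(
    min(chunks.get(name, size), size) for name, size in dims.items()
  )


counts = chunk_counts(dims, chunks)
# complex64 occupies 8 bytes per element
vis_chunk_bytes = chunk_nbytes(dims, chunks, itemsize=8)

print(counts)
print(f"{vis_chunk_bytes / 1e9:.3f} GB per VISIBILITY chunk")
```

With these values the ``time`` dimension splits into 15 chunks and ``baseline`` into 3, and each full ``VISIBILITY`` chunk holds roughly 1 GB, which is worth checking against available memory per worker before choosing chunk sizes.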

Work in Progress
----------------

.. warning::

    xarray-ms is currently under active development and does not yet
    have feature parity with xradio_.

.. warning::

    The Measurement Set v4 specification is currently under active development.

Most measures information and many secondary sub-tables are currently missing.
However, the most important parts of the ``MAIN`` table,
as well as the ``ANTENNA``, ``POLARIZATION`` and ``SPECTRAL_WINDOW``
sub-tables, are implemented and should be sufficient to start
developing software that uses xarray-ms.

.. _SKAO: https://www.skao.int/
.. _NRAO: https://public.nrao.edu/
.. _msv4-spec: https://docs.google.com/spreadsheets/d/14a6qMap9M5r_vjpLnaBKxsR9TF4azN5LVdOxLacOX-s/
.. _xradio: https://github.com/casangi/xradio
.. _dask-ms: https://github.com/ratt-ru/dask-ms
.. _arcae: https://github.com/ratt-ru/arcae
.. _dask: https://www.dask.org/
.. _python-casacore: https://github.com/casacore/python-casacore/
.. _xarray: https://github.com/pydata/xarray
.. _xarray_backend: https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html
.. _xarray_lazy: https://docs.xarray.dev/en/latest/internals/internal-design.html#lazy-indexing-classes
.. _xarray_chunked_arrays: https://docs.xarray.dev/en/latest/internals/chunked-arrays.html
.. _zarr: https://zarr.dev/
20 changes: 20 additions & 0 deletions doc/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35 changes: 35 additions & 0 deletions doc/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.https://www.sphinx-doc.org/
exit /b 1
)

if "%1" == "" goto help

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
8 changes: 8 additions & 0 deletions doc/source/api.rst
@@ -0,0 +1,8 @@
API
===

Opening Measurement Sets
------------------------

.. autoclass:: xarray_ms.backend.msv2.entrypoint.MSv2PartitionEntryPoint
    :members: open_dataset, open_datatree
65 changes: 65 additions & 0 deletions doc/source/conf.py
@@ -0,0 +1,65 @@
# Configuration file for the Sphinx documentation builder.
#
# For the full list of built-in configuration values, see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Project information -----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information

# type: ignore

project = "xarray-ms"
copyright = "2024, Simon Perkins"
author = "Simon Perkins"
release = "0.2.0"

# -- General configuration ---------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#general-configuration

extensions = [
  "sphinx.ext.autodoc",
  "sphinx.ext.autosummary",
  "sphinx.ext.extlinks",
  "sphinx_copybutton",
  "sphinx.ext.doctest",
  "sphinx.ext.napoleon",
  "sphinx.ext.intersphinx",
]

templates_path = ["_templates"]
exclude_patterns = []

# Napoleon settings
napoleon_google_docstring = True
napoleon_numpy_docstring = False
napoleon_include_init_with_doc = False
napoleon_include_private_with_doc = False
napoleon_include_special_with_doc = True
napoleon_use_admonition_for_examples = False
napoleon_use_admonition_for_notes = False
napoleon_use_admonition_for_references = False
napoleon_use_ivar = False
napoleon_use_param = True
napoleon_use_rtype = True
napoleon_preprocess_types = False
napoleon_type_aliases = None
napoleon_attr_annotations = True

# -- Options for HTML output -------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

html_theme = "pydata_sphinx_theme"
html_static_path = ["_static"]

# Recent Sphinx releases expect a "%s" placeholder in the caption as well as the URL
extlinks = {
  "issue": ("https://github.com/ratt-ru/xarray-ms/issues/%s", "GH#%s"),
  "pr": ("https://github.com/ratt-ru/xarray-ms/pull/%s", "GH#%s"),
}

# Example configuration for intersphinx: refer to the Python standard library.
intersphinx_mapping = {
  "dask": ("https://dask.pydata.org/en/stable", None),
  "numpy": ("https://numpy.org/doc/stable/", None),
  "python": ("https://docs.python.org/3/", None),
  "xarray": ("https://docs.xarray.dev/en/stable", None),
}
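
For readers unfamiliar with ``sphinx.ext.extlinks``, the following sketch mimics how a role such as ``:pr:`4``` is expanded by plain ``%s`` substitution. It is an illustration, not Sphinx's code: the ``expand`` helper and the ``example_extlinks`` name are hypothetical, and the captions are written with a ``%s`` placeholder as an assumption of this sketch.

```python
# Illustrative copy of the mapping, with "%s" placeholders in both templates
example_extlinks = {
  "issue": ("https://github.com/ratt-ru/xarray-ms/issues/%s", "GH#%s"),
  "pr": ("https://github.com/ratt-ru/xarray-ms/pull/%s", "GH#%s"),
}


def expand(role: str, target: str):
  """Return (url, link text) for a role target via %-substitution."""
  url_template, caption_template = example_extlinks[role]
  return url_template % target, caption_template % target


url, caption = expand("pr", "4")
print(url, caption)
```

Both templates are formatted with the same target, which is why each needs its own ``%s`` placeholder.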
20 changes: 20 additions & 0 deletions doc/source/index.rst
@@ -0,0 +1,20 @@
.. xarray-ms documentation master file, created by
   sphinx-quickstart on Tue Sep 10 10:36:27 2024.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

xarray-ms documentation
=======================

Add your content using ``reStructuredText`` syntax. See the
`reStructuredText <https://www.sphinx-doc.org/en/master/usage/restructuredtext/index.html>`_
documentation for details.


.. toctree::
   :maxdepth: 2
   :caption: Contents:

   readme
   install
   api
59 changes: 59 additions & 0 deletions doc/source/install.rst
@@ -0,0 +1,59 @@
Installation
============

.. code-block:: bash

    $ pip install xarray-ms

Development
===========

Firstly, install Python `Poetry <poetry_>`_.

.. _poetry: https://python-poetry.org/

Then, the following commands will install the required dependencies,
optional testing dependencies, documentation and development dependencies
in a suitable virtual environment:

.. code-block:: bash

    $ cd /code/xarray-ms
    $ poetry env use 3.11
    $ poetry install -E testing --with doc --with dev
    $ poetry run pre-commit install
    $ poetry shell

The pre-commit hooks can be manually executed as follows:

.. code-block:: bash

    $ poetry run pre-commit run -a

Test Suite
----------

Run the following commands within the xarray-ms source code directory to
execute the test suite:

.. code-block:: bash

    $ cd /code/xarray-ms
    $ poetry install -E testing --with dev
    $ poetry run py.test -s -vvv tests/

Documentation
-------------

Run the following commands to build the Sphinx documentation
from within the doc sub-directory:

.. code-block:: bash

    $ cd /code/xarray-ms
    $ poetry install --with doc
    $ poetry shell
    $ cd doc
    $ make html
1 change: 1 addition & 0 deletions doc/source/readme.rst
@@ -0,0 +1 @@
.. include:: ../../README.rst
15 changes: 15 additions & 0 deletions pyproject.toml
@@ -21,6 +21,21 @@ testing = ["dask", "distributed", "pytest"]
[tool.poetry.plugins."xarray.backends"]
"xarray-ms:msv2" = "xarray_ms.backend.msv2.entrypoint:MSv2PartitionEntryPoint"

[tool.poetry.group.dev]
optional = true

[tool.poetry.group.dev.dependencies]
pre-commit = "^3.8.0"

[tool.poetry.group.doc]
optional = true

[tool.poetry.group.doc.dependencies]
sphinx = "^8.0.2"
pygments = "^2.18.0"
sphinx-copybutton = "^0.5.2"
pydata-sphinx-theme = "^0.15.4"

[tool.ruff]
line-length = 88
indent-width = 2