This package provides coherent dedispersion and full cross-polarization or Stokes detection of radio telescope voltage data. The input format is GUPPI RAW. The output format is SIGPROC Filterbank. This package can run with or without a CUDA-enabled GPU, but use with a CUDA-enabled GPU is recommended for higher throughput.
This package and several of its dependent packages are not yet in the General
Julia registry. They are available in a separate publicly accessible registry.
The easiest way to install this package is to add this registry to your Julia
"depot":
$ julia -e 'using Pkg; Registry.add(RegistrySpec(url="https://github.com/david-macmahon/MyJuliaRegistry"))'
After adding that registry, you can add the CoherentDedispersion package
using:
$ julia -e 'import Pkg; Pkg.add("CoherentDedispersion")'
If you would like a symlink to this package's command line utility to be created
in a directory such as /path/to/bin, set the environment variable CODD_BINDIR
to that directory before adding the package, or "build" the package with
CODD_BINDIR set if it has already been added:
env CODD_BINDIR="$HOME/bin" julia -e 'import Pkg; Pkg.build("CoherentDedispersion"; verbose=true)'
This package includes a bash script, bin/rawcodd.jl, that provides a convenient
command line interface. Creating a symlink to the script (see above) in a
directory in your path will allow you to run the script from anywhere without
having to specify the path to it.
Here is the command line help for rawcodd.jl:
$ rawcodd.jl --help
usage: rawcodd.jl -d DM [-f FFT] [-t INT] [-o OUTDIR] [-h] RAWFILES...
positional arguments:
  RAWFILES             GUPPI RAW files to process
optional arguments:
  -d, --dm DM          dispersion measure (type: Float64)
  -f, --fft FFT        up-channelization FFT length (type: Int64,
                       default: 1)
  -t, --int INT        spectra to integrate (type: Int64, default: 4)
  -o, --outdir OUTDIR  output directory (default: ".")
  -h, --help           show this help message and exit
Caveats:
- All files given on the command line will be treated as a single sequence of contiguous GUPPI RAW files. No checks are performed to verify that this is actually the case. Be sure to specify GUPPI RAW files accordingly.
- The list of file names will be sorted before being processed. If you want to process a list of files in unsorted order, use the package from within Julia (see below).
- No checks are performed on the existence of the output file. Existing output files will be silently overwritten. If you want to save the output from the same data files with different options, be sure to use different output directories.
- This script can only process one scan at a time. Each time it runs, CUDA initialization occurs, which imposes several seconds of delay. If you have many similar scans to process with the same dispersion measure (DM), you may want to consider using this package from within Julia to be able to amortize the CUDA initialization over more scans.
- Output directories will be created as needed.
Currently the output filename is generated from outdir (defaults to ".") and
the basename of the first filename given. The .raw extension of the input
file is removed (if present) and .rawcodd.0000.fil is concatenated. The GUPPI
RAW sequence number (if any) is retained so that individual GUPPI RAW files from
the same scan may be processed individually without overwriting the same output
file.
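The naming rule above can be sketched in a few lines of Julia. The helper name sketch_output_name is hypothetical and not part of the package API; it only mirrors the rule as described here, so the package's own code may differ in detail.

```julia
# Hypothetical helper (not part of CoherentDedispersion): mirrors the output
# naming rule described above, using only Base functions.
function sketch_output_name(rawfiles::AbstractVector{<:AbstractString};
                            outdir::AbstractString=".")
    # Only the first file name given contributes to the output name.
    base = basename(first(rawfiles))
    # Remove a trailing ".raw" extension if present; the sequence number stays.
    stem = replace(base, r"\.raw$" => "")
    return joinpath(outdir, stem * ".rawcodd.0000.fil")
end

# Example with a made-up file name; the sequence number "0002" is retained.
println(sketch_output_name(["/data/scan_A.0002.raw"]; outdir="out"))
```

Because the sequence number of the first file is part of the name, processing scan_A.0002.raw on its own does not collide with output from scan_A.0000.raw.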
For more advanced use, one can use the package programmatically from within Julia. This allows for more control over the pipeline and the tasks that run it.
This package creates data buffers that are sized specifically for a given GUPPI RAW block size and other parameters. These buffers and their associated FFT plans are considered to be a pipeline. A set of contiguous GUPPI RAW data files (aka a scan) is processed by running the pipeline, which creates asynchronous tasks for that particular set of files. It is possible to run a pipeline for multiple sets of contiguous GUPPI RAW data files, one set after another. A new set of tasks is started for each set of contiguous input files.
The sizing of the data buffers depends on multiple factors:
- The geometry of the GUPPI RAW blocks
- The requested up-channelization and time integration factors
- The maximum dispersive delay, which depends on the frequencies and dispersion measure being dedispersed
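To give a feel for the last factor, the sketch below evaluates the standard cold-plasma dispersion delay between two frequencies. The helper names and the example numbers are illustrative assumptions; the package may use a slightly different constant and computes its own buffer sizes internally.

```julia
# Commonly quoted dispersion constant, in MHz^2 pc^-1 cm^3 s.
const KDM = 4.148808e3

# Hypothetical helper: delay in seconds of the lower frequency relative to the
# higher one, for dispersion measure `dm` (pc cm^-3) and frequencies in MHz.
dispersive_delay(dm, f_lo_mhz, f_hi_mhz) =
    KDM * dm * (1 / f_lo_mhz^2 - 1 / f_hi_mhz^2)

# Hypothetical helper: the same delay expressed as a whole number of input
# time samples of duration `tbin` seconds, rounded up.
delay_samples(dm, f_lo_mhz, f_hi_mhz, tbin) =
    ceil(Int, dispersive_delay(dm, f_lo_mhz, f_hi_mhz) / tbin)

# Example values chosen only for illustration.
println(dispersive_delay(100.0, 1000.0, 1500.0))
println(delay_samples(100.0, 1000.0, 1500.0, 1e-6))
```

The delay grows linearly with DM and steeply toward lower frequencies, which is why the same GUPPI RAW geometry can need larger buffers for a higher DM or a lower observing band.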
When reusing the pipeline on a new set of files, the new files are first checked for compatibility with the existing pipeline. If there is a mismatch, an error message is displayed and no processing occurs for that set of files.
The overall process is:
- Create the pipeline for the first set of input files (i.e. for a scan)
- Run the pipeline for each set of input files (i.e. for each scan)
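When many files from several scans sit in one directory, they first have to be split into per-scan lists. The sketch below is one way to do that, assuming the common "<stem>.NNNN.raw" naming with a four-digit sequence number; group_scans is a hypothetical helper, not a function provided by the package.

```julia
# Hypothetical helper: group GUPPI RAW file names into scans by stripping a
# trailing ".NNNN.raw" sequence suffix (an assumed naming convention).
function group_scans(rawfiles)
    scans = Dict{String,Vector{String}}()
    for name in rawfiles
        stem = replace(name, r"\.\d{4}\.raw$" => "")
        push!(get!(scans, stem, String[]), name)
    end
    # Sort the files within each scan, and order the scans by their stem.
    return [sort(scans[stem]) for stem in sort(collect(keys(scans)))]
end

# Made-up file names for illustration.
files = ["b.0001.raw", "a.0001.raw", "a.0000.raw", "b.0000.raw"]
for scan in group_scans(files)
    # With the package loaded and a pipeline already created from the first
    # scan, each `scan` could be handed to run_pipeline(pipeline, scan) here.
    println(scan)
end
```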
The first step is to create the pipeline. This is done using the aptly named
create_pipeline function. The create_pipeline function needs information
about the GUPPI RAW data files, the dispersion measure to be dedispersed, the
number of fine channels per coarse channel (if up-channelization is desired),
and the number of time samples to integrate after detection. Additional
optional keyword arguments can be used to increase buffering within the
pipeline and to explicitly opt out of using CUDA (e.g. for testing). By
default, CUDA will be used if the local system is determined to be "CUDA
functional".
create_pipeline(rawinfo, dm; nfpc=1, nint=4, N=2, use_cuda=CUDA.functional())
The rawinfo parameter can be a GuppiRaw.Header object, the name of a GUPPI
RAW file, or a Vector of names of GUPPI RAW files. In the latter two cases, the
first GUPPI RAW header of the only/first file will be used. Generally, a
Vector of GUPPI RAW file names will be the most convenient to use.
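As a rough guide to how nfpc and nint shape the output, the sketch below derives the output channel count, channel width, and sample time from the input geometry. It assumes plain FFT up-channelization (one spectrum per nfpc input samples) followed by summing nint spectra; output_geometry and its argument names are hypothetical, and the package reads the real values from the GUPPI RAW header.

```julia
# Hypothetical helper: output resolution implied by the input geometry and the
# requested up-channelization (nfpc) and integration (nint) factors.
function output_geometry(; ncoarse, coarse_bw_mhz, tbin, nfpc=1, nint=4)
    return (
        nchans      = ncoarse * nfpc,        # fine channels in the output
        chan_bw_mhz = coarse_bw_mhz / nfpc,  # width of each fine channel
        tsamp       = tbin * nfpc * nint,    # seconds per output spectrum
    )
end

# Example values chosen only for illustration.
geom = output_geometry(ncoarse=64, coarse_bw_mhz=3.0, tbin=1e-6, nfpc=16, nint=128)
println(geom)
```

Increasing nfpc trades time resolution for frequency resolution, while nint reduces the output data rate without changing the channel width.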
The pipeline object returned by create_pipeline can be passed to the
start_pipeline or run_pipeline functions along with a Vector of GUPPI RAW
filenames and an optional output directory (which defaults to the current
directory). Both start_pipeline and run_pipeline start the asynchronous
tasks that perform the dedispersion process and create the output file.
Generally, start_pipeline is more versatile whereas run_pipeline can be
more convenient for single pipeline applications.
Both start_pipeline and run_pipeline can be called multiple times with the
same pipeline object but different sets of (compatible) input files. As
described below, start_pipeline does not wait for completion so the caller is
responsible for waiting for completion before calling start_pipeline again.
run_pipeline does wait for completion so the caller may call run_pipeline
again as soon as it returns.
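The difference between the two calling styles can be illustrated with plain Julia tasks. The sketch below is not the package's implementation: start_sketch, run_sketch, inputtask, and the summing workload are invented stand-ins that only show the "start returns tasks, run fetches the last one" pattern.

```julia
# Hypothetical stand-in for start_pipeline: launches two chained tasks and
# returns them in pipeline order without waiting for either to finish.
function start_sketch(items)
    ch = Channel{Int}(2)              # small buffer between the two stages
    inputtask = @async begin
        for x in items
            put!(ch, x)
        end
        close(ch)                     # signals the consumer that input is done
    end
    outputtask = @async begin
        total = 0
        for x in ch                   # iterates until the channel is closed
            total += x
        end
        "sum=$total"                  # value later returned by fetch
    end
    return (inputtask=inputtask, outputtask=outputtask)
end

# Hypothetical stand-in for run_pipeline: start, then fetch the last task.
run_sketch(items) = fetch(start_sketch(items).outputtask)

tasks = start_sketch(1:3)             # returns immediately
println(fetch(tasks.outputtask))      # blocks until the last stage finishes
println(run_sketch(1:4))              # blocks internally, returns the result
```

With the start style, the caller can do other work between starting and fetching, but must fetch (or wait) before reusing the same buffers for another run.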
By default, run_pipeline displays a progress bar as processing progresses, but
start_pipeline does not. This can be controlled explicitly by passing the
keyword argument progress=true or progress=false to either function.
Both start_pipeline and run_pipeline support additional keyword arguments
that control the behavior of the detection process:
- dostokes: true/false value indicating whether to compute Stokes parameters (true) or cross polarization products (false, default).
- doconj: true/false/nothing value indicating whether to negate Stokes V/conjugate the cross polarization products (true negates/conjugates, false does not) or to negate/conjugate only when OBSBW is negative (nothing, default).
- doscale: true/false value indicating whether to scale the outputs to match the scaling performed by rawspec (another GUPPI RAW to Filterbank tool). true (default) scales to match rawspec; false does not.
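For readers unfamiliar with the two detection modes, the sketch below shows how both sets of products follow from one pair of complex voltage samples. It uses one common sign convention and product ordering, which are assumptions here; the helper names are hypothetical and the package's own conventions and scaling may differ.

```julia
# Hypothetical helper: cross polarization products for one sample, given
# complex voltages x and y from the two polarizations.
crosspol(x, y) = (abs2(x), abs2(y), x * conj(y))

# Hypothetical helper: Stokes parameters for the same sample, under one common
# convention. `negate_v` plays the role that doconj is described as having.
function stokes(x, y; negate_v=false)
    xy = x * conj(y)
    i = abs2(x) + abs2(y)
    q = abs2(x) - abs2(y)
    u = 2 * real(xy)
    v = -2 * imag(xy)
    return (i, q, u, negate_v ? -v : v)
end

# Made-up sample: equal power in both polarizations, 90 degrees out of phase.
println(crosspol(1 + 0im, 0 + 1im))
println(stokes(1 + 0im, 0 + 1im))
println(stokes(1 + 0im, 0 + 1im; negate_v=true))
```

Conjugating the cross product flips the sign of its imaginary part, which is why the same option both conjugates the cross polarization products and negates Stokes V.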
start_pipeline starts the tasks and returns a NamedTuple of the tasks without
waiting for the tasks to complete (i.e. the tasks will likely still be running
after start_pipeline returns). The tasks in the NamedTuple are in "pipeline
order", so waiting for the last task (i.e. the outputtask) will wait for
completion of processing all the input files. Calling fetch on the output
task will wait for completion and return the name of the output Filterbank file.
start_pipeline(pipeline, rawfiles; outdir=".", progress=false)
run_pipeline is essentially start_pipeline plus a fetch of the last task.
run_pipeline returns the name of the output Filterbank file, but only after
processing is complete.
run_pipeline(pipeline, rawfiles; outdir=".", progress=true)
Here is a short script that shows how to use CoherentDedispersion to
dedisperse a list of GUPPI RAW files (obtained from a user-supplied function
that is not shown) using a dispersion measure of 123.456, up-channelizing by a
factor of 16, integrating 128 time samples (i.e. up-channelized spectra) after
detection, and writing a Filterbank file to the current directory. This is
essentially a simplified version of the rawcodd.jl command line interface.
using CoherentDedispersion
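# User-supplied function (not shown) returning a Vector of GUPPI RAW file names
rawfiles = your_function_to_get_list_of_raw_files()
dm = 123.456
# Up-channelize by 16 and integrate 128 spectra after detection
pipeline = create_pipeline(rawfiles, dm; nfpc=16, nint=128)
# Blocks until processing completes; returns the output Filterbank file name
fbname = run_pipeline(pipeline, rawfiles)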
@info "saved output to $fbname"
@info "done"