Commit 3e8e357

Merge pull request #61 from IQTLabs/rc/v1.0
Rc/v1.0
2 parents 0ee56c1 + 1053ab7 commit 3e8e357

59 files changed (+3566, -1345 lines)

.gitignore (+6, -1)

@@ -8,8 +8,13 @@
 *.yaml
 *.txt
 *.json
-*.png
 *.mdb
+*.mar
 *-checkpoint.ipynb
 .DS_Store
 .python-version
+experiment_logs/
+lightning_logs/
+spec_logs/
+
+rfml-dev/.README.md.swp

.gitmodules (+3)

@@ -0,0 +1,3 @@
+[submodule "torchsig"]
+path = torchsig
+url = https://github.com/TorchDSP/torchsig

README.md (+196, -4)

# RFML

This repo provides the pipeline for working with RF datasets: labeling them and training both IQ and spectrogram based models. The SigMF standard is used for managing the RF data and the labels/annotations on that data. The pipeline also uses the TorchSig framework to perform RF-specific augmentation of the data, which helps make the trained models more robust and functional in the real world.
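
Each SigMF recording is a pair of files: a `.sigmf-data` file holding the raw IQ samples and a `.sigmf-meta` file holding JSON metadata, including the annotations added by the labeling steps below. As a quick orientation, here is a minimal sketch, assuming the standalone `sigmf` Python package is available (the file path is hypothetical), of inspecting a recording and its annotations:

```python
from sigmf import sigmffile

# Load the metadata (and, lazily, the samples) for one recording.
recording = sigmffile.fromfile("data/samples/mavic-30db/example.sigmf-meta")
print(recording.get_global_info())  # sample rate, datatype, etc.

# Each annotation marks a span of samples and, optionally, frequency bounds and a label.
for annotation in recording.get_annotations():
    print(
        annotation.get("core:label"),
        annotation.get("core:sample_start"),
        annotation.get("core:sample_count"),
        annotation.get("core:freq_lower_edge"),
        annotation.get("core:freq_upper_edge"),
    )

samples = recording.read_samples()  # IQ samples as a NumPy array
```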

## Prerequisites

### Poetry

Follow the instructions here to install Poetry: https://python-poetry.org/docs/#installation

### Inspectrum (optional)

https://github.com/miek/inspectrum

This utility is useful for inspecting SigMF files and the annotations that the auto-label scripts produce.

### AnyLabeling (optional)

https://github.com/vietanhdev/anylabeling

This program is used for image annotation and offers AI-assisted labelling.

## Activate virtual environment

This project uses Poetry for dependency management and packaging. Poetry can also be used with external virtual environments.
If using a non-Poetry virtual environment, start by activating the environment before running Poetry commands. See the note in the [Poetry docs](https://python-poetry.org/docs/basic-usage/#using-your-virtual-environment) for more info.

### Using Poetry

To activate the Poetry virtual environment with all of the Python modules configured, run the following:

```bash
poetry shell
```

See the [Poetry docs](https://python-poetry.org/docs/basic-usage/#activating-the-virtual-environment) for more information.

## Install

```bash
git clone https://github.com/IQTLabs/rfml-dev.git
cd rfml-dev
git submodule update --init --recursive
poetry install
```

## Verify install with GPU support (optional)

```bash
$ python -c 'import torch; print(torch.cuda.is_available())'
True
```

If the output does not match or errors occur, try installing PyTorch manually ([current version](https://pytorch.org/get-started/locally/) or [previous versions](https://pytorch.org/get-started/previous-versions/)).

#### Example

```bash
pip install torch==2.0.1 torchvision==0.15.2
```

# Building a model

## Approach

Our current approach is to capture examples of signals of interest in order to create labeled datasets. There are many methods for doing this and many challenges to consider. One practical method is to isolate the signals of interest and compare them to samples of a specific background RF environment. For simplicity, we apply the same label to all of the signals present in the background environment samples; this essentially teaches the model to ignore those signals. For this to work, it is important that the signals of interest are isolated from the background RF environment. Since it is very difficult these days to find an RF-free environment, we built a mini Faraday cage by lining the inside of a Pelican case with foil. There are plenty of instructions available online, like [this one](https://mosequipment.com/blogs/blog/build-your-own-faraday-cage), if you want to build your own. Inside the enclosure the signal will be very strong, so make sure you adjust the SDR's gain appropriately.

## Labeling IQ Data

The scripts in [label_scripts](./label_scripts/) use signal processing to automatically label IQ data. The scripts look at the signal power to detect when a signal is present in the IQ data and estimate the occupied bandwidth of the signal.

### Tuning Autolabeling

In the labeling scripts, the settings for autolabeling need to be tuned for the type of signals that were collected.

```python
annotation_utils.annotate(
    f,
    label="mavic3_video",            # The label that is applied to all of the matching annotations
    avg_window_len=256,              # The number of samples over which to average signal power
    avg_duration=0.25,               # The number of seconds, from the start of the recording, used to automatically calculate the SNR threshold; if None, all of the samples will be used
    debug=False,
    estimate_frequency=True,         # Whether the frequency bounds for an annotation should be calculated; must be enabled if you use min_bandwidth/max_bandwidth
    spectral_energy_threshold=0.95,  # Fraction of spectral energy used to determine the upper and lower frequency bounds for an annotation
    force_threshold_db=-58,          # Manually sets the threshold used for detecting a signal and creating an annotation; if None, the automatic threshold calculation is used instead
    overwrite=False,                 # If True, any existing annotations in the .sigmf-meta file will be removed
    min_bandwidth=16e6,              # The minimum bandwidth (in Hz) of a signal to annotate
    max_bandwidth=None,              # The maximum bandwidth (in Hz) of a signal to annotate
    min_annotation_length=10000,     # The minimum length, in samples, a signal must be in order to be annotated. This is tied to the sample rate the signal was captured at and does not take bandwidth into account: 10000 samples at 20,000,000 samples per second is a minimum transmission length of 0.0005 seconds
    # max_annotations=500,           # The maximum number of annotations to automatically add
    dc_block=True,                   # De-emphasize the DC spike when calculating the frequencies for a signal
)
```
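
In practice, a label script wraps a call like this in a loop over all of the recordings in a capture directory. Below is a rough sketch of that pattern; the directory, the glob pattern, and the assumption that the first argument `f` is the path to a `.sigmf-meta` file are illustrative, not the exact contents of the shipped scripts:

```python
from pathlib import Path

import annotation_utils  # top-level module in this repo

data_dir = Path("data/samples/mavic-30db")  # placeholder capture directory
for meta_file in sorted(data_dir.glob("*.sigmf-meta")):
    annotation_utils.annotate(
        str(meta_file),              # assumed: path to the .sigmf-meta file
        label="mavic3_video",
        avg_window_len=256,
        avg_duration=0.25,
        debug=False,
        estimate_frequency=True,
        spectral_energy_threshold=0.95,
        force_threshold_db=-58,
        overwrite=False,
        min_bandwidth=16e6,
        max_bandwidth=None,
        min_annotation_length=10000,
        dc_block=True,
    )
```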

### Tips for Tuning Autolabeling

#### Force Threshold dB

![low threshold](./images/low_threshold.png)

If you see annotations where harmonics or lower-power, unintentional signals are getting selected, try setting `force_threshold_db`. The automatic threshold calculation may be selecting a value that is too low. Find a value for `force_threshold_db` at which the intended signals are selected and the low-power ones are ignored.

#### Spectral Energy Threshold

![spectral energy](./images/spectral_energy.png)

If the frequency bounds are not lining up with the top or bottom of a signal, increase `spectral_energy_threshold`. Sometimes a setting as high as 0.99 is required.

#### Skipping "small" Signals

![small signals](./images/min_annotation.png)

Some tuning is needed for signals that have a short transmission duration and/or limited bandwidth. Here are a couple of things to try if they are getting skipped:

- `min_annotation_length` is the minimum number of samples for an annotation. If the signal has fewer samples than this, it will not be annotated. Try lowering this (see the sketch after this list for converting a duration into a sample count).
- The `avg_duration` setting may be too long, so the signal is getting averaged into the noise. Try lowering this.
- `min_bandwidth` is the minimum bandwidth (in Hz) for a signal to be detected. If this value is too high, signals with less bandwidth will be ignored. Try lowering this.
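
Because `min_annotation_length` is expressed in samples rather than seconds, it can help to derive it from the shortest transmission you still want to annotate; a small worked example using the numbers from the parameter comment above:

```python
# Convert a minimum transmission duration into the min_annotation_length
# (in samples) passed to annotation_utils.annotate().
sample_rate = 20_000_000      # samples per second the capture was recorded at
min_duration_s = 0.0005       # shortest transmission we still want annotated
min_annotation_length = int(min_duration_s * sample_rate)
print(min_annotation_length)  # 10000
```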

## Training a Model

After you have finished labeling your data, the next step is to train a model on it. This repo makes it easy to train both IQ and spectrogram based models from SigMF data.

### Configure

This repo provides an automated script for training and evaluating models. To use it, configure the [run_experiments.py](./run_experiments.py) file to point to the data you want to use and set the training parameters:

```python
"experiment_0": {  # A name to refer to the experiment
    "class_list": ["mavic3_video", "mavic3_remoteid", "environment"],  # The labels that are present in the sigmf-meta files
    "train_dir": ["data/samples/mavic-30db", "data/samples/mavic-0db", "data/samples/environment"],  # Directories with SigMF files
    "iq_epochs": 10,    # Number of epochs for IQ training; if 0 or None, IQ training is skipped
    "spec_epochs": 10,  # Number of epochs for spectrogram training; if 0 or None, spectrogram training is skipped
    "notes": "DJI Mavic3 Detection"  # Notes to your future self
}
```

Once you have the **run_experiments.py** file configured, run it:

```bash
python3 run_experiments.py
```

Once the training has completed, it will print out the logs location, model accuracy, and the location of the best checkpoint:

```bash
I/Q TRAINING COMPLETE


Find results in experiment_logs/experiment_1/iq_logs/08_08_2024_09_17_32

Total Accuracy: 98.10%
Best Model Checkpoint: lightning_logs/version_5/checkpoints/experiment_logs/experiment_1/iq_checkpoints/checkpoint.ckpt
```
### Convert & Export IQ Models
157+
158+
Once you have a trained model, you need to convert it into a portable format that can easily be served by TorchServe. To do this, use **convert_model.py**:
159+
160+
```bash
161+
python3 convert_model.py --model_name=drone_detect --checkpoint=lightning_logs/version_5/checkpoints/experiment_logs/experiment_1/iq_checkpoints/checkpoint.ckpt
162+
```
163+
This will export a **_torchscript.pt** file.
164+
165+
```bash
166+
torch-model-archiver --force --model-name drone_detect --version 1.0 --serialized-file weights/drone_detect_torchscript.pt --handler custom_handlers/iq_custom_handler.py --export-path models/ -r custom_handler/requirements.txt
167+
```
168+
169+
This will generate a **.mar** file in the [models/](./models/) folder. [GamutRF](https://github.com/IQTLabs/gamutRF) can run this model and use it to classify signals.
170+
171+
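
If you want to sanity-check the archive before deploying it, you can serve it locally with TorchServe (pointing the model store at the [models/](./models/) folder) and post a short window of IQ samples to the inference API. The sketch below is only an illustration under assumptions: the payload format the custom handler actually expects may differ, and the capture path is a placeholder.

```python
import numpy as np
import requests

# Read a short window of complex64 IQ samples from a capture (placeholder path).
samples = np.fromfile(
    "data/samples/mavic-30db/example.sigmf-data", dtype=np.complex64, count=4096
)

# TorchServe's inference API listens on port 8080 by default; the model name
# matches the --model-name used with torch-model-archiver above.
resp = requests.post(
    "http://127.0.0.1:8080/predictions/drone_detect",
    data=samples.tobytes(),
    headers={"Content-Type": "application/octet-stream"},
)
print(resp.status_code, resp.text)
```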

## Files

[annotation_utils.py](annotation_utils.py) - DSP-based automated labelling tools

[auto_label.py](auto_label.py) - CV-based automated labelling tools

[data.py](data.py) - RF data operations tool

[experiment.py](experiment.py) - Class to manage experiments

[models.py](models.py) - Class for I/Q models (based on TorchSig)

[run_experiments.py](run_experiments.py) - Experiment configurations and run script

[sigmf_pytorch_dataset.py](sigmf_pytorch_dataset.py) - PyTorch-style dataset class for SigMF data (based on TorchSig)

[spectrogram.py](spectrogram.py) - Spectrogram tools

[test_data.py](test_data.py) - Tests for data.py (might be outdated)

[train_iq.py](train_iq.py) - Training script for I/Q models

[train_spec.py](train_spec.py) - Training script for spectrogram models

[zst_parse.py](zst_parse.py) - ZST file parsing tool, for GamutRF-style filenames

The [notebooks/](./notebooks/) directory contains various experiments we have conducted during development.
