Open source application for anonymizing IEC 61850 SCL instance files

cimug-org/scl-sanitizer

SCL Sanitizer is an open source tool for anonymizing IEC 61850 SCL files and removing sensitive network-related information.

Command-Line Interface (CLI) Usage

The SCL sanitizer is provided as a command-line tool, typically invoked as:

python scl_sanitizer.py [options] <path-to-input-SCL-file>

Basic Usage

To sanitize an SCL file non-deterministically (default random mode):

python scl_sanitizer.py example.scl
  • Output is written to example_sanitized.scl (same directory as input).
  • Original file is not modified.
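
The output naming convention described above can be sketched as follows. `sanitized_path` is a hypothetical helper written for illustration; it is not taken from the tool's source.

```python
from pathlib import Path


def sanitized_path(input_path: str) -> Path:
    """Return <stem>_sanitized<suffix> in the same directory as the input."""
    src = Path(input_path)
    return src.with_name(f"{src.stem}_sanitized{src.suffix}")


print(sanitized_path("example.scl"))  # example_sanitized.scl
```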

Deterministic Output Options

You can produce sanitized output deterministically (so the same input will generate the same output and IDs, useful for test fixtures and reproducibility):

Using Explicit Seed

python scl_sanitizer.py --seed 12345 example.scl
  • The same seed and input file will always generate the same sanitized output.

Using Hash-Based Seed

python scl_sanitizer.py --hash-seed example.scl
  • The file's byte content is hashed to derive a unique seed.
  • Any change in the SCL file (even whitespace) changes the output.

You cannot supply both --seed and --hash-seed at the same time.
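
One common way to enforce this kind of restriction in a Python CLI is an `argparse` mutually exclusive group. The sketch below is an assumption about how such an interface could be declared, using the option names documented here; the tool's actual argument handling may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="scl_sanitizer.py")
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--seed", type=int, help="explicit deterministic PRNG seed")
    group.add_argument("--hash-seed", action="store_true",
                       help="derive the seed from the input file's hash")
    parser.add_argument("--debug", action="store_true",
                        help="print debug output to stderr")
    parser.add_argument("input_file", help="path to the SCL XML file")
    return parser


args = build_parser().parse_args(["--seed", "12345", "example.scl"])
print(args.seed, args.hash_seed, args.input_file)  # 12345 False example.scl
```

With this declaration, passing both `--seed` and `--hash-seed` makes `argparse` print a usage error and exit with status 2.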

Debug Output

Add --debug to print extra diagnostic output:

python scl_sanitizer.py --debug example.scl

Error Handling

  • If the input is not a valid SCL XML file, or a validation rule is violated (such as randomization producing duplicate or too-long identifiers), the CLI exits with an error message.
  • If the SCL file is invalid or contains unsupported structures, the tool may report the validation issue, including a line reference when possible.

Version Comments in Output

  • The sanitizer always inserts a version comment, e.g. <!-- SanitizerVersion=v2.9.0 --> immediately after the XML declaration.
  • With deterministic runs (--seed or --hash-seed), the CLI also inserts comments for the seed and seed source.

CLI Options Summary

| Option | Description |
| --- | --- |
| --seed <int> | Set explicit deterministic PRNG seed. |
| --hash-seed | Use the input file's hash to derive deterministic seed. |
| --debug | Print debug output to stderr. |
| <input file> | Path to SCL XML file to be sanitized (required). |

Example

python scl_sanitizer.py --seed 42 --debug substation_example.scl

This will write a file named substation_example_sanitized.scl in the same directory, with seed-based determinism and verbose debug output.

Python Version Requirement

The SCL Sanitizer requires Python 3.7 or newer to run properly.

You can check your current Python version with:

python --version

or

python3 --version

If your version is lower than 3.7, you will need to download and install a newer Python 3 release.

Note: This tool also requires the lxml library. Install it using:

pip install lxml

IEC 61850 SCL Sanitization Rules

0. Scope, Goals, Definitions

  • Goal: Produce an anonymized SCL (Edition 2.1) that passes IEC 61850-6 XSD validation, reveals no sensitive topology or vendor data, and preserves structural integrity.
  • Do NOT modify: lnClass, lnInst, prefix.
  • Sensitive identifiers are randomized; vendor/private extensions deleted.
  • “Randomize” = replace original with generated identifier (NATO word + underscore + 4 alphanumerics) respecting maximum length & uniqueness within category.
  • Basic IEC types (Rule 15) must remain unchanged.
  • Deterministic mode optional (Rule 14).

1. Identifier Length Constraints

Enforce maximum lengths:

  • MAX_IED_NAME (32)
  • MAX_ACCESSPOINT_NAME (32)
  • MAX_LDEVICE_INST (32)
  • MAX_CONTROL_BLOCK_NAME (32)
  • MAX_RPT_ID (64)
  • MAX_GENERIC_ID (64)
  • MAX_DATASET_NAME (32)

Regenerate (preferred) or truncate (fallback) to meet limits (implementation regenerates until compliant).
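
A minimal sketch of the regenerate-until-compliant approach, combining these limits with the identifier pattern from Rule 14.1. The word list, its casing, the `generate_id` helper, and the attempt cap are assumptions made for illustration; the tool's own generator may differ.

```python
import random
import string

# Limits copied from the list above.
MAX_LENGTHS = {
    "MAX_IED_NAME": 32,
    "MAX_ACCESSPOINT_NAME": 32,
    "MAX_LDEVICE_INST": 32,
    "MAX_CONTROL_BLOCK_NAME": 32,
    "MAX_RPT_ID": 64,
    "MAX_GENERIC_ID": 64,
    "MAX_DATASET_NAME": 32,
}

# Assumed (partial) word list and casing; not taken from the tool.
NATO_WORDS = ["Alfa", "Bravo", "Charlie", "Delta", "Echo", "Foxtrot", "Golf", "Hotel"]
SUFFIX_CHARS = string.ascii_lowercase + string.digits


def generate_id(rng, used, limit_key, max_attempts=1000):
    """Return an unused 'Word_xxxx' identifier that fits the named limit."""
    limit = MAX_LENGTHS[limit_key]
    for _ in range(max_attempts):
        suffix = "".join(rng.choice(SUFFIX_CHARS) for _ in range(4))
        candidate = f"{rng.choice(NATO_WORDS)}_{suffix}"
        if len(candidate) <= limit and candidate not in used:
            used.add(candidate)  # track uniqueness within this category
            return candidate
    raise RuntimeError(f"no compliant identifier found for {limit_key}")


rng = random.Random(12345)  # seeded only to make the sketch repeatable
used_ied_names = set()      # one tracking set per category
print(generate_id(rng, used_ied_names, "MAX_IED_NAME"))
```

Regenerating rather than truncating avoids two distinct identifiers collapsing into the same shortened string.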

2. Deletion Rules

2.1 Delete every <Private> element.
2.2 Delete any element whose namespace URI != http://www.iec.ch/61850/2003/SCL.
2.3 Strip foreign namespaced attributes (xmlns:* except the SCL default). Retain only IEC SCL namespace on root.
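
The deletion rules can be illustrated with a short recursive pass. This sketch uses the standard library's `xml.etree.ElementTree` so it runs without extra dependencies (the tool itself requires lxml); `strip_foreign` and the vendor namespace URI are hypothetical. It removes foreign elements and attributes from the tree but does not rewrite the `xmlns:*` declarations on the root.

```python
import xml.etree.ElementTree as ET

SCL_NS = "http://www.iec.ch/61850/2003/SCL"
SCL_PREFIX = f"{{{SCL_NS}}}"


def strip_foreign(elem):
    """Apply Rules 2.1-2.3 to elem and its descendants, in place."""
    # Rule 2.3: drop attributes that live in a non-SCL namespace.
    for attr in list(elem.attrib):
        if attr.startswith("{") and not attr.startswith(SCL_PREFIX):
            del elem.attrib[attr]
    for child in list(elem):
        is_private = child.tag == f"{SCL_PREFIX}Private"     # Rule 2.1
        is_foreign = not child.tag.startswith(SCL_PREFIX)    # Rule 2.2
        if is_private or is_foreign:
            elem.remove(child)
        else:
            strip_foreign(child)


doc = ET.fromstring(
    f'<SCL xmlns="{SCL_NS}" xmlns:v="http://vendor.example/ext">'
    '<Private type="x">secret</Private>'
    '<IED name="A" v:tag="t"><v:Extra/><Private/></IED>'
    '</SCL>'
)
strip_foreign(doc)
print([child.tag for child in doc])
```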

3. Header Sanitization

3.1 Randomize <Header> attributes: id, toolID, version, revision.
3.2 Randomize <Header>/<Text> content (if present).
3.3 Remove all <History> children.
3.4 If IED@owner equals the original Header@id, the new IED@owner MUST equal the new Header@id (preservation of ownership semantics).

4. SubNetwork Identification

4.1 Randomize SubNetwork@name values.

5. IED-Level Anonymization

5.1 Randomize IED@name (respect length).
5.2 Randomize IED@type, IED@owner (subject to Rule 3.4).
5.3 Randomize vendor metadata: manufacturer, configVersion.

6. Access Points & Server Structures

6.1 Randomize AccessPoint@name.
6.2 Synchronize all references: apName / apRef / ServerAt@apName.
6.3 Maintain consistent mapping across all uses of each original name.

7. Communication Endpoints (ConnectedAP & Addresses)

7.1 Allocate a unique non-overlapping /24 subnet per ConnectedAP (format: 10.X.0.0/24).
7.2 Randomize host IP (one per P[@type="IP"] in respective subnet).
7.3 Set IP-SUBNET to 255.255.255.0.
7.4 Randomize MAC-Address with multicast prefix 01-0C-CD-XX-XX-XX.
7.5 Randomize VLAN-ID within 002–999 range (inclusive).
7.6 Single pass: do not overwrite randomized values in subsequent steps.
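
A sketch of how Rules 7.1 to 7.5 could be satisfied together using the standard `ipaddress` module. The `allocate_endpoints` helper, the key format, and the choice of second octet range are assumptions for illustration only; drawing the octets with `sample` guarantees distinct /24 subnets but limits the sketch to 256 access points.

```python
import ipaddress
import random


def allocate_endpoints(ap_keys, rng):
    """Assign one 10.X.0.0/24 subnet and randomized addresses per ConnectedAP."""
    octets = rng.sample(range(256), len(ap_keys))  # distinct X values (Rule 7.1)
    result = {}
    for key, x in zip(ap_keys, octets):
        net = ipaddress.ip_network(f"10.{x}.0.0/24")
        host = net.network_address + rng.randint(1, 254)  # host inside the subnet
        mac = "01-0C-CD-" + "-".join(f"{rng.randrange(256):02X}" for _ in range(3))
        result[key] = {
            "IP": str(host),
            "IP-SUBNET": "255.255.255.0",
            "MAC-Address": mac,
            "VLAN-ID": f"{rng.randint(2, 999):03d}",  # 002-999 inclusive
        }
    return result


endpoints = allocate_endpoints([("IED1", "AP1"), ("IED2", "AP1")], random.Random(1))
for key, values in endpoints.items():
    print(key, values)
```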

8. Control Blocks & DataSets

8.1 Randomize ldInst and cbName in <GSE> and <SMV>.
8.2 Propagate new ldInst to LDevice@inst, FCDA@ldInst, and GSE/SMV@ldInst.
8.3 Propagate cbName to associated GSEControl@name / SampledValueControl@name.
8.4 Randomize DataSet@name.
8.5 Propagate new DataSet name to datSet attribute in GSEControl, SampledValueControl, and ReportControl.

9. ReportControl Blocks

9.1 Randomize ReportControl@rptID.
9.2 Randomize ReportControl@name.
9.3 Update dependent references: ExtRef@rptID, ExtRef@rcbName, ClientLN@name referencing the block.

10. Description Attributes

10.1 Clear ("") all desc attribute values.

11. DataTypeTemplates – IDs & Types

11.1 Randomize LNodeType@id and update all lnType attributes.
11.2 Randomize DOType@id and update DO@type & SDO@type.
11.3 Randomize DAType@id and update DA@type references (non-basic).
11.4 Randomize EnumType@id and update DA/BDA where bType="Enum".
11.5 Randomize remaining user-defined @type literals (DA / BDA) not in basic allowlist, ensuring consistent mapping.
11.6 Maintain uniqueness across all randomized IDs (LNodeType, DOType, DAType, EnumType, user literals).
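
The rename-then-propagate pattern behind Rules 11.1 and 11.2 can be sketched as below, again with the standard library rather than lxml. `remap_ids`, the counter-based `new_id` placeholder, and the abbreviated (not schema-valid) SCL fragment are all hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

SCL_NS = "http://www.iec.ch/61850/2003/SCL"


def remap_ids(root, def_tag, ref_tags, ref_attr, new_id):
    """Rename every <def_tag id=...> and rewrite ref_attr on the given tags."""
    mapping = {}
    for definition in root.iter(f"{{{SCL_NS}}}{def_tag}"):
        old = definition.get("id")
        mapping[old] = new_id()
        definition.set("id", mapping[old])
    for tag in ref_tags:
        for elem in root.iter(f"{{{SCL_NS}}}{tag}"):
            if elem.get(ref_attr) in mapping:
                elem.set(ref_attr, mapping[elem.get(ref_attr)])
    return mapping


root = ET.fromstring(
    f'<SCL xmlns="{SCL_NS}">'
    '<LN0 lnClass="LLN0" inst="" lnType="VendorLLN0"/>'
    '<DataTypeTemplates>'
    '<LNodeType id="VendorLLN0" lnClass="LLN0"><DO name="Mod" type="VendorENC"/></LNodeType>'
    '<DOType id="VendorENC" cdc="ENC"/>'
    '</DataTypeTemplates>'
    '</SCL>'
)
counter = itertools.count(1)
new_id = lambda: f"Id_{next(counter):04d}"  # placeholder for the real generator
remap_ids(root, "LNodeType", ["LN", "LN0"], "lnType", new_id)  # Rule 11.1
remap_ids(root, "DOType", ["DO", "SDO"], "type", new_id)       # Rule 11.2
print(ET.tostring(root, encoding="unicode"))
```

Sharing one ID source across all template categories is one simple way to meet the cross-category uniqueness requirement in Rule 11.6.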

12. Reference Synchronization

12.1 ConnectedAP@iedName → new IED@name.
12.2 ServerAt@apName → new AccessPoint@name.
12.3 <IEDName> text nodes → new IED@name.
12.4 ClientLN: update iedName, apRef/apName, ldInst, and name if referencing a ReportControl.
12.5 LNode: update iedName, ldInst.
12.6 FCDA@ldInst → new LDevice@inst.
12.7 ExtRef@rptID and ExtRef@rcbName → new report identifiers.
12.8 GSE / SMV cbName → GSEControl / SampledValueControl@name.
12.9 *Control@datSet → new DataSet@name.
12.10 Terminal@connectivityNode (and ConnectivityNode@pathName) → Reconstructed path string (see Rule 20).

13. Comment Preservation

13.1 Preserve comments whose trimmed text starts with OCL ERROR.
13.2 Remove all other comments.
13.3 Do not alter preserved comment content.
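
As a rough, text-level illustration of the comment filter, the sketch below uses a regular expression on the serialized XML. This is a simplification chosen for brevity, not the tool's method: it would also match comment-like text inside a CDATA section, so a parser-based pass is preferable in practice. `filter_comments` is a hypothetical name.

```python
import re

_COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)


def filter_comments(xml_text: str) -> str:
    """Keep comments whose trimmed text starts with 'OCL ERROR'; drop the rest."""
    def keep_or_drop(match):
        body = match.group(1)
        # Rule 13.3: a preserved comment is returned byte-for-byte.
        return match.group(0) if body.strip().startswith("OCL ERROR") else ""
    return _COMMENT.sub(keep_or_drop, xml_text)


sample = "<a><!-- vendor note --><!-- OCL ERROR: missing ref --><b/></a>"
print(filter_comments(sample))  # <a><!-- OCL ERROR: missing ref --><b/></a>
```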

14. Identifier Generation & Determinism

14.1 Pattern: NATO word + underscore + 4 lowercase alphanumerics.
14.2 Enforce uniqueness per category via tracking sets.
14.3 Enforce max length constraints.
14.4 Deterministic modes:
  • --seed <int>: explicit seed.
  • --hash-seed: derives 32-bit seed from first 8 hex chars of SHA256(file bytes).
14.5 Insert meta comments when deterministic:

<!-- SanitizerVersion=v2.9.0 -->
<!-- SanitizerSeed=<value> -->
<!-- SanitizerSeedSource=<explicit|hash> -->

14.6 Omit seed comments in non-deterministic mode.
14.7 Hash seed ensures identical sanitized output for byte-identical inputs; any byte change alters seed.
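
The hash-based seed derivation in Rule 14.4 amounts to a few lines with `hashlib`. The function name `hash_seed` is hypothetical; the derivation itself (first 8 hex characters of the SHA-256 digest, read as a 32-bit integer) follows the rule text.

```python
import hashlib
import random


def hash_seed(file_bytes: bytes) -> int:
    """Derive a 32-bit seed from the first 8 hex chars of SHA256(file bytes)."""
    return int(hashlib.sha256(file_bytes).hexdigest()[:8], 16)


seed = hash_seed(b"abc")
print(hex(seed))            # 0xba7816bf
rng = random.Random(seed)   # byte-identical input gives the same PRNG stream
```

Because the seed depends on every byte of the file, re-indenting or re-encoding the input produces a different seed and therefore different sanitized identifiers.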

15. Basic Type Protection

15.1 Never randomize IEC built-in primitive types (allowlist).
15.2 Randomize only user-defined / enumerated types & template IDs.

16. Validation & Integrity

16.1 All randomized references must resolve (IDs, names, ldInst, rptID, cbName, type).
16.2 XSD validation SHOULD pass (external schema not embedded).
16.3 Uniqueness enforced for all categories.
16.4 Subnets must not overlap (distinct /24).
16.5 No dangling SDO@type or enum references.
16.6 Length compliance confirmed for identifiers post-randomization.

17. Output & Traceability

17.1 Include seed comments only in deterministic mode.
17.2 Include SanitizerSeedSource (explicit | hash) when deterministic.
17.3 Always include version comment <!-- SanitizerVersion=v2.9.0 -->.
17.4 Preserve whitelisted OCL ERROR comments.
17.5 Maintain XML declaration; ensure <Header> is on its own line (formatting aid).
17.6 Place version/seed comments immediately after XML declaration (spec compliant).
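
A sketch of how the traceability comments could be placed directly after the XML declaration, working on the serialized text. `insert_meta_comments` is a hypothetical helper, and the sketch assumes the declaration (if present) is the very first thing in the text.

```python
def insert_meta_comments(xml_text, version, seed=None, seed_source=None):
    """Insert version (and, when seeded, seed) comments after the XML declaration."""
    comments = [f"<!-- SanitizerVersion={version} -->"]
    if seed is not None:  # deterministic mode only (Rules 17.1, 17.2)
        comments.append(f"<!-- SanitizerSeed={seed} -->")
        comments.append(f"<!-- SanitizerSeedSource={seed_source} -->")
    block = "\n".join(comments)
    if xml_text.startswith("<?xml"):
        end = xml_text.index("?>") + 2
        return xml_text[:end] + "\n" + block + "\n" + xml_text[end:].lstrip()
    return block + "\n" + xml_text


out = insert_meta_comments(
    '<?xml version="1.0" encoding="UTF-8"?>\n<SCL/>', "v2.9.0", 42, "explicit"
)
print(out)
```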

18. Versioning

18.1 Increment rules version on structural or semantic changes.

19. Change Log (since v2.8.1)

  • Added Rule 3.4: IED Owner synchronization with Header ID.
  • Added Rule 8.4/8.5/12.9: DataSet name randomization and reference updates.
  • Added Rule 20/12.10: Substation topology randomization and connectivity path reconstruction.
  • Version bump to 2.9.0.

20. Substation Topology

20.1 Randomize the @name attribute of every element in the <Substation> hierarchy. This includes:

  • <Substation>
  • <VoltageLevel>
  • <Bay>
  • <ConductingEquipment>
  • <ConnectivityNode>
  • <Terminal>
  • <SubEquipment>

If an element beneath <Substation> has a name attribute, that attribute must be randomized.

20.2 All topology names share a single randomization namespace (Topology.name) to facilitate path reconstruction.

20.3 Reconstruct Terminal@connectivityNode attributes:

  • Split the original path string by /.
  • Map each segment to its new randomized name.
  • Rejoin segments to form the updated path.
  • The result must maintain the pattern: SubstationName/VoltageLevelName/BayName/ConnectivityNodeName.

20.4 Similarly, reconstruct the ConnectivityNode@pathName attribute using the same logic.
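
The split, map, and rejoin steps in Rule 20.3 can be sketched as below. `rebuild_path` and the example names are hypothetical; the mapping stands in for the shared Topology.name namespace from Rule 20.2. A missing segment raises `KeyError` instead of silently passing an original name through to the output.

```python
def rebuild_path(path: str, topology_names: dict) -> str:
    """Map each '/'-separated segment to its randomized name and rejoin."""
    return "/".join(topology_names[segment] for segment in path.split("/"))


topology_names = {
    "Sub1": "Alfa_1a2b",
    "VL110": "Bravo_3c4d",
    "BayA": "Charlie_5e6f",
    "CN1": "Delta_7g8h",
}
print(rebuild_path("Sub1/VL110/BayA/CN1", topology_names))
# Alfa_1a2b/Bravo_3c4d/Charlie_5e6f/Delta_7g8h
```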

Security / Privacy Note

  • Deterministic hash seeding can correlate identical originals across organizations.
  • Use non-deterministic mode when unlinkability is a priority.

Outcome

Produces a structurally intact, anonymized SCL file safe for cross-organizational sharing. v2.9.0 adds deep topology anonymization and DataSet protection.
