@jostaub
Contributor

@jostaub jostaub commented Sep 18, 2025

This PR should address the problems described in:

Here is an overview of the most important changes:

  • Use of the hash code algorithm for deduplication (the basis of the problems described in the linked issues)
  • Increased consistency in parsing between the XML and CSV parsers.
  • Combined findings where the only differences are in fields that cannot be rehashed due to inconsistent values between scans (e.g. fields containing timestamps or packet IDs). This prevents duplicates if the vulnerability is found multiple times on the same endpoint.
  • Increased parser value coverage
  • Heuristic for fix_available detection
  • Updated mapping to DefectDojo fields compared to version 1.
  • Improved docs
  • Similar XML and CSV implementations
  • Added test files (this is the reason for the large line count)
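
As a rough illustration of the "combined findings" item above, here is a minimal sketch of merging findings that differ only in fields that vary between scans. The `ParsedFinding` class, its field names, and the choice of title, severity, and endpoint as the stable key are all assumptions made for this example; the parser's actual fields and hash configuration are not shown in this conversation.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class ParsedFinding:
    """Hypothetical stand-in for a parsed finding; not the DefectDojo model."""

    title: str
    severity: str
    endpoint: str
    # Details that change between scans (timestamps, packet IDs, ...).
    scan_details: list[str] = field(default_factory=list)


def stable_key(finding: ParsedFinding) -> str:
    """Hash only the fields assumed to stay constant between scans."""
    raw = "|".join([finding.title, finding.severity, finding.endpoint])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def combine_findings(findings: list[ParsedFinding]) -> list[ParsedFinding]:
    """Merge findings that share a stable key, collecting their unstable details."""
    combined: dict[str, ParsedFinding] = {}
    for finding in findings:
        key = stable_key(finding)
        if key in combined:
            # Same vulnerability on the same endpoint: keep one finding,
            # append the per-scan details instead of creating a duplicate.
            combined[key].scan_details.extend(finding.scan_details)
        else:
            combined[key] = finding
    return list(combined.values())
```

Keeping the unstable values out of the key is what lets the same vulnerability hash identically across scans, while the merged details remain available on the single surviving finding.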

In #12378 we discussed why this is a new parser version instead of an update to the existing parser. The concrete fields that were the reason mentioned in the issue are no longer the same, since I have changed the deduplication algorithm since then. However, the underlying reason is still the same (hashing on fields not present in the original parser).

For a long time, this PR did not deduplicate on endpoints. However, this resulted in duplication markings during reimport in the default engagement configuration, which is why this functionality was removed (for now).

@github-actions bot added the settings_changes (Needs changes to settings.py based on changes in settings.dist.py included in this PR), docs, unittests, and parser labels on Sep 18, 2025
@jostaub
Contributor Author

jostaub commented Sep 18, 2025

DryRun Findings:

  • Potential Cross-Site Scripting -> I can HTML-sanitize all fields. Is this wanted? I don't see other parsers doing this.
  • CSV Injection -> This should be a false positive. Shouldn't it be DefectDojo's job to sanitize this during CSV export?
  • Information Disclosure via finding.steps_to_reproduce -> This should be a false positive. The function can currently combine multiple endpoints; however, the local dedup currently prevents this. Even if this changes in the future, it is not information disclosure, since the user who views the vulnerability should see all the endpoints.

I will fix the rest.

@jostaub
Contributor Author

jostaub commented Sep 18, 2025

I don't think my changes caused the remaining failing tests. Are they also failing on the default dev branch, or did I break something?

@valentijnscholten
Member

2025-09-18T13:37:39.4935378Z uwsgi-1  | ERROR: test_openvas_csv_report_combined_findings (unittests.tools.test_openvas_parser.TestOpenVASParserV2.test_openvas_csv_report_combined_findings)
2025-09-18T13:37:39.4935515Z uwsgi-1  | Ensure findings combinding behaviour
2025-09-18T13:37:39.4935674Z uwsgi-1  | ----------------------------------------------------------------------
2025-09-18T13:37:39.4935798Z uwsgi-1  | Traceback (most recent call last):
2025-09-18T13:37:39.4936128Z uwsgi-1  |   File "/app/unittests/tools/test_openvas_parser.py", line 107, in test_openvas_csv_report_combined_findings
2025-09-18T13:37:39.4936261Z uwsgi-1  |     findings = setup_openvas_v2_test(f)
2025-09-18T13:37:39.4936371Z uwsgi-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4936630Z uwsgi-1  |   File "/app/unittests/tools/test_openvas_parser.py", line 18, in setup_openvas_v2_test
2025-09-18T13:37:39.4936768Z uwsgi-1  |     findings = parser.get_findings(f, test)
2025-09-18T13:37:39.4936882Z uwsgi-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4937089Z uwsgi-1  |   File "/app/dojo/tools/openvas/parser.py", line 37, in get_findings
2025-09-18T13:37:39.4937227Z uwsgi-1  |     return get_findings_from_csv(file, test)
2025-09-18T13:37:39.4937424Z uwsgi-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4937703Z uwsgi-1  |   File "/app/dojo/tools/openvas/parser_v2/csv_parser.py", line 25, in get_findings_from_csv
2025-09-18T13:37:39.4937948Z uwsgi-1  |     column_names = [column_name.lower() for column_name in next(csv_reader) if column_name]
2025-09-18T13:37:39.4938123Z uwsgi-1  |                                                            ^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4938366Z uwsgi-1  | TypeError: underlying read() should have returned a bytes-like object, not 'str'
2025-09-18T13:37:39.4938451Z uwsgi-1  | 
2025-09-18T13:37:39.4938584Z uwsgi-1  | ======================================================================
2025-09-18T13:37:39.4939105Z uwsgi-1  | ERROR: test_openvas_parser_csv_detail (unittests.tools.test_openvas_parser.TestOpenVASParserV2.test_openvas_parser_csv_detail)
2025-09-18T13:37:39.4939328Z uwsgi-1  | Ensure finding contains report data as expected
2025-09-18T13:37:39.4939497Z uwsgi-1  | ----------------------------------------------------------------------
2025-09-18T13:37:39.4939618Z uwsgi-1  | Traceback (most recent call last):
2025-09-18T13:37:39.4939919Z uwsgi-1  |   File "/app/unittests/tools/test_openvas_parser.py", line 42, in test_openvas_parser_csv_detail
2025-09-18T13:37:39.4940049Z uwsgi-1  |     findings = setup_openvas_v2_test(f)
2025-09-18T13:37:39.4940163Z uwsgi-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4940430Z uwsgi-1  |   File "/app/unittests/tools/test_openvas_parser.py", line 18, in setup_openvas_v2_test
2025-09-18T13:37:39.4940567Z uwsgi-1  |     findings = parser.get_findings(f, test)
2025-09-18T13:37:39.4940681Z uwsgi-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4940887Z uwsgi-1  |   File "/app/dojo/tools/openvas/parser.py", line 37, in get_findings
2025-09-18T13:37:39.4941026Z uwsgi-1  |     return get_findings_from_csv(file, test)
2025-09-18T13:37:39.4941140Z uwsgi-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4941414Z uwsgi-1  |   File "/app/dojo/tools/openvas/parser_v2/csv_parser.py", line 25, in get_findings_from_csv
2025-09-18T13:37:39.4941661Z uwsgi-1  |     column_names = [column_name.lower() for column_name in next(csv_reader) if column_name]
2025-09-18T13:37:39.4941795Z uwsgi-1  |                                                            ^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4942046Z uwsgi-1  | TypeError: underlying read() should have returned a bytes-like object, not 'str'
2025-09-18T13:37:39.4942137Z uwsgi-1  | 
2025-09-18T13:37:39.4942264Z uwsgi-1  | ======================================================================
2025-09-18T13:37:39.4942700Z uwsgi-1  | ERROR: test_openvas_parser_csv_xml_parity (unittests.tools.test_openvas_parser.TestOpenVASParserV2.test_openvas_parser_csv_xml_parity)
2025-09-18T13:37:39.4942954Z uwsgi-1  | Ensure xml and csv parser parse data that is the same between report in the same way
2025-09-18T13:37:39.4943124Z uwsgi-1  | ----------------------------------------------------------------------
2025-09-18T13:37:39.4943246Z uwsgi-1  | Traceback (most recent call last):
2025-09-18T13:37:39.4943547Z uwsgi-1  |   File "/app/unittests/tools/test_openvas_parser.py", line 80, in test_openvas_parser_csv_xml_parity
2025-09-18T13:37:39.4943684Z uwsgi-1  |     findings_csv = setup_openvas_v2_test(f)
2025-09-18T13:37:39.4943801Z uwsgi-1  |                    ^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4944061Z uwsgi-1  |   File "/app/unittests/tools/test_openvas_parser.py", line 18, in setup_openvas_v2_test
2025-09-18T13:37:39.4944195Z uwsgi-1  |     findings = parser.get_findings(f, test)
2025-09-18T13:37:39.4944303Z uwsgi-1  |                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4944504Z uwsgi-1  |   File "/app/dojo/tools/openvas/parser.py", line 37, in get_findings
2025-09-18T13:37:39.4944641Z uwsgi-1  |     return get_findings_from_csv(file, test)
2025-09-18T13:37:39.4944760Z uwsgi-1  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4945157Z uwsgi-1  |   File "/app/dojo/tools/openvas/parser_v2/csv_parser.py", line 25, in get_findings_from_csv
2025-09-18T13:37:39.4945402Z uwsgi-1  |     column_names = [column_name.lower() for column_name in next(csv_reader) if column_name]
2025-09-18T13:37:39.4945539Z uwsgi-1  |                                                            ^^^^^^^^^^^^^^^^
2025-09-18T13:37:39.4945837Z uwsgi-1  | TypeError: underlying read() should have returned a bytes-like object, not 'str'
2025-09-18T13:37:39.4945929Z uwsgi-1  | 
2025-09-18T13:37:39.4946090Z uwsgi-1  | ----------------------------------------------------------------------
2025-09-18T13:37:39.4946194Z uwsgi-1  | Ran 2993 tests in 401.406s
2025-09-18T13:37:39.4946284Z uwsgi-1  | 
2025-09-18T13:37:39.4946402Z uwsgi-1  | FAILED (errors=5, skipped=462)
2025-09-18T13:37:39.4946617Z uwsgi-1  | Preserving test database for alias 'default' ('test_defectdojo')...

@jostaub
Contributor Author

jostaub commented Sep 18, 2025

@valentijnscholten Thank you for the snippet. I was too distracted by the other errors (sorry).
I did not know that the import via the UI and the tests pass different file objects (I only tested the dry run fix in the UI), which caused problems in my case. This is fixed now.
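
For context, the TypeError in the traceback above is the kind of error `io.TextIOWrapper` raises when it wraps a handle whose `read()` already returns `str`, which happens when a test opens the report in text mode while an upload arrives as bytes. The actual fix made in this PR is not shown in the conversation; the following is only a minimal sketch of one way to accept both kinds of file object, and `read_csv_rows` is a hypothetical helper name, not a function from the parser.

```python
import csv
import io


def read_csv_rows(file) -> list[list[str]]:
    """Return CSV rows from a file object opened in either binary or text mode."""
    content = file.read()
    if isinstance(content, bytes):
        # Binary-mode handle (for example an uploaded file): decode first.
        # UTF-8 is an assumption here; real reports may need another encoding.
        content = content.decode("utf-8")
    # At this point content is always str, so csv.reader can consume it.
    return list(csv.reader(io.StringIO(content)))
```

Both call styles then behave the same:

```python
read_csv_rows(io.BytesIO(b"IP,Port\n10.0.0.1,443\n"))
read_csv_rows(io.StringIO("IP,Port\n10.0.0.1,443\n"))
```

Reading the whole content up front trades memory for simplicity, which seems acceptable for report files of moderate size.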

The failing test in unittests.test_tool_config.TestApiScanConfigEntry seems to be caused by a deprecation warning in api_blackduck.

Is the failing k8s test a known problem? Otherwise I will try to investigate it.

@valentijnscholten
Member

The blackduck failure is new to me, but it has appeared on more PRs since today.
The Kubernetes tests sometimes need to be retried by us, or the PR closed and reopened to trigger the tests again.

@valentijnscholten
Member

@jostaub If you rebase/merge the blackduck failure should go away.

@dryrunsecurity

DryRun Security

This pull request contains a CSV injection finding: OpenVASParserV2 in dojo/tools/openvas/parser.py assigns CSV column values directly to Finding fields (title, summary, impact, mitigation, openvas_result) without sanitizing leading formula characters, and the existing cleanup_openvas_text and escape_restructured_text functions (which only remove newlines or wrap text) do not mitigate spreadsheet-formula injection. As a result, malicious inputs could survive import/export and execute commands or exfiltrate data when the CSV is opened in a spreadsheet.

CSV Injection in dojo/tools/openvas/parser.py
Vulnerability CSV Injection
Description The OpenVASParserV2 processes CSV files and directly assigns column values to Finding model fields such as title, summary, impact, mitigation, and openvas_result without specific sanitization against spreadsheet formula injection. While cleanup_openvas_text removes newlines and escape_restructured_text wraps text in triple backticks for display within DefectDojo, these functions do not prevent malicious formulas (e.g., starting with '=', '+', '-', '@') from being interpreted as commands if the exported data is opened in a spreadsheet program. If a malicious CSV is imported and then its findings are exported, an attacker could craft inputs that, when opened in a spreadsheet, execute arbitrary commands or exfiltrate data.

        if str(filename.name).endswith(".xml"):
            return OpenVASXMLParser().get_findings(filename, test)
        return None


class OpenVASParserV2:
    def get_scan_types(self):
        return ["OpenVAS Parser v2"]

    def get_label_for_scan_types(self, scan_type):
        return scan_type

    def get_description_for_scan_types(self, scan_type):
        return "Import CSV or XML output of Greenbone OpenVAS report."

    def get_findings(self, file, test):
        if str(file.name).endswith(".csv"):
            return get_findings_from_csv(file, test)
        if str(file.name).endswith(".xml"):
            return get_findings_from_xml(file, test)
        return None
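
To make the reported risk concrete, here is a minimal sketch of the usual mitigation for spreadsheet formula injection: prefixing cells that start with one of the characters named in the description ('=', '+', '-', '@') with a single quote so that a spreadsheet treats them as text. `neutralize_formula` and `FORMULA_PREFIXES` are hypothetical names for illustration and are not part of the parser or of DefectDojo. Whether such a step belongs in the parser or in the CSV export is the open question raised earlier in this conversation.

```python
FORMULA_PREFIXES = ("=", "+", "-", "@")


def neutralize_formula(value: str) -> str:
    """Prefix a cell with a single quote if a spreadsheet might evaluate it."""
    if value.startswith(FORMULA_PREFIXES):
        # The leading quote makes spreadsheet programs treat the cell as text.
        return "'" + value
    return value
```

Applying this at export time rather than at import time would keep the stored finding text unchanged, at the cost of relying on every export path to call it.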


All finding details can be found in the DryRun Security Dashboard.

Member

@valentijnscholten valentijnscholten left a comment

Thanks for the PR and nice edit on the v2 suffix support.

@valentijnscholten valentijnscholten added this to the 2.51.0 milestone Sep 25, 2025
Contributor

@Maffooch Maffooch left a comment

Agreed on the version suffix - excellent idea!

Member

@valentijnscholten valentijnscholten left a comment

approved (again :-))

@Maffooch
Contributor

I used VS Code to review and request reviews from others. It seems that did not work the way I thought 😅

@Maffooch Maffooch removed their request for review September 25, 2025 20:13
Contributor

@mtesauro mtesauro left a comment

Approved

@mtesauro mtesauro merged commit 699e3b1 into DefectDojo:dev Sep 26, 2025
89 checks passed

Labels

docs, parser, settings_changes (Needs changes to settings.py based on changes in settings.dist.py included in this PR), unittests

5 participants