
Commit c0f9bed

pySCG: adding documentation to CWE-184 as part of #531 (#820)
adding documentation to CWE-184 as part of #531

Co-Authors: s19110 (hubert), tommcd (Thomas)

---------

Signed-off-by: Helge Wehder <[email protected]>
Signed-off-by: myteron <[email protected]>
1 parent def50a8 commit c0f9bed


4 files changed: +180 -87 lines

@@ -0,0 +1,144 @@
# CWE-184: Incomplete List of Disallowed Input

Avoid incomplete 'deny lists', which can lead to security vulnerabilities such as cross-site scripting (XSS); use 'allow lists' instead.

## Non-Compliant Code Example

The `noncompliant01.py` code demonstrates how difficult it is to maintain an exclusion list when supporting multiple languages. Unicode defines __1,112,064__ valid code points, and `UTF-8` encodes each of them as a sequence of one to four bytes (`8-32` bits).
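
As a quick illustration (not part of the original guide), the sketch below prints how many bytes `UTF-8` needs for a few sample characters and derives the 1,112,064 figure from Python's `sys.maxunicode` minus the 2,048 surrogate code points, which cannot be encoded in UTF-8. The helper `utf8_length` is hypothetical.

```python
"""Illustrative sketch: UTF-8 uses one to four bytes per code point."""

import sys

# The Unicode code space is U+0000..U+10FFFF; the 2,048 surrogates
# U+D800..U+DFFF are not valid scalar values and cannot be encoded in UTF-8.
CODE_SPACE = sys.maxunicode + 1        # 0x110000 == 1,114,112
SURROGATES = 0xDFFF - 0xD800 + 1       # 2,048
VALID_CODE_POINTS = CODE_SPACE - SURROGATES


def utf8_length(char: str) -> int:
    """Return the number of bytes UTF-8 needs for one character (hypothetical helper)."""
    return len(char.encode("utf-8"))


samples = {
    "A": "ASCII letter",
    "\u00e9": "Latin-1 Supplement letter",
    "生": "CJK ideograph",
    "\U0001F600": "emoji outside the Basic Multilingual Plane",
}

# Only code point numbers are printed, so any console encoding works
for char, description in samples.items():
    print(f"U+{ord(char):04X} {description}: {utf8_length(char)} byte(s)")

print(f"valid code points: {VALID_CODE_POINTS:,}")
```

The byte count grows with the code point value, which is why a filter that compares raw bytes or exact strings has to anticipate far more spellings than a human reviewer would expect.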
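
The `filter_string()` function in `noncompliant01.py` searches for disallowed tags but fails to find the `script` tag in `<script生>` because of the appended non-English character `生`. The same happens for `<script` + `\ufdef` + `>`: neither string equals `<script>`, so both pass the check.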

*[noncompliant01.py](noncompliant01.py):*

```python
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Non-compliant Code Example"""

import re
import sys

if sys.stdout.encoding.lower() != "utf-8":
    sys.stdout.reconfigure(encoding="UTF-8")


def filter_string(input_string: str):
    """Normalize and validate untrusted string

    Parameters:
        input_string(string): String to validate
    """
    # TODO Canonicalize (normalize) before Validating

    # validate, exclude dangerous tags:
    for tag in re.findall("<[^>]*>", input_string):
        if tag in ["<script>", "<img", "<a href"]:
            raise ValueError("Invalid input tag")


#####################
# attempting to exploit above code example
#####################
names = [
    "YES 毛泽东先生",
    "YES dash-",
    "NOK <script" + "\ufdef" + ">",
    "NOK <script生>",
]
for name in names:
    print(name)
    filter_string(name)
```
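
The exact-match comparison is what makes this deny list incomplete: any spelling that differs from a listed string by even one character is accepted. The sketch below reuses the deny-list check from `noncompliant01.py`, wrapped in a hypothetical `is_denied` helper that returns a boolean instead of raising, and shows several variants slipping through. It also shows that `"<img"` and `"<a href"` can never match, because the regular expression always captures a complete tag ending in `>`.

```python
"""Illustrative sketch: exact-match deny lists miss trivial variants."""

import re

DENY_LIST = ["<script>", "<img", "<a href"]  # same entries as noncompliant01.py


def is_denied(input_string: str) -> bool:
    """Return True if any tag in the input exactly matches a deny-list entry."""
    return any(tag in DENY_LIST for tag in re.findall("<[^>]*>", input_string))


variants = [
    "<script>",                      # the only spelling the deny list recognizes
    "<SCRIPT>",                      # different letter case
    "<script >",                     # whitespace inside the tag
    "<script" + "\ufdef" + ">",      # appended noncharacter U+FDEF
    "<script生>",                    # appended CJK character
    "<img src=x onerror=alert(1)>",  # a full tag never equals "<img"
]

for variant in variants:
    # ascii() escapes non-ASCII characters so the output is console-safe
    print(f"{ascii(variant)}: {'blocked' if is_denied(variant) else 'ACCEPTED'}")
```

Every entry after the first is accepted, which is the general weakness of deny lists: the attacker only needs one spelling the list did not anticipate.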

## Compliant Solution

`compliant01.py` uses an allow list instead of a deny list: any tag that is not explicitly permitted raises an exception, even without canonicalization. The canonicalization step described in [CWE-180: Incorrect Behavior Order: Validate Before Canonicalize](https://github.com/ossf/wg-best-practices-os-developers/tree/main/docs/Secure-Coding-Guide-for-Python/CWE-707/CWE-180) is still missing from `compliant01.py` and must be added before the input can be safely logged or displayed.
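
Why canonicalization has to come before validation can be illustrated with fullwidth forms. The sketch below assumes NFKC as the normalization form (the form used by the `TagFilter` code this commit removes) and shows a string that contains no ASCII `<` at all, so the tag regex finds nothing to check, yet becomes `<script>` after `unicodedata.normalize("NFKC", ...)`. The helper `find_tags` is hypothetical.

```python
"""Illustrative sketch: validate only *after* canonicalizing (CWE-180)."""

import re
import unicodedata

# "<script>" spelled entirely with fullwidth forms U+FF1C, U+FF53, ... U+FF1E
fullwidth = "\uff1c\uff53\uff43\uff52\uff49\uff50\uff54\uff1e"


def find_tags(text: str) -> list:
    """Return every <...> tag found by the same regex as compliant01.py."""
    return re.findall("<[^>]*>", text)


normalized = unicodedata.normalize("NFKC", fullwidth)

print("tags before normalization:", find_tags(fullwidth))   # [] : no ASCII '<' present
print("tags after normalization: ", find_tags(normalized))  # ['<script>']
```

If any later step normalizes the text, for example for display, a check performed on the raw input has validated a different string than the one that is finally used.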

*[compliant01.py](compliant01.py):*

```python
# SPDX-FileCopyrightText: OpenSSF project contributors
# SPDX-License-Identifier: MIT
"""Compliant Code Example"""

import re
import sys

if sys.stdout.encoding.lower() != "utf-8":
    sys.stdout.reconfigure(encoding="UTF-8")


def filter_string(input_string: str):
    """Normalize and validate untrusted string

    Parameters:
        input_string(string): String to validate
    """
    # TODO Canonicalize (normalize) before Validating

    # validate, only allow harmless tags
    for tag in re.findall("<[^>]*>", input_string):
        if tag not in ["<b>", "<p>", "</p>"]:
            raise ValueError("Invalid input tag")
    # TODO handle exception


#####################
# attempting to exploit above code example
#####################
names = [
    "YES 毛泽东先生",
    "YES dash-",
    "NOK <script" + "\ufdef" + ">",
    "NOK <script生>",
]
for name in names:
    print(name)
    filter_string(name)
```
101+
102+
The `compliant01.py` detects the unallowed character correctly and throws a `ValueError` exception. An actual production solution would also need to canonicalize and handle the exception correctly.
103+
104+
__Example compliant01.py output:__
105+
106+
```bash
107+
/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py
108+
$ python3 compliant01.py
109+
YES 毛泽东先生
110+
YES dash-
111+
NOK <script﷯>
112+
Traceback (most recent call last):
113+
File "/workspace/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py", line 38, in <module>
114+
filter_string(name)
115+
File "/workspace/wg-best-practices-os-developers/docs/Secure-Coding-Guide-for-Python/CWE-693/CWE-184/compliant01.py", line 23, in filter_string
116+
raise ValueError("Invalid input tag")
117+
ValueError: Invalid input tag
118+
119+
```
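
A production version would also need to canonicalize the input and handle the exception instead of letting it terminate the program. The sketch below is one possible shape under stated assumptions, not the guide's reference solution: it applies NFKC normalization first, then a character allow list modeled on the `TagFilter` code this commit removes from `compliant01.py`, then the tag allow list from `compliant01.py`, and catches the `ValueError` per input. The function name `validate_markup` is hypothetical.

```python
"""Illustrative sketch: canonicalize first, then validate against allow lists."""

import re
import unicodedata

ALLOWED_TAGS = {"<b>", "<p>", "</p>"}  # same tag allow list as compliant01.py
# Character allow list (assumption, modeled on the removed TagFilter code):
# word characters in any script, whitespace, '/', '<', '>' and '-'
ALLOWED_TEXT = re.compile(r"[/\w<>\s-]*")


def validate_markup(input_string: str) -> str:
    """Return the NFKC-normalized input or raise ValueError (hypothetical helper)."""
    # 1. Canonicalize first so that validation sees the final form of the text
    normalized = unicodedata.normalize("NFKC", input_string)

    # 2. Reject any character outside the character allow list
    if not ALLOWED_TEXT.fullmatch(normalized):
        raise ValueError("Invalid input string")

    # 3. Reject any tag outside the tag allow list
    for tag in re.findall("<[^>]*>", normalized):
        if tag not in ALLOWED_TAGS:
            raise ValueError("Invalid input tag")
    return normalized


names = [
    "YES 毛泽东先生",
    "YES dash-",
    "YES <p>foo</p>",
    "YES \uff1cb\uff1e",  # fullwidth < and >; NFKC maps them to ASCII "<b>"
    "NOK <script" + "\ufdef" + ">",
    "NOK <script生>",
    "NOK semicolon;",
]
for name in names:
    try:
        validate_markup(name)
    except ValueError as error:
        # Handle the error: report it and move on instead of terminating
        print(ascii(name), "rejected:", error)
    else:
        print(ascii(name), "accepted")
```

Returning the normalized string lets callers log or display exactly the text that was validated, rather than the raw input.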

According to *Unicode Technical Report #36, Unicode Security Considerations [Davis 2008b]*, using `\uFFFD` (U+FFFD REPLACEMENT CHARACTER) as a replacement for unwanted or dangerous characters is usually unproblematic, because it typically just causes parsing to fail. Where the output character set is not Unicode, however, this character may not be available.
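
The cited section 3.5, "Deletion of Code Points", explains why replacement is preferred over deletion: removing a character can join its neighbours into something dangerous. The sketch below applies both strategies to the `<script` + `\ufdef` + `>` input used above. Treating unassigned code points (general category `Cn`, which includes noncharacters such as U+FDEF) as "unwanted" is an assumption for this example, and the helper names are hypothetical.

```python
"""Illustrative sketch: deleting vs. replacing unwanted code points."""

import unicodedata

REPLACEMENT_CHARACTER = "\ufffd"


def is_unwanted(char: str) -> bool:
    """Treat unassigned code points, including noncharacters like U+FDEF, as unwanted."""
    return unicodedata.category(char) == "Cn"


def delete_unwanted(text: str) -> str:
    """Drop unwanted code points (risky: the neighbours are joined together)."""
    return "".join(char for char in text if not is_unwanted(char))


def replace_unwanted(text: str) -> str:
    """Substitute U+FFFD for unwanted code points instead of deleting them."""
    return "".join(REPLACEMENT_CHARACTER if is_unwanted(char) else char for char in text)


payload = "<script" + "\ufdef" + ">"
print(ascii(delete_unwanted(payload)))   # '<script>' : deletion re-creates the tag
print(ascii(replace_unwanted(payload)))  # '<script\ufffd>' : no longer a <script> tag
```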

## Automated Detection

|Tool|Version|Checker|Description|
|:---|:---|:---|:---|
|Bandit|1.7.4 on Python 3.10.4|Not Available||
|Flake8|8-4.0.1 on Python 3.10.4|Not Available||

## Related Guidelines

|||
|:---|:---|
|[MITRE CWE](http://cwe.mitre.org/)|Pillar: [CWE-693: Protection Mechanism Failure (mitre.org)](https://cwe.mitre.org/data/definitions/693.html)|
|[MITRE CWE](http://cwe.mitre.org/)|Base: [CWE-184, Incomplete List of Disallowed Inputs (4.13) (mitre.org)](https://cwe.mitre.org/data/definitions/184.html)|
|[SEI CERT Coding Standard for Java](https://wiki.sei.cmu.edu/confluence/display/java/SEI+CERT+Oracle+Coding+Standard+for+Java)|[IDS11-J. Perform any string modifications before validation](https://wiki.sei.cmu.edu/confluence/display/java/IDS11-J.+Perform+any+string+modifications+before+validation)|

## Bibliography

|||
|:---|:---|
|[Unicode 2024]|Unicode 16.0.0 [online]. Available from: [https://www.unicode.org/versions/Unicode16.0.0/](https://www.unicode.org/versions/Unicode16.0.0/) [accessed 20 March 2025] |
|[Davis 2008b]|Unicode Technical Report #36, Unicode Security Considerations, Section 3.5 "Deletion of Code Points" [online]. Available from: [https://www.unicode.org/reports/tr36/](https://www.unicode.org/reports/tr36/) [accessed 20 March 2025] |

```diff
@@ -1,62 +1,38 @@
 # SPDX-FileCopyrightText: OpenSSF project contributors
 # SPDX-License-Identifier: MIT
-""" Compliant Code Example """
+"""Compliant Code Example"""
+
 import re
-import unicodedata
 import sys
 
-sys.stdout.reconfigure(encoding="UTF-8")
-
-
-class TagFilter:
-    """Input validation for human language"""
+if sys.stdout.encoding.lower() != "utf-8":
+    sys.stdout.reconfigure(encoding="UTF-8")
 
-    def filter_string(self, input_string: str) -> str:
-        """Normalize and validate untrusted string
 
-        Parameters:
-            input_string(string): String to validate
-        """
-        # normalize
-        _str = unicodedata.normalize("NFKC", input_string)
+def filter_string(input_string: str):
+    """Normalize and validate untrusted string
 
-        # modify, keep only trusted human words
-        _filtered_str = "".join(re.findall(r"[/\w<>\s-]+", _str))
-        if len(_str) - len(_filtered_str) != 0:
-            raise ValueError("Invalid input string")
+    Parameters:
+        input_string(string): String to validate
+    """
+    # TODO Canonicalize (normalize) before Validating
 
-        # validate, only allow harmless tags
-        for tag in re.findall("<[^>]*>", _str):
-            if tag not in ["<b>", "<p>", "</p>"]:
-                raise ValueError("Invalid input tag")
-        return _str
+    # validate, only allow harmless tags
+    for tag in re.findall("<[^>]*>", input_string):
+        if tag not in ["<b>", "<p>", "</p>"]:
+            raise ValueError("Invalid input tag")
+    # TODO handle exception
 
 
 #####################
 # attempting to exploit above code example
 #####################
 names = [
     "YES 毛泽东先生",
-    "YES María Quiñones Marqués",
-    "YES Борис Николаевич Ельцин",
-    "YES Björk Guðmundsdóttir",
-    "YES 0123456789",
-    "YES <b>",
-    "YES <p>foo</p>",
-    "YES underscore_",
     "YES dash-",
-    "NOK semicolon;",
-    "NOK noprint " + "\uFDD0",
-    "NOK noprint " + "\uFDEF",
-    "NOK <script" + "\uFDEF" + ">",
+    "NOK <script" + "\ufdef" + ">",
     "NOK <script生>",
-    "NOK and &",
 ]
 for name in names:
-    print(f"{name}", end=" ")
-    try:
-        TagFilter().filter_string(name)
-    except ValueError as e:
-        print(" Error: " + str(e))
-    else:
-        print(" OK")
+    print(name)
+    filter_string(name)
```

```diff
@@ -1,63 +1,37 @@
 # SPDX-FileCopyrightText: OpenSSF project contributors
 # SPDX-License-Identifier: MIT
-""" Non-compliant Code Example """
+"""Compliant Code Example"""
+
 import re
-import unicodedata
 import sys
 
-sys.stdout.reconfigure(encoding="UTF-8")
-
-
-class TagFilter:
-    """Input validation for human language"""
+if sys.stdout.encoding.lower() != "utf-8":
+    sys.stdout.reconfigure(encoding="UTF-8")
 
-    def filter_string(self, input_string: str) -> str:
-        """Normalize and validate untrusted string
 
-        Parameters:
-            input_string(string): String to validate
-        """
-        # normalize
-        _str = unicodedata.normalize("NFKC", input_string)
+def filter_string(input_string: str):
+    """Normalize and validate untrusted string
 
-        # validate, exclude dangerous tags
-        for tag in re.findall("<[^>]*>", _str):
-            if tag in ["<script>", "<img", "<a href"]:
-                raise ValueError("Invalid input tag")
+    Parameters:
+        input_string(string): String to validate
+    """
+    # TODO Canonicalize (normalize) before Validating
 
-        # modify, keep only trusted human words
-        # _filtered_str = "".join(re.findall(r'([\//\w<>\s_-]+)', _str))
-        _filtered_str = "".join(re.findall(r"[/\w<>\s-]+", _str))
-        if len(_str) - len(_filtered_str) != 0:
-            raise ValueError("Invalid input string")
-        return _filtered_str
+    # validate, exclude dangerous tags:
+    for tag in re.findall("<[^>]*>", input_string):
+        if tag in ["<script>", "<img", "<a href"]:
+            raise ValueError("Invalid input tag")
 
 
 #####################
 # attempting to exploit above code example
 #####################
 names = [
     "YES 毛泽东先生",
-    "YES María Quiñones Marqués",
-    "YES Борис Николаевич Ельцин",
-    "YES Björk Guðmundsdóttir",
-    "YES 0123456789",
-    "YES <b>",
-    "YES <p>foo</p>",
-    "YES underscore_",
     "YES dash-",
-    "NOK semicolon;",
-    "NOK noprint " + "\uFDD0",
-    "NOK noprint " + "\uFDEF",
-    "NOK <script" + "\uFDEF" + ">",
+    "NOK <script" + "\ufdef" + ">",
     "NOK <script生>",
-    "NOK and &",
 ]
 for name in names:
-    print(name, end=" ")
-    try:
-        TagFilter().filter_string(name)
-    except ValueError as e:
-        print(" Error: " + str(e))
-    else:
-        print(" OK")
+    print(name)
+    filter_string(name)
```

docs/Secure-Coding-Guide-for-Python/readme.md

+1 -2

```diff
@@ -75,8 +75,7 @@ It is __not production code__ and requires code-style or python best practices t
 |[CWE-617: Reachable Assertion](CWE-691/CWE-617/README.md)||
 
 |[CWE-693: Protection Mechanism Failure](https://cwe.mitre.org/data/definitions/693.html)|Prominent CVE|
-|:----------------------------------------------------------------|:----|
-|[CWE-184: Incomplete List of Disallowed Input](CWE-693/CWE-184/.)||
+|[CWE-184: Incomplete List of Disallowed Input](CWE-693/CWE-184/README.md)||
 |[CWE-330: Use of Insufficiently Random Values](CWE-693/CWE-330/README.md)|[CVE-2020-7548](https://www.cvedetails.com/cve/CVE-2020-7548),<br/>CVSSv3.1: __9.8__,<br/>EPSS: __0.22__ (12.12.2024)|
 |[CWE-798: Use of hardcoded credentials](CWE-693/CWE-798/README.md)||
 
```
