Added parsing for saltstack interpreted files #86
Changes from 5 commits
4504fdf
7617fa2
e74a495
74b5638
d5d039f
a3b69db
9e7bb37
96a56b9
11d2ce5
```diff
@@ -52,9 +52,10 @@
     'gif': {'binary', 'image', 'gif'},
     'go': {'text', 'go'},
     'gotmpl': {'text', 'gotmpl'},
+    'gpg': {'text', 'gnupg'},
     'gpx': {'text', 'gpx', 'xml'},
-    'graphql': {'text', 'graphql'},
     'gradle': {'text', 'groovy'},
+    'graphql': {'text', 'graphql'},
     'groovy': {'text', 'groovy'},
     'gyb': {'text', 'gyb'},
     'gyp': {'text', 'gyp', 'python'},
```
```diff
@@ -100,6 +101,7 @@
     'lr': {'text', 'lektor'},
     'lua': {'text', 'lua'},
     'm': {'text', 'c', 'objective-c'},
+    'mako': {'text', 'mako'},
```

Review comment: I have no authority whatsoever on the matter, but for reference, this change was previously rejected in PR #137 because Mako refuses to specify a standard file extension. However, over 170 000 files on GitHub have the …

Reply: Right on. I only threw a couple of these in because I didn't see them in the list already and thought, why not. If there are issues, I'll just pull `mako`, no problem there.

```diff
     'manifest': {'text', 'manifest'},
     'map': {'text', 'map'},
     'markdown': {'text', 'markdown'},
```
```diff
@@ -179,6 +181,7 @@
     'tgz': {'binary', 'gzip'},
     'thrift': {'text', 'thrift'},
     'tiff': {'binary', 'image', 'tiff'},
+    'tmpl': {'text', 'cheetah'},
```

Review comment: This association seems rather questionable. While it is documented that … Much worse, of the 2 300 000 GitHub file results, many, perhaps most, don't seem to use the Cheetah …

Reply: Ah, I did not find those in a quick search; good call again. I'll remove this one.

```diff
     'toml': {'text', 'toml'},
     'ts': {'text', 'ts'},
     'tsx': {'text', 'tsx'},
```
```diff
@@ -223,6 +226,15 @@
 EXTENSIONS_NEED_BINARY_CHECK = {
     'plist': {'plist'},
 }
+# This should contain a map of file extensions to a map of interpreter names to
+# their own file extensions
+EXTENSIONS_NEED_SHEBANG_CHECK = {
+    'sls': {
+        'pydsl': 'py',
+        'pyobjects': 'py',
+        'cheetah': 'tmpl',
+    },
+}
 
 NAMES = {
     '.babelrc': EXTENSIONS['json'] | {'babelrc'},
```
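A minimal standalone sketch of how this map is meant to be consulted: the first renderer of a Salt shebang pipeline (for example `jinja|yaml`) is looked up in the map, and a renderer without an entry falls back to its own name as an `EXTENSIONS` key. The helper names and the trimmed-down tables below are hypothetical illustrations, not identify's API.

```python
# Hypothetical, trimmed-down stand-ins for the two tables in the diff above.
EXTENSIONS = {
    'jinja': {'text', 'jinja'},
    'py': {'text', 'python'},
    'yaml': {'text', 'yaml'},
}
SLS_RENDERER_TO_EXTENSION = {  # same entries as EXTENSIONS_NEED_SHEBANG_CHECK['sls']
    'pydsl': 'py',
    'pyobjects': 'py',
    'cheetah': 'tmpl',
}


def extension_key_for_renderer(shebang_token: str) -> str:
    """Resolve the first renderer of a pipeline like 'jinja|yaml' to an extension key."""
    first_renderer = shebang_token.split('|')[0]
    # Renderers with no entry in the map are looked up under their own name.
    return SLS_RENDERER_TO_EXTENSION.get(first_renderer, first_renderer)


def tags_for_renderer(shebang_token: str) -> set:
    """Return the tags for the resolved key, or no tags if the key is unknown."""
    return set(EXTENSIONS.get(extension_key_for_renderer(shebang_token), set()))
```

With these stand-ins, `extension_key_for_renderer('pydsl')` returns `'py'` and `extension_key_for_renderer('jinja|yaml')` returns `'jinja'`. Because of the `.get(first, first)` fallback, renderers whose names already match an extension key (`jinja`, `yaml`, `py`) need no entry in the map. `tags_for_renderer('cheetah')` returns an empty set here only because the trimmed table has no `'tmpl'` key.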
```diff
@@ -60,6 +60,8 @@ def tags_from_path(path):
             if len(shebang) > 0:
                 tags.update(tags_from_interpreter(shebang[0]))
 
+    tags.update(tags_from_extension_specific_shebang(path))
+
     # some extensions can be both binary and text
     # see EXTENSIONS_NEED_BINARY_CHECK
     if not {TEXT, BINARY} & tags:
```
```diff
@@ -73,6 +75,42 @@ def tags_from_path(path):
     return tags
 
 
+def tags_from_extension_specific_shebang(path):
+    """Match tags from an extension that we need to read the shebang from."""
+    _, filename = os.path.split(path)
+    _, ext = os.path.splitext(filename)
+    ret = set()
+    if ext.lstrip('.') not in extensions.EXTENSIONS_NEED_SHEBANG_CHECK:
+        return ret
+
+    interpreter_to_extension_map = extensions.EXTENSIONS_NEED_SHEBANG_CHECK[
+        ext.lstrip('.')
+    ]
+
+    with open(path, 'rb') as f:
+        shebang = parse_shebang(f)
+
+    if ext == '.sls':
+        if shebang:
+            # try to match tags for the file extension of the first interpreter
+            try:
+                first_interpreter = shebang[0].split('|')[0]
+                ret.update(
+                    extensions.EXTENSIONS[
+                        interpreter_to_extension_map.get(
+                            first_interpreter, first_interpreter,
+                        )
+                    ],
+                )
+            except (IndexError, KeyError):
+                pass
+        else:
+            # the default interpreter is jinja
+            ret.update(extensions.EXTENSIONS['jinja'])
+
+    return ret
```
Review comment: These are the lines that coverage reports as missing tests. I'm stumped.

Reply: Have you run the tests locally to verify that this block is actually executing?

Reply: Yep, I verified that the 4 locations get hit.
```diff
+
+
 def tags_from_filename(filename):
     _, filename = os.path.split(filename)
     _, ext = os.path.splitext(filename)
```
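Since the coverage thread turns on whether each branch of `tags_from_extension_specific_shebang` is exercised, here is a self-contained sketch of a table-driven check with one temporary file per outcome: an extension not in the map, an `.sls` file without a shebang, an `.sls` file with a mapped renderer, and an `.sls` file whose renderer has no known extension. To stay runnable on its own, it re-implements the function's logic against hypothetical stand-ins (`read_shebang`, `sls_tags`, trimmed tables) instead of importing identify, so it shows the case layout rather than identify's actual tests.

```python
import os
import tempfile

# Hypothetical, trimmed-down stand-ins for identify's tables.
EXTENSIONS = {
    'jinja': {'text', 'jinja'},
    'py': {'text', 'python'},
}
EXTENSIONS_NEED_SHEBANG_CHECK = {
    'sls': {'pydsl': 'py', 'pyobjects': 'py', 'cheetah': 'tmpl'},
}


def read_shebang(f) -> tuple:
    """Simplified shebang parser (hypothetical): b'#!a b' -> ('a', 'b'), else ()."""
    if f.read(2) != b'#!':
        return ()
    return tuple(f.readline().decode('utf-8', 'replace').split())


def sls_tags(path: str) -> set:
    """Mirror the four outcomes of tags_from_extension_specific_shebang."""
    ext = os.path.splitext(os.path.basename(path))[1].lstrip('.')
    if ext not in EXTENSIONS_NEED_SHEBANG_CHECK:
        return set()  # outcome 1: extension needs no shebang check
    renderer_map = EXTENSIONS_NEED_SHEBANG_CHECK[ext]
    with open(path, 'rb') as f:
        shebang = read_shebang(f)
    if not shebang:
        return set(EXTENSIONS['jinja'])  # outcome 2: default renderer (jinja)
    first = shebang[0].split('|')[0]
    try:
        return set(EXTENSIONS[renderer_map.get(first, first)])  # outcome 3
    except KeyError:
        return set()  # outcome 4: renderer with no known extension


def tags_for(filename: str, content: bytes) -> set:
    """Write `content` to a temporary file called `filename` and tag it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, filename)
        with open(path, 'wb') as f:
            f.write(content)
        return sls_tags(path)


# One case per outcome: the shape a real test would need to hit every branch.
CASES = [
    ('top.txt', b'#!pydsl\n', set()),
    ('top.sls', b'base:\n  "*": []\n', {'text', 'jinja'}),
    ('top.sls', b'#!pydsl\n', {'text', 'python'}),
    ('top.sls', b'#!genshi\n', set()),
]

for name, content, expected in CASES:
    assert tags_for(name, content) == expected, (name, content)
```

The sketch catches only `KeyError`: once the shebang is non-empty, `shebang[0].split('|')` always yields at least one element, so indexing `[0]` cannot fail.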
Review comment (on the `'gpg'` entry): There are 101 000 results on GitHub, and spot-checking them, as well as Linguist, MIME-DB and FileInfo, indicates no conflicts. However, this GnuPG mailing list response, this SO answer and the FileInfo listing, as well as a sample of `.gpg` files on GitHub, indicate that this extension is designated for binary-format GnuPG data, with `asc` being the ASCII-armored text equivalent for both PGP and GnuPG (635 000 GitHub hits, though only 470 000 of them are keys; the rest are identified as AGS scripts and AsciiDoc text).

Reply: Great catch, I'll update when I get back.
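The reviewer's distinction is visible in the bytes themselves: ASCII-armored OpenPGP data begins with a `-----BEGIN PGP ...-----` header line, whereas binary OpenPGP data begins with a packet tag octet whose most significant bit is always set (both per RFC 4880). A rough content check built only on those two rules might look like the sketch below; `classify_openpgp` is a hypothetical helper, not part of identify, and the heuristic will misfire on unrelated binary files that happen to start with a high-bit byte.

```python
def classify_openpgp(data: bytes) -> str:
    """Rough guess at the OpenPGP encoding: 'armored', 'binary', or 'unknown'."""
    if data.lstrip().startswith(b'-----BEGIN PGP '):
        # ASCII armor starts with a header line such as
        # "-----BEGIN PGP PUBLIC KEY BLOCK-----" (RFC 4880, section 6.2).
        return 'armored'
    if data and data[0] & 0x80:
        # Binary data starts with a packet tag octet whose bit 7 is always
        # one (RFC 4880, section 4.2).
        return 'binary'
    return 'unknown'
```

For example, `classify_openpgp(bytes([0x99, 0x01, 0x0d]))` returns `'binary'`: `0x99` is an old-format packet tag for a public-key packet with a two-octet length.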