Line numbers #647

Open
wants to merge 53 commits into
base: main
Changes from 11 commits
Commits
a57ae4a
Create test_line_numbers.py
acoleman2000 Nov 21, 2022
4b23d8d
adding tests
acoleman2000 Jan 9, 2023
2e584ac
updating save method within python_codegen_support.py
acoleman2000 Jan 9, 2023
b7f22de
adding helper methods
acoleman2000 Jan 9, 2023
21659d2
adding additional helper method
acoleman2000 Jan 9, 2023
f68e74a
fixing bugs in helper method
acoleman2000 Jan 9, 2023
2c26bfa
updating python codegen_support.py and python_codegen.py
acoleman2000 Jan 10, 2023
e0ee3f4
updating files
acoleman2000 Jan 11, 2023
8b31a59
updating test
acoleman2000 Jan 12, 2023
6d8bce9
updating files and adding test cwl files
acoleman2000 Jan 13, 2023
3150b2f
Merge branch 'common-workflow-language:main' into line_numbers
acoleman2000 Jan 13, 2023
3143595
Fixing bug with updating sub-docs and non-kv values getting added to …
acoleman2000 Jan 17, 2023
9919bc4
Merge branch 'line_numbers' of https://github.com/acoleman2000/schema…
acoleman2000 Jan 17, 2023
3b80801
updating CommentedSeq lc update
acoleman2000 Jan 17, 2023
8152f0c
updating CommentedSeq lc data
acoleman2000 Jan 17, 2023
6ec8730
Updating metaschema.py
acoleman2000 Jan 17, 2023
149b1ba
updating type -> asinstance and bug fix in save
acoleman2000 Jan 17, 2023
16996bb
Adding doc = copy.copy(doc) before removing values
acoleman2000 Jan 17, 2023
0522ce8
removing typecheck for key in val
acoleman2000 Jan 17, 2023
6f62fb8
running make cleanup
acoleman2000 Jan 18, 2023
ba2dd90
updating metaschema.py
acoleman2000 Jan 18, 2023
2dea597
Fixing type warning for doc
acoleman2000 Jan 23, 2023
74b23d0
working on test
acoleman2000 Jan 23, 2023
8575f3f
Fixing issue with type hints and indentation of setting global variable
acoleman2000 Jan 23, 2023
a087833
adding metaschema.py
acoleman2000 Jan 23, 2023
7bfac7c
fix type error
acoleman2000 Jan 23, 2023
06fc513
fix type error
acoleman2000 Jan 23, 2023
5fd6ca1
Merge branch 'common-workflow-language:main' into line_numbers
acoleman2000 Jan 25, 2023
41d406d
updates to codegen
acoleman2000 Mar 28, 2023
27c7314
Merge branch 'line_numbers' of https://github.com/acoleman2000/schema…
acoleman2000 May 2, 2023
f4e098b
Updating python_codegen and python_codegen_support for cleaner logic …
acoleman2000 May 5, 2023
8263fdb
updating for consistent line numbers
acoleman2000 May 8, 2023
add86c6
adding files for line number tests
acoleman2000 May 11, 2023
625a3a5
adding cwl python codegen files for tests and having them be ignored …
acoleman2000 May 11, 2023
6f544e9
updating python codegen/codegen_support, metaschema, and tests.
acoleman2000 May 11, 2023
4178c78
Merge branch 'main' into line_numbers
acoleman2000 May 11, 2023
74e3247
running make clean-up
acoleman2000 May 11, 2023
c9d35a2
Merge branch 'line_numbers' of https://github.com/acoleman2000/schema…
acoleman2000 May 11, 2023
752dbab
trying to pass tox tests
acoleman2000 May 15, 2023
5d198ee
updating to remove inserted_line_info from global variable
acoleman2000 May 15, 2023
160f559
updating cwl codegen filesfor updated codegen
acoleman2000 May 15, 2023
bdd5c04
Updating codegen to support shifting down of text
acoleman2000 Jun 5, 2023
cc76eb9
Updating metaschema.py and updating to pass lint
acoleman2000 Jun 5, 2023
f85ed3c
running make cleanup
acoleman2000 Jun 5, 2023
3d61e55
updating Makefile to properly exclude cwl files
acoleman2000 Jun 8, 2023
63da121
Trying to pass metaschema up to date test
acoleman2000 Jun 8, 2023
ba8be89
trying alternate style of loading test files in
acoleman2000 Jun 9, 2023
d84a8bd
Merge branch 'main' into line_numbers
acoleman2000 Jun 14, 2023
be53207
Bogus commit to re-run testS
acoleman2000 Jun 15, 2023
5b10422
Merge branch 'main' into line_numbers
Nov 9, 2023
93406dd
Updating line numbers tests to use generated cwl files.
Nov 14, 2023
154af86
Removing static cwl files.
Nov 14, 2023
3afd4b0
Merge branch 'main' into line_numbers
acoleman2000 Nov 14, 2023
32 changes: 26 additions & 6 deletions schema_salad/python_codegen.py
@@ -170,6 +170,8 @@ def begin_class(
self.out.write(" pass\n\n\n")
return

field_names.append("_doc")
Member

These are the fields of the model, which are closely related to, but not exactly the same as, the fields of the class being generated. Why are you adding this here?

Contributor Author
acoleman2000 Jan 18, 2023

I wanted the original CommentedMap, which carries the line and column information, to be associated with the top-level class (e.g., Workflow) so that it can be used to update the attributes being saved. I wasn't sure how to keep this CommentedMap around other than storing it as a class attribute. The save method called by the user goes directly to the save method within the class, so I didn't see a way of passing the CommentedMap to the first call to save. If you have any ideas for doing this another way, I would be happy to rework it!

Contributor Author

I actually changed it to use a global doc variable and have the save methods take a list of strings describing where in the doc that object sits. This is useful for the second issue as well, since there are places where error messages need line numbers appended but no version of the doc is available.
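
For readers following the thread, the idea of locating an object inside one shared document by a list of keys might look roughly like the sketch below. Everything in it is hypothetical and not taken from the PR: `resolve_path` and the sample document are made up for illustration, plain dicts and lists stand in for ruamel.yaml's `CommentedMap` and `CommentedSeq`, and integer steps are allowed for sequences even though the comment mentions only strings.

```python
from typing import Any, List, Union

# A path step is a mapping key (str) or a sequence index (int).
PathStep = Union[str, int]


def resolve_path(doc: Any, path: List[PathStep]) -> Any:
    """Walk a nested document by following each step of the path in order."""
    cur = doc
    for step in path:
        # A str step indexes a mapping; an int step indexes a sequence.
        cur = cur[step]
    return cur


# Hypothetical document standing in for a loaded workflow.
document = {
    "class": "Workflow",
    "steps": [{"id": "step1", "run": "wc3-tool_v1_0.cwl"}],
}

print(resolve_path(document, ["steps", 0, "run"]))
```

With a real `CommentedMap`, the node returned by such a walk would be where line and column data could be read, which is what makes a path useful in places that have no direct reference to the document.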


required_field_names = [f for f in field_names if f not in optional_fields]
optional_field_names = [f for f in field_names if f in optional_fields]

@@ -274,10 +276,16 @@ def fromDoc(
self.serializer.write(
"""
def save(
-    self, top: bool = False, base_url: str = "", relative_uris: bool = True
-) -> Dict[str, Any]:
-    r: Dict[str, Any] = {}
-
+    self, top: bool = False, base_url: str = "", relative_uris: bool = True, line_info: Optional[CommentedMap] = None
+) -> CommentedMap:
+    r = CommentedMap()
+    if line_info is not None:
+        self._doc = line_info
+    if (type(self._doc) == CommentedMap):
Member

I generally use isinstance. The difference is that a direct comparison on type will not match subclasses, whereas isinstance is true both for the same class and for subclasses.
Practically that may not matter if CommentedMap doesn't have any subclasses.

Contributor Author

Fixed in commit 149b1ba.
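
The difference the reviewer describes can be seen in a small, self-contained comparison. The classes here are hypothetical stand-ins, not ruamel.yaml types: `Base` plays the role of a mapping type such as `CommentedMap`, and `Derived` is an imagined subclass.

```python
class Base(dict):
    """Stand-in for a mapping type such as CommentedMap."""


class Derived(Base):
    """Hypothetical subclass used only for this comparison."""


value = Derived()

# An exact type comparison does not match subclasses.
exact_match = type(value) == Base

# isinstance matches the class itself and any subclass.
instance_match = isinstance(value, Base)

print(exact_match, instance_match)
```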

r._yaml_set_line_col(self._doc.lc.line,self._doc.lc.col)
line_numbers = get_line_numbers(self._doc)
max_len = get_max_line_num(self._doc)
cols: Dict[int, int] = {}
if relative_uris:
for ef in self.extension_fields:
r[prefix_url(ef, self.loadingOptions.vocab)] = self.extension_fields[ef]
@@ -301,6 +309,7 @@ def save(
self.serializer.write(
"""
r["class"] = "{class_}"
max_len = add_kv(old_doc=self._doc, new_doc=r, line_numbers=line_numbers, key="class", val=r.get("class"), max_len=max_len, cols=cols)
""".format(
class_=classname
)
@@ -394,6 +403,7 @@ def type_loader(
sub_names: List[str] = list(
dict.fromkeys([self.type_loader(i).name for i in type_declaration])
)

return self.declare_type(
TypeDef(
"union_of_{}".format("_or_".join(sub_names)),
@@ -565,12 +575,15 @@ def declare_field(
if self.{safename} is not None:
u = save_relative_uri(self.{safename}, {baseurl}, {scoped_id}, {ref_scope}, relative_uris)
r["{fieldname}"] = u
max_len = add_kv(old_doc = self._doc, new_doc = r, line_numbers = line_numbers, key = "{key_1}", val = r.get("{key_2}"), max_len = max_len, cols = cols)
""".format(
safename=self.safe_name(name),
fieldname=shortname(name).strip(),
baseurl=baseurl,
scoped_id=fieldtype.scoped_id,
ref_scope=fieldtype.ref_scope,
key_1=self.safe_name(name),
key_2=self.safe_name(name),
),
8,
)
@@ -580,9 +593,16 @@ def declare_field(
fmt(
"""
if self.{safename} is not None:
-    r["{fieldname}"] = save(
-        self.{safename}, top=False, base_url={baseurl}, relative_uris=relative_uris
+    saved_val = save(
+        self.{safename}, top=False, base_url={baseurl}, relative_uris=relative_uris, doc=self._doc.get("{fieldname}")
     )
+
+    if type(saved_val) == list:
+        if len(saved_val) == 1: # If the returned value is a list of size 1, just save the value in the list
+            saved_val = saved_val[0]
+    r["{fieldname}"] = saved_val
+
+    max_len = add_kv(old_doc = self._doc, new_doc = r, line_numbers = line_numbers, key = "{fieldname}", val = r.get("{fieldname}"), max_len = max_len, cols = cols)
""".format(
safename=self.safe_name(name),
fieldname=shortname(name),
120 changes: 105 additions & 15 deletions schema_salad/python_codegen_support.py
@@ -27,7 +27,7 @@

from rdflib import Graph
from rdflib.plugins.parsers.notation3 import BadSyntax
-from ruamel.yaml.comments import CommentedMap
+from ruamel.yaml.comments import CommentedMap, CommentedSeq

from schema_salad.exceptions import SchemaSaladException, ValidationException
from schema_salad.fetcher import DefaultFetcher, Fetcher, MemoryCachingFetcher
@@ -189,8 +189,8 @@ def fromDoc(

@abstractmethod
def save(
-    self, top: bool = False, base_url: str = "", relative_uris: bool = True
-) -> Dict[str, Any]:
+    self, top: bool = False, base_url: str = "", relative_uris: bool = True, line_info: Optional[CommentedMap] = None
+) -> CommentedMap:
"""Convert this object to a JSON/YAML friendly dictionary."""


@@ -219,27 +219,110 @@ def load_field(val, fieldtype, baseuri, loadingOptions):
Union[MutableMapping[str, Any], MutableSequence[Any], int, float, bool, str]
]

def add_kv(old_doc: CommentedMap, new_doc: CommentedMap, line_numbers: dict[Any,dict[str,int]], key: str, val: Any, max_len: int, cols: dict[int,int])->int:
"""Add key value pair into Commented Map.

Function to add key value pair into new CommentedMap given old CommentedMap, line_numbers for each key/val pair in the old CommentedMap,
key/val pair to insert, max_line of the old CommentedMap, and max col value taken for each line.
"""
if key in line_numbers: # If the key to insert is in the original CommentedMap
new_doc.lc.add_kv_line_col(key, old_doc.lc.data[key])
elif isinstance(val, (int, float, bool, str)): # If the value is hashable
if val in line_numbers: # If the value is in the original CommentedMap
line = line_numbers[val]["line"]
if line in cols:
col = max(line_numbers[val]["col"], cols[line])
else:
col = line_numbers[val]["col"]
new_doc.lc.add_kv_line_col(key, [line, col, line, col + len(key) + 2])
cols[line] = col + len("id") + 2
else: # If neither the key or value is in the original CommentedMap (or value is not hashable)
new_doc.lc.add_kv_line_col(key, [max_len, 0, max_len, len(key) + 2])
max_len += 1
return max_len
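
The two simplest branches of `add_kv` (reuse the recorded position when the key already existed, otherwise append at the next free line) can be sketched without ruamel.yaml. This is an illustration under assumptions, not the PR's implementation: `place_key` is a hypothetical name, plain dicts stand in for `lc.data`, each position is assumed to be a four-element list `[key_line, key_col, value_line, value_col]`, and the scalar-value branch is left out.

```python
from typing import Dict, List

# Assumed layout: [key_line, key_col, value_line, value_col]
Position = List[int]


def place_key(
    old_positions: Dict[str, Position],
    new_positions: Dict[str, Position],
    key: str,
    next_free_line: int,
) -> int:
    """Record a position for key and return the updated next free line."""
    if key in old_positions:
        # The key existed in the source document: reuse its recorded position.
        new_positions[key] = list(old_positions[key])
        return next_free_line
    # The key is new: place it at column 0 of the next free line.
    new_positions[key] = [next_free_line, 0, next_free_line, len(key) + 2]
    return next_free_line + 1


# Hypothetical positions for a two-line source document.
old = {"class": [1, 0, 1, 7], "cwlVersion": [2, 0, 2, 12]}
new: Dict[str, Position] = {}
next_line = 3
for name in ["class", "id", "cwlVersion", "label"]:
    next_line = place_key(old, new, name, next_line)

print(new, next_line)
```

Keys that were present keep their original coordinates, while `id` and `label` are stacked below the original content, one per line.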


def get_line_numbers(doc: CommentedMap) -> dict[Any, dict[str, int]]:
"""Get line numbers for kv pairs in CommentedMap.

For each key/value pair in a CommentedMap, save the line/col info into a dictionary,
only save value info if value is hashable.
"""
line_numbers: Dict[Any, dict[str,int]] = {}
if type(doc) == dict:
return {}
for key, value in doc.lc.data.items():
line_numbers[key] = {}

line_numbers[key]["line"] = doc.lc.data[key][0]
line_numbers[key]["col"] = doc.lc.data[key][1]
if isinstance(value, (int, float, bool, str)):
line_numbers[value] = {}
line_numbers[value]["line"] = doc.lc.data[key][2]
line_numbers[value]["col"] = doc.lc.data[key][3]
return line_numbers
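
As a rough illustration of the lookup table this docstring describes, the sketch below indexes both keys and scalar values by position. It rests on assumptions that should be checked against the real code: `index_positions` is a hypothetical name, plain dicts stand in for the `CommentedMap` and its `lc.data`, and each position is assumed to be `[key_line, key_col, value_line, value_col]`. It reads the values from the mapping itself, which is a guess at the intent rather than a copy of the diff.

```python
from typing import Any, Dict, List


def index_positions(
    values: Dict[str, Any], positions: Dict[str, List[int]]
) -> Dict[Any, Dict[str, int]]:
    """Map each key, and each scalar value, to its line and column."""
    table: Dict[Any, Dict[str, int]] = {}
    for key, (key_line, key_col, val_line, val_col) in positions.items():
        table[key] = {"line": key_line, "col": key_col}
        value = values.get(key)
        # Only scalar values are indexed; mappings and lists are skipped.
        if isinstance(value, (int, float, bool, str)):
            table[value] = {"line": val_line, "col": val_col}
    return table


# Hypothetical document fragment and its positions.
doc_values = {"class": "Workflow", "inputs": {}}
doc_positions = {"class": [1, 0, 1, 7], "inputs": [8, 0, 9, 2]}

table = index_positions(doc_values, doc_positions)
print(table)
```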


def get_max_line_num(doc: CommentedMap) -> int:
"""Get the max line number for a CommentedMap.

Iterate through the the key with the highest line number until you reach a non-CommentedMap value or empty CommentedMap.
"""
max_line = 0
max_key = ""
cur = doc
while type(cur) == CommentedMap and len(cur) > 0:
for key in cur.lc.data.keys():
if cur.lc.data[key][2] >= max_line:
max_line = cur.lc.data[key][2]
max_key = key
cur = cur[max_key]
return max_line + 1
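
The walk described in this docstring (follow the key with the highest line number until a non-mapping value is reached) can be tried out on a small stand-in structure. `PositionedMap` and `next_free_line` are hypothetical names introduced only for this sketch. The positions are assumed to be `[key_line, key_col, value_line, value_col]`, and the extra guard for a nested map with no later key is an addition that is not in the diff.

```python
from typing import Any, Dict, List


class PositionedMap(dict):
    """Hypothetical dict that also stores a position list per key."""

    def __init__(self, values: Dict[str, Any], positions: Dict[str, List[int]]):
        super().__init__(values)
        self.positions = positions


def next_free_line(doc: Any) -> int:
    """Return one more than the highest value line found along the walk."""
    max_line = 0
    cur = doc
    while isinstance(cur, PositionedMap) and len(cur) > 0:
        max_key = None
        for key, pos in cur.positions.items():
            if pos[2] >= max_line:
                max_line = pos[2]
                max_key = key
        if max_key is None:
            # No key at this level lies further down; stop descending.
            break
        cur = cur[max_key]
    return max_line + 1


inner = PositionedMap(
    {"type": "int", "outputSource": "step1/output"},
    {"type": [14, 4, 14, 10], "outputSource": [15, 4, 15, 18]},
)
outer = PositionedMap(
    {"class": "Workflow", "outputs": inner},
    {"class": [1, 0, 1, 7], "outputs": [12, 0, 13, 2]},
)

print(next_free_line(outer))
```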


def save(
val: Any,
top: bool = True,
base_url: str = "",
relative_uris: bool = True,
doc: Optional[CommentedMap] = None,
) -> save_type:
"""Save a val of any type.

Recursively calls save method from class if val is of type Saveable. Otherwise, saves val to CommentedMap or CommentedSeq
"""
 if isinstance(val, Saveable):
-    return val.save(top=top, base_url=base_url, relative_uris=relative_uris)
+    return val.save(top=top, base_url=base_url, relative_uris=relative_uris, line_info=doc)
 if isinstance(val, MutableSequence):
-    return [
-        save(v, top=False, base_url=base_url, relative_uris=relative_uris)
-        for v in val
-    ]
+    r = CommentedSeq()
+    for v in val:
+        if doc:
+            if isinstance(v,(int, float, bool, str)):
+                if v in doc:
+                    r.lc.data.add_kv_line_col(v, doc.lc.data[v])
+        r.append(save(v, top=False, base_url=base_url, relative_uris=relative_uris, doc=doc))
+    return r
+    # return [
+    #     save(v, top=False, base_url=base_url, relative_uris=relative_uris)
+    #     for v in val
+    # ]
 if isinstance(val, MutableMapping):
-    newdict = {}
+    newdict = CommentedMap()
     for key in val:
+        if doc:
+            if isinstance(key, (int, float, bool, str)):
Member

The key shouldn't ever be anything other than a string, so you can either assume it is a string, or keep the type check but throw an error on a non-string subscript.

Contributor Author

Removed in commit 0522ce8
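
The second option the reviewer mentions, keeping the type check but raising on a non-string key, could look something like this. It is a minimal sketch: `require_str_keys` is a hypothetical helper, and choosing `TypeError` is an assumption rather than anything the PR specifies.

```python
from typing import Any, Mapping


def require_str_keys(mapping: Mapping[Any, Any]) -> None:
    """Raise TypeError if any key of the mapping is not a string."""
    for key in mapping:
        if not isinstance(key, str):
            raise TypeError(
                f"Expected a string key, got {type(key).__name__}: {key!r}"
            )


# Passes silently: every key is a string.
require_str_keys({"class": "Workflow"})

# Fails loudly instead of silently skipping the key.
try:
    require_str_keys({1: "unexpected"})
except TypeError as err:
    print(err)
```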

+                if key in doc:
+                    newdict.lc.add_kv_line_col(key, doc.lc.data[key])
         newdict[key] = save(
-            val[key], top=False, base_url=base_url, relative_uris=relative_uris
+            val[key], top=False, base_url=base_url, relative_uris=relative_uris, doc=doc
         )
     return newdict
+    # newdict = {}
+    # for key in val:
+    #     newdict[key] = save(
+    #         val[key], top=False, base_url=base_url, relative_uris=relative_uris
+    #     )
+    # return newdict
tetron marked this conversation as resolved.
if val is None or isinstance(val, (int, float, bool, str)):
return val
raise Exception("Not Saveable: %s" % type(val))
@@ -710,11 +793,18 @@ def _document_load(
addl_metadata=addl_metadata,
)

-doc = {
-    k: v
-    for k, v in doc.items()
-    if k not in ("$namespaces", "$schemas", "$base")
-}
+# doc = {
+#     k: v
+#     for k, v in doc.items()
+#     if k not in ("$namespaces", "$schemas", "$base")
+# }
tetron marked this conversation as resolved.

+if "$namespaces" in doc:
+    doc.pop("$namespaces")
+if "$schemas" in doc:
+    doc.pop("$schemas")
+if "$base" in doc:
+    doc.pop("$base")
tetron marked this conversation as resolved.

if "$graph" in doc:
loadingOptions.idx[baseuri] = (
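
One likely reason for replacing the dict comprehension with in-place removal is that a comprehension always builds a plain `dict`, so anything attached to the original mapping object is lost. The sketch below shows that effect with a stand-in class. `AnnotatedMap` and `strip_reserved` are hypothetical. The shallow copy before popping follows the commit titled "Adding doc = copy.copy(doc) before removing values". How ruamel.yaml's `CommentedMap` behaves under `copy.copy` is not shown here and should be checked separately.

```python
import copy
from typing import Any


class AnnotatedMap(dict):
    """Hypothetical stand-in for a mapping that carries extra metadata."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        super().__init__(*args, **kwargs)
        self.note = "line info"


RESERVED = ("$namespaces", "$schemas", "$base")


def strip_reserved(doc: AnnotatedMap) -> AnnotatedMap:
    """Return a shallow copy of doc without the reserved keys."""
    doc = copy.copy(doc)  # leave the caller's mapping untouched
    for key in RESERVED:
        doc.pop(key, None)  # ignore keys that are absent
    return doc


original = AnnotatedMap({"$base": "http://example.com/", "class": "Workflow"})

# A comprehension yields a plain dict, so the metadata attribute is gone.
filtered = {k: v for k, v in original.items() if k not in RESERVED}

# Copy-then-pop keeps the subclass and its attributes.
stripped = strip_reserved(original)

print(type(filtered).__name__, type(stripped).__name__, stripped.note)
```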
26 changes: 26 additions & 0 deletions schema_salad/tests/count-lines6-wf_v1_0.cwl
@@ -0,0 +1,26 @@
#!/usr/bin/env cwl-runner
class: Workflow
cwlVersion: v1.0

requirements:
- class: ScatterFeatureRequirement
- class: MultipleInputFeatureRequirement

inputs:
file1: File[]
file2: File[]

outputs:
count_output:
type: int
outputSource: step1/output

steps:
step1:
run: wc3-tool_v1_0.cwl
scatter: file1
in:
file1:
source: [file1, file2]
linkMerge: merge_nested
out: [output]
26 changes: 26 additions & 0 deletions schema_salad/tests/count-lines6-wf_v1_1.cwl
@@ -0,0 +1,26 @@
#!/usr/bin/env cwl-runner
class: Workflow
cwlVersion: v1.1

requirements:
- class: ScatterFeatureRequirement
- class: MultipleInputFeatureRequirement

inputs:
file1: File[]
file2: File[]

outputs:
count_output:
type: int
outputSource: step1/output

steps:
step1:
run: wc3-tool_v1_1.cwl
scatter: file1
in:
file1:
source: [file1, file2]
linkMerge: merge_nested
out: [output]
27 changes: 27 additions & 0 deletions schema_salad/tests/count-lines6-wf_v1_2.cwl
@@ -0,0 +1,27 @@

#!/usr/bin/env cwl-runner
class: Workflow
cwlVersion: v1.2

requirements:
- class: ScatterFeatureRequirement
- class: MultipleInputFeatureRequirement

inputs:
file1: File[]
file2: File[]

outputs:
count_output:
type: int
outputSource: step1/output

steps:
step1:
run: wc3-tool_v1_2.cwl
scatter: file1
in:
file1:
source: [file1, file2]
linkMerge: merge_nested
out: [output]
64 changes: 64 additions & 0 deletions schema_salad/tests/test_line_numbers.py
@@ -0,0 +1,64 @@
#from parser import load_document_by_uri, save
from pathlib import Path
from schema_salad.utils import yaml_no_ts
from ruamel.yaml.comments import CommentedMap, CommentedSeq
from typing import Any, Dict, List, Optional, cast
from schema_salad import codegen
from schema_salad.avro.schema import Names
from schema_salad.schema import load_schema


def compare_line_numbers(original_doc:CommentedMap, codegen_doc:CommentedMap)->None:
assert type(original_doc) == CommentedMap
assert type(codegen_doc) == CommentedMap

assert original_doc.lc.line == codegen_doc.lc.line
assert original_doc.lc.col == codegen_doc.lc.col

for key, lc_info in original_doc.lc.data.items():
assert key in codegen_doc.lc.data
assert lc_info==codegen_doc.lc.data[key]

max_line = get_max_line_number(original_doc)

for key, lc_info in codegen_doc.lc.data.items():
if key in original_doc:
continue
assert lc_info == [max_line, 0, max_line, len(key) + 2]
max_line += 1

def get_max_line_number(original_doc:CommentedMap)->int:
max_key = ""
max_line = 0
temp_doc = original_doc
while (type(temp_doc) == CommentedMap) and len(temp_doc) > 0:
for key, lc_info in temp_doc.lc.data.items():
if lc_info[0] >= max_line:
max_line = lc_info[0]
max_key = key
temp_doc = temp_doc[max_key]
return max_line + 1

def python_codegen(
file_uri: str,
target: Path,
parser_info: Optional[str] = None,
package: Optional[str] = None,
) -> None:
document_loader, avsc_names, schema_metadata, metaschema_loader = load_schema(
file_uri
)
assert isinstance(avsc_names, Names)
schema_raw_doc = metaschema_loader.fetch(file_uri)
schema_doc, schema_metadata = metaschema_loader.resolve_all(
schema_raw_doc, file_uri
)
codegen.codegen(
"python",
cast(List[Dict[str, Any]], schema_doc),
schema_metadata,
document_loader,
target=str(target),
parser_info=parser_info,
package=package
)