Line numbers #647

acoleman2000 · 2023-01-13T16:21:56Z

I made changes to python_codegen.py and python_codegen_support.py, and introduced a test file, test_line_numbers.py, that integrates with the test suite.

I identified several blockers within the current code preventing line numbers from being associated with keys during the saving process.

During the loading process, the CWL is read in and saved as a CommentedMap, which has associated line numbers. However, in the _document_load method in python_codegen_support.py, the CommentedMap was replaced with a plain dictionary:

        doc = {
            k: v
            for k, v in doc.items()
            if k not in ("$namespaces", "$schemas", "$base")
        }

I replaced this code with

        if "$namespaces" in doc:
            doc.pop("$namespaces")
        if "$schemas" in doc:
            doc.pop("$schemas")
        if "$base" in doc:
            doc.pop("$base")

to keep doc in CommentedMap form.
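To illustrate why the comprehension loses line information, here is a minimal sketch. `LineTrackedDict` is a hypothetical stand-in for CommentedMap (a dict subclass that carries an `lc` attribute), not the real ruamel.yaml class, and the helper names are made up for this example. The shallow copy mirrors the later commit "Adding doc = copy.copy(doc) before removing values".

```python
import copy


class LineTrackedDict(dict):
    """Hypothetical stand-in for CommentedMap: a dict that carries line info."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lc = {}  # key -> [line, col], simplified for this sketch


RESERVED = ("$namespaces", "$schemas", "$base")


def strip_with_comprehension(doc):
    # Builds a brand-new plain dict, so the `lc` attribute is gone.
    return {k: v for k, v in doc.items() if k not in RESERVED}


def strip_keeping_type(doc):
    # Copy first so the caller's document is not mutated, then pop.
    doc = copy.copy(doc)
    for key in RESERVED:
        doc.pop(key, None)
    return doc


original = LineTrackedDict({"$base": "file:///tmp/", "class": "Workflow"})
original.lc["class"] = [1, 0]

plain = strip_with_comprehension(original)
kept = strip_keeping_type(original)
```

In this sketch `plain` is an ordinary dict with no `lc` attribute, while `kept` is still a `LineTrackedDict` and the caller's `original` keeps its `$base` entry.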

Additionally, I noticed that in the fromDoc method doc was being set to None or overridden with something else, so I saved the originally passed-in doc as self._doc, following the naming conventions.

I wanted to use the lc info from the original YAML passed in, so I modified the save method for each class to take in line_numbers, a CommentedMap. If line_numbers isn't None, it replaces the self._doc field. This is done to save the original CommentedMap and propagate it downwards.

`python_codegen_support.py`

I added several methods.

I added a method that extracts the maximum line number (plus 1) from a CommentedMap. It follows the child with the highest line number at each level until it reaches the end. The result is used to insert the line/column info for new fields in the returned doc.

I added a method that adds the key/value lc info into the returned doc. This is the real meat of the change. It takes a CommentedMap to insert into, the old CommentedMap, a dictionary of line numbers, a dictionary mapping each line number to the maximum column used on that line, and a max_len variable. First, the method checks whether the key is in the line numbers; if so, it inserts the old lc info directly into the new CommentedMap. If the key isn't in the line numbers, it checks whether the value is, and inserts it using that line number with an adjusted column number (based on the length of the key and the maximum column for that line). It then checks whether the value is in old_doc, and inserts with that lc information. Finally, if neither the key nor the value is in the line numbers, it inserts at max_len and increases max_len by 1. It has appropriate logic for DSL expansion:

        elif isinstance(val, str):  # Logic for DSL expansion with "?"
            if val + "?" in line_numbers:
                line = line_numbers[val + "?"]["line"] + shift
                if line in inserted_line_info:
                    line = max_line
                col = line_numbers[val + "?"]["col"]
                new_doc.lc.add_kv_line_col(key, [line, col, line, col + len(key) + 2])
                inserted_line_info[line] = col + len(key) + 2
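The fallback order described above can be sketched roughly as follows. This is a simplified illustration, not the PR's implementation: `choose_position` is a hypothetical name, plain dicts stand in for the line-number dictionary, and the old_doc lookup, the per-line column tracking, and the shift handling are all left out.

```python
from typing import Any, Dict, List, Tuple

LineInfo = Dict[str, Dict[str, int]]  # e.g. {"inputs": {"line": 2, "col": 0}}


def choose_position(
    key: str, val: Any, line_numbers: LineInfo, max_len: int
) -> Tuple[List[int], int]:
    """Return ([key_line, key_col, val_line, val_col], updated max_len)."""
    if key in line_numbers:
        # The key existed in the original document: reuse its position.
        line = line_numbers[key]["line"]
        col = line_numbers[key]["col"]
    elif isinstance(val, str) and val in line_numbers:
        # Only the value is known: place the key at the value's position.
        line = line_numbers[val]["line"]
        col = line_numbers[val]["col"]
    else:
        # Neither is known: append on a fresh line after everything else.
        line, col = max_len, 0
        max_len += 1
    # The value column follows the key, a colon, and a space.
    return [line, col, line, col + len(key) + 2], max_len


known = {"inputs": {"line": 2, "col": 0}, "string": {"line": 3, "col": 10}}
from_key = choose_position("inputs", {}, known, 8)
from_val = choose_position("type", "string", known, 8)
appended = choose_position("id", "x", known, 8)
```

Only the last call consumes a new line, so only it returns an increased max_len.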

I added a method that pulls out the lc info for all key/value pairs in a CommentedMap. For example, if a CommentedMap was like ordereddict([("key", "value")]) with lc info {"key": [1, 0, 1, 6]}, it would return {"key": {"line": 1, "col": 0}, "value": {"line": 1, "col": 6}}.
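A minimal sketch of that extraction, assuming the lc info has the four-element shape [key line, key col, value line, value col] used in the example. Plain dicts stand in for the CommentedMap and its lc data, `extract_line_info` is a hypothetical name, and recording only string values is an assumption of this sketch.

```python
from typing import Any, Dict, List, Mapping


def extract_line_info(
    doc: Mapping[str, Any], lc_data: Mapping[str, List[int]]
) -> Dict[str, Dict[str, int]]:
    """Flatten per-key positions into {name: {"line": ..., "col": ...}}."""
    result: Dict[str, Dict[str, int]] = {}
    for key, value in doc.items():
        if key not in lc_data:
            continue  # no position recorded for this key
        key_line, key_col, val_line, val_col = lc_data[key]
        result[key] = {"line": key_line, "col": key_col}
        if isinstance(value, str):
            # Scalar string values get their own entry, keyed by the value.
            result[value] = {"line": val_line, "col": val_col}
    return result


info = extract_line_info({"key": "value"}, {"key": [1, 0, 1, 6]})
```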

I also modified the save method. It changes the return type from list/dict to CommentedSeq/CommentedMap, takes in a doc field, and if the k/v pair is in the doc, it adds the lc info to the return type.

I added a method, iterate_through_doc, that is exempt from type checking; it takes a list of keys and walks through the global doc to the appropriate place. Type checking is disabled because the traversal goes from CommentedMap to CommentedSeq before eventually ending up at a CommentedMap (or None).

python_codegen.py

I modified several things in python_codegen.py

First, I modified the fromDoc method to save the original doc on the class as the self._doc attribute.

I modified the save method. I changed the return type r from dict to CommentedMap. I added the code to override the self._doc, calculate max_len, line_numbers, and set an empty dictionary to store col info. I also updated max_len after inserting each class attribute to r by calling add_kv, which also adds the lc value to r.

To prevent issues such as the outputs key appearing before the inputs key and over-expanding, which causes inconsistent line numbers, I iterate through all keys in the line number doc and add their line numbers first, before going through all attributes as normal.

                if isinstance(key, str):
                    if hasattr(self, key):
                        if getattr(self, key) is not None:
                            #add lc info
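The two-pass ordering can be sketched like this. It is only an illustration under stated assumptions: `save_order` is a hypothetical helper, and a SimpleNamespace stands in for a generated class instance.

```python
from types import SimpleNamespace
from typing import Any, Iterable, List


def save_order(
    obj: Any, original_keys: Iterable[Any], field_names: Iterable[str]
) -> List[str]:
    """Return field names in the order they should be written out."""
    order: List[str] = []
    # Pass 1: fields present in the original document, in document order.
    for key in original_keys:
        if isinstance(key, str) and getattr(obj, key, None) is not None:
            order.append(key)
    # Pass 2: remaining populated attributes, in the class's usual order.
    for name in field_names:
        if name not in order and getattr(obj, name, None) is not None:
            order.append(name)
    return order


tool = SimpleNamespace(inputs=[], outputs=[], id="tool", label=None)
order = save_order(tool, ["outputs", "inputs"], ["id", "inputs", "outputs", "label"])
```

Here outputs stays ahead of inputs because that is how the original document listed them, and fields absent from the original document follow afterwards.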

Additionally, due to array expansion and DSL expansion, sometimes there is a shift down. To make sure everything ends up on the same line, I added a shift counter that records how many lines to shift down for a value.

test_line_numbers

I added 3 tests.

One test checks the outputs field appearing before inputs.
One test checks secondary files DSL expansion.
One test checks type DSL expansion.

mr-c · 2023-01-15T14:53:48Z

Thank you @acoleman2000 for this! Can you run make cleanup?

tetron · 2023-01-17T16:20:45Z

To re-create "metaschema.py" do

schema-salad-tool --codegen=python schema_salad/metaschema/metaschema.yml > schema_salad/metaschema.py

tetron · 2023-01-17T16:26:43Z

schema_salad/python_codegen.py

+        r = CommentedMap()
+        if line_info is not None:
+            self._doc = line_info
+        if (type(self._doc) == CommentedMap):


I generally use isinstance. The difference is that a direct comparison on type will not match subclasses, whereas isinstance is true both for the same class and for subclasses.
Practically that may not matter if CommentedMap doesn't have any subclasses.

Fixed in commit 149b1ba.
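A small sketch of the difference. `AnnotatedMap` and `TracedMap` are hypothetical classes used only to show how the two checks treat a subclass.

```python
class AnnotatedMap(dict):
    """Hypothetical stand-in for CommentedMap."""


class TracedMap(AnnotatedMap):
    """Hypothetical subclass of the stand-in."""


doc = TracedMap()

# An exact type comparison rejects the subclass instance.
exact_match = type(doc) == AnnotatedMap

# isinstance accepts the class itself and any subclass.
subclass_match = isinstance(doc, AnnotatedMap)
```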

tetron · 2023-01-17T16:54:41Z

schema_salad/python_codegen_support.py

+        newdict = CommentedMap()
        for key in val:
+            if doc:
+                if isinstance(key, (int, float, bool, str)):


the key shouldn't ever be anything other than a string, so you can either assume it is a string, or keep the type check but throw an error on a non-string subscript.

Removed in commit 0522ce8

tetron · 2023-01-17T17:01:11Z

schema_salad/python_codegen.py

            self.out.write("    pass\n\n\n")
            return

+        field_names.append("_doc")


These are the fields of the model (which are closely related to, but not exactly the same as, the fields of the class being generated). Why are you adding this here?

I wanted the original CommentedMap that had line/column information to be associated with the top-level class (e.g., Workflow) in order to update the attributes that are being saved. I wasn't sure how to save this CommentedMap besides saving it as a class attribute. The save method called by the user goes directly to the save method within the class, so I didn't see a way of passing the CommentedMap to the first call to save. If you have any ideas for doing this another way, I would be happy to rework it!

I actually changed it to have a global doc variable and have the save methods take a list of strings representing what level in the doc that object is. This is useful for the second issue as well, since there are places that error messages need line numbers appended where no version of the doc is present.


tetron · 2023-02-03T18:52:28Z

schema_salad/python_codegen_support.py

+    for key in keys:
+        if isinstance(doc, CommentedMap):
+            doc = doc.get(key)
+        elif isinstance(doc, (CommentedSeq, list)) and isinstance(key, int):
+            if key < len(doc):
+                doc = doc[key]
+            else:
+                doc = None
+        else:
+            doc = None
+            break


Need some discussion about what's going on here. It looks like you're using "keys" to find a path through the original document, to find the leaf node that has the line number info we want.

What happens when you have a field with mapSubject and it's been converted (internally) from a dict to a list? In this case it is intentional for save() to emit the normalized form, which is the list form, but that may or may not correspond to the original document, depending on whether the original document used the list form or the dict form.
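A minimal sketch of the concern, using plain dict and list as stand-ins for CommentedMap and CommentedSeq. `follow_path` is a hypothetical name that mirrors the traversal in the diff above, and the two sample documents are made up for illustration.

```python
from typing import Any, List, Union


def follow_path(doc: Any, keys: List[Union[str, int]]) -> Any:
    """Walk `keys` through nested dicts and lists; None if the path breaks."""
    for key in keys:
        if isinstance(doc, dict):
            doc = doc.get(key)
        elif isinstance(doc, list) and isinstance(key, int):
            doc = doc[key] if key < len(doc) else None
        else:
            return None
    return doc


# The normalized (list) form that save() emits.
list_form = {"inputs": [{"id": "message", "type": "string"}]}
# The same content written with the mapSubject (dict) shorthand.
dict_form = {"inputs": {"message": {"type": "string"}}}

path = ["inputs", 0, "type"]
found_in_list_form = follow_path(list_form, path)
found_in_dict_form = follow_path(dict_form, path)
```

A path built from the normalized form resolves against a list-form original, but the integer index finds nothing when the original used the dict form, so no line information would be located for that field.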

tetron · 2023-02-03T18:54:09Z

schema_salad/python_codegen_support.py

+            if doc:
+                if i in doc:
+                    r.lc.data[i] = doc.lc.data[i]
+                    new_keys.append(i)


append is a destructive modification, so appending to new_keys is also modifying the contents of keys which is probably not what you intended.
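A short sketch of the aliasing problem. The diff only shows the append call, so the assumption here is that new_keys was bound to the same list object as keys; the function names are hypothetical.

```python
from typing import List, Union

Path = List[Union[str, int]]


def extend_aliasing(keys: Path, i: Union[str, int]) -> Path:
    new_keys = keys  # same list object, not a copy
    new_keys.append(i)  # also changes the caller's `keys`
    return new_keys


def extend_copying(keys: Path, i: Union[str, int]) -> Path:
    # Concatenation builds a new list and leaves `keys` untouched.
    return keys + [i]


aliased_parent: Path = ["inputs"]
extend_aliasing(aliased_parent, 0)

copied_parent: Path = ["inputs"]
child = extend_copying(copied_parent, 0)
```

After the aliasing version runs, the parent path has grown as well, so sibling entries would inherit each other's indices; the copying version keeps the parent path stable.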

tetron · 2023-02-03T19:04:51Z

schema_salad/python_codegen_support.py

 IdxType = MutableMapping[str, Tuple[Any, "LoadingOptions"]]


+doc_line_info = CommentedMap()


Instead of doc_line_info being a global variable, how about having save() take LoadingOptions and using original_doc ?

tetron · 2023-02-03T19:07:23Z

schema_salad/python_codegen.py

+            saved_val = saved_val[0]
+    r["{fieldname}"] = saved_val
+
+    max_len = add_kv(old_doc = doc, new_doc = r, line_numbers = line_numbers, key = "{fieldname}", val = r.get("{fieldname}"), max_len = max_len, cols = cols)


Since you have r, key and val together already, I would consider moving the assignment of r["{fieldname}"] = saved_val into add_kv() so the add_kv() method is responsible for setting both the value and metadata of the entry at the same time.
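One way to picture that suggestion, as a sketch only: `LineMap` is a hypothetical stand-in for CommentedMap, and `add_kv_sketch` uses a simplified signature that is not the one in the PR.

```python
from typing import Any, Dict, List


class LineMap(dict):
    """Hypothetical stand-in for CommentedMap: values plus per-key positions."""

    def __init__(self) -> None:
        super().__init__()
        self.positions: Dict[str, List[int]] = {}


def add_kv_sketch(new_doc: LineMap, key: str, val: Any, line: int, col: int) -> int:
    """Set the value and its position together; return the next free line."""
    new_doc[key] = val
    new_doc.positions[key] = [line, col, line, col + len(key) + 2]
    return line + 1


r = LineMap()
next_line = add_kv_sketch(r, "class", "Workflow", line=0, col=0)
```

With the assignment inside the helper, a caller cannot set a value and forget its metadata, or the reverse.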


codecov · 2023-06-08T17:33:44Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (138e249) 83.68% compared to head (be53207) 83.63%.

❗ Current head be53207 differs from pull request most recent head 3afd4b0. Consider uploading reports for the commit 3afd4b0 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #647      +/-   ##
==========================================
- Coverage   83.68%   83.63%   -0.06%     
==========================================
  Files          22       22              
  Lines        4580     4497      -83     
  Branches     1239     1242       +3     
==========================================
- Hits         3833     3761      -72     
+ Misses        483      470      -13     
- Partials      264      266       +2


mr-c

Exciting! I left some comments. Would be nice if this PR doesn't add 100,000+ lines

mr-c · 2023-06-23T04:42:00Z

Makefile

 cleanup: sort_imports format flake8 diff_pydocstyle_report

-## install-dep            : install most of the development dependencies via pip
+## install-dep            : inshttps://github.com/common-workflow-language/cwltool/issues?q=is%3Aissue+is%3Aopen+author%3Atom-tantall most of the development dependencies via pip


mr-c · 2023-06-23T04:42:54Z

schema_salad/python_codegen.py

+        # names = []
+        # for name in field_names:
+        #     names.append("('%s', 0)"%name)
+
+        # self.serializer.write(
+        #    fmt(f"""ordered_attrs = CommentedMap(["{', '.join(names)}])\n""", 4)
+        # )
+


Suggested change

        # names = []
        # for name in field_names:
        #     names.append("('%s', 0)"%name)

        # self.serializer.write(
        #    fmt(f"""ordered_attrs = CommentedMap(["{', '.join(names)}])\n""", 4)
        # )

mr-c · 2023-06-23T04:43:48Z

schema_salad/python_codegen_support.py

+    return max_len + 1, inserted_line_info
+
+
+@no_type_check


mr-c · 2023-06-23T04:45:38Z

schema_salad/tests/cwl_v1_2.py

What's the difference between this and schema_salad/cwl_v1_2.py ?

mr-c · 2023-06-23T04:46:31Z

schema_salad/tests/test_line_numbers.py

How are we keeping schema_salad/tests/cwl_v1_2.py up to date? Maybe it would be better to generate that when the tests are run.

acoleman2000 and others added 11 commits November 21, 2022 10:27

Create test_line_numbers.py

a57ae4a

adding tests

4b23d8d

updating save method within python_codegen_support.py

2e584ac

adding helper methods

b7f22de

adding additional helper method

21659d2

fixing bugs in helper method

f68e74a

updating python codegen_support.py and python_codegen.py

2c26bfa

updating files

e0ee3f4

updating test

8b31a59

updating files and adding test cwl files

6d8bce9

Merge branch 'common-workflow-language:main' into line_numbers

3150b2f

tetron suggested changes Jan 17, 2023

acoleman2000 added 16 commits January 17, 2023 10:11

Fixing bug with updating sub-docs and non-kv values getting added to …

3143595

…returned doc

Merge branch 'line_numbers' of https://github.com/acoleman2000/schema…

9919bc4

…_salad into line_numbers

updating CommentedSeq lc update

3b80801

updating CommentedSeq lc data

8152f0c

Updating metaschema.py

6ec8730

updating type -> asinstance and bug fix in save

149b1ba

Adding doc = copy.copy(doc) before removing values

16996bb

removing typecheck for key in val

0522ce8

running make cleanup

6f62fb8

updating metaschema.py

ba2dd90

Fixing type warning for doc

2dea597

working on test

74b23d0

Fixing issue with type hints and indentation of setting global variable

8575f3f

adding metaschema.py

a087833

fix type error

7bfac7c

fix type error

06fc513

Merge branch 'common-workflow-language:main' into line_numbers

5fd6ca1

tetron suggested changes Feb 3, 2023

acoleman2000 and others added 18 commits March 28, 2023 17:27

updates to codegen

41d406d

Merge branch 'line_numbers' of https://github.com/acoleman2000/schema…

27c7314

…_salad into line_numbers

Updating python_codegen and python_codegen_support for cleaner logic …

f4e098b

…and less conditionals - as well as some bug fixes

updating for consistent line numbers

8263fdb

adding files for line number tests

add86c6

adding cwl python codegen files for tests and having them be ignored …

625a3a5

…wherever metaschema.py is ignored

updating python codegen/codegen_support, metaschema, and tests.

6f544e9

Merge branch 'main' into line_numbers

4178c78

running make clean-up

74e3247

Merge branch 'line_numbers' of https://github.com/acoleman2000/schema…

c9d35a2

…_salad into line_numbers

trying to pass tox tests

752dbab

updating to remove inserted_line_info from global variable

5d198ee

updating cwl codegen filesfor updated codegen

160f559

Updating codegen to support shifting down of text

bdd5c04

Updating metaschema.py and updating to pass lint

cc76eb9

running make cleanup

f85ed3c

updating Makefile to properly exclude cwl files

3d61e55

Trying to pass metaschema up to date test

63da121

acoleman2000 and others added 3 commits June 9, 2023 15:59

trying alternate style of loading test files in

ba8be89

Merge branch 'main' into line_numbers

d84a8bd

Bogus commit to re-run testS

be53207

tetron marked this pull request as ready for review June 22, 2023 21:12

mr-c requested changes Jun 23, 2023

Alex Coleman and others added 4 commits November 9, 2023 12:19

Merge branch 'main' into line_numbers

5b10422

Updating line numbers tests to use generated cwl files.

93406dd

Removing static cwl files.

154af86

Merge branch 'main' into line_numbers

3afd4b0
