Skip to content

Dbutson/datatypes #173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: master
Choose a base branch
from
Open

Dbutson/datatypes #173

wants to merge 9 commits into from

Conversation

butson
Copy link
Contributor

@butson butson commented Apr 17, 2025

This seems to handle datetime , time, and timestamp better than current implementation. This has a client side calendar to be able to address datetimes across the full range 1/1/1 UTC to 12/31/9999 UTC. Do not have test cases included here.

butson added 9 commits March 14, 2025 16:58
…imestruct which limited the support datetime range. calendar supports all dates
…d be positive. pack seconds microseconds put into a utility function. switched to use tzlocal and ZoneInfo instead of pytz
… time zone is not supported. These changes set timezone on python datetime and allows input to be datetime with timezone not on connection timezone.
@butson butson requested a review from madscientist April 17, 2025 16:23
@@ -43,7 +43,7 @@
url='https://github.com/nuodb/nuodb-python',
license='BSD License',
long_description=open(readme).read(),
install_requires=['pytz>=2015.4', 'ipaddress'],
install_requires=['tzlocal>=3.0', 'ipaddress'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't work; we still have to support Python 2 and tzlocal 3.0 requires Python 3. If we do have to use tzlocal it must be >=2.1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a way to specify dependencies based on Python version: https://packaging.python.org/en/latest/specifications/dependency-specifiers/#environment-markers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madscientist we need to move past 2.7. its been dead for 5 years now. There are issues with pytz and it's discouraged now that datetime has better timezone support. This change relies somewhat heavily on supporting timezone aware datetime. I'll look to see if I can conditionalize everything based upon release. Ross had mentioned on slack that we only support 3.9> but you are saying no. I hope there are plans to move pytest to 3.x instead of being stuck on 2.7 forever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my other comment on this subject.

Comment on lines +194 to +196



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP8 specifies only one blank line between methods in a class

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how can I confirm I meet your/pep standards before checking in. I will create a new branch as a fix to all that is mentioned here. but you only want one commit so how do I check before commit.

Copy link
Contributor

@madscientist madscientist Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What we normally do is edit our code with an editor that has a "python mode" and can be configured to warn (via little red squiggles in the text or whatever) for code that violates PEP8. We are not super-rigorous about it (for example we don't use black or similar) and we don't have any sort of automated check on push.

It's fine to have multiple commits while developing. What I mean is that we want to squash them all before pushing to master (e.g., we don't just merge the branch). Normally as part of preparing to submit a PR, we would rebase all our work to latest master, squash / whatever the commits into a "final form", run the tests, and then create the PR. Fixes after the PR is submitted would get new commits, then the whole thing is squashed before pushing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@madscientist You can make GitHub enforce only merging PRs with rebase. This is what we do in all of the repos for ORCA code. I'm not sure who has admin rights on this repo to enable that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have this set. We allow squash and rebase merging, and disallow merge commits.

Comment on lines +297 to +299



Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One blank line

Comment on lines +138 to +150
for key in params:
if key.lower() == 'timezone':
timezone_name = params[key]
break
if not timezone_name:
# from doc: https://pypi.org/project/tzlocal/
# You can also use tzlocal to get the name of your local
# timezone, but only if your system is configured to make
# that possible. tzlocal looks for the timezone name in
# /etc/timezone, /var/db/zoneinfo, /etc/sysconfig/clock
# and /etc/conf.d/clock. If your /etc/localtime is a
# symlink it can also extract the name from that symlink.
additional_params['timezone'] = tzlocal.get_localzone_name()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably just write this like:

if 'timezone' not in [k.lower() for k in params]:
    params['timezone'] = tzlocal.get_localzone_name()

@@ -0,0 +1,132 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really the case that there's no Python module that provides these facilities already? Why are we rolling our own?

@@ -897,7 +965,28 @@ def getClob(self):

raise DataError('Not a clob')


__shifters = [1,10,100,1000,10000,100000,1000000,10000000,100000000, 1000000000]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use "10 ** scale" etc. to get the right value instead of creating this list and indexing on it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

expected that lookup was faster than ** scale.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be quite surprised if the performance difference can be detected, if it even exists; I prefer simplicity unless we can prove with benchmarking that it matters.

Comment on lines +984 to +986
# def __unpack(self,scale,time):
# ticks = decimal.Decimal(time) / decimal.Decimal(10**scale)
# return (round(int(ticks)), int((ticks % 1) * decimal.Decimal(1000000)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't leave commented-out code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was testing performance between the two implementations with cProfile or line_profile. and left in incase you wanted to compare. can easily take it out.

day = ymd2day(value.year, value.month, value.day)
return day * TICKSDAY

def packtime(seconds: int, microseconds: int) -> (int, int):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we must continue to support Python2 so this syntax is illegal. Please use # type: comments instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we were past supporting python3. The pip package says it only support python3. python2 has been dead forever now.

Copy link
Contributor

@madscientist madscientist Apr 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only support Python3 for customers. However we use the Python driver extensively in all our internal testing and large swathes of this internal testing is all written in Python2. We have very few resources we can expend on rewriting all that (working perfectly well) internal testing infrastructure to Python3. This work is progressing but very very slowly. It's barely possible that sometime next year (2026) we can make a serious push to fully deprecating Python2 support internally. Until then, we have to support both.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general notes:

  • Before pushing we will need to squash this into a single commit, or else if there are certain changes that are "stand-alone" there can be multiple commits but each commit should have a single purpose and change the code from one known-good state to another known-good state (e.g., all tests should pass after each individual commit)
  • We need to ensure that the commit message provides a comprehensive discussion of the reason for the changes: the commit message shouldn't restate the changes in English but it should discuss why each one was necessary. Also the subject of the commit message needs to be informative on its own, as well as <75 chars long. See https://cbea.ms/git-commit/#seven-rules for info one the format we try to adhere to.
  • It seems like you haven't rebased these changes onto the HEAD of the latest master branch: usually before we start a code review we always rebase to the latest code and make sure the regression tests are passing.
  • There is a failure in circleci. Maybe missing something from requirements or setup?
  • We need to have some tests that exercise these new changes
  • We need this module to run in both Python2 still, as well as Python3
  • Need to add type: comments so that mypy will check for type errors
  • We try to have our Python follow the PEP8 coding standards; in particular I noticed:
    -- There should be exactly two blank lines between "top-level" classes and functions
    -- There should be exactly one blank line between methods inside classes, and also nested classes if any
    -- There should be a space after commas in lists
    -- Maybe other things; if you use a Python-enabled editor it should be possible to turn on pyflake or similar which will show PEP8 issues
    -- I didn't try to mark all the places

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best thing to do is introduce a set of tests for different date/time situations, which both fail and succeed with the current implementation, then we will know when we've gotten a proper fix. Maybe we can use the set of date/time unit tests we have for the C/C++ driver (and maybe JDBC? I'm not familiar with those) as a model and re-create them here in Python.

@@ -251,6 +272,31 @@ def set_encryption(self, value):
"""Enable or disable encryption."""
self.__encryption = value

def _set_timezone(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not thrilled about doing these types of SQL queries "behind the user's back" as part of the underlying connection. If we have to, we have to. But I wonder. How do the JDBC and C/C++ drivers handle this situation? Do they also do this type of query?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know. It's possible that we can just use what is passed in. but, I thought I saw a difference between what I passed in and what is used on the connection. I believe when we support timestamp with out time zone protocol. the time zone of the connection will be returned on every execute.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants