Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Promote from BLAIS5-2965 to main #48

Merged
merged 4 commits into from
Dec 22, 2021
Merged

Promote from BLAIS5-2965 to main #48

merged 4 commits into from
Dec 22, 2021

Conversation

social-surveys-blaise-concourse
Copy link
Collaborator

@social-surveys-blaise-concourse social-surveys-blaise-concourse commented Dec 17, 2021

This PR makes fairly sweeping changes to how sessions are managed for the cawi-portal.

What was the issue?

We accidentally introduced a bug when we added automatic token refreshing.

What actually happens when a user clicks save and sign out:

  • An API call is made to blaise to do the session clearing at a case level. When this happens our proxy detects it and also refreshes the users authentication session (the session JWT)
  • We logout the users. This clears the users authentication session.

That all sounds awesome! But then bad things sometimes happen. Basically, we attempt to refresh and clear the session at almost the exact same time.

After some monitoring we worked out this is often within the same millisecond. This gives us a race condition. Sometimes the session gets cleared and then refreshed. Instead of the desired order refreshed and then cleared. Doh!

There is no easy fix for this, as the events are fired by separate calls to the webserver and the timings vary depending on performance. We looked at adding some kind of sleep, but that was janky as well as being unreliable.

What have we actually done about it?

Lets start with the simpler bits

  • We now only refresh the authentication session when a call is made to the blaise API endpoints (we used to refresh on any action, this included loading of static assets etc, which was just overkill!)
  • We have stopped checking for session expiry when a user clicks save and sign out. There is an edge case where a user would click save and sign out where they had either already signed out, or had their session cleared super duper quickly (yay race conditions again). This would take them to the session timeout screen instead of the sign out page, not the end of the world, but it was annoying!

But what were the important bits?

We now use multiple sessions and multiple stores:

  • CSRF tokens use a cookie based session store with 30 days of session expiry configured. This means that a user can stay on our login page for a looooong time before they get an annoying message when they try to login. Keeping this in cookies also helps us keep our new redis backed sessions nice and clean!
  • User session and a new "validation" session are stored in a redis based session store with a 1 day expiry configured. While our JWTs expire in as little as 15 minutes (or whatever the questionnaire configures) we cannot configure the store per questionnaire. This is a compromise as security would never allow a user to be logged in for over a day, it means we can keep our redis store from growing potentially very large (cant costing lots of money!) while maintaining high compatibility with questionnaire settings. There are no security downsides to this as the JWT expiry is still the primary validation rule.

So what is this new "validation session".

Basically, when a user logs in we set a user session as "valid" in the validation session. When a user logs out, we delete this session. Simple!

Why does this help?

The refresh token function never writes to the validation session. So once we logout even if the users session gets refreshed straight afterwards the session is still marked as invalid.

To make this work we also had to update our AuthenticatedWithUac middleware. This now makes sure you have a valid session before attempting to decrypt and check the JWT.

Supporting PR

As we need a new redis database we also need to merge the corresponding PR in blaise-terraform that sets up our new DB.

https://github.com/ONSdigital/blaise-terraform/pull/490

Forked repos

Currently, this code depends on some forked repos to help us manage the multiple session stores.

https://github.com/srbry/gin-csrf

The original repo we were using is: https://github.com/utrack/gin-csrf

Basically, its just a bit rubbish, but it was also hard coded to only support a webserver using a single session.

Its under an MIT license so we can move it in-house (out of my name etc) if we have any requirement. But I am also happy to support it personally.... so its up to you.

https://github.com/srbry/sessions

The original repo is: https://github.com/gin-contrib/sessions

While this repo already had support for multiple sessions, it only supported them using a single store. Meaning we had to be all cookies, or all redis. Which we could have made work, but wasn't ideal.

We have an open PR for this work: gin-contrib/sessions#144
Also a tracking issue: gin-contrib/sessions#124

Once this gets merged we should repoint our dependency at the main repo instead of the fork.

@codecov
Copy link

codecov bot commented Dec 17, 2021

Codecov Report

Merging #48 (1259506) into main (a981e21) will decrease coverage by 1.04%.
The diff coverage is 9.52%.

❗ Current head 1259506 differs from pull request most recent head 30857d4. Consider uploading reports for the commit 30857d4 to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##             main      #48      +/-   ##
==========================================
- Coverage   59.00%   57.96%   -1.05%     
==========================================
  Files          13       13              
  Lines         461      471      +10     
==========================================
+ Hits          272      273       +1     
- Misses        162      171       +9     
  Partials       27       27              
Impacted Files Coverage Δ
webserver/webserver.go 0.00% <0.00%> (ø)
authenticate/authenticate.go 63.30% <11.11%> (-4.03%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a981e21...30857d4. Read the comment docs.

@lgtm-com
Copy link

lgtm-com bot commented Dec 17, 2021

This pull request introduces 1 alert when merging 36c52f3 into a981e21 - view on LGTM.com

new alerts:

  • 1 for Log entries created from user input

This is to prevent race conditions with token refreshing
@lgtm-com
Copy link

lgtm-com bot commented Dec 20, 2021

This pull request introduces 1 alert when merging c04855e into a981e21 - view on LGTM.com

new alerts:

  • 1 for Log entries created from user input

@lgtm-com
Copy link

lgtm-com bot commented Dec 20, 2021

This pull request introduces 1 alert when merging 1259506 into a981e21 - view on LGTM.com

new alerts:

  • 1 for Log entries created from user input

- Only refresh session on API calls to blaise, not static asset loading
- Don't check session expiry if the button being clicked is save and
  sign out
- Use redis backed session management for user session
- Use cookie backed session for CSRF
- 30 day expiry for CSRF
- 1 day expiry for all redis backed session data
- Add seperate session validation session to ensure refreshed tokens are
  not valid after logout - fixes race condition where the JWT is both
  cleared and written and the same time (within the same ms)
@lgtm-com
Copy link

lgtm-com bot commented Dec 21, 2021

This pull request introduces 2 alerts and fixes 1 when merging 30857d4 into a981e21 - view on LGTM.com

new alerts:

  • 2 for Log entries created from user input

fixed alerts:

  • 1 for Log entries created from user input

@srbry srbry marked this pull request as ready for review December 21, 2021 17:04
Copy link
Contributor

@JAWilliamsONS JAWilliamsONS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visual inspection of the code looks good 👍
Deployed to my sandbox (jw02) fine
Tested - sign in and click 'save and sign out' in CAWI (DST2111Z) multiple times
Cloned code to local machine - unit tests pass locally

@JAWilliamsONS JAWilliamsONS merged commit 9927662 into main Dec 22, 2021
@Riceyo Riceyo deleted the BLAIS5-2965 branch January 26, 2022 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants