Use custom service account for search dataform #3800
Conversation
Force-pushed from ebcf0cb to e237f69
Since this is Terraform, I assume the approach to testing is to get this merged and then test it out on integration before applying the changes in production.
aaronfowles left a comment:
The Terraform plan looks like it failed with what could be a transient error. It might be worth triggering a retry from the UI to see if that resolves it.
Thanks @aaronfowles! I reran the plan and it looks like it's working now.
hannako left a comment:
This makes sense to me based on the Confluence documentation and my reading of the GCP docs. But as discussed with @emmalowe in person, our rollout approach will be to merge, apply the change in integration, and confirm all looks as expected before applying the change in production.
Force-pushed from e9c171e to 06e96b1
GCP is enforcing a new access control model for Dataform called "strict act-as mode" [1]. As part of this, GCP is disabling the ability for Dataform instances to run using the default Dataform service account. All Dataform workflows must switch to a custom service account, which the Dataform Service Agent is given permissions on [2].

Steps covered:
- Change BigQuery internal project permissions from the default service account to the custom service account
- Give the default service account permission to impersonate the custom service account
- Add the custom service account to the repo set-up (to use this as the default account for running workflows)
- Give the custom service account secrets permissions to connect to our Dataform GitHub repo

[1] https://docs.cloud.google.com/dataform/docs/strict-act-as-mode
[2] https://docs.cloud.google.com/dataform/docs/access-control#grant-roles-auto-workflows
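The custom service account and the Dataform Service Agent grant can be sketched in Terraform roughly as follows. This is a hedged sketch: the project IDs, account ID, variable name and resource names are illustrative assumptions, not the values used in this repo.

```hcl
# Custom service account that Dataform workflows will run as
# (project and account_id are illustrative assumptions).
resource "google_service_account" "dataform" {
  project      = "search-integration"
  account_id   = "dataform-execution"
  display_name = "Dataform workflow execution"
}

# Strict act-as mode: the Dataform Service Agent must be able to mint
# tokens for the custom service account in order to run workflows as it.
resource "google_service_account_iam_member" "dataform_agent_token_creator" {
  service_account_id = google_service_account.dataform.name
  role               = "roles/iam.serviceAccountTokenCreator"
  member             = "serviceAccount:service-${var.project_number}@gcp-sa-dataform.iam.gserviceaccount.com"
}
```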
Force-pushed from 06e96b1 to 1756975
The permissions listed are not cross-project, so this comment is misleading. It is a holdover from when the BigQuery permissions were assigned across all search environments at once. See #2109
Force-pushed from 8bc3962 to 0d46886
The Search Team's Dataform pipelines [1] read from specific datasets in the GA4 Analytics project. Here we add those specific permissions, in line with the principle of least privilege.

Because some of the pipelines include a table wildcard [2], we need to add a new custom role that includes the list permission. It seems that previously these permissions were added in the GCP UI via click-ops.

[1] https://github.com/alphagov/search-api-v2-dataform
[2] e.g. https://github.com/alphagov/search-api-v2-dataform/blob/main/definitions/search-intraday.sqlx#L66
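Dataset-level grants like these can be expressed with `google_bigquery_dataset_iam_member`. A minimal sketch, in which the dataset IDs, project ID and service account reference are illustrative assumptions:

```hcl
# Read access on only the specific GA4 datasets the pipelines use,
# rather than the whole project (dataset IDs are assumptions).
resource "google_bigquery_dataset_iam_member" "ga4_read" {
  for_each   = toset(["analytics_123456789", "analytics_flattened"])
  project    = "ga4-analytics"
  dataset_id = each.value
  role       = "roles/bigquery.dataViewer"
  member     = "serviceAccount:${google_service_account.dataform.email}"
}
```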
Force-pushed from 0d46886 to e6323a6
Hello reviewers 👋🏻 Apologies for sitting on this for a while - I realised I had to do some more thinking about cross-project permissions before I could merge this. The short version is, our Dataform pipelines need permission to read from the GA4 Analytics project, so I've added those in a new commit. @AP-Hunt it might be most efficient if you could review this first, please? Most of my questions (which I'll leave inline) are about Terraform best practice, so you might be best placed to answer those. @hannako feel free to have a look too, since we've been chatting about this already. Thanks 🌟
Comment on `locals {`:
Since we only need to read from 2/17 datasets in the GA4 Analytics project, I've added these as dataset permissions instead of project permissions (which would give access to all the datasets). This might be overkill.
I've chosen to use iam_member instead of iam_binding for two reasons:
- I'm not confident how dataset permissions interact with project permissions, and I didn't want to wipe out any permissions for anyone else. From what I've read, dataset bindings and project bindings are additive, so that shouldn't happen, but I'm not sure.
- It seems like some of the GA4 Analytics permissions have been set up via click-ops, and again, I don't want to delete permissions for anyone else. But maybe I need to do more work to check what is there and rectify any issues, because we probably should use binding if we can (see point 1).

If the permission setup looks okay, I'm not clear on how this new code should be organised. I've seen locals.tf files elsewhere. Maybe we should also have a separate file for dataset permissions 🤷🏻‍♀️
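For reference, the difference between the two resource types can be sketched like this (dataset ID and member are illustrative assumptions, and the two resources are shown side by side only for contrast - you wouldn't use both for the same role on the same dataset):

```hcl
# Non-authoritative: adds this single member and leaves everyone else alone.
resource "google_bigquery_dataset_iam_member" "reader" {
  dataset_id = "analytics_123456789"
  role       = "roles/bigquery.dataViewer"
  member     = "serviceAccount:dataform-execution@search-integration.iam.gserviceaccount.com"
}

# Authoritative for roles/bigquery.dataViewer on this dataset: any member
# not in this list (e.g. one added by click-ops) is removed on apply.
resource "google_bigquery_dataset_iam_binding" "readers" {
  dataset_id = "analytics_123456789"
  role       = "roles/bigquery.dataViewer"
  members = [
    "serviceAccount:dataform-execution@search-integration.iam.gserviceaccount.com",
  ]
}
```

That authoritative behaviour of iam_binding is why click-ops grants would be clobbered; iam_member avoids it at the cost of not detecting drift.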
Comment on `resource "google_project_iam_custom_role" "gds_bigquery_read_and_list_access" {`:
I'm adding this custom role because some of our pipelines use a table wildcard and so need the bigquery.tables.list permission, which isn't in gds_bigquery_read_access. This also might be overkill, since I could just add bigquery.tables.list to gds_bigquery_read_access.
I think because I'm only adding a list permission (rather than create or delete), it's not worth making a separate role, but I'm curious to know what other people think.
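A sketch of what such a role might look like; the exact permission list is an assumption based on the discussion above, not the contents of the actual PR:

```hcl
# Read role that also allows listing tables, which wildcard queries need.
resource "google_project_iam_custom_role" "gds_bigquery_read_and_list_access" {
  role_id = "gds_bigquery_read_and_list_access"
  title   = "GDS BQ read and list access"
  permissions = [
    "bigquery.datasets.get",
    "bigquery.tables.get",
    "bigquery.tables.getData",
    "bigquery.tables.list", # required for table-wildcard queries
  ]
}
```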
GCP is enforcing a new access control model for Dataform called "strict act-as mode". As part of this, GCP is disabling the ability for Dataform instances to run using the default Dataform service account, so all Dataform workflows must switch to a custom service account, which the Dataform Service Agent is given permissions on. Configuring Dataform workflows requires the iam.serviceAccounts.actAs permission on the custom service account.
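The remaining repository and secret wiring might look roughly like the following. This is a hedged sketch: the repository name, region, secret references and service account are illustrative assumptions, and the service_account argument on google_dataform_repository assumes a provider version that supports it.

```hcl
# Use the custom service account as the default for workflow invocations.
resource "google_dataform_repository" "search" {
  name            = "search-api-v2-dataform"
  region          = "europe-west2"
  service_account = google_service_account.dataform.email

  git_remote_settings {
    url                                 = "https://github.com/alphagov/search-api-v2-dataform.git"
    default_branch                      = "main"
    authentication_token_secret_version = google_secret_manager_secret_version.github_token.id
  }
}

# Let the custom service account read the GitHub token used for git auth.
resource "google_secret_manager_secret_iam_member" "dataform_secret_access" {
  secret_id = google_secret_manager_secret.github_token.id
  role      = "roles/secretmanager.secretAccessor"
  member    = "serviceAccount:${google_service_account.dataform.email}"
}
```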