
Scanning into a recursive struct #57

Open
n-e opened this issue Jun 24, 2021 · 10 comments
Labels
help wanted Extra attention is needed

Comments


n-e commented Jun 24, 2021

Hi,

I'd like to scan into a recursive struct:

type Node struct {
	ID     string
	Parent *Node
}
var nodes []*Node
err := sqlscan.Select(ctx, conns.PgClient, &nodes, `select id, null as parent from nodes where parent_id is null`)

This causes an infinite loop, as scany introspects the struct recursively.

Any tips for achieving that? I might be missing something obvious since I'm new to Go and scany. Ideally I'd prefer not to touch the Node struct definition, since in my real-life code it is auto-generated. I'd also like to avoid using db:"-" on Parent, since I might want to get the immediate parent with a join.

Thanks!

georgysavva (Owner) commented

Thank you for opening this issue and discovering such an edge case!
Scany indeed iterates over all fields recursively in the getColumnToFieldIndexMap() function and gets into an infinite loop with a struct like that.

To solve this, we could introduce a threshold, e.g. 2 or 3, for how many levels of recursion into the same type are allowed. During the breadth-first traversal we would count how many ancestors have the same type as the current element and make sure that count doesn't exceed the threshold; otherwise we skip that element and don't traverse it further.
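
A minimal, self-contained sketch of that counting idea (the names here are illustrative, not scany's actual internals, and it uses a depth-first walk rather than breadth-first, but the same-type threshold works the same way):

package main

import (
	"fmt"
	"reflect"
)

// maxSameTypeDepth is the illustrative threshold: how many times the same
// struct type may appear on the current traversal path before we stop
// descending into it.
const maxSameTypeDepth = 2

// collectColumns walks a struct type and records the "column" names it would
// map. seen counts how many times each struct type already appears on the
// current path, which is what breaks the cycle for recursive types like Node.
func collectColumns(t reflect.Type, prefix string, seen map[reflect.Type]int, out *[]string) {
	if t.Kind() == reflect.Ptr {
		t = t.Elem()
	}
	if t.Kind() != reflect.Struct {
		*out = append(*out, prefix)
		return
	}
	if seen[t] >= maxSameTypeDepth {
		return // threshold reached for this type: skip it and don't propagate further
	}
	seen[t]++
	defer func() { seen[t]-- }()

	for i := 0; i < t.NumField(); i++ {
		name := t.Field(i).Name
		if prefix != "" {
			name = prefix + "." + name
		}
		collectColumns(t.Field(i).Type, name, seen, out)
	}
}

type Node struct {
	ID     string
	Parent *Node
}

func main() {
	var cols []string
	collectColumns(reflect.TypeOf(Node{}), "", map[reflect.Type]int{}, &cols)
	fmt.Println(cols) // [ID Parent.ID] -- terminates instead of looping forever
}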

Let me know if that solution solves your problem or feel free to propose how you see it.

georgysavva added the help wanted label on Jul 1, 2021

n-e commented Jul 5, 2021

Hi,

Thanks for your answer. Indeed that would solve my problem!


CyganFx commented Dec 29, 2022

Hi, is there any development on this issue? I ran into the same problem today.

georgysavva (Owner) commented

Hi @CyganFx, there was an attempt to add this feature to scany: #60, but the PR became inactive. I am not currently working on this feature. If you are interested in helping, I would gladly accept a PR from you.


anton7r commented Feb 27, 2023

getColumnToFieldIndexMap() could alternatively return map[reflect.Type]map[string]int, and the map[string]int for the concrete type could then be built in the structScan() method while iterating over the column db tags.

That approach would be fairly complicated compared to limiting the recursion, but it would be fully cacheable and safe to use from multiple goroutines with sync.Map.
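
For illustration, a hedged sketch of what a sync.Map cache keyed by reflect.Type could look like (cachedColumnMap and columnToFieldIndexMap are hypothetical stand-ins, not scany's real functions):

package main

import (
	"fmt"
	"reflect"
	"sync"
)

// columnMaps caches one column-to-field map per struct type. sync.Map suits
// this read-mostly pattern: each type is computed once and only read after.
var columnMaps sync.Map // reflect.Type -> map[string]int

// columnToFieldIndexMap stands in for the real reflection work; here it
// simply records exported field names and their indexes.
func columnToFieldIndexMap(t reflect.Type) map[string]int {
	m := make(map[string]int, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		m[t.Field(i).Name] = i
	}
	return m
}

// cachedColumnMap returns the cached map for t, computing it on first use.
func cachedColumnMap(t reflect.Type) map[string]int {
	if v, ok := columnMaps.Load(t); ok {
		return v.(map[string]int)
	}
	m := columnToFieldIndexMap(t)
	actual, _ := columnMaps.LoadOrStore(t, m)
	return actual.(map[string]int)
}

func main() {
	type User struct{ ID, Name string }
	fmt.Println(cachedColumnMap(reflect.TypeOf(User{}))) // map[ID:0 Name:1]
}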


sylr commented Mar 2, 2023

Ran into this issue as well.

georgysavva (Owner) commented

Thanks for the feedback, guys. Would any of you like to tackle this?


anton7r commented Mar 7, 2023

I could try to tackle this issue in the near future. sqlx/reflectx/reflect.go could be used as a point of reference.


anton7r commented Mar 8, 2023

Should we implement caching of getColumnToFieldIndexMap() at the same time?

sync.Map would work out of the box fairly well, but it doesn't scale as well as some third-party synchronized maps. puzpuzpuz/xsync.MapOf would scale better since it splits the map into buckets and locks each bucket individually instead of locking the entire map. However, third-party map implementations can't directly hash reflect.Type, and achieving native performance with them requires a few hacks with unsafe.Pointer, as shown in this blog post: https://www.dolthub.com/blog/2022-12-19-maphash/

georgysavva (Owner) commented

@anton7r thank you for deciding to tackle this issue. I appreciate that.

Yes, you are also welcome to add caching for the reflection map. Here is an issue requesting exactly that: #25. Just do it as a separate PR, because the feature requested in this issue and map caching aren't related.

For caching, I would go with the default sync.Map implementation or just a native map protected by a sync.RWMutex{}. A single global lock with no further optimization is okay because the write lock is only taken when we write to the cache map, which happens only when a new type definition is used with scany for the first time. And since the number of types in an application is finite, I expect most of them to get cached shortly after the application starts serving incoming requests. After that, all operations on the cache map are read-only, so sync.RWMutex{} doesn't prevent them from running in parallel. Correct me if I am wrong here.
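
A sketch of the map-plus-sync.RWMutex variant described above, mirroring the earlier example (the helper names are again hypothetical, not scany's actual code):

package main

import (
	"fmt"
	"reflect"
	"sync"
)

// A plain map guarded by a sync.RWMutex: the write lock is taken only the
// first time a new struct type is seen; afterwards every lookup is a
// read-lock, so concurrent scans don't block each other.
var (
	cacheMu sync.RWMutex
	cache   = map[reflect.Type]map[string]int{}
)

// columnToFieldIndexMap is a placeholder for the real reflection logic;
// it only lists exported field names here.
func columnToFieldIndexMap(t reflect.Type) map[string]int {
	m := make(map[string]int, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		m[t.Field(i).Name] = i
	}
	return m
}

func cachedColumnMap(t reflect.Type) map[string]int {
	cacheMu.RLock()
	m, ok := cache[t]
	cacheMu.RUnlock()
	if ok {
		return m
	}

	cacheMu.Lock()
	defer cacheMu.Unlock()
	if m, ok := cache[t]; ok { // another goroutine may have filled it in between the locks
		return m
	}
	m = columnToFieldIndexMap(t)
	cache[t] = m
	return m
}

func main() {
	type User struct{ ID, Name string }
	fmt.Println(cachedColumnMap(reflect.TypeOf(User{}))) // map[ID:0 Name:1]
}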
