Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RSDK-9990- Expose SOCKS proxy as fallback dialer #414

Merged
merged 8 commits into from
Mar 12, 2025

Conversation

cheukt
Copy link
Member

@cheukt cheukt commented Mar 10, 2025

this change will make SOCKS proxy a fallback option (if SocksProxyEnvVar is specified), but will not actively replace a connection if it is already established. If OnlySocksProxyEnvVar is specified, only the SOCKS connection is used.

It adds a small delay to the SOCKs dialer as a way to prioritize a connection made without the SOCKS dialer.

Conceptually, pretty similar to our Dial code and it borrows heavily from the net.DialContext code.

tested a few scenarios:

  • only SOCKS initially, then a network blip and turn on both phone proxy and wifi (wifi connected in the end)
  • only wifi initially, then a network blip and turn on both phone proxy and wifi (wifi connected in the end)
  • wifi and SOCKS initially, then a network blip and only turn on phone proxy (SOCKS connected in the end)
  • wifi and SOCKS initially, then a network blip and only turn on wifi (wifi connected in the end)

cc: @dgottlieb if interested

@cheukt cheukt requested a review from benjirewis March 10, 2025 21:42
@viambot viambot added the safe to test Mark as safe to test label Mar 10, 2025

// OnlySocksProxyEnvVar is the name of an environment variable used if all network
// traffic should be done through SOCKS5.
OnlySocksProxyEnvVar = "ONLY_SOCKS_PROXY"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[q] Will there be an equivalent ONLY_INTERNET? Or, is the idea that having SOCKS_PROXY unset means "internet only"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that should be ok, + internet will almost always be prioritized/chosen even without this knob

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline; this is also helpful for development, as we can force all traffic through the SOCKS proxy even with a WiFi connection (ssh still possible.)

rpc/dialer.go Outdated
return nil, fmt.Errorf("error creating SOCKS proxy dialer to address %q from environment: %w",
proxyAddr, err)
}
conn, err := dialer.(proxy.ContextDialer).DialContext(ctx, network, addr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you have to cast to proxy.ContextDialer here? Is DialContext not available on the SOCKS5 dialer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, by default it only has Dial

@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Mar 11, 2025
Copy link
Member

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for discussion offline 🫡


// OnlySocksProxyEnvVar is the name of an environment variable used if all network
// traffic should be done through SOCKS5.
OnlySocksProxyEnvVar = "ONLY_SOCKS_PROXY"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline; this is also helpful for development, as we can force all traffic through the SOCKS proxy even with a WiFi connection (ssh still possible.)

// the fallback immediately (in 0 nanoseconds).
fallbackTimer.Reset(0)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there not be a

case <-ctx.Done():

here? Will this not spawn infinite SOCKS-proxy-dialing goroutines if neither WiFi not SOCKS proxy are available?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once primary and fallback dial functions are done, we will exit because of L359. If ctx gets cancelled, the two dials will end and also end up at L359

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool; assume you mean L364 in the new line numbers.

fallback = res
}
if primary.done && fallback.done {
return nil, primary.error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[q] Can you explain what's happening in this case? No connection can be returned even though they both completed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if both primary and fallback are done with errors, this means neither connection attempt succeeded. return the error from the primary dial attempt in that case.

Co-authored-by: Benjamin Rewis <[email protected]>
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Mar 11, 2025
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Mar 11, 2025
Copy link
Member Author

@cheukt cheukt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some comments but definitely point out other confusing bits if you find any!

rpc/dialer.go Outdated
return nil, fmt.Errorf("error creating SOCKS proxy dialer to address %q from environment: %w",
proxyAddr, err)
}
conn, err := dialer.(proxy.ContextDialer).DialContext(ctx, network, addr)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, by default it only has Dial

fallback = res
}
if primary.done && fallback.done {
return nil, primary.error
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if both primary and fallback are done with errors, this means neither connection attempt succeeded. return the error from the primary dial attempt in that case.

// the fallback immediately (in 0 nanoseconds).
fallbackTimer.Reset(0)
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once primary and fallback dial functions are done, we will exit because of L359. If ctx gets cancelled, the two dials will end and also end up at L359

@cheukt cheukt requested a review from benjirewis March 11, 2025 21:09
Copy link
Member

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just a question about fallback delay.

// the fallback immediately (in 0 nanoseconds).
fallbackTimer.Reset(0)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool; assume you mean L364 in the new line numbers.

rpc/dialer.go Outdated
go dialer(primaryCtx, primaryDial, true)

// wait a small amount before starting the fallback dial (to prioritize the primary connection method).
fallbackTimer := time.NewTimer(time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time.Second

If I'm reading Golang source code correctly, their fallback delay is customizable and defaults to 300ms. I understand not exposing this as a knob, but any reason to bump to 1s? Did you see SOCKS being "over" prioritized with 300ms?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, let me test and drop to 300ms if no issues

@viambot viambot removed the safe to test Mark as safe to test label Mar 12, 2025
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Mar 12, 2025
@viambot viambot added safe to test Mark as safe to test and removed safe to test Mark as safe to test labels Mar 12, 2025
@cheukt cheukt changed the title RSDK-9991 - Use SOCKS proxy as fallback RSDK-99910- Expose SOCKS proxy as fallback dialer Mar 12, 2025
@cheukt cheukt changed the title RSDK-99910- Expose SOCKS proxy as fallback dialer RSDK-9990- Expose SOCKS proxy as fallback dialer Mar 12, 2025
@cheukt cheukt requested a review from benjirewis March 12, 2025 19:17
@cheukt
Copy link
Member Author

cheukt commented Mar 12, 2025

@benjirewis given the new direction of the ticket around dynamic switching of connection types, going to cut a release here first to fully support agent using the socks proxy. I also unexported one of the functions to make it easier to change the impls. tested manually with the same test cases and also with the agent

@cheukt
Copy link
Member Author

cheukt commented Mar 12, 2025

see viamrobotics/agent#79 for corresponding agent changes

Copy link
Member

@benjirewis benjirewis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still LGTM!

@cheukt cheukt merged commit 447cd0d into viamrobotics:main Mar 12, 2025
6 checks passed
@cheukt cheukt deleted the use-socks-proxy-as-fallback branch March 12, 2025 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
safe to test Mark as safe to test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants