Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autoconfigure network without DHCP #64

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

slp
Copy link
Collaborator

@slp slp commented Sep 26, 2024

Passt includes a DHCP server to allow guests to easily configure the network without being aware passt is on the other side. But we are aware of that, so we can take advantage of it.

Instead of using DHCP, read the configuration output from passt, marshall it into some environment variables, pass it to the guest, and have krun-guest read it and apply it using rtnetlink.

By doing that, we reduce the startup time in half, from...

$ time krun /bin/false
(...)
real 0m4,301s

... to ...

$ time krun /bin/false
(...)
real 0m1,966s

crates/krun/Cargo.toml Outdated Show resolved Hide resolved
@@ -15,7 +15,8 @@ use krun::utils::env::find_in_path;
use log::debug;
use rustix::process::{getrlimit, setrlimit, Resource};

fn main() -> Result<()> {
#[tokio::main]
Copy link
Collaborator

@teohhanhui teohhanhui Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we go multi-threaded, some of our safety assumptions would no longer hold, e.g.

https://github.com/slp/krun/blob/44417f6e802a934f9ca164ace647f99da009546d/crates/krun/src/guest/user.rs#L21-L23

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using tokio is a requirement of the rtnetlink crate. I think we can avoid the safety issues with something like this:

    let rt = tokio::runtime::Runtime::new().unwrap();
    rt.block_on(async {
        if let Err(err) = configure_network().await {
            eprintln!("Failed to configure network, continuing without it: {err}");
        }
    });

Copy link
Collaborator

@teohhanhui teohhanhui Sep 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use tokio's current-thread scheduler?

Apparently it could be selected like this:

#[tokio::main(flavor = "current_thread")]

But it looks like we will still need to make sure nothing uses spawn_blocking: https://users.rust-lang.org/t/why-does-tokios-current-thread-flavor-not-be-single-threaded/85129

How? Probably by using Handle::block_on? But hmm... There's not really any point in choosing the current-thread scheduler then? lol

Oh, apparently this can't work:

When this is used on a current_thread runtime, only the Runtime::block_on method can drive the IO and timer drivers, but the Handle::block_on method cannot drive them. This means that, when using this method on a current_thread runtime, anything that relies on IO or timers will not work unless there is another thread currently calling Runtime::block_on on the same runtime.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we aren't planning on using tokio anywhere else, I think I feel more confident isolating it's use to configure_network.

@slp slp force-pushed the avoid-dhcp branch 2 times, most recently from e79250b to 2cf1578 Compare September 27, 2024 15:44
@@ -42,7 +42,12 @@ fn main() -> Result<()> {

setup_fex()?;

configure_network()?;
let rt = tokio::runtime::Runtime::new().unwrap();
rt.block_on(async {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the docs correctly, this will NOT have the desired effect of ensuring the safety of the env::set_var calls.

This runs the given future on the current thread, blocking until it is complete, and yielding its resolved result. Any tasks or timers which the future spawns internally will be executed on the runtime.

When the multi thread scheduler is used this will allow futures to run within the io driver and timer context of the overall runtime.

Any spawned tasks will continue running after block_on returns.

https://docs.rs/tokio/latest/tokio/runtime/struct.Runtime.html#method.block_on

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right. Another approach could be, since I don't think any of the other tasks krun-guest puts in motion depends on the network, simply move the network configuration to krun-server which already uses tokio and doesn't use env::set_var.

}
handle.route().add().v4().gateway(router).execute().await?;
fs::write("/etc/resolv.conf", format!("nameserver {}", router))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are assuming that the default gateway and the dns server are the same. While true on most residential networks, it is not a correct assumption in general, and even there, some of us very intentionally override the ISP dns server to something else.

Copy link
Collaborator

@teohhanhui teohhanhui Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is inside the VM, so passt is doing its thing. In our current usage (dhclient), /etc/resolv.conf inside the VM would contain a nameserver entry pointing to the host's default gateway...

https://man.archlinux.org/man/passt.1.en#Handling_of_traffic_with_local_destination_and_source_addresses

So as far as I can tell, the code here is correct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/etc/resolv.conf inside the VM would contain a nameserver entry pointing to the host's default gateway.

That is the problem that i am pointing out, in general nameserver != default gateway.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More correct explanation here: #17 (comment)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WhatAmISupposedToPutHere Uhh, but I think I get what you mean now... The code here is only correct in the case of passt doing such DNS forwarding... If the resolver on the host is not a loopback address, this code is wrong.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I've updated the PR to actually pick up all the DNS configuration from passt, which was the right thing to do.

Passt includes a DHCP server to allow guests to easily configure the
network without being aware passt is on the other side. But we _are_
aware of that, so we can take advantage of it.

Instead of using DHCP, read the configuration output from passt,
marshall it into some environment variables, pass it to the guest, and
have krun-guest read it and apply it using rtnetlink.

By doing that, we reduce the startup time in half, from...

$ time krun /bin/false
(...)
real    0m4,301s

... to ...

$ time krun /bin/false
(...)
real    0m1,966s

In addition of reducing the boot time, this potentially will prevent
some of the dhcp issues we've seen in the past.

Signed-off-by: Sergio Lopez <[email protected]>
@sbrivio-rh
Copy link

sbrivio-rh commented Oct 22, 2024

Even though we're unlikely to break the output scraping you're adding here, I wonder if we could avoid that by carrying a minimal script for the ISC's dhclient. I guess dhcpcd won't cause any significant delay as it doesn't do all the checks that the default dhclient script on Fedora does.

In passt's test suite, we need to bring up guests fast, and at the same time check that DHCP works, so we use this script in the guest.

By the way, I wonder if you need to make this fast for IPv6 as well. There, you could disable neighbour solicitations in the guest, configure the address via DHCPv6 (or via NDP and SLAAC, bringing up the interface) with the nodad attribute, and then re-enable neighbour solicitations (full details at the bottom of this email).

Side note: I don't know if it works for you, but passt doesn't actually care about what address and route you're using. If the address is not the same you have on the host, it will just NAT things. You could happily use link-local addresses (even for IPv4, say, 169.254.1.1). But sure, you'll need to convey DNS information somehow.

Anyway, regardless of all this, the current patch looks fine to me.

@sbrivio-rh
Copy link

But sure, you'll need to convey DNS information somehow.

Actually, even for that, you could hardcode things in a way similar to what Podman does with pasta(1): a single link-local address in /etc/resolv.conf in the container (here, guest) and tell passt to map that (--dns-forward) to whatever resolver you have on the host. On the other hand:

You could happily use link-local addresses (even for IPv4, say, 169.254.1.1)

this way, you would lose network transparency (making the guest look like the host in terms of addresses and routes), which might be important for some applications. I wanted to try out the tricks I suggested above to make DHCP fast, but I'm currently hitting:

   Compiling devices v0.1.0 (/home/sbrivio/libkrun/src/devices)
error: couldn't read `src/devices/src/virtio/fs/linux/../../../../../../init/init`: No such file or directory (os error 2)
  --> src/devices/src/virtio/fs/linux/passthrough.rs:33:29
   |
33 | static INIT_BINARY: &[u8] = include_bytes!("../../../../../../init/init");
   |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: this error originates in the macro `include_bytes` (in Nightly builds, run with -Z macro-backtrace for more info)

error: could not compile `devices` (lib) due to 1 previous error

while trying to build libkrun. I might get back to it in a few days unless there's an obvious solution.

@sbrivio-rh
Copy link

From r/Showerthoughts: netlink over vsock

@slp
Copy link
Collaborator Author

slp commented Oct 24, 2024

this way, you would lose network transparency (making the guest look like the host in terms of addresses and routes), which might be important for some applications. I wanted to try out the tricks I suggested above to make DHCP fast, but I'm currently hitting:

   Compiling devices v0.1.0 (/home/sbrivio/libkrun/src/devices)
error: couldn't read `src/devices/src/virtio/fs/linux/../../../../../../init/init`: No such file or directory (os error 2)
  --> src/devices/src/virtio/fs/linux/passthrough.rs:33:29
   |
33 | static INIT_BINARY: &[u8] = include_bytes!("../../../../../../init/init");
   |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: this error originates in the macro `include_bytes` (in Nightly builds, run with -Z macro-backtrace for more info)

error: could not compile `devices` (lib) due to 1 previous error

while trying to build libkrun. I might get back to it in a few days unless there's an obvious solution.

You're probably running cargo build ... instead of make. Run make init/init once, and then you can rely solely con cargo.

@sbrivio-rh
Copy link

You're probably running cargo build ... instead of make.

Oops, of course.

Run make init/init once, and then you can rely solely con cargo.

Right, it works, thanks! I'll try things out in a bit.

@sbrivio-rh
Copy link

Right, it works, thanks! I'll try things out in a bit.

...currently driving me mad (yes, I have libkrun and libkrunfw installed):

  = note: /usr/bin/ld: /home/sbrivio/muvm/target/debug/deps/muvm-dfc2b43586ae1702.73tss62c465d1mti0x3tsc1b1.rcgu.o: in function `muvm::add_ro_disk':
          /home/sbrivio/muvm/crates/muvm/src/bin/muvm.rs:51:(.text._ZN4muvm11add_ro_disk17hc95bc1821f59bb8eE+0x362): undefined reference to `krun_add_disk'
          collect2: error: ld returned 1 exit status
          
  = note: some `extern` functions couldn't be found; some native libraries may need to be installed or have their path specified
  = note: use the `-l` flag to specify native libraries to link
  = note: use the `cargo:rustc-link-lib` directive to specify the native libraries to link with Cargo (see https://doc.rust-lang.org/cargo/reference/build-scripts.html#rustc-link-lib)

error: could not compile `muvm` (bin "muvm") due to 1 previous error

@teohhanhui
Copy link
Collaborator

teohhanhui commented Oct 25, 2024

@sbrivio-rh I think it's a cache problem. Try doing cargo clean and try again?

EDIT: Ohhhhh... #84 (comment)

@sbrivio-rh
Copy link

EDIT: Ohhhhh... #84 (comment)

Thanks, I would have never found that!

@sbrivio-rh
Copy link

sbrivio-rh commented Oct 25, 2024

Almost there...

NET=1, of course...

``` $ strace -f ./muvm true

[...]

[pid 4013940] execve("/usr/local/bin/passt", ["passt", "-q", "-f", "-t", "3334:3334", "--fd", "6"], 0x7ffd61f5fc80 /* 23 vars /) = -1 ENOENT (No such file or directory)
[pid 4013940] execve("/usr/bin/passt", ["passt", "-q", "-f", "-t", "3334:3334", "--fd", "6"], 0x7ffd61f5fc80 /
23 vars */ <unfinished ...>
[pid 4013939] <... clone3 resumed>) = 4013940
[pid 4013939] munmap(0x7f1bfb842000, 36864) = 0
[pid 4013939] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 4013939] fcntl(3, F_GETFD <unfinished ...>
[pid 4013940] <... execve resumed>) = 0
[pid 4013939] <... fcntl resumed>) = 0x1 (flags FD_CLOEXEC)
[pid 4013939] close(3) = 0
[pid 4013939] write(2, "Error: ", 7 <unfinished ...>
Error: [pid 4013940] brk(NULL <unfinished ...>
[pid 4013939] <... write resumed>) = 7
[pid 4013940] <... brk resumed>) = 0x563b6c492000
[pid 4013939] write(2, "Failed to configure net mode", 28Failed to configure net mode) = 28
[pid 4013939] write(2, "\n\nCaused by:", 12 <unfinished ...>

Caused by:[pid 4013940] mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
[pid 4013939] <... write resumed>) = 12
[pid 4013940] <... mmap resumed>) = 0x7f1605c01000
[pid 4013939] write(2, "\n", 1
) = 1
[pid 4013940] access("/etc/ld.so.preload", R_OK <unfinished ...>
[pid 4013939] write(2, " ", 4 <unfinished ...>
[pid 4013940] <... access resumed>) = -1 ENOENT (No such file or directory)
[pid 4013939] <... write resumed>) = 4
[pid 4013939] write(2, "Operation not supported", 23 <unfinished ...>
Operation not supported[pid 4013940] openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 4013939] <... write resumed>) = 23
[pid 4013939] write(2, " (os error ", 11 <unfinished ...>
(os error [pid 4013940] <... openat resumed>) = 3
[pid 4013939] <... write resumed>) = 11
[pid 4013940] fstat(3, <unfinished ...>
[pid 4013939] write(2, "95", 295 <unfinished ...>


...but I can't see anything returning -95 / `EOPNOTSUPP`...
</details>

@sbrivio-rh
Copy link

So, at least on my setup, this whole delay is actually caused by dhcpcd(8):

$ ./passt -f -p /tmp/muvm.pcap
$ time ./muvm --passt-socket=/tmp/passt_3.socket /bin/false 2>/dev/null

real	0m6.445s
user	0m0.916s
sys	0m0.549s
$ tshark -r /tmp/muvm.pcap 
    1   0.000000      0.0.0.0 → 255.255.255.255 DHCP 342 DHCP Discover - Transaction ID 0x356a31d7
    2   0.000148 88.198.0.161 → 88.198.0.164 DHCP 342 DHCP Offer    - Transaction ID 0x356a31d7
    3   0.000513      0.0.0.0 → 255.255.255.255 DHCP 342 DHCP Request  - Transaction ID 0x356a31d7
    4   0.000542 88.198.0.161 → 88.198.0.164 DHCP 342 DHCP ACK      - Transaction ID 0x356a31d7
    5   0.023915 5a:94:ef:e4:0c:ee → Broadcast    ARP 42 Who has 88.198.0.164? (ARP Probe)
    6   1.448584 5a:94:ef:e4:0c:ee → Broadcast    ARP 42 Who has 88.198.0.164? (ARP Probe)
    7   2.828025 5a:94:ef:e4:0c:ee → Broadcast    ARP 42 Who has 88.198.0.164? (ARP Probe)
    8   4.830970 5a:94:ef:e4:0c:ee → Broadcast    ARP 42 ARP Announcement for 88.198.0.164

because, by default (without --noarp), it ARP-probes the address we just assigned. If I do this:

diff --git a/crates/muvm/src/guest/net.rs b/crates/muvm/src/guest/net.rs
index b47f2e1..aa74ce0 100644
--- a/crates/muvm/src/guest/net.rs
+++ b/crates/muvm/src/guest/net.rs
@@ -41,7 +41,7 @@ pub fn configure_network() -> Result<()> {
     };
     if let Some(dhcpcd_path) = dhcpcd_path {
         let output = Command::new(dhcpcd_path)
-            .args(["-M", "--nodev", "eth0"])
+            .args(["-M", "--nodev", "eth0", "--noarp"])
             .output()
             .context("Failed to execute `dhcpcd` as child process")?;
         debug!(output:?; "dhcpcd output");

I get:

$ time ./muvm --passt-socket=/tmp/passt_3.socket /bin/false 2>/dev/null

real	0m1.404s
user	0m0.899s
sys	0m0.583s

For some reason I couldn't get IPv6 working in the guest so I didn't try that (at least, not yet).

@sbrivio-rh
Copy link

For some reason I couldn't get IPv6 working in the guest so I didn't try that (at least, not yet).

Oh, it's simply disabled in the kernel configuration from libkrunfw (config-libkrunfw_x86_64):

# CONFIG_IPV6 is not set

and this is there starting from the very beginning, libkrunfw commit 443c03426a73. Is there a specific reason for it? Maybe because TSI didn't/doesn't support that? Or you don't need it at all in libkrun/muvm? Does it make sense that I spend a moment trying to "fix" that? @slp?

@slp
Copy link
Collaborator Author

slp commented Oct 28, 2024

and this is there starting from the very beginning, libkrunfw commit 443c03426a73. Is there a specific reason for it? Maybe because TSI didn't/doesn't support that? Or you don't need it at all in libkrun/muvm? Does it make sense that I spend a moment trying to "fix" that? @slp?

That was disabled as part of the work to slim down the guest kernel. I'm okay with enabling it if there's a user demand for the feature, but so far I haven't heard of anyone requesting it.

@teohhanhui
Copy link
Collaborator

It just feels ethically wrong to be contributing to holding back IPv6 adoption. 🙈

@asahilina
Copy link
Member

Oof, libkrunfw doesn't have CONFIG_IPV6? Yeah, we really should fix that... even if things work with IPv4 only for most people, it's not sustainable to not have any kind of v6 networking within the VM. Even if v4 only works, IPv4 infrastructure is increasingly going through CGNAT while IPv6 infrastructure is not, and that means IPv6 is often faster where it is available (that is the case for me most of the time at home).

@sbrivio-rh
Copy link

That was disabled as part of the work to slim down the guest kernel. I'm okay with enabling it if there's a user demand for the feature, but so far I haven't heard of anyone requesting it.

There's one problem with that and passt: if passt doesn't find an IPv4 interface on the host, it disables IPv4, and only smoke signals remain at that point.

This is by default at least: you can still enable IPv4 even on an IPv6-only host, but then you need to give explicit addresses and routes.

I can try to enable CONFIG_IPV6 in libkrunfw and see what happens. I'll give it a try at some point this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants