- 
                Notifications
    You must be signed in to change notification settings 
- Fork 2.2k
ci: fix a race in TestExecIn and TestExecInTTY #4445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a6c839a    to
    ee42499      
    Compare
  
    | 
 Please add this into the commit message. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should just not expect cat in ps output.
| 
 Or do a retry. | 
| 
 I think because these are integration tests, so maybe we need to keep this to check we didn't exec into an unrelated container? | 
Fixes: opencontainers#4437 We can use a chan to wait the output from init process. After we received the content, it means that the init process has started. Then we can exec into this container to use ps command to check the init process and the exec process are both exist. Signed-off-by: lifubang <[email protected]>
ee42499    to
    f08116b      
    Compare
  
    | _ = stdoutR.Close() | ||
| _ = stdoutW.Close() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we just defer the close, instead of a function? We are ignoring the error anyways.
I mean it for all the cases, not just this.
| _ = stdinR.Close() | ||
| defer stdinW.Close() //nolint: errcheck | ||
| defer func() { | ||
| _, _ = stdinW.Write([]byte("hello")) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we write this? To stop the go routine? Can you add a comment or the message written can be self-explanatory instead of this hello?
If it's that, is it needed? Won't the close of the pipe already free the goroutine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the proper fix belongs to func (c *Container) exec(), which signals the runc init to go ahead and exec the real init process (cat in our case). I see there is some complicated logic in there but it looks it's just to handle the error case.
What it does (in a normal, non-error) case is opens exec fifo, reads from it, and removes the exec fifo file.
The other side (runc init, see the tail of func (l *linuxStandardInit) Init()):
- writes 0to the exec fifo (after which the parent returns from container.Start and thus container is expected to be running);
- runs the StartContainerhook;
- calls utils.UnsafeCloseFrom
- calls system.Exec(name, l.config.Args, os.Environ())
The test execs the second process after (1) has happened, and if it manages to execute before (4) then we have this issue.
I guess that #4325 makes the race window smaller because os.Environ takes some time (it actually copies the environment), but we still haven't merged that.
I wish there is a cheap way to find out if runc init has completed. Maybe read /proc/$INIT_PID/exe to see it's not /proc/self/exe? Or something similar, ideally polling on something.
And, this can be added to func (c *Container) exec() so when it returns we're sure the container is running.
Fixes: #4437
We can use a chan to wait the output from init process. After we received the content,
it means that the init process has started. Then we can exec into this container to use
ps command to check the init process and the exec process are both exist.
Signed-off-by: lifubang [email protected]