
win32_isatty() dont call a mostly failing syscall, NT->WIN err conv is slow #23375

Open: wants to merge 1 commit into base: blead

bulk88
Contributor

@bulk88 bulk88 commented Jun 17, 2025

GetFileType() has shortcuts, it first checks against -1 -2 -3, then checks
against PEB's 3 master IN, OUT, ERR kernel handles, then checks if it is
tagged/unaligned [open secret, not frozen API], and only then does the
NtRequestWaitReplyPort() RPC call to csrss.exe instead of
NtQueryVolumeInformationFile(). GetFileType() from kernel32.dll is different
from GetFileType() in kernelbase.dll.
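
For readers less familiar with Windows internals, the check order described above can be sketched roughly as follows. This is not the PR's actual diff: `sketch_isatty` is a hypothetical name, error handling is simplified, and the non-Windows branch exists only so the file compiles elsewhere. It assumes `fd` is a valid CRT file descriptor.

```c
#include <errno.h>
#ifdef _WIN32
#include <windows.h>
#include <io.h>
#else
#include <unistd.h>
#endif

/* Hypothetical sketch: cheap GetFileType() filter first, GetConsoleMode() last. */
int sketch_isatty(int fd)
{
#ifdef _WIN32
    DWORD mode;
    HANDLE fh = (HANDLE)_get_osfhandle(fd);

    if (fh == INVALID_HANDLE_VALUE)
        return 0;                       /* errno was already set to EBADF */

    /* Disk files and pipes are rejected here without the console round trip. */
    if (GetFileType(fh) != FILE_TYPE_CHAR) {
        errno = ENOTTY;
        return 0;
    }

    /* Character device: only a real console handle has a console mode. */
    if (GetConsoleMode(fh, &mode))
        return 1;

    errno = ENOTTY;
    return 0;
#else
    /* Assumption: fall back to the platform isatty() outside Windows. */
    return isatty(fd);
#endif
}
```

`GetFileType()` cannot make the final decision on its own, because `FILE_TYPE_CHAR` also covers character devices such as `NUL`; that is why `GetConsoleMode()` is still called for handles that pass the filter.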

  • This set of changes does not require a perldelta entry.

@tonycoz
Contributor

tonycoz commented Jul 9, 2025

It definitely speeds up the test against file type file handles, from ~1.2µs to 0.39µs per call.

It slows down the test against tty handles a little, from ~23.8-24.5µs to 23.4-25.0µs.

@bulk88
Contributor Author

bulk88 commented Jul 10, 2025

> It definitely speeds up the test against file type file handles, from ~1.2µs to 0.39µs per call.
>
> It slows down the test against tty handles a little, from ~23.8-24.5µs to 23.4-25.0µs.

Good to know it was a verified benchmark improvement? How did you measure it? Just curious. Maybe I can do it as routine.

I didn't benchmark it myself, but that RtlNtErr2WinErr() function made me furious single stepping its Asm code. It's a 0-65000 for() loop with zero optimizations. After 500 iterations I stepped out of the loop. And started to think how to patch the WinPerl interp to stop it.

It's probably worse than 0-65000. It's more like [0x0***-****, 0x4***-****, 0x8***-****, 0xc***-****] x 0-65000, and NO, [0x0***-****, 0x4***-****, 0x8***-****, 0xc***-****] are not optimized to if() else if() else if() else {}. MS devs assumed no sane production process will do high speed, high frequency, failing syscalls as normal runtime behavior. RtlNtErr2WinErr() isn't something worthy to optimize.
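
To make the `0x0`/`0x4`/`0x8`/`0xc` remark concrete: the top two bits of a 32-bit NTSTATUS encode its severity, and turning a failing status into a Win32 error code means looking it up in a table. The sketch below is a toy, not ntdll's real table or algorithm. `toy_table`, `toy_translate`, and `ntstatus_severity` are made-up names; only the three status/error pairs are real values.

```c
#include <stddef.h>
#include <stdint.h>

struct status_map {
    uint32_t ntstatus;
    int      win32_error;
};

/* Three real NTSTATUS -> Win32 pairs; the real table is far larger. */
const struct status_map toy_table[] = {
    { 0xC0000008u, 6 },  /* STATUS_INVALID_HANDLE        -> ERROR_INVALID_HANDLE */
    { 0xC0000022u, 5 },  /* STATUS_ACCESS_DENIED         -> ERROR_ACCESS_DENIED  */
    { 0xC0000034u, 2 },  /* STATUS_OBJECT_NAME_NOT_FOUND -> ERROR_FILE_NOT_FOUND */
};

/* Severity: 0 = success, 1 = informational, 2 = warning, 3 = error. */
unsigned ntstatus_severity(uint32_t status)
{
    return status >> 30;
}

/* Toy linear scan. Returns the mapped Win32 error, or -1 if the status is
 * not in the toy table. *steps receives the number of entries examined. */
int toy_translate(uint32_t status, size_t *steps)
{
    size_t n = sizeof toy_table / sizeof toy_table[0];
    size_t i;

    for (i = 0; i < n; i++) {
        if (toy_table[i].ntstatus == status) {
            *steps = i + 1;
            return toy_table[i].win32_error;
        }
    }
    *steps = n;
    return -1;
}
```

The cost of a scan like this is paid only on failure, which is the point made above: a successful call never needs the translation at all.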

I have another patch somewhere that cuts down the number of win32_isatty() calls coming from the POSIX-y PerlIO .c code by an order of magnitude (90%), but I want to get this patch in first, which makes the win32_isatty() impl better, regardless of goodness or badness of whatever the caller frame's code is doing.

The patch that removes isatty() calls from POSIX-y PerlIO .c code by an order of magnitude (90%) is a cross platform patch, so it's definitely another PR.

The Win32 Console APIs aren't known for being I/O speed demons. A shell-level > redirect to a disk file, or a shell-level | pipe to another process, moves the same number of MBs something like 10x or 100x faster than writing to the console.

Plus waking up csrss.exe process to force it to search its global handle table for "junk value handles" from perl.exe made with an "RNG" isn't polite.

I could've done Native API/Asm style optimizations inside win32_isatty() but I decided that is a bad idea, MS in late Win10s era/Win 11 era has done heavy refactoring on cmd.exe, and I don't want to make non-public API assumptions of what a Console I/O handle/opaque integer actually is. So I decided NOT to use the unaligned U32 * test/magic trick for Console I/O handles, or the -1 -2 -3 tricks.
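
For context, the "unaligned" test being declined here relies on the fact that real kernel handle values are multiples of 4, so a handle with either of its low two bits set cannot be a plain kernel handle. A minimal sketch of such a check, with the hypothetical name `handle_looks_tagged`, might look like the following. Whether console handles are still tagged this way on any given Windows version is exactly the non-public assumption the comment above refuses to make.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero when either of the low two bits is set.
 * This only illustrates the trick that win32_isatty() deliberately avoids. */
int handle_looks_tagged(const void *h)
{
    return ((uintptr_t)h & 3u) != 0;
}
```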

Safer and easier and less thinking and less work to offload responsibility for all the shortcut tricks to GetFileType()@kernel32.dll.

GetFileType(fh) == FILE_TYPE_CHAR sounds like frozen MS Public API forever; as long as you have access to symbol GetFileType()@kernel32.dll it will work forever.

Remember GetFileType()@kernelbase.dll IS NOT identical to GetFileType()@kernel32.dll, but that is irrelevant to WinPerl. All 3, WinPerl, Mingw GCC and MSVC 2022, don't know about and don't link to kernelbase.dll.

MS specifically says kernelbase.dll isn't part of Public API/ABI and isn't part of 1337 API/ABI, and can disappear at any time, see https://learn.microsoft.com/en-us/windows/win32/win7appqual/new-low-level-binaries .

Win RT Apps/MS Mobile App Store walled garden Apps probably are forbidden from C linking against kernel32.dll. kernel32.dll simply doesn't exist anymore for UWP/Win RT App Store processes. They can only link against kernelbase.dll, and GetFileType()@kernelbase.dll has no idea that the WinOS has user-mode emulated Console I/O Handles. Technically Win RT apps don't link against a file called kernelbase.dll, they use the API Set .dlls in their import tables, and MS can rewire those PE symbol forwarder entries to anywhere .dll-wise at runtime.

I'd have to double check, but I believe the kernelbase.dll file doesn't know that Consoles/cmd.exe/STDIN/STDOUT/STDERR exist on the Windows Platform anywhere inside of itself. Just like ntdll.dll has no clue about STDIN/STDOUT/STDERR. [* (@bulk88 is still wrong >>>) lies! lies! and even more lies! Of course ntdll.dll knows what stdin/stdout/stderr are and what the console is. How does autochk.exe work before the Video Card driver is loaded? Perhaps we should say ntdll.dll doesn't know Win32 has a console 😉 ]

MS's API Sets and kernelbase.dll reorganization make perfect sense to me. Why would smartphone apps need access to STDIN/STDOUT/STDERR? Why would a 48U rack full of Win Server OS Blade Servers need to know what user32.dll is, or what a TUI/Console/command prompt is, or what a DVI/DP/HDMI cable is?

user32.dll and gdi.dll are still single thread handle, single TID, semi-single threaded, synchronous I/O APIs from pre-historic times. Just like a pre-pthreads Unix OS or a pre-pthreads Unix libc.so library.

random facts: ntdll.dll only knows about STDOUT, aka NtDisplayString(). It has no clue what STDIN or STDERR is. But autochk.exe knows how to open a FD to /Device/KeyboardClass, and now you have a fully functioning TUI app!

@tonycoz
Contributor

tonycoz commented Jul 10, 2025

> Good to know it was a verified benchmark improvement? How did you measure it? Just curious. Maybe I can do it as routine.

It wasn't anything too rigorous:

#!perl
use v5.40;
use Time::HiRes "time";

testit(\*STDIN);
open my $fh, "<", $0 or die;
testit($fh);

sub testit ($myfh) {
    my $start = time();

    for (1 .. 1_000_000) {
        my $x = -t $myfh;
    }
    my $end = time();
    print $end - $start, "\n";
}

tested 3 times each with blead and with your change.

blead results (Ryzen 7 2700):

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
23.8609209060669
1.24526906013489

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
23.7503299713135
1.19657588005066

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
24.4927198886871
1.2960000038147

with your change:

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
24.9973509311676
0.387558937072754

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
25.186952829361
0.397616147994995

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
23.3968820571899
0.391831159591675

Debian (i7-10700F):

tony@venus:.../git/perl6$ ./perl -Ilib ../23375.pl
0.237051963806152
0.199118852615356
tony@venus:.../git/perl6$ ./perl -Ilib ../23375.pl
0.248272895812988
0.191707134246826
tony@venus:.../git/perl6$ ./perl -Ilib ../23375.pl
0.23987603187561
0.188696146011353

Debian (WSL2, Ryzen 7 2700, same hardware as Windows above)

tony@GANYMEDE:~/dev/perl/git/perl$ ./perl -Ilib ../23375.pl
0.769921064376831
0.469972848892212
tony@GANYMEDE:~/dev/perl/git/perl$ ./perl -Ilib ../23375.pl
0.762050151824951
0.463076114654541
tony@GANYMEDE:~/dev/perl/git/perl$ ./perl -Ilib ../23375.pl
0.762561082839966
0.469331026077271
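
Since each printed figure is the total time in seconds for 1,000,000 `-t` calls, the per-call numbers quoted earlier in the thread can be rechecked with a little arithmetic. The sketch below copies the Windows totals from the runs above; `mean_of` and `per_call_us` are helper names introduced here for illustration.

```c
#include <stddef.h>

/* Totals in seconds for 1,000,000 "-t" calls, copied from the Windows runs above. */
const double blead_tty_s[3]    = { 23.8609209060669, 23.7503299713135, 24.4927198886871 };
const double blead_file_s[3]   = { 1.24526906013489, 1.19657588005066, 1.2960000038147 };
const double patched_tty_s[3]  = { 24.9973509311676, 25.186952829361, 23.3968820571899 };
const double patched_file_s[3] = { 0.387558937072754, 0.397616147994995, 0.391831159591675 };

/* Arithmetic mean of n samples. */
double mean_of(const double *v, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += v[i];
    return sum / (double)n;
}

/* Convert a total run time in seconds into microseconds per call. */
double per_call_us(double total_seconds, double iterations)
{
    return total_seconds / iterations * 1e6;
}
```

With these inputs the file-handle case works out to roughly 1.25 µs per call on blead versus roughly 0.39 µs with the change, about a 3.2x speedup. The tty case moves from about 24.0 µs to about 24.5 µs, which is inside the run-to-run spread of the three patched runs.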

@khwilliamson
Contributor

I would merge this if the commit message is improved enough. But I object to its merging as-is. The code itself looks fine to me.

First, the commit message title is too long. GH was forced to wrap it, using ellipses. And what is there doesn't really help me understand what's happening. It is actually factually wrong. This commit doesn't stop calling any syscall. It inserts another syscall first and avoids calling the original one if the new one fails. It also assumes a more intimate knowledge of Windows internals than I possess, and I'm sure I'm not alone. "NT->WIN err conv is slow" is something whose meaning I can guess at. But it shouldn't be in a commit title.

Second, the commit message body is non-existent, and the comments refer the reader to the p.r. for details. The comments should not refer to an unspecified GH p.r. that someone would have to take steps to track down. It is ok to refer to the commit message that created them. But making a later reader have to go through the extra level of indirection is unacceptable.

Third, the p.r. description isn't very helpful. People reading this want to know what is changing and why. Starting off with a description of the internals of a Windows library function does not meet that need. I myself would not have included it, but if you feel that background is helpful to people more attuned to Windows internals than I and most of the people who will ever read the message, then it should be placed in a separate paragraph later.

The p.r. description should be copied into the commit message in this case. And it is a non sequitur with its title. Its first sentence needs to expand on what the title says. It doesn't currently.

What it looks to me is that the commit basically finds many failures using a faster but incomplete syscall before falling back to the slow complete one. But I wouldn't have figured that out from any of your descriptions.

The "Let them read code" attitude is not a principle compatible with this project. (Today I learned that Marie Antoinette did not say the similar phrase attributed to her; that claim was first made 50 years after her execution.)

Writing a good commit message for anything but trivial changes takes some effort. We require that pull requests have had sufficient effort not just in the code but in its comments and descriptions.

@@ -4000,7 +4000,10 @@ win32_isatty(int fd)
         return 0;
     }

     if (GetConsoleMode(fh, &mode))
Contributor

And the description says there is a syscall that mostly fails. There is nothing that explains that statement, so we are left to guess about it.
