8364305: Support AVX10 saturating floating point conversion instructions #26919
👋 Welcome back missa! A progress list of the required criteria for merging this PR into
❗ This change is not yet ready to be integrated.
@missa-prime The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.
src/hotspot/cpu/x86/x86.ad
Outdated
@@ -7753,6 +7773,20 @@ instruct castDtoX_reg_evex(vec dst, vec src, vec xtmp1, vec xtmp2, kReg ktmp1, k
ins_pipe( pipe_slow );
%}

instruct cast2DtoX_reg_evex(vec dst, vec src, rFlagsReg cr) %{
Vector instructions do not affect the EFLAGS register.
I removed the EFLAGS register parameters.
src/hotspot/cpu/x86/x86.ad
Outdated
predicate(VM_Version::supports_avx10_2() &&
is_integral_type(Matcher::vector_element_basic_type(n)));
match(Set dst (VectorCastD2X src));
effect(KILL cr);
Remove effect.
I removed the effect calls.
src/hotspot/cpu/x86/x86.ad
Outdated
@@ -7753,6 +7773,20 @@ instruct castDtoX_reg_evex(vec dst, vec src, vec xtmp1, vec xtmp2, kReg ktmp1, k
ins_pipe( pipe_slow );
%}

instruct cast2DtoX_reg_evex(vec dst, vec src, rFlagsReg cr) %{
How about adding a CICS flavour of these patterns? Now that a single instruction covers the entire conversion semantics, memory operand patterns will be useful.
I'm not familiar with CICS. Could you elaborate? Also, I added memory variants.
@@ -2363,6 +2403,14 @@ void Assembler::evcvttpd2qq(XMMRegister dst, XMMRegister src, int vector_len) {
emit_int16(0x7A, (0xC0 | encode));
}

void Assembler::evcvttpd2qqs(XMMRegister dst, XMMRegister src, int vector_len) {
Please also add memory operand flavours of these assembler routines.
I added memory variants of the instructions.
@missa-prime Looks like an interesting patch! Do you think you could add some sort of IR test here, to verify that the correct code is generated on AVX10 vs lower AVX?
@eme64 Thanks for the suggestion. This patch doesn't modify any IR though, so I'm not sure what IR test(s) to add. I could modify existing tests (
…take memory as the source
@missa-prime Could you not match on the mach graph? See example: Maybe another
src/hotspot/cpu/x86/x86.ad
Outdated
instruct cast2DtoX_mem_evex(vec dst, memory src) %{
predicate(VM_Version::supports_avx10_2() &&
is_integral_type(Matcher::vector_element_basic_type(n)));
match(Set dst (VectorCastD2X src));
I assume your intent here is to feed the memory operand to the vector cast IR. A memory operand is first loaded into a register using the LoadVector IR, so a CISC / memory variant of the pattern should consume the Load IR so that the operand is directly exposed to the instruction. Check out https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/x86.ad#L8986
Make a similar change in all the newly added memory patterns.
I updated the scalar and vector memory patterns. I'm not completely sure about the vector ones though, so I'll try and test further.
@@ -87,7 +87,7 @@ public VectorFPtoIntCastTest() {

@Test
@IR(counts = {IRNode.VECTOR_CAST_F2I, IRNode.VECTOR_SIZE_16, "> 0"},
applyIfCPUFeature = {"avx512f", "true"})
applyIfCPUFeatureOr = {"avx512f", "true", "avx10_2", "true"})
You should check for the target-specific Machine IR that is selected on AVX10_2 targets.
New checks are added.
@@ -105,7 +105,7 @@ public void checkf2int(int len) {

@Test
@IR(counts = {IRNode.VECTOR_CAST_F2L, IRNode.VECTOR_SIZE_8, "> 0"},
applyIfCPUFeature = {"avx512dq", "true"})
applyIfCPUFeatureOr = {"avx512dq", "true", "avx10_2", "true"})
avx10_2 is a superset of AVX512DQ. We enable all AVX512 features during VM initialization, and the IR framework relies on the same.
I updated the checks to account for this.
I modified existing vector conversion tests, and I'll add some matching scalar tests to get full coverage.
…rms during Vector floating point to integer conversion tests
…bling vector alignment or compact object headers
@IR(counts = {"castDtoX", " >0 "}, phase = CompilePhase.FINAL_CODE,
applyIfCPUFeatureAnd = {"avx", "true", "avx10_2", "false"})
@IR(counts = {"cast2DtoX", " >0 "}, phase = CompilePhase.FINAL_CODE,
applyIfCPUFeature = {"avx10_2", "true"})
Please refer to https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java#L2638 for adding MachNode-based IR node checks.
Thanks, I added some new nodes.
@missa-prime, please have a look at the following failure with the current patch
…t in cast2F2X and cast2D2X memory instructions
… conversion tests
@@ -491,6 +491,26 @@ public class IRNode {
callOfNodes(STATIC_CALL_OF_METHOD, "CallStaticJava");
}

public static final String CAST_F2X = PREFIX + "CAST_F2X" + POSTFIX;
static {
machOnlyNameRegex(CAST_F2X, "castF2X_reg_(av|eve)x");
This should be "castFtoX_reg_(av|eve)x".
public static final String CAST_D2X = PREFIX + "CAST_D2X" + POSTFIX;
static {
machOnlyNameRegex(CAST_D2X, "castD2X_reg_(av|eve)x");
This should be "castDtoX_reg_(av|eve)x".
public static final String CAST2_F2X = PREFIX + "CAST2_F2X" + POSTFIX;
static {
machOnlyNameRegex(CAST2_F2X, "cast2F2X_(reg|mem)_evex");
This should be "cast2FtoX_(reg|mem)_evex"
public static final String CAST2_D2X = PREFIX + "CAST2_D2X" + POSTFIX;
static {
machOnlyNameRegex(CAST2_D2X, "cast2D2X_(reg|mem)_evex");
This should be "cast2DtoX_(reg|mem)_evex".
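The four naming corrections above matter because a name regex that uses "2" where the instruct name uses "to" never matches. A minimal sketch of that mismatch follows. It assumes the regex is searched for inside the instruct name as a plain string, which simplifies however machOnlyNameRegex actually applies it to compiler output. MachNameRegexCheck and its matches helper are hypothetical names used only for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the IR framework: checks whether a name
// regex can be found inside an x86.ad instruct name given as a plain string.
final class MachNameRegexCheck {
    static boolean matches(String regex, String instructName) {
        return Pattern.compile(regex).matcher(instructName).find();
    }

    public static void main(String[] args) {
        // Regex as originally submitted ("D2X") against the real instruct name ("DtoX").
        System.out.println(matches("castD2X_reg_(av|eve)x", "castDtoX_reg_evex"));
        // Regex as suggested by the reviewer.
        System.out.println(matches("castDtoX_reg_(av|eve)x", "castDtoX_reg_evex"));
        // The (reg|mem) alternation covers both register and memory variants.
        System.out.println(matches("cast2DtoX_(reg|mem)_evex", "cast2DtoX_mem_evex"));
    }
}
```

An IR rule built on the uncorrected regex would therefore report zero matching nodes even when the expected instruction was selected.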
predicate(VM_Version::supports_avx10_2() &&
is_integral_type(Matcher::vector_element_basic_type(n)));
match(Set dst (VectorCastF2X src));
format %{ "vector_cast2r_f2x $dst, $src\t!" %}
format %{ "vector_cast2r_f2x $dst, $src\t!" %}
format %{ "vector_cast_f2x_saturating $dst, $src\t!" %}
predicate(VM_Version::supports_avx10_2() &&
is_integral_type(Matcher::vector_element_basic_type(n)));
match(Set dst (VectorCastF2X (LoadVector src)));
format %{ "vector_cast2m_f2x $dst, $src\t!" %}
format %{ "vector_cast2m_f2x $dst, $src\t!" %}
format %{ "vector_cast_f2x_saturating $dst, $src\t!" %}
src will be represented by the appropriate addressing scheme for the memory operand.
predicate(VM_Version::supports_avx10_2() &&
is_integral_type(Matcher::vector_element_basic_type(n)));
match(Set dst (VectorCastD2X src));
format %{ "vector_cast2r_d2x $dst, $src\t!" %}
format %{ "vector_cast2r_d2x $dst, $src\t!" %}
format %{ "vector_cast_d2x_saturating $dst, $src\t!" %}
predicate(VM_Version::supports_avx10_2() &&
is_integral_type(Matcher::vector_element_basic_type(n)));
match(Set dst (VectorCastD2X (LoadVector src)));
format %{ "vector_cast2m_d2x $dst, $src\t!" %}
format %{ "vector_cast2m_d2x $dst, $src\t!" %}
format %{ "vector_cast_d2x_saturating $dst, $src\t!" %}
@@ -7709,6 +7712,33 @@ instruct castFtoX_reg_evex(vec dst, vec src, vec xtmp1, vec xtmp2, kReg ktmp1, k
ins_pipe( pipe_slow );
%}

instruct cast2FtoX_reg_evex(vec dst, vec src) %{
Could be named as castFtoX_reg_avx10.
ins_pipe( pipe_slow );
%}

instruct cast2FtoX_mem_evex(vec dst, memory src) %{
Could be named as castFtoX_mem_avx10.
@@ -7753,6 +7786,33 @@ instruct castDtoX_reg_evex(vec dst, vec src, vec xtmp1, vec xtmp2, kReg ktmp1, k
ins_pipe( pipe_slow );
%}

instruct cast2DtoX_reg_evex(vec dst, vec src) %{
Could be named as castDtoX_reg_avx10.
ins_pipe( pipe_slow );
%}

instruct cast2DtoX_mem_evex(vec dst, memory src) %{
Could be named as castDtoX_mem_avx10.
@@ -11724,8 +11725,31 @@ instruct convF2I_reg_reg(rRegI dst, regF src, rFlagsReg cr)
ins_pipe(pipe_slow);
%}

instruct conv2F2I_reg_reg(rRegI dst, regF src)
Could be named as convF2I_reg_reg_avx10.
@@ -11746,8 +11793,31 @@ instruct convD2I_reg_reg(rRegI dst, regD src, rFlagsReg cr)
ins_pipe(pipe_slow);
%}

instruct conv2D2I_reg_reg(rRegI dst, regD src)
Could be named as convD2I_reg_reg_avx10.
ins_pipe(pipe_slow);
%}

instruct conv2D2I_reg_mem(rRegI dst, memory src)
Could be named as convD2I_reg_mem_avx10.
@@ -11757,6 +11827,28 @@ instruct convD2L_reg_reg(rRegL dst, regD src, rFlagsReg cr)
ins_pipe(pipe_slow);
%}

instruct conv2D2L_reg_reg(rRegL dst, regD src)
Could be named as convD2L_reg_reg_avx10.
ins_pipe(pipe_slow);
%}

instruct conv2D2L_reg_mem(rRegL dst, memory src)
Could be named as convD2L_reg_mem_avx10.
IRNode.java will need its name regexes changed accordingly.
@@ -491,6 +491,26 @@ public class IRNode {
callOfNodes(STATIC_CALL_OF_METHOD, "CallStaticJava");
}

public static final String CAST_F2X = PREFIX + "CAST_F2X" + POSTFIX;
Maybe we can name CAST_F2X as X86_VCAST_F2X and CAST2_F2X as X86_VCAST_F2X_AVX10.
Then we can use a similar theme for the other names below as well.
Intel® AVX10 ISA [1] extensions added new saturating floating point conversion instructions which comply with definitions in section 5.8 of the 2019 IEEE-754 standard. They can compute floating point to integral type conversions while also handling special inputs such as NaN, +Infinity, and -Infinity.
Without AVX10.2, the current approach starts by converting the floating point value(s) in the source register to the desired integral value(s) in the destination register. In the scalar case, the CVTTSS2SI (single precision) or CVTTSD2SI (double precision) instruction is used. In the vector case, the CVTTPS2DQ (single precision) or CVTTPD2DQ (double precision) instruction is used. However, if the source contains a special value (NaN, -Infinity, +Infinity, a value <= Integer.MIN_VALUE, or a value >= Integer.MAX_VALUE), extra handling is required. The specific sequence of instructions involved depends on the source (single precision vs double precision), destination (long, integer, short, or byte), level of parallelization (scalar vs vector), and supported AVX extension type. Essentially though, the special values are mapped into the integer range (NaN -> 0; -Infinity and values <= Integer.MIN_VALUE -> Integer.MIN_VALUE; +Infinity and values >= Integer.MAX_VALUE -> Integer.MAX_VALUE) with the help of a few temporary registers that store intermediate results.
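The special-value mapping described above is what Java's narrowing primitive conversion already requires at the language level. The following is a minimal sketch of that mapping as seen from Java code, not code from this patch. SaturatingCastDemo and its methods are hypothetical names used only for illustration.

```java
// Hypothetical illustration class, not part of the patch.
final class SaturatingCastDemo {
    // Java's float -> int cast: NaN becomes 0, out-of-range values
    // (including the infinities) saturate, everything else truncates toward zero.
    static int toInt(float f) {
        return (int) f;
    }

    // The same rules apply to double -> long with the long range limits.
    static long toLong(double d) {
        return (long) d;
    }

    public static void main(String[] args) {
        System.out.println(toInt(Float.NaN));
        System.out.println(toInt(Float.POSITIVE_INFINITY) == Integer.MAX_VALUE);
        System.out.println(toInt(Float.NEGATIVE_INFINITY) == Integer.MIN_VALUE);
        System.out.println(toLong(Double.NaN));
        System.out.println(toLong(-1.0e300) == Long.MIN_VALUE);
    }
}
```

Whatever instruction sequence the JIT selects for these casts has to produce exactly these results, which is why a single saturating instruction can replace the fix-up code.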
This change uses the new AVX10.2 scalar (VCVTTSS2SIS or VCVTTSD2SIS) and vector (VCVTTPS2QQS, VCVTTPS2DQS, VCVTTPD2QQS, and VCVTTPD2DQS) instructions on supported platforms to avoid the extra handling described above. The JTREG tests listed below were used to verify correctness with the -XX:-UseSuperWord and -XX:+UseSuperWord options to exercise both the scalar and vector paths. The baseline build used is OpenJDK v26-b11.
jtreg:test/hotspot/jtreg/compiler/codegen/TestByteDoubleVect.java
jtreg:test/hotspot/jtreg/compiler/codegen/TestByteFloatVect.java
jtreg:test/hotspot/jtreg/compiler/codegen/TestIntDoubleVect.java
jtreg:test/hotspot/jtreg/compiler/codegen/TestIntFloatVect.java
jtreg:test/hotspot/jtreg/compiler/codegen/TestLongDoubleVect.java
jtreg:test/hotspot/jtreg/compiler/codegen/TestLongFloatVect.java
jtreg:test/hotspot/jtreg/compiler/codegen/TestShortDoubleVect.java
jtreg:test/hotspot/jtreg/compiler/codegen/TestShortFloatVect.java
jtreg:test/hotspot/jtreg/compiler/vectorapi/VectorFPtoIntCastTest.java
jtreg:test/hotspot/jtreg/compiler/vectorization/runner/ArrayTypeConvertTest.java
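For orientation, the listed tests revolve around element-wise conversion loops of roughly the following shape. This is a minimal sketch under that assumption, not an excerpt from any of the tests. ArrayConvertSketch and floatsToInts are hypothetical names, and whether the loop is actually vectorized depends on the JIT, the flags in use, and the hardware.

```java
// Hypothetical sketch, not taken from the listed jtreg tests.
final class ArrayConvertSketch {
    // Element-wise float -> int conversion. A loop of this shape is a
    // candidate for auto-vectorization; with vectorization disabled each
    // element goes through the scalar conversion instead.
    static int[] floatsToInts(float[] src) {
        int[] dst = new int[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (int) src[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        float[] input = {
            Float.NaN, Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY,
            1.5f, -2.7f, 3.0e10f
        };
        System.out.println(java.util.Arrays.toString(floatsToInts(input)));
    }
}
```

Running the same inputs with and without vectorization and comparing the outputs is the kind of check that exercises both code paths.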
[1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/26919/head:pull/26919
$ git checkout pull/26919
Update a local copy of the PR:
$ git checkout pull/26919
$ git pull https://git.openjdk.org/jdk.git pull/26919/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 26919
View PR using the GUI difftool:
$ git pr show -t 26919
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/26919.diff
Using Webrev
Link to Webrev Comment