@hornc hornc commented Nov 20, 2025

closes #7

This PR

  • forms the ctc1 and cfc1 opcodes (recognised by LLVM) and the ctc2 and cfc2 opcodes (missing from LLVM) using .long directives
  • retains the existing CopRegister macro format, with "m" or blank for Data, and "c" for Control
  • retains the existing Cop register numbering from r0-r63
  • uses LLVM's own mfc and mtc instructions
  • adds a delay nop after mfc so that the result value is available to subsequent reads, which seems to be necessary for reliability. Some of the other instructions may need more delays, but I have only fixed the issues I noticed in my examples. Happy to take advice on how to handle this better if needed.
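
To illustrate the r0-r63 numbering, here is a minimal sketch of how a combined index could be split into a register file and a 5-bit field. It assumes r0-r31 are the Data registers and r32-r63 are the Control registers (so FLAG, r63, is Control register 31). The names `CopRegKind` and `split_cop_index` are hypothetical and are not taken from the CopRegister macro.

```rust
/// Which coprocessor register file a combined index refers to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CopRegKind {
    Data,
    Control,
}

/// Splits a combined r0-r63 index into (register file, 5-bit register field).
/// Hypothetical helper: assumes 0..=31 are Data and 32..=63 are Control.
pub const fn split_cop_index(index: u8) -> Option<(CopRegKind, u8)> {
    match index {
        0..=31 => Some((CopRegKind::Data, index)),
        32..=63 => Some((CopRegKind::Control, index - 32)),
        _ => None, // outside the 64 registers
    }
}
```

With this split, r63 maps to `(CopRegKind::Control, 31)`, which is the value that would go into the register field of a cfc2 or ctc2 opcode.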

Technical:
I've used $at / $1 as temporary storage since I couldn't see a way to use in(reg) or out(reg) directly when forming the opcode bitwise. The addiu {}, $at, 0 copies the result from $at into the register where Rust's asm! macro expects to find the output variable. This frequently assembles to addiu $at, $at, 0, which does nothing, but the copy is needed whenever the Rust in/out register is not $at. If anyone knows of a better way to handle this, please let me know!
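
For reference, here is a minimal host-side sketch of the bit layout that the .long expressions have to produce when the general-purpose register is fixed to $at. It uses the standard MIPS coprocessor move encoding (primary opcode 0b010010 for COP2, then the sub-opcode, rt and rd fields). The function names are hypothetical and this is not the code in the PR.

```rust
/// Primary opcode for coprocessor 2 instructions (0b010010).
const COP2: u32 = 0x12;
/// Sub-opcode (rs field) for "move control word from coprocessor".
const CFC: u32 = 0x02;
/// Sub-opcode (rs field) for "move control word to coprocessor".
const CTC: u32 = 0x06;
/// $at is general-purpose register 1.
pub const AT: u32 = 1;

/// Packs opcode | sub-opcode | rt | rd into one instruction word.
const fn encode_cop2_move(sub: u32, rt: u32, rd: u32) -> u32 {
    (COP2 << 26) | ((sub & 0x1f) << 21) | ((rt & 0x1f) << 16) | ((rd & 0x1f) << 11)
}

/// cfc2 rt, rd: copy Control register `rd` into general-purpose register `rt`.
pub const fn encode_cfc2(rt: u32, rd: u32) -> u32 {
    encode_cop2_move(CFC, rt, rd)
}

/// ctc2 rt, rd: copy general-purpose register `rt` into Control register `rd`.
pub const fn encode_ctc2(rt: u32, rd: u32) -> u32 {
    encode_cop2_move(CTC, rt, rd)
}
```

For example, `encode_cfc2(AT, 31)` gives 0x4841F800, i.e. "cfc2 $at, $31", which would read FLAG into $at before the addiu copies it out.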

Before merging this PR I should probably add in the full 64 GTE register definitions... DONE!

Co-Processor Op-code summary:

| MIPS opcode | LLVM PSX | This PR outputs | Description |
|---|---|---|---|
| mfc0 | yes | mfc0 | Move from CoProc. 0 |
| mfc1 | yes | mfc1 | Move from Floating Point |
| mfc2 | yes | mfc2 | Move from CoProc. 2 |
| mtc0 | yes | mtc0 | Move to CoProc. 0 |
| mtc1 | yes | mtc1 | Move to Floating Point |
| mtc2 | yes | mtc2 | Move to CoProc. 2 |
| cfc1 | yes | .long | Move Control Word from Floating Point |
| cfc2 | no | .long | Move Control Word from CoProc. 2 |
| ctc1 | yes | .long | Move Control Word to Floating Point |
| ctc2 | no | .long | Move Control Word to CoProc. 2 |
| cop2 | no | N/A (for later PR) | Coprocessor Operation to Coprocessor 2 |

Testing
I've got a separate project locally where I have tested:

  • enabling and disabling the GTE
  • reading and writing GTE registers r0-r31 and r32-63 (once GTE is enabled)
  • Just one GTE op, SQR (const SQR_OP: u32 = 0x0a00428;), which multiplies each component of a vector by itself.
  1. Super simple easy to verify example where <1, 2, 3> is squared to <1, 4, 9>:
Screenshot from 2025-11-19 22-50-14

Simple, but performed by a previously inaccessible GTE :D

  2. Forcing an overflow, demonstrating that FLAG (r63) can be read and shows sensible error bits set based on the op (2 of the 3 registers have overflowed).
Screenshot from 2025-11-19 22-49-44
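
The two checks above can be compared against a small host-side reference model. This is only a sketch of what I would expect from SQR, not a statement of the hardware specification: it assumes no fractional shift is applied and that each result saturates at 0x7fff, with a per-component flag recording the overflow. The name `sqr_reference` and the overflow inputs are made up for illustration.

```rust
/// Assumed upper saturation limit for each result component.
const IR_MAX: i32 = 0x7fff;

/// Squares each component and clamps it to IR_MAX.
/// Returns (results, per-component "saturated" flags).
pub fn sqr_reference(v: [i16; 3]) -> ([i32; 3], [bool; 3]) {
    let mut out = [0i32; 3];
    let mut saturated = [false; 3];
    for i in 0..3 {
        // Widen before multiplying so the square cannot overflow i32.
        let squared = i32::from(v[i]) * i32::from(v[i]);
        saturated[i] = squared > IR_MAX;
        out[i] = squared.min(IR_MAX);
    }
    (out, saturated)
}
```

Under these assumptions `sqr_reference([1, 2, 3])` yields `[1, 4, 9]` with no flags set, while an input such as `[200, 100, 300]` saturates two of the three components.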

Not covered in this PR:
Turns out LLVM does not support the cop2 opcode either. This is used to send GTE commands once the required registers have been set up. I used this in my example to test:

    const SQR_OP: u32 = 0x0a00428;

    // SQR send cop2 
    unsafe {
        core::arch::asm! {
            "nop",
            ".long {} & 0x1ffffff | 37 << 25 # cop2",
            const SQR_OP,
            options(nomem, nostack)
        }
    }
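
The .long expression above can be checked on the host with a small sketch that mirrors the same arithmetic: keep the low 25 bits of the command and set the top 7 bits to 37 (0b0100101, the COP2 opcode followed by the "coprocessor operation" bit). The function name `encode_cop2_command` is hypothetical.

```rust
/// GTE SQR command, as used in the example above.
pub const SQR_OP: u32 = 0x0a00428;

/// Builds the full cop2 instruction word for a 25-bit GTE command,
/// mirroring the expression `{} & 0x1ffffff | 37 << 25`.
pub const fn encode_cop2_command(op: u32) -> u32 {
    (op & 0x1ff_ffff) | (37 << 25)
}
```

For SQR this gives `encode_cop2_command(SQR_OP) == 0x4AA00428`, which is the word the assembler should emit for the .long line.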

I imagine adding this as a separate feature. I haven't really thought much about the full API needed, but to use the GTE properly the SDK needs Vectors, Matrices, and Colors of various sorts (with potentially different sizes depending on the context). And all the GTE op codes need to be defined.

Breaking all this up into separate manageable chunks seems sensible.

hornc commented Nov 20, 2025

Somewhat answering my own question for a better way to form opcodes from registers: I did look at the register_single! macro in the psp codebase:

https://github.com/overdrivenpotato/rust-psp/blob/e83585c3e78b7d6209b8b5790c837d13ef71355a/psp/src/vfpu.rs#L3557-L3569

I didn't want to list out every possible register, and am not quite sure how to determine the full range that are likely to occur. The psp comment talks about disassembling many examples. The addiu {}, $at, 0 approach seemed the most immediate and reliable to test that the GTE worked with my changes.
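
To make the trade-off concrete, here is a rough sketch of the name-to-number table that a "list every register" approach needs. It is not the psp macro (which I have not reproduced here), just a plain Rust lookup over the conventional MIPS o32 register names. `mips_reg_number` is a hypothetical helper.

```rust
/// Conventional MIPS o32 names for general-purpose registers $0-$31.
const MIPS_REG_NAMES: [&str; 32] = [
    "zero", "at", "v0", "v1", "a0", "a1", "a2", "a3",
    "t0", "t1", "t2", "t3", "t4", "t5", "t6", "t7",
    "s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7",
    "t8", "t9", "k0", "k1", "gp", "sp", "fp", "ra",
];

/// Maps a register operand such as "$t0" or "$8" to its 5-bit number.
pub fn mips_reg_number(name: &str) -> Option<u8> {
    let name = name.strip_prefix('$').unwrap_or(name);
    // Numeric form, e.g. "$8".
    if let Ok(n) = name.parse::<u8>() {
        return (n < 32).then_some(n);
    }
    // Symbolic form, e.g. "$t0".
    MIPS_REG_NAMES
        .iter()
        .position(|&r| r == name)
        .map(|i| i as u8)
}
```

The resulting number is what would be shifted into the rt field of a constructed opcode, instead of always routing through $at.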

hornc commented Nov 21, 2025

@ayrtonm This is ready for review.

I just noticed that even after my refactor to use LLVM for the "m" opcodes, I am still forming the ctc1 and cfc1 opcodes bitwise, even though LLVM supports them. That was a bit of an oversight. The functionality should be the same, but I should probably construct only the 2 missing opcodes and let LLVM handle all the supported ones ☹️

EDIT: It's late and I'm possibly overthinking this ... the PSX doesn't have a co-processor 1 (FPU), so how this code forms ctc1 and cfc1 is hypothetical as it should never occur. I shouldn't need to change anything for these PSX unsupported op-codes.

@hornc hornc changed the title from "Support cfc2 and ctc2 op codes to access GTE Control Registers" to "psx/hw/cop: Support cfc2 and ctc2 op codes to access GTE Control Registers" on Nov 22, 2025

Source: Hitmen psx docs https://hitmen.c02.at/files/docs/psx/psx.pdf
p. 51
> GTE load and store instructions have a delay of 2 instructions,
> for any GTE commands or operations accessing that
> register.

and local testing which showed sporadic errors in some examples.
Sometimes the delays seem to occur naturally, so this was hard to
verify.
2 nops after GTE opcodes and constructed opcodes seem to prevent
all errors reliably.

Development

Successfully merging this pull request may close these issues.

cfc2 and ctc2 aren't implemented in LLVM's assembler
