SH2 assembly optimizations

While working on Princess Crown and, more recently, Langrisser III, there have been times when I've had very little code space to work with. Sometimes you need to add a few instructions and there's simply nowhere to put them. Here are some areas where I've found you can often optimize the existing code and free up a few precious bytes:

  1.  Negative or high-value words stored as longs. Sometimes you run into the following scenario:

    mov.l    my_label, r0
    [more code here]
    my_label:    .long 0xFFFD

    The data is then treated as a word for its entire use. That's a bit inefficient, since it could be stored as a word instead, although you may then have to deal with data alignment, which can negate the savings. A better approach is as follows:

    mov    #0xFD, r0
    extu.w r0, r0

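    This works because of how the SH2 loads immediates: the 8-bit value is always sign-extended, so 0xFD ends up as 0xFFFFFFFD. You can then clamp it down to the word value 0xFFFD with extu.w. The two instructions take 4 bytes, versus 6 for the mov.l plus its .long. Note that this only reaches word values in the range 0xFF80-0xFFFF; values 0x00-0x7F need no extu.w at all.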
  2. Unnecessary extu.w instructions. Sometimes code wants to check for a specific value like 0xFFFD while reading words from a buffer:

    mov.w @r0, r0
    extu.w r0, r0
    mov #0xFD, r1
    extu.w r1, r1
    cmp/eq r0, r1
    bt my_label

    Given that reading a word from a buffer and loading an immediate both sign-extend, there's really no point in zero-extending both of them before the compare (as long as nothing later relies on r0 or r1 being zero-extended). Here's a smaller version:

    mov.w @r0, r0
    mov #0xFD, r1
    cmp/eq r0, r1
    bt my_label
  3. Similar pointers. Occasionally you run into code that reads and writes to specific addresses that sit close together:

    mov #0, r0
    mov.l addr1, r1
    mov.l r0, @r1
    mov.l addr2, r2
    mov.l r0, @r2
    addr1: .long 0x06005000
    addr2: .long 0x06005040

    So long as the second address is within -128 to +127 bytes of the first, you can strip out the second address literal and use an add instruction:

    mov #0, r0
    mov.l addr1, r1
    mov.l r0, @r1
    mov r1, r2
    add #0x40, r2
    mov.l r0, @r2
    addr1: .long 0x06005000

    Just remember that the add immediate is a signed byte (just like a mov immediate), so “add #0xFF, r0” is the equivalent of adding -1, not adding 0x000000FF.
  4. Unused code. If you can find some, and it happens to be larger than the code you want to insert, use that space. Most of the unused code you'll typically see is related to the CD Block. In fact, the official libs used for the CD Block were incredibly bloated. If you know what you’re doing, you can find some space here.
  5. Unused HWRAM areas. If you’re lucky, 0x06000C00-0x06004000 may be only partially used or not used at all. You can usually verify this by running the game and checking whether the 0x06000C00/0x06002000 areas still contain data from the first sector of the disc. If it’s still there, it’s quite likely the area is free. I’d still run some checks to make sure, though.
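
If you want to sanity-check the sign-extension reasoning behind tips 1 and 2 without firing up an emulator, here's a rough Python model of the three instructions involved. This is only a sketch: the helper names (sign_extend, mov_imm, mov_w_load, extu_w, extu_is_redundant) are made up for illustration, and it models just the register results, not flags or timing.

```python
MASK32 = 0xFFFFFFFF

def sign_extend(value, bits):
    """Sign-extend the low `bits` bits of value into a 32-bit register image."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value & MASK32

def mov_imm(imm8):
    """Model of `mov #imm, Rn`: the 8-bit immediate is sign-extended."""
    return sign_extend(imm8, 8)

def mov_w_load(word):
    """Model of `mov.w @Rm, Rn`: the 16-bit value read is sign-extended."""
    return sign_extend(word, 16)

def extu_w(reg):
    """Model of `extu.w Rm, Rn`: keep the low 16 bits, clear the upper 16."""
    return reg & 0xFFFF

def extu_is_redundant(imm8):
    """Check every possible buffer word: does cmp/eq give the same answer
    with and without the two extu.w instructions?"""
    target = mov_imm(imm8)
    return all(
        (extu_w(mov_w_load(w)) == extu_w(target)) == (mov_w_load(w) == target)
        for w in range(0x10000)
    )

# Tip 1: mov #0xFD, r0 followed by extu.w r0, r0
print(hex(mov_imm(0xFD)))          # 0xfffffffd
print(hex(extu_w(mov_imm(0xFD))))  # 0xfffd

# Tip 2: the compare result is the same either way
print(extu_is_redundant(0xFD))     # True
```

The second check holds because both sides are sign extensions of their low 16 bits, so the full 32-bit values are equal exactly when the low words are equal.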

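For tip 3, the thing to check before deleting a literal is whether the distance between the two addresses fits in the signed byte that add takes. A small Python sketch of that check follows; add_imm_for and add_imm are hypothetical helper names, not part of any tool.

```python
def add_imm_for(base, target):
    """Return the byte to encode in `add #imm, Rn` so that base becomes target,
    or None if the distance does not fit in a signed 8-bit immediate."""
    delta = target - base
    if -128 <= delta <= 127:
        return delta & 0xFF
    return None

def add_imm(imm8, reg):
    """Model of `add #imm, Rn`: the immediate is sign-extended before the add."""
    imm8 &= 0xFF
    signed = imm8 - 0x100 if imm8 & 0x80 else imm8
    return (reg + signed) & 0xFFFFFFFF

# The addresses from tip 3: 0x06005040 is 0x40 past 0x06005000, so it fits.
print(hex(add_imm_for(0x06005000, 0x06005040)))  # 0x40
print(hex(add_imm(0x40, 0x06005000)))            # 0x6005040

# 0xFF is -1, not +255.
print(hex(add_imm(0xFF, 0x06005000)))            # 0x6004fff

# 0x80 bytes away is already out of range for a single add.
print(add_imm_for(0x06005000, 0x06005080))       # None
```

When the helper returns None you'd have to keep the second literal (or chain more than one add, which eats into the savings).
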
One Response to “SH2 assembly optimizations”

  1. ProstatePunch says:

    Hey, I was thinking about dipping my feet into the Saturn Grandia Translation Project, and all my digging around for information led me to you. I was wondering if I could speak with you maybe over Skype (or email if you prefer) regarding this so I can pick your brain a bit. I really want to see this project come to life, but don’t really know where to start and could use a hand.
