Having now finished a full project on a PC-88 game, I find it interesting how much closer it is to MSX hacking than to NEC's own PC-98. This shouldn't be surprising, of course, as both are 8-bit computers with Z80 processors, whereas the PC-98 is a 16-bit x86 system of the next generation, closer in many ways to PC-compatibles of the era than to its direct predecessor. For my purposes, aside from the particular flavor of ASM, the lack of a more modern filesystem on the disks meant I had to work with the .d88 images more like a ROM cartridge than a floppy disk of discrete files.
The d88 file format is well documented online, but it essentially consists of the raw data from each sector of the floppy disk, each with a header identifying the sector, all preceded by a file header describing the properties of the disk and a table of pointers to the start of each track. The PC-88 loads data into RAM by addressing specific track/sector coordinates, so that's another thing to look out for alongside memory address pointers when adjusting text. For dumping the text, I pretty much just had to go through it visually in a hex editor and mark down the memory locations of the text, as well as the sector headers so my extraction script could skip over them. I listed all the sections to be dumped as pairs of start/end addresses in a separate file, which my script loads along with the disk image to know what to ignore and what to output. This is a basic general-purpose technique, and there are existing tools available, such as Cartographer, which can do this along with various additional features like formatting the output text and tracking memory pointers. I just roll my own because I'm a weirdo.
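As a rough illustration of that kind of extraction, here is a minimal Python sketch. It is not the script described above. It assumes the commonly published d88 layout (a 0x2B0-byte file header with 164 little-endian 32-bit track offsets starting at 0x20, and a 16-byte header before each sector whose data size sits at offset 0x0E), and it assumes sectors appear in the file in ascending order. The range-file format (one "START END" pair of hex file offsets per line, end inclusive) and the names iter_sectors, read_range and parse_ranges are made up for the example. Walking the sector headers this way means they do not have to be listed by hand.

```python
import struct

TRACK_TABLE_OFFSET = 0x20    # assumed start of the track pointer table
TRACK_TABLE_ENTRIES = 164    # assumed number of 32-bit track offsets
SECTOR_HEADER_SIZE = 0x10    # assumed size of each per-sector header


def iter_sectors(image):
    """Yield (cylinder, head, sector_id, data_offset, data_size) per sector."""
    table = struct.unpack_from("<%dI" % TRACK_TABLE_ENTRIES, image, TRACK_TABLE_OFFSET)
    for track_offset in table:
        # Skip unused slots and pointers that leave no room for a sector header.
        if track_offset == 0 or track_offset + SECTOR_HEADER_SIZE > len(image):
            continue
        pos = track_offset
        # The first sector header says how many sectors this track holds.
        count = struct.unpack_from("<H", image, pos + 4)[0]
        for _ in range(count):
            c, h, r = struct.unpack_from("<BBB", image, pos)
            size = struct.unpack_from("<H", image, pos + 0x0E)[0]
            yield c, h, r, pos + SECTOR_HEADER_SIZE, size
            pos += SECTOR_HEADER_SIZE + size


def read_range(image, start, end):
    """Return the bytes at file offsets start..end (inclusive), minus sector headers."""
    out = bytearray()
    for _c, _h, _r, data_off, size in iter_sectors(image):
        lo = max(start, data_off)
        hi = min(end + 1, data_off + size)
        if lo < hi:                      # this sector overlaps the requested range
            out += image[lo:hi]
    return bytes(out)


def parse_ranges(text):
    """Parse lines of 'START END' hex offsets; blank lines and # comments are skipped."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            start, end = (int(token, 16) for token in line.split())
            ranges.append((start, end))
    return ranges
```

With something like this, a dump is just a loop over parse_ranges(...) calling read_range on the loaded image for each pair.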
For War of the Dead, all the menu text and text triggered by events are hard-coded in with the "executable" portion of the data. Some data, such as item names, are addressed by a table of pointers to each entry. Others, like the character names, are stored in a single block, and the game just runs through it until it has passed a certain number of delimiter characters to know it is at the right one. The dialog text is similarly stored in one big contiguous chunk with delimiters, but with its own special method of formatting.
The text for speaking to characters is stored on the disk as a huge list of lines, each ending with the byte "0x0D". It is further divided into separate blocks for each "level" of the game, and each of these blocks is aligned to the start of a particular disk sector, with the end of the block padded with "0x00" until it hits the next sector. When you interact with a character in the game, it checks the current level you are at, then references a table of track/sector numbers. Starting at that sector on the disk, it then loads the next 0x1000 bytes into RAM at 0x4000, so the start of the current level's block is always at 0x4000, and any individual block cannot be longer than 0x1000 bytes. The game then goes through from the start of the block, counting the delimiters it reaches. Each block contains 4 lines for each character, including blank ones, in order, so the id number of the character times 4, plus the current line number, tells it how many delimiters to count. It then changes the ending 0x0D of that line to 0x00, reads back through to the start of the line, and then prints it on screen. The changing of the delimiting byte is important because of how the game handles special characters.
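The lookup can be mimicked off-target with a few lines of Python, which is handy for checking that an edited block still puts every line where the game expects it. This is only a sketch of the counting logic, not a transcription of the game's routine: the function name is made up, and it assumes that both the character id and the line number count from zero, which would need checking against the real data.

```python
LINES_PER_CHARACTER = 4   # each block holds 4 lines per character
DELIMITER = 0x0D          # every line ends with this byte
BLOCK_LIMIT = 0x1000      # the game loads at most this many bytes per block


def find_dialogue_line(block, character_id, line_number):
    """Return one dialogue line from a level block, without its 0x0D terminator.

    Assumes zero-based character ids and line numbers (an assumption of this
    sketch, not something confirmed from the game).
    """
    if len(block) > BLOCK_LIMIT:
        raise ValueError("block is longer than the 0x1000 bytes the game loads")
    index = character_id * LINES_PER_CHARACTER + line_number
    start = 0
    for _ in range(index):                  # step over the lines before the target
        start = block.index(DELIMITER, start) + 1
    end = block.index(DELIMITER, start)     # the byte the game overwrites with 0x00
    return block[start:end]
```

A ValueError from block.index here means the block ran out of delimiters before reaching the requested line, which is the same situation that would send the game reading into the padding.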
If you're wondering why I call the script divisions "levels", they all advance at the same part of the game where it increases your Max MF, so it could be considered a "level" in both the game progression and character progression sense.
When the game reads a character and is getting ready to print it to the screen, if the byte is lower than 0x20 it instead executes a subroutine indexed from a table of address pointers. Not every byte in that range is used; most just point to the same address, which does nothing. The ones that do have functions are as follows:
08 (18bb) backspace (clear)
0A (18ca) skipline (add textbox width to current position)
0B (1943) move cursor to start of textbox
0C (18cf) clear textbox and continue
0D (18c5) move cursor to start of current line (line separator)
1C (18ae) move cursor right unless at end of line
1D (18a4) backspace (no clear)
1E (188d) move cursor up 1 row (bugged, after calculating the new position, it re-reads the old one instead of writing it)
1F (1897) move cursor down 1 row
-- (1949) unused bytes point here
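When reading dumped text, it can help to make these bytes visible. The following Python sketch tags each control byte with a short label taken from the list above. The label strings and the function name are made up for the example, and decoding the game's actual character set is left out: anything outside printable ASCII is simply shown as a hex tag.

```python
# Control bytes with a function, per the list above; the labels are illustrative.
CONTROL_CODES = {
    0x08: "backspace-clear",
    0x0A: "skipline",
    0x0B: "box-start",
    0x0C: "clear-box",
    0x0D: "line-start",
    0x1C: "cursor-right",
    0x1D: "backspace",
    0x1E: "cursor-up",
    0x1F: "cursor-down",
}


def annotate(raw):
    """Render raw text bytes as a string with control bytes shown as tags."""
    parts = []
    for byte in raw:
        if byte < 0x20:
            name = CONTROL_CODES.get(byte)
            if name:
                parts.append("<%02X:%s>" % (byte, name))
            else:
                parts.append("<%02X>" % byte)      # unused control byte
        elif byte < 0x7F:
            parts.append(chr(byte))                # printable ASCII, shown as-is
        else:
            parts.append("<%02X>" % byte)          # left undecoded in this sketch
    return "".join(parts)
```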
You'll notice that there is no explicit new line character. The text autowraps, and Japanese doesn't have to worry about not splitting words across lines. Other text boxes in the game will use 0D 0A to move the cursor to the start of the line and then skipline to the start of the next, but since the character dialog uses 0D as the delimiter and replaces the end of the line with 00, that subroutine cannot be called directly. To get around that, I rewrote the subroutine for 1E, which was unused and bugged anyway, to function as a single-byte 0D 0A instead. I could then replace the space between two words with 1E wherever I needed to force a line break in the text.
cdca18 CALL 18CAH ;0A subroutine
cdc518 CALL 18C5H ;0D subroutine
c9 RET
0000 NOP 2 times
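Choosing where the 1E bytes go can be automated with a greedy word wrap. The Python sketch below is one way to do it and is not taken from the project's own tools. The box width is a parameter because the source does not state it, and the function name is made up. It also assumes the cursor only wraps when a character is printed past the last column. Whether a line that exactly fills the box has already wrapped by the time 1E is read is not settled here, so exact-fit lines are worth checking in an emulator.

```python
LINE_BREAK = 0x1E  # repurposed control byte that acts as 0D 0A in the patched game
SPACE = 0x20


def insert_line_breaks(text, box_width):
    """Greedy wrap: replace the space before a word that will not fit with 0x1E.

    `box_width` is the number of characters per text box line, supplied by
    the caller (an assumption of this sketch).
    """
    out = bytearray()
    column = 0
    for i, word in enumerate(text.split()):
        if len(word) > box_width:
            raise ValueError("word is wider than the text box: %r" % word)
        if i > 0:
            if column + 1 + len(word) <= box_width:
                out.append(SPACE)          # word fits on the current line
                column += 1
            else:
                out.append(LINE_BREAK)     # force the break in place of the space
                column = 0
        out += word
        column += len(word)
    return bytes(out)
```

For example, insert_line_breaks(b"the quick brown fox", 10) keeps "the quick" on the first line and swaps the following space for 1E.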
There were still some lines in English that were too long for a single text box, and the game did not have a way to address this. Text would autowrap back to the first line of the text box and display on top of the existing text. The 0C command does clear the box, but there are a few things it doesn't do. Firstly, there is no command to tell it to stop and wait for input before clearing and continuing. Then, there is the fact that the lines in the game are double-spaced, so the cursor moves down an extra line when it autowraps. That was an issue I ran into when trying to manually write the new line subroutine, before realizing how the game was already handling it. So using 0C to clear the box leaves the text right up against the top of the box instead of spaced down properly. Lastly, there is no command to print the character name; it is built into the text box routine just before it grabs the dialogue. Thankfully, I could duplicate the code for that to solve the second and third problems at the same time.
cd267a CALL 7A26H ;call to end of text routine where it waits for input to close
cdcf18 CALL 18CFH ;0C subroutine
3e01 LD A,1
328116 LD (1681H),A
3a3000 LD A,(30H)
5f LD E,A
21fd7c LD HL,7CFDH
cd677c CALL 7C49H ;reprint the character nametag
c9 RET
This ended up being larger than I expected because I had to duplicate setting up the call to write the nametag instead of just jumping to it, so instead of replacing an existing special character, I had to find somewhere else in memory to stick it and point one of the unused commands to it. Since the block of character names was smaller in English, I put the code in the leftover space at the end of it and set the pointer for 1B to go there. This worked well because that data was in with the code for the text boxes, so I could be sure it would always be loaded in memory whenever I needed to use it.
Most of the non-dialogue text was simple enough that it was easiest to just insert it by hand in the hex editor. The strings either fit in the original memory allotted or could be easily rewritten to do so, and there were only a handful of pointers that needed to be manually adjusted. For the dialogue text, not only was the English text much larger, but the sector-aligned formatting also meant recalculating how many sectors were needed and padding the end of each block, wasting even more space. Only one block went over the 0x1000 byte hard limit and had to be edited down, but in the end the whole text required 15 more disk sectors than the original. To make matters worse, while I could adjust the pointers to the start of each block, the data within the block had to remain contiguous, so I could not split a block across multiple areas of the disk. Keeping the scripts in the original order, there was not enough room for the last two blocks of the script, and the disk was otherwise too full to put them anywhere. I investigated the possibility of expanding the disk and adding more tracks, since some later-model PC-88s supported 2HD floppy drives that had a higher capacity, but in testing I found that only one out of the three emulators I tried would read the extra tracks, and I didn't hold out much hope that real hardware would fare any better. I was worried that I would have to add some sort of compression to the data loading, adding to the load times when the game already takes a second or two just to bring up the text boxes, but thankfully I found a couple of tracks on the second disk that were all 0xFFs. Crossing my fingers that those were truly blank and not some sort of copy protection, I moved the last two blocks to the second disk. Since there was already one block of dialogue on the second disk for the ending cutscene, it was trivial to adjust the dialogue loading routine to look at the second disk if the level is 09 or higher, instead of just 0B.
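The sector arithmetic for one block is simple enough to sketch in Python. The sector size is not stated above, so it is a parameter here, with 256 bytes as an assumed default that should be checked against the sector headers in the actual image. The function name is made up for the example.

```python
SECTOR_SIZE = 256      # assumption: bytes per sector; check the image's sector headers
BLOCK_LIMIT = 0x1000   # the game loads at most this many bytes per dialogue block


def pad_block(block, sector_size=SECTOR_SIZE):
    """Pad a dialogue block with 0x00 up to the next sector boundary.

    Returns (padded_block, sectors_used). Raises ValueError if the block is
    over the 0x1000-byte limit and needs to be edited down first.
    """
    if len(block) > BLOCK_LIMIT:
        raise ValueError("block is %d bytes over the limit" % (len(block) - BLOCK_LIMIT))
    sectors = -(-len(block) // sector_size)          # ceiling division
    padding = sectors * sector_size - len(block)
    return block + bytes(padding), sectors
```

Summing sectors_used over every block, and comparing against the original layout, gives the extra space the translated script needs before anything is written to the image.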
The scrolling text for the opening and ending are both stored on disk two and are similar in format but addressed differently. Both are padded with spaces at the start of each line to manually center the text on screen, and both have each line delimited with 0D 0A. For the opening text, each line is accessed via a pointer table, and it stops at 0D, so the 0A byte is not actually needed. The ending text, on the other hand, is read out as a single long textbox including the credits, so no adjustment is needed.
Locations in memory for various data
be00 current script?
be01 current map
be02 player x position
be03 player y position
be04 player facing direction
be05 player current hp
be71-be81 item counts in inventory
be89-be90 weapon acquired flags
be91-be98 PS charged weapon ammo
be99 current equipped weapon?
bfd3,d4,... flags for characters' current dialogue line
By using a debug emulator and watching which of these flags are accessed when talking to a character, I could see which other character's flag I needed to trigger before the speaking character would move on to their next line.