Analysing unknown ROMs

To get to a point where you can extract modules from a ROM, you need to identify how the ROM is put together. Basically, there’s a ROMHEADER (usually included as the first part of exec.library), then the libraries, and then the end-of-ROM checksum/size/vectors (have a look at Kickstart structure for more info).
Some of these parts are in standard places and hold information about the rest of the ROM (such as the size and checksum), but we are most interested in the libraries themselves. Luckily, every library has a standard chunk of data which describes it; this is used to “load” the library. (Obviously Kickstart is always “loaded”, as it’s a physical ROM, but the LIBHEADER structure contains the fixups and offsets for the library functions, showing how the library is to be used/initialised during boot.)

Example LIBHEADER from exec v1.2

FC00B6 4AFC RTC_MATCHWORD (start of ROMTAG marker)
FC00B8 00FC00B6 RT_MATCHTAG (pointer to RTC_MATCHWORD)
FC00BC 00FC323A RT_ENDSKIP (pointer to end of code)
FC00C0 00 RT_FLAGS (no flags)
FC00C1 21 RT_VERSION (version number)
FC00C2 09 RT_TYPE (NT_LIBRARY)
FC00C3 78 RT_PRI (priority = 120)
FC00C4 00FC00A8 RT_NAME (pointer to name)
FC00C8 00FC0018 RT_IDSTRING (pointer to ID string)
FC00CC 00FC00D2 RT_INIT (execution address)
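The LIBHEADER above is the standard 26-byte ROMTAG layout, stored big-endian as on the 68000. The sketch below decodes it with Python’s standard `struct` module. It assumes nothing beyond the field order shown. The names `RomTag` and `parse_romtag` are hypothetical and are not part of Cap. Note that RT_PRI is a signed byte.

```python
import struct
from typing import NamedTuple


class RomTag(NamedTuple):
    """Fields of one ROMTAG (LIBHEADER), in on-ROM order (hypothetical helper type)."""
    matchword: int  # RTC_MATCHWORD, always 0x4AFC
    matchtag: int   # RT_MATCHTAG, the address of the matchword itself
    endskip: int    # RT_ENDSKIP
    flags: int      # RT_FLAGS
    version: int    # RT_VERSION
    node_type: int  # RT_TYPE (9 = NT_LIBRARY)
    pri: int        # RT_PRI, signed
    name: int       # RT_NAME (pointer)
    idstring: int   # RT_IDSTRING (pointer)
    init: int       # RT_INIT (pointer)


# Big-endian, no padding: word, 2 longs, 3 unsigned bytes, 1 signed byte, 3 longs.
ROMTAG_FORMAT = ">HIIBBBbIII"
ROMTAG_SIZE = struct.calcsize(ROMTAG_FORMAT)  # 26 bytes


def parse_romtag(data: bytes, offset: int = 0) -> RomTag:
    """Decode the ROMTAG that starts at `offset` within `data`."""
    return RomTag(*struct.unpack_from(ROMTAG_FORMAT, data, offset))


# The exec v1.2 example above, byte for byte.
exec_tag = parse_romtag(bytes.fromhex(
    "4AFC 00FC00B6 00FC323A 00 21 09 78 00FC00A8 00FC0018 00FC00D2"))
print(exec_tag.version, exec_tag.pri, hex(exec_tag.init))  # prints: 33 120 0xfc00d2
```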

Identifying libraries

So, first things first: to find each library, find the RTC_MATCHWORD (the two bytes 0x4AFC). The next four bytes must then be the Kickstart address of that RTC_MATCHWORD. Together, these six bytes give confidence that this is a library, because it’s very unlikely that six bytes in a row would happen to be exactly right by chance. For a start, 0x4AFC is an invalid opcode (the 68000 ILLEGAL instruction). On top of that, the odds of the next four bytes randomly holding exactly the right address are billions to one.
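The six-byte check can be sketched as a simple scan. This is a minimal illustration, not Cap’s implementation, and it makes two assumptions:

* ROMTAGs sit on even (word-aligned) offsets, as 68000 word data must.
* The caller knows the address the image is mapped at.

The function name `find_romtags` is hypothetical.

```python
def find_romtags(rom: bytes, rom_base: int) -> list[int]:
    """Return the addresses of every RTC_MATCHWORD whose RT_MATCHTAG points back at itself.

    rom      -- raw ROM image
    rom_base -- address the image is mapped at (e.g. 0xF80000 or 0xFC0000)
    """
    found = []
    # Assumption: ROMTAGs are word aligned, so only even offsets are checked.
    for off in range(0, len(rom) - 5, 2):
        if rom[off] != 0x4A or rom[off + 1] != 0xFC:
            continue
        matchtag = int.from_bytes(rom[off + 2:off + 6], "big")
        if matchtag == rom_base + off:  # the six-byte self-reference check
            found.append(rom_base + off)
    return found
```

Running it over an image mapped at 0xFC0000 gives a list of candidate ROMTAG addresses. Each of these can then be decoded and examined in turn.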

Position, size, start, end

The next challenge is to find the start and end of the library. The start could be (and often is) before the RTC_MATCHWORD, and the end is (or should be) no earlier than the address given by RT_ENDSKIP. If you’re really lucky, the library will start exactly where the previous library’s RT_ENDSKIP points, and its own RT_ENDSKIP will point exactly at the next library’s RTC_MATCHWORD. Unfortunately, this is not usually the case.
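Those rules bound each library to a search window. The start lies between the previous RT_ENDSKIP and the library’s own RTC_MATCHWORD. The end lies between its own RT_ENDSKIP and the next RTC_MATCHWORD. The sketch below assumes three things:

* Libraries never overlap.
* Every RT_ENDSKIP is valid (see Issues below for ROMs where it isn’t).
* `rom_end` is the first address past the last library.

`Window`, `boundary_windows` and `is_tightly_packed` are hypothetical names.

```python
from typing import NamedTuple


class Window(NamedTuple):
    start_min: int  # the library can't start before the previous RT_ENDSKIP
    start_max: int  # ...or after its own RTC_MATCHWORD
    end_min: int    # it can't end before its own RT_ENDSKIP
    end_max: int    # ...or after the next RTC_MATCHWORD


def boundary_windows(tags: list[tuple[int, int]], rom_start: int, rom_end: int) -> list[Window]:
    """tags: (matchword address, RT_ENDSKIP) pairs sorted by address.
    rom_start/rom_end: first address, and first address past the end, that can hold libraries."""
    windows = []
    for i, (matchword, endskip) in enumerate(tags):
        start_min = tags[i - 1][1] if i > 0 else rom_start
        end_max = tags[i + 1][0] if i + 1 < len(tags) else rom_end
        windows.append(Window(start_min, matchword, endskip, end_max))
    return windows


def is_tightly_packed(w: Window) -> bool:
    """The lucky case: both the start and the end are pinned down exactly."""
    return w.start_min == w.start_max and w.end_min == w.end_max
```

Any window that isn’t tightly packed still needs the disassembler, or another ROM, to narrow it down.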

There are a few ways to find the start and end of a library. A disassembler will definitely help here. The inbuilt one is pretty basic and incomplete, but it’s functional, and I use it when I want to find out which code belongs to a library (see Missassembler). Having access to multiple Kickstarts of similar or identical revisions really, really helps, because an unknown library may have a known start or a known end in a different ROM version. For example, if the start is known in one ROM and the end is known in another, then you know both. The more libraries you can identify, the more start/end points you can find for the others.
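Combining partial knowledge from two ROMs works because a given library build keeps the same layout relative to its RTC_MATCHWORD, wherever it is placed. The sketch below records what is known as offsets from the matchword and merges them. It also refuses to merge offsets that disagree, since that suggests two different builds. `KnownOffsets`, `merge` and `absolute_bounds` are hypothetical helpers.

```python
from typing import NamedTuple, Optional


class KnownOffsets(NamedTuple):
    """What is known about one library build, relative to its RTC_MATCHWORD."""
    before: Optional[int]  # bytes from library start to RTC_MATCHWORD
    after: Optional[int]   # bytes from RTC_MATCHWORD to library end


def merge(a: KnownOffsets, b: KnownOffsets) -> KnownOffsets:
    """Combine findings for the same library build seen in two different ROMs."""
    def pick(x: Optional[int], y: Optional[int]) -> Optional[int]:
        if x is not None and y is not None and x != y:
            raise ValueError("offsets disagree: probably not the same build")
        return x if x is not None else y
    return KnownOffsets(pick(a.before, b.before), pick(a.after, b.after))


def absolute_bounds(k: KnownOffsets, matchword_addr: int) -> tuple[int, int]:
    """Turn fully known offsets into start/end addresses in a particular ROM."""
    if k.before is None or k.after is None:
        raise ValueError("start or end still unknown")
    return matchword_addr - k.before, matchword_addr + k.after
```

For example, merging a start found in one ROM with an end found in another gives both offsets. These can then be applied to any ROM containing the same build.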

RELOCs

Once you know the start and end of a library, the final step is to identify the “RELOCs”. These are the absolute addresses in the library which change depending on where in ROM it is placed, so they are vital to know if you want to extract a library and place it somewhere else in the ROM. For example, the LIBHEADER always contains five absolute addresses: RT_MATCHTAG, RT_ENDSKIP, RT_NAME, RT_IDSTRING and RT_INIT. Five is therefore the minimum number of RELOCs in a library, and there could be thousands (although that’s rare). Each of these addresses needs to be RELOCated when you load a library into a ROM.
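RELOCating is just adding the distance moved to every absolute address. The sketch below does this in place on a library image. It assumes the RELOCs are given as offsets from the library start. `relocate` is a hypothetical name. The example moves the exec v1.2 LIBHEADER from the table above, whose five pointers sit at offsets 2, 6, 14, 18 and 22 of the 26-byte ROMTAG.

```python
def relocate(lib: bytearray, relocs: list[int], old_start: int, new_start: int) -> None:
    """Move a library image from old_start to new_start by patching every RELOC in place.

    relocs -- offsets (from the library start) of each four-byte absolute address
    """
    delta = new_start - old_start
    for off in relocs:
        addr = int.from_bytes(lib[off:off + 4], "big")
        # Wrap to 32 bits, as 68000 address arithmetic would.
        lib[off:off + 4] = ((addr + delta) & 0xFFFFFFFF).to_bytes(4, "big")


# Move the exec v1.2 LIBHEADER from 0xFC00B6 to 0xF800B6.
tag = bytearray(bytes.fromhex(
    "4AFC 00FC00B6 00FC323A 00 21 09 78 00FC00A8 00FC0018 00FC00D2"))
relocate(tag, [2, 6, 14, 18, 22], 0xFC00B6, 0xF800B6)
print(tag.hex().upper())  # prints: 4AFC00F800B600F8323A0021097800F800A800F8001800F800D2
```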

There are two basic ways of finding the RELOCs. The first, and most laborious, is to find every four-byte number that could be an address within the library’s start/end range. You then check whether each one is genuine or just random noise (again, the disassembler is very useful here). The second way, which I have found most useful, is to compare two copies of the same library loaded at different addresses (i.e. in different places in two ROMs). If the same library is in two ROMs at different locations, then the differences must be RELOCs. You can verify this by checking that each difference points to the same relative offset within the library. For example, if both copies have a four-byte address pointing 100 bytes past their RTC_MATCHWORD, then that must be a RELOC.
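Both approaches can be sketched in a few lines. The sketch assumes RELOCs are word aligned and that each library image is passed in with the address of its first byte. The target of a RELOC may be the end address itself, as with RT_ENDSKIP. When two copies are compared, a differing byte must fall inside some four-byte address. Each aligned long that covers it is therefore tested for “same relative target in both copies”. The function names are hypothetical.

```python
def candidate_relocs(lib: bytes, start: int) -> list[int]:
    """Method 1: every even offset holding a long that points inside the library.
    Many hits can be random noise and still need checking by hand."""
    end = start + len(lib)
    return [off for off in range(0, len(lib) - 3, 2)
            if start <= int.from_bytes(lib[off:off + 4], "big") <= end]


def relocs_by_diff(lib_a: bytes, start_a: int, lib_b: bytes, start_b: int) -> list[int]:
    """Method 2: compare the same library loaded at two different addresses.
    A long is a RELOC if, in both copies, it points to the same relative offset."""
    if len(lib_a) != len(lib_b) or start_a == start_b:
        raise ValueError("need two equal-sized copies at different addresses")
    relocs = set()
    for i in range(len(lib_a)):
        if lib_a[i] == lib_b[i]:
            continue
        # Try every even-aligned long that covers the differing byte.
        for off in range(max(0, i - 3), min(i, len(lib_a) - 4) + 1):
            if off % 2:
                continue
            rel_a = int.from_bytes(lib_a[off:off + 4], "big") - start_a
            rel_b = int.from_bytes(lib_b[off:off + 4], "big") - start_b
            if rel_a == rel_b and 0 <= rel_a <= len(lib_a):
                relocs.add(off)
    return sorted(relocs)
```

The diff method only flags bytes that actually changed. It therefore avoids most of the noise that the first method produces.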

Known libraries

Because I have a large collection of ROMs, I have been able to use multiple examples to identify the libraries. Sometimes I even have the same ROM built to load at different addresses (e.g. one may be a physical 0xF80000 ROM and another may “softload” to 0x200000); these are ideal for identifying RELOCs. For every known library, I create a hash file (see Hash-files). Each hash file contains several critical bits of information:

* Size=: the number of bytes from start to end.
* ROMTAG=: the offset from the start of the library to the RTC_MATCHWORD.

From these you can identify known libraries in a ROM. However, Cap uses several cross-checks to make sure that it has correctly identified a library. For example, every RELOC is listed in the hash file. To be 100% sure, every hash file name also starts with the CRC32 hash value of the extracted library.
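As an illustration of how the hash-file fields might be used, the sketch below does two things. It pulls the CRC32 (plus name, version and date) out of a hash file name, and it turns Size= and ROMTAG= into start/end addresses for a ROMTAG found in a ROM. It assumes every file name follows the pattern of the example given below, `0xddf45370.gadtools_39.361_(11.12.92)`. `HashFileName`, `parse_hash_filename` and `library_bounds` are hypothetical.

```python
import re
from typing import NamedTuple


class HashFileName(NamedTuple):
    crc: int      # CRC32 of the extracted library
    name: str
    version: str
    date: str


# Assumption: names look like "0xddf45370.gadtools_39.361_(11.12.92)".
HASH_NAME = re.compile(r"0x([0-9a-fA-F]{8})\.(.+)_(\d+\.\d+)_\(([^)]*)\)")


def parse_hash_filename(filename: str) -> HashFileName:
    m = HASH_NAME.fullmatch(filename)
    if m is None:
        raise ValueError(f"unrecognised hash file name: {filename!r}")
    return HashFileName(int(m.group(1), 16), m.group(2), m.group(3), m.group(4))


def library_bounds(matchword_addr: int, romtag: int, size: int) -> tuple[int, int]:
    """ROMTAG= is the matchword's offset from the library start; Size= is start to end."""
    start = matchword_addr - romtag
    return start, start + size
```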

Putting this all together, the analyserom() function uses the hash files to identify known libraries:

  1. Scan the ROM for 0x4AFC
  2. Verify the next four bytes are a valid RT_MATCHTAG
  3. See if we have any hash files for that particular library name
  4. For each hash file with a matching name (e.g. 0xddf45370.gadtools_39.361_(11.12.92)), check whether the listed RELOCs appear valid
  5. If the RELOCs are valid, calculate the CRC32 hash of the extracted library (e.g. 0xddf45370)
  6. If this CRC32 matches the one on the filename then we have correctly identified the library
  7. Repeat for the whole ROM
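The seven steps can be sketched as a single loop. This is not Cap’s analyserom(), and three parts of it are assumptions:

* Hash files are already loaded into a hypothetical `HashEntry` per file, keyed by the name string at RT_NAME.
* A RELOC counts as “valid” if it points inside the library.
* The CRC32 is taken after rewriting every RELOC as an offset from the library start, which makes the hash independent of where the library sits. Cap’s actual extracted format is not described here.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class HashEntry:
    """Hypothetical in-memory form of one hash file."""
    crc: int           # CRC32 from the file name
    name: str          # library name as it appears at RT_NAME
    size: int          # Size=
    romtag: int        # ROMTAG=, matchword offset from library start
    relocs: list[int] = field(default_factory=list)  # RELOC offsets from library start


def read_cstring(rom: bytes, rom_base: int, addr: int) -> str:
    """Read the NUL-terminated string at a ROM address ('' if out of range)."""
    i = addr - rom_base
    if not 0 <= i < len(rom):
        return ""
    end = rom.find(b"\x00", i)
    return rom[i:end if end != -1 else len(rom)].decode("latin-1")


def analyse_rom(rom: bytes, rom_base: int,
                entries: dict[str, list[HashEntry]]) -> list[tuple[int, HashEntry]]:
    found = []
    for off in range(0, len(rom) - 25, 2):                  # 1. and 7. scan the whole ROM
        if rom[off:off + 2] != b"\x4a\xfc":
            continue
        matchword = rom_base + off
        if int.from_bytes(rom[off + 2:off + 6], "big") != matchword:
            continue                                         # 2. RT_MATCHTAG check
        name = read_cstring(rom, rom_base, int.from_bytes(rom[off + 14:off + 18], "big"))
        for entry in entries.get(name, []):                  # 3. hash files with this name
            start = matchword - entry.romtag
            lo = start - rom_base
            if lo < 0 or lo + entry.size > len(rom):
                continue
            image = bytearray(rom[lo:lo + entry.size])
            valid = True
            for r in entry.relocs:                           # 4. do the RELOCs look valid?
                addr = int.from_bytes(image[r:r + 4], "big")
                if not start <= addr <= start + entry.size:
                    valid = False
                    break
                # Assumption: hash the library with RELOCs made relative to its start.
                image[r:r + 4] = (addr - start).to_bytes(4, "big")
            if valid and zlib.crc32(image) == entry.crc:     # 5. and 6. CRC32 match
                found.append((matchword, entry))
                break
    return found
```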

There aren’t that many known libraries: a little over two thousand, from only about 200 different ROMs.

Issues

Loadable library files (such as those distributed by Hyperion) may have their RELOCs out of order. This means that the CRC32 hash of the file will differ from what Cap expects, so all loadable libraries need to be loaded and saved in a consistent format before they can be used for analysing ROMs.

Some ROMs use non-standard or invalid RT_ENDSKIP values in their LIBHEADERs, pointing to addresses outside the start/end addresses of the ROM (the latest AROS ROMs and some Cloanto 3.X ROMs, for example), probably to shave a few milliseconds off boot times. This is really tedious when you want to analyse a ROM!

The early dos.library (KS <= 1.3) contains both normal RELOCs and BCPL RELOCs. No loadable ELF library supports BCPL RELOCs, so they need to be patched individually and manually if you want to save/load/relocate the DOS library. They work the same way as a normal RELOC, except that the stored value is a BCPL pointer: the byte address divided by four, so the numbers are a quarter of the size.
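Patching a BCPL RELOC is the same addition as a normal RELOC, done in longword units. The stored value is the byte address shifted right by two, so the distance moved must also be divided by four (and must be a multiple of four). The sketch below assumes the BCPL RELOC offsets are kept in their own list, separate from the normal RELOCs. `relocate_bcpl` is a hypothetical name.

```python
def relocate_bcpl(lib: bytearray, bcpl_relocs: list[int], old_start: int, new_start: int) -> None:
    """Patch BCPL RELOCs, which hold BCPL pointers (byte address / 4), not byte addresses."""
    delta = new_start - old_start
    if delta % 4:
        raise ValueError("BCPL pointers can only move by a multiple of four bytes")
    for off in bcpl_relocs:
        bptr = int.from_bytes(lib[off:off + 4], "big")
        lib[off:off + 4] = ((bptr + delta // 4) & 0xFFFFFFFF).to_bytes(4, "big")
```

For instance, a BCPL pointer to 0xFC0100 is stored as 0x3F0040. After the library moves down by 0x40000 bytes, the pointer becomes 0x3E0040, which is 0xF80100 divided by four.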
