Notes on reverse engineering Petz and contributing to PetzHeaders

Warning: this is not going to make any sense if you don't understand basic reverse engineering/programming/C++ concepts like pointers, memory allocation, virtual functions etc. This is not a beginner tutorial. Also I'm no expert in reverse engineering so if any of this is stupid, tell me.

To write PetzHeaders, these were the 3 most important things:

RTTI

Run-time type information is used in dynamic casts (i.e. taking a base pointer and casting it to a derived pointer at runtime). This encodes the inheritance hierarchy of classes, which is extremely useful when it comes to figuring out the basic class structure.
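To make that concrete, here's a minimal sketch of the kind of cast RTTI exists to support. The class names are made up for illustration and are not the real Petz classes.

```cpp
#include <cassert>

// Hypothetical classes for illustration only; not the real Petz types.
struct Sprite {
    virtual ~Sprite() = default;  // at least one virtual function is needed for RTTI
};
struct ToySprite : Sprite {};
struct PetSprite : Sprite {};

// Returns true if the object behind the base pointer is really a ToySprite.
// dynamic_cast answers this at runtime by consulting the RTTI records.
bool IsToy(Sprite* s) {
    return dynamic_cast<ToySprite*>(s) != nullptr;
}

// Small usage example.
void DemoIsToy() {
    ToySprite toy;
    PetSprite pet;
    assert(IsToy(&toy));
    assert(!IsToy(&pet));
}
```

The records that make this cast work are the same ones a tool like Cutter reads to show you the hierarchy.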

For some reason Ghidra completely fails to decipher Petz' RTTI. IDA can do it with ClassInformer but that doesn't work in the latest free version. I ended up using Cutter to look at it. It's not the most user-friendly interface but it works. Things get more confusing when multiple/virtual inheritance enters the mix. With multiple inheritance you'll see multiple entries per class in Cutter.

Virtual function tables

When a virtual function is called, it basically looks like call dword ptr [eax+4]. This means the order of the entries within each virtual function table is extremely important - if the function at position 1 of the vftable isn't what the calling code is expecting, everything will blow up.
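Here's a rough hand-rolled model of why the slot order matters. It isn't how you'd write real code; it just mimics what the compiler generates, using a plain array of function pointers. All names are hypothetical.

```cpp
#include <cassert>

// Hand-rolled model of a vftable: a plain array of function pointers.
struct Object;
using Method = int (*)(Object*);

struct Object {
    const Method* vftable;  // first field, like the hidden vftable pointer in a real object
};

int GetKind(Object*)  { return 10; }  // intended for slot 0
int GetValue(Object*) { return 20; }  // intended for slot 1

const Method kCorrectTable[] = { GetKind, GetValue };
const Method kSwappedTable[] = { GetValue, GetKind };  // same functions, wrong order

// Rough equivalent of `call dword ptr [eax+4]` on 32-bit x86:
// fetch the entry at slot 1 and call it, with no check of what is there.
int CallSlot1(Object* obj) {
    return obj->vftable[1](obj);
}

// Small usage example.
void DemoSlots() {
    Object good{kCorrectTable};
    Object bad{kSwappedTable};
    assert(CallSlot1(&good) == 20);  // caller gets GetValue, as expected
    assert(CallSlot1(&bad) == 10);   // caller silently gets GetKind instead
}
```

In this toy version the wrong slot just returns the wrong number. In the real game the signatures usually differ too, which is where the crashes come from.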

You can view the vftables directly in IDA. Most of them are exported from Petz 4, or you can go directly to the GetSprite exported function in a DLL and double-click on the vftables to view them.

A very useful IDA tip: go to Options > Demangled Names > Setup Short Names to increase the amount of info you can see. You probably want to enable at least return type/static, virtual, show const.

Virtual function tables are mostly generated in order of definition. I don't know if there's a good way to export them from IDA, but I just copied them and used a regex to turn them into function declarations.
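For what it's worth, the regex step could look something like the sketch below. The input format is an assumption on my part (a demangled line of the form "public: virtual <return type> __thiscall <Class>::<Name>(<params>)" with an optional trailing const), so adjust the pattern to whatever your copy-paste from IDA actually looks like. ToDeclaration is a made-up helper name.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Turns one demangled vftable entry into a declaration you can paste into a header.
// ASSUMED input format (check it against your own IDA output):
//   public: virtual void __thiscall SomeClass::DoSomething(int,int)
// Returns an empty string if the line doesn't match that format.
std::string ToDeclaration(const std::string& line) {
    // Groups: 1 = return type, 2 = method name, 3 = parameter list, 4 = optional const
    static const std::regex pattern(
        R"(^\s*(?:public|protected|private): virtual (.+?) __thiscall \w+::(\w+)\((.*)\)\s*(const)?\s*$)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) {
        return "";
    }
    std::string decl = "virtual " + m[1].str() + " " + m[2].str() + "(" + m[3].str() + ")";
    if (m[4].matched) {
        decl += " const";
    }
    return decl + ";";
}

// Small usage example.
void DemoToDeclaration() {
    assert(ToDeclaration("public: virtual void __thiscall Sprite_Bowl::DoSomething(int,int)")
           == "virtual void DoSomething(int,int);");
    assert(ToDeclaration("not a vftable entry").empty());
}
```

Feed it the copied lines in vftable order and you get declarations in the order they need to appear in the class.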

One big thing to watch out for: overloaded functions are grouped together no matter what order they're defined in, and within the group they appear in reverse declaration order. So if you declare virtual void DoSomething(int) and then 5 entries later you declare virtual void DoSomething(int, int), your vftable entries will be in this order: virtual void DoSomething(int, int); virtual void DoSomething(int).
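A small sketch of that rule, using a made-up class. The slot order in the comment is what the rule above predicts for the compiler Petz was built with; I'm assuming the group lands where the first overload was declared. Other compilers may lay the table out differently, so always check the result in IDA.

```cpp
#include <cassert>

// Hypothetical class; only the declaration order matters here.
struct Example {
    virtual int First()               { return 0; }
    virtual int DoSomething(int)      { return 1; }  // first overload declared
    virtual int Middle()              { return 2; }
    virtual int DoSomething(int, int) { return 3; }  // second overload, declared later
};

// Predicted vftable order under the grouping rule:
//   [0] First()
//   [1] DoSomething(int, int)   <- overloads grouped, reverse declaration order
//   [2] DoSomething(int)
//   [3] Middle()
// So to reproduce a table you see in IDA, you may need to declare
// overloads in the opposite order to how they appear in the table.

// Small usage example: ordinary calls resolve by signature as usual.
void DemoOverloads() {
    Example e;
    assert(e.DoSomething(1) == 1);
    assert(e.DoSomething(1, 2) == 3);
}
```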

Beware that IDA gets confused about empty functions, e.g. it reports Area::GetIsIndoor as XBallzData::AmIChild. Multiple empty functions point to one location so it has no way to know which one is the most relevant. One way to deal with this is opening a toy/breed/area so that there are fewer functions to consider. Another is to see which other empty functions share that location and pick the obviously correct one if there is one.

If you get piles of unresolved external errors from the linker when building, your function signature doesn't match the exported one exactly. Make sure you represent all consts.
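The reason a missing const matters is that it's part of the function's signature, so the const and non-const versions are two different functions with two different exported names. A tiny sketch with a made-up class shows that both can even coexist:

```cpp
#include <cassert>

// Hypothetical class: the two Get() members differ only by const,
// yet they are distinct functions with distinct mangled names.
struct Example {
    int Get()       { return 1; }
    int Get() const { return 2; }
};

// Small usage example.
void DemoConst() {
    Example e;
    const Example& c = e;
    assert(e.Get() == 1);  // non-const object picks the non-const member
    assert(c.Get() == 2);  // const reference picks the const member
}
```

So if the game exports the const version and your header declares the non-const one, the linker goes looking for a function that doesn't exist.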

Finding class sizes

Find a calloc for the class (hierarchy) you're interested in. For example, in GetSprite you can see callocs for complete pets/toys and in some genome/chromosome functions you can see callocs for alleles. The callocs are hidden behind PetzNew or unnamed functions but you can find them pretty easily.

This tells you the total object size, but if you have a class hierarchy involved then you have to figure out what bits go where. There's some virtual inheritance high up in the Petz hierarchy too, which further complicates things. Basically you can figure it out via the placement of the vftables and vbtables, and by the jumps made when constructing base classes.

Remember that you don't have to individually define each variable, unless you intend to use them in your own code. You just need to allocate the right amount of memory (or more) and get the vftables/vbtables in the right place. You don't even have to get the placement of variables right as long as the memory structure is correct, i.e. if you have Base > Derived > Derived2, you can get away with putting all the variables in Derived2 if that's the only one you're ever going to care about.
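As a sketch of the "put everything in Derived2" trick: the hierarchy below is made up, and 0x40 is an arbitrary stand-in for however many bytes of data the real hierarchy holds. The size check assumes the usual single-inheritance layout (vftable pointer first, then data), which is what common compilers produce.

```cpp
#include <cstddef>

// Hypothetical three-level hierarchy; not the real Petz classes.
struct Base {
    virtual void Update() {}  // gives the object its vftable pointer
};

struct Derived : Base {};     // real members deliberately left out

struct Derived2 : Derived {
    // One opaque blob standing in for every data member of
    // Base, Derived and Derived2 combined.
    unsigned char storage[0x40];
};

// Total size = vftable pointer + the blob, same as if the members
// had been split correctly across the three classes.
static_assert(sizeof(Derived2) == sizeof(void*) + 0x40,
              "unexpected layout: check compiler and alignment");
```

This only holds up as long as nothing in your own code needs the members through Base or Derived directly.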

Finding the purpose of variables is harder. No shortcuts here, you just need to inspect the code in IDA/Ghidra/your favourite disassembler, or in a debugger, and figure it out. Big hints can be found in the various StreamOut functions if you already know the format of the written data (e.g. lnz is well understood, so you can tell the function of lots of variables in the Linez class by when they are written out).

One thing I'll flag up is that there's a lot of use of a custom vector class in the code, e.g. a genome is two pfvectors of chromosomes. The size of these is always 12 (4 for a ptr, 4 for size, 4 for capacity).
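If you just need to reserve the right space for one of these vectors, a stand-in like the one below is enough. The struct and field names are my own, the field order simply follows the list above, and the pointer is stored as a 32-bit integer so the size stays 12 even if you compile the snippet on a 64-bit machine.

```cpp
#include <cstdint>

// Hypothetical stand-in for the game's vector layout (32-bit process).
struct PfVectorLayout {
    std::uint32_t data;      // 4 bytes: address of the element array
    std::uint32_t size;      // 4 bytes: number of elements in use
    std::uint32_t capacity;  // 4 bytes: number of elements allocated
};

static_assert(sizeof(PfVectorLayout) == 12, "vector stand-in must be 12 bytes");
```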

Also remember that the calloc includes space for the vftable and vbtable pointers stored inside the object.

A practical example

Let's say you want to create a new food bowl toy. Imagine that the relevant header in PetzHeaders is not already there, i.e. there is no Sprite_Bowl in PetzHeaders.

First, determine what class you need to inherit from. This can sometimes be done by guessing (look through Petz 4.exe in IDA/Ghidra/whatever - in this case it's pretty obvious that we should inherit from Sprite_Bowl). Otherwise, use Cutter on an existing resource to determine the correct class hierarchy.

(If you find out that you need to implement a whole class hierarchy that doesn't exist in PetzHeaders, get ready for a long job!)

Set up the class(es) in PetzHeaders. Remember to mark the whole class as __declspec(dllimport).

Next, determine the class size. Open an existing bowl and look at the GetSprite function.

PetzNew is a memory allocation function. You can tell that the total size of the toy (in this case, the Plate of Leftovers) is 0x3C9C.

A ToySprite's size is 0x3C7C. You can determine this by looking at the GetSprite of a toy which has no additional state, like a brush.

So you know that Sprite_Bowl must have 0x3C9C - 0x3C7C = 0x20 of additional variables. We probably don't care what these variables are, we just want to get the class the right size. So stuff 32 bytes of memory in there, e.g. int vars[8];. (In practice you'll see that PetzHeaders defines int vars[7] and int foodAmount in Sprite_Bowl, because we did end up caring about one of the variables!)
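The arithmetic above can be written down as compile-time checks, which is a cheap way to catch mistakes. The constant names and the BowlExtraFields struct are made up for this sketch (the real members live directly in Sprite_Bowl), and it assumes a 4-byte int.

```cpp
#include <cstddef>

// Sizes read from the PetzNew calls in GetSprite, as described above.
constexpr std::size_t kPlateOfLeftoversSize = 0x3C9C;  // a complete bowl toy
constexpr std::size_t kToySpriteSize        = 0x3C7C;  // a toy with no extra state

// Whatever is left over must belong to Sprite_Bowl itself.
constexpr std::size_t kBowlExtraSize = kPlateOfLeftoversSize - kToySpriteSize;
static_assert(kBowlExtraSize == 0x20, "expected 32 bytes of bowl-specific data");

// Stand-in for just the extra members, mirroring vars[7] + foodAmount.
struct BowlExtraFields {
    int vars[7];     // unknown purpose, present only to get the size right
    int foodAmount;  // the one variable whose meaning was worked out
};
static_assert(sizeof(BowlExtraFields) == kBowlExtraSize,
              "padding does not match the measured size (assumes 4-byte int)");
```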

Now determine the overrides your class needs. This can be done by looking at the virtual function tables in IDA. Take note of any functions which mention your class, and implement the overrides.

Now implement your actual derived class in your DLL project, e.g. Sprite_Bowl_J2: public Sprite_Bowl. To start with, attempt to make it exactly match your sample resource (i.e. for this example, attempt to make a food bowl that exactly replicates the leftovers). Use IDA to see what functions the toy/breed/area implements and copy their logic. Use the RES file from your sample resource.

Compile and inspect your code in IDA. Make sure the class size and placement of vf/vbtables matches your sample resource. Make necessary adjustments.

Test your resource in-game. Make necessary adjustments. If the resource loads and is playable but doesn't have all the original functionality, you've almost certainly forgotten to override something; take another look at the vftables. You may end up having to determine the function of certain variables and name them in PetzHeaders.

Once you have a working exact copy, you can make your custom changes and submit a GitHub PR to PetzHeaders!