Engine: Backing up of waloff array in per-map setup results in dereferencing of stale pointers
Repeatedly loading per-map art of two different maps (example: see attachment) will eventually start overriding other tile data. What I observed is that it overwrote data of the menu and UI tiles, including the crosshair. Here's a screenshot of the bug in action: https://imgur.com/YqwDVQg
Here's the test case to reproduce this: mapart_swaplevel.zip
Simply load E1L1 and hit alt fire repeatedly to switch levels. Eventually you will notice the crosshair having been overwritten. The following is the output of cacheinfo after this occurred: cacheinfo_out.txt
No particular revision where this started, in fact this used to crash the game instead due to running out of cache entries.
UPDATE 2020-08-26: I have investigated this issue further and was able to make some new discoveries.
First, I no longer believe this problem is restricted to per-map tiles, and have hence altered the title. Much rather I believe that the per-map art system has a bug that exposed a potentially bigger underlying problem.
Nevermind, turns out this is only a problem for per-map art, see below.
First, let's take a look at the bug in artSetupMapArt()
and artClearMapArt()
:
void artClearMapArt(void)
{
...
for (bssize_t i=0; i<MAXTILES; i++)
{
if (tilefilenum[i] >= MAXARTFILES_BASE)
{
// XXX: OK way to free it? Better: cache1d API. CACHE1D_FREE
walock[i] = CACHE1D_FREE;
waloff[i] = 0;
}
}
// Restore original per-tile arrays
RESTORE_MAPART_ARRAY(tilefilenum, g_bakTileFileNum);
RESTORE_MAPART_ARRAY(tilefileoffs, g_bakTileFileOffs);
RESTORE_MAPART_ARRAY(tilesiz, g_bakTileSiz);
RESTORE_MAPART_ARRAY(picsiz, g_bakPicSiz);
RESTORE_MAPART_ARRAY(walock, g_bakWalock);
RESTORE_MAPART_ARRAY(waloff, g_bakWaloff);
RESTORE_MAPART_ARRAY(picanm, g_bakPicAnm);
RESTORE_MAPART_ARRAY(faketile, g_bakFakeTile);
RESTORE_MAPART_ARRAY(rottile, g_bakRottile);
...
}
void artSetupMapArt(const char *filename)
{
artClearMapArt();
...
// Allocate backup arrays.
ALLOC_MAPART_ARRAY(tilefilenum, g_bakTileFileNum);
ALLOC_MAPART_ARRAY(tilefileoffs, g_bakTileFileOffs);
ALLOC_MAPART_ARRAY(tilesiz, g_bakTileSiz);
ALLOC_MAPART_ARRAY(picsiz, g_bakPicSiz);
ALLOC_MAPART_ARRAY(walock, g_bakWalock);
ALLOC_MAPART_ARRAY(waloff, g_bakWaloff);
ALLOC_MAPART_ARRAY(picanm, g_bakPicAnm);
ALLOC_MAPART_ARRAY(faketile, g_bakFakeTile);
ALLOC_MAPART_ARRAY(faketiledata, g_bakFakeTileData);
ALLOC_MAPART_ARRAY(rottile, g_bakRottile);
...
}
Immediately you will notice that the loop to zero all the cache pointers to per-map tiles of waloff
, as well freeing all the locks to per-map tile entries in walock
inside the function artClearMapArt()
is actually redundant, as both of these arrays are afterwards overwritten by their backups.
Furthermore it is not clear if making a backup of the pointer and lock arrays (i.e. waloff
and walock
) is actually a good idea, because maybe you are storing dangling pointers to data that has already been freed elsewhere. This backup was introduced in 298d80a2 as part of a fix to prevent viewscreens from crashing the game if used in conjunction with per-map art (if I remember correctly).
Finally, what actually exposes the corruption problem is that, if the first map you load before anything else has a per-map art tile assigned, then this code will actually create a backup of the waloff
array that consists almost entirely of null pointers. If you then repeatedly load usermaps that have a per-map art file assigned, the code will proceed to discard most of the previously cached tiles, replacing the existing waloff
array with the backup, and thus reloading almost all of the tiles from disk.
Some tile data however is not reloaded in this process, which happens to include pretty much all the data required to display the menu, including the crosshair, the letters, the logo and the spinning nuke tiles. These tiles are the ones that have been loaded prior to any map being played, and what we can observe is that it is exactly those tiles that end up getting corrupted if the remaining tiles are reloaded from disk often enough.
If the first map the player starts is a map that does not have any per-map tiles, then if the player then attempts to repeatedly load a map which does have them assigned, the problem does not occur. Since the waloff
array has been filled with pointers prior to the loading of the per-map art, a backup will be created which does not consist of zero pointers, and hence less tiles will end up being reloaded.
Final Update: Further debugging with GDB indicates that it is only restricted to per-map art tiles.
What causes the corruption is that, as previously suspected, stale pointers are in fact loaded from the backup waloff
array. Namely, once the cache is full and tiles start getting evicted, the waloff
pointers for those tiles that are reloaded will now point to a different location in memory. However, once another map with per-map tile is loaded, the data from the backup waloff array is restored, including the stale pointers.
Therefore, the solution to this problem is to not back up the waloff
array.