Tony Hawk's Pro Skater 4 extractor/rebuilder

Published in 
Playstation 2 tutorials
 · 6 years ago

Extractor / Rebuilder Tutorial

This is a small tutorial on how to code an extractor / rebuilder. It is for educational use only. The included example program will extract, rebuild, or list the files contained in MUSIC.WAD/HED from THPS4. The files are located in the MUSIC directory on the DVD. If you would like to use the source code in your own programs, please read the GPL license agreement that was included in COPYING.txt. It basically says that you can use / modify / redistribute the source code as long as you make the source code of your modified work free and available. This tutorial assumes that you already understand the following concepts:

- C language programming
  - control flow, loops, functions, structs, pointers, arrays
  - parsing command line arguments
  - opening, reading, writing, seeking to offset, closing binary files
- compiling, debugging a program
- low level concepts such as the sizes of words and double words (DWORD), big endian, little endian, hexadecimal notation
- how to use a hex editor to move around a file, search for text values and hex values, etc.

About a year or two of college level computer science should be sufficient to understand the concepts and the code I included. I’m sorry to say that if you’re not a programmer, this tutorial will probably be useless to you.

Let me start out by saying that I’ve only coded a handful of extractors/rebuilders. I’m sure there are many people with more experience on the subject out there who could probably do a better job and give a better explanation of the concepts. All questions and comments are welcome. Please post them in the DVD Ripping section of the PS2Ownz forums.

The code I wrote was aimed at being small in file size and memory usage as well as efficient. The final program is intended to be used from the command prompt. It does not contain a GUI (Graphical User Interface) and gives very little feedback, if any. It was compiled with MS Visual C++ 6.0 as a Console Application. If you will be using a different compiler, you will probably have to modify the source code.

The hardest thing about making an extractor is figuring out the file format of the large archive. Even if you figure one out, the next game's format might be different. The first thing you must understand is the problem at hand. Why do game companies use large archives instead of just including all files on the disc in plain form? Some reasons: a few large files are easier for developers to deal with and keep track of; there are limits/restrictions on filenames and directories when they are placed on a disc (I think); and finally, game developers don’t want people to rip off their content. In addition to putting the files into a large archive, game companies can also use compression (e.g. Quake III for the PC uses zip compression on its pk3 files). Compression can save some space, but it takes longer to read the files when the game executable needs them because they have to be decompressed first. Luckily, none of the PS2 games I’ve come across use this kind of compression.

Ok, let’s move on to extraction. The basic idea here is that you need to find the file table of the archive and decipher it. The file table can be located in the large archive in the form of a header, or be in a separate file altogether. Some file tables contain a lot of information and some only contain a little. All of them need to have the offset of where each file is located within the archive. There must also be a way to determine where a file ends (its size). The offset can be expressed in bytes or in units of 2048 bytes, 2048 bytes being the sector size of a CD/DVD. It might not be a requirement, but files will often be stored at an address that is a multiple of 2048. If this isn’t strictly required by the PS2 software libraries, then it’s probably just for run time read efficiency. Offsets, and numbers in general, will usually be stored as double word quantities (32 bits). These dwords are little endian: the low order byte is stored first and the high order byte last. Only the order of the bytes is reversed; the two hex digits within each byte read normally. Let’s say we used a hex editor and found this offset:

 
03AF 4D00


To read this number, you have to reverse the byte order:

 
004D AF03


Notice that the hex digits within each byte were not changed; only the bytes swapped places. When you read these four bytes into a buffer with the C function fread, you’ll have to put them together in the right order yourself to form a dword. Once the offset and file size have been found, the file can be extracted. However, the file table can contain many more values than just the offset and the size. Some contain filenames and paths, start sector (LBA), end sector (LBA), or number of sectors, where LBA means the value is expressed in units of 2048 bytes. The more you know about the file table, the better. The same goes for the content of the files within the archive. Make sure you keep good notes on whatever you have found so you can refer to them later. The last part of the extraction process is knowing where/when to stop. Is the number of entries encoded in the header? Is the file table size or end offset located in the header? Is the header terminated by a sentinel value? This must be figured out so you will know how and when to tell your program to stop extracting.
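
As a small illustration of the byte order described above, here is a sketch of putting four bytes from a read buffer together into a dword. The function name read_le32 is made up for this example (the included program calls its own version swapBytes), and unsigned long is used only because it is guaranteed to hold at least 32 bits.

```c
/* Hypothetical helper: assembles four bytes, stored low order byte first,
   into one 32 bit value. b[0] is the first byte as it appears in the file. */
unsigned long read_le32(const unsigned char b[4])
{
    return  (unsigned long) b[0]
         | ((unsigned long) b[1] << 8)
         | ((unsigned long) b[2] << 16)
         | ((unsigned long) b[3] << 24);
}
```

Fed the bytes 03 AF 4D 00 from the example above, this yields 0x004DAF03. Building the value from individual bytes gives the same result no matter what the byte order of the machine running the extractor is.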

Here’s some oversimplified pseudo code for extraction:

1 - read the entry (filename, offset, size) from the file table
2 - move to the offset within the archive
3 - read size amount of bytes from archive
4 - write size amount of bytes to filename
5 - repeat until all files have been extracted
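
Steps 2 through 4 of the pseudo code could be sketched roughly like this. It is only an illustration under a few assumptions: the function name copy_range and the 4 KB chunk size are made up for this example, both streams are assumed to be open in binary mode, and the offset is assumed to fit in a long.

```c
#include <stdio.h>

#define CHUNK 4096 /* copy in 4 KB pieces to keep memory usage small */

/* Hypothetical helper: copies size bytes, starting at byte position offset
   of archive, to out. Returns 1 on success, 0 on any seek/read/write error. */
int copy_range(FILE *archive, long offset, unsigned long size, FILE *out)
{
    unsigned char buf[CHUNK];

    if (fseek(archive, offset, SEEK_SET) != 0) {
        return 0;
    }
    while (size > 0) {
        size_t want = size > CHUNK ? CHUNK : (size_t) size;
        size_t got = fread(buf, 1, want, archive);
        if (got == 0) { /* read error, or the archive ended too early */
            return 0;
        }
        if (fwrite(buf, 1, got, out) != got) {
            return 0;
        }
        size -= (unsigned long) got;
    }
    return 1;
}
```

An extractor would call something like this once per file table entry, after opening the output file named by that entry.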

This next part is specific to MUSIC.WAD/HED from THPS4. Open MUSIC.HED with a hex editor. The first dword is 0000 0000. The next one is 60EE 7F00. After that is a pathname delimited with a null byte. There are three more bytes of zeros, then the next entry begins. For the next entry we get 0000 8100 for the first dword, 60F2 B100 for the next one, and a string pathname. See the format yet? The first dword is the file offset, the next one is the file size in bytes, and finally the pathname of the file:

- dword offset
- dword size
- \path\name

There seems to be a variable number of null bytes between entries. However, if you look closely at the address of each entry within MUSIC.HED (not the offset previously mentioned), you will see that each entry starts at an address that is a multiple of four (a dword boundary). There must be at least one null byte at the end of the pathname because it is used as the string terminator. Extra null bytes are added as padding only when they are needed to reach the next dword boundary.

Here are the values for the first two entries:

1 - (offset: 0000 0000, size: 007F EE60, pathname: \music\Aesop)
2 - (offset: 0081 0000, size: 00B1 F260, pathname: \music\acdc)
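
The padding rule can be checked with a little arithmetic. The sketch below computes how many bytes one entry occupies in the header, assuming the layout described above (two dwords, the pathname, one null byte, then padding to the next dword boundary). The function name hed_entry_length is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of bytes one header entry occupies,
   including the padding that follows the pathname. */
size_t hed_entry_length(const char *path)
{
    size_t n = 8 + strlen(path) + 1;  /* offset dword, size dword, pathname, null byte */
    return (n + 3) & ~(size_t) 3;     /* round up to the next multiple of 4 */
}
```

For \music\Aesop (12 characters) this gives 8 + 12 + 1 = 21, rounded up to 24. That accounts for the null byte plus the three extra zero bytes seen after the first pathname, and puts the second entry at address 0x18.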

Open MUSIC.WAD and check these values out. You will notice that the end offset of each file (offset + size) seems to be wrong, but I think the developers purposely put junk values after the actual files just to throw would-be extractors off their trail, so the end offset, and hence the size, is in fact correct.

Finally, the end of the file table in MUSIC.HED is marked by the value: FFFF FFFF.

Compile thp4wadx.c and put the executable in the same directory as MUSIC.WAD and MUSIC.HED, then call it like this:

 
thp4wadx x MUSIC


The ‘x’ tells the program to extract the files. ‘MUSIC’ is the prefix name of the WAD and HED files to use for extraction. Sorry, but I won’t be stepping through and explaining the source code. If you can’t follow the program, get some documentation on the C library functions I used to read/write the files. Next, set some break points and debug it. It basically follows the pseudo code I gave before.

Now for rebuilding. I chose to make the extractor write to a specific directory (MUSIC_extracted). The rebuilder expects to find all files under that directory. Their sizes can be different from when they were extracted, but they must exist and be readable. The pseudo code for the rebuilding process is pretty much the opposite of extraction. Don’t forget, the header must also be updated if the file sizes, and consequently the offsets within the archive, have changed.

1 - get the pathname from the header file
2 - open the file named by pathname and read its size
3 - read the current offset of the wad file
4 - update the header file with the new offset and size
5 - read in the named file
6 - write out the contents of the named file to the wad
7 - optionally pad the wad
8 - repeat until all files in the header have been accounted for

I never figured out the padding rules for MUSIC.WAD. There is a lot of space between files. I just padded it so each file starts at an address that’s a multiple of 2048. That is how most archives I’ve come across are padded, so I decided to use that here too.
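
The amount of padding needed after each file is simple to compute. Here is a sketch; the function name pad_bytes is made up for this example, and align would be 2048 for the rule described above.

```c
/* Hypothetical helper: number of zero bytes to write so that the next file
   starts at a multiple of align. offset is the current end of the archive. */
unsigned long pad_bytes(unsigned long offset, unsigned long align)
{
    unsigned long rem = offset % align;
    return rem == 0 ? 0 : align - rem;
}
```

Taking the first entry as an example, the file ends at 0x7FEE60, so 0x1A0 bytes of padding are needed and the next file would start at 0x7FF000. The original header puts the second file at 0x810000, so an archive rebuilt this way is packed more tightly than the original.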

Remember, when you update the header, you have to write the dword quantities to be little endian. One way to do this is to break up the dword into a set of four bytes using bit masks and shift operations. Now when you put these bytes into your write buffer, put the lowest order byte first and the highest order byte last. Here’s a code snippet that will perform these steps:

 
unsigned char buff[4];
unsigned int dwVal;

/* assign the new offset or size of the file to dwVal */

buff[3] = (unsigned char) ((dwVal & 0xff000000) >> 24);
buff[2] = (unsigned char) ((dwVal & 0x00ff0000) >> 16);
buff[1] = (unsigned char) ((dwVal & 0x0000ff00) >> 8);
buff[0] = (unsigned char) (dwVal & 0x000000ff);

/* write buff to header */


The thp4wadx rebuilder opens the header file, MUSIC.HED, for updating (“r+b”) and opens the wad file, MUSIC.WAD, for writing (“wb”), which overwrites it. Each of the files named in the header is opened for reading and its size is saved. The current offset of the wad file is also saved. These two values will become the new offset and size that will be written to the header. The contents of the file are then read in and written to the wad. Afterwards, the wad is padded to the next 2048 boundary if necessary. The header file is then positioned back at the beginning of the current entry and the new values overwrite the old ones. Remember, since the header file was opened for both reading and writing, the C library requires a call to fflush, fseek, fsetpos, or rewind when switching from writing to reading, and a call to fseek, fsetpos, or rewind when switching from reading to writing (unless the read hit the end of the file).
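
To make the read/write switching rule concrete, here is a sketch that reads one dword from a stream opened for updating and then overwrites it in place. The function name replace_dword is made up for this example; it is not how thp4wadx.c organizes the work, it only shows where the positioning and flush calls have to go.

```c
#include <stdio.h>

/* Hypothetical helper: f must be opened with "r+b" (or similar). Reads the
   little endian dword at byte position pos into *oldVal, then overwrites it
   with newVal. Returns 1 on success, 0 on failure. */
int replace_dword(FILE *f, long pos, unsigned long newVal, unsigned long *oldVal)
{
    unsigned char b[4];

    if (fseek(f, pos, SEEK_SET) != 0) {
        return 0;
    }
    if (fread(b, 1, 4, f) != 4) {
        return 0;
    }
    *oldVal = (unsigned long) b[0] | ((unsigned long) b[1] << 8) |
              ((unsigned long) b[2] << 16) | ((unsigned long) b[3] << 24);

    b[0] = (unsigned char) (newVal & 0xff);
    b[1] = (unsigned char) ((newVal >> 8) & 0xff);
    b[2] = (unsigned char) ((newVal >> 16) & 0xff);
    b[3] = (unsigned char) ((newVal >> 24) & 0xff);

    /* switching from reading to writing: reposition first */
    if (fseek(f, pos, SEEK_SET) != 0) {
        return 0;
    }
    if (fwrite(b, 1, 4, f) != 4) {
        return 0;
    }
    /* switching back from writing to reading: flush (or reposition) first */
    return fflush(f) == 0;
}
```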

The rebuilder should be called like this:

 
thp4wadx b MUSIC


The ‘b’ means build the wad. The ‘MUSIC’ means the wad and hed files are named MUSIC.WAD and MUSIC.HED. All files must be under the directory ‘MUSIC_extracted’ for the rebuilder to find them. Make sure you back up or move MUSIC.WAD, and back up MUSIC.HED, before rebuilding because they will be overwritten.

One final note about the rebuilder is that it does not relink zero byte files (if you replaced any of the music files with a zero byte dummy before rebuilding, the header will just record the file size as being zero). I don’t know if THPS4 freezes if it tries to read a zero byte file when it’s loading the song, so you may have to relink the header after the rebuilding process is complete. If you don’t want to do this relinking by hand with a hex editor, then a program can be coded that will copy the values of one entry (offset, size) to another. This program could use the filenames of both to locate the offsets of each of the entries within the header.
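
A relinking program along those lines might look something like the sketch below. To keep it short it works on a copy of the header that has already been read into memory, and it assumes the entry layout described earlier (offset dword, size dword, null terminated pathname, padding to a dword boundary, FFFF FFFF at the end). The function names find_entry and relink_entry are made up for this example, and pathnames are compared exactly, including case.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the byte position of the entry whose pathname
   equals path, or -1 if the terminator or the end of the buffer comes first. */
static long find_entry(const unsigned char *hed, size_t len, const char *path)
{
    size_t pos = 0;

    while (pos + 8 < len) {
        const unsigned char *name = hed + pos + 8;
        const unsigned char *nul;

        if (memcmp(hed + pos, "\xff\xff\xff\xff", 4) == 0) {
            return -1;                    /* end of header marker */
        }
        nul = memchr(name, 0, len - (pos + 8));
        if (nul == NULL) {
            return -1;                    /* pathname runs off the buffer */
        }
        if (strcmp((const char *) name, path) == 0) {
            return (long) pos;
        }
        /* skip the null byte, then round up to the next dword boundary */
        pos = ((size_t) (nul - hed) + 1 + 3) & ~(size_t) 3;
    }
    return -1;
}

/* Copies the offset and size of entry 'from' over those of entry 'to'.
   Returns 1 on success, 0 if either pathname was not found. */
int relink_entry(unsigned char *hed, size_t len, const char *from, const char *to)
{
    long src = find_entry(hed, len, from);
    long dst = find_entry(hed, len, to);

    if (src < 0 || dst < 0) {
        return 0;
    }
    memcpy(hed + dst, hed + src, 8);      /* the two dwords are copied as stored */
    return 1;
}
```

Since the eight bytes are copied exactly as they are stored, no byte swapping is needed. After relinking, the buffer would be written back over MUSIC.HED.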

Conclusion

I hope this helped clarify the process of coding an extractor and a rebuilder. I was hoping to add more programs for archives from other games to this tutorial, but was unable to due to time constraints. Hopefully, I can do more in the future.

If you decipher the file format of an archive, please post it in the forums. Documentation is a programmer’s best friend.

C code

 

/**
 * thp4wadx - extracts / rebuilds / lists files from MUSIC.WAD/HED in THPS4
 * Copyright (C) 2002 sitinduk
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 *
 * Contact info: sitinduk@yahoo.com
 *
 ***************************************************************************
 *
 * thp4wadx.c
 *
 * This program extracts and rebuilds WAD files from Tony Hawk's Pro
 * Skater 4. It puts all files in the directory
 * 'WADFilenameWithoutExtension_extracted'.
 * eg. MUSIC.WAD files will be extracted to 'MUSIC_extracted'
 * The header of the WAD file (eg. MUSIC.HED) is used during extraction
 * and updated during rebuilding. The header file and the WAD file
 * must be located in the same directory.
 */

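/* The header names were stripped from this copy of the listing. These are
 * the ones the code below relies on; <direct.h> (chdir, mkdir) is specific
 * to Microsoft's compiler. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <direct.h>

/***************************************************************************/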

typedef unsigned int DWORD;
typedef unsigned char BYTE;

typedef struct Entry {
char szFullPath [_MAX_PATH];
DWORD dwOffset;
DWORD dwSize;
} Entry;

#define BUFFSIZE 4096 /* 4KB */
#define HDRTERM 0xffffffff /* End of header value */
#define ALIGN 2048 /* archive file alignment */

/***************************************************************************/

/**
 * Prints program usage to stderr
 *
 * param char* -> name of the program
 */
void usage(char *szProg);

/**
 * Gets the next entry from the header file.
 *
 * param Entry* -> this structure will contain the offset, size and pathname
 * of the next entry on a successful return
 * param FILE* -> header file stream opened for reading
 * return -> 1 = success, 0 = end of header (last file already read), -1 =
 * failure (read error)
 */
int getEntry(Entry *entry, FILE *f);

/**
 * Lists 'offset size \path\name' to stdout. 'offset' and 'size' are
 * displayed in hex.
 *
 * param FILE* -> header file stream opened for reading
 */
void listFiles(FILE *f);

/**
 * This function swaps the bytes of buff to form a double word value (32 bit
 * unsigned integer).
 *
 * param BYTE* -> a pointer to a 4 element array where buff[0] is the low
 * order byte and buff[3] is the high order byte
 * return DWORD -> value of reversed bytes
 */
DWORD swapBytes(BYTE *buff);

/**
 * Extracts all files from wad using the info in hdr. All files are written
 * to the paths specified in the header. The current directory is used as the
 * root directory. (CAUTION: Any files that already exist will be
 * overwritten.)
 *
 * param FILE* -> header file opened for reading
 * param FILE* -> wad file opened for reading
 * return int -> 1 = success, 0 = failure
 */
int extractFiles(FILE *hdr, FILE *wad);

/**
 * This function creates the directories that are passed in szPath. The first
 * directory in the path is written to the current directory. This function
 * will silently fail if the path already existed or could not be created.
 *
 * param char* -> "path\path1\path2\...\file" the last name in the path is
 * assumed to be a filename and will not be created
 * return int -> 1 = success
 */
int makePath(char *szPath);

/**
 * Rebuilds the wad using the information in hdr. The offsets and file sizes
 * in hdr will also be updated. (CAUTION: wad will be overwritten, and hdr
 * will be modified.) All files specified in hdr must exist and be readable.
 *
 * param FILE* -> header file opened for updating (eg. "r+b")
 * param FILE* -> wad file stream in which to write the files, this stream
 * must be opened for writing.
 * return int -> 1 = success, 0 = failure
 */
int buildWad(FILE *hdr, FILE *wad);

/**
 * Given a DWORD value, this function reverses the byte order and stores
 * each byte in buff.
 *
 * param DWORD -> value to be reversed
 * param BYTE* -> a pointer to a 4 element array of BYTE, buff[0] will contain
 * the low order byte and buff[3] will contain the high order byte.
 * return int -> 1 = success
 */
int littleEndian(DWORD dwVal, BYTE *buff);

/**
 * This function updates the specified entry in header file 'f'.
 *
 * param Entry* -> The values in this structure will be written to f.
 * param FILE* -> header file opened for updating (eg. "r+b"). The entry is
 * written to the current offset (location) of the header file.
 * return int -> 1 = success, 0 = failure
 */
int writeEntry(Entry *entry, FILE *f);

/***************************************************************************/

void usage(char *szProg)
{
    fprintf(stderr, "Usage:\n");
    fprintf(stderr, "%s x wadfile\n", szProg);
    fprintf(stderr, "Extracts files contained in wadfile.WAD to "
                    "'wadfile_extracted' directory\n");
    fprintf(stderr, "wadfile.HED must be in same directory as WAD\n");

    fprintf(stderr, "%s b wadfile\n", szProg);
    fprintf(stderr, "Builds wadfile.WAD and regenerates wadfile.HED\n");
    fprintf(stderr, "wadfile.HED must be in same directory as WAD\n");

    fprintf(stderr, "%s l wadfile\n", szProg);
    fprintf(stderr, "Writes \"offset size pathname\" to stdout\n");
    fprintf(stderr, "wadfile.HED must be in the directory specified by "
                    "wadfile\n");
}

int getEntry(Entry *entry, FILE *f)
{
    BYTE buff[4];
    char s[_MAX_PATH];
    unsigned int nRead;

    nRead = fread(buff, sizeof(BYTE), 4, f);
    if (nRead < 4) {
        return -1;
    }

    entry->dwOffset = swapBytes(buff);
    if (entry->dwOffset == HDRTERM) {
        return 0;
    }

    nRead = fread(buff, sizeof(BYTE), 4, f);
    if (nRead < 4) {
        return -1;
    }

    entry->dwSize = swapBytes(buff);

    /* the pathname is read one dword at a time; the dword that contains a
       null byte is the last one of the entry */
    memset(s, 0, _MAX_PATH);
    do {
        if (strlen(s) + 4 >= _MAX_PATH) { /* pathname too long for s */
            return -1;
        }
        nRead = fread(buff, sizeof(BYTE), 4, f);
        if (nRead < 4) {
            return -1;
        }
        strncat(s, (char *) buff, 4);

        if (buff[0] == 0 || buff[1] == 0 || buff[2] == 0 || buff[3] == 0) {
            strcpy(entry->szFullPath, s);
            break;
        }
    } while (1);

    return 1;
}

void listFiles(FILE *f)
{
    Entry ent;
    int ret;

    while ( (ret = getEntry(&ent, f)) == 1 ) {
        printf("%8X %8X %s\r\n", ent.dwOffset, ent.dwSize, ent.szFullPath);
    }
}

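DWORD swapBytes(BYTE *buff)
{
    DWORD tmp = ((DWORD) buff[3] << 24) | ((DWORD) buff[2] << 16) |
                ((DWORD) buff[1] << 8) | (DWORD) buff[0];
    return tmp;
}

/*
 * NOTE: this copy of the listing is damaged at this point. The body of
 * swapBytes() above is reconstructed from its description. The bodies of
 * extractFiles() and makePath(), and the opening of buildWad() (its local
 * declarations, the loop over the header entries and the opening of each
 * input file) are missing. What follows is the remainder of buildWad(),
 * starting at what appears to be the loop that copies one input file into
 * the wad.
 */
        while (dwSize > 0) {
            if (dwSize > BUFFSIZE) {
                dwRead = fread(buff, byteSize, BUFFSIZE, inFile);
                if (dwRead < BUFFSIZE) { /* check for read error */
                    if (ferror(inFile)) {
                        fclose(inFile);
                        return 0;
                    }
                }
                dwWritten = fwrite(buff, byteSize, dwRead, wad);
                if (dwWritten < dwRead) {
                    fclose(inFile);
                    return 0;
                }
                dwSize -= BUFFSIZE;
            } else {
                dwRead = fread(buff, byteSize, dwSize, inFile);
                if (dwRead < dwSize) { /* check for read error */
                    if (ferror(inFile)) {
                        fclose(inFile);
                        return 0;
                    }
                }
                dwWritten = fwrite(buff, byteSize, dwRead, wad);
                if (dwWritten < dwRead) {
                    fclose(inFile);
                    return 0;
                }
                break;
            }
        }
        fclose(inFile);

        if (fgetpos(wad, &wadOffset) != 0) {
            return 0;
        }
        /* align wad */
        mod = (unsigned int) (wadOffset % ALIGN);
        if (mod != 0) {
            nRemain = ALIGN - mod;
            memset(buff, 0, nRemain);
            dwWritten = fwrite(buff, byteSize, nRemain, wad);
            if (dwWritten < nRemain) {
                return 0;
            }
        }

        /* update hdr file */
        if (fsetpos(hdr, &hdrOffset) != 0) {
            return 0;
        }
        if ((ret = writeEntry(&ent, hdr)) != 1) {
            return 0;
        }
        /* remember where the next entry starts; the fsetpos call also
           satisfies the rule that a positioning call must come between a
           write and the next read on the same stream */
        if (fgetpos(hdr, &hdrOffset) != 0) {
            return 0;
        }
        if (fsetpos(hdr, &hdrOffset) != 0) {
            return 0;
        }
    }

    if (ret == 0) {
        ret = 1;
    }
    return ret;
}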

int littleEndian(DWORD dwVal, BYTE *buff)
{
    buff[3] = (BYTE) ((dwVal & 0xff000000) >> 24);
    buff[2] = (BYTE) ((dwVal & 0x00ff0000) >> 16);
    buff[1] = (BYTE) ((dwVal & 0x0000ff00) >> 8);
    buff[0] = (BYTE) ((dwVal & 0x000000ff));
    return 1;
}

int writeEntry(Entry *entry, FILE *f)
{
    BYTE buff[8], *p;
    unsigned int mod, nPathSize, nRemain;
    DWORD dwWritten;
    fpos_t hdrOffset;

    p = &(buff[4]);
    littleEndian(entry->dwOffset, buff);
    littleEndian(entry->dwSize, p);
    dwWritten = fwrite(buff, sizeof(BYTE), 8, f);
    if (dwWritten < 8) { /* error */
        return 0;
    }
    nPathSize = strlen(entry->szFullPath) + 1;
    dwWritten = fwrite(entry->szFullPath, 1, nPathSize, f);
    if (dwWritten < nPathSize) { /* error */
        return 0;
    }
    /* align on dword address */
    if (fgetpos(f, &hdrOffset) != 0) {
        return 0;
    }
    mod = (unsigned int) (hdrOffset % 4);
    if (mod != 0) {
        nRemain = 4 - mod;
        memset(buff, 0, nRemain);
        dwWritten = fwrite(buff, sizeof(BYTE), nRemain, f);
        if (dwWritten < nRemain) { /* error */
            return 0;
        }
    }
    return 1;
}

int main(int argc, char** argv)
{
    char szExt[] = ".WAD"; /* default WAD file extension */
    char szHdrExt[] = ".HED"; /* header file extension */
    char szExDir[] = "_extracted"; /* output directory suffix */
    char *szProg, *szPrefix, *szHdrName = NULL,
         *szWadName = NULL, szWadDir[_MAX_PATH], *s, *t;
    char mode;
    FILE *hdr, *wad;
    int retVal = 0;

    szProg = argv[0];
    ++argv;
    --argc;
    if (argc != 2) {
        usage(szProg);
        exit(1);
    }

    /* parse commandline options */
    mode = *argv[0];
    szPrefix = argv[1];

    /* construct hdr and wad names */
    szHdrName = malloc(strlen(szPrefix)+strlen(szHdrExt)+1);
    strcpy(szHdrName, szPrefix);
    strcat(szHdrName, szHdrExt);

    szWadName = malloc(strlen(szPrefix)+strlen(szExt)+1);
    strcpy(szWadName, szPrefix);
    strcat(szWadName, szExt);

    hdr = fopen(szHdrName, "rb");
    if (!hdr) {
        fprintf(stderr, "Error: Could not open header file: %s\n",
                szHdrName);
        return 1;
    }

    switch (mode) {
    case 'B':
    case 'b':
        fclose(hdr);
        hdr = fopen(szHdrName, "r+b");
        if (!hdr) {
            fprintf(stderr, "Error: Could not open header file: %s\n",
                    szHdrName);
            return 1;
        }

        wad = fopen(szWadName, "wb");
        if (!wad) {
            fprintf(stderr, "Error: Could not open wad file: %s\n",
                    szWadName);
            retVal = 1;
            break;
        }
        /* copy the prefix, including its terminating null; back slashes
           in a run time string do not need to be doubled */
        s = szWadDir;
        t = szPrefix;
        while ((*s++ = *t++) != '\0') {
        }
        strcat(szWadDir, szExDir);
        if (chdir(szWadDir) != 0) {
            fprintf(stderr, "Error: Could not cd into: %s\n", szWadDir);
            retVal = 1;
            break;
        }
        if (!buildWad(hdr, wad)) {
            fprintf(stderr, "Error: Could not build: %s\n", szWadName);
            retVal = 1;
        }
        fclose(wad);
        break;
    case 'L': /* list 'offset size \path\file' */
    case 'l':
        listFiles(hdr);
        break;
    case 'X':
    case 'x':
        wad = fopen(szWadName, "rb");
        if (!wad) {
            fprintf(stderr, "Error: Could not open wad file: %s\n",
                    szWadName);
            retVal = 1;
            break;
        }
        /* copy the prefix, including its terminating null */
        s = szWadDir;
        t = szPrefix;
        while ((*s++ = *t++) != '\0') {
        }
        strcat(szWadDir, szExDir);
        mkdir(szWadDir);
        if (chdir(szWadDir) != 0) {
            fprintf(stderr, "Error: Could not cd into: %s\n", szWadDir);
            retVal = 1;
            break;
        }
        if (!extractFiles(hdr, wad)) {
            fprintf(stderr, "Error: Extraction error\n");
            retVal = 1;
        }
        fclose(wad);
        break;
    default:
        fprintf(stderr, "Unrecognized option: %c\n", mode);
        usage(szProg);
        retVal = 1;
        break;
    }

    fclose(hdr);
    if (szHdrName) {
        free(szHdrName);
    }
    if (szWadName) {
        free(szWadName);
    }
    return retVal;
}
