String Manipulation

Various string hashing, compression and encryption algorithms reverse engineered from games.

Bully (Canis Canem Edit)

Two different hashing algorithms, one used to look up strings by their label, and another used to look up RSM sound files.

Audio Lookup Hash

A 32-bit string hashing algorithm used for looking up audio files by their hashed labels.

The algorithm is identical to Midnight Club 3: DUB Edition, the only thing that sets this apart is the conversion to lowercase and backslashes to forward slashes, and a secondary terminator if the string itself is encased in quotation marks.

Py (Standalone), JS (Live).


Label Lookup Hash

A 31-bit string hashing algorithm used for looking up text by their hashed labels.

The string is first converted to upper case. For each ASCII character, the current hash is multiplied by 131 and the character value is added to it, and the 31st bit is then masked out.

Py (Standalone), JS (Live).


String Encryption

A string encryption algorithm used for masking the .EFF, .MTL, .WDB, .XML files in Bully: Anniversary Edition.

Py (Standalone), JS (Live).


Call of Duty

A string hashing algorithm used in Call of Duty: Finest Hour.

String Hashing

A 64-bit string hashing algorithm used to store filenames in the Spark Pack in Call of Duty: Finest Hour.

The string is converted to uppercase and forward slashes are converted to backslashes, and the initial hash seed is set to 9521211207457086692. For each ASCII character, the value of the character is XORed to the current hash shifted left by 40 bits, to which the current hash that's multiplied by 435 is then added. There is also a secondary loop break condition if the current character in the string contains a semicolon.
In other words, this is a 64-bit FNV-1 hash reimplementation with the offset basis' upper and lower halves swapped.

Py (Standalone), JS (Live).


Criterion Games

A string compression and hashing algorithm used in Burnout 3: Takedown, Legends, Revenge, Dominator, Black, and Need for Speed: Shift (PSP).

String Compression

A 64-bit string compression algorithm used to store track and vehicle names, and file identifiers.

The original code in the games is much longer because the ASCII values are calculated on the spot - it's been greatly simplified here. Both the character set, and the compressed character index array is now pre-made. Thus the explanation will follow my reimplementation, rather than original code.
To compress, the string is first padded with whitespace to 12 characters and reversed. For each character, the compressed character array is indexed with the ASCII value and multiplied by the result of the total amount of characters exponented to the current string index, which is then added to the compressed integer.
To decompress, the compressed integer is divided by the length of the character set, the remainder of which is used to index the decompressed character. The final decompressed string is then reversed.
In other words, a glorified Base 40 conversion.

Py (Standalone), JS (Live).


String Hashing

A 32-bit string hashing algorithm used for looking up strings and vehicle parameters by their hashed labels.

The algorithm utilizes a pre-computed hash table, which is identical to the one used by CRC-32. For each ASCII character value, the hash, while is treated as unsigned, it's shifted right by 8 bits as a signed integer, which is then XORed by a value from the hash table indexed by XORing the character value with the 8 bits in the hash that were previously shifted away.
In short, it's just a derivative of CRC-32 with the main difference being the final hash not being XORed with -1 (like CRC-32/JAMCRC) and performing a signed 8-bit shift in the loop but making the result unsigned.

Py (Standalone), JS (Live).


Epic Mickey

A string hashing algorithm used in Epic Mickey.

String Hashing

A 32-bit string hashing algorithm used for looking up strings by their hashed label in Epic Mickey. The 2nd hexadecimal field takes an input of the unique ID as a hash seed present at offset 8 of each of the language .DCT dictionary files.

Py (Standalone), JS (Live).


Jak & Daxter

A string compression algorithm used in Jak 3 and Jak X: Combat Racing.

String Compression

A 42-bit string compression algorithm used to store audio entries.

The original code in the game is longer because the character indices are worked out on the spot - it's been simplified here with a pre-made character set.
Each compressed entry consists of multiple fields: the first 42 bits (0 - 41) are used for the compressed filename, bit 42 stores a flag whether or not the sound file is stereo, bit 43 stores a flag whether or not the sound file is meant for the international VAGWAD.INT container, bits 44 - 47, while uncertain, store a value that seems to correlate to the frequency/sample rate of the sound file most of the times, and the last 16 bits store the offset of the file aligned to 32768 bytes.
The 42-bit compressed filename is split into two 21-bit fields, the upper half storing the first 4 compressed characters, and the lower half storing the last 4 compressed characters. For each compressed field, the integer is divided by the length of the character set, the remainder of which is used to index the decompressed character. The final decompressed string is then reversed.

Py (Standalone), JS (Live).


Lingo (Tildes Birojs)

A Latvian game of Lingo that came bundled with Tildes Birojs.

String Compression

A 32-bit string compression algorithm used for storing every guessable 5 letter Latvian word in 4 bytes.
See the source code header comment for an in-depth explanation on why exactly this was made, as obscure as it may seem.

The game only has a decompression routine, the compression part is my own code. It also originally was written with the assumption that the system had the Baltic Windows Codepage 1257 installed.
To decompress, the compressed number is divided by 33, and the remainder of it is used to return a character from a long switch case of letters in the Latvian alphabet.
To compress, a dictionary is first prepared for each used Latvian letter to correspond to the numbers 1 - 9 and letters a - w. The number 0 is used for any character not in the dictionary, all of which will be turned into the letter A. The final string is converted to an integer by treating it as a Base 33 number.

Py (Standalone), JS (Live).


Midnight Club

Might apply to more early Angel Studios/Rockstar San Diego games. Confirmed for Midnight Club 2 and Midnight Club 3: DUB Edition.

Audio Lookup Hash

A 32-bit string hashing algorithm used for looking up audio files by their hashed labels.

The string is converted to uppercase and backslashes are converted to forward slashes. For each ASCII character, the 31st bit of the hash is rolled over to the 0th position, and the character value that's multiplied by the current string index starting from 1 is added to the hash.

Py (Standalone), JS (Live).


String Lookup Hash (Midnight Club 2)

A 28-bit string hashing algorithm used for looking up strings by their hashed labels.

For each ASCII character, the hash is shifted left by 4 bits and the value of the character is added to it. If the last nibble of the hash isn't 0, then the hash is XORed with the final nibble that's XORed with itself shifted right by 24 bits.

Py (Standalone), JS (Live).


String Lookup Hash (Midnight Club 3)

A 32-bit string hashing algorithm used for looking up strings by their hashed labels.

The ASCII value of each character is added to the current hash, and the resulting value is then shifted left by 10 bits with the original resulting value also added to it. Then the current hash is XORed with the current hash that's shifted right by 6 bits.
After every character is hashed, the final hash is shifted left by 3 bits and the original resulting hash is added to it, which is then XORed with the resulting hash shifted right by 11 bits. Finally, the resulting hash is shifted left by 15 bits, and the original resulting hash is added to it again.
This algorithm is also used for Bully's audio lookup hash, just with slight changes being made to the string itself.

Py (Standalone), JS (Live).


The Simpsons Game

Might apply to other Electronic Arts: Redwood Shores/Visceral Games developed titles.

String Hashing

A 32-bit string hashing algorithm used for looking up strings by their hashed labels.

The string is converted to lower case. For each character, the hash is multiplied by 65599, to which the ASCII value of the character is then added.

Py (Standalone), JS (Live).


The Sims (Console)

Might apply to other Edge of Reality and/or Maxis developed titles.

String Hashing

A 32-bit string hashing algorithm used for looking up files by their hashed filenames.

The algorithm used is plain CRC-32, but the string is converted to uppercase and any non-alphanumeric character is converted to an underscore. These string changes so far apply to The Sims 1 Console up to v1.11 (European and North American PS2 releases). The string changes for all the other Sims 1, 2, 3 (Wii), and Urbz console games are the same, but with the additional change of adding an extra underscore at the start of the string if it starts with a number.

JS (Live).


Key Generation

A key generation algorithm used for Unlock Codes in The Sims 2: Pets (Console). The allowed characters for the gifter's name are all alphanumeric characters 0-9 and A-Z (both upper and lower case), extended characters ÀÁÄÆÈÉÌÍÑÒÓÖÙÚÜŒ (both upper and lower case), extended characters âçêëîïôûß (lower case only), spaces, and special characters !"#%&'()*+,-./:;=?@\^{}~. Any other character will be converted to a space.
For a full list of the available gifts, take a look at the provided source code. They've been sorted alphabetically by the gift name.

Py (Standalone), JS (Live).


Tomb Raider

String hashing algorithm used in Tomb Raider: Legend, and possibly other games.

String Hashing

A 32-bit string hashing algorithm used for looking up files by their hashed filenames.

The string is converted to lower case, and the initial hash is set to -1. For each character, the hash is XORed by the ASCII value of the character shifted left by 24 bits. Then, if the hash is a negative number, it's added to itself, to which the result is XORed with 79764919, otherwise the hash is shifted left by 1 bit, and this entire process is repeated 8 times. Finally, the hash is NORed.

Py (Standalone), JS (Live).