In PHP (and most languages), this is false:

'\143\141\164' == "\143\141\164"

No surprise there. One is a 12-byte string of backslashes and numbers, and the other is a 3-byte string of octal values spelling “cat”. When you use double quotes, PHP transparently converts the string.
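To make the difference concrete, here is a quick check you can run yourself. The variable names are my own and are only there for illustration.

```php
<?php
$single = '\143\141\164'; // single quotes: the backslashes and digits stay as-is
$double = "\143\141\164"; // double quotes: each octal escape becomes one byte

var_dump(strlen($single));     // int(12)
var_dump(strlen($double));     // int(3)
var_dump($double === 'cat');   // bool(true)
var_dump($single == $double);  // bool(false)
```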

Sometimes it’s convenient to write values in files as string literals that represent characters. Some values, such as control characters and other non-printable bytes, simply don’t translate well in their native form, and it’s more explicit to write them out longhand in octal or hexadecimal. This is useful if you have to match, say, an exotic series of characters with 100% accuracy.

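But what happens when you need to clue PHP in that the string “\143\141\164” (as read from a file) should equal “cat”? As far as I know, there’s no function that does exactly this. The closest built-in is stripcslashes(), which decodes C-style escapes including octal and hexadecimal, but it also strips the backslash from any sequence it doesn’t recognize, so you can’t choose which escapes get converted. Presumably, there should be a function, something like str_convert_literals(), which would accept a string and do only the conversion you ask for. But there isn’t, so you must rely on regular expressions.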

Here’s the solution I found after trying various other methods (like tokenizing the string):

$string = preg_replace_callback('/\\\\([0-7]{1,3})/', 'convertOctalToCharacter',
                                $string);

function convertOctalToCharacter($octal)
{
    return chr(octdec($octal[1]));
}

I’ll run through what’s going on briefly. The regular expression matches a backslash followed by one to three octal digits, 0-7 (octal is base 8, after all), and captures the digits. It passes each match to the convertOctalToCharacter() function as an array, in which element 1 holds the captured digits. octdec() converts those digits to an integer, which is then fed to chr() (which expects an integer, not an octal string). That in turn returns the corresponding single-byte character, which is substituted into the string.
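As a rough walk-through of those steps, the sketch below traces one escape by hand and then runs the whole replacement. The $fromFile value stands in for a line read from a file, and the anonymous callback is just my shorthand for the named function above.

```php
<?php
// Stand-in for a line read from a file: 12 literal bytes, not "cat".
$fromFile = '\143\141\164';

// Step by step for the first escape, "\143":
$code = octdec('143'); // octal 143 -> integer 99
$char = chr($code);    // byte 99 -> "c"

// The same conversion applied to every escape in the string.
$decoded = preg_replace_callback(
    '/\\\\([0-7]{1,3})/',
    function (array $match) {
        // $match[0] is the whole escape, $match[1] only the digits.
        return chr(octdec($match[1]));
    },
    $fromFile
);

var_dump($decoded === 'cat'); // bool(true)
```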

Based on this, the hexadecimal conversion function isn’t very difficult to guess. To get you started, I’ll give you a not-so-subtle hint: the regular expression is “/\\\\x([0-9A-F]{1,2})/i”.
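If you’d rather check your answer, here is one way it might look. The function name convertHexToCharacter() is my own invention, mirroring the octal version. Treat it as a sketch, not the only way to do it.

```php
<?php
// Hypothetical counterpart to convertOctalToCharacter().
function convertHexToCharacter(array $hex)
{
    // $hex[1] holds the one or two hex digits captured after "\x".
    return chr(hexdec($hex[1]));
}

// Literal backslashes, as they would appear when read from a file.
$string = '\x63\x61\x74';

$string = preg_replace_callback('/\\\\x([0-9A-F]{1,2})/i',
                                'convertHexToCharacter', $string);

var_dump($string === 'cat'); // bool(true)
```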

One more thing: if you also translate special characters like “\r”, consider using lookbehinds in your expression to ensure that valid sequences like \\r aren’t converted twice.
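A minimal sketch of that idea, assuming you only care about \r, \n and \t. The function name and the character map are mine, purely for illustration.

```php
<?php
// Hypothetical helper: maps the captured letter to its control character.
function convertSpecialCharacter(array $match)
{
    $map = array('r' => "\r", 'n' => "\n", 't' => "\t");
    return $map[$match[1]];
}

// (?<!\\) is a negative lookbehind: the backslash that starts the escape
// must not itself be preceded by another backslash.
$pattern = '/(?<!\\\\)\\\\([rnt])/';

// Contains one real escape (\r) and one escaped backslash followed by "r" (\\r).
$input = 'line\r and literal \\\\r';

$output = preg_replace_callback($pattern, 'convertSpecialCharacter', $input);

var_dump($output === "line\r and literal \\\\r"); // bool(true)
```

Note that a single lookbehind only looks one character back, so a run of three backslashes before the letter (an escaped backslash followed by a real escape) would be skipped as well. If your input can contain that, you’ll need something more thorough.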