string - In PHP what does it mean by a function being binary-safe?
<?php
$string1 = "Hello";
$string2 = "Hello\x00World";

// This function is NOT binary-safe
echo strcoll($string1, $string2); // gives 0, the strings are treated as equal

// This function is binary-safe
echo strcmp($string1, $string2); // gives < 0, $string1 is less than $string2
?>
\x indicates hexadecimal notation. See: PHP strings
0x00 = NUL (the null byte)
0x04 = EOT (End of Transmission)
See an ASCII table for the full list of ASCII characters.
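As a small illustration of the escape notation above, the following sketch inspects a few byte values. The variable names are only for illustration; the functions used (ord, chr, strlen, bin2hex) are standard PHP built-ins.

```php
<?php
// "\x41" is the byte 0x41, which is the ASCII letter "A".
$isLetterA = ("\x41" === "A");

// "\x00" is a single NUL byte; chr(0) builds the same one-byte string.
$isNul = ("\x00" === chr(0));

// ord() returns the numeric value of the first byte of a string.
$eotValue = ord("\x04");

// Single-quoted strings do not interpret \x escapes, so this is
// four literal characters: backslash, x, 0, 0.
$singleQuotedLength = strlen('\x00');

// bin2hex() makes the raw bytes visible, including the NUL byte.
$hex = bin2hex("Hello\x00World");

var_dump($isLetterA, $isNul, $eotValue, $singleQuotedLength);
echo $hex, "\n"; // 48656c6c6f00576f726c64
```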
What makes binary-safe functions special, and where are they typically used?
It means the function will work correctly when you pass it arbitrary binary data (i.e. strings containing non-ASCII bytes and/or null bytes).
For example, a non-binary-safe function might be based on a C function which expects null-terminated strings, so if the string contains a null character, the function would ignore anything after it.
This is relevant because PHP does not cleanly separate string and binary data.
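A minimal sketch of what this looks like in practice, assuming a made-up payload that contains both a null byte and a non-ASCII byte. strlen, strpos and substr are documented as binary-safe, so they should see every byte:

```php
<?php
// Example payload (an assumption for illustration): a NUL byte in the
// middle and a non-ASCII byte (0xFF) at the end.
$data = "abc\x00def\xFF";

$length = strlen($data);          // 8: every byte counts, including NUL and 0xFF
$pos    = strpos($data, "def");   // 4: the search continues past the NUL byte
$nulAt  = strpos($data, "\x00");  // 3: the NUL byte itself can be searched for
$tail   = substr($data, 4);       // "def\xFF": nothing after the NUL is lost

var_dump($length, $pos, $nulAt, bin2hex($tail));
```

A function built on C string routines would typically stop at offset 3 and never see the bytes after it.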
The other users already mentioned what binary safe means in general.
In PHP, the meaning is more specific, referring only to what Michael gives as an example.
Every string in PHP has an associated length, which is the number of bytes that compose it. When a function manipulates a string, it can either:
1. Rely on that length meta-data.
2. Rely on the string being null-terminated, i.e., that after the data that is actually part of the string, a byte with value 0 will appear.
It's also true that all string variables manipulated by the PHP engine are null-terminated. The problem with functions that rely on the second approach (null termination) is that, if the string itself contains a byte with value 0, the function manipulating it will think the string has ended at that point and will ignore everything after it.
For instance, if PHP's strlen function worked like the C standard library's strlen, the result here would be wrong:
$str = "abc\x00abc";
echo strlen($str); // gives 7, not 3!
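To make the contrast concrete, here is a rough sketch that imitates the C behaviour in PHP. The helper c_style_strlen is hypothetical and not part of PHP; it only mimics a function that trusts the null terminator instead of the stored length.

```php
<?php
// Hypothetical helper: counts bytes up to the first NUL byte, the way
// C's strlen() does, ignoring the length PHP stores with the string.
function c_style_strlen(string $s): int
{
    $pos = strpos($s, "\x00");
    // No NUL inside the data: both approaches agree on the length.
    return $pos === false ? strlen($s) : $pos;
}

$str = "abc\x00abc";

echo strlen($str), "\n";         // 7: uses the stored length (binary-safe)
echo c_style_strlen($str), "\n"; // 3: stops at the embedded NUL byte
```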
I found these collation charts helpful: http://collation-charts.org/mysql60/. I'm not sure which one is used for utf8_general_ci, though.
For example, here is the chart for utf8_swedish_ci. It shows which characters it treats as the same: http://collation-charts.org/mysql60/mysql604.utf8_swedish_ci.html