Hello,
I've encountered a character encoding bug when performing move() or save() operations on user objects that have multi-byte UTF-8 characters (like ã, ç, é) in their Common Name (cn).
Environment:
- adLDAP Version: 5.2.17
- PHP Version: PHP 7.4.28
- LDAP Server: Microsoft Active Directory (configured to use UTF-8)
Bug Description
When updating or moving an existing user whose cn contains UTF-8 characters, the library incorrectly escapes these characters, which corrupts the cn stored in Active Directory.
Example Scenario:
- A user exists in AD with a correct cn: Anita Cantora joão
- The distinguishedName is: CN=Anita Cantora joão,OU=Users,DC=domain,DC=com
- When I perform an operation like moving the user to another OU using $user->move(...) or updating another attribute using $user->save(), the user's cn in Active Directory gets corrupted to Anita Cantora jo\C3\A3o.
This appears to happen because the escaping mechanism in the library is not UTF-8 aware. It treats the two bytes of the UTF-8 sequence for "ã" (C3 A3) as individual characters to be escaped, so the stored cn ends up containing the malformed text \C3\A3 instead of "ã".
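The stored value is itself a clue. In an RFC 4514 DN string, \C3\A3 is a valid escape (each \XX pair stands for one byte), so a DN containing jo\C3\A3o should be read back as "joão". A literal \C3\A3 in the stored cn is consistent with the value being escaped twice: it already contained \C3\A3 when it reached the escape step (an assumption; one possible source is PHP's ldap_explode_dn(), which returns non-ASCII characters hex-escaped), and escaping then turned each backslash into \5c. The sketch below uses a hypothetical decodeDnValue() helper, not part of Adldap, to show both cases.

```php
<?php
// Hypothetical helper (not part of Adldap): decode an RFC 4514 attribute value.
// "\XX" (two hex digits) becomes the byte 0xXX; "\c" becomes the character c.
function decodeDnValue(string $escaped): string
{
    return preg_replace_callback(
        '/\\\\([0-9A-Fa-f]{2}|.)/s',
        function (array $m): string {
            return strlen($m[1]) === 2 ? chr(hexdec($m[1])) : $m[1];
        },
        $escaped
    );
}

// Escaped once: "\C3\A3" is a valid hex-pair escape and decodes back to "ã".
$once = 'Anita Cantora jo\C3\A3o';
echo decodeDnValue($once), PHP_EOL;   // Anita Cantora joão

// Escaped twice: each backslash became "\5c", so decoding yields the literal text.
$twice = str_replace('\\', '\\5c', $once);
echo $twice, PHP_EOL;                 // Anita Cantora jo\5cC3\5cA3o
echo decodeDnValue($twice), PHP_EOL;  // Anita Cantora jo\C3\A3o
```

If this is what happens, escaping the backslash (which any RFC 4514 escaper must do, including the replacement suggested below) would not be enough on its own; the hex escapes would also need to be decoded before the value is escaped again.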
This issue does not occur during user creation (createUser) if the DN is manually constructed, but it consistently happens on updates or moves that involve re-assembling the RDN.
Code to Reproduce
Here's a minimal example of how the bug is triggered:
// Assume $adldap is a connected Adldap instance
$adAPI = new ActiveDirectoryAPI($adldap); // My wrapper class
// Find an existing user with a UTF-8 character in their name
$user = $adAPI->adldap->users()->find('some.user');
// The name is correct at this point in AD: "Anita Cantora joão"
// var_dump($user->getCommonName()); // This correctly shows "Anita Cantora joão" in PHP
// Define the target OU
$targetOuDn = 'OU=NewOU,DC=domain,DC=com';
// This call triggers the bug. The library internally uses `getDnBuilder()->assembleCns()`
// which calls the faulty escape function.
$rdn = $user->getDnBuilder()->assembleCns();
$user->move($rdn, $targetOuDn, true);
// After this operation, the user's CN in AD is corrupted.
Root Cause Analysis
The root of the problem lies in Adldap\Objects\DistinguishedName::assembleRdns(). This method calls Adldap\Classes\Utilities::escape($value, '', 2) to escape the RDN value.
// In Adldap\Objects\DistinguishedName
protected function assembleRdns($attribute, array $values = [])
{
// ...
$values = array_map(function ($value) use ($attribute) {
// This escape call corrupts UTF-8 strings
return sprintf('%s=%s', $attribute, Utilities::escape($value, '', 2));
}, $values);
// ...
}
The Utilities::escape() function (and its internal escapeManualWithFlags()) is not multi-byte safe. It iterates over a predefined list of characters to escape (backslash, comma, plus sign, and so on) but doesn't handle UTF-8 characters correctly, breaking them into invalid escape sequences.
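When tracking down which step introduces the escape, it helps to look at the raw bytes. In UTF-8 every byte of a multi-byte character is 0x80 or above, while all DN special characters are plain ASCII (below 0x80), so a replacement that only matches those ASCII characters cannot split ã, ç or é; if C3 A3 shows up in the stored value, some step hex-escaped those bytes. The hexBytes() function below is a throwaway helper for this illustration, not part of Adldap.

```php
<?php
// Throwaway helper: list the bytes PHP stores for a string, as hex.
// PHP strings are byte arrays, so str_split() yields single bytes, not characters.
function hexBytes(string $value): array
{
    return array_map(
        fn (string $byte): string => sprintf('%02X', ord($byte)),
        str_split($value)
    );
}

echo implode(' ', hexBytes('ã')), PHP_EOL;   // C3 A3
echo implode(' ', hexBytes('ç')), PHP_EOL;   // C3 A7
echo implode(' ', hexBytes('é')), PHP_EOL;   // C3 A9

// Smallest byte found in these multi-byte characters (expected to be >= 0x80)...
$minUtf8Byte = min(array_map('ord', str_split('ãçé')));
// ...versus the largest byte among the DN special characters (all ASCII, < 0x80).
$maxSpecial = max(array_map('ord', str_split("\\,+\"<>;=# \0")));
echo ($minUtf8Byte >= 0x80 && $maxSpecial < 0x80) ? 'no overlap' : 'overlap', PHP_EOL; // no overlap
```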
Suggested Correction
Since PHP's native ldap_escape() function with the LDAP_ESCAPE_DN flag is also not guaranteed to be UTF-8 safe across all environments (and the library falls back to its manual escape when ldap_escape() is unavailable), a dedicated escape built on str_replace() is more predictable. I suggest modifying Adldap\Classes\Utilities::escape() to handle DN escaping with such a UTF-8 compatible method.
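A robust way to escape DN values is to follow RFC 4514 (section 2.4). The double quote ("), plus sign (+), comma (,), semicolon (;), less-than and greater-than signs (< and >), and backslash (\) must be escaped wherever they appear; a space must be escaped at the beginning or end of the value; # must be escaped at the beginning; and the NUL byte must be escaped as \00. Other characters, such as =, may be escaped as well.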
Here is a proposed replacement for the escapeManualWithFlags method (or a new method to be called by escape when $flags = 2):
File: Adldap/Classes/Utilities.php
/**
* Escapes a value for use in a Distinguished Name, with UTF-8 support.
* This is a more reliable replacement for ldap_escape with LDAP_ESCAPE_DN.
*
* @param string $value The value to escape.
* @return string The escaped value.
*/
private static function escapeDnValue($value)
{
// 1. First, escape the backslash itself (so the backslashes added below are not escaped again)
$value = str_replace('\\', '\\5c', $value);
// 2. Then, escape the other special characters (and NUL) from RFC 4514
$specialChars = [',', '+', '"', '<', '>', ';', '=', "\0"];
foreach ($specialChars as $char) {
$value = str_replace($char, '\\' . bin2hex($char), $value);
}
// 3. Handle trailing/leading spaces and a leading '#'
//    (trailing first, so a value of a single space becomes "\ " rather than "\\ ")
if (substr($value, -1) === ' ') {
$value = substr($value, 0, -1) . '\\ ';
}
if (substr($value, 0, 1) === ' ') {
$value = '\\ ' . substr($value, 1);
}
if (substr($value, 0, 1) === '#') {
$value = '\\#' . substr($value, 1);
}
return $value;
}
/**
* Returns an escaped string for use in an LDAP filter or DN.
*
* @param string $value
* @param string $ignore
* @param int $flags
*
* @return string
*/
public static function escape($value, $ignore = '', $flags = 0)
{
// Use the new, more reliable DN escaping method when the flag is set.
if ($flags === 2) { // Corresponds to LDAP_ESCAPE_DN
return self::escapeDnValue($value);
}
// Fallback to the original logic for other cases (like filter escaping).
if (self::isEscapingSupported()) {
return ldap_escape($value, $ignore, $flags);
}
return self::escapeManual($value, $ignore, $flags);
}
Correction Explanation:
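The new escapeDnValue method escapes values according to RFC 4514. Instead of iterating over characters, it uses str_replace, which is binary-safe. Every character it looks for is single-byte ASCII (below 0x80), while every byte of a multi-byte UTF-8 character is 0x80 or above, so a match can never land inside ã, ç or é, and those bytes pass through unchanged. Each special character is replaced with its \xx hex representation (a leading or trailing space and a leading # are escaped as "\ " and "\#"). The main escape function is modified to call this new method specifically when escaping for a DN ($flags = 2).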
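The argument about str_replace can be checked in isolation. Below is a standalone sketch of the proposed escapeDnValue() logic written as a free function so it runs without the library; it is a copy for experimenting, not Adldap's API. It escapes the NUL byte as \00 (required by RFC 4514) and handles a trailing space before a leading one, so that a value consisting of a single space comes out as "\ ".

```php
<?php
// Standalone copy of the proposed escapeDnValue() logic as a free function,
// so it can be tried without the library (not Adldap's actual API).
function escapeDnValue(string $value): string
{
    // Escape backslashes first so the ones added below are not escaped again.
    $value = str_replace('\\', '\\5c', $value);

    // Characters escaped anywhere in the value, as \xx hex (NUL becomes \00).
    foreach ([',', '+', '"', '<', '>', ';', '=', "\0"] as $char) {
        $value = str_replace($char, '\\' . bin2hex($char), $value);
    }

    // Trailing space first, so a value of a single space becomes "\ ".
    if (substr($value, -1) === ' ') {
        $value = substr($value, 0, -1) . '\\ ';
    }
    // A leading space or '#' is escaped by prefixing a backslash.
    $first = substr($value, 0, 1);
    if ($first === ' ' || $first === '#') {
        $value = '\\' . $value;
    }

    return $value;
}

echo escapeDnValue('Anita Cantora joão'), PHP_EOL; // Anita Cantora joão
echo escapeDnValue('Doe, John'), PHP_EOL;          // Doe\2c John
echo escapeDnValue('a+b=c'), PHP_EOL;              // a\2bb\3dc
echo escapeDnValue('#1'), PHP_EOL;                 // \#1
```

For a name such as "Anita Cantora joão", which contains no special characters, the output is byte-for-byte identical to the input; RFC 4514 allows UTF-8 characters to appear unescaped in a DN string.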
This change should ensure that cn values with UTF-8 characters are preserved during move and save operations.
Thank you for your time and for this great library.