
Bug in adldap2/adldap2 5.2.17: DN/RDN escaping corrupts UTF-8 characters on move() and save() operations #147

@insinfo

Hello,

I've encountered a character encoding bug when performing move() or save() operations on user objects that have multi-byte UTF-8 characters (like ã, ç, é) in their Common Name (cn).

Environment:

  • adLDAP Version: 5.2.17
  • PHP Version: PHP 7.4.28
  • LDAP Server: Microsoft Active Directory (configured to use UTF-8)

Bug Description

When updating or moving an existing user whose cn contains UTF-8 characters, the library incorrectly escapes these characters, corrupting the value stored in Active Directory.

Example Scenario:

  1. A user exists in AD with a correct cn: Anita Cantora joão
  2. The distinguishedName is: CN=Anita Cantora joão,OU=Users,DC=domain,DC=com
  3. When I perform an operation like moving the user to another OU using $user->move(...) or updating another attribute using $user->save(), the user's cn in Active Directory gets corrupted to Anita Cantora jo\C3\A3o.

This happens because the escaping mechanism in the library is not UTF-8 aware. It treats the multi-byte sequence for "ã" (C3 A3) as individual characters to be escaped, resulting in the malformed \C3\A3.
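In UTF-8, "ã" (U+00E3) is encoded as the two bytes C3 A3, so an escaper that hex-escapes individual non-ASCII bytes emits exactly the \C3\A3 pair seen in the corrupted cn. The sketch below only reproduces that pattern; escapeEachHighByte() is a hypothetical helper for illustration, not Adldap2 code.

```php
<?php
// Hypothetical helper (not part of Adldap2): escapes every byte >= 0x80
// as a backslash plus two upper-case hex digits, which is what a
// byte-oriented, non-UTF-8-aware escaper would produce.
function escapeEachHighByte(string $value): string
{
    // No /u modifier, so PCRE matches single bytes rather than characters.
    return preg_replace_callback('/[\x80-\xFF]/', function (array $m): string {
        return '\\' . strtoupper(bin2hex($m[0]));
    }, $value);
}

// "ã" is stored as two bytes in UTF-8.
echo bin2hex('ã'), PHP_EOL;                             // c3a3
echo escapeEachHighByte('Anita Cantora joão'), PHP_EOL; // Anita Cantora jo\C3\A3o
```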

This issue does not occur during user creation (createUser) if the DN is manually constructed, but it consistently happens on updates or moves that involve re-assembling the RDN.

Code to Reproduce

Here's a minimal example of how the bug is triggered:

// Assume $adldap is a connected Adldap instance
$adAPI = new ActiveDirectoryAPI($adldap); // My wrapper class

// Find an existing user with a UTF-8 character in their name
$user = $adAPI->adldap->users()->find('some.user');

// The name is correct at this point in AD: "Anita Cantora joão"
// var_dump($user->getCommonName()); // This correctly shows "Anita Cantora joão" in PHP

// Define the target OU
$targetOuDn = 'OU=NewOU,DC=domain,DC=com';

// This call triggers the bug. The library internally uses `getDnBuilder()->assembleCns()`
// which calls the faulty escape function.
$rdn = $user->getDnBuilder()->assembleCns(); 
$user->move($rdn, $targetOuDn, true); 

// After this operation, the user's CN in AD is corrupted.

Root Cause Analysis
The root of the problem lies in Adldap\Objects\DistinguishedName::assembleRdns(). This method calls Adldap\Classes\Utilities::escape($value, '', 2) to escape the RDN value.

// In Adldap\Objects\DistinguishedName
protected function assembleRdns($attribute, array $values = [])
{
    // ...
    $values = array_map(function ($value) use ($attribute) {
        // This escape call corrupts UTF-8 strings
        return sprintf('%s=%s', $attribute, Utilities::escape($value, '', 2));
    }, $values);
    // ...
}
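For reference, the shape of what assembleRdns() does (turn each value into ATTRIBUTE=escaped value and join the pieces) can be sketched standalone. assembleRdnsSketch() and escapeDnSpecials() are hypothetical names; joining with commas is an assumption, since the excerpt above elides that part, and the escaper is deliberately simplified (no leading/trailing space or leading # handling), so this is not the library's implementation.

```php
<?php
// Hypothetical, simplified DN-value escaper: hex-escapes only the RFC 4514
// special characters (backslash first, so later escapes are not re-escaped).
// Leading/trailing space and leading '#' handling are omitted for brevity.
function escapeDnSpecials(string $value): string
{
    $value = str_replace('\\', '\\5c', $value);
    foreach ([',', '+', '"', '<', '>', ';', '='] as $char) {
        $value = str_replace($char, '\\' . bin2hex($char), $value);
    }
    return $value;
}

// Hypothetical stand-in for DistinguishedName::assembleRdns(): builds
// "ATTR=value" pairs and joins them with commas (assumed behaviour).
function assembleRdnsSketch(string $attribute, array $values): string
{
    return implode(',', array_map(function (string $value) use ($attribute): string {
        return sprintf('%s=%s', $attribute, escapeDnSpecials($value));
    }, $values));
}

echo assembleRdnsSketch('CN', ['Anita Cantora joão']), PHP_EOL;   // CN=Anita Cantora joão
echo assembleRdnsSketch('OU', ['Sales, East', 'Users']), PHP_EOL; // OU=Sales\2c East,OU=Users
```

Because every special character is plain ASCII and every byte of a multi-byte UTF-8 sequence is 0x80 or above, str_replace() on these characters cannot split "ã".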

The Utilities::escape() function (and its internal escapeManualWithFlags()) is not multi-byte safe. It iterates over a predefined list of characters to escape (backslash, comma, plus sign, etc.) but doesn't handle UTF-8 characters correctly, breaking them into invalid escape sequences.
Suggested Correction
Since PHP's native ldap_escape() function with the LDAP_ESCAPE_DN flag is also not guaranteed to be UTF-8 safe across all environments, and a manual str_replace()-based escape is more predictable, I suggest modifying Adldap\Classes\Utilities::escape() to handle DN escaping with a UTF-8 compatible method.
A robust way to escape DN values is to handle the special characters defined in RFC 4514: comma (,), plus (+), double quote ("), backslash (\), less-than (<), greater-than (>), semicolon (;), and equals (=) anywhere in the value, plus a # at the start of the value and a space at the start or end.
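For checking an escaper against these rules, the inverse operation is handy. unescapeDnValue() below is a hypothetical helper (not part of Adldap2 or PHP) that decodes the two RFC 4514 escape forms: a backslash followed by two hex digits, or a backslash followed by one literal character. One consequence worth noting: \C3\A3 is itself a valid hex-pair escape that decodes back to "ã", whereas a doubly escaped \5cC3\5cA3 decodes to the literal text \C3\A3.

```php
<?php
// Hypothetical decoder for RFC 4514 escapes in a single attribute value:
//   \XX (two hex digits) -> the byte 0xXX
//   \c  (any other char) -> the character c itself
function unescapeDnValue(string $value): string
{
    return preg_replace_callback('/\\\\(?:([0-9A-Fa-f]{2})|(.))/s', function (array $m): string {
        // Group 1 is non-empty for a hex pair; otherwise group 2 holds the char.
        return $m[1] !== '' ? chr(hexdec($m[1])) : $m[2];
    }, $value);
}

echo unescapeDnValue('jo\\C3\\A3o'), PHP_EOL;     // joão
echo unescapeDnValue('Smith\\2c John'), PHP_EOL;  // Smith, John
echo unescapeDnValue('jo\\5cC3\\5cA3o'), PHP_EOL; // jo\C3\A3o
```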
Here is a proposed replacement for the escapeManualWithFlags method (or a new method to be called by escape when $flags = 2):
File: Adldap/Classes/Utilities.php

    /**
     * Escapes a value for use in a Distinguished Name, with UTF-8 support.
     * This is a more reliable replacement for ldap_escape with LDAP_ESCAPE_DN.
     *
     * @param string $value The value to escape.
     * @return string The escaped value.
     */
    private static function escapeDnValue($value)
    {
        // 1. First, escape the backslash itself
        $value = str_replace('\\', '\\5c', $value);
        
        // 2. Then, escape the other special characters defined in RFC 4514
        //    (NUL included, which RFC 4514 requires to be escaped as \00)
        $specialChars = [',', '+', '"', '<', '>', ';', '=', "\0"];
        foreach ($specialChars as $char) {
            $value = str_replace($char, '\\' . bin2hex($char), $value);
        }
        
        // 3. Handle a leading space or '#', and a trailing space
        if (substr($value, 0, 1) === ' ' || substr($value, 0, 1) === '#') {
            $value = '\\' . $value;
        }
        // Skip the trailing check when the whole value was a single space,
        // which the leading-space check above has already turned into "\ "
        if (substr($value, -1) === ' ' && $value !== '\\ ') {
            $value = substr($value, 0, -1) . '\\ ';
        }
        
        return $value;
    }

    /**
     * Returns an escaped string for use in an LDAP filter or DN.
     *
     * @param string $value
     * @param string $ignore
     * @param int    $flags
     *
     * @return string
     */
    public static function escape($value, $ignore = '', $flags = 0)
    {
        // Use the new, more reliable DN escaping method when the flag is set.
        if ($flags === 2) { // Corresponds to LDAP_ESCAPE_DN
             return self::escapeDnValue($value);
        }
        
        // Fallback to the original logic for other cases (like filter escaping).
        if (self::isEscapingSupported()) {
            return ldap_escape($value, $ignore, $flags);
        }

        return self::escapeManual($value, $ignore, $flags);
    }

Correction Explanation:
The new escapeDnValue method correctly handles escaping according to RFC 4514. Instead of iterating over bytes, it uses str_replace which is binary-safe and works correctly with UTF-8 strings in PHP. It replaces each special character with its \xx hex representation. The main escape function is modified to call this new method specifically when escaping for a DN ($flags = 2).
This change ensures that cn values with UTF-8 characters are correctly preserved during move and save operations.
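The escaping logic proposed above can be exercised without a directory by copying it into a free function. In this standalone sketch escapeDnValue() is a plain function rather than the Utilities method; it also escapes NUL as \00 and leaves a value consisting of a single space as "\ " instead of escaping it twice.

```php
<?php
// Standalone sketch of the proposed DN-value escaping (RFC 4514, section 2.4),
// written as a free function so it can be run without Adldap2.
function escapeDnValue(string $value): string
{
    // Backslash first, so the escapes added below are not escaped again.
    $value = str_replace('\\', '\\5c', $value);

    // Characters escaped anywhere in the value, plus NUL.
    foreach ([',', '+', '"', '<', '>', ';', '=', "\0"] as $char) {
        $value = str_replace($char, '\\' . bin2hex($char), $value);
    }

    // A leading space or '#' is escaped with a backslash.
    if ($value !== '' && ($value[0] === ' ' || $value[0] === '#')) {
        $value = '\\' . $value;
    }

    // A trailing space is escaped too, unless the value was a single space
    // that the previous step already turned into "\ ".
    if (substr($value, -1) === ' ' && $value !== '\\ ') {
        $value = substr($value, 0, -1) . '\\ ';
    }

    return $value;
}

echo escapeDnValue('Anita Cantora joão'), PHP_EOL; // Anita Cantora joão
echo escapeDnValue('Smith, John'), PHP_EOL;        // Smith\2c John
echo escapeDnValue(' #1 '), PHP_EOL;               // \ #1\ 
echo escapeDnValue('jo\\C3\\A3o'), PHP_EOL;        // jo\5cC3\5cA3o
```

The last line shows one case that may be worth ruling out: if the value handed to the escaper already contains hex escapes (PHP's ldap_explode_dn(), for instance, commonly returns non-ASCII characters as \XX pairs), the backslash step escapes them a second time and Active Directory would store the literal text \C3\A3. Confirming that the value reaching escape() is the raw cn rather than an already-escaped RDN component would settle this.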
Thank you for your time and for this great library.
