Skip to content

Fix PHP find_symbol: route by file extension, handle flat SymbolInformation#1097

Open
EarthmanWeb wants to merge 6 commits intooraios:mainfrom
EarthmanWeb:fix/php-symbol-routing
Open

Fix PHP find_symbol: route by file extension, handle flat SymbolInformation#1097
EarthmanWeb wants to merge 6 commits intooraios:mainfrom
EarthmanWeb:fix/php-symbol-routing

Conversation

@EarthmanWeb
Copy link
Contributor

Problem

Two related issues cause find_symbol to fail or return wrong results in PHP projects:

1. Incorrect language server routing
In multi-language projects, get_language_server() routed files using
is_ignored_path() — a method that checks gitignore and unsupported
extension filters. A PHP file could pass or fail this check for the
wrong language server depending on configuration order, causing PHP
files to be routed to the Python LS (or vice versa).

2. Flat SymbolInformation[] from Intelephense
For some PHP files, Intelephense returns SymbolInformation[] (the
older LSP format) instead of DocumentSymbol[]. These symbols have no
parent-child hierarchy — methods appear as root-level symbols with only
a containerName string field linking them to their class.

find_symbol("ClassName/method") failed silently because
iter_name_path_components_reversed() never yielded the container,
so the name-path match never succeeded.

3. Hard error on explicitly-requested ignored file
When get_symbols_overview() was called on a file that happened to
match an ignore pattern, it raised a hard error and returned []
instead of proceeding. This broke workflows where the caller explicitly
requested a specific file.

Fix

  • ls_manager.py: Route by fn_matcher.is_relevant_filename()
    checks the file extension against the language's known extensions —
    instead of is_ignored_path().

  • symbol.py: Added containerName fallback in
    iter_name_path_components_reversed(). When a symbol has no
    ancestors (flat SymbolInformation), yield the containerName value
    so name-path patterns like "ClassName/method" still match.

  • ls.py: Downgraded hard error to a warning when an explicitly
    requested file is ignored; proceed with the symbol request anyway.

Example

Before:

# In a Python+PHP project:
find_symbol("TestWebhookHandler/handle_request")
# → [] (routed to wrong LS, or containerName not traversed)

After:

find_symbol("TestWebhookHandler/handle_request")
# → [<LanguageServerSymbol: handle_request>]

…mation

Three related fixes for PHP symbol resolution:

1. ls_manager.get_language_server(): route to language server by file
   extension (via get_source_fn_matcher) instead of is_ignored_path, so
   PHP files are correctly dispatched to Intelephense.

2. ls.get_symbols_overview(): when a file is explicitly requested but
   matches an ignore pattern, warn and proceed instead of returning an
   empty list.

3. symbol.iter_name_path_components_reversed(): add containerName fallback
   for flat SymbolInformation responses (Intelephense returns these for some
   PHP files), enabling name-path patterns like "ClassName/method" to match.
Comment on lines +340 to +352
has_yielded_ancestor = False
for ancestor in self.iter_ancestors(up_to_symbol_kind=SymbolKind.File):
yield NamePathComponent(ancestor.name, ancestor.overload_idx)
has_yielded_ancestor = True
if not has_yielded_ancestor:
# Fallback for flat SymbolInformation (e.g. Intelephense returns SymbolInformation[]
# instead of DocumentSymbol[] for some PHP files). In that case symbols have no
# parent-child links, but may carry a containerName that identifies the enclosing
# class or function. Yield it so that name-path patterns like "ClassName/method"
# still match correctly.
container_name = self.symbol_root.get("containerName")
if container_name:
yield NamePathComponent(container_name, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has to be solved in the language server, not here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By that I mean the SolidLanguageServer implementation, not Intelephense itself.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed — moved the fix to Intelephense._request_document_symbols in ce6ddf3. The override detects flat SymbolInformation[] (by top-level "location" key) and calls a new _reconstruct_document_symbols static method that rebuilds the proper DocumentSymbol[] tree using containerName before returning. symbol.py is reverted to its original form.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

This means that all cached raw document symbol data needs to be invalidated, however, and we must increment cache_version_raw_document_symbols (which currently uses the default value 1) in the super constructor call to achieve this.

Comment on lines +139 to +145
if not candidate.is_ignored_path(relative_path, ignore_unsupported_files=True):
fn_matcher = candidate.language.get_source_fn_matcher()
if fn_matcher.is_relevant_filename(relative_path):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you change this? is_ignored_path with ignore_unsupported_files=True has the same semantics for files.

serena/src/solidlsp/ls.py

Lines 580 to 599 in 3d4610f

def is_ignored_path(self, relative_path: str, ignore_unsupported_files: bool = True) -> bool:
"""
Determine if a path should be ignored based on file type
and ignore patterns.
:param relative_path: Relative path to check
:param ignore_unsupported_files: whether files that are not supported source files should be ignored
:return: True if the path should be ignored, False otherwise
"""
abs_path = os.path.join(self.repository_root_path, relative_path)
if not os.path.exists(abs_path):
raise FileNotFoundError(f"File {abs_path} not found, the ignore check cannot be performed")
# Check file extension if it's a file
is_file = os.path.isfile(abs_path)
if is_file and ignore_unsupported_files:
fn_matcher = self.language.get_source_fn_matcher()
if not fn_matcher.is_relevant_filename(abs_path):
return True

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to is ignored path are leaked over from other changes to allow an option in config to allow dotfiles and to allow gitignored files. That was negating the flags that were set in the config

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LS implementations ignore only directories starting with ".", not files. For files, the logic I cited applies.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both addressed:

symbol.py — moved to the correct layer in ce6ddf3a; Intelephense now reconstructs the DocumentSymbol[] tree from flat SymbolInformation[] before it reaches shared code.

ls_manager.py — you're right, is_ignored_path already does the extension check at ls.py:596–599. Reverted in 50037339. The bypass came from confusion with changes in another branch that don't actually affect file routing here.

Intelephense can return flat SymbolInformation[] (no parent-child links)
instead of DocumentSymbol[] for some PHP files. The previous fix placed a
containerName fallback inside LanguageServerSymbol.iter_name_path_components_reversed,
polluting the generic symbol layer with a language-server-specific quirk.

Moved the fix to the correct layer per reviewer feedback:

- intelephense.py: Override _request_document_symbols to detect flat
  SymbolInformation[] (identified by a top-level "location" key) and
  reconstruct a proper DocumentSymbol[] tree using containerName to wire
  up parent-child relationships before the result reaches shared code.

- symbol.py: Revert iter_name_path_components_reversed to its original
  clean form; no Intelephense-specific knowledge belongs here.

- test_symbol.py: Remove TestLanguageServerSymbolFlatContainerName, which
  tested the now-removed containerName fallback in the generic class.

- test_php_basic.py: Add TestIntelephenseReconstructDocumentSymbols with
  four unit tests covering _reconstruct_document_symbols directly (root
  symbols, method-to-class attachment, orphan handling, two-class
  separation). The existing integration test test_find_symbol_class_method
  continues to cover the end-to-end path through Intelephense.
The previous implementation bypassed is_ignored_path in favour of calling
fn_matcher.is_relevant_filename directly when routing a file path to the
correct language server in a multi-LS configuration.

This was unnecessary: is_ignored_path(relative_path,
ignore_unsupported_files=True) already performs the exact same file-extension
check via fn_matcher.is_relevant_filename internally (ls.py:596-599) before
it evaluates directory ignore rules or gitignore patterns.

The bypass was introduced due to confusion with a separate in-progress branch
where unrelated changes to is_ignored_path were inadvertently negating the
extension check for files. As noted in PR review, the dot-directory guard
(is_ignored_dirname) applies only to directory path components, not to files
themselves, so those changes do not affect file-extension routing here.

Reverts get_language_server to its original is_ignored_path call and removes
the expanded docstring added with the bypass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants