Scripts to help guide cleanup of #include lines in a codebase, using clangd
- add_to_remove_include.py - Determine which missing include edges need to be added to remove a specific include
- apply_include_changes.py - Apply include changes to files in the source tree
- extract_archived_include_analysis.py - Extract archived include analysis JSON
- filter_include_changes.py - Filter include changes output
- include_analysis_diff.py - Analyze differences between an include analysis output and previous ones
- list_includers.py - List includers of a file
- list_transitive_includes.py - List transitive (and direct) includes of a file
- post_process_compilation_db.py - Post-process the clang compilation database for analysis
- recalculate_expanded_sizes.py - Recalculate translation unit expanded sizes if all provided include changes were applied
- set_edge_weights.py - Set edge weights in include changes output
- suggest_include_changes.py - Suggests includes to add and remove
- trace_transitive_include.py - Trace a transitive include from a source file
To use these scripts, you'll need:
- A release of clangd which has "IncludeCleaner" with support for missing includes (17.0.0+)
- The full output of //tools/clang/scripts/analyze_includes.py, see discussion on the mailing list for how to generate it
- A compilation database for clangd to use, which can be generated with gn gen . --export-compile-commands in the Chromium output directory
  - The generated compile_commands.json should be post-processed with the post_process_compilation_db.py script for best results
$ pip install -r ~/chromium-include-cleanup/requirements.txt

You need to enable MissingIncludes and UnusedIncludes diagnostics in a clangd config file:

Diagnostics:
  MissingIncludes: Strict
  UnusedIncludes: Strict

These instructions assume you've already built and processed the build log with //tools/clang/scripts/analyze_includes.py. If you haven't, see the link above under "Prerequisites". They also assume the output is at ~/include-analysis.js, so adjust to taste.
This also assumes you have clangd on your $PATH.
$ cd ~/chromium/src/out/Default
$ gn gen . --export-compile-commands
$ python3 ~/chromium-include-cleanup/post_process_compilation_db.py compile_commands.json > compile_commands-fixed.json
$ mv compile_commands-fixed.json compile_commands.json
$ cd ../../
$ python3 ~/chromium-include-cleanup/suggest_include_changes.py --compile-commands-dir=out/Default ~/include-analysis.js > ~/unused-edges.csv
$ python3 ~/chromium-include-cleanup/set_edge_weights.py ~/unused-edges.csv ~/include-analysis.js --config ~/chromium-include-cleanup/configs/chromium.json > ~/weighted-unused-edges.csv

Another useful option is --filename-filter=^base/, which lets you filter the files which will be analyzed. Limiting the analysis to a subset of the codebase can speed things up considerably.
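
As a rough illustration of what a filter like ^base/ selects, here is a minimal sketch. The helper filter_filenames and the sample paths are hypothetical, and whether the real script searches or fully matches the pattern is an assumption here.

```python
import re


def filter_filenames(filenames, pattern):
    """Keep only filenames where the regex is found (hypothetical helper).

    Assumes search semantics, so a leading ^ anchors at the start of the path.
    """
    regex = re.compile(pattern)
    return [name for name in filenames if regex.search(name)]


# Hypothetical paths, relative to the source root.
files = [
    "base/files/file_path.cc",
    "chrome/browser/base/foo.cc",
    "net/base/url_util.cc",
]
selected = filter_filenames(files, r"^base/")
print(selected)  # ['base/files/file_path.cc']
```

Only the path that starts with base/ survives. The other two contain base/ further along, which the ^ anchor rules out.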
Edge weights are set in a separate script to allow quick iteration, since
suggest_include_changes.py takes many hours to run. The default metric
for edge weights pulls the "Added Size" metric from the include analysis
output. This means new weights can easily be applied to the output of
suggest_include_changes.py by downloading the latest hosted include
analysis output at https://commondatastorage.googleapis.com/chromium-browser-clang/include-analysis.js,
but mileage may vary since you're combining output from your local build
and the hosted build.
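
The idea of keeping weighting separate can be sketched as a cheap lookup over already-computed edges. The CSV layout (includer, included) and the added_sizes table below are stand-ins for illustration, not the real formats used by set_edge_weights.py or the include analysis output.

```python
import csv
import io


def apply_edge_weights(rows, added_sizes):
    """Attach a weight to each (includer, included) edge (hypothetical helper).

    Edges missing from the lookup table get a weight of 0.
    """
    return [
        (includer, included, added_sizes.get((includer, included), 0))
        for includer, included in rows
    ]


# Hypothetical edge list, as if read from a previous long-running analysis.
edges_csv = "a.cc,b.h\na.cc,c.h\n"
rows = [tuple(row) for row in csv.reader(io.StringIO(edges_csv))]

# Hypothetical "Added Size" values keyed by edge.
added_sizes = {("a.cc", "b.h"): 1200}

weighted = apply_edge_weights(rows, added_sizes)
print(weighted)
```

Because the expensive step only produces the edge list, swapping in a different added_sizes table (for example, from a newer analysis) only requires rerunning the lookup.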
A full codebase run of the suggest_include_changes.py script on Ubuntu takes
7 hours on a 4 core, 8 thread machine. clangd is highly parallel though, and
the script is configured to use all available logical CPUs, so it will scale
well on beefier machines.
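
A minimal sketch of fanning per-file work out across all logical CPUs follows. How the real script distributes work is not described here, so analyze_file is a placeholder and the thread pool is an assumption chosen for illustration.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def analyze_file(filename):
    """Placeholder for per-file analysis; returns the name and its length."""
    return (filename, len(filename))


def analyze_all(filenames, workers=None):
    """Run analyze_file over all filenames, one worker per logical CPU by default."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(analyze_file, filenames))


results = analyze_all(["a.cc", "bb.h"])
print(results)
```

os.cpu_count() reports logical CPUs, so an 8 thread machine would get 8 workers under this sketch.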
Currently the suggest_include_changes.py script has problems with suggesting
includes to remove when the filename in the #include line does not match the
filename in the include analysis output, which could happen for includes
inside third-party code which is including relative to itself, not the source
root.
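
To make the mismatch concrete, the sketch below resolves an include spelled relative to the including file and compares it with the spelling in the #include line. The paths and the helper resolve_relative_include are hypothetical, and real third-party code may also resolve includes through extra include directories, which this ignores.

```python
import posixpath


def resolve_relative_include(includer, spelled):
    """Resolve an include spelled relative to the includer's directory.

    Both arguments use forward slashes; the result is relative to the source root.
    """
    includer_dir = posixpath.dirname(includer)
    return posixpath.normpath(posixpath.join(includer_dir, spelled))


# Hypothetical third-party file including a sibling header relative to itself.
includer = "third_party/foo/src/foo.cc"
spelled = "util/bar.h"

resolved = resolve_relative_include(includer, spelled)
print(spelled)   # util/bar.h
print(resolved)  # third_party/foo/src/util/bar.h
```

The #include line says util/bar.h while a root-relative listing would say third_party/foo/src/util/bar.h, so a direct string comparison between the two fails.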
When suggesting includes to add, clangd will sometimes suggest headers which
are internal to the standard library, like <__hash_table>, rather than the
public header. Unfortunately these cases can't be disambiguated by this script,
since there's not enough information to work off of.
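
While the right public header can't be recovered automatically, suspicious suggestions can at least be flagged for manual review. The sketch below uses a naming heuristic based on the <__hash_table> example, a leading double underscore. That is an assumption, not something these scripts are known to do.

```python
def looks_like_internal_std_header(header):
    """Heuristic: flag headers whose name starts with a double underscore.

    Accepts names with or without angle brackets (hypothetical helper).
    """
    name = header.strip("<>")
    return name.startswith("__")


suggestions = ["<__hash_table>", "<unordered_map>", "base/files/file_path.h"]
flagged = [h for h in suggestions if looks_like_internal_std_header(h)]
print(flagged)  # ['<__hash_table>']
```

This only marks entries to double-check by hand. It does not say which public header should replace them.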
These scripts rely on clangd and specifically the "IncludeCleaner" feature
to determine which includes are unused, and which headers need to be added.
With the Chromium codebase, there are many places where clangd will return
false positives, suggesting that an include is not used when it actually is.
As such, the output is more of a guide than something which can be used as-is
in an automated situation.
Known situations in Chromium where clangd will produce false positives:
- When an include is only used for a friend class declaration
- When the code using an include is inside an #ifdef not used on the system which built the codebase
- Macros in general are often a struggle point
- Umbrella headers
- Certain forward declarations seem to be flagged incorrectly as the canonical location for a symbol, such as "base/callback_forward.h"
- Forward declarations in the file being analyzed
  - clangd won't consider an include unused even if forward declarations exist which make it unnecessary
  - clangd will still suggest an include even if a forward declaration makes it unnecessary
  - In some circumstances the presence of an incorrect forward declaration will stop clangd from suggesting a missing include