@@ -82,3 +82,53 @@ When scanning binaries, the line numbers are just a relative indication of where
 a detection was found: there is no such thing as lines in a binary. The numbers
 reported are based on the strings extracted from the binaries, typically broken
 as new lines with each NULL character.
+
+
+How does ``--license-text`` work exactly in ScanCode?
+-------------------------------------------------------------
+
+Is the matched text that gets included in the result exactly the lines of text
+from the input file that are covered by the ``start_line`` and ``end_line``
+fields of the result? That is, if I post-processed the input file and extracted
+lines ``start_line`` to ``end_line`` from it, would I get exactly the
+``matched_text`` contents? Or is there some more "magic" involved when
+populating the ``matched_text`` field?
+
+ScanCode is a bit smarter than just start and end lines: matching is based on
+words, not lines, of the scanned text, so a whole line may not always be matched.
+
+For instance, with this command::
+
+    $ echo "Foo is a wonder piece of code. Licensed under the GPL. " \
+           "For support contact [email protected] " > tst
+    $ scancode --license --license-text --license-text-diagnostics --yaml - tst
+    ...
+    license_detections:
+      - license_expression: gpl-1.0-plus
+        license_expression_spdx: GPL-1.0-or-later
+        matches:
+          - license_expression: gpl-1.0-plus
+            license_expression_spdx: GPL-1.0-or-later
+            from_file: tst
+            start_line: 1
+            end_line: 1
+            matcher: 2-aho
+            score: '100.0'
+            matched_length: 4
+            match_coverage: '100.0'
+            rule_relevance: 100
+            rule_identifier: gpl_85.RULE
+            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl_85.RULE
+            matched_text: Foo is a wonder piece of code. Licensed under the GPL.
+              For support contact [email protected]
+            matched_text_diagnostics: Licensed under the GPL.
+    ...
+
+then:
+
+- ``matched_text`` is based on ``start_line`` and ``end_line``
+- ``matched_text_diagnostics`` is based on the exact matched words
+
+Note that ``matched_text_diagnostics`` also includes "tagged" gaps or extra
+unmatched words highlighted between the matched words.
+
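The difference between line-based and word-based extraction described in this hunk can be illustrated with a small Python sketch. It is not ScanCode's implementation: `lines_between` and `words_between` are hypothetical helpers, the word positions are supplied by hand rather than computed by a matcher, and the sample text is a shortened stand-in for the scanned file.

```python
def lines_between(text, start_line, end_line):
    """Return the whole lines from start_line to end_line (1-based, inclusive)."""
    lines = text.splitlines()
    return "\n".join(lines[start_line - 1:end_line])


def words_between(text, first_word, last_word):
    """Return only the words from first_word to last_word (0-based, inclusive)."""
    words = text.split()
    return " ".join(words[first_word:last_word + 1])


# Hypothetical one-line input, similar in shape to the "tst" file above.
scanned = (
    "Foo is a wonder piece of code. Licensed under the GPL. "
    "For support contact the authors."
)

# Line-based extraction returns the entire line, like ``matched_text``.
print(lines_between(scanned, 1, 1))

# Word-based extraction returns only the four matched words,
# like ``matched_text_diagnostics``.
print(words_between(scanned, 7, 10))  # Licensed under the GPL.
```

Under these assumptions, extracting lines 1 to 1 gives back the whole sentence group, while the word span gives only the license notice, which is why the two fields differ even though `start_line` and `end_line` are the same.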