Commit f437798

Add new FAQ entry on --license-text
Signed-off-by: Philippe Ombredanne <[email protected]>
1 parent 01fb718 commit f437798


1 file changed: +59 -0 lines changed


docs/source/misc/faq.rst

Lines changed: 59 additions & 0 deletions
@@ -82,3 +82,62 @@ When scanning binaries, the line numbers are just a relative indication of where
a detection was found: there is no such thing as lines in a binary. The numbers
reported are based on the strings extracted from the binaries, typically split
into new lines at each NULL character.

How does ``--license-text`` in ScanCode work exactly?
-------------------------------------------------------------

I have a question about how ``--license-text`` in ScanCode works exactly.
Is the matched text included in the results exactly the lines of text from the
input file that are covered by the ``start_line`` and ``end_line`` fields of the
result? That is, if I post-processed the input file and extracted the lines from
``start_line`` to ``end_line``, would I get exactly the ``matched_text``
contents? Or is there some more "magic" involved when populating the
``matched_text`` field?

ScanCode does more than take the text between a start and an end line: matching
is based on words, not on lines of the scanned text, so a match does not always
cover a whole line.

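
To make the word-versus-line distinction concrete, here is a toy sketch. It is
not ScanCode's matcher: the regex tokenizer and the ``find_rule`` helper are
hypothetical simplifications that only look for an exact, contiguous run of a
rule's words and then report the lines on which the first and last matched
words fall.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical, simplified tokenizer: lowercase runs of letters and digits.
# ScanCode's real tokenizer and matchers are more elaborate.
WORD = re.compile(r"[a-z0-9]+")


def tokenize_with_lines(text: str) -> List[Tuple[str, int]]:
    """Return (word, line_number) pairs; line numbers are 1-based."""
    tokens = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in WORD.findall(line.lower()):
            tokens.append((word, line_no))
    return tokens


def find_rule(text: str, rule_text: str) -> Optional[Tuple[int, int, List[str]]]:
    """Find the first exact, contiguous occurrence of the rule's words.

    Returns (start_line, end_line, matched_words), or None if absent.
    """
    tokens = tokenize_with_lines(text)
    words = [word for word, _ in tokens]
    rule = WORD.findall(rule_text.lower())
    if not rule:
        return None
    size = len(rule)
    for i in range(len(words) - size + 1):
        if words[i:i + size] == rule:
            # The line span is derived from the first and last matched words.
            return tokens[i][1], tokens[i + size - 1][1], words[i:i + size]
    return None


text = "Foo is a wonder piece of code. Licensed under the GPL. For support contact us.\n"
start_line, end_line, matched = find_rule(text, "Licensed under the GPL.")
print(start_line, end_line, len(matched))  # 1 1 4
# Slicing by line numbers returns the whole line, not just the matched words.
print(text.splitlines()[start_line - 1:end_line])
```

Only four words match, yet the reported line span covers the whole first line,
which also holds unrelated text before and after the license notice.
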
For instance, with this command::

    $ echo "Foo is a wonder piece of code. Licensed under the GPL. For support contact [email protected] " > tst
    $ scancode --license --license-text --license-text-diagnostics --yaml - tst
    ...
    license_detections:
      - license_expression: gpl-1.0-plus
        license_expression_spdx: GPL-1.0-or-later
        matches:
          - license_expression: gpl-1.0-plus
            license_expression_spdx: GPL-1.0-or-later
            from_file: tst
            start_line: 1
            end_line: 1
            matcher: 2-aho
            score: '100.0'
            matched_length: 4
            match_coverage: '100.0'
            rule_relevance: 100
            rule_identifier: gpl_85.RULE
            rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl_85.RULE
            matched_text: Foo is a wonder piece of code. Licensed under the GPL.
              For support contact [email protected]
            matched_text_diagnostics: Licensed under the GPL.
    ...

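In this output:

- ``matched_text`` is based on ``start_line`` and ``end_line``: it contains the
  full lines of the input file spanned by the match, so here it also includes
  the text before and after the license notice.
- ``matched_text_diagnostics`` is based on the exact matched words only
  (consistent with ``matched_length: 4``), and it also includes any "tagged"
  gaps or extra words.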
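
To check this on your own results, the sketch below extracts the lines from
``start_line`` to ``end_line`` of a scanned file and compares them with
``matched_text``. It is a sketch under stated assumptions: the match keys
(``from_file``, ``start_line``, ``end_line``, ``matched_text``) come from the
YAML above, while the top-level ``files`` list and the helper names
``iter_matches``, ``extract_lines`` and ``compare_match`` are hypothetical. The
sample data is made up and only mirrors the example.

```python
from typing import Dict, Iterator


def iter_matches(results: dict) -> Iterator[dict]:
    """Yield every license match in a results mapping.

    Assumed (hypothetical) layout: results["files"][i]["license_detections"][j]["matches"].
    """
    for file_entry in results.get("files", []):
        for detection in file_entry.get("license_detections") or []:
            yield from detection.get("matches", [])


def extract_lines(text: str, start_line: int, end_line: int) -> str:
    """Return lines start_line..end_line of text (1-based, inclusive)."""
    return "\n".join(text.splitlines()[start_line - 1:end_line])


def compare_match(match: dict, sources: Dict[str, str]) -> bool:
    """True if matched_text has the same words as the raw lines it spans.

    Whitespace differences (trailing spaces, line folding in the output)
    are ignored so they do not cause false mismatches.
    """
    raw = extract_lines(sources[match["from_file"]], match["start_line"], match["end_line"])
    return raw.split() == match["matched_text"].split()


# Hypothetical input: file contents keyed by path, read by you beforehand.
sources = {"tst": "Foo is a wonder piece of code. Licensed under the GPL. For support contact us. \n"}
results = {
    "files": [{
        "path": "tst",
        "license_detections": [{
            "matches": [{
                "from_file": "tst",
                "start_line": 1,
                "end_line": 1,
                "matched_text": "Foo is a wonder piece of code. Licensed under the GPL. For support contact us.",
                "matched_text_diagnostics": "Licensed under the GPL.",
            }],
        }],
    }],
}

for match in iter_matches(results):
    print(compare_match(match, sources))  # True
```

Comparing ``matched_text_diagnostics`` the same way would fail here, because it
holds only the matched words rather than the whole line.
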
