@@ -82,3 +82,62 @@ When scanning binaries, the line numbers are just a relative indication of where
82
82
a detection was found: there is no such thing as lines in a binary. The numbers
83
83
reported are based on the strings extracted from the binaries, typically broken
84
84
as new lines with each NULL character.
85
+
86
+
87
+ How does ``--license-text`` for ScanCode work exactly?
88
+ -------------------------------------------------------------
89
+
90
+ I have a question about how ``--license-text`` for ScanCode works exactly:
91
+ Is the matched text that gets included into the result exactly the lines of text
92
+ from the input file that are covered by the ``start_line`` and ``end_line``
93
+ fields of the result? I.e., if I post-processed the input file and extracted
94
+ ``start_line`` to ``end_line`` from it, would I get exactly the ``matched_text``
95
+ contents? Or is there some more "magic" involved when populating the
96
+ ``matched_text`` field?
97
+
98
+ ScanCode does more than use the start and end lines: matching is based on
99
+ words, not on lines of the actual scanned text.
100
+ As a result, a match may cover only part of a line rather than the whole line.
101
+
102
+ For instance, with this command::
103
+
104
+     $ echo "Foo is a wonder piece of code. Licensed under the GPL. For support contact [email protected] " > tst
105
+     $ scancode --license --license-text --license-text-diagnostics --yaml - tst
106
+     ...
107
+     license_detections:
108
+       - license_expression: gpl-1.0-plus
109
+         license_expression_spdx: GPL-1.0-or-later
110
+         matches:
111
+           - license_expression: gpl-1.0-plus
112
+             license_expression_spdx: GPL-1.0-or-later
113
+             from_file: tst
114
+             start_line: 1
115
+             end_line: 1
116
+             matcher: 2-aho
117
+             score: '100.0'
118
+             matched_length: 4
119
+             match_coverage: '100.0'
120
+             rule_relevance: 100
121
+             rule_identifier: gpl_85.RULE
122
+             rule_url: https://github.com/nexB/scancode-toolkit/tree/develop/src/licensedcode/data/rules/gpl_85.RULE
123
+             matched_text: Foo is a wonder piece of code. Licensed under the GPL.
124
+               For support contact [email protected]
125
+             matched_text_diagnostics: Licensed under the GPL.
126
+     ...
127
+
128
+ then:
129
+
130
+ - ``matched_text`` is based on ``start_line`` and ``end_line``
131
+ - ``matched_text_diagnostics`` is based on the exact matched words (and it includes "tagged" gaps or extra words)
132
+
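The post-processing described in the question can be sketched in Python. This is only an illustration, not part of ScanCode: ``extract_lines``, ``sample_text`` and the ``match`` dictionary are hypothetical names, the sample text is a stand-in for the scanned file, and the dictionary only mimics the fields shown in the output above.

```python
def extract_lines(text, start_line, end_line):
    """Return lines start_line..end_line of text (1-based, inclusive) as one string."""
    lines = text.splitlines()
    return "\n".join(lines[start_line - 1:end_line])


# Hypothetical stand-in for the content of the scanned file.
sample_text = (
    "Foo is a wonder piece of code. Licensed under the GPL. "
    "For support contact the authors.\n"
    "A second line that is outside the match.\n"
)

# Hypothetical match record, modeled on the fields in the scan output above.
match = {
    "start_line": 1,
    "end_line": 1,
    "matched_text_diagnostics": "Licensed under the GPL.",
}

# Whole lines covered by the match: this is what matched_text is based on.
covered = extract_lines(sample_text, match["start_line"], match["end_line"])
print(covered)

# The exact matched words are only a part of those lines.
print(match["matched_text_diagnostics"] in covered)
```

A plain substring check like the last line only works when the diagnostics text carries no tags. When gaps or extra words are tagged, the diagnostics text no longer appears verbatim in the file, and whitespace may also differ between the extracted lines and ``matched_text``.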