@@ -26,6 +26,14 @@ or ``.get_text()`` from Beautiful Soup?
 Text extracted with ``html_text`` does not contain inline styles,
 javascript, comments and other text that is not normally visible to the users.
 
+Apart from just getting text from the page (e.g. for display or search),
+one intended use of this library is machine learning (feature extraction).
+If you want to use the text of an HTML page as a feature (e.g. for classification),
+this library gives you plain text that you can later feed into a standard text
+classification pipeline.
+If you feel that you need the HTML structure as well, check out the
+`webstruct <http://webstruct.readthedocs.io/en/latest/>`_ library.
+
 
 Install
 -------
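
The feature-extraction use case described in the added paragraph could look roughly like the sketch below. It is illustrative only and not part of the change: ``bag_of_words`` and the sample page are hypothetical, and the ``except ImportError`` branch is a crude standard-library stand-in (not equivalent to ``html_text``) that exists only so the sketch runs where the library is not installed.

```python
import re
from collections import Counter
from html.parser import HTMLParser

try:
    # The library this README describes.
    from html_text import extract_text
except ImportError:
    # Assumption: a rough stdlib stand-in so the sketch stays runnable.
    # It only drops <script>/<style> content and HTML comments.
    class _VisibleText(HTMLParser):
        def __init__(self):
            super().__init__()
            self.parts = []
            self._skip = 0  # depth inside <script>/<style>

        def handle_starttag(self, tag, attrs):
            if tag in ("script", "style"):
                self._skip += 1

        def handle_endtag(self, tag):
            if tag in ("script", "style") and self._skip:
                self._skip -= 1

        def handle_data(self, data):
            if not self._skip:
                self.parts.append(data)

    def extract_text(html):
        parser = _VisibleText()
        parser.feed(html)
        parser.close()
        # Collapse runs of whitespace into single spaces.
        return " ".join(" ".join(parser.parts).split())


def bag_of_words(html):
    """Hypothetical helper: token counts from the visible text of a page."""
    text = extract_text(html)
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


# Made-up sample page with invisible style and script content.
page = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><h1>Cheap flights</h1><p>Book cheap flights today.</p>"
    "<script>var x = 1;</script></body></html>"
)
features = bag_of_words(page)
```

The resulting ``Counter`` contains only tokens from visible text, so style rules and script source do not leak into the features. A mapping like this could then be passed to a vectorizer (for example scikit-learn's ``DictVectorizer``) ahead of a classifier.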