 All configurable Scrapy Settings added by the Middleware.


-With the middleware, the usage of crawlera is automatic, every request will go through crawlera without nothing to worry about.
-If you want to *disable* crawlera on a specific Request, you can do so by updating `meta` with `dont_proxy=True`::
+With the middleware, the usage of Zyte Smart Proxy Manager is automatic: every request goes through Zyte Smart Proxy Manager, with nothing else to worry about.
+If you want to *disable* Zyte Smart Proxy Manager on a specific Request, you can do so by updating `meta` with `dont_proxy=True`::


     scrapy.Request(
@@ -65,11 +68,11 @@ If you want to *disable* crawlera on a specific Request, you can do so by updati
     )


-Remember that you are now making requests to Crawlera, and the Crawlera service will be the one actually making the requests to the different sites.
+Remember that you are now making requests to Zyte Smart Proxy Manager, and the Zyte Smart Proxy Manager service will be the one actually making the requests to the different sites.

-If you need to specify special `Crawlera Headers <https://doc.scrapinghub.com/crawlera.html#request-headers>`_, just apply them as normal `Scrapy Headers<https://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.headers>`_.
+If you need to specify special `Zyte Smart Proxy Manager headers <https://docs.zyte.com/smart-proxy-manager.html#request-headers>`_, just apply them as normal `Scrapy headers <https://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.headers>`_.

-Here we have an example of specifying a Crawlera header into a Scrapy request::
+Here is an example of specifying a Zyte Smart Proxy Manager header in a Scrapy request::

     scrapy.Request(
         'http://example.com',
@@ -82,8 +85,8 @@ Here we have an example of specifying a Crawlera header into a Scrapy request::
 Remember that you could also set which headers to use by default by all
 requests with `DEFAULT_REQUEST_HEADERS <http://doc.scrapy.org/en/1.0/topics/settings.html#default-request-headers>`_

-.. note:: Crawlera headers are removed from requests when the middleware is activated but Crawlera
-    is disabled. For example, if you accidentally disable Crawlera via ``crawlera_enabled = False``
+.. note:: Zyte Smart Proxy Manager headers are removed from requests when the middleware is activated but Zyte Smart Proxy Manager
+    is disabled. For example, if you accidentally disable Zyte Smart Proxy Manager via ``zyte_smartproxy_enabled = False``
     but keep sending ``X-Crawlera-*`` headers in your requests, those will be removed from the
     request headers.

@@ -99,4 +102,4 @@ All the rest
    news

 :doc:`news`
-    See what has changed in recent scrapy-crawlera versions.
+    See what has changed in recent scrapy-zyte-smartproxy versions.
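The note in the hunk above says that ``X-Crawlera-*`` headers are dropped from requests when the middleware is active but the proxy is disabled. A minimal sketch of that kind of filtering follows, using a plain dict in place of Scrapy's headers object. The helper name `strip_proxy_headers` and its prefix argument are illustrative assumptions, not the middleware's actual code.

```python
def strip_proxy_headers(headers, prefix="x-crawlera-"):
    """Return a copy of `headers` without proxy-specific entries.

    Hypothetical helper for illustration only. Names are compared in
    lowercase because HTTP header names are case-insensitive.
    """
    return {
        name: value
        for name, value in headers.items()
        if not name.lower().startswith(prefix)
    }


# Example input: one ordinary header and one proxy-specific header.
request_headers = {
    "User-Agent": "my-spider",
    "X-Crawlera-Profile": "desktop",
}
cleaned = strip_proxy_headers(request_headers)
```

The original dict is left untouched; only the returned copy omits the proxy-specific header.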
docs/settings.rst (31 additions & 30 deletions)
@@ -2,76 +2,77 @@
 Settings
 ========

-This Middleware adds some settings to configure how to work with Crawlera.
+This Scrapy downloader middleware adds some settings to configure how to work
+with Zyte Smart Proxy Manager.

-CRAWLERA_APIKEY
----------------
+ZYTE_SMARTPROXY_APIKEY
+----------------------

 Default: ``None``

-Unique Crawlera API Key provided for authentication.
+Unique Zyte Smart Proxy Manager API key provided for authentication.

-CRAWLERA_URL
-------------
+ZYTE_SMARTPROXY_URL
+-------------------

-Default: ``'http://proxy.crawlera.com:8010'``
+Default: ``'http://proxy.zyte.com:8011'``

-Crawlera instance url, it varies depending on adquiring a private or dedicated instance. If Crawlera didn't provide
-you with a private instance url, you don't need to specify it.
+Zyte Smart Proxy Manager instance URL. It differs if you have acquired a private or dedicated instance. If Zyte Smart Proxy Manager didn't provide
+you with a private instance URL, you don't need to specify it.

-CRAWLERA_MAXBANS
-----------------
+ZYTE_SMARTPROXY_MAXBANS
+-----------------------

 Default: ``400``

-Number of consecutive bans from Crawlera necessary to stop the spider.
+Number of consecutive bans from Zyte Smart Proxy Manager necessary to stop the spider.

-CRAWLERA_DOWNLOAD_TIMEOUT
--------------------------
+ZYTE_SMARTPROXY_DOWNLOAD_TIMEOUT
+--------------------------------

 Default: ``190``

-Timeout for processing Crawlera requests. It overrides Scrapy's ``DOWNLOAD_TIMEOUT``.
+Timeout for processing Zyte Smart Proxy Manager requests. It overrides Scrapy's ``DOWNLOAD_TIMEOUT``.

-CRAWLERA_PRESERVE_DELAY
------------------------
+ZYTE_SMARTPROXY_PRESERVE_DELAY
+------------------------------

 Default: ``False``

 If ``False`` Sets Scrapy's ``DOWNLOAD_DELAY`` to ``0``, making the spider to crawl faster. If set to ``True``, it will
 respect the provided ``DOWNLOAD_DELAY`` from Scrapy.

-CRAWLERA_DEFAULT_HEADERS
-------------------------
+ZYTE_SMARTPROXY_DEFAULT_HEADERS
+-------------------------------

 Default: ``{}``

-Default headers added only to crawlera requests. Headers defined on ``DEFAULT_REQUEST_HEADERS`` will take precedence as long as the ``CrawleraMiddleware`` is placed after the ``DefaultHeadersMiddleware``. Headers set on the requests have precedence over the two settings.
+Default headers added only to Zyte Smart Proxy Manager requests. Headers defined in ``DEFAULT_REQUEST_HEADERS`` will take precedence as long as the ``ZyteSmartProxyMiddleware`` is placed after the ``DefaultHeadersMiddleware``. Headers set on the requests have precedence over the two settings.

-* This is the default behavior, ``DefaultHeadersMiddleware`` default priority is ``400`` and we recommend ``CrawleraMiddleware`` priority to be ``610``
+* This is the default behavior: the default priority of ``DefaultHeadersMiddleware`` is ``400``, and we recommend a priority of ``610`` for ``ZyteSmartProxyMiddleware``.

-CRAWLERA_BACKOFF_STEP
------------------------
+ZYTE_SMARTPROXY_BACKOFF_STEP
+----------------------------

 Default: ``15``

 Step size used for calculating exponential backoff according to the formula: ``random.uniform(0, min(max, step * 2 ** attempt))``.

-CRAWLERA_BACKOFF_MAX
------------------------
+ZYTE_SMARTPROXY_BACKOFF_MAX
+---------------------------

 Default: ``180``

 Max value for exponential backoff as showed in the formula above.

-CRAWLERA_FORCE_ENABLE_ON_HTTP_CODES
-------------------------------------
+ZYTE_SMARTPROXY_FORCE_ENABLE_ON_HTTP_CODES
+------------------------------------------

 Default: ``[]``

-List of HTTP response status codes that warrant enabling Crawlera for the
+List of HTTP response status codes that warrant enabling Zyte Smart Proxy Manager for the
 corresponding domain.

 When a response with one of these HTTP status codes is received after a request
-that did not go through Crawlera, the request is retried with Crawlera, and any
-new request to the same domain is also sent through Crawlera.
+that did not go through Zyte Smart Proxy Manager, the request is retried with Zyte Smart Proxy Manager, and any
+new request to the same domain is also sent through Zyte Smart Proxy Manager.
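The backoff formula quoted under ``ZYTE_SMARTPROXY_BACKOFF_STEP`` and ``ZYTE_SMARTPROXY_BACKOFF_MAX`` can be tried out in isolation. The sketch below wraps it in a function whose defaults mirror the documented values (``15`` and ``180``). The function name `backoff_delay` and the assumption that `attempt` starts at 0 are illustrative; the diff does not say how the middleware counts attempts.

```python
import random


def backoff_delay(attempt, step=15, max_delay=180):
    """Pick a random delay in seconds for a retry attempt.

    Implements random.uniform(0, min(max, step * 2 ** attempt)) from the
    settings documentation. `attempt` is assumed to be 0-based here.
    """
    upper_bound = min(max_delay, step * 2 ** attempt)
    return random.uniform(0, upper_bound)


# Upper bounds for the first attempts with the default settings:
# 15, 30, 60, 120, then capped at 180.
upper_bounds = [min(180, 15 * 2 ** attempt) for attempt in range(6)]
delays = [backoff_delay(attempt) for attempt in range(6)]
```

Because the delay is drawn uniformly between 0 and the bound, individual delays can be much shorter than the bound; only the ceiling grows exponentially until it reaches the maximum.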
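The precedence rules described under ``ZYTE_SMARTPROXY_DEFAULT_HEADERS`` (headers set on the request win, then ``DEFAULT_REQUEST_HEADERS``, then the proxy-specific defaults) can be sketched with plain dicts. This is only an illustration of the resulting order, assuming the recommended middleware priorities; `effective_headers` is a hypothetical helper, and unlike Scrapy's headers object it treats header names as case-sensitive.

```python
def effective_headers(request_headers, default_request_headers,
                      smartproxy_default_headers):
    """Merge three header sources, highest precedence first.

    Hypothetical helper: a header is only filled in from a lower-priority
    source when no higher-priority source has already set it.
    """
    merged = dict(request_headers)
    for defaults in (default_request_headers, smartproxy_default_headers):
        for name, value in defaults.items():
            merged.setdefault(name, value)
    return merged


merged = effective_headers(
    request_headers={"X-Crawlera-Profile": "mobile"},
    default_request_headers={"X-Crawlera-Profile": "desktop",
                             "Accept-Language": "en"},
    smartproxy_default_headers={"X-Crawlera-Profile": "pass",
                                "X-Crawlera-Cookies": "disable"},
)
```

Here the request's own ``X-Crawlera-Profile`` value survives, while the two headers that only appear in the default settings are filled in.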