
Document memory-efficient use of capture_http #187

@jcushman

Description

Since requests.get loads the whole response into RAM, capture_http is inefficient for large files:

from warcio.capture_http import capture_http
import requests  # requests must be imported after capture_http

# this uses 10GB of RAM:
with capture_http('example.warc.gz'):
    requests.get('https://example.com/#some_10GB_file')

Simply calling requests.get with stream=True doesn't archive the file; only the headers are written:

# this doesn't work:
with capture_http('example.warc.gz'):
    requests.get('https://example.com/#some_10GB_file', stream=True)
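
One way to confirm that only the headers were recorded is to read the output back with warcio's ArchiveIterator (a quick sketch; the filename matches the example above):

from warcio.archiveiterator import ArchiveIterator

# list the records written by the stream=True example above;
# per the behavior described here, the response body is missing
with open('example.warc.gz', 'rb') as stream:
    for record in ArchiveIterator(stream):
        print(record.rec_type,
              record.rec_headers.get_header('WARC-Target-URI'),
              record.rec_headers.get_header('Content-Length'))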

Fetching and throwing away the data does work:

with capture_http('example.warc.gz'):
    response = requests.get('https://example.com/#some_10GB_file', stream=True)
    # consume the stream in fixed-size chunks and discard it
    for _ in response.iter_content(chunk_size=2**16):
        pass

I haven't dug into why it's necessary to consume the stream -- is that inherent or accidental?

Either way, it might be nice to add one of these examples to the docs, since I think it's correct to do this any time you're not actually using the requests.get() response.
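
For the docs, that consume-and-discard pattern could be wrapped in a small helper, sketched below (archive_url is a hypothetical name, not part of warcio):

from warcio.capture_http import capture_http
import requests  # requests must be imported after capture_http

def archive_url(url, warc_path, chunk_size=2**16):
    # hypothetical helper: stream the response and discard the body so
    # capture_http can record the full payload without buffering it in RAM
    with capture_http(warc_path):
        response = requests.get(url, stream=True)
        for _ in response.iter_content(chunk_size=chunk_size):
            pass

archive_url('https://example.com/#some_10GB_file', 'example.warc.gz')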
