Commit 3bfc8df
fix: prevent S3 path conflicts using tempfile (#569)
## Problem
S3 downloads were sometimes failing with `NotADirectoryError` and
`FileExistsError` when S3 buckets contained objects with conflicting
naming patterns that cannot be represented in traditional filesystem
hierarchies.
**Example conflict:**
- S3 object: `foo` (file)
- S3 object: `foo/documents` (file requiring foo to be a directory)
This created a race condition: the order in which the two objects were
downloaded determined whether the run succeeded or failed.
## Solution
Used tempfile to create unique download paths for each S3 object:
**Before:**
```
S3: "foo" → Local: /downloads/foo
S3: "foo/documents" → Local: /downloads/foo/documents
Conflict: foo cannot be both file and directory
```
**After:**
```
S3: "foo" → Local: /downloads/a1b2c3d4e5f6/foo
S3: "foo/documents" → Local: /downloads/9g8h7i6j5k4l/documents
No conflicts: Each file gets unique directory
```
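The idea in the before/after comparison can be sketched as follows. This is an illustration under assumptions, not the code from this commit: `unique_download_path` is a hypothetical helper, and it uses `tempfile.mkdtemp` to give each object its own directory and keeps only the last component of the key as the file name.

```python
import tempfile
from pathlib import Path


def unique_download_path(download_dir: Path, key: str) -> Path:
    """Hypothetical helper: return a collision-free local path for an object key."""
    download_dir.mkdir(parents=True, exist_ok=True)
    # mkdtemp creates a new, uniquely named directory on every call.
    unique_dir = Path(tempfile.mkdtemp(dir=download_dir))
    # Keep only the final key component: "foo/documents" becomes "documents".
    return unique_dir / Path(key).name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        downloads = Path(tmp) / "downloads"
        for key in ["foo", "foo/documents"]:
            path = unique_download_path(downloads, key)
            path.write_bytes(b"payload")
            print(key, "->", path)
```

Because every object lands in its own directory, `foo` never has to be both a file and a directory, regardless of download order. The trade-off in this sketch is that the local layout no longer mirrors the key hierarchy of the bucket.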
## Future Work
This PR targets only S3 downloads. I think it would make sense to
use tempfiles for all downloads (as in [PR #571](#571)),
but that requires more extensive changes to implement cleanly. This fix
provides immediate relief from the path conflict issues while we work on
the more comprehensive tempfile solution.
1 parent 93121f1 commit 3bfc8df
File tree
8 files changed: +47 -9 lines changed
- test/integration/connectors
- expected_results
- s3-minio
- s3-specialchar
- s3
- utils/validation
- unstructured_ingest
- interfaces
- processes/connectors/fsspec