Commit e526579

(EAI-639) setup WebDataSource (#611)
* scraping web pages for content
* processing sitemap for full directory urls, processing some additional urls
* modify drone file for testing puppeteer in cron env
* tests for WebDataSource and helper functions

---------

Co-authored-by: mongodben <[email protected]>
1 parent 5da75ad commit e526579
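
The sitemap step mentioned in the commit message could look roughly like the sketch below. The implementation in this commit is not shown on this page, so the function name `extractSitemapUrls`, its `directoryPrefix` parameter, and the regex-based parsing are all assumptions made for illustration. The sketch pulls `<loc>` entries out of a sitemap XML string and keeps only URLs under a given directory.

```typescript
// Hypothetical helper (not taken from the commit): collect page URLs from a
// sitemap XML string, optionally keeping only those under one directory.
function extractSitemapUrls(
  sitemapXml: string,
  directoryPrefix?: string
): string[] {
  // Each sitemap entry wraps its URL in <loc>...</loc>; tolerate whitespace.
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  const urls: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(sitemapXml)) !== null) {
    const url = match[1];
    // Skip URLs outside the requested directory, if a prefix was given.
    if (directoryPrefix !== undefined && url.indexOf(directoryPrefix) !== 0) {
      continue;
    }
    // Skip duplicates while preserving first-seen order.
    if (urls.indexOf(url) === -1) {
      urls.push(url);
    }
  }
  return urls;
}
```

A regex keeps the sketch dependency-free; it does not decode XML entities such as `&amp;`, so a real ingest pipeline would likely prefer a proper XML parser.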

15 files changed: +1867 -72 lines changed

Diff for: .drone.yml

+1
@@ -18,6 +18,7 @@ steps:
 image: node:18
 commands:
 - npm ci
+- npx playwright install chromium --with-deps
 - npm run build
 - npm run lint
 - npm run test

Diff for: ingest-service.dockerfile

+5-2
@@ -1,5 +1,8 @@
 # Build stage
-FROM node:18-alpine
+FROM node:18
+
+# Install Playwright with dependencies
+RUN npx playwright install chromium --with-deps

 WORKDIR /bin
 COPY . ./
@@ -8,7 +11,7 @@ RUN npm run bootstrap -- --scope='{mongodb-rag-core,mongodb-rag-ingest,ingest-mo
 RUN npm run build -- --scope='{mongodb-rag-core,mongodb-rag-ingest,ingest-mongodb-public}'

 # Add git for GitDataSource
-RUN apk add --no-cache git
+RUN apt-get update && apt-get install -y git

 ENV NODE_ENV=production