44 changes: 44 additions & 0 deletions mediaParser.js
@@ -0,0 +1,44 @@
const axios = require('axios');
const cheerio = require('cheerio');
const { URL } = require('url');

async function extractImages(url) {
  try {
    // 1. Fetch HTML
    const { data: html } = await axios.get(url);

    // 2. Load HTML into cheerio
    const $ = cheerio.load(html);

    const images = [];
    const seen = new Set(); // to track duplicates

    // 3. Loop through each <img> tag
    $('img').each((index, element) => {
      let src = $(element).attr('src');
      const alt = $(element).attr('alt') || '';

      if (src) {
        // 4. Convert relative URLs to absolute URLs
        try {
          src = new URL(src, url).href;
        } catch {
          // skip invalid URLs
          return;
        }

        // 5. Skip duplicates
        if (!seen.has(src)) {
          seen.add(src);
          images.push({ url: src, altText: alt });
        }
      }
    });

    return images;

  } catch (error) {
    console.log('Oops! Something went wrong while fetching images:', error.message);
    return [];
  }
}
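
Steps 4 and 5 above (resolving relative `src` values against the page URL, then dropping duplicates) can be tried in isolation without any network access. The sketch below is illustrative only: `resolveAndDedupe` and the example.com URLs are hypothetical and not part of this PR. It relies solely on Node's built-in `URL` class.

```javascript
const { URL } = require('url');

// Hypothetical helper (not in the PR): mirrors steps 4 and 5 of extractImages.
function resolveAndDedupe(pageUrl, sources) {
  const seen = new Set();
  const resolved = [];
  for (const src of sources) {
    let absolute;
    try {
      // Relative values are resolved against the page URL.
      absolute = new URL(src, pageUrl).href;
    } catch {
      continue; // skip values that cannot be parsed as a URL
    }
    // Deduplicate on the absolute form, as the PR does.
    if (!seen.has(absolute)) {
      seen.add(absolute);
      resolved.push(absolute);
    }
  }
  return resolved;
}

const sample = resolveAndDedupe('https://example.com/blog/post.html', [
  '/img/a.png',
  '../img/a.png', // resolves to the same absolute URL as the entry above
  'b.png',
  'https://cdn.example.com/c.png',
]);
console.log(sample);
```

Because deduplication happens after resolution, two different relative spellings of the same image collapse into one entry, which matches the behavior of the `seen` set in `extractImages`.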
4 changes: 3 additions & 1 deletion package.json
@@ -24,10 +24,11 @@
     "@types/react-dom": "^18.0.1",
     "@types/uuid": "^8.3.4",
     "airtable": "^0.12.2",
-    "axios": "^1.9.0",
+    "axios": "^1.12.2",
     "bcrypt": "^5.1.1",
     "body-parser": "^1.20.3",
     "buffer": "^6.0.3",
+    "cheerio": "^1.1.2",
     "connect-pg-simple": "^10.0.0",
     "cookie-parser": "^1.4.6",
     "cors": "^2.8.5",
@@ -80,6 +81,7 @@
     "styled-components": "^5.3.3",
     "swagger-jsdoc": "^6.2.8",
     "swagger-ui-express": "^5.0.1",
+    "tesseract.js": "^6.0.1",

⚠️ Potential issue | 🟡 Minor

Unused dependency: tesseract.js is not referenced in the current implementation.

The tesseract.js library was added but is not imported or used in mediaParser.js. Consider removing it unless it's planned for future use (e.g., OCR on images).

If tesseract.js is intended for future functionality, consider adding a comment in the code or creating a follow-up issue to track its implementation.

🤖 Prompt for AI Agents
In package.json around line 84, the dependency "tesseract.js": "^6.0.1" is
unused in the codebase (not imported in mediaParser.js). Either remove the
tesseract.js entry from package.json and run npm/yarn install to update the
lockfile, or, if it is intended for future OCR work, keep the dependency and add
a short TODO comment in the relevant module or create a follow-up issue that
references this dependency and its planned usage so it is tracked.
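
One rough way to confirm a finding like this is to scan source text for `require`/`import` references to the package. The sketch below is a heuristic under stated assumptions, not a full dependency analysis: `isDependencyReferenced` and `escapeRegExp` are hypothetical helpers, and the sample source string simply repeats the two `require` lines from mediaParser.js.

```javascript
// Hypothetical helper: escape characters that are special in a regular expression,
// so the "." in "tesseract.js" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: true if sourceText appears to import the given package
// via require('dep'), import ... from 'dep', import 'dep', or import('dep').
// Subpath imports such as 'dep/sub' are also accepted.
function isDependencyReferenced(depName, sourceText) {
  const name = escapeRegExp(depName);
  const pattern = new RegExp(
    `(?:require\\(\\s*|from\\s+|import\\s*\\(?\\s*)['"]${name}(?:/[^'"]*)?['"]`
  );
  return pattern.test(sourceText);
}

// The two require lines from mediaParser.js in this PR.
const mediaParserImports =
  "const axios = require('axios');\nconst cheerio = require('cheerio');";

for (const dep of ['axios', 'cheerio', 'tesseract.js']) {
  console.log(dep, isDependencyReferenced(dep, mediaParserImports));
}
```

A text scan like this can miss dynamically built module names or usage from other files, so it only supports the review comment for the files actually checked.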

     "typedoc": "^0.23.8",
     "typescript": "^4.6.3",
     "uuid": "^8.3.2",