feat: parallelization #352

base: develop
Changes from all commits
```diff
@@ -39,6 +39,60 @@ services:
     volumes:
       - minio_data:/data
 
+  zookeeper:
+    image: confluentinc/cp-zookeeper:7.4.0
+    environment:
+      ZOOKEEPER_CLIENT_PORT: ${ZOOKEEPER_PORT:-2181}
+      ZOOKEEPER_TICK_TIME: 2000
+    ports:
+      - "${ZOOKEEPER_PORT:-2181}:2181"
+    volumes:
+      - zookeeper_data:/var/lib/zookeeper/data
+      - zookeeper_log:/var/lib/zookeeper/log
+    healthcheck:
+      test: ["CMD-SHELL", "zookeeper-shell.sh localhost:${ZOOKEEPER_PORT:-2181} ls /"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  # Add Kafka
+  kafka:
+    image: confluentinc/cp-kafka:7.4.0
+    depends_on:
+      - zookeeper
+    environment:
+      KAFKA_BROKER_ID: 1
+      KAFKA_ZOOKEEPER_CONNECT: zookeeper:${ZOOKEEPER_PORT:-2181}
+      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
+      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
+      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
+      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
+      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
+      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
+      KAFKA_AUTO_CREATE_TOPICS_ENABLE: "true"
+    ports:
+      - "${KAFKA_PORT:-9092}:9092"
+      - "${KAFKA_EXTERNAL_PORT:-29092}:29092"
+    volumes:
+      - kafka_data:/var/lib/kafka/data
+    healthcheck:
+      test: ["CMD-SHELL", "kafka-topics.sh --bootstrap-server localhost:9092 --list"]
+      interval: 10s
+      timeout: 5s
+      retries: 5
+
+  kafka-ui:
+    image: provectuslabs/kafka-ui:latest
+    container_name: kafka-ui
+    ports:
+      - "${KAFKA_UI_PORT:-9080}:8080"
+    environment:
+      KAFKA_CLUSTERS_0_NAME: maxun-cluster
+      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:9092
+      KAFKA_CLUSTERS_0_ZOOKEEPER: zookeeper:2181
+    depends_on:
+      - kafka
+
   backend:
     #build:
     #context: .
```
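The two advertised listeners above drive the networking here: containers on the compose network reach the broker as `kafka:9092`, while processes on the host use the `PLAINTEXT_HOST` listener published on `localhost:29092`. A small sketch (not part of the PR; the helper name is hypothetical) of how a client could parse such a listener string into a map:

```typescript
// Hypothetical helper: parse a KAFKA_ADVERTISED_LISTENERS value into a
// listener-name -> address map, mirroring the compose configuration above.
function parseAdvertisedListeners(value: string): Record<string, string> {
  const map: Record<string, string> = {};
  for (const entry of value.split(",")) {
    const idx = entry.indexOf("://");
    map[entry.slice(0, idx)] = entry.slice(idx + 3);
  }
  return map;
}

const listeners = parseAdvertisedListeners(
  "PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092"
);
// listeners.PLAINTEXT      -> "kafka:9092"      (clients inside the compose network)
// listeners.PLAINTEXT_HOST -> "localhost:29092" (clients on the host machine)
```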
```diff
@@ -63,6 +117,7 @@ services:
       - postgres
       - redis
       - minio
+      - kafka
```
**Review comment:** Add Kafka configuration for the backend service. The backend service now depends on Kafka but lacks the environment variables needed to connect to it. Add these variables to the backend service:

```yaml
environment:
  KAFKA_BROKERS: kafka:9092
  KAFKA_CLIENT_ID: maxun-backend
  KAFKA_CONSUMER_GROUP_ID: maxun-scraping-group
```
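If the backend adopts these variables, it could fail fast when one is missing rather than erroring on first broker contact. A sketch of such a startup check (illustrative only, not code from the PR; the variable names follow the review comment above):

```typescript
// Hypothetical startup check for the suggested backend Kafka variables:
// return the names of any required settings that are absent from the env.
const REQUIRED_KAFKA_VARS = [
  "KAFKA_BROKERS",
  "KAFKA_CLIENT_ID",
  "KAFKA_CONSUMER_GROUP_ID",
];

function missingKafkaVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_KAFKA_VARS.filter((name) => !env[name]);
}

const missing = missingKafkaVars({ KAFKA_BROKERS: "kafka:9092" });
// missing -> ["KAFKA_CLIENT_ID", "KAFKA_CONSUMER_GROUP_ID"]
```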
```diff
     volumes:
       - /var/run/dbus:/var/run/dbus
```
```diff
@@ -83,4 +138,7 @@ services:
 volumes:
   postgres_data:
   minio_data:
-  redis_data:
+  redis_data:
+  kafka_data:
+  zookeeper_data:
+  zookeeper_log:
```
New file (+10 lines):

```typescript
export const kafkaConfig = {
  clientId: 'maxun-scraper',
  brokers: ['localhost:29092'],
  topics: {
    SCRAPING_TASKS: 'scraping-tasks',
    SCRAPING_RESULTS: 'scraping-results',
    SCRAPING_DLQ: 'scraping-dlq'
  },
  consumerGroup: 'scraping-group'
};
```

**Review comment (lines 1 to 4):** Use environment variables for Kafka configuration. The broker address and client ID are hardcoded, which limits deployment flexibility and poses security risks. Apply this diff to make the configuration more flexible:

```diff
+import { config } from 'dotenv';
+
+config();
+
 export const kafkaConfig = {
-  clientId: 'maxun-scraper',
-  brokers: ['localhost:29092'],
+  clientId: process.env.KAFKA_CLIENT_ID || 'maxun-scraper',
+  brokers: (process.env.KAFKA_BROKERS || 'localhost:29092').split(','),
```

Also, consider adding SSL/authentication configuration for production environments:

```typescript
// Add these properties to kafkaConfig
ssl: process.env.KAFKA_SSL === 'true',
sasl: process.env.KAFKA_SASL === 'true' ? {
  mechanism: process.env.KAFKA_SASL_MECHANISM,
  username: process.env.KAFKA_USERNAME,
  password: process.env.KAFKA_PASSWORD,
} : undefined,
```

**Review comment (lines 9 to 10):** 🛠️ Refactor suggestion: add environment separation for consumer groups. The consumer group should be environment-aware to prevent cross-environment message processing. Apply this diff:

```diff
-  consumerGroup: 'scraping-group'
+  consumerGroup: `${process.env.ENV || 'dev'}-scraping-group`
```

**Review comment (lines 1 to 10):** 💡 Codebase verification: remove unused Kafka configuration. The Kafka configuration appears to be unused in the codebase. There are no imports of this configuration, no Kafka client instantiations, and no other Kafka-related code present. Consider removing this unused configuration to maintain a clean codebase.

🏁 Scripts executed for the analysis:

```shell
#!/bin/bash
# Check for proper usage of Kafka configuration and potential security issues

# Check for hardcoded Kafka configurations
echo "Checking for hardcoded Kafka configurations..."
rg -g '!*.{json,md}' -i 'localhost:29092'

# Check for proper import usage
echo "Checking kafka config imports..."
rg "from ['|\"].*config/kafka" -A 3

# Check for potential security-sensitive Kafka operations
echo "Checking for direct Kafka operations..."
ast-grep --pattern 'new Kafka($$$)'
```

```shell
#!/bin/bash
# Search for Kafka-related code and configurations

# Check for usage of kafkaConfig
echo "Checking for kafkaConfig usage..."
rg "kafkaConfig"

# Check for usage of topic names
echo "Checking for topic name usage..."
rg "scraping-tasks|scraping-results|scraping-dlq"

# Check for any Kafka-related imports or code
echo "Checking for Kafka-related code..."
rg -i "kafka" -g "!*.{json,md,txt}" -A 3
```
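Combining the two review suggestions, an env-driven version of the config could be built as a pure function, which also makes it easy to unit-test. This is a sketch of one possible shape, not the PR's code; the variable names follow the review comments above:

```typescript
// Sketch combining the review suggestions: env-driven broker/client settings
// and an environment-prefixed consumer group to keep dev/prod traffic apart.
function buildKafkaConfig(env: Record<string, string | undefined>) {
  return {
    clientId: env.KAFKA_CLIENT_ID || "maxun-scraper",
    brokers: (env.KAFKA_BROKERS || "localhost:29092").split(","),
    topics: {
      SCRAPING_TASKS: "scraping-tasks",
      SCRAPING_RESULTS: "scraping-results",
      SCRAPING_DLQ: "scraping-dlq",
    },
    consumerGroup: `${env.ENV || "dev"}-scraping-group`,
  };
}

const prod = buildKafkaConfig({ ENV: "prod", KAFKA_BROKERS: "b1:9092,b2:9092" });
// prod.brokers       -> ["b1:9092", "b2:9092"]
// prod.consumerGroup -> "prod-scraping-group"
```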
**Review comment:** 🛠️ Refactor suggestion: pin the kafka-ui image version. Using the `latest` tag can lead to unexpected behavior when the image is updated. Pin to a specific version for reproducible builds.