* It will auto-resume from the point where it died, as long as the given consumer group name is the same before and after the crash (see the sketch after this list).
* It will upload the `current.bin` file, which contains the messages collected so far (up to `NUMBER_OF_MESSAGE_PER_BACKUP_FILE`), to S3, but only together with the other backup files.
* Upload to S3 or other cloud storage runs as a background process, controlled by `RETRY_UPLOAD_SECONDS`.
* `NUMBER_OF_KAFKA_THREADS` is used to parallelise reading from the Kafka topic. It should not be more than the number of partitions.
* `LOG_LEVEL` values can be found at https://docs.python.org/3/library/logging.html#logging-levels
* `NUMBER_OF_MESSAGE_PER_BACKUP_FILE` sets the number of messages per backup file; the application tries to keep this consistent, but after a restart it may vary for the first backup file.
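The auto-resume behaviour comes from Kafka's committed consumer-group offsets. Below is a minimal sketch of that mechanism, assuming the `kafka-python` client (which client `backup.py` actually uses is not shown in this section); the values match the sample `backup.json` files further down:

```python
# Minimal sketch: committed consumer-group offsets give auto-resume.
# Assumes the kafka-python client; values match the sample backup.json.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "davinder.test",
    bootstrap_servers="kafka01:9092,kafka02:9092,kafka03:9092",
    group_id="Kafka-BackUp-Consumer-Group",  # same GROUP_ID before and after a crash
    enable_auto_commit=True,
)

# Because offsets are committed under GROUP_ID, a restarted consumer with the
# same group id resumes from the last committed offset rather than re-reading
# the topic from the beginning.
for message in consumer:
    print(message.partition, message.offset, message.value)
```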
**Restore Application**
* It will restore messages from the backup directory into the given topic.
# How to Run Kafka Backup Application
```
python3 backup.py backup.json
```
**Local Filesystem backup.json**
```
{
  "BOOTSTRAP_SERVERS": "kafka01:9092,kafka02:9092,kafka03:9092",
  "TOPIC_NAMES": ["davinder.test"],
  "GROUP_ID": "Kafka-BackUp-Consumer-Group",
  "FILESYSTEM_TYPE": "LINUX",
  "FILESYSTEM_BACKUP_DIR": "/tmp/",
  "NUMBER_OF_MESSAGE_PER_BACKUP_FILE": 1000,
  "RETRY_UPLOAD_SECONDS": 100,
  "NUMBER_OF_KAFKA_THREADS": 3,
  "LOG_LEVEL": 20
}
```
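A quick way to see how such a config might be consumed: the loader below is illustrative (not the actual code in `backup.py`); it simply reads the JSON and checks for the required keys shown above.

```python
# Illustrative loader for backup.json; not the actual backup.py code.
import json
import sys

REQUIRED_KEYS = {
    "BOOTSTRAP_SERVERS", "TOPIC_NAMES", "GROUP_ID",
    "FILESYSTEM_TYPE", "FILESYSTEM_BACKUP_DIR",
    "NUMBER_OF_MESSAGE_PER_BACKUP_FILE",
}

def load_config(path):
    with open(path) as f:
        config = json.load(f)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        sys.exit(f"backup.json is missing required keys: {sorted(missing)}")
    return config

config = load_config(sys.argv[1])  # e.g. python3 backup.py backup.json
```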
**S3 backup.json**
```
{
  "BOOTSTRAP_SERVERS": "kafka01:9092,kafka02:9092,kafka03:9092",
  "TOPIC_NAMES": ["davinder.test"],
  "GROUP_ID": "Kafka-BackUp-Consumer-Group",
  "FILESYSTEM_TYPE": "S3",
  "FILESYSTEM_BACKUP_DIR": "/tmp/",
  "NUMBER_OF_MESSAGE_PER_BACKUP_FILE": 1000,
  "RETRY_UPLOAD_SECONDS": 100,
  "NUMBER_OF_KAFKA_THREADS": 3,
  "LOG_LEVEL": 20
}
```
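When `FILESYSTEM_TYPE` is `S3`, finished backup files and their `.sha256` checksums end up in the bucket shown in the example output below. A minimal sketch of one such upload, assuming `boto3` with credentials in environment variables (how `backup.py` actually wires this up, including where the bucket name comes from, is not shown in this section):

```python
# Illustrative S3 upload of one finished backup file and its checksum.
# The bucket name is taken from the example output; the key mirrors the
# local path relative to FILESYSTEM_BACKUP_DIR.
import os
import boto3

s3 = boto3.client("s3")  # picks up credentials from environment variables

def upload_backup(local_path, bucket="davinder-test-kafka-backup", backup_dir="/tmp/"):
    key = os.path.relpath(local_path, backup_dir)  # davinder.test/0/<file>.tar.gz
    s3.upload_file(local_path, bucket, key)
    s3.upload_file(local_path + ".sha256", bucket, key + ".sha256")

upload_backup("/tmp/davinder.test/0/20200608-102909.tar.gz")
```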
**Example Local Backup Run Output**
```
{ "@timestamp": "2020-06-08 10:56:34,557","level": "INFO","thread": "Kafka Consumer 0","name": "root","message": "started polling on davinder.test" }
{ "@timestamp": "2020-06-08 10:56:34,557","level": "INFO","thread": "Kafka Consumer 1","name": "root","message": "started polling on davinder.test" }
{ "@timestamp": "2020-06-08 10:56:34,557","level": "INFO","thread": "Kafka Consumer 2","name": "root","message": "started polling on davinder.test" }
{ "@timestamp": "2020-06-08 10:56:51,590","level": "INFO","thread": "Kafka Consumer 1","name": "root","message": "Created Successful Backupfile /tmp/davinder.test/1/20200608-105651.tar.gz" }
{ "@timestamp": "2020-06-08 10:56:51,593","level": "INFO","thread": "Kafka Consumer 1","name": "root","message": "Created Successful Backup sha256 file of /tmp/davinder.test/1/20200608-105651.tar.gz.sha256" }
{ "@timestamp": "2020-06-08 10:57:17,270","level": "INFO","thread": "Kafka Consumer 0","name": "root","message": "Created Successful Backupfile /tmp/davinder.test/0/20200608-105717.tar.gz" }
{ "@timestamp": "2020-06-08 10:57:17,277","level": "INFO","thread": "Kafka Consumer 0","name": "root","message": "Created Successful Backup sha256 file of /tmp/davinder.test/0/20200608-105717.tar.gz.sha256" }
{ "@timestamp": "2020-06-08 10:57:17,399","level": "INFO","thread": "Kafka Consumer 2","name": "root","message": "Created Successful Backupfile /tmp/davinder.test/2/20200608-105717.tar.gz" }
{ "@timestamp": "2020-06-08 10:57:17,406","level": "INFO","thread": "Kafka Consumer 2","name": "root","message": "Created Successful Backup sha256 file of /tmp/davinder.test/2/20200608-105717.tar.gz.sha256" }
...
```
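The log lines above follow a JSON-like layout; a minimal sketch of a stdlib `logging` setup that reproduces it (the exact format string used by the application is an assumption):

```python
# Minimal sketch: stdlib logging configured to emit the layout shown above.
import logging

logging.basicConfig(
    format='{ "@timestamp": "%(asctime)s","level": "%(levelname)s",'
           '"thread": "%(threadName)s","name": "%(name)s","message": "%(message)s" }',
    level=20,  # LOG_LEVEL 20 == logging.INFO
)
logging.info("started polling on davinder.test")
```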
**Example S3 Backup Run Output**
```
$ python3 backup.py backup.json
{ "@timestamp": "2020-06-10 12:49:43,871","level": "INFO","thread": "S3 Upload","name": "botocore.credentials","message": "Found credentials in environment variables." }
{ "@timestamp": "2020-06-10 12:49:43,912","level": "INFO","thread": "Kafka Consumer 1","name": "root","message": "started polling on davinder.test" }
{ "@timestamp": "2020-06-10 12:49:43,915","level": "INFO","thread": "Kafka Consumer 0","name": "root","message": "started polling on davinder.test" }
{ "@timestamp": "2020-06-10 12:49:43,916","level": "INFO","thread": "Kafka Consumer 2","name": "root","message": "started polling on davinder.test" }
{ "@timestamp": "2020-06-10 12:49:44,307","level": "INFO","thread": "S3 Upload","name": "root","message": "upload successful at s3://davinder-test-kafka-backup/davinder.test/0/20200608-102909.tar.gz" }
{ "@timestamp": "2020-06-10 12:49:45,996","level": "INFO","thread": "S3 Upload","name": "root","message": "waiting for new files to be generated" }
{ "@timestamp": "2020-06-10 12:52:33,130","level": "INFO","thread": "Kafka Consumer 0","name": "root","message": "Created Successful Backupfile /tmp/davinder.test/0/20200610-125233.tar.gz" }
{ "@timestamp": "2020-06-10 12:52:33,155","level": "INFO","thread": "Kafka Consumer 0","name": "root","message": "Created Successful Backup sha256 file of /tmp/davinder.test/0/20200610-125233.tar.gz.sha256" }
....
```
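The "waiting for new files to be generated" message reflects the background upload thread polling on `RETRY_UPLOAD_SECONDS`. A rough sketch of that pattern, reusing the hypothetical `upload_backup` helper from above (the real scheduling logic in `backup.py` may differ):

```python
# Rough sketch of the background S3 upload loop; the real logic may differ.
import glob
import logging
import time

def s3_upload_loop(backup_dir="/tmp/", retry_upload_seconds=100):
    uploaded = set()
    while True:
        # current.bin is still being written, so only finished .tar.gz
        # archives (and their .sha256 files) are picked up here.
        pending = [
            path
            for path in glob.glob(f"{backup_dir}/**/*.tar.gz", recursive=True)
            if path not in uploaded
        ]
        for path in pending:
            upload_backup(path)  # hypothetical helper sketched earlier
            uploaded.add(path)
        if not pending:
            logging.info("waiting for new files to be generated")
        time.sleep(retry_upload_seconds)
```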
# Backup Directory Structure
```
/tmp/davinder.test/
├── 0
│   ├── 20200608-102909.tar.gz
│   ├── 20200608-102909.tar.gz.sha256
│   └── current.bin
├── 1
│   ├── 20200608-102909.tar.gz
│   ├── 20200608-102909.tar.gz.sha256
│   └── current.bin
└── 2
    ├── 20200608-102909.tar.gz
    ├── 20200608-102909.tar.gz.sha256
    └── current.bin

3 directories, 9 files
```
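Each partition gets its own directory holding the finished archives, their `.sha256` checksum files, and the in-progress `current.bin`. A small illustrative check of an archive against its checksum file, assuming the `.sha256` file starts with the hex digest (`sha256sum`-style output):

```python
# Illustrative integrity check of a backup archive against its .sha256 file.
# Assumes the .sha256 file starts with the hex digest, sha256sum-style.
import hashlib

def verify_backup(path):
    with open(path, "rb") as f:
        actual = hashlib.sha256(f.read()).hexdigest()
    with open(path + ".sha256") as f:
        expected = f.read().split()[0]
    return actual == expected

print(verify_backup("/tmp/davinder.test/0/20200608-102909.tar.gz"))
```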
# How to Run Kafka Restore Application