@@ -1,261 +1,125 @@
-# For Deno client readme [click here](./DENO_README.md)
-
-# For the v2 scraper client readme [click here](./V2_README.md)
-
 # RT-CV scraper client
 
-A client that can be spawned by a scraper to ease communication with [RT-CV](https://github.com/script-development/RT-CV)
-
-## Communication flow
-
-```
-custom scraper <> rtcv_scraper_client (this project) <> RT-CV
-```
-
-1. The custom scraper spawns rtcv_scraper_client as child process
-2. The custom scraper sends it's credentials to the child process via stdin
-3. rtcv_scraper_client handles the authentication and reports if it went successfull
-4. The custom scraper starts scraping and sends every scraped result to it's child process (rtcv_scraper_client)
-5. rtcv_scraper_client sends the scraped data to rt-cv and reports if it was successfull
-6. ...
-
-## Methods
-
-### Authentication
-
-There are 3 authentication methods
-- `set_credentials` if you have one RT-CV server
-- `set_multiple_credentials` if you have multiple RT-CV servers
-- `set_mock` if you want to debug a scraper, this mocks the RT-CV server
+A helper program that aims to ease the communication between a scraper and [RT-CV](https://github.com/script-development/RT-CV)
 
-#### `set_credentials`
+## How does this work?
 
-Set credentials and location of a RT-CV server
+This scraper client handles authentication and communication with RT-CV, and it also keeps a cache of reference numbers that have already been fetched.
 
-Example input
+The scraper client works like this:
 
-```json
-{"type":"set_credentials","content":{"server_location":"http://localhost:4000","api_key_id":"111111111111111111111111","api_key":"ddd"}}
-```
-
-Ok Response
-
-```json
-{"type":"ok"}
-```
-
-#### `set_multiple_credentials`
+1. You run `rtcv_scraper_client` in your terminal
+2. The scraper client reads `env.json` and authenticates with RT-CV
+3. The scraper client spawns the program you pass as arguments to `rtcv_scraper_client`; for an npm scraper this would be something like `rtcv_scraper_client npm run start`
+4. Your scraper can now talk to RT-CV through `rtcv_scraper_client` using HTTP requests; the address of the HTTP server is provided in the `$SCRAPER_ADDRESS` environment variable, which the scraper client sets
 
-Set credentials and location of multiple RT-CV servers
+## Why this client?
 
-Note that we need to define which server the primary should be, the primary server is used to fetch secrets and recently scraped reference numbers
+Every scraper needs to communicate with RT-CV, and that takes quite a lot of code.
 
-Example input
+If all scrapers share the same code for communicating with RT-CV, problems only have to be fixed in one place, and updating or adding features is easy.
 
-```json
-{"type":"set_credentials","content":[{"primary":true,"server_location":"http://localhost:4000","api_key_id":"111111111111111111111111","api_key":"ddd"},{"server_location":"http://localhost:4000","api_key_id":"111111111111111111111111","api_key":"ddd"}]}
-```
+## Example
 
-Ok Response
+A Deno example
 
-```json
-{"type":"ok"}
+```ts
+// denoexample.ts
+const req = await fetch(Deno.env.get('SCRAPER_ADDRESS') + '/users')
+const users = await req.json()
+console.log(users)
 ```
 
-#### `set_mock`
-
-Enable mock mode
-
-Example input:
-
-```json
-{"type":"set_mock","content":{}}
-```
-
-You can also provide mock secrets for the `*_secret` methods.
-
-This object need to follow the following type convention `key (string) -> value (any)`
-
-```json
-{"type":"set_mock","content":{"secrets":{"users": [{"username":"foo","password":"bar"}],"user":{"username":"foo","password":"bar"}}}}
-```
-
-Ok Response
-
-```json
-{"type":"ok"}
-```
-
-### `send_cv`
-
-Send a scraped CV to RT-CV and triggers the `set_cached_reference` for the reference number of this cv if there where matches on this CV
-
-Example input
-
-```json
-{"type":"send_cv","content":{"reference_number":"abcd","..":".."}}
-```
-
-Ok Response
-
-*The content represents if a match was made*
-
-```json
-{"type":"ok","content":true}
-```
-
-### `get_secret`
-
-Get a user defined secret from the server
-
-Example input
-
-```json
-{"type":"get_secret","content":{"encryption_key":"my-very-secret-encryption-key", "key":"key-of-value"}}
-```
-
-Ok Response
-
-```jsonc
-{"type":"ok","content":{/*Based on the content stored in the secret value*/}}
-```
-
-### `get_users_secret`
-
-Get a secret from the server where the contents is a strictly defined list of users
-
-Example input
-
-```json
-{"type":"get_users_secret","content":{"encryption_key":"my-very-secret-encryption-key", "key":"users"}}
-```
-
-Ok Response
-
-```json
-{"type":"ok","content":[{"username":"foo","password":"foo"},{"username":"bar","password":"bar"}]}
-```
-
-### `get_user_secret`
-
-Get a secret from the server where the contents is a strictly defined user
-
-Example input
-
-```json
-{"type":"get_user_secret","content":{"encryption_key":"my-very-secret-encryption-key", "key":"user"}}
-```
-
-Ok Response
-
-```json
-{"type":"ok","content":{"username":"foo","password":"bar"}}
+```sh
+# rtcv_scraper_client deno run -A denoexample.ts
+credentials set
+testing connections..
+connected to RTCV
+running scraper..
+Check file:///.../denoexample.ts
+[ { username: "username here", password: "password here" } ]
 ```
 
-### `set_cached_reference`
-
-Save a reference number to the cache with a timeout of 3 days
-This cache is used to avoid sending the same CV twice or scraping data that has already been scraped
+## Setup & Run
 
-*Note that "send_cv" also executes this function automatically*
+### *1.* Install the helper
 
-Example input
-
-```json
-{"type":"set_cached_reference","content":"abcd"}
+```sh
+go install github.com/script-development/rtcv_scraper_client@latest
 ```
 
-Ok Response
+### *2.* Obtain an `env.json`
 
-```json
-{"type":"ok"}
+Create an `env.json` file with the following content **(this file can also be obtained from the RT-CV dashboard, though you might need to add `login_users` yourself)**
+```jsonc
+{
+  "primary_server": {
+    "server_location": "http://localhost:4000",
+    "api_key_id": "aa",
+    "api_key": "bbb"
+  },
+  "alternative_servers": [
+    // If you want to send CVs to multiple servers you can add additional servers here
+  ],
+  "login_users": [
+    {"username": "scraping-site-username", "password": "scraping-site-password"}
+  ]
+}
 ```
 
-### `set_short_cached_reference`
+### *3.* Develop / Deploy a scraper using `rtcv_scraper_client`
 
-Same as previouse one except this one has a time out of 12 hours
-
-### `has_cached_reference`
-
-Is there a cache entry for a specific reference number?
-
-Example input
-
-```json
-{"type":"has_cached_reference","content":"abcd"}
-```
+You can now prefix your scraper's run command with `rtcv_scraper_client`. The scraper client then runs a web server for as long as your scraper runs, and your scraper communicates with RT-CV through that server.
 
-Ok Response
+For example, for a Node.js project you can run your program like this:
 
-```json
-{"type":"ok","content":true}
+```sh
+rtcv_scraper_client npm run start
 ```
 
-### `ping`
+## Routes available
 
-Send a ping message to the server and the server will respond with a pong
+Notes:
+- Any HTTP method can be used
+- If an error occurs, the response body is the error message and the status code is 400 or higher
 
-Example input
+### `$SCRAPER_ADDRESS/send_cv`
 
-```json
-{"type":"ping"}
-```
+Sends a CV to RT-CV and remembers its reference number
 
-Ok Response
+- Body: the CV to send to RT-CV, as JSON
+- Resp: **true** / **false**, depending on whether the CV was sent to RT-CV
 
-```json
-{"type":"pong"}
-```
+### `$SCRAPER_ADDRESS/users`
 
-## How to develop / Debug
+Returns the login users from `env.json`
 
-```sh
-# After a change update the binary in your path using:
-go install
+- Body: None
+- Resp: the `login_users` from `env.json`, as a JSON array
 
-# Then in your scraper project
-cd ../some-scraper
-go run .
-```
+### `$SCRAPER_ADDRESS/set_cached_reference`
 
-Debug data send by scraper to this program
-```sh
-# Logs the IO to scraper_client_input.log
-export LOG_SCRAPER_CLIENT_INPUT=true
-
-# Logs the debug messages to scraper_client.log
-export ENABLE_SCRAPER_CLIENT_LOG=true
+Manually adds a reference number to the cache with the default TTL (3 days)
 
-# Now run your scraper
-go run .
+Note that the `send_cv` route also does this
 
-# Now all messages written to this program are also written to scraper_client_input.log
-# You can follow the input by tailing the file in a new terminal:
-tail -f scraper_client_input.log
+- Body: The reference number
+- Resp: **true**
 
-# All debug messages are now visible in scraper_client.log
-# You can follow the input by tailing the file in a new terminal:
-tail -f scraper_client_input.log
+### `$SCRAPER_ADDRESS/set_short_cached_reference`
 
-# After collecting the input data you can also use rtcv_scraper_client to replay sending the data
-# This is very handy for debugging
-rtcv_scraper_client -replay scraper_client_input.log
+Manually adds a reference number to the cache with a short TTL (12 hours)
 
-# There are also additional options for the -replay command:
-rtcv_scraper_client \
- -replay ./scraper_client_input.log \
- -replaySkipCommands has_cached_reference,set_cached_reference,ping,get_users_secret
-```
+- Body: The reference number
+- Resp: **true**
 
+### `$SCRAPER_ADDRESS/get_cached_reference`
 
-## How to ship?
+Checks whether a reference number is in the cache
 
-Currently we don't have pre build binaries so you'll need to compile the binary yourself
+- Body: The reference number
+- Resp: **true** / **false**
 
-```Dockerfile
-FROM golang:alpine AS obtain-rtcv-client
-RUN go install github.com/script-development/rtcv_scraper_client@latest
+## Deno client docs
 
-FROM denoland/deno:alpine AS runtime
-COPY --from=obtain-rtcv-client /go/bin/rtcv_scraper_client /bin/rtcv_scraper_client
-```
+[click here](./DENO_README.md)
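The new README's routes are plain HTTP calls, so a TypeScript scraper can wrap them in a small typed client. The sketch below is not part of rtcv_scraper_client. `ScraperClient`, `FetchLike`, `LoginUser` and the method names are hypothetical. It also assumes two things the README does not state: reference numbers are sent as a plain-text body, and boolean responses are the literal text `true` or `false`. Because any HTTP method is accepted, it uses POST everywhere so a body can always be attached.

```ts
// scraperClient.ts (hypothetical file name)
// Minimal typed wrapper around the routes served at $SCRAPER_ADDRESS.
// Assumptions not confirmed by the README: reference numbers are sent as a
// plain-text body and boolean responses are the literal text "true"/"false".

/** The subset of the fetch API this sketch uses (a global fetch fits this type). */
type FetchLike = (
  url: string,
  init: { method: string; body?: string },
) => Promise<{ status: number; text(): Promise<string> }>

interface LoginUser {
  username: string
  password: string
}

class ScraperClient {
  private readonly baseUrl: string
  private readonly fetchFn: FetchLike

  constructor(baseUrl: string, fetchFn: FetchLike) {
    this.baseUrl = baseUrl
    this.fetchFn = fetchFn
  }

  // Any HTTP method is accepted; POST is used so a body can always be sent.
  private async call(route: string, body?: string): Promise<string> {
    const res = await this.fetchFn(`${this.baseUrl}/${route}`, { method: 'POST', body })
    const text = await res.text()
    // Errors come back as the error message with a status code of 400 or higher.
    if (res.status >= 400) {
      throw new Error(`${route} failed with status ${res.status}: ${text}`)
    }
    return text
  }

  private async callBool(route: string, body: string): Promise<boolean> {
    return (await this.call(route, body)).trim() === 'true'
  }

  /** send_cv: resolves to whether the CV was sent to RT-CV. */
  sendCv(cv: Record<string, unknown>): Promise<boolean> {
    return this.callBool('send_cv', JSON.stringify(cv))
  }

  /** users: the login_users from env.json. */
  async users(): Promise<LoginUser[]> {
    return JSON.parse(await this.call('users')) as LoginUser[]
  }

  /** set_cached_reference: caches a reference number with the default TTL (3 days). */
  setCachedReference(ref: string): Promise<boolean> {
    return this.callBool('set_cached_reference', ref)
  }

  /** set_short_cached_reference: caches a reference number with the short TTL (12 hours). */
  setShortCachedReference(ref: string): Promise<boolean> {
    return this.callBool('set_short_cached_reference', ref)
  }

  /** get_cached_reference: checks whether a reference number is cached. */
  isCached(ref: string): Promise<boolean> {
    return this.callBool('get_cached_reference', ref)
  }
}
```

In Deno you could create the client with `new ScraperClient(Deno.env.get('SCRAPER_ADDRESS')!, fetch)`. The class takes `fetch` as a constructor argument instead of calling the global directly. That keeps it testable with a fake that never opens a connection.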
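The reference-number cache lets a scraper skip listings it has already handled, so their CVs are neither scraped nor sent twice. Below is a minimal sketch of that loop. It assumes reference numbers are sent as a plain-text body and that the cache and `send_cv` routes answer with the literal text `true` or `false`. `sendNewCvs`, `ScrapedCv`, `Post` and `loadCv` are hypothetical names. `post` stands in for an HTTP call to `$SCRAPER_ADDRESS/<route>`.

```ts
// dedupe.ts (hypothetical file name)
// Sketch: only load and send CVs whose reference number is not cached yet.
// Assumes plain-text reference-number bodies and "true"/"false" text responses.

/** A scraped CV; only reference_number is required by this sketch. */
interface ScrapedCv {
  reference_number: string
  [field: string]: unknown
}

/** Sends `body` to `$SCRAPER_ADDRESS/<route>` and resolves to the response text. */
type Post = (route: string, body: string) => Promise<string>

/** Returns the reference numbers whose CVs were reported as sent to RT-CV. */
async function sendNewCvs(
  refs: string[],
  loadCv: (ref: string) => Promise<ScrapedCv>,
  post: Post,
): Promise<string[]> {
  const sent: string[] = []
  for (const ref of refs) {
    // Skip listings that are still in the cache, so their detail pages
    // are not scraped again.
    const cached = (await post('get_cached_reference', ref)).trim() === 'true'
    if (cached) continue

    const cv = await loadCv(ref)
    // send_cv also remembers the reference number, so later runs skip it.
    const ok = (await post('send_cv', JSON.stringify(cv))).trim() === 'true'
    if (ok) sent.push(ref)
  }
  return sent
}
```

Against a running scraper client, `post` would send `body` to `$SCRAPER_ADDRESS/<route>` with `fetch` and return the response text. It should throw when the status is 400 or higher, since that is how the scraper client reports errors.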