Commit 30f7ed9

committed
delete old code
1 parent f962f04 commit 30f7ed9

8 files changed

+212
-1085
lines changed

README.md

Lines changed: 78 additions & 214 deletions
@@ -1,261 +1,125 @@
1-
# For Deno client readme [click here](./DENO_README.md)
2-
3-
# For the v2 scraper client readme [click here](./V2_README.md)
4-
51
# RT-CV scraper client
62

7-
A client that can be spawned by a scraper to ease communication with [RT-CV](https://github.com/script-development/RT-CV)
8-
9-
## Communication flow
10-
11-
```
12-
custom scraper <> rtcv_scraper_client (this project) <> RT-CV
13-
```
14-
15-
1. The custom scraper spawns rtcv_scraper_client as child process
16-
2. The custom scraper sends its credentials to the child process via stdin
17-
3. rtcv_scraper_client handles the authentication and reports whether it was successful
18-
4. The custom scraper starts scraping and sends every scraped result to its child process (rtcv_scraper_client)
19-
5. rtcv_scraper_client sends the scraped data to RT-CV and reports whether it was successful
20-
6. ...
21-
22-
## Methods
23-
24-
### Authentication
25-
26-
There are 3 authentication methods
27-
- `set_credentials` if you have one RT-CV server
28-
- `set_multiple_credentials` if you have multiple RT-CV servers
29-
- `set_mock` if you want to debug a scraper, this mocks the RT-CV server
3+
A helper program that aims to ease the communication between a scraper and [RT-CV](https://github.com/script-development/RT-CV)
304

31-
#### `set_credentials`
5+
## How does this work?
326

33-
Set credentials and location of a RT-CV server
7+
This scraper client handles authentication and communication with RT-CV, and also keeps a cache of the already fetched reference numbers.
348

35-
Example input
9+
The scraper client works like this:
3610

37-
```json
38-
{"type":"set_credentials","content":{"server_location":"http://localhost:4000","api_key_id":"111111111111111111111111","api_key":"ddd"}}
39-
```
40-
41-
Ok Response
42-
43-
```json
44-
{"type":"ok"}
45-
```
46-
47-
#### `set_multiple_credentials`
11+
1. You run `rtcv_scraper_client` in your terminal
12+
2. The scraper client reads `env.json` and authenticates with RT-CV
13+
3. The scraper client spawns the program you passed as arguments to `rtcv_scraper_client`; for an npm scraper that would be something like `rtcv_scraper_client npm run start`
14+
4. Your scraper can now easily talk to `RT-CV` via `rtcv_scraper_client` using HTTP requests; the HTTP server address is defined by the `$SCRAPER_ADDRESS` environment variable, which the scraper client sets
4815

49-
Set credentials and location of multiple RT-CV servers
16+
## Why this client?
5017

51-
Note that we need to define which server the primary should be, the primary server is used to fetch secrets and recently scraped reference numbers
18+
Every scraper needs to communicate with RT-CV, and the amount of code that requires is quite a lot.
5219

53-
Example input
20+
If every scraper shares the same code for communicating with RT-CV, there is only one place where that code can break, and updating or adding features is easy.
5421

55-
```json
56-
{"type":"set_credentials","content":[{"primary":true,"server_location":"http://localhost:4000","api_key_id":"111111111111111111111111","api_key":"ddd"},{"server_location":"http://localhost:4000","api_key_id":"111111111111111111111111","api_key":"ddd"}]}
57-
```
22+
## Example
5823

59-
Ok Response
24+
A Deno example
6025

61-
```json
62-
{"type":"ok"}
26+
```ts
27+
// denoexample.ts
28+
const req = await fetch(Deno.env.get('SCRAPER_ADDRESS') + '/users')
29+
const users = await req.json()
30+
console.log(users)
6331
```
6432

65-
#### `set_mock`
66-
67-
Enable mock mode
68-
69-
Example input:
70-
71-
```json
72-
{"type":"set_mock","content":{}}
73-
```
74-
75-
You can also provide mock secrets for the `*_secret` methods.
76-
77-
This object needs to follow the type convention `key (string) -> value (any)`
78-
79-
```json
80-
{"type":"set_mock","content":{"secrets":{"users": [{"username":"foo","password":"bar"}],"user":{"username":"foo","password":"bar"}}}}
81-
```
82-
83-
Ok Response
84-
85-
```json
86-
{"type":"ok"}
87-
```
88-
89-
### `send_cv`
90-
91-
Sends a scraped CV to RT-CV and triggers `set_cached_reference` for the reference number of this CV if there were matches on this CV
92-
93-
Example input
94-
95-
```json
96-
{"type":"send_cv","content":{"reference_number":"abcd","..":".."}}
97-
```
98-
99-
Ok Response
100-
101-
*The content represents if a match was made*
102-
103-
```json
104-
{"type":"ok","content":true}
105-
```
106-
107-
### `get_secret`
108-
109-
Get a user defined secret from the server
110-
111-
Example input
112-
113-
```json
114-
{"type":"get_secret","content":{"encryption_key":"my-very-secret-encryption-key", "key":"key-of-value"}}
115-
```
116-
117-
Ok Response
118-
119-
```jsonc
120-
{"type":"ok","content":{/*Based on the content stored in the secret value*/}}
121-
```
122-
123-
### `get_users_secret`
124-
125-
Get a secret from the server where the contents is a strictly defined list of users
126-
127-
Example input
128-
129-
```json
130-
{"type":"get_users_secret","content":{"encryption_key":"my-very-secret-encryption-key", "key":"users"}}
131-
```
132-
133-
Ok Response
134-
135-
```json
136-
{"type":"ok","content":[{"username":"foo","password":"foo"},{"username":"bar","password":"bar"}]}
137-
```
138-
139-
### `get_user_secret`
140-
141-
Get a secret from the server where the contents is a strictly defined user
142-
143-
Example input
144-
145-
```json
146-
{"type":"get_user_secret","content":{"encryption_key":"my-very-secret-encryption-key", "key":"user"}}
147-
```
148-
149-
Ok Response
150-
151-
```json
152-
{"type":"ok","content":{"username":"foo","password":"bar"}}
33+
```sh
34+
# rtcv_scraper_client deno run -A denoexample.ts
35+
credentials set
36+
testing connections..
37+
connected to RTCV
38+
running scraper..
39+
Check file:///.../denoexample.ts
40+
[ { username: "username here", password: "password here" } ]
15341
```
15442
155-
### `set_cached_reference`
156-
157-
Save a reference number to the cache with a timeout of 3 days
158-
This cache is used to avoid sending the same CV twice or scraping data that has already been scraped
43+
## Setup & Run
15944
160-
*Note that "send_cv" also executes this function automatically*
45+
### *1.* Install the helper
16146
162-
Example input
163-
164-
```json
165-
{"type":"set_cached_reference","content":"abcd"}
47+
```sh
48+
go install github.com/script-development/rtcv_scraper_client@latest
16649
```
16750
168-
Ok Response
51+
### *2.* Obtain an `env.json`
16952
170-
```json
171-
{"type":"ok"}
53+
Create an `env.json` file with the following content **(this file can also be obtained from the RTCV dashboard, though note that you might need to add `login_users` yourself)**
54+
```js
55+
{
56+
"primary_server": {
57+
"server_location": "http://localhost:4000",
58+
"api_key_id": "aa",
59+
"api_key": "bbb"
60+
},
61+
"alternative_servers": [
62+
// If you want to send CVs to multiple servers you can add additional servers here
63+
],
64+
"login_users": [
65+
{"username": "scraping-site-username", "password": "scraping-site-password"}
66+
]
67+
}
17268
```
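
As a rough illustration of the `env.json` shape shown above, the fields could be described and checked in TypeScript as sketched below. The type names, `hasStrings` and `parseEnvConfig` are hypothetical and not part of `rtcv_scraper_client`. The sketch also assumes that entries in `alternative_servers` use the same fields as `primary_server`, and that the file contains no comments (`JSON.parse` rejects them).

```typescript
// Shape of env.json as shown in the example above.
interface ServerCredentials {
  server_location: string
  api_key_id: string
  api_key: string
}

interface LoginUser {
  username: string
  password: string
}

interface EnvConfig {
  primary_server: ServerCredentials
  // Assumption: alternative servers use the same fields as primary_server.
  alternative_servers?: ServerCredentials[]
  login_users?: LoginUser[]
}

// Hypothetical helper: true when every listed key holds a non-empty string.
function hasStrings(value: unknown, keys: string[]): boolean {
  if (typeof value !== 'object' || value === null) return false
  const record = value as Record<string, unknown>
  return keys.every((key) => typeof record[key] === 'string' && record[key] !== '')
}

// Hypothetical helper: parses the text of env.json and throws on missing fields.
function parseEnvConfig(text: string): EnvConfig {
  const parsed: unknown = JSON.parse(text)
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('env.json must contain a JSON object')
  }
  const config = parsed as Record<string, unknown>
  const serverKeys = ['server_location', 'api_key_id', 'api_key']
  if (!hasStrings(config.primary_server, serverKeys)) {
    throw new Error('primary_server needs server_location, api_key_id and api_key')
  }
  const servers = config.alternative_servers ?? []
  if (!Array.isArray(servers) || !servers.every((s) => hasStrings(s, serverKeys))) {
    throw new Error('alternative_servers must be a list of server credentials')
  }
  const users = config.login_users ?? []
  if (!Array.isArray(users) || !users.every((u) => hasStrings(u, ['username', 'password']))) {
    throw new Error('login_users must be a list of {username, password}')
  }
  return config as unknown as EnvConfig
}
```

A check like this only catches typos in the file early; the scraper client itself remains the authority on what it accepts.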
17369
174-
### `set_short_cached_reference`
70+
### *3.* Develop / Deploy a scraper using `rtcv_scraper_client`
17571
176-
Same as the previous one, except this one has a timeout of 12 hours
177-
178-
### `has_cached_reference`
179-
180-
Is there a cache entry for a specific reference number?
181-
182-
Example input
183-
184-
```json
185-
{"type":"has_cached_reference","content":"abcd"}
186-
```
72+
You can now prefix your scraper's run command with `rtcv_scraper_client`. For as long as your scraper runs, the scraper client runs a web server through which you can communicate with RT-CV.
18773
188-
Ok Response
74+
If you have a Node.js project, for example, you can run your program like this:
18975
190-
```json
191-
{"type":"ok","content":true}
76+
```sh
77+
rtcv_scraper_client npm run start
19278
```
19379
194-
### `ping`
80+
## Routes available
19581
196-
Send a ping message to the server and the server will respond with a pong
82+
Notes:
83+
- Any HTTP method is accepted
84+
- If an error occurs, the response body is the error message and the status code is 400 or higher
19785
198-
Example input
86+
### `$SCRAPER_ADDRESS/send_cv`
19987
200-
```json
201-
{"type":"ping"}
202-
```
88+
Sends a CV to RT-CV and remembers the reference number
20389
204-
Ok Response
90+
- Body: The CV to send to RT-CV, as JSON
91+
- Resp: **true** / **false**, whether the CV was sent to RT-CV
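
A minimal sketch of calling this route from a scraper could look like the following. `FetchLike` and `sendCv` are hypothetical names, not something the scraper client ships. The sketch assumes the success body is the JSON literal `true` or `false`, and it sets a `Content-Type` header that the README does not mention. The fetch function is passed in so the helper can be exercised with a fake.

```typescript
// Minimal subset of the fetch API used here; the global fetch fits this type.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; text(): Promise<string> }>

// Hypothetical helper: posts one CV to the send_cv route.
// baseAddress is the value of $SCRAPER_ADDRESS.
async function sendCv(
  fetchFn: FetchLike,
  baseAddress: string,
  cv: { reference_number: string; [field: string]: unknown },
): Promise<boolean> {
  const resp = await fetchFn(baseAddress + '/send_cv', {
    method: 'POST', // the notes above say the HTTP method can be anything
    headers: { 'Content-Type': 'application/json' }, // assumption, not documented
    body: JSON.stringify(cv),
  })
  const text = await resp.text()
  if (resp.status >= 400) {
    // On errors the response body is the error message.
    throw new Error(`send_cv failed (${resp.status}): ${text}`)
  }
  // Assumption: the success body is the JSON literal true or false.
  return JSON.parse(text) === true
}
```

In a Deno scraper this might be called as `await sendCv(fetch, Deno.env.get('SCRAPER_ADDRESS')!, cv)`.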
20592
206-
```json
207-
{"type":"pong"}
208-
```
93+
### `$SCRAPER_ADDRESS/users`
20994
210-
## How to develop / Debug
95+
Returns the login users from the `env.json`
21196
212-
```sh
213-
# After a change update the binary in your path using:
214-
go install
97+
- Body: None
98+
- Resp: The login users from `env.json`
21599
216-
# Then in your scraper project
217-
cd ../some-scraper
218-
go run .
219-
```
100+
### `$SCRAPER_ADDRESS/set_cached_reference`
220101
221-
Debug data sent by the scraper to this program
222-
```sh
223-
# Logs the IO to scraper_client_input.log
224-
export LOG_SCRAPER_CLIENT_INPUT=true
225-
226-
# Logs the debug messages to scraper_client.log
227-
export ENABLE_SCRAPER_CLIENT_LOG=true
102+
Manually add another cached reference with the default TTL (3 days)
228103
229-
# Now run your scraper
230-
go run .
104+
Note that this is also done by the `send_cv` route
231105
232-
# Now all messages written to this program are also written to scraper_client_input.log
233-
# You can follow the input by tailing the file in a new terminal:
234-
tail -f scraper_client_input.log
106+
- Body: The reference number
107+
- Resp: **true**
235108
236-
# All debug messages are now visible in scraper_client.log
237-
# You can follow the input by tailing the file in a new terminal:
238-
tail -f scraper_client_input.log
109+
### `$SCRAPER_ADDRESS/set_short_cached_reference`
239110
240-
# After collecting the input data you can also use rtcv_scraper_client to replay sending the data
241-
# This is very handy for debugging
242-
rtcv_scraper_client -replay scraper_client_input.log
111+
Manually add another cached reference with a short TTL (12 hours)
243112
244-
# There are also additional options for the -replay command:
245-
rtcv_scraper_client \
246-
-replay ./scraper_client_input.log \
247-
-replaySkipCommands has_cached_reference,set_cached_reference,ping,get_users_secret
248-
```
113+
- Body: The reference number
114+
- Resp: **true**
249115
116+
### `$SCRAPER_ADDRESS/get_cached_reference`
250117
251-
## How to ship?
118+
Check if a reference number is in the cache
252119
253-
Currently we don't have prebuilt binaries, so you'll need to compile the binary yourself
120+
- Body: The reference number
121+
- Resp: **true** / **false**
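
The three cache routes can be combined to skip reference numbers that were already handled. The sketch below is only an illustration: `FetchLike`, `cacheRoute` and `filterUnseen` are hypothetical names, and it assumes the reference number is sent as a plain-text body and that the response body is the JSON literal `true` or `false`.

```typescript
// Minimal subset of the fetch API used here; the global fetch fits this type.
type FetchLike = (
  url: string,
  init: { method: string; body: string },
) => Promise<{ status: number; text(): Promise<string> }>

type CacheRoute = 'set_cached_reference' | 'set_short_cached_reference' | 'get_cached_reference'

// Hypothetical helper: calls one cache route with a reference number as body.
async function cacheRoute(
  fetchFn: FetchLike,
  baseAddress: string,
  route: CacheRoute,
  referenceNumber: string,
): Promise<boolean> {
  const resp = await fetchFn(`${baseAddress}/${route}`, {
    method: 'POST', // the notes above say the HTTP method can be anything
    body: referenceNumber, // assumption: plain text, not JSON-encoded
  })
  const text = await resp.text()
  if (resp.status >= 400) {
    // On errors the response body is the error message.
    throw new Error(`${route} failed (${resp.status}): ${text}`)
  }
  // Assumption: the success body is the JSON literal true or false.
  return JSON.parse(text) === true
}

// Hypothetical helper: keeps only the reference numbers that are not cached yet.
async function filterUnseen(
  fetchFn: FetchLike,
  baseAddress: string,
  referenceNumbers: string[],
): Promise<string[]> {
  const unseen: string[] = []
  for (const ref of referenceNumbers) {
    const cached = await cacheRoute(fetchFn, baseAddress, 'get_cached_reference', ref)
    if (!cached) unseen.push(ref)
  }
  return unseen
}
```

A scraper would typically scrape only the unseen items and, for items it wants to retry sooner, mark them with `set_short_cached_reference` instead of relying on the 3 day default.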
254122
255-
```Dockerfile
256-
FROM golang:alpine AS obtain-rtcv-client
257-
RUN go install github.com/script-development/rtcv_scraper_client@latest
123+
## Deno client docs
258124
259-
FROM denoland/deno:alpine AS runtime
260-
COPY --from=obtain-rtcv-client /go/bin/rtcv_scraper_client /bin/rtcv_scraper_client
261-
```
125+
[click here](./DENO_README.md)
