2 changes: 1 addition & 1 deletion Dockerfile.consumer
@@ -3,7 +3,7 @@ FROM senzing/senzingsdk-runtime:4.0.0
USER root

RUN apt-get update \
-    && apt-get -y install --no-install-recommends curl python3 python3-pip python3-boto3 \
+    && apt-get -y install --no-install-recommends python3 python3-pip python3-boto3 \
&& apt-get -y autoremove \
&& apt-get -y clean

2 changes: 1 addition & 1 deletion Dockerfile.exporter
@@ -3,7 +3,7 @@ FROM senzing/senzingsdk-runtime:4.0.0
USER root

RUN apt-get update \
-    && apt-get -y install --no-install-recommends curl python3 python3-pip python3-boto3 \
+    && apt-get -y install --no-install-recommends python3 python3-pip python3-boto3 \
&& apt-get -y autoremove \
&& apt-get -y clean

8 changes: 7 additions & 1 deletion Dockerfile.tools
@@ -9,7 +9,7 @@ RUN echo "deb http://apt.postgresql.org/pub/repos/apt bookworm-pgdg main" > /etc
# Update packages and install additional dependencies.
RUN apt-get update && \
apt-get upgrade -y && \
-    apt-get install --no-install-recommends -y awscli pipx postgresql-client senzingsdk-poc && \
+    apt-get install --no-install-recommends -y awscli pipx postgresql-client senzingsdk-poc python3 python3-pip python3-boto3 && \
apt-get autoremove \
&& apt-get clean

@@ -30,9 +30,15 @@ USER senzing
ENV PATH="$PATH:/home/senzing/.local/bin"
RUN pipx install awscli-local

+ENV PYTHONPATH=$PYTHONPATH:/home/senzing/dev:/usr/lib/python3/dist-packages

# Define volumes necessary to support a read-only root filesystem on ECS
# Fargate.
VOLUME ["/home/senzing", "/var/lib/amazon", "/var/log"]

WORKDIR /home/senzing

+RUN mkdir -p dev
+COPY dev-scripts/* dev

ENTRYPOINT ["/entrypoint.sh"]
150 changes: 119 additions & 31 deletions README.md
@@ -51,39 +51,13 @@ and run the consumer service on our local machine. This setup includes:
docker compose up -d
```

-### Consumer
-
-Spinning up a consumer service (intended to be a continually-running process; in
-a production scenarion, multiple instances could be running simultaneously as
-needed):
-
-```bash
-docker compose run --env AWS_PROFILE=some-profile-name --env \
-Q_URL="http://sqs.us-east-1.localhost.localstack.cloud:4566/000000000000/sqs-senzing-local-ingest" \
-consumer
-```
-
-### Exporter
-
-Spinning up the exporter middleware (this is intended to be an ephemeral
-container):
-
-```bash
-docker compose run --env AWS_PROFILE=localstack --env S3_BUCKET_NAME=sqs-senzing-local-export exporter
-```
-
-You can view information about files in the Localstack S3 bucket by visiting
-this URL:
-
-http://localhost:4566/sqs-senzing-local-export
+### Using the services (tools container)

-### Using the services (Tools container)
+Access the `tools` container to interact with the services:

-1. Access the `tools` container to interact with the services:

-```bash
-docker compose run tools /bin/bash
-```
+```bash
+docker compose run tools /bin/bash
+```

The `tools` container should be configured with the necessary environment
variables to interact with the SQS and S3 services in LocalStack, as well as the
@@ -116,6 +90,120 @@ sz_command -C add_record \
PEOPLE 1 '{"NAME_FULL":"Robert Smith", "DATE_OF_BIRTH":"7/4/1976", "PHONE_NUMBER":"555-555-2088"}'
```

#### Loading sample data

From inside the tools container:

1. Download the sample data sets; see:
https://senzing.com/docs/quickstart/quickstart_docker/#download-the-files
2. Register the data source names using `sz_configtool`; see:
https://senzing.com/docs/quickstart/quickstart_docker/#add-the-data-source
3. Load each of the data files into the Senzing database (or script this
   step; see the sketch after this list):

sz_file_loader -f customers.jsonl
sz_file_loader -f reference.jsonl
sz_file_loader -f watchlist.jsonl
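
If you prefer to script step 3, a minimal Python loop works too. This is a
sketch, not a script from this repo; it assumes the `.jsonl` files sit in the
current working directory inside the tools container:

```python
import subprocess

# Load each sample file with Senzing's sz_file_loader CLI;
# check=True stops the loop on the first failed load.
for fname in ('customers.jsonl', 'reference.jsonl', 'watchlist.jsonl'):
    subprocess.run(['sz_file_loader', '-f', fname], check=True)
```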

#### Additional utilities

##### Senzing and the database

Load a single record as a simple test:

    docker compose run tools python3 dev/add_1_record.py

Purge the database:

    docker compose run tools python3 dev/db_purge.py

##### S3

You might need to configure an AWS profile before using these S3-related
utilities; see "Configuring an AWS profile for LocalStack" below.

Copy a file out of the LocalStack S3 bucket into `~/tmp` on your machine:

> [!NOTE]
> You will need to create `~/tmp` manually if it doesn't already exist (on
> macOS, that's `/Users/yourusername/tmp`).

# Here, `hemingway.txt` is the file you wish to retrieve from S3.
docker compose run tools python3 dev/s3_get.py hemingway.txt

Purge the LocalStack S3 bucket:

docker compose run tools python3 dev/s3_purge.py

## Middleware

There are three middleware applications:

- consumer (continually-running service)
- redoer (continually-running service)
- exporter (ephemeral container)

### Configuring an AWS profile for LocalStack

To use the middleware (consumer, etc.) with LocalStack, an AWS profile specific
to LocalStack will be needed.

Your `~/.aws/config` file should have something like:

[profile localstack]
region = us-east-1
output = json
    ignore_configured_endpoint_urls = true
endpoint_url = http://localhost:4566

Your `~/.aws/credentials` file should have:

[localstack]
aws_access_key_id=test
aws_secret_access_key=test

Generally speaking, the `endpoint_url` argument will be needed when
instantiating client objects for use with particular LocalStack services, e.g.:

    sess = boto3.Session()
    if 'AWS_ENDPOINT_URL' in os.environ:
        s3 = sess.client('s3', endpoint_url=os.environ['AWS_ENDPOINT_URL'])
    else:
        s3 = sess.client('s3')
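
A small helper keeps that branching in one place. This is a sketch rather
than code from this repo, and the `make_client` name is ours:

```python
import os
import boto3

def make_client(service):
    '''Return a boto3 client, pointed at LocalStack when AWS_ENDPOINT_URL is set.'''
    sess = boto3.Session()
    endpoint = os.environ.get('AWS_ENDPOINT_URL')
    if endpoint:
        return sess.client(service, endpoint_url=endpoint)
    return sess.client(service)

# Both clients hit LocalStack when AWS_ENDPOINT_URL is exported:
s3 = make_client('s3')
sqs = make_client('sqs')
```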

### Consumer

Spinning up the consumer middleware (intended to be a continually-running
process; in a production scenario, multiple instances could be running
simultaneously as needed):

```bash
docker compose run --env AWS_PROFILE=localstack --env \
Q_URL="http://sqs.us-east-1.localhost.localstack.cloud:4566/000000000000/sqs-senzing-local-ingest" \
--env LOG_LEVEL=INFO consumer
```

`LOG_LEVEL` is optional; defaults to `INFO`.
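
Under the hood, the consumer follows the usual SQS long-polling cycle:
receive a message, process it, then delete it by receipt handle. A simplified
sketch (not the actual `middleware/consumer.py`; `POLL_SECONDS` and the
Senzing hand-off are stand-ins):

```python
import json
import os
import boto3

POLL_SECONDS = 20  # stand-in; the real value lives in consumer.py

def consume(q_url):
    '''Long-poll SQS and process one message at a time.'''
    sqs = boto3.client('sqs', endpoint_url=os.environ.get('AWS_ENDPOINT_URL'))
    while True:
        resp = sqs.receive_message(QueueUrl=q_url, MaxNumberOfMessages=1,
                                   WaitTimeSeconds=POLL_SECONDS)
        for msg in resp.get('Messages', []):
            rcd = json.loads(msg['Body'])  # the JSONL record, as a string
            # ... hand `rcd` to the Senzing engine here ...
            sqs.delete_message(QueueUrl=q_url,
                               ReceiptHandle=msg['ReceiptHandle'])
```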

### Exporter

Spinning up the exporter middleware (this is intended to be an ephemeral
container):

```bash
docker compose run --env AWS_PROFILE=localstack --env S3_BUCKET_NAME=sqs-senzing-local-export \
--env LOG_LEVEL=INFO exporter
```

`LOG_LEVEL` is optional; defaults to `INFO`.
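
The upload side of the exporter boils down to a single boto3 call. A sketch,
assuming the environment variables above; `export.csv` is a stand-in filename:

```python
import os
import boto3

# Upload an export artifact to the bucket named in S3_BUCKET_NAME.
s3 = boto3.client('s3', endpoint_url=os.environ.get('AWS_ENDPOINT_URL'))
s3.upload_file('export.csv', os.environ['S3_BUCKET_NAME'], 'export.csv')
```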

You can view information about files in the LocalStack S3 bucket by visiting
this URL:

http://localhost:4566/sqs-senzing-local-export
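
You can also list the bucket programmatically. A sketch, assuming the
LocalStack endpoint and bucket name shown above:

```python
import os
import boto3

# Equivalent to browsing the URL above: list the exported objects.
s3 = boto3.client('s3', endpoint_url=os.environ.get('AWS_ENDPOINT_URL',
                                                    'http://localhost:4566'))
resp = s3.list_objects_v2(Bucket='sqs-senzing-local-export')
for obj in resp.get('Contents', []):
    print(obj['Key'], obj['Size'], obj['LastModified'])
```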


[awslocal]: https://docs.localstack.cloud/aws/integrations/aws-native-tools/aws-cli/#localstack-aws-cli-awslocal
[localstack]: https://www.localstack.cloud/
[senzing]: https://senzing.com
File renamed without changes.
4 changes: 2 additions & 2 deletions middleware/sz_purge.py → dev-scripts/db_purge.py
@@ -16,9 +16,9 @@
sz_factory = sz_core.SzAbstractFactoryCore("sz_factory_1", senzing_engine_configuration_json)
sz_diagnostic = sz_factory.create_diagnostic()

-print('Are you sure you want to purge the repository? If so, type YES:')
+print('Are you sure you want to purge the database? If so, type YES:')
ans = input('>')
if ans == 'YES':
    sz_diagnostic.purge_repository()
else:
-    print('Everything left as-is.')
+    print('Nothing was done. Everything was left as-is.')
24 changes: 24 additions & 0 deletions dev-scripts/s3_get.py
@@ -0,0 +1,24 @@
import os
import sys
import boto3

def make_s3_client():
    try:
        sess = boto3.Session()
        return sess.client('s3', endpoint_url=os.environ['AWS_ENDPOINT_URL'])
    except Exception as e:
        print(e)
        sys.exit(1)

def get_file_from_s3(key):
    '''Get a file from S3 and write it to /tmp (use docker-compose to map
    /tmp to the desired directory on the host machine).'''
    s3 = make_s3_client()
    print('Grabbing file...')
    # download_file returns None; success means no exception was raised.
    s3.download_file(os.environ['S3_BUCKET_NAME'], key, '/tmp/' + key)
    print('Got file; wrote it to /tmp.')

print('Starting s3_get ...')
fname = sys.argv[1]
get_file_from_s3(fname)
print('Done')
16 changes: 16 additions & 0 deletions dev-scripts/s3_purge.py
@@ -0,0 +1,16 @@
import os
import boto3

def purge_s3():
    s3 = boto3.resource('s3', endpoint_url=os.environ['AWS_ENDPOINT_URL'])
    buck = s3.Bucket(os.environ['S3_BUCKET_NAME'])
    print('Purging...')
    buck.objects.all().delete()

print('Are you sure you want to purge the S3 bucket (' + os.environ['S3_BUCKET_NAME'] + ')? If so, type YES:')
ans = input('>')
if ans == 'YES':
    purge_s3()
    print('Done.')
else:
    print('Nothing was done. Everything was left as-is.')
7 changes: 7 additions & 0 deletions docker-compose.yaml
@@ -75,6 +75,13 @@ services:
"CONNECTION": "postgresql://${POSTGRES_USERNAME:-senzing}:${POSTGRES_PASSWORD:-senzing}@db:5432:${POSTGRES_DB:-G2}/?sslmode=disable"
}
}
+      AWS_PROFILE: localstack
+      S3_BUCKET_NAME: sqs-senzing-local-export
+      PYTHONPATH: ${PYTHONPATH}:/opt/senzing/er/sdk/python:/home/senzing/dev:/usr/lib/python3/dist-packages
+      PYTHONUNBUFFERED: 1 # Flush buffer - helps with print statements.
+    volumes:
+      - ~/tmp:/tmp
+      - ~/.aws:/home/senzing/.aws

consumer:
build:
12 changes: 6 additions & 6 deletions middleware/consumer.py
@@ -51,9 +51,10 @@ def init():
    '''Returns sqs client object'''
    try:
        sess = _make_boto_session()
-        sqs = sess.client('sqs')
-        log.info(AWS_TAG + 'SQS client object instantiated.')
-        return sqs
+        if 'AWS_ENDPOINT_URL' in os.environ:
+            return sess.client('sqs', endpoint_url=os.environ['AWS_ENDPOINT_URL'])
+        else:
+            return sess.client('sqs')
    except Exception as e:
        log.error(AWS_TAG + str(e))
        sys.exit(1)
@@ -66,9 +67,8 @@ def get_msgs(sqs, q_url):
    - Body -- here, should be the JSONL record as a string
    '''
    while 1:
-        print('waiting for msg')
        try:
-            log.info(AWS_TAG + 'Polling SQS for the next message')
+            log.debug(AWS_TAG + 'Polling SQS for the next message')
            resp = sqs.receive_message(QueueUrl=q_url, MaxNumberOfMessages=1,
                                       WaitTimeSeconds=POLL_SECONDS)
            if 'Messages' in resp and len(resp['Messages']) == 1:
@@ -155,7 +155,7 @@ def go():
        # Get next message.
        msg = next(msgs)
        receipt_handle, body = msg['ReceiptHandle'], msg['Body']
-        log.info('SQS message retrieved, having ReceiptHandle: '
+        log.debug('SQS message retrieved, having ReceiptHandle: '
                  + receipt_handle)
        rcd = json.loads(body)

20 changes: 0 additions & 20 deletions middleware/exporter.py
@@ -128,23 +128,3 @@ def main():

if __name__ == '__main__': main()

-#-------------------------------------------------------------------------------
-# ad-hoc test funcs - might move later
-
-def _upload_test_file_to_s3():
-    print("Starting test upload to S3 ...")
-    s3 = make_s3_client()
-    print(s3)
-    fname = 'hemingway.txt'
-    resp = s3.upload_file(fname, S3_BUCKET_NAME, fname)
-    print(resp)
-    print('Upload successful.')
-
-def _get_file_from_s3(key):
-    '''Get file from S3 and write to /tmp (use docker-compose to map this
-    to desired directory on host machine).'''
-    print('Grabbing file...')
-    s3 = make_s3_client()
-    resp = s3.download_file(S3_BUCKET_NAME, key, '/tmp/'+key)
-    print(resp)
-    print('Done.')
5 changes: 4 additions & 1 deletion middleware/loglib.py
@@ -1,4 +1,5 @@
import logging
+import os
import sys

AWS_TAG = '[AWS] '
@@ -7,13 +8,15 @@

_instantiated_loggers = {}

+LOG_LEVEL = os.environ.get('LOG_LEVEL', 'INFO')

def retrieve_logger(tag='default'):
    global _instantiated_loggers
    if tag in _instantiated_loggers:
        return _instantiated_loggers[tag]
    else:
        x = logging.getLogger(tag)
-        x.setLevel(logging.INFO)
+        x.setLevel(LOG_LEVEL)
        handler = logging.StreamHandler()
        fmt = logging.Formatter(
            '[%(asctime)s] [%(levelname)s] ' \