feat: Exporter middleware #13
Merged
7 commits
9d6bd01 feat: Exporter middleware (spompea-cfa)
d861da6 logic for proper JSON blob (spompea-cfa)
51ad2dc add to trivyignore (spompea-cfa)
cb7e702 add logging + general tidying (spompea-cfa)
fc9a942 update README (spompea-cfa)
3003d33 small tweak (spompea-cfa)
4c02130 PR feedback etc (spompea-cfa)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| FROM senzing/senzingsdk-runtime:4.0.0 | ||
|
|
||
| USER root | ||
|
|
||
| RUN apt-get update \ | ||
| && apt-get -y install --no-install-recommends curl python3 python3-pip python3-boto3 \ | ||
| && apt-get -y autoremove \ | ||
| && apt-get -y clean \ | ||
| && rm -rf /var/lib/apt/lists/* | ||
|
|
||
| WORKDIR /app | ||
| COPY middleware/* . | ||
|
|
||
| # Add a new user and switch to it. | ||
| RUN useradd -m -u 1001 senzing | ||
| USER senzing | ||
|
|
||
| ENV PYTHONPATH=/opt/senzing/er/sdk/python:/app | ||
|
|
||
| # Flush buffer - helps with print statements. | ||
| ENV PYTHONUNBUFFERED=1 | ||
|
|
||
| CMD ["python3", "exporter.py"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,150 @@ | ||
| import json | ||
| import io | ||
| import os | ||
| import time | ||
| import sys | ||
| import boto3 | ||
| import senzing as sz | ||
|
|
||
| from loglib import AWS_TAG, SZ_TAG, retrieve_logger | ||
| log = retrieve_logger() | ||
|
|
||
| try: | ||
| log.info('Importing senzing_core library . . .') | ||
| import senzing_core as sz_core | ||
| log.info('Imported senzing_core successfully.') | ||
| except Exception as e: | ||
| log.error('Importing senzing_core library failed.') | ||
| log.error(e) | ||
| sys.exit(1) | ||
|
|
||
| if 'SENZING_ENGINE_CONFIGURATION_JSON' not in os.environ: | ||
| log.error('SENZING_ENGINE_CONFIGURATION_JSON environment variable required.') | ||
| sys.exit(1) | ||
| SZ_CONFIG = json.loads(os.environ['SENZING_ENGINE_CONFIGURATION_JSON']) | ||
|
|
||
| if 'S3_BUCKET_NAME' not in os.environ: | ||
| log.error('S3_BUCKET_NAME environment variable required.') | ||
| sys.exit(1) | ||
| S3_BUCKET_NAME = os.environ['S3_BUCKET_NAME'] | ||
|
|
||
| EXPORT_FLAGS = sz.SzEngineFlags.SZ_EXPORT_DEFAULT_FLAGS | ||
|
|
||
| #------------------------------------------------------------------------------- | ||
|
|
||
| def ts(): | ||
| '''Return current timestamp in ms as a str''' | ||
| return str(int(round(time.time() * 1000))) | ||
|
|
||
| def make_s3_client(): | ||
| try: | ||
| sess = boto3.Session() | ||
| if 'AWS_ENDPOINT_URL' in os.environ: | ||
| return sess.client('s3', endpoint_url=os.environ['AWS_ENDPOINT_URL']) | ||
| else: | ||
| return sess.client('s3') | ||
| except Exception as e: | ||
| log.error(AWS_TAG + str(e)) | ||
| sys.exit(1) | ||
|
|
||
| def go(): | ||
| ''' | ||
| Exports Senzing JSON entity report data into a buffer, then | ||
| uploads the buffer as a file into the output S3 bucket. | ||
|
|
||
| References: | ||
| - https://garage.senzing.com/sz-sdk-python/senzing.html#senzing.szengine.SzEngine.export_json_entity_report | ||
| - https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3/client/upload_fileobj.html | ||
| ''' | ||
|
|
||
| # Init S3 client | ||
| s3 = make_s3_client() | ||
|
|
||
| # Init senzing engine object. | ||
| # Note that Senzing engine object cannot be passed around between functions, | ||
| # else it will be eagerly cleaned up / destroyed and no longer usable. | ||
| sz_eng = None | ||
| try: | ||
| sz_factory = sz_core.SzAbstractFactoryCore("ERS", SZ_CONFIG) | ||
| sz_eng = sz_factory.create_engine() | ||
| log.info(SZ_TAG + 'Senzing engine object instantiated.') | ||
| except sz.SzError as sz_err: | ||
| log.error(SZ_TAG + str(sz_err)) | ||
| sys.exit(1) | ||
| except Exception as e: | ||
| log.error(str(e)) | ||
| sys.exit(1) | ||
|
|
||
| # init buffer | ||
| buff = io.BytesIO() | ||
|
|
||
| # Retrieve output from sz into buff. | ||
| # sz exports JSONL lines; we add the brackets and commas needed to make | ||
| # the output as a whole a single JSON array. | ||
| log.info(SZ_TAG + 'Starting export from Senzing.') | ||
| try: | ||
| export_handle = sz_eng.export_json_entity_report(EXPORT_FLAGS) | ||
| log.info(SZ_TAG + 'Obtained export_json_entity_report handle.') | ||
| buff.write(b'[') | ||
| wrote_chunk = False | ||
| while True: | ||
| log.debug(SZ_TAG + 'Fetching chunk...') | ||
| chunk = sz_eng.fetch_next(export_handle) | ||
| if not chunk: | ||
| break | ||
| buff.write(chunk.encode('utf-8')) | ||
| log.debug('Wrote chunk to buffer.') | ||
| buff.write(b',') | ||
| wrote_chunk = True | ||
| sz_eng.close_export_report(export_handle) | ||
| log.info(SZ_TAG + 'Closed export handle.') | ||
| # toss out last comma; skipped for an empty export so '[' is not overwritten | ||
| if wrote_chunk: buff.seek(-1, os.SEEK_CUR) | ||
| buff.write(b']') | ||
| log.info('Total bytes exported/buffered: ' + str(buff.getbuffer().nbytes)) | ||
| except sz.SzError as err: | ||
| log.error(SZ_TAG + str(err)) | ||
| sys.exit(1) # do not upload a partial, invalid JSON file | ||
| except Exception as e: | ||
| log.error(str(e)) | ||
| sys.exit(1) | ||
|
|
||
| # rewind buffer | ||
| buff.seek(0) | ||
| buff.flush() | ||
|
|
||
| # write buff to S3 using upload_fileobj | ||
| fname = 'output-' + ts() + '.json' | ||
| log.info(AWS_TAG + 'About to upload JSON file ' + fname + ' to S3 ...') | ||
| try: | ||
| s3.upload_fileobj(buff, S3_BUCKET_NAME, fname) | ||
| log.info(AWS_TAG + 'Successfully uploaded file.') | ||
| except Exception as e: | ||
| log.error(AWS_TAG + str(e)) | ||
| sys.exit(1) | ||
|
|
||
| #------------------------------------------------------------------------------- | ||
|
|
||
| def main(): | ||
| log.info('====================') | ||
| log.info(' EXPORTER') | ||
| log.info(' *STARTED*') | ||
| log.info('====================') | ||
| go() | ||
|
|
||
| if __name__ == '__main__': main() | ||
|
|
||
| #------------------------------------------------------------------------------- | ||
| # ad-hoc test funcs - might move later | ||
|
|
||
| def _upload_test_file_to_s3(): | ||
| print("Starting test upload to S3 ...") | ||
| s3 = make_s3_client() | ||
| print(s3) | ||
| fname = 'hemingway.txt' | ||
| s3.upload_file(fname, S3_BUCKET_NAME, fname) # returns None; raises on failure | ||
| print('Upload successful.') | ||
|
|
||
| def _get_file_from_s3(key): | ||
| '''Get file from S3 and write to /tmp (use docker-compose to map this | ||
| to desired directory on host machine).''' | ||
| print('Grabbing file...') | ||
| s3 = make_s3_client() | ||
| s3.download_file(S3_BUCKET_NAME, key, '/tmp/'+key) # returns None; raises on failure | ||
| print('Done.') |
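The export loop in `go()` turns Senzing's line-oriented output into one JSON array by writing `[`, then each chunk followed by a comma, then replacing the final comma with `]`. The sketch below isolates that idea so it can be exercised without Senzing or S3. `wrap_chunks_as_json_array` is a hypothetical helper, not part of this PR, and a plain list of strings stands in for repeated `fetch_next` calls; it assumes each chunk holds exactly one complete JSON document. As a variation, it writes the separator before every chunk except the first, which avoids the seek-back step and yields a valid `[]` when the export is empty.

```python
import io
import json
from typing import Iterable


def wrap_chunks_as_json_array(chunks: Iterable[str]) -> io.BytesIO:
    """Write JSON documents into a buffer as a single JSON array.

    Hypothetical helper for illustration: `chunks` stands in for the
    strings returned by repeated fetch_next() calls, and each chunk is
    assumed to hold exactly one JSON document.
    """
    buff = io.BytesIO()
    buff.write(b'[')
    first = True
    for chunk in chunks:
        if not chunk:
            break  # an empty chunk marks the end of the export
        if not first:
            buff.write(b',')  # separator goes before every chunk but the first
        buff.write(chunk.encode('utf-8'))
        first = False
    buff.write(b']')
    buff.seek(0)  # rewind so the buffer can be read or uploaded from the start
    return buff


# Usage: made-up sample lines in place of a real Senzing export.
sample_lines = ['{"ENTITY_ID": 1}\n', '{"ENTITY_ID": 2}\n']
parsed = json.loads(wrap_chunks_as_json_array(sample_lines).read())
print(len(parsed), 'entities parsed')
```

The returned buffer is already rewound, so it could be handed to `upload_fileobj` in the same way `go()` passes `buff`.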
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,24 @@ | ||
| import logging | ||
| import sys | ||
|
|
||
| AWS_TAG = '[AWS] ' | ||
| SZ_TAG = '[SZ] ' | ||
| DLQ_TAG = '[DLQ] ' | ||
|
|
||
| _instantiated_loggers = {} | ||
|
|
||
| def retrieve_logger(tag='default'): | ||
| '''Return the logger for tag, creating and configuring it on first use.''' | ||
| if tag in _instantiated_loggers: | ||
| return _instantiated_loggers[tag] | ||
| else: | ||
| x = logging.getLogger(tag) | ||
| x.setLevel(logging.INFO) | ||
| handler = logging.StreamHandler() | ||
| fmt = logging.Formatter( | ||
| '[%(asctime)s] [%(levelname)s] ' | ||
| '[%(filename)s:%(lineno)s] %(message)s') | ||
| handler.setFormatter(fmt) | ||
| x.addHandler(handler) | ||
| _instantiated_loggers[tag] = x | ||
| return x |
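`retrieve_logger` caches loggers by tag so that a handler is attached only once per tag; attaching a second `StreamHandler` to the same logger would emit every message twice. The following sketch checks that property in isolation. `get_logger`, `_loggers`, and the `demo` tag are hypothetical names used for illustration, and the format string is copied from the diff above.

```python
import logging

_loggers = {}  # tag -> configured logger


def get_logger(tag='default'):
    """Return a logger for `tag`, configuring it only on first use."""
    if tag in _loggers:
        return _loggers[tag]
    logger = logging.getLogger(tag)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        '[%(asctime)s] [%(levelname)s] '
        '[%(filename)s:%(lineno)s] %(message)s'))
    logger.addHandler(handler)
    _loggers[tag] = logger
    return logger


first = get_logger('demo')
second = get_logger('demo')  # served from the cache, no second handler added
first.info('this message is emitted once')
```

Because the second call returns the cached object, modules that each call the function with the same tag share one logger and one handler.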
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| It was very late and everyone had left the cafe except an old man who sat in the | ||
| shadow the leaves of the tree made against the electric light. In the day time the | ||
| street was dusty, but at night the dew settled the dust and the old man liked to sit | ||
| late because he was deaf and now at night it was quiet and he felt the difference. | ||
| The two waiters inside the cafe knew that the old man was a little drunk, and while | ||
| he was a good client they knew that if he became too drunk he would leave without | ||
| paying, so they kept watch on him. | ||