PoC: Gzip-base64 encoding for aspect values #13360

Draft
wants to merge 4 commits into base: master

Conversation

skrydal
Collaborator

@skrydal skrydal commented Apr 29, 2025

Attempt to overcome the 16M-byte size limitation on aspect values.
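For readers unfamiliar with the technique named in the title, here is a minimal sketch of gzip-then-base64 encoding applied to a JSON-serialized aspect. It is not taken from this PR's diff: the helper names `encode_aspect_value` and `decode_aspect_value` and the sample payload are hypothetical, and only the Python standard library is used.

```python
import base64
import gzip
import json


def encode_aspect_value(aspect: dict) -> str:
    """Serialize an aspect to JSON, gzip it, and return base64 ASCII text."""
    raw = json.dumps(aspect, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(gzip.compress(raw)).decode("ascii")


def decode_aspect_value(encoded: str) -> dict:
    """Reverse encode_aspect_value: base64-decode, gunzip, parse JSON."""
    raw = gzip.decompress(base64.b64decode(encoded))
    return json.loads(raw.decode("utf-8"))


# Hypothetical, highly repetitive payload (illustration only).
aspect = {
    "fields": [
        {"fieldPath": f"col_{i}", "nativeDataType": "integer"}
        for i in range(1000)
    ]
}

plain = json.dumps(aspect, separators=(",", ":"))
encoded = encode_aspect_value(aspect)

# The round trip must be lossless for the encoding to be usable.
assert decode_aspect_value(encoded) == aspect
print(len(plain), len(encoded))
```

Base64 inflates its input by roughly one third, so this only reduces the stored size when gzip shrinks the JSON by more than that. Repetitive metadata such as long schema field lists tends to compress well.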

@github-actions bot added the ingestion (PR or Issue related to the ingestion of metadata) and product (PR or Issue related to the DataHub UI/UX) labels Apr 29, 2025

codecov bot commented Apr 29, 2025

❌ 25 Tests Failed:

| Tests completed | Failed | Passed | Skipped |
|---|---|---|---|
| 2885 | 25 | 2860 | 89 |
View the top 3 failed test(s) by shortest run time
tests.restli.restli_test::test_gms_ignore_unknown_dashboard_info
Stack Traces | 0.072s run time
graph_client = DataHubGraph: configured to talk to http://localhost:8080 with token: eyJh**********Tkvc

    def test_gms_ignore_unknown_dashboard_info(graph_client):
        dashboard_urn = make_dashboard_urn(platform="looker", name="test-ignore-unknown")
        generated_urns.extend([dashboard_urn])
    
        audit_stamp = pre_json_transform(
            ChangeAuditStampsClass(
                created=AuditStampClass(
                    time=int(time.time() * 1000),
                    actor="urn:li:corpuser:datahub",
                )
            ).to_obj()
        )
    
        invalid_dashboard_info = {
            "title": "Ignore Unknown Title",
            "description": "Ignore Unknown Description",
            "lastModified": audit_stamp,
            "notAValidField": "invalid field value",
        }
        mcpw = MetadataChangeProposalInvalidWrapper(
            entityUrn=dashboard_urn,
            aspectName="dashboardInfo",
            aspect=invalid_dashboard_info,
        )
    
        mcp = mcpw.make_mcp()
        assert "notAValidField" in str(mcp)
        assert "invalid field value" in str(mcp)
    
>       graph_client.emit_mcp(mcpw, async_flag=False)

tests/restli/restli_test.py:89: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
...../datahub/emitter/rest_emitter.py:441: in emit_mcp
    mcp_obj = pre_json_transform(mcp.to_obj())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = MetadataChangeProposalInvalidWrapper(entityType='dashboard', changeType='UPSERT', entityUrn='urn:li:dashboard:(looker,...rties': {'clientId': 'acryl-datahub', 'clientVersion': '1!0.0.0.dev0'}, 'version': None}), headers=None, use_gzip=True)
tuples = False, simplified_structure = False, use_gzip = None

    def to_obj(self, tuples: bool = False, simplified_structure: bool = False, use_gzip: bool = None) -> dict:
        # The simplified_structure parameter is used to make the output
        # not contain nested JSON strings. Instead, it unpacks the JSON
        # string into an object.
    
>       obj = self.make_mcp(use_gzip=use_gzip).to_obj(tuples=tuples)
E       TypeError: MetadataChangeProposalInvalidWrapper.make_mcp() got an unexpected keyword argument 'use_gzip'

...../datahub/emitter/mcp.py:203: TypeError
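
The traceback above suggests that `to_obj()` now forwards a `use_gzip` keyword to `make_mcp()`, while the `make_mcp()` defined on `MetadataChangeProposalInvalidWrapper` does not accept it. The sketch below reproduces that failure mode with hypothetical stand-in classes (`Wrapper`, `StaleOverride`, `UpdatedOverride`); it is an illustration of the pattern, not the actual DataHub code or the fix the author chose.

```python
from typing import Optional


class Wrapper:
    """Hypothetical stand-in for the base wrapper class."""

    def make_mcp(self, use_gzip: Optional[bool] = None) -> dict:
        return {"encoding": "gzip+base64" if use_gzip else "json"}

    def to_obj(self, use_gzip: Optional[bool] = None) -> dict:
        # The base class forwards the new keyword to make_mcp().
        return self.make_mcp(use_gzip=use_gzip)


class StaleOverride(Wrapper):
    """Override written before the keyword existed: rejects use_gzip."""

    def make_mcp(self) -> dict:  # type: ignore[override]
        return {"encoding": "json"}


class UpdatedOverride(Wrapper):
    """Override that accepts (and here ignores) the new keyword."""

    def make_mcp(self, use_gzip: Optional[bool] = None) -> dict:
        return {"encoding": "json"}


try:
    StaleOverride().to_obj()
    stale_error = None
except TypeError as exc:
    # Same kind of error as in the traceback: unexpected keyword argument.
    stale_error = str(exc)

updated_result = UpdatedOverride().to_obj()
print(stale_error)
print(updated_result)
```

Under this reading, any subclass that overrides `make_mcp()` would need its signature updated alongside the base class, or the base class would need to avoid passing the keyword when it is unset.
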
tests.integration.delta_lake.test_local_delta_lake::test_delta_lake[relative_path.json]
Stack Traces | 0.127s run time
Metadata files differ (use `pytest --update-golden-files` to update):
{'values_changed': {"root[0]['proposedSnapshot']['com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot']['aspects'][2]['com.linkedin.pegasus2avro.schema.SchemaMetadata']['fields'][0]": {'new_value': {'fieldPath': '[version=2.0].[type=int].bar',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'nativeDataType': 'integer',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.NumberType': {}}}},
                                                                                                                                                                                                 'old_value': {'fieldPath': '[version=2.0].[type=int].bar',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'jsonProps': '{"native_data_type": '
                                                                                                                                                                                                                            '"integer", '
                                                                                                                                                                                                                            '"_nullable": '
                                                                                                                                                                                                                            'true}',
                                                                                                                                                                                                               'nativeDataType': 'integer',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.NumberType': {}}}}},
                    "root[0]['proposedSnapshot']['com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot']['aspects'][2]['com.linkedin.pegasus2avro.schema.SchemaMetadata']['fields'][1]": {'new_value': {'fieldPath': '[version=2.0].[type=int].foo',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'nativeDataType': 'integer',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.NumberType': {}}}},
                                                                                                                                                                                                 'old_value': {'fieldPath': '[version=2.0].[type=int].foo',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'jsonProps': '{"native_data_type": '
                                                                                                                                                                                                                            '"integer", '
                                                                                                                                                                                                                            '"_nullable": '
                                                                                                                                                                                                                            'true}',
                                                                                                                                                                                                               'nativeDataType': 'integer',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.NumberType': {}}}}},
                    "root[0]['proposedSnapshot']['com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot']['aspects'][2]['com.linkedin.pegasus2avro.schema.SchemaMetadata']['fields'][2]": {'new_value': {'fieldPath': '[version=2.0].[type=string].zip',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'nativeDataType': 'string',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.StringType': {}}}},
                                                                                                                                                                                                 'old_value': {'fieldPath': '[version=2.0].[type=string].zip',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'jsonProps': '{"native_data_type": '
                                                                                                                                                                                                                            '"string", '
                                                                                                                                                                                                                            '"_nullable": '
                                                                                                                                                                                                                            'true}',
                                                                                                                                                                                                               'nativeDataType': 'string',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.StringType': {}}}}}}}
tests.integration.delta_lake.test_local_delta_lake::test_delta_lake[single_table.json]
Stack Traces | 0.374s run time
Metadata files differ (use `pytest --update-golden-files` to update):
{'values_changed': {"root[0]['proposedSnapshot']['com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot']['aspects'][2]['com.linkedin.pegasus2avro.schema.SchemaMetadata']['fields'][0]": {'new_value': {'fieldPath': '[version=2.0].[type=int].bar',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'nativeDataType': 'integer',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.NumberType': {}}}},
                                                                                                                                                                                                 'old_value': {'fieldPath': '[version=2.0].[type=int].bar',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'jsonProps': '{"native_data_type": '
                                                                                                                                                                                                                            '"integer", '
                                                                                                                                                                                                                            '"_nullable": '
                                                                                                                                                                                                                            'true}',
                                                                                                                                                                                                               'nativeDataType': 'integer',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.NumberType': {}}}}},
                    "root[0]['proposedSnapshot']['com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot']['aspects'][2]['com.linkedin.pegasus2avro.schema.SchemaMetadata']['fields'][1]": {'new_value': {'fieldPath': '[version=2.0].[type=int].foo',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'nativeDataType': 'integer',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.NumberType': {}}}},
                                                                                                                                                                                                 'old_value': {'fieldPath': '[version=2.0].[type=int].foo',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'jsonProps': '{"native_data_type": '
                                                                                                                                                                                                                            '"integer", '
                                                                                                                                                                                                                            '"_nullable": '
                                                                                                                                                                                                                            'true}',
                                                                                                                                                                                                               'nativeDataType': 'integer',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.NumberType': {}}}}},
                    "root[0]['proposedSnapshot']['com.linkedin.pegasus2avro.metadata.snapshot.DatasetSnapshot']['aspects'][2]['com.linkedin.pegasus2avro.schema.SchemaMetadata']['fields'][2]": {'new_value': {'fieldPath': '[version=2.0].[type=string].zip',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'nativeDataType': 'string',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.StringType': {}}}},
                                                                                                                                                                                                 'old_value': {'fieldPath': '[version=2.0].[type=string].zip',
                                                                                                                                                                                                               'isPartOfKey': False,
                                                                                                                                                                                                               'jsonProps': '{"native_data_type": '
                                                                                                                                                                                                                            '"string", '
                                                                                                                                                                                                                            '"_nullable": '
                                                                                                                                                                                                                            'true}',
                                                                                                                                                                                                               'nativeDataType': 'string',
                                                                                                                                                                                                               'nullable': True,
                                                                                                                                                                                                               'recursive': False,
                                                                                                                                                                                                               'type': {'type': {'com.linkedin.pegasus2avro.schema.StringType': {}}}}}}}


Labels
ingestion, product
1 participant