Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch compound document #105

Merged
merged 18 commits into from
Nov 20, 2024
Merged

Fetch compound document #105

merged 18 commits into from
Nov 20, 2024

Conversation

leoraba
Copy link
Contributor

@leoraba leoraba commented Oct 22, 2024

Description

This PR introduces a new feature to this project - View representation of a compound document represented in a hierarchical structure of related Submitted Data, consolidating all related data for a central entity, or centric schema, while organizing both its parent and child relationships within the document. This structure will enable more efficient data access by aggregating connected data into a single view.

Here's how each relationships are represented:

  • Centric Schema: The central entity is the focal point of the document. All related data, including both parent and child records, is organized around this entity.
  • Child entities: Displayed in arrays within the centric schema document. Each child entry includes relevant information and may also contain an array of its own children allowing for a nested, hierarchical structure.
  • Parent entity: representing the entity to which the centric entity belongs, is stored as a single, unique field within the centric schema document.

Code changes:

  • Changes on existing GET data endpoints to add a query parameter view. This specify the data representation to be returned. Use list to return an unordered list of entities; compound to return a compound document nesting using scheme centric structure.
  • Change on register new Dictionary endpoint to include new body parameter defaultCentricEntity. This value is used to form compound documents.
  • New environment variables: NESTED_RECORD_PREFIX, NESTED_RECORD_SUFFIX , PARENT_RECORD_PREFIX, PARENT_RECORD_SUFFIX
  • Add new GET data by systemID endpoint.

@leoraba leoraba marked this pull request as draft October 22, 2024 12:38
Comment on lines 59 to 61
/data/category/{categoryId}/id/{systemId}:
get:
summary: Retrieve Submitted Data Record for a System ID
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a new Endpoint to retrieve record by it's systemID

@@ -31,6 +31,13 @@
schema:
type: integer
description: Optional query parameter to specify the number of results per page. Default value is 20
- name: view
in: query
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding view query param on GET data endpoints

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm nervous about the performance of compound object retrieval in a list... there could be many joins that are slow to resolve. I suppose if this response is paginated then this is less of a concern.

Am I correct in assuming that the purpose of the defaultCentricEntity property you are introducing is to determine which entity to use as the root when returning data in this form?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpoint and others that fetch data as list or compound implements pagination. with default of 20 items per page. This will alleviate fetching large mount of data.

It's correct, the property defaultCentricEntity is used to determine thee root schema in compound view

Comment on lines +26 to +28
defaultCentricEntity:
type: string
description: The default centric entity name
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding new input parameter on Dictionary Registration endpoint. This is value is used to retrieve compound records

* @param order Order of the structed.
* @returns The hierarchical tree structure.
*/
export const generateHierarchy = (source: SchemaDefinition[], order: OrderType): TreeNode[] => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

main function to generate hierarchy structure based on dictionary schemas.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you consider writing this function to return both the ascending and descending hierarchies in one call? Do you we have a use case where we want only one and not the other?

I see us using this function in 3 places, and each time we call it for both ascending and descending trees.

@leoraba leoraba marked this pull request as ready for review October 23, 2024 13:08

- `original_schema_id`: This field references the original schema version used during the initial submission of the data, allowing for a historical perspective on schema changes over time.

- `submission_id`: This field links the audit record to the specific submission, creating a direct connection to the submissin responsible for this action.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... connection to the submission responsible for this action.
Missing 'o' in submission

- `created_at`: This timestamp records when the audit entry was created, providing a chronological context for the logged action.

- `created_by`: This field indicates who performed the action, enhancing accountability by identifying the user responsible for the change.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are great docs 👍

@@ -10,15 +10,18 @@ import {
dataGetByCategoryRequestSchema,
dataGetByOrganizationRequestSchema,
dataGetByQueryRequestschema,
dataGetBySystemIdRequestschema,
Copy link

@demariadaniel demariadaniel Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent casing for variables:

dataGetByCategoryRequestSchema, dataGetByOrganizationRequestSchema,
vs
dataGetByQueryRequestschema, dataGetBySystemIdRequestschema,

@@ -0,0 +1,2 @@
ALTER TABLE "dictionary_categories" ALTER COLUMN "active_dictionary_id" SET NOT NULL;--> statement-breakpoint

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is --> statement-breakpoint intended to be committed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .sql file is auto generated by Drizzle Kit. unfortunately this comment is added

};

/**
* Finds a matching schema name withing a nested object.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo within a nested object (extra 'g')

const getSubmittedDataByCategory = async (
categoryId: number,
paginationOptions: PaginationOptions,
filterOptions: { entityName?: (string | undefined)[] },
filterOptions: { entityName?: (string | undefined)[]; view: ViewType },

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

view here could be optional right? { entityName?: (string | undefined)[]; view?: ViewType },

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Controllers level, view is optional. For service functions, however, it is required, and we define a default view for them.

const getSubmittedDataByOrganization = async (
categoryId: number,
organization: string,
paginationOptions: PaginationOptions,
filterOptions?: { sqon?: SQON; entityName?: (string | undefined)[] },
filterOptions: { sqon?: SQON; entityName?: (string | undefined)[]; view: ViewType },
Copy link

@demariadaniel demariadaniel Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

view here is also optional

Also there's enough overlap between getSubmittedDataByCategory & getSubmittedDataByOrganization that filterOptions could be defined as a reusable variable

const getSubmittedDataBySystemId = async (
categoryId: number,
systemId: string,
filterOptions: { view: ViewType },

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tied into the above discussion, view is not mandatory here


expect(result).to.eql({ sports: [] });

describe('find the hierarchycal descendant structure between schemas in the dictionary', () => {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling hierarchical

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks like a made up word lol.


describe('Dictionary Schema Relations', () => {
it('should return the schema children nodes on a Dictionary', () => {
const result = getDictionarySchemaRelations(dictionarySportStats.dictionary);
describe('Determine child relations by schema in a dictionary', () => {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hurray for tests!

@leoraba
Copy link
Contributor Author

leoraba commented Oct 30, 2024

PR updated. Fixed typos and replied to comments.

Copy link

@demariadaniel demariadaniel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

README.md Outdated
| `ID_USELOCAL` | Generate ID locally | true |
| `LECTERN_URL` | Schema Service (Lectern) URL | |
| `LOG_LEVEL` | Log Level | 'info' |
| `PLURALIZE_SCHEMAS_ENABLED` | This feature automatically convert schema names to their plural forms when handling compound documents | true |
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New environment variable. When enabled it automatically pluralize schema names on compound documents

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked up how you were doing pluralization. Nice to have a library to handle the english grammar. Might be worth commenting that the pluralization assumes the property names are in english.

@@ -11,7 +11,7 @@
type: string
description: ID of the category
- name: entityName
description: Array of strings to filter by entity names
description: Array of strings to filter by entity names. Incompatible with `compound` view
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@justincorrigible: to prevent confusion about API usage, instead of ignoring, should the request fail if this parameter given along compound? an error message could say "compound views do not support filtering by entity names" or something like that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing entityName description. Request parameters are being validated and will give a message with the corresponding error

Comment on lines +331 to +339
.superRefine((data, ctx) => {
if (data.view === VIEW_TYPE.Values.compound && data.entityName && data.entityName?.length > 0) {
ctx.addIssue({
code: z.ZodIssueCode.custom,
message: 'is incompatible with `compound` view',
path: ['entityName'],
});
}
}),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validating request fields that are incompatible with compound view

@@ -19,10 +19,13 @@
type: string
description: A matching Dictionary Name defined on Dictionary Manager (Lectern)
required: true
version:
dictionaryVersion:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to confirm for my own understanding: this is different from a schema version, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's different from schema version, changed variable name to dictionaryVersion

- `dictionary`: A JSONB column that stores the full dictionary schema as a JSON object. This schema defines the fields, data types, relationships, and validation rules required for data submissions, providing a flexible and easily accessible format for schema definitions.

> [!NOTE]
> Creating a new dictionary requires a [Lectern](https://github.com/overture-stack/lectern) as a Data Dictionary Management Service.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
> Creating a new dictionary requires a [Lectern](https://github.com/overture-stack/lectern) as a Data Dictionary Management Service.
> Creating a new dictionary requires [Lectern](https://github.com/overture-stack/lectern) as a Data Dictionary Management Service.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo fixed

.env.schema Outdated
Comment on lines 14 to 17
NESTED_RECORD_PREFIX=
NESTED_RECORD_SUFFIX=
PARENT_RECORD_PREFIX=
PARENT_RECORD_SUFFIX=
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to be careful with natural language property names... not everything is pluralized with an 's'.

I'll read through how you use these properties but at first read I want to just leave the property name as the schema name, unmodified.

@@ -31,6 +31,13 @@
schema:
type: integer
description: Optional query parameter to specify the number of results per page. Default value is 20
- name: view
in: query
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm nervous about the performance of compound object retrieval in a list... there could be many joins that are slow to resolve. I suppose if this response is paginated then this is less of a concern.

Am I correct in assuming that the purpose of the defaultCentricEntity property you are introducing is to determine which entity to use as the root when returning data in this form?

schema:
type: string
enum: ['list', 'compound']
description: Optional query parameter to specify the data representation. Choose 'list' for an unordered list of records, or 'compound' for a nested, schema-centric structure. Default is 'list'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This description is a bit unclear for me. Does compound view still return a list, but each entry is built into a compound document?

This feels like the options are actually "entity only" and "compound", since we get a list response in both cases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!. I see the word list is ambiguous, I could suggest even something like flat which pairs well with compound and describes the data is non-nested and

Comment on lines 77 to 83
- name: view
in: query
required: false
schema:
type: string
enum: ['list', 'compound']
description: Optional query parameter to specify the data representation. Choose 'list' for an unordered list of records, or 'compound' for a nested, schema-centric structure. Default is 'list'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a repeat of the content comments from the previous endpoint. The same parameter is duplicated, maybe we want to write a reusable definition for this parameter.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved view and most used parameters to a components file to be referenced

@@ -39,7 +39,8 @@ table dictionaries {

table dictionary_categories {
id serial [pk, not null, increment]
active_dictionary_id integer
active_dictionary_id integer [not null]
defaultCentricEntity varchar
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to make this decision when we are settign the dictionary? Maybe we should instead provide the root-entity as an http request parameter, so the data requester can choose this at request time. We probably want that ability anyways, and not every dictionary will have a clear default to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think setting a "default" entity only once could reduce complexity to data requesters. However, I also like the idea to override the default entity at request time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be nullable, with no default provided?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes defaultCentricEntity can be null in database, and if provided it must match a schema name in the dictionary.

Comment on lines 503 to 522
const hierarchyStructureDesc = generateHierarchy(category.activeDictionary.dictionary, ORDER_TYPE.Values.desc);
const hierarchyStructureAsc = generateHierarchy(category.activeDictionary.dictionary, ORDER_TYPE.Values.asc);

const childNodes = await traverseChildNodes({
data: foundRecord.data,
entityName: foundRecord.entityName,
organization: foundRecord.organization,
schemaCentric: foundRecord.entityName,
treeNode: hierarchyStructureDesc,
});

const parentNode = await traverseParentNodes({
data: foundRecord.data,
entityName: foundRecord.entityName,
organization: foundRecord.organization,
schemaCentric: foundRecord.entityName,
treeNode: hierarchyStructureAsc,
});

dataValue = { ...dataValue, ...childNodes, ...parentNode };
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Repeated code for converting an entity to compound document. Reuse a function.

* If no parent nodes or dependencies exist, it returns an empty object.
*
*/
const traverseParentNodes = async ({
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is getting long with many functions that are specialized tasks used by the service, such as this one. Consider splitting the file into multiple files within a folder services/submittedData.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting Service files in different files to reduce accumulating much logic in a single file.

* @param order
* @returns
*/
const findOrCreateNode = (tree: TreeNode[], schemaName: string, order: OrderType): TreeNode => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we want to store our nodes as an object with the schemaNames as the property keys, instead of in an array where we need to traverse the array to find the node?

Comment on lines +119 to +120
// Use the first mapping for parent-child field relationship
const mapping = foreignKey.mappings[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a reasonable decision to start with, but we should consider how to manage more complex data models where an entity has multiple foreign key dependencies. Perhaps we could make an issue to track that we need to explore how this would work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue #111 created

Comment on lines 18 to 22
const viewSchema = z
.string()
.trim()
.min(1)
.refine((value) => isValidView(value), 'invalid `view` parameter');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this just be an enum from the VIEW_TYPE?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are converting into lowercase first and then doing the check from the enum VIEW_TYPE

Copy link
Contributor

@joneubank joneubank Nov 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could do something like:

z.string().toLowerCase().trim().min(1).pipe(z.enum(['asdf', 'qwerty']));

"version": "5",
"when": 1730128901514,
"tag": "0009_add_default_centric_entity",
"breakpoints": true
}
]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
}
}

const relatedSchema = findOrCreateNode(tree, foreignKey.schema, order);

// Use the first mapping for parent-child field relationship
const mapping = foreignKey.mappings[0];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this first item guaranteed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic is something we need to revisit. It works in all tested scenarios, however it's not fully guaranteed it works in more complex dictionaries. an issue was created #111

schemaCentricEntity: string | null;
recordEntityName: string;
}): string => {
return filterByEntityName?.some((e) => e && e !== '') ? recordEntityName : schemaCentricEntity || recordEntityName;
Copy link
Member

@justincorrigible justincorrigible Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if e === '' then e will be falsy, and the short-circuit won't ever get to the second bit.
it's not entirely clear what the string could be here either by its type, the variable names or the logic...
what is it that's supposed to happen here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was removed as it is no longer needed due to code refactoring.

@leoraba
Copy link
Contributor Author

leoraba commented Nov 19, 2024

PR Updated to fix and refactor code.

} catch (error) {
logger.error(`Error converting record ${record.systemId} into compound document`, error);
}
return record;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should be returning some error information that the conversion failed... We don't want the client to think there was nothing to join with when that operation actually failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to not return any records instead it throws a 500 Http error with a message An error occurred while converting records into compound view. This way user will know data or request is not valid.
Logs will describe the error details.

Copy link
Contributor

@joneubank joneubank left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very good now apart from the compound document failure handling in the submittedData/viewMode.ts. I'm happy to merge all this if we make a ticket to handle this failure by communicating the error to the caller instead of just returning incomplete data.

@leoraba leoraba force-pushed the get_compound_document branch from 7a20848 to 6a0400e Compare November 20, 2024 22:08
@leoraba leoraba merged commit ddaede3 into main Nov 20, 2024
@leoraba leoraba deleted the get_compound_document branch November 20, 2024 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants