-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fetch compound document #105
Conversation
apps/server/swagger/data-api.yml
Outdated
/data/category/{categoryId}/id/{systemId}: | ||
get: | ||
summary: Retrieve Submitted Data Record for a System ID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a new Endpoint to retrieve record by it's systemID
apps/server/swagger/data-api.yml
Outdated
@@ -31,6 +31,13 @@ | |||
schema: | |||
type: integer | |||
description: Optional query parameter to specify the number of results per page. Default value is 20 | |||
- name: view | |||
in: query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adding view
query param on GET data endpoints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm nervous about the performance of compound object retrieval in a list... there could be many joins that are slow to resolve. I suppose if this response is paginated then this is less of a concern.
Am I correct in assuming that the purpose of the defaultCentricEntity
property you are introducing is to determine which entity to use as the root when returning data in this form?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This endpoint and others that fetch data as list or compound implements pagination. with default of 20 items per page. This will alleviate fetching large mount of data.
It's correct, the property defaultCentricEntity
is used to determine thee root schema in compound
view
defaultCentricEntity: | ||
type: string | ||
description: The default centric entity name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding new input parameter on Dictionary Registration endpoint. This is value is used to retrieve compound records
* @param order Order of the structed. | ||
* @returns The hierarchical tree structure. | ||
*/ | ||
export const generateHierarchy = (source: SchemaDefinition[], order: OrderType): TreeNode[] => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
main function to generate hierarchy structure based on dictionary schemas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you consider writing this function to return both the ascending and descending hierarchies in one call? Do you we have a use case where we want only one and not the other?
I see us using this function in 3 places, and each time we call it for both ascending and descending trees.
packages/data-model/docs/README.md
Outdated
|
||
- `original_schema_id`: This field references the original schema version used during the initial submission of the data, allowing for a historical perspective on schema changes over time. | ||
|
||
- `submission_id`: This field links the audit record to the specific submission, creating a direct connection to the submissin responsible for this action. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... connection to the submission responsible for this action.
Missing 'o' in submission
- `created_at`: This timestamp records when the audit entry was created, providing a chronological context for the logged action. | ||
|
||
- `created_by`: This field indicates who performed the action, enhancing accountability by identifying the user responsible for the change. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are great docs 👍
@@ -10,15 +10,18 @@ import { | |||
dataGetByCategoryRequestSchema, | |||
dataGetByOrganizationRequestSchema, | |||
dataGetByQueryRequestschema, | |||
dataGetBySystemIdRequestschema, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent casing for variables:
dataGetByCategoryRequestSchema, dataGetByOrganizationRequestSchema,
vs
dataGetByQueryRequestschema, dataGetBySystemIdRequestschema,
@@ -0,0 +1,2 @@ | |||
ALTER TABLE "dictionary_categories" ALTER COLUMN "active_dictionary_id" SET NOT NULL;--> statement-breakpoint |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is --> statement-breakpoint
intended to be committed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .sql
file is auto generated by Drizzle Kit. unfortunately this comment is added
}; | ||
|
||
/** | ||
* Finds a matching schema name withing a nested object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo within a nested object
(extra 'g')
const getSubmittedDataByCategory = async ( | ||
categoryId: number, | ||
paginationOptions: PaginationOptions, | ||
filterOptions: { entityName?: (string | undefined)[] }, | ||
filterOptions: { entityName?: (string | undefined)[]; view: ViewType }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
view
here could be optional right? { entityName?: (string | undefined)[]; view?: ViewType },
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Controllers level, view
is optional. For service functions, however, it is required, and we define a default view for them.
const getSubmittedDataByOrganization = async ( | ||
categoryId: number, | ||
organization: string, | ||
paginationOptions: PaginationOptions, | ||
filterOptions?: { sqon?: SQON; entityName?: (string | undefined)[] }, | ||
filterOptions: { sqon?: SQON; entityName?: (string | undefined)[]; view: ViewType }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
view
here is also optional
Also there's enough overlap between getSubmittedDataByCategory & getSubmittedDataByOrganization
that filterOptions
could be defined as a reusable variable
const getSubmittedDataBySystemId = async ( | ||
categoryId: number, | ||
systemId: string, | ||
filterOptions: { view: ViewType }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tied into the above discussion, view is not mandatory here
|
||
expect(result).to.eql({ sports: [] }); | ||
|
||
describe('find the hierarchycal descendant structure between schemas in the dictionary', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spelling hierarchical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks like a made up word lol.
|
||
describe('Dictionary Schema Relations', () => { | ||
it('should return the schema children nodes on a Dictionary', () => { | ||
const result = getDictionarySchemaRelations(dictionarySportStats.dictionary); | ||
describe('Determine child relations by schema in a dictionary', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hurray for tests!
PR updated. Fixed typos and replied to comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 🚀
README.md
Outdated
| `ID_USELOCAL` | Generate ID locally | true | | ||
| `LECTERN_URL` | Schema Service (Lectern) URL | | | ||
| `LOG_LEVEL` | Log Level | 'info' | | ||
| `PLURALIZE_SCHEMAS_ENABLED` | This feature automatically convert schema names to their plural forms when handling compound documents | true | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New environment variable. When enabled it automatically pluralize schema names on compound documents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked up how you were doing pluralization. Nice to have a library to handle the english grammar. Might be worth commenting that the pluralization assumes the property names are in english.
@@ -11,7 +11,7 @@ | |||
type: string | |||
description: ID of the category | |||
- name: entityName | |||
description: Array of strings to filter by entity names | |||
description: Array of strings to filter by entity names. Incompatible with `compound` view |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@justincorrigible: to prevent confusion about API usage, instead of ignoring, should the request fail if this parameter given along compound? an error message could say "compound views do not support filtering by entity names" or something like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing entityName description. Request parameters are being validated and will give a message with the corresponding error
.superRefine((data, ctx) => { | ||
if (data.view === VIEW_TYPE.Values.compound && data.entityName && data.entityName?.length > 0) { | ||
ctx.addIssue({ | ||
code: z.ZodIssueCode.custom, | ||
message: 'is incompatible with `compound` view', | ||
path: ['entityName'], | ||
}); | ||
} | ||
}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validating request fields that are incompatible with compound
view
@@ -19,10 +19,13 @@ | |||
type: string | |||
description: A matching Dictionary Name defined on Dictionary Manager (Lectern) | |||
required: true | |||
version: | |||
dictionaryVersion: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to confirm for my own understanding: this is different from a schema version, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's different from schema version, changed variable name to dictionaryVersion
packages/data-model/docs/README.md
Outdated
- `dictionary`: A JSONB column that stores the full dictionary schema as a JSON object. This schema defines the fields, data types, relationships, and validation rules required for data submissions, providing a flexible and easily accessible format for schema definitions. | ||
|
||
> [!NOTE] | ||
> Creating a new dictionary requires a [Lectern](https://github.com/overture-stack/lectern) as a Data Dictionary Management Service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
> Creating a new dictionary requires a [Lectern](https://github.com/overture-stack/lectern) as a Data Dictionary Management Service. | |
> Creating a new dictionary requires [Lectern](https://github.com/overture-stack/lectern) as a Data Dictionary Management Service. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo fixed
.env.schema
Outdated
NESTED_RECORD_PREFIX= | ||
NESTED_RECORD_SUFFIX= | ||
PARENT_RECORD_PREFIX= | ||
PARENT_RECORD_SUFFIX= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to be careful with natural language property names... not everything is pluralized with an 's'.
I'll read through how you use these properties but at first read I want to just leave the property name as the schema name, unmodified.
apps/server/swagger/data-api.yml
Outdated
@@ -31,6 +31,13 @@ | |||
schema: | |||
type: integer | |||
description: Optional query parameter to specify the number of results per page. Default value is 20 | |||
- name: view | |||
in: query |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm nervous about the performance of compound object retrieval in a list... there could be many joins that are slow to resolve. I suppose if this response is paginated then this is less of a concern.
Am I correct in assuming that the purpose of the defaultCentricEntity
property you are introducing is to determine which entity to use as the root when returning data in this form?
apps/server/swagger/data-api.yml
Outdated
schema: | ||
type: string | ||
enum: ['list', 'compound'] | ||
description: Optional query parameter to specify the data representation. Choose 'list' for an unordered list of records, or 'compound' for a nested, schema-centric structure. Default is 'list'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This description is a bit unclear for me. Does compound
view still return a list, but each entry is built into a compound document?
This feels like the options are actually "entity only" and "compound", since we get a list response in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point!. I see the word list
is ambiguous, I could suggest even something like flat
which pairs well with compound and describes the data is non-nested and
apps/server/swagger/data-api.yml
Outdated
- name: view | ||
in: query | ||
required: false | ||
schema: | ||
type: string | ||
enum: ['list', 'compound'] | ||
description: Optional query parameter to specify the data representation. Choose 'list' for an unordered list of records, or 'compound' for a nested, schema-centric structure. Default is 'list'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a repeat of the content comments from the previous endpoint. The same parameter is duplicated, maybe we want to write a reusable definition for this parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved view
and most used parameters to a components file to be referenced
packages/data-model/docs/schema.dbml
Outdated
@@ -39,7 +39,8 @@ table dictionaries { | |||
|
|||
table dictionary_categories { | |||
id serial [pk, not null, increment] | |||
active_dictionary_id integer | |||
active_dictionary_id integer [not null] | |||
defaultCentricEntity varchar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to make this decision when we are settign the dictionary? Maybe we should instead provide the root-entity as an http request parameter, so the data requester can choose this at request time. We probably want that ability anyways, and not every dictionary will have a clear default to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think setting a "default" entity only once could reduce complexity to data requesters. However, I also like the idea to override the default entity at request time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be nullable, with no default provided?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes defaultCentricEntity
can be null in database, and if provided it must match a schema name in the dictionary.
const hierarchyStructureDesc = generateHierarchy(category.activeDictionary.dictionary, ORDER_TYPE.Values.desc); | ||
const hierarchyStructureAsc = generateHierarchy(category.activeDictionary.dictionary, ORDER_TYPE.Values.asc); | ||
|
||
const childNodes = await traverseChildNodes({ | ||
data: foundRecord.data, | ||
entityName: foundRecord.entityName, | ||
organization: foundRecord.organization, | ||
schemaCentric: foundRecord.entityName, | ||
treeNode: hierarchyStructureDesc, | ||
}); | ||
|
||
const parentNode = await traverseParentNodes({ | ||
data: foundRecord.data, | ||
entityName: foundRecord.entityName, | ||
organization: foundRecord.organization, | ||
schemaCentric: foundRecord.entityName, | ||
treeNode: hierarchyStructureAsc, | ||
}); | ||
|
||
dataValue = { ...dataValue, ...childNodes, ...parentNode }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Repeated code for converting an entity to compound document. Reuse a function.
* If no parent nodes or dependencies exist, it returns an empty object. | ||
* | ||
*/ | ||
const traverseParentNodes = async ({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is getting long with many functions that are specialized tasks used by the service, such as this one. Consider splitting the file into multiple files within a folder services/submittedData
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Splitting Service files in different files to reduce accumulating much logic in a single file.
* @param order | ||
* @returns | ||
*/ | ||
const findOrCreateNode = (tree: TreeNode[], schemaName: string, order: OrderType): TreeNode => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we want to store our nodes as an object with the schemaNames as the property keys, instead of in an array where we need to traverse the array to find the node?
// Use the first mapping for parent-child field relationship | ||
const mapping = foreignKey.mappings[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a reasonable decision to start with, but we should consider how to manage more complex data models where an entity has multiple foreign key dependencies. Perhaps we could make an issue to track that we need to explore how this would work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue #111 created
const viewSchema = z | ||
.string() | ||
.trim() | ||
.min(1) | ||
.refine((value) => isValidView(value), 'invalid `view` parameter'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this just be an enum from the VIEW_TYPE
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we are converting into lowercase first and then doing the check from the enum VIEW_TYPE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could do something like:
z.string().toLowerCase().trim().min(1).pipe(z.enum(['asdf', 'qwerty']));
"version": "5", | ||
"when": 1730128901514, | ||
"tag": "0009_add_default_centric_entity", | ||
"breakpoints": true | ||
} | ||
] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | |
} | |
const relatedSchema = findOrCreateNode(tree, foreignKey.schema, order); | ||
|
||
// Use the first mapping for parent-child field relationship | ||
const mapping = foreignKey.mappings[0]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this first item guaranteed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic is something we need to revisit. It works in all tested scenarios, however it's not fully guaranteed it works in more complex dictionaries. an issue was created #111
schemaCentricEntity: string | null; | ||
recordEntityName: string; | ||
}): string => { | ||
return filterByEntityName?.some((e) => e && e !== '') ? recordEntityName : schemaCentricEntity || recordEntityName; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if e === ''
then e
will be falsy, and the short-circuit won't ever get to the second bit.
it's not entirely clear what the string
could be here either by its type, the variable names or the logic...
what is it that's supposed to happen here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function was removed as it is no longer needed due to code refactoring.
PR Updated to fix and refactor code. |
} catch (error) { | ||
logger.error(`Error converting record ${record.systemId} into compound document`, error); | ||
} | ||
return record; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should be returning some error information that the conversion failed... We don't want the client to think there was nothing to join with when that operation actually failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to not return any records instead it throws a 500 Http error with a message An error occurred while converting records into compound view
. This way user will know data or request is not valid.
Logs will describe the error details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very good now apart from the compound document failure handling in the submittedData/viewMode.ts
. I'm happy to merge all this if we make a ticket to handle this failure by communicating the error to the caller instead of just returning incomplete data.
7a20848
to
6a0400e
Compare
Description
This PR introduces a new feature to this project - View representation of a compound document represented in a hierarchical structure of related Submitted Data, consolidating all related data for a central entity, or centric schema, while organizing both its parent and child relationships within the document. This structure will enable more efficient data access by aggregating connected data into a single view.
Here's how each relationships are represented:
Code changes:
view
. This specify the data representation to be returned. Uselist
to return an unordered list of entities;compound
to return a compound document nesting using scheme centric structure.defaultCentricEntity
. This value is used to form compound documents.NESTED_RECORD_PREFIX
,NESTED_RECORD_SUFFIX
,PARENT_RECORD_PREFIX
,PARENT_RECORD_SUFFIX