# Federated Search

## Prerequisites
- Local data must be configured
- An ENV flag must be passed into the process
- A "search" config file must be present

> Review comment: What ENV flag? What does the flag set or enable?

## Config
- Read the "search" config from a file
- Export an object containing node data for use in the GraphQL endpoint

Example config:

| Field         | Type   | Note                                                                                                                                                         |
| ------------- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| endpoint      | string | Full URL of the GraphQL endpoint                                                                                                                             |
| arrangerField | string | An endpoint can expose other data on its root query object, so the specific Arranger root field must be named. Appears to be `file` by default.              |

> Review comment (on the `file` default): I don't think it is.

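A minimal sketch of the "read the config from a file" step, assuming the file is JSON with a top-level `nodes` array whose entries carry the two fields above. The `nodes` key and the `parseSearchConfig` name are hypothetical; reading the file itself (e.g. with Node's `fs.readFileSync`) is left to the caller.

```js
// ASSUMPTION: hypothetical config shape; only `endpoint` and `arrangerField` come from the table.
// { "nodes": [ { "endpoint": "https://node.example.org/graphql", "arrangerField": "file" } ] }
function parseSearchConfig(jsonText) {
  const parsed = JSON.parse(jsonText);
  if (!parsed || !Array.isArray(parsed.nodes)) {
    throw new Error('search config: expected a top-level "nodes" array');
  }
  return parsed.nodes.map((node, i) => {
    let url;
    try {
      // The URL constructor throws on anything that is not an absolute URL
      url = new URL(node.endpoint);
    } catch {
      throw new Error(`search config: nodes[${i}].endpoint is not a valid URL`);
    }
    if (typeof node.arrangerField !== "string" || node.arrangerField === "") {
      throw new Error(`search config: nodes[${i}].arrangerField must be a non-empty string`);
    }
    return { endpoint: url.href, arrangerField: node.arrangerField };
  });
}
```

The sketch does not fall back to `file` when `arrangerField` is missing, since that default is unconfirmed.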
Example search node data:

| Field  | Description                    |
| ------ | ------------------------------ |
| url    | GraphQL endpoint               |
| name   | Name of the node, e.g. Toronto |
| schema | Version of Arranger running    |
| status | Node status, e.g. `connected`  |

> Review comment (on `name`): Seeing this here, we may want to include a

## Request Nodes
Outline:
- Make GraphQL HTTP calls to each node's endpoint
- Request only the `arrangerField` in the GraphQL query
- Handle network-level errors, e.g. node not found

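A minimal sketch of one node request, assuming Node 18+ (global `fetch`) and a plain GraphQL-over-HTTP POST. It separates network-level failures (node not found, connection refused) from HTTP and GraphQL errors. The `queryNode` name, the injectable `fetchImpl` parameter, and the `unreachable`/`error` status values are hypothetical; only `connected` appears in the node data table.

```js
// ASSUMPTION: `node.endpoint` comes from the search config; the return shape is hypothetical.
async function queryNode(node, query, fetchImpl = globalThis.fetch) {
  let response;
  try {
    response = await fetchImpl(node.endpoint, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query }),
    });
  } catch (err) {
    // Network-level failure: DNS lookup failed, connection refused, node not found, ...
    return { status: "unreachable", error: String(err) };
  }
  if (!response.ok) {
    return { status: "error", error: `HTTP ${response.status}` };
  }
  let body;
  try {
    body = await response.json();
  } catch {
    return { status: "error", error: "response is not JSON" };
  }
  if (Array.isArray(body?.errors) && body.errors.length > 0) {
    // GraphQL-level errors, e.g. the configured arrangerField type does not exist
    return { status: "error", error: body.errors.map((e) => e.message).join("; ") };
  }
  return { status: "connected", data: body?.data };
}
```

Because `queryNode` resolves with an error status instead of rejecting, all nodes can be queried in parallel with `Promise.all` without one failing node discarding the others' results.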
Query step:
- Read the `aggregations` field, which Arranger generates by default (`aggsState` isn't always generated; perhaps only for the UI?)
- Get each field's name and type, e.g. `donor_specimen_sample: NumericAggregations`

Sample query:
```graphql
{
  # This is the root Arranger type
  # "file" is the arrangerField
  RootType: __type(name: "file") {
    name
    fields {
      name
      type {
        name
      }
    }
  }
  # This is the data we are interested in: just the aggregations
  # Input: type name "fileAggregations", retrieved from the previous query
  Aggregations: __type(name: "fileAggregations") {
    name
    fields {
      name # field name
      type {
        name # type name for resolvers
      }
    }
  }
}
```

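The sample query hard-codes `file` and `fileAggregations`. A sketch of building the two introspection queries from the configured `arrangerField`, assuming they are sent as separate requests so the second can use the type name returned by the first. The function names are hypothetical.

```js
// Step 1: introspect the Arranger root type named by `arrangerField` (e.g. "file").
function rootTypeQuery(arrangerField) {
  // JSON.stringify quotes and escapes the name, yielding a valid GraphQL string literal
  return `{ RootType: __type(name: ${JSON.stringify(arrangerField)}) { name fields { name type { name } } } }`;
}

// Step 2: introspect the aggregations type (e.g. "fileAggregations") found in step 1.
function aggregationsQuery(aggregationsTypeName) {
  return `{ Aggregations: __type(name: ${JSON.stringify(aggregationsTypeName)}) { name fields { name type { name } } } }`;
}
```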
Sample response:
```js
{
  "data": {
    "RootType": {
      "name": "file", // Arranger root field
      "fields": [
        {
          "name": "aggregations",
          "type": {
            "name": "fileAggregations"
          }
        },
        {
          "name": "configs",
          "type": {
            "name": "ConfigsWithState"
          }
        },
        {
          "name": "hits",
          "type": {
            "name": "fileConnection"
          }
        },
        {
          "name": "mapping",
          "type": {
            "name": "JSON"
          }
        }
      ]
    },
    "Aggregations": {
      "name": "fileAggregations",
      "fields": [
        {
          "name": "analysis__analysis_id", // field name to merge
          "type": {
            "name": "Aggregations" // type for resolvers
          }
        },
        ...
        {
          "name": "analysis__analysis_version",
          "type": {
            "name": "NumericAggregations"
          }
        }
      ]
    }
  }
}
```

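A sketch of reading the response above into `{ name, type }` pairs for the merge step. It expects the `data` object with the `RootType` and `Aggregations` aliases used in the sample query; the function names are hypothetical.

```js
// Find the type name behind the root type's `aggregations` field, e.g. "fileAggregations".
function aggregationsTypeName(data) {
  // __type returns null when no type has that name (e.g. a wrong arrangerField)
  if (!data.RootType) throw new Error("arrangerField type not found on this node");
  const field = data.RootType.fields.find((f) => f.name === "aggregations");
  if (!field) {
    throw new Error(`root type ${data.RootType.name} has no aggregations field`);
  }
  return field.type.name;
}

// Flatten the aggregations type into { name, type } pairs,
// e.g. { name: "analysis__analysis_version", type: "NumericAggregations" }.
function aggregationFields(data) {
  return data.Aggregations.fields.map((f) => ({ name: f.name, type: f.type.name }));
}
```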
## Merge Schemas
Creates the union of all nodes' schema types in a stitched schema, including a per-search-node bucket breakdown. Admins should be given a configuration method that lets them filter out or remap fields as they see fit.

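The admin configuration is not specified yet. One possible shape, sketched here purely as an assumption: an `exclude` list of field names to drop and a `rename` map from a node's field name to the name used in the stitched schema. Both keys and the `applyFieldConfig` name are hypothetical.

```js
// fields: [{ name, type }] as read from introspection.
// adminConfig (hypothetical): { exclude: ["field_a"], rename: { old_name: "new_name" } }
function applyFieldConfig(fields, adminConfig = {}) {
  const exclude = new Set(adminConfig.exclude ?? []);
  const rename = adminConfig.rename ?? {};
  return fields
    .filter((f) => !exclude.has(f.name))
    .map((f) => ({
      // Own-property check so names like "constructor" are not looked up on the prototype
      name: Object.prototype.hasOwnProperty.call(rename, f.name) ? rename[f.name] : f.name,
      type: f.type,
    }));
}
```

Applying this per node, before merging, would let admins line up fields whose names differ between nodes.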
Introspection response object:
```json
{
  "name": "analysis__analysis_version",
  "type": {
    "name": "NumericAggregations"
  }
}
```

Corresponding GraphQL schema field:
```gql
{
  analysis__analysis_version: NumericAggregations
}
```

Process:
1. Create the stitched schema `stitched` from the local schema's aggregation fields
2. For each of the *n* remote schemas' aggregations:
   1. Iterate over its fields
      1. If the `name` and `type.name` pair already exists in `stitched`, skip it
      2. Otherwise, add `[name]: [type.name]` to `stitched`

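The process above as a sketch over `{ name, type }` pairs. The design does not say what happens when two nodes expose the same field name with different types; this sketch assumes such fields should not be silently overwritten and reports them in a separate `conflicts` list. The function name and return shape are hypothetical.

```js
// localFields: [{ name, type }]; remoteFieldLists: one such array per remote node.
function mergeAggregationFields(localFields, remoteFieldLists) {
  // 1. Seed the stitched schema with the local aggregation fields
  const stitched = new Map(localFields.map((f) => [f.name, f.type]));
  const conflicts = [];
  // 2. For each remote schema, iterate over its fields
  for (const fields of remoteFieldLists) {
    for (const { name, type } of fields) {
      if (!stitched.has(name)) {
        stitched.set(name, type); // new field: add [name]: [type.name]
      } else if (stitched.get(name) !== type) {
        // ASSUMPTION: same name, different type is reported rather than overwritten
        conflicts.push({ name, existing: stitched.get(name), incoming: type });
      }
      // Otherwise the name/type pair already exists: nothing to do
    }
  }
  return { stitched, conflicts };
}
```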
Example:

Inputs (GraphQL responses from the introspection query):

Node A:
```js
// GraphQL response
{
  "name": "fileAggregations",
  "fields": [
    {
      "name": "donors__gender", // field name to merge
      "type": {
        "name": "Aggregations" // type
      }
    }
  ]
}
```

Node B:
```js
{
  "name": "fileAggregations",
  "fields": [
    {
      "name": "donors__gender", // field name to merge
      "type": {
        "name": "Aggregations" // type
      }
    },
    {
      "name": "donors__age",
      "type": {
        "name": "NumericAggregations"
      }
    }
  ]
}
```

Output (stitched schema):
```graphql
schema {
  query {
    network {
      aggregations {
        donors__gender {
          search_node_agg: Aggregations
          agg: Aggregations
        }
        donors__age {
          search_node_agg: NumericAggregations
          agg: NumericAggregations
        }
      }
    }
  }
}
```

Sample responses to a query against the stitched schema:

```js
// Node A response
const nodeAResponse = {
  donor: {
    aggregations: {
      gender: {
        __typename: "Aggregations",
        buckets: [
          { key: "Male", doc_count: 123 },
          { key: "Female", doc_count: 456 },
        ],
      },
    },
  },
};

// Node B response
const nodeBResponse = {
  donor: {
    aggregations: {
      gender: {
        __typename: "Aggregations",
        buckets: [
          { key: "Male", doc_count: 789 },
          { key: "Female", doc_count: 234 },
        ],
      },
    },
  },
};

// Full (stitched) response
const fullResponse = {
  donor: {
    network: {
      aggregations: {
        gender: {
          search_node_agg: {
            buckets: [
              { key: "Node A", doc_count: 579 }, // male + female
              { key: "Node B", doc_count: 1023 }, // male + female
            ],
          },
          agg: {
            buckets: [
              { key: "Male", doc_count: 912 }, // Node A male + Node B male
              { key: "Female", doc_count: 690 }, // Node A female + Node B female
            ],
          },
        },
      },
    },
  },
};
```
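A sketch of the arithmetic behind the full response: `search_node_agg` gets one bucket per node holding that node's total count, and `agg` sums counts per bucket key across nodes. It assumes each node's result has already been reduced to its `buckets` array and that counts are read from each bucket's `doc_count` (the count field on Arranger's `Bucket` type); the `mergeTermBuckets` name and its input shape are hypothetical.

```js
// nodeResults (hypothetical shape): [{ node: "Node A", buckets: [{ key, doc_count }] }, ...]
function mergeTermBuckets(nodeResults) {
  const searchNodeBuckets = [];
  const perKey = new Map(); // bucket key -> summed doc_count, in first-seen order
  for (const { node, buckets } of nodeResults) {
    let nodeTotal = 0;
    for (const { key, doc_count } of buckets) {
      nodeTotal += doc_count;
      perKey.set(key, (perKey.get(key) ?? 0) + doc_count);
    }
    searchNodeBuckets.push({ key: node, doc_count: nodeTotal });
  }
  return {
    search_node_agg: { buckets: searchNodeBuckets },
    agg: { buckets: [...perKey].map(([key, doc_count]) => ({ key, doc_count })) },
  };
}
```

Summing bucket counts gives a node's document total only when each document falls into exactly one bucket; multi-valued fields would be overcounted in `search_node_agg`.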

## Generate Resolvers

New resolvers are needed to combine the per-node aggregations for every available aggregation type.

Ref: https://github.com/overture-stack/arranger/blob/develop/modules/server/src/schema/Aggregations.js

- Query each search node's data over HTTP
- Apply data transforms based on the `__typename` field, e.g. `NumericAggregations`
- Add additional data, e.g. the per-search-node (`search_node_agg`) breakdown of each aggregate
- Return the fully resolved result
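A sketch of the "transform based on `__typename`" step for `NumericAggregations`, assuming its `stats` object carries `min`, `max`, `count`, `sum` and `avg`. These cannot all simply be added: the mean has to be recomputed from the combined sum and count. The `mergers` table and function names are hypothetical, and only this one typename is covered (histogram buckets are left out).

```js
// Combine the `stats` of several NumericAggregations results into one.
function mergeNumericStats(statsList) {
  // Skip nodes with no matching documents (missing stats or count 0)
  const present = statsList.filter((s) => s && s.count > 0);
  if (present.length === 0) {
    return { min: null, max: null, count: 0, sum: 0, avg: null };
  }
  const count = present.reduce((n, s) => n + s.count, 0);
  const sum = present.reduce((n, s) => n + s.sum, 0);
  return {
    min: Math.min(...present.map((s) => s.min)),
    max: Math.max(...present.map((s) => s.max)),
    count,
    sum,
    avg: sum / count, // recomputed; averaging the per-node avgs would be wrong
  };
}

// Hypothetical dispatch table keyed by each node result's GraphQL __typename.
const mergers = new Map([
  ["NumericAggregations", (results) => ({
    __typename: "NumericAggregations",
    stats: mergeNumericStats(results.map((r) => r.stats)),
  })],
]);

function mergeByTypename(results) {
  const typename = results[0]?.__typename;
  const merge = mergers.get(typename);
  if (!merge) throw new Error(`no merger for __typename ${typename}`);
  return merge(results);
}
```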

> Review comment (on "Local data must be configured"): This was a bit confusing because I read it assuming it meant the local developer setup should have data available, but now I assume "local" is meant in the context of "local search" vs "network search". This is a bit unclear. Perhaps it could be phrased like "Local Search configuration of this Arranger node is performed separately from Network Search", since we do not require a local search config for network search.