Commit 07f05cd
Fix msdf_merge util function. (#627)
The `msdf_merge` function attempts to "propagate" slots before merging
the sets, but does so without any regard for which slots should
actually be propagated.
In addition, it attempts to inject into every individual mapping a
`mapping_set_source` slot pointing to the ID of the original set that
contained the mapping, but this is invalid as there is _no_
`mapping_set_source` slot on individual mapping records -- the slot
intended to capture the set from which a record came is
`mapping_source`.
Lastly, the function also attempts to drop duplicates after the sets
have been merged, but the detection of duplicates is prevented by (1)
the incorrect propagation of non-propagatable slots (which can cause two
otherwise identical records in two different sets to appear different,
if the metadata of the sets contain different wrongly propagated slots),
and (2) the injection of the `mapping_set_source` slot.
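The effect described above can be reproduced with a small, self-contained sketch. It does not use the sssom-py API: `naive_merge` is a hypothetical stand-in for the old behaviour, mapping sets are modelled as plain dictionaries, and `mapping_set_title` is used purely as an illustrative set-level slot that should not be copied onto records. All identifiers and values are made up for the example.

```python
# Illustrative only: plain dictionaries stand in for mapping sets.
# None of the names below come from the sssom-py code base.
record = {
    "subject_id": "EX:1",
    "predicate_id": "skos:exactMatch",
    "object_id": "EX:2",
    "mapping_justification": "semapv:ManualMappingCuration",
}

# Two sets that happen to contain the very same mapping record.
set_a = {
    "mapping_set_id": "https://example.org/sets/a",
    "mapping_set_title": "Set A",
    "mappings": [dict(record)],
}
set_b = {
    "mapping_set_id": "https://example.org/sets/b",
    "mapping_set_title": "Set B",
    "mappings": [dict(record)],
}


def naive_merge(sets):
    """Copy set-level metadata into every record, then deduplicate."""
    merged = []
    for s in sets:
        for rec in s["mappings"]:
            row = dict(rec)
            # (1) A set-level slot wrongly propagated down to the record.
            row["mapping_set_title"] = s["mapping_set_title"]
            # (2) A slot that does not exist on mapping records.
            row["mapping_set_source"] = s["mapping_set_id"]
            merged.append(row)
    # Deduplicate on the full content of each row.
    unique = {tuple(sorted(row.items())): row for row in merged}
    return list(unique.values())


merged = naive_merge([set_a, set_b])
print(len(merged))  # 2: the "duplicate" survives because the rows now differ
```

Either of the two injected values is enough on its own to make the rows compare as different, which is why both problems had to be addressed for deduplication to work.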
This commit fixes all those issues by deleting the bogus
`inject_metadata_into_df` function and replacing it with a call to
`msdf.propagate()`, which implements propagation correctly. It then
manually injects the correct `mapping_source` slot if possible and, if so,
ignores the injected slot when attempting to drop duplicates.
1 parent 76165d5 commit 07f05cd
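A rough sketch of the strategy the commit message describes, again using plain dictionaries rather than the real sssom-py objects. The `merge_sets` function is hypothetical, the propagation step handled by `msdf.propagate()` is deliberately left out, and overwriting any existing `mapping_source` value is an assumption of this sketch rather than something the commit message states.

```python
# Hypothetical sketch; not the actual msdf_merge implementation.
def merge_sets(sets):
    """Merge mapping sets, tagging each record with its set of origin."""
    merged = []
    for s in sets:
        set_id = s.get("mapping_set_id")
        for rec in s["mappings"]:
            row = dict(rec)
            # Inject the source only when the set actually has an ID.
            if set_id is not None:
                row["mapping_source"] = set_id
            merged.append(row)

    # Drop duplicates, ignoring the injected slot so that identical
    # records coming from different sets are recognised as duplicates.
    seen = set()
    unique = []
    for row in merged:
        key = tuple(sorted(
            (k, v) for k, v in row.items() if k != "mapping_source"
        ))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


shared = {"subject_id": "EX:1", "predicate_id": "skos:exactMatch", "object_id": "EX:2"}
other = {"subject_id": "EX:3", "predicate_id": "skos:exactMatch", "object_id": "EX:4"}
set_a = {"mapping_set_id": "https://example.org/sets/a", "mappings": [dict(shared)]}
set_b = {"mapping_set_id": "https://example.org/sets/b", "mappings": [dict(shared), dict(other)]}

result = merge_sets([set_a, set_b])
print(len(result))  # 2: the shared record is kept once, plus the distinct one
```

In this sketch the first occurrence of a duplicated record wins, so the surviving copy carries the `mapping_source` of the first set it was seen in.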
4 files changed (+26, -46 lines) in src/sssom and tests