Commit c05372a

add report option (#98)
1 parent c3f3f37 commit c05372a

3 files changed: +33 -9 lines changed


README.md

+6 -3
@@ -8,11 +8,10 @@ The database uses the [Data Format for Digital Linguistics][DaFoDiL] (DaFoDiL) a
 
 <!-- TOC -->
 - [Sources](#sources)
-- [Project Requirements](#project-requirements)
 - [Process](#process)
-- [Style Guide](#style-guide)
 - [The Database](#the-database)
-- [Building the Database](#building-the-database)
+- [Building & Updating the Database](#building--updating-the-database)
+- [Steps to incrementally update the production database](#steps-to-incrementally-update-the-production-database)
 - [Tests](#tests)
 <!-- /TOC -->
 

@@ -76,6 +75,10 @@ To build and/or update the database, follow the steps below. Each of these steps
 
 Entries from individual sources are **not** imported as main entries in the ALTLab database. Instead they are stored as subentries (using the `dataSources` field). The import script merely matches entries from individual sources to a main entry, or creates a main entry if none exists. An aggregation script then does the work of combining information from each of the subentries into a main entry (see the next step).
 
+Each import step prints a table to the console, showing how many entries from the original data source were unmatched.
+
+When importing the Maskwacîs database, you can add an `-r` or `--report` flag to output a list of unmatched entries to a file. The flag takes the file path as its argument.
+
 6. Aggregate the data from the individual data sources: `node bin/aggregate.js <inputPath> <outputPath>` (the output path can be the same as the input path; this will overwrite the original).
 
 7. For convenience, you can perform all the above steps with a single command in the terminal: `npm run build` | `yarn build`. In order for this command to work, you will need each of the following files to be present in the `/data` directory, with these exact filenames:

bin/import-MD.js

+1
@@ -6,6 +6,7 @@ import program from 'commander';
 program
 .arguments(`<mdPath> <databasePath> [fstPath]`)
 .usage(`convert-md <mdPath> <databasePath> [fstPath]`)
+.option(`-r, --report <reportPath>`, `generate report of unmatched entries`)
 .action(importMD);
 
 program.parse(process.argv);

lib/import/MD.js

+26 -6
@@ -1,8 +1,9 @@
-import createSpinner from 'ora';
-import DatabaseIndex from '../utilities/DatabaseIndex.js';
-import readNDJSON from '../utilities/readNDJSON.js';
-import { Transducer } from 'hfstol';
-import writeNDJSON from '../utilities/writeNDJSON.js';
+import createSpinner from 'ora';
+import { createWriteStream } from 'fs';
+import DatabaseIndex from '../utilities/DatabaseIndex.js';
+import readNDJSON from '../utilities/readNDJSON.js';
+import { Transducer } from 'hfstol';
+import writeNDJSON from '../utilities/writeNDJSON.js';
 
 function getPos(str) {
 if (!str) return ``;
@@ -32,8 +33,11 @@ function updateEntry(dbEntry, mdEntry) {
 * Imports the MD entries into the ALTLab database.
 * @param {String} mdPath
 * @param {String} dbPath
+* @param {String} [fstPath]
+* @param {Object} [options={}]
+* @param {String} [options.report] The path where you would like the report generated.
 */
-export default async function importMD(mdPath, dbPath, fstPath) {
+export default async function importMD(mdPath, dbPath, fstPath, { report } = {}) {
 
 const readDatabaseSpinner = createSpinner(`Reading databases.`).start();
 
@@ -141,4 +145,20 @@ export default async function importMD(mdPath, dbPath, fstPath) {
 'Entries without a match:': unmatched.length,
 });
 
+if (report) {
+
+const reportSpinner = createSpinner(`Generating report of unmatched entries.`).start();
+const writeStream = createWriteStream(report);
+
+writeStream.write(`head\tPOS\toriginal\n`);
+
+for (const { head, original, pos } of unmatched) {
+writeStream.write(`${ head.md }\t${ pos }\t${ original }\n`);
+}
+
+writeStream.end();
+reportSpinner.succeed();
+
+}
+
 }
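
The report block in this hunk writes a header row followed by one tab-separated row per unmatched entry, and only runs when a report path is supplied through the optional fourth argument. The following is a minimal sketch of that behaviour as pure functions, not the code in this commit. The names `formatUnmatchedReport` and `buildReport` are hypothetical, the entry shape `{ head: { md }, pos, original }` is inferred from the destructuring in the diff, and the fields are assumed to be plain strings. Replacing tabs and newlines inside a field is also an assumption added here for illustration.

```javascript
// Hypothetical helper: builds the text of the unmatched-entries report.
// Assumes each entry looks like { head: { md }, pos, original } with string
// fields, mirroring the destructuring used in the diff.
function formatUnmatchedReport(unmatched) {

  // Tabs or newlines inside a field would break the column layout, so each
  // run of them is replaced with a single space (an assumption, not part of
  // the commit).
  const clean = value => String(value ?? ``).replace(/[\t\r\n]+/gu, ` `);

  const header = [`head`, `POS`, `original`].join(`\t`);
  const rows   = unmatched.map(
    ({ head, pos, original }) => [head.md, pos, original].map(clean).join(`\t`),
  );

  // One line per row, each terminated by a newline.
  return [header, ...rows].map(line => `${ line }\n`).join(``);

}

// Hypothetical stand-in for the options handling in importMD: the `= {}`
// default lets callers omit the options object without a destructuring error.
function buildReport(unmatched, { report } = {}) {
  if (!report) return null; // no report path given, so nothing to generate
  return { path: report, text: formatUnmatchedReport(unmatched) };
}

// Placeholder entries for illustration only (not real dictionary data).
const sample = [
  { head: { md: `head-1` }, pos: `N`, original: `original line 1` },
  { head: { md: `head-2` }, pos: `V`, original: `original line 2` },
];

console.log(buildReport(sample, { report: `unmatched.tsv` }).text);
```

Building the text separately from the file write keeps the formatting easy to test. The resulting string could then be passed to `writeStream.write` or `fs.promises.writeFile`.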
