| 1 | +Encoding Tests |
| 2 | +============== |
| 3 | + |
| 4 | +Each file containing encoding tests has any number of tests separated by |
| 5 | +two newlines (LF) and a single newline before the end of the file: |
| 6 | + |
| 7 | + [TEST]LF |
| 8 | + LF |
| 9 | + [TEST]LF |
| 10 | + LF |
| 11 | + [TEST]LF |
| 12 | + |
| 13 | +...where [TEST] is the format documented below. |
| 14 | + |
| 15 | +Encoding test format |
| 16 | +==================== |
| 17 | + |
| 18 | +Each test must begin with a string "\#data", followed by a newline (LF). |
| 19 | +All subsequent lines until a line that says "\#encoding" are the test data |
| 20 | +and must be passed to the system being tested unchanged, except with the |
| 21 | +final newline (on the last line) removed. |
| 22 | + |
| 23 | +Then there must be a line that says "\#encoding", followed by a newline |
| 24 | +(LF), followed by a string indicating an encoding name, followed by a newline |
| 25 | +(LF). The encoding name indicated is the expected character encoding for |
| 26 | +the output with the given test data as input. |
| 27 | + |
| 28 | +For the tests in the `preparsed` subdirectory, the encoding name indicated |
| 29 | +is the expected result of running the *encoding sniffing algorithm* at |
| 30 | +https://html.spec.whatwg.org/#encoding-sniffing-algorithm with the given |
| 31 | +test data as input; that is, it's the expected result of running *only* the |
| 32 | +*encoding sniffing algorithm* — without also running the tokenization state |
| 33 | +machine and tree-construction stage defined in the spec. |
| 34 | + |
| 35 | +For all tests outside the subdirectory named `preparsed`, the encoding name |
| 36 | +indicated is instead the expected character encoding for the output after |
| 37 | +fully parsing the given test data; that is, it's the expected character |
| 38 | +encoding for the output after running the tokenization state machine and |
| 39 | +tree-construction stage. |
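
The format described above could be read with a small line-based parser. The following is a minimal sketch, not part of the test suite: the names `EncodingTest` and `parse_encoding_tests` are hypothetical, and it assumes that the section markers appear in the files as the literal lines `#data` and `#encoding` (the backslash in the README being a Markdown escape). It works on bytes, since the test data must reach the system under test unchanged.

```python
from typing import List, NamedTuple


class EncodingTest(NamedTuple):
    """One parsed test (hypothetical helper type, not part of the suite)."""
    data: bytes      # bytes to pass to the system under test
    encoding: bytes  # expected encoding name, exactly as written in the file


def parse_encoding_tests(content: bytes) -> List[EncodingTest]:
    """Split the raw bytes of a test file into (data, encoding) pairs."""
    tests = []
    lines = content.split(b"\n")
    i = 0
    while i < len(lines):
        if lines[i] != b"#data":
            # Assumption: anything between tests is a blank separator line,
            # so it is skipped rather than rejected.
            i += 1
            continue
        i += 1
        data_lines = []
        while i < len(lines) and lines[i] != b"#encoding":
            data_lines.append(lines[i])
            i += 1
        if i + 1 >= len(lines):
            raise ValueError("test is missing its #encoding section")
        # Joining with LF keeps interior newlines (including blank lines
        # inside the data) and leaves off the final newline, as required.
        data = b"\n".join(data_lines)
        encoding = lines[i + 1]
        tests.append(EncodingTest(data, encoding))
        i += 2  # move past the "#encoding" line and the encoding name
    return tests


if __name__ == "__main__":
    sample = (
        b"#data\n<meta charset=utf-8>\n#encoding\nutf-8\n"
        b"\n"
        b"#data\nline1\n\nline2\n#encoding\nwindows-1252\n"
    )
    for test in parse_encoding_tests(sample):
        print(test.encoding.decode("ascii"), len(test.data))
```

Scanning line by line, rather than splitting the file on two consecutive newlines, keeps blank lines inside a test's data from being mistaken for a test separator. Whether the parsed `encoding` is compared against the sniffing result alone or against the result of a full parse depends on whether the file sits in the `preparsed` subdirectory, as described above.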