html5lib
diff --git a/‎encoding/README.md
+39 b/‎encoding/README.md
+39
@@ -0,0 +1,39 @@
+Encoding Tests
+==============
+
+Each file containing encoding tests has any number of tests separated by
+two newlines (LF) and a single newline before the end of the file:
+
+    [TEST]LF
+    LF
+    [TEST]LF
+    LF
+    [TEST]LF
+
+...where [TEST] is the format documented below.
+
+Encoding test format
+====================
+
+Each test must begin with a string "\#data", followed by a newline (LF).
+All subsequent lines until a line that says "\#encoding" are the test data
+and must be passed to the system being tested unchanged, except with the
+final newline (on the last line) removed.
+
+Then there must be a line that says "\#encoding", followed by a newline
+(LF), followed by string indicating an encoding name, followed by a newline
+(LF). The encoding name indicated is the expected character encoding for
+the output with the given test data as input.
+
+For the tests in the `preparsed` subdirectory, the encoding name indicated
+is the expected result of running the *encoding sniffing algorithm* at
+https://html.spec.whatwg.org/#encoding-sniffing-algorithm with the given
+test data as input; this is, it's the expected result of running *only* the
+*encoding sniffing algorithm* — without also running the tokenization state
+machine and tree-construction stage defined in the spec.
+
+For all tests outside the subdirectory named `preparsed`, the encoding name
+indicated is instead the expected character encoding for the output after
+fully parsing the given test data; that is, it's the expected character
+encoding for the output after running the tokenization state machine and
+tree-construction stage.