Commit e6a5a5b

docs: updated documentation
1 parent 00d23a0 commit e6a5a5b

4 files changed
+313 -35 lines changed

src/Generator.ts

+117-33
@@ -5,42 +5,67 @@ import * as errors from './errors';
 import * as utils from './utils';
 
 /**
- * The TAR headers follow this structure:
- * Start Size Description
- * ------------------------------
- * 0 100 File name (first 100 bytes)
- * 100 8 File mode (null-padded octal)
- * 108 8 Owner user id (null-padded octal)
- * 116 8 Owner group id (null-padded octal)
- * 124 12 File size in bytes (null-padded octal, 0 for directories)
- * 136 12 Mtime (null-padded octal)
- * 148 8 Checksum (fill with ASCII spaces for computation)
- * 156 1 Type flag ('0' for file, '5' for directory)
- * 157 100 Link name (null-terminated ASCII/UTF-8)
- * 257 6 'ustar\0' (magic string)
- * 263 2 '00' (ustar version)
- * 265 32 Owner user name (null-terminated ASCII/UTF-8)
- * 297 32 Owner group name (null-terminated ASCII/UTF-8)
- * 329 8 Device major (unset in this implementation)
- * 337 8 Device minor (unset in this implementation)
- * 345 155 File name (last 155 bytes, total 255 bytes, null-padded)
- * 500 12 '\0' (unused)
+ * The Generator can be used to generate blocks for a tar archive. The generator
+ * can create three kinds of headers: FILE, DIRECTORY, and EXTENDED. File and
+ * directory headers describe ordinary entries, while the extended header stores
+ * additional metadata that does not fit in the standard header.
+ *
+ * This class can also generate data chunks padded to 512 bytes. Each input
+ * chunk must not exceed 512 bytes.
+ *
+ * Note that the generator maintains internal state, so headers, data chunks,
+ * and end chunks must be generated in a valid order. Calling a method in the
+ * wrong state will throw an error.
+ *
+ * For reference, this is the structure of a tar header.
+ *
+ * | Start  | Size | Description                                               |
+ * |--------|------|-----------------------------------------------------------|
+ * | 0      | 100  | File name (first 100 bytes)                               |
+ * | 100    | 8    | File mode (null-padded octal)                             |
+ * | 108    | 8    | Owner user ID (null-padded octal)                         |
+ * | 116    | 8    | Owner group ID (null-padded octal)                        |
+ * | 124    | 12   | File size in bytes (null-padded octal, 0 for directories) |
+ * | 136    | 12   | Mtime (null-padded octal)                                 |
+ * | 148    | 8    | Checksum (fill with ASCII spaces for computation)         |
+ * | 156    | 1    | Type flag ('0' for file, '5' for directory)               |
+ * | 157    | 100  | Link name (null-terminated ASCII/UTF-8)                   |
+ * | 257    | 6    | 'ustar\0' (magic string)                                  |
+ * | 263    | 2    | '00' (ustar version)                                      |
+ * | 265    | 32   | Owner user name (null-terminated ASCII/UTF-8)             |
+ * | 297    | 32   | Owner group name (null-terminated ASCII/UTF-8)            |
+ * | 329    | 8    | Device major (unset in this implementation)               |
+ * | 337    | 8    | Device minor (unset in this implementation)               |
+ * | 345    | 155  | File name (last 155 bytes, total 255 bytes, null-padded)  |
+ * | 500    | 12   | '\0' (unused)                                             |
  *
- * Note that all numbers are in stringified octal format.
+ * Note that all numbers are in stringified octal format, as opposed to the
+ * numbers used in the extended header, which are all in stringified decimal.
  *
  * The following data will be left blank (null):
  * - Link name
- * - Owner user name
- * - Owner group name
  * - Device major
  * - Device minor
  *
- * This is because this implementation does not interact with linked files.
- * Owner user name and group name cannot be extracted via regular stat-ing,
- * so it is left blank. In virtual situations, this field won't be useful
- * anyways. The device major and minor are specific to linux kernel, which
- * is not relevant to this virtual tar implementation. This is the reason
- * these fields have been left blank.
+ * This is because this implementation does not interact with linked files.
+ * The device major and minor numbers are specific to the Linux kernel, which
+ * is not relevant to this virtual tar implementation. This is the reason these
+ * fields have been left blank.
+ *
+ * The data for extended headers is formatted slightly differently, with the
+ * general format following this structure:
+ * <size> <key>=<value>\n
+ *
+ * Here, <size> is the byte length of the entire line (including the size
+ * number itself, the space, the equals sign, and the \n). Unlike the
+ * null-terminated strings in the USTAR header, each key-value pair ends with a
+ * \n (newline) character. Numbers are also written in stringified decimal
+ * format rather than the octal used by the USTAR header.
+ *
+ * The key can be any supported metadata key, and the value is the binary data
+ * for that key. These are the currently supported keys for the extended
+ * metadata:
+ * - path (the file path, used when it is longer than 255 characters)
  */
 class Generator {
   protected state: GeneratorState = GeneratorState.HEADER;
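The header table states that numeric fields are null-padded octal and that the checksum is computed with its own field filled with ASCII spaces. The sketch below illustrates both rules in isolation. `writeOctal` and `writeChecksum` are hypothetical helpers, not this package's `utils` API, and storing the checksum as seven octal digits plus a NUL is an assumption (some tar writers use six digits, a NUL, and a space).

```typescript
// Hypothetical helpers (not this package's utils API) illustrating the
// numeric-field and checksum rules from the header table.
const BLOCK_SIZE = 512;
const CHECKSUM_OFFSET = 148;
const CHECKSUM_LENGTH = 8;

// Writes `value` as zero-padded octal digits followed by a NUL byte, filling
// exactly `length` bytes starting at `offset`.
function writeOctal(
  header: Uint8Array,
  value: number,
  offset: number,
  length: number,
): void {
  if (!Number.isInteger(value) || value < 0) {
    throw new RangeError(`${value} is not a non-negative integer`);
  }
  const digits = value.toString(8).padStart(length - 1, '0');
  if (digits.length > length - 1) {
    throw new RangeError(`${value} does not fit in ${length} bytes`);
  }
  for (let i = 0; i < digits.length; i++) {
    header[offset + i] = digits.charCodeAt(i);
  }
  header[offset + length - 1] = 0; // NUL terminator
}

// Sums all 512 header bytes, counting the checksum field as eight ASCII
// spaces (0x20), then stores the sum in the checksum field as octal.
function writeChecksum(header: Uint8Array): number {
  let sum = 0;
  for (let i = 0; i < BLOCK_SIZE; i++) {
    const inChecksumField =
      i >= CHECKSUM_OFFSET && i < CHECKSUM_OFFSET + CHECKSUM_LENGTH;
    sum += inChecksumField ? 0x20 : header[i];
  }
  writeOctal(header, sum, CHECKSUM_OFFSET, CHECKSUM_LENGTH);
  return sum;
}

// Usage: mode 0644 at offset 100 and size 1023 at offset 124
const exampleHeader = new Uint8Array(BLOCK_SIZE);
writeOctal(exampleHeader, 0o644, 100, 8);
writeOctal(exampleHeader, 1023, 124, 12);
writeChecksum(exampleHeader); // 1156, stored as "0002204\0"
```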
@@ -85,6 +110,7 @@ class Generator {
       filePath = filePath.endsWith('/') ? filePath : filePath + '/';
     }
 
+    // Write the relevant sections in the header with the provided data
     utils.writeUstarMagic(header);
     utils.writeFileType(header, type);
     utils.writeFilePath(header, filePath);
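`utils.writeFilePath` itself is not part of this diff. As a rough sketch of the layout the header table describes (the first 100 bytes of the path at offset 0, the remainder, up to 155 bytes, at offset 345), the hypothetical helper below writes a UTF-8 path into a zeroed header. Measuring the limit in UTF-8 bytes rather than characters is an assumption of this sketch. Note that POSIX ustar itself splits paths differently, storing the leading directories in the prefix field at offset 345.

```typescript
// Hypothetical sketch of the path layout described in the header table; the
// package's own utils.writeFilePath may differ.
const NAME_OFFSET = 0;
const NAME_LENGTH = 100;
const PREFIX_OFFSET = 345;
const PREFIX_LENGTH = 155;

function writeSplitPath(header: Uint8Array, filePath: string): void {
  const bytes = new TextEncoder().encode(filePath);
  if (bytes.byteLength > NAME_LENGTH + PREFIX_LENGTH) {
    // Such paths would have to go into an extended header instead
    throw new RangeError(
      `Path is ${bytes.byteLength} bytes, limit is ${NAME_LENGTH + PREFIX_LENGTH}`,
    );
  }
  // First 100 bytes go in the name field, the remainder in the second field.
  // Bytes that are not written keep their initial value of 0 (null padding).
  header.set(bytes.subarray(0, NAME_LENGTH), NAME_OFFSET);
  header.set(bytes.subarray(NAME_LENGTH), PREFIX_OFFSET);
}
```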
@@ -103,10 +129,27 @@ class Generator {
     return header;
   }
 
+  /**
+   * Generates a file header based on the file path and the stat. Note that the
+   * stat must provide a non-negative file size; all other fields are optional.
+   * If the file path exceeds 255 characters, an error is thrown. To store a
+   * longer path, first write an extended header and its path metadata, then
+   * pass an empty string as the file path here.
+   *
+   * The file content must follow this header as separate data chunks.
+   *
+   * @param filePath the path of the file relative to the tar root
+   * @param stat the stats of the file
+   * @returns one 512-byte chunk corresponding to the header
+   *
+   * @see {@link generateExtended} for generating headers with extended metadata
+   * @see {@link generateDirectory} for generating directory headers instead
+   * @see {@link generateData} for generating data chunks
+   */
   generateFile(filePath: string, stat: FileStat): Uint8Array {
     if (this.state === GeneratorState.HEADER) {
       // Make sure the size is valid
-      if (stat.size == null) {
+      if (stat.size == null || stat.size < 0) {
         throw new errors.ErrorVirtualTarGeneratorInvalidStat(
           'Files must have valid file sizes',
         );
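Because `generateFile` emits one header block and the content then follows as separate zero-padded data chunks, the space a file entry occupies is easy to predict. The hypothetical helper below computes it and mirrors the new `stat.size < 0` check; rejecting non-integer sizes is an extra assumption of this sketch, not something the diff shows.

```typescript
const BLOCK_SIZE = 512;

// Hypothetical helper: the number of 512-byte blocks one file entry occupies,
// i.e. its header block plus the padded data blocks that follow it.
function fileEntryBlocks(size: number): number {
  // Mirrors generateFile's size validation; the integer check is an extra
  // assumption made by this sketch.
  if (!Number.isInteger(size) || size < 0) {
    throw new RangeError('Files must have valid file sizes');
  }
  return 1 + Math.ceil(size / BLOCK_SIZE);
}

fileEntryBlocks(1023); // 3: one header block, then chunks of 512 and 511 bytes
```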
@@ -130,6 +173,19 @@ class Generator {
     );
   }
 
+  /**
+   * Generates a directory header based on the file path and the stat. Note that
+   * the size is ignored and set to 0 for directories. If the file path exceeds
+   * 255 characters, an error is thrown. To store a longer path, first write
+   * an extended header and its path metadata, then pass an empty path.
+   *
+   * @param filePath the path of the directory relative to the tar root
+   * @param stat the stats of the directory (optional)
+   * @returns one 512-byte chunk corresponding to the header
+   *
+   * @see {@link generateExtended} for generating headers with extended metadata
+   * @see {@link generateFile} for generating file headers instead
+   */
   generateDirectory(filePath: string, stat?: FileStat): Uint8Array {
     if (this.state === GeneratorState.HEADER) {
       // The size is zero for directories. Override this value in the stat if
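The context lines of the header-writing hunk show a trailing `/` being appended to the path (apparently the directory case), and the documentation above says the size is forced to 0. A minimal sketch of that normalization, using a hypothetical `SketchStat` type in place of the package's `FileStat`:

```typescript
// Hypothetical stand-in for the package's FileStat type.
interface SketchStat {
  size?: number;
  mode?: number;
  mtime?: Date;
}

// Appends a trailing '/' to directory paths and overrides the size with 0,
// keeping any other stat fields the caller provided.
function normalizeDirectory(
  filePath: string,
  stat: SketchStat = {},
): { filePath: string; stat: SketchStat } {
  return {
    filePath: filePath.endsWith('/') ? filePath : filePath + '/',
    stat: { ...stat, size: 0 },
  };
}

normalizeDirectory('docs', { size: 4096, mode: 0o755 });
// { filePath: 'docs/', stat: { size: 0, mode: 493 } }
```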
@@ -147,6 +203,14 @@ class Generator {
     );
   }
 
+  /**
+   * Generates an extended metadata header based on the total size of the data
+   * following the header. If there is no need for extended metadata, avoid
+   * using this, as it only wastes space.
+   *
+   * @param size the total size, in bytes, of the metadata following the header
+   * @returns one 512-byte chunk corresponding to the header
+   */
   generateExtended(size: number): Uint8Array {
     if (this.state === GeneratorState.HEADER) {
       this.state = GeneratorState.DATA;
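`generateExtended` takes the total size of the metadata that follows. Under the record format described at the top of this file, each record's `<size>` counts its own digits, so computing it needs a small fixed-point step. A sketch, assuming keys and values are UTF-8 strings; `encodeRecord` is a hypothetical helper, not part of the package.

```typescript
const encoder = new TextEncoder();

// Encodes one extended-header record "<size> <key>=<value>\n". The size is
// decimal and counts every byte of the record, including its own digits.
function encodeRecord(key: string, value: string): Uint8Array {
  const body = encoder.encode(` ${key}=${value}\n`);
  // Prepending the length can add a digit (e.g. 98 + 2 = 100), so repeat
  // until the length is consistent with itself.
  let size = body.byteLength;
  while (String(size).length + body.byteLength !== size) {
    size = String(size).length + body.byteLength;
  }
  const record = new Uint8Array(size);
  record.set(encoder.encode(String(size)), 0);
  record.set(body, size - body.byteLength);
  return record;
}

// The record's byte length is what would be passed to generateExtended, and
// the record itself would then go through generateData in chunks of at most
// 512 bytes.
const longPathRecord = encodeRecord('path', 'a'.repeat(300));
longPathRecord.byteLength; // 310
```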
@@ -160,6 +224,22 @@ class Generator {
     );
   }
 
+  /**
+   * Generates a data block. The input must be at most 512 bytes, and only the
+   * final chunk of the data may be shorter than 512 bytes. For example, if the
+   * file size is 1023 bytes, you need to provide a 512-byte chunk first, then
+   * the remaining 511-byte chunk. You cannot split it up differently, such as
+   * sending the first 100 bytes and then the next 512.
+   *
+   * This method is used to generate blocks for both file data and extended
+   * header metadata.
+   *
+   * @param data a block of binary data (at most 512 bytes)
+   * @returns one 512-byte padded chunk corresponding to the data block
+   *
+   * @see {@link generateExtended} for generating headers with extended metadata
+   * @see {@link generateFile} for generating file headers preceding data blocks
+   */
   generateData(data: Uint8Array): Uint8Array {
     if (this.state === GeneratorState.DATA) {
       if (data.byteLength > constants.BLOCK_SIZE) {
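The chunking rule above means content should be split into full 512-byte chunks with at most one shorter chunk at the end, and each returned block is zero-padded to 512 bytes. The hypothetical helpers below sketch both sides of that contract; they illustrate the documented behaviour and are not the package's implementation.

```typescript
const BLOCK_SIZE = 512;

// Splits content into the chunk sequence generateData accepts: full 512-byte
// chunks followed by at most one shorter final chunk.
function chunkContent(content: Uint8Array): Uint8Array[] {
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < content.byteLength; offset += BLOCK_SIZE) {
    chunks.push(content.subarray(offset, offset + BLOCK_SIZE));
  }
  return chunks;
}

// Zero-pads a chunk of at most 512 bytes into one full block.
function padBlock(chunk: Uint8Array): Uint8Array {
  if (chunk.byteLength > BLOCK_SIZE) {
    throw new RangeError('Chunk exceeds 512 bytes');
  }
  const block = new Uint8Array(BLOCK_SIZE);
  block.set(chunk);
  return block;
}

const chunks = chunkContent(new Uint8Array(1023).fill(1));
chunks.map((c) => c.byteLength); // [512, 511]
padBlock(chunks[1])[511]; // 0 (padding)
```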
@@ -198,9 +278,13 @@ class Generator {
     );
   }
 
-  // Creates a single null block. A null block is a block filled with all zeros.
-  // This is needed to end the archive, as two of these blocks mark the end of
-  // archive.
+  /**
+   * Generates a null chunk. Two invocations are needed to create a valid
+   * archive end marker. After two invocations, the generator state will be
+   * set to ENDED and no further data can be fed through the generator.
+   *
+   * @returns one 512-byte null chunk
+   */
   generateEnd(): Uint8Array {
     switch (this.state) {
       case GeneratorState.HEADER:
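Two calls to `generateEnd` together produce the end-of-archive marker: two consecutive 512-byte blocks of zeros. The sketch below builds that marker and checks for it at the end of an archive. Both helpers are hypothetical, and the tail check is deliberately naive: GNU tar, for example, pads archives with extra zeros up to a multiple of its record size, so a robust reader should stop at the first pair of null blocks while parsing rather than only inspecting the last 1024 bytes.

```typescript
const BLOCK_SIZE = 512;

// Two consecutive null (all-zero) blocks mark the end of an archive.
function endMarker(): Uint8Array {
  return new Uint8Array(2 * BLOCK_SIZE);
}

function isNullBlock(block: Uint8Array): boolean {
  return block.byteLength === BLOCK_SIZE && block.every((b) => b === 0);
}

// Naive check that the last two blocks of an archive are null blocks.
function endsWithEndMarker(archive: Uint8Array): boolean {
  if (archive.byteLength < 2 * BLOCK_SIZE) return false;
  const start = archive.byteLength - 2 * BLOCK_SIZE;
  return (
    isNullBlock(archive.subarray(start, start + BLOCK_SIZE)) &&
    isNullBlock(archive.subarray(start + BLOCK_SIZE))
  );
}
```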

src/Parser.ts

+92
@@ -4,6 +4,64 @@ import * as constants from './constants';
 import * as errors from './errors';
 import * as utils from './utils';
 
+/**
+ * The Parser parses blocks from a tar archive. Writing a chunk returns either
+ * a token or undefined. Undefined is only returned when parsing the first null
+ * chunk of the end marker, which signals that the archive is ending. A token
+ * can be a header token for a file, a directory, or an extended header; a
+ * data token carrying the file data; or an end token signifying the end of the
+ * archive.
+ *
+ * For reference, this is the structure of a tar header.
+ *
+ * | Start  | Size | Description                                               |
+ * |--------|------|-----------------------------------------------------------|
+ * | 0      | 100  | File name (first 100 bytes)                               |
+ * | 100    | 8    | File mode (null-padded octal)                             |
+ * | 108    | 8    | Owner user ID (null-padded octal)                         |
+ * | 116    | 8    | Owner group ID (null-padded octal)                        |
+ * | 124    | 12   | File size in bytes (null-padded octal, 0 for directories) |
+ * | 136    | 12   | Mtime (null-padded octal)                                 |
+ * | 148    | 8    | Checksum (fill with ASCII spaces for computation)         |
+ * | 156    | 1    | Type flag ('0' for file, '5' for directory)               |
+ * | 157    | 100  | Link name (null-terminated ASCII/UTF-8)                   |
+ * | 257    | 6    | 'ustar\0' (magic string)                                  |
+ * | 263    | 2    | '00' (ustar version)                                      |
+ * | 265    | 32   | Owner user name (null-terminated ASCII/UTF-8)             |
+ * | 297    | 32   | Owner group name (null-terminated ASCII/UTF-8)            |
+ * | 329    | 8    | Device major (unset in this implementation)               |
+ * | 337    | 8    | Device minor (unset in this implementation)               |
+ * | 345    | 155  | File name (last 155 bytes, total 255 bytes, null-padded)  |
+ * | 500    | 12   | '\0' (unused)                                             |
+ *
+ * Note that all numbers are in stringified octal format, as opposed to the
+ * numbers used in the extended header, which are all in stringified decimal.
+ *
+ * The following data will be left blank (null):
+ * - Link name
+ * - Device major
+ * - Device minor
+ *
+ * This is because this implementation does not interact with linked files.
+ * The device major and minor numbers are specific to the Linux kernel, which
+ * is not relevant to this virtual tar implementation. This is the reason these
+ * fields have been left blank.
+ *
+ * The data for extended headers is formatted slightly differently, with the
+ * general format following this structure:
+ * <size> <key>=<value>\n
+ *
+ * Here, <size> is the byte length of the entire line (including the size
+ * number itself, the space, the equals sign, and the \n). Unlike the
+ * null-terminated strings in the USTAR header, each key-value pair ends with a
+ * \n (newline) character. Numbers are also written in stringified decimal
+ * format rather than the octal used by the USTAR header.
+ *
+ * The key can be any supported metadata key, and the value is the binary data
+ * for that key. These are the currently supported keys for the extended
+ * metadata:
+ * - path (the file path, used when it is longer than 255 characters)
+ */
 class Parser {
   protected state: ParserState = ParserState.HEADER;
   protected remainingBytes = 0;
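Since the parser leaves extended headers unparsed, callers have to decode the `<size> <key>=<value>\n` records themselves. A minimal sketch, assuming the record bytes have already been collected from the data tokens following an EXTENDED header and that keys and values are UTF-8 text; `parseRecords` is hypothetical and not part of the package. It stops at the first NUL byte so that zero padding is tolerated.

```typescript
const decoder = new TextDecoder();

// Parses extended-header records of the form "<size> <key>=<value>\n". Each
// <size> is decimal and covers the whole record, so records are sliced out by
// length instead of by searching for newlines (values may contain "\n").
function parseRecords(data: Uint8Array): Map<string, string> {
  const records = new Map<string, string>();
  let offset = 0;
  // Stop at the end of the data or at the first NUL byte (block padding)
  while (offset < data.byteLength && data[offset] !== 0) {
    const space = data.indexOf(0x20, offset);
    if (space === -1) throw new Error('Malformed record: missing size');
    const sizeText = decoder.decode(data.subarray(offset, space));
    if (!/^\d+$/.test(sizeText)) throw new Error('Malformed record: bad size');
    const end = offset + Number(sizeText);
    if (end <= space + 1 || end > data.byteLength || data[end - 1] !== 0x0a) {
      throw new Error('Malformed record: size does not match a record');
    }
    const equals = data.indexOf(0x3d, space + 1);
    if (equals === -1 || equals >= end) {
      throw new Error('Malformed record: missing "="');
    }
    records.set(
      decoder.decode(data.subarray(space + 1, equals)),
      decoder.decode(data.subarray(equals + 1, end - 1)),
    );
    offset = end;
  }
  return records;
}

parseRecords(new TextEncoder().encode('9 path=a\n')).get('path'); // 'a'
```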
@@ -67,6 +125,40 @@ class Parser {
     }
   }
 
+  /**
+   * Every chunk in a tar archive is exactly 512 bytes long and must be
+   * written to the parser, which returns a single token. The token can be a
+   * header token, a data token, an end token, or undefined. Undefined is only
+   * returned when the chunk does not correspond to an actual token. For
+   * example, the first null chunk of the archive end marker returns undefined,
+   * and the second null chunk returns an end token.
+   *
+   * The header token can represent three types of headers: FILE, DIRECTORY,
+   * and EXTENDED. Note that the file stat is returned with each header. It
+   * might contain default values for fields that were not set in the header.
+   * The default value for strings is '', for numbers is 0, and for dates is
+   * Date(0), which is 00:00 UTC on 1 January 1970 (the Unix epoch).
+   *
+   * Note that extended headers are not parsed automatically. If some metadata
+   * was stored in the extended header instead, it must be parsed separately to
+   * retrieve it, and the corresponding field in the header will contain the
+   * default value for its type.
+   *
+   * A data token is simple, containing the bytes of the file. Note that this
+   * data is not padded to the 512-byte boundary. For example, if a file has
+   * 513 bytes of data, the first chunk returns 512 bytes of data and the next
+   * data chunk returns 1 byte, with the padding removed. The data token also
+   * has another field, `end`, a boolean which is true when the last chunk of
+   * data is being returned. The expected token after an ended data token is a
+   * header or an end token.
+   *
+   * The end token signifies that the archive has ended. It sets the internal
+   * state to ENDED, after which no further data can be written, and any attempt
+   * to write additional data will throw an error.
+   *
+   * @param data a single 512-byte chunk from the tar file
+   * @returns a parsed token, or undefined if the chunk yields no token
+   */
   write(data: Uint8Array): TokenHeader | TokenData | TokenEnd | undefined {
     if (data.byteLength !== constants.BLOCK_SIZE) {
       throw new errors.ErrorVirtualTarParserBlockSize(
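The data-token behaviour described above (unpadded payloads, with `end` set on the last one) can be summarized by predicting the tokens for a given file size. The sketch below does that with a hypothetical `DataChunkInfo` shape rather than the package's `TokenData` type. For an empty file it predicts no data tokens, which is an assumption, since the documentation does not cover that case.

```typescript
const BLOCK_SIZE = 512;

// Hypothetical summary of a data token: its unpadded length and end flag.
interface DataChunkInfo {
  byteLength: number;
  end: boolean;
}

// Predicts the data tokens for a file of `size` bytes: one per 512-byte
// block, with the final one trimmed to the bytes that remain.
function expectedDataTokens(size: number): DataChunkInfo[] {
  const tokens: DataChunkInfo[] = [];
  for (let remaining = size; remaining > 0; remaining -= BLOCK_SIZE) {
    tokens.push({
      byteLength: Math.min(remaining, BLOCK_SIZE),
      end: remaining <= BLOCK_SIZE,
    });
  }
  return tokens;
}

expectedDataTokens(513);
// [{ byteLength: 512, end: false }, { byteLength: 1, end: true }]
```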
