multi30k dataset only contains test set for French and no Czech data

The multi30k dataset provided by `torchtext.datasets` contains the training, validation and test sets for English and German. For French, however, it only contains a test set, even though French training and validation sets are available upstream.

It also contains none of the Czech data and none of the 2017 test data (which is not available in Czech).

The official multi30k repository (https://github.com/multi30k/dataset) contains all of the data at https://github.com/multi30k/dataset/tree/master/data/task1/raw, although getting the data from GitHub is slightly trickier than downloading it directly from the URLs torchtext already uses for multi30k.
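One possible way around the GitHub step is to fetch the files through GitHub's raw-content host, which serves repository files as direct downloads. The sketch below is a minimal illustration, not torchtext code: `multi30k_raw_url` and `fetch_sentences` are hypothetical helpers, and the `<split>.<lang>.gz` file naming and the split names in `SPLITS` are assumptions that should be checked against the repository listing before use.

```python
import gzip
import urllib.request

# Raw task-1 directory of the official repository, served via GitHub's
# raw-content host so each file can be downloaded directly.
BASE_URL = "https://raw.githubusercontent.com/multi30k/dataset/master/data/task1/raw"

# ASSUMPTION: split names and "<split>.<lang>.gz" naming; verify against
# https://github.com/multi30k/dataset/tree/master/data/task1/raw
SPLITS = ("train", "val", "test_2016_flickr")
LANGS = ("en", "de", "fr", "cs")


def multi30k_raw_url(split: str, lang: str) -> str:
    """Build the direct-download URL for one split/language file (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split: {split!r}")
    if lang not in LANGS:
        raise ValueError(f"unknown language: {lang!r}")
    return f"{BASE_URL}/{split}.{lang}.gz"


def fetch_sentences(split: str, lang: str) -> list[str]:
    """Download one gzipped file and return its lines, one sentence per line (hypothetical helper)."""
    with urllib.request.urlopen(multi30k_raw_url(split, lang)) as resp:
        data = gzip.decompress(resp.read())
    return data.decode("utf-8").splitlines()
```

For example, `fetch_sentences("train", "cs")` would request `.../raw/train.cs.gz`, if that file exists under the assumed name. The 2017 test sets are left out of `SPLITS` here since, as noted above, they have no Czech side.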

I'm guessing that adding this can wait until #751 is finished?

multi30k dataset only contains test set for French and no Czech data #762