kiwi.data.datasets.wmt_qe_dataset
Module Contents
Classes
InputConfig | Base class for all pydantic configs. Used to configure base behaviour of configs.
OutputConfig | Base class for all pydantic configs. Used to configure base behaviour of configs.
TrainingConfig | Base class for all pydantic configs. Used to configure base behaviour of configs.
TestConfig | Base class for all pydantic configs. Used to configure base behaviour of configs.
WMTQEDataset | An abstract class representing a Dataset.
Functions
read_file(path, reader) |
- kiwi.data.datasets.wmt_qe_dataset.logger
- class kiwi.data.datasets.wmt_qe_dataset.InputConfig
  Bases: kiwi.utils.io.BaseConfig
  Base class for all pydantic configs. Used to configure base behaviour of configs.
  - source: FilePath
    Path to a corpus file in the source language.
  - target: FilePath
    Path to a corpus file in the target language.
  - alignments: Optional[FilePath]
    Path to alignments between source and target.
  - post_edit: Optional[FilePath]
    Path to file containing post-edited target.
  - source_pos: Optional[FilePath]
    Path to input file with POS tags for source.
  - target_pos: Optional[FilePath]
    Path to input file with POS tags for target.
- class kiwi.data.datasets.wmt_qe_dataset.OutputConfig
  Bases: kiwi.utils.io.BaseConfig
  Base class for all pydantic configs. Used to configure base behaviour of configs.
  Path to label file for target.
  Path to label file for source.
  - sentence_scores: Optional[FilePath]
    Path to file containing sentence level scores (HTER).
- class kiwi.data.datasets.wmt_qe_dataset.TrainingConfig
  Bases: kiwi.utils.io.BaseConfig
  Base class for all pydantic configs. Used to configure base behaviour of configs.
  - input: InputConfig
  - output: OutputConfig
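The nesting of `InputConfig` and `OutputConfig` inside `TrainingConfig` can be pictured with plain dataclasses. This is only a sketch of the layout documented above: `InputPaths`, `OutputPaths`, `TrainingPaths` and the file names are hypothetical stand-ins, not kiwi classes. Unlike pydantic's `FilePath`, these stand-ins do not check that the files exist.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class InputPaths:
    """Hypothetical stand-in mirroring the documented InputConfig fields."""
    source: Path
    target: Path
    alignments: Optional[Path] = None
    post_edit: Optional[Path] = None
    source_pos: Optional[Path] = None
    target_pos: Optional[Path] = None


@dataclass
class OutputPaths:
    """Hypothetical stand-in covering only the sentence_scores field."""
    sentence_scores: Optional[Path] = None


@dataclass
class TrainingPaths:
    """Hypothetical stand-in for TrainingConfig: one input and one output block."""
    input: InputPaths
    output: OutputPaths


# File names below are made up for illustration.
train = TrainingPaths(
    input=InputPaths(source=Path("train.src"), target=Path("train.mt")),
    output=OutputPaths(sentence_scores=Path("train.hter")),
)
```

Only `source` and `target` are required; every other path defaults to `None`, matching the `Optional[FilePath]` annotations.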
- class kiwi.data.datasets.wmt_qe_dataset.TestConfig
  Bases: kiwi.utils.io.BaseConfig
  Base class for all pydantic configs. Used to configure base behaviour of configs.
  - input: InputConfig
- class kiwi.data.datasets.wmt_qe_dataset.WMTQEDataset(columns: Dict[Any, Union[Iterable, List]])
  Bases: torch.utils.data.Dataset
  An abstract class representing a Dataset.
  All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite __getitem__(), supporting fetching a data sample for a given key. Subclasses could also optionally overwrite __len__(), which is expected to return the size of the dataset by many Sampler implementations and the default options of DataLoader.
  Note: DataLoader by default constructs an index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided.
  - class Config
    Bases: kiwi.utils.io.BaseConfig
    Base class for all pydantic configs. Used to configure base behaviour of configs.
    - buffer_size: int
      Number of consecutive instances to be temporarily stored in the buffer, which will be used later for batching/bucketing.
    - train: TrainingConfig
    - valid: TrainingConfig
    - test: TestConfig
    - split: Optional[confloat(gt=0.0, lt=1.0)]
      Split train dataset in case that no validation set is given.
    - ensure_there_is_validation_data(cls, v, values)
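The body of `ensure_there_is_validation_data` is not shown on this page. Judging from its name and from the description of `split`, it plausibly rejects configurations that provide neither `valid` nor `split`. The function below is a minimal sketch of that assumed behaviour in plain Python; `check_validation_source` is a hypothetical name, and the real validator may differ.

```python
from typing import Any, Mapping, Optional


def check_validation_source(
    split: Optional[float], values: Mapping[str, Any]
) -> Optional[float]:
    """Assumed rule: validation data must come from `valid` or from `split`.

    `values` plays the role of the already-validated fields, as in a
    pydantic validator with the (cls, v, values) signature.
    """
    # Mirror the documented constraint confloat(gt=0.0, lt=1.0).
    if split is not None and not (0.0 < split < 1.0):
        raise ValueError("`split` must lie strictly between 0 and 1")
    # Assumption: with no validation files, a split ratio is mandatory.
    if split is None and values.get("valid") is None:
        raise ValueError("provide either `valid` or a `split` in (0, 1)")
    return split
```

Returning the value unchanged keeps the function usable as a pass-through check.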
  - static build(config: Config, directory=None, train=False, valid=False, test=False, split=0)
    Build training, validation, and test datasets.
    Parameters
    - config – configuration object with file paths and processing flags; check out the docs for Config.
    - directory – if provided and paths in configuration are not absolute, use it to anchor them.
    - train – whether to build the training dataset.
    - valid – whether to build the validation dataset.
    - test – whether to build the testing dataset.
    - split (float) – If no validation set is provided, randomly sample \(1-split\) of training examples as validation set.
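Two of the parameters above, `directory` and `split`, describe behaviour that is easy to sketch in isolation. The helpers below are hypothetical (`anchor_path`, `split_examples` and the `seed` argument do not appear in the kiwi documentation) and only illustrate the described semantics: relative paths are anchored at `directory`, and a fraction \(1-split\) of the training examples is sampled at random as validation data.

```python
import random
from pathlib import Path
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def anchor_path(path: str, directory: Optional[str] = None) -> Path:
    """Anchor a relative path at `directory`; leave absolute paths untouched."""
    p = Path(path)
    if directory is not None and not p.is_absolute():
        return Path(directory) / p
    return p


def split_examples(
    examples: Sequence[T], split: float, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Keep a `split` fraction for training and sample the rest as validation."""
    if not 0.0 < split < 1.0:
        raise ValueError("`split` must lie strictly between 0 and 1")
    indices = list(range(len(examples)))
    # A seeded shuffle makes the random sample reproducible (an assumption).
    random.Random(seed).shuffle(indices)
    n_train = int(round(split * len(examples)))
    # Sorting restores the original corpus order within each subset.
    train = [examples[i] for i in sorted(indices[:n_train])]
    valid = [examples[i] for i in sorted(indices[n_train:])]
    return train, valid
```

With `split=0.8` and ten examples, eight stay in training and two become validation data.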
  - __getitem__(self, index_or_field: Union[int, str]) → Union[List[Any], Dict[str, Any]]
    Get a row with data from all fields or all rows for a given field.
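The dual lookup described for `__getitem__` can be sketched with a small column-oriented container. `ColumnDataset` is a hypothetical stand-in written without torch, not the kiwi implementation. Treating `__contains__` as a check on field names is an assumption as well.

```python
from typing import Any, Dict, List, Union


class ColumnDataset:
    """Hypothetical column store: one list of values per field name."""

    def __init__(self, columns: Dict[str, List[Any]]):
        # All columns must describe the same number of rows.
        if len({len(values) for values in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns

    def __getitem__(
        self, index_or_field: Union[int, str]
    ) -> Union[List[Any], Dict[str, Any]]:
        # A string selects a whole column: all rows for that field.
        if isinstance(index_or_field, str):
            return self.columns[index_or_field]
        # An integer selects one row: the value of every field at that index.
        return {
            field: values[index_or_field]
            for field, values in self.columns.items()
        }

    def __len__(self) -> int:
        # Number of rows, taken from any column (0 if there are no columns).
        return len(next(iter(self.columns.values()), []))

    def __contains__(self, item: Any) -> bool:
        # Assumption: membership refers to field names.
        return item in self.columns


data = ColumnDataset(
    {"source": ["ein Haus", "danke"], "target": ["a house", "thanks"]}
)
row = data[1]           # one row across all fields
column = data["source"]  # all rows of one field
```

Storing columns rather than rows makes per-field access cheap, while a row is assembled on demand.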
  - __len__(self)
  - __contains__(self, item)
  - sort_key(self, field='source')
- kiwi.data.datasets.wmt_qe_dataset.read_file(path, reader)