kiwi.systems.encoders.xlmroberta

Module Contents

Classes

XLMRobertaTextEncoder    Encode a field, handling vocabulary, tokenization and embeddings.
XLMRobertaEncoder        XLM-RoBERTa model, using HuggingFace's implementation.
kiwi.systems.encoders.xlmroberta.logger
class kiwi.systems.encoders.xlmroberta.XLMRobertaTextEncoder(tokenizer_name='xlm-roberta-base', is_source=False)
    Bases: kiwi.data.encoders.field_encoders.TextEncoder

    Encode a field, handling vocabulary, tokenization and embeddings.

    Heavily inspired by torchtext and torchnlp.
    fit_vocab(self, samples, vocab_size=None, vocab_min_freq=0, embeddings_name=None, keep_rare_words_with_embeddings=False, add_embeddings_vocab=False)
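    A minimal construction sketch (hypothetical usage, based only on the signature above; the tokenizer is downloaded from HuggingFace on first use):

        from kiwi.systems.encoders.xlmroberta import XLMRobertaTextEncoder

        # Build one field encoder per side; is_source distinguishes source from target.
        target_field_encoder = XLMRobertaTextEncoder(tokenizer_name='xlm-roberta-base', is_source=False)
        source_field_encoder = XLMRobertaTextEncoder(tokenizer_name='xlm-roberta-base', is_source=True)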
class kiwi.systems.encoders.xlmroberta.XLMRobertaEncoder(vocabs: Dict[str, Vocabulary], config: Config, pre_load_model: bool = True)
    Bases: kiwi.systems._meta_module.MetaModule

    XLM-RoBERTa model, using HuggingFace's implementation.
    class Config
        Bases: kiwi.utils.io.BaseConfig

        Base class for all pydantic configs. Used to configure base behaviour of configs.
        model_name: Union[str, Path] = xlm-roberta-base
            Pre-trained XLMRoberta model to use.

        interleave_input: bool = False
            Concatenate SOURCE and TARGET without internal padding (111222000 instead of 111002220).

        use_mlp: bool = True
            Apply a linear layer on top of XLMRoberta.

            Size of the linear layer on top of XLMRoberta.

        pooling: Literal['first_token', 'mean', 'll_mean', 'mixed'] = mixed
            Type of pooling used to extract features from the encoder. Options are:
                first_token: CLS token is used for the sentence representation
                mean: average pooling for the sentence representation, using scalar-mixed layers
                ll_mean: mean pool of only the last-layer embeddings
                mixed: concatenation of the CLS token with the mean pool

        scalar_mix_dropout: confloat(ge=0.0, le=1.0) = 0.1

        scalar_mix_layer_norm: bool = True

        freeze: bool = False
            Freeze XLMRoberta during training.

        freeze_for_number_of_steps: int = 0
            Freeze XLMR during training for this number of steps.
        fix_relative_path(cls, v)
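    A minimal sketch of building this configuration programmatically (hypothetical usage; the field names and defaults are the ones documented above, and instantiating the encoder itself additionally requires the vocabs mapping from the class signature):

        from kiwi.systems.encoders.xlmroberta import XLMRobertaEncoder

        # All fields have defaults, so only the ones being changed need to be passed.
        config = XLMRobertaEncoder.Config(
            model_name='xlm-roberta-base',
            interleave_input=False,
            use_mlp=True,
            pooling='mixed',
            scalar_mix_dropout=0.1,
            scalar_mix_layer_norm=True,
            freeze=False,
            freeze_for_number_of_steps=0,
        )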
    load_state_dict(self, state_dict: Union[Dict[str, Tensor], Dict[str, Tensor]], strict: bool = True)
        Copies parameters and buffers from state_dict into this module and its descendants. If strict is True, then the keys of state_dict must exactly match the keys returned by this module's state_dict() function.

        Parameters
            state_dict (dict) – a dict containing parameters and persistent buffers.
            strict (bool, optional) – whether to strictly enforce that the keys in state_dict match the keys returned by this module's state_dict() function. Default: True

        Returns
            missing_keys is a list of str containing the missing keys
            unexpected_keys is a list of str containing the unexpected keys

        Return type
            NamedTuple with missing_keys and unexpected_keys fields
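    A usage sketch (hypothetical: `encoder` stands for an already-built XLMRobertaEncoder and 'weights.pt' for a checkpoint path):

        import torch

        state_dict = torch.load('weights.pt', map_location='cpu')
        result = encoder.load_state_dict(state_dict, strict=False)
        print(result.missing_keys)     # keys the module expected but did not receive
        print(result.unexpected_keys)  # keys in state_dict that the module does not use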
    size(self, field=None)
    _check_freezing(self)
    forward(self, batch_inputs, *args, include_logits=False)
    static concat_input(source_batch, target_batch, pad_id)
        Concatenate tensors of two batches into one tensor.

        Returns
            the concatenation, a mask of types (a as zeroes and b as ones), and the concatenation of attention_mask.
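    An illustrative stand-in for the idea (not the library's implementation; it assumes plain (batch, seq_len) token-id tensors that are already padded with pad_id):

        import torch

        def concat_padded(batch_a, batch_b, pad_id):
            # Concatenate along the sequence dimension, keeping each side's padding,
            # which matches the 111002220 layout mentioned under interleave_input.
            concatenation = torch.cat([batch_a, batch_b], dim=1)
            # Type mask: zeroes for positions from batch_a, ones for positions from batch_b.
            token_type = torch.cat([torch.zeros_like(batch_a), torch.ones_like(batch_b)], dim=1)
            # Attention mask: 1 for real tokens, 0 for padding.
            attention_mask = (concatenation != pad_id).long()
            return concatenation, token_type, attention_mask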
    static split_outputs(features, batch_inputs, interleaved=False)
        Split contexts to get tag_side outputs.

        Parameters
            features (tensor) – XLMRoberta output: <s> target </s> </s> source </s>. Shape of (bs, 1 + target_len + 2 + source_len + 1, 2)
            batch_inputs –
            interleaved (bool) – whether the concat strategy was 'interleaved'.

        Returns
            dict of tensors, one per tag side.
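    An illustrative stand-in for the slicing (not the library's code; it assumes the non-interleaved layout documented above with fixed lengths, whereas the real method derives per-sample lengths from batch_inputs):

        def split_features(features, target_len, source_len):
            # features: (batch, 1 + target_len + 2 + source_len + 1, hidden)
            target = features[:, 1:1 + target_len]              # skip the leading <s>
            source_start = 1 + target_len + 2                   # skip the closing </s> </s>
            source = features[:, source_start:source_start + source_len]
            return {'target': target, 'source': source}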
    static interleave_input(source_batch, target_batch, pad_id)
        Interleave the source + target embeddings into one tensor.
        This means making the input as [batch, target [SEP] source].

        Returns
            interleave of embeddings, mask of target (as zeroes) and source (as ones), and concatenation of attention_mask.
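    An illustrative stand-in for the packing (not the library's code; it works on (batch, seq_len) token-id tensors and drops internal padding so each pair becomes 111222000 instead of 111002220):

        import torch

        def interleave_padded(target_batch, source_batch, pad_id):
            packed = []
            for tgt, src in zip(target_batch, source_batch):
                # Keep only real tokens from each side and place them back to back.
                packed.append(torch.cat([tgt[tgt != pad_id], src[src != pad_id]]))
            max_len = max(seq.size(0) for seq in packed)
            out = torch.full((len(packed), max_len), pad_id, dtype=target_batch.dtype)
            side_mask = torch.zeros((len(packed), max_len), dtype=torch.long)
            for i, seq in enumerate(packed):
                out[i, :seq.size(0)] = seq
                target_len = int((target_batch[i] != pad_id).sum())
                side_mask[i, target_len:seq.size(0)] = 1   # target stays 0, source becomes 1
            attention_mask = (out != pad_id).long()
            return out, side_mask, attention_mask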
    static get_mismatch_features(logits, target, pred)