
build_vocab with custom word embedding #201

Open
@DaehanKim

Description


Hi :)
I want to use a customized bio word embedding for some text classification, but I can't find how to load it.

An old tutorial says there is a 'wv_dir' keyword argument, which I tried and it failed:

TypeError                                 Traceback (most recent call last)
<ipython-input-48-ac09f554719e> in <module>()
      1 test_field = data.Field()
      2 lang_data = datasets.LanguageModelingDataset(path='pr_data/processed_neg.txt',text_field=test_field)
----> 3 voc = torchtext.vocab.Vocab(wv_dir='bio_wordemb/PubMed-and-PMC-w2v.txt')
      4 
      5 # test_field.build_vocab(lang_data,wv_dir='bio_wordemb/PubMed-and-PMC-w2v.txt')

TypeError: __init__() got an unexpected keyword argument 'wv_dir'

Just as we can load a pretrained GloVe embedding using TEXTFIELD.build_vocab(data, vectors='glove.6B.100d'), is there a similar way to load a customized embedding?

Any help would be much appreciated. Thanks!
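
For reference, a minimal sketch of one way this can be done, assuming a torchtext version that provides the torchtext.vocab.Vectors class (the wv_dir keyword appears to have been dropped in favor of the vectors argument). The file paths are reused from the snippet above, and custom_vectors is just an illustrative name:

from torchtext import data, datasets
from torchtext.vocab import Vectors

# Load the custom word2vec-format text file as a Vectors object.
# 'name' accepts a path to the embedding file; on first load a
# serialized cache is written (to '.vector_cache' by default).
custom_vectors = Vectors(name='bio_wordemb/PubMed-and-PMC-w2v.txt')

test_field = data.Field()
lang_data = datasets.LanguageModelingDataset(
    path='pr_data/processed_neg.txt', text_field=test_field)

# Pass the Vectors object (rather than a named alias such as
# 'glove.6B.100d') directly to build_vocab.
test_field.build_vocab(lang_data, vectors=custom_vectors)

This mirrors the GloVe pattern: the string aliases like 'glove.6B.100d' resolve to preconfigured Vectors instances, so a custom file passed through the same vectors argument should slot in the same way.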
