Setra Solofoniaina d988d3e4a3 refactor
2021-04-02 10:24:30 +03:00

MALBERT

Malagasy Language BERT. Strongly inspired by codertimo/BERT-pytorch, but built on PyTorch's integrated Transformer module.

Quickstart

**NOTICE: Your corpus should contain one sentence per line.**

0. Prepare your corpus

Put train.txt, test.txt, and valid.txt in the dataset/corpus folder.
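
If you start from a single raw text file, the three corpus files can be produced with a small script. The following is only a sketch and not part of this repository: the function name `split_corpus`, the 80/10/10 split, and the fixed seed are assumptions made for illustration. It cleans each sentence so it fits on one line, shuffles, and writes train.txt, valid.txt, and test.txt into dataset/corpus.

```python
import random
from pathlib import Path


def split_corpus(sentences, out_dir="dataset/corpus",
                 valid_frac=0.1, test_frac=0.1, seed=0):
    """Write train.txt, valid.txt and test.txt, one sentence per line.

    Hypothetical helper: the split fractions and seed are illustrative
    defaults, not values taken from this project.
    """
    # Collapse internal whitespace (including stray newlines) and drop
    # empty entries so every sentence occupies exactly one line.
    cleaned = [" ".join(s.split()) for s in sentences]
    cleaned = [s for s in cleaned if s]

    # Shuffle deterministically so the split is reproducible.
    random.Random(seed).shuffle(cleaned)

    n_valid = int(len(cleaned) * valid_frac)
    n_test = int(len(cleaned) * test_frac)
    splits = {
        "valid.txt": cleaned[:n_valid],
        "test.txt": cleaned[n_valid:n_valid + n_test],
        "train.txt": cleaned[n_valid + n_test:],
    }

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, lines in splits.items():
        text = "".join(line + "\n" for line in lines)
        (out / name).write_text(text, encoding="utf-8")

    # Return the number of sentences written to each file.
    return {name: len(lines) for name, lines in splits.items()}
```

For example, `split_corpus(open("raw.txt", encoding="utf-8"))` would read a hypothetical raw.txt line by line and write the three files under dataset/corpus.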

1. Pretrain model

$ python3 main.py

Dependencies

  • python 3.8
  • torch >= 1.4
  • tokenizers

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT