diff --git a/.gitignore b/.gitignore
index ddde8a7..7d12255 100644
--- a/.gitignore
+++ b/.gitignore
@@ -4,4 +4,5 @@
 /src/model/__pycache__
 /src/output/*
 /src/trainer/__pycache__
-/.vscode
\ No newline at end of file
+/.vscode
+/src/model/embedding/__pycache__
\ No newline at end of file
diff --git a/README.MD b/README.MD
new file mode 100644
index 0000000..33ac55f
--- /dev/null
+++ b/README.MD
@@ -0,0 +1,30 @@
+# MALBERT
+
+Malagasy Language BERT - Strongly inspired by [codertimo/BERT-pytorch](https://github.com/codertimo/BERT-pytorch), but using PyTorch's built-in transformer module
+
+
+## Quickstart
+
+**NOTICE:** Your corpus should contain one sentence per line.
+
+### 0. Prepare your corpus
+Put train.txt, test.txt, and valid.txt in the dataset/corpus folder.
+
+### 1. Pretrain model
+```
+$ python3 main.py
+```
+
+## Dependencies
+* python 3.8
+* torch >= 1.4
+* tokenizers
+
+
+## Contributing
+Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
+
+Please make sure to update tests as appropriate.
+
+## License
+[MIT](https://choosealicense.com/licenses/mit/)
\ No newline at end of file
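
Note on the README quickstart: the corpus layout it describes (train.txt, test.txt, and valid.txt under dataset/corpus, one sentence per line) can be sanity-checked before pretraining. The following is only a minimal sketch, not part of this change set. The helper name `check_corpus`, the UTF-8 encoding, and the rule that blank lines are skipped are all assumptions made for illustration.

```python
from pathlib import Path

# Split names come from the README quickstart; the helper itself is hypothetical.
SPLITS = ("train", "test", "valid")


def check_corpus(corpus_dir="dataset/corpus"):
    """Return {split: sentence_count}; raise if a split file is missing."""
    counts = {}
    for split in SPLITS:
        path = Path(corpus_dir) / f"{split}.txt"
        if not path.is_file():
            raise FileNotFoundError(f"missing corpus file: {path}")
        # Assumption: files are UTF-8 and every non-blank line is one sentence.
        with path.open(encoding="utf-8") as handle:
            counts[split] = sum(1 for line in handle if line.strip())
    return counts
```

Calling `check_corpus()` from the repository root before `python3 main.py` would surface a missing split file early, and the returned counts give a quick view of how many sentences each split holds.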