Turku-neural-parser-pipeline

A neural parsing pipeline for segmentation, morphological tagging, dependency parsing, and lemmatization, with pre-trained models for more than 50 languages. It was among the top-ranked systems in the CoNLL-18 Shared Task.

Docker

For a quick test using the ready-made Finnish image:

echo "Minulla on koira." | docker run -i turkunlp/turku-neural-parser:finnish-cpu-plaintext-stdin

And for English:

echo "I don't have a goldfish." | docker run -i turkunlp/turku-neural-parser:english-cpu-plaintext-stdin
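The parser prints one token per line with tab-separated annotation columns. The following is a minimal sketch for skimming that output, assuming it is in CoNLL-U format (ten tab-separated columns, with the word form, lemma, and universal part-of-speech tag in columns 2 to 4). The function name conllu_tokens is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: print FORM, LEMMA and UPOS for each token of
# CoNLL-U input read from stdin. Comment lines, blank lines and
# multiword-token ranges (IDs such as "1-2") are skipped because their
# first column is not a plain integer.
conllu_tokens() {
    awk -F '\t' 'NF == 10 && $1 ~ /^[0-9]+$/ { print $2 "\t" $3 "\t" $4 }'
}
```

It can be appended to either command above, for example:

```shell
echo "I don't have a goldfish." | docker run -i turkunlp/turku-neural-parser:english-cpu-plaintext-stdin | conllu_tokens
```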

Ready-made images

Several ready-made Docker images are published on the TurkuNLP Docker Hub, where Docker can find and pull them automatically. Currently, ready-made images exist for Finnish and English.

Running from a ready-made image

To simply test the parser:

echo "Minulla on koira." | docker run -i turkunlp/turku-neural-parser:finnish-cpu-plaintext-stdin

To parse a single file as a one-off:

cat input.txt | docker run -i turkunlp/turku-neural-parser:finnish-cpu-plaintext-stdin > output.conllu
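When several files need parsing, the same command can be wrapped in a loop. This is a minimal sketch rather than a feature of the pipeline: the function name parse_dir, the PARSER_IMAGE variable, and the one-output-file-per-input layout are all hypothetical. Each file starts a fresh container, so the model is loaded again every time, which is only reasonable for small batches.

```shell
# Image to run; defaults to the ready-made Finnish image named above.
PARSER_IMAGE="${PARSER_IMAGE:-turkunlp/turku-neural-parser:finnish-cpu-plaintext-stdin}"

# Hypothetical helper: parse every .txt file in IN_DIR and write
# one .conllu file per input into OUT_DIR.
# Usage: parse_dir IN_DIR OUT_DIR
parse_dir() {
    in_dir="$1"
    out_dir="$2"
    mkdir -p "$out_dir"
    for f in "$in_dir"/*.txt; do
        # Skip the unexpanded pattern when the directory has no .txt files.
        [ -e "$f" ] || continue
        base="$(basename "$f" .txt)"
        docker run -i "$PARSER_IMAGE" < "$f" > "$out_dir/$base.conllu"
    done
}
```

For example, `parse_dir texts parsed` would read texts/a.txt and write parsed/a.conllu.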

Images for other languages

Building a language-specific image is straightforward. First, choose one of the language models from the project's list of available models. Each model corresponds to one of the treebanks available at Universal Dependencies. As an example, take French and the GSD treebank model: the model name is then fr_gsd, and to parse plain-text documents you would use the parse_plaintext pipeline.

Build the Docker image like so:

docker build -t "my_french_parser_plaintext" --build-arg "MODEL=fr_gsd" --build-arg "PIPELINE=parse_plaintext" -f Dockerfile https://github.com/TurkuNLP/Turku-neural-parser-pipeline.git

And then you can parse French like so:

echo "Les carottes sont cuites" | docker run -i my_french_parser_plaintext
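Since only the model and pipeline names change between languages, the build command can be wrapped in a small function. This is a sketch under stated assumptions: the function name build_parser_image and the parser_MODEL_PIPELINE tag scheme are hypothetical, and it uses only the docker build options shown above.

```shell
# Hypothetical helper: build an image for a given model and pipeline.
# Usage: build_parser_image MODEL [PIPELINE]
# PIPELINE defaults to parse_plaintext; the tag scheme is an assumption.
build_parser_image() {
    model="$1"
    pipeline="${2:-parse_plaintext}"
    tag="parser_${model}_${pipeline}"
    docker build -t "$tag" \
        --build-arg "MODEL=$model" \
        --build-arg "PIPELINE=$pipeline" \
        -f Dockerfile \
        https://github.com/TurkuNLP/Turku-neural-parser-pipeline.git
}
```

With this, `build_parser_image fr_gsd` would produce an image tagged parser_fr_gsd_parse_plaintext, which can be run the same way as my_french_parser_plaintext above.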