The first step in reading CoNLL data with the NLTK corpus reader API is to import the appropriate class.
from nltk.corpus.reader.conll import ConllCorpusReader
The NLTK documentation for the CoNLL reader describes the first argument of the ConllCorpusReader constructor as root, which is the root directory of your data. For instance:
root = "/home/userx/dataconll/"
The next step is to call the constructor of the ConllCorpusReader class:
ccorpus = ConllCorpusReader(root, r".*\.conll", ('words', 'pos', 'tree'))
In this example, I want all files with the extension ".conll" under the given root directory. The second argument is a regular expression that has to match the whole file name, so the pattern is r".*\.conll" rather than just ".conll". I also have to specify what each column of my CoNLL files contains, in the order the columns appear. Here I declare 'words', 'pos', and 'tree' columns, but you can choose among the following column types: 'words', 'pos', 'tree', 'chunk', 'ne', 'srl', 'ignore'. Each column type is described in the NLTK documentation.
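To see how a file pattern selects files, here is a minimal sketch that uses only Python's re module. It assumes, based on my reading of the NLTK source, that the reader matches the pattern against each file name from its first character and anchors it at the end. The helper matching_fileids and the file names are hypothetical and not part of NLTK.

```python
import re


def matching_fileids(pattern, filenames):
    """Hypothetical helper: keep the names that `pattern` matches from start to end."""
    # Assumption: NLTK appends "$" and uses re.match, so the whole name must match.
    anchored = pattern + "$"
    return [name for name in filenames if re.match(anchored, name)]


# Made-up directory listing for illustration.
filenames = ["file1.conll", "file2.conll", "notes.txt"]

# A pattern that covers the whole name selects both .conll files.
print(matching_fileids(r".*\.conll", filenames))

# A bare ".conll" only matches names that are one character followed by "conll".
print(matching_fileids(r".conll", filenames))
```

Under that assumption, a pattern has to describe the complete file name, which is why spelling it as r".*\.conll" is the safer choice.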
After that, you can call the reader's methods on any file you want. For instance:
ccorpus.words('file2.conll')
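The steps above can be tried end to end with the sketch below. It is only an illustration under a few assumptions: the sample sentences are made up, the file is written to a temporary directory instead of a real data folder, and the columns are declared as 'words', 'pos', and 'chunk' because the sample has no parse-tree column.

```python
import os
import tempfile

from nltk.corpus.reader.conll import ConllCorpusReader

# Made-up sample data: one token per line, whitespace-separated columns,
# and a blank line between sentences.
SAMPLE = """\
The DT B-NP
cat NN I-NP
sleeps VBZ B-VP
. . O

Dogs NNS B-NP
bark VBP B-VP
. . O
"""

with tempfile.TemporaryDirectory() as root:
    # Write the sample under a temporary root directory.
    path = os.path.join(root, "file2.conll")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SAMPLE)

    # Declare what each column holds, in file order.
    ccorpus = ConllCorpusReader(root, r".*\.conll", ("words", "pos", "chunk"))

    # The reader returns lazy views, so materialize them while the file exists.
    fileids = ccorpus.fileids()
    words = list(ccorpus.words("file2.conll"))
    tagged = list(ccorpus.tagged_words("file2.conll"))
    sentences = [list(sent) for sent in ccorpus.sents("file2.conll")]

print(fileids)
print(words)
print(tagged[:2])
print(len(sentences))
```

Besides words(), the reader offers tagged_words() and sents(), which return (word, tag) pairs and sentence-level lists respectively.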
Tuesday, March 7, 2017