Monday, June 6, 2016

NLTK and StanfordParser

Recently I ran into an issue while working with NLTK and StanfordParser. According to the most recent documentation, which can be found here, it is enough to add the Stanford Parser jars to your CLASSPATH. However, this is what happened to me:

Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
at edu.stanford.nlp.parser.common.ParserGrammar.(ParserGrammar.java:46)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more

Traceback (most recent call last):
  File "test_stanford.py", line 23, in <module>
    sentences = dp.parse_sents(("Hello, My name is Evelin.", "What is your name?"))
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nltk/parse/stanford.py", line 129, in parse_sents
    cmd, '\n'.join(' '.join(sentence) for sentence in sentences), verbose))
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nltk/parse/stanford.py", line 225, in _execute
    stdout=PIPE, stderr=PIPE)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nltk/internals.py", line 135, in java
    raise OSError('Java command failed : ' + str(cmd))
OSError: Java command failed :...

I investigated this issue in many Stack Overflow posts, the NLTK documentation, and the StanfordParser documentation. However, most of the information I found was about earlier versions of NLTK and Stanford Parser. After some debugging, I found that one important file was missing from the Java command built by the NLTK library: slf4j-api.jar.
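One way to confirm which jar provides the class named in the NoClassDefFoundError is to look inside the jars themselves, since a jar is an ordinary zip archive. The helper below is only a sketch: the function name jars_containing is made up for this post, and it assumes all the jars sit directly in the directory where you unzipped the parser.

```python
import os
import zipfile


def jars_containing(directory, class_name):
    """Return sorted names of the jars in `directory` that contain `class_name`."""
    # A class such as org.slf4j.LoggerFactory is stored in a jar as
    # org/slf4j/LoggerFactory.class.
    entry = class_name.replace('.', '/') + '.class'
    found = []
    for file_name in sorted(os.listdir(directory)):
        if not file_name.endswith('.jar'):
            continue
        jar_path = os.path.join(directory, file_name)
        with zipfile.ZipFile(jar_path) as jar:
            if entry in jar.namelist():
                found.append(file_name)
    return found
```

Calling jars_containing('/path/to/your/stanford-parser/unzip/directory', 'org.slf4j.LoggerFactory') should list the jar (or jars) that must end up on the Java classpath.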

Now, let's start from the beginning. First, let's see what the code for using StanfordParser with NLTK looks like. You'll need the following imports in your code:

from nltk.parse.stanford import StanfordDependencyParser

import os

The os library is used to set the environment variables STANFORD_PARSER and CLASSPATH. You can do this with the following lines of code:

os.environ['STANFORD_PARSER'] = '/path/to/your/stanford-parser/unzip/directory'

os.environ['CLASSPATH'] = '/path/to/your/stanford-parser/unzip/directory/'
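If you would rather list every jar explicitly instead of pointing CLASSPATH at the directory, something like the sketch below could be used. The helper name build_classpath is hypothetical, and this does not by itself guarantee that NLTK will pass every jar on to the Java command line; it only shows how to assemble the value.

```python
import glob
import os


def build_classpath(directory):
    """Join every .jar file found directly in `directory` into one classpath string."""
    jars = sorted(glob.glob(os.path.join(directory, '*.jar')))
    # os.pathsep is ':' on Linux/macOS and ';' on Windows, which is the
    # separator Java expects between classpath entries on each platform.
    return os.pathsep.join(jars)


parser_dir = '/path/to/your/stanford-parser/unzip/directory'
os.environ['STANFORD_PARSER'] = parser_dir
os.environ['CLASSPATH'] = build_classpath(parser_dir)
```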

After that, you can instantiate a StanfordDependencyParser like this:

dp = StanfordDependencyParser(model_path='/path/to/your/englishPCFG.ser.gz')

The file englishPCFG.ser.gz is inside stanford-parser-3.6.0-models.jar, so you can extract the model file from this jar.
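Since a jar is a zip archive, the extraction can be done from Python as well. This is a minimal sketch: the function name extract_model is made up for this post, and it searches for the entry by file name rather than assuming its exact path inside the jar.

```python
import zipfile


def extract_model(models_jar, target_dir, model_name='englishPCFG.ser.gz'):
    """Extract the first entry of `models_jar` whose file name is `model_name`."""
    with zipfile.ZipFile(models_jar) as jar:
        for entry in jar.namelist():
            if entry == model_name or entry.endswith('/' + model_name):
                # extract() recreates the entry's directories under
                # target_dir and returns the path of the extracted file.
                return jar.extract(entry, target_dir)
    raise FileNotFoundError(model_name + ' not found in ' + models_jar)
```

The returned path is what you would pass as model_path when creating the parser.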

Finally, you can parse your sentences with the following line of code:

sentences = dp.parse_sents(("Hello, My name is Evelin.", "What is your name?"))
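One detail worth noting: the traceback above shows that parse_sents builds its input with ' '.join(sentence), so it treats each sentence as a sequence of tokens. The sketch below mimics only that join expression to show what a plain string turns into compared with a token list. The whitespace split is a naive stand-in for a real tokenizer, and the helper name is made up. NLTK also has a raw_parse_sents method meant for untokenized strings, if your version includes it.

```python
def join_for_parser(sentences):
    """Mimic the join expression shown in the traceback from nltk/parse/stanford.py."""
    return '\n'.join(' '.join(sentence) for sentence in sentences)


raw = ("Hello, My name is Evelin.", "What is your name?")
# Naive whitespace tokenization, for illustration only.
tokenized = [sentence.split() for sentence in raw]

as_strings = join_for_parser(raw)       # every character becomes a "token"
as_tokens = join_for_parser(tokenized)  # words stay intact

print(as_strings.splitlines()[0])
print(as_tokens.splitlines()[0])
```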

Running that parse_sents call gave me the error I pasted at the beginning of this post. So, what did I do? I debugged the following two files of the NLTK API: /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nltk/internals.py and /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/nltk/parse/stanford.py. I saw that the Java command built by the NLTK API puts only stanford-parser-3.6.0-models.jar and stanford-parser.jar on the classpath, but we also need slf4j-api.jar to run StanfordParser. I tried setting CLASSPATH, but that didn't work for me, so I ended up changing the code of stanford.py. I added the following lines:

_MAIN_JAR = r'slf4j-api\.jar'  # add right after the _JAR variable is set

# This code goes right before the line: self._classpath = (stanford_jar, model_jar)
main_jar = max(
    find_jar_iter(
        self._MAIN_JAR, path_to_models_jar,
        env_vars=('STANFORD_MODELS', 'STANFORD_CORENLP'),
        searchpath=(), url=_stanford_url,
        verbose=verbose, is_regex=True
    ),
    key=lambda model_name: re.match(self._MAIN_JAR, model_name)
)

# Then I changed ...
self._classpath = (stanford_jar, model_jar)
# to ...
self._classpath = (stanford_jar, model_jar, main_jar)
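To get a feel for which file names the _MAIN_JAR pattern accepts, here is a small standalone sketch. It assumes that find_jar_iter, when called with is_regex=True, tests the pattern against file names with re.match; I did not verify that in detail, and the candidate file names below are only examples.

```python
import re

MAIN_JAR_PATTERN = r'slf4j-api\.jar'  # same pattern as _MAIN_JAR above


def matching_jars(file_names, pattern=MAIN_JAR_PATTERN):
    """Return the file names that the pattern matches from the start of the name."""
    return [name for name in file_names if re.match(pattern, name)]


# Hypothetical file names, for illustration only.
candidates = [
    'slf4j-api.jar',
    'slf4j-simple.jar',
    'slf4j-api-1.7.12.jar',
    'stanford-parser.jar',
]
matched = matching_jars(candidates)
print(matched)
```

Note that a versioned name such as slf4j-api-1.7.12.jar is not matched, so the pattern would need adjusting if your jar is named that way.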

I know that it is not an elegant solution, but it worked fine, so for now I think I will use it. Any suggestions are welcome.
