Over the past few months, as part of an effort to standardize some of the JVM AI work I've been doing, I've been working with a number of open-source technologies (see also links at right):
Apache UIMA -- a system for annotating unstructured text. It originated at IBM and was later used as a foundation of IBM's Watson system. It provides a neat framework into which to fit a number of other AI and NLP tools, both open-source and home-grown.
OpenNLP -- I got tired of continually re-inventing the NLP parser, so I've decided to standardize on OpenNLP for most projects. I looked at several, including the Stanford project and Python NLTK, all of which are great, but OpenNLP proved best for my needs in terms of licensing and runtime ecosystem.
Drools -- Some of my low-level inference and feature-extraction needs exceed the performance capabilities of off-the-shelf inference engines. However, once out of that phase, a more general-purpose open-source engine like Drools is ideal.
ANTLR -- I have a lot of little DSL parsers hanging around for rule files, configuration files, etc. No longer. With version 4.0, ANTLR has become so powerful and easy to use that everything's getting converted to ANTLR.
One of the main challenges was getting all of the above working under Scala, and especially working together under Scala! Actually, it was more of a documentation issue than anything else. Perhaps I'll document those efforts more in another post, but for now I'll just leave you with two thoughts:
1) Tools are your friends; I never would have gotten it all together without the help of a handful of Eclipse and IntelliJ IDEA plugins. Even if I went on to do some things manually, the plugins at least showed me the way. And sometimes there are pleasant surprises: I found out my IntelliJ IDEA installation could recognize and syntax-check Drools rule files!
2) A piece of enduring wisdom that every JVM user should write on the ceiling over his or her bed, so it will be the first thing seen in the morning and the last thing seen at night:
"No matter what a JVM exception seems to be talking about, it's really talking about a classpath problem."
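One way to act on that advice is to ask the JVM where it actually found a class. What follows is only a minimal sketch, not a recipe from any of the projects above: `ClasspathProbe` and `locationOf` are names made up for illustration, and the code relies only on standard JDK calls (`Class.getProtectionDomain`, `CodeSource.getLocation`, and the `java.class.path` system property). It assumes no security manager is blocking access to the protection domain.

```java
import java.security.CodeSource;

public class ClasspathProbe {

    // Hypothetical helper: reports the jar or directory a class was loaded from.
    // Core JDK classes often report no code source at all, so that case is handled.
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(no code source reported)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        // The classpath the JVM was started with.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));

        // Where this class and a core JDK class came from.
        System.out.println("ClasspathProbe <- " + locationOf(ClasspathProbe.class));
        System.out.println("String         <- " + locationOf(String.class));
    }
}
```

Pointing `locationOf` at a class from the library that seems to be misbehaving will usually show whether an unexpected jar, or an unexpected version of the right jar, got picked up first. Since it is plain Java, it can be called from Scala unchanged.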