Machine learning on textual data: http://code.google.com/p/dkpro-tc