Ideas of Projects for 2017

Some possible ideas for students looking for projects: ‘iniciação científica’.

Text Entailment

We propose a project to evaluate different techniques for text entailment using deep parsing. We were particularly interested in ‘deep’ linguistic processing of sentences. The goal is the combination of linguistic and statistical processing methods for getting at the meaning of texts and utterances. For the experiments, we propose the use of the SICK corpus and it was the corpus used in the SemEval 2014.

Some tools/ideas under consideration are:

  1. Universal DepLambda
  2. Sempre
  3. Enhanced Dependencies
  4. EasySRL
  5. AMR

Dependency Parser for Portuguese in FreeLing

Freeling is a developer-oriented library providing language analysis services. Freeling has already a good support for Portuguese in all its base modules (tokenizer, sentence splitter, POS tagger, WSD, etc.). We want to extend that support with a dependency parser for Portuguese. This project is about to understand how to train a parser in FreeLing and make it for Portuguese, evaluating the result.

For the training we can use the recently released UD_Portuguese data under the Universal Dependencies project.

SUO-KIF translator to TPTP

We have rewrote the translation from SUO-KIF logic language to TPTP language. In this project we want to expand the translation of high-order construction to TPTP/THF. In the sequence, we want to make the output readable to SNARK prover to explore its support to Procedural Attachments.

Ideally, the translator should be written in logic or functional programming style using: Prolog, Haskell or Common Lisp, etc.

CoNLL-U and Universal Dependencies toolset

The creation of an annotated corpus with dependencies is a hard task and very time-consuming. We are collaborating with the Universal Dependencies Project, with a Portuguese Corpus (UD_Portuguese). After release 2.0, we are now preparing for the next version expanding and solving errors in the current 2.0 corpus.

In this project, we are interested in improving the necessary tools that we use:

  1. CL-CONLLU : a Common Lisp library for work with CoNLL-U files
  2. conllu-workbench : a set of opensource tools that we use for searching and editing the corpus.

In particular, the CL library needs better support for rules and functions for comparing different trees and help in the identification of common patterns of errors.

Improving the openWordnet-PT interface

The OpenWordnet-PT, abbreviated as OpenWN-PT or simply OWN-PT) is a open access wordnet for Portuguese, originally developed by Valeria de Paiva, Alexandre Rademaker and Gerard de Melo as a syntactic projection of Universal WordNet (UWN) of de Melo and Weikum. Like many other open wordnets we believe that lexical resources need to be open to be useful.

The OpenWN-PT is available in RDF/OWL, following and expanding, when necessary, the mappings from the original Princeton WordNet. Both the data and the RDF template settings (classes and properties) of the OpenWN-PT are freely available for download here. Besides being downloadable, the data can be retrieved via SPARQL in the endpoint and one can consult and compare it with other wordnets at the generic interface provided by the Open MultiLingual WordNet project.

This project is about helping our team in the improving of the web interface for our openWordnet-PT.

In particular, we need to: (1) simplify the architecture; (2) improve the interface for votes and suggestions; (3) improve navegation and data visualization.