The Construction of a Corpus from the Brazilian Historical-Biographical Dictionary


Abstract:

We present our ongoing efforts towards the creation of a new Portuguese corpus based on the “Dicionário Histórico-Biográfico Brasileiro”. The aim is to add as many linguistic annotations as possible, using widely accepted annotation schemas, and to distribute all data in standard formats. This first exploratory work revisits what has already been done and tests different tools to detect errors and to identify the best methods for tackling the problem. The data is available at https://github.com/cpdoc/dhbb-nlp and will be continuously improved.

BibTeX:

@inproceedings{propor-2020,
  author = {Ribeiro, Lucas and Zulini, Jaqueline P. and Rademaker, Alexandre},
  editor = {Quaresma, Paulo and Vieira, Renata and Alu{\'i}sio, Sandra and Moniz, Helena and Batista, Fernando and Gon{\c{c}}alves, Teresa},
  title = {The Construction of a Corpus from the Brazilian
               Historical-Biographical Dictionary},
  booktitle = {Computational Processing of the Portuguese Language},
  year = {2020},
  publisher = {Springer International Publishing},
  address = {Cham},
  pages = {109--117},
  pdflink1 = {/files/propor-2020-article.pdf},
  isbn = {978-3-030-41505-1}
}