Interactive analysis of phylomemetic structures

Master research project proposal

Zoltan Miklos (MCF), David Gross-Amblard (PR)
DRUID team, IRISA lab, Université de Rennes 1
Contact: and
Funded by the ANR 2016 Epique project

Keywords: evolution of science, analysis of very large text corpora, query languages,
provenance, data quality

Understanding how scientific fields evolve is important for our society. Yet obtaining a
general picture of the major evolutions is challenging, given the proliferation of
scientific publications and of overspecialized journals. The Epique project proposes to
develop automated tools that can reconstruct important aspects of this evolution from
large corpora of scientific publications (such as Web of Science or PubMed). We will
represent the evolution in the form of phylomemetic lattices [1], which are analogous to
the phylogenetic trees used in biology to describe the evolution of natural species. The
Epique project will combine several techniques to extract these phylomemetic structures
from very large collections of scientific articles.
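To make the idea of a phylomemetic structure more concrete, here is a deliberately simplified sketch. It is not the Epique method, and it does not claim to reproduce [1]. It assumes that each time period yields clusters of co-occurring terms, and that a cluster is linked to a cluster of the next period when their term sets overlap enough. The class `TermCluster`, the functions `jaccard` and `link_periods`, and the default `threshold` of 0.3 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class TermCluster:
    """Hypothetical model: a cluster of co-occurring terms observed in one time period."""
    period: int
    terms: frozenset[str]


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Fraction of shared terms between two clusters (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_periods(
    clusters: list[TermCluster], threshold: float = 0.3
) -> list[tuple[TermCluster, TermCluster, float]]:
    """Link each cluster to every cluster of the next observed period whose Jaccard
    similarity reaches `threshold`. Links always point forward in time, so the
    result is a directed acyclic graph."""
    by_period: dict[int, list[TermCluster]] = {}
    for c in clusters:
        by_period.setdefault(c.period, []).append(c)
    periods = sorted(by_period)
    links = []
    # Compare only consecutive observed periods, e.g. 2000 with 2005.
    for earlier, later in zip(periods, periods[1:]):
        for parent, child in product(by_period[earlier], by_period[later]):
            score = jaccard(parent.terms, child.terms)
            if score >= threshold:
                links.append((parent, child, score))
    return links


if __name__ == "__main__":
    clusters = [
        TermCluster(2000, frozenset({"neural network", "backpropagation"})),
        TermCluster(2005, frozenset({"neural network", "deep learning", "backpropagation"})),
        TermCluster(2005, frozenset({"genome", "sequencing"})),
    ]
    for parent, child, score in link_periods(clusters):
        print(parent.period, "->", child.period, sorted(child.terms), round(score, 2))
    # prints: 2000 -> 2005 ['backpropagation', 'deep learning', 'neural network'] 0.67
```

In this sketch a cluster can have several parents (a fusion of fields) as well as several children (a split). That is why the result is a general graph and not a tree, which is the main difference from a phylogenetic tree that the analogy above glosses over.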
The master project will focus on some of the following aspects.
• The phylomemetic structures reference a large number of scientific concepts and
describe their co-occurrences at various levels, so they can grow very large. We
would like to develop a query language that supports interactive tools for
exploring these structures [2,7,8,9,10,11].
• Since the phylomemetic structures will be reconstructed by automated tools, they
will inevitably contain errors or imprecisions. We will develop suitable quality
assessment techniques to identify the parts of the structure whose quality is
potentially poor. The quality model will also rely on provenance information
[3,4,5,6] to help human experts identify the actual sources of quality problems.
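Neither the query language nor the quality model is specified at this stage. The following hypothetical sketch only shows how the two aspects could meet: a query that follows the lineages of a term through the structure and also reports which of the reached nodes have thin provenance. The names `Node`, `descendants` and `trace_term`, and the `min_sources` threshold are all invented for illustration. Counting supporting documents is a much cruder measure than the provenance models cited above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node of a phylomemetic structure: a term cluster in one period,
    the IDs of the documents it was extracted from (its provenance), and its successors."""
    node_id: str
    period: int
    terms: set[str]
    sources: set[str] = field(default_factory=set)
    children: list["Node"] = field(default_factory=list)


def descendants(start: Node) -> list[Node]:
    """Every node reachable from `start` along parent -> child links, each listed once."""
    seen: set[str] = set()
    stack, out = [start], []
    while stack:
        node = stack.pop()
        if node.node_id in seen:
            continue  # a node can be reached through several parents
        seen.add(node.node_id)
        out.append(node)
        stack.extend(node.children)
    return out


def trace_term(nodes: list[Node], term: str, min_sources: int = 3) -> tuple[list[str], list[str]]:
    """Hypothetical query: follow every lineage starting at a node that mentions `term`,
    then split the reached node IDs into well-supported ones and ones backed by fewer
    than `min_sources` documents (candidates for review by a human expert)."""
    reached: dict[str, Node] = {}
    for start in nodes:
        if term in start.terms:
            for node in descendants(start):
                reached[node.node_id] = node
    trusted = sorted(i for i, n in reached.items() if len(n.sources) >= min_sources)
    suspicious = sorted(i for i, n in reached.items() if len(n.sources) < min_sources)
    return trusted, suspicious


if __name__ == "__main__":
    a = Node("a", 2000, {"neural network"}, {"d1", "d2", "d3"})
    b = Node("b", 2005, {"neural network", "deep learning"}, {"d4", "d5", "d6", "d7"})
    c = Node("c", 2005, {"deep learning", "gpu"}, {"d8"})
    a.children = [b, c]
    print(trace_term([a, b, c], "neural network"))
    # prints: (['a', 'b'], ['c'])
```

Keeping provenance as part of each node lets the query return suspicious parts of the structure alongside the answer. An expert can then go back to the listed source documents instead of inspecting the whole structure.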
If the project is successful, it may be extended into a three-year PhD funded by the
ANR.


