Master research project proposal
Interactive analysis of phylomemetic structures
Keywords: evolution of science, analysis of very-large text corpora, query languages,
provenance, data quality
Understanding the evolution of various scientific fields is important for our society.
Obtaining a general picture of the major developments is challenging given the
proliferation of scientific publishing and of overspecialized scientific journals. The
Epique project proposes to develop automated tools that can reconstruct important
aspects of this evolution from large corpora of scientific publications (such as Web of
Science, PubMed). We will represent the evolution in the form of phylomemetic lattices
[1] (analogous to the phylogenetic trees used in biology for natural species). The
Epique project will use a combination of various techniques to obtain the phylomemetic
structures from very large collections of scientific articles.
The master project will focus on some of the following aspects.
• The phylomemetic structures reference a large number of scientific concepts and
describe their co-occurrence at various levels, so they can grow very large. We
would like to develop a query language that makes it possible to build interactive
tools for exploring the structure [2,7,8,9,10,11].
• As we will reconstruct the phylomemetic structure with the help of automated
tools, it will inevitably contain errors or imprecisions. We will develop
suitable quality assessment techniques to identify parts of the structure with
potentially poor quality. The quality model shall also rely on provenance
information [3,4,5,6] to help human experts identify the real sources of
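To make the two aspects above more concrete, here is a minimal Python sketch of what querying and quality-checking such a structure could look like. Everything in it is an assumption made for illustration, not the Epique design: the `Field` class, the use of Jaccard similarity to link fields across periods, the `descendants` query, and the rule that a field backed by fewer than two articles is "weakly supported".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical node: a set of co-occurring concepts in one time period."""
    period: int
    concepts: frozenset
    sources: tuple = ()  # assumed provenance: ids of the supporting articles


def jaccard(a, b):
    """Jaccard similarity of two concept sets (an assumed linking measure)."""
    return len(a & b) / len(a | b)


def link_periods(fields, threshold=0.3):
    """Link each field to sufficiently similar fields of the next period."""
    return [
        (f, g)
        for f in fields
        for g in fields
        if g.period == f.period + 1
        and jaccard(f.concepts, g.concepts) >= threshold
    ]


def descendants(edges, start):
    """Query: every field reachable from `start` along inter-period links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def weakly_supported(fields, min_sources=2):
    """Quality check: fields backed by fewer than `min_sources` articles."""
    return [f for f in fields if len(f.sources) < min_sources]


# Toy data (invented for this sketch).
a = Field(2000, frozenset({"graph", "query"}), ("p1", "p2"))
b = Field(2001, frozenset({"graph", "query", "rdf"}), ("p3",))
c = Field(2002, frozenset({"graph", "rdf", "cloud"}), ("p4", "p5"))
d = Field(2001, frozenset({"protein", "folding"}), ("p6", "p7"))
fields = [a, b, c, d]

edges = link_periods(fields)
lineage = descendants(edges, a)      # fields that descend from `a`
suspect = weakly_supported(fields)   # fields an expert may want to review
```

In this toy example the lineage of `a` contains `b` and `c`, and only `b` is flagged, since a single article supports it. A real query language would express such traversals declaratively rather than as hand-written loops.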
Upon success, the master project can be extended into a three-year PhD funded by the
ANR.
[1] Chavalarias, D. and Cointet, J.-P. 2013. Phylomemetic patterns in science
evolution—the rise and fall of scientific fields. PLoS ONE 8, 2, e54847.
[2] Luiz Gomes Jr., Bernd Amann, André Santanchè:
Beta-Algebra: Towards a Relational Algebra for Graph Analysis. EDBT/ICDT Workshops
[3] S. B. Davidson and J. Freire. Provenance and scientific workflows: Challenges and
opportunities. In SIGMOD, 2008.
[4] Grigoris Karvounarakis, Irini Fundulaki, Vassilis Christophides: Provenance for
Linked Data. In Search of Elegance in the Theory and Practice of Computation 2013:
[5] Meliou et al. Causality in Databases. IEEE Data Engineering Bulletin 33, 3 (2010), 59–
[6] Yael Amsterdamer, Susan B. Davidson, Daniel Deutch, Tova Milo, Julia Stoyanovich,
Val Tannen: Putting Lipstick on Pig: Enabling Database-style Workflow Provenance.
PVLDB 5(4): 346-357 (2011)
[7] Colazzo, D., Goasdoué, F., Manolescu, I., & Roatiş, A. (2014, April). RDF analytics:
lenses over semantic graphs. In Proceedings of the 23rd International Conference on
World Wide Web (pp. 467-478). International World Wide Web Conferences Steering
[8] Valeria Fionda and Giuseppe Pirro. Querying graphs with preferences. In CIKM 2013
[9] Alekh Jindal, Praynaa Rawlani, Eugene Wu, Samuel Madden, Amol Deshpande, and
Mike Stonebraker. Vertexica: your relational friend for graph analytics!. VLDB J. 7(13):
[10] Zoi Kaoudi, Ioana Manolescu: RDF in the clouds: a survey. VLDB J. 24(1): 67-91
[11] Peter T. Wood. Query languages for graph databases. SIGMOD Rec. 41(1): 50-60