icafe – informal team seminar

Upcoming Talks

Title:  EMBEDD-ER : EMBEDDing Educational Resources Using Linked Open Data

Speaker:  Aymen Bazouzi

Place and Time: Rennes – Room TBD & Teams via the DRUID Channel – 28 March at 1:00 PM


There are many educational resources publicly available online. Recommender systems and information retrieval engines can help learners and educators navigate these resources. However, the available educational resources differ in format, size, type, topics, etc. These differences complicate their use and manipulation, which has raised the need for a common representation of educational resources, and of texts in general. The research community has developed various techniques to represent these resources homogeneously. Although these representations have achieved impressive results in many tasks, they appear to depend on the writing style and not only on the content. Furthermore, they do not produce representations that reflect the semantics of the content. In this work, we present a new task-agnostic method (EMBEDD-ER) that generates representations of educational resources based on document annotation and Linked Open Data (LOD). The representations it creates are content-focused and compact, and they generalize to unseen resources without requiring extra training. They capture the information found in the resources and place similar resources closer to one another than to dissimilar ones. Empirical tests show promising results, both in visualizations and on a subject classification task.
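The abstract does not detail how annotations become vectors. As a rough illustration only, the sketch below assumes each resource has already been annotated with LOD entities and embeds it as the confidence-weighted mean of precomputed entity vectors; the DBpedia-style URIs, confidences and three-dimensional vectors are made-up toy values, and this is not EMBEDD-ER itself. Because it only looks up existing entity vectors, an unseen resource needs no extra training, and cosine similarity places resources that mention related entities closer together.

```python
import math

# Hypothetical pre-computed vectors for Linked Open Data entities (toy values).
ENTITY_VECTORS = {
    "dbr:Algebra": [0.9, 0.1, 0.0],
    "dbr:Linear_equation": [0.8, 0.2, 0.1],
    "dbr:French_Revolution": [0.0, 0.1, 0.9],
}


def embed(annotations):
    """Confidence-weighted mean of the vectors of the entities found in a resource.

    annotations: list of (entity_uri, confidence) pairs, e.g. from an entity linker.
    Entities without a known vector are ignored.
    """
    dim = len(next(iter(ENTITY_VECTORS.values())))
    total, weight = [0.0] * dim, 0.0
    for uri, confidence in annotations:
        vector = ENTITY_VECTORS.get(uri)
        if vector is None:
            continue
        for i, value in enumerate(vector):
            total[i] += confidence * value
        weight += confidence
    return [t / weight for t in total] if weight else total


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


algebra_course = embed([("dbr:Algebra", 0.9), ("dbr:Linear_equation", 0.7)])
equations_sheet = embed([("dbr:Linear_equation", 0.8)])
history_lesson = embed([("dbr:French_Revolution", 0.95)])
print(cosine(algebra_course, equations_sheet) > cosine(algebra_course, history_lesson))  # True
```

Averaging keeps every resource in the same fixed-size space regardless of its length or format, which is one simple way to obtain the compactness the abstract mentions.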

Former Talks

Title:  Analysis and prediction of the commercial speed in a public urban transport network

Speaker:  Erwan Vincent

Place and Time: Rennes – Room TBD & Teams via the DRUID Channel – 21 March at 1:00 PM


In the Rennes metropolitan area, as in other cities, buses operate in a complex setting: the urban environment. This environment contains many “factors” that can positively or negatively influence bus speed. In a paper to be submitted in the coming months, I propose an analysis of the different factors that can affect bus speed and show how these factors can be used in a machine learning model. The aim is to demonstrate that knowing the environment in which buses operate makes it possible to predict, with a certain degree of accuracy, their future performance in that same environment.

Title:  Article archive explorer

Speaker:  Cédriane Prohin

Place and Time: Rennes – Room TBD & Teams via the DRUID Channel – 14 March at 1:00 PM


Presentation of a website for collecting archived articles from “LA NATURE” through keyword exploration.

Title:  Contribution to automated license analysis – state of the art

Speaker:  Malo Revel

Place and Time: Rennes – Room Aurigny D165 & Teams via the DRUID Channel – 7 February at 1:00 PM

Title:  Designing a temporal graph management system for IoT application domains.

Speaker:  Maria Massri

Place and Time: Rennes – Aurigny (D165) – Tuesday 6 December 2022 at 1:00 PM

Title:  Responsible digital technology? Slower, lower, weaker!

Speaker:  Olivier Ridoux

Place and Time: Rennes – Aurigny (D165) & Teams via the DRUID Channel – 8 November 2022


If we consider that a responsible stance means not closing our eyes to blind spots once they have been revealed to us, we see that responsible digital technology (like responsible tourism, responsible agriculture, etc.) must accept constraints that do not come from specialists of the field and that are expressed in terms unusual for those specialists, such as joules, tonnes of greenhouse gases (GHG), or cubic metres of water. Being responsible means nothing more than respecting these constraints. At present, these constraints have not yet been formulated explicitly, so it is up to responsible digital technology not to wait until they are, but rather to anticipate them. The constraints to anticipate are so strong that we should expect to have to go slower, lower and weaker, even though it is not ruled out that, over time, the state of the art will progress and allow us to go faster, higher and stronger, but always within these constraints.

Title:  Imperfect Labels with Belief Functions for Active Learning

Speaker:  Arthur Hoarau

Place and Time: Rennes – Oléron (A008) & Teams via the DRUID Channel – 17 October 2022


Classification is used to predict classes by extracting information from labeled data. But sometimes the collected data is imperfect, as in crowdsourcing, where users have partial knowledge and may answer with uncertainty or imprecision.
This paper offers a way to deal with uncertain and imprecise labeled data using Dempster-Shafer theory and active learning. An evidential version of K-NN (EK-NN), which classifies a new example by observing its neighbors, was introduced earlier.
We propose to couple this approach with active learning, where the model uses only a fraction of the labeled data, and to compare it with non-evidential models.
We introduce a new computable parameter for EK-NN that makes the model compatible with imperfectly labeled data while remaining equivalent to its original version when the data is perfectly labeled.
This method increases complexity, but it makes it possible to work with imperfectly labeled data, with efficient results and reduced labeling costs when coupled with active learning. We have conducted tests on real data imperfectly labeled during crowdsourcing campaigns.
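For readers unfamiliar with the evidential K-NN, the sketch below shows the classic EK-NN idea: each of the k nearest neighbours contributes a mass function on the set of classes, discounted by a weight α·exp(−γ·d²) that shrinks with the squared distance d², and the neighbours' mass functions are combined with Dempster's rule. To accept imperfect labels, it assumes each neighbour's label is itself a mass function (for example {cat}: 0.6, {cat, dog}: 0.4) that is discounted rather than replaced. The values of `alpha` and `gamma` and the toy data are illustrative assumptions, and the new parameter introduced in the talk is not reproduced here.

```python
import math
from itertools import product


def discount(mass, alpha, frame):
    """Shafer discounting: keep a fraction alpha of each focal mass, move the rest to the whole frame."""
    out = {}
    for focal, value in mass.items():
        out[focal] = out.get(focal, 0.0) + alpha * value
    out[frame] = out.get(frame, 0.0) + (1.0 - alpha)
    return out


def dempster(m1, m2):
    """Dempster's rule: conjunctive combination followed by normalisation of the conflict."""
    combined, conflict = {}, 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("total conflict between mass functions")
    return {focal: value / (1.0 - conflict) for focal, value in combined.items()}


def ek_nn(query, data, frame, k=3, alpha=0.95, gamma=1.0):
    """data: list of (point, label_mass), label_mass mapping frozensets of classes to masses."""
    def sq_dist(point):
        return sum((p - q) ** 2 for p, q in zip(point, query))

    neighbours = sorted(data, key=lambda item: sq_dist(item[0]))[:k]
    result = {frame: 1.0}  # vacuous mass function: total ignorance
    for point, label_mass in neighbours:
        weight = alpha * math.exp(-gamma * sq_dist(point))
        result = dempster(result, discount(label_mass, weight, frame))
    return result


def pignistic(mass):
    """Spread each focal mass uniformly over its classes to obtain a decision probability."""
    bet = {}
    for focal, value in mass.items():
        for cls in focal:
            bet[cls] = bet.get(cls, 0.0) + value / len(focal)
    return bet


FRAME = frozenset({"cat", "dog"})
CAT, DOG = frozenset({"cat"}), frozenset({"dog"})
training = [
    ((0.0, 0.0), {CAT: 1.0}),              # perfectly labelled
    ((0.1, 0.0), {CAT: 0.6, FRAME: 0.4}),  # uncertain crowd label
    ((0.2, 0.1), {FRAME: 1.0}),            # "I don't know"
    ((1.0, 1.0), {DOG: 1.0}),
]
print(pignistic(ek_nn((0.05, 0.0), training, FRAME, k=3)))
```

A fully ignorant label ({cat, dog}: 1.0) leaves the result unchanged, while a perfectly labelled neighbour behaves as in the original EK-NN, which matches the compatibility property the abstract describes.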

Title:  State of the Art of Aymen Bazouzi’s PhD, entitled “Combining Educational Resources Through Graph Representation Learning”

Speaker:  Aymen Bazouzi

Place and Time: Rennes – Aurigny (D165) & Inria Cisco – 07 June 2022

Title:  State of the Art of Erwan Vincent’s PhD, entitled “Automatic learning and simulation for the identification and prediction of the determining factors of the quality of service of high service level buses”

Speaker:  Erwan Vincent

Place and Time: Rennes – Aurigny (D165) & Teams via the DRUID Channel – 24 May 2022

Title:  Clock-G: A temporal graph management system with a space-efficient storage technique

Speaker:  Maria Massri

Place and Time: Rennes – Oléron (A008) & Teams via the DRUID Channel – 6 May 2022


IoT applications can be naturally modeled as a graph whose edges represent the interactions between devices, sensors, and their environment. Thing’in is a platform initiated by Orange. The platform manages a graph of millions of connected and non-connected objects using a commercial graph database. The Thing’in graph is dynamic because IoT devices create temporary connections with each other. Analyzing the history of these connections paves the way for promising new applications such as object tracking, anomaly detection, and forecasting the future behavior of devices. However, existing commercial graph databases are not designed with native temporal support, which limits their usability in such use cases.
In this paper, we discuss the design of Clock-G, a temporal graph management system, and introduce δ-Copy+Log, a new space-efficient storage technique. Clock-G has been designed by the developers of the Thing’in platform and is currently being deployed into production. It differs from existing temporal graph management systems in adopting the δ-Copy+Log technique, which aims to mitigate the apparent trade-off between two conflicting goals: reducing space usage and accelerating query execution. Our experimental results show that δ-Copy+Log performs better overall than traditional storage methods in terms of both space usage and query evaluation time.
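The space/time trade-off mentioned above can be illustrated with the classic Copy+Log scheme, which the name δ-Copy+Log suggests it refines: full snapshots (copies) of the graph are taken periodically and every update is appended to a log, so a past state is rebuilt from the latest snapshot at or before the requested time plus a replay of the log. The sketch below is only that baseline on a set of edges, with a hypothetical `snapshot_every` knob; it does not implement δ-Copy+Log or Clock-G, whose details are not given here.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class CopyLogGraphStore:
    """Toy temporal edge store: periodic full snapshots plus an append-only log of updates."""
    snapshot_every: int = 3  # hypothetical knob: fewer snapshots save space, more speed up queries
    _edges: set = field(default_factory=set, init=False)        # current state
    _log: list = field(default_factory=list, init=False)        # (time, op, edge), in time order
    _snapshots: list = field(default_factory=list, init=False)  # (time, log_position, frozenset)
    _pending: int = field(default=0, init=False)                # updates since the last snapshot

    def apply(self, time, op, edge):
        """Record an 'add' or 'del' of an edge; timestamps must be non-decreasing."""
        if op == "add":
            self._edges.add(edge)
        elif op == "del":
            self._edges.discard(edge)
        else:
            raise ValueError(f"unknown operation {op!r}")
        self._log.append((time, op, edge))
        self._pending += 1
        if self._pending >= self.snapshot_every:
            self._snapshots.append((time, len(self._log), frozenset(self._edges)))
            self._pending = 0

    def as_of(self, time):
        """Rebuild the edge set valid at `time`: nearest earlier snapshot + log replay."""
        snap_times = [t for t, _, _ in self._snapshots]
        i = bisect.bisect_right(snap_times, time) - 1
        if i >= 0:
            _, start, edges = self._snapshots[i]
            state = set(edges)
        else:
            start, state = 0, set()
        for t, op, edge in self._log[start:]:
            if t > time:
                break
            if op == "add":
                state.add(edge)
            else:
                state.discard(edge)
        return state


store = CopyLogGraphStore(snapshot_every=2)
store.apply(1, "add", ("sensor1", "gateway"))
store.apply(2, "add", ("gateway", "cloud"))
store.apply(3, "del", ("sensor1", "gateway"))
store.apply(4, "add", ("sensor2", "gateway"))
print(store.as_of(3))  # {('gateway', 'cloud')}
```

Nothing is physically deleted: a removed edge disappears from later states but can still be recovered at earlier timestamps. Lowering `snapshot_every` shortens log replays at the cost of storing more full copies, which is exactly the tension between space usage and query time that the abstract describes.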

Title: Presentation of PhD work

Speaker: Mathieu Chambe

Place and Time: Rennes-Aurigny (D165) & Teams via the DRUID Channel – 9 November 2021

Title: PhD defense rehearsal

Speaker: Gauthier Lyan

Place and Time: Rennes-Aurigny (D165) – 21 September 2021

Title: Development of Crowdsourcing campaigns

Speaker: Thomas Hamon

Place and Time: Teams – 22 June 2021

Title:  CSID rehearsal

Speaker: François Mentec

Place and Time: TEAMS – 8 June 2021

Title: CSID rehearsal

Speaker: Maria Massri

Place and Time: ZOOM – 1 June 2021

Title:  CSID rehearsal

Speaker:  Constance Thierry

Place and Time: TEAMS – 11 May 2021

Title: Data-centric Workflows for Crowdsourcing Applications (PhD defense rehearsal)

Speaker: Rituraj Singh

Place and Time: ZOOM – 27 April 2021

Title:  Belief Shift Clustering

Speaker:  Zuowei Zhang

Place and Time: TEAMS – 26 January 2021


Characterizing the uncertainty and imprecision between singleton (specific) clusters of arbitrary shapes and sizes remains a very challenging task. To address this problem, this paper introduces a new method for object data, called belief shift clustering (BSC), which extends mean shift (mode seeking) to the framework of belief functions and has two main characteristics. First, each query object is provisionally classified as noise, precise, or imprecise, based on the notion of “belief shift”. Second, to avoid the “uniform effect”, a partial credal redistribution with dynamic cluster centers, inspired by fuzzy/possibilistic and evidential partitions, reassigns each imprecise object to a specific cluster or to a related meta-cluster. Assignment to a meta-cluster indicates that the specific clusters it involves cannot be distinguished for that object, since the object may lie in an overlapping or intermediate area between them. In this way, BSC can characterize the uncertainty and imprecision between specific clusters, regardless of their shapes and sizes. The effectiveness of the proposed method has been validated on several synthetic and real data sets through a critical comparison with related methods.
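Since BSC builds on mean shift, the sketch below shows only that plain starting point, with a Gaussian kernel: each object is moved iteratively to the kernel-weighted mean of the data until it settles on a density mode, and objects reaching the same mode form a cluster. The `bandwidth` and `merge_radius` values are illustrative assumptions; belief shift, credal redistribution and meta-clusters, which are the contribution of the talk, are not reproduced.

```python
import math


def _dist(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_shift_mode(start, points, bandwidth=1.0, tol=1e-6, max_iter=500):
    """Follow the Gaussian-kernel mean shift from `start` until it converges to a density mode."""
    x = list(start)
    for _ in range(max_iter):
        weights = [math.exp(-_dist(p, x) ** 2 / (2 * bandwidth ** 2)) for p in points]
        total = sum(weights)
        new_x = [sum(w * p[d] for w, p in zip(weights, points)) / total for d in range(len(x))]
        shift = _dist(new_x, x)
        x = new_x
        if shift < tol:
            break
    return tuple(x)


def mean_shift_clusters(points, bandwidth=1.0, merge_radius=0.5):
    """Assign each point to the mode it converges to; modes closer than merge_radius are merged."""
    modes, labels = [], []
    for p in points:
        mode = mean_shift_mode(p, points, bandwidth)
        for index, existing in enumerate(modes):
            if _dist(mode, existing) < merge_radius:
                labels.append(index)
                break
        else:
            modes.append(mode)
            labels.append(len(modes) - 1)
    return modes, labels


data = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0), (10.5, 10.0), (10.0, 10.5)]
modes, labels = mean_shift_clusters(data)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Plain mean shift gives every object exactly one crisp label; an object lying between two modes is forced into one of them, which is the limitation that the meta-clusters of BSC are meant to express explicitly.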

Title:  Web Bias Monitoring

Speaker: Théo JAMMES-BEUVE, Thomas LE FLOCH and Olivier MEYER

Place and Time: Rennes – Lipari (F202) and 30 June 2020

Title:  The Thing’in platform

Speaker: Maria Massri

Place and Time: Rennes -Aurigny(D165) and 13 March 2020


The Thing’in platform (www.thinginthefuture.com) is an open platform initiated by Orange, which invites operators, object manufacturers, object owners and service developers to cooperate in building the web of things. Thanks to its repository of connected objects drawn from different domains, the platform makes it possible to understand the context in which each object evolves. As an object evolves, so do its relationships and interactions with other objects over time, which can be naturally handled by temporal, graph-oriented data storage.
Although graph databases are widely used in today's relationship-centered applications, they seldom provide time-versioning support. For instance, current systems capture only the most recent snapshot of the underlying graph, whereas analyzing and predicting temporal behavior requires persisting the history of every graph element. Since physical deletions are forbidden in such a scenario, the ever-growing data volume becomes a crippling restriction, which steers research in this area towards optimizing persistent storage. In this PhD thesis, we aim to deliver a storage and querying system that optimizes both space usage and query computation time.
In this talk, I will present the rationale behind this work, prior academic work in this area, its limitations, and possible solutions.

Title:  Public transportation

Speaker:  Gauthier Lyan

Place and Time: Rennes -Aurigny(D165) and 03 March 2020


“Climate change has become a pressing issue for both scientists and politicians. While the former can show the latter that solutions exist to reduce the impact of human activities on the climate, the latter cannot easily act without appropriate tools to help them choose what to act on.
Public transportation systems are large and complex, involving many stakeholders and heterogeneous factors that affect their efficiency, and hence their global impact. We propose a software approach for studying public transportation systems in both their temporal and spatial dimensions, predicting commercial speed in known and unknown environments from historical data and available exogenous data. The purpose of this research is to help decision-makers make better decisions about public transportation.”

We will discuss the data sources we already have or should have, and whether our assumptions about them make sense.

We will discuss the framework we are envisioning.

Title:  Privacy and ethical issues of AI in legal systems

Speaker:  Louis Béziaud

Place and Time: Rennes -Aurigny(D165) and 18 February 2020

Title:  From databases to artificial intelligence

Speaker:  Zoltan Miklos

Place and Time: Rennes -Aurigny(D165) and 11 February 2020


Rehearsal for the HDR presentation.

Title:  Modeling uncertainty and inaccuracy on data from crowdsourcing platforms: MONITOR

Speaker:  Constance Thierry

Place and Time: Rennes -Aurigny(D165) and 21 January 2020 (visio from Lannion)


Rehearsal for the EGC2020 presentation.

Links: Paper on HAL / EGC2020

Title:  Building metro map of scientific topics using hierarchy alignments

Speaker:  Ian Jeantet

Place and Time: Rennes -Aurigny(D165) and 14 January 2020


Presentation of my joint work with Griffith University during my research stay in Australia. I’ll explain how we came to build metro maps of scientific topics to study the evolution of science over time.

Title:  Feedback from the Shonan Meeting on Crowdsourcing/Future of Work

Speaker:  David Gross-Amblard

Place and Time: Rennes -Aurigny(D165) and 17 December 2019

Title:  Crowdsourcing the database course with HEADWORK

Speaker:  Adrien Wacquet  (2019 Summer Internship)

Place and Time: Rennes -Aurigny(D165) and 03 December 2019

Title:  Web crawler and the DIFFIX attack

Speaker:  Antonin Voyez

Place and Time: Rennes -Aurigny(D165) and 19 November 2019


Presentation of a web crawler built for the PROFILE project, followed by a short presentation of the current work for my upcoming thesis with ENEDIS: linear reconstruction applied to the DIFFIX system.

Title:  The anonymization of personal data: myth, limits, and successes

Speaker:  Tristan Allard,  Joris Duguépéroux, Tompoariniaina Andriamilanto

Place and Time: Rennes -Oleron(A008) and 12 March 2019

Link: Privacy Games @ Festival des Libertés Numériques 

Title: Overlapping hierarchical clustering

Speaker:  Ian Jeantet

Place and Time: Rennes -Oleron(A008) and 12 February 2019


Agglomerative clustering methods have been widely used by many research communities to explore hierarchical structures in their data. The resulting cluster hierarchies help in understanding the hierarchical structures present in complex data. However, agglomerative methods necessarily produce a tree structure, which forces split decisions too early in the construction process; these decisions can affect the conclusions one draws about the obtained hierarchical structure. In various settings, a richer hierarchical structure is needed to describe the clusters of the data. Moreover, clusters might also overlap. In this paper, we propose a framework for computing hierarchical structures represented as directed acyclic graphs rather than trees. Our bottom-up method creates clusters with density-based merging criteria, such that the various clusters can overlap.
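To make the difference between a tree and a directed acyclic graph concrete, the sketch below builds a toy overlapping hierarchy on one-dimensional points. It is not the density-based method of the talk: as a simple stand-in, the clusters at each level are the maximal groups of points spanning at most a threshold δ (the maximal cliques of the δ-threshold graph, which in one dimension are maximal windows of the sorted points), and each cluster is linked to every cluster of the next level that contains it. The function names and thresholds are hypothetical.

```python
def maximal_windows(xs, delta):
    """Maximal groups of 1-D points whose span is at most delta.

    In one dimension these are the maximal cliques of the graph linking points at
    distance <= delta; groups may overlap. Points are assumed to be distinct.
    """
    xs = sorted(xs)
    windows = []
    j = 0
    for i in range(len(xs)):
        j = max(j, i)
        # Extend the window starting at xs[i] as far right as the span allows.
        while j + 1 < len(xs) and xs[j + 1] - xs[i] <= delta:
            j += 1
        window = frozenset(xs[i:j + 1])
        # A window is non-maximal only if the previous maximal window already contains it.
        if not windows or not window <= windows[-1]:
            windows.append(window)
    return windows


def build_cluster_dag(xs, thresholds):
    """Levels of overlapping clusters for increasing thresholds, plus child -> parents edges."""
    levels = [maximal_windows(xs, delta) for delta in sorted(thresholds)]
    parents = {}
    for level in range(len(levels) - 1):
        for child in levels[level]:
            parents[(level, child)] = [p for p in levels[level + 1] if child <= p]
    return levels, parents


levels, parents = build_cluster_dag([0.0, 1.0, 2.0], [0.0, 1.0, 2.0])
print(len(parents[(0, frozenset({1.0}))]))  # 2
```

At the middle level the groups {0, 1} and {1, 2} overlap on the point 1, so that point's singleton cluster has two parents; an agglomerative tree would have had to pick one of them.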

Title: Integrating uncertain data using user feedback in crowdsourcing applications

Speaker:  Marion Tommasi

Place and Time: Rennes -Oleron(A008) and  22 January 2019


Crowdsourcing applications are used in many domains to perform tasks that are difficult for computers, or to gather knowledge from a crowd of people. To execute a task in a crowdsourcing application, human workers perform micro-tasks, and the resulting data is integrated into the system to proceed with the completion of the global task. However, the data provided by workers is uncertain, as human workers can make mistakes or even give a wrong result intentionally. We want to use the feedback of other workers to evaluate the trust in the data at any point in the workflow. Ultimately, we want to use this trust to obtain a workflow that adapts itself depending on the data and the perceived trust in it, in order to improve data quality. I will first present a model for crowdsourcing applications, then the model for user feedback.

Title: Data-Centric workflow for Complex Crowdsourcing Applications

Speaker:  Rituraj Singh

Place and Time: Rennes -Oleron(A008) and 15 January 2019


Crowdsourcing has emerged as a major paradigm for accomplishing work by paying small sums of money to attract workers from across the globe. However, the tasks targeted by crowdsourcing platforms are relatively simple and independent. In this work, we propose a novel data-centric workflow model for the design of complex crowdsourcing tasks with dependencies. The model allows the orchestration of simple tasks, handles data and crowd workers, allows concurrency, and also provides high-level constructs for decomposing complex tasks into orchestrations of simpler subtasks. We first define the syntax and semantics of the model, and then consider its formal properties, starting with the question of termination of a complex workflow (i.e., whether a system has non-terminating runs). Unsurprisingly, termination is undecidable even for the simplest models. However, under restrictions that are sensible in the context of crowdsourcing (namely, that a crowd worker only makes a bounded number of contributions to a workflow), termination becomes decidable. We then extend the termination question to address the correctness of a workflow, i.e., whether a terminating workflow always satisfies a constraint expressed in terms of the relation between the input of the workflow and its output.
