Experimental analysis of complex workflows in crowdsourcing systems

Loïc Hélouët (CR, Hab.), Zoltan Miklos (MCF)
SUMO and DRUID teams, INRIA / IRISA Rennes
Funded by the ANR 2016 HEADWORK project

"Crowdsourcing" is a generic term for task-solving techniques that rely on a large group of online
users. A well-known success is FoldIt [1], an online game on protein folding,
which allowed the crowd to solve a problem left open by specialists. Wikipedia can also be seen as
an encyclopedia produced by crowdsourcing. Commercial crowdsourcing platforms also exist,
such as Amazon Mechanical Turk [2]. In e-Science, crowdsourcing is used to gather huge data sets
(participatory sensing, for example the "Sauvages de ma rue" project [3]). Systems specifically
designed for crowdsourcing are emerging (sCOOP at Stanford [4], CrowdDB at Berkeley [5]).
A difficulty that crowdsourcing systems must address is building complex applications that orchestrate
crowd competences. Such applications can be intricate processes that distribute large sets of
data to crowd participants, then aggregate the obtained results, and continue the process differently
according to the nature or quality of the collected answers. Complex crowd-based services are
frequently implemented through human management of task distribution, or with ad-hoc and low-level
programming solutions. The next challenge for crowdsourcing systems is to allow for easy
design of applications and services with complex workflows over crowd platforms. This calls for
intuitive formalisms that facilitate the design, deployment, and runtime management of
complex tasks on a crowd platform. The considered models have to handle, at the same time, data,
control (i.e. the progress of complex tasks depending on the collected answers), and the quality of collected
answers, and must provide mechanisms to distribute work to pools of crowd participants with various
competences in order to maximize crowd efficiency [11].
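As an illustration only, the distribute / aggregate / branch pattern described above can be sketched in a few lines of Python. Everything in this sketch is hypothetical: the names `majority` and `run_workflow`, the agreement threshold, the retry limit, and the simulated workers are assumptions made for the example, not part of the workflow model proposed by the supervisors.

```python
from collections import Counter

def majority(answers):
    """Return the most common answer and the share of workers who gave it."""
    label, count = Counter(answers).most_common(1)[0]
    return label, count / len(answers)

def run_workflow(items, workers, threshold=0.7, max_rounds=3):
    """Distribute each item to all workers, aggregate, and branch on agreement.

    Items whose answers never reach the agreement threshold are escalated
    (for instance to an expert) instead of being accepted.
    """
    accepted, escalated = {}, []
    for item in items:
        answers = []
        for _ in range(max_rounds):
            answers += [ask(item) for ask in workers]  # distribute the task
            label, agreement = majority(answers)       # aggregate the answers
            if agreement >= threshold:                 # branch on answer quality
                accepted[item] = label
                break
        else:
            escalated.append(item)                     # no consensus was reached
    return accepted, escalated

# Simulated crowd for the toy question "is this integer even?".
workers = [
    lambda n: n % 2 == 0,  # reliable worker
    lambda n: n % 2 == 0,  # reliable worker
    lambda n: True,        # careless worker who always answers "even"
]

accepted, escalated = run_workflow([2, 3], workers)
print(accepted)   # {2: True}
print(escalated)  # [3]
```

With a threshold of 0.7, item 3 only reaches two thirds agreement and is escalated; lowering the threshold to 0.6 would accept the majority answer instead. This is the kind of design choice whose effect on the final output the internship proposes to study.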
This master's topic focuses on the modeling of crowd-based applications, and in particular on the
following issues:
1- First, we would like to implement a simple workflow model (proposed by the supervisors of
the internship) that is suited to crowdsourcing, and analyze the implemented model
experimentally. A key problem in crowdsourcing is the
potentially poor quality of the responses obtained from the crowd (imprecision,
incorrect answers, etc.). Several techniques address these problems for single
tasks, and we will explore how these techniques behave when simple human
intelligence tasks are added to a workflow. We would also like to understand the effects of the
workflow design on the final output.
2- Second, besides the questions related to imprecision, we would like to explore and
understand workflow executions. Some execution paths might never terminate, while
others could deliver answers quickly. We would like to analyze experimentally the relation
between the workflow design and the liveness of workflow executions.
3- Finally, we would like to construct a benchmark of workflows. This benchmark should
include commonly used workflow settings, in particular in the context of citizen science
projects. The benchmark will serve as a basis for a systematic analysis of the above-mentioned
questions.
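To make the first two issues more concrete, here is a minimal back-of-the-envelope sketch in Python. It rests on strong simplifying assumptions that are not taken from the proposed model: tasks are binary, workers answer independently with the same accuracy `p`, each task is decided by a strict majority of an odd number `k` of workers, and a "repeat until accepted" loop succeeds in each round with a fixed probability `q`. The function names are hypothetical.

```python
from math import comb

def majority_accuracy(p, k):
    """Probability that a strict majority of k independent workers is correct.

    Assumes a binary task, an odd k, and the same accuracy p for every worker.
    """
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

def chain_accuracy(p, k, n):
    """Probability that n sequential majority-voted tasks are all correct."""
    return majority_accuracy(p, k) ** n

def expected_rounds(q):
    """Expected number of rounds of a loop that exits with probability q per round."""
    return float("inf") if q <= 0 else 1 / q

print(majority_accuracy(0.8, 3))                        # close to 0.896
print(chain_accuracy(0.8, 3, 5) < chain_accuracy(0.8, 3, 1))  # True
print(expected_rounds(0.5))                             # 2.0
print(expected_rounds(0.0))                             # inf
```

Under these assumptions, voting improves a single task (about 0.896 instead of 0.8 with three workers), but the probability that a whole chain is correct still decreases with its length, and a loop whose exit condition can never be met does not terminate. Real crowd answers are unlikely to satisfy the independence assumption, which is one reason an experimental analysis is needed.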

Context & Supervision

This master's internship takes place in the context of the HEADWORK project (2016-2020), funded
by the ANR. It is located at IRISA, Rennes, France. It is co-supervised by Loïc Hélouët (CR
INRIA, SUMO team) and Zoltan Miklos (MCF, DRUID team). If the internship is successful, it may be
extended into a three-year PhD funded by the ANR.
Loïc Hélouët:
Mail: loic.helouet@inria.fr
Tel: 02 99 84 75 90
Web: http://people.rennes.inria.fr/Loic.Helouet
Zoltan Miklos:
Mail: zoltan.miklos@irisa.fr
Tel: 02 99 84 22 54
Web: http://people.irisa.fr/Zoltan.Miklos/

Competences

The master's candidate should have the following competences:
– Fluent in English (written & spoken)
– Basic algorithmic skills
Competences in some of the following domains are not mandatory but are welcome:
– Formal techniques (automata, model checking, algebras, …)
– Implementation skills
– Databases and data management
Foreign applications are welcome. Knowledge of French is not mandatory.

Application

Candidates should send their application by email to loic.helouet@inria.fr and zoltan.miklos@irisa.fr with the following documents:
– Complete curriculum vitae
– Motivation letter
– Transcript of marks obtained since the baccalauréat

References

[1] Foldit, solve puzzles for science. http://fold.it/portal/
[2] Amazon Mechanical Turk. https://www.mturk.com/mturk/welcome
[3] Sauvages de ma rue. http://sauvagesdemarue.mnhn.fr
[4] sCOOP project, Stanford. http://www-cs-students.stanford.edu/~adityagp/scoop.html
[5] M.J. Franklin, D. Kossmann, T. Kraska, S. Ramesh, R. Xin. CrowdDB: answering queries with
crowdsourcing. SIGMOD 2011: 61-72.
[6] R. Hull, J. Su, and R. Vaculin. Data management perspectives on business process
management: tutorial overview. SIGMOD, 2013.
[7] S. Abiteboul, P. Bourhis, A. Galland, B. Marinoiu. The AXML Artifact Model. TIME 2009:
11-17, 2009.
[8] E. Badouel, L. Hélouët, G.-E. Kouamou, C. Morvan, R. Nsaibirni. Active Workspaces:
Distributed Collaborative Systems based on Guarded Attribute Grammars. ACM SIGAPP
Applied Computing Review 15(3):6-34, 2015.
[9] E. Badouel, L. Hélouët, C. Morvan. Petri nets with semi-structured data. In Petri Nets'15,
Brussels, June 2015.
[10] S. Abiteboul, E. Antoine, G. Miklau, J. Stoyanovich, J. Testard. Rule-based application
development using Webdamlog. ACM SIGMOD International Conference on Management of
Data, ACM, pp. 965-968, 2013.
[11] P. Mavridis, D. Gross-Amblard, Z. Miklos. Using Hierarchical Skills for Optimized Task
Assignment in Knowledge-intensive Crowdsourcing. 25th International World Wide Web
Conference, 2016.