Internship description
Crowdsourcing is redefining the way people work in a fully networked world. It is an emerging paradigm that, in its most general definition, directly connects supply to demand through the Internet. An early example of a paid-work crowdsourcing platform is Amazon Mechanical Turk (AMT for short). So-called requesters post simple tasks on AMT and set the price they are willing to pay for their completion, and workers choose the tasks they want to perform. Tasks are usually simple data labeling, rating, or translation tasks that require only basic skills but are hard to solve algorithmically (for example, “Label each given tweet with the feeling it expresses.”). According to AMT, more than 500K workers from 190 countries are registered (which justifies the term crowd), and around 300K tasks are available at the time of writing. These numbers do not take into account the other paid-work platforms similar to AMT (for example, FouleFactory or CrowdFlower), nor those that offer non-digital tasks (for example, the on-demand driver service Uber or the 3D printing service Voovoo). Reflecting the current interest in crowdsourcing, a large body of research is widening its scope, especially by enabling the management of complex tasks and of skilled workers (for example, the collaborative writing of a news article on a given topic by several workers with different skills). Crowdsourcing platforms are thus expected to become essential work orchestrators.
There is, however, a downside. This central position comes with strong responsibilities toward the various actors involved. Notable examples include respecting the privacy of workers’ personal data, the confidentiality of the requesters’ tasks, and the neutrality of the crowdsourced processes and of the platform toward competing requesters. To the best of our knowledge, very few works have tackled the technical enforcement of such responsibilities in the crowdsourcing context.
The internship will focus on the confidentiality of the requesters’ tasks against the platform. There is a full spectrum of possible solutions. At one extreme, a requester could fully hide its tasks from the platform, for example by encrypting them. But this would prevent the crowdsourcing platform from applying any work-assignment strategy and would drastically reduce its performance. At the other extreme, tasks could be posted in the clear, but confidentiality would then obviously be jeopardized. The intern will have to develop an in-between solution in which information about the task is partially disclosed to the platform while a strong level of confidentiality is guaranteed. The solution will consist of (1) a privacy model, possibly inspired by differential privacy, the current gold standard for private data publishing, and (2) a privacy algorithm designed to enforce the model. At the same time, the solution will have to preserve the quality and performance of the platform’s work-assignment strategies as much as possible, and human factors will have to be considered (for example, the way workers choose among tasks). A theoretical study will prove the confidentiality guarantees of the privacy model and algorithm, and an implementation (in Java, for example) will show experimentally the quality and performance reached by the solution. Depending on the approach chosen, other aspects of the problem may be considered (for example, confidentiality of the tasks against workers, or workers’ privacy).
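As a rough illustration of what partial disclosure with a differential-privacy-style guarantee could look like, the sketch below applies randomized response to a task description. It is not the solution the internship is expected to produce. It assumes, hypothetically, that a task is summarized as a vector of Boolean skill requirements; the class name RandomizedTaskDisclosure, the methods keepProbability and perturb, and the epsilon value are illustrative choices only. Each bit is reported truthfully with probability e^ε / (e^ε + 1) and flipped otherwise, which bounds by e^ε the ratio between the probabilities of observing the same disclosed bit from two different true bits.

```java
import java.util.Arrays;
import java.util.Random;

public class RandomizedTaskDisclosure {

    // Probability of reporting a bit truthfully under randomized response
    // with privacy parameter epsilon (assumed to be non-negative).
    static double keepProbability(double epsilon) {
        double e = Math.exp(epsilon);
        return e / (e + 1.0);
    }

    // Perturbs each required-skill bit independently: the true value is kept
    // with probability keepProbability(epsilon) and flipped otherwise.
    // The Boolean-vector encoding of a task is an assumption of this sketch.
    static boolean[] perturb(boolean[] requiredSkills, double epsilon, Random rng) {
        double p = keepProbability(epsilon);
        boolean[] disclosed = new boolean[requiredSkills.length];
        for (int i = 0; i < requiredSkills.length; i++) {
            boolean keep = rng.nextDouble() < p;
            disclosed[i] = keep ? requiredSkills[i] : !requiredSkills[i];
        }
        return disclosed;
    }

    public static void main(String[] args) {
        // Hypothetical task requiring skills 0 and 3 out of four known skills.
        boolean[] task = {true, false, false, true};
        boolean[] disclosed = perturb(task, 1.0, new Random());
        // Only the perturbed vector would be sent to the platform.
        System.out.println(Arrays.toString(disclosed));
    }
}
```

A smaller epsilon flips more bits, which hides the task better but gives the platform noisier information for work assignment. The guarantee stated here holds per bit; the guarantee for a whole task depends on how neighboring tasks are defined in the privacy model, which is left open.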
Tristan Allard : email@example.com
David Gross-Ambard : firstname.lastname@example.org
Zoltan Miklos : email@example.com
- Amazon Mechanical Turk, https://www.mturk.com/mturk/welcome
- FouleFactory, http://www.foulefactory.com/
- CrowdFlower, http://www.crowdflower.com/
- Uber, https://www.uber.com/
- Voovoo, https://voovo.co/
- S. Basu Roy, I. Lykourentzou, S. Thirumuruganathan, S. Amer-Yahia, and G. Das, “Task Assignment Optimization in Knowledge-intensive Crowdsourcing,” The VLDB Journal, vol. 24, no. 4, pp. 467–491, Aug. 2015.
- C. Dwork, “Differential privacy,” in Proceedings of the 33rd international conference on Automata, Languages and Programming – Volume Part II, Berlin, Heidelberg, 2006, pp. 1–12.