One of the biggest challenges for the human sciences today is dealing with an ever-increasing amount of resources. The sheer size and variety of the world's heritage make it very difficult to get an overview of everything relating to a single topic. To make this possible, the digital humanities tend to digitize all kinds of resources using different methods. The resulting data then need to be translated into usable information and linked together. To accomplish these lengthy tasks, crowdsourcing is very often used. It allows things to move much faster, but it yields heterogeneous datasets that are difficult to link together and navigate through.
Here are three articles that illustrate different approaches to linking data together and show the difficulties one can encounter when trying to put different datasets together.
In the first article, “From Crowdsourcing to Knowledge Communities: Creating Meaningful Scholarship Through Digital Collaboration” 1, the authors describe their experiences with crowdsourcing across three different projects. They distinguish between cases in which individuals work alone on relatively simple tasks and cases in which the task is more complex and the crowd has to collaborate more as a community. They discuss the pros and cons of crowdsourcing. What emerges is that crowdsourcing can involve anyone interested in research on a given topic, which engages the public and allows the research to move quickly. The downside is that the goal set at the beginning is rarely met exactly; projects often take unexpected directions.
In the second article, “Generating Navigable Semantic Maps from Social Sciences Corpora” 2, the authors take a practical example to illustrate their approach to the problem. Starting from documents concerning the 2007–2008 financial crisis, their aim was to highlight the roles of the different actors and the mechanics behind the events of the crisis. They describe the automatic tools they use and stress that automated processes sometimes lead to unsatisfactory results. In particular, the same entity sometimes appears under different surface forms, making it difficult for algorithms to link its different occurrences. They also asked experts in the area to assess the quality of the results.
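The surface-form problem can be sketched in a few lines. This is only my illustration of the difficulty, not the authors' actual pipeline: the alias table below is hand-made and hypothetical, whereas real systems rely on fuzzy matching or external knowledge bases.

```python
def normalize(mention: str) -> str:
    """Lowercase and strip punctuation for crude string matching."""
    return "".join(ch for ch in mention.lower() if ch.isalnum() or ch == " ").strip()

# Hypothetical alias table mapping normalized forms to a canonical entity.
ALIASES = {
    "lehman brothers": "Lehman Brothers",
    "lehman bros": "Lehman Brothers",
    "federal reserve": "Federal Reserve",
    "the fed": "Federal Reserve",
}

def link_mentions(mentions: list[str]) -> dict[str, list[str]]:
    """Group raw mentions under a canonical entity; unknown surface
    forms fall back to their own normalized string."""
    groups: dict[str, list[str]] = {}
    for m in mentions:
        canonical = ALIASES.get(normalize(m), normalize(m))
        groups.setdefault(canonical, []).append(m)
    return groups

print(link_mentions(["Lehman Brothers", "Lehman Bros.", "the Fed", "Federal Reserve"]))
# {'Lehman Brothers': ['Lehman Brothers', 'Lehman Bros.'],
#  'Federal Reserve': ['the Fed', 'Federal Reserve']}
```

A mention missing from the alias table ends up in its own group, which is exactly the kind of unsatisfactory automated result the authors had experts check for.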
The last article, “An entity-based approach to interoperability in the Canadian Writing Research Collaboratory” 3, addresses the question of the formats used to create datasets and the difficulty of linking datasets that use different formats. The authors explain how they proceeded to link different datasets and make them interoperable. They also survey existing standards for datasets and argue that it is better to find ways to bridge differently formatted datasets than to try to standardize everything. Their main point is that the idea of humanists all working in a standardized, uniform way seems utopian and unachievable.
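The entity-based idea can be sketched as follows. This is my own minimal illustration under assumed data, not CWRC's actual implementation: two datasets describe the same person under different local labels, and mapping both labels to a shared identifier (a made-up URI here; in practice an authority such as VIAF or Wikidata) makes their records interoperable without forcing either dataset to abandon its native format.

```python
# Two datasets that label the same person differently (invented examples).
dataset_a = {"Montgomery, L. M.": {"born": 1874}}
dataset_b = {"Lucy Maud Montgomery": {"notable_work": "Anne of Green Gables"}}

# Hypothetical shared-authority table: local labels -> common entity URI.
AUTHORITY = {
    "Montgomery, L. M.": "http://example.org/person/42",
    "Lucy Maud Montgomery": "http://example.org/person/42",
}

def merge_by_entity(*datasets):
    """Merge records from several datasets, keyed by shared entity URI."""
    merged = {}
    for ds in datasets:
        for label, record in ds.items():
            uri = AUTHORITY[label]
            merged.setdefault(uri, {}).update(record)
    return merged

print(merge_by_entity(dataset_a, dataset_b))
# {'http://example.org/person/42': {'born': 1874,
#   'notable_work': 'Anne of Green Gables'}}
```

The bridging work lives entirely in the authority table; each dataset keeps its own conventions, which matches the article's preference for bridges over universal standardization.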
In sum, one can say that crowdsourcing is a very important tool for humanists, as it accomplishes incredible amounts of work quickly. But data obtained this way are often very heterogeneous and difficult to deal with. On top of that, each project uses its own techniques and formats, making the resulting datasets difficult to use together. Even though standardizing everything seems a tough task, I think that spending more time on the interface before launching a crowdsourcing operation would make the process more efficient, for example by guiding participants more closely in tagging operations, since tagging is a key point for linking things together afterwards. Standardizing some parameters of datasets also seems very promising to me. It might take a lot of time and energy to put in place, but it would allow much better collaboration between humanists around the world and save time by avoiding the reworking of data that someone else has already processed.
1: From Crowdsourcing to Knowledge Communities: Creating Meaningful Scholarship Through Digital Collaboration; Jon Voss, Historypin, United States of America; Gabriel Wolfenstein, Stanford University, United States of America; Zephyr Frank, Stanford University, United States of America; Ryan Heuser, Stanford University, United States of America; Kerri Young, Historypin, United States of America; Nick Stanhope, Historypin, United States of America
2: Generating Navigable Semantic Maps from Social Sciences Corpora; Thierry Poibeau, LATTICE-CNRS, France; Pablo Ruiz, LATTICE-CNRS, France
3: An entity-based approach to interoperability in the Canadian Writing Research Collaboratory; Susan Brown, University of Guelph and University of Alberta, Canada, Canadian Writing Research Collaboratory; Jeffery Antoniuk, University of Alberta, Canada, Canadian Writing Research Collaboratory; Michael Brundin, University of Alberta, Canada, Canadian Writing Research Collaboratory; John Simpson, University of Alberta, Canada, INKE Research Group; Mihaela Ilovan, University of Alberta, Canada, Canadian Writing Research Collaboratory; Robert Warren, Carleton University and University of Guelph, Canada