TEXTCROWD

OVERVIEW: The Social Sciences and Humanities research communities face a fragmented research landscape, which EOSC can help address. EOSC would help overcome this fragmentation by building on structuring and integrating initiatives such as the CLARIN, DARIAH and E-RIHS ERICs and the Digital Humanities organizations (e.g. their association, ADHO) to offer advanced text-based services that address common research needs (see the recent survey by PARTHENOS). One example is enabling the semantic enrichment of text sources through cooperative, supervised crowdsourcing based on shared semantics, and then making this work available to others via EOSC. This would benefit many scientists in the long tail, even though delivering such a service presents real challenges around interoperability and multilingualism.

TEXTCROWD is an advanced cloud-based tool developed within the framework of the EOSCpilot project to process textual archaeological reports. The tool has been extended so that it can browse large online knowledge repositories, train itself on demand, and produce semantic metadata ready to be integrated with information from other domains, establishing an advanced machine-learning scenario.

OBJECTIVES: TEXTCROWD is a tool for analyzing openly available archaeological text documents. It identifies concepts about space, time and artifacts, and the relations between them, in order to index the documents. The main purpose is to produce and accumulate collections of semantically enriched texts based on domain ontologies and thesauri. At present, such texts use global descriptive metadata without a semantic structure. Researchers will generate and revise the enriched documents using the tools provided, and these texts will then be available for direct searching by others, thus increasing the total number of enriched and searchable texts.
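
The indexing idea described above can be illustrated with a minimal sketch. It is not TEXTCROWD's implementation: it swaps the tool's machine-learning recognition for a plain thesaurus lookup, and the thesaurus entries, category names, document ids and function names are all hypothetical, chosen only to show how concepts about space, time and artifacts might be located in a text and used to index documents.

```python
import re
from collections import defaultdict

# Hypothetical mini-thesaurus (an assumption, not a real domain vocabulary):
# surface term -> concept category.
THESAURUS = {
    "pompeii": "place",
    "bronze age": "period",
    "roman": "period",
    "amphora": "artifact",
    "fibula": "artifact",
}

def annotate(text):
    """Return sorted (start, end, term, category) tuples for every thesaurus hit."""
    hits = []
    for term, category in THESAURUS.items():
        # Whole-word, case-insensitive match of the thesaurus term.
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((match.start(), match.end(), term, category))
    return sorted(hits)

def build_index(documents):
    """Map each (category, term) concept to the set of document ids mentioning it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for _, _, term, category in annotate(text):
            index[(category, term)].add(doc_id)
    return index

# Invented sample documents, for illustration only.
documents = {
    "report-1": "A Roman amphora was recovered near Pompeii.",
    "report-2": "The Bronze Age layer contained a fibula.",
}
index = build_index(documents)
```

Looking up `index[("artifact", "amphora")]` then returns the ids of the documents that mention that concept, which is the kind of direct searching over enriched texts that the objectives describe.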

MAIN ACHIEVEMENTS: Among the main results of TEXTCROWD, the following are worth mentioning:

Semantic enrichment of text documents thanks to the creation of metadata to improve indexing, discoverability, accessibility and reusability
Recognition and automatic annotation of entities and concepts in texts with machine learning techniques
Knowledge extraction in a semantic format using renowned standards
Cloud architecture based on an easy-to-use virtual research environment
Interoperability of extracted knowledge to increase reusability in other projects
FAIR principles implementation to improve results accessibility
Availability of the tool to the broader scientific community via other projects
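
As a rough illustration of what extracting knowledge "in a semantic format" can look like, the sketch below serializes a few annotations as subject-predicate-object statements in N-Triples syntax. The source does not name the standards TEXTCROWD uses, so the choice of N-Triples, the `example.org` URIs, the `mentions` predicate and the function name are all assumptions made for the example.

```python
# Hypothetical predicate URI (an assumption, not a published vocabulary term).
MENTIONS = "http://example.org/vocab/mentions"
# Standard RDF Schema label property.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

def to_ntriples(doc_uri, annotations, base="http://example.org/concept/"):
    """Serialize (category, term) annotations for one document as N-Triples lines."""
    lines = []
    for category, term in annotations:
        # Build a hypothetical concept URI from the category and the term.
        concept_uri = base + category + "/" + term.replace(" ", "_")
        # Statement 1: the document mentions the concept.
        lines.append(f"<{doc_uri}> <{MENTIONS}> <{concept_uri}> .")
        # Statement 2: the concept carries a human-readable label
        # (backslashes and double quotes are escaped for the literal).
        escaped = term.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{concept_uri}> <{RDFS_LABEL}> "{escaped}" .')
    return "\n".join(lines)
```

Calling `to_ntriples("http://example.org/doc/report-1", [("artifact", "amphora")])` yields two statements that any RDF-aware tool could load. Machine-readable output of this kind is what makes extracted knowledge reusable across projects.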

RECOMMENDATIONS: Improve the usability of cloud infrastructures with respect to user interfaces and the reusability of components; provide facilities to support a modular approach; build the infrastructure around the needs of the user community; ensure interoperability by using open standards; and guarantee reproducibility thanks to the source documents.

CONTEXT: Cultural heritage and humanities datasets are largely based on texts:

Reports
Archaeology: excavations, surveys
Conservation: diagnosis, restoration – often mixed with numeric results
Grey literature
Literary/historical sources
Research articles
Monographs

Download TEXTCROWD Presentation by Kathrin Beck (MPCDF)
Read the TEXTCROWD Success Story by Franco Niccolucci (PIN)

Public Attachment:

pin_textcrowd_poster.pdf

fliera4_textcrowd_web.pdf
