List of CLARIN K-centres with a summary of their areas of expertise

Key | Short name | Full name | Areas of competence | Languages | Modalities | Linguistic topics | Language processing topics | Data types | Resource families | Generic topics | Types of Services | Tour de CLARIN
101 | Spanish K-Centre | Spanish CLARIN K-Centre | The Spanish CLARIN K-Centre aims to provide knowledge, services, consultancy and specialized web services to the Humanities and Social Science research communities. Our web services and consultancy cover how to use and do research with basic tools that can handle and exploit textual data, at least in the four (co)official languages (Spanish, Catalan, Galician, Basque) and in English, which is one of the most important sources of information for many Humanities and Social Science disciplines. | Spanish, Basque, Catalan, Galician | Text | 1. general linguistics (phonology, morphology, syntax, semantics)
2. computational linguistics
3. corpus linguistics
4. applied linguistics
5. stylistics
Spanish, Catalan, Galician and Basque language processing:
morphology, syntax, semantics, discourse
1. Lexical databases: general, sentiment, NERC...
2. Syntax treebanks
3. Discourse treebanks: coreference, relational
4. Spoken databases
5. Semantic annotation: semantic roles, word sense,
6. Error annotation
7. Image bank (Wikimedia)
8. Conversational QA
1. Grammars
2. Finite-State Applications
3. Statistical Methods
4. Neural Networks
1. Tools
2. Data
3. Mentoring
4. Dissemination
5. Tutorials
102 | PhA-OeAW | Phonogrammarchiv - Austrian Academy of Sciences | As an audio and audiovisual archive with numerous collections of unique research recordings from all across the world, the Phonogrammarchiv offers various services. Besides providing access to its data and metadata resources (remote and onsite), it advises scholars on field research methodology and technologies of audio and audiovisual documentation, supporting them with necessary recording equipment. In addition, it widely shares its broad expertise on topics such as restoration, digitisation, format obsolescence, cataloguing, metadata, long-term preservation and storage. | Audio and audiovisual recordings plus accompanying documentation on a wide variety of languages / dialects from all across the world, covering a timespan of 120 years. | Audio and audiovisual recordings. | Field linguistics, interview techniques (social/cultural anthropology, ethnomusicology), language documentation, oral history. | Audio and audiovisual recordings. | Archiving: physical restoration, digitisation, format migration, cataloguing, metadata, long-term preservation and storage. Research: methods and technologies of audiovisual fieldwork and documentation | Individual advice, group training, workshops and higher education teaching, internships, practical assistance and institutional cooperation. Access to audiovisual data and metadata (remote and onsite). | Introduction
103 | Treebanking | CLARIN Knowledge Centre for Treebanking | Treebanks: construction, integration, search, processing, formats | Text | Building and processing treebanks | Treebanks | Introduction Interview
104 | CLARIN-SPEECH | CLARIN Knowledge Centre for Speech Analysis | Technical advice on speech analysis relating to all aspects of speech technology, including speech science, speech applications, and speech in interaction. | Swedish, English | Speech, biosignals, audiovisual data, sensor data | phonetics, pathology | speech analysis, speech modelling, speech processing | acoustic and language models, dictionaries, vocabularies, pronunciation data, biosignals related to spoken interaction | oral history, parliamentary records | deep learning, evaluation, tools, visualization, ASR, legal issues, data management | awareness, tools, mentoring
105 | DANSK | CLARIN K-Centre DANSK - DANish helpdeSK | Danish language and Danish sign language, language resources and language technology tools and services for Danish. | Danish and Danish sign language | Text, sign language | morphology, syntax, semantics, pragmatics | Danish language processing, tokenisation, PoS tagging, lemmatization, named entity tagging, treebanks, corpus tools, multimodal corpora annotation and processing | corpora, wordnet, multimodal annotations | historical and literary corpora, contemporary domain-specific corpora, Hansards, multimodal annotations, NLP tools
106 | CLARIN-Learn | CLARIN Knowledge Centre for Language Learning Analysis | Our centre is happy to provide advice on tools, corpora, and methods for the study of first and second language learning, conversational interactions, and a variety of language and developmental disabilities including aphasia, stuttering, TBI, dementia, and ASD. | Speech, gestures | language development, conversation, language disorders | speech analysis, archiving, fluency, lexical access | Corpora, both text and multimodal | Child language, conversations, clinical data | corpora, tools, web screencasts, manuals, GoogleGroup mailing lists, email support, workshops | Introduction Interview
107 | SWELANG | CLARIN Knowledge Centre for The Languages of Sweden | Information service offering advice on the use of digital language resources and tools for the Swedish language, minority languages in Sweden, the Swedish sign language, Swedish dialects, and other parts of the intangible cultural heritage of Sweden in text and speech, as well as language policy and planning. | Swedish, Finnish, Meänkieli, Romani, Yiddish, Swedish sign and other languages in Sweden | Text and sign language, spoken Swedish dialects | language policy and planning, language infrastructure, language technology, dialect studies, sociolinguistics, folkloristics, plain language and language comprehensibility, terminology, lexicography | topic modelling | speech recordings, mono- and multilingual lexica and word/term collections | language policy and planning | on-line lexica, q&a database, map interfaces for folk tales and dialects, open data, language consulting (by telephone, email and social media) | Introduction Interview
108 | CLARIN-HUMLAB | CLARIN Knowledge Centre of Lund University Humanities Lab | Advice on multimodal and sensor-based methods, including EEG, eye-tracking, articulography, virtual reality, motion capture, av-recording | Multimodal data, sensor-based data | text mining, machine learning-related research on textual data, keystroke logging | multimodal and sensor-based methods, including EEG, eye-tracking, articulography, virtual reality, motion capture and av-recording | tools, mentoring, consultancy, tutorials
109 | PolLinguaTec | CLARIN Knowledge Centre for Polish Language Technology | Provides wide knowledge on the methods of natural language analysis, with special emphasis on the analysis of the Polish language. Offers support for all types of applications of Language Technology for Polish, both monolingual and multilingual. | Polish | Text | Polish language processing | tools, resources, dissemination, awareness, tutorials, helpdesk | Introduction Interview
110 | CKLD | CLARIN Knowledge-Centre for linguistic diversity and language documentation | The CLARIN Knowledge-Centre for linguistic diversity and language documentation offers expertise on data and data-related methods, technology and background information on language resources and tools to researchers, including students and native speakers. CKLD provides information and assistance relating to fieldwork and data-related methodological aspects, in particular relating to equipment, digital tools, methods, where to find data and information, and whom to contact for specialist information on particular regions or language families. | Under-researched languages and language families (linguistic diversity). Expertise in Athabascan, Austronesian, Austro-Asiatic, Dravidian, Finno-Ugric, Papuan, etc. | Text, audio-visual recordings of speech | language documentation, linguistic typology, linguistic fieldwork | AV collections, typological databases | AV collections of endangered and under-researched languages | linguistic fieldwork | Information materials, guidelines, tutorials, consultancy
111 | CLARIN Knowledge Centre for Data Management at NSD | Provides expertise in data management, including legal and ethical issues related to privacy and IPR. | Data Management, Legal and ethical issues
112 | IMPACT-CKC | IMPACT centre of competence - CLARIN K-centre in digitisation | IMPACT-CKC (IMPACT centre of competence - CLARIN K-centre in digitisation), as a knowledge centre, offers expertise and resources to institutions and researchers looking for advice in digitisation and related fields. The IMPACT-CKC resources include a demonstrator platform for online testing tools, a collection of high quality images with associated ground truth, historical lexica for 10 languages as well as training materials and registries on tools, initiatives, datasets and competitions. | Spanish, English, Polish, French, Dutch, German, Slovene, Czech, Latin, Bulgarian | Text, AV data | corpus linguistics, diachronic language resources, language learning | basic language processing, information extraction | lexical data, language models, linked open data and ontologies | historical texts, lexical resources, literary texts, newspapers | OCR, digitisation, visualisation, evaluation of tools | tools, data, mentoring, dissemination, awareness, tutorials, web lectures. | Introduction Interview
113 | CorpLingCz | Czech CLARIN Knowledge Centre for Corpus Linguistics | Provides information, consulting and technical assistance on all topics related to corpus linguistics. This includes data formats, annotation, metadata encoding, corpus querying, corpus linguistics methodology, statistical methods etc. Another specialization of the centre is empirical research on the Czech language. | Czech | Text, Speech | Corpus linguistics (including methodology and statistics) | Basic language processing (POS tagging, parsing) | Speech corpora, parallel corpora | We are ready to provide data, tools and technical assistance, share expertise and hold workshops on demand on the topics covered by the K-centre. There is an on-line helpdesk to handle user requests. | Introduction Interview
114 | CLARIN-SMS | Swedish in a Multilingual Setting | Offers special expertise in the areas of processing of parallel and comparable corpora, including alignment and machine translation, cross-linguistically consistent annotation within the framework of Universal Dependencies, computation and evaluation of measures of text complexity and language technology for Swedish Sign Language. | Swedish | Text and sign language | Processing parallel corpora, machine translation, annotation, evaluation | parallel corpora
115 | DiaRes | CLARIN K-centre for Diachronic Language Resources | Diachronic text collections, historical texts, and tools and resources for processing and analysing them | Text | Diachronic language studies | Diachronic language processing
116 | TRTC | Terminology Resources and Translation Corpora | The K-Centre provides information and training to users on the preparation and documentation of translation-related resources, in particular terminology resources and translation corpora. This includes inquiries submitted to the Helpdesk related to tools, methods, data, and guidance in seeking further expert support. The service does not focus on language resources in any particular language, but is language independent. | Text | Translation studies | Translation corpora, Terminology
117 | CLASSLA | CLARIN Knowledge Centre for South Slavic languages | Offers expertise on language resources and technologies for South Slavic languages | Slovene, Croatian, Bosnian, Montenegrin, Serbian, Macedonian, Bulgarian | Text | Applied linguistics, Dialect studies, Sociolinguistics (for South Slavic languages) | Basic processing of South Slavic languages | training data, language models (for South Slavic languages) | Newspapers, social media, parliamentary records, historical texts, language learner corpora (for South Slavic languages) | deep learning, evaluation of tools (for South Slavic languages) | tools, data, mentoring, dissemination, awareness, web lectures
118 | ACE | CLARIN Knowledge Centre for Atypical Communication | Atypical communication encompasses language and speech as encountered during (second) language acquisition and development, and in language disorders, but also more broadly in bilingual language development and in sign language. ACE is specialised in this type of research and the concomitant infrastructural issues related to data acquisition, processing and sharing, which are typically characterised by sensitivity issues. For data storage and access the centre collaborates with MPI’s TLA (The Language Archive), which is a CLARIN B Centre and also based in Nijmegen. | Text, speech, sign language | Language acquisition (L1 and L2), language disorders | Critical Data Management; Legal and ethical issues | Information and guidelines about:
- consent (forms)
- hosting corpora and datasets containing atypical communication
- where to find corpora and datasets containing atypical communication
- including FAQ
Helpdesk/consultancy for questions on these topics
Technical assistance for designing, creating, annotating, formatting and metadating these resources
Outreach: presentations, workshop contributions, etc.
119 | SAFMORIL | Systems and Frameworks for Morphologically Rich Languages | SAFMORIL brings together researchers and developers in the area of computational morphology and its NLP applications. The focus of SAFMORIL is actual, working systems and frameworks based on linguistic principles providing linguistically motivated analyses and generation outputs. Such systems are relevant in particular for languages with rich morphologies. SAFMORIL offers online courses for developing morphologies, tokenizers and spell-checkers, and a repository for storing morphologies. | Primarily Nordic and Baltic languages (such as Finnish, Swedish, Norwegian, Latvian, Lithuanian as well as the Sámi languages), but also more generally Fenno-Ugric languages, Inuit languages, Canadian First Nation languages and Babylonian languages | Text | Morphology and Morphosyntax | Processing of morphologically rich languages | Lexical resources containing inflectional, derivational and compounding information as well as morphosyntactic grammars and language models | Morphological Lexicons, Grammars and Language Models | Primarily Finite-State Applications, but to some degree also Statistical Methods and Neural Networks | data, tools, web demos, web lectures and tutorials
120 | PORTULAN | CLARIN Knowledge Centre for the Science and Technology of the Portuguese Language | The Science and Technology of the Portuguese Language is the thematic area of this CLARIN Knowledge Centre. Related to the Portuguese language, it covers all topics, from Phonetics to Discourse and Dialogue; considering all language functions, from communicative performance to cultural expression; approached by all disciplines, from Theoretical Linguistics to Language Technology; covering all language variants, from national standard varieties across the world to dialects of professional groups; taking into account all media of representation, from audio to brain imaging recordings. | Portuguese | Text and speech | Portuguese language processing | Brain image recording
121 | K-BLP | CLARIN Knowledge Centre for Belarusian Text and Speech Processing | Knowledge about text and speech processing of Belarusian and other languages; knowledge about Belarusian language learning; tools and resources for text and speech processing for Belarusian and other languages | Belarusian | Text and speech
122 | NLP:EL | CLARIN K-Centre for Natural Language Processing in Greece | NLP research for Greek; digital readiness of Greek | Greek | Text | CLARIN K-Centre NLP:EL will operate a helpdesk concerning Natural Language Processing for Greek and/or developed in Greece.
Besides responding to questions on the above issues (reactive activities), it will additionally provide informative material and documentation relevant to these issues (proactive activities); this material includes (but is not limited to):
- scientific publications and presentations on NLP research and applications for Greek,
- guides and tutorials on NLP tools and services for Greek and
- direct connection to the CLARIN:EL infrastructure, where the users can find more detailed information and further training and dissemination material.
123 | CORLI-K-centre | CORLI French CLARIN Knowledge Centre for Corpora, Languages and Interaction | Corpus linguistics with a special focus on the French language and the languages of France | French | Text corpora

[Visit the list of CLARIN Knowledge Centres with organisation details]

Steven Krauwer (s.krauwer@uu.nl), Phone +31 30 253 6050
Utrecht Institute of Linguistics UiL OTS, Faculty of Humanities, Utrecht University, Drift 10, 3512 BS Utrecht, Netherlands
[Page generated: 21-01-2020]