Replicating relevance-ranked synonym discovery in a new language and domain
Max Planck Institut für Informatik, DEU.
Trafikverket. Blekinge Tekniska Högskola, Department of Software Engineering.
Responsible organisation
2019 (English). In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Springer Verlag, 2019, pp. 429-442. Conference paper, Published paper (Refereed)
Abstract [en]

Domain-specific synonyms occur in many specialized search tasks, such as when searching medical documents, legal documents, and software engineering artifacts. We replicate prior work on ranking domain-specific synonyms in the consumer health domain by applying the approach to a new language and domain: identifying Swedish language synonyms in the building construction domain. We chose this setting because identifying synonyms in this domain is helpful for downstream systems, where different users may query for documents (e.g., engineering requirements) using different terminology. We consider two new features inspired by the change in language and methodological advances since the prior work’s publication. An evaluation using data from the building construction domain supports the finding from the prior work that synonym discovery is best approached as a learning to rank task in which a human editor views ranked synonym candidates in order to construct a domain-specific thesaurus. We additionally find that FastText embeddings alone provide a strong baseline, though they do not perform as well as the strongest learning to rank method. Finally, we analyze the performance of individual features and the differences in the domains. © Springer Nature Switzerland AG 2019.

Place, publisher, year, edition, pages
Springer Verlag, 2019. pp. 429-442
Series
Trafikverkets forskningsportföljer
Series
Lecture Notes in Computer Science, ISSN 0302-9743
Keywords [en]
Domain-specific search, Generalization, Replication, Synonym discovery, Thesaurus construction, Construction, Information retrieval, Software engineering, Thesauri, Building construction, Domain specific searches, Individual features, Learning to rank, Medical documents, Semantics
National subject category
Software Engineering
Research subject
FOI-portföljer, Bygga
Identifiers
URN: urn:nbn:se:trafikverket:diva-5761; DOI: 10.1007/978-3-030-15712-8_28; ISBN: 9783030157111 (digital); OAI: oai:DiVA.org:trafikverket-5761; DiVA, id: diva2:1734200
Conference
41st European Conference on Information Retrieval, ECIR; Cologne; Germany; 14 April 2019 through 18 April 2019
Project
KREDA - Kravhantering i en digital anläggning
Research funder
Trafikverket, TRV 2017/92595
Available from: 2023-02-06 Created: 2023-02-06 Last updated: 2023-02-16 Bibliographically reviewed

Open Access in DiVA

Full text not available in DiVA

Person

Unterkalmsteiner, Michael
