5 results in total.

1.

BibID (field 001): BIBFORM135364
First author: Abdelzaher, Esra (linguist)
Title: [KTK]You get it through lexicography: extracting suppressed language from LLMs using lexicographic scenarios as jailbreaking tools / Esra Abdelzaher, Ágoston Tóth
Date: 2025
Notes: Taboo words are a challenge for lexicographers to include and describe in a language resource, as they are forms of verbal violence. However, discarding offensive words from general-purpose lexicographic wordlists disregards an integral part of the mental lexicon. The present study uses lexicographic scenarios to jailbreak four GPT variants into retrieving offensive words that are frequently used yet undocumented in most lexicographic resources. While Large Language Models (LLMs) can be used to document a headword, the presence of taboo items may prevent these systems from providing an answer. Our results reveal that both the type of model and the lexicographic framing of the extraction task improved the models' responses and increased the success rate, with the optimal configuration reaching a success rate of 87.5%. The AI-generated lexicon of offensive words currently contains approximately 250 headwords grouped into gender, age, religion and race categories. The words also differ in whether they are inherently or contextually offensive. A searchable, user-friendly version is accessible at https://arabic-studies.com/Elex/index.html. The main contributions of this lexicon are detecting lexicographically undocumented offensive terms, pointing to the negative context of several headwords and discovering new senses of apparently neutral ones. In addition, LLMs provide very useful morphological, semantic and socio-cultural information in the definitions, despite some inconsistencies and overgeneralizations. Although corpus evidence confirmed the success of LLMs in detecting offensive words and senses, the automatic evaluation of AI-generated example sentences showed their limited value from a pedagogical perspective.
Subject terms: Humanities; Linguistics; study, treatise
book chapter
Offensive language
Jailbreak
Prompt engineering
GPT
Published in: Electronic lexicography in the 21st century (eLex 2025): Proceedings of the eLex 2025 conference / ed. Kosem Iztok; Jakubíček Miloš; Medveď Marek; Zgaga Karolina; Arhar Holdt Špela; Munda Tina; Salgado Ana. p. 774-794.
Additional authors: Tóth Ágoston (1974-) (linguist)
Online address: URL provided by the author
Version stored in the institutional repository (DEA)

2.

BibID (field 001): BIBFORM107049
First author: Abdelzaher, Esra (linguist)
Title: Defining Crime: A multifaceted approach based on Lexicographic Relevance and Distributional Semantics / Esra Abdelzaher, Ágoston Tóth
Date: 2020
ISSN: 1787-3606
Notes: This paper demonstrates how the parallel examination of distributional data and frame semantic information can expose word senses that are not documented in FrameNet. In our case study, we compare the distributional features of the word crime with its properties stored in the FrameNet database, also considering data from three online monolingual dictionaries. Our analysis indicates that crime has senses that are absent from FrameNet. The five senses that we identify can be separated on the basis of (a) frame hierarchies, (b) frame elements, (c) syntactic and semantic data extracted from corpora using lexicographic tools and (d) distributional similarity. Annotated examples are provided to demonstrate each sense.
Subject terms: Humanities; Linguistics; foreign-language journal article in a Hungarian journal
journal article
crime
FrameNet
distributional semantics
lexicographic relevance
Published in: Argumentum 16 (2020), p. 44-63.
Additional authors: Tóth Ágoston (1974-) (linguist)
Online address: URL provided by the author
DOI
Version stored in the institutional repository (DEA)
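The abstract of record 2 lists distributional similarity as one criterion for separating senses of crime. As a minimal, count-based sketch of that general idea (an assumption for illustration; the record does not describe the authors' actual pipeline), the code below builds context-window co-occurrence vectors for target words in a tiny made-up corpus and compares them by cosine similarity. The function names, the window size and the corpus are hypothetical.

```python
import math
from collections import Counter
from typing import List


def context_vector(tokens: List[str], target: str, window: int = 2) -> Counter:
    """Count the words within `window` positions of every occurrence of `target`."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        # Skip position i itself so the target is not counted as its own context.
        counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy corpus: "crime" shares more contexts with "offence" than with "cake".
corpus = ("the crime was reported to the police "
          "the offence was reported to the court "
          "the cake was baked in the oven").split()
crime = context_vector(corpus, "crime")
offence = context_vector(corpus, "offence")
cake = context_vector(corpus, "cake")
print(round(cosine(crime, offence), 3), round(cosine(crime, cake), 3))  # 0.866 0.577
```

Real studies typically use much larger corpora and weighted (e.g. association-score) or neural vectors; raw counts are used here only to keep the example self-contained.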

3.

BibID (field 001): BIBFORM134408
First author: Tóth Ágoston (linguist)
Title: Improving the lexicographic accessibility of WN through LLMs / Ágoston Tóth, Esra Abdelzaher
Date: 2025
Notes: This paper reports the results of ongoing research on the usability of neural language models for improving WordNet (WN) data for pedagogical lexicographic use. We test the efficacy of BERT-based methods for selecting example sentences from SemCor and adding guidewords to WN senses. We evaluated our method in a series of time-measured classroom experiments that used either WN data alone or WN data after example sentences and guidewords had been added. We compare two methods for the automatic selection of "good" examples for lexicographic use and discuss the value of BERT probability scores for selecting useful guidewords. The gap between the pedagogical value of the SemCor-extracted sentences and that of the handpicked examples in WN was reflected in the longer time students spent on the decoding tasks after example sentences and guidewords were added.
Subject terms: Humanities; Linguistics; conference presentation abstract
book chapter
LLMs
Lexicographic Accessibility
WN
Decoding tasks
Published in: Proceedings of the 13th Global Wordnet Conference, p. 142-150.
Additional authors: Abdelzaher, Esra (1992-) (linguist)
Online address: URL provided by the author
DOI
Version stored in the institutional repository (DEA)

4.

BibID (field 001): BIBFORM128835
First author: Tóth Ágoston (linguist)
Title: BERT may help in lexicographic sense delineation / Tóth, Ágoston; Abdelzaher, Esra
Date: 2024
ISSN: 2524-7840
Notes: This study addresses the challenge of sense delineation, which is one of the most difficult tasks for lexicographers (Kilgarriff, 1998), who need to abstract senses from corpus citations (Kilgarriff, 2007). There is initial evidence that contextualized embeddings (such as BERT word representations; Devlin et al., 2019) form distinct clusters corresponding to different word senses (Wiedemann et al., 2019; Schmidt & Hofmann, 2020), which makes BERT successful at the word sense disambiguation task. This study further examines this idea from a lexicographic perspective. The experiment takes dictionary example sentences and represents each of them with contextualized BERT embeddings. The resulting clusters are visualized in two dimensions and analysed quantitatively and qualitatively. Results reveal that BERT's distributional representations are not only sensitive to salient syntactic variation but also capture the semantic diversity of word senses. Different parts of speech of the same word formed distinct clusters with moderate to high silhouette scores. Literal, metaphoric and metonymic extensions of word senses also appeared in different hierarchical clusters. Dissimilar semantic preferences and differences in the cognitive prominence of a target word were likewise reflected in multiple sub-clusters of the same sense. Qualitative error analysis of the cases with negative silhouette scores showed the influence of fuzzy categorization on the distributional representation of example sentences. It also identified example sentences that failed to reflect the abstractness of the definitions, or that overspecified the use of a target word in a considerably long sentence, and may accordingly be of less practical value to the dictionary user.
Subject terms: Humanities; Linguistics; foreign-language journal article in a foreign journal
journal article
Digital lexicography
BERT
Hierarchical clustering
Sense delineation
Polysemy
Published in: International Journal of Digital Humanities 2024 (2024), p. 1-22.
Additional authors: Abdelzaher, Esra (1992-) (linguist)
Online address: URL provided by the author
DOI
Version stored in the institutional repository (DEA)
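The abstract of record 4 reports that literal, metaphoric and metonymic extensions of word senses appeared in different hierarchical clusters. The sketch below illustrates agglomerative (bottom-up) hierarchical clustering with average linkage in plain Python. The linkage criterion, the Euclidean distance and the 2-D toy vectors standing in for BERT embeddings of example sentences are assumptions for illustration, not details taken from the paper.

```python
import math
from itertools import combinations
from typing import List, Sequence


def average_linkage(points: Sequence[Sequence[float]], n_clusters: int) -> List[int]:
    """Bottom-up (agglomerative) clustering with average linkage.

    Starts with one cluster per point and repeatedly merges the two clusters whose
    mean pairwise Euclidean distance is smallest, stopping at `n_clusters` clusters.
    """
    clusters: List[List[int]] = [[i] for i in range(len(points))]

    def linkage(a: List[int], b: List[int]) -> float:
        # Average linkage: mean distance over all cross-cluster pairs.
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > max(n_clusters, 1):
        # Closest pair of clusters (x < y because combinations preserves order).
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        clusters[x].extend(clusters[y])
        del clusters[y]

    # Label each point by the position of the cluster it ended up in.
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical 2-D stand-ins for embeddings of nine example sentences (three senses).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
print(average_linkage(points, 3))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Stopping the merges at different cluster counts yields the nested groupings that make the method "hierarchical". For real embedding matrices, SciPy's `scipy.cluster.hierarchy.linkage` (with `method="average"`) together with `fcluster` provides the same operation with the full merge tree.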

5.

BibID (field 001): BIBFORM113094
BibID (field 035): (Scopus)85171342691
First author: Tóth Ágoston (linguist)
Title: Probing visualizations of neural word embeddings for lexicographic use / Ágoston Tóth, Esra Abdelzaher
Date: 2023
ISSN: 2533-5626
Notes: Our study explores the possibility of using the distributional characteristics of headwords, as exemplified in the online Oxford Learner's Dictionaries, captured by contextualized word embeddings and displayed in two dimensions, to help lexicographers find sense categories, detect variation across senses and select potential example sentences. In addition to the dictionary examples, we added British National Corpus data containing the headwords. BERT word embeddings were extracted for all occurrences of each headword, and two-dimensional representations of the resulting high-dimensional BERT embedding vectors were then created using four algorithms: MDS, Isomap, Spectral and t-SNE. Clustering was assisted by k-means clustering and Silhouette scoring for different values of k. Our investigation showed that Silhouette scores for k-means increased after dimension reduction; furthermore, Spectral and t-SNE visualizations were associated with the most cohesive clusters. The highest Silhouette scores recommended a number of clusters different from the number of dictionary senses, but semantic and syntactic patterns were detectable across the recommended clusters.
Subject terms: Humanities; Linguistics; conference presentation abstract
book chapter
sense delineation
word embedding visualization
BERT
Published in: Electronic lexicography in the 21st century: Proceedings of the eLex 2023 conference / edited by Marek Medveď, Michal Měchura, Iztok Kosem, Jelena Kallas, Carole Tiberius, Miloš Jakubíček. p. 545-566.
Additional authors: Abdelzaher, Esra (1992-) (linguist)
Online address: URL provided by the author
Version stored in the institutional repository (DEA)
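The abstract of record 5 describes running k-means for several values of k and using Silhouette scores to recommend a number of clusters. A minimal plain-Python sketch of that selection loop follows. It assumes the embeddings have already been reduced to two dimensions (the paper used MDS, Isomap, Spectral and t-SNE for that step, which the sketch does not reproduce); the deterministic farthest-point initialisation, the function names and the toy points are illustrative choices, not the authors' settings.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Sequence[float]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means; returns a cluster label for every point."""
    # Deterministic farthest-point initialisation (an illustrative choice):
    # start from the first point, then add the point farthest from all chosen centroids.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members (unchanged if empty).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


def silhouette(points: Sequence[Point], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient; s(i) = (b - a) / max(a, b) for each point i."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; reported as 0 here
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # a point alone in its cluster gets s(i) = 0
        # a: mean distance to the other members of its own cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster.
        b = min(sum(math.dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def choose_k(points: Sequence[Point], ks: Iterable[int]) -> Tuple[int, Dict[int, float]]:
    """Run k-means for every k in `ks` and return the k with the highest mean silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in ks}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical 2-D points standing in for dimension-reduced BERT embeddings.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
best_k, scores = choose_k(points, range(2, 5))
print(best_k)  # 3
```

As the abstract notes, the k with the highest Silhouette score need not match the number of dictionary senses. For real data, scikit-learn's `KMeans` and `sklearn.metrics.silhouette_score` implement the same two steps.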