Towards Lexicon-Grammar Verbnets Through Lexical Ontologies

In this article, we present research directly inspired by the Princeton WordNet lexical ontology project (Miller, Fellbaum), which was a response to the real need for ontologies corresponding to the natural conceptualization common to all language users, within a given natural language or within a specific sublanguage. Lexical ontologies for a given language, or for a language subsystem determined by the scope of communication needs, prove useful and even necessary for constructing formal models of linguistic competence and, consequently, for designing and implementing AI systems with both passive and active linguistic communicative competence. An important milestone of the research program presented in this work is the acquisition of tools in the form of extensive lexical ontologies of a new type, referred to here as Lexicon-Grammar Verbnets. In the article, we refer to the works of authors such as Alain Colmerauer, Charles Fillmore, Christiane Fellbaum, Gaston Gross, Maurice Gross, Thomas R. Gruber, Richard Kittredge, George A. Miller, Martha Palmer, Kazimierz Polański, and Piek Vossen.

Knowledge processing and the ability to build a model of knowledge about the environment in which the language users participating in the speech act (people, devices, systems) function are two key components needed to create new-generation AI systems at a level significantly exceeding current ChatGPT systems. We will show, among other things, a number of results, including our own, that form a methodologically coherent whole and bring us, step by step, closer to this goal. An important stage of the research program outlined here is obtaining tools in the form of complex lexical ontologies of a new type, referred to as Lexicon-Grammar Verbnets.
The title of this paper, "Towards Lexicon-Grammar Verbnets Through Lexical Ontologies," reflects the course of (part of) our work in the field of Human Language Technologies from the 1980s until now. We consider a lexicon-grammar of the verbnet type (Lexicon-Grammar Verbnet for Polish) with rich conceptual coverage to be a solid basis for further R&D and implementation work in IT. In this article, we present results, scattered across a number of our publications, that contain the essential elements and ideas. They form the backbone of a long-term, ongoing research program.2
The review of the results begins with early works, conducted under a scarcity of digital language resources, both lexical and grammatical. These are prototypes of systems of the BPII (Basic Polish for Information Interchange) family, as well as results concerning digital lexical and grammatical data formats obtained in national and European projects (POLEX and the EU projects PECO-COPERNICUS CEGLEX and PECO-COPERNICUS GRAMLEX); see section Early Works. Section WordNet Like Lexical Ontologies focuses on the development of a wordnet lexical ontology. The part concerning basic research mainly deals with the problems of synonymy, while the practical part presents the implementation of PolNet v1, a WordNet-type lexical ontology for Polish. Section From PolNet 1.0 to Lexicon-Grammar VerbNets is the main part of the work and concerns the transformation of the PolNet v1 lexical database into a lexical ontology that is a VerbNet-type Lexicon-Grammar. The most important challenge at the current stage of development of WordNet systems with Lexicon-Grammar features turned out to be extending the synonymy and hyponymy/hyperonymy relations to predicative synsets. That section also discusses, among other things, the tasks currently being carried out.

Early Works
The research referred to in the title of the article is the direct result of previous projects that made us aware of the shortage of basic resources and IT tools for processing Polish. The beginnings of our work on systems with linguistic competence in the 1980s and 1990s, and partly their continuation, were marked by the lack of access to digital linguistic resources (dictionaries, grammars) in a form that would allow their direct use in IT applications. Nevertheless, Polish belongs to a small elite group of languages with a long tradition of linguistic work, which turned out to be a solid theoretical basis for our research.
The successful development of systems with language competence became possible thanks to the work we started on a grammatical description of Polish suitable for IT use in parsing algorithms, that is, in algorithms that perform syntactic analysis, a preparatory stage in the process of computing the meaning of a text.3 This work resulted in the POLINT grammars, developed since the 1980s. Our source of inspiration was the question-answering system ORBIS, implemented in PROLOG for English and French by A. Colmerauer and R. Kittredge (using DCG) (Colmerauer & Kittredge, 1982) and later extended by Vetulani with a Polish module (Vetulani, 1988).
The first POLINT programs (see Vetulani, 1988), focused on modeling question-answer dialogues, were created to demonstrate the application potential of BPII systems4 in terms of language coverage and to obtain practical knowledge of the linguistic resources needed by application systems. This potential was positively tested against empirical material in the form of a corpus of empirically generated dialogues (Vetulani, 1990) and finally confirmed in the POLINT-112-SMS system (Vetulani & Osiński, 2017).
The first successful attempts to parse sentences of the Polish language already allowed us, in accordance with our expectations and with the postulates of Antonio Zampolli,5 creator and promoter of the concept of the Language Industry, to identify priorities in terms of the technological needs of Language Engineering. One of our first ventures was the POLEX dictionary project (1994-1996).6 Polish is a language with a complex inflection system and a relatively free word order. Therefore, processing algorithms that are efficient for languages such as English or French proved hard to adapt: in Polish, the basic information about the function of a word in the sentence is typically encoded in the word form, independently of its linear position in the sentence. Dictionaries are a suitable place to store this information. When we started our research, good-quality grammatical descriptions of Polish existed only in the form of traditional dictionaries and grammars addressed to human readers. These resources proved of little use for automatic processing because they lacked precision.
3 It should be noted here that this research was carried out under simplifying assumptions, namely the compositionality and computability of language. These assumptions have been discussed among philosophers of language and linguists since at least the Enlightenment, but they seem essential for creating precise, IT-useful models of language that push the boundaries of deterministic language modeling.
4 The subset of Polish corresponding to the grammatical coverage of the POLINT prototypes of the late 1980s was referred to at that time as BPII (Basic Polish for Information Interchange).
NEO.2023.35.22 p. 5/32 Towards Lexicon-Grammar Verbnets Through Lexical Ontologies
Our solution, the POLEX Polish Lexicon, is an electronic morphological dictionary that includes the core Polish vocabulary of general interest acquired from the traditional paper dictionary (Szymczak, 1983-1985).7 POLEX is based on a precise, machine-interpretable format (coding system) that is the same for all grammatical categories (Vetulani et al., 1998a).
The POLEX format we propose is uniform for all grammatical categories (parts of speech) and admits no exceptions to the rules. This makes algorithms for generating and lemmatizing word forms much easier to write than with traditional language descriptions, whose excessive complexity places high demands on programmers. POLEX dictionary entries take the following form:
BASIC_FORM+LST_OF_STEMS+PARADIGMATIC_CODE+DISTRIBUTION_OF_STEMS
5 Antonio Zampolli considered the lack of resources in the form of IT-processable corpora, dictionaries, and digital grammars needed to build language models to be a critical obstacle to the development of utility IT systems. See Language Resources. Overview by J. J. Godfrey and Antonio Zampolli (1996) and (Zampolli, 2006).
6 The first public release of the resource contained over 42,000 nouns, 12,000 verbs, 15,000 adjectives, 25,000 participles, and about 200 pronouns.
7 Supplemented by basic swear words not found in this dictionary and by the most frequently used elements of jargon and of regional and colloquial vocabulary. Several paper editions of Słownik Języka Polskiego PWN [Polish Language Dictionary PWN] by Mieczysław Szymczak were published between 1978 and the mid-1990s. For our purposes we used the three-volume version published from 1983 to 1985; see (Szymczak, 1983-1985).
Zygmunt Vetulani, Grażyna Vetulani
For example, the dictionary entries for two inflectional variants of the word frajer ('sucker')8 look as follows:
frajer; frajer, frajerz; N110; 1:1-5,9-13; 2:6-8,14
frajer; frajer, frajerz; N110; 1:1-5,8-14; 2:6-7
The paradigmatic (inflection) code, here N110, carries the full morphological and inflectional information, in particular the list of endings for all paradigmatic positions. The distribution parameter (DISTRIBUTION_OF_STEMS) relates the stems (here frajer and frajerz) to paradigmatic positions, so that together the code and the distribution determine how endings combine with stems to yield each word form. The two entries differ only in the nominative and vocative plural (positions 8 and 14): frajerzy in the first, frajery in the second. The information stored in a dictionary entry is complete and unambiguous, and inflection classes are constructed so that there is no need to consider exceptions.
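To make the roles of the paradigmatic code and the stem distribution concrete, here is a minimal Python sketch of generation and lemmatization over entries written as in the two examples above. It is not the POLEX software: the table PARADIGMS and the functions parse_entry, parse_distribution, generate and build_lemma_index are hypothetical names introduced for illustration, the semicolon-separated entry layout is assumed from the examples, and only code N110 is filled in, with the 14 endings listed in the footnote on frajer.

```python
# Hypothetical paradigm table: code -> endings for positions 1..14.
# N110 uses the endings given in the footnote on "frajer"; "" is the empty ending.
PARADIGMS = {
    "N110": ["", "a", "owi", "a", "em", "e", "e",        # singular: 7 cases
             "y", "ów", "om", "ów", "ami", "ach", "y"],  # plural: 7 cases
}


def parse_distribution(spec):
    """Map each paradigm position to a 0-based stem index.

    '1:1-5,9-13;2:6-8,14' -> {1: 0, ..., 5: 0, 6: 1, 7: 1, 8: 1, 9: 0, ...}
    """
    slots = {}
    for group in spec.split(";"):
        stem_no, ranges = group.strip().split(":")
        for part in ranges.split(","):
            lo, _, hi = part.partition("-")  # "14" -> ("14", "", "")
            for pos in range(int(lo), int(hi or lo) + 1):
                if pos in slots:
                    raise ValueError(f"position {pos} is assigned twice")
                slots[pos] = int(stem_no) - 1
    return slots


def parse_entry(line):
    """Split 'BASIC_FORM; STEMS; CODE; DISTRIBUTION' into its four parts."""
    parts = [p.strip() for p in line.split(";")]
    lemma, code = parts[0], parts[2]
    stems = [s.strip() for s in parts[1].split(",")]
    distribution = ";".join(parts[3:])  # the distribution itself contains ';'
    return lemma, stems, code, distribution


def generate(stems, code, distribution):
    """Return the word forms for paradigm positions 1..n, in order."""
    endings = PARADIGMS[code]
    slots = parse_distribution(distribution)
    if sorted(slots) != list(range(1, len(endings) + 1)):
        raise ValueError("every paradigm position needs exactly one stem")
    return [stems[slots[pos]] + endings[pos - 1] for pos in range(1, len(endings) + 1)]


def build_lemma_index(lines):
    """Invert generation: word form -> {(lemma, paradigm position), ...}."""
    index = {}
    for line in lines:
        lemma, stems, code, distribution = parse_entry(line)
        for pos, form in enumerate(generate(stems, code, distribution), start=1):
            index.setdefault(form, set()).add((lemma, pos))
    return index


ENTRIES = [
    "frajer; frajer, frajerz; N110; 1:1-5,9-13; 2:6-8,14",
    "frajer; frajer, frajerz; N110; 1:1-5,8-14; 2:6-7",
]

if __name__ == "__main__":
    for entry in ENTRIES:
        _, stems, code, dist = parse_entry(entry)
        print(generate(stems, code, dist))
    print(sorted(build_lemma_index(ENTRIES)["frajerze"]))
```

Because no exceptions need to be handled, generation is a plain lookup, and lemmatization is just its inversion. In this sketch the index is keyed only by the lemma, so it does not record which of the two variant entries produced a form; a fuller lemmatizer would also map each paradigmatic position to its grammatical values (number, case).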
The other two projects discussed in this section were of a different nature.
The main goal of the CEGLEX consortium (Vetulani et al., 1998b) was to test the reusable generic lexicon model proposed by the EUREKA project GENELEX, which was intended to meet IT needs. GENELEX was implemented (between 1990 and 1994) for a number of Western European languages, such as French, English, German and Italian. Within CEGLEX, three Central European languages, Polish and Czech (Slavic) and Hungarian (Finno-Ugric), were used as testbeds to verify the genericity of the GENELEX model.
It is worth noting that the final Polish module developed in CEGLEX/GENELEX went further than the original GENELEX, which focused on the morphological and syntactic layers and addressed the semantic layer only marginally.
The three layers of the CEGLEX/GENELEX model were confronted with linguistic data of the languages under consideration, with generally positive results. For the Polish module of the project, this confrontation consisted in adapting the GENELEX model to Polish language data.
The CEGLEX project successfully tested, on representative linguistic data, the feasibility of an IT-oriented lexicon-grammar covering all three basic layers (morphological, syntactic, and semantic) of grammatical description.
8 The word frajer (en. sucker) is a masculine-personal noun (pol. rzeczownik męskoosobowy), inflected for number and case (two numbers, singular and plural, and seven cases). The stems (frajer and frajerz) are the same in both entries. Code N110 represents a 14-position string of endings, the same in both entries for the lexeme frajer: (∅, a, owi, a, em, e, e; y, ów, om, ów, ami, ach, y), where ∅ is the empty ending. The fourth parameter, the stem distribution, assigns each paradigmatic ending to the appropriate stem. In this example, 1:1-5,9-13 means that the first stem (frajer) is combined with the endings at paradigm positions 1 to 5 and 9 to 13. Similarly, the expression 2:6-8,14 indicates that the endings at positions 6, 7, 8 and 14 are attached to the second stem (frajerz).
The CEGLEX/GENELEX methodology, together with the results of the POLEX project, was the starting point for our work within the PECO-COPERNICUS GRAMLEX project, carried out from 1995 to 1998. The main goal of this project was to build, in accordance with the GENELEX methodology, digital morphological dictionaries and related IT-oriented tools. For Polish, the GRAMLEX tasks (Vetulani et al., 1998b) aimed to improve the state of language engineering tools and resources. Among the main achievements of GRAMLEX was the creation of a corpus-based morphological dictionary of Polish encoded in SGML (in the proprietary GRAMCODE format).9 The GRAMLEX project turned out to be the first step towards implementing a lexicon-grammar for the Polish language.

WordNet Like Lexical Ontologies
Synonymy, Hyperonymy and Inheritance10
The concept of synonymy refers to the concept of meaning, which is commonly used in informal discourse and usually does not raise controversy. Consequently, meaning is generally treated as a primitive concept that requires no analytical definition in terms of other concepts assumed to be known. If appealing to the obviousness of a concept proves inadequate, an axiomatic definition can be used instead. Definitions of this type do not address the ontological nature of the defined concept; they are operational, specifying how the concept is to be used by relating it to other concepts assumed to be known. A classic example is Peano arithmetic (around 1889),11 in which primitive concepts such as natural number, addition and multiplication are characterized by axioms that, by relating them to other concepts, determine how they are used. In traditional linguistics, similar methods are sometimes used to determine the meaning of a word or phrase by giving usage examples considered representative. When defining synonymy, reference to meaning may be appropriate if well-defined procedures are used to compare word meanings, for example, context-of-use analysis (see Vossen, 2002).
Our initial work on lexical ontologies was motivated by the desire to obtain a basic ontology for Polish, inspired on the one hand by Linnaeus' systematics and on the other hand by the pioneering work of cognitive scientists and linguists from Princeton (Miller, Fellbaum and others) on the WordNet system. The Princeton WordNet was an ontology directly linked to lexical material in the form of abstract nouns grouped into classes of synonyms called synsets. This inspiration proved fruitful and led to the creation of the linguistic ontology PolNet (Polish Wordnet, version 1.0), which satisfactorily corresponds to the conceptualization reflected in the nouns of the Polish language.

The Problem of Defining the Concept of Synonymy
In natural languages, concepts (understood as mental equivalents of complex or simple entities) are represented by words. Synonymy is commonly understood as a binary relation holding between two words (terms, expressions) if and only if its arguments have the same or a similar meaning. The need to define synonymy more precisely leads us to distinguish three cases in which the term synonymy will be used.
Case 1. If the concepts represented by names (simple or compound nouns) are extensional (that is, if they can be fully described by specifying which entities fall under the given concept and which do not), then by the synonymy of two names we understand that both names refer to the same set of entities. Similarly, we can construct an extensional definition of the synonymy of verbs: two verbs are synonymous when they both refer to the same set of states and/or events.
Case 2. When the meaning of each of the two compared words can be unambiguously determined by a specific set of features and their values, their synonymy means that both are uniquely described by the same set of features (attributes) taking the same values.
Case 3. If neither of the above cases applies, we must rely on procedural definitions that refer to the circumstances in which each of the compared terms is used.
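The first two cases can be stated almost directly as code. The following Python sketch is a toy model, not part of PolNet: the class Term, its fields extension and features, and the function synonymous are hypothetical, and the entity identifiers and feature values are invented for illustration. It returns None when neither description is available, which is where the procedural definitions of Case 3 would take over.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    form: str
    # Case 1: identifiers of the entities (or, for verbs, states/events)
    # that fall under the concept named by the term.
    extension: frozenset = frozenset()
    # Case 2: attribute -> value description of the term's meaning.
    features: dict = field(default_factory=dict)


def synonymous(a, b):
    """Decide synonymy under Case 1 or Case 2; None means Case 3 applies."""
    if a.extension and b.extension:
        # Case 1: both names refer to the same set of entities.
        return a.extension == b.extension
    if a.features and b.features:
        # Case 2: the same attributes, taking the same values.
        return a.features == b.features
    # Case 3: no extensional or feature description is available; a
    # procedural test based on the circumstances of use is needed instead.
    return None


if __name__ == "__main__":
    samochod = Term("samochód", extension=frozenset({"car-17", "car-42"}))
    auto = Term("auto", extension=frozenset({"car-17", "car-42"}))
    rower = Term("rower", extension=frozenset({"bike-3"}))
    print(synonymous(samochod, auto), synonymous(samochod, rower))
```

The hard part in practice is of course not the comparison but obtaining the extensions or feature sets; the sketch only shows what each case commits one to.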
Let us compare three frequently discussed solutions: 1) Leibniz's proposal, 2) the Princeton WordNet proposal (Miller-Fellbaum), 3) the EuroWordNet proposal (Vossen).
Ex. 1) We quote, after Vossen (2002, p. 18), a very strong definition of synonymy given by Leibniz: "two expressions are synonyms if the substitution of one for the other never change the truth value of a sentence in which the substitution is made." Note that under this definition synsets are generally very small, often consisting of a single element. This significantly flattens the hierarchy based on the hypernymy relation, which in turn reduces the potential benefits of the mechanism for inheriting the attributes associated with synsets and their values. The advantage of Leibniz's proposal is that synonymy is an equivalence relation and therefore induces a partition of the set of words.
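Why Leibniz's criterion yields a partition can be seen in a small sketch: if two words must agree on the truth value of every sentence obtained by substitution, grouping words by their complete vector of truth values gives exactly the equivalence classes. The Python below assumes a hypothetical table TRUTH of invented judgements (context, then word, then truth value) and a hypothetical function leibniz_synsets; real sentences and judgements would come from linguistic analysis.

```python
from collections import defaultdict

# Hypothetical judgements: TRUTH[context][word] is the truth value of the
# sentence obtained by inserting `word` into the slot of `context`.
TRUTH = {
    "C1": {"w1": True, "w2": True, "w3": True},
    "C2": {"w1": True, "w2": True, "w3": False},
    "C3": {"w1": False, "w2": False, "w3": False},
}


def leibniz_synsets(truth):
    """Partition the words into classes of Leibniz synonyms.

    Two words are synonyms iff substituting one for the other never changes
    a truth value, i.e. iff they have the same truth value in every context.
    Grouping by the full truth-value vector therefore yields the
    equivalence classes of the relation.
    """
    contexts = sorted(truth)
    words = sorted({w for row in truth.values() for w in row})
    classes = defaultdict(list)
    for word in words:
        signature = tuple(truth[c][word] for c in contexts)
        classes[signature].append(word)
    return sorted(classes.values())


if __name__ == "__main__":
    print(leibniz_synsets(TRUTH))  # [['w1', 'w2'], ['w3']]
```

Each additional context can only split existing classes, never merge them; with a realistic, open-ended set of contexts this is why synsets under this definition tend to shrink to very few elements.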
Ex. 2) George A. Miller and Christiane Fellbaum (see Vossen, 2002, p. 18) proposed a less restrictive approach to synonymy, encapsulated in the formula: "two expressions are synonymous in a linguistic context C, if the substitution of one for the other in C does not alter the truth value." Taken literally, this means that to conclude that two expressions are not synonymous, it is enough to point to a single selected context C in which replacing one expression with the other changes the truth value of the whole sentence.12 In fact, this procedure (correctly) identifies as synonyms only those words for which the fixed context C can be considered representative of a particular meaning. In dictionary practice, the requirement that examples (containing the context of use) be representative of a typical meaning is not strictly observed (see, e.g., Polański, 1980-1992), which may significantly hinder the creation of WordNet-type systems based on the above definition of synonymy.
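The difference between applying the Miller-Fellbaum criterion with the context left free and with one fixed context C (the point made in footnote 12) can be illustrated with invented data. In this hypothetical Python sketch, synonymous_in applies the substitution test in one context, synonymous_in_some_context accepts a pair if any context passes (a relation that need not be transitive), and synsets_in fixes a single context and therefore yields a partition. Note that with bare truth values, two words that are both false in C also count as substitutable, which is one reason why C must be representative of the intended meaning.

```python
# Hypothetical judgements: TRUTH[context][word] is the truth value of the
# sentence obtained by inserting `word` into the slot of `context`.
TRUTH = {
    "C1": {"a": True, "b": True, "c": False},
    "C2": {"a": True, "b": False, "c": False},
}


def synonymous_in(truth, context, x, y):
    """Substituting y for x in `context` does not alter the truth value."""
    return truth[context][x] == truth[context][y]


def synonymous_in_some_context(truth, x, y):
    """The criterion with C left free: some context passes the test."""
    return any(synonymous_in(truth, c, x, y) for c in truth)


def synsets_in(truth, context):
    """With C fixed, the test is an equivalence: group words by truth value."""
    classes = {}
    for word, value in sorted(truth[context].items()):
        classes.setdefault(value, []).append(word)
    return sorted(classes.values())


if __name__ == "__main__":
    pairs = [("a", "b"), ("b", "c"), ("a", "c")]
    print([synonymous_in_some_context(TRUTH, x, y) for x, y in pairs])  # [True, True, False]
    print(synsets_in(TRUTH, "C1"), synsets_in(TRUTH, "C2"))  # [['a', 'b'], ['c']] [['a'], ['b', 'c']]
```

Here a and b pass in C1 and b and c pass in C2, yet a and c pass in no context, so the free-context reading is not transitive and cannot by itself form synsets; within C1 or C2 alone the classes are disjoint.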
12 "The weak point of Miller's approach is the synonymy criterion (above) which, alone, is not sufficient to form synsets because it does not guarantee transitivity when the context C changes. To remedy this defect, the initial criterion of synonymy must be reinforced by imposing the reference to the same context C." (Vetulani, Z. and Vetulani, G., 2015, p. 117). [Translation from French by Z. Vetulani: "Le point faible de l'approche de Miller est le critère de synonymie (ci-dessus) qui, seul, ne suffit pas pour former les synsets car il ne garantit pas la transitivité quand le contexte C change. Pour remédier à ce défaut il faut renforcer le critère initial de la synonymie en imposant la référence à un même contexte C."]
What follows is an example of a EuroWordNet context-based test (applied to English) for noun-noun synonymy (Test 1) (Vossen, 2002, p. 19)

Hierarchical Organization of Concepts in PolNet
The classic wordnet organization of nouns is based on a hierarchy of concepts defined by the hyponymy/hyperonymy relation between nouns. This hierarchy has a tree structure: more general concepts are higher in the hierarchy and more specific ones lower down. The tree organization is intended to allow inheritance of properties, which is essential for knowledge representation and inference (see Linnaean systematics14). The extension of the PolNet lexical ontology to predicative synsets15 introduces relations between predicative synsets and other ontology entities (synsets or not). Of particular importance is the introduction of relations that connect a predicative synset with its arguments, which are assigned attributes called semantic roles (which are synsets or other objects of the PolNet ontology). Assigning semantic roles to the argument positions opened by predicative expressions serves to determine links, or connectivity constraints, between these expressions and their arguments.16 Expanding PolNet with predicative synsets requires special caution when extending the hyponymy/hyperonymy relation to predicative synsets.
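How a semantic role can act as a connectivity constraint is sketched below. This is a hypothetical Python illustration, not PolNet's encoding: the English-labelled noun synsets, the HYPERNYM links, the predicative synset DRIVE and its role restrictions are all invented. An argument is accepted for a role if its synset is the restricting synset or one of its hyponyms, found by following hyperonymy links upwards.

```python
# Hypothetical noun hierarchy: synset -> its (single) hyperonym.
HYPERNYM = {
    "thief": "person",
    "person": "living_being",
    "dog": "animal",
    "animal": "living_being",
    "car": "vehicle",
    "vehicle": "artifact",
    "living_being": "entity",
    "artifact": "entity",
}

# Hypothetical predicative synset: each argument position carries a
# semantic role whose admissible fillers are restricted to a noun synset.
DRIVE = {"id": "drive-1", "roles": {"Agent": "person", "Theme": "vehicle"}}


def hyperonym_chain(synset):
    """Return the synset followed by all its hyperonyms up to the root."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def fits_role(predicate, role, filler):
    """Connectivity constraint: the filler must be (a hyponym of) the restriction."""
    return predicate["roles"][role] in hyperonym_chain(filler)


if __name__ == "__main__":
    print(fits_role(DRIVE, "Agent", "thief"))  # True: thief ISA person
    print(fits_role(DRIVE, "Theme", "car"))    # True: car ISA vehicle
    print(fits_role(DRIVE, "Theme", "dog"))    # False: dog is not a vehicle
```

The check relies entirely on the noun hierarchy, which is one reason why the hyponymy/hyperonymy relation has to be extended carefully once predicative synsets are added.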

PolNet Development Incremental Algorithm (Nouns)
In this section we present an algorithm for creating synsets and hierarchical relations based on the hyponymy/hyperonymy relation between nouns (simple and compound). This algorithm was used directly by lexicographers in the first phase of building the PolNet v1 database, with the DEBVisDic platform developed at Masaryk University in Brno (Pala et al., 2007).
Application of the algorithm requires:
• the VisDic or DEBVisDic platform (or any functionally equivalent tool),
• online access to Princeton WordNet,
• a good monolingual lexicon17 (called the reference dictionary in the algorithm description), preferably accessible online (we used Uniwersalny słownik języka polskiego PWN18 (Dubisz, 2006) as the basic reference lexicon and Słownik języka polskiego PWN (Szymczak, 1995) as a complementary one),
• a team with both language engineering and lexicographical skills.
The algorithm input consists of a list of words (lexemes). The output is a WordNet code segment for: a) synsets, b) the ISA relation between synsets (determined by the hyponymy/hyperonymy relation).
The general procedure for expanding PolNet consists of the following sequence of operations:
1. Looking through the reference dictionary, we search for word-meanings19 that are synonyms.
2. We create synonymy classes using appropriate definition criteria. These classes are called synsets.
3. For created or modified synsets, we search for candidate hyponyms and hyperonyms using our own language competence, dictionaries, the LSR list and knowledge of the Princeton WordNet structure.
4. For the pairs of synsets selected in step 3, we perform the hyponymy and hyperonymy definition tests.
Short example: let us take the Polish word zamek. The list of word-meanings identified in step 1 will be:
• zamek-1 (zamek I-1 in the dictionary): a lock
• zamek-2 (separated from the zamek-1 meaning, where the phrase zamek błyskawiczny is mentioned): a zip fastener
• zamek-3 (zamek I-2): a machine blocking lock, e.g. a valve lock
• zamek-4 (zamek I-3): a gun lock
• zamek-5 (zamek II-1): a castle
Zamek-2, zamek-3 and zamek-4 will all be hyponyms of zamek-1.
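The four steps can be mirrored in a small data-flow sketch for the zamek example. It is only an illustration under stated assumptions: steps 1 and 3 are human work (reading the reference dictionary, proposing candidates), so they appear here as hand-written input; the step-4 tests are replaced by a hypothetical set of lexicographer decisions; and all names (WORD_MEANINGS, SYNONYMS, CANDIDATES, ACCEPTED, expand) are invented. The candidate pair zamek-5 / zamek-1 is a deliberately wrong, hypothetical proposal that the test rejects.

```python
# Step 1 (manual): word-meanings found in the reference dictionary.
WORD_MEANINGS = {
    "zamek-1": "a lock",
    "zamek-2": "a zip fastener",
    "zamek-3": "a machine blocking lock, e.g. a valve lock",
    "zamek-4": "a gun lock",
    "zamek-5": "a castle",
}
# Input to step 2 (manual): other expressions sharing a word-meaning.
SYNONYMS = {"zamek-2": ["zamek błyskawiczny"]}
# Step 3 (manual): candidate (hyponym, hyperonym) pairs.
CANDIDATES = [("zamek-2", "zamek-1"), ("zamek-3", "zamek-1"),
              ("zamek-4", "zamek-1"), ("zamek-5", "zamek-1")]
# Stand-in for step 4: pairs for which the hyponymy test succeeded.
ACCEPTED = {("zamek-2", "zamek-1"), ("zamek-3", "zamek-1"), ("zamek-4", "zamek-1")}


def expand(meanings, synonyms, candidates, passes_test):
    """Return (synsets, isa), isa being the accepted (hyponym, hyperonym) pairs."""
    # Step 2: one synset per word-meaning, merged with its listed synonyms.
    synsets = {m: [m] + synonyms.get(m, []) for m in meanings}
    # Step 4: keep candidate pairs between existing synsets that pass the test.
    isa = [(hypo, hyper) for hypo, hyper in candidates
           if hypo in synsets and hyper in synsets and passes_test(hypo, hyper)]
    return synsets, isa


if __name__ == "__main__":
    synsets, isa = expand(WORD_MEANINGS, SYNONYMS, CANDIDATES,
                          lambda hypo, hyper: (hypo, hyper) in ACCEPTED)
    print(synsets["zamek-2"])
    for hypo, hyper in isa:
        print(f"{hypo} ISA {hyper}")
```

In the incremental setting, re-running steps 3 and 4 only for created or modified synsets would keep each round small; the sketch processes everything at once for simplicity.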
17 By a good dictionary we mean one where different word-meanings are explicitly distinguished.
18 PWN is the name of a Polish publishing house.

Language Resources Used: Dictionaries and Tools
The research whose main results are summarized in this article traces a path leading to a coherent methodology for designing and implementing large AI systems20 with language competence. One of the most important milestones of the long-term research program discussed here is the implementation of a prototype of a large AI system, used for practical verification of decisions regarding the selection and/or development of appropriate tools and methods for natural language engineering.21 In addition to standard tools and methods commonly recognized as part of the canon of IT and linguistic knowledge, during our research (up to the implementation of the test system POLINT-112-SMS) we considered it appropriate to use two classes of resources:
A) specialized resources and publicly available tools, necessary or useful in the project,
B) our own resources and tools developed in the project, which turned out to be needed to reach the milestones of our work.
Class (A) includes:
• IPI PAN National Corpus of Polish Language (on a limited scale) (Przepiórkowski, 2004),
• PWN Polish Language Dictionary (version edited by M. Szymczak, 1995),
• Universal Dictionary of the Polish Language (edited by S. Dubisz, 2006),
• Syntactic-Generative Dictionary of Polish Verbs (Polański, 1980-1992),
• Internet dictionary SJP.PL (more on this topic in Vetulani et al., 2010, pp. 158-159),
• tools for generating WordNet lexical networks, VisDic and DEBVisDic (Masaryk University Brno) (Pala et al., 2007).
Class (B) includes:
• formats and vocabularies created in the POLEX, GRAMLEX and CEGLEX projects (Vetulani et al., 2010),
• a corpus of recordings from the emergency telephone 997/112 (confidential recordings, not intended for sharing),
• verb-noun collocations for the Polish language: methodology, data formats, predicative nouns (basic resource) (Vetulani, G., 2000), basic noun-synset creation algorithm (Vetulani et al., 2007), algorithms for expanding the collocation dictionary (Vetulani, G., Vetulani, Z., Obrębski, T., 2008), a digital dictionary of verbal-nominal collocations (Vetulani, G., 2012),
• coding algorithms for valency dictionaries,
• various algorithms for expanding the PolNet database (as of 2010).

Inspirations. Princeton WordNet
Creating advanced systems with language competence, such as AI systems, requires knowledge processing and thus reference to abstract concepts. For this purpose, ontologies (as defined by T. R. Gruber) are used (see the opening paragraph of the article). WordNet-type systems are ontologies that, on the one hand, correspond to the natural conceptualization of the world, real or fictitious, and, on the other hand, are formal entities amenable to IT processing.23 The WordNet lexical ontology (also known as Princeton WordNet, PWN) was implemented in the 1980s by G. A. Miller and colleagues at Princeton University's Cognitive Science Laboratory as a new method of describing vocabulary semantically, which has proven particularly useful for searching for information on the Internet. The key idea of this method is to describe the lexicon in terms of the concepts of synonymy and hyperonymy. PWN is composed of classes of synonyms called synsets and is organized hierarchically by the hyponymy/hyperonymy relation between synsets. Some other semantic relations between synsets (such as meronymy and antonymy) are implemented as well. WordNet-like systems have an advantage over traditional ontologies because they explicitly account for the relationships between the words of the language and the concepts of the ontology (represented by synsets).

Lexical Ontology PolNet v1
Our research, initiated in the early 2000s, was inspired by the work of George A. Miller and his team on the WordNet lexical ontology, as well as by the work led by Piek Vossen in the EuroWordNet project. At later stages of work on the PolNet system, we also relied on the pioneering research of Maurice Gross on the concept of Lexicon-Grammar, initially implemented for French (Gross, M., 1975; 1981; 1994); on the work conducted independently in the same period by Kazimierz Polański, crowned by the publication in 1980-1992 of the Syntactic-Generative Dictionary of Polish Verbs; and on the results of the FrameNet (Fillmore et al., 2002) and VerbNet (Palmer, 2009) projects, which are close to the assumptions of Lexicon-Grammar.
The construction of PolNet (a lexical ontology intended to be a wordnet-type lexical database), launched in 2006, was a response to the need for a language processing module for the implementation of a stand-alone, large-scale IT system with language competence (POLINT-112-SMS) (Vetulani, Z., 2014). While the concept and structure of the PolNet database were modeled on the solutions adopted for the Princeton WordNet system (Miller and Fellbaum, 2007), the methodology for creating the PolNet database was developed from scratch by a team of Polish computer scientists and lexicographers.24 The adopted methodology assumed the use of existing dictionaries of Polish in order to maintain the conceptualization appropriate for users of the Polish language.
The PolNet database is a structure built from synonym classes and relations between these classes. Synonym classes (synsets) represent concepts identifiable in natural language, which is why PolNet can be used as a lexical ontology corresponding to the conceptualization reflected in the Polish language. PolNet v1 was built on the basis of high-quality traditional dictionaries of Polish and the study of available language corpora (such as the IPI PAN Corpus (Przepiórkowski, 2004) and small domain corpora). The resource is created incrementally, starting with high-frequency vocabulary25 and words that are (for various reasons) considered important.
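The incremental ordering just described, together with the departure noted in footnote 25, amounts to a simple worklist. In this hypothetical Python sketch the corpus frequencies are invented, required stands for application-specific terms that must be covered early, and worklist is an illustrative name, not part of the PolNet tooling.

```python
def worklist(freq, required=()):
    """Order lemmas for incremental processing.

    Application-specific terms come first (in the given order, without
    duplicates); the remaining lemmas follow by descending corpus frequency,
    with ties broken alphabetically so that the order is deterministic.
    """
    first = list(dict.fromkeys(required))
    chosen = set(first)
    rest = sorted((w for w in freq if w not in chosen), key=lambda w: (-freq[w], w))
    return first + rest


if __name__ == "__main__":
    # Invented frequencies; "karetka" (ambulance) plays the role of a domain
    # term needed early by an emergency-call application.
    freq = {"dom": 900, "wypadek": 300, "zamek": 120, "karetka": 15}
    print(worklist(freq, required=["karetka"]))  # ['karetka', 'dom', 'wypadek', 'zamek']
```

Putting required terms first reflects the methodological reason given in footnote 25: applications that need lexical completeness in their domain can be tested early, even if those terms are rare in general corpora.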
While the initial work on PolNet aimed at a system with a structure similar to that of the Princeton WordNet, intended to serve as an ontology naturally associated with the language, over time the PolNet project, influenced by theoretical work carried out independently by Maurice Gross and Kazimierz Polański and by implementation-oriented work (in particular by Alain Colmerauer and Charles Fillmore), evolved into a Lexicon-Grammar by gradually incorporating simple and compound verbs. This evolution coincided with the progress of theoretical work on a formalized dictionary of verbal-nominal collocations, initiated in the 1990s by Grażyna Vetulani (see Vetulani, G., 2000; 2012), and with Gaston Gross's independent research on the category of object classes (French classes d'objets) (see, e.g., Gross, G., 1994).
24 Mainly from the Department of Computer Linguistics and Artificial Intelligence and the Faculty of Modern Languages and Literatures of the Adam Mickiewicz University.
25 A departure from this principle, made for methodological reasons to enable early testing of the developed resource in applications that require lexical completeness, was the inclusion of terminology specific to these applications.
The first versions of the PolNet database, made available to a limited extent before 2012, mainly included nouns and the most important verbs. It was also during this period that verb-noun collocations began to be included in the PolNet database. The addition of simple and complex verbs (verb-noun collocations) along with syntactic information was the first step towards giving the PolNet lexical database the character of a Lexicon-Grammar (as understood by Maurice Gross and Kazimierz Polański).
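A synset that groups a simple verb with a verb-noun collocation, each carrying syntactic information, might be represented roughly as below. This Python sketch is an assumption-laden illustration, not the PolNet data format: the classes VerbLiteral and VerbSynset, the identifier decide-1, and the role and frame labels are hypothetical. The example pairs zdecydować ('to decide') with the collocation podjąć decyzję ('to make a decision'), in which the support verb takes the predicative noun in the accusative and both expressions take an o + locative complement.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class VerbLiteral:
    verb: str                        # simple verb, or support verb of a collocation
    noun: Optional[str] = None       # predicative noun (lemma), for collocations
    noun_case: Optional[str] = None  # case the support verb imposes on the noun

    def describe(self):
        if self.noun is None:
            return self.verb
        return f"{self.verb} + {self.noun} ({self.noun_case})"


@dataclass(frozen=True)
class VerbSynset:
    synset_id: str
    literals: Tuple[VerbLiteral, ...]
    # Argument frame shared by all literals: (semantic role, realization).
    frame: Tuple[Tuple[str, str], ...]

    def collocations(self):
        return [lit for lit in self.literals if lit.noun is not None]


DECIDE = VerbSynset(
    synset_id="decide-1",
    literals=(VerbLiteral("zdecydować"),
              VerbLiteral("podjąć", noun="decyzja", noun_case="Acc")),
    frame=(("Agent", "Nom"), ("Theme", "o+Loc")),
)

if __name__ == "__main__":
    print([lit.describe() for lit in DECIDE.literals])
    print(dict(DECIDE.frame))
```

Keeping the frame at the synset level expresses the intuition that synonymous simple and complex predicates open the same argument positions, while the case information stays with the individual collocation.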
What follows is a (simplified) example of a noun synset (code

Usefulness of WordNet Lexical Networks for IT Application Development
The usability of the PolNet network as a lexical ontology in specific applications (e.g., in AI systems with language competence) is primarily determined by the properties of the concepts of synonymy and hyperonymy and by its lexical coverage (the prospects for developing lexical ontologies of the PolNet/Lexicon-Grammar VerbNet type are discussed later in the article).

From PolNet 1.0 to Lexicon-Grammar VerbNets
The extent to which WordNet lexical networks will be useful in IT applications is determined by the properties of the concepts used in defining these networks.These concepts include, above all, the notion of synonymy, as well as relations defined on synsets.Of the latter, the relation of hyperonymy between the synsets representing particular concepts is the most important.Hyperonymy plays the role of the backbone for organizing the structure of the synset network.In the network, synsets can also enter into relationships with entities other than synsets, e.g., with attribute values or metadata.
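The role of hyperonymy as a backbone for inheritance, with synsets also related to attribute values, can be sketched as follows. All hyperonymy links above zamek-1 and all attributes are hypothetical; the rule that more specific synsets override values inherited from more general ones is likewise an assumption of this illustration, which shows the mechanism rather than PolNet's representation.

```python
# Hypothetical hyperonymy links (hyponym -> hyperonym) and local attributes.
HYPERONYM = {"zamek-4": "zamek-1", "zamek-1": "device", "device": "artifact"}
ATTRIBUTES = {
    "artifact": {"concrete": True, "made_by_humans": True},
    "device": {"has_function": True},
    "zamek-1": {"function": "locking"},
    "zamek-4": {"part_of": "gun"},
}


def inherited_attributes(synset):
    """Collect attributes from the root down, so specific values override general ones."""
    chain, seen = [], set()
    node = synset
    while node is not None:
        if node in seen:
            raise ValueError(f"cycle in hyperonymy at {node!r}")
        seen.add(node)
        chain.append(node)
        node = HYPERONYM.get(node)
    attributes = {}
    for node in reversed(chain):  # most general first
        attributes.update(ATTRIBUTES.get(node, {}))
    return attributes


if __name__ == "__main__":
    print(inherited_attributes("zamek-4"))
```

A gun lock thus receives "locking" from zamek-1 and "concrete" from artifact without either being stored on zamek-4 itself, which is the practical payoff of a well-formed hyperonymy hierarchy.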
Our very first attempts to extend the PolNet system with simple and complex verbs prompted in-depth reflection on synonymy and homonymy. The aim was to propose definitions that would correspond to linguists' intuitive understanding of these concepts and, at the same time, be procedural in nature, facilitating the writing of algorithms for creating synsets and for extending the hyponymy/hyperonymy relation for the purposes of knowledge management, using mechanisms of inheritance of the features of objects represented by synsets.
To test candidate solutions on language material in practical, real-scale applications, it was first necessary to supplement the language resources used to complete the first stage described in section WordNet Like Lexical Ontologies, and to acquire or create new resources. A pioneering task in this respect was the development of dictionaries of predicative nouns and verbal-nominal collocations (Vetulani, G., 2000; 2012), together with the proposal of a model for encoding and implementing the grammatical information assigned to verb synsets for collocations. The most important of these tasks are listed in section Synonymy, Hyperonymy and Inheritance (see also Vetulani, Z. et al., 2010).

New Inspirations
Kazimierz Polański (1929-2009), in parallel with Maurice Gross (1934-2001), was a precursor of the idea of Lexicon-Grammar. In his formalized dictionary of Polish verbs, Polański includes entries with morphological, syntactic, and semantic information related to the chosen word form, which also serves as the ID of the entry (Polański, 1976; 1980-1992). The dictionary was developed and published in the years 1980-1992 and included 7,000 entries for the most important Polish verbs.
NEO.2023.35.22 p. 18/32 Zygmunt Vetulani, Grażyna Vetulani
At the same time and independently of Polański, Maurice Gross was working on the formal description of French verbs. Gross's concept is similar to Polański's in that the word form of the verb is directly linked to the relevant lexical and semantic information. Gross held that the meaning of words is determined by the elementary sentences characterizing their typical uses. Both of the above approaches also have counterparts in the WordNet lexical network, implemented under the direction of George A. Miller at Princeton University and organized around the concept of synonymy, which legitimately qualifies WordNet as a lexical ontology.
In their initial phase, all three approaches were developed independently for significantly different languages: English, French and Polish (in alphabetical order). These languages have different grammatical and, in particular, lexicographic traditions, which (probably) explains why the initial research efforts did not cite one another.
An important reason for the widespread adoption of these ideas is their significant application potential, supported by insightful theoretical work aimed at strengthening the lexical and grammatical coverage of the data repositories and tools needed for the development of language engineering (including its multilingual aspects).
The EU-funded EuroWordNet project led by Piek Vossen (Vossen, 2002) went in this direction. The excellent theoretical documentation of EuroWordNet was an important source of inspiration for the Lexicon-Grammar Verbnet for Polish.

The Need for Lexical and Grammatical Resources
Initial work on the PolNet system was motivated by the desire to obtain ontologies sufficient to meet basic needs²⁷ in the field of knowledge representation. At this stage, an ontology that faithfully reflects the conceptualization typical of the language people use every day seemed sufficient. Thus, in the initial period, limiting our work to the noun category was justified. However, this proved insufficient once AI systems needed to represent knowledge about situations, states, and events, which language typically expresses through predicate-argument structures (in the sense inspired by computer logic and knowledge engineering). Hence the need to extend PolNet with the language constructions used to express relational content.
27 The need for large lexical resources was not significant in the initial period of work discussed in Section Early Works, because until the end of the 1980s there were no favorable conditions in Poland for practical work in the field of natural language technology.
The basic lexical categories for this role are simple (one-word in Polish) or complex verbs of various types (see Vetulani, G., 2000). Among the latter, in Polish and a number of other languages, the most important are verb-noun collocations, composed of a support verb and a predicative noun belonging to the category of abstract nouns. The support verb (Vsup) primarily plays a syntactic role, but sometimes also another one (e.g., pragmatic or sociolinguistic), while the predicative noun (Npred) is associated with syntactic and semantic attributes. These attributes are organized in valency structures, which express (through the attribute values) requirements or constraints on connectivity with arguments in the sentence structure. (More on the predicate-argument model in Vetulani, Z., 1998 and 2004, and Karolak, 1984.)
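As a rough illustration of how a verb-noun collocation can carry a valency structure and feed a predicate-argument representation, the following Python sketch models a Vsup+Npred entry. The class names, the role labels and the simplified two-slot frame for udzielić pomocy are assumptions made for the example, not the PolNet encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    role: str  # semantic role of the argument position, e.g. "Agent"
    case: str  # morpho-syntactic constraint on the filler, e.g. "nom", "dat"


@dataclass(frozen=True)
class Collocation:
    vsup: str                  # support verb: mainly syntactic (sometimes pragmatic) role
    npred: str                 # predicative noun: carries the predicate and its valency
    valency: tuple[Slot, ...]  # argument positions opened by the predicate

    def to_fact(self, fillers: dict[str, str]) -> tuple[str, ...]:
        """Map role fillers onto a predicate-argument tuple named after the Npred.

        Raises KeyError if an argument position opened by the predicate is unfilled.
        """
        return (self.npred, *(fillers[slot.role] for slot in self.valency))


# Hypothetical entry for "udzielić pomocy" ('to give help'); the frame is simplified.
udzielic_pomocy = Collocation(
    vsup="udzielić",
    npred="pomoc",
    valency=(Slot("Agent", "nom"), Slot("Beneficiary", "dat")),
)

print(udzielic_pomocy.to_fact({"Agent": "ratownik", "Beneficiary": "turysta"}))
# → ('pomoc', 'ratownik', 'turysta')
```

Naming the predicate after the Npred reflects the point made in the text that the predicative noun, not the support verb, carries the valency; any pragmatic information contributed by the Vsup could be stored as a separate attribute.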
Since the class of verb-nominal constructions is much more flexible and evolutionarily open than the class of simple verbs, we considered it reasonable to treat this class as a priority in the development of the PolNet database. The first step was to develop a careful methodology for recognizing when a compound structure is used as a verb-noun collocation acting as the center of a sentence. We will devote the rest of section Valency Structures in PolNet Lexical Ontologies to the acquisition of verbal-nominal collocations.
At the beginning of our research in this field, we focused on the description of nouns capable of playing the role of predicate in the verb-noun construction (Vetulani, G., 2000). In the 1970s and 1980s, the first important works on the predicative noun in the verb-noun construction appeared in the French literature; see (Giry-Schneider, 1978), (Danlos, 1980), (Vivès, 1983), (Gross, G., 1987). A predicative noun (Npred) typically appears in an analytic, partly fixed (frozen) construction, forming with its accompanying verb (Vsup) a verb-noun collocation (Vsup + Npred) that plays the role of the sentence center (in simple sentences). The first tangible result of implementing the assumptions described above was the development of a digital dictionary covering over 14,600 Polish verb-noun collocations that can act as a predicate in a sentence (Vetulani, G., 2012).

Valency Structures in PolNet Lexical Ontologies
While the main goal of the first phase of the PolNet project, which ended with the implementation of PolNet v1, was to develop noun synsets and the relations between them (induced by semantic relations between the elements of synsets), the extension of PolNet to verb synsets, and more generally to predicative synsets, posed a significant challenge that forced a redefinition of the concept of synonymy. The modification required the introduction of relations enabling the formulation of connectivity conditions between verb and noun synsets (representing predicates and arguments, respectively). This function is played by valency structures.
The key to making the right decisions regarding the development of linguistic ontologies for grammatical categories other than abstract nouns is to follow the idea already successfully validated for nouns when PolNet v1 was implemented. By this we mean that the structure of a formal ontology should be consistent with the natural categorization of knowledge, so that Gruber's²⁸ maxim, which has worked for noun ontologies, retains its validity for other grammatical categories. A number of linguists working on the formal description of the semantics of natural languages pursued research in this direction.
In this field, the most active Polish linguist was Kazimierz Polański, while for other languages pioneering research was conducted by Charles Fillmore and Martha Palmer (English), Maurice Gross and Gaston Gross (French), Piek Vossen (Dutch) and others. It is also important to take into account the work of mathematicians and logicians such as Gottlob Frege, Alfred Tarski, Richard Montague and Kazimierz Ajdukiewicz, who had an essential impact on the formation of the model of thinking about natural language in the pre-informatics period.
Grouping together verb synsets and noun synsets according to the semantic-syntactic connectivity constraints imposed by the argument positions that the predicate opens in a sentence gives the PolNet system the status of a lexicon-grammar. The key to extending synonymy to predicative phrases (simple predicative verbs, predicative nouns, verbal-nominal predicative collocations and other grammatical categories) in a way that remains compatible with the idea of Lexicon-Grammar is the concept of valency structure (Vetulani, Z. and Vetulani, G., 2014).
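The semantic side of such connectivity constraints can be checked against the noun hierarchy: an argument position restricted to a given class accepts any synset located below that class. Below is a minimal Python sketch; the hyperonymy links (keyed by Polish lemmas standing in for synsets) and the example constraint are hypothetical.

```python
# Hypothetical hyperonymy links between noun synsets: child -> parent.
# Lemmas stand in for synset identifiers; the hierarchy is assumed acyclic.
HYPERNYM = {
    "lekarz": "człowiek",    # doctor -> human
    "ratownik": "człowiek",  # rescuer -> human
    "człowiek": "istota",    # human -> living being
    "pies": "zwierzę",       # dog -> animal
    "zwierzę": "istota",     # animal -> living being
    "kamień": "obiekt",      # stone -> object
}


def is_a(synset: str, required: str) -> bool:
    """True if `synset` equals `required` or reaches it via hyperonymy links."""
    node = synset
    while node is not None:
        if node == required:
            return True
        node = HYPERNYM.get(node)  # None once a root of the hierarchy is passed
    return False


# Hypothetical constraint: the Beneficiary of 'to help' must be a living being.
print(is_a("lekarz", "istota"))  # → True
print(is_a("kamień", "istota"))  # → False
```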
By valency structure we mean here information about all argument positions opened by a predicative word, taking into account both semantic constraints on arguments and morpho-syntactic constraints on the text elements filling these positions (case, number, gender, etc.) (Vetulani, Z. and Vetulani, G., 2015). In particular, we require synonyms to have the same valency structure and the same assignments of semantic role values. Thus, the valency structure is one of the formal exponents of meaning and, ipso facto, imposes strong granularity constraints on the synonymy of predicative expressions.
Extending the Synset Definition. Traditional descriptions of the vocabulary of natural languages generally distinguish words by the meaning assigned to them. By meaning we understand the user's reference of a word²⁹ to reality (real or fictional). This reference may associate a word with an object, a class of objects, or a bundle of relevant semantic features.
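The requirement that synonyms share the same valency structure, including the same assignment of semantic roles, can be expressed as a simple admission test for synset members. In this hedged Python sketch the frames are simplified assumptions (only two argument positions, invented class labels), and `can_share_synset` is an illustrative helper rather than part of PolNet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slot:
    role: str      # semantic role of the argument position, e.g. "Agent"
    semantic: str  # semantic constraint on the filler (a noun-synset class)
    case: str      # morpho-syntactic constraint on the filler (grammatical case)


@dataclass(frozen=True)
class PredicativeUnit:
    form: str                 # simple verb or Vsup+Npred collocation
    valency: frozenset[Slot]  # all argument positions opened by the predicate


def can_share_synset(units: list[PredicativeUnit]) -> bool:
    """True if all units have one and the same valency structure, i.e. the same
    roles with the same semantic and morpho-syntactic constraints."""
    return len({unit.valency for unit in units}) <= 1


# Simplified frames: 'pomagać komuś' and 'udzielać pomocy komuś' take a dative
# argument, whereas 'wspierać kogoś' takes an accusative one.
dat_frame = frozenset({Slot("Agent", "człowiek", "nom"), Slot("Beneficiary", "istota", "dat")})
acc_frame = frozenset({Slot("Agent", "człowiek", "nom"), Slot("Beneficiary", "istota", "acc")})

pomagac = PredicativeUnit("pomagać", dat_frame)
udzielac_pomocy = PredicativeUnit("udzielać pomocy", dat_frame)
wspierac = PredicativeUnit("wspierać", acc_frame)

print(can_share_synset([pomagac, udzielac_pomocy]))            # → True
print(can_share_synset([pomagac, udzielac_pomocy, wspierac]))  # → False
```

Under this criterion wspierać cannot join the synset of pomagać despite the closeness of meaning, which illustrates the strong granularity constraint that valency imposes on synonymy.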
From the point of view of knowledge representation, particular importance is attached to the noun and verb categories. Both of these categories comprise simple and complex forms. Typical meanings of nouns are entities (physical or abstract) or their descriptions. Typical meanings of verbs (simple or compound) are relationships between entities (physical or abstract), as well as states and events involving entities (or other states, events, etc.).
The dictionary is extended with synsets containing predicative expressions (predicative verbs and nouns, predicative collocations, etc.) on the basis of predicative uses found in text corpora. Analysis of the usage contexts provided the syntactic and semantic information needed for further work.
As in the lexicographic tradition (traditional dictionaries), during the development of the current versions of the PolNet system, examples were grouped to illustrate related uses while showing differences in surface realization (see, for example, Polański, 1980-1992). For verbs and other predicative expressions, the key property defining their meaning (in the above sense) is, according to the widespread opinion of many linguists, the way they function in the structure of the sentence, in which they play a central (organizing) role.
The description of the function of the entry in the structure of the sentence specifies the conditions of connectivity between the predicative expression (verb) and noun groups (as arguments).Connectivity conditions are described in the valency structure, which consists of appropriately selected syntactic frames obtained from the analysis of empirical material (corpus).
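One way to picture the step from corpus material to a valency structure is to count the syntactic frames attested for a predicate and keep the sufficiently well-attested ones. The Python sketch below assumes pre-annotated uses and a frequency threshold; the text only says that frames are appropriately selected, so `min_count` is merely a stand-in for whatever selection criteria are actually applied, and the role labels are hypothetical.

```python
from collections import Counter

# Hypothetical corpus evidence for one predicate: each use is reduced to the
# (role, case) pairs realized in the sentence.
USES = [
    (("Agent", "nom"), ("Beneficiary", "dat")),
    (("Beneficiary", "dat"), ("Agent", "nom")),  # same frame, different word order
    (("Agent", "nom"), ("Beneficiary", "dat"), ("Domain", "loc")),
    (("Agent", "nom"), ("Beneficiary", "dat"), ("Domain", "loc")),
    (("Agent", "nom"), ("Beneficiary", "acc")),  # rare or noisy pattern
]


def select_frames(uses, min_count=2):
    """Return syntactic frames attested at least `min_count` times, most frequent first.

    Frames are normalized by sorting their (role, case) pairs, so that word-order
    differences do not produce distinct frames.
    """
    counts = Counter(tuple(sorted(use)) for use in uses)
    return [frame for frame, n in counts.most_common() if n >= min_count]


for frame in select_frames(USES):
    print(frame)
```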
Implementation of Valency Structures. The (simplified) example below is meant to give a rough idea of how simple valency structures are implemented for predicative words. It is the code generated for the synset that includes four synonyms with a common meaning that may be translated into English as to help. This synset is composed of two Polish simple predicative verbs (Vpred) (pomóc and pomagać) and two (predicative) verb-noun collocations (Vsup+Npred) (udzielić pomocy and udzielać pomocy). (Each of the two
Intra-Synset Variations. Adopting the concept of meaning as the starting point for the construction of lexical ontologies has important consequences for the utility of ontologies in engineering practice. The critical point is the scope of the commonly used, and thus vague, term meaning. The carriers of intra-synset distinctions are often, but not exclusively, support verbs (Vsup) in predicative compound constructions, such as verb-noun collocations (Vsup+Npred).
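The four members of the 'to help' synset named above show how intra-synset distinctions can be recorded without splitting the synset: the members share one meaning but differ in aspect, and in the two collocations the aspect is carried by the support verb (udzielić vs udzielać). The Python sketch below is hypothetical as to its record layout and attribute names; the aspect values follow standard Polish grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    form: str
    kind: str    # "Vpred" (simple verb) or "Vsup+Npred" (collocation)
    aspect: str  # "perf" or "imperf"; in collocations it is carried by the Vsup


# Members of the 'to help' synset; the record layout is a hypothetical sketch.
HELP_SYNSET = [
    Member("pomóc", "Vpred", "perf"),
    Member("pomagać", "Vpred", "imperf"),
    Member("udzielić pomocy", "Vsup+Npred", "perf"),
    Member("udzielać pomocy", "Vsup+Npred", "imperf"),
]


def select(synset, **wanted):
    """Return the forms of members whose intra-synset attributes match all requested values."""
    return [m.form for m in synset if all(getattr(m, k) == v for k, v in wanted.items())]


print(select(HELP_SYNSET, aspect="perf"))                       # → ['pomóc', 'udzielić pomocy']
print(select(HELP_SYNSET, kind="Vsup+Npred", aspect="imperf"))  # → ['udzielać pomocy']
```

Keeping aspect (and, potentially, register) as member-level attributes lets an application choose a variant without affecting the shared meaning and valency structure of the synset.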
Vsup plays an important role in the interpretation of complex predicative structures because: 1) in many cases it allows one to resolve the polysemy of the predicative form (here: a predicative noun), which is important from the point of view of applications, and 2) it brings information about register and aspect.
In her recent work on meaning-related aspects of support verbs (Vsup), Grażyna Vetulani exhaustively analyzes the role of Vsup in complex predicative expressions. She observes that despite its apparently subordinate, non-predicative role, the support verb brings important semantic and grammatical information to the meaning of the whole predicative expression. In particular, support verbs often serve to determine the register and aspect of a collocation and, ipso facto, to concretize the meaning of its predicative noun (Npred) (Vetulani, G., 2022).

Valency Structures in Lexicon-Grammar VerbNets.
Storing the valency structure together with synsets (as part of their description) brings a number of benefits for use in NLP applications (parsing, computer understanding, text processing) and is consistent with the idea and practice of Lexicon-Grammar as a tool dedicated to the broadly understood area of Language Technology for real utility applications (Gross, M., 1979).
An example of the use of a lexicon-grammar for Polish (built on the basis of PolNet v3) is the prototype of the POLINT-112-SMS system, intended to support information management and decision making in emergency situations (see e.g., Vetulani et al., 2010; Vetulani & Marciniak, 2011; Vetulani & Osiński, 2017). The system is able to interpret SMS text messages, as well as to understand and process information provided by the human user.
POLINT-112-SMS has also proven to be a good environment for testing the usefulness of lexicon-grammars in the creation of utility applications. In particular, easy access to valency information facilitated the creation of simple heuristics allowing for effective (smart) reduction of the search space in syntactic and semantic analysis (parsing). This feature enables computationally cheap creation and testing of prototypes of utility systems or their replaceable modules, as well as the development of systems with multilingual competence.
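The heuristics used in POLINT-112-SMS are not spelled out here, so the following Python sketch only illustrates the general idea under simplifying assumptions: candidate role assignments for the noun phrases of a sentence are enumerated, and those violating the case or semantic constraints of the valency structure are discarded at once. The valency entry, the semantic classes and the `assignments` helper are hypothetical.

```python
from itertools import permutations

# Hypothetical valency structure: role -> (required case, required semantic class).
VALENCY = {"Agent": ("nom", "human"), "Beneficiary": ("dat", "living")}

# Hypothetical semantic classes of noun lemmas (a stand-in for synset lookup).
CLASSES = {
    "ratownik": {"human", "living"},
    "turysta": {"human", "living"},
    "pies": {"living"},
    "lina": {"object"},
}


def assignments(noun_phrases):
    """Yield role -> lemma assignments compatible with the valency structure.

    `noun_phrases` is a list of (lemma, case) pairs found in the sentence. Each
    noun phrase fills at most one slot; assignments violating a case or semantic
    constraint are dropped instead of being passed on to later analysis stages.
    """
    roles = list(VALENCY)
    for combo in permutations(noun_phrases, len(roles)):
        if all(case == VALENCY[role][0] and VALENCY[role][1] in CLASSES.get(lemma, set())
               for role, (lemma, case) in zip(roles, combo)):
            yield dict(zip(roles, (lemma for lemma, _ in combo)))


nps = [("ratownik", "nom"), ("turysta", "dat"), ("lina", "acc")]
print(list(assignments(nps)))  # → [{'Agent': 'ratownik', 'Beneficiary': 'turysta'}]
```

For the three noun phrases in the example, only one of the six ordered role assignments survives, which is the kind of search space reduction that easy access to valency information makes possible.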

Verb-Noun Collocation Gathering.
The family of verb-noun collocations is (in Polish and many other Indo-European languages) an important group of compound verbs, typically built of 1) an abstract predicative noun (Npred) with a semantic and semantic-syntactic function, and 2) a support verb (Vsup), whose main role is to introduce the predicative component (e.g., Npred) and (often) to convey pragmatic aspects (Vetulani, G., 2022). In some cases, the support verb is omitted from the surface structure (ellipsis). The semantic-syntactic function is primarily realized by the valency structure, which fixes the conditions for connecting the predicate with its arguments. For most languages, compound words, both nominal and (even more so) verbal, are less well described than single-word forms. This is largely due to the scarcity of empirical research.
Various stages of work on the lexicon of verb-noun collocations were described in (Vetulani, G., 2000) and (Vetulani, G., 2012), and with the 2012 publication the lexicon was made available in digital form. The resulting dictionary resource was used in the implementation of valency structures in PolNet v3. For this reason, PolNet v3 may be considered the first mature version of the Lexicon-Grammar Verbnet (Vetulani, Z. & Vetulani, G., 2016).
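Candidate collocations can be harvested from a lemmatized corpus by looking for known support verbs followed closely by candidate predicative nouns. The Python sketch below is a simplified, hypothetical illustration: the seed list of support verbs, the window size and the toy corpus are assumptions, elliptical uses without a Vsup are not handled, and each candidate would still require careful validation before being accepted as a Vsup+Npred collocation.

```python
from collections import Counter

# Hypothetical seed list of support-verb lemmas; a real resource would be far larger.
SUPPORT_VERBS = {"udzielić", "udzielać", "podjąć", "podejmować"}


def candidate_collocations(sentences, nouns, window=3):
    """Count (Vsup, noun) pairs in which a known support verb is followed,
    within `window` tokens, by a lemma from `nouns` (candidate Npreds).

    `sentences` is an iterable of lemmatized token lists.
    """
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token in SUPPORT_VERBS:
                for following in tokens[i + 1:i + 1 + window]:
                    if following in nouns:
                        counts[(token, following)] += 1
    return counts


# Tiny, invented lemmatized corpus.
corpus = [
    ["ratownik", "udzielić", "pomoc", "turysta"],
    ["rząd", "podjąć", "decyzja"],
    ["lekarz", "udzielać", "pomoc", "chory"],
]

for pair, n in sorted(candidate_collocations(corpus, {"pomoc", "decyzja"}).items()):
    print(pair, n)
```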

Final Comments
The transition from PolNet v1 to PolNet v2, in which the concept of valency structure was introduced, was a significant step towards Lexicon-Grammar. Starting from PolNet v2, valency structures were used in PolNet systems as the basic exponent of the meaning of collections of predicative expressions (simple or complex) organized in synsets. Because PolNet requires the syntactic patterns of all elements of a synset to be mutually compatible, the valency structure is ipso facto a determinant of the meaning of the predicative synset. The work currently under way aims to significantly increase the lexical and linguistic coverage of the class of complex predicative expressions and to extend research to the pragmatic layer of the Polish language.
Positive results in terms of the practical usefulness of the lexical ontology model³² indicate natural directions for continuing previous work. These will be:
• at the grammatical description level: work covering the syntactic and semantic levels, corresponding primarily to the needs generated by emerging application perspectives; this work will require further acquisition of empirical data from representative corpora attesting the use of units,
• at the pragmatic level (currently in the initial phase): extension of the model with new factors that may allow the internal structure of synsets (intra-synset relations) to be taken into account,
• at the tool level: development and implementation (or adaptation of existing ones) of the most effective systems for collecting the necessary empirical data.