The Démonette Database

Overview

The Démonette database (under the CC-BY-SA 4.0 license) is structured in two tables, both freely searchable online and downloadable: the table of lexemes and the table of relations.

At present, the table of lexemes contains 287,630 different lexemes. X of them are involved in about 80,000 distinct derivational relations in the table of relations.

The database combines information from several sources: Converts, Démonette 1.2, Dénom (11 sub-bases), Dérif (9 sub-bases), DiMoc (9 sub-bases) and Mordan. These bases are heterogeneous and had to be harmonized and completed to fit the format expected in Démonette. For animate nouns, only the masculine form has been entered (e.g. épicier ‘male grocer’), coindexed with the feminine form (épicière ‘female grocer’). Cases that were too problematic in terms of semantic or formal analyzability were set aside.

Each of these (sub-)databases documents an affix or a family of affixes. In all, 117 affixes were processed, corresponding to 144,206 relations. Some relations are described in several databases; in Démonette, these duplicates have been merged. After merging, the database contains 81,232 distinct relations. For each of them, one can query how each lexeme is constructed with respect to the other, the complexity of the relation between the two lexemes, and its orientation.
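The merging step can be pictured as deduplication on the lexeme pair. The sketch below is a minimal illustration under stated assumptions, not Démonette's actual pipeline: it assumes a relation is identified by its ordered pair (W1, W2) and that each input record carries the name of the sub-base it comes from. The `Relation` record, `merge_relations` and the sub-base labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """A derivational relation between two lexemes (hypothetical record layout)."""

    w1: str
    w2: str
    sources: set = field(default_factory=set)  # sub-bases that describe this relation


def merge_relations(records):
    """Merge (w1, w2, source) records so that each (w1, w2) pair appears once.

    Assumption: two records describe the same relation when they share the
    same ordered pair (w1, w2); their sources are pooled.
    """
    merged = {}
    for w1, w2, source in records:
        key = (w1, w2)
        if key not in merged:
            merged[key] = Relation(w1, w2)
        merged[key].sources.add(source)
    return merged


# Illustrative input: one relation reported by two hypothetical sub-bases.
records = [
    ("lavage", "laver", "sub_base_A"),
    ("lavage", "laver", "sub_base_B"),
    ("lavable", "laver", "sub_base_A"),
]
merged = merge_relations(records)
print(len(merged))  # 2
```

Keeping the set of sources, rather than simply dropping duplicates, preserves the information that a relation is attested in several sub-bases.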

Among the 117 affixes processed,

  • 91 are used to form adjectives
  • 13 are used to form nouns
  • 5 are used to form verbs
  • 7 are used to form words ambiguous between noun and adjective
  • finally, the conversion process may result in verbs, nouns and adjectives.

From a formal point of view, we chose to identify the derivational stem of the derived word with one of the stems in the inflectional paradigm of its base. The rest of the derived word is the graphical form (variant) of the affix. For example, the base of the adjective blanchelet (‘whitish’) is the adjective blanc (‘white’), represented by the graphical sequence ‘blanch’ of the feminine wordform ‘blanche’. Blanchelet is therefore analyzed as suffixed with the variant -elet of the suffix -et. For each of the 117 affixes, between 1 and 88 variants have been distinguished; in total, 448 exponent variants have been identified.
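The stem-plus-variant analysis can be sketched as a search over the stems of the base's paradigm. This is a minimal illustration, assuming the candidate stems are already known and using a deliberately tiny, hypothetical variant table (only two variants of -et); it handles suffixation only, and `segment` and `VARIANT_TO_AFFIX` are our own names, not part of Démonette.

```python
from typing import Optional, Tuple

# Hypothetical, deliberately tiny variant table: graphical variant -> affix it realizes.
# Only two variants of the suffix -et are listed here, for illustration.
VARIANT_TO_AFFIX = {"et": "-et", "elet": "-et"}


def segment(derived: str, base_stems: list) -> Optional[Tuple[str, str, str]]:
    """Split a derived word into (stem, variant, affix).

    The stem must be one of the stems of the base's inflectional paradigm;
    whatever follows it in the derived word is read as the affix variant.
    Longer stems are tried first, so 'blanch' is preferred over 'blanc'.
    """
    for stem in sorted(base_stems, key=len, reverse=True):
        if derived.startswith(stem):
            variant = derived[len(stem):]
            if variant in VARIANT_TO_AFFIX:
                return stem, variant, VARIANT_TO_AFFIX[variant]
    return None  # no paradigm stem followed by a known variant fits the word


# Stems of the adjective blanc, read off its wordforms blanc / blanche.
print(segment("blanchelet", ["blanc", "blanch"]))  # ('blanch', 'elet', '-et')
```

With only the stem ‘blanc’ available, the remainder ‘helet’ is not a known variant and the function returns None, which mirrors the choice of taking the stem from the feminine wordform.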

Tagset

The relation (W1,W2) between two lexemes of the same derivational family is characterized by two features, complexity and orientation (see illustration: family of the verb “laver” ‘wash’).

  • A simple relation (W1,W2) connects a derived word to its base. If W1 is the derived word (lavageN-laverV), the relation goes from descendant to ascendant (des2as); otherwise it goes from ascendant to descendant (as2des) (laverV-lavageN). It is non-oriented (NA) when there is no formal way to determine which of the two words is derived from the other (volN-volerV).
  • A simple indirect relation connects two words derived from the same base (lavageN-lavableA).
  • A complex direct relation connects a descendant to one of its ascendants, with at least one intermediate derivational step between them (between laverV and relavageN, the derivation necessarily goes through either lavageN or relaverV). It can be as2des (laverV-relavageN) or des2as (relavageN-laverV).
  • An accidental relation involves a pattern found in a single word pair, either (i) from a morphological point of view: in mentirV ‘lie’ - mensongeN ‘lie’, the nominal suffix -onge is not attested anywhere else in the French lexicon, or (ii) from a semantic point of view: the verb asperger ‘spray’ and the noun asperge ‘asparagus’ are formally connected but not semantically.
  • Otherwise, a relation is complex indirect (relavageN-lavableA).
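The simple/complex and direct/indirect distinctions follow from the shape of the derivation graph. The sketch below computes them for the family of laver, assuming the base-to-derived edges in `EDGES` (taken from the examples above). Returning `indirect` as the orientation of indirect relations is our assumption, chosen to match the `indirect` orientation value used in the database counts. NA and accidental relations are not modeled, since deciding them requires formal and semantic information the graph alone does not contain.

```python
from collections import deque

# Assumed base -> derived edges for the family of "laver".
EDGES = {
    "laver": {"lavage", "relaver", "lavable"},
    "lavage": {"relavage"},
    "relaver": {"relavage"},
}


def bases(word):
    """Direct bases of a word (the words it is derived from in one step)."""
    return {b for b, derived in EDGES.items() if word in derived}


def ancestors(word):
    """All words from which `word` is derived, directly or through intermediate steps."""
    seen, queue = set(), deque([word])
    while queue:
        for b in bases(queue.popleft()):
            if b not in seen:
                seen.add(b)
                queue.append(b)
    return seen


def classify(w1, w2):
    """Return (complexity, orientation) for the pair (w1, w2)."""
    if w2 in bases(w1):
        return "simple", "des2as"
    if w1 in bases(w2):
        return "simple", "as2des"
    if bases(w1) & bases(w2):  # two words derived from the same base
        return "simple indirect", "indirect"
    if w2 in ancestors(w1):
        return "complex direct", "des2as"
    if w1 in ancestors(w2):
        return "complex direct", "as2des"
    return "complex indirect", "indirect"


print(classify("laver", "relavage"))    # ('complex direct', 'as2des')
print(classify("relavage", "lavable"))  # ('complex indirect', 'indirect')
```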

Relations determine the morphological series of the connected lexemes. These are represented by the common stem (X) and an exponent when relevant: e.g. (X,Xable) for (laverV, lavableA).

Each series belongs to a derivational type: suf (Xable), pre (antiX), NA (X), pre-suf (reXage).
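The derivational type can be read directly off a series pattern: material before X is a prefix, material after it a suffix. A minimal sketch, assuming patterns contain a single uppercase X standing for the stem; the helper names are hypothetical, and the stem ‘lav’ chosen for relavage is our illustration. `series_pattern` only handles the derived member of a pair, since the base itself is written X in a series such as (X, Xable).

```python
def series_pattern(word: str, stem: str) -> str:
    """Write a derived lexeme as a series pattern by replacing its stem with X."""
    if stem not in word:
        raise ValueError(f"{stem!r} does not occur in {word!r}")
    return word.replace(stem, "X", 1)


def derivational_type(pattern: str) -> str:
    """Classify a pattern such as 'reXage' as suf, pre, pre-suf or NA."""
    prefix, _, suffix = pattern.partition("X")
    if prefix and suffix:
        return "pre-suf"
    if prefix:
        return "pre"
    if suffix:
        return "suf"
    return "NA"  # bare stem, as in conversion


for p in ["Xable", "antiX", "X", series_pattern("relavage", "lav")]:
    print(p, derivational_type(p))
```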

Some derivational relations (W1,W2) are atypical: their meaning and form do not match. To express this mismatch, two additional values are added to the tagset for the complexity feature of (W1,W2). This is illustrated in the figure below (family of the noun “école” ‘school’).

  • A (W1,W2) relation is formally but not semantically motivated (motiv-form) when the form of the derived word (e.g. W2) is derived from that of the base (e.g. W1) but the definition of W2 cannot be deduced from the meaning of W1. For example, scolariserV ‘send to school’ is formed on the adjective scolaireA ‘educational, of school’ by suffixation with -iser (modulo the /ɛ/-/a/ alternation in the last syllable of the base stem), but the verb does not mean “make something/someone educational”.
  • A (W1,W2) relation is semantically but not formally motivated (motiv-sem) when the definition of the derived word (e.g. W2) can be computed from the meaning of the base (e.g. W1), but the formal connection between W2 and W1 is indirect or complex. For instance, scolariserV can be defined directly from the meaning of écoleN ‘school’ (to scolariser a child means to send him or her to school), but there is no direct formal link between the two words.
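The two mismatch values amount to two yes/no judgements about a pair: is the form of W2 directly derived from W1, and is its meaning computable from W1? A minimal sketch of that decision, assuming the judgements are supplied by an annotator; the `Complexity` enum (listing the five complexity values used in the database counts) and `mismatch_value` are our own encoding, not Démonette's storage format.

```python
from enum import Enum
from typing import Optional


class Complexity(Enum):
    """Complexity values; the last two encode form/meaning mismatches."""

    SIMPLE = "simple"
    COMPLEX = "complex"
    ACCIDENTAL = "accidental"
    MOTIV_FORM = "motiv-form"  # formally but not semantically motivated
    MOTIV_SEM = "motiv-sem"    # semantically but not formally motivated


def mismatch_value(formal: bool, semantic: bool) -> Optional[Complexity]:
    """Pick the mismatch value for (W1, W2), or None if neither mismatch applies.

    formal:   the form of W2 is derived directly from the form of W1
    semantic: the definition of W2 is computable from the meaning of W1
    """
    if formal and not semantic:
        return Complexity.MOTIV_FORM
    if semantic and not formal:
        return Complexity.MOTIV_SEM
    return None


# The two scolariser relations discussed above, keyed by (W1, W2).
ecole_family = {
    ("scolaire", "scolariser"): mismatch_value(formal=True, semantic=False),
    ("école", "scolariser"): mismatch_value(formal=False, semantic=True),
}
print(ecole_family[("école", "scolariser")].value)  # motiv-sem
```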

The relations make it possible to group the lexemes of the database into derivational families. Some illustrations:

  1. “Classical” derivational family:
Family centered on the verb “laver” (wash): “relaver” (rewash), “lavable” (washable), “lavage” (washing), “relavage” (rewashing).

  2. “Unconventional” derivational family:
Family centered on the noun “école” (school): “scolaire” (educational, of school), “écolier” (schoolchild), “scolariser” (send to school), “déscolariser” (unschool, remove from school)
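Grouping lexemes into families amounts to finding the connected components of the relation graph. The sketch below uses a union-find structure on illustrative (W1, W2) pairs built from the two families above; the scolariser pairs come from the text, while the links for écolier and déscolariser are assumed for the example. The function name `families` is hypothetical.

```python
def families(relations):
    """Group lexemes into derivational families with a union-find structure.

    Each (w1, w2) relation merges the families of its two lexemes, so a
    family is a connected component of the relation graph.
    """
    parent = {}

    def find(word):
        parent.setdefault(word, word)
        while parent[word] != word:
            parent[word] = parent[parent[word]]  # path halving keeps trees shallow
            word = parent[word]
        return word

    for w1, w2 in relations:
        parent[find(w1)] = find(w2)

    groups = {}
    for word in list(parent):
        groups.setdefault(find(word), set()).add(word)
    return list(groups.values())


# Illustrative relations; the écolier and déscolariser links are assumed.
relations = [
    ("lavage", "laver"), ("relaver", "laver"), ("lavable", "laver"),
    ("relavage", "lavage"), ("relavage", "relaver"),
    ("écolier", "école"), ("scolariser", "école"), ("scolariser", "scolaire"),
    ("déscolariser", "scolariser"),
]
for family in families(relations):
    print(sorted(family))
```

Because scolariser is linked both to école and to scolaire, the “unconventional” family still comes out as a single component even though scolaire and école share no direct relation in this sample.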

The 81,232 relations in Démonette are distributed as follows, according to their complexity and orientation:

Complexity:

  • 76,270 simple relations
  • 116 complex relations
  • 3,913 semantically motivated relations
  • 769 formally motivated relations
  • 164 accidental relations

Orientation:

  • des2as: 70,123
  • indirect: 8,345
  • NA: 2,687
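A quick arithmetic check on these figures, with each category expressed as a share of the 81,232 relations. The five complexity counts add up exactly to 81,232; the three orientation counts as listed add up to 81,155. The dictionary names and the `shares` helper are ours.

```python
# Counts as listed in this section.
COMPLEXITY = {
    "simple": 76_270,
    "complex": 116,
    "motiv-sem": 3_913,
    "motiv-form": 769,
    "accidental": 164,
}
ORIENTATION = {"des2as": 70_123, "indirect": 8_345, "NA": 2_687}
TOTAL_RELATIONS = 81_232


def shares(counts, total=TOTAL_RELATIONS):
    """Percentage of all relations in each category, rounded to 0.1."""
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


print(sum(COMPLEXITY.values()), sum(ORIENTATION.values()))  # 81232 81155
print(shares(COMPLEXITY))
```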

Samples

Below are samples of entries in the database, corresponding to the main derivational processes currently represented in these relations. These are:

  • V > N and N > V conversion, and verb-based suffixation deriving agent nouns ending in -eur, -euse, -rice, action nouns ending in -ion, -ment, -age, and adjectives ending in -if, -able
  • formation of:
    • noun-based adjectives suffixed with -ique, -al, -aire or prefixed with anti-
    • nominal and adjectival diminutives in -et and -ette
    • noun-, verb- and adjective-based verbs prefixed with en- or dé-, or suffixed with -iser
    • nouns derived by suffixation with -aie, -at, -ier
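Since the tables are downloadable, a first sample of a given process can be pulled out by filtering on word endings. The sketch below assumes a tab-separated export with hypothetical columns `W1` (derived word) and `W2` (base); the real column names should be checked against the actual download. An ending is only a rough proxy: ‘maison’ ends in -ion without being an action noun, whereas the database itself records the affix and its variant for each relation.

```python
import csv
import io

# Hypothetical two-column export (derived word W1, base W2); the real download
# has its own column names, which should be checked before reusing this code.
SAMPLE = "W1\tW2\nlaveur\tlaver\nlavage\tlaver\nlavable\tlaver\n"


def pairs_with_ending(tsv_text, endings):
    """Return (W1, W2) pairs whose first lexeme ends with one of `endings`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [(row["W1"], row["W2"]) for row in reader if row["W1"].endswith(tuple(endings))]


agent_nouns = pairs_with_ending(SAMPLE, ["eur", "euse", "rice"])
action_nouns = pairs_with_ending(SAMPLE, ["ion", "ment", "age"])
print(agent_nouns)   # [('laveur', 'laver')]
print(action_nouns)  # [('lavage', 'laver')]
```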