Jun Kawasaki.

Augmenting English with a Japanese-Inspired Agentive Noun System

Jun Kawasaki

Augmenting English with a Japanese-Inspired Agentive Noun System for Actor-Network Theory Modeling

Abstract

Actor–Network Theory (ANT) requires a consistent way to denote the myriad human and non-human actants in a network. However, English's current agent-noun formation system (using suffixes like -er, -or, -ist, etc.) is limited in productivity and semantic nuance. This paper proposes a theoretical and computational solution: an extension of English's morphological space inspired by Japanese agentive noun morphology (e.g. 〜者, 〜手, 〜家). We first discuss ANT's need for systematic actant labeling and analyze how English's existing agentive suffixes lack granularity and regularity. We then introduce a new set of generalized English agent-noun suffixes (e.g. -ôr, -eêr, -îst, -ânt, -êé, -hēad, -shōp, -êrē) modeled after Japanese, each denoting a distinct role type. An algorithmic word-formation model is presented to generate these agent nouns from arbitrary stems, with phonological rules for vowel linking, consonant assimilation, and exceptions. Use cases are explored in natural language processing pipelines (for automatically labeling semantic roles), knowledge graph ontologies (for naming relational nodes), and large language model (LLM) fine-tuning for relational semantics. We discuss limitations of this approach, implications for cross-linguistic morphology, and the potential of a more universal, high-resolution system of actant labeling. The proposed extension aims to enrich the English lexicon systematically, enhancing the computational representation of complex networks of actors.

Introduction

In Actor–Network Theory (ANT), actants – the actors in a network – can be literally any entity, human or non-human, that is granted to be the source of an action. ANT's principle of generalized symmetry holds that all these entities should be described in the same terms, without privileging humans over non-humans. In practice, this means that when modeling or analyzing a sociotechnical network, we need a consistent linguistic scheme to label every actant in the network. Whether the actant is a person, an organism, a machine, or an abstract artifact, it should be identifiable as playing a distinct role in the network's interactions.

However, the English language lacks a productive, systematic set of agentive nouns to uniformly label such actants. English does have agent nouns – words that mean "entity that does X" – typically formed by derivational suffixes (e.g. -er in driver from drive). Yet the system is irregular and limited. The suffix -er is the most common and can attach to many verbs (run → runner, build → builder), but it also overlaps with instruments (cutter, printer) and has exceptions. Other suffixes like -or, -ist, -ian, or Latinate -ant exist, but their usage is not fully productive or consistent. Many actions have no straightforward agentive noun (for example, steal has thief rather than stealer, and while teach yields teacher, the everyday counterpart for learn is student rather than learner), or they rely on periphrasis ("one who X"). Moreover, the semantic nuances of these suffixes are coarse – English does not routinely mark the difference between, say, a temporary performer of an action and a professional practitioner, except via context or additional descriptors.

This limitation is not just linguistic trivia; it poses a modeling challenge when we attempt to computationally represent networks of actors. In ANT-inspired data models or knowledge graphs, one might wish to label nodes by their roles ("driver", "regulator", "observer"). Using English's ad-hoc vocabulary can lead to inconsistent labels, polysemous terms, or gaps where no single-word label exists. A more systematic approach to naming actants could improve clarity and uniformity in such representations. For example, if an environmental sensor in a network is acting to trigger an alarm, English might label it simply "sensor" or "detector" (nouns formed with -or, a suffix that is no longer freely productive, so the pattern cannot be extended to arbitrary verbs). If a policy document is playing an enabling or constraining role, we might call it an "enabler" or "restrictor" – but those terms feel less standardized, and one often resorts to phrases ("the document which enables X"). A morphologically regular system could allow any verb or noun to be converted into an agentive label systematically, easing the task of naming actants in algorithms and analyses.

To find inspiration for such a system, we turn to Japanese. The Japanese language exhibits a rich set of agentive noun morphemes (often Sino-Japanese suffixes) that provide fine distinctions in meaning. For instance, Japanese can form agent nouns using 〜者 (-sha, denoting a person who does X in a general sense), 〜手 (-shu/te, denoting someone who does X as a skilled role or occupation), 〜家 (-ka, denoting an expert or professional in X, often artistic or academic), among others. These suffixes are highly productive within their categories and allow nuanced role identification. For example, a "driver" in Japanese could be untenshu (運転手, a driver by occupation) versus untensha (運転者, a driver in the context of someone who is driving at the moment). The difference is encoded morphologically: 〜手 indicates a permanent or skilled role (a paid driver), whereas 〜者 indicates a situational role (whoever is driving, e.g. in an accident report). Likewise, an "author" can be sakka (作家, a writer by profession, literally "make家") versus chosha (著者, the author of a specific work). Japanese even distinguishes certified experts or official roles with suffix 〜士 (-shi, e.g. kaikeishi 会計士 "accountant", implying an accredited status) as opposed to 〜師 (-shi, "master/expert", as in kyōshi 教師 "teacher/instructor", or kangoshi 看護師 "nurse"). In summary, Japanese provides a high-resolution morphology for agent nouns: different suffixes mark nuances of temporality, professionalism, expertise, or social standing of the actor.

In this paper, we propose to augment English with a similar multi-suffix system of agentive nouns. By introducing a set of novel derivational suffixes (marked with diacritics here to distinguish them from existing forms) and corresponding formation rules, we aim to emulate the productive precision of Japanese while fitting into English phonology and usage. We present:

  • Background on ANT's requirements and a comparison of English and Japanese agent-noun systems, highlighting gaps in English (Section 2).
  • A set of new agentive suffixes for English (e.g. -ôr, -eêr, -îst, -ânt, -êé, -hēad, -shōp, -êrē) with defined semantic roles, inspired by Japanese categories (Section 3.1).
  • An algorithmic model for word formation using these suffixes, including orthographic and phonological rules for attaching them to English stems (Section 3.2). This model enables generation of agent-noun variants from arbitrary verbs or nouns (e.g. given a verb "drive", produce labels for "one who drives (generically)", "professional driver", "one who is driven", etc.).
  • Use cases and applications (Section 4) illustrating how this extended morphology can enhance Natural Language Processing (NLP) pipelines (e.g. easier semantic role labeling), knowledge graph modeling (more systematic ontology of roles), and even fine-tuning large language models by providing more explicit relational semantics.
  • Discussion of the limitations of this approach (e.g. learnability, integration with existing vocabulary), cross-linguistic implications (how this idea might transfer to or draw from other languages), and potential steps toward a universal or interlingual system of actant representation (Section 5).

By structuring the paper in this way, we hope to provide a comprehensive view – from linguistic theory to computational implementation – of how a more productive and systematic agent-noun system can be realized in English and why it is beneficial for representing complex networks of actors. We use precise academic English and formal examples throughout, targeting researchers in AI, information technology, and linguistics who are interested in the intersection of language design and knowledge representation.

Background

Actor–Network Theory and the Need for Consistent Actant Labeling

Actor–Network Theory (ANT), developed by Bruno Latour, Michel Callon, John Law, and others, is a framework for describing heterogeneous networks of actors (actants) in both social and technical systems. In ANT, an actant is defined as anything that "acts or to which activity is granted by others". Crucially, an actant can be literally any entity – not only people, but also objects, technologies, organisms, or ideas. For example, Latour famously described microbes in Pasteur's experiments as actors in the network of Pasteur's laboratory; the microbes "cause" fermentation and thus have agency in that network. ANT's radical stance is that human and non-human actors are described in the same terms, per the principle of generalized symmetry. Any differences (e.g. that one is human and another is a chemical reagent) are considered secondary to the roles they play in the network of relations.

This theoretical stance creates a practical linguistic challenge: How do we name or label these actants in a uniform way when we describe or model a network? In scholarly ANT literature, researchers often end up using descriptive phrases or borrowed terms to refer to various actants ("the spokesman", "inscription device", "ally", "intermediary", etc.). But when we move to computational representations – such as encoding an ANT analysis in a knowledge graph or building a simulation – we would ideally assign each actant a clear label indicating its role or action. A consistent labeling scheme would make it easier to map and analyze the network: for example, one might want to label nodes as "regulator", "mediator", "initiator", "receiver", etc., rather than just using the object's name or an ambiguous noun.

Current practice in knowledge representation often falls back on generic terms or proper nouns. In ontology engineering or semantic web data, one might encode roles with phrases (e.g., an ontology might have properties like hasSender and hasReceiver for a communication act). These labels are ad-hoc and the pattern is limited – typically only a few roles (sender/receiver, buyer/seller, etc.) get special names, often borrowed from legal or technical jargon. Furthermore, these terms are not derived systematically from the base action; instead, each is a separate lexical item. For instance, an ontology of a delivery process might have classes like Deliverer and Recipient, or Buyer and Vendor, relying on a mix of derivational patterns (-er) and unrelated words. The lack of a single coherent system means that adding a new type of action often requires inventing or searching for a new noun form, which can lead to inconsistency.

A motivating example from ANT could be the analysis of a scientific laboratory: human scientists, technical instruments, reagents, and written articles are all actants. The scientists act as experimenters, instruments as detectors or measurers, reagents as transformers (e.g. causing reactions), and articles as influencers of other scientists. If we were to computationally model this, it would help if we could systematically name each actant by its function: e.g., mixôr for something that mixes substances, reactânt for something participating in a reaction, readêé for something being read (like a guidebook used by the scientists), and so on. Without a productive system, we might manually coin terms or simply label them by their object name ("Centrifuge_01" with a property "type: instrument"), losing the generalizable notion that it's a centrifuger (something that centrifuges). A systematic agent-noun morphology for English aims to fill this gap by providing a consistent way to derive role labels for any actant.

Limitations of English Agent-Noun Formation

English does have mechanisms to form agent nouns (nouns meaning "one who does X"), but these mechanisms are fragmentary and often non-systematic. The most common agentive suffix in English is -er, inherited from Old English (OE -ere) and productive to this day. We add -er to many verbs to indicate the person or thing that performs the action: driver (one who drives), builder (one who builds), scanner (something that scans). This suffix is relatively productive for a wide range of base verbs, especially Germanic-root verbs. However, -er has notable limitations:

  • Polysemy: -er forms can denote not only persons but also instruments or machines (e.g. printer can mean a machine that prints, blender a device that blends). Context is needed to know if a cutter is a person using a knife or the tool itself. There is no morphological distinction in English between an agentive and an instrumental role – both are "the thing that does X". In contrast, some languages (like Polish or Arabic) use different morphemes for human agent vs tool. This conflation reduces semantic precision in English actant labeling.
  • Limited -er scope: -er typically attaches to verb stems, but it cannot attach to just any noun or adjective to mean "person associated with X" (that role is sometimes filled by other suffixes or compounding). For instance, library + -er does not yield a person (librarian fills that role via -ian). Similarly, to get "person who enjoys X" we often use compounds (coffee lover, music fan) rather than X-er (a coffeeer is not a word). Thus, -er covers many "doer of action" cases but not all actor relationships.
  • Competing Suffixes: English has several Latin-origin agentive suffixes that overlap in function with -er, such as -or, -ist, -ant, and -ian. These usually derive from Latin or French introductions and have more restricted usage:
      • -or: Appears in words like actor, creator, governor. It often attaches to Latinate stems (e.g. investigator from investigate) or is chosen by convention in certain cases (advisor/adviser can be spelled either way). -or is not truly productive in modern colloquial English; one wouldn't coin walkor for a person who walks. It survives in legal or formal contexts (e.g. lessor = one who leases out, parallel to lessee).
      • -ist: Used for persons associated with a field or ideology (e.g. scientist, artist, Marxist) or who practice a skill (pianist, cyclist). It often attaches to nouns or bound roots (science → scientist, art → artist) rather than directly to verbs. -ist has a narrower scope – it implies specialization or adherence. One might say "violinist" for a specialist violin player, but a casual player is just a "violin player" (with a noun-noun phrase). The -ist suffix is productive in certain domains (especially sciences, arts, beliefs) but you wouldn't use it for arbitrary new actions (no one says "cookist" for a cook).
      • -ian / -an: Denotes affiliation or occupation in some cases (e.g. librarian, historian, technician). Often it attaches to nouns to mean "person of X" (country, group, or field), and sometimes to verbs (though usually via a Latin form: music (noun) → musician, from Latin musica + -ian). It's not freely productive for new coinages outside certain patterns.
      • -ant / -ent: Found in words like assistant, servant, student, agent, inhabitant. These mostly come from French present participles or Latin agentive forms. English speakers do not generally create new -ant agent words; the form is tied to specific verbs (we say attendee for one who attends, not attendant in that sense because attendant already means something else). In general, -ant has been lexicalized in specific terms rather than being an active suffix today.
  • Passive Role Suffix: English has a unique suffix -ee, originally borrowed from French -é (past participle) to indicate a person who is the recipient or beneficiary of an action. For example, employee (one who is employed), addressee (one who is addressed), lessee (one who is leased to). In legal terminology, -or and -ee form complementary pairs for agent and patient of transactions (grantor/grantee, lessor/lessee, consignor/consignee). Historically, in Middle English, -or/-ee pairs were a systematic way to denote actors in legal actions. Outside of formal contexts, -ee was later generalized and even used humorously: e.g. spender/spendee (someone who is spent on) as nonce words. Over time, -ee also came to be used for some active meanings (an escapee is one who escapes, not one who is escaped from). The use of -ee in modern English is thus somewhat inconsistent – it usually marks the passive or experiencer role, but in words like attendee or standee it marks an active participant (albeit one whose role is defined relative to the main action, e.g. attendee = one who attends an event, conceived as the passive audience of the event). The key point is that -ee is not fully productive either; it's used in certain contexts (notably, you can often add -ee jocularly for "one who is X-ed", like "glitterati and the glittered-upon (the glitteree)", but many such forms remain nonstandard).

The net effect of these limitations is that English lacks a single coherent paradigm for generating agentive nouns with fine-grained distinctions. The speaker or writer must choose between -er, -or, -ist, etc., often based on convention rather than clear rules, and some nuances cannot be expressed morphologically at all. For example, if we take the verb "drive":

  • A person who drives (in general) is a driver (suffix -er).
  • A person who drives professionally (e.g. a chauffeur or truck driver) is still driver (no special suffix to mark profession; one must say "professional driver" or use a different word like chauffeur).
  • The person being driven has no simple suffix-derived term (we might say passenger or rider, but those come from different roots).
  • The leader of a group of drivers (say, the dispatch manager) isn't indicated by suffix at all (one would say "head driver" or "lead driver" as a phrase).
  • If we even consider an entity that provides driving service (like a ride-share company acting as an agent), English has no agentive noun – we'd call it a service provider, not driveor or similar.

In contrast, other languages often allow more flexibility. As noted, Japanese and some inflected languages can derive multiple nouns from the same root to cover these angles. Even in English's own history, we see hints of what a richer system might look like: for instance, Old English and Middle English had -ster (as in spinster, originally a female spinner), which was a gender-marked agent suffix, and Middle English legal language freely coined -or/-ee pairs. But Modern English has since lost -ster as a productive suffix (and the word spinster changed meaning entirely), and the -or/-ee pairing is confined to formal registers. When new technologies or actions arise, we either use -er (e.g. coder, tweaker), repurpose existing nouns (guru, ninja, junkie for enthusiasts), or create compounds (cat lover, mountain climber). These workarounds illustrate the lack of a systematic, extensible agent-noun toolkit in English.

From the perspective of modeling ANT actants, this is a bottleneck. Every time we need to label a novel actant (especially non-human ones), we either:

  • Coin a new term (which might not be transparent or standard),
  • Use multi-word labels (sacrificing the elegance of a single lexical item), or
  • Overload an existing term (which can cause ambiguity).

For instance, in a network, both a human operator and an automated script could be actors performing a "monitoring" task. If we only have "monitor" as a noun (which is also a device or a screen) or "observer", we might label the human as monitor and the script also as monitor, and an analytic system might not distinguish the two except via additional metadata. A more powerful derivational system could let us label one as a monitorôr (human monitor, perhaps) and the other as monitorbot (if we treat -bot as another suffix for automated agents, hypothetically). The focus of this paper, however, is on a systematic suffixal approach, drawing inspiration from Japanese, which we elaborate next.
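To make the ambiguity concrete, the following Python sketch models two actants that both perform monitoring. It is purely illustrative: the `Actant` class, its field names, and the node names `operator_01` and `watch_script` are hypothetical, and the role labels are simply the two forms suggested above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Actant:
    name: str         # identifier of the entity in the network (hypothetical)
    plain_label: str  # label drawn from ordinary English vocabulary
    role_label: str   # label drawn from a richer derivational scheme


actants = [
    Actant("operator_01", "monitor", "monitorôr"),
    Actant("watch_script", "monitor", "monitorbot"),
]


def group_by(items, attr):
    """Group actant names by the value of the given label attribute."""
    groups = defaultdict(list)
    for actant in items:
        groups[getattr(actant, attr)].append(actant.name)
    return dict(groups)


# With plain labels both actants fall into one group; with role labels
# they are separated without consulting any extra metadata.
print(group_by(actants, "plain_label"))
print(group_by(actants, "role_label"))
```

Grouping by the plain label yields a single "monitor" bucket holding both nodes, whereas grouping by the role label keeps the human and the script apart.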

High-Resolution Agentive Morphology in Japanese

Japanese offers a compelling example of how a language can have multiple productive agentive noun forms to express subtle differences in role. Rather than a one-size-fits-all suffix, Japanese speakers choose from several suffixes (often written in kanji) depending on the nuance. Key agentive suffixes in Japanese include: 者 (しゃ, -sha), 手 (しゅ or て, -shu/te), 家 (か, -ka), 員 (いん, -in), 士 (し, -shi), 師 (し, -shi), among others. Each has a particular semantic or stylistic sphere of usage. We summarize a few important ones:

  • 〜者 (-sha) – General doer/actant: This suffix, meaning "person" or "one who…" in a generic sense, indicates an entity (usually a person, but not necessarily marked by profession) performing an action or being in a state. It is very versatile and often attaches to Sino-Japanese verbal nouns or adjective stems. For example, 運転者 (untensha) means "driver" in the sense of "the person who is (currently) driving". 被害者 (higaisha) means "victim" (literally "sufferer of damage"), using 者 to denote the person experiencing an action. In an ANT context, 者 is akin to a neutral label for someone who occupies a role at a given time. It does not imply professionalism or permanence. Indeed, one explanation contrasts 者 vs 手 by saying 者 can refer to a temporary or contextual role. The baseball example from a Japanese source: the batter in a game is 打者 (dasha, "hit-者") because each batter is a role that changes every at-bat, whereas the pitcher is 投手 (tōshu, "pitch-手") because the pitcher is a fixed position for the game. This nuance is subtle but important: 者 = actant in context, standpoint, or role (often ephemeral).
  • 〜手 (-shu/-te) – Skill-based agent / professional: The character 手 means "hand", and by extension "person skilled with their hands in …" or someone who does something as a vocation. It often indicates a person who is a specialist or practitioner in a certain activity. 運転手 (untenshu) is "driver" as an occupation (e.g. a taxi driver). 歌手 (kashu) is "singer" (literally "song-hand") implying a professional singer. 手 is also seen in words like 相手 (aite, literally "mutual hand") meaning an opponent or partner in an interaction – again focusing on the person's active engagement in an activity. Thus, 手 = agent who is skilled or actively engaged in X, often as their role or job. The nuance difference between 手 and 者 is highlighted by the example: 運転手 vs 運転者 for driver. The former implies a professional driver (e.g. a chauffeur or someone whose living is driving), the latter just means whoever is driving at the moment (like "the driver of the car" in an accident report, who might normally be a student or a teacher by profession).
  • 〜家 (-ka) – Expert or specialist, often artistic/academic: The character 家 (literally "house") in this usage means someone who occupies a certain realm or pursues something as a career or serious endeavor – essentially, a professional or virtuoso. It's frequently used for artists, scholars, or people deeply involved in a field. For example, 漫画家 (mangaka) means "manga artist" (comic creator), 写真家 (shashinka) is "photographer" (art photographer, not just someone who takes photos), 音楽家 (ongakuka) is "musician" (emphasizing the creative profession). 作家 (sakka) means "writer/novelist" (one who writes books for a living), whereas 著者 (chosha) means "author (of a specific work)" – here 作家 (with 家) implies a professional identity, while 著者 (with 者) is tied to the context of a particular book. In sum, 家 = specialist/professional in X, often with a connotation of creativity or expertise. This is analogous to English -ist in some cases (e.g. pianist could be translated as ピアニスト or ピアノ家 if such existed, but Japanese actually uses the loanword ピアニスト for pianist; however, 音楽家 covers being a music professional broadly). The use of 家 tends to elevate the status: e.g. 政治家 (seijika) means "politician" (one who follows the profession of politics) rather than just someone political.
  • 〜員 (-in) – Member of a group / staff: This suffix means a member or personnel in some institution or grouping. E.g. 会社員 (kaishain) "company employee", 店員 (ten'in) "store clerk" (staff), 議員 (giin) "assembly member (legislator)". While not exactly about "one who does X" from a verb, it labels someone by their institutional role. It's productive for roles in organizations. We mention it because in a network context, membership can be a role (though -in attaches to nouns usually, not verbs).
  • 〜士 (-shi) – Licensed or prestigious practitioner: 士 originally means "warrior" or "gentleman" (samurai), and in modern compounds it often indicates a person with a certain certification or professional standing. For example, 弁護士 (bengoshi) "lawyer" (literally "argue-master", but using 士 marks a certain official status), 会計士 (kaikeishi) "certified public accountant". It's used for roles like 消防士 (firefighter), 建築士 (architect - certified). The nuance is of an accredited or formally recognized role. In some cases Japanese had to choose between 士 and 師 (below) for gender-neutral titles, but generally 士 carries prestige or legality.
  • 〜師 (-shi) – Master, teacher, or practitioner (often without formal rank): 師 means "master" or "teacher". It's used in 教師 (kyōshi) "teacher" (as an occupation), 医師 (ishi) "medical doctor" (lit. "medical master"), 美容師 (biyōshi) "beautician/hairdresser". It implies expertise and often teaching or guiding capacity. Unlike 士, it doesn't inherently imply state certification (though many professions with 師 are indeed licensed, like doctor or teacher; the distinction is subtle and sometimes historical). In the context described by the Japanese source, 師 vs 士 sometimes overlapped and modern usage settled on one or the other for various professions (e.g. nurse became 看護師 replacing older gendered terms).

This array of suffixes allows Japanese speakers to create specific agent nouns as needed. If a new action or role comes up, there is a palette of choices. For example, consider the concept of "someone who explains". Japanese has 説明者 (setsumeisha) for "explainer" in the sense of a person who is explaining something in a given moment (者, generic), whereas a professional explainer (like a guide or someone whose job is to explain, perhaps a commentator) might be called 説明員 (if part of an organization) or some context-specific term. If explaining were an art, 説明家 could be coined to mean a person who is a virtuoso at explaining (though that would be rare). The point is, Japanese could adapt one of these morphemes to fit the context of "explainer" with a certain shade of meaning, rather than having just one option "explainer" for all cases.

From an ANT modeling perspective, Japanese's approach means one could label actants in different ways: a human participant might get a 者-suffix label if they are a transient actor in that scenario, whereas a non-human acting in a reliable, skill-like capacity (say an AI system that drives a process) might even be given a 手 label analogically to mean it's the "hand" executing a task. Indeed, in Japanese one sometimes sees anthropomorphic use of these terms for machines (e.g. a vending machine might be called 自動販売機, literally "automatic selling machine," but if casually personified, one might call it 売り手 "seller (hand)" in context).

The key takeaway for our work is that Japanese provides a model of multiple agentive noun types, each systematically derived, which collectively cover a spectrum of roles:

  • Temporary role vs. persistent role (者 vs. 手/家).
  • Amateur or situational vs. professional or expert (者 vs. 手/家/師).
  • Unqualified vs. certified (師 vs. 士, in some fields).
  • Member of a group (員) as a separate notion.

English, in comparison, tends to flatten many of these distinctions. A single English term "teacher" could correspond to 教師 (occupation), 先生 (honorific for teacher), 指導者 (shidōsha, guide/leader using 者), or even 教員 (school staff) depending on nuance. When modeling actants, having these distinctions explicitly available could enrich the representation. It might allow an algorithm to distinguish whether an actor is performing an action as a one-off participant or as a representative of an institutional role, simply by the label's morphology.
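The idea that a label's morphology alone can signal role type can be sketched as a small suffix lookup. The Python below is illustrative only: the feature names (`persistence`, `nuance`), their values, and the function `role_features` are assumptions made for this example, and the mapping encodes only the rough distinctions summarized above, not a full account of Japanese usage.

```python
# Illustrative sketch: infer coarse role features from the final kanji of a
# Japanese agent noun. Feature names and values are assumptions for this
# example; real usage has lexical exceptions (e.g. 相手 is not an occupation).
SUFFIX_FEATURES = {
    "者": {"persistence": "contextual", "nuance": "general doer"},
    "手": {"persistence": "persistent", "nuance": "skilled or occupational"},
    "家": {"persistence": "persistent", "nuance": "expert or professional"},
    "員": {"persistence": "persistent", "nuance": "member of a group"},
    "士": {"persistence": "persistent", "nuance": "certified practitioner"},
    "師": {"persistence": "persistent", "nuance": "master or teacher"},
}


def role_features(label):
    """Return the feature dict for the label's final character, or None."""
    if not label:
        return None
    return SUFFIX_FEATURES.get(label[-1])


for word in ["運転者", "運転手", "作家", "会社員"]:
    print(word, role_features(word))
```

Under these assumptions, 運転者 and 運転手 share a stem but resolve to different features, which is exactly the distinction a single English "driver" cannot carry. A word without one of the listed suffixes, such as 先生, returns `None`.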

Our goal is to bring some of this granularity into English in a systematic way. Instead of relying on entirely different words or multi-word descriptions for these nuances, we propose creating new derivational suffixes that English speakers (or NLP systems) could append to a base word to convey the desired agentive meaning. In the next section, we will introduce these proposed suffixes and describe their intended meanings and correspondences to the Japanese system. We will then detail how to generate new words with them while respecting English phonology and spelling.

Proposed Extension: A System of Productive English Agentive Suffixes

Inspired by the Japanese framework above, we propose a set of eight new agentive noun suffixes for English. Each suffix is designed to be highly productive (able to attach to essentially any verb or relevant noun) and to carry a specific semantic nuance corresponding to a role type. Table 1 summarizes the suffixes, their phonological form (using diacritics here for clarity), their Japanese inspiration or analogous concept, and their intended meaning in English use.

Table 1. Proposed Agentive Suffixes for English and their Semantics

| Suffix | Japanese Analog | Intended Meaning in English | Example Formation (base "X") |
|--------|-----------------|-----------------------------|------------------------------|
| -ôr | ~者 (-sha) | General doer/actant (contextual, temporary) | Xôr = entity that does X (in context) |
| -eêr | ~手 (-shu/te) | Skilled doer or professional in X | Xeêr = practitioner of X (by trade) |
| -îst | ~家 (-ka) | Expert or devotee of X (specialist/enthusiast) | Xîst = specialist or serious player of X |
| -ânt | (various: ~員, participle) | Participant in or performer of X (formal role or participant) | Xânt = one involved in X (often in group/event) |
| -êé | (none exact; cf. -ee) | Recipient or undergoer of X (patient role) | Xêé = one who is X-ed (object of action) |
| -hēad | ~長 (-chō, "chief") | Leader/head of X or of those who do X | Xhēad = leader of X-doers or of X domain |
| -shōp | ~屋 (-ya, "seller") | Dealer or provider related to X (commerce/service role) | Xshōp = one who trades in X or services X |
| -êrē | (no direct analog; cf. Latin -ary) | Person characterized by X or associated formally with X | Xêrē = functionary/agent in realm of X |
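Table 1 can be encoded directly as data. The sketch below is a minimal Python illustration, not the formation model itself: it attaches each suffix by plain string concatenation and applies none of the orthographic or phonological adjustments a full model would need. The role keys (`general`, `professional`, and so on) and the names `SUFFIXES`, `derive`, and `paradigm` are hypothetical.

```python
# Minimal sketch of Table 1 as a lookup table. Attachment is plain string
# concatenation (an assumption for illustration): no vowel linking,
# consonant assimilation, or exception handling is applied.
SUFFIXES = {
    "general":      ("ôr",   "entity that does X (in context)"),
    "professional": ("eêr",  "practitioner of X (by trade)"),
    "expert":       ("îst",  "specialist or serious player of X"),
    "participant":  ("ânt",  "one involved in X"),
    "patient":      ("êé",   "one who is X-ed"),
    "leader":       ("hēad", "leader of X-doers or of X domain"),
    "dealer":       ("shōp", "one who trades in X or services X"),
    "associate":    ("êrē",  "functionary/agent in realm of X"),
}


def derive(stem, role):
    """Return the stem with the suffix for `role` appended unchanged."""
    suffix, _gloss = SUFFIXES[role]
    return stem + suffix


def paradigm(stem):
    """Return every derived form of `stem`, keyed by role name."""
    return {role: derive(stem, role) for role in SUFFIXES}


for role, form in paradigm("drive").items():
    print(f"{form:12} {SUFFIXES[role][1]}")
```

With these definitions, `derive("leak", "general")` gives leakôr and `derive("read", "patient")` gives readêé. Keeping the inventory in a single table means that adding or renaming a suffix does not require touching the derivation logic.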

A few notes on these forms and their names:

  • We use diacritics on the vowels primarily to indicate that these suffixes are novel and distinguished from any existing English suffix. In an actual implementation or usage, they might be rendered without diacritics, but here -ôr can be thought of as a variant of -or, -eêr as a variant of -eer, etc., generalized beyond their current English usages. The diacritics also hint at pronunciation: for instance, -ôr could be pronounced with a long "or" sound /ɔː/ (to differentiate from unstressed /-ər/ of -er), -eêr with a clear "ear" /iːr/ sound, and -îst like -ist (the circumflex just marking it as a concept here).
  • Some of these suffixes have a basis in existing English or French affixes:
      • -eer (without diacritics) exists in words like engineer, auctioneer, mountaineer, often meaning "person engaged in X" or sometimes "person concerned with X" (e.g. racketeer). Our -eêr is meant to generalize that notion to any activity X, akin to Japanese 手.
      • -ist is already common; -îst doesn't change pronunciation but signals our intention to use -ist more freely for any domain where someone can be an expert or devotee. We include it as part of the system for completeness, effectively extending its productivity.
      • -ant is seen in formal words (participant, assistant). Our -ânt would be an explicitly productive suffix to indicate an actor in the context of an event or process (like Japanese 員 or the idea of someone who takes part).
      • -ee is the English patient suffix. -êé in our system is essentially the same concept, but we notate it distinctly to emphasize its role as one option among agentive roles (even though it marks the opposite side of the action). We consider it part of actant labeling – not all actants are initiators; some are relevant as receivers of an action. (We will ensure -êé is only used in appropriate contexts to avoid confusion with existing -ee words or the divergent uses of -ee for active meanings.)
      • -head as an English word means leader or chief (as in department head, or compounds like gearhead as slang for an enthusiast). -hēad as a suffix would attach directly to a stem to denote "leader of X" or "head (chief) X-doer". This draws on the concept of 長 (-chō in Japanese, e.g. 部長 "department chief") but using an English morpheme.
      • -shop in English is a noun (store) or part of compounds (workshop, bookshop). We repurpose it as -shōp to signify a person associated with a shop or trade. This is inspired by the Japanese suffix 屋 (-ya), which literally means "shop" but is used to indicate someone who sells or deals in something (e.g. 魚屋 "fish seller"). So bookshōp would mean "bookseller" (person, not the store itself), coffee-shōp might mean a coffee vendor (though coffeeshop is a place in normal English).
      • -êrē has no direct existing equivalent; it is patterned to resemble certain Latinate endings (like -ary or -ator in functionary, senator, dignitary). We use -êrē to capture roles that are not strictly doing or receiving an action, but are defined by a relation to the action or domain. For instance, someone who is involved in X in a general sense or whose identity is tied to X. This could cover things like missionêrē for "missionary" (one associated with a mission, essentially adopting the -ary meaning), or visionêrē for "visionary" (one characterized by vision). We include it to allow formation of terms that might correspond to where Japanese would use something like 家 or 者 depending on context, but the nuance is "one characterized by or belonging to X."

The guiding design principle is that each suffix corresponds to a role archetype:
- -ôr: the basic, neutral agent. If one were unsure which nuance to pick, -ôr would be the default "one who does X." It mirrors the broad usage of Japanese 者. We choose -or (circumflexed) rather than -er as the form because English -er is already ubiquitous and can remain as a generic agent former. However, -er often implies a habitual or professional aspect in English (e.g. a writer is usually someone who writes regularly). Our -ôr is meant to be even more generic and transient: Xôr = "an entity that is doing X (right now or in this context)." For example, leakôr would mean "something that leaks or is leaking (in this scenario)." If a dam is leaking, the crack could be labeled leakôr as the actant causing the leak. Another example: interruptôr = the one who interrupts (in a conversation, this could be used even for a noise or alarm that caused an interruption, not just a person).
- -eêr: the occupational or skill-based agent. This corresponds to Japanese 手. Xeêr = "doer of X as a specialized activity." Use this when the actant's role is to perform X as a service, job, or skilled function. For a verb like "drive," driveêr (the silent e of the base is dropped before the suffix) would denote "driver by profession or assignment," akin to Japanese 運転手 (professional driver). For "write," writeêr might be used to mean a writer in the sense of a scribe or someone hired to write (English "writer" is already common; our system could produce writeêr to explicitly mean someone doing writing as their job, as opposed to just anyone who writes). Hackeêr could be coined for "hacker" (one who hacks skillfully): "hacker" exists but has ambiguous connotations, whereas hackeêr would fit our scheme as a person whose role is hacking (e.g. an ethical hacker hired to test security).
The -eer sound in English often has a slightly active/doer connotation with perhaps a touch of labor or craft (think engineer, originally engin-eer, one who operates an engine or contrives engines). That fits well with this intended meaning.
- -îst: the expert or enthusiast. This is basically an extension of English -ist. Xîst = a person deeply involved in X, either as an expert practitioner or a devotee of X. This maps to Japanese 家 in many cases, but also covers some -ist uses we already have (like scientist, artist). For example, gardenîst could be coined alongside "gardener" for one who gardens as a serious hobby or art (gardener already takes -er; gardenist might sound like someone interested in garden design or theory). Or cloudîst might humorously be "a cloud-watching enthusiast." We include -îst to ensure that roles involving personal inclination or ideology (which Japanese might handle by specialist terms or compounds) are representable. It carries a more permanent or defining flavor: if someone is an Xîst, X is likely central to their identity or profession (compare artîst vs paintôr, if we had the latter: an artîst does art as an identity, a paintôr would just be someone who happens to be painting something at the moment).
- -ânt: the participant or affiliated agent. This suffix is intended for someone who takes part in or is associated with an action or process, without necessarily leading it or doing it habitually. It aligns loosely with words like attendant, participant, claimant, etc., and with Japanese 員 (member) in spirit (though etymologically -ant comes from Latin present participles meaning "-ing"). Xânt could label someone who is involved in X in a supporting or contextual way. For instance, in a knowledge graph of a medical trial, you might label the trial subjects as testânts (those who undergo a test, analogous to participant but derived from "test").
If the base is "celebrate", celebrânt would mean someone who is celebrating or taking part in a celebration (indeed celebrant is a word in English for the person who conducts a religious ceremony, which shows such forms can exist). We make it fully productive: Xânt = any person engaged in X (especially when X is an event or social process). It's a flexible middle ground between -ôr (the doer who initiates) and -êé (the one who is affected). For example, in a transaction, a payânt could be either a payer or payee, that is, anyone participating in the payment process, though that is a bit abstract. More concretely, protestânt would be one who protests; Protestant, from the Latin present participle protestans (stem protestant-), shows precedent again, though it became a proper noun for a religious group. The existence of words like assistant, attendant, and lieutenant suggests English ears are not unfamiliar with -ant in roles.
- -êé: the undergoer or receiver. This directly parallels the English -ee suffix. We include it in our system to cover the patient side of actions systematically. In many interactions, modeling requires labeling not just the actor but the acted-upon. For example, consider a simple network: a giver and a receiver of some resource. English would label the giver as giver (agent) and the receiver as recipient or receiver. With our scheme, if the base is "give," givôr would be the giver and givêé the one given to. Indeed, donor/donee and lessor/lessee exist in legal contexts; we generalize that pattern. So for any verb X, Xêé = one who is the target of X. Some examples:
  - teachêé = one who is taught (i.e. a student from the perspective of being taught by someone). We might alternatively say learnêr for one who learns, but teachêé explicitly frames them as the recipient of teaching. English uses trainee (one who is trained) similarly, and employee (one who is employed), so there is precedent.
  - inspectêé = the entity being inspected (perhaps a facility under inspection).
  - feedêé = the creature being fed (like how we use feedee informally in animal care contexts).
One must be careful: as noted, English -ee has sometimes been extended to active meanings (e.g. escapee, returnee). In our system, we would prefer to keep -êé for the patient role, to maintain clarity, and use -ôr, -ânt, etc., for active roles. If needed, one could form both sides: e.g. escapôr (the one who escapes something, i.e. an escapee in plain English usage) vs escapêé (the one who is escaped from; not common, but imagine "the prison is the escapôr's escapêé", which is weird, so likely not all verbs need an -êé form). In networks, -êé will be most useful for clearly binary relations like teacher/student, giver/recipient, sender/addressee, etc.
- -hēad: the leader or principal actor. Xhēad = "head of X" or "chief X-er". This corresponds to roles of leadership or primacy. For example, researchhēad could mean the lead researcher (principal investigator) in a project. drivehēad might refer to a lead driver (imagine a convoy's lead vehicle driver). It could also attach to a noun that is a group: statehēad (head of state), though we already have "head of state" as a phrase. Essentially, -hēad turns an action or domain into the person at the top of it. It's inspired by Japanese 長 (-chō), which is used as a suffix meaning chief/director (e.g. 課長 kachō "section chief"). English currently handles this by putting "head" either in front (head coach) or after as a separate word (team head). By suffixing it, we allow single-word titles: teamhēad, councilhēad, missionhēad, etc. This can streamline labels in data (no spaces or compounds necessary). It also resonates with some existing compounds: figurehead (nominally a leader), pothead/gearhead (slang, but meaning someone whose head is "full of" pot or gear, a slightly different usage).
In our usage it's strictly hierarchical: the Xhēad is the leader among those who do or are involved in X.
- -shōp: the trader or supplier role. Xshōp = "person who sells/deals in X" or "provider of X". This fills a gap for commercial or service roles. English often uses -seller, -dealer, or agent nouns of specific verbs (like vendor from vend, merchant from Latin mercari). With -shōp, any product or service noun can be turned into a person who provides it. If foodshōp were a word, it could mean restaurateur or food vendor. Bookshōp would mean bookseller (compare Japanese 本屋 hon-ya, which is both "bookstore" and by extension "bookseller"). hackshōp (from "hack", though hack as a noun is odd) might metaphorically mean someone who sells hacks, which is perhaps not apt. But consider rideshōp: with ride-sharing, the driver is essentially selling rides, so one might humorously call them a rideshōp under this system. This suffix finds its closest Japanese analog in 屋 (-ya), as mentioned, where e.g. 八百屋 (yaoya) "vegetable-seller" (literally 800-store, idiomatic) is a person, as is 魚屋 (sakanaya) "fishmonger". We use the English word "shop" to evoke that idea. It's a bit novel to attach "-shop" to a word to mean a person, but English already has pawnshop as a place; making pawnshōp the person running it could be intuitive in context.
- -êrē: the associated person (functionary or adherent). This suffix is somewhat abstract in our design. It's meant for roles that are defined by a relationship to an activity or thing, rather than by performing it directly or receiving it. For instance, consider someone who is a beneficiary of a process not as a patient but as an interested party, or someone who occupies a role of X-er by title even if they might not actively do X all the time. One possible use is to create terms that parallel words like secretary, emissary, dignitary.
From the verb "govern", governêrē could be intended to mean "functionary of governance" (which might align with functionary itself or legislator etc., though those have Latin roots). The idea is that -êrē could denote an agent noun of affiliation. For example:
  - librariêrē (library with y changed to i before the suffix): someone associated with a library (not exactly a librarian, since we have that, but perhaps a member or patron; this might be stretching).
  - missionêrē: one engaged in a mission. Indeed, missionary is a word (from mission + -ary) for someone who is on a religious mission. Under our scheme missionêrē would cover that concept.
  - visionêrē: someone driven by a vision (we have visionary, meaning a person with vision/ideas).
Essentially, -êrē maps to the suffix "-ary" in many English words which derive from Latin -arius. It often indicates a person connected to or concerned with something (not always the direct doer). We wanted to include it to handle cases where none of -ôr (direct doer), -eêr (skilled doer), or -îst (expert) exactly fits. For example, someone who advocates or is devoted to a cause might be a causêrē in our system (where Japanese might use 者 or 家 or create a compound like 運動家 for "movement activist"). Causîst might imply an ideologue (like communist), whereas causêrē would simply mean a person associated with the cause (maybe an activist). There is flexibility and some overlap; context would determine the best choice.

Each suffix is intended to be used when its specific nuance is relevant. In many cases, multiple suffixes could apply to the same base but yield different meanings. This is a feature, not a bug – it allows us to generate a family of related terms around a single concept, just as Japanese does. For example, take the base concept "teach":
- teachôr: an actant who is teaching (could be anyone imparting knowledge in a given moment).
- teacheêr: a professional teacher (educator by trade; analogous to 教師 or instructor).
- teachîst: perhaps an education expert or pedagogue (one who deeply believes in teaching methods – not a common notion, but if someone were an evangelist for teaching as a practice).
- teachânt: a participant in teaching, maybe an assistant teacher or someone involved in a teaching event (or even a practice student-teacher).
- teachêé: one who is taught – the student (like 教え子 in Japanese, though that's a compound meaning "taught-child").
- teachhēad: head teacher (principal) – effectively an equivalent of "headmaster" or "lead instructor".
- teachshōp: one who provides teaching as a service, e.g. a tutor-for-hire or operator of a teaching business (a tutoring company head might be labeled a teachshōp since they "sell" teaching).
- teachêrē: someone associated with teaching in a general sense – maybe a member of an educational board or a didactic person. (This one is a bit vague; teachary (teacherly person) is not standard, but our system could coin teachêrē if needed to label, say, an education official or an advocate for teaching methods.)

Not every combination will be meaningful in practice, but the system allows them to be formed if needed, which is important for generative completeness in NLP applications. Rather than hard-coding a few possibilities, an algorithm could produce all and then perhaps filter by context or frequency.
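The generate-then-filter idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `SUFFIX_ROLES` and `candidate_family` are hypothetical, the role labels are shorthand for the archetypes above, and the base is attached by plain concatenation, which happens to be sufficient for "teach".

```python
# Suffix inventory from the table above: suffix spelling -> role archetype.
# The role labels are informal shorthand chosen for this sketch.
SUFFIX_ROLES = {
    "ôr": "generic doer",
    "eêr": "professional doer",
    "îst": "expert or devotee",
    "ânt": "participant",
    "êé": "undergoer",
    "hēad": "leader",
    "shōp": "dealer or provider",
    "êrē": "affiliated person",
}


def candidate_family(base, allowed_roles=None):
    """Return {form: role} for every suffix, optionally filtered by role.

    Plain concatenation only; spelling adjustments are out of scope here.
    """
    family = {base + suffix: role for suffix, role in SUFFIX_ROLES.items()}
    if allowed_roles is None:
        return family
    return {form: role for form, role in family.items() if role in allowed_roles}


family = candidate_family("teach")
agents_only = candidate_family("teach", {"generic doer", "undergoer"})
print(agents_only)  # {'teachôr': 'generic doer', 'teachêé': 'undergoer'}
```

In a real pipeline the filter would more plausibly be driven by context or corpus frequency than by a fixed set of role names.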

Morphophonological Rules for Suffix Attachment

To make these suffixes truly usable, we must define how they attach to base words. English spelling and pronunciation can be tricky when adding suffixes; for instance, the choice between -er and -or often depends on etymology, and some suffixes trigger stress shifts (PHOtograph -> phoTOGrapher). Our goal is to keep the formation rules as regular and simple as possible. The suffixes are designed to be added by near-plain concatenation, but we account for a few English orthographic conventions:
1. Base Selection: Typically, we attach these suffixes to the stem of the verb (or noun) that represents the action or domain. For most verbs, the stem is the infinitive/base form (e.g. drive, teach, govern). For nouns, it could be the noun itself (e.g. cloud for cloud watcher, music for a musician-equivalent). If the word already ends in a typical English agent suffix (-er, -or, -ist), we prefer the root to avoid redundancy (e.g. we would attach to investigate rather than investigator if we wanted an extended form; realistically, one might not need to extend an already agentive word).
2. Silent "e" handling: If the base ends in a silent -e and the suffix begins with a vowel, we drop the e to avoid an awkward spelling. This is akin to the rule for adding -ing or -able (e.g. arrive -> arriving, love -> lovable). For example:
  - drive + -ôr → drivôr (drop e). Without the diacritic, drivor could be taken for a misspelling of driver. More generally, if base + "or" would look like an existing word or a misspelling, the diacritic spelling cues the difference (and the expected /ɔːr/ pronunciation). Since we use diacritics throughout this paper, drivôr is unambiguous.
  - create + -ânt → creatânt (rather than "createant", which looks odd). English -ate verbs often truncate further before -ant (celebrate -> celebrant, participate -> participant); that pattern would give creânt, pronounced /kree-ant/ (cf. the creant, "believing", inside miscreant), but such truncation is a separate exception rather than an effect of e-dropping.
  - hire + -êé → hirêé (we drop the e in hire and get hirêé, meaning one who is hired; interestingly, hiree is sometimes used informally as a nonstandard term for someone who was hired).
If the silent e is there to soften a consonant (notice -> noticeable keeps e so that c stays /s/), we should keep it. Since our suffixes mostly start with vowels (except -hēad, -shōp), the same logic applies as in ordinary English suffixation:
  - If the base ends in -ce or -ge, English keeps the e before suffixes beginning with a or o to maintain the soft sound (e.g. noticeable, outrageous). For -ôr or -ânt, we likewise keep the e: manage + -ânt -> manageânt. Simplifying to managant would invite a hard /g/, whereas manageânt preserves the /dʒ/ sound, roughly /ˈmænɪdʒænt/. Because these are novel words, the spelling could simply be stipulated, but consistency with phonology is good.
  - We therefore adopt the rule: if the base ends in c or g followed by e and the consonant needs its soft pronunciation, keep the e. Otherwise drop silent e when the suffix starts with a vowel.
3. Consonant Doubling: In English, when adding a suffix that starts with a vowel to a base that ends in a stressed short vowel + consonant, we typically double the consonant to preserve the short vowel sound (e.g. run -> runner with nn, commit -> committal, big -> biggest). We can adopt a similar rule:
  - If the base is a single syllable ending in one vowel + one consonant (e.g. "cut"), or a multi-syllable word ending in one vowel + one consonant with stress on the final syllable (e.g. "omit"), then double the final consonant before adding the suffix.
  - Example: cut + -êrē → cuttêrē (meaning someone associated with cutting, maybe a tailor or butcher; a hypothetical example).
We double t to indicate /ˈkʌtˌɛəriː/ rather than /ˈkjuːtɛəriː/.
  - commit + -ôr → committôr (although "commitor" is not a common word, doubling helps signal the final-syllable stress).
  - However, doubling in these novel forms might confuse readers if they think of existing words. We could also choose not to double for simplicity and just rely on the base's standard spelling, since these are not words people are accustomed to. But as a system, it's better to follow English orthographic conventions for readability.
4. Y to I change: If a base ends in -y preceded by a consonant, English often changes y to i before adding a suffix (e.g. victory -> victori-ous, try -> tried). Derivational suffixes vary: copy -> copier changes y to i, while copy -> copyist and sky -> skyward keep it. For simplicity:
  - We can change final -y to -i- if the suffix starts with anything except i. This avoids a double i while keeping the pronunciation readable.
  - Example: try + -ôr → triôr (meaning one who tries). This spelling may be read as /ˈtriːɔːr/ and so obscure the link to try, which is an argument for tryôr instead; "trior" is also a real but obscure word (an assayer or judge). Regardless, triôr is the representation the rule yields.
  - hungry + -îst (hungryist?) is not likely to be used; if a base is an adjective we would not usually attach these suffixes, which are mostly for verbs and nouns. A noun like "mystery" could give mysteriânt for a participant in a mystery, with y changed to i.
  - ally (verb "to ally") + -êé could yield alliêé (meaning one who is allied, an "allyee" so to speak). Changing y to i looks better: alliêé.
However, if y is preceded by a vowel (e.g. "play"), we usually keep it (play + er -> player). So play + -eêr -> playeêr. The two e's might merge into one in pronunciation, and the written form looks odd; alternatives would be playêr, with the accent indicating the lengthened sound, or an undiacriticized playeer.
More generally, we need a strategy for when the base ends in a vowel and the suffix begins with a vowel. Some languages insert a consonant like -r- or -y- in this position; English does not, and either runs the vowels together or uses a hyphen. For example: ski + -ôr -> skiôr (which might be pronounced /ski.or/ or /ˈskiːɔr/). This is manageable. skiêé, for one who is skied (a person being towed on skis?), describes a very rare scenario, but a vowel-final base like ski does not need to change before a vowel suffix; inserting a glide, as in skiyée, would be overkill.
5. Linking vowel or consonant: Japanese sometimes uses a connecting element (like -o- in Sino-Japanese compounds). English generally does not in derivation, except in classical compounds (-o- in Greco-Latin compounds, e.g. "microscope" micro + scope). For our system, we aim not to introduce new linking sounds unless absolutely necessary for euphony or to avoid confusion. Possibly:
  - If a base ends in the same vowel that a suffix begins with, or a similar one, we might add a hyphen or adjust the spelling to avoid merging. E.g. base "audio" + -ist (if we had it) normally yields "audioist", which is acceptable. Base "vita" + -ist -> "vitaist" is possibly fine too.
  - If the base ends with a consonant and the suffix begins with a consonant (like -hēad or -shōp), we might add a filler vowel for pronunciation if needed. For example, risk + -shōp: riskshōp is a tongue-twister; we might insert an "e" -> riske-shōp or, more naturally, use the existing term risk broker. If strictly using our scheme, the rule could be: if the combination is hard to pronounce, optionally insert -e- or -o- between base and suffix, defaulting to -e-. Many English words use an -o- link for classical reasons (speedometer), whereas native compounds such as rifleman have no linking vowel at all. In practice English usually just toughs it out: in riskshēad, if that were a word, speakers would add a slight vowel in speech but not in writing.
  - For comprehension, we might use a hyphen in writing if necessary (like risk-shōp) as a last resort. But since the goal is one-word labels, we prefer internal adjustments.
6. Stress and pronunciation: The suffixes -ôr, -eêr, -îst, -ânt, -êé, -hēad, -shōp, -êrē will each carry a particular stress pattern:
  - -ôr, -eêr, -ânt, -êrē would carry at most secondary stress, similar to how -or and -ant are typically unstressed in English (e.g. ACT-or, par-TI-ci-pant): the base keeps its primary stress and the suffix is lightly stressed if at all. (Existing -eer words such as engiNEER do stress the suffix; whether -eêr should follow them is left open.)
  - -îst: English -ist is itself unstressed and normally leaves the stress of the base in place (SCI-en-tist; gui-TAR -> gui-TAR-ist); it simply adds a syllable after the stressed one.
  - -êé: as we know from English -ee, this suffix carries primary or strong stress on itself (one common pronunciation of employee is /ˌɛmplɔɪˈiː/, with primary stress on -ee). This could remain: Xêé gets stress on êé. That's fine and even useful to distinguish agent vs patient in speech (e.g. "TEACH-ôr" vs "teach-ÊÉ").
  - -hēad would likely be its own syllable with secondary stress (arrowhead has primary stress on arrow, secondary on head). In Xhēad, primary stress probably remains on X, with head less stressed.
  - -shōp would form a compound-like stress (e.g. bookshōp: stress on "book", secondary on shōp).
Our algorithm in an NLP context might not worry about stress explicitly, but in generating text or speech, these patterns matter. They align with how English usually treats similar suffixes: -ee attracts stress, while -ist and the others usually do not shift the primary stress away from the base.

Given these considerations, we can articulate a simplified set of orthographic rules for the generative algorithm:
- Rule 1: Basic concatenation. By default, form the agent noun by concatenating the base word and the suffix. E.g. base + -ôr = baseôr.
- Rule 2: Drop terminal "-e" from the base if the suffix begins with a vowel (that is, for every suffix except -hēad and -shōp). Exception: if dropping "e" would cause a soft "c" or "g" to become hard, then keep the "e". E.g. face + -ânt → faceânt (to keep "c" as /s/). But make + -ôr → makôr (drop e).
- Rule 3: Consonant doubling. If the base is monosyllabic ending in (vowel + single consonant), or polysyllabic ending in (vowel + single consonant) with stress on the final syllable, double that consonant if the suffix starts with a vowel. E.g. trim + -êrē → trimmêrē ("trimmery"? meaning one associated with trimming). begin + -ôr → beginnôr (begin is stressed on its final syllable). Because final stress cannot be read off the spelling, a simple implementation may apply this rule to monosyllables only.
- Rule 4: "-y to -i" conversion. If the base ends in a consonant + "y", change "y" to "i" if the suffix begins with a vowel (otherwise "y" would sit in the middle of the word, where it could be read as the consonant /j/ or be otherwise unclear). So carry + -ânt → carriânt, carry + -êé → carriêé. But if the base ends in vowel + y, just add the suffix (play + -êé → playêé).
- Rule 5: Hyphenation or epenthesis (rarely). If the combination of base + suffix results in an awkward cluster or potential misreading, a hyphen or linking vowel may be inserted. For example, if a vowel-final base meets a vowel-initial suffix, consider adding a hyphen: ski + -eêr → ski-eêr (to avoid "skieer", whose run of vowels might confuse parsing). Or research + -shōp → research-shōp to separate chsh. However, these cases are expected to be rare and could be handled by context if not by rule.

The above rules will be implemented in a hypothetical morphology generation algorithm. The algorithm would take an input word (base) and a desired suffix type, then apply these orthographic adjustments to output the new agent noun. We can outline the algorithm as a short Python sketch:

VOWELS = "aeiou"
CONSONANT_INITIAL = {"-hēad", "-shōp"}  # every other suffix starts with a vowel


def derive_agent_noun(base_word, suffix, final_stress=False):
    """Attach a suffix such as "-ôr" to base_word using Rules 1-4.

    final_stress flags polysyllabic bases stressed on the last syllable
    (e.g. "commit"); spelling alone cannot reveal this, so the caller says so.
    """
    word = base_word.lower()
    if suffix not in CONSONANT_INITIAL:
        ends_consonant_y = (len(word) > 1 and word[-1] == "y"
                            and word[-2] not in VOWELS)
        # Final consonant preceded by exactly one vowel letter (w, x, y never double).
        ends_cvc = (len(word) >= 2 and word[-1] not in VOWELS + "wxy"
                    and word[-2] in VOWELS
                    and (len(word) == 2 or word[-3] not in VOWELS))
        one_syllable = sum(ch in VOWELS for ch in word) == 1
        if word.endswith("e") and len(word) > 2 and word[-2] not in VOWELS:
            # Rule 2: drop silent "e", but keep it after soft c/g (face, manage).
            if not word.endswith(("ce", "ge")):
                word = word[:-1]
        elif ends_consonant_y and suffix != "-îst":
            # Rule 4: consonant + "y" becomes "i" (kept before -îst to avoid "ii").
            word = word[:-1] + "i"
        elif ends_cvc and (one_syllable or final_stress):
            # Rule 3: double the final consonant (cut -> cutt-, commit -> committ-).
            word += word[-1]
    # Rule 1: plain concatenation. Rule 5 (hyphen or linking vowel for awkward
    # clusters) is not automated here and is left to manual review.
    return word + suffix.lstrip("-")

This is a rough idea. In practice, one might also consult a list of exceptions (for instance, if base+suffix accidentally equals an existing different word or homograph, maybe adjust spelling). For example, reader (one who reads) vs our scheme might produce readôr for a generic reader actant. But "reador" would look odd next to "reader." Perhaps we'd actually keep using "reader" for that meaning since it's already an English word. Our system doesn't intend to replace existing common words but to provide options when needed. There may be cases where the new form coexists with an old form but with a different nuance. For instance, teacher (common word) vs teacheêr (maybe reserved for someone who is explicitly acting in a paid teaching capacity, distinguishing from say a parent who teaches their child at home, who might be a teachôr in that context but not a professional teacheêr).
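One way to picture the exception list is as a small lookup applied after generation. The sketch below is hypothetical: `PREFERRED_EXISTING` and `resolve_label` are illustrative names, and the single entry simply mirrors the reader/readôr example; a real list would have to be curated.

```python
# Hypothetical exception table: generated form -> established English word
# that should be preferred when no special nuance is intended.
PREFERRED_EXISTING = {"readôr": "reader"}


def resolve_label(generated, prefer_existing=True):
    """Return the established word for a generated form, if one is listed."""
    if prefer_existing:
        return PREFERRED_EXISTING.get(generated, generated)
    return generated


print(resolve_label("readôr"))                         # reader
print(resolve_label("teachôr"))                        # teachôr
print(resolve_label("readôr", prefer_existing=False))  # readôr
```

Keeping the override optional leaves room for the coexistence case described above, where the new form is wanted precisely because its nuance differs from the common word.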

It's important to highlight: the introduction of these new forms is theoretical. For practical use, especially in computational systems, one might implement them as part of a controlled vocabulary or allow an AI to generate them when a concise label is needed. They would need semantic disambiguation (ensuring that the context makes clear which suffix to use). But by defining them with clear semantics, we also make it easier for algorithms to choose: e.g., if an actant is recognized as the initiator of action X but only in that scenario, label as Xôr; if recognized as a professional role, label as Xeêr; if it's the target, Xêé, etc.
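The selection logic described here can be sketched as a priority list over recognized role features. Everything in the sketch is an assumption made for illustration: the feature names, the ordering in `PRIORITY`, and the helper names `choose_suffix` and `label` are not part of the proposal itself. Bases are attached by plain concatenation, which is enough for "teach".

```python
# Ordered from most to least specific. The ordering is an assumption of this
# sketch; the proposal only fixes what each suffix means.
PRIORITY = [
    ("target", "êé"),        # receives the action
    ("leader", "hēad"),
    ("vendor", "shōp"),
    ("professional", "eêr"),
    ("expert", "îst"),
    ("participant", "ânt"),
    ("affiliate", "êrē"),
]


def choose_suffix(features):
    """Pick a suffix from a set of feature names recognised for the actant."""
    for feature, suffix in PRIORITY:
        if feature in features:
            return suffix
    return "ôr"  # default: generic, context-bound doer


def label(base, features):
    """Plain concatenation of base and the chosen suffix."""
    return base + choose_suffix(features)


print(label("teach", {"initiator"}))     # teachôr
print(label("teach", {"professional"}))  # teacheêr
print(label("teach", {"target"}))        # teachêé
```

A learned classifier could replace the fixed priority list; the point is only that clearly defined suffix semantics make the choice mechanical once the role features are known.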

To demonstrate the formation with a concrete example, let's take the verb "govern" (to rule or manage). - Base "govern". It ends in consonant-n, not a silent e, not a y, and stress is on first syllable ("GOVern"), so no doubling needed for these suffixes since second syllable is not strongly stressed. - govern + -ôr = governôr: meaning a generic governor/actant who governs. Interestingly, "governor" is an existing word but specifically an official title (state governor) or a device in an engine. Our governôr is meant more generally (anyone/anything that happens to govern at the moment). We might pronounce it slightly differently or just treat it as a concept separate from the political title. To avoid confusion, one might visually keep the diacritic: governôr. - govern + -eêr = governeêr: this would denote someone whose job is to govern – effectively a professional governor. In normal English, "governor" already covers that, but if we needed to coin for a different base, this pattern stands. (For a base without an existing -or agent, the difference is clearer: manageôr vs manageeêr, etc.) - govern + -îst = governîst: someone who is an expert in governance or who advocates governing principles (not a standard word, but could be analogous to "statist" which is someone favoring state control). - govern + -ânt = governânt: a participant in governance – perhaps a member of a governing council without being the head. (Latin "governante" isn't a word, but it might label, say, a junior official who takes part in governing). - govern + -êé = governêé: one who is governed. That would mean a subject or citizen under a government. Indeed, political theory often uses the governed as a noun. Here we have a single word for it. Governêé = the recipient of governance (could apply to people in a state, or to departments under an administrator). - govern + -hēad = governhēad: the head of a governing body – essentially a "chief governor." 
In practice, we might just say governor for that too. But if, for instance, the base was "research", researchhēad nicely labels the head of research. - govern + -shōp = governshōp: this one is odd for "govern" since governance isn't a commodity. Xshōp works better for tangible goods or services. Governshōp could jokingly refer to someone who "sells governance" (imagine a lobbyist or a colonial power imposing governance for profit). Likely, one wouldn't use -shōp for "govern" in typical cases. - govern + -êrē = governêrē: an affiliated person in governance, maybe a bureaucrat or governmental agent. (We might translate governêrē loosely as "functionary in a government system," similar to how military as a noun can mean personnel, though that's collective.)

From this, we see sometimes multiple forms exist that overlap with existing English words (governor vs governôr/governeêr). In an applied setting, we would either redefine the existing word through context or choose a new form when needed to avoid confusion. Perhaps our -ôr could be thought of as slightly different from current -or. We might imagine spelling governor (title) without diacritic and governôr (actant) with diacritic in scholarly usage to distinguish them. But in plain text, context would have to distinguish. Since this is a theoretical proposal, we won't dwell too much on collisions with existing lexicon; the assumption is one could negotiate those (for example, by using -ôr mainly on bases that don't already have a common -er/or agent noun).
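Collisions of this kind can be detected mechanically by stripping the diacritics and checking the result against a word list. The following is a minimal sketch using Python's standard `unicodedata` module; `EXISTING_WORDS` is a tiny hypothetical stand-in for a real lexicon, and `collides` is an illustrative helper name.

```python
import unicodedata

# Hypothetical stand-in for a real English word list.
EXISTING_WORDS = {"governor", "teacher", "reader", "celebrant"}


def strip_diacritics(word):
    """Decompose accented letters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def collides(form, lexicon=EXISTING_WORDS):
    """True if the plain-text rendering of a coined form is an existing word."""
    return strip_diacritics(form).lower() in lexicon


print(strip_diacritics("governôr"))  # governor
print(collides("governôr"))          # True
print(collides("governêé"))          # False
```

A pipeline could use such a check to decide when the diacritic spelling must be preserved, or when a different suffix or base should be chosen to avoid ambiguity in plain text.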

In summary, the algorithmic generation of these forms is straightforward enough to implement in a morphological component of an NLP system, abiding by a few orthographic rules. Next, we turn to how having these forms available can actually benefit practical applications in AI and linguistics, and provide some illustrative use cases.
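
As a rough illustration of the generation step, here is a minimal Python sketch that attaches the eight suffixes to a base. It is a sketch under stated assumptions, not a reference implementation: the function names attach and generate are hypothetical, the glosses are paraphrased from the "govern" walk-through, and the only orthographic rule applied is an assumed silent-e drop before vowel-initial suffixes (the examples in this paper are not fully consistent on that point, so it is a switchable parameter). Consonant doubling is omitted because it depends on stress information that plain spelling does not provide.

```python
# Suffix inventory from the paper; glosses paraphrase the "govern" examples.
SUFFIXES = {
    "-ôr": "generic doer",
    "-eêr": "professional doer",
    "-îst": "expert or advocate",
    "-ânt": "participant",
    "-êé": "recipient, one acted upon",
    "-hēad": "head of the activity",
    "-shōp": "seller or provider",
    "-êrē": "affiliated member",
}

VOWELS = set("aeiouôêîâ")


def attach(base: str, suffix: str, drop_silent_e: bool = True) -> str:
    """Join a base and a suffix.

    ASSUMPTION: a final 'e' preceded by a consonant is dropped before a
    vowel-initial suffix (receive + -êé -> receivêé). Stress-based consonant
    doubling is deliberately not modeled.
    """
    ending = suffix.lstrip("-")
    stem = base
    if (
        drop_silent_e
        and len(stem) > 2
        and stem.endswith("e")
        and stem[-2] not in VOWELS
        and ending[0] in VOWELS
    ):
        stem = stem[:-1]
    return stem + ending


def generate(base: str) -> dict:
    """Return every suffixed form of the base, keyed by suffix."""
    return {suffix: attach(base, suffix) for suffix in SUFFIXES}


if __name__ == "__main__":
    for suffix, form in generate("govern").items():
        print(f"{form:12} {SUFFIXES[suffix]}")
```

Keeping the silent-e rule behind a parameter makes it easy to compare both spellings of a form while the orthographic conventions are still being settled.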

Use Cases and Applications

A productive system of agentive nouns for English opens up numerous applications in Natural Language Processing (NLP), knowledge representation, and language education. Here we outline several key areas where our proposed suffix system could be leveraged:

- Semantic Role Labeling and Information Extraction: In NLP tasks like Open Information Extraction or semantic role labeling (as in PropBank or FrameNet), it's often necessary to name the roles that entities play (Agent, Patient, Instrument, etc.). With an extended agent-noun morphology, an NLP pipeline could generate descriptive labels on the fly. For example, if a sentence says "The robot analyzed the data and alerted the team," a system could label the robot as analyzôr (something that analyzes) and alertôr (something that alerts) in an event representation. These labels are more compact than phrases like "the agent that performed analysis" and more informative than the bare noun "robot" (which doesn't convey its role directly). By using forms like analyzôr and alertôr, we capture the robot's two actant roles distinctly, in a way that can be easily parsed (the suffix -ôr tells us it's the doer). Similarly, the team in the example is the recipient of an alert, so it could be labeled as alertêé (one that is alerted). This systematic labeling could improve downstream tasks like coreference resolution or scenario mapping, because each role is clearly named by its relation to the verb.
- Knowledge Graphs and Ontologies: In designing ontologies or knowledge graphs, one often has to create names for classes and properties. Our system provides a reservoir of consistent naming options. For instance, in an ontology of communications, one might have classes Messagôr (agent that sends a message) and Messagêé (agent that receives a message) instead of the generic "Sender" and "Receiver". This has two advantages: (1) all such role classes follow a uniform pattern (verb + suffix), making the ontology easier to learn and extend; (2) it avoids ambiguities with nouns that have other meanings. "Receiver" in English could be an electronic device, but receivêé (from receive) unambiguously means an entity that receives. In large knowledge graphs like ConceptNet or WordNet, one could encode relations using these forms. For example, rather than a pair of triples like (Person) --[employs]--> (Person) and (Person) --[is employed by]--> (Person), we could introduce nodes for employôr and employêé, linking them to Person instances as needed. In essence, this turns relations into reified nodes with systematic names. It resonates with the idea of reification in the semantic web, but here the naming is part of the innovation. A concrete use case: imagine a universal knowledge graph schema where, for any verb or predicate relation, we automatically have a standard node name for the subject role and the object role. If a new relation "X zaps Y" is added, we can immediately add the classes zapôr (for any actant doing the zapping) and zapêé (for any actant being zapped). This could greatly aid automated ontology building and schema integration, because the names are constructed from the relation itself.
- Large Language Model (LLM) Fine-Tuning for Relational Understanding: Large language models like GPT or BERT-derived models have vast knowledge but sometimes struggle with precise relational understanding or with outputting structured knowledge. Introducing a set of consistent morphological cues could help. By fine-tuning an LLM on data where these new agentive forms are used, we can encourage the model to internalize the distinctions. For example, one could fine-tune a model on a corpus where instead of saying "the teacher taught the student", it says "the teacheêr taught the teachêé". The model would learn that teacheêr correlates with initiating teaching and teachêé with receiving teaching. This might make it easier for the model to, say, answer questions about "Who is doing X and who is receiving X?" without confusion, as the roles are explicitly marked in the language. It essentially adds a layer of semantic annotation baked into the vocabulary. Additionally, an LLM could be prompted to generate explanations or summaries using these terms for clarity. In complex multi-agent narratives, an LLM could refer to characters by their roles (e.g. negotiânt for someone just participating, as distinct from negotiator or negotiatôr, if that distinction matters), which reduces ambiguity. Although this is speculative, it aligns with how adding controlled vocabulary or notations can guide LLM outputs. The system might be particularly useful in instruction-following or chain-of-thought explanations, where the model could label entities as it reasons ("Device A (measureôr) collects data from Device B (measureêé)…"), thereby clarifying the relationship.
- Machine Translation and Cross-Lingual Applications: A universal agent-noun system in English could act as an intermediate representation for translating from languages that have rich morphology. For instance, Japanese or Turkish often encode information in morphology that English loses. A Japanese-to-English MT system could choose an English output with our extended suffixes to preserve nuances. Take the Japanese 彼は医者で、彼女は患者だ ("He is a doctor and she is a patient"), which is a trivial case. If the sentence were instead 彼は教える人で、彼女は学ぶ人だ ("He is [a] teach-er person and she is [a] learn-er person"), a vanilla translation might say "He is a teacher and she is a learner". But learner in English doesn't specifically mean someone being taught by that same teacher. Using our terms, one could translate it as "He is a teacheêr and she is a learnêé" (learnêé meaning one who is learning from someone). This maintains the parallel. Many languages also have distinct words for, say, murderer and murder victim, where English might just say victim; for such languages we could generate a term like murderêé to mean specifically a murder victim, aligning more closely with the source language's explicit marking. Over time, a whole interlingual inventory of such roles could facilitate more precise translation or even interlingual word embeddings, since each role concept would map one-to-one across those languages that have an equivalent morphological or lexical item.
- Knowledge Representation in AI Planning and Multi-Agent Systems: In AI planning or robotics, when we define scenarios with multiple agents, giving them role identifiers is common. Instead of arbitrary labels (Agent1, Agent2) or verbose ones, we can assign names like carryôr and carryêé in a task where one agent carries another. This makes scripts more readable: a plan could state carryôr:MOVE_TO(target); carryêé:WAIT(); carryôr:PICK_UP(carryêé) etc., where the variable names carry meaning. It essentially uses the linguistic device to make the code self-documenting. For multi-agent simulations, one could dynamically generate such labels for any new action that comes into play, keeping the naming consistent automatically.
- Educational Tools and Linguistic Analysis: Linguists or language educators might use this system to teach about semantic roles or grammar. With these invented words, students can directly see the relationship between the base verb and the derived noun indicating the role. It's like having a built-in "actor" and "acted-upon" label in the word itself. For example, when teaching the concept of active vs passive, one could point out that English doesn't have a clear pair for many verbs, but in a hypothetical extended English, lovor/lovee (we'd write lovôr/lovêé) would be analogous to nominative/accusative roles. The system might also be used in natural semantic metalanguage (NSM) or interlingua research where one wants a controlled subset of English that can express certain logic clearly. Our agentive nouns could be part of a controlled vocabulary for a "Basic Universal English" that is richer than current English in some respects, making explicit who is doing what to whom.
- Cultural and Cross-Disciplinary Communication: Outside strict computation, having these words can sharpen communication in fields like sociology or design. For instance, ANT researchers themselves might adopt such terms when describing networks: an actor-network analysis of a classroom might categorize participants as instructôr, instructêé, supervisôr, observânt, etc. This would be a novel academic jargon, but one that is systematic and arguably clearer than borrowing terms or constantly redefining who is actor vs intermediary. It also resonates with Latour's own propensity to create terms (like actant, factish, etc.). Why not inscriptor vs inscriptee for those who inscribe meaning into an artifact? In fact Latour did use terms like scribe and inscription device; our system would call them inscribôr and inscribêé (device), or maybe inscribânt if the device is a participant in inscription rather than its initiator.
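
The role-labeling and knowledge-graph use cases above can be sketched together in a few lines of Python. This is only an illustrative sketch: the Event class, the reify function, and the predicate names hasRole and actsOn are hypothetical, and the labels are built by naive concatenation, so the example sticks to verbs that need no orthographic adjustment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A minimal (subject, verb, object) event; verb is in base form."""
    subject: str
    verb: str
    obj: str


def reify(event: Event) -> list:
    """Turn one event into role-labeled triples.

    ASSUMPTION: 'hasRole' and 'actsOn' are made-up predicate names, and
    the role nodes are formed by plain concatenation with -ôr / -êé.
    """
    doer = event.verb + "ôr"
    receiver = event.verb + "êé"
    return [
        (event.subject, "hasRole", doer),
        (event.obj, "hasRole", receiver),
        (doer, "actsOn", receiver),
    ]


if __name__ == "__main__":
    # "The robot ... alerted the team" from the role-labeling example.
    for triple in reify(Event("robot", "alert", "team")):
        print(triple)
    # A newly added relation "X zaps Y" gets its role nodes for free.
    for triple in reify(Event("X", "zap", "Y")):
        print(triple)
```

Because the role node names are derived from the verb itself, adding a new relation requires no manual naming step, which is the property the ontology use case relies on.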

In summary, by providing a consistent morphological toolkit, we give both machines and humans a way to generate a concept label instead of resorting to ambiguous or lengthy descriptions. This is especially powerful in complex systems with many interacting entities – a hallmark of ANT and many computer systems.

To illustrate one applied scenario, consider a smart contract system in blockchain technology. These often involve roles like issuer, payer, payee, guarantor, beneficiary, etc. Not all are systematically named. If this system were formalized in a domain-specific language using our suffixes, one could have a smart contract template where roles are <Action>ôr and <Action>êé. E.g., a contract for lending might refer to lendôr and lendêé. This is not far from legal usage (loaner/loanee are sometimes used informally, although the official terms are lender/borrower). Having a default schema (Xôr = doer, Xêé = receiver) reduces the cognitive load in understanding new contracts or protocols, because the naming convention itself carries meaning.

One more example in NLP: Question Answering (QA) systems. If a question asks "Who received the Nobel Prize in Physics in 2020?", a system might translate it internally to something like receivêé(NobelPrize, Physics, 2020) = ?, because we are looking for the recipient. If the system's knowledge base tagged laureates as prizeêés, it could answer more directly. Conversely, if asked "Who awarded the Nobel Prize…?", it looks for the awardôr (the committee). While current systems use logical forms or semantic roles, an explicit morphological marker could serve as a proxy or an aid.
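
A minimal sketch of that lookup, in Python, might treat the suffix as a pointer to a field in a role-tagged fact store. Everything here is an assumption made for illustration: the fact record uses placeholder values (committee_X, laureate_Y) rather than real data, the field names are invented, and parse_role and answer are hypothetical helpers built around the award pair.

```python
# Map each role suffix to the field that stores that role in a fact record.
ROLE_SUFFIXES = {"ôr": "doer", "êé": "receiver"}

# Toy fact store with placeholder values (not real data).
FACTS = [
    {"verb": "award", "year": 2020,
     "doer": "committee_X", "receiver": "laureate_Y"},
]


def parse_role(label: str) -> tuple:
    """Split a role label such as 'awardêé' into (verb, role field)."""
    for suffix, role in ROLE_SUFFIXES.items():
        if label.endswith(suffix):
            return label[: -len(suffix)], role
    raise ValueError(f"no known role suffix in {label!r}")


def answer(label: str, **filters) -> list:
    """Return the fillers of the requested role among matching facts."""
    verb, role = parse_role(label)
    return [
        fact[role]
        for fact in FACTS
        if fact["verb"] == verb
        and all(fact.get(key) == value for key, value in filters.items())
    ]


if __name__ == "__main__":
    print(answer("awardêé", year=2020))  # who was awarded?
    print(answer("awardôr", year=2020))  # who did the awarding?
```

The point of the sketch is only that the suffix alone decides which side of the relation is returned; a real QA system would still need question parsing and entity linking in front of it.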

Of course, widespread adoption of these suffixes would require acceptance and understanding. Initially, their use might be confined to controlled settings (like within an AI's "thinking" or labeling, or in academic experiments). But even that can be valuable, as it provides a consistent layer of representation that can be stripped off for natural language output if needed, or translated to existing phrasing.

Discussion: Addressing Criticisms and Proposing Mitigations

While the proposed extension of English agentive morphology is theoretically powerful, its practical implementation faces valid criticisms from various academic perspectives. This section addresses these challenges by outlining them and proposing concrete mitigation strategies, structured from the viewpoints of AI research, Actor-Network Theory, and linguistics.

From an AI Researcher's Perspective

Challenge 1: Collision with Existing Vocabulary and Ambiguity

The new forms may clash with existing English words (e.g., our governôr vs. the existing governor). This homography could create significant ambiguity for both human readers and NLP models.

  • Mitigation Strategy:
    • Visual Disambiguation: In academic or technical writing, novel suffixes could retain diacritical marks (e.g., -ôr) to visually distinguish them from standard English. Computationally, these could be registered as unique tokens within a model's vocabulary.
    • Priority Rules: A generation algorithm could incorporate priority rules. If a well-established, conventional term exists for a role, the system would default to it. The new forms would only be generated when no standard term is available, thus filling lexical gaps rather than creating redundant terms.
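
The priority rule can be sketched as a lexicon lookup that falls back to generation. This is a minimal sketch, assuming a hand-made table of conventional terms (the CONVENTIONAL dictionary and the role_term function are hypothetical names) and naive concatenation for the novel forms.

```python
# Conventional English role terms, keyed by (verb, role).
# ASSUMPTION: a tiny hand-written table; a real system would need a
# much larger lexicon or a corpus-derived one.
CONVENTIONAL = {
    ("employ", "doer"): "employer",
    ("employ", "receiver"): "employee",
    ("lend", "doer"): "lender",
    ("lend", "receiver"): "borrower",
}

NOVEL_SUFFIX = {"doer": "ôr", "receiver": "êé"}


def role_term(verb: str, role: str) -> str:
    """Prefer an established term; generate a novel form only for gaps."""
    conventional = CONVENTIONAL.get((verb, role))
    if conventional is not None:
        return conventional
    return verb + NOVEL_SUFFIX[role]


if __name__ == "__main__":
    print(role_term("lend", "receiver"))  # established term wins
    print(role_term("zap", "doer"))       # lexical gap, novel form
```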

Challenge 2: Computational Processing Cost

A rule-based morphological generator could be complex to build and maintain, and its execution could add overhead to NLP pipelines.

  • Mitigation Strategy:
    • Hybrid Implementation: Start with a simple, rule-based prototype for morphological generation. This system can then be enhanced with a data-driven component, using machine learning to identify and learn high-frequency formation patterns from corpora.
    • Modular Design: The morphology generation module should be designed as a loosely coupled component. This allows it to be applied on-demand only for specific tasks (e.g., knowledge graph population, semantic role labeling) rather than being a constant, costly step in a general pipeline.

Challenge 3: Incompatibility with Pre-trained Models

Large language models (LLMs) rely on pre-trained tokenizers and word embeddings. Introducing novel words could lead to suboptimal subword tokenization and a lack of meaningful semantic representations.

  • Mitigation Strategy:
    • Vocabulary Integration: Explicitly add the new agentive nouns to the model's vocabulary and optimize the subword segmentation algorithm (e.g., SentencePiece, BPE) to ensure they are treated as whole units.
    • Intelligent Embedding Initialization: During fine-tuning, the embeddings for new tokens can be initialized thoughtfully. For instance, the embedding for drivôr could be initialized as a function of the embeddings for drive and -or, positioning it meaningfully within the existing embedding space.
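
One simple instance of such an initialization function is the element-wise mean of the component embeddings. The sketch below uses plain Python lists and a toy three-entry vocabulary; the function names and the choice of the mean are assumptions for illustration, not a claim about how any particular model library handles new tokens.

```python
def mean_vector(vectors: list) -> list:
    """Element-wise mean of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def add_token(embeddings: list, vocab: dict, new_token: str, pieces: list):
    """Return a new (embeddings, vocab) pair with one extra row.

    ASSUMPTION: the new row is the mean of the rows for `pieces`
    (e.g. 'drive' and '-or'); other functions of the components are
    equally possible.
    """
    new_row = mean_vector([embeddings[vocab[piece]] for piece in pieces])
    new_vocab = dict(vocab)
    new_vocab[new_token] = len(embeddings)
    return embeddings + [new_row], new_vocab


if __name__ == "__main__":
    # Toy 2-dimensional embedding table.
    vocab = {"drive": 0, "-or": 1, "car": 2}
    embeddings = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    embeddings, vocab = add_token(embeddings, vocab, "drivôr", ["drive", "-or"])
    print(vocab["drivôr"], embeddings[vocab["drivôr"]])
```

Starting the new token between its base and its suffix gives fine-tuning a sensible starting point instead of a random vector.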

From an ANT Researcher's Perspective

Challenge 1: Risk of Conceptual Oversimplification and Instrumentalism

Forcing complex, fluid roles into discrete morphological boxes could oversimplify the analysis and reduce actants to mere instruments, contradicting ANT's nuanced view of agency.

  • Mitigation Strategy:
    • Suffix Selection Guidelines: Develop a "semantic role matrix" that guides the selection of a suffix based on key dichotomies (e.g., transient vs. permanent, active vs. passive). This matrix would serve as metadata to supplement the formal labeling.
    • Multi-Layered Labeling: Allow for multiple labels to be attached to a single actant. This avoids forcing a single description, permitting a richer representation where an actant can be, for instance, both a participânt in a process and an initiatôr of a specific action within it.

Challenge 2: Difficulty of Practical Application in Fieldwork

ANT researchers in the field may find it cumbersome to manually apply this novel terminology to their notes and data.

  • Mitigation Strategy:
    • Lightweight Tooling: Provide simple tools to ease adoption, such as a browser plug-in or spreadsheet macro that allows researchers to easily insert or suggest the appropriate agentive nouns in their documents.
    • Demonstrative Case Studies: Apply the word-formation system to existing, published ANT case studies (e.g., an analysis of a laboratory network). Presenting "before and after" examples will demonstrate the practical benefits.

Challenge 3: Lack of Explicit Connection to ANT's Core Theory

The suffix system, as a standalone linguistic proposal, may seem disconnected from core ANT concepts.

  • Mitigation Strategy:
    • Theoretical Mapping: Create an explicit mapping table that shows how each suffix can help to articulate key ANT concepts like translation, modality, or black-boxing.
    • Phase-Specific Examples: Illustrate how different suffixes can be used to label representative actants during the various phases of translation defined by Callon (problematisation, intéressement, enrolment, mobilisation). This would firmly ground the linguistic tool in the theoretical framework.

From a Linguistics Researcher's Perspective

Challenge 1: Lack of Naturalness and Acceptability

The proposed forms are neologisms and may be perceived as unnatural or ungrammatical by native English speakers, hindering adoption.

  • Mitigation Strategy:
    • User Studies: Conduct empirical user studies where native speakers are asked to rate the naturalness, comprehensibility, and intuitive appeal of the generated forms.
    • Propose a Graduated Subset: Based on user feedback, identify a subset of suffixes that are most readily accepted (e.g., extensions of -ist and -ant might be more palatable) and propose them for general use, while reserving others for specialized contexts.

Challenge 2: Morphological Over-differentiation

An eight-suffix system might introduce more semantic distinctions than are practically necessary or cognitively manageable for most users.

  • Mitigation Strategy:
    • Layered System: Define a "core set" of 3-4 highly useful suffixes for general purposes. The remaining suffixes can be designated as part of an "expert set" for use in highly specialized domains requiring maximum precision.
    • Quantify Productivity: Use corpus analysis to quantify the potential productivity of each suffix, prioritizing those that fill the most significant lexical gaps in English.

Challenge 3: Insufficient Historical and Statistical Grounding

The proposal could be strengthened by connecting it more deeply to the historical evolution and statistical realities of the English language.

  • Mitigation Strategy:
    • Historical Case Studies: Conduct a literature review on the history of English derivational suffixes, such as the rise and fall of agentive -ster in Old/Middle English or the systematic pairing of -or/-ee in legal language. Use these as historical precedents for guided morphological expansion.
    • Corpus-based Validation: Use large English corpora (e.g., COCA, BNC) to measure the frequency and productivity of existing derivational patterns. This data can be used to provide a quantitative argument for where the proposed system offers the most value.
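
To make the corpus-based measurement concrete, here is a small Python sketch that counts types, tokens, and hapax legomena for a suffix and reports the ratio of hapaxes to tokens. That ratio is a swapped-in choice, not something this paper prescribes: it follows the hapax-based productivity measure commonly attributed to Baayen. The function name is hypothetical, the token list is a toy example, and plain string matching will over-count (for instance, "tree" would match -ee), so a real study would need morphological filtering.

```python
from collections import Counter


def suffix_productivity(tokens: list, suffix: str) -> dict:
    """Type, token, and hapax counts for words ending in `suffix`.

    'P' is hapaxes / tokens. ASSUMPTION: naive string matching stands
    in for real morphological analysis.
    """
    counts = Counter(
        token.lower() for token in tokens if token.lower().endswith(suffix)
    )
    n_tokens = sum(counts.values())
    hapaxes = sum(1 for count in counts.values() if count == 1)
    return {
        "types": len(counts),
        "tokens": n_tokens,
        "hapaxes": hapaxes,
        "P": hapaxes / n_tokens if n_tokens else 0.0,
    }


if __name__ == "__main__":
    toy_corpus = ["employee", "Employee", "payee", "trainee",
                  "employer", "lender"]
    print(suffix_productivity(toy_corpus, "ee"))
```

Run over a large corpus, the same counts for -er, -or, -ist, -ant, and -ee would give a first quantitative picture of where existing patterns are least productive.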

Conclusion

English's current repertoire of agentive nouns – though serviceable for everyday communication – falls short of providing a systematic, productive, and semantically fine-grained way to label all types of actors in a complex network.

References: (Included inline as per the citation format)