Development of Morphological Analyzer For Af-Somali

September 4, 2022 | Author: Anonymous | Category: N/A

DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR AF-SOMALI

MAHDI YONIS KAYAD

A Thesis Submitted to the Department of Computer Science in Partial Fulfillment for the Degree of Master of Science in Computer Science 

Addis Ababa, Ethiopia May, 2017

 

MAHDI YONIS KAYAD Advisor: Dr. Yaregal Assabie

This is to certify that the thesis prepared by Mahdi Yonis, titled Development of Morphological Analyzer for Af-Somali and submitted in partial fulfillment of the requirements for the Degree of Master of Science in Computer Science, complies with the regulations of the University and meets the accepted standards with respect to originality and quality.

Signed by the Examining Committee:

Name __________________________ Signature __________ Date _______
Advisor: _______________________________________
Examiner: ______________________________________
Examiner: ______________________________________

 

 

ABSTRACT

Morphological analysis is a critical issue, especially for natural language processing tasks on inflectional languages. This thesis gives the implementation details of a morphological analyzer for Af-Somali, which is an inflectional language. A detailed computational analysis of Af-Somali morphology, including the formalization of its alternation and morphotactic rules, is worked out in order to create the morphological analyzer. In the implementation, the alternation and morphotactic rules of Af-Somali are represented by two-level morphology rules. This is the first detailed computational analysis of Af-Somali from a morphological point of view. The thesis is mainly based on the dictionary by Annarita Puglielli, known as Qaamuus, and on the noun declensions described by Andrzejewski. The work employs the finite-state two-level approach using the Xerox finite state toolkit. It is done in two parts: the lexicon is encoded in the lexical formalism (lexc), and the alternation rules are implemented in xfst. We evaluated the morphological analyzer by comparing the number of word tokens correctly accepted by the analyzer with the number of words incorrectly processed by it. We manually annotated 218 tokens (90 nouns, 120 verbs and 8 adjectives) taken from the book known as Qaamuus. Of these, 77 nominal, 105 verbal and 6 adjectival forms were correctly analyzed, that is, 85.5% of the nominals, 87.5% of the verbals and 75% of the adjectives. Of the 218 tokens in total, 86.2% were correctly analyzed, 13.76% were wrongly analyzed, and 10 tokens failed to be analyzed by the system. The results were evaluated by a human reader familiar with the language. We therefore found an encouraging result, which is a preliminary work for the computational development of Af-Somali.

Keywords: Natural Language Processing (NLP), morphological analyzer, finite state transducer (FST), Xerox finite state toolkit (XFST), lexical formalism (LEXC).


 

ACKNOWLEDGEMENTS

I thank all who in one way or another contributed to the completion of this thesis. First, I give thanks to Allah, who gives me protection and the ability to work. I am grateful to the Addis Ababa University College of Natural Science and the Department of Computer Science for making it possible for me to study here. I give deep thanks to the lecturers at the Department of Computer Science, the librarians, and the other workers of the faculty. My special and heartfelt thanks go to my advisor, Dr. Yaregal Assabie, who encouraged and directed me. His challenges brought this work towards completion, and it is with his advice that this work came into existence. For any faults I take full responsibility. My special gratitude and appreciation also go to Annarita Puglielli and Cabdalla Cumar Mansuur for their invaluable contribution to the Af-Somali dictionary, which was the first fully written dictionary with full grammatical information. Their discussions and comments on Af-Somali lexicons and morphology have been the basis of this work. Moreover, I am grateful to the many friends and colleagues who stood by me through these difficult years. I appreciate my dear mother and my good-hearted brothers, Mr Abdirashid Yonis and Hamse Yonis, who have supported and helped me through many setbacks, and I greatly value their contribution.


 

Table of Contents

List of Figures
List of Tables
Chapter 1: Introduction
1.1 Background of the Study
1.2 Morphological Analysis
1.3 Statement of the Problem
1.4 Objectives
1.5 Methodology
1.5.1 Literature Review
1.5.2 Data Collection and Classification
1.5.3 Analysis
1.5.4 Implementation
1.5.5 Testing
1.6 Application of the Result
1.7 Scope and Limitation
1.8 Organization of the Thesis
Chapter 2: Literature Review
2.1 Introduction
2.2 Introduction to Morphological Analysis
2.2.1 Morphemes
2.2.2 Affixes
2.2.3 Types of Morphological Processes
2.2.4 Inflection
2.2.5 Derivation
2.2.6 Compounding
2.3 AF-Somali Morphology
2.3.1 AF-Somali Phonetics
2.3.2 Basic Characteristics of Af-Somali
2.4 Inflectional Process of AF-Somali
2.4.1 Nouns
2.4.2 AF-Somali Noun Determiners
2.4.3 Adjectives
2.4.4 The Verb
2.4.5 Classification AF-Somali Verbs
2.5 Derivational System of AF-Somali
2.6 Approaches to Morphological Analysis
2.6.1 Corpus-based Approaches
2.6.2 Rule-based Approach
2.7 Finite State Technology
2.7.1 Finite State Machines
2.7.2 Finite-state transducers
2.7.3 Two Level Morphological Approach
2.7.4 The Xerox Finite State Framework
2.8 Summary
Chapter 3: Related work
3.1 Introduction
3.2 Morphological Analyzer for European Languages
3.3 Morphological Analyzer for Asian Languages
3.4 Morphological Analyzer for Ethiopian Languages
3.5 Summary
Chapter 4: Design of Af-Somali Morphological Analyzer
4.1 Introduction
4.2 General Architecture of AF-Somali Morphological Analyzer
4.2.1 Lexicon/Morph-tactics
4.2.2 Alternation Rules
4.3 The Design of AF-Somali Part-Of-Speech Lexicon and Alternation Rules
4.3.1 AF-Somali Verb Lexicon Design
4.3.2 Alternation Rules of AF-Somali Verbs
4.3.3 Noun Lexicon Design
4.3.4 Alternation Rules of AF-Somali Nouns
4.3.5 Adjectives Lexicon Design
Chapter 5: Experimentation and Evaluation
5.1 Introduction
5.2 Experimentation
5.3 Discussion and Evaluation
Chapter 6: Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References
Appendix-A: Alternation Rules for Noun and Verb
Appendix-B: Af-Somali verb Lexicon
Appendix-C: Af-Somali Noun lexicon


 

List of Figures

Figure 2-1: Example of two level representation of Af-Somali
Figure 2-2: Creation of a lexical transducer. The .o. operator represents the composition operation
Figure 4-1: Af-Somali morphological analyzer architecture design
Figure 4-2: Af-Somali verb lexicon
Figure 4-3: Af-Somali verbs finite state networks
Figure 4-4: Example representation of Af-Somali second and third group verb FSN
Figure 4-5: Af-Somali verbs alternation rules
Figure 4-6: Alternation rule representation with xfst
Figure 4-7: Person morpheme realization
Figure 4-8: Af-Somali noun lexicon
Figure 4-9: Af-Somali noun suffixes
Figure 4-10: Af-Somali verb finite state networks
Figure 4-11: Af-Somali noun alternation rules
Figure 4-12: Af-Somali adjective lexicon
Figure 4-13: Af-Somali Adjective finite state networks
Figure 5-1: AF-Somali Verb to suffix attachment


 

List of Tables

Table 2.1: Pluralization system of Af-Somali
Table 2.2: Derivational inflected plural form of Af-Somali
Table 2.3: Af-Somali Gender Markers
Table 2.4: Example of noun with gender markers
Table 2.5: Example of Af-Somali pluralization and declension formation
Table 2.6: Example of Af-Somali Articles
Table 2.7: Af-Somali Demonstratives
Table 2.8: AF-Somali possessive
Table 2.9: Interrogative representation of Af-Somali
Table 2.10: Pluralization of adjectives
Table 2.11: Example of person agreement with tenses
Table 2.12: First conjugation representation of Af-Somali verbs
Table 2.13: Second Af-Somali verb conjugation (toosi)
Table 2.14: Example of Af-Somali 3rd conjugation representation
Table 2.15: Fourth Af-Somali verb conjugation representation
Table 2.16: Example of Af-Somali two level representation
Table 4.1: Tags of AF-Somali grammatical information
Table 4.2: Mappings of root words and their morphemes
Table 4.3: An example of Af-Somali verb morphotactics
Table 4.4: Realization with sh when it suffixed with t
Table 4.5: Example of noun declension 2 morphotactics
Table 4.6: Partial reduplication of nouns
Table 4.7: The alternation of declension 5 representation
Table 4.8: Example of adjective morphotactics
Table 5.1: Overall accuracy of the system


 

List of Abbreviations

Af-Somali: Somali Language
FSA: Finite State Automata
FST: Finite State Transducers
IR: Information Retrieval
MT: Machine Translation
NLP: Natural Language Processing
POS: Part-Of-Speech
SOV: Subject-Object-Verb


 

Chapter 1: Introduction

1.1 Background of the Study

A natural language is the preferred medium of communication for people. It can be spoken or written, and in either form it is difficult for computers to understand directly. Computers therefore need a mechanism that holds enough information about the language, including its word grammar and sentence structure. The processing of this information by a computer is known as natural language processing (NLP). NLP is used both for generating human-readable information from computer systems and for converting human language into more formal structures that a computer can understand [6]. It is a field of study that consists of different levels of linguistic analysis, such as phonetic, morphological, syntactic and semantic analysis; of these, morphological analysis is the basic level for different NLP applications.

1.2 Morphological Analysis

Morphological analysis is the process of segmenting words into morphemes, assigning grammatical information to grammatical categories, and assigning lexical information to a particular lexeme or lemma [30]. It retrieves the grammatical features and properties of an inflected word. The analyzer breaks the word into minimal meaning-bearing morphemes and produces morphosyntactic features such as the root, tense, person and number.

Words are formed by combining one or more free morphemes with zero or more bound morphemes. In spoken language, morphemes are composed of phonemes, the smallest linguistically distinctive units of sound. Examples of morphemes are re-, de-, un-, -ish, -ly, -ceive, -mand, tie, boy and like, as in receive, demand, untie, boyish and likely.

Morphology is seen as ‘the study of words that are formally and semantically related’. In order to consider a word as an expression, it must be characterized as having three features: a phonological form, a category or word class, and a meaning. Morphology is concerned with the study of the internal structure of words. Morphological analysis consists of identifying the parts, or constituents, of words. For example, the word toosi (strengthen) in Af-Somali consists of two constituents: the root word toos (straight) and the imperative marker (i). Morphological analysis primarily consists of breaking words up into their parts and establishing the rules that govern the co-occurrence of these parts. Morphology can be viewed as the process of building words by inflection and word formation. The task of morphological analysis is therefore to take word forms and relate them to other word forms, while at the same time deriving information about the form [30].

A morphological analyzer is an essential and basic tool for building any natural language processing application, such as a machine translation system, and it is an essential technology for most text analysis applications, such as information retrieval (IR) and text summarization. The most obvious applications are found in the areas of lexicography and computational linguistics [24]. Two factors are essential to achieving accurate automatic morphological analysis: one is the construction of a set of morphological rules (morphotactics), and the other is the morphological analysis procedure [24]. The absence or underperformance of either of them impairs the overall ability of the morphological analyzer. For example, in the word "dogs", "dog" is the root form and "s" is the affix; here the affix gives the number information of the root word. Thus, morphological analysis is centered on the analysis and generation of word forms. It deals with the internal structure of words and how those words can be formed.

Morphological analysis also plays an important role in applications such as spell checking, electronic dictionary interfaces and information retrieval systems, where it is important that words which are only morphological variants of each other are identified and treated similarly [30]. In NLP, and especially in machine translation (MT) systems, we need to identify words in texts in order to determine their syntactic and semantic properties. Morphological study helps by providing rules for analyzing the structure and formation of words.
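The two factors named above, a rule set and an analysis procedure, can be illustrated with a minimal Python sketch built around the toos + i and dog + s examples. This is only a toy under stated assumptions: the two-entry lexicon, the suffix table and the tag names (+Imp, +Pl) are hypothetical, and the simple string splitting shown here is not the lexc/xfst finite-state implementation used in this thesis.

```python
# Toy rule set: a root lexicon and the suffixes each category may take.
# All entries and tag names are illustrative assumptions.
ROOTS = {"toos": "Verb", "dog": "Noun"}
SUFFIXES = {
    "Verb": {"i": "+Imp"},  # imperative marker, as in toos + i
    "Noun": {"s": "+Pl"},   # plural marker, as in dog + s
}


def analyze(word):
    """Return every (root, category, tag) split licensed by the toy rules."""
    analyses = []
    # Analysis procedure: try each split point of the word.
    for cut in range(1, len(word) + 1):
        root, rest = word[:cut], word[cut:]
        category = ROOTS.get(root)
        if category is None:
            continue  # the left part is not a known root
        if rest == "":
            analyses.append((root, category, ""))  # bare root, no affix
        elif rest in SUFFIXES[category]:
            # The right part must be a suffix allowed for this category.
            analyses.append((root, category, SUFFIXES[category][rest]))
    return analyses


print(analyze("toosi"))  # [('toos', 'Verb', '+Imp')]
print(analyze("dogs"))   # [('dog', 'Noun', '+Pl')]
print(analyze("dogi"))   # [] because -i is not licensed after a noun root
```

If either part is missing, the analysis fails: without the suffix table the procedure cannot recognise toosi, and without the splitting procedure the rules are never applied to the word.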

 

Therefore, having a morphological analyzer for any natural language is a vital step in starting natural language processing. Especially for lesser-studied and under-resourced languages, it is often a practical and extremely valuable first step, making use of corpora, lexicons, morphological grammars and phonological rules already produced by field linguists and descriptive linguists [9]. Several morphological analyzers have been developed for well-documented languages such as English [30] and Arabic [13]. There are also some significant studies in the area of computational morphology for Ethiopian languages such as Amharic [5, 8, 21, 22, 29], Oromo [22] and Tigrinya [22]. Moreover, work has been done for Afaraf [2] and for Ge’ez by Yitayal Abate [34]. However, to the best of our knowledge, no academic or published study has so far been made to develop a morphological analyzer for Af-Somali.

1.3  Statement of the Problem

Af-Somali is the official language of Somalia and of the Ethiopian Somali region, and it is the working language of the Kenyan Northern Province and Djibouti [26]. It is also the medium of instruction in all the schools of these countries, which means that the language is spoken by a large number of people and deserves attention in computational language processing. Furthermore, a large number of official documents, religious books and computerized documents exist in Af-Somali, which makes the language widely used in word processing activities in different areas. In addition, some NLP applications have been developed for Af-Somali, such as the machine translation system by Google, a bilingual English-to-Somali electronic dictionary project, and the Somali speech corpus by Niman Abdillahi [26]. These applications need to identify words in texts in order to determine their syntactic and semantic properties and their lexical category. For example, when translating an Af-Somali word to English using the electronic dictionary, users cannot find the exact meaning or the corresponding word in English. This process first needs a morphological analyzer to determine the word's category, for example whether a verb is past or present, and to identify its part of speech. Furthermore, anyone who wants to conduct NLP research, or to access the resources available in Af-Somali in different formats, needs computational processing of the language or, alternatively, translation into well-resourced languages.

 

Considerable research has been done on NLP systems for the main Ethiopian languages, including various works on computational morphology for Amharic [5, 8, 21, 22, 29], Afan Oromo [22], Tigrinya [22] and Afaraf [2]. However, no research has been conducted so far on an automatic morphological analyzer for Af-Somali. The absence of morphological analysis systems limits the effort to make computers work well with Af-Somali. Af-Somali shares its Cushitic origin with Afaraf, Afaan Oromo and the other languages of the Cushitic family; it has much in common with them in vocabulary and grammatical structure, and like them it follows the SOV structure. However, it differs extensively in its focus markers for nouns and verbs, and in morphology and word order that resemble those of the Semitic family to which Arabic belongs. It is also unique in that modifiers occupy a single position, and in its pluralization patterns and word formation processes; hence, it needs its own independent morphological analyzer. Af-Somali is morphologically rich, and word formation in the language involves a number of different morphological features, including complex verb and noun inflection, derivation and compounding. Because of this complexity, an automated morphological analyzer is difficult to construct, and building one is a challenging task. Moreover, Af-Somali verbs have complex inflection, adding a large number of affixes to the stem, and morphological analysis is vital for the development of many practical natural language processing systems such as machine-readable dictionaries, machine translation, information retrieval, spell-checkers and speech recognition. Therefore, the aim of this work is to study Af-Somali morphology from a computational point of view, to analyze word and morphological categories and the word formation processes of the language, and to model computational morphological analysis for Af-Somali.

1.4  Objectives

General Objective

The main objective of this thesis work is to develop a morphological analyzer for Af-Somali word morphology.


 

Specific Objectives

In order to achieve the above general objective, this thesis work has the following specific objectives:

• Studying and understanding the word and morphological categories in Af-Somali.
• Studying and understanding the phonological and morphological alternation rules involved in Af-Somali word formation and conjugation.
• Assessing the techniques and approaches employed so far in morphological analysis tasks and selecting the ones appropriate to the properties of Somali inflectional morphology.
• Designing a morphological analyzer for Af-Somali.
• Formulating the phonological/orthographic rules involved in inflectional morphological processes in the language.
• Testing the prototype morphological analyzer to measure its performance.

1.5  Methodology

1.5.1  Literature Review

A literature review will be conducted to understand the language’s morphology for developing the morphological analyzer. Scholars in the area of Af-Somali morphology will be consulted to better understand the morphology of the language and to obtain information helpful for the thesis work. Developing a morphological analyzer requires analyzing and identifying the properties of Somali word formation, so it is important to review the research done on the development of morphological analyzers for other languages. This review also helps in studying and selecting a suitable morphological approach for Af-Somali. Besides this, literature on morphological analysis in particular and computational linguistics in general (e.g. approaches) will be reviewed to better understand how words are analyzed. On this basis, the finite-state transducer based approach to morphological analysis was selected to analyze and derive the root and grammatical properties of Somali words.


 

1.5.2  Data Collection and Classification

Any study requires collecting and analyzing data relevant to the research. In this thesis work, a corpus consisting of lists of Af-Somali words in electronic text form, drawn from the book known as Qaamuus and from different online magazines, will be collected. The unique word forms will be classified into categories such as nouns, verbs and adjectives, and further subdivided according to their morpho-syntactic behavior using the Xerox finite-state tool.

1.5.3  Analysis

The classified data will be analyzed into roots (or stems) and affixes for each category using the Xerox finite-state tool in the lexicon formalism. Phonological rules will then be identified and formalized for each category using the xfst tool.

1.5.4  Implementation

Finite-state transducers for each group of words will be created following the concept of the ‘finite state transducer’. Then, a computational model for Af-Somali inflectional morphology will be implemented using the Xerox Finite State Tool (xfst), developed by two principal researchers at the Xerox Palo Alto Research Center.

1.5.5  Testing

In this thesis work, the finite-state approach will be used to develop the morphological analyzer. A word list of surface word forms (tokens) will be extracted from the Af-Somali dictionary (Qaamuus) and fed into the prototype to be analyzed. An output is considered correct only if it contains all legal combinations of roots and grammatical structure for a given word form and includes no incorrect roots or structures.

1.6  Application of the Result

As a morphological analyzer is a vital step in starting natural language processing for any language, the Af-Somali morphological analyzer is developed to enable more efficient and improved NLP applications such as spelling and grammar checkers, POS taggers and machine translation systems. It also contributes to the work of linguists by making it easier to analyze the language’s morphological properties. When applications related to Af-Somali are developed, end users who seek information stored in Af-Somali can benefit from the analyzer, because it identifies the morphological category of each word. In this regard, this work can be a basic and very useful contribution to the technological development of the language. The computational analysis of Af-Somali morphology would be a central and essential component for the development of other Af-Somali processing applications.

1.7  Scope and Limitation

Somali linguistic varieties are divided into three main groups: Northern, Benadir and Maay. Northern Somali forms the basis for Standard Af-Somali. The scope of this study is therefore limited to developing a morphological analyzer for standard (Northern) Af-Somali morphology; it does not include other dialects of Af-Somali. This study also focuses mainly on the written form of words. Derivation and compounding are morphologically important as well, but they are not dealt with in this thesis work. Although there are a number of models and approaches for computational analysis in the literature, a finite-state approach is employed in this thesis work.

1.8  Organization of the Thesis

This thesis is structured into six chapters. The first chapter gives background information on the thesis work, introducing natural language processing and morphological analysis and presenting the problems that motivated us, the objectives and the methodology followed. It also describes the importance and scope of the thesis work. Chapter 2 presents the literature reviewed for the thesis work. It looks into general Af-Somali word morphology and the general characteristics of the Af-Somali parts of speech, and it also presents the approaches to morphological analysis. Studies related to this thesis work are presented in chapter 3. The fourth chapter describes the design and implementation based on the analyses done in the preceding chapters. Chapter 5 discusses the experimentation and evaluation. The last chapter, chapter 6, concludes the thesis work and gives directions for future work.


 

Chapter 2: Literature Review

2.1  Introduction

This chapter presents the literature reviewed that is important for the development of the Af-Somali morphological analyzer. It mainly presents Af-Somali morphology, with emphasis on the description of the morphological processes involved in word formation and generation. It also presents background information on Af-Somali and its phonetics. In addition, the chapter reviews the different computational approaches employed in natural language processing systems and morphological analysis.

2.2  Introduction to Morphological Analysis

2.2.1  Morphemes

Morphs are the phonological/orthographical realization of morphemes. A single morpheme may be realized by more than one morph. In such cases, the morphs are said to be allomorphs of a single morpheme. The following examples demonstrate the concept of morphemes and their realization as morphs. Free morphemes like town and dog can appear with other lexemes (as in town hall or dog house) or they can stand alone, i.e. "free". Free morphemes are morphemes that can stand by themselves as single words, for example caleemo (‘leaves’) and saar (‘get off’) in Af-Somali, whereas bound morphemes (or affixes) never stand alone. They always appear attached to other morphemes; for example, "un-" appears only together with other morphemes to form a lexeme. Bound morphemes in general tend to be prefixes and suffixes. For example, in Af-Somali, the morph ‘in’ is the realization of the morpheme that marks the verb infinitive. Words like “afuri, ababi, toosi” and other verbs of the second group use “in” as the infinitive marker, which gives “afurin, ababin and toosin”. But when the same morpheme is attached to a different word, it may be realized as a different morph. So, the same morpheme can be realized by different morphs in a language. These different morphs of the same morpheme are called allomorphs. An allomorph is a special variant of a morpheme. For example, the second person singular marker in Af-Somali is sometimes realized as o, t or s, and the definite marker -t of feminine nouns has the morph "-t" in birta (the metal) but "-d" in mindida (the knife). These are allomorphs of "-t". A group of allomorphs makes up one morpheme class. In addition, morphology deals with all combinations that form words or parts of words. The two broad classes of morphemes are stems and affixes. The stem is the “main morpheme” of the word, supplying the main meaning; for example, in “guriga”, guri (house) is the stem and “ga” is the affix, which adds the additional meaning “the”.

2.2.2  Affixes

An affix is a bound morph that is realized as a sequence of phonemes. Affixes are classified according to whether they are attached before or after the form to which they are added: prefixes are attached before and suffixes after. Most Af-Somali words use suffixes, and a small number of verbs use prefixes. Languages can also be classified as concatenative or non-concatenative based on their morphology. Non-concatenative morphology is also called template or root-and-pattern morphology, and Af-Somali exhibits it in the plural formation of nouns, for example in the duplifix “aC” of the fourth noun declension, as in buug-buugag and fool-foolal.

2.2.3  Types of Morphological Processes

A word is defined as the smallest vocally expressible unit of thought, composed of one or more sounds combined in one or more syllables. A word is a minimum free form consisting of one or more morphemes. There are three broad classes of ways to form words from morphemes, and Af-Somali makes use of all three in word formation: inflection, derivation and compounding.


 

2.2.4  Inflection

Inflection is the combination of a word stem with a grammatical morpheme, usually resulting in a word of the same class as the original stem and usually filling some syntactic function. Inflection is productive, and the meaning of the resulting word is easily predictable; for example, the imperative of the verb toos (direct!) is toos+i (straighten). Inflectional morphemes modify a word's tense, number, aspect, and so on.

2.2.5  Derivation

Derivation is the combination of a word stem with a grammatical morpheme, usually resulting in a word of a different class, often with a meaning that is hard to predict exactly. In derivation, the part of speech (POS) of the derived word may change. Af-Somali mostly uses inflectional word formation, although some words are formed by derivation.

2.2.6  Compounding

Compounding is the joining of two or more base forms to form a new word. Such root-root fusions are very common in written Af-Somali. Compounds are formed by combining noun forms that carry semantic content with different verbal forms. For example, the Af-Somali plural noun “buugag” (books) combines with the verbal form sheeg to form another noun, “buugagsheeg” (bibliography).

2.3  AF-Somali Morphology

The Somali language (Af-Somali) is an Afro-Asiatic language belonging to the Cushitic branch of the family. It is a Lowland East Cushitic language spoken by up to roughly 16 million people in Somalia, Somaliland, Puntland, Djibouti, Ethiopia (Somali Region) and Kenya (Northeastern Province) [25]. Somali linguistic varieties are divided into three main groups: Northern, Benadir and Maay. Northern Somali (or Northern-Central Somali) forms the basis for the Standard Somali language. The Northern Somali dialect, commonly known as the Somali language, is spoken in Djibouti, Ethiopia, Puntland and north of the Wabi-Shabeele, and it represents the spoken standard of literary Somali [26]. The writing system of the language was adopted in 1972, and there are no textual archives before this date. It uses Roman letters and does not mark the tonal accent [26].

2.3.1  AF-Somali Phonetics

The phonetic structure of Af-Somali has 22 consonants and 10 vowels, 5 long and 5 short [33]. Af-Somali is also a tone accent language with 2 to 3 lexical tones. Af-Somali consonants follow the same order and have the same value as the equivalent letters of the Arabic alphabet, except G. As presented below, some letters represent sounds that are not found in English; these are similar to Arabic sounds. The Af-Somali alphabet begins with ' (alif) and contains 21 further consonants, which are B, T, J, X, KH, D, R, S, SH, DH, C, G, F, Q, K, L, M, N, W, H and Y, and ten vowels, which are a, i, e, u and o and their long counterparts aa, ee, ii, oo and uu. The Latin script poses no difficulty for readers, and the vowels have the same value as in Spanish or Italian.

2.3.2  Basic Characteristics of AF-Somali

The syllable structure of the Somali language is (C)V(C)(C), where items in parentheses are optional, and most words have a di- or tri-syllabic structure (root morphemes and affixes are usually mono- or disyllabic) [33]. Af-Somali shares its Cushitic origin with Afaraf, Afaan Oromo and the other languages of the Cushitic family; it is similar to them in vocabulary and in basic word order, which follows the SOV structure. However, the most distinguishing characteristic of Af-Somali is its double pluralization processes, such as the ones illustrated in Table 2.1, where an independently productive plural suffix -yáal can be added to already plural forms such as nim-á-n ‘men’ or naag-ó ‘women’.

Table 2.1: Pluralization system of Af-Somali

| Singular word | Simple plural | Plural of plural |
|---|---|---|
| Nin(ka), masculine: ‘the man’ | Niman(ka), masculine: ‘men’ | Niman-yaal, feminine: ‘groups of men’ |
| Roob(ka), masculine: ‘the rain’ | Roobab(ka), masculine: ‘rains’ | Roobab-yow(ga), masculine |


 

Another important characteristic that distinguishes Af-Somali from the other Cushitic languages is the existence of an unquestionably derivational process that takes inflected plural forms as its basis, as illustrated in Table 2.2.

Table 2.2: Derivational inflected plural form of Af-Somali

| Root word | Inflected plural | Derivation | Word/English |
|---|---|---|---|
| Buug | ag | Sheeg | Buugagsheeg/bibliography |
| Buug | ag | haye | Buugaghaye/librarian |
| Geed | O | Aqoon | Geedaqoon/botany |
| Xagl | O | Gooye | Xaglogooye/diagonal |

Therefore, like any other language, AF-Somali has some notable common characteristics: an inflectional system, inflected forms in composition/derivation, conjugational classes, affixation and reduplication. In addition, there are three broad classes of ways to form words from morphemes in AF-Somali, namely inflection, derivation and compounding. In this work we consider the analysis of inflectional word formation processes for the important AF-Somali parts of speech. The most important parts of speech in the Somali language are nouns, verbs and adjectives, and we present their word formation processes in the following sections.

2.4  Inflectional Process of AF-Somali

2.4.1  Nouns

Grammatically, Af-Somali nouns are encoded morphologically by way of affixation to roots and stems. As in other related languages, Af-Somali nouns are inflected for gender, number and person. Nouns in Af-Somali, as in other languages, are the names of persons, places, things and abstract entities. Nouns are inherently masculine or feminine. In general, a noun consists of a root and affixes, which provide a combination of gender and number marking. The main complication is that there are several declension classes, with specific singular and plural suffixes for groups of classes. Af-Somali nouns are thus marked for gender distinction, pluralization and determiners, as presented below.

 

Gender Markers

Somali nouns can be marked for gender to distinguish between masculine and feminine. Some Af-Somali nouns are distinguished by a difference in accentual tone, but in this thesis work we consider only the nouns that are marked for gender by affixes. The markers for Af-Somali gender are all suffixes that distinguish between masculine and feminine, and they are shown in Table 2.3. As Table 2.3 shows, “ka, ha, a and ga” are masculine markers and “ta, da and sha” are feminine markers.

Table 2.3: Af-Somali Gender Markers

| Gender | Markers |
|---|---|
| Masculine | ka, ha, ga, a |
| Feminine | ta, da, sha |

The gender markers in Af-Somali are attached to nouns as suffixes to differentiate between masculine and feminine. Table 2.4 shows how the markers are suffixed to Af-Somali nouns.

Table 2.4: Example of noun with gender markers

| Words | Masculine marker | Words | Feminine marker |
|---|---|---|---|
| Ninka (the man) | ka | Gabadha (the girl) | da |
| Guriga (the house) | ga | Badda (the sea) | da |
| Aabaha (a father) | ha | Hasha (the camel) | sha |

Even though we have presented the nouns and how the gender markers are suffixed to them, there are different rules that have to be captured in this study. In Af-Somali the basic gender markers are “ka” and “kii” for masculine and “ta” and “tii” for feminine. The markers can change, however, based on the last character of the word. For example, if a masculine noun ends with the vowel i or e, the marker ka changes to ga or ha respectively, and if a feminine noun ends with the consonant l, the feminine marker “ta” changes to “sha” and the “l” is deleted. Another rule is that all feminine words that end with the vowel o take the “da” gender marker.


 

Pluralization System of AF-Somali Nouns

There are different rules for changing singular Af-Somali nouns into plural, depending on the gender of the word. Most Af-Somali noun pluralization is inflectional, which means that it does not change the grammatical word category, and most nouns become plural simply by taking suffixes. As described in Table 2.5, monosyllabic Af-Somali words can form the plural by partial reduplication: the last consonant is copied and the vowel ‘a’ is inserted between the two copies. If a singular Af-Somali noun ends with a consonant such as b, d, n, l or r, the last consonant of the word is doubled and the vowel ‘o’ is added to form the plural, and the gender changes to feminine. If the noun ends with s, q, c, f, x or i, the suffix ‘yo’ is added to the root word to form the plural. Some disyllabic singular nouns form the plural by adding the suffix ‘o’ and deleting the letter before the last consonant, and their gender remains unchanged. In addition, nouns that end with -e form the plural by adding the suffix -yaal. Some words derived from Arabic form the plural following Arabic pluralization. As a result, Af-Somali nouns are classified into seven declensions, as shown in Table 2.5, based on how they form the plural and on the gender of the plural with respect to the singular. For example, if the singular word is masculine and becomes feminine in the plural, that word is in declension one [1].

Table 2.5: Example of Af-Somali pluralization and declension formation

| Word | Gender | Number | Word | Gender | Number | English | Declension |
|---|---|---|---|---|---|---|---|
| Miis | Masculine | Singular | Miis+as | Masculine | Plural | Tables | Dec-4 |
| Baal | Feminine | Singular | Baal+al | Feminine | Plural | Diagonal | Dec-4 |
| Erey | Masculine | Singular | Erey+yo | Masculine | Plural | Words | Dec-2 |
| Mindi | Feminine | Singular | Mindi+yo | Masculine | Plural | Knives | Dec-2 |
| Naag | Feminine | Singular | Naag+o | Masculine | Plural | Women | Dec-1 |
| Ilig | Masculine | Singular | Ilk+o | Masculine | Plural | Teeth | Dec-3 |
| Gabadh | Feminine | Singular | Gabdh+o | Masculine | Plural | Girls | Dec-3 |
| Dameer | Masculine | Singular | Dameer+ro | Feminine | Plural | Donkeys | Dec-5 |
| Hooyo | Feminine | Singular | Hooyooyin | Masculine | Plural | Mothers | Dec-6 |
| Sheeko | Feminine | Singular | Sheeko+oyin | Masculine | Plural | Stories | Dec-6 |
| Aabe | Masculine | Singular | Aabeyaal | Feminine | Plural | Fathers | Dec-7 |
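The plural patterns listed in Table 2.5 can be sketched as a small rule table in Python. This is only an illustration, not the lexc/xfst implementation of the thesis: the rule names (`reduplicate`, `geminate_o`, and so on) and the function `pluralize` are hypothetical, and the caller is assumed to know which pattern a noun follows, since that is lexical information. Consonant changes such as ilig/ilko are not modeled.

```python
import re

def pluralize(noun: str, rule: str) -> str:
    """Apply one of the plural patterns of Table 2.5 to a singular noun.

    `rule` is a hypothetical label chosen for this sketch; a real lexicon
    would store the declension class with each noun.
    """
    stem = noun.lower()
    if rule == "reduplicate":   # miis -> miisas: insert 'a', copy last consonant
        return stem + "a" + stem[-1]
    if rule == "geminate_o":    # dameer -> dameerro: double last consonant, add 'o'
        return stem + stem[-1] + "o"
    if rule == "suffix_o":      # naag -> naago
        return stem + "o"
    if rule == "suffix_yo":     # mindi -> mindiyo
        return stem + "yo"
    if rule == "oyin":          # hooyo -> hooyooyin
        return stem + "oyin"
    if rule == "yaal":          # aabe -> aabeyaal
        return stem + "yaal"
    if rule == "syncope_o":     # gabadh -> gabdho: drop vowel before final consonant
        m = re.match(r"^(.*)[aeiou](dh|sh|kh|[^aeiou])$", stem)
        if m is None:
            raise ValueError(f"no vowel to delete in {noun!r}")
        return m.group(1) + m.group(2) + "o"
    raise ValueError(f"unknown rule {rule!r}")

print(pluralize("miis", "reduplicate"))   # miisas
print(pluralize("gabadh", "syncope_o"))   # gabdho
```

Keeping the pattern as an explicit argument mirrors the way a lexicon assigns each noun to a declension class before any suffix is attached.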

2.4.2  AF-Somali Noun Determiners

Determiners are modifiers that add meaning to a noun by attaching to it as a suffix. They are classified into 4 types according to the meaning they add to the noun: articles (Qodob), demonstratives (Tilmaame), interrogatives (Weydimo) and possessives (Lahaansho).

Articles

AF-Somali articles take different forms: -ka and -kii for masculine nouns, and -ta and -tii for feminine nouns. If the person we are talking about is far from us, or the thing we are reporting is in the past, -ka/-ta change to -kii/-tii respectively. The form of the article also changes depending on the last letter of the noun to which it is attached. For example, take the noun “kabo” and add the article “ka”; “ka” changes to “ha” and the word becomes “kabaha”. Table 2.6 shows which article is attached to each noun and how it changes. As indicated in Table 2.6, the k of the article changes to g when it is suffixed to masculine nouns that end with -g, -w, -aa, -u, -y or -i, and the article -ka is reduced to -a when the masculine noun ends with -h, -x, -q, -c or -kh. In addition, the feminine article -ta can change to -da or -sha: t becomes d when it is suffixed to a noun that ends with -o, -d, -c, -x, -h, -y or ('), and “ta” becomes “sha” when it is suffixed to a feminine noun that ends with “-l”, with the “l” deleted.

Table 2.6: Example of Af-Somali Articles

| Root word | Gender | Article | Formed word |
|---|---|---|---|
| Kabo | Masculine | ka | Kaba(ha) |
| Buug | Masculine | ka | Buug(ga) |
| Magac | Masculine | ka | Magac(a) |
| Maro | Feminine | ta | Mara(da) |
| Bac | Feminine | ta | Bac(da) |
| Ul | Feminine | ta | Usha |
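The article alternations described above and listed in Table 2.6 can be sketched as a single function. This is a minimal illustration under stated assumptions, not the two-level rules of the thesis: the function name `attach_article` is hypothetical, the change of a final -o or -e to -a before the article is inferred from the examples kabaha, marada and aabaha, and the deletion of t after a final “dh” follows the example gabadha.

```python
def attach_article(noun: str, gender: str) -> str:
    """Attach the basic article (-ka / -ta) to a noun; gender is 'm' or 'f'.

    Sketch only: rules follow the text around Table 2.6, plus the
    assumptions named in the lead-in.
    """
    stem = noun.lower()
    if gender == "m":
        if stem.endswith(("e", "o")):            # kabo -> kabaha, aabe -> aabaha
            return stem[:-1] + "a" + "ha"
        if stem.endswith(("kh", "h", "x", "q", "c")):
            return stem + "a"                    # magac -> magaca
        if stem.endswith(("g", "w", "aa", "u", "y", "i")):
            return stem + "ga"                   # buug -> buugga
        return stem + "ka"                       # nin -> ninka
    if gender == "f":
        if stem.endswith("dh"):                  # t deleted after dh
            return stem + "a"                    # gabadh -> gabadha
        if stem.endswith("l"):                   # l deleted, ta -> sha
            return stem[:-1] + "sha"             # ul -> usha
        if stem.endswith("o"):                   # maro -> marada
            return stem[:-1] + "a" + "da"
        if stem.endswith(("d", "c", "x", "h", "y", "'")):
            return stem + "da"                   # bac -> bacda
        return stem + "ta"                       # bir -> birta
    raise ValueError("gender must be 'm' or 'f'")

for noun, gender in [("kabo", "m"), ("buug", "m"), ("magac", "m"),
                     ("maro", "f"), ("bac", "f"), ("ul", "f")]:
    print(noun, "->", attach_article(noun, gender))
```

The order of the checks matters: the “dh” case must be tested before the general “h” case, otherwise gabadh would wrongly receive -da.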

 

Demonstrative suffixes

Like the articles, demonstratives are suffixed to nouns to modify their meaning by indicating distance or location. Their choice depends on the relationship between the subject and the object, or on the distance between the speaker and what is being talked about. Af-Somali has three different demonstrative noun markers, as described in Table 2.7: nearness (kan), farness (kaas) and to left/right (keer) for masculine, and nearness (tan), farness (taas) and to left/right (teer) for feminine.

Table 2.7: Af-Somali Demonstratives

| Word | Gender | Near | Farness | To left/right |
|---|---|---|---|---|
| Gabadh | Feminine | Tan | Taas | Teer |
| Gabadh | Feminine | Gabadhan | Gabadhaas | Gabadheer |
| Nin | Masculine | Kan | Kaas | Keer |
| Nin | Masculine | Ninkan | Ninkaas | Ninkeer |

Table 2.7 also shows that whenever a suffix starting with t is added to a feminine noun whose last character is “dh”, the t is deleted and only the remaining part of the suffix is attached.

Possessive Suffixes

In Af-Somali, as in other languages, possessive suffixes are used to mark on the word that something is owned or possessed. They are classified into masculine and feminine forms and vary with the person, which gives 6 different possessives as indicated in Table 2.8.

Table 2.8: AF-Somali possessive

| Person | Masculine | Feminine | Root noun | Gender | Word |
|---|---|---|---|---|---|
| 1st.Sg | Kayga | Tayda | Buug | Masculine | Buug-gayga |
| 2nd.Sg | Kaaga | Taada | Gabadh | Feminine | Gabadh-aada |
| 3rd.Sg | Kiisa | Tiisa | Buug | Masculine | Buug-giisa |
| 3rd.Sg | Keeda | Teeda | Gabadh | Feminine | Gabadh-eeda |
| 1st.Pl | Keenna | Teenna | Nin | Masculine | Nin-keenna |
| 2nd.Pl | Kiinna | Tiinna | Wiil | Masculine | Wiil-kiinna |
| 3rd.Pl | Kooda | Tooda | Bac | Feminine | Bac-dooda |

 

Interrogative Suffixes

Interrogative suffixes are determiners that add a question-like meaning. Like the other determiners, they have masculine and feminine markers: -kee is used for masculine nouns and -tee for feminine nouns, as described in Table 2.9.

Table 2.9: Interrogative representation of Af-Somali

| Root noun | Gender | Interrogative suffix | Word |
|---|---|---|---|
| Dal | Masculine | Kee | Dalkee |
| Sacad | Feminine | Tee | Sacaddee |
| Meel | Feminine | Tee | Meeshee |
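The examples in Tables 2.7 to 2.9 suggest that demonstrative, possessive and interrogative suffixes undergo the same boundary alternations. A rough way to picture this, in the spirit of ordered rewrite rules, is to concatenate the stem and the underlying suffix with a boundary symbol and then apply regular-expression replacements in sequence. The sketch below is an illustration in Python, not the xfst rules of the thesis; the names `DETERMINERS`, `RULES` and `attach` are hypothetical, and only the alternations attested in those three tables are covered.

```python
import re

# Underlying (masculine, feminine) suffix pairs taken from Tables 2.7-2.9.
DETERMINERS = {
    "near": ("kan", "tan"), "far": ("kaas", "taas"), "side": ("keer", "teer"),
    "1sg": ("kayga", "tayda"), "2sg": ("kaaga", "taada"),
    "3sg.m": ("kiisa", "tiisa"), "3sg.f": ("keeda", "teeda"),
    "1pl": ("keenna", "teenna"), "2pl": ("kiinna", "tiinna"),
    "3pl": ("kooda", "tooda"), "which": ("kee", "tee"),
}

# Ordered rewrite rules over "stem^suffix"; '^' marks the morpheme boundary.
RULES = [
    (r"dh\^t", "dh^"),              # t is deleted after dh: gabadh^tan -> gabadhan
    (r"l\^t", "^sh"),               # l + t -> sh:           meel^tee  -> meeshee
    (r"(?<=[dcxhy'])\^t", "^d"),    # t -> d after d, c, x, h, y, '
    (r"(?<=[gwyiu])\^k", "^g"),     # k -> g after g, w, y, i, u
    (r"\^", ""),                    # finally remove the boundary symbol
]

def attach(noun: str, gender: str, determiner: str) -> str:
    """Attach a determiner suffix to a noun; gender is 'm' or 'f'."""
    masc, fem = DETERMINERS[determiner]
    form = noun.lower() + "^" + (masc if gender == "m" else fem)
    for pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form

print(attach("gabadh", "f", "near"))   # gabadhan
print(attach("buug", "m", "1sg"))      # buuggayga
print(attach("meel", "f", "which"))    # meeshee
```

Rule order matters here as well: the “dh” rule has to run before the general t-to-d rule, because “h” is also one of the triggers of that rule.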

2.4.3  Adjectives

Adjectives, in turn, do not belong to a clearly defined category in Af-Somali. Items such as yár ‘small’ and wéyn ‘big’ are best interpreted as state verbs displaying a particular defective paradigm. Adjectives are inflectionally pluralized through reduplication. The reduplicated plural is formed by prefixing a copy of the first syllable to the stem. Only the second syllable bears the high tone. Besides this, adjectives can be marked for person and definiteness and can take tense markers. For example, the plural forms of adjectives such as cad, cusub and yar are shown in Table 2.10.

Table 2.10: Pluralization of adjectives

| Root adjective word | Number | Word | Number |
|---|---|---|---|
| Cad (white) | Sg | Cadcad | Plural |
| Cusub | Sg | Cuscusub | Plural |
| Yar | Sg | Yaryar | Plural |
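The reduplication pattern of Table 2.10 can be sketched with a regular expression that picks out the first syllable and copies it in front of the stem. This is a simplified illustration: the function name `pluralize_adjective` is hypothetical, the first syllable is assumed to be an optional onset consonant, a vowel sequence and at most one following consonant, and tone is not represented.

```python
import re

# An optional onset consonant, one or more vowels, and an optional closing
# consonant; the digraphs dh, sh and kh are treated as single consonants.
_CONSONANT = r"(?:dh|sh|kh|[^aeiou])"
_FIRST_SYLLABLE = re.compile(rf"^{_CONSONANT}?[aeiou]+{_CONSONANT}?")

def pluralize_adjective(adjective: str) -> str:
    """Prefix a copy of the first syllable to the adjective stem."""
    stem = adjective.lower()
    match = _FIRST_SYLLABLE.match(stem)
    if match is None:          # no vowel found: leave the form unchanged
        return stem
    return match.group(0) + stem

for word in ("cad", "cusub", "yar"):
    print(word, "->", pluralize_adjective(word))
```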

2.4.4  The Verb

The verb is the most important part of speech in Af-Somali, and it can be inflectionally more complex than the other parts of speech. A typical verb consists of a root plus a number of affixes. These include derivational affixes (Somali includes a passivizing form which can only be applied to verbs which have a ‘causative’ argument, and a causative affix which adds such an argument) and a set of inflectional affixes which mark aspect, tense and agreement [25]. Verbs have complex alternation patterns, and the basic building parts of Af-Somali verbs are the root word, modifiers, person and conjugation. The most important of these to describe is conjugation, so we present some properties of conjugation with examples below. Conjugation shows the verb’s tense, aspect and mood. The agreement of person and tense produces 6 different forms of a word, as illustrated with examples in Table 2.11. The table also shows person agreement with tense and the person markers for each of the 6 forms.

Table 2.11: Example of person agreement with tenses

Person        Root verb   Present (segmented)   Past (segmented)   Present verb   Past verb
1st.Sg        Cun         Cun-0-aa              Cun-0-ay           Cunaa          Cunay
2nd.Sg        Cun         Cun-t-aa              Cun-t-ay           Cuntaa         Cuntay
3rd.Sg.masc   Cun         Cun-0-aa              Cun-0-ay           Cunaa          Cunay
3rd.Sg.fem    Cun         Cun-t-aa              Cun-t-ay           Cuntaa         Cuntay
1st.Pl        Cun         Cun-n-aa              Cun-n-ay           Cunnaa         Cunnay
2nd.Pl        Cun         Cun-t-aan             Cun-t-een          Cuntaan        Cunteen
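The segmentation used in Table 2.11 (root, person marker, tense suffix) can be sketched as plain concatenation in Python. This is a minimal illustration covering only the six forms listed in the table; the dictionary and function names are hypothetical, and no alternation rules are applied.

```python
# Person markers as they appear in Table 2.11 ("" stands for the zero marker).
PERSON_MARKER = {
    "1st.Sg": "",
    "2nd.Sg": "t",
    "3rd.Sg.masc": "",
    "3rd.Sg.fem": "t",
    "1st.Pl": "n",
    "2nd.Pl": "t",
}

# For each tense: (default suffix, suffix used with 2nd.Pl).
TENSE_SUFFIX = {
    "present": ("aa", "aan"),
    "past": ("ay", "een"),
}


def conjugate(root: str, person: str, tense: str) -> str:
    """Concatenate root + person marker + tense suffix."""
    marker = PERSON_MARKER[person]
    default_suffix, second_plural_suffix = TENSE_SUFFIX[tense]
    suffix = second_plural_suffix if person == "2nd.Pl" else default_suffix
    return root + marker + suffix


if __name__ == "__main__":
    for person in PERSON_MARKER:
        print(person, conjugate("cun", person, "present"), conjugate("cun", person, "past"))
```

Plain concatenation is enough for a root like cun; roots that trigger sound changes would need the alternation rules discussed later in the thesis.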

As shown in Table 2.11, the zero marker (0) indicates 1st.Sg, 3rd.Sg.masc and 3rd.Pl. The suffix -t marks 2nd.Sg, 3rd.Sg.fem and 2nd.Pl, and the suffix -n marks 1st.Pl. These person markers are followed by -ay or -een for the past tense and by -aa or -aan for the present tense. Af-Somali verbs are classified into five conjugation categories based on their imperative markers.
2.4.5  Classification of Af-Somali Verbs

Based on conjugation, Af-Somali verbs fall into two broad categories: a large number of verbs that take only suffixes and a small number of verbs that take both prefixes and suffixes. We first consider the suffix-only verbs, which are the most common in Af-Somali. These verbs are classified into five conjugations, known as the 1st, 2nd, 3rd, 4th and 5th conjugations. First-conjugation verbs take no imperative marker and are mostly monosyllabic, as shown in Table 2.12.
Table 2.12: First conjugation representation of Af-Somali verbs

Verb   Person   Tense   Imperative   The word
Cun    0        Ay      0            Cunay
Jab    0        Ay      0            Jabay
Qor    T        Ay      0            Qortay

Second-conjugation verbs are mostly derived from other verbs and take the imperative marker “i”. For example, the verb “toos” takes the suffix “i” to form the second-conjugation verb “toosi”, as shown in Table 2.13.
Table 2.13: Second Af-Somali verb conjugation (toosi)

Person/number   Imperative   Habitual present   Present continuous   Past        Past continuous
1st.Sg          I            Toosiyaa           Toosinayaa           Toosiyay    Toosinayay
2nd.Sg          I            Toosisaa           Toosinaysaa          Toosisay    Toosinaysay
3rd.Sg.masc     I            Toosiyaa           Toosinayaa           Toosiyay    Toosinayay
3rd.Sg.fem      I            Toosisaa           Toosinaysaa          Toosisay    Toosinaysay
1st.Pl          I            Toosinaa           Toosinaynaa          Toosinay    Toosinaynay
2nd.Pl          I            Toosisaan          Toosinaysaan         Toosiseen   Toosinayseen
3rd.Pl          I            Toosiyaan          Toosinayaan          Toosiyeen   Toosinayeen

Third-conjugation verbs are characterized by the imperative marker “ee”. Table 2.14 lists some verbs of this conjugation.


Table 2.14: Example of Af-Somali 3rd. conjugation representation

The root verb   Imperative   Infinitive   The verb   In English
Dhab            Ee           Eyn          Dhabeyn    Make the truth
Ciid            Ee           Eyn          Ciideyn    Put the soil

Lastly, fourth-conjugation verbs are characterized by the imperative marker “o”, and fifth-conjugation verbs by the imperative marker “so”. Table 2.15 gives an example of the fourth conjugation, showing inflection for person, number and tense and how the agreement of person and number with tense yields seven different forms of the verb.
Table 2.15: Fourth Af-Somali verb conjugation representation

Person/number   Imperative   Habitual present   Present continuous   Past        Past continuous
1st.Sg          O (dhaqo)    Dhaqdaa            Dhaqanayaa           Dhaqday     Dhaqanayay
2nd.Sg          O            Dhaqataa           Dhaqanaysaa          Dhaqatay    Dhaqanaysay
3rd.masc        O            Dhaqdaa            Dhaqanayaa           Dhaqday     Dhaqanayay
3rd.fem         O            Dhaqataa           Dhaqanaysaa          Dhaqatay    Dhaqanaysay
1st.Pl          O            Dhaqannaa          Dhaqanaynaa          Dhaqannay   Dhaqanaynay
2nd.Pl          O            Dhaqataan          Dhaqanaysaan         Dhaqateen   Dhaqanayseen
3rd.Pl          O            Dhaqdaan           Dhaqanayaan          Dhaqdeen    Dhaqanayeen

2.5  Derivational System of Af-Somali

Morphologically, Af-Somali words are mainly inflectional, like those of other Cushitic languages, but some words are derivational. Derived words in Af-Somali are mostly verbs and adjectives, which can be formed from words of other categories; most adjectives are formed from verbs. Some nouns are also derived from other word classes in the process of word formation. Most verbs in Af-Somali can be changed into nouns by taking the suffix (a) and doubling the last consonant. For example, the verb “dil” becomes the noun “dilaa” by adding “aa”, and the verb “cun” becomes the noun “cunto” by adding “to”.

As noted above, a typical verb consists of a root plus a number of affixes, including derivational affixes such as the passivizing and causative forms [25]. For example, the noun aadaan (prayer) becomes the verb “aadanay” (praying), and the noun iskaashato (cooperation) becomes the verb iskashi. Like the other parts of speech, Af-Somali adjectives have a derivational process. There are two sorts of adjectives: ‘basic adjectives’ (a small number), such as yár ‘small’ and wéyn ‘big’, and those formed from nouns and verbs by the addition of lexical suffixes, such as caan-sán ‘famous’ (cáan ‘fame’), wanaag-sán ‘good’ (wanáag ‘goodness’) and jar-án ‘chopped’ (jár ‘to break’). In addition, compounding creates derived words from two different words, such as a verb and a noun or an adjective and a noun.

2.6  Approaches to Morphological Analysis

A number of approaches are widely used in computational morphology. Some are based on concepts from automata theory, probability, the principle of analogy and information theory. Computational morphological approaches are broadly categorized into rule-based and corpus-based approaches.
2.6.1  Corpus-based Approaches

Corpus-based approaches are statistical in nature and do not strictly follow an explicit theory of linguistics [32]. A suitable machine learning algorithm is trained to collect the necessary information and features from the corpus, and the knowledge acquired is then used to perform the morphological analysis task [32]. Based on the type of text corpora used, corpus-based approaches can be further categorized into supervised and unsupervised approaches. Supervised approaches use annotated text corpora, while unsupervised approaches use raw corpora such as those found in newspapers and books. These approaches need a very large corpus to train the algorithm, so they are difficult to apply to under-resourced languages like Somali and may not produce efficient, high-quality output. Well-resourced languages mostly use the machine learning approach, which requires large word corpora, electronic dictionaries, newspapers and other documents found on the Internet. These languages use it to avoid the overhead of writing rules by hand; examples include English [30] and Arabic [13]. Limited research has been done with corpus-based approaches for local languages such as Amharic [22] and Ge’ez [34]. Most local languages instead use a rule-based approach, specifically two-level morphological analysis.
2.6.2  Rule-based Approach

The rule-based approach strictly follows an explicit linguistic theory, that is, a theory of morphology laid down by an expert. Kazakov and Manandhar [32] state that this approach makes it possible to incorporate sophisticated linguistic theories such as generative phonology into computational morphology. Because of their reliance on linguistic theories, systems developed using rule-based approaches are often efficient and produce better quality outputs [28]. Different rule-based methods are used to develop morphological analyzers, among them the paradigm-based method and finite state automata. In the paradigm-based method, each word category of a language, such as nouns, verbs, adjectives, adverbs and postpositions, is classified into certain types of paradigms based on morphophonemic behavior, and a paradigm-based morphological compiler program is used to develop the morphological analyzer. The Finite State Automata (FSA) based method uses regular expressions and is used to accept or reject a string in a given language. In general, an FSA models the behavior of a system composed of states, transitions and actions. An FSA starts in its initial state, and it accepts its input if it is in one of its final states when the input has been consumed.

Within computational morphology, a very significant advance came with the demonstration that phonological rules could be implemented as finite state transducers (FSTs) and that rule ordering could be dispensed with using FSTs that relate the surface and lexical levels directly, so-called “two-level” morphology (TLM). The same transducer can be converted from one that performs analysis (surface input to lexical output) to one that performs generation (lexical input to surface output) [32]. TLM is devised to handle morphological analysis and generation in a bi-directional way. The approach is based on two lexica (one for the underlying and the other for surface word forms) and a set of morphological rules. The rules establish whether a given sequence of characters at the surface level (as it appears in the text) can correspond to a sequence of symbols used to represent the morphemes in the lexicon. In other words, the rules map the two strings to each other. TLM is currently a very popular method in computational morphology [32].

The benefits of FSTs for NLP stem from several properties of finite-state devices: true representation, modularity, compactness, efficiency and reversibility. True representation means that the kinds of phonological and morphological rules that are common in linguistic theories can be directly implemented as finite-state relations, so implementing linguistically motivated rules in an FST is straightforward. Modularity follows from the closure properties of regular languages and relations, which provide various means for combining regular expressions and support a variety of operations on the languages these expressions denote. For example, closure under union allows two grammar fragments to be developed separately and then combined in a single operation. The most useful operation under which transductions are closed is probably composition, which is the central vehicle for implementing replace rules. Compactness means that finite-state automata can be minimized, which guarantees that for a given language an automaton with a minimal number of states can always be generated. Toolboxes can apply minimization either explicitly or implicitly to reduce storage requirements. Efficiency means that when an automaton is deterministic, recognition is optimally efficient (linear in the length of the string to be recognized).
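The acceptance behavior of a deterministic automaton described above can be sketched in Python. This is a toy illustration, not the Xerox implementation: the transition table is hand-built and accepts only the three forms cun, cunay and cuntay taken from Table 2.11, and all names are hypothetical. The loop reads one symbol per step, which is why recognition is linear in the length of the input.

```python
from typing import Dict, Set, Tuple

Transitions = Dict[Tuple[int, str], int]

# Hand-built toy automaton for the words "cun", "cunay" and "cuntay".
TRANSITIONS: Transitions = {
    (0, "c"): 1,
    (1, "u"): 2,
    (2, "n"): 3,   # state 3 is final: "cun"
    (3, "a"): 4,
    (4, "y"): 5,   # state 5 is final: "cunay" / "cuntay"
    (3, "t"): 6,
    (6, "a"): 4,
}
START_STATE = 0
FINAL_STATES: Set[int] = {3, 5}


def accepts(word: str,
            transitions: Transitions = TRANSITIONS,
            start: int = START_STATE,
            finals: Set[int] = FINAL_STATES) -> bool:
    """Return True if the automaton ends in a final state after reading word."""
    state = start
    for symbol in word:
        key = (state, symbol)
        if key not in transitions:
            return False  # no arc for this symbol: reject immediately
        state = transitions[key]
    return state in finals


if __name__ == "__main__":
    for candidate in ("cun", "cunay", "cuntay", "cunt", "nuc"):
        print(candidate, accepts(candidate))
```

An automaton like this can only say yes or no; it returns no analysis, which is the limitation that motivates transducers.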
Automata can always be determinized, and toolboxes can take advantage of this to improve time efficiency. Reversibility means that finite-state automata and transducers are inherently declarative: it is the application program that implements either recognition or generation. In particular, transducers can be used to map strings from the upper language to the lower language or vice versa with no changes to the underlying finite-state device [28].

2.7  Finite State Technology

Finite-state technology (FST) denotes the use of finite-state devices, such as automata and transducers, in natural language processing. Since the early works that demonstrated the applicability of this technology to linguistic representation, FST has been considered adequate for describing the phonological and morphological processes of the world’s languages [32]. In order to understand how to build the linguistic application, we first need to be acquainted with the basics of how a finite-state machine works.
2.7.1  Finite State Machines

A finite-state machine (FSM) is an abstract machine that implements a regular language. Regular languages can be described formally in a concise notation through regular expressions. A finite-state machine is a network of states with one start state and one or more final states. A transition between states is possible only if the required input is recognized. A path is a sequence of transitions over arcs to a particular state. In computational morphology, a path is a sequence of symbols equivalent to a word in a natural language. A technology that uses finite-state networks to build an application is therefore called finite state technology. A finite state automaton, however, only accepts a word, that is, checks whether it is a valid word of the language; it does not produce any output.
2.7.2  Finite-state transducers

So far, the analysis of words in a network has simply yielded one of two responses: accept, indicating that the word is in the language of the network, or reject, indicating that it is not. While this can be valuable, for instance in spell-checking, finite-state networks are capable of storing and returning much more interesting information [28]. Within computational morphology, a very significant advance came with the demonstration that phonological rules could be implemented as finite state transducers [11] and that rule ordering could be dispensed with using FSTs that relate the surface and lexical levels directly [11], so-called “two-level” morphology. A second important advance was the recognition by [11] that a cascade of composed FSTs could implement the two-level model. Finite-state techniques are probably the most prevalent approach employed by automatic morphology systems, as their simplicity and outstanding efficiency are unequaled. FSAs can be used to recognize particular patterns but do not, by themselves, allow for any analysis of word forms. Hence for morphology we use finite state transducers (FSTs), which allow the surface structure to be mapped into the list of morphemes. FSTs are useful for both analysis and generation, since the mapping is bidirectional [28].
2.7.3  Two Level Morphological Approach

The two-level morphology approach to morphological analysis is a language-independent general formalism for the analysis and generation of word forms [28]. Kimmo Koskenniemi introduced this approach in 1983. The generative phonology approach creates unnecessary intermediate levels and is also unidirectional. Koskenniemi eliminated the intermediate levels, which resulted in an approach with only two levels, the lexical level and the surface level, hence the name two-level morphology. This model has the added advantage of being bi-directional: both analysis and generation can be done with the same system, which was not possible with the earlier unidirectional approaches. Two-level morphology depends heavily on finite state methods, which are well known and often described as elegant [28]. The two-level approach has already been used successfully to develop comprehensive morphological analyzers for Swahili (a Bantu language), Amharic, Afan Oromo and Tigrinya. Table 2.16 shows the two-level representation of the Af-Somali words tagay (he went) and waddooyin (roads). The surface level is the inflected word form, and the lexical level defines the stem plus a set of morphological feature tags relating to the word.
Table 2.16: Example of Af-Somali two level representation

Level           Word       Word class   Inflectional type       Generated word
Lexical level   Tag (go)   Verb         past, 3rd.per.Sg.masc   tagay
Surface level   Tag        0            Ay, 0                   tagay
Lexical level   Waddo      Noun         Pl                      waddooyin
Surface level   Waddo      0            oyin                    waddooyin
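The bidirectional lexical/surface mapping of Table 2.16 can be sketched in Python as a small relation that is looked up in either direction. This is a hedged illustration, not a transducer: the pairs are listed by hand, the tag spellings are assumptions made for the example, and the 1st.Sg reading of tagay is added by analogy with cunay in Table 2.11 to show that one surface form may have several analyses.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hand-listed lexical/surface pairs; tag spellings are illustrative only.
PAIRS: List[Tuple[str, str]] = [
    ("tag+Verb+3rd.Sg.Masc+Past", "tagay"),
    ("tag+Verb+1st.Sg+Past", "tagay"),      # assumed by analogy with Table 2.11
    ("waddo+Noun+Pl", "waddooyin"),
]


def build_relation(pairs: List[Tuple[str, str]]):
    """Index the same set of pairs in both directions."""
    to_surface: Dict[str, Set[str]] = defaultdict(set)
    to_lexical: Dict[str, Set[str]] = defaultdict(set)
    for lexical, surface in pairs:
        to_surface[lexical].add(surface)
        to_lexical[surface].add(lexical)
    return to_surface, to_lexical


TO_SURFACE, TO_LEXICAL = build_relation(PAIRS)


def generate(lexical: str) -> List[str]:
    """Lexical string to surface form(s)."""
    return sorted(TO_SURFACE.get(lexical, ()))


def analyze(surface: str) -> List[str]:
    """Surface form to lexical string(s); may return several analyses."""
    return sorted(TO_LEXICAL.get(surface, ()))


if __name__ == "__main__":
    print(generate("waddo+Noun+Pl"))
    print(analyze("tagay"))
```

Both functions read the same stored pairs, which mirrors the point that analysis and generation use one underlying description.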

2.7.4  The Xerox Finite State Framework

Xerox research institute has developed a set of finite-state tools which provide a means of implementing two-level morphologies. The tools are language independent and have been used to implement morphologies for many major languages, such as English, Spanish, French, German and Arabic, as well as Afaraf, Afan Oromo, Amharic and others. Xerox finite state technology (XFST) is a programming language for regular expressions, which can be compiled into finite state networks; it is used here for the analysis of Af-Somali morphology. It comes bundled with a set of tools for compiling and working with FSTs. XFST includes two components, lexc and xfst. lexc is a compiler for lexicons written in the lexc language, which is specifically designed for handling morphotactics (the syntax of morphemes) in natural languages. xfst is the core tool; it provides an interface to the finite state calculus for building, accessing and manipulating finite state networks, and a compiler for regular expressions and replace rules.
Lexicon Compiler

Lexicon compiler (lexc) is the finite-state tool developed by Xerox for defining two-level lexicons. lexc is just one of several ways to specify finite-state transducers, but it is especially designed to facilitate the work of the lexicographer [28]. Lexicons and morphotactic information are encoded in the lexc language, which is a kind of right-recursive phrase-structure grammar, and are compiled into finite-state transducers as shown in Figure 2-1. Finite-state transducers (FSTs) are data structures that encode regular relations [28], which are mappings between two regular languages. For convenience, we can visualize a finite-state relation as having an upper-side regular language and a lower-side regular language, where each string in one language is related to one or more strings in the other. By convention, the upper-side or analysis strings of an FST compiled from a lexc description consist of underlying morphemes (strings of phonemes and morphophonemes) and multi-character symbol tags such as +Noun, +Verb, +Adj (adjective), +Conj (conjugation), +ImpeV (imperative verb), +Masc (masculine), +Fem (feminine), +Sg (singular) and +Pl (plural) that identify the morphemes [3]. lexc accepts a text file containing a user-defined lexicon encoded using the following syntax.
Lexical-item  Continuation-class;

The lexical item is usually the unmarked form of the word (the root or headword given in a dictionary). In the context of this work the lexical item is the stem (the root in most cases) to which inflectional affixes are attached, i.e. a free morpheme. The continuation class can be a pointer to another lexicon or the end-of-string marker. The example in Figure 2-1 shows two entries for ‘tag (go)’, one followed by the end-of-string marker ‘#’ and the other pointing to the continuation class for the past tense, where the aspect form of the word is defined.

 Figure 2-1: Example of two level representation of Af-Somali
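The continuation-class mechanism described above can be modeled in Python to show how entries chain together. This sketch is not lexc itself and does not use any Xerox tool: lexicons are plain dictionaries, each entry is an (upper, lower, continuation) triple, and the lexicon names and tag spellings are assumptions made for the example.

```python
from typing import Dict, Iterator, List, Tuple

Entry = Tuple[str, str, str]  # (upper/lexical side, lower/surface side, continuation)

# Two entries for "tag": one ends the word ("#"), one continues to PastTense.
LEXICONS: Dict[str, List[Entry]] = {
    "Root": [
        ("tag+Verb", "tag", "#"),
        ("tag+Verb", "tag", "PastTense"),
    ],
    "PastTense": [
        ("+Past", "ay", "#"),
    ],
}


def expand(name: str = "Root", upper: str = "", lower: str = "") -> Iterator[Tuple[str, str]]:
    """Follow continuation classes and yield every (lexical, surface) pair."""
    if name == "#":
        # End-of-string marker: the pair built so far is complete.
        yield upper, lower
        return
    for entry_upper, entry_lower, continuation in LEXICONS[name]:
        yield from expand(continuation, upper + entry_upper, lower + entry_lower)


if __name__ == "__main__":
    for lexical, surface in expand():
        print(f"{lexical}:{surface}")
```

Each yielded pair corresponds to one path through the lexicon network, with the lexical string on the left of the colon and the surface string on the right.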

We make use of the two-level representation to encode valuable morphological information about the words, as the above example shows. The symbols to the left of the colon represent the lexical level ‘Verb+tag’, and the symbols to the right of the colon represent the surface form ‘tag’.
Xerox Finite State Technology Interface

The xfst part of this framework is mainly concerned with realization, i.e. surface forms, and phonological alternation rules. This component takes as input the output of the lexc transducer (lexical grammar), which has stems with grammatical features labeled with tags, and passes it through additional rules to obtain the acceptable surface forms. The xfst component compiles the lexc grammar and the rules into FSTs from the lexc files and rule files respectively. Figure 2-2 illustrates the components of a morphological analyzer built with finite state transducers, where the .o. operator represents the composition operation.


 Figure 2-2: Creation of a lexical transducer
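The idea of composing a lexicon transducer with rule transducers can be sketched in Python with ordinary functions. This is a simplification under stated assumptions: each stage is a one-way function rather than an invertible transducer, `^` is a hypothetical morpheme-boundary symbol, and the two alternation rules (t becomes s after i, and y is inserted between i and a vowel) are inferred from the forms in Table 2.13, not taken from the rule files of this thesis.

```python
import re
from functools import reduce
from typing import Callable, Dict

Stage = Callable[[str], str]

# Hypothetical lexicon stage: tagged lexical string -> intermediate form.
LEXICON: Dict[str, str] = {
    "toosi+V+1st.Sg+Pres": "toosi^aa",
    "toosi+V+2nd.Sg+Pres": "toosi^t^aa",
    "cun+V+2nd.Sg+Pres": "cun^t^aa",
}


def lexicon(lexical: str) -> str:
    return LEXICON[lexical]


def t_to_s_after_i(form: str) -> str:
    """Assumed rule: the person marker t surfaces as s after stem-final i."""
    return re.sub(r"i\^t", "i^s", form)


def y_insertion(form: str) -> str:
    """Assumed rule: insert y between stem-final i and a vowel-initial suffix."""
    return re.sub(r"i\^(?=[aeiou])", "i^y", form)


def drop_boundaries(form: str) -> str:
    return form.replace("^", "")


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right, in the spirit of the .o. operator."""
    return lambda text: reduce(lambda acc, stage: stage(acc), stages, text)


generate = compose(lexicon, t_to_s_after_i, y_insertion, drop_boundaries)

if __name__ == "__main__":
    for lexical in LEXICON:
        print(lexical, "->", generate(lexical))
```

In a real finite-state toolkit the composed result is a single transducer that also runs in the analysis direction; the function chain here only shows the generation direction.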

2.8  Summary

In this chapter, we introduced background information on Af-Somali, its morphology and its main parts of speech. We also described finite state technology, which has been applied successfully to computational morphology. A regular expression denotes a regular language and can be compiled into a finite state network that encodes the same language. Complex finite state networks can be built from smaller networks using operations such as union, concatenation, composition, complementation, subtraction and intersection.


Chapter 3: Related Work
3.1  Introduction

In this chapter, we present systems developed for computational morphological analysis of different languages and look at the approaches used to develop them. Specifically, we look in detail at rule-based finite state approaches used for morphological analyzers of Ethiopian and Cushitic languages, which are related to Af-Somali. Creating an automatic morphological analyzer/generator is just one step in starting natural language processing for any language, but especially for minority, emerging or generally lesser-studied languages it is often a practical and extremely valuable first step, making use of corpora, lexicons, morphological grammars and phonological rules already produced by linguists and descriptive linguists [6].
3.2  Morphological Analyzer for European Languages

Cagri [17] developed TRmorph, a two-level morphological analyzer for Turkish. The system is implemented entirely with the freely available Stuttgart finite state transducer tools (SFST). As Cagri [17] presented, SFST is a freely available finite state tool set aimed particularly at implementing morphological analyzers. The tool uses a simple specification language based mainly on regular expressions, with the addition of the well-known two-level operators that are particularly useful for implementing phonological (or orthographic) alternations. TRmorph was analyzed and evaluated with real-world data during its development, and the system has been tested on two relatively large corpora, the METU corpus and Turkish Wikipedia. According to Cagri [17], the same process was repeated for words that were analyzed without errors but received ambiguous analyses.

Elaine [18] developed a morphological analyzer for the Irish language using a finite-state two-level description with the Xerox Finite-State Tools. The system encodes the inflectional morphology of all inflected parts of speech in modern Irish. The morphotactics of stems and affixes are encoded in the lexicon, and word mutations are implemented as a series of replace rules encoded as regular expressions. A major advantage that Elaine [18] obtains from finite-state two-level implementations of morphology is their inherent bi-directionality: the same system is used for both analysis and generation of word forms in the language. The system, designed for broad coverage of the language, is evaluated against the most frequently used words in a corpus of contemporary Irish texts. Finally, Elaine [18] suggests extending the system with derivational morphology and with dialectal or historical word forms, which were not implemented. In general, morphological analyzers can be used as a component in many NLP applications such as spelling checkers/correctors, stemmers and text-to-speech synthesizers [18].

In addition, Xuri [30] developed an English morphological analyzer using a machine learning approach. The system consists of two closely related components: morphological rule learning and morphological analysis. As Xuri [30] presented, unsupervised learning is employed to obtain a set of affix transformation rules, and the experiment shows that the analyzer has satisfactory performance. However, as stated in [30], problems remain, the most difficult being combinatory ambiguity. This shows that a larger context, such as part of speech or the context between words, is needed for a correct analysis of these words. Machine learning approaches therefore mostly require a corpus with a very large word list for training, and the resulting analysis does not exactly follow the linguistic rules of the language.
3.3  Morphological Analyzer for Asian Languages

Gulshat and Ilyas [19] developed a rule-based morphological analyzer and a morphological disambiguator for Kazakh, which is an agglutinative language. In the implementation of the analyzer, alternation and morphotactic rules are represented by two-level morphology rules, and the Foma finite state compiler is employed. As Gulshat and Ilyas [19] presented, the morphotactic rules and possible morphemes are defined in the lexicon file, the alternation rules are defined separately, and the rules are composed with the lexicon file in a Foma file. The system was tested and evaluated, and it represents early work on the development of a morphological analyzer for Kazakh. The system works in both directions, between the lexical and the surface level. Because of ambiguities in the language there is no one-to-one mapping between surface and lexical forms of words, so the system can produce more than one result.

Kenneth [20] developed a system for morphological analysis and generation of Arabic using the Xerox finite state transducer toolkit. Kenneth [20] described that lexicons and morphotactic information are encoded in the lexc language, which is a kind of right-recursive phrase-structure grammar, and are compiled into finite-state transducers. Alternation rules that perform deletion, epenthesis, assimilation and metathesis are written in the twolc language and/or in a notation known as replace rules. The system, which contains about 4930 roots, was tested and evaluated with encouraging performance. For any language, having a morphological analyzer is one step forward in language technology.
3.4  Morphological Analyzer for Ethiopian Languages

Michael [22] developed HornMorpho, a morphological analyzer for three Ethiopian languages: Amharic, Afaan Oromo and Tigrinya. The system uses finite state transducers integrated with the Python programming language, with a separate finite state transducer for each language. In addition, the system was evaluated with a web crawler developed by Biniam Gebremicheal and Michael Gasser [22]. Although more testing is called for, this evaluation suggests excellent coverage of Amharic and Tigrinya verbs for which the roots are known. Although Oromo, a Cushitic language, does not exhibit the root+template morphology that is typical of Semitic languages, it is also convenient to handle its morphology using the same technique, because there are some long-distance dependencies and because it is useful to have the grammatical output that this approach yields for analysis. For Amharic, the system is apparently able to analyze at least the great majority of nouns and adjectives. The system treats all Amharic words other than verbs, nouns and adjectives as unanalyzed lexemes. However, the tool is less suited to Afaan Oromo, because analysis is complicated by the great variation in the use of double consonants and vowels by Oromo writers [22].


The language most closely related to this work is Afaraf, for which Ali Mohamed [2] developed the first morphological analyzer using a finite state transducer. For the evaluation, Ali manually annotated 312 tokens from three popular Afar magazines published in Ethiopia and Djibouti: 200 verbal words (100 consonant-initial and 100 vowel-initial), 80 nominal words and 32 adjectival words. Of these, 192 verbal, 75 nominal and 28 adjectival words were correctly analyzed, and the results were evaluated by a human reader familiar with the language. An output was considered correct only if it found all legal combinations of roots and grammatical structure for a given word form and included no incorrect roots or structures [2].
3.5  Summary

Limited research has been conducted on morphological analyzers for Cushitic languages such as Afaan Oromo [22] and Afaraf [2], and the analyzers for both languages use a rule-based approach with finite state transducers. To the best of our knowledge, no research has been conducted so far on automatic morphological analysis of Af-Somali. The absence of morphological analysis systems limits efforts to make computers work comfortably with Somali.


Chapter 4: Design of Af-Somali Morphological Analyzer
4.1  Introduction

This chapter presents the design of the Af-Somali morphological categories and phonological rules as a computational model built with the Xerox finite state toolkit. It presents the general architecture of the lexical FSTs for Af-Somali morphological analysis and the morphotactics of the language, that is, how the morphemes co-occur. It also shows the morphotactics for each word class separately in the lexc formalism and the alternation rules written with the xfst interface. The main objective in the design of the morphological analyzer is to construct a network which accepts all and only the valid Somali words and delivers the right analysis. In this chapter we therefore give a detailed overview of the design of the morphological analyzer and its components.
4.2  General Architecture of Af-Somali Morphological Analyzer

The construction of the morphological analyzer using finite state transducers is broken down into two large components: the lexicon/morphotactics part and the phonological or alternation rules part. The morphotactics of the language, which describe what stems and affixes can co-occur and in what order, are captured in the lexicon. Phonological and morphophonological alternations between underlying forms and surface spoken or written forms are implemented using alternation rules. A word to be analyzed follows the path lexicon→morphotactic rules→alternation rules→surface. Before the result of the analysis appears at the surface, the word follows the lexicon path to determine its actual morphemes. After leaving the lexicon, the word is processed by the morphotactic and morphophonemic rules. Only after this processing is finished is the result of the morphological analysis for that word delivered, as shown in Figure 4-1.


 

 Figure 4-1: Af-Somali morphological analyzer architecture design
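The two-stage path described above (lexicon first, then alternation rules) can be sketched in a few lines of Python. This is only an illustration and not the xfst implementation used in the thesis: the dictionary `LEXICON`, the function names and the exact wording of the ee/eyn rule are assumptions made for this example.

```python
import re

# Toy lexicon (illustrative only): lexical-level analysis -> intermediate form,
# with ^ marking morpheme boundaries.
LEXICON = {
    "cadd+V+imp+inf": "cadd^ee^eyn",
    "toos+V+paste": "toos^ay",
}


def apply_rules(intermediate):
    """Apply the ordered alternation rules to an intermediate form."""
    # Rule 1 (assumed formulation): the double vowel ee is dropped before eyn.
    form = re.sub(r"ee\^eyn$", "eyn", intermediate)
    # Rule 2: remove the ^ boundary symbols to obtain the surface form.
    return form.replace("^", "")


def generate(analysis):
    """Lexical level -> surface level (lexicon first, then rules)."""
    return apply_rules(LEXICON[analysis])


def analyze(surface):
    """Surface level -> every lexical-level analysis that generates it."""
    return [a for a in LEXICON if generate(a) == surface]


print(generate("cadd+V+imp+inf"))
print(analyze("toosay"))
```

A real transducer is inverted directly rather than by enumerating the lexicon; the enumeration in `analyze` only imitates that bidirectional behaviour on a toy scale.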

Other common applications of finite-state techniques include guessers, which handle words whose roots or stems are not found in the lexicon. In a guesser, the lexical component is replaced by a phonotactic component that characterizes the possible shapes of roots or stems. A guesser is needed because not all words can be collected in the lexicon, or because collecting them is too time consuming.
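As a rough illustration of what a phonotactic component might look like, the sketch below accepts any string that can be split into simple syllables. The consonant and vowel inventories, the diphthongs and the (C)V(C) syllable template are simplifying assumptions made for this example; they are not taken from the thesis and do not describe Af-Somali phonotactics completely.

```python
import re

# Assumed, simplified inventories; digraphs come first so they are tried first.
CONSONANT = r"(?:kh|sh|dh|[btjxdrsgcfqklmnwhy'])"
VOWEL = r"(?:aa|ee|ii|oo|uu|ay|aw|ey|oy|ow|[aeiou])"

# Assumed syllable template (C)V(C); a long vowel or diphthong counts as one V.
SYLLABLE = rf"{CONSONANT}?{VOWEL}{CONSONANT}?"
ROOT_SHAPE = re.compile(rf"(?:{SYLLABLE})+")


def looks_like_root(word):
    """Return True if the word has a possible root shape under the template."""
    return ROOT_SHAPE.fullmatch(word.lower()) is not None


for candidate in ["dawlad", "qoys", "magac", "strk"]:
    print(candidate, looks_like_root(candidate))
```

In a full guesser this shape filter would stand in for the root lexicon, while the suffix lexicons and alternation rules stay unchanged.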


 

4.2.1  Lexicon/ Morph-tactics

The design of the tags is very important in the development of morphological analyzers, since the tags deliver the linguistic information carried by the word being analyzed. The morphological analyses of Somali word forms are presented in this system in terms of the symbols listed in Table 4.1.

Table 4.1: Tags of AF-Somali grammatical information

No.  Grammatical information  Tags
1    POS                      +N (noun), +V (verb), +Adj (adjective)
2    Number                   +Sg (singular), +pl (plural)
3    Definiteness             +def (definite), +indef (indefinite)
4    Gender                   +fem (feminine), +masc (masculine)
5    Tenses                   +pres (present tense), +paste (past tense), +pres.conti (present continuous tense), +paste.conti (past continuous tense)
6    Imperative               +imp (imperative)
7    Demonstratives           +close, +far, +near
8    Possessives              1st.Sg, 2nd.Sg, 3rd.masc, 3rd.fem, 1st.pl, 2nd.pl, 3rd.pl
9    Interrogatives           +inter (interrogative)
10   Infinitive               +inf (infinitive)

After various affixes in the morphology were identified, the order in which these affixes are attached to the verbal, nominal, adjectival stem was determined in the lexicon database.


 

The lexicon component is a transducer that accepts as input only valid Somali stems/roots followed by a legal sequence of tags, and produces as output an intermediate form in which the tags are replaced by the morphemes they correspond to. Within the lexicon, stems are assigned to separate classes depending on the inflection they require. Each stem class has an associated continuation class where morphological tags and affixes are concatenated to the stem. Internal modifications (ablaut) to stems are also implemented in the lexicon. The lexicon transducer is written in a formalism called lexc, which is well suited to lexicon construction and to expressing morph-tactics. For example, in the analyzer about to be constructed, the lexicon component FST performs the mappings shown in Table 4.2.

Table 4.2: Mappings of root words and their morphemes

               Word     Verb  Imperative  Infinitive
Lexical level  Caddeyn  +V    +imp        +inf
Surface level  caddeyn  +0    ee          eyn
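The mapping in Table 4.2 can be pictured as a lookup that replaces each tag with its morpheme and joins the pieces with the ^ boundary symbol. The sketch below is a plain Python illustration, not lexc: the names `TAG_TO_MORPHEME`, `VALID_STEMS` and `lexical_to_intermediate` are hypothetical, and the stem cadd with the markers ee and eyn is taken from the example discussed in this chapter.

```python
# Tag -> morpheme table for the toy example; "" stands for the zero morpheme.
TAG_TO_MORPHEME = {"+V": "", "+imp": "ee", "+inf": "eyn"}

# Only stems listed here are accepted, as in a real root lexicon.
VALID_STEMS = {"cadd"}


def lexical_to_intermediate(stem, tags):
    """Replace each tag by its morpheme and join morphemes with ^."""
    if stem not in VALID_STEMS:
        raise ValueError(f"unknown stem: {stem}")
    morphemes = [stem]
    for tag in tags:
        if tag not in TAG_TO_MORPHEME:
            raise ValueError(f"unknown tag: {tag}")
        morpheme = TAG_TO_MORPHEME[tag]
        if morpheme:  # zero morphemes add nothing to the intermediate form
            morphemes.append(morpheme)
    return "^".join(morphemes)


print(lexical_to_intermediate("cadd", ["+V", "+imp", "+inf"]))
```

The sketch does not check the order of the tags; in lexc that check comes from the continuation classes.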

All root words and morph-tactic rules were entered into the lexicon database, and all spelling rules were entered into the rules database. Separate FSTs were created for the lexicon and for the rules, and these were then combined into one big FST by applying the FST composition operation. For each word class we therefore created a separate lexicon and a separate set of alternation rules, as described in the following sections.

4.2.2  Alternation Rules

Having accomplished the first part of the grammar construction, we now turn to the alternation rules component. The idea is to construct a set of ordered rule transducers that modify the intermediate forms output by the lexicon component. At the very least, we need to remove the ^ symbol, which is used to mark morpheme boundaries, before we produce valid surface forms. The role of the alternation rules is to modify the output of the lexicon transducer according to phonological and morpho-phonological rules. In the example in Table 4.2, the root cadd of the verb3 word class is concatenated with the imperative marker ee and the infinitive marker eyn to give caddeyn (clarifying). However, when the infinitive marker eyn is suffixed to a double vowel (ee), the last e of the double vowel is replaced with the character y. One way to form the correct verb3 word is to always represent the infinitive suffix as the morpheme eyn, as we have done, and then subject these word forms to alternation rules that eliminate the final double vowel and add only the infinitive suffix. This, among other things, is the task of the alternation rules component: to produce the valid surface forms from the intermediate forms output by the lexicon transducer. Since alternation rule FSTs that are conditioned by their environment are very difficult to construct by hand, we use the replace rules formalism in xfst to compile the necessary rules into FSTs. The rule FSTs are combined with the lexicon FST by the regular expression composition operator (.o.). Somali has several phonological alternations involving reduplication, lenition, vowel harmony and tone. The following sections describe the design of the alternation rules with examples.

4.3  The Design of AF-Somali Part-Of-Speech Lexicon and Alternation Rules

As described in Chapter 2, a number of approaches have been used to develop morphological analyzers for many languages. For this thesis work we have chosen the rule-based approach, using finite state transducer technology with the Xerox finite state toolkit. As mentioned in the previous section, the rule-based approach requires two components: the lexicon and the alternation rules of the language. For the development of the Af-Somali morphological analyzer we have therefore created a separate lexicon for the morph-tactics of each of the most important Af-Somali parts of speech (verbs, nouns and adjectives), and the rules are captured with the xfst tool.

4.3.1  AF-Somali Verb Lexicon Design

Verbs in Af-Somali express actions and states. They agree in person, number and gender. We classified the verbs into five interrelated groups based on their imperative markers. Their representation and encoding in the finite state transducer lexc formalism, written in Notepad, is shown in Figure 4-2.

 

As mentioned above, in the development of the Af-Somali verb lexicon we classified the verbs into five groups known as V1, V2, V3, V4 and V5; the V1 verbs are illustrated in Figure 4-2. The figure also shows that there is a lexicon called verbs which contains the five sub-lexicons v1, v2, v3, v4 and v5, each of which continues to a sub-lexicon called v_suffixing. The detailed description of the lexicon is found in Appendix-B. The v_suffixing sub-lexicon contains all the suffixes attached to the root verbs and is defined as a separate lexicon, as shown in Figure 4-2. In this lexicon we have presented the morphemes that go with the root verbs and the order in which they co-occur with the verbs.

 Figure 4-2: Af-Somali verb lexicon
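How a root lexicon hands over to a suffix sub-lexicon can be imitated with a small recursive walk over continuation classes. The entries below (kaamil, qaad, +V1+paste:ay, +V1+pres:aa) are copied from Appendix-B; the dictionary layout and the function `expand` are assumptions made for this sketch and are not part of lexc.

```python
# Each lexicon is a list of (upper, lower, continuation) triples.
# "#" is the end-of-word marker, as in the lexc entries of Appendix-B.
LEXICONS = {
    "Root": [("", "", "Verb")],
    "Verb": [
        ("kaamil", "kaamil", "V1suffixing"),
        ("qaad", "qaad", "V1suffixing"),
    ],
    "V1suffixing": [
        ("+V1+paste", "ay", "#"),
        ("+V1+pres", "aa", "#"),
    ],
}


def expand(lexicon="Root", upper="", lower=""):
    """Yield (lexical, surface) pairs by following continuation classes."""
    for up, low, continuation in LEXICONS[lexicon]:
        if continuation == "#":
            yield upper + up, lower + low
        else:
            yield from expand(continuation, upper + up, lower + low)


for lexical, surface in expand():
    print(lexical, "->", surface)
```

Because the walk only concatenates, any alternation at the morpheme boundary still has to be handled by the rule component.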

In addition to this, the development of a morphological analyzer requires building finite state networks which show how the morphemes and the root word can co-occur. We have therefore presented the Af-Somali verb finite state networks, which show the morphemes, the root verb and their order, in Figure 4-3. In this process the states are described following the conventions of the Xerox finite state tools, starting from the root verb until the word ends. In Figure 4-3, the circles represent states, the arrows carry the tags, and a double circle indicates a final state.

 Figure 4-3: Af-Somali verbs finite state networks


 

Generally, we have described the root/stem lexicon and its morphotactics with examples, as shown in Table 4.3. The morphotactics of the Af-Somali second group verbs (V2) are illustrated in Figure 4-4, which presents the finite state network for the example verbs “toosi” and “caddee” and shows how verbs of the second and third groups are generated and the order in which their morphemes co-occur.

Table 4.3: An example of Af-Somali verb morphotactics

Lexical level  Toos  +V  +imp  +Sg  +inf  +pers  +paste  The word
Surface level  toos  0   i     0    in    0      ay      toosinay

 

 Figure 4-4: Example representation of Af-Somali second and third group verb FSN

4.3.2  Alternation Rules of AF-Somali Verbs

Af-Somali has a number of morpho-phonemic alternations that a morphological analyzer has to consider. These alternations depend on the phonological context, where the features of individual morphemes in the context affect the process. The alternation rules of Af-Somali are defined in an xfst file, where they are composed with the lexicon file. Af-Somali has several phonological alternations involving reduplication, lenition, vowel harmony and tone.

 

In order to construct a finite state transducer for the alternation rules, we first defined the Af-Somali alphabet: ‘, b, t, j, x, kh, d, r, z, sh, q, k, l, m, n, w, h, y, (‘, B, T, J, X, KH, D, R, S, SH, DH, C, G, F, Q, K, L, M, N, W, H, Y and the five vowels a, e, i, o, u. Af-Somali also has five long vowels, which are aa, ee, ii, oo, uu. Some vowels in certain words are dropped if a suffix starting with a vowel is attached. The detailed description of the Af-Somali alternations is presented in Appendix-A.

 Figure 4-5: Af-Somali verbs alternation rules

For example, caddee is an imperative verb, and if we suffix it with the infinitive eyn, one of the last two e's of the imperative is replaced with y, as shown in Figure 4-5.

Table 4.4: Realization of l as sh when it is suffixed with t

The root  English  Person  Past  The verb  Alternation
Maqal     Listen   t       ay    Maqashay  l->sh
Hadal     Talk     t       ay    Hadashay  l->sh
Dil       Kill     t       ay    Dishay    l->sh
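The l->sh alternation in Table 4.4 can be written as a single context-dependent rewrite on the intermediate form. The sketch below uses a Python regular expression in place of an xfst replace rule; the function name `realize` and the use of ^ between all three morphemes are assumptions made for this illustration.

```python
import re


def realize(stem, person, tense):
    """Build stem^person^tense, apply l + t -> sh, then drop the boundaries."""
    intermediate = f"{stem}^{person}^{tense}"
    # Stem-final l followed by the person marker t is realized as sh.
    form = re.sub(r"l\^t", "sh", intermediate)
    # Remove the remaining ^ boundary symbols to obtain the surface form.
    return form.replace("^", "")


for root in ["maqal", "hadal", "dil"]:
    print(root, "->", realize(root, "t", "ay"))
```

The rule only fires across the morpheme boundary, so an l followed by t inside a stem is left untouched.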

Partial ablaut occurs in verbal infinitives with almost any word of the pattern CaC: when the infinitive ending is appended, the stem vowel is raised. It also occurs around person suffixes and tense endings. For example, tag ‘go’ takes the infinitive marker ‘i’ and becomes tagi ‘to go’, but when we add the 2nd.PL.paste tense marker ‘een’ the verb becomes tageen ‘they went’, which means that i is replaced with e. In Af-Somali verbs we also have to consider the replacement of l with sh: when the 3rd.Sg.fem marker t is added to a verb ending in l, the sequence is realized as sh, as represented in Figure 4-6 and exemplified in Table 4.4.

 Figure 4-6: Alternation rule representation with xfst

The realization of the personal suffixes on verbs is somewhat complex and depends mostly on the declension type and on whether or not the suffix is preceded by the progressive. The realization of these suffixes is currently handled entirely in xfst, as shown in Figure 4-7.


 

 Figure 4-7: person morpheme realization

4.3.3  Noun Lexicon Design

Nouns in Somali denote things, and we have developed a separate lexicon for them, Nounlex.lexc, in the lexc formalism. Nouns have separate paradigms depending on their morpho-phonological properties and are split up into subgroups which correspond to pluralization patterns. Hence the Af-Somali noun lexicon in this study is classified into seven declensions based on pluralization pattern. Nominals marked for gender undergo gender polarity changes in the plural. We want to mark +Masc and +Fem so that disambiguation is easier, but the gender of the lemma is not predictable from a given plural form. To solve this, we created a lexicon database which records the gender of each noun. Nominals are also affixed with the demonstrative markers aas, eer and an. We have therefore defined a root lexicon known as noun, which in turn contains seven sub-lexicons, one for each declension, and these are suffixed with the morphemes of the Af-Somali nouns, as shown for the first declension in Figure 4-8.


 

 Figure 4-8: Af-Somali noun lexicon

In addition to this, there is a separate lexicon which includes the suffix tags and the order in which these suffixes co-occur with the root nouns, as illustrated in Figure 4-9. The general co-occurrence of the root noun with the morphemes is shown using finite state networks, which show the states through which the transducer passes. The figure shows only the first declension, known as D1_f, whose members are feminine nouns; the detailed description of the noun lexicon is given in Appendix-C.

 Figure 4-9: Af-Somali noun suffixes


 

In general, the morphemes attached to the root nouns are number (Sg, Pl), definiteness (def, indef), interrogatives (inter), possessives and demonstratives, as presented in Figure 4-10, which shows the finite state network of the Af-Somali nouns.

 Figure 4-10: Af-Somali noun finite state networks
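The ordering constraint that such a network encodes can be imitated with an ordered list of slots, where each tag must be accepted by the current slot or a later one. The morphemes for +Pl, +def and +inter follow the declension 2 example of Table 4.5; the slot layout, the zero realization of +Sg and the function name `noun_intermediate` are assumptions made for this sketch.

```python
# Ordered suffix slots of the toy network (number, definiteness, interrogative).
SLOTS = [
    {"+Sg": "", "+Pl": "yo"},  # "+Sg" realized as zero is an assumption
    {"+def": "ha"},
    {"+inter": "ee"},
]


def noun_intermediate(stem, tags):
    """Return the intermediate form, rejecting tags that are out of order."""
    morphemes, slot = [stem], 0
    for tag in tags:
        # Skip optional slots until one accepts the tag.
        while slot < len(SLOTS) and tag not in SLOTS[slot]:
            slot += 1
        if slot == len(SLOTS):
            raise ValueError(f"tag {tag} is unknown or out of order")
        if SLOTS[slot][tag]:
            morphemes.append(SLOTS[slot][tag])
        slot += 1
    return "^".join(morphemes)


print(noun_intermediate("mindi", ["+Pl", "+def", "+inter"]))
```

The output is an intermediate form only; according to Table 4.5 the surface form is mindiyahee, so the noun alternation rules still have to adjust the vowels at the boundaries.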

For example, the morph-tactics of an Af-Somali feminine noun of declension 2, as found in the finite state networks above, are described in Table 4.5.

Table 4.5: Example of noun declension 2 morphotactics

Lexical level  Mindi  D2_F  +Pl  +def  +inter  The noun
Surface level  mindi  0     yo   ha    ee      mindiyahee

4.3.4  Alternation Rules of AF-Somali Nouns

Generally, to develop and use a lexicon and alternation rules with the Xerox finite state toolkit, we have to define the characters used in the language. We therefore defined the variables of Af-Somali and the rules used to implement the transducer. In declension 5, some consonants are doubled when the noun is made plural; this process is captured by the alternation rule component, as shown in Figure 4-11, and the detailed description of the alternation rules is presented in Appendix-A.

 Figure 4-11: Af-Somali noun alternation rules

The other rule that has to be considered in xfst is the deletion of a suffix-initial consonant when it follows a back consonant (other than itself). For example, the Af-Somali noun magac shows this property when it is suffixed with the definite marker ‘ka. AF-Somali has two kinds of reduplication: partial and full. Reduplication is typically a strategy for marking the plural of nouns and adjectives in some declensions, but it also appears in verbs as a derivational process. The inflectional processes are quite productive, but the derivational processes are not as productive. Partial reduplication occurs in the 4th declension of nouns, but a subtype of these 4th declension nouns also has full reduplication. Partial reduplication includes epenthesis of a vowel, and in nouns it is suffixing. The template is also slightly different. This is illustrated with an example in Table 4.6.

Table 4.6: Partial reduplication of nouns

Root noun  English          Suffixing  Number  The noun
Af         Mouth, language  Af         Pl      Afaf
Qoys       Family           As         Pl      Qoysas
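Judging from the two rows of Table 4.6, the suffixing partial reduplication can be read as: append the vowel a, then a copy of the stem-final consonant. The sketch below implements exactly that reading; it is an assumption drawn from these two examples and has not been checked against other 4th declension nouns.

```python
def partial_reduplication_plural(stem):
    """Append an epenthetic a plus a copy of the stem-final consonant."""
    if not stem or stem[-1] in "aeiou":
        raise ValueError("expected a stem ending in a consonant")
    return stem + "a" + stem[-1]


for noun in ["af", "qoys"]:
    print(noun, "->", partial_reduplication_plural(noun))
```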

The consonant doubling of declension 5 is illustrated with an example in Table 4.7.

Table 4.7: The alternation of declension 5 representation

Noun    Declension  Plural marker  The rule
Sacab   Dec-5       CCo            sacabbo
Dameer  Dec-5       CCo            Dameerro
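The CCo pattern of Table 4.7 amounts to doubling the stem-final consonant and adding o. A minimal sketch, with the hypothetical function name `declension5_plural`:

```python
def declension5_plural(stem):
    """Double the stem-final consonant and add the plural vowel o (CCo)."""
    if not stem or stem[-1] in "aeiou":
        raise ValueError("expected a stem ending in a consonant")
    return stem + stem[-1] + "o"


for noun in ["sacab", "dameer"]:
    print(noun, "->", declension5_plural(noun))
```

The text above says that only some consonants are doubled in declension 5; the sketch doubles any final consonant and would need the list of geminating consonants from Appendix-A to be faithful.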

4.3.5  Adjectives Lexicon Design

The Af-Somali adjective is formed by an adjectival root and the inflected forms of the reduced paradigm of the verb yahay ‘to be’. A reduced paradigm is characterized by reduced distinctions in subject marking.

 Figure 4-12: Af-Somali adjective lexicon


 

Reduced present forms are identical to the root, whereas past forms display distinct inflectional endings. As shown in Figure 4-12, Af-Somali adjectives are few in number; we defined a root lexicon known as adjectives and a sub-lexicon known as Ad_suffix, which lists the suffixes attached to the adjectives, using the lexc formalism. Af-Somali adjectives inflect with person markers and tenses, which must agree in number, as shown with an example in Table 4.8.

Table 4.8: Example of adjective morphotactics

adjective  1st.Sg  1st.Pl  2nd.masc  2nd.fem  pres        paste  The word
fiican     0       N       Y         T        Ahay/ihiin  0/een  fiicanahay
fiican     0       N       Y         T        Ahay/ihiin  0/een  Fiicantahay
Fiican     0       N       Y         T        Ahay/ihiin  0/een  Fiicannahay
Fiican     0       N       Y         T        Ahay/ihiin  0/een  fiicanyihiin

In addition to this, the morphotactic representation of adjectives is presented in Figure 4-13, which shows the order in which the suffixes attach to the adjectives.

 Figure 4-13: Af-Somali Adjective finite state networks

 

Chapter 5 : Experimentation and Evaluation

5.1  Introduction

This chapter discusses the testing and evaluation conducted on the Af-Somali morphological analyzer. The discussion emphasizes the outputs produced and the test results found. The testing of any sizable natural-language processing system is notoriously difficult [8], yet a morphological analyzer is an essential and basic tool for building any language processing application for a natural language, e.g., a machine translation system.

5.2  Experimentation

We developed the morphological analyzer using the XFST tool developed by Xerox. It supports UTF-8 character encoding, which is important for the implementation of Af-Somali computational morphology. The tool is based on a lexicon and a set of rules for roots and morphemes. The lexicon contains the list of root words, each followed by its category, separated by a tab. The analyzer fails when it is given a complex word whose root does not exist in the lexicon file. We have developed the Af-Somali lexicon and the rules file required for analysis. The lexicon is designed to reflect the word categories of the Af-Somali language. It contains different states for each of the root words, starting with the declaration of the tags. For example, the verb lexicon is illustrated in Figure 4-2. Each entry, consisting of a root word and its category, is terminated by a semicolon, as shown for Af-Somali verbs in Figure 5-1. The left side of the colon represents the upper side, or analysis form, of the transducer, and the right side represents the lower side, or surface form, as presented in Appendix-B. The hash symbol at the end of an entry indicates the end of the transition, so that state is a final state. The analyzer takes the surface form as input and produces the grammatical structure of the word, that is, the lexical form, as the result.
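The entry layout described above (upper side, colon, lower side, continuation class or #, semicolon) can be made concrete with a small parser. The function `parse_entry` is hypothetical and only handles the simple one-line entries of the kind listed in Appendix-B; it is not part of the Xerox tools.

```python
def parse_entry(line):
    """Split 'upper:lower Continuation ;' into (upper, lower, continuation)."""
    body = line.strip().rstrip(";").strip()
    # The continuation class (or # for end of word) is the last field.
    pair, continuation = body.rsplit(None, 1)
    upper, colon, lower = pair.partition(":")
    if not colon:
        lower = upper  # no colon: the same string on both sides
    if lower == "0":
        lower = ""  # 0 denotes the empty (zero) surface string
    return upper, lower, continuation


for entry in ["kaamil\tV1suffixing;", "+V1+paste:ay #;", "+V1+Sg+1P:0 #;"]:
    print(parse_entry(entry))
```

An entry whose continuation is # ends the word, which corresponds to the final state mentioned above.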


 

 Figure 5-1: AF-Somali Verb to suffix attachment

5.3  Discussion and Evaluation

Evaluation of a morphological analyzer can be performed using a reliable broad-coverage morphological analyzer, or by having human experts annotate a text manually. The former option was not possible, as no such tool has been developed for Af-Somali. The latter option is laborious and can be done only on relatively small texts. In general, evaluating a morphological analyzer requires measuring the total number of word tokens correctly accepted by the analyzer versus the number of words incorrectly processed, and the percentage of tokens that are correctly analyzed in context versus the percentage of tokens that are not analyzed at all. We also have to know the percentage of tokens that are analyzed wrongly from a linguistic point of view, regardless of context. Finally, the number of correct analyses that were not output for a token is calculated. We therefore manually annotated 218 tokens (90 nouns, 120 verbs and 8 adjectives) taken from the book known as Qaamuus. Of these, 77 nominals, 105 verbs and 6 adjectives were correctly analyzed. The results were evaluated by a human reader familiar with the language. An output was considered correct only if it contained all legal combinations of roots and grammatical structure for a given word form and included no incorrect roots or structures. The overall accuracy of the system is thus 86.2%, as shown in Table 5.1.

Table 5.1: Overall accuracy of the system

Category   Tokens  Correct %
Nominal    90      85.55%
Verb       120     87.5%
Adjective  8       75%

Total  Total correct  Total wrong  Correct %  Wrong %
218    188            30           86.2%      13.76%
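The percentages in Table 5.1 follow directly from the counts reported in this section (77 of 90 nominals, 105 of 120 verbs and 6 of 8 adjectives correctly analyzed). The short script below recomputes them; the variable names are chosen for this illustration only.

```python
# (correctly analyzed, annotated) token counts per category, from Section 5.3.
COUNTS = {"Nominal": (77, 90), "Verb": (105, 120), "Adjective": (6, 8)}


def percentage(part, whole):
    """Return part / whole as a percentage."""
    return 100.0 * part / whole


per_category = {
    name: round(percentage(correct, total), 2)
    for name, (correct, total) in COUNTS.items()
}
total_correct = sum(correct for correct, _ in COUNTS.values())
total_tokens = sum(total for _, total in COUNTS.values())
overall_correct = round(percentage(total_correct, total_tokens), 2)
overall_wrong = round(percentage(total_tokens - total_correct, total_tokens), 2)

print(per_category)
print(total_correct, total_tokens, overall_correct, overall_wrong)
```

With these counts the overall figures come out at 188 correct tokens out of 218.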

From this we can see that the total number of tokens analyzed was 218, of which 86.2% were correctly analyzed and 13.76% were wrongly analyzed; in total, 10 tokens failed to be analyzed by the system. Lastly, we observed that errors arose because of the limited size of the lexicon we annotated, and because we have not incorporated a guesser component, which helps to guess words that are not found in the lexicon. In addition, Af-Somali authors spell words in different ways, which leads to one word being analyzed in different ways. For example, some writers write the word Dawlad while others write Dowlad (government).


 

Chapter 6 : Conclusion and Future Work

6.1  Conclusion

Language is one of the main tools for communication. Thus, its investigation provides better perspectives on all other aspects related to NLP. However, the formalization and computational analysis of Af-Somali morphology had not been worked out. In other words, there is a lack of tools for the analysis of Af-Somali morphology from a computational point of view. Moreover, grammar resources vary from scholar to scholar. For example, some resources treat adjectives as verbs, whereas others describe adjectives as a separate word class. To summarize, building a correctly working system of morphological analysis by combining all this information is valuable for further research on the language. In this thesis, a detailed analysis of Af-Somali has been performed, and the formalization of rules over all the morphotactics of Af-Somali has been worked out. By combining all the information gained, a morphological analyzer has been constructed.

This thesis reports on an attempt to develop an Af-Somali morphological analysis system using the finite state two-level approach. The report started with a brief introduction to the concepts and principles used in the study. The introduction also described morphological analysis and the unique features of Af-Somali words along with their peculiar morphemic components. The different subcategories of rule-based approaches were described briefly. In this study, the finite state two-level approach was adopted. The finite state transducer is the main tool for the development of the morphological analyzer, and the implementation has been based on [8]. Two-level morphology is proving to be very well suited to Af-Somali morphology. A major advantage of finite state two-level implementations of morphology is their inherent bi-directionality: the same system is used for both analysis and generation of word forms in the language. An additional advantage is the high efficiency of finite-state networks, which allows even large words to be processed within a few seconds.

We presented the design and implementation of the analyzed categories as a finite state transducer using the Xerox Finite State Toolkit in Chapter 4. First, all forms of verbs, nouns and adjectives were implemented in separate lexc files. The rules identified were implemented in the respective xfst files. The finite state transducer of each category and the finite state transducer of the rules for that category were composed separately. All the finite state transducers were then composed together, resulting in a single lexicon finite state transducer which can be used as a morphological analyzer and generator.

However, the study was carried out under a number of constraints. The main challenge was to work out the linguistic details, especially the exact morphotactic details needed for analysis and generation. The lack of linguistic lexical resources, in particular of a word list for Af-Somali in electronic form, made the work demanding. It was also difficult to find the morphological rules that were used in the system.

6.2  Future Work

The morphological analyzer/generator can be useful for linguists who wish to understand the morphological processes of Af-Somali, as well as for language learners, to aid their language comprehension and their practice of word conjugation or declension. The main weakness of the system results from the limited number of roots and stems in the lexicon; it can be improved by increasing the number of stems and phonological alternation rules and by incorporating a guesser component. As this work deals only with inflectional morphology and the northern Somali dialect, there is a need to extend the system to include derivational and compounding morphology as well as the Benaadir and Maay dialects of Af-Somali. Finally, it is worth noting that once SoMorph completely describes Af-Somali morphology, it will be a useful tool for large-scale NLP applications such as machine translation and POS checkers.


 

References

[1]  Annarita Puglielli iyo Cabdalla Cumar Mansuur, “QAAMUUSKA AF‒SOOMAALIGA”, diyaarintii Roma TRE-PRESS, 2012.
[2]  Ali Mohamed, “Development of morphological analyzer for afaraf”, M.Sc. Thesis, Debra Birhan University, 2014.
[3]  Andrzejewski, B. W., The Declensions of Somali Nouns, London: School of Oriental and African Studies, 1964.
[4]  Banti, G., ‘Two Cushitic Systems: Somali and Oromo Nouns’, in H., 1988.
[5]  Bati, T. B., Automatic Morphological Analyzer for Amharic, 2002.
[6]  Beesley K. R., Morphological Analysis and Generation: A First-Step in Natural Language Processing, 2004, p. 1.
[7]  Elaine Uí Dhonnchadha, A Two-level Morphological Analyser and Generator for Irish using Finite-State Transducers, Institiúid Teangeolaíochta Éireann, 31 Plás Mhic Liam, Baile Átha Cliath 2, Éire, and Dublin City University, Glasnevin, Dublin 11, Ireland.
[8]  Fissaha and Haller, “First larger-scale morphological analyzer for Amharic verbs used XFST”, the Xerox Finite State Tools, 2003.
[9]  Jackson Muhirwe, Computational Analysis of Kinyarwanda Morphology: The Morphological Alternations. Advances in Systems Modelling and ICT Applications.
[10]  Jurafsky Daniel and James Martin, Speech and Language Processing, Prentice-Hall, 2000 (referenced as J&M throughout this handout).
[11]  Karttunen, Lauri, Kaplan & Zaenen, Two-level morphology with composition, 1992.
[12]  Kenneth Beesley and Lauri Karttunen, Finite State Morphology, CSLI Studies in Computational Linguistics, 2003.
[13]  Kenneth Beesley, Finite-State Morphological Analysis and Generation of Arabic at Xerox Research, Status and Plans in 2001, Xerox Research Centre Europe.
[14]  Kenneth Beesley, Finite state morphology / Kenneth Beesley and Lauri Karttunen, p. cm. - (Studies in computational linguistics; 3), 1954.
[15]  Koskenniemi, K., Two-level morphology: a general computational model for word-form recognition and production. Ph.D. thesis, University of Helsinki, 1983.
[16]  Lauri Karttunen, Constructing lexical transducers. In the proceedings of the fifteenth international conference on computational linguistics, 1994.
[17]  Çagrı Çöltekin, A Freely Available Morphological Analyzer for Turkish, Center for Language and Cognition (CLCG), University of Groningen.
[18]  Elaine Uí Dhonnchadha, A Two-level Morphological Analyser and Generator for Irish using Finite-State Transducers, institute of technology of Éireann, 31 Plás Mhic Liam, Baile Átha Cliath 2, Éire, and Dublin City University, Glasnevin, Dublin 11, Ireland.
[19]  Gulshat Kessikbayeva and Ilyas Cicekli, A Rule Based Morphological Analyzer and A Morphological Disambiguator for Kazakh Language, Linguistics and Literature Studies, 2016.
[20]  Kenneth R. Beesley, Finite-State Morphological Analysis and Generation of Arabic, Xerox Research Centre Europe, 6, chemin de Maupertuis, 38240 MEYLAN, France, 2001.
[21]  Mesfin Abate, Yaregal Assabie (2014). “Development of Amharic morphological analyzer using memory based approach”, 9th International Conference on NLP, PolTAL, Warsaw, Poland, September 17-19, 2014. Proceedings.
[22]  Michael Gasser (2009). “HornMorpho1.0: a system for morphological processing of Amharic, Oromo, and Tigrinya”.
[23]  Khumbar Debbarma, Braja Gopal Patra, Dipankar Das, Sivaji Bandyopadhyay, Morphological Analyzer for Kokborok.
[24]  Koray Ak, Olcay Taner Yıldız, 2011. Unsupervised Morphological Analysis Using Tries, Dept. of Computer Science and Engineering, Isık University.
[25]  Nicola Lampitelli, Evaluative morphology in Somali, Université Paris Diderot-Paris.
[26]  Nimaan Abdillahi, Building and Evaluating Af-Somali Corpora, Proceedings of the 2014 Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 73-76.
[27]  R. Akilan and Prof. E. R. Naganathan, Morphological Analyzer for Classical Tamil Texts: A Rulebased approach, Research Scholar (Department of Computer Science, Bharathiar University, Coimbatore), Programmer, Central Institute of Classical Tamil, Chennai.
[28]  Shuly Wintner and Gelbukh: Finite-State Technology as a Programming Environment, CICLing 2007, LNCS 4394, pp. 97-106, 2007.
[29]  Saba Amsalu, Girma A. Demeke (2006). Non-concatinative Finite State Morphotactics of Amharic Simple Verbs.
[30]  Xuri Tang, English Morphological Analysis with Machine-learned Rules, Dept. Foreign Languages, Wuhan University of Science and Engineering, 430073 Wuhan, P. R. China.
[31]  Nicola Lampitelli, The morphophonology of Somali nouns, June 15-18, 2011.
[32]  Kazakov Dimater & Manandhar Suresh (2000). Unsupervised Learning for Word Segmentation Rules with Genetic Algorithms and Inductive Logic Programming.
[33]  John I. Saeed, “Somali Reference Grammar”, the University of Virginia, 26 Sep 2007.
[34]  Yitayal Abate, 2013. “Morphological analyzer for Ge’ez verbs using machine learning approach”, thesis, Addis Ababa University.
[35]  Shlomo Yona, A finite-state based morphological analyzer for Hebrew, thesis, Department of Computer Science, November 2004.

 

1.9  Appendix-A: Alternation Rules for Noun and Verb


1.10 Appendix-B: Af-Somali verb Lexicon

!!Somorph-lex.txt
LEXICON Root
Verb;

LEXICON Verb
aammus        V1suffixing;
abab          V1suffixing;
aadaan        V1suffixing;
aammin        V1suffixing;
daab          V1suffixing;
daabul        V1suffixing;
edeg          V1suffixing;
faalal        V1suffixing;
faan          V1suffixing;
gaad          V1suffixing;
gaadaan       V1suffixing;
gaaddabbuur   V1suffixing;
gaadh         V1suffixing;
haajir        V1suffixing;
habaar        V1suffixing;
kaah          V1suffixing;
kaamil        V1suffixing;
naafow        V1suffixing;
naanays       V1suffixing;
naaqus        V1suffixing;
qaad          V1suffixing;
raac          V1suffixing;
raadgoob      V1suffixing;
raadgur       V1suffixing;
saacid        V1suffixing;
saaf          V1suffixing;

!!aaddi       V2suffixing;
!!aammusi     V2suffixing;
!!baafi       V2suffixing;
!!baahi       V2suffixing;
!!caafi       V2suffixing;
!!faafi       V2suffixing;
!!faahi       V2suffixing;
!!gaabi       V2suffixing;
!!gaadhsii    V2suffixing;
!!haadi       V2suffixing;
toosi         V2suffixing;
maadi         V2suffixing;
maahi         V2suffixing;
maalgeli      V2suffixing;
qaawi         V2suffixing;
rafaadi       V2suffixing;
rafaaji       V2suffixing;
ragaadi       V2suffixing;
saaci         V2suffixing;
taabsii       V2suffixing;
uburi         V2suffixing;
ubxi          V2suffixing;
xaadi         V2suffixing;
xaadiri       V2suffixing;

caddee        V3suffixing;
dhabee        V3suffixing;
aabee         V3suffixing;
aafee         V3suffixing;
aaladee       V3suffixing;
baabee        V3suffixing;
baadiyee      V3suffixing;
baahee        V3suffixing;
caalsaaree    V3suffixing;
caanee        V3suffixing;
hallee        V3suffixing;
hambalyee     V3suffixing;
hambee        V3suffixing;
qaaligaree    V3suffixing;
qaaliyee      V3suffixing;
qaamee        V3suffixing;
raacdee       V3suffixing;
saamee        V3suffixing;
saandambee    V3suffixing;
saawee        V3suffixing;
taakee        V3suffixing;
taallee       V3suffixing;
tabaabulee    V3suffixing;
waayee        V3suffixing;
yaree         V3suffixing;

abyood        V4suffixing;
adaadumo      V4suffixing;
baaho         V4suffixing;
caashaqo      V4suffixing;
lifaaqo       V4suffixing;
liidaanyoo    V4suffixing;
liido         V4suffixing;
qaysho        V4suffixing;
qiiroo        V4suffixing;
qodo          V4suffixing;
rigoo         V4suffixing;
riiqo         V4suffixing;
riiqo         V4suffixing;
saloolo       V4suffixing;
tacdaaro      V4suffixing;
tafaxaydo     V4suffixing;
tafwareemo    V4suffixing;
unko          V4suffixing;
urugoo        V4suffixing;
waabo         V4suffixing;
xeroo         V4suffixing;
xeydo         V4suffixing;
yeelo         V4suffixing;
yeelo         V4suffixing;

aamuso        V5suffixing;
badso         V5suffixing;
bahayso       V5suffixing;
caddayso      V5suffixing;
cadgooso      V5suffixing;
galdhacso     V5suffixing;
hakaabso      V5suffixing;
halabayso     V5suffixing;
ilaaleyso     V5suffixing;
janjeerso     V5suffixing;
naso          V5suffixing;
qaawiso       V5suffixing;
qalayso       V5suffixing;
raacdayso     V5suffixing;
samayso       V5suffixing;
tallaabso     V5suffixing;
tallaabso     V5suffixing;
taraarayso    V5suffixing;
ubaxayso      V5suffixing;
udgoonso      V5suffixing;
waabariiso    V5suffixing;
xabeebso      V5suffixing;
xabkayso      V5suffixing;
aammiinso     V5suffixing;

LEXICON V1suffixing
+V1+Sg+1P:0 #;
+V1+Pl:a #;
+V1+Sg+inf:i #;
+V1+2P:s #;
+V1+Sg+3Pfem:t #;
+V1+1PPl:n #;
+V1+pres:aa #;
+V1+1P+pres:naa #;
+V1+paste:ay #;
+V1+2P+paste:tay #;
+V1+1PPl+paste:nay #;
+V1+3Pfem+paste+1PPl:teen #;
+V1+paste+1PPl:een #;
+V1+1Ppres.conti:ayaa #;
+V1+2Ppres.conti:aysaa #;
+V1+1PPlpres.conti:aynaa #;
+V1+2PPlpres.conti:aysaan #;
+V1+3PPl+pres.conti:ayaan #;

LEXICON V2suffixing
+V2:i #;
+V2+Sg:0 #;
+V2+Pl:ya #;
+V2+Sg+inf:in #;
+V2+1PSg:y #;
+V2+3PSgmasc:y #;
+V2+3PPl:y #;
+V2+3PSgfem:s #;
+V2+2PSg:s #;
+V2+2PPl:s #;
+V2+1PPl:n #;
+V2+pres:aa #;
+V2+2P+pres:saan #;
+V2+3PPl+paste:yay #;
+V2+Sg+inf+paste:nay #;
+V2+3PPl+paste:yaan #;
+V2+paste:ay #;
+V2+2PSg+paste:seen #;
+V2+3PPl+paste:yeen #;

LEXICON V3suffixing
+V3:ee #;
+V3+Sg:0 #;
+V3+Pl:ya #;
+V3+Sg+inf:yn #;
+V3+Sg+3PSgmasc:y #;
+V3+3PSgfem:s #;
+V3+1PPl:n #;
+V3+pres:aa #;
+V3+3PSgfem+pres:saan #;
+V3+Sg+3PSgmasc+paste:yaan #;
+V3+paste:ay #;
+V3+3PSgfem+paste:seen #;
+V3+Sg+3PSgmasc+paste:yeen #;

LEXICON V4suffixing
+V4:o #;
+V4+Sg:0 #;
+V4+Pl:da #;
+V4+Sg+inf:an #;
+V4+Sg+1PSg:0 #;
+v4+3PSgfem:t #;
+V4+1PPl:n #;
+V4+Sg+pres:aa #;
+v4+3PSgfem+pres:taan #;
+V4+Sg+1PSg+paste:aan #;
+V4+Sg+paste:ay #;
+v4+3PSgfem+paste:teen #;
+V4+Sg+1PSg+paste:een #;

LEXICON V5suffixing
+V5+Sg:0 #;
+V5+Sg+3Pmasc:0 #;
+V5+Pl:da #;
+V5+Sg+inf:an #;
+V5+3PSgfem:t #;
+V5+Sg+1PPl:n #;
+V5+Sg+pres:aa #;
+V5+3PSgfem+pres:taan #;
+V5+Sg+3Pmasc+pres:aan #;
+V5+Sg+paste:ay #;
+V5+3PSgfem+paste:teen #;
+V5+Sg+3Pmasc+paste:een #;
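To illustrate how the continuation classes in this appendix combine a stem with its suffix entries, the following is a minimal Python sketch. It is only a toy model of lexc-style concatenation, not the Xerox tools themselves. The dictionary layout and the function name expand are assumptions made for illustration; the stems and suffix entries are taken from the listing. The alternation rules of Appendix A are not applied, so the surface strings shown are the intermediate forms before any phonological change.

```python
# Toy model of lexc continuation classes (illustrative only, not the lexc compiler).
# Each entry is (lexical side, surface side, continuation class); "#" ends a word.
LEXICONS = {
    "Root": [("", "", "Verb")],
    "Verb": [
        ("kaah", "kaah", "V1suffixing"),
        ("qaad", "qaad", "V1suffixing"),
    ],
    "V1suffixing": [
        ("+V1+Sg+1P", "", "#"),   # ":0" in lexc stands for the empty string
        ("+V1+pres", "aa", "#"),
        ("+V1+paste", "ay", "#"),
    ],
}


def expand(lexicon="Root", upper="", lower=""):
    """Yield every (lexical, surface) pair reachable from `lexicon`."""
    if lexicon == "#":
        yield upper, lower
        return
    for up, low, nxt in LEXICONS[lexicon]:
        yield from expand(nxt, upper + up, lower + low)


pairs = list(expand())
for lexical, surface in pairs:
    print(f"{lexical}\t{surface}")
```

Each printed line pairs an analysis string such as kaah+V1+pres with its concatenated surface string. In the actual system the compiled lexicon would be composed with the alternation rules, which this sketch leaves out.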

 

1.11 Appendix-C: Af-Somali Noun Lexicon

!!Somorph-lex.txt
LEXICON Root
Nouns;

LEXICON Nouns
aalad N1;
abaar N1;
bad N1;
beer N1;
hees N1;
kab N1;
kal N1;
naag N1;
qayb N1;
saacad N1;
sannad N1;
shimbir N1;
suurad N1;
toobad N1;
aroos N2MYo;
asaas N2MYo;
dalool N2MYo;
dheri N2MYo;
erey N2MYo;
magac N2MYo;
qori N2MYo;
qurub N2MYo;
ubax N2MYo;
unug N2MYo;
xijaab N2MYo;
qor N2FYo;
quraac N2FYo;
sabti N2FYo;
subax N2FYo;
mindi N2FYo;
gabadh N3F2V;
gacan N3F2V;
galab N3F2V;
kibis N3F2V;
xubin N3F2V;
garab N3M2V;
hilib N3M2V;
ilig N3M2V;
jilib N3M2V;
xadhig N3M2V;
baal N4FaC;
seef N4FaC;
weel N4FaC;
wiil N4FaC;
af N4MaC;
baaf N4MaC;
ceel N4MaC;
dal N4MaC;
fal N4MaC;
miis N4MaC;
qoys N4MaC;
riig N4MaC;
shil N4MaC;
weel N4MaC;
aabbur N5MCC;
albaab N5MCC;
alool N5MCC;
baabuur N5MCC;
dagaal N5MCC;
dameer N5MCC;
hoteel N5MCC;
ijaar N5MCC;
sacab N5MCC;
shaqal N5MCC;
wadaad N5MCC;
yaraan N5MCC;
daymo N6Moyin;
dhismo N6Moyin;
barkimo N6Moyin;
abeeso N6Foyin;
daawo N6Foyin;
darajo N6Foyin;
hooyo N6Foyin;
magalo N6Foyin;
taallo N6Foyin;
ujeeddo N6Foyin;
waddo N6Foyin;
aabbe N7Myaal;
beenaale N7Myaal;
biyoole N7Myaal;
caanoole N7Myaal;
fure N7Myaal;
gacaliye N7Myaal;
jaalle N7Myaal;
walaale N7Myaal;
waraabe N7Myaal;
yeele N7Myaal;

LEXICON N1
+N1+Sg:0 #;
+N1+Pl:o #;
+N1+defM:ka #;
+N1+defF:ta #;
+N1+defF+inter:tee #;
+N1+defM+inter:kee #;
+N1+defF+1PSg:tayda #;
+N1+defF+2PSg:taada #;
+N1+defF+3Pmasc:tiisa #;
+N1+defF+3Pfem:teeda #;
+N1+defF+1PPl:taayada #;
+N1+defF+close:tan #;
+N1+defF+near:tas #;
+N1+defF+far:teer #;

LEXICON N2MYo
+N2M+Sg:0 #;
+N2M+Pl:yo #;
+N2M+defM:ka #;
+N2M+defM+inter:kee #;
+N2M+defM+1PSg:kayga #;
+N2M+defM+2PSg:kaaga #;
+N2M+defM+3Pmasc:kiisa #;
+N2M+defM+3Pfem:keeda #;
+N2M+defM+1PPl:kaayaga #;
+N2M+defM+close:kan #;
+N2M+defM+near:kas #;
+N2M+defF+far:keer #;

LEXICON N2FYo
+N2F+Sg:0 #;
+N2F+Pl:yo #;
+N2F+Pl:O #;
+N2F+defF:ta #;
+N2F+defF:ha #;
+N2F+defF+inter:yahee #;
+N2F+defF+inter:tee #;
+N2F+defF+1stSg:tayda #;
+N2F+defF+2ndSg:taada #;
+N2F+defF+3rdmasc:tiisa #;
+N2F+defF+3rdfem:teeda #;
+N2F+defF+1stPl:taayada #;
+N2F+defF+close:tan #;
+N2F+defF+near:tas #;
+N2F+defF+far:teer #;

LEXICON N3F2V
+N3F+Sg:0 #;
+N3F+Pl:0 #;
+N3F+defF:ta #;
+N3F+defF+inter:tee #;
+N3F+defF+1PSg:tayda #;
+N3F+defF+2PSg:taada #;
+N3F+defF+3Pmasc:tiisa #;
+N3F+defF+3Pfem:teeda #;
+N3F+defF+1PPl:taayada #;
+defF+close:tan #;
+defF+near:tas #;
+N3F+defF+far:teer #;

LEXICON N3M2V
+N3M+Sg:0 #;
+N3M+Pl:0 #;
+N3M+defM:ka #;
+N3M+defM+inter:kee #;
+N3M+defM+1PSg:kayga #;
+N3M+defM+2PSg:kaaga #;
+N3M+defM+3Pmasc:kiisa #;
+N3M+defM+3Pfem:keeda #;
+N3M+defM+1PPl:kaayaga #;
+N3M+defM+close:kan #;
+N3M+defM+near:kas #;
+N3M+defM+far:keer #;

LEXICON N4FaC
+N4F+Sg:0 #;
+N4F+Pl:aC #;
+N4F+defF:ta #;
+N4F+defF+inter:tee #;
+N4F+defF+1PSg:tayda #;
+N4F+defF+2PSg:taada #;
+N4F+defF+3Pmasc:tiisa #;
+N4F+defF+3Pfem:teeda #;
+N4F+defF+1PPl:taayada #;
+N4F+defF+close:tan #;
+N4F+defF+near:tas #;
+N4F+defF+far:teer #;

LEXICON N4MaC
+N4M+Sg:0 #;
+N4M+Pl:aC #;
+N4M+defM:ka #;
+N4M+defM+inter:kee #;
+N4M+defM+1PSg:kayga #;
+N4M+defM+2PSg:kaaga #;
+N4M+defM+3Pmasc:kiisa #;
+N4M+defM+3Pfem:keeda #;
+N4M+defM+1PPl:kaayaga #;
+N4M+defM+close:kan #;
+N4M+defM+near:kas #;
+N4M+defM+far:keer #;

LEXICON N5MCC
+N5M+Sg:0 #;
+N5M+Pl:CC #;
+N5M+defM:ka #;
+N5M+defM+inter:kee #;
+N5M+defM+1PSg:kayga #;
+N5M+defM+2PSg:kaaga #;
+N5M+defM+3Pmasc:kiisa #;
+N5M+defM+3Pfem:keeda #;
+N5M+defM+1PPl:kaayaga #;
+N5M+defM+close:kan #;
+N5M+defM+near:kas #;
+N5M+defM+far:keer #;

LEXICON N6Moyin
+N6M+Sg:0 #;
+N6M+Pl:oyin #;
+N6M+defM:ka #;
+N6M+defM+inter:kee #;
+N6M+defM+1PSg:kayga #;
+N6M+defM+2PSg:kaaga #;
+N6M+defM+3Pmasc:kiisa #;
+N6M+defM+3Pfem:keeda #;
+N6M+defM+1PPl:kaayaga #;
+N6M+defM+close:kan #;
+N6M+defM+near:kas #;
+N6M+defM+far:keer #;

LEXICON N6Foyin
+N6F+Sg:0 #;
+N6F+Pl:oyin #;
+N6F+defF:ta #;
+N6F+defF+inter:tee #;
+N6F+defF+1PSg:tayda #;
+N6F+defF+2PSg:taada #;
+N6F+defF+3Pmasc:tiisa #;
+N6F+defF+3Pfem:teeda #;
+N6F+defF+1PPl:taayada #;
+N6F+defF+close:tan #;
+N6F+defF+near:tas #;
+N6F+defF+far:teer #;

LEXICON N7Myaal
+N7M+Sg:0 #;
+N7M+Pl:yaal #;
+N7M+defM:ka #;
+N7M+defM+inter:kee #;
+N7M+defM+1PSg:kayga #;
+N7M+defM+2PSg:kaaga #;
+N7M+defM+3Pmasc:kiisa #;
+N7M+defM+3Pfem:keeda #;
+N7M+defM+1PPl:kaayaga #;
+N7M+defM+close:kan #;
+N7M+defM+near:kas #;
+N7M+defM+far:keer #;
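As a rough illustration of how the noun lexicon supports analysis (surface form to stem plus tags), the Python sketch below inverts a small part of the N1 class into a lookup table. It is a simplified stand-in for the compiled transducer, not the author's implementation. The names STEMS, SUFFIXES, build_index and analyze are hypothetical; the stems and the four suffix entries come from the listing. Alternation rules are ignored, so only plain concatenations are recognized.

```python
from collections import defaultdict

# A few N1 stems and suffix entries from the noun lexicon (subset for illustration).
STEMS = {"naag": "N1", "kab": "N1"}
SUFFIXES = {
    "N1": [
        ("+N1+Sg", ""),      # ":0" in lexc stands for the empty string
        ("+N1+Pl", "o"),
        ("+N1+defF", "ta"),
        ("+N1+defM", "ka"),
    ],
}


def build_index(stems, suffixes):
    """Map each concatenated surface form to the analyses that produce it."""
    index = defaultdict(list)
    for stem, cls in stems.items():
        for tags, ending in suffixes[cls]:
            index[stem + ending].append(stem + tags)
    return index


INDEX = build_index(STEMS, SUFFIXES)


def analyze(word):
    """Return all analyses of `word`, or an empty list if it is not covered."""
    return INDEX.get(word, [])


print(analyze("naagta"))
print(analyze("kabka"))
```

Because the N1 class lists both the masculine and the feminine definite suffix, a table built this way also accepts combinations such as a feminine stem with the masculine article. That overgeneration is a property of this sketch under the stated assumptions.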

 

Submitted by:

Mahdi Yonis (Student)    Signature: _____________________    Date: May 30, 2017

Approved by:

1.  Yaregal Assabie (Advisor)    Signature: ______________________    Date: May 30, 2017

2.  ______________________ (Chairman, Dept's Graduate Committee)    Signature: ______________________    Date: ______________________

3.  ______________________ (Chairman, Faculty's Graduate Commission)    Signature: ______________________    Date: ______________________

4.  ______________________ (Dean, Graduate School)    Signature: ______________________    Date: ______________________
