Recent Updates

  • Sep 25 Online Portal will be opened soon
  • Sep 10 Dataset Released. Register here.
  • Aug 30 Website for WSMP launched
  • Aug 13 Hackathon for WSMP with FIRE conference
Sep 25
The Online Portal for the Hackathon on "Word Segmentation and Morphological Parsing for Sanskrit" will open on September 25, 2021. The hackathon is hosted on CodaLab.

About the Hackathon

Sentential analysis in Sanskrit poses challenges at every stage, including word segmentation, morphological parsing, and dependency parsing. This hackathon focuses on developing methods for Word Segmentation and Morphological Parsing in Sanskrit.

  • Task 1

    Word Segmentation: In written Sanskrit, words often undergo phonetic transformations where they meet, which modifies the phonemes at the word boundaries and obscures the boundaries themselves. This process of euphonic assimilation, or joining of words, is known as Sandhi, and the splitting or segmentation of such joined word forms is known as Sandhi-Viccheda. While Sandhi is deterministic, Sandhi-Viccheda is not: joining two given words yields one form, but a joined form can often be split in more than one way. The goal is to identify the individual words in a sentence and obtain the semantically most valid split of the sentence for subsequent processing in downstream tasks.

  • Task 2

    Morphological Parsing: Sanskrit is a morphologically rich fusional language, and morphology plays a crucial role in carrying the grammatical information encoded in a sentence. However, morphological parsing is challenging because syncretism and homonymy are prevalent among word forms. A word is formed by combining preverb(s), stem(s) and suffixes. The suffixes denote the morphological category of the word, and the grammatical role a word plays in a sentence is latent in that category. This task focuses on predicting the morphological tags for each word in a given sentence.

  • Task 3

    Combined Word Segmentation and Morphological Parsing: Given the sequential dependency between the two tasks above, we encourage a joint or pipeline-based formulation that combines Tasks 1 and 2. If a pipeline is used, the segmented sentence produced by Word Segmentation should be fed to the Morphological Parser. Alternatively, a participating team may model both tasks jointly. Because the two tasks are interdependent, joint modelling is preferable to a pipeline-based approach. In either case, both the segmented forms and the morphological tags are to be predicted.
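
To illustrate why Sandhi is deterministic while Sandhi-Viccheda is not, here is a minimal Python sketch. It is not part of the hackathon materials. The rule table covers only a handful of well-known vowel Sandhi rules, and the names `SANDHI_RULES`, `join`, `candidate_splits` and `TOY_LEXICON` are hypothetical, chosen for this example. A real segmenter needs the full rule set and some way to rank the candidates.

```python
# Toy vowel-Sandhi table: (final sound of left word, initial sound of right word) -> fused sound.
# Assumption: only a few rules are listed, for illustration; real Sanskrit Sandhi has many more.
SANDHI_RULES = {
    ("a", "a"): "ā",
    ("a", "ā"): "ā",
    ("ā", "a"): "ā",
    ("ā", "ā"): "ā",
    ("a", "i"): "e",
    ("a", "u"): "o",
}


def join(left, right):
    """Sandhi: under this rule table, two given words always yield exactly one form."""
    key = (left[-1], right[0])
    if key in SANDHI_RULES:
        return left[:-1] + SANDHI_RULES[key] + right[1:]
    return left + right  # no rule applies: plain concatenation


def candidate_splits(form):
    """Sandhi-Viccheda: list every two-word split that the rule table can undo."""
    splits = []
    for i, ch in enumerate(form):
        for (left_end, right_start), fused in SANDHI_RULES.items():
            if ch == fused:
                splits.append((form[:i] + left_end, right_start + form[i + 1:]))
    return splits


# Hypothetical two-word lexicon standing in for a real disambiguation step.
TOY_LEXICON = {"na", "asti"}

joined = join("na", "asti")            # "nāsti"
candidates = candidate_splits(joined)  # four rule-compatible splits
valid = [s for s in candidates if all(word in TOY_LEXICON for word in s)]
print(joined, candidates, valid)
```

Joining `na` and `asti` gives the single form `nāsti`, but reversing the rules yields four candidate splits, all of which rejoin to the same surface form. The lexicon filter is only a stand-in for choosing the semantically most valid split that Task 1 asks for.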

Applications

  • Sentential analysis of Sanskrit sentences
  • Dependency parsing of Sanskrit sentences
  • Discourse analysis
  • Constituency analysis of compounds in Sanskrit
  • Analysing compounds (Named Entity Recognition)

Important Dates

  • 10th September 2021

    Training Dataset Release

  • 25th September 2021

    Online Portal for Registration

  • 20-21 November 2021

    Test Data Release
    Date of Hackathon

  • 13-17 December 2021

    FIRE '21
    Declaration of Results