Recent Updates

  • Nov 22 Hackathon Completed! Check Results here
  • Nov 20 We are live! Check it out here
  • Nov 20 Test Data can be accessed here
  • Nov 1 The Discord Server is live here
  • Oct 15 Online Portal is open and can be accessed here
  • Sep 10 Dataset Released. Register here.
  • Aug 30 Website for WSMP launched
  • Aug 13 Hackathon for WSMP with FIRE conference
Nov 22
The Hackathon has officially ended. The leaderboard is available here

About the Hackathon

Sentential analysis in Sanskrit poses various challenges in each of the stages of word segmentation, morphological parsing, dependency parsing, etc. This hackathon focuses on developing methodologies to handle Word Segmentation and Morphological Parsing in Sanskrit.


There are three tasks: Word Segmentation, Morphological Parsing, and Combined Word Segmentation & Morphological Parsing. The online portal for the competition is hosted on Codalab. Participants are requested to register for the dataset and can try out their models with the training and development datasets on the online portal.

The top 3 performers in the competition will be awarded prizes. The details of the prizes will be updated soon. The details of the tasks are given below:

  • Task 1

    Word Segmentation: The writings in Sanskrit follow a structured scheme where the words often undergo phonetic transformations at the juncture of their boundaries, thus modifying the phonemes at these boundaries and also obscuring the original boundaries. This process of euphonic assimilation or joining of the words is known as Sandhi and the splitting or segmentation of such joint word forms is known as Sandhi-Viccheda. While Sandhi is deterministic, Sandhi-Viccheda is not. It is desirable to identify the individual words in a sentence and obtain the semantically most valid split of the sentence for subsequent processing in downstream tasks.

  • Task 2

    Morphological Parsing: Sanskrit is a morphologically rich fusional language, and morphology plays a crucial role in carrying the grammatical information encoded in a sentence. However, morphological parsing is challenging because syncretism and homonymy are prevalent among word forms. A word is formed by the combination of preverb(s), stem(s) and suffixes. The suffixes denote the morphological category of the word, and the grammatical role played by a word in a sentence is latent in that category. This task focuses on predicting the morphological tags for each of the words in a given sentence.

  • Task 3

    Combined Word Segmentation and Morphological Parsing: Given the sequential dependency between the two tasks above, we encourage a joint or pipeline-based formulation that combines Tasks 1 and 2. If a pipeline is used, the segmented form of the sentence produced by the word segmenter should be fed to the morphological parser. Alternatively, a participating team may model both tasks jointly. Because the two tasks are interdependent, joint modelling is preferable to a pipeline-based approach. In either case, both the segmented forms and the morphological tags are to be predicted.
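
The ambiguity described in Tasks 1 and 2 can be seen in a small sketch. The code below is only a toy illustration: it assumes SLP1 transliteration (where A stands for long a), and the rule table, the lexicon, the tag wording, and the function names are hand-picked assumptions rather than the hackathon's data, tag set, or baseline. It shows that joining two words with Sandhi gives a single result, while the reverse direction yields several candidate splits, and that one surface form can carry several valid morphological tags.

```python
# Toy illustration only. Forms are in SLP1 transliteration (A = long a).
# The rule table and lexicon are tiny hand-picked assumptions, not the
# hackathon's data or tag set.

# Task 1: a few vowel sandhi rules.
# (last sound of left word, first sound of right word) -> merged sound
SANDHI_RULES = {
    ("a", "a"): "A",
    ("a", "A"): "A",
    ("A", "a"): "A",
    ("A", "A"): "A",
    ("a", "i"): "e",
    ("a", "u"): "o",
}


def join(left, right):
    """Apply sandhi at the boundary of two words (deterministic)."""
    merged = SANDHI_RULES.get((left[-1], right[0]))
    if merged is None:
        return left + right
    return left[:-1] + merged + right[1:]


def candidate_splits(text):
    """List every two-way split the toy rules could have produced (ambiguous)."""
    splits = []
    for i, sound in enumerate(text):
        for (left_end, right_start), merged in SANDHI_RULES.items():
            if sound == merged:
                splits.append((text[:i] + left_end, right_start + text[i + 1:]))
    return splits


# Task 2: syncretism, one surface form with several valid morphological tags.
ANALYSES = {
    "Palam": ["nominative singular neuter", "accusative singular neuter"],
    "devAByAm": [
        "instrumental dual masculine",
        "dative dual masculine",
        "ablative dual masculine",
    ],
}


def morphological_tags(word):
    """Return all candidate tags for a word; empty if the toy lexicon lacks it."""
    return ANALYSES.get(word, [])


if __name__ == "__main__":
    print(join("na", "asti"))
    for split in candidate_splits("nAsti"):
        print(split)
    print(morphological_tags("Palam"))
```

A real system has to rank such candidates using context, which is why the tasks ask for the semantically most valid split and the correct tag for each word in the sentence.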

Online Portal
The competition is conducted on the following platform:
WSMP

Important Dates

  • 10th September 2021

    Training Dataset Release
    -----

  • 15th October 2021

    Online Portal for Registration
    -----

  • 20-21 November 2021

    Test Data Release
    Date of Hackathon
    -----

  • 13-17 December 2021

    FIRE '21
    Declaration of Results
    -----