Ellogon

Ellogon is a cross-platform, multi-lingual, general-purpose language engineering environment

Ellogon Ranking & Summary

  • License: GPL
  • Publisher Name: Georgios Petasis
  • Operating Systems: Windows All
  • File Size: 10.6 MB

Ellogon Description

Ellogon is a cross-platform, multi-lingual, general-purpose language engineering environment, developed to aid both researchers working in computational linguistics and companies that produce and deliver language engineering systems. As a language engineering platform, Ellogon offers an extensive set of facilities, including tools for processing and visualising textual/HTML/XML data and the associated linguistic information, support for lexical resources (such as creating and embedding lexicons), and tools for creating annotated corpora, accessing databases, comparing annotated data, and transforming linguistic information into vectors for use with various machine learning algorithms.

During the last decade, a large number of software infrastructures aiming to facilitate R&D in natural language processing have been presented. Some of these, such as the LT-NSL/LT-XML tools or GATE, have become extremely popular, having been applied to a wide range of tasks by many institutions around the world.

Ellogon belongs to the category of referential, or annotation-based, platforms, where the linguistic information is stored separately from the textual data and refers back to the original text. Based on the TIPSTER data model, Ellogon provides infrastructure for:

  • Managing, storing and exchanging textual data as well as the associated linguistic information.
  • Creating, embedding and managing linguistic processing components.
  • Facilitating communication among different linguistic components by defining a suitable application programming interface (API).
  • Visualising textual data and associated linguistic information.

Because Ellogon shares the TIPSTER data model, it has some basic features in common with other TIPSTER-based infrastructures, such as GATE. However, it also offers a large number of features that differentiate it from such infrastructures.
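A minimal sketch of the referential model described above, written in plain Python 3 (3.9 or later) rather than against Ellogon's own API, which is not shown here. The class and method names (`Document`, `Annotation`, `Collection`, `annotate`, `covered_text`, `by_type`) are hypothetical; the annotation fields mirror the four elements listed in the next paragraph, and treating the end offset as exclusive is an assumption of this sketch. The point it illustrates is that annotating never modifies the text: each annotation only records offsets that refer back to it.

```python
from dataclasses import dataclass, field

# Hypothetical classes illustrating stand-off annotation; not Ellogon's API.


@dataclass
class Annotation:
    id: int                                  # unique within its document
    type: str                                # category label, e.g. "token"
    spans: list[tuple[int, int]]             # (start, end) offsets; end exclusive (assumption)
    attributes: dict[str, object] = field(default_factory=dict)


@dataclass
class Document:
    text: str                                # never modified by annotating
    attributes: dict[str, object] = field(default_factory=dict)
    annotations: list[Annotation] = field(default_factory=list)
    _next_id: int = field(default=0, init=False, repr=False)

    def annotate(self, type_: str, spans: list[tuple[int, int]], **attributes: object) -> Annotation:
        """Record an annotation that points back into the text by character offsets."""
        for start, end in spans:
            if not 0 <= start <= end <= len(self.text):
                raise ValueError(f"span ({start}, {end}) is outside the text")
        ann = Annotation(self._next_id, type_, list(spans), dict(attributes))
        self._next_id += 1
        self.annotations.append(ann)
        return ann

    def covered_text(self, ann: Annotation) -> str:
        """Resolve an annotation's spans against the original text."""
        return " ".join(self.text[start:end] for start, end in ann.spans)

    def by_type(self, type_: str) -> list[Annotation]:
        """Return all annotations of one category."""
        return [a for a in self.annotations if a.type == type_]


@dataclass
class Collection:
    documents: list[Document] = field(default_factory=list)   # a finite set of documents


doc = Document("Ellogon stores annotations apart from text.")
name = doc.annotate("token", [(0, 7)], pos="NNP")
print(name.id, doc.covered_text(name))   # 0 Ellogon
```

Because an annotation may hold several spans, a discontinuous phrase can be represented without copying or restructuring the underlying text.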
The central element for storing data in Ellogon is the Collection, a finite set of Documents. An Ellogon document consists of textual data together with linguistic information about that data, stored in the form of attributes and annotations. An attribute associates a specific type of information with a typed value. An annotation associates arbitrary information (in the form of attributes) with portions of the textual data. Each such portion, called a span, consists of two character offsets denoting the start and end characters of the portion, measured from the first character of the textual data. Annotations typically consist of four elements:

  • A numeric identifier, unique for every annotation within a document, which can be used to identify the annotation unambiguously.
  • A type: a textual value used to classify annotations into categories.
  • A set of spans that denote the range of the annotated textual data.
  • A set of attributes, which usually encode the necessary linguistic information.

Ellogon in its present form supports all four of these elements. The most important features that set it apart from similar infrastructures are:

  • Easy component development: The process of developing new components is easy to understand, and components can be built on the functionality Ellogon provides. A wide range of programming languages is supported for component development, including C, C++, Java, Tcl, Perl and Python.
  • Integrated development environment: Ellogon supports the complete development cycle of a component. Components can be created, edited, compiled and linked (where applicable) from inside Ellogon. Furthermore, C/C++/Java components can be unloaded, modified, recompiled and reloaded without quitting Ellogon. The ability to unload and reload components can significantly shorten the development cycle, since modifications can be evaluated immediately.
  • A ready-to-use component "toolbox": Ellogon is equipped with a large number of ready-to-use tools for tasks such as annotated corpora creation, vector generation and data comparison. Several sample components that perform basic tasks such as tokenization, part-of-speech tagging and gazetteer list lookup are also provided, and can be adapted to various domains and languages. Finally, Ellogon offers several data visualisation tools, ranging from simple viewers for the annotation database to viewers able to display hierarchical information, such as syntax trees.
  • Easy deployment: Because Ellogon implements a decomposable architecture, a set of components that perform a specific task can easily be turned into an easy-to-use product. The components, along with the required parts of Ellogon, can be packaged either as a single executable (which needs no installation) or as an application that runs unmodified on multiple operating systems. These specialised applications can be distributed and used on any system, even one where Ellogon has not been installed.

Requirements:

  • Tcl/Tk 8.4 (or newer)
  • Java JDK/JRE 1.4.1 (optional)
  • Perl 5.8.1 (optional)
  • Python 2.2 (optional)
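The sample tokenization component and the vector-generation tool mentioned above can be approximated as follows. This is an illustrative sketch in plain Python 3.9+, not Ellogon's component interface: `tokenizer_component`, `token_vectors`, the minimal `Document`/`Annotation` classes and the regular expression that defines a token are all hypothetical choices. The component adds `token` annotations without altering the text, and the vector step turns those annotations into per-document term counts over a shared vocabulary, which is one simple way of feeding annotation data to a machine learning algorithm.

```python
import re
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical minimal data model; not Ellogon's API.


@dataclass
class Annotation:
    id: int
    type: str
    spans: list[tuple[int, int]]             # (start, end); end exclusive in this sketch
    attributes: dict[str, str] = field(default_factory=dict)


@dataclass
class Document:
    text: str
    annotations: list[Annotation] = field(default_factory=list)


# Words are runs of \w characters; every other non-space character is its own token.
TOKEN_RE = re.compile(r"(?P<word>\w+)|(?P<punct>[^\w\s])")


def tokenizer_component(doc: Document) -> None:
    """Hypothetical component: add one 'token' annotation per match, leaving doc.text untouched."""
    next_id = max((a.id for a in doc.annotations), default=-1) + 1   # keep ids unique
    for match in TOKEN_RE.finditer(doc.text):
        doc.annotations.append(
            Annotation(next_id, "token", [(match.start(), match.end())], {"kind": match.lastgroup})
        )
        next_id += 1


def token_vectors(docs: list[Document]) -> tuple[list[str], list[list[int]]]:
    """Count lower-cased token strings per document over a shared, sorted vocabulary."""
    bags = [
        Counter(
            doc.text[start:end].lower()
            for ann in doc.annotations
            if ann.type == "token"
            for start, end in ann.spans
        )
        for doc in docs
    ]
    vocabulary = sorted(set().union(*bags))
    return vocabulary, [[bag[term] for term in vocabulary] for bag in bags]


docs = [Document("The cat sat."), Document("The dog sat, the cat ran.")]
for d in docs:
    tokenizer_component(d)
vocab, vectors = token_vectors(docs)
print(vocab)    # [',', '.', 'cat', 'dog', 'ran', 'sat', 'the']
print(vectors)  # [[0, 1, 1, 0, 0, 1, 1], [1, 1, 1, 1, 1, 1, 2]]
```

Counting raw token strings is only one option; the same loop could count annotation types or attribute values (such as part-of-speech tags) instead.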
