cpDetector

Framework for configurable code page-detection of documents

cpDetector Ranking & Summary

  • License: Freeware
  • Price: FREE
  • Publisher Name: Achim Westermann
  • Operating Systems: Mac OS X
  • File Size: 1.2 MB

cpDetector Description

The name cpDetector is short for "code page detector" and has nothing to do with Java classpaths. cpDetector can be used to detect the code page of documents retrieved from remote hosts. Code page detection is needed whenever the encoding of a document is unknown, which makes it a core requirement for any application in the field of information mining or information retrieval.

Here are some key features of "cpDetector":

  • Configurable proxy that delegates to multiple, selectable code page detection implementations.
  • Code page detection implementation that parses HTML pages for the charset attribute (ANTLR based).
  • Code page detection implementation that acts as a facade for jchardet, the Java port of Mozilla's code page guessing algorithm.
  • Command-line executable (jar file) that uses charset detection to sort documents into a taxonomy tree.

NOTE: cpDetector is licensed and distributed under the terms of the Mozilla Public License 1.1 (MPL 1.1).
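The idea of a proxy that delegates to several detection strategies can be illustrated with a minimal sketch that uses only the Java standard library. This is not the cpDetector API: the class `CodepageSketch`, the `Detector` interface and the `BOM` and `HTML_META` strategies are hypothetical names chosen for illustration, and the HTML strategy uses a simple regular expression rather than an ANTLR parser.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical names throughout; this is an illustration, not the cpDetector API.
class CodepageSketch {

    /** One detection strategy; returns empty when it cannot decide. */
    interface Detector {
        Optional<Charset> detect(byte[] data);
    }

    /** Recognizes the UTF-8 and UTF-16 byte order marks. */
    static final Detector BOM = data -> {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    };

    private static final Pattern CHARSET_ATTR = Pattern.compile(
            "charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)", Pattern.CASE_INSENSITIVE);

    /** Searches the first bytes of an HTML page for a charset attribute. */
    static final Detector HTML_META = data -> {
        int len = Math.min(data.length, 4096);
        // ISO-8859-1 maps every byte to a character, so this decoding never fails.
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET_ATTR.matcher(head);
        while (m.find()) {
            try {
                return Optional.of(Charset.forName(m.group(1)));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: keep looking.
            }
        }
        return Optional.empty();
    };

    /** Asks each detector in order; returns the first answer, else the fallback. */
    static Charset detect(byte[] data, List<Detector> detectors, Charset fallback) {
        for (Detector d : detectors) {
            Optional<Charset> result = d.detect(data);
            if (result.isPresent()) {
                return result.get();
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        byte[] page = "<html><head><meta charset=\"utf-8\"></head></html>"
                .getBytes(StandardCharsets.US_ASCII);
        Charset cs = detect(page, Arrays.asList(BOM, HTML_META), StandardCharsets.ISO_8859_1);
        System.out.println(cs.name());
    }
}
```

The order of the list decides which strategy wins, so cheap and reliable checks (such as the byte order mark) would usually come first and statistical guessing last. A guessing strategy in the style of jchardet could be added as another `Detector` without changing `detect`.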
