pyuca

Python Unicode Collation Algorithm implementation

pyuca Ranking & Summary

  • License: Other/Proprietary Li...
  • Price: FREE
  • Publisher Name: James Tauber
  • Publisher web site: http://jtauber.com/pyso

pyuca Description

pyuca is a preliminary attempt at a Python implementation of the Unicode Collation Algorithm (UCA).

Developer comments

I originally posted it to my blog in 2006, but it seems to get enough usage that it really belongs here (and in PyPI).

The core of the algorithm involves multi-level comparison. For example, café comes before caff because at the primary level the accent is ignored and the first word is treated as if it were cafe. The secondary level (which considers accents) then applies only to words that are equivalent at the primary level.

The Unicode Collation Algorithm and pyuca also support contraction and expansion. Contraction is where multiple letters are treated as a single unit. In Spanish, ch is treated as a letter coming between c and d, so that, for example, words beginning with ch sort after all other words beginning with c. Expansion is where a single letter is treated as though it were multiple letters. In German, ä is sorted as if it were ae, i.e. after ad but before af.

Here is how to use the pyuca module.

    pip install pyuca

Usage example:

    from pyuca import Collator
    c = Collator("allkeys.txt")
    sorted_words = sorted(words, key=c.sort_key)

allkeys.txt (1 MB) is available at http://www.unicode.org/Public/UCA/latest/allkeys.txt
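The multi-level comparison described above can be illustrated with a small standalone sketch. This is not pyuca's implementation and it does not read allkeys.txt; it is a toy two-level key built only on the standard library's unicodedata module, and the name toy_sort_key is hypothetical. It assumes lowercase Latin input and ignores contraction, expansion, and the real UCA weight tables.

```python
import unicodedata


def toy_sort_key(word):
    """Hypothetical two-level sort key (not the real UCA).

    Primary level: base letters with accents stripped.
    Secondary level: the combining accents, consulted only on primary ties.
    """
    # NFD splits a precomposed letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", word)
    primary = tuple(ch for ch in decomposed if not unicodedata.combining(ch))
    secondary = tuple(ch for ch in decomposed if unicodedata.combining(ch))
    # Tuples compare element by element, so the secondary level only
    # matters when the primary levels are equal.
    return (primary, secondary)


words = ["caff", "caf\u00e9", "cafe"]  # "caf\u00e9" is café

# Plain code point order puts the accented word after "caff".
naive = sorted(words)

# The two-level key puts it between "cafe" and "caff".
multi_level = sorted(words, key=toy_sort_key)
```

With this key, café sorts before caff because the two words already differ at the primary level (e before f), and it sorts after cafe because those two tie at the primary level and only the accent separates them. For real data, the Collator shown in the usage example is the appropriate tool.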

