Package translate :: Package lang :: Module common :: Class Common

Class Common

object --+
         |
        Common
Known Subclasses:

This class is the common parent class for all language classes.

Instance Methods
 
__init__(self, code)
This constructor is used if we need to instantiate an object (not the usual setup).
 
__repr__(self)
Give a simple string representation without address information to be able to store it in text for comparison later.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __str__

Class Methods
 
punctranslate(cls, text)
Converts the punctuation in a string according to the rules of the language.
 
character_iter(cls, text)
Returns an iterator over the characters in text.
 
characters(cls, text)
Returns a list of characters in text.
 
word_iter(cls, text)
Returns an iterator over the words in text.
 
words(cls, text)
Returns a list of words in text.
 
sentence_iter(cls, text, strip=True)
Returns an iterator over the sentences in text.
 
sentences(cls, text, strip=True)
Returns a list of sentences in text.
 
capsstart(cls, text)
Determines whether the text starts with a capital letter.
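
The iterator/list pairs listed above suggest that each list method simply collects the output of its iterator counterpart. The following is a minimal sketch of that pattern, assuming whitespace-based word splitting and a reduced punctuation set. `WordSketch` and its behaviour are hypothetical stand-ins for illustration, not the toolkit's actual implementation.

```python
class WordSketch:
    """Hypothetical stand-in for a language class (illustration only)."""

    # Reduced punctuation set, assumed for this sketch.
    punctuation = ".,;:!?-()[]{}\"'"

    @classmethod
    def word_iter(cls, text):
        """Yield words, with surrounding punctuation stripped."""
        for word in text.split():
            word = word.strip(cls.punctuation)
            if word:
                yield word

    @classmethod
    def words(cls, text):
        """Return a list built from the iterator version."""
        return list(cls.word_iter(text))

    @classmethod
    def capsstart(cls, text):
        """Report whether the first non-punctuation character is uppercase."""
        stripped = text.lstrip().lstrip(cls.punctuation)
        return bool(stripped) and stripped[0].isupper()
```

Keeping the iterator as the primary implementation means a subclass only needs to override one method per pair to change both.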
Class Variables
  code = ''
The ISO 639 language code, possibly with a country specifier or other modifier.
  fullname = ''
The full (English) name of this language.
  nplurals = 0
The number of plural forms of this language.
  pluralequation = '0'
The plural equation for selection of plural forms.
  listseperator = u', '
This string is used to separate lists of textual elements.
  commonpunc = u'.,;:!?-@#$%^*_()[]{}/\'`"<>'
These punctuation marks are common in English and most languages that use Latin script.
  quotes = u'‘’‛“”„‟′″‴‵‶‷‹›«»'
These are different quotation marks used by various languages.
  invertedpunc = u'¿¡'
Inverted punctuation sometimes used at the beginning of sentences in Spanish, Asturian, Galician, and Catalan.
  rtlpunc = u'،؟؛÷'
These punctuation marks are used by Arabic and Persian, for example.
  CJKpunc = u'。、,;!?「」『』【】'
These punctuation marks are used in certain circumstances with CJK languages.
  indicpunc = u'।॥॰'
These punctuation marks are used by several Indic languages.
  ethiopicpunc = u'።፤፣'
These punctuation marks are used by several Ethiopic languages.
  miscpunc = u'…±°¹²³·©®×£¥€'
The middle dot (·) is used by Greek and Georgian.
  punctuation = u'.,;:!?-@#$%^*_()[]{}/\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡...
We include many types of punctuation here, because this is only meant to determine whether something is punctuation.
  sentenceend = u'.!?…։؟।。!?።'
These marks can indicate a sentence end.
  sentencere = re.compile(r'(?sx).*?[\.!\?\u2026\u0589\u061f\u09...
  puncdict = {}
A dictionary of punctuation transformation rules that can be used by punctranslate().
  ignoretests = []
List of pofilter tests for this language that must be ignored.
  checker = None
A language specific checker (see filters.checks).
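
As an illustration of how puncdict and punctranslate() might work together, here is a minimal sketch that maps each character through a dictionary of replacement rules. The function name, the sample mapping, and the plain character-by-character replacement are assumptions made for this example. The real method may apply additional rules.

```python
# Hypothetical rule set: Latin punctuation mapped to marks from rtlpunc.
ARABIC_LIKE_RULES = {
    ",": "\u060c",  # Arabic comma
    ";": "\u061b",  # Arabic semicolon
    "?": "\u061f",  # Arabic question mark
}


def punctranslate_sketch(text, puncdict):
    """Replace every character that has a rule in puncdict.

    This is a simplified, assumed behaviour: one character in, one
    replacement string out, with no context checks.
    """
    if not puncdict:
        # An empty puncdict (the default value above) leaves the text as-is.
        return text
    return "".join(puncdict.get(char, char) for char in text)
```

With the default empty dictionary the text passes through unchanged, so only languages that define rules pay for the translation.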
Properties

Inherited from object: __class__

Method Details

__init__(self, code)
(Constructor)

This constructor is used if we need to instantiate an object (not the usual setup). This will mostly happen when the factory is asked for a language for which we don't have a dedicated class.

Overrides: object.__init__

__repr__(self)
(Representation operator)

Give a simple string representation without address information to be able to store it in text for comparison later.

Overrides: object.__repr__
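
The constructor and representation described above can be sketched together. The example below assumes a hypothetical registry of dedicated classes and a hypothetical factory function that falls back to a generic instance for unknown codes. None of these names, nor the exact `__repr__` format, are taken from the toolkit.

```python
class LanguageSketch:
    """Hypothetical generic language class (illustration only)."""

    code = ""

    def __init__(self, code):
        # Only needed when an instance is created for an arbitrary code.
        self.code = code

    def __repr__(self):
        # No memory address, so the string is stable across runs and
        # can be stored in text and compared later.
        return "<%s(%r)>" % (self.__class__.__name__, self.code)


class FrenchSketch(LanguageSketch):
    """Hypothetical dedicated class for one language."""

    code = "fr"


# Hypothetical registry of languages that have a dedicated class.
_DEDICATED = {"fr": FrenchSketch}


def get_language_sketch(code):
    """Return a dedicated instance if one exists, else a generic one."""
    cls = _DEDICATED.get(code, LanguageSketch)
    return cls(code)
```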

Class Variable Details

code

The ISO 639 language code, possibly with a country specifier or other modifier.

Examples:
    km
    pt_BR
    sr_YU@Latn

Value:
''
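
The example codes above follow a language, optional country, optional modifier layout. As a rough sketch, assuming `_` separates the country and `@` separates the modifier (as in the examples), a code can be split like this. `split_code` is a hypothetical helper, not part of the class.

```python
def split_code(code):
    """Split a code such as 'sr_YU@Latn' into (language, country, modifier).

    Missing parts are returned as None. The separators are assumed from
    the examples shown above.
    """
    base, _, modifier = code.partition("@")
    language, _, country = base.partition("_")
    return language, country or None, modifier or None
```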

fullname

The full (English) name of this language.

Dialect names should have the form:
  Khmer
  Portuguese (Brazil)
  #TODO: sr_YU@Latn?

Value:
''

nplurals

The number of plural forms of this language.

0 is not a valid value; it must be overridden. Any positive integer is valid (it should probably be between 1 and 6). Also see data.py.

Value:
0

pluralequation

The plural equation for selection of plural forms.

This is used to fill in the header of PO files. See http://www.gnu.org/software/gettext/manual/html_node/gettext_150.html. Also see data.py.

Value:
'0'
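
To show how nplurals and pluralequation might be combined, here is a small sketch that formats them as a gettext `Plural-Forms` header line. The helper name is hypothetical, and the check on nplurals reflects the note above that 0 is not a valid value.

```python
def plural_forms_header(nplurals, pluralequation):
    """Format a gettext Plural-Forms header line (hypothetical helper)."""
    if nplurals < 1:
        # The class default of 0 only marks the value as "not yet set".
        raise ValueError("nplurals must be a positive integer")
    return "Plural-Forms: nplurals=%d; plural=%s;" % (nplurals, pluralequation)
```

For example, a language with two forms and the equation `(n != 1)` would produce `Plural-Forms: nplurals=2; plural=(n != 1);`.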

listseperator

This string is used to separate lists of textual elements. Most languages can probably keep the default comma, but Arabic and some Asian languages might want to override this.

Value:
u', '

punctuation

We include many types of punctuation here, because this is only meant to determine whether something is punctuation. Hopefully we catch some languages which might not be represented with modules. Most languages won't need to override this.

Value:
u'.,;:!?-@#$%^*_()[]{}/\'`"<>‘’‛“”„‟′″‴‵‶‷‹›«»¿¡،؟؛÷。、,;!?「」『』【】।॥॰።፤፣…±°¹²³·©®×£¥€'

sentenceend

These marks can indicate a sentence end. Once again we try to account for many languages. Most languages won't need to override this.

Value:
u'.!?…։؟।。!?።'

sentencere

Value:
re.compile(r'(?sx).*?[\.!\?\u2026\u0589\u061f\u0964\u3002\uff01\uff1f\u1362]\s+(?=[^a-z\d])')
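
The pattern above matches the shortest run of text that ends in a sentence-end mark followed by whitespace, provided the next character is not a lowercase ASCII letter or a digit. The sketch below shows one way such a pattern could drive sentence splitting. `split_sentences` and its handling of trailing text are assumptions for illustration, not the toolkit's sentence_iter().

```python
import re

# Same pattern as the value shown above, written on one line.
SENTENCE_RE = re.compile(
    r'(?sx).*?[\.!\?\u2026\u0589\u061f\u0964\u3002\uff01\uff1f\u1362]\s+(?=[^a-z\d])'
)


def split_sentences(text, strip=True):
    """Split text into sentences using SENTENCE_RE (hypothetical helper)."""
    sentences = []
    last_end = 0
    for match in SENTENCE_RE.finditer(text):
        last_end = match.end()
        sentence = match.group()
        sentences.append(sentence.strip() if strip else sentence)
    # Whatever follows the last match is treated as a final sentence.
    remainder = text[last_end:]
    if strip:
        remainder = remainder.strip()
    if remainder:
        sentences.append(remainder)
    return sentences
```

Because of the lookahead, a mark followed by a lowercase word does not end a sentence in this sketch, so `"How are you? fine thanks."` stays in one piece.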

checker

A language specific checker (see filters.checks).

This doesn't need to be supplied, but will be used if it exists.

Value:
None