Tuesday, July 15, 2008

Class Sequence

Class ‘Sequence’ is the result of an effort to create a generic class of object that encapsulates most of the common properties that biological information holding sequences contain. Hmmm… now what could those be? Obviously, one needs to store the actual sequence. It might also prove useful to define the alphabet (e.g. A, T, G and C for DNA) that makes up the sequence as well as their properties, such as, molecular weight of the nucleotide or amino acid, etc. Another important property is the number of alphabets/letters that are read together (i.e. the word size) to derive meaning from the sequence. As you’ve probably already guessed, the word size in an ORF would be 3 (i.e. the size of a codon). We’ve also included a few artificial but useful properties like name, locus and any interesting references that you might want to associate with the sequence.

The class also contains methods to manipulate the sequence and other properties. Methods analogous to Python string operators are addition and array-like indexing of the sequence. The binary ‘+’ operator behaves exactly as one would expect! The addition of two Sequence objects results in the formation of a new Sequence object that contains the end-to-end ligation of two sequences that the operator acts upon. The unary ‘[n]’ (index) operator returns the nth letter in the sequence. There are also a set of methods whose names speak for themselves! ‘getname’, ‘getseq’, ‘setname’ and ‘setseq’ return the name/sequence or alter the name/sequence respectively. ‘locus’ and ‘ref’ are methods that alter the locus or references.

Two important methods are ‘find’ and ‘fragment’. ‘find’ searches for sub-sequence or ‘motif’ in the sequence and returns the position of the first instance (if it finds one). ‘fragment’ returns a fragment of the Sequence according to the start and end positions supplied by the programmer. These two methods are at the heart of the GenePython ideology. For example, the RNAPolymerase Virtual Enzyme uses the ‘find’ method to locate important signal sequences and then returns the ‘fragment’ that lies between the TATA box and poly-adenylation signal.

Next time, we’ll look at some actual coding examples that will illustrate how class Sequence works.

Wednesday, July 2, 2008

Getting started with GenePython...

Before using GenePython, you should install the latest, stable version of Python (www.python.org/download) which currently happens to be version 2.5.2 (do not download any other version) on your computer. Just run the installer and... thats it! You are now ready to use Python. Check out the help and tutorial files for a detailed introduction to the language... or just read on!

Python is an object-oriented scripting language and as soon as you enter a command the Python interpreter responds. The term ‘object’ refers to an entity that not only contains data, but also contains instructions (called methods or functions) on how to manipulate that data. The word ‘class’ essentially refers to the ‘type’ of the object or, for those with a more philosophical bent of mind, the abstract qualities that define an object!. Let me give a real world analogy to make things clearer… your shiny new car, assuming you are very lucky to own one, is an object of class ‘Ferrari’, or maybe class ‘BMW’ or maybe even class ‘Fiat’.

All car objects present you, the driver, with interfaces to manipulate them in a specific way. The steering-wheel, brake pedal and door-handle are some such interfaces that offer a simple way of doing pretty complicated things. For instance, if you turn the steering wheel to the right, the car turns (oddly enough) to the right! Unbeknownst to you, turning the steering wheel right actually ends up turning a shaft with a geared end, which, in turn, displaces a ratchet which in turn moves another thing, and another…., that finally turns the wheels which in turn transmit the frictional forces between them and the road to the rest of the car that ends up turning (like I said, oddly enough!). The great part about being a driver is that you never needed to know all this! You just turn the wheel and the car turns… simple! As in the real world, well-designed objects in the programming world offer a simple and easy to use interface to ‘drivers’ so that they can get on with their driving.

You may have noticed that regardless of the classes of car, there are a set of common interfaces with brake-pedal, steering-wheel etc. One can imagine that each particular class of car has ‘inherited’ some of its common properties from a parent class… class ‘Car’! So, in object-oriented programming, if one intends to create a lot of classes which have common properties, all one need do is creating one ‘parent’ class which encapsulates all those common properties. Then one just creates different classes, each inheriting these common properties from the parent class! This saves you a whole lot of typing and helps you organize your thoughts and you’re coding. Of course, one can tweak the properties of each different daughter class as required. For instance, all cars have steering-wheels, but some have power steering while others don’t.

Python offers a set of standard classes so that programmers might while away their time productively. These classes are used to store and manipulate commonly required data types such as numbers, strings of characters, ordered lists of stuff etc. So for instance, in Python, the number ‘5’ is an object of type ‘int’ (integer), the word ‘five’ is an object of type ‘str’ (string of characters) and one could, if one wished, create a ‘list’ object in which the first element could be ‘5’, second element ‘five’ and so on. Most useful code consists of creating ‘variables’ of these classes that change their values according to your rules.

GenePython offers an additional set of classes that drive biologists, bioinformaticians or anyone interested to make sense out of biological information such as sequences or strings. In this context, the concept of inheritance is used extensively in GenePython. For instance, classes representing information-holding sequences like DNA, RNA and protein, inherit from the parent class ‘Sequence’ that holds all the common properties that one would like any information-holding sequence to have, something we’ll deal with the next time.

Labels: , , ,