Saturday, August 23, 2008

Coding with Class Sequence

Ok… time to get down to business! This time, we’re going to see how to code with class Sequence. To use class Sequence in Python, download the source code file here and remember to import it into your Python source file or interactive session. The following examples assume that you're working at the prompt of an interactive Python session, and they use Python 2 syntax (in Python 3, write print(my_sequence) instead of print my_sequence). Input (what you type in...!) and output (what Python throws back at you) have been coloured blue and green respectively to help distinguish them from the post's text. Python keywords are coloured orange.


First, let’s create a Sequence object. To do this, type the following at the Python command prompt (‘>>>’):

>>> my_sequence = Sequence(name = 'Sequence 1', seq = 'This is a sequence')

Notice that the name of the sequence and the actual sequence are written inside quotes (you could use either single or double quotes). Anything written inside quotes is taken by the Python interpreter to be a string.
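To see that the choice of quote character makes no difference, here is a small plain-Python check (nothing here depends on class Sequence). The variable names are just for illustration:

```python
# Single and double quotes build exactly the same kind of object: a str.
a = 'This is a sequence'
b = "This is a sequence"
print(a == b)           # True: same characters, same string
print(type(a) is str)   # True

# Double quotes are handy when the text itself contains an apostrophe.
c = "Sequence 1's description"
print(c)
```

Either style is fine; the usual habit is to pick one and stick with it, switching only when the string contains that quote character.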
Now, type

>>> print my_sequence

The Python interpreter should print the following:

>Sequence 1
THIS IS A SEQUENCE
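The downloadable source isn't reproduced in this post, so the following is only a hypothetical, minimal stand-in that produces the same printed output. It assumes the sequence is converted to upper case when the object is created (the real class might do this only when printing) and that the attribute names `name` and `seq` are used internally; both are assumptions, not the original code.

```python
class Sequence(object):
    """Hypothetical minimal stand-in for class Sequence (not the original file)."""

    def __init__(self, name, seq):
        self.name = name
        # Assumption: the sequence is stored in upper case, matching the output above.
        self.seq = seq.upper()

    def __str__(self):
        # FASTA-like layout: '>' plus the name on one line, the sequence on the next.
        return '>' + self.name + '\n' + self.seq


my_sequence = Sequence(name='Sequence 1', seq='This is a sequence')
print(my_sequence)
```

The `print` statement (or `print()` in Python 3) calls `__str__` behind the scenes, which is why defining that one method is enough to control how the object looks.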


Ok… so it works! Now let's try to add two Sequence objects…

>>> another_sequence = Sequence(name = 'Sequence 2', seq = ' of english characters')
>>> joined_sequences = my_sequence + another_sequence
>>> print joined_sequences
>Sequence 1+Sequence 2
THIS IS A SEQUENCE OF ENGLISH CHARACTERS


Note that the variable ‘joined_sequences’ is also a Sequence object. Adding two Sequence objects creates a new Sequence object whose name records that it is the sum of two Sequences (the two names joined by a ‘+’). You may then change the name of a Sequence object if you wish:

>>> joined_sequences.setname('New Name')
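In Python, the `+` operator on your own objects is handled by a special method called `__add__`. The sketch below is a hypothetical stand-in (not the downloadable class) showing one way addition and `setname` could be written so that they reproduce the behaviour above; the internal attributes `name` and `seq` are assumptions.

```python
class Sequence(object):
    """Hypothetical stand-in for class Sequence, showing addition and renaming."""

    def __init__(self, name, seq):
        self.name = name
        self.seq = seq.upper()

    def __add__(self, other):
        # Build a brand-new object; neither operand is modified.
        return Sequence(name=self.name + '+' + other.name,
                        seq=self.seq + other.seq)

    def setname(self, new_name):
        self.name = new_name

    def __str__(self):
        return '>' + self.name + '\n' + self.seq


my_sequence = Sequence(name='Sequence 1', seq='This is a sequence')
another_sequence = Sequence(name='Sequence 2', seq=' of english characters')
joined_sequences = my_sequence + another_sequence
print(joined_sequences)
joined_sequences.setname('New Name')
```

Returning a new object from `__add__`, rather than changing `self`, is what keeps `my_sequence` and `another_sequence` unchanged after the addition.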

One can also obtain the name or the sequence contained within a Sequence object:

>>> name = joined_sequences.getname()
>>> seq = joined_sequences.getseq()
>>> type(name)
<type 'str'>


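Accessor methods like these are usually just small functions that hand back the stored values. Here is a hypothetical sketch (not the original source; the attributes `name` and `seq` are assumed) of how `getname`, `getseq` and `setname` might look:

```python
class Sequence(object):
    """Hypothetical stand-in for class Sequence: just the accessor methods."""

    def __init__(self, name, seq):
        self.name = name
        self.seq = seq.upper()

    def getname(self):
        return self.name

    def getseq(self):
        return self.seq

    def setname(self, new_name):
        self.name = new_name


s = Sequence(name='Sequence 1', seq='This is a sequence')
s.setname('New Name')
name = s.getname()
seq = s.getseq()
print(type(name))    # <class 'str'> in Python 3 (<type 'str'> in Python 2)
print(seq.lower())   # the result is a plain str, so str methods work on it
```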
Notice that the name and sequence are themselves just Python strings. If you want to get just the 6th letter in the sequence, type:

>>> sixth_char = joined_sequences[5]
>>> print sixth_char
I
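Square-bracket access such as `joined_sequences[5]` works on your own objects if the class defines the special method `__getitem__`. A hypothetical sketch (not the original code) that simply hands the index on to the stored string, so it inherits Python's zero-based numbering:

```python
class Sequence(object):
    """Hypothetical stand-in for class Sequence: just enough for indexing."""

    def __init__(self, name, seq):
        self.name = name
        self.seq = seq.upper()

    def __getitem__(self, index):
        # Delegate to the underlying string, so positions start at 0,
        # negative indices count from the end, and slices also work.
        return self.seq[index]


s = Sequence(name='New Name', seq='This is a sequence of english characters')
print(s[0])     # T  (the first character is number 0)
print(s[5])     # I  (the sixth character)
print(s[0:4])   # THIS
```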

Note that to access the sixth letter we used ‘joined_sequences[5]’. That’s because Python numbers the characters of a string starting from zero! We can also search for the first ‘I’ in joined_sequences:

>>> position_first_I = joined_sequences.find(motif = 'I')
>>> print position_first_I.start()
2
>>> print position_first_I.end()
3
>>> print position_first_I.span()
(2, 3)
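The `start()`, `end()` and `span()` methods are exactly those of a match object from Python's `re` module, which suggests (but does not prove) that `find` is built on a regular-expression search. Under that assumption, a hypothetical sketch could look like this; `re.escape` is used so the motif is matched literally:

```python
import re


class Sequence(object):
    """Hypothetical stand-in for class Sequence with a find() method."""

    def __init__(self, name, seq):
        self.name = name
        self.seq = seq.upper()

    def find(self, motif):
        # Assumption: find() returns a re match object (start(), end(), span()),
        # or None when the motif does not occur. re.escape stops characters
        # such as '.' from acting as wildcards.
        return re.search(re.escape(motif), self.seq)


s = Sequence(name='New Name', seq='This is a sequence of english characters')
first_i = s.find(motif='I')
print(first_i.span())      # (2, 3): end() is one past the last matched character
print(s.find(motif='.'))   # None: '.' is matched literally and never occurs
```

Because a failed search gives `None`, code calling this sketch should check the result before calling `start()` on it.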

The ‘find’ method can locate not only single characters but entire substrings!

>>> pos_subs = joined_sequences.find(motif = 'SEQUENCE')
>>> print pos_subs.span()
(10, 18)

Finally, the ‘fragment’ method, which cuts out the characters from position my_start up to, but not including, position my_stop:

>>> fragment = joined_sequences.fragment(my_start = 10, my_stop = 18)
>>> print fragment
>New Name(10,18)
SEQUENCE
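A hypothetical sketch of how `fragment` might be written (not the downloadable source; attribute names are assumed). It relies on Python's slicing, which uses the same start-inclusive, stop-exclusive convention as `span()`:

```python
class Sequence(object):
    """Hypothetical stand-in for class Sequence with a fragment() method."""

    def __init__(self, name, seq):
        self.name = name
        self.seq = seq.upper()

    def fragment(self, my_start, my_stop):
        # Slice with Python's half-open convention: my_stop itself is excluded,
        # so positions (10, 18) pick out exactly the 8 letters of 'SEQUENCE'.
        new_name = '%s(%d,%d)' % (self.name, my_start, my_stop)
        return Sequence(name=new_name, seq=self.seq[my_start:my_stop])

    def __str__(self):
        return '>' + self.name + '\n' + self.seq


s = Sequence(name='New Name', seq='This is a sequence of english characters')
fragment = s.fragment(my_start=10, my_stop=18)
print(fragment)
```

Since the two conventions agree, the numbers reported by `span()` for a motif can be passed straight to `fragment` to cut that motif out.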

Next time we’ll see how the more biologically relevant classes DNA, preMRNA, mRNA and Protein work.