Exercises: if
Warning
Do not forget the colon : after the if, else, etc.
Writing an if without the colon, e.g.:
>>> answer = raw_input("do you like yellow? ")
>>> if answer == "yes"
is an error. As soon as I enter the last line of code, Python gets upset:
File "<stdin>", line 1
if answer == "yes"
^
SyntaxError: invalid syntax
and refuses to execute the code.
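For comparison, here is a minimal sketch of the same test with the colon in place. To keep it runnable without typing anything, the answer is stored directly in a variable instead of being read with raw_input() (an assumption made only for this illustration), and print is called with parentheses so that the snippet behaves the same under Python 2 and Python 3.

```python
# Stand-in for: answer = raw_input("do you like yellow? ")
answer = "yes"

# The colon closes the `if` line and opens the indented block below it.
if answer == "yes":
    message = "you like yellow"
else:
    message = "you do not like yellow"

print(message)
```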
Warning
Watch out for wrong indentation levels!
In Python, wrong indentation means wrong code.
The code may still “run”, but it may compute the wrong thing.
In some cases, it is easy to spot what’s wrong. For instance, here Python immediately raises an error:
>>> answer = raw_input("do you like yellow? ")
>>> if answer == "yes":
>>>     print "you said:"
>>>         print "yes"
File "<stdin>", line 4
print "yes"
^
IndentationError: unexpected indent
In other cases the error can be much more subtle and difficult to find. See below the section on nested statements.
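A minimal sketch of the subtle case: the two hypothetical functions below contain exactly the same statements and differ only in the indentation of the last append. Both run without errors, but they disagree whenever the answer is not "yes". The function names, and the use of a list instead of print, are choices made for this illustration only.

```python
def report_indented(answer):
    lines = []
    if answer == "yes":
        lines.append("you said:")
        lines.append("yes")      # inside the if: runs only for "yes"
    return lines


def report_dedented(answer):
    lines = []
    if answer == "yes":
        lines.append("you said:")
    lines.append("yes")          # outside the if: runs for any answer
    return lines


print(report_indented("no"))
print(report_dedented("no"))
```

Python accepts both versions, so the only way to catch the second one is to check the output against what you expected.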
Ask the user for a number (with raw_input()). If the number is even, print "even"; print "odd" otherwise.
Hint. raw_input() always returns a string.
Ask the user for a float. If the number is in the interval [-1, 1], print "okay". Do not print anything otherwise.
Hint. Are elif/else necessary in this case?
Ask the user for two integers. If the first one is larger than the second one, print "first". If the second is larger than the first, print "second". Otherwise, print "neither".
Given the dictionary:
horoscope_of = { "January": "extreme luck", "February": "try to be born again", "March": "kissed by fortune", "April": "lucky luke", }
ask the user for their birth month. If the month appears (as a key) in the dictionary, print the corresponding horoscope. Otherwise, print "not available".
Ask the user for the path to an existing file, and read the contents using readlines(). Then print:
- If the file is empty, the string "empty".
- If the file has fewer than 100 lines, "short", as well as the number of lines.
- If the file has between 100 and 1000 lines, "average" and the number of lines.
- Otherwise, print "large" and the number of lines.
The message must be printed on a single line.
Using two calls to raw_input(), ask the user for two triples of floats. The two triples represent 3D coordinates: x, y, z.
If all coordinates are non-negative, print the Euclidean distance between the two points. Do not print anything otherwise.
Hint: the Euclidean distance is given by \(\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}\)
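The formula in the hint can be sketched as follows. This is one possible approach, not the reference solution: the function name is hypothetical, the two strings stand in for what raw_input() would return, and the coordinates are assumed to be separated by spaces. The code runs under both Python 2 and Python 3.

```python
import math


def distance_if_non_negative(point_1, point_2):
    """Return the Euclidean distance, or None if any coordinate is negative."""
    x1, y1, z1 = point_1
    x2, y2, z2 = point_2
    if min(x1, y1, z1, x2, y2, z2) >= 0:
        return math.sqrt((x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2)
    return None


# Stand-ins for the two strings returned by raw_input()
first_answer = "0 0 0"
second_answer = "1 2 2"

# raw_input() returns a string, so each word must be converted to a float
point_1 = [float(word) for word in first_answer.split()]
point_2 = [float(word) for word in second_answer.split()]

distance = distance_if_non_negative(point_1, point_2)
if distance is not None:
    print(distance)
```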
Read this code:
number = int(raw_input("write a number: "))
if number % 3 == 0:
    print "divisible by 3"
elif number % 3 != 0:
    print "not divisible by 3"
else:
    print "dunno"
Can it actually print "dunno"?
Read this code:
number = int(raw_input("write a number: "))
if number % 2 == 0:
    print "divisible by 2"
if number % 3 == 0:
    print "divisible by 3"
if number % 2 != 0 and number % 3 != 0:
    print "dunno"
Can it actually print "dunno"?
Ask the user whether they want to perform a "sum" or a "product".
If the user asks for a "sum", ask for two numbers, sum them, and print the result.
Otherwise, if the user asks for a "product", ask for two numbers, multiply them, and print the result.
If the user replies neither "sum" nor "product", do nothing.
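The last exercise relies on an if/elif chain, in which at most one branch is executed. A minimal sketch under stated assumptions: the hypothetical function compute() receives the operation and the two numbers as arguments instead of reading them with raw_input(), and returns None when there is nothing to do. It runs under both Python 2 and Python 3.

```python
def compute(operation, first, second):
    if operation == "sum":
        return first + second
    elif operation == "product":
        # reached only when the first test failed
        return first * second
    # neither "sum" nor "product": do nothing
    return None


result = compute("product", 2.0, 3.0)
if result is not None:
    print(result)
```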
Exercises: for and while
Write a for cycle to perform the following tasks:
- Print to screen the elements of range(10), one per row.
- Print to screen the square of the elements of range(10), one per row.
- Print to screen the sum of squares of range(10).
- Print to screen the product of the elements of range(1,10).
Given the dictionary:
volume_of = { "A": 67.0, "C": 86.0, "D": 91.0, "E": 109.0, "F": 135.0, "G": 48.0, "H": 118.0, "I": 124.0, "K": 135.0, "L": 124.0, "M": 124.0, "N": 96.0, "P": 90.0, "Q": 114.0, "R": 148.0, "S": 73.0, "T": 93.0, "V": 105.0, "W": 163.0, "Y": 141.0, }
containing the volume of each amino acid, print to screen the sum of all the values.
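The sum and product tasks above all follow the same accumulator pattern: create a variable before the cycle and update it at each step. A minimal sketch, using only the first three entries of volume_of so that the full exercise is left to the reader; it runs under both Python 2 and Python 3.

```python
# First three entries of the dictionary only, for illustration
volume_of = {"A": 67.0, "C": 86.0, "D": 91.0}

total = 0.0                      # accumulator, created before the cycle
for amino_acid in volume_of:     # iterating on a dictionary yields its keys
    total = total + volume_of[amino_acid]

print(total)
```

For a product the accumulator would start from 1 rather than 0.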
Given the dictionary:
volume_of = { "A": 67.0, "C": 86.0, "D": 91.0, "E": 109.0, "F": 135.0, "G": 48.0, "H": 118.0, "I": 124.0, "K": 135.0, "L": 124.0, "M": 124.0, "N": 96.0, "P": 90.0, "Q": 114.0, "R": 148.0, "S": 73.0, "T": 93.0, "V": 105.0, "W": 163.0, "Y": 141.0, }
containing the volume of each amino acid, and the FASTA string:
fasta = """>1BA4:A|PDBID|CHAIN|SEQUENCE
DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV"""
print to screen the total volume of the amino acids of the protein sequence.
Hint. First, you should extract the amino acid sequence from fasta, then, for each character of the sequence (for character in sequence) get from the dictionary the corresponding volume and add it to the total.
Find the minimum value of the list [1, 25, 6, 27, 57, 12].
Hint. See the previous example about finding the maximum of a list, and adapt it to the new logic (auxiliary variable minimum_so_far).
Find both the maximum and the minimum of the list [1, 25, 6, 27, 57, 12].
Hint. You should create two auxiliary variables: maximum_so_far and minimum_so_far.
Given the nucleotide sequence:
sequence = "ATGGCGCCCGAACAGGGA"
compute the list of all its codons (0 offset for reading frame). The solution should be:
["ATG", "GCG", "CCC", "GAA", "CAG", "GGA"]
Hint: you should iterate on the result of range(0, len(sequence), 3) and add at each step the sequence of a codon to a previously created empty list.
Given the text (in FASTA format):
text = """>2HMI:A|PDBID|CHAIN|SEQUENCE
PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI
>2HMI:B|PDBID|CHAIN|SEQUENCE
PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI
>2HMI:C|PDBID|CHAIN|SEQUENCE
DIQMTQTTSSLSASLGDRVTISCSASQDISSYLNWYQQKPEGTVKLLIYY
>2HMI:D|PDBID|CHAIN|SEQUENCE
QITLKESGPGIVQPSQPFRLTCTFSGFSLSTSGIGVTWIRQPSGKGLEWL
>2HMI:E|PDBID|CHAIN|SEQUENCE
ATGGCGCCCGAACAGGGAC
>2HMI:F|PDBID|CHAIN|SEQUENCE
GTCCCTGTTCGGGCGCCA"""
build the dictionary sequence_of, having as keys the names of the sequences (the first will be 2HMI:A, the second 2HMI:B, and so on), and as values the corresponding sequences.
The result should resemble this:
sequence_of = {
    "2HMI:A": "PISPIETVPVKLKPGMDGPKVKQW...",
    "2HMI:B": "PISPIETVPVKLKPGMDGPKVKQW...",
    # ...
}
Hint. You should first split text into lines. Next, you should iterate on the lines: if the line is a header, you should save the name of the sequence; otherwise, you should update the dictionary with the name you got from the previous line, and the sequence you have in the current line.
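The steps in the hint can be sketched as follows. This is one possible approach under stated assumptions: the function name parse_fasta is hypothetical, the name of a sequence is taken to be whatever sits between ">" and the first "|", and only the last two records of the text are used here. The code runs under both Python 2 and Python 3.

```python
def parse_fasta(text):
    sequence_of = {}
    name = None
    for line in text.split("\n"):
        line = line.strip()
        if len(line) == 0:
            continue                      # skip empty lines
        if line.startswith(">"):
            # header: ">2HMI:E|PDBID|CHAIN|SEQUENCE" becomes "2HMI:E"
            name = line[1:].split("|")[0]
        else:
            # sequence line: store it under the name saved from the header
            sequence_of[name] = line
    return sequence_of


text = """>2HMI:E|PDBID|CHAIN|SEQUENCE
ATGGCGCCCGAACAGGGAC
>2HMI:F|PDBID|CHAIN|SEQUENCE
GTCCCTGTTCGGGCGCCA"""

sequence_of = parse_fasta(text)
for name in sorted(sequence_of):
    print(name + " " + sequence_of[name])
```

This sketch assumes each sequence fits on a single line, as in the text above; a sequence split over several lines would need its pieces concatenated.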
Write a while cycle performing the following task:
- keep asking the user to write "STOP". If the user writes "STOP" (in upper case) the cycle terminates, otherwise it prints "you must write 'STOP'..." and continues.
- as before, but the cycle terminates also if the user writes "stop" in lower case.
What is printed to screen when executing the following code?
for number in range(10):
    print "processing the element", number

for number in range(10):
    print "processing the element", number
    break

for number in range(10):
    print "processing the element", number
    continue

for number in range(10):
    print number
    if number % 2 == 0:
        break

for number in range(10):
    if number % 2 == 0:
        break
    print number

condition = False
while condition:
    print "the condition is true"

condition = False
while condition:
    print "the condition is true"
    condition = True

condition = True
while condition:
    print "the condition is true"

numbers = range(10)
i = 0
while i < len(numbers):
    print "position", i, "contains the element", numbers[i]

lines = [
    "line 1",
    "line 2",
    "line 3",
    "",
    "line 5",
    "line 6",
]
for line in lines:
    line = line.strip()
    if len(line) == 0:
        break
    else:
        print "I read:", line
Given the tuple:
numbers = (0, 1, 1, 0, 0, 0, 1, 1, 2, 1, 2)
write a cycle that iterates on numbers, stopping as soon as it finds the value 2 and printing to screen its position.
Given the tuple:
strings = ("000", "51", "51", "32", "57", "26")
write a cycle that iterates on strings, stopping as soon as it finds a string containing the character "2", and printing to screen the position and value of the string that stopped the cycle.
The solution should be: position 3 (counting from 0), value "32".
Adapted from example 2.4 in [book2]: create a random nucleotide sequence, with length defined by the user, and print it to screen.
Hint. The random module allows you to create random numbers. It provides a number of tools to manage random objects. For example, the randint(i,j) function generates an integer between i and j (both included) with equal probabilities, for example:
import random
index = random.randint(0,3) #index can be used to select the nucleotide at each position
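Building on the hint, a minimal sketch of how the random index can be turned into a sequence. The function name random_sequence and the fixed length 12 (standing in for the value typed by the user) are assumptions made for this illustration; the code runs under both Python 2 and Python 3.

```python
import random


def random_sequence(length):
    nucleotides = "ACGT"
    sequence = ""
    for i in range(length):
        index = random.randint(0, 3)     # 0, 1, 2 or 3, equally likely
        sequence = sequence + nucleotides[index]
    return sequence


# 12 stands in for int(raw_input("sequence length: "))
sequence = random_sequence(12)
print(sequence)
```

The output changes at every run, since no seed is set.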
Exercises: nested statements
Given the matrix:
n = 5
matrix = [range(n) for i in range(n)]
write a double for cycle printing to screen all the elements of matrix, one per line.
Given the matrix:
n = 5
matrix = [range(n) for i in range(n)]
what do the following fragments of code print?

for row in matrix:
    for element in row:
        print element

sum = 0
for row in matrix:
    for element in row:
        sum = sum + element
print sum

for i in range(len(matrix)):
    row = matrix[i]
    for j in range(len(row)):
        element = row[j]
        print element

for i in range(len(matrix)):
    for j in range(len(matrix[i])):
        print matrix[i][j]

dunno = []
for i in range(len(matrix)):
    for j in range(len(matrix[i])):
        if i == j:
            dunno.append(matrix[i][j])
print " ".join([str(x) for x in dunno])
Given the list:
numbers = [8, 3, 2, 9, 7, 1, 8]
write a double for cycle printing to screen all the pairs of elements of numbers.
Modify the solution of the last exercise so that, if the pair (i,j) has been already printed, then the symmetric pair (j,i) is not printed.
Hint. See the example above.
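One way to avoid symmetric pairs is to start the inner cycle from the current position of the outer one. A minimal sketch, assuming that (i,j) refers to positions in the list and that a pair of an element with itself is printed once; the function name is hypothetical and only the first three elements of numbers are used. It runs under both Python 2 and Python 3.

```python
def pairs_without_symmetric(items):
    pairs = []
    for i in range(len(items)):
        # j starts at i, so positions (j, i) with j < i are never visited
        for j in range(i, len(items)):
            pairs.append((items[i], items[j]))
    return pairs


for pair in pairs_without_symmetric([8, 3, 2]):
    print("%d %d" % pair)
```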
Do the same as in the last exercise with the following list:
strings = ["I", "am", "a", "list"]
Given the list:
numbers = range(10)
write a double for cycle printing to screen only the pairs of elements of numbers where the second element of the pair is twice the first.
The result will be:
0 0
1 2
2 4
...
Given the list:
numbers = [8, 3, 2, 9, 7, 1, 8]
write a double for cycle iterating on all element pairs of numbers and printing to screen the pairs whose sum is 10.
(Printing “repetitions” such as 8 + 2 and 2 + 8 is allowed.)
The result will be:
8 2
3 7
2 8
9 1
Hint. There is an example showing how to iterate on all pairs of elements of a list. It is sufficient to modify this example.
As before, but instead of printing to screen, store the pairs of elements whose sum is 10 in a list list_of_pairs.
The result will be:
>>> list_of_pairs
[(8, 2), (3, 7), (2, 8), (9, 1)]
Given the lists:
number_1 = [5, 9, 4, 4, 9, 2]
number_2 = [7, 9, 6, 2]
write a double for cycle iterating on the two lists and printing to screen values and positions of all the elements of number_1 appearing also in number_2.
The result will be:
positions: 1, 1; repeated value: 9
positions: 4, 1; repeated value: 9
positions: 5, 3; repeated value: 2
As before, but instead of printing to screen, store positions and value in a list of triplets like this: (position_1, position_2, repeated_value).
Given the matrix:
n = 5
matrix = [range(n) for i in range(n)]
write a double for cycle finding the largest element.
Hint. It is sufficient to adapt the code that finds the maximum-minimum of a list (with one dimension) to a matrix (with two dimensions).
Given the list of nucleotide sequences:
sequences = [ "ATGGCGCCCGAACAGGGA", "GTCCCTGTTCGGGCGCCA", ]
we want to obtain a list containing, for each sequence in sequences, the list of its triplets.
Hint. You can re-use a previous exercise.
Given the list:
numbers = [5, 9, 4, 4, 9, 2]
write code that counts the number of occurrences of each element and stores the result in a dictionary, similar to this:
num_occurrences = { 5: 1, 9: 2, 4: 2, 2: 1, }
Hint. You can modify one of the previous examples so that, instead of saving the positions of the occurrences, it increases the number of occurrences in num_occurrences.
Hint. Note that if the key 5 is not in the dictionary, we cannot execute num_occurrences[5] += 1, since num_occurrences[5] doesn’t exist. See the example about reading a FASTA file.
Given a list of gene clusters (lists), for example:
groups = [["gene1", "gene2"], ["gene3"], [], ["gene4", "gene5"]]
write a single cycle finding the biggest group and storing it in a variable biggest_group_so_far.
Hint: this task is similar to finding the minimum/maximum in a list of integers, but the auxiliary variable should contain the longest list found so far.
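Going back to the hint about missing keys in the counting exercise above: a minimal sketch of one way to handle them is to test for the key and create it before incrementing. The function name count_occurrences is hypothetical; the code runs under both Python 2 and Python 3.

```python
def count_occurrences(elements):
    num_occurrences = {}
    for element in elements:
        if element not in num_occurrences:
            # first time this element is seen: create the key,
            # otherwise the += below would raise a KeyError
            num_occurrences[element] = 0
        num_occurrences[element] += 1
    return num_occurrences


num_occurrences = count_occurrences([5, 9, 4, 4, 9, 2])
print(num_occurrences)
```

The same pattern applies whenever a dictionary is filled incrementally, for instance when building a histogram of the characters of a sequence.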
Given the dictionary of sequences:
sequences_2HMI = { "A": "PISPIETVPVKLKPGMDGPKVKQWPLTEEKI", "B": "PISPIETVPVKLKPGMDGPKVKQWPLTEEKI", "C": "DIQMTQTTSSLSASLGDRVTISCSASQDISS", "D": "QITLKESGPGIVQPSQPFRLTCTFSGFSLST", "E": "ATGGCGCCCGAACAGGGAC", "F": "GTCCCTGTTCGGGCGCCA", }
write a for cycle that (iterating on all the key-value pairs of the dictionary) builds a dictionary of histograms (that is, dictionaries mapping amino acids to their number of occurrences), one for each element of sequences_2HMI.
Hint. Calculating a histogram also requires a for cycle: therefore you should expect to code two nested for cycles.
The result (a dictionary of dictionaries) should be like this:
histograms = {
    "A": {
        "P": 6,
        "I": 3,
        "S": 1,
        #...
    },
    "B": {
        "P": 6,
        "I": 3,
        "S": 1,
        #...
    },
    #...
    "F": {
        "A": 1,
        "C": 7,
        "G": 6,
        "T": 4,
    }
}
Given the list of strings:
table = [ "protein domain start end", "YNL275W PF00955 236 498", "YHR065C SM00490 335 416", "YKL053C-A PF05254 5 72", "YOR349W PANTHER 353 414", ]
write code that takes the column names from the first row of table and:
- for each row creates a dictionary like this:
dictionary = { "protein": "YNL275W", "domain": "PF00955", "start": "236", "end": "498" }
- appends the dictionary to a list.
Given:
alphabet_lo = "abcdefghijklmnopqrstuvwxyz"
alphabet_up = alphabet_lo.upper()
write a cycle (for or while) that, starting from an empty dictionary, inserts all the key-value pairs:
"a": "A", "b": "B", ...
in other words, the dictionary maps from the i-th character of alphabet_lo to the i-th character of alphabet_up.
Next, use the dictionary to implement a for cycle that, given an arbitrary string, for example:
string = "I am a string"
returns the same result as string.upper().
Write a module that asks the user for the paths to two text files, and prints to screen the rows of the two files, one by one, next to each other: the rows of the first file should be printed on the left, the rows of the second on the right.
If the first file contains:
first row
second row
and the second:
ACTG
GCTA
the result will be:
first row ACTG
second row GCTA
Hint. Note that the two files could be of different length. In that case (optionally) missing lines should be printed as if they were empty lines.
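The handling of files of different length can be sketched as follows. This is an illustration under stated assumptions: the hypothetical function side_by_side works on two lists of already stripped lines (as one would obtain from readlines() followed by strip()), and it joins the two halves with a single space. It runs under both Python 2 and Python 3.

```python
def side_by_side(left_lines, right_lines):
    num_rows = max(len(left_lines), len(right_lines))
    rows = []
    for i in range(num_rows):
        # a missing line is treated as an empty line
        left = ""
        if i < len(left_lines):
            left = left_lines[i]
        right = ""
        if i < len(right_lines):
            right = right_lines[i]
        rows.append(left + " " + right)
    return rows


for row in side_by_side(["first row", "second row"], ["ACTG", "GCTA"]):
    print(row)
```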
Write a module that, given the file data/dna-fasta/fasta.1:
- Reads the contents of the FASTA file into a dictionary.
- Calculates how many times each nucleotide appears in each sequence.
- Calculates the GC-content of each sequence.
- Calculates the AT/GC-ratio of each sequence.
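The last two quantities can be sketched as follows, assuming the usual definitions: the GC-content is the fraction of nucleotides that are G or C, and the AT/GC-ratio is the number of A and T divided by the number of G and C. The function names are hypothetical, and the example sequence is the one used in the codon exercise rather than the contents of the file. The code runs under both Python 2 and Python 3.

```python
def gc_content(sequence):
    num_gc = sequence.count("G") + sequence.count("C")
    # float() avoids integer division under Python 2
    return float(num_gc) / len(sequence)


def at_gc_ratio(sequence):
    num_at = sequence.count("A") + sequence.count("T")
    num_gc = sequence.count("G") + sequence.count("C")
    return float(num_at) / num_gc


sequence = "ATGGCGCCCGAACAGGGA"
print(gc_content(sequence))
print(at_gc_ratio(sequence))
```

Both functions divide by a count, so an empty sequence (or, for the ratio, one without G or C) would raise a ZeroDivisionError and needs to be handled separately.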
Given the genetic code, provided as a dictionary, write a program that reads an RNA sequence from a FASTA file as a string, translates the sequence in each possible reading frame and prints the translations to screen:
codon_table = { 'GCU':'A','GCC':'A','GCA':'A','GCG':'A','CGU':'R', 'CGC':'R','CGA':'R','CGG':'R','AGA':'R','AGG':'R', 'UCU':'S','UCC':'S','UCA':'S','UCG':'S','AGU':'S', 'AGC':'S','AUU':'I','AUC':'I','AUA':'I','AUU':'I', 'AUC':'I','AUA':'I','UUA':'L','UUG':'L','CUU':'L', 'CUC':'L','CUA':'L','CUG':'L','GGU':'G','GGC':'G', 'GGA':'G','GGG':'G','GUU':'V','GUC':'V','GUA':'V', 'GUG':'V','ACU':'T','ACC':'T','ACA':'T','ACG':'T', 'CCU':'P','CCC':'P','CCA':'P','CCG':'P','AAU':'N', 'AAC':'N','GAU':'D','GAC':'D','UGU':'C','UGC':'C', 'CAA':'Q','CAG':'Q','GAA':'E','GAG':'E','CAU':'H', 'CAC':'H','AAA':'K','AAG':'K','UUU':'F','UUC':'F', 'UAU':'Y','UAC':'Y','AUG':'M','UGG':'W', 'UAG':'*','UGA':'*','UAA':'*' }
Exercise 5.4 in [book2]. Write a sequence-based predictor for protein secondary structure elements. Use the following dictionaries of preferences for alpha helices and beta sheets:
helix_propensity = { 'A':1.450, 'C':0.770, 'D':0.980, 'E':1.530, 'F':1.120, 'G':0.530, 'H':1.240, 'I':1.000, 'K':1.070, 'L':1.340, 'M':1.200, 'N':0.730, 'P':0.590, 'Q':1.170, 'R':0.790, 'S':0.790, 'T':0.820, 'V':1.140, 'W':1.140, 'Y':0.610 }
sheet_propensity = { 'A':0.970, 'C':1.300, 'D':0.800, 'E':0.260, 'F':1.280, 'G':0.810, 'H':0.710, 'I':1.600, 'K':0.740, 'L':1.220, 'M':1.670, 'N':0.650, 'P':0.620, 'Q':1.230, 'R':0.900, 'S':0.720, 'T':1.200, 'V':1.650, 'W':1.190, 'Y':1.290 }
Hint. Scan the input sequence residue by residue and replace each residue with H (helix) if its helix_propensity ≥ 1 and its helix_propensity > sheet_propensity, with S (sheet) if its sheet_propensity ≥ 1 and its helix_propensity < sheet_propensity, and with L (loop) otherwise.
Read the input sequence from a FASTA file, and print (or write to a file) the input and output sequences, one on top of the other.
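The rule in the hint can be sketched as follows. To keep the example short, the two dictionaries contain only the five residues of a made-up test sequence (the values are copied from the tables above), the sequence is given directly instead of being read from a FASTA file, and the function name predict is hypothetical. The code runs under both Python 2 and Python 3.

```python
# Subsets of the two propensity tables, limited to the residues used below
helix_propensity = {'A': 1.450, 'E': 1.530, 'G': 0.530, 'P': 0.590, 'V': 1.140}
sheet_propensity = {'A': 0.970, 'E': 0.260, 'G': 0.810, 'P': 0.620, 'V': 1.650}


def predict(sequence):
    prediction = ""
    for residue in sequence:
        helix = helix_propensity[residue]
        sheet = sheet_propensity[residue]
        if helix >= 1 and helix > sheet:
            prediction = prediction + "H"    # helix
        elif sheet >= 1 and helix < sheet:
            prediction = prediction + "S"    # sheet
        else:
            prediction = prediction + "L"    # loop
    return prediction


sequence = "AEVGP"                           # made-up input, not from a file
print(sequence)
print(predict(sequence))
```

Printing the two strings one after the other places each predicted letter directly below the residue it refers to.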