Exercises: ``if``
-----------------

.. warning::

   Do not forget the colon ``:`` after the ``if``, ``else``, etc.

   Writing an ``if`` without the colon, e.g.::

       >>> answer = raw_input("do you like yellow? ")
       >>> if answer == "yes"

   is an error. As soon as I enter the last line of code, Python gets upset::

         File "<stdin>", line 1
           if answer == "yes"
                             ^
       SyntaxError: invalid syntax

   and refuses to execute the code.

.. warning::

   Watch out for wrong indentation levels!

   In Python, **wrong indentation means wrong code**. The code may still
   "run", but it may compute the wrong thing.

   In some cases, it is easy to spot what's wrong. For instance, here Python
   immediately raises an error::

       >>> answer = raw_input("do you like yellow? ")
       >>> if answer == "yes":
       >>>     print "you said:"
       >>>         print "yes"
         File "<stdin>", line 4
           print "yes"
           ^
       IndentationError: unexpected indent

   In other cases the error can be much more subtle and difficult to find.
   See the section on nested statements below.

#. Ask the user for a number (with ``raw_input()``). If the number is even,
   print ``"even"``; print ``"odd"`` otherwise.

   *Hint*. ``raw_input()`` always returns a string.

#. Ask the user for a float. If the number is in the interval [-1, 1], print
   ``"okay"``. Do not print anything otherwise.

   *Hint*. Are ``elif``/``else`` necessary in this case?

#. Ask the user for two integers. If the first one is larger than the second
   one, print ``"first"``. If the second is larger than the first, print
   ``"second"``. Otherwise, print ``"neither"``.

#. Given the dictionary::

       horoscope_of = {
           "January": "extreme luck",
           "February": "try to be born again",
           "March": "kissed by fortune",
           "April": "lucky luke",
       }

   ask the user for her birth month. If the month appears (as a key) in the
   dictionary, print the corresponding horoscope. Otherwise, print
   ``"not available"``.

#. Ask the user for a path to an existing file, and read the contents using
   ``readlines()``. Then print:

   #. If the file is empty, the string ``"empty"``

   #.
      If the file has fewer than 100 lines, ``"short"``, as well as the number
      of lines.

   #. If the file has between 100 and 1000 lines, ``"average"`` and the
      number of lines.

   #. Otherwise, print ``"large"`` and the number of lines.

   The message must be printed on a single line.

#. Using two calls to ``raw_input()``, ask the user for two triples of floats.
   The two triples represent 3D coordinates: x, y, z.

   If all coordinates are non-negative, print the Euclidean distance between
   the two points. Do not print anything otherwise.

   *Hint*: the Euclidean distance is given by
   :math:`\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}`

#. Read this code::

       number = int(raw_input("write a number: "))
       if number % 3 == 0:
           print "divisible by 3"
       elif number % 3 != 0:
           print "not divisible by 3"
       else:
           print "dunno"

   Can it actually print ``"dunno"``?

#. Read this code::

       number = int(raw_input("write a number: "))
       if number % 2 == 0:
           print "divisible by 2"
       if number % 3 == 0:
           print "divisible by 3"
       if number % 2 != 0 and number % 3 != 0:
           print "dunno"

   Can it actually print ``"dunno"``?

#. Ask the user whether he wants to perform a ``"sum"`` or a ``"product"``.

   If the user asks for a ``"sum"``, ask for two numbers, sum them, and print
   the result.

   Otherwise, if the user asks for a ``"product"``, ask for two numbers,
   multiply them, and print the result.

   If the user replies with neither ``"sum"`` nor ``"product"``, do nothing.

|

Exercises: ``for`` and ``while``
--------------------------------

#. Write a ``for`` cycle to perform the following tasks:

   #. Print to screen the elements of ``range(10)``, one for each row.

   #. Print to screen the square of the elements of ``range(10)``, one for
      each row.

   #. Print to screen the sum of *squares* of ``range(10)``.

   #. Print to screen the *product* of the elements of ``range(1,10)``.

   #.
      Given the dictionary::

          volume_of = {
              "A": 67.0, "C": 86.0, "D": 91.0, "E": 109.0, "F": 135.0,
              "G": 48.0, "H": 118.0, "I": 124.0, "K": 135.0, "L": 124.0,
              "M": 124.0, "N": 96.0, "P": 90.0, "Q": 114.0, "R": 148.0,
              "S": 73.0, "T": 93.0, "V": 105.0, "W": 163.0, "Y": 141.0,
          }

      containing the volume of each amino acid, print to screen the sum of
      all the values.

   #. Given the dictionary::

          volume_of = {
              "A": 67.0, "C": 86.0, "D": 91.0, "E": 109.0, "F": 135.0,
              "G": 48.0, "H": 118.0, "I": 124.0, "K": 135.0, "L": 124.0,
              "M": 124.0, "N": 96.0, "P": 90.0, "Q": 114.0, "R": 148.0,
              "S": 73.0, "T": 93.0, "V": 105.0, "W": 163.0, "Y": 141.0,
          }

      containing the volume of each amino acid, and the FASTA string::

          fasta = """>1BA4:A|PDBID|CHAIN|SEQUENCE
          DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV"""

      print to screen the total volume of the amino acids of the protein
      sequence.

      *Hint*. First, you should extract the amino acid sequence from
      ``fasta``, then, for each character of the sequence
      (``for character in sequence``) get from the dictionary the
      corresponding volume and add it to the total.

   #. Find the *minimum* value of the list ``[1, 25, 6, 27, 57, 12]``.

      *Hint*. See the previous example about finding the *maximum* of a list,
      and adapt it to the new logic (auxiliary variable ``minimum_so_far``).

   #. Find both the *maximum* and the *minimum* of the list
      ``[1, 25, 6, 27, 57, 12]``.

      *Hint*. You should create two auxiliary variables: ``maximum_so_far``
      and ``minimum_so_far``.

   #. Given the nucleotide sequence::

          sequence = "ATGGCGCCCGAACAGGGA"

      compute the list of all its codons (0 offset for reading frame). The
      solution should be::

          ["ATG", "GCG", "CCC", "GAA", "CAG", "GGA"]

      *Hint*: you should iterate on the result of
      ``range(0, len(sequence), 3)`` and add at each step the sequence of a
      codon to a previously created empty list.

   #.
      Given the text (in FASTA format)::

          text = """>2HMI:A|PDBID|CHAIN|SEQUENCE
          PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI
          >2HMI:B|PDBID|CHAIN|SEQUENCE
          PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI
          >2HMI:C|PDBID|CHAIN|SEQUENCE
          DIQMTQTTSSLSASLGDRVTISCSASQDISSYLNWYQQKPEGTVKLLIYY
          >2HMI:D|PDBID|CHAIN|SEQUENCE
          QITLKESGPGIVQPSQPFRLTCTFSGFSLSTSGIGVTWIRQPSGKGLEWL
          >2HMI:E|PDBID|CHAIN|SEQUENCE
          ATGGCGCCCGAACAGGGAC
          >2HMI:F|PDBID|CHAIN|SEQUENCE
          GTCCCTGTTCGGGCGCCA"""

      return the dictionary ``sequence_of``, having as keys the names of the
      sequences (the first will be ``2HMI:A``, the second ``2HMI:B``, and so
      on), and as values the corresponding sequences.

      The result should resemble this::

          sequence_of = {
              "2HMI:A": "PISPIETVPVKLKPGMDGPKVKQW...",
              "2HMI:B": "PISPIETVPVKLKPGMDGPKVKQW...",
              # ...
          }

      *Hint*. You should first split ``text`` in lines. Next, you should
      iterate on lines: if the line is a header, you should save the name of
      the sequence; otherwise, you should update the dictionary with the name
      you got from the previous line, and the sequence you have in the
      current line.

#. Write a ``while`` cycle performing the following tasks:

   #. keep asking the user to write ``"STOP"``. If the user writes
      ``"STOP"`` (in upper case) the cycle terminates, otherwise it prints
      ``"you must write 'STOP'..."`` and continues.

   #. as before, but the cycle terminates also if the user writes
      ``"stop"`` in lower case.

#. What is printed to screen when executing the following code?

   #. ::

          for number in range(10):
              print "processing the element", number

   #. ::

          for number in range(10):
              print "processing the element", number
              break

   #. ::

          for number in range(10):
              print "processing the element", number
              continue

   #. ::

          for number in range(10):
              print number
              if number % 2 == 0:
                  break

   #. ::

          for number in range(10):
              if number % 2 == 0:
                  break
              print number

   #. ::

          condition = False
          while condition:
              print "the condition is true"

   #. ::

          condition = False
          while condition:
              print "the condition is true"
              condition = True

   #.
      ::

          condition = True
          while condition:
              print "the condition is true"

   #. ::

          numbers = range(10)
          i = 0
          while i < len(numbers):
              print "position", i, "contains the element", numbers[i]

   #. ::

          lines = [
              "line 1",
              "line 2",
              "line 3",
              "",
              "line 5",
              "line 6",
          ]

          for line in lines:
              line = line.strip()
              if len(line) == 0:
                  break
              else:
                  print "I read:", line

#. Given the tuple::

       numbers = (0, 1, 1, 0, 0, 0, 1, 1, 2, 1, 2)

   write a cycle that iterates on ``numbers``, stopping whenever finding the
   value ``2`` and printing to screen its *position*.

#. Given the tuple::

       strings = ("000", "51", "51", "32", "57", "26")

   write a cycle that iterates on ``strings``, stopping whenever finding a
   string containing the character ``"2"``, and printing to screen *position*
   and *value* of the string that stopped the cycle.

   The solution should be: position ``3`` (counting from 0), value ``"32"``.

#. Adapted from example 2.4 in [book2]_: create a random nucleotide sequence,
   with length defined by the user, and print it to screen.

   *Hint*. The ``random`` module allows you to create random numbers. It
   provides a number of tools to manage random objects. For example, the
   ``randint(i,j)`` function generates an integer between ``i`` and ``j``
   (both included) with equal probabilities, for example::

       import random
       index = random.randint(0,3)
       #index can be used to select the nucleotide at each position

|

Exercises: nested statements
----------------------------

#. Given the matrix::

       n = 5
       matrix = [range(n) for i in range(n)]

   write a double ``for`` cycle printing to screen all the elements of
   ``matrix``, one for each line.

#. Given the matrix::

       n = 5
       matrix = [range(n) for i in range(n)]

   what do the following fragments of code print?

   #. ::

          for row in matrix:
              for element in row:
                  print element

   #. ::

          sum = 0
          for row in matrix:
              for element in row:
                  sum = sum + element
          print sum

   #. ::

          for i in range(len(matrix)):
              row = matrix[i]
              for j in range(len(row)):
                  element = row[j]
                  print element

   #.
      ::

          for i in range(len(matrix)):
              for j in range(len(matrix[i])):
                  print matrix[i][j]

   #. ::

          dunno = []
          for i in range(len(matrix)):
              for j in range(len(matrix[i])):
                  if i == j:
                      dunno.append(matrix[i][j])
          print " ".join([str(x) for x in dunno])

#. Given the list::

       numbers = [8, 3, 2, 9, 7, 1, 8]

   write a double ``for`` cycle printing to screen all the *pairs* of
   elements of ``numbers``.

#. Modify the solution of the last exercise so that, if the pair ``(i,j)``
   has been already printed, then the symmetric pair ``(j,i)`` is not
   printed.

   *Hint*. See the example above.

#. Do the same as in the last exercise with the following list::

       strings = ["I", "am", "a", "list"]

#. Given the list::

       numbers = range(10)

   write a double ``for`` printing to screen only pairs of elements of
   ``numbers`` where the second element of the pair is twice the first.

   The result will be::

       0 0
       1 2
       2 4
       ...

#. Given the list::

       numbers = [8, 3, 2, 9, 7, 1, 8]

   write a double ``for`` cycle iterating on all element pairs of
   ``numbers`` and printing to screen the pairs whose sum is ``10``.

   (Printing "repetitions" such as ``8 + 2`` and ``2 + 8`` is allowed.)

   The result will be::

       8 2
       3 7
       2 8
       9 1

   *Hint*. There is an example showing how to iterate on all pairs of
   elements of a list. It is sufficient to modify this example.

#. As before, but instead of printing to screen, store the pairs of elements
   whose sum is ``10`` in a list ``list_of_pairs``.

   The result will be::

       >>> list_of_pairs
       [(8, 2), (3, 7), (2, 8), (9, 1)]

#. Given the lists::

       number_1 = [5, 9, 4, 4, 9, 2]
       number_2 = [7, 9, 6, 2]

   write a double ``for`` cycle iterating on the two lists and printing to
   screen *values* and *positions* of all the elements of ``number_1``
   appearing also in ``number_2``.

   The result will be::

       positions: 1, 1; repeated value: 9
       positions: 4, 1; repeated value: 9
       positions: 5, 3; repeated value: 2

#.
   As before, but instead of printing to screen, store positions and value in
   a list of triplets like this:
   ``(position_1, position_2, repeated_values)``.

#. Given the matrix::

       n = 5
       matrix = [range(n) for i in range(n)]

   write a double ``for`` cycle finding the largest element.

   *Hint*. It is sufficient to adapt the code that finds the maximum-minimum
   of a list (with one dimension) to a matrix (with two dimensions).

#. Given the list of nucleotide sequences::

       sequences = [
           "ATGGCGCCCGAACAGGGA",
           "GTCCCTGTTCGGGCGCCA",
       ]

   we want to obtain a list containing, for each sequence in ``sequences``,
   the list of its triplets.

   *Hint*. You can re-use a previous exercise.

#. Given the list::

       numbers = [5, 9, 4, 4, 9, 2]

   write code that counts the number of occurrences of each element and
   stores the result in a dictionary, similar to this::

       num_occurrences = {
           5: 1,
           9: 2,
           4: 2,
           2: 1,
       }

   *Hint*. You can modify one of the previous examples so that, instead of
   saving the position of occurrences, it increases the number of
   occurrences in ``num_occurrences``.

   *Hint*. Note that if the key ``5`` is not in the dictionary, we cannot
   execute ``num_occurrences[5] += 1``, since ``num_occurrences[5]`` doesn't
   exist. See the example about reading a FASTA file.

#. Given a list of gene clusters (lists), for example::

       groups = [["gene1", "gene2"], ["gene3"], [], ["gene4", "gene5"]]

   write a *single* cycle finding the biggest group and storing it in a
   variable ``biggest_group_so_far``.

   *Hint*: this task is similar to finding the minimum/maximum in a list of
   integers, but the auxiliary variable should contain the longest *list*
   found so far.

#.
   Given the dictionary of sequences::

       sequences_2HMI = {
           "A": "PISPIETVPVKLKPGMDGPKVKQWPLTEEKI",
           "B": "PISPIETVPVKLKPGMDGPKVKQWPLTEEKI",
           "C": "DIQMTQTTSSLSASLGDRVTISCSASQDISS",
           "D": "QITLKESGPGIVQPSQPFRLTCTFSGFSLST",
           "E": "ATGGCGCCCGAACAGGGAC",
           "F": "GTCCCTGTTCGGGCGCCA",
       }

   write a ``for`` cycle that (iterating on all the key-value pairs of the
   dictionary) returns a dictionary of histograms (that is, dictionaries
   mapping amino acids to their number of occurrences) of each element of
   ``sequences_2HMI``.

   *Hint*. Calculating a histogram also requires a ``for`` cycle: therefore
   you should expect to code two nested ``for`` cycles.

   The result (a dictionary of dictionaries) should be like this::

       histograms = {
           "A": {
               "P": 6,
               "I": 3,
               "S": 1,
               #...
           },
           "B": {
               "P": 6,
               "I": 3,
               "S": 1,
               #...
           },

           #...

           "F": {
               "A": 1,
               "C": 7,
               "G": 6,
               "T": 4,
           }
       }

#. Given the list of strings::

       table = [
           "protein domain start end",
           "YNL275W PF00955 236 498",
           "YHR065C SM00490 335 416",
           "YKL053C-A PF05254 5 72",
           "YOR349W PANTHER 353 414",
       ]

   write code that takes column names from the first row of ``table`` and:

   - for each row creates a dictionary like this::

         dictionary = {
             "protein": "YNL275W",
             "domain": "PF00955",
             "start": "236",
             "end": "498"
         }

   - appends the dictionary to a list.

#. Given::

       alphabet_lo = "abcdefghijklmnopqrstuvwxyz"
       alphabet_up = alphabet_lo.upper()

   write a cycle (``for`` or ``while``) that, starting from an empty
   dictionary, inserts all the key-value pairs::

       "a": "A",
       "b": "B",
       ...

   in other words, the dictionary maps from the *i*-th character of
   ``alphabet_lo`` to the *i*-th character of ``alphabet_up``.

   Next, use the dictionary to implement a ``for`` cycle that, given an
   arbitrary string, for example::

       string = "I am a string"

   returns the same result as ``string.upper()``.

#.
   Write a module that asks the user for the path to *two* text files, and
   prints to screen the rows of the two files, one by one, next to each
   other: the rows of the first file should be printed on the left, the rows
   of the second on the right.

   If the first file contains::

       first row
       second row

   and the second::

       ACTG
       GCTA

   the result will be::

       first row ACTG
       second row GCTA

   *Hint*. Note that the two files could be of different length. In that
   case (optionally) missing lines should be printed as if they were empty
   lines.

#. Write a module that, given the file ``data/dna-fasta/fasta.1``:

   #. Reads the contents of the FASTA file in a dictionary.

   #. Calculates how many times each nucleotide appears in each sequence.

   #. Calculates the GC-content of each sequence.

   #. Calculates the AT/GC-ratio of each sequence.

#. Given the genetic code, provided as a dictionary, write a program that
   reads an RNA sequence from a FASTA file as a string, translates the
   sequence in each possible reading frame and prints it to screen::

       codon_table = {
           'GCU':'A','GCC':'A','GCA':'A','GCG':'A','CGU':'R',
           'CGC':'R','CGA':'R','CGG':'R','AGA':'R','AGG':'R',
           'UCU':'S','UCC':'S','UCA':'S','UCG':'S','AGU':'S',
           'AGC':'S','AUU':'I','AUC':'I','AUA':'I','AUU':'I',
           'AUC':'I','AUA':'I','UUA':'L','UUG':'L','CUU':'L',
           'CUC':'L','CUA':'L','CUG':'L','GGU':'G','GGC':'G',
           'GGA':'G','GGG':'G','GUU':'V','GUC':'V','GUA':'V',
           'GUG':'V','ACU':'T','ACC':'T','ACA':'T','ACG':'T',
           'CCU':'P','CCC':'P','CCA':'P','CCG':'P','AAU':'N',
           'AAC':'N','GAU':'D','GAC':'D','UGU':'C','UGC':'C',
           'CAA':'Q','CAG':'Q','GAA':'E','GAG':'E','CAU':'H',
           'CAC':'H','AAA':'K','AAG':'K','UUU':'F','UUC':'F',
           'UAU':'Y','UAC':'Y','AUG':'M','UGG':'W',
           'UAG':'*','UGA':'*','UAA':'*'
       }

#. Exercise 5.4 in [book2]_. Write a sequence-based predictor for protein
   secondary structure elements.
   Use the following dictionaries of preferences for alpha helices and beta
   sheets::

       helix_propensity = {
           'A':1.450, 'C':0.770, 'D':0.980, 'E':1.530, 'F':1.120,
           'G':0.530, 'H':1.240, 'I':1.000, 'K':1.070, 'L':1.340,
           'M':1.200, 'N':0.730, 'P':0.590, 'Q':1.170, 'R':0.790,
           'S':0.790, 'T':0.820, 'V':1.140, 'W':1.140, 'Y':0.610
       }

       sheet_propensity = {
           'A':0.970, 'C':1.300, 'D':0.800, 'E':0.260, 'F':1.280,
           'G':0.810, 'H':0.710, 'I':1.600, 'K':0.740, 'L':1.220,
           'M':1.670, 'N':0.650, 'P':0.620, 'Q':1.230, 'R':0.900,
           'S':0.720, 'T':1.200, 'V':1.650, 'W':1.190, 'Y':1.290
       }

   *Hint*. Scan the input sequence residue by residue and replace each
   residue with H (helix) if its ``helix_propensity ≥ 1`` and its
   ``helix_propensity > sheet_propensity``, with S (sheet) if its
   ``sheet_propensity ≥ 1`` and its ``helix_propensity < sheet_propensity``,
   and with L (loop) otherwise. Read the input sequence from a FASTA file,
   and print (or write to a file) the input and output sequences, one on top
   of the other.
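
The rule in the last hint can be sketched for a single residue as shown here.
This is only a minimal sketch under stated assumptions, not a full solution:
the function name ``predict_residue`` is hypothetical, the two dictionaries
are cut down to three residues (values copied from the tables in the
exercise), the sequence ``"AGV"`` is made up, and reading the FASTA file is
left out. The code is written so that it runs under both Python 2 and
Python 3::

    # Assumption: three-residue subsets of the propensity tables, for
    # illustration only; a real solution needs the full dictionaries.
    helix_propensity = {'A': 1.450, 'G': 0.530, 'V': 1.140}
    sheet_propensity = {'A': 0.970, 'G': 0.810, 'V': 1.650}

    def predict_residue(residue):
        # Hypothetical helper: returns "H", "S" or "L" for one residue.
        helix = helix_propensity[residue]
        sheet = sheet_propensity[residue]
        if helix >= 1 and helix > sheet:
            return "H"
        elif sheet >= 1 and helix < sheet:
            return "S"
        else:
            return "L"

    # Made-up input; in the exercise it comes from a FASTA file.
    sequence = "AGV"
    prediction = "".join([predict_residue(residue) for residue in sequence])

    # Input and output sequences, one on top of the other.
    print(sequence)
    print(prediction)

Keeping the rule in a small function makes it easy to check each branch on
its own before wiring it to the file-reading part of the exercise.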