Exercises: if

Warning

Do not forget the colon : after the if, else, etc.

Writing an if without the colon, e.g.:

>>> answer = raw_input("do you like yellow? ")
>>> if answer == "yes"

is an error. As soon as I enter the last line of code, Python gets upset:

  File "<stdin>", line 1
    if answer == "yes"
                     ^
SyntaxError: invalid syntax

and refuses to execute the code.

Warning

Watch out for wrong indentation levels!

In Python, wrong indentation means wrong code.

The code may still “run”, but it may compute the wrong thing.

In some cases, it is easy to spot what’s wrong. For instance, here Python immediately raises an error:

>>> answer = raw_input("do you like yellow? ")
>>> if answer == "yes":
>>>    print "you said:"
>>>        print "yes"
  File "<stdin>", line 4
        print "yes"
        ^
IndentationError: unexpected indent

In other cases the error can be much more subtle and difficult to find. See the section on nested statements below.

  1. Ask the user for a number (with raw_input()). If the number is even, print "even"; otherwise, print "odd".

    Hint. raw_input() always returns a string.
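
The hint can be seen at work in a short sketch. Since raw_input() needs a user at the keyboard, the variable reply is a hypothetical stand-in for the string it would return. The code uses print with a single parenthesized argument so that it runs unchanged on Python 2 and Python 3.

```python
# "reply" is a hypothetical stand-in for the string returned by raw_input().
reply = "42"

# Strings are not numbers: "+" concatenates them.
concatenated = reply + reply      # "4242"

# Convert to an integer before doing arithmetic.
number = int(reply)
doubled = number + number         # 84
remainder = number % 2            # 0, because 42 is even

print(concatenated)
print(doubled)
print(remainder)
```

The same conversion is needed before any comparison or arithmetic in the exercises of this section; use float() instead of int() when the input may have a decimal part.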

  2. Ask the user for a float. If the number is in the interval [-1, 1], print "okay". Do not print anything otherwise.

    Hint. Are elif/else necessary in this case?

  3. Ask the user for two integers. If the first one is larger than the second one, print "first". If the second is larger than the first, print "second". Otherwise, print "neither".

  4. Given the dictionary:

    horoscope_of = {
        "January": "extreme luck",
        "February": "try to be born again",
        "March": "kissed by fortune",
        "April": "lucky luke",
    }
    

    ask the user for her birth month. If the month appears (as a key) in the dictionary, print the corresponding horoscope. Otherwise, print "not available".

  5. Ask the user for the path to an existing file, and read its contents using readlines(). Then print:

    1. If the file is empty, the string "empty".
    2. If the file has fewer than 100 lines, "short" and the number of lines.
    3. If the file has between 100 and 1000 lines, "average" and the number of lines.
    4. Otherwise, "large" and the number of lines.

    The message must be printed on a single line.

  6. Using two calls to raw_input(), ask the user for two triples of floats. The two triples represent 3D coordinates: x, y, z.

    If all coordinates are non-negative, print the Euclidean distance between the two points. Do not print anything otherwise.

    Hint: the Euclidean distance is given by \(\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}\).
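
As a worked instance of the formula, here is a minimal sketch. The two strings line_1 and line_2 are hypothetical stand-ins for what the two raw_input() calls would return, and the space-separated layout is an assumption; the non-negativity check required by the exercise is left out.

```python
import math

# Hypothetical replies, assumed to be three numbers separated by spaces.
line_1 = "1.0 2.0 2.0"
line_2 = "0.0 0.0 0.0"

# Turn each reply into a list of three floats: [x, y, z].
point_1 = [float(word) for word in line_1.split()]
point_2 = [float(word) for word in line_2.split()]

dx = point_1[0] - point_2[0]
dy = point_1[1] - point_2[1]
dz = point_1[2] - point_2[2]

# sqrt(1 + 4 + 4) = 3.0
distance = math.sqrt(dx**2 + dy**2 + dz**2)
print(distance)
```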

  7. Read this code:

    number = int(raw_input("write a number: "))
    if number % 3 == 0:
        print "divisible by 3"
    elif number % 3 != 0:
        print "not divisible by 3"
    else:
        print "dunno"
    

    Can it actually print "dunno"?

  8. Read this code:

    number = int(raw_input("write a number: "))
    if number % 2 == 0:
        print "divisible by 2"
    if number % 3 == 0:
        print "divisible by 3"
    if number % 2 != 0 and number % 3 != 0:
        print "dunno"
    

    Can it actually print "dunno"?

  9. Ask the user whether he wants to perform a "sum" or a "product".

    If the user asks for a "sum", ask for two numbers, sum them, and print the result.

    Otherwise, if the user asks for a "product", ask for two numbers, multiply them, and print the result.

    If the user replies neither "sum" nor "product", do nothing.


Exercises: for and while

  1. Write a for cycle to perform the following tasks:

    1. Print to screen the elements of range(10), one for each row.

    2. Print to screen the square of the elements of range(10), one for each row.

    3. Print to screen the sum of squares of range(10).

    4. Print to screen the product of the elements of range(1,10).

    5. Given the dictionary:

      volume_of = {
          "A":  67.0, "C":  86.0, "D":  91.0,
          "E": 109.0, "F": 135.0, "G":  48.0,
          "H": 118.0, "I": 124.0, "K": 135.0,
          "L": 124.0, "M": 124.0, "N":  96.0,
          "P":  90.0, "Q": 114.0, "R": 148.0,
          "S":  73.0, "T":  93.0, "V": 105.0,
          "W": 163.0, "Y": 141.0,
      }
      

      containing the volume of each amino acid, print to screen the sum of all the values.

    6. Given the dictionary:

      volume_of = {
          "A":  67.0, "C":  86.0, "D":  91.0,
          "E": 109.0, "F": 135.0, "G":  48.0,
          "H": 118.0, "I": 124.0, "K": 135.0,
          "L": 124.0, "M": 124.0, "N":  96.0,
          "P":  90.0, "Q": 114.0, "R": 148.0,
          "S":  73.0, "T":  93.0, "V": 105.0,
          "W": 163.0, "Y": 141.0,
      }
      

      containing the volume of each amino acid, and the FASTA string:

      fasta = """>1BA4:A|PDBID|CHAIN|SEQUENCE
      DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV"""
      

      print to screen the total volume of the amino acids of the protein sequence.

      Hint. First, you should extract the amino acid sequence from fasta, then, for each character of the sequence (for character in sequence) get from the dictionary the corresponding volume and add it to the total.

    7. Find the minimum value of the list [1, 25, 6, 27, 57, 12].

      Hint. See the previous example about finding the maximum of a list, and adapt it to the new logic (auxiliary variable minimum_so_far).

    8. Find both the maximum and the minimum of the list [1, 25, 6, 27, 57, 12].

      Hint. You should create two auxiliary variables: maximum_so_far and minimum_so_far.
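
The "so far" pattern mentioned in the hints can be sketched as follows for the maximum only. The list values is hypothetical and deliberately differs from the one in the exercises; adapting the comparison to the minimum is left to the reader.

```python
values = [3, 8, 5]   # hypothetical data, not the list of the exercise

# Start from the first element, then compare it with all the others.
maximum_so_far = values[0]
for value in values[1:]:
    if value > maximum_so_far:
        # Found a larger element: remember it.
        maximum_so_far = value

print(maximum_so_far)
```

Initializing the auxiliary variable with the first element avoids having to guess a starting value smaller than everything in the list.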

    9. Given the nucleotide sequence:

      sequence = "ATGGCGCCCGAACAGGGA"
      

      compute the list of all its codons (0 offset for reading frame). The solution should be:

      ["ATG", "GCG", "CCC", "GAA", "CAG", "GGA"]
      

      Hint: you should iterate on the result of range(0, len(sequence), 3) and add at each step the sequence of a codon to a previously created empty list.
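
To see what range() with a step of 3 produces, here is a small sketch on a hypothetical, shorter string (the name short_sequence is an assumption, chosen so as not to overwrite sequence).

```python
short_sequence = "ATGGCGCCC"   # hypothetical data

# Starting positions of the chunks: 0, 3, 6.
starts = list(range(0, len(short_sequence), 3))

chunks = []
for start in starts:
    # Slice three characters beginning at "start".
    chunks.append(short_sequence[start:start + 3])

print(starts)
print(chunks)
```

If the length of the string is not a multiple of 3, the last slice is simply shorter than three characters.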

    10. Given the text (in FASTA format):

      text = """>2HMI:A|PDBID|CHAIN|SEQUENCE
      PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI
      >2HMI:B|PDBID|CHAIN|SEQUENCE
      PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI
      >2HMI:C|PDBID|CHAIN|SEQUENCE
      DIQMTQTTSSLSASLGDRVTISCSASQDISSYLNWYQQKPEGTVKLLIYY
      >2HMI:D|PDBID|CHAIN|SEQUENCE
      QITLKESGPGIVQPSQPFRLTCTFSGFSLSTSGIGVTWIRQPSGKGLEWL
      >2HMI:E|PDBID|CHAIN|SEQUENCE
      ATGGCGCCCGAACAGGGAC
      >2HMI:F|PDBID|CHAIN|SEQUENCE
      GTCCCTGTTCGGGCGCCA"""
      

      return the dictionary sequence_of, having as keys the names of the sequences (the first will be 2HMI:A, the second 2HMI:B, and so on), and as values the corresponding sequences.

      The result should resemble this:

      sequence_of = {
          "2HMI:A": "PISPIETVPVKLKPGMDGPKVKQW...",
          "2HMI:B": "PISPIETVPVKLKPGMDGPKVKQW...",
          # ...
      }
      

      Hint. You should first split text in lines. Next, you should iterate on lines: if the line is a header, you should save the name of the sequence; otherwise, you should update the dictionary with the name you got from the previous line, and the sequence you have in the current line.
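
The procedure described in the hint can be sketched on a hypothetical two-record text. The names demo_text and demo_sequence_of, as well as the records themselves, are made up for illustration, and each sequence is assumed to fit on a single line, as in the exercise.

```python
# Hypothetical text laid out like the one in the exercise.
demo_text = """>seq1|demo
ACGT
>seq2|demo
TTGA"""

demo_sequence_of = {}
name = None
for line in demo_text.split("\n"):
    if line.startswith(">"):
        # Header: keep what lies between ">" and the first "|".
        name = line[1:].split("|")[0]
    else:
        # Sequence line: store it under the name read from the header.
        demo_sequence_of[name] = line

print(demo_sequence_of["seq1"])
print(demo_sequence_of["seq2"])
```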

  2. Write a while cycle performing the following task:

    1. Keep asking the user to write "STOP". If the user writes "STOP" (in upper case) the cycle terminates; otherwise it prints "you must write 'STOP'..." and continues.
    2. As before, but the cycle also terminates if the user writes "stop" in lower case.
  3. What is printed to screen when executing the following code?

    1. for number in range(10):
          print "processing the element", number
      
    2. for number in range(10):
          print "processing the element", number
          break
      
    3. for number in range(10):
          print "processing the element", number
          continue
      
    4. for number in range(10):
          print number
          if number % 2 == 0:
              break
      
    5. for number in range(10):
          if number % 2 == 0:
              break
          print number
      
    6. condition = False
      while condition:
          print "the condition is true"
      
    7. condition = False
      while condition:
          print "the condition is true"
          condition = True
      
    8. condition = True
      while condition:
          print "the condition is true"
      
    9. numbers = range(10)
      
      i = 0
      while i < len(numbers):
          print "position", i, "contains the element", numbers[i]
      
    10. lines = [
          "line 1",
          "line 2",
          "line 3",
          "",
          "line 5",
          "line 6",
      ]
      
      for line in lines:
          line = line.strip()
          if len(line) == 0:
              break
          else:
              print "I read:", line
      
  4. Given the tuple:

    numbers = (0, 1, 1, 0, 0, 0, 1, 1, 2, 1, 2)
    

    write a cycle that iterates on numbers, stopping whenever finding the value 2 and printing to screen its position.

  5. Given the tuple:

    strings = ("000", "51", "51", "32", "57", "26")
    

    write a cycle that iterates on strings, stopping whenever finding a string containing the character "2", and printing to screen position and value of the string that stopped the cycle.

    The solution should be: position 3 (counting from 0), value "32".

  6. Adapted from example 2.4 in [book2]: create a random nucleotide sequence, with length defined by the user, and print it to screen.

    Hint. The random module provides tools for generating random numbers. For example, the randint(i, j) function returns a random integer between i and j (both included), each with equal probability:

    import random
    index = random.randint(0,3)
    #index can be used to select the nucleotide at each position
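
One way to use such an index is sketched below. The fixed length is a hypothetical value standing in for the number read from the user, and the output changes at every run because it is random.

```python
import random

nucleotides = "ACGT"
length = 12   # hypothetical value; the exercise reads it from the user

random_sequence = ""
for i in range(length):
    # randint(0, 3) returns 0, 1, 2 or 3: a valid position in "ACGT".
    index = random.randint(0, 3)
    random_sequence = random_sequence + nucleotides[index]

print(random_sequence)
```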
    

Exercises: nested statements

  1. Given the matrix:

    n = 5
    matrix = [range(n) for i in range(n)]
    

    write a double for cycle printing to screen all the elements of matrix, one for each line.

  2. Given the matrix:

    n = 5
    matrix = [range(n) for i in range(n)]
    

    what do the following fragments of code print?

    1. for row in matrix:
          for element in row:
              print element
      
    2. sum = 0
      for row in matrix:
          for element in row:
              sum = sum + element
      print sum
      
    3. for i in range(len(matrix)):
          row = matrix[i]
          for j in range(len(row)):
              element = row[j]
              print element
      
    4. for i in range(len(matrix)):
          for j in range(len(matrix[i])):
              print matrix[i][j]
      
    5. dunno = []
      for i in range(len(matrix)):
          for j in range(len(matrix[i])):
              if i == j:
                  dunno.append(matrix[i][j])
      print " ".join([str(x) for x in dunno])
      
  3. Given the list:

    numbers = [8, 3, 2, 9, 7, 1, 8]
    

    write a double for cycle printing to screen all the pairs of elements of numbers.

  4. Modify the solution of the last exercise so that, if the pair (i,j) has been already printed, then the symmetric pair (j,i) is not printed.

    Hint. See the example above.

  5. Do the same as in the last exercise with the following list:

    strings = ["I", "am", "a", "list"]
    
  6. Given the list:

    numbers = range(10)
    

    write a double for cycle printing to screen only the pairs of elements of numbers where the second element of the pair is twice the first.

    The result will be:

    0 0
    1 2
    2 4
    ...
    
  7. Given the list:

    numbers = [8, 3, 2, 9, 7, 1, 8]
    

    write a double for cycle iterating on all element pairs of numbers and printing to screen the pairs whose sum is 10.

    (Printing “repetitions” such as 8 + 2 and 2 + 8 is allowed.)

    The result will be:

    8 2
    3 7
    2 8
    9 1
    

    Hint. There is an example showing how to iterate on all pairs of elements of a list. It is sufficient to modify this example.
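
The example the hint refers to is not reproduced in this section. As a reminder, one common way to visit every pair of positions exactly once is to let the inner index start right after the outer one; the sketch below does so on hypothetical data.

```python
items = ["a", "b", "c"]   # hypothetical data

pairs = []
for i in range(len(items)):
    # j starts after i, so each pair of positions is visited only once.
    for j in range(i + 1, len(items)):
        pairs.append((items[i], items[j]))

print(pairs)
```

Letting j run over range(len(items)) instead would also visit the symmetric pairs and the pairs of an element with itself.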

  8. As before, but instead of printing to screen, store the pairs of elements whose sum is 10 in a list list_of_pairs.

    The result will be:

    >>> list_of_pairs
    [(8, 2), (3, 7), (2, 8), (9, 1)]
    
  9. Given the lists:

    number_1 = [5, 9, 4, 4, 9, 2]
    number_2 = [7, 9, 6, 2]
    

    write a double for cycle iterating on the two lists and printing to screen values and positions of all the elements of number_1 appearing also in number_2.

    The result will be:

    positions: 1, 1; repeated value: 9
    positions: 4, 1; repeated value: 9
    positions: 5, 3; repeated value: 2
    
  10. As before, but instead of printing to screen, store positions and value in a list of triplets like this: (position_1, position_2, repeated_values).

  11. Given the matrix:

    n = 5
    matrix = [range(n) for i in range(n)]
    

    write a double for cycle finding the largest element.

    Hint. It is sufficient to adapt the code that finds the maximum-minimum of a list (with one dimension) to a matrix (with two dimensions).

  12. Given the list of nucleotide sequences:

    sequences = [
        "ATGGCGCCCGAACAGGGA",
        "GTCCCTGTTCGGGCGCCA",
    ]
    

    we want to obtain a list containing, for each sequence in sequences, the list of its triplets.

    Hint. You can re-use a previous exercise.

  13. Given the list:

    numbers = [5, 9, 4, 4, 9, 2]
    

    write code that counts the number of occurrences of each element and stores the result in a dictionary, similar to this:

    num_occurrences = {
        5: 1,
        9: 2,
        4: 2,
        2: 1,
    }
    

    Hint. You can modify one of the previous examples so that, instead of saving the positions of the occurrences, it increases the number of occurrences in num_occurrences.

    Hint. Note that if the key 5 is not in the dictionary, we cannot execute num_occurrences[5] += 1, since num_occurrences[5] doesn’t exist. See the example about reading a FASTA file.
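
The check described in the second hint can be sketched on hypothetical data (a string of letters rather than the list of the exercise; the name count_of is made up for illustration).

```python
letters = "abca"   # hypothetical data

count_of = {}
for letter in letters:
    if letter not in count_of:
        # First time this key is seen: create it before incrementing.
        count_of[letter] = 0
    count_of[letter] += 1

print(count_of["a"])
print(count_of["b"])
```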

  14. Given a list of gene clusters (lists), for example:

    groups = [["gene1", "gene2"], ["gene3"], [], ["gene4", "gene5"]]
    

    write a single cycle finding the biggest group and storing it in a variable biggest_group_so_far.

    Hint: this task is similar to finding the minimum/maximum in a list of integers, but the auxiliary variable should contain the longest list found so far.

  15. Given the dictionary of sequences:

    sequences_2HMI = {
        "A": "PISPIETVPVKLKPGMDGPKVKQWPLTEEKI",
        "B": "PISPIETVPVKLKPGMDGPKVKQWPLTEEKI",
        "C": "DIQMTQTTSSLSASLGDRVTISCSASQDISS",
        "D": "QITLKESGPGIVQPSQPFRLTCTFSGFSLST",
        "E": "ATGGCGCCCGAACAGGGAC",
        "F": "GTCCCTGTTCGGGCGCCA",
    }
    

    write a for cycle that (iterating on all the key-value pairs of the dictionary) returns a dictionary of histograms (that is, dictionaries mapping amino acids to their number of occurrences), one for each element of sequences_2HMI.

    Hint. Calculating a histogram also requires a for cycle: therefore you should expect to code two nested for cycles.

    The result (a dictionary of dictionaries) should be like this:

    histograms = {
        "A": {
            "P": 6,
            "I": 3,
            "S": 1,
            #...
        },
        "B": {
            "P": 6,
            "I": 3,
            "S": 1,
            #...
        },
    
        #...
    
        "F": {
            "A": 1,
            "C": 7,
            "G": 6,
            "T": 4,
        }
    }
    
  16. Given the list of strings:

    table = [
        "protein domain start end",
        "YNL275W PF00955 236 498",
        "YHR065C SM00490 335 416",
        "YKL053C-A PF05254 5 72",
        "YOR349W PANTHER 353 414",
    ]
    

    write code that takes the column names from the first row of table and:

    • for each row creates a dictionary like this:

      dictionary = {
          "protein": "YNL275W",
          "domain": "PF00955",
          "start": "236",
          "end": "498",
      }
      
    • appends the dictionary to a list.

  17. Given:

    alphabet_lo = "abcdefghijklmnopqrstuvwxyz"
    alphabet_up = alphabet_lo.upper()
    

    write a cycle (for or while) that, starting from an empty dictionary, inserts all the key-value pairs:

    "a": "A",
    "b": "B",
    ...
    

    in other words, the dictionary maps from the i-th character of alphabet_lo to the i-th character of alphabet_up.

    Next, use the dictionary to implement a for cycle that, given an arbitrary string, for example:

    string = "I am a string"
    

    returns the same result as string.upper().

  18. Write a module that asks the user for the paths to two text files, and prints to screen the rows of the two files, one by one, next to each other: the rows of the first file should be printed on the left, the rows of the second on the right.

    If the first file contains:

    first row
    second row
    

    and the second:

    ACTG
    GCTA
    

    the result will be:

    first row ACTG
    second row GCTA
    

    Hint. Note that the two files could be of different length. In that case (optionally) missing lines should be printed as if they were empty lines.
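
The optional handling of files of different length can be sketched as follows. The two lists are hypothetical stand-ins for the results of readlines(), and the names left_lines, right_lines and merged are made up for illustration.

```python
# Hypothetical contents; in the exercise these come from readlines().
left_lines = ["first row\n", "second row\n", "third row\n"]
right_lines = ["ACTG\n", "GCTA\n"]

merged = []
num_rows = max(len(left_lines), len(right_lines))
for i in range(num_rows):
    # A line missing from the shorter file counts as an empty line.
    left = left_lines[i].strip() if i < len(left_lines) else ""
    right = right_lines[i].strip() if i < len(right_lines) else ""
    merged.append(left + " " + right)

for row in merged:
    print(row)
```

Iterating up to the length of the longer list, rather than the shorter one, is what keeps the extra rows from being dropped.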

  19. Write a module that, given the file data/dna-fasta/fasta.1, does the following:

    1. Read the contents of the FASTA file in a dictionary.
    2. Calculate how many times each nucleotide appears in each sequence.
    3. Calculate the GC-content of each sequence.
    4. Calculate the AT/GC-ratio of each sequence.
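
The last two quantities are not defined in the text. The sketch below assumes the usual definitions: GC-content as the fraction of G and C nucleotides in the sequence, and AT/GC-ratio as the number of A and T divided by the number of G and C. The sequence is hypothetical.

```python
dna = "ATGCGC"   # hypothetical sequence

num_gc = dna.count("G") + dna.count("C")   # 4
num_at = dna.count("A") + dna.count("T")   # 2

# float() forces a real division also in Python 2.
gc_content = float(num_gc) / len(dna)

# Assumes the sequence contains at least one G or C;
# otherwise this line raises ZeroDivisionError.
at_gc_ratio = float(num_at) / num_gc

print(gc_content)
print(at_gc_ratio)
```
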
  20. Given the genetic code, provided as a dictionary, write a program that reads an RNA sequence from a FASTA file as a string, translates the sequence in each possible reading frame and prints the result to screen:

    codon_table = {
        'GCU':'A','GCC':'A','GCA':'A','GCG':'A','CGU':'R',
        'CGC':'R','CGA':'R','CGG':'R','AGA':'R','AGG':'R',
        'UCU':'S','UCC':'S','UCA':'S','UCG':'S','AGU':'S',
        'AGC':'S','AUU':'I','AUC':'I','AUA':'I','UUA':'L',
        'UUG':'L','CUU':'L',
        'CUC':'L','CUA':'L','CUG':'L','GGU':'G','GGC':'G',
        'GGA':'G','GGG':'G','GUU':'V','GUC':'V','GUA':'V',
        'GUG':'V','ACU':'T','ACC':'T','ACA':'T','ACG':'T',
        'CCU':'P','CCC':'P','CCA':'P','CCG':'P','AAU':'N',
        'AAC':'N','GAU':'D','GAC':'D','UGU':'C','UGC':'C',
        'CAA':'Q','CAG':'Q','GAA':'E','GAG':'E','CAU':'H',
        'CAC':'H','AAA':'K','AAG':'K','UUU':'F','UUC':'F',
        'UAU':'Y','UAC':'Y','AUG':'M','UGG':'W',
        'UAG':'*','UGA':'*','UAA':'*'
        }
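
A minimal sketch of the translation step follows. It assumes that "each possible reading frame" means the three forward offsets 0, 1 and 2, uses a hypothetical RNA string instead of a FASTA file, and copies only the few table entries that string needs into a dictionary named small_table (a made-up name).

```python
# Excerpt of the codon table, enough for the toy sequence below.
small_table = {
    "AUG": "M", "GCU": "A", "UGG": "W", "UAA": "*",
    "CUU": "L", "GGU": "G", "GGC": "G", "UUG": "L", "GUA": "V",
}
rna = "AUGGCUUGGUAA"   # hypothetical sequence

frames = []
for offset in range(3):
    protein = ""
    # Stopping at len(rna) - 2 skips a trailing, incomplete codon.
    for start in range(offset, len(rna) - 2, 3):
        codon = rna[start:start + 3]
        protein = protein + small_table[codon]
    frames.append(protein)

for protein in frames:
    print(protein)
```

With the full table in place of small_table the same two cycles work for any RNA string made of A, C, G and U.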
    
  21. Exercise 5.4 in [book2]. Write a sequence-based predictor for protein secondary structure elements. Use the following dictionaries of preferences for alpha helices and beta sheets:

    helix_propensity = {
        'A':1.450, 'C':0.770, 'D':0.980,
        'E':1.530, 'F':1.120, 'G':0.530,
        'H':1.240, 'I':1.000, 'K':1.070,
        'L':1.340, 'M':1.200, 'N':0.730,
        'P':0.590, 'Q':1.170, 'R':0.790,
        'S':0.790, 'T':0.820, 'V':1.140,
        'W':1.140, 'Y':0.610
    }
    
    sheet_propensity = {
        'A':0.970, 'C':1.300, 'D':0.800,
        'E':0.260, 'F':1.280, 'G':0.810,
        'H':0.710, 'I':1.600, 'K':0.740,
        'L':1.220, 'M':1.670, 'N':0.650,
        'P':0.620, 'Q':1.230, 'R':0.900,
        'S':0.720, 'T':1.200, 'V':1.650,
        'W':1.190, 'Y':1.290
    }
    

    Hint. Scan the input sequence residue by residue and replace each residue with H (helix) if its helix_propensity ≥ 1 and its helix_propensity > sheet_propensity, with S (sheet) if its sheet_propensity ≥ 1 and its helix_propensity < sheet_propensity, and with L (loop) otherwise. Read the input sequence from a FASTA file, and print (or write to a file) the input and output sequences, one on top of the other.
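
The rule in the hint can be sketched on a short, hypothetical sequence. The sketch assumes that both thresholds mean "at least 1", and copies the propensities of only five residues into dictionaries named helix_of and sheet_of (made-up names); reading the FASTA file is left out.

```python
# Excerpts of the two propensity tables (five residues only).
helix_of = {"A": 1.450, "E": 1.530, "V": 1.140, "I": 1.000, "G": 0.530}
sheet_of = {"A": 0.970, "E": 0.260, "V": 1.650, "I": 1.600, "G": 0.810}

protein = "AEVIG"   # hypothetical input sequence

prediction = ""
for residue in protein:
    helix = helix_of[residue]
    sheet = sheet_of[residue]
    if helix >= 1 and helix > sheet:
        prediction = prediction + "H"   # helix
    elif sheet >= 1 and helix < sheet:
        prediction = prediction + "S"   # sheet
    else:
        prediction = prediction + "L"   # loop

# Input and output, one on top of the other.
print(protein)
print(prediction)
```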