=======================================
Python: complex statements (Solutions)
=======================================

Conditional code: ``if``
---------------------------

#. Solution::

    number = int(raw_input("write a number: "))

    if number % 2 == 0:
        print "even"
    else:
        print "odd"

   We use ``else``, since even and odd are the only two possibilities.

   A way to make a third option explicit would be::

    if number % 2 == 0:
        print "even"
    elif number % 2 == 1:
        print "odd"
    else:
        print "impossible!"

   but the code in ``else`` will never be executed for any value of ``number``!

   Since the two options are mutually exclusive, we can also write::

    if number % 2 == 0:
        print "even"
    if number % 2 == 1:
        print "odd"

   Even without the ``else``, exactly one of the two ``if`` statements is
   executed.
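
   In Python the result of ``number % 2`` is always ``0`` or ``1``, even when
   ``number`` is negative, because the result of ``%`` has the same sign as the
   divisor. A quick check on a sample of integers (the range ``-10`` to ``10``
   is arbitrary) is consistent with this::

    remainders = set()
    for number in range(-10, 11):
        # collect every remainder we observe
        remainders.add(number % 2)
    print(sorted(remainders))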

#. Solution::

    number = float(raw_input("write rational: "))

    if number >= -1 and number <= 1:
        print "okay"

   We need neither ``elif`` (there is only one condition) nor ``else`` (if the
   condition is false, there is nothing to do).

#. Solution::

    answer = raw_input("write two numbers separated by a space: ")

    words = answer.split()
    num1 = int(words[0])
    num2 = int(words[1])

    if num1 > num2:
        print "first"
    elif num2 > num1:
        print "second"
    else:
        print "neither"

   Alternatively::

    answer = raw_input("write two numbers separated by a space: ")

    numbers = [int(word) for word in answer.split()]

    if numbers[0] > numbers[1]:
        print "first"
    elif numbers[0] < numbers[1]:
        print "second"
    else:
        print "neither"

#. Solution::

    horoscope_of = {
        "January": "extreme luck",
        "February": "try to be born again",
        "March": "kissed by fortune",
        "April": "lucky luke",
    }

    month = raw_input("tell me your birth month: ")

    if horoscope_of.has_key(month):
        print horoscope_of[month]
    else:
        print "not available"
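
   The same lookup can also be written with ``dict.get()``, which returns its
   second argument when the key is missing (here with a shortened version of
   the dictionary, for brevity)::

    horoscope_of = {
        "January": "extreme luck",
        "February": "try to be born again",
    }

    # get() falls back to "not available" for months not in the dictionary
    print(horoscope_of.get("January", "not available"))
    print(horoscope_of.get("March", "not available"))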

#. Solution::

    path = raw_input("write your path: ")

    lines = open(path, "r").readlines()
    if len(lines) == 0:
        print "empty"
    elif len(lines) < 100:
        print "short", len(lines)
    elif len(lines) < 1000:
        print "average", len(lines)
    else:
        print "large", len(lines)

   Note that we don't need to spell out the conditions in full: in the code we can
   replace ``100 <= len(lines) < 1000`` with just ``len(lines) < 1000``.
   This works because when ``len(lines)`` is less than ``100`` the first
   ``elif`` is executed, and the second ``elif`` is not even considered.
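
   To check the boundary values without reading a file, the same chain can be
   wrapped in a function (``classify_length`` is a hypothetical name, not part
   of the exercise)::

    def classify_length(num_lines):
        # same if/elif/else chain as above, applied to a plain integer
        if num_lines == 0:
            return "empty"
        elif num_lines < 100:
            return "short"
        elif num_lines < 1000:
            return "average"
        else:
            return "large"

    results = [classify_length(n) for n in (0, 1, 99, 100, 999, 1000)]
    print(results)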

#. Solution::

    point1 = [float(word) for word
              in raw_input("write three coordinates: ").split()]

    point2 = [float(word) for word
              in raw_input("write three coordinates: ").split()]

    if point1[0] >= 0 and point1[1] >= 0 and point1[2] >= 0 and \
       point2[0] >= 0 and point2[1] >= 0 and point2[2] >= 0:
        diff_x = point1[0] - point2[0]
        diff_y = point1[1] - point2[1]
        diff_z = point1[2] - point2[2]

        print "the distance is", (diff_x**2 + diff_y**2 +  diff_z**2)**0.5

   Note that ``print`` is *inside* the ``if``.

#. Solution: we know that ``number`` is an arbitrary integer, chosen by the user::

    if number % 3 == 0:
        print "divisible by 3"
    elif number % 3 != 0:
        print "not divisible by 3"
    else:
        print "dunno"

   ``if``, ``elif`` and ``else`` form a chain: only one among them is executed.

   #. ``if`` is executed if and only if ``number`` is divisible by three.

   #. ``elif`` is executed if and only if the previous ``if`` is not executed and
      ``number`` is *not* divisible by three.

   #. ``else`` is executed whenever neither ``if`` nor ``elif`` is executed.

   Since every integer is either divisible by ``3`` or not, there is no other possibility:
   ``else`` will *never* be executed.

   Therefore, the answer is no.
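
   A check on a sample of integers (here ``-20`` to ``20``; a sample cannot
   prove the claim, but it illustrates it) records which branch runs::

    branches = set()
    for number in range(-20, 21):
        if number % 3 == 0:
            branches.add("if")
        elif number % 3 != 0:
            branches.add("elif")
        else:
            branches.add("else")
    print(sorted(branches))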

#. Solution: as before, ``number`` is an arbitrary integer. The code is::

    number = int(raw_input("write a number: "))
    if number % 2 == 0:
        print "divisible by 2"
    if number % 3 == 0:
        print "divisible by 3"
    if number % 2 != 0 and number % 3 != 0:
        print "dunno"

   Here we don't have "chains" of ``if``, ``elif`` and ``else``: we have three independent ``if`` statements.

   #. The first ``if`` is executed if and only if ``number`` is divisible by two.

   #. The second ``if`` is executed if and only if ``number`` is divisible by three.

   #. The third ``if`` is executed if and only if ``number`` is divisible by neither
      two nor three.

   If ``number`` is 6, divisible by both two and three, the first two
   ``if`` will be both executed, while the third won't be.

   If ``number`` is 5, divisible by neither two nor three, the first two
   ``if`` statements will *not* be executed, but the third will be.

   Therefore, the answer is yes.

   (It is never the case that none of the three ``if`` statements is executed.)
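
   To see which ``if`` statements run for a given value, the three checks can be
   wrapped in a function that collects the messages instead of printing them
   (``fired_branches`` is a name introduced only for this illustration)::

    def fired_branches(number):
        # every if whose condition is true contributes its message
        fired = []
        if number % 2 == 0:
            fired.append("divisible by 2")
        if number % 3 == 0:
            fired.append("divisible by 3")
        if number % 2 != 0 and number % 3 != 0:
            fired.append("dunno")
        return fired

    print(fired_branches(6))
    print(fired_branches(5))
    print(fired_branches(4))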

#. Solution::

    answer = raw_input("sum or product?: ")

    if answer == "sum":
        num1 = int(raw_input("number 1: "))
        num2 = int(raw_input("number 2: "))
        print "the sum is", num1 + num2

    elif answer == "product":
        num1 = int(raw_input("num1: "))
        num2 = int(raw_input("num2: "))
        print "the product is", num1 * num2

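   Using ``if`` instead of ``elif`` for the second condition wouldn't change the behavior
   of the program, since ``answer`` cannot be equal to both ``"sum"`` and ``"product"``.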

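   We can simplify the code by reading the two numbers only once, before the ``if``
   (note that now the numbers are requested even if the answer is neither
   ``"sum"`` nor ``"product"``)::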

    answer = raw_input("sum or product?: ")
    num1 = int(raw_input("number 1: "))
    num2 = int(raw_input("number 2: "))

    if answer == "sum":
        print "the sum is", num1 + num2

    elif answer == "product":
        print "the product is", num1 * num2

Iterative code: ``for`` and ``while``
--------------------------------------

#. Solutions:

   #. Solution::

        for number in range(10):
            print number

   #. Solution::

        for number in range(10):
            print number**2

   #. Solution::

        sum_of_squares = 0
        for number in range(10):
            sum_of_squares = sum_of_squares + number**2
        print sum_of_squares

   #. Solution::

        product = 1 # note that for the product the initial value should be 1!
        for number in range(1,10):
            product = product * number
        print product

   #. Solution::

        volume_of = {
            "A":  67.0, "C":  86.0, "D":  91.0,
            "E": 109.0, "F": 135.0, "G":  48.0,
            "H": 118.0, "I": 124.0, "K": 135.0,
            "L": 124.0, "M": 124.0, "N":  96.0,
            "P":  90.0, "Q": 114.0, "R": 148.0,
            "S":  73.0, "T":  93.0, "V": 105.0,
            "W": 163.0, "Y": 141.0,
        }

        sum_of_volumes = 0
        for volume in volume_of.values():
            sum_of_volumes = sum_of_volumes + volume
        print sum_of_volumes

   #. Solution::

        volume_of = {
            "A":  67.0, "C":  86.0, "D":  91.0,
            "E": 109.0, "F": 135.0, "G":  48.0,
            "H": 118.0, "I": 124.0, "K": 135.0,
            "L": 124.0, "M": 124.0, "N":  96.0,
            "P":  90.0, "Q": 114.0, "R": 148.0,
            "S":  73.0, "T":  93.0, "V": 105.0,
            "W": 163.0, "Y": 141.0,
        }

        fasta = """>1BA4:A|PDBID|CHAIN|SEQUENCE
        DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV"""

        # Let's extract the sequence
        sequence = fasta.split("\n")[1]

        sum_of_volumes = 0

        # for each character in the sequence ...
        for aa in sequence:
            volume_of_aa = volume_of[aa]
            sum_of_volumes = sum_of_volumes + volume_of_aa

        print sum_of_volumes

   #. Solution: let's adapt the code from the previous example::

        list = [1, 25, 6, 27, 57, 12]

        minimum_so_far = list[0]
        for number in list[1:]:
            if number < minimum_so_far:
                minimum_so_far = number

        print "the minimum value is:", minimum_so_far

   #. Solution: let's combine the example and the previous exercise::

        list = [1, 25, 6, 27, 57, 12]

        maximum = list[0]
        minimum = list[0]

        for number in list[1:]:
            if number > maximum:
                maximum = number
            if number < minimum:
                minimum = number

        print "minimum =", minimum, "maximum =", maximum

   #. Solution: ``range(0, len(sequence), 3)`` returns ``[0, 3, 6, 9, ...]``,
      containing the positions of the first character of all the triplets.

      Let's write::

        sequence = "ATGGCGCCCGAACAGGGA"

        # let's start from an empty list
        triplets = []

        for pos_start in range(0, len(sequence), 3):
            triplet = sequence[pos_start:pos_start+3]
            triplets.append(triplet)

        print triplets
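
      The start positions and the resulting slices can be inspected directly;
      the short input ``"ATGGC"`` is made up to show that the last slice is
      simply shorter when the length is not a multiple of 3::

        sequence = "ATGGCGCCCGAACAGGGA"
        starts = list(range(0, len(sequence), 3))
        print(starts)

        short = "ATGGC"
        short_triplets = [short[i:i+3] for i in range(0, len(short), 3)]
        print(short_triplets)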

   #. Solution::

        text = """>2HMI:A|PDBID|CHAIN|SEQUENCE
        PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI
        >2HMI:B|PDBID|CHAIN|SEQUENCE
        PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI
        >2HMI:C|PDBID|CHAIN|SEQUENCE
        DIQMTQTTSSLSASLGDRVTISCSASQDISSYLNWYQQKPEGTVKLLIYY
        >2HMI:D|PDBID|CHAIN|SEQUENCE
        QITLKESGPGIVQPSQPFRLTCTFSGFSLSTSGIGVTWIRQPSGKGLEWL
        >2HMI:E|PDBID|CHAIN|SEQUENCE
        ATGGCGCCCGAACAGGGAC
        >2HMI:F|PDBID|CHAIN|SEQUENCE
        GTCCCTGTTCGGGCGCCA"""

        # first, let's split the text into lines
        lines = text.split("\n")

        # then, let's create an empty dictionary
        sequence_of = {}

        # now we can iterate on lines
        for line in lines:

            if line[0] == ">":
                # if the line is a header, we extract the sequence name
                name = line.split("|")[0]
            else:
                # the line contains the sequence: we add it to the dictionary,
                # using the name extracted from the previous header as key
                sequence_of[name] = line

        print sequence_of

#. Solutions:

   #. Solution::

        while raw_input("write 'STOP': ") != "STOP":
            print "you must write 'STOP'..."

   #. Solution::

        while raw_input("write stop: ").lower() != "stop":
            print "you must write 'stop'..."

#. Solutions:

   #. Solution: all numbers in ``range(10)``.

   #. Solution: the number ``0``. ``break`` immediately interrupts the ``for`` loop.

   #. Solution: all numbers in ``range(10)``. ``continue`` jumps to the next iteration, which is what Python does anyway when it reaches the end of the loop body. Since here ``continue`` is the last instruction of the ``for`` loop, it has no effect.

   #. Solution: the number ``0``. In the first iteration, when ``number`` has value ``0``, Python first executes ``print number``, printing ``0``; then the body of the ``if`` is executed, including the ``break``, which immediately interrupts the ``for`` loop.

   #. Solution: nothing. In the first iteration, when ``number`` has value ``0``, the body of the ``if`` is executed, including the ``break``, which immediately interrupts the ``for`` loop. Therefore, ``print`` is never executed.

   #. Solution: nothing. Instructions inside the ``while`` are never executed, since the condition is ``False``!

   #. Solution: nothing. Instructions inside the ``while`` are never executed, since the condition is ``False``! As a consequence, the line ``condition = True`` is never executed.

   #. Solution: ``"the condition is true"``, an infinite number of times. Since the condition is always ``True``, the ``while`` loop never stops iterating!

   #. Solution: ten strings of the form ``"position 0 contains the element 0"``, ``"position 1 contains the element 1"``, *and so on*.

   #. Solution: all the elements of ``lines`` (processed by ``strip()``) occurring before the first empty line: ``"line 1"``, ``"line 2"`` and ``"line 3"``. As soon as ``line`` has value ``""`` (the fourth element of ``lines``), the ``if`` is executed and ``break`` interrupts the loop. Note that the fourth element is *not* printed.
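
   The difference between ``break`` and ``continue`` can be seen by collecting
   the values that reach the end of the loop body (a sketch; the condition
   ``number == 3`` is arbitrary)::

    with_break = []
    for number in range(10):
        if number == 3:
            break      # leave the loop entirely
        with_break.append(number)

    with_continue = []
    for number in range(10):
        if number == 3:
            continue   # skip only this iteration
        with_continue.append(number)

    print(with_break)
    print(with_continue)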

#. Solution::

    numbers = (0, 1, 1, 0, 0, 0, 1, 1, 2, 1, 2)

    for i in range(len(numbers)):
        number_in_pos_i = numbers[i]

        if number_in_pos_i == 2:
            print "the position is", i
            break

#. Solution::

    strings = ("000", "51", "51", "32", "57", "26")

    for i in range(len(strings)):
        string_in_pos_i = strings[i]

        if "2" in string_in_pos_i:
            print "position =", i, "value =", string_in_pos_i
            break

#. Solution::

    import random
    length = int(raw_input("write the length of the sequence: "))
    alphabet = "AGCT"
    sequence = ""
    for i in range(length):
        index = random.randint(0, 3)
        sequence = sequence + alphabet[index]
    print sequence
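
   ``random.choice()`` picks an element of a sequence directly, so the index
   arithmetic is not strictly needed; calling ``random.seed()`` first makes the
   result the same from one run to the next, which helps when testing (the seed
   ``42`` and the length ``20`` are arbitrary)::

    import random

    random.seed(42)
    alphabet = "AGCT"
    length = 20

    # one random nucleotide per position, joined into a single string
    sequence = "".join(random.choice(alphabet) for i in range(length))
    print(sequence)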

Nested code
-------------

#. Solution::

    n = 5
    matrix = [range(n) for i in range(n)]

    for line in matrix:
        for element in line:
            print element

#. Solution:

   #. All the elements of the matrix.
   #. The *sum* of all the elements of the matrix.
   #. Again, all the elements of the matrix.
   #. Again, all the elements of the matrix.
   #. The list of the elements on the diagonal.

#. Solution::

    numbers = [8, 3, 2, 9, 7, 1, 8]

    for num_1 in numbers:
        for num_2 in numbers:
            print num_1, num_2

   This code is very similar to the clock example!

#. Solution::

    numbers = [8, 3, 2, 9, 7, 1, 8]

    already_printed_pairs = []

    for i in range(len(numbers)):
        for j in range(len(numbers)):

            pair = (numbers[i], numbers[j])

            # check whether we already printed the symmetric pair
            if (pair[1], pair[0]) in already_printed_pairs:
                continue

            # this code is executed only if the symmetric pair has not been printed:
            # print the pair and update already_printed_pairs
            print pair
            already_printed_pairs.append(pair)

#. The solution is the same as for the previous exercise.

#. Solution::

    numbers = range(10)

    for element_1 in numbers:
        for element_2 in numbers:
            if 2 * element_1 == element_2:
                print element_1, element_2

#. Solution::

    numbers = [8, 3, 2, 9, 7, 1, 8]

    for element_1 in numbers:
        for element_2 in numbers:
            if element_1 + element_2 == 10:
                print element_1, element_2

#. Solution::

    numbers = [8, 3, 2, 9, 7, 1, 8]

    # first, let's create an empty list
    list_of_pairs = []

    for element_1 in numbers:
        for element_2 in numbers:
            if element_1 + element_2 == 10:
                # update the list with append()
                list_of_pairs.append((element_1, element_2))

    # finally, let's print the list
    print list_of_pairs

#. Solution::

    numbers_1 = [5, 9, 4, 4, 9, 2]
    numbers_2 = [7, 9, 6, 2]

    # iteration on the *first* list
    for i in range(len(numbers_1)):
        num_in_pos_i = numbers_1[i]

        # iteration on the *second* list
        for j in range(len(numbers_2)):
            num_in_pos_j = numbers_2[j]

            if num_in_pos_i == num_in_pos_j:
                print "positions:", i, j, "; repeated value:", num_in_pos_i

#. Solution::

    numbers_1 = [5, 9, 4, 4, 9, 2]
    numbers_2 = [7, 9, 6, 2]

    # first, let's create an empty list
    list_of_triplets = []

    # iteration on the *first* list
    for i in range(len(numbers_1)):
        num_in_pos_i = numbers_1[i]

        # iteration on the *second* list
        for j in range(len(numbers_2)):
            num_in_pos_j = numbers_2[j]

            if num_in_pos_i == num_in_pos_j:
                # instead of printing, we update the list
                list_of_triplets.append((i, j, num_in_pos_i))

    # finally, let's print the list
    print list_of_triplets

#. Solution::

    n = 5
    matrix = [range(n) for i in range(n)]

    # let's initialize with the first element (any other element would be fine as well)
    max_element_so_far = matrix[0][0]

    # iteration...
    for line in matrix:
        for element in line:
            # update max_element_so_far whenever we find a higher element
            if element > max_element_so_far:
                max_element_so_far = element

    print max_element_so_far
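
   The built-in ``max()`` gives the same result: first the maximum of each row,
   then the maximum of those values (``list(range(n))`` builds the same rows as
   above)::

    n = 5
    matrix = [list(range(n)) for i in range(n)]

    max_element = max(max(line) for line in matrix)
    print(max_element)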

#. Solution::

    sequences = [
        "ATGGCGCCCGAACAGGGA",
        "GTCCCTGTTCGGGCGCCA",
    ]

    # first, let's create an empty list
    result = []

    # iteration
    for sequence in sequences:
        # split the current sequence in triplets
        triplets = []
        for i in range(0, len(sequence), 3):
            triplets.append(sequence[i:i+3])

        # append (*not* extend()!!!) the obtained triplets
        # to the list result
        result.append(triplets)

    # finally, let's print the list
    print result

#. Solution::

    numbers = [5, 9, 4, 4, 9, 2]

    num_occurrences = {}

    for number in numbers:
        if not num_occurrences.has_key(number):
            num_occurrences[number] = 1
        else:
            num_occurrences[number] += 1

   Alternatively::

    numbers = [5, 9, 4, 4, 9, 2]

    num_occurrences = {}

    for number in numbers:
        if not num_occurrences.has_key(number):
            num_occurrences[number] = 0
        num_occurrences[number] += 1

   or, using ``count()``::

    numbers = [5, 9, 4, 4, 9, 2]

    num_occurrences = {}

    for number in numbers:
        if not num_occurrences.has_key(number):
            num_occurrences[number] = numbers.count(number)

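   Note that in the last variant the ``if`` line is optional (but the assignment inside it
   is not!): without the ``if``, the count of a repeated number is simply computed and
   stored again, with the same value.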
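
   The standard library class ``collections.Counter`` performs the same
   counting; the sketch below compares it with the second variant above
   (written with ``not in`` instead of ``has_key()``, which is equivalent)::

    from collections import Counter

    numbers = [5, 9, 4, 4, 9, 2]

    num_occurrences = {}
    for number in numbers:
        if number not in num_occurrences:
            num_occurrences[number] = 0
        num_occurrences[number] += 1

    print(num_occurrences == dict(Counter(numbers)))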
 
#. Solution::

    groups = [["gene1", "gene2"], ["gene3"], [], ["gene4", "gene5"]]

    # let's initialize with the first group
    biggest_group_so_far = groups[0]

    # iteration
    for group in groups[1:]:
        if len(group) > len(biggest_group_so_far):
            biggest_group_so_far = group

    print biggest_group_so_far

#. Solution::

    sequences_2HMI = {
        "A": "PISPIETVPVKLKPGMDGPKVKQWPLTEEKI",
        "B": "PISPIETVPVKLKPGMDGPKVKQWPLTEEKI",
        "C": "DIQMTQTTSSLSASLGDRVTISCSASQDISS",
        "D": "QITLKESGPGIVQPSQPFRLTCTFSGFSLST",
        "E": "ATGGCGCCCGAACAGGGAC",
        "F": "GTCCCTGTTCGGGCGCCA",
    }

    # let's start with an empty dictionary
    histograms = {}

    for key, sequence in sequences_2HMI.items():

        # let's associate this key to an empty dictionary
        histograms[key] = {}

        for residue in sequence:
            if not histograms[key].has_key(residue):
                histograms[key][residue] = 1
            else:
                histograms[key][residue] += 1

    # let's print the result
    print histograms

    # let's print the result more clearly
    for key, histogram in histograms.items():
        print key
        print histogram
        print ""

#. Solution::

    table = [
        "protein domain start end",
        "YNL275W PF00955 236 498",
        "YHR065C SM00490 335 416",
        "YKL053C-A PF05254 5 72",
        "YOR349W PANTHER 353 414",
    ]

    # as before, first let's extract column names from the first row
    column_names = table[0].split()

    # let's start from an empty list
    lines_as_dictionaries = []

    # now, let's iterate on the other rows
    for line in table[1:]:

        # let's compile the dictionary for this row
        dictionary = {}
        words = line.split()
        for i in range(len(words)):

            # extract the corresponding word
            word = words[i]

            # extract the corresponding column name
            column_name = column_names[i]

            # update the dictionary
            dictionary[column_name] = word

        # having compiled the dictionary for this line,
        # we can update the list
        lines_as_dictionaries.append(dictionary)

    # finished! now let's print the result (one row at a time,
    # to make it easier to read)
    for row in lines_as_dictionaries:
        print row
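
   Since ``column_names`` and ``words`` have the same length, the inner loop can
   also be written with ``zip()``, which pairs the elements of the two lists
   position by position (shown here on the first data row only)::

    column_names = "protein domain start end".split()
    words = "YNL275W PF00955 236 498".split()

    # dict() accepts an iterable of (key, value) pairs
    dictionary = dict(zip(column_names, words))
    print(dictionary["domain"])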

#. Solution::

    alphabet_lo = "abcdefghijklmnopqrstuvwxyz"
    alphabet_up = alphabet_lo.upper()

    # let's build the dictionary
    lo_to_up = {}
    for i in range(len(alphabet_lo)):
        lo_to_up[alphabet_lo[i]] = alphabet_up[i]


    string = "I am a string"

    # let's convert the string
    converted_chars = []
    for character in string:
        if lo_to_up.has_key(character):
            # convert the alphabetic character
            converted_chars.append(lo_to_up[character])
        else:
            # we don't convert it (e.g., it's not an alphabetic character)
            converted_chars.append(character)
    converted_string = "".join(converted_chars)

    print converted_string
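
   For ASCII letters the result should match the string method ``upper()``.
   A quick consistency check, rebuilding the dictionary with ``zip()`` and
   using ``dict.get()`` to leave other characters unchanged::

    alphabet_lo = "abcdefghijklmnopqrstuvwxyz"
    lo_to_up = dict(zip(alphabet_lo, alphabet_lo.upper()))

    string = "I am a string"
    # get(c, c) returns c itself when c is not a lowercase letter
    converted_string = "".join([lo_to_up.get(c, c) for c in string])
    print(converted_string == string.upper())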

#. Solution::

    lines_1 = open(raw_input("path 1: ")).readlines()
    lines_2 = open(raw_input("path 2: ")).readlines()

    # we have to be careful, since the two files could be of different lengths!
    max_lines = len(lines_1)
    if len(lines_2) > max_lines:
        max_lines = len(lines_2)

    # iteration on the lines of both files
    for i in range(max_lines):

        # take the i-th line of the first file, if it exists
        if i < len(lines_1):
            line_1 = lines_1[i].strip()
        else:
            line_1 = ""

        # take the i-th line of the second file, if it exists
        if i < len(lines_2):
            line_2 = lines_2[i].strip()
        else:
            line_2 = ""

        print line_1 + " " + line_2
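
   The same logic can be tried without files by applying it to two lists of
   lines (``paste_lines`` is a hypothetical helper name)::

    def paste_lines(lines_1, lines_2):
        # a missing line in the shorter list is replaced by an empty string,
        # as in the solution above
        max_lines = max(len(lines_1), len(lines_2))
        result = []
        for i in range(max_lines):
            line_1 = lines_1[i].strip() if i < len(lines_1) else ""
            line_2 = lines_2[i].strip() if i < len(lines_2) else ""
            result.append(line_1 + " " + line_2)
        return result

    print(paste_lines(["a\n", "b\n", "c\n"], ["1\n"]))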

#. Solution::

    # let's read the fasta file
    fasta_as_dictionary = {}
    for line in open("data/dna-fasta/fasta.1").readlines():

        # let's remove the trailing newline (and any surrounding whitespace)
        line = line.strip()

        if line.startswith(">"):
            header = line
            fasta_as_dictionary[header] = ""

        else:
            fasta_as_dictionary[header] += line

    # let's iterate on header-sequence pairs
    for header, sequence in fasta_as_dictionary.items():

        print "processing", header

        # let's count the number of occurrences of each nucleotide
        count = {}
        for nucleotide in ("A", "C", "G", "T"):
            count[nucleotide] = sequence.count(nucleotide)
        print "nucleotide occurrences:", count

        # calculate gc-content
        gc_content = (count["G"] + count["C"]) / float(len(sequence))
        print "GC content:", gc_content

        # calculate the AT/GC-ratio
        sum_at = count["A"] + count["T"]
        sum_cg = count["C"] + count["G"]
        at_gc_ratio = float(sum_at) / float(sum_cg)
        print "AT/GC-ratio:", at_gc_ratio
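
   As a worked example of the two formulas, take the made-up sequence
   ``"ATGCGC"``: it contains one ``A``, one ``T``, two ``G`` and two ``C``, so the
   GC content is (2 + 2) / 6 ≈ 0.667 and the AT/GC ratio is (1 + 1) / (2 + 2) = 0.5.
   The same computation as two small functions (``gc_content`` and
   ``at_gc_ratio`` are illustrative names)::

    def gc_content(sequence):
        # fraction of G and C over the whole sequence length
        return (sequence.count("G") + sequence.count("C")) / float(len(sequence))

    def at_gc_ratio(sequence):
        # (A + T) / (G + C); raises ZeroDivisionError if there is no G or C
        sum_at = sequence.count("A") + sequence.count("T")
        sum_cg = sequence.count("C") + sequence.count("G")
        return float(sum_at) / float(sum_cg)

    print(gc_content("ATGCGC"))
    print(at_gc_ratio("ATGCGC"))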