=======================================
Python: complex statements (Solutions)
=======================================

Conditional code: ``if``
---------------------------

#. Solution::

       number = int(raw_input("write a number: "))

       if number % 2 == 0:
           print "even"
       else:
           print "odd"

   We use ``else`` because even and odd are the only two possibilities.
   A way to make a third option explicit would be::

       if number % 2 == 0:
           print "even"
       elif number % 2 == 1:
           print "odd"
       else:
           print "impossible!"

   but the code in the ``else`` will never be executed, whatever the value
   of ``number``! Since the two options are mutually exclusive, we can also
   write::

       if number % 2 == 0:
           print "even"
       if number % 2 == 1:
           print "odd"

   Even without the ``else``, one and only one of the two ``if`` statements
   is executed.

#. Solution::

       number = float(raw_input("write rational: "))

       if number >= -1 and number <= 1:
           print "okay"

   We need neither ``elif`` (there is only one condition) nor ``else`` (if
   the condition is false, there is nothing to do).

#. Solution::

       answer = raw_input("write two numbers separated by a space: ")

       words = answer.split()
       num1 = int(words[0])
       num2 = int(words[1])

       if num1 > num2:
           print "first"
       elif num2 > num1:
           print "second"
       else:
           print "neither"

   Alternatively::

       answer = raw_input("write two numbers separated by a space: ")

       numbers = [int(word) for word in answer.split()]

       if numbers[0] > numbers[1]:
           print "first"
       elif numbers[0] < numbers[1]:
           print "second"
       else:
           print "neither"

#. Solution::

       horoscope_of = {
           "January": "extreme luck",
           "February": "try to be born again",
           "March": "kissed by fortune",
           "April": "lucky luke",
       }

       month = raw_input("tell me your birth month: ")

       if horoscope_of.has_key(month):
           print horoscope_of[month]
       else:
           print "not available"

#.
   Solution::

       path = raw_input("write your path: ")

       lines = open(path, "r").readlines()
       if len(lines) == 0:
           print "empty"
       elif len(lines) < 100:
           print "short", len(lines)
       elif len(lines) < 1000:
           print "average", len(lines)
       else:
           print "large", len(lines)

   Note that we don't have to spell out the conditions in full: in the code
   we can shorten ``100 <= len(lines) < 1000`` to ``len(lines) < 1000``.
   This works because when ``len(lines)`` is lower than ``100`` the first
   ``elif`` is executed, and the second ``elif`` is not even considered.

#. Solution::

       point1 = [float(word) for word in raw_input("write three coordinates: ").split()]
       point2 = [float(word) for word in raw_input("write three coordinates: ").split()]

       if point1[0] >= 0 and point1[1] >= 0 and point1[2] >= 0 and \
          point2[0] >= 0 and point2[1] >= 0 and point2[2] >= 0:
           diff_x = point1[0] - point2[0]
           diff_y = point1[1] - point2[1]
           diff_z = point1[2] - point2[2]

           print "the distance is", (diff_x**2 + diff_y**2 + diff_z**2)**0.5

   Note that ``print`` is *inside* the ``if``.

#. Solution: we know that ``number`` is an arbitrary integer, chosen by the
   user::

       if number % 3 == 0:
           print "divisible by 3"
       elif number % 3 != 0:
           print "not divisible by 3"
       else:
           print "dunno"

   ``if``, ``elif`` and ``else`` form a chain: only one of them is executed.

   #. The ``if`` is executed if and only if ``number`` is divisible by three.
   #. The ``elif`` is executed if and only if the previous ``if`` is not
      executed and ``number`` is *not* divisible by three.
   #. The ``else`` is executed whenever neither the ``if`` nor the ``elif``
      is executed.

   Since every integer is either divisible by ``3`` or not, there is no
   other possibility: the ``else`` will *never* be executed. Therefore, the
   answer is no.

#. Solution: as before, ``number`` is an arbitrary integer.
   The code is::

       number = int(raw_input("write a number: "))

       if number % 2 == 0:
           print "divisible by 2"
       if number % 3 == 0:
           print "divisible by 3"
       if number % 2 != 0 and number % 3 != 0:
           print "dunno"

   Here there is no "chain" of ``if``, ``elif`` and ``else``: we have three
   independent ``if`` statements.

   #. The first ``if`` is executed if and only if ``number`` is divisible by
      two.
   #. The second ``if`` is executed if and only if ``number`` is divisible by
      three.
   #. The third ``if`` is executed if and only if ``number`` is divisible by
      neither two nor three.

   If ``number`` is 6, which is divisible by both two and three, the first
   two ``if`` statements are both executed, while the third is not. If
   ``number`` is 5, which is divisible by neither two nor three, the first
   two ``if`` statements are *not* executed, but the third is. Therefore,
   the answer is yes.

   (On the other hand, there is no value of ``number`` for which *none* of
   the three ``if`` statements is executed.)

#. Solution::

       answer = raw_input("sum or product?: ")

       if answer == "sum":
           num1 = int(raw_input("number 1: "))
           num2 = int(raw_input("number 2: "))
           print "the sum is", num1 + num2

       elif answer == "product":
           num1 = int(raw_input("number 1: "))
           num2 = int(raw_input("number 2: "))
           print "the product is", num1 * num2

   Using ``if`` instead of ``elif`` for the second condition would not
   change the behaviour of the program, since the two conditions are
   mutually exclusive. We can simplify the code by reading the two numbers
   only once, before the ``if``::

       answer = raw_input("sum or product?: ")
       num1 = int(raw_input("number 1: "))
       num2 = int(raw_input("number 2: "))

       if answer == "sum":
           print "the sum is", num1 + num2
       elif answer == "product":
           print "the product is", num1 * num2


Iterative code: ``for`` and ``while``
--------------------------------------

#. Solutions:

   #. Solution::

          for number in range(10):
              print number

   #. Solution::

          for number in range(10):
              print number**2

   #. Solution::

          sum_of_squares = 0
          for number in range(10):
              sum_of_squares = sum_of_squares + number**2
          print sum_of_squares

   #. Solution::

          product = 1  # the initial value of a product must be 1, not 0!
          for number in range(1, 10):
              product = product * number
          print product

#. Solution::

       volume_of = {
           "A":  67.0, "C":  86.0, "D":  91.0, "E": 109.0, "F": 135.0,
           "G":  48.0, "H": 118.0, "I": 124.0, "K": 135.0, "L": 124.0,
           "M": 124.0, "N":  96.0, "P":  90.0, "Q": 114.0, "R": 148.0,
           "S":  73.0, "T":  93.0, "V": 105.0, "W": 163.0, "Y": 141.0,
       }

       sum_of_volumes = 0
       for volume in volume_of.values():
           sum_of_volumes = sum_of_volumes + volume
       print sum_of_volumes

#. Solution::

       volume_of = {
           "A":  67.0, "C":  86.0, "D":  91.0, "E": 109.0, "F": 135.0,
           "G":  48.0, "H": 118.0, "I": 124.0, "K": 135.0, "L": 124.0,
           "M": 124.0, "N":  96.0, "P":  90.0, "Q": 114.0, "R": 148.0,
           "S":  73.0, "T":  93.0, "V": 105.0, "W": 163.0, "Y": 141.0,
       }

       fasta = """>1BA4:A|PDBID|CHAIN|SEQUENCE
       DAEFRHDSGYEVHHQKLVFFAEDVGSNKGAIIGLMVGGVV"""

       # let's extract the sequence
       sequence = fasta.split("\n")[1]

       sum_of_volumes = 0

       # for each character in the sequence...
       for aa in sequence:
           volume_of_aa = volume_of[aa]
           sum_of_volumes = sum_of_volumes + volume_of_aa

       print sum_of_volumes

#. Solution: let's adapt the code from the previous example::

       numbers = [1, 25, 6, 27, 57, 12]

       minimum_so_far = numbers[0]
       for number in numbers[1:]:
           if number < minimum_so_far:
               minimum_so_far = number

       print "the minimum value is:", minimum_so_far

#. Solution: let's combine the example and the previous exercise::

       numbers = [1, 25, 6, 27, 57, 12]

       maximum = numbers[0]
       minimum = numbers[0]
       for number in numbers[1:]:
           if number > maximum:
               maximum = number
           if number < minimum:
               minimum = number

       print "minimum =", minimum, "maximum =", maximum

#. Solution: ``range(0, len(sequence), 3)`` returns ``[0, 3, 6, 9, ...]``,
   the positions of the first character of each triplet. Let's write::

       sequence = "ATGGCGCCCGAACAGGGA"

       # let's start from an empty list
       triplets = []

       for pos_start in range(0, len(sequence), 3):
           triplet = sequence[pos_start:pos_start+3]
           triplets.append(triplet)

       print triplets

#.
   Solution::

       text = """>2HMI:A|PDBID|CHAIN|SEQUENCE
       PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI
       >2HMI:B|PDBID|CHAIN|SEQUENCE
       PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI
       >2HMI:C|PDBID|CHAIN|SEQUENCE
       DIQMTQTTSSLSASLGDRVTISCSASQDISSYLNWYQQKPEGTVKLLIYY
       >2HMI:D|PDBID|CHAIN|SEQUENCE
       QITLKESGPGIVQPSQPFRLTCTFSGFSLSTSGIGVTWIRQPSGKGLEWL
       >2HMI:E|PDBID|CHAIN|SEQUENCE
       ATGGCGCCCGAACAGGGAC
       >2HMI:F|PDBID|CHAIN|SEQUENCE
       GTCCCTGTTCGGGCGCCA"""

       # first, let's split the text into lines
       lines = text.split("\n")

       # then, let's create an empty dictionary
       sequence_of = {}

       # now we can iterate over the lines
       for line in lines:

           if line[0] == ">":
               # if the line is a header, we extract the sequence name
               name = line.split("|")[0]
           else:
               # the line contains the sequence: we add it to the
               # dictionary, using the name extracted before as key
               sequence_of[name] = line

       print sequence_of

#. Solutions:

   #. Solution::

          while raw_input("write 'STOP': ") != "STOP":
              print "you must write 'STOP'..."

   #. Solution::

          while raw_input("write stop: ").lower() != "stop":
              print "you must write 'stop'..."

#. Solutions:

   #. Solution: all the numbers in ``range(10)``.

   #. Solution: the number ``0``. ``break`` immediately interrupts the
      ``for`` loop.

   #. Solution: all the numbers in ``range(10)``. ``continue`` jumps to the
      next iteration, which is what Python does anyway when it reaches the
      end of the body of the ``for`` loop. Since here ``continue`` is right
      at the end of the body, it has no effect.

   #. Solution: the number ``0``. In the first iteration, when ``number``
      has value ``0``, Python first executes ``print number``, printing
      ``0``; then it executes the ``if`` and the ``break`` inside it, which
      immediately interrupts the ``for`` loop.

   #. Solution: nothing. In the first iteration, when ``number`` has value
      ``0``, the ``if`` is executed, and so is the ``break`` inside it,
      which immediately interrupts the ``for`` loop. Therefore, ``print`` is
      never executed.

   #. Solution: nothing.
      The instructions inside the ``while`` are never executed, since the
      condition is ``False``!

   #. Solution: nothing. The instructions inside the ``while`` are never
      executed, since the condition is ``False``! As a consequence, the line
      ``condition = True`` is never executed either.

   #. Solution: ``"the condition is true"``, an infinite number of times.
      Since the condition is always ``True``, the ``while`` never stops
      iterating!

   #. Solution: ten strings of the form
      ``"position 0 contains the element 0"``,
      ``"position 1 contains the element 1"``, *and so on*.

   #. Solution: all the elements of ``lines`` (processed by ``strip()``)
      that occur before the first empty line: ``"line 1"``, ``"line 2"``
      and ``"line 3"``. As soon as ``line`` has value ``""`` (the fourth
      element of ``lines``), the ``if`` is executed and ``break`` interrupts
      the loop. Note that the fourth (empty) line is *not* printed.

#. Solution::

       numbers = (0, 1, 1, 0, 0, 0, 1, 1, 2, 1, 2)

       for i in range(len(numbers)):
           number_in_pos_i = numbers[i]

           if number_in_pos_i == 2:
               print "the position is", i
               break

#. Solution::

       strings = ("000", "51", "51", "32", "57", "26")

       for i in range(len(strings)):
           string_in_pos_i = strings[i]

           if "2" in string_in_pos_i:
               print "position =", i, "value =", string_in_pos_i
               break

#. Solution::

       import random

       length = int(raw_input("write the length of the sequence: "))

       alphabet = "AGCT"

       sequence = ""
       for i in range(length):
           index = random.randint(0, 3)
           sequence = sequence + alphabet[index]

       print sequence


Nested code
-------------

#. Solution::

       n = 5
       matrix = [range(n) for i in range(n)]

       for line in matrix:
           for element in line:
               print element

#. Solution:

   #. All the elements of the matrix.
   #. The *sum* of all the elements of the matrix.
   #. Again, all the elements of the matrix.
   #. Again, all the elements of the matrix.
   #. The list of the elements on the diagonal.

#. Solution::

       numbers = [8, 3, 2, 9, 7, 1, 8]

       for num_1 in numbers:
           for num_2 in numbers:
               print num_1, num_2

   This code is very similar to the clock example!
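
   To work with the pairs rather than just print them, we can collect them
   in a list. What follows is a minimal sketch: the function name
   ``all_pairs`` is our own (hypothetical, not part of the exercise), and
   ``print(...)`` is used with a single argument so that the code runs under
   both Python 2 and Python 3. As a cross-check, the standard library
   function ``itertools.product`` yields the same ordered pairs::

       import itertools

       def all_pairs(numbers):
           """Return every ordered pair (num_1, num_2), as the nested loops do."""
           pairs = []
           for num_1 in numbers:        # outer loop: once per element
               for num_2 in numbers:    # inner loop: runs completely at each outer step
                   pairs.append((num_1, num_2))
           return pairs

       numbers = [8, 3, 2, 9, 7, 1, 8]
       pairs = all_pairs(numbers)
       print(len(pairs))

       # the same pairs, in the same order, from the standard library
       print(pairs == list(itertools.product(numbers, repeat=2)))

   With ``n`` elements the inner body runs ``n * n`` times, so for the seven
   numbers above we get 49 pairs (including repeated ones, since ``8``
   appears twice in the list).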
#. Solution::

       numbers = [8, 3, 2, 9, 7, 1, 8]

       already_printed_pairs = []

       for i in range(len(numbers)):
           for j in range(len(numbers)):

               pair = (numbers[i], numbers[j])

               # check whether we already printed the symmetric pair
               if (pair[1], pair[0]) in already_printed_pairs:
                   continue

               # this code is executed only if the pair has not been printed:
               # print the pair and update already_printed_pairs
               print pair
               already_printed_pairs.append(pair)

#. The solution is the same as in the previous exercise.

#. Solution::

       numbers = range(10)

       for element_1 in numbers:
           for element_2 in numbers:
               if 2 * element_1 == element_2:
                   print element_1, element_2

#. Solution::

       numbers = [8, 3, 2, 9, 7, 1, 8]

       for element_1 in numbers:
           for element_2 in numbers:
               if element_1 + element_2 == 10:
                   print element_1, element_2

#. Solution::

       numbers = [8, 3, 2, 9, 7, 1, 8]

       # first, let's create an empty list
       list_of_pairs = []

       for element_1 in numbers:
           for element_2 in numbers:
               if element_1 + element_2 == 10:
                   # update the list with append()
                   list_of_pairs.append((element_1, element_2))

       # finally, let's print the list
       print list_of_pairs

#. Solution::

       numbers_1 = [5, 9, 4, 4, 9, 2]
       numbers_2 = [7, 9, 6, 2]

       # iteration over the *first* list
       for i in range(len(numbers_1)):
           num_in_pos_i = numbers_1[i]

           # iteration over the *second* list
           for j in range(len(numbers_2)):
               num_in_pos_j = numbers_2[j]

               if num_in_pos_i == num_in_pos_j:
                   print "positions:", i, j, "; repeated value:", num_in_pos_i

#. Solution::

       numbers_1 = [5, 9, 4, 4, 9, 2]
       numbers_2 = [7, 9, 6, 2]

       # first, let's create an empty list
       list_of_triplets = []

       # iteration over the *first* list
       for i in range(len(numbers_1)):
           num_in_pos_i = numbers_1[i]

           # iteration over the *second* list
           for j in range(len(numbers_2)):
               num_in_pos_j = numbers_2[j]

               if num_in_pos_i == num_in_pos_j:
                   # instead of printing, we update the list
                   list_of_triplets.append((i, j, num_in_pos_i))

       # finally, let's print the list
       print list_of_triplets

#.
   Solution::

       n = 5
       matrix = [range(n) for i in range(n)]

       # let's initialize with the first element (any other element would be
       # fine as well)
       max_element_so_far = matrix[0][0]

       # iteration...
       for line in matrix:
           for element in line:
               # update max_element_so_far when we find a higher element
               if element > max_element_so_far:
                   max_element_so_far = element

       print max_element_so_far

#. Solution::

       sequences = [
           "ATGGCGCCCGAACAGGGA",
           "GTCCCTGTTCGGGCGCCA",
       ]

       # first, let's create an empty list
       result = []

       # iteration
       for sequence in sequences:
           # split the current sequence into triplets
           triplets = []
           for i in range(0, len(sequence), 3):
               triplets.append(sequence[i:i+3])

           # append (*not* extend()!) the obtained list of triplets
           # to the list result
           result.append(triplets)

       # finally, let's print the list
       print result

#. Solution::

       numbers = [5, 9, 4, 4, 9, 2]

       num_occurrences = {}

       for number in numbers:
           if not num_occurrences.has_key(number):
               num_occurrences[number] = 1
           else:
               num_occurrences[number] += 1

   Alternatively::

       numbers = [5, 9, 4, 4, 9, 2]

       num_occurrences = {}

       for number in numbers:
           if not num_occurrences.has_key(number):
               num_occurrences[number] = 0

           num_occurrences[number] += 1

   or, using ``count()``::

       numbers = [5, 9, 4, 4, 9, 2]

       num_occurrences = {}

       for number in numbers:
           if not num_occurrences.has_key(number):
               num_occurrences[number] = numbers.count(number)

   Note that in the last variant the ``if`` line is optional, but the
   assignment that follows it is not: without the ``if``, the count of a
   repeated number is simply computed again and overwritten with the same
   value.

#. Solution::

       groups = [["gene1", "gene2"], ["gene3"], [], ["gene4", "gene5"]]

       # let's initialize with the first group
       biggest_group_so_far = groups[0]

       # iteration
       for group in groups[1:]:
           if len(group) > len(biggest_group_so_far):
               biggest_group_so_far = group

       print biggest_group_so_far

#.
   Solution::

       sequences_2HMI = {
           "A": "PISPIETVPVKLKPGMDGPKVKQWPLTEEKI",
           "B": "PISPIETVPVKLKPGMDGPKVKQWPLTEEKI",
           "C": "DIQMTQTTSSLSASLGDRVTISCSASQDISS",
           "D": "QITLKESGPGIVQPSQPFRLTCTFSGFSLST",
           "E": "ATGGCGCCCGAACAGGGAC",
           "F": "GTCCCTGTTCGGGCGCCA",
       }

       # let's start with an empty dictionary
       histograms = {}

       for key, sequence in sequences_2HMI.items():

           # let's associate this key with an empty dictionary
           histograms[key] = {}

           for residue in sequence:
               if not histograms[key].has_key(residue):
                   histograms[key][residue] = 1
               else:
                   histograms[key][residue] += 1

       # let's print the result
       print histograms

       # let's print the result more clearly
       for key, histogram in histograms.items():
           print key
           print histogram
           print ""

#. Solution::

       table = [
           "protein domain start end",
           "YNL275W PF00955 236 498",
           "YHR065C SM00490 335 416",
           "YKL053C-A PF05254 5 72",
           "YOR349W PANTHER 353 414",
       ]

       # as before, first let's extract the column names from the first row
       column_names = table[0].split()

       # let's start from an empty list
       lines_as_dictionaries = []

       # now, let's iterate over the other rows
       for line in table[1:]:

           # let's fill in the dictionary for this row
           dictionary = {}
           words = line.split()
           for i in range(len(words)):

               # extract the corresponding word
               word = words[i]

               # extract the corresponding column name
               column_name = column_names[i]

               # update the dictionary
               dictionary[column_name] = word

           # having filled in the dictionary for this row,
           # we can update the list
           lines_as_dictionaries.append(dictionary)

       # finished! now let's print the result (one row at a time,
       # to make it easier to read)
       for row in lines_as_dictionaries:
           print row

#.
   Solution::

       alphabet_lo = "abcdefghijklmnopqrstuvwxyz"
       alphabet_up = alphabet_lo.upper()

       # let's build the dictionary
       lo_to_up = {}
       for i in range(len(alphabet_lo)):
           lo_to_up[alphabet_lo[i]] = alphabet_up[i]

       string = "I am a string"

       # let's convert the string
       converted_chars = []
       for character in string:
           if lo_to_up.has_key(character):
               # convert the lowercase character
               converted_chars.append(lo_to_up[character])
           else:
               # don't convert it (e.g., it is not a lowercase letter)
               converted_chars.append(character)
       converted_string = "".join(converted_chars)

       print converted_string

#. Solution::

       lines_1 = open(raw_input("path 1: ")).readlines()
       lines_2 = open(raw_input("path 2: ")).readlines()

       # we have to be careful, since the two files could have different
       # lengths!
       max_lines = len(lines_1)
       if len(lines_2) > max_lines:
           max_lines = len(lines_2)

       # iteration over the lines of both files
       for i in range(max_lines):

           # take the i-th line of the first file, if it exists
           if i < len(lines_1):
               line_1 = lines_1[i].strip()
           else:
               line_1 = ""

           # take the i-th line of the second file, if it exists
           if i < len(lines_2):
               line_2 = lines_2[i].strip()
           else:
               line_2 = ""

           print line_1 + " " + line_2

#.
   Solution::

       # let's read the fasta file
       fasta_as_dictionary = {}
       for line in open("data/dna-fasta/fasta.1").readlines():

           # let's remove the trailing newline
           line = line.strip()

           # skip empty lines (line[0] would fail on them)
           if len(line) == 0:
               continue

           if line[0] == ">":
               header = line
               fasta_as_dictionary[header] = ""
           else:
               fasta_as_dictionary[header] += line

       # let's iterate over the header-sequence pairs
       for header, sequence in fasta_as_dictionary.items():

           print "processing", header

           # let's count the number of occurrences of each nucleotide
           count = {}
           for nucleotide in ("A", "C", "G", "T"):
               count[nucleotide] = sequence.count(nucleotide)
           print "nucleotide occurrences:", count

           # calculate the GC content
           gc_content = (count["G"] + count["C"]) / float(len(sequence))
           print "GC content:", gc_content

           # calculate the AT/GC-ratio
           sum_at = count["A"] + count["T"]
           sum_cg = count["C"] + count["G"]
           at_gc_ratio = float(sum_at) / float(sum_cg)
           print "AT/GC-ratio:", at_gc_ratio
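
   To check these formulas without a FASTA file, we can move them into a
   function that works on a single sequence. This is a minimal sketch under
   a few assumptions of ours: the name ``nucleotide_stats`` is hypothetical,
   the sequence is assumed to be non-empty, and returning ``None`` as the
   AT/GC-ratio when there is no ``G`` or ``C`` is our own convention (the
   code above would raise ``ZeroDivisionError`` in that case). It uses
   ``print(...)`` with a single argument, so it runs under both Python 2 and
   Python 3::

       def nucleotide_stats(sequence):
           """Return (count, gc_content, at_gc_ratio) for a non-empty DNA sequence."""
           # occurrences of each nucleotide, as in the solution above
           count = {}
           for nucleotide in ("A", "C", "G", "T"):
               count[nucleotide] = sequence.count(nucleotide)

           # fraction of the sequence made of G and C
           gc_content = (count["G"] + count["C"]) / float(len(sequence))

           sum_at = count["A"] + count["T"]
           sum_cg = count["C"] + count["G"]
           if sum_cg == 0:
               # hypothetical convention: the ratio is undefined without G or C
               at_gc_ratio = None
           else:
               at_gc_ratio = float(sum_at) / sum_cg

           return count, gc_content, at_gc_ratio

       count, gc_content, at_gc_ratio = nucleotide_stats("ATGGCGCCCGAACAGGGA")
       print(count)
       print(gc_content)
       print(at_gc_ratio)

   For this 18-nucleotide sequence there are 12 ``G``/``C`` and 6 ``A``/``T``,
   so the GC content is 12/18 (about 0.67) and the AT/GC-ratio is 0.5.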