Python: Strings (Solutions)

Note

Later on in the solutions, I will sometimes use the backslash character \ at the end of a line.

When used this way, \ tells Python that the command continues on the following line, which makes it possible to split long commands over multiple lines.
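
As a minimal sketch of this (the sample string and variable names are made up for illustration), the two assignments below are equivalent. The second relies on the fact that Python also continues a line automatically inside open parentheses. The example uses print(...) so that it runs under both Python 2 and Python 3:

```python
# A long condition split over two lines with a trailing backslash.
text = "xx hypothetical example x"
with_backslash = text.startswith("xx") and \
                 text.endswith("x")

# The same condition wrapped in parentheses: no backslash is needed.
with_parentheses = (text.startswith("xx") and
                    text.endswith("x"))

print(with_backslash == with_parentheses)   # True
```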

  1. Solutions:

    1. Solution:

      #        12345
      text = "     "
      print text
      print len(text)
      
    2. Solution:

      at_least_one_space = " " in text
      
      # check whether it works
      print " " in "nospaceatallhere"
      print " " in "onlyonespacehere--> <--"
      print " " in "more spaces in here"
      
    3. Solution:

      exactly_5_characters = len(text) == 5
      
      # check whether it works
      print len("1234") == 5
      print len("12345") == 5
      print len("123456") == 5
      
    4. Solution:

      empty_string = ""
      print len(empty_string) == 0
      
    5. Solution:

      base = "Python is great"
      repeats = base * 100
      
      # check whether the length is correct
      print len(repeats) == len(base) * 100
      
    6. Solution:

      part_1 = "but cell"
      part_2 = "biology"
      part_3 = "is way better"
      
      text = (part_1 + part_2 + part_3) * 1000
      
    7. Let’s try this:

      start_with_1 = "12345".startswith(1)
      

      but Python gives an error message:

      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      TypeError: startswith first arg must be str, unicode, or tuple, not int
      #                     ^^^^^^^^^^^^^^^^^^^^^                     ^^^^^^^
      

      The error message (see the highlighted parts) says that startswith() requires its argument to be a string, not an int. In our case the argument, 1, is an int.

      The solution is:

      start_with_1 = "12345".startswith("1")
      print start_with_1
      

      the value is True, as expected.
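
      The error message also mentions a tuple: startswith() accepts a tuple of strings and returns True if the string starts with any of them. A small sketch (the variable name digits is only illustrative):

```python
digits = "12345"

# A single string prefix.
print(digits.startswith("1"))            # True

# A tuple of candidate prefixes: True if any of them matches.
print(digits.startswith(("9", "1")))     # True
print(digits.startswith(("8", "9")))     # False

# To compare against a number, convert it to a string first.
print(digits.startswith(str(1)))         # True
```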

    8. Solution:

      string = "\\"
      string
      print string
      print len(string)                  # 1
      

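      Note that r"\" is not an alternative: a raw string literal cannot end with a single backslash, so Python rejects it with a syntax error. An alternative that does work:

      string = chr(92)                   # 92 is the character code of the backslash
      string
      print string
      print len(string)                  # 1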
      
    9. Already checked before, the answer is no. Anyway:

      backslash = "\\"
      
      print backslash*2 in "\\"           # False
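
      A short sketch of how backslash escapes behave (the variable names are only illustrative). Keep in mind that a raw string literal cannot end with an odd number of backslashes, so a lone backslash is written here as "\\":

```python
one = "\\"          # escaped backslash: a single character
two = r"\\"         # raw string: both backslashes are kept
also_one = chr(92)  # 92 is the character code of the backslash

print(len(one))            # 1
print(len(two))            # 2
print(one == also_one)     # True
print(one * 2 == two)      # True
print(one * 2 in one)      # False
```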
      
    10. First method:

      backslash = "\\"
      
      condition = text.startswith(backslash) or \
                   text.endswith(backslash)
      

      Second method:

      condition = (text[0] == backslash) or \
                   (text[-1] == backslash)
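
      The two methods differ when text is empty: startswith() and endswith() return False, whereas text[0] raises an IndexError. A hedged sketch of a slice-based variant that is safe in both cases (the function name and the sample strings are made up):

```python
backslash = "\\"

def has_backslash_at_edge(text):
    # text[:1] and text[-1:] are empty strings when text is empty,
    # so no IndexError can occur here.
    return text[:1] == backslash or text[-1:] == backslash

print(has_backslash_at_edge("\\begin"))   # True
print(has_backslash_at_edge("end\\"))     # True
print(has_backslash_at_edge("mid\\dle"))  # False
print(has_backslash_at_edge(""))          # False
```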
      
    11. Solution:

      condition = \
           text.startswith("xxx") or \
          (text.startswith("xx") and text.endswith("x")) or \
          (text.startswith("x")  and text.endswith("xx")) or \
                                      text.endswith("xxx")
      

      It is worth checking the condition against the examples provided in the exercise.
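
      One way to run such a check is to wrap the condition in a function. The test strings below are hypothetical stand-ins, not the examples from the exercise:

```python
def three_x_at_edges(text):
    # Same condition as above, wrapped in a function for reuse.
    return (text.startswith("xxx") or
            (text.startswith("xx") and text.endswith("x")) or
            (text.startswith("x") and text.endswith("xx")) or
            text.endswith("xxx"))

print(three_x_at_edges("xxxabc"))   # True
print(three_x_at_edges("xxabcx"))   # True
print(three_x_at_edges("xabcxx"))   # True
print(three_x_at_edges("abcxxx"))   # True
print(three_x_at_edges("xabcx"))    # False
print(three_x_at_edges("abc"))      # False
```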

  2. Solution:

    s = "0123456789"
    print len(s)                        # 10
    

    Which of the following extractions are correct?

    1. s[9]: correct, extracts the last character.
    2. s[10]: invalid.
    3. s[:10]: correct, extracts all characters (remember that the second index, 10 in this case, is exclusive).
    4. s[1000]: invalid.
    5. s[0]: correct, extracts the first character.
    6. s[-1]: correct, extracts the last character.
    7. s[1:5]: correct, extracts from the 2nd to the 5th character.
    8. s[-1:-5]: correct, but nothing is extracted (indexes are inverted!)
    9. s[-5:-1]: correct, extracts from the 6th to the 9th character.
    10. s[-1000]: invalid.
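
    The answers can be verified directly. A small sketch that also shows the difference between out-of-range indexing, which fails, and out-of-range slicing, which does not:

```python
s = "0123456789"

print(s[9])        # 9
print(s[:10])      # 0123456789
print(s[1:5])      # 1234
print(s[-5:-1])    # 5678
print(s[-1:-5])    # (an empty line: the slice is empty)

# Out-of-range indexing raises an error...
try:
    s[10]
except IndexError:
    print("s[10] raises IndexError")

# ...while an out-of-range slice is silently truncated.
print(s[5:1000])   # 56789
```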
  3. Solution (one of two possible solutions):

    text = """never say \"never!\"
    \said the sad turtle."""
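
    For comparison, a sketch of what is presumably the other solution: a single-line literal in which the newline is written as \n and the backslash is escaped explicitly as \\.

```python
text = "never say \"never!\"\n\\said the sad turtle."
print(text)

# The string consists of exactly two lines.
lines = text.split("\n")
print(len(lines))              # 2
print(lines[1][0] == "\\")     # True
```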
    
  4. Solution:

    string = "a 1 b 2 c 3"
    
    digit = "DIGIT"
    character = "CHARACTER"
    
    result = string.replace("1", digit)
    result = result.replace("2", digit)
    result = result.replace("3", digit)
    result = result.replace("a", character)
    result = result.replace("b", character)
    result = result.replace("c", character)
    
    print result                     # "CHARACTER DIGIT CHARACTER ..."
    

    In one line:

    print string.replace("1", digit).replace("2", digit) ...
    
  5. Solution:

    chain_a = """SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
    FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
    RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
    HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
    IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
    EPHHELPPGSTKRALPNNT"""
    
    
    num_lines = chain_a.count("\n") + 1
    print num_lines                          # 6
    
    
    # NOTE: we want to know the length of the actual *sequence*, not the length of the *string*
    length_sequence = len(chain_a) - chain_a.count("\n")
    print length_sequence                    # 219
    
    
    sequence = chain_a.replace("\n", "")
    print len(chain_a) - len(sequence)          # 5 (correct)
    print len(sequence)                         # 219
    
    
    num_cysteine = sequence.count("C")
    num_histidine = sequence.count("H")
    print num_cysteine, num_histidine            # 10, 9
    
    
    print "NLRVEYLDDRN" in sequence             # True
    print sequence.find("NLRVEYLDDRN")          # 106
    # let's check
    print sequence[106 : 106 + len("NLRVEYLDDRN")]  # "NLRVEYLDDRN"
    
    
    index_first_newline = chain_a.find("\n")
    first_line = chain_a[:index_first_newline]
    print first_line
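
    The same quantities can be cross-checked with split() and join(). A sketch using the sequence above:

```python
chain_a = """SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
EPHHELPPGSTKRALPNNT"""

lines = chain_a.split("\n")       # one list element per line
sequence = "".join(lines)         # glue the lines back together

print(len(lines))                     # 6
print(len(sequence))                  # 219
print(lines[0])                       # the first line
print(sequence.find("NLRVEYLDDRN"))   # 106
```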
    
  6. Solution:

    structure_chain_a = """SER A 96 77.253 20.522 75.007
    VAL A 97 76.066 22.304 71.921
    PRO A 98 77.731 23.371 68.681
    SER A 99 80.136 26.246 68.973
    GLN A 100 79.039 29.534 67.364
    LYS A 101 81.787 32.022 68.157"""
    
    # I use a variable with a shorter name
    chain = structure_chain_a
    
    
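    index_first_newline = chain.find("\n")
    # the second argument of find() is the position where the search starts,
    # so the returned indexes refer to the whole string, not to a slice
    index_second_newline = chain.find("\n", index_first_newline + 1)
    index_third_newline = chain.find("\n", index_second_newline + 1)
    print index_first_newline, index_second_newline, index_third_newline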
    
    second_line = chain[index_first_newline + 1 : index_second_newline]
    print second_line                      # "VAL A 97 76.066 22.304 71.921"
                                            #           |    | |    | |    |
                                            #  01234567890123456789012345678
                                            #  0         1         2
    
    x = second_line[9:15]
    y = second_line[16:22]
    z = second_line[23:]
    print x, y, z
    # NOTE: they are all strings
    
    
    third_line = chain[index_second_newline + 1 : index_third_newline]
    print third_line                        # "PRO A 98 77.731 23.371 68.681"
                                            #           |    | |    | |    |
                                            #  01234567890123456789012345678
                                            #  0         1         2
    
    x_prime = third_line[9:15]
    y_prime = third_line[16:22]
    z_prime = third_line[23:]
    print x_prime, y_prime, z_prime
    # NOTE: they are all strings
    
    
    # we should convert all variables to floats, in order to calculate distances
    x, y, z = float(x), float(y), float(z)
    x_prime, y_prime, z_prime = float(x_prime), float(y_prime), float(z_prime)
    
    diff_x = x - x_prime
    diff_y = y - y_prime
    diff_z = z - z_prime
    
    distance = (diff_x**2 + diff_y**2 + diff_z**2)**0.5
    print distance
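
    When locating successive newlines, keep in mind that find() called on a slice returns a position relative to that slice, whereas the optional second argument of find() starts the search at a given position and returns an index into the whole string. A small sketch on a made-up three-line string:

```python
text = "ab\ncde\nf"

first = text.find("\n")
# Searching a slice: the result counts from the start of the slice.
relative = text[first + 1:].find("\n")
# Searching from a start position: the result is an absolute index.
absolute = text.find("\n", first + 1)

print(first)                               # 2
print(relative)                            # 3
print(absolute)                            # 6
print(absolute == first + 1 + relative)    # True
print(text[first + 1:absolute])            # cde
```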
    

    The solution is way simpler using split():

    lines = chain.split("\n")
    second_line = lines[1]
    third_line = lines[2]
    
    words = second_line.split()
    x, y, z = float(words[-3]), float(words[-2]), float(words[-1])
    
    words = third_line.split()
    x_prime, y_prime, z_prime = float(words[-3]), float(words[-2]), float(words[-1])
    
    distance = ((x - x_prime)**2 + (y - y_prime)**2 + (z - z_prime)**2)**0.5
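
    The split()-based approach can also be packaged so that it works for any pair of lines. A sketch under the assumption that the last three fields of every line are the x, y and z coordinates (the helper names coordinates and distance are hypothetical, and only the first three lines of the structure are reproduced):

```python
structure_chain_a = """SER A 96 77.253 20.522 75.007
VAL A 97 76.066 22.304 71.921
PRO A 98 77.731 23.371 68.681"""

def coordinates(line):
    # The last three whitespace-separated fields are x, y and z.
    words = line.split()
    return float(words[-3]), float(words[-2]), float(words[-1])

def distance(line_1, line_2):
    # Euclidean distance between the points described by two lines.
    x1, y1, z1 = coordinates(line_1)
    x2, y2, z2 = coordinates(line_2)
    return ((x1 - x2)**2 + (y1 - y2)**2 + (z1 - z2)**2)**0.5

lines = structure_chain_a.split("\n")
print(distance(lines[1], lines[2]))   # roughly 3.796
```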
    
  7. Solutions:

    1. Solution:

      dna_seq = dna_seq.replace("\n", "") # Remove newline characters
      length = len(dna_seq)               # Calculate length
      ng = dna_seq.count("G")             # Calculate the number of Gs
      nc = dna_seq.count("C")             # Calculate the number of Cs
      gc_cont = (ng + nc)/float(length)   # Calculate the GC-content
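
      A worked check on a short sequence, invented here for illustration only. The float() call matters in Python 2, where / between two integers discards the fractional part:

```python
# Hypothetical short sequence split over two lines.
dna_seq = """ATGC
GGCC"""

dna_seq = dna_seq.replace("\n", "")    # "ATGCGGCC"
length = len(dna_seq)                  # 8
ng = dna_seq.count("G")                # 3
nc = dna_seq.count("C")                # 3
gc_cont = (ng + nc) / float(length)

print(gc_cont)                         # 0.75
```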
      
    2. Solution:

      rna_seq = dna_seq.replace("T","U")
      
    3. Solution:

      intron = dna_seq[50:156]        # Careful with indexes
      exon1 = dna_seq[:50]            # Careful with indexes
      exon2 = dna_seq[156:]           # Careful with indexes
      spliced = exon1+exon2
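
      Off-by-one mistakes in the indexes are easy to spot on a small example. A sketch with a made-up sequence and made-up intron boundaries (start and end are hypothetical names):

```python
# Hypothetical sequence: 4 exon bases, 3 intron bases, 5 exon bases.
dna_seq = "ATGC" + "GTA" + "CCTGA"
start, end = 4, 7   # the intron occupies indexes 4, 5 and 6

exon1 = dna_seq[:start]
intron = dna_seq[start:end]
exon2 = dna_seq[end:]
spliced = exon1 + exon2

print(intron)                                   # GTA
print(spliced)                                  # ATGCCCTGA
# The three pieces cover the sequence with no gap and no overlap.
print(exon1 + intron + exon2 == dna_seq)        # True
print(spliced.replace("T", "U"))                # AUGCCCUGA
```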