Python: Strings (Solutions)

Note

Later on in the solutions, I will sometimes use the backslash character \ at the end of a line.

When used this way, \ tells Python that the command continues on the following line, which lets you break long commands over multiple lines.
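A minimal sketch of the idea. It uses print(...) with a single argument, which works in both Python 2 and Python 3. The backslash must be the very last character of the line. Inside parentheses, Python already allows the statement to continue without it.

```python
# The trailing backslash tells Python the statement continues on the next line.
total = 1 + 2 + \
        3 + 4
print(total)        # 10

# Inside parentheses (or brackets/braces) the continuation is implicit.
total_again = (1 + 2 +
               3 + 4)
print(total_again)  # 10
```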

Solutions:

Solution:

#        12345
text = "     "
print text
print len(text)

Solution:

at_least_one_space = " " in text

# check whether it works
print " " in "nospaceatallhere"
print " " in "onlyonespacehere--> <--"
print " " in "more spaces in here"

Solution:

exactly_5_characters = len(text) == 5

# check whether it works
print len("1234") == 5
print len("12345") == 5
print len("123456") == 5

Solution:

empty_string = ""
print len(empty_string) == 0

Solution:

base = "Python is great"
repeats = base * 100

# check whether the length is correct
print len(repeats) == len(base) * 100

Solution:

part_1 = "but cell"
part_2 = "biology"
part_3 = "is way better"

text = (part_1 + part_2 + part_3) * 1000
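A quick check of the result, assuming the three parts above. The total length must be 1000 times the combined length of the parts. Note also that + glues the strings together without adding any spaces.

```python
part_1 = "but cell"
part_2 = "biology"
part_3 = "is way better"

text = (part_1 + part_2 + part_3) * 1000

# One repetition is the three parts glued together (no spaces are added).
one_copy = part_1 + part_2 + part_3
print(one_copy)                            # but cellbiologyis way better
print(len(text) == len(one_copy) * 1000)   # True
print(text.startswith(one_copy * 2))       # True
```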

Let’s try this:

start_with_1 = "12345".startswith(1)

but Python gives an error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str, unicode, or tuple, not int
#                     ^^^^^^^^^^^^^^^^^^^^^                     ^^^^^^^

The highlighted parts of the error message say that startswith() requires its argument to be a string (or a tuple of strings), not an int; in our case, 1 is an int.

The solution is:

start_with_1 = "12345".startswith("1")
print start_with_1

The value is True, as expected.
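As the error message hints, startswith() also accepts a tuple of strings and returns True if the string starts with any of them. When the prefix you have is a number, str() converts it first. A short sketch:

```python
number = 1

# Convert the int to a string before passing it to startswith().
start_with_1 = "12345".startswith(str(number))
print(start_with_1)                              # True

# A tuple checks several possible prefixes at once.
print("12345".startswith(("0", "1")))            # True
print("12345".startswith(("0", "9")))            # False
```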

Solution:

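string = "\\"
string                             # in the interactive interpreter this shows '\\'
print string
print len(string)                  # 1

The raw-string alternative, string = r"\", does not work. A raw string literal cannot end with an odd number of backslashes, because the backslash would escape the closing quote, so Python reports a SyntaxError.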

As already checked above, the answer is no: "\\" contains a single backslash. Anyway:

backslash = "\\"

print backslash*2 in "\\"           # False

First method:

backslash = "\\"

condition = text.startswith(backslash) or \
             text.endswith(backslash)

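Second method (note that text[0] raises an IndexError when text is empty, while startswith() and endswith() simply return False):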

condition = (text[0] == backslash) or \
             (text[-1] == backslash)
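A small sketch that wraps both methods in helper functions and runs them on a few inputs, including the empty string. The function names are hypothetical, chosen here for illustration.

```python
backslash = "\\"   # a single backslash character

def starts_or_ends_with_backslash(text):
    # Method 1: safe for any string, including the empty one.
    return text.startswith(backslash) or text.endswith(backslash)

def starts_or_ends_with_backslash_by_index(text):
    # Method 2: text[0] raises IndexError when text is empty.
    return (text[0] == backslash) or (text[-1] == backslash)

for example in ["\\abc", "abc\\", "abc", ""]:
    print(starts_or_ends_with_backslash(example))   # True, True, False, False

try:
    starts_or_ends_with_backslash_by_index("")
except IndexError:
    print("method 2 fails on the empty string")
```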

Solution:

condition = \
     text.startswith("xxx") or \
    (text.startswith("xx") and text.endswith("x")) or \
    (text.startswith("x")  and text.endswith("xx")) or \
                                text.endswith("xxx")

It’s worth checking the condition against the examples provided in the exercise.
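One way to run that check is to wrap the condition in a function. The inputs below are made-up examples, not necessarily the ones in the exercise.

```python
def condition(text):
    # Same logic as above, wrapped in a function so it can be reused.
    return text.startswith("xxx") or \
        (text.startswith("xx") and text.endswith("x")) or \
        (text.startswith("x") and text.endswith("xx")) or \
        text.endswith("xxx")

# Hypothetical examples: (input, expected result)
examples = [("xxxab", True), ("xxabx", True), ("xabxx", True),
            ("abxxx", True), ("xabx", False), ("abc", False)]

for text, expected in examples:
    print(condition(text) == expected)   # True for every example
```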

Solution:
```
s = "0123456789"
print len(s)                        # 10
```
Which of the following extractions are correct?
1. s[9]: correct, extracts the last character.
2. s[10]: invalid, raises an IndexError (the last valid index is 9).
3. s[:10]: correct, extracts all characters (remember that the second index, 10 in this case, is exclusive).
4. s[1000]: invalid, raises an IndexError.
5. s[0]: correct, extracts the first character.
6. s[-1]: correct, extracts the last character.
7. s[1:5]: correct, extracts from the 2nd to the 5th character ("1234").
8. s[-1:-5]: correct, but nothing is extracted (the indexes are inverted!).
9. s[-5:-1]: correct, extracts from the 6th to the 9th character ("5678"); the last character is excluded.
10. s[-1000]: invalid, raises an IndexError.
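These answers can be checked directly in the interpreter. Single indexes outside the string raise an IndexError, while out-of-range slice bounds are simply clipped.

```python
s = "0123456789"

print(s[9])            # 9
print(s[:10])          # 0123456789
print(s[0])            # 0
print(s[-1])           # 9
print(s[1:5])          # 1234
print(repr(s[-1:-5]))  # '' (empty: the start comes after the end)
print(s[-5:-1])        # 5678
print(s[:1000])        # 0123456789 (slice bounds are clipped, no error)

# Single indexes outside the string raise an IndexError.
for index in [10, 1000, -1000]:
    try:
        s[index]
    except IndexError:
        print("s[%d] raises IndexError" % index)
```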

Solution (one of two possible solutions):

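text = """never say \"never!\"
\\said the sad turtle."""

Here \\ stands for a single backslash. Writing just \s happens to work too, because \s is not a recognized escape sequence and Python keeps the backslash. However, newer Python versions warn about unrecognized escape sequences.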
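For comparison, here is one alternative. It is not necessarily the second solution the exercise had in mind. It uses an ordinary one-line string with \n for the line break and \\ for the backslash. Both spellings produce the same text.

```python
# Triple-quoted version: the line break is typed literally.
text = """never say \"never!\"
\\said the sad turtle."""

# Single-line version: \n is the line break, \\ a single backslash.
text_alt = "never say \"never!\"\n\\said the sad turtle."

print(text == text_alt)    # True
print(text)
```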

Solution:

string = "a 1 b 2 c 3"

digit = "DIGIT"
character = "CHARACTER"

result = string.replace("1", digit)
result = result.replace("2", digit)
result = result.replace("3", digit)
result = result.replace("a", character)
result = result.replace("b", character)
result = result.replace("c", character)

print result                     # "CHARACTER DIGIT CHARACTER ..."

In one line:

print string.replace("1", digit).replace("2", digit) ...
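Instead of writing out every replace() call, a short loop can apply the same replacements. This is just one alternative way to organize the code, and it gives the same result.

```python
string = "a 1 b 2 c 3"
digit = "DIGIT"
character = "CHARACTER"

result = string
# Replace every digit first, then every letter.
for d in "123":
    result = result.replace(d, digit)
for ch in "abc":
    result = result.replace(ch, character)

print(result)   # CHARACTER DIGIT CHARACTER DIGIT CHARACTER DIGIT
```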

Solution:

chain_a = """SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
EPHHELPPGSTKRALPNNT"""


num_lines = chain_a.count("\n") + 1
print num_lines                          # 6


# NOTE: we want the length of the actual *sequence*, not the length of the *string* (which includes the newlines)
length_sequence = len(chain_a) - chain_a.count("\n")
print length_sequence                    # 219


sequence = chain_a.replace("\n", "")
print len(chain_a) - len(sequence)          # 5 (correct)
print len(sequence)                         # 219


num_cysteine = sequence.count("C")
num_histidine = sequence.count("H")
print num_cysteine, num_histidine            # 10, 9


print "NLRVEYLDDRN" in sequence             # True
print sequence.find("NLRVEYLDDRN")          # 106
# let's check
print sequence[106 : 106 + len("NLRVEYLDDRN")]  # "NLRVEYLDDRN"


index_first_newline = chain_a.find("\n")
first_line = chain_a[:index_first_newline]
print first_line
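A compact alternative splits the string into its lines with split("\n") and then joins them back together. All the answers above follow from the resulting list.

```python
chain_a = """SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
EPHHELPPGSTKRALPNNT"""

lines = chain_a.split("\n")       # list of the 6 lines
num_lines = len(lines)
sequence = "".join(lines)          # the lines glued back together, without newlines
first_line = lines[0]

print(num_lines)                   # 6
print(len(sequence))               # 219
print(first_line)                  # SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
```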

Solution:

structure_chain_a = """SER A 96 77.253 20.522 75.007
VAL A 97 76.066 22.304 71.921
PRO A 98 77.731 23.371 68.681
SER A 99 80.136 26.246 68.973
GLN A 100 79.039 29.534 67.364
LYS A 101 81.787 32.022 68.157"""

# I use a variable with a shorter name
chain = structure_chain_a


index_first_newline = chain.find("\n")
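# find(sub, start) starts searching at position start, but returns an index into the whole string
index_second_newline = chain.find("\n", index_first_newline + 1)
index_third_newline = chain.find("\n", index_second_newline + 1)
print index_first_newline, index_second_newline, index_third_newline     # 29 59 89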

second_line = chain[index_first_newline + 1 : index_second_newline]
print second_line                      # "VAL A 97 76.066 22.304 71.921"
                                        #           |    | |    | |    |
                                        #  01234567890123456789012345678
                                        #  0         1         2

x = second_line[9:15]
y = second_line[16:22]
z = second_line[23:]
print x, y, z
# NOTE: they are all strings


third_line = chain[index_second_newline + 1 : index_third_newline]
print third_line                        # "PRO A 98 77.731 23.371 68.681"
                                        #           |    | |    | |    |
                                        #  01234567890123456789012345678
                                        #  0         1         2

x_prime = third_line[9:15]
y_prime = third_line[16:22]
z_prime = third_line[23:]
print x_prime, y_prime, z_prime
# NOTE: they are all strings


# we should convert all variables to floats, in order to calculate distances
x, y, z = float(x), float(y), float(z)
x_prime, y_prime, z_prime = float(x_prime), float(y_prime), float(z_prime)

diff_x = x - x_prime
diff_y = y - y_prime
diff_z = z - z_prime

distance = (diff_x**2 + diff_y**2 + diff_z**2)**0.5
print distance
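Note that calling find() on a slice returns a position inside the slice, not inside the whole string. The optional start argument of find() avoids this problem. A small sketch with a made-up three-line string shows the difference:

```python
chain = "aaa\nbbb\nccc"   # hypothetical short example

index_first_newline = chain.find("\n")                  # 3

# find() on a slice counts from the start of the slice...
relative = chain[index_first_newline + 1:].find("\n")   # 3
# ...while find(sub, start) returns an index into the whole string.
absolute = chain.find("\n", index_first_newline + 1)    # 7

print(relative)
print(absolute)
print(relative + index_first_newline + 1 == absolute)   # True
print(chain[index_first_newline + 1 : absolute])        # bbb
```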

The solution is way simpler using split():

lines = chain.split("\n")
second_line = lines[1]
third_line = lines[2]

words = second_line.split()
x, y, z = float(words[-3]), float(words[-2]), float(words[-1])

words = third_line.split()
x_prime, y_prime, z_prime = float(words[-3]), float(words[-2]), float(words[-1])

distance = ((x - x_prime)**2 + (y - y_prime)**2 + (z - z_prime)**2)**0.5
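If the distance is needed for many pairs of atoms, the split()-based approach can be wrapped in two small functions. The names parse_coordinates and distance below are hypothetical helpers. The sketch assumes, as in the data above, that the last three fields of each line are the x, y, z coordinates.

```python
import math

structure_chain_a = """SER A 96 77.253 20.522 75.007
VAL A 97 76.066 22.304 71.921
PRO A 98 77.731 23.371 68.681
SER A 99 80.136 26.246 68.973
GLN A 100 79.039 29.534 67.364
LYS A 101 81.787 32.022 68.157"""

def parse_coordinates(line):
    # The last three whitespace-separated fields are x, y and z.
    words = line.split()
    return float(words[-3]), float(words[-2]), float(words[-1])

def distance(p, q):
    # Euclidean distance between two 3D points.
    return math.sqrt((p[0] - q[0])**2 + (p[1] - q[1])**2 + (p[2] - q[2])**2)

lines = structure_chain_a.split("\n")
coords = [parse_coordinates(line) for line in lines]

# Distance between the second and third atoms, as in the exercise.
print(distance(coords[1], coords[2]))

# Distances between consecutive atoms along the chain.
for i in range(len(coords) - 1):
    print(distance(coords[i], coords[i + 1]))
```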

Solutions:

Solution:

dna_seq = dna_seq.replace("\n", "") # Remove newline characters
length = len(dna_seq)               # Calculate length
ng = dna_seq.count("G")             # Calculate the number of Gs
nc = dna_seq.count("C")             # Calculate the number of Cs
gc_cont = (ng + nc)/float(length)   # Calculate the GC-content
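Since dna_seq is provided by the exercise, here is a self-contained sketch of the same steps wrapped in a function. It runs on a short made-up sequence. The float() call keeps the division from truncating under Python 2.

```python
def gc_content(dna_seq):
    dna_seq = dna_seq.replace("\n", "")          # remove newline characters
    length = len(dna_seq)
    ng = dna_seq.count("G")
    nc = dna_seq.count("C")
    return (ng + nc) / float(length)             # fraction of G and C

# Hypothetical example sequence split over two lines.
example = """ATGCGC
ATAT"""
print(gc_content(example))    # 0.4
```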

Solution:
```
rna_seq = dna_seq.replace("T","U")
```

Solution:

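intron = dna_seq[50:156]        # 0-based indexes 50..155, i.e. positions 51 to 156 counting from 1
exon1 = dna_seq[:50]            # 0-based indexes 0..49, i.e. positions 1 to 50
exon2 = dna_seq[156:]           # from 0-based index 156 (position 157) to the end
spliced = exon1 + exon2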
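A quick sanity check of the slicing, run on a hypothetical 200-base sequence because the exercise's dna_seq is not reproduced here. The two exons plus the intron must rebuild the original sequence. The spliced sequence must also be shorter by exactly the intron length.

```python
dna_seq = "ACGT" * 50                  # hypothetical 200-base sequence

intron = dna_seq[50:156]               # 0-based indexes 50..155
exon1 = dna_seq[:50]                   # 0-based indexes 0..49
exon2 = dna_seq[156:]                  # 0-based indexes 156..199
spliced = exon1 + exon2

print(exon1 + intron + exon2 == dna_seq)               # True
print(len(intron))                                     # 106
print(len(spliced) == len(dna_seq) - len(intron))      # True
```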