Python: Strings (Solutions)
Note
Later on in the solutions, I will sometimes use the backslash character \ at the end of a line. When used this way, \ tells Python that the command continues on the following line, which lets you break long commands over multiple lines.
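As a small illustration of the note above, the sketch below writes the same condition twice: once with a trailing backslash, and once wrapped in parentheses, which Python also reads as a single command. The variable text and its value are made up for this example, and print is called with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
# Hypothetical input, chosen only for illustration.
text = "xx some text x"

# Version 1: a trailing backslash continues the command on the next line.
with_backslash = text.startswith("xx") and \
                 text.endswith("x")

# Version 2: inside parentheses no backslash is needed.
with_parentheses = (text.startswith("xx") and
                    text.endswith("x"))

print(with_backslash == with_parentheses)
```

Both forms should give the same result; which one to prefer is mostly a matter of taste.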
Solutions:
Solution:
#       12345
text = "     "
print text
print len(text)
Solution:
at_least_one_space = " " in text

# check whether it works
print " " in "nospaceatallhere"
print " " in "onlyonespacehere--> <--"
print " " in "more spaces in here"
Solution:
exactly_5_characters = len(text) == 5

# check whether it works
print len("1234") == 5
print len("12345") == 5
print len("123456") == 5
Solution:
empty_string = ""
print len(empty_string) == 0
Solution:
base = "Python is great"
repeats = base * 100

# check whether the length is correct
print len(repeats) == len(base) * 100
Solution:
part_1 = "but cell"
part_2 = "biology"
part_3 = "is way better"
text = (part_1 + part_2 + part_3) * 1000
Let’s try this:
start_with_1 = "12345".startswith(1)
but Python gives an error message:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: startswith first arg must be str, unicode, or tuple, not int
#                                       ^^^^^^^^^^^^^^^^^^^^^^  ^^^^^^^
The error message (see the highlighted parts) says that startswith() requires its argument to be a string, not an int. In our case the argument, 1, is an int. The solution is:
start_with_1 = "12345".startswith("1")
print start_with_1
The value is True, as expected.
Solution:
string = "\\"
string
print string
print len(string)       # 1
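Note that a raw string cannot be used here: r"\" is a syntax error, because a raw string literal cannot end with an odd number of backslashes. An alternative that does work:
string = chr(92)        # 92 is the character code of the backslash
string
print string
print len(string)       # 1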
As already checked above, the answer is no. Anyway:
backslash = "\\"        # a single backslash (r"\" would be a syntax error)
print backslash*2 in "\\"       # False
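The claims about backslashes can be checked directly. The following is a minimal sketch: the names single and double are chosen for this example only, and print is written with parentheses so that the code runs under both Python 2 and Python 3.

```python
# "\\" in the source code is ONE backslash character in the string.
single = "\\"
double = single * 2

print(len(single))        # 1
print(len(double))        # 2
print(double in single)   # False: two backslashes do not fit in one
print(single == chr(92))  # True: 92 is the character code of the backslash
```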
First method:
backslash = "\\"
condition = text.startswith(backslash) or \
            text.endswith(backslash)
Second method:
condition = (text[0] == backslash) or \
            (text[-1] == backslash)
Solution:
condition = \
    text.startswith("xxx") or \
    (text.startswith("xx") and text.endswith("x")) or \
    (text.startswith("x") and text.endswith("xx")) or \
    text.endswith("xxx")
It is worth checking the condition against the examples provided in the exercise.
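One way to run such a check is to wrap the condition in a function and apply it to a list of strings. This is only a sketch: the function name three_x_at_the_ends and the test strings are invented here (they are not the examples from the exercise), and parentheses are used instead of backslashes to continue the lines.

```python
def three_x_at_the_ends(text):
    """Hypothetical helper wrapping the condition from the solution."""
    return (text.startswith("xxx") or
            (text.startswith("xx") and text.endswith("x")) or
            (text.startswith("x") and text.endswith("xx")) or
            text.endswith("xxx"))

# Made-up test strings: the first four should pass, the last two should not.
examples = ["xxx-abc", "xx-abc-x", "x-abc-xx", "abc-xxx", "x-abc-x", "abc"]
results = [three_x_at_the_ends(e) for e in examples]
print(results)
```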
Solution:
s = "0123456789"
print len(s)                    # 10
Which of the following extractions are correct?
s[9]: correct, extracts the last character.
s[10]: invalid.
s[:10]: correct, extracts all characters (remember that the second index, 10 in this case, is exclusive).
s[1000]: invalid.
s[0]: correct, extracts the first character.
s[-1]: correct, extracts the last character.
s[1:5]: correct, extracts from the 2nd to the 5th character.
s[-1:-5]: valid, but nothing is extracted (the indexes are inverted!).
s[-5:-1]: correct, extracts "5678" (the last character is excluded).
s[-1000]: invalid.
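These extractions can be verified in the interpreter. In the sketch below, is_valid_index is a helper invented for this example: it reports whether indexing raises an IndexError. The print calls use parentheses so that the code runs under both Python 2 and Python 3.

```python
s = "0123456789"

print(s[9])             # 9
print(s[:10])           # 0123456789
print(s[1:5])           # 1234
print(s[-5:-1])         # 5678
print(repr(s[-1:-5]))   # '' (empty: the start comes after the end)

def is_valid_index(string, index):
    """Hypothetical helper: True if string[index] does not raise IndexError."""
    try:
        string[index]
        return True
    except IndexError:
        return False

print(is_valid_index(s, 10))      # False
print(is_valid_index(s, 1000))    # False
print(is_valid_index(s, -1000))   # False
print(is_valid_index(s, -1))      # True
```

Note that out-of-range indexes are only an error for single elements: a slice such as s[:1000] is accepted and simply stops at the end of the string.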
Solution (one of two possible solutions):
text = """never say \"never!\" \said the sad turtle."""
Solution:
string = "a 1 b 2 c 3"

digit = "DIGIT"
character = "CHARACTER"

result = string.replace("1", digit)
result = result.replace("2", digit)
result = result.replace("3", digit)
result = result.replace("a", character)
result = result.replace("b", character)
result = result.replace("c", character)

print result                    # "CHARACTER DIGIT CHARACTER ..."
In one line:
print string.replace("1", digit).replace("2", digit) ...
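If loops are already familiar, the repeated calls to replace() can also be written more compactly. This is a sketch of an alternative, not part of the original solution. It works here because the replacement words are uppercase and therefore never contain the lowercase letters being replaced.

```python
string = "a 1 b 2 c 3"

result = string
# Replace every digit first, then every lowercase letter.
for d in "123":
    result = result.replace(d, "DIGIT")
for c in "abc":
    result = result.replace(c, "CHARACTER")

print(result)
```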
Solution:
chain_a = """SSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKM
FCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVV
RRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFR
HSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILT
IITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKG
EPHHELPPGSTKRALPNNT"""

num_lines = chain_a.count("\n") + 1
print num_lines                             # 6

# NOTE: we want to know the length of the actual *sequence*,
# not the length of the *string*
length_sequence = len(chain_a) - chain_a.count("\n")
print length_sequence                       # 219

sequence = chain_a.replace("\n", "")
print len(chain_a) - len(sequence)          # 5 (correct)
print len(sequence)                         # 219

num_cysteine = sequence.count("C")
num_histidine = sequence.count("H")
print num_cysteine, num_histidine           # 10, 9

print "NLRVEYLDDRN" in sequence             # True
print sequence.find("NLRVEYLDDRN")          # 106

# let's check
print sequence[106 : 106 + len("NLRVEYLDDRN")]  # "NLRVEYLDDRN"

index_first_newline = chain_a.find("\n")
first_line = chain_a[:index_first_newline]
print first_line
Solution:
structure_chain_a = """SER A 96 77.253 20.522 75.007
VAL A 97 76.066 22.304 71.921
PRO A 98 77.731 23.371 68.681
SER A 99 80.136 26.246 68.973
GLN A 100 79.039 29.534 67.364
LYS A 101 81.787 32.022 68.157"""

# I use a variable with a shorter name
chain = structure_chain_a

# the second argument of find() is the position where the search starts,
# so every index refers to the whole string
index_first_newline = chain.find("\n")
index_second_newline = chain.find("\n", index_first_newline + 1)
index_third_newline = chain.find("\n", index_second_newline + 1)
print index_first_newline, index_second_newline, index_third_newline

second_line = chain[index_first_newline + 1 : index_second_newline]
print second_line
# "VAL A 97 76.066 22.304 71.921"
#           |    | |    | |    |
#  01234567890123456789012345678
#  0         1         2

x = second_line[9:15]
y = second_line[16:22]
z = second_line[23:]
print x, y, z
# NOTE: they are all strings

third_line = chain[index_second_newline + 1 : index_third_newline]
print third_line
# "PRO A 98 77.731 23.371 68.681"
#           |    | |    | |    |
#  01234567890123456789012345678
#  0         1         2

x_prime = third_line[9:15]
y_prime = third_line[16:22]
z_prime = third_line[23:]
print x_prime, y_prime, z_prime
# NOTE: they are all strings

# we should convert all variables to floats, in order to calculate distances
x, y, z = float(x), float(y), float(z)
x_prime, y_prime, z_prime = float(x_prime), float(y_prime), float(z_prime)

diff_x = x - x_prime
diff_y = y - y_prime
diff_z = z - z_prime

distance = (diff_x**2 + diff_y**2 + diff_z**2)**0.5
print distance
The solution is way simpler using split():
lines = chain.split("\n")
second_line = lines[1]
third_line = lines[2]

words = second_line.split()
x, y, z = float(words[-3]), float(words[-2]), float(words[-1])

words = third_line.split()
x_prime, y_prime, z_prime = float(words[-3]), float(words[-2]), float(words[-1])

distance = ((x - x_prime)**2 + (y - y_prime)**2 + (z - z_prime)**2)**0.5
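The split() approach repeats the same parsing step for each line, so it can be packaged into small functions. The sketch below is one possible arrangement: the function names coordinates and distance are invented for this example, and the two input lines are the second and third lines of the structure above.

```python
def coordinates(line):
    """Return the last three whitespace-separated fields of a line as floats."""
    words = line.split()
    return float(words[-3]), float(words[-2]), float(words[-1])

def distance(line_1, line_2):
    """Euclidean distance between the coordinates found on two lines."""
    x1, y1, z1 = coordinates(line_1)
    x2, y2, z2 = coordinates(line_2)
    return ((x1 - x2)**2 + (y1 - y2)**2 + (z1 - z2)**2)**0.5

d = distance("VAL A 97 76.066 22.304 71.921",
             "PRO A 98 77.731 23.371 68.681")
print(round(d, 3))   # 3.796
```

Because split() does not depend on character positions, this keeps working for lines such as the one for residue 100, where the coordinates are shifted one column to the right.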
Solutions:
Solution:
dna_seq = dna_seq.replace("\n", "")     # Remove newline characters
length = len(dna_seq)                   # Calculate length
ng = dna_seq.count("G")                 # Calculate the number of Gs
nc = dna_seq.count("C")                 # Calculate the number of Cs
gc_cont = (ng + nc)/float(length)       # Calculate the GC-content
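To convince yourself that the calculation is right, it helps to try it on a sequence short enough to count by hand. The sketch below wraps the same steps in a function; the name gc_content and the toy sequence are made up for this example.

```python
def gc_content(dna_seq):
    """Fraction of G and C bases in a DNA string; newlines are ignored."""
    seq = dna_seq.replace("\n", "")
    # float() avoids integer division under Python 2
    return (seq.count("G") + seq.count("C")) / float(len(seq))

# Toy sequence: 8 bases, of which 3 are G and 3 are C.
toy = "ATGC\nGGCC"
print(gc_content(toy))   # 0.75
```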
Solution:
rna_seq = dna_seq.replace("T","U")
Solution:
intron = dna_seq[50:156]        # Careful with indexes
exon1 = dna_seq[:50]            # Careful with indexes
exon2 = dna_seq[156:]           # Careful with indexes
spliced = exon1+exon2
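Since the indexes are easy to get wrong by one, a quick sanity check is useful: the three pieces must add up to the whole sequence, and the spliced result must contain none of the intron. The sketch below uses a toy stand-in for dna_seq (it is not the sequence from the exercise) in which each region is made of a different letter.

```python
# Toy stand-in for dna_seq: 50 "A" (exon 1), 106 "C" (intron), 20 "G" (exon 2).
dna_seq = "A" * 50 + "C" * 106 + "G" * 20

intron = dna_seq[50:156]
exon1 = dna_seq[:50]
exon2 = dna_seq[156:]
spliced = exon1 + exon2

# No base is lost or counted twice.
print(len(exon1) + len(intron) + len(exon2) == len(dna_seq))   # True
# The intron is made only of "C", so none should be left after splicing.
print("C" in spliced)   # False
print(len(spliced))     # 70
```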