Python: Functions

Functions are named blocks of code. They take inputs and produce outputs.

The abstract syntax is:

def function(arg1, arg2, ...):
    # the code
    return result

Once a function is defined (as above), it can be called as follows:

the_result = function(value1, value2, ...)

The arguments (arg1, arg2, etc.) are variables that receive the inputs of the function; their number determines how many inputs the function takes. The value of the variable result is handed back to the caller by the return statement.

Warning

The names of the variables I pass to the function have nothing to do with the names of the arguments.

In the code above, the values of the variables valueX are visible from inside the function as argX:
- arg1 takes the value of value1
- arg2 takes the value of value2
- etc.

The name of the variable I use to store the result of the function has nothing to do with the name of the variable used inside the function to store the result.

In the code above, the value of the result is stored in the result variable inside the function, but is stored inside the the_result variable by the caller.
- the_result takes the value of result

Example:

def function(a, b):
    r = a + b
    return r

x = 5
y = 10

z = function(x, y)
print z

Here a takes its value from x, b from y; return r makes the function return the value of r, which is assigned by the caller to the variable z.
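The following is a minimal sketch of the same idea, using a hypothetical subtract() function that does not appear elsewhere in this text. It suggests that arguments are matched by position, not by name: even when the caller's variables happen to be called a and b, swapping them in the call swaps the values seen inside the function. The snippet uses the parenthesised print() form with a single argument so that it should run unchanged on both Python 2 and Python 3.

```python
def subtract(a, b):
    # a is whatever was passed first, b whatever was passed second
    return a - b

a = 10
b = 3

first = subtract(a, b)     # inside the function: a is 10, b is 3
second = subtract(b, a)    # inside the function: a is 3, b is 10

print(first)               # 7
print(second)              # -7
```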

Example. Let’s define a function that takes two numbers and returns their sum:

def add(n, m):
    return n + m

It can be used as follows:

result = add(4, 6)
print result            # 10

result = add(6, 4)
print result            # 10

By replacing the calls to add() with the code in the definition of the add function, and substituting the arguments with the input values, we get the equivalent code:

n = 4
m = 6
result = n + m
print result            # 10

n = 6
m = 4
result = n + m
print result            # 10

As with all Python-provided functions, I can also skip assigning the return value to a variable (result in the code above):

print add(4, 6)         # 10
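More generally, a call such as add(4, 6) can appear wherever a value is expected. The sketch below (the variable names total and scaled are only illustrative) uses the return value as the argument of another call and inside an arithmetic expression; it is written so that it should run on both Python 2 and Python 3.

```python
def add(n, m):
    return n + m

# The inner call is evaluated first and returns 3,
# which becomes the first argument of the outer call.
total = add(add(1, 2), 3)

# The return value can also be used inside a larger expression.
scaled = 2 * add(4, 6)

print(total)     # 6
print(scaled)    # 20
```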

Example. Let’s write a function print_sum() that prints the sum of two numbers:

def print_sum(n, m):
    print n + m

It can be used as follows:

print_sum(4, 6)         # prints 10
print_sum(6, 4)         # prints 10

Warning

Notice that there is no return statement in print_sum().

When return is omitted, the function automatically returns None:

result = print_sum(4, 6)    # prints 10
print result                # prints None
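The difference between returning and printing matters as soon as the result is needed for further computation. The following sketch, under the assumption that we want to double the sum, shows that the value returned by add() can be reused, while the None returned by print_sum() cannot. The variable names doubled, nothing and message are illustrative, and the parenthesised print() form is used so that the snippet should run on both Python 2 and Python 3.

```python
def add(n, m):
    return n + m

def print_sum(n, m):
    print(n + m)              # shows the sum, but does not return it

doubled = add(4, 6) * 2       # the returned value can be reused: 20

nothing = print_sum(4, 6)     # prints 10, but nothing is None

try:
    nothing * 2               # None cannot be multiplied
    message = "it worked"
except TypeError:
    message = "cannot compute with None"

print(doubled)                # 20
print(message)                # cannot compute with None
```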

Warning

A function does nothing until it is called.

Consider the Python module test.py:

print "beginning"

def function():
    print "I do stuff"

print "end"

Running it with the Python interpreter:

$ python test.py

produces the following output:

beginning
end

As you can see, the interpreter executes the code line by line. However, while the function function() is defined in the middle of the module, it is not called anywhere. Therefore it is not executed at all.

In order to actually call the function, let’s write:

print "beginning"

def function():
    print "I do stuff"

function()

print "end"

This code prints:

beginning
I do stuff
end

as expected.
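Because the interpreter executes the module line by line, a related consequence is that a function can only be called after its def statement has been executed. The sketch below illustrates this with a hypothetical greet() function: the first call fails with a NameError, the second one succeeds. The names outcome and outcome_after are illustrative only.

```python
try:
    greet()                   # the def statement below has not run yet
    outcome = "called"
except NameError:
    outcome = "greet() is not defined yet"

def greet():
    return "hello"

outcome_after = greet()       # now the function exists and can be called

print(outcome)                # greet() is not defined yet
print(outcome_after)          # hello
```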

Example. Let’s write a function factorial() that computes the factorial of an integer n:

\[n! = 1 \times 2 \times 3 \times \ldots \times (n - 2) \times (n - 1) \times n\]

Now, let’s compute the factorial of n normally (i.e. without defining a new function):

n = 5                       # example input
fact = 1
for k in range(1, n + 1):
    fact = fact * k

Now that we have the code for computing the factorial, it is easy to write a function that computes the factorial: it is sufficient to plug the above code inside a new function, as follows:

def factorial(n):
    fact = 1
    for k in range(1, n + 1):
        fact = fact * k
    return fact

And let’s check if it works:

print factorial(1)          # 1
print factorial(2)          # 2
print factorial(3)          # 6
print factorial(4)          # 24
print factorial(5)          # 120
print factorial(6)          # 720

Of course, the new function can be used like any of the Python-defined functions, e.g. in list comprehensions:

factorials = [factorial(n) for n in range(10)]
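One possible way to gain more confidence in the new function is to compare it against the factorial() function of the standard math module. The sketch below repeats the definition from the text so that it is self-contained; the comparison loop is an addition, not part of the original example.

```python
import math

def factorial(n):
    fact = 1
    for k in range(1, n + 1):
        fact = fact * k
    return fact

factorials = [factorial(n) for n in range(10)]

# Compare each value with the one computed by the standard library.
for n in range(10):
    assert factorials[n] == math.factorial(n)

print(factorials[5])    # 120
```

Note that factorial(0) returns 1: range(1, 1) is empty, so the loop body never runs and fact keeps its initial value.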

Warning

The names of the function and of its arguments are arbitrary: pick whichever names you find most fitting.

Quiz. What is the difference between this code:

def arith(op, a, b):
    if op == "+":
        return a + b
    elif op == "*":
        return a * b
    else:
        return None

print arith("+", 10, 10)
print arith("*", 2, 2)

and this code?

def f(what, x, y):
    if what == "+":
        return x + y
    elif what == "*":
        return x * y
    else:
        return 0

print f("+", 10, 10)
print f("*", 2, 2)

Note

A function can return more than one result, as follows:

def multiresult():
    result_1 = "first result"
    result_2 = 0.12
    result_3 = "something else"
    return result_1, result_2, result_3

Internally, Python interprets the return statement as returning a tuple. In practice, the above code is equivalent to:

def multiresult():
    return ("first result", 0.12, "something else")

When I call a “multi-result” function, I can either put the resulting tuple into a variable and extract the various elements individually:

result = multiresult()
res1 = result[0]
print res1
res2 = result[1]
print res2
res3 = result[2]
print res3

or I can use the “automatic unpacking” feature of Python, as follows:

res1, res2, res3 = multiresult()
print res1
print res2
print res3
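As a slightly more practical sketch of the same mechanism, consider a hypothetical min_and_max() function (not part of the original text) that returns both the smallest and the largest element of a list. The example checks that the returned object really is a tuple, and then unpacks it. It should run on both Python 2 and Python 3.

```python
def min_and_max(numbers):
    # The comma builds a tuple: this returns (min, max)
    return min(numbers), max(numbers)

result = min_and_max([3, 1, 4, 1, 5])
is_tuple = isinstance(result, tuple)

# Automatic unpacking: one variable per element of the tuple
lowest, highest = min_and_max([3, 1, 4, 1, 5])

print(is_tuple)    # True
print(lowest)      # 1
print(highest)     # 5
```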

Warning

Variables have a scope, and in particular:

- Variables declared outside the function are not visible from the inside. [1]
  If you want to pass one or more values from the outside to the function, pass them through the arguments.
- Variables declared inside the function are not visible from the outside.
  If you want to pass one or more values from the function to the external world, use the return statement.

[1] There are exceptions to this rule (for instance, a function can read variables defined at the top level of the module); we will ignore them in this presentation.
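The following sketch illustrates the scope rules with two hypothetical functions, double() and set_x(), which are not part of the original text. It suggests that a variable created inside a function disappears when the function returns, and that assigning to a name inside a function does not modify a variable with the same name outside of it. It should run on both Python 2 and Python 3.

```python
def double(x):
    temp = x * 2          # temp exists only while double() is running
    return temp

value = double(21)

try:
    temp                  # temp was declared inside double()
    message = "temp is visible"
except NameError:
    message = "temp is not visible outside double()"

x = 1

def set_x():
    x = 99                # creates a new variable x, local to set_x()
    return x

inner = set_x()

print(value)              # 42
print(message)            # temp is not visible outside double()
print(x)                  # 1: the outer x is unchanged
print(inner)              # 99
```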

Example. Consider this code:

def find_physical(triples):
    """Takes a mixed interaction protein network, example:

        [("1A3A", "physical", "5ARM"),
         ("5JTD", "genetic", "5TGD")]

    and extracts physical interacting protein pairs, example:

        [("1A3A", "5ARM")]
    """
    phys_pairs = []
    for p1, relation, p2 in triples:
        if relation == "physical":
            phys_pairs.append((p1, p2))
    # XXX I forgot to return `phys_pairs` here!

network = [
    ("1A3A", "physical", "5ARM"),
    ("5JTD", "genetic", "5TGD")
]
find_physical(network)
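print phys_pairs        # raises NameError: name 'phys_pairs' is not defined

Here phys_pairs is declared inside the function, so it is not visible from the outside: the last line fails with a NameError instead of printing the pairs.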

In order to fix this issue, I have to explicitly return it:

def find_physical(triples):
    phys_pairs = []
    for p1, relation, p2 in triples:
        if relation == "physical":
            phys_pairs.append((p1, p2))
    return phys_pairs

network = [
    ("1A3A", "physical", "5ARM"),
    ("5JTD", "genetic", "5TGD")
]
result = find_physical(network)
print result

Example. Functions can call other functions. Let’s write two functions:

def read_fasta(path):
    """Reads a FASTA file with one-line sequences."""
    fasta = {}
    with open(path) as handle:      # the file is closed automatically
        for line in handle:
            line = line.strip()
            if not line:            # skip empty lines
                continue
            if line.startswith(">"):
                header = line
            else:
                fasta[header] = line
    return fasta

def compute_histogram(sequence):
    """Computes the histogram of the characters."""
    histogram = {}
    for letter in sequence:
        if letter not in histogram:
            histogram[letter] = 0
        histogram[letter] += 1
    return histogram

These functions can be used to implement a complex program that:

- Reads a FASTA file into a dictionary
- For each sequence in the FASTA file, computes the histogram of its letters
- Prints each sequence header and the corresponding histogram

as follows:

path = raw_input("enter a path: ")
fasta = read_fasta(path)

for header, sequence in fasta.items():
    histogram = compute_histogram(sequence)
    print "header =", header.lstrip(">"), ":"
    print histogram
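To try the two functions without typing a path by hand, one option is to write a small FASTA file to a temporary location and process it. The sketch below does this under several assumptions: the file content (two made-up sequences, seq1 and seq2) is invented for illustration, the two functions are restated in a form that should run on both Python 2 and Python 3, and the results are collected in a histograms dictionary instead of being printed one by one.

```python
import os
import tempfile

def read_fasta(path):
    """Reads a FASTA file with one-line sequences."""
    fasta = {}
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:                 # skip empty lines
                continue
            if line.startswith(">"):
                header = line
            else:
                fasta[header] = line
    return fasta

def compute_histogram(sequence):
    """Computes the histogram of the characters."""
    histogram = {}
    for letter in sequence:
        histogram[letter] = histogram.get(letter, 0) + 1
    return histogram

# Write a tiny, made-up FASTA file to a temporary location.
tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".fasta", delete=False)
tmp.write(">seq1\nAACG\n>seq2\nTTT\n")
tmp.close()

fasta = read_fasta(tmp.name)

histograms = {}
for header, sequence in fasta.items():
    histograms[header.lstrip(">")] = compute_histogram(sequence)

os.remove(tmp.name)                      # clean up the temporary file

# histograms["seq1"] == {"A": 2, "C": 1, "G": 1}
# histograms["seq2"] == {"T": 3}
print(histograms["seq2"])
```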

Example. Since functions can call other functions, the “call graph” of a program can become arbitrarily complicated. Let’s see a moderately realistic example of what a call graph looks like.

Let’s write a (mock!) program, composed of multiple functions, that asks the user for:

- the path to one or more FASTA files.
- the path to a file describing a physical protein interaction network (PIN).

and computes some average statistic (say, a histogram) of the amino acid composition of interacting proteins.

When run, the program does the following:

- reads the sequence data from each FASTA file, see the read_sequences() and read_fasta() functions
- reads the interaction network with the read_interactions() function
- for each pair of interacting proteins, computes statistics about their joint amino acid composition, through the compute_aa_stats() function, and computes an “average” summary statistic in the compute_avg_stats() function.

Here is the code:

def read_fasta(path):
    """Takes a path to a FASTA file, returns a
    (header, sequence) pair."""
    # TODO actually read the file
    return "1A3A:A", "MANLFKLG..."

def read_sequences(paths):
    """Reads a bunch of FASTA files, returns a
    header->sequence dict."""
    header_to_seq = {}
    for path in paths:
        header, seq = read_fasta(path)
        header_to_seq[header] = seq
    return header_to_seq

def read_interactions(path):
    """Reads physical protein interactions from a
    file. Returns a list of pairs of strings."""
    # TODO actually read the file
    return [("1A3A:A", "5AA3:F"), ("5AA3:F", "5K9C:A")]

def compute_aa_stats(seq1, seq2):
    """Compute amino acid statistics, e.g.
    co-occurrence."""
    # TODO actually compute co-occurrence and MI
    cooccurrence = {"A": 0.2, "C": 0.01}
    mutual_information = 0.72
    return cooccurrence, mutual_information

def compute_avg_stats(sequences, interactions):
    """Computes the statistics of each pair of interacting
    proteins and returns their average."""
    stats = []
    for prot1, prot2 in interactions:
        if not (prot1 in sequences and prot2 in sequences):
            continue
        seq1 = sequences[prot1]
        seq2 = sequences[prot2]
        stats.append(compute_aa_stats(seq1, seq2))
    # TODO actually average all the collected statistics
    return 0.3

def main():
    """The whole (fake) program."""

    # Read the sequence files
    paths = []
    while True:
        ans = raw_input("path to FASTA file: ")
        if len(ans) == 0:
            break
        paths.append(ans)

    sequences = read_sequences(paths)

    # Read the interaction file
    ans = raw_input("path to interaction data: ")
    interactions = read_interactions(ans)

    # Print the average stats
    print "average stats =", compute_avg_stats(sequences, interactions)

main()

As you can see, Python first executes the def statements, which only define the functions, and then starts the actual program by calling the main() function at the very last line. The main() function calls all the other “major” functions: read_sequences(), read_interactions() and compute_avg_stats().

The read_sequences() function internally calls the read_fasta() function multiple times, once for each user-provided FASTA file.

The read_interactions() function calls no other function.

The compute_avg_stats() function uses the compute_aa_stats() function to compute the statistics of individual protein-protein pairs.

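The above can be summarized using a “call graph” like this, where each function is indented below the function that calls it:

main()
    read_sequences()
        read_fasta()
    read_interactions()
    compute_avg_stats()
        compute_aa_stats()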

Quiz. How many times is:
- the main() function called?
- the read_fasta() function called?
- the compute_aa_stats() function called?
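One way to check an answer to the quiz is to record every call as it happens. The sketch below is a simplified variant of the mock program, under several assumptions that are not part of the original code: each function appends its own name to a calls list, main() takes the paths as arguments instead of asking the user for them, and the file names are hypothetical (the mock functions never open them). It should run on both Python 2 and Python 3.

```python
calls = []   # names of the functions, in the order in which they are called

def read_fasta(path):
    calls.append("read_fasta")
    return "1A3A:A", "MANLFKLG..."        # same mock result as in the text

def read_sequences(paths):
    calls.append("read_sequences")
    header_to_seq = {}
    for path in paths:
        header, seq = read_fasta(path)
        header_to_seq[header] = seq
    return header_to_seq

def read_interactions(path):
    calls.append("read_interactions")
    return [("1A3A:A", "5AA3:F"), ("5AA3:F", "5K9C:A")]

def compute_aa_stats(seq1, seq2):
    calls.append("compute_aa_stats")
    return {"A": 0.2, "C": 0.01}, 0.72

def compute_avg_stats(sequences, interactions):
    calls.append("compute_avg_stats")
    for prot1, prot2 in interactions:
        # Only pairs for which both sequences are known are processed
        if prot1 in sequences and prot2 in sequences:
            compute_aa_stats(sequences[prot1], sequences[prot2])
    return 0.3

def main(fasta_paths, interaction_path):
    calls.append("main")
    sequences = read_sequences(fasta_paths)
    interactions = read_interactions(interaction_path)
    return compute_avg_stats(sequences, interactions)

# Hypothetical file names: the mock functions never open them.
average = main(["a.fasta", "b.fasta"], "pin.txt")

for name in ["main", "read_fasta", "compute_aa_stats"]:
    print("%s: %d call(s)" % (name, calls.count(name)))
```

The counts depend on the input: read_fasta() is called once per FASTA path, while compute_aa_stats() is called only for the interacting pairs whose sequences were both read.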