{ "cells": [ { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 0, "toc": true }, "source": [ "Table of Contents" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 0 }, "source": [ "# Writing Python functions" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 0 }, "source": [ "## What makes code beautiful?\n", "\n", "Reference: [Chapter 29 of Beautiful Code](http://webcat2.library.ubc.ca/vwebv/holdingsInfo?searchId=169488&recCount=100&recPointer=1&bibId=9436527) by Yukihiro Matsumoto:\n", "\"Treating code as an essay\"\n", "\n", "- Brevity -- no unnecessary information -- DRY \"don't repeat yourself\"\n", "\n", "- Familiarity -- use familiar patterns\n", "\n", "- Simplicity\n", "\n", "- Flexibility -- simple things should be simple, complex things should be possible\n", "\n", "- Balance\n", "\n", "Coding is a craft, like writing, cooking or furniture making. You develop a sense of balance\n", "by following master craftspeople in an apprenticeship. One of the big benefits of GitHub is that\n", "it gives you a chance to interact with very good programmers in an informal apprenticeship.\n" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 0 }, "source": [ "## Some rules of thumb\n", "\n", "1. Functions in a program play the role of paragraphs in an essay. They should express a single idea clearly.\n", "\n", "1. That means they should not be longer than a single screen. Paging is distracting and breaks your concentration. It shouldn't take more than one minute to understand what a function does.\n", "\n", "1. Not every function has to be documented, but you should be able to summarize any function you write in a clear, concise docstring.\n", "\n", "1. The best documentation is a working test case.\n", "\n", "1. You should think about how your function might change in the future, and design in some degree of flexibility.\n", "\n", "1. Functions should have a single entry and a single exit.\n", "\n", "1. Whenever possible, functions should be free of side effects. 
Exceptions to this rule include opening and writing files to disk, and modifying large arrays in place to avoid a copy.\n", "\n", "1. If you do modify an array that is passed as a function argument, return that array to signal the change. In Python there is no performance penalty for this, because the array is not copied. Instead, a new\n", "name is bound to the same array, so two names now point to it. When in doubt,\n", "use the `id` function to compare the identity of the object behind the new name and the old name -- they should be\n", "identical.\n", "\n", "### Some docstring examples\n", "\n", "1. Formatted: https://phaustin.github.io/a301_code/codedoc/full_listing.html#a301.landsat.toa_radiance.calc_radiance_8\n", "\n", "1. Source: https://phaustin.github.io/a301_code/_modules/a301/landsat/toa_radiance.html#calc_radiance_8" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 0 }, "source": [ "## Type systems\n", "\n", "In order to understand Python functions, it helps to understand how Python handles types.\n", "\n", "Compare C and Python:\n", "\n", "* C: Strongly typed, statically typed\n", "\n", "* Python: Strongly typed, dynamically typed" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 0 }, "source": [ "### Example of strong typing\n", "\n", "The following cell will raise a TypeError in Python. 
This will also fail to compile in C." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "caught a TypeError -- won't work\n" ] } ], "source": [ "a = 5\n", "try:\n", "    b = 5 + \"3\"\n", "except TypeError:\n", "    print(\"caught a TypeError -- won't work\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Your turn -- rewrite the cell above with a cast that makes it work" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example of dynamic typing\n", "\n", "The following cell will run in Python, but would fail to compile in C\n", "because it changes the type of `a`." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "the type of a is <class 'int'>\n", "now the type of a is <class 'str'>\n" ] } ], "source": [ "a = 5\n", "print(f\"the type of a is {type(a)}\")\n", "a = \"5\"\n", "print(f\"now the type of a is {type(a)}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Type summary\n", "\n", "- Python is strongly typed, which means that it won't silently coerce one type into an unrelated type\n", "  (for example `str` into `int`) without an explicit cast. (\"Explicit is better than implicit\")\n", "\n", "- Python is dynamically typed, which means that a variable name is attached to an instance\n", "  of an object, but not to the object's type, so the name can be reassigned to an\n", "  instance of a different type." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Flexible functions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### See: [VanderPlas Section 6](https://jakevdp.github.io/WhirlwindTourOfPython/08-defining-functions.html) for an explanation of `*args` and `**kwargs`" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def fibonacci(N, a=0, b=1):\n", "    L = []\n", "    while len(L) < N:\n", "        a, b = b, a + b\n", "        L.append(a)\n", "    return L" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fibonacci(10)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[3, 4, 7, 11, 18, 29, 47, 76, 123, 199]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fibonacci(10, b=3, a=1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### What is going on under the hood" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now rewrite this to be fully flexible -- this is roughly what\n", "the default-argument version is doing:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "def fibonacci_raw(*args, **kwargs):\n", "    print(f\"I got args={args} and kwargs={kwargs}\")\n", "    N = args[0]\n", "    L = []\n", "    #\n", "    # the dictionary \"get\" method takes a second\n", "    # argument which is the default value\n", "    # that is returned when the dictionary key is missing\n", "    #\n", "    a = kwargs.get(\"a\", 0)\n", "    b = kwargs.get(\"b\", 1)\n", "    while len(L) < N:\n", "        a, b = b, a + b\n", "        L.append(a)\n", "    return L" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I got args=(10,) and kwargs={'b': 
3, 'a': 1, 'bummer': True}\n" ] }, { "data": { "text/plain": [ "[3, 4, 7, 11, 18, 29, 47, 76, 123, 199]" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fibonacci_raw(10, b=3, a=1, bummer=True)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "I got args=(10,) and kwargs={}\n" ] }, { "data": { "text/plain": [ "[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fibonacci_raw(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Number 1 Python \"gotcha\"\n", "\n", "As noted at the link below, there is a subtle issue with using mutable objects\n", "(such as lists or dictionaries) as default arguments. Bottom line, do not do this.\n", "\n", "https://docs.python-guide.org/writing/gotchas/\n", "\n", "Here's an example of how you can get bitten:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "calling with element=12, the id of to_list is 4416911432\n", "\n", "first time I call the function I get [12]\n", "\n", "calling with element=42, the id of to_list is 4416911432\n", "\n", "second time I call the function I get [12, 42]\n" ] } ], "source": [ "def append_to(element, to_list=[]):\n", "    to_list.append(element)\n", "    print(f\"\\ncalling with element={element}, the id of to_list is {id(to_list)}\\n\")\n", "    return to_list\n", "\n", "\n", "my_list = append_to(12)\n", "print(f\"first time I call the function I get {my_list}\")\n", "\n", "my_other_list = append_to(42)\n", "print(f\"second time I call the function I get {my_other_list}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Note that the id of the default list is the same for each call!\n", "\n", "This is generally not what you expect, because you'll get different behaviour with identical\n", "inputs. 
This violates \"no side effects\" and also \"familiarity\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### The preferred approach -- use None as a default value\n", "\n", "If you want the list to be created fresh by default, then test for None and create it.\n", "\n", "Note that now the two lists have different ids." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "calling with element=12, the id of to_list is 4416911880\n", "\n", "first time I call the function I get [12]\n", "\n", "calling with element=42, the id of to_list is 4416911176\n", "\n", "second time I call the function I get [42]\n" ] } ], "source": [ "def append_to(element, to_list=None):\n", "    if to_list is None:\n", "        to_list = []\n", "    to_list.append(element)\n", "    print(f\"\\ncalling with element={element}, the id of to_list is {id(to_list)}\\n\")\n", "    return to_list\n", "\n", "\n", "my_list = append_to(12)\n", "print(f\"first time I call the function I get {my_list}\")\n", "\n", "my_other_list = append_to(42)\n", "print(f\"second time I call the function I get {my_other_list}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Duck typing and type casting\n", "\n", "Consider the following function:" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "inside trysort, mylist is [1, 2, 3]\n", "inside trysort, mylist is [1 2 3]\n", "last example failed, tuple has no sort method\n" ] } ], "source": [ "import numpy as np\n", "\n", "\n", "def trysort(mylist):\n", "    #\n", "    # this assumes mylist is a \"duck\" with a sort method\n", "    #\n", "    mylist.sort()\n", "    print(f\"inside trysort, mylist is {mylist}\")\n", "    return mylist\n", "\n", "\n", "trysort([3, 2, 1])\n", "trysort(np.array([3, 2, 1]))\n", "try:\n", "    trysort((3, 2, 1))\n", "except AttributeError:\n", "    print(\"last example failed, 
tuple has no sort method\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is an example of \"duck typing\"\n", "\n", "```\n", "If it walks like a duck, and quacks like a duck\n", "then it's a duck\n", "```\n", "\n", "The last call fails because a tuple object has no sort method." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Your turn: in the cell below, use `numpy.asarray` to cast the argument to an array" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Writing tests\n", "\n", "A widely used testing framework for Python is [pytest](https://docs.pytest.org/en/latest/). This\n", "is overkill for this class, but we can capture the spirit of pytest by writing test functions\n", "with asserts.\n", "\n", "### Example" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "ename": "AssertionError", "evalue": "\nNot equal to tolerance rtol=1e-07, atol=0\n\nMismatch: 10%\nMax absolute difference: 3\nMax relative difference: 1.5\n x: array([ 5, 4, 7, 11, 18, 29, 47, 76, 123, 199])\n y: array([ 2, 4, 7, 11, 18, 29, 47, 76, 123, 199])", "output_type": "error", "traceback": [ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0massert_allclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0manswer\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 11\u001b[0;31m \u001b[0mtest_fib\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 12\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", 
"\u001b[0;32m\u001b[0m in \u001b[0;36mtest_fib\u001b[0;34m()\u001b[0m\n\u001b[1;32m 7\u001b[0m \u001b[0mresult\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;36m5\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 8\u001b[0m \u001b[0manswer\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m4\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m7\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m11\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m18\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m29\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m47\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m76\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m123\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m199\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 9\u001b[0;31m \u001b[0massert_allclose\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0manswer\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 10\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 11\u001b[0m \u001b[0mtest_fib\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/mini37/envs/e213/lib/python3.6/site-packages/numpy/testing/_private/utils.py\u001b[0m in \u001b[0;36massert_allclose\u001b[0;34m(actual, desired, rtol, atol, equal_nan, err_msg, verbose)\u001b[0m\n\u001b[1;32m 1491\u001b[0m \u001b[0mheader\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'Not equal to tolerance rtol=%g, atol=%g'\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mrtol\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0matol\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1492\u001b[0m assert_array_compare(compare, actual, desired, err_msg=str(err_msg),\n\u001b[0;32m-> 1493\u001b[0;31m 
verbose=verbose, header=header, equal_nan=equal_nan)\n\u001b[0m\u001b[1;32m 1494\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 1495\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;32m~/mini37/envs/e213/lib/python3.6/site-packages/numpy/testing/_private/utils.py\u001b[0m in \u001b[0;36massert_array_compare\u001b[0;34m(comparison, x, y, err_msg, verbose, header, precision, equal_nan, equal_inf)\u001b[0m\n\u001b[1;32m 817\u001b[0m \u001b[0mverbose\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mverbose\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mheader\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mheader\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 818\u001b[0m names=('x', 'y'), precision=precision)\n\u001b[0;32m--> 819\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mAssertionError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmsg\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 820\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 821\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mtraceback\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", "\u001b[0;31mAssertionError\u001b[0m: \nNot equal to tolerance rtol=1e-07, atol=0\n\nMismatch: 10%\nMax absolute difference: 3\nMax relative difference: 1.5\n x: array([ 5, 4, 7, 11, 18, 29, 47, 76, 123, 199])\n y: array([ 2, 4, 7, 11, 18, 29, 47, 76, 123, 199])" ] } ], "source": [ "from numpy.testing import assert_allclose\n", "def test_fib():\n", "    #\n", "    # deliberately insert a wrong result\n", "    #\n", "    result=fibonacci(10, b=3, a=1)\n", "    result[0]=5\n", "    answer=[2, 4, 7, 11, 18, 29, 47, 76, 123, 199]\n", "    assert_allclose(result,answer)\n", "\n", "test_fib()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Summary for testing\n", "\n", "When we start writing Python modules, we can use pytest 
to search through the file, find any\n", "functions whose names start with \"test\", and run those tests, generating a report." ] } ], "metadata": { "jupytext": { "cell_metadata_filter": "all", "notebook_metadata_filter": "all", "text_representation": { "extension": ".py", "format_name": "percent", "format_version": "1.2", "jupytext_version": "1.0.1" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }