{ "cells": [ { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 0, "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 0 }, "source": [ "# Plotting with Matplotlib\n", "\n", "Though there are many options for plotting data in Python, we will be using [Matplotlib](https://matplotlib.org/).\n", "In particular, we will be using the `pyplot` module in Matplotlib, which provides MATLAB-like plotting.\n", "The reason for this is simple: Matplotlib is the most common module used for plotting in Python, and many of the plotting examples you find online will use Matplotlib." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## This assignment\n", "\n", "Run the cells in this notebook -- feel free to experiment with the plots.\n", "\n", "There are two tasks at the bottom of the notebook which you should complete and\n", "upload as usual." ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 0 }, "source": [ "## Our dataset\n", "\n", "For our first lesson on plotting data with Matplotlib, we will again use the weather data file from Lesson 5.\n", "\n", "- The data file (`Kumpula-June-2016-w-metadata.txt`) is in the `data` subdirectory.\n", "- It contains observed daily mean, minimum, and maximum temperatures from June 2016 recorded at the Kumpula weather observation station in Helsinki. It is derived from a data file of daily temperature measurements downloaded from the [US National Oceanic and Atmospheric Administration’s National Centers for Environmental Information climate database](https://www.ncdc.noaa.gov/cdo-web/).\n", "\n", "## Getting started\n", "\n", "Let's start by importing the pyplot submodule of Matplotlib." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "from matplotlib import pyplot as plt" ] }, { "cell_type": "markdown", "metadata": { "lines_to_next_cell": 0 }, "source": [ "Note again that we are renaming the Matplotlib pyplot submodule when we import it.\n", "Perhaps now it is clearer why you might want to rename a module on import.\n", "Having to type `matplotlib.pyplot` every time you use one of its functions would be a pain.\n", "\n", "### Loading the data with pandas\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can read in the data file in the same way we did in the NumPy notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Row 8 of the file (counting from 0) holds the column names; the metadata lines above it are skipped\n", "dataFrame = pd.read_csv(\n", "    \"data/Kumpula-June-2016-w-metadata.txt\",\n", "    header=[8], skip_blank_lines=False\n", ")\n", "data = dataFrame.values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you may recall, `data` is now an array with 4 columns.\n", "Let's assign each of those columns to its own variable below." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "date = data[:, 0]\n", "temp = data[:, 1]\n", "temp_max = data[:, 2]\n", "temp_min = data[:, 3]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "OK, great.\n", "One thing we'll do a bit differently this week is to split the data from `dataFrame` into separate arrays of values, selected by column name, so we can plot things in the same way as with NumPy.\n", "We can split the data into separate arrays as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "date = dataFrame[\"YEARMODA\"].values\n", "temp = dataFrame[\"TEMP\"].values\n", "temp_max = dataFrame[\"MAX\"].values\n", "temp_min = dataFrame[\"MIN\"].values" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The `.values` attribute of a pandas Series returns the values of the series as a NumPy array, without the index." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Our first plot\n", "\n", "OK, so let’s get to plotting! We can start by using the Matplotlib `plt.plot()` function." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "x = date\n", "y = temp\n", "plt.plot(x, y);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If all goes well, you should see the plot above.\n", "\n", "OK, so what happened here?\n", "Well, first we assigned the values we would like to plot, the date and temperature, to the variables `x` and `y`.\n", "This isn’t necessary, *per se*, but does make it easier to see what is plotted.\n", "Next, it is perhaps pretty obvious that `plt.plot()` is a function in pyplot that produces a simple *x*-*y* plot.\n", "Conveniently, plots are automatically displayed in Jupyter notebooks, so there is no need for the additional `plt.show()` function you might see in examples you can find online." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic plot formatting\n", "\n", "We can make our plot look a bit nicer and provide more information by using a few additional pyplot options." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "plt.plot(x, y, 'ro--')\n", "plt.title('Kumpula temperatures in June 2016')\n", "plt.xlabel('Date')\n", "plt.ylabel('Temperature [°F]');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This should produce the plot above.\n", "\n", "Now we see our temperature data as a red dashed line with circles showing the data points.\n", "This comes from the additional `ro--` used with `plt.plot()`.\n", "In this case, `r` tells the `plt.plot()` function to use red color, `o` tells it to show circles at the points, and `--` says to use a dashed line.\n", "You can use `help(plt.plot)` to find out more about formatting plots.\n", "Better yet, check out the [documentation for `plt.plot()` online](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html).\n", "We have also added a title and axis labels, but their use is straightforward." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Embiggening\\* the plot\n", "\n", "While the plot sizes we're working with are OK, it would be nice to have them displayed a bit larger.\n", "Fortunately, there is an easy way to make the plots larger in Jupyter notebooks.\n", "To set the default plot size to be larger, simply run the Python cell below." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.rcParams[\"figure.figsize\"] = [12, 6]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The cell above sets the default plot size to be 12 inches wide by 6 inches tall.\n", "Feel free to change these values if you prefer.\n", "\n", "To test whether this is working as expected, simply re-run one of the earlier cells that generated a plot.\n", "\n", "\\* To *[embiggen](https://en.oxforddictionaries.com/definition/embiggen)* means to enlarge.\n", "It's a perfectly [cromulent](https://en.oxforddictionaries.com/definition/cromulent) word." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Adding text labels to a plot\n", "\n", "Adding text to plots can be done using `plt.text()`.\n", "\n", "```python\n", "plt.text(20160604.0, 68.0, 'High temperature in early June')\n", "```\n", "\n", "This will display the text \"High temperature in early June\" at the location `x = 20160604.0` (i.e., June 4, 2016), `y = 68.0` on the plot.\n", "We'll see how to do this in a live example in just a second.\n", "With our approach to plotting thus far, the commands related to an individual plot should all be in the same Python cell." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Changing the axis ranges\n", "\n", "Changing the plot axes can be done using the `plt.axis()` function.\n", "\n", "```python\n", "plt.axis([20160601, 20160615, 55.0, 70.0])\n", "```\n", "\n", "The argument to `plt.axis()` is a Python list of the form `[xmin, xmax, ymin, ymax]`.\n", "Here, the *x* range would be changed to the equivalents of June 1, 2016 to June 15, 2016 and the *y* range would be 55.0-70.0.\n", "For comparison, here is the plot once more with the default axis ranges:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "plt.plot(x, y, \"ro--\")\n", "plt.title(\"Kumpula temperatures in June 2016\")\n", "plt.xlabel(\"Date\")\n", "plt.ylabel(\"Temperature [°F]\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Repeat with clipped axes\n", "\n", "The complete set of commands, including the text label and the clipped axis ranges, would thus be:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.plot(x, y, \"ro--\")\n", "plt.title(\"Kumpula temperatures in June 2016\")\n", "plt.xlabel(\"Date\")\n", "plt.ylabel(\"Temperature [°F]\")\n", "plt.text(20160604.0, 68.0, \"High temperature in early June\")\n", "plt.axis([20160601, 20160615, 55.0, 70.0]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bar plots in Matplotlib\n", "\n", "In addition to line plots, there are many other options for plotting in Matplotlib.\n", "Bar plots are one option, which can be used quite similarly to line plots." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plt.bar(x, y)\n", "plt.title('Kumpula temperatures in June 2016')\n", "plt.xlabel('Date')\n", "plt.ylabel('Temperature [°F]')\n", "plt.text(20160604.0, 68.0, 'High temperature in early June')\n", "plt.axis([20160601, 20160615, 55.0, 70.0]);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can find out more about how to format bar charts on the [Matplotlib documentation website](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.bar.html?highlight=matplotlib%20pyplot%20bar#matplotlib.pyplot.bar)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving your plots as image files\n", "\n", "Saving plots created using Matplotlib can be done in several ways.\n", "The recommendation for use outside of Jupyter notebooks is to use the `plt.savefig()` function.\n", "When using `plt.savefig()`, you simply give the commands that generate a plot and then call `plt.savefig()`, with some parameters, as the last command.\n", "The file name is required, and the image format will be determined based on the listed file extension.\n", "\n", "Matplotlib plots can be saved in a number of useful file formats, including PNG, PDF, and EPS.\n", "PNG is a nice format for raster images, and EPS is probably easiest to use for vector graphics.\n", "Let's check out an example and save our lovely bar plot." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "plt.bar(x, y)\n", "plt.title(\"Kumpula temperatures in June 2016\")\n", "plt.xlabel(\"Date\")\n", "plt.ylabel(\"Temperature [°F]\")\n", "plt.text(20160604.0, 68.0, \"High temperature in early June\")\n", "plt.axis([20160601, 20160615, 55.0, 70.0])\n", "plt.savefig('bar-plot.png')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you refresh your **Files** tab on the left side of the JupyterLab window you should now see `bar-plot.png` listed.\n", "We could try to save another version in higher resolution with a minor change to our plot commands above." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "lines_to_next_cell": 2 }, "outputs": [], "source": [ "plt.bar(x, y)\n", "plt.title(\"Kumpula temperatures in June 2016\")\n", "plt.xlabel(\"Date\")\n", "plt.ylabel(\"Temperature [°F]\")\n", "plt.text(20160604.0, 68.0, \"High temperature in early June\")\n", "plt.axis([20160601, 20160615, 55.0, 70.0])\n", "plt.savefig('bar-plot-hi-res.pdf', dpi=600)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "dc2659482e2836551d85c78169edaa01", "grade": false, "grade_id": "cell-7a3c2a23d10ff4c6", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "## Task 1: Plotting like the \"pros\"\n", "\n", "We’re only introducing a tiny amount of what can be done with pyplot.\n", "In most cases, when we would like to create some more complicated type of plot, we would search using [Google](https://www.google.fi/) or visit the [Matplotlib plot gallery](http://matplotlib.org/gallery.html).\n", "The great thing about the [Matplotlib plot gallery](http://matplotlib.org/gallery.html) is that not only can you find example plots there, but you can also find the Python commands used to create the plots.\n", "This makes it easy to take a working example from the 
gallery and modify it for your use.\n", "\n", "Your job in this task is to:\n", "\n", "1. Visit the [Matplotlib plot gallery](http://matplotlib.org/gallery.html)\n", "2. Find an interesting plot and click on it\n", "3. Copy the code you find listed beneath the plot on the page that loads\n", "4. Paste that into an Python cell in this notebook and run it to reproduce the plot\n", "\n", "After you have reproduced the plot, you are welcome to try to make a small change to the plot commands and see what happens.\n", "For this, you can simply edit the Python cell contents and re-run." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "6c93c9def1ddd28ff2489285d76b14ed", "grade": true, "grade_id": "cell-935c3f0838b3c3c1", "locked": false, "points": 5, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "28f8590834fb06aeff760c968cd0c48a", "grade": false, "grade_id": "cell-18844911794679a3", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "## Task 2: Plotting only part of a dataset\n", "\n", "For this task, you should use the values for arrays `x` and `y` calculated earlier in this part of the lesson, and use `plt.axis()` to limit the plot to the following *x* and *y* ranges: *x = June 7-14*, *y = 45.0-65.0*.\n", "\n", "- What do you expect to see in this case?\n", "\n", "**Note**: In order to get the plot to display properly, you will need to first type in the `plt.plot()` command, then `plt.axis()`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "e528384fe1e43036e1c94eb0653976c6", "grade": true, "grade_id": "cell-986c16b84ab694e5", "locked": false, "points": 5, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] } ], "metadata": { "jupytext": { "formats": "ipynb", "metadata_filter": { "cells": { "additional": "all" }, "notebook": { "additional": "all" } }, "text_representation": { "extension": ".py", "format_name": "percent", "format_version": "1.2", "jupytext_version": "0.8.6" } }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.7" }, "nbsphinx": { "execute": "never" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": true, "toc_window_display": true } }, "nbformat": 4, "nbformat_minor": 2 }