{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": true
   },
   "source": [
    "<h1>Table of Contents<span class=\"tocSkip\"></span></h1>\n",
    "<div class=\"toc\"><ul class=\"toc-item\"><li><span><a href=\"#Create-a-simulated-dataset\" data-toc-modified-id=\"Create-a-simulated-dataset-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Create a simulated dataset</a></span></li><li><span><a href=\"#Convert-to-a-dataframe\" data-toc-modified-id=\"Convert-to-a-dataframe-2\"><span class=\"toc-item-num\">2&nbsp;&nbsp;</span>Convert to a dataframe</a></span><ul class=\"toc-item\"><li><span><a href=\"#Q1:-Turn-uni_dist-into-a-dataframe-called-df\" data-toc-modified-id=\"Q1:-Turn-uni_dist-into-a-dataframe-called-df-2.1\"><span class=\"toc-item-num\">2.1&nbsp;&nbsp;</span>Q1: Turn uni_dist into a dataframe called df</a></span></li></ul></li><li><span><a href=\"#Create-a-datetime-for-each-row\" data-toc-modified-id=\"Create-a-datetime-for-each-row-3\"><span class=\"toc-item-num\">3&nbsp;&nbsp;</span>Create a datetime for each row</a></span></li><li><span><a href=\"#Make-time_list-the-index-to-the-dataframe\" data-toc-modified-id=\"Make-time_list-the-index-to-the-dataframe-4\"><span class=\"toc-item-num\">4&nbsp;&nbsp;</span>Make time_list the index to the dataframe</a></span><ul class=\"toc-item\"><li><span><a href=\"#Add-it-as-a-new-column\" data-toc-modified-id=\"Add-it-as-a-new-column-4.1\"><span class=\"toc-item-num\">4.1&nbsp;&nbsp;</span>Add it as a new column</a></span></li><li><span><a href=\"#Q2----set-this-column-as-an-index\" data-toc-modified-id=\"Q2----set-this-column-as-an-index-4.2\"><span class=\"toc-item-num\">4.2&nbsp;&nbsp;</span>Q2 -- set this column as an index</a></span></li></ul></li><li><span><a href=\"#Dataframe-grouping\" data-toc-modified-id=\"Dataframe-grouping-5\"><span class=\"toc-item-num\">5&nbsp;&nbsp;</span>Dataframe grouping</a></span><ul class=\"toc-item\"><li><span><a href=\"#Q3-write-a-function-that-rounds-a-column-value-to-1-decimal\" data-toc-modified-id=\"Q3-write-a-function-that-rounds-a-column-value-to-1-decimal-5.1\"><span class=\"toc-item-num\">5.1&nbsp;&nbsp;</span>Q3 write a function that rounds a column value to 1 decimal</a></span></li><li><span><a href=\"#Q4-Add-a-new-column-called-&quot;mass_tenths&quot;-using-coarse_grain\" data-toc-modified-id=\"Q4-Add-a-new-column-called-&quot;mass_tenths&quot;-using-coarse_grain-5.2\"><span class=\"toc-item-num\">5.2&nbsp;&nbsp;</span>Q4 Add a new column called \"mass_tenths\" using coarse_grain</a></span></li><li><span><a href=\"#Q5:-Group-the-measurements-on-mass_tenths\" data-toc-modified-id=\"Q5:-Group-the-measurements-on-mass_tenths-5.3\"><span class=\"toc-item-num\">5.3&nbsp;&nbsp;</span>Q5: Group the measurements on mass_tenths</a></span></li></ul></li><li><span><a href=\"#Dataframe-input/output\" data-toc-modified-id=\"Dataframe-input/output-6\"><span class=\"toc-item-num\">6&nbsp;&nbsp;</span>Dataframe input/output</a></span><ul class=\"toc-item\"><li><span><a href=\"#Q6-Round-trip\" data-toc-modified-id=\"Q6-Round-trip-6.1\"><span class=\"toc-item-num\">6.1&nbsp;&nbsp;</span>Q6 Round trip</a></span></li></ul></li></ul></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import numpy.random as rn\n",
    "import numpy.testing as nt\n",
    "import datetime as dt\n",
    "from datetime import timezone as tz\n",
    "import pprint"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Dataframe practice\n",
    "\n",
    "Learning objectives\n",
    "\n",
    "- Be able to use the apply method to execute a function on every row of a dataframe\n",
    "- Be able to add a column to a data frame and use groupby with that column to \n",
    "  group dataframe rows into subset dataframes\n",
    "- Be able to do simple statistics (mean, median, max, min, summary) on dataframes and dataframe series\n",
    "- Be able to construct dataframes from lists of tuples, lists of dictionaries, or numpy arrays using\n",
    "  from_records member function\n",
    "- Be able to create and work with python datetime objects\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a simulated dataset\n",
    "\n",
    "This cell creates 3 columns: column 0 is temperatures\n",
    "in the range 263-283 K, column 1 is veloc in the range\n",
    "-100 to +100 dm/s, column 2 is mass in the range "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "my_seed = 5\n",
    "rn.seed(my_seed)\n",
    "uni_dist = rn.rand(25,3)\n",
    "uni_dist[:,:2] = (uni_dist[:,:2] - 0.5)*2.\n",
    "temps = uni_dist[:,0]*10. + 273.\n",
    "veloc = uni_dist[:,1]*100.\n",
    "uni_dist[:,0]=temps\n",
    "uni_dist[:,1]=veloc\n",
    "print(uni_dist[:5,:])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Convert to a dataframe\n",
    "\n",
    "with columns=['temperature','veloc','mass']\n",
    "\n",
    "### Q1: Turn uni_dist into a dataframe called df\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "#Q1 answer\n",
    "columns=['temperature','veloc','mass']\n",
    "df=pd.DataFrame(uni_dist,columns=columns)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It should pass this test"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nt.assert_allclose(df['veloc'][1],-2.3,rtol=0.1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create a datetime for each row\n",
    "\n",
    "Assume that each row represents a measurement, and that\n",
    "the measurements happened every 10 minutes atarting at\n",
    "5 am UCT on March 25, 2019"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "start_time=dt.datetime(2019,3,25,5,0,0,tzinfo=tz.utc)\n",
    "interval=dt.timedelta(minutes=10)\n",
    "print(f'start and time interval: {start_time} -- {interval}')\n",
    "time_list=[start_time]\n",
    "for tick in range(len(df)-1):\n",
    "    time_list.append(time_list[tick] + interval)\n",
    "pprint.pprint(time_list[:5])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Make time_list the index to the dataframe\n",
    "\n",
    "### Add it as a new column"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df['date']=time_list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Q2 -- set this column as an index\n",
    "\n",
    "google pandas.DataFrame.set_index for method arguments"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "#Q2 answer\n",
    "df.set_index('date',inplace=True)\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your dataframe should pass this assert."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "target_row=start_time=dt.datetime(2019,3,25,5,40,0,tzinfo=tz.utc)\n",
    "nt.assert_allclose(df.loc[target_row]['temperature'],271.8,rtol=1.e-2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dataframe grouping\n",
    "\n",
    "### Q3 write a function that rounds a column value to 1 decimal\n",
    "\n",
    "i.e. complete this function\n",
    "\n",
    "```\n",
    "def coarse_grain(row,colname):\n",
    "    #\n",
    "    # input -- dataframe row called row and string colname\n",
    "    #\n",
    "    # returns -- row[colname] rounded to 1 decimal\n",
    "    return col_val\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 1
   },
   "outputs": [],
   "source": [
    "# Q3 answer\n",
    "def coarse_grain(row,colname):\n",
    "    col_val = np.around(row[colname],decimals=1)\n",
    "    return col_val"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your function should pass this assert"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "the_row = df.iloc[2]\n",
    "the_round = coarse_grain(the_row,'mass')\n",
    "np.testing.assert_allclose(the_round,0.3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Q4 Add a new column called \"mass_tenths\" using coarse_grain\n",
    "\n",
    "i.e. use df.apply to run coarse_grain on the dataframe and add the result\n",
    "as the 'mass_tenths' column\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "#Q4 answer\n",
    "df['mass_tenths']=\\\n",
    "    df.apply(coarse_grain,args=('mass',),axis=1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You dataframe should now pass the following assert"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nt.assert_allclose(df['mass_tenths'][3],0.7)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Q5: Group the measurements on mass_tenths"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "Use groups=df.groupby to create a DataFrameGroupBy object\n",
    "called groups indexed on mass_tenths\n",
    "\n",
    "Then find the median temperature, speed and mass of each group\n",
    "using the the_median=groups.median() method\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#Q5 answer\n",
    "groups=df.groupby('mass_tenths')\n",
    "the_median=groups.median()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "nt.assert_allclose(the_median.loc[0.2]['veloc'],1.49,rtol=0.1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "groups.median()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Dataframe input/output\n",
    "\n",
    "### Q6 Round trip\n",
    "\n",
    "Add a cell that fixes the following problem -- new_df is not\n",
    "the same as our original df after writing it out to csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df.to_csv('test.csv')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_df=pd.read_csv('test.csv')\n",
    "new_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "lines_to_next_cell": 2
   },
   "outputs": [],
   "source": [
    "#Q6 answer\n",
    "new_df=pd.read_csv('test.csv',index_col='date')\n",
    "new_df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "target_row=start_time=dt.datetime(2019,3,25,5,40,0,tzinfo=tz.utc)\n",
    "nt.assert_allclose(df.loc[target_row]['temperature'],271.8,rtol=1.e-2)"
   ]
  }
 ],
 "metadata": {
  "jupytext": {
   "cell_metadata_filter": "all",
   "notebook_metadata_filter": "all,-language_info"
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  },
  "nbsphinx": {
   "execute": "never"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {
    "height": "calc(100% - 180px)",
    "left": "10px",
    "top": "150px",
    "width": "273.438px"
   },
   "toc_section_display": true,
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}