{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hydropandas Objects\n",
    "\n",
    "In the HydroPandas Python package, the Obs and ObsCollection classes are designed to handle time series data related to hydrological observations.\n",
    "\n",
    "The Obs class represents a single time series of measurements at a specific location, such as groundwater levels or precipitation amounts. It is a subclass of the pandas DataFrame, enriched with additional attributes and methods for the type of observation it holds. There are specialized subclasses of Obs for different measurement types, including:\n",
    "\n",
    "- GroundwaterObs: for groundwater measurements\n",
    "- WaterQualityObs: for (ground)water quality measurements\n",
    "- WaterlvlObs: for surface water level measurements\n",
    "- ModelObs: for observations from a MODFLOW model\n",
    "- MeteoObs: for meteorological observations\n",
    "- PrecipitationObs: for precipitation observations (subclass of MeteoObs)\n",
    "- EvaporationObs: for evaporation observations (subclass of MeteoObs)\n",
    "\n",
    "The ObsCollection class represents a collection of Obs objects, such as multiple groundwater level time series within a certain area. It is also a subclass of the pandas DataFrame, where each row contains metadata (e.g., coordinates of the observation point) and the corresponding Obs object that holds the measurements. Both Obs and ObsCollection classes include methods for reading data from various sources, facilitating the management and analysis of hydrological time series data."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## <a id=top></a>Notebook contents\n",
    "\n",
    "1. [Obs](#Obs)\n",
    "2. [ObsCollection](#ObsCollection)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "import hydropandas as hpd\n",
    "\n",
    "hpd.util.get_color_logger(\"INFO\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Obs<a id=GroundwaterObs></a>\n",
    "\n",
    "Creating an `Obs` object is very similar to creating a `DataFrame`. Below we create three different Obs objects:\n",
    "\n",
    "1. an empty Obs\n",
    "2. an Obs with only metadata \n",
    "3. an Obs with metadata and measurements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1. create an empty Obs object\n",
    "o1 = hpd.Obs(name=\"my empty obs\")\n",
    "display(o1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 2. create an Obs object with only metadata\n",
    "o2 = hpd.Obs(\n",
    "    name=\"my_observation\",\n",
    "    x=10,\n",
    "    y=20,\n",
    "    location=\"somewhere\",\n",
    "    filename=\"unknown\",\n",
    "    source=\"imagination\",\n",
    "    unit=\"m\",\n",
    ")\n",
    "display(o2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 3. create an Obs object with both metadata and measurements\n",
    "meas_df = pd.DataFrame(\n",
    "    index=pd.date_range(start=\"2020-01-01\", periods=10, freq=\"D\"),\n",
    "    data={\"value\": np.random.rand(10)},\n",
    ")\n",
    "o3 = hpd.Obs(\n",
    "    meas_df,\n",
    "    name=\"smw\",\n",
    "    x=1000,\n",
    "    y=22220,\n",
    "    location=\"somewhere else\",\n",
    "    source=\"advanced imagination\",\n",
    "    unit=\"m\",\n",
    ")\n",
    "display(o3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Metadata\n",
    "\n",
    "Access observation metadata as attributes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"x coordinate of observation 1: {o1.x}\")\n",
    "print(f\"x coordinate of observation 2: {o2.x}\")\n",
    "print(f\"x coordinate of observation 3: {o3.x}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"source of observation 1 is : {o1.source}\")\n",
    "print(f\"location of observation 2 is : {o2.location}\")\n",
    "print(f\"name of observation 3 is : {o3.name}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Measurements\n",
    "\n",
    "Access the measurements of an observation as you would access the data in a pandas DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "display(o3[\"value\"])  # show measurements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "perc85 = o3[\"value\"].quantile(0.85)  # get percentile\n",
    "print(f\"the 85th percentile of my measurements is {perc85:.2f} {o3.unit}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "o3[\"value\"].plot(\n",
    "    figsize=(7, 3),\n",
    "    label=o3.name,\n",
    "    ylabel=o3.unit,\n",
    "    marker=\"o\",\n",
    "    legend=True,\n",
    "    title=\"my observations\",\n",
    ");  # plot measurements"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Obs types\n",
    "\n",
    "Different Obs types have different metadata. Groundwater observations have some extra properties: `screen_top`, `screen_bottom`, `ground_level`, `tube_top` and `metadata_available`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gw_obs = hpd.GroundwaterObs(\n",
    "    o3,\n",
    "    name=\"smw_pb1\",\n",
    "    tube_nr=1,\n",
    "    screen_top=-5,\n",
    "    screen_bottom=-6,\n",
    "    unit=\"m NAP\",\n",
    "    ground_level=3,\n",
    "    tube_top=2.95,\n",
    "    metadata_available=True,\n",
    ")  # create a GroundwaterObs object from the Obs object\n",
    "display(gw_obs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modify\n",
    "\n",
    "Sometimes you want to change measurement values or metadata of an Obs object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# modify measurements (similar to how you modify a pandas DataFrame)\n",
    "o3.loc[\"2020-01-05\":\"2020-01-07\", \"value\"] = 2  # set the values for a range of dates\n",
    "o3.plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create new Obs object from existing\n",
    "o4 = o3.copy()  # note: use the copy method to create a new object\n",
    "o4.loc[\"2020-01-05\":\"2020-01-07\", \"value\"] = -1\n",
    "o4.plot()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# modify metadata by direct assignment\n",
    "o4.name = \"smw_modified\"\n",
    "o4.source = \"smw\"\n",
    "display(o4)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Additional metadata\n",
    "\n",
    "You can have metadata that does not match any of the default metadata names for\n",
    "a particular Observation type. For groundwater observations you may for example\n",
    "have the name of the company that constructed the measurement well. This\n",
    "additional metadata can be stored in the `meta` attribute as a dictionary.\n",
    "Below we create a GroundwaterObs object with some additional metadata.\n",
    "\n",
    "Note that `display(gw_obs)` will never display the meta dictionary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gw_obs = hpd.GroundwaterObs(\n",
    "    o3,\n",
    "    name=\"smw_pb1\",\n",
    "    tube_nr=1,\n",
    "    screen_top=-5,\n",
    "    screen_bottom=-6,\n",
    "    unit=\"m NAP\",\n",
    "    ground_level=3,\n",
    "    tube_top=2.95,\n",
    "    metadata_available=True,\n",
    "    meta={\"contractor\": \"GeoDrill Inc.\"},\n",
    ")\n",
    "print(gw_obs.meta)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read/write Obs\n",
    "\n",
    "Observations can be written to and read from a json, csv, excel or pickle file. The table below summarises the options:\n",
    "\n",
    "| type   | write function  | read function   | Human readable        | Store metadata | Write/read additional metadata    | keep dtypes?    |\n",
    "|--------|-----------------|-----------------|-----------------------|----------------|-----------------------------------|-----------------|\n",
    "| json   | Obs.to_json     | hpd.read_json   | Yes                   | Yes            | Yes                               | Mostly          |\n",
    "| csv    | Obs.to_csv      | Obs.from_csv    | Yes                   | Yes            | No                                | No              |\n",
    "| pickle | Obs.to_pickle   | hpd.read_pickle | No                    | Yes            | Yes                               | Yes             |\n",
    "| excel* | Obs.to_excel    | pd.read_excel   | Yes (in Excel)        | No             | No                                | No              |\n",
    "\n",
    "*The `to_excel` method is inherited from the pandas DataFrame. The other methods are adapted for hydropandas."
   ]
  },
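  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The difference in the \"keep dtypes?\" column can be illustrated with a minimal sketch that uses only the Python standard library. It is an analogy and not how hydropandas reads or writes files: the `metadata` dictionary and its values are hypothetical. A csv file stores plain text, so every value comes back as a string unless it is converted again, whereas json keeps numbers and booleans."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "import io\n",
    "import json\n",
    "\n",
    "# hypothetical metadata record, loosely modelled on the GroundwaterObs example\n",
    "metadata = {\n",
    "    \"name\": \"smw_pb1\",\n",
    "    \"tube_nr\": 1,\n",
    "    \"screen_top\": -5.0,\n",
    "    \"metadata_available\": True,\n",
    "}\n",
    "\n",
    "# csv round trip: every field is read back as a string\n",
    "buffer = io.StringIO()\n",
    "writer = csv.DictWriter(buffer, fieldnames=list(metadata))\n",
    "writer.writeheader()\n",
    "writer.writerow(metadata)\n",
    "buffer.seek(0)\n",
    "from_csv = next(csv.DictReader(buffer))\n",
    "\n",
    "# json round trip: strings, numbers and booleans keep their types\n",
    "from_json = json.loads(json.dumps(metadata))\n",
    "\n",
    "# compare the type of each value: original, after csv, after json\n",
    "for key, value in metadata.items():\n",
    "    types = (type(value), type(from_csv[key]), type(from_json[key]))\n",
    "    print(key, [t.__name__ for t in types])"
   ]
  },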
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### json\n",
    "\n",
    "JSON is a human-readable format that can be used to store observation objects. Additional metadata is kept in the JSON file, and dtypes are preserved more reliably than with csv. Small details, such as the index frequency, may still differ between the original object and the one that is read back from the JSON file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write to json\n",
    "gw_obs.to_json(\"my_gw_obs.json\", indent=4)\n",
    "gw_obs"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read from json\n",
    "gw_obs_from_json = hpd.read_json(\"my_gw_obs.json\")\n",
    "gw_obs_from_json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"index frequency original:\", gw_obs.index.freq)\n",
    "print(\"index frequency from json:\", gw_obs_from_json.index.freq)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### csv\n",
    "\n",
    "When an Obs object is written to and read back from a csv file:\n",
    "1. The datatypes may change.\n",
    "2. The additional metadata in the `.meta` attribute is lost."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save the groundwater observations to a csv file\n",
    "gw_obs.to_csv(\"my_gw_obs.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read the groundwater observations from a csv file\n",
    "gw_obs_from_csv = hpd.GroundwaterObs.from_csv(\"my_gw_obs.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 1. datatypes changed\n",
    "print(\"datatype of gw_obs.screen_top:\", type(gw_obs.screen_top))\n",
    "print(\"datatype of gw_obs_from_csv.screen_top:\", type(gw_obs_from_csv.screen_top))\n",
    "\n",
    "# 2. additional metadata is not saved in the csv file\n",
    "print(f\"\\nadditional metadata: {gw_obs.meta=}\")\n",
    "print(f\"additional metadata: {gw_obs_from_csv.meta=}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### pickle\n",
    "Pickle files are binary and not human readable. They are very fast to write and read, and reading one returns an exact copy of the original object. Pickle files are not well suited for long-term storage because they:\n",
    "\n",
    "- are only readable in Python\n",
    "- are not stable across Python and package versions\n",
    "- contain references to exact class and module paths"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# save the object to a pickle file\n",
    "gw_obs.to_pickle(\"my_gw_obs.pklz\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read the object from a pickle file\n",
    "gw_obs2 = hpd.read_pickle(\"my_gw_obs.pklz\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "gw_obs2.equals(gw_obs)  # check if the two objects are equal"
   ]
  },
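  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The point about class and module paths can be checked with a minimal sketch that uses only the Python standard library. A `datetime.date` is used here as a stand-in for an Obs object, which is an assumption made to keep the example independent of hydropandas. The serialized bytes contain the name of the module that defines the class, so the pickle can only be loaded where that module and class can still be imported under the same names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import datetime\n",
    "import pickle\n",
    "\n",
    "# stand-in object; any picklable class instance behaves similarly\n",
    "original = datetime.date(2020, 1, 1)\n",
    "\n",
    "payload = pickle.dumps(original)  # serialize to bytes\n",
    "restored = pickle.loads(payload)  # rebuild the object from the bytes\n",
    "\n",
    "print(\"round trip returns an equal object:\", restored == original)\n",
    "print(\"module name is stored in the pickle:\", b\"datetime\" in payload)"
   ]
  },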
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## ObsCollection<a id=ObsCollection></a>\n",
    "\n",
    "An ObsCollection is a structured way to manage and analyse multiple time series of hydrological observations. It serves as a container for multiple Obs objects, which represent individual time series of measurements, such as groundwater levels, precipitation, or water quality.\n",
    "\n",
    "Each row in an ObsCollection contains metadata (e.g., location, station name) and a corresponding Obs object holding the time series data. This structure allows for easy comparison, filtering, and statistical analysis across multiple observation sites."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create an empty ObsCollection\n",
    "oc = hpd.ObsCollection()\n",
    "print(oc)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create an ObsCollection with a single Obs object\n",
    "oc = hpd.ObsCollection(o3)\n",
    "oc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create an ObsCollection with multiple Obs objects\n",
    "oc = hpd.ObsCollection([o1, o2, o3])\n",
    "oc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ObsCollection metadata\n",
    "\n",
    "Access the metadata using the standard DataFrame methods."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"the x coordinate of observation 2 is: {oc.loc['my_observation', 'x']}\")\n",
    "print(f\"the location of observation 3 is: {oc.loc['smw', 'location']}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ObsCollection observations\n",
    "\n",
    "Access the Obs objects in the collection."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# using the loc method\n",
    "o3_1 = oc.loc[\"smw\", \"obs\"]\n",
    "\n",
    "# using the get_obs method with the name of the observation\n",
    "o3_2 = oc.get_obs(\"smw\")\n",
    "\n",
    "# using the get_obs method with the location (only works if the location is unique)\n",
    "o3_3 = oc.get_obs(location=\"somewhere else\")\n",
    "\n",
    "# check if the three objects are the same\n",
    "id(o3_1) == id(o3_2) == id(o3_3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Slice ObsCollection\n",
    "\n",
    "Filter and slice an ObsCollection with standard DataFrame indexing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "oc.loc[oc[\"y\"] > 10]  # Selection based on the y coordinate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "oc.loc[oc[\"source\"].str.contains(\"advanced\")]  # Selection based on the source"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Modify ObsCollections\n",
    "\n",
    "Below are some examples to modify ObsCollections. More details on merging observations and ObsCollections are available [here](04_merging_observations.ipynb).\n",
    "\n",
    "- add an observation\n",
    "- remove an observation\n",
    "- modify metadata of an observation\n",
    "- modify the timeseries of an observation\n",
    "- add a copy of an existing observation\n",
    "- replace an existing observation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Remove observation\n",
    "\n",
    "Remove an observation from an ObsCollection using `drop`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# remove an observation\n",
    "oc.drop(\"my_observation\", inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Add observation\n",
    "\n",
    "Add an observation to an ObsCollection using `add_observation`.\n",
    "\n",
    "Note: Adding an observation using `oc.loc[<name>,'obs'] = o` does not work and results in an empty 'obs' column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add an observation\n",
    "oc.add_observation(o2, inplace=True)\n",
    "oc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Do not add observations using loc!\n",
    "oc_copy = oc.copy(deep=True)\n",
    "oc_copy.loc[\"new_obs\", \"obs\"] = gw_obs\n",
    "oc_copy"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Modify metadata\n",
    "Use the `set_metadata_value` method of the ObsCollection to modify metadata.\n",
    "\n",
    "Note: Metadata of a single observation is stored in two places: in the ObsCollection DataFrame and as an attribute of the Obs object. `set_metadata_value` modifies both, which is why it is preferred over setting the value only in the ObsCollection DataFrame or only on the Obs attribute. Either of those alternatives results in an inconsistent ObsCollection.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# modify metadata of an observation\n",
    "oc.set_metadata_value(\"my_observation\", \"x\", 1815)\n",
    "oc.set_metadata_value(\"my_observation\", \"y\", 2025)\n",
    "print(oc._is_consistent())  # check if the ObsCollection is consistent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Do not use this way to modify metadata!\n",
    "oc_copy = oc.copy(deep=True)\n",
    "oc_copy.loc[\"smw\", \"x\"] = 100\n",
    "oc_copy._is_consistent()  # check if the ObsCollection is consistent"
   ]
  },
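  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Why the two storage places can drift apart is easier to see in a toy model. The sketch below uses plain Python and is not the hydropandas implementation: `ToyObs`, `toy_is_consistent` and `toy_set_metadata_value` are hypothetical names, and a dictionary stands in for the ObsCollection DataFrame. Changing the value in only one place leaves the other copy stale, while a helper that writes to both keeps them in sync."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "class ToyObs:\n",
    "    \"\"\"Hypothetical stand-in for an Obs object that holds its own metadata.\"\"\"\n",
    "\n",
    "    def __init__(self, name, x):\n",
    "        self.name = name\n",
    "        self.x = x\n",
    "\n",
    "\n",
    "def toy_is_consistent(table):\n",
    "    \"\"\"Check that the x value in each row equals the x attribute of its obs.\"\"\"\n",
    "    return all(row[\"x\"] == row[\"obs\"].x for row in table.values())\n",
    "\n",
    "\n",
    "def toy_set_metadata_value(table, name, key, value):\n",
    "    \"\"\"Write the value to both the table row and the obs attribute.\"\"\"\n",
    "    table[name][key] = value\n",
    "    setattr(table[name][\"obs\"], key, value)\n",
    "\n",
    "\n",
    "obs = ToyObs(name=\"smw\", x=1000)\n",
    "table = {obs.name: {\"x\": obs.x, \"obs\": obs}}  # metadata is now stored twice\n",
    "\n",
    "table[\"smw\"][\"x\"] = 100  # update only the table: the obs attribute is stale\n",
    "consistent_after_direct_edit = toy_is_consistent(table)\n",
    "\n",
    "toy_set_metadata_value(table, \"smw\", \"x\", 100)  # update both places\n",
    "consistent_after_helper = toy_is_consistent(table)\n",
    "\n",
    "print(consistent_after_direct_edit, consistent_after_helper)"
   ]
  },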
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Modify time series\n",
    "\n",
    "There are several ways to modify the time series of an observation in a collection. Below we show two of them: chained assignment and a two-step approach."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# modify the timeseries of an observation\n",
    "\n",
    "# 1 chained assignment\n",
    "oc.loc[\"smw\", \"obs\"].loc[\"2020-1-7\":\"2020-1-9\", \"value\"] = 42\n",
    "display(oc.loc[\"smw\", \"obs\"])\n",
    "\n",
    "# 2 two-step\n",
    "o = oc.loc[\"smw\", \"obs\"]\n",
    "o.loc[\"2020-1-7\":\"2020-1-9\", \"value\"] = 21\n",
    "display(oc.loc[\"smw\", \"obs\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Add observation copy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a copy of an observation\n",
    "o = oc.loc[\"my empty obs\", \"obs\"].copy()\n",
    "\n",
    "# modify some stuff\n",
    "date_range = pd.date_range(start=\"2020-01-01\", periods=10, freq=\"D\")\n",
    "o.index = date_range\n",
    "o.loc[date_range, \"value\"] = np.arange(10)\n",
    "o.name = \"not so empty obs\"\n",
    "\n",
    "# add modified observation to collection\n",
    "oc.add_observation(o, inplace=True)\n",
    "oc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Replace observation\n",
    "\n",
    "There are several ways to replace an observation in a collection:\n",
    "\n",
    "1. remove the observation first and then add the observation that should replace the original.\n",
    "2. use `add_observation` to add an observation with the same name as an existing observation in the collection. Hydropandas will try to merge the new observation with the existing observation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create a copy of an observation\n",
    "o = oc.loc[\"my empty obs\", \"obs\"].copy()\n",
    "\n",
    "# modify time series\n",
    "date_range = pd.date_range(start=\"2020-01-01\", periods=10, freq=\"D\")\n",
    "o.index = date_range\n",
    "o.loc[date_range, \"value\"] = np.arange(10)\n",
    "\n",
    "# replace existing observation\n",
    "oc.add_observation(o, inplace=True)\n",
    "oc"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### ObsCollection additional metadata\n",
    "\n",
    "[Additional metadata](#additional-metadata) is not shown by default in an ObsCollection. It can be added manually by calling the `add_meta_to_df` method. By default all metadata is added but you can also specify a key from the meta dictionary to add."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# create an Obs object with additional metadata\n",
    "o2_with_meta = hpd.Obs(\n",
    "    name=\"my_observation\",\n",
    "    x=10,\n",
    "    y=20,\n",
    "    location=\"somewhere\",\n",
    "    filename=\"unknown\",\n",
    "    source=\"imagination\",\n",
    "    unit=\"m\",\n",
    "    meta={\"owner\": \"me\", \"project\": \"hydropandas\"},\n",
    ")\n",
    "\n",
    "# create an ObsCollection with multiple Obs objects, one of them with additional metadata\n",
    "oc = hpd.ObsCollection([o1, o2_with_meta, o3])\n",
    "\n",
    "# metadata is not shown by default\n",
    "oc"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# add metadata to the dataframe to show it\n",
    "oc_with_meta = oc.add_meta_to_df()\n",
    "oc_with_meta"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Read/write ObsCollection\n",
    "\n",
    "There are several options to read/write an ObsCollection from/to a file. The table below gives a broad overview of the options.\n",
    "\n",
    "\n",
    "| type   | write function          | read function  | Human readable        | Write/read additional metadata    | keep dtypes?    |\n",
    "|--------|-------------------------|----------------|-----------------------|-----------------------------------|-----------------|\n",
    "| json   | ObsCollection.to_json   | hpd.read_json  | Yes                   | Yes                               | Mostly          |\n",
    "| csv    | ObsCollection.to_csv    | hpd.read_csv   | Yes                   | No                                | No              |\n",
    "| excel  | ObsCollection.to_excel  | hpd.read_excel | Yes (in Excel)        | Only if exposed in oc**           | No              |\n",
    "| pickle | ObsCollection.to_pickle | hpd.read_pickle| No                    | Yes                               | Yes             |\n",
    "\n",
    "Writing to and reading from an excel, csv or json file slightly alters the properties of the ObsCollection, just like writing and reading a DataFrame to these file types would do. Reading/writing a pickle does not change anything.\n",
    "\n",
    "** Additional metadata is only written and read if it was added to the ObsCollection using the `add_meta_to_df` method. More info on additional metadata [here](#obscollection-additional-metadata)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### json"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "path = \"my_obs_collection.json\"\n",
    "oc_with_meta.to_json(path, indent=4)\n",
    "oc_with_meta"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "oc_from_json = hpd.read_json(path)\n",
    "oc_from_json"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### csv"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "csvdir = \"my_obs_collection\"\n",
    "oc_with_meta.to_csv(csvdir)\n",
    "oc_with_meta"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "oc_from_csv = hpd.read_csv(csvdir)  # read the ObsCollection from the csv files\n",
    "oc_from_csv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### excel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "oc_with_meta.to_excel(\"my_obs_collection.xlsx\")  # write to excel"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read excel file\n",
    "oc_from_excel = hpd.read_excel(\"my_obs_collection.xlsx\")\n",
    "oc_from_excel"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "##### pickle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "oc_with_meta.to_pickle(\"my_obs_collection.pklz\")  # write to pickle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# read pickle\n",
    "oc_from_pickle = hpd.read_pickle(\"my_obs_collection.pklz\")\n",
    "oc_from_pickle"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Extensions\n",
    "\n",
    "To enhance the functionality of an ObsCollection, HydroPandas provides several extensions that add specialized methods for visualization, spatial analysis, and data processing. Some key extensions include:\n",
    "\n",
    "- Plot Extension (ObsCollection.plot): Built-in plotting capabilities for visualizing time series data. Users can generate time series plots for individual or multiple observations, histograms, and other graphical representations to analyze trends and patterns in hydrological data.\n",
    "- Geo Extension (ObsCollection.geo): Spatial analysis by integrating with geopandas. It allows users to obtain the extent of an ObsCollection, convert to another coordinate reference system and find nearby geometries.\n",
    "- Groundwater Obs (ObsCollection.gwobs): Analyse and process groundwater observations. Users can find the REGIS layer of each tube and set the tube number based on the screen depth.\n",
    "- Statistics (ObsCollection.stats): Statistical analysis of the observations. Users can obtain the number of consecutive years with more than 10 observations or find seasonal minimum and maximum values.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "oc.stats.get_first_last_obs_date()  # get the first and last observation date using the stats extension"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "oc.geo.get_extent()  # get the extent of the observations using the geo extension"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "hydropandas",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.13.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}