EarthML

Machine learning and visualization in Python for Earth science

Python offers a wide variety of open-source libraries covering a huge range of functionality, but it can be difficult to work out which libraries are suitable for which tasks. The EarthML project helps to:

  • Demonstrate how to use Python tools for machine learning and analysis in the Earth sciences
  • Identify libraries suitable for working with Earth-science data
  • Improve these libraries as needed to support Earth-science workflows

EarthML contains no code of its own, only tutorials and examples showing how to use packages like:

  • Data libraries:
    • Intake: Cleanly loading data from various sources.
    • XArray: Processing gridded (n-dimensional) data structures.
    • Pandas: Processing columnar (tabular) data structures.
    • Dask: Parallelism and performance at scale.
  • Visualization libraries:
    • hvPlot: Simple data-centric API for plotting, building on:
      • Bokeh: Interactive browser-based plotting.
      • HoloViews: Easy construction of Bokeh plots for datasets.
      • GeoViews: HoloViews with earth-specific projections.
      • Datashader: Rendering large datasets into images for display in browsers.
    • Panel: Dashboards, apps, and widgets for any library’s plots.
  • Other tools:
    • Jupyter: Reproducible notebooks (source code for all examples on this site).
    • Cartopy: Geographic coordinate reference systems.
  • ML tools: (representative only – use any you like!)

The EarthML Tutorial offers a general-purpose overview of the concepts and tools involved, and the Topics section shows examples of how these tools may be used to perform machine learning and related tasks in the Earth sciences, such as:

  • Carbon Flux
  • Heat and Trees
  • Walker Lake
  • Classification
  • Clustering
  • Lidar

Please feel free to report issues or contribute code.

Installation

Step 1: Install a Miniconda (or Anaconda) environment

Any computer with a modern web browser (preferably Google Chrome) should be suitable. Some of the examples require 16 GB of RAM, but most will run fine in 4 GB.
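The memory guidance above can be checked with a short script. The following is a minimal sketch, not part of EarthML: the function names and the 10% slack factor are illustrative assumptions (operating systems usually report slightly less than the nominal installed RAM), and `os.sysconf` is only expected to work on Unix-like systems, so the script falls back to "unknown" elsewhere.

```python
import os


def total_ram_gib():
    """Return physical RAM in GiB, or None if the platform does not expose it."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows; unknown names raise ValueError.
        return None
    if page_size <= 0 or page_count <= 0:
        return None
    return page_size * page_count / 2**30


def ram_verdict(gib, slack=0.9):
    """Compare RAM against the 16 GB / 4 GB guidance.

    `slack` is an assumed tolerance for the gap between nominal and reported RAM.
    """
    if gib is None:
        return "unknown"
    if gib >= 16 * slack:
        return "all examples"
    if gib >= 4 * slack:
        return "most examples"
    return "below the stated minimum"


if __name__ == "__main__":
    print("This machine should handle:", ram_verdict(total_ram_gib()))
```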

If you don’t already have conda on your machine, install it from Anaconda.com, then open a terminal window.

If you do have conda already, it’s a good idea to update it.

On Mac and Linux, update to the latest version by running the update command twice:

> conda update conda
> conda update conda

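On Windows, it is better to pin to a specific conda version to avoid a bug. Recent conda releases reject version specifications passed to conda update, so use conda install for the pin:

> conda install conda=4.5.4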
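To confirm which conda version ended up installed, the output of `conda --version` can be read from Python. This is a rough sketch under the assumption that a `conda` executable is on the PATH; the helper names are hypothetical and not part of conda or EarthML.

```python
import re
import shutil
import subprocess


def parse_conda_version(text):
    """Extract (major, minor, patch) from text such as 'conda 4.5.4'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_conda_version():
    """Return the installed conda version as a tuple, or None if unavailable."""
    exe = shutil.which("conda")
    if exe is None:
        return None
    try:
        result = subprocess.run(
            [exe, "--version"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.SubprocessError):
        return None
    # Check both streams in case the version is not written to stdout.
    return parse_conda_version(result.stdout + result.stderr)


if __name__ == "__main__":
    print("conda version:", installed_conda_version())
```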

Step 2: Clone the EarthML git repository

> git clone https://github.com/pyviz-topics/EarthML.git
> cd EarthML

Step 3: Install and activate the earthml environment

> conda env create -f environment.yml
> conda activate earthml
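Once the environment is active, a quick check can show whether the libraries listed earlier resolve. The sketch below is illustrative only: it assumes that environment.yml installs each of these packages (which this page does not state), and `missing_packages` is a hypothetical helper. It looks each module up without importing it, so it runs quickly.

```python
from importlib.util import find_spec

# Import names of the libraries listed on this page; whether environment.yml
# installs every one of them is an assumption.
PACKAGES = [
    "intake", "xarray", "pandas", "dask",
    "hvplot", "bokeh", "holoviews", "geoviews", "datashader", "panel",
    "cartopy",
]


def missing_packages(names):
    """Return the subset of top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    if missing:
        print("Not found in this environment:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```

If anything is reported missing, check that the earthml environment is the active one before starting the notebook server.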

Step 4: Run the notebook server

> cd examples
> jupyter notebook