Questions tagged [python-xarray]
xarray (formerly xray) is an open-source Python library that provides labeled N-dimensional data structures.
1,102
questions
16
votes
1 answer
12k views
What is the pandas.Panel deprecation warning actually recommending?
I have a package that uses pandas Panels to generate MultiIndex pandas DataFrames. However, whenever I use pandas.Panel, I get the following DeprecationWarning:
DeprecationWarning:
Panel is ...
15
votes
3 answers
5k views
How to get the coordinates of the maximum in xarray?
Simple question: I want not only the value of the maximum but also its coordinates in an xarray DataArray. How can I do that?
I can, of course, write my own simple reduce function, but I wonder ...
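One way this could look, as a minimal sketch rather than the accepted answer: find the flat position of the maximum with NumPy, turn it into one index per dimension, and select that element so its coordinates come along. The array, dimension names and coordinate values below are made up for illustration. Newer xarray releases also offer `idxmax` and `argmax`, which may be more direct.

```python
import numpy as np
import xarray as xr

# Hypothetical 2-D example data; dimension names and values are invented.
da = xr.DataArray(
    [[1.0, 5.0, 2.0], [7.0, 3.0, 4.0]],
    dims=("y", "x"),
    coords={"y": [10, 20], "x": [100, 200, 300]},
)

# Flat position of the maximum (NaNs ignored), converted to one index per dimension.
flat_index = np.nanargmax(da.values)
indices = np.unravel_index(flat_index, da.shape)

# Select that single element; its scalar coordinates travel with it.
peak = da.isel(dict(zip(da.dims, indices)))
peak_coords = {dim: peak[dim].item() for dim in da.dims}
print(float(peak), peak_coords)
```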
15
votes
1 answer
4k views
When to use multiindexing vs. xarray in pandas
The pandas pivot tables documentation seems to recommend dealing with more than two dimensions of data by using multiindexing:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import ...
15
votes
1 answer
3k views
Join/merge multiple NetCDF files using xarray
I have a folder with NetCDF files from 2006-2100, in ten year blocks (2011-2020, 2021-2030 etc).
I want to create a new NetCDF file which contains all of these files joined together. So far I have ...
14
votes
3 answers
6k views
Speeding up reading of very large netcdf file in python
I have a very large netCDF file that I am reading using netCDF4 in Python.
I cannot read this file all at once since its dimensions (1200 x 720 x 1440) are too big for the entire file to be in memory ...
14
votes
3 answers
1k views
combining spatial netcdf files using xarray python
Is there a way to merge 2 or more netCDF files with the same time dimension but different spatial domains into a single netCDF file? The spatial domains are specified by latitude and longitude ...
13
votes
4 answers
2k views
How to apply linear regression to every pixel in a large multi-dimensional array containing NaNs?
I have a 1D array of independent variable values (x_array) that match the timesteps in a 3D numpy array of spatial data with multiple time-steps (y_array). My actual data is much larger: 300+ ...
11
votes
2 answers
4k views
Is it possible to append to an xarray.Dataset?
I've been using the .append() method to concatenate two tables (with the same fields) in pandas. Unfortunately this method does not exist in xarray, is there another way to do it?
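A rough equivalent of a pandas append is `xr.concat` along an existing dimension. The sketch below assumes two small datasets that share a variable and a `time` dimension; the variable name and values are invented.

```python
import xarray as xr

# Two hypothetical datasets with the same variable, covering different times.
first = xr.Dataset({"temp": ("time", [1.0, 2.0])}, coords={"time": [0, 1]})
second = xr.Dataset({"temp": ("time", [3.0])}, coords={"time": [2]})

# Concatenate along the shared dimension; the inputs are left unchanged.
combined = xr.concat([first, second], dim="time")
print(combined)
```

Unlike an in-place append, this builds a new Dataset, so repeated concatenation in a loop is usually better replaced by collecting the pieces in a list and concatenating once.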
10
votes
5 answers
1k views
Get hourly average for each month from a netcdf file
I have a netCDF file with the time dimension containing data by the hour for 2 years. I want to average it to get an hourly average for each hour of the day for each month. I tried this:
import ...
10
votes
2 answers
3k views
Concise way to filter data in xarray
I need to apply a very simple 'match statement' to the values in an xarray array:
Where the value > 0, make 2
Where the value == 0, make 0
Where the value is NaN, make NaN
Here's my current solution....
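The three rules above can probably be collapsed into a single `xr.where` call, since a NaN never compares greater than zero and therefore falls through to the original value. This is a sketch on invented data, not the asker's solution. Negative values are left unchanged here because the question does not say what should happen to them.

```python
import numpy as np
import xarray as xr

# Hypothetical input covering the three cases: positive, zero, NaN.
values = xr.DataArray([3.0, 0.0, np.nan, 0.5], dims="x")

# Where the value is > 0 use 2, otherwise keep the original (0 stays 0, NaN stays NaN).
recoded = xr.where(values > 0, 2.0, values)
print(recoded.values)
```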
10
votes
1 answer
737 views
Avoid overlapping colorbar in xarray facet grid plot
import xarray as xr
import cartopy.crs as ccrs
USA_PROJ = ccrs.AlbersEqualArea(central_longitude=-97., central_latitude=38.)
g_simple = ds_by_month.t2m.plot(x='longitude',
...
10
votes
1 answer
2k views
boolean indexing in xarray
I have some arrays with dims 'time', 'lat', 'lon' and some with just 'lat', 'lon'. I often have to do this in order to mask time-dependent data with a 2d (lat-lon) mask:
x.data[:, mask.data] = np.nan
...
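For this kind of masking, `DataArray.where` broadcasts a 2-D mask over the extra `time` dimension by dimension name, so no manual indexing into `.data` should be needed. The sketch below uses made-up shapes and assumes the mask is `True` where values should become NaN, matching the assignment shown in the question.

```python
import numpy as np
import xarray as xr

# Hypothetical time-dependent data and a lat-lon mask (True = mask out).
x = xr.DataArray(np.ones((2, 2, 3)), dims=("time", "lat", "lon"))
mask = xr.DataArray(
    [[True, False, False], [False, False, True]], dims=("lat", "lon")
)

# where() keeps values where the condition is True, so invert the mask.
masked = x.where(~mask)
print(masked.isel(time=0).values)
```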
9
votes
2 answers
347 views
xarray reverse interpolation (on coordinate, not on data)
I have the following DataArray:
arr = xr.DataArray([[0.33, 0.25],[0.55, 0.60],[0.85, 0.71],[0.92,0.85],[1.50,0.96],[2.5,1.1]],[('x',[0.25,0.5,0.75,1.0,1.25,1.5]),('y',[1,2])])
This gives the ...
8
votes
5 answers
4k views
add dimension to an xarray DataArray
I need to add a dimension to a DataArray, filling the values across the new dimension. Here's the original array.
a_size = 10
a_coords = np.linspace(0, 1, a_size)
b_size = 5
b_coords = np.linspace(0,...
7
votes
3 answers
2k views
python-xarray: open_mfdataset concat along two dimensions
I have files which are made of 10 ensembles and 35 time files. One of these files looks like:
>>> xr.open_dataset('ens1/CCSM4_ens1_07ic_19820701-19820731_NPac_Jul.nc')
<xarray.Dataset>
...
7
votes
2 answers
5k views
Substitute dataset coordinates in xarray (Python)
I have a dataset stored in NetCDF4 format that consists of Intensity values with 3 dimensions: Loop, Delay and Wavelength. I named my coordinates the same as the dimensions (I don't know if it's good ...
7
votes
2 answers
3k views
Extract coordinate values in xarray
I would like to extract the values of the coordinate variables.
For example I create a DataArray as:
import xarray as xr
import numpy as np
import pandas as pd
years_arr=range(1982,1986)
time = pd....
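As a sketch of what extracting coordinate values can look like: indexing a DataArray by a coordinate name returns that coordinate, and `.values` gives the underlying NumPy array. The daily time axis below is an assumption made for illustration, since the question's own construction is cut off.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical time axis; the original question's values are not shown in full.
time = pd.date_range("1982-01-01", periods=4, freq="D")
da = xr.DataArray(np.arange(4.0), dims="time", coords={"time": time})

# Coordinate values as a NumPy array of datetime64.
time_values = da["time"].values

# The same coordinate as a pandas index, convenient for date arithmetic.
time_index = da.indexes["time"]
print(time_values, time_index)
```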
7
votes
1 answer
162 views
Writing xarray multiindex data in chunks
I am trying to efficiently restructure a large multidimensional dataset. Let's assume I have a number of remotely sensed images over time with a number of bands with coordinates x y for pixel location, ...
7
votes
1 answer
812 views
Memory errors using xarray + dask - use groupby or apply_ufunc?
I am using xarray as the basis of my workflow for analysing fluid turbulence data, but I'm having trouble leveraging dask correctly to limit memory usage on my laptop.
I have a dataarray n with ...
6
votes
1 answer
3k views
Select xarray/pandas index based on specific months
I have an xarray DataArray from which I want to select the months April, May, and June (similar to time.season=='JJA') across an entire time series.
It's structured like:
<xarray.DataArray 't2m' (time: 492, ...
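One possible approach, sketched on invented monthly data: build a boolean mask from the month number of each timestamp with `isin`, then pass it to `sel`. The variable name `t2m` is taken from the question; the time range and values are assumptions.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical two years of monthly data.
time = pd.date_range("2000-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24), dims="time", coords={"time": time}, name="t2m")

# True for April, May and June in every year.
is_amj = da["time"].dt.month.isin([4, 5, 6])
amj = da.sel(time=is_amj)
print(amj["time"].values)
```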
6
votes
1 answer
2k views
Specify encoding/compression for many variables in xarray dataset when write to_netcdf
I have been writing out some xarray.Datasets that have multiple variables. Currently, in order to keep the size manageable, I specify the encoding, e.g. zlib, but it needs to be applied per variable (...
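Because `to_netcdf` takes its `encoding` argument as a dict keyed by variable name, the same settings can be applied to every variable with a dict comprehension. This is only a sketch: the dataset, the compression level and the output path are invented, and the actual write is left commented out because it needs a netCDF4 backend.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with several variables.
ds = xr.Dataset(
    {
        "a": (("x",), np.zeros(3)),
        "b": (("x",), np.ones(3)),
    }
)

# Same compression settings for every data variable (values are an assumption).
compression = {"zlib": True, "complevel": 4}
encoding = {name: dict(compression) for name in ds.data_vars}
print(encoding)

# ds.to_netcdf("out.nc", encoding=encoding)  # hypothetical path; requires netCDF4
```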
6
votes
1 answer
1k views
Grouping by multiple dimensions
Grouping by a single dimension works fine for xarray DataArrays:
d = xr.DataArray([1, 2, 3], coords={'a': ['x', 'x', 'y']}, dims=['a'])
d.groupby('a').mean()  # -> DataArray (a: 2) array([1.5, 3. ...
6
votes
1 answer
889 views
xarray automatically applying _FillValue to coordinates on netCDF output
I'm trying to create a cf compliant netcdf file. I can get it about 98% cf compliant with xarray but there is one issue that I am running into. When I do an ncdump on the file that I am creating, I ...
6
votes
1 answer
4k views
replace values in xarray dataset with None
I want to replace values in a variable in an xarray dataset with None. I tried this approach but it did not work:
da[da['var'] == -9999.]['var'] = None
I get this error: *** TypeError: unhashable ...
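Numeric xarray variables cannot hold `None`, so the usual substitute is NaN. A minimal sketch, assuming a float variable named `var` with -9999 as the sentinel as in the question, and otherwise invented data:

```python
import numpy as np
import xarray as xr

# Hypothetical dataset using -9999 as a missing-value sentinel.
ds = xr.Dataset({"var": ("x", [1.0, -9999.0, 3.0])})

# where() keeps values where the condition holds and fills the rest with NaN.
ds["var"] = ds["var"].where(ds["var"] != -9999.0)
print(ds["var"].values)
```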
6
votes
2 answers
2k views
Xarray: slice coordinates with no dimensions
I am having difficulty with this topic, even though it seems like it should be rather simple.
I want to slice an xarray dataset using a set of latitude and longitude coordinates.
Here is what my ...
6
votes
1 answer
2k views
xarray too slow for performance critical code
I planned to use xarray extensively in some numerically intensive scientific code that I am writing. So far, it makes the code very elegant, but I think I will have to abandon it as the performance ...
6
votes
3 answers
2k views
How to join data from multiple netCDF files with xarray in Python?
I'm trying to open multiple netCDF files with xarray in Python. The files have data with the same shape and I want to join them, creating a new dimension.
I tried to use the concat_dim argument for xarray....
6
votes
1 answer
1k views
How to convert an xarray dataset to pandas dataframes inside a dask dataframe
I have a calculation that expects a pandas dataframe as input. I'd like to run this calculation on data stored in a netCDF file that expands to 51GB - currently I've been opening the file with xarray....
6
votes
2 answers
934 views
Importing and decoding dataset in xarray to avoid conflicting _FillValue and missing_value
When using xarray open_dataset or open_mfdataset to load a NARR netcdf dataset (e.g. ftp://ftp.cdc.noaa.gov/Datasets/NARR/monolevel/air.2m.2010.nc), xarray returns an error regarding "conflicting ...
6
votes
1 answer
209 views
Parallelized bootstrapping with replacement with xarray/dask
I want to perform N=1000 bootstrapping with replacement on gridded data. One computation takes about 0.5s. I have access to a supercomputer exclusive node with 48 cores. Because the resampling are ...
6
votes
1 answer
872 views
How best to rechunk a NetCDF file collection to Zarr dataset
I'm trying to rechunk a NetCDF file collection and create a Zarr dataset on AWS S3. I have 168 original NetCDF4 classic files with arrays of dimension time: 1, y: 3840, x: 4608 chunked as chunks={'...
5
votes
1 answer
2k views
Drop duplicate times in xarray
I'm reading NetCDF files with open_mfdataset, which contain duplicate times. For each duplicate time I only want to keep the first occurrence, and drop the second (it will never occur more often). The ...
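Keeping only the first occurrence of each time can be sketched with the pandas index behind the `time` coordinate: `duplicated(keep="first")` flags the repeats, and the inverted mask selects the rest. The dataset below is invented; with `open_mfdataset` the same two lines would presumably apply to the combined dataset.

```python
import pandas as pd
import xarray as xr

# Hypothetical dataset with one repeated timestamp.
time = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-02", "2020-01-03"])
ds = xr.Dataset({"v": ("time", [1, 2, 20, 3])}, coords={"time": time})

# Boolean mask that is False for second and later occurrences of a time.
keep = ~ds.get_index("time").duplicated(keep="first")
deduped = ds.isel(time=keep)
print(deduped["v"].values)
```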
5
votes
2 answers
2k views
Add 'constant' dimension to xarray Dataset
I have a series of monthly gridded datasets in CSV form. I want to read them, add a few dimensions, and then write to netcdf. I've had great experience using xarray (xray) in the past so thought I'd ...
5
votes
3 answers
9k views
Python Xarray add DataArray to Dataset
Very simple question but I can't find the answer online. I have a Dataset and I just want to add a named DataArray to it. Something like dataset.add({"new_array": new_data_array}). I know about merge ...
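There is no `add` method, but dictionary-style assignment does the job in place, and `assign` returns a new Dataset instead. A small sketch with invented data, reusing the names `new_array` and `new_data_array` from the question:

```python
import xarray as xr

# Hypothetical dataset and a DataArray on the same coordinate.
ds = xr.Dataset({"existing": ("x", [1, 2, 3])}, coords={"x": [0, 1, 2]})
new_data_array = xr.DataArray([10, 20, 30], dims="x", coords={"x": [0, 1, 2]})

# In-place: the key becomes the variable name.
ds["new_array"] = new_data_array

# Non-mutating alternative: returns a new Dataset and leaves ds untouched.
extended = ds.assign(another=new_data_array * 2)
print(list(extended.data_vars))
```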
5
votes
2 answers
5k views
python mask netcdf data using shapefile
I am using the following packages:
import pandas as pd
import numpy as np
import xarray as xr
import geopandas as gpd
I have the following objects storing data:
print(precip_da)
Out[]:
<...
5
votes
2 answers
928 views
Python xarray.concat then xarray.to_netcdf generates huge new file size
So I have 3 netcdf4 files (each approx 90 MB), which I would like to concatenate using the package xarray. Each file has one variable (dis) represented at a 0.5 degree resolution (lat, lon) for 365 ...
5
votes
2 answers
482 views
With xarray, how to parallelize 1D operations on a multidimensional Dataset?
I have a 4D xarray Dataset. I want to carry out a linear regression between two variables on a specific dimension (here time), and keep the regression parameters in a 3D array (the remaining ...
5
votes
2 answers
4k views
Create and write xarray DataArray to NetCDF in chunks
Is it also possible to create an out-of-core DataArray, and write it chunk-by-chunk to a NetCDF4 file using xarray?
For example, I want to be able to do this in an out-of-core fashion when the ...
5
votes
1 answer
2k views
How to merge xArray datasets with conflicting coordinates
Let's say I have two data sets, each containing a different variable of interest and with incomplete (but not conflicting) indices:
In [1]: import xarray as xr, numpy as np
In [2]: ages = xr.Dataset(
...
5
votes
3 answers
2k views
How to flatten an xarray dataset into a 1D numpy array?
Is there a simple way of flattening an xarray dataset into a single 1D numpy array?
For example, flattening the following test dataset:
xr.Dataset({
'a' : xr.DataArray(
data=[...
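One straightforward option is to ravel each variable and concatenate the pieces in variable order. The sketch below uses its own small dataset, since the question's test data is cut off, and it makes no attempt to align variables that have different dimensions.

```python
import numpy as np
import xarray as xr

# Hypothetical dataset with variables of different shapes.
ds = xr.Dataset(
    {
        "a": (("x", "y"), [[1, 2], [3, 4]]),
        "b": (("z",), [5, 6, 7]),
    }
)

# Flatten each variable in C order and join them into one 1-D array.
flat = np.concatenate([ds[name].values.ravel() for name in ds.data_vars])
print(flat)
```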
5
votes
2 answers
1k views
Python Xarray, sort by index or dimension?
Is there a sort_index or sort_by_dimension method of some kind in xarray, much like pandas.DataFrame.sort_index(), where I can sort a xarray.DataArray object by one of its dimensions? In terms of ...
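xarray does have a counterpart in `sortby`, which sorts along a dimension by its coordinate values. A minimal sketch on invented data:

```python
import xarray as xr

# Hypothetical array whose coordinate is out of order.
da = xr.DataArray([30, 10, 20], dims="x", coords={"x": [3, 1, 2]})

# Sort by the values of the "x" coordinate, ascending by default.
ascending = da.sortby("x")
descending = da.sortby("x", ascending=False)
print(ascending.values, descending.values)
```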
5
votes
1 answer
132 views
Calculate the percentile rank of a value in a multi-dimensional array along an axis
I have a 3D array.
>>> M2 = np.arange(24).reshape((4, 3, 2))
>>> print(M2)
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
...
5
votes
1 answer
487 views
Sparse DataArray Xarray search
Using DataArray objects in xarray, what is the best way to find all cells that have values != 0?
For example in pandas I would do
df.loc[df.col1 > 0]
In my specific example, I'm trying to look at 3 ...
5
votes
1 answer
1k views
Do xarray or dask really support memory-mapping?
In my experimentation so far, I've tried:
xr.open_dataset with chunks arg, and it loads the data into memory.
Set up a NetCDF4DataStore, and call ds['field'].values and it loads the data into memory.
...
5
votes
2 answers
381 views
python get month of maximum value xarray
How to get the month of maximum runoff
I want to get the month of maximum runoff for each year, and for the time series as a whole. The idea is to characterise global seasonality by looking at the ...
5
votes
2 answers
779 views
create netcdf using xarray with time stamp beyond year 2263
Is there a way to create a netCDF file with time dimension beyond year 2263 using xarray?
Here is how a netCDF toy dataset can be created: http://xarray.pydata.org/en/stable/time-series.html
However ...
5
votes
0 answers
89 views
How to create a gdal.Dataset or xarray.Dataset object from a django.contrib.gis.gdal.GDALRaster object?
I am working on a Django project in which I'm trying to get all the raster data from my database.
Here is my model in models.py
from django.contrib.gis.db import models
class RasterWithName(models....
5
votes
0 answers
120 views
Parallel appending to a zarr store via xarray.to_zarr and Dask
I am in a situation where I want to load objects, transform them into an xarray.Dataset and write that into a zarr store on s3. However, to make the loading of objects faster, I do it in parallel ...
5
votes
0 answers
163 views
xarray/dask - limiting the number of threads/cpus
I'm fairly new to xarray and I'm currently trying to leverage it to subset some NetCDFs. I'm running this on a shared server and would like to know how best to limit the processing power used by ...