Notebook

This is my personal notebook ^_^

How to save a DataFrame to a NetCDF file

How to save a txt file in NetCDF format

  • Read one daily station file from USCRN
  • Save it as a NetCDF file with two dimensions: station (WBANNO) and time (LST_DATE)
import xarray as xr
import pandas as pd
# An example of USCRN daily station data (station AK Bethel 87 WNW, 2019)
uscrn_file = 'https://www1.ncdc.noaa.gov/pub/data/uscrn/products/daily01/2019/CRND0103-2019-AK_Bethel_87_WNW.txt'
uscrn_header_file = 'ftp://ftp.ncdc.noaa.gov/pub/data/uscrn/products/daily01/HEADERS.txt'
# Read the header file: row 0 holds the column names, row 1 their units
uscrn_header = pd.read_csv(uscrn_header_file, sep=r'\s+')
# Read the data file, which has no header line
uscrn_df = pd.read_csv(uscrn_file, sep=r'\s+', header=None)
uscrn_header
1 2 3 4 5 6 7 8 9 10 ... 19 20 21 22 23 24 25 26 27 28
0 WBANNO LST_DATE CRX_VN LONGITUDE LATITUDE T_DAILY_MAX T_DAILY_MIN T_DAILY_MEAN T_DAILY_AVG P_DAILY_CALC ... SOIL_MOISTURE_5_DAILY SOIL_MOISTURE_10_DAILY SOIL_MOISTURE_20_DAILY SOIL_MOISTURE_50_DAILY SOIL_MOISTURE_100_DAILY SOIL_TEMP_5_DAILY SOIL_TEMP_10_DAILY SOIL_TEMP_20_DAILY SOIL_TEMP_50_DAILY SOIL_TEMP_100_DAILY
1 XXXXX YYYYMMDD XXXXXX Decimal_degrees Decimal_degrees Celsius Celsius Celsius Celsius mm ... m^3/m^3 m^3/m^3 m^3/m^3 m^3/m^3 m^3/m^3 Celsius Celsius Celsius Celsius Celsius

2 rows × 28 columns

uscrn_df.head(5)
0 1 2 3 4 5 6 7 8 9 ... 18 19 20 21 22 23 24 25 26 27
0 26656 20190101 2.515 -164.08 61.35 0.7 -12.4 -5.8 -5.1 0.0 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
1 26656 20190102 2.515 -164.08 61.35 -12.3 -17.0 -14.7 -14.4 0.0 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
2 26656 20190103 2.515 -164.08 61.35 -9.9 -14.0 -12.0 -11.8 0.0 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
3 26656 20190104 2.515 -164.08 61.35 -7.6 -12.3 -9.9 -10.4 0.0 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
4 26656 20190105 2.515 -164.08 61.35 -8.2 -14.1 -11.2 -11.5 0.0 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0

5 rows × 28 columns

# Set the uscrn_df column names from row 0 of uscrn_header
uscrn_df.columns = uscrn_header.iloc[0, :]
uscrn_df.head(5)
WBANNO LST_DATE CRX_VN LONGITUDE LATITUDE T_DAILY_MAX T_DAILY_MIN T_DAILY_MEAN T_DAILY_AVG P_DAILY_CALC ... SOIL_MOISTURE_5_DAILY SOIL_MOISTURE_10_DAILY SOIL_MOISTURE_20_DAILY SOIL_MOISTURE_50_DAILY SOIL_MOISTURE_100_DAILY SOIL_TEMP_5_DAILY SOIL_TEMP_10_DAILY SOIL_TEMP_20_DAILY SOIL_TEMP_50_DAILY SOIL_TEMP_100_DAILY
0 26656 20190101 2.515 -164.08 61.35 0.7 -12.4 -5.8 -5.1 0.0 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
1 26656 20190102 2.515 -164.08 61.35 -12.3 -17.0 -14.7 -14.4 0.0 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
2 26656 20190103 2.515 -164.08 61.35 -9.9 -14.0 -12.0 -11.8 0.0 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
3 26656 20190104 2.515 -164.08 61.35 -7.6 -12.3 -9.9 -10.4 0.0 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
4 26656 20190105 2.515 -164.08 61.35 -8.2 -14.1 -11.2 -11.5 0.0 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0

5 rows × 28 columns
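Instead of renaming the columns afterwards, the names can be passed to `read_csv` when the data file is read. A minimal sketch: `header_text` and `data_text` are hypothetical in-memory stand-ins for HEADERS.txt and the daily file, cut down to the first five columns (values copied from the outputs above) so it runs offline.

```python
import io
import pandas as pd

# Hypothetical stand-ins for HEADERS.txt and one daily data file (first 5 columns only)
header_text = """1 2 3 4 5
WBANNO LST_DATE CRX_VN LONGITUDE LATITUDE
XXXXX YYYYMMDD XXXXXX Decimal_degrees Decimal_degrees
"""
data_text = """26656 20190101 2.515 -164.08 61.35
26656 20190102 2.515 -164.08 61.35
"""

# The first header line (column numbers) becomes the pandas header,
# so row 0 holds the variable names and row 1 the units
header = pd.read_csv(io.StringIO(header_text), sep=r'\s+')
names = header.iloc[0].tolist()

# Pass the names directly so the DataFrame has the right columns from the start
df = pd.read_csv(io.StringIO(data_text), sep=r'\s+', header=None, names=names)
print(df)
```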

# Drop the station columns, which are constant within a single-station file
uscrn_df_clean = uscrn_df.drop(['WBANNO', 'LONGITUDE', 'LATITUDE', 'CRX_VN'], axis=1)
uscrn_df_clean.head(5)
LST_DATE T_DAILY_MAX T_DAILY_MIN T_DAILY_MEAN T_DAILY_AVG P_DAILY_CALC SOLARAD_DAILY SUR_TEMP_DAILY_TYPE SUR_TEMP_DAILY_MAX SUR_TEMP_DAILY_MIN ... SOIL_MOISTURE_5_DAILY SOIL_MOISTURE_10_DAILY SOIL_MOISTURE_20_DAILY SOIL_MOISTURE_50_DAILY SOIL_MOISTURE_100_DAILY SOIL_TEMP_5_DAILY SOIL_TEMP_10_DAILY SOIL_TEMP_20_DAILY SOIL_TEMP_50_DAILY SOIL_TEMP_100_DAILY
0 20190101 0.7 -12.4 -5.8 -5.1 0.0 0.26 C -1.1 -12.4 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
1 20190102 -12.3 -17.0 -14.7 -14.4 0.0 0.68 C -12.4 -21.3 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
2 20190103 -9.9 -14.0 -12.0 -11.8 0.0 0.17 C -9.9 -15.0 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
3 20190104 -7.6 -12.3 -9.9 -10.4 0.0 0.25 C -7.7 -13.3 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
4 20190105 -8.2 -14.1 -11.2 -11.5 0.0 0.48 C -8.8 -14.3 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0

5 rows × 24 columns
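The drop above assumes the four station columns never change within the file. A cheap check before dropping them, sketched on a small hypothetical excerpt of uscrn_df (values copied from the rows shown above):

```python
import pandas as pd

# Hypothetical excerpt of uscrn_df
df = pd.DataFrame({
    'WBANNO': [26656, 26656, 26656],
    'LST_DATE': [20190101, 20190102, 20190103],
    'CRX_VN': [2.515, 2.515, 2.515],
    'LONGITUDE': [-164.08, -164.08, -164.08],
    'LATITUDE': [61.35, 61.35, 61.35],
    'T_DAILY_MAX': [0.7, -12.3, -9.9],
})

station_cols = ['WBANNO', 'LONGITUDE', 'LATITUDE', 'CRX_VN']
# Each station column should hold exactly one unique value
is_constant = df[station_cols].nunique() == 1
assert is_constant.all(), f'Not constant: {list(is_constant[~is_constant].index)}'

df_clean = df.drop(columns=station_cols)
print(df_clean.columns.tolist())
```

Listing the columns explicitly is deliberate: detecting constant columns automatically with `nunique()` would likely also catch the soil columns in this file, which appear to hold only -99.0 / -9999.0.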

# Set LST_DATE as the dataframe index 
uscrn_df_clean = uscrn_df_clean.set_index('LST_DATE')
uscrn_df_clean.head(5)
T_DAILY_MAX T_DAILY_MIN T_DAILY_MEAN T_DAILY_AVG P_DAILY_CALC SOLARAD_DAILY SUR_TEMP_DAILY_TYPE SUR_TEMP_DAILY_MAX SUR_TEMP_DAILY_MIN SUR_TEMP_DAILY_AVG ... SOIL_MOISTURE_5_DAILY SOIL_MOISTURE_10_DAILY SOIL_MOISTURE_20_DAILY SOIL_MOISTURE_50_DAILY SOIL_MOISTURE_100_DAILY SOIL_TEMP_5_DAILY SOIL_TEMP_10_DAILY SOIL_TEMP_20_DAILY SOIL_TEMP_50_DAILY SOIL_TEMP_100_DAILY
LST_DATE
20190101 0.7 -12.4 -5.8 -5.1 0.0 0.26 C -1.1 -12.4 -5.3 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
20190102 -12.3 -17.0 -14.7 -14.4 0.0 0.68 C -12.4 -21.3 -16.4 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
20190103 -9.9 -14.0 -12.0 -11.8 0.0 0.17 C -9.9 -15.0 -11.8 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
20190104 -7.6 -12.3 -9.9 -10.4 0.0 0.25 C -7.7 -13.3 -10.5 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0
20190105 -8.2 -14.1 -11.2 -11.5 0.0 0.48 C -8.8 -14.3 -11.9 ... -99.0 -99.0 -99.0 -99.0 -99.0 -9999.0 -9999.0 -9999.0 -9999.0 -9999.0

5 rows × 23 columns
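LST_DATE is an integer such as 20190101, so xarray treats it as a plain number rather than a time axis. A sketch of converting the index to datetime64 before building the dataset, assuming the YYYYMMDD format given in the header file; the small DataFrame is a hypothetical excerpt of uscrn_df_clean.

```python
import pandas as pd
import xarray as xr

# Hypothetical excerpt of uscrn_df_clean with the integer LST_DATE index
df = pd.DataFrame(
    {'T_DAILY_MAX': [0.7, -12.3, -9.9]},
    index=pd.Index([20190101, 20190102, 20190103], name='LST_DATE'),
)

# Parse the YYYYMMDD integers into datetime64 values and keep the index name
df.index = pd.to_datetime(df.index.astype(str), format='%Y%m%d')
df.index.name = 'LST_DATE'

ds = xr.Dataset.from_dataframe(df)
print(ds['LST_DATE'].values)
```

If you adopt this, the file-name step at the end needs a string form of the dates, for example `pd.Timestamp(uscrn_ds.LST_DATE.values[0]).strftime('%Y%m%d')`.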

# Convert the DataFrame into an xarray Dataset; the LST_DATE index becomes the dimension
uscrn_ds = xr.Dataset.from_dataframe(uscrn_df_clean)
uscrn_ds
<xarray.Dataset>
Dimensions:                  (LST_DATE: 89)
Coordinates:
  * LST_DATE                 (LST_DATE) int64 20190101 20190102 ... 20190330
Data variables:
    T_DAILY_MAX              (LST_DATE) float64 0.7 -12.3 -9.9 ... 0.0 0.0
    T_DAILY_MIN              (LST_DATE) float64 -12.4 -17.0 -14.0 ... -4.5 -2.5
    T_DAILY_MEAN             (LST_DATE) float64 -5.8 -14.7 -12.0 ... -2.3 -1.3
    T_DAILY_AVG              (LST_DATE) float64 -5.1 -14.4 -11.8 ... -1.4 -0.3
    P_DAILY_CALC             (LST_DATE) float64 0.0 0.0 0.0 ... 0.0 0.0
    SOLARAD_DAILY            (LST_DATE) float64 0.26 0.68 0.17 ... 2.02 0.57
    SUR_TEMP_DAILY_TYPE      (LST_DATE) object 'C' 'C' 'C' 'C' ... 'U' 'C' 'C'
    SUR_TEMP_DAILY_MAX       (LST_DATE) float64 -1.1 -12.4 -9.9 ... 6.5 3.6
    SUR_TEMP_DAILY_MIN       (LST_DATE) float64 -12.4 -21.3 -15.0 ... -6.5 -3.6
    SUR_TEMP_DAILY_AVG       (LST_DATE) float64 -5.3 -16.4 -11.8 ... -0.2 0.1
    RH_DAILY_MAX             (LST_DATE) float64 97.6 92.8 ... -9.999e+03
    RH_DAILY_MIN             (LST_DATE) float64 79.3 82.0 ... -9.999e+03
    RH_DAILY_AVG             (LST_DATE) float64 92.0 87.4 ... -9.999e+03
    SOIL_MOISTURE_5_DAILY    (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_10_DAILY   (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_20_DAILY   (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_50_DAILY   (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_100_DAILY  (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_TEMP_5_DAILY        (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_10_DAILY       (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_20_DAILY       (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_50_DAILY       (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_100_DAILY      (LST_DATE) float64 -9.999e+03 ... -9.999e+03
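The -99.0 and -9999.0 values in the humidity and soil variables look like missing-value codes (my reading of the USCRN daily format; check its README before relying on this). A minimal sketch that masks them as NaN in the floating-point variables only, leaving the string variable SUR_TEMP_DAILY_TYPE untouched; the dataset is a hypothetical excerpt and `SENTINELS` is an assumption.

```python
import numpy as np
import xarray as xr

# Hypothetical excerpt of uscrn_ds
ds = xr.Dataset(
    {
        'RH_DAILY_MAX': ('LST_DATE', [97.6, 92.8, -9999.0]),
        'SOIL_MOISTURE_5_DAILY': ('LST_DATE', [-99.0, -99.0, -99.0]),
        'SUR_TEMP_DAILY_TYPE': ('LST_DATE', np.array(['C', 'C', 'U'], dtype=object)),
    },
    coords={'LST_DATE': [20190101, 20190102, 20190103]},
)

SENTINELS = [-99.0, -9999.0]  # assumed USCRN missing-value codes

# Iterate over a copy of the names because the loop replaces variables
for name in list(ds.data_vars):
    var = ds[name]
    if var.dtype.kind == 'f':  # only mask floating-point variables
        ds[name] = var.where(~var.isin(SENTINELS))
```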
# Add the dropped station columns back to the dataset as scalar variables (value from the first row)
for x in ['WBANNO', 'LONGITUDE', 'LATITUDE', 'CRX_VN']:
    uscrn_ds[x] = uscrn_df[x][0]
uscrn_ds
<xarray.Dataset>
Dimensions:                  (LST_DATE: 89)
Coordinates:
  * LST_DATE                 (LST_DATE) int64 20190101 20190102 ... 20190330
Data variables:
    T_DAILY_MAX              (LST_DATE) float64 0.7 -12.3 -9.9 ... 0.0 0.0
    T_DAILY_MIN              (LST_DATE) float64 -12.4 -17.0 -14.0 ... -4.5 -2.5
    T_DAILY_MEAN             (LST_DATE) float64 -5.8 -14.7 -12.0 ... -2.3 -1.3
    T_DAILY_AVG              (LST_DATE) float64 -5.1 -14.4 -11.8 ... -1.4 -0.3
    P_DAILY_CALC             (LST_DATE) float64 0.0 0.0 0.0 ... 0.0 0.0
    SOLARAD_DAILY            (LST_DATE) float64 0.26 0.68 0.17 ... 2.02 0.57
    SUR_TEMP_DAILY_TYPE      (LST_DATE) object 'C' 'C' 'C' 'C' ... 'U' 'C' 'C'
    SUR_TEMP_DAILY_MAX       (LST_DATE) float64 -1.1 -12.4 -9.9 ... 6.5 3.6
    SUR_TEMP_DAILY_MIN       (LST_DATE) float64 -12.4 -21.3 -15.0 ... -6.5 -3.6
    SUR_TEMP_DAILY_AVG       (LST_DATE) float64 -5.3 -16.4 -11.8 ... -0.2 0.1
    RH_DAILY_MAX             (LST_DATE) float64 97.6 92.8 ... -9.999e+03
    RH_DAILY_MIN             (LST_DATE) float64 79.3 82.0 ... -9.999e+03
    RH_DAILY_AVG             (LST_DATE) float64 92.0 87.4 ... -9.999e+03
    SOIL_MOISTURE_5_DAILY    (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_10_DAILY   (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_20_DAILY   (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_50_DAILY   (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_100_DAILY  (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_TEMP_5_DAILY        (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_10_DAILY       (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_20_DAILY       (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_50_DAILY       (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_100_DAILY      (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    WBANNO                   int64 26656
    LONGITUDE                float64 -164.1
    LATITUDE                 float64 61.35
    CRX_VN                   float64 2.515
# Mark LST_DATE (time) and WBANNO (station) as coordinates; this returns a new dataset and leaves uscrn_ds unchanged
uscrn_ds.set_coords(['LST_DATE', 'WBANNO'])
<xarray.Dataset>
Dimensions:                  (LST_DATE: 89)
Coordinates:
  * LST_DATE                 (LST_DATE) int64 20190101 20190102 ... 20190330
    WBANNO                   int64 26656
Data variables:
    T_DAILY_MAX              (LST_DATE) float64 0.7 -12.3 -9.9 ... 0.0 0.0
    T_DAILY_MIN              (LST_DATE) float64 -12.4 -17.0 -14.0 ... -4.5 -2.5
    T_DAILY_MEAN             (LST_DATE) float64 -5.8 -14.7 -12.0 ... -2.3 -1.3
    T_DAILY_AVG              (LST_DATE) float64 -5.1 -14.4 -11.8 ... -1.4 -0.3
    P_DAILY_CALC             (LST_DATE) float64 0.0 0.0 0.0 ... 0.0 0.0
    SOLARAD_DAILY            (LST_DATE) float64 0.26 0.68 0.17 ... 2.02 0.57
    SUR_TEMP_DAILY_TYPE      (LST_DATE) object 'C' 'C' 'C' 'C' ... 'U' 'C' 'C'
    SUR_TEMP_DAILY_MAX       (LST_DATE) float64 -1.1 -12.4 -9.9 ... 6.5 3.6
    SUR_TEMP_DAILY_MIN       (LST_DATE) float64 -12.4 -21.3 -15.0 ... -6.5 -3.6
    SUR_TEMP_DAILY_AVG       (LST_DATE) float64 -5.3 -16.4 -11.8 ... -0.2 0.1
    RH_DAILY_MAX             (LST_DATE) float64 97.6 92.8 ... -9.999e+03
    RH_DAILY_MIN             (LST_DATE) float64 79.3 82.0 ... -9.999e+03
    RH_DAILY_AVG             (LST_DATE) float64 92.0 87.4 ... -9.999e+03
    SOIL_MOISTURE_5_DAILY    (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_10_DAILY   (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_20_DAILY   (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_50_DAILY   (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_MOISTURE_100_DAILY  (LST_DATE) float64 -99.0 -99.0 ... -99.0 -99.0
    SOIL_TEMP_5_DAILY        (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_10_DAILY       (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_20_DAILY       (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_50_DAILY       (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_100_DAILY      (LST_DATE) float64 -9.999e+03 ... -9.999e+03
    LONGITUDE                float64 -164.1
    LATITUDE                 float64 61.35
    CRX_VN                   float64 2.515
# Promote the scalar WBANNO coordinate to a new dimension, so each variable becomes (WBANNO, LST_DATE)
uscrn_ds.set_coords(['LST_DATE', 'WBANNO']).expand_dims('WBANNO')
<xarray.Dataset>
Dimensions:                  (LST_DATE: 89, WBANNO: 1)
Coordinates:
  * LST_DATE                 (LST_DATE) int64 20190101 20190102 ... 20190330
  * WBANNO                   (WBANNO) int64 26656
Data variables:
    T_DAILY_MAX              (WBANNO, LST_DATE) float64 0.7 -12.3 ... 0.0 0.0
    T_DAILY_MIN              (WBANNO, LST_DATE) float64 -12.4 -17.0 ... -2.5
    T_DAILY_MEAN             (WBANNO, LST_DATE) float64 -5.8 -14.7 ... -2.3 -1.3
    T_DAILY_AVG              (WBANNO, LST_DATE) float64 -5.1 -14.4 ... -1.4 -0.3
    P_DAILY_CALC             (WBANNO, LST_DATE) float64 0.0 0.0 0.0 ... 0.0 0.0
    SOLARAD_DAILY            (WBANNO, LST_DATE) float64 0.26 0.68 ... 2.02 0.57
    SUR_TEMP_DAILY_TYPE      (WBANNO, LST_DATE) object 'C' 'C' 'C' ... 'C' 'C'
    SUR_TEMP_DAILY_MAX       (WBANNO, LST_DATE) float64 -1.1 -12.4 ... 6.5 3.6
    SUR_TEMP_DAILY_MIN       (WBANNO, LST_DATE) float64 -12.4 -21.3 ... -3.6
    SUR_TEMP_DAILY_AVG       (WBANNO, LST_DATE) float64 -5.3 -16.4 ... -0.2 0.1
    RH_DAILY_MAX             (WBANNO, LST_DATE) float64 97.6 92.8 ... -9.999e+03
    RH_DAILY_MIN             (WBANNO, LST_DATE) float64 79.3 82.0 ... -9.999e+03
    RH_DAILY_AVG             (WBANNO, LST_DATE) float64 92.0 87.4 ... -9.999e+03
    SOIL_MOISTURE_5_DAILY    (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_10_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_20_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_50_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_100_DAILY  (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_TEMP_5_DAILY        (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_10_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_20_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_50_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_100_DAILY      (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    LONGITUDE                (WBANNO) float64 -164.1
    LATITUDE                 (WBANNO) float64 61.35
    CRX_VN                   (WBANNO) float64 2.515
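The same (WBANNO, LST_DATE) layout can also be built in one step by passing the station ID to `expand_dims` as the new coordinate, and storing longitude and latitude as coordinates along the station dimension rather than as data variables. A sketch on a hypothetical one-variable excerpt:

```python
import xarray as xr

# Hypothetical one-variable excerpt of uscrn_ds, before the station columns are added back
ds = xr.Dataset(
    {'T_DAILY_MAX': ('LST_DATE', [0.7, -12.3, -9.9])},
    coords={'LST_DATE': [20190101, 20190102, 20190103]},
)

# Add a length-1 WBANNO dimension whose coordinate value is the station ID
ds = ds.expand_dims(WBANNO=[26656])

# Station metadata become coordinates indexed by WBANNO
ds = ds.assign_coords(LONGITUDE=('WBANNO', [-164.08]), LATITUDE=('WBANNO', [61.35]))
print(ds)
```

Keeping LONGITUDE and LATITUDE as coordinates means they travel with each station when several single-station datasets are combined, for example with `xr.concat(datasets, dim='WBANNO')`.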
# set_coords and expand_dims return a new dataset, so assign the result back to a variable
uscrn_ds = uscrn_ds.set_coords(['LST_DATE', 'WBANNO']).expand_dims('WBANNO')
uscrn_ds
<xarray.Dataset>
Dimensions:                  (LST_DATE: 89, WBANNO: 1)
Coordinates:
  * LST_DATE                 (LST_DATE) int64 20190101 20190102 ... 20190330
  * WBANNO                   (WBANNO) int64 26656
Data variables:
    T_DAILY_MAX              (WBANNO, LST_DATE) float64 0.7 -12.3 ... 0.0 0.0
    T_DAILY_MIN              (WBANNO, LST_DATE) float64 -12.4 -17.0 ... -2.5
    T_DAILY_MEAN             (WBANNO, LST_DATE) float64 -5.8 -14.7 ... -2.3 -1.3
    T_DAILY_AVG              (WBANNO, LST_DATE) float64 -5.1 -14.4 ... -1.4 -0.3
    P_DAILY_CALC             (WBANNO, LST_DATE) float64 0.0 0.0 0.0 ... 0.0 0.0
    SOLARAD_DAILY            (WBANNO, LST_DATE) float64 0.26 0.68 ... 2.02 0.57
    SUR_TEMP_DAILY_TYPE      (WBANNO, LST_DATE) object 'C' 'C' 'C' ... 'C' 'C'
    SUR_TEMP_DAILY_MAX       (WBANNO, LST_DATE) float64 -1.1 -12.4 ... 6.5 3.6
    SUR_TEMP_DAILY_MIN       (WBANNO, LST_DATE) float64 -12.4 -21.3 ... -3.6
    SUR_TEMP_DAILY_AVG       (WBANNO, LST_DATE) float64 -5.3 -16.4 ... -0.2 0.1
    RH_DAILY_MAX             (WBANNO, LST_DATE) float64 97.6 92.8 ... -9.999e+03
    RH_DAILY_MIN             (WBANNO, LST_DATE) float64 79.3 82.0 ... -9.999e+03
    RH_DAILY_AVG             (WBANNO, LST_DATE) float64 92.0 87.4 ... -9.999e+03
    SOIL_MOISTURE_5_DAILY    (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_10_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_20_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_50_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_100_DAILY  (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_TEMP_5_DAILY        (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_10_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_20_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_50_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_100_DAILY      (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    LONGITUDE                (WBANNO) float64 -164.1
    LATITUDE                 (WBANNO) float64 61.35
    CRX_VN                   (WBANNO) float64 2.515
# List the variables of the dataset
uscrn_ds.data_vars
Data variables:
    T_DAILY_MAX              (WBANNO, LST_DATE) float64 0.7 -12.3 ... 0.0 0.0
    T_DAILY_MIN              (WBANNO, LST_DATE) float64 -12.4 -17.0 ... -2.5
    T_DAILY_MEAN             (WBANNO, LST_DATE) float64 -5.8 -14.7 ... -2.3 -1.3
    T_DAILY_AVG              (WBANNO, LST_DATE) float64 -5.1 -14.4 ... -1.4 -0.3
    P_DAILY_CALC             (WBANNO, LST_DATE) float64 0.0 0.0 0.0 ... 0.0 0.0
    SOLARAD_DAILY            (WBANNO, LST_DATE) float64 0.26 0.68 ... 2.02 0.57
    SUR_TEMP_DAILY_TYPE      (WBANNO, LST_DATE) object 'C' 'C' 'C' ... 'C' 'C'
    SUR_TEMP_DAILY_MAX       (WBANNO, LST_DATE) float64 -1.1 -12.4 ... 6.5 3.6
    SUR_TEMP_DAILY_MIN       (WBANNO, LST_DATE) float64 -12.4 -21.3 ... -3.6
    SUR_TEMP_DAILY_AVG       (WBANNO, LST_DATE) float64 -5.3 -16.4 ... -0.2 0.1
    RH_DAILY_MAX             (WBANNO, LST_DATE) float64 97.6 92.8 ... -9.999e+03
    RH_DAILY_MIN             (WBANNO, LST_DATE) float64 79.3 82.0 ... -9.999e+03
    RH_DAILY_AVG             (WBANNO, LST_DATE) float64 92.0 87.4 ... -9.999e+03
    SOIL_MOISTURE_5_DAILY    (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_10_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_20_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_50_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_100_DAILY  (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_TEMP_5_DAILY        (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_10_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_20_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_50_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_100_DAILY      (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    LONGITUDE                (WBANNO) float64 -164.1
    LATITUDE                 (WBANNO) float64 61.35
    CRX_VN                   (WBANNO) float64 2.515
# Add units for each variable from row 1 of the header ('units' is the attribute name the CF conventions use)
for x in uscrn_ds.data_vars:
    uscrn_ds[x].attrs['units'] = uscrn_header.loc[1, uscrn_header.iloc[0, :] == x].values[0]
uscrn_ds
<xarray.Dataset>
Dimensions:                  (LST_DATE: 89, WBANNO: 1)
Coordinates:
  * LST_DATE                 (LST_DATE) int64 20190101 20190102 ... 20190330
  * WBANNO                   (WBANNO) int64 26656
Data variables:
    T_DAILY_MAX              (WBANNO, LST_DATE) float64 0.7 -12.3 ... 0.0 0.0
    T_DAILY_MIN              (WBANNO, LST_DATE) float64 -12.4 -17.0 ... -2.5
    T_DAILY_MEAN             (WBANNO, LST_DATE) float64 -5.8 -14.7 ... -2.3 -1.3
    T_DAILY_AVG              (WBANNO, LST_DATE) float64 -5.1 -14.4 ... -1.4 -0.3
    P_DAILY_CALC             (WBANNO, LST_DATE) float64 0.0 0.0 0.0 ... 0.0 0.0
    SOLARAD_DAILY            (WBANNO, LST_DATE) float64 0.26 0.68 ... 2.02 0.57
    SUR_TEMP_DAILY_TYPE      (WBANNO, LST_DATE) object 'C' 'C' 'C' ... 'C' 'C'
    SUR_TEMP_DAILY_MAX       (WBANNO, LST_DATE) float64 -1.1 -12.4 ... 6.5 3.6
    SUR_TEMP_DAILY_MIN       (WBANNO, LST_DATE) float64 -12.4 -21.3 ... -3.6
    SUR_TEMP_DAILY_AVG       (WBANNO, LST_DATE) float64 -5.3 -16.4 ... -0.2 0.1
    RH_DAILY_MAX             (WBANNO, LST_DATE) float64 97.6 92.8 ... -9.999e+03
    RH_DAILY_MIN             (WBANNO, LST_DATE) float64 79.3 82.0 ... -9.999e+03
    RH_DAILY_AVG             (WBANNO, LST_DATE) float64 92.0 87.4 ... -9.999e+03
    SOIL_MOISTURE_5_DAILY    (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_10_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_20_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_50_DAILY   (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_MOISTURE_100_DAILY  (WBANNO, LST_DATE) float64 -99.0 -99.0 ... -99.0
    SOIL_TEMP_5_DAILY        (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_10_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_20_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_50_DAILY       (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    SOIL_TEMP_100_DAILY      (WBANNO, LST_DATE) float64 -9.999e+03 ... -9.999e+03
    LONGITUDE                (WBANNO) float64 -164.1
    LATITUDE                 (WBANNO) float64 61.35
    CRX_VN                   (WBANNO) float64 2.515
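A more compact way to attach the units is to zip the name row and the unit row of the header into a dictionary. A sketch using a hypothetical three-column excerpt of HEADERS.txt (names and units copied from the header output above) and the CF attribute name `units`:

```python
import io
import pandas as pd
import xarray as xr

# Hypothetical excerpt of HEADERS.txt: column numbers, variable names, units
header = pd.read_csv(io.StringIO(
    "1 2 3\n"
    "LST_DATE T_DAILY_MAX P_DAILY_CALC\n"
    "YYYYMMDD Celsius mm\n"
), sep=r'\s+')

# Map each variable name (row 0) to its unit string (row 1)
units = dict(zip(header.iloc[0], header.iloc[1]))

ds = xr.Dataset(
    {'T_DAILY_MAX': ('LST_DATE', [0.7, -12.3]), 'P_DAILY_CALC': ('LST_DATE', [0.0, 0.0])},
    coords={'LST_DATE': [20190101, 20190102]},
)
for name in ds.data_vars:
    # Fall back to an empty string for variables missing from the header
    ds[name].attrs['units'] = units.get(name, '')
```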
# Save the dataset to a NetCDF file named <station>_<first date>_<last date>.nc
station = uscrn_ds.WBANNO.values[0]
start, end = uscrn_ds.LST_DATE.values[0], uscrn_ds.LST_DATE.values[-1]
uscrn_ds.to_netcdf(f'{station}_{start}_{end}.nc')
!ls -lrth
total 232
-rw-r--r--  1 zeng  staff    61K Mar 31 21:11 xarray_notes.ipynb
-rw-r--r--  1 zeng  staff    49K Mar 31 21:11 26656_20190101_20190330.nc
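A quick way to confirm the file was written as intended is to open it again and compare it with the in-memory dataset. A sketch using a small hypothetical dataset with the same layout and a temporary directory; it needs a NetCDF backend such as netCDF4 or scipy to be installed.

```python
import os
import tempfile
import xarray as xr

# Hypothetical small dataset with the same (WBANNO, LST_DATE) layout
ds = xr.Dataset(
    {'T_DAILY_MAX': (('WBANNO', 'LST_DATE'), [[0.7, -12.3, -9.9]])},
    coords={'WBANNO': [26656], 'LST_DATE': [20190101, 20190102, 20190103]},
)
ds['T_DAILY_MAX'].attrs['units'] = 'Celsius'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, '26656_20190101_20190103.nc')
    ds.to_netcdf(path)
    # Load into memory and close the file before the temporary directory is removed
    with xr.open_dataset(path) as reopened:
        back = reopened.load()

# Raises an AssertionError if values, dimensions or coordinates differ
xr.testing.assert_equal(ds, back)
```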

Zarr file compression notes

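import s3fs
import zarr  # zarr v2 API: zarr.Blosc and the 'compressor' encoding key
# s3_path, zarr_name and tmp_ds (the xarray dataset to save) are defined elsewhere
s3 = s3fs.S3FileSystem()
store = s3fs.S3Map(root=s3_path+zarr_name, s3=s3, check=False)
compressor = zarr.Blosc(cname='zstd', clevel=5)
encoding = {vname: {'compressor': compressor} for vname in tmp_ds.data_vars}
# consolidated=True also writes consolidated metadata, so the store opens faster later
tmp_ds.to_zarr(store=store, encoding=encoding, consolidated=True)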

The compression level clevel strongly affects both write time and file size. In my experiments, clevel=5 was usually a good choice: it was the fastest of the levels I tried and still produced a fairly small output.

Experiment results (clevel: write time, output size):

  • clevel=9: 18 min, 7.6G
  • clevel=5: 7 min, 8.3G
  • clevel=3: 8 min, 9.1G
  • clevel=0: 8 min, 12G

Git best practice

When working with git, it’s always safer to create your own branch and do your personal work in it. When everything is ready, merge it into the master branch. The basic branch commands are:

# Create the branch on your local machine and switch to it:
$ git checkout -b [name_of_your_new_branch]
# Switch to an existing branch:
$ git checkout [name_of_your_new_branch]
# Push the branch to GitHub:
$ git push origin [name_of_your_new_branch]
# List local branches (add -a to include remote branches):
$ git branch

To merge your finished work into the master branch (assuming your branch is named test):

$ git checkout master
$ git pull origin master
$ git merge test
$ git push origin master


seaborn figure style setting notes

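Seaborn has two separate settings: the style (darkgrid, whitegrid, dark, white, ticks) controls the background and grid, while the context (paper, notebook, talk, poster) scales fonts, lines and markers for the target medium. For example: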

import seaborn as sns
sns.set_style('whitegrid')
sns.set_context('talk')
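The difference between the contexts is easy to see by asking seaborn for the font size each one applies, and both settings can also be switched temporarily for a single figure. A minimal sketch; the plotted values are arbitrary.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Base font size that each context sets
sizes = {ctx: sns.plotting_context(ctx)['font.size'] for ctx in ['paper', 'notebook', 'talk', 'poster']}
print(sizes)

# Set style and context for all later figures
sns.set_style('whitegrid')
sns.set_context('talk')

# Or override them for one figure only; the previous settings return after the block
with sns.axes_style('ticks'), sns.plotting_context('poster'):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3], [2, 4, 3])
plt.close(fig)
```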


Notes for Postgres and psycopg2

  • A database can have several schemas
  • A schema can have several tables
  • Unless a table is in the public schema (or another schema on the search_path), refer to it as schema_name.table_name

Here’s an example of connecting to a Postgres database using the psycopg2 Python package:

import psycopg2

# Replace the placeholders with your own connection settings
conn = psycopg2.connect(host="hostname", database="databasename", user="username", password="password")
cur = conn.cursor()
# List all tables in the database (across all schemas)
#cur.execute("""SELECT table_schema, table_name FROM information_schema.tables""")
# List all schemas in the database
cur.execute("""SELECT schema_name FROM information_schema.schemata;""")
for schema in cur.fetchall():
    print(schema)

# List all tables in a given schema, reusing the same connection
cur.execute("""SELECT table_name FROM information_schema.tables WHERE table_schema = %s""", ('xxxx',))
for x in cur.fetchall():
    print(x)

cur.close()
conn.close()