数据库应用
如果您拥有数据库后不知道干点什么,不妨参考作者的另一个开源项目:Vonng/isd
您可以直接复用监控系统Grafana,以交互式的方式查阅近30000个地面气象站过去120年间的亚小时级气象数据。
ISD —— Intergrated Surface Data
这里包含了下载、解析、处理、可视化NOAA ISD数据集所需的所有工具。 能让您查阅近30000个地面气象站过去120年间的亚小时级气象数据。并充分体验PostgreSQL带来的强大的数据分析与处理能力!
SYNOPSIS
Download, Parse, Visualize Intergrated Suface Dataset.
Including 30000 meteorology station, sub-hourly observation records, from 1900-2020.

Quick Started
- 
Clone repo git clone https://github.com/Vonng/isd && cd isd
- 
Prepare a postgres database Connect via something like isdorpostgres://user:pass@host/dbname)# skip this if you already have a viable database PGURL=postgres psql ${PGURL} -c 'CREATE DATABASE isd;' # database connection string, something like `isd` or `postgres://user:pass@host/dbname` PGURL='isd' psql ${PGURL} -AXtwc 'CREATE EXTENSION postgis;' # create tables, partitions, functions psql ${PGURL} -AXtwf 'sql/schema.sql'
- 
Download data - ISD Station: Station metadata, id, name, location, country, etc…
- ISD History: Station observation records: observation count per month
- ISD Hourly: Yearly archived station (sub-)hourly observation records
- ISD Daily: Yearly archvied station daily aggregated summary
 git clone https://github.com/Vonng/isd && cd isd bin/get-isd-station.sh # download isd station from noaa (proxy makes it faster) bin/get-isd-history.sh # download isd history observation from noaa bin/get-isd-hourly.sh <year> # download isd hourly data (yearly tarball 1901-2020) bin/get-isd-daily.sh <year> # download isd daily data (yearly tarball 1929-2020)
- 
Build Parser There are two ISD dataset parsers written in Golang : isdhfor isd hourly dataset andisddfor isd daily dataset.make isdhandmake isddwill build it and copy to bin. These parsers are required for loading data into database.You can download pre-compiled binary to bin/ dir to skip this phase. 
- 
Load data Metadata includes world_fences,china_fences,isd_elements,isd_mwcode,isd_station,isd_history. These are gzipped csv file lies indata/meta/.world_fences,china_fences,isd_elements,isd_mwcodeare constant dict table. Butisd_stationandisd_historyare frequently updated. You’ll have to download it from noaa before loading it.# load metadata: fences, dicts, station, history,... bin/load-meta.sh # load a year's daily data to database bin/load-isd-daily <year> # load a year's hourly data to database bin/laod-isd-hourly <year>Note that the original isd_dailydataset has some un-cleansed data, refer caveat for detail.
Data
Dataset
| 数据集 | 样本 | 文档 | 备注 | 
|---|---|---|---|
| ISD Hourly | isd-hourly-sample.csv | isd-hourly-document.pdf | (Sub-) Hour oberservation records | 
| ISD Daily | isd-daily-sample.csv | isd-daily-format.txt | Daily summary | 
| ISD Monthly | N/A | isd-gsom-document.pdf | Not used, gen from daily | 
| ISD Yearly | N/A | isd-gsoy-document.pdf | Not used, gen from monthly | 
Hourly Data: Oringinal tarball size 105GB, Table size 1TB (+600GB Indexes).
Daily Data: Oringinal tarball size 3.2GB, table size 24 GB
It is recommended to have 2TB storage for a full installation, and at least 40GB for daily data only installation.
Schema
Data schema definition
Station
CREATE TABLE public.isd_station
(
    station    VARCHAR(12) PRIMARY KEY,
    usaf       VARCHAR(6) GENERATED ALWAYS AS (substring(station, 1, 6)) STORED,
    wban       VARCHAR(5) GENERATED ALWAYS AS (substring(station, 7, 5)) STORED,
    name       VARCHAR(32),
    country    VARCHAR(2),
    province   VARCHAR(2),
    icao       VARCHAR(4),
    location   GEOMETRY(POINT),
    longitude  NUMERIC GENERATED ALWAYS AS (Round(ST_X(location)::NUMERIC, 6)) STORED,
    latitude   NUMERIC GENERATED ALWAYS AS (Round(ST_Y(location)::NUMERIC, 6)) STORED,
    elevation  NUMERIC,
    period     daterange,
    begin_date DATE GENERATED ALWAYS AS (lower(period)) STORED,
    end_date   DATE GENERATED ALWAYS AS (upper(period)) STORED
);
Hourly Data
CREATE TABLE public.isd_hourly
(
    station    VARCHAR(11) NOT NULL,
    ts         TIMESTAMP   NOT NULL,
    temp       NUMERIC(3, 1),
    dewp       NUMERIC(3, 1),
    slp        NUMERIC(5, 1),
    stp        NUMERIC(5, 1),
    vis        NUMERIC(6),
    wd_angle   NUMERIC(3),
    wd_speed   NUMERIC(4, 1),
    wd_gust    NUMERIC(4, 1),
    wd_code    VARCHAR(1),
    cld_height NUMERIC(5),
    cld_code   VARCHAR(2),
    sndp       NUMERIC(5, 1),
    prcp       NUMERIC(5, 1),
    prcp_hour  NUMERIC(2),
    prcp_code  VARCHAR(1),
    mw_code    VARCHAR(2),
    aw_code    VARCHAR(2),
    pw_code    VARCHAR(1),
    pw_hour    NUMERIC(2),
    data       JSONB
) PARTITION BY RANGE (ts);
Daily Data
CREATE TABLE public.isd_daily
(
    station     VARCHAR(12) NOT NULL,
    ts          DATE        NOT NULL,
    temp_mean   NUMERIC(3, 1),
    temp_min    NUMERIC(3, 1),
    temp_max    NUMERIC(3, 1),
    dewp_mean   NUMERIC(3, 1),
    slp_mean    NUMERIC(5, 1),
    stp_mean    NUMERIC(5, 1),
    vis_mean    NUMERIC(6),
    wdsp_mean   NUMERIC(4, 1),
    wdsp_max    NUMERIC(4, 1),
    gust        NUMERIC(4, 1),
    prcp_mean   NUMERIC(5, 1),
    prcp        NUMERIC(5, 1),
    sndp        NuMERIC(5, 1),
    is_foggy    BOOLEAN,
    is_rainy    BOOLEAN,
    is_snowy    BOOLEAN,
    is_hail     BOOLEAN,
    is_thunder  BOOLEAN,
    is_tornado  BOOLEAN,
    temp_count  SMALLINT,
    dewp_count  SMALLINT,
    slp_count   SMALLINT,
    stp_count   SMALLINT,
    wdsp_count  SMALLINT,
    visib_count SMALLINT,
    temp_min_f  BOOLEAN,
    temp_max_f  BOOLEAN,
    prcp_flag   CHAR,
    PRIMARY KEY (ts, station)
) PARTITION BY RANGE (ts);
Update
ISD Daily and ISD hourly dataset will rolling update each day. Run following scripts to load latest data into database.
# download, clean, reload latest hourly dataset
bin/get-isd-daily.sh
bin/load-isd-daily.sh
# download, clean, reload latest daily dataset
bin/get-isd-daily.sh
bin/load-isd-daily.sh
# recalculate latest partition of monthly and yearly
bin/refresh-latest.sh
Parser
There are two parser: isdd and isdh, which takes noaa original yearly tarball as input, generate CSV as output (which could be directly consume by PostgreSQL Copy command).
NAME
	isdh -- Intergrated Surface Dataset Hourly Parser
SYNOPSIS
	isdh [-i <input|stdin>] [-o <output|st>] -p -d -c -v
DESCRIPTION
	The isdh program takes isd hourly (yearly tarball file) as input.
	And generate csv format as output
OPTIONS
	-i	<input>		input file, stdin by default
	-o	<output>	output file, stdout by default
	-p	<profpath>	pprof file path (disable by default)	
	-v                verbose progress report
	-d                de-duplicate rows (raw, ts-first, hour-first)
	-c                add comma separated extra columns
UI
ISD Station

ISD Monthly
