Rasteret
Index-first access to cloud-native GeoTIFF collections for ML and analysis. A growing catalog of pre-registered datasets, multi-cloud (S3/Azure/GCS), up to 20x faster reads.
As someone once said, "Metadata is a love note to the future." We believe in building systems that stay true to this: metadata is better than more data!
Come try it out! Apache-2.0, no strings attached.
Docs: terrafloww.github.io/rasteret/
GitHub: github.com/terrafloww/r...
27.02.2026 17:16
We replicated the Major-TOM dataset from source COGs instead of storing images inside Parquet: 6x faster reads than HF datasets.
Our bet:
Your dataset is a table. Pixels stay where they already live. Everything else - splits, labels, patch geometries - lives as columns you can version, share, and reproduce.
27.02.2026 14:35
2. When you need pixels, pick your output and our engine gets it for you:
- get_numpy() → [N, C, H, W] arrays
- get_xarray() → xarray Dataset
- to_torchgeo_dataset() → drop-in GeoDataset
No GDAL, no TIFF metadata re-parsing, no cold-start tax. Up to 20x faster #TorchGeo data loading.
27.02.2026 14:25
The flow:
1. Build a 'Collection' from 12 built-in datasets (Sentinel, Google DeepMind Alpha Earth Embeddings, and more), or bring your own from any STAC API or from Parquet files with COG URLs.
Filter, join, and add splits, labels, and quality flags as columns with PyArrow, Polars, or DuckDB, without moving images.
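A minimal sketch of the "splits and quality flags as columns" idea, using only the Python standard library. It is not Rasteret's API: the rows, bucket URLs, column names, and the helpers `assign_split` and `add_columns` are all hypothetical, and plain dicts stand in for rows of a Parquet/Arrow table. The point is that only the table changes, while the images stay at their URLs.

```python
import hashlib

# Hypothetical index rows: one row per scene. Pixels are referenced by URL only.
rows = [
    {"id": "scene-a", "cog_url": "s3://example-bucket/scene-a.tif", "cloud_cover": 4.0},
    {"id": "scene-b", "cog_url": "s3://example-bucket/scene-b.tif", "cloud_cover": 62.5},
    {"id": "scene-c", "cog_url": "s3://example-bucket/scene-c.tif", "cloud_cover": 11.2},
    {"id": "scene-d", "cog_url": "s3://example-bucket/scene-d.tif", "cloud_cover": 0.3},
]


def assign_split(scene_id: str, val_fraction: float = 0.25) -> str:
    """Map an id to 'train' or 'val' with a stable hash, so reruns agree."""
    digest = hashlib.sha256(scene_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return "val" if bucket < val_fraction else "train"


def add_columns(table, max_cloud: float = 20.0):
    """Return new rows with 'split' and 'is_clear' columns; inputs are not mutated."""
    return [
        {
            **row,
            "split": assign_split(row["id"]),
            "is_clear": row["cloud_cover"] <= max_cloud,
        }
        for row in table
    ]


indexed = add_columns(rows)

# Filtering happens on metadata alone; no image bytes are read or copied.
train_urls = [r["cog_url"] for r in indexed if r["split"] == "train" and r["is_clear"]]
```

Because the split comes from a hash of the id rather than a random draw, the same table yields the same split on every machine, which is what makes it easy to version and share.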
27.02.2026 14:24
Rasteret 0.3: EO image datasets should be tables, not folders
Index COG metadata once into Parquet, skip cold starts forever. Filter with DuckDB, train with TorchGeo, fetch pixels on demand. 20x faster runs.
Releasing Rasteret 0.3.x
EO image datasets should be in tables, not folders. Something for your weekend coding bug!
We prove this in our new blog explaining how we use Apache Arrow, Parquet and our custom IO engine to redefine how to interact with EO imagery!
blog.terrafloww.com/eo-datasets-...
27.02.2026 14:22
We must stop selling files: The case for streaming tensor...
Rasteret sped up geo image reads by 10x. But users still spend 80% of their time doing ETL before even feeding GPUs. It's like having a fast car on roads full of potholes. We investigate why, and share our solu...
This combo matters not only for better DevX today, but also for agentic commerce: an agent shouldn't have to click "Contact Sales".
It needs contract + trust + billing.
Write-up: blog.terrafloww.com/streaming-te...
If you've built EO pipelines, we'd love feedback.
#EarthObservation #GeoAI #Geospatial
22.01.2026 14:46
Part/3
What we're launching:
1. Rasteret SDK: streams bytes from S3 → Arrow/DLPack → JAX/PyTorch tensors
2. Terrafloww Platform: handles data discovery, attribution/licensing, and metering/pay-per-byte; it is built on open standards and does not copy or transform images
22.01.2026 14:43
Part/2
What's broken today:
- STAC discovery + filtering overhead
- downloading GeoTIFFs to disk
- Rasterio/GDAL loops that starve GPUs
- unclear attribution + licensing for reuse of data
22.01.2026 14:40
Most GeoAI teams don't lose time on model training.
They lose it before the GPU ever sees a tensor.
YouTube didn't win by inventing better cameras.
It won by standardizing the player.
GeoAI needs the same shift: stop moving files, stream tensors and monetize it. But this requires a lot of work 🧵
22.01.2026 14:38
GitHub - terrafloww/rasteret: A library for fast reads of Cloud Optimized Geotiff satellite images, using GeoParquet as COG metadata cache
Been working on 'Rasteret' since the last blog I wrote; it's out now as an early release.
More details - blog.terrafloww.com/rasteret-a-l...
Open to feedback and contributions, there is much more exciting work to do!
github.com/terrafloww/r...
#geospatial #cloudnativegeo #opensource
12.01.2025 06:37