Welcome to my personal place for love, peace and happiness❣️

A handy framework: DataFusion, a SQL engine built on Apache Arrow

https://arrow.apache.org/datafusion/

$ datafusion-cli
DataFusion CLI v17.0.0
❯ select * from 'data.csv';
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+
1 row in set. Query took 0.007 seconds.
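
To reproduce the query above you need a matching data.csv in the directory where datafusion-cli is started. The original file is not shown, so the sketch below is a guess based on the printed result: it assumes a header row naming the columns a and b, followed by a single data row.

```shell
# Assumed contents of data.csv, inferred from the query output above:
# a header row (column names a, b) and one data row (1, 2).
printf 'a,b\n1,2\n' > data.csv

# Print the file to check it before querying it from datafusion-cli.
cat data.csv
```

With this file in place, the select statement shown above should return the same one-row table.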

You can even read directly from S3:

CREATE EXTERNAL TABLE test
STORED AS PARQUET
OPTIONS(
    'access_key_id' '******',
    'secret_access_key' '******',
    'region' 'us-east-2'
)
LOCATION 's3://bucket/path/file.parquet';
9 mo   Apache   Arrow   big data   datafusion