A handy framework: DataFusion, a SQL engine built on Arrow
https://arrow.apache.org/datafusion/
$ datafusion-cli
DataFusion CLI v17.0.0
❯ select * from 'data.csv';
+---+---+
| a | b |
+---+---+
| 1 | 2 |
+---+---+
1 row in set. Query took 0.007 seconds.
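To reproduce the session above, here is a minimal sketch. The contents of data.csv are an assumption inferred from the query output (a header row `a,b` and one row `1,2`), and query.sql is a hypothetical file name. The non-interactive run assumes your datafusion-cli version supports the `--file` option, so it is skipped if the binary is not on PATH.

```shell
# Recreate the input file implied by the query output (assumed contents).
printf 'a,b\n1,2\n' > data.csv

# Save the query to a file (query.sql is a hypothetical name).
printf "select * from 'data.csv';\n" > query.sql

# Run the query non-interactively, but only if datafusion-cli is installed.
# Assumes the --file option exists in the installed version.
if command -v datafusion-cli >/dev/null 2>&1; then
  datafusion-cli --file query.sql
fi
```

The same query can also be typed at the interactive prompt, as in the session shown earlier.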
You can even read directly from S3:
CREATE EXTERNAL TABLE test
STORED AS PARQUET
OPTIONS(
'access_key_id' '******',
'secret_access_key' '******',
'region' 'us-east-2'
)
LOCATION 's3://bucket/path/file.parquet';