# hostFS
**Repository Path**: shf675/hostfs
## Basic Information
- **Project Name**: hostFS
- **Description**: No description available
- **Primary Language**: C++
- **License**: MIT
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No
## Statistics
- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-08-23
- **Last Updated**: 2025-08-23
## Categories & Tags
**Categories**: Uncategorized
**Tags**: None
## README
# HostFS

HostFS allows you to navigate and explore the host filesystem from DuckDB.
Install via
```plaintext
INSTALL hostfs FROM community;
LOAD hostfs;
```
Example 1: Navigate to the workspace and list the files.
```plaintext
D PRAGMA cd('/Users/paul/workspace');
D PRAGMA ls;
┌───────────────────────────────┐
│ path │
│ varchar │
├───────────────────────────────┤
│ ./duckdb │
│ ./playground │
│ ./hostfs │
...
```
Example 2: List the top 3 file types by total size, with file count, ordered by size.
```plaintext
D SELECT size, count, file_extension AS "type"
FROM (
SELECT SUM(file_size(path)) AS size_raw, hsize(size_raw) AS size, COUNT(*) AS count, file_extension(path) AS file_extension
FROM lsr('/Users/paul/workspace', 10)
GROUP BY file_extension(path)
) AS subquery
ORDER BY size_raw DESC
LIMIT 3;
┌───────────┬───────┬─────────┐
│ size │ count │ type │
│ varchar │ int64 │ varchar │
├───────────┼───────┼─────────┤
│ 246.95 GB │ 29 │ .duckdb │
│ 90.33 GB │ 3776 │ .tmp │
│ 26.17 GB │ 28175 │ .csv │
└───────────┴───────┴─────────┘
```
Example 3: Find the files you were working on last to continue your analysis.
```plaintext
D SELECT path, file_last_modified(path) AS date FROM ls() WHERE 'csv' IN file_extension(path) ORDER BY date LIMIT 1 ;
┌───────────────────────────┬─────────────────────┐
│ path │ date │
│ varchar │ timestamp │
├───────────────────────────┼─────────────────────┤
│ ./sketch_results_join.csv │ 2024-07-13 23:25:48 │
└───────────────────────────┴─────────────────────┘
D SELECT n_rows, std, n_duplicates FROM './sketch_results.csv' LIMIT 4;
┌────────┬────────────────────┬────────────────────┐
│ n_rows │ std │ n_duplicates │
│ int64 │ double │ double │
├────────┼────────────────────┼────────────────────┤
│ 1000 │ 26.855167100578615 │ 1.0405827263267429 │
│ 1000 │ 44.76159067772279 │ 1.1547344110854503 │
│ 1000 │ 31.675858315126995 │ 1.5649452269170578 │
│ 1000 │ 52.60798418491246 │ 3.1545741324921135 │
└────────┴────────────────────┴────────────────────┘
```
## Functions Overview
### Scalar Functions
> All path functions like `is_dir(path)` return `NULL` if the path does not exist!
| **Function** | **Description** | **Parameters** |
|------------------------|------------------------------------------------------------|-------------------------------------|
| `pwd()` | Get the current working directory. | None |
| `path_separator()` | Get the path separator for the current OS. | None |
| `is_file(path)` | Check if the given path is a file. | `path`: File path (String) |
| `is_dir(path)` | Check if the given path is a directory. | `path`: Directory path (String) |
| `file_name(path)` | Get the file name from the path. | `path`: File path (String) |
| `file_extension(path)`| Get the file extension from the path. | `path`: File path (String) |
| `file_size(path)` | Get the size of the file (in bytes). | `path`: File path (String) |
| `file_last_modified(path)`| Get the last modified time of the file. | `path`: File path (String) |
| `absolute_path(path)` | Get the absolute path of the file or directory. | `path`: File path (String) |
| `path_exists(path)` | Check if the given path exists. | `path`: File or directory path (String) |
| `path_type(path)` | Determine the type of the path (file or directory). | `path`: File or directory path (String) |
| `path_split(path)` | Split the path into by the path separator. | `path`: File or directory path (String) |
| `hsize(bytes)` | Format file size into a human-readable form (e.g., KB, MB). | `bytes`: Number of bytes (Integer) |
---
### Table Functions
| **Function** | **Description** | **Parameters** |
|--------------------------|------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `cd(path)` | Change the current working directory. | `path`: Target directory path (String) |
| `ls(path, skip_permission_denied)`| List files in a directory. Defaults to the current directory if `path` is not provided. | `path` (optional): Directory path (String), default is `pwd`
`skip_permission_denied` (optional): Boolean, default is `true` |
| `lsr(path, depth, skip_permission_denied)`| List files in a directory recursively. Defaults to no depth limit and the current directory. | `path` (optional): Directory path (String), default is `pwd`
`depth` (optional): default is `-1`, which is no limit (Integer)
`skip_permission_denied` (optional): default is `true` |
---
## Building
### Build steps
Now to build the extension, run:
```sh
make
```
The main binaries that will be built are:
```sh
./build/release/duckdb
./build/release/test/unittest
./build/release/extension/hostfs/hostfs.duckdb_extension
```
- `duckdb` is the binary for the duckdb shell with the extension code automatically loaded.
- `unittest` is the test runner of duckdb. Again, the extension is already linked into the binary.
- `hostfs.duckdb_extension` is the loadable binary as it would be distributed.
## Running the extension
To run the extension code, simply start the shell with `./build/release/duckdb`.
Now we can use the features from the extension directly in DuckDB. The template contains a single scalar function `hostfs()` that takes a string arguments and returns a string:
```
D select hostfs('Jane') as result;
┌────────────────┐
│ result │
│ varchar │
├────────────────┤
│ Hostfs Jane 🐥 │
└────────────────┘
```
## Running the tests
Different tests can be created for DuckDB extensions. The primary way of testing DuckDB extensions should be the SQL tests in `./test/sql`. These SQL tests can be run using:
```sh
make test
```
### Installing the deployed binaries
To install your extension binaries from S3, you will need to do two things. Firstly, DuckDB should be launched with the
`allow_unsigned_extensions` option set to true. How to set this will depend on the client you're using. Some examples:
CLI:
```shell
duckdb -unsigned
```
Python:
```python
con = duckdb.connect(':memory:', config={'allow_unsigned_extensions' : 'true'})
```
NodeJS:
```js
db = new duckdb.Database(':memory:', {"allow_unsigned_extensions": "true"});
```
Secondly, you will need to set the repository endpoint in DuckDB to the HTTP url of your bucket + version of the extension
you want to install. To do this run the following SQL query in DuckDB:
```sql
SET custom_extension_repository='bucket.s3.eu-west-1.amazonaws.com//latest';
```
Note that the `/latest` path will allow you to install the latest extension version available for your current version of
DuckDB. To specify a specific version, you can pass the version instead.
After running these steps, you can install and load your extension using the regular INSTALL/LOAD commands in DuckDB:
```sql
INSTALL hostfs
LOAD hostfs
```