Implement the file tree in PostgreSQL using ltree
Background
A file tree is a hierarchical structure used to organize files and directories on a computer. It allows users to easily navigate and access their files and folders, and is commonly used in operating systems and file management software.
But implementing file trees in traditional RDBMS like MySQL can be a challenge due to the lack of support for hierarchical data structures. However, there are workarounds such as using nested sets or materialized path approaches. Alternatively, you could consider using NoSQL databases like MongoDB or document-oriented databases like Couchbase, which have built-in support for hierarchical data structures.
It is possible to implement a file tree in PostgreSQL using the ltree
datatype provided by PostgreSQL. This datatype can help us build the hierarchy within the database.
TL;DR
Pros
- Excellent performance!
- No migration is needed for this, as no new columns will be added. Only a new expression index needs to be created.
Cons
- Need additional mechanism to create virtual folder entities.(only if you need to show the folder level)
- There are limitations on the file/folder name length.(especially in non-ASCII characters)
Limitation
The maximum length for a file or directory name is limited, and in the worst case scenario where non-ASCII characters(Chinese) and alphabets are interlaced, it can not be longer than 33 characters. Even if all the characters are Chinese, the name can not exceed 62 characters in length.
Based on PostgreSQL documentation, the label path can not exceed 65535 labels. However, in most cases, this limit should be sufficient and it is unlikely that you would need to nest directories to such a deep level.
1 | select escape_filename_for_ltree( |
1 | [42622] ERROR: label string is too long Detail: Label length is 259, must be at most 255, at character 260. Where: PL/pgSQL function escape_filename_for_ltree(text) line 5 at SQL statement |
How to use
Build expression index
1 | CREATE INDEX idx_file_tree_filename ON files using gist (escape_filename_for_ltree(filename)); |
Example Query
1 | explain analyse |
Query Result
1 | ow/ros_00000000_2022-03-02-12-55-19_330.bag |
Query Explain
1 | Bitmap Heap Scan on files (cost=32.12..36.38 rows=1 width=28) (actual time=0.341..0.355 rows=8 loops=1) |
Explaination
PostgreSQL's LTREE data type allows you to use a sequence of alphanumeric characters and underscores on the label, with a maximum length of 256 characters. So, we get a special character underscore that can be used as a notation to build our escape rules within the label.
Slashes(/
) will be replaced with dots(.
). I think it does not require further explanation.
Initially, I attempted to encode all non-alphabetic characters into their Unicode hex format. However, after receiving advice from other guys, I discovered that using base64 encoding can be more efficient in terms of information entropy. Ultimately, I decided to use base62 encoding instead to ensure that no illegal characters are produced and to achieve the maximum possible information entropy.
This is the final representation of the physical data that will be stored in the index of PostgreSQL.
1 | select escape_filename_for_ltree('root/folder1/机器人仿真gazebo11-noetic集成ROS1/state.log'); |
Further
If you want to store an isolated file tree in the same table, one thing you need to do is prepend the isolation key as the first label of the ltree. For example:
1 | select escape_filename_for_ltree('<put_user_id_in_there>' || '/' || '<path_to_file>'); |
By doing this, you will get the best query performance.
Summary
This document explains how to implement a file tree in PostgreSQL using the ltree
datatype. The ltree
datatype can help build the hierarchy within the database, and an expression index needs to be created. There are some limitations on the file/folder name length, but the performance is excellent. The document also provides PostgreSQL functions for escaping and encoding file/folder names.
Appendix: PostgreSQL Functions
Entry function (immutable is required)
1 | CREATE OR REPLACE FUNCTION escape_filename_for_ltree(filename TEXT) |
Util: Escape every part (folder or file)
1 | create or replace function escape_part(part text) returns text as |
Util: Split a string into groups
Each group contains only alphabetic characters or non-alphabetic characters.
1 | CREATE OR REPLACE FUNCTION split_string_by_alpha(input_str TEXT) RETURNS SETOF TEXT AS |
Util: base62 encode function
By using the base62_encode function, we can create a string that meets the requirements of LTREE and achieves maximum information entropy.
1 | CREATE OR REPLACE FUNCTION base62_encode(data TEXT) RETURNS TEXT AS $$ |