A file tree is a hierarchical structure used to organize files and directories on a computer. It allows users to easily navigate and access their files and folders, and is commonly used in operating systems and file management software.
But implementing file trees in traditional RDBMS like MySQL can be a challenge due to the lack of support for hierarchical data structures. However, there are workarounds such as using nested sets or materialized path approaches. Alternatively, you could consider using NoSQL databases like MongoDB or document-oriented databases like Couchbase, which have built-in support for hierarchical data structures.
It is possible to implement a file tree in PostgreSQL using the
ltree datatype provided by PostgreSQL. This datatype can help us build the hierarchy within the database.
- Excellent performance!
- No migration is needed for this, as no new columns will be added. Only a new expression index needs to be created.
- Need additional mechanism to create virtual folder entities.(only if you need to show the folder level)
- There are limitations on the file/folder name length.(especially in non-ASCII characters)
The maximum length for a file or directory name is limited, and in the worst case scenario where non-ASCII characters(Chinese) and alphabets are interlaced, it can not be longer than 33 characters. Even if all the characters are Chinese, the name can not exceed 62 characters in length.
Based on PostgreSQL documentation, the label path can not exceed 65535 labels. However, in most cases, this limit should be sufficient and it is unlikely that you would need to nest directories to such a deep level.
 ERROR: label string is too long Detail: Label length is 259, must be at most 255, at character 260. Where: PL/pgSQL function escape_filename_for_ltree(text) line 5 at SQL statement
How to use
Build expression index
CREATE INDEX idx_file_tree_filename ON files using gist (escape_filename_for_ltree(filename));
Bitmap Heap Scan on files (cost=32.12..36.38 rows=1 width=28) (actual time=0.341..0.355 rows=8 loops=1)
PostgreSQL's LTREE data type allows you to use a sequence of alphanumeric characters and underscores on the label, with a maximum length of 256 characters. So, we get a special character underscore that can be used as a notation to build our escape rules within the label.
/) will be replaced with dots(
.). I think it does not require further explanation.
Initially, I attempted to encode all non-alphabetic characters into their Unicode hex format. However, after receiving advice from other guys, I discovered that using base64 encoding can be more efficient in terms of information entropy. Ultimately, I decided to use base62 encoding instead to ensure that no illegal characters are produced and to achieve the maximum possible information entropy.
This is the final representation of the physical data that will be stored in the index of PostgreSQL.
If you want to store an isolated file tree in the same table, one thing you need to do is prepend the isolation key as the first label of the ltree. For example:
select escape_filename_for_ltree('<put_user_id_in_there>' || '/' || '<path_to_file>');
By doing this, you will get the best query performance.
This document explains how to implement a file tree in PostgreSQL using the
ltree datatype. The
ltree datatype can help build the hierarchy within the database, and an expression index needs to be created. There are some limitations on the file/folder name length, but the performance is excellent. The document also provides PostgreSQL functions for escaping and encoding file/folder names.
Appendix: PostgreSQL Functions
Entry function (immutable is required)
CREATE OR REPLACE FUNCTION escape_filename_for_ltree(filename TEXT)
Util: Escape every part (folder or file)
create or replace function escape_part(part text) returns text as
Util: Split a string into groups
Each group contains only alphabetic characters or non-alphabetic characters.
CREATE OR REPLACE FUNCTION split_string_by_alpha(input_str TEXT) RETURNS SETOF TEXT AS
Util: base62 encode function
By using the base62_encode function, we can create a string that meets the requirements of LTREE and achieves maximum information entropy.
CREATE OR REPLACE FUNCTION base62_encode(data TEXT) RETURNS TEXT AS $$