Overview
Lidar2Forest, whose full title is 3D Forest Vegetation Segmentation and Procedural Modeling from LiDAR Point Clouds and Spherical Images, was developed for a customer.
The goal was to build a configuration-driven pipeline that transforms raw LiDAR point cloud data of forests into structured 3D vegetation models, extracting per-tree metrics and producing Blender-ready scenes for visualization and analysis. The system takes colored point cloud data (.las/.laz files) as input and outputs classified point clouds, tree instance labels, structural parameter tables, optimized meshes, and rendered 3D images via Blender.
Because the timeline was strict and resources were scarce, the pipeline was delivered as-is, without the broader cleanup and parameter fine-tuning that would have improved output quality. For the same reason, many of the more ambitious features were left for future development.
Purpose and Motivation
Forestry robotics and environmental monitoring require accurate, interpretable representations of complex vegetation. Raw LiDAR point clouds contain rich geometric information but are difficult to use directly for mapping, measurement, simulation, or high-quality visualization. Lidar2Forest bridges this gap by turning point clouds into structured vegetation entities (tree instances with measured parameters) and procedural 3D models (parametric trees, optimized meshes, and Blender scenes).
The system was designed to prioritize integration over reimplementation, leveraging mature open-source libraries such as PDAL, Open3D, scikit-learn, and PyTorch. The pipeline is fully configuration-driven with YAML profiles and Pydantic v2 validation for reproducibility. It provides a CPU-only baseline alongside optional GPU acceleration for practicality, and uses a registry pattern so new algorithms can be added without modifying pipeline code.
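The registry pattern mentioned above can be illustrated with a minimal sketch. The names here (`SEGMENTERS`, `register_segmenter`, `build_segmenter`, and the two placeholder classes) are hypothetical and are not taken from the Lidar2Forest codebase; they only show how a configuration value could select an algorithm without the pipeline code naming it directly.

```python
from typing import Callable, Dict

# Hypothetical registry: algorithm name (as it might appear in a YAML profile) -> class.
SEGMENTERS: Dict[str, Callable[..., object]] = {}


def register_segmenter(name: str):
    """Class decorator that records a segmenter under a configuration name."""
    def decorator(cls):
        if name in SEGMENTERS:
            raise ValueError(f"segmenter {name!r} already registered")
        SEGMENTERS[name] = cls
        return cls
    return decorator


@register_segmenter("hdbscan")
class HdbscanSegmenter:  # placeholder, no real clustering here
    def __init__(self, min_cluster_size: int = 50):
        self.min_cluster_size = min_cluster_size


@register_segmenter("treelearn")
class TreeLearnSegmenter:  # placeholder, no real model here
    def __init__(self, checkpoint: str = ""):
        self.checkpoint = checkpoint


def build_segmenter(config: dict):
    """Instantiate the segmenter named by config['method'] with the remaining keys."""
    params = dict(config)
    method = params.pop("method")
    try:
        cls = SEGMENTERS[method]
    except KeyError:
        raise ValueError(
            f"unknown segmenter {method!r}; available: {sorted(SEGMENTERS)}"
        ) from None
    return cls(**params)
```

With this shape, adding a new algorithm means writing one decorated class; `build_segmenter({"method": "hdbscan", "min_cluster_size": 80})` then resolves it by name.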
What Was Done
Pipeline Stages
The implemented pipeline consists of eight stages, each configurable and independently toggleable.
Preprocessing removes statistical outliers via PDAL, preserves RGB attributes, and validates point cloud quality.
Ground Classification separates ground from vegetation using the Cloth Simulation Filter (CSF), producing a DTM and height-above-ground values.
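The first two stages map naturally onto a PDAL pipeline. The sketch below only builds the pipeline JSON; it does not run PDAL. The stage names (`filters.outlier`, `filters.csf`, `filters.hag_nn`) are standard PDAL stages, but the function name, file names, and all parameter values are illustrative assumptions rather than the project's actual settings.

```python
import json


def build_ground_pipeline(src: str, dst: str) -> str:
    """Return a PDAL pipeline (JSON text) for outlier flagging, CSF ground
    classification, and height-above-ground computation.

    All numeric values are placeholders chosen for illustration only.
    """
    stages = [
        {"type": "readers.las", "filename": src},
        # Flags statistical outliers as noise (class 7); other attributes such
        # as RGB are left untouched.
        {"type": "filters.outlier", "method": "statistical",
         "mean_k": 8, "multiplier": 2.5},
        # Cloth Simulation Filter: marks ground points as class 2 and skips
        # the points flagged as noise above.
        {"type": "filters.csf", "ignore": "Classification[7:7]",
         "resolution": 0.5, "threshold": 0.5},
        # Adds a HeightAboveGround dimension based on nearby ground points.
        {"type": "filters.hag_nn"},
        # Keep the extra dimension when writing the result.
        {"type": "writers.las", "filename": dst,
         "minor_version": 4, "extra_dims": "all"},
    ]
    return json.dumps({"pipeline": stages}, indent=2)


if __name__ == "__main__":
    print(build_ground_pipeline("plot.laz", "plot_ground.laz"))
```

The resulting JSON could be saved to a file and run with the `pdal pipeline` command, or passed to the PDAL Python bindings, assuming PDAL is installed.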
Tree Segmentation detects individual tree instances via HDBSCAN clustering (classical, CPU-only) or TreeLearn deep learning (experimental, GPU). A novel Vertical HDBSCAN variant was developed that clusters stem-layer points first and assigns canopy points to the nearest stems, handling dense overlapping canopies better than standard 2D clustering.
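The second half of the Vertical HDBSCAN idea, assigning canopy points to stems, can be sketched as follows. This is a simplified illustration and not the project's implementation: it assumes the stem-layer points have already been clustered (for example with HDBSCAN) and that "nearest stem" means the nearest stem centroid in the horizontal plane. The function name and arguments are hypothetical.

```python
import numpy as np


def assign_canopy_to_stems(xyz, stem_mask, stem_labels):
    """Propagate stem cluster labels to all remaining points.

    xyz         : (N, 3) array, z taken as height above ground
    stem_mask   : (N,) boolean array selecting the stem-layer points
    stem_labels : labels for the stem-layer points only (-1 = noise),
                  e.g. the output of a clustering step run on xyz[stem_mask]
    Returns an (N,) integer array of tree ids (-1 where nothing is assigned).
    """
    xyz = np.asarray(xyz, dtype=float)
    stem_mask = np.asarray(stem_mask, dtype=bool)
    stem_labels = np.asarray(stem_labels, dtype=int)

    labels = np.full(len(xyz), -1, dtype=int)
    labels[stem_mask] = stem_labels

    tree_ids = np.unique(stem_labels[stem_labels >= 0])
    if tree_ids.size == 0:
        return labels  # no stems found, nothing to propagate

    # One XY centroid per detected stem.
    stem_xy = xyz[stem_mask][:, :2]
    centroids = np.array([stem_xy[stem_labels == t].mean(axis=0) for t in tree_ids])

    # Brute-force nearest centroid in XY; a KD-tree would scale better.
    rest = ~stem_mask
    dist = np.linalg.norm(xyz[rest][:, None, :2] - centroids[None, :, :], axis=2)
    labels[rest] = tree_ids[dist.argmin(axis=1)]
    return labels
```

Because canopy points inherit a label from the stem layer instead of being clustered on their own, overlapping crowns do not merge two trees into one instance, which is the behavior the paragraph above describes.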
Species Classification provides an experimental Random Forest baseline for per-tree species labeling. It was implemented but not validated within the project timeframe, with PointMLP integration planned as future work.
Parameter Extraction computes comprehensive per-tree metrics: DBH via RANSAC cylinder fitting, crown metrics via 3D alpha-shapes, wood-leaf separation, and biomass estimation with over 20 allometric equations. Optional TreeQSM integration provides additional structural detail.
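To make the DBH step concrete, here is a deliberately simplified sketch. The pipeline fits cylinders; this example instead fits a circle with RANSAC to the XY coordinates of a thin stem slice around breast height, which is only equivalent for a vertical stem. The function names, tolerance, and iteration count are assumptions for illustration.

```python
import numpy as np


def circle_from_3_points(p):
    """Return (cx, cy, r) of the circle through three 2D points, or None if collinear."""
    (x1, y1), (x2, y2), (x3, y3) = p
    a = np.array([[x2 - x1, y2 - y1], [x3 - x1, y3 - y1]], dtype=float)
    b = 0.5 * np.array([
        x2**2 - x1**2 + y2**2 - y1**2,
        x3**2 - x1**2 + y3**2 - y1**2,
    ])
    if abs(np.linalg.det(a)) < 1e-12:
        return None
    cx, cy = np.linalg.solve(a, b)
    return float(cx), float(cy), float(np.hypot(x1 - cx, y1 - cy))


def ransac_dbh(xy, n_iter=200, tol=0.01, seed=0):
    """Estimate stem diameter (same units as xy) from a breast-height slice.

    xy  : (N, 2) array of slice points projected onto the horizontal plane
    tol : inlier distance to the circle, in the same units as xy
    Returns (diameter, inlier_count) of the best circle found.
    """
    xy = np.asarray(xy, dtype=float)
    rng = np.random.default_rng(seed)
    best, best_inliers = None, -1
    for _ in range(n_iter):
        sample = xy[rng.choice(len(xy), size=3, replace=False)]
        fit = circle_from_3_points(sample)
        if fit is None:
            continue
        cx, cy, r = fit
        resid = np.abs(np.hypot(xy[:, 0] - cx, xy[:, 1] - cy) - r)
        n_in = int((resid < tol).sum())
        if n_in > best_inliers:
            best, best_inliers = fit, n_in
    if best is None:
        raise ValueError("no valid circle found")
    return 2.0 * best[2], best_inliers
```

A production version would typically refit the model to all inliers (for example by least squares) and reject implausible radii; both steps are omitted here for brevity.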
Mesh Generation handles procedural tree generation using three algorithms (Sapling/Weber & Penn, Space Colonization, and L-System grammars), terrain meshing via Poisson surface reconstruction, RGB color transfer, UV coordinate generation, mesh optimization with quadric decimation and Taubin smoothing, and LOD hierarchy generation.
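Of the three procedural approaches, L-System grammars are the easiest to show in a few lines. The sketch below expands a bracketed L-system string and interprets it with a simple 2D turtle to obtain branch segments. The grammar, angle, and function names are illustrative assumptions and do not reflect the rules used in the project, which generates 3D geometry.

```python
import math


def expand_lsystem(axiom, rules, iterations):
    """Apply the rewriting rules to every symbol, `iterations` times."""
    s = axiom
    for _ in range(iterations):
        s = "".join(rules.get(ch, ch) for ch in s)
    return s


def turtle_segments(commands, angle_deg=25.0, step=1.0):
    """Interpret an L-system string as 2D line segments.

    F = draw forward, + / - = turn left / right, [ / ] = push / pop state.
    Any other symbol (such as X) is ignored by the turtle.
    """
    x, y, heading = 0.0, 0.0, 90.0  # start at the origin, pointing up
    stack, segments = [], []
    for ch in commands:
        if ch == "F":
            nx = x + step * math.cos(math.radians(heading))
            ny = y + step * math.sin(math.radians(heading))
            segments.append(((x, y), (nx, ny)))
            x, y = nx, ny
        elif ch == "+":
            heading += angle_deg
        elif ch == "-":
            heading -= angle_deg
        elif ch == "[":
            stack.append((x, y, heading))
        elif ch == "]":
            x, y, heading = stack.pop()
    return segments


if __name__ == "__main__":
    rules = {"X": "F[+X][-X]", "F": "FF"}
    tree = expand_lsystem("X", rules, 2)
    print(tree)                        # FF[+F[+X][-X]][-F[+X][-X]]
    print(len(turtle_segments(tree)))  # 4
```

Each segment could then be swept into a tapered cylinder to form branch geometry; that meshing step is outside the scope of this sketch.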
Blender Scene Assembly automates scene construction with hierarchical collections, Geometry Nodes-based instancing (143x memory reduction), PBR materials with a species texture library, HDRI lighting, and camera configuration.
Blender Rendering provides optional high-quality image rendering with configurable camera presets (11 presets including orbital, bird’s eye, ground level, and cinematic).
TreeQSM Python Port
A companion subproject (treeqsm-py) ported the original MATLAB TreeQSM implementation to Python. TreeQSM reconstructs hierarchical cylinder-based Quantitative Structure Models from point cloud data, providing detailed branch topology, volume estimates, and structural metrics. The Python port eliminates the MATLAB license requirement and enables seamless integration with the lidar2forest pipeline. The port was primarily done with LLM assistance and includes fixes and guardrails specific to the lidar2forest use case.
CLI and Configuration System
A complete CLI was built with 7 commands (process, validate, validate-input, info, batch, list-profiles, show-config) and 4 configuration profiles:
| Profile | Use Case | GPU Required | Rendering |
|---|---|---|---|
| cpu-only | Quick testing | No | Disabled |
| production | Balanced deployment | Optional | Disabled |
| research | Maximum accuracy | Yes (9GB+) | Enabled |
| visualization | High-quality 3D | Yes (8GB+) | Enabled (4K) |
Configuration follows a hierarchical merge: Default, Profile, User config, then CLI overrides.
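A hierarchical merge of this kind is usually implemented as a recursive dictionary merge in which later layers override earlier ones. The sketch below assumes each layer has already been loaded into a plain `dict` (for example from YAML); the function names and the sample keys are hypothetical.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict where `override` wins; nested dicts merge key by key."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_config(defaults, profile, user, cli):
    """Apply the layers in precedence order: defaults < profile < user < CLI."""
    config = defaults
    for layer in (profile, user, cli):
        config = deep_merge(config, layer)
    return config


if __name__ == "__main__":
    defaults = {"segmentation": {"method": "hdbscan", "min_cluster_size": 50},
                "rendering": {"enabled": False}}
    profile = {"rendering": {"enabled": True, "resolution": "4K"}}
    user = {"segmentation": {"min_cluster_size": 80}}
    cli = {"rendering": {"enabled": False}}
    print(resolve_config(defaults, profile, user, cli))
```

In this example the CLI layer disables rendering again while the profile's `resolution` value and the user's `min_cluster_size` survive, because only the keys a layer actually sets are overridden.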
Implementation Scale
The final codebase comprises approximately 30,000 lines of code (excluding external libraries and the test suite), plus roughly 15,600 lines of tests:
| Subsystem | Lines of Code |
|---|---|
| Core/CLI | ~5,800 |
| Segmentation | ~2,700 |
| Species Classification (experimental) | ~700 |
| Parameter Extraction | ~6,200 |
| Meshing (procedural + terrain + optimization) | ~9,200 |
| Blender Integration | ~5,500 |
| Test Suite | ~15,600 |
Key Technical Achievements
Geometry Nodes-based instancing reduced Blender scene memory from roughly 50 GB to roughly 350 MB for 1000-tree scenes. Parallel processing yielded a 6x speedup for parameter extraction and 5-6x for meshing on 8 cores, bringing total processing time from around 57 minutes down to around 22 minutes for 100 trees. The pipeline also includes automatic detection and correction of blue-shifted LiDAR color data. The Vertical HDBSCAN segmentation approach preserves foliage in dense forests with overlapping canopies where standard methods struggle.
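The figures above can be cross-checked with a few lines of arithmetic. The memory ratio follows directly from the two quoted sizes. The second calculation applies Amdahl's law under a strong simplifying assumption: that all parallelized work sped up by a uniform 6x and the rest did not change. The resulting fraction is therefore an estimate implied by the quoted numbers, not a measured value, and the function names are hypothetical.

```python
def reduction_factor(before, after):
    """How many times smaller `after` is than `before` (same units)."""
    return before / after


def implied_parallel_fraction(t_before, t_after, stage_speedup):
    """Solve Amdahl's law for the parallelized share p of the original runtime.

    t_after / t_before = (1 - p) + p / s   =>   p = (1 - t_after / t_before) / (1 - 1 / s)
    """
    return (1.0 - t_after / t_before) / (1.0 - 1.0 / stage_speedup)


if __name__ == "__main__":
    # 50 GB (taken as 50,000 MB) down to 350 MB.
    print(round(reduction_factor(50_000, 350)))
    # 57 min down to 22 min overall.
    print(round(reduction_factor(57, 22), 1))
    # Share of the original runtime that must have been parallelized at 6x.
    print(round(implied_parallel_fraction(57, 22, 6), 2))
```

Read this way, the 143x memory figure matches the quoted sizes, and an overall speedup of about 2.6x is consistent with roughly three quarters of the original runtime having been spent in the parallelized stages.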
Project Timeline
The project ran for approximately 10 weeks, from mid-October to late December 2025.
| Period | Milestone |
|---|---|
| Oct 15 to Oct 25 | Project plan, technology selection, state-of-the-art research |
| Oct 26 to Nov 15 | Ground classification (PDAL CSF) and tree segmentation (HDBSCAN) |
| Nov 16 to Nov 30 | Parameter extraction, TreeQSM integration, metric exports |
| Nov 28 to Dec 15 | Visualization pipeline: procedural meshing, mesh optimization, LOD, Blender assembly and rendering |
| Dec 16 to Dec 23 | Reporting, documentation, demoing |
Development Methodology
The project followed an Agile/Scrum-inspired methodology adapted for a small academic team. Work was organized into weekly sprints running Monday to Sunday, with customer meetings every Monday for sprint review and planning. A working end-to-end pipeline was prioritized early, with functionality added incrementally. Version control used Git with conventional commit messages on the main branch, and the project included unit tests, integration tests, and CLI-based validation.
Known Limitations and Future Work
Several features were implemented but not fully validated due to resource constraints. The TreeLearn deep learning segmentation includes a mock mode but was not tested with real pre-trained models due to hardware limitations. The species classification Random Forest baseline is present but not validated on independent datasets, with PointMLP integration planned. LiDAR-image fusion for spherical images remains optional future work. Tree branch topology is not copied accurately from point cloud data to procedural models, and procedural foliage does not yet closely match real-world tree appearance.
Recommended future directions include improving foliage and branch accuracy, validating TreeLearn on NVIDIA hardware, implementing PointMLP species classification, adding ForestFormer3D segmentation, performing field validation against ground-truth measurements, and adding seasonal comparison capabilities.
Technology Stack
Core: Python 3.11, PDAL, Open3D, NumPy, SciPy, scikit-learn, Click, Pydantic v2
Deep Learning (optional): PyTorch, TreeLearn, MinkowskiEngine
Visualization: Blender 4.0+ (Python API), Trimesh
Mesh Processing: Open3D, PyMeshLab
QSM: treeqsm-py (custom Python port of MATLAB TreeQSM)
Research Foundation
The project integrates methods from published research. Ground classification uses PDAL CSF (Zhang et al. 2016) with 90-98% accuracy. Tree segmentation relies on HDBSCAN (McInnes et al. 2017) achieving 85-93% F1, alongside TreeLearn (Ecker et al. 2024) with 91.8% coverage. TreeQSM follows Raumonen et al. 2013 for hierarchical cylinder fitting. Procedural generation draws on Weber & Penn 1995 (Sapling), Runions et al. 2007 (Space Colonization), and Prusinkiewicz & Lindenmayer 1990 (L-Systems). Mesh optimization uses Garland & Heckbert 1997 for quadric decimation and Taubin 1995 for smoothing, while surface reconstruction follows Kazhdan et al. 2006 and 2013 for Poisson reconstruction.