Lidar2Forest

Overview

Lidar2Forest, whose full title is 3D Forest Vegetation Segmentation and Procedural Modeling from LiDAR Point Clouds and Spherical Images, was developed for a customer.

The goal was to build a configuration-driven pipeline that transforms raw LiDAR point cloud data of forests into structured 3D vegetation models, extracting per-tree metrics and producing Blender-ready scenes for visualization and analysis. The system takes colored point cloud data (.las/.laz files) as input and outputs classified point clouds, tree instance labels, structural parameter tables, optimized meshes, and rendered 3D images via Blender.

Because the timeline was extremely tight and resources were scarce, the pipeline was delivered as-is, without the broader cleanup and parameter fine-tuning that would improve output quality. For the same reason, many of the more ambitious features were left for future development.

Purpose and Motivation

Forestry robotics and environmental monitoring require accurate, interpretable representations of complex vegetation. Raw LiDAR point clouds contain rich geometric information but are difficult to use directly for mapping, measurement, simulation, or high-quality visualization. Lidar2Forest bridges this gap by turning point clouds into structured vegetation entities (tree instances with measured parameters) and procedural 3D models (parametric trees, optimized meshes, and Blender scenes).

The system was designed to prioritize integration over reimplementation, leveraging mature open-source libraries such as PDAL, Open3D, scikit-learn, and PyTorch. The pipeline is fully configuration-driven with YAML profiles and Pydantic v2 validation for reproducibility. It provides a CPU-only baseline alongside optional GPU acceleration for practicality, and uses a registry pattern so new algorithms can be added without modifying pipeline code.
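The registry pattern mentioned above can be sketched as follows. This is a minimal illustration, not the project's actual code: the names `SEGMENTERS`, `register_segmenter`, `run_segmentation`, and the `segmentation.method` config key are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float, float]

# Hypothetical registry: maps an algorithm name (as it might appear in a
# YAML profile) to the function that implements it.
SEGMENTERS: Dict[str, Callable[[List[Point]], List[int]]] = {}


def register_segmenter(name: str):
    """Decorator that records a segmentation function under `name`."""
    def decorator(func):
        if name in SEGMENTERS:
            raise ValueError(f"segmenter {name!r} is already registered")
        SEGMENTERS[name] = func
        return func
    return decorator


@register_segmenter("single_cluster")
def single_cluster(points: List[Point]) -> List[int]:
    """Trivial placeholder algorithm: put every point in tree 0."""
    return [0 for _ in points]


def run_segmentation(config: dict, points: List[Point]) -> List[int]:
    """Look up the algorithm named in the config and run it."""
    method = config["segmentation"]["method"]
    try:
        segmenter = SEGMENTERS[method]
    except KeyError:
        known = ", ".join(sorted(SEGMENTERS))
        raise ValueError(f"unknown segmenter {method!r}; known: {known}") from None
    return segmenter(points)


labels = run_segmentation(
    {"segmentation": {"method": "single_cluster"}},
    [(0.0, 0.0, 1.0), (0.1, 0.0, 2.0)],
)
```

With this layout, adding an algorithm means writing one decorated function; `run_segmentation` itself does not change.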

What Was Done

Pipeline Stages

The implemented pipeline consists of eight stages, each configurable and independently toggleable.

Preprocessing removes statistical outliers via PDAL, preserves RGB attributes, and validates point cloud quality.

Ground Classification separates ground from vegetation using the Cloth Simulation Filter (CSF), producing a DTM and height-above-ground values.
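As a rough sketch, the first two stages could be expressed as a single PDAL pipeline definition. The stage types (`filters.outlier`, `filters.range`, `filters.csf`, `filters.hag_nn`, `writers.las`) are standard PDAL stages, but the file names and option values below are illustrative assumptions, not the project's tuned settings.

```python
import json

# Illustrative pipeline definition; paths and parameter values are assumptions.
pipeline = {
    "pipeline": [
        "input.laz",  # hypothetical input file
        # Flag statistical outliers as noise (Classification 7).
        {"type": "filters.outlier", "method": "statistical",
         "mean_k": 8, "multiplier": 2.0},
        # Drop the points that were flagged as noise.
        {"type": "filters.range", "limits": "Classification![7:7]"},
        # Cloth Simulation Filter: marks ground points as Classification 2.
        {"type": "filters.csf", "resolution": 0.5, "threshold": 0.25},
        # Height above ground relative to the nearest ground points.
        {"type": "filters.hag_nn"},
        # Keep the extra HeightAboveGround dimension in the output.
        {"type": "writers.las", "filename": "classified.las",
         "extra_dims": "all"},
    ]
}

pipeline_json = json.dumps(pipeline, indent=2)
stage_types = [s["type"] for s in pipeline["pipeline"] if isinstance(s, dict)]
```

With the PDAL Python bindings installed, a definition like this would typically be run with `pdal.Pipeline(pipeline_json).execute()`. RGB values pass through these filters unchanged.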

Tree Segmentation detects individual tree instances via HDBSCAN clustering (classical, CPU-only) or TreeLearn deep learning (experimental, GPU). A novel Vertical HDBSCAN variant was developed that clusters stem-layer points first and assigns canopy points to the nearest stems, handling dense overlapping canopies better than standard 2D clustering.
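The stem-first idea can be illustrated with a small, dependency-free sketch. It is not the project's implementation: a simple distance-threshold flood fill stands in for HDBSCAN, and the stem band (1 to 2 m above ground) and linking distance are assumed values.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # x, y, height above ground (m)


def cluster_stems(xy: List[Tuple[float, float]], link_dist: float) -> List[int]:
    """Flood-fill clustering: points within link_dist share a label.

    This is a stand-in for HDBSCAN, used only to keep the sketch short.
    """
    labels = [-1] * len(xy)
    current = 0
    for seed in range(len(xy)):
        if labels[seed] != -1:
            continue
        labels[seed] = current
        stack = [seed]
        while stack:
            i = stack.pop()
            for j in range(len(xy)):
                if labels[j] == -1 and math.dist(xy[i], xy[j]) <= link_dist:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def segment_trees(points: List[Point], stem_band=(1.0, 2.0),
                  link_dist: float = 0.5) -> List[int]:
    """Cluster the stem layer first, then attach all other points."""
    stem_idx = [i for i, p in enumerate(points)
                if stem_band[0] <= p[2] <= stem_band[1]]
    stem_labels = cluster_stems([points[i][:2] for i in stem_idx], link_dist)

    # Horizontal centroid of each stem cluster.
    sums: Dict[int, Tuple[float, float, int]] = {}
    for i, lab in zip(stem_idx, stem_labels):
        sx, sy, n = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + points[i][0], sy + points[i][1], n + 1)
    centers = {lab: (sx / n, sy / n) for lab, (sx, sy, n) in sums.items()}

    labels = [-1] * len(points)
    for i, lab in zip(stem_idx, stem_labels):
        labels[i] = lab
    if not centers:
        return labels  # no stems found: everything stays unlabeled
    # Remaining points (canopy and below the band) go to the nearest stem.
    for i, (x, y, _) in enumerate(points):
        if labels[i] == -1:
            labels[i] = min(centers,
                            key=lambda lab: math.dist((x, y), centers[lab]))
    return labels


demo_points = [
    (0.0, 0.0, 1.5), (0.1, 0.0, 1.2),   # stem of tree A
    (5.0, 0.0, 1.5), (5.1, 0.0, 1.8),   # stem of tree B
    (1.0, 0.0, 8.0), (4.0, 0.5, 9.0),   # canopy points
]
tree_labels = segment_trees(demo_points)
```

Because canopy points never take part in the clustering step, touching crowns cannot merge two trees into one cluster; they are only split between stems afterwards.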

Species Classification provides an experimental Random Forest baseline for per-tree species labeling. It was implemented but not validated within the project timeframe, with PointMLP integration planned as future work.

Parameter Extraction computes comprehensive per-tree metrics: DBH via RANSAC cylinder fitting, crown metrics via 3D alpha-shapes, wood-leaf separation, and biomass estimation with over 20 allometric equations. Optional TreeQSM integration provides additional structural detail.
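To illustrate the RANSAC idea behind the DBH estimate, the sketch below fits a 2D circle to a thin horizontal slice of stem points taken at breast height. This is a simplification of the cylinder fitting named above, and the tolerance, iteration count, and function names are assumptions.

```python
import math
import random
from typing import List, Optional, Tuple

XY = Tuple[float, float]


def circle_from_3(p1: XY, p2: XY, p3: XY) -> Optional[Tuple[float, float, float]]:
    """Circle (cx, cy, r) through three points, or None if nearly collinear."""
    ax, ay = p1
    bx, by = p2
    cx, cy = p3
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, math.hypot(ax - ux, ay - uy)


def ransac_dbh(slice_xy: List[XY], tol: float = 0.01,
               iterations: int = 200, seed: int = 0) -> float:
    """Return the stem diameter (m) from the best-supported circle."""
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(iterations):
        circle = circle_from_3(*rng.sample(slice_xy, 3))
        if circle is None:
            continue
        ux, uy, r = circle
        # Count points whose distance to the circle is within tolerance.
        inliers = sum(1 for x, y in slice_xy
                      if abs(math.hypot(x - ux, y - uy) - r) <= tol)
        if inliers > best_inliers:
            best, best_inliers = circle, inliers
    if best is None:
        raise ValueError("no valid circle found")
    return 2.0 * best[2]


# Synthetic slice: a stem of radius 0.15 m centred at (2, 3) plus outliers.
slice_xy = [(2.0 + 0.15 * math.cos(math.radians(a)),
             3.0 + 0.15 * math.sin(math.radians(a)))
            for a in range(0, 360, 10)]
slice_xy += [(2.5, 3.4), (1.2, 3.0), (2.0, 4.0), (3.0, 2.0)]
dbh = ransac_dbh(slice_xy)
```

On real scans the slice contains noise and partial occlusion, so a least-squares refinement over the inliers would normally follow the RANSAC step.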

Mesh Generation handles procedural tree generation using three algorithms (Sapling/Weber & Penn, Space Colonization, and L-System grammars), terrain meshing via Poisson surface reconstruction, RGB color transfer, UV coordinate generation, mesh optimization with quadric decimation and Taubin smoothing, and LOD hierarchy generation.
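Of the three procedural approaches, L-System grammars are the simplest to show in a few lines. The sketch below only performs the string rewriting; the branching rule is a made-up example, not one of the project's grammars.

```python
from typing import Dict


def expand_lsystem(axiom: str, rules: Dict[str, str], iterations: int) -> str:
    """Apply the rewriting rules to every symbol, `iterations` times."""
    state = axiom
    for _ in range(iterations):
        # Symbols without a rule (brackets, turns) are copied unchanged.
        state = "".join(rules.get(symbol, symbol) for symbol in state)
    return state


# Hypothetical rule: each segment F grows a left and a right side branch.
# '[' and ']' push and pop the turtle state; '+' and '-' turn it.
rules = {"F": "F[+F][-F]"}

generation_1 = expand_lsystem("F", rules, 1)
generation_2 = expand_lsystem("F", rules, 2)
segment_count = generation_2.count("F")
```

A turtle interpreter would then walk the resulting string and emit one branch segment per `F`, which is where the mesh geometry comes from.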

Blender Scene Assembly automates scene construction with hierarchical collections, Geometry Nodes-based instancing (143x memory reduction), PBR materials with a species texture library, HDRI lighting, and camera configuration.

Blender Rendering provides optional high-quality image rendering with configurable camera presets (11 presets including orbital, bird’s eye, ground level, and cinematic).
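An orbital preset can be thought of as a ring of camera positions around the scene center. The following sketch computes such positions with plain trigonometry. The function name and parameters are assumptions, and no Blender API is used here.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def orbital_positions(center: Vec3, radius: float, height: float,
                      frames: int) -> List[Vec3]:
    """Evenly spaced camera positions on a horizontal circle."""
    cx, cy, cz = center
    positions = []
    for k in range(frames):
        angle = 2.0 * math.pi * k / frames
        positions.append((cx + radius * math.cos(angle),
                          cy + radius * math.sin(angle),
                          cz + height))
    return positions


cameras = orbital_positions(center=(10.0, 20.0, 0.0), radius=30.0,
                            height=15.0, frames=8)
```

In a Blender script each position would be assigned to a camera object that is constrained to look at the scene center.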

TreeQSM Python Port

A companion subproject (treeqsm-py) ported the original MATLAB TreeQSM implementation to Python. TreeQSM reconstructs hierarchical cylinder-based Quantitative Structure Models from point cloud data, providing detailed branch topology, volume estimates, and structural metrics. The Python port eliminates the MATLAB license requirement and enables seamless integration with the lidar2forest pipeline. The port was primarily done with LLM assistance and includes fixes and guardrails specific to the lidar2forest use case.

CLI and Configuration System

A complete CLI was built with 7 commands (process, validate, validate-input, info, batch, list-profiles, show-config) and 4 configuration profiles:

Profile	Use Case	GPU Required	Rendering
cpu-only	Quick testing	No	Disabled
production	Balanced deployment	Optional	Disabled
research	Maximum accuracy	Yes (9GB+)	Enabled
visualization	High-quality 3D	Yes (8GB+)	Enabled (4K)

Configuration follows a hierarchical merge: Default, Profile, User config, then CLI overrides.
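The merge order can be sketched as a recursive dictionary merge in which later layers win. The keys and values below are hypothetical, and the real pipeline validates the result with Pydantic, which is omitted here.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` applied recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_config(defaults: dict, profile: dict, user: dict, cli: dict) -> dict:
    """Apply layers in order: defaults, profile, user config, CLI overrides."""
    config = defaults
    for layer in (profile, user, cli):
        config = deep_merge(config, layer)
    return config


# Hypothetical layers for illustration only.
defaults = {"segmentation": {"method": "hdbscan", "min_cluster_size": 50},
            "rendering": {"enabled": False}}
profile = {"rendering": {"enabled": True, "resolution": "4k"}}
user = {"segmentation": {"min_cluster_size": 80}}
cli = {"rendering": {"enabled": False}}

config = resolve_config(defaults, profile, user, cli)
```

Here the CLI layer turns rendering back off, while the resolution set by the profile and the cluster size set by the user both survive.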

Implementation Scale

The final codebase comprises approximately 30,000 lines of code, excluding external libraries and the test suite, which adds roughly 15,600 lines:

Subsystem	Lines of Code
Core/CLI	~5,800
Segmentation	~2,700
Species Classification (experimental)	~700
Parameter Extraction	~6,200
Meshing (procedural + terrain + optimization)	~9,200
Blender Integration	~5,500
Test Suite	~15,600

Key Technical Achievements

Geometry Nodes-based instancing reduced Blender scene memory from roughly 50 GB to roughly 350 MB for 1000-tree scenes. Parallel processing yielded a 6x speedup for parameter extraction and 5-6x for meshing on 8 cores, bringing total processing time from around 57 minutes down to around 22 minutes for 100 trees. The pipeline also includes automatic detection and correction of blue-shifted LiDAR color data. The Vertical HDBSCAN segmentation approach preserves foliage in dense forests with overlapping canopies where standard methods struggle.
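The headline figures can be cross-checked with a few lines of arithmetic. The last step is an Amdahl's-law estimate under an assumed uniform 6x speedup for the parallelized stages, so it is a rough inference rather than a measured value.

```python
# Memory reduction, using decimal units (1 GB = 1000 MB).
memory_ratio = (50 * 1000) / 350          # about 143x

# End-to-end speedup for the 100-tree run.
overall_speedup = 57 / 22                 # about 2.6x

# Amdahl's law: overall = 1 / ((1 - p) + p / s).
# Solving for p with an assumed stage speedup s = 6 gives the share of the
# original runtime that would have to sit in the parallelized stages.
s = 6
parallel_fraction = (1 - 1 / overall_speedup) * s / (s - 1)   # about 0.74
```

The overall gain is smaller than the per-stage 5-6x because stages that were not parallelized still run at their original speed.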

Project Timeline

The project ran for approximately 10 weeks, from mid-October to late December 2025.

Period	Milestone
Oct 15 to Oct 25	Project plan, technology selection, state-of-the-art research
Oct 26 to Nov 15	Ground classification (PDAL CSF) and tree segmentation (HDBSCAN)
Nov 16 to Nov 30	Parameter extraction, TreeQSM integration, metric exports
Nov 28 to Dec 15	Visualization pipeline: procedural meshing, mesh optimization, LOD, Blender assembly and rendering
Dec 16 to Dec 23	Reporting, documentation, demoing

Development Methodology

The project followed an Agile/Scrum-inspired methodology adapted for a small academic team. Work was organized into weekly sprints running Monday to Sunday, with customer meetings every Monday for sprint review and planning. A working end-to-end pipeline was prioritized early, with functionality added incrementally. Version control used Git with conventional commit messages on the main branch, and the project included unit tests, integration tests, and CLI-based validation.

Known Limitations and Future Work

Several features were implemented but not fully validated due to resource constraints. The TreeLearn deep learning segmentation includes a mock mode but was not tested with real pre-trained models due to hardware limitations. The species classification Random Forest baseline is present but has not been validated on independent datasets, with PointMLP integration planned. LiDAR-image fusion for spherical images remains optional future work. The procedural models do not yet accurately reproduce the branch topology observed in the point cloud, and procedural foliage does not yet closely match real-world tree appearance.

Recommended future directions include improving foliage and branch accuracy, validating TreeLearn on NVIDIA hardware, implementing PointMLP species classification, adding ForestFormer3D segmentation, field validation with ground truth measurements, and seasonal comparison capabilities.

Technology Stack

Core: Python 3.11, PDAL, Open3D, NumPy, SciPy, scikit-learn, Click, Pydantic v2

Deep Learning (optional): PyTorch, TreeLearn, MinkowskiEngine

Visualization: Blender 4.0+ (Python API), Trimesh

Mesh Processing: Open3D, PyMeshLab

QSM: treeqsm-py (custom Python port of MATLAB TreeQSM)

Research Foundation

The project integrates methods from published research. Ground classification uses PDAL CSF (Zhang et al. 2016) with 90-98% accuracy. Tree segmentation relies on HDBSCAN (McInnes et al. 2017) achieving 85-93% F1, alongside TreeLearn (Henrich et al. 2024) with 91.8% coverage. TreeQSM follows Raumonen et al. 2013 for hierarchical cylinder fitting. Procedural generation draws on Weber & Penn 1995 (Sapling), Runions et al. 2007 (Space Colonization), and Prusinkiewicz & Lindenmayer 1990 (L-Systems). Mesh optimization uses Garland & Heckbert 1997 for quadric decimation and Taubin 1995 for smoothing, while surface reconstruction follows Kazhdan et al. 2006 and Kazhdan & Hoppe 2013 for Poisson reconstruction.