A Study of Large Language Model Integration in Modern CAD Systems and Industrial Design
Architectural limitations of transformers, geometric & physical hallucinations, industry-specific failures in structural steel and furniture design, economic consequences, and strategies for overcoming barriers
From fundamental architecture barriers to industrial solutions and engineering recommendations
The state of engineering design by March 2026 is characterized by deep integration of large language models into computer-aided design and building information modeling workflows. Models such as GPT-5, Gemini 2.5 Pro, and Claude 4.5 Sonnet demonstrate unprecedented capabilities in code generation and natural language processing. However, despite this progress, the industry continues to face a fundamental gap between models’ ability to generate syntactically correct code and their ability to comprehend the physical reality of the products being designed.
Designing real-world structures — whether structural steel for buildings and transportation or furniture for mass production — requires more than manipulating geometric primitives. It demands a deep understanding of continuum mechanics, metallurgy, nonlinear fracture processes, ergonomics, and manufacturing constraints. All of these domains remain beyond the statistical horizon of modern transformers.
Conceptual mixing of metric units (millimeters) and angular values (degrees) — translating rotation into linear shift
Systematic generation of non-manifold geometry, self-intersecting surfaces, and phantom volumes
Machining algorithms violating fundamental laws of topology, strength of materials, and engineering judgment
Tokenization, semantic proximity of units, autoregressive error accumulation
One of the fundamental reasons modern LLMs demonstrate poor accuracy in physical object design is the very nature of numerical data tokenization. Models process information as discrete units — tokens — which fundamentally contradicts the continuous nature of physical quantities used in CAD systems. Standard tokenization algorithms such as Byte-Pair Encoding (BPE) destroy the semantic integrity of numerical data, converting coordinates and tolerances into sets of unrelated symbols.
When a model encounters a number representing a fillet radius or steel sheet thickness, that number is split into tokens based on frequency of occurrence in the training corpus, rather than mathematical meaning. For example, "10.5mm" may be split into tokens "10" + "." + "5" + "mm" — losing all mathematical semantics.
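To make the fragmentation concrete, the sketch below mimics the split described above with a hand-written regular expression. It is a toy stand-in, not a real BPE tokenizer: the `toy_tokenize` function and its splitting rule are assumptions chosen only to reproduce the "10" + "." + "5" + "mm" example.

```python
import re

# Toy splitter (an assumption for illustration, not a real BPE vocabulary):
# runs of digits, runs of letters, and any other non-space character
# each become a separate token.
TOKEN_PATTERN = re.compile(r"\d+|[A-Za-z]+|\S")


def toy_tokenize(text: str) -> list[str]:
    """Split a dimension string into digit, letter, and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


# Both dimensions produce the same token pattern (digits, dot, digits, unit),
# even though the values differ by a factor of ten.
print(toy_tokenize("10.5mm"))
print(toy_tokenize("1.05mm"))
```

After the split, nothing in the token sequence itself records that the digit groups on either side of the dot form one quantity; that relationship has to be relearned statistically.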
E_total ≥ ε_token + Σ(i=1..n) ε_M(i) + σ²_sampling
where:
ε_token — token quantization error (irreducible)
ε_M(i) — autoregressive dependency error at step i
σ²_sampling — sampling variance
For structural steel: tolerance ≤ 0.5 mm required for assembly
→ direct LLM geometry generation is UNRELIABLE without external verification
The “9.9 vs 9.11” problem: many advanced models incorrectly pick 9.11 as the larger number, because the token “11” outweighs “9” when the fractional parts are read as separate tokens rather than as 0.11 versus 0.90. In CAD, the consequence is that models generate dimensions based on lexical prevalence, not on their true position on the number line.
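The two readings can be contrasted in a few lines. This is only an illustration of the failure pattern, not a claim about what happens inside any particular model; both function names are hypothetical.

```python
def larger_as_numbers(a: str, b: str) -> str:
    """Compare two decimal strings by their position on the number line."""
    return a if float(a) > float(b) else b


def larger_as_token_pieces(a: str, b: str) -> str:
    """Faulty reading: compare whole and fractional parts as separate integers."""
    def pieces(s: str) -> tuple[int, int]:
        whole, frac = s.split(".")
        return int(whole), int(frac)

    return a if pieces(a) > pieces(b) else b


print(larger_as_numbers("9.9", "9.11"))       # numeric comparison
print(larger_as_token_pieces("9.9", "9.11"))  # piecewise comparison
```

The first function returns "9.9"; the second returns "9.11", because it compares 11 with 9 instead of 0.11 with 0.90.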
| Error Type | Mechanism | Design Consequence |
|---|---|---|
| Discretization noise | Continuous coordinates → finite symbol set | Tolerance violations in precision assemblies (aerospace) |
| Autoregressive accumulation | Single decimal error distorts all subsequent predictions | Geometric drift, structural misalignment |
| Topological connectivity loss | Semantic link between spatial neighbors severed | Non-manifold edges, infeasible geometries |
| Scale invariance | Cannot distinguish micro-defects from macro-features | Ignoring stress concentrators in welds |
| Number fragmentation | Fractional values split into independent tokens | Order-of-magnitude errors (0.1 instead of 0.01) |
Vector representations of "millimeter" and "degree" are formed solely on contextual co-occurrence in technical literature, patents, and code. Since both accompany numerical parameters in drawings and parametric scripts, their vectors are positioned extremely close in the model’s latent space — spawning dangerous translation errors.
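```python
import cadquery as cq

# User: "3-blade propeller, 120° spacing"
# Placeholder blade (assumed geometry): a flat bar whose root sits at the origin.
blade = cq.Workplane("XY").box(40, 8, 2).translate((20, 0, 0))

# LLM output: LINEAR OFFSET instead of ROTATION
wrong = [blade.translate((120 * i, 0, 0)) for i in range(3)]  # 120 read as mm, not degrees
# Result: blades do NOT converge at the center; scattered array of disconnected parts

# Proper: rotate each blade around the Z axis through the origin
right = [blade.rotate((0, 0, 0), (0, 0, 1), 120 * i) for i in range(3)]  # angle in degrees
# Result: blades meet at the center shaft
```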
| Characteristic | Algorithmic CAD Kernels (OpenCASCADE) | Large Language Models (LLMs) |
|---|---|---|
| Nature of computation | Deterministic, analytical geometry | Probabilistic, statistical token prediction |
| Parameter handling | Strict typing (lengths, angles, radii have meaning) | Linguistic embeddings (numbers are just text) |
| Topological control | Continuous B-Rep integrity checking | Absent; only external tools |
| Relationship to physics | Built-in constraints on self-intersections | Ignores physics for lexical coherence |
Modern CAD systems use Boundary Representation (B-Rep) — a complex structure where topological entities (faces, edges, vertices) are linked to geometric surfaces and curves. The core problem: LLMs perceive CAD models as sequences of text commands (OpenSCAD or CadQuery scripts), completely ignoring spatial relationships between elements.
Operations like “chamfer” or “fillet” require explicit selection of B-Rep primitives from the transient model state at execution time. Without access to the geometric kernel during reasoning, LLMs frequently predict operations on non-existent or incorrectly identified faces.
By early 2026, architectures like Pointer-CAD and FutureCAD encode B-Rep as an undirected face-adjacency graph G(V, E) using Graph Neural Networks (GNNs), where nodes = faces, edges = shared boundaries.
h_i^(k) = φ^(k)((1 + ε^(k)) · h_i^(k-1) + Σ(j∈N(i)) f_Θ(h_ij^(k-1)) ⊙ h_j^(k-1))
Pointer mechanism anchors to existing B-Rep primitives
→ segmentation errors reduced by 2 ORDERS OF MAGNITUDE
→ reliable generation of chamfers & fillets for industrial design
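The idea of anchoring an operation to existing primitives can be sketched on the simplest possible solid. The example below builds the face-adjacency graph of a cube and refuses a fillet request that points at a missing face or at two faces with no shared edge. It is a minimal illustration of the graph G(V, E) and the pointer check, not the Pointer-CAD or FutureCAD implementation; the face labels and `resolve_edge` are hypothetical.

```python
from itertools import combinations

# Faces of an axis-aligned cube, labelled by outward normal (illustrative names).
FACES = ["+X", "-X", "+Y", "-Y", "+Z", "-Z"]


def are_adjacent(a: str, b: str) -> bool:
    """Two distinct cube faces share an edge unless they lie on the same axis."""
    return a != b and a[1] != b[1]


# Undirected face-adjacency graph: nodes are faces, edges are shared boundaries.
EDGES = {frozenset(pair) for pair in combinations(FACES, 2) if are_adjacent(*pair)}


def resolve_edge(face_a: str, face_b: str) -> frozenset:
    """Return the shared edge targeted by a fillet or chamfer, or fail loudly."""
    for face in (face_a, face_b):
        if face not in FACES:
            raise ValueError(f"face {face!r} does not exist in the current model")
    edge = frozenset((face_a, face_b))
    if edge not in EDGES:
        raise ValueError(f"faces {face_a} and {face_b} share no edge")
    return edge


print(len(EDGES), "shared edges in the cube")
print(sorted(resolve_edge("+X", "+Z")))
```

A text-only generator has no such lookup available while it writes the script, which is why a request such as `resolve_edge("+X", "-X")` can slip through until the kernel rejects it.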
Modern 3D-LLMs claim to understand three-dimensional scenes, but early 2026 research reveals a critical dependence on linguistic “textual shortcuts.” On popular benchmarks like SQA3D, high results can be achieved by fine-tuning a text model on Q&A pairs without any 3D inputs at all — proving models exploit statistical patterns in descriptions, not genuine 3D analysis.
On the stricter Real-3DQA benchmark (context-guessable questions removed, strict 3D reasoning taxonomy), existing 3D-LLM performance drops by more than 60%, with viewpoint-shift consistency tests showing near-total failure.
Models lack the “sense of physics” that humans develop through evolutionary interaction with the physical world — MBZUAI, 2026
Material hallucinations, fatigue strength, welding, residual stress
LLMs frequently exhibit “intrinsic hallucinations” — reasoning errors based on false internal knowledge grounding. A model may propose a steel beam design based on static strength data while completely ignoring cyclic loading that leads to fatigue failure. The nonlinear nature of Wöhler curves (S-N curves) is not understood; models interpolate safety values linearly — which is unacceptable in engineering practice.
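A short calculation shows why linear interpolation on an S-N curve is unsafe. The sketch assumes a power-law curve N = C / S^m, which is a straight line only on log-log axes. The exponent m = 3 and the constant C are assumptions picked for round numbers, not values for any specific steel grade or weld detail.

```python
def cycles_to_failure(stress_range_mpa: float, c: float, m: float) -> float:
    """Power-law S-N curve: N = C / S**m."""
    return c / stress_range_mpa ** m


M = 3.0      # assumed slope exponent
C = 2.0e12   # hypothetical curve constant

# Two "known" points on the curve
s_low, s_high = 100.0, 200.0
n_low = cycles_to_failure(s_low, C, M)
n_high = cycles_to_failure(s_high, C, M)

# Life at an intermediate stress range: true curve vs naive straight line
s_mid = 150.0
n_true = cycles_to_failure(s_mid, C, M)
n_linear = n_low + (n_high - n_low) * (s_mid - s_low) / (s_high - s_low)

overestimate = n_linear / n_true
print(f"true life: {n_true:.3g} cycles, linear guess: {n_linear:.3g} cycles")
print(f"linear interpolation overestimates life by a factor of {overestimate:.2f}")
```

Because the curve is convex in linear coordinates, the straight-line estimate always lands on the unsafe side: it predicts more cycles than the component will survive.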
A component thickness change of just 1.2× can accelerate fatigue crack propagation by 8.5× — but LLMs completely miss this.
| Steel Grade | Yield (MPa) | Failure Characteristics Ignored by LLMs |
|---|---|---|
| E36 (Shipbuilding) | ~355 | High weld seam sensitivity to brittle fracture at low temperatures |
| DC04 (Sheet) | ~210–270 | Strong dependence on strain hardening during bending |
| AISI 316L (Additive) | ~290 | Anisotropy depending on 3D print build orientation |
| S500MC (High-str.) | ~500 | Cracking susceptibility in heat-affected welding zone |
Welding is a complex thermomechanical process causing significant residual stresses and geometric distortions. LLMs lack “world models” that predict how welding heat alters geometry after cooling. They propose weld sequences based on “aesthetic” symmetry — not actual thermal field distribution — causing beam warping beyond tolerances.
Fundamental process understanding belongs to specialized spatiotemporal graph neural networks (STGNNs) trained on FEA simulation data to predict thermal histories — not to general-purpose language models.
LLMs design heavy stone countertops on low-stiffness supports. No understanding of load paths — the continuous connection from mass to ground. Models possess the “texture of expert discourse” but not the physics behind it.
Chair: 20 cm seat depth + 1.5 m backrest. Bar stool footrest at 85 cm — physically unusable. Root cause: no embodiment. The model has never sat in a chair.
“Slot-machine randomness” — dimensions change unpredictably per generation. Solution: object-level in-place editing + linking visual tokens to real supplier SKUs.
Edges shared by 3+ faces, isolated vertices in void, internal faces intersecting enclosed volumes
AI generates 2D sketches with self-crossing lines, then extrudes them → phantom volumes
Insufficient coordinate precision in Bézier/NURBS surface merging → micro-gaps
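The first of these defects, an edge shared by three or more faces, is cheap to detect once the geometry exists as a mesh. The sketch below counts how many triangles use each edge; the helper names and the test meshes are illustrative, and a production check would run on the B-Rep inside the CAD kernel rather than on a hand-written triangle list.

```python
from collections import Counter


def edge_face_counts(triangles):
    """Count how many triangles use each undirected edge."""
    counts = Counter()
    for a, b, c in triangles:
        for u, v in ((a, b), (b, c), (c, a)):
            counts[(min(u, v), max(u, v))] += 1
    return counts


def non_manifold_edges(triangles):
    """Edges shared by three or more faces."""
    return sorted(e for e, n in edge_face_counts(triangles).items() if n >= 3)


def boundary_edges(triangles):
    """Edges used by exactly one face, i.e. holes or gaps in the surface."""
    return sorted(e for e, n in edge_face_counts(triangles).items() if n == 1)


# A closed tetrahedron: every edge is shared by exactly two faces.
tetrahedron = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
# The same solid with an extra "fin" triangle glued onto edge (0, 1).
with_fin = tetrahedron + [(0, 1, 4)]

print(non_manifold_edges(tetrahedron), boundary_edges(tetrahedron))
print(non_manifold_edges(with_fin), boundary_edges(with_fin))
```

The fin turns edge (0, 1) into a three-face edge and leaves two open boundary edges, which is the kind of result a verification step can return to the generator as a concrete error.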
AI proposes clamping workpiece, milling, then rotating the clamping block 90° for drilling — not realizing the fixture blocks all tool access
Part exceeds a 10:1 length-to-diameter ratio, yet the AI ignores the bending risk. When prompted, it suggests a rigid tailstock support, which would instantly bend and destroy the thin-walled brass tube
Recommends tools that physically can’t reach the machining zone. Sets Z0 on the rough raw-stock surface instead of a pre-faced surface, which makes micron-level tolerances impossible
When an engineer receives a syntactically correct, visually flawless 3D model from an LLM, their critical thinking is dulled. The illusion of competence — impeccable grammar, sophisticated professional jargon — makes the “human-in-the-loop” approach ineffective. Engineers gradually lose the “feel for metal” and spatial intuition needed to detect subtle geometric paradoxes.
| Design Parameter | LLM Error | Manufacturing Consequence |
|---|---|---|
| Tolerances | Zero clearance or confused units | Parts can’t assemble; fusion during printing |
| Wall thickness | Below physical limit | Brittleness, layer skips, part destruction |
| Orientation | Visual aesthetics priority | Low interlayer adhesion, load failure |
| Boolean ops | Overlapping without subtraction | Missing functional holes |
Under the EU AI Act, AI used in critical infrastructure is treated as a high-risk system. This requires risk management, documentation, logging, and human oversight.
Engineering data (CAD models, test reports, defect maps) is proprietary IP. Models train on simplified textbook examples. STEP files contain complex graphs — unfriendly to ML tokenization.
If AI-designed structure fails → liability falls entirely on engineer & design firm. “Black box” opacity makes cause tracing impossible. Legally untenable for critical systems.
| Data Barrier | Problem | Impact on LLM Intelligence |
|---|---|---|
| Trade secret regime | CAD data under NDA — not public | Models train on simplified textbook examples, not real products |
| Modality gap | Text documentation ≠ product geometry | Model knows “how to describe” a defect, not “how to see” it in 3D |
| EU AI Act | Transparency & dataset quality requirements | Restrictions on synthetic data for critical systems |
| Agent liability | Ambiguous fault attribution | Companies reluctant to delegate decision authority to LLMs |
PINNs, Neuro-Symbolic AI, CAD-Tokenizer, World Models
Physics-informed neural networks (PINNs) embed the governing partial differential equations (PDEs) directly into the loss function and penalize violations of conservation laws, unlike LLMs, which only minimize text prediction error.
L_PINN = L_data + λ_phys · L_phys + L_BC
L_phys = PDE residual (Navier-Stokes, heat conduction...)
→ penalizes violation of mass/energy conservation
Euler-Bernoulli beam equation for furniture frames:
d²/dx²(EI · d²w/dx²) = q(x)
w = deflection, E = elastic modulus, I = second moment of area (area moment of inertia), q = distributed load
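The physics term can be illustrated without training anything. The sketch below evaluates the residual of the beam equation above for two candidate deflection curves of a simply supported beam under uniform load: the closed-form solution and a sine curve with the same midspan deflection. It is not a PINN (in a real one, w would be a neural network and the derivatives would come from automatic differentiation); the material, section, and load values are assumptions, and EI is taken as constant so the equation reduces to EI · w'''' = q.

```python
import math

E = 10.0e9            # Pa, assumed elastic modulus (roughly timber-like)
B, H = 0.04, 0.02     # m, assumed rectangular section width and height
I = B * H**3 / 12.0   # m^4, second moment of area of the rectangle
Q = 500.0             # N/m, assumed uniform load
L = 1.0               # m, assumed span

W_MAX = 5.0 * Q * L**4 / (384.0 * E * I)  # midspan deflection, closed form


def w_exact(x):
    """Closed-form deflection of a simply supported beam under uniform load."""
    return Q * x * (L**3 - 2.0 * L * x**2 + x**3) / (24.0 * E * I)


def w_sine(x):
    """Plausible-looking guess: right midspan value, wrong physics."""
    return W_MAX * math.sin(math.pi * x / L)


def fourth_derivative(w, x, h=0.05):
    """Central finite-difference estimate of the fourth derivative of w at x."""
    return (w(x - 2*h) - 4*w(x - h) + 6*w(x) - 4*w(x + h) + w(x + 2*h)) / h**4


def physics_loss(w, n_points=9):
    """Mean squared residual of EI * w'''' - q at interior collocation points."""
    xs = [L * (i + 1) / (n_points + 1) for i in range(n_points)]
    residuals = [E * I * fourth_derivative(w, x) - Q for x in xs]
    return sum(r * r for r in residuals) / len(residuals)


print("exact solution loss:", physics_loss(w_exact))
print("sine guess loss:    ", physics_loss(w_sine))
```

Both curves agree at midspan, so a data term built from a single midspan measurement could not tell them apart; only the physics residual separates them.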
| Parameter | Traditional FEM | PINN (2026) |
|---|---|---|
| Mesh | Complex discretization required | Mesh-free method |
| Noisy data | Sensitive to geometry errors | Robust to noise |
| Parametric speed | Low (full recalculation) | High (fast approximation) |
| LLM integration | Difficult (different domains) | Native through loss functions |
The most promising approach: combining neural network creativity with strict, deterministic logic of classical symbolic computation (CAD kernels, FEA simulators, physics engines). A closed recursive loop of generation → verification → correction.
User describes part → LLM writes draft design script
Code → strict symbolic engine: compiles geometry, checks self-intersections, tolerances, runs FEA
If failure: mathematical error report → back to LLM → cycle until convergence
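The three steps above can be sketched as a small control loop. Both `draft_design` and `verify` are stand-ins invented for this example: the first plays the role of the LLM and the second the role of the symbolic engine. The single wall-thickness rule is an assumption that replaces the real kernel, tolerance, and FEA checks.

```python
from dataclasses import dataclass


@dataclass
class Report:
    ok: bool
    message: str


def draft_design(request: str, feedback: list[str]) -> dict:
    """Stand-in for the LLM: thickens the wall by 0.5 mm per error report."""
    return {"wall_mm": 0.5 + 0.5 * len(feedback)}


def verify(design: dict, min_wall_mm: float = 2.0) -> Report:
    """Stand-in for the symbolic engine (CAD kernel, tolerance checks, FEA)."""
    if design["wall_mm"] < min_wall_mm:
        return Report(False, f"wall {design['wall_mm']} mm < minimum {min_wall_mm} mm")
    return Report(True, "all checks passed")


def design_loop(request: str, max_iterations: int = 10):
    """Generate, verify, and correct until the verifier accepts or the budget ends."""
    feedback: list[str] = []
    for iteration in range(1, max_iterations + 1):
        design = draft_design(request, feedback)
        report = verify(design)
        if report.ok:
            return design, iteration
        feedback.append(report.message)  # error report goes back to the generator
    raise RuntimeError("no verified design within the iteration budget")


design, iterations = design_loop("bracket with thin walls")
print(design, "accepted after", iterations, "iterations")
```

The important design choice is that acceptance is decided only by the deterministic verifier; the generator never marks its own output as valid, and the loop fails explicitly when the budget runs out.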
| Paradigm | LLM Role | Physics Control | Fatal Error Risk |
|---|---|---|---|
| Pure LLM | Final product from prompt | Absent | EXTREME |
| Neuro-symbolic AI (NSAI) | Hypothesis + iterative correction | Strict: FEA/CAD solvers block bad code | LOW |
VQ-VAE compresses operation pairs (e.g., “create 2D sketch” + “extrusion”) into single discrete tokens preserving geometric semantics. Finite-State Automaton (FSA) decoding forcibly injects strict CAD grammar into generation — blocking tokens that would cause self-intersection or B-Rep violations. Adaptive computation allocation lets AI dynamically allocate reasoning time proportional to geometric complexity.
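Grammar-constrained decoding can be shown with a toy automaton. The sketch below masks out any token the current state does not allow and then takes the best remaining score. The grammar, the token names, and the scores are hypothetical, and this toy only enforces operation order; it does not check self-intersections or B-Rep validity.

```python
# Allowed transitions of a toy CAD grammar (hypothetical token names).
GRAMMAR = {
    "START":   {"SKETCH"},
    "SKETCH":  {"EXTRUDE"},
    "EXTRUDE": {"SKETCH", "FILLET", "END"},
    "FILLET":  {"SKETCH", "FILLET", "END"},
}


def constrained_step(state: str, scores: dict[str, float]) -> str:
    """Pick the best-scoring token among those the automaton allows."""
    allowed = {tok: s for tok, s in scores.items() if tok in GRAMMAR[state]}
    if not allowed:
        raise ValueError(f"no grammatical token available in state {state}")
    return max(allowed, key=allowed.get)


def decode(score_sequence: list[dict[str, float]]) -> list[str]:
    """Run the automaton over a sequence of per-step token scores."""
    state, output = "START", []
    for scores in score_sequence:
        state = constrained_step(state, scores)
        output.append(state)
        if state == "END":
            break
    return output


# The unconstrained favourite at each step would be EXTRUDE, FILLET, END:
# an extrusion with no sketch, then a fillet on a solid that does not exist.
scores = [
    {"EXTRUDE": 0.6, "SKETCH": 0.3, "END": 0.1},
    {"FILLET": 0.7, "EXTRUDE": 0.2, "END": 0.1},
    {"END": 0.5, "FILLET": 0.4, "SKETCH": 0.1},
]
print(decode(scores))
```

With the mask in place the sequence becomes SKETCH, EXTRUDE, END, so the invalid orderings are never emitted in the first place.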
The future: transition from language models to “world models” — systems that perceive the physical environment through visual-geometric simulation, not text. NVIDIA Cosmos 3: first world foundation model unifying synthetic world generation, visual reasoning, and action simulation. World Labs: “3D as code” — 3D space as universal interface for generating, editing, and simulating worlds.
Uses RL adapted from LLM research for automatic geometric constraints. Understands “design intent” — changing table width auto-preserves leg symmetry via functional role understanding.
Specialized LLM in secure cloud with access to vast technical standards database. LEO: mechanical design + simulation, assembly structures, STEP error resolution. MARIE: materials science + chemistry for complex-condition material selection.
2026: “Subdivision-native” workflows — designers work with control meshes while AI updates FEA meshes in real time. The end of LLMs as chatbots, the beginning of integrated engineering partners.
Modern LLMs as of March 2026 remain powerful copilots for scripting and documentation automation but are not full-fledged design engineers. Their weak physics understanding stems from architectural tokenization limitations, absent direct CAD kernel connections, and lack of proprietary physical data access.
LLM hallucinations — confusion of angular/metric quantities, generation of physically meaningless structures — are not temporary bugs but systemic technological limitations driven by natural language tokenization dominance over analytical geometry principles.
Design demands precision that, within the current LLM paradigm, is a statistical coincidence rather than a physical necessity.
Generate parametric code structure (CadQuery, Dynamo, OpenSCAD) with mandatory manual review of geometric logic
LLM output filtered through PINNs, classical FEA solvers, or geometric kernels (Parasolid, OpenCASCADE)
Specialized tools to detect tolerance and material property hallucinations in real time
Pointer-CAD, CAD-Tokenizer — deeper AI “grounding” in spatial topology
All numerical values in strongly typed classes, e.g. Length(90) vs Angle(90), so that unit mix-ups are rejected at the type level before any geometry is built
Risk management systems, technical documentation, and human oversight for critical infrastructure design
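The strongly typed units recommendation above can be sketched in a few lines. The `Length` and `Angle` classes and the two helper functions are hypothetical and work on 2D points only; the point is that a degree value passed where a length is expected raises an error instead of silently producing a 120 mm offset.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Length:
    mm: float


@dataclass(frozen=True)
class Angle:
    deg: float


def translate(point, dx: Length):
    """Shift a 2D point along X; refuses anything that is not a Length."""
    if not isinstance(dx, Length):
        raise TypeError(f"translate() needs a Length, got {type(dx).__name__}")
    x, y = point
    return (x + dx.mm, y)


def rotate(point, angle: Angle):
    """Rotate a 2D point about the origin; refuses anything that is not an Angle."""
    if not isinstance(angle, Angle):
        raise TypeError(f"rotate() needs an Angle, got {type(angle).__name__}")
    x, y = point
    a = math.radians(angle.deg)
    return (x * math.cos(a) - y * math.sin(a), x * math.sin(a) + y * math.cos(a))


tip = (10.0, 0.0)
print(rotate(tip, Angle(120)))      # blade tip moved around the hub
print(translate(tip, Length(5)))    # legitimate linear shift
try:
    translate(tip, Angle(120))      # the propeller mistake
except TypeError as err:
    print("rejected:", err)
```

In Python the check runs when the function is called; a static type checker such as mypy can flag the same mismatch before the script is executed.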
The goal is a design workflow in which the probabilistic nature of neural networks is strictly governed by deterministic mathematical CAD kernels, moving from discrete tokens to continuous spatial representations.