Research Report · March 2026 · Consolidated Analysis

Cognitive Gap & Physical Grounding

A Study of Large Language Model Integration in Modern CAD Systems and Industrial Design

Architectural limitations of transformers, geometric & physical hallucinations, industry-specific failures in structural steel and furniture design, economic consequences, and strategies for overcoming barriers

12 chapters · 7 tables · 5 error types · 4 industries · 6 solutions · 40+ sources
Tokenization destroys metric precision · B-Rep topology breaks in autoregressive generation · 0.8% B-Rep correctness at IoU-0.50 · PINN + NSAI reduce design time 40–85% · EU AI Act: up to €35M or 7% revenue fines · $4.1B: cost of one undetected design error

Chapter 1

Introduction & Problem Statement

The state of engineering design by March 2026 is characterized by deep integration of large language models into computer-aided design and building information modeling workflows. Models such as GPT-5, Gemini 2.5 Pro, and Claude Sonnet 4.5 demonstrate unprecedented capabilities in code generation and natural language processing. Despite this progress, the industry continues to face a fundamental gap between models’ ability to generate syntactically correct code and their ability to comprehend the physical reality of the products being designed.

Designing real-world structures — whether structural steel for buildings and transportation or furniture for mass production — requires more than manipulating geometric primitives. It demands a deep understanding of continuum mechanics, metallurgy, nonlinear fracture processes, ergonomics, and manufacturing constraints. All of these domains remain beyond the statistical horizon of modern transformers.

Key Manifestations of the Gap

Unit Confusion

Conceptual mixing of metric units (millimeters) and angular values (degrees) — translating rotation into linear shift

Impossible Objects

Systematic generation of non-manifold geometry, self-intersecting surfaces, and phantom volumes

Topology Violations

Machining algorithms violating fundamental laws of topology, strength of materials, and engineering judgment

Chapter 2

Fundamental Architectural Barriers

Tokenization, semantic proximity of units, autoregressive error accumulation

2.1 Tokenization & Discretization of Continuous Space

One of the fundamental reasons modern LLMs demonstrate poor accuracy in physical object design is the very nature of numerical data tokenization. Models process information as discrete units — tokens — which fundamentally contradicts the continuous nature of physical quantities used in CAD systems. Standard tokenization algorithms such as Byte-Pair Encoding (BPE) destroy the semantic integrity of numerical data, converting coordinates and tolerances into sets of unrelated symbols.

When a model encounters a number representing a fillet radius or steel sheet thickness, that number is split into tokens based on frequency of occurrence in the training corpus, rather than mathematical meaning. For example, "10.5mm" may be split into tokens "10" + "." + "5" + "mm" — losing all mathematical semantics.
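The fragmentation described above can be mimicked with a toy splitter. This is a minimal sketch, not a real BPE tokenizer: the function name `toy_split` and its regular expression are illustrative assumptions chosen only to reproduce the "10" + "." + "5" + "mm" split.

```python
import re

def toy_split(text: str) -> list[str]:
    """Split text into digit runs, letter runs, and single other characters.

    NOT a real BPE tokenizer (assumption for illustration): it only mimics
    the kind of fragmentation a frequency-based vocabulary can produce.
    """
    return re.findall(r"\d+|[A-Za-z]+|\S", text)

tokens = toy_split("10.5mm")
print(tokens)  # ['10', '.', '5', 'mm']

# The numeric value exists only after the pieces are reassembled and parsed;
# no single token carries the magnitude 10.5.
value_mm = float("".join(tokens[:3]))
print(value_mm)  # 10.5
```

Each piece on its own ("10", "5") is a valid but unrelated number, which is the loss of mathematical semantics the paragraph refers to.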

MATHEMATICAL LOWER BOUND ON ERROR
E_total ≥ ε_token + Σ(i=1..n) ε_M(i) + σ²_sampling

where:
  ε_token    — token quantization error (irreducible)
  ε_M(i)     — autoregressive dependency error at step i
  σ²_sampling — sampling variance

For structural steel: tolerance ≤ 0.5 mm required for assembly
→ direct LLM geometry generation is UNRELIABLE without external verification
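A short worked example of the bound, under stated assumptions: every numeric value below except the 0.5 mm tolerance is hypothetical and chosen only to show how per-step errors add up, and all terms are treated as millimeter-scale quantities for simplicity.

```python
def error_lower_bound(eps_token: float, eps_steps: list[float], sigma2_sampling: float) -> float:
    """Evaluate the right-hand side of the lower bound on E_total."""
    return eps_token + sum(eps_steps) + sigma2_sampling

TOLERANCE_MM = 0.5          # assembly tolerance quoted in the text

# Hypothetical values, for illustration only.
eps_token = 0.05            # irreducible token quantization error
eps_steps = [0.02] * 30     # 30 autoregressive steps, each adding a small error
sigma2_sampling = 0.01      # sampling variance term

bound = error_lower_bound(eps_token, eps_steps, sigma2_sampling)
print(round(bound, 2))          # 0.66
print(bound > TOLERANCE_MM)     # True
```

Even with small per-step errors, the sum over 30 steps alone exceeds the tolerance in this example, which is why the text calls for external verification.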

The “9.9 vs 9.11” problem: many advanced models incorrectly pick 9.11 as the larger number, because the token “11” carries greater statistical weight than “9”. In CAD, the same effect means models generate dimensions based on lexical prevalence rather than their true position on the number line.
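The difference between the two readings can be made concrete. The sketch below is only an illustration of the failure mode, not a claim about how any model computes internally; both function names are assumptions.

```python
def numeric_larger(a: str, b: str) -> str:
    """Compare the strings by their true position on the number line."""
    return a if float(a) > float(b) else b

def fragment_larger(a: str, b: str) -> str:
    """Flawed reading: treat integer and fractional parts as separate integers."""
    a_int, a_frac = a.split(".")
    b_int, b_frac = b.split(".")
    key_a = (int(a_int), int(a_frac))
    key_b = (int(b_int), int(b_frac))
    return a if key_a > key_b else b

print(numeric_larger("9.9", "9.11"))   # 9.9
print(fragment_larger("9.9", "9.11"))  # 9.11
```

The fragment-wise comparison ranks 11 above 9 and therefore returns the wrong answer.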

Tokenization Error Typology

| Error Type | Mechanism | Design Consequence |
|---|---|---|
| Discretization noise | Continuous coordinates → finite symbol set | Tolerance violations in precision assemblies (aerospace) |
| Autoregressive accumulation | Single decimal error distorts all subsequent predictions | Geometric drift, structural misalignment |
| Topological connectivity loss | Semantic link between spatial neighbors severed | Non-manifold edges, infeasible geometries |
| Scale invariance | Cannot distinguish micro-defects from macro-features | Ignoring stress concentrators in welds |
| Number fragmentation | Fractional values split into independent tokens | Order-of-magnitude errors (0.1 instead of 0.01) |

2.2 Semantic Proximity of Units of Measurement

Vector representations of "millimeter" and "degree" are formed solely on contextual co-occurrence in technical literature, patents, and code. Since both accompany numerical parameters in drawings and parametric scripts, their vectors are positioned extremely close in the model’s latent space — spawning dangerous translation errors.

✗ LLM OUTPUT — PROPELLER
# User: "3-blade propeller, 120° spacing"
# LLM generated LINEAR OFFSET instead of ROTATION:

for i in range(3):
    blade.translate(x = 120 * i)  # ← mm, not degrees!

# Result: blades DON'T converge at center
# → scattered array of disconnected parts
✓ CORRECT — ROTATION
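# Proper: rotate each blade about the Z axis through the origin
# (blade is a hypothetical object with a rotate() method)

for i in range(3):
    blade.rotate(
        angle_deg = 120 * i,    # degrees, not millimeters
        axis      = (0, 0, 1),  # Z axis
        center    = (0, 0, 0),
    )

# Result: blades meet at center shaft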
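The geometric consequence of the two readings can be checked numerically. This is a minimal 2D sketch using only the standard library; the blade root position of 10 mm from the shaft and both function names are assumptions for illustration.

```python
import math

def rotate_z(point, angle_deg, center=(0.0, 0.0)):
    """Rotate a 2D point about `center` by `angle_deg` degrees (Z-axis rotation)."""
    x, y = point[0] - center[0], point[1] - center[1]
    a = math.radians(angle_deg)
    return (center[0] + x * math.cos(a) - y * math.sin(a),
            center[1] + x * math.sin(a) + y * math.cos(a))

def translate_x(point, offset_mm):
    """Shift a 2D point along X by `offset_mm` millimeters."""
    return (point[0] + offset_mm, point[1])

blade_root = (10.0, 0.0)  # hypothetical: blade root 10 mm from the shaft axis

rotated = [rotate_z(blade_root, 120 * i) for i in range(3)]
shifted = [translate_x(blade_root, 120 * i) for i in range(3)]

# Distance of each blade root from the shaft axis at the origin.
radii_rotated = [math.hypot(*p) for p in rotated]
radii_shifted = [math.hypot(*p) for p in shifted]

print([round(r, 6) for r in radii_rotated])  # [10.0, 10.0, 10.0]
print(radii_shifted)                          # [10.0, 130.0, 250.0]
```

Rotation keeps every blade root at the same radius from the shaft, while the linear offset moves the second and third blades 130 mm and 250 mm away from it.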

2.3 Autoregressive Error Accumulation

| Characteristic | Algorithmic CAD Kernels (OpenCascade) | Large Language Models (LLMs) |
|---|---|---|
| Nature of computation | Deterministic, analytical geometry | Probabilistic, statistical token prediction |
| Parameter handling | Strict typing (lengths, angles, radii have meaning) | Linguistic embeddings (numbers are just text) |
| Topological control | Continuous B-Rep integrity checking | Absent; only external tools |
| Relationship to physics | Built-in constraints on self-intersections | Ignores physics for lexical coherence |

Chapter 3

Geometric Topology & B-Rep Paradigm

3.1 The Paradigm Gap

Modern CAD systems use Boundary Representation (B-Rep) — a complex structure where topological entities (faces, edges, vertices) are linked to geometric surfaces and curves. The core problem: LLMs perceive CAD models as sequences of text commands (OpenSCAD or CadQuery scripts), completely ignoring spatial relationships between elements.

Operations like “chamfer” or “fillet” require explicit selection of B-Rep primitives from the transient model state at execution time. Without access to the geometric kernel during reasoning, LLMs frequently predict operations on non-existent or incorrectly identified faces.

3.2 Specialized Architectures: Pointer-CAD & Graph Neural Networks

By early 2026, architectures like Pointer-CAD and FutureCAD encode B-Rep as an undirected face-adjacency graph G(V, E) using Graph Neural Networks (GNNs), where nodes = faces, edges = shared boundaries.

GNN B-REP ENCODER — NODE FEATURE UPDATE
h_i^(k) = φ^(k)((1 + ε^(k)) · h_i^(k-1) + Σ(j∈N(i)) f_Θ(h_ij^(k-1)) ⊙ h_j^(k-1))

Pointer mechanism anchors to existing B-Rep primitives
→ segmentation errors reduced by 2 ORDERS OF MAGNITUDE
→ reliable generation of chamfers & fillets for industrial design
100× fewer segmentation errors vs. pure autoregressive
B-Rep topology visible before each generation step

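The node feature update can be sketched in plain Python for a tiny face-adjacency graph. This is not the Pointer-CAD implementation: the choice of ReLU for φ, an unweighted sigmoid gate for f_Θ, and the three-face toy graph are all assumptions made so the example stays self-contained.

```python
import math

def relu(vec):
    """Stand-in for phi^(k) (assumption: a plain ReLU, no learned weights)."""
    return [max(0.0, x) for x in vec]

def gate(edge_feat):
    """Stand-in for f_Theta (assumption: element-wise sigmoid of edge features)."""
    return [1.0 / (1.0 + math.exp(-x)) for x in edge_feat]

def update_nodes(h, edge_feats, eps=0.0):
    """One message-passing step over a face-adjacency graph.

    h          : dict face_id -> feature vector h_i
    edge_feats : dict (i, j) -> edge feature vector h_ij, listed in both directions
    """
    new_h = {}
    for i, h_i in h.items():
        acc = [(1.0 + eps) * x for x in h_i]
        for (src, dst), h_ij in edge_feats.items():
            if src != i:
                continue
            # Gated neighbor contribution: f_Theta(h_ij) ⊙ h_j
            acc = [a + g * x for a, g, x in zip(acc, gate(h_ij), h[dst])]
        new_h[i] = relu(acc)
    return new_h

# Toy graph: faces 0-1 and 1-2 share an edge; zero edge features give a gate of 0.5.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges = {(0, 1): [0.0, 0.0], (1, 0): [0.0, 0.0],
         (1, 2): [0.0, 0.0], (2, 1): [0.0, 0.0]}

print(update_nodes(h, edges))
# {0: [1.0, 0.5], 1: [1.0, 1.5], 2: [1.0, 1.5]}
```

Face 1 has two neighbors and therefore aggregates two gated messages, which is how adjacency information enters each face embedding.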
Chapter 4

The Illusion of Spatial Intelligence

4.1 Textual Shortcuts & Benchmarks

Modern 3D-LLMs claim to understand three-dimensional scenes, but early 2026 research reveals a critical dependence on linguistic “textual shortcuts.” On popular benchmarks like SQA3D, high results can be achieved by fine-tuning a text model on Q&A pairs without any 3D inputs at all — proving models exploit statistical patterns in descriptions, not genuine 3D analysis.

On the stricter Real-3DQA benchmark (context-guessable questions removed, strict 3D reasoning taxonomy), existing 3D-LLM performance drops by more than 60%, with viewpoint-shift consistency tests showing near-total failure.

4.2 Empirical Accuracy — March 2026

Object ID (by name): 82.4%
Relative position (spatial navigation): 21.7%
Absolute distance: 4.1%
Stability prediction: 3.2%
B-Rep topology: 0.8%

Models lack the “sense of physics” that humans develop through evolutionary interaction with the physical world — MBZUAI, 2026

Chapter 5

Industry: Structural Steel

Material hallucinations, fatigue strength, welding, residual stress

5.1 Hallucinations in Materials Science & Fatigue

LLMs frequently exhibit “intrinsic hallucinations” — reasoning errors based on false internal knowledge grounding. A model may propose a steel beam design based on static strength data while completely ignoring cyclic loading that leads to fatigue failure. The nonlinear nature of Wöhler curves (S-N curves) is not understood; models interpolate safety values linearly — which is unacceptable in engineering practice.
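The cost of linear interpolation on an S-N curve can be shown with a small calculation. Basquin's relation treats the high-cycle S-N curve as a straight line in log-log coordinates, so interpolation should be done on logarithms. The two data points below are hypothetical and do not describe any real steel grade.

```python
import math

def life_linear(stress, p1, p2):
    """Naive linear interpolation of cycles to failure between two S-N points."""
    (n1, s1), (n2, s2) = p1, p2
    t = (s1 - stress) / (s1 - s2)
    return n1 + t * (n2 - n1)

def life_loglog(stress, p1, p2):
    """Interpolation along a straight line in log-log space (Basquin-type curve)."""
    (n1, s1), (n2, s2) = p1, p2
    t = (math.log10(s1) - math.log10(stress)) / (math.log10(s1) - math.log10(s2))
    return 10 ** (math.log10(n1) + t * (math.log10(n2) - math.log10(n1)))

# Hypothetical S-N points: (cycles to failure, stress amplitude in MPa).
P1 = (1e4, 400.0)
P2 = (1e6, 200.0)

n_linear = life_linear(300.0, P1, P2)
n_loglog = life_loglog(300.0, P1, P2)

print(n_linear)                        # 505000.0
print(round(n_linear / n_loglog, 1))   # overestimation factor of the linear estimate
```

At 300 MPa the linear estimate is several times higher than the log-log estimate in this example, so a design validated against the linear figure would be non-conservative.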

A component thickness change of just 1.2× can accelerate fatigue crack propagation by 8.5× — but LLMs completely miss this.

| Steel Grade | Yield (MPa) | Failure Characteristics Ignored by LLMs |
|---|---|---|
| E36 (Shipbuilding) | ~355 | High weld seam sensitivity to brittle fracture at low temperatures |
| DC04 (Sheet) | ~210–270 | Strong dependence on strain hardening during bending |
| AISI 316L (Additive) | ~290 | Anisotropy depending on 3D print build orientation |
| S500MC (High-str.) | ~500 | Cracking susceptibility in heat-affected welding zone |

5.2 Welding & Residual Stress

Welding is a complex thermomechanical process causing significant residual stresses and geometric distortions. LLMs lack “world models” that predict how welding heat alters geometry after cooling. They propose weld sequences based on “aesthetic” symmetry — not actual thermal field distribution — causing beam warping beyond tolerances.

Fundamental process understanding belongs to specialized spatiotemporal graph neural networks (STGNNs) trained on FEA simulation data to predict thermal histories — not to general-purpose language models.

Chapter 6

Industry: Furniture & Interior Design

6.1 Ignoring Gravity

LLMs design heavy stone countertops on low-stiffness supports. No understanding of load paths — the continuous connection from mass to ground. Models possess the “texture of expert discourse” but not the physics behind it.

6.2 Ergonomic Violations

Typical outputs include a chair with a 20 cm seat depth and a 1.5 m backrest, or a bar stool with its footrest at 85 cm; both are physically unusable. Root cause: no embodiment. The model has never sat in a chair.
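A rule-based check catches such dimensions before they reach production. The sketch below is an assumption-laden illustration: the limit values in `ASSUMED_LIMITS_CM` are placeholders, not figures from any ergonomic standard, and would need to be replaced by the ranges a design team actually works to.

```python
# Placeholder limits in centimeters (assumptions, NOT taken from any standard).
ASSUMED_LIMITS_CM = {
    "seat_depth": (38.0, 48.0),
    "backrest_height": (30.0, 90.0),
    "footrest_height": (20.0, 45.0),
}

def ergonomic_violations(dims_cm: dict) -> list[str]:
    """Return a message for every dimension outside its assumed range."""
    violations = []
    for name, value in dims_cm.items():
        low, high = ASSUMED_LIMITS_CM[name]
        if not low <= value <= high:
            violations.append(f"{name}={value} cm outside [{low}, {high}] cm")
    return violations

# Dimensions quoted in the text.
chair = {"seat_depth": 20.0, "backrest_height": 150.0}
bar_stool = {"footrest_height": 85.0}

for message in ergonomic_violations(chair) + ergonomic_violations(bar_stool):
    print(message)
```

All three generated dimensions are rejected under these placeholder limits, whereas a chair with a 42 cm seat depth and a 45 cm backrest would pass.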

6.3 Pixel-to-Product Gap

“Slot-machine randomness” — dimensions change unpredictably per generation. Solution: object-level in-place editing + linking visual tokens to real supplier SKUs.

Chapter 7

Physical Inadequacy in Generated Models

7.1 Topological Defects

Non-manifold Geometry

Edges shared by 3+ faces, isolated vertices in void, internal faces intersecting enclosed volumes

Self-intersecting Profiles

AI generates 2D sketches with self-crossing lines, then extrudes them → phantom volumes

Zero-area Faces

Insufficient coordinate precision in Bézier/NURBS surface merging → micro-gaps
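The first defect class, edges shared by three or more faces, is straightforward to detect on a polygonal mesh by counting how many faces use each edge. A minimal sketch, assuming faces are given as tuples of vertex indices; the function name and the test mesh are illustrative.

```python
from collections import Counter

def non_manifold_edges(faces):
    """Return edges (as sorted vertex-index pairs) used by three or more faces."""
    counts = Counter()
    for face in faces:
        n = len(face)
        for k in range(n):
            a, b = face[k], face[(k + 1) % n]
            counts[(min(a, b), max(a, b))] += 1
    return sorted(edge for edge, c in counts.items() if c >= 3)

# A closed tetrahedron: every edge is shared by exactly two faces.
tetra = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
print(non_manifold_edges(tetra))            # []

# Adding a fin on edge (0, 1) makes that edge belong to three faces.
with_fin = tetra + [(0, 1, 4)]
print(non_manifold_edges(with_fin))         # [(0, 1)]
```

The same counter also exposes open boundaries (edges used once), which is a related but separate check.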

7.3 Fatal Manufacturing Planning Errors

FATAL
Absurd Fixturing

AI proposes clamping workpiece, milling, then rotating the clamping block 90° for drilling — not realizing the fixture blocks all tool access

FATAL
Ignoring Rigidity (L/D > 10:1)

Part exceeds 10:1 ratio; AI ignores bending risk. When prompted, suggests rigid tailstock support — which would instantly bend and destroy the thin-walled brass tube

FATAL
Tool Collision & Incorrect Datums

Recommends tools that physically can’t reach the machining zone. Sets Z0 on rough raw stock surface instead of pre-faced surface — impossible micron-level tolerances
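The rigidity rule in the second case is a one-line check that a verification layer can apply before any machining plan is accepted. A minimal sketch; the 10:1 limit comes from the text, while the part dimensions and function names are hypothetical.

```python
def slenderness(length_mm: float, diameter_mm: float) -> float:
    """Length-to-diameter ratio of a turned part."""
    return length_mm / diameter_mm

def needs_rigidity_review(length_mm: float, diameter_mm: float, limit: float = 10.0) -> bool:
    """Flag parts whose L/D ratio exceeds the limit (10:1 in the text)."""
    return slenderness(length_mm, diameter_mm) > limit

# Hypothetical brass tubes.
print(needs_rigidity_review(200.0, 12.0))  # True  (L/D is about 16.7)
print(needs_rigidity_review(100.0, 12.0))  # False (L/D is about 8.3)
```

A flagged part would then go to an engineer or a simulation step to decide on support, rather than relying on the model's suggestion.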

Chapter 8

Economic Consequences & Production Risks

8.1 Geometric Cost Escalation of Errors

$0.57: part cost (GM ignition switch)
$41K: fix at CAD stage (drawing correction)
$23K: wrong material (15 mm vs 1.5 mm steel)
$4.1B: shipped defect (GM total recall losses)

8.3 Automation Bias — The Hidden Multiplier

When an engineer receives a syntactically correct, visually flawless 3D model from an LLM, their critical thinking is dulled. The illusion of competence — impeccable grammar, sophisticated professional jargon — makes the “human-in-the-loop” approach ineffective. Engineers gradually lose the “feel for metal” and spatial intuition needed to detect subtle geometric paradoxes.

| Design Parameter | LLM Error | Manufacturing Consequence |
|---|---|---|
| Tolerances | Zero clearance or confused units | Parts can’t assemble; fusion during printing |
| Wall thickness | Below physical limit | Brittleness, layer skips, part destruction |
| Orientation | Visual aesthetics priority | Low interlayer adhesion, load failure |
| Boolean ops | Overlapping without subtraction | Missing functional holes |

Chapter 9

Legal Regulation & Data Barriers

9.1 EU AI Act

AI in critical infrastructure = high-risk systems. Requires risk management, documentation, logging, human oversight.

Maximum penalty: €35M or 7% of global turnover

9.2 Training Data Deficit

Engineering data (CAD models, test reports, defect maps) is proprietary IP. Models train on simplified textbook examples. STEP files contain complex graphs — unfriendly to ML tokenization.

9.3 Legal Liability

If AI-designed structure fails → liability falls entirely on engineer & design firm. “Black box” opacity makes cause tracing impossible. Legally untenable for critical systems.

| Data Barrier | Problem | Impact on LLM Intelligence |
|---|---|---|
| Trade secret regime | CAD data under NDA, not public | Models train on simplified textbook examples, not real products |
| Modality gap | Text documentation ≠ product geometry | Model knows “how to describe” a defect, not “how to see” it in 3D |
| EU AI Act | Transparency & dataset quality requirements | Restrictions on synthetic data for critical systems |
| Agent liability | Ambiguous fault attribution | Companies reluctant to delegate decision authority to LLMs |

Chapter 10

Strategies for Overcoming Limitations

PINNs, Neuro-Symbolic AI, CAD-Tokenizer, World Models

10.1 Physics-Informed Neural Networks (PINNs)

PINNs embed the governing partial differential equations (PDEs) directly into the loss function and penalize violations of conservation laws, whereas LLMs only minimize text-prediction error.

PINN LOSS FUNCTION
L_PINN = L_data + λ_phys · L_phys + L_BC

L_phys = PDE residual (Navier-Stokes, heat conduction...)
→ penalizes violation of mass/energy conservation

Euler-Bernoulli beam equation for furniture frames:
d²/dx²(EI · d²w/dx²) = q(x)
w = deflection, E = elastic modulus, I = moment of inertia, q = load
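The loss terms above can be evaluated for a concrete beam without any training. This is a minimal sketch under several assumptions: constant EI (so the beam equation reduces to EI·w'''' = q), a simply supported rail with uniform load, a finite-difference fourth derivative in place of automatic differentiation, and only the deflection boundary conditions w(0) = w(L) = 0 in L_BC. All numeric values are hypothetical.

```python
def beam_deflection(x, q, length, ei):
    """Analytic deflection of a simply supported beam under uniform load q."""
    return q * x * (length**3 - 2 * length * x**2 + x**3) / (24 * ei)

def fourth_derivative(w, x, h):
    """Central finite-difference estimate of w''''(x)."""
    return (w(x - 2*h) - 4*w(x - h) + 6*w(x) - 4*w(x + h) + w(x + 2*h)) / h**4

def pinn_loss(w, q, length, ei, data, lam_phys=1.0, h=0.05, n_colloc=9):
    """L_PINN = L_data + lam_phys * L_phys + L_BC for a candidate deflection w(x)."""
    l_data = sum((w(x) - y) ** 2 for x, y in data) / len(data)
    xs = [length * (k + 1) / (n_colloc + 1) for k in range(n_colloc)]
    l_phys = sum((ei * fourth_derivative(w, x, h) - q) ** 2 for x in xs) / n_colloc
    l_bc = w(0.0) ** 2 + w(length) ** 2
    return l_data + lam_phys * l_phys + l_bc

# Hypothetical furniture rail: 1 m span, 40 x 20 mm section, E = 10 GPa, q = 500 N/m.
Q, LENGTH = 500.0, 1.0
EI = 10e9 * 0.04 * 0.02**3 / 12      # E * I with I = b * h^3 / 12

def true_w(x):
    return beam_deflection(x, Q, LENGTH, EI)

def scaled_w(x):
    return 0.5 * true_w(x)           # a candidate that violates the beam equation

data = [(x, true_w(x)) for x in (0.25, 0.5, 0.75)]
loss_true = pinn_loss(true_w, Q, LENGTH, EI, data)
loss_wrong = pinn_loss(scaled_w, Q, LENGTH, EI, data)
print(loss_true < 1e-6, loss_wrong > 1e4)  # True True
```

The candidate that halves the deflection fits the boundary conditions and stays close to the data, yet the physics residual dominates its loss, which is the mechanism by which a PINN rejects physically inconsistent solutions.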

| Parameter | Traditional FEM | PINN (2026) |
|---|---|---|
| Mesh | Complex discretization required | Mesh-free method |
| Noisy data | Sensitive to geometry errors | Robust to noise |
| Parametric speed | Low (full recalculation) | High (fast approximation) |
| LLM integration | Difficult (different domains) | Native through loss functions |

40–85% design time reduction with hybrid LLM + PINN frameworks

10.2 Neuro-Symbolic AI (NSAI)

The most promising approach: combining neural network creativity with strict, deterministic logic of classical symbolic computation (CAD kernels, FEA simulators, physics engines). A closed recursive loop of generation → verification → correction.

Stage 1 — Neural Hypothesis Generation

User describes part → LLM writes draft design script

Stage 2 — Symbolic Verification

Code → strict symbolic engine: compiles geometry, checks self-intersections, tolerances, runs FEA

Stage 3 — Feedback & Regeneration

If failure: mathematical error report → back to LLM → cycle until convergence
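The three stages can be sketched as a control loop. Everything below is a stand-in: `propose` replaces a real LLM call with a deterministic stub, `verify` replaces a CAD kernel or FEA run with a single wall-thickness rule, and the specification format is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Report:
    ok: bool
    message: str
    required_wall_mm: Optional[float] = None

def propose(spec: dict, feedback: Optional[Report]) -> dict:
    """Stage 1 stub: stands in for an LLM drafting a design from the spec."""
    if feedback is None:
        return {"wall_mm": 0.4}                     # naive first draft
    return {"wall_mm": feedback.required_wall_mm}   # draft revised from the report

def verify(design: dict, spec: dict) -> Report:
    """Stage 2 stub: stands in for a deterministic kernel or solver check."""
    if design["wall_mm"] < spec["min_wall_mm"]:
        return Report(False, "wall below minimum", spec["min_wall_mm"])
    return Report(True, "all checks passed")

def nsai_loop(spec: dict, max_iters: int = 5):
    """Stage 3: feed the error report back until the design verifies."""
    feedback = None
    for iteration in range(1, max_iters + 1):
        design = propose(spec, feedback)
        report = verify(design, spec)
        if report.ok:
            return design, iteration
        feedback = report
    raise RuntimeError("no verified design within the iteration budget")

design, iterations = nsai_loop({"min_wall_mm": 1.2})
print(design, iterations)  # {'wall_mm': 1.2} 2
```

The key design choice is that only the symbolic verifier can release a design; the generator never returns output directly to the user.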


| Paradigm | LLM Role | Physics Control | Fatal Error Risk |
|---|---|---|---|
| Pure LLM | Final product from prompt | Absent | EXTREME |
| NSAI | Hypothesis + iterative correction | Strict: FEA/CAD solvers block bad code | LOW |

94% specification accuracy achieved — zero life-threatening hallucinations

10.3 The Tokenization Revolution: CAD-Tokenizer

VQ-VAE compresses operation pairs (e.g., “create 2D sketch” + “extrusion”) into single discrete tokens preserving geometric semantics. Finite-State Automaton (FSA) decoding forcibly injects strict CAD grammar into generation — blocking tokens that would cause self-intersection or B-Rep violations. Adaptive computation allocation lets AI dynamically allocate reasoning time proportional to geometric complexity.

VQ-VAE: modality-specific tokens
FSA Guard: blocks invalid geometry at generation time
Adaptive Compute: more thinking time for complex joints
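Grammar-constrained decoding with a finite-state automaton can be illustrated on a toy command vocabulary. This is not the CAD-Tokenizer grammar: the states, tokens, and scores below are assumptions, and the sketch covers only command ordering, not geometric checks such as self-intersection.

```python
# Toy CAD grammar (assumption): a sketch must be extruded before any
# fillet or chamfer, and "end" is only legal once a solid exists.
TRANSITIONS = {
    "start":    {"sketch": "sketched"},
    "sketched": {"extrude": "solid"},
    "solid":    {"sketch": "sketched", "fillet": "solid",
                 "chamfer": "solid", "end": "done"},
}

def constrained_decode(scores_per_step):
    """At each step pick the highest-scoring token the automaton allows."""
    state, output = "start", []
    for scores in scores_per_step:
        legal = sorted(TRANSITIONS.get(state, {}))
        if not legal:
            break
        # Tokens outside `legal` are masked out regardless of their score.
        token = max(legal, key=lambda t: scores.get(t, float("-inf")))
        output.append(token)
        state = TRANSITIONS[state][token]
    return output

# Hypothetical model scores: the model keeps preferring "fillet" too early.
steps = [
    {"fillet": 0.9, "sketch": 0.1},
    {"fillet": 0.8, "extrude": 0.2},
    {"fillet": 0.7, "end": 0.3},
    {"end": 0.9, "extrude": 0.5},
]

unconstrained = [max(s, key=s.get) for s in steps]
print(unconstrained)              # ['fillet', 'fillet', 'fillet', 'end']
print(constrained_decode(steps))  # ['sketch', 'extrude', 'fillet', 'end']
```

Without the automaton the sequence starts by filleting a solid that does not exist yet; with it, the first fillet is deferred until a sketch has been extruded.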

10.4 World Models & Spatial Intelligence

The future: transition from language models to “world models” — systems that perceive the physical environment through visual-geometric simulation, not text. NVIDIA Cosmos 3: first world foundation model unifying synthetic world generation, visual reasoning, and action simulation. World Labs: “3D as code” — 3D space as universal interface for generating, editing, and simulating worlds.

30× faster than Chain-of-Thought via latent reasoning
Digital Twins: physically accurate training before real deployment
Usage Scenarios: load on chair edge, cabinet door clearance, lighting simulation

Chapter 11

Autodesk Fusion & SolidWorks AURA

Autodesk Fusion 360

AutoConstrain + Reinforcement Learning

Uses RL adapted from LLM research for automatic geometric constraints. Understands “design intent” — changing table width auto-preserves leg symmetry via functional role understanding.

93% fully constrained (RL) vs. 8.9% for the baseline model

SolidWorks AURA

LEO + MARIE Virtual Companions

Specialized LLM in secure cloud with access to vast technical standards database. LEO: mechanical design + simulation, assembly structures, STEP error resolution. MARIE: materials science + chemistry for complex-condition material selection.

2026: “Subdivision-native” workflows — designers work with control meshes while AI updates FEA meshes in real time. The end of LLMs as chatbots, the beginning of integrated engineering partners.

Chapter 12

Summary & Engineering Recommendations

Modern LLMs as of March 2026 remain powerful copilots for scripting and documentation automation but are not full-fledged design engineers. Their weak physics understanding stems from architectural tokenization limitations, absent direct CAD kernel connections, and lack of proprietary physical data access.

LLM hallucinations — confusion of angular/metric quantities, generation of physically meaningless structures — are not temporary bugs but systemic technological limitations driven by natural language tokenization dominance over analytical geometry principles.

Design demands precision that, within the current LLM paradigm, is a statistical coincidence rather than a physical necessity.

Engineering Recommendations for 2026

1. LLMs for base code only

Generate parametric code structure (CadQuery, Dynamo, OpenSCAD) with mandatory manual review of geometric logic

2. Hybrid neuro-symbolic architectures

LLM output filtered through PINNs, classical FEA solvers, or geometric kernels (Parasolid, OpenCASCADE)

3. Monitor model drift & hallucinations

Specialized tools to detect tolerance and material property hallucinations in real time

4. Specialized CAD tokenizers & graph representations

Pointer-CAD and CAD-Tokenizer for deeper AI “grounding” in spatial topology

5. Enforce strict type-wrapping

Wrap all numerical values in strongly typed classes, Length(90) vs Angle(90), so that unit mix-ups block execution at the type level

6. EU AI Act compliance

Risk management systems, technical documentation, and human oversight for critical infrastructure design
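Recommendation 5 can be sketched with two small wrapper classes. The class names `Length` and `Angle` follow the text; the helper functions are illustrative assumptions. In Python the mismatch is rejected at call time (or earlier by a static type checker) rather than by the parser.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Length:
    mm: float

@dataclass(frozen=True)
class Angle:
    deg: float

def translate_x(x_mm: float, offset: Length) -> float:
    """Shift a coordinate along X; only a Length is accepted as the offset."""
    if not isinstance(offset, Length):
        raise TypeError(f"translate_x needs a Length, got {type(offset).__name__}")
    return x_mm + offset.mm

def accepts(func, *args) -> bool:
    """Report whether a call passes the type guard."""
    try:
        func(*args)
        return True
    except TypeError:
        return False

print(translate_x(0.0, Length(90)))            # 90.0
print(accepts(translate_x, 0.0, Angle(90)))    # False: degrees cannot be used as mm
print(Length(90) == Angle(90))                 # False: same number, different quantity
```

With this wrapper the propeller error from Chapter 2 (a 120° spacing passed to a translation) fails immediately instead of producing scattered geometry.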

The future of design lies not in larger vocabularies but in physical grounding

Where the probabilistic nature of neural networks is strictly governed by deterministic mathematical CAD kernels — from discrete tokens to continuous spatial representations.

0.8% B-Rep accuracy · 8.5× crack acceleration · $4.1B max loss · 94% NSAI accuracy · 30× latent speed · 100× fewer errors