Cognitive Gap & Physical Grounding


Table of Contents 12 Chapters

From fundamental architecture barriers to industrial solutions and engineering recommendations

1. Introduction & Problem Statement
2. Fundamental Architectural Barriers
3. Geometric Topology & B-Rep
4. The Illusion of Spatial Intelligence
5. Industry: Structural Steel
6. Industry: Furniture & Interior Design
7. Physical Inadequacy in Generated Models
8. Economic Consequences & Risks
9. Legal Regulation & Data Barriers
10. Strategies for Overcoming Limitations
11. Autodesk Fusion & SolidWorks AURA
12. Summary & Recommendations

Chapter 1

Introduction & Problem Statement

As of March 2026, engineering design is characterized by deep integration of large language models (LLMs) into computer-aided design (CAD) and building information modeling (BIM) workflows. Models such as GPT-5, Gemini 2.5 Pro, and Claude Sonnet 4.5 demonstrate unprecedented capabilities in code generation and natural language processing. Despite this progress, however, the industry still faces a fundamental gap between the models’ ability to generate syntactically correct code and their ability to comprehend the physical reality of the products being designed.

Designing real-world structures — whether structural steel for buildings and transportation or furniture for mass production — requires more than manipulating geometric primitives. It demands a deep understanding of continuum mechanics, metallurgy, nonlinear fracture processes, ergonomics, and manufacturing constraints. All of these domains remain beyond the statistical horizon of modern transformers.

Key Manifestations of the Gap

Unit Confusion

Conflation of metric units (millimeters) with angular values (degrees), so that a rotation is emitted as a linear shift

Impossible Objects

Systematic generation of non-manifold geometry, self-intersecting surfaces, and phantom volumes

Topology Violations

Generated machining plans that violate fundamental laws of topology, strength of materials, and engineering judgment

Chapter 2

Fundamental Architectural Barriers

Tokenization, semantic proximity of units, autoregressive error accumulation

2.1 Tokenization & Discretization of Continuous Space

One fundamental reason modern LLMs show poor accuracy in physical object design is the way numerical data is tokenized. Models process information as discrete units (tokens), which contradicts the continuous nature of the physical quantities used in CAD systems. Standard tokenization algorithms such as Byte-Pair Encoding (BPE) split numbers into sub-word fragments that carry no explicit notion of magnitude, turning coordinates and tolerances into sequences of loosely related symbols.

When a model encounters a number representing a fillet radius or steel sheet thickness, that number is split into tokens based on frequency of occurrence in the training corpus, rather than mathematical meaning. For example, "10.5mm" may be split into tokens "10" + "." + "5" + "mm" — losing all mathematical semantics.
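To make the fragmentation concrete, here is a minimal sketch. It assumes a tiny hand-picked vocabulary and a greedy longest-match rule; both are hypothetical simplifications (real BPE vocabularies are learned from corpus statistics and applied through merge rules), chosen only to show how a dimension string breaks apart.

```python
# Toy tokenizer: greedy longest-match over a small, hand-picked vocabulary.
# A simplified stand-in for BPE, used only to illustrate number fragmentation.
TOY_VOCAB = {"10", "11", "1", "0", "5", "9", ".", "mm", "m"}


def greedy_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Split text into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a vocabulary hit.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # Unknown character: emit it as its own token.
            tokens.append(text[i])
            i += 1
    return tokens


print(greedy_tokenize("10.5mm", TOY_VOCAB))   # ['10', '.', '5', 'mm']
print(greedy_tokenize("10.05mm", TOY_VOCAB))  # ['10', '.', '0', '5', 'mm']
```

Nothing in either token sequence states that 10.05 is smaller than 10.5; that relationship has to be inferred statistically from where the extra “0” token appears.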

MATHEMATICAL LOWER BOUND ON ERROR

E_total ≥ ε_token + Σ(i=1..n) ε_M(i) + σ²_sampling

where:
  ε_token    — token quantization error (irreducible)
  ε_M(i)     — autoregressive dependency error at step i
  σ²_sampling — sampling variance

For structural steel: tolerance ≤ 0.5 mm required for assembly
→ direct LLM geometry generation is UNRELIABLE without external verification
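A minimal numeric sketch of the Σ ε_M(i) term, under a deliberately simple assumption: every relative offset in a chain (for example, a row of bolt holes) is emitted rounded to one decimal place, so each step carries the same signed error and the errors add up linearly. The 12.34 mm pitch and the step counts are hypothetical.

```python
def accumulated_drift(step_mm: float, n_steps: int, decimals: int) -> float:
    """Absolute position error after chaining n relative offsets, each
    emitted with only `decimals` digits (a systematic quantization bias)."""
    exact = 0.0
    emitted = 0.0
    for _ in range(n_steps):
        exact += step_mm
        emitted += round(step_mm, decimals)  # value as written with limited digits
    return abs(exact - emitted)


def first_violation(step_mm: float, decimals: int, tol_mm: float, max_steps: int = 1000):
    """1-based index of the first step whose accumulated drift exceeds tol_mm."""
    for n in range(1, max_steps + 1):
        if accumulated_drift(step_mm, n, decimals) > tol_mm:
            return n
    return None


# Hypothetical example: a 12.34 mm hole pitch emitted as 12.3 mm.
print(round(accumulated_drift(12.34, 20, 1), 3))  # 0.8
print(first_violation(12.34, 1, 0.5))             # 13
```

With 0.04 mm lost per step, the 0.5 mm assembly tolerance quoted above is exceeded at the 13th hole. Independent, zero-mean errors would grow more slowly (roughly with √n), but they still grow with every step.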

The “9.9 vs 9.11” problem: many advanced models have been observed to answer that 9.11 is larger than 9.9, plausibly because once the number is split at the decimal point, the token “11” outweighs “9”. In CAD, the same failure means dimensions are chosen by lexical prevalence rather than by their true position on the number line.
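The failure pattern can be reproduced with a two-line heuristic that compares the digits on either side of the decimal point as separate integers, the way version or section numbers are ordered. This is an illustrative sketch of the error pattern, not a claim about how any specific model computes internally; the fix on the tool side is to parse dimensions into exact numbers before comparing them.

```python
from decimal import Decimal


def compare_as_fragments(a: str, b: str) -> str:
    """Faulty heuristic: split at '.' and compare the pieces as integers."""
    a_parts = tuple(int(p) for p in a.split("."))
    b_parts = tuple(int(p) for p in b.split("."))
    return a if a_parts > b_parts else b


def compare_as_numbers(a: str, b: str) -> str:
    """Correct: compare positions on the number line using exact decimals."""
    return a if Decimal(a) > Decimal(b) else b


print(compare_as_fragments("9.9", "9.11"))  # 9.11  (wrong: 11 > 9)
print(compare_as_numbers("9.9", "9.11"))    # 9.9
```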

Tokenization Error Typology

Error Type	Mechanism	Design Consequence
Discretization noise	Continuous coordinates → finite symbol set	Tolerance violations in precision assemblies (aerospace)
Autoregressive accumulation	Single decimal error distorts all subsequent predictions	Geometric drift, structural misalignment
Topological connectivity loss	Semantic link between spatial neighbors severed	Non-manifold edges, infeasible geometries
Scale invariance	Cannot distinguish micro-defects from macro-features	Ignoring stress concentrators in welds
Number fragmentation	Fractional values split into independent tokens	Order-of-magnitude errors (0.1 instead of 0.01)

2.2 Semantic Proximity of Units of Measurement

Vector representations of “millimeter” and “degree” are learned solely from contextual co-occurrence in technical literature, patents, and code. Because both accompany numerical parameters in drawings and parametric scripts, their vectors can end up very close together in the model’s latent space, which produces dangerous substitution errors such as the one below.
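A count-based toy makes the mechanism visible. Assumptions: a hypothetical seven-line corpus of script fragments, every number normalized to a `<NUM>` placeholder, and “context” defined as the two preceding tokens. Transformer embeddings are learned differently, so this is an analogy for distributional similarity, not a measurement of any real model.

```python
import math
from collections import Counter

# Hypothetical mini-corpus of parametric CAD script fragments.
CORPUS = [
    "extrude 10 mm",
    "rotate 90 deg",
    "fillet 2 mm",
    "chamfer 45 deg",
    "offset 5 mm",
    "twist 30 deg",
    "material steel",
]


def context_vector(word: str, corpus: list[str], window: int = 2) -> Counter:
    """Count the tokens preceding `word`, with every number mapped to <NUM>."""
    counts = Counter()
    for line in corpus:
        tokens = ["<NUM>" if t.isdigit() else t for t in line.split()]
        for i, tok in enumerate(tokens):
            if tok == word:
                counts.update(tokens[max(0, i - window):i])
    return counts


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


mm, deg, steel = (context_vector(w, CORPUS) for w in ("mm", "deg", "steel"))
print(round(cosine(mm, deg), 2))    # 0.75
print(round(cosine(mm, steel), 2))  # 0.0
```

Because “mm” and “deg” almost always follow a number, their context vectors overlap far more than either does with a word like “steel”, and nothing in the counts records that one is a length and the other an angle.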

✗ LLM OUTPUT — PROPELLER

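# User: "3-blade propeller, 120° spacing"
# LLM generated a LINEAR OFFSET instead of a ROTATION
# (sketched with CadQuery; the blade is a placeholder box)
import cadquery as cq

blade = cq.Workplane("XY").box(40, 8, 2).translate((20, 0, 0))  # root at origin, tip at +X
blades = [blade.translate((120 * i, 0, 0)) for i in range(3)]  # ← 120 read as mm, not degrees!

# Result: blades DON'T converge at center
# → scattered array of disconnected parts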

✓ CORRECT — ROTATION

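# Proper: rotate each blade about the Z axis through the origin
import cadquery as cq

blade = cq.Workplane("XY").box(40, 8, 2).translate((20, 0, 0))  # root at origin, tip at +X
blades = [
    blade.rotate((0, 0, 0), (0, 0, 1), 120 * i)  # axis start, axis end, angle in degrees
    for i in range(3)
]

# Result: blades meet at center shaft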

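2.3 Autoregressive Error Accumulation

Each token an LLM emits is conditioned on its own previous output, so a single numeric error early in a script propagates into every later prediction. A CAD kernel, by contrast, evaluates every operation analytically against the current model state. The two computation models compare as follows: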

Characteristic	Algorithmic CAD Kernels (OpenCASCADE)	Large Language Models (LLMs)
Nature of computation	Deterministic, analytical geometry	Probabilistic, statistical token prediction
Parameter handling	Strict typing (lengths, angles, radii have meaning)	Linguistic embeddings (numbers are just text)
Topological control	Continuous B-Rep integrity checking	Absent; only external tools
Relationship to physics	Built-in constraints on self-intersections	Ignores physics for lexical coherence

Chapter 3

Geometric Topology & B-Rep Paradigm

3.1 The Paradigm Gap

Modern CAD systems use Boundary Representation (B-Rep): a complex structure in which topological entities (faces, edges, vertices) are linked to geometric surfaces and curves. The core problem is that LLMs perceive CAD models as sequences of text commands (OpenSCAD or CadQuery scripts) and have no direct view of the spatial relationships between elements.

Operations like “chamfer” or “fillet” require explicit selection of B-Rep primitives from the transient model state at execution time. Without access to the geometric kernel during reasoning, LLMs frequently predict operations on non-existent or incorrectly identified faces.
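A toy model shows why edge references must be resolved against the live topology. It assumes a single box whose faces are named by outward normal; this naming scheme and the `fillet` helper are hypothetical (real kernels identify entities through their own topological data structures). A request for “the edge between the top and bottom faces” reads plausibly as text but names nothing in the B-Rep.

```python
from itertools import combinations

# Faces of an axis-aligned box, named by outward normal (hypothetical scheme).
FACES = ["+X", "-X", "+Y", "-Y", "+Z", "-Z"]


def box_edges(faces: list[str]) -> set[frozenset]:
    """An edge of a box is the boundary shared by two faces on different axes."""
    return {frozenset((a, b)) for a, b in combinations(faces, 2) if a[1] != b[1]}


def fillet(edges: set[frozenset], face_a: str, face_b: str, radius_mm: float) -> dict:
    """Resolve the edge against the current topology before applying the operation."""
    edge = frozenset((face_a, face_b))
    if edge not in edges:
        raise ValueError(f"no edge between {face_a} and {face_b} in current B-Rep")
    return {"op": "fillet", "edge": tuple(sorted(edge)), "radius_mm": radius_mm}


edges = box_edges(FACES)
print(len(edges))                      # 12
print(fillet(edges, "+Z", "+X", 2.0))  # valid: adjacent faces share an edge
try:
    fillet(edges, "+Z", "-Z", 2.0)     # plausible-sounding but non-existent edge
except ValueError as err:
    print(err)
```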

3.2 Specialized Architectures: Pointer-CAD & Graph Neural Networks

By early 2026, architectures like Pointer-CAD and FutureCAD encode B-Rep as an undirected face-adjacency graph G(V, E) using Graph Neural Networks (GNNs), where nodes = faces, edges = shared boundaries.

GNN B-REP ENCODER — NODE FEATURE UPDATE

h_i^(k) = φ^(k)((1 + ε^(k)) · h_i^(k-1) + Σ(j∈N(i)) f_Θ(h_ij^(k-1)) ⊙ h_j^(k-1))

Pointer mechanism anchors to existing B-Rep primitives
→ segmentation errors reduced by 2 ORDERS OF MAGNITUDE
→ reliable generation of chamfers & fillets for industrial design

100×: fewer segmentation errors than pure autoregressive generation

B-Rep topology visible before each generation step
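The node update above can be written out directly. This is a minimal pure-Python sketch with stated assumptions: f_Θ is a hypothetical elementwise sigmoid gate on edge features, φ is a ReLU (in practice a learned MLP), and the three-face graph is invented for illustration. It is not the Pointer-CAD implementation.

```python
import math


def gnn_layer(h, edge_feat, adjacency, eps, f_theta, phi):
    """One update h_i <- phi((1 + eps) * h_i + sum_j f_theta(h_ij) * h_j), elementwise."""
    new_h = {}
    for i, neighbors in adjacency.items():
        agg = [(1 + eps) * x for x in h[i]]                       # weighted self term
        for j in neighbors:
            gate = f_theta(edge_feat[frozenset((i, j))])           # edge-conditioned weights
            agg = [a + g * x for a, g, x in zip(agg, gate, h[j])]  # Hadamard product, summed
        new_h[i] = phi(agg)
    return new_h


def sigmoid_gate(e):
    """Hypothetical f_theta: elementwise sigmoid of the edge feature vector."""
    return [1.0 / (1.0 + math.exp(-x)) for x in e]


def relu(v):
    """Hypothetical phi: ReLU (a real encoder would use a learned MLP)."""
    return [max(0.0, x) for x in v]


# Hypothetical three-face part: nodes = faces, edges = shared boundaries (A-B, B-C).
h = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
edge_feat = {frozenset(("A", "B")): [0.0, 0.0], frozenset(("B", "C")): [0.0, 0.0]}

print(gnn_layer(h, edge_feat, adjacency, 0.0, sigmoid_gate, relu))
# {'A': [1.0, 0.5], 'B': [1.0, 1.5], 'C': [1.0, 1.5]}
```

Each face’s new feature mixes in only its topological neighbors, which is what lets a pointer head later select a face that actually exists in the current model.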

Chapter 4

The Illusion of Spatial Intelligence

4.1 Textual Shortcuts & Benchmarks

Modern 3D-LLMs are presented as understanding three-dimensional scenes, but research from early 2026 reveals a critical dependence on linguistic “textual shortcuts.” On popular benchmarks such as SQA3D, high scores can be achieved by fine-tuning a text-only model on question-answer pairs without any 3D input at all, which indicates that models exploit statistical patterns in the descriptions rather than performing genuine 3D analysis.

On the stricter Real-3DQA benchmark (context-guessable questions removed, strict 3D reasoning taxonomy), existing 3D-LLM performance drops by more than 60%, with viewpoint-shift consistency tests showing near-total failure.

4.2 Empirical Accuracy — March 2026

Task	Accuracy
Object identification (by name)	82.4%
Relative position (spatial navigation)	21.7%
Absolute distance	4.1%
Stability prediction	3.2%
B-Rep topology	0.8%

Models lack the “sense of physics” that humans develop through evolutionary interaction with the physical world — MBZUAI, 2026

Chapter 5

Industry: Structural Steel

Material hallucinations, fatigue strength, welding, residual stress

5.1 Hallucinations in Materials Science & Fatigue

LLMs frequently exhibit “intrinsic hallucinations”: reasoning errors rooted in false internal knowledge. A model may propose a steel beam design based on static strength data while ignoring the cyclic loading that leads to fatigue failure. Models do not capture the nonlinear shape of Wöhler curves (S-N curves) and instead interpolate safety values linearly, which is unacceptable in engineering practice.
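A short worked example shows why linear interpolation is unsafe. It assumes a Basquin-type curve N = N_C·(S_C/S)^m with slope m = 3 (a slope commonly used for welded steel details) and a hypothetical reference stress range of 90 MPa at 2×10⁶ cycles; the numbers are illustrative, not design values.

```python
import math

# Hypothetical welded detail: N = N_C * (S_C / S) ** m.
S_C, N_C, M = 90.0, 2e6, 3.0


def basquin_life(s_mpa: float) -> float:
    """Cycles to failure at stress range s_mpa on the assumed S-N curve."""
    return N_C * (S_C / s_mpa) ** M


# Two "known" points, as they might appear in a table.
s1, n1 = 90.0, basquin_life(90.0)    # 2,000,000 cycles
s2, n2 = 180.0, basquin_life(180.0)  # 250,000 cycles


def linear_interp(s: float) -> float:
    """Naive: a straight line in linear S-N space."""
    t = (s - s1) / (s2 - s1)
    return n1 + t * (n2 - n1)


def loglog_interp(s: float) -> float:
    """Correct for a Basquin curve: a straight line in log S vs. log N."""
    t = (math.log(s) - math.log(s1)) / (math.log(s2) - math.log(s1))
    return math.exp(math.log(n1) + t * (math.log(n2) - math.log(n1)))


print(round(linear_interp(135.0)))  # 1125000 -> overestimates life
print(round(loglog_interp(135.0)))  # 592593
print(round(basquin_life(135.0)))   # 592593
```

At 135 MPa the linear estimate predicts about 1.9 times the actual life, an error on the unsafe side. On log-log axes the curve is a straight line, so log-log interpolation recovers it exactly.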

A component thickness change of just 1.2× can accelerate fatigue crack propagation by 8.5× — but LLMs completely miss this.

Steel Grade	Yield (MPa)	Failure Characteristics Ignored by LLMs
E36 (Shipbuilding)	~355	High weld seam sensitivity to brittle fracture at low temperatures
DC04 (Sheet)	~210–270	Strong dependence on strain hardening during bending
AISI 316L (Additive)	~290	Anisotropy depending on 3D print build orientation
S500MC (High-str.)	~500	Cracking susceptibility in heat-affected welding zone

5.2 Welding & Residual Stress

Welding is a complex thermomechanical process causing significant residual stresses and geometric distortions. LLMs lack “world models” that predict how welding heat alters geometry after cooling. They propose weld sequences based on “aesthetic” symmetry — not actual thermal field distribution — causing beam warping beyond tolerances.

Predicting how welding heat reshapes a part is better served by specialized spatiotemporal graph neural networks (STGNNs) trained on FEA simulation data to predict thermal histories than by general-purpose language models.

Chapter 6

Industry: Furniture & Interior Design

6.1 Ignoring Gravity

LLMs design heavy stone countertops on low-stiffness supports. They show no understanding of load paths: the continuous chain of supporting members that carries a mass down to the ground. Models possess the “texture of expert discourse” but not the physics behind it.
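The load-path idea can be stated as a graph property: every loaded element must reach the ground through a chain of supports. The sketch below checks only that connectivity, which is a necessary but not sufficient condition; the layout and element names are hypothetical, and a real check would also verify the stiffness and capacity of each link, which is exactly where the countertop example fails.

```python
from collections import deque


def has_load_path(rests_on: dict[str, list[str]], element: str, ground: str = "ground") -> bool:
    """Breadth-first search from a loaded element down through its supports."""
    seen = {element}
    queue = deque([element])
    while queue:
        node = queue.popleft()
        if node == ground:
            return True
        for support in rests_on.get(node, []):
            if support not in seen:
                seen.add(support)
                queue.append(support)
    return False


# Hypothetical kitchen layout: which element bears on which.
layout = {
    "stone_countertop": ["cabinet_left", "cabinet_right"],
    "cabinet_left": ["ground"],
    "cabinet_right": ["ground"],
    "floating_shelf": [],  # nothing underneath and no wall fixing modeled
}
print(has_load_path(layout, "stone_countertop"))  # True
print(has_load_path(layout, "floating_shelf"))    # False
```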

6.2 Ergonomic Violations

Typical outputs include a chair with a 20 cm seat depth and a 1.5 m backrest, or a bar stool with its footrest at 85 cm, which is physically unusable. The root cause is the lack of embodiment: the model has never sat in a chair.
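Because the model lacks this embodied knowledge, it has to be supplied externally as explicit rules that run after generation. A minimal sketch: the numeric ranges below are illustrative placeholders (assumptions, not values from an ergonomics standard), plus one relational rule that needs no standard at all.

```python
# Illustrative limits in cm: assumptions for this sketch, not standard values.
ASSUMED_LIMITS_CM = {
    "seat_depth": (38, 48),
    "backrest_height": (30, 60),  # measured above the seat
}


def check_seating(spec: dict[str, float]) -> list[str]:
    """Return human-readable violations for a chair or stool specification."""
    problems = []
    for key, (lo, hi) in ASSUMED_LIMITS_CM.items():
        if key in spec and not lo <= spec[key] <= hi:
            problems.append(f"{key} = {spec[key]} cm, outside assumed {lo}-{hi} cm")
    # A footrest at or above the seat cannot be reached while seated.
    if "footrest_height" in spec and "seat_height" in spec:
        if spec["footrest_height"] >= spec["seat_height"]:
            problems.append("footrest is not below the seat")
    return problems


generated = {"seat_depth": 20, "backrest_height": 150, "seat_height": 75, "footrest_height": 85}
for issue in check_seating(generated):
    print(issue)
```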

6.3 Pixel-to-Product Gap

“Slot-machine randomness”: dimensions change unpredictably from one generation to the next. Proposed remedies are object-level in-place editing and linking visual tokens to real supplier SKUs.
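Linking generated geometry to real products can be as simple as snapping each generated dimension set to the nearest catalog item and refusing anything outside a tolerance. The catalog, SKU names, and 50 mm threshold below are hypothetical.

```python
# Hypothetical supplier catalog: SKU -> (width, depth, height) in mm.
CATALOG = {
    "TBL-1200": (1200, 600, 750),
    "TBL-1400": (1400, 700, 750),
    "TBL-1600": (1600, 800, 750),
}


def snap_to_sku(dims_mm: tuple[float, float, float], catalog=CATALOG, max_dev_mm=50.0):
    """Pick the SKU whose largest per-axis deviation is smallest; reject if too far."""
    def worst_deviation(sku):
        return max(abs(a - b) for a, b in zip(dims_mm, catalog[sku]))

    best = min(catalog, key=worst_deviation)
    deviation = worst_deviation(best)
    return (best, deviation) if deviation <= max_dev_mm else (None, deviation)


print(snap_to_sku((1387, 705, 742)))  # ('TBL-1400', 13)
print(snap_to_sku((1900, 950, 740)))  # (None, 300)
```

Once a SKU is chosen, its catalog dimensions (not the generated ones) drive later edits, so regenerating the scene cannot silently change them.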

Chapter 7

Physical Inadequacy in Generated Models

7.1 Topological Defects

Non-manifold Geometry

Edges shared by three or more faces, isolated vertices floating in empty space, and internal faces cutting through enclosed volumes

Self-intersecting Profiles

AI generates 2D sketches with self-crossing lines, then extrudes them → phantom volumes

Zero-area Faces

Insufficient coordinate precision in Bézier/NURBS surface merging → micro-gaps
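Two of these defects can be detected with a few lines of plain geometry. The sketch below assumes a triangle-mesh representation (vertex indices per face) rather than a full B-Rep, and a polygonal sketch profile; both are simplifications of what a CAD kernel checks, and the sample data is made up.

```python
from collections import Counter
from itertools import combinations


def non_manifold_edges(triangles):
    """In a closed 2-manifold every edge is shared by exactly two faces."""
    counts = Counter()
    for tri in triangles:
        for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            counts[frozenset((a, b))] += 1
    return {edge for edge, n in counts.items() if n != 2}


def _orient(p, q, r):
    """Sign of the cross product (q - p) x (r - p)."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def segments_cross(p1, p2, q1, q2):
    """Proper crossing test for 2D segments (collinear overlaps not handled)."""
    d1, d2 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    d3, d4 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def profile_self_intersects(points):
    """Check a closed sketch profile for crossings between non-adjacent edges."""
    edges = [(points[i], points[(i + 1) % len(points)]) for i in range(len(points))]
    n = len(edges)
    for i, j in combinations(range(n), 2):
        if j == i + 1 or (i == 0 and j == n - 1):
            continue  # adjacent edges share a vertex by construction
        if segments_cross(*edges[i], *edges[j]):
            return True
    return False


tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(non_manifold_edges(tetrahedron))                  # set()
print(non_manifold_edges(tetrahedron + [(0, 1, 4)]))    # fin: edge (0, 1) now has 3 faces
print(profile_self_intersects([(0, 0), (2, 2), (2, 0), (0, 2)]))  # True: bow-tie
print(profile_self_intersects([(0, 0), (2, 0), (2, 2), (0, 2)]))  # False: square
```

Extruding the bow-tie profile is exactly how a phantom volume arises; rejecting it at the sketch stage is far cheaper than repairing the solid afterwards.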

7.3 Fatal Manufacturing Planning Errors

FATAL

Absurd Fixturing

The AI proposes clamping the workpiece, milling it, and then rotating the clamping block 90° for drilling, without realizing that the fixture blocks all tool access.

FATAL

Ignoring Rigidity (L/D > 10:1)

The part exceeds a 10:1 length-to-diameter ratio, yet the AI ignores the bending risk. When prompted, it suggests rigid tailstock support, which would instantly bend and destroy the thin-walled brass tube.

FATAL

Tool Collision & Incorrect Datums

It recommends tools that physically cannot reach the machining zone, and it sets Z0 on the rough raw-stock surface instead of a pre-faced surface, which makes micron-level tolerances impossible to hold.

Chapter 8

Economic Consequences & Production Risks

8.1 Geometric Cost Escalation of Errors

Amount	Item	Detail
$0.57	Part cost	GM ignition switch
$41K	Fix at CAD stage	Drawing correction
$23K	Wrong material	15 mm vs 1.5 mm steel
$4.1B	Shipped defect	GM total recall losses

8.3 Automation Bias — The Hidden Multiplier

When an engineer receives a syntactically correct, visually flawless 3D model from an LLM, their critical thinking is dulled. The illusion of competence — impeccable grammar, sophisticated professional jargon — makes the “human-in-the-loop” approach ineffective. Engineers gradually lose the “feel for metal” and spatial intuition needed to detect subtle geometric paradoxes.

Design Parameter	LLM Error	Manufacturing Consequence
Tolerances	Zero clearance or confused units	Parts can’t assemble; fusion during printing
Wall thickness	Below physical limit	Brittleness, layer skips, part destruction
Orientation	Visual aesthetics priority	Low interlayer adhesion, load failure
Boolean ops	Overlapping without subtraction	Missing functional holes

Chapter 9

Legal Regulation & Data Barriers

9.1 EU AI Act

AI used in critical infrastructure is classified as high-risk, which requires risk management, technical documentation, logging, and human oversight.

Maximum penalty: €35M or 7% of global annual turnover, whichever is higher

9.2 Training Data Deficit

Engineering data (CAD models, test reports, defect maps) is proprietary IP. Models train on simplified textbook examples. STEP files contain complex graphs — unfriendly to ML tokenization.

9.3 Legal Liability

If an AI-designed structure fails, liability falls entirely on the engineer and the design firm. “Black box” opacity makes tracing the cause practically impossible, which is legally untenable for critical systems.

Data Barrier	Problem	Impact on LLM Intelligence
Trade secret regime	CAD data under NDA — not public	Models train on simplified textbook examples, not real products
Modality gap	Text documentation ≠ product geometry	Model knows “how to describe” a defect, not “how to see” it in 3D
EU AI Act	Transparency & dataset quality requirements	Restrictions on synthetic data for critical systems
Agent liability	Ambiguous fault attribution	Companies reluctant to delegate decision authority to LLMs

Chapter 10

Strategies for Overcoming Limitations

PINNs, Neuro-Symbolic AI, CAD-Tokenizer, World Models

10.1 Physics-Informed Neural Networks (PINNs)

PINNs embed governing differential equations (PDEs) directly into the loss function, penalizing violations of conservation laws — unlike LLMs which only minimize text prediction error.

PINN LOSS FUNCTION

L_PINN = L_data + λ_phys · L_phys + L_BC

L_phys = PDE residual (Navier-Stokes, heat conduction...)
→ penalizes violation of mass/energy conservation

Euler-Bernoulli beam equation for furniture frames:
d²/dx²(EI · d²w/dx²) = q(x)
w = deflection, E = elastic modulus, I = moment of inertia, q = load
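The physics term can be evaluated for any candidate deflection field. This sketch replaces the neural network and automatic differentiation of a real PINN with a plain Python function and a fourth-order central finite difference; the beam (simply supported, uniform load, EI = q = L = 1 in consistent units) is a hypothetical test case whose exact solution is w(x) = q·x·(L³ − 2Lx² + x³)/(24·EI).

```python
# Hypothetical test case in consistent (nondimensional) units.
L, EI, Q = 1.0, 1.0, 1.0


def exact_w(x: float) -> float:
    """Closed-form deflection of a simply supported beam under uniform load."""
    return Q * x * (L**3 - 2 * L * x**2 + x**3) / (24 * EI)


def parabola_w(x: float) -> float:
    """A plausible-looking but physically wrong candidate (zero fourth derivative)."""
    return 0.1 * x * (L - x)


def d4(w, x: float, h: float = 0.05) -> float:
    """Fourth derivative by central differences (exact for polynomials up to degree 5)."""
    return (w(x - 2 * h) - 4 * w(x - h) + 6 * w(x) - 4 * w(x + h) + w(x + 2 * h)) / h**4


def pinn_style_loss(w, lam_phys: float = 1.0, n: int = 9) -> float:
    """lam_phys * L_phys + L_BC; no measurements here, so L_data = 0."""
    xs = [0.1 + 0.8 * k / (n - 1) for k in range(n)]        # interior collocation points
    l_phys = sum((EI * d4(w, x) - Q) ** 2 for x in xs) / n  # residual of (EI w'')'' = q
    l_bc = w(0.0) ** 2 + w(L) ** 2                          # w(0) = w(L) = 0
    return lam_phys * l_phys + l_bc


print(f"{pinn_style_loss(exact_w):.1e}")    # near zero (floating-point noise only)
print(round(pinn_style_loss(parabola_w), 6))  # 1.0
```

Both candidates satisfy the displacement boundary conditions, so only the physics term separates them; a loss built from data and boundary terms alone would accept the wrong shape.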

Parameter	Traditional FEM	PINN (2026)
Mesh	Complex discretization required	Mesh-free method
Noisy data	Sensitive to geometry errors	Robust to noise
Parametric speed	Low (full recalculation)	High (fast approximation)
LLM integration	Difficult (different domains)	Native through loss functions

40–85% design time reduction with hybrid LLM + PINN frameworks

10.2 Neuro-Symbolic AI (NSAI)

The most promising approach: combining neural network creativity with strict, deterministic logic of classical symbolic computation (CAD kernels, FEA simulators, physics engines). A closed recursive loop of generation → verification → correction.

Stage 1 — Neural Hypothesis Generation

User describes part → LLM writes draft design script

Stage 2 — Symbolic Verification

Code → strict symbolic engine: compiles geometry, checks self-intersections, tolerances, runs FEA

Stage 3 — Feedback & Regeneration

If failure: mathematical error report → back to LLM → cycle until convergence
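The three-stage loop maps onto a small control structure. In this sketch the “LLM” is a hypothetical stub that applies whatever correction the verifier reports, and the verifier checks two made-up rules (a minimum wall thickness and a hole that must fit inside the part); in a real system these would be a language model and a CAD/FEA toolchain.

```python
def verify(design: dict, min_wall_mm: float = 1.2) -> list[tuple[str, float]]:
    """Stage 2 stand-in: deterministic checks returning machine-readable errors."""
    errors = []
    if design["wall_mm"] < min_wall_mm:
        errors.append(("wall_mm", min_wall_mm))
    if design["hole_mm"] >= design["width_mm"]:
        errors.append(("hole_mm", design["width_mm"]))
    return errors


def stub_regenerate(design: dict, errors: list[tuple[str, float]]) -> dict:
    """Stage 3 stand-in for the LLM: revise the draft using the error report."""
    revised = dict(design)
    for field, limit in errors:
        if field == "wall_mm":
            revised["wall_mm"] = limit        # raise to the minimum
        elif field == "hole_mm":
            revised["hole_mm"] = 0.5 * limit  # shrink hole to half the width
    return revised


def design_loop(draft: dict, max_iters: int = 5) -> tuple[dict, int]:
    """Generate -> verify -> correct until the verifier reports no errors."""
    design = draft
    for iteration in range(max_iters):
        errors = verify(design)
        if not errors:
            return design, iteration
        design = stub_regenerate(design, errors)
    raise RuntimeError("no verified design within the iteration budget")


draft = {"wall_mm": 0.6, "hole_mm": 12.0, "width_mm": 10.0}  # Stage 1 output (hypothetical)
print(design_loop(draft))  # ({'wall_mm': 1.2, 'hole_mm': 5.0, 'width_mm': 10.0}, 1)
```

The key design choice is that only the symbolic verifier can declare success; the generator never certifies its own output.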

Paradigm	LLM Role	Physics Control	Fatal Error Risk
Pure LLM	Final product from prompt	Absent	EXTREME
NSAI	Hypothesis + iterative correction	Strict — FEA/CAD solvers block bad code	LOW

94% specification accuracy achieved — zero life-threatening hallucinations

10.3 The Tokenization Revolution: CAD-Tokenizer

VQ-VAE compresses operation pairs (e.g., “create 2D sketch” + “extrusion”) into single discrete tokens preserving geometric semantics. Finite-State Automaton (FSA) decoding forcibly injects strict CAD grammar into generation — blocking tokens that would cause self-intersection or B-Rep violations. Adaptive computation allocation lets AI dynamically allocate reasoning time proportional to geometric complexity.

VQ-VAE: modality-specific tokens

FSA Guard: blocks invalid geometry at generation time

Adaptive Compute: more thinking time for complex joints
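FSA-constrained decoding can be shown in a few lines: at each step, tokens the grammar does not allow are masked out before the argmax. The grammar, token names, and scoring function below are hypothetical stand-ins, not CAD-Tokenizer’s actual vocabulary or model.

```python
# Hypothetical CAD grammar: a sketch must be closed before it can be extruded.
TRANSITIONS = {
    "start":     {"sketch_begin": "in_sketch"},
    "in_sketch": {"line": "in_sketch", "arc": "in_sketch", "sketch_close": "closed"},
    "closed":    {"extrude": "solid"},
    "solid":     {"sketch_begin": "in_sketch", "fillet": "solid", "end": "done"},
}


def toy_scores(prefix: list[str]) -> dict[str, float]:
    """Stand-in for model logits: it always 'wants' to extrude immediately."""
    scores = {"extrude": 5.0, "end": 4.0, "line": 3.0, "sketch_close": 2.0, "sketch_begin": 1.0}
    if prefix.count("line") >= 3:
        scores["line"] = 0.0  # after three lines, prefer closing the sketch
    return scores


def constrained_decode(score_fn, max_len: int = 20) -> list[str]:
    """Greedy decoding in which the automaton masks every illegal token."""
    state, out = "start", []
    while state != "done" and len(out) < max_len:
        allowed = TRANSITIONS[state]  # the FSA mask for this state
        scores = score_fn(out)
        token = max(allowed, key=lambda t: scores.get(t, float("-inf")))
        out.append(token)
        state = allowed[token]
    return out


unconstrained_first = max(toy_scores([]).items(), key=lambda kv: kv[1])[0]
print(unconstrained_first)             # extrude  (not a legal first operation)
print(constrained_decode(toy_scores))
# ['sketch_begin', 'line', 'line', 'line', 'sketch_close', 'extrude', 'end']
```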

10.4 World Models & Spatial Intelligence

The future: transition from language models to “world models” — systems that perceive the physical environment through visual-geometric simulation, not text. NVIDIA Cosmos 3: first world foundation model unifying synthetic world generation, visual reasoning, and action simulation. World Labs: “3D as code” — 3D space as universal interface for generating, editing, and simulating worlds.

30×: faster than Chain-of-Thought via latent reasoning

Digital Twins: physically accurate training before real deployment

Usage Scenarios: load on a chair edge, cabinet door clearance, lighting simulation

Chapter 11

Autodesk Fusion & SolidWorks AURA

Autodesk Fusion 360

AutoConstrain + Reinforcement Learning

Uses reinforcement learning (RL) techniques adapted from LLM research to apply geometric constraints automatically. It captures “design intent”: changing a table’s width automatically preserves leg symmetry because the model understands each element’s functional role.

Fully constrained: 93% (RL) vs. 8.9% (baseline model)
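“Design intent” here means that leg positions are expressed as relations to the tabletop rather than as absolute coordinates. A minimal parametric sketch of that idea; the function name and the 50 mm inset are hypothetical, and this is not Fusion’s API.

```python
def leg_positions(width_mm: float, depth_mm: float, inset_mm: float = 50.0) -> list[tuple[float, float]]:
    """Leg centers defined relative to the tabletop, so symmetry survives any resize."""
    hx = width_mm / 2 - inset_mm  # half-span between legs along the width
    hy = depth_mm / 2 - inset_mm  # half-span between legs along the depth
    return [(-hx, -hy), (hx, -hy), (hx, hy), (-hx, hy)]


for width in (1200, 1600):
    print(width, leg_positions(width, 800))
# 1200 [(-550.0, -350.0), (550.0, -350.0), (550.0, 350.0), (-550.0, 350.0)]
# 1600 [(-750.0, -350.0), (750.0, -350.0), (750.0, 350.0), (-750.0, 350.0)]
```

Symmetry holds for every width because it is encoded as a constraint, not reproduced by coincidence; an RL-based auto-constrainer has to infer relations like these from a sketch that lacks them.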

SolidWorks AURA

LEO + MARIE Virtual Companions

A specialized LLM running in a secure cloud with access to a large database of technical standards. LEO handles mechanical design and simulation, assembly structures, and STEP error resolution. MARIE covers materials science and chemistry for material selection under complex operating conditions.


2026: “Subdivision-native” workflows — designers work with control meshes while AI updates FEA meshes in real time. The end of LLMs as chatbots, the beginning of integrated engineering partners.

Chapter 12

Summary & Engineering Recommendations

As of March 2026, modern LLMs remain powerful copilots for scripting and documentation automation, but they are not full-fledged design engineers. Their weak grasp of physics stems from the architectural limitations of tokenization, the absence of a direct connection to CAD kernels, and a lack of access to proprietary physical data.

LLM hallucinations, such as the confusion of angular and metric quantities or the generation of physically meaningless structures, are not temporary bugs but systemic technological limitations, driven by the dominance of natural-language tokenization over the principles of analytical geometry.

Design demands precision that, within the current LLM paradigm, is a statistical coincidence rather than a physical necessity.

Engineering Recommendations for 2026

1. LLMs for base code only. Generate the parametric code structure (CadQuery, Dynamo, OpenSCAD) with mandatory manual review of the geometric logic.

2. Hybrid neuro-symbolic architectures. Filter LLM output through PINNs, classical FEA solvers, or geometric modeling kernels (Parasolid, OpenCASCADE).

3. Monitor model drift and hallucinations. Use specialized tools to detect tolerance and material-property hallucinations in real time.

4. Specialized CAD tokenizers and graph representations. Use Pointer-CAD, CAD-Tokenizer, and similar approaches for deeper AI “grounding” in spatial topology.

5. Enforce strict type-wrapping. Wrap all numerical values in strongly typed classes, e.g. Length(90) vs. Angle(90), so that unit mismatches are blocked before execution.

6. EU AI Act compliance. Provide risk management systems, technical documentation, and human oversight for critical infrastructure design.
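A minimal sketch of the type-wrapping recommendation in Python. The class and function names are hypothetical. The `isinstance` checks reject a mismatch at runtime, and because the parameters are annotated, a static checker such as mypy would also flag `rotate_about_z(tip, Length(120))` before the script runs.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Length:
    mm: float


@dataclass(frozen=True)
class Angle:
    deg: float

    @property
    def rad(self) -> float:
        return math.radians(self.deg)


def rotate_about_z(point: tuple[float, float], angle: Angle) -> tuple[float, float]:
    """Rotate a 2D point about the origin; refuses anything that is not an Angle."""
    if not isinstance(angle, Angle):
        raise TypeError(f"expected Angle, got {type(angle).__name__}")
    c, s = math.cos(angle.rad), math.sin(angle.rad)
    x, y = point
    return (c * x - s * y, s * x + c * y)


def translate_x(point: tuple[float, float], offset: Length) -> tuple[float, float]:
    """Shift a 2D point along X; refuses anything that is not a Length."""
    if not isinstance(offset, Length):
        raise TypeError(f"expected Length, got {type(offset).__name__}")
    return (point[0] + offset.mm, point[1])


tip = (40.0, 0.0)
print(rotate_about_z(tip, Angle(120)))  # approximately (-20.0, 34.64)
try:
    rotate_about_z(tip, Length(120))    # the propeller bug, now rejected
except TypeError as err:
    print(err)                          # expected Angle, got Length
```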

References

40+ Sources & References

The future of design lies not in larger vocabularies but in physical grounding: systems in which the probabilistic nature of neural networks is strictly governed by deterministic mathematical CAD kernels, moving from discrete tokens to continuous spatial representations.

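Key figure	Meaning
0.8%	B-Rep topology accuracy
8.5×	Fatigue crack propagation acceleration
$4.1B	Maximum loss (GM recall)
94%	NSAI specification accuracy
30×	Latent reasoning speedup
100×	Fewer segmentation errors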
