What Is an AI Stack? LLMs, RAG, & AI Hardware

From this video: https://www.youtube.com/watch?v=RRKwmeyIc24

The AI Technology Stack Hierarchy

[ Level 5: Application Layer ] (User Interface, Integrations, Usability)
▲
[ Level 4: Orchestration Layer ] (Thinking, Planning, Execution, Feedback Loops)
▲
[ Level 3: Data Layer ] (Data Sources, Pipelines, Vector Databases/RAG)
▲
[ Level 2: Models ] (Intelligence Core: LLMs/SLMs, Reasoning, Code)
▲
[ Level 1: Infrastructure ] (Hardware: GPUs, Cloud/On-premise/Local)

The AI technology stack consists of five distinct, interconnected layers, each playing a specific role in turning raw data into a functional, user-facing AI application. Understanding these layers helps when optimizing an application for performance, speed, cost, and safety.

1. Infrastructure

This is the foundational hardware layer. Because Large Language Models (LLMs) require significant computational power, they typically run on GPUs. There are three primary deployment models:

On-premise: Managing your own dedicated hardware.
Cloud: Renting scalable capacity from providers.
Local: Running smaller models directly on devices like a laptop.
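
As a rough illustration of why model size drives the choice between these deployment options, the sketch below estimates how much memory a model's weights alone occupy at different numeric precisions. It assumes memory is simply parameter count times bytes per parameter and ignores activations, the KV cache, and runtime overhead, so real requirements are higher. The parameter counts are illustrative and not tied to any specific model.

```python
# Rough estimate of the memory needed just to hold a model's weights.
# Assumption: memory = parameters * bytes per parameter. Activations,
# the KV cache, and runtime overhead are ignored, so real needs are higher.

BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,  # bf16 uses the same 2 bytes
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    # Illustrative sizes: a small model versus a large one.
    for label, params in [("3B-parameter model", 3e9), ("70B-parameter model", 70e9)]:
        for precision in ("fp16", "int4"):
            print(f"{label} @ {precision}: ~{weight_memory_gb(params, precision):.1f} GB")
```

Under these assumptions a 3B model quantized to int4 needs about 1.5 GB for its weights, small enough for a laptop, while a 70B model at fp16 needs about 140 GB, more than a single 80 GB data-center GPU provides. That gap is one reason large models usually run on cloud or on-premise hardware.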

2. Models

This is the core intelligence layer. Developers choose models based on three factors:

Licensing: Open versus proprietary models.
Size: Large Language Models generally offer greater reasoning capacity, while Small Language Models (SLMs) are lighter and faster, making them a better fit for tighter hardware constraints.
Specialization: Different models are optimized for specific tasks, such as reasoning, tool calling, or generating code.
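
The three selection factors can be expressed as a simple filter over a catalog of candidates. This is a minimal sketch: the `ModelCard` fields, the catalog entries, and the model names are all hypothetical, and real selection would also weigh benchmarks, latency, and price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCard:
    """Hypothetical summary of a candidate model (all fields illustrative)."""
    name: str
    open_weights: bool          # licensing: open vs. proprietary
    params_billions: float      # size
    strengths: frozenset[str]   # specialization, e.g. {"code", "tool_calling"}


def choose_model(catalog: list[ModelCard], *, need_open: bool,
                 max_params_billions: float, task: str) -> list[ModelCard]:
    """Keep models that meet the licensing and size limits and list `task`
    as a strength, then sort smallest first (cheaper and faster to run)."""
    matches = [
        m for m in catalog
        if (m.open_weights or not need_open)
        and m.params_billions <= max_params_billions
        and task in m.strengths
    ]
    return sorted(matches, key=lambda m: m.params_billions)


# Hypothetical catalog; names and sizes are made up for illustration.
CATALOG = [
    ModelCard("large-generalist", False, 400.0, frozenset({"reasoning", "code"})),
    ModelCard("mid-agent", True, 30.0, frozenset({"tool_calling", "reasoning"})),
    ModelCard("small-coder", True, 7.0, frozenset({"code"})),
]

picks = choose_model(CATALOG, need_open=True, max_params_billions=40, task="code")
print([m.name for m in picks])  # ['small-coder']
```

Sorting the survivors by size reflects the trade-off described above: when several models qualify, the smallest one that fits the task is usually the cheapest to run.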

3. Data

Base models have a fixed training cutoff: they only learned from information available up to a certain date, so anything newer is missing from their knowledge, as is any private or proprietary data they never saw. This layer supplies the context needed to make the AI useful for current or proprietary tasks.

This involves:

Data Sources: Supplying information that is not in the model's training data.
Pipelines: Processing, pre-processing, and post-processing data.

“Pipelines” are the step-by-step “assembly line” that takes raw data, cleans and reshapes it, and then prepares the final version so an AI can use it well.1
Vector Databases/RAG (Retrieval-Augmented Generation): Converting external data into embeddings (numeric vectors) and storing them so that the passages most relevant to a question can be retrieved and passed to the model as context, letting it answer with domain-specific knowledge.
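
The pipeline and retrieval steps can be sketched end to end as a toy. This sketch rests on stated assumptions: `embed` is a hypothetical stand-in that counts words (a real system would call an embedding model), an in-memory list plays the role of the vector database, and the final prompt would be sent to whichever model the application uses. Relevance is scored with cosine similarity, a common choice for comparing embeddings. Mapping `clean`/`chunk` to pre-processing and `build_prompt` to post-processing is this example's interpretation.

```python
import math
import re
from collections import Counter


def clean(text: str) -> str:
    """Pre-processing: collapse whitespace and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()


def chunk(text: str, size: int = 12) -> list[str]:
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Toy vector database: keeps (embedding, chunk) pairs in a list."""

    def __init__(self) -> None:
        self.rows: list[tuple[Counter, str]] = []

    def add(self, text: str) -> None:
        for piece in chunk(clean(text)):
            self.rows.append((embed(piece), piece))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]


def build_prompt(question: str, store: InMemoryVectorStore, k: int = 2) -> str:
    """Post-processing: place the retrieved chunks ahead of the question."""
    context = "\n".join(f"- {c}" for c in store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


store = InMemoryVectorStore()
store.add("Our refund policy allows returns within 30 days of purchase.")
store.add("The office is closed on public holidays.")
print(build_prompt("How many days do I have to return an item?", store, k=1))
```

Here the refund-policy chunk is retrieved because it shares the word "days" with the question, while the holiday chunk shares nothing. A real embedding model would also match related words such as "return" and "returns", which is the main reason production RAG systems use learned embeddings rather than word counts.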

4. Orchestration

This layer acts as the "brain" that manages logic and complex workflows. Instead of a simple prompt-to-answer flow, orchestration performs:

Thinking/Planning: Breaking down complex user queries into smaller, manageable steps.