
Step 2: Configuration & LLM Loading

Goal for this Step

To write the Python code that will configure our project and load the downloaded TinyLlama model into a LangChain-compatible object, ready for use in our prompt engineering workbench.


2.1. Why Use a Configuration File?

As projects grow, it's a best practice to keep important settings, like file paths or model names, in a central location. This makes the application easier to maintain and update. If we ever want to use a different model, we only have to change it in one place.

graph LR
    A[config.py] --> B[llm_loader.py]
    A --> C[chains.py]
    A --> D[ui.py]

    style A fill:#e8f5e8
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e1f5fe

The Code: src/config.py

Create a file at src/config.py inside your PromptCraft project folder and add the following code:

# src/config.py

import os

# --- Model Configuration ---

# Define the path to the directory containing the model
MODEL_DIR = "models"
# Define the specific model filename
MODEL_FILENAME = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# Construct the full path to the model
MODEL_PATH = os.path.join(MODEL_DIR, MODEL_FILENAME)

Configuration Benefits

  • Centralized: All important paths and settings in one place
  • Maintainable: Easy to update model or paths
  • Reusable: Other modules can import these settings (see the sketch below)
  • Clean: Keeps hardcoded values out of business logic
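
As a quick illustration of that reusability, any other module in the project can import the same setting straight from config.py. A minimal sketch (the module name is just an example; run it from the project root):

# any_module.py (illustrative only)
from src.config import MODEL_PATH

# Every module reads the same constant, so swapping models later
# means editing config.py in exactly one place.
print(f"Model will be loaded from: {MODEL_PATH}")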

2.2. The LLM Loader Module

This module's only job is to load the GGUF model file from disk using LangChain's LlamaCpp class (which wraps the llama-cpp-python library). Keeping the model-loading details in their own module separates them from our main application logic, which is a key principle of clean code.

The Code: src/llm_loader.py

Create a file at src/llm_loader.py and add the following code:

# src/llm_loader.py

from langchain_community.llms import LlamaCpp
from .config import MODEL_PATH # Use a relative import from our config file

def load_local_llm():
    """
    Loads the local GGUF model using LlamaCpp.

    Returns:
        LlamaCpp: An instance of the LlamaCpp model, ready for use.
    """
    # LlamaCpp is the LangChain connector for GGUF models.
    # We configure it with the path to our downloaded model and some parameters.
    llm = LlamaCpp(
        model_path=MODEL_PATH,
        n_ctx=2048,           # The max context size
        n_batch=512,          # Batch size for prompt processing (should be > 1)
        temperature=0.75,     # Controls the randomness of the output
        max_tokens=512,       # The maximum number of tokens to generate
        top_p=1,
        verbose=True,         # Set to True to see the LlamaCpp logs
        n_gpu_layers=-1       # Offload all layers to GPU if available. Set to 0 for CPU-only.
    )
    return llm

# This block allows us to test the loader directly by running this script
if __name__ == '__main__':
    print("--- Testing LLM Loader ---")

    # Load the model
    llm_instance = load_local_llm()

    # Define a test prompt
    test_prompt = "Question: What is the capital of France? Answer:"

    # Get a response from the model
    print(f"Sending prompt: '{test_prompt}'")
    response = llm_instance.invoke(test_prompt)

    # Print the response
    print("\n--- Model Response ---")
    print(response)
    print("\n--- Test Complete ---")

2.3. Understanding the LlamaCpp Parameters

Let's break down the key parameters we're using to configure our model:

  • model_path: Path to the GGUF model file. Set to MODEL_PATH so LlamaCpp knows where to find our model.
  • n_ctx: Context window size. Set to 2048, the maximum number of tokens the model can "remember".
  • n_batch: Batch size for prompt processing. Set to 512, the number of tokens processed at once.
  • temperature: Randomness control. Set to 0.75; higher values are more creative, lower values more focused.
  • max_tokens: Maximum response length. Set to 512 to limit how long responses can be.
  • n_gpu_layers: GPU acceleration. Set to -1 to offload all layers to the GPU; 0 means CPU only.

GPU vs CPU

  • If you have a compatible GPU, -1 will use GPU acceleration for faster inference
  • If you encounter issues or want to run on CPU only, change n_gpu_layers to 0 (or make it switchable at runtime, as sketched below)
  • GPU acceleration requires llama-cpp-python to be built with the appropriate backend (e.g., CUDA for NVIDIA GPUs)
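
If you move between machines with and without a GPU, one convenient option is to read the layer count from an environment variable instead of editing the code each time. The N_GPU_LAYERS variable name below is just an example, not part of the tutorial's configuration:

# Example: make GPU offloading switchable without code changes (e.g. in src/config.py)
import os

# Defaults to -1 (offload everything); set N_GPU_LAYERS=0 in the environment for CPU-only runs
N_GPU_LAYERS = int(os.environ.get("N_GPU_LAYERS", "-1"))

# ...then pass n_gpu_layers=N_GPU_LAYERS when constructing LlamaCpp in load_local_llm()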

2.4. Testing the Loader

You can test the loader independently by running:

# Navigate to your project directory
cd PromptCraft

# Run the loader test
python -m src.llm_loader

This will:

  1. Load the model from the models/ directory
  2. Send a test prompt to the model
  3. Display the model's response
  4. Confirm everything is working correctly

Note that the loader must be run as a module (python -m src.llm_loader) rather than as a plain script, because it uses a relative import to bring MODEL_PATH in from config.py.

2.5. Understanding the Module Structure

sequenceDiagram
    participant Config as config.py
    participant Loader as llm_loader.py
    participant Model as TinyLlama Model

    Loader->>Config: Import MODEL_PATH
    Config-->>Loader: Return path string
    Loader->>Model: Load model from path
    Model-->>Loader: Return LlamaCpp instance
    Loader->>Model: Send test prompt
    Model-->>Loader: Return response
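
In code, the same flow looks roughly like this from a downstream module's point of view (for example, the chains we will build in Step 3); the exact usage there may differ:

# Illustrative only: how another module consumes the loader
from src.llm_loader import load_local_llm

llm = load_local_llm()  # internally reads MODEL_PATH from src/config.py
answer = llm.invoke("Question: What is 2 + 2? Answer:")
print(answer)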

What You've Accomplished

  • ✅ Created a centralized configuration system
  • ✅ Built a dedicated model loader that keeps the LlamaCpp details in one module
  • ✅ Configured sensible default parameters for the TinyLlama model
  • ✅ Added testing capabilities to verify everything works
  • ✅ Set up the foundation for our prompt engineering chains

Next Steps

Now that we have our model loading infrastructure in place, we're ready to create the logical chains that will handle different prompt engineering tasks.

Step 3: Building the Logic Chains →

← Step 1: Project Setup & Model Download