Step 2: Configuration & LLM Loading¶
Goal for this Step
To write the Python code that will configure our project and load the downloaded TinyLlama
model into a LangChain-compatible object, ready for use in our prompt engineering workbench.
2.1. Why Use a Configuration File?¶
As projects grow, it's a best practice to keep important settings, like file paths or model names, in a central location. This makes the application easier to maintain and update. If we ever want to use a different model, we only have to change it in one place.
```mermaid
graph LR
    A[config.py] --> B[llm_loader.py]
    A --> C[chains.py]
    A --> D[ui.py]
    style A fill:#e8f5e8
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e1f5fe
```
The Code: src/config.py¶
Create a file at src/config.py inside your PromptCraft project folder and add the following code:
```python
# src/config.py
import os

# --- Model Configuration ---

# Define the path to the directory containing the model.
# NOTE: this is a relative path, so run commands from the PromptCraft project root.
MODEL_DIR = "models"

# Define the specific model filename
MODEL_FILENAME = "tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf"

# Construct the full path to the model
MODEL_PATH = os.path.join(MODEL_DIR, MODEL_FILENAME)
```
Configuration Benefits
- Centralized: All important paths and settings in one place
- Maintainable: Easy to update model or paths
- Reusable: Other modules can import these settings (see the sketch below)
- Clean: Keeps hardcoded values out of business logic
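Because these settings live in one module, any other part of the project can reuse them with a plain import. Below is a minimal sketch of that idea; the `check_model.py` script is a hypothetical helper for illustration, not part of the PromptCraft project:

```python
# check_model.py (hypothetical helper -- run from the PromptCraft project root)
import os

from src.config import MODEL_PATH  # the same constant llm_loader.py will use

if os.path.exists(MODEL_PATH):
    size_mb = os.path.getsize(MODEL_PATH) / (1024 * 1024)
    print(f"Found model at {MODEL_PATH} ({size_mb:.0f} MB)")
else:
    print(f"No model at {MODEL_PATH} -- download the GGUF file into models/ first.")
```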
2.2. The LLM Loader Module¶
This module's only job is to load the GGUF model file from disk using LangChain's LlamaCpp wrapper (backed by the llama-cpp-python library). Keeping the model-loading logic separate from our main application logic is a key principle of clean code.
The Code: src/llm_loader.py¶
Create a file at src/llm_loader.py and add the following code:
```python
# src/llm_loader.py
from langchain_community.llms import LlamaCpp

from .config import MODEL_PATH  # Use a relative import from our config file


def load_local_llm():
    """
    Loads the local GGUF model using LlamaCpp.

    Returns:
        LlamaCpp: An instance of the LlamaCpp model, ready for use.
    """
    # LlamaCpp is the LangChain connector for GGUF models.
    # We configure it with the path to our downloaded model and some parameters.
    llm = LlamaCpp(
        model_path=MODEL_PATH,
        n_ctx=2048,        # The max context size
        n_batch=512,       # Batch size for prompt processing (should be > 1)
        temperature=0.75,  # Controls the randomness of the output
        max_tokens=512,    # The maximum number of tokens to generate
        top_p=1,           # Nucleus sampling threshold; 1 keeps all tokens
        verbose=True,      # Set to True to see the LlamaCpp logs
        n_gpu_layers=-1,   # Offload all layers to GPU if available. Set to 0 for CPU-only.
    )
    return llm


# This block allows us to test the loader directly by running this module
if __name__ == "__main__":
    print("--- Testing LLM Loader ---")

    # Load the model
    llm_instance = load_local_llm()

    # Define a test prompt
    test_prompt = "Question: What is the capital of France? Answer:"

    # Get a response from the model
    print(f"Sending prompt: '{test_prompt}'")
    response = llm_instance.invoke(test_prompt)

    # Print the response
    print("\n--- Model Response ---")
    print(response)
    print("\n--- Test Complete ---")
```
2.3. Understanding the LlamaCpp Parameters¶
Let's break down the key parameters we're using to configure our model:
| Parameter | Description | Our Value | Purpose |
|---|---|---|---|
| `model_path` | Path to the GGUF model file | `MODEL_PATH` | Tells LlamaCpp where to find our model |
| `n_ctx` | Context window size | `2048` | Maximum tokens the model can "remember" |
| `n_batch` | Batch size for processing | `512` | How many tokens to process at once |
| `temperature` | Randomness control | `0.75` | Higher = more creative, lower = more focused |
| `max_tokens` | Maximum response length | `512` | Limits how long responses can be |
| `n_gpu_layers` | GPU acceleration | `-1` | `-1` = use all GPU layers, `0` = CPU only |
GPU vs CPU
- If you have a compatible GPU, `n_gpu_layers=-1` will offload all layers to the GPU for faster inference
- If you encounter issues or want to run on CPU only, change `n_gpu_layers` to `0` (a parameterised variant is sketched below)
- GPU acceleration requires a llama-cpp-python build with GPU support, e.g. a CUDA build for NVIDIA GPUs
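If you move between GPU and CPU machines, one option is to expose `n_gpu_layers` as an argument instead of hard-coding it. The sketch below illustrates the idea; the `load_local_llm_flexible` name is made up for this example and is not how the tutorial's loader is defined:

```python
# Hypothetical variant of the loader that lets the caller choose
# between GPU offloading and CPU-only execution.
from langchain_community.llms import LlamaCpp

from src.config import MODEL_PATH


def load_local_llm_flexible(n_gpu_layers: int = -1) -> LlamaCpp:
    """Load the GGUF model, offloading n_gpu_layers layers to the GPU (0 = CPU only)."""
    return LlamaCpp(
        model_path=MODEL_PATH,
        n_ctx=2048,
        n_batch=512,
        temperature=0.75,
        max_tokens=512,
        top_p=1,
        verbose=True,
        n_gpu_layers=n_gpu_layers,
    )


# Usage:
# llm = load_local_llm_flexible()    # offload all layers (requires a GPU build)
# llm = load_local_llm_flexible(0)   # CPU only
```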
2.4. Testing the Loader¶
You can test the loader on its own by running it as a module from the PromptCraft project root: `python -m src.llm_loader`. The `-m` form is needed because `llm_loader.py` uses a relative import, and running from the project root lets the relative `models/` path in `config.py` resolve correctly.
This will:
1. Load the model from the `models/` directory
2. Send a test prompt to the model
3. Display the model's response
4. Confirm everything is working correctly
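If you would rather have an automated check than reading the console output, a minimal pytest smoke test could look like the sketch below. The `tests/test_llm_loader.py` path and the use of pytest are assumptions, not part of the tutorial; run it with `python -m pytest` from the project root so that `src` and the relative `models/` path resolve correctly:

```python
# tests/test_llm_loader.py (hypothetical smoke test -- loads the real model, so it is slow)
from src.llm_loader import load_local_llm


def test_model_returns_text():
    llm = load_local_llm()
    response = llm.invoke("Question: What is the capital of France? Answer:")
    assert isinstance(response, str)
    assert len(response.strip()) > 0
```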
2.5. Understanding the Module Structure¶
```mermaid
sequenceDiagram
    participant Config as config.py
    participant Loader as llm_loader.py
    participant Model as TinyLlama Model

    Loader->>Config: Import MODEL_PATH
    Config-->>Loader: Return path string
    Loader->>Model: Load model from path
    Model-->>Loader: Return LlamaCpp instance
    Loader->>Model: Send test prompt
    Model-->>Loader: Return response
```
What You've Accomplished
- ✅ Created a centralized configuration system
- ✅ Built a dedicated model loader module with a built-in smoke test
- ✅ Configured sensible default parameters for the TinyLlama model
- ✅ Added testing capabilities to verify everything works
- ✅ Set up the foundation for our prompt engineering chains
Next Steps¶
Now that we have our model loading infrastructure in place, we're ready to create the logical chains that will handle different prompt engineering tasks.
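To preview how a later module might consume the loader, here is a minimal sketch. The contents of `chains.py` shown here are an assumption for illustration only; the real chains are built in the next step:

```python
# src/chains.py (sketch only -- the real chains are built in the next step)
from langchain_core.prompts import PromptTemplate

from .llm_loader import load_local_llm

# Load the model once so every chain can reuse the same instance.
llm = load_local_llm()

# A minimal prompt -> model pipeline using LangChain's "|" composition.
prompt = PromptTemplate.from_template("Question: {question} Answer:")
simple_chain = prompt | llm

if __name__ == "__main__":
    # Run as a module from the project root: python -m src.chains
    print(simple_chain.invoke({"question": "What is the capital of France?"}))
```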