This lesson dives into advanced techniques: making the model reason step-by-step with Chain of Thought, returning structured outputs, and detecting hallucinations with logprobs. You’ll also learn how to chunk documents effectively — and even run an open-source LLM on your own machine.
What Anthropic Thinks About Building Effective Agents
Recently, the team at Anthropic (creators of Claude) released an article titled Building Effective Agents. Here are a few key insights from it:
Context is everything. An agent doesn’t “know” anything by itself. It works with what you give it — the prompt, structure, and dialogue history.
An agent is not a single prompt — it’s a sequence of steps. It’s recommended to design agents as processes, where the LLM draws conclusions, makes decisions, stores intermediate data, and passes it between stages.
State matters. An agent that makes a request and passes the result to another stage must understand its current state. Without it, it becomes a chatterbox with amnesia.
Use multiple prompts. One for analysis, another for decision-making, a third for generation, a fourth for review and improvement. This is architecture — not just one long prompt.
Conclusion: An LLM is just a building block. A real agent is architecture + orchestration. This is where Directual comes in.
Structured Output — So It Doesn’t Just Talk, But Actually Works
If you want the assistant to return not just plain text, but structured data, you need to use Structured Output.
This can be in the form of JSON, XML, markdown, a table, etc.
Why this matters:
Easy to parse and use in Directual scenarios
Makes responses predictable and easy to validate
Precise control over the format
How to Configure Structured Output in Directual
There are two ways to set up Structured Output on the Directual platform:
Response Format
This is the option where you add "response_format": { "type": "json_object" } to the request, and define the response structure directly in the system prompt. Request example below:
{
  "model": "gpt-3.5-turbo",
  "response_format": { "type": "json_object" },
  "messages": [
    {
      "role": "system",
      "content": "Respond strictly with a JSON object containing the fields: title (string), summary (string), items (array of strings)."
    },
    {
      "role": "user",
      "content": "Compose a summary: {{#escapeJson}}{{text}}{{/escapeJson}}"
    }
  ]
}
Function Calling
This option involves using tools (function calling): you declare a function with a JSON schema for its arguments, and the model's reply is guaranteed to match that schema. A request sketch is shown below.
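A minimal sketch of such a request, following the OpenAI tools format (the function name save_summary and its fields are illustrative, not fixed by the lesson):

{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Compose a summary: {{#escapeJson}}{{text}}{{/escapeJson}}"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "save_summary",
        "description": "Save a structured summary of the text",
        "parameters": {
          "type": "object",
          "properties": {
            "title": { "type": "string" },
            "summary": { "type": "string" },
            "items": { "type": "array", "items": { "type": "string" } }
          },
          "required": ["title", "summary", "items"]
        }
      }
    }
  ],
  "tool_choice": { "type": "function", "function": { "name": "save_summary" } }
}

The structured arguments come back in choices[0].message.tool_calls[0].function.arguments, which you can parse in the next scenario step.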
If you need a different format (for example, XML), specify it directly in the prompt text, as in the request below:
{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "system",
      "content": "You are an assistant that always returns output in valid XML format. Wrap the entire response in a single <response> root tag. Do not include any explanation or commentary. Output only XML. Tags inside: title (string), summary (string), items (array of strings)."
    },
    {
      "role": "user",
      "content": "Compose a summary: {{#escapeJson}}{{text}}{{/escapeJson}}"
    }
  ]
}
Chain of Thought — Making the LLM Think Out Loud
Chain of Thought (CoT) is a technique where the model reasons step by step. This:
improves accuracy
makes logic traceable
helps catch errors
Combo: CoT + Structured Output
The model reasons through the problem, then returns a final JSON.
You can store the reasoning steps for logging, while showing only the result to the user.
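A sketch of such a request (the prompt wording is an assumption; the steps and final_answer fields match the output shown below):

{
  "model": "gpt-4o",
  "response_format": { "type": "json_object" },
  "messages": [
    {
      "role": "system",
      "content": "Solve the problem step by step. Respond strictly with a JSON object containing the fields: steps (array of objects with explanation and output) and final_answer (string)."
    },
    {
      "role": "user",
      "content": "Solve the equation: {{#escapeJson}}{{text}}{{/escapeJson}}"
    }
  ]
}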
As a result, we get a JSON like this, which can be further processed in the scenario:
{
  "steps": [
    {
      "explanation": "We start with the equation 8x + 7 = -23. Our goal is to solve for x.",
      "output": "8x + 7 = -23"
    },
    {
      "explanation": "To isolate the term with x, we first need to get rid of the constant on the left side. We do this by subtracting 7 from both sides of the equation.",
      "output": "8x = -23 - 7"
    },
    {
      "explanation": "Simplify the right side by performing the subtraction, which gives us -30.",
      "output": "8x = -30"
    },
    {
      "explanation": "Now, we need to solve for x. Since 8 is multiplied by x, we divide both sides of the equation by 8 to isolate x.",
      "output": "x = -30 / 8"
    },
    {
      "explanation": "Simplify the right side by performing the division, and reduce the fraction to its simplest form. -30 divided by 8 is -3.75, which can also be expressed as the fraction -15/4.",
      "output": "x = -3.75 or x = -15/4"
    }
  ],
  "final_answer": "x = -3.75 or x = -15/4"
}
Back to RAG — Let’s Talk About Chunking
When you have many documents, and they are long, you need to split them into chunks.
Why it matters:
Language models have a context length limit
Very long texts vectorize poorly: a single embedding blurs the meaning of the whole document
Chunking Approaches
In Directual, it’s convenient to implement chunking in three steps:
Split the text into a JSON object like this: { "chunks": [ { "text": "..." }, { "text": "..." }, ... ] }
Use a JSON step to create objects in the Chunks structure
Send the array of objects to a LINK scenario step, where you apply embeddings and link each chunk back to the parent document
There are three methods for splitting text into chunks:
1. By length
For example, 100 words with an overlap of 10.
The code for generating JSON is below.
Note: to use arrow functions, you need to enable ECMAScript 2022 (ES13) in the START => Advanced step. By default, ES6 is used.
Also, when saving a JS object into a field of type json, make sure to wrap the expression with JSON.stringify().
JSON.stringify({
  // Split the source text into words, then group them into overlapping chunks
  "chunks": _.chain(`{{text}}`.split(/\s+/))
    .thru(words => {
      const chunkSize = 100; // words per chunk
      const overlap = 10;    // words shared with the previous chunk
      const chunks = [];
      for (let i = 0; i < words.length; i += (chunkSize - overlap)) {
        chunks.push({ text: words.slice(i, i + chunkSize).join(' ') });
      }
      return chunks;
    })
    .value()
})
2. By structure
Split into paragraphs. If a chunk is shorter than 5 words, merge it with the next one — it’s probably a heading.
Code for the Edit object step:
JSON.stringify({
  // Split the text into paragraphs, dropping empty lines
  "chunks": _.chain(`{{text}}`.split(/\n+/))
    .map(_.trim)
    .filter(p => p.length > 0)
    .thru(paragraphs => {
      const chunks = [];
      let i = 0;
      while (i < paragraphs.length) {
        const current = paragraphs[i];
        if (current.split(/\s+/).length < 5 && i + 1 < paragraphs.length) {
          // Fewer than 5 words: likely a heading, so merge it with the next paragraph
          chunks.push({ text: current + ' ' + paragraphs[i + 1] });
          i += 2;
        } else {
          chunks.push({ text: current });
          i += 1;
        }
      }
      return chunks;
    })
    .value()
})
3. By meaning
Send a request to ChatGPT:
{
  "model": "gpt-4o",
  "response_format": { "type": "json_object" },
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant that splits text into meaningful semantic chunks. Each chunk should represent a coherent idea, paragraph, or topic segment. Return your output strictly as a JSON object with the structure: { \"chunks\": [ {\"text\": \"...\"} ] }."
    },
    {
      "role": "user",
      "content": "Split the following text into semantic chunks:{{#escapeJson}}{{text}}{{/escapeJson}}"
    }
  ]
}
How to Know If Chunking Is Bad
Poor chunking = repetitive answers, broken logic, or “nothing found.”
How to Test LLM for Hallucinations — Use logprobs
Logprobs = the log-probability of each token.
High logprob (closer to 0) = confidence
Low logprob = uncertainty
Use this to filter unreliable responses.
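A logprob is the natural logarithm of the token's probability, so you can convert it back with exp(); for example:

// probability = e ** logprob
Math.exp(-0.01); // ~0.99, the model was almost certain about this token
Math.exp(-2.3);  // ~0.10, the model was essentially guessing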
What you can do:
Don’t show uncertain answers
Regenerate the response
Show the answer with a warning
In combination with Structured Output, you can check confidence at the field level.
On Directual:
Add a request step with logprobs: true (see the request sketch after this list)
Visualize it in HTML (color-coded by confidence level)
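A sketch of the request body with logprobs enabled (top_logprobs is optional; the field names follow the OpenAI Chat Completions API):

{
  "model": "gpt-4o",
  "logprobs": true,
  "top_logprobs": 1,
  "messages": [
    {
      "role": "user",
      "content": "{{#escapeJson}}{{text}}{{/escapeJson}}"
    }
  ]
}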
Code for visualizing the model’s response with logprobs: true:
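A minimal sketch, assuming the logprobs array from the response (choices[0].logprobs.content) has been saved to a json field named tokens (a hypothetical name):

// Color each token by how confident the model was about it
_.chain(JSON.parse(`{{tokens}}`)) // [{ "token": "...", "logprob": -0.02 }, ...]
  .map(t => {
    const p = Math.exp(t.logprob); // token probability, from 0 to 1
    const color = p > 0.9 ? 'green' : p > 0.6 ? 'orange' : 'red'; // confidence buckets
    return `<span style="color:${color}">${t.token}</span>`;
  })
  .join('')
  .value()

Store the result in a string field and render it with an HTML component: confident tokens show up green, uncertain ones red.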