Agentic vs Deterministic Workflows: Designing a Reliable AI Application

A case study on designing an AI-assisted workflow, discussing agentic autonomy and deterministic control through architectural decisions and production constraints.
Agents
LLM
App
Python
Author

Aleksei Prishchepo

Published

December 13, 2025

In this article, I’ll walk through the architecture of the Autonomous Career Agent (ACA), a system I built to automate the tedious process of job hunting. Instead of just chatting, this agent acts as a specialized recruiter: it searches for live job vacancies, assesses your resume against them, and generates a tailored resume and cover letter ready for submission.

This application was created as a capstone project for the 5-Day AI Agents Intensive Course with Google.

A video is worth a thousand words, isn’t it? Jump to the Application Demo.

We’ll dive into the Orchestrator Pattern, managing state across multiple specialized agents, and solving production challenges like secure artifact delivery using Google Cloud Storage (GCS).

Many parts of job search automation can be implemented without LLM autonomy, so this project became a practical comparison between explicit control flow and delegated reasoning.

Architecture: the “Brain” and the “Hands”

When building complex agentic workflows, a common pitfall is the “God Prompt” — trying to stuff every possible instruction (search logic, evaluation criteria, formatting rules) into a single system prompt. This leads to fragile systems that are hard to debug and even harder to scale.

For this application, I adopted a Multi-Agent Orchestration architecture using the Google Agent Development Kit (ADK). This design separates high-level reasoning from low-level execution.

Orchestrator Pattern

At the core is the Orchestrator Agent. Think of it as the project manager. It doesn’t know how to scrape LinkedIn or how to format a resume. Its job is to understand the user’s intent, maintain the state of the workflow, and delegate tasks to specialists.

Explore the full source code on GitHub: AxesAccess / Autonomous-Career-Agent.

Figure 1: The Application Workflow

Why This Approach?

  1. Separation of Concerns: The Search Agent can be iterated on (e.g., swapping a mock API for a real LinkedIn integration) without touching the Orchestrator’s logic.
  2. Context Management: The Orchestrator keeps the “big picture” (the user’s goal), while sub-agents only need the context for their specific task. This saves tokens and reduces hallucinations.
  3. Testability: We can unit test the Assessment Agent’s scoring logic independently of the Job Search results.

What a Deterministic Version Would Look Like

To understand the trade-offs, it’s helpful to imagine a deterministic version of this system:

  • Hard-coded Pipeline: Input -> Search() -> Loop(Results) -> Assess() -> Generate() -> Output.

  • Rule-based Scoring: “If ‘Python’ in Resume and ‘Python’ in Job Description, Score += 10”.

  • Explicit Transitions: The code strictly dictates that step B always follows step A.

While efficient, this approach requires us to pre-define every filter criterion in advance. If a user asks for “a job with a great engineering culture,” a deterministic system fails unless we have an explicit “culture_score” column. An agent, however, can reason about the unstructured text in a job description to infer cultural values without requiring a schema change. The trade-off is clear: do we value raw efficiency (deterministic) or the ability to handle open-ended, ambiguous requirements (agentic)?
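
To make the comparison concrete, here is a minimal sketch of that deterministic pipeline. The functions, keyword weights, and threshold below are illustrative placeholders, not code from the project:

# Hypothetical deterministic pipeline: every step, rule, and transition is hard-coded.
REQUIRED_KEYWORDS = {"python": 10, "sql": 5, "docker": 5}


def search_jobs(query: str) -> list[dict]:
    """Stub for a job-board API call."""
    return [{"title": "Backend Engineer", "description": "Python, SQL, Docker, Kubernetes"}]


def assess(resume: str, description: str) -> int:
    """Rule-based scoring: points for keywords present in both resume and description."""
    return sum(
        points
        for keyword, points in REQUIRED_KEYWORDS.items()
        if keyword in resume.lower() and keyword in description.lower()
    )


def generate_resume(resume: str, job: dict) -> str:
    """Stub for template-based resume tailoring."""
    return f"Resume tailored for {job['title']}"


def run_pipeline(resume: str, query: str) -> list[str]:
    """Input -> Search() -> Loop(Results) -> Assess() -> Generate() -> Output."""
    outputs = []
    for job in search_jobs(query):                     # step B always follows step A
        if assess(resume, job["description"]) >= 15:   # fixed threshold
            outputs.append(generate_resume(resume, job))
    return outputs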

Orchestration Logic

The Orchestrator’s intelligence comes from its ability to maintain a ‘Context Loop’ — remembering the results of Step 1 (Search) to inform Step 2 (Assessment).

In src/agents/orchestrator.py, the system instruction acts as a state machine. It doesn’t just say “Help the user”; it defines a strict protocol:

from google.adk.agents import LlmAgent

orchestrator_agent = LlmAgent(
    name="career_orchestrator",
    instruction="""
    Your responsibilities:
    ...
    3. Use the `recruitment_search_agent` to find relevant vacancies.
    4. Provide user with the list of relevant vacancies.
    5. Ask user to provide information about their skills and experiences.
    6. For each relevant vacancy chosen by the user:
        a. Use the `skills_assessment_agent` to analyze the fit...
        b. If the fit is good ... generate the tailored resume...
    """
)

By explicitly numbering the steps and restricting available tools at each phase, we reduce the degrees of freedom available to the LLM. The goal is not to make the model “smarter”, but to make incorrect behavior harder to express. This reliability is crucial for user trust.
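
As an illustration of how this restriction can be wired up with the ADK, here is a simplified sketch: each specialist agent receives only its own tool, and the Orchestrator receives only its sub-agents. The tool stubs and the model name are placeholders, not the project’s actual definitions.

from google.adk.agents import LlmAgent

# Illustrative tool stubs; the real project registers its own tools here.
def search_jobs(query: str) -> list[dict]:
    """Return vacancies matching the query (stub)."""
    return []

def assess_fit(resume: str, vacancy: str) -> dict:
    """Score how well a resume matches a vacancy (stub)."""
    return {"score": 0}

# Each specialist sees only the tool it needs.
search_agent = LlmAgent(
    name="recruitment_search_agent",
    model="gemini-2.0-flash",            # placeholder model name
    instruction="Find vacancies matching the user's query.",
    tools=[search_jobs],                 # cannot score fit or generate documents
)

assessment_agent = LlmAgent(
    name="skills_assessment_agent",
    model="gemini-2.0-flash",
    instruction="Score how well the resume fits the chosen vacancy.",
    tools=[assess_fit],                  # cannot search
)

# The Orchestrator holds no tools of its own; it can only delegate.
orchestrator = LlmAgent(
    name="career_orchestrator",
    model="gemini-2.0-flash",
    instruction="Follow the numbered protocol and delegate each step to a sub-agent.",
    sub_agents=[search_agent, assessment_agent],
)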

Production-Grade Integrations

Building a demo is one thing; building something that works in a restricted environment is another. One major challenge I faced was Artifact Delivery.

Challenge: “Where do the files go?”

When the agent generates a Resume (Markdown/PDF), we need to give it to the user.

  • Local File System: In a containerized web deployment, local files aren’t accessible to the user’s browser.

  • Chat Attachment: The ADK UI didn’t support file attachments, for reasons I could not determine.

Solution: Hybrid Cloud Storage

I implemented a hybrid tooling strategy (src/tools/hybrid_artifact_tools.py) that satisfies both the agent’s memory needs and the user’s UX needs.

  1. Internal Memory: The file is saved to the Agent’s internal artifact store so it can “remember” what it wrote.
  2. Public Delivery: The file is simultaneously uploaded to a private Google Cloud Storage (GCS) bucket.
  3. Secure Access: The app generates a Signed URL (valid for 24 hours) and presents that link to the user in the chat.

# From src/tools/hybrid_artifact_tools.py (excerpt from inside the tool function)
# Requires `from google.cloud import storage` and `from datetime import timedelta`;
# `bucket`, `gcs_filename`, `content`, and `mime_type` are defined by the enclosing function.

# 1. Upload the generated document to GCS
blob = bucket.blob(gcs_filename)
blob.upload_from_string(content, content_type=mime_type)

# 2. Generate a V4 Signed URL that expires after 24 hours
signed_url = blob.generate_signed_url(
    version="v4",
    expiration=timedelta(hours=24),
    method="GET"
)

# 3. Return the link so the agent can present it to the user
return f"📥 Download Link: {signed_url}"

This approach bridges the gap between AI generation and standard web infrastructure, a key step in moving agents from prototype to production. The pattern proved essential because it decouples generation from delivery, allowing the agent to operate in restricted execution environments without leaking infrastructure concerns into the prompting layer.

Observability & Evaluation

A reliable agent system requires more than just code; it requires rigorous testing.

Evaluation with Golden Datasets

We don’t trust the agent blindly. We use an automated evaluation script (tests/evaluation/evaluate.py) that runs the agent against a golden_dataset.json. This dataset contains typical user scenarios (e.g., “Find Python jobs in Berlin”) and verifies:

  1. Safety: Did the agent error out?

  2. Correctness: Did the response contain expected keywords (e.g., job titles found)?

  3. Tool Usage: Did it call the Search Tool?
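
For illustration, one entry in such a dataset and the corresponding check might look roughly like this; the field names and scoring are assumptions, not the project’s actual schema:

# Hypothetical golden-dataset entry; field names are illustrative.
golden_case = {
    "prompt": "Find Python jobs in Berlin",
    "expected_keywords": ["Python", "Berlin"],
    "expected_tools": ["search_jobs"],
}

def check_case(case: dict, response_text: str, tools_called: list[str]) -> dict:
    """Score one scenario on the three criteria above."""
    return {
        # A non-empty response stands in for "the agent did not error out".
        "safety": bool(response_text),
        "correctness": all(
            keyword.lower() in response_text.lower()
            for keyword in case["expected_keywords"]
        ),
        "tool_usage": set(case["expected_tools"]) <= set(tools_called),
    }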

Observability

Using ADK’s built-in observability features, I trace every step of the orchestration. This allows for inspection of raw prompts and responses, helping to debug why an agent might have “hallucinated” a step or missed a user instruction.

Notably, evaluation becomes more important as autonomy increases; unlike deterministic pipelines, agentic systems require behavioral testing rather than simple output validation.

Application Demo

Read the hackathon writeup on Kaggle: ACA aka Autonomous Career Agent.

Challenges & Lessons Learned

  • Latency vs. Accuracy: Splitting tasks into sub-agents improves accuracy but adds latency (multiple LLM round-trips). I optimized this by having the Orchestrator handle simple info-gathering without delegation where possible.
  • Tool Hallucinations: Early versions often tried to invent search parameters. Strict typing in the tool classes’ Pydantic models solved 90% of these issues (see the sketch below).
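
As a sketch of that idea (not the project’s actual tool schema), a Pydantic model with extra fields forbidden rejects invented parameters before they ever reach the search API:

from pydantic import BaseModel, Field, ValidationError

# Illustrative schema; the real tool defines its own fields.
class JobSearchParams(BaseModel):
    model_config = {"extra": "forbid"}   # reject hallucinated parameters outright

    query: str = Field(min_length=2, description="Job title or keywords")
    location: str | None = Field(default=None, description="City or 'remote'")
    limit: int = Field(default=10, ge=1, le=50)

try:
    # An agent inventing a 'salary_floor' argument fails fast instead of silently passing junk.
    JobSearchParams(query="Python developer", location="Berlin", salary_floor=90000)
except ValidationError as error:
    print(error)
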
Where a Deterministic Approach Was Better

It’s important to acknowledge where the agentic approach introduced friction compared to a traditional script:

  • Latency: The “thought process” of an LLM is orders of magnitude slower than a function call.
  • Reproducibility: Even with temperature=0, minor variations in phrasing can occur, making regression testing harder.
  • Cost Predictability: A loop in a script costs nothing; a loop in an agent consumes tokens with every iteration.
  • Debugging Difficulty: You can’t just set a breakpoint in a prompt; you have to trace the semantic flow.

Conclusion

The Autonomous Career Agent app demonstrates that building useful GenAI applications goes beyond prompt engineering. It requires solid software engineering principles: modular architecture, secure integration patterns, and automated testing.

This project reaffirmed an important engineering reality: for the specific task of matching keywords and filling templates, a deterministic pipeline offers distinct advantages in speed and predictability. The agentic overhead here is non-trivial.

However, this architecture lays the groundwork for a much broader application, such as career consulting. By building on the observability and evaluation frameworks described here, we can safely extend the agentic application to handle unstructured tasks like helping a user figure out what they want to do, or pivoting strategies mid-search, where a purely deterministic application could not succeed.