The Problem That Was Hiding in Plain Sight
Tricon Solutions is our sister company. They run an IT services and staffing firm where billable hours, project engagements, and client relationships drive everything. We built MTT for them last year to clean up their timesheet workflow. That fixed one thing and immediately surfaced another, much bigger thing.
Once a consultant is placed or a project starts, the operational work doesn't end. It barely begins. Chasing timesheets. Following up on invoices. Handling delayed enterprise payments. Coordinating onboarding paperwork. Responding to MSP and vendor email threads. Managing factoring submissions. Tracking approvals across procurement portals. Reconciling payment statuses. Escalating overdue receivables. Validating compliance artifacts. The list keeps going.
Every services firm has this layer. A boutique agency runs it through email and spreadsheets. A Big 4 practice runs it through Workday plus a hundred offshore coordinators in a shared services center. A mid-market consultancy lands somewhere uncomfortably in between. The tooling is different at every scale, but the work isn't. It's mostly mechanical: read email, figure out status, send a follow-up, file an attachment, update a spreadsheet.
This was the realization that started the project. Most services back-office work is repetitive, pattern-driven workflow hidden inside email threads. The workflows look surprisingly similar whether the firm is fifty people or fifty thousand.
That's a problem AI can solve. Not someday. Now. With the model capabilities already shipping. The harder question wasn't whether to automate. It was where to start, what to leave alone, and how to build something that would earn trust gradually instead of asking for it on day one.
We Chose Operations, Not the Front Office
The obvious AI play in services is the front office. Recruiting AI for some firms, sales and proposal AI for others. Both spaces are crowded with venture-funded startups, every incumbent vendor is shipping AI features, and the ROI is honestly hard to measure because the outcomes are probabilistic and the feedback loops run for quarters. We chose to ignore both.
Operations is a different shape of problem. Much less competition. The pain is immediate and measurable. Workflows are deterministic rather than probabilistic. ROI shows up in cash flow rather than in hiring or pipeline metrics that take a year to evaluate. There's also something else: the operational knowledge inside a services firm is genuinely proprietary. Which clients pay slow. How specific MSPs want submissions packaged. Which invoices need to be split for which procurement systems. Which engagement letters need which approvals. None of that lives in any third-party tool.
So the scope was clear from day one. Automate everything that happens after the engagement starts. Leave the front of the funnel alone.
Who This Is Actually For
I want to be explicit about who this is for, because the buyer profile is wider than the origin story might suggest. The workflows aren't specific to staffing. They're the operational workflows of any firm that bills by the hour or the engagement.
A small services firm runs the operational layer through email and spreadsheets, with maybe one administrative coordinator and a part-time bookkeeper. The pain is acute because every operational hour is a percentage of the owner's time. A mid-market consultancy runs the same workflows through Replicon or Workday, with a small ops team that's perpetually drowning. A Big 4 practice runs the workflows through enterprise systems and offshore shared services centers in Bangalore or Manila, with thousands of coordinators reading email, extracting status, updating spreadsheets, and chasing AP departments at the same Fortune 500 clients you'd recognize.
The tooling is different across that range. The underlying work isn't. Every firm in the spectrum is paying for human attention spent reading inboxes, matching email replies to ledger entries, drafting follow-ups, and routing escalations. The unit cost of that attention varies by an order of magnitude. The volume varies inversely. The math works at every tier.
The Hidden Pattern: Email-Driven State Machines
Once we started mapping how operational work actually moved through the company, the same shape kept appearing. An invoice exists in some state: submitted, approved, partially paid, overdue, escalated. A timesheet exists in some state: not submitted, submitted, approved, rejected, locked. A consultant onboarding exists in some state: pending docs, pending approvals, COI received, NDA signed, ready to start.
And every state transition was being driven by an email. Either an inbound one (the client AP department writes "we'll process this next cycle") or an outbound one (we send the third reminder). The operations team's job was to read the email, figure out what it meant, decide what should happen next, and either send the next message or update the spreadsheet that tracked everything.
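The pattern is concrete enough to sketch in a few lines. Here's a minimal Python version of the invoice state machine; the states match the example above, but the intent labels and transition table are illustrative, not the product's real taxonomy:

```python
from enum import Enum

class InvoiceState(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    PARTIALLY_PAID = "partially_paid"
    OVERDUE = "overdue"
    ESCALATED = "escalated"
    PAID = "paid"

# Allowed transitions, keyed by (current state, classified email intent).
TRANSITIONS = {
    (InvoiceState.SUBMITTED, "approval_confirmed"): InvoiceState.APPROVED,
    (InvoiceState.APPROVED, "partial_payment_received"): InvoiceState.PARTIALLY_PAID,
    (InvoiceState.APPROVED, "payment_received"): InvoiceState.PAID,
    (InvoiceState.OVERDUE, "payment_received"): InvoiceState.PAID,
    (InvoiceState.OVERDUE, "no_response"): InvoiceState.ESCALATED,
}

def apply_email(state: InvoiceState, intent: str) -> InvoiceState:
    """Advance the state machine from one classified inbound email.

    Unknown or irrelevant intents leave the state unchanged; a human
    reviews anything the table doesn't cover.
    """
    return TRANSITIONS.get((state, intent), state)
```

The operations team's job, in this framing, is deciding which intent an email carries and which transition to fire. That's the part the model takes over.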
This is exactly the kind of work LLMs are good at: reading, classifying, drafting, and following a playbook. The hard part wasn't model capability. It was building the right wrapper around the model so it could operate safely inside a real business with real client relationships.
The Workflows We Mapped
Before we wrote any code, we documented the workflows that ran through the operations inbox every week. Four families of work accounted for the vast majority of the volume.
Payment Follow-Up & Collections
- Detect invoices past due thresholds (30, 45, 60, 90 days)
- Calculate aging, identify factoring vs. direct-bill exposure
- Draft tone-appropriate follow-ups based on prior thread history
- Escalate to AP managers, then to internal management, on schedule
- Identify payment-risk patterns (transitions, restructurings, AP turnover)
Timesheet Operations
- Detect missing or unapproved timesheets against assignment-level expectations
- Send Friday consultant reminders, then Monday/Tuesday escalations
- Notify recruiters and account managers when a consultant goes silent
- Trigger invoice generation when timesheets clear approval
- Surface payroll-risk: which consultants haven't been approved by cutoff?
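The detection step behind the Friday sweep is a straight comparison of assignment-level expectations against actual timesheet states. A hedged sketch with illustrative names and states:

```python
def timesheet_sweep(expected: dict[str, str],
                    submitted: dict[str, str]) -> dict[str, list[str]]:
    """Compare expected timesheets against what actually came in.

    `expected` maps consultant -> assignment; `submitted` maps
    consultant -> state ("approved", "submitted", or absent entirely).
    Returns who needs a reminder and who is a payroll risk at cutoff.
    """
    missing = [c for c in expected if c not in submitted]
    unapproved = [c for c, s in submitted.items()
                  if c in expected and s != "approved"]
    # Anyone missing OR unapproved is at risk of missing the payroll run.
    return {"remind": missing, "payroll_risk": missing + unapproved}
```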
Factoring Coordination
- Identify fundable invoices once supporting docs are present
- Validate timesheet, signed approval, NOA-linked client, and PO match
- Package factoring submissions in the format the lender expects
- Track advance arrival, reconcile against expected amount
- Flag rebate timing and ACH discrepancies for human review
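The fundability check in the second bullet reduces to verifying that every supporting artifact is present before packaging. A minimal sketch; the artifact names mirror the list above but the function itself is illustrative:

```python
# Supporting artifacts required before an invoice can be submitted for funding.
REQUIRED_DOCS = {"timesheet", "signed_approval", "noa_client_link", "po_match"}

def fundable(invoice_docs: set[str]) -> tuple[bool, set[str]]:
    """Return (is the invoice fundable?, which artifacts are still missing).

    An invoice only goes into a factoring submission when nothing is missing.
    """
    missing = REQUIRED_DOCS - invoice_docs
    return (not missing, missing)
```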
Onboarding & Compliance
- Sequence document requests (COI, W-9, ACH, NDA, background check)
- Detect missing items and re-request without losing the thread
- Track approvals across MSP/VMS portals (Beeline, Fieldglass)
- Validate compliance artifacts against client-specific checklists
- Hand off to delivery once the consultant is cleared to start
Every one of these workflows was being executed by a person reading email and updating a spreadsheet. None of them required judgment that an LLM, given the right context and supervision, couldn't either replicate or dramatically accelerate.
Design Philosophy: Human-Supervised Autonomy
The hardest design decision in this project wasn't technical. It was philosophical. How autonomous should the agent actually be?
The dominant pattern in AI startups right now is full autonomy. The agent reads the email, decides what to send back, and sends it without human review. We rejected that, and the reason matters. The cost of a mistake in operational communication is asymmetric. One tone-deaf escalation email to a Fortune 500 AP department can damage a client relationship that took years to build. A factoring submission with a wrong amount can trigger a chargeback. An onboarding compliance miss can void a placement. The downside dominates.
So we designed for human-supervised autonomy instead. The agent drafts, classifies, recommends, and escalates. Humans approve the actions that matter and override decisions that are wrong. The goal isn't to remove operators from the loop. It's to remove the repetition from the loop.
In practice, this looks like a triage queue. The agent reads inbound mail, classifies it across our eight categories (TIMESHEET, PAYMENT, ONBOARDING, COMPLIANCE, FACTORING, ESCALATION, ADMIN, OTHER), pulls relevant context from the operational state database, drafts a recommended response, and surfaces everything in a review screen where the operator can approve in one click or edit in two. Anything financially material (outbound dollar amounts, formal escalations, factoring submissions) sits behind explicit approval.
Over time, as the operator approves drafts without edits, the trust budget grows. Categories of action graduate to auto-send. A routine "got it, will follow up next Friday" doesn't need approval after the hundredth one. The agent learns the firm's voice, the escalation thresholds, the client-specific quirks. Autonomy gets earned per category. It doesn't get granted up front.
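One way to implement the trust budget is a per-category streak counter that graduates a category after N consecutive unedited approvals. A sketch with an illustrative threshold; the real graduation criteria are richer than a single counter:

```python
from collections import defaultdict

class TrustBudget:
    """Per-category autonomy: a category reaches auto-send only after
    `threshold` consecutive approvals with no operator edits."""

    def __init__(self, threshold: int = 100):
        self.threshold = threshold
        self.streak: defaultdict[str, int] = defaultdict(int)

    def record(self, category: str, approved_without_edit: bool) -> None:
        # Any edit or rejection resets the streak for that category.
        if approved_without_edit:
            self.streak[category] += 1
        else:
            self.streak[category] = 0

    def auto_send(self, category: str) -> bool:
        return self.streak[category] >= self.threshold
```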
System Architecture
The architecture is intentionally composable. Not a monolith. Not a single agent loop. A set of small modules that pass operational state between each other, with the LLM showing up at the points where reasoning is required and staying out of the way everywhere else.
Ingestion Layer
The agent monitors operational inboxes through Microsoft 365 and Gmail APIs (with IMAP fallback for legacy mail systems), pulls VMS notifications from MSP portals like Beeline and Fieldglass via Playwright-driven browser automation, integrates with the factoring partner's API for advance and reconciliation events, and ingests timesheet state changes from MTT through webhooks. Everything lands in a unified event stream.
Reasoning Layer
Three modules sit at the LLM boundary. The Classifier takes an inbound event and assigns it to one of eight operational categories, with confidence scoring and multi-label support for the messier cases. The Context Builder assembles the full picture into a structured prompt: the email thread history, the relevant invoice or timesheet ledger entry, the prior outbound actions on this account, and any client-specific operational notes the firm has captured. The Draft Engine then generates a recommended action: a reply email, a portal submission, a payroll flag, an internal escalation. Tone is calibrated based on relationship history and aging. The agent never gets nastier than the operator would.
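The Context Builder's output can be pictured as a structured prompt assembled from those four sources. A hedged sketch; the section labels and formatting are illustrative, not the product's actual prompt format:

```python
def build_context(thread: list[str], ledger_entry: dict,
                  prior_actions: list[str], client_notes: str) -> str:
    """Assemble the structured prompt the Draft Engine reasons over."""
    sections = [
        ("THREAD HISTORY", "\n".join(thread)),
        ("LEDGER ENTRY", ", ".join(f"{k}={v}" for k, v in ledger_entry.items())),
        ("PRIOR OUTBOUND ACTIONS", "\n".join(prior_actions) or "(none)"),
        ("CLIENT NOTES", client_notes or "(none)"),
    ]
    # One labeled block per source, so the model sees the full picture.
    return "\n\n".join(f"## {title}\n{body}" for title, body in sections)
```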
Orchestration Layer
The orchestrator handles everything that isn't a single LLM call. Scheduled workflows like the Friday consultant timesheet sweep, the Monday AR aging report, and the weekly factoring reconciliation. Retry logic for failed sends. Escalation timing windows that respect business hours and time zones. Approval routing to the right human based on action type and dollar threshold. Playwright-based browser automation for the VMS workflows that have no API. We started with APScheduler and cron for the MVP. The production roadmap considers Temporal for durable, replayable workflows once volume justifies it.
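The business-hours deferral, for example, is plain datetime arithmetic. A sketch with illustrative default hours and a simple weekend roll-over; the production windows are configurable per client:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def next_send_time(ready_at: datetime, tz: str,
                   open_hour: int = 9, close_hour: int = 17) -> datetime:
    """Defer an outbound escalation into the recipient's business hours."""
    local = ready_at.astimezone(ZoneInfo(tz))
    if local.hour >= close_hour:
        # After hours: roll to the next morning.
        local = (local + timedelta(days=1)).replace(
            hour=open_hour, minute=0, second=0, microsecond=0)
    elif local.hour < open_hour:
        local = local.replace(hour=open_hour, minute=0, second=0, microsecond=0)
    while local.weekday() >= 5:  # Saturday or Sunday: roll to Monday morning.
        local = (local + timedelta(days=1)).replace(
            hour=open_hour, minute=0, second=0, microsecond=0)
    return local
```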
State Layer
An operational state database holds the canonical record of every consultant, client, project, invoice, timesheet, thread, and action. We chose PostgreSQL for the production tier (PostgreSQL 18, the same engine that powers MTT); the MVP ran on SQLite, a deliberate choice to avoid infrastructure complexity during prototyping. Every action the agent takes is logged with the full context that produced it: the inbound trigger, the prompt, the model response, the operator's approval or edit. The audit trail is non-negotiable. Operations work has to be defensible against client disputes, factoring chargebacks, and internal post-mortems.
Human-in-the-Loop Layer
The triage queue is a lightweight web UI where an operator works through the agent's drafts. Each item shows the inbound context, the recommended action, the confidence score, and one-click approve, edit, or reject controls. Outbound communication never leaves the system without an explicit human action, at least until the operator graduates a category to auto-send. Slack and Teams hooks push high-priority items (escalations, payroll-risk timesheets, factoring discrepancies) directly into the channels where decisions actually get made.
Why Python (and Why Not Java/Spring)
This deserves a direct answer because anyone with an enterprise background is going to ask. Most of our team's deepest expertise is in Java: legacy modernization, Spring, enterprise integration. I've spent most of my career in that world. We chose Python for this project anyway.
For the MVP, the AI ecosystem velocity in Python is decisive. The Anthropic SDK, the orchestration libraries, the email parsing utilities, the Playwright bindings. Everything we needed was a pip install away, and the iteration loop was fast enough to throw a design away and rebuild it in a day. Spring would have given us a more rigorous foundation, but at a cost of velocity that was wrong for the MVP stage.
For the long-term platform, we built the architecture so that any individual module can be ported to JVM if the multi-tenant SaaS version demands it. The reasoning layer stays Python regardless. The ingestion and orchestration layers are the candidates for migration when scale forces the issue. We don't think it'll need to happen soon, but we built the seams.
Three Phases of the Product
The product evolved through three phases. Each one built on the last and none of them was wasted work.
Internal Automation Tool
Scripts, scheduled jobs, email templates, and reports. The goal was to cut repetitive operations work inside Tricon Solutions. No LLM, no autonomy, just deterministic automation of the workflows that didn't need judgment. This phase paid for itself almost immediately and gave us the operational state data the next phases needed.
AI-Assisted Operations Agent
Add Claude to the loop. Classify inbound mail, assemble context, draft replies, recommend actions. The shift from automation scripts to autonomous reasoning over operational state, with a human in the loop on every outbound action. This is where Tricon Ops Agent stopped being a script collection and started being a product.
Productized AI Operations Platform
Multi-tenant SaaS. Whitelabel deployments for other services firms across the size spectrum. Connectors for QuickBooks, NetSuite, Beeline, Fieldglass, Oorwin, Slack, Teams. Conversational queries over operational state ("show me all invoices over 45 days for NBCU"). This is the destination: an AI-native operating system for services-firm operations.
Technical Stack
FastAPI for the web tier and the human-review surfaces. PostgreSQL for state. Claude as the reasoning engine across classification, context assembly, and draft generation; this is the same platform we standardize on for every Tricon Infotech engagement. APScheduler for the MVP scheduling layer. Playwright for the VMS workflows that don't expose an API. Microsoft Graph and Gmail APIs for native email integration, with IMAP/SMTP as the fallback for legacy environments.
What We Learned
Operational AI has better unit economics than front-office AI
Recruiting and sales outcomes are probabilistic. Operations workflows are deterministic. When the agent drafts an AR follow-up that the operator sends, you've saved fifteen minutes of work, and that saving is real and measurable. When a recruiting or sales AI surfaces a "great candidate" or a "hot lead," you have to engage them, qualify them, present them, and close them before you know whether the AI was actually useful. The feedback loop on operational AI is hours. On front-office AI it's quarters. This holds whether you're a 30-person agency or a Big 4 practice.
The right autonomy level grows with trust
The biggest mistake we could have made was launching with full autonomy and burning client relationships in week one. The design that actually works starts with full human review and graduates categories of work to auto-send as the operator approves drafts without edits. Trust gets earned per category. Not all at once.
Email is the integration layer, not a feature
A large fraction of services-firm operational state lives in email threads. Half, maybe more, depending on the firm. You can either replace email (impossible, because your clients won't move) or treat email as a first-class data source. We chose the second path. The agent reads, classifies, and threads through email natively. It also writes back to email natively. Email isn't a notification channel for this product. It's the substrate.
Audit trail is non-negotiable
Every operational action has to be defensible. We log the inbound event, the prompt, the model response, the operator's approval, and the outbound action. When a client disputes a follow-up tone or a factoring partner challenges a submission, we can produce the full chain of context. Without that, the agent isn't deployable in a real business. We learned this the hard way in the first month of the pilot, when we couldn't reconstruct why a particular escalation had gone out the way it did. Build the audit trail before you build anything else.
Where This Goes Next
Tricon Ops Agent is built. We developed the first version inside Tricon Solutions to map the operational workflows end-to-end on a prior model stack — proof that the architecture works and the workflow categories generalize across services-firm operations. We're currently porting the reasoning layer onto Claude's API for the production rollout, which ships in the coming weeks. The prototype validated the model. The Claude-native version is what goes into broader deployment.
Once the Claude-native version is live, the next step is whitelabel. Other services firms have the same operational layer, with the same email-driven state machines, the same MSP coordination overhead, the same AR aging workflows, the same onboarding sequencing. We're packaging the platform for those firms with their branding, their clients, their playbooks, and their data isolation. It isn't a SaaS in the venture sense yet. It's a managed deployment that we operate alongside the client for the first few months until they're comfortable owning it. The buyer profile is intentionally wide: boutique agencies, mid-market consultancies, and the offshore shared services arms of large firms that already run the work but want to remove the repetition from their coordinators' day.
The longer arc is the AI-native operating system for services-firm operations: multi-tenant, with native VMS and ERP connectors, conversational queries over operational state, predictive collections risk modeling, and an operational knowledge graph that captures the relationships between consultants, partners, clients, vendors, MSPs, invoices, and timesheets. Most services firms today optimize sourcing, recruiting, and sales. Almost none optimize operational execution. That's the layer we're building for.
If you lead a services firm and the operations layer is eating your team's time, this is the system we built and the methodology we use. Whitelabel deployments are open. Happy to talk.
Whitelabel Tricon Ops Agent for Your Firm
Same platform, your branding, your playbooks. We deploy alongside your team and stay involved until your operations own it.
Start a Conversation →