Josh Pitzalis

AI Feature Development & Evaluation Services for Growth-Stage SaaS Companies

Consulting services offered as low-risk, fixed-price packages. If you don’t find a package that meets your needs, drop me a line at hey@jxsh.io to discuss a custom arrangement, or let's find a time to talk.


AI Feature Development (TypeScript)

Takes 8 weeks and starts at $22,500

Custom AI features from scratch in TypeScript with evaluation and quality assurance built-in from day one, delivering production-ready code in 8 weeks. The development can be done as an isolated microservice or directly within your existing codebase, depending on the level of access you're comfortable providing.

What makes this unique: I use a proven MCP (Model Context Protocol) approach for feature validation, first building MCP adapters that let you test new functionality directly in Claude/ChatGPT. This means you can try the actual feature, and see the results it produces, while we're still building it and before we commit to a frontend and lock things in. This helps us smooth off any rough edges and deliver a better final product in the same amount of time.

Process

Weeks 1-2: Discovery & Architecture

Weeks 3-6: Development & Testing

Weeks 7-8: Production Integration & Handover

What's Delivered

Complete, production-ready AI feature code that can be deployed as either a standalone microservice or integrated directly into your existing codebase. You get full ownership of all code, documentation, and deployment configurations, with architecture designed for scalability and maintainability.

A complete testing framework covering your AI feature with integration tests, unit tests, and end-to-end tests as appropriate for your specific implementation. The test suite ensures reliability across different scenarios and provides confidence for future updates and deployments.

A detailed assessment of your new AI feature's performance, including the custom grading criteria we develop together throughout the project and final evaluation scores across all quality metrics. Since this is a new feature, all test data will be synthetically generated for baseline testing, but I will work closely with you to ensure the test scenarios are as representative of real-world edge cases as possible. This report serves as your baseline for ongoing performance monitoring and future improvements.


Evals for AI SaaS Features

Takes 3 weeks and starts at $6,000

I systematically diagnose and fix existing AI features that aren't performing as expected, delivering quantified improvements in just 3 weeks. Unlike generic monitoring tools, I create custom grading criteria specific to your domain and provide measurable before/after results that prove ROI. You get a complete quality framework, documented fixes, and the knowledge to maintain high AI performance long after the engagement ends.

What makes this unique: Most AI consulting focuses on building new features, but I specialize in rescuing underperforming AI systems with rapid, measurable improvements and custom quality standards tailored to your specific business domain.

Process

Week 1: Assessment & Instrumentation

Week 2: Analysis & Automated Detection

Week 3: Validation & Knowledge Transfer

What's Delivered

A comprehensive, evolving document that defines what "good" AI output looks like for your specific use case. This includes scoring rubrics, quality thresholds, edge case handling rules, and examples of acceptable vs. unacceptable outputs. Unlike static documentation, this document is designed to be updated as your understanding of quality evolves, serving as the foundation for all future AI evaluation and improvement efforts.
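As a rough illustration of how scoring rubrics and quality thresholds like these can be expressed in code, here is a minimal TypeScript sketch. Every name in it (`Criterion`, `gradeOutput`, `exampleRubric`), the weights, and the 0.75 threshold are hypothetical and chosen only for illustration; the real criteria are developed with you during the engagement and will look different for your domain.

```typescript
// Hypothetical rubric shape. Criterion names, weights, and thresholds are
// illustrative assumptions, not the criteria used in any real engagement.
interface Criterion {
  name: string;
  weight: number; // relative importance of this criterion
  check: (output: string) => boolean; // true when the output passes
}

interface GradeResult {
  score: number; // weighted fraction of criteria passed, between 0 and 1
  passed: boolean; // true when score meets or exceeds the threshold
  failedCriteria: string[]; // names of the criteria that failed
}

// Grade a single AI output against a rubric and a quality threshold.
function gradeOutput(
  output: string,
  rubric: Criterion[],
  threshold: number
): GradeResult {
  const totalWeight = rubric.reduce((sum, c) => sum + c.weight, 0);
  let earned = 0;
  const failedCriteria: string[] = [];
  for (const criterion of rubric) {
    if (criterion.check(output)) {
      earned += criterion.weight;
    } else {
      failedCriteria.push(criterion.name);
    }
  }
  const score = totalWeight === 0 ? 0 : earned / totalWeight;
  return { score, passed: score >= threshold, failedCriteria };
}

// Example rubric for a short-form text feature (purely illustrative).
const exampleRubric: Criterion[] = [
  { name: "non-empty", weight: 1, check: (o) => o.trim().length > 0 },
  { name: "under 280 chars", weight: 1, check: (o) => o.length <= 280 },
  { name: "no placeholder text", weight: 2, check: (o) => o.indexOf("[TODO]") === -1 },
];

const good = gradeOutput("Hello world", exampleRubric, 0.75);
const bad = gradeOutput("[TODO] draft", exampleRubric, 0.75);
console.log(good.passed, bad.passed, bad.failedCriteria);
```

Keeping the rubric as plain data makes it easy to revise as your definition of quality changes, without touching the grading logic.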

A quantified analysis of your AI system's current performance, documenting all identified error modes with specific metrics. This report includes failure rates, error categories, cost analysis, and impact assessment for each problem area. It serves as your "before" snapshot, establishing concrete benchmarks against which all improvements will be measured.

A comprehensive before/after comparison showing exactly what was fixed and by how much. This report quantifies the measurable improvements achieved across all error modes, including reduced failure rates, cost savings, and enhanced reliability metrics. It provides concrete evidence of ROI and serves as documentation for stakeholders on the tangible value delivered.
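To make the before/after idea concrete, the sketch below computes failure rates per error mode for two sets of graded runs and reports the change. It is a simplified assumption of how such a comparison could be calculated, not the exact reporting pipeline; `GradedRun`, `failureRates`, `compareRuns`, and the error-mode labels are hypothetical names.

```typescript
// Hypothetical record: one graded run of the AI feature.
// errorMode is null when the run passed, otherwise the label of the failure.
interface GradedRun {
  errorMode: string | null;
}

interface ModeComparison {
  errorMode: string;
  before: number; // failure rate in the baseline runs (0 to 1)
  after: number; // failure rate after fixes (0 to 1)
  change: number; // after minus before; negative means improvement
}

// Fraction of all runs that failed with each error mode.
function failureRates(runs: GradedRun[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const run of runs) {
    if (run.errorMode !== null) {
      counts[run.errorMode] = (counts[run.errorMode] ?? 0) + 1;
    }
  }
  const rates: Record<string, number> = {};
  for (const mode of Object.keys(counts)) {
    rates[mode] = (counts[mode] ?? 0) / runs.length;
  }
  return rates;
}

// Compare baseline and post-fix runs, one row per error mode seen in either.
function compareRuns(before: GradedRun[], after: GradedRun[]): ModeComparison[] {
  const beforeRates = failureRates(before);
  const afterRates = failureRates(after);
  const modes = Object.keys({ ...beforeRates, ...afterRates }).sort();
  return modes.map((errorMode) => {
    const b = beforeRates[errorMode] ?? 0;
    const a = afterRates[errorMode] ?? 0;
    return { errorMode, before: b, after: a, change: a - b };
  });
}

// Illustrative data only: 8 baseline runs and 8 post-fix runs.
const baselineRuns: GradedRun[] = [
  { errorMode: "hallucination" }, { errorMode: "hallucination" },
  { errorMode: "hallucination" }, { errorMode: "hallucination" },
  { errorMode: "format" }, { errorMode: "format" },
  { errorMode: null }, { errorMode: null },
];
const postFixRuns: GradedRun[] = [
  { errorMode: "hallucination" },
  { errorMode: null }, { errorMode: null }, { errorMode: null },
  { errorMode: null }, { errorMode: null }, { errorMode: null },
  { errorMode: null },
];

console.log(compareRuns(baselineRuns, postFixRuns));
```

Rates here are computed against all runs in each set, so the two sets should be drawn from comparable test scenarios for the comparison to be meaningful.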


Strategic AI Consultation

Takes 2 hours and starts at $500

Unlike generic AI consulting that gives you theoretical advice, I focus on practical, implementable solutions tailored to your specific technical constraints and business goals. You get clarity on whether to build new features, fix existing ones, or optimize your current approach - with concrete next steps and timelines rather than vague recommendations.

What makes this unique: Most AI consultants either sell you their preferred solution or give abstract advice, but I provide unbiased, practical guidance based on your actual situation and constraints, helping you make confident decisions before committing significant development resources.

What's Delivered

A comprehensive 2-hour video call where we dive deep into your AI product challenges, explore potential solutions, and align on the best path forward. This isn't just a presentation of findings - it's an interactive strategy session where you can ask questions, challenge assumptions, and work through implementation concerns in real-time. You'll get the recording to share with your team and reference later as you execute on the recommendations.

A complete transcript of our 2-hour strategy session, cleaned up and organized for easy reference. This ensures you have a written record of all recommendations, action items, and key insights discussed, making it easy to share with team members who weren't on the call and reference specific points as you implement the suggested changes.

A curated collection of links, tools, frameworks, and references based on everything we discussed during our session. I spend 2 hours after our call researching and compiling the specific resources, documentation, and implementation guides that relate to your unique situation, giving you a head start on executing any of the strategies or solutions we explored together.


FAQs

Who's behind this service?

I'm Josh Pitzalis, working alongside my development partner Sasha Koss. Together we built Chirr App, one of the first Twitter thread schedulers back in 2017, and have over 15 years of combined development experience. Sasha is the author of date-fns, the second most popular JavaScript utility library in the world according to the 2024 State of JS report. We've been exploring AI infrastructure and evaluation systems through projects like Mind Control (a CMS for production AI prompts) and DaisyChain (a no-code prompt chaining tool). Our backgrounds combine deep software engineering expertise with an understanding of the LLM evaluation process and practical lessons from building two AI-infrastructure projects.

What's required from my team?

For the consultation, just 2 hours of your time. You can share access to your AI systems and current performance data during the call if you want them reviewed, but there is no expectation that you do.

Do you work with early-stage startups?

For the development and evaluation services we offer, we tend to focus on growth-stage companies (Series A-C) with established AI products and meaningful API spending ($5K+ monthly). However, the consultation service is designed for companies at any stage that need strategic clarity on their AI approach - whether you're pre-Series A exploring AI possibilities or a larger company deciding between different implementation paths.

What if my AI system is unique or complex?

Perfect - that's exactly what I specialize in. Generic monitoring tools and cookie-cutter solutions fail because every AI application has unique failure modes and requirements. I generally work with you to create custom evaluation frameworks and development approaches that understand your specific domain, whether that's legal document analysis, medical diagnosis, financial risk assessment, or something completely different.

How do you handle different tech stacks?

All development work is done in TypeScript, which integrates well with most modern web applications. For evaluation services, I work with your existing infrastructure regardless of language - the monitoring and evaluation systems are designed to work alongside your current setup.

What about data privacy and security?

I work within your existing security protocols and can sign NDAs as needed. For sensitive data, we can work with synthetic data or implement evaluation systems that don't require exposing actual customer data. All code and documentation developed during our engagement belongs entirely to you.

Can you work with regulated industries?

Yes, I have experience working with companies in legal, healthcare, and financial services. I understand the additional compliance requirements and can help design evaluation systems that meet regulatory standards while still providing actionable insights.

What happens if the project needs to be extended?

All services include built-in buffer time, but if scope changes significantly, we'll discuss timeline adjustments upfront. The goal is always to deliver complete, working solutions within the stated timeframe.

Do you provide ongoing support after the engagement?

All services include a 90-day support period for questions and adjustments. For longer-term support or additional features, we can discuss a follow-up engagement.

What about guarantees or refunds?

For the consultation service, if you don't find the session valuable you'll get a full refund. I'll only charge you if you genuinely found the consultation useful to your situation. Since AI development is still such an early-stage field with everyone doing things differently, there's a strong chance our experience can help based on the problems we've solved before, but if we can't add value to your specific situation, then we don't charge you - it's that simple.

For the development and evaluation services, all engagements include a satisfaction guarantee. If you don't see measurable improvement or deliverables that meet the agreed specifications, we'll work together to make it right or provide appropriate compensation.