Strategic Projects Lead — Audio Data at Besimple AI | Y Combinator

Voice data for AI

Strategic Projects Lead — Audio Data

$140K - $160K • San Mateo, CA, US

**Job type**

Full-time

**Role**

Operations

**Experience**

Any (new grads ok)

**Visa**

US citizen/visa only


About Besimple AI

Besimple AI is building the data and benchmark infrastructure for the next generation of voice AI. We help AI understand speakers of every language and accent.

The founders are ex-Meta product and engineering leaders, from MIT and Brown University. We are a small, high-ownership team working directly with frontier AI labs to push the state of the art on audio models.

We are looking for a **Strategic Projects Lead — Audio Data** to own high-priority audio data projects end to end through our platform.

About the role

This is an extremely high-ownership role at the intersection of **strategic operations, audio data, AI data delivery, product, and customer execution**.

You will own complex audio collection and annotation projects from customer requirement to final delivery. You will translate ambiguous customer needs into executable workflows inside our platform, run pilots, manage contributors and reviewers, track quality and throughput, identify bottlenecks, and ensure the final dataset meets customer expectations.

Because our platform is still evolving, this role is not just about operating existing workflows. You will also identify gaps in the platform, define product requirements, and work with engineering to build or improve features needed to deliver projects successfully.

This is not a generic project management role. We are looking for someone who has personally driven messy, cross-functional projects from zero to completion, ideally in AI data, data labeling, annotation, localization, or crowdsourced operations.

What you’ll do

- Own audio data collection and annotation projects from kickoff through final customer delivery.

- Translate customer requirements into project specs, contributor workflows, annotation guidelines, QA rubrics, acceptance criteria, and delivery plans.

- Configure and operate projects through Besimple’s internal platform.

- Design and run pilots to validate task design, contributor fit, audio quality, tooling, throughput, cost, and QA process before scaling.

- Manage day-to-day execution across contributors, annotators, reviewers, QA leads, and internal tools.

- Monitor project health across volume, quality, rejection rate, rework rate, cost, margin, and timeline risk.

- Identify platform gaps that prevent projects from scaling, then write clear product requirements or feature requests.

- Partner with engineering/product to build or improve tools for project setup, contributor workflows, QA, review, payments, reporting, and delivery.

- Partner with contributor growth to ensure we have the right supply by language, accent, demographic, device, skill set, or task type.

- Build dashboards, trackers, and operating cadences for project execution.

- Communicate project status, risks, tradeoffs, and blockers clearly to founders, internal teams, and customers.

- Create repeatable playbooks for future audio collection, transcription, annotation, and QA projects.

- Drive root-cause analysis when projects miss quality, cost, or timeline expectations.

What we’re looking for

- 3–7+ years of experience in data operations, AI data delivery, annotation operations, localization project management, marketplace operations, program management, or similar roles.

- Proven experience owning projects end to end, from ambiguous requirements to final delivery.

- Strong operator mindset: you can break down vague goals, create a plan, execute quickly, and unblock yourself.

- Experience managing complex workflows involving distributed contributors, reviewers, contractors, vendors, or large-scale data operations.

- Strong product sense; able to identify when tooling or platform features are needed and translate operational pain points into clear product requirements.

- Strong analytical ability; comfortable with spreadsheets, dashboards, funnel metrics, QA metrics, and operational KPIs.

- Excellent written communication; able to write clear instructions, guidelines, SOPs, customer updates, and internal product specs.

- Strong quality judgment and attention to detail.

- Comfortable balancing quality, speed, cost, customer requirements, contributor experience, and platform constraints.

- Comfortable working in ambiguity and building processes from scratch.

- High ownership, low ego, and willingness to get hands-on with messy operational details.

Strong pluses

- Experience at a data labeling, AI data, localization, or crowdsourcing company such as Scale AI, Surge AI, Appen, TELUS Digital, RWS, TransPerfect DataForce, Welocalize, Lilt, Turing, DataAnnotation, Outlier, Remotasks, or similar.

- Experience owning end-to-end delivery of data collection, annotation, transcription, evaluation, or QA projects.

- Experience with audio, speech, voice, ASR, TTS, speech-to-speech, transcription, podcast/audio production, or linguistic data.

- Experience building or improving internal tools, workflow systems, annotation platforms, QA systems, or contributor-facing products.

- Experience designing annotation guidelines, QA rubrics, reviewer training, or calibration workflows.

- Experience with multilingual or locale-specific data projects.

- Experience managing large distributed teams of contributors, reviewers, contractors, or vendors.

- Basic SQL, Python, Airtable, Retool, no-code automation, or workflow tooling experience.

Example projects you might own

- Launching a new audio collection project in a priority language through our platform.

- Designing the contributor workflow for voice actor auditions, recording tasks, metadata collection, and QA review.

- Collecting 1,000+ hours of natural speech from contributors in a specific language or locale.

- Building the operating process for detecting low-quality audio, wrong locale, synthetic voice, background noise, or incomplete metadata.

- Running a transcription or annotation workflow for speech datasets.

- Identifying that our platform needs a new QA queue, reviewer dashboard, contributor instruction flow, or reporting feature — then writing the requirements and working with engineering to ship it.

- Running a pilot, diagnosing quality issues, improving the workflow, and scaling the project to full production.

- Creating a reusable playbook that allows future customer projects to launch faster and with fewer manual steps.

What success looks like

You will be successful if Besimple can repeatedly deliver audio datasets that are:

- Accepted by customers

- Delivered on time

- Within budget

- High enough quality for model training

- Operationally repeatable through our platform

- Improving over time in cost, quality, throughput, and automation

Your core metrics may include:

- Accepted audio hours delivered

- On-time delivery rate

- Customer acceptance rate

- Rejection / rework rate

- Cost per accepted hour

- Reviewer agreement / QA consistency

- Contributor throughput

- Project margin

- Platform issues identified and resolved

- Number of reusable workflows or features created

Why join us

This is a high-impact role at an early-stage AI company. You will not just manage projects — you will help build the operating system for how high-quality audio data gets produced at scale.

You will work directly with the founders, own customer-critical projects, shape our internal platform, and define the playbooks we use to deliver audio data for frontier voice AI models.

About Besimple AI

**Why Us**

At **Besimple AI**, we’re making it radically easier for teams to **build and ship reliable AI** by fixing the hardest part of the stack: **data**. Good evaluation, training, and safety data require domain experts, robust tooling, and meticulous QA. AI teams and labs come to us for high-quality data so they can launch AI safely. We’re a **YC X25** company based in **Redwood City, CA**, already powering evaluation and training pipelines for leading AI companies across **customer support, search, and education**. Join now to be close to **real customer impact**, not just demos.

**Why This Matters**

High-quality, human-reviewed data is still the **single biggest driver of model quality**, but most teams are stuck with old tools and legacy processes that do not scale to **modern, multimodal, agentic workflows**. Besimple replaces that mess with **instant custom UIs, tailored rubrics, and an end-to-end human-in-the-loop workflow** that supports **text, chat, audio, video, LLM traces, and more**. We meet teams where they are—whether they need **on-prem deployments and granular user management** or a fast cloud setup—to turn **evaluation into a continuous capability** rather than a one-time project.

**Who You’ll Work With**

The founders previously **built the annotation platform that supported Meta’s Llama models**. We’ve seen how world-class annotation systems shape **model quality and iteration speed**; we’re bringing those lessons to every AI team that needs to ship with confidence. You’ll work directly with the founders and users, owning problems end to end: from an interface that **unlocks a tough rubric**, to a workflow that **reduces disagreement**, to an **AI judge system** that improves quality.

**How We Work**

- **Bias to shipping** and learning with customers

- **Respect for craft**: calibration, rubric clarity, inter-annotator agreement (IAA)

- **Tight feedback loops** from production back to evaluation

- **Ownership**: you’ll shape **evaluation as an engineering discipline** with real **“fail-to-ship” tests** tied to business and safety goals

If you’re excited by systems that combine **product design, human judgment, and applied AI**—and you want to build the **data and evaluation layer** that keeps AI trustworthy—come build with us. See how fast teams can go from **raw logs** to a **robust, human-in-the-loop eval pipeline**—and how that changes the way they ship AI.