Pocket FM is on a mission to deliver personalised and immersive audio experiences to listeners worldwide. We are revolutionising the audio entertainment industry through long-form storytelling, supported by our cutting-edge platform that serves millions of listeners and generates billions of minutes of engagement monthly. We leverage Generative AI in producing content and streamlining operations, developing innovative solutions for cutting-edge challenges in the AI landscape across all modalities—text, audio, and images. With strong backing and rapid user base growth, Pocket FM is an exciting and dynamic place to join.
The Role: What You'll Build and Own
Design and implement an agentic orchestration framework that selects optimal video generation models per scene, constructs and refines prompts dynamically, decomposes episode-level goals into scene-level tasks, and manages generation, validation, and refinement loops
Build a multi-agent system that can translate high-level episode briefs into structured scripts; break scripts into scenes, shots, and animation beats; select visual style, pacing, and emotional tone parameters; and trigger the appropriate video models and pipelines
Develop automated prompt engineering strategies, model selection heuristics (or learned selection policies), self-refinement and critique loops, and quality control mechanisms (LLM- or vision-based evaluators)
Create orchestration logic for scene continuity (character consistency, environment persistence), style preservation across the episode, temporal coherence, and budget/compute optimisation
Deliver a production-ready pipeline for end-to-end anime episode generation
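To make the responsibilities above concrete, here is a minimal sketch of the kind of scene-level orchestration loop this role involves: decompose an episode brief into scene tasks, select a model per scene, then run a generate-validate-refine loop. All names (`SceneTask`, `select_model`, the model identifiers) are illustrative placeholders, not an existing Pocket FM API, and the generation and scoring calls are stubs standing in for real video models and evaluators.

```python
# Hypothetical sketch of an agentic orchestration loop for episode generation.
# Real systems would call video-generation models and LLM/vision evaluators
# where the stubs below return mock values.
from dataclasses import dataclass


@dataclass
class SceneTask:
    description: str
    prompt: str = ""
    quality: float = 0.0
    attempts: int = 0


def decompose(episode_brief: str) -> list[SceneTask]:
    """Split an episode-level goal into scene-level tasks (toy heuristic)."""
    return [SceneTask(description=s.strip())
            for s in episode_brief.split(".") if s.strip()]


def select_model(task: SceneTask) -> str:
    """Pick a generation backend per scene (stub selection heuristic)."""
    if "fight" in task.description.lower():
        return "action-video-model"   # placeholder model name
    return "dialogue-video-model"     # placeholder model name


def generate(task: SceneTask, model: str) -> float:
    """Stand-in for a video-generation call; returns a mock quality score
    that improves with each refinement attempt."""
    task.attempts += 1
    return min(1.0, 0.5 + 0.2 * task.attempts)


def orchestrate(episode_brief: str,
                threshold: float = 0.8,
                max_attempts: int = 3) -> list[SceneTask]:
    """Generate -> validate -> refine loop over all scenes in the brief."""
    tasks = decompose(episode_brief)
    for task in tasks:
        model = select_model(task)
        task.prompt = f"[{model}] {task.description}"
        while task.quality < threshold and task.attempts < max_attempts:
            task.quality = generate(task, model)
            if task.quality < threshold:
                # Critique-driven prompt refinement would happen here.
                task.prompt += " (refined)"
    return tasks
```

In a production version, the validation step would be an LLM- or vision-based evaluator, and the refinement step would rewrite the prompt based on the critique rather than appending a marker.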
Your Technical Toolkit:
Master's or PhD in Computer Science, AI, ML, or a related field
Strong experience with Large Language Models (LLMs), multimodal generative models, prompt engineering and prompt optimisation, Python, and production ML systems
Hands-on experience building agentic systems (e.g., ReAct, AutoGPT-style, planning agents), tool-using LLM systems, and orchestration pipelines
Deep understanding of video generation models, model evaluation and benchmarking, and experimentation frameworks
Preferred Qualifications:
Experience with video diffusion or text-to-video systems, character consistency techniques (LoRA, embeddings, adapters), scene planning or hierarchical generation, reinforcement learning or policy learning, and automated content evaluation systems
Familiarity with anime production workflows, storyboarding, shot composition and pacing, diffusion models, and narrative structure
Experience deploying distributed ML systems, GPU-accelerated pipelines, and cloud-based ML infrastructure