FilmAgent, jointly released by Harbin Institute of Technology and Tsinghua University, is a new LLM-based multi-agent collaboration framework for end-to-end film automation in virtual 3D spaces.
The FilmAgent framework:
- Constructs a virtual 3D space with 15 locations, 65 actor positions, 272 camera shots and 21 actor actions;
- Defines four roles (director, screenwriter, actor and cinematographer) and specifies the responsibilities of each;
- Proposes two collaboration strategies: Critique-Correct-Verify and Debate-Judge;
- Divides the filmmaking process into three stages: idea development, scriptwriting and cinematography, each completed by a different group of agents working in collaboration.
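The two collaboration strategies are only named above, so the following is a minimal sketch of how such loops might be structured, not FilmAgent's actual implementation. It assumes that Critique-Correct-Verify means one agent's draft is reviewed, revised and re-checked by another, and that Debate-Judge means several agents argue for competing proposals before a third party picks one. All function names, signatures and the toy agent stubs are hypothetical; in a real system each callable would wrap an LLM call.

```python
from typing import Callable, List


def critique_correct_verify(
    draft: str,
    critique: Callable[[str], str],
    correct: Callable[[str, str], str],
    verify: Callable[[str, str], bool],
    max_rounds: int = 3,
) -> str:
    """Revise a draft until the critic accepts it or max_rounds is reached."""
    current = draft
    for _ in range(max_rounds):
        feedback = critique(current)           # critic reviews the draft
        if not feedback:                       # nothing to fix
            break
        current = correct(current, feedback)   # author revises
        if verify(current, feedback):          # critic checks the fix
            break
    return current


def debate_judge(
    proposals: List[str],
    argue: Callable[[str, List[str]], str],
    judge: Callable[[List[str], List[str]], int],
) -> str:
    """Each proposal is argued for, then a judge selects the winner."""
    arguments = [argue(p, proposals) for p in proposals]
    return proposals[judge(proposals, arguments)]


# Toy stand-ins for LLM-backed agents (hypothetical, for illustration only).
def toy_critique(script: str) -> str:
    return "" if script.endswith(".") else "add final period"


def toy_correct(script: str, feedback: str) -> str:
    return script + "." if feedback else script


def toy_verify(script: str, feedback: str) -> bool:
    return script.endswith(".")


def toy_argue(proposal: str, all_proposals: List[str]) -> str:
    return f"'{proposal}' suits the scene better than the alternatives"


def toy_judge(proposals: List[str], arguments: List[str]) -> int:
    # This toy judge simply prefers a close-up when one is proposed.
    return max(range(len(proposals)), key=lambda i: "close-up" in proposals[i])


if __name__ == "__main__":
    print(critique_correct_verify(
        "Alice enters the kitchen", toy_critique, toy_correct, toy_verify))
    print(debate_judge(["wide shot", "close-up"], toy_argue, toy_judge))
```

Passing the agents in as plain callables keeps the control flow of each strategy separate from whichever model sits behind each role.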
Human evaluation shows that FilmAgent outperforms all baselines in every aspect, with an average score of 3.98 out of 5, supporting the feasibility of multi-agent collaboration in filmmaking. Further analysis shows that although FilmAgent uses the less advanced GPT-4o model, it still surpasses a single-agent setup built on o1, demonstrating the advantage of a well-coordinated multi-agent system.
Project link: