arXiv: What comprises a good talking-head video generation?: A Survey and Benchmark