I tried Wan2.1-Fun-Control-14B_fp8_e4m3fn.safetensors with Kijai's workflow on two machines: a local PC with an RTX 4090 (24 GB VRAM) and an RTX 5090 (32 GB VRAM) instance hosted on Vast.ai.
The generated video is 57 frames long.
On the RTX 5090, peak VRAM usage was about 21 GB and generation finished within 2 minutes.
In contrast, the RTX 4090 took nearly 10 hours to complete the same job, even though its VRAM was fully used.
Is this difference due to raw chip performance, or to a difference in the CUDA or PyTorch versions between the two setups?
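For what it's worth, here is a minimal sketch (assuming the workflow runs under PyTorch, as Kijai's ComfyUI nodes do) for dumping the details that bear on the version question; running it on both machines and comparing the output would show whether the software stacks actually differ:

```python
import torch

# Software stack: PyTorch build and the CUDA version it was compiled against.
print("PyTorch:", torch.__version__)
print("CUDA (build):", torch.version.cuda)

# Hardware: device name and compute capability.
# (8, 9) = Ada Lovelace (RTX 4090), (12, 0) = Blackwell (RTX 5090).
print("GPU:", torch.cuda.get_device_name(0))
print("Compute capability:", torch.cuda.get_device_capability(0))

# After a generation run, peak allocation shows how close the job
# came to the card's physical VRAM limit.
torch.cuda.reset_peak_memory_stats()
# ... run the workflow here ...
print(f"Peak VRAM allocated: {torch.cuda.max_memory_allocated(0) / 2**30:.1f} GiB")
```

If the PyTorch and CUDA versions turn out to match on both machines, what's left to explain the gap is the hardware generation itself.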