The code has been uploaded to GitHub. Project link: https://github.com/chn-lee-yumi/distributed_ffmpeg_transcoding_cluster
Concept
- A distributed video transcoding cluster based on FFmpeg.
- Current goal: Convert any video format to MP4.
- Architecture includes 1 control node and 3 compute nodes.
- For storage, a single-node NFS shared storage is used for now, with potential for distributed storage in the future.
- CPU architecture is currently not restricted — tested on x86, with plans to support ARM.
- Communication between control and compute nodes is via SSH.
- Workflow: The control node receives the task, uploads the file to shared storage, calculates total video frames, distributes the task to compute nodes (based on manually assigned weights), and each node processes a continuous segment of the video. The final result is merged and cleaned by the storage node. The control node then returns the download link for the transcoded file.
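The weighted split described in the workflow can be illustrated with a small Bash function. This is only a minimal sketch, assuming integer weights and assuming that the last node absorbs any rounding remainder; the name `split_frames` and the output format are hypothetical and not taken from the project's script.

```shell
#!/usr/bin/env bash
# split_frames TOTAL_FRAMES WEIGHT...
# Prints one line per compute node: "<node index> <first frame> <frame count>".
# Hypothetical helper for illustration; the real script may split differently.
split_frames() {
    local total=$1; shift
    local weights=("$@")
    local sum=0 w
    for w in "${weights[@]}"; do
        sum=$((sum + w))
    done

    local start=0 i count
    for ((i = 0; i < ${#weights[@]}; i++)); do
        if ((i == ${#weights[@]} - 1)); then
            # Last node takes whatever is left, so no frame is dropped
            # because of integer division.
            count=$((total - start))
        else
            count=$((total * weights[i] / sum))
        fi
        echo "$i $start $count"
        start=$((start + count))
    done
}

# Example: 1000 frames over three nodes weighted 1:2:1.
split_frames 1000 1 2 1
```

With weights 1:2:1 the middle node receives half of the frames, and the segments are contiguous, so each node can process one continuous part of the video.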
Installation & Configuration
Test Environment
- Three public VPS: cn.gcc.ac.cn, hk.gcc.ac.cn, us.gcc.ac.cn
- Storage Node: hk.gcc.ac.cn
- Control Node: cn.gcc.ac.cn
- Compute Nodes: cn.gcc.ac.cn, hk.gcc.ac.cn, us.gcc.ac.cn
Storage Node
System: Debian
Edit /etc/exports to share directories via NFS:
- upload: Read-write for control node, read-only for compute nodes.
- tmp: Read-write for compute nodes.
- download: Not shared (local to storage node).
⚠️ Important: Use the insecure option, otherwise mounting will fail with “access denied”.
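To make the share layout concrete, the helper below prints a candidate /etc/exports fragment for review without modifying any file. It is a sketch under assumptions: the /srv/dffmpeg/... paths, the host names, the `print_exports` function, and the `sync,no_subtree_check` options are placeholders chosen for illustration. Only the read-write/read-only split and the insecure option come from the description above. By default the NFS server requires client requests to come from a port below 1024, and insecure lifts that requirement.

```shell
#!/usr/bin/env bash
# print_exports CONTROL_HOST COMPUTE_HOST...
# Prints two /etc/exports lines: upload (rw for control, ro for compute)
# and tmp (rw for compute). The download directory is not exported.
# Paths and extra options are assumptions, not the author's configuration.
print_exports() {
    local control=$1; shift
    local opts="sync,no_subtree_check,insecure"
    local upload="/srv/dffmpeg/upload ${control}(rw,${opts})"
    local tmp="/srv/dffmpeg/tmp"
    local node
    for node in "$@"; do
        upload+=" ${node}(ro,${opts})"
        tmp+=" ${node}(rw,${opts})"
    done
    printf '%s\n%s\n' "$upload" "$tmp"
}

# Example with placeholder host names.
print_exports control.example compute1.example compute2.example
```

If one machine is both the control node and a compute node, as in the test environment, it should be listed only once on the upload line, with rw.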
Apply changes:
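One common way to make the NFS server pick up an edited /etc/exports is `exportfs -ra`, followed by `exportfs -v` to list what is exported. Whether this setup used that command or a service restart is an assumption. The wrapper below is hypothetical and defaults to a dry run that only prints the commands, since the real ones need root on the storage node.

```shell
#!/usr/bin/env bash
# apply_exports: re-export everything in /etc/exports, then list the exports.
# With DRY_RUN=1 (the default in this sketch) the commands are only printed.
# Set DRY_RUN=0 and run as root on the storage node to execute them.
apply_exports() {
    local run=()
    if [ "${DRY_RUN:-1}" = "1" ]; then
        run=(echo)
    fi
    "${run[@]}" exportfs -ra   # -r: resync exports with /etc/exports, -a: all entries
    "${run[@]}" exportfs -v    # show the active exports and their options
}

apply_exports
```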
Compute Nodes
Note: On Debian Jessie, add deb http://ftp.debian.org/debian jessie-backports main to your sources.list.
Control Node
Save the following script (e.g. dffmpeg.sh) and make it executable:
👉 Script omitted here for brevity. It’s the same as in the original post. Refer to the GitHub repository for the full script.
Usage & Testing
- Run the script from the control node:
- test.mp4 is the input file. -c mpeg4 specifies the codec, and -b:v 1M sets the video bitrate.
- The final .mp4 file will appear in the download directory.
- Note: Since the servers are located in different geographic regions, the bottleneck is the read/write speed of NFS. If compute nodes cache the video locally, the speed is significantly better than single-node transcoding.
Sample Output
Final Thoughts
- This was a fun project. Originally, I planned to test this using Raspberry Pi devices, but only had one available, so I used VPS instances instead.
- There’s a commented-out line: for i in {0..${#compute_node[*]}}. This doesn’t work in Bash because brace expansion is performed before variable expansion, so the range is never expanded. I replaced it with a classic C-style for loop.
- The “task completion check” section could be written more elegantly; suggestions are welcome.
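The brace-expansion pitfall mentioned above is easy to reproduce. The sketch below reuses the `compute_node` array name from the commented-out line; the array contents and the `brace_seen` and `cstyle_seen` variables are made up for the demonstration.

```shell
#!/usr/bin/env bash
compute_node=("cn.gcc.ac.cn" "hk.gcc.ac.cn" "us.gcc.ac.cn")

# Brace expansion happens before parameter expansion, so at that stage the
# range still contains the unexpanded ${#...} and is left alone. The loop
# body therefore runs once, with i set to the literal text "{0..3}".
brace_seen=()
for i in {0..${#compute_node[*]}}; do
    brace_seen+=("$i")
done

# The C-style loop evaluates the array length arithmetically at run time.
cstyle_seen=()
for ((i = 0; i < ${#compute_node[*]}; i++)); do
    cstyle_seen+=("$i")
done

printf 'brace expansion saw: %s\n' "${brace_seen[*]}"
printf 'C-style loop saw: %s\n' "${cstyle_seen[*]}"
```

Even if the range did expand, {0..N} is inclusive at both ends, so it would yield one index more than the array has elements. The C-style loop avoids that as well.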
Changelog
- v1.1: Fixed issue where final video was longer than original. Caused by placement of -ss flag.
- v1.2: Added support for FFmpeg output parameters and colourful logs.
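For context on the v1.1 note: in FFmpeg, -ss placed before -i seeks in the input file, while -ss placed after -i decodes the input and discards frames until the position is reached. The sketch below only prints a possible per-segment command. It assumes a constant frame rate and input-side seeking with an output-side -t duration. The `segment_cmd` helper, the file paths, and the choice of placement are illustrative assumptions, not the project's actual command line.

```shell
#!/usr/bin/env bash
# segment_cmd INPUT START_FRAME FRAME_COUNT FPS OUTPUT
# Converts a frame range to seconds and prints an ffmpeg command for it.
# Printing only: nothing is executed, and paths are not shell-quoted.
segment_cmd() {
    local input=$1 start_frame=$2 count=$3 fps=$4 output=$5
    local start_sec duration
    start_sec=$(awk -v f="$start_frame" -v r="$fps" 'BEGIN { printf "%.3f", f / r }')
    duration=$(awk -v f="$count" -v r="$fps" 'BEGIN { printf "%.3f", f / r }')
    # -ss before -i: seek in the input; -t after -i: limit the output duration.
    printf 'ffmpeg -ss %s -i %s -t %s %s\n' "$start_sec" "$input" "$duration" "$output"
}

# Example: frames 250..749 of a 25 fps video (placeholder paths).
segment_cmd upload/test.mp4 250 500 25 tmp/part1.mp4
```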