I was raised in the old school where Nyquist and Shannon ruled, and I am firmly ingrained with the notion that sampling something at too low a rate loses information that cannot be regained.
But today’s young upstarts are claiming that you can upscale VHS (roughly 240-line interlaced raster video) to “4K”. When looking into it I find they are using AI techniques to create/replicate the lost detail.
From my oldster point of view, I interpret this as creating realistic-looking fake detail. Even if fake, if the video is more pleasurable to watch it may be worth using these techniques. So I thought I’d play around with it to see if some old videos could be improved.
There was some suggestion that a project called TecoGAN (apparently for Temporally Coherent Generative Adversarial Network) was a good one to try out.
I Hate Python
Every project I’ve dabbled in that uses Python has eventually stopped working due to some hard-to-figure-out dependency. And even isolating each project in its own Python virtual environment doesn’t always stop the breakage.
For my topographic map generation I finally just got a dedicated machine so that I could trust that if I came back to it after a few months it would still run.
Unfortunately TecoGAN is written in Python.
And the instructions say, among other things, that it needs version 1.8 or higher of tensorflow. So you install the latest tensorflow and find nothing works. It turns out tensorflow is now up into the v2.x range and the API has changed. The last v1 release was v1.15, so install that instead.
But it also uses Keras, which the requirements pin only loosely as Keras>=2.1.2. That pulls in v2.4.3, which in turn requires tensorflow v2.2 or higher, so Keras must be pinned back to v2.1.2 as well.
Set Up the Python Environment
# Create and activate an isolated virtual environment
python3 -m venv --system-site-packages ./venv
source ./venv/bin/activate
pip install --upgrade pip
# The latest tensorflow (v2.x) breaks the API TecoGAN needs:
# pip install --ignore-installed --upgrade tensorflow
# Pin the last v1 release instead
pip install --ignore-installed --upgrade tensorflow==1.15
pip install -r requirements.txt
# requirements.txt lets Keras float too high; pin it back down
pip install --ignore-installed --upgrade Keras==2.1.2
# Download the pretrained model and sample data
python3 runGan.py 0
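It's worth confirming that pip actually left the pinned versions in place before running anything big. From inside the venv, tensorflow should report 1.15.x and Keras 2.1.2:

python3 -c 'import tensorflow as tf; print(tf.__version__)'
python3 -c 'import keras; print(keras.__version__)'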
Upscaling a Video
Make the Video Into Frame Images
Looking at the samples, it appears all of this works on individual images. So the first step is to extract each frame of the video as a separate image.
source ./venv/bin/activate
cp /Volumes/media/..../video.m4v ./content/
#
# TecoGAN reads its low-resolution input from LR/<scene name>
#
mkdir -p LR/mytest
cd LR/mytest
#
# 5 digits is enough for 99999 frames. At 30 fps that
# is 3,333.3 seconds (55'33.3")
#
ffmpeg -i ../../content/video.m4v %05d.png
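A quick sanity check before committing to hours of processing: the number of extracted frames should equal the clip duration times the frame rate.

ls *.png | wc -l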
Upscale the Frames
#
# Edit runGan.py so that runcase 1 reads its input from the
# LR/mytest directory instead of the sample scenes
#
python3 runGan.py 1
Create Movie From Upscaled Frame Images
Two steps:
- First, extract the audio track from the original movie.
- Second, create a new movie from the upscaled frame images and that sound track.
# 1: extract the audio track
ffmpeg -i content/video.m4v content/mytest.mp3
# 2: assemble the upscaled frames and the original audio into a new movie
ffmpeg -r 30 -f image2 -i results/mytest/output_%05d.png -i content/mytest.mp3 -vcodec libx264 -b:v 4M -acodec copy mytest.mp4
Results
Very Low Res Home Movie
My first attempt was on a video recorded on an early digital point-and-shoot camera that had a video mode. The video was 12 fps with a claimed resolution of 384×288. I used the pretrained model downloaded from the TecoGAN project.
The upscaled result was of no better quality than that achieved by simple scaling. In some regards it was worse, in that there were significant, distracting artifacts. It was so bad I did not bother saving the results.
Possible reasons for poor results
- Way too much expansion attempted.
- Slow frame rate (too many changes per frame?).
- Training data used did not reflect the degradations present in this video.
VHS Video
Video taken in the early 1990s by a camcorder mounted on a tripod. The camcorder video was analog-copied to VHS and, years later, ripped to MP4. The rip is 640×408 at 30 fps. Again, I used the pretrained model downloaded from the TecoGAN project.
An attempt to process the entire 00:16:20 video failed to output a single frame, without logging any reason. So a 00:00:50 extract (1500 frames) was made instead.
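For reference, an extract like that can be made with ffmpeg without re-encoding. This takes the first 50 seconds (add -ss to start somewhere other than the beginning; the file names are just the ones used above):

ffmpeg -i content/video.m4v -t 00:00:50 -c copy content/extract.m4v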
The results were slightly better than simply scaling the video with ffmpeg, but hardly worth the effort: hours of frame-by-frame processing, stripping the audio from one video and adding it to the result, etc., versus a few minutes and a single ffmpeg command.
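For comparison, the simple-scaling baseline was a single ffmpeg invocation along these lines (my sketch; the 4x Lanczos scale matches TecoGAN's upscaling factor, and the codec and bitrate choices are arbitrary, nothing TecoGAN-specific):

ffmpeg -i content/extract.m4v -vf "scale=iw*4:ih*4:flags=lanczos" -vcodec libx264 -b:v 4M -acodec copy scaled.mp4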
Conclusions
This neural-network-based AI scaling does not automatically produce better upscaling than other, lower-cost methods.
In my case it may be due to the training data not matching the types of videos I was working with. This, however, is not easily rectified: I have no high-resolution versions of these videos from which to create a training data set.
If a training dataset based on generic VHS-quality input becomes available, and if the tool suite is refined to make this easier, it might be worth revisiting.
But at present I view this as a technology that isn’t quite ready for general home use.