Saturday, March 27, 2021

Accelerating Premiere Elements rendering using an external GPU

Back in 2020, I decided to finally get with the times and attempt to start putting out videos in 1080p. I've had cameras capable of filming at 1080p for a long time, but I've generally downscaled to 720p since that's a common phone resolution (and it lets me easily fake a multi-camera setup by zooming in on the footage; hope you never noticed). With larger phone screens and TV viewing of online video becoming more common, though, I decided I needed to put in the effort to make the jump.

After reworking some elements (the zooming logo and a number of titles needed to be remade), Why it Works: The Chosen One became the first of my 1080p uploads. While the video turned out fine, the rendering time was much longer; 1080p has roughly twice as many pixels as 720p, after all. I wanted to do something about that, but having upgraded the memory and internal storage of my computer (a Lenovo T420) in 2019, I wasn't too keen on buying a new machine when the current one seemed sufficiently fast for everything except rendering video.

Basically, I had two ideas: I could either upgrade the CPU in my machine (in addition to the officially supported dual-core options, it's also possible to use a quad-core processor in my computer), or look at the possibility of using an external GPU. When I first looked in 2020, the former wasn't appealing for various reasons, and the latter (or rather, GPU acceleration in general) wasn't supported by my video editing software.

After a few more excruciatingly long video rendering sessions, I decided to take another look. As it turns out, within the last year Premiere Elements (I'd need a computer with a newer processor architecture to use Premiere Pro) added GPU acceleration support. While I couldn't find any confirmation that an external GPU would be supported, I decided to take the plunge and see what I could do. It involved buying a number of new pieces of hardware:

1. JMT EXP GDC Notebook External PCI-E Discrete Graphics Beast Series External Laptop Docking Station Mini

Not wanting to open my computer to fiddle with the wi-fi card, I decided to go with the ExpressCard option (I do have a USB 3.0 card for that slot, but I've rarely used it).

2. NVIDIA GTX 750 Ti

According to Adobe's documentation, only a fairly limited set of GPUs are supported for acceleration. I cross-referenced this with a list of known builds, and narrowed things down to a few possibilities. Given my computer's relatively old CPU, it wouldn't have made much sense to buy a super-high-end card since it likely wouldn't have been used anywhere near its full potential. Additionally, the current GPU market has some rather high prices (due to a combination of factors), so moving to the lower end of the spectrum was prudent in that regard as well.

3. Dell DA-2 power supply

While there are multiple options for powering the GPU dock, this seems to be the preferred one. Luckily, it wasn't too hard to order one, and I was able to get a unit that was basically unused.


4. 6-pin power cable and extension

The graphics card I bought requires additional power, so I needed cables to connect it to the port on the dock. Apparently 6-pin-to-6-pin connections aren't common outside of this context, so the cable I needed was a bit of a special part. Additionally, since the power port on the top of the card is actually on the other side of the card from the dock's output, I realized I would need an extension cable to reach it.

Altogether, this hardware cost about $270-$300. Once everything arrived, I was able to assemble it and attach it to the computer. The biggest hurdle there was getting it to be recognized as a specific graphics card rather than a generic display adapter. After trying a few things, I eventually found that choosing to "uninstall" the generic adapter from the Windows device manager caused it to be recognized as the card it actually was.
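
If you'd rather check how Windows currently identifies the card from a script instead of the Device Manager window, something like the following works. This is just an illustrative sketch of mine (it assumes a Windows 10 build new enough for pnputil's /enum-devices command), not anything Premiere Elements or the dock requires:

    # List display-class devices as Windows currently identifies them.
    # Assumes a Windows 10 build where "pnputil /enum-devices" is available.
    import subprocess

    result = subprocess.run(
        ["pnputil", "/enum-devices", "/class", "Display"],
        capture_output=True, text=True, check=True
    )
    # Look for the GTX 750 Ti by name; if only a generic
    # "Microsoft Basic Display Adapter" shows up, the card isn't recognized yet.
    print(result.stdout)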

Following an upgrade of Premiere and Photoshop Elements (another $100 or so), I opened some existing projects, went to the settings to verify the graphics card was recognized (it was), and then ran some performance comparisons. In each case, I rendered the video at 1080p with 1-pass VBR at a target bitrate of 18 Mbps.


Why it Works: The B-52's and "The Chosen One"

Duration: 1:07

Render Time (No GPU): 11:18

Render Time (With GPU): 2:33 (77% reduction)

This video consists largely of still images and overlays, so it's not surprising that it would improve significantly.


Bonus: Peter Moshay talks creating music for media, recording in a studio, and more!

Duration: 1:59

Render Time (No GPU): 10:22

Render Time (With GPU): 7:17 (30% reduction)

Premiere Elements doesn't appear to use the GPU to actually encode the video, so it's not surprising that this video saw less of a benefit.


Would the music from "Streaming Stampede" fit a Japanese game show?

Duration: 2:28

Render Time (No GPU): 16:57

Render Time (With GPU): 14:00 (17% reduction)

With the fewest overlays and transitions, it's not surprising this saw the least improvement. It is worth noting, however, that I used a separate project to scale the footage from Vegas Stakes and Pokemon Stadium 2, which would have greatly benefited from the GPU acceleration, so I would have saved significant time on the project as a whole.
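
As a quick sanity check on those percentages, here's the arithmetic on the times listed above (just a throwaway script of mine, nothing to do with Premiere itself):

    # Convert the mm:ss render times to seconds and compute each reduction.
    def to_seconds(mmss):
        minutes, seconds = mmss.split(":")
        return int(minutes) * 60 + int(seconds)

    comparisons = [
        ("Why it Works", "11:18", "2:33"),
        ("Peter Moshay bonus", "10:22", "7:17"),
        ("Streaming Stampede", "16:57", "14:00"),
    ]

    for name, no_gpu, with_gpu in comparisons:
        before, after = to_seconds(no_gpu), to_seconds(with_gpu)
        print(f"{name}: {100 * (before - after) / before:.0f}% reduction")
    # Prints 77%, 30%, and 17%, matching the figures above.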

Other notes:

-The graphics card doesn't actually need to have a monitor attached for Premiere Elements to use it, either for rendering or editing.
-The major constraint in this relationship is definitely my CPU; Task Manager showed that it was still working near its capacity, while the graphics card had no trouble dealing with the data being sent to/from it (outside of some long passages with multilayered video); a quick way to double-check the card's utilization is sketched after this list. My only real option here would be to go for the quad-core upgrade I mentioned earlier, but the time saved from just the GPU is probably good enough for now.
-I've run into some system stability issues when using the card. Notably, I've gotten blue screens on startup after shutting down the computer, removing the card, and booting it back up. The computer then works fine after the next reboot. I also encountered one crash while the card was plugged into the computer, so I'm leery of using it to improve videoconferencing or other live tasks.
-While the output is generally pretty consistent visually between rendering with or without the GPU, one difference I have noticed is during the end card sequence, where the fade-in of the video and the white matte behind it progresses a bit differently. It's a bit hard to describe, but basically the white matte is more visible underneath the video during the transition when GPU acceleration is on.
-The lowest-end graphics card that seems to be supported for this feature is the GT 730, which is available new for under $100 in some configurations. While definitely not considered high-end for gaming, it might be the best choice if you are primarily concerned with improving video editing on an older laptop.
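
On the CPU-bottleneck point above: if you'd rather not rely on Task Manager's readout, the nvidia-smi tool that ships with NVIDIA's driver can report the card's utilization directly. A minimal polling sketch (assuming nvidia-smi is on the PATH):

    # Print GPU utilization and memory use once per second during a render.
    # Assumes nvidia-smi (installed with the NVIDIA driver) is on the PATH.
    import subprocess, time

    while True:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
             "--format=csv,noheader"],
            capture_output=True, text=True, check=True
        ).stdout.strip()
        print(out)  # e.g. "34 %, 512 MiB"
        time.sleep(1)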

Sunday, March 14, 2021

Interview: Ingo Korb explains the GCVideo project

The following is an interview with Ingo Korb, one of the developers behind the GCVideo project. GCVideo is an effort that analyzed the GameCube’s digital AV output in order to produce a more affordable way to get high-quality video out of the system. As a user of one of its implementations, I wanted to know more about the process.

Where are you from, and how did you get into video games?

I’m from Germany and I was gifted a Commodore 64 in the late 1980s, which has a pretty great selection of games. An SNES with a few games joined the collection sometime in the early 90s, but except for an N64 that I bought for exactly two games (Ocarina of Time and Majora’s Mask), I stayed on the PC side for gaming for a bit. When the Dreamcast went on clearance, I grabbed one and a few cheap games and got back into console gaming, picking up the other contenders in that generation whenever I found a good offer.

How did the GCVideo project get started?

A long time ago, I happened to be sitting in the games room of a convention talking with ikari_01 about that pesky [GameCube] component cable and how it was ridiculously expensive already (he was lucky because he bought one much earlier). Since we both knew our way around FPGAs[1] (he from the SD2SNES, me from dabbling with them over the years), we both wondered why nobody had yet tried to decode the digital video from the console and feed it to a DAC[2] to clone that cable.

What were some of the most interesting technical challenges?

Making the product easily configurable required some work. For example, the original release of GCVideo-DVI did not have an on-screen display. Instead, you had to select line-doubling and scanline settings using jumper wires to various pins of the FPGA board. I had expected that users would just choose one set of options and be done with it, but someone announced on a forum that he planned to add switches for everything to his console (or maybe he showed a completed version, I don't remember exactly). I didn’t want to encourage people to do that, so I decided to put the flexibility of an FPGA to good use by adding a CPU with enough peripherals to the system to build an OSD. The nice thing about doing that on an FPGA is that you have full control over these peripherals: with a standard microcontroller, if some part of it does not work exactly how you need it to, you can only work around it, but with an FPGA I can just make it work exactly as I want.

Other times the challenges were more annoying than interesting. For example, when you have a digital AV interface, most consumer devices will just show a variation of “no signal” if you don’t get it mostly right. Debugging that can be a pain because the only option is to capture what your design generates and then compare it manually to something that is known to work. There are professional analyzers that could tell you what you did wrong, but they are pretty expensive even on the used market, if you can find one at all.

One of the biggest challenges overall that I can remember was the ICAP[3] interface of the Spartan 3A, though. It is an internal interface of the FPGA that is used to reboot the chip into another configuration, and while the documentation looks good at first glance, it glosses over lots of tiny details. There are some other projects on the internet that also access that interface, but they mostly just tell the chip to reboot with a different configuration, which only needs to write to that interface. GCVideo needs more than that though: The internal flasher tool can be started either when the console is turned on, in which case it should just silently start the main firmware, or it can be started on user request to upgrade the firmware. The only data in the entire FPGA that survives that reboot is in the ICAP interface, so I had to be able to read data back from it that was written before the reboot. I have not found a single project online that does that on a Spartan 3A FPGA, but some old discussions on the chip vendor's forums pointed me in the right direction. It took me about two weeks to find a sequence of accesses to make that thing work reliably and I still can’t fiddle with that part of the code too much because it breaks easily.

What are your future plans (GCVideo or otherwise)?

That’s a secret. =) I find it easier not to talk about future plans for a project because that allows me to scrap them if they don’t work out. Announcing plans in public creates an expectation on the user side that those plans will eventually be realized, and it is much nicer to suddenly hear that there is a new release with some new feature than to hear that the feature you were waiting for will never come.

I’m tempted to do a release with 720p HDR capability on the first of a certain month in the near future, but I'm not sure if I have enough spare time to make that happen. ;)

[1] Field-Programmable Gate Arrays: a technology that enables mimicking other hardware without having to build an exact replacement.

[2] Digital-to-analog converter: converts a digital signal to its analog equivalent.

[3] Internal configuration access port: does pretty much what the name says.