SageTV Community

WellThen 10-19-2013 01:20 PM

Video Conversions on Sage Server now trigger ESXi reboot
 
A little more than a month ago, I built a new ESXi 5.0 white box host to run multiple VMs, including my Sage Server. It worked fine until the beginning of this week. On Monday, we had a power outage, but the host is attached to a UPS, so as far as I know, it was fine. After the power came back, I checked the host and it was still up, so I assume it ran on battery during the 45-50 minute outage. (The UPS's automatic shutdown apparently did not trigger; I don't know why.)

A day or two later, I was watching a show on a client, and the server (host) rebooted with no warning. Since then, every time I start up Sage (whether as a service or an application), the ESXi host reboots about 2 minutes later. All of my other VMs are shut down, so this is not caused by activity in another VM.

Starting the Sage VM (Windows 7) by itself does not cause the issue; only starting Sage does. I've turned on debug logging in Sage to get a sense of what's going on.

My recording drives (4 physical drives) are attached to an M1015 controller, and I wondered if there was a problem with the drives or the controller. I have run chkdsk on each of the recording drives, and no errors were reported.

I did boot the host into the BIOS settings and confirmed the CPU was not overheating. But of course, that was not under load.

I followed another hunch: as soon as Sage started, I checked for queued video conversions. (I automatically convert any show recorded with my HD PVR so that I can watch it on clients that can't handle .ts files.) I found a show awaiting conversion and quickly cancelled the conversion. Once I did this, Sage (and the ESXi host) stayed up. I then tried converting a different .ts recording, and once again the ESXi host rebooted. Again, video conversions had been working fine for the month or so since the new server was built.

Any suggestions for further diagnostics to isolate why video conversion activity in Sage brings down the ESXi host? I'm certainly not blaming Sage, but since a video conversion consistently precedes the reboot, something a Sage conversion does (disk I/O, network traffic, CPU load) must be involved.

BobPhoenix 10-21-2013 02:34 PM

I convert MPGs and 480i/p TS files all the time with SageTV in my VMs. I've never been able to get a successful conversion of a 1080i recording with the SageTV transcoder. It doesn't reboot the server; it just fails. I doubt that helps much if you've been using the transcoder on HD TS material successfully. But some time in the next couple of months I intend to set up a separate VM and compress them with SJQ and Handbrake. You might try something like that instead.

WellThen 10-22-2013 11:02 AM

Hi Bob,

Thanks for the reply. I may try a different file conversion solution, as you've suggested.

Since it was working for a month (converting one or more recordings every day), I have to assume that something fundamental about the host is now compromised. I ran memtest86+ for a couple of hours over the weekend and saw no issues. I need some non-Sage way to simulate the CPU load associated with converting files. My impression is that this is the most CPU-intensive thing Sage does, and if I can reproduce it with a testing utility, perhaps I can isolate whether it's a problem with the CPU or some other component.
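For a rough, non-Sage way to generate transcode-like CPU load, here is a minimal sketch, assuming Python 3.9+ is installed in the VM. It starts one busy-looping process per logical CPU for a fixed time. It is a crude stand-in only: it exercises plain integer arithmetic and does not stress RAM or vector units the way Prime95 or a real transcode does, and it does not read temperatures, so those still need to be watched with a separate tool. The names `spin` and `run_load` are hypothetical.

```python
import multiprocessing as mp
import os
import time


def spin(seconds: float) -> int:
    """Busy-loop on one core for roughly `seconds`; return the loop count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    x = 0
    while time.monotonic() < deadline:
        # Small integer workload; keeps one core busy without using much RAM.
        for i in range(1000):
            x = (x * 31 + i) % 1_000_003
        iterations += 1
    return iterations


def run_load(workers: int, seconds: float) -> list[int]:
    """Run one spin() process per worker in parallel and wait for all of them."""
    with mp.Pool(processes=workers) as pool:
        return pool.map(spin, [seconds] * workers)


if __name__ == "__main__":
    n = os.cpu_count() or 1
    print(f"Loading {n} logical CPUs for 600 s; monitor temperatures separately.")
    counts = run_load(n, 600.0)
    print("iterations per worker:", counts)
```

Run inside the Sage VM, this should push every assigned virtual CPU toward 100% in Task Manager, roughly like a conversion does. If the host survives this but not Prime95's Blend test, that points away from the CPU alone.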

WellThen 10-26-2013 09:16 PM

Since I can reliably cause an ESXi 5.0 host reboot by running a video conversion, I decided to observe, as closely as I could, what happens when a conversion starts. I started up Task Manager on the Win7 VM that Sage Server is running in, and went to the Performance tab. As soon as I started the conversion, all 3 of the virtual processors I have assigned to Sage went to 100%.

This made me wonder: how many virtual CPUs do others dedicate to their Sage VM? VMware shows that scary warning about how increasing the number of virtual CPUs may make the guest OS unstable. Does anyone know whether this can be done safely in some cases? I've already tried increasing RAM from 4 GB to 8 GB, which did not help.

I've been casting about for a way to safely stress test my CPU, but I haven't found anything that shows my temperature sensors (I've tried StressLinux and running HWINFO64 in the Sage VM). I don't think it's safe to run this kind of test if you can't monitor temps. Any suggestions for a solution that might be able to read temps from my AMD A10-6700 on a Gigabyte GA-F2A85X-UP motherboard?

SafetyBob 10-28-2013 06:42 AM

First, I want to make a statement: I am in no way a Linux or VM expert, but I have played with both alongside an expert, so I know just enough to be stupid on the subject.

OK, this is way off the beaten path of thinking, but did you update your Win7 VM lately, or did it update itself? There were some heavy Windows 7 updates recently that may have sent your VM into crazy land. Just a thought. I cannot think of any other reason; as you said, everything was working. It sure has the smell of an update gone bad to me.

Something to look at.

Bob E.

BobPhoenix 10-28-2013 09:44 AM

Quote:

Originally Posted by WellThen (Post 558334)
Since I can reliably cause an ESXi 5.0 host reboot by running a video conversion, I decided to watch as best I can to see what happens when a conversion starts. I started up Task Manager on the Win7 VM that Sage Server is running in, and went to the Performance tab. As soon as I started the conversion, all 3 of the virtual processors I have assigned to Sage went to 100%.

This made me wonder, how many Virtual CPUs does anyone dedicate to their Sage VM? VMware has that scary warning about how increasing Virtual CPUs may make the guest O/S unstable, does anyone know whether this can be done safely in some cases? I've already tried increasing RAM from 4G to 8G - did not help.

I've been casting about for a way to safely stress test my CPU, but I haven't found anything that shows my Temp sensors (I've tried StressLinux, and running HWINFO64 in the Sage VM.) I don't think it's safe to run this kind of test if you can't monitor temps. Any suggestions about a solution that might be able to read temps from my AMD A10-6700 on Gigabyte GA-F2A85X-UP Motherboard?

The more virtual cores you give the VM, the more it may have to wait to get real CPU cores if another VM is using them. That is why, except for SageTV, I limit my VMs to a single core. What I wish (and maybe it is possible, as I'm no expert) is to dedicate three REAL cores to the SageTV VM as always available and let the other VMs be swapped in and out of the rest. With a hyper-threaded Xeon, ESXi thinks there are 8 cores, but I try to keep my SageTV VM to fewer than the 4 REAL cores on the CPU, since hyper-threading is still just another way to virtualize the actual cores. That's an oversimplification of what hyper-threading is, but it is what I got out of reading about it. Limiting my SageTV VM to three cores probably isn't helping by itself, but if ESXi can dedicate REAL cores to a specific VM, then it should make a difference.

WellThen 11-02-2013 12:47 PM

First, thanks to BobPhoenix and SafetyBob for the suggestions and comments. Really appreciated.

An update on things. I finally installed Windows 7 on its own drive so that I could run some stress testing utilities while still monitoring the CPU temps. Running Prime95, I could easily reproduce the spontaneous reboot within seconds using the Blend test that I found recommended here. The article mentions some other tests, like Large FFT, that work the CPU but "lay off the RAM a bit". I tried this alternate test and was able to let it run for 10 minutes with no reboot.

Based on this, I picked up some replacement RAM, and a Sage video conversion just completed under ESXi with no issues. So it appears my RAM was bad.


All times are GMT -6.

Copyright 2003-2005 SageTV, LLC. All rights reserved.