SageTV Community  

  #1  
Old 03-06-2018, 10:59 PM
brookb brookb is offline
Sage User
 
Join Date: Aug 2003
Posts: 32
SageTV Docker Problems

So I've been running Sage on a Windows PC for years. Decided to set up a 'server' with Sage running in a docker. This is a new Supermicro server board with ECC RAM, and it seems rock solid except for the Sage docker. I didn't want to go Unraid, because I wanted ZFS.

I first tried FreeNAS, and got Sage Docker (stuckless/sagetv-server-java8) up and running OK. But if I ever had more than 3 recordings going, there would be glitches and dropouts. I couldn't resolve them no matter what I tried.

Decided to go with Debian linux and OpenMediaVault instead of FreeNAS. Got Sage up in docker on that, and it works beautifully with 6 channels recording. No glitches. Everything is very responsive . . . until it crashes. I haven't been able to figure out the crashes. I've tried lots of things, including lowering the JVM memory (was set to 1024MiB - now 975MiB (1023MB)), and disabling a bunch of plugins. That eliminated the "Hang Detected" types of failures which I had been getting, but now I'm getting core dumps. Note that many other plugins (including comskip) are still running. Comskip seems to work fine.

When crashes occur, the server itself stays running fine. No memory errors and no ZFS filesystem errors. The docker itself remains 'up', but the sage java process within it is dead. Using HDHomerun network capture cards for all tuners, and they seem to work great. Docker is using host network.

It will typically run for 12-24 hours and then just crash - usually on the hour.
Lately, it has always crashed when there is no extender connected. No one watching.

I have some of the sagetv_0.txt logs: one from when a recording was starting, and another from when the EPG was updating. These logs appear to just STOP. No error message. I also have the hs_err_pid106.log and a full core dump (1.5GB) from the crash where the EPG was updating. I can find a way to provide these or any other system files if it will help.

Here are a few highlights from the crash that occurred during an EPG update:

Tail end of sagetv_0.log
Code:
Tue 3/6 4:00:15.074 [Seeker@40c2675] Seeker awoken
Tue 3/6 4:00:15.074 [EPG@545b21d3] EPG attempting to expand HDHomeRun 1016bc0d Tuner 0 Digital TV Tuner
Tue 3/6 4:00:15.074 [Seeker@40c2675] MemStats: Used=409MB Total=838MB Max=1023MB
Tue 3/6 4:00:15.074 [EPG@545b21d3] expand called on HDHomeRun 1016bc0d Tuner 0 Digital TV Tuner at Tue 3/6 4:00:15.074 expandedUntil=Tue 3/6 12:16:39.316 scannedUntil=Tue 3/6 5:00:00.000
Tue 3/6 4:00:15.074 [EPG@545b21d3] Saving properties file to Sage.properties
Tue 3/6 4:00:15.078 [Seeker@40c2675] MARK 1 currRecord=null enc=HDHomeRun 1045340f Tuner 1 clients=[] ir=false
Tue 3/6 4:00:15.079 [Seeker@40c2675] Seeker in AUTOMATIC mode nextRecord=A[16608582,16607185,"The Mick",21209@0306.20:30,30,T] nextTTA=59384926
Tue 3/6 4:00:15.079 [Seeker@40c2675] newRecord=null
Tue 3/6 4:00:15.079 [Seeker@40c2675] NOTHING TO RECORD FOR NOW...
Tue 3/6 4:00:15.079 [Seeker@40c2675] Enabling data scanning for input HDHomeRun 1045340f Tuner 1 Digital TV Tuner
Tue 3/6 4:00:15.079 [Seeker@40c2675] HDHR_setFilterEnable0(FALSE)
Tue 3/6 4:00:15.079 [Seeker@40c2675] startEncoding for HDHomeRun 1045340f Tuner 1, file=null, chan=null
Tue 3/6 4:00:15.079 [Seeker@40c2675] HDHR_setInput0(0x7fc24c1e3280, 100, 0, Air, 140466905415681, 140471200382977)
Tue 3/6 4:00:15.079 [Seeker@40c2675] HDHR_setupEncoding0(0x7fc24c1e3280, (null), 0)
Tue 3/6 4:00:15.079 [Seeker@40c2675] HDHR_getBroadcastStandard0(0x7fc24c1e3280)
Tue 3/6 4:00:15.080 [Seeker@40c2675] MARK 1 currRecord=null enc=HDHomeRun 1045340f Tuner 0 clients=[] ir=false
Tue 3/6 4:00:15.080 [Seeker@40c2675] Seeker in AUTOMATIC mode nextRecord=null nextTTA=9223372036854775807
Tue 3/6 4:00:15.080 [Seeker@40c2675] newRecord=null
Tue 3/6 4:00:15.080 [Seeker@40c2675] NOTHING TO RECORD FOR NOW...
Tue 3/6 4:00:15.080 [Seeker@40c2675] Enabling data scanning for input HDHomeRun 1045340f Tuner 0 Digital TV Tuner
Tue 3/6 4:00:15.080 [HDHomeRun 1045340f Tuner 1-Encoder@c9060cf] Starting capture thread for HDHomeRun 1045340f Tuner 1
Tue 3/6 4:00:15.081 [Seeker@40c2675] HDHR_setFilterEnable0(FALSE)
Tue 3/6 4:00:15.081 [Seeker@40c2675] startEncoding for HDHomeRun 1045340f Tuner 0, file=null, chan=null
Tue 3/6 4:00:15.081 [Seeker@40c2675] HDHR_setInput0(0x7fc24c81f3b0, 100, 0, Air, 140466905415681, 140471200382977)
Tue 3/6 4:00:15.081 [Seeker@40c2675] HDHR_setupEncoding0(0x7fc24c81f3b0, (null), 0)
Tue 3/6 4:00:15.095 [EPG@545b21d3] Done writing out the data to the properties file
Tue 3/6 4:00:15.095 [EPG@545b21d3] sage.epg.sd.SDRipper@1ba1df13 needs an update in 0:00:00
Tue 3/6 4:00:15.095 [EPG@545b21d3] sage.EPGDataSource@1679200d needs an update in 0:59:44
Tue 3/6 4:00:15.095 [EPG@545b21d3] EPG needs an update in 0 minutes
Tue 3/6 4:00:15.095 [EPG@545b21d3] EPG attempting to expand Local Over the Air Broadcast - 85138 (sdepg)
Tue 3/6 4:00:15.095 [EPG@545b21d3] expand called on Local Over the Air Broadcast - 85138 (sdepg) at Tue 3/6 4:00:15.095 expandedUntil=Tue 3/6 11:59:46.978 scannedUntil=Tue 3/6 4:00:00.000
Tue 3/6 4:00:15.095 [EPG@545b21d3] EPG Expanding Local Over the Air Broadcast - 85138 (sdepg) at Tue 3/6 4:00:15.095
Parts of hs_err_pid106.log
Code:
# JRE version: Java(TM) SE Runtime Environment (8.0_121-b13) (build 1.8.0_121-b13)
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.121-b13 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C  [libpthread.so.0+0x8919]  pthread_join+0x9
#
# Core dump written. Default location: /opt/sagetv/server/core or core.106
#
# If you would like to submit a bug report, please visit:
#   http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

---------------  T H R E A D  ---------------

Current thread (0x00007fc1b0008000):  JavaThread "Seeker" daemon [_thread_in_native, id=146, stack(0x00007fc1bf6ad000,0x00007fc1bf7ae000)]

siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007fc1a7d309d0

Registers:
RAX=0x00007fc1a7d30700, RBX=0x00007fc1d4d656a0, RCX=0x0000000000000000, RDX=0x00007fc1bf7ac068
RSP=0x00007fc1bf7ac048, RBP=0x00007fc1bf7ac070, RSI=0x00007fc1bf7ac068, RDI=0x00007fc1a7d30700
R8 =0x0000000000000005, R9 =0x0000000000000006, R10=0x0000000000000006, R11=0x00000000e524dbf0
R12=0x0000000000000000, R13=0x00007fc1d4d65698, R14=0x00007fc1bf7ac1e0, R15=0x00007fc1b0008000
RIP=0x00007fc25292f919, EFLAGS=0x0000000000010206, CSGSFS=0x002b000000000033, ERR=0x0000000000000004
  TRAPNO=0x000000000000000e

 . . . . 

Stack: [0x00007fc1bf6ad000,0x00007fc1bf7ae000],  sp=0x00007fc1bf7ac048,  free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libpthread.so.0+0x8919]  pthread_join+0x9
C  [libHDHomeRunCapture.so+0x1152e]  HDHRDevice::setupEncoding(char*, unsigned long)+0x20
C  [libHDHomeRunCapture.so+0xe85e]  Java_sage_HDHomeRunCaptureDevice_setupEncoding0+0x84
j  sage.HDHomeRunCaptureDevice.setupEncoding0(JLjava/lang/String;J)Z+0
j  sage.HDHomeRunCaptureDevice.startEncoding(Lsage/CaptureDeviceInput;Ljava/lang/String;Ljava/lang/String;)V+103
j  sage.HDHomeRunCaptureDevice.enableDataScanning(Lsage/CaptureDeviceInput;)V+21
J 8710 C1 sage.Seeker.work()J (2665 bytes) @ 0x00007fc23ea09804 [0x00007fc23e9f7860+0x11fa4]
J 11207% C2 sage.Seeker.run()V (2220 bytes) @ 0x00007fc23f0d0334 [0x00007fc23f0cdd20+0x2614]
j  java.lang.Thread.run()V+11
v  ~StubRoutines::call_stub
V  [libjvm.so+0x690dd6]  JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)+0x1056
V  [libjvm.so+0x6912e1]  JavaCalls::call_virtual(JavaValue*, KlassHandle, Symbol*, Symbol*, JavaCallArguments*, Thread*)+0x321
V  [libjvm.so+0x691787]  JavaCalls::call_virtual(JavaValue*, Handle, KlassHandle, Symbol*, Symbol*, Thread*)+0x47
V  [libjvm.so+0x72cb00]  thread_entry(JavaThread*, Thread*)+0xa0
V  [libjvm.so+0xa75543]  JavaThread::thread_main_inner()+0x103
V  [libjvm.so+0xa7568c]  JavaThread::run()+0x11c
V  [libjvm.so+0x926268]  java_start(Thread*)+0x108
C  [libpthread.so.0+0x76fa]  start_thread+0xca
 . . . 

---------------  T H R E A D  ---------------

Current thread (0x00007fc1b0008000):  JavaThread "Seeker" daemon [_thread_in_native, id=146, stack(0x00007fc1bf6ad000,0x00007fc1bf7ae000)]
0x00007fc1bf7ac1f8:   00007fc1bf7ac250 00007fc1d4d75cb0
0x00007fc1bf7ac208:   0000000000000000 00007fc1d4d63fb0
0x00007fc1bf7ac218:   00007fc1bf7ac1b8 00007fc1bf7ac238
0x00007fc1bf7ac228:   00007fc1bf7ac298 00007fc23cbdb2bd
0x00007fc1bf7ac238:   0000000000000000 0000000000000000

Instructions: (pc=0x00007fc25292f919)
0x00007fc25292f8f9:   00 00 f0 48 0f b1 17 c3 0f 1f 44 00 00 66 2e 0f
0x00007fc25292f909:   1f 84 00 00 00 00 00 48 85 ff 0f 84 07 01 00 00
0x00007fc25292f919:   8b 87 d0 02 00 00 85 c0 0f 88 f9 00 00 00 48 3b
0x00007fc25292f929:   bf 28 06 00 00 b8 16 00 00 00 0f 84 e4 00 00 00

Register to memory mapping:

RAX=0x00007fc1a7d30700 is an unknown value
RBX={method} {0x00007fc1d4d656a0} 'setupEncoding0' '(JLjava/lang/String;J)Z' in 'sage/HDHomeRunCaptureDevice'
RCX=0x0000000000000000 is an unknown value
  0x00007fc1b0007000 JavaThread "SageTVServer" daemon [_thread_in_native, id=188, stack(0x00007fc1b426c000,0x00007fc1b436d000)]
  0x00007fc208003000 JavaThread "AsyncPropSaver" daemon [_thread_blocked, id=169, stack(0x00007fc1be69f000,0x00007fc1be7a0000)]
  0x00007fc208001000 JavaThread "KeepAlive" daemon [_thread_blocked, id=168, stack(0x00007fc1be7a0000,0x00007fc1be8a1000)]
  0x00007fc1b813c800 JavaThread "PooledThread" daemon [_thread_blocked, id=167, stack(0x00007fc1beea5000,0x00007fc1befa6000)]
  0x00007fc1b8005000 JavaThread "SeekerWatchdog" daemon [_thread_blocked, id=166, stack(0x00007fc1befa6000,0x00007fc1bf0a7000)]
  0x00007fc1b0014000 JavaThread "Scheduler" daemon [_thread_blocked, id=158, stack(0x00007fc1beba2000,0x00007fc1beca3000)]
  0x00007fc1b0011000 JavaThread "MediaServer" daemon [_thread_in_native, id=157, stack(0x00007fc1beca3000,0x00007fc1beda4000)]
  0x00007fc1b8006800 JavaThread "ReProcessHook" daemon [_thread_blocked, id=156, stack(0x00007fc1beda4000,0x00007fc1beea5000)]
  0x00007fc1b000f000 JavaThread "Carny" daemon [_thread_blocked, id=153, stack(0x00007fc1bf0a7000,0x00007fc1bf1a8000)]
  0x00007fc1b000d800 JavaThread "EPG" daemon [_thread_blocked, id=151, stack(0x00007fc1bf1a8000,0x00007fc1bf2a9000)]
  0x00007fc1b000b800 JavaThread "Ministry" daemon [_thread_blocked, id=150, stack(0x00007fc1bf2a9000,0x00007fc1bf3aa000)]
  0x00007fc1c4004000 JavaThread "MsgMgrSocket" daemon [_thread_in_native, id=149, stack(0x00007fc1bf3aa000,0x00007fc1bf4ab000)]
  0x00007fc1b8001800 JavaThread "FSManager" daemon [_thread_blocked, id=148, stack(0x00007fc1bf4ab000,0x00007fc1bf5ac000)]
  0x00007fc1b0009800 JavaThread "MsgManager" daemon [_thread_blocked, id=147, stack(0x00007fc1bf5ac000,0x00007fc1bf6ad000)]
=>0x00007fc1b0008000 JavaThread "Seeker" daemon [_thread_in_native, id=146, stack(0x00007fc1bf6ad000,0x00007fc1bf7ae000)]
  0x00007fc24c819000 JavaThread "AsyncPropSaver" daemon [_thread_blocked, id=139, stack(0x00007fc1d40cc000,0x00007fc1d41cd000)]
  0x00007fc24c810800 JavaThread "MediaServerConnection" daemon [_thread_blocked, id=138, stack(0x00007fc1d4bfd000,0x00007fc1d4cfe000)]
  0x00007fc24c7bd000 JavaThread "Flusher" daemon [_thread_blocked, id=137, stack(0x00007fc1d4efe000,0x00007fc1d4fff000)]
  0x00007fc24c7cf000 JavaThread "LuceneShowTransactionTask" daemon [_thread_blocked, id=136, stack(0x00007fc2100f7000,0x00007fc2101f8000)]
  0x00007fc24c7a9000 JavaThread "LucenePersonTransactionTask" daemon [_thread_blocked, id=135, stack(0x00007fc2101f8000,0x00007fc2102f9000)]
  0x00007fc24c527000 JavaThread "ThreadMonitor" daemon [_thread_blocked, id=134, stack(0x00007fc211033000,0x00007fc211134000)]
  0x00007fc24c1dc000 JavaThread "Service Thread" daemon [_thread_blocked, id=132, stack(0x00007fc2117da000,0x00007fc2118db000)]
  0x00007fc24c1d0800 JavaThread "C1 CompilerThread2" daemon [_thread_blocked, id=131, stack(0x00007fc2118db000,0x00007fc2119dc000)]
  0x00007fc24c1ce800 JavaThread "C2 CompilerThread1" daemon [_thread_blocked, id=130, stack(0x00007fc2119dc000,0x00007fc211add000)]
  0x00007fc24c1cc000 JavaThread "C2 CompilerThread0" daemon [_thread_blocked, id=129, stack(0x00007fc211add000,0x00007fc211bde000)]
  0x00007fc24c1ca000 JavaThread "Signal Dispatcher" daemon [_thread_blocked, id=128, stack(0x00007fc211bde000,0x00007fc211cdf000)]
  0x00007fc24c1c8000 JavaThread "Surrogate Locker Thread (Concurrent GC)" daemon [_thread_blocked, id=127, stack(0x00007fc22c033000,0x00007fc22c134000)]
  0x00007fc24c191800 JavaThread "Finalizer" daemon [_thread_blocked, id=126, stack(0x00007fc22c134000,0x00007fc22c235000)]
  0x00007fc24c18c800 JavaThread "Reference Handler" daemon [_thread_blocked, id=125, stack(0x00007fc22c235000,0x00007fc22c336000)]
  0x00007fc24c00b000 JavaThread "main" [_thread_blocked, id=111, stack(0x00007fc252c5a000,0x00007fc252d5b000)]
The other coredump that occurred when a recording started had the same 'problematic frame':
Code:
# C  [libpthread.so.0+0x8919]  pthread_join+0x9
Any help is much appreciated! I'm happy to provide whatever files you need.
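
For anyone comparing several hs_err_pid*.log files, here is a minimal Python sketch that pulls out the native (C) frames so the crash sites can be lined up side by side. The function names are made up for illustration, the sample lines are copied from the excerpt above, and the regular expression assumes only the frame layout shown there.

```python
import re

# Matches native ("C") frames such as:
#   C  [libpthread.so.0+0x8919]  pthread_join+0x9
# An optional leading "#" covers the "Problematic frame" header line.
NATIVE_FRAME = re.compile(
    r"^#?\s*C\s+\[(?P<lib>[^\]+]+)\+(?P<off>0x[0-9a-fA-F]+)\]\s+(?P<sym>.*)$"
)

SAMPLE = """\
# C  [libpthread.so.0+0x8919]  pthread_join+0x9
C  [libpthread.so.0+0x8919]  pthread_join+0x9
C  [libHDHomeRunCapture.so+0x1152e]  HDHRDevice::setupEncoding(char*, unsigned long)+0x20
C  [libHDHomeRunCapture.so+0xe85e]  Java_sage_HDHomeRunCaptureDevice_setupEncoding0+0x84
j  sage.HDHomeRunCaptureDevice.setupEncoding0(JLjava/lang/String;J)Z+0
"""


def native_frames(hs_err_text):
    """Return (library, offset, symbol) for every native C frame line."""
    frames = []
    for line in hs_err_text.splitlines():
        m = NATIVE_FRAME.match(line.strip())
        if m:
            frames.append((m.group("lib"), int(m.group("off"), 16), m.group("sym").strip()))
    return frames


def first_caller_outside(frames, skip_prefixes=("libpthread", "libc.")):
    """Return the first frame that is not in the system threading/C library."""
    for lib, off, sym in frames:
        if not lib.startswith(skip_prefixes):
            return lib, off, sym
    return None


if __name__ == "__main__":
    print(first_caller_outside(native_frames(SAMPLE)))
```

On the excerpt above this points at libHDHomeRunCapture.so as the first frame outside libpthread, i.e. the caller of pthread_join.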
  #2  
Old 03-07-2018, 10:25 AM
stuckless stuckless is offline
SageTVaholic
 
Join Date: Oct 2007
Location: London, Ontario, Canada
Posts: 9,713
certainly looks like it's related to HDHR... (based on the stack)

Code:
j  sage.HDHomeRunCaptureDevice.setupEncoding0(JLjava/lang/String;J)Z+0
j  sage.HDHomeRunCaptureDevice.startEncoding(Lsage/CaptureDeviceInput;Ljava/lang/String;Ljava/lang/String;)V+103
j  sage.HDHomeRunCaptureDevice.enableDataScanning
But, I can't offer a reason why

As for ZFS... there is no benefit to running ZFS under unRAID, but unRAID does use XFS, which is a benefit, especially for large files.
  #3  
Old 03-07-2018, 01:37 PM
brookb brookb is offline
Sage User
 
Join Date: Aug 2003
Posts: 32
Thanks for the reply, Stuckless!

I don't know much about what Sage is doing while it is 'idle'. I don't know why it might be accessing the HDHR tuners at that time, unless it is just checking to make sure they're still there. The sagetv_0.log file indicated there was nothing to record:
Code:
Tue 3/6 4:00:15.080 [Seeker@40c2675] NOTHING TO RECORD FOR NOW...
, yet the stack shows something going on with the HDHR library.

Some additional info: I still have SageTV running on my Windows PC as well as on this Linux docker. I haven't 'switched over' to the docker instance for primary use yet, due to these problems I'm having with the docker instance. So both copies of Sage have the same six HDHR tuners registered. I have seen conflicts where one copy of Sage is trying to record from a tuner that is already in use by the other copy of Sage, and that recording fails on the docker Sage. I have yet to see a recording fail on the Windows Sage; perhaps it is able to tell that a tuner is in use before it tries to record. Regardless, neither copy of Sage crashed on these conflicts - it's just that the Docker instance would act as if it was recording, but the recorded file would be empty. Could these tuner conflicts possibly have anything to do with these core dumps? I'm sure there were no conflicts when the core dump occurred, but software bugs can often produce strange symptoms that you wouldn't think would be related until you dig into it. Do you think that if I stop the Windows SageTV instance there is a possibility that the docker instance will quit core dumping?

Also, is it possible that there are newer Silicondust HDHR libraries for HDHR that might be needed? My HDHR firmwares are pretty close to being 'up to date', as I update their firmware from Windows. This might mean their firmware is newer than the libHDHomeRunCapture.so library in the docker. Does anyone know which version (by date) the libHDHomeRunCapture.so library in the docker is? The latest version from Silicondust (source code) is 9/30/2017. See https://www.silicondust.com/support/linux/ for info.

Additional FYI: I'm using ZFS not because of its advantages with Sage, but because the server I've set up is also being used for other purposes, including large libraries of media, software, and pictures that I don't want to lose. I want the built-in integrity testing and 'bit rot' protection of ZFS, as well as the ability to snapshot, access, and roll back to previous versions of files. ZFS is also great protection against ransomware; if ransomware running on a client machine tries to encrypt everything on the server, you can just roll back to the previous versions that were there before you got hit. That being said, getting Debian, ZFS, OpenMediaVault, and SageTV docker set up is definitely a lot more work than I anticipated. But I'm learning a lot. I'm sure the Unraid approach is much easier.
  #4  
Old 03-07-2018, 01:54 PM
texneus texneus is offline
Sage Aficionado
 
Join Date: Nov 2009
Location: DFW
Posts: 279
I run Sage TV in NAS4Free with ZFS and know of the dropouts you speak of. I suspect it's due to many layers of virtualization (Docker behind bhyve in a virtualized OS...there is no Docker in FreeBSD). It took a lot of trial and error and experimenting with disk and file system setups, but ultimately, if you're using a newer processor with the new "deep sleep" C states, I think the CPU falls into those states and can't wake up quickly enough to process real-time data.

I was able to fix it by changing the power management to High Performance. That helped a lot, but there were still a few glitches, so I also set the minimum CPU frequency to 400MHz; now it works beautifully. In theory you could go into the BIOS and turn off some of those deeper C-states, but I never tried.

There is a NAS4Free guide/walkthrough that may help get you going on FreeNAS. I need to update it with this information and a few other things I've learned. It's still a work in progress, but the old guide might help you figure out what's missing.

P.S. ZFS is a beautiful thing, isn't it!
__________________
Server: Xeon E3-1225, 32GB RAM, Open Media Vault 5, SageTV Docker
Tuners: HD Homerun Quatro (OTA)
Clients: NVidia SheildTV x3

Last edited by texneus; 03-07-2018 at 02:03 PM.
  #5  
Old 03-07-2018, 02:27 PM
brookb brookb is offline
Sage User
 
Join Date: Aug 2003
Posts: 32
ZFS and C States

Hi Texneus,

Yeah. Thanks for your inputs. I do love ZFS! I was using it with FreeNAS and everything was great except for Sage Docker dropouts. FreeNAS OS with Rancher OS (linux) in a BHyve VM and docker running on that. ZFS filesystem shared to the Rancher OS VM via NFS. Recordings were fine with 1-3 at the same time, but 4 or more would have the dropouts and glitches. I played with power management including forcing into high performance, and still had the dropouts. I played with NFS parameters, switching to async rather than sync, etc., to try to improve that, but couldn't resolve it. Maybe I missed something you did. Or maybe I could resolve it by disabling C states in BIOS. But I decided to try the Debian Linux approach which would allow me to run docker on the Host OS, avoiding the virtual network layer for NFS. The Debian/ZFS/Docker/SageTV setup eliminates the dropout problems completely - even with power saving states enabled on the host. I can record 6 shows at once with perfect fidelity (until it crashes).

Makes me wonder, though, if the power saving thing could be causing the core dumps. I may try disabling C states in BIOS on the Debian/docker/Sage system and let it run a couple of days that way to see if it core dumps.

If I can't figure out the current core dump problem, I may try installing Sage directly into the host Debian linux without docker to see what that does.

As before, any ideas from anyone are much appreciated!
  #6  
Old 03-07-2018, 04:33 PM
texneus texneus is offline
Sage Aficionado
 
Join Date: Nov 2009
Location: DFW
Posts: 279
I missed where you had issues only when using three or more tuners. My problem was with any one tuner, so an overly "relaxed" CPU may not be your issue. BUT...have you noticed heavier than expected disk activity on your NFS shares when recording/watching SageTV? Maybe you're running into a physical NFS or disk I/O limit.

Initially my approach was very similar, just using Ubuntu to host Docker, and passing recording shares into the VM with NFS. One of the first things I tried to fix, thinking it to be my problem (it wasn't, but likely contributed), was nearly constant disk activity. Rather than a nice "heartbeat" of disk I/O only once every few seconds like ZFS normally does, I was seeing near solid drive access, as if all the buffering was being bypassed and every bit was being written/read direct to the disk one at a time without any cache at all.

I don't know why NFS would do that, but somebody else told me they were passing disks in with CIFS/SMB (in FreeNAS, no less) and did not see that behavior. In researching I discovered you could also pass entire disk devices to bhyve, so I did that for the past several months, formatting them EXT4 under Ubuntu, believing it to be superior since the "share" overhead is out of the loop. This did work as expected (much, much more reasonable disk I/O), albeit as a disk invisible to the rest of the system. Ultimately that created more problems, but for just SageTV it did work very, very well once I got my glitches solved.

Very recently (like 2 weeks ago) I redid the drives as ZFS as before, but shared them into Ubuntu with CIFS/SMB (not NFS). This works exactly as expected with the nice ZFS cache (ARC) and regaining of other ZFS features. Bliss! Activity on the recording drives is a blip about once every 4 to 5 seconds, as it should be with ZFS. Why NFS apparently doesn't cut it is beyond me.

If you find similar I would really like to know!
  #7  
Old 03-08-2018, 12:35 AM
brookb brookb is offline
Sage User
 
Join Date: Aug 2003
Posts: 32
If I end up going back to FreeNAS or NAS4Free, I'll try using CIFS instead of NFS to share my ZFS folders with the Rancher/Linux BHyve VM that has Docker in it. Maybe that will resolve the problem of the dropouts and glitches.

I have to say, though, that it seems kind of kludgy to have to have 3 layers of OS before Sage runs: BSD, Linux VM, Docker, SageTV. That's one of the reasons I decided to try Debian with OpenMediaVault. That removes one layer, including the need to use a network drive sharing method, and simplifies it to Linux, Docker, SageTV. Unfortunately, now I'm having problems with core dumps or 'spinning circle' freezes in Debian/Docker, so I'm not sure what to think.
  #8  
Old 03-08-2018, 01:43 AM
brookb brookb is offline
Sage User
 
Join Date: Aug 2003
Posts: 32
So getting back to the Debian/Docker/SageTV problems of Core Dumps. I also sometimes see infinite spinning circles instead of Core Dumps. When that happens, there appear to be two scenarios that cause it:

1) the sagetv_0.txt log is filling up extremely fast. Each 10MB sagetv_0.txt file fills up in about 2 minutes with repeated AWTThreadWatcher messages. Oftentimes there are 10 or more messages EACH MILLISECOND. Within these messages are a few "Hang Detected" entries, e.g.:
Code:
Wed 3/7 21:19:19.025 [AWTThreadWatcher-7085c25b4f8f@60780528] EventThread-7085c25b4f8f Hang Detected - hang time = 2827107 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
2) other times, I get the spinning circle only when I select to view previously recorded files, the Recording Schedule List, or select to watch live TV. When this happens, the log files don't fill as fast, but I get AWTThreadWatcher "Hang Detected" events roughly every 750ms for each extender that is hung. As with my Core Dump scenario, this error occurs on the hour. i.e. the first "Hang Detected" is on the hour. So sometimes it will core dump and sometimes it will spin circles (if an extender is up) e.g.:
Code:
Fri 3/2 10:00:08.936 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 2252 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Fri 3/2 10:00:09.686 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 3002 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Fri 3/2 10:00:10.437 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 3753 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Fri 3/2 10:00:11.187 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 4503 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Fri 3/2 10:00:11.938 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 5254 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Fri 3/2 10:00:12.688 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 6004 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Fri 3/2 10:00:13.439 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 6755 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Fri 3/2 10:00:14.189 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 7505 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Fri 3/2 10:00:14.940 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 8256 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Fri 3/2 10:00:15.690 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 9006 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Sometimes there are other logged events between these "Hang Detected" events, and other times there are not.
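
To check whether a run of these messages traces back to a single hang starting on the hour, the logged time minus the reported hang time gives the moment the hang began. A minimal Python sketch, assuming the line layout shown above and that "hang time" is in milliseconds (the lines are about 750 ms apart and the hang time grows by about 750 each time, which suggests it is). The year is not in the log, so it is passed in, and the function name is made up for illustration:

```python
import re
from datetime import datetime, timedelta

# Layout assumed from the excerpt: "Fri 3/2 10:00:08.936 [AWTThreadWatcher-...] ... hang time = 2252 ..."
HANG = re.compile(
    r"^\w{3} (?P<date>\d+/\d+) (?P<time>\d+:\d\d:\d\d\.\d{3}) "
    r"\[AWTThreadWatcher-[^\]]+\] .*Hang Detected - hang time = (?P<ms>\d+)"
)

SAMPLE = """\
Fri 3/2 10:00:08.936 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 2252 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Fri 3/2 10:00:09.686 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 3002 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
Fri 3/2 10:00:10.437 [AWTThreadWatcher-7085c25b4f8f@74f75e8e] EventThread-7085c25b4f8f Hang Detected - hang time = 3753 UILocker=Thread[EventRouter-7085c25b4f8f,5,main]
"""


def hang_starts(log_text, year=2018):
    """Yield (logged_at, hang_ms, estimated_start) for each Hang Detected line."""
    for line in log_text.splitlines():
        m = HANG.match(line)
        if not m:
            continue
        logged = datetime.strptime(
            f"{year}/{m['date']} {m['time']}", "%Y/%m/%d %H:%M:%S.%f"
        )
        ms = int(m["ms"])
        yield logged, ms, logged - timedelta(milliseconds=ms)


if __name__ == "__main__":
    for logged, ms, start in hang_starts(SAMPLE):
        print(logged.time(), ms, "-> hang began at", start.time())
```

For the ten lines above, every entry resolves to the same start time, 10:00:06.684, so this is one hang that began about 6.7 seconds after the hour rather than a series of separate hangs.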

As I mentioned before on the Core Dumps, the core dumps seem to always occur with the following "problematic frame" (from the hs_err_pidxxx.log file)
Code:
# C  [libpthread.so.0+0x8919]  pthread_join+0x9
Looking at what 'libpthread' is, it seems to be the standard POSIX threading library for Linux. It's used for multi-thread management and is written in C. So it kind of makes sense that it might be associated with the "Hang Detected" on the AWTThreadWatcher. For some reason the libpthread library can't start or stop threads correctly, and either Sage crashes with a Core Dump, or AWTThreadWatcher throws infinite "Hang Detected" errors in the log file and you get spinning circles.

So I get the feeling something is going on with an inability to start or control threads in some way. And since it usually seems to occur in the stack after an HDHomeRun call, it may be that the HDHomerun library calls this, and that's where the hang occurs.
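
The register dump in the hs_err excerpt earlier in the thread allows a rough check of this. On x86-64 Linux the first function argument is passed in RDI, and the first argument of pthread_join is the thread handle. The Python sketch below hand-decodes just the one instruction form needed (the helper name is made up for illustration), using the bytes at the faulting pc, and compares the computed address with si_addr:

```python
RDI = 0x00007FC1A7D30700      # RDI from the "Registers:" section
SI_ADDR = 0x00007FC1A7D309D0  # faulting address from the siginfo line
INSTR = bytes.fromhex("8b 87 d0 02 00 00")  # bytes at pc=0x00007fc25292f919

REGS64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_load(code):
    """Decode 'mov r32, [reg + disp32]' (opcode 8B, ModRM mod=10, no SIB, no REX)."""
    opcode, modrm = code[0], code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 0b10 or rm == 0b100:
        raise ValueError("not a plain mov r32, [reg+disp32]")
    disp = int.from_bytes(code[2:6], "little", signed=True)
    return REGS32[reg], REGS64[rm], disp


if __name__ == "__main__":
    dst, base, disp = decode_mov_load(INSTR)
    print(f"mov {dst}, [{base}+{disp:#x}]")
    print("base + disp == si_addr:", RDI + disp == SI_ADDR)
```

The instruction reads a field at offset 0x2d0 from the handle in RDI, and RDI + 0x2d0 is exactly the unmapped si_addr. That is consistent with pthread_join being handed a thread handle whose memory is no longer mapped (for example a capture thread that had already been joined or detached), though confirming that would need the libHDHomeRunCapture source or the core dump.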

I'm wondering if libpthread is somehow 'incompatible' with certain Linux kernels. I'm using kernel 4.14.0-0.bpo.3-amd64. I'm also wondering if perhaps calling libthreads (which is an alternate, non-POSIX, threading library) instead of libpthread might help. . .

Thanks again . . .
  #9  
Old 03-08-2018, 07:29 AM
stuckless stuckless is offline
SageTVaholic
 
Join Date: Oct 2007
Location: London, Ontario, Canada
Posts: 9,713
@brookb

Usually when I had those kinds of hang times it was because my drives were going to sleep, and it can take several seconds for them to fully spin up. This happened to me on unRAID; I disabled sleep on my drives and I've never seen the issue since.
  #10  
Old 03-09-2018, 10:56 AM
brookb brookb is offline
Sage User
 
Join Date: Aug 2003
Posts: 32
Hi @stuckless,

Well, I double checked my drive spindown settings, and made sure they were set to not spin down. I used the OpenMediaVault gui as well as hdparm to disable spindown, but I'm still getting the 'spinning wheel' on Sage after about 12 hours if I ever open the 'recordings' window. Note this is accompanied by Sage not recording anything in the schedule. Strangely I haven't gotten a core dump lately. Just the spinning wheel. Either way, it makes SageTV effectively useless, because it won't restart itself and it won't record while this is going on.

Luckily I'm still running on Windows for my main setup, and that continues to work well. It gives me time to work through these issues on the Linux machine.

Later this weekend, I'm going to try installing directly on linux without the docker and see how that goes.

Thank you and everyone else for your help!
  #11  
Old 03-13-2018, 11:24 PM
brookb brookb is offline
Sage User
 
Join Date: Aug 2003
Posts: 32
Potential cause of problem identified.

As mentioned previously, I have 2 Sage instances running at home and sharing my 6 HDHomeRun tuners: one original Windows SageTV, and the new one I'm putting together in a Linux Docker.

The Linux Docker instance would either core dump or get stuck in spinning circle mode.

One theory I had was that the problem had something to do with conflicts over the HDHomeRun tuners. I eventually got around to testing this theory; I left 4 tuners assigned to the Windows SageTV and assigned the other 2 to the Docker SageTV. I figured there would be no tuner conflicts this way.

So far, after making this change, I haven't had a core dump or spinning circle freeze in more than 48 hours. So I think this is a smoking gun and it is most likely pointing to a software bug in the Linux Sage code or in the Linux HDHomeRun library code that doesn't handle tuner conflicts correctly. The bug only shows itself when there are tuner conflicts; you can avoid the problem by eliminating tuner conflicts. Simply don't have more than one DVR using the same tuner and there won't be any conflicts, thus there will not be any crashes/hangs. Not really a 'fix' - it's more of a workaround to a problem that most people don't have in the first place because most people are only running one DVR at a time.
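
The workaround boils down to making sure no tuner is listed under more than one DVR. A small Python sketch of that check; the tuner labels and DVR names are placeholders for illustration, not actual device IDs, and the function name is made up:

```python
from collections import defaultdict


def shared_tuners(assignments):
    """Map each tuner claimed by more than one DVR to the sorted list of DVRs claiming it."""
    owners = defaultdict(set)
    for dvr, tuners in assignments.items():
        for tuner in tuners:
            owners[tuner].add(dvr)
    return {t: sorted(d) for t, d in owners.items() if len(d) > 1}


# Placeholder labels standing in for the six HDHomeRun tuners.
ALL_SIX = ["t0", "t1", "t2", "t3", "t4", "t5"]

before = {"windows-sage": ALL_SIX, "docker-sage": ALL_SIX}         # both DVRs see every tuner
after = {"windows-sage": ALL_SIX[:4], "docker-sage": ALL_SIX[4:]}  # 4 + 2 split, no overlap

if __name__ == "__main__":
    print("before:", shared_tuners(before))
    print("after: ", shared_tuners(after))
```

With the original setup every tuner shows up as shared; with the 4 + 2 split the result is empty, which is the condition under which I stopped seeing crashes.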

I won't be sure this is the solution until I shut down the Windows Sage and move all 6 tuners to the Docker. If it avoids crashes and hangs in that scenario, then I think we can pretty much conclude that the hangs/crashes are due to tuner conflicts trying to serve more than one DVR. I'll try to remember to update this thread after I get it figured out for sure.

Thanks to Stuckless and Texneus for helping!

Tags
core dump, crash, docker, libpthread

