SageTV Community

SageTV Community (http://forums.sagetv.com/forums/index.php)
-   SageTV Github Development (http://forums.sagetv.com/forums/forumdisplay.php?f=53)
-   -   Wiz.bin SQL importer (http://forums.sagetv.com/forums/showthread.php?t=62512)

wnjj 09-27-2015 06:51 PM

Wiz.bin SQL importer
 
1 Attachment(s)
I wrote a standalone C++ program that reads in a wiz.bin file and generates database table schema (CREATE TABLE) and data import (INSERT) SQL commands. Currently SQLite and Postgres formats are supported with the only differences being a few data types and syntax. I recommend SQLite since it's very lightweight and requires no setup.

It's not necessarily complete but seemed to run fine on my version7 (wiz.bin ver 75) and version9 (wiz.bin ver 85 & ver 86). I have not tested any other versions so there could be errors in the format definitions for those which can be fixed easily.

The code is 4 .cpp and 3 .h files and compiles as a Windows console app or Linux.

I'm not sure what explicit purpose this program serves yet but it gave me an excuse to learn the insides of wiz.bin as well as confidently and portably save my TV show history. It could potentially be modified to write wiz.bin files which would allow for SQL data edits (repairs) before exporting back to wiz.bin.

I've included the Windows .exe inside the .zip. Compile on Linux with:
Code:

g++ -o wiz2sql Wiz2SQL.cpp Wizard.cpp Tables.cpp TableField.cpp
Run the program without any arguments for a usage description. An example run with arguments looks like this:

Code:

wiz2sql.exe wiz.bin -schema schema.sql -data data.sql
To import into SQLite:
Code:

sqlite3 wiz.db < schema.sql
sqlite3 wiz.db < data.sql

Play with it, ignore it, critique it, modify it, etc. One thing it lacks right now is support for UTF strings. It works fine for single-byte character sets but will fall apart for 2-byte and more chars. The function that reads strings from wiz.bin is Wizard::readUTF(). Refer to the readUTF function in Sage's FastRandomFile.java for the extra code needed for I18N support. I haven't yet figured out what kind of string format is needed in C++ to handle those. Java apparently uses 1 or 2-byte chars as needed. I also had to filter binary (non-printable and non-whitespace) characters from string fields because those are not handled in database text columns. I had some data like that in some photo files containing embedded binary metadata. That data isn't important anyway and displays as gibberish in SageTV when I press info on those photos.

The tables are built to follow the classes used in SageTV for the various DB objects (show, person, etc). In wiz.bin, related tables were simply serialized inline within the individual records. For these, Wiz2SQL creates related tables like "show_person" which represents "person" data related to a "show". For those tables, the primary ID column is the original table wiz ID (e.g. showID).

In later versions of wiz.bin, there are also indexes which provide a list of ID's, sorted by something specific. Those are built as tables with "_by_" in the name like "titles_by_name" with a 'pos' (position) column and an ID column.

Also, feel free to post questions about the wiz.bin format so Jeff can correct my answers. :)

EDIT: Modified ZIP attachment for Wizard version 87 - NOTE: Not tested.

EDIT (09/27/2018): Fixed bug with UPDATE that created a bogus "-" column title.

PLUCKYHD 09-27-2015 08:04 PM

Thanks for sharing. I am not sure of the usage case yet either but I can see it coming in handy. I definitely will play with it. Thanks again :goodjob:

MrD 09-27-2015 08:17 PM

Quote:

Originally Posted by wnjj (Post 577768)

I'm not sure what explicit purpose this program serves yet but it gave me an excuse to learn the insides of wiz.bin as well as confidently and portably save my TV show history. It could potentially be modified to write wiz.bin files which would allow for SQL data edits (repairs) before exporting back to wiz.bin.

This

scoful 09-28-2015 04:25 AM

Potential Future Move for Open Source to replace WIZ.BIN
 
It would make sense to me that WIZ.BIN be replaced with a SQL database if there's an advantage in speed and/or resilience and standardization for other things to connect to it.
I'd guess that SageTV already kind of behaves like a SQL database engine - I'm presuming clients and extenders don't directly look at WIZ.BIN and rely on the SageTV service / server app to hand data to them like an SQL server does, so there may not be any performance improvement.

KeithAbbott 09-28-2015 05:49 AM

My biggest wiz.bin wishlist item would be some sort of compaction utility, which would leave the wiz.bin file in a state that optimizes efficiency within SageTV, along with hopefully reducing the likelihood of the wiz.bin getting "corrupted".

stuckless 09-28-2015 06:10 AM

@wnjj - you should create a github project for this. People could then fork it, extend it, contribute back, etc.

stanger89 09-28-2015 08:54 AM

Quote:

Originally Posted by KeithAbbott (Post 577780)
My biggest wiz.bin wishlist item would be some sort of compaction utility, which would leave the wiz.bin file in a state that optimizes efficiency within SageTV, along with hopefully reducing the likelihood of the wiz.bin getting "corrupted".

What makes you think that it doesn't already do that? I'm not saying it does, but I've got ~10 years of history in mine and it's not like it keeps growing, I think the size has been pretty stable for years.

Tiki 09-28-2015 10:36 AM

Quote:

Originally Posted by PLUCKYHD (Post 577769)
Thanks for sharing. I am not sure of the usage case yet either but I can see it coming in handy. I definitely will play with it. Thanks again :goodjob:

I wonder if this could easily be extended to allow you to merge two Wiz.Bin files?

Another idea would be if you could use this to roll-back from a newer wiz.bin format to an older format.

wnjj 09-28-2015 11:05 AM

Quote:

Originally Posted by Tiki (Post 577799)
Another idea would be if you could use this to roll-back from a newer wiz.bin format to an older format.


I was thinking of that too. A V9 to V7 might be nice for those who didn't back things up. I'll have to look at what data may be changed or lost going from V7 to V9 but I think it would work ok.

PLUCKYHD 09-28-2015 11:32 AM

More thinking it would also be good for those of us (:cool:) that prefer SQL filtering/querying as compared to sage's API. I know personally I can perform more dynamic and powerful searches using SQL.

wnjj 09-28-2015 11:39 AM

Quote:

Originally Posted by stanger89 (Post 577798)
What makes you think that it doesn't already do that? I'm not saying it does, but I've got ~10 years of history in mine and it's not like it keeps growing, I think the size has been pretty stable for years.

Mine is about 40MB after 6.5 years. There is no compression in wiz.bin. When I looked into it, a decent part of it is all of my photos and the large amount of ASCII "info" pulled from the JPG. Since nothing is compressed or shared, there are literally tens of thousands of strings like "Shutter Speed", flash settings, etc. The thing is, at 40MB I don't really notice a performance issue so maybe if it ain't broke...

The other kind of wasteful thing I noticed is that the FULL path to every mediafile is stored (actually 2x for photos). I have something like 3MB in just the paths of my photos without their name.

KeithAbbott 09-28-2015 11:52 AM

Quote:

Originally Posted by wnjj (Post 577806)
Mine is about 40MB after 6.5 years. There is no compression in wiz.bin. When I looked into it, a decent part of it is all of my photos and the large amount of ASCII "info" pulled from the JPG. Since nothing is compressed or shared, there are literally tens of thousands of strings like "Shutter Speed", flash settings, etc. The thing is, at 40MB I don't really notice a performance issue so maybe if it ain't broke...

The other kind of wasteful thing I noticed is that the FULL path to every mediafile is stored (actually 2x for photos). I have something like 3MB in just the paths of my photos without their name.

I seem to remember a forum post some time back saying that the SageTV server start up time increases as the wiz.bin grows larger. Maybe the file contents are loaded into an array in memory on startup?

stanger89 09-28-2015 12:36 PM

If you've got a lot of history, the profiler (which does Intelligent Recording scheduling) can take a long time to start up, but I don't think it's in any way tied to the "physical" size of the database. Beside the delayed_carny_init or whatever property gets Sage started in seconds anyway (takes a while for IRs to show up though).

Monedeath 09-28-2015 12:44 PM

Quote:

Originally Posted by scoful (Post 577777)
It would make sense to me that WIZ.BIN be replaced with a SQL database if there's an advantage in speed and/or resilience and standardization for other things to connect to it.
I'd guess that SageTV already kind of behaves like a SQL database engine - I'm presuming clients and extenders don't directly look at WIZ.BIN and rely on the SageTV service / server app to hand data to them like an SQL server does, so there may not be any performance improvement.

Your flipside is this:

Quote:

Originally Posted by Narflex (Post 576140)
It's "Jeff's Custom Object Database" format. :)

I wanted to have a true Object database where everything in it corresponded to an actual Java object that I could just use normally and not have to worry about making actual database queries throughout the source code. It also allowed me to write logic which keeps it synchronized between all the clients & server simultaneously without having to do server RPCs for every database query on the client (although there's likely replication logic in SQLite/MySQL that could do the same thing).

There's a huge amount of work to replacing it at this point...all the core SageTV code has been written expecting that all of the database items are actually just in memory Java objects that can be treated that way. Theoretically it could all have been done with Java object serialization...but that doesn't have the client/server sync, and 'delta transactions' that are part of what SageTV has.


wnjj 09-28-2015 12:46 PM

Quote:

Originally Posted by KeithAbbott (Post 577807)
I seem to remember a forum post some time back saying that the SageTV server start up time increases as the wiz.bin grows larger. Maybe the file contents are loaded into an array in memory on startup?

Yes, the DB is loaded into a big array of tables in Wizard.java.

Narflex 09-28-2015 12:50 PM

Quote:

Originally Posted by scoful (Post 577777)
It would make sense to me that WIZ.BIN be replaced with a SQL database if there's an advantage in speed and/or resilience and standardization for other things to connect to it.
I'd guess that SageTV already kind of behaves like a SQL database engine - I'm presuming clients and extenders don't directly look at WIZ.BIN and rely on the SageTV service / server app to hand data to them like an SQL server does, so there may not be any performance improvement.

The prior post pasted my explanation about this before...nowhere near an easy project to replace the DB. And the clients do all have copies of the DB in RAM which are synced to the server in realtime. Extenders don't have copies in RAM (they are the 'thin clients').

Quote:

Originally Posted by KeithAbbott (Post 577780)
My biggest wiz.bin wishlist item would be some sort of compaction utility, which would leave the wiz.bin file in a state that optimizes efficiency within SageTV, along with hopefully reducing the likelihood of the wiz.bin getting "corrupted".

It already does this...there's Wizard.maintenance() which basically does garbage collection on the database daily.

Quote:

Originally Posted by wnjj (Post 577806)
Mine is about 40MB after 6.5 years. There is no compression in wiz.bin. When I looked into it, a decent part of it is all of my photos and the large amount of ASCII "info" pulled from the JPG. Since nothing is compressed or shared, there are literally tens of thousands of strings like "Shutter Speed", flash settings, etc. The thing is, at 40MB I don't really notice a performance issue so maybe if it ain't broke...

The other kind of wasteful thing I noticed is that the FULL path to every mediafile is stored (actually 2x for photos). I have something like 3MB in just the paths of my photos without their name.

There is a 'compactDB' mode in SageTV which is what we used on embedded devices (and on Fiber). This reduces storage of various things...specifically what you mention; but unless you really need that extra 50MB of memory back; it's not worth switching over to it.

Quote:

Originally Posted by KeithAbbott (Post 577807)
I seem to remember a forum post some time back saying that the SageTV server start up time increases as the wiz.bin grows larger. Maybe the file contents are loaded into an array in memory on startup?

Yes, it does grow over time...just not that much. :) And it is all loaded into various arrays in memory...the whole DB lives in RAM and then gets saved out to disk (and it appends it with transactional data between maintenance/compression cycles).

KeithAbbott 09-28-2015 01:06 PM

Quote:

Originally Posted by Narflex (Post 577815)
It already does this...there's Wizard.maintenance() which basically does garbage collection on the database daily.

Fantastic. I have officially dropped wiz.bin compaction off my wish list!

jusjoken 03-03-2016 04:53 PM

I just want to say THANKS for this tool...wow. This makes chasing/debuging issues a breeze as I can clearly see (offline of course) what is going on in the database.

Great tool!

Thanks
k

brandypuff 03-25-2016 07:56 AM

Won't run for me
 
I get a unable to start correctly exception. Tried on a few different computers.

wnjj 03-25-2016 11:42 AM

Quote:

Originally Posted by brandypuff (Post 586716)
I get a unable to start correctly exception. Tried on a few different computers.

There must be some Visual C++ dll dependencies that aren't happy on your machines. I only ran it on mine which has the compiler installed. I'll try it on a machine without one and see what it needs and/or see if I can compile it without needing the DLL's.


All times are GMT -6. The time now is 06:07 PM.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2020, vBulletin Solutions Inc.
Copyright 2003-2005 SageTV, LLC. All rights reserved.