SageTV Community

Utility: mediaScraper - metadata made easy! (http://forums.sagetv.com/forums/showthread.php?t=38386)

evilpenguin 01-04-2009 11:16 PM

Utility: mediaScraper - metadata made easy!
 
Release History:
What it is:
mediaScraper is my answer to XBMC's excellent, but very confusing, scraping engine. You can use it to track down all the metadata and fanart for any TV shows/movies you may have ripped/bought/downloaded/whatever.

Usage Instructions: See 2nd post

How to use metadata files:

* To make SageMC use the .my files, go into SageMC Properties -> Enhancements -> Use .my Files -> Enabled.
* To make the default STV use .properties files, read this thread.


Submitting Issues:

If you have any issues, I'll need to see the EXACT PATH/FILE name that is giving you trouble and/or (preferably) the scraper.log file that gets generated next to the .exe after every run.

Example Outputs:

.properties file for default STV:
Code:

MediaType=TV
Title=Wonderfalls 1x05 - Crime Dog
MediaTitle=Wonderfalls
Actor=Caroline Dhavernas;Tracie Thoms;Tyron Leitso;Diana Scarwid;Lee Pace;Katie Finneran;William Sadler;
ActorRoles=Jaye Tyler;Mahandra;Eric;Karen Tyler;Aaron Tyler;Sharon Tyler;Darrin Tyler;
Guest=Audrey Wasilewski;Bill Lake;Kimberly Scott;
Director=Allan Kroeker;
Writer=Krista Vernoff;
Genres=Drama;
Description=When Jaye's brother sees her talking to inanimate objects, he begins to question her state of mind; the animals help Jaye reunite her family with the housekeeper.
Rated=
UserRating=6.0
ReleaseDate=2004-07-23
MediaProviderDataID=theTVDB:78845
SeasonNumber=1
EpisodeNumber=5
EpisodeTitle=Crime Dog
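Reading one of these files back is simple because each line is `key=value`, and the cast/crew/genre fields are `;`-separated lists with a trailing `;`. A minimal Python sketch (not mediaScraper's own code), assuming only the layout shown above rather than the full Java .properties escaping rules. Description can itself contain `;`, so only the known list fields are split:

```python
# Fields that hold ';'-separated lists in the sample output (assumption based on the example).
LIST_FIELDS = {"Actor", "ActorRoles", "Guest", "Director", "Writer", "Genres"}


def parse_properties(text: str) -> dict:
    """Parse simple key=value lines; list fields become Python lists."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key in LIST_FIELDS:
            # Drop the empty entry produced by the trailing ';'.
            data[key] = [item for item in value.split(";") if item]
        else:
            # Everything else (Description, Rated, SeasonNumber, ...) stays a plain string.
            data[key] = value
    return data
```

Pairing `Actor` with `ActorRoles` by position, e.g. `dict(zip(props["Actor"], props["ActorRoles"]))`, recovers who plays whom, assuming the two lists always line up the way they do in this example.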

.my file for SageMC:
Code:

Title=Burn Notice
overview=A spy, Michael Weston, receives a burn notice for an unstated reason, effectively firing him. He has spent the previous decade working for the government in Eastern Europe and returns to his hometown of Miami to get his life in order. Michael will stop at nothing to find out why. Shut out from his normal contacts, but still driven to right wrongs, Michael needs to stay under the radar in order to stay in the game.
TVOriginalAiringDate=2007-07-26
actors=Jeffrey Donovan, Gabrielle Anwar, Bruce Campbell, Paul Gutrecht, Guri Weinberg, Audrey Landers, Scott Michael Campbell, Joel Swetow, Hannia Guillen, Jessica Baldwin, Rangel Hernandez Martinez,
TVEpisode=Family Business
TVOverview=Michael infiltrates a family of gunrunners who are pressuring an airport supervisor and his pregnant wife. The FBI begin to put pressure on Sam to gather more information on Michael.
TVDuration=60 minutes
TVGenre=Action and Adventure
TVChannelName=USA

List of Non-Working names that I know about and intend to Fix:

evilpenguin 01-04-2009 11:16 PM

Supported Naming Conventions:
mediaScraper works by looking at your file names and attempting to pull out all of the information it needs to track down the metadata. Chances are, if mediaScraper can't find the metadata for your video, the file name is too confusing for it to figure out. Below are the supported naming conventions:

Television: Requires Show Title and Season/Episode Number
Code:

Show Title (S##E##|#x##|###) randomJunk.avi
BaseFolder\Show Title\(S##E##|#x##|###) randomJunk.avi
BaseFolder\Show Title\Season #\(S##E##|#x##|###) randomJunk.avi
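To make the three TV patterns concrete, here is a hedged Python sketch of pulling the show title, season and episode out of a path (mediaScraper itself is Perl and its real regexes aren't shown in this thread). It treats '.' and '_' as spaces, which covers the dotted download names mickp reports further down, and falls back to the folder name (skipping "Season #") when the file name starts with the episode marker. The `four_digit` flag mirrors the /4digitTV switch described under Advanced Usage. Note that a bare ### also matches movie titles like "300", so real code needs extra context to tell TV from movies:

```python
import re
from pathlib import PureWindowsPath

SEP = r"[ ._-]"
SEASON_DIR_RE = re.compile(r"^season \d+$", re.IGNORECASE)


def episode_regex(four_digit: bool = False) -> re.Pattern:
    """Match S##E##, #x## or ### as a whole token (surrounded by separators)."""
    # Bare ### is a season digit plus a 2-digit episode; with four_digit (cf. /4digitTV)
    # the season may be two digits, so 1122 -> S11E22 (but also 2008 -> S20E08).
    bare_season = r"\d{1,2}" if four_digit else r"\d"
    return re.compile(
        rf"(?:^|{SEP})(?:[Ss](?P<s1>\d{{1,2}})[Ee](?P<e1>\d{{1,3}})"
        rf"|(?P<s2>\d{{1,2}})[xX](?P<e2>\d{{2,3}})"
        rf"|(?P<s3>{bare_season})(?P<e3>\d{{2}}))(?=$|{SEP})"
    )


def parse_tv_name(path: str, four_digit: bool = False):
    """Return (show_title, season, episode), or None if nothing usable is found."""
    p = PureWindowsPath(path)
    m = episode_regex(four_digit).search(p.stem)
    if m is None:
        return None
    season = int(m["s1"] or m["s2"] or m["s3"])
    episode = int(m["e1"] or m["e2"] or m["e3"])
    # Title = text before the marker, with '.' and '_' treated as spaces.
    title = re.sub(r"[._]+", " ", p.stem[: m.start()]).strip(" -")
    if not title:
        # BaseFolder\Show Title\[Season #\]file: use the nearest non-"Season #" folder.
        folders = [d for d in p.parent.parts if not SEASON_DIR_RE.match(d)]
        title = folders[-1] if folders else ""
    return (title, season, episode) if title else None
```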

Movies: Requires Movie Title
Code:

Movie Title.avi
Movie Title (YEAR).avi
Movie Title (YEAR).randomJunk.avi
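The movie conventions only need the title and an optional "(YEAR)". A minimal Python sketch under that assumption; limiting years to 19xx/20xx is a guess made here, not something the post specifies:

```python
import re
from pathlib import PureWindowsPath

# "(YEAR)" as in the conventions above; 19xx/20xx is an assumed plausible range.
YEAR_RE = re.compile(r"\((?P<year>(?:19|20)\d{2})\)")


def parse_movie_name(path: str):
    """Return (title, year or None) for 'Title.avi', 'Title (YEAR).avi', 'Title (YEAR).junk.avi'."""
    stem = PureWindowsPath(path).stem
    m = YEAR_RE.search(stem)
    if m:
        # Anything after "(YEAR)" is the random junk and is ignored.
        return stem[: m.start()].strip(), int(m["year"])
    return stem.strip(), None
```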


NOTE: This will not work on shows in Sage Recording format (ShowTitle-EpisodeName-#####-#.mpg)!!!



Basic Usage:

For general usage all you need to do is drag and drop videos and/or folders containing videos onto mediaScraper.exe and, by default, it will download the metadata and fanart and place them right next to the original videos.

TV:
Code:

TV\Scrubs\Scrubs 2x01 - My Overkill.avi
TV\Scrubs\Scrubs 2x01 - My Overkill.avi.properties (metadata)
TV\Scrubs\background.jpg (Fanart)
TV\Scrubs\folder.jpg (Thumbnail/Poster)
TV\Scrubs\banner.jpg (Banner)

Movies:
Code:

Movies\300.avi
Movies\300.avi.properties (metadata)
Movies\300_background.jpg (Fanart)
Movies\300.jpg (Thumbnail/Poster)

Movies (VIDEO_TS Folder):
Code:

Movies\300\VIDEO_TS
Movies\300.properties (metadata)
Movies\300\background.jpg (Fanart)
Movies\300\folder.jpg  (Thumbnail/Poster)
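The three layouts above boil down to a few path rules. A Python sketch that only mirrors these examples, using `PureWindowsPath` so it runs on any OS; the dictionary keys are hypothetical names, and placing TV artwork in the episode's own folder is an assumption the examples don't confirm for Season # subfolders:

```python
from pathlib import PureWindowsPath


def output_paths(video: str, kind: str) -> dict:
    """Sidecar file locations following the example layouts above.

    kind is "tv", "movie" or "video_ts"; for "video_ts", video is the VIDEO_TS folder.
    """
    v = PureWindowsPath(video)
    if kind == "tv":
        folder = v.parent  # artwork is shared by every episode in the folder
        return {
            "metadata": v.with_name(v.name + ".properties"),
            "fanart": folder / "background.jpg",
            "poster": folder / "folder.jpg",
            "banner": folder / "banner.jpg",
        }
    if kind == "movie":
        return {
            "metadata": v.with_name(v.name + ".properties"),
            "fanart": v.with_name(v.stem + "_background.jpg"),
            "poster": v.with_name(v.stem + ".jpg"),
        }
    if kind == "video_ts":
        movie_dir = v.parent  # e.g. Movies\300
        return {
            "metadata": movie_dir.with_name(movie_dir.name + ".properties"),
            "fanart": movie_dir / "background.jpg",
            "poster": movie_dir / "folder.jpg",
        }
    raise ValueError(f"unknown kind: {kind}")
```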


folder.override:

There will be times when mediaScraper just won't be able to find a match for your file. Common problems include:
  • The actual show name contains characters that are illegal in Windows file names. ex: Terminator: The Sarah Connor Chronicles.
  • The name conflicts with other TV shows. ex: The Office (US) vs. The Office (UK), or Battlestar Galactica (2003) vs. Battlestar Galactica.
In these cases, rather than going through and renaming every file to get it to match, you can create a folder.override file next to the original video(s). It is just a text file containing the exact show title that theTVDB or IMDB is expecting for every show in that folder.

Example:
Let's say you try to run this file through mediaScraper:
Code:

TV\Terminator The Sarah Connor Chronicles\Terminator The Sarah Connor Chronicles 1x01.avi
It would decide that it's a TV show with...
Code:

Show Title = Terminator The Sarah Connor Chronicles
Season = 1
Episode = 1

That all looks good, but when you search for it on theTVDB it returns 0 matches.

With a little manual investigating you'll find that theTVDB will only recognize the title if it contains the ':'.
Code:

Series Title = Terminator: The Sarah Connor Chronicles
However, you can't use a ':' in a Windows file name. This is where folder.override comes in.

To fix this you'd create a folder.override in the folder with all the episodes.
Code:

TV\Terminator The Sarah Connor Chronicles\Terminator The Sarah Connor Chronicles 1x01.avi
TV\Terminator The Sarah Connor Chronicles\folder.override

and use Notepad to set its contents to the exact show title that theTVDB or IMDB is expecting:
Code:

Terminator: The Sarah Connor Chronicles
Then next time mediaScraper gets a video from that folder, it'll see the folder.override file and instead of trying to figure out the Show Title, it will just read it out of that file and use that to search.

You can also place a folder.override in the parent directory if you want it to apply to all the folders directly below it.
Code:

TV\Terminator The Sarah Connor Chronicles\Season 1\Terminator The Sarah Connor Chronicles 1x01.avi
TV\Terminator The Sarah Connor Chronicles\folder.override

In addition, mostly for movies, you can create a whatEverYourFileNameIs.override so that it applies to only one file.
Code:

The Dark Knight.avi
The Dark Knight.override
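The override rules can be summed up as a lookup order. In this Python sketch the precedence (per-file override first, then folder.override in the video's folder, then one level up) is an assumption; the post only says each form exists. Reading with `utf-8-sig` tolerates the BOM Notepad writes when saving as UTF-8; an ANSI-saved file with non-ASCII characters would need a different codec:

```python
from pathlib import Path
from typing import Optional


def find_override(video: Path) -> Optional[str]:
    """Return the forced show/movie title for a video, or None.

    Assumed lookup order: <file name>.override, then folder.override in the
    video's folder, then folder.override one level up.
    """
    candidates = [
        video.with_suffix(".override"),          # e.g. The Dark Knight.override
        video.parent / "folder.override",        # same folder as the video
        video.parent.parent / "folder.override", # parent, e.g. above Season 1
    ]
    for candidate in candidates:
        if candidate.is_file():
            title = candidate.read_text(encoding="utf-8-sig").strip()
            if title:
                return title
    return None
```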

mediaScraper.skip:

If you have a folder full of videos that you know have no metadata (ex. home movies, clips, etc.), you can place a file named mediaScraper.skip in the folder and mediaScraper will ignore it and all of its subfolders.
Code:

TV\Home Movies\mediaScraper.skip
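Honouring mediaScraper.skip amounts to pruning the directory walk. A Python sketch using `os.walk` top-down, where clearing `dirnames` in place stops it from descending; `VIDEO_EXTS` is a hypothetical stand-in for whatever extensions mediaScraper actually recognizes, and the marker is matched case-insensitively because Windows file names are:

```python
import os
from pathlib import Path
from typing import Iterator

VIDEO_EXTS = {".avi", ".mkv", ".mpg", ".mp4"}  # hypothetical list
SKIP_MARKER = "mediascraper.skip"


def iter_videos(root: str) -> Iterator[Path]:
    """Yield video files under root, ignoring any folder that contains
    mediaScraper.skip together with all of its subfolders."""
    for dirpath, dirnames, filenames in os.walk(root):
        if SKIP_MARKER in {name.lower() for name in filenames}:
            dirnames.clear()  # prune: os.walk will not descend any further
            continue
        dirnames.sort()  # deterministic order
        for name in sorted(filenames):
            if Path(name).suffix.lower() in VIDEO_EXTS:
                yield Path(dirpath) / name
```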
Advanced Usage:

mediaScraper reads all of its options from defaults.txt, which sits right next to mediaScraper.exe. It is just a list of command line switches that will always be used.

These are the default options:
Code:

/genPropertyFile /downloadFanArt /baseFolder "TV"
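Since defaults.txt is just command line switches, reading it is a tokenize-then-pair job. A Python sketch, assuming a switch takes a value whenever the next token doesn't start with '/' (true for the switches listed below, but not a documented rule). Escapes are disabled in `shlex` so Windows backslashes survive, while double quotes still group paths that contain spaces:

```python
import shlex


def read_switches(text: str) -> list:
    """Split defaults.txt-style text into tokens, honouring double quotes."""
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace_split = True
    lexer.escape = ""  # keep Windows backslashes literally
    return list(lexer)


def parse_switches(tokens: list) -> dict:
    """Map 'switch' -> value (str), or True for bare flags."""
    options = {}
    i = 0
    while i < len(tokens):
        name = tokens[i]
        if not name.startswith("/"):
            raise ValueError(f"expected a /switch, got {name!r}")
        value = True
        # Assumed rule: a following token that isn't a switch is this switch's value.
        if i + 1 < len(tokens) and not tokens[i + 1].startswith("/"):
            value = tokens[i + 1]
            i += 1
        options[name[1:]] = value
        i += 1
    return options
```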
In addition, you can add any of the switches below to further customize the output.

Available Switches:
  • /genPropertyFile - Generate a .properties file.
  • /genMyFile - Generate a .my file, contains extra data for SageMC
  • /genInfoFile - Generate a .info file. These don't do anything useful; they are just a dump of all of the available metadata, some of which doesn't fit into the .my or .properties files.
  • /downloadFanArt - Download fanart/thumbnails/banners if available.
  • /baseFolder "C:\Example\Folder Name" - If you keep all your TV organized in a common base folder, you can specify it here to help with scraping accuracy.
  • /organizeFiles - If mediaScraper is able to find metadata for your file, it can also rename the file to match. (Will not work with VIDEO_TS folders)
    Code:

    Scrubs.201.lol.hdtv.avi -> Scrubs 2x01 - My Overkill.avi
    Code:

    The.Dark.Knight.2008.DVDrip.xor.avi -> The Dark Knight (2008).avi
  • /tvSE - This will have mediaScraper use the TV naming format S02E01 rather than 2x01. This will apply to both organizing and metadata files.
  • /organizeBaseFolder "D:\Videos" - If you specify a base folder while /organizeFiles is set, this will also move the videos into a folder structure.
    Code:

    Scrubs.201.lol.hdtv.avi -> D:\Videos\TV\Scrubs\Season 2\Scrubs 2x01 - My Overkill.avi
    Code:

    The.Dark.Knight.2008.DVDrip.xor.avi -> D:\Videos\Movies\The Dark Knight (2008).avi
  • /4digitTV - Allow TV shows that don't use a separator between the season/episode (S07E22 -> 722) to be 4 digits long (S11E22 -> 1122). By default this is disabled to prevent a conflict with movies that have the year in the file name (2008 season/episode vs 2008 year).
  • /updateInfo - Normally mediaScraper will ignore videos that already have existing properties files. You can set this to have it run anyway.
  • (NEW) /useOriginalName - Use this if you want to use your original file name in the .properties files rather than the official name scraped from the show info.
  • (NEW) /genXMLFile - Generate an XML file that can be imported into the SageTV web server
  • (NEW) /userName "User" - User name for logging onto a SageTV web server
  • (NEW) /password "Password" - Password for logging onto a SageTV web server
  • (NEW) /sageTVServer "localhost:8080" - Host name and port number of web server
  • (NEW) /addToSageDB - When used with /genXMLFile and the web server settings, this will automatically add a show to Sage's DB using the web server's XML import function. You can use this to have your imported TV show up alongside your recorded TV.
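The /organizeFiles and /organizeBaseFolder examples suggest simple naming templates. A Python sketch of them; the exact S02E01 layout used with `tv_se=True` and the policy of simply dropping characters Windows forbids (the ':' problem from the folder.override section) are assumptions, not confirmed mediaScraper behavior, and all function names are hypothetical:

```python
import re
from pathlib import PureWindowsPath

ILLEGAL_CHARS = re.compile(r'[<>:"/\\|?*]')  # not allowed in Windows file names


def safe(name: str) -> str:
    # Hypothetical policy: drop illegal characters and collapse leftover spaces.
    return re.sub(r"\s+", " ", ILLEGAL_CHARS.sub("", name)).strip()


def tv_file_name(show: str, season: int, episode: int, title: str, ext: str, tv_se: bool = False) -> str:
    """'Scrubs 2x01 - My Overkill.avi', or an S02E01 variant when tv_se is set (cf. /tvSE)."""
    marker = f"S{season:02d}E{episode:02d}" if tv_se else f"{season}x{episode:02d}"
    return safe(f"{show} {marker} - {title}") + ext


def movie_file_name(title: str, year, ext: str) -> str:
    """'The Dark Knight (2008).avi', or just the title when no year is known."""
    base = f"{title} ({year})" if year else title
    return safe(base) + ext


def organized_path(base_folder: str, file_name: str, show=None, season=None) -> PureWindowsPath:
    r"""/organizeBaseFolder layout: TV\Show\Season N\file for TV, Movies\file otherwise."""
    base = PureWindowsPath(base_folder)
    if show is not None:
        return base / "TV" / safe(show) / f"Season {season}" / file_name
    return base / "Movies" / file_name
```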

/addToSageDB:
If you pair this switch with /userName, /password, /sageTVServer, and /genXMLFile, then mediaScraper will pack the metadata into a SageTV Webserver XML file and use an experimental feature to trick Sage into treating the show as if it were a SageTV Recording. I've been playing around with it and it seems to be working rock solid, but I make no guarantees that this won't blow out your entire wiz.bin: use at your own risk!

mickp 01-05-2009 03:52 AM

Ooooh. If Mike or Dirk support this for fan art I might just give the feature a go.

Downloading now. Thanks E.P.

Mick.

mickp 01-05-2009 04:28 AM

Initial ignorant impression #1
 
Hey E.P.

Just gave it a burl on a few files

Filenames that didn't work;

Code:

"c:\shares\movies\Atlantis\Season 3\Stargate.Atlantis.S03E01.WS.DSR.XviD-DIMENSION.avi"

"c:\shares\movies\Atlantis\Season 3\Stargate.Atlantis.S03E01.avi"

Filename that did work;

Code:

"c:\shares\movies\Atlantis\Season 3\Stargate Atlantis S03E01.avi"
Unfortunately I'm utter rubbish at regex so can't be of much help with detailed suggestions :(.

Can I suggest replacing the (.) period with a space, and also, if a positive result/hit hasn't yet been found, try removing the last word of the file name (assuming . replaced with space) and giving the search another go? Then removing another, and another?

I realise that I could rename the files so that they work but it would be nice to have it just work with typical download file names.

I'll go have more of a play now :)

Mick.

Edit: Also, would it be possible to not create the metadata files if no result was found?

evilpenguin 01-05-2009 04:52 AM

Yeah, good catch, looks like the '.' in the series name is causing some trouble. I'll have that (and a whole bunch of other bugs I found :nono:) all sorted out tomorrow.

mickp 01-05-2009 05:04 AM

Cool. Thanks.

Mick.

evilpenguin 01-06-2009 01:38 AM

Just updated the download with a ton of fixes, the best of which is it won't create output files if it doesn't actually get any data.

mickp 01-06-2009 03:02 AM

Awesome!

I've adapted my old comskip batch file to process all files in a directory. The log files from this should be quite interesting.

I'll give the new version a run over one of my download directories and pm the logs.

In the meantime I've attached the two batch files required for bulk scraping. Hopefully someone will find them useful.

Usage is scrape [unc path]

Mick.

Edit: Fixed a bug which would cause all shows to be processed every time. Now only files without a .properties file are scraped on subsequent runs.

Edit: Updated version should cope correctly with video_ts directories

NB: You don't need to use this batch file(s) any more as mediascraper.exe will now process all files in a directory.

mickp 01-06-2009 03:07 AM

Um. File appears to be corrupted. One zip program I tried says "error in zip file. Garbage at end of file" :(

Mick.

Opus4 01-06-2009 07:25 AM

BTW: .properties files are read by the core, so that data is put into SageTV's database when importing files. I'm adding this comment because after a quick glance while adding this to the customizations index, it looked like the notes were saying that was for the default STV, but it will work for anything.

- Andy

jaminben 01-06-2009 07:55 AM

Quote:

Originally Posted by evilpenguin (Post 327739)
Just updated the download with a ton of fixes, the best of which is it won't create output files if it doesn't actually get any data.

Great stuff :) However, it appears that the zip file is damaged and won't extract (C:\Users\jaminben\Desktop\mediaScraperBeta.zip: The archive is corrupt).

Is this me being stupid or is there something wrong with the zip file?

Cheers

Ben

deanm 01-06-2009 11:47 AM

Quote:

Originally Posted by jaminben (Post 327776)
Great stuff :) However, it appears that the zip file is damaged and won't extract (C:\Users\jaminben\Desktop\mediaScraperBeta.zip: The archive is corrupt).

Is this me being stupid or is there something wrong with the zip file?

Cheers

Ben

Try clicking on the link (PERL source code) and downloading it that way.

Dean

joe123 01-06-2009 11:52 AM

For the rest of us, what does this module do? :)

In basic terms please :D

deanm 01-06-2009 12:17 PM

Nice tool, this will save me hours of work. I did try to do this manually once but gave up after about an hour, only 5 shows in.

It looks like the speech marks (“”) can't be displayed properly. Not sure if this is a SageMC or Sage problem. If I run this on The Simpsons Season 18 Episode 1, the text at the end, “family business.”, will not display correctly.

Works fine if you remove the speech marks (“”).

Dean,

deanm 01-06-2009 12:24 PM

Quote:

Originally Posted by deanm (Post 327899)
Nice tool, this will save me hours of work. I did try to do this manually once but gave up after about an hour, only 5 shows in.

It looks like the speech marks (“”) can't be displayed properly. Not sure if this is a SageMC or Sage problem. If I run this on The Simpsons Season 18 Episode 1, the text at the end, “family business.”, will not display correctly.

Works fine if you remove the speech marks (“”).

Dean,

Strange one, this. It works OK if I put the speech marks ("") back into the text. I just used Notepad to edit the file.

evilpenguin 01-06-2009 12:55 PM

Quote:

Originally Posted by mickp (Post 327748)
Um. File appears to be corrupted. One zip program I tried says "error in zip file. Garbage at end of file" :(

Mick.

Quote:

Originally Posted by jaminben (Post 327776)
Great stuff :) However, it appears that the zip file is damaged and won't extract (C:\Users\jaminben\Desktop\mediaScraperBeta.zip: The archive is corrupt).

Is this me being stupid or is there something wrong with the zip file?

Cheers

Ben

Not sure what's up with this. I just tried downloading it and extracting it with 7zip (highly recommended, btw) and it works fine. It could also be that one of SourceForge's servers got a corrupt file, so it's working for some but not others. I've updated the link; try downloading it again.

Sorry for the confusion.

evilpenguin 01-06-2009 12:58 PM

Quote:

Originally Posted by joe123 (Post 327881)
For the rest of us, what does this module do? :)

In basic terms please :D

Quote:

What it is:
mediaScraper is my answer to XBMC's excellent, but very confusing, scraping engine. You can use it to track down all the metadata for media (currently only TV) you may have ripped/bought/downloaded/whatever.

* For Users: Drag and drop your TV files onto mediaScraper.exe and it'll track down the metadata from TV.com and drop it into a .info, .my, and .properties file right next to it, filled with all that metadata you crave.
* For Developers: Want to bring the rich world of metadata into your plug-in/STV/whatever? See this space for a link to detailed info on how you can incorporate this tool into your own software!

How to use metadata files:

* To make SageMC use the .my files, go into SageMC Properties -> Enhancements -> Use .my Files -> Enabled.
* To make the default STV use .properties files, read this thread.
Basically, the .properties/.my files this generates allow Sage to show extra information about the imported video rather than just its file name like it normally does. I'll post some screen shots tonight to show what I mean.

evilpenguin 01-06-2009 01:04 PM

Quote:

Originally Posted by deanm (Post 327903)
Strange one, this. It works OK if I put the speech marks ("") back into the text. I just used Notepad to edit the file.

I've noticed a few weird things like this. I think the issue is that the quotes (") in the webpage are unicode characters, but when I read them in they lose their unicode-ness and just display as junk. I'll see what I can do about fixing that.
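That diagnosis matches the usual encoding mismatch: the page sends UTF-8, the bytes likely get read or written as a Windows code page somewhere along the way, and “ turns into â€œ. A minimal Python sketch of the two common fixes (not mediaScraper's Perl code), assuming the page's charset is known, e.g. from the HTTP Content-Type header: decode with that charset, and, if the UI still can't render curly quotes, fold them to ASCII, which is what deanm's Notepad edit did by hand. The function names are hypothetical:

```python
# Map typographic punctuation to ASCII so the text survives any code page.
SMART_PUNCT = str.maketrans({
    "\u201c": '"', "\u201d": '"',  # curly double quotes
    "\u2018": "'", "\u2019": "'",  # curly single quotes
    "\u2013": "-", "\u2014": "-",  # en and em dash
    "\u2026": "...",               # ellipsis
})


def decode_page(raw: bytes, charset: str = "utf-8") -> str:
    """Decode with the page's declared charset instead of assuming a code page."""
    return raw.decode(charset, errors="replace")


def to_ascii_punctuation(text: str) -> str:
    """Replace curly quotes/dashes with plain ASCII equivalents."""
    return text.translate(SMART_PUNCT)
```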

evilpenguin 01-06-2009 01:18 PM

Quote:

Originally Posted by mickp (Post 327402)
Can I suggest replacing the (.) period with a space, and also, if a positive result/hit hasn't yet been found, try removing the last word of the file name (assuming . replaced with space) and giving the search another go? Then removing another, and another?

Something like that will have to happen eventually if I ever want to solve that "Terminator: The Sarah Connor Chronicles" problem. Right now I'm looking at two possible solutions:
  1. Do something like you mention and start removing words until I get a hit. I think there's a good way to do this and still maintain a certain level of confidence that the hit I'm getting is actually the right show. It'd prolly be best to start removing words based on how long they are (i.e. "Terminator The Sarah Connor Chronicles" (0 hits) -> "Terminator Sarah Connor Chronicles" (0 hits) -> ... -> "Chronicles" (lots of hits)), then check the number of characters I used to make the hit and determine how confident I am based on that.
  2. Generate a list of possible hits, have the front end give the choice to the user, and then rerun the tool using their selection.
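Option 1 (combined with mickp's "treat '.' as a space" idea) can be sketched in Python as follows. It drops the shortest remaining word each round and scores a hit by the fraction of characters still in the query; the `search` callable, the 0.6 cut-off, and ties going to the earlier word are all assumptions for illustration, not mediaScraper behavior:

```python
import re
from typing import Callable, Iterator, List, Optional, Tuple


def candidate_queries(name: str) -> Iterator[Tuple[str, float]]:
    """Yield (query, confidence) pairs, dropping the shortest word each round.

    confidence = characters kept / characters in the cleaned name (assumed metric).
    """
    words = re.sub(r"[._]+", " ", name).split()  # treat '.' and '_' as spaces
    total = sum(len(w) for w in words)
    while words:
        kept = sum(len(w) for w in words)
        yield " ".join(words), kept / total
        # Remove the shortest word; min() returns the earliest one on ties.
        shortest = min(range(len(words)), key=lambda i: len(words[i]))
        del words[shortest]


def best_match(name: str, search: Callable[[str], List[str]],
               min_confidence: float = 0.6) -> Optional[Tuple[str, float]]:
    """Return (first hit, confidence), or None; search is a hypothetical lookup."""
    for query, confidence in candidate_queries(name):
        if confidence < min_confidence:
            break  # confidence only falls from here on
        hits = search(query)
        if hits:
            return hits[0], confidence
    return None
```

For "Terminator The Sarah Connor Chronicles" this tries the full title, then drops "The", "Sarah", "Connor" and finally "Terminator", ending at "Chronicles" as in the example above.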

mickp 01-06-2009 02:50 PM

Re: the zip file;

I found that if I renamed the zip to be .7z rather than .zip, winrar would open it and allow me to extract the one file "mediaScraperBeta". Extracted and added a .zip extension and voila...

Not sure what's going on with 7zip.

Mick.


All times are GMT -6.

Copyright 2003-2005 SageTV, LLC. All rights reserved.