Thread: Needing some help

  1. #1
    Join Date
    Apr 2009
    Posts
    12

    Needing some help

    Hello
    I'm trying to scrape a site for titles, IDs and other information.
    I've figured out the path to the FLV and I know how to get it to play in a player. Now I just need to work out how to scrape the site for the information, and I hope that will bring me one step closer.
    Can anyone give me some pointers on how to scrape a site?

    I read the Plugin Creation Tutorial by Voinage and tried to use the following regular expression,

    Code:
    match=re.compile('<a href="#" onClick="playerArticleID(.+?); return true;"><img height=".+?" alt=".+?" hspace=".+?" src="(.+?)" width=".+?" align=".+?" border=".+?" /><br>\r\n            (.+?)</a>').findall(link)
    but I can't get it to work.

    More specifically, I'm not getting anything returned.

    Any help would be really appreciated.

  2. #2
    Join Date
    Feb 2009
    Posts
    427

    Personally, I would suggest using Yahoo Pipes and scraping from there. It's a bit of a different interface, but if something on the site changes you can just update the pipe, and users won't need to update their plugin.

    You can check this thread:
    http://forum.boxee.tv/showthread.php?t=8082

    and view the source of any of the examples.

  3. #3
    Join Date
    Jan 2009
    Location
    Los Angeles, CA
    Posts
    111

    I personally use a PHP proxy on my site and the preg_match function to scrape.

    The issue you are having with your regular expression is that you are not giving it enough to work with.

    For this one, my regular expression would look something like:

    "/onClick=\"playerArticleID([^;]+).*src=\"([^\"]+).* ([^<]+)/"

    I didn't check the documentation, so I may not have properly escaped the semicolon in the first match or the double quotes in the second. But when done right with preg_match, this returns an array in which indexes 1, 2 and 3 hold the matches that you want.
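
    For anyone who would rather stay in the Python plugin than go through PHP, the same three-group idea can be sketched with the re module. This is only a sketch under assumptions: SAMPLE_HTML is made-up markup modelled on the pattern in the first post (the real page may order its attributes differently), and ITEM_RE and extract_items are names invented for this example. The pattern escapes the literal parentheses around the article ID and uses \s* instead of a fixed run of spaces before the title.

```python
import re

# Hypothetical markup, modelled on the pattern in the first post; the real
# page may differ (attribute order, quoting, whitespace).
SAMPLE_HTML = (
    '<a href="#" onClick="playerArticleID(12345); return true;">'
    '<img height="90" alt="thumb" hspace="0" src="http://example.com/a.jpg" '
    'width="120" align="left" border="0" /><br>\r\n            First clip</a>\n'
    '<a href="#" onClick="playerArticleID(67890); return true;">'
    '<img height="90" alt="thumb" hspace="0" src="http://example.com/b.jpg" '
    'width="120" align="left" border="0" /><br>\r\n            Second clip</a>\n'
)

ITEM_RE = re.compile(
    r'onClick="playerArticleID\(([^)]+)\)'  # group 1: article ID, literal parentheses escaped
    r'.*?src="([^"]+)"'                     # group 2: thumbnail URL
    r'.*?<br>\s*([^<]+?)\s*</a>',           # group 3: title, surrounding whitespace dropped
    re.DOTALL,                              # let .*? run across line breaks
)


def extract_items(html):
    """Return a list of (article_id, thumbnail_url, title) tuples."""
    return ITEM_RE.findall(html)


if __name__ == "__main__":
    for article_id, thumb, title in extract_items(SAMPLE_HTML):
        print(article_id, thumb, title)
```

    In the plugin, the page source held in link would be passed in place of SAMPLE_HTML. If nothing comes back, printing a short slice of link around "playerArticleID" usually shows where the real markup differs from the pattern.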

    Then you can either pop those into an RSS feed on the fly, store them in a DB and use the DB to populate the RSS feed (my personal method), or return them as the response to a urllib "API call".
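
    As a rough illustration of the "RSS feed on the fly" option, here is a minimal Python sketch that turns a list of (id, thumbnail, title) matches into an RSS 2.0 document with the standard library. The function name build_rss, the feed title and link, and the choice to put the thumbnail in an enclosure element are all assumptions made for the example, not something a particular feed reader requires.

```python
import xml.etree.ElementTree as ET


def build_rss(items, feed_title="Scraped clips", feed_link="http://example.com/"):
    """Build an RSS 2.0 string from (article_id, thumbnail_url, title) tuples.

    feed_title and feed_link are placeholder values for this sketch.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Generated from scraped matches"

    for article_id, thumb_url, title in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        # The article ID is not a URL, so mark the guid as a non-permalink.
        ET.SubElement(item, "guid", isPermaLink="false").text = article_id
        # Assumption: carry the thumbnail as an enclosure; length is unknown here.
        ET.SubElement(item, "enclosure", url=thumb_url, type="image/jpeg", length="0")

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    sample = [("12345", "http://example.com/a.jpg", "First clip")]
    print(build_rss(sample))
```

    ElementTree escapes characters such as & and < in titles, which avoids the broken feeds that hand-built XML strings tend to produce.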

  4. #4
    Join Date
    Apr 2009
    Posts
    12

    Thank you both. I'll take a look at each suggestion.
