SPSS Statistics

  • 1.  Python WebScraper

    Posted Fri August 12, 2022 12:42 PM
    Is there a way to nest a Python block that scripts a web scrape? If so, does someone have an example? Thanks, Arthur

    ------------------------------
    Art Jack
    ------------------------------

    #SPSSStatistics


  • 2.  RE: Python WebScraper

    IBM Champion
    Posted Fri August 12, 2022 01:11 PM
    Many Python libraries can do web scraping, so you could take one of those and embed it in a BEGIN PROGRAM block, or create a full-fledged extension command to make this seamless.
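    As a rough illustration of the embedding approach, here is a minimal sketch that uses only the Python standard library, so it does not depend on any package being installed yet. The names LinkCollector, extract_links, and fetch are made up for this example, and the BEGIN PROGRAM PYTHON3 wrapper is shown only in comments; treat it as a starting point rather than a finished scraper.

```python
# Inside SPSS Statistics, this code would sit between
#   BEGIN PROGRAM PYTHON3.
#   ...
#   END PROGRAM.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect (link text, absolute URL) pairs for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page address.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html, base_url=""):
    """Return a list of (text, url) tuples found in an HTML string."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=30):
    """Download a page and decode it to text (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

    In a program block you would call something like extract_links(fetch(url), url) with a page address of your own and then decide what to do with the resulting list.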

    One of the popular web scraping and parsing tools is Beautiful Soup.
    There is a tutorial here

    To install it or other Python or R packages, use the STATS PACKAGE INSTALL extension command.
    For Beautiful Soup, specify
    beautifulsoup4

  • 3.  RE: Python WebScraper

    Posted Fri August 12, 2022 01:31 PM
    Jon, could you give me more details on how to install this package?


    Do you mean "Extension Hub" or "Install Local Extension Bundle"?

    In "Extension Hub" the package you mention does not appear.
    
    And "Install Local Extension Bundle" asks me for a .spe file, which, if I understand correctly, is generated by IBM.




    ------------------------------
    Mauricio Mora
    ------------------------------



  • 4.  RE: Python WebScraper

    IBM Champion
    Posted Fri August 12, 2022 01:39 PM
    Beautiful Soup is not an extension command and is not on the Extension Hub.  The STATS PACKAGE INSTALL extension command installs packages from the standard (non-IBM) Python or R package repositories.  But you would still have to write the code that uses its tools to do the specific web scraping.

    Another popular tool is Scrapy.

    Either of these would require someone to write a chunk of Python code for the particular task.
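    To give a sense of what such a chunk of code might look like, here is a small sketch that pulls the rows of an HTML table into Python lists and, optionally, writes them to a new SPSS dataset. It deliberately swaps in the standard library's html.parser instead of Beautiful Soup or Scrapy so that it runs without installing anything. TableRows, scrape_table, and rows_to_spss are hypothetical names; rows_to_spss works only inside SPSS Statistics, and it assumes that the first row holds valid SPSS variable names and that every row has the same number of cells.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None    # cells of the row currently open, if any
        self._cell = None   # text fragments of the cell currently open

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def scrape_table(html):
    """Return all table rows in an HTML string as lists of cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_spss(rows, width=100):
    """Write rows to a new SPSS dataset as string variables.

    Assumption: rows[0] contains valid SPSS variable names.
    """
    import spss  # only importable inside SPSS Statistics

    names, data = rows[0], rows[1:]
    spss.StartDataStep()
    try:
        dataset = spss.Dataset(name=None)           # create a new dataset
        for name in names:
            dataset.varlist.append(name, width)     # string variable of this width
        for row in data:
            dataset.cases.append(row)
    finally:
        spss.EndDataStep()
```

    Everything comes in as strings here; converting selected columns to numeric would be a further step once the data are in SPSS.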
