Cloud Pak for Data Group


What would you ask a room of Cloud Pak for Data users and experts?

  • 1.  What would you ask a room of Cloud Pak for Data users and experts?

    Posted 29 days ago

    We asked our members, "If you were in a room full of Cloud Pak for Data users, what question would you ask?"

    So far, we’ve posted four questions from community members to the Cloud Pak for Data Group. We will post a few more soon, but if you missed the first discussions in our “Ask the Room” series, you can read and reply to them here:



    ------------------------------
    Shannon Rouiller
    ------------------------------


  • 2.  RE: What would you ask a room of Cloud Pak for Data users and experts?

    Posted 9 days ago

    When it comes to data modelling and storage with SQL and NoSQL, how do you decide which is the best approach for your project?

    Also, when creating AI models, do you find there is a big difference between using SQL and NoSQL databases?



    ------------------------------
    Fernanda Braga
    ------------------------------



  • 3.  RE: What would you ask a room of Cloud Pak for Data users and experts?

    Posted 7 days ago
    In general, when you're selecting a data warehouse, you want to pay attention to a few things:

    • Type of Data
      • What kind of data are you storing? Is it images, documents, hierarchical data, tabular, geospatial?
      • Is it consistent? Does the data have nulls? Does the data structure change from record to record?
      • Do you have multiple kinds of data? Is some of it columnar and some of it hierarchical or blobs?
    • Access Patterns
      • Is your data transactional?
      • Do you get one bulk update once a day, or is it a real-time feed?
      • How often do you query the data? Do you need whole rows or just single columns?
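
    The "Is it consistent?" questions can be checked directly on a sample of your data. Below is a minimal Python sketch, assuming records arrive as dictionaries (for example, parsed JSON); the function name `profile_records` and the sensor records are hypothetical. It counts nulls per field and how many distinct record shapes (sets of field names) appear. Roughly speaking, one shape with few nulls fits a relational table well, while many shapes suggest a document-style store.

```python
from collections import Counter


def profile_records(records):
    """Summarize how consistent a batch of dict records is (hypothetical helper)."""
    null_counts = Counter()  # field name -> number of records where the value is None
    shapes = Counter()       # frozenset of field names -> number of records with that shape
    for rec in records:
        shapes[frozenset(rec)] += 1
        for field, value in rec.items():
            if value is None:
                null_counts[field] += 1
    return {
        "records": len(records),
        "distinct_shapes": len(shapes),
        "null_counts": dict(null_counts),
    }


# Hypothetical sample: one null, and one record with an extra nested field.
sample = [
    {"id": 1, "name": "sensor-a", "temp": 21.5},
    {"id": 2, "name": "sensor-b", "temp": None},
    {"id": 3, "name": "sensor-c", "temp": 19.0, "tags": ["roof", "north"]},
]
print(profile_records(sample))
# {'records': 3, 'distinct_shapes': 2, 'null_counts': {'temp': 1}}
```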

    SQL databases are generally (with a few exceptions) focused on relational, tabular data. This lets you easily slice and dice the data you're looking for, much as if it were a massive spreadsheet.
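
    To illustrate the "slice and dice" point, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `readings` table, its columns, and its values are hypothetical. The first query selects only the rows and columns needed, like filtering a spreadsheet; the second aggregates across a dimension, like a pivot table.

```python
import sqlite3

# In-memory database; the "readings" table and its data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (site TEXT, day TEXT, temp REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("north", "2024-01-01", 3.5),
        ("north", "2024-01-02", 4.5),
        ("south", "2024-01-01", 9.0),
        ("south", "2024-01-02", 11.0),
    ],
)

# "Slice": only the rows and columns we need.
north = conn.execute(
    "SELECT day, temp FROM readings WHERE site = ? ORDER BY day", ("north",)
).fetchall()

# "Dice": aggregate across a dimension.
averages = conn.execute(
    "SELECT site, AVG(temp) FROM readings GROUP BY site ORDER BY site"
).fetchall()

print(north)     # [('2024-01-01', 3.5), ('2024-01-02', 4.5)]
print(averages)  # [('north', 4.0), ('south', 10.0)]
conn.close()
```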

    NoSQL isn't a single type of database so much as a catch-all term for anything that doesn't match my previous description of SQL. This includes everything from document stores like MongoDB or CouchDB to key-value and graph databases like Berkeley DB and Neo4j. Which of these you use is going to be determined by the two questions I listed above: what does your data look like, and what access patterns do you have?
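
    As a rough contrast between two of these NoSQL families, here is a minimal Python sketch built on a hypothetical order record. The nested record is the kind of self-describing structure a document store like MongoDB or CouchDB holds natively. To show the key-value access pattern, Python's standard-library `dbm.dumb` stands in for a store such as Berkeley DB; this is an illustrative assumption, not how either product is used in practice. The value goes in and comes back whole, by key.

```python
import dbm.dumb
import json
import os
import tempfile

# Hypothetical order record: nested and self-describing, like a document in a document store.
order = {
    "order_id": "A-100",
    "customer": {"name": "Ana", "city": "Lisbon"},
    "items": [{"sku": "X1", "qty": 2}, {"sku": "Y7", "qty": 1}],
}

with tempfile.TemporaryDirectory() as tmp:
    # dbm.dumb (Python standard library) stands in for a key-value store.
    db = dbm.dumb.open(os.path.join(tmp, "orders"), "c")
    try:
        # The store sees only a key and an opaque value (the JSON text, stored as bytes).
        db[order["order_id"]] = json.dumps(order)
        # Lookup is by key only; "all orders from Lisbon" would mean scanning every value.
        fetched = json.loads(db["A-100"])
    finally:
        db.close()

print(fetched["customer"]["city"])  # Lisbon
```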

    As for which database is most commonly used for AI: most AI algorithms exclusively use tabular data, and any data that is not tabular must be made tabular first. So if you're looking at doing visual recognition, the first thing you do is map each pixel to a column in a very wide relational table, often converting to black and white (grayscale) and normalizing the pixel values in the process. If you choose to use one of the NoSQL databases, it is highly likely you will start by converting the data INTO a tabular form before training or using your model.
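
    The conversion to tabular form can be sketched in plain Python. This is a minimal illustration, not a production pipeline: `image_to_row` and `document_to_row` are hypothetical helper names, the image is assumed to be a list of rows of (r, g, b) tuples in the 0-255 range, and the grayscale weights are the ITU-R BT.601 luma coefficients, one common choice among several. The second helper applies the same idea to a nested NoSQL document, turning each leaf value into its own column.

```python
def image_to_row(pixels):
    """Flatten a 2-D grid of (r, g, b) tuples (0-255) into one row of normalized floats."""
    row = []
    for line in pixels:
        for r, g, b in line:
            # Grayscale intensity using BT.601 luma weights (an assumption; others exist).
            gray = 0.299 * r + 0.587 * g + 0.114 * b
            row.append(gray / 255.0)  # normalize to the 0.0-1.0 range
    return row  # one feature (column) per pixel


def document_to_row(doc, prefix=""):
    """Flatten nested dicts/lists into {'a.b': value} columns; list items become a.0, a.1, ..."""
    row = {}
    items = doc.items() if isinstance(doc, dict) else enumerate(doc)
    for key, value in items:
        name = f"{prefix}{key}"
        if isinstance(value, (dict, list)):
            row.update(document_to_row(value, name + "."))
        else:
            row[name] = value
    return row


# A hypothetical 2x2 image: white, black, red, blue.
tiny_image = [[(255, 255, 255), (0, 0, 0)],
              [(255, 0, 0), (0, 0, 255)]]
print(len(image_to_row(tiny_image)))  # 4

print(document_to_row({"id": 7, "customer": {"city": "Lisbon"}, "tags": ["a", "b"]}))
# {'id': 7, 'customer.city': 'Lisbon', 'tags.0': 'a', 'tags.1': 'b'}
```

    Stacking one `image_to_row` result per image gives the very wide table described above: a 28x28 image, for instance, becomes 784 columns.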

    ------------------------------
    HANS UHLIG
    ------------------------------