Cloud Pak for Data


Connectivity: What's New in 4.5

By Virginie Grandhaye posted Fri July 08, 2022 04:44 AM

I’m delighted to share with you what we delivered in the last Cloud Pak for Data release, especially on the topic of connectivity.
In case you’re not familiar with connectivity, I suggest you first have a look at this blog post, which should help you grasp the basic concepts of what we mean by connectivity.
As you may know, connectivity is a platform mission, intended to serve all services that need access to datasources. As a consequence, we usually deliver features and enhancements that are either applicable to the entire platform or specific to some services.
Please also refer to the documentation, which is an area where we invest a lot.

New Connectors:

At the platform level, since 4.0, we have delivered a few new connectors, either in 4.5 or in monthly patches:
·      IBM SQL Query: Delivered in CPD 4.0.1
·      Amazon RDS for Oracle: Delivered in CPD 4.0.2
·      Databases for Datastax: Delivered in 4.0.3
·      Google PubSub: Delivered in 4.0.3
·      Exasol: Delivered in 4.0.6
·      Generic S3: Delivered in 4.0.6, and enhanced in 4.0.7
·      Kafka: A Kafka connector was already available for DataStage only. We’ve made it available for the entire platform in 4.5. The different services are adopting it progressively, based on the valid use cases they have for this datasource.
·      Match 360: Delivered in 4.5. This connector supports better integration between Match 360 and DataStage or WKC.
Last but not least, we shipped two connectors for DataStage only, to connect to SAP as a datasource. DataStage used to have a specific pack for SAP connectivity on the legacy stack. This is not about migration (the technology is too different between Information Server and Tahoe), but these new connectors aim to serve similar use cases for new workloads:
·      SAP Bulk Extract: extracts high volumes of data from SAP ABAP.
·      SAP Delta Extract: extracts partial data, using a previous extract as a reference point.

Both are read-only (no write-back to the SAP system) and depend on SAP-provided libraries, so they will not be available on CPDaaS.
Please expect more details to come from the DataStage team soon on this.
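
To illustrate the difference between a bulk extract and a delta extract, here is a minimal sketch of the general "reference point" (watermark) technique. It is not how the SAP connectors are implemented: the Record type, the changed_at marker, and both function names are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Record:
    key: str
    changed_at: int  # hypothetical change marker (timestamp or sequence number)


def bulk_extract(source: Iterable[Record]) -> Tuple[List[Record], int]:
    """Read everything and return the rows plus the highest change marker seen."""
    rows = list(source)
    watermark = max((r.changed_at for r in rows), default=0)
    return rows, watermark


def delta_extract(source: Iterable[Record], watermark: int) -> Tuple[List[Record], int]:
    """Read only the rows changed after the previous extract's watermark."""
    rows = [r for r in source if r.changed_at > watermark]
    # If nothing changed, keep the previous watermark as the reference point.
    new_watermark = max((r.changed_at for r in rows), default=watermark)
    return rows, new_watermark
```

In this sketch, a first bulk extract establishes the reference point, and each later delta extract returns only what changed since then, together with an updated reference point for the next run.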

Existing Connectors improvements:

Because DataStage is a big consumer of connectivity, we also allocated some time to supporting more advanced and specific features for this service. Those were mostly driven by direct customer requests. This is not an exhaustive list, but here are some examples of the recently delivered ones:
·      Db2 optimized connector: improved handling of large data sets, using external tables.
·      Cassandra connector: Support for quoted identifiers.
·      Google BigQuery: Support of “optimistic concurrency” (prevents jobs from failing when writing in parallel into the same table)
·      Hive connector: metadata performance improvements
·      MQ connector: Support of an additional authentication mechanism
·      Netezza connector: Certification of NZaaS.
·      Salesforce connector: Support of OAuth with JWT flow, support of Bulk API 2.0, support of polymorphic fields
·      SQL Server connector: Support of Stored Procedures for DataStage Next Gen. Legacy DataStage had a specific stage for stored procedures; for Next Gen, we decided to add this capability to each connector instead. We have other connector enhancements on the roadmap to support more datasources with this feature.
·      Snowflake connector: support of Okta authentication (delivered in 4.0.7)
·      SAP IQ (formerly Sybase ASE) connector: Support of Stored Procedure.
·      XML connector: Fix for a complex race condition that resulted in dropped data
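
For readers unfamiliar with the “optimistic concurrency” idea mentioned for Google BigQuery above, here is a minimal sketch of the general technique: a writer records the table version it started from, and if another writer committed in the meantime, it retries instead of failing. This is not the BigQuery API nor the connector's code; VersionedTable, ConflictError, and write_with_retry are hypothetical names used only for illustration.

```python
class ConflictError(Exception):
    """Raised when the table changed since the writer took its snapshot."""


class VersionedTable:
    """Hypothetical in-memory table with a version counter."""

    def __init__(self):
        self.rows = []
        self.version = 0

    def snapshot(self):
        # The writer remembers which version it started from.
        return self.version

    def commit(self, expected_version, new_rows):
        # Reject the write if someone else committed in the meantime.
        if expected_version != self.version:
            raise ConflictError("table was modified concurrently")
        self.rows.extend(new_rows)
        self.version += 1


def write_with_retry(table, new_rows, max_attempts=3):
    """Try to commit; on conflict, take a fresh snapshot and try again."""
    for attempt in range(1, max_attempts + 1):
        expected = table.snapshot()
        try:
            table.commit(expected, new_rows)
            return attempt  # number of attempts that were needed
        except ConflictError:
            continue
    raise ConflictError(f"gave up after {max_attempts} attempts")
```

The design choice is to avoid locking the table up front: conflicts are assumed to be rare, so they are detected at commit time and handled with a retry rather than a job failure.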
Specific Watson Query enhancements:
  • Single Source Pushdown Enhancement for MySQL, Impala, and Data Virtualization Manager.
  • Pushdown Enhancement on OLAP functions for Db2 and Netezza data sources
Feel free to reach out to the Watson Query / Data Virtualization PMs for further details.

Connections User Experience

We continuously work with UX teams to improve the connections UX and keep it simple. In CPD 4.0, we introduced a capability for filtering connection types by “supported services”. Given the very positive feedback we got on this, we continue to expand it.
In 4.5, we delivered a new filter to expose the Manta partnership (refer to the Watson Knowledge Catalog team for further details on Manta Lineage).
You can thus filter the connection types supported for lineage with the Metadata import (lineage) filter.
Until now, connections were used to get access to data only. The lineage connection types are different, and not necessarily tied to an existing connector of the platform. As such, some connections can only capture metadata, and will not give you access to the data itself.
This is the case for two new connection types:
·       PowerBI (Azure)
·       PowerBI (Local)
The WKC team is planning to add more of those in the coming releases of CPD, stay tuned…
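
As a rough picture of what filtering connection types by supported service means, here is a small sketch. The ConnectionType structure, the CATALOG contents, and the filter_by_service helper are all assumptions for illustration; they do not reflect the platform's actual data model or API, and the service tags are only loosely based on what is described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class ConnectionType:
    name: str
    supported_services: FrozenSet[str]
    data_access: bool = True  # False for metadata-only (lineage) connection types


# Hypothetical catalog; the service tags are illustrative only.
CATALOG = [
    ConnectionType("Kafka", frozenset({"DataStage"})),
    ConnectionType("PowerBI (Azure)", frozenset({"Metadata import (lineage)"}), data_access=False),
    ConnectionType("PowerBI (Local)", frozenset({"Metadata import (lineage)"}), data_access=False),
]


def filter_by_service(catalog: List[ConnectionType], service: str) -> List[str]:
    """Return the names of connection types tagged with the given service."""
    return [c.name for c in catalog if service in c.supported_services]


lineage_types = filter_by_service(CATALOG, "Metadata import (lineage)")
```

Here lineage_types would contain only the two PowerBI entries, both of which are flagged as metadata-only in this sketch.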

Operating Model Transformation 

One of the key achievements of 4.5 is also that we work better together, across silos, with the various PM, Development, Content, and UX teams.
This was possible thanks to the great mindset that everyone has in the various teams: working together and making customer satisfaction a priority, so that our platform can expand over 2022 and beyond!
I mentioned DataStage, WKC, and Data Virtualization, but we also work very closely with Watson Studio, Match 360, and others...
This also contributed to better support of datasources in the different services of the platform. You may notice that in 4.5, many services now support additional datasources. Please refer to the documentation of each datasource type for further details.
We hope it will help many customers get value out of their investment in Cloud Pak for Data!

Virginie, Malcolm, Kevin, Katie, Alex and others :-)