Originally updated on 16th September 2019 by Vince Tkac
Authors:
Vince Tkac
STSM, IBM Cognitive Apps
vtkac@us.ibm.com
Doug Brown
IBM Cognitive Apps
dcbrown@us.ibm.com
Josh Rice
Architect, IBM Cognitive Apps
jmrice@us.ibm.com
SCBN Standard & Premium is a hosted, high-volume, business-to-business data processing system that incorporates B2Bi to process millions of business documents per month. Its service-oriented architecture deploys IBM B2Bi software in "pods": a pod is a two-node instance of B2Bi with a DB2 database, connected to the message bus using IBM WebSphere. SCBN Standard & Premium uses B2Bi for both protocol handling and data transformation. This article discusses an index and purge issue encountered on a B2Bi data transformation pod and the process used to remediate it.
In one of these pods, the B2Bi index and purge processes fell behind. Data built up quickly and the normal processes could not catch up: we were at roughly 240 million rows in WORKFLOW_CONTEXT and 65 million in ARCHIVE_INFO. The external purge job was taking hours just to get a list of BPs to purge (see QUERY 1), and the BP indexer was marking only 100k entries every 13 to 16 hours.
Query plans showed that the sheer record counts were causing problems even where the indexes themselves were healthy.
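For readers who want to inspect plans the same way, DB2's EXPLAIN facility captures the optimizer's access plan and row-count estimates. The statement below is an illustrative sketch (the predicate values are placeholders, and it assumes the EXPLAIN tables have already been created, e.g. from ~/sqllib/misc/EXPLAIN.DDL):

```sql
-- Illustrative only: capture the access plan for a purge-style selection
-- so the optimizer's row-count estimates become visible.
-- Assumes the EXPLAIN tables exist in the current schema.
EXPLAIN PLAN FOR
  SELECT wf_id, group_id, archive_flag, archive_date
  FROM archive_info
  WHERE archive_date < CURRENT TIMESTAMP - 30 DAYS
    AND archive_flag IN (1, 2);
-- The captured plan can then be formatted with the db2exfmt tool.
```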
As a non-intrusive option, we wanted to move data out of the primary work tables to reduce the result counts and speed up the maintenance queries. We would do this for both ARCHIVE_INFO and WORKFLOW_CONTEXT, allowing external purge and BP_index to complete. The intent was to then slowly move the ARCHIVE_INFO and WORKFLOW_CONTEXT data back into the primary tables for external purge and BP_index to work on as normal.
A custom DB2 stored procedure was created to move records out of ARCHIVE_INFO and into a save table. This took a few days to run but was able to reduce ARCHIVE_INFO to a more normal working size of around 23 million rows. DB2 runstats was executed at this point and we could see the initial selection query from external purge return quickly.
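The statistics refresh can be run from SQL via the ADMIN_CMD procedure; the sketch below is illustrative (MYSCHEMA is a placeholder, and the exact RUNSTATS options we used are not recorded here):

```sql
-- Illustrative sketch: refresh table and index statistics after the bulk
-- move so the optimizer sees the reduced row counts. MYSCHEMA is a placeholder.
CALL SYSPROC.ADMIN_CMD(
  'RUNSTATS ON TABLE MYSCHEMA.ARCHIVE_INFO WITH DISTRIBUTION AND DETAILED INDEXES ALL');
```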
The stored procedure moves 100k records at a time from ARCHIVE_INFO to a save table. It was run in parallel across 10 different connections with different date ranges to move data faster. With the data moved out of the ARCHIVE_INFO table, the external purge query against ARCHIVE_INFO completed very quickly. Over the course of a week, external purge was able to reduce ARCHIVE_INFO down to around 7 million rows. We then trickle-fed rows from the save table back into the primary ARCHIVE_INFO table for external purge to start picking up. We ended up not having to move any data out of WORKFLOW_CONTEXT.
This is not a fast process. It took approximately three weeks to move data out, get the system stable, trickle-feed data back in, and return the system to a stable, self-sustaining state, but the process required no outage or downtime.
QUERY 1:
SELECT wf_id, group_id, archive_flag, archive_date
FROM archive_info
WHERE wf_id IN (SELECT wf_id
                FROM (SELECT DISTINCT wf_id,
                             ROW_NUMBER() OVER (ORDER BY wf_id) AS rank_row
                      FROM archive_info
                      WHERE archive_info.archive_date < ?
                        AND (archive_info.archive_flag = 1
                             OR archive_info.archive_flag = 2)) AS level1
                WHERE rank_row < ?)
  AND archive_info.archive_date < ?
  AND archive_info.archive_flag IN (1, 2)
  AND archive_info.group_id IN (1, 2)
Setup:
CREATE TABLE "ARCHIVE_INFO_INDEXER" (
    "WF_ID" BIGINT NOT NULL,
    "GROUP_ID" INTEGER NOT NULL,
    "ARCHIVE_FLAG" INTEGER,
    "ARCHIVE_DATE" TIMESTAMP,
    "WFC_START_DATE" TIMESTAMP
)
DATA CAPTURE NONE
COMPRESS YES;

CREATE INDEX "AII_ARCHIVE_INFO_PERF_IDX1"
    ON "ARCHIVE_INFO_INDEXER"
    ("ARCHIVE_FLAG" ASC, "GROUP_ID" ASC, "WF_ID" ASC)
    MINPCTUSED 0
    ALLOW REVERSE SCANS
    PAGE SPLIT SYMMETRIC
    COMPRESS YES;

CREATE INDEX "AII_SCI_IDX_176"
    ON "ARCHIVE_INFO_INDEXER"
    ("WF_ID" ASC, "ARCHIVE_FLAG" ASC)
    MINPCTUSED 0
    ALLOW REVERSE SCANS
    PAGE SPLIT SYMMETRIC
    COMPRESS YES;

CREATE INDEX "AII_SCI_IDX_177"
    ON "ARCHIVE_INFO_INDEXER"
    ("ARCHIVE_DATE" ASC, "ARCHIVE_FLAG" ASC, "GROUP_ID" ASC, "WF_ID" ASC)
    MINPCTUSED 0
    ALLOW REVERSE SCANS
    PAGE SPLIT SYMMETRIC
    COMPRESS YES;

CREATE INDEX "AII_SCI_IDX_178"
    ON "ARCHIVE_INFO_INDEXER"
    ("ARCHIVE_FLAG" ASC, "WF_ID" ASC)
    MINPCTUSED 0
    ALLOW REVERSE SCANS
    PAGE SPLIT SYMMETRIC
    COMPRESS YES;

CREATE UNIQUE INDEX "AII_ARCHIVE_INFO_PK"
    ON "ARCHIVE_INFO_INDEXER"
    ("WF_ID" ASC, "GROUP_ID" ASC)
    MINPCTUSED 0
    ALLOW REVERSE SCANS
    PAGE SPLIT SYMMETRIC
    COMPRESS YES;

ALTER TABLE "ARCHIVE_INFO_INDEXER"
    ADD CONSTRAINT "AAI_ARCHIVE_INFO_PK" PRIMARY KEY ("WF_ID", "GROUP_ID");

CREATE INDEX "AII_IDX_2"
    ON "ARCHIVE_INFO_INDEXER"
    ("WFC_START_DATE" ASC)
    MINPCTUSED 0
    ALLOW REVERSE SCANS
    PAGE SPLIT SYMMETRIC
    COMPRESS YES;
Stored Procedure 1:
create or replace procedure AII_MOVE_ARCHIVE_INFO_TO_OFFLINE (
    in pStart timestamp,
    in pEnd   timestamp
)
begin
    -- Move up to 100k un-indexed rows (ARCHIVE_DATE is null) whose
    -- workflow started in the given window into the save table.
    insert into ARCHIVE_INFO_INDEXER
        (WF_ID, GROUP_ID, ARCHIVE_FLAG, ARCHIVE_DATE, WFC_START_DATE)
    (select a.WF_ID, a.GROUP_ID, min(a.ARCHIVE_FLAG), min(a.ARCHIVE_DATE), max(w.START_TIME)
     from archive_info a
     inner join workflow_context w
        on a.wf_id = w.workflow_id
     where a.archive_date is null
       and w.start_time between pStart and pEnd
     group by a.WF_ID, a.GROUP_ID
     order by a.wf_id, a.group_id
     fetch first 100000 rows only);

    -- Remove the moved rows from the primary table.
    delete from archive_info
    where (wf_id, group_id) in
        (select a.wf_id, a.group_id
         from archive_info a
         inner join ARCHIVE_INFO_INDEXER AAI
            on a.wf_id = aai.wf_id -- and a.group_id = AAI.group_id
         where aai.WFC_START_DATE between pStart and pEnd);

    commit;
end
#
# sample call
#call AII_MOVE_ARCHIVE_INFO_TO_OFFLINE ('2000-01-01', '2018-01-01')
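The parallel runs described earlier amounted to issuing calls like the following from ten separate connections, each over a disjoint START_TIME range (the dates below are illustrative, not the ranges actually used):

```sql
-- Illustrative only: each CALL is issued from its own connection so the
-- moves proceed in parallel over non-overlapping date ranges.
CALL AII_MOVE_ARCHIVE_INFO_TO_OFFLINE ('2016-01-01', '2016-06-30'); -- connection 1
CALL AII_MOVE_ARCHIVE_INFO_TO_OFFLINE ('2016-07-01', '2016-12-31'); -- connection 2
CALL AII_MOVE_ARCHIVE_INFO_TO_OFFLINE ('2017-01-01', '2017-06-30'); -- connection 3
-- ... and so on, one range per connection.
```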
Once the backlog was clear, data was moved back into the ARCHIVE_INFO table from the save table. Only about 5 million rows were moved at a time, by calling the stored procedure below multiple times. We monitored the count of un-indexed data in the ARCHIVE_INFO table and moved more data over when it dropped low enough.
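A simple way to watch that count, using the same ARCHIVE_DATE IS NULL predicate the move procedure uses to identify un-indexed rows:

```sql
-- Rows in ARCHIVE_INFO that the BP indexer has not yet processed,
-- read with uncommitted-read isolation to avoid blocking the workload.
SELECT COUNT(*) AS unindexed_rows
FROM ARCHIVE_INFO
WHERE ARCHIVE_DATE IS NULL
WITH UR;
```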
Stored Procedure 2:
create or replace procedure MoveAAIDataBack ()
begin
    declare movedate timestamp;

    -- Find the WFC_START_DATE cut-off covering the next ~100k rows.
    -- (A correlation name is added to the nested table expression,
    -- which DB2 expects here.)
    select max(WFC_START_DATE) into movedate
    from (select WFC_START_DATE
          from ARCHIVE_INFO_INDEXER
          order by WFC_START_DATE
          fetch first 100000 rows only) as t
    for read only with ur;

    -- Move that batch back into the primary table ...
    insert into ARCHIVE_INFO (wf_id, group_id, archive_flag, archive_date)
    select wf_id, group_id, archive_flag, archive_date
    from ARCHIVE_INFO_INDEXER
    where WFC_START_DATE <= movedate;

    -- ... and remove it from the save table.
    delete from ARCHIVE_INFO_INDEXER where WFC_START_DATE <= movedate;
end
#
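A sample trickle-feed pass looked like the following (illustrative; in practice the cadence was manual, gated on the un-indexed count described above):

```sql
-- Illustrative: one trickle-feed pass, then check how much data
-- remains offline in the save table before scheduling the next pass.
CALL MoveAAIDataBack();
SELECT COUNT(*) AS rows_still_offline FROM ARCHIVE_INFO_INDEXER;
```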