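Implementing a Zipper Table (拉链表) in MySQL
A zipper table is defined by the way a data warehouse table stores its data. As the name suggests, "zipping" means recording history: every change to an entity is kept, from its first appearance up to its current state.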
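How each kind of change in the source data is handled:
Unchanged rows: no operation.
Inserted rows: open a new chain, with the end date set to the maximum date (currently valid).
Deleted rows: close the chain (the end date is set to the change date).
Updated rows: zip (close the old chain, then open a new one).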
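As a rough illustration of the four categories above, the following sketch labels every row by comparing the source with the currently open chains. It uses the SRC and TAG tables from the DDL section and the '2099-12-31' open-chain marker used throughout this article. The CHANGE_TYPE alias is made up for this example, and the query assumes ID is unique and non-NULL in SRC.

```sql
-- Sketch only: classify each ID by comparing SRC with the open rows in TAG.
-- <=> is MySQL's NULL-safe equality operator.
SELECT s.ID,
       CASE
         WHEN t.ID IS NULL THEN 'insert'                          -- no open chain yet
         WHEN s.NAME <=> t.NAME AND s.BAL <=> t.BAL THEN 'unchanged'
         ELSE 'update'                                            -- open chain differs
       END AS CHANGE_TYPE
FROM SRC s
LEFT JOIN TAG t
       ON t.ID = s.ID AND t.END_DT = '2099-12-31'
UNION ALL
-- Open chains whose ID no longer exists in the source.
SELECT t.ID, 'delete'
FROM TAG t
LEFT JOIN SRC s ON s.ID = t.ID
WHERE t.END_DT = '2099-12-31' AND s.ID IS NULL;
```

This query is for inspection only. The load steps in the following sections do not depend on it.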
I. DDL
-- Source table
DROP TABLE IF EXISTS SRC;
CREATE TABLE IF NOT EXISTS `SRC` (
`ID` int(11) DEFAULT NULL,
`NAME` varchar(255) DEFAULT NULL,
`BAL`  int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
-- Zipper target table
DROP TABLE IF EXISTS TAG;
CREATE TABLE IF NOT EXISTS `TAG` (
`ID` int(11) DEFAULT NULL,
`NAME` varchar(255) DEFAULT NULL,
`BAL`  int(11) DEFAULT NULL,
`START_DT`  Date DEFAULT NULL,
`END_DT`  Date DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
-- Temporary table holding the current day's full data
drop table if exists VT_NEW;
create table if not exists VT_NEW like TAG;
-- Temporary table holding the current day's new and changed rows
drop table if exists VT_INC;
create table if not exists VT_INC like TAG;
II. Loading the initial data into the zipper table
001 Insert data into the source table
insert into SRC values(101,'王五',100);
insert into SRC values(102,'赵四',50);
+-----+------+------+
| ID  | NAME | BAL  |
+-----+------+------+
| 101 | 王五 |  100 |
| 102 | 赵四 |   50 |
+-----+------+------+
002 Load the initial data into the target table (first run: every row opens a chain directly)
insert into TAG(id,name,bal,start_dt,end_dt)
select t.id,t.name,t.BAL,CURRENT_DATE,'2099-12-31'
from SRC t;
+-----+------+------+------------+------------+
| ID  | NAME | BAL  | START_DT   | END_DT     |
+-----+------+------+------------+------------+
| 101 | 王五 |  100 | 2022-03-17 | 2099-12-31 |
| 102 | 赵四 |   50 | 2022-03-17 | 2099-12-31 |
+-----+------+------+------------+------------+
III. T+1: zipper algorithm for new and updated data
001 Insert and update data in the source table
update SRC set bal=1000 where id=101;
insert into SRC values(103,'张三',10);
+-----+------+------+
| ID  | NAME | BAL  |
+-----+------+------+
| 101 | 王五 | 1000 |
| 102 | 赵四 |   50 |
| 103 | 张三 |   10 |
+-----+------+------+
002 VT_NEW holds the full source data, with every row opened as a new chain
drop table if exists VT_NEW;
create table if not exists VT_NEW like TAG;
insert into  VT_NEW(ID,NAME,BAL,START_DT,END_DT)
SELECT ID,NAME,BAL,date_add(CURRENT_DATE,interval 1 day) as START_DT /* simulate T+1 */,'2099-12-31' as END_DT
FROM SRC;
+-----+------+------+------------+------------+
| ID  | NAME | BAL  | START_DT   | END_DT     |
+-----+------+------+------------+------------+
| 101 | 王五 | 1000 | 2022-03-18 | 2099-12-31 |
| 102 | 赵四 |   50 | 2022-03-18 | 2099-12-31 |
| 103 | 张三 |   10 | 2022-03-18 | 2099-12-31 |
+-----+------+------+------------+------------+
003 VT_INC holds the new and modified rows found by comparison
drop table if exists VT_INC;
create table if not exists VT_INC like TAG;
insert into VT_INC(ID,NAME,BAL,START_DT,END_DT)
select ID,NAME,BAL,START_DT,END_DT
from VT_NEW
where (ID,NAME,BAL) not in (select ID,NAME,BAL from TAG where END_DT='2099-12-31');
+-----+------+------+------------+------------+
| ID  | NAME | BAL  | START_DT   | END_DT     |
+-----+------+------+------------+------------+
| 101 | 王五 | 1000 | 2022-03-18 | 2099-12-31 |
| 103 | 张三 |   10 | 2022-03-18 | 2099-12-31 |
+-----+------+------+------------+------------+
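One caveat about the comparison in step 003: all columns here are declared DEFAULT NULL, and a row comparison with NOT IN evaluates to NULL rather than TRUE when a NULL is involved, so a changed row could be silently skipped. The sketch below is one possible NULL-safe variant using NOT EXISTS and MySQL's `<=>` operator. It is meant as an alternative to the INSERT in step 003 (run one or the other, not both), and it assumes NAME or BAL may actually contain NULLs in your data.

```sql
-- NULL-safe variant of step 003 (assumption: NAME or BAL may be NULL).
-- A VT_NEW row is kept when no open TAG row matches it on all three columns.
INSERT INTO VT_INC (ID, NAME, BAL, START_DT, END_DT)
SELECT n.ID, n.NAME, n.BAL, n.START_DT, n.END_DT
FROM VT_NEW n
WHERE NOT EXISTS (
    SELECT 1
    FROM TAG t
    WHERE t.END_DT = '2099-12-31'
      AND t.ID   <=> n.ID
      AND t.NAME <=> n.NAME
      AND t.BAL  <=> n.BAL
);
```

With the sample data in this article, which contains no NULLs, both forms should select the same two rows.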
004 Update the target table: close the chains of the rows that need zipping
UPDATE TAG
SET END_DT = date_add(CURRENT_DATE,interval 1 day) /* simulate T+1 */
WHERE END_DT = '2099-12-31' AND ID IN (SELECT ID FROM VT_INC);
+-----+------+-----+------------+------------+
| ID  | NAME | BAL | START_DT   | END_DT     |
+-----+------+-----+------------+------------+
| 101 | 王五 | 100 | 2022-03-17 | 2022-03-18 |
| 102 | 赵四 |  50 | 2022-03-17 | 2099-12-31 |
+-----+------+-----+------------+------------+
005 Insert into the target table to open the new chains
insert into TAG(ID,NAME,BAL,START_DT,END_DT)
select ID,NAME,BAL,START_DT,'2099-12-31'
from VT_INC;
+-----+------+------+------------+------------+
| ID  | NAME | BAL  | START_DT   | END_DT     |
+-----+------+------+------------+------------+
| 101 | 王五 |  100 | 2022-03-17 | 2022-03-18 |
| 101 | 王五 | 1000 | 2022-03-18 | 2099-12-31 |
| 102 | 赵四 |   50 | 2022-03-17 | 2099-12-31 |
| 103 | 张三 |   10 | 2022-03-18 | 2099-12-31 |
+-----+------+------+------------+------------+
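Once the chains are in place, the table can be read back either as a current snapshot or as the state on a past date. The queries below are a minimal sketch of both. `@as_of` is a session variable introduced only for this example, and END_DT is treated as exclusive because, in the steps above, a closed row's END_DT equals the START_DT of the row that replaces it.

```sql
-- Current snapshot: only the open chains.
SELECT ID, NAME, BAL
FROM TAG
WHERE END_DT = '2099-12-31';

-- Snapshot as of a given date (@as_of is a hypothetical variable).
-- A version is valid on @as_of when START_DT <= @as_of < END_DT.
SET @as_of = '2022-03-17';
SELECT ID, NAME, BAL
FROM TAG
WHERE START_DT <= @as_of
  AND END_DT   >  @as_of;
```

With the sample rows shown above, the second query should return ID 101 with BAL 100 and ID 102 with BAL 50, which is the state loaded on 2022-03-17.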
IV. T+1 zipper algorithm for data removed from the source (simulated here as T+2)
001 Remove data from the source table
delete from SRC where id=101;
+-----+------+-----+
| ID  | NAME | BAL |
+-----+------+-----+
| 102 | 赵四 |  50 |
| 103 | 张三 |  10 |
+-----+------+-----+
002 VT_NEW holds the full source data, with every row opened as a new chain
drop table if exists VT_NEW;
create table if not exists VT_NEW like TAG;
insert into  VT_NEW(ID,NAME,BAL,START_DT,END_DT)
SELECT ID,NAME,BAL,date_add(CURRENT_DATE,interval 1 day) as START_DT /* simulate T+1 */,'2099-12-31' as END_DT
FROM SRC;
+-----+------+-----+------------+------------+
| ID  | NAME | BAL | START_DT   | END_DT     |
+-----+------+-----+------------+------------+
| 102 | 赵四 |  50 | 2022-03-18 | 2099-12-31 |
| 103 | 张三 |  10 | 2022-03-18 | 2099-12-31 |
+-----+------+-----+------------+------------+
003 VT_INC holds the removed rows found by comparison
drop table if exists VT_INC;
create table if not exists VT_INC like TAG;
INSERT INTO VT_INC(ID,NAME,BAL,START_DT,END_DT)
SELECT ID,NAME,BAL,START_DT,date_add(CURRENT_DATE,interval 2 day) /* simulate T+2: the row has been removed */
FROM TAG
WHERE END_DT = '2099-12-31' AND (ID,NAME,BAL) NOT IN (
SELECT ID,NAME,BAL
FROM VT_NEW);
+-----+------+------+------------+------------+
| ID  | NAME | BAL  | START_DT   | END_DT     |
+-----+------+------+------------+------------+
| 101 | 王五 | 1000 | 2022-03-18 | 2022-03-19 |
+-----+------+------+------------+------------+
004 Update the target table to close the chains
UPDATE TAG
SET END_DT = date_add(CURRENT_DATE,interval 2 day) /* simulate T+2 */
WHERE END_DT = '2099-12-31' AND ID IN (SELECT ID FROM VT_INC);
+-----+------+------+------------+------------+
| ID  | NAME | BAL  | START_DT   | END_DT     |
+-----+------+------+------------+------------+
| 101 | 王五 |  100 | 2022-03-17 | 2022-03-18 |
| 101 | 王五 | 1000 | 2022-03-18 | 2022-03-19 |
| 102 | 赵四 |   50 | 2022-03-17 | 2099-12-31 |
| 103 | 张三 |   10 | 2022-03-18 | 2099-12-31 |
+-----+------+------+------------+------------+
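After each run it can be worth sanity-checking the target table. The two queries below are a sketch of such checks and are not part of the original procedure. The OPEN_ROWS and NEXT_START_DT aliases are made up for this example, and both queries are expected to return no rows when the chains are consistent.

```sql
-- Check 1: each ID should have at most one open chain.
SELECT ID, COUNT(*) AS OPEN_ROWS
FROM TAG
WHERE END_DT = '2099-12-31'
GROUP BY ID
HAVING COUNT(*) > 1;

-- Check 2: no version of an ID should start strictly inside another
-- version's [START_DT, END_DT) interval.
SELECT a.ID, a.START_DT, a.END_DT, b.START_DT AS NEXT_START_DT
FROM TAG a
JOIN TAG b
  ON b.ID = a.ID
 AND b.START_DT > a.START_DT
 AND b.START_DT < a.END_DT;
```

Check 2 does not catch two versions of the same ID that share a START_DT, which can happen if a load is rerun on the same day.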
