Tags

, , , , ,

A colleague of mine had the problem to group continuous entries of same type to be able to count number of continuos entries.

His Data:

TS        USERNAME RETURNCODE
--------- -------- ----------
05-NOV-15 SCOTT         28000
04-NOV-15 SCOTT         28000
03-NOV-15 SCOTT          1027 
02-NOV-15 SCOTT          1027
01-NOV-15 SCOTT          1027
31-OCT-15 SCOTT          1027
30-OCT-15 SCOTT             0
29-OCT-15 SCOTT          1027
28-OCT-15 SCOTT          1027
27-OCT-15 SCOTT             0
26-OCT-15 SCOTT             0
25-OCT-15 SCOTT          1027

Output needed:

TS        USERNAME RETURNCODE GROUP COUNT
--------  -------- ---------- ----- -----
05-NOV-15 SCOTT         28000     1     2
04-NOV-15 SCOTT         28000     1     2
03-NOV-15 SCOTT          1027     2     4
02-NOV-15 SCOTT          1027     2     4
01-NOV-15 SCOTT          1027     2     4
31-OCT-15 SCOTT          1027     2     4
30-OCT-15 SCOTT             0     3     1
29-OCT-15 SCOTT          1027     4     2
28-OCT-15 SCOTT          1027     4     2
27-OCT-15 SCOTT             0     5     2
26-OCT-15 SCOTT             0     5     2
25-OCT-15 SCOTT          1027     6     1

So basically the problem was to build groups on continuous identical returncodes and increase group number with every change. After those groups have been built the count column can be calculated using a COUNT(*) OVER (PARTITION BY the group-identification).


Solution 1 – ANALYTIC FUNCTION:

This solution compares the returncode value of the previous row (within same username ordered by timestamp) with the current value and returns “1” in case of a difference if the current value is unchanged NULL is returned. In the second step a running summary over this 1/NULL result is built.

WITH data (ts, username, returncode)        
       AS ( SELECT DATE '2015-11-05', 'SCOTT', 28000 FROM dual UNION ALL
            SELECT DATE '2015-11-04', 'SCOTT', 28000 FROM dual UNION ALL
            SELECT DATE '2015-11-03', 'SCOTT', 1027  FROM dual UNION ALL
            SELECT DATE '2015-11-02', 'SCOTT', 1027  FROM dual UNION ALL
            SELECT DATE '2015-11-01', 'SCOTT', 1027  FROM dual UNION ALL 
            SELECT DATE '2015-10-31', 'SCOTT', 1027  FROM dual UNION ALL
            SELECT DATE '2015-10-30', 'SCOTT', 0     FROM dual UNION ALL
            SELECT DATE '2015-10-29', 'SCOTT', 1027  FROM dual UNION ALL
            SELECT DATE '2015-10-28', 'SCOTT', 1027  FROM dual UNION ALL
            SELECT DATE '2015-10-27', 'SCOTT', 0     FROM dual UNION ALL
            SELECT DATE '2015-10-26', 'SCOTT', 0     FROM dual UNION ALL
            SELECT DATE '2015-10-25', 'SCOTT', 1027  FROM dual) 
SELECT ts
      ,username
      ,returncode
      ,SUM(start_entry) OVER (PARTITION BY username ORDER BY ts DESC) AS grp
  FROM (SELECT ts
             , username
             , returncode
             , CASE
                  WHEN returncode = LAG(returncode) OVER (PARTITION BY username
                                                              ORDER BY ts DESC)
                     THEN 0                   
                  ELSE 1
               END AS start_entry 
          FROM data) 
/

TS        USERN RETURNCODE GRP
--------- ----- ---------- ----------
05-NOV-15 SCOTT      28000          1
04-NOV-15 SCOTT      28000          1
03-NOV-15 SCOTT       1027          2
02-NOV-15 SCOTT       1027          2
01-NOV-15 SCOTT       1027          2
31-OCT-15 SCOTT       1027          2
30-OCT-15 SCOTT          0          3
29-OCT-15 SCOTT       1027          4
28-OCT-15 SCOTT       1027          4
27-OCT-15 SCOTT          0          5
26-OCT-15 SCOTT          0          5
25-OCT-15 SCOTT       1027          6          

Solution 2 – MODEL clause:

The model clauses has the possibility to compare the values of different rows and generate a value for a different column based on this comparision. In our case I compare the value of the returncode of the previous row to the value of the returncode of the current row. In case of identical values i reuse the value of the grp column of the previous row to set the grp value of the current row, in case of a difference I increase the previous rows value by 1.

WITH DATA (ts, username, returncode)
        AS( SELECT DATE '2015-11-05', 'SCOTT', 28000 FROM dual UNION ALL
            SELECT DATE '2015-11-04', 'SCOTT', 28000 FROM dual UNION ALL 
            SELECT DATE '2015-11-03', 'SCOTT', 1027  FROM dual UNION ALL
            SELECT DATE '2015-11-02', 'SCOTT', 1027  FROM dual UNION ALL
            SELECT DATE '2015-11-01', 'SCOTT', 1027  FROM dual UNION ALL
            SELECT DATE '2015-10-31', 'SCOTT', 1027  FROM dual UNION ALL
            SELECT DATE '2015-10-30', 'SCOTT', 0     FROM dual UNION ALL
            SELECT DATE '2015-10-29', 'SCOTT', 1027  FROM dual UNION ALL
            SELECT DATE '2015-10-28', 'SCOTT', 1027  FROM dual UNION ALL
            SELECT DATE '2015-10-27', 'SCOTT', 0     FROM dual UNION ALL
            SELECT DATE '2015-10-26', 'SCOTT', 0     FROM dual UNION ALL
            SELECT DATE '2015-10-25', 'SCOTT', 1027  FROM dual)
SELECT ts, username, returncode, grp
  FROM data
      MODEL PARTITION BY (username)
            DIMENSION BY (ROW_NUMBER() OVER (PARTITION BY username 
                                                  ORDER BY ts) AS rn)
            MEASURES (1 as grp,returncode,ts)
            RULES AUTOMATIC ORDER
            (grp[rn > 1]  = CASE WHEN returncode[CV()] = returncode[cv()-1] 
                                 THEN grp[CV()-1]
                                 ELSE grp[CV()-1] + 1
                            END)
/

Solution 3 “MATCH_RECOGNIZE (12c)”:

With the MATCH_RECOGNIZE clause we can do a count of the group members simultaneously to the group assignement.

WITH DATA (ts, username, returncode)
        AS(SELECT DATE '2015-11-05', 'SCOTT', 28000 FROM dual UNION ALL
           SELECT DATE '2015-11-04', 'SCOTT', 28000 FROM dual UNION ALL 
           SELECT DATE '2015-11-03', 'SCOTT', 1027  FROM dual UNION ALL
           SELECT DATE '2015-11-02', 'SCOTT', 1027  FROM dual UNION ALL
           SELECT DATE '2015-11-01', 'SCOTT', 1027  FROM dual UNION ALL
           SELECT DATE '2015-10-31', 'SCOTT', 1027  FROM dual UNION ALL
           SELECT DATE '2015-10-30', 'SCOTT', 0     FROM dual UNION ALL
           SELECT DATE '2015-10-29', 'SCOTT', 1027  FROM dual UNION ALL
           SELECT DATE '2015-10-28', 'SCOTT', 1027  FROM dual UNION ALL
           SELECT DATE '2015-10-27', 'SCOTT', 0     FROM dual UNION ALL
           SELECT DATE '2015-10-26', 'SCOTT', 0     FROM dual UNION ALL
           SELECT DATE '2015-10-25', 'SCOTT', 1027  FROM dual)
SELECT ts, username, returncode, grp, continuous_rows
  FROM DATA
  MATCH_RECOGNIZE (PARTITION BY username
                   ORDER BY ts DESC                    
                   MEASURES FINAL COUNT(*) AS continuous_rows
                           ,MATCH_NUMBER() AS grp
                   ALL ROWS PER MATCH
                   AFTER MATCH SKIP PAST LAST ROW
                   PATTERN (strt cont*)
                   DEFINE cont AS cont.returncode = PREV(cont.returncode)) x
/