With ORACLE 19c another gap in the listagg feature was filled. Now we are able to do a native distinct operation on the values in the list, so that the list no longer contains duplicates.
In prior releases we had to deduplicate the values before using them in the listagg, which was rather dreadful when doing other group functions in the same query.
LISTAGG Deduplication
Imagine we would have to list all product subcategories of the SH.Products table group by the product category.
select PROD_CATEGORY , listagg(PROD_SUBCATEGORY,' | ') as SubCategories from SH.PRODUCTS group by PROD_CATEGORY; PROD_CATEGORY SUBCATEGORIES ---------------------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Photo Camcorders | Camera Batteries | Camera Batteries | Camera Batteries | Camera Batteries | Camera Batteries | Camera Media | Camera Media | Camera Media | Cameras Electronics Game Consoles | Home Audio | Home Audio | Y Box Accessories | Y Box Accessories | Y Box Games | Y Box Games | Y Box Games | Y Box Games | Y Box Games | Y Box Games | Y Box Games | Y Box Games Software/Other Accessories | Accessories | Accessories | Accessories | Accessories | Bulk Pack Diskettes | Bulk Pack Diskettes | Documentation | Documentation | Documentation | Documentation | Documentation | Documentation | Operating Systems | Recordable CDs | Recordable CDs | Recordable CDs | Recordable CDs | Recordable CDs | Recordable CDs | Recordable CDs | Recordable DVD Discs | Recordable DVD Discs | Recordable DVD Discs | Recordable DVD Discs | Recordable DVD Discs Peripherals and Accessories Accessories | Accessories | Accessories | Accessories | Accessories | CD-ROM | CD-ROM | CD-ROM | CD-ROM | CD-ROM | CD-ROM | Memory | Memory | Modems/Fax | Modems/Fax | Monitors | Monitors | Printer Supplies | Printer Supplies | Printer Supplies | Printer Supplies Hardware Desktop PCs | Portable PCs
As we can see, there are duplicates in the subcategory lists. So we had to do a deduplication in the past.
select PROD_CATEGORY , listagg(PROD_SUBCATEGORY,' | ') as SubCategories from (select distinct PROD_CATEGORY, PROD_SUBCATEGORY from SH.PRODUCTS) group by PROD_CATEGORY; PROD_CATEGORY SUBCATEGORIES --------------------------- ------------------------------------------------------------------------------------------------------------- Electronics Y Box Accessories | Game Consoles | Home Audio | Y Box Games Software/Other Bulk Pack Diskettes | Recordable CDs | Recordable DVD Discs | Accessories | Documentation | Operating Systems Photo Camera Batteries | Camera Media | Camcorders | Cameras Peripherals and Accessories CD-ROM | Accessories | Modems/Fax | Monitors | Printer Supplies | Memory Hardware Portable PCs | Desktop PCs
Now, imagine we would like to add the sum of the list_prices to this query … to get the data consistent, we had to do the sum on the list_prices in two steps. First per distinct prod_category/prod_subcategory to get a deduplicated list of subcategories and then on the category level using the pre-calculated sums.
select PROD_CATEGORY , listagg(PROD_SUBCATEGORY,' | ') as SubCategories , sum(SumProdListPrice) as SumProdListPrice from (select PROD_CATEGORY , PROD_SUBCATEGORY , sum(PROD_LIST_PRICE) as SumProdListPrice from SH.PRODUCTS group by PROD_CATEGORY , PROD_SUBCATEGORY) p1 group by PROD_CATEGORY;
Oracle 19c spelling
Now this got much easier with Oracle 19c, the only thing we have to do is adding distinct to the listagg function
select PROD_CATEGORY , listagg(distinct PROD_SUBCATEGORY,' | ') as SubCategories , sum(PROD_LIST_PRICE) as SumProdListPrice from SH.PRODUCTS group by PROD_CATEGORY;
If you need to have the list ordered, then you have to use the within group syntax.
select PROD_CATEGORY , listagg(distinct PROD_SUBCATEGORY,' | ') within group (order by PROD_SUBCATEGORY) as SubCategories , sum(PROD_LIST_PRICE) as SumProdListPrice from SH.PRODUCTS group by PROD_CATEGORY;
WITHIN GROUP clause
Starting with version 19c the WITHIN GROUP clause is optional and only needed if the list has to be ordered.
Conclusion
Adding the possibility to remove duplicates from the list using the distinct keyword looks like a tinyness. But as most of the time duplicates in the list are not wanted this tinyness is really useful.