Do you need to count distinct values in Power BI?
Counting distinct values is a frequent task in data analysis that helps you to grasp the uniqueness and diversity that exists within your datasets. It aids in efficiently uncovering valuable patterns and trends hidden within the data, thereby facilitating informed decision-making.
The process of counting the distinct values in a column also provides you with a comprehensive perspective on the diversity and distribution of the data.
Power BI, with its user-friendly interface and robust functionalities, provides several ways by which you can count the unique values in a column.
In this post, you will learn how you can count distinct values using DAX functions, Power Query, and the Visualizations pane.
Count Distinct Values in Power Query
Power BI offers multiple options for performing a distinct count of values within a column, with one of the possibilities being in Power Query.
While the conventional approach involves executing the distinct count operation in the Power BI model using DAX or within a visualization, counting distinct values directly in Power Query presents a valuable and potentially optimal solution when only the aggregated result is required, without the need for detailed data.
In this case, when you don’t need real-time dynamic calculations, using Power Query can be an efficient approach to counting distinct values.
You can determine the count of distinct values in a column in Power Query by using the Group By feature. This method allows you to obtain the count of a grouped column while considering the distinct values in another column.
The entire process can be effortlessly accomplished using the user-friendly UI menu or by directly utilizing M code for more advanced customizations.
To begin, first, you need to load your data into Power Query. After loading your data into Power Query, identify the column you want to group by and right-click on it. Then select the Group By option from the list of contextual menus.
In the Group By window, give the new column a name using the New column name and use the Operation box to select the Count Distinct Rows option.
Within this window, you’ll notice that the “Column” option appears greyed out, hindering your ability to select an alternative column for conducting a distinct count.
This limitation of the UI’s group by menu is its drawback, as it restricts your choice of an aggregation column, making it impossible to perform a distinct count on the column of your preference.
To get around this, you can click on the OK button and edit the generated M code to include an aggregation column. Or you simply just create the M code syntax from scratch.
The processing time for obtaining results after clicking OK may vary depending on your data’s size. The reason behind this delay is that Power BI attempts to identify unique values for each column in the dataset, considering the chosen column as a reference.
Consequently, for large datasets, this process may take longer to complete, and in some cases, it can even lead to encountering an error.
Nevertheless, an M code syntax will be generated which you can edit to get the result you want.
= Table.Group(dbo_FactInternetSales, {"CustomerKey"}, {{"CustomerKeyGroup ", each Table.RowCount(Distinct(_)), Int64.Type}})
This syntax is generated after clicking on the OK button. This syntax basically describes an operation where Power Query attempts to return the unique values for all the columns in the table.
= Table.Group(dbo_FactInternetSales, {"CustomerKey"}, {{"ProductCount", each Table.RowCount(List.Distinct([ProductKey])), Int64.Type}})
In this alternative syntax, there is a similarity to the first one, but with a distinct difference.
Specifically, this syntax allows you to perform a distinct count operation on a single specified column, which was not achievable in the group by window used in the previous syntax.
Unlike the first syntax, where all columns underwent the distinct count operation, this version narrows down the process to the selected column.
Once you enter the syntax into the formula bar, the table is summarized to reveal the number of distinct or unique products bought by each [CustomerKey].
Here’s a breakdown of the syntax step by step:
1. Table.Group – This function is used to group rows in a table based on specified columns. In this case, the table being used is dbo_FactInternetSales.
2. {"CustomerKey"}
– This is an array representing the column or columns by which the rows will be grouped. In this case, the grouping is based on the [CustomerKey] column.
3. {{"Count", each Table.RowCount(List.Distinct([ProductKey])), Int64.Type}}
– This is an array of records that specifies the aggregation to be performed on the grouped data.
- Count – This is the name of the new column that will be created in the resulting table to store the aggregated values.
each Table.RowCount(List.Distinct([ProductKey]))
– this expression calculates the count of distinct values in the [ProductKey] column for each group. The List.Distinct function is used to get the unique values in the [ProductKey] column, and then Table.RowCount counts the number of rows in that distinct list.Int64.Type
– This specifies the data type of the [Count] column. In this case, it is specified as Int64 (64-bit integer).
Count Distinct Values Using DAX
DAX provides you with several functions that you can use to create a measure, calculated column, or virtual table that will return the distinct count of a column. These functions are the DISTINCT, VALUES, DISTINCTCOUNT, and DISTINCTCOUNTNOBLANK.
Count Distinct with the DISTINCTCOUNT Function
The DISTINCTCOUNT function in DAX counts and returns the number of distinct values in a column. The function uses a very simple syntax:
DISTINCTCOUNT ( <ColumnName> )
Another variation of the DISTINCTCOUNT function is the DISTINCTCOUNTNOBLANK function. Both functions return the count of distinct values in a particular column. However, DISTINCTCOUNTNOBLANK excludes blank rows while DISTINCTCOUNT doesn’t.
The DISTINCTCOUNTNOBLANK function has the same syntax as the DISTINCTCOUNT function:
DISTINCTCOUNTNOBLANK ( <ColumnName> )
DISTINCTCOUNTNOBLANK =
DISTINCTCOUNTNOBLANK(
FactInternetSales[ProductKey]
)
DISTINCT COUNT =
DISTINCTCOUNT (
FactInternetSales[ProductKey]
)
When there are no blanks, both functions will return the same values.
Count Distinct with the DISTINCT AND VALUES Functions
The DISTINCT and VALUES functions are another type of DAX functions that also return the distinct values contained in a column.
Unlike the previous functions discussed above, however, they don’t return a scalar value. Instead, both functions return an entire table or a one-column table that contains all the unique values in a specific column or table.
The syntax for DISTINCT and VALUES are almost similar and very straightforward.
VALUES ( <TableNameOrColumnName> )
DISTINCT ( <ColumnNameorTableExpr> )
Before using the VALUES or DISTINCT functions to count distinct values, you should take note of the difference that exists between both functions.
VALUES may include a blank row when there are one-to-many relationships and referential integrity violations.
If the table is related to others, and there are invalid relationships (e.g., orphaned or unrelated records), VALUES may return an extra blank row. DISTINCT, however, will exclude such a blank row and only provide distinct values based on existing relationships.
When there’s no case of invalid relationships, VALUES and DISTINCT work the same.
EVALUATE
DISTINCT ( FactInternetSales[ProductKey] )
EVALUATE
VALUES( FactInternetSales[ProductKey] )
When you use these syntaxes in DAX studio, they will return a one-column table that contains all the distinct values in the specified column.
VALUES =
COUNTROWS(
VALUES(
FactInternetSales[ProductKey]
)
)
DISTINCT =
COUNTROWS(
DISTINCT(
FactInternetSales[ProductKey]
)
)
Because the functions are table functions, you can use them inside an aggregation function like the COUNTROWS function to return a scalar.
You can use it in a function like the COUNTX function to return the count of distinct values in any specific column.
Count Distinct Values Using Visualization’s Pane
An alternative approach to obtain the distinct count of a column’s values is by directly implementing it within a visual.
This method is particularly useful when there is no need to create a separate measure, especially for one-time use scenarios.
The table or matrix visual is the most suitable option, efficiently achieving the desired outcome.
To get the distinct count from a visual, follow these steps
- Select the table visual from the visualization pane.
- Drag the column name into the Columns well.
- Click on the dropdown arrow and select Count (Distinct) from the options.
Conclusions
Mastering the art of counting distinct values in Power BI and Power Query can significantly enhance your data analysis. With multiple approaches available for this task, you have the flexibility to choose the method that best aligns with your specific objectives.
The Power Query solution proves useful when seeking a straightforward distinct count without requiring customized calculations. On the other hand, DAX functions offer the option to obtain a single count or retrieve all distinct values present in the column.
For quick, one-time analyses, a table or matrix visual provides a simple and effective way to calculate the distinct count. By familiarizing yourself with these versatile techniques, you can elevate your data exploration endeavors and extract valuable insights to drive better-informed decisions.
Is it possible that I am getting different results when counting distinct values using visualization’s pane and using DISTINCTCOUNT DAX function?