Table.AddFuzzyClusterColumn

Updated on

Table.AddFuzzyClusterColumn is a Power Query M function that adds a new column to a table with representative values of an existing column, based on fuzzy matching. The function returns a modified table with the new column added and additional options for customization.

Compatible with: Power BI Service Power BI Desktop Excel Microsoft 365

Syntax

Table.AddFuzzyClusterColumn(
   table as table,
   columnName as text,
   newColumnName as text,
   optional options as nullable record,
) as table

Description

Adds a new column newColumnName to table with representative values of columnName. The representatives are obtained by fuzzily matching values in columnName, for each row.

An optional set of options may be included to specify how to compare the key columns. Options include:

  • Culture : Allows grouping records based on culture-specific rules. It can be any valid culture name. For example, a Culture option of "ja-JP" groups records based on the Japanese culture. The default value is "", which groups based on the Invariant English culture.
  • IgnoreCase : A logical (true/false) value that allows case-insensitive key grouping. For example, when true, "Grapes" is grouped with "grapes". The default value is true.
  • IgnoreSpace : A logical (true/false) value that allows combining of text parts in order to find groups. For example, when true, "Gra pes" is grouped with "Grapes". The default value is true.
  • SimilarityColumnName : A name for the column that shows the similarity between an input value and the representative value for that input. The default value is null, in which case a new column for similarities will not be added.
  • Threshold : A number between 0.00 and 1.00 that specifies the similarity score at which two values will be grouped. For example, "Grapes" and "Graes" (missing the "p") are grouped together only if this option is set to less than 0.90. A threshold of 1.00 only allows exact matches. (Note that a fuzzy "exact match" might ignore differences like casing, word order, and punctuation.) The default value is 0.80.
  • TransformationTable : A table that allows grouping records based on custom value mappings. It should contain "From" and "To" columns. For example, "Grapes" is grouped with "Raisins" if a transformation table is provided with the "From" column containing "Grapes" and the "To" column containing "Raisins". Note that the transformation will be applied to all occurrences of the text in the transformation table. With the above transformation table, "Grapes are sweet" will also be grouped with "Raisins are sweet".

Examples

Find the representative values for the location of the employees.

Table.AddFuzzyClusterColumn( 
    Table.FromRecords( 
        {
            [EmployeeID = 1, Location = "Seattle"],
            [EmployeeID = 2, Location = "seattl"],
            [EmployeeID = 3, Location = "Vancouver"],
            [EmployeeID = 4, Location = "Seatle"],
            [EmployeeID = 5, Location = "vancover"],
            [EmployeeID = 6, Location = "Seattle"],
            [EmployeeID = 7, Location = "Vancouver"]
        },
        type table [EmployeeID = nullable number, Location = nullable text]
     ),
    "Location",
    "Location_Cleaned",
    [IgnoreCase = true, IgnoreSpace = true]
 )

 /* Output: 
Table.FromRecords( 
    {
        [EmployeeID = 1, Location = "Seattle", Location_Cleaned = "Seattle"],
        [EmployeeID = 2, Location = "seattl", Location_Cleaned = "Seattle"],
        [EmployeeID = 3, Location = "Vancouver", Location_Cleaned = "Vancouver"],
        [EmployeeID = 4, Location = "Seatle", Location_Cleaned = "Seattle"],
        [EmployeeID = 5, Location = "vancover", Location_Cleaned = "Vancouver"],
        [EmployeeID = 6, Location = "Seattle", Location_Cleaned = "Seattle"],
        [EmployeeID = 7, Location = "Vancouver", Location_Cleaned = "Vancouver"]
    },
    type table [EmployeeID = nullable number, Location = nullable text, Location_Cleaned = nullable text]
 )
 */ 

Other functions related to Table.AddFuzzyClusterColumn are:

BI Gorilla Youtube Channel

Contribute » | Contributors: Rick de Groot
Microsoft documentation: https://learn.microsoft.com/en-us/powerquery-m/table-addfuzzyclustercolumn