Table.FuzzyNestedJoin

Updated on

Table.FuzzyNestedJoin is a Power Query M function that joins rows of two tables based on fuzzy matching of the values of key columns and returns the results in a new column named newColumnName. The function returns a table with the joined rows and a new column containing the join results.

Compatible with: Power BI Service Power BI Desktop Excel Microsoft 365

Syntax

Table.FuzzyNestedJoin(
   table1 as table,
   key1 as any,
   table2 as table,
   key2 as any,
   newColumnName as text,
   optional joinKind as nullable number,
   optional joinOptions as nullable record,
) as table
ArgumentAttributeDescription
table1The base table for the join.
key1One or multiple keys to perform the join on. For a single key, you can provide a text value, whereas for multiple you can specify the column names within a list.
table2The table to join with.
key2One or multiple keys to perform the join on. For a single key you can provide a text value, whereas for multiple you can specify the column names within a list.
newColumnNameThe join creates a column containing nested tables with the result of the join operation. This argument specifies the name of the column.
joinKindoptionalJoin Types indicate how to join the tables. Options are JoinKind.LeftOuter, JoinKind.LeftAnti, JoinKind.Inner, JoinKind.FullOuter, JoinKind.RightOuter and JoinKind.RightAnti. Defaults to Left Outer Join if unspecified.
joinOptionsoptionalOptionally you can specify an options record to change how to compare the key columns. Details are specified below.

Description

Table.FuzzyNestedJoin performs a fuzzy matching join between two tables, creating a new column with the name specified by ‘newColumnName’ to store the join results. This function enables joining based on the similarity, rather than exact matches, of key column values. Users can customize the join with options like join kind (e.g., inner, left outer, right outer), case sensitivity, culture-specific matching, and similarity thresholds. The result is a table that includes a nested table column containing the fuzzy join outcomes, making it particularly useful for data matching scenarios where exact text equality is not feasible.

An optional set of joinOptions may be included to specify how to compare the key columns. Options include:

  • ConcurrentRequests: A number between 1 and 8 that specifies the number of parallel threads to use for fuzzy matching. The default value is 1.
  • Culture: Allows matching records based on culture-specific rules. It can be any valid culture name. For example, a Culture option of “ja-JP” matches records based on the Japanese culture. The default value is “”, which matches based on the Invariant English culture.
  • IgnoreCase: A logical (true/false) value that allows case-insensitive key matching. For example, when true, “Grapes” is matched with “grapes”. The default value is true.
  • IgnoreSpace: A logical (true/false) value that allows combining of text parts in order to find matches. For example, when true, “Gra pes” is matched with “Grapes”. The default value is true.
  • NumberOfMatches: A whole number that specifies the maximum number of matching rows that can be returned for every input row. For example, a value of 1 will return at most one matching row for each input row. If this option is not provided, all matching rows are returned.
  • SimilarityColumnName: A name for the column that shows the similarity between an input value and the representative value for that input. The default value is null, in which case a new column for similarities will not be added.
  • Threshold: A number between 0.00 and 1.00 that specifies the similarity score at which two values will be matched. For example, “Grapes” and “Graes” (missing the “p”) are matched only if this option is set to less than 0.90. A threshold of 1.00 only allows exact matches. (Note that a fuzzy “exact match” might ignore differences like casing, word order, and punctuation.) The default value is 0.80.
  • TransformationTable: A table that allows matching records based on custom value mappings. It should contain “From” and “To” columns. For example, “Grapes” is matched with “Raisins” if a transformation table is provided with the “From” column containing “Grapes” and the “To” column containing “Raisins”. Note that the transformation will be applied to all occurrences of the text in the transformation table. With the above transformation table, “Grapes are sweet” will also be matched with “Raisins are sweet”.

Examples

Left inner fuzzy join of two tables based on [FirstName]

Table.FuzzyNestedJoin( 
  Table.FromRecords( 
        {
        [CustomerID = 1, FirstName1 = "Bob", Phone = "555-1234"],
        [CustomerID = 2, FirstName1 = "Robert", Phone = "555-4567"]
      },
        type table [CustomerID = nullable number, FirstName1 = nullable text, Phone = nullable text]
     ),
  {"FirstName1"},
  Table.FromRecords( 
        {
        [CustomerStateID = 1, FirstName2 = "Bob", State = "TX"],
        [CustomerStateID = 2, FirstName2 = "bOB", State = "CA"]
      },
        type table [CustomerStateID = nullable number, FirstName2 = nullable text, State = nullable text]
     ),
  {"FirstName2"},
  "NestedTable",
  JoinKind.LeftOuter,
  [IgnoreCase = true, IgnoreSpace = false]
 )

 /* Output: 
Table.FromRecords( {
    [
    CustomerID = 1,
    FirstName1 = "Bob",
    Phone = "555-1234",
    NestedTable = Table.FromRecords( {
            [
      CustomerStateID = 1,
      FirstName2 = "Bob",
      State = "TX"
      ],
            [
      CustomerStateID = 2,
      FirstName2 = "bOB",
      State = "CA"
      ]
    } )
  ],
    [
  CustomerID = 2,
  FirstName1 = "Robert",
  Phone = "555-4567",
  NestedTable = Table.FromRecords( {} )
  ]
} )
 */ 

Other functions related to Table.FuzzyNestedJoin are:

Contribute » | Contributors: Rick de Groot
Microsoft documentation: https://learn.microsoft.com/en-us/powerquery-m/table-fuzzynestedjoin