Pdf.Tables


Pdf.Tables is a Power Query M function that extracts tables from a PDF document with control over options like implementation, page range, and table detection settings. The function returns a table of the extracted tables.

Compatible with: Power BI Service, Power BI Desktop, Excel Microsoft 365

Syntax

Pdf.Tables(
   pdf as binary,
   optional options as nullable record
) as table

Description

Returns any tables found in pdf. An optional record parameter, options, may be provided to specify additional properties. The record can contain the following fields:

  • Implementation : The version of the algorithm to use when identifying tables. Old versions are available only for backwards compatibility, to prevent old queries from being broken by algorithm updates. The newest version should always give the best results. Valid values are "1.3", "1.2", "1.1", or null.
  • StartPage : Specifies the first page in the range of pages to examine. Default: 1.
  • EndPage : Specifies the last page in the range of pages to examine. Default: the last page of the document.
  • MultiPageTables : Controls whether similar tables on consecutive pages will be automatically combined into a single table. Default: true.
  • EnforceBorderLines : Controls whether border lines are always enforced as cell boundaries (when true), or simply used as one hint among many for determining cell boundaries (when false). Default: false.
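As an illustration of how these fields fit together, the following sketch passes all five options in a single record. The file path and the page range are hypothetical placeholders, and the chosen values are only one possible combination, not recommended settings.

```powerquery
let
    // Hypothetical path; replace with the location of a real PDF file.
    Source = File.Contents("C:\Reports\sample.pdf"),
    Tables = Pdf.Tables(
        Source,
        [
            Implementation = "1.3",      // newest algorithm version listed above
            StartPage = 2,               // skip the first page (assumed cover page)
            EndPage = 5,                 // stop after page 5
            MultiPageTables = false,     // keep tables on consecutive pages separate
            EnforceBorderLines = true    // always treat border lines as cell boundaries
        ]
    )
in
    Tables
```

Any field left out of the record falls back to the default stated in the list above, so in practice only the fields that differ from their defaults need to be supplied.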

Examples

Returns the tables contained in sample.pdf.

// Output: #table( {"Name", "Kind", "Data"}, ... )
Pdf.Tables( File.Contents( "c:\sample.pdf" ) )
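Because the result is itself a table with Name, Kind, and Data columns, a follow-up step is usually needed to reach the contents of an individual extracted table. The sketch below shows one way to do that. It assumes that detected tables are marked with the Kind value "Table", which is not stated on this page, and it uses a hypothetical file path.

```powerquery
let
    // Hypothetical path; replace with the location of a real PDF file.
    Source = Pdf.Tables(File.Contents("C:\Reports\sample.pdf")),
    // Assumption: rows describing detected tables carry Kind = "Table".
    OnlyTables = Table.SelectRows(Source, each [Kind] = "Table"),
    // Return the nested table from the first matching row, or null if none was found.
    FirstTable = if Table.IsEmpty(OnlyTables) then null else OnlyTables{0}[Data]
in
    FirstTable
```

The emptiness check avoids an error on documents where no table is detected; a query that should fail loudly in that case can use OnlyTables{0}[Data] directly.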


Contributors: Rick de Groot
Microsoft documentation: https://learn.microsoft.com/en-us/powerquery-m/pdf-tables