Data types for Spark

The handling of data types is explained in request_data.

Although Excel is not a statically typed language, understanding how Spark processes Excel's Number Category is important in ensuring the inputs and outputs are processed correctly.

Number Category in Excel

Please follow these formatting guidelines when preparing an Excel file to be used in Spark.

Blank values

  • For a single cell without a default value, leave the cell blank if possible. Avoid filling it with an empty string such as ="".

Numbers and text

  • To return a number value as a string, consider the following:

    • Set the Format of the cell to the Text category in Excel (see above).

    • Begin cell formulas with =TEXT(. This enables custom formatting to be applied to numbers and for them to be returned as a string.

    • Do not use an apostrophe before a number to denote it as a string. The apostrophe alone does not stop the value from being returned as a number; that only happens when the cell is also set to the Text category.

    • It is not uncommon to see data validation lists defined as "0,1,2+", which mixes number-like entries (0, 1) with an entry that can only be text (2+). See Avoid data type clashes.
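As a rough illustration of why a list such as "0,1,2+" is risky, the following Python sketch splits a comma-separated validation list and reports whether it mixes number-like and text entries. It does not call Spark or Excel. The helper names `is_number_like` and `find_type_clash` are hypothetical, and using `float()` as the test for "number-like" is an assumption, not Spark's actual rule.

```python
def is_number_like(item: str) -> bool:
    """Return True if the item parses as a number (assumed rule: float() succeeds)."""
    try:
        float(item)
        return True
    except ValueError:
        return False


def find_type_clash(validation_list: str) -> dict:
    """Split a comma-separated validation list and group its entries by apparent type."""
    items = [part.strip() for part in validation_list.split(",")]
    numbers = [item for item in items if is_number_like(item)]
    text = [item for item in items if not is_number_like(item)]
    # A list is "mixed" when it contains both number-like and text entries.
    return {"numbers": numbers, "text": text, "mixed": bool(numbers) and bool(text)}


if __name__ == "__main__":
    print(find_type_clash("0,1,2+"))
```

A list flagged as mixed is a candidate for review before the workbook is uploaded, for example by making every entry text.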

Dates and times

  • For dates, use the date format YYYY-MM-DD.

  • For times, use the time format hh:mm:ss.

  • For Spark to read a combined date and time field correctly, it must be set to a format that includes date and time. For this type of field, use the Custom category and define the format as either YYYY-MM-DD hh:mm:ss or YYYY-MM-DDThh:mm:ss.sssZ.
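For values prepared outside Excel, the formats above can be produced with Python's standard `datetime` module. This is a minimal sketch, not part of Spark. The function names are hypothetical, and it assumes that the trailing Z means UTC and that a datetime without timezone information is already in UTC.

```python
from datetime import date, datetime, time, timezone


def format_date(d: date) -> str:
    """Format a date as YYYY-MM-DD."""
    return d.strftime("%Y-%m-%d")


def format_time(t: time) -> str:
    """Format a time as hh:mm:ss (24-hour clock)."""
    return t.strftime("%H:%M:%S")


def format_datetime(dt: datetime) -> str:
    """Format a combined value as YYYY-MM-DD hh:mm:ss."""
    return dt.strftime("%Y-%m-%d %H:%M:%S")


def format_datetime_utc(dt: datetime) -> str:
    """Format a combined value as YYYY-MM-DDThh:mm:ss.sssZ in UTC."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are treated as already being in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    utc = dt.astimezone(timezone.utc)
    millis = utc.microsecond // 1000  # truncate microseconds to milliseconds
    return utc.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


if __name__ == "__main__":
    moment = datetime(2024, 3, 5, 9, 7, 2, 123456, tzinfo=timezone.utc)
    print(format_date(moment.date()))
    print(format_time(moment.time()))
    print(format_datetime(moment))
    print(format_datetime_utc(moment))
```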
