File Reader Plugin¶
The File Reader plugin allows you to connect to a wide variety of data file types to:
- Read source data records in order to create Aunsight Golden Records & Transactional Workflows.
- Replicate Aunsight Golden Records to a delimited file format (ex. CSV).
Supported Modes¶
The File Reader plugin supports the following Modes:
- Delimited (ex. CSV)
- Fixed Width Columns
- Excel
- XML
- AS400 - Currently in beta, but is available by request
- File Copy
Configure Connection¶
The File Reader plugin provides flexible configuration options. This page steps through the various configuration sections and available Modes.
Settings¶
-
Global Columns Configuration File - Allows you to keep a single file on disk that contains all of the configuration required for fixed width plugins. It is superseded by local columns configuration files. Below are an example file path and file layout.
C:/Example/Path/To/File.json
{
"ReadDirectory": [
{
"ColumnName": "id",
"ColumnStart": 0,
"ColumnEnd": 43,
"IsKey": true,
"TrimWhitespace": true
},
{
"ColumnName": "data1",
"ColumnStart": 44,
"ColumnEnd": 45,
"IsKey": false
},
{
"ColumnName": "date1",
"ColumnStart": 46,
"ColumnEnd": 55,
"IsKey": false
},
{
"ColumnName": "date2",
"ColumnStart": 56,
"ColumnEnd": 65,
"IsKey": false
},
{
"ColumnName": "data2",
"ColumnStart": 66,
"ColumnEnd": 79,
"IsKey": false,
"TrimWhitespace": true
},
{
"ColumnName": "data3",
"ColumnStart": 293,
"ColumnEnd": 317,
"IsKey": false,
"TrimWhitespace": true
}
],
"PrimaryDirectory": [
{
"ColumnName": "id",
"ColumnStart": 0,
"ColumnEnd": 43,
"IsKey": true,
"TrimWhitespace": true
},
{
"ColumnName": "data1",
"ColumnStart": 44,
"ColumnEnd": 45,
"IsKey": false
},
{
"ColumnName": "date1",
"ColumnStart": 46,
"ColumnEnd": 55,
"IsKey": false
},
{
"ColumnName": "date2",
"ColumnStart": 56,
"ColumnEnd": 65,
"IsKey": false
},
{
"ColumnName": "data2",
"ColumnStart": 66,
"ColumnEnd": 79,
"IsKey": false,
"TrimWhitespace": true
},
{
"ColumnName": "data3",
"ColumnStart": 293,
"ColumnEnd": 317,
"IsKey": false,
"TrimWhitespace": true
}
]
}
- FTP Hostname - Hostname for the root paths using FTP/SFTP.
- FTP Port - Port for any root paths using FTP/SFTP.
- FTP Username - Username for any root paths using FTP/SFTP.
- FTP Password - Password for any root paths using FTP/SFTP.
- SFTP SSH Key - Path to SSH private key for any root paths using SFTP.
Index¶
This section gives the necessary configuration fields to add table indexes.
- Table Name - Name of the table to apply the index to.
- Index Column Names - Columns to include in the index.
Multiple tables may have indexes applied. Click the + in the lower right corner of the Indexes section to add additional tables. For each table to be indexed, multiple columns may be specified. Use the + in the lower right corner of the Index Column Name subsection to add more.
Root Paths¶
A Connection can contain one or many root paths. All files for a given root path must contain the same type of data in the same format (i.e., all the same fixed width specification, or all CSV with the same columns). Depending on the Mode selected for a given root path certain settings are active while others are not.
-
Root Path - (Required) The directory from which files are read. All files in the directory must contain the same data structure and utilize the same Mode. An example of what a root path may look like is below.
C:/Example/Path/To/Folder
-
Filter - (Required) The file filter to use when searching for files within the directory. Example filters are shown below.
For CSV files
*.csv
For text files
*.txt
For all files
*
-
Name - The name of the schema produced for a given root path. The default name is the directory in which the files for the root path are located.
- Cleanup Action - The action to take on the files after reading them. The possible options are:
- Nothing - This is the default selection and does nothing with the read files. It will not move or remove the read files.
- Delete - This selection will delete the files after they have been read.
- Archive - This option will move the files to the path specified by the Archive Folder setting. If the file being moved already exists it will not overwrite the existing archived file and will instead append (N) to the file name where N is the number of duplicate files archived (e.g., sample_file (1).txt).
- Archive Folder - (Required if Cleanup Action is set to Archive) - The path to which files that are read will be archived.
- Skip Lines - The number of lines to skip at the top of the file. Zero (0) is the default value.
- Error on Empty Root Path - Checking this box will throw an error when the root path is empty at time of reading.
- File Read Mode - The Mode to access the target files. The default option is Local, but FTP and SFTP are other available options.
Mode¶
The Mode selection specifies what Mode to use when reading the files specified in the Root Path settings. This selection is always required. Based on the Mode selected, the configuration options will vary. The available Modes are:
- Delimited
- Fixed Width Columns
- Excel
- XML
- AS400 - Currently in beta, but is available by request
- File Copy
Each Mode selection option is outlined below.
Delimited Settings¶
- Delimiter - (Required) The delimiter to use when reading delimited files. As an example, comma (,) would be the delimiter for .csv files.
- Has Header - Check this box if the file has a header row. Leave it unchecked if the file does not contain a header row.
- Auto Generate Row Number - Check this box to automatically generate a row number property (AUTO_ROW_NUM).
- Include File Name as Field - Check this box to automatically generate a file name property (AUTO_FILE_NAME).
- Selected Ranges - If populated, this option will limit CSV data to the defined 0-based comma-separated column ranges set by the user.
Fixed Width Columns Settings¶
- Auto Generate Row Number - Check this box to automatically generate a row number property (AUTO_ROW_NUM).
- Include File Name as Field - Check this box to automatically generate a file name property (AUTO_FILE_NAME).
- Columns Configuration File - File path to the columns configuration file for fixed width files. It will override all other column configurations if specified. An example file is provided below:
[
{
"ColumnName": "id",
"ColumnStart": 0,
"ColumnEnd": 43,
"IsKey": true,
"TrimWhitespace": true
},
{
"ColumnName": "data1",
"ColumnStart": 44,
"ColumnEnd": 45,
"IsKey": false
},
{
"ColumnName": "date1",
"ColumnStart": 46,
"ColumnEnd": 55,
"IsKey": false
},
{
"ColumnName": "date2",
"ColumnStart": 56,
"ColumnEnd": 65,
"IsKey": false
},
{
"ColumnName": "data2",
"ColumnStart": 66,
"ColumnEnd": 79,
"IsKey": false,
"TrimWhitespace": true
},
{
"ColumnName": "data3",
"ColumnStart": 293,
"ColumnEnd": 317,
"IsKey": false,
"TrimWhitespace": true
}
]
-
Columns - This subsection is only relevant for Fixed Width Columns Mode. This is the area only required if no global or root path specific configuration file is defined. The global configuration file, root path, and this field can all be used at the same time with the following priority:
- 1st priority - Columns
- 2nd priority - Root Path
- 3rd priority - Global Columns Configuration File
Use this section to provide the fixed width column specifications. Each column is defined by five properties:
- Column Name - (Required) Name of the column.
- Is Key - Check this box if the column is the key property.
- Trim Whitespace - Check this box to force all whitespace before and after a value to be trimmed. Leave this box unchecked if whitespace should not be trimmed.
- Column Start - (Required) The index of the first character of the column (0 is the first index).
- Column End - (Required) The index of the last character of the column.
Add multiple fixed width columns by clicking the + in the lower right corner of the Fixed Width Columns Settings subsection.
Excel Settings¶
- Has Header - Check this box if the file has a header row. Leave it unchecked if the file does not contain a header row.
- Auto Generate Row Number - Check this box to automatically generate a row number property (AUTO_ROW_NUM).
- Include File Name as Field - Check this box to automatically generate a file name property (AUTO_FILE_NAME).
-
Excel Columns - (Required) A comma separated list that indicates columns to read from an Excel file. An example is shown below.
0, 2-5, 7, 10, 12-15
-
Excel Cells - Specifies one or many cells that should be present on all records from an Excel file. Each Cell is defined with three properties.
- Column Name - (Required) The name of the column to appear in the discovered schema.
- Column Index - (Required) The column index of the cell.
- Row Index - (Required) The row index of the cell.
Column Name: Data_1
Column Index: 0
Row Index: 10
XML Settings¶
- XSD File Path and Name - (Required) - Fully qualified path to the XSD file.
- Include File Name as Field - Check this box to automatically generate a file name property (AUTO_FILE_NAME).
- XML Keys - (Required) - Parts of the composite key for the records.
- Element ID - (Required) - The ID of the XML element tag that contains the key.
- Attribute ID - The ID of the attribute on the XML element tag (if no attribute is provided, the value of the element tag will be used).
Add multiple XML Keys by clicking the + in the lower right corner of the XML Keys subsection.
File Copy Settings¶
This Mode is used to copy data from one location to another. The location to which the file is being sent or delivered is generally referred to as the target.
- Target FTP Hostname - Hostname for the FTP/SFTP to which the file should be moved.
- Target FTP Port - Port for the target using FTP/SFTP.
- Target FTP Username - Username for the target using FTP/SFTP.
- Target FTP Password - Password for the target using FTP/SFTP.
- Target SFTP SSH Key - Path to SSH private key for the target using SFTP.
- Target File Write Mode - (Required) Mode to write target files: Local, FTP, SFTP.
- Target Directory Path - (Required) Path to target directory.
- Overwrite Target - Check this box if the source files should overwrite the target files. If this box is not checked, the source files will not overwrite existing files.
- Minimum Send Delay MS (Required) Minimum time between sending files in milliseconds.
- Original File Name RegEx Select - Regular expression to select values to use in the new file name (e.g., ^(.*)$).
- New File Name RegEx Replace - Regular expression to use as the new file name (e.g., part-$1).
AS400¶
Further information about the AS400 Mode is available upon request.
Schema Definition¶
Schema Discovery¶
The File Reader plugin is capable of reading all files in defined root paths and can automatically generate schemas that can be used to read data out of File Reader.
User Defined Schema Creation¶
The File Reader plugin supports user defined schema creation. This allows the creation of a schema that trims down to only the necessary properties, performs joins to create rich schemas, and other advanced operations. The table names are the same as the schemas discovered automatically. The columns for a given table/schema are the same as the names for the properties of a given schema.
AU_FileInformation¶
The filereader plugin has a feature which allows users to see all the files and folders within the selected SFTP path for the source connection. In order to see this info, the user has to create the filereader plugin with the directory information they want to see the folder/files at.
Viewing AU_FileInformation File¶
-
Create the filereader plugin with the root path of the folder they want to see the file information for.
-
In the filter, select the type of files they want to see. If it is all file types, select*.
NOTE: The file information file is a .csv file. So if you filter for txt files etc., you will not be able to see the information file within discovered schema.
-
To view the AU_FileInformation file which has the info, once the plugin is created, go to discover schema.
-
Within the discover schema, the users will be able to see the AU_FileInformation file, with the fields Rootpath (Location of the file as per the system's file structure), File Name, File Extension and File Size.
-
In order to see the data with AU_FileInformation, within the file, user can either:
a) Generate a sample (maximizing the sample size).
b) Output the data to Aunsight or any other destination of their choice.