Wrong column delimiter
Column delimiters in plain text files can cause annoying problems. Solving such a problem will take you a lot of time. And this is time that you rather spend on doing some real work, or basically anything else that you like to spend your time on.
In this blog post, I will explain which root causes there are for the column delimiter issue. At the last part of this article, I will also offer you the solution. This solution involves using the SQL DataTool.
In a different article, I went through giving the solution of unexpected line breaks in flat files. This might also interest you when you run into the issue of a wrong column delimiter.
Go to the solution of how to import flat files with column delimiter issues directly.
Column delimiter issue
To explain the issue of the column delimiter, I created an example plain text file.
As you can see, the file contains 1 header row, and 5 rows with data. The first 4 columns (FIELD1 to FIELD5) contain text (string) data, the last column is a number (integer). However, only 2 rows contain the right number of column delimiters (4 columns, 4 delimiters). These two rows are row 2 and row 5. The other rows contain too much delimiters.
To import this flat file to a SQL database, I use the SQL Server Import and Export wizard. I choose as Data Source the Flat File source option. As you can see, the name of the file is 'column delimiter.txt'. I leave everything else default, since the wizard will propose a delimiter itself.
In this first example, I will not change anything that is proposed by the SQL Server Import and Export wizard. You will see that the data is imported in the database (or at least most of it). The issue here is that the data is not imported in the database in the correct and usable format.
When I click on Columns, a preview of the data is shown. The SQL Server Import and Export Wizard propose the vertical bar {|} as column delimiter. In this case, that is the right character to propose.
You can see that the data in rows 1 and 4 is complete compared to the rows 2 and 5 in the plain text file. Rows 2, 3 and 5 miss some data. Besides that, FIELD5 does not only contain numbers. As you can expect from the preview, all columns will be automatically defined as VARCHAR fields. Changing FIELD5 to an integer will generate an error later on. So for now, I leave everything as is.
The SQL Server wizard has defined all columns as VARCHAR. Since all the columns in the plain text file are strings, this is fine. When I click Next a few times and push the Finish button, the import actually starts.
The import finishes successful, however not all data is imported. The only reason why I did not got an error is because I imported everything as VARCHAR. If I would have changed FIELD5 to an integer, I would have gotten an error and the import would have been aborted.
The imported data is shown below in the screenshot. It looks more or less like the data we have seen in the preview data screen in the wizard.
As you can see, the data is complete. However, the fields do not contain what I wanted. FIELD5 should only contain numbers. The rows with too many delimiters are imported incorrectly. If you don't mind: great! Otherwise you should use this solution.
Import in the correct format
In the above example, I have not changed anything in the settings of the SQL Server Import and Export wizard. In this example, I will change the data type of FIELD5 to an integer.
Most of the steps are like the steps in the example above. So I will only highlight the changes in the process.
Instead of leaving the data types as proposed by the SQL Server Import and Export wizard, I now change the FIELD5 data type to INTEGER. This can be done at the step Select Source tables and views.
Click in the screen above on the Edit Mappings button. The window as shown below will pop up. In this window, I change the Type of FIELD5 to int. Now let's see what happens when I try to import the data.
First of all, I review the data type mapping at the next step in the wizard. This already gives me a hint of a possible issue.
I leave the settings as is. And as expected, I receive an error message, stating that the conversion of the data failed. The result of this is that no data at all is imported in the database.
- Executing (Error)
Messages
Information 0x402090de: Data Flow Task 1: The total number of data rows processed for file "C:\Temp\column delimiter.txt" is 6.
(SQL Server Import and Export Wizard)
Error 0xc02020c5: Data Flow Task 1: Data conversion failed while converting column "FIELD5" (26) to column "FIELD5" (60). The conversion returned status value 2 and status text "The value could not be converted because of a potential loss of data.".
(SQL Server Import and Export Wizard)
Error 0xc0209029: Data Flow Task 1: SSIS Error Code DTS_E_INDUCEDTRANSFORMFAILUREONERROR. The "Data Conversion 0 - 0.Outputs[Data Conversion Output].Columns[FIELD5]" failed because error code 0xC020907F occurred, and the error row disposition on "Data Conversion 0 - 0.Outputs[Data Conversion Output].Columns[FIELD5]" specifies failure on error. An error occurred on the specified object of the specified component. There may be error messages posted before this with more information about the failure.
(SQL Server Import and Export Wizard)
Error 0xc0047022: Data Flow Task 1: SSIS Error Code DTS_E_PROCESSINPUTFAILED. The ProcessInput method on component "Data Conversion 0 - 0" (52) failed with error code 0xC0209029 while processing input "Data Conversion Input" (53). The identified component returned an error from the ProcessInput method. The error is specific to the component, but the error is fatal and will cause the Data Flow task to stop running. There may be error messages posted before this with more information about the failure. (SQL Server Import and Export Wizard)
As you can see, importing data from a plain text file correctly with incorrect column delimiters is not that easy. In the next section, I will show how the SQL DataTool can help you with this.
Import flat file with column delimiter issues
The SQL DataTool offers a solution to importing flat files with column delimiter issues. It will not automatically import the data in the correct way, but it identifies whether the column separators in the file are correct. It will import all rows with the right column separators. When something is wrong, the line is written away in an error file to be reviewed. The import will not stop, and afterwards, you can append the corrected data.
First of all, I define a data model for the plain text file. As you can see below, FIELD5 is defined as an INT field.
Next, I go the Import data screen. Within this screen, I select the right system and version (with the data model attached). I add the column separator or column delimiter in the designated field and select the table.
When I push the Get data button, the import starts. Almost immediately I get the screen below, stating that some errors were found in the plain text file. The lines or rows with errors are saved in the file column_delimiter_error.txt. This file is available on the same location of the source file.
The data that is imported is available in the database in the right format. So FIELD5 is an integer column and can only contains numeric values.
The file column_delimiter_error_txt contains the rows with errors. First I have to rename the file so it does not end with 'error'. I update the file (e.g. remove the superfluous column delimiters) and import the file using the Append data check in the Import data screen. Afterwards, all the data is available and usable in the database.
This example only contained 5 rows of data. I am working with flat files with hundreds of thousands to multiple tens of millions of records. You can imagine what a headache one column delimiter too many would cause me without using SQL DataTool!
To get your trial version of the SQL DataTool, go to the download section.