IBM Cognos CSV Files: The Encoding Mystery


The ability to export a report to CSV format is one of great out of box capabilities provided by the IBM Cognos suite of tools. A delimited text (CSV) file can be opened in any application associated with the .csv file type, such as Microsoft Excel, Microsoft Wordpad, or Star Office in Unix. In addition, the CSV format file is also widely accepted by most modern database vendors. Providing the end users with this ability makes report data easily re-usable and integration friendly with other applications.
Of course, there are always caveats to consider with any technology solution. For example, I once had a project where the customer informed me that the exported CSV data file couldn’t be used in a third party application. The numeric information and English letters appeared as unreadable “symbols” when the file was consumed by their particular application. Since a CSV file should have no issues being consumed by any application, why was it that this plain CSV file couldn’t be interpreted by that application? After research and some testing, I was able to quickly identify that the root cause was “surprisingly” simple – the character encoding.
To uncover the mystery, let’s first take a look at what can be found inside the “plain-looking” CSV formatted text file. By default Cognos reports saved in CSV format utilize the following attributes:
  • Designed to support Unicode data across many client operating systems.
  • UTF-16 Little Endian data-encoded.
  • Include a BOM (Byte Order Mark) at the beginning of the file.
  • Tab-delimited.
  • Does not enclose strings in quotation marks.
  • Uses a new line character to delimit rows.
  • Show only the results of a report query. Page layout items, such as titles, images, and paramDisplay values do not appear in the CSV output.
As you can see, the CSV file is no longer “plain-looking” but includes multiple different characteristics which must be considered in various cases.
For our example, we will focus on the fact that in the CSV formatted report characters are encoded in UTF-16/Unicode, which is widely used today in many applications (but not all!) to support multi-language. Why characters need to be encoded? Fundamentally computers just deals with numbers. It stores letters, numbers and other characters by assigning a numeric value for each one. Before Unicode was invented, there were many different character encoding systems for assigning these numbers to represent characters. This leads to a significant problem as no single encoding method could contain enough characters to represent even just one single language. For instance, in the English language no single encoding is adequate for all the letters, punctuation, and technical symbols commonly used.
Unicode became the solution. It provides a unique number for every character, no matter what the platform, no matter what application, no matter what the language. The Unicode Standard has been adopted by most industry leading leaders. It is supported in many operating systems, all modern browsers, and many other applications/systems. Unicode enables a single software product such as a single website to be targeted across multiple platforms, languages (countries) without re-engineering. It allows data such as a Cognos report CSV formatted data file to be transferred through many different systems without corruption. That being said, as long as the target system supports Unicode then there should be no issues with re-using the Cognos exported CSV data. In the scenario I have mentioned above, as you can imagine the issue is caused by the target system, which didn’t support Unicode.
I am sure the next question you will ask is how to resolve such encoding conflicting issue? Is there a way to set encoding method when exporting a Cognos report in CSV formatted? Fortunately the answer is “Yes”. You can modify properties for the exporting CSV output format from Cognos system admin console.
Here are the Steps:
1. Start IBM Cognos Connection.
2. In the upper-right corner, click Launch, IBM Cognos Administration.
3. On the Status tab, click System.
4. From the System drop-down menu, click Set properties.
5. Click the Settings tab.
6. Next to Environment, Advanced Settings, click Edit.
7. Enter parameter RSVP.CSV.ENCODING and a supported encoding value such as windows-1252which is also known as “Windows characters”. The default value Cognos used is utf-16le.

8. Click OK. Wait about 30 seconds to take changes effect.

Comments
1 Comments

1 comments:

Post a Comment