
25 Replies. Last post: Dec 5, 2008 8:49 AM by MarkBird
Rex Mc Apprentice 7 posts since
Nov 13, 2007

Nov 20, 2008 3:51 PM

Unicode Development Feedback

“We are currently investigating adding Unicode (UTF-8) support, a worldwide standard, to the LANDesk Management Suite. The current consideration is whether or not to support, in addition to Unicode, the older code-page based language capability. Our current research has led us to the design that would only support a Unicode implementation. The FAQ below is meant to answer questions that may surface about Unicode and code-page support. After reading the FAQ, please feel free to respond in the Community with your observations and any potential issues that would arise from a Unicode-only scenario.” If other concerns arise, please post them and we’ll address them as well.

FAQ

·      What if LDMS required a Unicode database and no longer supported a database that stored code-page data?  Would I have to reinstall my database?

o    We are not intending to do an in-place upgrade of the database.  We will require a new database for the Unicode version of LDMS.  For more information, see “How much work do I have to do to prepare for an LDMS Unicode release” below.

·      Will the size of my database increase?

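o    For data that is mostly plain ASCII, the size of the database will not change. We will standardize on UTF-8 encoded data. UTF-8 encodes each character in the Basic Multilingual Plane as one, two or three bytes. The database grows only for characters that need more bytes in UTF-8 than in the original code page, and each such character grows by one byte: for example, é is one byte in code page 1252 but two bytes in UTF-8, and 中 is two bytes in code page 936 but three bytes in UTF-8.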
When LDMS supports a Unicode installation, all CHAR data types in the database will be converted to VARCHAR data types.  CHAR data types are fixed length fields and if we used them, we would definitely increase the size of the database.  However, we will not be using any CHAR data types when we switch to Unicode (for other reasons not pertaining to database size).
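To make the size question concrete, here is a small Python sketch (an illustration using standard codecs, not LDMS code) that compares how many bytes the same text needs in a legacy code page and in UTF-8. The helper names and the sample device name are hypothetical, and the sketch counts encoded bytes only, not the per-column overhead a real DBMS adds for VARCHAR storage.

```python
# Illustration only: counts encoded bytes, not the per-row or per-column
# overhead that a real DBMS adds for VARCHAR storage.


def encoded_size(text: str, encoding: str) -> int:
    """Number of bytes `text` occupies when encoded with `encoding`."""
    return len(text.encode(encoding))


def utf8_growth(text: str, code_page: str) -> int:
    """Extra bytes needed to store `text` as UTF-8 instead of in `code_page`."""
    return encoded_size(text, "utf-8") - encoded_size(text, code_page)


samples = [
    ("DELL-PC-0042", "cp1252"),  # hypothetical device name, plain ASCII: same size
    ("Café", "cp1252"),          # é: 1 byte in cp1252, 2 bytes in UTF-8
    ("中文", "cp936"),           # each ideograph: 2 bytes in cp936, 3 bytes in UTF-8
]
for text, code_page in samples:
    print(f"{text!r} in {code_page}: {encoded_size(text, code_page)} bytes, "
          f"UTF-8: {encoded_size(text, 'utf-8')} bytes, "
          f"growth: {utf8_growth(text, code_page)}")
```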

·      Will the performance of my database degrade?

o    We do not expect the performance of any queries to degrade. Any degradation is more likely to come from the amount of data in the table and from out-of-date indexes than from Unicode data existing in the table.

·      How would I convert my existing data into Unicode?

o    You won’t have to.  This is something LDMS will do for you during the upgrade process.  We will use the code-page of the DBMS server as the source and UTF-8 as the destination.  Both Oracle and SQL Server support this type of conversion.
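The FAQ does not say how LDMS performs this conversion internally. As a minimal sketch of the same idea (server code page as the source, UTF-8 as the destination), a single value can be transcoded in Python like this; `to_utf8` is a hypothetical helper name.

```python
def to_utf8(raw: bytes, server_code_page: str) -> bytes:
    """Re-encode bytes stored in the DBMS server's code page as UTF-8.

    errors="strict" makes the conversion fail loudly on bytes that are not
    valid in the source code page instead of silently substituting characters.
    """
    return raw.decode(server_code_page, errors="strict").encode("utf-8")


legacy_value = "東京".encode("cp932")  # as a Japanese (code page 932) server stores it
utf8_value = to_utf8(legacy_value, "cp932")
print(legacy_value.hex(), "->", utf8_value.hex())
```

Failing on invalid bytes, rather than replacing them, is a deliberate choice in this sketch: it surfaces data that cannot be converted cleanly instead of hiding it.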

·      What does “forcing Unicode” mean if I use SQL Server?

o    This would mean LDMS would require the national character set data types (NCHAR, NVARCHAR, NTEXT). Every query that compares against a string literal must put an N before the opening quote if that literal contains Unicode characters. For example:
Select * from computer where devicename = 'ABC'
Select * from computer where devicename = N'MultiByte Characters'

In the first example, A, B and C have the same values in Unicode as they do in every code page. That is, an A in a Latin code page has the same ASCII value as an A in a Japanese code page, which is also the Unicode value of A. Therefore, no N is required in front of the string literal 'ABC'.
In the second example, the simplified Chinese characters do not have the same values in the simplified Chinese code page as they do in Unicode. Therefore, an N is required before the string literal so SQL Server knows that it is a Unicode string and not a code-page string.
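Both points can be checked with Python's standard codecs. The sketch below first confirms that ASCII text has identical bytes in UTF-8 and in code pages 1252, 932 and 936, while Chinese text does not. It then shows a hypothetical helper, `sql_server_literal`, that builds a T-SQL literal and adds the N prefix. It uses a conservative rule, prefixing anything that is not pure ASCII, which is an assumption rather than LDMS behavior. In application code, parameterized queries are generally preferable to building literals by hand.

```python
CODE_PAGES = ["cp1252", "cp932", "cp936"]  # Latin 1, Japanese, Simplified Chinese


def same_bytes_everywhere(text: str) -> bool:
    """True if `text` encodes identically in UTF-8 and in every listed code page."""
    utf8 = text.encode("utf-8")
    try:
        return all(text.encode(cp) == utf8 for cp in CODE_PAGES)
    except UnicodeEncodeError:  # text not representable in one of the code pages
        return False


def sql_server_literal(text: str) -> str:
    """Hypothetical helper: build a T-SQL string literal.

    Single quotes are escaped by doubling them; the N prefix is added
    whenever the text is not pure ASCII.
    """
    escaped = text.replace("'", "''")
    prefix = "" if text.isascii() else "N"
    return f"{prefix}'{escaped}'"


for name in ["ABC", "中文"]:
    print(same_bytes_everywhere(name),
          f"Select * from computer where devicename = {sql_server_literal(name)}")
```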

·      What does “forcing Unicode” mean if I use Oracle?

o    We are choosing not to support the national character set data types NCHAR and NVARCHAR. Instead, we will require a UTF-8 instance for Oracle users. None of your queries will have to change because everything will be UTF-8. If we were to use the national character set data types, Oracle users would have to modify all of their external (non-LDMS) queries as explained in “What does ‘forcing Unicode’ mean if I use SQL Server?” above.

·      Will I lose any data?

o    Not if you are operating in a supported LDMS environment. If you have chosen to put Japanese and Chinese data in the same database, then whichever data isn’t consistent with the code page of the DBMS server is already corrupted. Data corrupted in that sense will remain so until the new LDMS inventory scanner is distributed to that client and it reports UTF-8 data.
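Why is such data "already corrupted"? When bytes written in one code page are interpreted using another, the characters are lost before any UTF-8 migration happens, and a faithful conversion to UTF-8 simply preserves the garbage. This Python simulation is a sketch: `store_in_code_page` is a hypothetical helper, and `errors="replace"` stands in for whatever substitution a real server makes.

```python
def store_in_code_page(text: str, client_code_page: str, server_code_page: str) -> str:
    """Simulate a client writing `text` in its own code page to a server
    that interprets the bytes with a different code page."""
    raw = text.encode(client_code_page)
    return raw.decode(server_code_page, errors="replace")


original = "日本語"  # Japanese text from a code page 932 client
stored = store_in_code_page(original, "cp932", "cp936")  # Simplified Chinese server
print(original, "->", stored)  # garbled before any Unicode migration
```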

·      How much work do I have to do to prepare for an LDMS Unicode release?

o    We will require a new database container (database for SQL Server, instance for Oracle).  We are not planning on doing an in-place upgrade of the database.
SQL Server users will need to create a new database on their DBMS server.  We will copy all the data from the previous LDMS database into the new database while at the same time, converting the data to UTF-8.
Oracle users will need to create a new UTF-8 instance.  If they are already running inside a UTF-8 instance, they will need to create a new schema.  We will copy all the data from the old schema into the new schema and convert the data to UTF-8 during the copy process.
Once this conversion is finished, we will no longer need the old database, so you’re at liberty to remove it. In practice, this means that at the time of installation/upgrade you will need twice the disk space available, so we can “mirror” your database data.
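The "twice the disk space" requirement amounts to a simple pre-flight check: while the old database still exists, the new UTF-8 copy must fit in the free space. The sketch below assumes you already know the current database size. `has_room_for_migration`, the `growth_factor` allowance for characters that take more bytes in UTF-8, and the 40 GB figure are all hypothetical, not LDMS settings.

```python
import shutil


def has_room_for_migration(current_db_bytes: int, free_bytes: int,
                           growth_factor: float = 1.0) -> bool:
    """Return True if a new UTF-8 copy of the database fits next to the old one.

    growth_factor is a hypothetical allowance for data that grows in UTF-8
    (1.0 means no growth).
    """
    return free_bytes >= current_db_bytes * growth_factor


usage = shutil.disk_usage(".")  # free space on the volume holding the current directory
db_size = 40 * 1024**3          # placeholder: a hypothetical 40 GB LDMS database
print(has_room_for_migration(db_size, usage.free, growth_factor=1.1))
```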

David Thompkins Rookie 17 posts since
Feb 26, 2008
1. Nov 20, 2008 7:19 PM in response to: Rex Mc
Re: Unicode Development Feedback

This is a tough call -- I personally have first-hand experience with the challenges encountered in multi-language environments: Asia Pacific seems to be the land of the double-byte character set languages.

 

  • What does moving to Unicode only do for down-level support? Currently we can still manage old clients from new core servers. Will inventory coming from older clients be an issue?
  • What about multi-core environments? Originally, when this discussion began, the thought was that the rollup core would support Unicode and some sort of translation to support collecting data from cores using different code pages. Does this suggest that all cores and clients have to be upgraded to maintain rollup functionality?
  • It is noted that a new database container (for SQL 200x) will be needed. In SQL Server 2005, you can set the collation at the server level during installation and then set individual databases to use collations different from the server-level one. I vaguely recall opening a case on this and finding that LDMS uses the server collation and not that of the database. If this is still the case, won't this require customers to install a separate SQL instance for LDMS if their server collation is 1252 or something else?
  • What impact, if any, will this have on other products, such as process manager or ALM, which are (according to the installation guide) to be set up on a SQL instance and database using code page 1252-Latin 1?
MarkBird Rookie 8 posts since
Nov 20, 2008
2. Nov 21, 2008 8:43 AM in response to: David Thompkins
Re: Unicode Development Feedback

These are great questions.  Thank you very much.

 

1. Inventory will be backwards compatible. That being said, help me with the upgrade use case. I have Core 1, which is JPN. All of its clients are JPN. When I upgrade Core 1, I have two choices: I can do an in-place upgrade, or I can create a new core. If I do an in-place upgrade, I can use the code page of the core to convert all the old scan files to UTF-8 without corrupting any data. If I build a new core, do I do it offline and keep the same name, or do I give it a new name? If I give it a new name, then I have to redeploy my clients to get them to talk to the new core. In that case there's no problem, because all my old clients now have the new agent on them. If I keep the same name and change the OS language at the same time, I can no longer use the code page of the core to convert the data. Is this a valid scenario?

 

2. There are numerous discussions currently going on about the rollup utility, more than I should go into here. At the very least, the rollup utility would use the code page of the core's database as the "source" and UTF-8 as the "destination," and convert the data during the rollup process.

 

3. I will have to investigate this.  I thought we were using the collation of the database, not the collation of the server.  If we're using the server's collation, we'll have to fix that.

 

4. LPM and ALM currently do not co-exist in the same database as LDMS.  They can be on the same SQL server, but not in the same database.  Therefore, as long as we use the database collation and not the server's collation, this shouldn't be an issue.
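Point 2 above (decode each core's data with that core's own database code page, then write UTF-8) can be sketched in Python as follows. The core names, the `rollup_to_utf8` function, and the idea of passing raw byte strings per core are assumptions for illustration, not the actual rollup utility.

```python
from typing import Dict, Iterable, List


def rollup_to_utf8(rows_by_core: Dict[str, Iterable[bytes]],
                   code_page_by_core: Dict[str, str]) -> List[bytes]:
    """Merge string values from several cores into one UTF-8 list.

    Each core's values are decoded with that core's database code page
    (the "source") and re-encoded as UTF-8 (the "destination"), so a
    Japanese core and a Chinese core can feed the same rollup.
    """
    merged: List[bytes] = []
    for core, rows in rows_by_core.items():
        source = code_page_by_core[core]
        merged.extend(raw.decode(source).encode("utf-8") for raw in rows)
    return merged


rows = {"core-jp": ["東京".encode("cp932")], "core-cn": ["北京".encode("cp936")]}
pages = {"core-jp": "cp932", "core-cn": "cp936"}
print([value.decode("utf-8") for value in rollup_to_utf8(rows, pages)])
```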

Paul Hoffmann Master 1,363 posts since
Dec 11, 2007
4. Nov 25, 2008 3:11 AM in response to: Rex Mc
Re: Unicode Development Feedback

* I'll try to get a few more eyes on this - good stuff here Rex / Mark.

 

* Just to clarify (this is formulated a bit strangely in the middle of the FAQ, but clarified at the end): we do NOT upgrade an existing DB. It's more of a data migration from "old DB in some character set" to "separate, new DB in UTF-8".

 

Other than that, this looks reasonably straightforward. Good FAQ.

 

* Just to clarify - the "performance hit" we'd expect to see would be because of currently existing issues - i.e. bad indexes / need for additional indexes to help the highest-load SQL queries on this environment, type things.

 

* Any particular reason why we went for UTF-8 as opposed to UTF-16? I only ask because some of the Oracle DBAs I've talked to are very fond of UTF-16. It doesn't seem that different to me at this point one way or the other, but oh well.

 

* One thing that hasn't been mentioned yet: Rollups. Will this process work for a Rollup as well, or would we need to re-create the Rollup? I'm hoping it's the former, as there are some customers out there who use the Rollup to centrally manage their environment, and "starting from fresh" would be a bit of a tough sell there.

 

* Good point clarifying the "corrupted character" stuff. I'm glad to see that this should be a reasonably painless process of just needing the inventory scanner to update content. BUT, related to that (something to think about): what about things like SLM, where a "START <garbled characters showing as boxes>" entry is going to be separate from a "START <some real 2-byte characters>" entry?

 

I suspect this would have to be done manually to clean up / sort out what needs to be kept and what not (as I don't see a way that we can automatically determine this, and we're very conservative about deleting / overwriting data).

 

Hope this gives a few things to think about.

 

That's all I have for now. Thanks for bringing this up.

 

Paul Hoffmann

LANDesk EMEA Technical Lead

Zman Expert 1,115 posts since
Dec 14, 2007
5. Nov 25, 2008 4:48 AM in response to: Rex Mc
Re: Unicode Development Feedback

My big concern, since we do not deal with different languages in our shop, is performance. You mention query performance in the FAQ; will there be any other areas of the program that MAY suffer from going to Unicode?

Paul Hoffmann Master 1,363 posts since
Dec 11, 2007
6. Nov 25, 2008 5:01 AM in response to: Zman
Re: Unicode Development Feedback

Well - there's "the data" itself.

 

For instance - say you (like me) aren't well versed in Japanese.

 

If you end up pulling reports on software, which may have Japanese (in this example, but could be any "not your native language" case), then this report is going to be somewhat limited in its use. You would need to get this translated by someone to tell you what it's actually containing (to see if it's something you should be worried about, or whether it's perfectly fine).

 

This is mostly going to be a problem for multi-national organisations: they'll need to be more careful with their reporting, and/or prepare the resources to translate these things into a format in which they can actually use the data the DB contains.

 

One of those cases of "be careful what you wish for" ... while Unicode *WILL* make a lot of things very much easier (no doubt there), it WILL also cause its own (indirect) problems.

 

Not many of us speak 20+ languages, after all.

 

Paul Hoffmann

LANDesk EMEA Technical Lead

Zman Expert 1,115 posts since
Dec 14, 2007
7. Nov 25, 2008 5:37 AM in response to: Paul Hoffmann
Re: Unicode Development Feedback

Good point; however, I was going more for what performance hits this change will have on overall use, not just queries. You are correct that there will be things oozing out of the cracks that nobody ever thought of.

MarkBird Rookie 8 posts since
Nov 20, 2008
8. Nov 25, 2008 8:58 AM in response to: Zman
Re: Unicode Development Feedback

I don't believe there will be any significant performance hits to the database itself. In the case of Microsoft SQL Server, Unicode (NCHAR/NVARCHAR) data is stored as UCS-2. So when I currently query MS SQL Server and use a non-Unicode string, SQL Server already converts the data from a "code page" into UCS-2. Since we will now be using UTF-8 strings, the conversion will still occur, but in theory, the conversion should be faster. Here's why:

 

Unicode is divided into 17 "planes," each consisting of 65,536 code points; UTF-16 can represent characters from all of them.

 

Unicode Plane 00, the first Unicode plane, also known as the Basic Multilingual Plane (BMP), contains the characters used by most modern languages.

 

UCS-2 only references Plane 00.

 

UTF-8 is a variable-length encoding, not a Unicode plane.

 

UCS-2 always consumes 2 bytes per character, as does UTF-16 for characters in the BMP. UTF-8 consumes 1-3 bytes per BMP character, depending on the character. The characters A-Z, a-z and 0-9 are encoded as 1 byte in UTF-8. Since most of the data reported by the inventory scanner falls into this set of characters, we chose UTF-8 as our standard because it wouldn't increase the size of databases for customers that don't require a Unicode solution. We may have to revisit this and standardize on UTF-16 depending on ISO 10646, but we're still investigating that. Currently, we're sticking with UCS-2 and using the UTF-8 encoding.
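These byte counts are easy to check with Python's standard codecs (nothing LDMS-specific). The sketch prints how many bytes a single character takes in UTF-8 and in UTF-16; `utf-16-le` is used so the 2-byte byte-order mark is not counted. The last sample lies outside the BMP: UTF-16 needs a 4-byte surrogate pair for it, which UCS-2 cannot represent at all.

```python
def bytes_per_encoding(ch: str) -> dict:
    """Bytes needed for one character in UTF-8 and in UTF-16 (no byte-order mark)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
    }


# ASCII, Latin-1 accented, CJK in the BMP, CJK Extension B (plane 2)
for ch in ["A", "é", "中", "\U00020000"]:
    print(f"U+{ord(ch):04X}", bytes_per_encoding(ch))
```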

 

When you decode UTF-8, you get a sequence of Unicode code points, which SQL Server holds as UTF-16. If all the characters in that string fall into the first Unicode plane, plane 0, you in fact have a UCS-2 string. All of this conversion already happens within SQL Server. We are simply removing the final conversion from a code-page string to a UCS-2 string. In practice this conversion is so fast that its cost will be negligible. Therefore, I don't expect performance degradation at the database.
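The "plane 0 means UCS-2" point can be expressed as a simple check. A string fits in UCS-2 only if every code point is at or below U+FFFF, so that its UTF-16 form contains no surrogate pairs. Both helper names below are hypothetical.

```python
def fits_in_ucs2(text: str) -> bool:
    """True if every code point is in the Basic Multilingual Plane (U+0000-U+FFFF)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def utf8_to_utf16(raw: bytes) -> bytes:
    """Decode UTF-8 bytes and re-encode them as UTF-16LE (no byte-order mark)."""
    return raw.decode("utf-8").encode("utf-16-le")


for text in ["LANDesk 中文", "\U00020000"]:
    utf16 = utf8_to_utf16(text.encode("utf-8"))
    print(repr(text), "UCS-2 safe:", fits_in_ucs2(text), "UTF-16 bytes:", len(utf16))
```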

 

That being said, there will be performance hits throughout LDMS.  The inventory service will only deal with UTF-8 scan files.  In order to be backwards compatible, the inventory service will still accept scans that aren't UTF-8 encoded, but the inventory service will have to convert those scan files to UTF-8 before it can process them.  Again, that is a quick conversion.  However, we don't know the code-page of the client, so we will be forced to use the code-page of the core server when converting those scan files.  That means, if you send a JPN code-page scan to a core server, the core server must be JPN in order to convert the scan to UTF-8 without corrupting any of the data.
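The backwards-compatibility path described above can be sketched as follows. This is a heuristic under a stated assumption: the sketch tries strict UTF-8 first (new scanner) and, if that fails, falls back to the core server's code page (legacy scanner). `normalize_scan` is a hypothetical name. A legacy scan whose bytes happen to form valid UTF-8 would be misread by this heuristic, so an explicit encoding marker in the scan file would be more reliable.

```python
def normalize_scan(raw: bytes, core_code_page: str) -> str:
    """Return scan-file text as Unicode (heuristic sketch, not LDMS code)."""
    try:
        return raw.decode("utf-8")  # new scanner: UTF-8 scan
    except UnicodeDecodeError:
        # Legacy scan: assume it was written in the core server's code page.
        return raw.decode(core_code_page, errors="replace")


print(normalize_scan("中文".encode("utf-8"), "cp936"))  # new UTF-8 scan
print(normalize_scan("中文".encode("cp936"), "cp936"))  # legacy scan, matching core
print(normalize_scan("日本".encode("cp932"), "cp936"))  # legacy JPN scan on a CHS core: garbled
```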

cplatero Rookie 1 posts since
Mar 7, 2008
9. Nov 25, 2008 3:21 PM in response to: MarkBird
Re: Unicode Development Feedback

Mark - if I'm reading your post right, the core server will still only be able to properly process scan files from one code page?  For example it won't be able to accept both Chinese and Japanese client inventory scans?

 

Or is that only a limitation for backwards compatibility, and the new client inventory scanner will allow the core to accept both languages?

MarkBird Rookie 8 posts since
Nov 20, 2008
10. Nov 25, 2008 3:33 PM in response to: cplatero
Re: Unicode Development Feedback

We will be changing the inventory scanner to report UTF-8 data.  If that version of the scanner is deployed, we will store any language you send to us.  If you don't deploy that version of the scanner (legacy), the same scenario that exists today will still exist--the code page of the client must match the code page of the core so we can convert the file at the core.

 

By the way, we're investigating GB18030, which may force us to go to UTF-16. If we do, we will still encode the strings as UTF-8. It's just that our source will be UTF-16 instead of UCS-2.
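The GB18030 concern can be illustrated with Python's built-in codecs. GB18030 can encode characters outside plane 0, such as U+20000 from CJK Extension B, which neither UCS-2 nor the older GBK (code page 936) can hold. `round_trips` is a hypothetical helper.

```python
def round_trips(text: str, encoding: str) -> bool:
    """True if `text` survives encoding to `encoding` and decoding back unchanged."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False


ext_b = "\U00020000"  # first CJK Unified Ideographs Extension B character (plane 2)

print("outside plane 0:", ord(ext_b) > 0xFFFF)  # so UCS-2 cannot hold it
print("GB18030 round trip:", round_trips(ext_b, "gb18030"))
print("GBK (cp936) round trip:", round_trips(ext_b, "gbk"))
print("GB18030 bytes:", len(ext_b.encode("gb18030")))
```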

Steve Wieringa Rookie 1 posts since
Nov 26, 2008
11. Nov 26, 2008 8:08 AM in response to: Rex Mc
Re: Unicode Development Feedback

"The current consideration is whether or not to support, in addition to Unicode, the older code-page based language capability. Our current research has led us to the design that would only support a Unicode implementation."

We are very much looking forward to LANDesk's implementation of a Unicode database that will enable a multi-language database. We have no issue with creating a new database or with the extra storage this will require temporarily during the migration. We recommend that LANDesk adopt a multi-language database, focus engineering resources solely on this new format by discontinuing the older code-page database format, and ensure that the new multi-language database has no downside for single-language customers, for example by ensuring that performance is not an issue and that the migration is not painful.

With a multi-language enabled database we plan to consolidate our 7 databases (located in three locations) to 3 databases (located in 3 locations).

If agent upgrades are required to take advantage of this functionality, we will require a more bandwidth-considerate method of updating our 20,000+ global agents than pushing 20+ MB agents 20,000 times over limited WAN links. Our first preference would be to update the Inventory Scan via Security Suite. If more of the agent requires upgrading, we would like to see the Advanced Agent code corrected so it can be used for upgrades as well as first-time installs, as this method takes advantage of files on local subnet peers.

 

 



 

Paul Hoffmann Master 1,363 posts since
Dec 11, 2007
12. Nov 26, 2008 8:51 AM in response to: Steve Wieringa
Re: Unicode Development Feedback

You're describing Advanced Agent there (exactly the reason why we created it), Steve. The tech is already available.

 

And in other good news, it gets a few refinements along the way as well.

 

Paul Hoffmann

LANDesk EMEA Technical Lead.

David Thompkins Rookie 17 posts since
Feb 26, 2008
13. Nov 26, 2008 2:06 PM in response to: MarkBird
Re: Unicode Development Feedback

Mark --

 

A few more use cases based on a couple of things I've done during upgrade or install paths:

 

Doing a side-by-side core upgrade will often prompt me to have my customer create an alias in DNS that points the old core server name to the IP of the new core. This way, we can get inventory and other functionality up more quickly than we could by deploying agents. Of course, we'd be using the keys from the old core.

 

Also, when I was doing an implementation for a China-based customer who needed support for both Simplified Chinese (code page 936) and English Latin 1 (code page 1252), I configured all of the core server databases to use code page 936. All of the cores and consoles (including the rollup) were installed under US English. With Asian character support enabled in Windows, a console accessing a core would display Chinese characters if they were present in inventory. This introduced a few functionality challenges, which I'll get to, but it also allowed the China headquarters to manage clients at sites in 99 countries without having to use Terminal Services to reach cores around the world. One core in a particular region receives inventory scans from machines with English, Chinese, Japanese, Korean and other OS languages, though as far as I recall only the English version of the LANDesk agent is deployed across the board (since the cores are all English). This contradicts the upgrade case you described, in which a core of a specific language supports only that one language. I should note, of course, that the other languages on the core in this region aren't supported by the database and come up garbled, but English and Simplified Chinese work fine. I'm conscious of what may happen to them under the proposed initiative.
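Why English and Simplified Chinese coexist under code page 936 but not under 1252 can be shown with a one-line check of a code page's repertoire. This is a Python sketch; `can_store` is a hypothetical helper, and it only tests whether characters exist in the code page, not how LANDesk or SQL Server handle them.

```python
def can_store(text: str, code_page: str) -> bool:
    """True if every character of `text` exists in `code_page`."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


for code_page in ["cp1252", "cp936", "utf-8"]:
    print(code_page, can_store("Hello 中文", code_page))
```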

 

Regarding server collation, I was poking through the documentation I put together for the customer and found more detail:

 

<CUSTOMER> has the requirement to support the existing LANDesk infrastructure, which is configured to use the Simplified Chinese language and character sets, as well as the solution in progress. Because the current solution uses Chinese, which is a double-byte character set, the rollup core, which will contain data from this solution, must also be double-byte. From a SQL Server standpoint, this means that the server collation needs to use a code page that supports double-byte character sets. In SQL Server 2005 this would generally not be an issue, as databases can have their own collation independent of the instance default; however, testing indicated that LANDesk rollup operations seemed to use only the server-level collation rather than the database collation. As such, all LANDesk databases will be set to use code page 936 (Simplified Chinese). Additionally, to support users native to either language accessing LANDesk and Windows, it is strongly recommended to maintain support for both languages within the OS as well. This can be achieved by following the instructions provided by Microsoft at the link below:

 

# http://www.microsoft.com/globaldev/handson/user/xpintlsupp.mspx#E4

 

In reference to what Paul raised, I had a bit of a think about it during the aforementioned customer engagement and found the following would likely be the case with respect to challenges managing LANDesk in multi language environments:

 

 

LANDesk, which at this time doesn’t support Unicode, should work with other languages such as Japanese from a functionality standpoint. My thoughts on this (based on discussions with others in the factory) are as follows:

  • Remote Control should work fine
  • Software Distribution should work fine
  • OS Deployment (not used) should work fine
  • Hardware inventory (overall) should work fine. There are potential issues here, because some details pulled from the BIOS come from WMI and may be specific to the OS language.

 

 

What will definitely have issues:

  • Software License Monitoring: Application discovery may have some challenges due to the need to qualify applications based on names which might be specific to a language/locale setting
  • Policy Management: Policies that run based on LANDesk queries will have issues if the clients with unsupported character sets are targets.  This is because the clients may not properly report back to inven