kexi project
Rapid Database Application Development
Development
"Microsoft Access for Linux"

Home Download FAQ Support Features Handbook Screenshots Screencasts Compile Kexi Development Authors Contact License Sponsorship Translate This Site

wiki navigation:

Front Page
[info] [diff] [login]
[recent changes]
[most popular]
You can donate
to Kexi Project:
Via PayPal

Spread the word about Kexi!
Get Kexi Now!

Built on the KDE technology
KDE

Member of the Calligra Suite

No Software Patents!

KexiDBDesign:

Database Collation Issues

 started by: jstaniek
 status of ideas mentioned here: planed for 1.1

Table of Contents
   Introduction
     Types
     Affected SQL Operators
     Implications For Data Integrity
     Problem with Non-latin1 Character Sets
   Collation in SQLite Backend
   Collation in MySQL Backend
   Collation in PostgreSQL Backend
   Implications For Query Designer
   Competition

Introduction

^ toc

Types

Collation type is a rule used when comparing (or sorting) text strings in a database. Lets assume we have three database records in a name column of a table:

 Joe
 Adam
 joan
  • NOCASE collation means that a comparison like name > "adam" will return:
 Joe
 Adam
 joan
In other words, case-insensitive comparison is performed.
  • BINARY collation means that a comparison like name > "adam" will return:
 joan
Case-sensitive comparison is performed here, so neither "Joe" nor "Adam" are greater than "adam".
^ toc

Affected SQL Operators

Following SQL comparison operators are affected:

 < > <= >= <> != = == LIKE.
^ toc

Implications For Data Integrity

Example for MySQL:

 create table dictionary (
  word varchar(255) COLLATE latin2_bin NOT NULL,
 );
 ALTER TABLE dictionary ADD UNIQUE INDEX (word);

Having BINARY collation latin2_bin defined now it's possible to insert

 Bag
 bag

while, with NOCASE collation, the unique index would not allow this.

^ toc

Problem with Non-latin1 Character Sets

Daabase backend needs to offer built-in support for Non-latin1 character collations. Kexi could not implement a workaround at the client side if a query has to be executed at the server side.

^ toc

Collation in SQLite Backend

SQLite supports BINARY collation by default.

Non-latin1 Character Sets: we can patch SQLite to add such a support. See ser-defined Collation Sequences.

^ toc

Collation in MySQL Backend

'By default, MySQL searches are not case sensitive.

See http://dev.mysql.com/doc/refman/5.0/en/case-sensitivity.html See http://dev.mysql.com/doc/refman/5.0/en/charset-mysql.html

To force BINARY collation, we will add COLLATE clause, e.g.:

 select * from table where name>='abc' COLLATE latin1_bin

Notes: latin1_bin collation name always works. Collation and character sets can be properly selected on db creation or connecting only for MySQL >= 4.1. See also Table Character Set and Collation (mysql manual). So for newer MySQL versions use specific collations, not just latin1_bin.

Good default for collation/character set seem to be: utf8_general_ci/utf8.

TODO

^ toc

Collation in PostgreSQL Backend

TODO

^ toc

Implications For Query Designer

Good default is a #1 priority here. In most cases users expect NOCASE collation to be the default, so we're:

  • adding NOCASE flag to appropriate KexiDB::BinaryExpr objects when SQL statement is generated/anayzed
  • supporting BINARY keyword in the Design View of the Query Designer
^ toc

Competition



Kexi - "MS Access for Linux" ... and Windows
© Kexi Team
This content is available under GFDL
Last edited: January 11, 2006 by js, visited 0 times.