utf8_general_ci VS utf8_unicode_ci what should we use?

 https://medium.com/@nilesh.patil.d/utf8-general-ci-vs-utf8-unicode-ci-what-should-we-use-cc01b58c00cc

  • Accuracy
    utf8mb4_unicode_ci is based on the Unicode standard for sorting and comparison, which sorts accurately in a very wide range of languages.
    utf8mb4_general_ci fails to implement all of the Unicode sorting rules, which will result in undesirable sorting in some situations, such as when using particular languages or characters.
  • Performance
    utf8mb4_general_ci is faster at comparisons and sorting, because it takes a bunch of performance-related shortcuts.
    On modern servers, this performance boost will be all but negligible. It was devised in a time when servers had a tiny fraction of the CPU performance of today's computers.
    utf8mb4_unicode_ci, which uses the Unicode rules for sorting and comparison, employs a fairly complex algorithm for correct sorting in a wide range of languages and when using a wide range of special characters. These rules need to take into account language-specific conventions; not everybody sorts their characters in what we would call 'alphabetical order'.
  • For examples, the Unicode collation sorts “ß” like “ss”, and “Œ” like “OE” as people using those characters would normally want, whereas utf8mb4_general_ci sorts them as single characters (presumably like "s" and "e" respectively).
  • Some Unicode characters are defined as ignorable, which means they shouldn’t count toward the sort order and the comparison should move on to the next character instead. utf8mb4_unicode_ci handles these properly.
alter table `dbname`.`tablename` convert to characterset utf8 collate utf8_unicode_ci;

Không có nhận xét nào:

Cold Turkey Blocker

 https://superuser.com/questions/1366153/how-to-get-rid-of-cold-turkey-website-blocker-get-around-the-block Very old question, but still wan...