Convert everything to Unicode Charset

proposed by Barry Hunter
In Progress

Right now Geograph uses a basic character set (Latin-1) for storing and processing text. This mostly works, but support for accented characters, ligatures and other special characters is limited, so their display is intermittent: sometimes they work, sometimes they get corrupted.
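To illustrate the two ways this goes wrong, here is a minimal Python sketch. The function names are hypothetical, not Geograph code, and it assumes one layer writes UTF-8 bytes while another reads them as Latin-1 (ISO-8859-1). Characters that Latin-1 does have, such as é, come back garbled; characters it lacks, such as the Welsh ŵ, cannot be stored at all.

```python
# Hypothetical illustration, not Geograph code: how a Latin-1 (ISO-8859-1)
# layer mixed with a UTF-8 layer mangles text.

def read_utf8_as_latin1(text: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def fits_in_latin1(text: str) -> bool:
    """True if every character can be stored in a Latin-1 column."""
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        return False
    return True


if __name__ == "__main__":
    print(read_utf8_as_latin1("café"))  # cafÃ©  (é is split into two characters)
    print(fits_in_latin1("café"))       # True   (é exists in Latin-1)
    print(fits_in_latin1("Tŵr"))        # False  (Welsh ŵ does not)
```

Plain ASCII passes through both layers untouched, which is why the problem only shows up intermittently.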

... in theory it should be possible to convert EVERYTHING (website frontend, database, custom code, search engine system, etc.) over to UTF-8/Unicode for maximum compatibility.

(There are perhaps three major components to applying this: 1) changing all underlying systems to store UTF-8 correctly, 2) testing/fixing all systems so they process it correctly (and don't corrupt it!), and, importantly, 3) cycling through all the already-corrupted data and fixing it!)
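For step 3, one common repair for text that was written as UTF-8 but read back as a single-byte charset is to reverse the mistake: re-encode the garbled string in that charset and decode the resulting bytes as UTF-8. The sketch below assumes each bad string was corrupted exactly once, and defaults to Windows-1252 on the assumption that the data lives in MySQL, whose latin1 charset is effectively Windows-1252. `repair_double_encoded` is a hypothetical name, not an existing Geograph function.

```python
# Hypothetical repair helper for step 3; not Geograph code.
# Assumes each bad string is UTF-8 text that was decoded ONCE as a
# single-byte charset (Windows-1252 by default, an assumption).

def repair_double_encoded(text: str, codec: str = "cp1252") -> str:
    """Undo one round of 'UTF-8 bytes read as <codec>' corruption.

    If re-encoding fails, or the resulting bytes are not valid UTF-8,
    the text cannot be this kind of mojibake, so it is returned as-is.
    """
    try:
        return text.encode(codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    samples = ["cafÃ©", "TÅµr", "Rudiâ€™s", "café", "Snowdon"]
    for s in samples:
        print(f"{s!r} -> {repair_double_encoded(s)!r}")
```

Here "cafÃ©" becomes "café" and "TÅµr" becomes "Tŵr", while correctly stored "café" and plain "Snowdon" are left alone. A genuine string that merely looks like mojibake would also be "repaired", so changes are best reviewed, ideally on a copy of the data first.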

This is a big job, with lots of interrelated systems to change and test for compatibility, so ideally it could be done as a specialized project.

Pledges in support of this idea
· I'll have a trawl (of my own descriptions and those of any contributor who'd like me to do so) for Welsh place names using diacritics (to-bach), fixing them in the text and adding a tag with the name in plain letters for searchability. pledged by Rudi Winter
· I'm happy to match Rudi's offer for Gaelic. pledged by Tiger
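The plain-letter form pledged above can be derived mechanically. A minimal Python sketch, assuming Unicode NFKD decomposition is acceptable for the purpose: split each letter into its base letter plus combining accents, then drop the accents. `plain_letters` is a hypothetical helper; letters with no decomposition (such as æ or ø) pass through unchanged, which does not affect Welsh or Gaelic diacritics.

```python
import unicodedata


def plain_letters(name: str) -> str:
    """Strip diacritics, e.g. for a search tag (hypothetical helper).

    NFKD splits a letter such as 'ŵ' into 'w' plus a combining circumflex;
    dropping the combining marks leaves the plain letter.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(plain_letters("Tŵr"))          # Twr
    print(plain_letters("Dùn Èideann"))  # Dun Eideann
```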

If you built this feature, you get these rewards!
Why people think this would be a good idea...
· Sensible to have the same character set across all features of the site, making some of them less frustrating to use. This would also cut out the need for a lot of post-submission sub-editing. by Robin Stott
· Three of the four languages native to GBI territory use extended alphabets. Many placenames contain diacritics. As a geographical-educational project we should support this properly. by Rudi Winter
· Would be very worthwhile. Rudi missed out Manx, which, along with English, Latin and Basque, doesn't need diacritics, but even these need to spell exotic loanwords and proper names sometimes. by Tiger

Created: Mon, 21 Mar 2016, Updated: Tue, 19 Jun 2018

