Project: QGIS

Version: 3.12

Feature: Fix numerous shapefile encoding issues

This fixes the (broken by design?) handling of Shapefile encoding, which has been an ongoing issue for years in QGIS.

See the discussion at:

  • #21264
  • http://osgeo-org.1560.x6.nabble.com/Shapefile-with-file-cpg-codepage-td5275106.html
  • http://osgeo-org.1560.x6.nabble.com/QGIS-ignore-the-cpg-files-when-loading-shapefiles-td5348021.html

(+ others!)

Previously, there were two different code paths for attribute decoding: one where GDAL decoded the attributes, and one where QGIS decoded them. Unfortunately, the two are incompatible with each other, and the GDAL API for this doesn't allow us to unify the two approaches. (More technical detail in the commit log message!)

So now we:

  • always do the decoding on QGIS' side. This allows users to manually override a shapefile's declared encoding, which is important because declared encodings are often incorrect!
  • use a port of GDAL's shapefile encoding detection logic (it's not exposed in the GDAL API, so I had to re-implement it here), so that by default shapefiles are read using their embedded encoding information (via CPG files or DBF LDID information)
  • completely remove the confusing/broken "Ignore shapefile encoding declaration" option, as it's no longer required: users are ALWAYS able to manually change the encoding of shapefile layers if needed
  • always show users the detected embedded encoding in the layer properties, instead of always showing "UTF-8" when the embedded encoding information is used

This should give the best of both worlds -- a nice default behavior resulting in shapefiles being read with the correct encoding, whilst still allowing users to override this on a layer-by-layer basis as needed.
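The precedence this gives can be sketched as follows: a per-layer user override wins, otherwise the detected embedded encoding is used, otherwise a fallback. This is a hypothetical Python illustration, not QGIS code; `decode_attribute` and its `fallback` default are assumptions made for the example. It shows why decoding on QGIS' side matters: the raw bytes are still available, so a wrong declaration can be corrected simply by choosing another encoding.

```python
from typing import Optional


def decode_attribute(
    raw: bytes,
    detected: Optional[str],
    user_override: Optional[str] = None,
    fallback: str = "utf-8",  # hypothetical default when nothing is declared
) -> str:
    """Decode a raw DBF attribute value using override > detected > fallback."""
    encoding = user_override or detected or fallback
    # errors="replace" keeps undecodable bytes visible as U+FFFD instead of failing.
    return raw.decode(encoding, errors="replace")


raw = "Straße".encode("cp1252")  # b"Stra\xdfe" as stored in the DBF

print(decode_attribute(raw, "cp1252"))                          # correct declaration
print(decode_attribute(raw, "utf-8"))                           # wrong declaration, garbled
print(decode_attribute(raw, "utf-8", user_override="cp1252"))  # fixed by a layer override
```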

This feature was developed by Nyall Dawson