Proposed type mapping addition to DB-API

Author: Carsten Haese

Revision History:

Abstract

While DB-API implementations generally choose meaningful default mappings between database-side data and Python objects, some applications may benefit from added flexibility in this mapping process. I propose a type mapping mechanism that builds on top of an exisiting canonical mapping between database columns and Python objects.

This proposal outlines a type mapping mechanism that allows mappings in either direction, reading from the database as well as writing to the database. It is flexible enough to handle any data type that the underlying database supports, including User-Defined Types.

Specification

This specification assumes the presence of a canonical mapping that maps between database types and Python types. The canonical mapping is simply whatever mechanism a DB-API uses by default to translate between database values and Python objects. This is generally database dependent to allow for database specific types, but standard SQL types should be mapped to Python types in a database-independent manner. For example, it is reasonable to assume that SQL integers will be mapped to Python ints.

I suggest the following addition to DB-API version 2:

Examples and Common Use Cases

The following pseudo-code examples illustrate how an application might use the suggested type mapping feature.

Spatial Data

Let's assume a database that supports spatial data, and the canonical mapping of spatial data uses binary streams. An application could do something like this:

import somedb
from SomeSpatialToolkit import GeometryObject

spatial_output = {
  "geometry": GeometryObject.fromBinary 
}
spatial_input = {
  GeometryObject: GeometryObject.toBinary 
}

conn = dbapi.connect(...)
conn.outputmap = spatial_output
conn.inputmap = spatial_input

Of course, an API module for a database that supports spatial types might want to include standard maps for convenience that encapsulate this functionality.

Strings, Decimals, Dates

Common use cases for type mappings include whether strings should be returned as byte strings or decoded into unicode, whether decimals should be returned as floats or as Decimal instances, and so forth. All these are use cases for adapter functions. The outputmap provides a convenient way to register one-way type adapter functions:

CharAsUnicode = {
  "char": functools.partial(unicode, encoding="utf-8")
}
DecimalAsFloat = {
  "decimal": float
}
conn = somedb.connect(...)
conn.outputmap = somedb.combine_maps(CharAsUnicode, DecimalAsFloat)

This example assumes that the API module implements a utility function for adding two dictionaries together.

API implementations are encouraged to define standard typemaps for often used conversion tasks such as these. The standard typemaps might want to implement __add__ so that application code can simply say:

import somedb
from somedb.typemaps import *
conn = somedb.connect(..., outputmap=CharAsUnicode+DecimalAsFloat)

Possible Extensions

Per-Column mappings

Discussion on the db-api mailing list has brought up the need for column-specific mappings, i.e. the need to map two different columns that participate in the same query to be mapped by different adapters despite having the same data type. The classic use case is a character column storing a date that should be fetched as a datetime object.

For the output direction, this need can be fulfilled by a cursor attribute such as col_outputmap that would map column names to adapter functions that would override the cursor's outputmap.

For the input direction, column specific overrides are not possible to do in a practical way. Input parameters are not necessarily destined for a particular database column, and even if they are, the column name may not be readily available by the underlying database API. For this reason, column-specific input maps are not part of this proposal.

More low-level control

As flexible as the proposed scheme is, it may not fit the needs of all DB-API implementations. In particular, the lowlevel communication layer may want to change the nature of the canonical mapping, i.e. change the lowlevel output binding, depending on whether and how a particular type is mapped.

For example, suppose a decimal column is being read, but the type mapping specifies that decimals should be mapped to floats. Rather than waste time with fetching a string or Decimal instance that contains all digits only to turn them into an imprecise float, it may be faster to bind the column directly as a float at the low level.

Since the internals of output binding are database specific, this should probably be addressed with a database specific extension of the typemap mechanism.

A possible solution is to add a database-specific attribute called _binding_hint to the corresponding converter function. The output binding step could inspect this setting and use it to decide how to bind the column in question. Pseudo-code for binding decimals as floats might look like this:

def FloatPassthrough(floatval):
    return floatval
FloatPassthrough._binding_hint = somedb.hints.BIND_AS_FLOAT

DecimalAsFloat = {
  "decimal": FloatPassthrough
}
conn = somedb.connect(..., outputmap=DecimalAsFloat)

Even more Flexibility

There may be use cases where it may be beneficial to have certain factors influence how the type adapters operate. For example, an adapter that converts byte strings to unicode objects may need to find out on the fly which encoding the database is using.

This need would be served by adding optional keyword parameters to the argument list with which the converter function is called. For instance, the implementation may choose to pass a reference to the current cursor to the adapter.

If it is decided that there are enough use cases for such additional arguments, the list of possible arguments should be standardized.

Conclusion

The proposed scheme allows most, if not all, common use cases of type conversion schemes in existing DB-API implementations. It is similar to schemes that use adapter functions, but the use of arbitrary mappings offers flexibility that can not be matched by more rigidly structured type registry schemes.

Comments and discussion on the db-api mailing list are welcome.