Aggregate class: a first-attempt

Discussion:

Tim Chase

2007-02-22 18:13:40 UTC

Below I've pasted my first attempt at an aggregate function
class. Its __init__ takes a queryset (and an optional list of
aggregate functions to perform if, say, you only want the "sum"s
rather than min/max/avg/sum). Thus you should be able to do
things like
... blah = CharField(maxlength=42)
... xxx = IntegerField()
... yyy = SmallIntegerField()
... zzz = PositiveIntegerField()
...
[add some data to Foo]

stats = Aggregate(Foo.objects.filter(xxx__gt=21))

The "stats" object now has properties
xxx_sum
xxx_max
xxx_min
xxx_average
yyy_sum
yyy_max
yyy_min
yyy_average
zzz_sum
zzz_max
zzz_min
zzz_average

which can be accessed to provide the associated stats for the
given model. If you make your own field types, if they're
numeric, you'll want to add them to the set().

The constructor can also be called with the list of functions you

stats = Aggregate(Foo.objects.all(), ['sum'])

in case you just want the *_sum properties.

Future enhancements might include aggregates that don't involve
sum/avg, so one might get the min/max of a date/time/char field.

I haven't yet figured out a way to suppress the order_by portion,
so what's currently in there is an ugly hack. But it would need
to prevent the standard methods from inserting an ORDER BY clause
against a non-aggregated field.

At the moment, it meets my needs, but I'd gladly consider any
feedback on this first draft version of things. Also, if you
find it useful, feel free to shred it into your own purposes as
desired.

-tkc

#############################################
from django.db import backend, connection

NUMERIC_FIELDS = set((
'FloatField',
'IntegerField',
'PositiveIntegerField',
'PositiveSmallIntegerField',
'SmallIntegerField',
))

FUNCTIONS = [
'min',
'max',
'average',
'sum',
]

def is_numeric_field_type(field):
return field.get_internal_type() in NUMERIC_FIELDS

class Aggregate(object):
def __init__(self, queryset, only_type=None):
self._cache = None
self.queryset = queryset.select_related(False)
self.only_type = only_type or FUNCTIONS

def all(self):
self._get_results()

def _get_aggregate_sql(self):
q = self.queryset._clone()
q._order_by = '?' # can't use None as it gets
# overridden if a default Meta.ordering is specified
# Should do some magic to prevent it from adding
# the Meta.ordering if possible
select, sql, params = q._get_sql_clause()
meta = self.queryset.model._meta
# build the SELECT contents:
# fn(table.field) AS field_fn
selects = [[
"%s(%s.%s) AS %s" % (
fn.lower(),
backend.quote_name(meta.db_table),
backend.quote_name(f.column),
backend.quote_name("%s_%s" % (f.column, fn.lower())),
) for f in meta.fields
if is_numeric_field_type(f)
] for fn in self.only_type]
# the sum(selects, []) flattens
# the list-of-lists into just a list
return sum(selects, []), sql, params

def _get_results(self):
if self._cache is None:
select, sql, params = self._get_aggregate_sql()
cursor = connection.cursor()
sql = "SELECT " + ",".join(select) + sql
cursor.execute(sql, params)
results = cursor.fetchone()
field_names = [d[0] for d in cursor.description]
self._cache = dict(zip(field_names, results))
return self._cache

def __getattribute__(self, k):
try:
return object.__getattribute__(self, k)
except:
results = self._get_results()
return results[k]

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-***@googlegroups.com
To unsubscribe from this group, send email to django-users-***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---

Honza Král

2007-02-22 23:40:10 UTC

Permalink

Post by Tim Chase
Below I've pasted my first attempt at an aggregate function
class. Its __init__ takes a queryset (and an optional list of
aggregate functions to perform if, say, you only want the "sum"s
rather than min/max/avg/sum). Thus you should be able to do
things like
... blah = CharField(maxlength=42)
... xxx = IntegerField()
... yyy = SmallIntegerField()
... zzz = PositiveIntegerField()
...
[add some data to Foo]

stats = Aggregate(Foo.objects.filter(xxx__gt=21))

The "stats" object now has properties
xxx_sum
xxx_max
xxx_min
xxx_average
yyy_sum
yyy_max
yyy_min
yyy_average
zzz_sum
zzz_max
zzz_min
zzz_average
which can be accessed to provide the associated stats for the
given model. If you make your own field types, if they're
numeric, you'll want to add them to the set().
The constructor can also be called with the list of functions you

stats = Aggregate(Foo.objects.all(), ['sum'])

if you add an empty call (no parameters) to order_by(), it will drop
the ORDER BY clause

Post by Tim Chase
At the moment, it meets my needs, but I'd gladly consider any
feedback on this first draft version of things. Also, if you
find it useful, feel free to shred it into your own purposes as
desired.

quseryset = Model.objects.all()
queryset.aggregate( ( 'name', 'city' ), sum=( 'pay',

'some_other_field' ), avg=( 'pay', 'age' ), count=True )

Post by Tim Chase

[

{
'name' : 'some name',
'city' : 'Prague',
'sum' : { 'pay' : 30000, 'some_other_field' : 10000 },
'avg' : { 'pay' : 10000, 'age' : 30 },
'count' : 5,
},
......
]

or something like:

{
( 'some name', 'Prague') : {
'sum' : { 'pay' : 50000, 'some_other_field' : 10000 },
'avg' : { 'pay' : 10000, 'age' : 30 },
'count' : 5,
},
( 'other name', 'other city') : {
....
}
}

it seems reasonably clean interface to me:
fist argument is a tuple of fields on which to call GROUP BY, then
there are several optional kwargs: avg, sum, min, max and count

I am willing to code up a patch for this, it should be pretty
straightforward. If we would want to take it to the next level, we
could allow filter() on such aggregated queryset, that would manifest
itself in the HAVING clause...

what do you think??

Post by Tim Chase
-tkc
#############################################
from django.db import backend, connection
NUMERIC_FIELDS = set((
'FloatField',
'IntegerField',
'PositiveIntegerField',
'PositiveSmallIntegerField',
'SmallIntegerField',
))
FUNCTIONS = [
'min',
'max',
'average',
'sum',
]
return field.get_internal_type() in NUMERIC_FIELDS
self._cache = None
self.queryset = queryset.select_related(False)
self.only_type = only_type or FUNCTIONS
self._get_results()
q = self.queryset._clone()
q._order_by = '?' # can't use None as it gets
# overridden if a default Meta.ordering is specified
# Should do some magic to prevent it from adding
# the Meta.ordering if possible
select, sql, params = q._get_sql_clause()
meta = self.queryset.model._meta
# fn(table.field) AS field_fn
selects = [[
"%s(%s.%s) AS %s" % (
fn.lower(),
backend.quote_name(meta.db_table),
backend.quote_name(f.column),
backend.quote_name("%s_%s" % (f.column, fn.lower())),
) for f in meta.fields
if is_numeric_field_type(f)
] for fn in self.only_type]
# the sum(selects, []) flattens
# the list-of-lists into just a list
return sum(selects, []), sql, params
select, sql, params = self._get_aggregate_sql()
cursor = connection.cursor()
sql = "SELECT " + ",".join(select) + sql
cursor.execute(sql, params)
results = cursor.fetchone()
field_names = [d[0] for d in cursor.description]
self._cache = dict(zip(field_names, results))
return self._cache
return object.__getattribute__(self, k)
results = self._get_results()
return results[k]

--
Honza Kr�l
E-Mail: ***@gmail.com
ICQ#: 107471613
Phone: +420 606 678585

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-***@googlegroups.com
To unsubscribe from this group, send email to django-users-***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~----------~----~--

Tim Chase

2007-02-23 01:46:16 UTC