Discussion:
Aggregate class: a first-attempt
Tim Chase
2007-02-22 18:13:40 UTC
Permalink
Below I've pasted my first attempt at an aggregate function
class. Its __init__ takes a queryset (and an optional list of
aggregate functions to perform if, say, you only want the "sum"s
rather than min/max/avg/sum). Thus you should be able to do
things like
... blah = CharField(maxlength=42)
... xxx = IntegerField()
... yyy = SmallIntegerField()
... zzz = PositiveIntegerField()
...
[add some data to Foo]
stats = Aggregate(Foo.objects.filter(xxx__gt=21))
The "stats" object now has properties
xxx_sum
xxx_max
xxx_min
xxx_average
yyy_sum
yyy_max
yyy_min
yyy_average
zzz_sum
zzz_max
zzz_min
zzz_average

which can be accessed to provide the associated stats for the
given model. If you make your own field types, if they're
numeric, you'll want to add them to the set().

The constructor can also be called with the list of functions you
stats = Aggregate(Foo.objects.all(), ['sum'])
in case you just want the *_sum properties.

Future enhancements might include aggregates that don't involve
sum/avg, so one might get the min/max of a date/time/char field.

I haven't yet figured out a way to suppress the order_by portion,
so what's currently in there is an ugly hack. But it would need
to prevent the standard methods from inserting an ORDER BY clause
against a non-aggregated field.

At the moment, it meets my needs, but I'd gladly consider any
feedback on this first draft version of things. Also, if you
find it useful, feel free to shred it into your own purposes as
desired.

-tkc


#############################################
from django.db import backend, connection

NUMERIC_FIELDS = set((
'FloatField',
'IntegerField',
'PositiveIntegerField',
'PositiveSmallIntegerField',
'SmallIntegerField',
))

FUNCTIONS = [
'min',
'max',
'average',
'sum',
]

def is_numeric_field_type(field):
return field.get_internal_type() in NUMERIC_FIELDS

class Aggregate(object):
def __init__(self, queryset, only_type=None):
self._cache = None
self.queryset = queryset.select_related(False)
self.only_type = only_type or FUNCTIONS

def all(self):
self._get_results()

def _get_aggregate_sql(self):
q = self.queryset._clone()
q._order_by = '?' # can't use None as it gets
# overridden if a default Meta.ordering is specified
# Should do some magic to prevent it from adding
# the Meta.ordering if possible
select, sql, params = q._get_sql_clause()
meta = self.queryset.model._meta
# build the SELECT contents:
# fn(table.field) AS field_fn
selects = [[
"%s(%s.%s) AS %s" % (
fn.lower(),
backend.quote_name(meta.db_table),
backend.quote_name(f.column),
backend.quote_name("%s_%s" % (f.column, fn.lower())),
) for f in meta.fields
if is_numeric_field_type(f)
] for fn in self.only_type]
# the sum(selects, []) flattens
# the list-of-lists into just a list
return sum(selects, []), sql, params

def _get_results(self):
if self._cache is None:
select, sql, params = self._get_aggregate_sql()
cursor = connection.cursor()
sql = "SELECT " + ",".join(select) + sql
cursor.execute(sql, params)
results = cursor.fetchone()
field_names = [d[0] for d in cursor.description]
self._cache = dict(zip(field_names, results))
return self._cache

def __getattribute__(self, k):
try:
return object.__getattribute__(self, k)
except:
results = self._get_results()
return results[k]


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-***@googlegroups.com
To unsubscribe from this group, send email to django-users-***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
Honza Král
2007-02-22 23:40:10 UTC
Permalink
Post by Tim Chase
Below I've pasted my first attempt at an aggregate function
class. Its __init__ takes a queryset (and an optional list of
aggregate functions to perform if, say, you only want the "sum"s
rather than min/max/avg/sum). Thus you should be able to do
things like
... blah = CharField(maxlength=42)
... xxx = IntegerField()
... yyy = SmallIntegerField()
... zzz = PositiveIntegerField()
...
[add some data to Foo]
stats = Aggregate(Foo.objects.filter(xxx__gt=21))
The "stats" object now has properties
xxx_sum
xxx_max
xxx_min
xxx_average
yyy_sum
yyy_max
yyy_min
yyy_average
zzz_sum
zzz_max
zzz_min
zzz_average
which can be accessed to provide the associated stats for the
given model. If you make your own field types, if they're
numeric, you'll want to add them to the set().
The constructor can also be called with the list of functions you
stats = Aggregate(Foo.objects.all(), ['sum'])
in case you just want the *_sum properties.
Future enhancements might include aggregates that don't involve
sum/avg, so one might get the min/max of a date/time/char field.
I haven't yet figured out a way to suppress the order_by portion,
so what's currently in there is an ugly hack. But it would need
to prevent the standard methods from inserting an ORDER BY clause
against a non-aggregated field.
if you add an empty call (no parameters) to order_by(), it will drop
the ORDER BY clause
Post by Tim Chase
At the moment, it meets my needs, but I'd gladly consider any
feedback on this first draft version of things. Also, if you
find it useful, feel free to shred it into your own purposes as
desired.
quseryset = Model.objects.all()
queryset.aggregate( ( 'name', 'city' ), sum=( 'pay',
'some_other_field' ), avg=( 'pay', 'age' ), count=True )
Post by Tim Chase
[
{
'name' : 'some name',
'city' : 'Prague',
'sum' : { 'pay' : 30000, 'some_other_field' : 10000 },
'avg' : { 'pay' : 10000, 'age' : 30 },
'count' : 5,
},
......
]

or something like:

{
( 'some name', 'Prague') : {
'sum' : { 'pay' : 50000, 'some_other_field' : 10000 },
'avg' : { 'pay' : 10000, 'age' : 30 },
'count' : 5,
},
( 'other name', 'other city') : {
....
}
}

it seems reasonably clean interface to me:
fist argument is a tuple of fields on which to call GROUP BY, then
there are several optional kwargs: avg, sum, min, max and count

I am willing to code up a patch for this, it should be pretty
straightforward. If we would want to take it to the next level, we
could allow filter() on such aggregated queryset, that would manifest
itself in the HAVING clause...

what do you think??
Post by Tim Chase
-tkc
#############################################
from django.db import backend, connection
NUMERIC_FIELDS = set((
'FloatField',
'IntegerField',
'PositiveIntegerField',
'PositiveSmallIntegerField',
'SmallIntegerField',
))
FUNCTIONS = [
'min',
'max',
'average',
'sum',
]
return field.get_internal_type() in NUMERIC_FIELDS
self._cache = None
self.queryset = queryset.select_related(False)
self.only_type = only_type or FUNCTIONS
self._get_results()
q = self.queryset._clone()
q._order_by = '?' # can't use None as it gets
# overridden if a default Meta.ordering is specified
# Should do some magic to prevent it from adding
# the Meta.ordering if possible
select, sql, params = q._get_sql_clause()
meta = self.queryset.model._meta
# fn(table.field) AS field_fn
selects = [[
"%s(%s.%s) AS %s" % (
fn.lower(),
backend.quote_name(meta.db_table),
backend.quote_name(f.column),
backend.quote_name("%s_%s" % (f.column, fn.lower())),
) for f in meta.fields
if is_numeric_field_type(f)
] for fn in self.only_type]
# the sum(selects, []) flattens
# the list-of-lists into just a list
return sum(selects, []), sql, params
select, sql, params = self._get_aggregate_sql()
cursor = connection.cursor()
sql = "SELECT " + ",".join(select) + sql
cursor.execute(sql, params)
results = cursor.fetchone()
field_names = [d[0] for d in cursor.description]
self._cache = dict(zip(field_names, results))
return self._cache
return object.__getattribute__(self, k)
results = self._get_results()
return results[k]
--
Honza Kr�l
E-Mail: ***@gmail.com
ICQ#: 107471613
Phone: +420 606 678585

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-***@googlegroups.com
To unsubscribe from this group, send email to django-users-***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~----------~----~--
Tim Chase
2007-02-23 01:46:16 UTC
Permalink
Post by Honza Král
Post by Tim Chase
I haven't yet figured out a way to suppress the order_by portion,
so what's currently in there is an ugly hack. But it would need
to prevent the standard methods from inserting an ORDER BY clause
against a non-aggregated field.
if you add an empty call (no parameters) to order_by(), it will drop
the ORDER BY clause
I thought I had tried this. I'll have to give it another whack
on Monday when I get back from PyCon. I had problems with it
trying to order the resultset by the Meta.ordering properties,
when those fields weren't part of the aggregate. I'll try it
again to see if I did something wrong or if I unearthed a bug.
Post by Honza Král
Post by Tim Chase
quseryset = Model.objects.all()
queryset.aggregate( ( 'name', 'city' ), sum=( 'pay',
'some_other_field' ), avg=( 'pay', 'age' ), count=True )
I like this calling interface as an alternate method for its
fine-tuned control, but there are times it would be nice to not
have to have such tight coupling between the model and the
aggregation. Otherwise, a change in the model requires not only
a change in the template to make use of the results, but in the
view (to request various aggregate functions) to add the desired
aggregation functions.
Post by Honza Král
Post by Tim Chase
[
{
'name' : 'some name',
'city' : 'Prague',
'sum' : { 'pay' : 30000, 'some_other_field' : 10000 },
'avg' : { 'pay' : 10000, 'age' : 30 },
'count' : 5,
},
......
]
As long as the resulting data can be easily accessed directly by
a template, any number of resulting formats would work just fine.
I'd see using it something like

return render_to_response(
'test.html', {'items': Model.objects.all()}
)

and then within the template refer to

<table>
{% for item in items %}
<tr><td>{{ item.pay }}</td></tr>
{% endfor %}
<tr><td>{{ items.aggregate.sum.pay }}</td></tr>
</table>

or possibly

items = Model.objects.all()
return render_to_response('test.html', {
'items': items,
'stats': items.aggregate(),
})

and then within the template refer to

<table>
{% for item in items %}
<tr><td>{{ item.pay }}</td></tr>
{% endfor %}
<tr><td>{{ stats.sum.pay }}</td></tr>
</table>
Post by Honza Král
fist argument is a tuple of fields on which to call GROUP BY, then
there are several optional kwargs: avg, sum, min, max and count
I am willing to code up a patch for this, it should be pretty
straightforward. If we would want to take it to the next level, we
could allow filter() on such aggregated queryset, that would manifest
itself in the HAVING clause...
what do you think??
I like the ideas you present, particularly your clean interface
for the aggregated results ("results.sum.fieldname" rather than
my munged "results.fieldname_sum") and the clean interface for
specific aggregation. However, as stated above, I am hesitant to
give up the simplicity of just asking for "aggregate everything
possible" to prevent tight coupling. Both sound good to me ("but
I want both specific *and* generic" :)

Feel free to swipe any of the code I provided for any patches you
make.

I think it will be a valuable addition as evaluation can be made
lazy (thus, it has no great impact on existing code), and I've
seen it requested several times in the Django-Users ML archives
where the answer was "Django doesn't do that, but you can drop to
hand-rolled SQL to do it for you." I think it would help the
folks that don't think in SQL.

Just my musings...I appreciate your feedback.

-tkc




--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-***@googlegroups.com
To unsubscribe from this group, send email to django-users-***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
Honza Král
2007-02-23 02:26:15 UTC
Permalink
Post by Tim Chase
Post by Honza Král
Post by Tim Chase
I haven't yet figured out a way to suppress the order_by portion,
so what's currently in there is an ugly hack. But it would need
to prevent the standard methods from inserting an ORDER BY clause
against a non-aggregated field.
if you add an empty call (no parameters) to order_by(), it will drop
the ORDER BY clause
I thought I had tried this. I'll have to give it another whack
on Monday when I get back from PyCon. I had problems with it
trying to order the resultset by the Meta.ordering properties,
when those fields weren't part of the aggregate. I'll try it
again to see if I did something wrong or if I unearthed a bug.
Post by Honza Král
Post by Tim Chase
quseryset = Model.objects.all()
queryset.aggregate( ( 'name', 'city' ), sum=( 'pay',
'some_other_field' ), avg=( 'pay', 'age' ), count=True )
I like this calling interface as an alternate method for its
fine-tuned control, but there are times it would be nice to not
have to have such tight coupling between the model and the
aggregation. Otherwise, a change in the model requires not only
a change in the template to make use of the results, but in the
view (to request various aggregate functions) to add the desired
aggregation functions.
yes, and that is how it should be in my opinion. Why have the database
calculate stuff you don't want?
Post by Tim Chase
Post by Honza Král
Post by Tim Chase
[
{
'name' : 'some name',
'city' : 'Prague',
'sum' : { 'pay' : 30000, 'some_other_field' : 10000 },
'avg' : { 'pay' : 10000, 'age' : 30 },
'count' : 5,
},
......
]
As long as the resulting data can be easily accessed directly by
a template, any number of resulting formats would work just fine.
good point, the dictionary one wouldn't work very well in templates
Post by Tim Chase
I'd see using it something like
return render_to_response(
'test.html', {'items': Model.objects.all()}
)
and then within the template refer to
<table>
{% for item in items %}
<tr><td>{{ item.pay }}</td></tr>
{% endfor %}
<tr><td>{{ items.aggregate.sum.pay }}</td></tr>
</table>
this just seems wrong - doing something like this in a template just
doesn't feel right, you should calculate the aggregation in the view -
its an expensive operation you should always think through
Post by Tim Chase
or possibly
items = Model.objects.all()
return render_to_response('test.html', {
'items': items,
'stats': items.aggregate(),
})
I think this should be:

items = Model.objects.all()
return render_to_response('test.html', {
'stats_by_owner': items.aggregate( ('owner',), count=True,
sum=('visits',) max=('rating', ), min=('rating', 'visits') ),
})

{% for stat in stats_by_owner %}
Owner {{ stat.owner }} has written {{ stat.count }} articles wit
ratings between {{ stat.min.rating }} and {{ stat.max.rating }}. His
work was read {{ stat.sum.visits }} times in total and his least
popular article was read {{ stat.min.visits }} times.
{% endfor %}

the problem here is that if this should make any sense, owner should
be an object, not just a field value (user id in this case), I believe
though that this can be done...
Post by Tim Chase
and then within the template refer to
<table>
{% for item in items %}
<tr><td>{{ item.pay }}</td></tr>
{% endfor %}
<tr><td>{{ stats.sum.pay }}</td></tr>
</table>
Post by Honza Král
fist argument is a tuple of fields on which to call GROUP BY, then
there are several optional kwargs: avg, sum, min, max and count
the first argument should also be optional - in case you want the total
Post by Tim Chase
Post by Honza Král
I am willing to code up a patch for this, it should be pretty
straightforward. If we would want to take it to the next level, we
could allow filter() on such aggregated queryset, that would manifest
itself in the HAVING clause...
what do you think??
I like the ideas you present, particularly your clean interface
for the aggregated results ("results.sum.fieldname" rather than
my munged "results.fieldname_sum") and the clean interface for
specific aggregation. However, as stated above, I am hesitant to
give up the simplicity of just asking for "aggregate everything
possible" to prevent tight coupling. Both sound good to me ("but
I want both specific *and* generic" :)
and what about performance and optimization? do you really want to
calculate even the things you have no intention on using? Tight
coupling with the model here makes sense - you cannot really ask for
aggregation, if you don't know, what you want.
Post by Tim Chase
Feel free to swipe any of the code I provided for any patches you
make.
thanks, I will wait for some more feedback though, I want to get the
interface right before I start coding
Post by Tim Chase
I think it will be a valuable addition as evaluation can be made
lazy (thus, it has no great impact on existing code), and I've
seen it requested several times in the Django-Users ML archives
where the answer was "Django doesn't do that, but you can drop to
hand-rolled SQL to do it for you." I think it would help the
folks that don't think in SQL.
or the guys that don't want to rely on one DB engine
Post by Tim Chase
Just my musings...I appreciate your feedback.
same here ;)
Post by Tim Chase
-tkc
--
Honza Kr�l
E-Mail: ***@gmail.com
ICQ#: 107471613
Phone: +420 606 678585

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-***@googlegroups.com
To unsubscribe from this group, send email to django-users-***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~-----
Tim Chase
2007-02-23 11:59:19 UTC
Permalink
Post by Honza Král
Post by Tim Chase
Post by Honza Král
quseryset = Model.objects.all()
queryset.aggregate( ( 'name', 'city' ), sum=( 'pay',
'some_other_field' ), avg=( 'pay', 'age' ), count=True )
I like this calling interface as an alternate method for its
fine-tuned control, but there are times it would be nice to not
have to have such tight coupling between the model and the
aggregation. Otherwise, a change in the model requires not only
a change in the template to make use of the results, but in the
view (to request various aggregate functions) to add the desired
aggregation functions.
yes, and that is how it should be in my opinion. Why have the database
calculate stuff you don't want?
My line of reasoning for allowing it to default to everything
would be the same reason that you can simply call .all() and get
all the fields rather than calling something crazy like

.all(model.field1, model.field3, model.field7)

to just selectively bring back the three fields that you "know"
you want. It gives the template designer access to any of the
fields without needing to resort to involving the coder to make
changes to the view. Any good database will not require a
table/hash-scan for each aggregate, but will gather the aggregate
stats in a single pass, so the cost of doing N fields vs doing
N+1 fields is barely noticeable. And since N is bounded by the
number of fields in the model, this isn't apt to grow out of control.
Post by Honza Král
Post by Tim Chase
As long as the resulting data can be easily accessed directly by
a template, any number of resulting formats would work just fine.
good point, the dictionary one wouldn't work very well in templates
Unless you have some brainstorm for making dictionaries work
nicely in templates, which is another bugaboo I see come up on
the ML occasionally :)
Post by Honza Král
Post by Tim Chase
and then within the template refer to
<table>
{% for item in items %}
<tr><td>{{ item.pay }}</td></tr>
{% endfor %}
<tr><td>{{ items.aggregate.sum.pay }}</td></tr>
</table>
this just seems wrong - doing something like this in a template just
doesn't feel right, you should calculate the aggregation in the view -
its an expensive operation you should always think through
The results should be cached so that if an aggregate is needed by
the template-designer (as per above), the cost happens once with
one additional DB hit. I would be horrified if

<tr>
<td>{{ items.aggregate.sum.pay }}</td>
<td>{{ items.aggregate.sum.hours }}</td>
</tr>

triggered two database hits. :) But if the aggregate stats are
needed, they should be generated on demand, just like .all()
where no DB call actually occurs until its data is used in things
like a {% for x in data %} call. And if it's called, the DB is
only accessed once to bring back the data (okay, glossing over
fetchmany() calls here)
Post by Honza Král
Post by Tim Chase
items = Model.objects.all()
return render_to_response('test.html', {
'items': items,
'stats': items.aggregate(),
})
items = Model.objects.all()
return render_to_response('test.html', {
'stats_by_owner': items.aggregate( ('owner',), count=True,
sum=('visits',) max=('rating', ), min=('rating', 'visits') ),
})
As mentioned, I like both syntaxes. :) They seem to return
somewhat different things though. The .aggregate() call would
only ever return one row for the entire dataset. The
parameterized version would return multiple rows based on the
presence of a column-name tuple as the first parameter.
Otherwise, code starts getting unwieldy when you actually do want
all your fields. I have a model with 20+ fields in it for
various types of money (FloatField) and time-durations
(PositiveIntegerField) involved in transactions (you've gotta
love billing information for cell-phone usage). Without the
ability to say "sum and average everything you can", the code
would start looking like

items.aggregate((,), sum=(
'field1',
'field2',
...
'field20',
'field21',
), average=(
'field1',
'field2',
...
'field20',
'field21',
))

which makes for one very ugly view. Even with a folding editor
like Vim, it's still there.
Post by Honza Král
{% for stat in stats_by_owner %}
Owner {{ stat.owner }} has written {{ stat.count }} articles wit
ratings between {{ stat.min.rating }} and {{ stat.max.rating }}. His
work was read {{ stat.sum.visits }} times in total and his least
popular article was read {{ stat.min.visits }} times.
{% endfor %}
the problem here is that if this should make any sense, owner should
be an object, not just a field value (user id in this case), I believe
though that this can be done...
I agree...when you have the leading tuple, this is a lovely
syntax. It does have some rough edges where one's model
foolishly has fields such as "min" or "max" and suddenly you've
got a clash of ontologies.
Post by Honza Král
the first argument should also be optional - in case you want the total
Makes sense.
Post by Honza Král
Post by Tim Chase
I think it will be a valuable addition as evaluation can be made
lazy (thus, it has no great impact on existing code), and I've
seen it requested several times in the Django-Users ML archives
where the answer was "Django doesn't do that, but you can drop to
hand-rolled SQL to do it for you." I think it would help the
folks that don't think in SQL.
or the guys that don't want to rely on one DB engine
Do any of the supported DB engines not support groupings? I
thought at least that much was pretty standard. But yes,
requiring a developer to learn a second language (even if it's as
useful as knowing SQL is...) goes against the grain of Django's
"get out of the developer's way and let them just be productive
in Python yet do crazy-powerful stuff" philosophy.

Thanks again for your feedback and ideas.

-tkc




--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-***@googlegroups.com
To unsubscribe from this group, send email to django-users-***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
Honza Král
2007-02-23 15:12:40 UTC
Permalink
Post by Tim Chase
Post by Honza Král
Post by Tim Chase
Post by Honza Král
quseryset = Model.objects.all()
queryset.aggregate( ( 'name', 'city' ), sum=( 'pay',
'some_other_field' ), avg=( 'pay', 'age' ), count=True )
I like this calling interface as an alternate method for its
fine-tuned control, but there are times it would be nice to not
have to have such tight coupling between the model and the
aggregation. Otherwise, a change in the model requires not only
a change in the template to make use of the results, but in the
view (to request various aggregate functions) to add the desired
aggregation functions.
yes, and that is how it should be in my opinion. Why have the database
calculate stuff you don't want?
My line of reasoning for allowing it to default to everything
would be the same reason that you can simply call .all() and get
all the fields rather than calling something crazy like
.all(model.field1, model.field3, model.field7)
this is something completely different - it is easier for the db to
just take all the fields
Post by Tim Chase
to just selectively bring back the three fields that you "know"
you want. It gives the template designer access to any of the
fields without needing to resort to involving the coder to make
changes to the view. Any good database will not require a
table/hash-scan for each aggregate, but will gather the aggregate
stats in a single pass, so the cost of doing N fields vs doing
N+1 fields is barely noticeable. And since N is bounded by the
number of fields in the model, this isn't apt to grow out of control.
actually this is not true - if you wish to do just one MIN(), it will
only use index, no full table scan, myISAM can even use index for
COUNT() and so on....

even if it wasn't noticeable at all, I would still be against it,
because its plain wrong, it wastes resources and allows developers not
to think of what they are doing.
Post by Tim Chase
Post by Honza Král
Post by Tim Chase
As long as the resulting data can be easily accessed directly by
a template, any number of resulting formats would work just fine.
good point, the dictionary one wouldn't work very well in templates
Unless you have some brainstorm for making dictionaries work
nicely in templates, which is another bugaboo I see come up on
the ML occasionally :)
Post by Honza Král
Post by Tim Chase
and then within the template refer to
<table>
{% for item in items %}
<tr><td>{{ item.pay }}</td></tr>
{% endfor %}
<tr><td>{{ items.aggregate.sum.pay }}</td></tr>
</table>
this just seems wrong - doing something like this in a template just
doesn't feel right, you should calculate the aggregation in the view -
its an expensive operation you should always think through
The results should be cached so that if an aggregate is needed by
the template-designer (as per above), the cost happens once with
one additional DB hit. I would be horrified if
<tr>
<td>{{ items.aggregate.sum.pay }}</td>
<td>{{ items.aggregate.sum.hours }}</td>
</tr>
triggered two database hits. :) But if the aggregate stats are
needed, they should be generated on demand, just like .all()
where no DB call actually occurs until its data is used in things
like a {% for x in data %} call. And if it's called, the DB is
only accessed once to bring back the data (okay, glossing over
fetchmany() calls here)
yes, sure, but you are talking two different things here... I say that
it seems wrong to initiate an aggregation from template. Besides, try
doing similar with current querysets - it won't get cached because
every call to filter(), order_by() etc will produce a NEW queryset
with no connection to the old one, so even if you would actually run
the latter, the former will have no access to the data.
Post by Tim Chase
Post by Honza Král
Post by Tim Chase
items = Model.objects.all()
return render_to_response('test.html', {
'items': items,
'stats': items.aggregate(),
})
items = Model.objects.all()
return render_to_response('test.html', {
'stats_by_owner': items.aggregate( ('owner',), count=True,
sum=('visits',) max=('rating', ), min=('rating', 'visits') ),
})
As mentioned, I like both syntaxes. :) They seem to return
somewhat different things though. The .aggregate() call would
only ever return one row for the entire dataset. The
parameterized version would return multiple rows based on the
presence of a column-name tuple as the first parameter.
Otherwise, code starts getting unwieldy when you actually do want
all your fields. I have a model with 20+ fields in it for
various types of money (FloatField) and time-durations
(PositiveIntegerField) involved in transactions (you've gotta
love billing information for cell-phone usage). Without the
ability to say "sum and average everything you can", the code
would start looking like
items.aggregate((,), sum=(
'field1',
'field2',
...
'field20',
'field21',
), average=(
'field1',
'field2',
...
'field20',
'field21',
))
well, in this extreme example, I would suggest you use a list:
fields = [ f.name for f in items.model._meta.fields ]
items.aggregate( sum=fields, average=fields, min=fields, max=fields )

not that bad, is it?
Post by Tim Chase
which makes for one very ugly view. Even with a folding editor
like Vim, it's still there.
Post by Honza Král
{% for stat in stats_by_owner %}
Owner {{ stat.owner }} has written {{ stat.count }} articles wit
ratings between {{ stat.min.rating }} and {{ stat.max.rating }}. His
work was read {{ stat.sum.visits }} times in total and his least
popular article was read {{ stat.min.visits }} times.
{% endfor %}
the problem here is that if this should make any sense, owner should
be an object, not just a field value (user id in this case), I believe
though that this can be done...
I agree...when you have the leading tuple, this is a lovely
syntax. It does have some rough edges where one's model
foolishly has fields such as "min" or "max" and suddenly you've
got a clash of ontologies.
true, but only if you would want to aggregate by those fields, we
could work around that by simply moving the grouping fields to a
separate dictionary as well:

[
{
'grouped_by' : { 'owner' : XX },
'min' : { 'pay' : 100 },
'max' : { 'pay' : 101},
},
.....
]
Post by Tim Chase
Post by Honza Král
the first argument should also be optional - in case you want the total
Makes sense.
Post by Honza Král
Post by Tim Chase
I think it will be a valuable addition as evaluation can be made
lazy (thus, it has no great impact on existing code), and I've
seen it requested several times in the Django-Users ML archives
where the answer was "Django doesn't do that, but you can drop to
hand-rolled SQL to do it for you." I think it would help the
folks that don't think in SQL.
or the guys that don't want to rely on one DB engine
Do any of the supported DB engines not support groupings? I
thought at least that much was pretty standard. But yes,
requiring a developer to learn a second language (even if it's as
useful as knowing SQL is...) goes against the grain of Django's
"get out of the developer's way and let them just be productive
in Python yet do crazy-powerful stuff" philosophy.
every sql implementation I know of support group by, the problem is
with other things like quoting etc.
Post by Tim Chase
Thanks again for your feedback and ideas.
-tkc
--
Honza Kr�l
E-Mail: ***@gmail.com
ICQ#: 107471613
Phone: +420 606 678585

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-***@googlegroups.com
To unsubscribe from this group, send email to django-users-***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~--
Tim Chase
2007-02-23 20:47:58 UTC
Permalink
Post by Honza Král
Post by Tim Chase
items.aggregate((,), sum=(
'field1',
'field2',
...
'field20',
'field21',
), average=(
'field1',
'field2',
...
'field20',
'field21',
))
fields = [ f.name for f in items.model._meta.fields ]
items.aggregate( sum=fields, average=fields, min=fields, max=fields )
not that bad, is it?
A nice way to do this, and a good compromise on a clean syntax that is
also easy to do what I need.
Post by Honza Král
true, but only if you would want to aggregate by those fields, we
could work around that by simply moving the grouping fields to a
[
{
'grouped_by' : { 'owner' : XX },
'min' : { 'pay' : 100 },
'max' : { 'pay' : 101},
},
.....
]
Another excellent idea. Pleasantly simple yet a good way to encapsulate
the info while still removing the ambiguity of field-names that conflict
with aggregate-function names. I'm still mildly concerned about being
able to access the contents from within a template, wherein one could
access it by member notation suggested earlier:

stats.grouped_by.owner
stats.sum.pay
stats.max.pay

-tkc






--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-***@googlegroups.com
To unsubscribe from this group, send email to django-users-***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
Honza Král
2007-02-24 19:00:07 UTC
Permalink
I created a ticket for this:

http://code.djangoproject.com/ticket/3566

any comments are welcome
Post by Tim Chase
Post by Honza Král
Post by Tim Chase
items.aggregate((,), sum=(
'field1',
'field2',
...
'field20',
'field21',
), average=(
'field1',
'field2',
...
'field20',
'field21',
))
fields = [ f.name for f in items.model._meta.fields ]
items.aggregate( sum=fields, average=fields, min=fields, max=fields )
not that bad, is it?
A nice way to do this, and a good compromise on a clean syntax that is
also easy to do what I need.
Post by Honza Král
true, but only if you would want to aggregate by those fields, we
could work around that by simply moving the grouping fields to a
[
{
'grouped_by' : { 'owner' : XX },
'min' : { 'pay' : 100 },
'max' : { 'pay' : 101},
},
.....
]
Another excellent idea. Pleasantly simple yet a good way to encapsulate
the info while still removing the ambiguity of field-names that conflict
with aggregate-function names. I'm still mildly concerned about being
able to access the contents from within a template, wherein one could
stats.grouped_by.owner
stats.sum.pay
stats.max.pay
-tkc
--
Honza Kr�l
E-Mail: ***@gmail.com
ICQ#: 107471613
Phone: +420 606 678585

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-***@googlegroups.com
To unsubscribe from this group, send email to django-users-***@googlegroups.com
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~--
Loading...