Discussion:
Use an aggregation function's filter with aggregate of a value from an annotation with a subquery
Daniel Gilge
2018-12-06 00:16:30 UTC
Permalink
Hi,

I think I've found a bug. But I'm not sure and I never opened a ticket for
Django before. So, I first wanted to ask here.

I'm using Django 2.0 and this doesn't work:

subquery = Subquery(
Vote.objects
.filter(comment=OuterRef('pk'))
.values('value')[:1]
)

Comment.objects.annotate(vote=subquery).aggregate(
count=Count('vote', filter=Q(vote__gte=4)),
)

It results in a quite useless AssertionError:

django/db/models/expressions.py


168 169 def set_source_expressions(self, exprs):--> 170 assert len(exprs) == 0 171 172 def _parse_expressions(self, *expressions):
AssertionError:



Vars:
exprs

[Ref(__col8, Col(U0, myapp.Vote.comment))]

self

<django.db.models.expressions.Subquery object at 0x1080077b8>



It probably doesn't make sense because I simplified it. Why I'm using
subqueries is that I have several sums involved in the query:

subquery = Subquery(
Vote.objects
.filter(comment=OuterRef('pk'))
.values('comment_id')
.annotate(sum=Sum('value', filter=Q(**filter_fields)))
.values('sum')[:1]
)

However, what I had to remove is a filter statement and then it works:

Comment.objects.annotate(vote=subquery).aggregate(
count=Count('vote'),
)

Any advice appreciated!
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+***@googlegroups.com.
To post to this group, send email to django-***@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/d405cbf2-0a57-4639-aac1-4376a09099b8%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Matthew Pava
2018-12-06 02:47:30 UTC
Permalink
Though I can’t address the issue of whether it is a bug, there is a note in the documentation:

https://docs.djangoproject.com/en/2.1/topics/db/aggregation/#filtering-on-annotations
“Avoid using the filter argument
”

You may want to consider conditional expressions (Case 
 When) to achieve your goal.
https://docs.djangoproject.com/en/2.1/ref/models/conditional-expressions/




From: django-***@googlegroups.com [mailto:django-***@googlegroups.com] On Behalf Of Daniel Gilge
Sent: Wednesday, December 5, 2018 6:17 PM
To: Django users
Subject: Use an aggregation function's filter with aggregate of a value from an annotation with a subquery

Hi,

I think I've found a bug. But I'm not sure and I never opened a ticket for Django before. So, I first wanted to ask here.

I'm using Django 2.0 and this doesn't work:

subquery = Subquery(
Vote.objects
.filter(comment=OuterRef('pk'))
.values('value')[:1]
)

Comment.objects.annotate(vote=subquery).aggregate(
count=Count('vote', filter=Q(vote__gte=4)),
)

It results in a quite useless AssertionError:


django/db/models/expressions.py


168

169 def set_source_expressions(self, exprs):

--> 170 assert len(exprs) == 0

171

172 def _parse_expressions(self, *expressions):



AssertionError:


Vars:
exprs


[Ref(__col8, Col(U0, myapp.Vote.comment))]

self


<django.db.models.expressions.Subquery object at 0x1080077b8>



It probably doesn't make sense because I simplified it. Why I'm using subqueries is that I have several sums involved in the query:

subquery = Subquery(
Vote.objects
.filter(comment=OuterRef('pk'))
.values('comment_id')
.annotate(sum=Sum('value', filter=Q(**filter_fields)))
.values('sum')[:1]
)

However, what I had to remove is a filter statement and then it works:

Comment.objects.annotate(vote=subquery).aggregate(
count=Count('vote'),
)

Any advice appreciated!
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+***@googlegroups.com<mailto:django-users+***@googlegroups.com>.
To post to this group, send email to django-***@googlegroups.com<mailto:django-***@googlegroups.com>.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/d405cbf2-0a57-4639-aac1-4376a09099b8%40googlegroups.com<https://groups.google.com/d/msgid/django-users/d405cbf2-0a57-4639-aac1-4376a09099b8%40googlegroups.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout.
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+***@googlegroups.com.
To post to this group, send email to django-***@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/085f36ad5090413192ff37392f5ea536%40iss2.ISS.LOCAL.
For more options, visit https://groups.google.com/d/optout.
Simon Charette
2018-12-06 05:37:35 UTC
Permalink
Hello Daniel,

It's hard to tell was causes the exception without digging a bit more but
based on your mention of multiple sums I assume you are trying to work
around the cross join of multiple table annotations?

I understand you simplified your query but from what I can see right now
it can be expressed as

commens_count = Comment.objects.annotate(
votes_count=Count('votes', filter=votes_filter),
).filter(
votes_count__gte=4
).count()

That should result in

SELECT COUNT(*) FROM (
SELECT 1
FROM comment
LEFT OUTER JOIN vote ON (vote.comment_id = comment.id)
GROUP BY comment.id
HAVING COUNT(*) FILTER (WHERE ...) > 4
)

There's tickets tracking adding subquery support to aggregate functions but
using subqueries doesn't seem to be necessary here?

Cheers,
Simon
Post by Daniel Gilge
Hi,
I think I've found a bug. But I'm not sure and I never opened a ticket for
Django before. So, I first wanted to ask here.
subquery = Subquery(
Vote.objects
.filter(comment=OuterRef('pk'))
.values('value')[:1]
)
Comment.objects.annotate(vote=subquery).aggregate(
count=Count('vote', filter=Q(vote__gte=4)),
)
django/db/models/expressions.py
exprs
[Ref(__col8, Col(U0, myapp.Vote.comment))]
self
<django.db.models.expressions.Subquery object at 0x1080077b8>
It probably doesn't make sense because I simplified it. Why I'm using
subquery = Subquery(
Vote.objects
.filter(comment=OuterRef('pk'))
.values('comment_id')
.annotate(sum=Sum('value', filter=Q(**filter_fields)))
.values('sum')[:1]
)
Comment.objects.annotate(vote=subquery).aggregate(
count=Count('vote'),
)
Any advice appreciated!
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+***@googlegroups.com.
To post to this group, send email to django-***@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/44dcbf0e-e766-4ba9-a9e6-05ebf1a1709f%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
Daniel Gilge
2018-12-06 15:31:44 UTC
Permalink
Hi everyone,

@Matthew and @Simon Thanks for your answers!
Post by Simon Charette
I assume you are trying to work
around the cross join of multiple table annotations?
This is correct.
Post by Simon Charette
There's tickets tracking adding subquery support to aggregate functions but
using subqueries doesn't seem to be necessary here?
Does this mean I can’t use subqueries in aggregate right now? (Doesn’t seem to be the case because when I replace the filter in my code below with the commented filter, the query produces expected results so far.) I had a look at following ticket but it seems to be another case:

Add support for aggregation through subqueries
https://code.djangoproject.com/ticket/28296 <https://code.djangoproject.com/ticket/28296>
Post by Simon Charette
You may want to consider conditional expressions (Case 
 When) to achieve your goal.
https://docs.djangoproject.com/en/2.1/ref/models/conditional-expressions/ <https://docs.djangoproject.com/en/2.1/ref/models/conditional-expressions/>
These functions attracted my attention when I was looking for a solution for the described exception. I want to try to adjust my query and use them. (But I thought that what I found is a bug. Therefore, I decided to write this e-mail.) I will see if conditional expressions help.

I actually didn’t want to bother you with the current code of my complete query but I should probably send it to you to understand why I use subqueries and filters that way. However, I’m not experienced with votes and such complex queries and there might be a better solution. But I didn’t want to ask you to take this work.

This subquery is also used in other queries. Therefore, I placed it in a function:

def get_vote_subquery(role, user=None):
filter_fields = {'role': role}
if user:
filter_fields.update({'user': user})
subquery = Subquery(
models.Vote.objects
.filter(segment=OuterRef('pk'))
.values('segment_id')
.annotate(sum=Sum('value', filter=Q(**filter_fields)))
.values('sum')[:1]
)
return subquery

The following code is slightly simplified. I didn’t manage to get the necessary data with one query so far. (I’ll use Redis in the future to get the votes. Therefore, it doesn’t have to be the most effective query for now).

@property
def table_of_contents(self):
if self._table_of_contents:
return self._table_of_contents

# Objects
headings = list(
models.Segment.objects
.filter(tag__in=IMPORTANT_HEADINGS)
.order_by('position')
.values(
'pk',
'position',
'content',
)
)

# Translation state (aggregation because annotation not possible)
aggregates = {}
required = REQUIRED_APPROVALS[self.language]
for i, h in enumerate(headings, start=1):
filters = {'position__gte': h['position']}
try:
filters['position__lt'] = headings[i]['position']
except IndexError:
# The last item
pass
aggregates['translation_done_{}'.format(i)] = Count(
'progress', # this is a field of the model
# filter=Q(**filters),
filter=Q(**filters) & (
Q(progress__gte=TRANSLATION_DONE)
| Q(reviewers_vote__gte=1)
),
)
aggregates['review_done_{}'.format(i)] = Count(
'reviewers_vote',
# filter=Q(**filters),
filter=Q(reviewers_vote__gte=required['reviewer'], **filters),
)
aggregates['trustee_done_{}'.format(i)] = Count(
'trustees_vote',
# filter=Q(**filters),
filter=Q(trustees_vote__gte=required['trustee'], **filters),
)
aggregates = (
models.Segment.objects
.annotate(
reviewers_vote=get_vote_subquery('reviewer'),
trustees_vote=get_vote_subquery('trustee'),
)
.aggregate(**aggregates)
)

# Assign some values
for i, h in enumerate(headings, start=1):
try:
h['segments'] = headings[i]['position'] - h['position']
except IndexError:
# The last item
h['segments'] = self.segments_count - h['position'] + 1
h['translation_done'] = aggregates['translation_done_{}'.format(i)]
h['review_done'] = aggregates['review_done_{}'.format(i)]
h['trustee_done'] = aggregates['trustee_done_{}'.format(i)]
self._table_of_contents = headings
return self._table_of_contents


Cheers,

Daniel
Post by Simon Charette
Hello Daniel,
It's hard to tell was causes the exception without digging a bit more but
based on your mention of multiple sums I assume you are trying to work
around the cross join of multiple table annotations?
I understand you simplified your query but from what I can see right now
it can be expressed as
commens_count = Comment.objects.annotate(
votes_count=Count('votes', filter=votes_filter),
).filter(
votes_count__gte=4
).count()
That should result in
SELECT COUNT(*) FROM (
SELECT 1
FROM comment
LEFT OUTER JOIN vote ON (vote.comment_id = comment.id <http://comment.id/>)
GROUP BY comment.id <http://comment.id/>
HAVING COUNT(*) FILTER (WHERE ...) > 4
)
There's tickets tracking adding subquery support to aggregate functions but
using subqueries doesn't seem to be necessary here?
Cheers,
Simon
Hi,
I think I've found a bug. But I'm not sure and I never opened a ticket for Django before. So, I first wanted to ask here.
subquery = Subquery(
Vote.objects
.filter(comment=OuterRef('pk'))
.values('value')[:1]
)
Comment.objects.annotate(vote=subquery).aggregate(
count=Count('vote', filter=Q(vote__gte=4)),
)
django/db/models/expressions.py
168
--> 170 assert len(exprs) == 0
171
exprs
[Ref(__col8, Col(U0, myapp.Vote.comment))]
self
<django.db.models.expressions.Subquery object at 0x1080077b8>
subquery = Subquery(
Vote.objects
.filter(comment=OuterRef('pk'))
.values('comment_id')
.annotate(sum=Sum('value', filter=Q(**filter_fields)))
.values('sum')[:1]
)
Comment.objects.annotate(vote=subquery).aggregate(
count=Count('vote'),
)
Any advice appreciated!
--
You received this message because you are subscribed to a topic in the Google Groups "Django users" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/django-users/JfZguPo-G_g/unsubscribe <https://groups.google.com/d/topic/django-users/JfZguPo-G_g/unsubscribe>.
Visit this group at https://groups.google.com/group/django-users <https://groups.google.com/group/django-users>.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/44dcbf0e-e766-4ba9-a9e6-05ebf1a1709f%40googlegroups.com <https://groups.google.com/d/msgid/django-users/44dcbf0e-e766-4ba9-a9e6-05ebf1a1709f%40googlegroups.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout <https://groups.google.com/d/optout>.
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+***@googlegroups.com.
To post to this group, send email to django-***@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/226ABB19-2300-450C-8FEC-DA844682493F%40gmail.com.
For more options, visit https://groups.google.com/d/optout.
Loading...