Skip to content

Missing/duplicate records when using ordering and LimitOffsetPagination #6886

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
5 of 6 tasks
pierremonico opened this issue Aug 23, 2019 · 7 comments · May be fixed by #8494
Closed
5 of 6 tasks

Missing/duplicate records when using ordering and LimitOffsetPagination #6886

pierremonico opened this issue Aug 23, 2019 · 7 comments · May be fixed by #8494

Comments

@pierremonico
Copy link

pierremonico commented Aug 23, 2019

Checklist

  • I have verified that that issue exists against the master branch of Django REST framework.
  • I have searched for similar issues in both open and closed tickets and cannot find a duplicate.
  • This is not a usage question. (Those should be directed to the discussion group instead.)
  • This cannot be dealt with as a third party library. (We prefer new functionality to be in the form of third party libraries where possible.)
  • I have reduced the issue to the simplest possible case.
  • I have included a failing test as a pull request. (If you are unable to do so we can still accept the issue.)

Steps to reproduce

  1. Create a simple model and order by date:
class Event(models.Model):
    event_date = models.DateTimeField(null=True, blank=True)

    class Meta:
        ordering = ['-event_date']
  1. Create simple serializer and viewset:
class EventSerializer(serializers.ModelSerializer):
    class Meta:
        model = Event
        fields = ('event_date',)
class EventViewSet(viewsets.ModelViewSet):
    queryset = Event.objects.all()
    serializer_class = EventSerializer
  1. Turn on pagination:
REST_FRAMEWORK = {
    ...,
    'DEFAULT_PAGINATION_CLASS': 'rest_framework.pagination.LimitOffsetPagination'
}
  1. Expose endpoint:
router = routers.DefaultRouter()
router.register(r'events', views.EventsViewSet)

urlpatterns = [
    path('', include(router.urls))
]
  1. Make paginated calls:
    /api/events/?limit=25&offset=0
    /api/events/?limit=25&offset=25
    /api/events/?limit=25&offset=50

Expected behavior

Since I have 61 records in my test database, each unique record should be displayed on one of the pages. E.g. the event with the event date should be visible on the first page.

Actual behavior

Some records are not visible on any of the pages, and some records are duplicated across pages. The total count on the response objects is 61 as expected. But some records are simply not displayed.

Setting ordering=['-event_date', 'id'] solves the problem.

NB: my event model actually subclasses an abstract model that has many more fields. I don't know if that has an actual impact but I tried to simplify the example for this issue.

Thanks.

@rpkilby
Copy link
Member

rpkilby commented Aug 23, 2019

Hi @pierremonico - it sounds like something is causing your queryset to become randomly ordered (hence the duplicate items across pages). Looking at the code for LimitOffsetPagination, there isn't anything that modifies the queryset beyond the application fo the limit/offset, so my assumption is that issue is caused elsewhere.

return list(queryset[self.offset:self.offset + self.limit])

I would try logging the sql queries to see if the event query looks sensible.

@n2ygk
Copy link
Contributor

n2ygk commented Aug 23, 2019

Make sure your models have a default ordering field or pagination will return seemingly random ordering. This is documented more clearly for CursorPagination but applies to all pagination AFAIK.

class Something(models.Model):
    field_one = models.CharField(max_length=14)
    filed_two = models.TextField()

    class Meta:
        ordering = ['field_one']

@xordoquy
Copy link
Collaborator

If adding 'id' to the ordering then it likely means you have NULL event_date.
NULL values would not be considered identique by PostgresQL but might be returned in any order I think.
Unless we have a reproductible test case we can analyse, there isn't much we can do.

@pierremonico
Copy link
Author

@rpkilby thanks for the logging tip, I'll try that.
@n2ygk thanks, but as stated in my example, I have ordering set. Adding id solves the problem (last paragraph of my description).
@xordoquy that is indeed correct and probably the cause of the problem!

@oxan
Copy link
Contributor

oxan commented Sep 18, 2019

I just also hit this issue, and it's not limited to NULL values. If you order by any set of fields that isn't unique for every record, it's not guaranteed that the results are always in the same order (see e.g. Django or MySQL documentation). This is particularly nasty when rows are moved across a page boundary, as this causes missing or duplicate records when moving between pages.

It's almost impossible to create a reproducible test case for this, as you need to hit exactly the right conditions for your database engine to shuffle rows around between queries. In my case the load on the server seems to influence this (probably related to whether the data remains in the cache between the queries), but there might be other factors as well.

Nevertheless, I think this is at least undesired behaviour and arguably a bug. Especially if you use OrderingFilter, there's no way to be sure that a unique set of fields. I can't (and don't want to) require all clients to always specify a unique field set, so this should be handled serverside.

Proposal to resolve this:

  • Document this properly. Currently only CursorPagination has the requirement for a unique ordering field documented, while the order schemes require it as well.
  • Add some way to add a "fallback" field(s) that is always added to order_by call (as last field), to ensure that the field set is unique. Most people will probably use the primary key for this, so maybe set that as default value as well.

@oxan
Copy link
Contributor

oxan commented Oct 28, 2019

For those also hitting this problem, you can workaround it by configuring this class in the DEFAULT_FILTER_BACKENDS configuration (instead of rest_framework.filters.OrderingFilter):

from rest_framework.filters import OrderingFilter


# Work around DRF issue #6886 by always adding the primary key as last order field.
class StableOrderingFilter(OrderingFilter):
    def get_ordering(self, request, queryset, view):
        ordering = super(StableOrderingFilter, self).get_ordering(request, queryset, view)
        pk = queryset.model._meta.pk.name

        if ordering is None:
            return ('-' + pk, )

        return list(ordering) + ['-' + pk]

@shantanu-vashishtha
Copy link

shantanu-vashishtha commented Feb 3, 2022

@oxan I am getting null in ordering, shouldn't it always return null as we are overriding the ordering ? As we have a flow where we don't apply order_by in queryset and did it only in CursorPagination by using orderding field

class SomeClassPagination(CustomPagination):
    ordering = "-transacted_at"
    def get_paginated_response(self, data):
        return Response(
            data={
                "records": data,
                "next_cursor": self.get_next_cursor(),
                "previous_cursor": self.get_previous_cursor(),
                "count": len(self.page),
            }
        )

class CustomPagination(CursorPagination)

Elkasitu pushed a commit to Elkasitu/django-rest-framework that referenced this issue May 20, 2022
The ordering recommendations given for the CursorPagination scheme
actually apply to all pagination schemes, an unsuspecting developer that
implements the more common `LimitOffsetPagination` or
`PageNumberPagination` classes is unlikely to be aware of the importance
of consistent ordering.

This commit moves the `Details and limitations` section out of the
`CursorPagination` section and puts it as the very first subsection of
the `Pagination` page so that it's one of the first things that
developers see.

Some examples of inconsistencies as well as how to deal with them are
given, and an extra way to change the ordering of a paginated view is
provided.

Fixes encode#6886
mbaluch pushed a commit to RedHatProductSecurity/osidb that referenced this issue May 24, 2022
Before this commit, Flaw objects are ordered by the created_dt column,
which is not unique and can change, furthermore "hacks" have been
performed on said column multiple times due to bugs in the past.

In order for ordering to be stable and reproducible, at least one of the
columns in the group by must be unique and never changing, otherwise
stuff like DRF Pagination can break [1], leading to missing and or
duplicate records.

This commit also applies this ordering to Affect and Tracker objects
since they have their own paginated endpoints too.

[1] encode/django-rest-framework#6886 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants