Logo

Tornblue

  • Archive
  • RSS
  • Ask me anything

App Engine: How to do an efficient OR query?

Let’s say you want to find all of the current user’s facebook friends who are using your app. You’ve stored the facebook id of each connected user on a User entity and the Facebook API returns to you a large list of facebook ids of the friends of the current user. To look them up in your datastore you basically need a giant OR or IN query on the User entity. Here’s a traditional SQL example of how you might do this:

SELECT * FROM User WHERE facebook_id = '100213' OR facebook_id = '104413' 
     OR facebook_id = '150273' ... etc.

The way the App Engine documentation says to query for multiple possible values is to use an IN query. Example:

employees = User.gql('WHERE facebook_id IN :1', ['100213', '104413', '150273'])

The problem with this is that it quickly becomes slow and maxes out at 30 values. The reason is that behind the scenes it’s actually running an equals query for each possible value. This of course won’t work with thousands of facebook ids you might need to look up.

One way to handle this efficiently in App Engine is to create a separate entity in your datastore for just facebook id lookups. You can call it something like FacebookIdIndex. You will need a FacebookIdIndex entity for each User that has an assigned facebook_id. Here’s what the model might look like:

class FacebookIdIndex(db.Model)
    user = db.ReferenceProperty(User, required=True)

Then, when you create each FacebookIdIndex you set the key_name to the facebook id and set the reference property of the corresponding app User entity. Like so:

facebook_index = FacebookIdIndex(key_name=str(facebook_id), user=user)
facebook_index.put()

This creates an an entity with the key name of the facebook id and one property with a reference back to the User entity we care about. (NOTE: Make sure you convert the facebook_id to a string just in case you get it as an integer. Key names can not be integer values.)

Now, to look up all Users who are friends with the current user in your app, you can do a bulk key name lookup. This is extremely fast compared to separate queries and there is no limit on the number of key names you can lookup. Given a list of facebook ids, here’s how it’s done:

# Get all matching FacebookIdIndex's in one quick bulk key name lookup.
indexes = FacebookIdIndex.get_by_key_name(facebook_ids)

# Now get a list of all of the User entity keys that match the facebook ids.
user_keys = []
for index in indexes:
    # Check if a match. The bulk get by key name will return None for non matches.
    if index:
        # Get the key of the User entity we want.
        user_key = FacebookIdIndex.user.get_value_for_datastore(index)
        # Check if entity exists just to make sure User entity still exists.
        if user_key:
            user_keys.append(user_key)

# If any users matched, bulk get all of them.
if user_keys:
    users = User.get(user_keys)

# Found all friends of the current user using the app!
return users

The get_value_for_datastore() call is used to check for non matches and also to prevent App Engine from making individual get calls for each User entity. We want to delay the get for each User entity for the bulk get() call, which is much faster than doing individual gets for each User entity.

I hope that all makes sense. We’ve now gone from multiple slow GQL queries to two quick bulk get calls.

    • #google app engine
    • #python
    • #gql
  • 3 months ago
  • Comments
  • Permalink
  • Share
    Tweet

The Most Important Page In All App Engine Land

http://code.google.com/appengine/docs/python/datastore/functions.html

For my first post I’ll talk about what I think is a very useful docs page that is sort of buried in the Python App Engine docs. It’s a page that describes all of the functions available in the google.appengine.ext.db package. These are extremely useful functions that I find most App Engine beginners never discover or never use. I’m only going to talk about some of the functions, but all of them can be really useful. I’m not going to talk about run_in_transaction() since that function is covered in the Transactions section of the App Engine docs.

Functions like delete(), delete_async(), get(), get_async(), put(), and put_async() can really help improve the latency in your app. These are functions that let you edit entities in bulk, which can really speed up your datastore requests by sending only one request (RPC) to the datastore instead of many.

Typical case:

employees_trained = Employee.gql("WHERE account IN :1",
                                 training_registration_list)
for e in employees_trained:
    e.new_hire_training_completed = True
    e.put()

Now, instead of doing an individual put() for each employee, we can create create a list of entities to put later all at once:

employees_trained = Employee.gql("WHERE account IN :1",
                                 training_registration_list)
to_save = []
for e in employees_trained:
    e.new_hire_training_completed = True
    to_save.append(e)

db.put(to_save)

If the number of employees to modify is high and you’re saving them individually then it can really slow down your page loads. Editing them in bulk removes a lot of the overhead of many trips back and forth with the datastore. You can even save entities of different model types. Just append any to be updated model instances to the to_save list and call db.put() all at once.

To save even more time you can use the async functions. These will asynchronously start reading/writing to the datastore while the code after it gets to execute. This is so key when you’re doing a lot of processing in one page request.

Quick example:

employees_trained = Employee.gql("WHERE account IN :1",
                                 training_registration_list)
to_save = []
for e in employees_trained:
    e.new_hire_training_completed = True
    to_save.append(e)

employee_async = db.put_async(to_save)

...
# Do a bunch of stuff.
...

# Now check for async result at end of code to guarantee it was written.
employee_async.check_success()

Next up, model_to_protobuf() and model_from_protobuf() can be really useful when trying to serialize an entity to save either in memcache or some other datastore. We specifically use it to store model instances in memcache.

Here’s an example of how to use these methods to set and get model instances from memcache for an Employee model instance:

employee = None
pb_employee = memcache.get(memcache_key)
if pb_employee:
  employee = db.model_from_protobuf(entity_pb.EntityProto(pb_employee))

if not employee:
  employee = Employee.get('<key>')
  memcache.set(memcache_key, db.model_to_protobuf(employee).Encode())

Finally, allocate_ids(), allocate_ids_async(), and allocate_id_range() can be useful for when you need to know what the datastore IDs of entities will be before you create them or to avoid conflicts when transferring data across datastores. Bill Katz gives a good explanation of how those functions are useful for datastore backup and restore (link)

That’s it! You should always check back on that page. Google keeps adding more and more useful functions to the db package.

For my next App Engine post I’ll talk about how to use custom models to handle large OR conditional queries like how to bulk lookup every entity that matches each of a list of different values.

    • #app engine
    • #python
    • #db package
  • 3 months ago
  • 2
  • Comments
  • Permalink
  • Share
    Tweet

Dropping Knowledge, Whether it’s accurate or not.

I’ve been working on Android and App Engine development for a while now and I feel like a lot of the answers I was looking for on the web weren’t there. So I’m going to start writing Android and Google App Engine tips here. Hopefully it will save at least someone a lot of time. I’m a terrible writer so hopefully that doesn’t get in the way.

I’m not claiming to be an expert so I might be wrong from time to time. If so let me know and it will make me better.

On with it!

  • 3 months ago
  • Comments
  • Permalink
  • Share
    Tweet

About

Android & App Engine Tips

Links

Firespotter Labs
Nosh

Pages

  • About Me & This Blog

Me, Elsewhere

  • @vorby on Twitter

Twitter

loading tweets…

Following

  • RSS
  • Random
  • Archive
  • Ask me anything
  • Mobile

Effector Theme by Carlo Franco.

Powered by Tumblr