---
title: "Scaling data processing for Gravity Forms: How ditching PHP made it 100x faster"
date: 2025-10-15
author: "Doeke Norg"
link: "https://staging.gravitykit.com/scaling-data-processing-for-gravity-forms/"
---

# Scaling data processing for Gravity Forms: How ditching PHP made it 100x faster

Big forms, big data, big problems. Gravity Forms’ flexible data structure is a gift until you need to crunch thousands of entries fast. So we built a faster way to process data. **Up to 100x faster**. Here's how we did it.

## Slow data is no data

At GravityKit, we handle a lot of Gravity Forms entries through all of our add-ons. To name a few: **[GravityView](/products/gravityview/)** displays entries however you like, **[GravityCharts](/products/gravitycharts/)** plots entries as various charts and graphs, and **[GravityMath](/products/gravitymath/)** performs calculations with that data.

What these plugins have in common is that they are user-facing. You want this data to be visible to your users as fast as possible. If a page takes too long to load, the user is already gone, or at the very least frustrated.

But the more entries a form has, the more data needs to be filtered, calculated, and summarized. And this comes at the cost of speed.

We want our customers and their users to be happy. So we set out to tackle the challenges that come with the Gravity Forms data structure, to deliver fast and scalable metrics for forms of all sizes.

## The problem: Slow and memory-heavy data crunching

Gravity Forms is one of the best form plugins for WordPress. It's highly configurable with a range of powerful field types. To achieve this level of configurability, Gravity Forms relies on a data structure known as **Entity-Attribute-Value (EAV)**.

### EAV: A primer

EAV is a data structure that uses two database tables. In the case of Gravity Forms, these are:

1. `gf_entry` – contains the unique ID of the entry (the **Entity**), when it was created and last updated, its status (whether it is active, trashed, or spam), and various other data.
2. `gf_entry_meta` – references the `form_id`, along with:
    - `entry_id` – a reference to the unique entry ID in `gf_entry` (reference to the **Entity**)
    - `meta_key` – holds the ID for a (sub) field of the form (the **Attribute**)
    - `meta_value` – contains the value for the (sub) field (the **Value**)
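
To make this concrete, here is a simplified sketch of how a single entry might be stored (illustrative only: column lists are abbreviated, and a real install prefixes the table names):

```sql
-- gf_entry: one row per entry (the Entity).
--   id | form_id | status | date_created
--   42 | 5       | active | 2025-01-01 12:00:00

-- gf_entry_meta: one row per stored value (Attribute + Value).
--   entry_id | meta_key | meta_value
--   42       | 1        | 'One'   -- value for field ID 1
--   42       | 2        | '20'    -- value for field ID 2
```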

This data structure allows a form to change over time while using the exact same database. For example, whenever you add a new field to your existing form, the data can be stored in a new row with the new field ID and the provided value. When removing a field, all the rows connected to that form and field can be removed to clear unwanted data.

Additionally, you can store any extra data you might need. A lot of add-ons use this to enrich the entry data so they can do something specific with it. For example, GravityView stores an `is_approved` value to filter out unapproved entries.

**Note:** The EAV pattern is not only used by Gravity Forms, but also by WordPress itself. This allows you to create different post types with their own data, without having to create a database table per post type.

While this data structure allows for high configurability, it also has limitations, especially when you're working with thousands of entries! Let's take a closer look at those limitations.

### Limitation 1: Retrieving entries requires multiple queries

Because the entries and the data are stored in separate tables, it requires **at least** two queries to retrieve the entries.

First, you have to retrieve the required entry rows from the `gf_entry` table with a specific `form_id`. Then, you must retrieve all the rows from `gf_entry_meta` for that `form_id` where the `entry_id` is one of the entry IDs retrieved from `gf_entry`.
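
Sketched as SQL (illustrative: table names are unprefixed here, and the real queries built by Gravity Forms include more columns and conditions), the two steps look roughly like this:

```sql
-- Step 1: fetch a batch of entry rows for the form.
SELECT * FROM gf_entry
WHERE form_id = 5 AND status = 'active'
LIMIT 500;

-- Step 2: fetch every meta row belonging to those entries.
SELECT entry_id, meta_key, meta_value
FROM gf_entry_meta
WHERE form_id = 5
  AND entry_id IN ( /* IDs returned by step 1 */ );
```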

When Gravity Forms retrieves entry data, it processes all the rows, one at a time, and combines all the data for the same entry into a single array.

**Note:** During the combining process, Gravity Forms will also allow the form fields to change the stored data before it is returned. These changes *may* also require additional database queries, API calls, or other costly transformations.

### Limitation 2: Entries require memory

The array of entries that Gravity Forms returns requires memory. This memory is what PHP uses to read the values. The available memory is limited by the server running PHP and its configuration.

Efficiently processing large numbers of entries requires retrieving them in smaller batches and releasing memory after each batch before proceeding to the next (which is why you need **at least two queries**).

Because of this batching, the *next* query needs to wait until the current batch is processed. Which leads to our next problem: **time**.

### Limitation 3: Processing entries takes time

Imagine having 10,000 entries for a form that has 10 fields. This means that there might be around 100,000 `gf_entry_meta` rows (empty fields will not add a row). With a batch size of 500 entries, it would take 20 batches to process all entries.

Depending on the complexity of the query (whether you are filtering out certain values), it could take up to 50-150ms per batch or even 200ms+ on slower servers.

Add to this the time it takes to create the entry arrays and to process those entries, and you could be looking at a total time of 100-300ms per batch, or **2-6 seconds** to process all entries.

**Note:** The calculation is indicative of an average server. The calculation also does not take into consideration additional queries from WordPress or Gravity Forms.
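
As a quick sanity check, the arithmetic above can be sketched in a few lines of PHP (the per-batch costs are the rough estimates from the text, not benchmarks):

```php
$entries    = 10000;
$batch_size = 500;
$batches    = (int) ceil( $entries / $batch_size ); // 20 batches

// Rough per-batch cost: query + building entry arrays + processing.
$min_seconds = $batches * 100 / 1000; // 2 seconds
$max_seconds = $batches * 300 / 1000; // 6 seconds
```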

## The reality: Our products were underperforming

GravityCharts aggregates field information and plots it on a chart. For example, a bar chart might show how often certain options were selected, or how many entries were submitted per day.

When it came to displaying large numbers of entries, performance was constrained by every limitation discussed above.

### Too much information

For the aggregation, we only need the values of the fields we are targeting. Because Gravity Forms retrieves entries in their entirety, we had to pull in (and then throw away) a lot of useless data, wasting precious memory.

### Memory consumption forced batching

To solve the memory issue, we began retrieving our entries in batches. By using a Generator in PHP, we could query the data in batches, while the code could still act as if it were retrieving all values at once.

A Generator is an iterable class in PHP that is very memory efficient. For more information, we recommend reading [Generators over arrays](https://doeken.org/blog/generators-over-arrays).
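
As a minimal, self-contained sketch of that pattern (the `$fetch` callback here stands in for the real batched entry query; the names are ours, not part of Gravity Forms):

```php
// Yields items batch by batch while the caller iterates as if it were a
// single list. Only one batch is held in memory at a time.
function batched_entries( callable $fetch_batch, int $batch_size = 500 ): Generator {
	$offset = 0;
	do {
		$batch = $fetch_batch( $offset, $batch_size );
		yield from $batch;
		$offset += $batch_size;
	} while ( count( $batch ) === $batch_size );
}

// A fake data source stands in for the database query.
$fake_db = range( 1, 1200 );
$fetch   = fn( int $offset, int $limit ): array => array_slice( $fake_db, $offset, $limit );

$total = 0;
foreach ( batched_entries( $fetch ) as $entry ) {
	$total += $entry; // Process one "entry"; earlier batches can be released.
}
// $total is now 720600 (the sum of 1..1200), fetched in three batches of 500/500/200.
```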

Even though we fixed the memory issue, it still took a long time for the data to be retrieved and calculated. The most frustrating part was that the calculations weren't even that complex. We were only counting/summing the values or counting the entries per date. This made us wonder: "Should we be doing this in PHP? Maybe there is an alternative.”

### Caching to the rescue?

To prevent calculating the same outcome over and over, we thought about introducing a cache. However, caches are reset periodically. Therefore, some users would still need to endure the calculation time.

It had to be better for everyone, not only the lucky ones who happen to hit the cache. We desperately needed a better solution. Then it hit us: the database could already handle this effortlessly using aggregation functions like `COUNT`, `SUM`, `AVG`, even `MIN` and `MAX`!

## Our solution: Use the right tool for the job

While working on the internal package we use at GravityKit that powers our advanced Conditional Logic for Gravity Forms, we realized that **we need to let the database do the aggregation work!** Databases already know how to aggregate values. In fact, they are optimized for this work!

We quickly tested a proof-of-concept that retrieved the data in a **single** query, and found it was not only **way faster**, it also used **significantly less memory in PHP**, since the database did the heavy lifting.
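
For instance, a grouped count over the EAV structure fits in one query (illustrative SQL against the tables described above; field ID `1` is the grouping field):

```sql
SELECT m.meta_value AS category, COUNT(*) AS `count`
FROM gf_entry AS e
JOIN gf_entry_meta AS m
  ON m.entry_id = e.id AND m.meta_key = '1'
WHERE e.form_id = 5 AND e.status = 'active'
GROUP BY m.meta_value;
```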

All excited, we got to work on an API that we could use internally in all our packages.

To make this work, we built a fast and reusable layer on top of Gravity Forms' query builder: `GF_Query`. (Fun fact: GravityKit originally created that system.)

We extended it to support SQL-powered summaries through a clean and developer-friendly API.

Now we can get grouped counts, sums, and averages in a single query, without ever loading full entries into PHP.

It's clean, expressive, and *super* fast! But it wasn't easy.

For the full technical breakdown, see the **Developer Notes** below.

### Technical overview for developers

Since GravityKit originally created the `GF_Query` class that powers Gravity Forms entry retrieval, we know it quite well. It is what makes our advanced Conditional Logic so powerful, and why we wanted to use it as the core for our aggregate abstraction. This would allow us to retrieve aggregates based on existing conditional logic.

We created two classes, `Query` and `Field`, that can form an expressive query to get the required aggregated data from the database. We added chainable methods and made the classes immutable, so we could reuse parts of the code without thinking twice.

Here's an example of what we created. This example assumes a form (ID: `5`) with two form fields:

1. Category, which is a select field that has three values: One, Two, and Three
2. Number, which is a number field

```php
$gf_query = new GF_Query( 5, ['status' => 'active'] );
$query = Query::from( $gf_query );
$category_field = Field::from_ids( 5, '1' );
$number_field = Field::from_ids( 5, '2' );
```

As you can see, we start with a regular `GF_Query` that already has conditions applied; in this case, we are only interested in active entries.

First, let's count all the entries for this form:

```php
$query->count(); // Returns e.g. [['count' => 15 ]]
```

As you can see, it returns an array of arrays. This is because, more often than not, you will want the count grouped by some field. The key for the count is `count`; that pattern is the same for all aggregates.

The real power is getting the count, grouped by a category. In this example, the value of the selected category is stored under `1`: the field ID.

```php
$query->group_by( $category_field )->count();
/**
 * Returns e.g.
 * [
 *     [ '1' => 'One', 'count' => '5' ],
 *     [ '1' => 'Two', 'count' => '4' ],
 *     [ '1' => 'Three', 'count' => '6' ],
 * ]
 */
```

Now, let's do some aggregations. Every aggregate function needs a reference to the field you want to perform the calculation on. We have `min`, `max`, `avg`, and `sum`. But we can also combine them all in a **single query** with `all()`. We can even group by multiple fields by adding additional `Field` arguments to the `group_by()` method. You could, for example, group by category per date.

```php
$by_category = $query->group_by( $category_field /*, $date_field */ );
$results     = $by_category->all( $number_field );

/**
 * Returns e.g.
 * [
 *     [ '1' => 'One', 'min' => '1', 'max' => '5', 'avg' => '4', 'sum' => '20', 'count' => '5' ],
 *     [ '1' => 'Two', 'min' => '1', 'max' => '5', 'avg' => '4', 'sum' => '20', 'count' => '4' ],
 *     [ '1' => 'Three', 'min' => '1', 'max' => '5', 'avg' => '4', 'sum' => '20', 'count' => '6' ],
 * ]
 */
```

Because `Query` is immutable, when we create the `$by_category` variable, it is a clone of `$query` with the `group_by` applied. This means we still have access to the original `$query` configuration for a different use case.

For GravityCharts, it meant that we could now perform a query like this:

```php
$results = Query::from( $existing_gf_query )
	->group_by( Field::from_field( $existing_group_by_field ) )
	->sum();
```

The reason we wrapped existing `GF_Field` instances in a `Field` class is so we could add helper methods that make the abstraction even more concise and expressive. It allowed us to add sorting methods like `->asc()` and `->desc()`, and consolidate the required transformations to their SQL output, based on the provided `GF_Field`.

It also allowed us to add a `with_alias()` helper, which replaces the ID of the field with a specific key in the results.

Here is a complete example:

```php
$category = Field::from_ids( $form_id, $field_id )->with_alias( 'category' );
$result = Query::from( $gf_query )
  ->group_by( $category )
  ->order_by( $category->asc() )
  ->count();

/**
 * Sorted by category name in ascending order.
 * [
 *  ['category' => 'One', 'count' => 5],
 *  ['category' => 'Three', 'count' => 6],
 *  ['category' => 'Two', 'count' => 6],
 * ]
 */
```

## Challenges we had to overcome

While the idea of letting MySQL do "the heavy lifting" is pretty simple, there were a few things PHP used to do for us that we had to migrate into SQL.

### Currency formats

Some values include currency symbols. Others store decimals and thousands differently depending on the default currency. To fix this, we sanitize and normalize all of this *inside the SQL query*, so the aggregate input is consistent. Then we reapply the correct formatting after the query is finished.
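
A simplified illustration of the idea (illustrative SQL; the real normalization handles many more currency formats than this US-style example):

```sql
-- Strip the symbol and thousands separators, e.g. '$1,234.50' -> 1234.50,
-- so SUM() receives a consistent numeric input.
SELECT SUM(
  CAST( REPLACE( REPLACE( meta_value, '$', '' ), ',', '' ) AS DECIMAL(18, 2) )
) AS total
FROM gf_entry_meta
WHERE form_id = 5 AND meta_key = '2';
```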

### JSON-based multi-select fields

Not every field stores its value(s) in the same way. If a field has a single value, it uses a single row in `gf_entry_meta`. If it has multiple values (like checkboxes or a complex field like an address), it uses multiple rows.

However, a multi-select field stores its *multiple* selected values as JSON in a *single* row. Because of that, we couldn't `GROUP BY` the individual values directly. But we found two solutions for this problem:

1. When available (on MySQL 8+, or MariaDB equivalent), we `JOIN` the JSON data with the `JSON_TABLE` function. This creates a temporary table on which it can still perform the `GROUP BY` statement.
2. For older MySQL versions, we created a backup PHP solution that would post-process the aggregates that MySQL produces. While this still requires some "manual processing", the amount of data is very small, which means minimal overhead.
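
The first approach, sketched as SQL (illustrative; assumes the multi-select field with ID `3` stores a JSON array of strings):

```sql
-- Expand each JSON array into one row per selected value, then group.
SELECT jt.choice, COUNT(*) AS `count`
FROM gf_entry_meta AS m,
     JSON_TABLE(
       m.meta_value,
       '$[*]' COLUMNS ( choice VARCHAR(255) PATH '$' )
     ) AS jt
WHERE m.form_id = 5 AND m.meta_key = '3'
GROUP BY jt.choice;
```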

### Adjusting for time zones

The `date_created` and `date_updated` columns on the `gf_entry` table are always stored in GMT. To offset those dates to the correct time zone (as defined in the WordPress settings), we previously used PHP `DateTime` and `DateTimeZone` objects during the entry processing.

In order to let MySQL handle this, we check for the availability of time zones and `CONVERT_TZ` in the database. If this is not available, the aggregation will still be possible, but the results could be slightly off. In that case, we have the option to show a user notification.
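
When `CONVERT_TZ` is available, counting entries per local day becomes something like this (illustrative; `'Europe/Amsterdam'` stands in for the configured WordPress time zone, and named zones require the MySQL time zone tables to be loaded):

```sql
SELECT DATE( CONVERT_TZ( date_created, 'GMT', 'Europe/Amsterdam' ) ) AS `day`,
       COUNT(*) AS `count`
FROM gf_entry
WHERE form_id = 5
GROUP BY `day`;
```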

## What does this mean for you? Faster GravityKit products!

This abstraction has been working so well that even with 100,000+ entries, retrieving the values takes mere milliseconds. And while we are not against caching, it simply hasn't been necessary. This means the aggregates are *always* up to date, and we don't have to make trade-offs in functionality or limit the data to a smaller subset.

By avoiding PHP and processing entries directly in the database, we've managed to:

- Make the processing up to 100x faster
- Eliminate the need for batching
- Reduce memory usage significantly
- Preserve our user-friendly conditional logic

## Try it out

Try [GravityMath](/products/gravitymath) or [GravityCharts](/products/gravitycharts) to experience this increased speed for yourself!