Companies generally evaluate three types of technologies when they consider data export: data integration tools, data sharing products, and data export platforms.
When it comes to moving data around, data engineers have many tools at their disposal, including ETL tools (e.g., Fivetran, Airbyte), Reverse ETL products (e.g., Hightouch, Census), and iPaaS/application integration platforms (e.g., Workato, Boomi).
None of these tools provides a solid foundation for customer-facing data export features, and none is marketed as a solution to this problem.
However, since these tools are widely used within software companies, they often end up in the early stages of the evaluation process. To help you save time on research, we’ll outline some of their shortcomings below.
Each of the following issues is problematic on its own. Combined, they undermine any data export setup built on these tools:
Most public APIs include rate limits to prevent overuse. That means data integration tools, which are built on APIs, can only replicate data as fast as the API allows.
We estimate that most tools can update roughly 6,666 records per minute. At that rate, a data integration tool tops out at about 9.6 million records per day, which caps the size of any table it can keep fresh at roughly 10 million records.
If your customers want millions of records updated in near real-time, API-based tools aren’t the way to go.
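To make that estimate concrete, here's the back-of-envelope arithmetic behind it. The per-minute figure is our estimate above, not a published vendor limit, so treat the numbers as illustrative:

```python
# Back-of-envelope throughput math for an API-based integration tool.
# The per-minute figure is an estimate, not a published vendor limit.
records_per_minute = 6_666

records_per_day = records_per_minute * 60 * 24
print(f"Records per day: {records_per_day:,}")  # ~9,599,040, i.e. roughly 10M

# Flip it around: how long would a full sync of a 200M-record table take?
table_size = 200_000_000
days_for_full_sync = table_size / records_per_day
print(f"Days to sync 200M records: {days_for_full_sync:.1f}")  # ~20.8 days
```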
Connectors aren’t developed by data owners. They’re made by the tool providers and third-party consultants.
As George Fraser, Founder of Fivetran, shared in a recent Reddit post, the process of converting APIs into connectors is complicated:
“The thing about ETL is it’s a “shallow ramp.” You can build a connector to Postgres/Salesforce/whatever that works 80% of the time in a week. You can get it to 95% in a few months. To get to 99.9 is fiendishly difficult, it takes us (5T) years for some connectors.”
Multiple Reddit threads document these performance problems, including this one on Airbyte.
Another common theme in customer complaints is high cost. For example, according to Fivetran's website, it would cost $1,070 to upsert 2 million records per month and $11,345 to upsert 200 million records per month (assuming API limits even allowed that much throughput).
Remember, software companies need to profit from their products. Industry gross margins are typically 70 to 90%, so to preserve even a 70% margin on that $1,070 fee, a software company would need to charge at least $3,567 per month ($1,070 ÷ 0.30) to update 2 million records. That's far more than most customers would be willing to pay.
Building data export on top of one of these products would result in handing most, if not all, of the profits to a third party.
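To make the margin arithmetic explicit, here's a small sketch. The Fivetran fees come from the figures above, and the 70% margin is the low end of the industry range:

```python
# Minimum price a vendor must charge to preserve a given gross margin
# on top of a third-party tool's fee. Fees taken from the figures above.
def min_price(cost: float, gross_margin: float) -> float:
    """Price such that (price - cost) / price == gross_margin."""
    return cost / (1 - gross_margin)

fivetran_cost_2m = 1_070      # upserting 2M records/month
fivetran_cost_200m = 11_345   # upserting 200M records/month

print(f"${min_price(fivetran_cost_2m, 0.70):,.0f}/month")    # ~$3,567
print(f"${min_price(fivetran_cost_200m, 0.70):,.0f}/month")  # ~$37,817
```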
Since data integration tools are built for internal users, not external customers, they lack key functionality that teams need in order to support customers.
For instance, data integration tools have no native support for multi-tenant databases: to extract data for an individual customer, the data would need to be manually separated before it could be transferred. Error reporting and performance monitoring can't easily be tied to individual customers either.
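To illustrate what "manually separated" means in practice, here's a minimal sketch. The table and column names (`events`, `tenant_id`) are hypothetical; the point is that every export needs a per-customer filter and a per-customer destination, neither of which these tools manage for you:

```python
# Hypothetical sketch: exporting one tenant's rows from a shared table.
# A data integration tool syncs whole tables; isolating a single
# customer's data is left entirely to you.
import sqlite3  # stand-in for your warehouse driver


def export_tenant(conn: sqlite3.Connection, tenant_id: str) -> list:
    # Every query must be scoped to the tenant; a missed filter
    # leaks other customers' data.
    cur = conn.execute(
        "SELECT * FROM events WHERE tenant_id = ?", (tenant_id,)
    )
    return cur.fetchall()

# You'd repeat this (plus error reporting, retries, and monitoring,
# all tracked per customer) for every tenant and every destination.
```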
Teams would need to run a thorough gap analysis of product functionality, end to end, before they could expose a data integration tool to data export customers.
Many data warehouses, including Snowflake, Redshift, BigQuery, and Databricks, offer some form of data sharing.
These data warehouse vendors have a similar approach to data sharing. Companies can securely share selected objects (databases, tables, views, functions) from their accounts with other accounts.
For example, if ACME Software keeps its data in Snowflake, it could share select data with customers who also have accounts on Snowflake. Customers who don’t use Snowflake can set up special accounts, called reader accounts, to view data.
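For a sense of what this looks like, here's a minimal sketch of creating a Snowflake share, driven from Python with the snowflake-connector-python package. The database, schema, table, and account identifiers are hypothetical placeholders:

```python
# Minimal sketch of Snowflake data sharing (identifiers are hypothetical).
# Requires snowflake-connector-python and a role that can create shares.
import snowflake.connector

conn = snowflake.connector.connect(
    user="admin", password="...", account="acme-xy12345"
)
cur = conn.cursor()

# Create a share and expose one table to it.
cur.execute("CREATE SHARE IF NOT EXISTS acme_share")
cur.execute("GRANT USAGE ON DATABASE acme_db TO SHARE acme_share")
cur.execute("GRANT USAGE ON SCHEMA acme_db.public TO SHARE acme_share")
cur.execute("GRANT SELECT ON TABLE acme_db.public.usage_events TO SHARE acme_share")

# Grant a customer's Snowflake account access to the share.
cur.execute("ALTER SHARE acme_share ADD ACCOUNTS = customer_org.customer_acct")
```

On the customer's side, consuming the share is a single statement (CREATE DATABASE ... FROM SHARE); there's no pipeline in between.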
This approach has clear benefits:

- There are no pipelines to build or data to transform; the data is instantly available.
- Changes your customers make have no impact on your data.
- Your customers don't pay the data warehouse provider for storage, only for analysis.

It also has notable limitations:

- Data can't be shared with customers on other platforms.
- Data must be replicated and stored in each region it's shared in.
- Consumers can't share or replicate the data they receive, and they can lose access if the provider revokes the share.
If the data you want to share is already on one of these platforms and your customers are on the same platform in the same region, starting to share data is relatively straightforward. However, if your customer base is more geographically diverse, you may need to set up, manage, and pay for multiple pipelines. Further, if you need to support multiple platforms, you’ll need to build, manage, and pay for additional services.
Databricks works differently. It is built on an open protocol for secure data sharing called Delta Sharing. This protocol allows organizations to share data with others in a secure and scalable manner, regardless of the platform or technology the recipient is using.
Delta Sharing can be used in two ways:
1. Customers can share data within Databricks, just as they can with the other data warehouse vendors. However, Databricks isn't used as widely as the other platforms, so this probably won't be an effective way to share data with your customers.
2. Data can be shared outside of Databricks as long as each destination supports the Delta Sharing protocol. Data analysis tools like Apache Spark, Power BI, and Tableau support the protocol. However, competing data warehouses, like Snowflake, BigQuery, and Redshift, do not, and popular databases, like PostgreSQL and MySQL, don't support it either.
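To show what that platform independence means in practice, here's a minimal consumer-side sketch using the open-source delta-sharing Python connector. The profile file and the share/schema/table names are hypothetical; in practice the data provider distributes the profile file, which contains the sharing server URL and an access token:

```python
# Minimal Delta Sharing consumer sketch (pip install delta-sharing).
# The profile file and share/schema/table names are hypothetical.
import delta_sharing

# A .share profile file holds the sharing server URL and a bearer token,
# typically distributed by the data provider.
profile = "acme.share"

# Discover what the provider has shared with us.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table into a pandas DataFrame; no Databricks account needed.
df = delta_sharing.load_as_pandas(f"{profile}#acme_share.public.usage_events")
print(df.head())
```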
Also, whenever customers query shared data, the company sharing it incurs compute fees if the data is hosted with a cloud provider.
If your goal is to push data to your customers’ BI tools, Databricks may be a good fit. However, if you’d like to push data to all of your customers’ data warehouses, databases, and object storage services, Databricks probably isn’t a good fit.
Prequel was founded to help product teams build customer-facing data export features. It's the only solution on the market designed to solve this use case, and it includes every feature outlined in this report and more.
Every month, customers like Zuora, Gong, and LogRocket use Prequel to send billions of records to customers on 20+ data platforms, including Snowflake, BigQuery, Redshift, and Postgres.
The following section breaks down each core capability that enterprise customers require. Check our detailed guide here for more.
Choosing the right tool for building data export capabilities depends on your specific needs. While data integration tools and data-sharing products cover parts of the problem, they fail to deliver the core capabilities that enterprises require.
Prequel is the only platform that was built to solve this problem. It includes everything enterprises need and more.