Companies generally evaluate three types of technologies when they consider data export: data integration tools, data sharing products, and data export platforms.
When it comes to moving data around, data engineers have many tools at their disposal, including ETL tools (e.g., Fivetran, Airbyte), Reverse ETL products (e.g., Hightouch, Census), and iPaaS/application integration platforms (e.g., Workato, Boomi).
None of these tools provides a solid foundation for customer-facing data export features, and none is marketed as a solution to this problem.
However, since these tools are widely used within software companies, they often end up in the early stages of the evaluation process. To help you save time on research, we’ll outline some of their shortcomings below.
Each of the following issues is problematic on its own. Combined, they undermine any data export setup built on these tools:
Most public APIs include rate limits to prevent overuse. That means data integration tools, which are built on APIs, can only replicate data as fast as the API allows.
We estimate that most tools can update roughly 6,666 records per minute. At that rate, a data integration tool tops out at about 9.6 million records per day, which caps the size of any table it can keep fresh at roughly 10 million records.
If your customers want millions of records updated in near real-time, API-based tools aren’t the way to go.
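To make that estimate concrete, here's the back-of-envelope arithmetic behind it. The per-minute figure is our estimate above, not a published vendor limit, so treat the numbers as illustrative:

```python
# Back-of-envelope throughput math for an API-based integration tool.
# The per-minute figure is an estimate, not a published vendor limit.
records_per_minute = 6_666

records_per_day = records_per_minute * 60 * 24
print(f"Records per day: {records_per_day:,}")  # ~9,599,040, i.e. roughly 10M

# Flip it around: how long would a full sync of a 200M-record table take?
table_size = 200_000_000
days_for_full_sync = table_size / records_per_day
print(f"Days to sync 200M records: {days_for_full_sync:.1f}")  # ~20.8 days
```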
Connectors aren’t developed by data owners. They’re made by the tool providers and third-party consultants.
As George Fraser, Founder of Fivetran, shared in a recent Reddit post, the process of converting APIs into connectors is complicated:
“The thing about ETL is it’s a “shallow ramp.” You can build a connector to Postgres/Salesforce/whatever that works 80% of the time in a week. You can get it to 95% in a few months. To get to 99.9 is fiendishly difficult, it takes us (5T) years for some connectors.”
Multiple Reddit threads document these performance problems, including this one on Airbyte.
Another common theme in customer complaints is high cost. For example, according to Fivetran's website, it would cost $1,070 to upsert 2 million records per month and $11,345 to upsert 200 million records per month (assuming API limits even allowed that much throughput).
Remember, software companies need to profit from their products. Industry gross margins are typically 70 to 90%, so to preserve even a 70% margin on that $1,070 fee, a software company would need to charge at least $3,567 per month ($1,070 ÷ 0.30) to update 2 million records. That's far more than most customers would be willing to pay.
Building data export on top of one of these products would result in handing most, if not all, of the profits to a third party.
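To make the margin arithmetic explicit, here's a small sketch. The Fivetran fees come from the figures above, and the 70% margin is the low end of the industry range:

```python
# Minimum price a vendor must charge to preserve a given gross margin
# on top of a third-party tool's fee. Fees taken from the figures above.
def min_price(cost: float, gross_margin: float) -> float:
    """Price such that (price - cost) / price == gross_margin."""
    return cost / (1 - gross_margin)

fivetran_cost_2m = 1_070      # upserting 2M records/month
fivetran_cost_200m = 11_345   # upserting 200M records/month

print(f"${min_price(fivetran_cost_2m, 0.70):,.0f}/month")    # ~$3,567
print(f"${min_price(fivetran_cost_200m, 0.70):,.0f}/month")  # ~$37,817
```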
Since data integration tools are built for internal users, not external customers, they lack key functionality that teams need in order to support customers.
For instance, data integration tools have no native support for multi-tenant databases: to extract data for an individual customer, the data would need to be manually separated before it could be transferred. Error reporting and performance monitoring can't easily be tied to individual customers either.
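To illustrate what "manually separated" means in practice, here's a minimal sketch. The table and column names (`events`, `tenant_id`) are hypothetical; the point is that every export needs a per-customer filter and a per-customer destination, neither of which these tools manage for you:

```python
# Hypothetical sketch: exporting one tenant's rows from a shared table.
# A data integration tool syncs whole tables; isolating a single
# customer's data is left entirely to you.
import sqlite3  # stand-in for your warehouse driver


def export_tenant(conn: sqlite3.Connection, tenant_id: str) -> list:
    # Every query must be scoped to the tenant; a missed filter
    # leaks other customers' data.
    cur = conn.execute(
        "SELECT * FROM events WHERE tenant_id = ?", (tenant_id,)
    )
    return cur.fetchall()

# You'd repeat this (plus error reporting, retries, and monitoring,
# all tracked per customer) for every tenant and every destination.
```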
Teams would need to run a thorough gap analysis of product functionality, end to end, before they could expose a data integration tool to data export customers.
Many data warehouses, including Snowflake, Redshift, BigQuery, and Databricks, offer some form of data sharing.
These data warehouse vendors have a similar approach to data sharing. Companies can securely share selected objects (databases, tables, views, functions) from their accounts with other accounts.
For example, if ACME Software keeps its data in Snowflake, it could share select data with customers who also have accounts on Snowflake. Customers who don’t use Snowflake can set up special accounts, called reader accounts, to view data.
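For a sense of what this looks like, here's a minimal sketch of creating a Snowflake share, driven from Python with the snowflake-connector-python package. The database, schema, table, and account identifiers are hypothetical placeholders:

```python
# Minimal sketch of Snowflake data sharing (identifiers are hypothetical).
# Requires snowflake-connector-python and a role that can create shares.
import snowflake.connector

conn = snowflake.connector.connect(
    user="admin", password="...", account="acme-xy12345"
)
cur = conn.cursor()

# Create a share and expose one table to it.
cur.execute("CREATE SHARE IF NOT EXISTS acme_share")
cur.execute("GRANT USAGE ON DATABASE acme_db TO SHARE acme_share")
cur.execute("GRANT USAGE ON SCHEMA acme_db.public TO SHARE acme_share")
cur.execute("GRANT SELECT ON TABLE acme_db.public.usage_events TO SHARE acme_share")

# Grant a customer's Snowflake account access to the share.
cur.execute("ALTER SHARE acme_share ADD ACCOUNTS = customer_org.customer_acct")
```

On the customer's side, consuming the share is a single statement (CREATE DATABASE ... FROM SHARE); there's no pipeline in between.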
This approach has clear benefits:

- There are no pipelines to build or data to transform; the data is instantly available.
- Changes your customers make have no impact on your data.
- Your customers don't pay the data warehouse provider for storage, only for analysis.

It also has notable limitations:

- Data can't be shared with customers on other platforms.
- Data must be replicated and stored in each region it's shared in.
- Consumers can't share or replicate the data they receive, and they can lose access if the provider revokes the share.
If the data you want to share is already on one of these platforms and your customers are on the same platform in the same region, starting to share data is relatively straightforward. However, if your customer base is more geographically diverse, you may need to set up, manage, and pay for multiple pipelines. Further, if you need to support multiple platforms, you’ll need to build, manage, and pay for additional services.
Databricks works differently. It is built on an open protocol for secure data sharing called Delta Sharing. This protocol allows organizations to share data with others in a secure and scalable manner, regardless of the platform or technology the recipient is using.
Delta Sharing can be used in two ways:
1. Customers can share data within Databricks, just as they can with the other data warehouse vendors. However, Databricks isn't used as widely as the other platforms, so this probably won't be an effective way to share data with your customers.
2. Data can be shared outside of Databricks as long as each destination supports the Delta Sharing protocol. Data analysis tools like Apache Spark, Power BI, and Tableau support the protocol. However, competing data warehouses, like Snowflake, BigQuery, and Redshift, do not, and popular databases, like PostgreSQL and MySQL, don't support it either.
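To show what that platform independence means in practice, here's a minimal consumer-side sketch using the open-source delta-sharing Python connector. The profile file and the share/schema/table names are hypothetical; in practice the data provider distributes the profile file, which contains the sharing server URL and an access token:

```python
# Minimal Delta Sharing consumer sketch (pip install delta-sharing).
# The profile file and share/schema/table names are hypothetical.
import delta_sharing

# A .share profile file holds the sharing server URL and a bearer token,
# typically distributed by the data provider.
profile = "acme.share"

# Discover what the provider has shared with us.
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one shared table into a pandas DataFrame; no Databricks account needed.
df = delta_sharing.load_as_pandas(f"{profile}#acme_share.public.usage_events")
print(df.head())
```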
Also, whenever customers query shared data, the company sharing it incurs compute fees if the data is hosted with a cloud provider.
If your goal is to push data to your customers’ BI tools, Databricks may be a good fit. However, if you’d like to push data to all of your customers’ data warehouses, databases, and object storage services, Databricks probably isn’t a good fit.
Prequel was founded to help product teams build customer-facing data export features. It's the only solution on the market designed to solve this use case, and it includes every feature outlined in this report and more.
Every month, customers like Zuora, Gong, and LogRocket use Prequel to send billions of records to customers on 20+ data platforms, including Snowflake, BigQuery, Redshift, and Postgres.
The following section breaks down each core capability that enterprise customers require. Check our detailed guide here for more.
Choosing the right tool for building data export capabilities depends on your specific needs. While data integration tools and data-sharing products cover parts of the problem, they fail to deliver the core capabilities that enterprises require.
Prequel is the only platform that was built to solve this problem. It includes everything enterprises need and more.