Percona has a mission to provide the best open source database software, support, and services so our users can innovate freely.

We are proud of how far we have come over the last 16+ years. Continuing this trajectory and improving how we develop our software products will require many decisions. Our hope is to make these decisions with as much useful data as possible. Data-informed decision-making is key to delivering products that users love, and to making difficult choices about where to invest precious development resources and funds.

How we have made data-driven decisions in the past

This is not the first time we have collected data and based our investment decisions on it. 

Back in 2020, we set out to learn more about how Percona Monitoring and Management (PMM) was being used in order to provide a better product to the community. The following are a few examples of decisions that came from this data and some of what comes next in terms of additional telemetry in Percona products. 

Many know that the two M’s in PMM stand for Monitoring and Management. We were working on improving the Management side of PMM, so we started building in components that would allow users not just to monitor their environments but also to maintain them. One such feature was managed backups. The feedback from users was that “backup management is a must” and “we need something that allows us to manage all of our database backups in one place, not server by server.” When we looked at adoption, we were surprised to see almost none; the little we did see was just us. We needed to decide whether to keep investing in this feature or pull the plug on it. Using this data enabled us to have very targeted conversations with users about where we were missing the mark. We poured their feedback into the product, and as we iterated, adoption took off!

This data also showed us that PMM is monitoring tens of thousands of MySQL instances. When we looked at the breakdown of versions, we saw that, as of 10 months ago, the majority of those were still on version 5.7, which is scheduled for EOL by the end of October 2023. This insight allowed us to put two critical programs in place: one to help companies make the transition to 8.0 faster, and one to offer End-of-Life support to those who were unable to make the switch. Both programs have been met with great responses from new and existing customers who found themselves stuck!

These two examples of data-driven decisions at Percona are great illustrations of what we want to be able to do across all the products and deployment configurations.

The usage trends of today are the investment areas of tomorrow

To continue creating added benefits for our users and customers, we are adding telemetry directly to our core database products. With this mechanism in place, we will gain insights into how databases are being used, in terms of environments, versions, and lifecycle as well as key features of the solutions.

This will help us understand more about how these products are being used so we can provide the same level of data-informed decision-making on MySQL, PostgreSQL, and MongoDB.

While this isn’t new to the industry, it is new for Percona distributions for databases. In our ongoing effort to be as transparent with our customers and users as possible, we are outlining explicitly what kind of information we will collect, what we do with it, and how you can opt out.

We will share the data back

We believe in open source at Percona. This principle has led us to where we are today, and we want to apply it to our implementation of telemetry as well.

The process of gathering data will take some time, but we plan to periodically share the statistics we collect, such as breakdowns of database software versions in use or popular operating systems and architectures. This blog post already includes some such data: the breakdown of MySQL versions we observed via PMM over the last six months (please see above).

Opt-out mechanisms with no impact on functionality

Our telemetry is going to be enabled by default. All the statistics we collect are anonymized and thus can’t be traced back to their origins. We do not collect any proprietary information. That being said, if you decide you do not want to share this data with Percona, you can easily opt out of the telemetry mechanisms. Read on for how to control the telemetry. This information will also be provided in the Percona documentation pages for all products that include the telemetry module.

The decision not to share the telemetry data with Percona has absolutely no impact on database functionality.

Do you want to learn more?

All the details about what we collect can be found in the documentation for the relevant Percona products. These articles will also help you understand how to opt out of the telemetry, how it is sent to Percona, and how you can exercise your rights as a data owner.

The type and scope of the data we collect with the telemetry will evolve over time. This blog post contains information that is accurate for the first release of the telemetry module. Please be sure to check the associated product documentation to learn more about the telemetry being collected for each of the products.

Here is an excerpt from the release documentation for Percona XtraDB Cluster 8.0.34, which is the first database product to receive Percona telemetry:

What information is collected

At this time, telemetry is added only to the Percona packages and Docker images. Percona XtraDB Cluster collects only information about the installation environment. Future releases may add additional metrics.

Be assured that access to this raw data is rigorously controlled. Percona does not collect personal data. All data is anonymous and cannot be traced to a specific user. To learn more about our privacy practices, read our Percona Privacy statement.

An example of the data collected is the following:

Disable telemetry

Starting with Percona XtraDB Cluster 8.0.34-26-1, telemetry will be enabled by default. If you decide not to send usage data to Percona, you can set the PERCONA_TELEMETRY_DISABLE=1 environment variable, either for the root user or in the operating system, prior to the installation process. How you set this environment variable depends on the deployment method. The upcoming documentation release will explain this process in more detail.
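As a rough illustration of the opt-out described above, the following Python sketch launches a command with PERCONA_TELEMETRY_DISABLE=1 present in its environment. Only the variable name and value come from the documentation excerpt; the helper names are hypothetical, and the command used here is a harmless stand-in rather than a real installation command, which will differ by deployment method.

```python
import os
import subprocess
import sys


def env_with_telemetry_disabled(base_env=None):
    """Return a copy of the environment with PERCONA_TELEMETRY_DISABLE=1 set.

    Hypothetical helper for illustration; only the variable name and value
    are taken from the Percona documentation excerpt.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PERCONA_TELEMETRY_DISABLE"] = "1"
    return env


def run_with_telemetry_disabled(command):
    """Run `command` (a list of arguments) with the opt-out variable set."""
    return subprocess.run(command, env=env_with_telemetry_disabled(), check=True)


if __name__ == "__main__":
    # Stand-in for an installation command: a child process that simply
    # reports the value of the variable it inherited.
    report = [
        sys.executable,
        "-c",
        "import os; print(os.environ.get('PERCONA_TELEMETRY_DISABLE'))",
    ]
    run_with_telemetry_disabled(report)
```

The variable has to be visible to the process that performs the installation, which is why the sketch passes it explicitly to the child process instead of relying on the calling shell.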

Telemetry for Operators is collected via a different mechanism but can be disabled using instructions here (this is for Postgres Operator, but the same steps work for MySQL and MongoDB).

Please note that in the example above, the “id” field is the measurement id, and “instance_id” field is an id of the database node where the data is collected from. The “instance_id” is required by our backend software to differentiate data coming from different sources, but it is impossible for Percona to determine the actual origin or host information based on this identifier.
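This post does not describe how these identifiers are generated. Purely as an illustration, and assuming randomly generated UUIDs (an assumption, not Percona's confirmed implementation), the sketch below shows how an identifier can distinguish one source from another without encoding any host information. The function names are hypothetical; only the "id" and "instance_id" field names come from the text above.

```python
import uuid


def new_instance_id():
    """Create a random identifier for a database node.

    Assumption for illustration: a version 4 UUID, which is built from
    random bits and contains no hostname, MAC address, or timestamp.
    """
    return str(uuid.uuid4())


def new_measurement(instance_id):
    """Build a record with a fresh measurement id for the given node."""
    return {"id": str(uuid.uuid4()), "instance_id": instance_id}


if __name__ == "__main__":
    node = new_instance_id()
    first = new_measurement(node)
    second = new_measurement(node)
    # Both records share the node identifier but have distinct measurement ids.
    print(first["instance_id"] == second["instance_id"], first["id"] != second["id"])
```

Under this assumption, a backend can group measurements by "instance_id" while learning nothing about where the node runs.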

Check our code

You can check the telemetry module’s code in our GitHub repository: https://github.com/percona-lab/telemetry-agent 

Do you have feedback?

Please share it through this Percona Community Forum thread I have created for that purpose.
