How to "Hack" more than 1000 databases (TSDB) in 48 hours and for less than 5 USD

Iker
10 min read · Apr 8, 2022
Final Stats of the Process

This article is a quick translation of the original (in Spanish) and is intended purely for educational and security-awareness purposes. No database was altered or modified in any way during this work.

Today, traditional (relational) databases are among the most widespread and widely used technologies in business; we could say that every company with an information system necessarily has a database. The Docker Hub images of MySQL, PostgreSQL, Oracle DB, SQL Server, and DB2 easily exceed one billion downloads; however, they are not necessarily the solution to every business requirement.

Docker images for MySQL and Postgres

Time Series Databases (TSDB)

TSDBs are not a new concept; however, their use, adoption, and transformation into a central platform for businesses are relatively recent. These databases have enabled businesses to get insights from metrics, traces, and logs, among other data, almost in real time. Their main advantages over traditional databases are simplicity of use through REST services, access speed, and analysis over large data volumes; however, this comes at the cost of some loss of data accuracy. The prominent use cases for TSDBs (whose data mainly consists of timestamp and value pairs) are monitoring, financial and scientific data, data collection from IoT devices, and, in some cases, transactional systems.

Data visualization using Chronograf; the data is stored in an InfluxDB database.
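
To illustrate how simple that REST interface is, here is a minimal sketch in Python (assuming a local, unauthenticated InfluxDB 1.x instance and a hypothetical database named "example") that writes one timestamp and value pair and reads it back:

```python
import time
import requests

BASE = "http://localhost:8086"  # assumed local InfluxDB 1.x instance

# Create the example database (a no-op if it already exists).
requests.post(f"{BASE}/query", params={"q": "CREATE DATABASE example"})

# Write a single point using the line protocol: measurement,tags fields timestamp
point = f"temperature,room=lab value=21.7 {time.time_ns()}"
requests.post(f"{BASE}/write", params={"db": "example", "precision": "ns"}, data=point)

# Read it back with InfluxQL through the same HTTP API.
resp = requests.get(f"{BASE}/query",
                    params={"db": "example", "q": "SELECT * FROM temperature"})
print(resp.json())
```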

The growing popularity of TSDBs among non-relational databases is no small thing: InfluxDB is the most popular time-series database today, closely followed by Kdb+ and Prometheus, and these TSDBs keep finding more places in companies where they can demonstrate their functionality and value to the business.

DB Engines — TSDB Popularity
InfluxDB has exceeded 500 million downloads on Docker Hub; its companion tools are over 100 million downloads.

InfluxDB and Security (or the lack thereof)

Although InfluxDB is currently the most popular TSDB, the versions before the current (and still rarely adopted) 2.0 were far from mature and safe. This is not because the product has many vulnerabilities, but because its default security configuration is not a good one. InfluxDB's primary use case is collecting, aggregating, and correlating measurements from different systems, so the information collected is assumed to have low requirements for confidentiality, availability, and even integrity. As a consequence, InfluxDB versions before 2.0 do not require authentication, not even for administration. Adding insult to injury, they do not require encryption for data transmission either.

InfluxDB configuration on Docker Hub. Auth or encryption? No, thanks!

TSDB implementations have grown far beyond those initial use cases: strategic decisions about the performance of the services or products a business offers rely on the measurements, metrics, and thresholds that organizations set for their systems. There are also multiple cases where resource provisioning and new services are driven by these metrics. A failure in a specific measurement can have a direct economic impact on the business or reveal sensitive information to an attacker.

Worker performance heatmap, based on metrics stored in InfluxDB

How to Hack 1000+ InfluxDB Instances?

InfluxDB is a database offered mainly through Docker installations, and its popularity with the world's largest cloud providers is relatively high; it can be deployed in minutes in a SaaS model. That popularity means thousands of unauthenticated databases exist across many different cloud providers (Amazon, Azure, Alibaba, etc.); the only thing you need to hack these databases is to find them!

InfluxDB at Digital Ocean MarketPlace

If we want to reach as many targets as possible, we must build a pipeline to automate the attack, since verifying each DB manually is not feasible. For this reason, we will divide the attack into three stages, which look very much like the pipeline used to store data in a TSDB.

General Pipeline… (in Spanish)

STAGE I: Reconnaissance — How to scan 100 million IPs in 48 hours

The first phase consists of a port scan looking for hosts with port 8086 open. This scan should be performed against the ranges of one of the cloud providers; it is probably best to start with the world's largest and most commercially successful one (Amazon). The IP address ranges of these providers are public (except for Alibaba's); a simple search will give us some URLs from which the IP ranges can be downloaded:

Once we have the cloud provider's IP ranges, we select the ones associated with compute instances or applications. After a quick review of the network masks, we find that we are facing 93,551,242 IP addresses.
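
As a sketch of that selection step, here is how the count can be reproduced with AWS's publicly documented ip-ranges.json (filtering on the EC2 service is my own assumption about which ranges correspond to instances):

```python
import ipaddress
import requests

# AWS publishes its address ranges at this documented URL; other providers have equivalents.
ranges = requests.get("https://ip-ranges.amazonaws.com/ip-ranges.json").json()

# Keep only the prefixes associated with compute instances (the EC2 service).
prefixes = {p["ip_prefix"] for p in ranges["prefixes"] if p["service"] == "EC2"}

total = sum(ipaddress.ip_network(p).num_addresses for p in prefixes)
print(f"{len(prefixes)} prefixes, {total:,} IPv4 addresses to scan")
```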

Given the number of IPs we must analyze, it is necessary to use the best possible tool; in this case, Nmap, chosen not for its speed (there are faster alternatives) but for its precision and reliability. Thanks to Fyodor's "Scanning the Internet" presentation at Black Hat 2008 (https://nmap.org/presentations/BHDC08/), we can identify some useful Nmap arguments, resulting in the following script:

Nmap Speed — Remember to leave a blank line at the end of the targets file.
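
The original script is shown in the image above; a minimal Python sketch of the same idea (driving Nmap once per prefix; the timing values here are my own assumptions, not necessarily the ones used) could look like this:

```python
import subprocess
from pathlib import Path

OUT = Path("./nmap-scans/amazon")   # output folder consumed later by the sync job
OUT.mkdir(parents=True, exist_ok=True)

# targets.txt holds one CIDR prefix per line (remember the trailing blank line).
prefixes = [line.strip() for line in open("targets.txt") if line.strip()]

for i, prefix in enumerate(prefixes):
    # Flags in the spirit of Fyodor's talk: skip host discovery and DNS, scan a
    # single port, keep only open results, and push the packet rate up.
    subprocess.run([
        "nmap", "-Pn", "-n", "-p", "8086", "--open",
        "-T4", "--min-rate", "800", "--max-retries", "1",
        "-oG", str(OUT / f"segment-{i}.txt"),
        prefix,
    ], check=False)
```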

The script generates a text file for each scanned segment in the output folder "./nmap-scans/amazon/"; each file contains the IP addresses that have port 8086 open. Although the script would work just fine from any desktop computer, or even a Raspberry Pi, the scan's success heavily depends on our location, network stability, and the latency to our targets, all of which affect how many targets end up being discovered.

Given the performance considerations mentioned earlier, it is recommended to run a Linux instance in each of the clouds; in our case, an AWS c5.large instance with Ubuntu 20 at USD 0.085/hour is fine. The reason for choosing such a "high" instance is network performance: a large amount of bandwidth is not required, but we do need low latency to scan 93 million IPs in less than 48 hours. With less aggressive Nmap parameters, a t1 or t2 instance could work just fine.

Nmap Installation over AWS c5.large instance

STAGE II: Identification and Information Gathering

Once the scanning process has started, files containing the IP addresses with port 8086 open will be generated. In this phase, the process consists of the following steps:

1. Synchronize files to a local folder.

2. Monitor the folder, identifying new files.

3. Make an initial request to the base URL of each IP to determine if it is an InfluxDB instance.

http://{{ip_addr}}:8086/

4. Perform a request to endpoint “/query?” with parameters “db=_internal” and “q=SHOW%20DATABASES”; this should allow us to identify the databases present in the instance.

http://{{ip_addr}}:8086/query?db=_internal&q=SHOW%20DATABASES

5. For each database, we perform a request to the endpoint “/query?” with parameters “db={{database}}” and “q=SHOW%20MEASUREMENTS”; this gives us the list of measurements being collected in that database (a Python sketch of steps 3 to 5 follows this list).

http://{{ip_addr}}:8086/query?db={{database}}&q=SHOW%20MEASUREMENTS

6. Parse, unify and structure our data.

7. Send the data to a local InfluxDB instance for performance monitoring, and finally to ElasticSearch to have the data in JSON format, perform analysis, and get some statistics.
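
A minimal sketch of steps 3 to 5 in plain Python (instead of Node-Red), using only the documented "/query" endpoint, could look like this; the response parsing assumes the standard InfluxDB 1.x JSON layout:

```python
import requests

def query(ip, q, db=None, timeout=5):
    """Run an InfluxQL statement against the (open) /query endpoint."""
    params = {"q": q}
    if db:
        params["db"] = db
    return requests.get(f"http://{ip}:8086/query", params=params, timeout=timeout)

def enumerate_instance(ip):
    # Step 3: the base URL tells us whether this is InfluxDB at all.
    base = requests.get(f"http://{ip}:8086/", timeout=5)
    version = base.headers.get("X-Influxdb-Version")
    if not version:
        return None  # not an InfluxDB instance

    # Step 4: list the databases (a 401 here means authentication is enabled).
    dbs = query(ip, "SHOW DATABASES", db="_internal")
    if dbs.status_code != 200:
        return {"ip": ip, "version": version, "databases": {}}
    series = dbs.json()["results"][0].get("series", [{}])[0]
    names = [row[0] for row in series.get("values", [])]

    # Step 5: list the measurements of each database.
    databases = {}
    for name in names:
        m = query(ip, "SHOW MEASUREMENTS", db=name).json()
        m_series = m["results"][0].get("series", [{}])[0]
        databases[name] = [row[0] for row in m_series.get("values", [])]

    return {"ip": ip, "version": version, "databases": databases}
```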

The first step can be accomplished with a cron job that runs every 5 minutes and performs SFTP synchronization between the remote repository and a local folder. All the remaining steps can be automated using Node-Red, which will be in charge of monitoring, processing, and validating the InfluxDB instances. Finally, Node-Red will send the data to the local InfluxDB and ElasticSearch instances for monitoring and later analysis.

Node-Red Flow used

In the Node-Red flow, the initial request identifies whether the URL corresponds to an InfluxDB instance: the response includes headers that reveal the presence of InfluxDB and its version, and in some specific configurations it even lists the databases present in the instance.

InfluxDB GET request-response

A properly configured InfluxDB instance will require authentication for the “/query” endpoint and return a 401 HTTP code; this helps us discard the databases that have authentication enabled. There are also cases in which the identifying headers are present and we can list the databases, but they do not contain any measurements; this happens because InfluxDB is a popular database for educational and test projects, and we are not interested in those instances.

The key point here is that any database that has passed these filters is a database whose access is completely PUBLIC. Since there is no authentication, all commands are executed with administrator privileges, so it is possible to perform any task on the instance, including creating, modifying, and deleting the measurements and databases it contains.
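
As an illustration of that filtering logic, the triage of each probed instance (using the output of the enumeration sketch above; the labels are mine) boils down to something like this:

```python
def classify(status_code, databases):
    """Rough triage of a probed host, mirroring the filters described above."""
    if status_code == 401:
        return "secured"            # authentication enabled: discard
    if status_code != 200 or not databases:
        return "uninteresting"      # no usable answer from /query
    if all(len(measurements) == 0 for measurements in databases.values()):
        return "test-instance"      # databases exist but hold no measurements
    return "open"                   # fully public, administrator-level access
```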

Unusual response from Alibaba Cloud

Note: Oddly, many, if not most, of the Alibaba Cloud addresses seem to run some service that always responds with a 403 HTTP code and no real content; the only HTML it returns just refreshes the page every 5 seconds.

STAGE III: Monitoring and Statistics Visualization

With a bit of luck and skill, if everything went well, the only thing left to do is wait, monitor the evolution of the process, and let it finish. For the monitoring task, we could hardly use anything other than the most popular tool… InfluxDB. Using the Node-Red InfluxDB node, it is possible to send the metrics directly to a local instance, in my case version 2.0. Grafana helps us visualize the process’s progress in a dashboard with all the information we may need.
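
For completeness, this is roughly what pushing a progress metric into the local 2.0 instance looks like over its HTTP API (a sketch; the org, bucket, token, and field names are placeholders of my own):

```python
import time
import requests

URL = "http://localhost:8086/api/v2/write"   # local InfluxDB 2.0 instance
params = {"org": "scan", "bucket": "progress", "precision": "ns"}   # placeholders
headers = {"Authorization": "Token REPLACE_ME"}                     # placeholder token

# One line-protocol point per processed segment: open hosts and confirmed InfluxDB hits.
point = f"scan_progress,provider=amazon open_hosts=42i,influx_hits=3i {time.time_ns()}"
requests.post(URL, params=params, headers=headers, data=point)
```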

First 6 hours of the analysis; monitoring dashboard, first version
Monitoring dashboard, final version

After 48 hours and USD 4.93 in AWS credits, we will have found about 55 thousand hosts with port 8086 open, of which about 3 thousand correspond to InfluxDB. Approximately 1,000 instances have no authentication at all, leaving us with just under 2% of the hosts initially identified.

AWS Billing

For the analysis itself, in my opinion, InfluxDB is not well suited to the task because of its low level of detail and the difficulty of manually exploring the data. We now have 1,015 instances, 2,332 unique databases, and more than 161 thousand different measurements; a better alternative for analyzing the results could be TimescaleDB or ElasticSearch.
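
As a sketch of that last hop, each probed instance can be pushed into ElasticSearch as a JSON document (using the official Python client; the index name and document shape are my own assumptions):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")   # assumed local ElasticSearch instance

doc = {
    "ip": "203.0.113.10",                     # example values only
    "provider": "amazon",
    "version": "1.8.3",
    "databases": ["telegraf", "k8s"],
    "measurement_count": 161,
}
es.index(index="influx-scan", document=doc)   # one document per InfluxDB instance
```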

Process General statistics

Something very curious is the number of InfluxDB instances in Digital Ocean. This may be because Amazon and Azure have their own serverless TSDB solutions, Amazon Timestream and Azure Time Series Insights, whose integration and management are more accessible and natural than deploying an additional database engine.

Distribution of databases identified by Cloud provider

In total, 39 different versions of InfluxDB were identified. The most prevalent version is 1.8.3, followed by 1.8.9, which is not too bad; generally speaking, InfluxDB has very few CVEs, and the only recent one with some impact is CVE-2019-20933, which allows authentication to be bypassed (it seems authentication does not get much attention).

InfluxDB versions distribution

When we start analyzing the databases and metrics, we notice that the most common database is “telegraf”, the default database of the companion agent developed for collecting measurements. However, across 1,015 InfluxDB instances, we also find other databases with sensitive information, such as:

  • Server Monitoring with internal network IP addresses and names.
  • IoT sensors and their corresponding metrics
  • Kubernetes monitoring, with details of each instance
  • Analytics systems and transport systems
  • Industrial platforms and automation
  • Cryptocurrency transaction records
  • Cryptocurrency mining cluster monitoring
  • Accounting Systems
  • Nginx and Apache error and access traces, including URLs with sensitive information
  • A very long etcetera

The information in the databases found comes from both home users and large companies that use InfluxDB in their day-to-day operations. However, these databases are fully open to any attacker; therefore, not implementing authentication is no longer trivial and harmless.

ElasticSearch Dashboard — Full video: https://youtu.be/nBqO4KdFDzk
Information present in the Instances

Closing

The process outlined in this article is not necessarily the most efficient, nor even the most effective (the Nmap scripting could be better). There are multiple ways to get the same results, and some might even argue that using Shodan is better, given that it can pre-filter the InfluxDB instances, of which there are about 20 thousand worldwide according to their site; however, it sure would not cost 5 dollars!

InfluxDB 2.0 is a much more mature platform, with significant changes at the security layer and in the data query language. These changes make adoption a very slow process that requires redoing or entirely translating the business logic into a new language. But action should not be postponed until the database upgrade is done; it should be taken today.

InfluxDB 2.0 Initialization Process

InfluxDB, prior to version 2.0, did not have security mechanisms enabled by default, but this does not mean that such mechanisms are absent from the product. As always, and as applies to any product, the recommendation is to update to the latest version (1.8.10) and to configure authentication and encryption; that should be enough to keep instances protected against curious attackers willing to run an automated process for 48 hours and spend 5 dollars.
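
As a final illustration, bootstrapping authentication on a 1.x instance amounts to creating the first admin user through the very same open endpoint and then enabling auth in the configuration (a sketch; the credentials are placeholders):

```python
import requests

# While auth is still disabled, create the first admin user (standard InfluxQL).
requests.post("http://localhost:8086/query",
              params={"q": "CREATE USER admin WITH PASSWORD 'REPLACE_ME' WITH ALL PRIVILEGES"})

# Then enable authentication (and ideally HTTPS) and restart the service, e.g. in
# influxdb.conf:
#   [http]
#     auth-enabled = true
#     https-enabled = true
# or, for the official Docker image, via the INFLUXDB_HTTP_AUTH_ENABLED=true variable.
```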

My cat staring away, embarrassed by how you set up your databases

Related Links:
