Essential Lessons from the Duolingo API Breach

Duolingo is one of the largest and most popular language learning apps in the world. In August 2023, it was reported that data on 2.6 million Duolingo users, including names, email addresses, and more, had been leaked onto a hacking forum. Duolingo’s API for user account access required only an email address, with insufficient verification of who was asking. That allowed a hacker with a large email database to run a script against the API and collect account data each time an email matched a Duolingo account. The incident once again underscores the importance of API security, because this PII allows scammers to create very believable phishing attacks. That is potentially bad for every Duolingo customer, and very bad for Duolingo’s reputation.

The Duolingo leak shows that APIs have become a major frontier in the cybersecurity wars. Any company that does not take API security seriously does so at its own peril. As many have pointed out, the data leaked from Duolingo was not the result of a traditional incursion exploit and might barely be considered a breach at all. Yet the impact is tangible and damaging, for both Duolingo and its customers. So, what can we learn from this incident, and what actions should we take to avoid being the next victim?

Lesson 1: You Have More APIs Than You Know, and You Don’t Know What They All Do

Typical enterprises today rely on an average of twenty thousand APIs to run their businesses, from front-end websites serving customers to thousands of apps and microservices on the back end. Whether developed in-house, contracted, or open source, 90% of developers are using private or third-party APIs (Slashdata Developer Economics Survey). That’s an awful lot of APIs without proper security control or review. In addition to potential security issues, APIs often have multiple functions, even if only one is actively in use. Hackers may be able to trigger other functions with unknown consequences to the business. The opportunities for hackers are endless.

These “Shadow APIs” that live outside the normal IT governance and security processes have become the number one attack vector. Out of 16.7 billion malicious transactions, almost a third (5 billion) targeted unknown, unmanaged, and unprotected APIs [1]. OWASP lists these APIs under API9:2023, Improper Inventory Management, in its API Security Top 10. Most enterprises have no way of even identifying every API in use, let alone evaluating their functions or remediating security issues.
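One way to picture the inventory problem is as a simple comparison between what is documented and what is actually being called. The sketch below is only an illustration under stated assumptions: the documented endpoint list, the path templates, and the observed traffic are all hypothetical, and a real inventory would be built from gateway logs or network traffic rather than a hard-coded list.

```python
import re

# Hypothetical documented inventory: (method, path template) pairs,
# for example extracted from an OpenAPI specification.
DOCUMENTED = {
    ("GET", "/users/{id}"),
    ("POST", "/login"),
}

def template_to_regex(template):
    """Turn a template such as '/users/{id}' into a regex matching '/users/123'."""
    parts = re.split(r"\{[^}]+\}", template)
    pattern = "[^/]+".join(re.escape(p) for p in parts)
    return re.compile(f"^{pattern}$")

def find_undocumented(observed, documented=DOCUMENTED):
    """Return observed (method, path) pairs that match no documented endpoint."""
    compiled = [(method, template_to_regex(tpl)) for method, tpl in documented]
    unknown = set()
    for method, path in observed:
        if not any(m == method and rx.match(path) for m, rx in compiled):
            unknown.add((method, path))
    return sorted(unknown)

# Hypothetical traffic sample, e.g. parsed from access logs.
observed_traffic = [
    ("GET", "/users/42"),
    ("POST", "/login"),
    ("GET", "/internal/export"),
    ("DELETE", "/users/42"),
]

for method, path in find_undocumented(observed_traffic):
    print("undocumented:", method, path)
```

Anything this comparison flags is a candidate Shadow API: either an endpoint nobody documented, or a function on a known endpoint that nobody intended to expose.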

Lesson 2: Your APIs May Be Helping the Hackers to an API Breach

API protocols that simplify communications between systems and services are perfect opportunities for malicious actors to gain unwanted access. For instance, a Duolingo customer is leveraging APIs when they log into their account to access their usage, spending, and profile data. In this data leak, hackers used a bot to query this API, identify which email addresses belonged to Duolingo accounts, and scrape the information associated with each. The API made this easy by not validating each caller before granting access. Poor or missing validation logic is an extremely common API security gap that can easily lead to an API breach.
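To make the validation gap concrete, here is a minimal sketch of a lookup that refuses to return account data unless the caller has proven who they are. It is not Duolingo’s code or API: the user store, the secret key, and the function names are all hypothetical, and the signed token is a stand-in for whatever session or OAuth mechanism a real service would use.

```python
import hashlib
import hmac

# Hypothetical values, for illustration only.
SECRET_KEY = b"demo-only-secret"
USERS = {"ana@example.com": {"name": "Ana", "streak": 12}}

def _sign(email):
    return hmac.new(SECRET_KEY, email.encode(), hashlib.sha256).hexdigest()

def issue_token(email):
    """Issue a signed token after a successful login (login itself not shown)."""
    return f"{email}:{_sign(email)}"

def verify_token(token):
    """Return the email bound to a valid token, or None if it is missing or forged."""
    email, _, sig = token.rpartition(":")
    if not email:
        return None
    valid = hmac.compare_digest(sig.encode(), _sign(email).encode())
    return email if valid else None

def get_profile(requested_email, token=None):
    """Return (status, body). Only the authenticated owner receives data."""
    caller = verify_token(token) if token else None
    if caller is None:
        return 401, {"error": "authentication required"}
    if caller != requested_email or requested_email not in USERS:
        # Same answer whether or not the account exists, so the endpoint
        # cannot be used to test which emails are registered.
        return 404, {"error": "not found"}
    return 200, USERS[requested_email]
```

With a check like this in place, a script that submits a list of email addresses gets a 401 for every one of them, instead of a profile for every match.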

Another common way that APIs make hacking easy is by exposing data structures in function calls. This can occur in many ways, but a simple example would be a URL or data call that includes a sequential numeric reference to an account. All a hacker has to do is increment that number to call the next account. Easy. Developers are paid to program functionality, and security often takes a back seat, or is never addressed at all.
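The incrementing-number problem can be sketched in a few lines. The example below assumes a hypothetical in-memory account store and made-up function names; it shows two complementary ideas, an identifier that cannot be guessed by counting and, more importantly, an ownership check on every read.

```python
import secrets

# Hypothetical in-memory store, keyed by an opaque account id.
ACCOUNTS = {}

def create_account(owner, balance):
    """Create an account with a random, non-sequential identifier."""
    account_id = secrets.token_urlsafe(16)
    ACCOUNTS[account_id] = {"owner": owner, "balance": balance}
    return account_id

def read_account(account_id, caller):
    """Return (status, record). The caller must own the record."""
    record = ACCOUNTS.get(account_id)
    # The ownership check is the real control; the random id only makes
    # enumeration harder and should not be relied on by itself.
    if record is None or record["owner"] != caller:
        return 404, None
    return 200, record
```

Even if an attacker somehow learns another customer’s identifier, the request fails because the caller does not own that record.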

APIs often focus on function only – not security.
You can’t protect what you don’t know about.

“This incident underscores that not all attacks on digital resources involve traditional hacking techniques,” – Jason Kent, hacker in residence at Cequence Security, in Infosecurity Magazine.

Lesson 3: Traditional Tools Won’t Protect You from Poor API Hygiene

Traditional security approaches assume the goal of hackers is to gain access to sensitive information protected behind some secure boundary, like a firewall or authentication gateway. Many APIs are designed to go around or through those boundaries intentionally, to efficiently serve business needs. If traditional security is the lock and deadbolt on the front door, APIs are the open windows and vents that also provide access into the house, if you simply know where to look.

While traditional security solutions monitor and identify the precursors and activity associated with traditional hacks, API usage is normal and expected, making it much harder to identify when malicious activity is happening. Poor logic, such as the insufficient validation of the Duolingo APIs, is perhaps one of the most common API security holes but won’t be discovered by most security solutions. Beyond logic, the list of API hygiene issues is long indeed. Some of the more common issues (also mostly missed by traditional security tools that do not address the full API Protection lifecycle) include:

Poor data input sanitizing, allowing injection attacks
Insufficient or missing encryption
Missing validation to protect against cross-site scripting (XSS)
Inadequate rate limiting or throttling to protect the service
Poor error handling
Unintended information or data structure disclosure
Weak or broken session management
Dependencies on third-party and/or outdated libraries
Insufficient logging
Unknown and unaccounted for APIs
Etc.

Only solutions specifically designed to identify, inventory, and evaluate APIs can protect against these and many other security holes. Unfortunately, too many organizations still have this hole in their security arsenal.
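As one illustration from the list above, inadequate rate limiting is what lets a scraping script make millions of lookups unnoticed. The following is a minimal token bucket sketch, one common throttling technique; the class names, capacity, and refill rate are assumptions chosen for the example, and production systems usually enforce this at a gateway or shared store rather than in process memory.

```python
import time

class TokenBucket:
    """Allow short bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Add tokens for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class RateLimiter:
    """Keep one bucket per client key, such as an API key or IP address."""

    def __init__(self, capacity, refill_per_sec, clock=time.monotonic):
        self._args = (capacity, refill_per_sec, clock)
        self._buckets = {}

    def allow(self, client):
        if client not in self._buckets:
            self._buckets[client] = TokenBucket(*self._args)
        return self._buckets[client].allow()
```

A handler would call `allow()` with the client key before doing any work and return an HTTP 429 response when it comes back false. Rate limiting alone does not fix missing validation, but it turns a bulk scrape into a slow and noisy one.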

APIs could be open windows, even when the doors are locked.

“The Duolingo data breach highlights the vulnerabilities posed by poorly secured APIs and the potential for business logic abuse by threat actors,” said Jason Kent, Hacker in Residence, Cequence Security, in CPO Magazine.

Takeaway: Understand and Mitigate Your API Attack Surface to Prevent an API Breach

Understanding the importance and necessity of API security is only the first step to completing your security strategy. From there, any company must be able to identify and INVENTORY the tens of thousands of APIs that are already operating in the organization. Once a complete list is generated, every API must be evaluated for COMPLIANCE against every API hygiene rule, including OpenAPI specifications, appropriate governance guidelines, and security best practices. To the best of your ability, proactively address and correct the issues you identify. Then, ideally, deploy a system capable of MITIGATION, which provides the ability to block API attacks in real time with a low false positive rate. Mitigation capabilities should also include DDoS PROTECTION.

Now that the bad actors have recognized the API opportunity and have made it their number one attack surface, it’s time for organizations to make it a priority as well.

