How to bypass CAPTCHA: Breaking Down a Complex Process in the Simplest Terms

от автора

CAPTCHA is not just a single word that can be defined; it’s an acronym consisting of nine words (and two prepositions): Completely Automated Public Turing Test To Tell Computers and Humans Apart. This mouthful was shortened to the concise CAPTCHA to avoid creating yet another hard-to-pronounce term. Translated into Russian, this abbreviation sounds like «Полностью автоматизированный публичный тест Тьюринга для различения компьютеров и людей» (Fully Automated Public Turing Test to Differentiate Computers and Humans).

You can’t form «CAPTCHA» from this set of words in Russian, can you? But that’s not really necessary; everyone understands what it’s about. Find the specified images or enter the given text to confirm you’re not a robot.

Bypassing a CAPTCHA isn’t difficult if you’re human. It’s more challenging if you’re a human with a bunch of accounts working autonomously (this is called automation). You need to use automation tools, switch proxies, buy browser fingerprints, and this list can go on for quite a while—there are many options.

The problem is that there’s no universal solution as such. For instance, a parser that collects information from a specific site is written for the particular type of CAPTCHA present on the site at the time of writing. If the site updates its CAPTCHA, the parser stops working—or rather, when encountering a new CAPTCHA, the parser ceases to function.

That was a bit jumbled… Let me try again. If your parser can bypass reCAPTCHA V2, it won’t be able to bypass reCAPTCHA V3 because these are different types of CAPTCHAs, each with its own set of unique parameters that need to be considered.

Clarification: If your parser doesn’t include all solving methods but only one, then the above applies.

And a simple example: if you try to push a sphere into a cube-shaped hole, it won’t fit. The same principle applies here.

Now let’s delve into the types of CAPTCHAs and what’s required to recognize them.


Why and What Needs to Bypass CAPTCHA, and Who Helps with It?

It’s a fairly simple question for those who understand the topic and more complex for those who don’t. It might seem that a user browsing the internet wants to register somewhere or download something, a CAPTCHA pops up—they solve it and continue their work. That’s it.

For everyday internet surfing, this information is absolutely useless. However, for those who handle automation tasks, test web resource loads, parse data, and perform similar tasks—understanding the types of CAPTCHAs is crucial as it saves a lot of time.

So, bypassing CAPTCHAs is important for:

  1. Automators: Those who automate routine tasks, the monotonous execution of which on certain resources triggers CAPTCHAs.

  2. Testers: When setting up web resource security, various scenarios must be anticipated, including bot influxes. To protect against this in the future, one must simulate such an influx in the present.

  3. Script or Parser Developers: Manually collecting information takes a lot of time, even from one’s own internet resources, let alone competitors’. Any resource with information protects itself from overloads, including with CAPTCHAs.

Who helps in bypassing CAPTCHAs? The most skilled programmers don’t need anyone’s help; they can train their own models to solve CAPTCHA or use proxies (cycling through proxies can prevent CAPTCHAs from appearing altogether).

For those who can’t or don’t want to bother, CAPTCHA solving services come to the rescue. They come in several types:

Manual recognition is more expensive but offers near 100% accuracy (you get what you pay for). In contrast, automatic recognition services may be cheaper, but the quality leaves much to be desired. When bypassing complex CAPTCHAs, they can be entirely useless. OCR is generally intended for bypassing text CAPTCHAs and can’t handle the latest generation of CAPTCHAs.

Now, let’s explore the types of CAPTCHAs.


CAPTCHA bypass by Type and Their Differences

There are several types of CAPTCHAs that vary in complexity and the number of factors considered to pass them.

Listing each CAPTCHA would be tedious, so I tried to classify them, and here’s what I came up with:

  1. Image and Visual CAPTCHAs

    • reCAPTCHA V2

    • hCaptcha

    • GeeTest CAPTCHA

    • Rotate CAPTCHA

  2. Behavioral and Invisible CAPTCHAs

    • reCAPTCHA V3

    • Cloudflare Turnstile

  3. Text CAPTCHAs

    • Simple CAPTCHA

    • Amazon CAPTCHA

  4. Audio CAPTCHAs

    • Audio CAPTCHA

  5. Other Interactive CAPTCHAs

    • CyberSiARA

    • atbCAPTCHA

    • GeeTest CAPTCHA

    • MTCaptcha

    • CutCaptcha

    • Tencent CAPTCHA

    • Lemin CAPTCHA


Image and Visual CAPTCHAs — Bypassing CAPTCHAs in Pictures

This category includes the top two CAPTCHAs: reCAPTCHA V2 and hCaptcha, which are present on 70-80% of websites. Yes, their solving principles differ, but their operational logic is somewhat similar.

  • reCAPTCHA V2: Uses images and requires selecting objects (e.g., cars). It considers user actions like mouse movements and clicks.

  • hCaptcha: Similar to reCAPTCHA, it uses image selection and analyzes clicks and delays.

Beyond standard tasks (needing to click on a specific object or image), each CAPTCHA analyzes user behavior. In each case, this set is different, and the prioritization of these factors may vary. Only the company developing the bot protection knows these details for sure; we can only make assumptions by evaluating the significance of mouse movements or delays.

A less popular CAPTCHA in this category is Rotate CAPTCHA. Here, you need to rotate an image to the correct position. This CAPTCHA considers the accuracy of user manipulations during the solution.


Behavioral and Invisible CAPTCHAs — Solve CAPTCHA You Can’t See

An entirely different type of CAPTCHA that analyzes user behavior without explicit interaction. However, it’s essential to understand that the invisible part of the CAPTCHA is just the tip of the iceberg. If the system decides that the user (or bot) doesn’t meet the criteria of a legitimate visitor, they’ll be prompted to solve another (visible) CAPTCHA, which could be any of the types listed—textual, interactive, etc.

Invisible CAPTCHAs include:

  • reCAPTCHA V3: Analyzes actions on the page without direct interaction and assigns a risk score. It works with a user score ranging from 0 to 1, and depending on the administrator’s settings, it either lets the user through or continues to challenge them with visible CAPTCHAs.

  • Cloudflare Turnstile: Uses device and network data for verification without user intervention. It’s considered more complex than the previous type because if the invisible CAPTCHA deems you a bot, the visible challenge will be much harder.


Text CAPTCHAs — Solve CAPTCHA Almost for Free

The simplest type of CAPTCHA—a text that needs to be entered into a specified field. Despite its simplicity, this CAPTCHA once had several levels of complexity. It might consider spaces, be case-sensitive, include numbers or special characters, etc. But the fact remains that, compared to other types, bypassing a text CAPTCHA is very straightforward.

Cheap OCR services were used to bypass text CAPTCHA. Why «were»? Because text CAPTCHA are now rarely encountered, and many free solutions have been devised for their bypass that don’t require payment.

The main parameter considered when solving such CAPTCHA is the correctness of text input. So, the task for an automator (if there’s a need to automate text CAPTCHA recognition) is straightforward.

However, text CAPTCHA aren’t giving up easily, and the existence of Amazon CAPTCHA is proof of that. Yes, Amazon CAPTCHA is similar to a simple CAPTCHA and often uses text and numbers, but the catch is that these numbers and letters are distorted. If the system doesn’t like something, the difficulty level increases, switching from a text CAPTCHA to more advanced verification methods.


Audio CAPTCHAs — How to Bypass CAPTCHA by Ear

Originally developed for the convenience of visually impaired users, allowing them to pass CAPTCHAs if the system suspects the user. The principle of Audio CAPTCHA is that the user listens to audio and inputs the heard text, considering the accuracy of sound recognition. The presence of audio CAPTCHAs allowed for bypassing complex CAPTCHAs. Users (or bots) could simply switch to audio, transcribe it, and pass the CAPTCHA.


Other Interactive CAPTCHAs — When bypassing CAPTCHA Feels Like a Game

This category includes various CAPTCHA that use non-standard methods to verify you’re human. They might not have anything in common with the original Turing test but instead offer users complex or simple tasks.

  • CyberSiARA: Presents special tasks like puzzles or object selection and analyzes behavior.

  • atbCAPTCHA: Interactive tasks involving element manipulation, considering the accuracy of actions.

  • GeeTest CAPTCHA: Based on tasks like dragging puzzles or other objects. The system considers mouse movements, reaction time, and user behavior to determine if it’s a bot. GeeTest actively adapts tasks based on the suspiciousness of actions.

  • MTCaptcha: Uses tasks involving object selection and image manipulation, analyzing interaction time and accuracy.

  • CutCaptcha: Requires cutting or moving parts of an image, considering the precision of movements.

  • Tencent CAPTCHA: Offers puzzles and movement tasks, recording the accuracy and smoothness of actions.

  • Lemin CAPTCHA: Includes mini-games and object selection tasks, analyzing behavior and response time.


How to solve Each Type of CAPTCHA?

It would be too simple if every CAPTCHA could be solved roughly the same way. I decided to classify CAPTCHAs based on their solving methods, and here’s what I found:


Solving CAPTCHA with Tokens

Used for CAPTCHAs where you need to obtain a special response token for confirmation. The solution is straightforward: you send the CAPTCHA to a recognition service, and they send you a token in return. You insert this token where needed, and the CAPTCHA is considered solved.

CAPTCHAs that use this method include:

  • reCAPTCHA V2

  • reCAPTCHA V3

  • hCaptcha

  • Cloudflare Turnstile


Bypass CAPTCHA by Text Input

This involves recognizing text from an image and entering it. You send the CAPTCHA image to the service, and they return the text, which you input into the required field.

This method applies to:

  • Amazon CAPTCHA

  • Text CAPTCHA


Solve CAPTCHA Using the Coordinate Method

You provide the required parameters to the service. After solving the CAPTCHA, the service sends you coordinates and their sequence, which you need to use to bypass the CAPTCHA. The method isn’t perfect but is popular because it can be more effective for some complex CAPTCHAs than using tokens.

CAPTCHAs that use this method include:

  • GeeTest CAPTCHA

  • Click CAPTCHA

  • Draw Around

  • Capy Puzzle CAPTCHA

Some CAPTCHAs can be solved using multiple methods—either tokens or coordinates. The most reliable way is solving with a token, but it might be more complex. Using coordinates might be easier but doesn’t always work.

In the case of coordinates, you don’t get a ready answer with a guarantee that the CAPTCHA is passed but rather a method to bypass it, and you still have to implement the bypass yourself (or your bot does).


Thus, CAPTCHAs can be solved using different methods and approaches. Technical skills or choosing the right CAPTCHA recognition service play a significant role, and in many cases, both are essential.


ссылка на оригинал статьи https://habr.com/ru/articles/847302/


Комментарии

Добавить комментарий

Ваш адрес email не будет опубликован. Обязательные поля помечены *