Anti-bot Discussion: the ?i=1 parameter

I only referred to it to indicate that legitimate websites can have high rates of traffic, and that having strict rate limits will probably impact legitimate sites.

Using EP limits for rate limiting to protect sites would be a laughably stupid solution, given that it penalizes the entire account if the usage is too high. Any rate limiting solution should keep out bad traffic and let good traffic through, which is a distinction a process limit doesn’t make at all.

Tracking individual URLs for rate limiting can be useful, but comes with a lot of caveats and sharp edges. It can capture too little or too much depending on how the site is built.

If a site is using a single URL route to expose different features, like /index.php?url=/posts/123/comments or /api.php?function=GetPostComments&postId=123, you can have legitimate traffic making a large number of requests to the same path. This can be mitigated by also checking the query string, but that would make it trivial to bypass the rate limits by just adding bogus query parameters.
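
To make that trade-off concrete, here's a rough sketch — the function names and key format are made up, not taken from any actual rate limiter:

```js
// Hypothetical rate-limit keys; neither option comes from a real product.

// Key on path only: every request to /api.php shares one counter, so a
// busy but legitimate single-endpoint API can trip the limit on its own.
function keyByPath(ip, url) {
  return ip + " " + new URL(url).pathname;
}

// Key on path + query: legitimate traffic is split into separate counters,
// but an attacker gets the same benefit by appending bogus parameters
// (?x=1, ?x=2, ...) so every request lands on a fresh, never-limited key.
function keyByPathAndQuery(ip, url) {
  const u = new URL(url);
  return ip + " " + u.pathname + u.search;
}

console.log(keyByPath("198.51.100.7", "https://example.com/api.php?function=GetPostComments&postId=123"));
console.log(keyByPathAndQuery("198.51.100.7", "https://example.com/api.php?function=GetPostComments&postId=123&x=9999"));
```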

At the same time, an attacker could very quickly try to scan the URLs /user/1/settings, /user/2/settings, etc. which is definitely not regular traffic, but queries many different URLs.

The final thing to consider is the amount of bookkeeping that needs to be done. Rate limiting in web servers can be done very efficiently with in-memory hash buckets, but any such internal register has a limited size. Keeping track of just IP addresses is doable, but if you have to keep track of request counts per IP+URL, the number of counters grows very quickly, so either you need to dedicate a lot of memory to it, or evict entries really quickly.
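
As a rough sketch of that bookkeeping problem (this is not any real server's implementation, just the general shape of a fixed-size counter table):

```js
// Sketch only: a fixed-size counter table that evicts the least recently
// used key when full. With IP-only keys the table stays small; with
// IP+URL keys it fills up fast, so entries get evicted before the time
// window they were supposed to cover has passed.
class BoundedCounters {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.counters = new Map(); // Map keeps insertion order -> cheap LRU
  }

  increment(key) {
    const count = (this.counters.get(key) || 0) + 1;
    this.counters.delete(key); // re-insert to mark as most recently used
    this.counters.set(key, count);
    if (this.counters.size > this.maxEntries) {
      const oldestKey = this.counters.keys().next().value;
      this.counters.delete(oldestKey); // its request history is simply forgotten
    }
    return count;
  }
}

const table = new BoundedCounters(100000);
if (table.increment("198.51.100.7 /api.php") > 60) {
  // over the per-window limit: reject or challenge the request
}
```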

I personally doubt how effective a URL-based rate limiting system could really be, given that a lot of attack traffic doesn't involve hitting the same URL repeatedly. Many bots will just fire off thousands of requests at different URLs with different parameters to probe for specific software with specific configurations.

Do you have any ideas on how we could get rid of the URL parameter?

The ?i=1 part is a counter to keep track of the number of attempts. That way, if the browser does the redirect but doesn’t send the cookie, it doesn’t get stuck in an infinite redirect loop where it will just keep refreshing the page until the user navigates away, DDoS-ing the server in the process.
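
A minimal sketch of that loop guard (this is not the actual challenge code, and the attempt limit is an invented number):

```js
// Sketch of the loop guard, not the provider's real implementation.
const MAX_ATTEMPTS = 3; // example value, not the real limit

function handleProtectedRequest(hasValidCookie, url) {
  if (hasValidCookie) {
    return { action: "serve-page" };
  }
  const u = new URL(url);
  const attempts = Number.parseInt(u.searchParams.get("i"), 10) || 0;
  if (attempts >= MAX_ATTEMPTS) {
    // The browser ran the challenge several times but never sent the
    // cookie back, so stop redirecting instead of looping forever.
    return { action: "show-error-page" };
  }
  // Serve the JS challenge; after setting the cookie it reloads the same
  // URL with the attempt counter bumped by one.
  u.searchParams.set("i", String(attempts + 1));
  return { action: "serve-challenge", reloadUrl: u.toString() };
}

console.log(handleProtectedRequest(false, "https://example.com/?i=1"));
// -> { action: "serve-challenge", reloadUrl: "https://example.com/?i=2" }
```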

Admittedly, redirecting to the Google cookies page is probably just being lazy. A page that would describe what happened (security system, browser validation, checks JS and cookies, etc.) would be more helpful, rather than a page about cookies with no information as to why.

5 Likes

Such information is charged additionally :joy:

CF has something similar but “doesn’t bother” with queries


Of course, that's why there should be more layers as well as more rules in the FW,
which would repel at least the stupidest bots in the simplest and most painless way.

It is easy for me to configure a lot of things for my website.
For example, I have a rate limit for all .php on CF,
and of course I don't use clean URLs (no extensions), so whatever you add to the URL to make it fake and unique, chances are high that you'll be blocked/rate-limited.

I have the impression that we are just reinventing the wheel here…
It would be ideal to have bot management that can detect and block bad bots using various techniques;
anything less than that is not smart enough, is easily circumvented, and is not fully automated (i.e. the rules changing on the fly depending on what is happening in real time on the server), etc.

And with all that, we should also think about the appearance of quantum computers and adapt the aes.js code so that it outsmarts them :smirk:

4 Likes

From what I understand from that article, it’s not actually that similar. Cloudflare’s solution appears to inject code into the web page after it has already been returned. It can be used to stop bots on subsequent requests, but the first request still goes through.

The testcookie solution is most similar to the Under Attack Mode solution from Cloudflare. Under Attack mode also checks for Javascript and cookies.

I don’t know what happens if you get an Under Attack mode page but cookies are disabled. Will you just keep looping the challenge page or will Cloudflare kick you out?

4 Likes

I don't know - as far as I understand, it stays in a loop until you enable JS and cookies so that you are able to solve the challenge: https://community.cloudflare.com/t/understanding-under-attack-mode/358178

They probably track bots that are unable to solve it even after x attempts and don't offer them a challenge at all for some period of time, and I assume that they are then shown the classic CF Ray ID (error) page.

4 Likes

Hi Admin and Oxy,

Just got back from a development project.

I think we have to acknowledge the difference between a hosting and a proxy provider. Proxy focuses on the network layer while hosting can range from the application layer to the network layer (if excluding hardware for both for the time being). Cloudflare can do this without query parameters because their system is engineered to keep track of the connection even after the challenge and make sure it stays that way. However for hosting, once the cookie header is set, everything is good to pass, or at least not all network traffic with the header is rescanned for malicious things.

It's more about the fundamental thing: the current system relies on the client side to store the iteration, which can be forged. In a poorly coded PHP while-loop scenario, the requests will always come in without i=1 and without any cookie storage, effectively DDoSing the server. While certain rate limits can block the IP, it's easy to change the IP to bypass that as well, blocking innocent visitors in the process if there's an open Wi-Fi involved. As long as the iteration is kept on the client side, this ain't gonna improve. Cloudflare has a middle machine to keep track of the iterations so it does not have an impact on the web browser, but that's an additional cost from a hosting perspective, which is why I say there's a difference between proxy and hosting in the first place.

lol :rofl: Or maybe you could redirect to something like error.infinityfree.com to show something about the error, or simply instruct users to turn on JS and have a "read more info" button that goes to Google?

That doubt is well founded, as it's also not effective. Bad actors have already evolved beyond that, sending distinct requests to bypass checks on request headers, user agents and even custom request methods. So URLs are just child's play for them.

I do see framework-specific implementations on the market that can track routes based on the framework's architecture and impose rate limits on them. For example, many routes under the WordPress admin go through wp-admin/index.php; those routes are detected from within WordPress function calls and tracked that way. But I doubt that developing such a thing is cost-effective from a hosting management perspective, given the number of applications supported in PHP, not to mention custom PHP scripts, which are the majority here.

They place either a localStorage or a cookie-based token to make subsequent requests inherit the previously passed challenge, but it also gets refreshed after a while. Fundamentally speaking, it’s still JavaScript and Cookie for sure.

Solution: you'll be prompted to turn on cookies, without a page refresh, if you're in a browser (whether legitimate or emulated).

That time period is very long, and presumably doesn't exist at all, but it's just there as a good-to-know. They cannot risk blocking legitimate visitors who come right after a DDoSer on the same IP, similar to the free WiFi paradox.

Cheers!

4 Likes

This is what it will show (I saw that myself before)
[Screenshot: Cloudflare "Please enable cookies" error page]
(Original image source: https://community.cloudflare.com/t/the-web-always-shows-enable-cookies/433257)

You know InfinityFree is MOFH… It basically means Admin can’t do anything to change that

6 Likes

I tested this shortly after asking this question myself. And it turns out that you’ll just stay on the challenge page and aren’t redirected at all.

I wondered a bit why, but then I realized you could just test within Javascript whether cookies are enabled. You can read the document.cookie string after writing to it, or check the navigator.cookieEnabled property. Checking that could be a nice and simple addition to the current security check to avoid the redirect to Google. Only browsers that do support cookies in Javascript but don't send them to the server would be affected, which is a very small group.
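
Something along these lines could work (a sketch, not the actual aes.js code; the cookie name is just a placeholder):

```js
// Sketch of a cookie-support check before the challenge redirect;
// only the browser APIs are real, the rest is illustrative.
function cookiesSeemEnabled() {
  if (navigator.cookieEnabled === false) {
    return false;
  }
  // Belt and braces: write a throwaway cookie and try to read it back.
  document.cookie = "__cookie_test=1; path=/";
  const ok = document.cookie.indexOf("__cookie_test=") !== -1;
  // Expire the test cookie again.
  document.cookie = "__cookie_test=1; path=/; expires=Thu, 01 Jan 1970 00:00:00 GMT";
  return ok;
}

if (cookiesSeemEnabled()) {
  // set the real test cookie and reload the page as before
} else {
  // show a "please enable cookies" message instead of redirecting to Google
}
```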

And maybe combine it with some request logging and banning, where you ban IPs that see the challenge page but keep trying for some reason. That doesn't seem too complicated to implement and might allow you to get rid of the URL counter entirely.
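
Something like this, for example (all the thresholds and names are invented, just to show the shape of a short-lived ban):

```js
// Sketch of the logging + short ban idea; all numbers are made up.
// An IP that keeps requesting the challenge page without ever passing it
// gets a temporary block instead of yet another challenge page.
const challengeHits = new Map(); // ip -> { count, windowStart, bannedUntil }
const WINDOW_MS = 60 * 1000;
const MAX_CHALLENGE_HITS = 10;
const BAN_MS = 5 * 60 * 1000; // short-lived, so shared IPs recover quickly

function shouldBlock(ip, now = Date.now()) {
  let entry = challengeHits.get(ip);
  if (entry && now < entry.bannedUntil) {
    return true; // still inside an earlier short ban
  }
  if (!entry || now - entry.windowStart > WINDOW_MS) {
    entry = { count: 0, windowStart: now, bannedUntil: 0 };
    challengeHits.set(ip, entry);
  }
  entry.count += 1;
  if (entry.count > MAX_CHALLENGE_HITS) {
    entry.bannedUntil = now + BAN_MS;
    return true;
  }
  return false;
}
```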

The URL counter is only really relevant against “accidental” DoS attacks. If you want to keep hammering the challenge page with requests, you can just do so. Just have a loop that calls the same URL over and over again and don’t look at the results. No need to do anything with the Javascript code or the redirects if you don’t want to.

For the most part, IP based rate limits are pretty effective. Static blocklists don't work very well for the reasons you named, but you can't just spoof your IP address. One public WiFi network will probably have just one outbound IP address, and while you can reset your home router, you can't just send requests from thousands of IPs at the same time. Getting many servers at cloud hosting companies gives you a lot of different IPs, but those can be blocked by the recipient with very few side effects. And actually having a botnet requires such a high level of sophistication that there isn't a lot you can do against that.

True, such functionality exists, and the client area also has it. But its usefulness is limited, because for this rate limit to work, the PHP request must already have started, which from a DoS prevention standpoint makes it pretty worthless. It does work well against (light) brute force attacks and spam though.

9 Likes

At the beginning of 2018, in one post, I expressed my surprise that aes.js was not minified (all in one line of code), and today everything is still the same; in addition, it also contains a lot of comments.
I understand that the ones related to copyright were left in, but I don't understand why the long ones that describe the code were not removed…

If they used minified code, then the required traffic would be cut in half (bandwidth)

[Screenshot 2023-10-14 195357: aes.js served unminified, with comments intact]

and Gzip/Brotli was not used either

[Screenshot 2023-10-14 201341: response headers showing no Gzip/Brotli compression]

8 Likes

I think this may actually be one of the things they could easily (is anything ever easy for iFN though? lol) and reasonably implement.

7 Likes

Hi Admin,

I was getting occupied by some other things this weekend, but I’m back to the discussion.

In this case, I would like to know how any user would get out of the IP ban if they're using shared IPs like public WiFi hotspots or coffee shop networks?

Yes, this part is what I was talking about: DDoSers can simply do this to eat up the bandwidth and just keep downloading the challenge page, which is still kinda an effective DDoS if you ask me.

VPN, mobile cellular network, or use another hotspot, dah~ While it's possible to identify VPNs and simply block them, it would also mean that some visitors aren't going to be able to connect if they are in locations without regular internet, though this might not be a concern for most website owners.

Hi Oxy and wackyblackie,

The notion of “if it doesn’t break, don’t change it” :rofl:

Cheers!

5 Likes

I actually wanted to show how illogical it is:
on the one hand, there is an anti-bot system that should prevent unnecessary traffic,
and on the other hand there is an unoptimized script (a script that is served millions of times per day) that probably consumes/generates as much traffic as the bots do.

5 Likes

I intend to compile a list of suggestions from this topic and discuss those with iFastNet. Compressing/minifying the aes.js file won't help address the fundamental issues we've been discussing here, but it seems like a straightforward improvement to me.

It would have to be a short lived ban in combination with reasonable limits. All it needs to do is prevent someone from hammering the server, not replicate the “three strikes” system that currently exists. Just something to stop the repeating loop.

You could even limit it on IP+User Agent to reduce the impact somewhat.

It’s useful against one very specific type of DDoS. It’s indeed useless against all others. The usefulness is limited, but it’s definitely there.

A cellular network is 1 IP and every hotspot is 1 IP, but you may have to move around to access different hotspots. VPNs can provide multiple servers, but have limits on the number of connections you can use at any time. You could use multiple VPN providers though, depending on your system and its networking stack.

With some effort you could maybe get a dozen simultaneous IPs active. It’s simple enough to block them all. URLs, user agents, etc. can all be filled with unique values so you can have thousands or millions of unique entries with little effort.

10 Likes

Just one thought when I saw the other post mentioning social previews: is it possible to whitelist certain popular, well-known social services like Google, Facebook, Twitter (now X), WhatsApp, Signal and Telegram?

I think there's a valid case for letting those bypass the security check, at least from a website owner's perspective. Otherwise, getting traffic and ranking in SEO would be quite tough here if a website can only rely on word of mouth.
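
If that were done purely on the user agent, it could be as simple as the sketch below, but user agents can be spoofed, which is why real implementations also verify crawlers like Googlebot via reverse DNS before letting them skip the challenge. The patterns here are only illustrative:

```js
// Illustrative only: a naive allowlist of link-preview / crawler user agents.
const PREVIEW_BOTS = [
  /Googlebot/i,           // Google crawler
  /facebookexternalhit/i, // Facebook link previews
  /Twitterbot/i,          // Twitter / X card fetcher
  /WhatsApp/i,            // WhatsApp link previews
  /TelegramBot/i,         // Telegram link previews
];

function looksLikePreviewBot(userAgent) {
  return PREVIEW_BOTS.some((pattern) => pattern.test(userAgent || ""));
}

// Example: decide whether this request may skip the JS/cookie challenge.
console.log(looksLikePreviewBot("TelegramBot (like TwitterBot)")); // true
```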

3 Likes

This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.