HAVING COUNT() might fail when too many rows

Hi,

This is an issue I am unable to reproduce locally on a WAMP server. So I assume this is specifc to this hosting service.

I did find a handful of workarounds, but I am still unable to explain why they work, nor why the query doesn’t always work to begin with.

I have the following table :

CREATE TABLE `art-tag` (
    `artId` int(11) NOT NULL,
    `tagId` int(11) NOT NULL
)
ALTER TABLE `art-tag`
    ADD UNIQUE KEY `art-tag` (`artId`,`tagId`),
    ADD KEY `artwork` (`artId`),
    ADD KEY `tag` (`tagId`);

With the following content :
art-tag.sql (7.4 KB)

On which I run this query

SELECT DISTINCT(artId)
FROM `art-tag` 
WHERE tagId IN (37) 
GROUP BY artId
HAVING COUNT(tagId) >= 1

In most situations, it works as intended. But now if I replace the tagId 37 with 25 it doesn’t return any results anymore.
Do I even have results for 25? Yes I do ! In fact I should have more results for 25 than for 37.
What I eventually found out, is that the tagIds with 39 entries or more were affected by the issue, those with 34 entries or less seemt no to be.
(The breakpoint must be somewhere in-between)

The workarounds I found are,
altering the SELECT clause :

SELECT DISTINCT(artId), COUNT(tagId)

or querying more than one tagId :

WHERE tagId IN (25, -1)

The issue would also affect this syntax :

WHERE tagId = 25

but is has a similarly odd workaround :

WHERE tagId = 25 OR tagId = -1

I did some testing with this and I’m very surprised by the results as well.

I was able to reproduce the issue locally using the same database flavor and version as we use on free hosting (Percona Server 5.6). Surprisingly, using the InnoDB storage engine worked fine, but with the MyISAM storage engine I got the same problem. Since InnoDB is typically the default, this is probably why you were unable to reproduce the issue locally as well.

I don’t know if you would get the same issue with MyISAM on other MySQL flavors and versions. I only tested with Percona 5.6.

My best guess is that this is some kind of quirk in the MySQL query planner. After all, you’re selecting rows which have a particular tagId, and then counting rows which have multiple tagIds. If MySQL does the grouping first and the selection after, you will get results. If MySQL does the selection first and the grouping after, the HAVING clause will fail.

What I also found is that omitting the DISTINCT clause from the select will also produce results (which, thanks to the unique constraint, should produce the exact same result set).

Long story short, I wouldn’t say that this is exactly a bug in the hosting platform. Instead, this is a quirk in MySQL which manifests itself in the configuration we’re using. The best I can do right now is to recommend to write your queries to accommodate for this, and to test your application with software which resembles our hosting stack as much as possible to avoid these kinds of “works on my machine” issues.

1 Like

The default storage engine on my WAMP installation has always been MyISAM. Anyway, I made sure I was using this engine, and still couldn’t reproduce it locally.

Unfortunately I’m not used to anything but Wamp, and don’t know whether or when I can familiarize myself with Percona. More so, this is a software I plan to release publicly, and having to potentially test out every single hosting environnement out there doesn’t really rejoice me.

Aside from using Percona, do you have a document listing the exact set up of your environnement ? There are probably a few more settings I can tune up on my side to replicate this environnement more faithfully.
If indeed, it turns out to be an issue specific to Percona, I believe we should let them know about it so they can look into it; this kind of discrepancy between frameworks isn’t making it easy for anyone.

Which is odd, because InnoDB is generally the default.

Which version and flavor of MySQL do you use? Percona 5.6 and MySQL 5.6 should be very similar, but MySQL 5.5 or 5.7 probably work differently. And from what I can tell, WAMP ships MySQL 5.7 (and MariaDB 10.3).

You can use the SHOW VARIABLES query to check the server settings. But to my knowledge, we don’t run any specific settings which would cause different behavior compared to other Percona 5.6 + MyISAM servers.

Given that you don’t experience this issue with MySQL 5.7, this issue may not be present in Percona 5.7, so this issue may have already been fixed in recent versions.

If you feel like opening a bug report with Percona for this, please go ahead.

But, to be honest, with a query like the one you wrote, I’m not really surprised that MySQL doesn’t know what to do with it. I don’t know what the “right” answer is when combining WHERE tagId IN and HAVING COUNT(tagId). Queries like that are hard to argue about, both by humans and by query planners. My suggestion would be to avoid ambiguous queries like that entirely.

I don’t find this query to be particularly convoluted, in fact when I came up with it, I was surprised with how straight-forward it was compared to the other solutions I had considered; I doubt it can be made any simpler. This combination of WHERE and HAVING simply translates to “Find all arts which have at least X tags from the given list”.

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.