Is there a way to only get the `<head>`? Seems like a waste to deal with the whole body when all I'm after is the meta.
In bash you can use `curl -I https://example.com`
But that’s the only way I know, and I’m guessing it’s probably not what you’re looking for?
Seeking optimization of:
https://yusi-dev.hstn.me/versioning/main.php?file=social/services/get_meta.php
A quick Google search tells me that the cURL options `CURLOPT_NOBODY` and `CURLOPT_HEADER` can be used to retrieve only the header information without retrieving the body, as shown here:
https://stackoverflow.com/questions/1378915/header-only-retrieval-in-php-via-curl
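For example, a minimal PHP sketch of that approach (the URL is just a placeholder):

```php
<?php
// Header-only retrieval: CURLOPT_NOBODY sends a HEAD request so no
// response body is transferred, and CURLOPT_HEADER includes the
// response headers in the output.
$ch = curl_init('https://example.com'); // placeholder URL

curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request, no body
curl_setopt($ch, CURLOPT_HEADER, true);         // include headers in output
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return instead of printing

$headers = curl_exec($ch);
curl_close($ch);

echo $headers; // HTTP status line and response headers only
```

Note that this only saves you the HTTP response body; it gives you HTTP headers, not the HTML `<head>`.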
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.
Tried `CURLOPT_NOBODY`. It seems rather than NOBODY, what I want is NOHTML. I didn’t ask for the HTTP header information. I asked for the HTML `<head>` tag.
Please note that requests from non-browsers (like cURL) are blocked on free hosting.
I merged the posts into the old topic for context, and reopened it so we can continue the discussion.
I’m sorry, but I don’t think what you’re asking for is possible with cURL.
cURL is just an HTTP client. And the HTTP protocol can be used to transfer any kind of data. The `CURLOPT_NOBODY` option refers to the HTTP response body, not the HTML page body. And it’s not called `CURLOPT_NOHTML`, because that would make zero sense if you’re downloading a CSS file or an image file, for example; there is never any HTML to begin with.
cURL doesn’t really “understand” HTML. It just downloads whatever is at the URL and gives the downloaded data as-is to your code.
If you only want to read the HTML page `<head>` section, you will still need to download the HTTP response body to get access to the HTML page.
I think it’s possible to do what you want by reading the HTTP response body (i.e. the HTML page) in a streaming way, so you can read some data, look for a `</head>` tag, and close the connection when it’s found. But I’m having a hard time finding out exactly how to do that, and all the potential methods I saw were pretty complicated.
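Something like this might work in PHP, using `CURLOPT_WRITEFUNCTION` to receive the body in chunks and abort the transfer once `</head>` shows up. This is a rough sketch under that assumption, not production-ready code, and the URL is a placeholder:

```php
<?php
// Stream the response body in chunks and stop downloading as soon as
// the closing </head> tag has been seen.
$buffer = '';

$ch = curl_init('https://example.com'); // placeholder URL
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $chunk) use (&$buffer) {
    $buffer .= $chunk;
    if (stripos($buffer, '</head>') !== false) {
        // Returning a value different from the chunk length tells cURL
        // to abort the transfer; curl_exec() will then report an error,
        // which is expected here.
        return 0;
    }
    return strlen($chunk); // keep downloading
});
curl_exec($ch); // returns false when we abort; that's intentional
curl_close($ch);

// $buffer holds everything up to (and slightly past) </head>.
$end  = stripos($buffer, '</head>');
$head = ($end !== false) ? substr($buffer, 0, $end + strlen('</head>')) : $buffer;
echo $head;
```

This still downloads everything before `</head>` (and whatever fits in the last chunk), but it avoids transferring the rest of the page.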
Got it. Body in body, i.e. the HTML body inside the HTTP body.
Seems weird that they (standards bodies) default to wasting network resources rather than putting in a little extra work to support an XPath-type query on HTML and only return the relevant section(s).
Seems like you’re caught up in a rote response routine. Not sure this is even relevant to the discussion. We are talking retrieval, not external invocation.
Again, cURL is just a HTTP client. It’s a very good HTTP client, and can do basically anything you can imagine when it comes to HTTP.
But it only does HTTP. It doesn’t care at all about what data is being transferred over HTTP. Remember that cURL is not just for transferring HTML; it can be used to transfer basically any kind of data that exists.
It’s a tool that does one thing and does it well. As it’s supposed to be.
I see a very long list of issues when it comes to doing what you want as part of cURL. To name a few:
- Doing an XPath query means parsing the entire document first, so you cannot use it to save bandwidth (see the sketch after this list).
- HTML can be very messy, and parsing can be hard. Doing this streaming makes it all the more difficult.
- The HTML head section being small and the body being big is an assumption, which is by no means guaranteed to be the case.
- HTML documents in general are not that big (usually only a few MB at most), providing little practical benefit for bandwidth savings.
- cURL adding specific support for HTML opens the door for requests to support virtually any data type in existence, which could lead to massive feature bloat that most people will never interact with.
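To illustrate the first point, here’s a sketch of my own (not anything cURL provides) of what an XPath-style `<head>` extraction looks like in PHP. Note that the whole document has to be downloaded and parsed before the query can run:

```php
<?php
// DOMDocument needs the complete HTML string up front, so the whole
// response body must be downloaded before any XPath query can run.
$html = file_get_contents('https://example.com'); // full download, placeholder URL

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate messy real-world HTML
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$head  = $xpath->query('//head')->item(0);

if ($head !== null) {
    echo $doc->saveHTML($head); // only the <head> section
}
```

So the query itself is easy; the bandwidth has already been spent by the time you can run it.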
All in all, what you want to do is very much a niche use case, and in my opinion it’s not something you can expect a generic and widely used tool like cURL to just handle for you.
If you want to do it, you probably can, and it might even benefit you. But you’ll have to build it yourself, or find third-party software that can do it for you, because cURL can’t do it and probably never will.