Wikipedia:Reference desk/Archives/Computing/2023 May 10

= May 10 =

Is there a way to detect that a news website is a "News" website?
Does the HTML source code of news websites include any special or distinctive HTML tag or HTML tag attribute that indicates something like "Hey there! This website is primarily about news and is part of the mass media!"?

Thanks. 2A10:8012:17:CDC6:D066:AC47:1958:9121 (talk) 20:16, 10 May 2023 (UTC)


 * They will typically have meta tags for description and keywords identifying themselves as news sources. For example, for The New York Times, their website www.nytimes.com has (slightly simplified),  and  . For the BBC at www.bbc.com we find, more concisely,   and  . None of this is standardized, and nothing prevents Joe Shmoe from Podunk from setting up a website advertising itself as the go-to place for in-depth reporting of the latest news from all over the world.  --Lambiam 12:01, 11 May 2023 (UTC)
 * Thanks. 2A10:8012:17:CDC6:79FB:4D40:EB68:7253 (talk) 00:11, 12 May 2023 (UTC)
 * Detecting whether a website is a "News" website based solely on its HTML source code is difficult and not always reliable. While some news websites include specific HTML tags or attributes indicating their nature, there is no standardized or universal tag that all news websites must use.
 * However, you can look for certain elements in the HTML source code that might suggest a website is focused on news. Here are a few common indicators:
 * Meta Tags: As you mentioned, news websites often include meta tags for description and keywords that identify themselves as news sources. These tags may contain keywords like "news," "breaking news," "current events," etc.
 * Structured Data Markup: Some news websites implement structured data markup, such as schema.org's NewsArticle markup, to provide structured information about their articles. This markup can include properties like headline, date published, author, and more.
 * RSS Feeds: Many news websites offer RSS or Atom feeds that allow users to subscribe to their content. Look for <link> tags with type="application/rss+xml" or type="application/atom+xml" attributes, which indicate that the site advertises a feed. Note that blogs and many other non-news sites offer feeds too.
 * URL Structure: News websites often have URLs that reflect their news sections or categories. For example, a URL like "news.example.com" or "example.com/news" may suggest a news-oriented website.
 * Content Markup: News articles on reputable news websites often follow a specific content structure. Look for HTML tags commonly used in news articles, such as <h1> for headlines, <p> for paragraphs, <time> for publication dates, and <cite> for article sources.
 * It's important to note that these indicators are not foolproof and may vary from website to website. Additionally, some websites may not have clear indicators or may use generic tags that are not specific to news. Therefore, it's advisable to consider multiple factors, including the website's branding, content, and reputation, when determining if a website is a reliable news source. DSamuel088 (talk) 09:07, 17 May 2023 (UTC)
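 * The indicators listed above can be combined into a rough heuristic. The following is only a minimal sketch using Python's standard library, not an established detection method: the names news_signals, NewsSignalParser, NEWS_WORDS and the signal labels are made up for illustration, the keyword list is an arbitrary assumption, and only JSON-LD structured data is inspected (microdata and RDFa are ignored). The Content Markup indicator is left out because tags like <h1> and <p> appear on almost every page.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Illustrative assumptions, not a standard: tune these for real use.
NEWS_WORDS = ("news", "current events")
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}
NEWS_SCHEMA_TYPES = {"NewsArticle"}


def _has_news_type(node):
    """Recursively look for a schema.org "@type" of NewsArticle in parsed JSON-LD."""
    if isinstance(node, dict):
        t = node.get("@type")
        types = t if isinstance(t, list) else [t]
        if any(isinstance(x, str) and x in NEWS_SCHEMA_TYPES for x in types):
            return True
        return any(_has_news_type(v) for v in node.values())
    if isinstance(node, list):
        return any(_has_news_type(v) for v in node)
    return False


class NewsSignalParser(HTMLParser):
    """Collects weak hints that a page belongs to a news site."""

    def __init__(self):
        super().__init__()
        self.signals = set()
        self._in_jsonld = False
        self._jsonld_chunks = []

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta" and a.get("name", "").lower() in ("description", "keywords"):
            # Substring match, so "newsletter" would also count: a crude hint only.
            if any(w in a.get("content", "").lower() for w in NEWS_WORDS):
                self.signals.add("meta")
        elif tag == "link" and a.get("type", "").lower() in FEED_TYPES:
            self.signals.add("feed")
        elif tag == "script" and a.get("type", "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._jsonld_chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._jsonld_chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                doc = json.loads("".join(self._jsonld_chunks))
            except ValueError:
                return  # malformed JSON-LD: ignore it
            if _has_news_type(doc):
                self.signals.add("schema")


def news_signals(html, url=""):
    """Return the set of hints found: any of "meta", "feed", "schema", "url"."""
    parser = NewsSignalParser()
    parser.feed(html)
    parser.close()
    signals = set(parser.signals)
    parts = urlsplit(url)
    host = parts.hostname or ""
    # "news.example.com" or "example.com/news/..." as in the URL Structure point.
    if host.startswith("news.") or parts.path.split("/")[1:2] == ["news"]:
        signals.add("url")
    return signals
```

 * Calling news_signals(page_html, page_url) returns a set of labels rather than a yes/no answer, since none of these hints is conclusive on its own and, as noted earlier in the thread, any website can declare itself a news source.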