Template talk:COVID-19 pandemic data/California medical cases by county

Source for population numbers?
Could someone post a source for the population numbers used to produce "Cases / 10,000"? Better still, is there a way to copy those numbers to this page? That will make it much easier to update the "Cases / 10,000" with the daily updates. — Preceding unsigned comment added by 24.6.53.222 (talk) 04:45, 16 April 2020 (UTC)


 * It's from List of counties in California. I have a spreadsheet where I update them daily, which is a bit more efficient though I wish it would happen automatically (like, that Wikimedia had a spreadsheet function for its tables...)  Or maybe someone could make a bot to do it? platypeanArchcow (talk) 01:52, 5 May 2020 (UTC)


 * See for a start. Let's figure out how to streamline and eventually automate the rest of the counties using either Commons tabular data or Wikidata statements. – Minh Nguyễn 💬 20:58, 21 May 2020 (UTC)
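For anyone maintaining the column by hand or in a spreadsheet, the "Cases / 10,000" arithmetic discussed in this thread can be sketched as follows. This is only an illustration, not how the template computes anything: the function name `cases_per` and the sample figures are made up, and real populations would come from List of counties in California.

```python
def cases_per(cases: int, population: int, per: int = 10_000) -> float:
    """Return cases per `per` residents, rounded to one decimal place."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(cases * per / population, 1)


# Hypothetical figures, for illustration only.
rate = cases_per(cases=250, population=1_000_000)
print(rate)  # 2.5
```

Passing `per=100_000` gives the per-100K figure from the same inputs, so both columns can be derived from one population source.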

San Diego: Include Federal Quarantine or not?
PlatypeanArchcow - thanks for the data updates. It looks like there's some inconsistency in whether we include the "Federal Quarantine" headcount from SanDiegoCounty.gov in the San Diego total; is there any consensus about whether those should be counted as part of SD's total? — Preceding unsigned comment added by Michaelrhanson (talk • contribs) 15:44, 21 March 2020 (UTC)


 * Yeah, I decided originally not to include "federal quarantine" but now think it's easier to just put in the topline number. I originally thought "federal quarantine" was people from Diamond Princess etc but it may just include all travel related cases.  In the end, a lot of the data isn't entirely comparable since counties are reporting slightly different things.  Might as well keep it simple.


 * By the way, thanks for the updates on your end as well! Make sure to update the topline when you do. platypeanArchcow (talk) 01:08, 23 March 2020 (UTC)

Why not all counties?
I noticed that Tuolumne County is not listed. (As of March 27, perhaps earlier, it has reported 1 non-resident case.) Colusa is not listed and has a case too; Kings, also. This template lists only 44 of 58 counties. At this point, wouldn't it make sense to include all counties, so the daily update wouldn't miss counties that report their first case? I could put them in, but my Wikipedia skills are not well honed. Currently missing counties and the best page I found for each one's current status: MabryTyson (talk) 21:09, 28 March 2020 (UTC)
 * 1) Alpine  http://alpinecountyca.gov/AlertCenter.aspx?AID=COVID19-INFORMATION-AND-UPDATES-11
 * 2) Colusa  http://www.countyofcolusa.org/99/Public-Health
 * 3) Del Norte  http://www.co.del-norte.ca.us/departments/health-human-services/public-health
 * 4) Glenn  https://www.countyofglenn.net/dept/health-human-services/public-health/covid-19
 * 5) Kings  https://www.countyofkings.com/departments/health-welfare/public-health/coronavirus-disease-2019-covid-19/-fsiteid-1
 * 6) Lake   http://health.co.lake.ca.us/Coronavirus.htm  (then click on latest pdf dashboard)
 * 7) Lassen  http://www.lassencounty.org/dept/public-health/public-health
 * 8) Mariposa  http://www.mariposacounty.org/1592/COVID-19-Information
 * 9) Modoc  https://www.modocsheriff.us/modoc-covid-19-incident-updates
 * 10) Plumas  https://www.plumascounty.us/2669/Novel-Coronavirus-2019-COVID-19
 * 11) Sierra  http://sierracounty.ca.gov/582/Coronavirus-COVID-19
 * 12) Tehama  https://www.tehamacohealthservices.net/services/communicable-diseases/ (then click on latest update)
 * 13) Trinity  https://www.trinitycounty.org/COVID-19
 * 14) Tuolumne https://www.tuolumnecounty.ca.gov/250/Public-Health

Thanks for this list!

Remaining as of 4/6: platypeanArchcow (talk) 02:06, 7 April 2020 (UTC)
 * 1) Lassen https://lassencares.org/ (new site!)
 * 2) Mariposa  http://www.mariposacounty.org/1592/COVID-19-Information
 * 3) Modoc  https://www.modocsheriff.us/modoc-covid-19-incident-updates
 * 4) Sierra  http://sierracounty.ca.gov/582/Coronavirus-COVID-19
 * 5) Trinity  https://www.trinitycounty.org/COVID-19

Disclosure: paid editing
I was paid by my employer (Google) while doing the edits which added and edited the table of county-level statistics. (Nature of edit: explicitly added the five counties for which there is an official count of zero cases.) Tal Cohen (talk) 12:08, 13 April 2020 (UTC)

Update of totals, cases/10K, and update date
I see many edits to individual case numbers without updates to the total line at the top, the cases/10K value, or the updated date at the bottom, even by major contributors to the page. Any thoughts on what to do about this? —[ Alan M 1  (talk) ]— 08:01, 16 April 2020 (UTC)

Sorry -- new editor here! I'll (try to remember to) update the total line at the top every time.

As for the others: If I update some but not all counties, is the best practice to leave the updated date unchanged, to show updates aren't complete? And about the cases/10K, please see my question at the top of the page. What is the source for population numbers?

24.6.53.222 (talk) 05:44, 17 April 2020 (UTC)

Clarify meaning of "recovered" numbers?
The meaning of "recovered" seems to vary by county. For example, Marin county assumes that all cases more than 2 weeks old are recovered. I'm afraid it might be misleading to show these numbers in the table with no further explanation. But some counties offer no explanation at all. Thoughts?

24.6.53.222 (talk) 05:48, 17 April 2020 (UTC)

Filter by region
I added an optional bayarea parameter that filters the table down to just the counties that comprise the San Francisco Bay Area, for the table at COVID-19 pandemic in the San Francisco Bay Area. Currently it uses TemplateStyles to hide other counties using CSS, as a hacky workaround until we can make this template more structured. (Ideally, I think we should store these numbers in Commons as a data table or in Wikidata as statements, but we'd probably want to address automation at the same time.) – Minh Nguyễn 💬 10:21, 21 May 2020 (UTC)

Tabular data
The figures for Santa Clara County and San Francisco are now drawn automatically from c:Data:COVID-19 Cases in Santa Clara County, California.tab and c:Data:COVID-19 cases in San Francisco.tab, respectively, via, while population figures are drawn from the counties' Wikidata items. This hopefully makes two counties easier to keep up-to-date. Please do not update the two data tables by manually copying values from the county dashboards; instead, see the data tables' talk pages for instructions on how to run a script that updates the whole table consistently. This matters for presenting accurate time series charts at COVID-19 pandemic in the San Francisco Bay Area. – Minh Nguyễn 💬 20:55, 21 May 2020 (UTC)

Per capita math makes no sense
Shouldn’t the cases per 100K simply be 10x that of the cases per 10K? A few counties match, but many are wildly off.

Also, what’s the point of having both per 10K & per 100K? One or the other would suffice.

Gecko GMobile (talk) 19:16, 1 June 2020 (UTC)
 * Thanks for catching and reporting. You're correct, this is wildly off. It looks like these errors were introduced by . Dan, can you take a thorough look? I suspect you put the columns in the wrong order or something. I reverted all changes back to the last version that did not have this problem: it's better to have slightly outdated data than these errors.
 * I also agree that having both numbers is confusing, especially because they seem mildly off. Can't we just get rid of the /10k number and replace that with new cases, or something else more useful? effeietsanders 01:38, 3 June 2020 (UTC)


 * While we're at it, can we replace the January 2018 population estimates with July 2019 population estimates? – Minh Nguyễn 💬 21:07, 3 June 2020 (UTC)
 * Does anyone know who that IP is? If they could log in, we could actually have a conversation about this :) effeietsanders 22:17, 3 June 2020 (UTC)


 * Originally, this template had 10k numbers. 100k seems to be the standard for other states, so someone added that column. I presume the 10k was left for compatibility with something? At this point, someone should be bold and remove the 10k column. And I support switching to July 2019 population. EphemeralErrata (talk) 03:17, 6 June 2020 (UTC)

This confused me too. I thought maybe Cases/10k was actually supposed to be Deaths/10k, which would be an interesting number. Jb510 (talk) 23:34, 3 June 2020 (UTC)
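The mismatch reported in this thread could be caught mechanically before saving an edit. Below is a minimal sketch, assuming each row is available as a (county, per-10K, per-100K) tuple; the function name, the tuple layout, and the sample values are all hypothetical and not part of the template.

```python
import math


def inconsistent_rows(rows, tolerance=0.6):
    """Return county names whose per-100K value is not ~10x the per-10K value.

    `rows` is an iterable of (county, per_10k, per_100k) tuples.
    `tolerance` absorbs rounding, since each column is rounded separately.
    """
    return [
        county
        for county, per_10k, per_100k in rows
        if not math.isclose(per_100k, 10 * per_10k, abs_tol=tolerance)
    ]


sample = [
    ("County A", 12.3, 123.4),  # consistent within rounding
    ("County B", 4.0, 71.9),    # off: columns likely out of sync
]
print(inconsistent_rows(sample))  # ['County B']
```

Dropping the per-10K column, as proposed above, would remove the need for this check altogether.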

Numbers for Los Angeles county (and probably other counties) are very old
What is going on here? I have been watching this issue for several days, and it has not been resolved. Ever since this edit (https://en.wikipedia.org/w/index.php?title=Template:COVID-19_pandemic_data/California_medical_cases_by_county&oldid=960450560), the numbers have been stale. They currently say 43,052 cases, but if you look at the source, the number shown is 63,844. I'm happy to attempt to fix it but have no experience doing this and don't want to mess things up. --Emmmmar (talk) 17:54, 8 June 2020 (UTC)
 * You're welcome to visit each county's Covid Dashboard and transcribe the numbers into this template. Change the Cases, Deaths, and if available, the Recovery numbers. As a bonus, update the cases/100k number too. Leave the auto-filled San Francisco Bay counties alone. Most states have a statewide dashboard that makes it easy to update these templates. California does too, except it is a day behind the individual county sites. Thus the necessity to visit each county's dashboard. EphemeralErrata (talk) 09:17, 9 June 2020 (UTC)

Removal from article and upcoming automation
I've temporarily removed this template from COVID-19 pandemic in California out of concern that it doesn't meet standards for inclusion in an article. The table had been updated piecemeal for months, which was problematic enough, but since a week ago, the table has been continually adjusted without any citations. Sources are especially important for this table, because different sources have very different reporting standards. (For example, some sources include San Quentin inmates in Marin County's case total, while others exclude them.) The usual sources for populations (U.S. Census Bureau, California Department of Finance, California State Association of Counties) don't corroborate the populations in this table, either.

On the bright side, I'm almost ready to replace this template with COVID-19 pandemic data/California medical cases by county/sandbox, which is automatically populated from Wikidata statements. It automatically sorts the rows, sums up the figures for the header row, and cites its sources. The populations currently come from Census Bureau estimates from last year, but if we find a better source, we can easily switch to it. Before we can deploy the new table, we need to automate updating Wikidata. This script gathers the requisite data from COVID Atlas, which automatically scrapes county and state dashboards and other aggregators, and generates QuickStatements commands that I've been running by hand. I'm still working through a few data issues in COVID Atlas, and I need to replace the QuickStatements part of the workflow with a proper bot, probably using pywikibot.

Thanks to Praline97, Qwerty325, Emmmmar, and others for your tireless contributions over the past several months as we coped with a lack of automation. Hopefully we can free up some of your time to work on other articles. – Minh Nguyễn 💬 00:50, 10 August 2020 (UTC)


 * Thank you, I think, for the upcoming automation. I've been editing 8+ other states, but not my home state due to the mess in this template. Two concerns: It is important to create an accessible daily history of case counts - does the Wikidata approach do that? Dashboards can have errors, omissions, and inconsistencies - does the Atlas crew include a human that checks and corrects the data? EphemeralErrata (talk) 11:52, 12 August 2020 (UTC)


 * Wikidata items have revision histories just like this template and the tabular data at Commons that powers COVID-19 pandemic data/San Francisco Bay Area medical cases by county. It's also possible for an item to have a number-of-cases statement for each day of the outbreak, but the approach I'm pursuing would only keep the latest day. Maintaining scores of statements for past days would quickly become unmanageable, because we'd need to keep all those past days' numbers up-to-date as the counties revise their numbers retroactively. COVID Atlas is a volunteer-driven open source project; no one is formally assigned to keep an eye on the data's validity, but there are ideas for automated tests and other users have been proactive in reporting issues. It would actually be easier for me to write a bot that scrapes the sites directly, but relying on COVID Atlas allows me to share the significant burden of maintaining the scrapers. – Minh Nguyễn 💬 06:48, 13 August 2020 (UTC)
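For readers unfamiliar with the QuickStatements step mentioned in this thread, here is a rough sketch of how per-county figures might be turned into QuickStatements V1 commands. It is not the script referred to above. The property IDs (P1603 number of cases, P1120 number of deaths, P585 point in time, S854 reference URL) and the tab-separated command layout are assumptions to check against the QuickStatements help page, and the item, counts, and URL are placeholders.

```python
from datetime import date

# Wikidata property IDs (assumed here: P1603 number of cases,
# P1120 number of deaths, P585 point in time, S854 reference URL).
CASES, DEATHS, POINT_IN_TIME, REF_URL = "P1603", "P1120", "P585", "S854"


def quickstatements_lines(item, cases, deaths, as_of, source_url):
    """Build tab-separated QuickStatements V1 commands for one item."""
    when = f"+{as_of.isoformat()}T00:00:00Z/11"  # /11 = day precision
    return [
        "\t".join(
            [item, prop, str(value), POINT_IN_TIME, when, REF_URL, f'"{source_url}"']
        )
        for prop, value in ((CASES, cases), (DEATHS, deaths))
    ]


# Q4115189 is the Wikidata Sandbox item; the counts and URL are made up.
for line in quickstatements_lines(
    "Q4115189", 1234, 56, date(2020, 8, 9), "https://example.org/dashboard"
):
    print(line)
```

The sketch only adds statements. Keeping just the latest day, as described above, would also require removing or deprecating the previous day's statements, which this example does not attempt.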

Deaths per capita would be an interesting number
Where are the data structures and software that result in this table?

Deaths per capita by county would be an interesting number.... 0mtwb9gd5wx (talk) 09:25, 20 June 2021 (UTC)


 * @0mtwb9gd5wx: Sorry for missing this question. The table on the page is manually maintained. EphemeralErrata has just migrated it to the CDPH dashboard as the data source. (Thanks!) There's also a separate table for the Bay Area counties that's hooked up to a series of JSON tables via Module:Medical cases data; those tables are ultimately based on county dashboards, which differ from the state dashboard, most notably in Marin County. I've been updating the tables by script for the past couple years, but they're more reusable and extensible than the wikitext table in this template, or for that matter most of the county dashboards. Minh Nguyễn 💬 08:17, 21 February 2022 (UTC)

CDPH script
EphemeralErrata: Not sure what you used to pull together Special:Diff/1072345522, but in case it helps, I whipped up a little Bash script that grabs the latest per-county stats from CDPH and formats it as tabular data. I can look into integrating it into Module:Medical cases data, but for now, Module:Tabular data can transclude the whole table or any part you need:

Minh Nguyễn 💬 11:51, 21 February 2022 (UTC)

Partial updates
@EphemeralErrata: Same question as on Commons: going forward, as CDPH winds down its reporting, should we freeze this template as it is today or remove the case count column while updating the death count column each week? Minh Nguyễn 💬 08:34, 20 May 2023 (UTC)


 * As we transition from active reporting to historical reporting, I feel we should freeze this template as a snapshot in time. For example, if someone is looking for information on the 1918-20 pandemic, they'll expect the counts to cover that interval and not subsequent flare-ups or derivative viruses. In the future, I expect screen-scraping this template's history to be a valuable data source for a student who wishes to study the spread of Covid, but is unable to obtain official government data. Or they might access the data on Commons, though that is not possible for many other states. With Covid, it is looking like the long tail of continued cases could eventually swamp the counts from the initial pandemic. Perhaps it is time for a new article and template possibly called Covid in California post pandemic? California's continued reporting of deaths is unique as other states have simply ceased all detailed reporting. Some ceased reporting months ago, or even in 2022. Reporting very tiny changes has certain privacy concerns - such reporting can leak demographic data and resident status. For examples of challenging reporting, look at my work on the case count templates for Utah, which had dual level reporting, and Oklahoma, which ceased county level death reporting. EphemeralErrata (talk) 15:00, 20 May 2023 (UTC)
 * @EphemeralErrata: Yes, freezing this template and creating a new one for the long tail would make plenty of sense. Actually, the CDPH dataset is also a time series (cases/deaths/tests by county by date). We could upload the entire dataset to Commons for future research purposes, in case CDPH later takes down the archive and convenient API endpoint for SQL queries, but it's a massive amount of data for one page. Maybe one per county? I've also been maintaining per-county time series tables on Commons for Bay Area counties, but those are from the county health departments themselves, some of which have different criteria than the state (particularly Alameda and Marin counties). A handful are continuing to post new case counts, though I don't know for how long. There are huge caveats anyways since systematic testing has ended. Some counties are posting wastewater test time series, which would be a fascinating alternative visualization in articles once Graph:Lines is operational again. Minh Nguyễn 💬 21:01, 20 May 2023 (UTC)