User talk:Dušan Kreheľ/Wikipedia talk:New matrix format

Existing "ez" compressed format, as well as pageviews complete
Hi! The pageviews "complete" dump does just this. The situation is a bit messy because the Analytics team that maintains these dumps changed a lot in the middle of a big effort to create the new dump. The relevant details are:

Milimetric (WMF) (talk) 19:46, 5 September 2022 (UTC)
 * pageviews_ez is the old dump that first implemented an idea similar to yours: https://dumps.wikimedia.org/other/pagecounts-ez/
 * pageviews_complete is the new version that should meet your needs going forward: https://dumps.wikimedia.org/other/pageview_complete/readme.html
 * lots of documentation updates are needed to make this clear
 * we need to clean up old jobs that are still running and give the impression that other datasets are the intended way for people to download data


 * @Milimetric (WMF): Thanks, I took a look. My idea was to have a yearly export; pageview_complete has only daily statistics. Dušan Kreheľ (talk) 20:32, 18 September 2022 (UTC)
 * @Dušan Kreheľ: Indeed, pageviews_complete has both daily and monthly statistics. The monthly rollups are here, linked from the daily ones: https://dumps.wikimedia.org/other/pageview_complete/monthly/. Perhaps that should be clearer from the front page. If yearly rollups are useful as well, I think we should add them to this dataset rather than create a different one. What do you think? Milimetric (WMF) (talk) 13:36, 19 September 2022 (UTC)
 * @Milimetric (WMF): Thanks for the comment and the links. My current answer to your question is in the Epilogue section of the article. Please take a look. Dušan Kreheľ (talk) 20:21, 16 October 2022 (UTC)

Comparison for other formats
Thanks for sharing it - this is an interesting idea. I wonder how it compares to other known formats for storing a matrix with many zeros, such as those described in Sparse_matrix. Eran (talk) 09:50, 15 October 2022 (UTC)


 * @ערן: Excellent comment. I compared my format against the examples in the section Compressed sparse row (CSR, CRS or Yale format) of the enwiki page, and my format is better. Dušan Kreheľ (talk) 07:08, 16 October 2022 (UTC), Dušan Kreheľ (talk) 07:09, 16 October 2022 (UTC)