User:Srishti0gupta/Outreachy 3

Notebook
Link to Outreachy Task 3 Notebook: https://public.paws.wmcloud.org/User:Srishti0gupta/Outreachy_Final_3.ipynb

How I learnt
I started by understanding your notebook [1]: https://bitbucket.org/mikepeel/wikicode/src/master/example.py

Also, line 106 of this file needs a minor edit: the '<' is missing before the first strong tag:

print 'Museum: ' + item.split("strong>Museu: ")[1].split(" ")[0].strip + "\n"

I implemented the code [1] for Task 2 to understand the output of each function and the types of the variables. I used a top-down approach, focusing on what output I was getting, what I needed, and where I could make changes.

For this task, I started by understanding the function and use of SPARQL. However, I understood that it gives links to the QIDs that have these PNumber and QNumber linked on the page. Hence, it would not have solved my problem.

I learned the difference between classes such as pywikibot.Page and pywikibot.ItemPage. Using the help function, I was informed that the latter inherits from the former. pywikibot.ItemPage extracts information from the Wikidata item page of a QID, while pywikibot.Page extracts data from the article page of the QID. The get function returns a dictionary for the pywikibot.ItemPage class, but a string for pywikibot.Page.

I dug into sparesite and understood that I can use the request function to fetch the data, use the split function to narrow it down to the string fragments that hold the needed information, and then apply regex to extract the PNumber and QNumber.
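The split-then-regex step can be sketched roughly as below. This is only an illustration, not the code from my notebook: the sample HTML, the delimiter string and the function name extract_pairs are all made up for the example, and the real page markup may look different. In the real case the text would come from a request first (for example requests.get(url).text, assuming the requests library).

```python
import re

# Hypothetical fragment of fetched page text; the real markup may differ.
SAMPLE_HTML = (
    '<div class="statement">'
    '<a href="/wiki/Property:P31">instance of</a>'
    '<a href="/wiki/Q33506">museum</a>'
    '</div>'
    '<div class="statement">'
    '<a href="/wiki/Property:P17">country</a>'
    '<a href="/wiki/Q155">Brazil</a>'
    '</div>'
)

# A P number or Q number is the letter followed by digits, as a whole word.
P_PATTERN = re.compile(r"\bP\d+\b")
Q_PATTERN = re.compile(r"\bQ\d+\b")


def extract_pairs(html, delimiter='<div class="statement">'):
    """Return (PNumber, QNumber) pairs found in each chunk of html.

    The delimiter is an assumption made for this sketch.
    """
    pairs = []
    # split() cuts the text into chunks; the first chunk is whatever
    # comes before the first delimiter, so it is skipped.
    for chunk in html.split(delimiter)[1:]:
        p_match = P_PATTERN.search(chunk)
        q_match = Q_PATTERN.search(chunk)
        if p_match and q_match:
            pairs.append((p_match.group(), q_match.group()))
    return pairs


if __name__ == "__main__":
    for p_number, q_number in extract_pairs(SAMPLE_HTML):
        print(p_number, q_number)
```

Splitting first keeps each regex search limited to one small fragment, so a PNumber is only paired with a QNumber from the same fragment.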