
= Archaeology of the Patagonian coast from Wikidata and Scholia =

Miguel Ángel Zubimendi, Jorge Julián Cueto, Fernando Archuby


= Abstract =

The Equipo de Wikimedistas del Museo de La Plata (WikiMLP) is a Wikimedia user group whose members are also engaged in scientific research. We see Wikimedia as a crucial link between open science and free knowledge. When we formed the group, we set out to improve the content on natural sciences, anthropology and related fields in the Wikimedia projects, and to enrich the experiences and interactions between those projects and the university. In this paper we present an experience in using the Wikimedia projects as infrastructure for open science and academic research. In an experimental and exploratory way, we opened a specific line of inquiry to analyze and discuss trends and changes in the study of Patagonian coastal archaeology, taking as input the academic publications on this topic since the late 19th century (a line of research of one of the authors of this paper, MAZ). To this end, we used the Wikidata database and the Scholia tool. Within this thematic scope, we began with a bibliographic search, whose results were added to Wikidata as items, recording the variety of sources (articles in scientific journals, papers in conference proceedings, doctoral theses, academic books, etc.) together with all the associated information (authors, date, publications, language, number of pages). The works were classified according to specific aspects within the general topic, to indicate, for example, whether they deal with comparative studies or with lithic, faunal or bioarchaeological remains. As a next step, we generated different types of graphs in Scholia, which allowed us to establish trends in academic production over almost 140 years of archaeological studies of the Patagonian coast.

The representations follow authorial criteria (number of works per author, number of pages published), criteria related to academic production (type of publication, number of pages, language of the work, geographic location, among others) and relational criteria (co-authorship or citation networks). The heuristic power of the graphical representations makes it easier to identify thematic, geographic or authorship trends at particular moments, which could reflect changes resulting from theoretical or methodological developments. For example, there is an emphasis on studies of human remains at the start of research, and on faunal studies from the 1980s onwards, which can be correlated with general trends in the history of the discipline in the country. It is also possible to characterize the institutional and relational context of research in the area from the organizations and institutions represented in the publications and from the co-authorship networks. Finally, although this is an exploratory approach to Wikidata as a database and Scholia as an analytical tool, at this stage we can already see their potential for academic research and the possibilities of applying them to studies in different disciplines that combine qualitative and quantitative data within a conception of open science and free knowledge.

= Short Abstract =

[To be written in plain, simple, non-academic language; it has to be shorter.]

= Introduction =

The “Equipo de Wikimedistas de la Universidad Nacional de La Plata” (Wikimedians Team of the National University of La Plata, hereafter “WikiMLP”) aims to improve the content on natural sciences, anthropology and related fields in the Wikimedia projects and to enrich the experiences and interactions between those projects and university institutions. We are based at the National University of La Plata, one of the largest and most prestigious universities in Argentina, with more than 90,000 regular students and 10,000 teaching staff. It also ranks highly in Latin America according to different evaluations [citations needed: the periodically published university rankings] and is one of the most important research centers in Argentina [citation needed].

In this work we present a pilot experience in using Wikimedia projects as infrastructure for open science and academic research. In an experimental and exploratory way, we began a specific line of inquiry to analyze and discuss trends and changes in the study of coastal archaeology in Argentine Patagonia. As input, we used different types of academic publications on this topic, from the late 19th century to 2023. We used Wikidata to build a database of publications on the subject and Scholia for data analysis and visualization. This field is a line of work of one of the authors of this paper (MAZ), as part of his research at the National Scientific and Technical Research Council (CONICET).

= Methodology =
The work was approached in several steps. First, we surveyed all bibliographic sources related to the archaeology of the Argentine Patagonian coast. This decision restricts the study by subject and geography, in order to delimit the data corpus and bound the number of works to be analyzed. Regarding the thematic criterion, we included publications that referred to coastal archaeological contexts, that presented or discussed archaeological pieces obtained on the coast, or that were theoretical or historiographic works on coastal archaeology. The Argentine continental Patagonian coast was defined as the coastline from the mouth of the Colorado River (Buenos Aires province) to Cabo Vírgenes (Santa Cruz province) (Figure 1).

The second step consisted of designing a database that allows each record to be classified by type of publication. The following types of scientific texts were considered: scientific articles in academic journals (Q13442814), chapters of academic books (Q21481766), academic books (Q7433672), conference proceedings (Q23927052), undergraduate (Q798134) and doctoral (Q187685) theses, and unpublished manuscripts (Q87167). This database was then used to create and populate the Wikidata items [Fernando's comment: I do not quite understand what this sentence is for].
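
The classification by publication type can be sketched as a simple lookup table. This is a minimal illustration, not the project's actual database: the QIDs are copied from the paragraph above, while the dictionary keys, the function name and the use of "instance of" (P31) as the classifying property are assumptions made for this sketch.

```python
# QIDs are those listed in the text; the keys are labels chosen for this sketch.
PUBLICATION_TYPES = {
    "journal article": "Q13442814",
    "book chapter": "Q21481766",
    "academic book": "Q7433672",
    "conference proceedings": "Q23927052",
    "undergraduate thesis": "Q798134",
    "doctoral thesis": "Q187685",
    "unpublished manuscript": "Q87167",
}


def instance_of_statement(publication_type: str) -> tuple[str, str]:
    """Return the (property, value) pair that classifies a record by type.

    Assumption: the type is stored with "instance of" (P31); the text does
    not say which property was used.
    """
    try:
        return ("P31", PUBLICATION_TYPES[publication_type])
    except KeyError:
        raise ValueError(f"unknown publication type: {publication_type!r}") from None
```

Rejecting unknown types early keeps typos in the spreadsheet from turning into unclassified items later in the workflow.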

When creating the items, we found that the values of many properties of the publications also had to exist in Wikidata. In parallel, therefore, items were created for all the authors of the works (P50) and for the publications in which they appeared (P1433). Likewise, for works published in books, items were created for the editors (P98). For undergraduate or doctoral theses, the institution where they were presented (P4101) and the number of pages (P1104) were also recorded. Because this is a study of the development of a specific thematic field, we are interested not only in the quantitative aspect of the publications but also in their institutional dimensions. Hence, when adding information about the authors, we recorded their workplaces, drawing on the institutional affiliations given in the publications and on the websites of the science and technology agencies [Fernando's comment: I do not understand this sentence].

In all cases, a set of data was recorded to allow bibliometric indicators to be measured and the bibliographic production on the subject to be evaluated (Table 1): title of the work (P1476), author (P50), language (P407), date of publication (P577), page range (P304) and total number of pages (P1104). For journals, books and book chapters, the publication in which the work appeared (P1433), the volume (P478) and the issue (P433) were included. The link to the full work (P953) and the DOI (P356) were also recorded when available. Each work was also assigned several main subjects (P921). One of them, which we created specifically and called "archaeology of the Argentine Patagonian coast" (Q115632712) [Fernando: perhaps this should be left in Spanish, because otherwise readers may not find it], is assigned to every work so that visualizations can be made in Scholia. Others are geographical, corresponding to the provinces referred to in each work. Finally, there is a series of general thematic descriptors, such as lithic studies (Q115634413), zooarchaeology (Q318668), underwater archaeology (Q765822), bioarchaeology (Q13404081) or archaeomalacology (Q5705097), which were later grouped into larger categories to facilitate comparison. Table 1 summarizes the categories, items and statements used in Wikidata for each type of publication.

The third step consisted of creating the items and statements in Wikidata. This was initially done manually, one item at a time; later, the OpenRefine tool was used to create and manage items in batches, and QuickStatements to upload data in bulk.
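
As an illustration of the batch upload, the sketch below turns one bibliographic record into QuickStatements V1 commands (tab-separated lines starting with `CREATE` and `LAST`). It is a hedged example under stated assumptions, not the script used in the project: the record layout, the function name and the example values are hypothetical, P31 is assumed as the typing property, and only a subset of the properties listed above is emitted. The property and item identifiers themselves come from the text.

```python
TOPIC_QID = "Q115632712"  # main subject item named in the text


def quickstatements_for(record: dict) -> str:
    """Build QuickStatements V1 commands for one new work.

    `record` is a hypothetical layout chosen for this sketch. The title is
    assumed to contain no double quotes or tabs.
    """
    lang = record["lang"]    # language code for label and title, e.g. "es"
    title = record["title"]
    rows = [
        ["CREATE"],
        ["LAST", f"L{lang}", f'"{title}"'],                 # item label
        ["LAST", "P31", record["type_qid"]],               # assumed typing property
        ["LAST", "P1476", f'{lang}:"{title}"'],            # title (monolingual text)
        ["LAST", "P407", record["language_qid"]],          # language of the work
        ["LAST", "P577", f'+{record["year"]:04d}-01-01T00:00:00Z/9'],  # year precision
        ["LAST", "P1104", str(record["pages"])],           # number of pages
        ["LAST", "P921", TOPIC_QID],                       # main subject
    ]
    return "\n".join("\t".join(row) for row in rows)


# Hypothetical record, for illustration only (not taken from the dataset).
example = {
    "lang": "es",
    "title": "Título de ejemplo",
    "type_qid": "Q13442814",
    "language_qid": "Q1321",  # Spanish
    "year": 1999,
    "pages": 12,
}
print(quickstatements_for(example))
```

The resulting text can be pasted into the QuickStatements interface; authors (P50) and the containing publication (P1433) would be added the same way once their items exist.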

As a final step, queries were run on the Wikidata Query Service and plots were generated with Scholia, a tool for managing and visualizing scientific bibliographic information through Wikidata. Scholia uses the Wikidata identifier (QID) to perform its analyses, so each Wikidata item can be viewed according to different "aspects" (author, work, organization, publisher, sponsor, etc.). In our case, we used the visualization by topic (main subject of the work; P921), with "archaeology of the Argentine Patagonian coast" (Q115632712) [Fernando: leave in Spanish] as the filter. In addition, spreadsheets were used to analyze and visualize some variables extracted with the Wikidata Query Service.
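
A query of the kind described here can be sketched as follows. This is an assumption about what such a query might look like, not the query used by the authors: the SPARQL text, the function names and the choice of returned columns are illustrative. The endpoint and the Scholia topic URL pattern are the public ones, and the identifiers P921, P577 and Q115632712 come from the text.

```python
from urllib.parse import urlencode

TOPIC_QID = "Q115632712"  # "archaeology of the Argentine Patagonian coast"
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def works_by_topic_query(topic_qid: str) -> str:
    """Return a SPARQL query listing works whose main subject is the topic."""
    return f"""
SELECT ?work ?workLabel ?date WHERE {{
  ?work wdt:P921 wd:{topic_qid} .
  OPTIONAL {{ ?work wdt:P577 ?date . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "es,en". }}
}}
ORDER BY ?date
""".strip()


def wdqs_url(query: str) -> str:
    """Build a GET URL for the Wikidata Query Service with JSON output."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def scholia_topic_url(topic_qid: str) -> str:
    """Build the URL of the Scholia 'topic' aspect for an item."""
    return f"https://scholia.toolforge.org/topic/{topic_qid}"


print(works_by_topic_query(TOPIC_QID))
print(scholia_topic_url(TOPIC_QID))
```

The query result can be downloaded as a table from the query service and opened in a spreadsheet, which matches the mixed Scholia and spreadsheet workflow described above.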

= Results =
To date, we have added 603 works on the archaeology of the Patagonian coast, published between 1864 and 2023. In addition, 314 authors and 154 publications have been entered, covering academic journals, books, conference proceedings and theses. The distribution of the items created is presented in Table 2. Using Scholia, we obtained various kinds of tabulated information and graphs on different aspects of the publications on the coastal archaeology of Argentine Patagonia. This allowed us to analyze and discuss the main research trends from their beginnings to the present, and to formulate hypotheses about the development of the discipline at the local level.

Figure 1 represents the types of works published on this topic over time. We also generated graphs that make it possible to analyze changes in academic production, considering the language of the publications (Figure 2), the provinces where the work was carried out (Figure 3), the general categories of topics addressed (Figure 4), and changes in the percentage of authorship by gender (Figure 5). In addition, we used Scholia to analyze co-authorship networks over time, constructing two graphs that represent them before and after the year 2000 (Figure 6).
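
The per-decade proportions behind graphs such as those for language, province or topic can be computed from exported query results with a few lines of code. The sketch below is illustrative only: the function names and the `(year, category)` record layout are assumptions, and the sample records are hypothetical, not values from the dataset.

```python
from collections import Counter, defaultdict


def decade_of(year: int) -> int:
    """Map a year to the first year of its decade, e.g. 1912 -> 1910."""
    return year - year % 10


def shares_by_decade(records):
    """Compute the share of each category within each decade.

    `records` is an iterable of (year, category) pairs, where the category
    can be a language, a province, a topic group or an author's gender.
    Returns {decade: {category: fraction}} with decades in ascending order.
    """
    counts = defaultdict(Counter)
    for year, category in records:
        counts[decade_of(year)][category] += 1
    return {
        decade: {cat: n / sum(c.values()) for cat, n in c.items()}
        for decade, c in sorted(counts.items())
    }


# Hypothetical records, for illustration only.
sample = [(1912, "English"), (1915, "Spanish"), (1918, "Spanish"), (1921, "Spanish")]
for decade, shares in shares_by_decade(sample).items():
    print(decade, shares)
```

Using fractions rather than raw counts makes decades with very different publication volumes comparable on the same axis.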

= Discussion =

The history of archaeological research in Argentina has been addressed on various occasions, through approaches that tried to cover the entire disciplinary history, particular aspects of its development, or analyses based on publications in specialized journals. There is also a tradition of historiographic analysis pursued at different times, especially in recent decades within the framework of a significant theoretical renewal, as well as new perspectives in recent years. The development of research on the Patagonian coast has been partially addressed in several works, mainly aimed at characterizing the main trends throughout the history of the discipline in the country, interpreting the different forms of occupation and use of the coastline, or examining the exploitation of specific resources.

In this work, we briefly characterize the history of research on the continental Patagonian coast of Argentina based on the graphs generated. For example, Figure 2 shows a cumulative graph of the total publications on the archaeology of the Argentine Patagonian coast. We observe a low production rate during the 19th century (Figure 2), which is consistent with the incipient development of archaeology at the national level. From the beginning of the 20th century, the number of papers increases, probably linked to the emergence and consolidation of specialized research centers in museums and universities, such as those in Buenos Aires and La Plata. Articles begin to appear, mostly in journals published by scientific institutions related to the natural sciences [for example? some could be mentioned: Museo de La Plata, Museo de Ciencias Naturales de Buenos Aires, Museo Etnográfico?]. During these first decades, approximately until the 1930s, we also observe works published in other languages, such as French, Italian and English (Figure 2). These seem to reflect links with foreign institutions, as in the article in French published by Francisco P. Moreno in 1874 in the Revue d'Anthropologie, or the origin of the authors, who were either living in Argentina or on research stays in the country, as in the cases of the Italian Michele del Lupo and the American William Henry Holmes, who in 1912 published some lithic artifacts recovered by Aleš Hrdlička and Bailey Willis at coastal sites in southern Buenos Aires province. Between 1900 and 1980, the number of published works remains more or less constant (Figure 1). As Figure 2 also shows, the majority language is Spanish.
The increase in bibliographic production can probably be read in a context that spanned two stages in the development of the discipline in Argentina. The first was partially dominated by an evolutionary framework centered on authors such as Félix Outes and Milcíades Alejo Vignati, who have the largest number of works. The second begins in 1950 and is marked by the imprint of the Austrian archaeologist Oswald Menghin, who settled in Argentina after fleeing post-war Europe because of his clear links with Nazism in his country. Although Menghin, together with Marcelo Bórmida, gave a strong impetus to the discipline in the country, especially in the Pampas and Patagonia, this does not seem to be reflected in an increase in the number of publications on the archaeology of the Patagonian coast, as seen in Figure 1.

It is interesting to note that, during this stage, the authors of the works were teachers and researchers linked to history, folklore or anatomy courses. Only from the 1960s, shortly after anthropology degrees were created at the universities of Buenos Aires and La Plata at the end of the 1950s and the institutionalization of the discipline accelerated, did an expansive moment occur and professional archaeologists begin to appear. However, it is striking that immediately after the first researchers trained in the discipline emerged, there was no increase in the production of works on the coast, as seems to have been the case for other areas of Patagonia, such as the interior of Santa Cruz or the far south. This could be the product of a bias towards the archaeology of the interior and an emphasis on the study of cave sites, which offered long and complete stratigraphic sequences that were absent on the coast. At this time there was also a conception that the ancient inhabitants had lived like the ethnographic Tehuelches, the people observed by the sailors and travelers who visited the Patagonian coast between the 16th and 19th centuries.

Regarding the geographical scope of archaeological research (Figure 3), no systematic pattern is observed across provinces. Between 1900 and 1970, the publications came from the various provinces in approximately equal proportions, with a slight predominance of some at certain times, such as Buenos Aires initially, or Río Negro during the boom of the culture-historical archaeology driven by Bórmida in the 1960s.

Since the 1980s, Santa Cruz has been the main focus of archaeological research on the Patagonian coast, accounting for more than 50% of publications in every decade (Figure 3). At the same time, a sustained, almost exponential increase in the number of publications on the continental Patagonian coast is recorded, which continues at least until the beginning of the 2020s (Figure 1). The first half of this stage seems to reflect the start of studies by researchers, mainly from the University of Buenos Aires, who were trained in the 1970s and brought new approaches and perspectives, setting aside the culture-historical paradigm of their professors and even challenging earlier assumptions with new theoretical and methodological frameworks. There is also greater thematic diversity, with research oriented towards new problems and materialities. These researchers achieved their academic consolidation between 1980 and 2000, when they began to train a new generation of archaeologists. This stage of greater theoretical and methodological debate in the discipline is reflected in the higher percentage of works related to archaeological theory (a category that includes reviews of the disciplinary history) observed for the 1980-2000 period in Figure 4.

Since the 1980s, there has also been a proportional increase in the presence of female authors (Figure 5), a phenomenon that consolidates in the following decades (and is also observed for anthropology in general), with female representation stabilizing at around 60% in the 1980-2019 period and increasing slightly afterwards. Figure 1 also shows a boom in the quantity and diversity of publications. Among these there is a large number of presentations at scientific conferences, mainly the Patagonian Archaeology Meetings since 1987 and the National Congresses of Argentine Archaeology since 1999, which have been held regularly and without interruption despite the complex economic context that has characterized the country in recent decades. Their proportional presence in the total grows steadily during these decades.

From 2000 onwards, a strong increase in the number of published works is observed, reflecting the emergence of new research projects that begin studies in areas of the Patagonian coast not previously considered and that are consolidated through the training of students, as evidenced mainly by the presence of bachelor's and doctoral theses (Figure 1). This can also be seen in the co-authorship graphs: the network is sparse before that year, whereas afterwards it is dense, with structurally important nodes (Figure 6). This moment of maturation and consolidation of research on the archaeology of the continental Patagonian coast is also consistent with expansive policies in Argentine science and with greater internationalization, as evidenced by the strong increase in publications in English over the last two decades (Figure 2).
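
The comparison of co-authorship networks before and after 2000 can be reproduced outside Scholia by counting author pairs per period. The following is a minimal sketch under stated assumptions: the function name, the `(year, authors)` record layout and the placeholder author names are hypothetical, and Scholia's own graphs are built differently, directly from Wikidata.

```python
from collections import Counter
from itertools import combinations


def coauthor_edges(works, start=None, end=None):
    """Count co-authorship pairs for works with start <= year < end.

    `works` is an iterable of (year, [author, ...]) pairs. Each unordered
    pair of distinct authors on the same work adds 1 to that edge's weight.
    """
    edges = Counter()
    for year, authors in works:
        if start is not None and year < start:
            continue
        if end is not None and year >= end:
            continue
        for pair in combinations(sorted(set(authors)), 2):
            edges[pair] += 1
    return edges


# Hypothetical works with placeholder author names, for illustration only.
sample_works = [
    (1995, ["A", "B"]),
    (2005, ["A", "B", "C"]),
    (2010, ["B", "C"]),
]
before_2000 = coauthor_edges(sample_works, end=2000)
from_2000 = coauthor_edges(sample_works, start=2000)
print(len(before_2000), "pairs before 2000;", len(from_2000), "pairs from 2000 onwards")
```

The number of distinct pairs and their weights give a rough measure of how sparse or dense each period's network is; single-author works contribute no edges.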

= Final remarks =

The heuristic power of the graphical representations obtained through Scholia from the structured information entered into Wikidata facilitates the identification of thematic, geographical or authorial trends. This strategy derives its strength from the size and representativeness of the database. In this case, the 603 bibliographic items included represent exhaustively the knowledge production on the archaeology of the Patagonian coast and support the assertions derived from its synthesis. Although this is an exploratory approach [Fernando: I do not think it is exploratory, since the database is exhaustive] to Wikidata as a database and Scholia as an analytical tool, at this stage we can already observe their potential for academic research and the possibilities of applying them to studies in different disciplines that combine qualitative and quantitative data within a conception of open science and free knowledge. Because the data are open, this project can be improved and updated through new contributions to Wikidata, which anyone can make [Fernando: Question: if the paper links to the Scholia graphs and someone adds new works, are the graphs updated when the link is clicked?]. It also becomes a potential reference for those interested in bibliographic information on the subject. In the future, we hope to improve the open database in Wikidata and refine the analytical tools for generating graphs that allow better analyses. We also believe that this work can serve as a model for evaluating other disciplines. In this way, we hope to have demonstrated the potential of using Wikimedia projects as infrastructure for open science and academic research.

= Acknowledgments =

We would especially like to thank the Wikimedia Foundation for the grant that funded this research.

= References =