User:Pier4r/ProposedContributions

= Overview =

Proposed contributions (mostly to talk pages) that were removed. Instead of starting an edit war (which only drains energy and is unproductive, especially when someone removes content first instead of discussing it), I prefer to collect those contributions here and maybe link them from the talk page.

For those wanting to surf the wiki through random articles, I have made a collection of scripts that first collects the list of wiki pages and then generates random links every day. More info:
 * https://www.assembla.com/spaces/various-works-only-code/git-2/source/master/shell-posix/linux/wards_wiki_index


 * To collect wiki links

#!/bin/sh

# 1) Do not forget to disable word wrap,
# 2) else you end up with some long lines.

# constants
true_value=0
false_value=1

base_url="http://c2.com/cgi/wiki?"
wiki_page_extension_url="WardsWiki"
temporary_dl_filename="temp.html"
temporary_filename="temp2.txt"

working_dir=/tmp/wards_wiki_index
downloaded_pages_dir=${working_dir}/downloaded
cache_dir=${working_dir}/cache
cache_wiki_url_dir=${cache_dir}/links
cache_wiki_urls_file=${cache_dir}/to_check_links.txt
wiki_camelcase_urls_dir=${working_dir}/wiki_links
wiki_downloaded_urls_file=${wiki_camelcase_urls_dir}/wiki_links.txt

seconds_between_downloads=5

# input var
logical_name_dir="not used"
if test -z "${logical_name_dir}" ; then
  : #nop
fi

# variables
wiki_urls_to_check=1
checked_wiki_urls_line=1
wiki_dl_errors=1
seconds_now=0
seconds_last_download=$( date +"%s" )


# functions
download_web_page() {
  #parameters
  local web_page_url=${1}
  local output_file_path=${2}
  #internal
  local was_successful=${true_value}

  for counter in $(seq 1 3); do
    #limit the rate because wardswiki does not like too many wget requests
    #too quickly; let's slow down the download.
    wget "${web_page_url}" -O ${output_file_path} --limit-rate=1k
    was_successful=$?
    if test ${was_successful} -eq ${true_value} ; then
      break
    fi
    sleep 2
  done

  if test ${was_successful} -ne ${true_value} ; then
    echo "error in downloading"
    echo "${web_page_url}"
    exit 1
  fi
}
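As a side note, the retry shape of download_web_page can be exercised with any command; a small sketch where a placeholder command (false, so every attempt fails) stands in for the wget call, and the pause is shortened for the illustration:

```shell
#!/bin/sh
# same retry loop as in download_web_page, with 'false' as a stand-in
# for wget, so all three attempts fail and the error status survives
true_value=0
was_successful=1

for counter in $(seq 1 3); do
  false   # placeholder for the real download command
  was_successful=$?
  if test ${was_successful} -eq ${true_value} ; then
    break
  fi
  sleep 0 # shortened from 2 seconds for the illustration
done

echo "exit status after retries: ${was_successful}"
```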

# script
mkdir -p ${working_dir}
mkdir -p ${downloaded_pages_dir}
mkdir -p ${cache_dir}
mkdir -p ${cache_wiki_url_dir}
touch ${cache_wiki_urls_file}
mkdir -p ${wiki_camelcase_urls_dir}
touch ${wiki_downloaded_urls_file}

cd ${cache_dir}

# gets the pages, analyzes them, extracts the links and visits further pages.
while test ${wiki_urls_to_check} -gt 0 ; do
  if test $( grep -c "^${wiki_page_extension_url}\$" "${wiki_downloaded_urls_file}" ) -eq 0 ; then
    #test whether the wiki page was already visited; if not, continue.
    #in terms of load this has no big impact, because the file will have a
    #maximum of 50'000 lines, which is not so big for, let's say, an asus
    #904hd with a celeron 900 and a not-so-fast hd.

    seconds_now=$( date +"%s" )
    if test $( expr ${seconds_now} "-" ${seconds_last_download} ) -lt ${seconds_between_downloads} ; then
      sleep ${seconds_between_downloads}
    fi
    #set before the download, which could require a lot of time.
    #calling 'date' again does not always work. Dunno why.
    seconds_last_download=$( expr ${seconds_now} "+" ${seconds_between_downloads} )
    download_web_page "${base_url}${wiki_page_extension_url}" ${temporary_dl_filename}

    # grep -o 'title.*/title' wardsWiki | cut -c 7- | cut -d '<' -f 1
    # grep -o -E 'wiki\?[A-Z][a-zA-Z0-9]*' wardsWiki
    #grabbing the CamelCase content within the title element
    wiki_page_title=$( grep -o 'title.*/title' ${temporary_dl_filename} | cut -c 7- | cut -d '<' -f 1 )

    if test -z "${wiki_page_title}" ; then
      wiki_page_title="wiki_dl_errors.${wiki_dl_errors}"
      let wiki_dl_errors+=1
    fi

    #copy the page to the 'downloaded pages' dir, using the title as name
    cp ${temporary_dl_filename} "${downloaded_pages_dir}/${wiki_page_title}.html"

    #save the wiki link name as downloaded
    echo "${wiki_page_extension_url}" >> "${wiki_downloaded_urls_file}"

    #save the wiki urls found in the page,
    #grabbing something like 'wiki?CamelCase'
    grep -o -E 'wiki\?[A-Z][a-zA-Z0-9]*' ${temporary_dl_filename} | cut -d '?' -f 2 > ${temporary_filename}

    #the following part could be compressed in a single 'cat >>',
    #but its impact for now is lower than other statements.
    while read wiki_link_line ; do
      echo "${wiki_link_line}" >> "${cache_wiki_urls_file}"
      let wiki_urls_to_check+=1
    done < ${temporary_filename}
  fi

  #get the next page to visit
  echo "${wiki_page_extension_url}" >> "${cache_wiki_urls_file}.checked" #record the checked line in a 'checked' file db.
  wiki_page_extension_url=$( head -n 1 "${cache_wiki_urls_file}" ) #extract the next line to check
  let wiki_urls_to_check-=1
  #remove the next page to visit from the remaining list
  #http://unix.stackexchange.com/questions/96226/delete-first-line-of-a-file
  tail -n +2 "${cache_wiki_urls_file}" > "${cache_wiki_urls_file}.tailtmp"
  mv "${cache_wiki_urls_file}.tailtmp" "${cache_wiki_urls_file}"

  #if test -z "${wiki_page_extension_url}" ; then
  #  #if no wiki page is retrieved, it means that we are finished.
  #  #even in case of loops, since we checked the url at least once,
  #  #it will get deleted and never downloaded again.
  #  #other pages can re-add it, but it will be bypassed.
  #  wiki_urls_to_check=0
  #fi
done
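The visited-page check in the loop relies on grep counting exact whole-line matches. A minimal sketch with a made-up visited list:

```shell
#!/bin/sh
# made-up visited list, to show that grep -c "^name$" counts only exact
# whole-line matches (so 'Wards' does not match 'WardsWiki')
printf '%s\n' "WardsWiki" "RecentChanges" > /tmp/visited_example.txt

grep -c "^WardsWiki\$" /tmp/visited_example.txt        # prints 1
grep -c "^Wards\$" /tmp/visited_example.txt || true    # prints 0 (no match)
```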

<<documentation
Todos {
 - gets a wiki page, extracts useful links according to observed patterns, then continues the exploration.
}

Assumptions {
 - wards wiki links have the characteristic part 'wiki?CamelCaseName'
}

Tested on {
 - cygwin on win xp with the busybox interpreter
}
documentation
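The link-extraction pattern used in the script can be checked in isolation; a minimal sketch with a made-up HTML snippet that mimics the wiki's link markup:

```shell
#!/bin/sh
# the snippet below is invented, but follows the 'wiki?CamelCase' convention
sample_html='<a href="wiki?WelcomeVisitors">x</a> and <a href="wiki?RecentChanges">y</a>'

# same pipeline as in the script: keep only the CamelCase part after 'wiki?'
echo "${sample_html}" | grep -o -E 'wiki\?[A-Z][a-zA-Z0-9]*' | cut -d '?' -f 2
```

This prints one page name per line (WelcomeVisitors, then RecentChanges).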


 * To create random pages

#!/bin/sh
# actually I should use env
# documentation at the end

set -eu # stop at errors and undefined variables

# constants
wiki_links_filepath='/sda3/c2_wiki_links/wiki_links.txt'
file_lines_num=36631
random_links_number=20 # note that repetitions can appear

html_result_random_links_filepath='/sda3/www/pier_pub/c2_wiki_links/random_links_c2wiki.html'

c2_wiki_base_url_string='http://www.c2.com/cgi/wiki?'

double_quote_string='"'

paragraph_open_html_string='<p>'
paragraph_closed_html_string='</p>'

hyperlink_closed_html_string='</a>'

# variables
awk_command=""
wiki_page_selected_string=""
wiki_url_string=""

# functions

generate_random_line_numbers() {
  awk_command='
    BEGIN{
      srand();
      for (i = 0; i < draws; i++) {
        print( int(max_num * rand()) + 1 );
      }
    }
  '
  awk -v draws=${random_links_number} -v max_num=${file_lines_num} "${awk_command}"
}
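The awk draw can be tried on its own; a sketch with made-up parameters (without an argument, srand() seeds from the current time, which is the seed caveat mentioned in the notes):

```shell
#!/bin/sh
# draw 5 random line numbers in the range [1, 100]; both numbers are
# illustrative, not the values used by the script above
awk -v draws=5 -v max_num=100 'BEGIN{
  srand();
  for (i = 0; i < draws; i++) {
    print( int(max_num * rand()) + 1 );
  }
}'
```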


# script

# clear the previous file
echo "" > "${html_result_random_links_filepath}"

# fill the file with new random links
for line_num in $( generate_random_line_numbers ) ; do
  # not efficient, but first effective, then efficient
  wiki_page_selected_string=$( awk -v line_number=${line_num} 'NR==line_number' "${wiki_links_filepath}" )
  wiki_url_string="${c2_wiki_base_url_string}${wiki_page_selected_string}"
  echo "${paragraph_open_html_string}<a href=${double_quote_string}${wiki_url_string}${double_quote_string}>${wiki_page_selected_string}${hyperlink_closed_html_string}${paragraph_closed_html_string}" >> "${html_result_random_links_filepath}"
done
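As a side note, when GNU coreutils is available (some busybox builds do not ship it), picking the random lines could be sketched with shuf instead, avoiding one awk process per drawn line; the -r flag samples with repetition, like the awk version:

```shell
#!/bin/sh
# made-up input: 100 "page names", one per line
seq 1 100 > /tmp/wiki_links_example.txt

# draw 5 lines at random; GNU shuf's -r allows repetitions
shuf -r -n 5 /tmp/wiki_links_example.txt
```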

<<documentation
Purpose {
 - given a file with page names of the c2.com wiki (the original wiki), with the page names in camel case (see the other mini project to download the wards wiki), create a random selection of those every day to navigate that wiki in a random way. For me it has a lot of interesting "frozen discussions". If I navigate according to interests I need to rely heavily on bookmarks, and after a while I need an organization that is not easy to achieve on devices like the nook.
}

Tools used {
 - written in vim on openwrt 12.09 on an asus 500 v2. Normally I would have used a windows notepad through winscp, but training vim skills is useful, not only to appreciate it but to use it better in case of need, and maybe to consider it as a new main lightweight plain text / code editor.
 - for a more efficient version I may use awk directly, because with busybox or bash I coordinate optimized tools, but when I mostly use one optimized tool I may use only that, instead of an unneeded wrapper.
}

Notes {
 - I really have to write a function, and then maintain it, to generate arbitrarily long random integers from /dev/urandom. It could be that someone did it already, and just maintaining this type of stuff will take a lot of time (over the years) for a person like me, but having /dev/urandom almost everywhere and not having a function ready to copy and use is annoying. I used a very approximate function in the past, but I need to improve it and I do not want to use it now.
 - for now I will use srand from awk (thus loading the system with a lot of small processes starting and stopping), but I have to be careful about the seed to use.
}

documentation
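Regarding the /dev/urandom note above, a minimal sketch of such a helper (the name rand_int is made up; it assumes an od that supports -t, as in coreutils and many busybox builds, and the modulo step has a small bias when max does not divide 2^32):

```shell
#!/bin/sh
# rand_int MAX -> integer in [1, MAX], drawn from /dev/urandom
rand_int() {
  max=${1}
  # read 4 bytes and print them as one unsigned 32-bit decimal number
  n=$( od -An -N4 -tu4 /dev/urandom | tr -d ' ' )
  echo $(( n % max + 1 ))
}

rand_int 36631
```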