User talk:Yurik/Interwiki Bot FAQ

From Wikipedia, the free encyclopedia

Introduction

ATTENTION: Most of the messages I get take this form:

Your bot keeps adding interwiki xx:xxxxx to the article xx:xxxxx. I have removed it 10 times, but it still does it. They are not the same articles. Please make it stop!!!.

Before you remove it again, or leave an angry message on my talk page, please understand why it happens.

The bot knows nothing about the subject matter and does not check whether two articles cover the same topic. If the bot placed a link, that link already exists somewhere else and was simply copied. Removing it from one page will not fix the problem: somewhere a human mistakenly linked two unrelated articles, and the bot propagated that mistake to other sites (see more details below). To fix it, you must manually remove all the bad links. If even one remains, the link will come back. I am still working on a web-based tool to make the removal easier, but it is not ready yet.


I would like the bot to run on language XX...

To let the bot run on a new language, you must first post a request at Requests for bot status. Once the flag is granted, I will add the language to the list.

How to change many interwiki links at once

See the Interwiki Conflict Resolver tool. Eventually it will work in real time, but for now use it to tell me what needs to be done.

How does the bot operate?

The bot is given a single site (in this example, ru). It takes one page and looks at all of its interwiki links to other sites. It then collects the interwiki links from each of those pages. The process repeats until no site yields any new links. If there is no more than one page per site, the bot places links to all the pages found on every page involved. As a result, all pages become interlinked.
Example: ru:Wikipedia has links to en and fr, fr has links to zh, fr, and da, etc. As a result, the list will include pages from ru, en, fr, zh, da, and any other sites found. As long as each site has only one page, the bot will place links to all found pages on each one of them.
Conflicts: If the bot finds more than one page on any of the sites, it stops and asks the operator for help. The operator has to analyze each page and choose the one that most accurately reflects the original topic. Once all conflicts are resolved, all pages are updated with the new information.
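
The discovery and conflict steps described above can be sketched roughly as follows. This is only an illustration, not the bot's actual code: the LINKS table is a made-up in-memory stand-in for the real wikis, and the function names discover, find_conflicts and interlink are assumptions introduced for this example.

```python
from collections import deque

# Hypothetical stand-in for the wikis: (site, title) -> interwiki links on that page.
LINKS = {
    ("ru", "Wikipedia"): [("en", "Wikipedia"), ("fr", "Wikipedia")],
    ("en", "Wikipedia"): [("ru", "Wikipedia")],
    ("fr", "Wikipedia"): [("zh", "Wikipedia"), ("da", "Wikipedia")],
    ("zh", "Wikipedia"): [],
    ("da", "Wikipedia"): [("en", "Wikipedia")],
}

def discover(start, links):
    """Follow interwiki links from `start` until no new pages turn up."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def find_conflicts(pages):
    """Return {site: sorted titles} for every site with more than one page."""
    by_site = {}
    for site, title in pages:
        by_site.setdefault(site, set()).add(title)
    return {s: sorted(t) for s, t in by_site.items() if len(t) > 1}

def interlink(start, links):
    """Return the new link table, or raise if a human operator must decide."""
    pages = discover(start, links)
    conflicts = find_conflicts(pages)
    if conflicts:
        raise ValueError(f"operator needed: {conflicts}")
    # Every page gets a link to every other page that was found.
    return {p: sorted(q for q in pages if q != p) for p in pages}

new_links = interlink(("ru", "Wikipedia"), LINKS)
```

With the sample table, all five pages end up linking to the other four. Adding a second en title anywhere in the table makes interlink raise instead of writing anything, which mirrors the "stop and ask the operator" behaviour.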

Is there a dictionary bot to find new links?

No. The bot operates only on the links found on the given page, and uses them to discover more links.

What about dates, years, etc?

The bot knows about the different year and date formats used on different sites. Enter more formats here: User:Yurik/Formats. For example, February 25 on en is recognized as the 25th day of February and is matched with the corresponding day on all other known sites that have it. No existing interwiki links are needed. At present the bot recognizes years AD/BC, decades AD/BC, centuries AD/BC, millennia AD/BC, and days of the month. It correctly handles Arabic and Roman numerals, and knows which sites have decided that the year 2000 is in the 21st century.
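
To make the century and numeral handling concrete, here is a small sketch. It is not taken from the bot; the function names and the year_2000_in_21st flag are assumptions made up for this example, and the real per-site formats are the ones listed at User:Yurik/Formats.

```python
def century_of(year, year_2000_in_21st=False):
    """Return the AD century number for a positive year.

    year_2000_in_21st is a hypothetical per-site switch: some sites count
    2000-2099 as the 21st century, others count 1901-2000 as the 20th.
    """
    if year_2000_in_21st:
        return year // 100 + 1
    return (year - 1) // 100 + 1

ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
         (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
         (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]

def to_roman(n):
    """Convert a positive integer to a Roman numeral string."""
    out = []
    for value, symbol in ROMAN:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)

# The same year lands in different century pages depending on the site's rule.
strict = to_roman(century_of(2000))
loose = to_roman(century_of(2000, year_2000_in_21st=True))
```

Here strict is "XX" and loose is "XXI", so a page for the year 2000 would be matched with a different century page depending on the convention the site follows.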

The bot keeps adding back an incorrect link to site xx, what should I do?

  • This tool was designed to help users sort out these kinds of problems, but it is not yet complete. Use it to tell me how links should be resolved.
This happens because one or more of the sites found during discovery also point to site xx (see #How does the bot operate?).
Any of the following solutions will solve the problem:
  • Find or create the correct page on site xx, and replace the existing link on just one of the other sites' pages with a link to it. The bot will then see two links to site xx and will ask the operator what to do.
or
  • Edit the page on xx to link to the proper existing page on the other sites, which also causes a conflict.
or
  • Go through all the pages and remove the offending link everywhere (but remember: if you miss just one, it will come back).

Example: en, ru, ja, and ko are all interconnected, but ko describes a different topic from the other three. Removing the link on ru alone will not help, as all the other sites still point to it. To fix this, create or find a page on ko that matches the topic and edit just one site, such as en, to point to the new ko page. Alternatively, find the topic of the ko page on en, ru, or ja and change the ko page to point to it.
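
The en, ru, ja, ko example can be replayed in a few lines to show why a partial removal does not stick. This is a toy model under simplifying assumptions: each site is reduced to a single page, and reachable and fully_linked are names invented for this sketch.

```python
def reachable(start, links):
    """Return every site the bot would discover starting from `start`."""
    seen, stack = {start}, [start]
    while stack:
        for target in links[stack.pop()]:
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def fully_linked(sites):
    """Build a link table in which every site links to every other site."""
    return {s: {t for t in sites if t != s} for s in sites}

links = fully_linked(["en", "ru", "ja", "ko"])

# Remove the bad ko link from ru only: en and ja still point to ko.
links["ru"].discard("ko")
partial = reachable("ru", links)

# Remove it everywhere, including the links going out of the ko page.
for site in ("en", "ja"):
    links[site].discard("ko")
links["ko"].clear()
cleaned = reachable("ru", links)
```

After the partial removal, ko is still in the set discovered from ru, so the bot would add it back. Only after every link is gone does discovery from ru stop at en, ru, and ja.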

The bot deleted a link, but I know it's there!

Links are case-sensitive. Please make sure the link has the same case as the article title.

Why is bot replacing non-Latin characters with question marks or blanks?

It's not. Your computer does not have an appropriate font installed, so Chinese or Japanese characters, for example, appear as question marks. The links still work and will take you to the proper page (which you probably won't be able to read, as most of its characters will also be question marks). The bot does this to get rid of the unreadable HTML Unicode notation (for example, 國 used to be written as &#22283;). The ease of use should be self-evident.
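
As an illustration of the conversion, Python's standard html.unescape turns numeric character references back into the characters they stand for. The helper name decode_entities is made up for this sketch, and nothing here claims to be how the bot itself does it.

```python
import html

def decode_entities(title):
    """Replace HTML character references in a link title with real characters."""
    return html.unescape(title)

# Decimal and hexadecimal references for the same character (U+570B).
from_decimal = decode_entities("&#22283;")
from_hex = decode_entities("&#x570B;")
mixed = decode_entities("zh:&#22283;")
```

Both references decode to the same single character, and plain text around them is left untouched.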

Bot is adding empty links to other sites

See #Why is bot replacing non-Latin characters with question marks or blanks? above.

Why should the bot change all sites at once?

To find all linked pages, the bot needs to check all linked sites (call their number N). The bot used to change just one page afterwards. Other sites ran their own bots, each of which also checked N sites and changed one page. The total server load was N × N reads + N writes. Changing all sites at once reduces the total server load to N reads + N writes, a very significant improvement.
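
The arithmetic is easy to check for a concrete N. The two functions below simply restate the counts from the paragraph above; their names are invented for this sketch, and N = 20 is an arbitrary example value.

```python
def load_one_page_per_bot(n):
    """n per-site bots, each reading n pages and writing one."""
    return {"reads": n * n, "writes": n}

def load_all_at_once(n):
    """One bot reading n pages once and writing all n of them."""
    return {"reads": n, "writes": n}

n = 20
before = load_one_page_per_bot(n)
after = load_all_at_once(n)
saved_reads = before["reads"] - after["reads"]
```

For 20 linked sites this is 400 reads against 20, with the same 20 writes in both cases.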
Another reason is that when sites are kept in sync, if some site renames page A to AA, that change is immediately seen everywhere. If someone later decides that A should be a topic of its own, there will be no conflict, as no site points to A any more, only to AA. This is a fairly common scenario that I have had to resolve.

Disambiguation handling

When running in autonomous mode, the bot checks whether the page is a disambiguation page and makes sure that all the other pages it links to have the same status. This means that when page A has a disambiguation template, all linked pages must also have one; otherwise they are ignored. The reverse is also true: a link from a regular page to a disambiguation page is also ignored.
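
A minimal sketch of that rule, assuming the disambiguation status of each page is already known. The is_disambig table, the page names, and the function name are all hypothetical and exist only for this example.

```python
def filter_by_disambig_status(start, linked, is_disambig):
    """Keep only linked pages whose disambiguation status matches the start page."""
    wanted = is_disambig[start]
    return [page for page in linked if is_disambig[page] == wanted]

# Hypothetical status table: True means the page carries a disambiguation template.
is_disambig = {
    "en:Mercury": True,
    "ru:Mercury": True,
    "fr:Mercury (planet)": False,
    "de:Mercury (planet)": False,
}

linked = ["ru:Mercury", "fr:Mercury (planet)"]
from_disambig = filter_by_disambig_status("en:Mercury", linked, is_disambig)
from_regular = filter_by_disambig_status("de:Mercury (planet)", linked, is_disambig)
```

Starting from a disambiguation page keeps only the other disambiguation page, and starting from a regular page keeps only the regular one.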

The bot is hiding vandalism!

Please be aware that there is an option to hide bot edits from your watchlist and from recent changes. Alternatively, choose 'expand view' for the watchlist and RC in your preferences. That way you will be able to see all human edits, even if a bot made an edit afterwards.

The bot replaces one link with another

Sometimes the bot will modify a link to a site by replacing it with another link to that same site. This may happen for one of two reasons:

  1. The target is a redirect, in which case the bot links to the actual page rather than going through the redirect. Redirects are created automatically when a page is given a new name.
  2. The target is a disambiguation page, yet a linked page in another language links to a non-disambiguation page. A regular page is always chosen over a disambiguation page.
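
The two rules above could be combined along these lines. This is a guess at the shape of the logic, not the bot's code: the redirects mapping, the disambigs set, and both function names are assumptions for illustration.

```python
def resolve_redirect(title, redirects):
    """Follow redirects to the final page, stopping if a loop is detected."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title

def choose_target(candidates, redirects, disambigs):
    """Pick one link for a site: resolve redirects, prefer regular pages."""
    resolved = [resolve_redirect(c, redirects) for c in candidates]
    regular = [t for t in resolved if t not in disambigs]
    return (regular or resolved)[0]

# Rule 1: page A was renamed to AA, leaving a redirect behind.
renamed = choose_target(["A"], {"A": "AA"}, set())

# Rule 2: one site links to a disambiguation page, another to a regular page.
preferred = choose_target(["Mercury", "Mercury (planet)"], {}, {"Mercury"})
```

In the first case the link to A is replaced by a link to AA, and in the second the regular page wins over the disambiguation page.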