MWParserFromHell

A parser for MediaWiki wikicode

MWParserFromHell Ranking & Summary

  • License: MIT/X Consortium Lic...
  • Price: FREE
  • Publisher Name: Ben Kurtovic
  • Publisher web site: https://github.com/earwig/

MWParserFromHell Description

MWParserFromHell is a Python package that provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode. It supports Python 2 and Python 3.

Developed by Earwig with help from Σ.

Installation

The easiest way to install the parser is through the Python Package Index, so you can install the latest release with pip install mwparserfromhell. Alternatively, get the latest development version:

    git clone git://github.com/earwig/mwparserfromhell.git
    cd mwparserfromhell
    python setup.py install

You can run the comprehensive unit testing suite with python setup.py test.

Usage

Normal usage is rather straightforward (where text is page text):

    >>> import mwparserfromhell
    >>> wikicode = mwparserfromhell.parse(text)

wikicode is a mwparserfromhell.wikicode.Wikicode object, which acts like an ordinary unicode object (or str in Python 3) with some extra methods. For example:

    >>> text = "I has a template! {{foo|bar|baz|eggs=spam}} See it?"
    >>> wikicode = mwparserfromhell.parse(text)
    >>> print wikicode
    I has a template! {{foo|bar|baz|eggs=spam}} See it?
    >>> templates = wikicode.filter_templates()
    >>> print templates
    [u'{{foo|bar|baz|eggs=spam}}']
    >>> template = templates[0]
    >>> print template.name
    foo
    >>> print template.params
    [u'bar', u'baz', u'eggs=spam']
    >>> print template.get(1).value
    bar
    >>> print template.get("eggs").value
    spam

Since every node you reach is also a Wikicode object, it's trivial to get nested templates:

    >>> code = mwparserfromhell.parse("{{foo|this {{includes a|template}}}}")
    >>> print code.filter_templates()
    [u'{{foo|this {{includes a|template}}}}']
    >>> foo = code.filter_templates()[0]
    >>> print foo.get(1).value
    this {{includes a|template}}
    >>> print foo.get(1).value.filter_templates()[0]
    {{includes a|template}}
    >>> print foo.get(1).value.filter_templates()[0].get(1).value
    template

Additionally, you can include nested templates in filter_templates() by passing recursive=True:

    >>> text = "{{foo|{{bar}}={{baz|{{spam}}}}}}"
    >>> mwparserfromhell.parse(text).filter_templates(recursive=True)
    [u'{{foo|{{bar}}={{baz|{{spam}}}}}}', u'{{bar}}', u'{{baz|{{spam}}}}', u'{{spam}}']

Templates can be easily modified to add, remove, or alter params.
Wikicode can also be treated like a list with append(), insert(), remove(), replace(), and more:

    >>> text = "{{cleanup}} '''Foo''' is a [[bar]]. {{uncategorized}}"
    >>> code = mwparserfromhell.parse(text)
    >>> for template in code.filter_templates():
    ...     if template.name == "cleanup" and not template.has_param("date"):
    ...         template.add("date", "July 2012")
    ...
    >>> print code
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{uncategorized}}
    >>> code.replace("{{uncategorized}}", "{{bar-stub}}")
    >>> print code
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> print code.filter_templates()
    [u'{{cleanup|date=July 2012}}', u'{{bar-stub}}']

You can then convert code back into a regular unicode object (for saving the page!) by calling unicode() on it:

    >>> text = unicode(code)
    >>> print text
    {{cleanup|date=July 2012}} '''Foo''' is a [[bar]]. {{bar-stub}}
    >>> text == code
    True

Likewise, use str(code) in Python 3.

Integration

mwparserfromhell is used by and originally developed for EarwigBot; Page objects have a parse method that essentially calls mwparserfromhell.parse() on page.get().

If you're using PyWikipedia, your code might look like this:

    import mwparserfromhell
    import wikipedia as pywikibot

    def parse(title):
        site = pywikibot.get_site()
        page = pywikibot.Page(site, title)
        text = page.get()
        return mwparserfromhell.parse(text)

If you're not using a library, you can parse templates in any page using the following code (via the API):

    import json
    import urllib
    import mwparserfromhell

    API_URL = "http://en.wikipedia.org/w/api.php"

    def parse(title):
        # Ask the API for the content of the page's latest revision.
        data = {"action": "query", "prop": "revisions", "rvlimit": 1,
                "rvprop": "content", "format": "json", "titles": title}
        raw = urllib.urlopen(API_URL, urllib.urlencode(data)).read()
        res = json.loads(raw)
        # "pages" is keyed by page ID; take the single page that was requested.
        text = res["query"]["pages"].values()[0]["revisions"][0]["*"]
        return mwparserfromhell.parse(text)
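The API snippet above relies on urllib.urlopen, which exists only in Python 2. As a rough Python 3 counterpart, the sketch below splits the work into building the request body, extracting the wikitext from the decoded response, and doing the network call. The function names (build_query, extract_wikitext, fetch_wikitext) and the User-Agent string are illustrative assumptions, not part of mwparserfromhell. The sketch also assumes the HTTPS endpoint and the legacy JSON layout, in which revision text sits under the "*" key.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the HTTPS form of the endpoint, so the POST is not redirected.
API_URL = "https://en.wikipedia.org/w/api.php"
# Hypothetical identifier; replace with one that describes your own tool.
USER_AGENT = "mwparserfromhell-example/0.1"


def build_query(title):
    """Return the URL-encoded POST body asking for the latest revision text."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvlimit": 1,
        "rvprop": "content",
        "format": "json",
        "titles": title,
    }
    return urllib.parse.urlencode(params).encode("utf-8")


def extract_wikitext(response):
    """Pull the wikitext out of a decoded API response (legacy JSON layout)."""
    pages = response["query"]["pages"]
    # "pages" is keyed by page ID; one title was requested, so take the first entry.
    page = next(iter(pages.values()))
    return page["revisions"][0]["*"]


def fetch_wikitext(title):
    """Fetch the current wikitext of `title` (requires network access)."""
    request = urllib.request.Request(
        API_URL,
        data=build_query(title),
        headers={"User-Agent": USER_AGENT},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_wikitext(json.loads(reply.read().decode("utf-8")))
```

The string returned by fetch_wikitext(title) can then be passed to mwparserfromhell.parse(). A title that does not exist comes back without a "revisions" entry, so extract_wikitext would raise KeyError in that case.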

