Using Semantic MediaWiki for creating standards for professional activity

I think that 97 percent of users underestimate the power of Semantic MediaWiki. It’s not just a tool that helps you manage your encyclopedia. In fact, using wikis only for encyclopedias or documentation is just one dumb, straightforward use case.

At SMWCon I described how we’ve used SMW to let professional communities create standards. The result of their effort is one document: the standard of the professional activity. It describes the stuff you have to know, the actions you have to perform on the job and the qualities you need in order to be a professional.

In order to produce a good document we have to structure the process of creation. In other words, we have to follow a specific methodology that ensures that all skills, knowledge and qualities are properly described. The methodology states how to structure the building blocks of the standard, their relations and their dependencies. Of course we also need to develop the other part of the methodology: the social part. It describes the process of creation itself. Just imagine: to create a standard that describes a manager you need various levels of managers, from line managers to the CEO. These guys will disagree with each other, and you need to somehow create a document that reflects their consensus.
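To make the structural part more concrete, here is a minimal Python sketch of how building blocks and their dependencies might be modelled and checked. Everything in it is hypothetical: the `Block` class, the `missing_dependencies` helper and the example items are illustrations only, not the actual methodology and not anything SMW provides.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical block kinds, following the description above:
# things to know, actions to perform, qualities to have.
KINDS = {"knowledge", "action", "quality"}


@dataclass
class Block:
    name: str
    kind: str
    depends_on: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject blocks that do not fit one of the assumed kinds
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind}")


def missing_dependencies(blocks):
    """Return (block, dependency) pairs whose dependency is not described."""
    known = {b.name for b in blocks}
    return [(b.name, dep) for b in blocks for dep in b.depends_on if dep not in known]


# Made-up example content for a manager standard
standard = [
    Block("Budgeting basics", "knowledge"),
    Block("Plan a quarterly budget", "action", ["Budgeting basics", "Negotiation"]),
    Block("Responsibility", "quality"),
]

print(missing_dependencies(standard))
```

A check like this is the kind of thing a methodology can enforce: the action refers to "Negotiation", which nobody has described yet, so the standard is incomplete.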

That’s what Semantic MediaWiki is good for: SUPPORTING COMPLICATED METHODOLOGIES. That’s what my talk is about:

There was a great guy at the conference, Alexander Gesinn. He is among the 3% of people who understand what SMW can be used for, and he’s building a business in business process management. What amazed me is how he managed to package it: his semantic::apps is really a tool for BPM, it’s no bloody general-purpose wiki. It’s a product!!!

Results of SMWCon and possible future of Semantic MediaWiki

SMWCon Fall logo. SMW logo blended with Berlin TV Tower

The SMWCon conference has just ended and I’ve successfully recovered from it. It was the biggest conference ever: we had almost 90 participants. It was also a very interesting one: we had business talks and scientific talks, talks about open government and talks about enterprise wikis, and a lot more.

Semantic MediaWiki remains one of the rare projects that are related to the Semantic Web and at the same time don’t rely solely on grants to stay alive. In fact, most of the core developers of the platform have nothing to do with linked data and the Semantic Web, and I think that’s good.

I was the Program Chair; in other words, I was responsible for the content of the conference. Unfortunately I couldn’t reach anyone from the DataRangers team: these dudes would have been most welcome guests at the conference because they really can talk about how to turn a semantic wiki into a focused business solution. Still, Alexander was amazing too: you would never guess that his semantic::apps solution is SMW inside.

I think several things are quite important for the future of Semantic MediaWiki, and I’m going to concentrate on them:

  • proper positioning. What is SMW good for? It’s not clear from the website.
  • outreach. MORE PEOPLE. We need a community as big as Joomla/Drupal/WordPress
  • funding of the development. It’s an old observation that if you put some money into a project, it may become more successful

About the positioning, I’d say that we have to distance ourselves from the Semantic Web. It was good to have big grants back in 2006, but now the disappointment in the Semantic Web is growing, despite the fact that many SW technologies have become part of the everyday Web. In my opinion we have to focus on the following audiences:

  • open government data people. There are some good use cases for that, most notably NYCPedia
  • enterprise wikis. Most of them are behind firewalls and it’s sometimes not easy to contact the people there. But they really have proper requirements, and funding too.
  • Documentation projects. Webplatform is one; recently Parson Communications started to use SMW for technical writing.
  • research projects outside the Semantic Web: bioinformatics, neuroscience, engineering. For example, I’ve just stumbled upon the Texas Instruments wiki. The BlueBrain project also wants to use us.
  • consultancies that use SMW for supporting some particular methodology. Examples are WikiVote with standards and roadmaps, with Business Process Modeling

The funding part is especially interesting. So far I see the following ways to bring money and/or workforce into Semantic MediaWiki:

  • If you’re a professor you can have your bachelor’s students do projects that benefit Semantic MediaWiki. An ideal project here would be to measure the efficiency, latency and speed of different storage backends.
  • If you’re using SMW for your research and have a grant you can use the grant money to sponsor features you need.
  • If you’re a developer you can try applying for a Wikimedia Individual Engagement Grant or be a mentor in Google Summer of Code. This is how Stephan and I pushed Semantic Glossary a little bit forward.
  • If you can’t spend money but have qualified people who are ready to make improvements in SMW, please commit those improvements back to core! The whole Linux kernel works this way: there are a lot of people from Intel, NVIDIA and IBM who write open source code during their working hours.
  • If you’re a person or a company and you need a small feature (e.g. a new result format), hire one of the developers: he will write it for you and make it open source, and so will be able to support it
  • If you’re a company and you need a BIG feature (e.g. speeding up, support for a new store, new parameters for #ask, etc.) you can try to ask how many more companies need that. If it’s a common demand we can create a fund where every company puts in some money.

During the conference I proposed this last model of groupfunding and I’m very eager to try it in action. I haven’t heard of this model being used in any other project, but I think that it might just work in our case. Here is the poster I presented at the conference about that:


What I’m thinking about now is the pilot project that will be supported with this groupfunding. It should be something medium-sized and long awaited, something that will interest many companies. Some candidates for the pilot feature I’ve come up with so far:

  • measuring performance. That is, load testing. Many parties want to know how the number of properties and subobjects affects latency/response time. How does it work for an RDF store? Is it quicker? How much quicker? How about memory consumption? With and without cache?
  • Developer documentation. This is tricky: every time somebody asks Jeroen about the proper way to do anything, he answers that the code is currently a big mess and that SMW will have a cool new developer API soon. But anyway, having a description of something would already be good.
  • fully-fledged SPARQL support. That is, making inline SPARQL queries work with all kinds of result formats
  • Stuff from our questionnaires, for example:
    • support displaying linked properties like “?a.b”,
    • greater support for ORs and ANDs,
    • more up-to-date display of data,
    • free-text search features in queries
    • brackets/braces for complex queries
    • things for forms and page schemas, for example visual editing of forms, WYSIWYG support in textareas, automatic escaping for form field content
    • Access fucking control. I know, it depends on many other factors. :(
  • Support for a new storage that will boost the performance. Maybe MongoDB?
  • Semantification
  • Custom datatypes
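
For the performance-measuring candidate, a load test could start from something as small as the Python sketch below. It is only an illustration under stated assumptions: `measure_latency` and `fake_query` are hypothetical names, and `fake_query` is a stand-in workload, where a real test would send an actual query to the wiki under test (with and without cache, against different stores).

```python
import statistics
import time


def measure_latency(run_query, repeats=20):
    """Call run_query() `repeats` times and return timing stats in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append(time.perf_counter() - start)
    samples.sort()
    # Nearest-rank 95th percentile, computed with integer arithmetic
    p95_index = (95 * len(samples) + 99) // 100 - 1
    return {
        "median": statistics.median(samples),
        "p95": samples[p95_index],
        "max": samples[-1],
    }


def fake_query():
    # Placeholder workload (assumption); replace with a real request to the wiki
    sum(range(10_000))


stats = measure_latency(fake_query, repeats=5)
print(stats)
```

Running the same function against the same query while varying one thing at a time (number of properties, store, cache on or off) would give comparable numbers for the questions listed above.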

Any other ideas?

Page Schemas screencast

Page Schemas is a great extension for Semantic MediaWiki that allows us to generate the structure of our websites from an XML description called a schema. I fell in love with this extension long before it was implemented (back then Yaron wanted to call it Semantic Schemas).

So, Page Schemas works like this:

  1. you go to a category page;
  2. you define all the properties, templates and forms that will be used for the pages of this category – this is called the schema;
  3. you push the Generate button and enjoy the result;
  4. if you need to make improvements in the structure of your schema (like adding some fields, changing the allowed values for properties or modifying the template), you edit the previously defined schema and re-generate all the pages.

Page Schemas looks like the ‘Create a class’ special page, but:
  • It’s more powerful and beautiful: you can define multiple templates for forms (remember the ‘Add another’ button?) and define the parameters of the fields (like ‘values from category’)
  • It allows you to add and remove fields and templates without all that tedious mucking about in the MediaWiki editor
  • It has a very nice pluggable architecture, and you can connect your own extension to it


Fighting spam in MediaWiki

Cleaning up spam over and over has resulted in a little tutorial and a presentation I want to show.

OK, so here is some advice that alone can improve your spam situation.

  1. Healthy community. If your wiki has enough users who care about the content, they will clean up the spam manually.
  2. Heuristics and little tricks. For example, you can set a waiting period for all newly registered users. That helps because the majority of bots register and immediately write something.
  3. QuestyCaptcha instead of any other captcha.
  4. If nothing else helps, use the behavior analysis tool called AbuseFilter.
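
The waiting-period trick from item 2 boils down to a very small check, sketched here in Python purely to show the logic. The function name `may_edit` and the 24-hour value are assumptions made for the illustration; MediaWiki has its own `$wgAutoConfirmAge` setting for account age, and how to wire that to edit permissions on your wiki is not shown here.

```python
from datetime import datetime, timedelta

# Assumed waiting period; pick whatever suits your wiki
WAITING_PERIOD = timedelta(hours=24)


def may_edit(registered_at, now, waiting_period=WAITING_PERIOD):
    """Allow edits only once the account is at least `waiting_period` old."""
    return now - registered_at >= waiting_period


registered = datetime(2020, 1, 1, 12, 0)

# A bot that registers and writes five minutes later is blocked
print(may_edit(registered, datetime(2020, 1, 1, 12, 5)))   # False

# A user who comes back two days later can edit
print(may_edit(registered, datetime(2020, 1, 3, 12, 0)))   # True
```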