My last project at WikiVote was to create a knowledge base for the main Russian news agency, ITAR-TASS. In the next posts I’m going to describe the project and analyse its results. This post is an introduction to the task we had.
For me it all started when I received a message from TASS saying how awesome the Semantic Web and semantic wikis are, and how much TASS needs these technologies. The head of computing in their monitoring department had read my articles on Habrahabr and got very excited about the dynamic way of creating content and data that semantic wikis provide.
I wondered: why would a news agency need a wiki? How would they use it? Some time later I realized that the answer is pretty straightforward: they would use it as a homemade Wikipedia, that is, they would store facts about people, events, places etc.
You may ask, “why not use the actual Wikipedia for that?” Well, they can’t, for the following reasons:
Inaccuracy. The main news agency that represents Russia can’t rely on a crowdsourced encyclopedia. They’re afraid that the facts in Wikipedia will be inaccurate.
Position. Of course both Wikipedia and the news agencies try to remain as neutral as possible. Sometimes, however, it’s hard to describe events and things in a neutral way, so TASS can’t rely on the crowdsourced position of Wikipedia.
Proper emphasis. For example, how are we going to describe Arnold Schwarzenegger: as a politician, a bodybuilder or an actor?
Besides, I did my best to explain that non-semantic wikis suck when the number of facts is big enough and there are not a lot of editors working on the content.
So, the project began. Our system was meant to replace this knowledge base:
Yeah, there is a whole floor in TASS full of these card indexes. The smell of paper and dust took me back to my childhood, when I spent days in the public library where my mother worked. The API of this knowledge base is the telephone and the old lady who knows how the indexes are organized.
I think that 97 percent of users underestimate the power of Semantic MediaWiki. It’s not just a tool that helps you manage your encyclopedia. In fact, using wikis only for encyclopedias or documentation is just one dumb, straightforward use case.
At SMWCon I described how we used SMW to let professional communities create standards. The result of their effort is one document: the standard of the professional activity. It describes the stuff you have to know, the actions you have to perform on the job and the qualities you have to have in order to be a professional.
In order to produce a good document we have to structure the process of its creation. In other words, we have to follow a specific methodology that ensures that all skills, knowledge and qualities are properly described. The methodology states how to structure the building blocks of the standard, their relations and their dependencies. Of course we also need to develop the other part of the methodology: the social part. It describes the process of creation itself. Just imagine: to create a standard that describes a manager you need managers of various levels, from line managers to CEOs. These guys will disagree with each other, and you need to somehow create a document that reflects their consensus.
That’s what Semantic MediaWiki is good for: SUPPORTING COMPLICATED METHODOLOGIES. That’s what my talk is about:
There was a great guy at the conference, Alexander Gesinn. He is among the 3% of people who understand what SMW can be used for. And he has built a business in business process management. What amazed me is how he managed to package it: his semantic::apps is really a tool for BPM, not a bloody general-purpose wiki. It’s a product!!!
Timeline is one of the most impressive result formats in Semantic MediaWiki. But the SIMILE timeline looks very old-fashioned and its code is a great mess. Moreover, many of our clients find it too complicated and I totally agree with them:
It’s open source
It’s clean and simple and it really feels interactive to me. It seems to invite you to click and zoom
It allows you to insert complex HTML into event boxes
It’s damn powerful: they have clustering, a mobile version, jQuery themes support, skin support, grouping and editing (I’m thrilled to support that in SMW)
It has beautiful documentation for programmers. Some of the best documentation I’ve seen since I programmed with Qt.
It uses the Google API, and thus it’s illegal to use it without an internet connection (though technically possible)
Go to the extension page, download this nice result format and try it in your projects! If you need some specific features added, just ask me. As always, if I have free time to implement them, I’ll gladly do so. Otherwise, funding is also possible.
The SMWCon conference has just ended and I’ve successfully recovered from it. It was the biggest conference ever: we had almost 90 participants. It was also a very interesting one: we had business talks and scientific talks, talks about open government and talks about enterprise wikis and a lot more.
Semantic MediaWiki remains one of the rare projects that are related to the Semantic Web and at the same time don’t rely solely on grants to stay alive. In fact, most of the core developers of the platform have nothing to do with linked data and the Semantic Web, and I think that’s good.
I was the Program Chair, in other words I was responsible for the content of the conference. Unfortunately I couldn’t reach anyone from the DataRangers team: these dudes would have been most welcome guests at the conference because they really can talk about how to turn a semantic wiki into a focused business solution. Still, Alexander from Gesinn.it was amazing too: you would never guess that his semantic::apps solution has SMW inside.
I think that several things are quite important for the future of Semantic MediaWiki, and I’m going to concentrate on them:
proper positioning. What is SMW good for? It’s not clear from the website.
outreach. MORE PEOPLE. We need a community as big as Joomla/Drupal/WordPress
funding of the development. It’s an old observation that if you put some money into a project, it may become more successful
As for the positioning, I’d say that we have to distance ourselves from the Semantic Web. It was good to have big grants back in 2006, but now the disappointment in the Semantic Web is growing, despite the fact that many SW technologies have become part of the everyday Web. In my opinion we have to focus on the following audiences:
open government data people. There are some good use cases of that, most notably NYCPedia
enterprise wikis. Most of them are behind firewalls and it’s sometimes not easy to contact the people there. But they really have proper requirements, and funding too.
Documentation projects. Webplatform is one; recently Parson Communications started to use SMW for technical writing.
research projects outside the Semantic Web, in bioinformatics, neuroscience and engineering. For example, I’ve just stumbled upon the Texas Instruments wiki. The BlueBrain project also wants to use us.
consultancies that use SMW for supporting some particular methodology. Examples are WikiVote with standards and roadmaps, and Gesinn.it with Business Process Modeling
The funding part is especially interesting. So far I see the following ways to bring money and/or workforce into Semantic MediaWiki:
If you’re a professor you can have your bachelor students do projects that benefit Semantic MediaWiki. An ideal project here would be to measure the efficiency, latency and speed of different storages.
If you’re using SMW for your research and have a grant, you can use the grant money to sponsor the features you need.
If you can’t spend money but have qualified people who are ready to make improvements in SMW, please commit them back to core! The whole Linux kernel works this way: there are a lot of people from Intel, NVIDIA and IBM who write open source code in their working hours.
If you’re a person or a company and you need a small feature (e.g. a new result format), hire one of the developers: he will write it for you and make it open source, thus providing support
If you’re a company and you need a BIG feature (e.g. speeding up, support for a new store, new parameters for #ask, etc.) you can try to ask how many more companies need it. If it’s a common demand we can create a fund into which every company puts some money.
During the conference I proposed this last model of groupfunding and I’m very eager to try it in action. I haven’t heard of this model in any other project, but I think it could work in our case. Here is the poster about it that I presented at the conference:
What I’m thinking about now is a pilot project that will be supported with this groupfunding. It should be something medium-sized and long awaited, something that will interest many companies. Some candidates for the pilot feature I’ve come up with so far:
measuring performance. That is, load testing. Many parties want to know how the number of properties and subobjects affects latency/response time. How does it work for an RDF store? Is it quicker? How much quicker? How about the memory consumption? With and without cache?
Developer documentation. This is tricky: every time somebody asks Jeroen about the proper way to do anything, he answers that currently the code is a big mess and SMW will have a new cool developer API soon. But anyway, having a description of something would already be good.
fully-fledged SPARQL support. That is, making inline SPARQL queries work with all kinds of result formats
Stuff from our questionnaires, for example:
support displaying linked properties like “?a.b”,
greater support for ORs and ANDs,
more up-to-date display of data,
free-text search features in queries
brackets/braces for complex queries
things for forms and page schemas, for example visual editing of forms, WYSIWYG support in textareas, automatic escaping for form field content
Access fucking control. I know, it depends on many other factors. 🙁
Support for a new storage that will boost the performance. Maybe MongoDB?
SEMAT stands for Software Engineering Method and Theory. It has been designed by Ivar Jacobson, the co-author of UML, Use Case methodology and Rational Unified Process. The project is pretty big and has many parts and I’m sure it will influence the software engineering world a great deal.
Here are the parts I can understand:
(academic part) there is an effort in SEMAT to create a solid grounding for software engineering. For example, there are dozens of processes and methodologies for software engineering: what do they have in common?
(practical part) SEMAT provides a framework to quickly analyse a software project from the product point of view. What is the current state of the requirements: are they formalized, or maybe already satisfied? Do we know all the stakeholders of the product? What is the current state of the engineering team? And so on.
Of course I like the practical part more. In fact I’ve totally fallen in love with the thing and want it to be known to the whole world. Because of that, when I met Ivar Jacobson at the presentation in Moscow I immediately asked him how they wanted to advertise SEMAT.
It turned out that they wanted to do the following:
present SEMAT at academic events and publish papers to reach the academic world. That way parts of SEMAT will have a chance to be taught at more universities.
contact the big enterprises and invite them to try SEMAT in practice. These case studies will show how efficient the methodology is and bring more practicing professionals to the SEMAT camp
publish in good practical journals like the ACM and IEEE journals to reach some part of the engineering community
… and that’s it. “But hey, listen, what about reaching the simple folks? They don’t work at IBM and don’t read IEEE journals. They don’t know about CMMI or RUP and really think that it’s Alan Cooper who has brought the most powerful ideas into software design. They read Joel on Software and Wired and Hacker News and Reddit. How do you plan to reach them?” Well, there wasn’t much of a plan, mostly because of the generation gap. And here it seems I can help.
I’ve interviewed the most active members of the SEMAT community and together we’ve created this list of goals. So with the SEMAT community we want to:
become more transparent
become more open to new ideas
convert the potential of supporters into an actual force
reach more people and advertise ourselves in academia, enterprise AND SIMPLE FOLKS
reach more young people
organize those people in a self-supporting community that can do things useful for SEMAT
To reach some of these goals I decided to start with actions that require almost no effort but will produce a big effect:
create a medium where people can ask questions and communicate. After some trials I’ve chosen a Nabble mailing list for that
another option is the LinkedIn group, which existed before but was private
start working on the guidelines for transparency and openness for the community members
gather all the existing materials such as presentations, papers, articles, book chapters
Moving on! This post is only a start; I’ll try to describe more of my adventures as a community manager. Let’s see if I can improve the SEMAT community by working on it for 1-2 hours a day!
Good news for me! My protege Zhenya has been accepted for Google Summer Of Code 2013! Stephan Gambke and I are the Wikimedia Foundation mentors for the project, which aims at improving two MediaWiki extensions: Lingo and Semantic Glossary.
A fairly simple idea of gathering statistics from all the MediaWiki installations they can find resulted in a surprisingly useful website with huge potential. Wikiapiary knows which versions of software your wiki uses: the skin, the extensions, the version of MediaWiki itself.
some of the extensions are not as popular as we thought them to be. For example, Semantic MediaWiki is installed in only 10 percent of the wikis
Many wiki admins don’t upgrade their software very often: a pretty interesting fact in the world of WMF, where nobody gives a damn about being compatible with as many versions of MediaWiki as possible.
Why do I claim that Wikiapiary is a super useful website?
For a start, it provides a list of extensions and skins ranked by popularity, which has never been done before. It’s important for me as a wiki admin
Secondly, it creates a connection between the developers of the extensions and their users. As a programmer I can monitor how popular my extension is, ask my users to upgrade their versions in case of security risks, and maybe offer them custom services
Thirdly, it creates a repository of MediaWiki websites, based on which it’s possible to organize contests, write case studies, do research and share experience about proper wiki gardening.
I think that Jamie Thingelstad, the author of Wikiapiary, is doing a great job and I’ll try to help him in any way I can.
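If you want to look at this kind of data for a single wiki, here is a minimal Python sketch. It doesn’t show how Wikiapiary itself collects its statistics (I’m not describing that here); it only assumes the standard MediaWiki API siteinfo query. The function names and the example.org URL are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def siteinfo_url(api_url):
    """Build a siteinfo query URL for a wiki's api.php endpoint."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "general|extensions|skins",
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def summarize_siteinfo(data):
    """Pick the MediaWiki version, extensions and skins out of a siteinfo reply."""
    query = data.get("query", {})
    # "generator" holds a string such as "MediaWiki 1.21.0".
    generator = query.get("general", {}).get("generator", "unknown")
    # Some extensions don't report a version, so fall back to "unknown".
    extensions = {
        ext["name"]: ext.get("version", "unknown")
        for ext in query.get("extensions", [])
        if "name" in ext
    }
    skins = [skin["code"] for skin in query.get("skins", []) if "code" in skin]
    return {"mediawiki": generator, "extensions": extensions, "skins": skins}


def fetch_summary(api_url):
    """Download and summarize siteinfo (needs network access)."""
    with urlopen(siteinfo_url(api_url)) as response:
        return summarize_siteinfo(json.load(response))


if __name__ == "__main__":
    # Hypothetical endpoint; replace it with your own wiki's api.php.
    print(fetch_summary("https://example.org/w/api.php"))
```

Running it against your own api.php prints a small dictionary with the MediaWiki version, the installed extensions and the skins.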
I spent about two days searching for a way to export all the uploaded files from MediaWiki, and now I know how to do it!
Here is the problem: the files in MediaWiki have their pages in the File: namespace. All the files are stored in the images directory. The problem is that if you simply copy the file directory from one MediaWiki installation to another, you will not see those files in the latter. A bigger problem is the error messages you’ll get when you try to insert a file that is present in the directory but is not registered in the MediaWiki database.
Here is the right way to dump MediaWiki uploads:
1. In the SOURCE wiki, build the list of files you want to dump. It’s usually done with the Special:AllPages page, where you pick the File namespace. Copy the list and use some Vim magic to deal with the whitespace and tabs, turning them into line breaks:
2. Suppose that the list of uploads is in the file ~/filelist on your server
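If Vim is not your thing, the cleanup from step 1 can also be sketched in Python. This is only an illustration under my own assumptions: the function names are made up, and it assumes that the copied titles contain no spaces themselves (for example because they use underscores), so that every run of spaces, tabs or line breaks separates two titles.

```python
import re
from pathlib import Path


def normalize_file_list(raw):
    """Turn a whitespace/tab separated blob of titles into one title per entry."""
    # Split on any run of spaces, tabs or line breaks and drop empty pieces.
    titles = [title for title in re.split(r"\s+", raw) if title]
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(titles))


def write_file_list(raw, path="~/filelist"):
    """Write the cleaned list to a file, one title per line; return the count."""
    titles = normalize_file_list(raw)
    Path(path).expanduser().write_text("\n".join(titles) + "\n", encoding="utf-8")
    return len(titles)


if __name__ == "__main__":
    copied = "File:Logo.png\tFile:Map_of_Moscow.jpg   File:Logo.png\nFile:Chart.svg"
    for title in normalize_file_list(copied):
        print(title)
```

Calling write_file_list with the text copied from Special:AllPages produces the ~/filelist file mentioned in step 2.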
When you’re making software that is based on some open source code, it’s crucially important to participate in the life of the community. Why?
First of all, let’s remember why we decided to depend on some OSS in the first place. Well, it’s better than making everything from scratch, because other great programmers have already done all the work for us. Since this software is being used by a number of people, the developers have probably fixed many bugs. In many cases the code that is created in open source follows some kind of coding standard and is covered with some number of unit tests. If the community is big and the projects this code is used in are important, then someone has already thought about security as well. Reusing that code saves us a lot of money.
This code is evolving: the developers add a lot of new features and make the software faster and more beautiful. So when we’re using open source code as a platform for our projects, we try hard not to patch it too much, because we want to use all these updates. This is also very understandable.
So what about the original thesis? We’re making money here, so why do we have to answer newbie questions on the forums, figure out the mistakes on the mailing lists, improve documentation and sponsor the events that are related to that software? I’ll try to offer my opinion.
First of all: the more people are interested in the software, the more people eventually participate in its creation. And as I have shown above, we consider the open source developers to be our employees who create great code, but whom we don’t have to pay anything. Of course we need more of them to come.
Secondly: the more people are interested in the software and use it intensively, the more bugs they report. Those are our free testers, and because of that we also want to open our own code.
Thirdly: when the software is popular, it’s easier to find the hackers and wizards who can work as our employees. The more of those guys there are on the market, the more opportunities we have to choose.
How does software become popular? Well, it has to be good and usable, have good documentation, and have a community that can answer any question. You can also promote it with advertising and events. As you can see, we have now come full circle.