Tag Archives: software engineering

Pro Git Kindle Edition

Just in case you own an Amazon Kindle and work a lot with Git, and just in case you struggle with Git the same way I do, e.g. when it comes to a rebase, have a look at Pro Git. It is available as a Kindle edition for free at the moment from amazon.de.


Although the book is from 2014, it still covers many of the daily tasks that go beyond a simple push or commit when using Git. It should be on every developer’s bookshelf.

Geeks in Teams – Team Geek by Fitzpatrick and Collins-Sussman

Some weeks ago, I came across a quite interesting title from O’Reilly Media, Team Geek by Brian W. Fitzpatrick and Ben Collins-Sussman. There are quite a lot of books on the shelves that try to explain how to behave and how to improve your life as a professional programmer. However, most of them are either hard to read, difficult to understand or just boring, repeating stereotypes over and over.

Team Geek is quite different, though. The book clearly benefits from the experience of its authors. Both bring a good deal of experience from working at Google and have probably dealt with quite a lot of people during their professional lives.

The Content

Six chapters, each about 20 pages: some cover topics you probably never thought about, others simply confirm what you always thought but never quite believed. And that is basically what you benefit from most. The book shows that you are not alone with your thoughts on how teams and collaboration should work. It is about you as a developer (as a human being) but also about working in a team of geeks (often not understood as humans at all).

The Truth

Based on my own professional experience, I have to acknowledge that almost everything the two authors write is true. Although the book is written from a US American context, with a different culture, people and background, most of the topics hold for European developers as well. It seems the kind of person who becomes a developer is the same all over the world. Whether during my time in the UK or in Germany, you can apply many of the patterns provided in the book to your daily job.

The Reader

Professional developers, managers, team leads, architects, open source developers and even designers could benefit from the book. However, I would definitely recommend having some experience in this kind of business to fully understand (i.e. to feel with the authors) what is written and to benefit from the book. I am not sure whether beginners (e.g. students) or juniors can benefit from it. In the end, the reader will find some hints on how to improve his or her daily life in a world of geeks and nerds and how to strengthen his or her own standing within the company or group.


Pros

  • well written and easy to read
  • chapters of the right size to read during an evening
  • nice illustrations (not a reason to buy but really nice to look at)
  • great content
  • references for further reading given


Cons

  • terrible to useless index
  • not suitable for juniors and beginners (but that’s fine)

Where From

You can order the paperback or the Kindle version from Amazon or get the entire set of digital formats directly from O’Reilly.

The Role of an Architect in Scrum

Over the last few days I have (re-)thought the role of an architect in a Scrum team a lot. I tried to avoid reading other opinions and just reflected on my own experience in agile teams over the last few years.

There should be an architect in your organization; however, he should not be part of the Scrum team itself. Scrum is about the development process in your organization or project, and as such it is a constraint under which the architecture has to be developed.

The architect deals with non-technical issues, the so-called facts of life: politics, organizational constraints, budget restrictions and so on. No team member could address these in the regular development sprints.

As an architect one should analyze and manage risks, again nothing a team member could do during a regular two- or four-week sprint. Considering risks probably begins long before the team is assembled and the project is started. In fact, as an architect one could even point out that the project might not be doable for various reasons.

An architect has to work in iterative cycles, as constraints change, customers come up with new and modified requirements, and organizational goals might shift over time. All this might require some redesign or evolution of the architecture. However, these architectural cycles are not bound to the development cycles of the team. They are more related to the business, customers and organization.

The architect is a technical leader. Through prototypes or tracer bullets, for example, he shows how the hard parts of the system can or will be addressed. As such he gives the team a basis for better estimating the complexity of the upcoming implementation. In the end, it is important for the architect to address the hardest issues first, while the Scrum team addresses the tasks with the highest business value first.

As an architect you teach and coach your team. In his article Who Needs an Architect?, published in IEEE Software, Martin Fowler points out that a good architect mentors his team instead of sitting in his ivory tower.

Dealing with uncertainty is probably one of the most underestimated aspects of an architect's day job. In the role of an architect, I have to make decisions under a high degree of uncertainty. As many aspects of a project change over time, early decisions are uncertain. However, as an architect, you have to consider the consequences of these decisions. It is a major part of your job to consider risks and to plan for events that may affect your project, whether as a threat or as an opportunity.

In contrast, in the role of a developer, you should not be confronted with uncertainty. You have a tough schedule and well-defined tasks to fulfill, technology and tools are in place, and in the best case you have all the knowledge needed to perform your task. If the requirements for your task are not clear, you probably cannot fulfill it.

I have experienced this very issue several times during the last few years. Most of the time it was caused by the lack of an architect, or by an architect who did not fill the role properly and spent most of the time writing code instead.

So, do architects write code? This might be one of the most discussed topics in software engineering. Personally, I have experienced both: architects writing (production) code all the time as well as architects who never worked in product development. As an architect you should certainly have solid coding skills. Earning your street cred before becoming an architect is essential. You must be able to read, understand and improve the code. However, you have to delegate the actual task of building the code base to others. Nevertheless, you have to keep improving your coding skills. In short, an architect must be able to write excellent code, but he should not write the bits that get delivered, apart from prototypes and tracer bullets.

In fact, running an agile environment does not supersede an architect. Nor does it make the overall product planning and design process redundant. The architect is not part of the Scrum team, as he does not deliver within the Scrum team. His position exists before the Scrum team is assembled, he sets the direction and can act as a consultant for the team. However, the architect is not part of the sprint plan, as he is not part of the Scrum team as such.

Future Software Architectures

Looking at myself, I see how differently I work with devices nowadays compared to almost 30 years ago. In the early days of personal computers you spent a lot of time figuring out what you could actually do with your Commodore C64 or your very first 286 machine, while knowing each component's specification. Nowadays it is simply about the available software. Most users probably do not even know any technical details of the device they are using, apart from whether it is slow or fast.

If you look at professionals who use computers, they often use one specific application, which may be shut down only once at the end of the week. Home users probably don't know that there are more applications on the computer than the web browser.

As computer professionals we tend to forget to think about why others use computers. We see the full potential of the latest programming language, the computing power, the maximum available bandwidth and all the fancy features we know about.

Tablets such as the iPad or the new Nexus are great for end users. They are quite intuitive to use, and there is no need to worry about the hardware. Whatever users want to do, they simply have to find the right app. In fact, I use my iPad for many common tasks; even for writing, blogging and editing images the apps are by now quite well done.

Specialized applications used by various professionals do not need a fully equipped personal computer. Ever looked around a doctor's practice? In every surgery you might find a personal computer that often runs just one program. Or have a look at a typical electronics or furniture megastore. Each information desk will probably have one personal computer running one program. Often, these programs are host applications where the client continually requests information from a server application.

There is no reason to put a fully equipped computer in every room for a single application. Either a thin client or a lightweight tablet might be the answer here. Either a web-hosted application or a small application communicating with a server (e.g. in the cloud) might be a good solution.

[Figure: Cloud Hosted App]

As professional software architects and designers we should consider this when designing applications, even if stakeholders still request old-fashioned desktop applications.


Clean Interface Inheritance

Recently, I had an interesting conversation about interface inheritance with one of my colleagues. The occasion was a decision on how to implement basic behavior based on a set of interfaces for a number of classes. At first, I was not comfortable letting one interface inherit from another. I was quite biased by the design of the code base I am currently working on.

Generally speaking, each class in that code base implements its very own interface. In addition, the inheritance scheme of the classes is mirrored in the interface inheritance, as seen in the example below. Why would one come up with such a design at all?

[Figure: Complex Interface Inheritance]

To understand this design (it’s still a design) some more context is required. In this particular codebase a dependency injection container is used to resolve instances of a particular type. To do so a unique identifier is required. This could be a string but also an interface. For example, the Managed Extensibility Framework (MEF) makes use of interfaces for resolving. Using MEF it is quite easy to get a set of components implementing a particular interface (e.g. some kind of IPlugin interface).

The issue I’ve seen is that there are only a few common interfaces in the codebase. Instead of collecting all types implementing a particular interface, dedicated interfaces are used to resolve particular instances of types. Used this way, however, interfaces serve to identify types rather than to define contracts.

Back to the question of how to design the interface inheritance: we came up with two alternative approaches, both valid, but with very different design goals.

[Figure: Implementing or Inheriting Interfaces]

In the left hand approach two types inherit from the same base type implementing a particular interface. Both types also implement another interface, i.e. both types fulfill the same contract. In the right hand approach we see both types inheriting from the same interfaces as well. While both approaches end in a similar result, there is a significant difference in the semantics.

Based on both approaches, we came up with two possible solutions for a .NET implementation. While this might seem quite academic to you, there is quite a difference in how one might use these types.

[Figure: Inheriting Interfaces vs Implementing Interfaces]

As in the example before, both approaches end up with two classes that are implemented identically. However, the two implementations differ semantically, which is best seen when considering how the classes are used. In the left hand example one could iterate through a typed list of IAlgorithm and call the Dispose method required by the IDisposable interface directly. This works because IAlgorithm inherits from IDisposable, so every IAlgorithm can implicitly be treated as an IDisposable. Following the right hand scheme, you might still iterate through the list; however, before accessing the Dispose method you have to cast the concrete instance to IDisposable. The instances can still be used as IDisposable, but no longer implicitly.

The question now is when to use the first or the second approach. Letting an interface inherit from another interface is absolutely valid once you can answer the question whether your type is a kind of the inherited interface with yes. In the given example each algorithm is an IDisposable. No exception, no excuse. When choosing the second approach, you should be able to answer the question whether your type needs to fulfill a particular contract with yes. If only a few algorithms need to fulfill the contract given by the IDisposable interface, and an algorithm is not an IDisposable by default, the second approach might be the right one to choose. While each algorithm is still an IAlgorithm, only some of them implement the IDisposable interface.
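The difference between the two approaches can be sketched in C#. This is only an illustration under assumptions, not the code from the project discussed here: the names IDisposableAlgorithm, IPlainAlgorithm, Doubler, Tripler, Negator and InheritanceDemo are made up for the example, and only IDisposable is the real .NET interface.

```csharp
using System;
using System.Collections.Generic;

// First approach: the interface itself inherits IDisposable,
// so every algorithm *is* an IDisposable. (Hypothetical names.)
public interface IDisposableAlgorithm : IDisposable
{
    int Run(int input);
}

// Second approach: the interface stands alone; a class may
// additionally implement IDisposable as a separate contract.
public interface IPlainAlgorithm
{
    int Run(int input);
}

public sealed class Doubler : IDisposableAlgorithm
{
    public bool Disposed { get; private set; }
    public int Run(int input) { return input * 2; }
    public void Dispose() { Disposed = true; }
}

public sealed class Tripler : IPlainAlgorithm, IDisposable
{
    public bool Disposed { get; private set; }
    public int Run(int input) { return input * 3; }
    public void Dispose() { Disposed = true; }
}

// An algorithm that does not fulfill the IDisposable contract at all.
public sealed class Negator : IPlainAlgorithm
{
    public int Run(int input) { return -input; }
}

public static class InheritanceDemo
{
    // First approach: Dispose is reachable through the interface itself.
    public static void DisposeAll(IEnumerable<IDisposableAlgorithm> algorithms)
    {
        foreach (IDisposableAlgorithm algorithm in algorithms)
        {
            algorithm.Dispose();
        }
    }

    // Second approach: a type check is required before Dispose can be called.
    // Returns the number of algorithms that were actually disposed.
    public static int DisposeWhereSupported(IEnumerable<IPlainAlgorithm> algorithms)
    {
        int disposed = 0;
        foreach (IPlainAlgorithm algorithm in algorithms)
        {
            if (algorithm is IDisposable disposable)
            {
                disposable.Dispose();
                disposed++;
            }
        }
        return disposed;
    }
}
```

With the first interface the caller never needs to know the concrete type. With the second, a list may legitimately mix disposable and non-disposable algorithms, at the price of the explicit check.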

Maybe this seems obvious to you; however, I still see quite experienced developers having significant problems choosing appropriate inheritance structures, ranging from avoiding inheritance altogether to using the most complex inheritance structures you might ever have seen in your life as a programmer. There is no right or wrong, but there might be a best solution for your problem. So never hesitate to question your current design and look for a better approach.

The Cleaner Coder

I recently finished the latest book from Robert C. Martin, aka Uncle Bob, called The Clean Coder.
Having finished it, I see many pros and cons. At the beginning I was quite skeptical about the book, but in the end I am glad I read it to the very end.

The book is neither a set of rules that make you a better software developer, nor does it really provide a code of conduct to follow in your professional life. However, if you have spent some time in this industry, you will have many déjà vu moments while reading this book.

Very positive (if you are a frequent reader and interested in the people behind the books) is the fact that you will learn a lot about Martin as a person. Each chapter is more or less a short essay about a past project or part of his former work, some experiences he had and the (not so surprising) conclusions he drew. There are quite a few sentences that are worth remembering, sometimes things you have thought of many times but never found the right words to write down. You will also find some interesting anecdotes you might learn something from (or did you know where carriage return and new line, ‘\r\n’, come from and why they vary between operating systems?).

One very positive aspect is that he points out what a professional (software developer) is, how he should behave and what he could do to be recognized as such. In our industry you are still seen as some kind of nerd, a geek who codes 24 hours a day, does not sleep, consumes a lot of caffeine, plays video games and lacks social skills. While some of these things might be true, people often expect that you work more than the regular hours, that you solve each and every problem without any failure, and that you come up with miracles and wonders, performing magic, voodoo and code kung-fu each and every day (often for a very modest salary). None of this would be expected from other professionals (lawyers, doctors etc.).

In the end, he writes about many things I, and probably you too, experience each and every day at work. It is a pleasant book you might read over a few evenings. Only the very final chapter about the tools he uses in his work (vi, Emacs, Eclipse etc.) and the frequent mentioning of FitNesse (which is Martin’s project) are quite unnecessary.

I am not fully convinced that the book is a must-read. However, I have worked in this industry for nearly 13 years in various projects, research and product development, large and small companies, consulting and academia, with different teams in different countries. In the end it is quite calming to see once more that my problems are the same everywhere, have been the same for a long time and will probably stay the same for a long time in this industry.

Clean Code: o = 0

I do not necessarily agree with all statements in Clean Code by Robert C. Martin. One of the sections I thought was completely obsolete was a statement about disinformative names:

“A truly awful example of disinformative names would be the use of lower-case L or uppercase O as variable names, especially in combination. The problem, of course, is that they look almost entirely like the constants one and zero, respectively.”

The corresponding example he gives is the following:

int a = l;
if (O == l)
  a = O1;
else
  l = 01;

So far I thought it was obvious not to write such code; however, I came across similar code the other day.

for (int o = 0; o < args.NewItems.Count; o++)
{
    string s = args.NewItems[o].ToString();
}

What’s the problem here? The variable name o is used for a counter and initialized with 0. While this is already hard to read, o might also suggest that we are dealing with an object. So at a brief glance you might get the impression that the code iterates through a set of objects. This is further supported by the use of the NewItems property, since in .NET object references are quite commonly used as indexers, e.g. to look up a key/value pair in a collection.

When using a counter variable without a specific meaning, one should use common names such as i or j that are widely recognized as counter variables.

for (int i = 0; i < args.NewItems.Count; i++)
{
    string s = args.NewItems[i].ToString();
}

This is only a slight modification but already improves the readability of the code.




GPL as a Business Model

How stupid I was: after I had been dealing with licenses for years, it was Dirk Riehle who finally gave me a reality check. In an interview with Software Engineering Radio he talked about his recent work at SAP and his experience and research with open source business models. To exaggerate: GPL is great, because GPL is the most capitalistic license you can think of.

As Dirk explained in the podcast, the whole dual licensing model is built on the GPL. You sell your product but you also want to be pseudo open source. So you open source it under the GPL. All your competitors can read your code and extend the code, but then have to release everything under the GPL again. So there is no benefit for your competitors. If customers or competitors want to extend your code without doing so, they have to purchase the second license you offer.

Personally, I prefer either closed source or quite permissive licenses (FreeBSD, Ms-PL etc.). If you are going to give your code away, do it right. If you want to build a business on your code base, keep it closed. So I was always quite careful about not reading GPL code or, even worse, copying code snippets that might be under the GPL into my code base.

Since yesterday, however, I really have a different view on the whole topic: if you are open sourcing a commercial project and you use dual licensing, the GPL is purely an instrument of your business model. If you open source a community project under the GPL, you probably have not understood the concepts at all.

Also interesting in this podcast were the facts about shifting revenues. Licensing is a tool used for shifting revenues among various business areas. If you are selling a database, you are probably interested in all operating systems being free of charge, so the customer has more money left to pay for your product and your service. If you are a company similar to SAP, you are probably interested in all operating systems and databases being free: consequently, the customer has more money left to spend on your product and services. If you are selling an operating system, you are certainly interested in all programs running on top of your system being free. That way, the customer has more money left to spend on your operating system and the services.

The next time you read about some company switching from Windows to Linux, the question is not about saving money on licenses. In the end, I personally don’t think that the corresponding IT budget will be cut because of the saved licensing fees. I rather think the budget is shifted to some other area.

I just realized I was focused on the GPL from the view of a developer for too long. If you feel the same, I highly recommend the interview with Dirk.

A Monologue on Pub/Sub

I am going to implement a publish/subscribe mechanism for our recent Web-based research prototype. Therefore, I am interested in what others have already thought about this pattern. Searching for the pub/sub paradigm, Google returns the corresponding Wikipedia article in first place. Skimming the article, there are a few starting points worth remembering:

  • Pub/sub is a sibling of the message queue paradigm
  • Subscribers typically receive only a sub-set of the total messages published; selecting messages for reception is called filtering
  • In topic-based systems messages are published to topics or named logical channels
  • In content-based systems, attributes or the content of messages must match constraints defined by the subscriber
  • In hybrid systems, publishers post messages to topics; subscribers register content-based subscriptions on a particular topic
  • Brokers might be used to maintain subscriptions, store and forward messages and perform the filtering
  • The first time a pub/sub mechanism was described was in Exploiting Virtual Synchrony in Distributed Systems [pdf] by K. Birman and T. Joseph.
  • Publishers and subscribers remain ignorant of the system topology; publishers don’t know about the existence of any subscribers; this allows for a loosely coupled system
  • Scalability for pub/sub under high load in large deployments currently remains a research question

Looking for more information on the Web I came across the Publish/Subscribe integration pattern from Microsoft’s patterns & practices. The problem statement seems reasonable:

  • How can an application in an integration architecture only send messages to the applications that are interested in receiving the messages without knowing the identities of the receivers?

In the context description, the following communication infrastructures are mentioned:

  • Bus
  • Broker
  • Point-to-Point

In contrast to the Wikipedia article, we learn about three different types of mechanisms:

  • List-based Publish/Subscribe
  • Broadcast-based Publish/Subscribe
  • Content-based Publish/Subscribe

Having a closer look, the List-based Publish/Subscribe mechanism maintains a list of subscribers, similar to the Observer pattern. Attach() and Detach() operations modify the list of subscribers, while a Notify() operation is used to send updates to the subscribers. This seems well suited if you have one publisher and many subscribers, but does not look suitable if subscribers watch many subjects. The core functionality of List-based Publish/Subscribe can thus be identified as

  • The publisher maintains a list of all subscribers
  • The publisher notifies each one individually

If we understand subscription lists as named channels, the List-based Publish/Subscribe represents a topic-based subscription mechanism.
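The list-based mechanism can be sketched in a few lines of C#. This is a minimal illustration, not the implementation from the patterns & practices article: the operation names Attach, Detach and Notify are taken from the description above, while ISubscriber, its Update method, ListPublisher and RecordingSubscriber are names assumed for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical subscriber contract; name and signature are assumptions.
public interface ISubscriber
{
    void Update(string message);
}

// The publisher owns the subscriber list and notifies each entry individually.
public sealed class ListPublisher
{
    private readonly List<ISubscriber> subscribers = new List<ISubscriber>();

    public void Attach(ISubscriber subscriber)
    {
        // Avoid duplicate entries so nobody is notified twice.
        if (!subscribers.Contains(subscriber))
        {
            subscribers.Add(subscriber);
        }
    }

    public void Detach(ISubscriber subscriber)
    {
        subscribers.Remove(subscriber);
    }

    public void Notify(string message)
    {
        // Iterate over a copy so a subscriber may detach itself while being notified.
        foreach (ISubscriber subscriber in subscribers.ToArray())
        {
            subscriber.Update(message);
        }
    }
}

// Simple subscriber that records what it received.
public sealed class RecordingSubscriber : ISubscriber
{
    public List<string> Received { get; } = new List<string>();

    public void Update(string message)
    {
        Received.Add(message);
    }
}
```

One such publisher instance per named channel would give the topic-based reading described above.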

The Broadcast-based Publish/Subscribe mechanism simply dumps messages onto the local area network. Each subscriber is responsible for listening and inspecting the subject line of the message. If the subject line matches, the subscriber processes the message. This approach seems to be the optimum in decoupling the system. Clearly, this can be identified as some kind of topic-based system. If the publisher needs to know about the subscribers to a particular topic, a hybrid approach can be chosen, where an additional process requests information about interested subscribers. To establish the hybrid system, however, every subscriber must respond to the request. Another name mentioned in the article is publish/subscribe channel with reactive filtering, because each subscriber is responsible for filtering the messages on its own.
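Reactive filtering can be illustrated with a small in-process sketch in C#. The network is replaced by a plain .NET event here, which is a simplifying assumption; BroadcastMessage, BroadcastMedium and FilteringListener are hypothetical names chosen for the example.

```csharp
using System;
using System.Collections.Generic;

// A broadcast message with a subject line. (Hypothetical type.)
public sealed class BroadcastMessage
{
    public BroadcastMessage(string subject, string body)
    {
        Subject = subject;
        Body = body;
    }

    public string Subject { get; }
    public string Body { get; }
}

// Stand-in for the local area network: every listener sees every message.
public sealed class BroadcastMedium
{
    public event Action<BroadcastMessage> MessageSent;

    public void Send(BroadcastMessage message)
    {
        // The sender knows nothing about who is listening.
        MessageSent?.Invoke(message);
    }
}

// Each subscriber inspects the subject line itself (reactive filtering).
public sealed class FilteringListener
{
    private readonly string subject;

    public FilteringListener(BroadcastMedium medium, string subject)
    {
        this.subject = subject;
        medium.MessageSent += OnMessage;
    }

    public List<string> Processed { get; } = new List<string>();

    private void OnMessage(BroadcastMessage message)
    {
        // Messages with a non-matching subject are silently ignored.
        if (message.Subject == subject)
        {
            Processed.Add(message.Body);
        }
    }
}
```

The publisher side carries no subscription state at all, which is where the strong decoupling comes from. The cost is that every listener has to look at every message.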

In contrast to the Wikipedia article, the author of this article distinguishes between topic-based and content-based mechanisms. In this context, both List-based Publish/Subscribe and Broadcast-based Publish/Subscribe are understood as topic-based mechanisms.

While topics are considered a pre-defined set of subjects, each message in a content-based system can be understood as a single dynamic logical channel. This idea was proposed in The Evolution of Publish/Subscribe Communication Systems [pdf]. We will come back to this paper later.
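In contrast to the topic-based variants, a content-based subscription is a constraint on the message itself. A minimal C# sketch, assuming that constraints are expressed as predicates; ContentBroker and Quote are hypothetical names and not taken from any of the cited articles:

```csharp
using System;
using System.Collections.Generic;

// Content-based broker: a subscription is a predicate over the message content.
public sealed class ContentBroker<TMessage>
{
    private readonly List<Tuple<Func<TMessage, bool>, Action<TMessage>>> subscriptions =
        new List<Tuple<Func<TMessage, bool>, Action<TMessage>>>();

    public void Subscribe(Func<TMessage, bool> constraint, Action<TMessage> handler)
    {
        subscriptions.Add(Tuple.Create(constraint, handler));
    }

    public void Publish(TMessage message)
    {
        // The broker performs the filtering: only matching subscribers are called.
        foreach (Tuple<Func<TMessage, bool>, Action<TMessage>> subscription in subscriptions)
        {
            if (subscription.Item1(message))
            {
                subscription.Item2(message);
            }
        }
    }
}

// Example message type with attributes a constraint can refer to.
public sealed class Quote
{
    public Quote(string symbol, decimal price)
    {
        Symbol = symbol;
        Price = price;
    }

    public string Symbol { get; }
    public decimal Price { get; }
}
```

A subscriber interested in expensive quotes would register something like `broker.Subscribe(q => q.Price > 100m, handler)`. No topic has to be agreed on in advance, which matches the idea of each message forming its own logical channel.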

Where to implement the pub/sub functionality depends on your underlying communication structure:

  • Bus: Implement the subscription mechanism in the bus interface
  • Broker: Implement the mechanism through subscription lists to the broker
  • Point-to-Point: Implement the mechanism through subscription lists in the publisher

The article also distinguishes between fixed subscriptions and dynamic subscriptions. With fixed subscriptions, applications cannot control their subscriptions, whereas dynamic subscriptions can be modified through certain control messages.

Some more keywords are listed in the article:

  • Initial subscription: How do subscribers communicate their subscriptions to the communication infrastructure when they are initially added?
  • Wildcard subscription: If supported, subscribers can subscribe to multiple topics through one subscription
  • Topic discovery: How can subscribers discover available topics if dynamic subscriptions are supported?
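As an illustration of the wildcard subscription mentioned in the list above, here is a small C# sketch of a topic matcher. The convention used (dot-separated topic names, a trailing ".*" matching everything below a prefix) is purely an assumption for the example; the article does not prescribe a syntax, and TopicMatcher is a hypothetical name.

```csharp
using System;

public static class TopicMatcher
{
    // Assumed convention: topics are dot-separated, and a pattern ending
    // in ".*" matches every topic below that prefix. Any other pattern
    // must match the topic exactly.
    public static bool Matches(string pattern, string topic)
    {
        if (pattern.EndsWith(".*", StringComparison.Ordinal))
        {
            // Keep the trailing dot so "sports.*" does not match "sportsnews".
            string prefix = pattern.Substring(0, pattern.Length - 1);
            return topic.StartsWith(prefix, StringComparison.Ordinal);
        }

        return string.Equals(pattern, topic, StringComparison.Ordinal);
    }
}
```

A broker would evaluate `TopicMatcher.Matches(subscriptionPattern, messageTopic)` for each subscription, so that one subscription such as "sports.*" covers several topics.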

How to implement a dynamic list-based publish-subscribe pattern is illustrated in the MSDN library.

I also found an article about the way EDA (event-driven architecture) extends SOA, including a nice depiction of the idea behind EDA. There, EDA is proposed as a publish/subscribe mechanism rather than a command/control mechanism as provided by SOA. EDA seems especially suitable when you are facing

  • Workflow type of processes and
  • Processes that cross the borders of functional organizations.

It also mentions that a declarative model would be good support for the EDA pub/sub pattern.

I also came across this article giving a brief overview of the Publish-Subscribe Channel pattern from the book Enterprise Integration Patterns by G. Hohpe and B. Woolf. It basically says that the channel delivers a copy of the message to each of the output channels, where each output channel has only one subscriber. After the message is consumed, it is removed from the channel.

Having a look into Exploiting Virtual Synchrony in Distributed Systems, mentioned in the beginning, gives you an insight into several issues in distributed systems. One interesting fact to bear in mind is about synchrony vs. asynchrony. If your publisher requires responses, this could be 0, 1 or n for n subscribers. If you expect 0 responses, you actually run an asynchronous system. For so-called process groups an interface is provided that allows joining or leaving a group but also receiving updates on the group memberships. Sounds familiar? I see the Observer pattern here. In the described news service, one already recognizes the common concepts described before: “Each subscriber receives a copy of any message having a ‘subject’ for which it has enrolled on the order they were posted.” The overall description is rather abstract, but gives an interesting insight into the development of the mechanism.

Afterwards I ended up directly with The Evolution of Publish/Subscribe Communication Systems, which provides a well written summary of the publish/subscribe paradigm. In particular, the decoupling aspect is broken down into

  • Anonymity: parties do not need to know each other,
  • Decoupling in time: interacting parties do not need to be up at the same time,
  • Decoupling in flow: sending and receipt does not block parties.

Again, we see content-based and topic-based mechanisms, which makes me think twice about the classification proposed in the Wikipedia article. Back to the paper, the authors state that content-based pub/sub systems cannot rely on

  • Centralized architectures or
  • Network-level solutions.

A single server simply cannot deal with a high number of subscribers, and the limited number of IP multicast addresses does not fit the large number of logical channels. Instead they propose an application-level realization through a set of event brokers, exchanging information on a point-to-point basis. For broker interaction the following issues are pointed out:

  • Subscription and information routing: i.e. creating a mapping between subscriptions and subscribers, and matching and forwarding operations.

Maybe it is worth mentioning that both papers address communication systems at the network and overlay network infrastructure level rather than at the application level. However, the concepts are the same.

Baldoni comes up with the concept of ad-hoc subscription languages, comparable to SQL for databases. This, however, requires a priori knowledge of the structure of the information space. At least, the idea of selecting subscriptions or topics using a query language sounds quite appealing. As a future research direction, a formal specification of the subscription service provided by a pub/sub system is proposed.

  • Notification semantics would define the conditions under which, when and how many times a piece of information is delivered to a subscriber. This is pointed out as a mandatory feature if pub/sub mechanisms are to be applied to mission-critical or dependable applications.
  • Publishing semantics should allow defining the lifetime of information. In a pub/sub-based system, the subscriber has no right to remove elements from a queue. To avoid overflow, the information must be removed from the queue at some point. This, however, is clearly publisher dependent.

Another often cited paper I had a look at is The Many Faces of Publish/Subscribe [pdf]. Similar to the paper before, the three decoupling dimensions time, space and synchronization are considered in order to extract the common concepts of the different variants of the pub/sub paradigm. We learn that individual point-to-point and synchronous communication leads to rigid and static applications. Three types of pub/sub mechanisms are introduced:

  • Topic-based
  • Content-based
  • Type-based

The basic terms for sending and receiving messages through a software bus/event used here are

  • Event for the message to be delivered and
  • Notification for the act of delivering this event.

The core system should provide an event notification service providing

  • Storage and management for subscriptions and
  • Efficient delivery of events.

The operations used here are called subscribe(), unsubscribe() and publish(), not that different from the ones we know from the Observer pattern. A new operation called advertise() is used to announce the nature of a publisher's future events. That way, the event service can adjust to the expected event flows, and subscribers can learn when new types of information become available. We also learn about alternative communication paradigms here:

  • Message passing is just about sending and receiving messages through communication channels. For the sender, the process is asynchronous, while the receiver must act synchronously. Both parties must be active at the same time and the sender must know its receivers. Consequently, the parties are coupled in both space and time.
  • RPC (mentioned for the first time in Implementing Remote Procedure Calls [pdf] and A Survey of Remote Procedure Calls [pdf]) makes remote interactions appear the same way as local ones. Here we have a strong space and time coupling since the invoking object holds a reference to the invoked one. One attempt at removing synchrony was applied e.g. by CORBA using one-way modifiers. In this context, the authors mention the expression fire-and-forget.
  • Notifications allow a decoupling of synchronization by performing two independent invocations. The first (sent from client to server) provides a callback reference used by the server to notify the client about changes. This is said to be a limited version of the pub/sub mechanism and is directly related to the Observer pattern we already learned about before.
  • Shared spaces are definitely not what I am going to use, however it is interesting to read the summary. All communication between parties takes place using tuple spaces (e.g. known from Linda) using the three operations in(), out() and read(). This approach is decoupled in both time and space but remains synchronized and is thus somewhat limited in scalability.
  • Message queuing often uses some pub/sub mechanism. In contrast to tuple spaces, message queues provide some transactional, timing and ordering guarantees. In contrast to the pub/sub mechanism we learned about before, messages are concurrently pulled by the consumer.
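
To see why shared spaces are time and space decoupled but still synchronized, a small tuple space sketch in Python helps. This is only my own illustration and not Linda itself: the class name, the use of None as a wildcard and the timeout parameter are assumptions, and in() is spelled in_() because in is a reserved word in Python.

```python
import threading


class TupleSpace:
    """Tiny Linda-like tuple space (illustrative sketch only)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def out(self, tup):
        # Producer side: put a tuple into the space and wake up waiting consumers.
        with self._cond:
            self._tuples.append(tuple(tup))
            self._cond.notify_all()

    @staticmethod
    def _matches(pattern, tup):
        # None in the pattern acts as a wildcard (assumption of this sketch).
        return len(pattern) == len(tup) and all(
            p is None or p == t for p, t in zip(pattern, tup)
        )

    def _find(self, pattern):
        for tup in self._tuples:
            if self._matches(pattern, tup):
                return tup
        return None

    def _wait_for(self, pattern, remove, timeout):
        with self._cond:
            # The consumer blocks here until a match exists: this is the
            # synchronization coupling that remains.
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                return None
            tup = self._find(pattern)
            if remove:
                self._tuples.remove(tup)
            return tup

    def read(self, pattern, timeout=None):
        # Non-destructive: the tuple stays in the space.
        return self._wait_for(pattern, remove=False, timeout=timeout)

    def in_(self, pattern, timeout=None):
        # Destructive: the matching tuple is withdrawn from the space.
        return self._wait_for(pattern, remove=True, timeout=timeout)
```

Producer and consumer never reference each other and need not run at the same time, but the consumer's call does not return until a matching tuple is available (or the timeout expires).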

For the three pub/sub forms we find some more detailed information:

  • Topic-based pub/sub is based on the notion of topics or subjects, extending the notion of channels. Subscribers can subscribe to topics, which are identified by keywords and are related to the concept of groups and group communication. Thinking back to the paper we discussed before: the Isis system, described in Exploiting Virtual Synchrony in Distributed Systems, is also mentioned as the one that introduced the pub/sub concept for the first time. A nice expression I read in the related section was the concept of the event space. In topic-based systems, the event space can be addressed hierarchically, while groups usually offer only a flat structure.
  • Content-based pub/sub (aka property-based) introduces a subscription scheme based on the actual content of the particular event. Properties of events that can be used for structuring are internal attributes of data structures or meta-data associated with events. Here again, we read about subscription languages, but in more detail about filters in the form of name-value pairs combined with simple comparison operators (=, <, >, <=, >=), resulting in so-called subscription patterns.
  • Type-based pub/sub is meant to replace the name-based classification of topics by a scheme according to the type of events.
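
A subscription pattern of the content-based kind can be sketched in a few lines of Python. The representation as a list of (name, operator, value) triples, the dictionary-based event and the function name matches() are my own assumptions; the paper only describes name-value pairs combined with the comparison operators listed above.

```python
import operator

# The comparison operators named in the paper, mapped to Python functions.
OPERATORS = {
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
    "<=": operator.le,
    ">=": operator.ge,
}


def matches(pattern, event):
    """Return True if the event satisfies every constraint of the pattern.

    pattern: list of (attribute_name, operator_symbol, value) triples
    event:   dict mapping attribute names to values
    """
    for name, op, value in pattern:
        if name not in event:
            return False    # a missing attribute can never satisfy a constraint
        if not OPERATORS[op](event[name], value):
            return False
    return True


# Usage: a subscriber interested in ACME quotes at or above 100.
pattern = [("symbol", "=", "ACME"), ("price", ">=", 100)]
interested = matches(pattern, {"symbol": "ACME", "price": 120})
```

In contrast to the topic-based form, the subscriber here selects events by what they contain rather than by a name the publisher attached to them.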

Having a closer look at events, we learn about the classification into messages (delivery through a single operation, e.g. notify) and invocations (the event triggers some specific operation on the subscriber). Furthermore, invocations are directed to a certain kind of object and provide some well-known semantics. You can also differentiate between one-way invocations (COM+ or CORBA Event Service) and those requiring a return value.

We see different kinds of architectures there:

  • Centralized architectures use a central component for storing and forwarding events. Consequently, this component is a single point of failure.
  • Distributed architectures omit this centralized component and are well suited for efficient delivery of messages.
  • Hybrid approaches provide a decentralized notification and storage service.

Dissemination of messages is also discussed but depends a lot on the underlying concepts. Efficient multicast in content-based pub/sub systems, however, is pointed out to be still an open issue.

Some more points to be considered are related to QoS:

  • Since the publisher does not know if and when the sent messages are processed, some mechanism is required to ensure persistence of the information.
  • More QoS features deal with priorities (only relevant for messages in transit) and transactions, if multiple messages are combined into atomic operations.
  • Reliability is finally pointed out as one of the most important features in distributed information systems.

Bearing this information in mind, I now have a look at the Publish-Subscribe Notification for Web Services [pdf] whitepaper, which is part of the WS-Notification family. The document deals with the notification pattern for notification-based or event-driven systems in the Web service context. Here we see the same pattern as learned before:

  • Subscribers register dynamically with the publisher
  • Multiple subscribers can register with a publisher
  • The distributing Web service sends one separate copy to each of the subscribers

The spec defines (among others) some interesting requirements:

  • Support of resource-constrained devices
  • Support of both direct and brokered notification
  • Transformation and aggregation of brokered topics
  • Publishing of runtime meta-data (for discovering available elements)
  • Allow federation of brokers

In the terminology section we find another interesting statement saying “a Subscription is a WS-Resource” where a WS-Resource is defined as follows:

A Web service having an association with a stateful resource, where the stateful resource is defined by a resource properties document type and the association is expressed by annotating a WSDL portType with the type definition of the resource properties document

Got it? At least let us think of subscriptions as resources. This idea lines up well with my current research.

Hierarchically structured topics are considered as well: topic trees are hierarchically structured topics, and topic spaces are sets of topic trees grouped together into the same namespace (obviously for administrative reasons).
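
How a subscription on a node of such a topic tree could cover everything below it can be sketched as follows. Note the assumptions: the "/" separator, the subtree semantics and the function name covers() are mine for illustration and are not the topic expression syntax defined by WS-Notification.

```python
def covers(subscribed, published):
    """Return True if the subscribed topic is the published topic or one of its ancestors.

    Topics are written as paths within one topic tree, e.g. "sports/football".
    """
    sub = subscribed.strip("/").split("/")
    pub = published.strip("/").split("/")
    # Compare whole path segments, so "sport" does not accidentally cover "sports".
    return pub[:len(sub)] == sub


# Usage: a subscription on "sports" also receives "sports/football/results".
receives = covers("sports", "sports/football/results")
```

A flat group structure, by comparison, would only ever allow the exact-match case.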

It is actually the first document dealing with security aspects, listing the following classes of attacks:

  • Message alteration
  • Message disclosure aka confidentiality
  • Key integrity
  • Authentication
  • Accountability, i.e. a function of the type and strength of the key/algorithm used
  • Availability, e.g. DoS attacks
  • Replay of messages

Finally, I found the Distributed Publish/Subscribe Event System on CodePlex. In the whitepaper, the various types of pub/sub are characterized by

  • Coupling
  • Brokered subscriptions
  • Persistent vs. transient subscriptions
  • Delivery of events and
  • Routing.

The Web Solutions Platform (WSP) is designed as a distributed pub/sub system and works both intra-machine and inter-machine. Applications here subscribe to event types, so it looks like an event-based pub/sub system. The document provides some more details on the system itself but no further information on publish/subscribe mechanisms in general.

That’s a lot of stuff, and now I have to spend some time reflecting on all this information for my design.