June 28, 2017

Goodbye Mozilla

By Chris Lord

Today is effectively my last day at Mozilla, before I start at Impossible on Monday. I’ve been here for 6 years and a bit and it’s been quite an experience. I think it’s worth reflecting on, so here we go; Fair warning, if you have no interest in me or Mozilla, this is going to make pretty boring reading.

I started on June 6th 2011, several months before the (then new, since moved) London office opened. Although my skills lay (lie?) in user interface implementation, I was hired mainly for my graphics and systems knowledge. Mozilla was in the region of 500 or so employees then I think, and it was an interesting time. I’d been working on the code-base for several years prior at Intel, on a headless backend that we used to build a Clutter-based browser for Moblin netbooks. I wasn’t completely unfamiliar with the code-base, but it still took a long time to get to grips with. We’re talking several million lines of code with several years of legacy, in a language I still consider myself to be pretty novice at (C++).

I started on the mobile platform team, and I would consider this to be my most enjoyable time at the company. The mobile platform team was a multi-discipline team that did general low-level platform work for the mobile (Android and Meego) browser. When we started, the browser was based on XUL and was multi-process. Mobile was often the breeding ground for new technologies that would later go on to desktop. It wasn’t long before we started developing a new browser based on a native Android UI, removing XUL and relegating Gecko to page rendering. At the time this felt like a disappointing move. The reason the XUL-based browser wasn’t quite satisfactory was mainly due to performance issues, and as a platform guy, I wanted to see those issues fixed, rather than worked around. In retrospect, this was absolutely the right decision and lead to what I’d still consider to be one of Android’s best browsers.

Despite performance issues being one of the major driving forces for making this move, we did a lot of platform work at the time too. As well as being multi-process, the XUL browser had a compositor system for rendering the page, but this wasn’t easily portable. We ended up rewriting this, first almost entirely in Java (which was interesting), then with the rendering part of the compositor in native code. The input handling remained in Java for several years (pretty much until FirefoxOS, where we rewrote that part in native code, then later, switched Android over).

Most of my work during this period was based around improving performance (both perceived and real) and fluidity of the browser. Benoit Girard had written an excellent tiled rendering framework that I polished and got working with mobile. On top of that, I worked on progressive rendering and low precision rendering, which combined are probably the largest body of original work I’ve contributed to the Mozilla code-base. Neither of them are really active in the code-base at the moment, which shows how good a job I didn’t do maintaining them, I suppose.

Although most of my work was graphics-focused on the platform team, I also got to to do some layout work. I worked on some over-invalidation issues before Matt Woodrow’s DLBI work landed (which nullified that, but I think that work existed in at least one release). I also worked a lot on fixed position elements staying fixed to the correct positions during scrolling and zooming, another piece of work I was quite proud of (and probably my second-biggest contribution). There was also the opportunity for some UI work, when it intersected with platform. I implemented Firefox for Android’s dynamic toolbar, and made sure it interacted well with fixed position elements (some of this work has unfortunately been undone with the move from the partially Java-based input manager to the native one). During this period, I was also regularly attending and presenting at FOSDEM.

I would consider my time on the mobile platform team a pretty happy and productive time. Unfortunately for me, those of us with graphics specialities on the mobile platform team were taken off that team and put on the graphics team. I think this was the start in a steady decline in my engagement with the company. At the time this move was made, Mozilla was apparently trying to consolidate teams around products, and this was the exact opposite happening. The move was never really explained to me and I know I wasn’t the only one that wasn’t happy about it. The graphics team was very different to the mobile platform team and I don’t feel I fit in as well. It felt more boisterous and less democratic than the mobile platform team, and as someone that generally shies away from arguments and just wants to get work done, it was hard not to feel sidelined slightly. I was also quite disappointed that people didn’t seem particular familiar with the graphics work I had already been doing and that I was tasked, at least initially, with working on some very different (and very boring) desktop Linux work, rather than my speciality of mobile.

I think my time on the graphics team was pretty unproductive, with the exception of the work I did on b2g, improving tiled rendering and getting graphics memory-mapped tiles working. This was particularly hard as the interface was basically undocumented, and its implementation details could vary wildly depending on the graphics driver. Though I made a huge contribution to this work, you won’t see me credited in the tree unfortunately. I’m still a little bit sore about that. It wasn’t long after this that I requested to move to the FirefoxOS systems front-end team. I’d been doing some work there already and I’d long wanted to go back to doing UI. It felt like I either needed a dramatic change or I needed to leave. I’m glad I didn’t leave at this point.

Working on FirefoxOS was a blast. We had lots of new, very talented people, a clear and worthwhile mission, and a new code-base to work with. I worked mainly on the home-screen, first with performance improvements, then with added features (app-grouping being the major one), then with a hugely controversial and probably mismanaged (on my part, not my manager – who was excellent) rewrite. The rewrite was good and fixed many of the performance problems of what it was replacing, but unfortunately also removed features, at least initially. Turns out people really liked the app-grouping feature.

I really enjoyed my time working on FirefoxOS, and getting a nice clean break from platform work, but it was always bitter-sweet. Everyone working on the project was very enthusiastic to see it through and do a good job, but it never felt like upper management’s focus was in the correct place. We spent far too much time kowtowing to the desires of phone carriers and trying to copy Android and not nearly enough time on basic features and polish. Up until around v2.0 and maybe even 2.2, the experience of using FirefoxOS was very rough. Unfortunately, as soon as it started to show some promise and as soon as we had freedom from carriers to actually do what we set out to do in the first place, the project was cancelled, in favour of the whole Connected Devices IoT debacle.

If there was anything that killed morale for me more than my unfortunate time on the graphics team, and more than having FirefoxOS prematurely cancelled, it would have to be the Connected Devices experience. I appreciate it as an opportunity to work on random semi-interesting things for a year or so, and to get some entrepreneurship training, but the mismanagement of that whole situation was pretty epic. To take a group of hundreds of UI-focused engineers and tell them that, with very little help, they should organised themselves into small teams and create IoT products still strikes me as an idea so crazy that it definitely won’t work. Certainly not the way we did it anyway. The idea, I think, was that we’d be running several internal start-ups and we’d hopefully get some marketable products out of it. What business a not-for-profit company, based primarily on doing open-source, web-based engineering has making physical, commercial products is questionable, but it failed long before that could be considered.

The process involved coming up with an idea, presenting it and getting approval to run with it. You would then repeat this approval process at various stages during development. It was, however, very hard to get approval for enough resources (both time and people) to finesse an idea long enough to make it obviously a good or bad idea. That aside, I found it very demoralising to not have the opportunity to write code that people could use. I did manage it a few times, in spite of what was happening, but none of this work I would consider myself particularly proud of. Lots of very talented people left during this period, and then at the end of it, everyone else was laid off. Not a good time.

Luckily for me and the team I was on, we were moved under the umbrella of Emerging Technologies before the lay-offs happened, and this also allowed us to refocus away from trying to make an under-featured and pointless shopping-list assistant and back onto the underlying speech-recognition technology. This brings us almost to present day now.

The DeepSpeech speech recognition project is an extremely worthwhile project, with a clear mission, great promise and interesting underlying technology. So why would I leave? Well, I’ve practically ended up on this team by a series of accidents and random happenstance. It’s been very interesting so far, I’ve learnt a lot and I think I’ve made a reasonable contribution to the code-base. I also rewrote python_speech_features in C for a pretty large performance boost, which I’m pretty pleased with. But at the end of the day, it doesn’t feel like this team will miss me. I too often spend my time finding work to do, and to be honest, I’m just not interested enough in the subject matter to make that work long-term. Most of my time on this project has been spent pushing to open it up and make it more transparent to people outside of the company. I’ve added model exporting, better default behaviour, a client library, a native client, Python bindings (+ example client) and most recently, Node.js bindings (+ example client). We’re starting to get noticed and starting to get external contributions, but I worry that we still aren’t transparent enough and still aren’t truly treating this as the open-source project it is and should be. I hope the team can push further towards this direction without me. I think it’ll be one to watch.

Next week, I start working at a new job doing a new thing. It’s odd to say goodbye to Mozilla after 6 years. It’s not easy, but many of my peers and colleagues have already made the jump, so it feels like the right time. One of the big reasons I’m moving, and moving to Impossible specifically, is that I want to get back to doing impressive work again. This is the largest regret I have about my time at Mozilla. I used to blog regularly when I worked at OpenedHand and Intel, because I was excited about the work we were doing and I thought it was impressive. This wasn’t just youthful exuberance (he says, realising how ridiculous that sounds at 32), I still consider much of the work we did to be impressive, even now. I want to be doing things like that again, and it feels like Impossible is a great opportunity to make that happen. Wish me luck!

June 15, 2017

How the Osmocom GSM stack is funded

By Harald "LaF0rge" Welte

As the topic has been raised on twitter, I thought I might share a bit of insight into the funding of the Osmocom Cellular Infrastructure Projects.

Keep in mind: Osmocom is a much larger umbrella project, and beyond the Networks-side cellular stack is home many different community-based projects around open source mobile communications. All of those have started more or less as just for fun projects, nothing serious, just a hobby [1]

The projects implementing the network-side protocol stacks and network elements of GSM/GPRS/EGPRS/UMTS cellular networks are somewhat the exception to that, as they have evolved to some extent professionalized. We call those projects collectively the Cellular Infrastructure projects inside Osmocom. This post is about that part of Osmocom only


From late 2008 through 2009, People like Holger and I were working on bs11-abis and later OpenBSC only in our spare time. The name Osmocom didn't even exist back then. There was a strong technical community with contributions from Sylvain Munaut, Andreas Eversberg, Daniel Willmann, Jan Luebbe and a few others. None of this would have been possible if it wasn't for all the help we got from Dieter Spaar with the BS-11 [2]. We all had our dayjob in other places, and OpenBSC work was really just a hobby. People were working on it, because it was where no FOSS hacker has gone before. It was cool. It was a big and pleasant challenge to enter the closed telecom space as pure autodidacts.

Holger and I were doing freelance contract development work on Open Source projects for many years before. I was mostly doing Linux related contracting, while Holger has been active in all kinds of areas throughout the FOSS software stack.

In 2010, Holger and I saw some first interest by companies into OpenBSC, including Netzing AG and On-Waves ehf. So we were able to spend at least some of our paid time on OpenBSC/Osmocom related contract work, and were thus able to do less other work. We also continued to spend tons of spare time in bringing Osmocom forward. Also, the amount of contract work we did was only a fraction of the many more hours of spare time.

In 2011, Holger and I decided to start the company sysmocom in order to generate more funding for the Osmocom GSM projects by means of financing software development by product sales. So rather than doing freelance work for companies who bought their BTS hardware from other places (and spent huge amounts of cash on that), we decided that we wanted to be a full solution supplier, who can offer a complete product based on all hardware and software required to run small GSM networks.

The only problem is: We still needed an actual BTS for that. Through some reverse engineering of existing products we figured out who one of the ODM suppliers for the hardware + PHY layer was, and decided to develop the OsmoBTS software to do so. We inherited some of the early code from work done by Andreas Eversberg on the jolly/bts branch of OsmocomBB (thanks), but much was missing at the time.

What follows was Holger and me working several years for free [3], without any salary, in order to complete the OsmoBTS software, build an embedded Linux distribution around it based on OE/poky, write documentation, etc. and complete the first sysmocom product: The sysmoBTS 1002

We did that not because we want to get rich, or because we want to run a business. We did it simply because we saw an opportunity to generate funding for the Osmocom projects and make them more sustainable and successful. And because we believe there is a big, gaping, huge vacuum in terms of absence of FOSS in the cellular telecom sphere.

Funding by means of sysmocom product sales

Once we started to sell the sysmoBTS products, we were able to fund Osmocom related development from the profits made on hardware / full-system product sales. Every single unit sold made a big contribution towards funding both the maintenance as well as the ongoing development on new features.

This source of funding continues to be an important factor today.

Funding by means of R&D contracts

The probably best and most welcome method of funding Osmocom related work is by means of R&D projects in which a customer funds our work to extend the Osmocom GSM stack in one particular area where he has a particular need that the existing code cannot fulfill yet.

This kind of project is the ideal match, as it shows where the true strength of FOSS is: Each of those customers did not have to fund the development of a GSM stack from scratch. Rather, they only had to fund those bits that were missing for their particular application.

Our reference for this is and has been On-Waves, who have been funding development of their required features (and bug fixing etc.) since 2010.

We've of course had many other projects from a variety of customers over over the years. Last, but not least, we had a customer who willingly co-funded (together with funds from NLnet foundation and lots of unpaid effort by sysmocom) the 3G/3.5G support in the Osmocom stack.

The problem here is:

  • we have not been able to secure anywhere nearly as many of those R&D projects within the cellular industry, despite believing we have a very good foundation upon which we can built. I've been writing many exciting technical project proposals
  • you almost exclusively get funding only for new features. But it's very hard to get funding for the core maintenance work. The bug-fixing, code review, code refactoring, testing, etc.

So as a result, the profit margin you have on selling R&D projects is basically used to (do a bad job of) fund those bits and pieces that nobody wants to pay for.

Funding by means of customer support

There is a way to generate funding for development by providing support services. We've had some success with this, but primarily alongside the actual hardware/system sales - not so much in terms of pure software-only support.

Also, providing support services from a R&D company means:

  • either you distract your developers by handling support inquiries. This means they will have less time to work on actual code, and likely get side tracked by too many issues that make it hard to focus
  • or you have to hire separate support staff. This of course means that the size of the support business has to be sufficiently large to not only cover the cots of hiring + training support staff, but also still generate funding for the actual software R&D.

We've tried shortly with the second option, but fallen back to the first for now. There's simply not sufficient user/admin type support business to rectify dedicated staff for that.

Funding by means of cross-subsizing from other business areas

sysmocom also started to do some non-Osmocom projects in order to generate revenue that we can feed again into Osmocom projects. I'm not at liberty to discuss them in detail, but basically we've been doing pretty much anything from

  • custom embedded Linux board designs
  • M2M devices with GSM modems
  • consulting gigs
  • public tendered research projects

Profits from all those areas went again into Osmocom development.

Last, but not least, we also operate the sysmocom webshop. The profit we make on those products also is again immediately re-invested into Osmocom development.

Funding by grants

We've had some success in securing funding from NLnet Foundation for specific features. While this is useful, the size of their projects grants of up to EUR 30k is not a good fit for the scale of the tasks we have at hand inside Osmocom. You may think that's a considerable amount of money? Well, that translates to 2-3 man-months of work at a bare cost-covering rate. At a team size of 6 developers, you would theoretically have churned through that in two weeks. Also, their focus is (understandably) on Internet and IT security, and not so much cellular communications.

There are of course other options for grants, such as government research grants and the like. However, they require long-term planning, they require you to match (i.e. pay yourself) a significant portion, and basically mandate that you hire one extra person for doing all the required paperwork and reporting. So all in all, not a particularly attractive option for a very small company consisting of die hard engineers.

Funding by more BTS ports

At sysmocom, we've been doing some ports of the OsmoBTS + OsmoPCU software to other hardware, and supporting those other BTS vendors with porting, R&D and support services.

If sysmocom was a classic BTS vendor, we would not help our "competition". However, we are not. sysmocom exists to help Osmocom, and we strongly believe in open systems and architectures, without a single point of failure, a single supplier for any component or any type of vendor lock-in.

So we happily help third parties to get Osmocom running on their hardware, either with a proprietary PHY or with OsmoTRX.

However, we expect that those BTS vendors also understand their responsibility to share the development and maintenance effort of the stack. Preferably by dedicating some of their own staff to work in the Osmocom community. Alternatively, sysmocom can perform that work as paid service. But that's a double-edged sword: We don't want to be a single point of failure.

Osmocom funding outside of sysmocom

Osmocom is of course more than sysmocom. Even for the cellular infrastructure projects inside Osmocom is true: They are true, community-based, open, collaborative development projects. Anyone can contribute.

Over the years, there have been code contributions by e.g. Fairwaves. They, too, build GSM base station hardware and use that as a means to not only recover the R&D on the hardware, but also to contribute to Osmocom. At some point a few years ago, there was a lot of work from them in the area of OsmoTRX, OsmoBTS and OsmoPCU. Unfortunately, in more recent years, they have not been able to keep up the level of contributions.

There are other companies engaged in activities with and around Osmcoom. There's Rhizomatica, an NGO helping indigenous communities to run their own cellular networks. They have been funding some of our efforts, but being an NGO helping rural regions in developing countries, they of course also don't have the deep pockets. Ideally, we'd want to be the ones contributing to them, not the other way around.

State of funding

We're making some progress in securing funding from players we cannot name [4] during recent years. We're also making occasional progress in convincing BTS suppliers to chip in their share. Unfortunately there are more who don't live up to their responsibility than those who do. I might start calling them out by name one day. The wider community and the public actually deserves to know who plays by FOSS rules and who doesn't. That's not shaming, it's just stating bare facts.

Which brings us to:

  • sysmocom is in an office that's actually too small for the team, equipment and stock. But we certainly cannot afford more space.
  • we cannot pay our employees what they could earn working at similar positions in other companies. So working at sysmocom requires dedication to the cause :)
  • Holger and I have invested way more time than we have ever paid us, even more so considering the opportunity cost of what we would have earned if we'd continued our freelance Open Source hacker path
  • we're [just barely] managing to pay for 6 developers dedicated to Osmocom development on our payroll based on the various funding sources indicated above

Nevertheless, I doubt that any such a small team has ever implemented an end-to-end GSM/GPRS/EGPRS network from RAN to Core at comparative feature set. My deepest respects to everyone involved. The big task now is to make it sustainable.


So as you can see, there's quite a bit of funding around. However, it always falls short of what's needed to implement all parts properly, and even not quite sufficient to keep maintaining the status quo in a proper and tested way. That can often be frustrating (mostly to us but sometimes also to users who run into regressions and oter bugs). There's so much more potential. So many things we wanted to add or clean up for a long time, but too little people interested in joining in, helping out - financially or by writing code.

On thing that is often a challenge when dealing with traditional customers: We are not developing a product and then selling a ready-made product. In fact, in FOSS this would be more or less suicidal: We'd have to invest man-years upfront, but then once it is finished, everyone can use it without having to partake in that investment.

So instead, the FOSS model requires the customers/users to chip in early during the R&D phase, in order to then subsequently harvest the fruits of that.

I think the lack of a FOSS mindset across the cellular / telecom industry is the biggest constraining factor here. I've seen that some 20-15 years ago in the Linux world. Trust me, it takes a lot of dedication to the cause to endure this lack of comprehension so many years later.

[1]just like Linux has started out.
[2]while you will not find a lot of commits from Dieter in the code, he has been playing a key role in doing a lot of prototyping, reverse engineering and debugging!
[3]sysmocom is 100% privately held by Holger and me, we intentionally have no external investors and are proud to never had to take a bank loan. So all we could invest was our own money and, most of all, time.
[4]contrary to the FOSS world, a lot of aspects are confidential in business, and we're not at liberty to disclose the identities of all our customers

FOSS misconceptions, still in 2017

By Harald "LaF0rge" Welte

The lack of basic FOSS understanding in Telecom

Given that the Free and Open Source movement has been around at least since the 1980ies, it puzzles me that people still seem to have such fundamental misconceptions about it.

Something that really triggered me was an article at LightReading [1] which quotes Ulf Ewaldsson, a leading Ericsson excecutive with

"I have yet to understand why we would open source something we think is really good software"

This completely misses the point. FOSS is not about making a charity donation of a finished product to the planet.

FOSS is about sharing the development costs among multiple players, and avoiding that everyone has to reimplement the wheel. Macro-Economically, it is complete and utter nonsense that each 3GPP specification gets implemented two dozens of times, by at least a dozen of different entities. As a result, products are way more expensive than needed.

If large Telco players (whether operators or equipment manufacturers) were to collaboratively develop code just as much as they collaboratively develop the protocol specifications, there would be no need for replicating all of this work.

As a result, everyone could produce cellular network elements at reduced cost, sharing the R&D expenses, and competing in key areas, such as who can come up with the most energy-efficient implementation, or can produce the most reliable hardware, the best receiver sensitivity, the best and most fair scheduling implementation, or whatever else. But some 80% of the code could probably be shared, as e.g. encoding and decoding messages according to a given publicly released 3GPP specification document is not where those equipment suppliers actually compete.

So my dear cellular operator executives: Next time you're cursing about the prohibitively expensive pricing that your equipment suppliers quote you: You only have to pay that much because everyone is reimplementing the wheel over and over again.

Equally, my dear cellular infrastructure suppliers: You are all dying one by one, as it's hard to develop everything from scratch. Over the years, many of you have died. One wonders, if we might still have more players left, if some of you had started to cooperate in developing FOSS at least in those areas where you're not competing. You could replicate what Linux is doing in the operating system market. There's no need in having a phalanx of different proprietary flavors of Unix-like OSs. It's way too expansive, and it's not an area in which most companies need to or want to compete anyway.

Management Summary

You don't first develop and entire product until it is finished and then release it as open source. This makes little economic sense in a lot of cases, as you've already invested into developing 100% of it. Instead, you actually develop a new product collaboratively as FOSS in order to not have to invest 100% but maybe only 30% or even less. You get a multitude of your R&D investment back, because you're not only getting your own code, but all the other code that other community members implemented. You of course also get other benefits, such as peer review of the code, more ideas (not all bright people work inside one given company), etc.

[1]that article is actually a heavily opinionated post by somebody who appears to be pushing his own anti-FOSS agenda for some time. The author is misinformed about the fact that the TIP has always included projects under both FRAND and FOSS terms. As a TIP member I can attest to that fact. I'm only referencing it here for the purpose of that that Ericsson quote.

May 28, 2017

Playing back GSM RTP streams, RTP-HR bugs

By Harald "LaF0rge" Welte

Chapter 0: Problem Statement

In an all-IP GSM network, where we use Abis, A and other interfaces within the cellular network over IP transport, the audio of voice calls is transported inside RTP frames. The codec payload in those RTP frames is the actual codec frame of the respective cellular voice codec. In GSM, there are four relevant codecs: FR, HR, EFR and AMR.

Every so often during the (meanwhile many years of ) development of Osmocom cellular infrastructure software it would have been useful to be able to quickly play back the audio for analysis of given issues.

However, until now we didn't have that capability. The reason is relatively simple: In Osmocom, we genally don't do transcoding but simply pass the voice codec frames from left to right. They're only transcoded inside the phones or inside some external media gateway (in case of larger networks).

Chapter 1: GSM Audio Pocket Knife

Back in 2010, when we were very actively working on OsmocomBB, the telephone-side GSM protocol stack implementation, Sylvain Munaut wrote the GSM Audio Pocket Knife (gapk) in order to be able to convert between different formats (representations) of codec frames. In cellular communcations, everyoe is coming up with their own representation for the codec frames: The way they look on E1 as a TRAU frame is completely different from how RTP payload looks like, or what the TI Calypso DSP uses internally, or what a GSM Tester like the Racal 61x3 uses. The differences are mostly about data types used, bit-endinanness as well as padding and headers. And of course those different formats exist for each of the four codecs :/

In 2013 I first added simplistic RTP support for FR-GSM to gapk, which was sufficient for my debugging needs back then. Still, you had to save the decoded PCM output to a file and play that back, or use a pipe into aplay.

Last week, I picked up this subject again and added a long series of patches to gapk:

  • support for variable-length codec frames (required for AMR support)
  • support for AMR codec encode/decode using libopencore-amrnb
  • support of all known RTP payload formats for all four codecs
  • support for direct live playback to a sound card via ALSA

All of the above can now be combined to make GAPK bind to a specified UDP port and play back the RTP codec frames that anyone sends to that port using a command like this:

$ gapk -I -f rtp-amr -A default -g rawpcm-s16le

I've also merged a chance to OsmoBSC/OsmoNITB which allows the administrator to re-direct the voice of any active voice channel towards a user-specified IP address and port. Using that you can simply disconnect the voice stream from its normal destination and play back the audio via your sound card.

Chapter 2: Bugs in OsmoBTS GSM-HR

While going through the exercise of implementing the above extension to gapk, I had lots of trouble to get it to work for GSM-HR.

After some more digging, it seems there are two conflicting specification on how to format the RTP payload for half-rate GSM:

In Osmocom, we claim to implement RFC5993, but it turned out that (at least) osmo-bts-sysmo (for sysmoBTS) was actually implementing the ETSI format instead.

And even worse, osmo-bts-sysmo gets event the ETSI format wrong. Each of the codec parameters (which are unaligned bit-fields) are in the wrong bit-endianness :(

Both the above were coincidentially also discovered by Sylvain Munaut during operating of the 32C3 GSM network in December 2015 and resulted the two following "work around" patches: * HACK for HR * HACK: Fix the bit order in HR frames

Those merely worked around those issues in the rtp_proxy of OsmoNITB, rather than addressing the real issue. That's ok, they were "quick" hacks to get something working at all during a four-day conference. I'm now working on "real" fixes in osmo-bts-sysmo. The devil is of course in the details, when people upgrade one BTS but not the other and want to inter-operate, ...

It yet remains to be investigated how osmo-bts-trx and other osmo-bts ports behave in this regard.

Chapter 3: Conclusions

Most definitely it is once again a very clear sign that more testing is required. It's tricky to see even wih osmo-gsm-tester, as GSM-HR works between two phones or even two instances of osmo-bts-sysmo, as both sides of the implementation have the same (wrong) understanding of the spec.

Given that we can only catch this kind of bug together with the hardware (the DSP runs the PHY code), pure unit tests wouldn't catch it. And the end-to-end test is also not very well suited to it. It seems to call for something in betewen. Something like an A-bis interface level test.

We need more (automatic) testing. I cannot say that often enough. The big challenge is how to convince contributors and customers that they should invest their time and money there, rather than yet-another (not automatically tested) feature?

May 27, 2017

Free Ideas for UI Frameworks, or How To Achieve Polished UI

By Chris Lord

Ever since the original iPhone came out, I’ve had several ideas about how they managed to achieve such fluidity with relatively mediocre hardware. I mean, it was good at the time, but Android still struggles on hardware that makes that look like a 486… It’s absolutely my fault that none of these have been implemented in any open-source framework I’m aware of, so instead of sitting on these ideas and trotting them out at the pub every few months as we reminisce over what could have been, I’m writing about them here. I’m hoping that either someone takes them and runs with them, or that they get thoroughly debunked and I’m made to look like an idiot. The third option is of course that they’re ignored, which I think would be a shame, but given I’ve not managed to get the opportunity to implement them over the last decade, that would hardly be surprising. I feel I should clarify that these aren’t all my ideas, but include a mix of observation of and conjecture about contemporary software. This somewhat follows on from the post I made 6 years ago(!) So let’s begin.

1. No main-thread UI

The UI should always be able to start drawing when necessary. As careful as you may be, it’s practically impossible to write software that will remain perfectly fluid when the UI can be blocked by arbitrary processing. This seems like an obvious one to me, but I suppose the problem is that legacy makes it very difficult to adopt this at a later date. That said, difficult but not impossible. All the major web browsers have adopted this policy, with caveats here and there. The trick is to switch from the idea of ‘painting’ to the idea of ‘assembling’ and then using a compositor to do the painting. Easier said than done of course, most frameworks include the ability to extend painting in a way that would make it impossible to switch to a different thread without breaking things. But as long as it’s possible to block UI, it will inevitably happen.

2. Contextually-aware compositor

This follows on from the first point; what’s the use of having non-blocking UI if it can’t respond? Input needs to be handled away from the main thread also, and the compositor (or whatever you want to call the thread that is handling painting) needs to have enough context available that the first response to user input doesn’t need to travel to the main thread. Things like hover states, active states, animations, pinch-to-zoom and scrolling all need to be initiated without interaction on the main thread. Of course, main thread interaction will likely eventually be required to update the view, but that initial response needs to be able to happen without it. This is another seemingly obvious one – how can you guarantee a response rate unless you have a thread dedicated to responding within that time? Most browsers are doing this, but not going far enough in my opinion. Scrolling and zooming are often catered for, but not hover/active states, or initialising animations (note; initialising animations. Once they’ve been initialised, they are indeed run on the compositor, usually).

3. Memory bandwidth budget

This is one of the less obvious ideas and something I’ve really wanted to have a go at implementing, but never had the opportunity. A problem I saw a lot while working on the platform for both Firefox for Android and FirefoxOS is that given the work-load of a web browser (which is not entirely dissimilar to the work-load of any information-heavy UI), it was very easy to saturate memory bandwidth. And once you saturate memory bandwidth, you end up having to block somewhere, and painting gets delayed. We’re assuming UI updates are asynchronous (because of course – otherwise we’re blocking on the main thread). I suggest that it’s worth tracking frame time, and only allowing large asynchronous transfers (e.g. texture upload, scaling, format transforms) to take a certain amount of time. After that time has expired, it should wait on the next frame to be composited before resuming (assuming there is a composite scheduled). If the composited frame was delayed to the point that it skipped a frame compared to the last unladen composite, the amount of time dedicated to transfers should be reduced, or the transfer should be delayed until some arbitrary time (i.e. it should only be considered ok to skip a frame every X ms).

It’s interesting that you can see something very similar to this happening in early versions of iOS (I don’t know if it still happens or not) – when scrolling long lists with images that load in dynamically, none of the images will load while the list is animating. The user response was paramount, to the point that it was considered more important to present consistent response than it was to present complete UI. This priority, I think, is a lot of the reason the iPhone feels ‘magic’ and Android phones felt like junk up until around 4.0 (where it’s better, but still not as good as iOS).

4. Level-of-detail

This is something that I did get to partially implement while working on Firefox for Android, though I didn’t do such a great job of it so its current implementation is heavily compromised from how I wanted it to work. This is another idea stolen from game development. There will be times, during certain interactions, where processing time will be necessarily limited. Quite often though, during these times, a user’s view of the UI will be compromised in some fashion. It’s important to understand that you don’t always need to present the full-detail view of a UI. In Firefox for Android, this took the form that when scrolling fast enough that rendering couldn’t keep up, we would render at half the resolution. This let us render more, and faster, giving the impression of a consistent UI even when the hardware wasn’t quite capable of it. I notice Microsoft doing similar things since Windows 8; notice how the quality of image scaling reduces markedly while scrolling or animations are in progress. This idea is very implementation-specific. What can be dropped and what you want to drop will differ between platforms, form-factors, hardware, etc. Generally though, some things you can consider dropping: Sub-pixel anti-aliasing, high-quality image scaling, render resolution, colour-depth, animations. You may also want to consider showing partial UI if you know that it will very quickly be updated. The Android web-browser during the Honeycomb years did this, and I attempted (with limited success, because it’s hard…) to do this with Firefox for Android many years ago.


I think it’s easy to read ideas like this and think it boils down to “do everything asynchronously”. Unfortunately, if you take a naïve approach to that, you just end up with something that can be inexplicably slow sometimes and the only way to fix it is via profiling and micro-optimisations. It’s very hard to guarantee a consistent experience if you don’t manage when things happen. Yes, do everything asynchronously, but make sure you do your book-keeping and you manage when it’s done. It’s not only about splitting work up, it’s about making sure it’s done when it’s smart to do so.

You also need to be careful about how you measure these improvements, and to be aware that sometimes results in synthetic tests will even correlate to the opposite of the experience you want. A great example of this, in my opinion, is page-load speed on desktop browsers. All the major desktop browsers concentrate on prioritising the I/O and computation required to get the page to 100%. For heavy desktop sites, however, this means the browser is often very clunky to use while pages are loading (yes, even with out-of-process tabs – see the point about bandwidth above). I highlight this specifically on desktop, because you’re quite likely to not only be browsing much heavier sites that trigger this behaviour, but also to have multiple tabs open. So as soon as you load a couple of heavy sites, your entire browsing experience is compromised. I wouldn’t mind the site taking a little longer to load if it didn’t make the whole browser chug while doing so.

Don’t lose sight of your goals. Don’t compromise. Things might take longer to complete, deadlines might be missed… But polish can’t be overrated. Polish is what people feel and what they remember, and the lack of it can have a devastating effect on someone’s perception. It’s not always conscious or obvious either, even when you’re the developer. Ask yourself “Am I fully satisfied with this” before marking something as complete. You might still be able to ship if the answer is “No”, but make sure you don’t lose sight of that and make sure it gets the priority it deserves.

One last point I’ll make; I think to really execute on all of this, it requires buy-in from everyone. Not just engineers, not just engineers and managers, but visual designers, user experience, leadership… Everyone. It’s too easy to do a job that’s good enough and it’s too much responsibility to put it all on one person’s shoulders. You really need to be on the ball to produce the kind of software that Apple does almost routinely, but as much as they’d say otherwise, it isn’t magic.

May 23, 2017

Power-cycling a USB port should be simple, right?

By Harald "LaF0rge" Welte

Every so often I happen to be involved in designing electronics equipment that's supposed to run reliably remotely in inaccessible locations,without any ability for "remote hands" to perform things like power-cycling or the like. I'm talking about really remote locations, possible with no but limited back-haul, and a very high cost of ever sending somebody there for remote maintenance.

Given that a lot of computer peripherals (chips, modules, ...) use USB these days, this is often some kind of an embedded ARM (rarely x86) SoM or SBC, which is hooked up to a custom board that contains a USB hub chip as well as a line of peripherals.

One of the most important lectures I've learned from experience is: Never trust reset signals / lines, always include power-switching capability. There are many chips and electronics modules available on the market that have either no RESET, or even might claim to have a hardware RESET line which you later (painfully) discover just to be a GPIO polled by software which can get stuck, and hence no way to really hard-reset the given component.

In the case of a USB-attached device (even though the USB might only exist on a circuit board between two ICs), this is typically rather easy: The USB hub is generally capable of switching the power of its downstream ports. Many cheap USB hubs don't implement this at all, or implement only ganged switching, but if you carefully select your USB hub (or in the case of a custom PCB), you can make sure that the given USB hub supports individual port power switching.

Now the next step is how to actually use this from your (embedded) Linux system. It turns out to be harder than expected. After all, we're talking about a standard feature that's present in the USB specifications since USB 1.x in the late 1990ies. So the expectation is that it should be straight-forward to do with any decent operating system.

I don't know how it's on other operating systems, but on Linux I couldn't really find a proper way how to do this in a clean way. For more details, please read my post to the linux-usb mailing list.

Why am I running into this now? Is it such a strange idea? I mean, power-cycling a device should be the most simple and straight-forward thing to do in order to recover from any kind of "stuck state" or other related issue. Logical enabling/disabling of the port, resetting the USB device via USB protocol, etc. are all just "soft" forms of a reset which at best help with USB related issues, but not with any other part of a USB device.

And in the case of e.g. an USB-attached cellular modem, we're actually talking about a multi-processor system with multiple built-in micro-controllers, at least one DSP, an ARM core that might run another Linux itself (to implement the USB gadget), ... - certainly enough complex software that you would want to be able to power-cycle it...

I'm curious what the response of the Linux USB gurus is.

May 17, 2017

CAMEL and protocol design

By Holger "zecke" Freyther

Today I want to share the pain of running a production 3GPP TCAP/MAP/CAP system and network protocol design in general. The excellent Free Software ASN1/TCAP/MAP/CAP stack (which is made possible by the Pharo live programming environment) I helped creating is in heavy production usage (powering standard off-the-shelf components like a SGSN, an AuC or non-standard components to enable new business cases) and sees roaming traffic from a lot of networks. From time to time something odd comes up.

In TCAP/MAP/CAP messages but also Request/Response and the possible Errors are defined using ASN1. Over the last decades ETSI and 3GPP have made various major versions and minor releases (e.g. adding new optional attributes to requests/responses/errors). The biggest new standard is CAMEL and it is so big and complicated that it was specified in four phases (each phase with their own versions of the ApplicationContext, think of it as an versioned and entry into the definition for all messages and RPC calls).

One issue in supporting a specific module version (application-context-name) is to find the right minor release of 3GPP (either the newest or oldest for that ACN). Then it is a matter to copy and paste the ASN1 definition from either a PDF or a WordDocument into individual files.. and after that is done one can fix the broken imports (or modify the ASN1 parser to make a global look-up) and typos for elements.

This artificial barrier creates two issue for people implementing MAP/CAP using components. Some use inferior ASN1 tools or can’t be bothered to create the input files and decide to hardcode the message content (after all BER/DER is more or less just nested TLV entries). The second issue is related to time/effort as well. When creating the CAMEL ASN1 files I didn’t want to do the work four times (once for each phase) and searched for shortcuts too.

The first issue materialized itself by equipment sending completely broken messages or not sending mandatory(!) elements. So what happens if a big telco sends you a message the stack can’t decode, you look up the oldest and youngest release defining this ACN and see the element that is attempted to be parsed was always mandatory? Right, one adds an OPTIONAL modifier to be able to move forward…

The second issue is on me though. I started with a set of CAMEL phase3 files and assumed that only the operations (and their arguments/response) would be different across different CAMEL phases but the support structs they use would stay the same. My assumption (and this brings us to protocol design) was that besides the versioning of the module they would be conservative and extend supporting types in a forward compatible way and integrated phase2 and phase1 into the same set of files.

And then reality sets in and the logs of the system showed a message that caused an exception during parsing (normally only happens for the first kind of issue). An extension to the Request structure was changed in a not forward compatible way. Let’s have a look:

InitialDPArgExtension ::= SEQUENCE {

-naCarrierInformation [0] NACarrierInformation OPTIONAL,
-gmscAddress [1] ISDN-AddressString OPTIONAL,
+ gmscAddress [0] ISDN-AddressString OPTIONAL,
*more new optional elements*
+ …,
+ enhancedDialledServicesAllowed [11] NULL OPTIONAL,
*more elements after the extension marker*

So one element (naCarrierInformation) got removed and then every following element was renumbered and the extension marker was moved further down. In theory the InitialDPArgExtension name binding exists once in the phase2 to definition and once in phase3 and 3GPP had all rights to define a new binding with different. An engineering question is if this was a good decision?

A change in application-context allows to remove some old cruft and make room for new. The tag space might be considered a scarce resource and making room is saving a resource. On the other hand in the history of GSM no other struct had ran out of tags and there are various other approaches to the problem. The above is already an extension to an extension and the step to an extension of an extension of an extension doesn’t seem so absurd anymore.

So please think of forward compatibility when designing protocols, think of the implementor and make the definition machine readable and please get the imports right so one doesn’t need to resort to a global symbol search. If you are having interesting core network issues related to TCAP, MAP and CAP consider contacting me.

May 06, 2017

MariaDB Galera and custom health probe for Azure LoadBalancer

By Holger "zecke" Freyther

My Galera set-up on Kubernetes and the Azure LoadBalancer in front of it seem to work nicely but one big TODO is to implement proper health checks. If a node is down, in maintenance or split from the network it should not be part of the LoadBalancer. The Azure LoadBalancer has support for custom HTTP probes and I wanted to write something very simple that handles the HTTP GET, opens a MySQL connection to the destination, check if it is connected to a primary. As this is about health checks the code should be small and reliable.

To improve my Go(-lang) skills I decided to write my healthcheck in Go. And it seemed like a good idea, Go has a powerful HTTP package, a SQL API package and two MySQL implementations. So the entire prototype is just about 72 lines (with comments and empty lines) and I think that qualifies as small. Prototyping the MySQL code took some iterations but in general it went quite quickly. But how reliable is it? Go introduced the nice concept of a context.Context. So any operation should be associated with a context and it should be passed as argument from one method to another. One can create a child context and associate it with a deadline (absolute time) or timeout (relative) and has a way to cancel it.

I grabbed the Context from the HTTP Request, added a timeout and called a function to do the MySQL check. Wow that was easy. Some polish to parse the parameters from the CLI and I am ready to deploy it! But let’s see how reliable it is?

I imagined the following error conditions:

  1. The destination IP is reachable but no one listening on the port. The TCP connection will fail quickly (SYN -> RST,ACK)
  2. The destination IP ends in a blackhole (no RST, ACK) received. One would have a large connect timeout
  3. The Galera node (or machine hosting it) is overloaded. While the connect succeeds the authentication or a query might stall
  4. The Galera node is split and not a master

The first and fourth error conditions are easy to test/simulate and trivial to implement properly. I then moved to the third one. My first choice was to implement an infinitely slow Galera node and did that by using nc -l 3006 to accept a TCP connection and then send nothing. I made a healthprobe and waited… and waited.. no timeout. Not after 2s as programmed in the context, not after 2min and not after.. (okay I gave up after 30 min). Pretty discouraging!

After some reading and browsing I saw an open PR to add context.Context support to the MySQL backend. I modified my import, ran go get to fetch it, go build and retested. Okay that didn’t work either. So let’s try the other MySQL implementation, again change the package imports, go get and go build and retest. I picked the wrong package name but even after picking the right package this driver failed to parse the Database URL. At that point I decided to go back to the first implementation and have a deeper look.

So while many of the SQL API methods take a Context as argument, the Open one does not. Open says it might or might not connect to the database and in case of MySQL it does connect to it. Let’s see if there is a workaround? I could spawn a Go routine and have a selective receive on the result or a timeout. While this would make it possible to respond to the HTTP request it does create two issues. First one can’t cancel Go routines and I would leak memory, but worse I might run into a connection limit of the Galera node. What about other workarounds? It seems I can play with a custom parameter for readTimeout and writeTimeout and at least limit the timeout per I/O operation. I guess it takes a bit of tuning to find good values for a busy system and let’s hope that context.Context will be used more in more places in the future.

May 02, 2017

OsmoDevCon 2017 Review

By Harald "LaF0rge" Welte

After the public user-oriented OsmoCon 2017, we also recently had the 6th incarnation of our annual contributors-only Osmocom Developer Conference: The OsmoDevCon 2017.

This is a much smaller group, typically about 20 people, and is limited to actual developers who have a past record of contributing to any of the many Osmocom projects.

We had a large number of presentation and discussions. In fact, so large that the schedule of talks extended from 10am to midnight on some days. While this is great, it also means that there was definitely too little time for more informal conversations, chatting or even actual work on code.

We also have such a wide range of topics and scope inside Osmocom, that the traditional ad-hoch scheduling approach no longer seems to be working as it used to. Not everyone is interested in (or has time for) all the topics, so we should group them according to their topic/subject on a given day or half-day. This will enable people to attend only those days that are relevant to them, and spend the remaining day in an adjacent room hacking away on code.

It's sad that we only have OsmoDevCon once per year. Maybe that's actually also something to think about. Rather than having 4 days once per year, maybe have two weekends per year.

Always in motion the future is.

Overhyped Docker

By Harald "LaF0rge" Welte

Overhyped Docker missing the most basic features

I've always been extremely skeptical of suddenly emerging over-hyped technologies, particularly if they advertise to solve problems by adding yet another layer to systems that are already sufficiently complex themselves.

There are of course many issues with containers, ranging from replicated system libraries and the basic underlying statement that you're giving up on the system packet manager to properly deal with dependencies.

I'm also highly skeptical of FOSS projects that are primarily driven by one (VC funded?) company. Especially if their offering includes a so-called cloud service which they can stop to operate at any given point in time, or (more realistically) first get everybody to use and then start charging for.

But well, despite all the bad things I read about it over the years, on one day in May 2017 I finally thought let's give it a try. My problem to solve as a test balloon is fairly simple.

My basic use case

The plan is to start OsmoSTP, the m3ua-testtool and the sua-testtool, which both connect to OsmoSTP. By running this setup inside containers and inside an internal network, we could then execute the entire testsuite e.g. during jenkins test without having IP address or port number conflicts. It could even run multiple times in parallel on one buildhost, verifying different patches as part of the continuous integration setup.

This application is not so complex. All it needs is three containers, an internal network and some connections in between. Should be a piece of cake, right?

But enter the world of buzzword-fueled web-4000.0 software-defined virtualised and orchestrated container NFW + SDN vodoo: It turns out to be impossible, at least not with the preferred tools they advertise.


The part that worked relatively easily was writing a few Dockerfiles to build the actual containers. All based on debian:jessie from the library.

As m3ua-testsuite is written in guile, and needs to build some guile plugin/extension, I had to actually include guile-2.0-dev and other packages in the container, making it a bit bloated.

I couldn't immediately find a nice example Dockerfile recipe that would allow me to build stuff from source outside of the container, and then install the resulting binaries into the container. This seems to be a somewhat weak spot, where more support/infrastructure would be helpful. I guess the idea is that you simply install applications via package feeds and apt-get. But I digress.

So after some tinkering, I ended up with three docker containers:

  • one running OsmoSTP
  • one running m3ua-testtool
  • one running sua-testtool

I also managed to create an internal bridged network between the containers, so the containers could talk to one another.

However, I have to manually start each of the containers with ugly long command line arguments, such as docker run --network sigtran --ip -it osmo-stp-master. This is of course sub-optimal, and what Docker Services + Stacks should resolve.

Services + Stacks

The idea seems good: A service defines how a given container is run, and a stack defines multiple containers and their relation to each other. So it should be simple to define a stack with three services, right?

Well, it turns out that it is not. Docker documents that you can configure a static ipv4_address [1] for each service/container, but it seems related configuration statements are simply silently ignored/discarded [2], [3], [4].

This seems to be related that for some strange reason stacks can (at least in later versions of docker) only use overlay type networks, rather than the much simpler bridge networks. And while bridge networks appear to support static IP address allocations, overlay apparently doesn't.

I still have a hard time grasping that something that considers itself a serious product for production use (by a company with estimated value over a billion USD, not by a few hobbyists) that has no support for running containers on static IP addresses. that. How many applications out there have I seen that require static IP address configuration? How much simpler do setups get, if you don't have to rely on things like dynamic DNS updates (or DNS availability at all)?

So I'm stuck with having to manually configure the network between my containers, and manually starting them by clumsy shell scripts, rather than having a proper abstraction for all of that. Well done :/

Exposing Ports

Unrelated to all of the above: If you run some software inside containers, you will pretty soon want to expose some network services from containers. This should also be the most basic task on the planet.

However, it seems that the creators of docker live in the early 1980ies, where only TCP and UDP transport protocols existed. They seem to have missed that by the late 1990ies to early 2000s, protocols like SCTP or DCCP were invented.

But yet, in 2017, Docker chooses to

Now some of the readers may think 'who uses SCTP anyway'. I will give you a straight answer: Everyone who has a mobile phone uses SCTP. This is due to the fact that pretty much all the connections inside cellular networks (at least for 3G/4G networks, and in reality also for many 2G networks) are using SCTP as underlying transport protocol, from the radio access network into the core network. So every time you switch your phone on, or do anything with it, you are using SCTP. Not on your phone itself, but by all the systems that form the network that you're using. And with the drive to C-RAN, NFV, SDN and all the other buzzwords also appearing in the Cellular Telecom field, people should actually worry about it, if they want to be a part of the software stack that is used in future cellular telecom systems.


After spending the better part of a day to do something that seemed like the most basic use case for running three networked containers using Docker, I'm back to step one: Most likely inventing some custom scripts based on unshare to run my three test programs in a separate network namespace for isolated execution of test suite execution as part of a Jenkins CI setup :/

It's also clear that Docker apparently don't care much about playing a role in the Cellular Telecom world, which is increasingly moving away from proprietary and hardware-based systems (like STPs) to virtualised, software-based systems.


May 01, 2017

Book on Practical GPL Compliance

By Harald "LaF0rge" Welte

My former gpl-violations.org colleague Armijn Hemel and Shane Coughlan (former coordinator of the FSFE Legal Network) have written a book on practical GPL compliance issues.

I've read through it (in the bath tub of course, what better place to read technical literature), and I can agree wholeheartedly with its contents. For those who have been involved in GPL compliance engineering there shouldn't be much new - but for the vast majority of developers out there who have had little exposure to the bread-and-butter work of providing complete an corresponding source code, it makes an excellent introductory text.

The book focuses on compliance with GPLv2, which is probably not too surprising given that it's published by the Linux foundation, and Linux being GPLv2.

You can download an electronic copy of the book from https://www.linuxfoundation.org/news-media/research/practical-gpl-compliance

Given the subject matter is Free Software, and the book is written by long-time community members, I cannot help to notice a bit of a surprise about the fact that the book is released in classic copyright under All rights reserved with no freedom to the user.

Considering the sensitive legal topics touched, I can understand the possible motivation by the authors to not permit derivative works. But then, there still are licenses such as CC-BY-ND which prevent derivative works but still permit users to make and distribute copies of the work itself. I've made that recommendation / request to Shane, let's see if they can arrange for some more freedom for their readers.

April 30, 2017

OsmoCon 2017 Review

By Harald "LaF0rge" Welte

It's already one week past the event, so I really have to sit down and write some rewview on the first public Osmocom Conference ever: OsmoCon 2017.

The event was a huge success, by all accounts.

  • We've not only been sold out, but we also had to turn down some last minute registrations due to the venue being beyond capacity (60 seats). People traveled from Japan, India, the US, Mexico and many other places to attend.
  • We've had an amazing audience ranging from commercial operators to community cellular operators to professional developers doing work relate to osmocom, academia, IT security crowds and last but not least enthusiasts/hobbyists, with whom the project[s] started.
  • I've received exclusively positive feedback from many attendees
  • We've had a great programme. Some part of it was of introductory nature and probably not too interesting if you've been in Osmocom for a few years. However, the work on 3G as well as the current roadmap was probably not as widely known yet. Also, I really loved to see Roch's talk about Running a commercial cellular network with Osmocom software as well as the talk on Facebook's OpenCellular BTS hardware and the Community Cellular Manager.
  • We have very professional live streaming + video recordings courtesy of the C3VOC team. Thanks a lot for your support and for having the video recordings of all talks online already at the next day after the event.

We also received some requests for improvements, many of which we will hopefully consider before the next Osmocom Conference:

  • have a multiple day event. Particularly if you're traveling long-distance, it is a lot of overhead for a single-day event. We of course fully understand that. On the other hand, it was the first Osmocom Conference, and hence it was a test balloon where it was initially unclear if we'll be able to get a reasonable number of attendees interested at all, or not. And organizing an event with venue and talks for multiple days if in the end only 10 people attend would have been a lot of effort and financial risk. But now that we know there are interested folks, we can definitely think of a multiple day event next time
  • Signs indicating venue details on the last meters. I agree, this cold have been better. The address of the venue was published, but we could have had some signs/posters at the door pointing you to the right meeting room inside the venue. Sorry for that.
  • Better internet connectivity. This is a double-edged sword. Of course we want our audience to be primarily focused on the talks and not distracted :P I would hope that most people are able to survive a one day event without good connectivity, but for sure we will have to improve in case of a multiple-day event in the future

In terms of my requests to the attendees, I only have one

  • Participate in the discussions on the schedule/programme while it is still possible to influence it. When we started to put together the programme, I posted about it on the openbsc mailing list and invited feedback. Still, most people seem to have missed the time window during which talks could have been submitted and the schedule still influenced before finalizing it
  • Register in time. We have had almost no registrations until about two weeks ahead of the event (and I was considering to cancel it), and then suddenly were sold out in the week ahead of the event. We've had people who first booked their tickets, only to learn that the tickets were sold out. I guess we will introduce early bird pricing and add a very expensive last minute ticket option next year in order to increase motivation to register early and thus give us flexibility regarding venue planning.

Thanks again to everyone involved in OsmoCon 2017!

Ok, now, all of you who missed the event: Go to https://media.ccc.de/c/osmocon17 and check out the recordings. Have fun!

April 24, 2017

Troubleshooting Kubernetes/Azure Storage

By Holger "zecke" Freyther

In my previous posts I wrote about my set-up of MariaDB Galera on Kubernetes. Now I have some first experience with this set-up and can provide some guidance. I used an ill-fated TCP health-check that lead to MariaDB Galera blocking the originating IPv4 address from accessing the cluster due to never completing a MySQL handshake and it seems (logs are gone) that this lead to the sync between different systems breaking too.

When I woke up my entire cluster was down and didn’t recover. Some pods restarted and I run into a Azure Kubernetes bug where a Persistent Storage would be umounted but not detached. This means the storage can not be re-attached to the new pod. The Microsoft upstream project is a bit hostile but the issue is known. If you are seeing an error about the storage still being detached/attached. You can go to the portal, find the agent that has it attached and detach it by hand.

To bring the cluster back online there is a chicken/egg problem. The entrypoint.sh discovers the members of the cluster by using environment variables. If the cluster is entirely down and the first pod is starting, it will just exit as it can’t connect to the others. My first approach was to keep the other nodes down and use kubectl edit rc/galera-node-X and set replicas to 0. But then the service is still exporting the information. In the end I deleted the srv/galera-node-X and waited for the first pod to start. Once it was up I could re-create the services again.

My next steps are to add proper health checks, some monitoring and see if there is a more long term archive for the log data of a (deleted) pod.


April 16, 2017

Things you find when using SCTP on Linux

By Harald "LaF0rge" Welte

Observations on SCTP and Linux

When I was still doing Linux kernel work with netfilter/iptables in the early 2000's, I was somebody who actually regularly had a look at the new RFCs that came out. So I saw the SCTP RFCs, SIGTRAN RFCs, SIP and RTP, etc. all released during those years. I was quite happy to see that for new protocols like SCTP and later DCCP, Linux quickly received a mainline implementation.

Now most people won't have used SCTP so far, but it is a protocol used as transport layer in a lot of telecom protocols for more than a decade now. Virtually all protocols that have traditionally been spoken over time-division multiplex E1/T1 links have been migrated over to SCTP based protocol stackings.

Working on various Open Source telecom related projects, i of course come into contact with SCTP every so often. Particularly some years back when implementing the Erlang SIGTAN code in erlang/osmo_ss7 and most recently now with the introduction of libosmo-sigtran with its OsmoSTP, both part of the libosmo-sccp repository.

I've also hard to work with various proprietary telecom equipment over the years. Whether that's some eNodeB hardware from a large brand telecom supplier, or whether it's a MSC of some other vendor. And they all had one thing in common: Nobody seemed to use the Linux kernel SCTP code. They all used proprietary implementations in userspace, using RAW sockets on the kernel interface.

I always found this quite odd, knowing that this is the route that you have to take on proprietary OSs without native SCTP support, such as Windows. But on Linux? Why? Based on rumors, people find the Linux SCTP implementation not mature enough, but hard evidence is hard to come by.

As much as it pains me to say this, the kind of Linux SCTP bugs I have seen within the scope of our work on Osmocom seem to hint that there is at least some truth to this (see e.g. https://bugzilla.redhat.com/show_bug.cgi?id=1308360 or https://bugzilla.redhat.com/show_bug.cgi?id=1308362).

Sure, software always has bugs and will have bugs. But we at Osmocom are 10-15 years "late" with our implementations of higher-layer protocols compared to what the mainstream telecom industry does. So if we find something, and we find it even already during R&D of some userspace code, not even under load or in production, then that seems a bit unsettling.

One would have expected, with all their market power and plenty of Linux-based devices in the telecom sphere, why did none of those large telecom suppliers invest in improving the mainline Linux SCTP code? I mean, they all use UDP and TCP of the kernel, so it works for most of the other network protocols in the kernel, but why not for SCTP? I guess it comes back to the fundamental lack of understanding how open source development works. That it is something that the given industry/user base must invest in jointly.

The leatest discovered bug

During the last months, I have been implementing SCCP, SUA, M3UA and OsmoSTP (A Signal Transfer Point). They were required for an effort to add 3GPP compliant A-over-IP to OsmoBSC and OsmoMSC.

For quite some time I was seeing some erratic behavior when at some point the STP would not receive/process a given message sent by one of the clients (ASPs) connected. I tried to ignore the problem initially until the code matured more and more, but the problems remained.

It became even more obvious when using Michael Tuexen's m3ua-testtool, where sometimes even the most basic test cases consisting of sending + receiving a single pair of messages like ASPUP -> ASPUP_ACK was failing. And when the test case was re-tried, the problem often disappeared.

Also, whenever I tried to observe what was happening by meas of strace, the problem would disappear completely and never re-appear until strace was detached.

Of course, given that I've written several thousands of lines of new code, it was clear to me that the bug must be in my code. Yesterday I was finally prepare to accept that it might actually be a Linux SCTP bug. Not being able to reproduce that problem on a FreeBSD VM also pointed clearly into this direction.

Now I could simply have collected some information and filed a bug report (which some kernel hackers at RedHat have thankfully invited me to do!), but I thought my use case was too complex. You would have to compile a dozen of different Osmocom libraries, configure the STP, run the scheme-language m3ua-testtool in guile, etc. - I guess nobody would have bothered to go that far.

So today I tried to implement a test case that reproduced the problem in plain C, without any external dependencies. And for many hours, I couldn't make the bug to show up. I tried to be as close as possible to what was happening in OsmoSTP: I used non-blocking mode on client and server, used the SCTP_NODELAY socket option, used the sctp_rcvmsg() library wrapper to receive events, but the bug was not reproducible.

Some hours later, it became clear that there was one setsockopt() in OsmoSTP (actually, libosmo-netif) which enabled all existing SCTP events. I did this at the time to make sure OsmoSTP has the maximum insight possible into what's happening on the SCTP transport layer, such as address fail-overs and the like.

As it turned out, adding that setsockopt for SCTP_FLAGS to my test code made the problem reproducible. After playing around which of the flags, it seems that enabling the SENDER_DRY_EVENT flag makes the bug appear.

You can find my detailed report about this issue in https://bugzilla.redhat.com/show_bug.cgi?id=1442784 and a program to reproduce the issue at http://people.osmocom.org/laforge/sctp-nonblock/sctp-dry-event.c

Inside the Osmocom world, luckily we can live without the SENDER_DRY_EVENT and a corresponding work-around has been submitted and merged as https://gerrit.osmocom.org/#/c/2386/

With that work-around in place, suddenly all the m3ua-testtool and sua-testtool test cases are reliably green (PASSED) and OsmoSTP works more smoothly, too.

What do we learn from this?

Free Software in the Telecom sphere is getting too little attention. This is true even those small portions of telecom relevant protocols that ended up in the kernel like SCTP or more recently the GTP module I co-authored. They are getting too little attention in development, even more lack of attention in maintenance, and people seem to focus more on not using it, rather than fixing and maintaining what is there.

It makes me really sad to see this. Telecoms is such a massive industry, with billions upon billions of revenue for the classic telecom equipment vendors. Surely, they would be able to co-invest in some basic infrastructure like proper and reliable testing / continuous integration for SCTP. More recently, we see millions and more millions of VC cash burned by buzzword-flinging companies doing "NFV" and "SDN". But then rather reimplement network stacks in userspace than to fix, complete and test those little telecom infrastructure components which we have so far, like the SCTP protocol :(

Where are the contributions to open source telecom parts from Ericsson, Nokia (former NSN), Huawei and the like? I'm not even dreaming about the actual applications / network elements, but merely the maintenance of something as basic as SCTP. To be fair, Motorola was involved early on in the Linux SCTP code, and Huawei contributed a long series of fixes in 2013/2014. But that's not the kind of long-term maintenance contribution that one would normally expect from the primary interest group in SCTP.

Finally, let me thank to the Linux SCTP maintainers. I'm not complaining about them! They're doing a great job, given the arcane code base and the fact that they are not working for a company that has SCTP based products as their core business. I'm sure the would love more support and contributions from the Telecom world, too.

April 09, 2017

SIGTRAN/SS7 stack in libosmo-sigtran merged to master

By Harald "LaF0rge" Welte

As I blogged in my blog post in Fabruary, I was working towards a more fully-featured SIGTRAN stack in the Osmocom (C-language) universe.

The trigger for this is the support of 3GPP compliant AoIP (with a BSSAP/SCCP/M3UA/SCTP protocol stacking), but it is of much more general nature.

The code has finally matured in my development branch(es) and is now ready for mainline inclusion. It's a series of about 77 (!) patches, some of which already are the squashed results of many more incremental development steps.

The result is as follows:

  • General SS7 core functions maintaining links, linksets and routes
  • xUA functionality for the various User Adaptations (currently SUA and M3UA supported)
    • MTP User SAP according to ITU-T Q.701 (using osmo_prim)
    • management of application servers (AS)
    • management of application server processes (ASP)
    • ASP-SM and ASP-TM state machine for ASP, AS-State Machine (using osmo_fsm)
    • server (SG) and client (ASP) side implementation
    • validated against ETSI TS 102 381 (by means of Michael Tuexen's m3ua-testtool)
    • support for dynamic registration via RKM (routing key management)
    • osmo-stp binary that can be used as Signal Transfer Point, with the usual "Cisco-style" command-line interface that all Osmocom telecom software has.
  • SCCP implementation, with strong focus on Connection Oriented SCCP (as that's what the A interface uses).
    • osmo_fsm based state machine for SCCP connection, both incoming and outgoing
    • SCCP User SAP according to ITU-T Q.711 (osmo_prim based)
    • Interfaces with underlying SS7 stack via MTP User SAP (osmo_prim based)
    • Support for SCCP Class 0 (unit data) and Class 2 (connection oriented)
    • All SCCP + SUA Address formats (Global Title, SSN, PC, IPv4 Address)
    • SCCP and SUA share one implementation, where SCCP messages are transcoded into SUA before processing, and re-encoded into SCCP after processing, as needed.

I have already done experimental OsmoMSC and OsmoHNB-GW over to libosmo-sigtran. They're now all just M3UA clients (ASPs) which connect to osmo-stp to exchange SCCP messages back and for the between them.

What's next on the agenda is to

  • finish my incomplete hacks to introduce IPA/SCCPlite as an alternative to SUA and M3UA (for backwards compatibility)
  • port over OsmoBSC to the SCCP User SAP of libosmo-sigtran
    • validate with SSCPlite lower layer against existing SCCPlite MSCs
  • implement BSSAP / A-interface procedures in OsmoMSC, on top of the SCCP-User SAP.

If those steps are complete, we will have a single OsmoMSC that can talk both IuCS to the HNB-GW (or RNCs) for 3G/3.5G as well as AoIP towards OsmoBSC. We will then have fully SIGTRAN-enabled the full Osmocom stack, and are all on track to bury the OsmoNITB that was devoid of such interfaces.

If any reader is interested in interoperability testing with other implementations, either on M3UA or on SCCP or even on A or Iu interface level, please contact me by e-mail.

April 03, 2017

Starting to use the Galera cluster

By Holger "zecke" Freyther

In my previous post I wrote about getting a MariaDB Galera cluster  started on Kubernetes. One of my open issues was how to get my existing VM to connect to it. With Microsoft Azure the first thing is to add Network peering between the Kubernetes cluster and the normal VM network. As previously mentioned the internal IPv4 address of the Galera service is not reachable from outside and the three types of exposing a service are:

  • LoadBalancer
  • ClusterIP
  • NodePort

While the default Microsoft Azure setup already has two LoadBalancers, the kubectl expose –type=LoadBalancer command does not seem to allow me to chose which load balancer to use. So after trying this command my Galera cluster was reachable through a public IPv4 address on the standard MySQL port. While it is password protected it didn’t seem like a good idea. To change the config you can use something like kubectl edit srv/galera-cluster and change the type to another one. Then I tried the NodePort type and got the MySQL port exposed on all masters and thanks to the network peering was able to connect to them directly. Then I manually modified the already configured/created Microsoft Azure LoadBalancer for the three masters to export port 3306 and map it to the internal port. I am also doing a basic health check which checks if port 3306 can be connected to.

Now I can start using the Galera cluster from my container based deployment before migrating it fully to Kubernetes. My next step is probably to improve the health checks to only get primaries listed in the LoadBalancer and then add monitoring to it as well.

March 27, 2017

Galera on Kubernetes

By Holger "zecke" Freyther

As part of my journey to “cloud” computing I built a service that is using MySQL and as preparation for the initial deployment I set myself the following constraints:

  • Deploy in containers
  • Be able to tolerate some failure of ” VM”s
  • Be able to grow/replace storage without downtime


There are pre-made mariadb:10.1 containers but to not rely on a public registry I have used the Microsoft Azure Container Service to upload my container. The integration into the standard docker tools to create and upload containers just worked. It allows me to give a place for modified containers as well.


With Azure it doesn’t seem possible to online resize (grow) a volume and if I ever want to switch from ext4 to xfs (or zfs?) I should run some form of fault tolerant MySQL to take a node and upgrade it. These days MariaDB 10.1 includes Galera support and besides some systematic issues (which I don’t seem to run in as I have little to no transactions) it seems quite easy to set-up.

Fault tolerance

Fault tolerance comes in a couple flavors. Galera is a multi-master database where the cluster will continue to allow writes as long as there is a majority of active nodes. If I start with three nodes, I can take one off the cluster to maintain.

Kubernetes will reschedule a pod/container to a different machine (“agent”) in case one becomes unhealthy and it will expose the Galera cluster through a LoadBalancer and a single IPv4 address for it. This means only active members of the cluster will be contacted.

The last part is provided by Microsoft Azures availability set. Distributing the Agents into different zones should prevent all of them to go down at the same time during maintenance.

So in theory this looks quite nice, only practice will tell how this will play out.


After having picked Microsoft Azure, Kubernetes and Galera, it is time to set it up. I have started with an example found here. I had to remove some labels to make it work with the current format, moved the container to mariadb:10.1 and modified the default config.

I had to look a bit on how to get persistent storage. I am directly mounting the disk for the pod an alternative is a persistent volume claim. This might be a better approach.

The biggest issue is starting the first service. It requires to pass special parameters to initialize the cluster and involved a round of kubectl edit/kubectl delete to get it up. Having the second and third member join was more easy.


Besides having to gain more experience with it, I do face a couple of problems with this setup and need to explore solutions (or wait for comments?).

I deployed my application before having a Kubernetes cluster and now need to migrate. The default networking of Kubernetes works by adding a lot of masquerading entries on agents and masters. In the cluster these addresses are routable by masquerading but from external they are not reachable. I need to find a way to access it, probably by sacrificing some redundancy first. The other option is to use kubectl expose but I don’t want my cluster to have a public IPv4 address. I need to see how to have an internal load balancer with a private/internal IPv4 address.

Galera cluster management is a bit troubling. The first time I start with a new disk it will not properly connect to the master but would register itself to the LoadBalancer/Service. I manually need to do a kubectl delete of the pod and wait for it to reschedule. That is probably easy to fix. The second part of the problem is that I should use health checks and only register the pod once it has connected and synced to the primaries.

Rolling upgrades seem to have a systematic issue too. The default way for the built-in replication controller looks like a new pod (N+1) will be launched and brought up and then the current galera node will be stopped (back to N). This falls apart with the way I mount the storage/disk. E.g. the new pod can not mount the disk as it is already mounted and the old pod will not be deleted.

Least problematic is auto-scaling. In the example set-up each node is a service by itself, using one persistent disk. It makes scaling the cluster a bit difficult. I can add new nodes and they will discover the master(s) but to have the masters remember the new nodes, I would need to have the pods recycle.


March 26, 2017

OsmoCon 2017 Updates: Travel Grants and Schedule

By Harald "LaF0rge" Welte


April 21st is approaching fast, so here some updates. I'm particularly happy that we now have travel grants available. So if the travel expenses were preventing you from attending so far: This excuse is no longer valid!

Get your ticket now, before it is too late. There's a limited number of seats available.

OsmoCon 2017 Schedule

The list of talks for OsmoCon 2017 has been available for quite some weeks, but today we finally published the first actual schedule.

As you can see, the day is fully packed with talks about Osmocom cellular infrastructure projects. We had to cut some talk slots short (30min instead of 45min), but I'm confident that it is good to cover a wider range of topics, while at the same time avoiding fragmenting the audience with multiple tracks.

OsmoCon 2017 Travel Grants

We are happy to announce that we have received donations to permit for providing travel grants!

This means that any attendee who is otherwise not able to cover their travel to OsmoCon 2017 (e.g. because their interest in Osmocom is not related to their work, or because their employer doesn't pay the travel expenses) can now apply for such a travel grant.

For more details see OsmoCon 2017 Travel Grants and/or contact osmocon2017@sysmocom.de.

OsmoCon 2017 Social Event

Tech Talks are nice and fine, but what many people enjoy even more at conferences is the informal networking combined with good food. For this, we have the social event at night, which is open to all attendees.

See more details about it at OsmoCon 2017 Social Event.

March 23, 2017

Upcoming v3 of Open Hardware miniPCIe WWAN modem USB breakout board

By Harald "LaF0rge" Welte

Back in October 2016 I designed a small open hardware breakout board for WWAN modems in mPCIe form-factor. I was thinking some other people might be interested in this, and indeed, the first manufacturing batch is already sold out by now.

Instead of ordering more of the old (v2) design, I decided to do some improvements in the next version:

  • add mounting holes so the PCB can be mounted via M3 screws
  • add U.FL and SMA sockets, so the modems are connected via a short U.FL to U.FL cable, and external antennas or other RF components can be attached via SMA. This provides strain relief for the external antenna or cabling and avoids tearing off any of the current loose U.FL to SMA pigtails
  • flip the SIM slot to the top side of the PCB, so it can be accessed even after mounting the board to some base plate or enclosure via the mounting holes
  • more meaningful labeling of the silk screen, including the purpose of the jumpers and the input voltage.

A software rendering of the resulting v3 PCB design files that I just sent for production looks like this:


Like before, the design of the board (including schematics and PCB layout design files) is available as open hardware under CC-BY-SA license terms. For more information see http://osmocom.org/projects/mpcie-breakout/wiki

It will take some expected three weeks until I'll see the first assembled boards.

I'm also planning to do a M.2 / NGFF version of it, but haven't found the time to get around doing it so far.

March 21, 2017

Osmocom - personal thoughts

By Harald "LaF0rge" Welte

As I just wrote in my post about TelcoSecDay, I sometimes worry about the choices I made with Osmocom, particularly when I see all the great stuff people doing in fields that I previously was working in, such as applied IT security as well as Linux Kernel development.


When people like Dieter, Holger and I started to play with what later became OpenBSC, it was just for fun. A challenge to master. A closed world to break open and which to attack with the tools, the mindset and the values that we brought with us.

Later, Holger and I started to do freelance development for commercial users of Osmocom (initially basically only OpenBSC, but then OsmoSGSN, OsmoBSC, OsmoBTS, OsmoPCU and all the other bits on the infrastructure side). This lead to the creation of sysmocom in 2011, and ever since we are trying to use revenue from hardware sales as well as development contracts to subsidize and grow the Osmocom projects. We're investing most of our earnings directly into more staff that in turn works on Osmocom related projects.


It's important to draw the distinction betewen the Osmocom cellular infrastructure projects which are mostly driven by commercial users and sysmocom these days, and all the many other pure juts-for-fun community projects under the Osmocom umbrella, like OsmocomTETRA, OsmocomGMR, rtl-sdr, etc. I'm focussing only on the cellular infrastructure projects, as they are in the center of my life during the past 6+ years.

In order to do this, I basically gave up my previous career[s] in IT security and Linux kernel development (as well as put things like gpl-violations.org on hold). This is a big price to pay for crating more FOSS in the mobile communications world, and sometimes I'm a bit melancholic about the "old days" before.

Financial wealth is clearly not my primary motivation, but let me be honest: I could have easily earned a shitload of money continuing to do freelance Linux kernel development, IT security or related consulting. There's a lot of demand for related skills, particularly with some experience and reputation attached. But I decided against it, and worked several years without a salary (or almost none) on Osmocom related stuff [as did Holger].

But then, even with all the sacrifices made, and the amount of revenue we can direct from sysmocom into Osmocom development: The complexity of cellular infrastructure vs. the amount of funding and resources is always only a fraction of what one would normally want to have to do a proper implementation. So it's constant resource shortage, combined with lots of unpaid work on those areas that are on the immediate short-term feature list of customers, and that nobody else in the community feels like he wants to work on. And that can be a bit frustrating at times.

Is it worth it?

So after 7 years of OpenBSC, OsmocomBB and all the related projects, I'm sometimes asking myself whether it has been worth the effort, and whether it was the right choice.

It was right from the point that cellular technology is still an area that's obscure and unknown to many, and that has very little FOSS (though Improving!). At the same time, cellular networks are becoming more and more essential to many users and applications. So on an abstract level, I think that every step in the direction of FOSS for cellular is as urgently needed as before, and we have had quite some success in implementing many different protocols and network elements. Unfortunately, in most cases incompletely, as the amount of funding and/or resources were always extremely limited.


On the other hand, when it comes to metrics such as personal satisfaction or professional pride, I'm not very happy or satisfied. The community remains small, the commercial interest remains limited, and as opposed to the Linux world, most players have a complete lack of understanding that FOSS is not a one-way road, but that it is important for all stakeholders to contribute to the development in terms of development resources.

Project success?

I think a collaborative development project (which to me is what FOSS is about) is only then truly successful, if its success is not related to a single individual, a single small group of individuals or a single entity (company). And no matter how much I would like the above to be the case, it is not true for the Osmocom cellular infrastructure projects. Take away Holger and me, or take away sysmocom, and I think it would be pretty much dead. And I don't think I'm exaggerating here. This makes me sad, and after all these years, and after knowing quite a number of commercial players using our software, I would have hoped that the project rests on many more shoulders by now.

This is not to belittle the efforts of all the people contributing to it, whether the team of developers at sysmocom, whether those in the community that still work on it 'just for fun', or whether those commercial users that contract sysmocom for some of the work we do. Also, there are known and unknown donors/funders, like the NLnet foundation for some parts of the work. Thanks to all of you, and clearly we wouldn't be where we are now without all of that!

But I feel it's not sufficient for the overall scope, and it's not [yet] sustainable at this point. We need more support from all sides, particularly those not currently contributing. From vendors of BTSs and related equipment that use Osmocom components. From operators that use it. From individuals. From academia.

Yes, we're making progress. I'm happy about new developments like the Iu and Iuh support, the OsmoHLR/VLR split and 2G/3G authentication that Neels just blogged about. And there's progress on the SIMtrace2 firmware with card emulation and MITM, just as well as there's progress on libosmo-sigtran (with a more complete SUA, M3UA and connection-oriented SCCP stack), etc.

But there are too little people working on this, and those people are mostly coming from one particular corner, while most of the [commercial] users do not contribute the way you would expect them to contribute in collaborative FOSS projects. You can argue that most people in the Linux world also don't contribute, but then the large commercial beneficiaries (like the chipset and hardware makers) mostly do, as are the large commercial users.

All in all, I have the feeling that Osmocom is as important as it ever was, but it's not grown up yet to really walk on its own feet. It may be able to crawl, though ;)

So for now, don't panic. I'm not suffering from burn-out, mid-life crisis and I don't plan on any big changes of where I put my energy: It will continue to be Osmocom. But I also think we have to have a more open discussion with everyone on how to move beyond the current situation. There's no point in staying quiet about it, or to claim that everything is fine the way it is. We need more commitment. Not from the people already actively involved, but from those who are not [yet].

If that doesn't happen in the next let's say 1-2 years, I think it's fair that I might seriously re-consider in which field and in which way I'd like to dedicate my [I would think considerable] productive energy and focus.

Returning from TelcoSecDay 2017 / General Musings

By Harald "LaF0rge" Welte

I'm just on my way back from the Telecom Security Day 2017 <https://www.troopers.de/troopers17/telco-sec-day/>, which is an invitation-only event about telecom security issues hosted by ERNW back-to-back with their Troopers 2017 <https://www.troopers.de/troopers17/> conference.

I've been presenting at TelcoSecDay in previous years and hence was again invited to join (as attendee). The event has really gained quite some traction. Where early on you could find lots of IT security / hacker crowds, the number of participants from the operator (and to smaller extent also equipment maker) industry has been growing.

The quality of talks was great, and I enjoyed meeting various familiar faces. It's just a pity that it's only a single day - plus I had to head back to Berlin still today so I had to skip the dinner + social event.

When attending events like this, and seeing the interesting hacks that people are working on, it pains me a bit that I haven't really been doing much security work in recent years. netfilter/iptables was at least somewhat security related. My work on OpenPCD / librfid was clearly RFID security oriented, as was the work on airprobe, OsmocomTETRA, or even the EasyCard payment system hack

I have the same feeling when attending Linux kernel development related events. I have very fond memories of working in both fields, and it was a lot of fun. Also, to be honest, I believe that the work in Linux kernel land and the general IT security research was/is appreciated much more than the endless months and years I'm now spending my time with improving and extending the Osmocom cellular infrastructure stack.

Beyond the appreciation, it's also the fact that both the IT security and the Linux kernel communities are much larger. There are more people to learn from and learn with, to engage in discussions and ping-pong ideas. In Osmocom, the community is too small (and I have the feeling, it's actually shrinking), and in many areas it rather seems like I am the "ultimate resource" to ask, whether about 3GPP specs or about Osmocom code structure. What I'm missing is the feeling of being part of a bigger community. So in essence, my current role in the "Open Source Cellular" corner can be a very lonely one.

But hey, I don't want to sound more depressed than I am, this was supposed to be a post about TelcoSecDay. It just happens that attending IT Security and/or Linux Kernel events makes me somewhat gloomy for the above-mentioned reasons.

Meanwhile, if you have some interesting projcets/ideas at the border between cellular protocols/systems and security, I'd of course love to hear if there's some way to get my hands dirty in that area again :)

March 07, 2017

VMware becomes gold member of Linux Foundation: And what about the GPL?

By Harald "LaF0rge" Welte

As we can read in recent news, VMware has become a gold member of the Linux foundation. That causes - to say the least - very mixed feelings to me.

One thing to keep in mind: The Linux Foundation is an industry association, it exists to act in the joint interest of it's paying members. It is not a charity, and it does not act for the public good. I know and respect that, while some people sometimes appear to be confused about its function.

However, allowing an entity like VMware to join, despite their many years long disrespect for the most basic principles of the FOSS Community (such as: Following the GPL and its copyleft principle), really is hard to understand and accept.

I wouldn't have any issue if VMware would (prior to joining LF) have said: Ok, we had some bad policies in the past, but now we fully comply with the license of the Linux kernel, and we release all derivative/collective works in source code. This would be a positive spin: Acknowledge past issues, resolve the issues, become clean and then publicly underlining your support of Linux by (among other things) joining the Linux Foundation. I'm not one to hold grudges against people who accept their past mistakes, fix the presence and then move on. But no, they haven't fixed any issues.

They are having one of the worst track records in terms of intentional GPL compliance issues for many years, showing outright disrespect for Linux, the GPL and ultimately the rights of the Linux developers, not resolving those issues and at the same time joining the Linux Foundation? What kind of message sends that?

It sends the following messages:

  • you can abuse Linux, the GPL and copyleft while still being accepted amidst the Linux Foundation Members
  • it means the Linux Foundations has no ethical concerns whatsoever about accepting such entities without previously asking them to become clean
  • it also means that VMware has still not understood that Linux and FOSS is about your actions, particularly the kind of choices you make how to technically work with the community, and not against it.

So all in all, I think this move has seriously damaged the image of both entities involved. I wouldn't have expected different of VMware, but I would have hoped the Linux Foundation had some form of standards as to which entities they permit amongst their ranks. I guess I was being overly naive :(

It's a slap in the face of every developer who writes code not because he gets paid, but because it is rewarding to know that copyleft will continue to ensure the freedom of related code.

UPDATE (March 8, 2017):
 I was mistaken in my original post in that VMware didn't just join, but was a Linux Foundation member already before, it is "just" their upgrade from silver to gold that made the news recently. I stand corrected. Still doesn't make it any better that the are involved inside LF while engaging in stepping over the lines of license compliance.
UPDATE2 (March 8, 2017):
 As some people pointed out, there is no verdict against VMware. Yes, that's true. But the mere fact that they rather distribute derivative works of GPL licensed software and take this to court with an armada of lawyers (instead of simply complying with the license like everyone else) is sad enough. By the time there will be a final verdict, the product is EOL. That's probably their strategy to begin with :/

Gory details of USIM authentication sequence numbers

By Harald "LaF0rge" Welte

I always though I understood UMTS AKA (authentication and key agreement), including the re-synchronization procedure. It's been years since I wrote tools like osmo-sim-auth which you can use to perform UMTS AKA with a SIM card inserted into a PC reader, i.e. simulate what happens between the AUC (authentication center) in a network and the USIM card.

However, it is only now as the sysmocom team works on 3G support of the dedicated OsmoHLR (outside of OsmoNITB!), that I seem to understand all the nasty little details.

I always thought for re-synchronization it is sufficient to simply increment the SQN (sequence number). It turns out, it isn't as there is a MSB-portion called SEQ and a lower-bit portion called IND, used for some fancy array indexing scheme of buckets of highest-used-SEQ within that IND bucket.

If you're interested in all the dirty details and associated spec references (the always hide the important parts in some Annex) see the discussion between Neels and me in Osmocom redmine issue 1965.