Note to myself: Regular Expressions performance

This post is mostly a reminder for myself, so I don't lose these important links again. That said, it's probably interesting for you too, if you care about performance and .NET's behaviour around regular expressions (Regex).

The common things

In the .NET world, some advice is very common. First: you are advised to use a StringBuilder whenever you concatenate strings. Second: if a Regex is slow, use RegexOptions.Compiled to fix it. Well... there are, in fact, reasons for this sort of advice. String concatenation IS slow, for various, commonly known reasons. But a StringBuilder still has some overhead, and there are situations where using it imposes an unwanted cost.
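
To make that tangible, here's a minimal sketch (not a benchmark, and the exact break-even point depends on your data): for a handful of parts, plain concatenation is fine and the StringBuilder is pure overhead, while in a loop over many parts the StringBuilder wins quickly.

using System;
using System.Text;

class ConcatSketch
{
    static void Main()
    {
        string[] parts = { "alpha", "beta", "gamma" };

        // Each += allocates a brand-new string and copies everything
        // accumulated so far - roughly O(n^2) work for n parts.
        string slow = "";
        foreach (string part in parts)
            slow += part + ";";

        // StringBuilder appends into a growing internal buffer instead.
        // For only three parts its setup is pure overhead; for thousands
        // of parts it pays off quickly.
        var sb = new StringBuilder();
        foreach (string part in parts)
            sb.Append(part).Append(';');

        Console.WriteLine(slow == sb.ToString()); // True
    }
}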

The very same goes for RegexOptions.Compiled, and Jeff Atwood, aka Coding Horror, wrote a very good article about that a few years ago: To compile or not to compile (Jeff Atwood).
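
The gist of Jeff's findings, sketched in code (treat the cost comments as a rule of thumb, not a measurement):

using System;
using System.Text.RegularExpressions;

class CompiledSketch
{
    static void Main()
    {
        // Interpreted: cheap to construct - the right choice for
        // patterns you match only once or twice.
        var interpreted = new Regex(@"\d{4}-\d{2}-\d{2}");

        // Compiled: emits IL for the pattern up front. Construction is
        // considerably more expensive, so it only pays off when you keep
        // the instance around and reuse it against many inputs.
        var compiled = new Regex(@"\d{4}-\d{2}-\d{2}", RegexOptions.Compiled);

        Console.WriteLine(compiled.IsMatch("Posted on 2013-09-13.")); // True
    }
}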

One of the comments references another article from the MSDN BCL Blog, which explains the different caching behaviour of Regex in .NET 1.1 vs. .NET 2.0: Regex Class Caching Changes between .NET Framework 1.1 and .NET Framework 2.0 (Josh Free).
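
In short, and as a small sketch of the .NET 2.0 behaviour described there: only the static Regex methods go through the internal cache; constructed instances always parse the pattern again.

using System;
using System.Text.RegularExpressions;

class CacheSketch
{
    static void Main()
    {
        // Static methods use the internal cache: the pattern is parsed
        // once and then reused (the default cache size is 15, tunable
        // via Regex.CacheSize since .NET 2.0).
        bool a = Regex.IsMatch("42", @"^\d+$"); // parsed and cached
        bool b = Regex.IsMatch("43", @"^\d+$"); // served from the cache

        // Instances bypass the cache: every 'new Regex' parses the
        // pattern again, so keep hot instances in a field and reuse them.
        var instance = new Regex(@"^\d+$");

        Console.WriteLine(a && b && instance.IsMatch("44")); // True
    }
}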

The not-so-common things

There is only a single thing that is true for each and every kind of performance optimization, and it's two simple words: "It depends."

With regular expressions, the first thing any performance issue depends on is whether you really need a regular expression for the task at all. Of course, if you really know regular expressions (what they can and can't do, and what they are the correct tool for), you are unlikely to run into those kinds of problems. But when you have just learned about the power of Regexes (all you have is a hammer), every string starts to look desperate to get matched (everything is a nail). What I want to say is: not everything that could be solved with a Regex should be solved with one. Again, I have a link for me and you to keep in your Regex link collection: Regular Expressions: Now You Have Two Problems (Jeff Atwood).
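
A trivial sketch of what I mean: checking a fixed prefix doesn't need the regex engine at all.

using System;
using System.Text.RegularExpressions;

class HammerSketch
{
    static void Main()
    {
        string url = "https://example.org/";

        // Overkill: firing up the regex engine for a fixed prefix test.
        bool viaRegex = Regex.IsMatch(url, @"^https?://");

        // Simpler, faster and easier to read: plain string methods.
        bool viaString = url.StartsWith("http://", StringComparison.Ordinal)
                      || url.StartsWith("https://", StringComparison.Ordinal);

        Console.WriteLine(viaRegex == viaString); // True
    }
}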

Now, finally, on to the performance optimization links.

There is a good blog article series on the MSDN BCL Blog (like the one above) that goes very deep into how the Regex class performs in different scenarios. You can find them here:

And, besides those, once again a nice article on "catastrophic backtracking" from Jeff: Regex Performance (Jeff Atwood).
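
The classic demonstration of catastrophic backtracking, as a sketch (the pattern and input are the usual textbook example, not taken from Jeff's article):

using System;
using System.Text.RegularExpressions;

class BacktrackSketch
{
    static void Main()
    {
        // The final 'b' makes ^(a+)+$ fail - but only after the engine
        // has tried every way to distribute the 'a's between the two
        // nested quantifiers. Each extra 'a' doubles the runtime.
        string input = new string('a', 30) + "b";

        // Since .NET 4.5 a match timeout can at least cap the damage.
        var guarded = new Regex(@"^(a+)+$", RegexOptions.None,
                                TimeSpan.FromSeconds(1));
        try
        {
            guarded.IsMatch(input);
        }
        catch (RegexMatchTimeoutException)
        {
            Console.WriteLine("Catastrophic backtracking caught in time.");
        }
    }
}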

One more thing

There are three articles that are not really available anymore. Three very good articles from Mike that you can only retrieve from the Wayback Machine. I'm seriously considering providing a mirror of these articles on my blog too. But until then, here are the links:

My résumé of BASTA conference 2013

Right now I'm on the train on my journey back from BASTA 2013. Time for a small résumé.

Making it short: the BASTA conference was great. Especially, of course, meeting friends and colleagues you usually only see at conferences. It's quite interesting to see some of the people you know from other conferences like EKON or Delphi Live! now also here at BASTA (especially from tool vendors that have their origins in the Delphi space).

Besides my own session about JavaScript Unit Testing (here are the slides and code samples and also a small summary (in German) published just after my talk) I also attended some other sessions. Especially the whole "Modern Business Application" track, mainly driven by our colleagues at thinktecture, was very interesting.

But perhaps even more interesting were some of the booths. One of them especially caught my attention: Testhub (in German). I really like the idea of outsourcing your testing efforts to testing specialists. And the price tag on this kind of crowd-testing seems very appealing too. I had the opportunity to talk directly to one of the founders of Testhub and chatted with him about testing requirements, and the whole concept seems very well thought out.

I'm a bit sad that I have to leave this early, but I have other duties that cannot wait. I wish all other BASTA attendees and speakers another two exciting days in Mainz. I'm looking forward to seeing some of them at EKON in November.

Update: Added a link to the summary article on "Windows Developer".

I’m Speaking at EKON 17, too

Nick started the "I'm speaking at" campaign for EKON 17, and so I thought I'd team up and join him, not only in this campaign: I'm also going to support his unit testing session with some automation tips & tricks in my own CI session.

I'm giving two sessions at EKON 17, 04.–06. November 2013 in Cologne:

Both talks will be held in German, but since I keep my slides in English you should be able to follow, and I'm also happy to explain things in English when asked to do so.

The first session is about Continuous Integration (CI), where I'm going to explain why you should do it, what tools are available, and how powerful a good CI setup can be.

The second talk is an introduction to Git. I'll explain the differences between SVN and Git and show you what distributed version control can do for you.

So, I'd be happy to see you at EKON in Cologne.

Oh, and if you're still unsure about whether you should go, then just contact me. We can arrange a little discount 😉

EKON 17 Banner

And some additional info about EKON, translated from the organizers' German announcement:

EKON 17 – The conference highlight for Delphi developers

From 4 to 6 November 2013, Entwickler Magazin presents the 17th Entwickler Konferenz in Cologne. EKON is the big annual conference highlight for the Delphi community and this year offers a total of 30 sessions and 4 workshops with many national and international professionals of the Delphi community, including Marco Cantú of Embarcadero, Bernd Ua, Ray Konopka, Nick Hodges, Cary Jensen and many more. Five tracks are on offer, ranging from Tips & Techniques, IDE & Tools and Crossplatform/Mobile/Web to OOP/Fundamentals and Databases. News from Embarcadero, such as iOS development, is also on the agenda. All info at www.entwickler-konferenz.de.

Why is everyone using Disqus?

Recently I discovered that more and more blogs I visit are starting to use Disqus. And I don't understand why.

As Tim Bray said: "Own your space on the Web, and pay for it. Extra effort, but otherwise you’re a sharecropper".

I read this as being not just about owning your own 'real estate' on the web, but also about owning your content. There is a saying going around the net (I couldn't discover the original source, but it's quoted like hell out there): "If you're not paying for it, you're the product".

What's this Disqus thing in the first place?

Maybe the reason I can't understand why everyone is starting to use Disqus is that I didn't get the Disqus concept right myself.
For me, Disqus is outsourcing the discussion part (read: comments area) of your blog (or website) to a third party. Completely. Including the user-generated content in these comments.

Disqus offers itself as an easy way to build up communities. It's targeted at advertising, so everything you post in your comments via Disqus may well be used to target ads.

If you read through their terms and conditions, you will notice that the personally identifiable information you and third parties (like Facebook, when connecting to Disqus) pass to them may also be sold, together with any information you 'publicly share' via their service (read: your comments).

What's so bad about it?

Well, you may decide for yourself whether not actually owning your own content is okay for you. You may decide to share your comments on a site that uses Disqus, or you may decide NOT to share your thoughts there.

But making this decision for your blog means making it for all your visitors: everyone who wants to comment on your posts is forced to share their comments with Disqus - or to not share their thoughts with you at all.

The latter is the real problem with it. I won't comment on sites using Disqus. So you won't receive feedback from me. Okay, some people would say that's a good thing ;-), but others would be pretty happy about what I have to say.

The technical debt

On several occasions I have noticed that the Disqus service isn't that reliable. I commute a lot. Right now I'm sitting on a train with a tethered internet connection. Most of the time, Disqus doesn't load at all for me. I can't tell why. Especially not why it mostly happens when I'm on a tethered connection. And honestly, I don't care.

When using Disqus on your site, you're not only outsourcing your comments and your users' content, but also the continuity of your service. What if the Disqus API changes? You need to react, or lose your comments. What if they decide to shut down the service? You lose your comments. Maybe you're able to export all 'your' stuff beforehand. But then you're on your own figuring out how to import that stuff into your own system again.

In my opinion, the price you pay for using this service is too high. You may lose participants in your discussions, you lose control over your content, and you lose control over the availability of parts of your service.

Oh, wait. I forgot to mention the advantages you get from giving up your control. Erm... Okay. Maybe someone can tell me? I can't find any.

Update: Fixed a typo. Thanks to Bob for the heads-up.

Ask a Ninja: Is the “Googlevelopment” approach bad?

I stumbled upon a recent and very interesting blog post from Rick Strahl: "The Search Engine Developer". Rick, in turn, was motivated by a post from Scott Hanselman, who asked "Am I really a developer or just a good googler?".

That inspired me to write this post, too. Mostly because this topic has a lot to do with self-improvement, learning and attitude.

What is it that the Ninja calls Googlevelopment?

We all know it: if you encounter something in your job you don't know, it is very tempting to throw some related words into the search engine of your choice and sift through the results. If there's a link to StackOverflow or to a blog from certain people you know (from conferences, as book authors, via Twitter, from others that pointed you to them earlier), these are your first stops. You don't even look any further, because your problem is most probably solved. You copy the code, paste it into your solution, make some adjustments, test it, and don't think about it any further. There's no problem anymore.

To the point

Scott only has so many keystrokes left, and because of that he didn't give a broad explanation of WHY he holds the opinion he wrote down. Well, it isn't in fact even an opinion you read in his post, but a call to action: try to stop googlevelopment and do it the old-fashioned way. Write it yourself, go to user groups, learn, do code katas, etc. One can easily guess that Scott thinks googlevelopment is a bad Hobbit habit, and you shouldn't do it.

Rick, instead, was a bit more chatty. He mentioned that it sometimes "feels like [he's] become a slacker at best and a plagiarizer at worst". He summed up his experience, from back in the days when there simply was no publicly available Internet (no chance to copy your code), through the '90s (some online manuals, discussion forums) and the millennium, when blogs started to spread, up to now, when collaborative sites like StackExchange are flourishing.

Using libraries, for Rick, is "standing on the shoulders of giants". Copying and adapting code from the intertubes gives him a rough feeling for the interior of a library, enough to use it the right way, but not a deep one, because (his example was a QR code library) that's not his actual problem domain.

While being totally right on that matter, he said that there is no need to re-invent the wheels others invented before. And then there's this bummer: "It's fun to wax nostalgic about 'back in the day' stories, but I don't miss those days one bit. I don't think anybody does..."

Not missing the old days?

Rick, honestly? You don't miss these days? I think this 'back in the day' time was the time that made you the developer you are today. Those were the days that made you learn your stuff.

Today's younger developers, who didn't go through this (more or less hard) school of learning by themselves (trying things, failing, learning from their failures, inspecting stuff) and who started right away as 'search engine developers' or googlevelopers, can't really leverage the information they find on the net. Fiddling around with your platform, with your compiler, with sample sources (if any), with documentation teaches you, first of all, how to learn.

Rick then goes on to describe that, because there are so many things out there already, it can happen that you have a great idea and want to pursue it, then find finished implementations (even if not really 'good' ones) - and just stick with them. Even if those implementations deserve a new competing library from exactly you - because you could do better. But you leave it alone.

Making this decision - re-implement, or stick with a not-so-good solution - is, of course, mainly driven by time, effort and money, but also by an educated analysis of the risks and chances, and of the technical debt you take on when using a not-so-good solution. You also need to be educated enough to estimate whether a re-invention would benefit your solution (and maybe others' too, talking about open source).

You can't, however, get that evaluation right if you haven't done a lot by yourself before and thus never learned the implications of doing it yourself vs. using existing stuff.

Ask a Ninja: Is Googlevelopment bad?

So, now it's time for my personal opinion on that topic.

I already mentioned that I don't share Rick's point of view. I think it's sad that he does not miss the old days. I started developing software very early. I got my first PC when I was 9, and two to three years later just 'using' it got boring. At 14 I wrote the first program that I sold: a doctor used it to manage lists of his patients. The German health insurance card had just become available, and there were keyboards with an integrated card reader that would just 'type' in all the data from the card when it was inserted.

My program just stored the data in a flat file (I didn't know that the format I chose was already known as CSV), and I had to invent and implement my own sorting algorithm. If I remember correctly, I did something like an insertion sort. I figured out ways to send the data to a printer when requested. And I spent a lot of time formatting the outputs of my program to look nice and present them in a beautiful way to its users (mostly young women who worked there and whom I tried to impress back then, hell, I was 14 🙂 ). So, I figured all that out. It took a long time. I learned a lot. And it was fun.

I'd love to learn new stuff all day. Fiddling with things. Trying to get things done by myself. I really miss that a lot. Sadly, in today's businesses this isn't possible anymore. There's just a tiny window of time available for that kind of stuff.

Conclusion

Finally, Rick comes to this conclusion: "Search engines are another powerful tool in our arsenal and we can and should let them help us do our job and make that job easier. But at the same time we shouldn't let them lull us into a false sense of security - into a sense of thinking that all we need is information at our fingertips."

Having all that information at our fingertips empowers us to build great stuff. It is information we can use to learn from. And we have the power to decide NOT to use it. Rick linked to an older article from Scott: we need to Sharpen the Saw - that is us - on a regular basis.

We should try to develop - not googlevelop - more stuff by ourselves. This strengthens our skills and makes us more sensitive to when we have to use stuff others made. We need to find the right balance between "standing on the shoulders of giants" and trying to be the giant. This fine line, where you're in balance, is a moving target, though:

  • Young Padawan, more fiddling on your own, not using the force you must.
  • Knight, use the force and learn from it.
  • Master Jedi, more fiddling on your own again you should.

This is my idea. Well, not really mine. I just adopted it. With some friends I share a maxim. In German: "Schau in Dich. Schau um Dich. Schau über Dich." It stands for three steps of learning:

  1. "Schau in Dich." - Look inside you. This is about self-awareness. You should learn about yourself, about your learning.
  2. "Schau um Dich." Look around you. This is about learning not from yourself, but from others. And also about learning what your influence is on others, but that would go to far at this stage.
  3. "Schau über Dich." Look beyond you. Okay, that is a very loose translation. The aim of this part is to make you think about things in a larger scale, and push the limits.

This is also what the learning of an apprentice, journeyman and master craftsman was like back in the old days. The apprentice should learn how to learn. The journeyman travels around, which enables him to learn more from others of his craft. The master then knows the basics of his craft, but he also keeps trying to improve his skills, to be able to compete. Masters could also leverage their skills to try out really new stuff on their own - and succeed with it. Masters usually were also eligible to join the guild, where there was a lot of exchange between them - including about new stuff they had discovered.

There is a slight chance that this, which was done for decades back then, had some truth in it. And we software developers, engineers or craftsmen could (and should) try to map this to our daily learning again.

Bottom line

Well. This is just a line. At the bottom. For no reason. 🙂

Scheduled downtime in September

My hosting service provider just informed me that my server will have a scheduled downtime next month.

They need to physically move some machines into a new data center, and my (virtual) server Gallifrey is running on one of them. Of course, in times of cloud and virtualization, that should not be a reason for a downtime, but since I'm only paying 30 € / month for a 4-core, 4 GB RAM, 200 GB HDD virtual machine (for comparison: a single Azure small VM would cost twice as much), I think I am not in a position to complain.

Downtime will be from Friday, 13th of September, 22:00 CEST (20:00 UTC) to Saturday, 14th September, 06:00 CEST (04:00 UTC).

I'm confident that, even if this happens on a Friday the 13th, the move will be finished in time and the actual downtime will possibly be even shorter than that.

Horrendous cool software

Mobile connectivity

Devices should work together. All devices. Even devices from different companies. They don't.

A good example is the feature 'USB Tethering':

  • When I plug my iPhone in my Macbook it works.
  • When I plug my iPhone in my Windows notebook, it works.
  • When I plug my Google Nexus in my Windows notebook, it works.
  • When I plug my Google Nexus in my Macbook Air... I'm screwed.

I didn't find a way to make it work. I always needed to open up a mobile WiFi hotspot to tether when I was on the road. That sucked. And it costs a lot of battery on my phone. And I need that battery for Ingress 😉

Then someone saved my day: Joshua Wise (@jwise0), by writing a cool piece of software. What he did was write a driver for Mac OS X that understands Microsoft's proprietary RNDIS protocol, which is used for USB tethering by Google's Android devices. No, I won't get into the topic of 'Google uses a proprietary Microsoft protocol for their relatively open Android platform' now. That's not my thing.

The software has the interesting name HoRNDIS (pronounced 'horrendous'), and its source is also available on GitHub. He also has binary packages available on the project's homepage for a simple installation.

And what should I say? It works like a charm. I installed the driver, plugged my phone in via USB, activated USB tethering through the menu, and now I'm publishing this blog post USB-tethered.

Thank you, Joshua. Very much.

Re-activated my performance monitoring with Rackspace

In my last blog post about my Rackspace monitoring solution I described why I deactivated the performance tracking: I was measuring the wire and not my blog's performance.

I was totally surprised that my post got answers to that problem in three comments, one and two days later. And those comments came from Rackspace employees. I had never thought about contacting them about this. First of all, it was not high on my priority list, and second, since I'm only using the cloud monitoring stuff for a dollar and a half per month, I never thought about bothering them with a request about it. I wanted to dig into it myself later, when the time was right.

I knew that Rackspace advertises its 'fanatical' support - but I would never have thought that they would scan random blogs for potential issues and offer support there. I am totally surprised, and I think I've already become a fan.

Now, the idea from Justin was to set the consistency level of the check from the default value QUORUM to ALL. In my case this changed the alerting behaviour from "Oh, two of the three zones are slow, let's warn him" to "Well, both US zones are slow, but the EU zone London still performs well; let's hold the alert until that one gets slow too".

The other idea was to set consecutiveCount to a larger value, so that an alert is only sent if the check fails X consecutive times (a sketch of that follows further below). I'll keep it in mind, but for now the consistency level is totally enough for me to re-activate the check. So this is the new performance check I activated a day later:

:set consistencyLevel=ALL

if (metric['duration'] > 2500) {
  return CRITICAL, "HTTP request took more than 2.5 seconds, it took #{duration} milliseconds."
} 

if (metric['duration'] > 1800) { 
  return WARNING, "HTTP request took more than 1.8 seconds, it took #{duration} milliseconds."
}

return OK, "Overall performance is okay: #{duration} milliseconds."

With this in place I now get alerted whenever even London thinks my blog is slow, and the alerting mail gives me response times from all three zones. This check has only been running for two weeks now and has alerted me twice. Both times the alert went back to green on the very next check, so the consecutive count would have prevented those mailings, but I think this 'spam' rate is still okay and I don't see the need to fix it.
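
For reference, should the spam rate ever bother me: if I read the monitoring docs correctly (I haven't tried this yet, so treat it as an assumption), enabling Justin's second suggestion would be a single additional directive at the top of the check, analogous to the consistency level:

:set consecutiveCount=2

With that, an alert would only go out after two consecutive non-OK check runs.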

Actually, I'm pretty happy about those mails once in a while, because I feel that my monitoring is in place and working.

So the bottom line is: Rackspace support really, really rocks. And the monitoring can be tweaked to meet my needs.

Why FireMonkey is wrong, the second

I just stumbled upon a really, really great post on Steven Sinofsky's blog.

His article is about the challenges of cross-platform development in general, and he brings up some rather good points on why some approaches will eventually fail.

I'm pretty sure he doesn't even know about FireMonkey, but this is what he has to say on cross platform libraries in general:

One of the most common approaches developers attempt (and often provided by third parties as well) is to develop or use a library that abstracts away platform differences or claims to map a unique “meta API” to multiple platforms. These cross-platform libraries are conceptually attractive but practically unworkable over time. Again, early on this can work. Over time the platform divergence is real.

And then he continues:

Worse, as an app developer you end up relying on essentially a “shadow” OS provided by a team that has a fraction of the resources for updates, tooling, documentation, ..
[...]
It is important to keep in mind that the platforms are evolving rapidly and the customer desire for well-integrated apps (not just apps that run).

The very last point is what I already stated in my own post on why FireMonkey is wrong. I didn't even write about the even more important first ones. And these are only quotes from a single paragraph where he thinks about cross-platform libraries.

I strongly suggest that you take a few minutes and read what Sinofsky wrote about cross-platform development. And then, if you currently feel that FireMonkey could be the right tool for you, try to understand his points and re-think your position on cross-platform tooling. I'm sure you will see that FireMonkey can't be the right tool for you - or anybody.

A small update to my Rackspace Cloud Monitoring configuration

In the last post I mentioned how I set up my Rackspace Cloud Monitoring system to notify me when my blog fails or performs badly.
I have now tweaked the configuration a little bit: I deactivated the performance check.

Why is that? Because it did not monitor my blog's overall performance but just the quality of the transatlantic wires. And I can tell you: it's very volatile.

While response times from the check zone in London were constantly good, both U.S. zones went from okay to bad, then critical, and then okay again within a matter of minutes. So I was spamming myself with that check.

I'd love to have a redundant performance check in place, but there currently aren't two check zones in Europe, and so far I haven't found a way to restrict the performance check to the London values only. I think I'll do some more research on that later. For now, I'm fine with the Code 200 'Up and running' check.