Note to myself: Regular Expressions performance

This post is mostly as a reminder for myself, to not loose these important links again. But said that, it’s probably interesting for you too, when you care about performance and the .NET behaviour around regular expressions (Regex).

The common things

In the .NET world, some things are very common. First is, you are advised to use a StringBuilder whenever you concatenate some strings. Second is: If a Regex is slow, use RegexOptions.Compiled to fix it. Well… Now, in fact there are reasons for this sort of advise. String concatenation IS slow, for various, commonly known reasons. But still a StringBuilder has some overhead and there are situations where using it imposes an unwanted overhead.

The very same goes for RegexOptions.Compiled, and Jeff Atwood, aka Coding Horror, wrote a very good article about that a few years ago: To compile or not to compile (Jeff Atwood).

In one of the comments another article from MSDN (BCL Blog) is referenced, where the different caching behaviour of Regex in .NET 1.1 vs. .NET 2.0 is explained: Regex Class Caching Changes between .NET Framework 1.1 and .NET Framework 2.0 (Josh Free).

The not-so-common things

There is only a single thing that is true for each and every kind of performance optimization. And it’s the simple two words: “It depends.”.

With regular expressions, the first thing any performance issue depends on is, if you really need a regular expression for the task. Of course, if you really know regular expressions, what they can do and what they can’t, and for what they are the correct tool, you are very likely to not run into those kinds of problems. But when you just learned about the power of Regexes (all you have is a hammer) everything starts to look as a string desperatly waiting to get matched (everything is a nail). What I want to say is: Not everything that could be solved with a Regex also should be solved by one. Again, I have a link for me and you to keep in your Regex link collection: Regular Expressions: Now You Have Two Problems (Jeff Atwood).

Now, finally to performance optimization links.

There is a good blog article series on the MSDN BCL Blog (like the one above) that goes very deep into how the Regex class performs in different scenarios. You find them here:

And, besides those, once again a nice article on “catastrophic backtracking” from Jeff: Regex Performance (Jeff Atwood).

One more thing

There are three articles, that are not really available anymore. Three very good articles from Mike, that you can only retrieve from the wayback machine. I’m really thinking hard about providing a mirror for these articles on my blog too. But until then, here are the links:

My résumé of BASTA conference 2013

Right now I’m on the train on my journey back from BASTA 2013. Time for a small résumé.

Making it short: The BASTA conference was great. Especially of course meeting friends and colleagues you usually just see at conferences. It’s quite interesting to see some of the guys you know from other conferences like EKON or Delphi Live! that are now also here at BASTA (especially from tool vendors that have their origin in the Delphi space).

Besides my own session about JavaScript Unit Testing (here are the Slides and code samples and also a small summary (in German) published just after my talk) I also attended some other sessions. Especially the whole “Modern Business Application” track, mainly driven from our colleagues at thinktecture was very interesting.

But perhaps even more interesting were some of the boothes. Especially one of them caught my attention: Testhub (in German). I really like the idea of outsourcing your testing efforts to testing specialists. And also the price tag on this kind of crowd-testing seems very appealing. I had the opportunity to talk directly to one of the founders of Testhub, and chatted with him about requirements against testing and the whole concept seems very well thought-out.

Im a bit sad that I have to leave that early, but I have other duties that cannot I don’t want to wait. I wish all other BASTA attendees and speakers another two exciting days in Mainz. I’m looking forward to see some of them at EKON in November.

Update: Added a link to the summary article on “Windows Developer”.

I’m Speaking at EKON 17, too

Nick started the “I’m speaking at” campaign for EKON 17, and so I thought I’d team up and join him, not only on this campaign but also going to support his Unit-Testing session with some automation tips & tricks in my own CI session.

I’m giving two sessions at

EKON 17, 04. -06. November 2013 in Cologne:

Both talks will be held in German, but since I keep my slides in english you should be able to follow, and I’m also happy to explain things in english when asked to do so.

The first session is about Continuous Integration (CI), where I’m going to explain why you should do it and what tools are available and how powerful a good CI setup can be.

The second talk is an introduction into Git. I’m explaining the differences between SVN and Git, and showing you what distributed version control can do for you.

So, I’d be happy to see you at EKON in Cologne.

Oh, and if you’re still unsure about whether you should go, then just contact me. We can arrange a little discount 😉

EKON 17 Banner

And some additional info about EKON (in German):

EKON 17 – Das Konferenz-Highlight für Delphi-Entwickler

Vom 04. bis 06. November 2013 präsentiert das Entwickler Magazin in Köln die 17. Entwickler Konferenz. Die EKON ist das große jährliche Konferenz-Highlight für die Delphi-Community und bietet in diesem Jahr insgesamt 30 Sessions und 4 Workshops mit vielen nationalen und internationalen Profis der Delphi-Community – unter anderen mit Marco Cantú von Embarcadero, Bernd Ua, Ray Konopka, Nick Hodges, Cary Jensen u.v.m. Fünf Tracks stehen zur Auswahl, von Tips &Techniques, IDE & Tools, Crossplatform/ Mobile/ Web über OOP/ Fundamentals bis hin zu Datenbanken. Auch Neuigkeiten aus dem Hause Embarcadero, wie beispielsweise die iOS-Entwicklung stehen auf der Agenda. Alle Infos auf www.entwickler-konferenz.de.

Why is everyone using Disqus?

Recently I discovered that more and more Blogs I visit start to use Disqus. And I don’t understand, why.

As Tim Bray said: “Own your space on the Web, and pay for it. Extra effort, but otherwise you’re a sharecropper“.

I read it as this is not just about owning your own ‘real estate’ on the web, but also owning your content. There is a saying going through the net (I couldn’t discover the original source, but it’s quoted like hell out there): “If you’re not paying for it, you’re the product“.

What’s this Disqus thing in the first place?

Maybe the reason that I can’t understand why everyone starts to use Disqus is, that I didn’t get the Disqus concept right myself.
For me, Disqus is outsourcing the discussion part (read: comments area) from your blog (or website) to a third party. Completely. Including the user-generated content in these comments.

Disqus offers itself as an easy way to build up communities. It’s targeted at advertising, so everything you post in your comments via Disqus may will be used to target ads.

If your read through their terms and conditions, you will notice that your personal identifiable information you and third parties (like Facebook, when connecting to Disqus) pass to them may also be sold, together with any information you ‘publicly share’ via their service (read: your comments).

What’s so bad about it?

Well, you may decide for yourself if not actually owning your own content is okay for you. You may decide to share your comments on a site that uses Disqus or you may decide to NOT share your thoughts there.

But making this decision for your Blog is making this decision for all your visitors and everyone that want to comment on your posts is forced to share their comments with Disqus – or not share their thoughts with you.

The latter is the real problem with it. I won’t comment on sites using Disqus. So you won’t receive feedback from me. Okay, some people would rather say that’s a good thing ;-), but others would be pretty happy about what I have to say.

The technical debt

On several occasions I noticed that the Disqus service isn’t that reliable. I am commuting a lot. Right now I’m sitting in a train and have tethered internet connection. Mostly, Disqus doesn’t load at all for me. I can’t tell why. Especially not why it mostly happens when I’m on a tethered connection. And honestly, I don’t care.

When using Disqus for your site, you’re not only sourcing out your comments and your user’s content, but also the continuity of your service. What, if the Disqus API changes? You need to react, or lose you’re comments. What, if they decide to shut down the service? You lose your comments. Maybe you’re able to export all ‘your’ stuff previously. But then you’re on your own how to import that stuff into your own system again.

In my opinion, the price you pay with using this service is too high. You may loose participants in your discussions, you loose control over your content and you loose control over the availability of parts of your service.

Oh, wait. I forgot to mention the advantages you have from giving up your control. Erm.. Okay. Maybe someone tells me? I can’t find any.

Update: Fixed a typo. Thanks to Bob for the heads-up.

Ask a Ninja: Is the “Googlevelopment” approach bad?

I stumbled upon a recent and very interesting blog post from Rick Strahl: “The Search Engine Developer“. Rick in turn was motivated by a post from Scott Hanselman who asked “Am I really a developer or just a good googler?“.

That inspired me to write this post, too. Mostly because this topic has to do a lot with self-improvement, learning and attitude.

What is it, what Ninja calls Googlevelopment?

We all know it: If we encounter something in our job we don’t know, it is very tempting to throw some words related to it into the search engine of your choice, sift through the results and if there’s a link to StackOverflow or to a blog from certain persons you know (like from conferences, book authors, via twitter, from others that pointed you to them earlier), these are your first stops. You don’t even look further, because your problem is most probably solved. You copy the code, paste it into your solution, make some adjustments, test it and don’t think further. There’s no problem anymore.

To the a point

Scott is just having so much keystrokes left, and because of that didn’t give a broad explanation on WHY he has the opinion he wrote down. Well, it isn’t even in fact an opinion you read in his post, but a call for action: Try to stop googlevelopment and do it the old fashioned way: Write it yourself, go to user groups, learn, do code Katas etc. One can easily guess that Scott thinks googlevelopment is a bad Hobbit habit, and you shouldn’t do it.

Rick, instead, was a bit more chatty. He mentioned that it “feels like [he’s] become a slacker at best and a plagiarizer at worst” sometimes. He summed up his experience, back to the days where there simply was no publicly available Internet – no chance to copy your code -, through the 90ies (some online manuals, discussion forums), through the millenium where blogs started to spread and up to now, where collaborative sites like StackExchange are flourishing.

Using libraries, for Rick, is “standing on the shoulder of giants“, and copying and adopting code from the intertubes gives him a rough feeling about the interior of the library, to be able to use it the right way, but not too deeply because, his example was a QR code library, that’s not his actual problem domain.

He, while being totally right on that matter, said that there is no need to re-invent the wheels others invented previously. And then, there’s this bummer: “It’s fun to wax nostalgic about ‘back in the day’ stories, but I don’t miss those days one bit. I don’t think anybody does…

Not missing the old days?

Rick, honestly? You don’t miss these days? I think this ‘back in the day’ time was the time that made you the developer you are today. Those were the days that made you learn your stuff.

Today’s younger developers, that didn’t went through this (more or less hard) school of learning by themselves, trying things, failing, learning from their failure, inspecting stuff, who JUST started as ‘search engine developers’ or googlevelopers, can’t really leverage the information they find on the net. Fiddling around with your platform, with your compiler, with sample sources (if any), with documentation is in the first place teaching you how to learn.

Rick then goes on describing that, because there are so much things out there, it could happen that you have a great idea and want to go on this. Then you might find finished implementations (even if not really ‘good’) – and just stick with them. Even if those implementations would deserve a new competing library from exactly you – because you could do better. But you left it alone.

Making this decision, re-implement or stick with a not-so-good solutions, is, of course, mainly driven by time / effort / money, but also by an educated analysis of risks and chances and the technical debt you’re taking when using a not-so-good solution. You also need to be educated to estimate whether a re-invention would benefit your (and maybe others too, talking about open source) solution.

You can’t, however, get that evaluation right when you haven’t learned the implications of doing it yourself vs. using existing stuff when you did not do a lot by yourself previously.

Ask a Ninja: Is Googlevelopment bad?

So, now it’s time for my personal opinion on that topic.

I already mentioned that I’m not with Ricks point of view. I think it’s sad that he does not miss the old days. I started developing software very early. I got my first PC when I was 9 and two to three years later just ‘using’ it got boring. With 14 I wrote the first program that I sold to a doctor to manage lists of his patients. The german health insurance card was just available and there were keyboards with an integrated card reader that would just ‘type’ in all the data from the card when it was inserted.

My program just stored the data in a flat file (I didn’t know that this format I chose was already known as CSV), and I had to invent and implement my own sorting algorithm. If I remember correctly, I did something like an insertion sort. I figured out ways to send that data to a printer when requested. And I spend a lot of time formatting the outputs of my program to look nice and present them in a beautiful way to its users (mostly young women that worked there and that I tried to impress back then, hell I was 14 🙂 ). So, I figured all that out. It took long. I learned a lot. And it was fun.

I’d love to learn new stuff all day. Fiddling with stuff. Trying to get things done by myself. I really miss that a lot. Sadly, in todays businesses this isn’t possible anymore. There’s just a tiny window of time available for that kind of stuff.

Conclusion

Finally Rick comes to this conclusion: “Search engines are another powerful tool in our arsenal and we can and should let them help us do our job and make that job easier. But at the same time we shouldn’t let them lull us into a false sense of security – into a sense of thinking that all we need is information at our fingertips.“.

Having all that information at our fingertips empowers us to build great stuff. It is information we can use to learn from it. And we have the power to decide to NOT use it. Rick linked to an older article from Scott: We need to Sharpen the Saw – this is us – on a regular base.

We should try to develop – not googlevelop – more stuff by ourself. This strengthens our skills and makes us more sensitive for when we have to use stuff others did. We need to find the right balance between “standing on the shoulder of giants” and trying to be the giant. This fine line, where you’re in balance, is a moving target, though:

  • Young Padawan, more fiddling on your own, not using the force you must.
  • Knight, use the force and learn from it.
  • Master Jedi, more fiddling on your own again you should.

This is my idea. Well, not really mine. I just adopted it. With some friends I share a maxime. In German: “Schau in Dich. Schau um Dich. Schau über Dich.” This goes for three steps of learning:

  1. “Schau in Dich.” – Look inside you. This is about self-awareness. You should learn about yourself, about your learning.
  2. “Schau um Dich.” Look around you. This is about learning not from yourself, but from others. And also about learning what your influence is on others, but that would go to far at this stage.
  3. “Schau über Dich.” Look beyond you. Okay, that is a very loose translation. The aim of this part is to make you think about things in a larger scale, and push the limits.

This is also, what the learning of an apprentice, journeman and master craftsman was back in the old days. The apprentice should learn to learn. The journeman travels around, this enables him to learn more from others of his craft. The master then knows the basics of his craft, but he also tried to improve his skills, to be able to compete. Masters also could leverage their skills to try our really new stuff on their own – and succeed with that. Masters usually also were eligible to join the guild, where there was a lot of exchange between them – also about new stuff they discovered.

There is a slight chance, that this, what was done for decades back then, had some truth in it. And we software developers, engineers or craftsmen, could (and should) try to map this to our daily learning again.

Bottom line

Well. This is just a line. At the bottom. For no reason. 🙂

Re-activated my performance monitoring with Rackspace

In my last blog post about my Rackspace monitoring solution I described why I deactivated the performance tracking: I measured the wire and not my blogs performance.

I was totally surprised that my post got an answer to that problem in three comments one and two days later. And those comments came from Rackspace employees. I never thought about contacting them about this. First of all, this was not on my high priority list and second, since I’m only using the cloud monitoring stuff for a dollar and a half per month, I never thought about bothering them with a request about that. I wanted to dig into it myself later, when time was right for it.

I knew that Rackspace is advertising with ‘fanatical’ support – but I never would have thought that they would scan random blogs for potential issues and support there. I am totally suprised and I think I already became a fan.

Now, the idea from Justin was to set the consistency level of the check from the default value QUORUM to ALL. In my case this changed the alerting behaviour from “Oh, two of the three zones are slow, let’s warn him.” to, “Well, both US zones are slow, but the EU zone London still performs good, let’s wait with an alert until that gets slow too”.

The other idea was to consecutiveCount to a larger value, so that an alert is only sent if the check fails for X consecutive times. I have this in mind, but for now the consistency level is totally enough for me to re-activate the check. So this is my new performance check I activated a day later:

:set consistencyLevel=ALL

if (metric['duration'] > 2500) {
  return CRITICAL, "HTTP request took more than 2.5 seconds, it took #{duration} milliseconds."
} 

if (metric['duration'] > 1800) { 
  return WARNING, "HTTP request took more than 1.8 seconds, it took #{duration} milliseconds."
}

return OK, "Overall performance is okay: #{duration} milliseconds."

With this in place I now get alerted whenever London thinks my Blog is slow, and it gives me response times from all three zones in that alerting mail. This check is just running for two weeks now, and alerted me two times. Both times the alert went green the check after, so the consecutive count would have prevented those mailings, but I think this ‘spam’ rate is still okay and I don’t see the need to fix this.

Actually, I’m pretty happy about those mails once in a while, because I feel that my monitoring is in place and working.

So the bottom line is: Rackspace support really, really rocks. And the monitoring can be tweaked to meet my needs.

Why FireMonkey is wrong, the second

I just stumbled upon a really, really great post on Steven Sinofskys Blog.

His article is about the challenges of cross-platform development in general, and he brings up some rather good points on why some approaches will eventually fail.

I’m pretty sure he doesn’t even know about FireMonkey, but this is what he has to say on cross platform libraries in general:

One of the most common approaches developers attempt (and often provided by third parties as well) is to develop or use a library that abstracts away platform differences or claims to map a unique “meta API” to multiple platforms. These cross—platform libraries are conceptually attractive but practically unworkable over time. Again, early on this can work. Over time the platform divergence is real.

And then he continues:

Worse, as an app developer you end up relying on essentially a “shadow” OS provided by a team that has a fraction of the resources for updates, tooling, documentation, ..
[…]
It is important to keep in mind that the platforms are evolving rapidly and the customer desire for well-integrated apps (not just apps that run).

The very last point is what I already stated in my own post on why FireMonkey is wrong. I didn’t even write about the even more important first ones. And this only are quotes from one paragraph where he thinks about cross-platform libraries.

I strongly suggest that you take a few minutes and read what Sinofsky wrote about cross-platform development. And then, if you currently feel that FireMonkey could be the right tool for you, try to understand his points and re-think your position on cross-platform tooling. I’m sure you will see that FireMonkey can’t be the right tool for you – or anybody.

A small update to my Rackspace Cloud Monitoring configuration

In the last post I mentioned how I set up my Rackspace Cloud Monitoring system to notify me when my Blog fails or performs badly.
I tweaked the configuration a little bit now: I deactivated the performance check.

Why is that? Because it did not monitor my Blog’s overall performance but just the quality of the transatlantic wires. And I can tell you: It’s very volatile.

While having constant good response times from the check zone in London, both U.S. zones are okay, bad, then critical and then okay again in a matter of some minutes. So I was spamming myself with that check.

I’d love to have a redundant performance check in place, but there are currently no two check zones in Europe, and I did not find a way up to now to restrict the performance check to the London values only. I think I’ll do some more research on that later. For now, I’m fine with the Code 200 ‘Up and running’ check.

Technical problems, a solution and Rackspace cloud monitoring

Some of you may have noticed that my blog experienced some technical difficulties yesterday morning.

For some reason I couldn’t find out the IIS still served static files, but anything that had to do with code like this Blog, my TeamCity, YouTrack, Stash and Fisheye applications did not respond anymore. The sad thing was that I even couldn’t RDP into my VM, and so I had to trigger a reboot through the hosters web interface.

What I really disliked was that I noticed the problem only when I wanted to log into my blog to check for comments and spam.
To improve that I thought about monitoring my server or better the services it runs. So I asked Google to suggest some monitoring solutions that could help me out.

First hit

The first hit was Rackspace Cloud Monitoring. The price of 1.50 USD / month is great because I don’t want to spend a lot for checking my private stuff, but everything at about 5€ / month would be okay for me. The feature set described on their homepage was okay for me. What I really need is some service that makes a request against my blog and checks if it returns a 200 status code, and alert me if this is not the case.

So I signed up for a Rackspace cloud account. After a few minutes I got called to verify my account and the guy on the other end of the line offered help for getting started with them. I really like this approach, because it really takes down the barriers.

My first and single difficulty

After I signed up and was activated I logged into the management portal and looked for the monitoring options. Guess what? Nothing there. Their homepage stated it should be easy to configure the monitoring through the portal, but I could not find an option.

I tweeted about that and almost immediately I got a response with a link to a getting started video. Honestly, this was the point where I was really impressed. The Rackspace community obviously is very strong and willing to help. That’s great.

So, watching the video I learned that I could set up monitoring for a VM that I host on Rackspace, but if I delete that VM the monitoring setup would vanish too. Nothing for me, because I don’t need a VM but just the monitoring.

After tweeting about that I got this very helpful response:

I didn’t want to use the API, because I actually wanted to easily click together my simple 200-check. So I tried out this labs-GUI.

The setup

I didn’t dig into the documentation before I started. Actually I thought it should be possible to figure out how to set up a simple HTTP monitoring by just clicking through it. The labs GUI is a very basic Twitter Bootstrap interface that just enables you to access the functionality. Right now there is no real UX, but that’s okay. It works 😉

First I entered an ‘Entity’. I thought this would be the thing to monitor, so I entered ‘Gallifrey’, the name of my server. Turned out I got it right. What I could do additionally is to install a monitoring agent on Gallifrey to have it send data about CPU, memory and disk usage to Rackspace that I could use for my monitoring too.

Entities

For this entity I now could add a ‘Check’. I named it ‘Blog’ as I wanted to check the blog on Gallifrey.

Here I could configure that this is a HTTP check, the URL to test and from which locations Rackspace should test this. I checked London and two U.S. locations as 3 zones cost the same as just a single one.

Now, this check alone won’t help me. I need to tell the system what to do after a check and what are the error and ok conditions: Enter ‘alarms’.

Alarms are the actual thing I want: A mail, whenever something goes wrong. The alarm is fed with the information from the check, evaluates it by rules I enter (the something) and where to mail the information to.

I started with my status code check (see screenshot on the right).
Status code alert

For the check language I had to check the documentation, but the samples are very self-explanatory so that I had this check running in minutes.

I then added another step that should notify about the performance of my blog. For this I used this check:

if (metric['duration'] > 2500) {
  return CRITICAL, "HTTP request took more than 2.5 seconds, it took #{duration} milliseconds."
} 
if (metric['duration'] > 1800) { 
  return WARNING, "HTTP request took more than 1.8 seconds, it took #{duration} milliseconds."
}
return OK, "Overall performance is okay: #{duration} milliseconds."

The values may seem a bit high, but since two of the three check locations aren’t in Europe I have to take some transatlantic latency into account. These values seem to work, because with lower values I already got quite some mails warning me that the performance seemed low 😉

Noticing the alerts

To be really notified I created an filter in my e-mail account that marks the mails with ‘critical’ or ‘warning’ status as important. This way I get notified directly because I don’t let my phone notify me of every mail I receive.

Conclusion

Rackspace is very fast, easy to use and has a great community that helps you getting started in minutes.

With just about 15 to 20 minutes effort and a current investment of 1.50 USD / month I have a very easy to set up and hopefully reliable monitoring for my personal blog. This way I can react faster when something strange happens.

Disclaimer: I’m just a new customer of Rackspace and not related to them in any other way than that I’m paying them to monitor my blog.

 

Ask a Ninja: Where do you get your Ninja-skills from?

My second “Ask a Ninja” post is about where to get your skills from.

Well, first of all training, experimenting, using a good portion of your spare time for improving your skills. And then, of course, from others. Others that are willing to share their experience and their knowledge. Preferably in a medium that can be persisted (but also one-to-one sessions are invaluable for sharing knowledge).

Because of that I have a quite impressive library full of technical books. In the recent years I moved my library into an electronic form. I have a lot of ebooks and carry most of my library with me all the time on my Kindle. That way I can look up and re-read the important things whenever necessary. You don’t need to know everything, but you need to know where you have that information when it’s required.

So, what’s the point of this blog post you ask?

Well, I just stumbled upon an impressive library in ebook form that is available for free from Microsoft. Its a Huge collection of free Microsoft ebooks for you. They cover Sharepoint, SQL Server, Visual Studio, Web Development, Windows Store development, Azure and Windows.

So, if you want to improve your Ninja-Skills – go and grab them while their hot and start to read. And of course, spent some time to experiment with the knowledge to tighten what you just read. 😉