Note to myself: Regular Expressions performance

This post is mostly as a reminder for myself, to not loose these important links again. But said that, it's probably interesting for you too, when you care about performance and the .NET behaviour around regular expressions (Regex).

The common things

In the .NET world, some things are very common. First is, you are advised to use a StringBuilder whenever you concatenate some strings. Second is: If a Regex is slow, use RegexOptions.Compiled to fix it. Well... Now, in fact there are reasons for this sort of advise. String concatenation IS slow, for various, commonly known reasons. But still a StringBuilder has some overhead and there are situations where using it imposes an unwanted overhead.

The very same goes for RegexOptions.Compiled, and Jeff Atwood, aka Coding Horror, wrote a very good article about that a few years ago: To compile or not to compile (Jeff Atwood).

In one of the comments another article from MSDN (BCL Blog) is referenced, where the different caching behaviour of Regex in .NET 1.1 vs. .NET 2.0 is explained: Regex Class Caching Changes between .NET Framework 1.1 and .NET Framework 2.0 (Josh Free).

The not-so-common things

There is only a single thing that is true for each and every kind of performance optimization. And it's the simple two words: "It depends.".

With regular expressions, the first thing any performance issue depends on is, if you really need a regular expression for the task. Of course, if you really know regular expressions, what they can do and what they can't, and for what they are the correct tool, you are very likely to not run into those kinds of problems. But when you just learned about the power of Regexes (all you have is a hammer) everything starts to look as a string desperatly waiting to get matched (everything is a nail). What I want to say is: Not everything that could be solved with a Regex also should be solved by one. Again, I have a link for me and you to keep in your Regex link collection: Regular Expressions: Now You Have Two Problems (Jeff Atwood).

Now, finally to performance optimization links.

There is a good blog article series on the MSDN BCL Blog (like the one above) that goes very deep into how the Regex class performs in different scenarios. You find them here:

And, besides those, once again a nice article on "catastrophic backtracking" from Jeff: Regex Performance (Jeff Atwood).

One more thing

There are three articles, that are not really available anymore. Three very good articles from Mike, that you can only retrieve from the wayback machine. I'm really thinking hard about providing a mirror for these articles on my blog too. But until then, here are the links:

Ask a Ninja: Is the “Googlevelopment” approach bad?

I stumbled upon a recent and very interesting blog post from Rick Strahl: "The Search Engine Developer". Rick in turn was motivated by a post from Scott Hanselman who asked "Am I really a developer or just a good googler?".

That inspired me to write this post, too. Mostly because this topic has to do a lot with self-improvement, learning and attitude.

What is it, what Ninja calls Googlevelopment?

We all know it: If we encounter something in our job we don't know, it is very tempting to throw some words related to it into the search engine of your choice, sift through the results and if there's a link to StackOverflow or to a blog from certain persons you know (like from conferences, book authors, via twitter, from others that pointed you to them earlier), these are your first stops. You don't even look further, because your problem is most probably solved. You copy the code, paste it into your solution, make some adjustments, test it and don't think further. There's no problem anymore.

To the a point

Scott is just having so much keystrokes left, and because of that didn't give a broad explanation on WHY he has the opinion he wrote down. Well, it isn't even in fact an opinion you read in his post, but a call for action: Try to stop googlevelopment and do it the old fashioned way: Write it yourself, go to user groups, learn, do code Katas etc. One can easily guess that Scott thinks googlevelopment is a bad Hobbit habit, and you shouldn't do it.

Rick, instead, was a bit more chatty. He mentioned that it "feels like [he's] become a slacker at best and a plagiarizer at worst" sometimes. He summed up his experience, back to the days where there simply was no publicly available Internet - no chance to copy your code -, through the 90ies (some online manuals, discussion forums), through the millenium where blogs started to spread and up to now, where collaborative sites like StackExchange are flourishing.

Using libraries, for Rick, is "standing on the shoulder of giants", and copying and adopting code from the intertubes gives him a rough feeling about the interior of the library, to be able to use it the right way, but not too deeply because, his example was a QR code library, that's not his actual problem domain.

He, while being totally right on that matter, said that there is no need to re-invent the wheels others invented previously. And then, there's this bummer: "It's fun to wax nostalgic about 'back in the day' stories, but I don't miss those days one bit. I don't think anybody does..."

Not missing the old days?

Rick, honestly? You don't miss these days? I think this 'back in the day' time was the time that made you the developer you are today. Those were the days that made you learn your stuff.

Today's younger developers, that didn't went through this (more or less hard) school of learning by themselves, trying things, failing, learning from their failure, inspecting stuff, who JUST started as 'search engine developers' or googlevelopers, can't really leverage the information they find on the net. Fiddling around with your platform, with your compiler, with sample sources (if any), with documentation is in the first place teaching you how to learn.

Rick then goes on describing that, because there are so much things out there, it could happen that you have a great idea and want to go on this. Then you might find finished implementations (even if not really 'good') - and just stick with them. Even if those implementations would deserve a new competing library from exactly you - because you could do better. But you left it alone.

Making this decision, re-implement or stick with a not-so-good solutions, is, of course, mainly driven by time / effort / money, but also by an educated analysis of risks and chances and the technical debt you're taking when using a not-so-good solution. You also need to be educated to estimate whether a re-invention would benefit your (and maybe others too, talking about open source) solution.

You can't, however, get that evaluation right when you haven't learned the implications of doing it yourself vs. using existing stuff when you did not do a lot by yourself previously.

Ask a Ninja: Is Googlevelopment bad?

So, now it's time for my personal opinion on that topic.

I already mentioned that I'm not with Ricks point of view. I think it's sad that he does not miss the old days. I started developing software very early. I got my first PC when I was 9 and two to three years later just 'using' it got boring. With 14 I wrote the first program that I sold to a doctor to manage lists of his patients. The german health insurance card was just available and there were keyboards with an integrated card reader that would just 'type' in all the data from the card when it was inserted.

My program just stored the data in a flat file (I didn't know that this format I chose was already known as CSV), and I had to invent and implement my own sorting algorithm. If I remember correctly, I did something like an insertion sort. I figured out ways to send that data to a printer when requested. And I spend a lot of time formatting the outputs of my program to look nice and present them in a beautiful way to its users (mostly young women that worked there and that I tried to impress back then, hell I was 14 🙂 ). So, I figured all that out. It took long. I learned a lot. And it was fun.

I'd love to learn new stuff all day. Fiddling with stuff. Trying to get things done by myself. I really miss that a lot. Sadly, in todays businesses this isn't possible anymore. There's just a tiny window of time available for that kind of stuff.

Conclusion

Finally Rick comes to this conclusion: "Search engines are another powerful tool in our arsenal and we can and should let them help us do our job and make that job easier. But at the same time we shouldn't let them lull us into a false sense of security - into a sense of thinking that all we need is information at our fingertips.".

Having all that information at our fingertips empowers us to build great stuff. It is information we can use to learn from it. And we have the power to decide to NOT use it. Rick linked to an older article from Scott: We need to Sharpen the Saw - this is us - on a regular base.

We should try to develop - not googlevelop - more stuff by ourself. This strengthens our skills and makes us more sensitive for when we have to use stuff others did. We need to find the right balance between "standing on the shoulder of giants" and trying to be the giant. This fine line, where you're in balance, is a moving target, though:

  • Young Padawan, more fiddling on your own, not using the force you must.
  • Knight, use the force and learn from it.
  • Master Jedi, more fiddling on your own again you should.

This is my idea. Well, not really mine. I just adopted it. With some friends I share a maxime. In German: "Schau in Dich. Schau um Dich. Schau über Dich." This goes for three steps of learning:

  1. "Schau in Dich." - Look inside you. This is about self-awareness. You should learn about yourself, about your learning.
  2. "Schau um Dich." Look around you. This is about learning not from yourself, but from others. And also about learning what your influence is on others, but that would go to far at this stage.
  3. "Schau über Dich." Look beyond you. Okay, that is a very loose translation. The aim of this part is to make you think about things in a larger scale, and push the limits.

This is also, what the learning of an apprentice, journeman and master craftsman was back in the old days. The apprentice should learn to learn. The journeman travels around, this enables him to learn more from others of his craft. The master then knows the basics of his craft, but he also tried to improve his skills, to be able to compete. Masters also could leverage their skills to try our really new stuff on their own - and succeed with that. Masters usually also were eligible to join the guild, where there was a lot of exchange between them - also about new stuff they discovered.

There is a slight chance, that this, what was done for decades back then, had some truth in it. And we software developers, engineers or craftsmen, could (and should) try to map this to our daily learning again.

Bottom line

Well. This is just a line. At the bottom. For no reason. 🙂

Why FireMonkey is wrong, the second

I just stumbled upon a really, really great post on Steven Sinofskys Blog.

His article is about the challenges of cross-platform development in general, and he brings up some rather good points on why some approaches will eventually fail.

I'm pretty sure he doesn't even know about FireMonkey, but this is what he has to say on cross platform libraries in general:

One of the most common approaches developers attempt (and often provided by third parties as well) is to develop or use a library that abstracts away platform differences or claims to map a unique “meta API” to multiple platforms. These cross—platform libraries are conceptually attractive but practically unworkable over time. Again, early on this can work. Over time the platform divergence is real.

And then he continues:

Worse, as an app developer you end up relying on essentially a “shadow” OS provided by a team that has a fraction of the resources for updates, tooling, documentation, ..
[...]
It is important to keep in mind that the platforms are evolving rapidly and the customer desire for well-integrated apps (not just apps that run).

The very last point is what I already stated in my own post on why FireMonkey is wrong. I didn't even write about the even more important first ones. And this only are quotes from one paragraph where he thinks about cross-platform libraries.

I strongly suggest that you take a few minutes and read what Sinofsky wrote about cross-platform development. And then, if you currently feel that FireMonkey could be the right tool for you, try to understand his points and re-think your position on cross-platform tooling. I'm sure you will see that FireMonkey can't be the right tool for you - or anybody.

Ask a Ninja: Where do you get your Ninja-skills from?

My second "Ask a Ninja" post is about where to get your skills from.

Well, first of all training, experimenting, using a good portion of your spare time for improving your skills. And then, of course, from others. Others that are willing to share their experience and their knowledge. Preferably in a medium that can be persisted (but also one-to-one sessions are invaluable for sharing knowledge).

Because of that I have a quite impressive library full of technical books. In the recent years I moved my library into an electronic form. I have a lot of ebooks and carry most of my library with me all the time on my Kindle. That way I can look up and re-read the important things whenever necessary. You don't need to know everything, but you need to know where you have that information when it's required.

So, what's the point of this blog post you ask?

Well, I just stumbled upon an impressive library in ebook form that is available for free from Microsoft. Its a Huge collection of free Microsoft ebooks for you. They cover Sharepoint, SQL Server, Visual Studio, Web Development, Windows Store development, Azure and Windows.

So, if you want to improve your Ninja-Skills - go and grab them while their hot and start to read. And of course, spent some time to experiment with the knowledge to tighten what you just read. 😉

Ask a Ninja: Automated WordPress blog backup using Git

I thought I had posted this already, but the article list of my blog tells otherwise. Early this year I posted how I moved this blog from the old server to the current one. After that I thought I also could automate the backup this way.

So, what are the required steps?

  • Create a dump of the database.
  • Add the dump and all local modifications to the local repository.
  • Commit the changes to the local repo.
  • Push to a remote repository.

In my case I like to go sure and push to two remote repositories.

So, this is the script that will backup my blog and push it to my repos:

D:
cd D:Websdotnetninja.de
SET PATH=%PATH%;D:MariaDBbin
del backup.sql
mysqldump --skip-dump-date -u backup blog_dotnetninja.de > backup.sql
git add .
git commit -m "Automatic backup"
git push origin
git push backup master
exit

To automate the backup I just created a simple scheduled task to execute this script once a day.
Restoring the blog from the backup is as easy as described in my blog post about the move.

Ask a Ninja: Do I need Typescript?

If the .Net Ninja would have been asked this question, this would be the answer:

A few days ago Anders Hejlsberg showed a new thing currently brewing in the Microsoft labs: TypeScript.

TypeScript is:

  • JavaScript
  • + some (optional) language extensions
  • + a Compiler (more of a extractor, in fact), that removes the extensions and throws out vanilla JavaScript

The compiler itself is also written in TypeScript, so it can be compiled down to pure JavaScript and run wherever JavaScript will run too.

So, now that we know that TypeScript is a mere superset on top of normal JavaScript - what is in these additions that could be interesting?

  • Strong typing
  • Classes
  • Interfaces
  • Simple inheritance
  • Modules

Well, in fact that's pretty much it. With some annotations in Pascal-Style (that is, colon + type identifier) you can define that a specific variable, function argument or function return value needs to be of a certain type.

var testFunc = function(arg1: string) { return "Argument was: " + arg1; };

Now the TypeScript compiler knows that only strings should be passed into the function assigned to testFunc. And it can infer from the input argument and the operation within this function, that the return value must also be a string. Now, when you try to pass i.e. a number into this function, the compiler will warn you about this, and the same goes when you want to add a number to the return value of this function.

Actually not only the compiler, but also the full IDE support in VisualStudio will highlight this as a potential problem. Also the IDE is so smart to restrict your Intellisense autocompletion to valid types only. These simple annotations are a big player in making JavaScript a bit safer when working with different types.

TypeScript also allows you, to annotate external libraries like jQuery, Prototype, Qooxdoo etc., and it comes with some of them already pre-annotated to give you a head-start.

The other interesting thing is that the way of modularizing the scripts sticks strongly to what is currently proposed to become the next EcmaScript 6 standard. Of course this is only a specification draft by now, and will take some time to be finalized, and it is not sure if the specs will stay this way forever, but this way it is very likely that what you learn with TypeScript can be used in the future for vanilla JavaScript too.

Ask the Ninja: "So, do I *need* TypeScript?"
Ninja says:
Need as in totally and absolutely required? Of course no.

TypeScript is an addition to JavaScript that, if used correctly, can help you avoid some nasty bugs. And only, if you are a fan of strong typing and come to JavaScript from other strong typed languages on the .NET or Java Platform or even from Delphi. Then TypeScript is targeted for you!

When you already are a happy JavaScript developer and make use of the dynamic typing features of the language, switch prototype chains on your objects as required and love applying and removing things at run-time, then there is nothing in TypeScript for you.