March 2013 - Gingter Ale

2013-03-26

I’m done with Drobo, too…

I made a mistake. A big mistake. Something I can correct, and which I will correct very soon.

My mistake? I already teased it in my last post about my pet project: I bought a Drobo S as a storage solution. The Drobo S is the predecessor of the current Drobo 5D.

The title of this post borrows from Scott Kelby's post "I'm done with Drobo". In it he describes the issues he had with his Drobo. He eventually ended up in a situation where all the drives in his Drobo were still okay but the Drobo itself wasn't, and since his device was out of warranty he would have had to buy an extended support package to be able to access his data.

Well, my own situation is not (yet) that bad, but I have a strong feeling I may end up in a similar position.

Now, what are my issues with my Drobo S?

I'm already on my second replacement unit. The very first Drobo I received after ordering had a problem with the middle drive bay and wouldn't recognize a disk in it. In the first replacement unit all five slots worked, and that was fine for almost a year.

Then, as I already mentioned in my previous post, my server suddenly started losing the connection to the Drobo. Regularly I came home to find that my home server had lost the drive. Only a reboot of the host computer would fix this (usually, but not always).

This of course was very annoying, but not extremely critical, because I only had media files stored on the Drobo, which were served by a TVersity media server. I could not stream videos around my home while the drive was lost, but that was okay in the beginning. The connection was eSATA, because USB is too slow for streaming two full-HD streams at once.

It became more critical when I started to run my evaluation VM on that home server and placed the server's virtual hard disk on drive D (my Drobo drive). A disconnect could leave the VM in an inconsistent state and potentially damage my infrastructure.

Then the disconnects started to happen more frequently, until I encountered this issue daily and even multiple times on a single evening. As a software developer I know how to troubleshoot and check for possible sources of error: the Drobo also lost its eSATA connection to another machine. USB was fine on both, but as already mentioned, not an option because of its slowness.

The Drobo support tried hard to fix this and eventually sent me another replacement unit.
This was okay. Now - guess what happened then? The replacement unit started to show the same issues too. Now even the USB connection gets dropped once in a while.

So, while I initially was extremely happy with my Drobo and its performance, I'm currently constantly on alert for the moment my Drobo will eventually fail and won't be accessible anymore - together with all the data I stored on it.

Of course I have a backup of the important data (honestly, my terabyte-sized video archive isn't important enough to back up; losing it would hurt, but given how little time I have to watch those videos, it wouldn't be that bad). But the main idea of a large storage with a very fast connection, directly attached to my home server, is to have direct, instant, always-on access to the data. Something I thought my Drobo could provide. But something a Drobo obviously isn't capable of providing reliably.

So I'm done with Drobo, because I can't trust my device to function properly any longer.
I need to look for alternatives soon. If anybody knows a solution for my problem that can hold the roughly 6 TB of data I currently have, with more coming, and offers very good performance and throughput (just like a normal internal HDD), please tell me.

2013-03-25

Setting up my infrastructure – Part 7: The evaluation begins: Installations

After I picked the evaluation candidates, I first tried a test setup on a development VM at home.

Download

For this I downloaded the evaluation products from the Atlassian homepage and the free installers from JetBrains. Please note the slight difference between a 'product' and an 'installer' download. I wanted to do a side-by-side installation of all tools on the same VM to compare them easily.

Just as a little side note: I will write a blog post about my hardware, which drove me crazy during the evaluation. Just this much for now: I have a Drobo storage attached to my server at home, and I had the virtual server hard disks on that drive. Now guess what happens when the host machine suddenly loses the connection to the Drobo. Regularly, over and over again. But as I said, this will be part of a separate blog post.

So, after the download I ended up with two .exe installers for YouTrack and TeamCity, and with two zip archives for Jira and Bamboo. The Atlassian web site then directed me to a documentation link where I had to look for the installation instructions matching my setup.

Installation

All four products are Java-based.

JetBrains solved this very neatly, apparently by packaging the required runtime directly with their programs. I did not need to install Java on the system before installing YouTrack and TeamCity. Both programs, as well as the first TeamCity build agent, were automagically installed as autostart Windows services. They installed fine and immediately started running on their respective ports, which I could change in the installer.
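A quick way to confirm that a service installed this way is really up is a plain TCP connect to the port chosen in the installer. The following is a minimal sketch in Python using only the standard library; the port numbers in the usage part are hypothetical placeholders, not the products' defaults.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: nothing is accepting connections there.
        return False


if __name__ == "__main__":
    # Hypothetical ports; use whatever was entered in each installer.
    for name, port in [("YouTrack", 8080), ("TeamCity", 8111)]:
        state = "up" if is_listening("localhost", port) else "down"
        print(f"{name} on port {port}: {state}")
```

A connect test only shows that something accepts connections on that port; opening the web UI in the browser is still the real check.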

Now the tricky part began: installing the Atlassian tools. First of all, the documentation suggested installing the 32-bit SDK, even on 64-bit machines. Just to get this straight: we're talking about software that is meant to run in enterprise production environments and for which they recommend a lot of RAM, yet a 32-bit JVM can only address a limited amount of memory. This was my first WTF moment with the Atlassian tools. I would have loved to choose a 64-bit runtime, and not the SDK but the plain runtime, but well...

So, I installed Java. The JDK. For 32 bit. I then had to unzip the zip file and choose an installation folder and an instance folder: second WTF moment. The instance folder is something like the working directory of the program. Okay, so I did that. A small side note in the installation documentation mentions that there should be no spaces in any path name. "Any" means: no spaces in the path to Java, the product's installation directory and the product's instance directory. Of course, Java is installed in "C:\Program Files..." by default. With a space in it.

Being a software developer myself, I can only shake my head at such a ridiculous requirement. Software should be written so that it can cope with any valid path on its operating system, especially software that is intended to help other software developers. Well, of course I ran into problems with my default Java installation location and had to uninstall Java and reinstall it in another location.
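Because the documentation rules out spaces in all three paths, a quick check before installing can save a reinstall. A minimal sketch in Python; the example paths and the dictionary layout are hypothetical:

```python
from __future__ import annotations


def paths_with_spaces(paths: dict[str, str]) -> list[str]:
    """Return the names of all entries whose path contains a space anywhere."""
    return [name for name, path in paths.items() if " " in path]


# Hypothetical locations for the Java runtime and one Atlassian product.
candidates = {
    "java": r"C:\Program Files (x86)\Java\jdk1.7.0",
    "jira_install": r"C:\Atlassian\Jira",
    "jira_home": r"C:\Atlassian\Application Data\Jira",
}
print(paths_with_spaces(candidates))  # ['java', 'jira_home']
```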

The next tricky part was installing the Atlassian software as a Windows service. You have to use a Java service wrapper tool manually for that. Oh, and I almost forgot: to configure Jira and Bamboo you need to edit configuration files by hand, which are not really well documented...

In the end, I got all four systems running. That is, I could open them in the browser and set them up.

So far, it's an extremely clear plus for YouTrack and TeamCity. Installation is very easy, with no hassle over config files, Java paths and service wrapper tools. The Atlassian stuff might be suited for enterprise use, with a dedicated person for setting up, configuring, fine-tuning and maintaining the system, but for a one-man show the overhead of simply installing a tool seems too much.

In the next post I'm going to describe the first functionality tests.

See the other parts in this series:

[contentblock id=infrastructurelinks]

2013-03-11

Why I don’t post pictures of my daughter in social networks

I've been asked a few times why I don't share pictures (or any data) of my little daughter on the intertubes.

Well, it is not entirely true that there is no picture of her at all. If you know a few details that aren't publicly available and where to look / what to search for, you may be able to find a single image showing just her. There is one other picture she happens to be visible on: my wife's Facebook profile picture, which shows our whole family at our wedding. So that's not a photograph of her alone, and you can't see much of her on it. Apart from the first image, there is not a single mention of her name on the internet, and I want it to stay that way.

I am careful about what I share on the internet and with whom I share what kind of information. It's a bit tricky to keep track of that, but it's possible. For example, I'm pretty sure that nobody knows my middle names without having heard them in person from me or from someone who knows them. Well, at least Google doesn't know them, which is a good indicator. Oh, and by the way, if you know them and haven't heard them from me, then please tell me who told you. I need to sort that out... 😉

So, why exactly are my daughter's name and pictures of her such a secret to the public (= non-friends & family)?
It's because I want to protect her. And it's because she is (or rather: she will be, some day in the future) the only one who should decide what is shared about her.

Social engineering is a threat

So what is it that I want to protect her against? A very simple form of social engineering used to gain her trust later. The sad thing is that it's not just very easy, but extremely easy.

I want to give an example: what could be the problem with sharing a picture of her, now that she's about 20 months old, playing with a ball in our garden together with her uncle and some other person well known to our family?
This is not a problem now, but let's pretend two and a half years have passed since that very day, and she's at an age where she is an attractive target for a child abuser.

Well, this bad guy could see this old picture on a social network. By analyzing the picture's metadata (which records when the picture was taken, not necessarily when it was shared), he knows when it was made, and potentially also the geo-coordinates of where it was taken, because modern smartphones and cameras add this information to the image data. He then probably simply follows the connection graph of family members and their friends to find out the names of all the people in the picture. It's even easier if those people are tagged.

With all this information, he can make up a story around that day - she certainly won't remember it. At best she knows the situation from that very picture, her only memory of the occasion. So he could say something like "Do you remember me? I'm that friend of your uncle Martin and Pete. I was around when we played with that red ball in your grandpa's backyard, and I helped you up again after you hurt your ankle." The last part is totally made up, but nobody remembers such a small injury - and of course, he helped her out. That must be a nice friend of Martin and Pete she had so much fun with.

And then he goes: "Martin said I should pick you up and bring you to your Mom. She's back from work early and wants to go to the city with you to buy some nice clothes. Perhaps you'll get some ice cream too?" That's it. She's going with him.

And all this just because too much information was shared. Information that is harmless in its specific context, but can be combined into a seemingly authentic story a little girl can't see through. The more information you share, the better the story can get, and the more trust that person can build up.

Is it real?

The example above is of course made up. But it was made up at a time when burglars already check your Facebook timeline to find out when you're on vacation, and then check Google Maps and Google Earth to determine the best way onto your property and where to vanish unseen with all your stuff. Such a story is possible, and someone will eventually do this, or already has and I just haven't read about it.

If you share, think about what and to whom

I also found some good advice in this other post over here at reputation.com.
Of course, the zero-sharing path I chose may not be the right one for you. If you really want to share, then be careful about what you share. Remove metadata from pictures before uploading them. And be careful about whom you share with. Most social networks have options to share only with your close friends or certain groups (like your family). And you can make sure that those people are not able to re-share this with larger circles.
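Removing metadata does not necessarily require a special tool. In a JPEG file, EXIF data (including the capture date and GPS coordinates) and XMP data live in APP1 segments, so dropping those segments removes them. The following is a minimal sketch in Python using only the standard library. It assumes a well-formed JPEG without fill bytes between segments and copies everything from the start-of-scan marker onward unchanged. Other segments (for example APP13 for IPTC data) can carry metadata too and are kept here.

```python
import struct

SOI = b"\xff\xd8"  # start of image
SOS = 0xDA         # start of scan: compressed image data follows
APP1 = 0xE1        # application segment used for EXIF and XMP


def strip_app1(jpeg: bytes) -> bytes:
    """Return a copy of a JPEG byte string without its APP1 (EXIF/XMP) segments."""
    if not jpeg.startswith(SOI):
        raise ValueError("not a JPEG file")
    out = bytearray(SOI)
    pos = 2
    while pos < len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == SOS:
            # Everything from here on is image data; copy it unchanged.
            out += jpeg[pos:]
            break
        # The segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        segment = jpeg[pos:pos + 2 + length]
        if marker != APP1:
            out += segment
        pos += 2 + length
    return bytes(out)
```

To use it, read the picture with `open(path, "rb")`, pass the bytes through `strip_app1` and write the result to a new file before uploading.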

This is what this post is about:
Use those features. Take control of your information flow. And make sure that you don't share information that could be abused easily.

2013-03-05

Setting up my infrastructure – Part 6: The evaluation candidates

In this post I want to introduce the evaluation candidates for the bug tracker and the continuous integration software I'm going to use for my pet project.

Since I don't want to spend too much time on my infrastructure, I only want to check out two or three candidates for each, and since I already have a list of bug trackers I'm definitely not going to use, I start by excluding those. Also, the Wikipedia comparison of issue tracking systems is a good reference for excluding some applications. I'm going to start from the bug tracker side and then do an inner join with the available CI servers, using the ability to integrate with each other as the join condition.
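The "inner join" can be taken literally: only bug tracker/CI server pairs that integrate with each other survive, and a tracker without any matching CI server drops out completely. A minimal sketch in Python; `HypotheticalTracker` is made up, and the integration table only contains the pairings discussed in this series, not a complete compatibility matrix.

```python
from itertools import product

bug_trackers = ["Jira", "YouTrack", "HypotheticalTracker"]
ci_servers = ["Bamboo", "TeamCity"]

# Illustrative join condition: pairings considered in this series (not exhaustive).
integrates = {("Jira", "Bamboo"), ("YouTrack", "TeamCity")}


def inner_join(trackers, servers, condition):
    """Return every (tracker, server) pair for which the join condition holds."""
    return [(t, s) for t, s in product(trackers, servers) if condition(t, s)]


pairs = inner_join(bug_trackers, ci_servers, lambda t, s: (t, s) in integrates)
print(pairs)  # [('Jira', 'Bamboo'), ('YouTrack', 'TeamCity')]
```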

My filters

Again my little disclaimer: This is my personal list of filters for my personal pet project. They may or may not apply to your use case or requirement catalog.

First of all, I want to host the solution myself and not depend on someone else's infrastructure. Second, as I already mentioned, I want at least minimal integration between bug tracker and CI server, so anything where the two don't know about each other is out of scope, as are tools that don't integrate with any CI server at all.

You already know that I use a Windows Server for hosting, and I don't want to mess around too much with my IIS, so I'd like to stick with solutions that are either ASP.NET or PHP applications, or that are not hosted in IIS directly. Besides that, I don't want to manually administer an extra Apache on the system. I don't know enough about it, and I don't want to spend my time learning how to manage another web server when I already know how to manage my IIS and administering web servers is not my main business. I'd rather spend my time on things that really push my skills forward and make me more specialized.

When it comes to the database, I want to use either MySQL/MariaDB or Microsoft SQL Server Express. I know how to manage both, as well as Oracle (which I don't want to set up and keep running myself without the help of an experienced Oracle DBA), and learning to set up and run yet another database is not on my to-do list for now.

Those restrictions already strike out a lot of possible systems, and the next one will make the list even shorter: I don't want to use something that is not commercially maintained. There are several reasons for that. If there's a bug that itches me, I don't want to hope that the community is going to fix it. In several open source projects the normal answer to a bug report is "where's the pull request for the fix?". I don't want to dig into the code of my bug tracker to fix issues myself. I'm willing to pay for my tooling even if I try to keep expenses low.

This is the last filter: the software should be affordable for a one-person show and should scale up to a small team of about five before it gets more expensive.

The candidates

After applying my filters to the list of available bug trackers, only a few are left. All of them are commercial solutions where I can rely on support. Then I filtered a bit further for products from companies that I feel are well known in the developer communities, so I can additionally rely on fast help via StackExchange.

First of all, something I tested some time ago, and which is indeed good software for keeping track of your project and your to-dos, is FogBugz, but the self-hosted edition is too expensive for me (the entry point is a 5-user licence at 999 USD).

As I already use a lot of stuff from Atlassian, it would be logical to check out their solution too. That would be Jira. It has the same 10 USD for 10 users entry point and integrates with Stash, FishEye and Crucible. That would make it a first-class citizen in my current environment. They also have a build server, Bamboo, that would fit in nicely too. So Jira and Bamboo are my first candidates for the evaluation.

Besides that, I already use tooling from JetBrains (ReSharper, dotPeek), and they also offer a bug tracking tool called YouTrack and a build server named TeamCity. For both tools JetBrains offers free licences that restrict either the number of users or the number of build configurations. So with 10 users and my single project I would be within the free licence for both, and upgrades for larger teams are affordable, starting at 450€ for a 10-user YouTrack licence, and going to 25 users is a mere 225€ upgrade. Upgrading TeamCity is more expensive, but it is also possible and allowed to set up more than one free TeamCity instance if that were really necessary. This seems like a good package, so they are in the evaluation.

So far I am very happy with SourceTree, ReSharper and dotPeek, and I have a feeling that both companies can deliver decent bug tracking and continuous integration software for my needs. That's why I chose to stop picking candidates at this point. Evaluating four products is already a non-trivial task, and if both products in a category fail, I can still choose other candidates to check.

Continue with the next part, or see the other parts in this series:

[contentblock id=infrastructurelinks]

2013-03-01

Setting up my infrastructure – Part 5: Additional tools, server and hosting

In this post I'm going to mention all the other necessary stuff for a project like mine.

Preamble: this is a private spare-time project, and as such I don't want to spend too much money on it. I also don't (yet?) know how long it will take, so I don't want to pay too much, especially not for service subscriptions.

So, where to start? I think source control is the most important thing for a software project, so let's go.

Source control

I chose Git. In the first infrastructure post I already mentioned some of my versioning tooling (which in fact has changed since then). I have a lot of experience with SVN, not yet so much with Git, but as I already mentioned, from an adoption and acceptance point of view Git seems to be the new mainstream source control tool. It is powerful, it is cross-platform, and GUI client support is growing. My other alternative would be Mercurial (Hg), but despite its better Windows GUI clients, its adoption is not as good, and I want to be able to ask questions on StackOverflow and get help quickly.

So, I already said I was using Bitbucket from Atlassian for hosting free private repositories. By now this is only partially correct. I decided to self-host my repositories and use Bitbucket as an additional off-site backup. Why? I don't want to be fully dependent on a single point of failure (Bitbucket). They host in the cloud, and we all saw that big cloud players like Amazon with EC2 and Microsoft with Azure can run into large-scale problems. Even if Atlassian took every precautionary measure to keep their service available, which is probably not the case given that a lot of people only use the free stuff, something really stupid like expired certificates on the cloud side could render the service unavailable for hours or even days.

My idea is the following: I mainly work on my self-hosted repository. Whenever my build server has a new successful build, it automatically pushes that state to the Bitbucket repo. This way I have a copy of the repo on my dev machine, one on Bitbucket with the latest fully working state (since you commit and push often, that should not be too far behind my local copy), and of course my self-hosted repo. That should be enough safety in case something happens to my notebook, my server or Atlassian.
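The push-on-success idea boils down to a post-build step on the build server. A minimal sketch in Python, assuming the build job can hand over its result as a boolean and the Bitbucket repository is configured as a Git remote named `bitbucket`; the remote name, the branch and the `push_if_green` helper are all hypothetical. The `run` parameter exists so the step can be exercised without touching a real repository.

```python
import subprocess
from typing import Callable, List


def mirror_command(remote: str = "bitbucket", branch: str = "master") -> List[str]:
    """Build the git command that pushes the given branch to the backup remote."""
    return ["git", "push", remote, f"{branch}:{branch}"]


def push_if_green(build_succeeded: bool,
                  run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
                  remote: str = "bitbucket",
                  branch: str = "master") -> bool:
    """Push to the off-site mirror only after a successful build.

    Returns True if a push was attempted and git exited with status 0.
    """
    if not build_succeeded:
        # Broken builds never reach the backup repository.
        return False
    result = run(mirror_command(remote, branch), check=False)
    return result.returncode == 0
```

Keeping the off-site copy one step behind the self-hosted repo is deliberate: Bitbucket then only ever holds states that built successfully.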

Speaking of Atlassian, they have this great Git client SourceTree for Mac. They recently announced opening up a beta test for SourceTree for Windows via Twitter. Guess what? I signed up 😉

You see, I use Bitbucket from them, I use SourceTree on the Mac from them, and I'm eager to get experience with their SourceTree for Windows. Atlassian is very present in my Git-centric versioning environment, which is why I also started to use their product Stash. Stash is Bitbucket on my own server. I can create repositories, manage permissions (okay, currently I'm the only user) and have it automatically manage my branches. And it is very cheap at 10 USD per year for 10 potential users. So if my project succeeds and I grow my development team beyond 10, I will surely have the money to upgrade.

Source quality

Since you now know the tooling I use to store my sources and to manage them on my server and my development machine, I want to introduce another tool I bought and installed, even if its usefulness is (currently) questionable. I bought FishEye and Crucible from Atlassian too. At 10 USD each it was not a real investment, and I feel that FishEye lets me keep control over my code more easily. It allows fast searching through all the project code (in 5 repositories for 10 users) and lets me browse the history of my code conveniently. Crucible, as a code review tool, is probably not of much use for a one-man show, but perhaps later on somebody wants to join my efforts on this project and potentially share in the revenues, if it becomes successful. Crucible is the only one of these tools where 10 USD gets you 5 rather than 10 users.

Hosting

For a long time I had a hosted Linux root server (dune) at Strato for 49€ / month. It used to host my email server (I completely switched to Gmail for my domain a few years ago), my first blogs, some home pages and discussion forums for the guilds when I was still playing. Besides that, I had a very small Windows Server at 1&1 (smarthost), which I got for 14 € / month as a special offer during my studies. But it was not powerful enough to replace all the services on dune.

As I already posted, this blog (and almost everything else hosted on dune and smarthost) has now moved to Gallifrey. Gallifrey is a big Windows Server 2012 'Level 4' V-Server at Strato, with 4 virtual CPU cores, 4 GB of RAM and a 250 GB HDD. Enough power to host those little web sites, my blog and my complete build environment. I ordered Gallifrey when there was a 6-month free offer, and it costs 29€ / month. So I canceled dune and smarthost, which in fact saves me about 34€ / month (49€ + 14€ - 29€) while offering more power at the same time.

Backup

As already mentioned, my sources are automatically backed up to Bitbucket. By now, I have also put the sources of this blog and all other homepages into Git repositories, which are backed up the same way. All databases are dumped regularly and copied to both my home server and a cloud storage. The same goes for the working directories with config files and changing contents: they are copied to a backup location, zipped and transferred together with the database dumps. All of that is triggered by a scheduled task on the V-Server.
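The zipping part of such a scheduled backup fits in a few lines. A minimal sketch in Python, assuming the database dumps already exist as files (how they are produced depends on the database) and that the files and directories to back up are passed in; all paths and the `create_backup` helper are hypothetical, and copying the archive to the home server and the cloud storage is left out.

```python
import zipfile
from datetime import datetime
from pathlib import Path
from typing import Iterable, Optional


def create_backup(sources: Iterable[Path], target_dir: Path,
                  now: Optional[datetime] = None) -> Path:
    """Zip every file below the given sources into one timestamped archive."""
    now = now or datetime.now()
    target_dir.mkdir(parents=True, exist_ok=True)
    archive = target_dir / f"backup-{now:%Y%m%d-%H%M%S}.zip"
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for source in sources:
            # Accept single files (e.g. a database dump) as well as directories.
            files = [source] if source.is_file() else sorted(source.rglob("*"))
            for path in files:
                if path.is_file():
                    # Store paths relative to the source's parent to keep the layout.
                    zf.write(path, arcname=path.relative_to(source.parent).as_posix())
    return archive
```

The timestamp in the file name keeps older archives apart, so the transfer step can simply copy whatever is new in the target directory.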

Summary

So the toolset for my pet project is right now:

VMware Fusion
VMware Workstation
Microsoft Windows 8 Professional
Microsoft Visual Studio 2012 Professional
JetBrains ReSharper
Source management:
- Git command line client
- GitHub Client for Windows
- Atlassian SourceTree for Windows Beta
- Atlassian Bitbucket for backup
- Atlassian Stash for Repository management
- Atlassian FishEye for code search
- Atlassian Crucible for code reviews
Other little helpful tools:
- JetBrains dotPeek .NET decompiler
- The Regulator, a regular expression tool
- LinqPad to easily test code snippets and experiment around

Update: Fixed some typos. Thanks Manuel 🙂

Continue with the next part, or see the other parts in this series:

[contentblock id=infrastructurelinks]