After working as a freelancer on the biggest social network in Spain, I’ve come to value the principles of what Eric Ries calls the Lean Startup. Among all the things he talks about, one struck us as really powerful: the notion of continuous deployment. Having experienced how slow development was at my previous company, I realized that what Eric proposes was key to the success of Inkzee.
We thought the decision over and concluded it was worth the effort: not only would we learn a lot during the process, but we would also get greater productivity in the long run. So off we went to implement the 5 steps Eric proposes:
- Continuous integration server: We created our own home-brewed mini system that lets us add new tests and run them whenever we need to. For now it’s a rather rudimentary system, but it does the job handsomely.
- Source control commit check: We already had a source control system, so we just added the commit checks. At first we thought this was a waste of time, but after a while we realized how many bugs we had been introducing without them. We added 3 very basic checks: a Python syntax check (with pylint), a very basic PHP syntax check and the automatic triggering of all the unit tests we have. These checks also enforce common style rules for all the code under SVN, a somewhat gruelling task at first (all the old code was triggering the syntax and style checks), but very rewarding in the end.
- Simple deployment script: We had a very small deployment script, but we fine-tuned it so that it works flawlessly with the new AWS infrastructure. We now make all deployments in the same fashion, which reduces the number of mistakes you can introduce in this step. The script also creates a backup copy of the previously running code, in case you need to revert to the old version because of some critical error. To date we’ve never needed to revert anything.
- Real-time alerting: We introduced a lot of Munin plugins to monitor all kinds of parameters on our servers. We even developed some very simple Munin plugins for Tokyo Cabinet. We are still missing the alerting itself: we have all the graphs, but we still need to set up some alerting framework to detect weird situations.
- Root cause analysis (5 whys): This is probably the most critical part of the process. We’ve realized that this 5th step is what makes all the previous ones work. The process is iterative: you start with something small, but with time it grows into an amazing process that is able to detect the slightest problem well before it makes it to the live servers. We’ve become used to always asking the 5 whys, and it’s helping improve the quality of the software we write.
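To make the commit check step more concrete, here is a minimal sketch of what such a check could look like. It is not our actual hook: the function names are made up for illustration, Python’s built-in `compile()` stands in for pylint to keep the example self-contained, and the PHP check assumes the `php` command line binary is installed (it is skipped otherwise). How the list of changed files is obtained from the source control hook, and the triggering of the unit tests, are left out.

```python
import shutil
import subprocess
from pathlib import Path


def check_python(path):
    """Return an error string if the Python file has a syntax error, else None."""
    source = Path(path).read_text(encoding="utf-8")
    try:
        # Compiling without executing is enough to catch syntax errors.
        compile(source, str(path), "exec")
    except SyntaxError as exc:
        return f"{path}:{exc.lineno}: {exc.msg}"
    return None


def check_php(path):
    """Return an error string if `php -l` rejects the file, else None."""
    if shutil.which("php") is None:
        # Assumption for this sketch: skip the check when php is not installed.
        return None
    result = subprocess.run(
        ["php", "-l", str(path)], capture_output=True, text=True
    )
    if result.returncode != 0:
        output = (result.stdout or result.stderr).strip()
        return output or f"{path}: php -l failed"
    return None


# Map file extensions to the (hypothetical) checker for that language.
CHECKERS = {".py": check_python, ".php": check_php}


def check_files(paths):
    """Run the matching checker on each path and collect the error messages."""
    errors = []
    for path in paths:
        checker = CHECKERS.get(Path(path).suffix)
        if checker is None:
            continue  # no check defined for this file type
        error = checker(path)
        if error:
            errors.append(error)
    return errors
```

A hook script would call `check_files()` with the files in the commit and reject the commit (non-zero exit code) whenever the returned list is not empty.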
So all in all, it’s an incredible experience that we are still figuring out, but to date the results have been impressive. We’ve managed to do 14 code uploads in a single day with no bugs whatsoever, and we’ve stopped introducing small, potentially critical bugs into the code base and all the way into production. If you want to give it a try, please do read the original posts by Eric Ries; you won’t regret it.

#1 by Jorge - December 28th, 2009 at 09:47
Alex,
I agree on most of it; small integration servers are indeed needed. Follow the path from a development server on the developer’s machine, to integration with more or less the same environment, and then to production.
For syntax I simply use the Python plugin for Eclipse, so that is taken care of.
In deployment I use versions for all content. That is, a CSS file app.css would go to S3 as app-0001.css. These files are compressed and given expiry dates far in the future. I can version independent modules (JS, CSS, themes, etc…). The integration and development servers use app-0000.css, with no future dates, loading normally. Every time I change a module, I generate a release of that part of the JS or CSS, and the HTML will call that file. I have a deployment script that takes care of this. Loading heavy JS content is faster this way for users who visit more than once.
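A minimal sketch of the naming scheme above, in Python. The function names, the one-year cache lifetime and the use of a `Cache-Control` header are assumptions made for illustration, not the actual deployment script; release 0 is treated as the development/integration build that is served without far-future caching.

```python
from pathlib import PurePosixPath

ONE_YEAR_SECONDS = 365 * 24 * 3600  # assumed "far future" lifetime


def versioned_name(filename, release):
    """Insert a zero-padded release number before the file extension."""
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}-{release:04d}{path.suffix}"))


def cache_headers(release):
    """Return far-future caching headers for real releases only.

    Release 0 (development and integration) gets no caching headers,
    so those files load normally on every request.
    """
    if release == 0:
        return {}
    return {"Cache-Control": f"public, max-age={ONE_YEAR_SECONDS}"}


print(versioned_name("app.css", 1))  # app-0001.css
print(versioned_name("app.css", 0))  # app-0000.css
```

Because every release gets a new file name, returning visitors keep the cached copy until the HTML starts pointing at the next release.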
I have a very simple monitoring alert that checks that a search completes correctly, which means the front end and back end are communicating and the web servers and services are OK. But this area is never enough: last night I found that the home page had an error and I had not noticed (I am playing around with a new front-end release), because I did not have debug set to False in Django, which is when an error message is shown and an email is sent with the error. Still, the better monitoring you can afford, the better.
#2 by Alex - December 28th, 2009 at 10:40
Hey Jorge, thanks a lot for sharing!! Yeah, the file versioning is quite handy, especially for js/css flushing. We should definitely implement something like that.
#3 by Jorge - January 6th, 2010 at 22:44
A follow-up on environments…
In the corporations I worked for in Spain, they usually have an environment for the developer (Eclipse, WebSphere, etc…), then a development server, an integration server and a production server. Some have even more environments. But there are usually problems going from one environment to another, no matter how hard people try to prevent it. So my advice would be: if you don’t have too many human resources, go with just a development server and a production environment. I mean a development server, not the environment on the developer’s machine.
If you have many people working on the project, then play around with environments. Integration ones are useful, even more so if you have domain issues with other web services, providers, etc… that need to replicate a production server.
Another problem with environments is that when many projects are uploading stuff to, for example, a development server, admins sometimes need to shut things down, which leaves people unable to test their apps on the development server. Amazon could help here: create an instance for the people on that project and have many development instances that are only used when analysts or project managers are testing, etc…
#4 by abarrera - January 7th, 2010 at 11:11
This is interesting… Right now we use 2 environments, testing and production. Both of them are (or we try to keep them) identical: same OS, same software, etc., so that something that passes testing doesn’t break things in production. The truth is that we’ve sometimes uncovered bugs this way, having some software in the development environment and then not having it in testing and, of course, in production.
I agree with you. Even though right now we have a server outside AWS for the svn and testing, we should, in the long run, move it into AWS so that it uses the exact same image.
Thanks for the comment man!