2008-06-24

Day off and mobile internet

Disclaimer: this post should have been published yesterday, but it appears today because of unexpected problems with mobile posting.

I took a day off to enjoy the summer in beautiful natural surroundings.
Since my cell phone has some smartphone capabilities, I decided to try out mobile internet.

A few days ago I turned on the GPRS service with my cell provider. Every day I am surrounded by computers, almost all of them online.

What do I need internet on my cell for? One reason is convenient online services like weather or restaurant search. The other explanation is that I am a computer/internet addict.
I have tried some useful software on my Nokia S40 phone.
Here is my list:
Opera Mini - good for search and news browsing
Gmail mobile - no explanation needed
MidpSSH - nice SSH client - for use in critical situations
Some IM / SIP VoIP client - still searching
The real pain is text input. It would be nice to use a better method than T9.

Enough. I am going for a walk

2008-06-16

Analysing SleepTracker data for R.E.M. phases

I have been collecting sleep data from my SleepTracker watch for over 3 months. It's a good time to see what is interesting in there. My goal is to find the pattern of my R.E.M. sleep phases.

First, I downloaded my sleep data in CSV format from the www.sleeptracker.net site. Then, using a simple Python script, I converted all the clock times into relative values. Now I have the time of every SleepTracker movement event stored as "hours after going to sleep".
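The conversion step can be sketched roughly like this. It is not the original script: the function name and the "HH:MM" input format are assumptions, since the layout of the sleeptracker.net CSV is not described here.

```python
from datetime import datetime, timedelta


def hours_after_bedtime(bedtime, event_times):
    """Convert clock times ("HH:MM", assumed format) to hours since bedtime."""
    start = datetime.strptime(bedtime, "%H:%M")
    offsets = []
    for text in event_times:
        moment = datetime.strptime(text, "%H:%M")
        # A clock time earlier than bedtime belongs to the next calendar day.
        if moment < start:
            moment += timedelta(days=1)
        offsets.append(round((moment - start).total_seconds() / 3600.0, 2))
    return offsets


# Made-up example night: went to bed at 23:30, three movement events.
print(hours_after_bedtime("23:30", ["00:15", "01:00", "05:30"]))
# -> [0.75, 1.5, 6.0]
```

The only tricky part is midnight: an event at 00:15 after a 23:30 bedtime has to count as 0.75 hours, not as a negative value.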

After looking at the initial data I was a little disappointed. I could not see clean gaps for the non-R.E.M. phases; the picture was too cluttered. There were many possible reasons. I picked one: I sometimes go to sleep at different times, but sleep patterns may be anchored to my usual hours.

I decided to do more preprocessing to remove the outlying data. I removed nights with non-standard "going to bed" times and movements in the first hour of sleep, and made some minor tweaks. When I viewed the data again, the picture was much clearer.
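A filter of that kind might look like the sketch below. The thresholds (usual bedtime 23:30, half an hour of tolerance) and the function name are placeholders for illustration only; the actual cut-offs used are not recorded in this post, and the "minor tweaks" are not covered.

```python
def filter_nights(nights, usual_bedtime=23.5, tolerance=0.5, skip_first=1.0):
    """Keep movement offsets from nights with a typical bedtime.

    nights: list of (bedtime as decimal clock hours, [offsets in hours]).
    All thresholds are assumed values, not the ones used originally.
    """
    kept = []
    for bedtime, offsets in nights:
        diff = abs(bedtime - usual_bedtime)
        diff = min(diff, 24.0 - diff)  # clock hours wrap around midnight
        if diff > tolerance:
            continue  # unusual "going to bed" time: drop the whole night
        # Drop movements from the first hour of sleep.
        kept.extend(o for o in offsets if o >= skip_first)
    return kept


# Made-up nights: a typical one, a very late one, and one at midnight.
sample = [(23.5, [0.5, 1.5, 3.0]), (2.0, [1.2, 2.2]), (0.0, [0.75, 2.5])]
print(filter_nights(sample))
# -> [1.5, 3.0, 2.5]
```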



I reran the data clustering software using the standard k-means method and got these results:

Cluster 1
Mean/Mode: 0.8458
Std Devs: 0.2783

Cluster 2
Mean/Mode: 1.5433
Std Devs: 0.2069

Cluster 3
Mean/Mode: 2.6117
Std Devs: 0.2857

Cluster 4
Mean/Mode: 3.6685
Std Devs: 0.236

Cluster 5
Mean/Mode: 4.8086
Std Devs: 0.2848

Cluster 6
Mean/Mode: 5.9405
Std Devs: 0.4276


That means R.E.M. phases probably occur about 0.85, 1.5, 2.6, 3.7, 4.8 and 5.9 hours after falling asleep. The common value found in the literature is about 1.5 hours between R.E.M. phases.
The larger standard deviation of the later phases may reflect R.E.M. phases getting longer with each cycle, which is also reported in sleep research. The first cycle must be treated specially, because the sleep pattern is different right after falling asleep.
The last one may also be less reliable, because I usually wake up after about 6 hours.
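For readers who want to reproduce this kind of clustering without a dedicated tool, here is a minimal one-dimensional k-means (Lloyd's algorithm) in plain Python. It is only a sketch: it is not the software used above, the function name and the evenly spread starting centres are my own choices, and the sample offsets are made up.

```python
import statistics


def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's algorithm on one-dimensional data."""
    data = sorted(values)
    # Deterministic start: centres spread evenly over the sorted data.
    centres = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: every value goes to its nearest centre.
        groups = [[] for _ in range(k)]
        for v in data:
            nearest = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[nearest].append(v)
        # Update step: move each centre to the mean of its group
        # (an empty group keeps its old centre).
        updated = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centres)]
        if updated == centres:
            break  # converged
        centres = updated
    return centres, groups


# Made-up movement offsets in hours, not real SleepTracker data.
offsets = [0.8, 0.9, 1.0, 2.4, 2.5, 2.6]
centres, groups = kmeans_1d(offsets, k=2)
for centre, group in zip(centres, groups):
    print("mean %.2f, std dev %.2f" % (centre, statistics.pstdev(group)))
```

Note that k has to be chosen up front, so the method cannot discover by itself that a different number of phases would fit the data better.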

I decided to pick the R.E.M. times myself using a "smoothed" histogram of the sleep data. The local modes are clearly visible in the picture.


Now the mid-phase values look like: 0.8, 1.6, 2.3, 2.8, 3.4, 4.0, 4.9. Comparing these with the results of the automatic clustering, it seems that the k-means method merged two close groups into one.
I assume that the manually picked results are better.
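The manual picking above was done by eye, but the same idea can be sketched in code: bin the offsets, smooth the counts, and report the bins that are local maxima. The bin width, the 1-2-1 smoothing weights and the function names below are assumptions chosen for illustration, and the sample data is made up.

```python
def smoothed_histogram(values, bin_width=0.25):
    """Bin non-negative values, then smooth counts with 1-2-1 weights."""
    n_bins = int(max(values) / bin_width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int(v / bin_width)] += 1
    padded = [0] + counts + [0]  # empty bins beyond both ends
    return [(padded[i] + 2 * padded[i + 1] + padded[i + 2]) / 4.0
            for i in range(n_bins)]


def local_modes(smooth, bin_width=0.25):
    """Return the centres of bins where the smoothed curve peaks."""
    padded = [0.0] + list(smooth) + [0.0]
    modes = []
    for i in range(1, len(padded) - 1):
        if padded[i] > padded[i - 1] and padded[i] >= padded[i + 1]:
            modes.append((i - 1 + 0.5) * bin_width)
    return modes


# Made-up offsets in hours with two obvious groups.
offsets = [0.8, 0.85, 0.9, 1.6, 1.65, 1.7]
print(local_modes(smoothed_histogram(offsets)))
# -> [0.875, 1.625]
```

Unlike k-means, this does not need the number of phases in advance, but the result depends strongly on the bin width and on how much smoothing is applied.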

Time for conclusions.

To get more reliable results I need more data and more regular sleep :). SleepTracker is quite effective for a simple accelerometer-based movement detection method. Unfortunately, many movement events aren't recorded (for example subtle ones, or when the arm is stuck under the pillow). I will try to collect more good data and check the results again.

2008-06-09

Summer is coming

It has been really hot for the last few days - real summer weather. I often go out for outdoor activities.

I'm finishing the production stage of one of the bigger projects I'm working on now. I'm thinking about summer and resting in some nice place.

The price of gas is rising, but I'm planning only one long-distance trip. Otherwise I'm going to visit some places nearby, maybe do some biking.

Anyway - my home computer is turned off more often, and my new projects are going on the shelf for some time.

2008-06-02

PyCha chart drawing library working under SPING

I've ported the PyCha chart drawing library to work under SPING instead of Cairo. I discussed the idea with the PyCha lead developer, Lorenzo Gil Sanchez.

Lorenzo said he will stick with Cairo, which is a good library that has been ported to and is available on many systems. Anyway, I've ported PyCha to SPING and tested it, creating SpingCha as a spin-off project.


Here is the recently created PyCha news group, where you can find posts about PyCha development.
And, if anyone is interested, a working SpingCha release is in the files section.

2008-05-26

Fighting with duplication - attack of the clones

Last week I proceeded with clean-up tasks on one project I'm working on now. It has a large number of JSP-generated reports. Some time ago I set up the initial environment for such reports, so that even analysts without special JSP or Java knowledge could write reports. It was meant as a temporary solution before moving to a more enterprise tool. The results were not so bad - new reports were added very quickly. Everything worked and was usable, so the enterprise tool was simply forgotten.

Over two years the reports outgrew the rest of the code base, and many issues and problems were related to them. In harder cases I had to dig into the code, and what I saw was massive duplication, in every possible form. So when we found a bug in one section of code, there was a high probability that the same problem was duplicated elsewhere. It was even worse: instead of new features being added to the base report, new versions of the same report were created with new parameters or features. And the divergence between those reports grew over time. So you can imagine how much time maintenance started to cost at some point.

I especially like to stick to one coding rule: do not create new duplication, and remove existing duplication.

Duplication starts when some functional block of code appears in at least two places in the code. Sometimes it is just a simple one-liner - but even then you should consider the pros and cons of wrapping that piece of code in a procedure or function.

Let's get back to the vicious mechanism behind the avalanche of duplicated code in a non-programmer environment, where people don't know good "coding" practices. One of the reports (let's call it A) was copied with some functional difference (as B). Now we have two almost identical files. Then comes another feature that seems to interfere with the previous one. The ticket for the new feature was assigned to only one of those reports (after some time everybody sees two reports and thinks of them as two separate reports), the new feature seems so "new", and the established practice was to create a separate report per function (as the "clear solution"), so a new report C is derived from A. After some time somebody realizes that report B has no such feature. Continuing the same process, we now have 4 reports, including version D. Then comes a major change: a new module is added that shows new values in a similar manner, so the base reports A, B, C, D are copied as 2A, 2B, 2C, 2D, and then some values and the layout are changed. Almost pure copy'n'paste coding.

Instead of one simple report with additional features controlled by parameters, there are now eight versions of duplicated code with less than 10% difference between them. The cost of fixing a bug or implementing a new feature is about eight times higher than for a single, more complicated file. And that does not include the effects of further "extensions" - the avalanche just gains mass at an exponential rate.
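To make the alternative concrete, here is a small sketch of one parametrized report replacing the eight copies. It is written in Python for brevity rather than JSP, and the feature names (totals, average, module) are invented for illustration; the real reports had different features.

```python
from itertools import product


def render_report(rows, show_totals=False, show_average=False, module="base"):
    """One report with optional features instead of one file per variant."""
    lines = ["[%s] %s: %d" % (module, name, value) for name, value in rows]
    values = [value for _, value in rows]
    if show_totals:
        lines.append("[%s] total: %d" % (module, sum(values)))
    if show_average and values:
        lines.append("[%s] average: %.1f" % (module, sum(values) / len(values)))
    return lines


# Two on/off features and two modules give 2 * 2 * 2 = 8 variants,
# the same count as reports A, B, C, D, 2A, 2B, 2C, 2D.
variants = list(product([False, True], [False, True], ["base", "new"]))
print(len(variants))
# -> 8

for line in render_report([("north", 10), ("south", 30)], show_totals=True):
    print(line)
```

A bug in the row formatting is fixed once here, while in the copied version the same fix has to be found and repeated in every file.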

You might wonder what kind of procedures we have that allow such practices. OK - the team of report analysts was told to eliminate visible duplication, but it wasn't enough. When somebody concentrates on a complex analytical problem, they don't think much about removing duplication, even when the report is complex and the amount of code and SQL is enormous. It's just an additional burden that doesn't seem to help with solving the analytical problem.

What could help? A quick course on coding practices and techniques in the context of that environment: the basics of how to write reusable pieces of code, avoid common pitfalls and reduce the code base to ease development and maintenance. And more practical, explanatory examples. Sometimes the duplication problem is visible but there is no simple solution - then pairing with a more experienced programmer would help. It's just an organizational issue.

So now the team is fighting with report bugs and feature requests. And we are currently gaining the most not by touching those new features or bugs, but by merging and removing duplicates. It's ironic, but the number of bugs is now falling at an exponentially decaying rate.