Thursday, 8 November 2012

Anti spam framework - making sense of Spam

Anti Spam System used in CF - Real Estate

Hello Everybody.

In this post, I will explain the framework we have implemented to handle spam on our site. As you know, commonfloor allows owners, seekers and real estate agents to find and communicate with each other. The side effect of this wonderful idea is spam. We needed a system to monitor and detect spam and to protect genuine users from spam messages.

We have four categories of users:

  • Seeker - looking to buy/rent a property 
  • Owner - looking to sell/rent
  • Real Estate Agent
  • Builder

At commonfloor, we allow only registered users (users with verified mobile numbers) to communicate with each other. When a seeker is interested in a project/property, he/she sends a message to the owner of that project/property via SMS and email.

All communications are processed by our anti-spam system before being sent. When the system receives a message to be checked, it passes the message to the algorithm, which returns a 'spam score' (the probability of the message being spam). We have internally set a threshold for this spam score. If the score is above the threshold, the message is categorized as 'spam' and won't be sent. If the score is below the threshold (the algorithm is not sure whether the message is spam), it is sent to a moderation console where our team decides whether the message is spam or not.

This decision is sent back to the algorithm as feedback for self-learning, so the algorithm gets better as it processes more messages and receives more feedback. Based on the decision made in the moderation console, the message is either sent or discarded. All spam messages are stored in a separate database and used to train the algorithm and for future analysis.
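
The flow described above can be sketched roughly as follows. This is only an illustration of the routing logic, not our production code: the names routeMessage and applyModeration, the threshold value 0.9 and the result labels are all hypothetical, and the scoring algorithm itself is left out.

```javascript
// Hypothetical threshold; the real value is set internally and is not given here.
var SPAM_THRESHOLD = 0.9;

// Decide what happens to a message given its spam score (a probability, 0..1).
// Assumption: a score exactly equal to the threshold goes to moderation.
function routeMessage(spamScore, threshold) {
  if (spamScore > threshold) {
    return 'discard';  // categorized as spam, never sent
  }
  return 'moderate';   // algorithm is not sure: goes to the moderation console
}

// Apply a moderator's decision and build the feedback record for the algorithm.
function applyModeration(messageText, isSpam) {
  return {
    action: isSpam ? 'discard' : 'send',
    feedback: { text: messageText, label: isSpam ? 'spam' : 'not_spam' }
  };
}
```

For example, routeMessage(0.95, SPAM_THRESHOLD) returns 'discard', while routeMessage(0.4, SPAM_THRESHOLD) returns 'moderate' and the message waits for a call such as applyModeration(text, false) before it is sent.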

With this system in place, I can assure you that users will receive messages from genuine users and are protected from spam messages.

Monday, 3 September 2012

Tips for debugging JavaScript and CSS issues in Internet Explorer

1. When a UI element such as an input is displayed smaller than expected, the problem may be that the parent element's size has not been defined or is too small.

2. In the case of tables in IE, when one of the columns does not have any value, we should put an extra &nbsp; in the cell to represent some value; otherwise it will show broken lines. We can also add the frame attribute if that doesn't work.
 
3. The getElementsByClassName function of Prototype does not work in IE7. We should use $$('elementType.className') instead.

4. The most common cause of JavaScript failing in IE is an extra comma after the last array element.

5. Including the same JS file twice leads to "element not present" errors in IE7 and IE8.

6. In jQuery, using new Option to add options to a select dynamically doesn't work in IE7. The workaround is to use .append().

7. IE7/IE8 give an iframe a default border. To remove it, set this explicitly on the iframe: frameborder="0".

8. When we try to put some JavaScript code inside an unclosed element, IE7 runs into problems.
   Workaround - add a dummy element as the first child of the parent and, after evaluating the JavaScript, remove that first child.

9. Directly selecting an option by its id does not work in IE7, IE8 or IE9.

10. The border attribute doesn't work in IE7 and IE8. The workaround is to use images as borders.

11. The CSS written for the HTML should be verified with the w3schools CSS validators.

12. The JS written for the HTML should be verified with the w3schools JS validators.

13. The HTML should be well formed, with the head and body tags properly defined.
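
Tip 4 is easy to check for automatically. Below is a minimal sketch of such a check; the function name findTrailingCommas is made up for this post, and the check is deliberately naive: it simply looks for a comma followed by a closing ] or } and does not skip string literals or comments.

```javascript
// Naive check for tip 4: report the 1-based line numbers of commas that are
// followed (ignoring whitespace) by a closing ] or }.
// Assumption: string literals and comments are NOT skipped, so false
// positives are possible.
function findTrailingCommas(source) {
  var pattern = /,\s*[\]}]/g;
  var lines = [];
  var match;
  while ((match = pattern.exec(source)) !== null) {
    // Count the line breaks before the comma to get its line number.
    var lineNumber = source.slice(0, match.index).split('\n').length;
    lines.push(lineNumber);
  }
  return lines;
}
```

Calling findTrailingCommas on the text of a script gives the line numbers worth a second look before testing in IE; an empty array means no trailing comma was found.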

Friday, 27 April 2012

Strange PHP/Zend/Apache2 issue

Setting up a new laptop for development on a complex system hardly ever goes according to plan. The same thing happened to me, and I ended up getting my entire office involved in figuring out the problem.

I ensure I back up every single thing when I switch systems. I did the same here. I've even contemplated writing a small shell script that can install everything I'll need. But anyway, all that happened perfectly. Everything was installed and we were ready to go.

I fired up a browser, hit the URL and lo, the scary 403. At commonfloor.com, we use a pretty cool .htaccess file. For some reason, apache2 was looking for the file four directory levels above where it should have been looking. It should have looked for it at /path_to_home/prog/php/commonfloor.com/another_dir/another_dir/ but it was looking for it at /path_to_home/prog/. That wasn't even the DocumentRoot configured in Apache, and the error being thrown was:

(13)Permission denied: /home/ashesh/prog/.htaccess pcfg_openfile: unable to check htaccess file, ensure it is readable

Gibberish. The error message wasn't going to lead us anywhere.

What fixed the problem:


  • Set AllowOverride None
  • Ensure libapache2-mod-php5 is installed and loaded (install it using your package manager if it isn't; enable it using sudo a2enmod php5 if your package manager didn't do that already)
  • Restart apache2. Hit the URL and groove away in joy!


[I know that the fix is simple but the point is that it's pretty easy to overlook these problems when faced with the task of setting up a new computer.]

The king of fixes: a really cool systems guy like ours. All credit goes to Goutham ji.



Tuesday, 20 March 2012

Keeping PHP Sessions on Memcached

This happens to be one of the areas where the production/stage setup differs from our local setups. On prod/stage, PHP sessions are stored in memcached. This makes it impossible to store medium to large chunks of data in the session, which is not advisable anyway.
When an attempt is made to write a large piece of data to the session, the write from PHP to memcached fails. A read failure from memcached follows immediately afterwards.
Wherever session information is needed, this results in irregular behavior. In our case, content console users were being logged out. The symptom can differ from case to case, but the pattern will be the same.

Check: PHP error logs, SELinux context (if SELinux is enabled), etc.
Clues: With a write failure as discussed above, the failures occur irregularly. With SELinux issues, the failures are more regular.

The fix for the SELinux issue is simple. Just check:
# getsebool httpd_can_network_memcache

That should output:
httpd_can_network_memcache --> on

If it outputs off instead, just run:
# setsebool -P httpd_can_network_memcache 1

As simple as that.