IT Milk

Personal blog of Daehee Park

Funny side of search engine research

As part of my undergraduate research, I’ve been trying to extract information from the AOL search log released in 2006. I’ve had some laughs from several bizarre queries, but I just discovered this old blog post from Something Awful that compiled many of the interesting search queries coming out of the AOL search log. Here’s one of the funnier sessions:

Download the AOL search log or check out parts 1, 2 and 3 of Something Awful’s AOL Search Log Special.

What Happened to YellowWorld?

Back in high school when I really wanted to be Asian, I used to be a frequent visitor of YellowWorld, a popular portal and forum for Asian Americans. I still stop by every once in a while, but the front page of YellowWorld.org has not been updated since 2006. The forums still seem to be fairly active, though. Is there no longer cause for politically minded Asian Americans to organize online?

State College Police Department uses Twitter

I thought it was a joke when Twitter notified me that StateCollegePD was following me. But this article in today’s Collegian proves that it is true.

I am skeptical about the State College PD using Twitter to reach out to the community because, as the article mentions, “It will appeal certainly more toward students and younger people, and people who are tech-savvy.” This service will probably be used only by the tech-savvy, because that is a realistic reflection of Twitter’s general user population. Would you expect an average resident to suddenly sign up for Twitter just to follow the police department’s updates? I expect StateCollegePD to become another ghost on Twitter after a couple of months of stubborn effort.

On another note, it’s interesting that StateCollegePD managed to find 68 local Twitter users in State College, PA, including me. When I tried to do the same thing back when I first started using Twitter, I could find only about two dozen.

Phishing Attempt: Update Your PSU Account Now

from webnews@psu.edu
reply-to online.webnews953@gmail.com

———————————————————————–
This is a WebNews Email Account Update
See the below mailing information

———————————————————————–
Update Your PSU Email Now.

Dear PSU Email Owner,This message is from PSU messaging center to all PSU Email owners. We are currently upgrading our
data base and e-mail center. We are deleting all unused PSU email to create more space for new one.To prevent your
account from closing you will have to update it below so that we will know that it’s a present used account.

However PSU has been receiving complaints from our customers for unauthorised use of the PSU Email. As a result we are
making an extra security check on all of our Customers mailbox in order to protect their information from theft and
fraud.

Warning!!! Email owner that refuses to update his or her Email,within two days of receiving this warning will lose his
or her Email permanently. You are require to send us the below information via : online.webnews953@gmail.com

Requested Information

Email Username : ………. …..
Email Password : …………….
Date of Birth : …………….
Country or Territory : ……….

Thanks for your co-operation.

Copyright @2008 PSU. All rights reserved.
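
One giveaway in the headers above is that the From address is at psu.edu while the Reply-To address is at gmail.com. Here is a minimal sketch of checking for that kind of mismatch with Python’s standard email module. The function name reply_to_mismatch is my own, and a mismatch is only a hint: plenty of legitimate mail uses a different Reply-To domain.

```python
from email import message_from_string
from email.utils import parseaddr


def reply_to_mismatch(raw_message):
    """Return True when the From and Reply-To headers use different domains.

    Hypothetical helper for illustration only; it is a rough heuristic,
    not a complete phishing detector.
    """
    msg = message_from_string(raw_message)
    from_addr = parseaddr(msg.get("From", ""))[1]
    reply_addr = parseaddr(msg.get("Reply-To", ""))[1]
    if not from_addr or not reply_addr:
        # Nothing to compare when either header is missing.
        return False
    from_domain = from_addr.rpartition("@")[2].lower()
    reply_domain = reply_addr.rpartition("@")[2].lower()
    return from_domain != reply_domain


# The two headers from the message above, with a placeholder body.
RAW = (
    "From: webnews@psu.edu\n"
    "Reply-To: online.webnews953@gmail.com\n"
    "\n"
    "Update Your PSU Email Now.\n"
)
```

Calling reply_to_mismatch(RAW) flags this message, since any reply would go to a Gmail account rather than back to psu.edu.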

The effect of client website quality in Google Online Marketing Challenge

I’m curious how the global winners of the Google Online Marketing Challenge completed a successful AdWords campaign with a framed website. Check out their client business’s website at The Hangout.

The use of frames means that the only landing page they would have been able to use was http://www.thehangout.com.au/base_set.html, which could not possibly have a great quality score, unless they linked directly into a subframe. That would pose another problem, since it would take away the navigation bar. This makes their accomplishment even more impressive. :)

For future competitors, now you know that the quality of the client’s website does not have an effect on your final scores.

Google Online Marketing Challenge is a Black Box

After the results from the Google Online Marketing Challenge were announced this morning, I noticed some annoyance among spectators that the Challenge is a black box.

The admin at Nimble Books complains that objective data is not available on the winning campaigns:

Unfortunately, when you surf through to the information about the challenge, there is absolutely no information about the quantitative results or the techniques that the winning teams used.

This would be 1000% more interesting if they revealed the CTR and conversions that the winners were able to achieve, and if they shared the written reports by the teams.

Then Svetlana at Profy says,

There are almost no details on how exactly the results were evaluated based on the students’ reports so I’d really love to see a comment from the The Hangout (hopefully the students also taught their partner to use Google Alerts as well to track their name online as a part of their online marketing campaign) to get more information on how exactly the program helped them increase the brand visibility or customers’ base.

I agree that the results can seem meaningless for anyone who did not participate in the Challenge.

The official Google Online Marketing Challenge results have been released as simple descriptions of each campaign: Strong, Good, Fair, Needs Improvement, Ineligible. This may help the teams themselves, who can look up their scores by their AdWords client IDs, but an outsider has no way of gauging what sort of skills these winners possess.

I would be interested in comparing the academic teams’ results with the industry standard. Of course, we would not expect the same quality of campaign strategies from students who picked up AdWords only a couple of months before the Challenge began.

However, Google and the organizers of the Challenge cannot be blamed for this lack of information. Releasing the AdWords results would most likely have legal implications with the client businesses in the Challenge because it would allow competing businesses to get hold of valuable keyword data. How nice would it be to obtain long-tail keywords, click-through and conversion data for a couple thousand businesses that participated? This would definitely go against Google’s intentions for this academic competition.

One way to improve the next Google Online Marketing Challenge is to require each team to create a blog (why not cross-promote Blogger along with AdWords for this opportunity?). I tried to write down some noteworthy progress checks along the way for our team on Gomcha.com. Having team blogs would open up the Challenge to spectators and allow them to peer into what’s going on with all the competing teams across the world.

We Win the Americas for the Google Online Marketing Challenge

The results were officially announced today on the Google Online Marketing Challenge website. Our team took first place in the Americas region. Our members included Caroline Furey, Joe Lewis, Matt Maisel and me.

(Left to right) Joe Lewis, Matt Maisel, Caroline Furey, Daehee Park

As a result, we will get a trip to the Google headquarters in Mountain View, CA and receive an Apple MacBook.

This is extremely exciting news because I had high hopes for the Challenge. I couldn’t believe it when I discovered that we were a top-ranking team. But the result makes sense since the members listed above were diligent and enthusiastic about the work involved, and they tried to learn as much as they could about Google AdWords throughout the Challenge.

Our strategies included duplicating campaigns with geo-targeting; dynamic keyword insertion in tightly organized ad groups; proper keyword matching with broad, phrase and exact match types; extensive negative keyword research; and attention-grabbing copywriting. Each member carefully studied the AdWords Learning Center, which is a high-quality free resource provided by Google. We managed to apply most if not all of the advanced topics covered in the Learning Center.
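
For anyone unfamiliar with the match types mentioned above, here is a rough sketch of the notation AdWords uses when you enter keywords: a broad match keyword is typed plain, a phrase match keyword goes in quotation marks, an exact match keyword goes in square brackets, and a negative keyword gets a leading minus sign. The helper names and the sample keywords are made up for illustration; this is not the tooling or the keyword list our team used.

```python
def format_keyword(phrase, match_type="broad"):
    """Render one keyword in AdWords match-type notation."""
    phrase = " ".join(phrase.split())  # collapse stray whitespace
    if match_type == "broad":
        return phrase
    if match_type == "phrase":
        return f'"{phrase}"'
    if match_type == "exact":
        return f"[{phrase}]"
    if match_type == "negative":
        return f"-{phrase}"
    raise ValueError(f"unknown match type: {match_type}")


def expand_ad_group(keywords, negatives=()):
    """List every keyword in all three match types, then the negatives.

    Hypothetical helper: it only builds the text lines you would paste
    into an ad group's keyword box.
    """
    lines = []
    for keyword in keywords:
        for match_type in ("broad", "phrase", "exact"):
            lines.append(format_keyword(keyword, match_type))
    lines.extend(format_keyword(n, "negative") for n in negatives)
    return lines


# Made-up example keywords, not taken from any real campaign.
EXAMPLE = expand_ad_group(["pizza delivery"], negatives=["free"])
```

EXAMPLE ends up holding four lines: the keyword in broad, phrase and exact form, followed by the negative keyword.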

Thank you, Dr. Jim Jansen, for providing the opportunity for teams at Penn State. Dr. Jansen is also a member of the Global Academic Panel for the Challenge. (Dr. Jansen was not part of the final judging process because our team made the final 15.)

The global winner comes from the University of Western Australia; the Asia-Pacific winner from the Australian Graduate School of Management; the Europe-Middle East-Africa winner from Universität Bern (Switzerland).

Scraping Penn State’s Schedule of Courses

Around this time last year, I wondered how Dave Smith obtained the course data for the Penn State scheduling website LionSchedules.com. Without an official API that serves the data, he would have had to scrape the Schedule of Courses himself.

I forgot about this until Alan deLevie (Sophomore, Political Science) recently mentioned his own experiment with the Schedule of Courses. He was having problems parsing the course section information, so I decided to give it a shot.

Having learned a bit of Python over the past month, I realized that scraping the Schedule of Courses is not a difficult task. I put in a few hours yesterday getting Beautiful Soup to successfully scrape all the data present on one of the course lists.

The Penn State Schedule of Courses runs on an ancient script that is horribly coded by current standards. It’s a big jumble of table, font and bold tags. The most challenging aspect of scraping this site was grouping the course information and individual section information together, since they are located in two adjacent but separate tables.
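
To give a feel for the grouping problem, here is a small sketch of pairing each course table with the section table that follows it. It is not my actual scraper: I used Beautiful Soup, while this version sticks to the standard library’s html.parser so it runs without extra installs. The sample markup, the column order and the course names are all invented, and the real page is messier (this sketch assumes the tables strictly alternate and are never nested).

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._cell = None  # text fragments of the open cell, or None

    def _close_cell(self):
        # Old markup often omits </td>, so a cell is also closed whenever
        # the next cell, row or table begins.
        if self._cell is not None:
            text = " ".join("".join(self._cell).split())
            self.tables[-1][-1].append(text)
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag in ("table", "tr", "td", "th"):
            self._close_cell()
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("table", "tr", "td", "th"):
            self._close_cell()

    def handle_data(self, data):
        # Text inside <font> and <b> lands here too, so those tags are ignored.
        if self._cell is not None:
            self._cell.append(data)


def group_courses(html):
    """Pair every course table with the section table right after it."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    tables = parser.tables
    courses = []
    for course_table, section_table in zip(tables[0::2], tables[1::2]):
        code, title = course_table[0][:2]  # assumed column order
        sections = [{"section": r[0], "time": r[1]} for r in section_table]
        courses.append({"code": code, "title": title, "sections": sections})
    return courses


# Invented markup in the spirit of the real page, not copied from it.
SAMPLE_HTML = """
<table><tr><td><font size="2"><b>DEMO 101</b></font></td>
<td><b>Sample Course</b></td></tr></table>
<table>
<tr><td><font>001</font></td><td>MWF 9:05</td></tr>
<tr><td><font>002</font></td><td>TR 1:00</td></tr>
</table>
<table><tr><td><b>DEMO 202</b></td><td>Another Course</td></tr></table>
<table><tr><td>001</td><td>MWF 10:10</td></tr></table>
"""
```

Calling group_courses(SAMPLE_HTML) returns one dictionary per course, each carrying its own list of sections, which is the shape you would want before loading the data into a database.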

Based on Alan’s recommendation, I’ll work on storing all of the course data in a database and making it available through a free API.

Django on MediaTemple dv 3.0 or 3.5 server

I am currently struggling to set up Django on a MediaTemple dv 3.0 server. I failed to find any resources on Google that worked for me, so I combed through the private MediaTemple user forums. I found the following forum posting to be useful, so I’m re-posting it here so that it’s publicly accessible via search engines:

Thanks for the help, I got it working. For anyone’s future reference here are the steps I took to make this work:

Using:

Python2.4
MySqlDb 1.2.1_p2
Django off the svn trunk

Download and compile and install mysqldb

Edit your vhost.conf file just as leland said, as root in the conf dir of vhost:

Code:


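# Note: the container tags were stripped when this post was copied. The
# <Location> and <LocationMatch> wrappers and their paths below are
# assumptions modeled on Django's mod_python documentation; adjust as needed.
<Location "/">
SetHandler python-program
PythonHandler django.core.handlers.modpython
PythonPath "['/var/www/vhosts/your-domain.com/django-projects'] + sys.path"
SetEnv DJANGO_SETTINGS_MODULE your-project-name.settings
PythonDebug Off
PythonInterpreter UniqueID
</Location>

<Location "/media">
SetHandler None
</Location>

<LocationMatch "\.(jpg|gif|png)$">
SetHandler None
</LocationMatch>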

Create your django-projects directory and set the owner:group

Code:

chown ftpusername:psaserv django-templates django-projects

I symlinked my django-admin.py to /usr/local/bin/django-admin.py

You will need to restart apache and, this was one that puzzled me for a bit…I had to resave my domain settings in plesk to get it to pick up the vhost.conf file.

(A good reference for this: http://groups.google.com.tr/group/djang … 4de833d9c2)

Once you have done that create a project and make sure the vhost is setup correctly…

To test I just setup a basic page like in the django book:

http://www.djangobook.com/en/1.0/chapter03/

- joshm

Our new project management tools

We’ve been using Beanstalk for version control (BEST subversion hosting interface) and Basecamp for general project management, and they really drive our productivity up! Both have monthly fees but the expenses are well worth the time saved. This is all in preparation for a new direction we are taking with Meezik.
