
Cobol v. Fortran

I thought it’d be interesting to compare how many people admit to knowing ancient programming languages on their LinkedIn pages. This is in contrast to my post on the popularity of the hip JVM languages Scala and Clojure. True to its reputation for scientific computing power, Fortran is primarily used by scientific organizations, an interesting contrast to Clojure and Scala, where LinkedIn shows primarily open-source companies behind the language tools (Typesafe, Apache, etc.).

Fortran

Cobol, by contrast, is claimed almost entirely by big consulting firms. If these firms push to sell services in a particular skill, they are almost guaranteed to show up in the results: it is hard to beat a company like TCS, which has 238,000 employees, or Infosys, which has 156,000 employees.

Cobol

I’m not sure how accurate LinkedIn’s data is. One surprising result is that both of these languages skew young. LinkedIn’s user base likely self-selects toward younger people, who are more likely to be looking for jobs, especially in software, and perhaps more likely to be open online. The results may also be skewed by the Indian firms mentioned above.

Fortran use by age

Fortran by Age

Cobol use by age

Cobol By Age

Node.js use by age

Consider the comparison to a relatively new technology, Node.js:

Node.js by Age

Generating ARFF files for Weka from Postgres

Since all my scraped data is in Postgres, generating the ARFF file directly with a SQL query is the easiest way to get the data out, and it gives the fastest iteration possible. At some point I’ll probably switch to a Java library. The query is interesting to see, but probably the only lesson from it is that all ETL scripts are ugly.

WITH advertisers_ranked AS (
	SELECT advertiser_id, REPLACE(REPLACE(LOWER(advertiser), ' ', '_'), '/', '_') advertiser,
	6 + dense_rank() OVER (ORDER BY advertiser) advertiser_rank -- sparse ARFF indices are 0-based and attributes 0-6 (default through h3) come first, so advertisers start at index 7
	FROM advertisers
)
SELECT '@RELATION flippa' line
UNION ALL
SELECT '@ATTRIBUTE default numeric' line
UNION ALL
SELECT '@ATTRIBUTE siteid string' line
UNION ALL
SELECT '@ATTRIBUTE banned {0,1}' line
UNION ALL
SELECT '@ATTRIBUTE length numeric' line
UNION ALL
SELECT '@ATTRIBUTE h1 numeric' line
UNION ALL
SELECT '@ATTRIBUTE h2 numeric' line
UNION ALL
SELECT '@ATTRIBUTE h3 numeric' line
UNION ALL
(SELECT '@ATTRIBUTE ' || advertiser || ' {0, 1}' line
FROM advertisers_ranked ORDER BY advertiser_rank)
UNION ALL
SELECT '@DATA' line
UNION ALL
-- there are N advertisers per row, this combines them into one
SELECT '{' || siteid || ', ' || banned || ', ' || LENGTH || ', ' || h1 || ', ' || h2 || ', ' || h3 || ', ' || array_to_string(array_agg(advertiser ORDER BY advertiser_rank), ', ') || '}' line
FROM (
	SELECT DISTINCT
	        '1 ' || s.site_id siteid,
		'2 ' || (CASE WHEN seller LIKE '%banned%' THEN 1 ELSE 0 END) AS banned,
		'3 ' || CHAR_LENGTH(description) LENGTH,
		'4 ' || (LENGTH(description) - LENGTH(regexp_replace(LOWER(description),'h1','','g'))) / LENGTH('h1') h1,
		'5 ' || (LENGTH(description) - LENGTH(regexp_replace(LOWER(description),'h2','','g'))) / LENGTH('h2') h2,
		'6 ' || (LENGTH(description) - LENGTH(regexp_replace(LOWER(description),'h3','','g'))) / LENGTH('h3') h3,
		advertiser_rank || ' 1' advertiser,
		advertiser_rank
	FROM sites s
	JOIN sites_advertisers ON s.site_id = sites_advertisers.site_id
	JOIN advertisers_ranked a ON a.advertiser_id = sites_advertisers.advertiser_id
	JOIN auctions ON auctions.site_id = s.site_id
	) a
GROUP BY siteid, banned, LENGTH, h1, h2, h3
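
The two tricks in the query above are easier to see outside SQL. The following is a minimal JavaScript sketch, not part of the original script: the helper names and the sample site are made up for illustration. It counts a substring by comparing lengths before and after removing it, and it formats one sparse ARFF data row as "index value" pairs in ascending index order.

```javascript
// Hypothetical helpers for illustration only; they are not part of the query.

// Counts case-insensitive occurrences of `needle` the same way the SQL does:
// remove every occurrence, then divide the length difference by the
// length of the needle.
function countOccurrences(text, needle) {
  const lower = text.toLowerCase();
  const n = needle.toLowerCase();
  const stripped = lower.split(n).join('');
  return (lower.length - stripped.length) / n.length;
}

// Formats one sparse ARFF data row from [attributeIndex, value] pairs.
// Indices are 0-based and are written in ascending order; attributes that
// are left out of the row are read as zero.
function sparseArffRow(pairs) {
  const sorted = [...pairs].sort((a, b) => a[0] - b[0]);
  return '{' + sorted.map(([index, value]) => index + ' ' + value).join(', ') + '}';
}

// A made-up site with two advertisers at attribute indices 7 and 9.
const description = '<H1>Title</H1><h2>Sub</h2>';
const row = sparseArffRow([
  [9, 1],
  [1, 42],                                   // siteid
  [2, 0],                                    // banned
  [3, description.length],                   // length
  [4, countOccurrences(description, 'h1')],  // h1
  [5, countOccurrences(description, 'h2')],  // h2
  [6, countOccurrences(description, 'h3')],  // h3
  [7, 1],
]);
console.log(row);
```

Note that the length trick counts every "h1" in the text, including the one in a closing tag, so it measures mentions rather than elements. The SQL version has the same property.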

Adding Adzerk Units to Blogspot blogs

This applies to adding any HTML/JS widget to Blogspot, but in this case we’re adding Adzerk. Select “Layout” on the left-hand side of Blogger.

Select to add a new widget:

Select the HTML/JS widget:

Paste in the HTML code. Adzerk tells you to put this in the head block, but this approach lets you add it without changing the template HTML. Make sure to put the div entry on a following line. If the ad does not display, check that you did not accidentally switch into rich text mode, as this will mess up the layout. If that happens, edit the gadget and re-add the content.

How to fix “Error: NAMESPACE_ERR: DOM Exception 14” in Chrome

The only references I’ve found online are to an old bug in Chrome.

Error: NAMESPACE_ERR: DOM Exception 14

I triggered this error with the following XPath expression, which was generated by Firebug:

id('content')/x:div[2]/x:div[3]/x:div[1]/x:h2[2]

It causes the following JavaScript line to fail:

document.evaluate("id('content')/x:div[2]/x:div[3]/x:div[1]/x:h2[2]", document, null, XPathResult.ANY_TYPE, null)

The call passes null as the namespace resolver, so the “x” prefix cannot be resolved. The fix is very simple: remove the namespace prefixes (the “x:”), like so:

id('content')/div[2]/div[3]/div[1]/h2[2]

And then the JS code will work:

document.evaluate("id('content')/div[2]/div[3]/div[1]/h2[2]", document, null, XPathResult.ANY_TYPE, null)
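
If Firebug keeps producing prefixed expressions, the prefix can be stripped in code instead of by hand. This is a rough sketch under stated assumptions: stripXPathPrefix is a hypothetical helper, the prefix is assumed to be a plain name such as "x", and the expression is not actually parsed, so a matching "x:" inside a string literal would be rewritten too.

```javascript
// Hypothetical helper: removes a namespace prefix such as "x:" from the
// name tests of an XPath expression. Naive by design: it uses a regular
// expression instead of a real XPath parser.
function stripXPathPrefix(xpath, prefix) {
  // Match the prefix only where a name test can start: at the beginning of
  // the expression, or after "/", "[", "(", "|", "@", ":" or whitespace.
  // The (?!:) lookahead leaves axis specifiers such as "child::" untouched.
  const pattern = new RegExp('(^|[/\\[(|@:\\s])' + prefix + ':(?!:)', 'g');
  return xpath.replace(pattern, '$1');
}

const fromFirebug = "id('content')/x:div[2]/x:div[3]/x:div[1]/x:h2[2]";
const cleaned = stripXPathPrefix(fromFirebug, 'x');
console.log(cleaned);
```

The cleaned string can then be passed to document.evaluate with a null resolver, exactly as in the working call above.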

Installing the Postgres client for Node.js

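Installing the pg module can fail with the following error:

/vagrant/node_modules/pg/wscript:16: error: The program ['pg_config'] is required

The fix is to install the libpq development package, which provides pg_config, and then install the module again:

apt-get install libpq-dev
npm install pg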

Sample Greasemonkey Script in Chrome to process local files

The following script will fire an alert box for local files:

// ==UserScript==
// @name Matcher
// @description Match Local Files
// @version 1
// @match file:///*
// ==/UserScript==
alert(1);

Save the script as “match.user.js”. Drop the file into a Chrome tab and you will be prompted to install it. Scripts are installed per user profile. Each time you want to re-install the script, you need to remove it from the Extensions tab first and then repeat this process.

For this to work correctly, open the Extensions settings in Chrome and allow the script access to file URLs, and to incognito mode if necessary.

