Converting the results of a popular-language poll site to CSV with bash

■Source

 Reference: Popular languages on Hacker News
 http://d.hatena.ne.jp/karasuyamatengu/20120324/1332561697

■Generating the data from the original site

$ w3m -dump 'http://news.ycombinator.com/item?id=3746692' | \
  grep -B 1 "[0-9] points\$" | sed s/"^ *\|\[grayarrow\] \| points"//g > temp.txt
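
 To see what the grep and sed filters do without hitting the site, here is a rough sketch that runs on sample text. The dump layout (a "[grayarrow] name" line followed by an indented "N points" line) and the helper name extract_poll are my assumptions, not taken from the real page. It uses separate sed expressions instead of the GNU-only \| alternation.

```shell
# Hypothetical helper: keep each "N points" line plus the line before it,
# then strip indentation, the "[grayarrow] " marker and the " points" suffix.
extract_poll() {
  grep -B 1 '[0-9] points$' |
    grep -v '^--$' |   # drop the separators grep prints between groups
    sed -e 's/^ *//' -e 's/\[grayarrow\] //' -e 's/ points$//'
}

# Sample text in the assumed layout (not the real page).
printf '%s\n' 'Poll: favorite language' \
  '[grayarrow] Python' '             2332 points' \
  '' \
  '[grayarrow] Objective C' '             262 points' | extract_poll
```

 With the sample above, each name should end up on one line and its score on the next, which is the shape the CSV step expects.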

■Converting to CSV format

$ sed s/" "/"__"/g temp.txt | \
  for list in `xargs`;do echo -ne "\"$list\",";done | \
  sed s/"[0-9]\","/"&\n"/g | \
  sed s/",\$"//g | \
  sed s/"__"/" "/g >temp.csv
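
 The pipeline above protects spaces with __, quotes every word, and then cuts a row after each score. If temp.txt strictly alternates name lines and score lines (an assumption on my part), the same pairing could be sketched with a single awk call. The name pairs_to_csv is made up for illustration.

```shell
# Hypothetical helper: turn alternating "name" / "score" lines
# into "name","score" rows.
pairs_to_csv() {
  awk 'NR % 2 == 1 { name = $0; next }          # odd lines hold the language name
       { printf "\"%s\",\"%s\"\n", name, $0 }   # even lines hold the score
      ' "$@"
}

# Names containing spaces need no special handling here.
printf '%s\n' 'Python' '2332' 'Objective C' '262' | pairs_to_csv
```

 Usage would be something like `pairs_to_csv temp.txt > temp.csv`.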

■Showing the top 10

$ sort -t\" -k 4 -nr temp.csv | head -10
"Python","2332"
"Ruby","1294"
"JavaScript","1043"
"C","723"
"C#","596"
"PHP","470"
"Java","408"
"Haskell","408"
"C++","396"
"Clojure","349"
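
 Java and Haskell tie at 408, and with a single sort key the order between them is left to sort's fallback comparison. A small sketch with an explicit second key (name, ascending) makes ties predictable. The helper name top_n is hypothetical, and note that this puts Haskell before Java, unlike the listing above.

```shell
# Hypothetical helper: top N rows by score (field 4, numeric, descending),
# breaking ties by language name (field 2, ascending).
top_n() {
  n=$1
  shift
  sort -t'"' -k 4,4nr -k 2,2 "$@" | head -n "$n"
}

printf '%s\n' '"Java","408"' '"C++","396"' '"Haskell","408"' | top_n 10
```

 Usage would be something like `top_n 10 temp.csv`.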

■Total and average

$ awk -F\" '{print $4}' temp.csv | \
  awk '{ sum += $1;count++} END {print "Score = " sum "\nAvg   = " sum/count}'
Score = 10422
Avg   = 281.676

■Total and average, take 2: fractional part truncated

$ awk -F\" '{print $4}' temp.csv | \
  awk '{ sum += $1;count++} END {print "Score = " sum "\nAvg   = " (sum-(sum%count))/count}'
Score = 10422
Avg   = 281
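
 The modulo trick above drops the remainder before dividing. awk also has a built-in int() that truncates toward zero, so the two awk calls could probably be folded into one. A minimal sketch, with score_stats as a made-up name:

```shell
# Hypothetical helper: print the total and the truncated average of the
# scores (field 4 when splitting on double quotes).
score_stats() {
  awk -F'"' '{ sum += $4; count++ }
       END { if (count) print "Score = " sum "\nAvg   = " int(sum / count) }' "$@"
}

# 10 + 5 = 15, and 15 / 2 = 7.5 truncates to 7.
printf '%s\n' '"A","10"' '"B","5"' | score_stats
```

 Usage would be something like `score_stats temp.csv`.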

■Languages at or above the average

$ awk -F\" '$4 >= 281' temp.csv | \
  sort -k 4 -t\" -nr
"Python","2332"
"Ruby","1294"
"JavaScript","1043"
"C","723"
"C#","596"
"PHP","470"
"Java","408"
"Haskell","408"
"C++","396"
"Clojure","349"
"CoffeeScript","281"
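
 The threshold 281 is typed in by hand from the previous step. As a rough sketch, awk can read the same file twice, computing the truncated average on the first pass and filtering on the second. The helper name above_average is hypothetical, and it needs a real file (not a pipe) because the input is read twice.

```shell
# Hypothetical helper: print rows whose score is at or above the
# truncated average of all scores in the file.
above_average() {
  awk -F'"' 'NR == FNR { sum += $4; count++; next }   # first pass: accumulate
       FNR == 1 { avg = int(sum / count) }            # second pass begins: set threshold
       $4 >= avg                                      # print qualifying rows
      ' "$1" "$1"
}

# Demo on a throwaway file: the truncated average of 10, 4 and 1 is 5.
demo=$(mktemp)
printf '%s\n' '"A","10"' '"B","4"' '"C","1"' > "$demo"
above_average "$demo"
rm -f "$demo"
```

 Usage would be something like `above_average temp.csv | sort -t\" -k 4 -nr`.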

■Languages below the average

$ awk -F\" '$4 < 281' temp.csv | \
  sort -t\" -k 4 -nr
"Objective C","262"
"Lisp","235"
"Perl","229"
"Scala","183"
"Other","163"
"Scheme","143"
"Erlang","118"
"Lua","111"
"SQL","76"
"Assembly","75"
"Actionscript","71"
"OCaml","67"
"Smalltalk","51"
"Shell","46"
"Groovy (Added Two Hours Late Due To Requests)","43"
"D","41"
"Visual Basic","32"
"Forth","31"
"Tcl","27"
"Delphi","25"
"ColdFusion","24"
"Pascal","19"
"Ada","17"
"Fortran","16"
"Cobol","9"
"Rexx","8"

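□I forgot about the graph, so here is an addendum...
 Note: if you remove the pipe to the sed that inserts the newlines, each bar comes out on a single line.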

$ for list in "`sort -t \\\" -k 4 -nr temp.csv`";do \
    echo "$list" | sed s/"\""//g | awk -F\, '{print $1 "," $2 "<br/><hr align=\"left\" width=\"" $2 "\">"}'; \
  done > temp.html
$ RETURN=$(for i in `seq 1 100`;do echo -ne "-";done); \
  w3m -dump temp.html | sed s/"━"/"-"/g | sed s/${RETURN}/"&\n"/g | nl -b p^[A-z] | sed s/"^ *-"/"-"/g
     1  Python,2332
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------
---------------------------------
     2  Ruby,1294
----------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
     3  JavaScript,1043
----------------------------------------------------------------------------------------------------
-------------------------------------------------
     4  C,723
----------------------------------------------------------------------------------------------------
---
     5  C#,596
-------------------------------------------------------------------------------------
     6  PHP,470
-------------------------------------------------------------------
     7  Java,408
----------------------------------------------------------
     8  Haskell,408
----------------------------------------------------------
     9  C++,396
--------------------------------------------------------
    10  Clojure,349
-------------------------------------------------
    11  CoffeeScript,281
----------------------------------------
    12  Objective C,262
-------------------------------------
    13  Lisp,235
---------------------------------
    14  Perl,229
--------------------------------
    15  Scala,183
--------------------------
    16  Other,163
-----------------------
    17  Scheme,143
--------------------
    18  Erlang,118
----------------
    19  Lua,111
---------------
    20  SQL,76
----------
    21  Assembly,75
----------
    22  Actionscript,71
----------
    23  OCaml,67
---------
    24  Smalltalk,51
-------
    25  Shell,46
------
    26  Groovy (Added Two Hours Late Due To Requests),43
------
    27  D,41
-----
    28  Visual Basic,32
----
    29  Forth,31
----
    30  Tcl,27
---
    31  Delphi,25
---
    32  ColdFusion,24
---
    33  Pascal,19
--
    34  Ada,17
--
    35  Fortran,16
--
    36  Cobol,9
-
    37  Rexx,8
-
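
 The bars above come from w3m rendering the <hr> widths, and they look like roughly one dash per 7 points (2332 gives 333 dashes). As a sketch that skips the HTML step, awk can draw the bars directly. The helper name text_bars and the scale of 7 are my assumptions, and this version does not wrap long bars at 100 columns.

```shell
# Hypothetical helper: rank rows by score and draw one dash per `scale` points.
# Usage: text_bars SCALE [FILE...]
text_bars() {
  bar_scale=$1
  shift
  sort -t'"' -k 4 -nr "$@" |
    awk -F'"' -v scale="$bar_scale" '{
      bar = ""
      n = int($4 / scale)                    # bar length, truncated
      for (i = 0; i < n; i++) bar = bar "-"
      printf "%6d  %s,%s\n%s\n", NR, $2, $4, bar
    }'
}

printf '%s\n' '"A","14"' '"B","21"' | text_bars 7
```

 Usage would be something like `text_bars 7 temp.csv`.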

□So shell is in 25th place. I see, lol...

    25  Shell,46

□The source blog posted an update, so...

 Languages disliked on Hacker News
 http://d.hatena.ne.jp/karasuyamatengu/20120324/1332604058

□Fetching the data from the original site

$ w3m -dump 'http://news.ycombinator.com/item?id=3748961' | \
  grep -B 1 "[0-9] points\$" | sed s/"^ *\|\[grayarrow\] \| points"//g > distemp.txt

□CSV conversion

$ sed s/" "/"__"/g distemp.txt | \
  for list in `xargs`;do echo -ne "\"$list\",";done | \
  sed s/"[0-9]\","/"&\n"/g | \
  sed s/",\$"//g | \
  sed s/"__"/" "/g > distemp.csv

□Graphing

$ for list in "`sort -t \\\" -k 4 -nr distemp.csv`";do \
    echo "$list" | sed s/"\""//g | awk -F\, '{print $1 "," $2 "<br/><hr align=\"left\" width=\"" $2 "\">"}'; \
  done > distemp.html

$ RETURN=$(for i in `seq 1 100`;do echo -ne "-";done); \
  w3m -dump distemp.html | sed s/"━"/"-"/g | sed s/${RETURN}/"&\n"/g | nl -b p^[A-z] | sed s/"^ *-"/"-"/g
     1  Java,991
----------------------------------------------------------------------------------------------------
-----------------------------------------
     2  PHP,986
----------------------------------------------------------------------------------------------------
----------------------------------------
     3  Visual Basic,577
----------------------------------------------------------------------------------
     4  C++,528
---------------------------------------------------------------------------
     5  JavaScript,407
----------------------------------------------------------
     6  Ruby,217
-------------------------------
     7  Perl,204
-----------------------------
     8  Objective C,200
----------------------------
     9  Actionscript,163
-----------------------
    10  Python,110
---------------
    11  CoffeeScript,109
---------------
    12  C#,103
--------------
    13  Shell,90
------------
    14  ColdFusion,85
------------
    15  SQL,70
----------
    16  Cobol,66
---------
    17  C,53
-------
    18  Assembly,43
------
    19  Scala,38
-----
    20  Other,35
-----
    21  Tcl,32
----
    22  Fortran,31
----
    23  Haskell,30
----
    24  Lisp,29
----
    25  Delphi,25
---
    26  Scheme,24
---
    27  Pascal,24
---
    28  Groovy,23
---
    29  Clojure,20
--
    30  Erlang,17
--
    31  Ada,17
--
    32  Forth,12
-
    33  D,12
-
    34  Smalltalk,10
-
    35  OCaml,10
-
    36  Lua,10
-
    37  Rexx,7
-

□A graph felt pointless to me unless the sort order and the ranks are visible, hence the numbering.
 Also, you don't need a GUI just for this.