language-agnostic - والرأسي - كيفية رسم منحنى بياني في excel




إنشاء مخطط ASCII للكلمات الأكثر استخدامًا في نص معين (20)

التحدي:

إنشاء مخطط ASCII من الكلمات الأكثر استخدامًا في نص معين.

القواعد:

  • لا تقبل سوى az و AZ (الحروف الأبجدية) كجزء من الكلمة.
  • تجاهل الغلاف ( She == she لغرضنا).
  • تجاهل الكلمات التالية (تماما ، وأنا أعلم): the, and, of, to, a, i, it, in, or, is
  • التوضيح: don't تفكر في: هذا سوف يؤخذ كـ "كلمتين" مختلفتين في نطاقات az و AZ : ( don و t ).

  • اختيارياً (لقد فات الأوان على تغيير المواصفات الآن بشكل رسمي) يمكنك اختيار إسقاط "الكلمات" أحادية الحرف (قد يؤدي ذلك إلى اختصار قائمة التجاهل أيضًا).

تحليل text معين (اقرأ ملف محدد عبر حجج سطر الأوامر أو تم توصيله بالشبكة ؛ افترض us-ascii ) وقم بإنشاء word frequency chart بنا مع الخصائص التالية:

  • اعرض الرسم البياني (انظر أيضًا المثال أدناه) للكلمات الـ22 الأكثر شيوعًا (مرتبة بترتيب تنازلي).
  • width شريط يمثل عدد مرات حدوث (تردد) للكلمة (بالتناسب). إلحاق مساحة واحدة وطباعة الكلمة.
  • تأكد من أن هذه الأشرطة (بالإضافة إلى فراغ مساحة الكلمات) تتلاءم دائمًا: يجب أن يكون bar + [space] + word + [space] دائمًا <= 80 حرفًا (تأكد من أنك قد تتعامل مع اختلاف الأشرطة وأطوال الكلمة: على سبيل المثال: الثانية الكلمة الأكثر شيوعا يمكن أن يكون وقتا أطول ثم الأول في حين لا تختلف كثيرا في التردد). تعظيم عرض شريط داخل هذه القيود ومقياس القضبان بشكل مناسب (وفقا للترددات التي يمثلونها).

مثال:

يمكن العثور على النص على سبيل المثال هنا ( مغامرات أليس في بلاد العجائب ، من قبل لويس كارول ).

سينتج هذا النص المحدد المخطط التالي:

 _________________________________________________________________________
|_________________________________________________________________________| she 
|_______________________________________________________________| you 
|____________________________________________________________| said 
|____________________________________________________| alice 
|______________________________________________| was 
|__________________________________________| that 
|___________________________________| as 
|_______________________________| her 
|____________________________| with 
|____________________________| at 
|___________________________| s 
|___________________________| t 
|_________________________| on 
|_________________________| all 
|______________________| this 
|______________________| for 
|______________________| had 
|_____________________| but 
|____________________| be 
|____________________| not 
|___________________| they 
|__________________| so 


لمعلوماتك: هذه هي الترددات المبنية على الرسم البياني أعلاه:

[('she', 553), ('you', 481), ('said', 462), ('alice', 403), ('was', 358), ('that
', 330), ('as', 274), ('her', 248), ('with', 227), ('at', 227), ('s', 219), ('t'
, 218), ('on', 204), ('all', 200), ('this', 181), ('for', 179), ('had', 178), ('
but', 175), ('be', 167), ('not', 166), ('they', 155), ('so', 152)]

مثال ثاني (لمعرفة ما إذا كنت قد قمت بتطبيق المواصفات الكاملة): استبدل كل ظهور you في ملف Alice in Wonderland superlongstringstring :

 ________________________________________________________________
|________________________________________________________________| she 
|_______________________________________________________| superlongstringstring 
|_____________________________________________________| said 
|______________________________________________| alice 
|________________________________________| was 
|_____________________________________| that 
|______________________________| as 
|___________________________| her 
|_________________________| with 
|_________________________| at 
|________________________| s 
|________________________| t 
|______________________| on 
|_____________________| all 
|___________________| this 
|___________________| for 
|___________________| had 
|__________________| but 
|_________________| be 
|_________________| not 
|________________| they 
|________________| so 

الفائز:

أقصر حل (حسب عدد الأحرف ، لكل لغة). إستمتع!

تحرير : جدول يلخص النتائج حتى الآن (2012-02-15) (تمت إضافته في الأصل من قبل المستخدم Nas Banov):

Language          Relaxed  Strict
=========         =======  ======
GolfScript          130     143
Perl                        185
Windows PowerShell  148     199
Mathematica                 199
Ruby                185     205
Unix Toolchain      194     228
Python              183     243
Clojure                     282
Scala                       311
Haskell                     333
Awk                         336
R                   298
Javascript          304     354
Groovy              321
Matlab                      404
C#                          422
Smalltalk           386
PHP                 450
F#                          452
TSQL                483     507

تمثل الأرقام طول أقصر حل بلغة معينة. يشير "Strict" إلى حل يقوم بتطبيق المواصفات تمامًا (يسحب |____| أشرطة ، يغلق الشريط الأول في الأعلى بخط ____ ، يفسر إمكانية الكلمات الطويلة ذات التردد العالي إلخ). "مسترخى" يعني أن بعض الحريات قد اتخذت لتختفي إلى حل.

حلول فقط أقصر ثم يتم تضمين 500 حرفا. يتم فرز قائمة اللغات حسب طول الحل "الصارم". تستخدم "Unix Toolchain" للدلالة على الحلول المختلفة التي تستخدم shell * nix التقليدي بالإضافة إلى مزيج من الأدوات (مثل grep و tr و sort و uniq و head و perl و awk).


206

قذيفة، grep، tr، grep، sort، uniq، sort، head، perl

~ % wc -c wfg
209 wfg
~ % cat wfg
egrep -oi \\b[a-z]+|tr A-Z a-z|egrep -wv 'the|and|of|to|a|i|it|in|or|is'|sort|uniq -c|sort -nr|head -22|perl -lape'($f,$w)[email protected];$.>1or($q,$x)=($f,76-length$w);$b="_"x($f/$q*$x);$_="|$b| $w ";$.>1or$_=" $b\n$_"'
~ % # usage:
~ % sh wfg < 11.txt

جلالة الملك ، فقط رأيت أعلاه: sort -nr -> sort -n ثم head -> tail => 208 :)
update2: erm ، بالطبع ما سبق هو سخيفة ، لأنه سيتم عكس ذلك الحين. إذن ، 209.
update3: تحسين regexp للإقصاء -> 206

egrep -oi \\b[a-z]+|tr A-Z a-z|egrep -wv 'the|and|o[fr]|to|a|i[tns]?'|sort|uniq -c|sort -nr|head -22|perl -lape'($f,$w)[email protected];$.>1or($q,$x)=($f,76-length$w);$b="_"x($f/$q*$x);$_="|$b| $w ";$.>1or$_=" $b\n$_"'



للمتعة ، إليك إصدار perl-only (أسرع بكثير):

~ % wc -c pgolf
204 pgolf
~ % cat pgolf
perl -lne'$1=~/^(the|and|o[fr]|to|.|i[tns])$/i||$f{lc$1}++while/\b([a-z]+)/gi}{@w=(sort{$f{$b}<=>$f{$a}}keys%f)[0..21];$Q=$f{$_=$w[0]};$B=76-y///c;print" "."_"x$B;print"|"."_"x($B*$f{$_}/$Q)."| $_"[email protected]'
~ % # usage:
~ % sh pgolf < 11.txt

GolfScript، 177 175 173 167 164 163 144 131 130 char

بطيئة - 3 دقائق للنص عينة (130)

{32|.123%97<[email protected]}%]''*n%"oftoitinorisa"2/-"theandi"3/-$([email protected]{.3$>1{;)}if}/]2/{~~\;}$22<.0=~:2;,76\-:1'_':0*' '\@{"
|"\~1*2/0*'| '@}/

تفسير:

{           #loop through all characters
 32|.       #convert to uppercase and duplicate
 123%97<    #determine if is a letter
 [email protected]       #return either the letter or a newline
}%          #return an array (of ints)
]''*        #convert array to a string with magic
n%          #split on newline, removing blanks (stack is an array of words now)
"oftoitinorisa"   #push this string
2/          #split into groups of two, i.e. ["of" "to" "it" "in" "or" "is" "a"]
-           #remove any occurrences from the text
"theandi"3/-#remove "the", "and", and "i"
$           #sort the array of words
([email protected]         #takes the first word in the array, pushes a 1, reorders stack
            #the 1 is the current number of occurrences of the first word
{           #loop through the array
 .3$>1{;)}if#increment the count or push the next word and a 1
}/
]2/         #gather stack into an array and split into groups of 2
{~~\;}$     #sort by the latter element - the count of occurrences of each word
22<         #take the first 22 elements
.0=~:2;     #store the highest count
,76\-:1     #store the length of the first line
'_':0*' '\@ #make the first line
{           #loop through each word
"
|"\~        #start drawing the bar
1*2/0       #divide by zero
*'| '@      #finish drawing the bar
}/

"صحيح" (على أمل). (143)

{32|.123%97<[email protected]}%]''*n%"oftoitinorisa"2/-"theandi"3/-$([email protected]{.3$>1{;)}if}/]2/{~~\;}$22<..0=1=:^;{[email protected],-^*\/}%$0=:1'_':0*' '\@{"
|"\~1*^/0*'| '@}/

أقل بطيئة - نصف دقيقة. (162)

'"'/' ':S*n/S*'"#{%q
'\+"
.downcase.tr('^a-z','
')}\""+~n%"oftoitinorisa"2/-"theandi"3/-$([email protected]{.3$>1{;)}if}/]2/{~~\;}$22<.0=~:2;,76\-:1'_':0*S\@{"
|"\~1*2/0*'| '@}/

إخراج مرئية في سجلات المراجعة.


Ruby 207 213 211 210 207 203 201 200 char

تحسين على Anurag ، يتضمن اقتراح من rfusca. كما يزيل الحجة للفرز وعدد قليل من لعب الغولف الثانوية الأخرى.

w=(STDIN.read.downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).group_by{|x|x}.map{|x,y|[-y.size,x]}.sort.take 22;k,l=w[0];m=76.0-l.size;puts' '+'_'*m;w.map{|f,x|puts"|#{'_'*(m*f/k)}| #{x} "}

تنفيذ ك:

ruby GolfedWordFrequencies.rb < Alice.txt

تعديل: ضع "puts" مرة أخرى ، يجب أن يكون هناك لتجنب وجود علامات اقتباس في الإخراج.
Edit2: تم تغيير الملف-> IO
Edit3: تمت إزالة / i
Edit4: إزالة الأقواس حول (f * 1.0) ، معاد حسابها
Edit5: استخدام إضافة سلسلة للسطر الأول؛ توسيع s في المكان.
تحرير 6: جعل تعويم m ، إزالة 1.0. تحرير: لا يعمل ، تغيير الأطوال. تحرير: ليس أسوأ من ذي قبل
Edit7: استخدام STDIN.read .


Windows PowerShell ، 199 حرفًا

$x=$input-split'\P{L}'-notmatch'^(the|and|of|to|.?|i[tns]|or)$'|group|sort *
filter f($w){' '+'_'*$w
$x[-1..-22]|%{"|$('_'*($w*$_.Count/$x[-1].Count))| "+$_.Name}}
f(76..1|?{!((f $_)-match'.'*80)})[0]

(آخر فاصل سطر ليس ضروريًا ، ولكن تم تضمينه هنا لسهولة القراءة.)

(تتوفر الشفرة الحالية وملفات الاختبار الخاصة بي في مستودع SVN الخاص بي . آمل أن تصادف حالات الاختبار الخاصة بي معظم الأخطاء الشائعة (طول الشريط ، ومشكلات مطابقة regex وعدد قليل من الآخرين))

الافتراضات:

  • ASCII الولايات المتحدة كمدخل. ربما تحصل غريبة مع يونيكود.
  • على الأقل كلمتين غير متوقفتين في النص

History

نسخة مسترخية (137) ، حيث يتم حسابها بشكل منفصل الآن ، على ما يبدو:

($x=$input-split'\P{L}'-notmatch'^(the|and|of|to|.?|i[tns]|or)$'|group|sort *)[-1..-22]|%{"|$('_'*(76*$_.Count/$x[-1].Count))| "+$_.Name}
  • لا يغلق الشريط الأول
  • لا يمثل طول كلمة كلمة غير أولية

ترجع اختلافات طول شريط حرف واحد مقارنة بالحلول الأخرى إلى PowerShell باستخدام التقريب بدلاً من الاقتطاع عند تحويل أرقام الفاصلة العائمة إلى أعداد صحيحة. نظرًا لأن المهمة تتطلب فقط طول الشريط النسبي ، فيجب أن يكون هذا جيدًا.

بالمقارنة مع الحلول الأخرى ، اتبعت نهجًا مختلفًا بعض الشيء في تحديد طول الشريط الطويل عن طريق محاولة الخروج بأعلى طول ممكن وألا يتجاوز طوله 80 حرفًا.

يمكن العثور على إصدار قديم موضح here .


روبي 1.9 ، 185 حرفًا

(يعتمد بشكل كبير على حلول روبي الأخرى)

w=($<.read.downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).group_by{|x|x}.map{|x,y|[-y.size,x]}.sort[0,22]
k,l=w[0]
puts [?\s+?_*m=76-l.size,w.map{|f,x|?|+?_*(f*m/k)+"| "+x}]

بدلاً من استخدام أي مفاتيح تبديل سطر الأوامر مثل الحلول الأخرى ، يمكنك ببساطة تمرير اسم الملف كوسيطة. (على ruby1.9 wordfrequency.rb Alice.txt )

بما أنني أستخدم الأحرف الحرفية هنا ، فإن هذا الحل لا يعمل إلا في Ruby 1.9.

تحرير: استبدال الفواصل المنقوطة بفواصل سطر لـ "سهولة القراءة". : P

تحرير 2: أشار Shtééf نسيت مساحة زائدة - ثابت ذلك.

تحرير 3: إزالة المساحة الزائدة مرة أخرى ؛)


روبي ، 215 ، 216 ، 218 ، 221 ، 224 ، 236 ، 237 حرفًا

التحديث 1: يا هلا ! إنه تعادل مع solution JS Bangs . لا يمكن التفكير في طريقة لخفض أي أكثر :)

تحديث 2: لعب خدعة الجولف القذرة. تم التغيير each map لحفظ حرف واحد :)

تحديث 3: تغيير File.read إلى IO.read +2. لم تكن Array.group_by مثمرة للغاية ، تغيرت reduce +6. ليست هناك حاجة إلى التحقق غير الحساس لحالة الأحرف بعد الغلاف السفلي مع downcase في regex +1. يمكن إجراء الفرز بترتيب تنازلي بسهولة بإلغاء القيمة +6. مجموع الوفورات +15

التحديث 4: [0] بدلاً من. .first ، +3. (@ Shtééf)

التحديث 5: قم بتوسيع المتغير l في المكان ، +1. وسِّع المتغير s في المكان ، +2. (@ Shtééf)

التحديث 6: استخدام إضافة سلسلة بدلاً من الاستيفاء للخط الأول ، +2. (@ Shtééf)

w=(IO.read($_).downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).reduce(Hash.new 0){|m,o|m[o]+=1;m}.sort_by{|k,v|-v}.take 22;m=76-w[0][0].size;puts' '+'_'*m;w.map{|x,f|puts"|#{'_'*(f*1.0/w[0][1]*m)}| #{x} "}

التحديث 7: ذهبت من خلال الكثير من hoopla للكشف عن التكرار الأول داخل العروة ، باستخدام المتغيرات الحالة. كل ما حصلت عليه هو +1 ، على الرغم من احتمال وجوده. الحفاظ على الإصدار السابق ، لأنني أعتقد أن هذا هو السحر الأسود. (@ Shtééf)

(IO.read($_).downcase.scan(/[a-z]+/)-%w{the and of to a i it in or is}).reduce(Hash.new 0){|m,o|m[o]+=1;m}.sort_by{|k,v|-v}.take(22).map{|x,f|@f||(@f=f;puts' '+'_'*(@m=76-x.size));puts"|#{'_'*(f*1.0/@f*@m)}| #{x} "}

نسخة قابلة للقراءة

string = File.read($_).downcase

words = string.scan(/[a-z]+/i)
allowed_words = words - %w{the and of to a i it in or is}
sorted_words = allowed_words.group_by{ |x| x }.map{ |x,y| [x, y.size] }.sort{ |a,b| b[1] <=> a[1] }.take(22)
highest_frequency = sorted_words.first
highest_frequency_count = highest_frequency[1]
highest_frequency_word = highest_frequency[0]

word_length = highest_frequency_word.size
widest = 76 - word_length

puts " #{'_' * widest}"    
sorted_words.each do |word, freq|
  width = (freq * 1.0 / highest_frequency_count) * widest
  puts "|#{'_' * width}| #{word} "
end

To use:

echo "Alice.txt" | ruby -ln GolfedWordFrequencies.rb

انتاج:

 _________________________________________________________________________
|_________________________________________________________________________| she 
|_______________________________________________________________| you 
|____________________________________________________________| said 
|_____________________________________________________| alice 
|_______________________________________________| was 
|___________________________________________| that 
|____________________________________| as 
|________________________________| her 
|_____________________________| with 
|_____________________________| at 
|____________________________| s 
|____________________________| t 
|__________________________| on 
|__________________________| all 
|_______________________| this 
|_______________________| for 
|_______________________| had 
|_______________________| but 
|______________________| be 
|_____________________| not 
|____________________| they 
|____________________| so 

F#, 452 chars

Strightforward: get a sequence a of word-count pairs, find the best word-count-per-column multiplier k , then print results.

let a=
 stdin.ReadToEnd().Split(" .?!,\":;'\r\n".ToCharArray(),enum 1)
 |>Seq.map(fun s->s.ToLower())|>Seq.countBy id
 |>Seq.filter(fun(w,n)->not(set["the";"and";"of";"to";"a";"i";"it";"in";"or";"is"].Contains w))
 |>Seq.sortBy(fun(w,n)-> -n)|>Seq.take 22
let k=a|>Seq.map(fun(w,n)->float(78-w.Length)/float n)|>Seq.min
let u n=String.replicate(int(float(n)*k)-2)"_"
printfn" %s "(u(snd(Seq.nth 0 a)))
for(w,n)in a do printfn"|%s| %s "(u n)w

Example (I have different freq counts than you, unsure why):

% app.exe < Alice.txt

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|___________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|____________________________| t
|____________________________| s
|__________________________| on
|_________________________| all
|_______________________| this
|______________________| had
|______________________| for
|_____________________| but
|_____________________| be
|____________________| not
|___________________| they
|__________________| so

Haskell - 366 351 344 337 333 characters

(One line break in main added for readability, and no line break needed at end of last line.)

import Data.List
import Data.Char
l=length
t=filter
m=map
f c|isAlpha c=toLower c|0<1=' '
h w=(-l w,head w)
x!(q,w)='|':replicate(minimum$m(q?)x)'_'++"| "++w
q?(g,w)=q*(77-l w)`div`g
b x=m(x!)x
a(l:r)=(' ':t(=='_')l):l:r
main=interact$unlines.a.b.take 22.sort.m h.group.sort
  .t(`notElem`words"the and of to a i it in or is").words.m f

How it works is best seen by reading the argument to interact backwards:

  • map f lowercases alphabetics, replaces everything else with spaces.
  • words produces a list of words, dropping the separating whitespace.
  • filter ( notElem words "the and of to ai it in or is") discards all entries with forbidden words.
  • group . sort sorts the words, and groups identical ones into lists.
  • map h maps each list of identical words to a tuple of the form (-frequency, word) .
  • take 22 . sort sorts the tuples by descending frequency (the first tuple entry), and keeps only the first 22 tuples.
  • b maps tuples to bars (see below).
  • a prepends the first line of underscores, to complete the topmost bar.
  • unlines joins all these lines together with newlines.

The tricky bit is getting the bar length right. I assumed that only underscores counted towards the length of the bar, so || would be a bar of zero length. The function b maps cx over x , where x is the list of histograms. The entire list is passed to c , so that each invocation of c can compute the scale factor for itself by calling u . In this way, I avoid using floating-point math or rationals, whose conversion functions and imports would eat many characters.

Note the trick of using -frequency . This removes the need to reverse the sort since sorting (ascending) -frequency will places the words with the largest frequency first. Later, in the function u , two -frequency values are multiplied, which will cancel the negation out.


Transact SQL set based solution (SQL Server 2005) 1063 892 873 853 827 820 783 683 647 644 630 characters

بفضل غابي لبعض الاقتراحات المفيدة للحد من عدد الأحرف.

NB: إضافة فواصل الأسطر لتجنب أشرطة التمرير فقط مطلوب فاصل أسطر الأخير.

DECLARE @ VARCHAR(MAX),@F REAL SELECT @=BulkColumn FROM OPENROWSET(BULK'A',
SINGLE_BLOB)x;WITH N AS(SELECT 1 i,LEFT(@,1)L UNION ALL SELECT i+1,SUBSTRING
(@,i+1,1)FROM N WHERE i<LEN(@))SELECT i,L,i-RANK()OVER(ORDER BY i)R INTO #D
FROM N WHERE L LIKE'[A-Z]'OPTION(MAXRECURSION 0)SELECT TOP 22 W,-COUNT(*)C
INTO # FROM(SELECT DISTINCT R,(SELECT''+L FROM #D WHERE R=b.R FOR XML PATH
(''))W FROM #D b)t WHERE LEN(W)>1 AND W NOT IN('the','and','of','to','it',
'in','or','is')GROUP BY W ORDER BY C SELECT @F=MIN(($76-LEN(W))/-C),@=' '+
REPLICATE('_',-MIN(C)*@F)+' 'FROM # SELECT @[email protected]+' 
|'+REPLICATE('_',-C*@F)+'| '+W FROM # ORDER BY C PRINT @

نسخة قابلة للقراءة

DECLARE @  VARCHAR(MAX),
        @F REAL
SELECT @=BulkColumn
FROM   OPENROWSET(BULK'A',SINGLE_BLOB)x; /*  Loads text file from path
                                             C:\WINDOWS\system32\A  */

/*Recursive common table expression to
generate a table of numbers from 1 to string length
(and associated characters)*/
WITH N AS
     (SELECT 1 i,
             LEFT(@,1)L

     UNION ALL

     SELECT i+1,
            SUBSTRING(@,i+1,1)
     FROM   N
     WHERE  i<LEN(@)
     )
  SELECT   i,
           L,
           i-RANK()OVER(ORDER BY i)R
           /*Will group characters
           from the same word together*/
  INTO     #D
  FROM     N
  WHERE    L LIKE'[A-Z]'OPTION(MAXRECURSION 0)
             /*Assuming case insensitive accent sensitive collation*/

SELECT   TOP 22 W,
         -COUNT(*)C
INTO     #
FROM     (SELECT DISTINCT R,
                          (SELECT ''+L
                          FROM    #D
                          WHERE   R=b.R FOR XML PATH('')
                          )W
                          /*Reconstitute the word from the characters*/
         FROM             #D b
         )
         T
WHERE    LEN(W)>1
AND      W NOT IN('the',
                  'and',
                  'of' ,
                  'to' ,
                  'it' ,
                  'in' ,
                  'or' ,
                  'is')
GROUP BY W
ORDER BY C

/*Just noticed this looks risky as it relies on the order of evaluation of the 
 variables. I'm not sure that's guaranteed but it works on my machine :-) */
SELECT @F=MIN(($76-LEN(W))/-C),
       @ =' '      +REPLICATE('_',-MIN(C)*@F)+' '
FROM   #

SELECT @[email protected]+' 
|'+REPLICATE('_',-C*@F)+'| '+W
             FROM     #
             ORDER BY C

PRINT @

انتاج |

 _________________________________________________________________________ 
|_________________________________________________________________________| she
|_______________________________________________________________| You
|____________________________________________________________| said
|_____________________________________________________| Alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| at
|_____________________________| with
|__________________________| on
|__________________________| all
|_______________________| This
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| So
|___________________| very
|__________________| what

ومع الخيط الطويل

 _______________________________________________________________ 
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|____________________________________________________| said
|______________________________________________| Alice
|________________________________________| was
|_____________________________________| that
|_______________________________| as
|____________________________| her
|_________________________| at
|_________________________| with
|_______________________| on
|______________________| all
|____________________| This
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|__________________| not
|_________________| they
|_________________| So
|________________| very
|________________| what

عقد LabVIEW 51 ، 5 هياكل ، 10 مخططات

تعليم الفيل لرقص الصنبور ليست جميلة. أنا ، آه ، تخطي عدد الأحرف.

يتدفق البرنامج من اليسار إلى اليمين:


Perl: 203 202 201 198 195 208 203 / 231 chars

$/=\0;/^(the|and|of|to|.|i[tns]|or)$/i||$x{lc$_}++for<>=~/[a-z]+/gi;map{$z=$x{$_};$y||{$y=(76-y///c)/$z}&&warn" "."_"x($z*$y)."\n";printf"|%.78s\n","_"x($z*$y)."| $_"}(sort{$x{$b}<=>$x{$a}}keys%x)[0..21]

Alternate, full implementation including indicated behaviour (global bar-squishing) for the pathological case in which the secondary word is both popular and long enough to combine to over 80 chars ( this implementation is 231 chars ):

$/=\0;/^(the|and|of|to|.|i[tns]|or)$/i||$x{lc$_}++for<>=~/[a-z]+/gi;@e=(sort{$x{$b}<=>$x{$a}}keys%x)[0..21];for(@e){$p=(76-y///c)/$x{$_};($y&&$p>$y)||($y=$p)}warn" "."_"x($x{$e[0]}*$y)."\n";for(@e){warn"|"."_"x($x{$_}*$y)."| $_\n"}

The specification didn't state anywhere that this had to go to STDOUT, so I used perl's warn() instead of print - four characters saved there. Used map instead of foreach, but I feel like there could still be some more savings in the split(join()). Still, got it down to 203 - might sleep on it. At least Perl's now under the "shell, grep, tr, grep, sort, uniq, sort, head, perl" char count for now ;)

PS: Reddit says "Hi" ;)

Update: Removed join() in favour of assignment and implicit scalar conversion join. Down to 202. Also please note I have taken advantage of the optional "ignore 1-letter words" rule to shave 2 characters off, so bear in mind the frequency count will reflect this.

Update 2: Swapped out assignment and implicit join for killing $/ to get the file in one gulp using <> in the first place. Same size, but nastier. Swapped out if(!$y){} for $y||{}&&, saved 1 more char => 201.

Update 3: Took control of lowercasing early (lc<>) by moving lc out of the map block - Swapped out both regexes to no longer use /i option, as no longer needed. Swapped explicit conditional x?y:z construct for traditional perlgolf || implicit conditional construct - /^...$/i?1:$x{$ }++ for /^...$/||$x{$ }++ Saved three characters! => 198, broke the 200 barrier. Might sleep soon... perhaps.

Update 4: Sleep deprivation has made me insane. Well. More insane. Figuring that this only has to parse normal happy text files, I made it give up if it hits a null. Saved two characters. Replaced "length" with the 1-char shorter (and much more golfish) y///c - you hear me, GolfScript?? I'm coming for you!!! sob

Update 5: Sleep dep made me forget about the 22row limit and subsequent-line limiting. Back up to 208 with those handled. Not too bad, 13 characters to handle it isn't the end of the world. Played around with perl's regex inline eval, but having trouble getting it to both work and save chars... lol. Updated the example to match current output.

Update 6: Removed unneeded braces protecting (...)for, since the syntactic candy ++ allows shoving it up against the for happily. Thanks to input from Chas. Owens (reminding my tired brain), got the character class i[tns] solution in there. Back down to 203.

Update 7: Added second piece of work, full implementation of specs (including the full bar-squishing behaviour for secondary long-words, instead of truncation which most people are doing, based on the original spec without the pathological example case)

Examples:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|___________________________________________| that
|____________________________________| as
|________________________________| her
|_____________________________| with
|_____________________________| at
|__________________________| on
|__________________________| all
|_______________________| this
|_______________________| for
|_______________________| had
|_______________________| but
|______________________| be
|_____________________| not
|____________________| they
|____________________| so
|___________________| very
|__________________| what

Alternative implementation in pathological case example:

 _______________________________________________________________
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|____________________________________________________| said
|______________________________________________| alice
|________________________________________| was
|_____________________________________| that
|_______________________________| as
|____________________________| her
|_________________________| with
|_________________________| at
|_______________________| on
|______________________| all
|____________________| this
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|__________________| not
|_________________| they
|_________________| so
|________________| very
|________________| what

Clojure 282 strict

(let[[[_ m]:as s](->>(slurp *in*).toLowerCase(re-seq #"\w+\b(?<!\bthe|and|of|to|a|i[tns]?|or)")frequencies(sort-by val >)(take 22))[b](sort(map #(/(- 76(count(key %)))(val %))s))p #(do(print %1)(dotimes[_(* b %2)](print \_))(apply println %&))](p " " m)(doseq[[k v]s](p \| v \| k)))

Somewhat more legibly:

(let[[[_ m]:as s](->> (slurp *in*)
                   .toLowerCase
                   (re-seq #"\w+\b(?<!\bthe|and|of|to|a|i[tns]?|or)")
                   frequencies
                   (sort-by val >)
                   (take 22))
     [b] (sort (map #(/ (- 76 (count (key %)))(val %)) s))
     p #(do
          (print %1)
          (dotimes[_(* b %2)] (print \_))
          (apply println %&))]
  (p " " m)
  (doseq[[k v] s] (p \| v \| k)))

Gawk -- 336 (originally 507) characters

(after fixing the output formatting; fixing the contractions thing; tweaking; tweaking again; removing a wholly unnecessary sorting step; tweaking yet again; and again (oops this one broke the formatting); tweak some more; taking up Matt's challenge I desperately tweak so more; found another place to save a few, but gave two back to fix the bar length bug)

Heh heh! I am momentarily ahead of [Matt's JavaScript][1] solution counter challenge! ;) and [AKX's python][2].

The problem seems to call out for a language that implements native associative arrays, so of course I've chosen one with a horribly deficient set of operators on them. In particular, you cannot control the order in which awk offers up the elements of a hash map, so I repeatedly scan the whole map to find the currently most numerous item, print it and delete it from the array.

It is all terribly inefficient, with all the golfifcations I've made it has gotten to be pretty awful, as well.

Minified:

{gsub("[^a-zA-Z]"," ");for(;NF;NF--)a[tolower($NF)]++}
END{split("the and of to a i it in or is",b," ");
for(w in b)delete a[b[w]];d=1;for(w in a){e=a[w]/(78-length(w));if(e>d)d=e}
for(i=22;i;--i){e=0;for(w in a)if(a[w]>e)e=a[x=w];l=a[x]/d-2;
t=sprintf(sprintf("%%%dc",l)," ");gsub(" ","_",t);if(i==22)print" "t;
print"|"t"| "x;delete a[x]}}

line breaks for clarity only: they are not necessary and should not be counted.

انتاج:

$ gawk -f wordfreq.awk.min < 11.txt 
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|__________________________________________| that
|___________________________________| as
|_______________________________| her
|____________________________| with
|____________________________| at
|___________________________| s
|___________________________| t
|_________________________| on
|_________________________| all
|______________________| this
|______________________| for
|______________________| had
|_____________________| but
|____________________| be
|____________________| not
|___________________| they
|__________________| so
$ sed 's/you/superlongstring/gI' 11.txt | gawk -f wordfreq.awk.min
 ______________________________________________________________________
|______________________________________________________________________| she
|_____________________________________________________________| superlongstring
|__________________________________________________________| said
|__________________________________________________| alice
|____________________________________________| was
|_________________________________________| that
|_________________________________| as
|______________________________| her
|___________________________| with
|___________________________| at
|__________________________| s
|__________________________| t
|________________________| on
|________________________| all
|_____________________| this
|_____________________| for
|_____________________| had
|____________________| but
|___________________| be
|___________________| not
|__________________| they
|_________________| so

Readable; 633 characters (originally 949):

{
    gsub("[^a-zA-Z]"," ");
    for(;NF;NF--)
    a[tolower($NF)]++
}
END{
    # remove "short" words
    split("the and of to a i it in or is",b," ");
    for (w in b) 
    delete a[b[w]];
    # Find the bar ratio
    d=1;
    for (w in a) {
    e=a[w]/(78-length(w));
    if (e>d)
        d=e
    }
    # Print the entries highest count first
    for (i=22; i; --i){               
    # find the highest count
    e=0;
    for (w in a) 
        if (a[w]>e)
        e=a[x=w];
        # Print the bar
    l=a[x]/d-2;
    # make a string of "_" the right length
    t=sprintf(sprintf("%%%dc",l)," ");
    gsub(" ","_",t);
    if (i==22) print" "t;
    print"|"t"| "x;
    delete a[x]
    }
}

Java - 896 chars

931 chars

1233 chars made unreadable

1977 chars "uncompressed"

تحديث: لقد قللت بقوة عدد الأحرف. يحذف الكلمات أحادية الكلمة لكل المواصفات المحدثة.

أنا أحسد C # و LINQ كثيرا.

import java.util.*;import java.io.*;import static java.util.regex.Pattern.*;class g{public static void main(String[] a)throws Exception{PrintStream o=System.out;Map<String,Integer> w=new HashMap();Scanner s=new Scanner(new File(a[0])).useDelimiter(compile("[^a-z]+|\\b(the|and|of|to|.|it|in|or|is)\\b",2));while(s.hasNext()){String z=s.next().trim().toLowerCase();if(z.equals(""))continue;w.put(z,(w.get(z)==null?0:w.get(z))+1);}List<Integer> v=new Vector(w.values());Collections.sort(v);List<String> q=new Vector();int i,m;i=m=v.size()-1;while(q.size()<22){for(String t:w.keySet())if(!q.contains(t)&&w.get(t).equals(v.get(i)))q.add(t);i--;}int r=80-q.get(0).length()-4;String l=String.format("%1$0"+r+"d",0).replace("0","_");o.println(" "+l);o.println("|"+l+"| "+q.get(0)+" ");for(i=m-1;i>m-22;i--){o.println("|"+l.substring(0,(int)Math.round(r*(v.get(i)*1.0)/v.get(m)))+"| "+q.get(m-i)+" ");}}}

"المقروءه":

import java.util.*;
import java.io.*;
import static java.util.regex.Pattern.*;
class g
{
   public static void main(String[] a)throws Exception
      {
      PrintStream o = System.out;
      Map<String,Integer> w = new HashMap();
      Scanner s = new Scanner(new File(a[0]))
         .useDelimiter(compile("[^a-z]+|\\b(the|and|of|to|.|it|in|or|is)\\b",2));
      while(s.hasNext())
      {
         String z = s.next().trim().toLowerCase();
         if(z.equals(""))
            continue;
         w.put(z,(w.get(z) == null?0:w.get(z))+1);
      }
      List<Integer> v = new Vector(w.values());
      Collections.sort(v);
      List<String> q = new Vector();
      int i,m;
      i = m = v.size()-1;
      while(q.size()<22)
      {
         for(String t:w.keySet())
            if(!q.contains(t)&&w.get(t).equals(v.get(i)))
               q.add(t);
         i--;
      }
      int r = 80-q.get(0).length()-4;
      String l = String.format("%1$0"+r+"d",0).replace("0","_");
      o.println(" "+l);
      o.println("|"+l+"| "+q.get(0)+" ");
      for(i = m-1; i > m-22; i--)
      {
         o.println("|"+l.substring(0,(int)Math.round(r*(v.get(i)*1.0)/v.get(m)))+"| "+q.get(m-i)+" ");
      }
   }
}

إخراج أليس:

 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| with
|______________________________| at
|___________________________| on
|__________________________| all
|________________________| this
|________________________| for
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what

إخراج دون كيشوت (أيضا من غوتبورغ):

 ________________________________________________________________________
|________________________________________________________________________| that
|________________________________________________________| he
|______________________________________________| for
|__________________________________________| his
|________________________________________| as
|__________________________________| with
|_________________________________| not
|_________________________________| was
|________________________________| him
|______________________________| be
|___________________________| don
|_________________________| my
|_________________________| this
|_________________________| all
|_________________________| they
|________________________| said
|_______________________| have
|_______________________| me
|______________________| on
|______________________| so
|_____________________| you
|_____________________| quixote

JavaScript 1.8 (SpiderMonkey) - 354

x={};p='|';e=' ';z=[];c=77
while(l=readline())l.toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,function(y)x[y]?x[y].c++:z.push(x[y]={w:y,c:1}))
z=z.sort(function(a,b)b.c-a.c).slice(0,22)
for each(v in z){v.r=v.c/z[0].c
c=c>(l=(77-v.w.length)/v.r)?l:c}for(k in z){v=z[k]
s=Array(v.r*c|0).join('_')
if(!+k)print(e+s+e)
print(p+s+p+e+v.w)}

Sadly, the for([k,v]in z) from the Rhino version doesn't seem to want to work in SpiderMonkey, and readFile() is a little easier than using readline() but moving up to 1.8 allows us to use function closures to cut a few more lines....

Adding whitespace for readability:

x={};p='|';e=' ';z=[];c=77
while(l=readline())
  l.toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,
   function(y) x[y] ? x[y].c++ : z.push( x[y] = {w: y, c: 1} )
  )
z=z.sort(function(a,b) b.c - a.c).slice(0,22)
for each(v in z){
  v.r=v.c/z[0].c
  c=c>(l=(77-v.w.length)/v.r)?l:c
}
for(k in z){
  v=z[k]
  s=Array(v.r*c|0).join('_')
  if(!+k)print(e+s+e)
  print(p+s+p+e+v.w)
}

Usage: js golf.js < input.txt

Output:

 _________________________________________________________________________ 
|_________________________________________________________________________| she
|_______________________________________________________________| you
|____________________________________________________________| said
|____________________________________________________| alice
|______________________________________________| was
|___________________________________________| that
|___________________________________| as
|________________________________| her
|_____________________________| at
|_____________________________| with
|____________________________| s
|____________________________| t
|__________________________| on
|_________________________| all
|_______________________| this
|______________________| for
|______________________| had
|______________________| but
|_____________________| be
|_____________________| not
|___________________| they
|___________________| so

(base version - doesn't handle bar widths correctly)

JavaScript (Rhino) - 405 395 387 377 368 343 304 chars

I think my sorting logic is off, but.. I duno. Brainfart fixed.

Minified (abusing \n 's interpreted as a ; sometimes):

x={};p='|';e=' ';z=[]
readFile(arguments[0]).toLowerCase().replace(/\b(?!(the|and|of|to|a|i[tns]?|or)\b)\w+/g,function(y){x[y]?x[y].c++:z.push(x[y]={w:y,c:1})})
z=z.sort(function(a,b){return b.c-a.c}).slice(0,22)
for([k,v]in z){s=Array((v.c/z[0].c)*70|0).join('_')
if(!+k)print(e+s+e)
print(p+s+p+e+v.w)}

PHP CLI version (450 chars)

This solution takes into account the last requirement which most purists have conviniently chosen to ignore. That costed 170 characters!

Usage: php.exe <this.php> <file.txt>

Minified:

<?php $a=array_count_values(array_filter(preg_split('/[^a-z]/',strtolower(file_get_contents($argv[1])),-1,1),function($x){return !preg_match("/^(.|the|and|of|to|it|in|or|is)$/",$x);}));arsort($a);$a=array_slice($a,0,22);function R($a,$F,$B){$r=array();foreach($a as$x=>$f){$l=strlen($x);$r[$x]=$b=$f*$B/$F;if($l+$b>76)return R($a,$f,76-$l);}return$r;}$c=R($a,max($a),76-strlen(key($a)));foreach($a as$x=>$f)echo '|',str_repeat('-',$c[$x]),"| $x\n";?>

Human readable:

<?php

// Read:
$s = strtolower(file_get_contents($argv[1]));

// Split:
$a = preg_split('/[^a-z]/', $s, -1, PREG_SPLIT_NO_EMPTY);

// Remove unwanted words:
$a = array_filter($a, function($x){
       return !preg_match("/^(.|the|and|of|to|it|in|or|is)$/",$x);
     });

// Count:
$a = array_count_values($a);

// Sort:
arsort($a);

// Pick top 22:
$a=array_slice($a,0,22);


// Recursive function to adjust bar widths
// according to the last requirement:
function R($a,$F,$B){
    $r = array();
    foreach($a as $x=>$f){
        $l = strlen($x);
        $r[$x] = $b = $f * $B / $F;
        if ( $l + $b > 76 )
            return R($a,$f,76-$l);
    }
    return $r;
}

// Apply the function:
$c = R($a,max($a),76-strlen(key($a)));


// Output:
foreach ($a as $x => $f)
    echo '|',str_repeat('-',$c[$x]),"| $x\n";

?>

انتاج:

|-------------------------------------------------------------------------| she
|---------------------------------------------------------------| you
|------------------------------------------------------------| said
|-----------------------------------------------------| alice
|-----------------------------------------------| was
|-------------------------------------------| that
|------------------------------------| as
|--------------------------------| her
|-----------------------------| at
|-----------------------------| with
|--------------------------| on
|--------------------------| all
|-----------------------| this
|-----------------------| for
|-----------------------| had
|-----------------------| but
|----------------------| be
|---------------------| not
|--------------------| they
|--------------------| so
|-------------------| very
|------------------| what

When there is a long word, the bars are adjusted properly:

|--------------------------------------------------------| she
|---------------------------------------------------| thisisareallylongwordhere
|-------------------------------------------------| you
|-----------------------------------------------| said
|-----------------------------------------| alice
|------------------------------------| was
|---------------------------------| that
|---------------------------| as
|-------------------------| her
|-----------------------| with
|-----------------------| at
|--------------------| on
|--------------------| all
|------------------| this
|------------------| for
|------------------| had
|-----------------| but
|-----------------| be
|----------------| not
|---------------| they
|---------------| so
|--------------| very

Python 2.6, 347 chars

import re
W,x={},"a and i in is it of or the to".split()
[W.__setitem__(w,W.get(w,0)-1)for w in re.findall("[a-z]+",file("11.txt").read().lower())if w not in x]
W=sorted(W.items(),key=lambda p:p[1])[:22]
bm=(76.-len(W[0][0]))/W[0][1]
U=lambda n:"_"*int(n*bm)
print "".join(("%s\n|%s| %s "%((""if i else" "+U(n)),U(n),w))for i,(w,n)in enumerate(W))

انتاج:

 _________________________________________________________________________
|_________________________________________________________________________| she 
|_______________________________________________________________| you 
|____________________________________________________________| said 
|_____________________________________________________| alice 
|_______________________________________________| was 
|___________________________________________| that 
|____________________________________| as 
|________________________________| her 
|_____________________________| with 
|_____________________________| at 
|____________________________| s 
|____________________________| t 
|__________________________| on 
|__________________________| all 
|_______________________| this 
|_______________________| for 
|_______________________| had 
|_______________________| but 
|______________________| be 
|_____________________| not 
|____________________| they 
|____________________| so 

Python 2.x, latitudinarian approach = 227 183 chars

import sys,re
t=re.split('\W+',sys.stdin.read().lower())
r=sorted((-t.count(w),w)for w in set(t)if w not in'andithetoforinis')[:22]
for l,w in r:print(78-len(r[0][1]))*l/r[0][0]*'=',w

Allowing for freedom in the implementation, I constructed a string concatenation that contains all the words requested for exclusion ( the, and, of, to, a, i, it, in, or, is ) - plus it also excludes the two infamous "words" s and t from the example - and I threw in for free the exclusion for an, for, he . I tried all concatenations of those words against corpus of the words from Alice, King James' Bible and the Jargon file to see if there are any words that will be mis-excluded by the string. And that is how I ended with two exclusion strings: itheandtoforinis and andithetoforinis .

PS. borrowed from other solutions to shorten the code.

=========================================================================== she 
================================================================= you
============================================================== said
====================================================== alice
================================================ was
============================================ that
===================================== as
================================= her
============================== at
============================== with
=========================== on
=========================== all
======================== this
======================== had
======================= but
====================== be
====================== not
===================== they
==================== so
=================== very
=================== what
================= little

Rant

Regarding words to ignore, one would think those would be taken from list of the most used words in English. That list depends on the text corpus used. Per one of the most popular lists ( http://en.wikipedia.org/wiki/Most_common_words_in_English , http://www.english-for-students.com/Frequently-Used-Words.html , http://www.sporcle.com/games/common_english_words.php ), top 10 words are: the be(am/are/is/was/were) to of and a in that have I

The top 10 words from the Alice in Wonderland text are the and to a of it she i you said
The top 10 words from the Jargon File (v4.4.7) are the a of to and in is that or for

So question is why or was included in the problem's ignore list, where it's ~30th in popularity when the word that (8th most used) is not. etc, etc. Hence I believe the ignore list should be provided dynamically (or could be omitted).

Alternative idea would be simply to skip the top 10 words from the result - which actually would shorten the solution (elementary - have to show only the 11th to 32nd entries).

Python 2.x, punctilious approach = 277 243 chars

The chart drawn in the above code is simplified (using only one character for the bars). If one wants to reproduce exactly the chart from the problem description (which was not required), this code will do it:

import sys,re
t=re.split('\W+',sys.stdin.read().lower())
r=sorted((-t.count(w),w)for w in set(t)-set(sys.argv))[:22]
h=min(9*l/(77-len(w))for l,w in r)
print'',9*r[0][0]/h*'_'
for l,w in r:print'|'+9*l/h*'_'+'|',w

I take an issue with the somewhat random choice of the 10 words to exclude the, and, of, to, a, i, it, in, or, is so those are to be passed as command line parameters, like so:
python WordFrequencyChart.py the and of to ai it in or is <"Alice's Adventures in Wonderland.txt"

This is 213 chars + 30 if we account for the "original" ignore list passed on command line = 243

PS. The second code also does "adjustment" for the lengths of all top words, so none of them will overflow in degenerate case.

 _______________________________________________________________
|_______________________________________________________________| she
|_______________________________________________________| superlongstringstring
|_____________________________________________________| said
|______________________________________________| alice
|_________________________________________| was
|______________________________________| that
|_______________________________| as
|____________________________| her
|__________________________| at
|__________________________| with
|_________________________| s
|_________________________| t
|_______________________| on
|_______________________| all
|____________________| this
|____________________| for
|____________________| had
|____________________| but
|___________________| be
|___________________| not
|_________________| they
|_________________| so

Scala 2.8, 311 314 320 330 332 336 341 375 characters

including long word adjustment. Ideas borrowed from the other solutions.

Now as a script ( a.scala ):

val t="\\w+\\b(?<!\\bthe|and|of|to|a|i[tns]?|or)".r.findAllIn(io.Source.fromFile(argv(0)).mkString.toLowerCase).toSeq.groupBy(w=>w).mapValues(_.size).toSeq.sortBy(-_._2)take 22
def b(p:Int)="_"*(p*(for((w,c)<-t)yield(76.0-w.size)/c).min).toInt
println(" "+b(t(0)._2))
for(p<-t)printf("|%s| %s \n",b(p._2),p._1)

Run with

scala -howtorun:script a.scala alice.txt

BTW, the edit from 314 to 311 characters actually removes only 1 character. Someone got the counting wrong before (Windows CRs?).


Scala, 368 chars

First, a legible version in 592 characters:

object Alice {
  def main(args:Array[String]) {
    val s = io.Source.fromFile(args(0))
    val words = s.getLines.flatMap("(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(_)).map(_.toLowerCase)
    val freqs = words.foldLeft(Map[String, Int]())((countmap, word)  => countmap + (word -> (countmap.getOrElse(word, 0)+1)))
    val sortedFreqs = freqs.toList.sort((a, b)  => a._2 > b._2)
    val top22 = sortedFreqs.take(22)
    val highestWord = top22.head._1
    val highestCount = top22.head._2
    val widest = 76 - highestWord.length
    println(" " + "_" * widest)
    top22.foreach(t => {
      val width = Math.round((t._2 * 1.0 / highestCount) * widest).toInt
      println("|" + "_" * width + "| " + t._1)
    })
  }
}

The console output looks like this:

$ scalac alice.scala 
$ scala Alice aliceinwonderland.txt
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| at
|______________________________| with
|_____________________________| s
|_____________________________| t
|___________________________| on
|__________________________| all
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what

We can do some aggressive minifying and get it down to 415 characters:

object A{def main(args:Array[String]){val l=io.Source.fromFile(args(0)).getLines.flatMap("(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(_)).map(_.toLowerCase).foldLeft(Map[String, Int]())((c,w)=>c+(w->(c.getOrElse(w,0)+1))).toList.sort((a,b)=>a._2>b._2).take(22);println(" "+"_"*(76-l.head._1.length));l.foreach(t=>println("|"+"_"*Math.round((t._2*1.0/l.head._2)*(76-l.head._1.length)).toInt+"| "+t._1))}}

The console session looks like this:

$ scalac a.scala 
$ scala A aliceinwonderland.txt
 _________________________________________________________________________
|_________________________________________________________________________| she
|_______________________________________________________________| you
|_____________________________________________________________| said
|_____________________________________________________| alice
|_______________________________________________| was
|____________________________________________| that
|____________________________________| as
|_________________________________| her
|______________________________| at
|______________________________| with
|_____________________________| s
|_____________________________| t
|___________________________| on
|__________________________| all
|_______________________| had
|_______________________| but
|______________________| be
|______________________| not
|____________________| they
|____________________| so
|___________________| very
|___________________| what

I'm sure a Scala expert could do even better.

Update: In the comments Thomas gave an even shorter version, at 368 characters:

object A{def main(a:Array[String]){val t=(Map[String, Int]()/:(for(x<-io.Source.fromFile(a(0)).getLines;y<-"(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r findAllIn x) yield y.toLowerCase).toList)((c,x)=>c+(x->(c.getOrElse(x,0)+1))).toList.sortBy(_._2).reverse.take(22);val w=76-t.head._1.length;print(" "+"_"*w);t map (s=>"\n|"+"_"*(s._2*w/t.head._2)+"| "+s._1) foreach print}}

Legibly, at 375 characters:

object Alice {
  def main(a:Array[String]) {
    val t = (Map[String, Int]() /: (
      for (
        x <- io.Source.fromFile(a(0)).getLines
        y <- "(?i)\\w+\\b(?<!\\bthe|and|of|to|a|i|it|in|or|is)".r.findAllIn(x)
      ) yield y.toLowerCase
    ).toList)((c, x) => c + (x -> (c.getOrElse(x, 0) + 1))).toList.sortBy(_._2).reverse.take(22)
    val w = 76 - t.head._1.length
    print (" "+"_"*w)
    t.map(s => "\n|" + "_" * (s._2 * w / t.head._2) + "| " + s._1).foreach(print)
  }
}




code-golf