The Free Dictionary  

MASAKARI: The people's choice 'General Purpose Grade' English wordlist
Drag0nspeaker
Posted: Tuesday, December 04, 2012 3:41:14 PM

Rank: Advanced Member

Joined: 9/12/2011
Posts: 24,719
Neurons: 128,036
Location: Livingston, Scotland, United Kingdom
Quote:
(re: Sancta Sophia) Even Sanmayce is stunned here: how can one call this a "conversion"? In my view the sultan should have destroyed it COMPLETELY and only then built what he wanted.


I would just call it a rededication.
Since the building is celebrating Hagia Sophia - Holy Wisdom - it does not really deserve to be considered Christian or Muslim or anything else - they are all trying to be 'Holily Wise'.

Hmmm... if you want to consider 'elf' a masculine noun, then surely the feminine should be 'woelf'!






Wyrd bið ful aræd - bull!
Sanmayce
Posted: Thursday, December 06, 2012 1:27:22 PM

Rank: Advanced Member

Joined: 5/29/2012
Posts: 177
Neurons: 830
I have only one word for such performers/songs: natural.
Thanks a lot for this gem of a video, leonAzul.
I watched two 'covers'; they seemed soulless to me.

Lyrics
Chambers Brothers - Time Has Come Today

Time has come today
Young hearts can go their way
Can't put it off another day
I don't care what others say
They say we don't listen anyway
Time has come today
(Hey)

Oh
The rules have changed today (Hey) [or The room has changed today]
I have no place to stay (Hey)
I'm thinking about the subway (Hey)
My love has flown away (Hey) [or blown away]
My tears have come and gone (Hey)
Oh my Lord, I have to roam (Hey) [or to run]
I have no home (Hey)
I have no home (Hey)

Now the time has come (Time)
There's no place to run (Time)
I might get burned up by the sun (Time)
But I had my fun (Time)
I've been loved and put aside (Time)
I've been crushed by the tumbling tide (Time)
And my soul has been psychedelicized (Time)

(Time)
Now the time has come (Time)
There are things to realize (Time)
Time has come today (Time)
Time has come today (Time)

Time [x11]

Oh
Now the time has come (Time)
There's no place to run (Time)
I might get burned up by the sun (Time)
But I had my fun (Time)
I've been loved and put aside (Time)
I've been crushed by tumbling tide (Time)
And my soul has been psychedelicized (Time)

(Time)
Now the time has come (Time)
There are things to realize (Time)
Time has come today (Time)
Time has come today (Time)

Time [x4]
Yeah


I want these lyrics to be 100% errorless, do you see any bug left?
This video is already an official audiovisual facet of the sub-project 'Masakari', yeah!

Last night I made another update to the Masakari package: it now dumps an EXHAUSTIVE suggestion list for every word (taken from your files) that is unfamiliar to the current corpus (315,933 words), letting you explore all 'akin' words of order 1 and 2, i.e. the words obtained by replacing one character, or two adjacent characters, at every position.
This new feature turns Masakari into a word-play tool, i.e. something to play with.
My approach is simple and (for now) slow. It uses only two wildcards, '^' and '$': the first stands for "any ALPHA character OR empty", the second for "any ALPHA character AND not empty". With them, the suggestion list is always non-empty.

A quick example: let our text file 'OneOfYourFiles.txt' (located in the '\Masakari\Your_textual_folders' folder) contain two dummy lines full of errors:
To furify psychedlicized throng at lenght. <- This line has 3 words unfamiliar to our 315,933-word list
To purify psychedlicized throng at length. <- This line has 1 word unfamiliar to our 315,933-word list

Then just run 'RUNME_suggest.BAT' (located in the '\Masakari' folder); after several seconds NOTEPAD will show all suggestions for each unfamiliar word, listed by groups.
Also, as usual, 'Masakari.html' is autoloaded into your web browser, in this case showing these 3 unfamiliar words:
furify
lenght
psychedlicized


For the first word the result is:
The group of '$urify|':
aurify
purify

The group of 'f$rify|':
The group of 'fu$ify|':
The group of 'fur$fy|':
The group of 'furi$y|':
The group of 'furif$|':
The group of '$$rify|':
aerify
aurify
purify
verify

The group of 'f$$ify|':
The group of 'fu$$fy|':
The group of 'fur$$y|':
furphy
The group of 'furi$$|':
furies
The group of 'furif^|':
The group of 'furi^^|':
furies

For the second word the result is:
The group of '$enght|':
The group of 'l$nght|':
The group of 'le$ght|':
The group of 'len$ht|':
The group of 'leng$t|':
The group of 'lengh$|':
The group of '$$nght|':
The group of 'l$$ght|':
The group of 'le$$ht|':
The group of 'len$$t|':
The group of 'leng$$|':
length
The group of 'lengh^|':
The group of 'leng^^|':
length

SOED has:
psychedelic, adjective & noun.
...
* psychedelically adverb M20.
* psychedelicize /-sʌɪz/ verb trans. (colloq.) make psychedelic or bizarrely colourful M20.

Seeing how leonAzul misspelled 'psychedelicized' let us exploit it and see what suggestion list it yields:

Stage 1, checking order 1:
The group of '$sychedlicized|':
The group of 'p$ychedlicized|':
The group of 'ps$chedlicized|':
The group of 'psy$hedlicized|':
The group of 'psyc$edlicized|':
The group of 'psych$dlicized|':
The group of 'psyche$licized|':
The group of 'psyched$icized|':
The group of 'psychedl$cized|':
The group of 'psychedli$ized|':
The group of 'psychedlic$zed|':
The group of 'psychedlici$ed|':
The group of 'psychedliciz$d|':
The group of 'psychedlicize$|':

Stage 2, checking order 2:
The group of '$$ychedlicized|':
The group of 'p$$chedlicized|':
The group of 'ps$$hedlicized|':
The group of 'psy$$edlicized|':
The group of 'psyc$$dlicized|':
The group of 'psych$$licized|':
The group of 'psyche$$icized|':
The group of 'psyched$$cized|':
The group of 'psychedl$$ized|':
The group of 'psychedli$$zed|':
The group of 'psychedlic$$ed|':
The group of 'psychedlici$$d|':
The group of 'psychedliciz$$|':

Stage 3, ensuring a never-empty suggestion list:
The group of 'psychedlicize^|':
The group of 'psychedliciz^^|':
The group of 'psychedlici^^^|':
The group of 'psychedlic^^^^|':
The group of 'psychedli^^^^^|':
The group of 'psychedl^^^^^^|':
The group of 'psyched^^^^^^^|':
psyched
psychedelia
psychedelic
psychedelics


There is no risk of getting an empty list, since Masakari's wordlist contains all 26 single letters as entries (here 'p^^^^^^^^^^^^^|' guarantees non-emptiness).

The current revision dumps all hits from stages 1 and 2 (for educational purposes); for stage 3 it dumps the hits of the first group that yields any, and skips the remaining patterns, i.e. sub-stages. For the above example:
The group of 'psychedlicize^|':
The group of 'psychedliciz^^|':
The group of 'psychedlici^^^|':
The group of 'psychedlic^^^^|':
The group of 'psychedli^^^^^|':
The group of 'psychedl^^^^^^|':
The group of 'psyched^^^^^^^|':
First hits occur in this sub-stage! Skip the remaining sub-stages!
The group of 'psyche^^^^^^^^|':
The group of 'psych^^^^^^^^^|':
The group of 'psyc^^^^^^^^^^|':
The group of 'psy^^^^^^^^^^^|':
The group of 'ps^^^^^^^^^^^^|':
The group of 'p^^^^^^^^^^^^^|':

How many suggestions one wants is a matter of tweaking; the default setting will be to stop as soon as the first X hits are obtained (per stage or per group).
This means that if stage 1 succeeds, stages 2 and 3 may be skipped.
The same holds for each step within stage 3: we don't need, e.g., all p* words.
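The three stages can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not Masakari's actual code: `order_patterns`, `truncation_patterns`, `to_regex` and `suggest` are hypothetical names, the words are assumed to be already split out of the text (so the trailing '|' that stands in for CR is dropped), and '$'/'^' are translated to regular-expression classes following the definitions given above. Like the current revision, it keeps every group from stages 1 and 2 and stops stage 3 at the first sub-stage with hits.

```python
import re

def order_patterns(word, order):
    """Stages 1 and 2: replace each run of `order` adjacent letters with '$'."""
    return [word[:i] + "$" * order + word[i + order:]
            for i in range(len(word) - order + 1)]

def truncation_patterns(word):
    """Stage 3: keep a shrinking prefix and pad it with '^' to the word's length."""
    return [word[:k] + "^" * (len(word) - k) for k in range(len(word) - 1, 0, -1)]

def to_regex(pattern):
    """'$' = exactly one letter; '^' = one letter or nothing."""
    parts = []
    for ch in pattern:
        if ch == "$":
            parts.append("[A-Za-z]")
        elif ch == "^":
            parts.append("[A-Za-z]?")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def suggest(word, wordlist):
    """Return (pattern, hits) groups: all of stages 1-2, stage 3 up to the first hit."""
    groups = []
    for order in (1, 2):
        for pattern in order_patterns(word, order):
            rx = to_regex(pattern)
            groups.append((pattern, [w for w in wordlist if rx.fullmatch(w)]))
    for pattern in truncation_patterns(word):
        rx = to_regex(pattern)
        hits = [w for w in wordlist if rx.fullmatch(w)]
        groups.append((pattern, hits))
        if hits:  # first sub-stage with hits: skip the remaining ones
            break
    return groups

if __name__ == "__main__":
    # A tiny stand-in for the 315,933-word list.
    words = ["aerify", "aurify", "furies", "furphy", "length", "purify", "verify"]
    for pattern, hits in suggest("furify", words):
        print(f"The group of '{pattern}|':", *hits)
```

With the real wordlist (which contains the 26 single letters), the last stage-3 pattern always matches, which is what keeps the list non-empty. Scanning the whole list once per pattern is simple and, as noted above, slow.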

All in all, just write the words you are interested in into some file, copy it to the '\Masakari\Your_textual_folders' folder and double-click 'RUNME_suggest.BAT'; you will then be able to explore the outcome of the three stages described above. This saves you the time of looking words up in dictionaries.

Of course, Masakari's wordlist will soon be enriched with the words missed above (outs, psychedelicize, ...) and others (SOED's out* verbs, schmucky, schnooky, ...).

@Drag0nspeaker
>I would just call it a rededication.
Maybe you are right, but I know a thing or two about Orthodox Christianity, and desecration/defilement, or even defiliation (wow, what a word; SOED lacks it!), is what took place here. In Bulgaria alone many churches were 'rededicated' as stables/prisons and ... no need to go into it.

Abstraction of a child from its parents.
http://www.thefreedictionary.com/Defiliation

>... then surely the feminine should be woelf!
If your logic is based on the 'man'/'woman' pair, maybe, but I don't like it; I don't like the plural 'elves' either. My choice is:
elf/elfs/elfess/elfesses

Yet a very good article suggests two other feminine forms: 'elfen/elven' and 'elfe'.
http://encyclopedia.thefreedictionary.com/elves

This reminds me of a very good song, 'Nightwish - Elvenpath'. I love Tarja Turunen; for those who missed her songs, I recommend my favs:
"Elvenpath"
"Beauty and the Beast"
"Angels Fall First"
"Know Why the Nightingale Sings"
"She Is My Sin"
"Nemo"
"10th Man Down"
a supersong indeed, I love it.


He learns not to learn and reverts to what all men pass by.
Jyrkkä Jätkä
Posted: Thursday, December 06, 2012 1:39:42 PM

Rank: Advanced Member

Joined: 9/21/2009
Posts: 37,069
Neurons: 226,557
Location: Helsinki, Southern Finland Province, Finland
Then, Sanmayce,
you might like Tarja Turunen's version of the traditional Christmas song Varpunen Jouluaamuna (A Sparrow on Christmas Morning).
It's a very beloved song for every Finn, old and young.

http://www.youtube.com/watch?v=IuLtbZPIbcE


In the beginning there was nothing, which exploded.
Sanmayce
Posted: Thursday, December 06, 2012 2:07:35 PM

To my regret, many of her songs are unknown to me; one night I will listen to them exhaustively.
Thanks JJ, I had never heard this piece of sadness; it is very touching, and some fragments of her singing remind me of Sissel's technique in 'Prince Igor'.


Jyrkkä Jätkä
Posted: Thursday, December 06, 2012 2:19:20 PM

Varpunen jouluaamuna

(An old and beloved Finnish Christmas carol composed by Otto Kotilainen in 1913 to a poem by Zacharias Topelius, 1859)

Lumi on jo peittänyt kukat laaksosessa,
järven aalto jäätynyt talvipakkasessa.
Varpunen pienoinen, syönyt kesäeinehen,
järven aalto jäätynyt talvipakkasessa.

Pienen pirtin portailla oli tyttökulta:
tule, varpu, riemulla, ota siemen multa!
Joulu on, koditon varpuseni onneton,
tule tänne riemulla, ota siemen multa!

Tytön luo nyt riemuiten lensi varpukulta:
kiitollisna siemenen otan kyllä sulta.
Palkita Jumala tahtoo kerran sinua.
kiitollisna siemenen otan kyllä sulta!

En mä ole, lapseni, varpu tästä maasta,
olen pieni veljesi, tulin taivahasta.
Siemenen pienoisen, jonka annoit köyhällen,
pieni sai sun veljesi enkeleitten maasta.



The snow has covered the flowers in the valley,
the wave of the lake frozen in the winter cold.
A little sparrow, having eaten its summer food,
the wave of the lake frozen in the winter cold.

On the stairs of a little cottage sits a dear girl:
come, sparrow, happily, take a seed from me!
It is Christmas, my poor homeless sparrow,
come here happily, take a seed from me!

To the girl now happily flew the dear sparrow:
thankfully a seed I will take from you.
God will want to reward you one day.
Thankfully a seed I will take from you!

I am not, my child, a bird of this earth,
I am your little brother, I came from heaven.
The little seed, that you gave to the poor,
got your little brother from the land of angels.


(Sorry if this is threadjacking ;-)

Sanmayce
Posted: Thursday, December 06, 2012 2:27:14 PM

FANTASTIC!!!

Thanks for the text; you are reading my mind, I was just searching for it.
>(Sorry if this is threadjacking ;-)
Nonsense.

leonAzul
Posted: Thursday, December 06, 2012 4:29:07 PM

Rank: Advanced Member

Joined: 8/11/2011
Posts: 7,836
Neurons: 24,496
Location: Miami, Florida, United States
Sanmayce wrote:
I have only one word for such performers/songs - natural.
Thanks a lot for this gem-video leonAzul.
I watched 2 'covers', they appeared soulless to me.


Lyrics
Chambers Brothers - Time Has Come Today

Time has come today
Young hearts can go their way
I can't put it off another day
I don't care what others say
They think we don't listen anyway
Time has come today
(Hey)

Oh
The rules have changed today (Hey)
I have no place to stay (Hey)
I can't think about the subway (Hey)
Love has flown away (Hey)
My tears have come and gone (Hey)
Oh my Lord, I have to roam (Hey)
I have no home (Hey)
I have no home (Hey)

Now the time has come: (Time)
No place to run (Time)
I might get burned up by the sun (Time)
But I had my fun (Time)
I've been loved and put aside (Time)
I've been crushed by the tumbling tide (Time)
And my soul's been psychedelicized (Time)

(Time)
Now the time has come (Time)
There are things to realize (Time)
Time has come today (Time)
Time has come today (Time)

Time [x11]

Oh
Now the time has come: (Time)
No place to run (Time)
I might get burned up by the sun (Time)
But I had my fun (Time)
I've been loved and put aside (Time)
I've been crushed by th' tumbling tide (Time)
And my soul's been psychedelicized (Time)

(Time)
Now the time has come (Time)
There are things to realize (Time)
Time has come today (Time)
Time has come today (Time)


Sanmayce wrote:

I want these lyrics to be 100% errorless, do you see any bug left?

You got most of it. I just tidied up a few loose ends.

Sanmayce wrote:

Seeing how leonAzul misspelled 'psychedelicized' let us exploit it and see what suggestion list it yields:


Oopsie.

I hope I caught any typographical errors above.

Just the same, you can expect errors along with legitimate neologisms in your corpora, so I suppose there is some value in my mistake in a Bayesian sort of way.




"Make it go away, Mrs Whatsit," he whispered. "Make it go away. It's evil."
Sanmayce
Posted: Sunday, December 09, 2012 11:49:55 AM

Thanks for the lyrics proofing leonAzul.

They think we don't listen anyway
I can't think about the subway (Hey)

Is there any chance that 'can't' in the second line is actually AIN'T ...
My hearing (or rather decoding) is not good at all; for some reason (a psychological one, I guess) it is just bad.

>... so I suppose there is some value in my mistake in a Bayesian sort of way.
Of course it has value; every mistake is a stepping stone.

In one of my recent posts I fired off a phrase without knowing how plausible it was:
"... a full in shades or a shadeful image."
I was thinking of (or rather associating it with) the correct pattern 'rich in something', familiar to me from phrases like 'rich in ideas'.
Even then I knew its usage should be checked, so let me backpedal a little.
To my disappointment, after looking in my mini 3-gram corpus the result was:

Tchutcheling '* rich_in_shades|' ...
Tchutcheling '* rich_of_shades|' ...
Tchutcheling '* full_in_shades|' ...
Tchutcheling '* full_of_shades|' ...
Dumped lines i.e. hits so far: 1
Tchutchelo: Encountered lines in all files: 100,088,208

E:\_Gamera_r15_12348>type Tchutchelo.txt
0,000,004 full_of_shades

Strange: even though a phrase like 'full of himself' was known to me at that moment, I chose 'full in shades' because the analogous 'rich in shades' was on my mind, along with the beautiful 'shadeful'.
For the 'shade' entry, SOED has only:
* shadeless adjective E17.
* shadelessness noun L19.

Why not shadeful/shadefulness!

Apparently, millions of lines of English text lack my 'full in shades'/'rich in shades'; perhaps they don't sound native enough. Is that so?

So that other people have the opportunity to play with my 3-grams, I included those 100 million 3-grams in the package.
To upload or not to upload was the question; for research purposes I decided 'to'.
Actually, the sub-project 'MASAKARI' already has its own page.

The following file, 'Example.txt', located in the 'Masakari\_Gamera_r15_3-grams\Your_textual_folders' folder, is ripped down to 3-grams by running 'RUNME.BAT':
HERITAGE holds these DASHLESS compound 'out*' adverbs/nouns/idioms:
out and away;
out box;
out at the elbows;
out at the heel;
out at the heels;
out from under;
out in the cold;
out loud;
out of it;
out of breath;
out of character;
out of commission;
out of date;
out of hand;
out of humor;
out of joint;
out of key;
out of line;
out of luck;
off of (one's) gourd;
out of (one's) gourd;
off of (one's) head;
out of (one's) head;
out of phase;
out of play;
out of plumb;
out off plumb;
out of pocket;
out of print;
out of season;
out of sight;
out of sorts;
out of square;
out of step;
out of stock;
out of the blocks;
out of the blue;
out of the loop;
out of the question;
out of the running;
out of the way;
out of the woods;
out of the woodwork;
out of this world;
out of turn;
out of wedlock;
out of whack;
out of work;
(out) on a limb;
out to lunch;


Below is the output:
0,043,510 out_of_sight
0,014,244 of_the_woods
0,076,026 of_the_way
0,002,639 out_of_line
0,000,482 of_the_woodwork
0,000,017 one_s_gourd
0,003,817 one_s_head
0,010,738 of_the_blue
0,000,736 out_and_away
0,002,025 out_of_joint
0,000,666 out_of_stock
0,000,728 out_of_humor
0,000,137 out_of_key
0,003,654 out_of_work
0,001,531 out_of_luck
0,000,122 out_of_square
0,000,199 out_of_plumb
0,009,641 out_of_one
0,000,494 out_of_turn
0,001,678 out_of_pocket
0,075,099 out_of_it
0,001,603 at_the_heels
0,000,477 out_of_whack
0,031,148 of_the_question
0,001,953 out_of_sorts
0,000,258 out_of_play
0,001,156 of_the_blocks
0,011,519 in_the_cold
0,003,761 out_of_season
0,000,859 out_to_lunch
0,716,421 out_of_the
0,000,126 off_of_one
0,000,823 out_of_step
0,007,954 out_of_breath
0,079,014 out_in_the
0,001,728 out_of_print
0,000,549 at_the_heel
0,008,009 of_the_loop
0,001,226 out_of_character
0,042,323 out_of_this
0,001,050 out_of_commission
0,001,668 out_of_phase
0,002,314 of_the_running
0,025,213 out_at_the
0,005,060 out_of_hand
0,000,793 out_of_wedlock
0,041,210 of_this_world
0,008,252 out_from_under
0,008,189 out_of_date
0,000,789 on_a_limb
0,000,452 at_the_elbows

The bad thing is that (for now) this 'ranking' is awfully slow; in fact it is useful only for checking a few of your phrases.
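The ripping step itself can be approximated in Python. The post does not give the ripper's tokenization rules, so this is a hypothetical sketch inferred from the output above: text is lower-cased, anything that is not an ASCII letter separates words (which is how "(one's)" would become 'one_s', as in 'one_s_gourd'), and 3-grams are assumed not to cross line ends. `line_trigrams` and `rip` are made-up names.

```python
import re
from collections import Counter

def line_trigrams(line):
    """Lower-case a line, split on anything that is not an ASCII letter,
    and join every run of three consecutive words with '_'."""
    words = re.findall(r"[a-z]+", line.lower())
    return ["_".join(words[i:i + 3]) for i in range(len(words) - 2)]

def rip(text):
    """Count 3-grams line by line; grams are assumed not to cross line ends."""
    counts = Counter()
    for line in text.splitlines():
        counts.update(line_trigrams(line))
    return counts

if __name__ == "__main__":
    sample = "out of (one's) gourd;\nout of the blue;\nout of the way;\n"
    for gram, n in rip(sample).most_common():
        print(f"{n:,} {gram}")
```

This sketch also yields 'of_one_s', which does not appear in the listing above, so the real ripper's rules evidently differ somewhere; treat its counts as an approximation only.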

And here is the full query (using wildcards) made from the console; after 7 minutes of searching we have:

E:\Masakari\_Gamera_r15_3-grams>copy con Tchutchelo.ini
full_in_*
full_of_*
rich_in_*
rich_of_*
^Z
1 file(s) copied.

E:\Masakari\_Gamera_r15_3-grams>Tchutchelo_r2.exe _Gamera.tar.3.sorted.4andabove.lst /xgram
Tchutchelo, revision 2+, written by Kaze.
Purpose1: Reports number of lines(LFs) in files from a given filelist.
Purpose2: Dumps to 'Tchutchelo.txt' all lines (<=960) matching wildcarded line(S) stored in 'Tchutchelo.ini'.
Usage: Tchutchelo.exe AllTXTfiles.lst [/suggest|/xgram]
Example:
D:\>dir *.txt/s/b>AllTXTfiles.lst
D:\>copy con Tchutchelo.ini
*dog|
F6
D:\>Tchutchelo.exe AllTXTfiles.lst
Note1: Seven wildcards are available:
wildcard '*' any character(s) or empty,
wildcard '@'/'#' any character {or empty}/{and not empty},
wildcard '^'/'$' any ALPHA character {or empty}/{and not empty},
wildcard '|'/'%' any NON-ALPHA character {or empty}/{and not empty}.
Note2: Due to different line endings(CRLF in Windows; LF in UNIX)
you must add a '|' wildcard in place of CR:
for example in case of searching for '*.pdf' write '*.pdf|'.
Note3: A pseudo bug exists uncrushed - End-Of-File must be LF character.
Note4: Files can exceed 4GB limit.
Note5: Optional /suggest uses 'Tchutchelo_converted.ini' ('Tchutchelo.ini' converted transparently).
Note6: Optional /xgram uses 'Tchutchelo_converted.ini' ('Tchutchelo.ini' converted transparently).
Note7: When option /suggest is used then 'Tchutchelo.ini' should be wildcardless.
Note8: When option /xgram is used then 'Tchutchelo.ini' should be wildcardless.
Creating 'Tchutchelo_converted.ini' file ...
Tchutcheling '* full_in_*|' ...
Tchutcheling '* full_of_*|' ...
Tchutcheling '* rich_in_*|' ...
Tchutcheling '* rich_of_*|' ...
Dumped lines i.e. hits so far: 9,553

Tchutchelo: Encountered lines in all files: 100,088,208
Tchutchelo: Shortest line (CR included): 20
Tchutchelo: Longest line (CR included): 52
Tchutchelo: Dumped lines i.e. hits: 9,553
Tchutchelo: Performance: 6,965KB/s

E:\Masakari\_Gamera_r15_3-grams>dir Tchutchelo.txt
Volume in drive E is SSD_Sanmayce
Volume Serial Number is 9CF6-FEA3

Directory of E:\Masakari\_Gamera_r15_3-grams

12/09/2012 01:39 PM 260,891 Tchutchelo.txt
1 File(s) 260,891 bytes
0 Dir(s) 39,457,914,880 bytes free

E:\Masakari\_Gamera_r15_3-grams>type Tchutchelo.txt|more
0,006,120 full_in_the
0,000,698 full_in_his
0,000,395 full_in_view
0,000,296 full_in_my
0,000,288 full_in_her
0,000,222 full_in_sight
0,000,182 full_in_their
0,000,157 full_in_front
0,000,108 full_in_our
0,000,101 full_in_all
0,000,095 full_in_a
0,000,080 full_in_its
0,000,067 full_in_this
0,000,064 full_in_six
0,000,053 full_in_himself
0,000,051 full_in_your
0,000,043 full_in_chapter
0,000,039 full_in_appendix
0,000,037 full_in_man
0,000,035 full_in_face
0,000,028 full_in_order
0,000,027 full_in_him
0,000,023 full_in_section
0,000,021 full_in_that
0,000,020 full_in_it
0,000,019 full_in_form
0,000,018 full_in_such
0,000,018 full_in_everything
0,000,017 full_in_terms
0,000,017 full_in_knowledge
0,000,017 full_in_every
0,000,016 full_in_these
0,000,016 full_in_one
0,000,016 full_in_another
0,000,015 full_in_other
0,000,014 full_in_themselves
0,000,012 full_in_regard
...
0,021,323 full_of_the
0,005,201 full_of_a
0,004,619 full_of_water
0,004,610 full_of_love
0,004,590 full_of_life
0,003,704 full_of_people
0,003,697 full_of_light
0,003,622 full_of_joy
0,003,563 full_of_tears
0,002,888 full_of_all
0,002,833 full_of_them
0,002,756 full_of_his
0,002,731 full_of_grace
0,002,729 full_of_it
0,002,358 full_of_hope
0,002,085 full_of_good
0,001,983 full_of_interest
0,001,824 full_of_gold
0,001,821 full_of_such
0,001,693 full_of_energy
0,001,601 full_of_meaning
0,001,559 full_of_holes
0,001,509 full_of_blood
0,001,395 full_of_men
0,001,390 full_of_fear
0,001,346 full_of_wonder
0,001,233 full_of_flowers
0,001,175 full_of_these
0,001,152 full_of_faith
0,001,140 full_of_sorrow
0,001,133 full_of_fire
0,001,110 full_of_this
0,001,091 full_of_gratitude
0,001,079 full_of_compassion
0,001,044 full_of_that
0,001,039 full_of_pain
0,001,034 full_of_fine
0,001,018 full_of_pity
0,001,003 full_of_her
0,000,983 full_of_trouble
0,000,966 full_of_knowledge
0,000,958 full_of_anxiety
0,000,932 full_of_fun
0,000,932 full_of_confidence
0,000,931 full_of_wisdom
0,000,931 full_of_strange
0,000,920 full_of_danger
0,000,899 full_of_enthusiasm
0,000,877 full_of_great
0,000,871 full_of_money
0,000,856 full_of_anger
0,000,851 full_of_promise
0,000,829 full_of_god
0,000,828 full_of_old
0,000,825 full_of_stars
0,000,814 full_of_little
0,000,790 full_of_pride
0,000,785 full_of_thoughts
0,000,784 full_of_courage
0,000,774 full_of_sweet
0,000,772 full_of_wrath
0,000,770 full_of_grief
0,000,765 full_of_glory
0,000,761 full_of_admiration
0,000,760 full_of_sympathy
0,000,755 full_of_an
0,000,747 full_of_wine
0,000,745 full_of_their
0,000,733 full_of_dead
0,000,733 full_of_books
0,000,718 full_of_business
0,000,705 full_of_darkness
0,000,704 full_of_tenderness
0,000,693 full_of_bliss
0,000,675 full_of_new
0,000,667 full_of_years
0,000,659 full_of_evil
0,000,627 full_of_what
0,000,625 full_of_kindness
0,000,622 full_of_terror
0,000,619 full_of_happiness
0,000,613 full_of_spirit
0,000,613 full_of_incense
0,000,609 full_of_power
0,000,608 full_of_eyes
0,000,601 full_of_beauty
0,000,589 full_of_rage
0,000,587 full_of_zeal
0,000,586 full_of_tender
0,000,585 full_of_those
0,000,584 full_of_peace
0,000,582 full_of_thought
0,000,580 full_of_surprises
0,000,574 full_of_hatred
0,000,572 full_of_self
0,000,572 full_of_my
0,000,569 full_of_care
0,000,562 full_of_milk
0,000,562 full_of_feeling
0,000,556 full_of_small
0,000,556 full_of_information
0,000,554 full_of_smoke
0,000,552 full_of_curiosity
0,000,537 full_of_wit
0,000,537 full_of_fish
0,000,534 full_of_sweetness
0,000,533 full_of_hate
0,000,528 full_of_mystery
0,000,526 full_of_ideas
...
0,001,894 rich_in_the
0,000,439 rich_in_a
0,000,428 rich_in_this
0,000,428 rich_in_gold
0,000,423 rich_in_all
0,000,405 rich_in_mercy
0,000,368 rich_in_good
0,000,347 rich_in_his
0,000,341 rich_in_faith
0,000,294 rich_in_minerals
0,000,263 rich_in_cattle
0,000,261 rich_in_america
0,000,241 rich_in_its
0,000,175 rich_in_natural
0,000,172 rich_in_protein
0,000,153 rich_in_iron
0,000,151 rich_in_vitamin
0,000,149 rich_in_their
0,000,145 rich_in_such
0,000,142 rich_in_fruits
0,000,129 rich_in_carbon
0,000,126 rich_in_that
0,000,125 rich_in_food
0,000,120 rich_in_every
0,000,118 rich_in_mineral
0,000,110 rich_in_love
0,000,108 rich_in_sweets
0,000,101 rich_in_milk
0,000,101 rich_in_hope
0,000,100 rich_in_color
0,000,099 rich_in_organic
0,000,098 rich_in_fossils
0,000,094 rich_in_beauty
0,000,093 rich_in_flocks
0,000,091 rich_in_virtue
0,000,090 rich_in_those
0,000,090 rich_in_promise
0,000,089 rich_in_fat
0,000,088 rich_in_both
0,000,087 rich_in_spirit
0,000,087 rich_in_money
0,000,085 rich_in_her
0,000,083 rich_in_colour
0,000,082 rich_in_resources
0,000,081 rich_in_these
0,000,078 rich_in_calcium
0,000,077 rich_in_potassium
0,000,076 rich_in_many
0,000,076 rich_in_grace
0,000,074 rich_in_fish
0,000,072 rich_in_oxygen
0,000,072 rich_in_nutrients
0,000,072 rich_in_meath
0,000,072 rich_in_friends
...
0,000,122 rich_of_the
0,000,089 rich_of_this
0,000,032 rich_of_their
0,000,020 rich_of_myself
0,000,015 rich_of_all
0,000,014 rich_of_a
0,000,013 rich_of_another
0,000,011 rich_of_gold
0,000,010 rich_of_hue
0,000,010 rich_of_both
0,000,009 rich_of_our
0,000,009 rich_of_every
0,000,009 rich_of_earth
0,000,008 rich_of_whom
0,000,007 rich_of_what
0,000,007 rich_of_colour
...


E:\Masakari\_Gamera_r15_3-grams>

That's one way of dumping all 3-grams you want to explore.
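The seven wildcards from Tchutchelo's help screen map naturally onto regular expressions. The sketch below is my own approximation, not Tchutchelo's matcher: it assumes that each '@', '^' and '|' stands for at most one character, that corpus lines look like '0,006,120 full_in_the', and it mimics the '* ...|' wrapping visible in the /xgram log. `compile_wildcard` and `grep_xgrams` are hypothetical names.

```python
import re

# Tchutchelo's seven wildcards (as listed in its help screen) as regex fragments.
WILDCARDS = {
    "*": ".*",          # any character(s) or empty
    "@": ".?",          # any character or empty
    "#": ".",           # any character, not empty
    "^": "[A-Za-z]?",   # any ALPHA character or empty
    "$": "[A-Za-z]",    # any ALPHA character, not empty
    "|": "[^A-Za-z]?",  # any NON-ALPHA character or empty (e.g. a trailing CR)
    "%": "[^A-Za-z]",   # any NON-ALPHA character, not empty
}

def compile_wildcard(pattern):
    """Translate a wildcarded line into a compiled regex (other chars are literal)."""
    return re.compile("".join(WILDCARDS.get(ch, re.escape(ch)) for ch in pattern),
                      re.DOTALL)

def grep_xgrams(lines, queries):
    """Return count-prefixed lines matching each query, highest count first per query."""
    results = []
    for query in queries:
        rx = compile_wildcard("* " + query + "|")  # mimic the /xgram conversion
        hits = [ln for ln in lines if rx.fullmatch(ln)]
        hits.sort(key=lambda ln: int(ln.split()[0].replace(",", "")), reverse=True)
        results.extend(hits)
    return results

if __name__ == "__main__":
    lines = ["0,000,004 full_of_shades", "0,006,120 full_in_the",
             "0,021,323 full_of_the", "0,001,894 rich_in_the"]
    print(*grep_xgrams(lines, ["full_in_*", "full_of_*"]), sep="\n")
```

Lines are expected without their trailing newline; a leftover CR is absorbed by the final '|', which is the reason Note2 asks for it.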

almostfreebird
Posted: Sunday, December 09, 2012 11:13:53 PM

Rank: Advanced Member

Joined: 4/22/2011
Posts: 2,820
Neurons: 7,024
Location: Japan



I just noticed and recognized the word "Gamera".




leonAzul
Posted: Monday, December 10, 2012 3:54:35 PM

Sanmayce wrote:
Thanks for the lyrics proofing leonAzul.

They think we don't listen anyway
I can't think about the subway (Hey)

Is there any chance the second red passage to be AIN'T ...
My hearing (or rather decoding) is not good at all, for some reason (I guess a psychological one) it is just bad.


I had to listen very carefully to this myself for several reasons. One of them is that I share your difficulty with understanding speech when it is sung. My mind often follows the changes in pitch and timbre of a voice as if it were a musical instrument, rather than a person speaking.

The pronunciation of "can't" in this instance is very close to "ain't." In this informal context, the phrases "I can't think about the subway" and "I ain't thinking 'bout the subway" are nearly synonymous — they both mean that living in the subway system is rejected from consideration. Listening to other performances, I came to the conclusion that "can't think" fits the rhythm and sound better. I hear several instances where a /k/ and a glottal stop could be confused, but none where a clipped pronunciation of the "-ing" suffix could be detected that would justify the "ain't thinking" interpretation.

"I can't think" also accords better with the overall theme of the song. When "They think," it is an unfounded assumption; when "I can't think," it is a conclusion based on the evidence of crime statistics.

If you will permit a slight digression here, this song is something of a musical reply to the Beatles' She's Leaving Home. The Beatles describe the puzzlement of parents who thought they had done everything they could for their child, but realize too late that "fun is the one thing that money can't buy."

The Chambers Brothers describe the feeling of abandonment that young adults feel, when the institutions their parents support have become so formal and sterile that they fail to address the human feelings of community and religiosity for which they are named. It is not that material things were bad, but rather that they had become a distraction from the emotional and personal things that everyone, and especially a child, needs to be happy and feel loved.


"Make it go away, Mrs Whatsit," he whispered. "Make it go away. It's evil."
Sanmayce
Posted: Tuesday, December 11, 2012 2:15:42 PM

Okay, I couldn't bear those awfully slow searches, so after some quick tweaks to my flagman (in fact 'flagelf') Leprechaun, revision 15FIXFIX+ came into being, allowing superfast x-grams vs. x-grams checks.

To illustrate the boost (using the file 'Example.txt' from my last post), I threw its 59 3-grams against MASAKARI's 100,088,208 3-grams, and after 71 seconds the file 'Your_words_unfamiliar_to_Masakari.txt' was autoloaded into NOTEPAD:
adverbs_nouns_idioms
compound_out_adverbs
dashless_compound_out
heritage_holds_these
holds_these_dashless
out_adverbs_nouns
out_off_plumb
these_dashless_compound

/Unfamiliar 3-grams to MASAKARI, powered with 100,088,208 3-grams, copyleft Sanmayce 2012-Dec-11/

This is the fastest execution I am capable of (for now). The latency, i.e. the starting overhead, is huge (70s), BUT there is no search structure whatsoever, which means the external/internal memory footprint is supersmall.
And of course the throughput/bandwidth is EXCELLENT: for instance, extracting the "suspicious" 3-grams from 200MB of pure English text (comprising 22+ million 3-grams) took only 128 seconds!
In other words, checking whether 22 million phrases are within/outwith a corpus of 100 million phrases takes about 2 minutes, i.e. 22,531,192/128 ≈ 176,025 phrases per second, a mutsi performance in my eyes.
My approach mainly resembles arming Masakari with long-range weapons: these 100 million phrases are "bare necessities", and a real-world corpus is more likely to be in the range of 1-4 billion phrases, where the sort approach can't stand a chance. I mean, the current approach is superb for really heavy loads.
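The post does not describe how revision 15FIXFIX+ performs its x-grams vs. x-grams check, only that it builds no search structure. One simple way to get a comparable one-pass check, offered as a hypothetical sketch rather than Leprechaun's method, is to hold the query 3-grams in a hash set and stream the corpus once, discarding every query that is found; whatever is left is unfamiliar. The corpus line format ('count gram') is assumed from the listings above, and `unfamiliar` is a made-up name.

```python
def unfamiliar(queries, corpus_lines):
    """Return the query x-grams that never occur in `corpus_lines`.

    Only the queries are held in memory (in a hash set); the corpus is read
    once, front to back. Corpus lines are assumed to look like
    '0,006,120 full_in_the' (count, space, gram)."""
    pending = set(queries)
    for line in corpus_lines:
        if not pending:  # everything already found: stop early
            break
        parts = line.split()
        if len(parts) == 2:
            pending.discard(parts[1])
    return sorted(pending)

if __name__ == "__main__":
    corpus = ["0,043,510 out_of_sight", "0,000,199 out_of_plumb"]
    print(unfamiliar(["out_of_plumb", "out_off_plumb"], corpus))
```

Because `corpus_lines` can be any iterable, an open text file can be passed directly, so the corpus never has to fit in RAM; the fixed cost is one full read of it, much like the start-up overhead described above.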

@almostfreebird
>I just noticed and recognized the word "Gamera".
Shame, shame, you gotta catch up, 'Gamera' is a Japanese hallmark.



I am crazy about this theme: the embodiment of the Earth Spirit facing the abominations. There is simply no better theme in the whole world; it trumps all religious/mythical clashes.
The final scene (after the battle against God Evil Iris), where Gamera walks through hellish fires betrayed, injured and pawless, yet still remains kind: this is the epitome of Kami-Gamera.
Gamera's look in that scene is so moving: a martyr.

@leonAzul
Thank you very much, I like your analysis.
>It is not that material things were bad, but rather that they had become a distraction from the emotional and personal things that everyone, and especially a child, needs to be happy and feel loved.
So true.

Now I want to share my joy with a much-loved Madonna performance in the same spirit:

Lyrics
Madonna - Nobody Knows Me

I've had so many lives
Since I was a child
And I realize
How many times I've died

I'm not that kind of guy
Sometimes I feel shy
I think I can fly
Closer to the sky

No one's telling you how to live your life
But it's a setup until you're fed up

This world is not so kind
People trap your mind
It's so hard to find
Someone to admire

I, I sleep much better at night
I feel closer to the light
Now I'm gonna try
To improve my life

No one's telling you how to live your life
But it's a setup until you're fed up
It's no good when you're misunderstood
But why should I care
What the world thinks of me
Won't let a stranger
Give me a social disease

Nobody, nobody knows me
Nobody knows me
...


I spotted a comment, 'Incroyable talent' ('incredible talent'); I would call her an artist who relentlessly defies the social disease in a catwalk manner.


He learns not to learn and reverts to what all men pass by.
Sanmayce
Posted: Friday, December 14, 2012 12:54:49 PM

Rank: Advanced Member

Joined: 5/29/2012
Posts: 177
Neurons: 830
At last I have written the long-overdue revision 16 of my x-gram ripper.
It adds the ability to make instant queries (with less than one second of latency, i.e. initial response time).
The price of such functionality is one additional 7GB file (for my 100 million 3-grams).
The raw 3-grams are 2.73GB in size; the search structure housing them is that 7GB file.
Of course, when something is gained, something is lost at the same time; here the trade-off is speed versus size:
gaining sub-second latency costs bandwidth, that is, we lose the performance of hundreds of thousands of phrases per second, since the search-structure file is too big to fit into fast internal memory.
Quite obviously the two approaches are not rivals; they complement each other depending on the situation.

Needless to say, I'm foxy enough to beat the odds and ensure spry behaviour even on netbooks (with only 512MB of main RAM): the current requirement is 128MB of physical memory to house the HASH, while the rest, i.e. the TREES, can reside even on an HDD, though an SSD is at least ten times better.
I am gonna make it available as soon as the re-ripping finishes; had my computer had 64GB of RAM, I would have finished by now.

I just want to put on the table the weight of one freely downloadable English text corpus, namely 'enwiki-20120403-pages-articles.xml'.
http://dumps.wikimedia.org/enwiki/
As the name suggests, it is a compilation of all English Wikipedia articles; the file is 37,430,769,961 bytes long.

Some months ago I ripped it and for 3-grams the stats are:

Total memory needed for one pass: 52,701,578KB
Total distinct phrases: 625,323,984
Total time: 51784 second(s)
Total performance: 61,038P/s i.e. phrases per second

The outcome is:
enwiki-20120403-pages-articles.3.sorted 19,354,345,361 bytes

Those 19,354,345,361 bytes are the pure/raw data; they require a 52,701,578KB HASH+TREES structure for fast searching.
Through practical rips like this one, it becomes clear what the hardware requirements are for dealing with a real-world text corpus.
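To make 'ripping' concrete at toy scale, here is a minimal Python sketch that counts word 3-grams and dumps them in the 'count phrase' layout seen in the .sorted files. It is not the author's ripper: the tokenizer (lowercase, split on anything that is not an ASCII letter or digit, which would explain x-grams such as 'don_t_have') and all function names are assumptions, and an in-memory Counter stands in for the HASH+TREES structure that lets the real tool handle tens of gigabytes.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase, runs of ASCII letters/digits are words,
# so "don't" becomes "don" + "t" (compare 'don_t_have' in the dumps).
TOKEN = re.compile(r"[a-z0-9]+")


def rip_xgrams(text: str, n: int = 3) -> Counter:
    """Count word n-grams, joining the words with '_' as in the .sorted files."""
    words = TOKEN.findall(text.lower())
    return Counter("_".join(words[i:i + n]) for i in range(len(words) - n + 1))


def format_count(count: int) -> str:
    """Zero-pad to 7 digits and group by thousands: 145554 -> '0,145,554'."""
    digits = f"{count:07d}"
    groups = []
    while digits:
        groups.insert(0, digits[-3:])
        digits = digits[:-3]
    return ",".join(groups)


def dump_sorted(counts: Counter) -> list[str]:
    """Lines of 'count phrase', most frequent first, ties broken alphabetically."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{format_count(c)} {phrase}" for phrase, c in ordered]


if __name__ == "__main__":
    sample = "I don't have a clue, and so on and so on."
    for line in dump_sorted(rip_xgrams(sample)):
        print(line)
```

The tie-breaking order is an arbitrary choice here; the dumps in this thread list equal counts in a different order.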

Having removed the 'occurrences' field we have 19,354,345,361 - (625,323,984*10) = 13,101,105,521 bytes of raw x-grams.
Now I need the signal-to-noise ratio, here the ratio of raw x-grams to search structure, which
is 13,101,105,521 bytes/52,701,578KB, i.e. 12,794,048KB/52,701,578KB = 24% (the bigger, the better).
For me a 1:4 ratio is a good one, especially since with ten times as many x-grams (just take a seat) we would face a 500GB search structure.
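The arithmetic above can be re-checked in a few lines of Python. The figures are the ones quoted in this post; the 10-byte count field ('0,145,554' plus one separator) is inferred from the '*10' in the subtraction, and KB is taken as 1,024 bytes.

```python
# Figures quoted above for the enwiki-20120403 3-gram rip.
sorted_file_bytes = 19_354_345_361   # enwiki-20120403-pages-articles.3.sorted
distinct_phrases = 625_323_984
count_field_bytes = 10               # assumed: "0,145,554" + 1 separator per line
structure_kb = 52_701_578            # HASH+TREES size, in KB

raw_bytes = sorted_file_bytes - distinct_phrases * count_field_bytes
raw_kb = raw_bytes // 1024           # KB taken as 1,024 bytes
ratio = raw_kb / structure_kb        # raw x-grams : search structure

print(f"raw x-grams: {raw_bytes:,} bytes = {raw_kb:,} KB")
print(f"raw : structure = {ratio:.0%}")
```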

elfin adj.
1a. Relating to or suggestive of an elf.
1b. Made, done, or produced by an elf.
2. Small and sprightly or mischievous.
3. Having a magical quality or charm; fairylike: moved across the dimly lit stage with elfin grace.

/HERITAGE/

The word that caught my eye was 'sprightly':
'adj. Full of spirit and vitality; lively; brisk. adv. In a lively, animated manner.'
It is used within the 5-gram 'small_and_sprightly_or_mischievous'; the 3-gram I want to have in my armament is 'small_and_sprightly', which fits the description of an elf well.
The 9-gram 'moved_across_the_dimly_lit_stage_with_elfin_grace' is also yummy; the 3-gram I want to enrich my 3-gram corpus with is 'with_elfin_grace'.

My point: a corpus lacking the must-have 3-grams 'small_and_sprightly' and 'with_elfin_grace' is a crippled one, and the same goes for all those nifty x-grams floating around in books/magazines/newspapers that have not been grabbed/ripped yet - this is the source of my avariciousness/greediness - I want them all.

Two new words entered my realm:

elfin, adjective & noun.
A. adjective.
1. Of, pertaining to, or produced by an elf or elves; of the nature of an elf. L16.
2. Diminutive; full of strange charm; suggestive of an elf. L18.
B. noun.
1. An elf. L16.
2. The land or realm of the elves. Scot. L16.
3. A child. M18.

/SOED/

Oh, as much as I cherish SOED, I can't help saying that here HERITAGE outdefines SOED with its second and third definitions ... definitely.
I was caught in the "mainstream" trap of assuming that the proper adjective(s) must look like elf-Y and/or elf-IC and/or elf-OUS and/or elf-ISH; that's why looking up dictionaries is always a have-to (or must-do) thing.

Now, who can derive the adverb(s)!

The second word popped up by itself while I was looking at ENJOY: its beautiful counterpart ENFUN appeared uninvited. I hope I have not plagiarized it.
Very interesting EN-prefixed verbs are out there, and many, many more 'uninvited' ones are waiting in the wings.

Sanmayce
Posted: Monday, December 31, 2012 9:48:46 AM

Two things: to state the stats of the current phrase corpora I am wrestling with, and to say thanks to the compression specialists behind my tools.

ENWIKI's 1-grams: enwiki-20120403-pages-articles.1.sorted 271,221,937 bytes
ENWIKI's 2-grams: enwiki-20120403-pages-articles.2.sorted 4,861,307,858 bytes
ENWIKI's 3-grams: enwiki-20120403-pages-articles.3.sorted 19,354,345,361 bytes

GAMERA's 1-grams: _Gamera.tar.1.sorted 207,307,606 bytes
GAMERA's 2-grams: _Gamera.tar.2.sorted 3,267,897,913 bytes
GAMERA's 3-grams: _Gamera.tar.3.sorted 14,630,478,827 bytes

4andaboveenwiki-20120403-pages-articles.1.sorted (out of 12,475,645 distinct 1-grams) 84,222,463 bytes:
x-grams left after removing all x-grams with 1&2&3 / 1&2 / 1 occurrence(s):
4,014,713 / 4,888,917 / 6,899,422

4andaboveenwiki-20120403-pages-articles.2.sorted (out of 187,975,215 distinct 2-grams) 915,914,243 bytes:
x-grams left after removing all x-grams with 1&2&3 / 1&2 / 1 occurrence(s):
36,382,919 / 47,924,490 / 75,191,525

4andaboveenwiki-20120403-pages-articles.3.sorted (out of 625,323,984 distinct 3-grams) 1,972,732,692 bytes:
x-grams left after removing all x-grams with 1&2&3 / 1&2 / 1 occurrence(s):
65,903,363 / 94,439,908 / 171,730,429

4andabove_Gamera.tar.1.sorted (out of 9,181,275 distinct 1-grams) 56,502,419 bytes:
x-grams left after removing all x-grams with 1&2&3 / 1&2 / 1 occurrence(s):
2,715,302 / 3,311,284 / 4,744,816

4andabove_Gamera.tar.2.sorted (out of 124,669,942 distinct 2-grams) 889,537,624 bytes:
x-grams left after removing all x-grams with 1&2&3 / 1&2 / 1 occurrence(s):
35,116,064 / 45,817,353 / 69,801,969

4andabove_Gamera.tar.3.sorted (out of 477,829,381 distinct 3-grams) 2,938,594,566 bytes:
x-grams left after removing all x-grams with 1&2&3 / 1&2 / 1 occurrence(s):
100,088,208 / 143,871,600 / 243,752,159
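The '4andabove' files keep only the x-grams seen at least 4 times. A minimal Python sketch of that screening over 'count phrase' lines follows; it assumes the count is the first whitespace-separated field, written with thousands commas as in the dumps in this thread, and the function names are hypothetical.

```python
def count_of(line: str) -> int:
    """Parse the leading occurrence field, e.g. '0,145,554' -> 145554."""
    field = line.split(None, 1)[0]        # count and phrase separated by whitespace
    return int(field.replace(",", ""))


def keep_at_least(lines, min_count: int):
    """Yield only the lines whose occurrence count is >= min_count."""
    for line in lines:
        if count_of(line) >= min_count:
            yield line


def survivors(lines) -> dict:
    """One streaming pass: x-grams left after dropping counts 1&2&3 / 1&2 / 1."""
    kept = {"1&2&3": 0, "1&2": 0, "1": 0}
    for line in lines:
        c = count_of(line)
        kept["1"] += c >= 2               # drop hapaxes only
        kept["1&2"] += c >= 3             # drop counts 1 and 2
        kept["1&2&3"] += c >= 4           # what a '4andabove' file keeps
    return kept
```

Since both functions consume the lines lazily, they can be fed straight from an open file (stripping the newline from each line) without loading gigabytes into RAM.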

And a console dump to show how superior ST order 5 (with preceding contexts for sorting) is:

E:\_Gamera_r15_12348>dir 4*

12/26/2012 02:52 AM 84,222,463 4andaboveenwiki-20120403-pages-articles.1.sorted
12/26/2012 02:52 AM 915,914,243 4andaboveenwiki-20120403-pages-articles.2.sorted
12/26/2012 02:41 AM 1,972,732,692 4andaboveenwiki-20120403-pages-articles.3.sorted
12/25/2012 02:13 AM 56,502,419 4andabove_Gamera.tar.1.sorted
12/25/2012 02:19 AM 889,537,624 4andabove_Gamera.tar.2.sorted
12/25/2012 03:18 AM 2,938,594,566 4andabove_Gamera.tar.3.sorted

E:\_Gamera_r15_12348>6_Graffithize_x-leton_MAX.bat
Graffith(graphite), Text decompressor-finder-dumper, r.02++_Graphein, written by Kaze.
Graffith is a wrapper over bsc version 2.3.0, written by Ilya Grebnov.

4andaboveenwiki-20120403-pages-articles.1.sorted compressed 84222463 into 11033849 in 9.063 seconds. !!! 7.6:1 !!!

Graffith(graphite), Text decompressor-finder-dumper, r.02++_Graphein, written by Kaze.
Graffith is a wrapper over bsc version 2.3.0, written by Ilya Grebnov.

4andaboveenwiki-20120403-pages-articles.2.sorted compressed 915914243 into 87978461 in 82.547 seconds. !!! 10.4:1 !!!

Graffith(graphite), Text decompressor-finder-dumper, r.02++_Graphein, written by Kaze.
Graffith is a wrapper over bsc version 2.3.0, written by Ilya Grebnov.

4andaboveenwiki-20120403-pages-articles.3.sorted compressed 1972732692 into 192143780 in 194.281 seconds. !!! 10.2:1 !!!

Graffith(graphite), Text decompressor-finder-dumper, r.02++_Graphein, written by Kaze.
Graffith is a wrapper over bsc version 2.3.0, written by Ilya Grebnov.

4andabove_Gamera.tar.1.sorted compressed 56502419 into 6870598 in 6.297 seconds. !!! 8.2:1 !!!

Graffith(graphite), Text decompressor-finder-dumper, r.02++_Graphein, written by Kaze.
Graffith is a wrapper over bsc version 2.3.0, written by Ilya Grebnov.

4andabove_Gamera.tar.2.sorted compressed 889537624 into 79620218 in 84.250 seconds. !!! 11.1:1 !!!

Graffith(graphite), Text decompressor-finder-dumper, r.02++_Graphein, written by Kaze.
Graffith is a wrapper over bsc version 2.3.0, written by Ilya Grebnov.

4andabove_Gamera.tar.3.sorted compressed 2938594566 into 255637638 in 272.703 seconds. !!! 11.4:1 !!!

E:\_Gamera_r15_12348>dir 4and*.bsc

12/26/2012 03:11 AM 11,033,849 4andaboveenwiki-20120403-pages-articles.1.sorted.bsc
12/26/2012 03:12 AM 87,978,461 4andaboveenwiki-20120403-pages-articles.2.sorted.bsc
12/26/2012 03:15 AM 192,143,780 4andaboveenwiki-20120403-pages-articles.3.sorted.bsc
12/26/2012 03:16 AM 6,870,598 4andabove_Gamera.tar.1.sorted.bsc
12/26/2012 03:17 AM 79,620,218 4andabove_Gamera.tar.2.sorted.bsc
12/26/2012 03:22 AM 255,637,638 4andabove_Gamera.tar.3.sorted.bsc

The batch file '6_Graffithize_x-leton_MAX.bat' executes 'GRAFFITH_r2++_Graphein_2.3.0_Intel_12.1_32bit.exe e %1 %1.bsc -m2b256 -cp -Tt', which is ST order 5.

To compress x-gram files like:
...
0,145,554 could_not_be
0,145,299 that_i_am
0,145,066 who_had_been
0,144,961 the_idea_of
0,144,692 the_spirit_of
0,144,616 and_so_on
0,144,608 to_make_the
0,144,178 that_you_have
0,143,155 npnf_cache_npnf
0,142,964 that_she_was
0,142,741 as_shown_in
0,142,520 have_to_be
0,142,210 in_terms_of
0,141,397 should_not_be
0,141,328 as_a_result
0,141,105 the_result_of
0,140,881 if_he_had
0,140,679 they_are_not
0,140,673 a_matter_of
0,140,500 on_account_of
0,140,493 that_i_have
0,140,489 a_long_time
0,140,197 which_is_the
0,139,954 it_can_be
0,139,700 don_t_have
0,139,688 do_you_think
0,139,627 was_one_of
...
0,034,253 located_in_the
0,034,235 in_the_sky
0,034,223 it_was_just
0,034,210 of_the_many
0,034,198 at_the_present
0,034,177 they_didn_t
0,034,175 should_be_considered
0,034,168 he_was_an
0,034,163 considered_to_be
0,034,149 the_following_code
0,034,147 whole_of_the
0,034,135 voice_of_the
0,034,134 be_associated_with
0,034,119 on_his_face
0,034,117 at_the_age
...
0,019,815 maximum_number_of
0,019,813 experience_of_the
0,019,808 devoted_to_the
0,019,807 as_the_sun
0,019,806 and_the_son
0,019,803 leading_the_way
0,019,801 the_interpretation_of
0,019,798 in_the_sea
0,019,794 the_people_to
0,019,789 without_a_word
0,019,788 and_his_own
0,019,786 there_is_none
...
0,016,300 of_the_boat
0,016,300 as_we_shall
0,016,299 and_vice_versa
0,016,298 testing_and_certification
0,016,298 of_the_imperial
0,016,292 the_city_was
0,016,292 and_they_all
0,016,291 on_the_occasion
...
0,011,294 until_it_is
0,011,294 at_length_he
0,011,292 when_i_think
0,011,292 of_new_england
0,011,292 notify_the_webmaster
0,011,292 in_the_more
0,011,292 and_who_are
0,011,292 american_academy_of
0,011,291 t_know_whether
...
0,009,794 the_very_moment
0,009,794 is_to_take
0,009,794 i_had_taken
0,009,793 well_in_the
0,009,793 that_there_may
0,009,793 distinguished_from_the
0,009,793 by_the_sword
...
0,007,614 have_recourse_to
0,007,614 due_to_their
0,007,614 couple_of_days
0,007,613 the_same_or
0,007,613 fall_to_the
0,007,613 at_the_commencement
0,007,612 transformed_into_a
...
0,003,609 prominent_in_the
0,003,609 other_than_individual
0,003,609 of_her_old
0,003,609 not_obliged_to
0,003,609 not_let_her
0,003,609 never_dreamed_of
0,003,609 in_the_discharge
0,003,609 in_his_works
0,003,609 i_came_home
0,003,609 heart_in_the
0,003,609 foundation_of_all
0,003,609 dwells_in_the
0,003,609 do_this_with
0,003,609 deprived_of_their
...

at an 11:1 ratio is exactly what I need - a superb phrase compressor, and a very, very fast one too; in one word: a Phrase-Grinder.

Many thanks go to the author of ST, Michael Schindler; to the author of libdivsufsort, Yuta Mori; and to the author of BSC, Ilya Grebnov.

The niftiness of ST lies in:
First step: a limited order Sort Transform. This transformation is related to the Burrows-Wheeler transformation used in blocksorting compression methods.
Second step: a probability model for blocksorted files.
The last step: an entropy encoding using a range coder.
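For the curious, the first step (the limited-order Sort Transform) can be illustrated in a few lines of Python. This is a naive forward transform written for clarity rather than speed, not bsc's or Schindler's implementation: each position is keyed by the next `order` characters (following contexts, as in the classic BWT), equal keys keep their original order, and the preceding character is emitted. bsc's '-cp' switch sorts by preceding contexts instead, which is roughly the same transform applied to the reversed text. The inverse transform, the probability model and the range coder are not shown.

```python
def sort_transform(text: str, order: int) -> str:
    """Naive forward limited-order Sort Transform, for illustration only.

    Each position i is keyed by the `order` characters that follow it
    (cyclically); positions with equal keys keep their original order thanks
    to Python's stable sort. The output is the character preceding each
    position, like the last column of the Burrows-Wheeler transform.
    """
    n = len(text)
    order = min(order, n)          # a context longer than the text adds nothing
    doubled = text + text          # lets doubled[i:i+order] wrap around the end
    positions = sorted(range(n), key=lambda i: doubled[i:i + order])
    return "".join(text[i - 1] for i in positions)  # text[-1] when i == 0
```

With `order` equal to the text length the output is exactly the BWT's last column ('banana' gives 'nnbaaa'); `sort_transform(text, 5)` corresponds to the order 5 used in the batch file above. Limiting the order is what makes ST fast: only a few characters of context need sorting, and the grouped preceding characters still compress well.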

Since I am into factorizing (phrase-ripping) the entire written English language, a tool like theirs is a must-have, or rather a must-use - TeraBytes/10 - mutsi.
AFAIK there is nothing even close to their work (enabling superb English text compression); I often think of their Phrase-Grinder simply as 'THE SMASHER'.

Thanks to their work, the next revision of Masakari (~605MB) will offer 1-gram, 2-gram and 3-gram phrase-checking (a new path was taken: external binary search, a well-balanced size-speed approach) against two corpora: the Gamera revision 15 phrase corpus and the enwiki-20120403-pages-articles phrase corpus.
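External binary search keeps the sorted x-grams on the drive and reads only about log2(n) records per query. Below is a minimal Python sketch assuming a hypothetical layout of sorted, space-padded, fixed-width records (RECORD_SIZE = 32 bytes is an arbitrary choice); Masakari's actual on-disk format is not described here.

```python
import io
from typing import BinaryIO

RECORD_SIZE = 32  # hypothetical fixed record width, newline included


def make_record(phrase: str) -> bytes:
    """Pad a phrase with spaces to the fixed width (illustrative layout)."""
    raw = phrase.encode("utf-8")
    if len(raw) > RECORD_SIZE - 1:
        raise ValueError("phrase too long for the fixed record width")
    return raw.ljust(RECORD_SIZE - 1, b" ") + b"\n"


def contains(f: BinaryIO, phrase: str) -> bool:
    """Binary search a file of sorted fixed-width records without loading it.

    Only O(log n) records are read, each after one seek, so the file can be
    far larger than RAM.
    """
    target = make_record(phrase)
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell() // RECORD_SIZE   # search the record range [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * RECORD_SIZE)
        record = f.read(RECORD_SIZE)
        if record == target:
            return True
        if record < target:
            lo = mid + 1
        else:
            hi = mid
    return False
```

The file must be built by sorting the padded records themselves (e.g. `b"".join(sorted(make_record(p) for p in phrases))`), so that the byte order on disk matches the comparisons in `contains`. Fixed-width records are the design choice that makes the middle of any range addressable by a single seek.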

I was hit hard by a song last night.
Gone Lyrics
written and performed by Ioanna Gika
from 'Snow White and The Huntsman' Soundtrack

Dark the stars and dark the moon,
Hush the night and the morning loon,
Tell the horses and beat on your drum,
Gone their master, gone their son,

Dark the oceans, dark the sky,
Hush the whales and the ocean tide,
Tell the salt marsh and beat on your drum,
Gone their master, gone their son,

Dark to light and light to dark,
Three black carriages, three white carts,
What brings us together is what pulls us apart,
Gone our brother, gone our heart.
...


Songs like this make me feel how magical life is. Be lifeful.

Sanmayce
Posted: Thursday, January 03, 2013 11:24:45 AM

First steps into x-gramming with Masakari revision 5

The goal: to offer a free open-source console (command prompt) English phrase spell-checker.
Audience: everyone under the sun.
Functionality: lets you get stats on how your 1-word/2-word/3-word phrases relate to two huge text corpora: Gamera & Wikipedia.
Minuses: raw - a far cry from what a sensitive spell-checker must do.
Pluses: raw - beautiful in its simplicity.

Quick 'getting started' walkthrough:
Step1: Download the 618MB package from Masakari's homepage.
Step2: Extract it to the drive/path of your choice; you are gonna need some 20GB (after the extraction in step 3 this is reduced to 7GB).
Step3: Go to the folder '?:\Masakari_revision5\_Gamera_r15_3-grams' and double-click the batch file 'CREATE_BINARY_CHUNKS.BAT'; after about an hour of decompressing, the needed files are ready.
Step4: Copy your TXT (textual) files/folders to the '?:\Masakari_revision5\_Gamera_r15_3-grams\Your_textual_folders' folder.
Step5: Now double-click 'RUNME_2x3.BAT', located in '?:\Masakari_revision5\_Gamera_r15_3-grams', and after a few seconds your results are gonna be loaded into NOTEPAD.

Note1: The good part of the current revision 5 is its extremely small hardware footprint/requirements - it doesn't allocate any RAM (neither virtual nor physical) and it doesn't stress the CPU.
Note2: Due to the many random drive reads (seek operations), SSDs are a far better choice; HDDs are much slower.

The article benchmark: Japan's K Computer.pdf.txt (6,683 bytes)
Finished in 7 seconds.
The article, along with its appendix (1/2/3-gram stats, first for Gamera, then for ENWIKI), is here.
A screenshot of the article:



A screenshot of the appendix:



The book benchmark: 'Thus Spake Zarathustra' by Friedrich Nietzsche, revision 4.txt (518,719 bytes)
Finished in 169 seconds (36:12 - 33:23).

The book-pack benchmark: OSHO.TXT (206,908,949 bytes)
Finished in 2458 seconds (54:20 - 13:22).

Because the G&W (Gamera & Wikipedia) x-grams are divided into chunks of fixed length (thus drive-cache friendly), locality came into play, guns blazing:
206,908,949:518,719 >> 2458:169, i.e. 398 >> 14 (an input ~399 times bigger took only ~14.5 times longer), hence we have sub-linear (in terms of size vs time) behavior.
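A two-point power-law fit turns the two benchmarks into a single number: if time grows as size^k, then k = log(time ratio) / log(size ratio), and k < 1 means sub-linear. This is only a rough estimate from two runs; drive-cache state, text content and OS noise all affect it.

```python
import math

# Figures from the two benchmarks above.
small_bytes, small_seconds = 518_719, 169        # Thus Spake Zarathustra
large_bytes, large_seconds = 206_908_949, 2458   # OSHO.TXT

size_ratio = large_bytes / small_bytes       # how much bigger the input is
time_ratio = large_seconds / small_seconds   # how much longer it took
# Assume time ~ size**k and solve for k from the two data points.
k = math.log(time_ratio) / math.log(size_ratio)
print(f"size x{size_ratio:.0f}, time x{time_ratio:.1f}, exponent k = {k:.2f}")
```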

Enfun!

Sanmayce
Posted: Tuesday, January 08, 2013 11:43:49 AM

Finally I did some research on the word chosen for this sub-project.

MASAKARI

まさかり

broadaxe

The masakari is mainly used to smooth down and finish timber. There are two types of masakari. One has a longer handle and is used mainly for lumbering work, while the other type with a shorter handle, also called the carpenters' broadax (daiku-masakari), is used for rough carpentry work or for making wedges from scraps of wood.

A carpenter's hewing ax.
This ax was an essential tool for rough hewing work and for smoothing logs to be used as pillars and beams.
It is still used by carpenters who work on shrines and temples.

Ono and Masakari as religious symbols

The animistic tradition from ancient times states that deities descend to and reside in the mountains. For lumbermen, the mountain was therefore a sacred territory which required strict ritual abstentions to be entered. The ax has been closely related with this religious revering of the mountain and its trees. For example, the first act amongst the myriad of Shinto rituals carried out before the lumbering for the rebuilding of the Ise Shrine every 20 years, is the cutting into a tree with a ritually purified ax (imi-ono). Moreover in the festival of the pillar (Onbashira-matsuri) at the Suwa shrine, a vermillion-lacquered ax is used to cut down a tree which is to become the sacred pillar.

In Buddhist symbolism the ax also acquires the power of cutting off evil, and there are numerous existing statues of bodhisattva holding axes. Shugen-do, a traditional Japanese religion born out of an amalgam of different religions including Shintoism and Buddhism which has a particular connection with mountains, regards the ax as one of the symbolic objects to be carried by practitioners when going into mountains for ascetic training.

The ax is also an important (heavenly/carpentry) instrument in Laoism; here are a few excerpts from pseudo-chapter 74 of the 'Dao De Jing':

Translation: Lin Yutang
And to take the place of the executioner
Is like handling the hatchet for the master carpenter.
He who handles the hatchet for the master carpenter
Seldom escapes injury to his hands.


Translation: Hua-Ching Ni
To become the executioner of artificial righteousness is like the inexperienced lad who would brandish a sharp axe of a master carpenter.
He can seldom escape cutting himself.


Translation: Witter Bynner
Nature is executioner.
When man usurps the place,
A carpenter's apprentice takes the place of the master:
And 'an apprentice hacking with the master's axe
May slice his own hand.'


Translation: Richard Wilhelm
There is always a power of death that kills.
To kill instead of leaving killing to this power of death
is as if one wanted to use the axe oneself
instead of leaving it to the carpenter.
Whosoever would use the axe
instead of leaving it to the carpenter
shall rarely get away
without injuring his hand.


KINTARO - From the folklore of Japan - Legendary symbol of virtue and strength

Kintaro is a beloved legendary and symbolic figure from Japan. Like many legendary figures he appears in both history and mythology. According to classic Japanese literature he was fathered by a great Red Dragon (the thunder god - see below) who visited his mountain sorceress mother in a dream. She awoke amidst powerful claps of thunder and knew at once she was with child. Kintaro means "Golden Boy" and his jealous uncle sought to kill him. His mother took him and fled into the Hakone mountains to the deepest forests of Mount Kintoki. Growing up deep in the forest his beautiful spirit caused him to become a special friend to all the wild animals, most especially the rabbits and the bears. He loved to play with his animal friends about the rocks of the Yuhi no Taki Falls. So strong was he as a boy and so gifted at Sumo wrestling that he could throw down a bear. He was a very good boy, rosy-cheeked and chubby, and always carried a hatchet, the Japanese symbol of the thunder god; he is usually depicted riding his beloved bear.

The full (3 pages) documentlet is here.


Sanmayce
Posted: Sunday, January 13, 2013 9:29:00 AM

I couldn't resist ripping the latest unigrams from enwiki-pages-articles (from 2012-Dec) and google-books (from 2012-Oct), and the result is one small (only 88MB) and alacrious word-wildcard-dumper: GRAFFITH_enwiki-pages-articles-2012_google-books-2012_unigrams.zip.

enwiki-20121201-pages-articles.xml 42,153,646,707 bytes
googlebooks-eng-all-1gram-20120701 24,971,498,516 bytes

E:\googlebooks-eng-all-1gram-20120701>dir googlebooks-eng-all-1gram-20120701-?

01/12/2013 03:33 PM 1,801,526,075 googlebooks-eng-all-1gram-20120701-a
01/12/2013 03:36 PM 1,268,392,934 googlebooks-eng-all-1gram-20120701-b
01/12/2013 03:40 PM 2,090,710,388 googlebooks-eng-all-1gram-20120701-c
01/12/2013 03:43 PM 1,252,213,884 googlebooks-eng-all-1gram-20120701-d
01/12/2013 03:45 PM 1,085,415,448 googlebooks-eng-all-1gram-20120701-e
01/12/2013 03:47 PM 959,470,924 googlebooks-eng-all-1gram-20120701-f
01/12/2013 03:49 PM 823,166,881 googlebooks-eng-all-1gram-20120701-g
01/12/2013 03:51 PM 948,615,440 googlebooks-eng-all-1gram-20120701-h
01/12/2013 04:58 PM 1,093,823,911 googlebooks-eng-all-1gram-20120701-i
01/12/2013 03:52 PM 327,435,021 googlebooks-eng-all-1gram-20120701-j
01/12/2013 03:54 PM 547,335,615 googlebooks-eng-all-1gram-20120701-k
01/12/2013 03:56 PM 959,686,094 googlebooks-eng-all-1gram-20120701-l
01/12/2013 03:59 PM 1,501,649,198 googlebooks-eng-all-1gram-20120701-m
01/12/2013 04:00 PM 730,118,203 googlebooks-eng-all-1gram-20120701-n
01/12/2013 04:02 PM 732,438,650 googlebooks-eng-all-1gram-20120701-o
01/12/2013 04:06 PM 1,900,898,850 googlebooks-eng-all-1gram-20120701-p
01/12/2013 04:06 PM 136,197,316 googlebooks-eng-all-1gram-20120701-q
01/12/2013 04:09 PM 1,137,454,640 googlebooks-eng-all-1gram-20120701-r
01/12/2013 04:13 PM 2,316,331,839 googlebooks-eng-all-1gram-20120701-s
01/12/2013 04:16 PM 1,383,305,366 googlebooks-eng-all-1gram-20120701-t
01/12/2013 04:17 PM 466,747,550 googlebooks-eng-all-1gram-20120701-u
01/12/2013 04:18 PM 560,636,053 googlebooks-eng-all-1gram-20120701-v
01/12/2013 04:20 PM 612,797,788 googlebooks-eng-all-1gram-20120701-w
01/12/2013 04:20 PM 70,121,491 googlebooks-eng-all-1gram-20120701-x
01/12/2013 04:20 PM 129,575,526 googlebooks-eng-all-1gram-20120701-y
01/12/2013 04:21 PM 135,433,431 googlebooks-eng-all-1gram-20120701-z

The unigrams are in these two files:
enwiki-20121201-pages-articles.xml_28803331_words_sorted-by-1st-field.txt 590,469,840 bytes (28,803,331 distinct words)
googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt 103,845,200 bytes (5,038,456 distinct words)

E:\>type enwiki-20121201-pages-articles.xml_28803331_words_sorted-by-1st-field.txt|more
...
0,099,630 siege
0,099,627 assume
0,099,624 commissioner
0,099,585 photography
0,099,583 warfare
0,099,579 supposed
0,099,577 decades
0,099,556 senators
0,099,553 cameron
0,099,546 certification
0,099,458 cv
0,099,444 rhode
0,099,382 parallel
0,099,361 oakland
0,099,346 attached
0,099,345 mit
0,099,312 harvey
0,099,238 dawn
0,099,209 principles
...

E:\>type googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt|more
...
0,027,275 iy
0,027,263 ist
0,027,246 received
0,027,236 job
0,027,221 classes
0,027,216 standards
0,027,204 room
0,027,189 ji
0,027,185 mod
0,027,168 rn
0,027,156 gold
0,027,146 letters
0,027,117 israel
0,027,111 length
0,027,109 sn
0,027,097 environment
0,027,091 citizens
0,027,080 mc
0,027,050 val
0,027,042 gef
...


For some reason enwiki now features 28,803,331 words, while a previous version had just under 13 million! Does anyone know the reason for such a big discrepancy? Maybe they started to phoneticize, and all those extra 15 million are simply transcription debris!

Anyway, these two freely downloadable resources are big and useful, and it is good to have their building blocks (here only the single words) at your fingertips.

And here is one example showing how I found an additional adjective by looking first in this word-dumper instead of in dictionaries:

Input your pattern(s):
* alacri*
^Z

Note: '^Z' stands for the Ctrl+Z keystroke, and the space between the first asterisk and 'alacri' stands for the Tab keystroke.
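Assuming the word files hold one 'count<Tab>word' line per word (which the Tab in the pattern suggests), Python's standard fnmatch module gives a minimal sketch of such a word-wildcard-dumper. The function name is hypothetical, and Graffith's actual matcher and file layout may differ.

```python
from fnmatch import fnmatchcase


def wildcard_dump(lines, pattern: str) -> list[str]:
    """Return the 'count<TAB>word' lines matching a shell-style pattern.

    Matching is case-sensitive and covers the whole line, so the pattern
    '*\\talacri*' means: any count, a Tab, then a word starting with 'alacri'.
    """
    return [line for line in lines if fnmatchcase(line, pattern)]
```

Usage, mirroring the query above: `wildcard_dump(lines, "*\talacri*")`, where `lines` are the file's lines with newlines stripped. Since the files are already sorted by descending count, the matches come out most frequent first.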

The result is autoloaded in NOTEPAD:
0,000,598 alacrity
0,000,089 alacris
0,000,023 alacritech
0,000,023 alacrite
0,000,014 alacrimia
0,000,010 alacrima
0,000,008 alacrityawareness
0,000,006 alacriportana
0,000,006 alacribus
0,000,006 alacria
0,000,004 alacrina
0,000,003 alacrium
0,000,003 alacritude
0,000,002 alacritus
0,000,002 alacritous
0,000,002 alacritate
0,000,002 alacritas
0,000,002 alacritaconsulting
0,000,002 alacrita
0,000,002 alacrimosus
0,000,001 alacritysim
0,000,001 alacrious
0,000,001 alacrinis
0,000,001 alacricity
0,000,001 alacri
0,002,466 alacrity
0,000,546 alacritas
0,000,543 alacris
0,000,504 alacri
0,000,503 alacriter
0,000,478 alacritate
0,000,344 alacritie
0,000,312 alacrius
0,000,305 alacritous
0,000,266 alacritatem
0,000,241 alacrima
0,000,239 alacritv
0,000,201 alacrities
0,000,199 alacriores
0,000,175 alacrior
0,000,160 alacriorem
0,000,126 alacrit
0,000,118 alacritatis
0,000,090 alacritously
0,000,086 alacriori
0,000,083 alacribus
0,000,080 alacrious
0,000,067 alacritye
0,000,067 alacriore
0,000,061 alacritati
0,000,035 alacrite
0,000,030 alacrimia
0,000,019 alacridad
0,000,009 alacritech

Used corpora (in that order):
enwiki-20121201-pages-articles.xml_28803331_words_sorted-by-1st-field.txt
googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt


From this dump I found that two adjectives are in use:
0,000,598 alacrity
...
0,000,002 alacritous
...
0,000,001 alacrious
...
0,002,466 alacrity
...
0,000,305 alacritous
...
0,000,080 alacrious
...


It seems that 'alacritous' outrates 'alacrious'.

Strangely, HERITAGE does not define 'alacrious'.
But SOED does:

alacrious, adjective.
E17–E18.
[from Latin alacris var. of alacer (see ALACRITY) + -OUS.]
Brisk, lively, active.
* †alacriously adverb: only in 17.

alacrity, noun.
LME.
[Latin alacritas, from alacr-, alacer brisk: see -ITY.]
Briskness, cheerful readiness, liveliness.
* alacritous adjective (rare) brisk, lively, active L19.
* alacritously adverb (rare) L19.


Once (1840-1860) 'alacrious' dominated, as you can see in the alacritous vs alacrious graph.

Thus, dictionary lookups can sometimes wait until the big picture is obtained; only then does one narrow down and screen the outcome.

Also, having read the article Bigger, Better Google Ngrams: Brace Yourself for the Power of Grammar by Mr. Zimmer, I see one major feature that every analyzer known to me lacks, namely word-clustering: how words are grouped within a sentence (or even across the surrounding sentences, which is harder to implement, though). This is far wider and fuzzier than collocations (x-grams, for that matter). In the future I intend to tread this path and lessen the damage.

This would answer many questions like dr koray's:
Which one of these words (uprising, rebellion and riot) can be used for an incident in a prison that is organized by convicts?
For the words uprising, rebellion and riot vs prison and convicts, a word-clustering distance would say a lot, that is, how far apart e.g. 'riot' and 'prison' are. Very, very interesting stuff in my eyes.
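As a crude first step towards such a word-clustering distance, one could measure how many words apart two words occur. The sketch below is hypothetical, not an existing Masakari feature: it finds the minimum token distance within one text using a simple assumed tokenizer; a real measure would aggregate over a whole corpus (e.g. averaging per sentence).

```python
import re
from typing import Optional

WORD = re.compile(r"[a-z0-9]+")  # assumed tokenizer, deliberately simple


def min_distance(text: str, a: str, b: str) -> Optional[int]:
    """Smallest number of token positions between any 'a' and any 'b'.

    Returns None if either word is missing. One left-to-right pass remembers
    the latest position of each word, so it runs in linear time.
    """
    last_a = last_b = None
    best = None
    for pos, word in enumerate(WORD.findall(text.lower())):
        if word == a:
            last_a = pos
        if word == b:
            last_b = pos
        if (word == a or word == b) and last_a is not None and last_b is not None:
            d = abs(last_a - last_b)
            if best is None or d < best:
                best = d
    return best
```

For example, in "The convicts started a riot in the prison yard" 'riot' and 'prison' are 3 positions apart; comparing such distances for riot/uprising/rebellion against prison/convicts over many texts is the kind of evidence the question above calls for.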


Sanmayce
Posted: Monday, January 21, 2013 10:04:07 AM

Fastest structureless fuzzy string searching for obtaining Levenshtein Distance?!
For the last two nights I was writing a long-awaited (by me at least) x-gram suggester; the time was well spent, since the three heuristics/optimizations I was lucky to put into the mix boosted the generic Wagner–Fischer algorithm thunderously, so:

Enter Galadriel...



"The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965."

E:\_Kaze_Levenshtein_Galadriel>dir/og/oe

01/21/2013 05:18 AM <DIR> Galadriel_logo
01/21/2013 05:27 AM 26 makeEXE.bat
01/21/2013 05:27 AM 322 TESTbigrams.bat
01/21/2013 05:27 AM 79,620,218 4andabove_Gamera.tar.2.sorted.bsc
01/21/2013 05:27 AM 26,593 Galadriel.c
01/21/2013 05:27 AM 61,305 Galadriel.cod
01/21/2013 05:27 AM 58,880 Galadriel.exe
01/21/2013 05:27 AM 598,528 GRAFFITH_r2++_Graphein_2.3.0_Intel_12.1_32bit.exe
01/21/2013 05:27 AM 1,566 KAZE prompt.lnk
01/21/2013 05:27 AM 722 README.txt
01/21/2013 05:27 AM 3,869,529 MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
01/21/2013 05:27 AM 53,460,640 googlebooks-eng-all-1gram-20120701_5038456_words.wrd

E:\_Kaze_Levenshtein_Galadriel>Galadriel.exe 3 psychedlicize MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Galadriel, an x-gram suggesteress using Wagner-Fischer Levenshtein Distance, revision 1+++, copyleft Sanmayce 2013-Jan-21.
Galadriel: Total/Checked/Dumped xgrams: 316,423/175,492/2
Galadriel: Performance: 316,423 xgrams/s

E:\_Kaze_Levenshtein_Galadriel>type Galadriel.txt
psychedelicize
psychedelicized


E:\_Kaze_Levenshtein_Galadriel>Galadriel.exe 3 psychedlicize googlebooks-eng-all-1gram-20120701_5038456_words.wrd
Galadriel, an x-gram suggesteress using Wagner-Fischer Levenshtein Distance, revision 1+++, copyleft Sanmayce 2013-Jan-21.
Galadriel: Total/Checked/Dumped xgrams: 5,038,456/1,537,152/4
Galadriel: Performance: 5,038,456 xgrams/s

E:\_Kaze_Levenshtein_Galadriel>type Galadriel.txt
psychedelicism
psychedlic
psychedelicize
psychedelicized


E:\_Kaze_Levenshtein_Galadriel>

Having failed to screen out the misspelled 'psychedlic', they (Google) have much work to do; I have even more.

Galadriel is a must-have tool, especially when searching against phrases - very useful results appear with a Levenshtein threshold well above 1.
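The generic Wagner-Fischer algorithm fills a dynamic-programming table row by row. Galadriel's three heuristics are not spelled out here, so the Python sketch below uses two common, well-known cutoffs instead (a length-difference check, and giving up once a whole row exceeds the threshold); it illustrates the technique and is not Galadriel's code. `suggest` mimics the 'Galadriel.exe 3 word wordlist' usage shown above.

```python
from typing import Iterable, List, Optional


def levenshtein_within(a: str, b: str, max_dist: int) -> Optional[int]:
    """Wagner-Fischer Levenshtein distance, or None if it exceeds max_dist."""
    # Cutoff 1: at least |len(a) - len(b)| insertions/deletions are needed.
    if abs(len(a) - len(b)) > max_dist:
        return None
    prev = list(range(len(b) + 1))        # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cur[j] = min(prev[j] + 1,               # deletion
                         cur[j - 1] + 1,            # insertion
                         prev[j - 1] + (ca != cb))  # substitution or match
        # Cutoff 2: row minima never decrease, so stop once all exceed max_dist.
        if min(cur) > max_dist:
            return None
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else None


def suggest(query: str, words: Iterable[str], max_dist: int) -> List[str]:
    """Every word within max_dist edits of query, in wordlist order."""
    return [w for w in words if levenshtein_within(query, w, max_dist) is not None]
```

Only two rows are kept in memory, so the cost per word is O(len(a) * len(b)) time and O(len(b)) space, and the cutoffs let most non-matching words be rejected early.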
The package (110MB) with all the above examples reproduceable is here.

Its handiness, when you need a fuzzy search that doesn't care exactly what you type, showed e.g. with the word 'reproduceable' above.
Not knowing the exact word, I just wrote 'reproducable' and, voila, not one but two adjectives popped up:

E:\_Kaze_Levenshtein_Galadriel>Galadriel.exe 3 reproducable MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd
Galadriel, an x-gram suggesteress using Wagner-Fischer Levenshtein Distance, revision 1++, copyleft Sanmayce 2013-Jan-19.
Galadriel: Total/Checked/Dumped xgrams: 316,423/237,439/15
Galadriel: Performance: 316,423 xgrams/s

E:\_Kaze_Levenshtein_Galadriel>type Galadriel.txt
improducible
irreproducible
produceable
producible
reproachable
reproduce
reproduceable
reproducible
reproducibles
reproducibly
reproductive
reprovable
unproduceable
unproducible
unreproducible

E:\_Kaze_Levenshtein_Galadriel>

Of course, Galadriel shall reinforce the 1-gram/2-gram/3-gram suggestions in MASAKARI revision 6 for all unknown x-grams.

Enjoy!

Sanmayce
Posted: Saturday, August 10, 2013 1:56:03 PM

Two basic ideas I have been trying to mix into one - Pre/In/Post BINDING-WORDS and pagodas.

An example for the first:
www.sanmayce.com/Pre-In-Post_BINDING-WORD_'on'_Heritage_344-levels.txt.html

An example for the second:
The old goal (currently pursued by the sub-project 'Pagoda' aka 'Shin-Bashira' 真柱) is to create an exhaustive repository for all major English words.
Simply put, each pagoda is to hold the usage of a single word throughout/across the whole of written English.
The cutest thing about pagodas is that they contain precomputed x-grams under one roof, thus eliminating the need for mumbo-jumbo algorithms and computational resources altogether.
Of course, the price is 2-9 terabytes of drive (external memory) space, huge by today's standards.
Did your knees go soft? If so, see what amount of steel (some 40,000 tons of super-reinforced steel) was needed to construct the tallest tower in the world, 'Tokyo Skytree'.

"A central feature of the tower, which opened to the public May 22, is a system to control swaying. The technology, used for the first time, has been dubbed "shinbashira" after the central pillar found in traditional five-story pagodas. The 375-meter-long, steel-reinforced concrete shinbashira is not directly connected to the tower itself and is designed to cancel out the swaying of the needle-like tower during an earthquake."



The above is not a side-by-side comparison of the size of the Tokyo Sky Tree and some ancient gigantic pagoda.
http://www.japanflix.com/the-tokyo-sky-trees-secret-weapon-against-earthquakes/



"A sturdy pillar runs from bottom to top through the central axis of the pagoda. All three- and five-storied pagodas in Japan have such a pillar, called a shinbashira. Interestingly, very few similar wooden towers still standing in China or the Republic of Korea have one. It is thought to be a distinctive feature of old wooden towers in Japan."
http://web-japan.org/nipponia/nipponia33/en/topic/index02.html

This is what a shrunken pagoda looks like:



As you can see here, the 'ON' pagoda has 9,628,427 stories: 344 versus 9,628,427, a scary build-up.
As I see it, the movement is two-way: from well-established resources (such as those 344 'HERITAGE' phrases) to huge phrase corpora (such as GAMERA's x-grams), and vice versa.
Two processes, enriching and screening, repeated until they deliver RICH output.
Does anyone see this convergence more clearly? My sight is locked on the most primitive techniques.

I wonder how to present, e.g., the usage of 'ON' thoroughly and in a practical way. Any ideas?

Ha, while writing this I was listening to Tom Jones songs (my mom being a big fanESS), and a verse caught my attention:
You've got such a hold on me

Had we had, e.g., an order-9 'ON' pagoda, the golden 8-gram 'you_ve_got_such_a_hold_on_me' would have been there waiting to be looked up.
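A pagoda, as described above, is a precomputed table of the x-grams around one word. Here is a minimal Python sketch of building one from text, assuming that "order 9" means every n-gram of length 1 to 9 that contains the keyword, and that tokens are lowercased and joined with underscores as in 'you_ve_got_such_a_hold_on_me'. `tokenize` and `build_pagoda` are hypothetical names, not code from the 'Pagoda' sub-project.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on anything that is not a letter: "You've" -> ["you", "ve"].
    return re.findall(r"[a-z]+", text.lower())


def build_pagoda(tokens: list[str], keyword: str, max_order: int) -> Counter:
    """Count every n-gram (1 <= n <= max_order) that contains keyword."""
    stories: Counter = Counter()
    for n in range(1, max_order + 1):
        for start in range(len(tokens) - n + 1):
            gram = tokens[start:start + n]
            if keyword in gram:
                stories["_".join(gram)] += 1
    return stories


if __name__ == "__main__":
    pagoda = build_pagoda(tokenize("You've got such a hold on me"), "on", 9)
    print(pagoda["you_ve_got_such_a_hold_on_me"])  # 1
```

Built once over a whole corpus, each phrase lookup becomes a single table access; the cost is paid up front in storage, which is where the terabytes go.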


He learns not to learn and reverts to what all men pass by.
Sanmayce
Posted: Wednesday, September 04, 2013 1:20:00 PM

Rank: Advanced Member

Joined: 5/29/2012
Posts: 177
Neurons: 830
Long, long overdue, but at last here comes a Text Tool with a GUI (Graphical User Interface), or rather a simple SHELL around my latest console search tool.

So, enter Gallowwalker...

- a superfast Windows application to search your files for patterns using Exact/Wildcards/Fuzzy;
- an intuitive SIMPLE interface;
- 14 screenshots illustrating different search modes;
- 100% free, here.

Earlier this year, having written the 'search engine', I posted the 'walkthrough' and gave a glimpse into the kitchen. For more info: here.

This endeavor resulted in the first text searcher on the INTERNET that utilizes the I/O read bandwidth to its fullest!
For example, my humble laptop (Core 2, 2200MHz, 2 cores) is equipped with a Samsung 470 64GB SSD, which gives an average linear read of 241MB/s at 1MB blocks, as measured with 'Everest'.
From the Wikipedia torture test (a 40+GB file searched at 232MB/s) you can see how full my fullest is: "metal fatigue" was searched for in 'enwiki-20121201-pages-articles.xml' in 173 seconds, i.e. 173s × 232MB/s = 40,136MB read, a deviation of only (241-232)/232 × 100% ≈ 3.9% from the drive's measured linear read.
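Gallowwalker's search engine itself is not shown here, so the following is only a minimal Python sketch, under my own assumptions, of the general idea of streaming an arbitrarily large file in 1MB blocks (the block size of the 'Everest' measurement): sequential reads keep the drive busy while memory use stays constant. `count_occurrences` is a hypothetical name, and a pure-Python loop will not necessarily reach the 232MB/s quoted above.

```python
def count_occurrences(path: str, pattern: bytes, block_size: int = 1 << 20) -> int:
    """Count (possibly overlapping) occurrences of pattern in a file of any size.

    The file is read sequentially in fixed-size blocks. The last
    len(pattern) - 1 bytes are carried into the next block, so a match
    straddling a block boundary is found, and found exactly once.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    keep = len(pattern) - 1
    tail = b""
    total = 0
    with open(path, "rb") as f:
        while block := f.read(block_size):
            data = tail + block
            pos = data.find(pattern)
            while pos != -1:
                total += 1
                pos = data.find(pattern, pos + 1)
            # The carried tail is shorter than the pattern, so no match
            # lying entirely inside it can be counted twice.
            tail = data[-keep:] if keep else b""
    return total
```

E.g. `count_occurrences("enwiki-20121201-pages-articles.xml", b"metal fatigue")` would stream the 40+GB dump without ever loading it whole.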

Let's say your machine is equipped with a modern SSD and a modern CPU, e.g. a SATA3 SSD (500+MB/s READ) and an AMD Vishera (8 threads) or Intel HASWELL (8 threads); then you can divide the above 173 seconds by 2 to 3 at once, which is roughly ONE MINUTE, not bad for a FULL-TEXT Wikipedia search.
If your machine is a BEAST, say 16 threads and 64GB RAM (enough to hold the whole 40GB file in memory), you can expect exact search to operate at 18-30GB/s, meaning about 2 seconds to traverse the whole of Wikipedia; a heart-gladdening performance, I call it.
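The throughput figures in the last few paragraphs are simple size/rate arithmetic, and this small Python check reproduces them. It assumes decimal units (1GB = 1,000MB) and a purely I/O-bound search. At 500MB/s the drive-bound time comes out at roughly 80 seconds, so "about a minute" is the right order of magnitude, and 18-30GB/s from RAM gives roughly 1.3-2.2 seconds.

```python
def scan_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Time needed to read size_mb megabytes sequentially at rate_mb_per_s."""
    return size_mb / rate_mb_per_s


read_mb = 173 * 232                      # 173 s at 232 MB/s on the laptop
deviation_pct = (241 - 232) / 232 * 100  # gap to the 241 MB/s Everest figure

if __name__ == "__main__":
    print(f"data read: {read_mb:,} MB")                                # 40,136 MB
    print(f"deviation: {deviation_pct:.1f} %")                         # 3.9 %
    print(f"SATA3 SSD, 500 MB/s: {scan_seconds(read_mb, 500):.0f} s")  # 80 s
    for gb_per_s in (18, 30):
        seconds = scan_seconds(read_mb, gb_per_s * 1000)
        print(f"from RAM, {gb_per_s} GB/s: {seconds:.1f} s")           # 2.2 s, then 1.3 s
```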

Gallowwalker is a piece of beauty taken from the near future; to say that it is superfast is an understatement. But what makes me happiest is the 'psychedelicism' behind this sub-project: the notion of making the world more colorful and more vivid, easing the pain of not having a versatile word assistant.

And here is a quick walkthrough of the Levenshtein Distance buttons, done while I was wondering whether the current MASAKARI has 'psychedelicism':

Step 1: Typing the pattern at the top.
Step 2: Pressing the 'List Textual Files' button.
Step 3: Selecting the 'MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd' file.


Step 4: Pressing the '04' button.


Step 5: Pressing the 'List Textual Files' button, then 'Enter' or double-clicking on 'Kazahana.txt'.


Step 6: View your hits.


That's all: the list of all words within Levenshtein Distance 4.

Feel free to ask about whatever interests you. I write Gallowwalker mainly for my everyday needs, but I am ready to add functionality needed by other English language lovers.

He learns not to learn and reverts to what all men pass by.
leonAzul
Posted: Wednesday, September 04, 2013 4:55:38 PM

Rank: Advanced Member

Joined: 8/11/2011
Posts: 7,836
Neurons: 24,496
Location: Miami, Florida, United States
Have you re-invented SQL, or even egrep?
Applause



"Make it go away, Mrs Whatsit," he whispered. "Make it go away. It's evil."
Sanmayce
Posted: Friday, September 06, 2013 12:03:02 PM

Rank: Advanced Member

Joined: 5/29/2012
Posts: 177
Neurons: 830
Leon, my signature explains everything: the approaches you mention are professional, while Gallowwalker is merely an amateurish, always-under-your-fingertips 'word-calculator'.
I went back to basics not to 'reinvent the wheel' but rather to build a 16-wheeler, a vehicle OF YOUR OWN capable of driving you in style towards your destination.
SQL is from a different genre, not even opera, and not in the class of full-text search tools; it is a thousand times faster, but guess what, it is like using a rental limo: the voyage won't be as free as it should be. I mean, the price of delivering such power is so salty, and it comes with such dependencies, that you can kiss your walkabout goodbye.
Have you noticed how happily relaxed the guys riding, e.g., 3-wheelers are, how untroubled, enjoying the moment?


Ha, a flashback from 'FEARLESS' starring Jeff Bridges, where our guy rented an ugly car and drove it with his head out of the window and his hair windblown; one of the best movies I have ever seen.

Yesterday, while watching 'UFC 164', I heard the commentator say '... where Pettis excels at ...'; the phrase was immediately written into my stone tablets, hence the added 4-minute clip showcasing the usage of 'excel at' in Wikipedia.
As you can see from the clip, Wikipedia editors have work to do:
[*~excel*at] 0,000,507 excel_at /2andabove_enwiki-20121201-pages-articles.2.txt.sorted/
[*~excel*at] 0,000,390 excels_at /2andabove_enwiki-20121201-pages-articles.2.txt.sorted/
[*~excel*at] 0,000,005 excells_at /2andabove_enwiki-20121201-pages-articles.2.txt.sorted/ !The five occurrences (about 100 times fewer) suggest WRONG USAGE!
[*~excel*at] 0,000,004 excell_at /2andabove_enwiki-20121201-pages-articles.2.txt.sorted/ !The four occurrences (about 100 times fewer) suggest WRONG USAGE!
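The "about 100 times fewer" rule of thumb above can be written down as a one-line filter. This Python sketch flags any variant occurring at least `ratio` times less often than the most common one; the threshold of 50 is an arbitrary, hypothetical choice, and `flag_rare_variants` is not a Gallowwalker feature. Frequency alone cannot tell a misspelling from a rare but legitimate form (or a surname), so flagged hits still need to be read.

```python
def flag_rare_variants(counts: dict[str, int], ratio: float = 50.0) -> list[str]:
    """Return, sorted, the variants at least `ratio` times rarer than the most common one."""
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c * ratio <= top)


# Counts taken from the Wikipedia 2-gram hits above.
excel_at_counts = {
    "excel_at": 507,
    "excels_at": 390,
    "excells_at": 5,
    "excell_at": 4,
}

if __name__ == "__main__":
    print(flag_rare_variants(excel_at_counts))  # ['excell_at', 'excells_at']
```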

It took me 10 minutes to track/gun these 5+4 lines down:

[*excell^ at*] *Keep. Fascinating, obscure perhaps but not even borderline IMO. Exactly the sort of stuff Wikipedia excells at capturing. [[User:Andrewa|Andrewa]] 12:53, 18 Nov 2004 (UTC) /enwiki-20121201-pages-articles.xml/
[*excell^ at*] Weiskopf got into golf course design working initially with [[Jay Morrish]], but now has his own established practice.&lt;ref name=&quot;ag&quot;/&gt; He has at least 40 courses to his credit in many parts of the world, including the ''Monument'' and ''Pinnacle'' courses at [[Troon North Golf Club]] in [[Scottsdale, Arizona]];&lt;ref&gt;{{cite web |title=Course Design: Tom Weiskopf Excells at New Job Title |last=Holland |first=David R. |publisher=World Golf |url=http://www.worldgolf.com/course-design/tom-weiskopf.htm |accessdate=April 17, 2010}}&lt;/ref&gt; and Loch Lomond, venue of the [[Scottish Open (golf)|Scottish Open]] from 1995 to 2010.&lt;ref name=&quot;ag&quot;/&gt; Many of the courses have received considerable praise. He has also worked as a golf analyst for [[CBS Sports]],&lt;ref name=&quot;ag&quot;/&gt; covering the 1981 and 1985 to 1995 Masters. Since 2008, he has contributed to [[ABC Sports]] and [[ESPN]]'s coverage of [[The Open Championship]]. /enwiki-20121201-pages-articles.xml/
[*excell^ at*] | ShortSummary = When Chocolat discovers a star fencer by the name of Night-switch, she sees that this fencer is actually a magic user and excells at taking hearts. Chocolat sparks an idea from this and then decides to do some fencing too. At the fencing club she meets Pierre, a middle school student who seems to never fail in making others admire him. He challenges Chocolat that if she wins one point from him, then she can join the club. She takes up his offer but loses to Pierre. However, Pierre's charm makes Chocolat reveal her heart, endangering her because if a wizard or witch has her/his heart taken, their soul is lost and they will die. /enwiki-20121201-pages-articles.xml/
[*excell^ at*] #'''Strongest Support'''. What more can I say about this one?! He quite simply excells at every endevor he puts himself to...I have no doubt he will do likewise as an Arbitrator.--[[User:R.D.H. (Ghost In The Machine)|R.D.H. (Ghost In The Machine)]] 11:14, 17 December 2006 (UTC) /enwiki-20121201-pages-articles.xml/
[*excell^ at*] Dale studied at the University of Durham after failing to make a notable impression on the game as a teenager. However, it was Greame Fowler, head coach of the Durham University Cricketing Centre of Excellence that gave Dale new oppourtunities to excell at first class level. Although Dale achieved a degree, his cricket career diminshed within 3 years in academia owing much to a back injuy in 2004. /enwiki-20121201-pages-articles.xml/
[*excell^ at*] In 2012 the Pinecrest Wrestling team took 2nd in the NC State team dual tournament and won the State title in the NC Team individual tournament. With Pinecrest's victory at the State tournament they ended the streak of Parkland High School who had won the State title the five previous years in a row and were looking for six. Pinecrest qualified seven wrestlers to go to the State tournament which set a historical high for the school.Senior Zac lupien at 113 and sophomore Rider Excell at 160 both and had sold performances at the tournament by winning a hand full of matches each. Senior and captain Garrett Bateman ended his remarkable career at Pinecrest by placing 5th at the state tournament, racking up a few big wins earning key team points for Pinecrest. Freshman Irvin Enriquez finished his historical first year at Pinecrest by placing 4th In the state at the 106 pound weight class. This is the best finish by a freshman from Pinecrest in the state tournament in school history. Junior Luke Fetla also had and an equally impressive performance at 220 pounds where he also placed 4th in the state. A month after the State tournament Fetla also became the first national all-American in Pinecrest history when he placed 7th at the NCHS junior nationals in Virginia Beach. Sophomore Dallas Roemer had a season peaking performance at the state tournament. Roemer surprised many when beat many high ranking upperclassman to make it to the state finals as only a sophomore. Even though Dallas lost a close match in the finals to the top ranked wrestler in the state the future still looks very bright from Dallas as he has two more years to states his dominance in NC. Older brother Dustin Roemer wrestling at 152 pounds was the final member of the seven wrestlers who qualified for States. Dustin was a senior and captain for the Pinecrest Wrestling team. Dustin was also the State champ at the 145 pound weight class from the year before. 
Roemer had also signed to wrestle at the University of Virginia before his senior season had started. Roemer had a solid performance at the State tournament earing 3 pins out of his 4 matches, one of those pins came in the State finials which also not only sealed the State Title for himself but also the team state title for Pinecrest. Dustin finished his Senior year with a record of 56-0,he also setting a school record with 48 pins in a single season. The Pinecrest Wrestling team in 2012 was coached by; head coach Travis Flinchum and assistant coaches Sam Narayan,Andy Socha, and Zach Martin. When Flinchum took over the Pinecrest wrestling team in 2008 the team had taken seocnd to last in the conference and only had one wrestler qualify for the state tournment. With in four years Flinchum with his superior coaching abilities and the with the help of his assistant coaches transformed the Pinecrest wrestling team into the number one wrestling team in North Carolina. /enwiki-20121201-pages-articles.xml/
[*excell^ at*] ::::CPUs and GPUs are fairly different machines. The former is usually intended for general-purpose computing (though they are often optimized for a special application), while the latter usually excells at certain types of vectorizable problems. You can try to compare them on a per-application basis, but be careful about attributing more significance to such comparisons than is appropriate. A friend of mine did some research in solving partial differential equations (Poisson's) numerically with GPUs, and in his demonstrations the GPU could solve the equation sets an order of magnitude or two faster than a C program on a late CPU. However, he also pointed out that the GPU is impractical for code that branches (conditionals, complex loops) since it isn't designed for general purpose program execution. -- [[User:Matt Britt|mattb]] &lt;code&gt;@ 2007-04-10T15:48Z&lt;/code&gt; /enwiki-20121201-pages-articles.xml/
[*excell^ at*] Excell compiled a collection of hymns and gospel songs around 1880 which was published as ''Sacred Echoes'' by John J. Hood of [[Philadelphia]] in 1881, the year he marked as his start in the business. ''Sing the Gospel'', published around the time of the move to Chicago, was issued under the &quot;E. O. Excell&quot; imprint. ''Echoes of Eden'' followed two years later in 1884. An archetype of later &quot;combined&quot; song books was produced in 1885 when contents of ''Sing the Gospel'', ''Echoes of Eden'' and limited new material were repackaged into ''The Gospel in Song'', a hymnbook later advertised to contain the songs and solos sung by Excell at Sam Jones' Gospel meetings. /enwiki-20121201-pages-articles.xml/
[*excell^ at*] Peter Nelson was born on the 26 of April 1931 in the [[Adelaide]] suburb of [[Black Forest, South Australia|Black Forest]]. The son of [[wheelwright]] and automotive manufacturing engineer Frederick Nelson and his wife Winifred (née Mostyn), he was educated at [[Christian Brothers College, Adelaide]]. He idolised his brothers, and when they returned from [[World War II]] he followed them into whatever sports they took up. His brothers remembered him as ''someone who could take up any sport and excell at it''. In fact he was so good at [[Swimming (sport)|swimming]] that he was offered a place on the South Australian state swimming squad. However by this time he had discovered [[cycling]]. /enwiki-20121201-pages-articles.xml/

It turns out that two of these 9 suggestions/alerts are false, both being the surname Excell rather than the verb: '... sophomore Rider Excell at 160 ...' and '... sung by Excell at Sam Jones' Gospel meetings.'
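A false alert such as '... sung by Excell at Sam Jones' Gospel meetings.' comes from a capitalized surname in mid-sentence. A crude filter, sketched here in Python under my own assumptions, marks capitalized hits that do not start a sentence as possible proper nouns for manual review. The regex and the `classify_hits` name are hypothetical. Title-case text such as 'Tom Weiskopf Excells at New Job Title' would be misclassified, so this only narrows the list; it does not replace reading it.

```python
import re

# "excell at" / "excells at" in any capitalization; "excel at" and "excellent at" do not match.
MISSPELLING = re.compile(r"\b(excells?)\s+at\b", re.IGNORECASE)


def classify_hits(text: str) -> list[tuple[str, str]]:
    """Label each hit as a likely misspelling or a possible proper noun."""
    results = []
    for match in MISSPELLING.finditer(text):
        word = match.group(1)
        before = text[:match.start()].rstrip()
        sentence_start = before == "" or before[-1] in ".!?"
        if word[0].isupper() and not sentence_start:
            label = "possible proper noun"
        else:
            label = "likely misspelling"
        results.append((word, label))
    return results
```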

Had I been a professional, I would already have ported it to tablets (the most promising gadget, in my view), but my laziness dominates my activity, he-he.
Three years ago, on a Russian forum, I posted a prediction made by some hardware systems analysts, stating that 2 billion tablets would flood the world by the end of 2014. AFAIK, it has been pretty accurate so far.
Now, another number has been haunting me ever since I saw it: 2,200,000,000, the number of people using the English language.
The math is simple: each and every tablet has a NUMBER calculator, but a WORD calculator is missing, which means yet another myriad of wrong wordings is on its way. A wake-up call for all developers.

And what is wrong with AMATEURISM? It is derived from [French, from Latin amātor, lover, from amāre, to love.]
Whoever said that amateurism is inferior to professionalism must have been a woodenheaded "professional".
I want to see the works of professionals who love what they do - a rare sight indeed.

Couleur de vie, couleur de joie / Colour of life, colour of joy
Où sont tes larmes, tous tes mots / Where are your tears, all your words
Et puisque t'en as marre de voir / And since you're so tired of seeing
Tous les gens souffrir / All the people suffer
Cette notion trop simple et naïve / This simple, naive idea
Parvenir à vaincre une loi / Will manage to overcome a law
Loi invisible, si injuste / An invisible law, so unjust
Si injuste, si sombre / So unjust, so dark

Vis ta vie, elle est si belle / Live your life, it is so beautiful
Vis ta vie, c'est la tienne / Live your life, it is yours
Vis ta vie, sans mensonge / Live your life without lies
Vis ta vie, comme tu veux / Live your life however you want to

Couleur de vie, couleur de joie / Colour of life, colour of joy
Où est ce feu philosophique / Where is that philosophical spark
Ton esprit condensé, pulverisé / Your spirit condensed, pulverized,
Subtilé par toutes tes larmes / Subjugated by your tears

Vis ta vie, elle est si belle / Live your life, it is so beautiful
Vis ta vie, c'est la tienne / Live your life, it is yours
Vis ta vie, sans mensonge / Live your life without lies
Vis ta vie, comme tu veux / Live your life however you want to


Wordless I am.

He learns not to learn and reverts to what all men pass by.
Sanmayce
Posted: Tuesday, September 10, 2013 1:33:39 PM

Rank: Advanced Member

Joined: 5/29/2012
Posts: 177
Neurons: 830
Because of the size involved (717 megabytes), I was hesitant to include the 'Wikipedia 2012' and 'Google Books 2012' word corpora along with the MASAKARI wordlist in the initial release of Gallowwalker.
On second thought, I realized that few people have time to extract words from the above huge projects, so I included them; thus effective word lookups are now available in Gallowwalker for the 3 above-mentioned corpora, and I also put GAMERA's corpus into the mix.
The file _GW.7z is now 128MB in size.

Some stats:
- 'Wikipedia 2012' features 28,803,331 words, 1andabove_enwiki-20121201-pages-articles.1.txt.sorted (590,469,840 bytes);
- 'Google Books 2012' features 5,038,456 words, googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt (103,845,200 bytes);
- 'GAMERA revision 17' features 2,805,843 words, 4andabove_Gamera17LBL.1.txt.sorted (58,445,626 bytes);
- 'MASAKARI revision 3' features 316,423 words, MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd (3,869,529 bytes).

While watching 'Iron Man 3', I overheard a rare word, 'weaponizable'; the context was that by hacking the DNA of a given creature, it could be turned into a weapon.
I immediately felt the urge to check how many '*izable' words MASAKARI and SOED have; in short, it hurts not to have such a descriptive class of adjectives.

First, SOED gave the following 80 words for '*izable':
authorizable
autoxidizable
baptizable
civilizable
cognizable
colonizable
conceptualizable
criticizable
crystallizable
customizable
diagonalizable
diazotizable
factorizable
fertilizable
formalizable
fossilizable
gelatinizable
generalizable
hybridizable
hypnotizable
incognizable
inoxidizable
ionizable
irrealizable
irrecognizable
localizable
magnetizable
memorizable
mesmerizable
metabolizable
metrizable
mineralizable
mobilizable
modernizable
monopolizable
moralizable
nasalizable
nominalizable
normalizable
operationalizable
organizable
oxidizable
passivizable
peptizable
phagocytizable
poeticizable
polarizable
polymerizable
prizable
pulverizable
realizable
recognizable
recrystallizable
renormalizable
resizable
seizable
serializable
sizable
solubilizable
standardizable
sterilizable
summarizable
uncategorizable
uncivilizable
uncriticizable
uncrystallizable
universalizable
unlocalizable
unnaturalizable
unorganizable
unrealizable
unrecognizable
unseizable
unsizable
unsystematizable
utilizable
vaporizable
verbalizable
visualizable
volatilizable
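A plain '*izable' lookup is an ordinary shell-style wildcard, and comparing what SOED and MASAKARI return is a set difference. This Python sketch assumes each source is available as a list of words and uses the standard `fnmatch.fnmatchcase` (case-sensitive, so results do not depend on the operating system). Gallowwalker's own pattern syntax also has operators such as '~' and '^', seen in the earlier posts, which this sketch does not attempt to reproduce. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_lookup(words, pattern: str) -> list[str]:
    """Return, sorted, the words matching a shell-style pattern such as '*izable'."""
    return sorted(w for w in words if fnmatchcase(w, pattern))


def compare_wordlists(list_a, list_b) -> tuple[list[str], list[str]]:
    """Return (words only in list_a, words only in list_b), each sorted."""
    a, b = set(list_a), set(list_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    sample = ["realize", "realizable", "sizable", "weaponize", "table"]
    print(wildcard_lookup(sample, "*izable"))  # ['realizable', 'sizable']
```

Feeding it the two '*izable' lists would show which adjectives one source has and the other lacks.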


Next, MASAKARI (using Gallowwalker) gave the following 162 words for '*izable':
[*izable] acclimatizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] acetylizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] achromatizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] aggrandizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] alcoholizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] alkalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] amortizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] antagonizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] authorizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] autoxidizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] axiomatizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] baptizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] capitalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] capsizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] carbonizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] catechizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] categorizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] characterizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] civilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] cognizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] colonizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] computerizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] criticizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] crystallizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] customizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] demagnetizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] diagonalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] diazotizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] dramatizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] dysoxidizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] electrizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] enolizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] epistolizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] etymologizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] factorizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] fertilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] feudalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] formalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] fossilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] gelatinizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] generalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] graphitizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] harmonizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] hybridizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] hypnotizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] immortalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] impolarizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] incognizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] incrystallizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] inorganizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] inoxidizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] ionizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] irrealizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] irrecognizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] linearizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] lionizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] localizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] magnetizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] mechanizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] memorizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] mesmerizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] metabolizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] metrizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] mineralizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] mobilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] modernizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] monetizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] monopolizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] noncatechizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] noncognizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] noncrystallizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] nonlocalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] nonmagnetizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] nonoxidizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] nonpolarizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] nonpolymerizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] nonrealizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] nonrenormalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] nonvulcanizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] normalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] operationalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] optimizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] organizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] ostracizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] oxidizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] oxygenizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] parallelizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] parameterizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] patronizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] pectizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] penalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] peptizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] personalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] polarizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] polymerizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] precognizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] preutilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] prizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] pulverizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] quantizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] rationalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] realizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] recognizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] renormalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] resizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] rhythmizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] satirizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] seizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] serializable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] sizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] socializable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] standardizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] sterilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] subsidizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] summarizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] synchronizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] synthesizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] systemizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] tautomerizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unanatomizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unantagonizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] uncapsizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] uncategorizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] uncharacterizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] uncivilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] uncognizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] uncriticizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] uncrystallizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] undemagnetizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] undramatizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unfertilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] ungelatinizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unhypnotizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] universalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unlocalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unnaturalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unorganizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unoxidizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unpatronizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unpolarizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unprizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unrealizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unrecognizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unseizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unsizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unsocializable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unstandardizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unsummarizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unsympathizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unsystemizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] untheorizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] unutilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] utilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] vaporizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] vectorizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] verbalizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] victimizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] virtualizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] visualizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] vitriolizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] volatilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*izable] vulcanizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/

Long story short (comparing SOED's 80 '*izable' words with MASAKARI's 162), the following 10 words from SOED are missing from MASAKARI r.3:
conceptualizable
moralizable
nasalizable
nominalizable
passivizable
phagocytizable
poeticizable
recrystallizable
solubilizable
unsystematizable
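The "clash" above is essentially a set difference between two wordlists, restricted to one suffix. Below is a minimal Python sketch. It assumes each wordlist is a plain-text file with one word per line. The `load_words` helper and that file layout are assumptions, not MASAKARI's documented format. The toy sets stand in for the real SOED and MASAKARI r.3 lists:

```python
def load_words(path):
    """Read a one-word-per-line wordlist into a set of lowercase words.
    Assumption: plain UTF-8 text, one word per line."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}


def missing_from(reference, target, suffix):
    """Words ending in `suffix` that occur in `reference` but not in `target`."""
    ref_hits = {w for w in reference if w.endswith(suffix)}
    target_hits = {w for w in target if w.endswith(suffix)}
    return sorted(ref_hits - target_hits)


# Toy data only; a real run would load the SOED and MASAKARI r.3 wordlists.
soed = {"moralizable", "poeticizable", "realizable", "sizable"}
masakari = {"realizable", "sizable", "seizable"}
print(missing_from(soed, masakari, "izable"))  # ['moralizable', 'poeticizable']
```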


This atrocity (especially the absence of the beautiful 'poeticizable') is to be ended in the next revision, r.4, of MASAKARI.

I couldn't resist the urge to derive 'UNmodernizable/NONmodernizable' from 'modernizable'; 'НЕмодернизируем' (i.e. 'UNmodernizable') has a nice ring in my native language as well. I don't know what substitutes exist for 'unable to be modernized'.
In that regard 'UNutilizable' is very handy, instead of the compound wording 'unable to be utilized'.
Another useful prefix alongside 'UN' and 'NON' is 'RE': it gives 'REutilizable', a distant cousin of 'rechargeable' (with 2 and 6 occurrences, respectively, in Wikipedia).
What about compound prefixes? For example, 'UNREutilizable' = 'not able to be reutilized'.
I also spotted 'UNDEmagnetizable', i.e. the use of 'DE'.

Let's see what the following 4 word corpora hold for us when searching for '*weaponizable', '*modernizable' and '*utilizable':

Looking for '*weaponizable' in Wikipedia 2012:
[*weaponizable] 0,000,005 weaponizable /1andabove_enwiki-20121201-pages-articles.1.txt.sorted/

Looking for '*weaponizable' in Google Books 2012:
[*weaponizable] 0,000,030 weaponizable /googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt/

Looking for '*weaponizable' in GAMERA revision 17:
NONE

Looking for '*weaponizable' in MASAKARI revision 3:
NONE

Looking for '*modernizable' in Wikipedia 2012:
NONE

Looking for '*modernizable' in Google Books 2012:
[*modernizable] 0,000,056 modernizable /googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt/

Looking for '*modernizable' in GAMERA revision 17:
NONE

Looking for '*modernizable' in MASAKARI revision 3:
[*modernizable] modernizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/

Looking for '*utilizable' in Wikipedia 2012:
[*utilizable] 0,000,027 utilizable /1andabove_enwiki-20121201-pages-articles.1.txt.sorted/
[*utilizable] 0,000,002 reutilizable /1andabove_enwiki-20121201-pages-articles.1.txt.sorted/
[*utilizable] 0,000,002 inutilizable /1andabove_enwiki-20121201-pages-articles.1.txt.sorted/

Looking for '*utilizable' in Google Books 2012:
[*utilizable] 0,000,673 utilizable /googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt/
[*utilizable] 0,000,356 unutilizable /googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt/
[*utilizable] 0,000,164 nonutilizable /googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt/
[*utilizable] 0,000,082 inutilizable /googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt/
[*utilizable] 0,000,085 reutilizable /googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt/

Looking for '*utilizable' in GAMERA revision 17:
[*utilizable] 0,000,176 utilizable /4andabove_Gamera17LBL.1.txt.sorted/
[*utilizable] 0,000,010 unutilizable /4andabove_Gamera17LBL.1.txt.sorted/

Looking for '*utilizable' in MASAKARI revision 3:
[*utilizable] preutilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*utilizable] unutilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
[*utilizable] utilizable /MASAKARI_General-Purpose_Grade_English_Wordlist_r3_316423_words.wrd/
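The thread doesn't show Gallowwalker's own search code. What follows is only a hypothetical sketch of a shell-style wildcard lookup that mimics the report lines above. It assumes each corpus is available as (count, word) pairs. `search` is an illustrative name, and the sample entries reuse a few of the Wikipedia counts listed above:

```python
import fnmatch


def search(pattern, entries, source):
    """Return report lines for every entry whose word matches a shell-style
    pattern such as '*utilizable' ('*' also matches the empty string).

    Assumption: `entries` is a list of (count, word) pairs; count is None for
    plain wordlists such as MASAKARI, in which case no frequency is printed.
    """
    hits = []
    for count, word in entries:
        if fnmatch.fnmatchcase(word, pattern):
            # Zero-padded, comma-grouped count, e.g. 27 -> '0,000,027'
            freq = "" if count is None else f"{count:09,d} "
            hits.append(f"[{pattern}] {freq}{word} /{source}/")
    return hits or ["NONE"]


wiki = [(27, "utilizable"), (2, "reutilizable"), (5, "weaponizable")]
for line in search("*utilizable", wiki, "enwiki.sorted"):
    print(line)
# [*utilizable] 0,000,027 utilizable /enwiki.sorted/
# [*utilizable] 0,000,002 reutilizable /enwiki.sorted/
print(search("*modernizable", wiki, "enwiki.sorted"))  # ['NONE']
```

The sketch uses `fnmatch.fnmatchcase` because the patterns in this thread ('*utilizable', '*weapon*') are already shell-style. Note that `*` also matches nothing. That is why plain 'utilizable' shows up among the '*utilizable' hits.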

Obviously, alternatives to 'UN' exist, namely the 'IN' and 'NON' prefixes.

And of course I couldn't resist adding to the queue the UNobvious/INobvious/NONobvious 'UNweaponizable', i.e. 'with no undo/reverse once turned into a weapon'; what about the rest, if any?
I still don't understand the 'RULE' for using the 'UN', 'IN' and 'NON' prefixes: of the above three, SOED validates only 'UNobvious'. Are the rest unacceptable?
Strange, MASAKARI contains all three!
unobvious vs inobvious vs nonobvious
Interestingly, the graph shows that 'nonobvious' has outranked 'unobvious' since 1970.

We have 'INsensitive' and 'UNacceptable' in use, but not 'UNsensitive/NONsensitive' or 'INacceptable'. Why? What decides which prefix fits better?
Ha, a follow-up question: 'MISfit' does not have substitutes like 'UNfit', 'INfit', 'NONfit', right?!
In the past I often used 'UNsensitive' instead of 'INsensitive'.
insensitive vs unsensitive vs nonsensitive

I asked myself which variants are plausible (with the candidate prefixes on the right):
magnetizable: UN/IN/NON/UNDE/DE/RE
rechargeable: UN/IN/NON/UNDE/DE/RE
weaponizable: UN/IN/NON/UNDE/DE/RE

So far I have found the following ones in use:
magnetizable: DE/NON/UNDE
rechargeable: UN/RE/NON/NONRE
weaponizable: UN/IN/NON
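Checking which candidate prefixes are actually attested can be automated. Here is a minimal sketch. It assumes the wordlist is loaded as a set of lowercase words. `attested_prefixes` is a hypothetical helper, and the word set is toy data chosen to mirror the 'magnetizable' findings above:

```python
def attested_prefixes(stem, prefixes, wordlist):
    """Return the prefixes whose prefix+stem form occurs in `wordlist`
    (assumed to be a set of lowercase words), in the order given."""
    return [p for p in prefixes if (p.lower() + stem) in wordlist]


candidates = ["UN", "IN", "NON", "UNDE", "DE", "RE"]
# Toy wordlist for illustration; a real run would load MASAKARI or OWL.
words = {"demagnetizable", "nonmagnetizable", "undemagnetizable", "magnetizable"}
print(attested_prefixes("magnetizable", candidates, words))  # ['NON', 'UNDE', 'DE']
```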

Just one example: 'The production of potential weapon products is more or less a nonissue; one can readily produce a fuel that is appropriate for power generation but nonweaponizable (i.e. won't sustain the kind of supercritical reactor that results in an explosion).'

SOED gives many long, interesting words not in their full form but truncated with apostrophes. I have already extracted these words manually in the V-Z range; all the truncated A-U words must be added as well, so the next revision, r.4, is to house them.

In short, shooting at those 4 word corpora with Gallowwalker should ease the screening process, i.e. show how your patterns/hits are distributed across the different wordlists.


He learns not to learn and reverts to what all men pass by.
leonAzul
Posted: Tuesday, September 10, 2013 1:52:14 PM

Rank: Advanced Member

Joined: 8/11/2011
Posts: 7,836
Neurons: 24,496
Location: Miami, Florida, United States
Sanmayce wrote:

Long story short (clashing the SOED's 80 and MASAKARI's 162), the next 10 words from SOED are new to MASAKARI r.3:
conceptualizable
moralizable
nasalizable
nominalizable
passivizable
phagocytizable
poeticizable
recrystallizable
solubilizable
unsystematizable


This atrocity (especially lacking the beautiful 'poeticizable') is to be ended in next r.4 of MASAKARI.

Yes, that is a pity as it is quite pulchritudenizable.

Yet, if you don't mind my saying so, could you please look the other way and silently forget about solubilizable.

I find it most abominatizable.


"Make it go away, Mrs Whatsit," he whispered. "Make it go away. It's evil."
leonAzul
Posted: Tuesday, September 10, 2013 2:32:01 PM

Sanmayce wrote:

Obviously alternatives to 'UN' exist, namely 'IN'&'NON' prefixes.

And of course couldn't stand to put in the queue the UNobvious/INobvious/NONobvious 'UNweaponizable' i.e. 'with no undo/reverse once turned into weapon', how about the rest if any.
I still don't understand the 'RULE' of how to use 'UN'&'IN'&'NON' prefixes, from above three SOED validates only 'UNobvious', are the rest unacceptable?
Strange, MASAKARI contains all the three!
unobvious vs inobvious vs nonobvious
Interesting, the graph shows how 'nonobvious' trumps 'unobvious' since 1970.

We have 'INsensitive' and 'UNacceptable' in use but not 'UNsensitive/NONsensitive' and 'INacceptable', why, what decides which prefix is more fit?

Mostly habit, but there are some guidelines.

The universal prefix of negation is "non-". It is also the most predictable in meaning "not", in other words, a simple negation without a sense of deficiency or lesser value. It expresses a lack of something.

The next most regular is "un-", which almost always includes a sense of reversal. It expresses an opposite of something.

As negations, "in-", "im-" and "ir-" are tricky, but often follow some sort of musical pattern of matching the sound of the word to which they are prefixed.

"In-" has the further complication of also being derived from the meaning of the preposition "in-" or "into-". Thus, just as an affix that is placed at the beginning is a prefix, one that is placed in the middle is called an infix, which would make it infixable, logically enough, but not unfixable or irreparable.

Sanmayce wrote:

Ha, a follow up question: 'MISfit' does not have substitutes like 'UNfit','INfit''NONfit', right?!

They all have uses, some as adverbs, some as nouns (the original adjectival meaning has gone out of use).

A misfit doesn't fit in.
Someone (or something) unfit is not in the proper condition.
A tenon is a good example of an infit. (That is an extremely rarely used word.)
Matching trousers, vest, and jacket make a good outfit.
Trying to get a bushel of peaches into a one-peck basket is a non-fit.


Sanmayce wrote:

I asked myself what variants are plausible (with given prefixes at right):
magnetizable: UN/IN/NON/UNDE/DE/RE
rechargeable: UN/IN/NON/UNDE/DE/RE
weaponizable: UN/IN/NON/UNDE/DE/RE

So far I found next ones in use:
magnetizable: DE/NON/UNDE
rechargeable: UN/RE/NON/NONRE
weaponizable: UN/IN/NON

Just one example: 'The production of potential weapon products is more or less a nonissue; one can readily produce a fuel that is appropriate for power generation but nonweaponizable (i.e. won't sustain the kind of supercritical reactor that results in an explosion).'

SOED has many long interesting words given not in their full form but chopped with apostrophes, I already manually extracted these words in V-Z range, all chopped A-U words must be added to current revision, that is, next revision 4 is to house them.

In short, shooting with Gallowwalker at those 4 word corpora should ease the screening process of what is what i.e. how your patterns/hits are distributed across different wordlists.

Happy hunting!
Sanmayce
Posted: Tuesday, September 10, 2013 3:08:35 PM

Rank: Advanced Member

Joined: 5/29/2012
Posts: 177
Neurons: 830
Thank you, Leon.
There are many things I don't know; that's why my intent is first to collect the useful words (their low rank, i.e. low number of occurrences, is not a problem) and then to provide sentences/contexts containing them.

SOED says for 'pulchritude':
Beauty.
- pulchritudinous adjective beautiful E20.


Leon, you use 'pulchritudenizable'; should it not be pulchritude+ize+able, i.e. 'pulchritudizable'?

I have encountered the beautiful 'beautify' in many good contexts, but not 'beautize' (an old issue of mine from some old posts). Another 'rule' I am unfamiliar with: how to form transitive verbs from nouns.
That's why I think 'pulchritudizable' is not a non-fit, i.e. not so outrageous.

>The universal prefix of negation is "non-". It is also the most predictable in meaning "not", in other words, a simple negation without a sense of deficiency or lesser value. It expresses a lack of something.
>The next most regular is "un-", which almost always includes a sense of reversal. It expresses an opposite of something.
>As negations, "in-", "im-" and "ir-" are tricky, but often follow some sort of musical pattern of matching the sound of the word to which they are prefixed.

This is very interesting; I have had (and still have) many troubles with the 'in'/'ir'/'dis'/'un'/'de' prefixes.
Just one example: I want to sense the nuance of IRREVERSIBILITY versus NEGATION:
'UNweaponizable', i.e. 'with no undo/reverse once turned into a weapon'.
'NONweaponizable', i.e. 'not able to be turned into a weapon'.
BUT sometimes 'UN' and 'NON' overlap, don't they? As in 'unexisting' vs 'nonexisting'.
As the graph shows, the two traded places in 1858.

leonAzul wrote:
Happy hunting!


Thank you, it really is a hunt.

leonAzul
Posted: Tuesday, September 10, 2013 6:25:36 PM

Sanmayce wrote:
Thank you Leon,
many things I don't know, that's why my intent is first to collect the words which are useful (their low rank i.e. level of occurrences is not problematic) and then to provide sentences/contexts containing them.

SOED says for 'pulchritude':
Beauty.
- pulchritudinous adjective beautiful E20.


Leon you use 'pulchritudenizable', should it not be pulchritude+ize+izable i.e. 'pulchritudizable'.

To be honest, I am playing, so don't take this too seriously.

"Pulchritudizable" (if there were such a word) would describe someone like a demi-god who could create beauty (pulchritude) itself.

"Pulchritudenizable" would be something that is able to be made beautiful (pulchritudinous). The switch to "-den-" is after the pattern of forming a participle from a noun like "woolen" or "wooden".

But as long as we are playing, this probably would sound more legitimate as "pulchritizable".

Sanmayce wrote:

The beautiful 'beautify' I have encountered in many good contexts not 'beautize' though, an old issue of mine in some old posts. Another 'rule' which I am unfamiliar with: how to form transitive verbs from the nouns.
That's why I think 'pulchritudizable' is not a non-fit, not so outrageous I mean.

Perhaps your number crunching will reveal this. Many of the patterns have more to do with rhythm and sound than with grammatical rules.

Sanmayce wrote:

>The universal prefix of negation is "non-". It is also the most predictable in meaning "not", in other words, a simple negation without a sense of deficiency or lesser value. It expresses a lack of something.
>The next most regular is "un-", which almost always includes a sense of reversal. It expresses an opposite of something.
>As negations, "in-", "im-" and "ir-" are tricky, but often follow some sort of musical pattern of matching the sound of the word to which they are prefixed.

This is very interesting, I have had many troubles (and still have) with 'in'/'ir'/'dis'/'un'/'de' prefixes.
Just one example: I want to sense the nuance of IRREVERSIBILITY versus NEGATING:


In a sense, negation can mean two subtly different things: a lack of something or its polar opposite. Perhaps a simple mathematical example will help.

Consider a thing with the states of +1, 0, and -1.

At +1, it exists (to do); at 0, it is not there (to do not); and at -1, it is the opposite of what it was (to undo).


Where it is tricky is figuring out which words have a meaningful polar opposite, and which can only be 1 or 0.

Sanmayce wrote:

'UNweaponizable' i.e. 'with no undo/reverse once turned into weapon'.
'NONweaponizable' i.e. 'not able to be turned into weapon'.
BUT, sometimes 'UN' and 'NON' overlap, don't they!

Indeed, that is the case with these two words, because what is being negated is the ability to be weaponized at all.
Thus, although "unweaponizable" might be given a subtle distinction, as something whose ability to be weaponized could be restored (reversed), it usually means the same as "nonweaponizable", i.e. having no weaponizability ever.

The meaning builds out from the stems:

weapon - a concrete noun either is or isn't, so a non-weapon makes sense, but an unweapon would be a very odd concept, although I suppose a plowshare could be considered an "unsword"

weapon==>weaponize - this has now been transformed to mean a process which could be reversible, so there is a distinction between non-weaponize (don't make into a weapon) and unweaponize (reverse the process and revert to being a non-weapon)

weapon==>weaponize==>weaponizable - this now describes something that is capable of being transformed into a weapon, so any affix one adds will modify that capability, not the rest of the meaning

Sanmayce wrote:

As in 'unexisting' vs 'nonexisting'.
As you can see from the graph, trading places occurred in 1858.

Earlier in the thread, several of us mentioned something about this process. This can quickly show which words, spellings, and collocations are more prevalent, but it needs semantic analysis to discover the patterns behind them or shifts in meaning. In other words, this ngram search can only tell you the relative frequency of occurrence of the words, but it can't tell you directly whether this represents an equivalence or a shift in meaning.
excaelis
Posted: Tuesday, September 10, 2013 9:04:09 PM

Rank: Advanced Member

Joined: 6/30/2010
Posts: 10,981
Neurons: 32,652
Location: Toronto, Ontario, Canada
I still don't understand what this does for you.

Sanity is not statistical
Sanmayce
Posted: Thursday, September 12, 2013 11:10:26 AM

Leon, thanks for the explanations; for a long time I have been baffled by these transformations.

>... I am playing ...

Yes, I sensed that; playfulness is a key thing.

>In a sense, negation can mean two subtly different things: a lack of something or its polar opposite. Perhaps a simple mathematical example will help.
>Consider a thing with the states of +1, 0, and -1.
>At +1. it exists (to do); at 0, it is not there (to do not), and at -1, it is the opposite of what it was (to undo).
>Where it is tricky is figuring out which words have a meaningful polar opposite, and which can only be 1 or 0.


Aaall riighht! That kind of illustration rocks.

>This can quickly show which words, spellings, and collocations are more prevalent, but it needs semantical analysis to discover the patterns behind them or shifts in meaning. In other words, this ngram search can only tell you the relative frequency of occurrance of the words, but it can't tell you directly whether this represents an equivalence or shift in meaning.

I vaguely understand what you are saying, and yes, stats/occurrences are not the final authority when judging what is what; however, this is my world for now. My digging is superficial, I realize that.

@excaelis
>I still don't understand what this does for you.
Nothing special really. For a native speaker, hoarding words out of context may seem futile, but not for me: in order to have a strong base one should strengthen the basics. In my case that means providing rich 1-gram files as a first step, followed by 2/3/4/5-gram files; they are interconnected, mostly from left to right. By having, say, 750,000 words (as the OED does), I could DRASTICALLY reduce the size of the subsequent 2-gram file, that is, get rid of the garbage.
Excaelis, I think that being an ESL dummy forever is a great advantage, whereas I consider being a native speaker a disadvantage, simply because natives take so many things in language for granted, which clouds or even kills the process of questioning. Simply put, it is my way, or as an old song goes, 'WALK THIS WAY'; it is the workout I need for future (gun)fights. Enough said.
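The idea of a 1-gram list filtering a 2-gram file can be sketched as follows. It assumes, hypothetically, that each 2-gram line has the form 'count word1 word2'. The real file layout may differ, and the counts below are invented for illustration:

```python
def filter_bigrams(lines, vocabulary):
    """Keep only 2-gram lines whose two words both occur in `vocabulary`
    (a set of lowercase 1-grams).

    Assumption: each line looks like 'count word1 word2', whitespace-separated.
    """
    kept = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed lines
        _, first, second = parts
        if first.lower() in vocabulary and second.lower() in vocabulary:
            kept.append(line)
    return kept


vocab = {"pure", "of", "heart", "at", "in"}
bigrams = ["0,000,012 pure of", "0,000,009 of heart", "0,000,003 of hearrt"]
print(filter_bigrams(bigrams, vocab))  # ['0,000,012 pure of', '0,000,009 of heart']
```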

While browsing SOED for ', prefix', the following "weird" words appeared:

EN-, prefix
ENfree verb trans. set free L16–E17.
ENwood verb trans. cover with trees E17.

TRANS-, prefix
TRANSmake verb trans. [translating Greek metapoiein] (chiefly Theology) make into something different, refashion; transelement: M19.
TRANSmarginal adjective beyond the margin of normal consciousness, subliminal E20.
TRANSmortal adjective (chiefly poet.) beyond what is mortal, immortal M20.
TRANSmundane adjective that is or lies beyond the world L18.
TRANSnatural adjective †(a) supernatural; (b) rare of transmuted nature: M16.
TRANSqualify verb trans. (rare) change in quality M17

DIS-, prefix
DISindividualize verb trans. divest of individuality M19
DISunify verb trans. undo or prevent the unity of L19.

UN-, prefix
UNbarbarize verb trans. make less barbarous; civilize: M17.
UNfrenchify verb trans. (rare) L16.
UNdarken verb trans. dispel the darkness from, make light L16.

Note: three instances of '-ible' and/or '-able':
transcend: transcendible adjective (rare) able to be transcended L17.
add: addable or addible adj.
transcribe: transcribable adj.

This speech of hers is nearly UNtranscribable!
Is this savage UNbarbarizable?


Compare HERITAGE's definition of:
computerize
1. To furnish with a computer or computer system.
2. To enter, process, or store (information) in a computer or system of computers.
computerizable adj.


And some raw stats (from MASAKARI) for:
'*ify' vs '*ize' = 286 vs 2181
'*ified' vs '*ized' = 324 vs 1491
'*ifiable' vs '*izable' = 69 vs 162

That is, '-ize' formations are more common than '-ify' ones.
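Raw counts like these come from a plain suffix match. Here is a minimal sketch, assuming the wordlist is an iterable of lowercase words (`count_suffixes` is a hypothetical helper and the words are a toy list). A bare `endswith` also catches non-derived words such as 'size' or 'prize'. So the figures are upper bounds for genuine '-ize' formations:

```python
def count_suffixes(words, suffixes):
    """Count how many words end with each suffix.
    Plain string match: 'size' counts as an '-ize' word too."""
    return {s: sum(1 for w in words if w.endswith(s)) for s in suffixes}


# Toy list; a real run would use the MASAKARI r.3 wordlist.
words = ["clarify", "beautify", "realize", "modernize", "organize",
         "clarified", "realized"]
print(count_suffixes(words, ["ify", "ize", "ified", "ized"]))
# {'ify': 2, 'ize': 3, 'ified': 1, 'ized': 1}
```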

How would you describe something that can be made (more) clear? 'clarifiable' is a good one.
After looking here and there I couldn't find a word describing the ability of something/someone to be beautified, a shame. 'beautifiable' is my suggestion; can anyone trump it (i.e. come up with a better one)?

SOED helped me distinguish the DE- and UN- prefixes:

DE-, prefix
3. With privative sense (denoting removal or reversal), in verbs from Latin, as decorticate etc., from French, as debauch, defrock, etc., and as a freely productive prefix, forming verbs (with derivs.) from verbs, as de-acidify, decentralize, decentralization, de-escalate, depressurize, desegregate, etc., or from nouns, as defuse, de-ice, delouse, detrain, etc.

UN-, prefix
1. Prefixed to adjectives, ppl adjectives, and nouns (esp. abstract nouns) to express negation or privation, as uneducated, unfair, unhappiness, unnourishing, unrest; sometimes expressing a reversal of sense, as unselfish, unsociable (in such cases a simple neg. is expressed using NON-).

With such a gem-word, sister-words come naturally and unstoppably:
- 'UNbeautifiable/NONbeautifiable' - unable to be beautified, beauty is unable to be added;
- 'DISbeautifiable/DEbeautifiable' - beauty is able to be removed/reversed.

Ay-ay, what have I done? The sisters of 'uglify' are knocking at the door...

Exhausting different wordings is a never-ending story and perhaps the most joy-bringing GAME.


Sanmayce
Posted: Sunday, September 15, 2013 2:15:58 PM

A forum member (davedave) asked about 'HEART/HEARTED' usage; somehow I had overlooked some nifty 'heart' phrases (3-grams):

Last night, while I was playing with the 'Gallowwalkers' subtitles, 'eternally_pure_of_heart' popped up, which made me check how 'heart' goes with other prepositions.



'pure of heart' vs 'pure at heart' vs 'pure in heart'

... only the pure at heart want to see God.” If that's what gives you satisfaction, then you have been transformed, ...
/Gazing Through a Prism Darkly: Reflections on Merold Westphal's ... edited by B. Putt/

This is not our battle, this war is against Jesus Christ and we are just but the vessels he must use. We need to open and clear a space for him to come in, we will be his true temples and we cannot house him if we are not pure at heart. We will ...
/Ensnared By Ndivhuho Thavhana/

... that the Dasyus were the original inhabitants of the land; they were brave, pure at heart, and upright in their conduct.
/Aryans, Jews, Brahmins: Theorizing Authority through Myths of Identity By Dorothy M. Figueira/

So what is the blessedness of the pure in heart? "They shall see God." So it must be impurity of heart that blinds us to God.
/The Beatitudes for Today By James C. Howell/

How the above three prepositions (of/at/in) collocate with 'heart' is a must-know.

I didn't know the following two BEAU words:
http://www.thefreedictionary.com/beatitude
http://www.thefreedictionary.com/beatific

Looking at a cluster of buttons and thinking of one of my favorite words, 'Bonboniera' (that is what I call my laptop), a "misspelled" Bombonera (Spanish pronunciation: [la βomboˈneɾa]; English: the Chocolate Box), I wonder how legitimate 'BUTTON[I]ERA' would be - a group/box of buttons!
Looking in TFD I couldn't find 'Bonbon[i]era'; in Bulgarian (through French) we have this lovely word for a (beautifully made) box for bonbons: bonbonniere, bonbon box.

Also, I forgot to check 'my' word (beautifiable) for existence; it turns out it was already defined:
http://en.wiktionary.org/wiki/beautifiable

And the superb sister word 'beautifiability' appears to have been coined already by at least two people (Amy Wang, wildbronco1626); the second listed it among his 'Turn Ons': 'sexiness! beautifiability! luscious lips!' - a playboyish wording I like.


Sanmayce
Posted: Wednesday, September 18, 2013 9:42:38 AM

In revision 1++ I finally mounted MASAKARI's 1-gram checking on a button: from now on a single push is all it takes to obtain a spell-checked .HTML file.
I wanted to include 2/3-gram checking too, but the GW package would become too big; it is already 103MB.

The three screenshots illustrate how to convert your .TXT file into its .HTML counterpart (words unfamiliar to MASAKARI are shown in bold and underlined):

Step 1 - Single-click the file which is to be spell-checked and press 'MASAKARI 1-gram checking':


Step 2 - After the shot (roughly 6 seconds), press the 'List Textual Files' button to refresh your working-directory list:


Step 3 - Outwith Gallowwalker, load the resultant file into your WEB browser:
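The button's internals aren't described here, so this is only a sketch of the same idea under stated assumptions. It splits the text into letter-only words and looks each one up, lowercased, in a set holding the MASAKARI 1-grams. Misses are wrapped in <b><u> tags. `spellcheck_to_html` is a hypothetical name, not Gallowwalker's code:

```python
import html
import re

# Assumption: a "word" is a run of ASCII letters, mirroring OWL's a-z words.
WORD_RE = re.compile(r"[A-Za-z]+")


def spellcheck_to_html(text, vocabulary):
    """Return an HTML page where words missing from `vocabulary`
    (assumed to be a set of lowercase words) are bold and underlined."""
    parts = []
    pos = 0
    for match in WORD_RE.finditer(text):
        parts.append(html.escape(text[pos:match.start()]))  # text between words
        word = match.group()
        if word.lower() in vocabulary:
            parts.append(html.escape(word))
        else:
            parts.append(f"<b><u>{html.escape(word)}</u></b>")
        pos = match.end()
    parts.append(html.escape(text[pos:]))  # trailing text after the last word
    body = "".join(parts).replace("\n", "<br>\n")
    return f"<html><body>{body}</body></html>"


# Toy vocabulary standing in for the MASAKARI 1-gram list.
vocab = {"is", "this", "savage"}
print(spellcheck_to_html("Is this savage unbarbarizable?", vocab))
```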


That's all (_GW.zip is already updated) for one-button phrase checking; below I want to share some thoughts on how to PROBE a word for presence in a wordlist of 30+ million words.

Many MAMMOTH wordlists exist; however, one wordlist stands out - mine: the OWL corpus.
It is a corpus of 1-grams collected from 25 big wordlists.
The OWL corpus plays the role of the WELL, whereas the MASAKARI wordlist plays that of the PAIL.
Each line consists of a pair: a number and a word. The number tells how many wordlists contain that word.
The compressed OWL file is included in _GW.zip.

Looking for the plural of 'bonbonniere' in HERITAGE and SOED and not finding it, I got semi-angry, which as usual led to a fix for the crisis: in this case, updating the OWL wordlist and including it in _GW.zip. Thus OWL and MASAKARI form a pair, an inseparable one, as it should be.

'OWL corpus' i.e. 'OWL.wrd.sorted' features:
- HUGE English compound wordlist (compiled from 25 wordlists);
- all words are clusters of a-z letters (26 in total) with length from 1 to 31 inclusive;
- total distinct words: 32,470,748;
- size: 669,954,671 bytes.

Of course the signal/noise ratio is poor, perhaps about 5%; still, compared to MASAKARI's 300,000 words, OWL's roughly 1,500,000 useful ones are nothing else but HUGE.

D:\_GW>type OWL.wrd.sorted
0,000,025 zone
0,000,025 youth
0,000,025 your
0,000,025 younger
0,000,025 young
0,000,025 you
0,000,025 yield
0,000,025 yet
0,000,025 yesterday
0,000,025 yes
0,000,025 yellow
0,000,025 year
0,000,025 yard
0,000,025 wrong
0,000,025 written
0,000,025 writing
0,000,025 writer
0,000,025 write
0,000,025 wrinkle
...
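The OWL layout, a count of how many wordlists contain each word, can be reproduced with a counter over per-list sets. Here is a minimal sketch, assuming each source wordlist is an iterable of words. The a-z filter for words of 1 to 31 letters follows the features listed above. The lowercasing step and the helper names (`build_owl`, `owl_lines`) are assumptions:

```python
import re
from collections import Counter

# OWL keeps only words made of a-z letters, 1 to 31 characters long.
VALID = re.compile(r"[a-z]{1,31}")


def build_owl(wordlists):
    """For each word, count how many of the given wordlists contain it.
    A word repeated inside one wordlist still counts once for that list.
    Assumption: words are lowercased before the a-z filter is applied."""
    counts = Counter()
    for words in wordlists:
        cleaned = {w.lower() for w in words}
        counts.update(w for w in cleaned if VALID.fullmatch(w))
    return counts


def owl_lines(counts):
    """Format 'N,NNN,NNN word' lines: highest count first, then reverse
    alphabetical order within a count (this appears to match the listing above)."""
    ordered = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return [f"{n:09,d} {word}" for word, n in ordered]


lists = [["you", "yes", "zone"], ["Zone", "you", "x-ray"], ["you", "weapon", "you"]]
for line in owl_lines(build_owl(lists)):
    print(line)
```

The `09,d` format spec zero-pads to nine characters with comma grouping. That yields the '0,000,025' style seen in the listing.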

The 25 wordlists that make up OWL:

000,182,603 bytes; Dictionary of American Idioms and Phrasal Verbs.pdf.wrd.sorted
000,215,579 bytes; Dictionary of Contemporary Slang.pdf.wrd.sorted
000,233,582 bytes; OXFORD Collocations Dictionary.wrd.sorted
000,268,457 bytes; The Oxford Dictionary of Slang.wrd.sorted
000,300,216 bytes; Chrisomalis_phrontistery.info_29755.wrd.sorted
000,333,541 bytes; Websters-Dictionary-of-English-Usage.pdf.wrd.sorted
000,355,146 bytes; dictionary of historical slang.pdf.wrd.sorted
000,384,499 bytes; Webster's New Dictionary of Synonyms (1984).pdf.wrd.sorted
000,388,308 bytes; The Oxford Thesaurus, An A-Z Dictionary of Synonyms.wrd.sorted
000,398,554 bytes; Longman Dictionary of American English, Special Edition.pdf.wrd.sorted
000,411,462 bytes; The Routledge Dictionary of Modern American Slang.pdf.wrd.sorted
000,695,541 bytes; mthesaur.wrd.sorted
000,720,733 bytes; Dictionary of American English.pdf.wrd.sorted
000,740,179 bytes; RHW_mpron.wrd.sorted
000,889,414 bytes; EuroDict XP 3.0 _ MacroMagic41r_r02_DOS.wrd.sorted
001,779,419 bytes; HERITAGE.wrd.sorted
002,208,921 bytes; SOED-LATINless_STRESSless_for_manual_pick.wrd.sorted
002,651,685 bytes; SOED.wrd.sorted
004,801,669 bytes; OWW.wrd.sorted
005,502,365 bytes; RIDYHEW_The_RIDiculouslY_Huge_English_Wordlist.wrd.sorted
005,645,902 bytes; WORDLIST_source_18_various_wordlists.wrd.sorted
016,720,903 bytes; keithv_com_wlist_match1.wrd.sorted
058,445,626 bytes; 4andabove_Gamera17LBL.1.txt.sorted
103,845,200 bytes; googlebooks-eng-all-1gram-20120701_5038456_words_sorted-by-1st-field.txt.sorted
590,469,840 bytes; 1andabove_enwiki-20121201-pages-articles.1.txt.sorted

In a few words: when the word you need does not appear in MASAKARI, simply search for it in OWL, and if OWL fails, try ... the INTERNET.
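The MASAKARI-then-OWL-then-Internet order can be written as a small lookup cascade. Here is a hypothetical sketch. It assumes MASAKARI is loaded as a set of lowercase words and OWL as a dict parsed from its 'count word' lines. `parse_owl` and `probe` are illustrative names only:

```python
def parse_owl(lines):
    """Parse OWL lines such as '0,000,025 zone' into {word: number_of_wordlists}.
    Assumption: exactly two whitespace-separated fields per line."""
    owl = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            count, word = parts
            owl[word] = int(count.replace(",", ""))
    return owl


def probe(word, masakari, owl):
    """Return (where_found, number_of_wordlists), trying MASAKARI first,
    then OWL, then falling back to the Internet; the count is only known for OWL."""
    w = word.lower()
    if w in masakari:
        return ("MASAKARI", None)
    if w in owl:
        return ("OWL", owl[w])
    return ("INTERNET", None)


owl = parse_owl(["0,000,025 zone", "0,000,007 weaponized"])
print(probe("weaponized", {"zone"}, owl))  # ('OWL', 7)
```

Loading all 32 million OWL entries into a Python dict would likely need several gigabytes of RAM. For the full file, a streaming scan or an on-disk index is probably the kinder choice.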

Back to the '*weapon*' pattern; OWL provides the following:

[*weapon*] 0,000,025 weapon /OWL.wrd.sorted/
[*weapon*] 0,000,023 weapons /OWL.wrd.sorted/
[*weapon*] 0,000,007 weaponsmith /OWL.wrd.sorted/
[*weapon*] 0,000,007 weaponized /OWL.wrd.sorted/
[*weapon*] 0,000,007 unweaponed /OWL.wrd.sorted/
[*weapon*] 0,000,007 superweapon /OWL.wrd.sorted/
[*weapon*] 0,000,017 weaponry /OWL.wrd.sorted/
[*weapon*] 0,000,014 weaponless /OWL.wrd.sorted/
[*weapon*] 0,000,006 weaponries /OWL.wrd.sorted/
[*weapon*] 0,000,006 weaponization /OWL.wrd.sorted/
[*weapon*] 0,000,006 superweapons /OWL.wrd.sorted/
[*weapon*] 0,000,011 weaponeer /OWL.wrd.sorted/
[*weapon*] 0,000,011 weaponed /OWL.wrd.sorted/
[*weapon*] 0,000,006 nonweapons /OWL.wrd.sorted/
[*weapon*] 0,000,006 nonweapon /OWL.wrd.sorted/
[*weapon*] 0,000,005 weaponsmiths /OWL.wrd.sorted/
[*weapon*] 0,000,005 weaponmaking /OWL.wrd.sorted/
[*weapon*] 0,000,005 weaponmaker /OWL.wrd.sorted/
[*weapon*] 0,000,005 weaponize /OWL.wrd.sorted/
[*weapon*] 0,000,005 weaponing /OWL.wrd.sorted/
[*weapon*] 0,000,005 weaponeers /OWL.wrd.sorted/
[*weapon*] 0,000,005 weaponeering /OWL.wrd.sorted/
[*weapon*] 0,000,005 unweapon /OWL.wrd.sorted/
[*weapon*] 0,000,004 outweaponed /OWL.wrd.sorted/
[*weapon*] 0,000,004 nuclearweapons /OWL.wrd.sorted/
[*weapon*] 0,000,005 bioweapon /OWL.wrd.sorted/
[*weapon*] 0,000,004 weaponsmithing /OWL.wrd.sorted/
[*weapon*] 0,000,004 weaponsmaster /OWL.wrd.sorted/
[*weapon*] 0,000,004 weaponshaw /OWL.wrd.sorted/
[*weapon*] 0,000,004 weaponproof /OWL.wrd.sorted/
[*weapon*] 0,000,004 weaponlike /OWL.wrd.sorted/
[*weapon*] 0,000,004 weaponizing /OWL.wrd.sorted/
[*weapon*] 0,000,004 weaponised /OWL.wrd.sorted/
[*weapon*] 0,000,004 weaponisation /OWL.wrd.sorted/
[*weapon*] 0,000,004 weaponary /OWL.wrd.sorted/
[*weapon*] 0,000,004 handweapons /OWL.wrd.sorted/
[*weapon*] 0,000,003 weapony /OWL.wrd.sorted/
[*weapon*] 0,000,003 weaponsmithy /OWL.wrd.sorted/
[*weapon*] 0,000,003 weaponshowing /OWL.wrd.sorted/
[*weapon*] 0,000,003 weaponshow /OWL.wrd.sorted/
[*weapon*] 0,000,003 weaponmaster /OWL.wrd.sorted/
[*weapon*] 0,000,003 weaponmakers /OWL.wrd.sorted/
[*weapon*] 0,000,003 weaponizable /OWL.wrd.sorted/
[*weapon*] 0,000,003 weaponi /OWL.wrd.sorted/
[*weapon*] 0,000,003 weaponery /OWL.wrd.sorted/
[*weapon*] 0,000,003 weaponcraft /OWL.wrd.sorted/
[*weapon*] 0,000,003 underweapon /OWL.wrd.sorted/
[*weapon*] 0,000,004 cyberweapons /OWL.wrd.sorted/
[*weapon*] 0,000,004 counterweapons /OWL.wrd.sorted/
[*weapon*] 0,000,004 bioweapons /OWL.wrd.sorted/
[*weapon*] 0,000,004 beweaponed /OWL.wrd.sorted/
[*weapon*] 0,000,003 nuclearweapon /OWL.wrd.sorted/
[*weapon*] 0,000,003 murderweapon /OWL.wrd.sorted/
[*weapon*] 0,000,003 megaweapons /OWL.wrd.sorted/
[*weapon*] 0,000,003 overweaponed /OWL.wrd.sorted/
[*weapon*] 0,000,003 outweapon /OWL.wrd.sorted/
[*weapon*] 0,000,003 cyberweapon /OWL.wrd.sorted/
[*weapon*] 0,000,003 counterweapon /OWL.wrd.sorted/
[*weapon*] 0,000,003 handweapon /OWL.wrd.sorted/
[*weapon*] 0,000,003 chemicalweapons /OWL.wrd.sorted/
[*weapon*] 0,000,003 bioweaponry /OWL.wrd.sorted/
[*weapon*] 0,000,003 biologicalweapons /OWL.wrd.sorted/
[*weapon*] 0,000,003 fireweapons /OWL.wrd.sorted/
[*weapon*] 0,000,003 aweapon /OWL.wrd.sorted/
[*weapon*] 0,000,003 antiweapons /OWL.wrd.sorted/
[*weapon*] 0,000,003 disweapon /OWL.wrd.sorted/
[*weapon*] 0,000,002 thunderweapon /OWL.wrd.sorted/
[*weapon*] 0,000,002 wonderweapons /OWL.wrd.sorted/
[*weapon*] 0,000,002 theweapons /OWL.wrd.sorted/
[*weapon*] 0,000,002 theweapon /OWL.wrd.sorted/
[*weapon*] 0,000,002 weaponx /OWL.wrd.sorted/
[*weapon*] 0,000,002 weapontake /OWL.wrd.sorted/
[*weapon*] 0,000,002 weaponsystems /OWL.wrd.sorted/
[*weapon*] 0,000,002 weaponstests /OWL.wrd.sorted/
[*weapon*] 0,000,002 weaponsrelated /OWL.wrd.sorted/
[*weapon*] 0,000,002 weaponsman /OWL.wrd.sorted/
[*weapon*] 0,000,002 weaponsmaking /OWL.wrd.sorted/
...

Apparently even OWL is not rich enough: there is no trace of 'NONweaponizable' (grmbl), and 'UNweaponizable'/'DEweaponizable'/'SUPERweaponizable' are missing as well.
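A '*weapon*' query like the one above can be approximated with shell-style wildcard matching. This sketch uses Python's fnmatchcase ('*' matches any run of characters, case-sensitive) over hypothetical (count, word) pairs and prints lines in the same '[pattern] count word /source/' style. The listing above is not strictly ordered, so sorting by count and then alphabetically is a choice of this sketch, not of the original tool.

```python
from fnmatch import fnmatchcase


def format_count(count):
    """Seven-digit, comma-grouped count, e.g. 25 -> '0,000,025'."""
    d = f"{count:07d}"
    return f"{d[0]},{d[1:4]},{d[4:]}"


def wildcard_search(pattern, entries, source):
    """Return report lines for entries whose word matches a shell pattern.

    entries: (count, word) pairs. Hits are sorted by count (descending),
    then alphabetically.
    """
    hits = [(c, w) for c, w in entries if fnmatchcase(w, pattern)]
    hits.sort(key=lambda cw: (-cw[0], cw[1]))
    return [f"[{pattern}] {format_count(c)} {w} /{source}/" for c, w in hits]


if __name__ == "__main__":
    owl = [(25, "weapon"), (7, "superweapon"), (25, "yet"), (4, "bioweapons")]
    for line in wildcard_search("*weapon*", owl, "OWL.wrd.sorted"):
        print(line)
```

A narrower pattern such as '*weaponizable' is a quick way to confirm that a prefixed form like 'nonweaponizable' is absent: an empty result means no entry ends that way.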

Another thing: to show how fast and how tightly the built-in BSC (aka GRAFFITH, mapped to the 'GRAFFITHize' button) compresses, here is a shootout on 1.04GB of text:

TANGELO smashed the uncompressed .tar file (WORDLIST_tree.tar, 1,122,381,824 bytes) down to 144,244,570 bytes; it is a tiny, extra-high-performance file compressor, the tightest text compressor known to me.
Many thanks go to the authors, Jan Ondrus and Matt Mahoney, and to the co-authors: Serge Osnach, Alexander Ratushnyak, Bill Pettis, Przemyslaw Skibinski, Matthew Fite, wowtiger, Andrew Paterson, Andreas Morphis, Pavel L. Holoborodko, KZ., Simon Berger, Neill Corlett, Marwijn Hessel, Mat Chartier.

In some aspects (not speed) TANGELO trumps GRAFFITH (BSC, written by Ilya Grebnov, Michael Schindler and Yuta Mori), yet my [de]compressor of choice is GRAFFITH.
To show how amazingly well they compress textual data, here is the showdown:

TANGELO:
144,244,570 bytes in 1255 seconds (single-threaded)
BSC:
155,114,884 bytes in 144 seconds (hands down, with multi-threading OFF)
7zip:
188,923,637 bytes in 740 seconds (with 2 threads, maximum LZMA2)
WinRAR:
265,598,696 bytes in 316 seconds (with 1 thread, maximum zip)

Quite obviously, BSC/GRAFFITH dominates MONSTROUSLY in the speed department.
Since I target 1000-9000GB of such data, multiplying by 1000 reveals why 144x1000 seconds matter.
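The shootout numbers can be turned into compression ratios, bits per input byte and throughput with a few lines of arithmetic. The sizes and times below are copied from the results above; "MB/s" here means 10^6 bytes per second of input, and the x1000 projection naively assumes time grows linearly with data volume.

```python
ORIGINAL_BYTES = 1_122_381_824  # WORDLIST_tree.tar

# compressor -> (compressed bytes, seconds), copied from the shootout above
RESULTS = {
    "TANGELO": (144_244_570, 1255),
    "BSC": (155_114_884, 144),
    "7zip": (188_923_637, 740),
    "WinRAR": (265_598_696, 316),
}


def summarize(original, results, scale=1000):
    """Return (name, ratio, bits_per_byte, MB_per_s, hours_at_scale) rows.

    ratio is compressed/original; MB/s is input bytes / seconds / 10**6;
    hours_at_scale assumes time grows linearly with data volume.
    """
    rows = []
    for name, (size, seconds) in results.items():
        ratio = size / original
        rows.append((name, ratio, 8 * ratio, original / seconds / 1e6,
                     seconds * scale / 3600))
    return rows


if __name__ == "__main__":
    for name, ratio, bpb, mbps, hours in summarize(ORIGINAL_BYTES, RESULTS):
        print(f"{name:8} {ratio:7.2%} {bpb:5.3f} bits/byte "
              f"{mbps:5.2f} MB/s  x1000: {hours:7.1f} h")
```

On these figures TANGELO ends up at roughly 12.9% of the original size and BSC at roughly 13.8%, while BSC runs close to 9 times faster; scaled by 1000, that is 40 hours for BSC versus more than two weeks for TANGELO.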

Saying thanks to these guys is the least I can do.
Or, as one unforgettable character (full of personality) from the Russian/Georgian/Armenian classic 'MIMINO' says:
'When he feels gladdened then I will feel that I am gladdened too. When I am gladdened then I will deliver in such a way that you too will be gladdened.'

This wisdom I call gratus-chain-reaction.
[#gratus, grata -um, gratior -or -us, gratissimus -a -um ADJ [XXXAX] :: pleasing, acceptable, agreeable, welcome; dear, beloved; grateful, thankful;]

And one other thing, which sleepETH on my shelves for 2 decades: a classic book waiting to be fully ripped/proofed at the 1-gram level.
In the next example I gun down two targets with one shot:
- enriching the MASAKARI wordlist with all the words (mostly archaic) from 'Thus Spake Zarathustra' which, by the way, are unfindable in HERITAGE & SOED;
- obtaining full control (proofing, counting) over the words by creating a must-have index of all the words in that classic book.

The second goal is a rainbringer (à la 'rainmaker'); it comes as a saviour: it is the vocabulary of that book, a precious appendix every decent book ought to offer.
So, revision 5 contains all of Zarathustra's words; it feels like regaining myself, i.e. the damage is undone.

And a flashback from September 25, 2012:

...
"The sun is lost at noon -- at noon!
The dread o' doom has grippit me.
True Thomas, hide me under your cloak,
God wot, I'm little fit to dee!"
...
True Thomas played upon his harp,
That birled and brattled to his hand,
And the next least word True Thomas made,
It garred the King take horse and brand.
...
True Thomas sighed above his harp,
And turned the song on the midmost string;
And the last least word True Thomas made,
He harpit his dead youth back to the King.
...

/The Last Rhyme of True Thomas -- Rudyard Kipling/

Agh, five of the words above (shown in bold in the original post) were unfamiliar to MASAKARI revision 4; not anymore.

SilvatungdaViel, in an old (year-old) post, wrote:
Henry James said: "Kipling strikes me personally as the most complete man of genius (as distinct from fine intelligence) that I have ever known."[5] In 1907 he was awarded the Nobel Prize in Literature, making him the first English-language writer to receive the prize, and to date he remains its youngest recipient.[8] Among other honours, he was sounded out for the British Poet Laureateship and on several occasions for a knighthood, all of which he declined."
Thanks again, SilvatungdaViel; I knew that Kipling's vocabulary has to be included at all costs in each and every decent wordlist. To be done.

And to load the fructiere with another tasty specimen: the last word added to revision 5 is 'Scalacronica', taken from:
The first prediction of Thomas of Erceldoune's recorded in a manuscript is dated to before 1320, and he is referred to with other soothsayers in the Scalacronica, a French chronicle of English history begun in 1355.
/http://myths.e2bn.org/mythsandlegends/story530-thomas-the-rhymer-and-the-queen-of-elfland.html/

Finally, not a single dictionary/wordlist I tried contains 'fructiere', another beautiful word widely used in Bulgarian; the analog 'fruit dish' is so blunt, so untasty, and partially wrong, since it could just as well be a 'fruit basket'.
Strange: I thought it was purely French in origin, yet for some reason the 2-in-1 French dictionary I checked (Collins-Robert Unabridged French Dictionary and the two-volume Collins-Robert Comprehensive French Dictionary) recognizes it not.
Check this out:
The tables in the royal hall drew everyone's attention; the reason: fructieres overfilled with unseen exotic fruits from all over the world.

The 691 added words from Zarathustra are:

accomplisheth, acquitteth, addedst, adjoineth, ageth, alightest, allureth, alpa, amissing, annoyeth, appealeth, approachedst, ariseth, askest, asketh, astutest, attacketh, attractest, awakeneth, awaketh, awokest, backworlds, backworldsmen, barketh, basketh, batheth, bawlers, beameth, beareth, beckoneth, becomer, becometh, befooleth, beggeth, belauded, believedst, bendeth, bestoweth, bethinkst, bewitcheth, bindress, bitest, biteth, bleedest, blesseth, blinketh, blockest, bloometh, blossometh, bloweth, blushedst, blushest, bobbeth, boometh, boundeth, breaketh, brewedst, brighteneth, bringeth, broileth, bubbleth, buildeth, burneth, burroweth, bursteth, buyeth, cackleth, calledst, callest, calleth, camest, careth, carriedst, carrieth, casteth, causeth, ceaseth, celebrateth, chancest, chanteth, chastiseth, cheweth, choketh, choosest, christeneth, circumambling, claimeth, clambereth, cleareth, cleaveth, climbeth, clingeth, collecteth, comest, commenceth, compareth, compelleth, concealeth, concerneth, conducteth, congratulatingly, conquereth, constraineth, contemneth, contradicteth, convinceth, convulseth, counselleth, covereth, cozeners, cravest, createth, creepeth, crieth, croucheth, crowdst, crusheth, curers, cureth, curseth, cutteth, danceth, dangerousest, dareth, decayeth, declineth, defileth, demandeth, deserveth, desirablest, desirest, desireth, destroyeth, devisedst, devisers, deviseth, devoureth, dieth, discloseth, discoveredst, disliketh, dispenseth, disposeth, distinguisheth, distorteth, distributest, distrusteth, divinedst, divineth, downfell, downsickling, downsunken, draggeth, draweth, dreadfulest, dreamest, dreameth, drinkest, drinketh, drippeth, drudgeth, dudu, dwelleth, eagerer, eatenness, eateth, effecteth, emancipateth, enchaineth, endeavoureth, endeth, engulfeth, enheartened, enjoineth, enjoyers, enjoyeth, enlinked, enmantling, ennobleth, enraptureth, enticeth, enviers, erflowing, erhangeth, erhearst, erhung, erleap, ershadowed, erspan, erswelled, 
erthrowers, erthrowing, erthrown, escapeth, evaluing, evenwards, exalteth, exciteth, experienceth, extolleth, falleth, fangledly, fearest, feareth, feedeth, feelest, feeleth, feigneth, felleth, fens, festereth, fetcheth, fickly, findeth, findress, fisheth, fledst, fleeth, flieth, flingeth, flitteth, floateth, floweth, fluttereth, foameth, foolest, forcest, forceth, forgottest, frantical, freezeth, frighteneth, frotheth, gazest, gazeth, givest, giveth, glanceth, gleameth, glideth, glinteth, glittereth, glorifieth, glowest, gloweth, glowings, gnashest, gnaweth, goader, goest, goeth, goldlike, graspeth, grewest, grindest, groaneth, gropest, groweth, grumbleth, gurgleth, gusheth, halloweth, haltfoot, hangeth, harmfulest, hatcheth, hateth, haunteth, hazar, healeth, hearest, heareth, hearkenest, hearst, heldest, hesitateth, hidest, hideth, holdeth, hoppeth, howleth, huggeth, humaner, hurteth, impelleth, implantedst, implieth, indicateth, inflameth, inflicteth, injurers, injureth, inspireth, interpreteth, interrogateth, inventeth, investigatest, inwindress, irritateth, itcheth, jingleth, jubilators, killeth, kisseth, kneeleth, knewest, knocketh, knowest, knoweth, lacketh, lambeyed, lambkins, lambsheep, landkeepers, lastling, laugheth, layest, layeth, leadeth, leapeth, learnedst, learneth, leaveth, letteth, liest, lieth, liketh, limpeth, listeth, livest, liveth, loadeth, lonesomer, longers, longeth, lookedst, looketh, lovest, loveth, lowereth, lurement, lurketh, madcaps, maidlike, makest, maketh, mastereth, maws, meaneth, measureth, meditateth, meeteth, melanopic, mightest, milkedst, milketh, misclimbing, misflown, misleadeth, miswandering, mixeth, mocketh, molluscs, mounteth, mouthlets, movedst, moveth, museth, mustified, needeth, nigher, obeyers, obeyeth, obscureth, odoured, openeth, oppresseth, ordereth, originateth, outchamped, outchewed, overawake, overcomings, overtaketh, overwakeful, oweth, painteth, paintpots, panteth, passedst, passeth, peccabler, peereth, 
perishest, persecuteth, persuadest, pipeth, plaints, playeth, pleadeth, pleasest, pleaseth, plucketh, plungeth, poetisation, poureth, powerfulest, practisest, praisest, prateth, pratings, prayeth, preferreth, preoccupieth, presentientest, pressest, presseth, preventeth, pricketh, proclaimest, proclaimeth, professest, promiseth, proposeth, pulledst, pullest, purchaseth, purposer, purselets, pusheth, putteth, pyres, quadrivocal, quaketh, quarrelleth, quieteth, quinquivocal, rageth, ranunculine, rattleth, raveth, reachest, realer, reawakeneth, reblinking, recheweth, recommuning, redeemeth, refraternising, refresheth, refuteth, reinterpreteth, relieveth, remindeth, removeth, rendeth, renounceth, repeateth, reposeth, repudiateth, requiteth, resemblest, resembleth, resideth, resoundeth, resteth, reverers, revolveth, ridest, rideth, ringeth, risest, riseth, risketh, roareth, rolleth, roundsphinxed, roveth, roweth, rubbeth, ruineth, ruleth, runnest, runneth, rusheth, saidest, saidst, saileth, sattest, saveth, sawest, scarers, scareth, scath, scattereth, scratcheth, screameth, sealike, secureth, seducest, seduceth, seekest, seeketh, seekress, seekst, seemest, seemeth, seest, seeth, seizeth, seldomer, separatest, servedst, servest, serveth, shadowwards, shakest, shaketh, shattereth, sheltereth, shinest, shineth, shirted, shocketh, shooteth, shouteth, showest, showeth, sickeneth, sigheth, singeth, sinketh, sittest, sitteth, skulkers, slayeth, sleepeth, slumbereth, smellest, smelleth, smoketh, sneezeth, snortest, soareth, soothedly, soughtest, soullet, soundeth, souths, sowers, spakest, spareth, sparkleth, speakest, speaketh, speweth, spinnest, spitteth, spittles, spoilest, spoileth, sprites, spurnest, spurnings, squatteth, staketh, standeth, stealest, stealeth, steameth, stifleth, stingeth, stinketh, stirrest, stoodest, stoodst, stormeth, strangleth, stretchest, strikest, striketh, striveth, struggleth, succeedeth, succumbeth, sufferest, superearths, supermen, surmiseth, 
surmountings, surpasseth, surrendereth, swalloweth, tainteth, takest, taketh, talketh, tamers, tappeth, tasteth, teachest, teacheth, telleth, temptest, tempteth, tenderlings, thanketh, thinkest, thinketh, thirstedest, thresheth, threwest, thrilleth, throbbeth, throweth, thrusteth, tickleth, tireth, tookest, torturest, tosseth, tottereth, toucheth, trafficketh, travelleth, treaders, treadest, trembleth, trieth, triumphanter, trivocal, trusteth, tuggeth, turneth, udders, unfruitfuller, uniteth, unlearneth, unrealisedness, unrollable, unslept, upbreaketh, uplifteth, usest, valueth, vanquisheth, veileth, viewedst, wags, wailedst, waiteth, walkest, walketh, wantest, wanteth, wantoneth, wantst, warmeth, warnedst, washeth, watcheth, waxeth, waylayest, wearieth, weaveth, weigheth, welleth, wendings, wentest, wheezest, wheezeth, whimperers, whineth, whisketh, whitewasheth, wildboar, willers, willest, willeth, willless, windeth, winneth, wishest, wisheth, withstandest, worketh, wrappeth, writeth, writheth, yawneth, yieldeth

Does anyone see a misspelled word in the above list?!

"... Ah! How it sigheth! How it laugheth in its dream! The old, deep, deep midnight! Hush! Hush!
Then is there many a thing heard which may not be heard by day; now however, in the cool air, when even all the tumult of your hearts hath become still, -
- Now doth it speak, now is it heard, now doth it steal into overwakeful, nocturnal souls:
Ah! Ah! How the midnight sigheth! How it laugheth in its dream!
- Hearest thou not how it mysteriously, frightfully, and cordially speaketh unto THEE, the old deep, deep midnight?
O MAN, TAKE HEED!"


Maybe Thomas Common coined 'overwakeful' (SOED has it not); its cousin 'oversleepful' is nowhere to be found, though.

'TAKE HEED ye oversleepful!', tee-hee.

And one final practical example:

It took me 9 seconds to spell-check the 584KB (182 pages in length) 'Tui na - A manual of Chinese massage therapy.pdf.txt' using the 'MASAKARI 1-gram checking' button.
It took me 15 minutes to skim the 'Tui na - A manual of Chinese massage therapy.pdf.txt.html' in Firefox.
At the very beginning a misspelled word appeared:
... NONE OF THE SUPPLIERS MAKE ANY REPRESENTIONS OR WARRANTIES ...
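A 1-gram check of this kind boils down to tokenizing each line and reporting tokens the wordlist does not know. This is a hypothetical sketch, not the MASAKARI button's actual logic; it accepts a token if either it or its lowercase form is listed, so sentence-initial capitals and ALL-CAPS legalese like the line above still get checked.

```python
import re

TOKEN = re.compile(r"[A-Za-z]+")  # assumption: tokens are runs of ASCII letters


def spell_check(text, wordlist):
    """Yield (line_number, token) for tokens not found in wordlist.

    A token is accepted if it, or its lowercase form, is in wordlist.
    """
    for line_no, line in enumerate(text.splitlines(), start=1):
        for tok in TOKEN.findall(line):
            if tok not in wordlist and tok.lower() not in wordlist:
                yield line_no, tok


if __name__ == "__main__":
    words = {"none", "of", "the", "suppliers", "make", "any", "or", "warranties"}
    sample = "... NONE OF THE SUPPLIERS MAKE ANY REPRESENTIONS OR WARRANTIES ..."
    for line_no, tok in spell_check(sample, words):
        print(line_no, tok)
```

Reporting line numbers makes it easy to jump to each hit, which is what turns a 15-minute skim into a targeted review.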

For 'representions' with a Levenshtein distance of up to 3, the hits are:
presention
presentions
pretentions
preventions
reprehension
representant
representants
representation
representations
representers
representing
representment
representor
represents
repressions
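The Levenshtein distance counts the minimum number of single-character insertions, deletions and substitutions turning one word into another: 'representions' is 2 edits from 'representations' (insert "at") and 3 from 'presention' (drop "re" and the final "s"). A standard two-row dynamic-programming version, plus a hypothetical near_misses filter over a wordlist, can be sketched as:

```python
def levenshtein(a, b):
    """Edit distance via dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter word in the row
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def near_misses(word, wordlist, max_dist=3):
    """Sorted wordlist entries within max_dist edits of word."""
    # The distance is at least the length difference, a cheap pre-filter.
    return sorted(w for w in wordlist
                  if abs(len(w) - len(word)) <= max_dist
                  and levenshtein(word, w) <= max_dist)
```

The length pre-filter matters with large lists, since the full comparison costs time proportional to the product of the two word lengths.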


Grmbl, r.5 has two buggy words:
presention
presentions


Caramba, these two bugs crept into revision 1 despite all of my screenings.

If you want to tonify, continue to use gentle kneading in a clockwise direction.
/Tui na - A manual of Chinese massage therapy/

Tai chi Mo fa is used mainly on the belly and applied in a clockwise direction for strengthening, tonifying and warming.
/Tui na - A manual of Chinese massage therapy/

When applied to the lower back, it tonifies the Kidneys, and strengthens and warms Mingmen.
/Tui na - A manual of Chinese massage therapy/

Generally, heavy striking is applied to young, strong adults and to patients with Shi conditions, although there are cases today in a modern Western practice when you have a mixed Shi and Xu condition such as multiple sclerosis (MS) and the Excess must be cleared before tonification.
/Tui na - A manual of Chinese massage therapy/

The following 9 words are to be added in r.6:
tonify, tonifying, tonifies, tonification, woodlock, Shaolin, ribcage, pronated, fibroids
The following 2 words are to be removed in r.6:
presention, presentions

Obviously, spell-checking is a two-way process: you proof your text, but at the same time (rarely, though) you proof your proofer.

He learns not to learn and reverts to what all men pass by.
Sanmayce
Posted: Sunday, September 22, 2013 12:46:28 PM

It's time to update MASAKARI to revision 6:

After browsing the spell-checked 'The Little Prince':
baobabs
tipplers


After browsing the spell-checked 'Alice’s Adventures in Wonderland and Through the Looking Glass':
comfits
Cheshire
mops
realler
(drags behind: 'reallest')

- `I AM real!' said Alice and began to cry.
- `You won't make yourself a bit realler by crying,' Tweedledee remarked: `there's nothing to cry about.'
- `If I wasn't real,' Alice said--half-laughing though her tears, it all seemed so ridiculous--`I shouldn't be able to cry.'


twos

The next moment soldiers came running through the wood, at first in twos and threes, then ten or twenty together, ...

phantomwise

After browsing SOED for '*wise':
alwise
archwise
biaswise
bitwise (drags behind: 'bytewise')
dropwise
fretwise
mantlewise
netwise
pairwise
plotwise
ringwise
slaunchwise
spanwise
steeplewise
tailwise
tentwise
thiswise
thuswise
timewise


After browsing SOED for '*esque':
Adamesque
Beatlesque
carnivalesque
cigaresque
Correggiesque
Guignolesque
Debussyesque
gardenesque
Gandhiesque
goblinesque
griffinesque
harlequinesque
Holbeinesque
Italianesque
jargonesque
Latinesque
Lowryesque
Lutyenesque
Macaulayesque
madrigalesque
McLuhanesque
Mittyesque
naturalesque
opalesque
Orcagnesque
Ouidaesque
Pateresque
Picassoesque
Praedesque
Pucciniesque
Rackhamesque
Ramboesque
Reaganesque
Renoiresque
Rousseauesque
sermonesque
Sheridanesque
Soanesque
sultanesque
tabloidesque
Tarzanesque
Tudesque
Thatcheresque
Tiepolesque
Titanesque
Titianesque
Tolkienesque
Tristanesque
Trumanesque
Tudoresque
Verlainesque
Verrocchiesque
Villonesque
Watteauesque
Wyattesque
Zolaesque
zombiesque


In Bulgarian, adjectives of this kind are formed thiswise, except that instead of '-SKK' the postfix '-SKI' is used (soundwise, of course).

timewise, adverb.
M20.
[from TIME noun + -WISE.]
With regard to time.


What an inconsistency: SOED has 'Poe-esque', but the variant 'Poeesque' is missing!
Lacking 'Paganiniesque' sucks. Obviously the SOED staff prefer 'Puccini', he-he.

In one interview Sam Obernik said that she loves 'a Moroderesque disco thing'; indeed, Giorgio Moroder is so talented. So, three more to be added:
Paganiniesque
Moroderesque
Obernikesque

After browsing SOED for '*esquely', only the following 7 hits appeared:
burlesquely
grotesquely
Kiplingesquely
picturesquely
sculpturesquely
statuesquely
unpicturesquely


Good, good, 'Kipling' is well covered:
- Kiplingesque adjective resembling Kipling in style L19.
- Kiplingesquely adverb M20.
- Kiplingism noun views or style of expression characteristic of Kipling L19.
- Kiplingite noun & adjective (a) noun an admirer or student of Kipling or his work; (b) adjective characteristic of Kipling: L19.
- Kiplingize verb trans. make like Kipling E20.

but ... 'Kiplingist(s)' they forgot.

SOED holds these 'Hitler' derivatives:
- Hitlerian adjective of, pertaining to, or characteristic of Hitler M20.
- Hitlerism noun the political principles or policy of the Nazi party in Germany M20.
- Hitlerist noun & adjective (a) noun a follower of Hitler; (b) adjective resembling Hitler or his followers: M20.
- Hitleristic adjective somewhat Hitlerist M20.
- Hitlerite noun & adjective = Hitlerist M20.


SOED forgets 'HitleristS', which is quite widely used in Russian.

Interesting: yet another postfix, '-ESE', exists:
- Ruskinese noun & adjective (a) noun the literary style and characteristics of Ruskin; (b) adjective = RUSKINIAN adjective: M19.
- Ruskinesque adjective & noun (a) adjective characteristic of Ruskin or his writings; (b) noun the style of art or architecture favoured by Ruskin: M19.
- Ruskinism noun the principles and theories of Ruskin M19.
- Ruskinite noun & adjective = RUSKINIAN L19.


Also, '-ISH' and '-Y' postfixes:
- Tudoresque adjective characteristic of the Tudor period; in or resembling the Tudor style: M19.
- Tudorish adjective = Tudoresque M20.
- Tudorize verb trans. make Tudor in form or character, esp. build or renovate in this style (chiefly as Tudorized ppl adjective.) E20.
- Tudory noun & adjective (freq. joc. & derog.) (a) noun Tudor architecture or decoration; (b) adjective imitative or suggestive of Tudor style: M20.


I have always wondered why only a few of the possible adjectives are in full use while the rest are not, for example:
Hellene
HellenIAN
HellenISM
HellenIST/HellenISTS
HellenIC/HellenISTIC/HellenISTICAL
And of course: 'Hellene-esque'
How about: HellenESE/HellenISH/HellenY

It would be worthwhile for someone versed in both English and Greek to list all the possible 'Hellen*' words; an interesting juggling act it would be!
Oops, it would not be unabridged unless the list also had the 'ANTI-', 'NON-', 'PRO-', ... prefixed variants.
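Enumerating such a family mechanically is a cross product of prefixes, stem and suffixes, followed by a screening step that keeps only the attested forms. In this sketch, derivatives and attested are hypothetical helpers, the affix lists are illustrative, and corpus_words stands in for whatever corpus or wordlist does the screening; plain concatenation also ignores spelling adjustments at the joins.

```python
from itertools import product


def derivatives(stem, prefixes, suffixes):
    """All prefix+stem+suffix combinations, including the bare forms, sorted."""
    return sorted({p + stem + s
                   for p, s in product([""] + prefixes, [""] + suffixes)})


def attested(candidates, corpus_words):
    """Keep only candidates that actually occur in a corpus or wordlist."""
    return [c for c in candidates if c in corpus_words]


if __name__ == "__main__":
    cands = derivatives("hellen", ["anti", "non", "pro"],
                        ["e", "ian", "ism", "ist", "ists", "ic", "istic",
                         "istical", "ese", "ish", "y"])
    print(len(cands), "candidates")
    print(attested(cands, {"hellene", "hellenic", "hellenism", "antihellenic"}))
```

With 3 prefixes and 11 suffixes this already yields 4 x 12 = 48 candidates, which is why the screening step (by corpora or by fellow word lovers) does most of the real work.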

Deleted in r.6:
stravagant
stravagante
stravaganza

As a "byproduct" of peeking at SOED's sub-entries, a long-elusive postfix popped up: '-RY':
- goblinize verb trans. (rare) change into a goblin E19.
- goblinry noun the practices of a goblin or goblins E19.


Why stay in limbo? Let's see all of SOED's '*NRY' entries:
aldermanry
almonry
archdeaconry
baronry
blazonry
boffinry
cannonry
canonry
captainry
chaplainry
charlatanry
chieftainry
citizenry
companionry
cousinry
deaconry
emblazonry
falconry
felonry
Freemasonry
goblinry
heathenry
heronry
lemanry
mansionry
masonry
melonry
old-womanry
paganry
parsonry
pigeonry
puffinry
ribbonry
slovenry
sokemanry
stonemasonry
subdeaconry
sultanry
sylvanry
tartanry
wagonry
wardenry
weaponry
yeomanry


Once I tried to exhaust all 'leprechaun'-based words, and 'leprechaunRY' was among them:
leprechaun
leprechaundom similar to devildom noun (a) the dominion or rule of a devil or the Devil, exercise of diabolic power; (b) the domain of the Devil, the condition of devils: L17.
leprechaunerie similar to diablerie (my favorite by far)
leprechaunESE
leprechaunESQUE
leprechaunESQUEly
leprechauness similar to demoness (not mentioned in HERITAGE but with 5 occurrences in 5 corpora; I tend to believe that this creature is more enigmatic than the unicorn)
leprechaunesses similar to demonesses (not mentioned in HERITAGE but with 4 occurrences in 4 corpora; funnily, even the Smurfette was one of a kind, i.e. there were no other smurfesses, but who knows)
leprechaunhead similar to devilhead noun (long arch. rare) devilhood ME.
leprechaunhood similar to devilhood noun the condition and estate of a devil or the Devil E17.
leprechaunIAN
leprechaunic similar to diabolic/demonic
leprechaunical similar to diabolical
leprechaunically similar to diabolically
leprechaunicalness similar to diabolicalness
leprechaunicidal similar to tyrannicidal adjective E19.
leprechaunicide similar to tyrannicide, noun. M17. 1. The killing of a tyrant. M17. 2. A person who kills a tyrant. M17. - tyrannicidal adjective E19.
leprechaunics similar to humanics, noun. M19. [from HUMAN adjective & noun + -ICS.] The branch of knowledge that deals with human affairs.
leprechaunifiable
leprechaunification similar to humanify, verb trans. E17. [from HUMAN adjective + -I- + -FY.] Make human. - humanification noun L19.
leprechaunified similar to zombified adjective (colloq.) made into a zombie; dull, apathetic: L20.
leprechaunifier similar to beautifier
leprechaunifiers
leprechaunifies
leprechaunify similar to beautify, verb trans. E16. [from BEAUTY + -FY.] Make beautiful; adorn, embellish. - beautifier noun L16.
leprechaunifying
leprechaunifyings
leprechaunish
leprechaunishly similar to devilishly
leprechaunishness similar to devilishness
leprechaunism similar to devilism: noun, a system of action or conduct appropriate to a devil, devilish quality M17.
leprechaunIST
leprechaunISTS
leprechaunitarian similar to humanitarian, noun & adjective. E19. [from HUMANIT(Y + -ARIAN, after equalitarian, unitarian, etc.] A. noun. 1. A person believing in the humanity but not the divinity of Christ. Now rare. E19. ...
leprechaunitarianism similar to humanitarianism, noun: humanitarian principles or practice M19.
leprechaunitary similar to humanitary, adjective. rare. M19. [formed as HUMANITARIAN + -ARY1. Cf. French humanitaire.] 1. Of or pertaining to the human race. M19. 2. Humanitarian, philanthropic. L19.
leprechaunITE
leprechaunity similar to humanity, noun. LME. [Old & mod. French humanité from Latin humanitas, from humanus: see HUMAN, -ITY.] I. Rel. to HUMAN. 1. The quality, condition, or fact of being human. LME. ...
leprechaunizABILITY
leprechaunizABLE = leprechaunize+ABLE
leprechaunization similar to demonization
leprechaunize similar to demonize
leprechaunized similar to demonized
leprechaunizes similar to demonizes
leprechaunizing similar to demonizing
leprechaunkin similar to devilkin noun a little devil, an imp M18.
leprechaunkind similar to humankind, noun. L16. [from HUMAN adjective + KIND noun, after MANKIND.] The human race; = MANKIND noun 1.
leprechaunless similar to sailorless adjective having no sailors E19.
leprechaunlet similar to devilet noun (a) a little devil; (b) dial. the swift: L18.
leprechaunlets
leprechaunlike similar to devil-like adjective & adverb diabolical(ly) L15.
leprechaunling similar to godling, noun. Freq. joc. or derog. L16. [from GOD noun + -LING1.] A small or minor god; a representation of such a god. "R. Kipling: Till ye become little Gods again—Gods of the jungle—..Godlings of the tree."
leprechaunlings
leprechaunLY similar to humanely, adverb. Also †humanly. L15. [from HUMANE adjective + -LY2.] In a humane manner, compassionately. Formerly also = HUMANLY.
leprechaunment similar to devilment, noun. L18. 1. Action befitting a devil; mischief, wild spirits, reckless daring. L18. 2. A devilled dish of food. rare. L18. 3. A devilish device or invention, a devilish or strange phenomenon. L19.
leprechaunments similar to devilments (not mentioned in HERITAGE but with 3 occurrences in 3 corpora)
leprechaunness similar to humanness, noun. E18. [formed as HUMANLY + -NESS.] The quality, condition, or fact of being human.
leprechaunocracy similar to demonocracy noun the rule of demons M18.
leprechaunographer similar to demonographer noun a writer on demons M18.
leprechaunographers
leprechaunoid similar to zomboid adjective (colloq.) zombie-like L20.
leprechaunolater similar to iconolatry, noun. E17. [ecclesiastical Greek eikonolatreia, formed as ICONO-: see -LATRY.] The worship of religious images or icons. - iconolater noun a person who practises iconolatry M17.
leprechaunolaters
leprechaunolatries similar to iconolatries/demonolatries
leprechaunolatrous similar to demonolatrous adjective of, pertaining to, of the nature of, or practising demon-worship M19.
leprechaunolatry similar to demonolatry noun demon-worship M17.
leprechaunologic similar to demonologic
leprechaunological similar to demonological
leprechaunologist similar to demonologist
leprechaunologists similar to demonologists (not mentioned in HERITAGE but with 4 occurrences in 4 corpora)
leprechaunomachy similar to iconomachy, noun. L16. [from ecclesiastical Greek eikonomakhein to fight against images, formed as ICONO-: see -MACHY.] A war against images; hostility to images, esp. their use in worship.
leprechaunomancy similar to botanomancy, noun: (rare) divination by means of plants E17.
leprechaunomania similar to demonomania noun a mental illness in which the patient believes himself or herself possessed by an evil spirit M19.
leprechaunomaniac similar to demonomaniac adjective a person suffering from demonomania M19.
leprechaunomaniacs
leprechaunonology similar to demonology
leprechaunophile similar to xenophile
leprechaunophiles
leprechaunophilia similar to xenophilia
leprechaunophilous similar to xenophilous
leprechaunophobe similar to xenophobe
leprechaunophobes
leprechaunophobia similar to xenophobia
leprechaunophobic similar to xenophobic
leprechaunries similar to devilries (devilry=deviltry)
leprechaunry similar to goblinry, noun: the practices of a goblin or goblins E19.
leprechauns
leprechaunship similar to devilship noun (a) (chiefly joc. as a title) a person having the status of a devil; (b) the condition or quality of a devil; the position or office of devil: ME.
leprechaunwise
leprechaunY

Okay, 90 sister words so far. Since I am a big fan of Warwick Davis' Leprechaun hexalogy, I want them all; which ones am I missing?
With the help of other leprechaunophiles (to screen out the UNACCEPTABLES) I hope most of them will come to enrich the list; for now MASAKARI r.6 has only 3 of them:
[leprechaun] leprechaun /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[leprechaun] leprechaunism /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
[leprechaun] leprechauns /MASAKARI_General-Purpose_Grade_English_Wordlist.wrd/
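Most entries in the leprechaun list follow a "similar to" model word: swap the model's stem for 'leprechaun' and keep the rest (devildom gives leprechaundom, goblinry gives leprechaunry). That analogy can be sketched as below; by_analogy is a hypothetical helper, and the choice of model stem decides the connecting vowel (taking 'xen' rather than 'xeno' from 'xenophobia' is what yields 'leprechaunophobia').

```python
def by_analogy(target_stem, models):
    """Form new words by swapping each model word's stem for target_stem.

    models: (model_word, model_stem) pairs; model_stem must be a prefix
    of model_word. The remainder (the postfix) is kept unchanged.
    """
    out = []
    for word, stem in models:
        if not word.startswith(stem):
            raise ValueError(f"{stem!r} is not a prefix of {word!r}")
        out.append(target_stem + word[len(stem):])
    return out


if __name__ == "__main__":
    models = [("devildom", "devil"), ("goblinry", "goblin"),
              ("demonolatry", "demon"), ("humanitarian", "human"),
              ("xenophobia", "xen")]
    print(by_analogy("leprechaun", models))
```

The generated candidates would still need the same screening as any coinage; the sketch only automates the mechanical part of the postfixing.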


What does 'leprechaunize' mean? To rip/factorize a given text down to x-grams (a well-defined entity, in contrast to n-grams). By choosing that name I pay tribute to Leprechaun's ability to poeticize, to handle words masterfully, quite like Humpty-Dumpty.
Leprechaun, being a magical creature, serves as an excellent example of how to do postfixing. As for prefixing, first things first, though only as an appetizer:
nonleprechaunizable: something (even someone) that cannot be leprechaunized, e.g. binary data is such a case (strictly speaking it can be, but the leprechaunization would yield results of little use).

A month ago a UFC event took place in Boston. During the post-fight conference, one all-Irish fighter with tons of personality (McGregor) explained how happy he was with the reception, saying he felt at home, with "fucking leprechauns floating round". McGregor's style is definitely leprechaunesque, given his tight costume (or kilt) and his passion for gold and jokes: "Three people died making this pocket solid gold watch, you know."
In another interview, Conor McGregor used the phrase 'full-blown Leprechaun was standing'.
SOED has it:
full-blown: adjective (a) of a flower: in full bloom

I coined (did I?) the nifty 'UNACCEPTABLES' in the spirit of the popular movie 'THE EXPENDABLES' (starring a bunch of action actors). Actually, the story of this expressive name goes back, I believe (S.S. being the mastermind), to 'Rambo II', where on the boat the Vietnamese girl asks our guy about his life, and Stallone, in his underdoggish style, explains that he is EXPENDABLE: a guy who won't be missed if he doesn't show up when invited to a party. A GREAT SCENE & MOVIE.

Also, I checked the 'ENCYCLOPEDIA OF JUNK FOOD AND FAST FOOD' for new words; the following were added:
sodas
teapots
carhops
outstate
caramels
Chanukah
Jujubes
ragouts
fryers
debuted
custards
handheld
boomers
canola
foodscape
buns
sesameseeded
cookbooklets
cookbooklet
riceballs
riceball
octagonalshaped
multibranded
candycovered
Snickerization
supersizing
salsas
Burritos
nachos
yearolds
ads
jobbers
shellers
tins
tabs
Toyless
popcorns
childoriented
cleanups
nougats

Also, I checked the first pages of 'CONCISE ENCYCLOPEDIA OF LANGUAGES OF THE WORLD' for new words. Its introduction reads:

"In this volume, the world’s leading experts describe many of the languages of the world. It is estimated that there are more than 250 established language families in the world, and over 6800 distinct languages, many of which are threatened or endangered. This volume provides the most comprehensive survey available on a large proportion of these. It contains 377 articles on specific languages or language families drawn from the two editions of the Encyclopedia of Language and Linguistics (ELL)."

What can I say? Adding the mere names is the very least I can do; not having them is unacceptable, since they are part of the living body/corpus of human expressiveness. Added.

One can never be too vigilant; it is very easy to spot bugs once you have sorted 1-grams:
On page xi the proofreaders failed to correct the buggy 'Arabic Languages, Varation in Aramaic and Syriac' to 'Arabic Languages, Variation in Aramaic and Syriac'.
On page xii the proofreaders failed to correct the buggy 'Austroasiatic Languges' to 'Austroasiatic Languages'.
On page xiv the proofreaders failed to correct the buggy 'Papamientu' to 'Papiamentu'.

"Papiamentu is a Creole language spoken on Aruba, Bonaire, and Curaçao in the Caribbean. Over 175 000 islanders (about 75% of residents) speak the language natively, and many immigrants learn it as a second language."
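Why sorted 1-grams make such typos stand out can be sketched as follows. This is a minimal illustration, not the tool used for MASAKARI: it assumes a 1-gram is a run of ASCII letters (optionally joined by apostrophes or hyphens), sorts the counted tokens case-insensitively so that a misspelling lands right next to its correct form, and adds a hypothetical prefix heuristic that flags singletons whose sorted neighbour starts the same way.

```python
import re
from collections import Counter

# Assumption: a 1-gram is a run of ASCII letters, optionally joined by
# internal apostrophes or hyphens ("Indo-European", "o'erflowing").
WORD = re.compile(r"[A-Za-z]+(?:['-][A-Za-z]+)*")

def sorted_1grams(text: str) -> list[tuple[str, int]]:
    """Count every 1-gram and sort case-insensitively, so variants sit side by side."""
    counts = Counter(WORD.findall(text))
    return sorted(counts.items(), key=lambda item: (item[0].lower(), item[0]))

def suspects(grams: list[tuple[str, int]], prefix: int = 3) -> list[str]:
    """Hypothetical heuristic: flag singletons whose sorted neighbour shares a prefix."""
    flagged = []
    for i, (word, count) in enumerate(grams):
        if count != 1:
            continue
        neighbours = grams[max(i - 1, 0):i] + grams[i + 1:i + 2]
        if any(other.lower()[:prefix] == word.lower()[:prefix] for other, _ in neighbours):
            flagged.append(word)
    return flagged

toc = ("Austroasiatic Languages. Indo-European Languages. "
       "Austroasiatic Languges. Papamientu. Papiamentu. Papiamentu.")
grams = sorted_1grams(toc)
print(grams)
print(suspects(grams))  # ['Languges', 'Papamientu']
```

The ASCII-only pattern is a simplification: a word such as "Curaçao" would be split in two, so a real wordlist pass would need a Unicode-aware pattern.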

Also, reading Wikipedia's 'GRAPHENE' article, I added:
Graphene
Nanostripes
Casimir
nanoribbons
Ultracapacitors
biodevices
dimensionalities
nanostructures
graphenes
unlayered
presolar
pseudoparticles
nanoelectrodes
nanopore

Also, reading Wikipedia's 'Ramjet' article, I added:
artworks
ramjets
ramcombustor
trisonic
Overfuelling
throttleability
nozzleless
unstart
nacelles
Ufimtsev
chines
swaths
unstarts
superlubricity
nanofoam
epitaxy
Nanomaterials
Timeline
nanotube
nanotubes
nanobuds
aerogels
aerogel
drillers

All in all, MASAKARI r.6 (here) already features 317,920 words.
Revision 7 is under way; the reinforcement comes in the form of 700+ SOED words beginning with 'a', all of them handpicked by me, pura sangre (thoroughbred) for sure.

He learns not to learn and reverts to what all men pass by.
leonAzul
Posted: Monday, September 23, 2013 2:26:36 AM

Rank: Advanced Member

Joined: 8/11/2011
Posts: 7,836
Neurons: 24,496
Location: Miami, Florida, United States
Because these involve archaic forms, it is difficult to find them in digitized corpora, so there is little other than my semi-informed opinion for me to follow.

These all look legitimate to me, but I would examine carefully any result that ends in "-dest". If it turns out to be the declarative past tense second person familiar form of a verb, chances are that the preferred or more common form would be "-dst". For example, I believe you will find that "didst" is more common than "didest", "heldst" than "heldest", "stoodst" than "stoodest", etc.
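The "-dest" check described above is easy to automate as a first pass. A minimal sketch, assuming the candidate words are available as a Python list (the function name is illustrative): it proposes the contracted "-dst" spelling for each "-dest" word and reports whether that spelling is already listed. Ordinary words such as "modest" match too, so every hit still needs the careful examination recommended here.

```python
# Sketch: for every word ending in "-dest", propose the contracted "-dst"
# spelling and note whether the list already contains it.
# Words like "modest" also match, so each hit still needs a human eye.

def dest_candidates(words: list[str]) -> list[tuple[str, str, bool]]:
    """Return (word, proposed -dst form, proposed form already present) triples."""
    present = set(words)
    results = []
    for word in words:
        if word.endswith("dest"):
            contracted = word[: -len("dest")] + "dst"
            results.append((word, contracted, contracted in present))
    return results

sample = ["heldest", "stoodest", "stoodst", "saidest", "saidst", "knoweth", "modest"]
for word, contracted, known in dest_candidates(sample):
    print(f"{word:10} -> {contracted:9} {'(already listed)' if known else ''}")
```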

Thus Spoke Zarathustra might not be the best corpus for this part of English. It is a twentieth-century translation of a nineteenth-century German novel written in an archaic style. A digitization of the original King James translation of the Bible, added to your corpus, would be much more "authentic", in my opinion, and more representative of the small amount of that language that is relevant to modern English.

Sanmayce wrote:

The 691 added words from Zarathustra are:

accomplisheth, acquitteth, addedst, adjoineth, ageth, alightest, allureth, alpa, amissing, annoyeth, appealeth, approachedst, ariseth, askest, asketh, astutest, attacketh, attractest, awakeneth, awaketh, awokest, backworlds, backworldsmen, barketh, basketh, batheth, bawlers, beameth, beareth, beckoneth, becomer, becometh, befooleth, beggeth, belauded, believedst, bendeth, bestoweth, bethinkst, bewitcheth, bindress, bitest, biteth, bleedest, blesseth, blinketh, blockest, bloometh, blossometh, bloweth, blushedst, blushest, bobbeth, boometh, boundeth, breaketh, brewedst, brighteneth, bringeth, broileth, bubbleth, buildeth, burneth, burroweth, bursteth, buyeth, cackleth, calledst, callest, calleth, camest, careth, carriedst, carrieth, casteth, causeth, ceaseth, celebrateth, chancest, chanteth, chastiseth, cheweth, choketh, choosest, christeneth, circumambling, claimeth, clambereth, cleareth, cleaveth, climbeth, clingeth, collecteth, comest, commenceth, compareth, compelleth, concealeth, concerneth, conducteth, congratulatingly, conquereth, constraineth, contemneth, contradicteth, convinceth, convulseth, counselleth, covereth, cozeners, cravest, createth, creepeth, crieth, croucheth, crowdst, crusheth, curers, cureth, curseth, cutteth, danceth, dangerousest, dareth, decayeth, declineth, defileth, demandeth, deserveth, desirablest, desirest, desireth, destroyeth, devisedst, devisers, deviseth, devoureth, dieth, discloseth, discoveredst, disliketh, dispenseth, disposeth, distinguisheth, distorteth, distributest, distrusteth, divinedst, divineth, downfell, downsickling, downsunken, draggeth, draweth, dreadfulest, dreamest, dreameth, drinkest, drinketh, drippeth, drudgeth, dudu, dwelleth, eagerer, eatenness, eateth, effecteth, emancipateth, enchaineth, endeavoureth, endeth, engulfeth, enheartened, enjoineth, enjoyers, enjoyeth, enlinked, enmantling, ennobleth, enraptureth, enticeth, enviers, erflowing, erhangeth, erhearst, erhung, erleap, ershadowed, erspan, erswelled, 
erthrowers, erthrowing, erthrown, escapeth, evaluing, evenwards, exalteth, exciteth, experienceth, extolleth, falleth, fangledly, fearest, feareth, feedeth, feelest, feeleth, feigneth, felleth, fens, festereth, fetcheth, fickly, findeth, findress, fisheth, fledst, fleeth, flieth, flingeth, flitteth, floateth, floweth, fluttereth, foameth, foolest, forcest, forceth, forgottest, frantical, freezeth, frighteneth, frotheth, gazest, gazeth, givest, giveth, glanceth, gleameth, glideth, glinteth, glittereth, glorifieth, glowest, gloweth, glowings, gnashest, gnaweth, goader, goest, goeth, goldlike, graspeth, grewest, grindest, groaneth, gropest, groweth, grumbleth, gurgleth, gusheth, halloweth, haltfoot, hangeth, harmfulest, hatcheth, hateth, haunteth, hazar, healeth, hearest, heareth, hearkenest, hearst, heldest, hesitateth, hidest, hideth, holdeth, hoppeth, howleth, huggeth, humaner, hurteth, impelleth, implantedst, implieth, indicateth, inflameth, inflicteth, injurers, injureth, inspireth, interpreteth, interrogateth, inventeth, investigatest, inwindress, irritateth, itcheth, jingleth, jubilators, killeth, kisseth, kneeleth, knewest, knocketh, knowest, knoweth, lacketh, lambeyed, lambkins, lambsheep, landkeepers, lastling, laugheth, layest, layeth, leadeth, leapeth, learnedst, learneth, leaveth, letteth, liest, lieth, liketh, limpeth, listeth, livest, liveth, loadeth, lonesomer, longers, longeth, lookedst, looketh, lovest, loveth, lowereth, lurement, lurketh, madcaps, maidlike, makest, maketh, mastereth, maws, meaneth, measureth, meditateth, meeteth, melanopic, mightest, milkedst, milketh, misclimbing, misflown, misleadeth, miswandering, mixeth, mocketh, molluscs, mounteth, mouthlets, movedst, moveth, museth, mustified, needeth, nigher, obeyers, obeyeth, obscureth, odoured, openeth, oppresseth, ordereth, originateth, outchamped, outchewed, overawake, overcomings, overtaketh, overwakeful, oweth, painteth, paintpots, panteth, passedst, passeth, peccabler, peereth, 
perishest, persecuteth, persuadest, pipeth, plaints, playeth, pleadeth, pleasest, pleaseth, plucketh, plungeth, poetisation, poureth, powerfulest, practisest, praisest, prateth, pratings, prayeth, preferreth, preoccupieth, presentientest, pressest, presseth, preventeth, pricketh, proclaimest, proclaimeth, professest, promiseth, proposeth, pulledst, pullest, purchaseth, purposer, purselets, pusheth, putteth, pyres, quadrivocal, quaketh, quarrelleth, quieteth, quinquivocal, rageth, ranunculine, rattleth, raveth, reachest, realer, reawakeneth, reblinking, recheweth, recommuning, redeemeth, refraternising, refresheth, refuteth, reinterpreteth, relieveth, remindeth, removeth, rendeth, renounceth, repeateth, reposeth, repudiateth, requiteth, resemblest, resembleth, resideth, resoundeth, resteth, reverers, revolveth, ridest, rideth, ringeth, risest, riseth, risketh, roareth, rolleth, roundsphinxed, roveth, roweth, rubbeth, ruineth, ruleth, runnest, runneth, rusheth, saidest, saidst, saileth, sattest, saveth, sawest, scarers, scareth, scath, scattereth, scratcheth, screameth, sealike, secureth, seducest, seduceth, seekest, seeketh, seekress, seekst, seemest, seemeth, seest, seeth, seizeth, seldomer, separatest, servedst, servest, serveth, shadowwards, shakest, shaketh, shattereth, sheltereth, shinest, shineth, shirted, shocketh, shooteth, shouteth, showest, showeth, sickeneth, sigheth, singeth, sinketh, sittest, sitteth, skulkers, slayeth, sleepeth, slumbereth, smellest, smelleth, smoketh, sneezeth, snortest, soareth, soothedly, soughtest, soullet, soundeth, souths, sowers, spakest, spareth, sparkleth, speakest, speaketh, speweth, spinnest, spitteth, spittles, spoilest, spoileth, sprites, spurnest, spurnings, squatteth, staketh, standeth, stealest, stealeth, steameth, stifleth, stingeth, stinketh, stirrest, stoodest, stoodst, stormeth, strangleth, stretchest, strikest, striketh, striveth, struggleth, succeedeth, succumbeth, sufferest, superearths, supermen, surmiseth, 
surmountings, surpasseth, surrendereth, swalloweth, tainteth, takest, taketh, talketh, tamers, tappeth, tasteth, teachest, teacheth, telleth, temptest, tempteth, tenderlings, thanketh, thinkest, thinketh, thirstedest, thresheth, threwest, thrilleth, throbbeth, throweth, thrusteth, tickleth, tireth, tookest, torturest, tosseth, tottereth, toucheth, trafficketh, travelleth, treaders, treadest, trembleth, trieth, triumphanter, trivocal, trusteth, tuggeth, turneth, udders, unfruitfuller, uniteth, unlearneth, unrealisedness, unrollable, unslept, upbreaketh, uplifteth, usest, valueth, vanquisheth, veileth, viewedst, wags, wailedst, waiteth, walkest, walketh, wantest, wanteth, wantoneth, wantst, warmeth, warnedst, washeth, watcheth, waxeth, waylayest, wearieth, weaveth, weigheth, welleth, wendings, wentest, wheezest, wheezeth, whimperers, whineth, whisketh, whitewasheth, wildboar, willers, willest, willeth, willless, windeth, winneth, wishest, wisheth, withstandest, worketh, wrappeth, writeth, writheth, yawneth, yieldeth

Does anyone see a misspelled word in the above list?!


"Make it go away, Mrs Whatsit," he whispered. "Make it go away. It's evil."
leonAzul
Posted: Monday, September 23, 2013 2:49:52 AM

Rank: Advanced Member

Joined: 8/11/2011
Posts: 7,836
Neurons: 24,496
Location: Miami, Florida, United States
Sanmayce wrote:

And to load the fructiere with another tasty specimen: the last word added to revision 5 is 'Scalacronica', taken from:
The first prediction of Thomas of Erceldoune's recorded in a manuscript is dated to before 1320, and he is referred to with other soothsayers in the Scalacronica, a French chronicle of English history begun in 1355.
/http://myths.e2bn.org/mythsandlegends/story530-thomas-the-rhymer-and-the-queen-of-elfland.html/

Finally, not a single dictionary/wordlist I tried contains 'fructiere', another beautiful word widely used in Bulgarian; the English analog 'fruit dish' is so blunt, so untasty, and partially wrong (it could also mean a 'fruit basket').
Strange: I thought it was purely French in origin, yet for some reason the 2-in-1 French dictionary I checked (Collins-Robert Unabridged French Dictionary and the two-volume Collins-Robert Comprehensive French Dictionary) does not recognize it.
Check this out:
The tables in the royal hall drew everyone's attention, the reason being fructieres overflowing with never-before-seen exotic fruits from all over the world.

However, there is this:
http://en.wiktionary.org/wiki/fructier
http://www.thefreedictionary.com/fructify


"Make it go away, Mrs Whatsit," he whispered. "Make it go away. It's evil."
Sanmayce
Posted: Tuesday, September 24, 2013 2:28:01 PM

Rank: Advanced Member

Joined: 5/29/2012
Posts: 177
Neurons: 830
Thanks Leon,
I agree about the Bible part; however, 'Thus spake Zarathustra' (spake: arch. & poet.) is, in my view, no less legitimate IN ANY WAY than the Bible. By the way, to me no book on Earth is to be taken as sacred, but rather as a book of wisdom; going beyond certain boundaries breeds FANATICISM.
And I have never been able to understand how something that is not even 100% proofread (spelling-wise, at least) can deserve the status of perfection/holiness. I mean, there are mistranslated verses, misspelled names/words, brutal censorship, and the exclusion of the Apocrypha.
In short, these 691 words are precious; as for the Bible's archaic words, I am afraid to deal with them, since my knowledge and resources are insufficient.

As for 'fructiere', I had a headache for some 10 minutes while trying to find a definition of it.
My appreciation for Wiktionary grows every time I encounter 'difficult' words; Wiktionary is truly rich.

Wiktionary says:
fructier m (feminine singular fructiere, masculine plural fructiers, feminine plural fructieres)
Descendants
* French: fruitier


The problem was that the French dictionaries I looked up defined 'fruitier' as a fruit-seller and said nothing about 'fructier'.
So, finally, the 4 words are:
fructier
fructiers
fructiere
fructieres
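The four forms follow a regular pattern (feminine adds "e", plural adds "s"), so they can be generated from the masculine singular. A sketch that assumes this regularity (the helper name is hypothetical); irregular words would need their forms listed by hand:

```python
# Sketch assuming the regular pattern seen in the Wiktionary entry above:
# feminine adds "e", plural adds "s".

def regular_forms(masc_sg: str) -> list[str]:
    """Return masc. sg., masc. pl., fem. sg., fem. pl. for a regularly inflected word."""
    fem_sg = masc_sg + "e"
    return [masc_sg, masc_sg + "s", fem_sg, fem_sg + "s"]

print(regular_forms("fructier"))  # ['fructier', 'fructiers', 'fructiere', 'fructieres']
```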


None of them can be found in SOED.

Wow, Wiktionary is rich indeed and supports wildcards: *iere yields so many nifty words:

Yummy: cantiniere, contrabbandiere, guardarobiere, ... all to be included in future revisions.

contrabbandiere m (plural contrabbandieri)
guardarobiere m (plural guardarobieri) (Feminine: guardarobiera)
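Once a word list is available locally, the same kind of "*iere" search can be reproduced with Python's standard fnmatch module. This is a sketch on a small hand-made sample, not Wiktionary's own search; fnmatchcase is chosen so that matching is case-sensitive regardless of the operating system.

```python
from fnmatch import fnmatchcase

# Minimal sketch: apply a shell-style wildcard such as "*iere" to a local word list.

def wildcard(words: list[str], pattern: str) -> list[str]:
    """Return the words matching the wildcard pattern, sorted."""
    return sorted(word for word in words if fnmatchcase(word, pattern))

words = ["cantiniere", "contrabbandiere", "contrabbandieri", "fructier",
         "fructiere", "fructieres", "guardarobiere", "fruitier"]
print(wildcard(words, "*iere"))
# ['cantiniere', 'contrabbandiere', 'fructiere', 'guardarobiere']
```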

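In Bulgarian we have:
контрабандист (smuggler)
контрабандистИ (smugglers)
контрабандистКА (female smuggler)
контрабандистКИ (female smugglers)
контрабандистЧЕ (little smuggler, diminutive)
контрабандистЧЕТА (little smugglers, diminutive plural)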

Ha-ha, again sexism through the exclusion of the beautiful contrabbandierA (a female smuggler); 'smuggleress' is also missing, as if there had never been female outlaws.
I wonder what the plural of 'contrabbandierA' is; someone versed in Italian may help here.

He learns not to learn and reverts to what all men pass by.
leonAzul
Posted: Tuesday, September 24, 2013 8:17:39 PM

Rank: Advanced Member

Joined: 8/11/2011
Posts: 7,836
Neurons: 24,496
Location: Miami, Florida, United States
Sanmayce wrote:
Thanks Leon,
I agree about the Bible part; however, 'Thus spake Zarathustra' (spake: arch. & poet.) is, in my view, no less legitimate IN ANY WAY than the Bible. By the way, to me no book on Earth is to be taken as sacred, but rather as a book of wisdom; going beyond certain boundaries breeds FANATICISM.
And I have never been able to understand how something that is not even 100% proofread (spelling-wise, at least) can deserve the status of perfection/holiness. I mean, there are mistranslated verses, misspelled names/words, brutal censorship, and the exclusion of the Apocrypha.
In short, these 691 words are precious; as for the Bible's archaic words, I am afraid to deal with them, since my knowledge and resources are insufficient.


It has nothing to do with artistic or philosophical legitimacy. It is a simple fact that a 16th-century Professor of Letters is in a far better position to speak and write Early Modern English correctly than a 20th-century publishing executive.


"Make it go away, Mrs Whatsit," he whispered. "Make it go away. It's evil."
Forum Terms and Guidelines. Copyright © 2008-2017 Farlex, Inc. All rights reserved.