Computing At Chaos Manor:
The Mailbag

Mailbag for April 16, 2007
Jerry Pournelle jerryp@jerrypournelle.com
www.jerrypournelle.com
Copyright 2007 Jerry E. Pournelle, Ph.D.

April 16, 2007

On Readers for the Blind:

Subject: Better than Kurzweil machines (for now)

Dear Dr. Pournelle:

In regard to computers reading text (Kurzweil et al.), I'm sure someone's thought of this already, but what is the impediment, while we wait for the machines to improve, to having highly educated third-worlders (or even, gasp, American college students) read the NYT news, op-eds, etc. each morning, to be RSS'ed to anyone, sighted or non-sighted, who might choose to subscribe?

In fact, if I wanted to bait the lib-leaning NYT, I might say it's downright discriminatory of them not to be doing this already! Pennies for third-worlders; their "news" read to you while you drive!

My very best, and thank you for the ever-growing wealth of thoughtful reading,

Rich
Memphis, TN

Wouldn't it be a matter of independence? Were I blind I would not want to depend on someone else to read to me, since I tend to read by whim.


One of the items available in the Subscriber area is the California 1916 6th Grade Reader in PDF format. I have mentioned elsewhere that many of my works are not available in electronic form, and I usually have them keyed in by a professional editor who also proofreads. When asked why I don't just scan them in, I pointed out that proofreading is difficult and often misses errors. The editor I employ is accurate. I got a good bit of mail on that.

Subject: OCR and PDFs

Jerry,

Most of the problems with OCR come not from the OCR utility but from how the page was scanned. By default, most OCR programs scan in color or grayscale to produce the image they work from, and that is where nearly all OCR problems originate. I now use Acrobat itself for 90% or more of the scanning I intend to OCR, because it lets me use the TWAIN driver that came with my scanner to do the actual scanning. The TWAIN driver lets me choose between color, grayscale, and B&W scanning modes, and it lets me set brightness and contrast for the scan.

By selecting pure 1-bit B&W I get sharp original scans that OCR fairly well. Most if not all of the trouble people have with OCR is that the default, and sometimes the only, setting the OCR software allows is color; a few allow grayscale and call it B&W, which it is not. Through most of the 80s and 90s I fought this battle and lost, until I started doing my scans in Photoshop using the TWAIN driver, where I could select 1-bit B&W scanning. I would then import the TIFF files from Photoshop into TextBridge to OCR, with great results. Then I bought Acrobat 3.0 on sale and found that it allowed scanning through the TWAIN driver at any setting the driver supported, and that it contained a superior OCR utility called Capture.

This is the package I have used for more than ten years now, with great success, to scan and OCR documents I want to preserve electronically. I now do most of my scanning with the later Acrobat 5.0, because it allows 50 or more pages to be scanned before saving, as against having to save each page before scanning the next, as Acrobat 3.0 requires. The Capture utility in 3.0, however, performs better in general than the one in 5.0, and produces more compact documents with standard fonts instead of the oddball fonts 5.0's Capture uses.

The biggest secret to pain-free OCR, however, is scanning in pure single-bit B&W mode, where character definition is superior and the work comes out clean and nearly error-free. Color and grayscale scans carry so much extra information that the OCR is working in a fog of data it must discard before it can do its job, and that is not easy. Compare driving at night in tule fog to driving on a clear summer day: that is the difference between the two methods of scanning for OCR. I have used this method for more than ten years with success and have taught it to many others, who use it with success as well. I would never go back to OCRing in a heavy fog, as most who try it do; it is just too painful to think about.

James Early
Long Beach, CA

Mr. Early attached some samples of PDF files.

You can tell the pioneers by the arrows in their backs: early on I tried using OCR on scanned materials, and the results were so awful that I went back to employing an editor to key in the materials. It may be that things have much improved since then.

I have a number of my early works. There aren't any electronic copies, and I gather there's some demand for them. I will have to look into this matter when I get my current books done.
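For readers who want to experiment with Mr. Early's one-bit approach without buying Acrobat, here is a minimal sketch using free tools he does not use (the Pillow imaging library and the Tesseract OCR engine via pytesseract); the file name and threshold value are hypothetical. The idea is the same: reduce the scan to pure black and white before the OCR engine ever sees it.

    # A sketch of the 1-bit B&W trick; not Mr. Early's Acrobat/Capture workflow.
    from PIL import Image      # Pillow
    import pytesseract         # wrapper around the Tesseract OCR engine

    scan = Image.open("page.tif")    # hypothetical scanned page
    # Convert to grayscale, then threshold to pure 1-bit black and white.
    bw = scan.convert("L").point(lambda p: 255 if p > 160 else 0, mode="1")
    print(pytesseract.image_to_string(bw))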

Subject: Converting .doc to .pdf

Jerry,

I use a free program called CutePDF Writer (www.cutePDF.com, naturally) to convert any file to PDF. It functions as a printer, so any page layouts that you create in your .doc will appear just as they would on the printed page. You shouldn't have to proof it at all. I use it for Word documents, Excel spreadsheets, AutoCAD drawings, web pages, etc.

There are other programs out there that do the same thing, but some have little annoying habits, such as PDF995's need to open its webpage every time you convert a document. CutePDF doesn't bug you with sales pitches.

I suppose the point is moot, now that you've got the 6th Grade Reader in PDF, but hey, I'm nothing if not late. Maybe you'll find another opportunity to use it.

Debbie

Thanks. I'll keep that in mind and look for it when I get to my next project. I have two more books of stories, all public domain, that were once standard in the better public schools across the country. One is a book of stories such as "The Man without a Country" and "The Great Stone Face", which we all read when I was in public school. Another is a supplemental reader used in 9th and 10th grade.

Both of those will go into the subscriber area once I get past Inferno II, taxes, and a rather grueling travel schedule. I have them in .doc format, and I'll convert them to PDF, probably with Acrobat, but I'll try other methods just as soon as I have a bit of time.
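For readers who want to batch-convert documents from the command line, here is a minimal sketch driven from Python. It assumes the soffice binary from a LibreOffice installation is on the PATH, and the folder names are hypothetical; this is an alternative route, not the CutePDF or Acrobat method discussed above.

    # Convert every .doc in the current folder to PDF using LibreOffice in headless mode.
    import glob
    import subprocess

    for doc in glob.glob("*.doc"):
        subprocess.run(
            ["soffice", "--headless", "--convert-to", "pdf", "--outdir", "pdf_out", doc],
            check=True,
        )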


On the future of disk drives:

Subj: Flash-based drives: wearing out?

Mailbag for 9 Apr 2007 talks about the new Samsung 30 and 60 gigabyte silicon disk devices.

I remembered flash memory life being limited by a "wearing out" effect on rewriting, so I went digging.

Samsung's data sheet is here: link to PDF.

I see a million-hour MTBF but no specification of erase cycle limits. This SanDisk spec sheet doesn't mention erase-cycle limits either, just a 2 million hour MTTF: link to PDF

This product manual from SanDisk talks about "wear leveling" being built into controllers within the individual chips, but I don't see any *numbers*: link to PDF

This spec from Intel is interesting, though: link to PDF

On page 5, here's how they calculate a 5-year life:

<quote>

Product life is at least 5 years or 43,800 power-on hours whichever comes earlier under the following conditions:

Power-on hours = 8,760 per year

Operating time = 100% of power-on hours

Active/Idle duty cycle = 90% of the time

1 GB Module Write Rate [1,2] = 12 GB per day (at 6 days a week, 52 weeks a year for 5 years)

Environmental = typical operating conditions

Note:

1. Write rate of 12 GB/day is multiplied by module density. Therefore a 2GB module Write Rate is 24 GB/day and a 4GB module Write Rate is 48GB/day. Please contact Intel Applications Engineering for applicability of other use models.

2. Assumes no static data files and all available blocks are used to write and erase data.

</quote>

I wonder how long it will be before we start seeing reports of failures due to the flash wearing out?

I guess you could run one without mirroring for a month or so, then add a second one and start mirroring. You'd expect (or just hope?) that the older one would fail first and that the younger one would keep the system up while you replaced the older one. You could continue that staggered-age pattern more or less forever.

Of course that could be ... awkward? ... if replacement was ... inconvenient? Like, oh, say, when the unit is ... on Mars?

(You remember the problems the "Spirit" Mars-rover had with its flash memory? One wag at JPL described the situation as, "The Spirit was willing, but the Flash was weak.")

I might be tempted to get one of these for often-read but rarely-written data, but certainly not for working file space, at least not until I understood what the erase-cycle limitation would do to me.

Rod Montgomery==monty@starfief.com
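A quick back-of-the-envelope on the Intel figures quoted above (my arithmetic, not Intel's):

    # Total data written to a 1 GB module over the 5-year life in the spec:
    # 12 GB per day, 6 days a week, 52 weeks a year, for 5 years.
    write_rate_gb_per_day = 12
    operating_days = 6 * 52 * 5
    total_gb_written = write_rate_gb_per_day * operating_days
    print(total_gb_written)   # 18,720 GB: the module's full capacity rewritten ~18,720 times

In other words, the spec assumes the module's full capacity is rewritten roughly 18,720 times over five years. Whether that is comfortable depends on the erase-cycle rating and on how well the wear leveling spreads those writes, which is exactly the number the data sheets above don't give.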

I thought this was interesting enough to put to the advisors. First Captain Morse:

Not to make light of Mr. Montgomery's question, but my experience with flash memory devices is that they last about two weeks. That's how long I can use one before I misplace the unit and it is lost.

Maybe that's what they are counting on?

Ron Morse

Alex Pournelle notes:

It's important to differentiate between the flash-ASSISTED drives and the flash-ONLY drives. The flash-assisted hybrid hard drives of course degrade gracefully, since they simply lock out the bad section of the flash when it comes up lame.

Other than that, I have no right to an opinion (and I'm not going to express one, for a change).

--Alex

Eric Pobirs thought about the matter:

In most cases where such drives are already used, the OS is designed or tweaked to reduce or eliminate paging. This is certainly the case for a highly customized environment such as that found on the Mars Rovers. All of the applications are known down to the byte and the resources allocated so they can operate without resource contention. The Rovers had problems but held up far better than a conventional hard drive likely would have. One of those would probably have failed completely and irrevocably, with no possibility of replacement.

This is why the hybrid drives use their flash memory section mostly as a persistent cache and only to a much lesser extent to store writes. But there are significant power benefits in caching writes until enough data has accumulated to make it worth the power expenditure to spin up the platter. Will this entail some wear and tear on the flash memory's life expectancy? Sure, but the benefits are immediate for those who need to maximize battery life. Given enough smarts in the system, the use of the flash for write caching can be limited to when the system is running from the battery, thus avoiding flash writes when they offer no benefit to battery life and extending the life of the drive.

I suspect those who measure their productivity by battery life would be willing to spend $100 a year to replace a drive in a laptop if the laptop's value is significantly increased. If they get so much use out of the write caching that they degrade the flash to a volume too small to be useful within that time frame, then they're getting their money's worth.

The drives are expensive now but it wasn't that many years ago that it seemed reasonable to pay $20 for a 64MB thumb drive. Now 1GB thumb drives are $5 loss leaders at Fry's. The time isn't far off, especially for portable users, when it will be practical to have several gigabytes of non-volatile memory for the unchanging parts of the OS and applications and a conventional hard drive for those files subject to frequent change. This will take root in portables first but will turn up in desktops as the cost goes low enough and it becomes equated with switching light bulbs as a conservation measure. I bet in a few years we see machines in Best Buy touting this as a 'green' feature.

So the people with both the need and the money get to be the beta testers. Some won't have the need but so much money they can try it for kicks. Suits me. Wealthy people have been bringing us new goodies by this method forever but in digital electronics you can see it happening in real-time.

Eric

Wealthy people now, but at one time it was computer magazines and columnists. I can recall when BYTE and Chaos Manor were major centers of new technology testing. We often built and tested new systems using new technologies, and BYTE was often first to publish the results. Some of that still goes on today.

Peter Glaskowsky takes an engineering approach:

Well, I'm wondering about this too. The numbers don't give a clear answer.

The best case is easy enough to figure out. If each 512B sector can be erased and rewritten a million times, and the "write leveling" feature (which tries to distribute writes evenly across the drive) is perfectly effective, a 30GB drive can accept this many write cycles:

(30G / 512B) * 1M = 2 trillion

So if you're using the drive as swap space and sending it, say, 1MB/s on average, you can do this for this long:

2T / (1MB / 512B) = 1B seconds = 32 years

That would be fine, but we can suppose that write leveling isn't perfect. Could it be less than 30% effective? I don't know.

I got the 1MB/s number by looking at the disk-activity stats on my MacBook and multiplying up to be conservative. The actual reported number was 0.26 MB/s, but I don't do a lot of really heavy apps. If anyone has a better number, the effect on the bottom line is easy to figure.

I would say that if you don't use the flash drive for swap space, it should be fine. If it's your only drive and you're a heavy user, it probably isn't. But there's a big gray area in between.

. png

Which is probably the definitive answer: your silicon drive is likely to last until it is made obsolete by bigger, faster, and cheaper silicon drives. Moore's Law marches on. But if you worry at all, you might think about replacing your silicon swap drive somewhat more often.
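Peter's arithmetic is easy to rerun with your own disk-activity figure. Here is a minimal sketch; the 2-trillion write budget and the 1 MB/s and 0.26 MB/s rates are his figures, not mine, and the write-leveling factor is left to the reader.

    # Back-of-the-envelope flash lifetime from a total sector-write budget.
    SECONDS_PER_YEAR = 365 * 24 * 3600

    def lifetime_years(total_sector_writes, write_bytes_per_sec, sector_bytes=512):
        sectors_per_second = write_bytes_per_sec / sector_bytes
        return total_sector_writes / sectors_per_second / SECONDS_PER_YEAR

    print(lifetime_years(2e12, 1e6))       # about 32 years at a steady 1 MB/s of swap
    print(lifetime_years(2e12, 0.26e6))    # about 125 years at the 0.26 MB/s he measured

Scale the write budget down if you believe write leveling is less than perfect, as Peter suggests.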


The flash drive debate triggered another discussion. Rich Heimlich thought about the wrong question, but still came up with some interesting thoughts on the iPod:

I hear about flash memory failures all the time but haven't yet experienced it. The main reason is that by the time any of my cards gets even moderate use, a much larger, much cheaper card comes out or the device in question has been replaced and comes with a much bigger, much cheaper card.

There's a hot hack being discussed right now where you can replace the hard drive in your iPod with a flash card. The benefit becomes immediately apparent in that not only is the device much lighter but battery life goes through the roof. My biggest concern with this would be how hard these cards would get hit having to stream music from them continuously for hours at a clip. That's a LOT of reading. The writing aspect would be much lower but still significant.

I SUSPECT that 98% of flash card users fall into my use levels. Only the die-hards beat a single card really hard. To give an example, I remember spending a small fortune on a 128MB SD card, and just the other day I spent $20 on a 4GB SD card. The time between those two purchases was not all that long.

It was a few years. If I really were a high-end user of these cards I'd likely have replaced the 128 with a 256, then a 512, then a 1GB, a 2GB, and a 4GB, and that would have given each one an average life of about six months. As long as this sort of turnover continues I don't think this will be a major problem for anyone but:

  1. The biggest, highest order of users
  2. Users of some device like a flash-based iPod, where use is increased exponentially by the medium it's on
  3. Consumers who are buying these used at the end of their life cycles.

My bad, folks. I missed the word "drives" in the subject "Flash-based drives." I did that because I had just read the iPod hack article.

Rich Heimlich

I don't think reading a flash device will affect its life at all; it can't put any more wear and tear on the device than the electronics doing the reading will experience. I'd imagine that chip cache is as vulnerable to this sort of wear as silicon drives would be. Eric notes:

Reads from flash aren't a wear-and-tear issue. Technically, there is some effect that could lead to eventual loss of a cell, but the numbers are such that it doesn't bear consideration unless the lifecycle is measured in decades.

But also, if you think about it, it isn't a lot of reading. It's one long end-to-end read. If you have 32GB of capacity packed full of music, it will take many hours to play it all, but each memory location is only read once during the process. Now, if a single item were repeated endlessly, that would use the same cells over and over.

I read about the iPod hack but failed to see the value beyond being able to say you did it and have money to burn. Since several shipping SSDs are plug-in replacements for existing platter drives, there isn't any real technical achievement. Apple already has 8GB Nanos available. Larger capacities will certainly follow as prices allow. If a 32GB flash-based iPod Video could be offered for Xmas 2007 at $400, I'm sure Apple would offer it. I don't expect it this year, but I wouldn't be at all surprised to see it next year.

This isn't one of those cases of the wealthy blazing a trail for the rest of us. The flash memory market is thriving already, as is the iPod market. So it's just a matter of time.

Eric
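For scale (my arithmetic, not Eric's), one complete end-to-end read of a packed 32 GB player represents a great deal of listening time at typical compressed bit rates; the 256 kbps figure below is an assumption.

    # Hours of playback represented by one complete read of a 32 GB music library.
    library_bytes = 32e9
    bitrate_bits_per_sec = 256_000          # assumed compressed-audio bit rate
    hours = library_bytes * 8 / bitrate_bits_per_sec / 3600
    print(round(hours))                     # roughly 280 hours before any cell is read twice

So even continuous playback touches each cell only rarely, which is Eric's point.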


The big difference would be battery life. My iPod Photo (60GB) has abysmal battery life and I can directly impact it by using higher quality songs. The drive has to work harder for better music. Try Apple Lossless on an iPod and you'll have an iPod that better be attached to a power source or you're going to measure your battery life in minutes.

With flash, battery life would improve dramatically.

I see a heavy trend of devices that use hard drives moving to flash, for two key reasons: battery life and weight.

Rich


I understand about the battery life. The advantage is readily demonstrated. The question is one of cost. iPods using flash at that capacity will appear when the price is in line with what Apple believes will sell in decent volumes.

Right now, people can choose between a flash-based player with good battery life and a hard-drive-based player with much greater capacity but far worse battery life. This is not going to change. When a 64GB flash drive comes down to a price that allows it to be part of a mainstream iPod, the hard drives will be offering 120GB and beyond.

Some may say that they'll never need that kind of capacity, but standards change. First people carried around a small music collection; then they moved up to carrying every CD they'd ever bought in their pocket. (This ignores earlier players whose capacity was terribly inadequate and which failed to build much of a market.) Then they want it all in higher quality.

Then comes video to up the ante in a big way. People will want video that looks OK on the little screen but is full quality when output to a TV.

Once that becomes the norm, they'll want a library of HD video content in their pocket.

There is no end any time soon to the hunger for more AV player capacity.

The highest capacity options will likely continue to be those requiring the most power draw. The options offering the best battery life will likely continue to have much lower capacity for the dollar.

Eric


Which brings us to part three of the discussion triggered by Monty's letter. It's not a subject I know much about. I have enough hearing loss that I can't really tell "great high fidelity" from a good car stereo system. My problem isn't hereditary, and my son Alex doesn't have that difficulty.

Rich:

Thank you for reminding me to ask you a question (probably several!) in open session, based on your (apparent) hatred of MP3 quality (which I share):

On our way back from CES this year, Eric and I were discussing the possibility of a high-quality online recording ecosystem for people who can't stand MP3's artifacting. He theorized that this sort of thing was possible, though neither of us made any inquiries about it. Still, I think it's worth the follow-up, since sooner or later I'm going to get disgusted by the crappy AAC codec even on my laptop and have to re-capture all my frickin' CDs.

Audio snob? I would like to be a lot MORE of an audio snob... I just haven't had the budget. (And, yes, I can hear the difference in the $15,000 Sennheiser tube-amp headphones... or think I can... doesn't mean I have the money to go there currently.)

Thanks,
Alex

Rich replies:

I can tell you that MP3, as a format, isn't really the trouble most of the time. I have some music-producer friends who have sent me MP3s taken from studio masters, and even at relatively low bit rates they sound amazing. Much of the problem comes from what the industry refers to as zeroing out the CDs. In audio we call it compression, a term I have a real problem with, because tech people think of compression as almost entirely the opposite of what it means to audio people.

If you look at the masters of, say (sorry for this example, but it's a good one), an Eminem track, it's fine. If you look at the waveform for the same track on the CD as it is sold in stores, you'll see heavy compression added by the producer (or the artist). The waveform peaks at the top and bottom almost entirely across the track. It's noise. Most people equate volume with quality, and the producers know this. I'm amazed at how dead a lot of CDs sound as a result.

CDs made during the '80s suffer from this almost to a disc. Everyone did it. Thankfully, remasters done today often address this, and the results are a lot better.

Anyway:
> * How are you getting lossless recordings into your iPod?

A couple of ways. First, I don't use iTunes at all. I use one of two products. There's dbpowerAMP's Sveta, one of a set of very powerful audio tools from a company called Illustrate.

However, I generally don't recommend their stuff to most people, as it's very unintuitive and some of the tools can be buggy. Then there's MediaMonkey. I think it's one of the best audio-management tools available. You can move your audio from one file structure to another with a click or two, and it can fetch album art easily. It can also do per-device setups, so it knows my iPod by name and that I like my music in X format, while my wife's iPod gets music in Y format and my son's Rio gets music in Z format. Both of these tools will take my source material and do an on-the-fly conversion to the right format and options (my iPod gets album art, for example). Both also offer direct access to the player's file structure so I can do housekeeping. They also support various levels of automatic and manual synchronizing.

> * Is this a mathematically-lossless codec or a truly zero-loss codec?

I believe it's the latter. It's been a few versions since I tested it, but when I first got into the lossless game I was concerned about having a flawless archive of my music collection (the 1700 songs are just favorite singles). I've used a couple of different ripping products to make sure I get an accurate rip of the CD first. Initially I used "Exact Audio Copy," which is a great tool for this. I then moved to dbpowerAMP's "CD Ripper," which uses something called "AccurateRip." Once they were ripped to lossless, I would convert them back to WAV, do an A/B comparison, and they were bit-for-bit identical. I tried several iterations and they were still bit-for-bit matches. I haven't done that with FLAC yet and should check it. I believe it too is bit for bit, though.

> * Are you starting with 44.1 K sample material, or are you going for something higher quality (And if so, where the heck are you getting the material)?

When it's a CD it's 44.1K. Not much I can do about that one. I have several tracks that came from friends in the business but those are rare.

Rich
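For anyone who wants to repeat the bit-for-bit check Rich describes, here is a minimal sketch: decode the lossless file back to WAV with whatever ripper or codec you used, then compare the raw PCM of the original rip against the round-tripped copy. The file names are hypothetical.

    # Compare only the audio frames, so header or metadata differences don't matter.
    import hashlib
    import wave

    def pcm_digest(path):
        with wave.open(path, "rb") as w:
            return hashlib.md5(w.readframes(w.getnframes())).hexdigest()

    original = pcm_digest("track01_original_rip.wav")
    roundtrip = pcm_digest("track01_decoded_from_lossless.wav")
    print("bit-for-bit identical" if original == roundtrip else "files differ")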

Rich Heimlich works with Etymotic Research, a company that makes $300 high-fidelity earphones. The emphasis is on accurate audio.