Ancient Jowetteer - What are the odds?

Nick Webster
Posts: 313
Joined: Thu Sep 18, 2008 11:38 pm
Your interest in the forum: Jowett Javelin Registrar
Given Name: Nick
Location: Cromer, Norfolk UK

Ancient Jowetteer - What are the odds?

Post by Nick Webster

Trawling through the depths of JowettTalk, I was interested to note that among the archive of comparatively recent editions of the Jowetteer there was an older one - and just one - from April 1968. I read this with interest, particularly the advertisements, with the usual sentiment of "if I had a time machine I would go back and buy the lot". Although the adverts went into fascinating detail about condition and renewed parts, few of them gave any clue as to the actual identity of the vehicles. I was just wondering how many of the vehicles advertised still survive when I came upon a familiar name, which I recognised from my own Javelin's log book. What are the odds of chancing upon that one particular issue and finding in it a historic advertisement for your own car?

Nick
JCC Member
Keith Clements
websitedesign
Posts: 3968
Joined: Wed Feb 08, 2006 11:22 am
Your interest in the forum: Jup NKD 258, the most widely travelled , raced and rallied Jowett.
Given Name: Keith

Re: Ancient Jowetteer - What are the odds?

Post by Keith Clements

One day somebody will scan in all the pre-1982 magazines.
skype = keithaclements ;
Nick Webster
Posts: 313
Joined: Thu Sep 18, 2008 11:38 pm
Your interest in the forum: Jowett Javelin Registrar
Given Name: Nick
Location: Cromer, Norfolk UK

Re: Ancient Jowetteer - What are the odds?

Post by Nick Webster

Are you saying that nobody is on the case, so you do not have PDFs of the pre-1982 issues but wish you did?

Nick
JCC Member
Forumadmin
Site Admin
Posts: 20648
Joined: Tue Feb 07, 2006 5:18 pm
Your interest in the forum: Not a lot!
Given Name: Forum

Re: Ancient Jowetteer - What are the odds?

Post by Forumadmin

The pre-1982 Jowetteers need to be scanned. If done well, they can then be OCRed (run through optical character recognition) to make them into a searchable archive.
I have most of them going back to 1974, and some from before that.

Let me know if you or anybody else can do the scanning. Ideally it is done with OCR software.
BarryCambs
Posts: 331
Joined: Fri Nov 25, 2011 2:49 pm
Your interest in the forum: Owner of a long two in Cambridge
Given Name: Barry

Re: Ancient Jowetteer - What are the odds?

Post by BarryCambs

I have a Panasonic batch-feed scanner here that we used for processing questionnaires before we moved our data collection online. It does a few hundred pages a minute; the more time-consuming part is taking the staples out and, in the case of Jowetteers, putting them back in!

It was waiting to go for recycling, so I'll dust it off and see if it's working. Do you have a preferred OCR package I can try?
Keith Clements
websitedesign
Posts: 3968
Joined: Wed Feb 08, 2006 11:22 am
Your interest in the forum: Jup NKD 258, the most widely travelled , raced and rallied Jowett.
Given Name: Keith

Re: Ancient Jowetteer - What are the odds?

Post by Keith Clements

Could be useful.
What resolution (dots per inch) does it scan at?
Is it connected by USB, Wi-Fi or Bluetooth?
Do you have the PC software that came with it? It might include a decent driver.
The pre-1982 Jowetteers were foolscap, as is the Jowett Jive set I have.
I have no preference for OCR software, so if we can beg, steal or borrow some then we can see if it works.
We need one that can handle columns and pictures, ideally giving a method of correction after doing a spelling and grammar check. In the past I fed scans in JPG, TIFF or PDF format into Adobe, which has a special ClearScan technology that repaints the characters. I can still do that.
But good OCR software can talk directly to the scanner to optimize character recognition. If not, greyscale at 600 dpi is best for Adobe.
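To see why the scan settings matter so much for file size, here is a rough back-of-envelope sketch in Python of the uncompressed size of one page at 600 dpi. It assumes foolscap is 8 x 13 inches (an assumption; foolscap sizes vary a little), 8 bits per pixel for greyscale and 1 bit per pixel for black and white. The function name is hypothetical.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed bitmap size in bytes for one page at the given resolution."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

FOOLSCAP_IN = (8.0, 13.0)  # assumed page size in inches

greyscale = raw_page_bytes(*FOOLSCAP_IN, dpi=600, bits_per_pixel=8)
mono = raw_page_bytes(*FOOLSCAP_IN, dpi=600, bits_per_pixel=1)

print(f"greyscale 600 dpi: {greyscale / 1e6:.1f} MB")        # 37.4 MB
print(f"black and white 600 dpi: {mono / 1e6:.1f} MB")       # 4.7 MB
```

Saved files are compressed, so they come out much smaller than these raw figures, but the eightfold difference in raw data between greyscale and black and white shows why the choice is worth thinking about.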
skype = keithaclements ;
Nick Webster
Posts: 313
Joined: Thu Sep 18, 2008 11:38 pm
Your interest in the forum: Jowett Javelin Registrar
Given Name: Nick
Location: Cromer, Norfolk UK

Re: Ancient Jowetteer - What are the odds?

Post by Nick Webster

Sounds like Barry has a super-duper machine. I was not thinking of such a fast job, just plodding through some of the back issues for the fun of it.

I am not sure what Keith is looking for as the finished document, and I am not familiar with the term OCRed. Personally I have an Epson flatbed and ABBYY FineReader Professional software. I have scanned magazines and manuals and converted them directly to PDF, and they seem absolutely fine. Adobe Reader will search the text. Is there any further need for post-scan processing?

Nick
JCC Member
Nick Webster
Posts: 313
Joined: Thu Sep 18, 2008 11:38 pm
Your interest in the forum: Jowett Javelin Registrar
Given Name: Nick
Location: Cromer, Norfolk UK

Re: Ancient Jowetteer - What are the odds?

Post by Nick Webster

Further to my previous post, I have just dug out one of the old foolscap Jowetteers (1981) and popped it in the scanner. The raw scan to PDF presents it exactly as it appears on the page. I must admit it has a certain charm, and it certainly reflects how far production standards have risen over the years. I then processed it to a text document and then to PDF. This throws up various spelling errors and the like, all caused by the OCR trying to interpret smudges and marks on the page. These can be tediously corrected and saved as a perfect version, but I would rather read the original-looking one, warts and all.

Nick
JCC Member
BarryCambs
Posts: 331
Joined: Fri Nov 25, 2011 2:49 pm
Your interest in the forum: Owner of a long two in Cambridge
Given Name: Barry

Re: Ancient Jowetteer - What are the odds?

Post by BarryCambs

I've dusted it off and it's up and running. The scanner is a Panasonic KV-S5055C.

https://panasonic.net/cns/office/produc ... tions.html

I downloaded the software from the Panasonic site, but I haven't had time to play much, so I'm not sure if it does searchable PDF directly. It took about 3 seconds to scan the December 1990 Jowetteer I happened to have on the desk, but you do have to pull the staples out first, then get distracted by reading it with a cup of coffee afterwards.

I'm happy to have a go at the scanning, but it might take a while as we are supposed to be moving house at any minute. This was originally going to be 2 weeks before Christmas, but one of the solicitors has taken it upon themselves to try and bring the whole thing down for some reason no one seems to understand.

If someone wants to borrow the scanner, that's fine, but it's surprisingly heavy, so it would probably need collecting rather than shipping. Roughly how many copies are there left to scan, Keith?
Keith Clements
websitedesign
Posts: 3968
Joined: Wed Feb 08, 2006 11:22 am
Your interest in the forum: Jup NKD 258, the most widely travelled , raced and rallied Jowett.
Given Name: Keith

Re: Ancient Jowetteer - What are the odds?

Post by Keith Clements

Jowetteer: at least 20 years x 12 editions x approx. 8 double-sided sheets. We have yet to find any from before 1968, but they may still exist.
Jowett Jive: 42 editions x 6 double-sided sheets. I have most of them.
By Jupiter: 45 years x 4 editions x 14 double-sided sheets. I have most of them.
The Javelin: if we can get them, there are 6 per year, starting in 1957. I only have a few, but Noel may have them in the library.
Flat Four: ditto, but started in 1961.
There will be other useful material that should be put in the digital library, such as Year Books, Rally Programmes, Membership Lists and Vehicle Registers.
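Putting Keith's figures together gives a sense of the scale of the job. This is a rough Python sketch: it reads each "double-sided" unit as one sheet with two sides to scan, it leaves out The Javelin and Flat Four because no end year is given, and the scanner throughput is a hypothetical figure, set deliberately below the "few hundred pages a minute" Barry mentions.

```python
# Approximate back catalogue, in sheets (assumption: double-sided = one sheet, two sides).
catalogue_sheets = {
    "Jowetteer": 20 * 12 * 8,    # years x editions per year x sheets
    "Jowett Jive": 42 * 6,       # editions x sheets
    "By Jupiter": 45 * 4 * 14,   # years x editions per year x sheets
}

SIDES_PER_MINUTE = 100  # hypothetical batch-feed throughput

total_sheets = sum(catalogue_sheets.values())
total_sides = 2 * total_sheets
scan_minutes = total_sides / SIDES_PER_MINUTE

for title, sheets in catalogue_sheets.items():
    print(f"{title}: {sheets} sheets")
print(f"Total: {total_sheets} sheets, {total_sides} sides, "
      f"about {scan_minutes:.0f} minutes of feeding")
```

As Barry points out, removing and replacing staples takes far longer than the feeding itself, so the scanning time is only a small part of the total effort.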

I have another pile of DVDs from the JCC ltd film archive to upload to make a total over 90.
skype = keithaclements ;
Keith Clements
websitedesign
Posts: 3968
Joined: Wed Feb 08, 2006 11:22 am
Your interest in the forum: Jup NKD 258, the most widely travelled , raced and rallied Jowett.
Given Name: Keith

Re: Ancient Jowetteer - What are the odds?

Post by Keith Clements

Nick,
Whilst Adobe Reader will OCR the document it is given so that you can then search the text, it can only do this within that one document.
My long-term aim is to improve the searching of the magazines, but since they contain what may be considered sensitive information they cannot be given to Google or Adobe services.
All the magazines are currently in the JowettTalk library, but they are also in a special indexed library that you can access using a web disk.
This archive has a good index and presents the search results in an easy-to-use way, with excerpts. When you click an excerpt it displays the page in enhanced Cleartext, which gives you the warts and all but also makes it much easier to read.
I have investigated putting the mags into a Content Management System, but that is not as good as the Adobe Cleartext and Indexed Archive. I have also investigated using a private Google Cloud service, which costs money but is probably the best solution!
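The "search results with excerpts" idea can be illustrated with a minimal keyword-in-context search over OCRed text. This is a plain-Python sketch, not how the Adobe indexed archive works internally; the function name and sample text are made up. It runs entirely on the local machine, so no text is handed to an outside service.

```python
import re

def excerpts(text: str, term: str, width: int = 30) -> list[str]:
    """Return a snippet of surrounding context for each case-insensitive match of term."""
    snippets = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        snippets.append("..." + text[start:end].replace("\n", " ") + "...")
    return snippets

sample = "For sale: Javelin, new clutch fitted.\nJupiter wanted."
print(excerpts(sample, "javelin", width=5))  # ['...ale: Javelin, new...']
```

A real archive would also record which edition and page each snippet came from, so that clicking it can open the right page.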
skype = keithaclements ;
Nick Webster
Posts: 313
Joined: Thu Sep 18, 2008 11:38 pm
Your interest in the forum: Jowett Javelin Registrar
Given Name: Nick
Location: Cromer, Norfolk UK

Re: Ancient Jowetteer - What are the odds?

Post by Nick Webster

Keith
To clarify the remarks about Adobe that you mention in the first part of your previous post: I don't use Adobe (which I avoid because I think it is now mostly cloud-based) for any part of the OCR or scanning process. The OCR is part of the software supplied with the scanner and produces a finished product as a completed page. If I understand your thinking, you also propose to remove "sensitive material". I am not at all sure that the Data Protection Act applies to something that is already in the public domain. At this point, data protection is perhaps best left out of a thread about the practicalities of OCR.

I now understand that your aspirations for the documents go beyond merely digitising the old newsletters. This, and potential "re-editing of content", would present a considerable amount of extra work, I would venture to suggest. This is mostly because, in converting to editable text, any OCR software will struggle to convert a low-quality typewritten original without introducing a huge number of mistakes. I have laboriously edited many (non-Jowett) documents in the past. Since starting this discussion I have tried a sample of the typewritten Jowetteer and concluded that if an editable version is required, actually retyping it could be quicker. As you may know, Motor magazine has released a digital archive but, despite a good-quality printed original, has largely resorted to putting it up errors and all. I appreciate that the scale of their effort is somewhat greater than a few pages of Jowetteers.

It occurs to me that if the archiving of JCC documents is to be achieved without it seeming an almost impossible uphill task for any potential volunteer, then perhaps the best route is to create a separate index file listing the contents of each copy. The old editions of the Jowetteer are short enough that, once directed to an edition, thumbing (or rather, scrolling) through 8 pages to find something is not difficult, and as I said before, the original typed versions have a particular charm all of their own.
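A separate index file of this kind could be as simple as a CSV file with one row per item of interest. The sketch below assumes a hypothetical layout (edition, page, subject), and the rows are invented placeholders rather than real index entries; in practice the text would be read from a file on disk instead of a string.

```python
import csv
import io

# Hypothetical index layout; these rows are invented placeholders.
INDEX_CSV = """edition,page,subject
1975-03,2,Javelin for sale
1981-06,5,Jupiter rally report
1981-06,7,Javelin gearbox tips
"""

def find(index_text: str, word: str) -> list[tuple[str, int]]:
    """Return (edition, page) pairs whose subject contains word, ignoring case."""
    hits = []
    for row in csv.DictReader(io.StringIO(index_text)):
        if word.lower() in row["subject"].lower():
            hits.append((row["edition"], int(row["page"])))
    return hits

print(find(INDEX_CSV, "javelin"))  # [('1975-03', 2), ('1981-06', 7)]
```

A plain CSV file can be kept up to date in any spreadsheet, which keeps the task within reach of volunteers who are not programmers.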

Nick
JCC Member
Keith Clements
websitedesign
Posts: 3968
Joined: Wed Feb 08, 2006 11:22 am
Your interest in the forum: Jup NKD 258, the most widely travelled , raced and rallied Jowett.
Given Name: Keith

Re: Ancient Jowetteer - What are the odds?

Post by Keith Clements

Yes Nick. The first job is to find all our material for archive and scan it in, making sure it is of a sufficiently good quality to be useful.
Now that is the first compromise and one which requires human intelligence.
There are many factors informing that decision.
What is the importance of the document? The answer depends on the document and who is asking the question.
For instance, a sketch of a part requires every line to be clear and any dimension to be unambiguous. You can alter many parameters on the scanner software to best achieve this for an individual part of a document but this might compromise another part and make it unintelligible.
Some OCR software may incorporate basic forms of artificial intelligence that help in this respect.
I used such software when I used the facilities at the Rootes Archive Trust to scan in the JCL technical drawings, but it still required lots of trials to make some scans usable. In some cases one page was more than 100MB in size, but the effort and storage space were justified.
For most pages the bar must be set lower, so that the page is readable, perhaps with some human OCR to interpret the smudged characters. This is simply to save disc space. A greyscale 600 dpi page may be 3MB and a black and white one 200kB, but an OCR-only version just 30kB. The latter should be the most readable if it is proofread, and it can also be put in the database to provide a search facility.
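Keith's approximate per-page figures can be scaled up to estimate the storage for a whole archive. A small Python sketch; the page count is a hypothetical round number for illustration, not a count of the actual archive.

```python
# Approximate per-page sizes quoted above, in bytes.
PAGE_BYTES = {
    "greyscale 600 dpi": 3_000_000,
    "black and white": 200_000,
    "OCR text only": 30_000,
}

def archive_gb(bytes_per_page: int, pages: int) -> float:
    """Total storage in gigabytes (10**9 bytes) for a given number of pages."""
    return bytes_per_page * pages / 1e9

PAGES = 10_000  # hypothetical page count

for label, size in PAGE_BYTES.items():
    print(f"{label}: {archive_gb(size, PAGES):.1f} GB")
```

With these figures a greyscale page takes 100 times the space of its OCR text, which is the trade-off behind keeping the large originals off-line and the small versions on the server.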
Note the Jowetteer index was started by Eden Lindsay and is on JT. Most technical docs have been prepared by Michael Allfrey and are on JT. We just need to add to these.
Just to reiterate: all the post-1982 Jowetteers have been scanned, OCRed and indexed into a very useful resource available to members. But the index only works on a local PC using a local disk file. It will not work over the web disk, which is used to transfer files to the local machine. This is not satisfactory and is caused by Adobe restrictions; hence my search for a better solution. The sensitivity is not a GDPR issue but one instilled in JowettTalk from the outset, where members have a space to communicate in a friendly (not hostile) environment, unlike other social media.
skype = keithaclements ;
BarryCambs
Posts: 331
Joined: Fri Nov 25, 2011 2:49 pm
Your interest in the forum: Owner of a long two in Cambridge
Given Name: Barry

Re: Ancient Jowetteer - What are the odds?

Post by BarryCambs

I've managed to convert the scanned Jowetteer to a searchable PDF and it works well, but it is 17MB. I scanned in greyscale, as I was assuming all these older documents are greyscale. The scanning takes seconds; it is the rest of the process that takes the time. It does look like there is at least one bit of software that will do batch processing from the command line (Linux), but I guess you could still end up changing every file name manually to fit the indexing.
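Barry does not name the batch-processing tool, so the sketch below covers only the renaming step he mentions. It assumes a hypothetical naming convention (jowetteer-YYYY-MM.pdf) and that the scanner's PDFs, sorted by file name, are in the same order as the list of editions. It uses only the Python standard library.

```python
from pathlib import Path

def new_name(edition: str, suffix: str = ".pdf") -> str:
    """Build a name such as 'jowetteer-1981-06.pdf' from an edition given as 'YYYY-MM'."""
    year, month = edition.split("-")
    return f"jowetteer-{int(year):04d}-{int(month):02d}{suffix}"

def rename_batch(folder: Path, editions: list[str]) -> list[Path]:
    """Rename the PDFs in folder, taken in sorted (scan) order, to match editions."""
    scans = sorted(folder.glob("*.pdf"))
    if len(scans) != len(editions):
        raise ValueError(f"{len(scans)} scans but {len(editions)} editions listed")
    renamed = []
    for scan, edition in zip(scans, editions):
        target = scan.with_name(new_name(edition))
        if target.exists():
            # Refuse to overwrite rather than silently lose a file.
            raise FileExistsError(target)
        scan.rename(target)
        renamed.append(target)
    return renamed
```

For example, rename_batch(Path("scans"), ["1981-06", "1990-12"]) would turn the first two scans into jowetteer-1981-06.pdf and jowetteer-1990-12.pdf, so only the list of editions has to be typed by hand.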

My big problem at the moment is time. I can look at this, but realistically not until the end of the year. If it would help, I'm happy to have a go at the initial scanning (the easy bit!) if someone else wants to have a go at the conversion and indexing.
Keith Clements
websitedesign
Posts: 3968
Joined: Wed Feb 08, 2006 11:22 am
Your interest in the forum: Jup NKD 258, the most widely travelled , raced and rallied Jowett.
Given Name: Keith

Re: Ancient Jowetteer - What are the odds?

Post by Keith Clements

Hi Barry,
A few questions.
  • Which Jowetteer did you scan?
  • Do you have any pre-1982 issues?
  • What converted the scan into the searchable PDF?
  • How accurate was the OCR, perhaps expressed as a percentage of failed words?
  • Did the page have any smudges or poor-quality printing?
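One way to put a number on OCR accuracy as a percentage of failed words is to compare the OCR output with a proofread version of the same page. A minimal Python sketch using the standard library's difflib: it counts proofread words that do not line up with the OCR words, which is a rough measure rather than a formal word error rate. The function name and sample sentences are invented.

```python
import difflib

def failed_word_percentage(ocr_text: str, proofread_text: str) -> float:
    """Percentage of proofread words that are not matched, in order, by the OCR output."""
    true_words = proofread_text.split()
    ocr_words = ocr_text.split()
    if not true_words:
        return 0.0
    matcher = difflib.SequenceMatcher(a=true_words, b=ocr_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * (len(true_words) - matched) / len(true_words)

proofread = "the Javelin has a flat four engine"
ocr = "the Jave1in has a flat f0ur engine"
print(f"{failed_word_percentage(ocr, proofread):.1f}% failed words")  # 28.6% failed words
```

It needs a proofread reference, so in practice it would be run on a small sample of pages to decide whether a batch needs rescanning.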
After scanning I always find that a few pages need a rescan, and after proofreading they might need adjustment and rescanning.

If it is a 17MB PDF and it has been OCRed, you can normally reduce it to about a third of that size and still have a good copy.
Each of the process steps can be done by different people.

  • The pages can be assembled and prepared (corners straightened, unstapled, sorted in order) ready for scanning.
  • The pages can be scanned.
  • Put onto the computer server in JowettTalk.
  • OCRed.
  • Put back onto the server.
  • Proofread and corrected (possibly asking for a rescan of selected pages).
  • Put back onto the server.
  • File size reduced.
  • Put back onto the server.
  • Indexed with all the other editions or, if we find a system that easily adds to the index, simply included in the index.
  • Put back onto the server, accompanied by the index.
  • Made accessible via the website.
That is how I did it for the ones in the library. This way the original, perhaps very large, files are stored off-line on a hard drive, DVD or USB stick, and can be referred to for extra detail or additional OCR.
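Because each step can be done by a different person, it may help to track which stage every edition has reached. The sketch below is one hypothetical way to do that in Python: the stage names condense the list above (the repeated "put back onto server" step is treated as implied after each stage), and a rescan request during proofreading sends an edition back to be scanned again.

```python
from enum import Enum

class Stage(Enum):
    """Stages from the list above; returning files to the server is implied after each."""
    PREPARED = 1   # unstapled, corners straightened, sorted, ready for scanning
    SCANNED = 2
    OCRED = 3      # awaiting proofreading
    PROOFREAD = 4
    REDUCED = 5    # file size reduced
    INDEXED = 6
    PUBLISHED = 7  # accessible via the website

def advance(status: dict[str, Stage], edition: str) -> Stage:
    """Move one edition on to its next stage and return the new stage."""
    current = status[edition]
    if current is Stage.PUBLISHED:
        raise ValueError(f"{edition} is already published")
    status[edition] = Stage(current.value + 1)
    return status[edition]

def request_rescan(status: dict[str, Stage], edition: str) -> None:
    """Proofreading found bad pages: send the edition back to be scanned again."""
    if status[edition] is not Stage.OCRED:
        raise ValueError(f"{edition} is not awaiting proofreading")
    status[edition] = Stage.PREPARED

def waiting_at(status: dict[str, Stage], stage: Stage) -> list[str]:
    """Editions currently at the given stage, ready for the next volunteer."""
    return sorted(e for e, s in status.items() if s is stage)

status = {"1981-06": Stage.PREPARED, "1990-12": Stage.SCANNED}
advance(status, "1990-12")         # now OCRED
request_rescan(status, "1990-12")  # back to PREPARED
print(waiting_at(status, Stage.PREPARED))  # ['1981-06', '1990-12']
```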

Extracting text and producing an index is not really the issue. The challenge with PDF documents is pointing the user to the paragraph where the desired text occurs. This is why I would prefer to create HTML documents that can be indexed using HyperText Locators.
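Reading "HyperText Locators" as URL fragment identifiers (the #id part of a link), the idea might look like this minimal Python sketch: each paragraph of an HTML page gets an id, and the index maps each word to links such as page.html#p2 that jump straight to that paragraph. The file name and the pN id scheme are hypothetical.

```python
import html
import re
from collections import defaultdict

def to_html(paragraphs: list[str]) -> str:
    """Wrap each paragraph in <p id="pN"> so a link can point straight at it."""
    return "\n".join(
        f'<p id="p{n}">{html.escape(text)}</p>'
        for n, text in enumerate(paragraphs, start=1)
    )

def build_index(docs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each lower-case word to links of the form 'file.html#pN'."""
    index = defaultdict(list)
    for filename, paragraphs in docs.items():
        for n, text in enumerate(paragraphs, start=1):
            for word in set(re.findall(r"[a-z0-9]+", text.lower())):
                index[word].append(f"{filename}#p{n}")
    return dict(index)

docs = {"jowetteer-1981-06.html": ["Javelin clutch advice.", "Jupiter rally report."]}
print(build_index(docs)["jupiter"])  # ['jowetteer-1981-06.html#p2']
```

Following such a link makes the browser scroll to the matching paragraph, which is the behaviour that is hard to get from a page-based PDF index.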
skype = keithaclements ;