week 1
In the corpus project:
- with the look and feel of the documents finalized, ported over all of the existing work on Alexander I from the old site.
- created expandable years with document numbers on the alexander-i page
- added all available Document Summaries under the expandable years
- created and linked all available actual document translation pages
- created aux helper files and added a list-aux-pages entry in the admin menu
- added a no-parent entry in the admin menu, but I don't think it captures everything
- added top level placeholders for poland and paul
All this took several hours.
Also, removed the 'contact' menu from the top and added about-us to the side.
(SM)
I need to make a summary page somewhere that shows the current state of the various translations being done around the website. For example, each page's status could be one of:
- untranslated
- more needed
- complete rough translation
- in good shape
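A rough sketch of how such a status summary could be tallied, assuming each page simply carries one of the four statuses above. The names `Status` and `summarize` and the page slugs are made up for illustration; nothing like this exists on the site yet.

```python
from collections import Counter
from enum import Enum


class Status(Enum):
    """The four translation states listed above, in order of progress."""
    UNTRANSLATED = "untranslated"
    MORE_NEEDED = "more needed"
    ROUGH = "complete rough translation"
    GOOD = "in good shape"


def summarize(pages: dict[str, Status]) -> dict[str, int]:
    """Count how many pages are in each state, including states with zero pages."""
    counts = Counter(pages.values())
    return {status.value: counts.get(status, 0) for status in Status}


if __name__ == "__main__":
    # Hypothetical page slugs, only to show the shape of the input.
    pages = {
        "alexander-i/doc-01": Status.GOOD,
        "alexander-i/doc-02": Status.ROUGH,
        "paul/preface-1": Status.UNTRANSLATED,
    }
    for label, count in summarize(pages).items():
        print(f"{label}: {count}")
```

Keeping the statuses in one enum means the summary page always lists all four rows in the same order, even when a state has no pages.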
Can also make an 'urgently needed' page where editors can ask for immediate help on a particular phrase from a page they're working on.
Alexander Docs 1-12 (1801-1803) now on site.
(SM)
Thanks to Valentin for the list of the all-russian numismatic conferences on the conferences page and for the synopsis of the uzdenikov essay collection (2004).
I've scanned and added the text of the uzdenikov essay - overstriking of coins in the 18th c. The scan came from the 1994 collection of essays, so hopefully it hasn't changed in the 2004 book (Valentin?). This essay, as far as I know, wasn't translated for JRNS. It's interesting subject matter and may make a good test case for a journal submission of a group translation.
The russian text needs a little clean up. Because of page size differences, line break hyphens from the original often now appear in the middle of a line. These can be cleaned up as they are spotted. Also, reference superscripts from the original need to be added to the russian text here. I've done the later ones (13-16) but the earlier ones need to be located in the original and added to the text here.
Added
Alexander Docs 1-25 (1801-1808) now on site.
Added 'translate-me' tags for quick location of what's in translation scope.
Added the Table of Contents for the 7th all-russian numismatic conference, Yaroslavl' (1999)
(SM)
week 2
Added
Alexander Docs 1-25 (1801-1809) now on site.
I removed the translate-me tag. It really doesn't serve much purpose (almost the whole web site is supposed to be about translating) and was quickly dwarfing every other tag.
Added
- Paul Preface (page 1)
- Paul Preface (page 2)
- Paul Preface (page 3)
- Paul Preface (page 4)
- Paul Preface (page 5)
- Paul Preface (page 6)
(SM)
Added
- Paul Document No.2
- Paul Document No.3
- Paul Document No.10
- Poland Preface i
- Poland Preface ii
- Poland Preface iii
- Poland Preface iv
- Poland Preface v
- Poland Preface vi
(SM)
Final documents are brought over from the old wiki. These are
- Poland Document No.01
- Poland Document No.02
- Poland Document No.04
- Poland Document No.05
A lot of formatting work needs to be done on these (original paragraph structure, tables, etc.).
The only remaining page on the original site is the (small) numismatic dictionary.
Created project-status in the admin area.
Created and tagged all remaining document pages for Paul (haven't added the russian text yet, though)
(SM)
- All 1796 Paul documents on site (10)
- All 1797 Paul documents on site (17)
- All 1798 Paul documents on site (6)
- All 1799 Paul documents on site (4)
- All 1800 Paul documents on site (9)
- All 1801 Paul documents on site (2)
Formatted Uzdenikov overstriking article to show paragraphs, removed line-break characters, and tracked down the reference superscript numbers within the text. That article should be good to go now, as far as being ready to translate.
week 3
Quiet week from me as I've been working on the auction database project. Valentin added the dictionary entries from the old pbwiki site. Thanks!
For the corpus translation, I looked at OCR for reading pre-1918 Cyrillic. Abbyy FineReader Pro can do this, but costs $399. Probably worth it but need to think about it. I seem to recall being able to do this before (a couple of years ago) with some version of Abbyy but maybe that was a trial. Will sniff around my old emails to see if I can find any Abbyy info.
OK…it was Abbyy ScanToOffice. I bought a copy a couple of years ago for $50 and it did cyrillic. I don't remember if it did *old* russian characters. Don't know where that copy is now as I had to reinstall windows at some point.
I checked, and I do have Abbyy ScanToOffice installed at home. Trying a scan of a page of the Peter III Volume quickly showed 2 problems:
- The format of the (Blue & Gold) Corpus I have is too large for my flat-bed book scanner
- ScanToOffice does have russian as one of two possible input languages active at any one time, but doesn't recognize old cyrillic characters (at least at 300 dpi). For example, it scans Цi as ш. Maybe increasing the resolution would allow it to separate the characters into 1 russian + 1 english. Interestingly it does change Yat ѣ to e, though I'm not sure it's being smart…maybe e is just the closest-looking character to a Yat?
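One quick way to check whether an OCR pass kept the old letters is to count them in the output text and compare against what is visibly on the page. This is only a sketch: `PRE_REFORM` and `pre_reform_counts` are names I made up, the letter set is limited to the four letters dropped in the 1918 reform (yat, decimal i, fita, izhitsa), and the sample strings are illustrative rather than taken from the corpus.

```python
from collections import Counter

# Letters removed from Russian by the 1918 reform, lower and upper case:
# yat, decimal i, fita, izhitsa. The hard sign is left out because it
# still exists in modern Russian.
PRE_REFORM = "ѣѢіІѳѲѵѴ"


def pre_reform_counts(text: str) -> Counter:
    """Count each pre-reform letter found in the OCR output."""
    return Counter(ch for ch in text if ch in PRE_REFORM)


if __name__ == "__main__":
    # Illustrative strings only: the same phrase with and without old letters.
    kept = "Цѣна і вѣсъ"
    lost = "Цена и вес"
    print(sum(pre_reform_counts(kept).values()))
    print(sum(pre_reform_counts(lost).values()))
```

A page that clearly shows yats in the scan but gives a count of zero in the OCR text has been silently modernized, which is what ScanToOffice seems to be doing.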
The format problem *might* be solved if I can put Basok's images from his corpus CD through OCR (thanks to BKB for that suggestion). I'm not sure Basok's resolution is high enough though. If it is, I still need to solve the old cyrillic problem, and will try a demo of Abbyy's pro ($400) solution to see how it does.
week 12
The Basok CD resolution was definitely not high enough. A google search for online scans of the corpus returned many results, but all those I checked were just the Basok jpgs.
A couple of weeks ago Valentin tracked down an online source of high resolution scans of almost all volumes. Unfortunately the Anna+Ivan III volume is missing, so that will have to wait. It should be easy to use any one of the available volumes to test the Abbyy OCR method on old cyrillic. Step 1 will be to take a test page from the PDF and convert it to a high resolution image format (e.g. TIFF, BMP, JPG). Actually Step 0 is to see what format Abbyy needs.
OK…seems like Abbyy (Professional Edition) will take PDF (as well as many other input formats). If it *can* work with the corpus pdfs that will save one step (converting PDF to TIFF/JPG/BMP).
Downloaded trial version of Abbyy Professional Edition. Installing….
Good news. Abbyy pro works just fine on the corpus PDF and seems to have little trouble reading the text, even in old cyrillic. One annoyance is that the corpus often breaks words across lines, and when the OCR result is exported to Word, the carried-over word break often ends up in the middle of a line, som- ething like this.
Even better, the $349 price for Abbyy FineReader Pro is currently $149 (sale), and a $30 coupon brought it down to $119. I've already bought it.
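The stray word breaks could probably be cleaned up in bulk once the text is out of Word. Below is a minimal sketch, assuming the artifact always looks like a letter, a hyphen, some spaces, and then a lowercase letter (as in "som- ething"). The function name `rejoin_broken_words` is made up, and the rule is only a heuristic: a genuine hyphen followed by a space and a lowercase word would also get joined, so the result still needs a read-through.

```python
import re

# A letter, a hyphen, one or more spaces or tabs, then (looked ahead, not
# consumed) another letter. [^\W\d_] matches letters in any script,
# including cyrillic.
_BROKEN = re.compile(r"([^\W\d_])-[ \t]+(?=([^\W\d_]))")


def rejoin_broken_words(text: str) -> str:
    """Remove mid-line hyphen breaks when the continuation is lowercase."""

    def fix(match: re.Match) -> str:
        left, right = match.group(1), match.group(2)
        if right.islower():
            return left          # drop the hyphen and the spaces
        return match.group(0)    # leave things like "Anna- Ivan" alone

    return _BROKEN.sub(fix, text)


if __name__ == "__main__":
    print(rejoin_broken_words("som- ething like this"))
    print(rejoin_broken_words("моне- ты"))
```

Digits are deliberately excluded, so a range such as "1801- 1803" is left untouched.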
The next question is where to start….
One idea Valentin and I talked about is to choose a small volume like Paul or Peter III and work on that. Another idea could be to translate the Preface to each of the available volumes.
week 14
Have decided to start by posting the Prefaces to all available volumes. This is manageable, and should be both useful and of broad appeal.
To that end I need to extract the Preface pages one-by-one from their parent PDFs (actually a slow process with pdfedit995). The single PDFs will then get passed through Abbyy OCR and then imported into this wiki.
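The page-by-page extraction might be quicker with a script than with pdfedit995. The sketch below uses the third-party pypdf library in place of pdfedit995, which is my substitution and not what was actually used. The file names, the `-pNNN` naming scheme, and the page numbers are all hypothetical.

```python
from pathlib import Path


def page_filenames(stem: str, first_page: int, last_page: int) -> list[tuple[int, str]]:
    """Map 1-based page numbers to (zero-based index, output file name)."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("page range must be 1-based and non-empty")
    return [(n - 1, f"{stem}-p{n:03d}.pdf") for n in range(first_page, last_page + 1)]


def extract_pages(source_pdf: str, stem: str, first_page: int,
                  last_page: int, out_dir: str = ".") -> list[Path]:
    """Write each page in the range to its own single-page PDF."""
    # Imported here so the naming helper above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(source_pdf)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, name in page_filenames(stem, first_page, last_page):
        writer = PdfWriter()
        writer.add_page(reader.pages[index])
        target = out / name
        with open(target, "wb") as handle:
            writer.write(handle)
        written.append(target)
    return written


if __name__ == "__main__":
    # Hypothetical stem and page range, shown without touching any PDF.
    for index, name in page_filenames("elizabeth-preface", 5, 6):
        print(index, name)
```

Calling `extract_pages("volume.pdf", "elizabeth-preface", 5, 6, "singles")` would then leave two single-page PDFs in `singles`, ready to hand to Abbyy.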
So far, I've extracted
- Elizabeth & Peter III Preface, 2 Pages
- Elizabeth Introduction, 4 Pages
- Peter III Introduction, 2 Pages
into single PDFs. The next step will be to pass them through Abbyy, creating MS-Word docs, and then import those here.