Dear Rauf,
Sorry for the delayed reply.
On your questions:
1. As far as I know, the CLE at UET Lahore has achieved good accuracy in scanning Urdu content. But the Urdu OCR, again to my knowledge, works with higher accuracy for larger fonts. The tool may have become more robust since then, so you can check with them.
2. CLE also developed a Stemmer for Urdu, but in my experience it only chops a specific list of words. It could be that I used an older version and a more recent release has adopted all the necessary rules.
In any case you can take a look at the following thesis (also supervised by Dr. Sarmad) that I read some time ago. It outlines several rules based on which a good Stemmer could perhaps be developed.
--
Kind Regards,
Yasir.
On Thu, Jan 19, 2012 at 7:08 AM, Rauf Malick <raufmalick@yahoo.com> wrote:
Assalamoaliekum,
I would like to ask: since this project is ICT funded, will your work be open to the public or classified?
Secondly, can anyone share information about a proper OCR and Stemmer for the Urdu language?
Regards,
Rauf
Sent: Monday, January 16, 2012 10:39 PM
Subject: [pakgrid] URDU Content on Web
Dear All,
Under an ICT4D project we are trying to map the Urdu content presently
available on the web. Any information on portals, sources, leads, links, or web sites
would help us a lot.
Information on persons and organizations working on preparing, translating
or displaying Urdu content on the web would also help us achieve the
objectives.
Information, suggestions, comments are requested.
Ammar Jaffri
0300-8551479