|
Post by Doug 周 on Apr 13, 2009 0:44:01 GMT -5
How do you know if you have a good GEDCOM? I have emphasized owning your own GEDCOM, replete with Chinese characters. I have recommended using an online genealogy service like www.Dynastree.com, www.Myheritage.com, or www.Geni.com to generate a GEDCOM. With this GEDCOM, you can experiment with other programs by importing it into a computer-based genealogy program or into another online genealogy service (like www.jiapu.cn) to try out their features. This raises the question: how do you know you have an accurate GEDCOM? I assume you have exported a GEDCOM generated by an online or computer-based genealogy program.
Step 1: Configure your computer and browser to read UNICODE. See siyigenealogy.proboards.com/index.cgi?board=software&action=display&thread=849 (reply #2).
Step 2: Familiarize yourself with the structure and format of the GEDCOM file. I recommend looking at a URL Henry provided: blog.eogn.com/eastmans_online_genealogy/2008/08/gedcom-explaine.html
Step 3: Open the GEDCOM. The GEDCOM is a plain text file and very easy to read using commonly available programs, some of them free. It can be read with programs like Microsoft EXCEL or WORD, or a text program like Microsoft NOTEPAD. Make sure you enable UNICODE. To force EXCEL or WORD to read the GEDCOM, you may have to rename the GEDCOM file with a friendlier extension like .DOC or .XLS. If you don’t own a license for WORD or EXCEL (NOTEPAD comes with all Microsoft Operating Systems), then try docs.google.com or download.openoffice.org/ and use the free spreadsheet or word processing programs. I have no experience with Linux or Mac OS. It took me several years before I discovered that I could read a GEDCOM file directly. By reading the GEDCOM directly, I learned that there is no true standard format for GEDCOM, but knowing what was in my GEDCOM allowed me to control its content.
Step 4: Open a window with your genealogy program and navigate to a sample person or profile. Use the most extensive profile, or the person with the most extensive information you want to corroborate in your GEDCOM. Assume this person’s name is SAMPLE_NAME. Don’t try to read the whole file. Concentrate on only one person or you will be overwhelmed by information.
Step 5: Go to the window with your opened GEDCOM. Start the SEARCH function of the program (frequently [EDIT]=>[FIND]) and type the SAMPLE_NAME to find the GEDCOM record containing SAMPLE_NAME. You should be familiar with the structure of the GEDCOM from Henry’s URL reference: blog.eogn.com/eastmans_online_genealogy/2008/08/gedcom-explaine.html
You may have to search for SAMPLE_NAME more than once to find the beginning of the correct record, because the same name can also appear inside other people's records.
Step 6: Once you find the correct GEDCOM SAMPLE_NAME record, compare the information with the SAMPLE_NAME profile in your genealogy program. This is the advantage of opening your genealogy program in one window and the GEDCOM in another window. Having 2 monitors is helpful. Otherwise, make each window ½ screen size so you can compare the information side by side.
If the two do not match, I reorganize my data in the genealogy program so that it is represented accurately in the GEDCOM record. I may add carriage return(s) to long text to make sure the information carries over to the GEDCOM. I may discover that certain fields in a particular genealogy program do not transfer across to the GEDCOM. I therefore re-position or reorganize my genealogy program’s data to make sure the information is kept true in the GEDCOM. I may also discover that the programmers have changed how the data is saved, so that my formerly accurate GEDCOM export becomes better (or worse). Finally, I save my data in the genealogy program, re-export a new GEDCOM, and recheck it for accuracy. This computer-geek-speak can be confusing, so please private email me off-list by clicking on my name on the left and I will try to help you individually.
Finally, despite the additional cost, I still recommend www.dynastree.com to build a relatively accurate GEDCOM family tree with Chinese characters (Zupu).
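For readers comfortable with a little scripting, the search in Steps 5 and 6 can also be sketched in Python. This is only an illustration under stated assumptions, not part of any genealogy program: the function names `split_records` and `find_person` and the sample data are hypothetical, and the export is assumed to be a line-based GEDCOM in which every record begins with a level-0 line.

```python
def split_records(gedcom_text):
    """Group GEDCOM lines into records; each record starts at a level-0 line."""
    records = []
    for line in gedcom_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("0 ") or not records:
            records.append([line])
        else:
            records[-1].append(line)
    return records


def find_person(gedcom_text, name):
    """Return the individual (INDI) records in which any line contains `name`."""
    matches = []
    for record in split_records(gedcom_text):
        is_individual = record[0].endswith(" INDI")
        if is_individual and any(name in line for line in record):
            matches.append(record)
    return matches


# Hypothetical sample data, not taken from a real family tree.
SAMPLE_TREE = """0 HEAD
1 CHAR UTF-8
0 @I1@ INDI
1 NAME SAMPLE_NAME /WONG/
1 BIRT
2 DATE 1 JAN 1900
0 @I2@ INDI
1 NAME OTHER_NAME /WONG/
0 @F1@ FAM
1 HUSB @I1@
0 TRLR"""

# Print every line of each matching record, ready to compare with the profile.
for record in find_person(SAMPLE_TREE, "SAMPLE_NAME"):
    print("\n".join(record))
```

To try it on a real export, the text could be read with `open(path, encoding="utf-8-sig").read()`, assuming the file was saved as UTF-8; other encodings would need a different `encoding` value.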
|
|
|
Post by chansomvia on Apr 13, 2009 5:11:48 GMT -5
Hi Doug,
You wrote: "If you don’t own the license for WORD or EXCEL (NOTEPAD comes with all Microsoft Operating Systems) then try:
docs.google.com
or
download.openoffice.org/
and use the free on-line spreadsheet or word processing programs. I have no experience with Linux or Mac OS’s."
I use OpenOffice and can assure you that this free program will open .doc and .xls files easily. It is a very powerful program, and even people on Windows are using it; it is not necessary to have Linux or Mac OS. Thank you for your very detailed and clear explanation of GEDCOM usage. It certainly adds a veneer of professionalism to this forum.
Joe
|
|
|
Post by Henry on Apr 13, 2009 9:08:17 GMT -5
Hi Doug,
I would like to ditto Joe's thanks. Thank you so much!
You have helped Forum members immeasurably, first by taking on this very difficult thread and then by providing information about the COCR2 program, which Philip, Ben, and I used extensively. Whew, what a time and effort saver for converting images of Chinese characters into actual Chinese text characters.
Regarding the GEDCOM - are you saying that the GEDCOM is not so standardized? Are the tag labels and size parameters standardized? Is this basically a kind of "metadata" file?
Henry
|
|
|
Post by Ah Gin on Apr 13, 2009 15:09:46 GMT -5
Doug, Henry, et al,
I have been entering data into Geni, and thus progressively building up our family tree. Today I tried an experiment. In Geni, select the "Relative" tab. Select "List". Highlight the area you wish to "download". Right-click the highlighted area and select "copy" to screen scrape the data. Go to a new Excel page and paste. Repeat for the remaining lists of names. The Excel spreadsheet seems to accept the copied data very well. So in a way, I have "backed up" my work.
Regards, Ah Gin
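A comparable spreadsheet backup could, in principle, also be produced from an exported GEDCOM instead of by copy and paste. The Python sketch below is a hedged illustration only: the function name `names_to_csv` and the sample data are hypothetical, and it assumes a line-based GEDCOM in which each person is an INDI record with level-1 NAME lines.

```python
import csv
import io


def names_to_csv(gedcom_text):
    """Return CSV text with one row per NAME line: record id and name value."""
    rows = []
    current_id = None
    for line in gedcom_text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) == 3 and parts[0] == "0" and parts[2] == "INDI":
            current_id = parts[1]          # start of a person record
        elif parts[0] == "0":
            current_id = None              # any other record type
        elif current_id and len(parts) == 3 and parts[:2] == ["1", "NAME"]:
            rows.append([current_id, parts[2]])
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "name"])
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample data, not taken from a real family tree.
SAMPLE_EXPORT = """0 HEAD
0 @I1@ INDI
1 NAME SAMPLE_NAME /WONG/
0 @I2@ INDI
1 NAME OTHER_NAME /WONG/
0 TRLR"""

print(names_to_csv(SAMPLE_EXPORT))
```

When saving the result for Excel, writing the file with `encoding="utf-8-sig"` is a common way to help Excel recognize Chinese characters, though behavior may vary between Excel versions.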
|
|
|
Post by Doug 周 on Apr 24, 2009 20:04:39 GMT -5
Henry, here is another example of a GEDCOM file. This is from my own GEDCOM, pared down to emphasize some examples and protect privacy. The words in red and green are my comments explaining the GEDCOM. Sorry for the delay in responding; I had a hard time finding a photo-uploading site.
It is essentially a text file made up of short lines. Each line starts with a level number, followed by a tag and then the data value for that tag. It is meant as a computer-readable file. Programs like EXCEL and WORD will open it as plain text. If you open it with a text reader like NOTEPAD, it may look like a bunch of words and can be very intimidating.
Word of caution: Be careful about modifying and saving your file for future upload. Computers are very picky and any little mistake will 'corrupt' the file and make it unreadable. The purpose of this thread is to allow you to read and verify your GEDCOM. Let your genealogy program fix and generate any updated GEDCOM. Of course, always back up your files. You don't want to ruin your archival GEDCOM Zupu. Use the [RENAME] function on your computer to label the backup copy.
Quoted text from Henry: Regarding the GEDCOM - are you saying that the GEDCOM is not so standardized? Are the tag labels and size parameters standardized? Is this basically a kind of "metadata" file?
I am not sure what size each tag is allowed under the GEDCOM protocol. GEDCOM is a very loose standard. Given name, surname, middle name, date of birth, and birthplace are very standard. Family relationships are also standard. Everything after that depends on how one programmer writes things into the GEDCOM and how another programmer reads the GEDCOM into their genealogy program. As an example of this loose standard, look at:
2 _VWDPN 王巴崙 SAMPLE_NAME WONG
which is towards the middle of the page. Please refer to my GEDCOM example again:
This 'tag' named '_VWDPN' is the 'display name' for the Dynastree program. Most other programs do not understand this tag. Kerry's PhpGedView ( www.kerrychoy.id.au/phpGedView/index.php?ctype=gedcom&changelanguage=yes&NEWLANGUAGE=english ) can read that tag, however. Some programs can read this tag; many cannot. Of interest, this individual tag (_VWDPN) provides a way for the GEDCOM to have several names per person, and allows Dynastree to have a display name different from the one built from the given name and surname. Practically speaking, I can display the surname first for relatives in China, and display the given names first for relatives outside China. Hope this helps and does not confuse. Doug
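To make the line layout concrete, here is a small Python sketch that splits a GEDCOM line into its level, optional record id, tag, and value, and lists the program-specific tags. It is an illustration under assumptions: the names `parse_line` and `custom_tags` are hypothetical, the sample record is invented around the `_VWDPN` line quoted above, and it relies on the GEDCOM convention that user-defined tags begin with an underscore.

```python
import re

# level, optional @id@, tag, optional value
LINE_PATTERN = re.compile(r"^(\d+) (?:(@[^@]+@) )?(\S+)(?: (.*))?$")


def parse_line(line):
    """Split one GEDCOM line into (level, record id or None, tag, value)."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value or ""


def custom_tags(gedcom_text):
    """Return the sorted set of tags that start with an underscore."""
    found = set()
    for line in gedcom_text.splitlines():
        if line.strip():
            _, _, tag, _ = parse_line(line)
            if tag.startswith("_"):
                found.add(tag)
    return sorted(found)


# Hypothetical record built around the _VWDPN line from the post.
SAMPLE_RECORD = """0 @I1@ INDI
1 NAME SAMPLE_NAME /WONG/
2 _VWDPN 王巴崙 SAMPLE_NAME WONG
1 SEX M"""

print(parse_line("2 _VWDPN 王巴崙 SAMPLE_NAME WONG"))
print(custom_tags(SAMPLE_RECORD))
```

Listing the underscore tags in an export is one quick way to see which pieces of data another program may silently ignore on import.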
|
|
|
Post by Henry on Apr 25, 2009 8:49:21 GMT -5
Hi Doug, Thank you for all your dedication and hard work in revealing all the details of GEDCOM. You are probably already aware of the following links:
www.genealogyforum.com/gedcom/
forum.geni.com/topic.php?id=11981
www.tngforum.us/index.php?showforum=25
www.gedcoms.com/index.php?option=com_joomlaboard&Itemid=28&task=listcat&catid=5
I am quite surprised that the GEDCOM "standard" is so variable. I am starting to think that we may need to create a self-describing extension to GEDCOM via XML for Chinese genealogy; however, this is way beyond what normal everyday users of genealogy software want to hear about or even know. Perhaps Ahgin's rather basic approach of using an Excel spreadsheet to preserve and back up valuable genealogy information may be the way to go until a real GEDCOM "standard" emerges.
I am also beginning to believe that the Excel spreadsheet may be an excellent mechanism to display and store Chinese lineages, as has been demonstrated by Philip Tan and Xuangxing. The Excel spreadsheet is quite a standard in its own right and could be the textual and graphical mechanism we Chinese genealogists seek. It seems to handle images of Chinese characters, and actual Chinese characters as text, without any problems. Including images of Chinese characters and/or Chinese text is probably the biggest impediment for Chinese genealogists, so since the Excel spreadsheet seems to handle this much better than a traditional database approach, maybe we do not need so much overkill. As everybody has seen on the Clan Progenitor Reference thread, the use of the Excel spreadsheet for presenting and displaying Chinese lineage information is very nice. I also think it has some basic searching capabilities and some logical capabilities that can be embedded in the cells. There are probably ways to link graphics, voice, multimedia, and/or logical features of Microsoft Office into the Excel spreadsheet, as they are all part of the same suite.
If we accept the Excel spreadsheet as an interim / transitional solution, then Doug and others can get on with the real business at hand: researching and recording Chinese genealogy lineages and information. Comments? Henry
|
|
|
Post by Ah Gin on Apr 25, 2009 19:41:48 GMT -5
Henry et al, The sun is setting on the west coast of California, and I am having a cup of tea, reflecting on the day. Let me explain: Doug and I agreed to meet at the Gin Association in San Francisco for a wide range of heritage-related discussions, with a lunch break of Dim Sum in Chinatown. We said we would start at 11am, but clearly we were both looking forward to the meeting so much that both of us were early. I lined up our Association President to meet and welcome Doug to our modest Association. After a brief introduction to the history and current management structure of our Association, we went straight to the heart of the discussion, and that is GEDCOM etc. Doug treated me to a taste of a number of Roots-related software programs, and I am very impressed with Doug's knowledge of the software capabilities. Equally impressive is the extent of his Family Tree. He must have spent hours engaging his own relatives near and far to build up a content that any Family Researcher would be proud of. Thousands of names, compared to my current Geni version of less than 100 names. The face-to-face discussion helped me make a great deal of sense of what he posted previously. Thank you Doug for your generosity of time and willingness to help. We also spent time talking about what a typical Zupu contains, in this case the Gin Zupu from our Association library. We also located the address and contact point of his association, located at Waverley Place, the heart of San Francisco Chinatown.
Whilst this is not a technical posting about Software for Chinese Genealogy per se, it is about recording my heart-felt thanks to a fellow researcher so willing to help others. At the Dim Sum place, we did the Chinese thing of fighting over the check. Doug played the card of "You are a visitor". I can't argue too much on this, and my offer to him and indeed any fun-loving serious researcher -- I will return the favor when one day we meet again. And who knows, it might be sooner than we think. As if that was not enough, he drove me home to my camping ground in San Francisco in a well appointed convertible (sorry, I know not much about expensive cars, but I know comfort when I sit in one). Thanks again. We will pick up offline to further exchange ideas and mutual support. Regards, Ah Gin
|
|
|
Post by Henry on Apr 25, 2009 20:23:26 GMT -5
Ahgin & Doug, Among the rewards of Chinese genealogy research are the friendships that develop during the process. Henry
|
|
|
Post by Doug 周 on Apr 30, 2009 12:09:27 GMT -5
Such a gracious post from AhGin, the pleasure being all mine. I plan to be on the USA East Coast this summer and I want to meet Henry. Hopefully we can arrange something. Frequently it is better to communicate and share in person than to get 'carpal tunnel syndrome' from typing long posts. As Henry said, this forum has provided opportunities to network.
I would like to re-emphasize the advantages of owning your family tree as a GEDCOM. A GEDCOM is a data file which genealogy software will interpret. The software will then display the names and textual data on the computer screen in a format determined by the programmer. The purpose of owning your own GEDCOM is to allow you to try different programs without having to re-input your basic data. Every service or software has different features. I use Myheritage.com and Jiapu.cn to keep in contact with relatives in China since these have Chinese language modules. I use Dynastree to maintain my GEDCOM, communicate with English speaking relatives, and manage my family photos. I use Geni.com and PhpGedView to keep up to date with some of the cutting edge ideas in genealogy software. I do all this by merely uploading my GEDCOM, and I avoid having to re-input over 1000 names, dates of birth, and locations.
The purpose of this particular thread is to make sure your GEDCOM is accurate and has all the data you have meticulously input via your software interface. Then, when you upload your verified GEDCOM to another software or service, you have a good idea what the new software or service has to offer. Because of the confusing nature of the GEDCOM structure, please try not to interpret the data and structure manually. That is the function of computer software. Especially don’t try to manually modify the GEDCOM and expect it to upload correctly into a service or software. The chance of data ‘corruption’ is high. This thread is meant to advise how to compare your GEDCOM data to your favorite software program.
Henry wrote: I am quite surprised at the GEDCOM "standard" is so variable - I am starting to think that we may need to create a self describing extension to GEDCOM via XML for Chinese genealogy, however, this is way beyond what normal everyday users of genealogy software want to hear about or to even know.
Again, GEDCOM is the ‘standard’ language for computer genealogy software to read and share data. It should not be used to read and decipher your data manually. It is an older standard and does not handle multimedia well. There is a newer GEDCOM 6.0 draft, which is XML in format (the widely used 5.5 version is still the older line-based format). The draft has existed for years. However, the ‘which comes first, the chicken or the egg’ problem applies. Programmers will not use the newer 6.0 standard until it becomes more common. Hence it languishes.
Henry wrote: Perhaps Ahgin's rather basic approach of using an Excel spreadsheet to preserve and back up valuable genealogy information may be the way to go until a real GEDCOM "standard" emerges. etc.
I saw the ‘screen scraping’ into Excel demonstrated by AhGin. It works well with a deep longitudinal family tree with minimal branches. However, with wide branches, multiple in-laws, and 2nd to 6th cousins (a ‘wide’ extensive tree), the connection of the trunks might become confusing. The ability to display photos is useful.
Please store your important genealogy information in as many ways as possible. We assume that GEDCOM will continue to be a standard for years to come. EXCEL spreadsheets have also been a standard for years. Store your data on paper as a printout as well, because paper has been a standard for centuries. I also store my data as PDF, to ‘back up’ the ‘back up’. The nicest function of a computer program is the ability to search for the Chinese character (family member) quickly.
|
|
|
Post by Henry on Apr 30, 2009 14:04:33 GMT -5
Hi Doug, Please let me know when you are coming to the East Coast. If you decide to visit Washington, DC, it would be my pleasure to serve as your host. I admire your tenacity in sticking with the GEDCOM standard. I believe that in the end it will prevail and become the unifying transfer & archival standard for genealogy. Does Ancestry.com support it? How much active standardization is ongoing at this point? If GEDCOM is an International Organization for Standardization (ISO) standard, you may want to join the US Technical Advisory Group (TAG) and become a part of the US Delegation to the ISO Technical Committee developing or revising this standard. If you need some help in this area, I believe my 29 years of experience as a professional standards developer can help. I am still involved in the ISO standardization of geographic standards. One of the most rewarding aspects of Chinese genealogy research is not only the revelation of new lineage information but also the discovery of new friends and colleagues. Henry
|
|
|
Post by kerry on May 2, 2009 16:59:37 GMT -5
I'll be boring and bring this thread back on topic... While it's sometimes helpful to understand the "internals" of GEDCOM, I don't think it's something that you need to spend too much time on. In particular, I wouldn't recommend spending a lot of time looking at the raw GEDCOM file to verify its accuracy. Why do something that a computer can do for you at the push of a button? I'll speak to phpGedView, but I assume the commercial offerings would have something similar. There are functions in PGV to check your GEDCOM. There are different levels of checking, ranging from a minimal GEDCOM "standards" adherence check to more wide-ranging data reasonableness tests.
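To give a feel for what a minimal automated check might look like, here is a short Python sketch. It is not PhpGedView's checker and does not reproduce its rules; the function name `check_structure` and the sample files are hypothetical. It only tests three basic properties of a line-based GEDCOM: every line starts with a level number, a level never jumps by more than one, and the file opens with HEAD and closes with TRLR.

```python
def check_structure(gedcom_text):
    """Return a list of problems found; an empty list means none were found.

    Line numbers count non-blank lines only.
    """
    lines = [ln.strip() for ln in gedcom_text.splitlines() if ln.strip()]
    if not lines:
        return ["file is empty"]
    problems = []
    previous_level = -1
    for number, line in enumerate(lines, start=1):
        first = line.split(" ", 1)[0]
        if not (first.isascii() and first.isdigit()):
            problems.append(f"line {number}: does not start with a level number")
            continue
        level = int(first)
        if level > previous_level + 1:
            problems.append(
                f"line {number}: level jumps from {previous_level} to {level}"
            )
        previous_level = level
    if not lines[0].startswith("0 HEAD"):
        problems.append("first line is not the HEAD record")
    if not lines[-1].startswith("0 TRLR"):
        problems.append("last line is not the TRLR record")
    return problems


# Hypothetical sample files for illustration.
GOOD_FILE = "0 HEAD\n1 CHAR UTF-8\n0 @I1@ INDI\n1 NAME A /B/\n0 TRLR"
BAD_FILE = "0 HEAD\n0 @I1@ INDI\n2 DATE 1 JAN 1900\nNAME missing level\n"

print(check_structure(GOOD_FILE))
for problem in check_structure(BAD_FILE):
    print(problem)
```

A real checker goes much further (valid tags, dates, dangling references), so a clean result here says only that the file is not grossly malformed.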
|
|
|
Post by Doug 周 on Mar 21, 2010 14:26:41 GMT -5
I wrongly assumed that it is easy for most people to check their GEDCOM for accuracy. However, if you use a service or program which generates a GEDCOM (you should ONLY use a program which allows you to download your GEDCOM), then you depend on THAT GEDCOM to hold all the notes and information that you spent so much time researching. You want to make sure that your data is safely tucked somewhere inside the GEDCOM.
I used to subscribe to the online Dynastree family tree program until they closed their service. Fortunately, I had regularly downloaded my data as a GEDCOM, so I have the vast majority of my family tree structure and information intact.
With your GEDCOM, you don't have to study the whole file! It can be huge and the information arcane. You only need to look at one family to make sure the information from the program got into the GEDCOM structure.
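For those who want to script the "look at one family" check, the following Python sketch pulls the members of a single family record so they can be compared with what the program shows. It is a hedged illustration: the function names `index_records` and `family_summary` and the sample tree are hypothetical, and it assumes the common layout in which a FAM record points to people through HUSB, WIFE, and CHIL lines.

```python
def index_records(gedcom_text):
    """Map each record id (for example '@I1@') to its lines, split into parts."""
    records = {}
    current = None
    for line in gedcom_text.splitlines():
        parts = line.strip().split(" ", 2)
        if len(parts) < 2:
            continue
        if parts[0] == "0":
            # Level-0 lines that carry an id look like: 0 @I1@ INDI
            current = parts[1] if parts[1].startswith("@") else None
            if current:
                records[current] = []
        if current:
            records[current].append(parts)
    return records


def family_summary(gedcom_text, family_id):
    """Return (role, name) pairs for the HUSB, WIFE and CHIL lines of a family."""
    records = index_records(gedcom_text)
    summary = []
    for parts in records.get(family_id, []):
        is_member = parts[0] == "1" and parts[1] in ("HUSB", "WIFE", "CHIL")
        if is_member and len(parts) == 3:
            person = records.get(parts[2], [])
            names = [p[2] for p in person if p[:2] == ["1", "NAME"] and len(p) == 3]
            summary.append((parts[1], names[0] if names else "(no NAME line)"))
    return summary


# Hypothetical sample data, not taken from a real family tree.
SAMPLE_FAMILY = """0 HEAD
0 @I1@ INDI
1 NAME FATHER_NAME /WONG/
0 @I2@ INDI
1 NAME MOTHER_NAME /LEE/
0 @I3@ INDI
1 NAME CHILD_NAME /WONG/
0 @F1@ FAM
1 HUSB @I1@
1 WIFE @I2@
1 CHIL @I3@
0 TRLR"""

for role, name in family_summary(SAMPLE_FAMILY, "@F1@"):
    print(role, name)
```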
If you use Kerry's program PhpGedView, then your 'Geek' level is high enough that you probably already know how to check your GEDCOM.
For others using less powerful programs, feel free to contact me by private email. I will help you evaluate your downloaded GEDCOM. I feel it is so very important to have an accurate GEDCOM. Owning one has allowed me to try different programs and saved me lots of data entry time.
|
|